## 1. Stock Market
***
In this part of the project, we study data from the stock market. The data is available at this [Dropbox Link](https://www.dropbox.com/s/83l60htndqpn3fv/finance_data.zip?dl=0). The goal of this part is to study correlation structures among the fluctuation patterns of stock prices using tools from graph theory. The intuition is that investors follow similar investment strategies for stocks that are affected by the same economic factors. For example, stocks in the transportation sector may have very different absolute prices, but if fuel prices change, or are expected to change significantly in the near future, you would expect investors to buy or sell all of these stocks in a similar way to maximize their returns. Toward that goal, we construct different graphs based on similarities among the time series of returns of different stocks at different time scales (daily, weekly, and monthly), and then study the properties of these graphs.

The data was obtained from the Yahoo Finance website and covers 3 years. You are provided with a number of CSV tables, each containing the fields Date, Open, High, Low, Close, Volume, and Adj Close. Each file is named after the ticker symbol of its stock. You can find the market sector of each company in Name sector.csv. We recommend doing this part of the project (Q1 - Q8) in R.
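The project recommends R; the following Python sketch only illustrates the loading step. It assumes one file per stock named `<TICKER>.csv` in a single directory, with `Date` and `Close` columns and ISO `YYYY-MM-DD` dates (so sorting the strings is chronological). The function names `load_close_prices` and `common_dates` are hypothetical. Aligning all stocks on their shared trading dates keeps the later return series the same length.

```python
import csv
from pathlib import Path


def load_close_prices(data_dir):
    """Return {ticker: {date: close}} for each stock CSV in data_dir.

    Assumes one file per stock named '<TICKER>.csv' with 'Date' and 'Close'
    columns; files without a 'Close' column (e.g. the sector table) are skipped.
    """
    prices = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            if reader.fieldnames is None or "Close" not in reader.fieldnames:
                continue  # not a price table
            prices[path.stem] = {row["Date"]: float(row["Close"]) for row in reader}
    return prices


def common_dates(prices):
    """Dates present for every ticker, sorted (ISO strings sort chronologically)."""
    date_sets = [set(series) for series in prices.values()]
    if not date_sets:
        return []
    return sorted(set.intersection(*date_sets))
```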

### 1. Return correlation
***
In this part of the project, we will compute the correlation among log-normalized stock-return time series data. Before giving the expression for correlation, we introduce the following notation:
+ $p_i(t)$ is the closing price of stock $i$ on day $t$
+ $q_i(t)$ is the return of stock $i$ over the period $[t-1, t]$
$$
q_i(t) = \frac{p_i(t) - p_i(t-1)}{p_i(t-1)}
$$
+ $r_i(t)$ is the log-normalized return of stock $i$ over the period $[t-1, t]$
$$
r_i(t) = \log(1 + q_i(t))
$$
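The two return definitions translate directly into code. A minimal Python sketch for one stock's closing prices ordered by date (the helper names are hypothetical):

```python
import math


def simple_returns(prices):
    """q(t) = (p(t) - p(t-1)) / p(t-1) for consecutive closing prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def log_returns(prices):
    """r(t) = log(1 + q(t)), which equals log(p(t) / p(t-1))."""
    return [math.log(1.0 + q) for q in simple_returns(prices)]
```

Because $\log(1 + q_i(t)) = \log(p_i(t)/p_i(t-1))$, the log returns of consecutive periods add up to the log return over the whole span; for example, `sum(log_returns([100.0, 110.0, 99.0]))` equals $\log(0.99)$ up to rounding.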
With this notation, we define the correlation between the log-normalized stock-return time series of stocks $i$ and $j$ as
$$
\rho_{ij} = \frac{\langle r_i(t) r_j(t) \rangle - \langle r_i(t) \rangle \langle r_j(t) \rangle}{\sqrt{\left(\langle r_i(t)^2 \rangle - \langle r_i(t) \rangle^2\right)\left(\langle r_j(t)^2 \rangle - \langle r_j(t) \rangle^2\right)}}
$$
where $\langle \cdot \rangle$ denotes the temporal average over the investigated time period (for our data set, 3 years).
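$\rho_{ij}$ is the Pearson correlation of the two log-return series, written with temporal averages. A direct Python transcription of the formula, assuming both series are aligned on the same dates (the function names are illustrative):

```python
import math


def temporal_mean(values):
    """<x>: average over all time steps in the series."""
    return sum(values) / len(values)


def correlation(r_i, r_j):
    """rho_ij from the temporal-average formula, for aligned return series."""
    if not r_i or len(r_i) != len(r_j):
        raise ValueError("return series must be non-empty and equally long")
    mean_i, mean_j = temporal_mean(r_i), temporal_mean(r_j)
    # Numerator: <r_i r_j> - <r_i><r_j>
    cov = temporal_mean([a * b for a, b in zip(r_i, r_j)]) - mean_i * mean_j
    # Denominator factors: <r^2> - <r>^2 for each stock
    var_i = temporal_mean([a * a for a in r_i]) - mean_i ** 2
    var_j = temporal_mean([b * b for b in r_j]) - mean_j ** 2
    return cov / math.sqrt(var_i * var_j)
```

The one-pass form $\langle x^2 \rangle - \langle x \rangle^2$ mirrors the definition but can lose precision for long series with a large mean; a constant series has zero variance and leaves $\rho_{ij}$ undefined.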

#### QUESTION 1: What are the upper and lower bounds on $\rho_{ij}$? Provide a justification for using the log-normalized return ($r_i(t)$) instead of the regular return ($q_i(t)$).

> Ans:

### 2. Constructing correlation graphs
***
In this part, we construct a correlation graph using the correlation coefficients computed in the previous section. The correlation graph has the stocks as its nodes, and its edge weights are given by
$$
w_{ij} = \sqrt{2(1 - \rho_{ij})}
$$
Compute the edge weights using the above expression and construct the correlation graph.
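A Python sketch of the weight computation, assuming the correlations are already stored in a dict keyed by stock pairs (a hypothetical layout). The clamp guards against $1 - \rho_{ij}$ turning slightly negative through floating-point rounding when two series are almost perfectly correlated.

```python
import math


def edge_weight(rho):
    """w_ij = sqrt(2 * (1 - rho_ij))."""
    # Clamp at 0 so rounding (rho a hair above 1) cannot produce sqrt(<0).
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))


def correlation_graph(rho_by_pair):
    """Weighted edge list {(i, j): w_ij} from {(i, j): rho_ij}.

    `rho_by_pair` is a hypothetical layout with one entry per unordered pair.
    """
    return {pair: edge_weight(rho) for pair, rho in rho_by_pair.items()}
```

Higher correlation gives a smaller weight, so the weights behave like distances: strongly co-moving stocks end up close together, which is what a minimum spanning tree picks up.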

#### QUESTION 2: Plot a histogram showing the un-normalized distribution of edge weights.

> Ans:

### 3. Minimum spanning tree (MST)
***
In this part of the project, we will extract the MST of the correlation graph and interpret it.
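One way to extract the MST is Kruskal's algorithm: sort the edges by weight and keep each edge that joins two previously separate components, tracked with a union-find structure. A self-contained Python sketch over an edge dict `{(u, v): w_ij}` (a hypothetical layout); in R, igraph's `mst()` function is the usual tool.

```python
def minimum_spanning_tree(nodes, weights):
    """Kruskal's algorithm; returns MST edges as (u, v, w) tuples.

    `weights` maps (u, v) pairs to w_ij. On a disconnected graph the
    result is a minimum spanning forest.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Walk to the component root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for (u, v), w in sorted(weights.items(), key=lambda item: item[1]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree
```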

#### QUESTION 3: Extract the MST of the correlation graph. Each stock can be categorized into a sector, which can be found in the Name sector.csv file. Plot the MST and color-code the nodes based on their sectors. Do you see any pattern in the MST? The structures that you find in the MST are called Vine clusters. Provide a detailed explanation of the pattern you observe.

> Ans:

#### QUESTION 4: Run a community detection algorithm (for example, walktrap) on the MST obtained above. Plot the communities formed. Compute the homogeneity and completeness of the clustering (you can use the `clevr` library in R to compute them).
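Homogeneity asks whether each community contains stocks from a single sector; completeness asks whether all stocks of a sector fall in the same community. A Python sketch of the standard entropy-based definitions (the components of the V-measure), with sectors as the true classes and communities as the clusters; the function names are hypothetical, and we assume, without having checked, that `clevr` follows the same definitions. The logarithm base cancels in both ratios.

```python
import math
from collections import Counter


def _entropy(counts, total):
    """Shannon entropy of a distribution given as raw counts."""
    return -sum((n / total) * math.log(n / total) for n in counts if n)


def homogeneity_completeness(sectors, communities):
    """sectors[k] is stock k's true sector, communities[k] its community."""
    n = len(sectors)
    joint = Counter(zip(sectors, communities))
    n_sector, n_comm = Counter(sectors), Counter(communities)
    h_sector = _entropy(n_sector.values(), n)
    h_comm = _entropy(n_comm.values(), n)
    # Conditional entropies H(sector | community) and H(community | sector).
    h_sector_given_comm = -sum(
        m / n * math.log(m / n_comm[k]) for (s, k), m in joint.items())
    h_comm_given_sector = -sum(
        m / n * math.log(m / n_sector[s]) for (s, k), m in joint.items())
    # Convention: a score is 1 when the reference entropy is zero.
    homogeneity = 1.0 if h_sector == 0 else 1.0 - h_sector_given_comm / h_sector
    completeness = 1.0 if h_comm == 0 else 1.0 - h_comm_given_sector / h_comm
    return homogeneity, completeness
```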

> Ans:

### 4. Sector clustering in MSTs
***
In this part, we want to predict the market sector of an unknown stock. We will explore two methods for this task. To evaluate the performance of the methods, we define the following metric:
$$
\alpha = \frac{1}{|V|}\sum_{v_i \in V} P(v_i \in S_i)
$$
where $S_i$ is the sector of node $i$. Define
$$
P(v_i \in S_i) = \frac{|Q_i|}{|N_i|}
$$
where $Q_i$ is the set of neighbors of node $i$ that belong to the same sector as node $i$, and $N_i$ is the set of neighbors of node $i$. Compare $\alpha$ with the case where
$$
P(v_i \in S_i) = \frac{|S_i|}{|V|}
$$
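Both versions of $\alpha$ in a short Python sketch, assuming the MST is given as an adjacency dict `{node: [neighbors]}` and the sectors as `{node: sector}` (hypothetical layouts and names). Every node is assumed to have at least one neighbor, which holds in a spanning tree with two or more nodes.

```python
from collections import Counter


def alpha_neighbors(adjacency, sector):
    """alpha with P(v_i in S_i) = |Q_i| / |N_i|."""
    total = 0.0
    for node, neighbors in adjacency.items():
        # Q_i: neighbors sharing node i's sector; N_i: all neighbors.
        same = sum(1 for u in neighbors if sector[u] == sector[node])
        total += same / len(neighbors)
    return total / len(adjacency)


def alpha_baseline(sector):
    """alpha with P(v_i in S_i) = |S_i| / |V|."""
    sizes = Counter(sector.values())  # |S| for every sector S
    n = len(sector)  # |V|
    return sum(sizes[s] / n for s in sector.values()) / n
```

The baseline simplifies to $\sum_{S} (|S|/|V|)^2$, summing over sectors $S$, since each of the $|S|$ nodes in sector $S$ contributes $|S|/|V|$.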

#### QUESTION 5: Report the value of α for the above two cases and provide an interpretation for the difference.

> Ans:

### 5. Correlation graphs for weekly data
***
In the previous parts, we constructed the correlation graph based on daily data. In this part of the project, we will construct a correlation graph based on WEEKLY data. To create the graph, sample the stock data weekly on Mondays and then calculate $\rho_{ij}$ using the sampled data. If a Monday is a holiday, we ignore that week. Create the correlation graph based on the weekly data.
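A minimal Python sketch of the weekly sampling for one stock, assuming prices keyed by ISO date strings (a hypothetical layout; the function name is also hypothetical). A Monday holiday has no trading row in the data, so filtering on the weekday drops that week without special handling; the sampled series then goes through the same return, correlation, and graph steps as the daily data, after aligning all stocks on their common Mondays.

```python
from datetime import date


def sample_mondays(series):
    """Keep the Monday observations of {ISO date string: price}, in date order.

    A Monday holiday has no row in the data, so that week is simply absent.
    """
    return {
        day: price
        for day, price in sorted(series.items())
        if date.fromisoformat(day).weekday() == 0  # Monday is 0
    }
```

Around a holiday, consecutive sampled Mondays are two weeks apart; the project text does not say whether the return across such a gap should be kept.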

#### QUESTION 6: Repeat Questions 2, 3, 4, and 5 on the WEEKLY data.

### 6. Correlation graphs for MONTHLY data
***
In this part of the project, we will construct a correlation graph based on MONTHLY data. To create the graph, sample the stock data monthly on the 15th and then calculate $\rho_{ij}$ using the sampled data. If the 15th is a holiday, we ignore that month. Create the correlation graph based on the MONTHLY data.
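The monthly sampling follows the same pattern with a day-of-month filter. A Python sketch, again assuming prices keyed by ISO date strings (hypothetical layout and function name). A 15th that falls on a weekend also has no trading row, so this filter skips those months as well as holiday months.

```python
from datetime import date


def sample_fifteenth(series):
    """Keep the observations dated on the 15th of each month, in date order.

    Months whose 15th has no trading row (holiday or weekend) are skipped.
    """
    return {
        day: price
        for day, price in sorted(series.items())
        if date.fromisoformat(day).day == 15
    }
```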

#### QUESTION 7: Repeat Questions 2, 3, 4, and 5 on the MONTHLY data.

#### QUESTION 8: Compare and analyze all the results of daily vs. weekly vs. monthly data. What trends do you find? What changes, and what remains similar? Give reasons for your observations. Which granularity gives the best results when predicting the sector of an unknown stock, and why?

> Ans: