The data for this assignment consists of monthly observations on the prices of the 136 largest stocks in Australia from Dec 1999 to Jun 2014. You will need to place the data file in the same folder as this notebook.

Consider a portfolio constructed by holding <u>one share of each stock</u> in the dataset that has a price recorded at every time period. We'll use $P_{it}$ to denote the price of the $i^{\text th}$ stock at time $t$ and $P_t$ to denote the price of the portfolio at time $t$.

In [None]:
import os
import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

In [None]:
# Make sure your data is in the same folder as this notebook
assert os.path.isfile('./AusFirms.csv'), 'Data file Missing!'

In [None]:
file = os.path.basename('./AusFirms.csv')
ausFirms = pd.read_csv(file, parse_dates = ['date'], date_format='%m/%d/%y')
ausFirms.drop(columns=['rf', 'mkt'], inplace=True)
ausFirms.set_index('date', inplace=True)

### RUN THE 3 CELLS ABOVE BUT PLEASE DON'T MODIFY ANYTHING ABOVE THIS LINE

#### 1. Delete any stock that has missing observations.
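A minimal sketch of the pandas calls involved in steps 1 and 2, run on a small made-up price panel (the tickers `AAA` to `DDD` and all prices are hypothetical, not taken from AusFirms.csv). The key choice is `axis=1`, which tells `dropna` to remove columns (stocks) rather than rows (dates):

```python
import numpy as np
import pandas as pd

# Hypothetical toy price panel: rows are dates, columns are tickers (illustration only).
prices = pd.DataFrame(
    {
        "AAA": [10.0, 10.5, 11.0, 10.8],
        "BBB": [np.nan, 20.0, 20.4, 21.0],  # not yet listed on the first date
        "CCC": [5.0, 5.1, np.nan, 5.3],     # one missing month
        "DDD": [8.0, 8.2, 8.1, 8.4],
    },
    index=pd.date_range("2000-01-01", periods=4, freq="MS"),
)

# axis=1 drops every column (stock) that contains at least one NaN;
# the default axis=0 would drop dates instead.
complete = prices.dropna(axis=1, how="any")

# The number of stocks remaining is the number of columns left.
n_stocks = complete.shape[1]
print(list(complete.columns), n_stocks)  # ['AAA', 'DDD'] 2
```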

In [None]:
# Enter your code here


#### 2. How many stocks are remaining?

In [None]:
# Enter your code here


#### 3. Create a new series called $P$ that records the portfolio value for each date. Since the portfolio holds one share of each stock, you just need to add up the prices of all the stocks on each date.
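A sketch of the row-wise sum $P_t = \sum_i P_{it}$ on a hypothetical three-stock panel (made-up numbers). Note that `sum` skips NaN by default, so a missing price would silently lower $P_t$ rather than raise an error; this is why it matters that step 1 removes incomplete stocks first.

```python
import pandas as pd

# Hypothetical toy panel with no missing values (illustration only).
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0], "BBB": [20.0, 19.0, 21.0], "CCC": [5.0, 5.5, 6.0]},
    index=pd.date_range("2000-01-01", periods=3, freq="MS"),
)

# One share of each stock: the portfolio value on each date is the sum
# across columns (axis=1) of that row's prices.
P = prices.sum(axis=1)
print(P.tolist())  # [35.0, 35.5, 39.0]
```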

In [None]:
# Enter your code here


#### 4. Let the first date be 1 and the last date be $T$. For the portfolio, calculate the simple return, $\frac{P_T}{P_1}-1$, and the log return, $\log\left(\frac{P_T}{P_1}\right)$.
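A sketch using a hypothetical series of portfolio values (not the assignment data). Positional indexing with `iloc` picks out $P_1$ and $P_T$ regardless of the date labels, and the check at the end confirms the identity $r = \log(1 + R)$ that links the two kinds of return:

```python
import numpy as np
import pandas as pd

# Hypothetical portfolio values P_1, ..., P_T (illustration only).
P = pd.Series([100.0, 104.0, 98.0, 110.0])

simple_return = P.iloc[-1] / P.iloc[0] - 1   # P_T / P_1 - 1
log_return = np.log(P.iloc[-1] / P.iloc[0])  # log(P_T / P_1)

# The log return equals log(1 + simple return).
assert np.isclose(log_return, np.log1p(simple_return))
print(f"Simple return = {simple_return:.4f}; Log return = {log_return:.4f}")
```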

In [None]:
# Your code here (put the expression for each return inside its { } placeholder)
print(f'Simple return = { }; Log return = { }')

#### 5. Calculate the portfolio weights: $w_{it} = \frac{P_{it}}{P_t}$.

If you use the *divide* method from Pandas, this is just one line and you will end up with a dataframe the same size as ausFirms.
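A sketch of the weight calculation on a hypothetical three-stock panel (made-up prices). Passing `axis=0` makes pandas align the Series of portfolio values with the row index, so each row is divided by that date's $P_t$; without it, pandas would try to match the dates in $P$ against the column labels and return NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel (illustration only).
prices = pd.DataFrame(
    {"AAA": [10.0, 12.0], "BBB": [30.0, 28.0], "CCC": [60.0, 60.0]},
    index=pd.date_range("2000-01-01", periods=2, freq="MS"),
)
P = prices.sum(axis=1)  # portfolio value on each date (100 on both dates here)

# Divide each row by that date's portfolio value: w_it = P_it / P_t.
w = prices.divide(P, axis=0)

assert w.shape == prices.shape
assert np.allclose(w.sum(axis=1), 1.0)  # weights on each date sum to 1
print(w.round(2).values.tolist())  # [[0.1, 0.3, 0.6], [0.12, 0.28, 0.6]]
```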

In [None]:
# Your code here (one line)


#### 6. Calculate the simple returns for each stock: $R_{it} = \frac{P_{it}}{P_{i,t-1}}-1$ and the log returns for each stock: $r_{it} = \log\frac{P_{it}}{P_{i,t-1}}$.

You only need one line for each return. To get $P_{i,t-1}$, you can use the *shift* method in Pandas.
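A sketch of both return formulas on a hypothetical two-stock panel (made-up prices). `shift(1)` moves every row down one period, so row $t$ of `lagged` holds $P_{i,t-1}$; the first row becomes NaN because there is no earlier price.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel (illustration only).
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.1], "BBB": [20.0, 18.0, 18.0]},
    index=pd.date_range("2000-01-01", periods=3, freq="MS"),
)

lagged = prices.shift(1)     # row t now holds P_{i,t-1}; the first row is NaN
R = prices / lagged - 1      # simple returns R_it
r = np.log(prices / lagged)  # log returns r_it

print(R.iloc[1:].round(4))
```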

In [None]:
# Your code here


#### 7. Now calculate the portfolio returns as follows:
- For simple returns: $R_{Pt} = w_{1,t-1}R_{1t} + w_{2,t-1}R_{2t} + \cdots + w_{n,t-1}R_{nt}$
- For log returns: $r_{Pt} = \log (w_{1,t-1}e^{r_{1t}} + w_{2,t-1}e^{r_{2t}} + \cdots + w_{n,t-1}e^{r_{nt}})$

Note that the time index on the weights is one period behind the returns.
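The lag reflects that the return earned between $t-1$ and $t$ depends on how the portfolio was split across stocks at the start of the period. Substituting $w_{i,t-1} = P_{i,t-1}/P_{t-1}$ gives $\sum_i w_{i,t-1}R_{it} = \frac{\sum_i P_{it}}{P_{t-1}} - 1 = \frac{P_t}{P_{t-1}} - 1$, so both formulas must reproduce the returns computed directly from $P$. The sketch below checks this on a hypothetical toy panel (made-up prices, not the assignment data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel (illustration only).
prices = pd.DataFrame(
    {
        "AAA": [10.0, 11.0, 12.1, 11.5],
        "BBB": [20.0, 18.0, 18.0, 19.0],
        "CCC": [5.0, 5.5, 5.0, 5.2],
    },
    index=pd.date_range("2000-01-01", periods=4, freq="MS"),
)
P = prices.sum(axis=1)                 # portfolio value
w = prices.divide(P, axis=0)           # weights w_it
R = prices / prices.shift(1) - 1       # simple returns R_it
r = np.log(prices / prices.shift(1))   # log returns r_it

w_lag = w.shift(1)  # weights from t-1, aligned with returns at t

# Row-wise weighted sums. iloc[1:] drops the first row: it has no t-1 data,
# and because pandas' sum skips NaN it would otherwise show a misleading 0.
R_P = (w_lag * R).sum(axis=1).iloc[1:]
r_P = np.log((w_lag * np.exp(r)).sum(axis=1).iloc[1:])

# Sanity checks against returns computed directly from P.
assert np.allclose(R_P, (P / P.shift(1) - 1).iloc[1:])
assert np.allclose(r_P, np.log(P / P.shift(1)).iloc[1:])
```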

In [None]:
# Your code here (one line for each type of return). After calculating the returns, remove the first row: it has no previous-period weights, so its value is not a valid return.


#### 8. Plot the portfolio return series (simple and log)
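A sketch of the plot layout using randomly generated, hypothetical monthly returns in place of the real portfolio series. Because $\log(1+R) \le R$, the log-return line always sits at or below the simple-return line, and the gap is largest in months with big moves. The `Agg` backend line is only there so the sketch runs without a display; it is unnecessary in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly portfolio returns (random, illustration only).
rng = np.random.default_rng(0)
dates = pd.date_range("2000-02-01", periods=24, freq="MS")
R_P = pd.Series(rng.normal(0.005, 0.04, size=24), index=dates)
r_P = np.log1p(R_P)  # log return implied by each simple return

fig, ax = plt.subplots(figsize=(8, 4))
R_P.plot(ax=ax, linestyle="-", label="Simple return")
r_P.plot(ax=ax, linestyle="--", label="Log return")
ax.set_xlabel("Date")
ax.set_ylabel("Monthly return")
ax.legend()
```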

In [None]:
# Both plots on the same figure. Use two different linestyles and include a legend.


#### 9. For each stock in the portfolio, calculate the excess kurtosis of the log returns. Then, create a histogram of the kurtosis values.

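You can use the *kurtosis* function in scipy (`stats.kurtosis`) for this; with its default `fisher=True`, it returns excess kurtosis. Applied along `axis=0` to the log returns (after removing the first row, which is NaN), it returns an array with one kurtosis value per stock. Then, create a histogram from this array. You can do this entire step in one line if you want to.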
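A sketch on hypothetical data: Laplace-distributed draws stand in for fat-tailed monthly log returns of 40 made-up stocks, with a NaN first row to mimic the output of the shift-based return calculation. Dropping that row matters because `stats.kurtosis` propagates NaN by default.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as stats

# Hypothetical log returns: 175 months x 40 stocks (illustration only).
rng = np.random.default_rng(1)
r = pd.DataFrame(rng.laplace(0.0, 0.05, size=(175, 40)))
r.iloc[0] = np.nan  # first row is NaN, as after computing returns with shift

# fisher=True (the default) gives *excess* kurtosis, so a normal distribution scores 0.
# axis=0 computes one value per column (stock); the NaN first row is dropped first.
kurt = stats.kurtosis(r.iloc[1:].to_numpy(), axis=0, fisher=True)

fig, ax = plt.subplots()
ax.hist(kurt, bins=15)
ax.set_xlabel("Excess kurtosis of monthly log returns")
ax.set_ylabel("Number of stocks")
```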

In [None]:
# Your code here. 


What do you conclude about the distributions of stock returns from the histogram you've just created? Enter your answer in the cell below.

#### 9a. Now calculate the excess kurtosis of the portfolio log return.

You should see a big difference between the excess kurtosis of a typical stock's log return and that of the portfolio, whose value should be much closer to zero (the excess kurtosis of a normal distribution). This is one reason why portfolio returns can often be approximated by a normal distribution but individual stock returns cannot.
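A simulation sketch of the averaging effect, under a strong simplifying assumption: the hypothetical "stock" returns below are independent Laplace draws (excess kurtosis 3 each), and the portfolio is approximated by their equally weighted average. For independent, identically distributed components, the excess kurtosis of an average of $n$ of them is $\kappa/n$, so here it should be near $3/50$. Real stock returns are correlated, which generally limits how much of the tail risk averages away.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(2)
n_months, n_stocks = 2000, 50

# Hypothetical setup: independent Laplace log returns (illustration only).
r = rng.laplace(0.0, 0.05, size=(n_months, n_stocks))

stock_kurt = stats.kurtosis(r, axis=0)           # one excess-kurtosis value per stock
portfolio_kurt = stats.kurtosis(r.mean(axis=1))  # equally weighted average as a stand-in portfolio

print(f"median stock: {np.median(stock_kurt):.2f}, portfolio: {portfolio_kurt:.2f}")
```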

In [None]:
# Your code here


#### 10. Calculate the autocorrelation function of the portfolio log return for the first 6 lags.

You can use the *acf* function in statsmodels to make this easy.

In [None]:
# Your code here.


In class, we saw an example where the DM/USD exchange rate had very low autocorrelation values and we took that as evidence of market efficiency. What do you observe here? Enter your answer in the cell below.