# Exercise 5 - IWM vs TNA

The purpose of this exercise is to perform a correlation and volatility analysis on two ETFs: IWM and TNA. Here is a brief description of each:

**IWM** - tracks the Russell 2000 Index.

**TNA** - a leveraged ETF whose returns aim to be 300% of the Russell 2000 Index.

This exercise is a bit more challenging: some of the questions involve multiple steps, and I won't be walking you through each one.

#### 0) What should you do at the beginning of any notebook?

In [1]:
import numpy as np
import pandas as pd

#### 1) Based on the above descriptions of the ETFs, if $\sigma_{I}$ is the volatility of IWM and $\sigma_{T}$ is the volatility of TNA, what number would you expect the volatility ratio $\sigma_{T} / \sigma_{I}$ to be close to?

*Since TNA is supposed to return 3x the return of IWM, the volatility ratio should be close to 3.*
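The reasoning rests on the fact that multiplying a return series by a constant multiplies its standard deviation by the same constant. Here is a minimal sketch on synthetic data; the names `base_ret` and `lev_ret` are made up for illustration, and a real leveraged ETF will not hit its multiple exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical daily returns for an unleveraged fund (synthetic, not market data)
base_ret = rng.normal(loc=0.0, scale=0.01, size=1_000)

# an idealized 3x fund returns exactly three times as much each day
lev_ret = 3 * base_ret

# std(c * x) = |c| * std(x), so the ratio of the two volatilities is 3
vol_ratio = np.std(lev_ret) / np.std(base_ret)
print(vol_ratio)
```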

#### 2) Read in the CSV named `iwm_tna_2014_2018.csv` and make sure that the data is from 2014-2018.

In [2]:
# reading in the data
df_iwm_tna = pd.read_csv('../data/iwm_tna_2014_2018.csv')
df_iwm_tna.head()

Unnamed: 0,symbol,date,open,high,low,close,volume,adjusted,return
0,IWM,2014-01-02,115.089996,115.120003,113.639999,114.110001,44247700,106.608597,0.0
1,IWM,2014-01-03,114.529999,114.900002,114.110001,114.690002,26468000,107.150459,0.005083
2,IWM,2014-01-06,115.220001,115.269997,113.709999,113.760002,36198000,106.281593,-0.008109
3,IWM,2014-01-07,114.190002,115.160004,114.110001,114.709999,28545300,107.169136,0.008351
4,IWM,2014-01-08,114.779999,115.080002,114.07,114.860001,30763600,107.309288,0.001308


In [3]:
# converting the `date` column to datetime
df_iwm_tna['date'] = pd.to_datetime(df_iwm_tna['date'])
df_iwm_tna.dtypes

symbol              object
date        datetime64[ns]
open               float64
high               float64
low                float64
close              float64
volume               int64
adjusted           float64
return             float64
dtype: object

In [4]:
# printing the first and last dates
print(min(df_iwm_tna.date))
print(max(df_iwm_tna.date))

2014-01-02 00:00:00
2018-12-31 00:00:00
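Printing the first and last dates confirms the endpoints. As an optional extra check, the sketch below also confirms that every row falls inside 2014-2018. It uses a small hypothetical `dates` series as a stand-in for the `date` column, so it runs without the CSV.

```python
import pandas as pd

# hypothetical stand-in for df_iwm_tna['date'] (three made-up rows)
dates = pd.Series(pd.to_datetime(['2014-01-02', '2016-06-15', '2018-12-31']))

# the pandas .min() and .max() methods give the same endpoints as the built-ins
first_date, last_date = dates.min(), dates.max()

# True only if every row's year is between 2014 and 2018, inclusive
in_range = dates.dt.year.between(2014, 2018).all()
print(first_date, last_date, in_range)
```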


#### 3) Verify your hypothesis from question #1 by calculating the volatility of IWM and TNA over the entire period in the data set.

In [5]:
# defining a custom function to calculate volatility from daily returns
def volatility(dly_ret):
    return np.std(dly_ret) * np.sqrt(252)

In [6]:
# calculating volatility for each symbol using .groupby()
# notice that the TNA volatility is about 3 times that of IWM
df_iwm_tna.groupby(['symbol'])['return'].agg([volatility]).reset_index()

Unnamed: 0,symbol,volatility
0,IWM,0.162563
1,TNA,0.482408
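To compare against the hypothesis from question 1, divide the two numbers. This quick check uses the volatilities copied from the output above, and the ratio comes out at roughly 2.97.

```python
# full-period volatilities copied from the output above
vol_iwm = 0.162563
vol_tna = 0.482408

# the hypothesis from question 1 predicts a value close to 3
ratio = vol_tna / vol_iwm
print(round(ratio, 2))
```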


#### 4) The correlation between two random variables is a number between -1 and 1. A correlation of 1 means the two variables always move in the same direction; a correlation of -1 means that they always move in opposite directions. A value of 0 means that there is no relationship between the direction of their movements.


#### What do you think is the correlation between the daily returns of IWM and TNA?


*Both ETFs attempt to track the Russell 2000 index, so we would expect their correlation to be close to 1.*
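Note that leverage changes the size of the moves, not their direction, so it should not pull the correlation away from 1. The following sketch illustrates this on synthetic data. The series `index_ret`, `fund_1x`, and `fund_3x` and the noise level are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical daily index returns (synthetic)
index_ret = rng.normal(0.0, 0.01, size=1_000)

# a 1x fund and a 3x fund, each with a little independent tracking noise
fund_1x = index_ret + rng.normal(0.0, 0.0002, size=1_000)
fund_3x = 3 * index_ret + rng.normal(0.0, 0.0002, size=1_000)

# correlation is unaffected by the 3x scaling; only the noise lowers it slightly
corr = np.corrcoef(fund_1x, fund_3x)[0, 1]
print(corr)
```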

#### 5) Let's verify your hypothesis from question #4.  Use `numpy.corrcoef(x, y)` to calculate the correlation between the IWM returns and TNA returns.

In [7]:
# separating returns for each ETF
ser_iwm_ret = df_iwm_tna[df_iwm_tna.symbol == 'IWM']['return']
ser_tna_ret = df_iwm_tna[df_iwm_tna.symbol == 'TNA']['return']

# calculating the correlation, which is in the off-diagonal of the output, 
# notice that it is very close to +1
np.corrcoef(ser_iwm_ret, ser_tna_ret)

array([[1.        , 0.99925113],
       [0.99925113, 1.        ]])
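`np.corrcoef` pairs the two series by position, so the calculation above relies on both ETFs having the same dates in the same order. One way to make that pairing explicit is to pivot the data so each date is a row. This is a sketch on a small hypothetical frame, `df_long`, with made-up returns.

```python
import pandas as pd

# hypothetical long-format frame shaped like df_iwm_tna (values are made up)
df_long = pd.DataFrame({
    'symbol': ['IWM', 'IWM', 'IWM', 'TNA', 'TNA', 'TNA'],
    'date': pd.to_datetime(['2014-01-02', '2014-01-03', '2014-01-06'] * 2),
    'return': [0.001, 0.005, -0.008, 0.003, 0.015, -0.024],
})

# one row per date, one column per symbol, so returns are matched by date
df_wide = df_long.pivot(index='date', columns='symbol', values='return')

# Pearson correlation between the two columns
corr = df_wide['IWM'].corr(df_wide['TNA'])
print(corr)
```

In this toy frame the TNA returns are exactly three times the IWM returns, so the correlation is 1 up to floating-point rounding.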

#### 6) Calculate the volatility of IWM and TNA for each month in the data set.

In [8]:
# creating month and year columns
df_iwm_tna['year'] = df_iwm_tna['date'].dt.year
df_iwm_tna['month'] = df_iwm_tna['date'].dt.month
df_iwm_tna.head()

Unnamed: 0,symbol,date,open,high,low,close,volume,adjusted,return,year,month
0,IWM,2014-01-02,115.089996,115.120003,113.639999,114.110001,44247700,106.608597,0.0,2014,1
1,IWM,2014-01-03,114.529999,114.900002,114.110001,114.690002,26468000,107.150459,0.005083,2014,1
2,IWM,2014-01-06,115.220001,115.269997,113.709999,113.760002,36198000,106.281593,-0.008109,2014,1
3,IWM,2014-01-07,114.190002,115.160004,114.110001,114.709999,28545300,107.169136,0.008351,2014,1
4,IWM,2014-01-08,114.779999,115.080002,114.07,114.860001,30763600,107.309288,0.001308,2014,1


In [9]:
# calculating monthly volatilities with .groupby()
df_vol = \
    df_iwm_tna.groupby(['symbol' , 'year', 'month'])['return'].agg([volatility]).reset_index()
df_vol.head()

Unnamed: 0,symbol,year,month,volatility
0,IWM,2014,1,0.160876
1,IWM,2014,2,0.159055
2,IWM,2014,3,0.157328
3,IWM,2014,4,0.189052
4,IWM,2014,5,0.165355
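An alternative to separate `year` and `month` columns is a single monthly period column built with `.dt.to_period('M')`. The sketch below is one possible variant, not the approach used in this exercise. It repeats the `volatility` function so it runs on its own, and `df_demo` is a hypothetical frame with made-up returns.

```python
import numpy as np
import pandas as pd

def volatility(dly_ret):
    # same definition as in the exercise: annualized standard deviation
    return np.std(dly_ret) * np.sqrt(252)

# hypothetical frame with two days in each of two months
df_demo = pd.DataFrame({
    'symbol': ['IWM'] * 4,
    'date': pd.to_datetime(['2014-01-02', '2014-01-03', '2014-02-03', '2014-02-04']),
    'return': [0.01, -0.01, 0.02, -0.02],
})

# a monthly Period such as 2014-01 replaces the year and month columns
df_demo['period'] = df_demo['date'].dt.to_period('M')

df_vol_demo = (
    df_demo.groupby(['symbol', 'period'])['return']
    .agg(volatility)
    .reset_index(name='volatility')
)
print(df_vol_demo)
```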


#### 6) As we saw earlier, the volatility ratio of TNA to IWM is close to 3.0 for the entire period of 2014-2018.  Generate a dataframe that calculates this volatility ratio for each month.

In [10]:
# putting monthly volatility for each ETF into a separate dataframe
df_vol_iwm = df_vol[df_vol.symbol == 'IWM']
df_vol_tna = df_vol[df_vol.symbol == 'TNA']

In [11]:
# combining monthly volatilities into a single dataframe with an inner-join
df_monthly_vol = \
    pd.merge(df_vol_iwm, df_vol_tna, on=['year', 'month'], suffixes=('_IWM', '_TNA'))\
        [['year', 'month', 'volatility_IWM', 'volatility_TNA']]
df_monthly_vol.head()

Unnamed: 0,year,month,volatility_IWM,volatility_TNA
0,2014,1,0.160876,0.465755
1,2014,2,0.159055,0.478568
2,2014,3,0.157328,0.462771
3,2014,4,0.189052,0.559747
4,2014,5,0.165355,0.494465


In [12]:
# calculating the vol_ratio in a separate column
df_monthly_vol['vol_ratio'] = (df_monthly_vol.volatility_TNA / df_monthly_vol.volatility_IWM)
df_monthly_vol.head()

Unnamed: 0,year,month,volatility_IWM,volatility_TNA,vol_ratio
0,2014,1,0.160876,0.465755,2.895113
1,2014,2,0.159055,0.478568,3.008823
2,2014,3,0.157328,0.462771,2.941444
3,2014,4,0.189052,0.559747,2.960807
4,2014,5,0.165355,0.494465,2.990316
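The filter-and-merge steps above can also be written as a reshape: move `symbol` into the columns, then divide one column by the other. This is a sketch of that alternative on a hypothetical frame, `df_vol_demo`, which holds the first two months of volatilities from the outputs above.

```python
import pandas as pd

# hypothetical frame shaped like df_vol, using the first two months shown above
df_vol_demo = pd.DataFrame({
    'symbol': ['IWM', 'IWM', 'TNA', 'TNA'],
    'year': [2014, 2014, 2014, 2014],
    'month': [1, 2, 1, 2],
    'volatility': [0.160876, 0.159055, 0.465755, 0.478568],
})

# one row per (year, month), one column per symbol
df_wide = (
    df_vol_demo.set_index(['year', 'month', 'symbol'])['volatility']
    .unstack('symbol')
)

# ratio of TNA volatility to IWM volatility for each month
df_wide['vol_ratio'] = df_wide['TNA'] / df_wide['IWM']
df_wide = df_wide.reset_index()
print(df_wide)
```

Because the inputs here are the rounded values from the printed tables, the ratios can differ from the notebook output in the last decimal places.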


#### 7) What were the maximum and minimum monthly volatility ratios?

In [13]:
# minimum - April 2018
df_monthly_vol.sort_values(['vol_ratio']).head(1)

Unnamed: 0,year,month,volatility_IWM,volatility_TNA,vol_ratio
51,2018,4,0.165525,0.477178,2.882819


In [14]:
# maximum - September 2018
df_monthly_vol.sort_values(['vol_ratio'], ascending=False).head(1)

Unnamed: 0,year,month,volatility_IWM,volatility_TNA,vol_ratio
56,2018,9,0.075781,0.232567,3.068936
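Sorting works well here. Another option is `idxmin()` and `idxmax()`, which return the index labels of the extreme rows without sorting the whole dataframe. This is a sketch on a hypothetical three-row subset, `df_demo`, whose ratios are taken from outputs in this exercise.

```python
import pandas as pd

# hypothetical subset of df_monthly_vol (three months only)
df_demo = pd.DataFrame({
    'year': [2014, 2018, 2018],
    'month': [1, 4, 9],
    'vol_ratio': [2.895113, 2.882819, 3.068936],
})

# idxmin/idxmax return the index label of the smallest/largest value
row_min = df_demo.loc[df_demo['vol_ratio'].idxmin()]
row_max = df_demo.loc[df_demo['vol_ratio'].idxmax()]
print(row_min)
print(row_max)
```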


#### 8) Calculate the maximum and minimum volatility ratios for each year.

In [15]:
# this can be done by a .groupby()
df_monthly_vol.groupby(['year'])['vol_ratio'].agg([np.min, np.max]).reset_index()

Unnamed: 0,year,amin,amax
0,2014,2.895113,3.008823
1,2015,2.900046,2.994853
2,2016,2.95235,3.030261
3,2017,2.928104,3.012094
4,2018,2.882819,3.068936
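The `amin` and `amax` column names come from the NumPy functions passed to `.agg()`, and newer pandas releases may warn about passing NumPy functions this way. Named aggregation with string function names is one way to choose the column labels yourself. In this sketch `df_demo`, `min_ratio`, and `max_ratio` are hypothetical names, and the values are a made-up subset.

```python
import pandas as pd

# hypothetical subset of df_monthly_vol with two months per year
df_demo = pd.DataFrame({
    'year': [2014, 2014, 2015, 2015],
    'vol_ratio': [2.895113, 3.008823, 2.900046, 2.994853],
})

# named aggregation: output_column='aggregation name'
df_yearly = (
    df_demo.groupby('year')['vol_ratio']
    .agg(min_ratio='min', max_ratio='max')
    .reset_index()
)
print(df_yearly)
```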
