# Assignment: Using Machine Learning for Hedging

Welcome to the first assignment!

# Problem description

We will solve a Regression task that is very common in Finance
- Given the return of "the market", predict the return of a particular stock

That is
- Given the return of a proxy for "the market" at time $t$, predict the return of, e.g., Apple at time $t$.

As we will explain,
being able to predict the relationship between two financial instruments opens up possibilities
- Use one instrument to "hedge" or reduce the risk of holding the other
- Create strategies whose returns are independent of "the market"
    - Hopefully make a profit regardless of whether the market goes up or down

## Goal

You will create models of increasing complexity in order to explain the return of Apple (ticker $\aapl$)
- The first model will have a single feature: return of the market proxy, ticker $\spy$
- Subsequent models will add the return of other tickers as additional features

## Objectives
We will be using Linear Regression to establish the relationship between the returns of individual equities and "the market".

The purpose of the assignment is two-fold
- to get you up to speed with Machine Learning in general, and `sklearn` in particular
- to get you up to speed with the other programming tools (e.g., Pandas) that will help you in data preparation, etc.

## How to report your answers
I will mix explanation of the topic with tasks that you must complete. Look for 
the string "**Question**" to find a task that you must perform.
Most of the tasks will require you to assign values to variables and execute a `print` statement.

**Motivation**

If you **do not change** the print statement then the GA (or a machine) can automatically find your answer to each part by searching for the string.


# Standard imports

In [None]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import sklearn

import os
import math

%matplotlib inline

# Get The data

The data are the daily prices of a number of individual equities and equity indices.
The prices are arranged in a series in ascending date order (a timeseries).
- There is a `.csv` file for each equity or index in the directory `/resource/asnlib/publicdata/data`

You should get the "Adjusted Close" price data into some sort of data structure.  Pandas DataFrame is super useful
so I recommend that's what you use.
      
**Question:**
- Complete function `read_ticker()` to load AAPL and SPY `Adj Close` price

**Hint:**
- look up the Pandas `read_csv()` method
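
As a rough illustration of the hint, here is a minimal sketch of `read_csv()` on a small in-memory CSV rather than a real file. The layout (a `Dt` date column plus an `Adj Close` column) follows the constants used in this notebook; the extra `Volume` column and its values are made up purely to show column selection.

```python
import io
import pandas as pd

# In-memory stand-in for a ticker file. The "Volume" column is hypothetical;
# the Adj Close values are the sample AAPL rows shown in this notebook.
csv_text = """Dt,Volume,Adj Close
2017-01-03,100,110.9539
2017-01-04,200,110.8297
2017-01-05,300,111.3933
"""

# index_col turns the date column into the row index instead of a regular column
prices = pd.read_csv(io.StringIO(csv_text), index_col="Dt")

# Double brackets keep a one-column DataFrame (single brackets would give a Series)
adj_close = prices[["Adj Close"]]
print(adj_close)
```

Because no date parsing is requested, the index holds plain date strings such as `"2017-01-03"`.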

In [None]:
DATA_DIR = './data'
if not os.path.isdir(DATA_DIR):
    DATA_DIR  = "../resource/asnlib/publicdata/data"
DATE_ATTR="Dt"
PRICE_ATTR = "Adj Close"

aapl = pd.DataFrame()
spy = pd.DataFrame()

def read_ticker(ticker):
    '''
    Load the ticker data
    
    Arguments:
    ticker: name of your ticker, string
    '''
    
    ### BEGIN SOLUTION
    df = pd.read_csv( DATA_DIR + "/" + ticker + ".csv", index_col=DATE_ATTR)
    return df[[PRICE_ATTR]]
    ### END SOLUTION

aapl, spy = read_ticker('AAPL'), read_ticker("SPY")

# Have a look at the data

We will not go through all steps in the Recipe, nor in depth.

But here's a peek

In [None]:
# Print your results
print("AAPL: ", aapl.head())
print("SPY: ", spy.head())

Expected outputs should be similar to this:   
AAPL:    
<table> 
    <tr> 
        <td>Dt</td><td>Adj Close</td>
    </tr>
    <tr> 
        <td>2017-01-03</td><td>110.9539</td>
    </tr>
    <tr> 
        <td>2017-01-04</td><td>110.8297</td>
    </tr>
    <tr> 
        <td>2017-01-05</td><td>111.3933</td>
    </tr>
    <tr> 
        <td>2017-01-06</td><td>112.6351</td>
    </tr>
    <tr> 
        <td>2017-01-09</td><td>113.6668</td> 
    </tr>
</table>   
SPY:     
<table> 
    <tr> 
        <td>Dt</td><td>Adj Close</td>
    </tr>
    <tr> 
        <td>2017-01-03</td><td>213.8428</td> 
    </tr>
    <tr> 
        <td>2017-01-04</td><td>215.1149</td>
    </tr>
    <tr> 
        <td>2017-01-05</td><td>214.9440</td>
    </tr>
    <tr> 
        <td>2017-01-06</td><td>215.7131</td> 
    </tr>
    <tr> 
        <td>2017-01-09</td><td>215.0010</td>
    </tr>
</table>

In [None]:
# Print the Start time and End time
print("Start time: ", aapl.index.min())
print("End time: ", aapl.index.max())

# Prepare the data

In Finance, it is very typical to work with *relative changes* (e.g., percent price change)
rather than *absolute changes* (price change) or *levels* (prices).

Without going into too much detail
- Relative changes are more consistent over time than either absolute changes or levels
- The consistency can facilitate the use of data over a longer time period

For example, let's suppose that prices are given in units of USD (dollar)
- A price change of 1 USD is more likely for a stock with price level 100 than price level 10
    - A relative change of $1/100 = 1\%$ is more likely than a change of $1/10 = 10\%$
    - So relative changes are less dependent on price level than either price changes or price levels
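
To make the point concrete, here is a small sketch with two made-up price levels (10 and 100 USD) that both rise by the same 1%. The numbers are assumptions chosen only for illustration.

```python
import numpy as np

# Illustrative price levels (assumed): one cheap stock, one expensive stock
price_yesterday = np.array([10.0, 100.0])
# Both stocks rise by the same 1%
price_today = price_yesterday * 1.01

absolute_change = price_today - price_yesterday
relative_change = price_today / price_yesterday - 1

print(absolute_change)  # differs by a factor of 10 between the two stocks
print(relative_change)  # the same for both stocks
```

The absolute changes depend on the price level, while the relative changes are directly comparable.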
    
    
To compute the *return* (percent change in prices)
 for ticker $\aapl$ (Apple) on date $t$

$$
\begin{array}{l}
\ret_\aapl^\tp = \frac{\price _\aapl^\tp}{\price _\aapl^{(t-1)}} -1 \\
\text{where} \\
\price_\aapl^\tp \text{ denotes the price of ticker } \aapl \text{ on date } t \\
\ret_\aapl^\tp \text{ denotes the return of ticker } \aapl \text{ on date } t
\end{array}
$$
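
The formula can be checked by hand on the first few AAPL prices shown in the expected output earlier in this notebook. This is only a sketch; the variable names are my own.

```python
import pandas as pd

# First three AAPL adjusted closes from the expected output shown earlier
price = pd.Series(
    [110.9539, 110.8297, 111.3933],
    index=["2017-01-03", "2017-01-04", "2017-01-05"],
    name="Adj Close",
)

# The formula written out: price on date t divided by price on date t-1, minus 1
ret_manual = price / price.shift(1) - 1

# pandas shortcut for the same computation
ret_pct = price.pct_change()

print(ret_pct)
```

The first return is `NaN` because there is no price for the day before the first date.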


## Transformations: transform the data

Our first task is to transform the data from price levels (Adj Close)
to Percent Price Changes.

**Note**

We will need to apply **identical** transformations to both the training and test data examples.

In the cells that immediately follow, we will do this for both training and test data examples.

**Question:**
- Complete function `add_ret()` to compute the returns of tickers. Name the column of returns "Return"

**Hint:**
- look up the Pandas `pct_change()` method    

In [None]:
RET_ATTR = 'Return'
PRICE_ATTR = 'Adj Close'

def add_ret(ticker_df):
    '''
    Add a return column for ticker price
    
    Arguments:
    ticker_df: Adjusted price data of your ticker
    '''
    ### BEGIN SOLUTION
    ticker_df[RET_ATTR] = ticker_df[PRICE_ATTR].pct_change()
    return ticker_df
    ### END SOLUTION

aapl_ret = add_ret(aapl)
spy_ret = add_ret(spy)
print("AAPL", aapl_ret.head())
print("SPY", spy_ret.head())

Expected outputs should be similar to this:   
AAPL:    
<table> 
    <tr> 
        <td>Dt</td><td>Adj Close</td><td>Return</td>
    </tr>
    <tr> 
        <td>2017-01-03</td><td>110.9539</td><td>NaN</td> 
    </tr>
    <tr> 
        <td>2017-01-04</td><td>110.8297</td><td>-0.001119</td> 
    </tr>
    <tr> 
        <td>2017-01-05</td><td>111.3933</td><td>0.005085</td> 
    </tr>
    <tr> 
        <td>2017-01-06</td><td>112.6351</td><td>0.011148</td> 
    </tr>
    <tr> 
        <td>2017-01-09</td><td>113.6668</td><td>0.009160</td> 
    </tr>
</table>   
SPY:     
<table> 
    <tr> 
        <td>Dt</td><td>Adj Close</td><td>Return</td>
    </tr>
    <tr> 
        <td>2017-01-03</td><td>213.8428</td><td>NaN</td> 
    </tr>
    <tr> 
        <td>2017-01-04</td><td>215.1149</td><td>0.005949</td> 
    </tr>
    <tr> 
        <td>2017-01-05</td><td>214.9440</td><td>-0.000794</td> 
    </tr>
    <tr> 
        <td>2017-01-06</td><td>215.7131</td><td>0.003578</td> 
    </tr>
    <tr> 
        <td>2017-01-09</td><td>215.0010</td><td>-0.003301</td> 
    </tr>
</table>

## Select a specific year 

We only want the returns for the year 2018; discard any other return
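
One possible way to select a year, sketched on made-up returns: when the index holds ISO date strings (`YYYY-MM-DD`) in ascending order, string order matches date order, so a label slice picks out a date range. The data and variable names below are assumptions for illustration.

```python
import pandas as pd

# Made-up returns on a sorted date-string index (the layout assumed here)
returns = pd.DataFrame(
    {"Return": [0.010, -0.020, 0.005, 0.015, -0.010]},
    index=["2017-12-28", "2017-12-29", "2018-01-02", "2018-12-31", "2019-01-02"],
)

year = 2018
start, end = f"{year}-01-01", f"{year}-12-31"

# Label slicing with .loc includes both endpoints; the endpoints need not be
# present in the index as long as the index is sorted
returns_in_year = returns.loc[start:end]
print(returns_in_year)
```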


**Questions:**
- Complete function `select_yr()` to select the data in 2018 
- Complete function `summarize()` to get the number of returns, the return on the earliest date, the return on the latest date, and the average return
- Replace the 0 values in the following cell with your answers, and execute the print statements

In [None]:
ticker_returns = np.array([]) # Returns of the ticker for year 2018
index_returns = np.array([])  # Returns of the index for year 2018

num_returns = 0  # Number of returns in year 2018
first_return = 0 # The return on the earliest date in 2018
last_return  = 0 # The return on the latest date in 2018
avg_return  = 0  # The average return over the  year 2018

def select_yr(ticker_df, year):
    '''
    Select the returns that fall in the given year; discard all others
    
    Arguments:
    ticker_df: ticker data with a column Return
    year: the year you want to select
    '''
    ### BEGIN SOLUTION
    df_yr = ticker_df[ ticker_df[RET_ATTR].notnull() ][ ( str(year) + "-01-01"):(str(year) + "-12-31")]
    return df_yr
    ### END SOLUTION

def summarize(return_df, ATTR):
    '''
    Fetch the number of returns, the return on the earliest date, the return on the latest date, and the average return
    
    Argument:
    return_df: returns in 2018
    ATTR: name of column you want to summarize
    '''
    ### BEGIN SOLUTION
    num_returns = len(return_df)
    first_return = return_df[ATTR].iloc[0]  # The return on the earliest date 
    last_return  = return_df[ATTR].iloc[-1] # The return on the latest date
    avg_return   = return_df[ATTR].mean()  # The average return
    
    return num_returns, first_return, last_return, avg_return
    ### END SOLUTION

# Select 2018
aapl_ret_2018, spy_ret_2018 = select_yr(aapl_ret, 2018), select_yr(spy_ret, 2018)

# Assign to answer variables
ticker_returns = aapl_ret_2018[ RET_ATTR ].values
index_returns = spy_ret_2018[ RET_ATTR ].values
num_returns, first_return, last_return, avg_return = summarize( aapl_ret_2018, RET_ATTR)

print("There are {num:d} returns. First={first:.3%}, Last={last:.3%}, Avg={avg:.3%}".format(num=num_returns, first=first_return, last=last_return, avg=avg_return))

In [None]:
### BEGIN HIDDEN TESTS
assert np.allclose( [num_returns], [251], rtol=1e-05, atol=1e-03 )
assert np.allclose( [first_return * 100], [ 1.790 ], rtol=1e-05, atol=1e-03 )
assert np.allclose( [last_return * 100], [ 0.966 ], rtol=1e-05, atol=1e-03 )
assert np.allclose( [avg_return * 100], [ - 0.006 ], rtol=1e-05, atol=1e-03 )

# test the ticker_returns and index_returns
def select_yr_(ticker_df, year):
    df_yr = ticker_df[ ticker_df[RET_ATTR].notnull() ][ ( str(year) + "-01-01"):(str(year) + "-12-31")]
    return df_yr
# Select 2018
aapl_ret_2018_, spy_ret_2018_ = select_yr_(aapl_ret, 2018), select_yr_(spy_ret, 2018)
assert np.allclose( ticker_returns, aapl_ret_2018_[ RET_ATTR ].values )
assert np.allclose( index_returns, spy_ret_2018_[ RET_ATTR ].values )
### END HIDDEN TESTS

## Split into Train and Test datasets

You have now transformed the data and restricted it to one specific period. To prepare datasets for training and testing, you would ordinarily split the data into two sets by choosing the members of each set at random.

To facilitate grading for this assignment, we will instead *use a specific test set*
- the training set consists of the returns for the months of January through September (inclusive), i.e., 9 months
- the test set consists of the returns for the months of October through December (inclusive), i.e., 3 months

Thus, you will be using the early part of the data for training, and the latter part of the data for testing.
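
A chronological split like this can be sketched on made-up data with one return per month. The values and names are assumptions; the point is only that the two date ranges cover the year without overlap.

```python
import pandas as pd

# Made-up 2018 returns, one mid-month date per month, on a sorted date-string index
returns_2018 = pd.DataFrame(
    {"Return": [0.01, -0.02, 0.03, 0.00, 0.01, -0.01, 0.02, 0.01, -0.03, 0.02, -0.01, 0.01]},
    index=[f"2018-{month:02d}-15" for month in range(1, 13)],
)

train = returns_2018.loc["2018-01-01":"2018-09-30"]  # January through September
test = returns_2018.loc["2018-10-01":"2018-12-31"]   # October through December

# Every test date is later than every training date, and no row is lost or duplicated
assert train.index.max() < test.index.min()
assert len(train) + len(test) == len(returns_2018)
print(len(train), len(test))  # prints: 9 3
```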

**Question:**   
- Complete the function `split()` to split your dataset into a train set (2018/01/01 - 2018/09/30) and a test set (2018/10/01 - 2018/12/31)
- Replace the 0 values in the following cell with your answers, and execute the print statements

In [None]:
train_ticker_returns = np.array([]) # Returns of the ticker for training period
train_idx_returns    = np.array([]) # Returns of the index for training period

train_num_returns = 0  # Number of returns in train set
train_first_return = 0 # The return on the earliest date in train set
train_last_return  = 0 # The return on the latest date in train set
train_avg_return  = 0  # The average return over the train set

test_num_returns = 0  # Number of returns in test set
test_first_return = 0 # The return on the earliest date in test set
test_last_return  = 0 # The return on the latest date in test set
test_avg_return  = 0  # The average return over the test set

def split(return_df):
    '''
    Split data into train and test dataset
    
    Arguments:
    return_df: dataset you want to split
    '''
    ### BEGIN SOLUTION
    df_train = return_df["2018-01-01":"2018-09-30"]
    df_test  = return_df["2018-10-01":"2018-12-31"]
    
    return df_train, df_test
    ### END SOLUTION

aapl_train, aapl_test = split(aapl_ret_2018)
spy_train, spy_test = split(spy_ret_2018)

# Assign to answer variables
### BEGIN SOLUTION
train_num_returns, train_first_return, train_last_return, train_avg_return = summarize( aapl_train, RET_ATTR)

test_num_returns, test_first_return, test_last_return, test_avg_return = summarize( aapl_test, RET_ATTR)

# Assign to answer variables
train_ticker_returns, train_idx_returns = aapl_train.values, spy_train.values

### END SOLUTION

print("Train set: There are {num:d} returns. First={first:.2%}, Last={last:.2%}, Avg={avg:.2%}".format(num=train_num_returns, 
                                                                                                         first=train_first_return, 
                                                                                                         last=train_last_return, 
                                                                                                         avg=train_avg_return))

print("Test set: There are {num:d} returns. First={first:.2%}, Last={last:.2%}, Avg={avg:.2%}".format(num=test_num_returns, 
                                                                                                         first=test_first_return, 
                                                                                                         last=test_last_return, 
                                                                                                         avg=test_avg_return))

Your expected outputs should be:    
Train set: There are 188 returns. First=1.79%, Last=0.35%, Avg=0.17%     
Test set: There are 63 returns. First=0.67%, Last=0.97%, Avg=-0.53%

In [None]:
### BEGIN HIDDEN TESTS
def split_(return_df):
    df_train = return_df["2018-01-01":"2018-09-30"]
    df_test  = return_df["2018-10-01":"2018-12-31"]
    
    return df_train, df_test
aapl_train_, aapl_test_ = split_(aapl_ret_2018)
spy_train_, spy_test_ = split_(spy_ret_2018)
assert np.allclose( train_ticker_returns, aapl_train_.values )
assert np.allclose( train_idx_returns,    spy_train_.values )
### END HIDDEN TESTS

# Regression

Use Linear Regression to predict the return of a ticker from the return of the $\spy$ index.
For example, for ticker $\aapl$

$$
\ret_\aapl^\tp =  \beta_{\aapl, \spy} * \ret_\spy^\tp + \epsilon_{\aapl}^\tp
$$

That is
- each example is a pair consisting of one day's return 
    - of the ticker (e.g., $\aapl$).  This is the target (e.g., $\y$ in our lectures)
    - of the index $\spy$. This is a feature vector of length 1 (e.g., $\x$ in our lectures)

You will use Linear Regression to solve for parameter $\beta_{\aapl, \spy}$ 

- In the lectures we used the symbol $\Theta$ to denote the parameter vector; here we use $\mathbf{\beta}$
- In Finance the symbol $\beta$ is often used to denote the relationship between returns. 
- You should add an "intercept" so that the feature vector is length 2 rather than length 1
    - $\x^\tp = \begin{pmatrix}
        1 \\
        \ret_\spy^\tp
        \end{pmatrix}$
- Report the $\mathbf{\beta}$ parameter vector you obtain for $\aapl$
    - you will subsequently do this for another ticker in a different part of the assignment
        - so think ahead: you may want to parameterize your code
        - change the assignment to `ticker` when you report the next part
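
To see what the length-2 feature vector does, here is a minimal sketch that solves the least-squares problem directly with NumPy instead of `sklearn`. The returns are synthetic (the ticker is constructed as 0.001 + 1.2 times the index return, with no noise), so the fit should simply recover those two assumed numbers.

```python
import numpy as np

# Synthetic returns (assumed values, not market data): the ticker is built as
# 0.001 + 1.2 * index return, with no noise
ret_index = np.array([0.010, -0.005, 0.002, 0.007, -0.012])
ret_ticker = 0.001 + 1.2 * ret_index

# Feature vector of length 2 for each example: a constant 1, then the index return
X = np.column_stack([np.ones_like(ret_index), ret_index])

# Least-squares solution for the parameter vector (beta_0, beta_1)
beta, *_ = np.linalg.lstsq(X, ret_ticker, rcond=None)
beta_0, beta_1 = beta
print(beta_0, beta_1)
```

With `sklearn`, `LinearRegression` fits an intercept by default (`fit_intercept=True`), so the column of ones does not have to be added by hand there.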


**Questions:**
- Complete the function `createModel()` to build your linear regression model
- Complete the function `regress()` to do regression and return intercept and coefficients
- Replace the 0 values in the following cell with your answers, and execute the print statements


In [None]:
from sklearn import datasets, linear_model

beta_0 = 0    # The regression parameter for the constant
beta_SPY = 0  # The regression parameter for the return of SPY
ticker = "AAPL"

def createModel():
    '''
    Build your linear regression model using sklearn
    '''
    ### BEGIN SOLUTION
    model = linear_model.LinearRegression()
    return model
    ### END SOLUTION

def regress(model, dep_df, ind_df):
    '''
    Do regression using returns of your ticker and index
    
    Arguments:
    model: model you build with method "createModel()"
    dep_df: ticker returns
    ind_df: index returns
    '''
    ### BEGIN SOLUTION
    _= model.fit( ind_df[ [RET_ATTR] ].values, dep_df[ [RET_ATTR] ].values )
    
    return model.intercept_[0], model.coef_[0][0]
    ### END SOLUTION

# Assign to answer variables
### BEGIN SOLUTION
regr = createModel()

beta_0, beta_SPY = regress(regr, aapl_train, spy_train)

### END SOLUTION

print("{t:s}: beta_0={b0:3.3f}, beta_SPY={b1:3.3f}".format(t=ticker, b0=beta_0, b1=beta_SPY))

Your expected outputs should be:
<table> 
    <tr> 
        <td>  
            beta_0
        </td>
        <td>
         0.001
        </td>
    </tr>
    <tr> 
        <td>
            beta_SPY
        </td>
        <td>
         1.071
        </td>
    </tr>

</table>

In [None]:
### BEGIN HIDDEN TESTS
def createModel_():
    model = linear_model.LinearRegression()
    return model

def regress_(model, dep_df, ind_df):
    _= model.fit( ind_df[ [RET_ATTR] ].values, dep_df[ [RET_ATTR] ].values )
    
    return model.intercept_[0], model.coef_[0][0]

model_test = createModel_()
aapl_beta_0, aapl_beta_1 = regress_(model_test, aapl_train, spy_train)
assert np.allclose( [beta_0, beta_SPY], [aapl_beta_0, aapl_beta_1] )
### END HIDDEN TESTS

## Train the model using Cross validation

Use 5-fold cross validation

**Question:**
- Complete the function `compute_cross_val_avg()` to compute the average score of 5-fold cross validation
- Replace the 0 values in the following cell with your answers, and execute the print statements

**Hint:**  
- You can use the `cross_val_score` in `sklearn.model_selection`
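
As a rough sketch of how `cross_val_score` behaves, the example below runs it on synthetic returns (an assumed relationship of 1.1 times the index return plus noise). It is meant to show the shape of the inputs and outputs, not the answer to the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic returns (assumed): ticker = 1.1 * index + noise
rng = np.random.default_rng(0)
ret_index = rng.normal(0.0, 0.01, size=100)
ret_ticker = 1.1 * ret_index + rng.normal(0.0, 0.005, size=100)

# sklearn expects a 2-D feature array: one row per example, one column per feature
X = ret_index.reshape(-1, 1)

# cv=5 gives one score per fold; for a regressor the default score is R^2
scores = cross_val_score(LinearRegression(), X, ret_ticker, cv=5)
print(scores)
print(scores.mean())
```

Each fold is scored on data that the model did not see when it was fit for that fold, so the average is an out-of-sample style estimate.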

In [None]:
from sklearn.model_selection import cross_val_score

cross_val_avg = 0 # average score of cross validation
k = 5             # 5-fold cross validation

def compute_cross_val_avg(model, dep_df, ind_df, k):
    '''
    Compute the average score of k-fold cross validation
    
    Arguments:
    model: model you build with method "createModel()"
    dep_df: ticker returns
    ind_df: index returns
    k: k-fold cross validation
    '''
    ### BEGIN SOLUTION
    cross_val_score_ = cross_val_score(model, np.expand_dims(ind_df[RET_ATTR].values, axis=1), dep_df[RET_ATTR].values, cv=k)
    return np.mean(cross_val_score_)
    ### END SOLUTION

    
cross_val_avg = compute_cross_val_avg(regr, aapl_train, spy_train, 5)
print("{t:s}: Avg cross val score = {sc:3.2f}".format(t=ticker, sc=cross_val_avg) )

In [None]:
### BEGIN HIDDEN TESTS
def compute_cross_val_avg_(model, dep_df, ind_df, k):
    cross_val_score_ = cross_val_score(model, np.expand_dims(ind_df[RET_ATTR].values, axis=1), dep_df[RET_ATTR].values, cv=k)
    return np.mean(cross_val_score_)
cross_val_avg_ = compute_cross_val_avg_(regr, aapl_train, spy_train, 5)
assert np.allclose(cross_val_avg, cross_val_avg_)
### END HIDDEN TESTS

## Evaluate Loss (in sample RMSE) and Performance (Out of sample RMSE)

**Question:**
- Complete the function `computeRMSE()` to calculate the Root Mean Squared Error (RMSE)
- Replace the 0 values in the following cell with your answers, and execute the print statements
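
For reference, the RMSE is the square root of the average squared difference between target and prediction. Here is a minimal sketch on made-up numbers, written with plain NumPy so each step is visible.

```python
import numpy as np

# Made-up target and predicted returns
target = np.array([0.010, -0.020, 0.005, 0.000])
predicted = np.array([0.012, -0.015, 0.005, -0.003])

# Square the errors, average them, then take the square root
errors = target - predicted
rmse = np.sqrt(np.mean(errors ** 2))
print(rmse)
```

The result is in the same units as the returns themselves, which makes it easier to interpret than the mean squared error.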

In [None]:
from sklearn.metrics import mean_squared_error

rmse_in_sample = 0 # in sample loss
rmse_out_sample = 0 # out of sample performance

# Predicted  in-sample returns of AAPL using SPY index
aapl_predicted_in_sample = regr.predict(np.expand_dims(spy_train[RET_ATTR].values, axis=1))
# Predicted out-of-sample returns of AAPL using SPY index
aapl_predicted_out_sample = regr.predict(np.expand_dims(spy_test[RET_ATTR].values, axis=1))

def computeRMSE( target, predicted ):
    '''
    Calculate the RMSE
    
    Arguments:
    target: real ticker returns
    predicted: predicted ticker returns
    '''
    ### BEGIN SOLUTION
    rmse = np.sqrt( mean_squared_error(target,  predicted))
    return rmse
    ### END SOLUTION
    
rmse_in_sample = computeRMSE(aapl_train[RET_ATTR].values, aapl_predicted_in_sample)
rmse_out_sample = computeRMSE(aapl_test[RET_ATTR].values, aapl_predicted_out_sample)

print("In Sample Root Mean squared error: {:.3f}".format( rmse_in_sample ) )
print("Out of Sample Root Mean squared error: {:.3f}".format( rmse_out_sample ) )

In [None]:
### BEGIN HIDDEN TESTS
def computeRMSE_( target, predicted ):
    rmse = np.sqrt( mean_squared_error(target,  predicted))
    return rmse
rmse_in_sample_test = computeRMSE_(aapl_train[RET_ATTR].values, regr.predict(np.expand_dims(spy_train[RET_ATTR].values, axis=1)))
rmse_out_sample_test = computeRMSE_(aapl_test[RET_ATTR].values, regr.predict(np.expand_dims(spy_test[RET_ATTR].values, axis=1)))
assert np.allclose(rmse_in_sample, rmse_in_sample_test)
assert np.allclose(rmse_out_sample, rmse_out_sample_test)
### END HIDDEN TESTS

## Hedged returns

Why is being able to predict the return of a ticker, given the return of another instrument (e.g., the market proxy), useful?
- It **does not** allow us to predict the future
    - To predict $\ret_\aapl^\tp$, we require the same-day return of the proxy, $\ret_\spy^\tp$
- It **does** allow us to predict how much $\aapl$ will outperform the market proxy

Consider an investment that goes long (i.e., holds a positive quantity of) $\aapl$
- Since the relationship between returns is positive
    - You will likely make money if the market goes up
    - You will likely lose money if the market goes down
    
Consider instead a *hedged* investment
- Go long 1 USD of $\aapl$
- Go short (hold a negative quantity) $\beta_{\aapl,\spy}$ USD of the market proxy $\spy$

Your *hedged return* on this long/short portfolio will be
$$
{\ret'}_{\aapl}^\tp = \ret_\aapl^\tp - \beta_{\aapl, \spy} * \ret_\spy^\tp
$$

As long as
$$
\ret_\aapl^\tp \gt \beta_{\aapl, \spy} * \ret_\spy^\tp
$$
you will make a profit, regardless of whether the market proxy rises or falls!

That is: you make money as long as $\aapl$ *outperforms* the market proxy.
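
A quick numeric check of the inequality, using an assumed $\beta$ of 1.1 and made-up one-day returns. The helper name `hedged_return` is hypothetical.

```python
beta = 1.1  # assumed sensitivity of the ticker to the market proxy

def hedged_return(ret_ticker, ret_index, beta):
    """Return of long 1 USD of the ticker, short beta USD of the index."""
    return ret_ticker - beta * ret_index

# Market up 1%: the ticker gains 2%, more than the 1.1% that beta alone implies
up = hedged_return(0.02, 0.01, beta)
# Market down 1%: the ticker loses only 0.5%, less than the 1.1% that beta alone implies
down = hedged_return(-0.005, -0.01, beta)

print(f"{up:.3%}, {down:.3%}")
```

In both scenarios the hedged return is positive (about 0.9% and 0.6%), even though the market moves in opposite directions.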


This hedged portfolio is interesting
- Because your returns are independent of the market
- The volatility of your returns is likely much lower than the volatility of the long-only investment
- There is a belief that it is difficult to predict the market $\ret_\spy$
- But you might be able to discover a ticker (e.g., $\aapl$) that will outperform the market

This is a real world application of the Regression task in Finance.
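
The claims about volatility and independence can be explored on synthetic data. The sketch below assumes returns generated as 1.1 times the index return plus independent noise, and estimates $\beta$ with `np.polyfit` as a stand-in for the `sklearn` regression used in this notebook.

```python
import numpy as np

# Synthetic returns (assumed): ticker = 1.1 * index + independent noise
rng = np.random.default_rng(0)
ret_index = rng.normal(0.0, 0.01, size=1000)
ret_ticker = 1.1 * ret_index + rng.normal(0.0, 0.01, size=1000)

# Slope and intercept of the least-squares line through (index, ticker) pairs
beta, intercept = np.polyfit(ret_index, ret_ticker, 1)

hedged = ret_ticker - beta * ret_index

print("long-only volatility:", ret_ticker.std())
print("hedged volatility:   ", hedged.std())
print("correlation of hedged return with index:", np.corrcoef(hedged, ret_index)[0, 1])
```

On the data used to estimate $\beta$, the hedged return is uncorrelated with the index by construction. Out of sample this holds only approximately, and only to the extent that the relationship stays stable.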

## Compute the hedged return on the test data examples
$$
{\ret'}_{\aapl}^\tp = \ret_\aapl^\tp - \beta_{\aapl, \spy} * \ret_\spy^\tp
$$
for all dates $t$ in the **test set**.  

**Question:**
- Complete the function `compute_hedged_series` to get the hedged series
- Replace the 0 values in the following cell with your answers

In [None]:
hedged_num_returns = 0  # Number of returns in hedged series
hedged_first_return = 0 # The return on the earliest date in hedged series
hedged_last_return  = 0 # The return on the latest date in hedged series
hedged_avg_return  = 0  # The average return over the hedged series

def compute_hedged_series(model, dep_df, ind_df):
    '''
    Compute the hedged series
    
    Arguments:
    model: model you build with method "createModel()"
    dep_df: ticker returns in test dataset
    ind_df: index returns in test dataset
    '''
    ### BEGIN SOLUTION
    ind_val = ind_df[RET_ATTR].values
    hedged_series = dep_df[RET_ATTR].values - model.coef_[0][0] * ind_val
    hedged_series = pd.DataFrame({'Return':hedged_series}, index=dep_df.index)
    return hedged_series
    ### END SOLUTION

hedged_series = compute_hedged_series(regr, aapl_test, spy_test)
hedged_num_returns, hedged_first_return, hedged_last_return, hedged_avg_return = summarize(hedged_series, RET_ATTR)
ticker="AAPL"
print("{t:s} hedged returns: There are {num:d} returns. First={first:.2%}, Last={last:.2%}, Avg={avg:.2%}".format(t=ticker,
                                                                                                                    num=hedged_num_returns,
                                                                                                                    first=hedged_first_return, 
                                                                                                                    last=hedged_last_return, 
                                                                                                                    avg=hedged_avg_return))

In [None]:
### BEGIN HIDDEN TESTS
assert np.allclose( [hedged_num_returns], [63], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_first_return * 100], [ 0.30 ], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_last_return * 100], [ 0.03 ], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_avg_return * 100], [ -0.29 ], rtol=1e-04, atol=1e-02 )
### END HIDDEN TESTS

# $\fb$ regression

Repeat the regression you carried out for $\aapl$ but this time instead for the ticker $\fb$ (Facebook)

**Motivation**

The idea is to encourage you to build re-usable pieces of code.

So if you created some functions in solving Part 1, you may reuse these functions to easily solve Part 2,
particularly if you treated the ticker (e.g., $\aapl$ or $\fb$) as a parameter to your functions.

If you simply copy and paste the code from Part 1 you will only get partial credit.
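
To illustrate what "treating the ticker as a parameter" might look like, here is a small sketch. The function `fit_beta` is hypothetical, it uses `np.polyfit` rather than `sklearn`, and the return series are synthetic with made-up coefficients (they are not the answers to this assignment).

```python
import numpy as np

def fit_beta(ret_ticker, ret_index):
    """Return (intercept, slope) of the least-squares line ret_ticker ~ ret_index."""
    slope, intercept = np.polyfit(ret_index, ret_ticker, 1)
    return intercept, slope

# Synthetic stand-ins for real return series (assumed values); the ticker names
# are only dictionary keys here
ret_index = np.array([0.010, -0.005, 0.002, 0.007, -0.012])
returns_by_ticker = {
    "AAPL": 0.001 + 0.8 * ret_index,
    "FB": -0.002 + 1.5 * ret_index,
}

# The same function serves every ticker; only the argument changes
betas = {ticker: fit_beta(ret, ret_index) for ticker, ret in returns_by_ticker.items()}
for ticker, (b0, b1) in betas.items():
    print(f"{ticker}: beta_0={b0:.3f}, beta_SPY={b1:.2f}")
```

Adding a third ticker then means adding one entry to the dictionary rather than copying a block of code.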


**Question:**
- Compute the intercept and coefficients of your model using new ticker "FB"
- Replace the 0 values in the following cell with your answers

In [None]:
beta_0 = 0    # The regression parameter for the constant
beta_SPY = 0  # The regression parameter for the return of SPY
ticker = "FB"

### BEGIN SOLUTION
fb = read_ticker(ticker)
fb_ret = add_ret(fb)
fb_ret_2018 = select_yr(fb_ret, 2018)
fb_train, fb_test = split(fb_ret_2018)
new_regr = createModel()
beta_0, beta_SPY = regress(new_regr, fb_train, spy_train)
### END SOLUTION
print("{t:s}: beta_0={b0:3.2f}, beta_SPY={b1:3.2f}".format(t=ticker, b0=beta_0, b1=beta_SPY))


Your expected outputs should be:
<table> 
    <tr> 
        <td>  
            beta_0
        </td>
        <td>
         -0.00
        </td>
    </tr>
    <tr> 
        <td>
            beta_SPY
        </td>
        <td>
         1.29
        </td>
    </tr>

</table>

In [None]:
### BEGIN HIDDEN TESTS
fb_test_ = read_ticker(ticker)
add_ret(fb_test_)
fb_2018_test = select_yr_(fb_test_, 2018)
fb_train_test, fb_test_test = split_(fb_2018_test)
new_regr_ = createModel_()
beta_0_test, beta_SPY_test = regress_(new_regr_, fb_train_test, spy_train)
assert np.allclose([beta_0, beta_SPY], [beta_0_test, beta_SPY_test], rtol=1e-05, atol=1e-02)
### END HIDDEN TESTS

## Train the model using Cross validation

Use 5-fold cross validation

**Question:**
- Replace the 0 values in the following cell with your answers

In [None]:
cross_val_avg = 0

### BEGIN SOLUTION
cross_val_avg = compute_cross_val_avg(new_regr, fb_train, spy_train, 5)
### END SOLUTION
print("{t:s}: Avg cross val score = {sc:3.2f}".format(t=ticker, sc=cross_val_avg) )

In [None]:
### BEGIN HIDDEN TESTS
cross_val_avg_test = compute_cross_val_avg_(new_regr, fb_train, spy_train, 5)
assert np.allclose([cross_val_avg], [cross_val_avg_test])
### END HIDDEN TESTS

## Evaluate Loss (in sample RMSE) and Performance (Out of sample RMSE)

**Question:**  
- Compute the in sample RMSE and out of sample RMSE

In [None]:
rmse_in_sample = 0 # in sample loss
rmse_out_sample = 0 # out of sample performance

# Predicted  in-sample returns of FB using SPY index
fb_predicted_in_sample = new_regr.predict(np.expand_dims(spy_train[RET_ATTR].values, axis=1))
# Predicted out-of-sample returns of FB using SPY index
fb_predicted_out_sample = new_regr.predict(np.expand_dims(spy_test[RET_ATTR].values, axis=1))


### BEGIN SOLUTION
rmse_in_sample = computeRMSE(fb_train[RET_ATTR].values, fb_predicted_in_sample)
rmse_out_sample = computeRMSE(fb_test[RET_ATTR].values, fb_predicted_out_sample)
### END SOLUTION

print("In Sample Root Mean squared error: {:.3f}".format( rmse_in_sample ) )
print("Out of Sample Root Mean squared error: {:.3f}".format( rmse_out_sample ) )

In [None]:
### BEGIN HIDDEN TESTS
rmse_in_sample_test = computeRMSE_(fb_train[RET_ATTR].values, fb_predicted_in_sample)
rmse_out_sample_test = computeRMSE_(fb_test[RET_ATTR].values, fb_predicted_out_sample)

assert np.allclose(rmse_in_sample, rmse_in_sample_test)
assert np.allclose(rmse_out_sample, rmse_out_sample_test)
### END HIDDEN TESTS

## Compute the hedged return on the test data examples
**Question:**
- Replace the 0 values in the following cell with your answers, and execute the print statements

In [None]:
hedged_num_returns = 0  # Number of returns in hedged series
hedged_first_return = 0 # The return on the earliest date in hedged series
hedged_last_return  = 0 # The return on the latest date in hedged series
hedged_avg_return  = 0  # The average return over the hedged series

### BEGIN SOLUTION
hedged_series = compute_hedged_series(new_regr, fb_test, spy_test)
hedged_num_returns, hedged_first_return, hedged_last_return, hedged_avg_return = summarize(hedged_series, RET_ATTR)
### END SOLUTION
ticker="FB"
print("{t:s} hedged returns: There are {num:d} returns. First={first:.2%}, Last={last:.2%}, Avg={avg:.2%}".format(t=ticker,
                                                                                                                    num=hedged_num_returns,
                                                                                                                    first=hedged_first_return, 
                                                                                                                    last=hedged_last_return, 
                                                                                                                    avg=hedged_avg_return))


In [None]:
### BEGIN HIDDEN TESTS
assert np.allclose( [hedged_num_returns], [63], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_first_return * 100], [ -1.68 ], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_last_return * 100], [ -2.72 ], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_avg_return * 100], [ -0.04 ], rtol=1e-04, atol=1e-02 )
### END HIDDEN TESTS