# Assignment: Using Machine Learning for Hedging

Welcome to the first assignment.  

We will show how Machine Learning can be used in Finance to build multi-asset portfolios that have better risk/return characteristics than a portfolio consisting of a single asset.

# Objectives
We will be using Linear Regression to establish the relationship between the returns of individual equities and "the market".

The purpose of the assignment is two-fold:
- to get you up to speed with Machine Learning in general, and `sklearn` in particular
- to get you up to speed with the other programming tools (e.g., Pandas) that will help you in data preparation, etc.

# How to report your answers
I will mix explanation of the topic with tasks that you must complete. Look for
the string "**Question**" to find a task that you must perform.
Most of the tasks will require you to assign values to variables and execute a `print` statement.

**Motivation**

If you **do not change** the print statement then the GA (or a machine) can automatically find your answer to each part by searching for the string.


In [None]:
# Standard imports
import pandas as pd
import numpy as np

import os

# The data

The data are the daily prices of a number of individual equities and equity indices.
The prices are arranged in a series in ascending date order (a timeseries).
- The directory `/resource/asnlib/publicdata/data` contains many `.csv` files, each holding the prices of an equity or an index

## Reading the data

You should load the price data into a suitable data structure. A Pandas DataFrame is very convenient for this,
so that is what I recommend you use.

## Preliminary data preparation

In the rest of the assignment we will *not* be working with prices but with *returns* (percent change in prices).
For example, for ticker $\aapl$ (Apple)

$$
\begin{array}{l}
\ret_\aapl^\tp = \frac{\price _\aapl^\tp}{\price _\aapl^{(t-1)}} -1 \\
\text{where} \\
\price_\aapl^\tp \text{ denotes the price of ticker } \aapl \text{ on date } t \\
\ret_\aapl^\tp \text{ denotes the return of ticker } \aapl \text{ on date } t
\end{array}
$$

- You will want to convert the price data into return data
- We only want the returns for the year 2018; discard any other return
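
The two steps above can be sketched on a toy price series. This is only an illustration: the dates and prices are made up, and the column names `Close` and `Return` are assumptions chosen to mirror the attributes used later in the notebook.

```python
import pandas as pd

# Hypothetical closing prices; dates and values are made up for illustration
prices = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 103.02]},
    index=["2017-12-28", "2017-12-29", "2018-01-02", "2018-01-03"],
)

# Return on date t is price(t) / price(t-1) - 1, so the first row has no return (NaN)
prices["Return"] = prices["Close"].pct_change()

# Keep only the 2018 rows. The index holds ISO-formatted date strings in
# ascending order, so a label slice selects the date range.
returns_2018 = prices.loc["2018-01-01":"2018-12-31", "Return"]
print(returns_2018)
```

Note that the first 2018 return is computed from the last 2017 price, which is why the returns are computed before the other years are discarded.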


**Questions:**
- Complete function `read_ticker()` to load AAPL and SPY data
- Complete function `add_ret()` to calculate the returns of tickers.

**Hints:**
- look up the Pandas `read_csv()` method
- look up the Pandas `pct_change()` method    

In [None]:
DATA_DIR = './data'
if not os.path.isdir(DATA_DIR):
    DATA_DIR  = "../resource/asnlib/publicdata/data"
DATE_ATTR="Dt"
PRICE_ATTR = "Close"
RET_ATTR = "Return"

aapl = pd.DataFrame()
spy = pd.DataFrame()

def read_ticker(ticker):
    '''
    Load the ticker data
    
    Arguments:
    ticker: name of your ticker, string
    '''
    
    ### BEGIN SOLUTION
    df = pd.read_csv( DATA_DIR + "/" + ticker + ".csv", index_col=DATE_ATTR)
    return df
    ### END SOLUTION


def add_ret(ticker_df):
    '''
    Add a return column for ticker price
    
    Arguments:
    ticker_df: Price data of your ticker, pandas.DataFrame
    '''
    ### BEGIN SOLUTION
    ticker_df[RET_ATTR] = ticker_df[PRICE_ATTR].pct_change()
    ### END SOLUTION

    
aapl, spy = read_ticker('AAPL'), read_ticker("SPY")
add_ret(aapl), add_ret(spy)

# Print your results
print("AAPL: ", aapl[[PRICE_ATTR, RET_ATTR]].head())
print("SPY: ", spy[[PRICE_ATTR, RET_ATTR]].head())

Expected outputs should be similar to this:   
AAPL:    
<table> 
    <tr> 
        <td>Dt</td><td>Close</td><td>Return</td>
    </tr>
    <tr> 
        <td>2017-01-03</td><td>116.15</td><td>NaN</td> 
    </tr>
    <tr> 
        <td>2017-01-04</td><td>116.02</td><td>-0.001119</td> 
    </tr>
    <tr> 
        <td>2017-01-05</td><td>116.61</td><td>0.005085</td> 
    </tr>
    <tr> 
        <td>2017-01-06</td><td>117.91</td><td>0.011148</td> 
    </tr>
    <tr> 
        <td>2017-01-09</td><td>118.99</td><td>0.009160</td> 
    </tr>
</table>   
SPY:     
<table> 
    <tr> 
        <td>Dt</td><td>Close</td><td>Return</td>
    </tr>
    <tr> 
        <td>2017-01-03</td><td>225.24</td><td>NaN</td> 
    </tr>
    <tr> 
        <td>2017-01-04</td><td>226.58</td><td>0.005949</td> 
    </tr>
    <tr> 
        <td>2017-01-05</td><td>226.40</td><td>-0.000794</td> 
    </tr>
    <tr> 
        <td>2017-01-06</td><td>227.21</td><td>0.003578</td> 
    </tr>
    <tr> 
        <td>2017-01-09</td><td>226.46</td><td>-0.003301</td> 
    </tr>
</table>

**Questions:**
- Complete function `select_yr()` to select the data in 2018 
- Complete function `summarize()` to get the number of returns, the return on the earliest date, the return on the latest date and the average return
- Replace the 0 values in the following cell with your answers, and execute the print statements

In [None]:
ticker_returns = np.array([]) # Returns of the ticker for year 2018
index_returns = np.array([])  # Returns of the index for year 2018

num_returns = 0  # Number of returns in year 2018
first_return = 0 # The return on the earliest date in 2018
last_return  = 0 # The return on the latest date in 2018
avg_return  = 0  # The average return over the  year 2018

def select_yr(ticker_df, year):
    '''
    Select the returns that fall in the given year; discard all other returns
    
    Arguments:
    ticker_df: ticker data with a column Return
    year: the year you want to select
    '''
    ### BEGIN SOLUTION
    df_yr = ticker_df[ ticker_df[RET_ATTR].notnull() ][ ( str(year) + "-01-01"):(str(year) + "-12-31")]
    return df_yr
    ### END SOLUTION

def summarize(return_df, ATTR):
    '''
    Fetch the number of returns, the return on the earliest date, the return on the latest date and the average return
    
    Argument:
    return_df: returns in 2018
    ATTR: name of column you want to summarize
    '''
    ### BEGIN SOLUTION
    num_returns = len(return_df)
    first_return = return_df[ATTR].iloc[0]  # The return on the earliest date 
    last_return  = return_df[ATTR].iloc[-1] # The return on the latest date
    avg_return   = return_df[ATTR].mean()  # The average return
    
    return num_returns, first_return, last_return, avg_return
    ### END SOLUTION

# Select 2018
aapl_2018, spy_2018 = select_yr(aapl, 2018), select_yr(spy, 2018)

# Assign to answer variables
ticker_returns = aapl_2018[ RET_ATTR ].values
index_returns = spy_2018[ RET_ATTR ].values
num_returns, first_return, last_return, avg_return = summarize( aapl_2018, RET_ATTR)

print("There are {num:d} returns. First={first:.3%}, Last={last:.3%}, Avg={avg:.3%}".format(num=num_returns, first=first_return, last=last_return, avg=avg_return))

In [None]:
### BEGIN HIDDEN TESTS
assert np.allclose( [num_returns], [251], rtol=1e-05, atol=1e-03 )
assert np.allclose( [first_return * 100], [ 1.790 ], rtol=1e-05, atol=1e-03 )
assert np.allclose( [last_return * 100], [ 0.967 ], rtol=1e-05, atol=1e-03 )
assert np.allclose( [avg_return * 100], [ - 0.012 ], rtol=1e-05, atol=1e-03 )

# test the ticker_returns and index_returns
def select_yr_(ticker_df, year):
    df_yr = ticker_df[ ticker_df[RET_ATTR].notnull() ][ ( str(year) + "-01-01"):(str(year) + "-12-31")]
    return df_yr
# Select 2018
aapl_2018_, spy_2018_ = select_yr_(aapl, 2018), select_yr_(spy, 2018)
assert np.allclose( ticker_returns, aapl_2018_[ RET_ATTR ].values )
assert np.allclose( index_returns, spy_2018_[ RET_ATTR ].values )
### END HIDDEN TESTS

# Split into Train and Test datasets

In general, you will split the data into two sets by choosing the members of each set at random.

To facilitate grading for this assignment, we will *use a specific test set*
- the training set consists of the returns for the months of January through September (inclusive), i.e., 9 months
- the test set consists of the returns for the months of October through December (inclusive), i.e., 3 months

Thus, you will be using the early part of the data for training, and the latter part of the data for testing.
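
A date-based split of this kind can be sketched as follows. The four dates and return values are made up, and the variable names are hypothetical; the sketch assumes the index holds ISO-formatted date strings in ascending order.

```python
import pandas as pd

# Hypothetical returns for four dates in 2018; the values are made up
returns = pd.DataFrame(
    {"Return": [0.010, -0.020, 0.005, 0.015]},
    index=["2018-03-01", "2018-09-28", "2018-10-01", "2018-12-31"],
)

# Label slices include both endpoints, so the two ranges must not share a date
train = returns.loc["2018-01-01":"2018-09-30"]
test = returns.loc["2018-10-01":"2018-12-31"]

print(len(train), len(test))
```

Because every training date precedes every test date, the model is never evaluated on data that is older than what it was trained on.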

**Question:**   
- Complete the function `split()` to split your dataset into a train set (2018/01/01 - 2018/09/30) and a test set (2018/10/01 - 2018/12/31)
- Replace the 0 values in the following cell with your answers, and execute the print statements

In [None]:
train_ticker_returns = np.array([]) # Returns of the ticker for training period
train_idx_returns    = np.array([]) # Returns of the index for training period

train_num_returns = 0  # Number of returns in train set
train_first_return = 0 # The return on the earliest date in train set
train_last_return  = 0 # The return on the latest date in train set
train_avg_return  = 0  # The average return over the train set

test_num_returns = 0  # Number of returns in test set
test_first_return = 0 # The return on the earliest date in test set
test_last_return  = 0 # The return on the latest date in test set
test_avg_return  = 0  # The average return over the test set

def split(return_df):
    '''
    Split data into train and test dataset
    
    Arguments:
    return_df: dataset you want to split
    '''
    ### BEGIN SOLUTION
    df_train = return_df["2018-01-01":"2018-09-30"]
    df_test  = return_df["2018-10-01":"2018-12-31"]
    
    return df_train, df_test
    ### END SOLUTION

aapl_train, aapl_test = split(aapl_2018)
spy_train, spy_test = split(spy_2018)

# Assign to answer variables
### BEGIN SOLUTION
train_num_returns, train_first_return, train_last_return, train_avg_return = summarize( aapl_train, RET_ATTR)

test_num_returns, test_first_return, test_last_return, test_avg_return = summarize( aapl_test, RET_ATTR)

# Assign to answer variables
train_ticker_returns, train_idx_returns = aapl_train.values, spy_train.values

### END SOLUTION

print("Train set: There are {num:d} returns. First={first:.2%}, Last={last:.2%}, Avg={avg:.2%}".format(num=train_num_returns, 
                                                                                                         first=train_first_return, 
                                                                                                         last=train_last_return, 
                                                                                                         avg=train_avg_return))

print("Test set: There are {num:d} returns. First={first:.2%}, Last={last:.2%}, Avg={avg:.2%}".format(num=test_num_returns, 
                                                                                                         first=test_first_return, 
                                                                                                         last=test_last_return, 
                                                                                                         avg=test_avg_return))

Your expected outputs should be:    
Train set: There are 188 returns. First=1.79%, Last=0.35%, Avg=0.16%     
Test set: There are 63 returns. First=0.67%, Last=0.97%, Avg=-0.54%

In [None]:
### BEGIN HIDDEN TESTS
def split_(return_df):
    df_train = return_df["2018-01-01":"2018-09-30"]
    df_test  = return_df["2018-10-01":"2018-12-31"]
    
    return df_train, df_test
aapl_train_, aapl_test_ = split_(aapl_2018)
spy_train_, spy_test_ = split_(spy_2018)
assert np.allclose( train_ticker_returns, aapl_train_.values )
assert np.allclose( train_idx_returns,    spy_train_.values )
### END HIDDEN TESTS

# $\aapl$ regression

Use Linear Regression to predict the return of a ticker from the return of the $\spy$ index.
For example, for ticker $\aapl$

$$
\ret_\aapl^\tp =  \beta_{\aapl, \spy} * \ret_\spy^\tp + \epsilon_{\aapl}^\tp
$$

That is
- each example is a pair consisting of one day's return 
    - of the ticker (e.g., $\aapl$).  This is the target (e.g., $\y$ in our lectures)
    - of the index $\spy$. This is a feature vector of length 1 (e.g., $\x$ in our lectures)

You will use Linear Regression to solve for parameter $\beta_{\aapl, \spy}$ 

- In the lectures we used the symbol $\Theta$ to denote the parameter vector; here we use $\mathbf{\beta}$
- In Finance the symbol $\beta$ is often used to denote the relationship between returns. 
- You should add an "intercept" so that the feature vector is length 2 rather than length 1
    - $\x^\tp = \begin{pmatrix}
        1 \\
        \ret_\spy^\tp
        \end{pmatrix}$
- Report the $\mathbf{\beta}$ parameter vector you obtain for $\aapl$
    - you will subsequently do this for another ticker in a different part of the assignment
        - so think ahead: you may want to parameterize your code
        - change the assignment to `ticker` when you report the next part
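
To make the role of the intercept concrete, here is a minimal sketch on synthetic data. The returns are made up and generated without noise from known parameters (0.001 and 1.5), so the fit should recover them; the variable names are hypothetical. It assumes `sklearn`'s `LinearRegression`, whose `fit_intercept=True` default is equivalent to adding the constant feature 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic index returns, and ticker returns built from known parameters
# (intercept 0.001, slope 1.5); these numbers are made up
index_ret = np.array([0.010, -0.020, 0.005, 0.015, -0.010])
ticker_ret = 0.001 + 1.5 * index_ret

# sklearn expects a 2-D feature matrix: one row per example, one column per feature
X = index_ret.reshape(-1, 1)

# Option 1: let the model add the intercept (the default)
model = LinearRegression(fit_intercept=True).fit(X, ticker_ret)
beta_0, beta_index = model.intercept_, model.coef_[0]

# Option 2: add the constant feature 1 explicitly and disable the built-in intercept
X_with_const = np.column_stack([np.ones_like(index_ret), index_ret])
explicit = LinearRegression(fit_intercept=False).fit(X_with_const, ticker_ret)

print(beta_0, beta_index)
print(explicit.coef_)
```

Both options estimate the same two parameters; the first is usually more convenient because the intercept and the slope are reported separately.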


**Questions:**
- Complete the function `createModel()` to build your linear regression model
- Complete the function `regress()` to do regression and return intercept and coefficients
- Complete the function `computeRMSE()` to calculate the Root Mean Squared Error (RMSE)
- Replace the 0 values in the following cell with your answers, and execute the print statements


In [None]:
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error

beta_0 = 0    # The regression parameter for the constant
beta_SPY = 0  # The regression parameter for the return of SPY
ticker = "AAPL"

def createModel():
    '''
    Build your linear regression model using sklearn
    '''
    ### BEGIN SOLUTION
    model = linear_model.LinearRegression()
    return model
    ### END SOLUTION

def regress(model, dep_df, ind_df):
    '''
    Do regression using returns of your ticker and index
    
    Arguments:
    model: model you build with method "createModel()"
    dep_df: ticker returns
    ind_df: index returns
    '''
    ### BEGIN SOLUTION
    _= model.fit( ind_df[ [RET_ATTR] ].values, dep_df[ [RET_ATTR] ].values )
    
    return model.intercept_[0], model.coef_[0][0]
    ### END SOLUTION

def computeRMSE( target, predicted ):
    '''
    Calculate the RMSE
    
    Arguments:
    target: real ticker returns
    predicted: predicted ticker returns
    '''
    ### BEGIN SOLUTION
    rmse = np.sqrt( mean_squared_error(target,  predicted))
    return rmse
    ### END SOLUTION

# Assign to answer variables
### BEGIN SOLUTION
regr = createModel()

beta_0, beta_SPY = regress(regr, aapl_train, spy_train)

### END SOLUTION

print("{t:s}: beta_0={b0:3.3f}, beta_SPY={b1:3.3f}".format(t=ticker, b0=beta_0, b1=beta_SPY))

Your expected outputs should be:
<table> 
    <tr> 
        <td>  
            beta_0
        </td>
        <td>
         0.001
        </td>
    </tr>
    <tr> 
        <td>
            beta_SPY
        </td>
        <td>
         1.071
        </td>
    </tr>

</table>

In [None]:
### BEGIN HIDDEN TESTS
def createModel_():
    model = linear_model.LinearRegression()
    return model

def regress_(model, dep_df, ind_df):
    _= model.fit( ind_df[ [RET_ATTR] ].values, dep_df[ [RET_ATTR] ].values )
    
    return model.intercept_[0], model.coef_[0][0]

def computeRMSE_( target, predicted ):
    rmse = np.sqrt( mean_squared_error(target,  predicted))
    return rmse
model_test = createModel_()
aapl_beta_0, aapl_beta_1 = regress_(model_test, aapl_train, spy_train)
assert np.allclose( [beta_0, beta_SPY], [aapl_beta_0, aapl_beta_1] )
### END HIDDEN TESTS

**Question:**
- Complete the function `compute_cross_val_avg()` to compute the average score of 5-fold cross validation
- Replace the 0 values in the following cell with your answers, and execute the print statements

**Hint:**  
- You can use the `cross_val_score` in `sklearn.model_selection`
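
As a rough illustration of the hint, the sketch below runs 5-fold cross validation on synthetic returns. The data are randomly generated and the variable names are hypothetical. For a regressor, `cross_val_score` uses the estimator's own `score` method by default, which for `LinearRegression` is the $R^2$ of each held-out fold.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: ticker returns are a noisy multiple of the index returns
rng = np.random.default_rng(0)
index_ret = rng.normal(0.0, 0.01, size=100)
ticker_ret = 1.2 * index_ret + rng.normal(0.0, 0.005, size=100)

# One score per fold; each fold is held out once while the model is fit on the rest
scores = cross_val_score(
    LinearRegression(), index_ret.reshape(-1, 1), ticker_ret, cv=5
)
cross_val_mean = scores.mean()
print(scores.shape, cross_val_mean)
```

`cross_val_score` fits clones of the estimator, so the model object passed in is left as it was.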

In [None]:
from sklearn.model_selection import cross_val_score

cross_val_avg = 0 # average score of cross validation
k = 5             # 5-fold cross validation

def compute_cross_val_avg(model, dep_df, ind_df, k):
    '''
    Compute the average score of k-fold cross validation
    
    Arguments:
    model: model you build with method "createModel()"
    dep_df: ticker returns
    ind_df: index returns
    k: k-fold cross validation
    '''
    ### BEGIN SOLUTION
    cross_val_score_ = cross_val_score(model, np.expand_dims(ind_df[RET_ATTR].values, axis=1), dep_df[RET_ATTR].values, cv=k)
    return np.mean(cross_val_score_)
    ### END SOLUTION

    
cross_val_avg = compute_cross_val_avg(regr, aapl_train, spy_train, 5)
print("{t:s}: Avg cross val score = {sc:3.2f}".format(t=ticker, sc=cross_val_avg) )

In [None]:
### BEGIN HIDDEN TESTS
def compute_cross_val_avg_(model, dep_df, ind_df, k):
    cross_val_score_ = cross_val_score(model, np.expand_dims(ind_df[RET_ATTR].values, axis=1), dep_df[RET_ATTR].values, cv=k)
    return np.mean(cross_val_score_)
cross_val_avg_ = compute_cross_val_avg_(regr, aapl_train, spy_train, 5)
assert np.allclose(cross_val_avg, cross_val_avg_)
### END HIDDEN TESTS

## $\aapl$ hedged returns

- Compute the series
$$
{\ret'}_{\aapl}^\tp = \ret_\aapl^\tp - \beta_{\aapl, \spy} * \ret_\spy^\tp
$$
for all dates $t$ in the test set.  
- Sort the dates in ascending order and plot the timeseries ${\ret}'_{\aapl}$

${\ret}'_{\aapl}$ is called the "hedged return" of $\aapl$
- It is the daily return you would realize if you created a portfolio that was
    - long 1 dollar of $\aapl$
    - short $\beta_{\aapl, \spy}$ dollars of the index $\spy$
- It represents the outperformance of $\aapl$ relative to the index $\spy$
    - $\spy$ is the proxy for "the market" (it tracks the S&P 500 index)
    - The hedged return is the *value added* by going long $\aapl$ rather than just going "long the market"
    - Sometimes referred to as the "alpha" ($\alpha_\aapl$)
- So **if** you are able to correctly forecast that $\aapl$ will have positive outperformance (i.e., have $\alpha_\aapl > 0$ most days)
    - then you can earn a positive return regardless of whether the market ($\spy$) goes up or down !
    - this is much lower risk than just holding $\aapl$ long
    - people will pay you very well if you can really forecast correctly !
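
The hedged return formula can be checked by hand on a few numbers. In this sketch the value of beta and the three days of returns are made up, and the variable names are hypothetical.

```python
import numpy as np

beta = 1.2  # hypothetical sensitivity of the ticker to the index

# Made-up daily returns of the ticker and of the index
ticker_ret = np.array([0.020, -0.010, 0.005])
index_ret = np.array([0.010, -0.015, 0.000])

# Long 1 dollar of the ticker, short beta dollars of the index
hedged_ret = ticker_ret - beta * index_ret
print(hedged_ret)
```

On the second day the ticker falls by 1.0%, but the short index position gains 1.2 * 1.5% = 1.8%, so the hedged portfolio still earns 0.8%.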

**Question:**
- Complete the function `compute_hedged_series` to get the hedged series
- Replace the 0 values in the following cell with your answers

In [None]:
hedged_num_returns = 0  # Number of returns in hedged series
hedged_first_return = 0 # The return on the earliest date in hedged series
hedged_last_return  = 0 # The return on the latest date in hedged series
hedged_avg_return  = 0  # The average return over the hedged series

def compute_hedged_series(model, dep_df, ind_df):
    '''
    Compute the hedged series
    
    Arguments:
    model: model you build with method "createModel()"
    dep_df: ticker returns in test dataset
    ind_df: index returns in test dataset
    '''
    ### BEGIN SOLUTION
    ind_val = ind_df[RET_ATTR].values
    hedged_series = dep_df[RET_ATTR].values - model.coef_[0][0] * ind_val
    hedged_series = pd.DataFrame({'Return':hedged_series}, index=dep_df.index)
    return hedged_series
    ### END SOLUTION

hedged_series = compute_hedged_series(regr, aapl_test, spy_test)
hedged_num_returns, hedged_first_return, hedged_last_return, hedged_avg_return = summarize(hedged_series, RET_ATTR)
ticker="AAPL"
print("{t:s} hedged returns: There are {num:d} returns. First={first:.2%}, Last={last:.2%}, Avg={avg:.2%}".format(t=ticker,
                                                                                                                    num=hedged_num_returns,
                                                                                                                    first=hedged_first_return, 
                                                                                                                    last=hedged_last_return, 
                                                                                                                    avg=hedged_avg_return))

In [None]:
### BEGIN HIDDEN TESTS
assert np.allclose( [hedged_num_returns], [63], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_first_return * 100], [ 0.30 ], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_last_return * 100], [ 0.03 ], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_avg_return * 100], [ -0.29 ], rtol=1e-04, atol=1e-02 )
### END HIDDEN TESTS

# $\fb$ regression

Repeat the regression you carried out for $\aapl$, but this time for the ticker $\fb$ (Facebook)

**Motivation**

The idea is to encourage you to build re-usable pieces of code.

So if you created some functions in solving Part 1, you may reuse these functions to easily solve Part 2,
particularly if you treated the ticker (e.g., $\aapl$ or $\fb$) as a parameter to your functions.

If you simply copy and paste the code from Part 1 you will only get partial credit.


**Question:**
- Compute the intercept and coefficients of your model using new ticker "FB"
- Replace the 0 values in the following cell with your answers

In [None]:
beta_0 = 0    # The regression parameter for the constant
beta_SPY = 0  # The regression parameter for the return of SPY
ticker = "FB"

### BEGIN SOLUTION
fb = read_ticker(ticker)
add_ret(fb)
fb_2018 = select_yr(fb, 2018)
fb_train, fb_test = split(fb_2018)
new_regr = createModel()
beta_0, beta_SPY = regress(new_regr, fb_train, spy_train)
### END SOLUTION
print("{t:s}: beta_0={b0:3.2f}, beta_SPY={b1:3.2f}".format(t=ticker, b0=beta_0, b1=beta_SPY))


Your expected outputs should be:
<table> 
    <tr> 
        <td>  
            beta_0
        </td>
        <td>
         -0.00
        </td>
    </tr>
    <tr> 
        <td>
            beta_SPY
        </td>
        <td>
         1.29
        </td>
    </tr>

</table>

In [None]:
### BEGIN HIDDEN TESTS
fb_test_ = read_ticker(ticker)
add_ret(fb_test_)
fb_2018_test = select_yr_(fb_test_, 2018)
fb_train_test, fb_test_test = split_(fb_2018_test)
new_regr_ = createModel_()
beta_0_test, beta_SPY_test = regress_(new_regr_, fb_train_test, spy_train)
assert np.allclose([beta_0, beta_SPY], [beta_0_test, beta_SPY_test], rtol=1e-05, atol=1e-02)
### END HIDDEN TESTS

**Question:**
- Replace the 0 values in the following cell with your answers

In [None]:
cross_val_avg = 0

### BEGIN SOLUTION
cross_val_avg = compute_cross_val_avg(new_regr, fb_train, spy_train, 5)
### END SOLUTION
print("{t:s}: Avg cross val score = {sc:3.2f}".format(t=ticker, sc=cross_val_avg) )

In [None]:
### BEGIN HIDDEN TESTS
cross_val_avg_test = compute_cross_val_avg_(new_regr, fb_train, spy_train, 5)
assert np.allclose([cross_val_avg], [cross_val_avg_test])
### END HIDDEN TESTS

**Question:**
- Replace the 0 values in the following cell with your answers, and execute the print statements

In [None]:
hedged_num_returns = 0  # Number of returns in hedged series
hedged_first_return = 0 # The return on the earliest date in hedged series
hedged_last_return  = 0 # The return on the latest date in hedged series
hedged_avg_return  = 0  # The average return over the hedged series

### BEGIN SOLUTION
hedged_series = compute_hedged_series(new_regr, fb_test, spy_test)
hedged_num_returns, hedged_first_return, hedged_last_return, hedged_avg_return = summarize(hedged_series, RET_ATTR)
### END SOLUTION
ticker="FB"
print("{t:s} hedged returns: There are {num:d} returns. First={first:.2%}, Last={last:.2%}, Avg={avg:.2%}".format(t=ticker,
                                                                                                                    num=hedged_num_returns,
                                                                                                                    first=hedged_first_return, 
                                                                                                                    last=hedged_last_return, 
                                                                                                                    avg=hedged_avg_return))


In [None]:
### BEGIN HIDDEN TESTS
assert np.allclose( [hedged_num_returns], [63], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_first_return * 100], [ -1.68 ], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_last_return * 100], [ -2.72 ], rtol=1e-04, atol=1e-02 )
assert np.allclose( [hedged_avg_return * 100], [ -0.03 ], rtol=1e-04, atol=1e-02 )
### END HIDDEN TESTS

# Returns to prices

- You have already computed the predicted returns of $\aapl$ for each date in the test set.
- Create the predicted *price* timeseries for $\aapl$ for the date range in the test set
- Plot (on the same graph) the actual price timeseries of $\aapl$ and the predicted price timeseries.

There is a particular reason that we choose to perform the Linear Regression on returns rather than prices.

It is beyond the scope of this lecture to explain why, but we want to show that we can easily convert
back into prices.
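
Since a return is defined as $\frac{\price^\tp}{\price^{(t-1)}} - 1$, each price is the previous price times one plus the return, so returns compound multiplicatively. A minimal sketch with made-up numbers and hypothetical variable names:

```python
import numpy as np

last_known_price = 100.0                    # hypothetical price on the day before the series starts
daily_returns = np.array([0.01, -0.02, 0.03])  # made-up daily returns

# price(t) = price(t-1) * (1 + return(t)), so the price path is a cumulative product
price_path = last_known_price * np.cumprod(1.0 + daily_returns)

# Round trip: recover the returns from the prices
previous_prices = np.concatenate([[last_known_price], price_path[:-1]])
recovered_returns = price_path / previous_prices - 1.0
print(price_path)
```

The round trip at the end confirms that no information is lost when moving between prices and returns, as long as one starting price is known.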

**Question:**
- Replace the 0 values in the following cell with your answers, and execute the print statements

In [None]:
num_prices = 0  # Number of prices in price series
first_price = 0 # The price on the earliest date in price series
last_price  = 0 # The price on the latest date in price series
avg_price  = 0  # The average price over the price series

### BEGIN SOLUTION
def compute_predicted_price(model, dep_train, ind_test):
    '''
    Compute the predicted price based on predicted returns
    
    Arguments:
    model: fitted model built with method "createModel()"
    dep_train: ticker data in train dataset, used to get the last training price
    ind_test: index returns in test dataset
    '''
    predicted_returns = model.predict(ind_test[[RET_ATTR]].values).reshape(-1)
    last_train_price = dep_train[PRICE_ATTR].values[-1]
    # Returns compound multiplicatively: price(t) = price(t-1) * (1 + return(t))
    predicted_price = last_train_price * np.cumprod(1. + predicted_returns)
    return pd.DataFrame({'Price': predicted_price}, index=ind_test.index)

# regr is the model fitted on AAPL
predicted_price = compute_predicted_price(regr, aapl_train, spy_test)
num_prices, first_price, last_price, avg_price = summarize(predicted_price, 'Price')
### END SOLUTION
ticker="AAPL"
print("{t:s} predicted prices: There are {num:d} prices. First={first:3.2f}, Last={last:3.2f}, Avg={avg:3.2f}".format(t=ticker,
                                                                                                                    num=num_prices,
                                                                                                                    first=first_price, 
                                                                                                                    last=last_price, 
                                                                                                                    avg=avg_price))


In [None]:
### BEGIN HIDDEN TESTS
# Reference prices: compound the predicted returns, starting from the last training price
predicted_returns_ = regr.predict(spy_test[[RET_ATTR]].values).reshape(-1)
predicted_price_ = aapl_train[PRICE_ATTR].values[-1] * np.cumprod(1. + predicted_returns_)
assert np.allclose( [num_prices], [63], rtol=1e-04, atol=1e-02 )
assert np.allclose( [first_price], [ predicted_price_[0] ], rtol=1e-04, atol=1e-02 )
assert np.allclose( [last_price], [ predicted_price_[-1] ], rtol=1e-04, atol=1e-02 )
assert np.allclose( [avg_price], [ predicted_price_.mean() ], rtol=1e-04, atol=1e-02 )
### END HIDDEN TESTS

# Extra credit

The data directory has the prices of many other indices.
- Any ticker in the directory beginning with the letter "X" is an index

Choose *one* index (we'll call it $I$) other than $\spy$ to use as a second feature and compute the Linear Regression

$$
\ret_\aapl^\tp = \beta^T \x + \epsilon_{\aapl}^\tp
$$

where $\x$ is the feature vector
  - $\x^\tp = \begin{pmatrix}
        1 \\
        \ret_\spy^\tp \\
        \ret_I^\tp \\
        \end{pmatrix}$

That is, predict the returns of $\aapl$ in terms of a constant, the returns of $\spy$ and the returns of another index $I$.
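
The mechanics of a two-feature regression can be sketched on synthetic data. Everything here is made up: the returns are randomly generated, `other_ret` is only a stand-in for whichever index $I$ you choose, and the ticker returns are built without noise from known parameters so that the fit should recover them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic returns for SPY and for a stand-in second index (both made up)
rng = np.random.default_rng(1)
spy_ret = rng.normal(0.0, 0.01, size=50)
other_ret = rng.normal(0.0, 0.01, size=50)

# Noise-free ticker returns from known parameters: 0.0005, 1.1 and 0.4
ticker_ret = 0.0005 + 1.1 * spy_ret + 0.4 * other_ret

# One column per feature; the intercept is added by the model itself
X = np.column_stack([spy_ret, other_ret])
two_feature_model = LinearRegression().fit(X, ticker_ret)
print(two_feature_model.intercept_, two_feature_model.coef_)
```

In this sketch the two features are independent by construction. Real equity indices tend to move together, and strongly correlated features can make the individual coefficients unstable, which is worth keeping in mind when you choose and justify the second index.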

**Question**
There is no specified format.  Treat this like an interview question and show off your analytical
and explanatory skills. Be sure to explain how you came about choosing the second index.