
# Preliminaries

In [None]:
# Import packages
import pandas as pd
import numpy as np
import yfinance as yf
import pandas_datareader as pdr
import statsmodels.api as sm

In [None]:
# Load Fama-French factor data
ff3f = pdr.DataReader('F-F_Research_Data_Factors', 'famafrench', '2012-01-01')[0]/100
ff3f.head(2)

Load data on TSLA and clean it:

In [None]:
    # Download monthly prices (keep only Adjusted Close prices), drop missing, convert from Series to DataFrame
firm_prices = yf.download('TSLA', '2012-12-01', '2020-12-31', interval = '1mo')['Adj Close'].dropna().to_frame()

    # Calculate monthly returns and drop the first (missing) observation
firm_ret = firm_prices.pct_change().dropna()

    # Rename "Adj Close" to "TSLA"
firm_ret.rename(columns = {'Adj Close': 'TSLA'}, inplace = True)

    # Convert index to monthly period date
firm_ret.index = firm_ret.index.to_period('M')
firm_ret.head(2)

In [None]:
# Merge the two datasets
data = firm_ret.join(ff3f)
data['const'] = 1
data.head(2)
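
The join above relies on both tables being indexed by monthly periods, which is why the return index is converted with `to_period('M')` first. A minimal sketch of that alignment step is shown here. The prices and factor values are made up for illustration (they are not real TSLA or Fama-French data), and the names `prices`, `factors`, and `merged` are hypothetical.

```python
import pandas as pd

# Made-up month-start prices (illustrative only, NOT real TSLA data)
prices = pd.DataFrame(
    {"TSLA": [100.0, 110.0, 99.0, 108.9]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
)

# Simple returns: P_t / P_{t-1} - 1; the first row is NaN and is dropped
rets = prices.pct_change().dropna()

# Convert the timestamp index to monthly periods so it can align with the factors
rets.index = rets.index.to_period("M")

# Made-up factor table indexed by monthly periods (illustrative only)
factors = pd.DataFrame(
    {"Mkt-RF": [0.02, -0.01, 0.03], "RF": [0.001, 0.001, 0.001]},
    index=pd.period_range("2020-02", periods=3, freq="M"),
)

# Left join on the shared monthly period index
merged = rets.join(factors)
print(merged)
```

If the two indexes had different types (timestamps on one side, periods on the other), the join would not line the rows up as intended, so checking `merged.isna().sum()` after a join is a cheap sanity check.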

In [None]:
# Set up the data
    # Dependent variable (left side of the equal sign)
y = data['TSLA'] - data['RF']
y.head(2)

In [None]:
    # Independent variable(s) (right side of the equal sign)
X = data[['const','Mkt-RF']]
X.head(2)

In [None]:
# Run regression and store results in "res" object
res = sm.OLS(y,X).fit()
print(res.summary())

**Challenge**:

Estimate the Fama-French three-factor model using the data gathered above.

# Common uses for regression results

Assume that we ran a regression of the form:


$$y_t = \alpha + \beta \cdot x_t + \epsilon_t $$

In the CAPM regression we ran above, $y_t$ is the excess return on TSLA and $x_t$ is the excess return on the market:

$$R_{i,t} - R_{f,t} = \alpha_i + \beta_i (R_{m,t} - R_{f,t}) + \epsilon_{i,t}$$


## Conditional predictions

We can use the results of our regression to estimate what we should expect the value of the dependent variable to be, if we knew the value of the independent variable(s). Mathematically, this is given by:

$$ E[y_t | x_t] = \alpha + \beta \cdot x_t $$

**Example**:

Using the results from the single-factor regression above, what is the expected excess return of TSLA if the market excess return is 2%? 

In [None]:
# Extract coefficients from the results object

In [None]:
# Conditional prediction

**Challenge**:

Using the results from the three-factor regression above, what is the expected excess return of TSLA if the market excess return is 2%, the SMB return is -1%, and the HML return is 0.5%?

In [None]:
# Extract params

In [None]:
# Prediction

## Unconditional predictions

We can use the results of our regression to estimate what we should expect the value of the dependent variable to be, using our best guess for the value of the independent variable(s). Mathematically, this is given by:

$$ E[y_t] = \alpha + \beta \cdot E[x_t] $$
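
The formula above can be sketched numerically as follows. Everything in this cell is an assumption made for illustration: the long history of market excess returns is simulated rather than downloaded, and `alpha_hat` and `beta_hat` are hypothetical stand-ins for coefficients that would come from a fitted regression.

```python
import numpy as np

# Simulated long history of monthly market excess returns
# (assumption: 90 years * 12 months = 1080 draws, NOT real data)
rng = np.random.default_rng(1)
long_mkt_excess = rng.normal(0.006, 0.05, size=1080)

# Hypothetical regression estimates (stand-ins for values taken from res.params)
alpha_hat = 0.01
beta_hat = 1.5

# Best guess for E[x]: the long-run sample mean of the market excess return
mkt_premium = long_mkt_excess.mean()

# Unconditional prediction: E[y] = alpha + beta * E[x]
stock_premium = alpha_hat + beta_hat * mkt_premium

print(f"Monthly market risk premium: {mkt_premium:.4%}")
print(f"Monthly stock risk premium:  {stock_premium:.4%}")
```

The only difference from a conditional prediction is the value plugged in for $x_t$: a long-run average instead of a specific scenario.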

**Example**:

Using the results from the regression above, what is the expected excess return of TSLA (i.e. the risk premium on TSLA)? To answer this question, we must first estimate $E[R_m - R_f]$ (i.e. the market risk premium). We do so by averaging the excess returns on the market over a very long period (below we use the last 90 years).

In [None]:
# Download 90 years of data on market excess returns

In [None]:
# Estimate (monthly) market risk premium

In [None]:
# Estimate TSLA risk premium

**Challenge**:

Estimate the risk premium of TSLA using the three-factor model, with factor risk premia estimated using the last 90 years of data.

In [None]:
# Estimate risk-premia

In [None]:
# Estimate TSLA risk premium

## Variance decomposition

The regression results allow us to decompose the total variance of the dependent variable into the portion explained by the variance of the explanatory variables and the portion that cannot be explained by them. Because the regression residuals are uncorrelated with the explanatory variable, the regression equation implies:

$$ Var[Y] = \beta^2 \cdot Var[X] + Var[\epsilon] $$
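
A small numerical check of the decomposition above, on simulated data (not real returns). The slope and intercept are computed directly from sample moments here, which gives the same estimates as a one-variable OLS regression with a constant. All variable names are illustrative.

```python
import numpy as np

# Simulated excess returns (assumption: NOT real TSLA or market data)
rng = np.random.default_rng(2)
x = rng.normal(0.01, 0.04, 120)
y = 0.005 + 1.5 * x + rng.normal(0, 0.10, 120)

# One-variable OLS estimates from sample moments
beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - (alpha_hat + beta_hat * x)

# Variance decomposition: Var[Y] = beta^2 * Var[X] + Var[epsilon]
total_var = np.var(y, ddof=1)
systematic_var = beta_hat**2 * np.var(x, ddof=1)
idiosyncratic_var = np.var(resid, ddof=1)

print(f"Total:         {total_var:.6f}")
print(f"Systematic:    {systematic_var:.6f} ({systematic_var / total_var:.1%})")
print(f"Idiosyncratic: {idiosyncratic_var:.6f} ({idiosyncratic_var / total_var:.1%})")
```

In a one-variable regression with a constant, the systematic share of total variance equals the regression's R-squared, so the two shares sum to 100%.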

**Example**:

Using the results from the regression above, calculate the total variance of TSLA, as well as its systematic variance and its idiosyncratic variance.

In [None]:
# Total risk of TSLA (variance)

In [None]:
# Systematic risk

In [None]:
# Idiosyncratic risk

In [None]:
# Another way of calculating idiosyncratic risk (= variance of the residuals (epsilon) from the regression)

In [None]:
# Print all three of them out

In [None]:
# Print as percentages of total risk

**Challenge**:

Using the Fama-French three-factor model, what percentage of TSLA total risk is diversifiable and what percentage is undiversifiable?

# Interaction effects between explanatory variables

In some circumstances, we might want our linear regression model to allow the relation between two variables to depend on a third variable:

$$ Y_t = \alpha + (\beta + \gamma \cdot Z_t) \cdot X_t + \delta \cdot Z_t + \epsilon_t $$

Note that the effect of X on Y (i.e. $\beta + \gamma \cdot Z_t$) depends on the value of a third variable ($Z_t$).

The regression above is often written (equivalently) as:


$$ Y_t = \alpha + \beta  \cdot X_t + \gamma \cdot Z_t \cdot X_t + \delta \cdot Z_t + \epsilon_t $$

where the $Z_t \cdot X_t$ term is called the **interaction** between the X and Z variables. This interaction term needs to be constructed in the data before we run our regression (by taking the product of X and Z). 

**Dummy variables** (or "indicator" variables) are variables that take only the values 0 or 1. They are often used in interaction terms (as the $Z$ variable above) to test if the relation between the main variables of interest (Y and X) is significantly different when some particular condition is satisfied (i.e. Z will equal 1 when the condition is satisfied and 0 when it is not).

**Example**:

Using the same data as in the regressions above, test if: 
1. TSLA's alpha is significantly different before 2015 than after 2015. 
2. TSLA's beta is significantly different before 2015 than after 2015. 

In this example, the $Z_t$ variable will have a value of 0 before 2015 and a value of 1 after 2015. So, before 2015, the equation above becomes 

$$ Y_t = \alpha + \beta  \cdot X_t  + \epsilon_t $$

and after 2015, it becomes 

$$ Y_t = \alpha + \delta +  (\beta + \gamma) X_t +  \epsilon_t $$

Hence, $\delta$ tells us the difference between the firm's alpha after 2015 and its alpha before 2015. And $\gamma$ tells us the difference between the firm's beta after 2015 and its beta before 2015.

In [None]:
# Create dummy variable that = 1 after 2015 and 0 before

In [None]:
# Create interaction term

In [None]:
# Is the beta of the firm significantly different (at the 5% level) after 2015?

In [None]:
# Is the alpha of the firm significantly different (at the 5% level) after 2015?

# Non-linear explanatory variables

In some circumstances we might want to test if there is a significant non-linear relationship between two variables of interest. For example, to test for a quadratic relation between Y and X, we can run the following regression:

$$ Y_t = \alpha + \beta \cdot X_t + \gamma \cdot X_t^2 + \epsilon_t$$

The $X^2$ variable needs to be created ahead of time in the data, before we run the regression. 

**Example**:

Using the market model above, test if there is a significant quadratic relation between TSLA excess returns and market excess returns.

In [None]:
# Create quadratic market excess returns