# Module 2: Factor Models

In this module, we will explore linear factor models (LFMs). 

## Table of Contents:
&nbsp;&nbsp;0. [Motivation of Factor Models](#0)

&nbsp;&nbsp;1. [Introduction to Linear Factor Models](#1)


&nbsp;&nbsp;2. [Factor Model in Asset Return Interpretation](#2)   

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2.0 [Model Setup](#2.0)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2.1 [Plotting and Exploring the data](#2.1)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2.2 [OLS Results](#2.2)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2.3 [OLS Drawbacks](#2.3)


&nbsp;&nbsp;3. [Alternative ML Methods](#3)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.0 [LASSO Regression](#3.0)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.1 [LASSO with cross validation](#3.1)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.2 [Elastic Net](#3.2)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.3 [Best Subset Regression](#3.3)


&nbsp;&nbsp;4. [Relaxing the time period assumption, Regime Analysis](#4)   

&nbsp;&nbsp;5. [Additional Resources](#5)

&nbsp;&nbsp;6. [User Section](#6)

## 0. Motivation of Factor Models <a class="anchor" id="0"></a>

Factor models are widely used in industry and serve three main purposes.

The first is to reduce the complexity of modeling asset price movements.  For instance, building a model that completely explains stock price movements is nearly impossible.  To model your favorite stock, you would need to model supply, demand, sentiment, current and expected future earnings, news, interest rates, risk premia...

Calibrating such a complicated model is nearly impossible!  Instead, factor investors assume that there are N important factors that drive a portion of asset returns.  They then argue that, at the portfolio level, asset-specific movements average out and only those N factors remain.  So to understand what drives portfolio returns, we only need to model the effect of that small number of factors.
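A small simulation can make the averaging-out argument concrete. The sketch below is illustrative only: it assumes a single common factor, a loading of 1 for every asset, independent asset-specific noise, and made-up volatilities.

```python
import numpy as np

rng = np.random.default_rng(7)
n_obs = 2000

# One common factor; the volatilities here are illustrative assumptions
factor = rng.normal(0.0, 0.02, n_obs)

def residual_vol(n_assets):
    """Volatility of the part of an equal-weighted portfolio not explained by the factor."""
    specific = rng.normal(0.0, 0.05, (n_obs, n_assets))  # asset-specific returns
    asset_returns = factor[:, None] + specific           # every asset has loading 1
    portfolio = asset_returns.mean(axis=1)               # equal weights
    return np.std(portfolio - factor)

for n in (1, 10, 100):
    print(n, round(residual_vol(n), 4))
```

Under these assumptions the unexplained volatility shrinks roughly like $1/\sqrt{N}$ as the number of assets grows, while the factor-driven part of the portfolio return does not shrink.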

Second, understanding the factor loadings of the individual assets allows us to estimate the covariance of asset returns.  We state without proof that, given the factor loadings and the covariance of the factor returns, one can compute an estimate of the covariance of the assets themselves.
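As a rough illustration of the covariance claim: if asset-specific returns are assumed to be uncorrelated with the factors and with each other, the model-implied asset covariance is $B \Sigma_f B^\top + D$, where $B$ holds the loadings, $\Sigma_f$ is the factor covariance, and $D$ is a diagonal matrix of asset-specific variances. The numbers below are hypothetical and not taken from the course data.

```python
import numpy as np

# Hypothetical loadings: one row per asset, one column per factor
B = np.array([[1.0,  0.2],
              [0.8, -0.1],
              [0.3,  0.9],
              [0.0,  1.1]])

# Hypothetical covariance matrix of the two factor returns
factor_cov = np.array([[0.04, 0.01],
                       [0.01, 0.02]])

# Asset-specific variances, assumed uncorrelated across assets
specific_var = np.array([0.010, 0.020, 0.015, 0.030])

# Model-implied asset covariance: B @ Sigma_f @ B.T + D
asset_cov = B @ factor_cov @ B.T + np.diag(specific_var)
print(np.round(asset_cov, 4))
```

With 4 assets and 2 factors this needs only 4 x 2 loadings, 3 distinct factor covariance entries, and 4 specific variances, rather than all 10 distinct entries of the asset covariance matrix; the saving grows quickly with the number of assets.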

Third, factor models can be used for hedging.  We again state without proof that the factor loadings represent the hedge ratios one would use to minimize the volatility of a portfolio.
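The hedging claim can be checked numerically in the single-factor case. The sketch below uses simulated returns with an assumed true loading of 0.7; it compares the OLS loading with a brute-force search over hedge ratios. The variable names and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs = 5000

# Simulated factor and portfolio returns (assumed true loading: 0.7)
factor = rng.normal(0.0, 0.02, n_obs)
portfolio = 0.7 * factor + rng.normal(0.0, 0.01, n_obs)

# Single-factor OLS loading: cov(portfolio, factor) / var(factor)
beta = np.cov(portfolio, factor)[0, 1] / np.var(factor, ddof=1)

def hedged_vol(h):
    """Volatility of the portfolio after shorting h units of the factor."""
    return np.std(portfolio - h * factor, ddof=1)

# Brute-force search for the volatility-minimizing hedge ratio
candidates = np.linspace(0.0, 1.5, 151)
best = candidates[np.argmin([hedged_vol(h) for h in candidates])]

print("OLS loading:", round(beta, 3), "grid-search hedge ratio:", round(best, 3))
```

The grid search should land on the candidate closest to the OLS loading, because the variance of the hedged return is a quadratic in the hedge ratio whose minimum is exactly the sample regression slope.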

In this module we will walk through multiple ways of estimating factor loadings and discuss their relative strengths and weaknesses.


## 2. Factor Model in Asset Return Interpretation <a class="anchor" id="2"></a>

If $y_t$ represents an asset return at time $t$, a linear factor model can help us interpret the source of that return by attributing it to the factor returns.

In this example, we are interested in explaining the asset returns with a five-factor model:

1) World Equity: This factor represents worldwide equity returns.

2) US Treasury: This factor captures the return on United States Treasury bonds, the bonds with the least risk.

3) Bond Risk Premia: This is a credit factor that captures the extra yield from risky bonds.  It is defined as the spread between high-risk bonds and US Treasury bonds.

4) Inflation Protection: This is a "style" factor that considers the difference between real and nominal returns, and thus balances the need for both.

5) Currency Protection: This is also a "style" factor; it captures the risk premium for US domestic assets.
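To make the five-factor setup concrete, here is a minimal sketch of estimating loadings by ordinary least squares. It uses simulated factor returns rather than the course data, plain NumPy rather than `FactorModelLib`, and hypothetical "true" loadings chosen only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

factor_names = ["World Equity", "US Treasury", "Bond Risk Premia",
                "Inflation Protection", "Currency Protection"]
n_obs = 240

# Simulated factor returns (NOT the course data set)
factors = pd.DataFrame(rng.normal(0.0, 0.03, (n_obs, 5)), columns=factor_names)

# Hypothetical loadings used to generate a synthetic asset return y_t
true_loadings = np.array([0.6, 0.3, 0.1, 0.0, -0.2])
asset = 0.001 + factors.to_numpy() @ true_loadings + rng.normal(0.0, 0.003, n_obs)

# OLS: regress the asset return on an intercept and the five factors
X = np.column_stack([np.ones(n_obs), factors.to_numpy()])
coef, *_ = np.linalg.lstsq(X, asset, rcond=None)

alpha = coef[0]
loadings = pd.Series(coef[1:], index=factor_names)
print(loadings.round(3))
```

Each estimated loading is the sensitivity of the asset return to one factor, holding the other four fixed; multiplying a loading by that factor's return gives the portion of the asset return attributed to it.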


### 2.0 Model Setup <a class="anchor" id="2.0"></a>
For the first step, let's import necessary packages and define our functions (for later use):

In [4]:
!pip install cvxpy


In [4]:
%cd new_version

/home/jovyan/portfolio-analysis-python/course_3_python_and_machine_learning_for_asset_management/new_version


In [5]:
#import all the necessary packages
import numpy as np #for numerical array data
import pandas as pd #for tabular data
import matplotlib.pyplot as plt #for plotting purposes

%matplotlib inline
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

import importlib as imp

import FactorModelLib as fm #the code that wraps around the scikit-learn implementations
import config


import warnings
warnings.filterwarnings('ignore')

Next, read our data and check the assets/factors we have:

In [7]:
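all_data = pd.read_csv(config.dataPath)
#parse the date column so it can be used for time-based plotting and filtering
all_data[config.dateName] = pd.to_datetime(all_data[config.dateName])
#display the first rows; this must be the last expression in the cell to be shown
all_data.head()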

### 2.1 Plotting and Exploring the data <a class="anchor" id="2.1"></a>

First things first, let's look at the data.