In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import pandas_datareader.data as web

def get_factors(factors='CAPM',freq='daily'):   
    
    if freq=='monthly':
        freq_label=''
    else:
        freq_label='_'+freq


    if factors=='CAPM':
        fama_french = web.DataReader("F-F_Research_Data_Factors"+freq_label, "famafrench",start="1921-01-01")
        daily_data = fama_french[0]
    
     
        df_factor = daily_data[['RF','Mkt-RF']] 
    elif factors=='FF3':
        fama_french = web.DataReader("F-F_Research_Data_Factors"+freq_label, "famafrench",start="1921-01-01")
        daily_data = fama_french[0]

        df_factor = daily_data[['RF','Mkt-RF','SMB','HML']]
    elif factors=='FF5':

        fama_french = web.DataReader("F-F_Research_Data_Factors"+freq_label, "famafrench",start="1921-01-01")
        daily_data = fama_french[0]

        df_factor = daily_data[['RF','Mkt-RF','SMB','HML']]
        fama_french2 = web.DataReader("F-F_Research_Data_5_Factors_2x3"+freq_label, "famafrench",start="1921-01-01")
        daily_data2 = fama_french2[0]

        df_factor2 = daily_data2[['RMW','CMA']]
        df_factor=df_factor.merge(df_factor2,on='Date',how='outer')    
        
    else:
        fama_french = web.DataReader("F-F_Research_Data_Factors"+freq_label, "famafrench",start="1921-01-01")
        daily_data = fama_french[0]

        df_factor = daily_data[['RF','Mkt-RF','SMB','HML']]
        fama_french2 = web.DataReader("F-F_Research_Data_5_Factors_2x3"+freq_label, "famafrench",start="1921-01-01")
        daily_data2 = fama_french2[0]

        df_factor2 = daily_data2[['RMW','CMA']]
        df_factor=df_factor.merge(df_factor2,on='Date',how='outer')   
        fama_french = web.DataReader("F-F_Momentum_Factor"+freq_label, "famafrench",start="1921-01-01")
        df_factor=df_factor.merge(fama_french[0],on='Date')
        df_factor.columns=['RF','Mkt-RF','SMB','HML','RMW','CMA','MOM']    
    if freq=='monthly':
        df_factor.index = pd.to_datetime(df_factor.index.to_timestamp())
    else:
        df_factor.index = pd.to_datetime(df_factor.index)
        


    return df_factor/100

**Bet sizing under uncertainty**

This is optimal only IF YOU ARE 100% CONFIDENT in your alphas.

But of course you don't really know them, so the industry developed many ad hoc approaches to **bet sizing**.

**Bet sizing** is one of the great skills of a portfolio manager because it requires an instinct for uncertainty.

The different approaches ($w$ is a scalar that controls the overall size of the portfolio):

- **Mean-variance** rule:
  
$$W_i=w \frac{\alpha_i}{\sigma_{\epsilon,i}^2}$$

- **1/N** rule: ignore the magnitude of the alpha and simply bet on the direction of your idea

$$W_i=\frac{1}{N}\left((\alpha_i>0)-(\alpha_i<0)\right)$$

  * this is good if you have good hunches for mispricing, but you don't get the magnitudes quite right

- **Proportional** rule: Buy/sell proportional to the alpha

$$W_i=w \alpha_i$$

- **Risk-parity** rule: assumes the appraisal ratios of your different ideas are the same, $\frac{\alpha_i}{\sigma_{\epsilon,i}}=\frac{\alpha_j}{\sigma_{\epsilon,j}}$


$$W_i=w \frac{1}{\sigma_{\epsilon,i}}$$


- **Minimum-Variance** rule: 

$$W_i=w \frac{1}{\sigma_{\epsilon,i}^2}$$

  - This assumes the alphas are all the same and focuses on using the information in the variance matrix to boost the Sharpe ratio
  

- **Variance shrinkage** rule: you shrink each variance in the variance-covariance matrix towards a particular value $\sigma_{shrink}^2$

$$W_i=w \frac{\alpha_i}{\sigma_{\epsilon,i}^2(1-\tau)+\tau\sigma_{shrink}^2}$$

  - where $\tau\in[0,1]$ is the shrinkage factor
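
A minimal sketch of the rules above, written as relative weights. The alphas and idiosyncratic volatilities are made-up numbers, the variable names are hypothetical, and using the average variance as the shrinkage target is an assumption for illustration. The scalar $w$ is left at 1, so only the relative sizes across assets are meaningful.

```python
import numpy as np

# Hypothetical inputs, for illustration only (not estimated from any data)
alpha = np.array([0.04, 0.02, -0.01, 0.03])    # annualized alphas
sigma_e = np.array([0.20, 0.10, 0.15, 0.30])   # annualized idiosyncratic vols
var_e = sigma_e**2
N = len(alpha)
tau = 0.5                                      # shrinkage factor in [0, 1]
var_shrink = var_e.mean()                      # assumed shrinkage target

# Relative weights under each rule (overall scale w set to 1)
rules = {
    "mean-variance": alpha / var_e,
    "1/N": np.sign(alpha) / N,                 # (alpha>0) - (alpha<0), divided by N
    "proportional": alpha,
    "risk-parity": 1 / sigma_e,
    "minimum-variance": 1 / var_e,
    "variance-shrinkage": alpha / (var_e * (1 - tau) + tau * var_shrink),
}

for name, W in rules.items():
    print(f"{name:>20}: {np.round(W, 3)}")
```

Note that risk-parity and minimum-variance ignore the sign of alpha, so they take a long position even in the asset with a negative alpha.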




Some examples you should play with


> ### Stop and Practice
> Solve for optimal weights, maximum Sharpe ratio
> 
> Target vol of 10% annualized
>
> 1. same alpha, same betas, same idio vol
> 2. different alphas, same betas, same idio vol
> 3. same alpha, different betas, same idio vol
> 4. same alpha, same betas, different idio vol


In [None]:
A=np.array([0.3,0.2,0.1,0.05])
B=np.array([1,2,0.5,-0.5])
Sigmae=np.diag([0.4,0.4,0.4,0.4])**2

## Exercise: Investing in characteristic-based factors from the perspective of a CAPM investor

In a few classes we will discuss multi-factor models and the variety of factors that people in the industry use.

Six factors are particularly popular both in the industry and in the academic community.

A few of them have ETFs that aim to replicate them, which potentially allows retail investors to get exposure to them cheaply and also allows industry people to easily hedge their factor exposures. (We will investigate carefully to what extent these ETFs do a good job... coming soon to a theater near you!)

For now we will use this data to take the perspective of a "CAPM investor", i.e. someone who has the market as their only risk factor and sees the other factors as non-systematic risk.

>#### Alert
>This will be stylized in the sense that we will use in-sample moments, so we cannot really compare Sharpe ratios.
>
>Why not? Because by construction mean-variance will always beat everyone else.
>
>We use these alternative methods exactly because in-sample moments are often not a great guide to the forward-looking moments we care about, so an in-sample comparison does not reflect the reality of trading, which requires using past information to trade.
>
> We will later discuss how to make this comparison (it is not rocket science: divide your sample into estimation and testing samples!)
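
As a rough preview of the estimation/testing split, here is a sketch on simulated data. Everything in it is an assumption for illustration: the returns are drawn from a normal distribution with made-up alphas and volatilities, the split is simply half and half, and the helper `sharpe` is a hypothetical name. The point is only the mechanics: weights are built from the estimation sample and evaluated on the testing sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated monthly hedged returns for 5 strategies (hypothetical numbers)
true_alpha = np.array([0.002, 0.003, 0.003, 0.003, 0.005])
true_vol = np.array([0.03, 0.03, 0.02, 0.02, 0.04])
T = 600
returns = true_alpha + true_vol * rng.standard_normal((T, 5))

# Split the sample: first half for estimation, second half for testing
est_sample, test_sample = returns[: T // 2], returns[T // 2 :]

# Estimate moments using ONLY the estimation sample
alpha_hat = est_sample.mean(axis=0)
var_hat = est_sample.var(axis=0, ddof=1)

weights = {
    "mean-variance": alpha_hat / var_hat,
    "1/N": np.sign(alpha_hat) / len(alpha_hat),
}

def sharpe(w, r):
    """Annualized Sharpe ratio of portfolio w on monthly returns r."""
    rp = r @ w
    return rp.mean() / rp.std(ddof=1) * np.sqrt(12)

for name, w in weights.items():
    print(f"{name:>14}: in-sample {sharpe(w, est_sample):.2f}, "
          f"out-of-sample {sharpe(w, test_sample):.2f}")
```

The Sharpe ratio does not depend on the overall scale of the weights, so no volatility targeting is needed for this comparison.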


1. We will target a total portfolio volatility of 10%
2. We will get the factors and focus on the sample in which all the factors are available
3. We will estimate our single-factor models for these factors and build our factor model matrices
4. We will impose that the residuals are uncorrelated
   1. This is never exactly true in a particular sample
   2. If you have a good enough factor model, you impose it because whatever correlation is in the sample is noise (it is testable!)
   3. Here this is a toy exercise and these correlations are likely real (for example, MOM and HML have been shown to be correlated in many different situations)
5. We will apply the different rules

In [None]:
df_ff6=get_factors('ff6',freq='monthly').dropna()
df_ff6.tail()

Date,RF,Mkt-RF,SMB,HML,RMW,CMA,MOM
2024-08-01,0.0048,0.0161,-0.0355,-0.0113,0.0085,0.0086,0.0479
2024-09-01,0.004,0.0174,-0.0017,-0.0259,0.0004,-0.0026,-0.006
2024-10-01,0.0039,-0.0097,-0.0101,0.0089,-0.0138,0.0103,0.0287
2024-11-01,0.004,0.0651,0.0463,-0.0005,-0.0262,-0.0217,0.009
2024-12-01,0.0037,-0.0317,-0.0273,-0.0295,0.0182,-0.011,0.0005


In [None]:
import statsmodels.api as sm

# Define the factors and the market factor
factors = ['SMB', 'HML', 'RMW', 'CMA', 'MOM']
market_factor = 'Mkt-RF'

# Initialize lists to store the results
Alpha = []
Beta = []
residuals = []
Alpha_se = []
# Run univariate regressions
for factor in factors:
    X = sm.add_constant(df_ff6[market_factor])
    y = df_ff6[factor]
    model = sm.OLS(y, X).fit()
    Alpha.append(model.params['const'])
    Beta.append(model.params[market_factor])
    residuals.append(model.resid)
    Alpha_se.append(model.bse['const'])  # standard error of the alpha estimate

# Convert Alpha and Beta to numpy arrays
Alpha = np.array(Alpha)
Beta = np.array(Beta)

# Calculate the variance-covariance matrix of the residuals
# under the assumption that the residuals are uncorrelated (they are not!)

residuals_matrix = np.vstack(residuals).T
Sigma_e = np.diag(np.diag(np.cov(residuals_matrix.T)))

# Display the results
print("Alpha:", Alpha*12)
print("Beta:", Beta)
print("Sigma_e:", Sigma_e*12)



Alpha: [0.00570763 0.04279558 0.04072549 0.04250479 0.08432384]
Beta: [ 0.20237897 -0.13569538 -0.09332071 -0.16729257 -0.15829589]
Sigma_e: [[0.01021215 0.         0.         0.         0.        ]
 [0.         0.01031929 0.         0.         0.        ]
 [0.         0.         0.00568454 0.         0.        ]
 [0.         0.         0.         0.00447722 0.        ]
 [0.         0.         0.         0.         0.02048366]]


Mean Variance


In [None]:
# The zeros below need to be replaced by the actual values!

VolTarget=0.3/12**0.5 # making it monthly as the data
W=Alpha@np.linalg.inv(Sigma_e)
print(W)
# optimal RELATIVE weights, need to calibrate the volatility
# compute the variance of the W portfolio
VarW=0
print(VarW)
#adjusting the weights to meet the volatility target
w= 0
print(w)
# Ww is our final weights with the volatility target
Ww=w*W
# final weights
print(Ww)

# check vol (must be 30%)

vol=0

# market exposure

Portfolio_beta=0

# amount to buy in the market to hedge it completely
h=-Portfolio_beta

AppraisalRatio=0

print(f"your volatility is {vol}")
print(f"Your Appraisal Ratio is {AppraisalRatio}")
print(f"So your optimal portfolio with market neutral exposure is \n {Ww[0]} in SMB, {Ww[1]} in HML, {Ww[2]} in RMW, {Ww[3]} in CMA, {Ww[4]} in MOM, and {h} in Mkt-RF")



[0.55890596 4.14714181 7.16424937 9.49356702 4.1166404 ]
0
0
[0. 0. 0. 0. 0.]
your volatility is 0
Your Appraisal Ratio is 0
So your optimal portfolio with market neutral exposure is 
 0.0 in SMB, 0.0 in HML, 0.0 in RMW, 0.0 in CMA, 0.0 in MOM, and 0 in Mkt-RF


In [None]:
# let's wrap it up in a function

def sizing(W,Alpha,Sigma_e,Beta,VolTarget=0.3/12**0.5):
    # need to replace the zeros below with the actual values!
    VarW=0
    #adjusting the weights to meet the volatility target
    w= 0
    Ww=w*W
    vol=0
    print(f"your volatility is {vol}")
    Portfolio_beta=0
    h=-Portfolio_beta
    AppraisalRatio=0
    print(f"So your optimal portfolio with market neutral exposure is \n {Ww[0]} in SMB, {Ww[1]} in HML, {Ww[2]} in RMW, {Ww[3]} in CMA, {Ww[4]} in MOM, and {h} in Mkt-RF")
    print(f"Your Appraisal Ratio is {AppraisalRatio}")
    return AppraisalRatio, Ww

W=Alpha@np.linalg.inv(Sigma_e)
sr_alpha,W_alpha=sizing(W,Alpha,Sigma_e,Beta)

your volatility is 0
So your optimal portfolio with market neutral exposure is 
 0.0 in SMB, 0.0 in HML, 0.0 in RMW, 0.0 in CMA, 0.0 in MOM, and 0 in Mkt-RF
Your Appraisal Ratio is 0


1/N rule

In [None]:
W=np.ones(5)/5
sr_alpha,W_alpha=sizing(W,Alpha,Sigma_e,Beta)

your volatility is 0
So your optimal portfolio with market neutral exposure is 
 0.0 in SMB, 0.0 in HML, 0.0 in RMW, 0.0 in CMA, 0.0 in MOM, and 0 in Mkt-RF
Your Appraisal Ratio is 0


Proportional rule

In [None]:
W=Alpha
sr_alpha,W_alpha=sizing(W,Alpha,Sigma_e,Beta)

your volatility is 0
So your optimal portfolio with market neutral exposure is 
 0.0 in SMB, 0.0 in HML, 0.0 in RMW, 0.0 in CMA, 0.0 in MOM, and 0 in Mkt-RF
Your Appraisal Ratio is 0


Risk Parity rule

In [None]:
W=np.ones(5)@np.linalg.inv(Sigma_e**0.5)
sr_alpha,W_alpha=sizing(W,Alpha,Sigma_e,Beta)

your volatility is 0
So your optimal portfolio with market neutral exposure is 
 0.0 in SMB, 0.0 in HML, 0.0 in RMW, 0.0 in CMA, 0.0 in MOM, and 0 in Mkt-RF
Your Appraisal Ratio is 0


Minimum Variance rule

In [None]:
W=np.ones(5)@np.linalg.inv(Sigma_e)
sr_alpha,W_alpha=sizing(W,Alpha,Sigma_e,Beta)

your volatility is 0
So your optimal portfolio with market neutral exposure is 
 0.0 in SMB, 0.0 in HML, 0.0 in RMW, 0.0 in CMA, 0.0 in MOM, and 0 in Mkt-RF
Your Appraisal Ratio is 0


Variance Shrinkage rule


- I will shrink the variances towards the average variance
- I will shrink by 50%
- The idea makes sense when you think the different assets are kind of similar, so a good chunk of the sample variation is noise

In [None]:
tau=0.5
sigma_alpha=np.mean(Alpha_se)
Sigma=Sigma_e*(1-tau)+tau*np.eye(5)*np.mean(np.diag(Sigma_e))
W=Alpha@np.linalg.inv(Sigma)
sr_alpha,W_alpha=sizing(W,Alpha,Sigma_e,Beta)

your volatility is 0
So your optimal portfolio with market neutral exposure is 
 0.0 in SMB, 0.0 in HML, 0.0 in RMW, 0.0 in CMA, 0.0 in MOM, and 0 in Mkt-RF
Your Appraisal Ratio is 0




## Portfolios and factor models


Say you have

- a vector of asset excess returns R (N by 1),
- weights W (N by 1),
- factor loadings B (N by 1),
- alphas A (N by 1),
- residuals U (N by 1),
- a scalar factor excess return f,

then I can write my portfolio return as

$$r_p=W.T @R=W.T @(A+Bf+U)$$

Because $E[U]=0$, it follows that

$$E[r_p]=W.T @(A+BE[f])$$

and, because the residuals $U$ are uncorrelated with the factor $f$,

$$Var[r_p]=Var(W.T @(A+Bf+U))=(W.T@B)@Var(f)@(W.T@B).T+W.T@Var(U)@W$$

Simplifying a little bit ($f$ is a scalar, so $Var(f)$ is just a number), we get

$$Var[r_p]=Var(W.T @(A+Bf+U))=(W.T@B)@(B.T@W)*Var(f)+W.T@Var(U)@W$$
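
A quick numerical check of this expression, as a sketch with made-up inputs (three assets, hypothetical weights, loadings and variances; the variable names are illustrative). It computes the variance with the formula above and again from the full covariance matrix of R implied by the factor model, `B@B.T*Var(f)+Var(U)`, and the two should agree.

```python
import numpy as np

# Hypothetical inputs (N = 3 assets), for illustration only
W = np.array([[0.5], [0.3], [0.2]])      # weights, N by 1
B = np.array([[1.2], [0.8], [-0.3]])     # factor loadings, N by 1
var_f = 0.04                             # Var(f), a scalar
var_U = np.diag([0.02, 0.01, 0.03])      # Var(U), diagonal N by N

# Formula from the text: (W.T@B)@(B.T@W)*Var(f) + W.T@Var(U)@W
var_formula = (W.T @ B) @ (B.T @ W) * var_f + W.T @ var_U @ W

# Same number from the full covariance matrix of R implied by the model
cov_R = B @ B.T * var_f + var_U
var_direct = W.T @ cov_R @ W

print(var_formula.item(), var_direct.item())
```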

- This is the total portfolio variance. What is the portfolio factor risk?




## Risk Profile Analysis


- What is each asset contributing to my overall portfolio risk?

- Which asset contributes the most at the margin?

- If I want to reduce my factor risk, which asset would reduce it most efficiently?

That is, if I were to increase/decrease the position in an asset by a little bit, how much would my risk change?

$$2B@(B.T@W)Var(f)+2Var(U)@W$$

This is a vector where each entry tells you the increase in risk (portfolio variance) produced by a marginal increase in each asset.

So the highest number in this vector tells us which asset allows us to achieve the highest risk reduction per change in position.

> Question: How do you find the asset that would lead to highest reduction in factor risk?


>If you want to understand where the formula comes from, you need to know a bit of calculus: to answer these questions we differentiate the portfolio variance with respect to the weights W.
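
If calculus is not your thing, the formula can also be checked numerically. The sketch below uses made-up inputs (three assets, hypothetical names and numbers): it bumps one weight at a time by a tiny amount, measures how much the portfolio variance changes, and compares the result with the analytical vector `2*B@(B.T@W)*Var(f)+2*Var(U)@W`.

```python
import numpy as np

# Hypothetical inputs (N = 3 assets), for illustration only
W = np.array([[0.5], [0.3], [0.2]])
B = np.array([[1.2], [0.8], [-0.3]])
var_f = 0.04
var_U = np.diag([0.02, 0.01, 0.03])

def portfolio_variance(W):
    """Total portfolio variance under the single-factor model."""
    return ((W.T @ B) @ (B.T @ W) * var_f + W.T @ var_U @ W).item()

# Analytical marginal risk from the text
grad = 2 * B @ (B.T @ W) * var_f + 2 * var_U @ W

# Numerical version: central difference, one weight at a time
eps = 1e-6
grad_fd = np.zeros_like(W)
for i in range(len(W)):
    bump = np.zeros_like(W)
    bump[i] = eps
    grad_fd[i] = (portfolio_variance(W + bump)
                  - portfolio_variance(W - bump)) / (2 * eps)

# Two columns: analytical formula next to the finite-difference estimate
print(np.hstack([grad, grad_fd]))
```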



Let's apply this to our problem.

I will estimate a single-factor model here so we can use realistic numbers, but we will discuss this step next class.

For now just think of this as a black box that is giving you numbers for B, Var(f), Var(U).

In [None]:
import statsmodels.api as sm

def factor_model_estimation(df):
  """
  Runs a regression of each asset in df on the market column, storing the slope coefficients,
  the variance of the market, and the variance of the residuals in a diagonal matrix.

  Args:
    df: A pandas DataFrame containing asset returns, with a 'MKT' column for the market returns.

  Returns:
    A tuple containing:
      - betas: A pandas DataFrame with slope coefficients for each asset.
      - market_variance: The variance of the market returns.
      - residual_variance_matrix: A diagonal matrix containing the variance of the residuals for each asset.
  """

  betas = pd.DataFrame()
  residual_variances = []

  for asset in df.columns:

    # Run the regression
    X = df['MKT']
    Y = df[asset]
    X = sm.add_constant(X)
    model = sm.OLS(Y, X).fit()

    # Store the beta (slope coefficient)
    betas.loc[asset, 'beta'] = model.params['MKT']

    # Store the residual variance
    residual_variances.append(model.resid.var())

  market_variance = df['MKT'].var()
  residual_variance_matrix = np.diag(residual_variances)
  # Column vector (N by 1); reshape(-1, 1) avoids hard-coding the number of assets
  betas = betas.to_numpy().reshape(-1, 1)

  return betas, market_variance, residual_variance_matrix


In [None]:


B, Varf, VarU=factor_model_estimation(Data)
print(B)
print(Varf)
print(VarU)

[[1.        ]
 [0.05671103]
 [0.66599021]
 [0.64903334]
 [0.09602475]]
0.0019496947245542036
[[9.30297741e-35 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00]
 [0.00000000e+00 1.22233384e-03 0.00000000e+00 0.00000000e+00
  0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 2.68475481e-03 0.00000000e+00
  0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 1.36361424e-03
  0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  3.88658814e-04]]


Now let's implement this formula

$$2B@(B.T@W)Var(f)+2Var(U)@W$$




In [None]:
W=np.ones((5,1))/5

In [None]:
2*B @ (B.T @ W) * Varf + 2 * VarU @ W

array([[0.00192455],
       [0.00059808],
       [0.00235563],
       [0.00179454],
       [0.00034027]])