# 8. Beta

## 8.1 Estimating Beta

### 8.1.1 Description

***CAPM regression/one-factor market model regression***:
$$
r_{i,t} = \alpha _{i}+\beta _{i}MKT_{t}+\varepsilon _{i,t}
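$$
where $r_{i,t}$ is the excess return of stock $i$ in period $t$ and $MKT_{t}$ is the excess return of the market portfolio.
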
| Data Frequencies | Estimation Period          | Minimum Valid Data points           |
| ---------------- | -------------------------- | ----------------------------------- |
| Daily Return     | 1, 3, 6, **12**, 24 months | 15, 50, 100, **200**, 450 daily obs |
| Monthly Return   | 1, 2, 3, **5** years       | 10, 20, 24, **24** monthly obs      |

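*Most common choices: 12 months of daily returns with at least 200 observations, and 5 years of monthly returns with at least 24 observations.*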
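
As a minimal sketch of the regression above, the snippet below estimates $\alpha_i$ and $\beta_i$ by OLS on simulated monthly data and enforces a minimum-observation rule. The function name `market_model_beta`, the simulated series, and the "true" beta of 1.2 are illustrative assumptions, not part of the data set used later in this section.

```python
import numpy as np

def market_model_beta(stock_excess, market_excess, min_obs):
    """OLS estimates (alpha, beta) of r_i = alpha + beta * MKT + eps.

    Returns (nan, nan) when fewer than `min_obs` valid pairs are available.
    """
    y = np.asarray(stock_excess, dtype=float)
    x = np.asarray(market_excess, dtype=float)
    valid = ~(np.isnan(x) | np.isnan(y))      # keep rows where both series are observed
    if valid.sum() < min_obs:
        return np.nan, np.nan
    X = np.column_stack([np.ones(valid.sum()), x[valid]])  # intercept + market
    coef, *_ = np.linalg.lstsq(X, y[valid], rcond=None)
    return coef[0], coef[1]

# Hypothetical data: 60 months (5 years) with an assumed true beta of 1.2
rng = np.random.default_rng(0)
mkt = rng.normal(0.005, 0.04, size=60)
ret = 0.001 + 1.2 * mkt + rng.normal(0.0, 0.02, size=60)

alpha_hat, beta_hat = market_model_beta(ret, mkt, min_obs=24)
print(alpha_hat, beta_hat)
```

With a single regressor, the slope equals the sample covariance of the two series divided by the sample variance of the market, which is a convenient sanity check on any implementation.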

***Scholes and Williams (1977): accounts for nonsynchronous trading***:
$$
r_{i,t} = a_{i}+b_{i}^{-}MKT_{t-1}+e_{i,t}^{-} 
$$
$$
r_{i,t} = a_{i}+b_{i}MKT_{t}+e_{i,t}
$$
$$
r_{i,t} = a_{i}+b_{i}^{+}MKT_{t+1}+e_{i,t}^{+} 
$$
and define
$$
\beta _{i}^{SW}=\frac{\hat b_{i}^{-}+\hat b_{i}+\hat b_{i}^{+}}{1+2\rho}
$$
where $\rho$ is the first-order serial correlation of $MKT$.

| Data Frequencies | Estimation Period          | Minimum Valid Data points           |
| ---------------- | -------------------------- | ----------------------------------- |
| Daily Return     | 12 months, [t-11, t]       | 200 daily obs                       |
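
The three regressions and the $\beta^{SW}$ formula can be sketched as follows. This is an illustration on simulated daily data, not the notebook's implementation: the helper names `ols_slope` and `scholes_williams_beta` are hypothetical, and $\rho$ is assumed to be estimated over the same window as the regressions.

```python
import numpy as np

def ols_slope(y, x):
    """Slope from a univariate OLS regression of y on x with an intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def scholes_williams_beta(ret, mkt):
    """(b_lag + b_now + b_lead) / (1 + 2 * rho) for aligned 1-D return series."""
    ret = np.asarray(ret, dtype=float)
    mkt = np.asarray(mkt, dtype=float)
    b_lag = ols_slope(ret[1:], mkt[:-1])    # r_t on MKT_{t-1}
    b_now = ols_slope(ret, mkt)             # r_t on MKT_t
    b_lead = ols_slope(ret[:-1], mkt[1:])   # r_t on MKT_{t+1}
    rho = np.corrcoef(mkt[1:], mkt[:-1])[0, 1]  # first-order autocorrelation of MKT
    return (b_lag + b_now + b_lead) / (1.0 + 2.0 * rho)

# Hypothetical data: 250 trading days with an assumed true beta of 0.9
rng = np.random.default_rng(1)
mkt = rng.normal(0.0, 0.01, size=250)
ret = 0.9 * mkt + rng.normal(0.0, 0.005, size=250)

beta_sw = scholes_williams_beta(ret, mkt)
print(beta_sw)
```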

***Dimson (1979): adjusts for infrequent trading***
$$
r_{i,t} = \alpha _{i}+\sum _{k=-5}^{k=5} b_{i}^{k}MKT_{t+k}+\varepsilon _{i,t}
$$
$$
\beta _{i}^{D}=\sum _{k=-5}^{k=5} \hat b_{i}^{k}
$$
| Data Frequencies | Estimation Period          | Minimum Valid Data points           |
| ---------------- | -------------------------- | ----------------------------------- |
| Daily Return     | 12 months, [t-11, t]       | 200 daily obs                       |
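
A minimal sketch of the Dimson estimator: one multiple regression on the market return at leads and lags $k=-5,\dots,5$, followed by summing the eleven slopes. The function name `dimson_beta` and the simulated stock, whose price reacts to the market partly with a one-day delay, are assumptions made for illustration.

```python
import numpy as np

def dimson_beta(ret, mkt, n_lags=5):
    """Sum of slopes from regressing r_t on MKT_{t+k}, k = -n_lags..n_lags."""
    ret = np.asarray(ret, dtype=float)
    mkt = np.asarray(mkt, dtype=float)
    T = len(ret)
    rows = np.arange(n_lags, T - n_lags)  # dates with every lead and lag available
    cols = [mkt[rows + k] for k in range(-n_lags, n_lags + 1)]
    X = np.column_stack([np.ones(len(rows))] + cols)
    coef, *_ = np.linalg.lstsq(X, ret[rows], rcond=None)
    return coef[1:].sum()                 # drop the intercept, sum the slopes

# Hypothetical thinly traded stock: total sensitivity 0.6 + 0.3 = 0.9,
# with part of the response arriving one day late
rng = np.random.default_rng(2)
mkt = rng.normal(0.0, 0.01, size=300)
lagged_mkt = np.concatenate(([0.0], mkt[:-1]))
ret = 0.6 * mkt + 0.3 * lagged_mkt + rng.normal(0.0, 0.005, size=300)

beta_d = dimson_beta(ret, mkt)
print(beta_d)
```

In this setup a contemporaneous-only regression would mainly pick up the 0.6 component, while the summed slopes also capture the delayed response.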

### 8.1.2 Code

In [1]:
# Imports
import pandas as pd
import numpy as np
import numba  # not called directly; must be installed for engine="numba" in the rolling apply
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (10, 6)  # set default figure size
from pandas.tseries.offsets import MonthEnd
import pyreadstat

In [2]:
# Parameters
file_path = r"E:\Study\Data\CRSP_Stock\Data\Source\CRSP_M.sas7bdat"  # raw string: backslashes are not escapes
start_date = '1963-6-30'
end_date = '2012-12-31'

# Read Data
crsp, meta = pyreadstat.read_sas7bdat(file_path)
crsp.columns = crsp.columns.map(lambda x: x.lower())

# Define the Sample
# crsp_m['date']=pd.to_datetime(crsp_m['date'].astype(str))
crsp['date']=pd.to_datetime(crsp['date'], format='%Y-%m-%d')
crsp['jdate']=crsp['date'] + MonthEnd(0) # Line up date to be end of month
crsp = crsp[(crsp['jdate'] >= start_date) & (crsp['jdate'] <= end_date)].copy() # copy so later column assignments do not act on a slice

# change variable format to int
crsp[['permco','permno']] = crsp[['permco','permno']].astype(int)

# Fama-French Monthly Factor Data
ff_m, meta = pyreadstat.read_sas7bdat(r"E:\Study\Data\CRSP_Stock\Data\Source\FF_Factor_M.sas7bdat")
ff_m.columns = ff_m.columns.map(lambda x: x.lower())
ff_m['dateff']=pd.to_datetime(ff_m['dateff'], format='%Y-%m-%d')
ff_m['jdate'] = ff_m['dateff'] + MonthEnd(0)
crsp = pd.merge(crsp, ff_m[['jdate', 'mktrf', 'rf']], how='left', on=['jdate'])

In [3]:
# Delisting Adjustment
dlstcd_list = [500, 520] + list(range(551,575)) + [580, 584]
dlstcd_in_list = crsp['dlstcd'].map(lambda x: x in dlstcd_list)
dlret_isna = crsp['dlret'].isna()
dlstcd_isna = crsp['dlstcd'].isna()
ret_isna = crsp['ret'].isna()
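# Missing delisting return with a listed delisting code: assume -30%
crsp.loc[(dlstcd_in_list) & (dlret_isna), 'dlret'] = -0.3
# Missing delisting return with any other (non-missing) delisting code: assume -100%
crsp.loc[(~dlstcd_in_list) & (~dlstcd_isna) & (dlret_isna), 'dlret'] = -1

crsp['retadj'] = crsp['ret']
# Recompute the mask here: dlret was filled in above, so dlret_isna is stale
crsp.loc[~crsp['dlret'].isna(), 'retadj'] = crsp.loc[~crsp['dlret'].isna(), 'dlret']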
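
To make the delisting rule easier to check, here is a small self-contained sketch that applies the same logic to a toy DataFrame. The four rows are made-up values chosen to hit each branch; `toy` and the mask names are hypothetical, and the sketch uses `isin`, `notna`, and `where` rather than the exact statements in the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: no delisting, listed code without dlret,
# other code without dlret, listed code with a reported dlret
toy = pd.DataFrame({
    "ret":    [0.02, np.nan, 0.01, -0.05],
    "dlret":  [np.nan, np.nan, np.nan, -0.10],
    "dlstcd": [np.nan, 520.0, 300.0, 574.0],
})

dlstcd_list = [500, 520] + list(range(551, 575)) + [580, 584]
in_list = toy["dlstcd"].isin(dlstcd_list)   # NaN codes are never in the list
has_code = toy["dlstcd"].notna()
no_dlret = toy["dlret"].isna()              # evaluated before any filling

toy.loc[in_list & no_dlret, "dlret"] = -0.3               # listed code, missing dlret
toy.loc[~in_list & has_code & no_dlret, "dlret"] = -1.0   # other code, missing dlret

# Use the delisting return where one exists, otherwise the regular return
toy["retadj"] = toy["dlret"].where(toy["dlret"].notna(), toy["ret"])
print(toy)
```

The resulting `retadj` column is 0.02, -0.3, -1.0, -0.10: only rows with a delisting return (reported or imputed) are overwritten.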

In [4]:
# Year and Month Variables
crsp['year'] = crsp['date'].dt.year
crsp['month'] = crsp['date'].dt.month

# Excess Stock Returns
crsp['ex_retadj'] = crsp['retadj'] - crsp['rf']
crsp['ex_ret'] = crsp['ret'] - crsp['rf']

# Market Values
crsp['me'] = crsp['shrout'] * crsp['altprc'].abs() / 1000 # measured in millions of dollars
# crsp['me'] = crsp['shrout'] * crsp['prc'].abs() / 1000

# Market Factor
crsp['ex_mkt'] = crsp['vwretd'] - crsp['rf']


# U.S.-based common stocks: sharecode ('SHRCD') = 10 or 11
crsp.query('shrcd == 10 or shrcd == 11', inplace=True)

In [5]:
# Beta Function
def calc_beta(df, min_periods):

    # drop all rows with any NaN values
    # df = df[~np.isnan(df).any(axis=1)] # numba not support advanced index for array
    mask = np.isnan(df)
    m,n = mask.shape
    nan_rows = []
    for i in range(m):
        for j in range(n):
            if mask[i, j] == 1:
                nan_rows.append(i)
    all_rows = list(range(m))
    keep_rows = list(set(all_rows).difference(set(nan_rows)))
    df = df[np.array(keep_rows),:]
    
    # rebuild contiguous numpy arrays for a faster regression
    x = np.ascontiguousarray(df[:,0:1]) # first column is the market
    x = np.concatenate((np.ones_like(x), x), axis=1)
    y = np.ascontiguousarray(df[:,1])

    if len(df) >= min_periods:
        b=(np.linalg.pinv(x.T @ x)) @ x.T @ y
        return b[0], b[1]
    else:
        return np.nan, np.nan

In [6]:
# Rolling Estimation
def rolling_beta(df, window, min_obs):

    df = df.set_index('date')
    df.index = pd.DatetimeIndex(df.index)

    capm = pd.DataFrame(df['permno'])
    capm[['alpha', 'beta']] = np.nan

    grp = df.groupby('permno')
    for stock, sub_df in grp:
        sub2_df = sub_df[['mkt_ret', 'ret']].sort_index() 
        result = sub2_df.rolling(window, min_periods=min_obs, method="table").apply(calc_beta, raw=True, engine="numba", args=(min_obs,))
        capm.loc[capm.permno == stock, ['alpha', 'beta']] = result.values

    return capm

In [7]:
crsp_ret = crsp[['permno', 'jdate', 'ex_ret', 'ex_mkt']].rename(columns={'jdate':'date', 'ex_ret':'ret', 'ex_mkt':'mkt_ret'})

In [8]:
windows = ['365d', '730d', '1095d', '1825d']
min_obs = [10, 20, 24, 24]
beta = pd.DataFrame()
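for win, n_min in zip(windows, min_obs): # n_min avoids shadowing the built-in min
    beta_estimation = rolling_beta(crsp_ret, window=win, min_obs=n_min)
    # time-series average of the monthly cross-sectional summary statistics of beta
    print(beta_estimation.groupby('date')['beta'].describe().mean())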

count    4451.003361
mean        1.126461
std         1.351209
min        -9.633205
25%         0.362513
50%         1.001369
75%         1.777139
max        16.612498
dtype: float64
count    4078.082353
mean        1.129708
std         0.945934
min        -4.823656
25%         0.535169
50%         1.025617
75%         1.620098
max         9.329323
dtype: float64
count    3956.416807
mean        1.129041
std         0.817727
min        -3.489991
25%         0.596798
50%         1.035354
75%         1.564901
max         7.246288
dtype: float64
count    3992.662185
mean        1.128494
std         0.737462
min        -3.353787
25%         0.641002
50%         1.043523
75%         1.526090
max         6.363536
dtype: float64
