# Tutorial 9: Data Analysis in-class practices part II
### 9.1 CAPM Model exercises
Q1: Download the `stock_px_2.csv` and `risk_premium.csv` files from Moodle and load them into pandas DataFrames named `stock` and `risk_premium`, respectively. The first column, which contains the dates, should be set as a `DatetimeIndex`.

The `stock_px_2.csv` file contains the daily stock prices of Apple (column `AAPL`), Microsoft (column `MSFT`) and ExxonMobil (column `XOM`) from 2003 to 2011. \
The `risk_premium.csv` file contains the daily CAPM market risk premium and risk-free rate from 2003 to 2011.

In [1]:
import pandas as pd
import statsmodels.api as sm

In [2]:
stock = pd.read_csv(
    "../data/stock_px_2.csv",
    index_col=0, #takes the first column as index
    parse_dates=True #automatically convert date strings into datetime objects
)
risk_premium = pd.read_csv(
    "../data/risk_premium.csv",
    index_col=0, 
    parse_dates=True
)
print(stock.head())
print(risk_premium.head())

            AAPL   MSFT    XOM     SPX
2003-01-02  7.40  21.11  29.22  909.03
2003-01-03  7.45  21.14  29.24  908.59
2003-01-06  7.45  21.52  29.96  929.01
2003-01-07  7.43  21.93  28.95  922.93
2003-01-08  7.28  21.31  28.83  909.93
            MKT-RF     RF
2003-01-02    3.14  0.005
2003-01-03   -0.11  0.005
2003-01-06    2.13  0.005
2003-01-07   -0.63  0.005
2003-01-08   -1.34  0.005
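
A quick way to confirm that `index_col=0` together with `parse_dates=True` produced a `DatetimeIndex` is to inspect the index type. The sketch below uses a small in-memory CSV in place of the Moodle file; the name `demo` and the inline text are illustrative assumptions, with rows copied from the first lines of the output above.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for stock_px_2.csv (first column holds the dates)
csv_text = """,AAPL,MSFT
2003-01-02,7.40,21.11
2003-01-03,7.45,21.14
2003-01-06,7.45,21.52
"""

demo = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# True when the date strings were parsed into timestamps
is_datetime = isinstance(demo.index, pd.DatetimeIndex)
print(is_datetime)
print(demo.index.dtype)
```

The same `isinstance` check can be run on `stock.index` and `risk_premium.index` after loading the real files.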


Q2: Merge the `AAPL`, `MSFT` and `XOM` columns from the `stock` DataFrame with all columns from the `risk_premium` DataFrame into a new DataFrame named `data2`. \
The `AAPL`, `MSFT` and `XOM` columns hold daily stock prices, the `MKT-RF` column holds the daily market risk premium in percentage points, and the `RF` column holds the daily risk-free return in percentage points.

In [3]:
stock_list = ['AAPL', 'MSFT', 'XOM']
data2 = pd.merge(
    left = stock[stock_list], 
    right = risk_premium, 
    left_index=True, 
    right_index=True
)

In [4]:
print(data2.head(10))

            AAPL   MSFT    XOM  MKT-RF     RF
2003-01-02  7.40  21.11  29.22    3.14  0.005
2003-01-03  7.45  21.14  29.24   -0.11  0.005
2003-01-06  7.45  21.52  29.96    2.13  0.005
2003-01-07  7.43  21.93  28.95   -0.63  0.005
2003-01-08  7.28  21.31  28.83   -1.34  0.005
2003-01-09  7.34  21.93  29.44    1.89  0.005
2003-01-10  7.36  21.97  29.03    0.04  0.005
2003-01-13  7.32  22.16  28.91   -0.12  0.005
2003-01-14  7.30  22.39  29.17    0.55  0.005
2003-01-15  7.22  22.11  28.77   -1.32  0.005
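
Note that `pd.merge` defaults to `how="inner"`, so with `left_index=True` and `right_index=True` only the dates present in both DataFrames are kept. A minimal sketch on made-up data (the frames `prices` and `factors` are hypothetical and deliberately cover different dates):

```python
import pandas as pd

# Two toy frames whose date ranges only partly overlap
prices = pd.DataFrame(
    {"AAPL": [7.40, 7.45, 7.45]},
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)
factors = pd.DataFrame(
    {"MKT-RF": [-0.11, 2.13, -0.63]},
    index=pd.to_datetime(["2003-01-03", "2003-01-06", "2003-01-07"]),
)

# Index-on-index merge; the default inner join drops unmatched dates
merged = pd.merge(prices, factors, left_index=True, right_index=True)
print(merged)
```

Comparing `len(data2)` with `len(stock)` and `len(risk_premium)` is a simple way to see whether any dates were dropped in the real merge.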


Q3: Create a `data3` DataFrame in which the `AAPL`, `MSFT` and `XOM` columns contain each stock's daily excess return over the risk-free rate, and the `MKT-RF` and `RF` columns are copied from `data2` and divided by 100 to convert percentage points into decimal fractions. Remove observations with missing values.

In [5]:
# generate a new DataFrame for daily return of each stock
data3 = data2[stock_list].pct_change()
# divide the MKT-RF and RF values by 100 in the new DataFrame
data3[['MKT-RF', 'RF']] = (
    data2[['MKT-RF', 'RF']] 
    / 100
)
# calculate excess return for each stock
data3[stock_list] = (
    data3[stock_list].sub(
        data3["RF"], 
        axis=0 #align RF on the row index and subtract it from every stock column
    )
)
# drop rows with missing values
data3 = data3.dropna()

In [6]:
data3

                 AAPL       MSFT        XOM   MKT-RF       RF
2003-01-03   0.006707   0.001371   0.000634  -0.0011  0.00005
2003-01-06  -0.000050   0.017925   0.024574   0.0213  0.00005
2003-01-07  -0.002735   0.019002  -0.033762  -0.0063  0.00005
2003-01-08  -0.020238  -0.028322  -0.004195  -0.0134  0.00005
2003-01-09   0.008192   0.029044   0.021109   0.0189  0.00005
...               ...        ...        ...      ...      ...
2011-10-10   0.051406   0.026286   0.036977   0.0344  0.00000
2011-10-11   0.029526   0.002227  -0.000131   0.0015  0.00000
2011-10-12   0.004747  -0.001481   0.011669   0.0107  0.00000
2011-10-13   0.015515   0.008160  -0.010238  -0.0021  0.00000
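
To make the unit handling concrete: `pct_change()` returns decimal fractions, whereas `RF` is quoted in percentage points, so `RF` has to be divided by 100 before it is subtracted. The sketch below reuses the first three `AAPL` prices and `RF` values shown earlier; the names `toy`, `ret`, `rf` and `excess` are illustrative only.

```python
import pandas as pd

# First three AAPL prices and RF values from the data2 output
toy = pd.DataFrame(
    {"AAPL": [7.40, 7.45, 7.45], "RF": [0.005, 0.005, 0.005]},
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)

ret = toy["AAPL"].pct_change()   # simple daily return as a fraction; first row is NaN
rf = toy["RF"] / 100             # percentage points -> decimal fraction
excess = (ret - rf).dropna()     # excess return over the risk-free rate

print(excess.round(6))
```

The two resulting values agree with the `AAPL` entries for 2003-01-03 and 2003-01-06 in the `data3` output.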


Q4: Write a function to compute the CAPM alpha and CAPM market beta for a specified stock using ordinary least squares (OLS) regression. \
The function should include the following arguments:
 1. `data`: Input DataFrame
 2. `yvar`: Column name from the input DataFrame, representing excess returns over risk free rate for the specified stock.
 3. `xvar`: Column name from the input DataFrame, representing the market risk premium.
 
The output should be a `pd.Series` object containing the CAPM alpha and CAPM market beta estimates for the specified stock. \
Hint: Regress the daily excess return of each stock on the market risk premium to obtain the CAPM parameters.
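
In equation form, the regression described in the hint can be written as below. The symbols are notation introduced here for illustration: $r_{i,t}$ is the daily return of stock $i$, $r_{f,t}$ is the daily risk-free rate (`RF`), and $r_{m,t} - r_{f,t}$ is the market risk premium (`MKT-RF`).

```latex
r_{i,t} - r_{f,t} = \alpha_i + \beta_i \,\bigl(r_{m,t} - r_{f,t}\bigr) + \varepsilon_{i,t}
```

The estimated intercept $\alpha_i$ is the CAPM alpha and the estimated slope $\beta_i$ is the CAPM market beta.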

In [7]:
def regress(data, yvar, xvar):
    model = sm.OLS(
        data[yvar], #dependent variable
        sm.add_constant(data[xvar]), #independent variable
    )
    results = model.fit()
    out = results.params #Series containing alpha and beta parameters
    out.index = ["CAPM_alpha", "CAPM_beta"] #rename index to CAPM_alpha and CAPM_beta
    return out
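
For a single regressor with an intercept, the OLS estimates have a closed form: the slope is the sample covariance of `x` and `y` divided by the sample variance of `x`, and the intercept is `mean(y) - slope * mean(x)`. The sketch below applies this to synthetic data as an independent check on what a function like `regress` should return. The helper name `capm_closed_form`, the seed, and the simulated alpha and beta are all assumptions, not part of the exercise data.

```python
import numpy as np
import pandas as pd

def capm_closed_form(data, yvar, xvar):
    """Closed-form OLS intercept and slope for one regressor (illustrative helper)."""
    x = data[xvar].to_numpy()
    y = data[yvar].to_numpy()
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope = cov(x, y) / var(x)
    alpha = y.mean() - beta * x.mean()                      # intercept
    return pd.Series({"CAPM_alpha": alpha, "CAPM_beta": beta})

# Synthetic excess returns generated with a known alpha (0.0002) and beta (1.2)
rng = np.random.default_rng(0)
mkt = rng.normal(0.0, 0.01, size=500)
noise = rng.normal(0.0, 0.001, size=500)
sim = pd.DataFrame({"MKT-RF": mkt, "STOCK": 0.0002 + 1.2 * mkt + noise})

est = capm_closed_form(sim, "STOCK", "MKT-RF")
print(est)
```

The estimates should land close to the simulated alpha and beta, and calling `regress(sim, "STOCK", "MKT-RF")` on the same frame is expected to give matching numbers up to floating-point error.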

Q5: Use the function in Q4 to evaluate CAPM alpha and CAPM market beta for Apple (`AAPL`), Microsoft (`MSFT`) and ExxonMobil (`XOM`). Output should be presented as a single DataFrame.

In [8]:
CAPM_output = [
    regress(data3, ticker, "MKT-RF") 
    for ticker in stock_list
] #store the regression parameters of each stock in a list
CAPM_df = pd.DataFrame(
    CAPM_output, 
    index=stock_list
)
print(CAPM_df)

      CAPM_alpha  CAPM_beta
AAPL    0.001781   1.036262
MSFT   -0.000047   0.933148
XOM     0.000264   0.933813


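The following function returns the full fitted results object rather than only the parameters, so that the coefficients, R-squared, and the complete regression table (via `results.summary()`) are available for each stock: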

In [9]:
def regress_v2(data, yvar, xvar):
    model = sm.OLS(
        data[yvar], #dependent variable
        sm.add_constant(data[xvar]), #independent variable
    )
    results = model.fit()
    return results

In [15]:
print(regress_v2(data3, "AAPL", "MKT-RF").params)

const     0.001781
MKT-RF    1.036262
dtype: float64


In [12]:
#R-squared of the AAPL CAPM regression
regress_v2(data3, "AAPL", "MKT-RF").rsquared

np.float64(0.32329646214177665)

In [13]:
#R-squared of the MSFT CAPM regression
regress_v2(data3, "MSFT", "MKT-RF").rsquared

np.float64(0.5005079089578209)

In [14]:
#R-squared of the XOM CAPM regression
regress_v2(data3, "XOM", "MKT-RF").rsquared

np.float64(0.5635562501452642)
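
As a cross-check on R-squared values like the three above: in an OLS regression with an intercept and a single regressor, R-squared equals the squared Pearson correlation between the dependent and the independent variable. The sketch below illustrates this on synthetic data, computing all R-squared values in one call with `DataFrame.corrwith`. The frame `sim`, the column names `A` and `B`, the helper `r2_from_fit`, and the seed are made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic market premium and two hypothetical stocks with different noise levels
rng = np.random.default_rng(1)
mkt = rng.normal(0.0, 0.01, size=500)
sim = pd.DataFrame({
    "MKT-RF": mkt,
    "A": 1.0 * mkt + rng.normal(0.0, 0.01, size=500),
    "B": 0.9 * mkt + rng.normal(0.0, 0.02, size=500),
})

# R-squared of each single-regressor fit as the squared correlation with MKT-RF
r2 = sim[["A", "B"]].corrwith(sim["MKT-RF"]) ** 2
print(r2)

def r2_from_fit(x, y):
    """R-squared from its definition, 1 - SSR/SST, using a least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return 1 - resid.var() / y.var()

print(r2_from_fit(sim["MKT-RF"].to_numpy(), sim["A"].to_numpy()))
```

Applied to the tutorial data, `data3[stock_list].corrwith(data3["MKT-RF"]) ** 2` should therefore agree with the `rsquared` values reported by `regress_v2`.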