### Assignment 3
###### Submission by Mugunth Dhinesh Kumar and Rahul Krishnani

In [1]:
# Place all your imports here
import yfinance as yf
import pandas_datareader.data as web
import numpy as np
import statsmodels.api as sm
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

This exercise involves estimating CAPM on: Gold, Exxon, General Electric, IBM, Microsoft, and Walmart. After downloading the price data (for these assets and also for the S&P 500 index and the 4-week treasury bill), you'll conduct the following steps:
- Calculate the monthly returns for each.
- Letting $r_{it}-r_{ft}$ represent the excess return on asset $i$ and $r_{mt}-r_{ft}$ represent the excess return on the market (proxied by the S&P 500), estimate: $r_{it}-r_{ft} = \alpha + \beta(r_{mt}-r_{ft}) + u_t$.
- For each asset test the restrictions $\alpha = 0$ and $\beta = 0$ both individually and jointly.

In [2]:
# This step takes care of downloading the prices
# Adjusted stock prices from Yahoo! Finance
data = yf.download('^GSPC XOM GE IBM MSFT WMT GC=F', start='2005-01-01', end='2020-01-31', interval='1mo')['Adj Close']
# 4-week Treasury Bill rate from FRED
tbill = web.DataReader(['TB4WK'], 'fred', start='2005-01-01', end='2019-12-31')

[*********************100%%**********************]  7 of 7 completed


Create a new dataframe called *ret* that contains the continuously compounded monthly return for each asset in *data*. This should only take one line.

In [3]:
# Your code here
ret=np.log(data).diff()

If you take a look at *ret*, you'll notice that the first row is NaN and the first return is for Feb 2005. This is expected since the first return will be based on the prices in the first two rows. However, since the prices are for the beginning of each month, it means that the first return should really be for Jan 2005. Fix this. After you're done, what was previously the return for Feb 2005 should be now assigned to Jan 2005 and so on for all the remaining months.
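
The log-return and relabelling steps described above can be sketched on a toy price series. The prices and the names `prices`, `toy_ret`, and `toy_ret_aligned` are made up for illustration and are not part of the downloaded data.

```python
import numpy as np
import pandas as pd

# Toy start-of-month prices (made-up numbers, not the downloaded data)
idx = pd.date_range("2005-01-01", periods=4, freq="MS")
prices = pd.DataFrame({"TOY": [100.0, 110.0, 99.0, 108.9]}, index=idx)

# Continuously compounded return: log(P_t) - log(P_{t-1}); the first row is NaN
toy_ret = np.log(prices).diff()

# Move every return up one row so the Jan -> Feb return is labelled Jan;
# the NaN now sits in the last row instead of the first
toy_ret_aligned = toy_ret.shift(-1)

print(toy_ret_aligned)
```

After the shift, the missing value moves from the top of the frame to the bottom, which is why the regressions later need to handle a trailing NaN.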

In [4]:
# Your code here
ret=ret.shift(-1, axis=0)

Next, let's get to the *tbill* data. Treasury bills are quoted on a 360-day discount basis. We have the 4-week Treasury Bill rate and the rate is quoted on an annualized basis. To convert into the 4-week rate (which is what we need since all our data are monthly), scale the quoted rate by multiplying by 4/52 (or 1/13). The other adjustment to make is that the quote is in percentage while our other returns data is in decimal format. So you'll need to fix this too. After fixing the Treasury rate, join that series with the *ret* dataframe. All of this can be done in one step.
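
As a rough illustration of the conversion described above, the sketch below uses hypothetical quotes in place of the FRED `TB4WK` series. The frame names `toy_tbill` and `toy_ret` and all numbers are assumptions made for the example.

```python
import pandas as pd

idx = pd.date_range("2005-01-01", periods=3, freq="MS")

# Hypothetical annualized T-bill quotes in percent (stand-in for TB4WK)
toy_tbill = pd.DataFrame({"TB4WK": [2.6, 1.3, 5.2]}, index=idx)

# Hypothetical monthly asset returns in decimal format
toy_ret = pd.DataFrame({"TOY": [0.010, -0.020, 0.015]}, index=idx)

# Percent -> decimal (/100) and annual -> 4-week (*4/52);
# the column assignment aligns the two series on their date index
toy_ret["TBILL"] = toy_tbill["TB4WK"] * 4 / 52 / 100

print(toy_ret)
```

Selecting the `TB4WK` column explicitly yields a Series, so the assignment does not depend on how pandas handles a one-column DataFrame.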

In [5]:
# Your code here
ret["TBILL"] = tbill["TB4WK"] * 4 / 52 / 100

Now you're ready to estimate CAPM for each of the six assets (don't forget to add the constant for each). There are three steps for each of them:
1. Estimate the model (you can use OLS from statsmodels for this) and print out the summary.
2. Test the restriction $\alpha = 0$ and $\beta = 1$ **individually**.
3. Test the restriction $\alpha = 0$ and $\beta = 1$ **jointly**.
The three code cells below are for Exxon. Follow the same pattern for: GE, IBM, Microsoft, Walmart, Gold. At the end, insert a new Markdown cell and discuss how the $\beta$ for Gold differs from the rest and what it means.

**Important note**: All the returns data have a missing value at the bottom. In addition, the gold data from Yahoo! Finance also has additional missing observations. Don't delete any of these manually. Instead, you can take care of this by using an argument inside the OLS function. Look up the documentation here: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.html#statsmodels.regression.linear_model.OLS

In [6]:
# Estimating the model for all the assets

# Iterating over columns in ret
for stock in ret.columns:
    # Skip TBILL and ^GSPC column
    if stock == 'TBILL':
        continue
    if stock == '^GSPC':
        continue    
    
    # setting up data for OLS regression 
    y = ret[stock] - ret['TBILL']
    x = ret['^GSPC'] - ret['TBILL']
    x = sm.add_constant(x)
    
    # Fit the model
    model = sm.OLS(y, x, missing='drop').fit()
    
    # Print results
    print(f"{stock} Results")
    print(model.summary())
    print("\n" * 3)


GC=F Results
                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.005
Model:                            OLS   Adj. R-squared:                 -0.003
Method:                 Least Squares   F-statistic:                    0.6646
Date:                Fri, 05 Apr 2024   Prob (F-statistic):              0.416
Time:                        21:59:17   Log-Likelihood:                 196.35
No. Observations:                 128   AIC:                            -388.7
Df Residuals:                     126   BIC:                            -383.0
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0068      0.005      1

In [7]:
#Test the individual restrictions for Exxon

Test for $\alpha$:
- H0: $\alpha = 0$
- H1: $\alpha \neq 0$

Test for $\beta$:
- H0: $\beta = 1$
- H1: $\beta \neq 1$

In [8]:
# Test the joint restriction for Exxon

Test for $\alpha$ and $\beta$:
- H0: $\alpha = 0$ and $\beta = 1$
- H1: $\alpha \neq 0$ or $\beta \neq 1$

In [9]:
# Iterate over columns in ret
for stock in ret.columns:
    # Skip TBILL and ^GSPC columns
    if stock in ['TBILL', '^GSPC']:
        continue
    
    # setting up data for OLS regression 
    y = ret[stock] - ret['TBILL']
    x = ret['^GSPC'] - ret['TBILL']
    x = sm.add_constant(x)
    
    # Fitting the model
    model = sm.OLS(y, x, missing='drop').fit()
    
    # Individual t-tests; a bare row vector tests the coefficient against zero,
    # so these are H0: alpha = 0 and H0: beta = 0
    t_test_results_intercept = model.t_test([1, 0])
    t_test_results_slope = model.t_test([0, 1])
    
    # Joint F-test of H0: alpha = 0 and beta = 0
    f_test_results_joint = model.f_test([[1, 0], [0, 1]])
    
    # Print results
    print(f"{stock} T-Test Results for Intercept")
    print(t_test_results_intercept)
    print(f"{stock} T-Test Results for Slope")
    print(t_test_results_slope)
    print(f"{stock} F-Test Results for Joint Significance")
    print(f_test_results_joint)
    print("\n" * 4)
    

GC=F T-Test Results for Intercept
                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0             0.0068      0.005      1.458      0.147      -0.002       0.016
GC=F T-Test Results for Slope
                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0             0.0923      0.113      0.815      0.416      -0.132       0.317
GC=F F-Test Results for Joint Significance
<F test: F=1.4639277730642495, p=0.23523314672277182, df_denom=126, df_num=2>





GE T-Test Results for Intercept
                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025 

#### Discussion - Gold's $\beta$
Gold has a $\beta$ coefficient of 0.09. The t-test of $\beta = 0$ gives a t-statistic of 0.815 (p = 0.416), and the joint F-test of $\alpha = 0$ and $\beta = 0$ gives an F-statistic of 1.464 (p = 0.235). Neither result is significant even at the 10% level. This indicates that gold's excess returns are not significantly related to the excess returns of the market index. It also suggests that gold is not subject to the same systematic risk factors as the market.

#### AR Model Fitting

For each of the 6 assets, estimate an AR(1) model. From the results of the estimation, create a markdown cell and comment on if the first-order autocorrelation is statistically significant.

In [10]:
# Estimate AR(1) for each of the six assets
assets = ['GC=F', 'GE', 'IBM', 'MSFT', 'WMT','XOM']
for asset in assets:
    ar1_model = AutoReg(ret[asset], lags=1, missing='drop').fit()
    print(f"Results for {asset}:")
    print(ar1_model.summary())
    print("\n" * 2)

Results for GC=F:
                            AutoReg Model Results                             
Dep. Variable:                   GC=F   No. Observations:                  128
Model:                     AutoReg(1)   Log Likelihood                 194.039
Method:               Conditional MLE   S.D. of innovations              0.053
Date:                Fri, 05 Apr 2024   AIC                           -382.078
Time:                        21:59:18   BIC                           -373.545
Sample:                             1   HQIC                          -378.611
                                  128                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0082      0.005      1.736      0.082      -0.001       0.017
GC=F.L1       -0.0539      0.089     -0.608      0.543      -0.228       0.120
                                  


#### Discussion - First-order autocorrelation of assets
The first-order autocorrelation coefficients for all six assets (Gold, GE, IBM, MSFT, WMT, and XOM) are not statistically significant at the 5% level, with p-values ranging from 0.089 to 0.570. For gold, for example, the lag-1 coefficient is -0.0539 with a p-value of 0.543. Past returns therefore do not significantly predict the next period's return for any of these assets, and there is no evidence of first-order autocorrelation in their returns.

Now, for each of the 6 assets, estimate an AR(2) model. Then, for each asset, comment on if this is a better fit compared to the AR(1) model. Explain why.

In [11]:
# Estimate AR(2) for each of the six assets
assets = ['GC=F', 'GE', 'IBM', 'MSFT', 'WMT', 'XOM']
for asset in assets:
    ar2_model = AutoReg(ret[asset], lags=2, missing='drop').fit()
    print(f"Results for {asset}:")
    print(ar2_model.summary())
    print("\n" * 2)

Results for GC=F:
                            AutoReg Model Results                             
Dep. Variable:                   GC=F   No. Observations:                  128
Model:                     AutoReg(2)   Log Likelihood                 192.830
Method:               Conditional MLE   S.D. of innovations              0.052
Date:                Fri, 05 Apr 2024   AIC                           -377.659
Time:                        21:59:18   BIC                           -366.314
Sample:                             2   HQIC                          -373.050
                                  128                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0075      0.005      1.570      0.117      -0.002       0.017
GC=F.L1       -0.0470      0.089     -0.530      0.596      -0.221       0.127
GC=F.L2        0.1062      0.089  


#### Discussion - AR(2) vs AR(1) model
For gold and all five stocks, the AR(1) model has lower AIC, BIC, and HQIC values than the AR(2) model, so AR(1) is the preferred specification:

- Fit: lower information criteria indicate that AR(1) fits better once the penalty for extra parameters is applied.
- Complexity: AR(1) strikes a better balance between goodness of fit and simplicity than AR(2).
- Interpretability: with a single lag, AR(1) is easier to interpret and is the more suitable choice here.
- Characteristics: the second lag adds little explanatory power beyond the first.