# Tutorial 9: Data Analysis in-class practices
### 9.1 Datetime objects in Pandas
- We can convert a Series of date strings into datetime objects (dtype ```datetime64[ns]```) using the ```pd.to_datetime``` function.


In [2]:
import pandas as pd
date_str = pd.Series(["2020-11-08","2020-11-15","2020-11-22","2020-11-29","2020-12-06"])
date1 = pd.to_datetime(date_str, format="%Y-%m-%d") #convert string into datetime object
print(date1)

0   2020-11-08
1   2020-11-15
2   2020-11-22
3   2020-11-29
4   2020-12-06
dtype: datetime64[ns]
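
The ```format``` argument has to match the layout of the input strings. As a minimal sketch, assuming a hypothetical Series ```date_str_dmy``` of day-first strings (not part of the tutorial data), the same dates can be parsed like this:

```python
import pandas as pd

# Hypothetical day-first date strings, used only for illustration.
date_str_dmy = pd.Series(["08/11/2020", "15/11/2020", "06/12/2020"])

# %d/%m/%Y tells pandas that the day comes first, then the month, then the 4-digit year.
date_dmy = pd.to_datetime(date_str_dmy, format="%d/%m/%Y")
print(date_dmy)
print(date_dmy.dt.month.tolist())  # month numbers extracted from the parsed dates
```

Stating the format explicitly avoids any ambiguity between day-first and month-first strings such as ```08/11/2020```.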


- A Series of datetime objects can be converted back into date strings using the ```Series.dt.strftime``` method. Its argument is a format string built from format codes. \
Selected format codes are listed below:
  * ```%Y```: year in 4 digits
  * ```%m```: month in 2 digits
  * ```%b```: abbreviated month name
  * ```%d```: day in 2 digits
  * ```%a```: abbreviated weekday name
  * ```%A```: full weekday name
  * ```%H```: hour of the day (24-hour clock)
  * ```%M```: minute in an hour
  * ```%S```: second in a minute

In [3]:
date_str2 = date1.dt.strftime("%a %d %b %Y")
print(date_str2)

0    Sun 08 Nov 2020
1    Sun 15 Nov 2020
2    Sun 22 Nov 2020
3    Sun 29 Nov 2020
4    Sun 06 Dec 2020
dtype: object


In [4]:
date_str3 = date1.dt.strftime("%d/%m/%Y %H:%M")
print(date_str3)

0    08/11/2020 00:00
1    15/11/2020 00:00
2    22/11/2020 00:00
3    29/11/2020 00:00
4    06/12/2020 00:00
dtype: object
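
The time-related codes become more informative when the timestamps carry a time of day. The sketch below assumes a hypothetical Series ```ts``` with hours, minutes and seconds, and an English (default) locale for the weekday and month names:

```python
import pandas as pd

# Hypothetical timestamps with a time-of-day component, used only for illustration.
ts = pd.to_datetime(
    pd.Series(["2020-11-08 09:30:15", "2020-12-06 18:05:00"]),
    format="%Y-%m-%d %H:%M:%S",
)

# %A gives the full weekday name; %H, %M and %S give the zero-padded time fields.
labels = ts.dt.strftime("%A %d %b %Y, %H:%M:%S")
print(labels)
```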


### 9.2 Practice questions on Data Analysis
Q1: Download the ```stock_px_2.csv``` and ```risk_premium.csv``` files from Moodle and load them into Pandas DataFrames named ```stock``` and ```risk_premium``` respectively. The first column, which contains the dates, should be set as the DatetimeIndex.

In [5]:
stock = pd.read_csv("stock_px_2.csv",index_col=0, parse_dates=True)
risk_premium = pd.read_csv("risk_premium.csv",index_col=0, parse_dates=True)
print(stock.head())
print(risk_premium.head())

            AAPL   MSFT    XOM     SPX
2003-01-02  7.40  21.11  29.22  909.03
2003-01-03  7.45  21.14  29.24  908.59
2003-01-06  7.45  21.52  29.96  929.01
2003-01-07  7.43  21.93  28.95  922.93
2003-01-08  7.28  21.31  28.83  909.93
            MKT-RF     RF
2003-01-02    3.14  0.005
2003-01-03   -0.11  0.005
2003-01-06    2.13  0.005
2003-01-07   -0.63  0.005
2003-01-08   -1.34  0.005
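
To see what ```index_col=0``` and ```parse_dates=True``` do without the Moodle files, the sketch below reads a small in-memory CSV. The text in ```csv_text``` and the column header ```Date``` are assumptions made for illustration; the actual files may be laid out differently.

```python
import io
import pandas as pd

# Hypothetical CSV content that mimics the first rows of the stock file.
csv_text = """Date,AAPL,MSFT
2003-01-02,7.40,21.11
2003-01-03,7.45,21.14
"""

# index_col=0 uses the first column as the index;
# parse_dates=True asks pandas to parse that index as dates.
demo = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(type(demo.index))
print(demo.loc[pd.Timestamp("2003-01-03"), "AAPL"])
```

Checking ```type(df.index)``` after loading is a quick way to confirm that the dates were parsed rather than left as strings.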


Q2: Merge the ```AAPL```, ```MSFT``` and ```XOM``` columns from the ```stock``` DataFrame with all columns from the ```risk_premium``` DataFrame into a new DataFrame named ```Data2```. \
The ```AAPL```, ```MSFT``` and ```XOM``` columns contain daily stock prices, the ```MKT-RF``` column contains the daily market risk premium in percentage points, and the ```RF``` column contains the daily risk-free return in percentage points.

In [6]:
Data2 = pd.merge(left = stock[["AAPL", "MSFT", "XOM"]], right = risk_premium, left_index=True, right_index=True)
print(Data2.head())

            AAPL   MSFT    XOM  MKT-RF     RF
2003-01-02  7.40  21.11  29.22    3.14  0.005
2003-01-03  7.45  21.14  29.24   -0.11  0.005
2003-01-06  7.45  21.52  29.96    2.13  0.005
2003-01-07  7.43  21.93  28.95   -0.63  0.005
2003-01-08  7.28  21.31  28.83   -1.34  0.005
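
By default ```pd.merge``` performs an inner join, so only dates present in both indexes are kept. The sketch below illustrates this with two small hypothetical DataFrames, ```left``` and ```right```, whose date ranges do not fully overlap:

```python
import pandas as pd

# Hypothetical toy data: `right` is missing the first date found in `left`.
left = pd.DataFrame(
    {"AAPL": [7.40, 7.45, 7.45]},
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)
right = pd.DataFrame(
    {"RF": [0.005, 0.005]},
    index=pd.to_datetime(["2003-01-03", "2003-01-06"]),
)

# Default how="inner": keep only dates that appear in both indexes.
inner = pd.merge(left, right, left_index=True, right_index=True)
# how="outer": keep all dates and fill the gaps with NaN.
outer = pd.merge(left, right, left_index=True, right_index=True, how="outer")
print(inner)
print(outer)
```

If the two files cover different date ranges, comparing ```len(Data2)``` with the lengths of the inputs shows how many rows the inner join dropped.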


Q3: Write a function to compute the CAPM alpha and CAPM market beta for a specified stock using ordinary least squares (OLS) regression. \
The function should take the following arguments:
 1. ```data```: Input DataFrame
 2. ```yvar```: Column name from the input DataFrame, representing the stock price of the specified stock for evaluating CAPM parameters
 3. ```xvar```: Column name from the input DataFrame, representing the market risk premium.
 
The output should be a Pandas Series containing the CAPM alpha and CAPM market beta estimates of the specified stock. \
Hint: Regress the daily stock return (in excess of the risk-free return ```RF```) against the market risk premium to obtain the CAPM parameters.
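
In symbols, the regression described in the hint can be written as follows. The notation is introduced here for illustration and does not come from the data files: $P_{i,t}$ is the price of stock $i$ on day $t$, $r_{f,t}$ is the risk-free return (```RF```) and $r_{m,t} - r_{f,t}$ is the market risk premium (```MKT-RF```), both converted from percentage points to decimals.

```latex
r_{i,t} = \frac{P_{i,t}}{P_{i,t-1}} - 1,
\qquad
r_{i,t} - r_{f,t} = \alpha_i + \beta_i \,\bigl(r_{m,t} - r_{f,t}\bigr) + \varepsilon_{i,t}
```

The OLS intercept estimates the CAPM alpha $\alpha_i$ and the OLS slope estimates the CAPM market beta $\beta_i$.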

In [9]:
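import statsmodels.api as sm
def regress(data, yvar, xvar):
    #calculate arithmetic return of the stock in yvar (the first value is NaN)
    daily_return = data[yvar].pct_change()
    #convert market risk premium from percentage points to decimals
    mkt_risk_premium = data[xvar]/100
    #add a column of ones so that the regression includes an intercept (alpha)
    mkt_risk_premium = sm.add_constant(mkt_risk_premium)
    
    #excess return of the stock; the "RF" column is in percentage points
    daily_return_minus_rf = daily_return - data["RF"]/100
    #remove observations with nan when running OLS regression
    model = sm.OLS(daily_return_minus_rf, mkt_risk_premium, missing="drop")
    results = model.fit()
    #params is a Series: "const" is the CAPM alpha, xvar is the CAPM beta
    output = results.params
    return output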
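
One way to build confidence in such a function is to test the same calculation on synthetic data with known parameters. The sketch below does not call ```statsmodels```. Instead it uses the closed-form OLS formulas for a single regressor (slope = covariance / variance, intercept = mean of y minus slope times mean of x). The values ```true_alpha``` and ```true_beta``` and the simulated series are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical market excess returns (decimals), simulated for illustration only.
rng = np.random.default_rng(0)
mkt_excess = pd.Series(rng.normal(0.0, 0.01, size=250))

# Build noise-free stock excess returns from known CAPM parameters.
true_alpha, true_beta = 0.0005, 1.2
stock_excess = true_alpha + true_beta * mkt_excess

# Closed-form OLS estimates for a regression with one regressor and an intercept.
beta_hat = stock_excess.cov(mkt_excess) / mkt_excess.var()
alpha_hat = stock_excess.mean() - beta_hat * mkt_excess.mean()

# Same labels as the Series returned by results.params in the tutorial code.
estimates = pd.Series({"const": alpha_hat, "MKT-RF": beta_hat})
print(estimates)
```

Because the synthetic data contain no noise, the estimates should match ```true_alpha``` and ```true_beta``` up to floating-point error. With real data the estimates carry sampling error.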

Q4: Use the function from Q3 to evaluate the CAPM alpha and CAPM market beta for Apple (```AAPL```), Microsoft (```MSFT```) and ExxonMobil (```XOM```). Present the output as a single DataFrame.

In [11]:
CAPM_output = [regress(Data2, ticker, "MKT-RF") for ticker in ["AAPL", "MSFT", "XOM"]]
CAPM_df = pd.DataFrame(CAPM_output, index=["AAPL", "MSFT", "XOM"])
CAPM_df.rename(columns={'const':'CAPM_alpha', 'MKT-RF':'CAPM_beta'}, inplace=True)
print(CAPM_df)

      CAPM_alpha  CAPM_beta
AAPL    0.001781   1.036262
MSFT   -0.000047   0.933148
XOM     0.000264   0.933813


In [15]:
daily_return = Data2["AAPL"].pct_change()
mkt_risk_premium = Data2["MKT-RF"]/100
mkt_risk_premium = sm.add_constant(mkt_risk_premium)

daily_return_minus_rf = daily_return - Data2["RF"]/100
#remove observations with nan when running OLS regression
model = sm.OLS(daily_return_minus_rf, mkt_risk_premium, missing="drop")
results = model.fit()
print(results.summary())
print(f"The R-squared value in this regression is {results.rsquared}.")

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.323
Model:                            OLS   Adj. R-squared:                  0.323
Method:                 Least Squares   F-statistic:                     1056.
Date:                Wed, 29 Nov 2023   Prob (F-statistic):          9.55e-190
Time:                        14:29:27   Log-Likelihood:                 5502.0
No. Observations:                2213   AIC:                        -1.100e+04
Df Residuals:                    2211   BIC:                        -1.099e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0018      0.000      4.158      0.0

The cells in this part print the full CAPM regression table and the R-squared value for each stock. The AAPL cell appears above; the MSFT and XOM cells follow:

In [16]:
daily_return = Data2["MSFT"].pct_change()
mkt_risk_premium = Data2["MKT-RF"]/100
mkt_risk_premium = sm.add_constant(mkt_risk_premium)

daily_return_minus_rf = daily_return - Data2["RF"]/100
#remove observations with nan when running OLS regression
model = sm.OLS(daily_return_minus_rf, mkt_risk_premium, missing="drop")
results = model.fit()
print(results.summary())
print(f"The R-squared value in this regression is {results.rsquared}.")

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.501
Model:                            OLS   Adj. R-squared:                  0.500
Method:                 Least Squares   F-statistic:                     2215.
Date:                Wed, 29 Nov 2023   Prob (F-statistic):               0.00
Time:                        14:29:47   Log-Likelihood:                 6553.5
No. Observations:                2213   AIC:                        -1.310e+04
Df Residuals:                    2211   BIC:                        -1.309e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -4.701e-05      0.000     -0.177      0.8

In [17]:
daily_return = Data2["XOM"].pct_change()
mkt_risk_premium = Data2["MKT-RF"]/100
mkt_risk_premium = sm.add_constant(mkt_risk_premium)

daily_return_minus_rf = daily_return - Data2["RF"]/100
#remove observations with nan when running OLS regression
model = sm.OLS(daily_return_minus_rf, mkt_risk_premium, missing="drop")
results = model.fit()
print(results.summary())
print(f"The R-squared value in this regression is {results.rsquared}.")

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.564
Model:                            OLS   Adj. R-squared:                  0.563
Method:                 Least Squares   F-statistic:                     2855.
Date:                Wed, 29 Nov 2023   Prob (F-statistic):               0.00
Time:                        14:30:13   Log-Likelihood:                 6832.5
No. Observations:                2213   AIC:                        -1.366e+04
Df Residuals:                    2211   BIC:                        -1.365e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0003      0.000      1.126      0.2