In [65]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_excel(r'C:\Users\slove\OneDrive - The University of Western Ontario\MFE_Python\MFEcodonomics\Data\ceosal2.xls')

In [29]:
# Define the dependent variable (CEO salary) and independent variables
Y = df['salary']
X = df[['comten', 'ceoten', 'sales']]  # years with the company, years as CEO, firm sales

# Add a constant term (intercept)
X = sm.add_constant(X)

# Run the OLS regression
model = sm.OLS(Y, X)
results = model.fit()  # default cov_type is "nonrobust"; fit(cov_type="HC0") gives White heteroskedasticity-robust SEs

# Display the summary of the regression
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                 salary   R-squared:                       0.177
Model:                            OLS   Adj. R-squared:                  0.163
Method:                 Least Squares   F-statistic:                     12.38
Date:                Tue, 01 Oct 2024   Prob (F-statistic):           2.24e-07
Time:                        20:52:55   Log-Likelihood:                -1362.0
No. Observations:                 177   AIC:                             2732.
Df Residuals:                     173   BIC:                             2745.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        674.1790     89.435      7.538      0.0

In [30]:
# Extract coefficient estimates
coefficients = results.params

# Extract standard errors
standard_errors = results.bse

# Extract 95% confidence intervals
confidence_intervals = results.conf_int(alpha=0.05)

# Combine everything into a single DataFrame for clarity
regression_results = pd.DataFrame({
    'Coefficient': coefficients,
    'Standard Error': standard_errors,
    '95% CI Lower': confidence_intervals[0],
    '95% CI Upper': confidence_intervals[1]
})

# Display the results
regression_results

| Variable | Coefficient | Standard Error | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|
| const | 674.178959 | 89.434836 | 497.655043 | 850.702876 |
| comten | -3.057121 | 3.504801 | -9.974795 | 3.860554 |
| ceoten | 15.626927 | 6.006818 | 3.770841 | 27.483012 |
| sales | 0.038581 | 0.006732 | 0.025293 | 0.051869 |
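The confidence intervals and the p-values cited below follow directly from each coefficient and its standard error. The formulas are t = β̂ / SE, p = 2·P(T > |t|), and CI = β̂ ± t(0.975, 173)·SE. The 173 residual degrees of freedom come from 177 observations minus 4 parameters. The sketch below recomputes these quantities from the table with SciPy's t distribution. Because the table entries are rounded, the results should match only to roughly five decimals.

```python
from scipy import stats

# Coefficients and standard errors as reported in the table above
coef = {"const": 674.178959, "comten": -3.057121, "ceoten": 15.626927, "sales": 0.038581}
se = {"const": 89.434836, "comten": 3.504801, "ceoten": 6.006818, "sales": 0.006732}
df_resid = 173  # "Df Residuals" in the summary: 177 - 4

t_crit = stats.t.ppf(0.975, df_resid)  # two-sided 95% critical value (about 1.974)

rows = {}
for name in coef:
    t_stat = coef[name] / se[name]
    p_value = 2 * stats.t.sf(abs(t_stat), df_resid)  # two-sided p-value
    lower = coef[name] - t_crit * se[name]
    upper = coef[name] + t_crit * se[name]
    rows[name] = (t_stat, p_value, lower, upper)
    print(f"{name:>7}: t = {t_stat:7.3f}, p = {p_value:.3g}, "
          f"95% CI = [{lower:.4f}, {upper:.4f}]")
```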


**Theory A** suggests that a CEO’s overall time with the company should increase their salary:
- The regression shows that `comten` has a negative but statistically insignificant effect (coefficient = -3.06, p-value = 0.384; the 95% CI of [-9.97, 3.86] includes zero).
- The results **do not support Theory A**: there is no evidence that time with the company meaningfully affects CEO salary.

**Theory B** proposes that only time as CEO matters:
- The regression shows that `ceoten` significantly increases salary (coefficient = 15.63, p-value = 0.010), but `sales` also has a strong positive effect (coefficient = 0.0386, p-value < 0.001).
- Thus, the evidence **only partially supports Theory B**: CEO tenure matters, but it is not the only factor, because sales is also important.

# 2. 

In [72]:
# Load the dataset from the CAPM worksheet
df = pd.read_excel(r'C:\Users\slove\OneDrive - The University of Western Ontario\MFE_Python\MFEcodonomics\Data\capm.xls')

In [67]:
start_date = 1990.01
end_date = 1994.12

fav = df[(df[' DATE '] >= start_date) & (df[' DATE '] <= end_date)]
fav = fav.iloc[:, [0, 1, 2, 5]].copy()  # using IBM as the selected stock; .copy() avoids SettingWithCopyWarning below

#construct excess returns
fav['excess_IBM'] = fav['IBM'] - fav['Risk Free Proxy (30 day T-Bills)']
fav['excess_mkt'] = fav['Market Return Proxy (Value Weighted NYSE,NASDAQ,AMEX)'] - fav['Risk Free Proxy (30 day T-Bills)']
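
The date filtering and excess-return steps above can be sketched on a small toy frame. The column names `DATE`, `IBM`, `MKT`, and `RF` are hypothetical shorthand for the worksheet's longer headers, and the returns are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame; the real sheet uses headers such as ' DATE ' and
# 'Risk Free Proxy (30 day T-Bills)'
toy = pd.DataFrame({
    "DATE": [1989.12, 1990.01, 1992.06, 1994.12, 1995.01],  # YYYY.MM stored as floats
    "IBM": [0.020, 0.050, -0.030, 0.010, 0.040],
    "MKT": [0.010, 0.030, -0.010, 0.020, 0.015],
    "RF": [0.006, 0.006, 0.004, 0.005, 0.005],
})

# Inclusive date window; .copy() makes the subset independent, so adding
# columns below does not raise SettingWithCopyWarning
window = toy[toy["DATE"].between(1990.01, 1994.12)].copy()

# Excess return = asset (or market) return minus the risk-free rate
window["excess_IBM"] = window["IBM"] - window["RF"]
window["excess_mkt"] = window["MKT"] - window["RF"]
print(window)
```

Comparing YYYY.MM floats works for ordering, since October (stored as e.g. 1990.1) still falls between 1990.09 and 1990.11. Converting to `pd.Period` values would be more robust, but it is not required for this filter.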

In [71]:
#CAPM
Y = fav['excess_IBM']
X = fav['excess_mkt']

X = sm.add_constant(X) # add the alpha (intercept)

model = sm.OLS(Y, X)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:             excess_IBM   R-squared:                       0.044
Model:                            OLS   Adj. R-squared:                  0.027
Method:                 Least Squares   F-statistic:                     2.665
Date:                Tue, 01 Oct 2024   Prob (F-statistic):              0.108
Time:                        23:32:27   Log-Likelihood:                 70.207
No. Observations:                  60   AIC:                            -136.4
Df Residuals:                      58   BIC:                            -132.2
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0033      0.010     -0.336      0.7

**b.**
- **β = 0.4514**
- The point estimate suggests that IBM's excess returns move less than the market's, about 45% as much.
- However, the p-value of 0.108 means the slope is not statistically significant at the 5% level.
- The 95% CI of [-0.102, 1.005] includes zero, reinforcing the lack of significance. It also includes 1, so the data cannot rule out IBM moving one-for-one with the market either.

**c.** When the market's excess return is 0, the predicted excess return for IBM is **α = -0.0033**.
- The p-value is 0.738 (not statistically significant), and the **95% CI is [-0.023, 0.017]**.
- Since this interval includes zero, we cannot conclude that IBM's expected excess return differs from zero when the market's excess return is zero.

**d.** The **R-squared is 0.044**, meaning only 4.4% of the variation in IBM’s excess returns is explained by the market's excess returns. The remaining 95.6% is due to other, firm-specific factors.