Download or extract 10 years of price data for a stock of your choice. Next, download data for the market index relevant to your stock. If your stock is listed in:
- The US, download data for the S&P500 or other US stock market index.
- The UK, download data for the FTSE100, FTSE250, or other UK stock market index.
- Japan, download data for the NIKKEI.
- India, download data for the Nifty or SENSEX.
- Any other country, download data for your national level stock market index.

Importantly, make sure that the dates for your stock data and the market data are exactly the same. Delete any observation where you don't have data for both.
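One way to enforce matching dates is an inner join on the date index. The sketch below uses two small, made-up price series (the names `stock` and `market` and all values are purely illustrative, not part of the assignment data) and keeps only the dates present in both.

```python
import pandas as pd

# Hypothetical toy price series with partly mismatched dates (illustrative values only)
stock = pd.Series(
    [10.0, 10.5, 10.2, 10.8],
    index=pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06', '2020-01-07']),
    name='stock',
)
market = pd.Series(
    [100.0, 101.0, 102.5],
    index=pd.to_datetime(['2020-01-02', '2020-01-06', '2020-01-07']),
    name='market',
)

# An inner join keeps only the dates that appear in both series;
# dropna() then removes any remaining rows with a missing price
aligned = pd.concat([stock, market], axis=1, join='inner').dropna()
print(aligned)
```

Here 2020-01-03 is dropped because the market series has no observation for that date.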

# Questions for this assignment #

### 1. Create a function which calculates the Beta of a stock given a dataframe object as an input parameter. Your function should NOT use NumPy's .var() or .cov() methods. Instead, it should estimate the Beta manually (i.e. applying the formula for the Beta from scratch.)

In [1]:
# import libraries
import pandas as pd
import numpy as np

In [2]:
#import data
df = pd.read_csv('https://raw.githubusercontent.com/kpace1111/portfoliomanagement/main/amzn_sp500_price10y.csv')
df.set_index('date', inplace=True)

In [3]:
#calculate returns for both amzn and sp500 in one go
returns_df = df.pct_change(1)
#remove NaN
returns_df.dropna(inplace=True)
#change column names
new_col_names = ['r_amzn', 'r_sp500']
returns_df.columns = new_col_names


In [4]:
#calculate the sample variance of the market returns manually
returns_df['deviations'] = returns_df['r_sp500'] - returns_df['r_sp500'].mean()
returns_df['squared_deviations'] = returns_df['deviations'] ** 2

#drop NaNs before counting so that n reflects only actual return observations
squared_deviations = returns_df['squared_deviations'].dropna()
sum_squared_deviations = squared_deviations.sum()
var_sp500 = sum_squared_deviations / (len(squared_deviations) - 1) #n - 1 for the sample variance
returns_df.head()

date,r_amzn,r_sp500,deviations,squared_deviations
11/02/2012,0.000861,-0.009379,-0.009835,9.7e-05
11/05/2012,0.008606,0.002164,0.001708,3e-06
11/06/2012,0.013652,0.007853,0.007398,5.5e-05
11/07/2012,-0.023569,-0.023705,-0.02416,0.000584
11/08/2012,-0.019828,-0.012205,-0.01266,0.00016


In [5]:
#calculate deviations from the mean for the two return columns only
deviations = returns_df[['r_amzn', 'r_sp500']] - returns_df[['r_amzn', 'r_sp500']].mean()
#rename the columns
deviations.columns = ['deviations_amzn', 'deviations_sp500']
#product of deviations
product_deviations = deviations['deviations_amzn'] * deviations['deviations_sp500'] #pandas series, not df
#sample covariance (divide by n - 1)
cov_amzn_sp500 = product_deviations.sum() / (len(product_deviations) - 1)


In [6]:
#beta = cov/var
beta_amzn = cov_amzn_sp500 / var_sp500
beta_amzn

1.106048854114013
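The question asks for a function, so the manual steps above can be wrapped into one. The following is a minimal sketch, not a definitive implementation: the function name `calculate_beta` and its column-name parameters are assumptions made for illustration, and the toy DataFrame contains made-up returns.

```python
import pandas as pd

def calculate_beta(returns, stock_col, market_col):
    """Estimate Beta as sample covariance / sample variance, computed from scratch."""
    # Keep only rows where both returns are observed
    data = returns[[stock_col, market_col]].dropna()
    n = len(data)

    # Deviations from the mean, with the mean computed manually as sum / n
    stock_dev = data[stock_col] - data[stock_col].sum() / n
    market_dev = data[market_col] - data[market_col].sum() / n

    # Sample covariance and sample variance both divide by n - 1
    covariance = (stock_dev * market_dev).sum() / (n - 1)
    variance = (market_dev ** 2).sum() / (n - 1)
    return covariance / variance

# Toy check: the stock return is exactly twice the market return, so Beta should be about 2
toy = pd.DataFrame({'r_stock': [0.02, -0.04, 0.06, 0.00],
                    'r_market': [0.01, -0.02, 0.03, 0.00]})
beta_toy = calculate_beta(toy, 'r_stock', 'r_market')
print(beta_toy)
```

With the notebook's data, `calculate_beta(returns_df, 'r_amzn', 'r_sp500')` should reproduce the manual estimate above, up to floating-point rounding.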

### 2. Calculate the Beta of your stock using the covariance and variance functions / methods built in to NumPy.

In [7]:
#cov
cov = np.cov(returns_df['r_sp500'], returns_df['r_amzn'])[0][1]
#variance
var_sp500 = np.var(returns_df['r_sp500'], ddof=1)
Beta = cov / var_sp500
Beta

1.1060488541140143

### 3. Estimate the Beta of your stock using an appropriate module from SciPy. You may also use other packages, for instance, StatsModels.


In [8]:
from scipy.stats import linregress

In [9]:
#y is dependent variable, x is independent, x impacts y, slope = beta
linregress(y=returns_df['r_amzn'], x=returns_df['r_sp500'])[0]

1.1060488541140128

### 4. Comment on why your Beta estimates may be different, even though you're using exactly the same dataset for all 3 preceding questions. Please think about why, even if your own Beta estimates were identical for all 3 cases.

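The estimates can differ because the functions do not share the same default normalization. NumPy's np.var() defaults to ddof=0, which divides the sum of squared deviations by N (the biased, population estimator), whereas np.cov() by default divides by N - 1 (the unbiased, sample estimator). If the covariance is divided by N - 1 but the variance by N, the resulting Beta is inflated by a factor of N / (N - 1). Setting ddof=1 in np.var(), as done above, makes the two consistent. In a regression-based estimate the normalization cancels out, because the slope is the ratio of the sum of cross-deviations to the sum of squared deviations. Even with consistent normalization, the estimates can differ in the last few decimal places because each method performs its floating-point operations in a different order.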
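The effect of the normalization choice can be checked on synthetic data. The sketch below assumes randomly generated returns (not the assignment data) and compares a Beta computed with a consistent N - 1 divisor against one that mixes np.cov() with np.var() at its default ddof=0. It also fits a least-squares line with np.polyfit as a stand-in for a regression estimate.

```python
import numpy as np

rng = np.random.default_rng(0)  # synthetic returns, for illustration only
market = rng.normal(0.0, 0.01, size=50)
stock = 1.2 * market + rng.normal(0.0, 0.01, size=50)

cov = np.cov(market, stock)[0, 1]               # np.cov divides by N - 1 by default
beta_consistent = cov / np.var(market, ddof=1)  # variance also divides by N - 1
beta_mixed = cov / np.var(market)               # np.var defaults to ddof=0, dividing by N

n = len(market)
ratio = beta_mixed / beta_consistent            # should be close to N / (N - 1)

# Least-squares slope of stock on market; the divisor cancels out here
slope = np.polyfit(market, stock, 1)[0]

print(beta_consistent, beta_mixed, ratio, slope)
```

The mixed estimate is larger than the consistent one by the factor N / (N - 1), which shrinks toward 1 as the sample grows. For ten years of daily returns the gap is therefore small, but it is not zero.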