
In this notebook, we will attempt to calculate a given stock's beta to the market, as one way of assessing the risk of that stock.

We will fetch some data about a number of stocks and the market.

While we have the stock and market data, we will also illustrate the previously introduced statistical concept of correlation, just for fun, to see which stocks may be positively or negatively correlated with each other.

Then we will cover the new statistical concepts of variance and covariance, which we use in our calculation of beta.

## Understanding Beta

https://www.investopedia.com/ask/answers/070615/what-formula-calculating-beta.asp


> Beta is a measure used in fundamental analysis to determine the volatility of an asset or portfolio in relation to the overall market. The overall market has a beta of 1.0, and individual stocks are ranked according to how much they deviate from the market.

> A stock that swings more than the market over time has a beta greater than 1.0. If a stock moves less than the market, the stock's beta is less than 1.0. High-beta stocks tend to be riskier but provide the potential for higher returns. Low-beta stocks pose less risk but typically yield lower returns.

> As a result, beta is often used as a risk-reward measure, meaning it helps investors determine how much risk they are willing to take to achieve the return for taking on that risk. A stock's price variability is important to consider when assessing risk. If you think of risk as the possibility of a stock losing its value, beta is useful as a proxy for risk.



> To calculate the beta of a security, the covariance between the return of the security and the return of the market must be known, as well as the variance of the market returns.



\begin{align}
        \text{Beta} = \frac{\text{Covariance}}{\text{Variance}}
\end{align}

> **Covariance** measures how two stocks move together. A positive covariance means the stocks tend to move together when their prices go up or down. A negative covariance means the stocks move opposite of each other.

> **Variance**, on the other hand, refers to how far a stock moves relative to its mean. For example, variance is used in measuring the volatility of an individual stock's price over time. Covariance is used to measure the correlation in price moves of two different stocks.

> The formula for calculating beta is the covariance of the return of an asset with the return of the benchmark, divided by the variance of the return of the benchmark over a certain period.
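The formula described above can be sketched as a small helper function. This is a minimal sketch, assuming the inputs are periodic returns (for example, daily percent changes) stored as pandas Series; the function name `calculate_beta` and the toy numbers are illustrative, not part of this notebook's code.

```python
import pandas as pd


def calculate_beta(asset_returns: pd.Series, market_returns: pd.Series) -> float:
    """Return Cov(asset, market) / Var(market), using sample statistics (ddof=1)."""
    # align the two series on their shared index and drop dates missing from either one
    aligned = pd.concat([asset_returns, market_returns], axis=1, join="inner").dropna()
    asset, market = aligned.iloc[:, 0], aligned.iloc[:, 1]
    covariance = asset.cov(market)  # Series.cov divides by n - 1 by default
    variance = market.var()         # Series.var also divides by n - 1 by default
    return covariance / variance


# made-up daily returns: this asset always moves exactly twice as much as the market
market = pd.Series([0.01, -0.02, 0.015, 0.005, -0.01])
asset = 2 * market
print(calculate_beta(asset, market))  # 2.0, up to floating-point rounding
```

An asset that moved exactly opposite the market at half the size (`-0.5 * market`) would get a beta of -0.5 from the same function.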

OK, so let's fetch some data for a handful of stocks, use it to illustrate the statistical concepts of variance and covariance, and then calculate beta to the market for one or more of the stocks.

## Fetching Stock and Market Data

Installing packages:

In [116]:
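%%capture
# setup cell (run and leave as is)
# note: the %%capture cell magic must be on the first line of the cell; it hides the pip install output

!pip install yahooquery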

Getting stock (and market) prices data:

In [117]:
# setup cell (run and leave as is)

# https://yahooquery.dpguthrie.com/guide/ticker/intro/
from yahooquery import Ticker

symbols = ["AAPL", "GOOGL", "META", "MSFT", "NFLX", "AMZN", "NVDA", "BAC", "JPM"]
# adding the market index (SPY), but leaving the symbols variable as just a list of the individual stocks (in case we need it later)
all_symbols = symbols + ["SPY"]
companies = Ticker(all_symbols)
print(type(companies))

<class 'yahooquery.ticker.Ticker'>


In [118]:
from pandas import to_datetime

histories_df = companies.history()
histories_df["symbol"] = histories_df.index.get_level_values(0)
histories_df["date"] = to_datetime(histories_df.index.get_level_values(1)).date
histories_df.reset_index(drop=True, inplace=True)
print(len(histories_df)) #> 1220 rows (122 trading days x 10 symbols)
histories_df[["date", "symbol", "adjclose"]].head()

1220


|   | date | symbol | adjclose |
|---|---|---|---|
| 0 | 2023-01-03 | AAPL | 124.706833 |
| 1 | 2023-01-04 | AAPL | 125.993095 |
| 2 | 2023-01-05 | AAPL | 124.656975 |
| 3 | 2023-01-06 | AAPL | 129.243622 |
| 4 | 2023-01-09 | AAPL | 129.772079 |


In [119]:
# quick check for null values (because some stocks may have different history lengths)
histories_df["adjclose"].isnull().sum()  #> 0 ok looks good. can proceed without concern for nulls

0

In [120]:
prices_pivot = histories_df.pivot(columns="symbol", values="adjclose", index="date")
prices_pivot.head()

| date | AAPL | AMZN | BAC | GOOGL | JPM | META | MSFT | NFLX | NVDA | SPY |
|---|---|---|---|---|---|---|---|---|---|---|
| 2023-01-03 | 124.706833 | 85.82 | 33.030487 | 89.120003 | 133.084793 | 124.739998 | 238.460144 | 294.950012 | 143.11087 | 377.96814 |
| 2023-01-04 | 125.993095 | 85.139999 | 33.65147 | 88.080002 | 134.325806 | 127.370003 | 228.029129 | 309.410004 | 147.449707 | 380.886139 |
| 2023-01-05 | 124.656975 | 83.120003 | 33.582474 | 86.199997 | 134.296051 | 126.940002 | 221.270844 | 309.700012 | 142.611023 | 376.53891 |
| 2023-01-06 | 129.243622 | 86.080002 | 33.91761 | 87.339996 | 136.865875 | 130.020004 | 223.878601 | 315.549988 | 148.549393 | 385.173737 |
| 2023-01-09 | 129.772079 | 87.360001 | 33.405048 | 88.019997 | 136.300308 | 129.470001 | 226.05838 | 315.170013 | 156.237289 | 384.955383 |


In [121]:
print(len(prices_pivot))

earliest_date = prices_pivot.index.min()
print(earliest_date)

latest_date = prices_pivot.index.max()
print(latest_date)

122
2023-01-03
2023-06-28


#### Percent Change / Stock Returns

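Let's plot each stock's growth over time, to get a better sense of the data. Each `_growth` column is a cumulative return: the value of $1 invested on the first date (so every series starts at 1.0), computed by compounding the daily percent changes. We will also use these growth series as the basis for our beta calculation later.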
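The difference between daily (period) returns and cumulative growth can be seen with a few made-up prices. This is a minimal sketch; the ticker `XYZ` and its prices are hypothetical. It also shows that compounding the daily percent changes gives the same result as dividing each price by the first price.

```python
import pandas as pd

# hypothetical closing prices for one stock over four trading days
prices = pd.Series([100.0, 102.0, 99.96, 104.958], name="XYZ")

# daily (period) returns: percent change from the previous close; the first day has no prior close (NaN)
daily_returns = prices.pct_change()

# cumulative growth: compound the daily returns, starting from 1.0 on the first day
growth = (daily_returns + 1).fillna(1).cumprod()

# equivalently, cumulative growth is each price divided by the first price
growth_direct = prices / prices.iloc[0]

print(daily_returns.round(4).tolist())
print(growth.round(4).tolist())
print(growth_direct.round(4).tolist())
```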

In [122]:
returns_df = prices_pivot.copy()
for symbol in all_symbols:
    growth_col = f"{symbol}_growth"
    # calculate relative growth: compound the daily percent changes,
    # treating the first day (which has no prior price) as a growth factor of 1
    # (fillna avoids chained assignment like .iloc[0] = 1, which pandas warns about)
    returns_df[growth_col] = (returns_df[symbol].pct_change(periods=1) + 1).fillna(1).cumprod()

print(len(returns_df.columns))
growth_cols = [f"{symbol}_growth" for symbol in all_symbols]
returns_df = returns_df[growth_cols]
print(len(returns_df.columns))
returns_df.head()

20
10


| date | AAPL_growth | GOOGL_growth | META_growth | MSFT_growth | NFLX_growth | AMZN_growth | NVDA_growth | BAC_growth | JPM_growth | SPY_growth |
|---|---|---|---|---|---|---|---|---|---|---|
| 2023-01-03 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2023-01-04 | 1.010314 | 0.98833 | 1.021084 | 0.956257 | 1.049025 | 0.992076 | 1.030318 | 1.0188 | 1.009325 | 1.00772 |
| 2023-01-05 | 0.9996 | 0.967235 | 1.017637 | 0.927915 | 1.050008 | 0.968539 | 0.996507 | 1.016711 | 1.009101 | 0.996219 |
| 2023-01-06 | 1.03638 | 0.980027 | 1.042328 | 0.938851 | 1.069842 | 1.00303 | 1.038002 | 1.026858 | 1.028411 | 1.019064 |
| 2023-01-09 | 1.040617 | 0.987657 | 1.037919 | 0.947992 | 1.068554 | 1.017945 | 1.091722 | 1.01134 | 1.024161 | 1.018486 |


In [123]:
import plotly.express as px

chart_df = returns_df.copy()
chart_df["date"] = chart_df.index
px.line(chart_df, x="date", y=growth_cols, title="Stock Returns vs Market", height=350)

## Correlation

We have previously covered correlation. While we have the stock and market data, let's take a brief detour to measure correlation between stocks.
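The next cell uses Spearman correlation rather than pandas' default Pearson method. A minimal sketch with made-up numbers: Spearman correlation is the Pearson correlation of the ranks, so it measures how consistently two series move in the same direction (a monotonic relationship), not whether the relationship is a straight line.

```python
import pandas as pd

# made-up values: y rises whenever x rises, but not along a straight line
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "y": [1.0, 4.0, 9.0, 16.0, 100.0]})

spearman = df.corr(method="spearman").loc["x", "y"]
# Spearman correlation equals the Pearson correlation computed on the ranks
pearson_of_ranks = df.rank().corr(method="pearson").loc["x", "y"]
pearson_raw = df.corr(method="pearson").loc["x", "y"]

print(spearman, pearson_of_ranks)  # both equal 1 (up to floating-point rounding): perfectly monotonic
print(pearson_raw)                 # below 1: the relationship is not linear
```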

In [124]:
cor_mat = returns_df.corr(method="spearman")
cor_mat

| symbol | AAPL_growth | GOOGL_growth | META_growth | MSFT_growth | NFLX_growth | AMZN_growth | NVDA_growth | BAC_growth | JPM_growth | SPY_growth |
|---|---|---|---|---|---|---|---|---|---|---|
| AAPL_growth | 1.0 | 0.9191 | 0.985335 | 0.988801 | 0.555798 | 0.904592 | 0.97693 | -0.636469 | 0.155373 | 0.874916 |
| GOOGL_growth | 0.9191 | 1.0 | 0.925213 | 0.93241 | 0.663906 | 0.934556 | 0.887461 | -0.560418 | 0.11028 | 0.878899 |
| META_growth | 0.985335 | 0.925213 | 1.0 | 0.98475 | 0.531874 | 0.890012 | 0.978014 | -0.648362 | 0.120055 | 0.855509 |
| MSFT_growth | 0.988801 | 0.93241 | 0.98475 | 1.0 | 0.574032 | 0.908937 | 0.974194 | -0.638799 | 0.13129 | 0.874063 |
| NFLX_growth | 0.555798 | 0.663906 | 0.531874 | 0.574032 | 1.0 | 0.736172 | 0.494615 | 0.041104 | 0.435214 | 0.785201 |
| AMZN_growth | 0.904592 | 0.934556 | 0.890012 | 0.908937 | 0.736172 | 1.0 | 0.857459 | -0.445023 | 0.247772 | 0.921943 |
| NVDA_growth | 0.97693 | 0.887461 | 0.978014 | 0.974194 | 0.494615 | 0.857459 | 1.0 | -0.679644 | 0.108621 | 0.817469 |
| BAC_growth | -0.636469 | -0.560418 | -0.648362 | -0.638799 | 0.041104 | -0.445023 | -0.679644 | 1.0 | 0.555462 | -0.257554 |
| JPM_growth | 0.155373 | 0.11028 | 0.120055 | 0.13129 | 0.435214 | 0.247772 | 0.108621 | 0.555462 | 1.0 | 0.458944 |
| SPY_growth | 0.874916 | 0.878899 | 0.855509 | 0.874063 | 0.785201 | 0.921943 | 0.817469 | -0.257554 | 0.458944 | 1.0 |


In [125]:
# https://plotly.com/python/heatmaps/
# https://plotly.com/python-api-reference/generated/plotly.express.imshow.html
import plotly.express as px

title = f"Spearman Correlation between Stock Prices (from {earliest_date} to {latest_date})"
fig = px.imshow(cor_mat,
          height=750,
          text_auto=".2f", # show each value rounded to two decimal places
          color_continuous_scale="Greens",
          color_continuous_midpoint=0
)
fig.update_layout(title={'text': title, 'x':0.485, 'xanchor': 'center'}) # https://stackoverflow.com/questions/64571789/center-plotly-title-by-default
fig.show()

What can we learn from the correlation matrix?

Which pair or pairs of companies are most and least correlated with each other?

Which companies are most and least correlated with the market?

If you own AAPL, which company should you consider buying if you want to hedge your risk?

## Calculating Beta

OK, correlation is fun to look at, but it is not a component of our beta calculation, so let's return our focus to calculating beta.

We saw from the "Understanding Beta" section that we need to calculate the variance of the market, as well as the covariance of each stock with respect to the market.

Luckily, pandas makes this easy.

### Variance

https://www.investopedia.com/terms/v/variance.asp

<img src="https://www.investopedia.com/thmb/_hIorwcVnDj-oKWhpTu_qnuUldM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc():format(webp)/Variance-TAERM-ADD-Source-464952914f77460a8139dbf20e14f0c0.jpg" height=300>

> FYI: standard deviation is the square root of the variance!

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.var.html
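The variance formula can be checked by hand. A minimal sketch with made-up numbers: sample variance divides the sum of squared deviations from the mean by n - 1, which matches the pandas default (`ddof=1`), and its square root is the standard deviation.

```python
import math

import pandas as pd

values = pd.Series([1.0, 2.0, 4.0, 7.0])  # made-up numbers

mean = values.mean()
# sample variance: sum of squared deviations from the mean, divided by (n - 1)
manual_var = ((values - mean) ** 2).sum() / (len(values) - 1)

print(manual_var)          # 7.0
print(values.var())        # pandas also divides by n - 1 by default (ddof=1)
print(values.var(ddof=0))  # population variance divides by n instead
print(math.sqrt(manual_var), values.std())  # standard deviation is the square root of the variance
```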



In [126]:
returns_df.var()

symbol
AAPL_growth     0.016376
GOOGL_growth    0.017813
META_growth     0.133469
MSFT_growth     0.020710
NFLX_growth     0.016227
AMZN_growth     0.018530
NVDA_growth     0.282052
BAC_growth      0.008243
JPM_growth      0.001358
SPY_growth      0.001471
dtype: float64

In [127]:
returns_df.std() ** 2 # squaring the standard deviation, for comparison

symbol
AAPL_growth     0.016376
GOOGL_growth    0.017813
META_growth     0.133469
MSFT_growth     0.020710
NFLX_growth     0.016227
AMZN_growth     0.018530
NVDA_growth     0.282052
BAC_growth      0.008243
JPM_growth      0.001358
SPY_growth      0.001471
dtype: float64

### Covariance

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.cov.html

> Computes the pairwise covariance among the series of a DataFrame. The returned data frame is the covariance matrix of the columns of the DataFrame.

> This method is generally used for the analysis of time series data to understand the relationship between different measures across time.
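Covariance can be checked by hand in the same way. A minimal sketch with made-up returns: sample covariance sums the products of each series' deviations from its own mean and divides by n - 1. Covariance is symmetric, and a series' covariance with itself is its variance, which is why the diagonal of the covariance matrix below repeats the `var()` results above (for example, 0.001471 for `SPY_growth` in both).

```python
import pandas as pd

# made-up daily returns for a stock and the market
df = pd.DataFrame({"stock": [0.02, -0.01, 0.03, 0.00],
                   "market": [0.01, -0.005, 0.02, 0.005]})

x, y = df["stock"], df["market"]
# sample covariance: sum of products of deviations from each mean, divided by (n - 1)
manual_cov = ((x - x.mean()) * (y - y.mean())).sum() / (len(df) - 1)

cov_mat = df.cov()  # pairwise sample covariance matrix (ddof=1)
print(manual_cov, cov_mat.loc["stock", "market"])

# symmetric, and the diagonal holds each column's variance
print(cov_mat.loc["market", "stock"], cov_mat.loc["market", "market"], y.var())
```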

In [128]:
cov_mat = returns_df.cov()
cov_mat

| symbol | AAPL_growth | GOOGL_growth | META_growth | MSFT_growth | NFLX_growth | AMZN_growth | NVDA_growth | BAC_growth | JPM_growth | SPY_growth |
|---|---|---|---|---|---|---|---|---|---|---|
| AAPL_growth | 0.016376 | 0.015481 | 0.046072 | 0.017962 | 0.010605 | 0.015566 | 0.064312 | -0.008371 | 0.000645 | 0.004318 |
| GOOGL_growth | 0.015481 | 0.017813 | 0.044506 | 0.018112 | 0.012545 | 0.017136 | 0.063651 | -0.007713 | 0.00075 | 0.004404 |
| META_growth | 0.046072 | 0.044506 | 0.133469 | 0.051639 | 0.029635 | 0.044273 | 0.186153 | -0.024576 | 0.001575 | 0.011994 |
| MSFT_growth | 0.017962 | 0.018112 | 0.051639 | 0.02071 | 0.012416 | 0.017975 | 0.072677 | -0.009486 | 0.000743 | 0.004829 |
| NFLX_growth | 0.010605 | 0.012545 | 0.029635 | 0.012416 | 0.016227 | 0.014904 | 0.050749 | -0.001329 | 0.002154 | 0.004276 |
| AMZN_growth | 0.015566 | 0.017136 | 0.044273 | 0.017975 | 0.014904 | 0.01853 | 0.066642 | -0.006172 | 0.001581 | 0.004868 |
| NVDA_growth | 0.064312 | 0.063651 | 0.186153 | 0.072677 | 0.050749 | 0.066642 | 0.282052 | -0.031563 | 0.00337 | 0.017812 |
| BAC_growth | -0.008371 | -0.007713 | -0.024576 | -0.009486 | -0.001329 | -0.006172 | -0.031563 | 0.008243 | 0.001546 | -0.001288 |
| JPM_growth | 0.000645 | 0.00075 | 0.001575 | 0.000743 | 0.002154 | 0.001581 | 0.00337 | 0.001546 | 0.001358 | 0.000639 |
| SPY_growth | 0.004318 | 0.004404 | 0.011994 | 0.004829 | 0.004276 | 0.004868 | 0.017812 | -0.001288 | 0.000639 | 0.001471 |


If we want to calculate the covariance of "this with respect to that", we can access the specific value from this matrix. For example, the covariance of NFLX with respect to the market:

In [129]:
# if we have well defined index and columns, we can use the loc method and specify the name of the row, then the name of the column
# ... df.loc[row_name, col_name]

cov_mat.loc["NFLX_growth", "SPY_growth"]

0.004276437943388122

### Beta

Now let's calculate beta, using the growth series in `returns_df` (the cumulative returns from earlier) for both the stock and the market:

In [130]:
# calculating beta to market for a given company:
symbol = "NFLX"

# get covariance between this stock and the market
cov_mat = returns_df.cov()
cov = cov_mat.loc[symbol + "_growth", "SPY_growth"] # using loc method to access a given [row, col] combo
print(f"COVARIANCE OF {symbol} WITH RESPECT TO THE MARKET:", cov)

COVARIANCE OF NFLX WITH RESPECT TO THE MARKET: 0.004276437943388122


In [131]:
var = returns_df["SPY_growth"].var()
print(f"VARIANCE OF THE MARKET:", var)

VARIANCE OF THE MARKET: 0.0014707944617810104


In [132]:
beta = cov / var
print(f"BETA OF {symbol} WITH RESPECT TO THE MARKET:", beta)

BETA OF NFLX WITH RESPECT TO THE MARKET: 2.9075700612917115
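How returns are measured matters for beta. Beta is commonly estimated from period returns, such as daily percent changes, rather than from cumulative growth series like the ones used above. The sketch below assumes daily percent changes and uses made-up prices. It also shows that covariance divided by variance equals the slope of a least-squares line fitting stock returns against market returns, another common way to think about beta. To apply the idea to this notebook's data, one could start from `prices_pivot.pct_change().dropna()` instead of `returns_df`; the resulting beta would generally differ from the value above.

```python
import numpy as np
import pandas as pd

# hypothetical prices for one stock and the market index (made-up numbers)
prices = pd.DataFrame({
    "STOCK": [100.0, 101.0, 99.5, 102.0, 103.5, 102.5],
    "MARKET": [400.0, 402.0, 399.0, 404.0, 406.0, 405.0],
})

# period (daily) returns; drop the first row, which has no prior price
daily = prices.pct_change().dropna()

# beta as covariance over variance, both using ddof=1
beta_cov = daily["STOCK"].cov(daily["MARKET"]) / daily["MARKET"].var()

# the same number is the slope of a least-squares line of stock returns on market returns
slope, intercept = np.polyfit(daily["MARKET"], daily["STOCK"], deg=1)

print(beta_cov, slope)  # equal up to floating-point rounding
```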


https://www.investopedia.com/investing/beta-gauging-price-fluctuations/

How can we interpret this beta value? What does it tell us about the company's stock, and the risk involved?