<a href="https://colab.research.google.com/github/prof-rossetti/intro-to-python/blob/main/notebooks/applied-ds/Applied_Statistics_for_Finance_Calculating_Beta_to_the_Market_(Summer_2023).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we will attempt to calculate a given stock's beta to the market, as one way of assessing the risk of that stock.

We will fetch some data about a number of stocks and the market.

While we have the stock and market data, we will illustrate the previously introduced statistical concept of correlation, just for fun, as we see which stocks may be positively or negatively correlated with each other.

Then we will cover the new statistical concepts of variance and covariance, which we use in our calculations for beta.

## Understanding Beta

https://www.investopedia.com/ask/answers/070615/what-formula-calculating-beta.asp


> Beta is a measure used in fundamental analysis to determine the volatility of an asset or portfolio in relation to the overall market. The overall market has a beta of 1.0, and individual stocks are ranked according to how much they deviate from the market.

> A stock that swings more than the market over time has a beta greater than 1.0. If a stock moves less than the market, the stock's beta is less than 1.0. High-beta stocks tend to be riskier but provide the potential for higher returns. Low-beta stocks pose less risk but typically yield lower returns.

> As a result, beta is often used as a risk-reward measure, meaning it helps investors determine how much risk they are willing to take to achieve the return for taking on that risk. A stock's price variability is important to consider when assessing risk. If you think of risk as the possibility of a stock losing its value, beta is useful as a proxy for risk.



> To calculate the beta of a security, the covariance between the return of the security and the return of the market must be known, as well as the variance of the market returns.



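\begin{align}
        \text{Beta} = \frac{\text{Covariance}(R_s, R_m)}{\text{Variance}(R_m)}
\end{align}

where $R_s$ is the return of the stock and $R_m$ is the return of the market.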

> **Covariance** measures how two stocks move together. A positive covariance means the stocks tend to move together when their prices go up or down. A negative covariance means the stocks move opposite of each other.

> **Variance**, on the other hand, refers to how far a stock moves relative to its mean. For example, variance is used in measuring the volatility of an individual stock's price over time. Covariance is used to measure the correlation in price moves of two different stocks.

> The formula for calculating beta is the covariance of the return of an asset with the return of the benchmark, divided by the variance of the return of the benchmark over a certain period.
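
As a minimal sketch of that formula, assuming two short, made-up return series (the numbers are hypothetical and only illustrate the arithmetic), the stock's returns below are exactly twice the market's, so beta works out to 2:

```python
import numpy as np

# hypothetical daily returns, for illustration only (not real market data)
market_returns = np.array([0.01, -0.005, 0.015, -0.01, 0.005])
stock_returns = 2 * market_returns  # the stock swings twice as much as the market

# sample covariance between the stock and the market (ddof=1)
covariance = np.cov(stock_returns, market_returns, ddof=1)[0, 1]

# sample variance of the market (ddof=1)
market_variance = np.var(market_returns, ddof=1)

beta = covariance / market_variance
print(round(beta, 4))
```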

OK, so let's fetch some data for a handful of stocks, and then use it to illustrate the statistical concepts of variance and covariance. Then we will calculate beta to the market for one or more of the stocks.

## Fetching Stock and Market Data

Installing packages:

In [None]:
# setup cell (run and leave as is)

%%capture

!pip install yahooquery

Getting stock (and market) prices data:

In [None]:
# setup cell (run and leave as is)

# https://yahooquery.dpguthrie.com/guide/ticker/intro/
from yahooquery import Ticker

symbols = ["AAPL", "GOOGL", "META", "MSFT", "NFLX", "AMZN", "NVDA",
           "BAC", "JPM"
]
companies = Ticker(symbols + ["SPY"]) # adding market to the data, but leaving the symbols variable as just a list of the individual stocks (for reference later on)
print(type(companies))

<class 'yahooquery.ticker.Ticker'>


In [None]:
from pandas import to_datetime

histories_df = companies.history()
histories_df["symbol"] = histories_df.index.get_level_values(0)
histories_df["date"] = to_datetime(histories_df.index.get_level_values(1)).date
histories_df.reset_index(drop=True, inplace=True)
print(len(histories_df)) #> 1220 rows
histories_df[["date", "symbol", "adjclose"]].head()

1220


,date,symbol,adjclose
0,2023-01-02,AAPL,124.706833
1,2023-01-03,AAPL,125.993095
2,2023-01-04,AAPL,124.656982
3,2023-01-05,AAPL,129.243622
4,2023-01-08,AAPL,129.772079


In [None]:
# quick check for null values (because some stocks may have different history lengths)
histories_df["adjclose"].isnull().sum()  #> 0 ok looks good. can proceed without concern for nulls

0

In [None]:
prices_pivot = histories_df.pivot(columns="symbol", values="adjclose", index="date")
print(len(prices_pivot))
prices_pivot.head()

122


date,AAPL,AMZN,BAC,GOOGL,JPM,META,MSFT,NFLX,NVDA,SPY
2023-01-02,124.706833,85.82,33.030487,89.120003,133.084778,124.739998,238.460129,294.950012,143.11087,377.96814
2023-01-03,125.993095,85.139999,33.65147,88.080002,134.325806,127.370003,228.029114,309.410004,147.449707,380.886139
2023-01-04,124.656982,83.120003,33.582474,86.199997,134.296051,126.940002,221.270859,309.700012,142.611023,376.53891
2023-01-05,129.243622,86.080002,33.917606,87.339996,136.865875,130.020004,223.878616,315.549988,148.549393,385.173737
2023-01-08,129.772079,87.360001,33.405048,88.019997,136.300308,129.470001,226.058365,315.170013,156.237289,384.955383


In [None]:
earliest_date = prices_pivot.index.min()
print(earliest_date)

latest_date = prices_pivot.index.max()
print(latest_date)

2023-01-02
2023-06-28


Let's take a quick detour to do some plotting of growth rates, just so we can get a better sense of the data:

In [None]:
import plotly.express as px
import warnings
warnings.filterwarnings("ignore")

chart_symbol = "NVDA"  #@param ["AAPL", "GOOGL", "META", "MSFT", "NFLX", "AMZN", "NVDA", "BAC", "JPM"]

chart_df = prices_pivot.copy()
chart_df["date"] = chart_df.index
# calculate relative growth for charting
# (the first value has no prior day to compare against, so we start it at 1):
chart_df["SPY_growth"] = (chart_df["SPY"].pct_change(periods=1) + 1).cumprod().fillna(1)
chart_df[f"{chart_symbol}_growth"] = (chart_df[chart_symbol].pct_change(periods=1) + 1).cumprod().fillna(1)

px.line(chart_df, x="date", y=[f"{chart_symbol}_growth", "SPY_growth"],
        title=f"Stock Performance ({chart_symbol} vs Market)", height=350
)
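
The growth calculation above compounds daily percentage changes. That should be equivalent to dividing each price by the first price in the series. A minimal sketch of the equivalence, using a short hypothetical price series (not real data):

```python
import pandas as pd

# hypothetical prices, for illustration only
prices = pd.Series([100.0, 110.0, 99.0, 121.0])

# compound the daily percentage changes
# (the first value is NaN because there is no prior day, so we set it to 1)
growth_compounded = (prices.pct_change(periods=1) + 1).cumprod().fillna(1)

# equivalent shortcut: divide each price by the first price
growth_relative = prices / prices.iloc[0]

print(growth_compounded.round(4).tolist())
print(growth_relative.round(4).tolist())
```

Either way, a value of 1.21 means the price ended 21% above where it started.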

## Correlation

We have previously covered correlation. While we have the stock and market data, let's take a brief detour to measure correlation between stocks.

In [None]:
cor_mat = prices_pivot.corr(method="spearman")
cor_mat

symbol,AAPL,AMZN,BAC,GOOGL,JPM,META,MSFT,NFLX,NVDA,SPY
AAPL,1.0,0.904671,-0.636262,0.918896,0.152092,0.985341,0.98876,0.555667,0.976935,0.874825
AMZN,0.904671,1.0,-0.44499,0.93426,0.244937,0.89002,0.908816,0.736183,0.857444,0.921919
BAC,-0.636262,-0.44499,1.0,-0.560445,0.557958,-0.648159,-0.638454,0.041643,-0.679392,-0.257021
GOOGL,0.918896,0.93426,-0.560445,1.0,0.107742,0.925069,0.932449,0.663482,0.887338,0.878677
JPM,0.152092,0.244937,0.557958,0.107742,1.0,0.117045,0.128449,0.433191,0.105667,0.456754
META,0.985341,0.89002,-0.648159,0.925069,0.117045,1.0,0.984685,0.531746,0.977969,0.855487
MSFT,0.98876,0.908816,-0.638454,0.932449,0.128449,0.984685,1.0,0.57396,0.974154,0.873997
NFLX,0.555667,0.736183,0.041643,0.663482,0.433191,0.531746,0.57396,1.0,0.494615,0.785188
NVDA,0.976935,0.857444,-0.679392,0.887338,0.105667,0.977969,0.974154,0.494615,1.0,0.817509
SPY,0.874825,0.921919,-0.257021,0.878677,0.456754,0.855487,0.873997,0.785188,0.817509,1.0


In [None]:
# https://plotly.com/python/heatmaps/
# https://plotly.com/python-api-reference/generated/plotly.express.imshow.html
import plotly.express as px

title = f"Spearman Correlation between Stock Prices (from {earliest_date} to {latest_date})"
fig = px.imshow(cor_mat,
          height=750, # title=title,
          text_auto= ".2f", #True,
          color_continuous_scale="Greens",
          color_continuous_midpoint=0
)
fig.update_layout(title={'text': title, 'x':0.485, 'xanchor': 'center'}) # https://stackoverflow.com/questions/64571789/center-plotly-title-by-default
fig.show()

What can we learn from the correlation matrix?

Which pair or pairs of companies are most and least correlated with each other?

Which companies are most and least correlated with the market?

If you own AAPL, which company should you consider buying if you want to hedge your risk?
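
One possible way to answer questions like these programmatically is to flatten the matrix into a ranked list of pairs. The sketch below is only an illustration: it uses a four-symbol subset of the Spearman values printed above, and `cor_subset` and `pairs` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

# subset of the Spearman matrix shown above (values copied from the output)
labels = ["AAPL", "BAC", "JPM", "SPY"]
cor_subset = pd.DataFrame([
    [1.0, -0.636262, 0.152092, 0.874825],
    [-0.636262, 1.0, 0.557958, -0.257021],
    [0.152092, 0.557958, 1.0, 0.456754],
    [0.874825, -0.257021, 0.456754, 1.0],
], index=labels, columns=labels)

# keep only the upper triangle (k=1 excludes the diagonal), so each pair appears once
upper_mask = np.triu(np.ones(cor_subset.shape, dtype=bool), k=1)

# flatten to one row per pair, drop the masked-out cells, and rank
pairs = cor_subset.where(upper_mask).stack().dropna().sort_values(ascending=False)

print(pairs)
```

The same steps could be applied to the full `cor_mat` to rank all pairs.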

## Calculating Beta

OK, correlation is fun to look at, but it is not a component of our beta calculation, so let's return our focus to calculating beta.

We saw from the "Understanding Beta" section that we need to calculate the variance of the market, as well as the covariance of each stock with the market.

Luckily, pandas makes this easy.

### Variance

https://www.investopedia.com/terms/v/variance.asp

<img src="https://www.investopedia.com/thmb/_hIorwcVnDj-oKWhpTu_qnuUldM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc():format(webp)/Variance-TAERM-ADD-Source-464952914f77460a8139dbf20e14f0c0.jpg" height=300>

> FYI: standard deviation is the square root of the variance!

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.var.html



In [None]:
prices_pivot.var()

symbol
AAPL      254.371481
AMZN      136.354841
BAC         8.995601
GOOGL     141.279839
JPM        24.041445
META     2076.425020
MSFT     1177.059833
NFLX     1414.566404
NVDA     5774.936403
SPY       209.574939
dtype: float64

In [None]:
prices_pivot.std() ** 2 # squaring the standard deviation, for comparison

symbol
AAPL      254.371481
AMZN      136.354841
BAC         8.995601
GOOGL     141.279839
JPM        24.041445
META     2076.425020
MSFT     1177.059833
NFLX     1414.566404
NVDA     5774.936403
SPY       209.574939
dtype: float64
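
To see what `var()` is doing under the hood, here is a minimal sketch that computes the sample variance by hand, using a short hypothetical series (not real prices). It relies on the pandas default of `ddof=1` for both `var()` and `std()`, which means the sum of squared deviations is divided by n - 1 rather than n.

```python
import pandas as pd

# hypothetical prices, for illustration only
prices = pd.Series([10.0, 12.0, 11.0, 15.0])

# sample variance: squared deviations from the mean, divided by (n - 1)
deviations = prices - prices.mean()
manual_variance = (deviations ** 2).sum() / (len(prices) - 1)

print(manual_variance)       # manual calculation
print(prices.var())          # pandas default (ddof=1)
print(prices.std() ** 2)     # squared standard deviation
```

All three values should agree, up to floating point rounding.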

### Covariance

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.cov.html

> Computes the pairwise covariance among the series of a DataFrame. The returned data frame is the covariance matrix of the columns of the DataFrame.

> This method is generally used for the analysis of time series data to understand the relationship between different measures across time.

In [None]:
cov_mat = prices_pivot.cov()
cov_mat

symbol,AAPL,AMZN,BAC,GOOGL,JPM,META,MSFT,NFLX,NVDA,SPY
AAPL,254.371481,166.394719,-34.483388,171.785475,10.642405,716.21761,533.701361,390.151709,1146.868856,203.096574
AMZN,166.394719,136.354841,-17.500936,130.866676,18.015394,473.702598,367.591126,377.463782,817.991156,157.621207
BAC,-34.483388,-17.500936,8.995601,-22.695658,6.797702,-101.288621,-74.727043,-13.030637,-149.249121,-16.064827
GOOGL,171.785475,130.866676,-22.695658,141.279839,8.855551,494.215359,384.4683,329.448733,810.783116,148.018352
JPM,10.642405,18.015394,6.797702,8.855551,24.041445,26.017884,23.480395,84.450181,63.944523,32.08895
META,716.21761,473.702598,-101.288621,494.215359,26.017884,2076.42502,1535.483194,1091.509423,3322.337092,564.683115
MSFT,533.701361,367.591126,-74.727043,384.4683,23.480395,1535.483194,1177.059833,873.744288,2479.115921,434.609817
NFLX,390.151709,377.463782,-13.030637,329.448733,84.450181,1091.509423,873.744288,1414.566404,2144.052626,476.478183
NVDA,1146.868856,817.991156,-149.249121,810.783116,63.944523,3322.337092,2479.115921,2144.052626,5774.936403,962.02462
SPY,203.096574,157.621207,-16.064827,148.018352,32.08895,564.683115,434.609817,476.478183,962.02462,209.574939


If we want to calculate the covariance of "this with respect to that", we can access the specific value from this matrix. For example, the covariance of NFLX with respect to the market:

In [None]:
# if we have well defined index and columns, we can use the loc method and specify the name of the row, then the name of the column
# ... df.loc[row_name, col_name]

cov_mat.loc["NFLX", "SPY"]

476.47818291686275
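
Notice that the diagonal of the covariance matrix matches the variances from the previous section, because the covariance of a series with itself is its variance. The sketch below checks this, and computes a covariance by hand, on a small hypothetical dataset (the tickers "AAA" and "MKT" and their values are made up for illustration):

```python
import pandas as pd

# hypothetical prices for two made-up tickers, for illustration only
df = pd.DataFrame({
    "AAA": [10.0, 12.0, 11.0, 15.0],
    "MKT": [100.0, 104.0, 101.0, 107.0],
})

# sample covariance: product of paired deviations from each mean, divided by (n - 1)
dev_a = df["AAA"] - df["AAA"].mean()
dev_m = df["MKT"] - df["MKT"].mean()
manual_cov = (dev_a * dev_m).sum() / (len(df) - 1)

cov = df.cov()
print(manual_cov, cov.loc["AAA", "MKT"])       # these two should agree
print(cov.loc["MKT", "MKT"], df["MKT"].var())  # diagonal entry equals the variance
```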

### Beta

In [None]:
# calculating beta to market for a given company:
symbol = "NVDA"

# get covariance between this stock and the market
cov = cov_mat.loc[symbol, "SPY"] # using loc method to access a given [row, col] combo
print(f"COVARIANCE OF {symbol} WITH RESPECT TO THE MARKET:", cov)

COVARIANCE OF NVDA WITH RESPECT TO THE MARKET: 962.0246198136847


In [None]:
var = prices_pivot["SPY"].var()
print("VARIANCE OF THE MARKET:", var)

VARIANCE OF THE MARKET: 209.57493866485083


In [None]:
beta = cov / var
print(f"BETA OF {symbol} WITH RESPECT TO THE MARKET:", beta)

BETA OF NVDA WITH RESPECT TO THE MARKET: 4.590360975136155


How can we interpret this beta value? What does it tell us about the company's stock, and the risk involved?
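
One thing to keep in mind when interpreting the result: the Investopedia passage quoted in the "Understanding Beta" section defines beta in terms of returns, while the cells above apply `cov()` and `var()` to price levels. As a hedged sketch of the returns-based alternative, the hypothetical helper `betas_from_prices` below converts prices to daily percentage returns first, then computes beta for every column at once. The demo data is simulated (made-up tickers "MKT" and "XYZ", with "XYZ" constructed to have a true beta of about 1.5), so it only illustrates the mechanics.

```python
import numpy as np
import pandas as pd

def betas_from_prices(prices: pd.DataFrame, market: str = "SPY") -> pd.Series:
    """Beta of every column to the market column, based on daily returns."""
    returns = prices.pct_change().dropna()    # daily percentage returns
    cov_with_market = returns.cov()[market]   # covariance of each column with the market
    market_variance = returns[market].var()   # variance of the market's returns
    return cov_with_market / market_variance

# simulated prices, for illustration only (not real market data)
rng = np.random.default_rng(0)
market_returns = rng.normal(0.0005, 0.01, size=250)
noise = rng.normal(0, 0.005, size=250)
stock_returns = 1.5 * market_returns + noise  # built to have a beta of roughly 1.5

demo_prices = pd.DataFrame({
    "MKT": 100 * np.cumprod(1 + market_returns),
    "XYZ": 50 * np.cumprod(1 + stock_returns),
})

betas = betas_from_prices(demo_prices, market="MKT")
print(betas)
```

With the notebook's data, this could be called as `betas_from_prices(prices_pivot)`. The market's own beta comes out to 1, and the values for individual stocks may differ noticeably from the price-based figure above.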