### What is risk?
Variability is the best measure of risk we have. Imagine you invest $1000 in a stock that earns ON AVERAGE 15%/year. How did that 15% come about? Was it:
- +14%, +16%, +13%, +17%
<br><br>OR<br><br>
- +50%, -20%, -20%, +50%
<br><br>
Notice that those both yield an average of 15%, but the way each stock got there was vastly different. In the first case, your money will earn a stable amount over time. In the second, there's large variability from year to year.
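
To put numbers on the comparison above, here is a minimal sketch using two made-up return series that both average 15% a year. The variable names and figures are illustrative assumptions, not market data. It also shows what $1000 would grow to when each year's return is compounded.

```python
import numpy as np

# Illustrative yearly returns (in percent); both series average 15%
stable = np.array([14.0, 16.0, 13.0, 17.0])
volatile = np.array([50.0, -20.0, -20.0, 50.0])

for name, returns in [("stable", stable), ("volatile", volatile)]:
    # ddof=1 gives the sample standard deviation
    spread = returns.std(ddof=1)
    # Compound $1000 through each yearly return
    ending_value = 1000 * np.prod(1 + returns / 100)
    print(f"{name}: mean = {returns.mean():.1f}%, "
          f"std = {spread:.1f}%, ending value = ${ending_value:.2f}")
```

The averages match, but the standard deviations do not, and the more variable series ends with less money after compounding.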

### Statistical Measures to Quantify Risk

- Variance (S<sup>2</sup>) - Measures the dispersion of a set of data points around the mean (X bar is the mean) <br>
<img src='assets/variance_equation.png'>
- Standard Deviation (S) - The square root of the variance. It measures roughly how far a typical data point lies from the mean, in the same units as the data
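
As a rough sketch, the two measures can be computed by hand and checked against the library routines. This assumes S<sup>2</sup> denotes the sample variance (dividing by n - 1), which is what pandas uses by default. The data values are made up for illustration.

```python
import numpy as np
import pandas as pd

x = np.array([0.14, 0.16, 0.13, 0.17])  # illustrative returns

mean = x.sum() / len(x)
# Sample variance: squared deviations from the mean, divided by n - 1
sample_variance = ((x - mean) ** 2).sum() / (len(x) - 1)
sample_std = sample_variance ** 0.5

print(sample_variance, sample_std)

# pandas defaults to ddof=1 (sample); NumPy defaults to ddof=0 (population)
print(pd.Series(x).var(), np.var(x, ddof=1), np.var(x))
```

Note the differing defaults: `np.var(x)` without `ddof=1` divides by n and gives a slightly smaller number.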

In [1]:
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt

In [2]:
tickers = ['PG', 'BEI.DE']

security_data = pd.DataFrame()

for t in tickers:
    security_data[t] = yf.download(t, start='2007-1-1', auto_adjust=False)['Adj Close']

security_data.tail()

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


Date,PG,BEI.DE
2025-06-17,158.520004,109.699997
2025-06-18,158.020004,107.349998
2025-06-20,159.080002,106.5
2025-06-23,161.029999,107.25
2025-06-24,160.360001,108.099998


In [3]:
# We use log returns because we examine each company separately over the timeframe
# This approach tells us more about the independent behavior of each stock
security_returns = np.log(security_data / security_data.shift(1))
security_returns

Date,PG,BEI.DE
2007-01-03,,
2007-01-04,-0.007621,0.006544
2007-01-05,-0.008624,-0.020772
2007-01-08,0.002202,0.000202
2007-01-09,-0.002517,-0.022858
...,...,...
2025-06-17,-0.014778,-0.007719
2025-06-18,-0.003159,-0.021655
2025-06-20,0.006686,-0.007950
2025-06-23,0.012183,0.007018
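
One convenient property of log returns, sketched here on a handful of made-up prices (not the downloaded data), is that they add up over time: the sum of the daily log returns equals the log of the total price change over the period.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 102.0, 99.0, 105.0])  # made-up prices

simple_returns = prices / prices.shift(1) - 1
log_returns = np.log(prices / prices.shift(1))

# The first row is NaN because there is no previous price; sum() skips it
total_log_return = log_returns.sum()
print(total_log_return, np.log(prices.iloc[-1] / prices.iloc[0]))

# A log return is log(1 + simple return)
print(np.allclose(log_returns.dropna(), np.log1p(simple_returns.dropna())))
```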


### PG

In [4]:
print(f"PG average daily return: {security_returns['PG'].mean()}")
print(f"PG average annual return: {security_returns['PG'].mean() * 250}")
print(f"PG daily standard deviation of return: {security_returns['PG'].std()}")
print(f"PG annual standard deviation return: {security_returns['PG'].std() * 250 ** 0.5}")

PG average daily return: 0.00030902461645240906
PG average annual return: 0.07725615411310227
PG daily standard deviation of return: 0.011673526898691531
PG annual standard deviation return: 0.18457466663553446


### Beiersdorf

In [5]:
print(f"BEI.DE average daily return: {security_returns['BEI.DE'].mean()}")
print(f"BEI.DE average annual return: {security_returns['BEI.DE'].mean() * 250}")
print(f"BEI.DE daily standard deviation of return: {security_returns['BEI.DE'].std()}")
print(f"BEI.DE annual standard deviation return: {security_returns['BEI.DE'].std() * 250 ** 0.5}")

BEI.DE average daily return: 0.0001954908431880639
BEI.DE average annual return: 0.048872710797015974
BEI.DE daily standard deviation of return: 0.013483191875342354
BEI.DE annual standard deviation return: 0.2131879822757946
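
Why multiply the mean by 250 but the standard deviation by the square root of 250? A sketch of the reasoning: if daily returns are roughly uncorrelated from one day to the next (an assumption, not something verified here), variances add up across days, so the annual variance is 250 times the daily variance and the standard deviation scales with the square root. The snippet reuses the daily PG figures printed above.

```python
import numpy as np

TRADING_DAYS = 250  # the same approximation used in the cells above

# Daily figures copied from the PG output above
daily_mean = 0.00030902461645240906
daily_std = 0.011673526898691531

annual_mean = daily_mean * TRADING_DAYS              # means scale linearly
annual_variance = daily_std ** 2 * TRADING_DAYS      # variances scale linearly
annual_std = daily_std * np.sqrt(TRADING_DAYS)       # std scales with sqrt(time)

print(annual_mean, annual_std)
print(np.isclose(annual_std, np.sqrt(annual_variance)))
```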


### Benefits of Portfolio Diversification
There is a relationship between the prices of companies. While share prices are influenced by the state of the economy, the state of the economy impacts different industries in different ways. For instance, during a recession, it's easier to wait for a new car than wait to buy groceries, so Ford might be hit harder than Walmart.

### Measuring the relationship between stocks
Covariance measures the direction of the linear relationship between two variables. It is expressed in the variables' original units, so its magnitude is hard to compare across different pairs of variables
<br><img src='assets/covariance.png'><br>
<br>
Correlation coefficient is a standardized version of covariance that scales the value to a range between -1 and +1 (hence the std dev of x and y on the bottom)
<br><img src='assets/correlation_coefficient.png'><br>

A perfect positive correlation (+1) would imply a perfectly linear increasing relationship, e.g. house prices rising in exact lockstep with house size, or population density with socioeconomic status. A perfect negative correlation (-1) would imply a perfectly linear INVERSE relationship, i.e. house prices go up as the size of the house goes down (yes, that would be weird). No correlation (0) would imply there is no linear relationship between the variables, such as the price of coffee in Brazil vs. the price of a house in London. Note that zero correlation does not by itself guarantee independence, since the variables could still be related in a nonlinear way.
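
The three cases can be sketched with small made-up datasets. The helper functions `covariance` and `correlation` below are hypothetical names written for illustration. They assume the sample (n - 1) versions of the formulas.

```python
import numpy as np

def covariance(x, y):
    # Sample covariance: average product of deviations, dividing by n - 1
    return ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

def correlation(x, y):
    # Covariance scaled by both standard deviations, giving a value in [-1, +1]
    return covariance(x, y) / (x.std(ddof=1) * y.std(ddof=1))

size = np.array([50.0, 80.0, 100.0, 120.0, 150.0])  # made-up house sizes
price = 2000.0 * size + 10000.0                     # exactly linear in size

print(correlation(size, price))    # perfectly linear, increasing
print(correlation(size, -price))   # perfectly linear, decreasing

# Zero correlation without independence: y is fully determined by x
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(correlation(x, x ** 2))
```

The last case is the caveat: `x ** 2` depends entirely on `x`, yet the linear correlation is zero because the relationship is symmetric rather than linear.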

<br>

A Covariance Matrix is a representation of the way two or more variables relate to each other. The covariance of a variable with itself is just its variance, i.e. cov(x,x) = var(x). When looking at a Covariance Matrix, the diagonal from top-left to bottom-right holds the variances of each variable, while all other values are covariances.
<br><br>
<img src='assets/covariance_matrix.png'>
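
A quick sketch on simulated data can confirm both properties: the diagonal holds the variances, and the matrix is symmetric. The column names `A` and `B` and the random returns are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical daily returns for two made-up assets
returns = pd.DataFrame(rng.normal(0, 0.01, size=(500, 2)), columns=["A", "B"])

cov = returns.cov()
print(cov)

# Diagonal entries are the individual variances
print(np.allclose(np.diag(cov.to_numpy()), returns.var()))
# Off-diagonal entries mirror each other: cov(A, B) == cov(B, A)
print(np.isclose(cov.loc["A", "B"], cov.loc["B", "A"]))
```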

### Calculating Covariance and Correlation

In [8]:
# THANKFULLY pd.Series.var() GETS THE VARIANCE FOR US! GOD BLESS AMERICA
pg_variance = security_returns['PG'].var()
pg_variance

np.float64(0.0001362712302544747)

In [9]:
bei_variance = security_returns['BEI.DE'].var()
bei_variance

np.float64(0.0001817964631472981)

In [10]:
pg_annual_variance = pg_variance * 250
pg_annual_variance

np.float64(0.03406780756361868)

In [11]:
bei_annual_variance = bei_variance * 250
bei_annual_variance

np.float64(0.04544911578682452)

In [12]:
# HOLY SHIT! pd.DataFrame.cov() COMPUTES PAIRWISE COVARIANCE OF COLUMNS! WE ARE SO BLESSED
cov_matrix = security_returns.cov()
cov_matrix

,PG,BEI.DE
PG,0.000136,4.1e-05
BEI.DE,4.1e-05,0.000182


In [13]:
annual_cov_matrix = cov_matrix * 250
annual_cov_matrix

,PG,BEI.DE
PG,0.034068,0.010324
BEI.DE,0.010324,0.045449


Note how the top-left and bottom-right cells are just the variances: they hold Cov(PG, PG) and Cov(BEI, BEI), which equal Var(PG) and Var(BEI) respectively. Meanwhile, the other two cells contain the actual covariance between the two variables.

In [14]:
# AND IF THAT WASN'T ENOUGH, .corr() COMPUTES PAIRWISE CORRELATIONS OF COLUMNS
corr_matrix = security_returns.corr()
corr_matrix

,PG,BEI.DE
PG,1.0,0.261842
BEI.DE,0.261842,1.0


Note that the correlation matrix shown above displays the correlation between the two assets' RETURNS, not their PRICES. The difference between the two is:
- corr(prices): focuses on stock price levels
- corr(returns) {aka the correlation of the RATES of return}: reflects how the day-to-day price changes of the two stocks move together, which is what matters for the returns of your portfolio
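
To see why the distinction matters, here is a sketch with two simulated stocks whose daily returns are generated independently but share the same upward drift. All numbers are assumptions chosen for illustration. Their price levels look strongly correlated simply because both trend upward, while their returns are essentially uncorrelated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_days = 2000

# Two hypothetical stocks: independent daily log returns, same positive drift
log_returns = pd.DataFrame(
    rng.normal(loc=0.002, scale=0.01, size=(n_days, 2)),
    columns=["A", "B"],
)
# Rebuild price paths from the cumulative log returns, starting near 100
prices = 100 * np.exp(log_returns.cumsum())

price_corr = prices["A"].corr(prices["B"])
returns_corr = log_returns["A"].corr(log_returns["B"])

print(f"corr(prices):  {price_corr:.3f}")
print(f"corr(returns): {returns_corr:.3f}")
```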

<br>

Also: don't annualize the correlation table. It does not contain average daily values; it is a unitless measure of the relationship between the two variables
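
A sketch of why annualizing is unnecessary: scaling a covariance matrix by 250 multiplies the covariance and both variances by the same factor, which cancels out in the correlation formula. The simulated returns and the helper `corr_from_cov` are hypothetical, written only for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
common = rng.normal(0, 0.01, 1000)  # shared component so A and B are correlated
returns = pd.DataFrame({
    "A": common + rng.normal(0, 0.01, 1000),
    "B": common + rng.normal(0, 0.01, 1000),
})

def corr_from_cov(cov):
    # Divide each covariance by the product of the two standard deviations
    std = np.sqrt(np.diag(cov.to_numpy()))
    return cov / np.outer(std, std)

daily_cov = returns.cov()
annual_cov = daily_cov * 250

# The factor of 250 cancels, so both give the same correlation matrix
print(np.allclose(corr_from_cov(daily_cov), corr_from_cov(annual_cov)))
print(np.allclose(corr_from_cov(daily_cov), returns.corr()))
```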

### Portfolio Variance (The risk of multiple securities)
Remember from algebra that: (a + b)<sup>2</sup> = a<sup>2</sup> + 2ab + b<sup>2</sup>
<br>
Well, calculating portfolio variance (with 2 stocks) is similar: