<a href="https://colab.research.google.com/github/faijurrahman/Quant-Finance-Projects/blob/master/variance-and-measures-of-dispersion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Variance & Measures of Dispersion

Dispersion measures how spread out a dataset is. One way to measure risk is by how spread out historical returns have been: returns clustered tightly around a central value give us less reason to worry than returns that swing widely.

In [2]:
!pip install yfinance
import numpy as np
import yfinance as yf
from pandas_datareader import data as pdr

# Make pandas_datareader's Yahoo download calls go through yfinance
yf.pdr_override()


In [None]:
tickers = ['BSX', 'CMCSA', 'NFLX']
# tickers = ['BSX', 'CMCSA', 'F', 'HAL', 'JNJ', 'MET', 'NFLX', 'PEP', 'DGX', 'SYK', 'UAA']
tickerset = {}
for ticker in tickers:
    # df is of type <class 'pandas.core.frame.DataFrame'>
    df = pdr.get_data_yahoo(ticker, start="2020-01-01", end="2020-06-01", interval = "1d")
    tickerset[ticker] = df['Adj Close']



### Initial Calculations
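Mean: the arithmetic average of the observations.

Range: the difference between the largest and smallest observation (also called "peak-to-peak").

Mean Absolute Deviation (MAD): the average of the absolute distances of the observations from the arithmetic mean.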
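For reference, the three quantities can be written as follows for observations $X_1, \dots, X_n$ (the notation here is an illustrative choice, not taken from the original notebook):

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\text{Range} = \max_i X_i - \min_i X_i, \qquad
\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} \lvert X_i - \mu \rvert
```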

In [None]:
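sample_ticker = 'NFLX'
X = tickerset[sample_ticker]
print(sample_ticker)

mu = np.mean(X)
print('Mean: ' + str(mu))

# Range is the "peak-to-peak" distance max(X) - min(X), equivalent to np.ptp.
# Named value_range so Python's built-in range() is not shadowed.
value_range = X.max() - X.min()
print('Range: ' + str(value_range))

# MAD: average absolute distance of each observation from the mean
abs_dispersion = np.abs(X - mu)
MAD = np.mean(abs_dispersion)
print('MAD:', MAD)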

NFLX
Mean: 379.5203878161977
Range: 155.35000610351562
MAD: 32.896367643609636


### Variance & Std. Deviation

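Variance (&sigma;<sup>2</sup>): the average of the squared deviations from the mean.

Standard deviation (&sigma;): the square root of the variance, expressed in the same units as the data. Note that `np.var` and `np.std` divide by n by default (`ddof=0`).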
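As a minimal check, assuming a small hypothetical toy array rather than the downloaded prices, the definitions can be computed by hand and compared against NumPy's defaults:

```python
import numpy as np

# Hypothetical toy data, not from the notebook's price download
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = x.mean()                       # arithmetic mean (5.0 here)
variance = np.mean((x - mu) ** 2)   # average squared deviation, dividing by n
std_dev = np.sqrt(variance)         # square root of the variance

# np.var / np.std use the same divide-by-n convention by default (ddof=0)
print(variance, np.var(x))   # 4.0 4.0
print(std_dev, np.std(x))    # 2.0 2.0
```

Squaring the deviations makes large moves count disproportionately, which is one reason variance and standard deviation are more sensitive to outliers than MAD.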

In [None]:
print('Variance of X:', np.var(X))
print('Standard deviation of X:', np.std(X))

Variance of X: 1517.5312504991396
Standard deviation of X: 38.95550346869027


### Confirming Chebyshev's Inequality

The proportion of observations within k standard deviations of the mean (that is, within a distance of k ⋅ &sigma;) is at least 1 − 1/k<sup>2</sup> for every k > 1.

The bound is usually far from tight, but it is useful because it holds for any data set and any distribution with finite variance.
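A minimal sketch of the same check over several values of k, assuming a hypothetical heavy-tailed sample drawn from a Student's t distribution (the seed and sample size are arbitrary choices). Because the standard deviation is computed with `ddof=0`, the bound holds exactly for the empirical distribution of any sample:

```python
import numpy as np

def chebyshev_holds(data, k):
    """Return (share within k std devs, Chebyshev bound, whether share >= bound)."""
    data = np.asarray(data, dtype=float)
    mu, sigma = data.mean(), data.std()          # population std (ddof=0), as np.std defaults to
    share_within = np.mean(np.abs(data - mu) <= k * sigma)
    bound = 1 - 1 / k**2
    return share_within, bound, share_within >= bound

# Hypothetical heavy-tailed sample; seed chosen arbitrarily for reproducibility
rng = np.random.default_rng(0)
sample = rng.standard_t(df=3, size=1_000)

for k in (1.25, 1.5, 2, 3):
    share, bound, ok = chebyshev_holds(sample, k)
    print(f"k={k}: share within = {share:.3f}, bound = {bound:.3f}, holds = {ok}")
```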

In [None]:
k = 1.25
dist = k * np.std(X)
# Observations whose distance from the mean is at most k standard deviations
within = [x for x in X if abs(x - mu) <= dist]
print('Observations within', k, 'stds of mean:', len(within))
print('Confirming that', len(within) / len(X), '>', 1 - 1 / k**2)

Observations within 1.25 stds of mean: 78
Confirming that 0.7572815533980582 > 0.36


### Semivariance & Semideviation

Variance and standard deviation tell us how volatile a quantity is, but do not differentiate between deviations upward and deviations downward. 

In the case of returns on an asset, we are more worried about deviations downward. Semivariance and semideviation address this by counting only the observations at or below the mean.
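A minimal sketch as a reusable function, assuming hypothetical toy data. It mirrors the notebook's convention of dividing by the number of below-mean observations; other references divide by the full sample size instead, which gives a smaller number:

```python
import numpy as np

def semivariance(data):
    """Average squared deviation of the observations at or below the mean.

    Divides by the number of downside observations, not the full sample size.
    """
    data = np.asarray(data, dtype=float)
    mu = data.mean()
    lows = data[data <= mu]               # keep only the downside observations
    return np.mean((lows - mu) ** 2)

toy = [1, 2, 3, 4, 5]                     # hypothetical data, mean = 3
sv = semivariance(toy)                    # lows are 1, 2, 3 -> (4 + 1 + 0) / 3
print(sv, np.sqrt(sv))                    # semivariance 5/3, semideviation about 1.291
```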

In [None]:
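# NumPy has no built-in semideviation, so compute it directly.
# Convert to an array so the elementwise subtraction below is well-defined.
lows = np.array([e for e in X if e <= mu])   # observations at or below the mean

# Average squared deviation of the downside observations
semivar = np.sum((lows - mu) ** 2) / len(lows)

print('Semivariance of X:', semivar)
print('Semideviation of X:', np.sqrt(semivar))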

Semivariance of X: 1165.010429549552
Semideviation of X: 34.132249113551715


#### A Related Notion

Target semivariance (and target semideviation) apply the same idea relative to a chosen target B instead of the mean: they average the squared distance from B of the values that fall at or below B.
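A minimal sketch as a function, assuming hypothetical toy data. Returning 0.0 when no observation is at or below the target is our own design choice (no downside relative to the target); the original cell would instead raise a division-by-zero error in that case:

```python
import numpy as np

def target_semivariance(data, target):
    """Average squared distance below `target`, over observations at or below it."""
    data = np.asarray(data, dtype=float)
    lows = data[data <= target]
    if lows.size == 0:                    # assumption: no downside means zero semivariance
        return 0.0
    return np.mean((lows - target) ** 2)

toy = [1, 2, 3, 4, 5]                     # hypothetical data
tsv = target_semivariance(toy, 2.5)       # lows are 1 and 2 -> (2.25 + 0.25) / 2
print(tsv, np.sqrt(tsv))                  # 1.25 and about 1.118
```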

In [None]:
B = 330   # target price level
lows_B = np.array([e for e in X if e <= B])   # observations at or below the target
semivar_B = np.sum((lows_B - B) ** 2) / len(lows_B)

print('Target semivariance of X:', semivar_B)
print('Target semideviation of X:', np.sqrt(semivar_B))

Target semivariance of X: 192.30504586489405
Target semideviation of X: 13.867409486450383


### Closing

These computations produce sample statistics, that is, statistics such as the standard deviation of one particular sample of data. Whether they reflect the true population values is not always obvious, and establishing that takes more work. This is especially problematic in finance, where data are typically time series whose mean and variance can change over time. Various techniques exist to account for this.
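Two details are worth knowing when treating these numbers as sample estimates. First, NumPy divides by n by default (`ddof=0`), while the usual unbiased sample variance divides by n − 1 (`ddof=1`), which is also the pandas default. Second, one common way to track a variance that drifts over time is to compute it over a rolling window. A minimal sketch on hypothetical toy data (the window length of 4 is arbitrary):

```python
import numpy as np
import pandas as pd

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # hypothetical data

pop_var = np.var(x)              # ddof=0: divide by n     -> 32 / 8 = 4.0
sample_var = np.var(x, ddof=1)   # ddof=1: divide by n - 1 -> 32 / 7, about 4.571
print(pop_var, sample_var)

# pandas Series.var() / Series.std() default to ddof=1
s = pd.Series(x)
print(s.var(), s.std())

# Standard deviation over a trailing window of 4 observations;
# the first 3 entries are NaN because the window is not yet full
rolling_std = s.rolling(window=4).std()
print(rolling_std)
```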