# Mean, variance, and normal distribution

## Moments of a distribution

Probability distributions are described by the following moments:
1. Mean (mu) -- the average outcome
2. Variance (sigma-squared) -- measure of the variability in outcomes
3. Skewness -- measure of the tilt (asymmetry) of the distribution
4. Kurtosis -- measure of the thickness of the tails of the distribution
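
As a rough illustration, the four moments can be computed directly from their definitions and checked against scipy. The sample here is synthetic (the parameters and seed are arbitrary assumptions), not the stock data used later in this notebook.

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Hypothetical sample: 10,000 draws from a normal distribution (fixed seed).
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.001, scale=0.02, size=10_000)

mean = sample.mean()             # first moment
variance = sample.var()          # second central moment (population, ddof=0)
std = np.sqrt(variance)
z = (sample - mean) / std        # standardized values

skewness = np.mean(z**3)         # third standardized moment
kurt = np.mean(z**4)             # fourth standardized moment (not excess)

# scipy's default (biased) estimators use the same population formulas.
same_skew = np.isclose(skewness, skew(sample))
same_kurt = np.isclose(kurt, kurtosis(sample, fisher=False))
print(mean, variance, skewness, kurt, same_skew, same_kurt)
```

For a normal sample of this size, the skewness should land close to 0 and the kurtosis close to 3.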


## Standard normal distribution
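* A special case of the normal distribution with mean (mu) = 0 and standard deviation (sigma) = 1
* A normal distribution has a skew of 0 and a kurtosis of 3, so samples drawn from one tend to have a skew near 0 and a kurtosis near 3
* Financial returns often have a skew > 0 and a kurtosis > 3
* Returns therefore tend to show more outliers than a normal distribution would predict, with the largest moves more often on the positive side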
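
The theoretical values above can be read off scipy's `norm` distribution object. This is a small sketch; the variable names are illustrative, and scipy reports kurtosis here as excess kurtosis, so 3 is added back.

```python
from scipy.stats import norm

# norm defaults to loc=0 (mean) and scale=1 (standard deviation),
# i.e. the standard normal distribution.
mean, var, skewness, excess_kurt = norm.stats(moments='mvsk')

# Convert excess kurtosis back to plain kurtosis.
kurt = float(excess_kurt) + 3
print(float(mean), float(var), float(skewness), kurt)
```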

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
fpath_csv = 'CSV_stockdata.csv'
StockPrices = pd.read_csv(fpath_csv, parse_dates=['Date'])
StockPrices = StockPrices.sort_values(by='Date')
StockPrices['Returns'] = StockPrices['Adj Close'].pct_change()
StockPrices.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Returns
0,2018-09-21,22.469999,22.549999,22.370001,22.379999,22.379999,335600,
1,2018-09-24,22.33,22.559999,21.889999,21.91,21.91,104900,-0.021001
2,2018-09-25,21.91,22.08,21.809999,21.85,21.85,107200,-0.002738
3,2018-09-26,21.889999,22.129999,21.67,21.790001,21.790001,170700,-0.002746
4,2018-09-27,21.700001,21.82,21.52,21.530001,21.530001,82200,-0.011932


In [4]:
# calculate the mean
np.mean(StockPrices['Returns'])

-0.005435528910180826

In [5]:
# Calculate the average annualized return assuming 252 trading days
# important: must add one before raising to the power of 252!

((1+np.mean(StockPrices['Returns']))**252)-1

-0.746778200682007
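
To see why the `1 +` matters, here is a minimal sketch comparing compounding with two common shortcuts. The daily return of 0.05% is a made-up figure, and 252 trading days is the same assumption as above.

```python
daily_return = 0.0005   # hypothetical average daily return (0.05%)
trading_days = 252      # assumed number of trading days per year

# Compounded: grow 1 unit by (1 + r) each day, then subtract the initial unit.
compounded = (1 + daily_return) ** trading_days - 1

# Simple scaling ignores compounding, so it understates a positive return.
simple = daily_return * trading_days

# Forgetting to add 1 raises a tiny number to a large power: effectively zero.
forgot_plus_one = daily_return ** trading_days

print(compounded, simple, forgot_plus_one)
```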

In [6]:
## Variance and Standard Deviation (volatility)
# variance = sigma-squared (std-squared)
# higher volatility = higher risk (measure dispersion of returns)

np.std(StockPrices['Returns'])

0.009600312789186207

In [7]:
# variance = std ** 2
np.std(StockPrices['Returns'])**2

9.216600565021226e-05

In [8]:
# Volatility scales with the square root of time
# multiply by the square root of the number of trading days in the year
np.std(StockPrices['Returns']) * np.sqrt(252)

0.1524002408917174
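
The square-root-of-time rule follows from variances adding up across independent days. The simulation below is a sketch under that independence assumption, using synthetic normal daily returns (all parameters are hypothetical) and summing them as an approximation of the annual return.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma_daily = 0.01      # hypothetical daily volatility
trading_days = 252      # assumed trading days per year
n_years = 20_000        # number of simulated years

# Each row is one simulated year of independent daily returns.
daily = rng.normal(0.0, sigma_daily, size=(n_years, trading_days))

# Approximate each annual return as the sum of its daily returns.
annual = daily.sum(axis=1)

simulated = annual.std()                        # volatility of the simulated years
scaled = sigma_daily * np.sqrt(trading_days)    # square-root-of-time rule
print(simulated, scaled)
```

If daily returns are autocorrelated, the two numbers can diverge, so the rule is best read as an approximation.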

In [11]:
# Recap
# Import numpy as np
import numpy as np

# Calculate the average daily return of the stock
mean_return_daily = np.mean(StockPrices['Returns'])
print('Mean daily return: ', mean_return_daily)

# Calculate the implied annualized average return
mean_return_annualized = ((1+mean_return_daily)**252)-1
print('Mean annualized return: ', mean_return_annualized)

# Calculate the standard deviation of daily return of the stock
sigma_daily = np.std(StockPrices['Returns'])
print('Daily standard deviation (sigma): ', sigma_daily)

# Calculate the daily variance
variance_daily = sigma_daily**2
print('Daily variance (sigma-squared): ', variance_daily)

# Annualize the standard deviation
sigma_annualized = sigma_daily*np.sqrt(252)
print('Annualized deviation: ', sigma_annualized)

# Calculate the annualized variance
variance_annualized = sigma_annualized**2
print('Annualized variance: ', variance_annualized)

Mean daily return:  -0.005435528910180826
Mean annualized return:  -0.746778200682007
Daily standard deviation (sigma):  0.009600312789186207
Daily variance (sigma-squared):  9.216600565021226e-05
Annualized deviation:  0.1524002408917174
Annualized variance:  0.02322583342385349


## Skew and Kurtosis
### Skew
* Measure of how much a distribution leans to the left or the right
* Negative skew: longer left tail, with the bulk of the distribution leaning right
* Positive skew: longer right tail, with the bulk of the distribution leaning left
* In finance we want positive skew: a higher likelihood of significant returns on the right-hand side, and a compressed, predictable distribution of negative returns
* Skew far from 0 (in either direction) indicates possible non-normality

In [13]:
from scipy.stats import skew
skew(StockPrices['Returns'].dropna())

0.06737216053396748
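
The sign convention is easy to mix up, so here is a small sketch on synthetic data (the distributions and seed are arbitrary choices for illustration). A long right tail gives positive skew, and mirroring the sample flips the sign.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)

# Lognormal draws have a long right tail.
right_tailed = rng.lognormal(mean=0.0, sigma=0.5, size=50_000)
# Mirroring the sample moves the long tail to the left.
left_tailed = -right_tailed
# A normal sample is symmetric.
symmetric = rng.normal(size=50_000)

skew_right = skew(right_tailed)   # positive
skew_left = skew(left_tailed)     # negative, same magnitude
skew_sym = skew(symmetric)        # close to 0
print(skew_right, skew_left, skew_sym)

# With a long right tail, the mean is pulled above the median.
print(np.mean(right_tailed) > np.median(right_tailed))
```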

### Kurtosis
* Measure of the thickness of the tails of the distribution
* Used as a proxy for the probability of outliers
* Normal distribution: kurtosis of 3
* Financial returns tend to have positive excess kurtosis, i.e. kurtosis > 3 (leptokurtic)
* Kurtosis is often compared to that of a normal distribution, so many Python functions return excess kurtosis (kurtosis minus 3) by default
* In scipy, the kurtosis function computes **excess kurtosis** by default
* Excess kurtosis far from 0 indicates possible non-normality
* High excess kurtosis indicates high risk (large movements)
* High-kurtosis distributions are said to have "thick tails" -- outliers are more common

In [14]:
from scipy.stats import kurtosis
kurtosis(StockPrices['Returns'].dropna())

-0.8394822399421926
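
The difference between kurtosis and excess kurtosis in scipy comes down to the `fisher` argument. The sketch below uses synthetic samples (a normal and a Laplace distribution, chosen only for illustration) to show both conventions.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(3)
normal_sample = rng.normal(size=100_000)
# The Laplace distribution has thicker tails than the normal.
thick_tailed = rng.laplace(size=100_000)

# Default (fisher=True): excess kurtosis, where a normal scores about 0.
normal_excess = kurtosis(normal_sample)
thick_excess = kurtosis(thick_tailed)

# fisher=False: plain kurtosis, where a normal scores about 3.
normal_plain = kurtosis(normal_sample, fisher=False)

print(normal_excess, thick_excess, normal_plain)
```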

## Shapiro-Wilk tests
* test whether a sample is consistent with having been drawn from a normal distribution
* null hypothesis: the data are normally distributed
    * if p <= 0.05, reject the null hypothesis at the 5% level and treat the data as non-normal
    * if p > 0.05, the test fails to reject normality, which is not proof that the data are normal

You can use the shapiro() function from scipy.stats to run a Shapiro-Wilk test of normality on the stock returns. The function returns two values. The first value is the test statistic (W), and the second value is the p-value. You can use the p-value to make a judgement about the normality of the data. If the p-value is less than or equal to 0.05, you can reject the null hypothesis of normality at the 5% significance level and treat the data as non-normally distributed.
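
Before applying the test to the returns, it can help to see how it behaves on data with a known shape. This is a sketch on synthetic samples (the sample size, seed, and the 0.05 threshold are assumptions for illustration).

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(7)
normal_sample = rng.normal(size=200)        # drawn from a normal distribution
skewed_sample = rng.exponential(size=200)   # clearly non-normal

# shapiro returns the W statistic and the p-value.
w_normal, p_normal = shapiro(normal_sample)
w_skewed, p_skewed = shapiro(skewed_sample)

alpha = 0.05  # assumed significance level
print("normal sample: W =", w_normal, "p =", p_normal, "reject:", p_normal <= alpha)
print("skewed sample: W =", w_skewed, "p =", p_skewed, "reject:", p_skewed <= alpha)
```

The exponential sample should be rejected decisively. A truly normal sample will usually not be rejected, although about 5% of such samples will be by chance at this threshold.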

In [16]:
## Recap
# Import skew from scipy.stats
from scipy.stats import skew

# Drop the missing values
clean_returns = StockPrices['Returns'].dropna()

# Calculate the third moment (skewness) of the returns distribution
returns_skewness = skew(clean_returns)
print(returns_skewness)

# Import kurtosis from scipy.stats
from scipy.stats import kurtosis

# Calculate the excess kurtosis of the returns distribution
excess_kurtosis = kurtosis(clean_returns)
print(excess_kurtosis)

# Derive the kurtosis (fourth standardized moment) of the returns distribution: kurtosis = excess_kurtosis + 3
fourth_moment = excess_kurtosis+3
print(fourth_moment)

# Import shapiro from scipy.stats
from scipy.stats import shapiro

# Run the Shapiro-Wilk test on the stock returns
shapiro_results = shapiro(clean_returns)
print("Shapiro results:", shapiro_results)

# Extract the p-value from the shapiro_results
p_value = shapiro_results[1]
print("P-value: ", p_value)

0.06737216053396748
-0.8394822399421926
2.1605177600578074
Shapiro results: (0.9626290202140808, 0.5976030826568604)
P-value:  0.5976030826568604
