# Python for Finance 2020

MSc in Finance, Universidade Católica Portuguesa

João Brogueira de Sousa [jbsousa@ucp.pt]

## Plotting and testing Normality of stock market returns

The [Normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) is one of the most important probability distributions in Finance. You have previously met several financial models that assume the data are normally distributed (*Portfolio theory*, certain derivations of the *CAPM*, etc.).
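For reference, a random variable $X \sim N(\mu, \sigma^2)$ has the density and standardized third and fourth moments below. Skewness 0 and kurtosis 3 (excess kurtosis 0) are the benchmarks that the sample statistics and tests in this notebook are compared against.

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad
\mathrm{skew}(X) = \frac{E\left[(X-\mu)^3\right]}{\sigma^3} = 0, \qquad
\mathrm{kurt}(X) = \frac{E\left[(X-\mu)^4\right]}{\sigma^4} = 3.
$$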

In this notebook we will use our Python skills to plot the distribution of returns and test their Normality empirically.

We will use [statsmodels](https://www.statsmodels.org/stable/about.html#about-statsmodels), "a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests".

In [None]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import scipy.stats as sps # https://docs.scipy.org/doc/scipy/reference/stats.html
import statsmodels.api as stm # https://www.statsmodels.org/stable/api.html

The following cell imports price-level time series for Vanguard's [S&P 500 ETF](https://investor.vanguard.com/etf/profile/VOO) (`VOO`) and [Total World Stock ETF](https://investor.vanguard.com/etf/profile/VT) (`VT`), and for Apple (`AAPL`) and Microsoft (`MSFT`) stocks, downloaded from [finance.yahoo.com](https://finance.yahoo.com/quote/AAPL?p=AAPL&.tsrc=fin-srch).

In [None]:
# Import data on Vanguard's ETFs VOO, VT, Apple and Microsoft:
data = pd.read_csv("VOO.csv")

data.head()

As we usually do, we set the Date column as the `index`:

In [None]:
data.set_index('Date', inplace=True) # Set the Index

In [None]:
data.index = pd.to_datetime(data.index, format="%d/%m/%Y") # Set Index dtype=datetime

In [None]:
data.head()
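Passing an explicit `format` matters because a date string such as `03/01/2020` is ambiguous. A small sketch, assuming (as the format string above implies) that the CSV stores dates as day/month/year; the date string itself is only an illustrative example:

```python
import pandas as pd

# Illustrative day-first date string, not taken from the CSV
raw = ["03/01/2020"]

day_first = pd.to_datetime(raw, format="%d/%m/%Y")    # parsed as 3 January 2020
month_first = pd.to_datetime(raw, format="%m/%d/%Y")  # parsed as 1 March 2020

print(day_first[0].date(), month_first[0].date())
```

With the wrong format string, the returns computed below would be based on a mis-ordered index.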

Let's compute log returns from the level data:

In [None]:
log_returns = np.log(data/data.shift(1))

log_returns.head()
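To see what the formula above does, here is a sketch on a short, made-up price series (illustrative numbers, not market data). It checks that log returns equal $\log(1 + R_t)$, where $R_t$ is the simple return, and that they add up over time to the log of the total gross return. It also shows why the first row is `NaN`: there is no previous price to divide by, which is what the `dropna` in the next cell removes.

```python
import numpy as np
import pandas as pd

# Hypothetical price series (illustrative numbers only)
prices = pd.Series([100.0, 102.0, 99.0, 101.0])

log_ret = np.log(prices / prices.shift(1))   # same formula as the cell above
simple_ret = prices / prices.shift(1) - 1    # simple returns R_t = P_t / P_{t-1} - 1

# Log returns equal log(1 + simple returns)
assert np.allclose(log_ret.dropna(), np.log1p(simple_ret.dropna()))

# Log returns are time-additive: their sum is log(P_T / P_0)
total = log_ret.sum()                        # sum() skips the leading NaN
assert np.isclose(total, np.log(prices.iloc[-1] / prices.iloc[0]))

print(log_ret)
```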

In [None]:
log_returns.dropna(inplace=True)
type(log_returns)

The first thing we may want to do to visualize the empirical distribution of the returns we have just computed is to produce a histogram.

Since we are working with a pandas DataFrame, we can use its ready-to-use `hist` method:

In [None]:
# log_returns.hist?

In [None]:
fig = log_returns.hist(bins=50, figsize=(10,8))
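A histogram is easier to judge against the normal benchmark when a normal density with the same mean and standard deviation is drawn on top of it. The sketch below does this for one series; since the CSV is not reproduced here, it uses simulated Student-t draws as a hypothetical stand-in for a column of `log_returns`.

```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as sps

# Hypothetical stand-in for one column of log_returns: fat-tailed Student-t draws
rng = np.random.default_rng(42)
r = 0.01 * rng.standard_t(df=4, size=2000)

fig, ax = plt.subplots(figsize=(8, 5))
# density=True rescales the bars so their total area is 1, comparable with a pdf
ax.hist(r, bins=50, density=True, alpha=0.6, label='data')

# Normal pdf using the sample mean and sample standard deviation
grid = np.linspace(r.min(), r.max(), 400)
ax.plot(grid, sps.norm.pdf(grid, loc=r.mean(), scale=r.std(ddof=1)), label='fitted normal')
ax.legend()
```

To apply it to the real data, replace `r` with, for example, `log_returns['VOO']`.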

To compute descriptive statistics of these data, we can use the `scipy.stats.describe` function:

In [None]:
# sps.describe?
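`describe` returns a named tuple, so its fields can be read by name (`nobs`, `minmax`, `mean`, `variance`, `skewness`, `kurtosis`) instead of by position. Note that its skewness and kurtosis are the biased estimators by default (`bias=True`), while pandas' `Series.skew()` and `Series.kurt()` use the bias-corrected versions, so the two libraries can report slightly different numbers. A sketch on simulated data (a hypothetical stand-in for one column of `log_returns`):

```python
import numpy as np
import pandas as pd
import scipy.stats as sps

# Hypothetical sample standing in for one column of log_returns
rng = np.random.default_rng(1)
s = pd.Series(0.01 * rng.standard_t(df=5, size=1000))

d = sps.describe(s)
print(d.nobs, d.minmax, d.mean, d.variance, d.skewness, d.kurtosis)

# pandas' skew()/kurt() match scipy's bias-corrected estimators
assert np.isclose(s.skew(), sps.skew(s, bias=False))
assert np.isclose(s.kurt(), sps.kurtosis(s, bias=False))
# Both report the sample variance with ddof=1
assert np.isclose(s.var(), d.variance)
```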

Let's create and print a table for each asset, with the descriptive statistics returned by `describe`:

In [None]:
# Here we use `sps.describe` for descriptive statistics
ticker = ['VOO', 'VT', 'AAPL', 'MSFT']

# Details about printing data here: 
# https://docs.python.org/3/tutorial/inputoutput.html
for tic in ticker:
    print(f'{tic}') 
    print(32 * '-')
    dstat = sps.describe(log_returns[tic].dropna())
    print('%15s %15s' % ('statistic', 'value'))
    print(32 * '-')
    # `describe` returns a named tuple, so fields can be accessed by name
    print('%15s %15d' % ('size', dstat.nobs))
    print('%15s %15.4f' % ('min', dstat.minmax[0]))
    print('%15s %15.4f' % ('max', dstat.minmax[1]))
    print('%15s %15.4f' % ('mean', dstat.mean))
    print('%15s %15.4f' % ('var', dstat.variance))
    print('%15s %15.4f' % ('skew', dstat.skewness))
    print('%15s %15.4f' % ('kurtosis', dstat.kurtosis))

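The four kurtosis values appear to be especially far from what we would expect from normally distributed variables. Note that `describe` reports excess (Fisher) kurtosis, which is 0, not 3, for a normal distribution.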
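A quick way to calibrate these numbers is to compute the same statistic on simulated data. In the sketch below, a large standard normal sample has excess kurtosis close to 0, while a Student-t sample with 5 degrees of freedom (theoretical excess kurtosis $6/(5-4) = 6$) is clearly fat-tailed; both samples are simulated, not market data. Passing `fisher=False` returns Pearson kurtosis, which is 3 for a normal.

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)          # normal benchmark
t = rng.standard_t(df=5, size=100_000)    # fat-tailed alternative

# Default fisher=True: excess kurtosis, about 0 for normal data
print(sps.kurtosis(z), sps.kurtosis(t))
# fisher=False: Pearson kurtosis, which is simply the excess kurtosis plus 3
print(sps.kurtosis(z, fisher=False))
```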

A common graphical way to compare two distributions (here, the distribution of returns and the normal distribution) is the quantile-quantile plot ([QQ plot](https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot)).

[statsmodels qqplot](https://www.statsmodels.org/stable/generated/statsmodels.graphics.gofplots.qqplot.html) is a function that produces a QQ plot. By default, it plots the quantiles of the data (vertical axis) against the quantiles of a standard normal distribution (horizontal axis).

If the data are standard normal, the points lie along the 45-degree line. If the data follow any other normal distribution, i.e. a linear transformation $\mu + \sigma Z$ of a standard normal $Z$, the points lie along a straight line with intercept $\mu$ and slope $\sigma$ (not necessarily the 45-degree line). Deviations from such a line highlight differences relative to the normal, such as skewness or excess kurtosis.
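The straight-line claim can be checked numerically. The sketch below uses `scipy.stats.probplot` (a swapped-in scipy function, not the statsmodels routine used below), which pairs standard normal quantiles with the sorted data and fits a least-squares line. For simulated data $X = \mu + \sigma Z$ with hypothetical values $\mu = 0.001$ and $\sigma = 0.02$, the fitted slope and intercept should be close to $\sigma$ and $\mu$.

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(7)
mu, sigma = 0.001, 0.02                        # hypothetical daily mean and volatility
x = mu + sigma * rng.standard_normal(10_000)   # linear transformation of N(0, 1)

# probplot returns theoretical N(0,1) quantiles (osm), the sorted data (osr),
# and a least-squares fit osr = slope * osm + intercept with correlation r
(osm, osr), (slope, intercept, r) = sps.probplot(x, dist='norm')

print(f"slope={slope:.4f} (sigma={sigma}), intercept={intercept:.4f} (mu={mu}), r={r:.4f}")
```

A correlation `r` very close to 1 is what a "straight" QQ plot looks like in numbers; fat-tailed data bend the ends of the plot and lower `r`.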

In [None]:
# stm.qqplot?

In [None]:
tickers = ['VOO', 'VT', 'AAPL', 'MSFT']

num_rows, num_cols = 2, 2

fig, axes = plt.subplots(num_rows, num_cols, figsize=(12,10))

for i in range(num_rows):
    for j in range(num_cols):
        # Row-major position of subplot (i, j) in `tickers`:
        stm.qqplot(log_returns[tickers[num_cols*i+j]], line='s', ax = axes[i,j])
        axes[i,j].set(title=f'{tickers[num_cols*i+j]}')

The figures above *suggest* fat-tailed empirical distributions: compared to the normal, the data exhibit a higher frequency of large negative and positive values.
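One way to put a number on "higher frequency of large values" is to count how often standardized returns fall more than 3 standard deviations from the mean and compare that with the normal probability $2\,(1 - \Phi(3)) \approx 0.0027$. A sketch on simulated fat-tailed data (hypothetical, not market data):

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(3)
r = 0.01 * rng.standard_t(df=4, size=100_000)   # hypothetical fat-tailed returns

z = (r - r.mean()) / r.std(ddof=1)               # standardize with sample moments
empirical = np.mean(np.abs(z) > 3)               # share of observations beyond 3 std devs
normal = 2 * sps.norm.sf(3)                      # same probability under N(0, 1)

print(f"empirical: {empirical:.4f}, normal: {normal:.4f}")
```

Applied to a column of `log_returns`, the same two lines give a simple tail-frequency summary to set beside the QQ plots.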

Finally, we can formally test whether there is evidence to reject the assumption of normally distributed returns. This is straightforward because `scipy.stats` provides tests for skewness (`skewtest`), kurtosis (`kurtosistest`) and normality (`normaltest`).

In [None]:
# sps.skewtest?

In [None]:
# sps.kurtosistest?

In [None]:
# sps.normaltest?
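The three tests are related: `normaltest` implements D'Agostino and Pearson's omnibus test, whose statistic is the sum of the squared z-scores from `skewtest` and `kurtosistest`, with a p-value from a chi-squared distribution with 2 degrees of freedom. A sketch that checks this on a simulated sample (hypothetical data):

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(5)
x = rng.standard_t(df=5, size=2000)   # hypothetical sample

zs, _ = sps.skewtest(x)        # z-score for skewness
zk, _ = sps.kurtosistest(x)    # z-score for kurtosis
k2, p = sps.normaltest(x)      # omnibus statistic and p-value

# The omnibus statistic combines the two z-scores; under normality it is
# approximately chi-squared with 2 degrees of freedom
print(k2, zs**2 + zk**2, p, sps.chi2.sf(k2, df=2))
```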

In [None]:
for tic in ticker:
    print(32 * '-')
    print(f'{tic}') 
    print(32 * '-')
    print('Skew of data      %14.3f' % sps.skew(log_returns[tic]))
    print('Skew test p-value %14.3f' % sps.skewtest(log_returns[tic])[1])
    print('Kurt of data      %14.3f' % sps.kurtosis(log_returns[tic]))
    print('Kurt test p-value %14.3f' % sps.kurtosistest(log_returns[tic])[1])
    print('Norm test p-value %14.3f' % sps.normaltest(log_returns[tic])[1]) 
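A p-value printed as `0.000` above means only that it is below 0.0005, not that it is exactly zero. The usual decision rule is to reject normality when the p-value is below a chosen significance level $\alpha$. A sketch on two simulated samples (hypothetical data), with $\alpha = 0.05$ as an assumed level and scientific notation so that small p-values stay visible:

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(11)
samples = {
    'normal (hypothetical)': 0.01 * rng.standard_normal(5000),
    'student-t (hypothetical)': 0.01 * rng.standard_t(df=4, size=5000),
}

alpha = 0.05  # assumed significance level; 1% or 10% are also common choices
results = {}
for name, x in samples.items():
    p = sps.normaltest(x).pvalue
    results[name] = bool(p < alpha)    # True means: reject normality at level alpha
    # '%.3e' avoids printing tiny p-values as 0.000
    print('%-25s p-value %.3e  reject: %s' % (name, p, results[name]))
```

Even for truly normal data the test rejects with probability $\alpha$, so a single rejection is evidence, not proof.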

Here's an alternative that collects the same statistics in a DataFrame and displays them as a table with [plotly](https://plot.ly/python/table/):

In [None]:
# Collect one row (a dict) per ticker, then build the DataFrame in one step
# (`DataFrame.append` was removed in pandas 2.0)
rows = []
for tic in ticker:
    rows.append({'Tickers': tic,
                 'Skew of data': round(sps.skew(log_returns[tic]), 2),
                 'Skew test p-value': round(sps.skewtest(log_returns[tic])[1], 2),
                 'Kurt of data': round(sps.kurtosis(log_returns[tic]), 2),
                 'Kurt test p-value': round(sps.kurtosistest(log_returns[tic])[1], 2),
                 'Norm test p-value': round(sps.normaltest(log_returns[tic])[1], 2)})

# Note: this overwrites the price DataFrame `data` with the summary table
data = pd.DataFrame(rows)


In [None]:
data.values

You may need to install plotly if it's not available in your Python distribution (you would get an error while importing). To do so, uncomment the following line:

In [None]:
# %pip install plotly

In [None]:
import plotly.graph_objects as go

fig = go.Figure(data=[go.Table(
    header=dict(values=list(data.columns),
                fill_color='paleturquoise',
                align='right'),
    cells=dict(values=data.values.T,
               fill_color='lavender',
               align='right'))
])

fig.show()