# Goal: Testing whether Returns are Normally Distributed

## Hey, Python techies!!

A good way to think about the returns of a financial asset is to view them as realizations of a random variable. Such a random variable must then have a distribution, from which the observed returns are sampled. It is often convenient to assume that returns are normally distributed. But are they? The Jarque-Bera test offers a way to formally test that assumption.

The test statistic for the Jarque-Bera test is

$$JB = \frac{N}{6} \left( S^2 + 0.25 (K - 3)^2 \right),$$

where $N$ is the number of observed returns, $S$ is their sample skewness and $K$ is their sample kurtosis. A normal distribution has a skewness of 0 and a kurtosis of 3, so for normally distributed returns the $JB$ statistic should be close to 0. Under the null hypothesis that returns are normally distributed, $JB$ is asymptotically $\chi^2$-distributed with 2 degrees of freedom.
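
To get a feeling for how large $JB$ has to be before normality is rejected, one can look up critical values of the $\chi^2$ distribution with 2 degrees of freedom. The following is a minimal sketch; the significance levels of 5% and 1% are illustrative choices and are not part of the analysis below.

```python
import scipy.stats

# Critical values of the chi-squared distribution with 2 degrees of freedom.
# The significance levels (5% and 1%) are illustrative assumptions.
crit_5pct = scipy.stats.chi2.ppf(0.95, 2)
crit_1pct = scipy.stats.chi2.ppf(0.99, 2)

print(round(crit_5pct, 2), round(crit_1pct, 2))  # 5.99 9.21
```

A $JB$ value above roughly 6 would therefore already lead to rejection at the 5% level.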

## Let's now check whether daily returns are normally distributed

For our analysis, we'll need pandas, numpy and scipy's stats subpackage, which we import right at the beginning.

In [1]:
import pandas as pd
import numpy as np
import scipy.stats

Next, we load the prices of the S&P 500 from a CSV file and compute daily log returns as the difference between consecutive log prices.

In [2]:
sp500 = pd.read_csv("sp500.csv", parse_dates = ['Date'])
sp500['ret_daily'] = np.log(sp500['price']) - np.log(sp500['price'].shift())
sp500.head()

|   | Date | price | ret_daily |
|---|------|-------|-----------|
| 0 | 1999-12-31 | 1469.25 | NaN |
| 1 | 2000-01-03 | 1455.22 | -0.009595 |
| 2 | 2000-01-04 | 1399.42 | -0.039099 |
| 3 | 2000-01-05 | 1402.11 | 0.001920 |
| 4 | 2000-01-06 | 1403.45 | 0.000955 |
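
The log return computed above can be written in several equivalent ways. The sketch below uses a small toy frame, built from the first three prices shown in the output, as a stand-in for the full file. The names `toy`, `ret_shift`, `ret_diff` and `ret_ratio` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Toy frame with the first three prices from the output above
# (a stand-in for the full sp500.csv data).
toy = pd.DataFrame({"price": [1469.25, 1455.22, 1399.42]})

# Difference of consecutive log prices, as in the notebook cell.
ret_shift = np.log(toy["price"]) - np.log(toy["price"].shift())

# Equivalent forms: diff() of the log prices, and the log of the price ratio.
ret_diff = np.log(toy["price"]).diff()
ret_ratio = np.log(toy["price"] / toy["price"].shift())

print(ret_diff.round(6).tolist())
```

All three series start with a missing value, because the first price has no predecessor.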


We remove the NaN value at the beginning of the return time series with the 'dropna' method.

In [3]:
sp500 = sp500.dropna()
sp500.head()

|   | Date | price | ret_daily |
|---|------|-------|-----------|
| 1 | 2000-01-03 | 1455.22 | -0.009595 |
| 2 | 2000-01-04 | 1399.42 | -0.039099 |
| 3 | 2000-01-05 | 1402.11 | 0.001920 |
| 4 | 2000-01-06 | 1403.45 | 0.000955 |
| 5 | 2000-01-07 | 1441.47 | 0.026730 |
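
Note that 'dropna' without arguments removes every row that has a missing value in any column. If a data set had gaps in other columns too, one might prefer to drop only the rows where the return is missing. A minimal sketch, where the `volume` column and all numbers are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'volume' is an invented column with its own gap.
toy = pd.DataFrame({
    "price": [100.0, 101.0, 102.0],
    "volume": [10.0, np.nan, 12.0],
})
toy["ret_daily"] = np.log(toy["price"]).diff()

all_cols = toy.dropna()                      # drops rows 0 and 1
only_ret = toy.dropna(subset=["ret_daily"])  # drops row 0 only

print(len(all_cols), len(only_ret))  # 1 2
```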


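We now compute the $JB$ statistic by implementing the formula above. To do so, we use the 'skew' and 'kurtosis' functions of scipy's *stats* subpackage. Passing 'fisher = False' makes 'kurtosis' return the plain kurtosis (3 for a normal distribution) instead of the excess kurtosis (0 for a normal distribution), which matches the $K - 3$ term in the formula.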

In [4]:
S = scipy.stats.skew(sp500['ret_daily'])
K = scipy.stats.kurtosis(sp500['ret_daily'], fisher = False)
N = sp500.shape[0]
JB = (N / 6) * (S**2 + 0.25 * (K - 3)**2)
JB

17375.470184643305
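
As a cross-check, scipy's *stats* subpackage also provides a 'jarque_bera' function that returns the test statistic together with its p-value. The sketch below compares it with the manual formula. Since the S&P 500 file is not reproduced here, it uses a synthetic heavy-tailed sample as a stand-in; the seed, sample size and Student-t distribution are assumptions made only for this illustration.

```python
import numpy as np
import scipy.stats

# Synthetic heavy-tailed sample as a stand-in for the return series
# (seed, size and distribution are illustrative assumptions).
rng = np.random.default_rng(0)
x = rng.standard_t(df=4, size=5000) * 0.01

# Manual statistic, following the formula above.
S = scipy.stats.skew(x)
K = scipy.stats.kurtosis(x, fisher=False)
JB_manual = (len(x) / 6) * (S**2 + 0.25 * (K - 3)**2)

# Built-in test: returns the statistic and the p-value.
JB_scipy, p_scipy = scipy.stats.jarque_bera(x)

print(np.isclose(JB_manual, JB_scipy))
```

Applied to the notebook's data, the call would be `scipy.stats.jarque_bera(sp500['ret_daily'])`.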

The value of the test statistic alone tells us little about how likely it is to observe such a value if returns are actually normally distributed. For that, we need to compute a probability. We do that with the 'chi2.cdf' function of the scipy.stats package. This function evaluates the cumulative distribution function of the $\chi^2$ distribution, which is the probability of observing a value smaller than or equal to $JB$. The function takes a second argument, the degrees of freedom of the $\chi^2$ distribution. That's 2 in our case.

In [5]:
cdf = scipy.stats.chi2.cdf(JB, 2)
cdf

1.0

We are interested in the p-value, which is the probability of observing a test statistic at least as large as $JB$ if the null hypothesis is true. Note that this is not the probability of the null hypothesis itself being true. The p-value is 1 minus the probability of observing a value smaller than or equal to $JB$. Hence, we can now compute the p-value for the Jarque-Bera test.

In [6]:
p = 1 - cdf
p

0.0

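The p-value is reported as 0. The true value is positive, but far too small to be represented as a floating-point number, so observing $JB \approx 17375.47$ would be practically impossible if daily returns were actually normally distributed. We therefore reject the null hypothesis: the data provide strong evidence that daily S&P 500 returns are not normally distributed.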
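
The printed value of 0 is a consequence of floating-point arithmetic: the cdf rounds to exactly 1.0, so the subtraction leaves nothing. For 2 degrees of freedom the survival function of the $\chi^2$ distribution has the closed form $e^{-x/2}$, so the order of magnitude of the p-value can still be recovered. The sketch below shows this, and also illustrates the 'chi2.sf' function, which computes 1 minus the cdf directly. The statistic of 50 used in the second part is a hypothetical value chosen for illustration.

```python
import math
import scipy.stats

JB = 17375.470184643305  # value computed above

# For 2 degrees of freedom the survival function is exp(-x / 2),
# so the base-10 logarithm of the p-value is available in closed form.
log10_p = -JB / (2 * math.log(10))
print(log10_p)  # roughly -3773

# For a moderate (hypothetical) statistic, chi2.sf avoids the rounding
# error that the subtraction 1 - cdf introduces.
p_sf = scipy.stats.chi2.sf(50, 2)
p_naive = 1 - scipy.stats.chi2.cdf(50, 2)
print(p_sf, p_naive)
```

The p-value is thus on the order of $10^{-3773}$, far below the smallest positive number a 64-bit float can hold.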