# Deviations from Normality

A distribution is symmetric if it looks the same to the left and right of the center point. 

Skewness is a measure of the lack of symmetry.

Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.

In [1]:
%load_ext autoreload
%autoreload 2

import pandas as pd
import portfolio_tool_kit as ptk
hfi = ptk.get_hfi_returns()
hfi.head()

| date | Convertible Arbitrage | CTA Global | Distressed Securities | Emerging Markets | Equity Market Neutral | Event Driven | Fixed Income Arbitrage | Global Macro | Long/Short Equity | Merger Arbitrage | Relative Value | Short Selling | Funds Of Funds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1997-01 | 0.0119 | 0.0393 | 0.0178 | 0.0791 | 0.0189 | 0.0213 | 0.0191 | 0.0573 | 0.0281 | 0.015 | 0.018 | -0.0166 | 0.0317 |
| 1997-02 | 0.0123 | 0.0298 | 0.0122 | 0.0525 | 0.0101 | 0.0084 | 0.0122 | 0.0175 | -0.0006 | 0.0034 | 0.0118 | 0.0426 | 0.0106 |
| 1997-03 | 0.0078 | -0.0021 | -0.0012 | -0.012 | 0.0016 | -0.0023 | 0.0109 | -0.0119 | -0.0084 | 0.006 | 0.001 | 0.0778 | -0.0077 |
| 1997-04 | 0.0086 | -0.017 | 0.003 | 0.0119 | 0.0119 | -0.0005 | 0.013 | 0.0172 | 0.0084 | -0.0001 | 0.0122 | -0.0129 | 0.0009 |
| 1997-05 | 0.0156 | -0.0015 | 0.0233 | 0.0315 | 0.0189 | 0.0346 | 0.0118 | 0.0108 | 0.0394 | 0.0197 | 0.0173 | -0.0737 | 0.0275 |


## Skewness

Intuitively, a negative skew means that large negative returns occur more often than you would expect if returns were normally distributed: the left tail of the distribution is longer than the right.

If the distribution is negatively skewed, the mean (the expected value) is typically less than the median; if it is positively skewed, the mean is typically greater than the median. This is a rule of thumb rather than a guarantee, but it makes for a quick check.
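A quick simulated check of this rule of thumb, assuming nothing beyond NumPy: an exponential distribution is positively skewed, so its sample mean should sit above its sample median, and flipping the sign mirrors the picture. The seed and sample size are arbitrary choices for illustration.

```python
import numpy as np

# Seeded generator so the example is reproducible
rng = np.random.default_rng(42)

# Exponential(1): positive skew; theoretical mean 1.0, median ln(2) ~ 0.693
right_skewed = rng.exponential(scale=1.0, size=10_000)
# Mirror image: negative skew, so the ordering of mean and median flips
left_skewed = -right_skewed

print(right_skewed.mean() > np.median(right_skewed))  # True
print(left_skewed.mean() < np.median(left_skewed))    # True
```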

In [9]:
pd.concat([hfi.mean(), hfi.median(), hfi.mean() > hfi.median()], axis=1,
          keys=["mean", "median", "mean > median"])

| | mean | median | mean > median |
|---|---|---|---|
| Convertible Arbitrage | 0.005508 | 0.0065 | False |
| CTA Global | 0.004074 | 0.0014 | True |
| Distressed Securities | 0.006946 | 0.0089 | False |
| Emerging Markets | 0.006253 | 0.0096 | False |
| Equity Market Neutral | 0.004498 | 0.0051 | False |
| Event Driven | 0.006344 | 0.0084 | False |
| Fixed Income Arbitrage | 0.004365 | 0.0055 | False |
| Global Macro | 0.005403 | 0.0038 | True |
| Long/Short Equity | 0.006331 | 0.0079 | False |
| Merger Arbitrage | 0.005356 | 0.006 | False |


In [10]:
import scipy.stats
scipy.stats.skew(hfi)

array([-2.63959223,  0.17369864, -1.30084204, -1.16706749, -2.12443538,
       -1.40915356, -3.94032029,  0.98292188, -0.39022677, -1.32008333,
       -1.81546975,  0.76797484, -0.36178308])

The skewness is given by:

$$ S(R) = \frac{E[ (R-E(R))^3 ]}{\sigma_R^3} $$
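Before wrapping the formula in a function, it can help to evaluate it once on a sample small enough to check by hand. The four values below are made up for illustration; the population standard deviation is used, matching the $\sigma_R$ in the formula and the default of `scipy.stats.skew`.

```python
import numpy as np
import scipy.stats

# A tiny sample with one large positive value, so the skew should be positive
r = np.array([1.0, 2.0, 3.0, 10.0])

demeaned = r - r.mean()                 # [-3, -2, -1, 6], since the mean is 4
sigma = np.sqrt((demeaned**2).mean())   # population std: sqrt(50 / 4) = sqrt(12.5)
s = (demeaned**3).mean() / sigma**3     # (180 / 4) / 12.5**1.5

print(round(s, 4))                          # 1.0182
print(np.isclose(s, scipy.stats.skew(r)))   # True
```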

In [12]:
def skewness(r):
    """
    Computes skewness of supplied series or a dataframe
    Returns a float or a series
    """
    demeaned_r = r - r.mean()
    # use the population standard deviation, so set ddof=0
    sigma_r = r.std(ddof=0)
    exp = (demeaned_r**3).mean()
    return exp/sigma_r**3
    

In [14]:
skewness(hfi).sort_values()

Fixed Income Arbitrage   -3.940320
Convertible Arbitrage    -2.639592
Equity Market Neutral    -2.124435
Relative Value           -1.815470
Event Driven             -1.409154
Merger Arbitrage         -1.320083
Distressed Securities    -1.300842
Emerging Markets         -1.167067
Long/Short Equity        -0.390227
Funds Of Funds           -0.361783
CTA Global                0.173699
Short Selling             0.767975
Global Macro              0.982922
dtype: float64

CTA Global has the skewness closest to zero (about 0.17). A near-zero skewness is consistent with normality but does not establish it on its own.

The results match scipy.stats.skew, so let's add the function to the toolkit and verify:

In [15]:
ptk.skewness(hfi).sort_values()

Fixed Income Arbitrage   -3.940320
Convertible Arbitrage    -2.639592
Equity Market Neutral    -2.124435
Relative Value           -1.815470
Event Driven             -1.409154
Merger Arbitrage         -1.320083
Distressed Securities    -1.300842
Emerging Markets         -1.167067
Long/Short Equity        -0.390227
Funds Of Funds           -0.361783
CTA Global                0.173699
Short Selling             0.767975
Global Macro              0.982922
dtype: float64

Let's also check the function on a sample drawn from a normal distribution, whose skewness should be close to zero:

In [20]:
import numpy as np
normal = np.random.normal(0, 0.15, (263, 1))
normal.mean(), normal.std()

(-0.011487138456918608, 0.14118469981710227)

In [21]:
ptk.skewness(normal)

0.04281009528205678
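How close to zero is "close"? For normal samples of size n, the sample skewness has a standard error of roughly sqrt(6/n), about 0.15 for n = 263, so the value above is well within sampling noise. A sketch that checks this by simulation; the seed and the 1,000 repetitions are arbitrary choices.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
n_obs, n_trials = 263, 1_000

# Each row is an independent normal sample with the same parameters as above
samples = rng.normal(0, 0.15, size=(n_trials, n_obs))
# Population-moment skewness of every row (same definition as skewness())
skews = scipy.stats.skew(samples, axis=1)

print(skews.std())          # spread of the sample skewness across trials
print(np.sqrt(6 / n_obs))   # large-sample approximation, about 0.151
```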

## Kurtosis

Intuitively, kurtosis measures the "fatness" of the tails of a distribution. The normal distribution has a kurtosis of 3, so returns with a kurtosis below 3 tend to have thinner tails than a normal distribution, and returns with a kurtosis above 3 have fatter tails.

Kurtosis is given by:

$$ K(R) = \frac{E[ (R-E(R))^4 ]}{\sigma_R^4} $$
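To make the "fat tails" intuition concrete, the sketch below estimates K(R) for three simulated distributions: a uniform (thin tails, theoretical kurtosis 1.8), a normal (3), and a Student t with 10 degrees of freedom (fat tails, 3 + 6/(10 - 4) = 4). The distributions, seed and sample size are illustrative assumptions.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(1)
n = 200_000

samples = {
    # Uniform: bounded support, thin tails; theoretical kurtosis 1.8
    "uniform": rng.uniform(-1, 1, size=n),
    # Normal: the reference point; theoretical kurtosis 3
    "normal": rng.standard_normal(n),
    # Student t, 10 degrees of freedom: fat tails; theoretical kurtosis 4
    "student_t": rng.standard_t(10, size=n),
}

for name, x in samples.items():
    # fisher=False returns the raw kurtosis, matching the K(R) formula above
    print(name, round(scipy.stats.kurtosis(x, fisher=False), 2))
```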


In [23]:
def kurtosis(r):
    """
    Computes kurtosis of supplied series or a dataframe
    Returns a float or a series
    """
    demeaned_r = r - r.mean()
    # use the population standard deviation, so set ddof=0
    sigma_r = r.std(ddof=0)
    exp = (demeaned_r**4).mean()
    return exp/sigma_r**4

kurtosis(hfi).sort_values()

Again, CTA Global stands out as the only strategy whose kurtosis is close to 3, the value for a normal distribution.

In [30]:
scipy.stats.kurtosis(hfi)

array([20.28083446, -0.04703963,  4.88998336,  6.25078841, 14.21855526,
        5.03582817, 26.84219928,  2.74167945,  1.52389258,  5.73894979,
        9.12120787,  3.11777175,  4.07015278])

In [31]:
scipy.stats.kurtosis(normal)

array([0.14542272])

In [32]:
kurtosis(normal)

3.14542271814986

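Note that scipy.stats.kurtosis returns the excess kurtosis (kurtosis minus 3) by default, which is why the normal sample scores about 0.15 there and about 3.15 with our function. Passing fisher=False returns the raw kurtosis instead.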
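A minimal check of the relationship between the two conventions, using a simulated fat-tailed sample (an assumption for illustration) and a self-contained copy of the kurtosis function defined above:

```python
import numpy as np
import scipy.stats

def kurtosis(r):
    """Raw (not excess) kurtosis, using the population standard deviation."""
    demeaned_r = r - r.mean()
    sigma_r = r.std(ddof=0)
    return (demeaned_r**4).mean() / sigma_r**4

rng = np.random.default_rng(7)
x = rng.standard_t(5, size=500)

# Default fisher=True: excess kurtosis, i.e. raw kurtosis minus 3
print(np.isclose(scipy.stats.kurtosis(x), kurtosis(x) - 3))            # True
# fisher=False: raw kurtosis, the same quantity as kurtosis() above
print(np.isclose(scipy.stats.kurtosis(x, fisher=False), kurtosis(x)))  # True
```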

Let's add the definition to the toolkit and check:

In [25]:
ptk.kurtosis(hfi).sort_values()

CTA Global                 2.952960
Long/Short Equity          4.523893
Global Macro               5.741679
Short Selling              6.117772
Funds Of Funds             7.070153
Distressed Securities      7.889983
Event Driven               8.035828
Merger Arbitrage           8.738950
Emerging Markets           9.250788
Relative Value            12.121208
Equity Market Neutral     17.218555
Convertible Arbitrage     23.280834
Fixed Income Arbitrage    29.842199
dtype: float64

## Running the Jarque-Bera Test for Normality

In [33]:
scipy.stats.jarque_bera(normal)

Jarque_beraResult(statistic=0.3120778161835205, pvalue=0.8555259028431145)

The p-value tells us whether to reject the null hypothesis that the data are normally distributed. At the 1% significance level, we reject normality if the p-value is below 0.01; if it is above 0.01, we fail to reject the hypothesis of normality (informally, we accept it). Here the p-value of about 0.86 is far above 0.01, so the simulated normal sample passes the test.
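The Jarque-Bera statistic combines the two moments computed earlier. With n observations,

$$ JB = \frac{n}{6}\left( S(R)^2 + \frac{(K(R)-3)^2}{4} \right) $$

and under normality JB is approximately chi-squared with 2 degrees of freedom. The sketch below computes it from scratch and compares the result with scipy.stats.jarque_bera; the helper name `jarque_bera_by_hand` and the simulated Student t sample are hypothetical, chosen for illustration.

```python
import numpy as np
import scipy.stats

def jarque_bera_by_hand(r):
    """Return (JB statistic, p-value) using population moments."""
    r = np.asarray(r, dtype=float).ravel()
    n = r.size
    demeaned = r - r.mean()
    sigma = demeaned.std()                     # ddof=0 by default in NumPy
    s = (demeaned**3).mean() / sigma**3        # skewness S(R)
    k = (demeaned**4).mean() / sigma**4        # raw kurtosis K(R)
    jb = n / 6 * (s**2 + (k - 3)**2 / 4)
    # Under normality JB is approximately chi-squared with 2 degrees of freedom
    p_value = scipy.stats.chi2.sf(jb, df=2)
    return jb, p_value

rng = np.random.default_rng(3)
sample = rng.standard_t(4, size=1_000)   # fat-tailed, so normality should be rejected

jb, p = jarque_bera_by_hand(sample)
ref = scipy.stats.jarque_bera(sample)
print(np.isclose(jb, ref.statistic), np.isclose(p, ref.pvalue))  # True True
print(p < 0.01)                                                  # True: reject at 1%
```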

In [34]:
scipy.stats.jarque_bera(hfi)

Jarque_beraResult(statistic=25656.585999171326, pvalue=0.0)

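The built-in function is limited, though: given a DataFrame, it pools all the values into a single sample and returns one result instead of one per column. Let's start with a wrapper that returns a simple True/False answer.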
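One way to confirm the pooling behaviour, sketched with a hypothetical two-column frame standing in for hfi: the statistic for the whole DataFrame matches the statistic for all its values flattened into one 1-D array.

```python
import numpy as np
import pandas as pd
import scipy.stats

# Hypothetical two-column frame standing in for hfi
rng = np.random.default_rng(11)
df = pd.DataFrame({"a": rng.normal(size=300), "b": rng.exponential(size=300)})

# Passing the whole DataFrame gives a single statistic...
whole = scipy.stats.jarque_bera(df)
# ...which matches treating every value as one long 1-D sample
pooled = scipy.stats.jarque_bera(df.to_numpy().ravel())

print(np.isclose(whole.statistic, pooled.statistic))  # True
```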

In [35]:
def is_normal(r, level=0.01):
    """
    Applies the Jarque-Bera test to determine if a Series is normal or not
    Test is applied at the 1% level by default
    Returns True if the hypothesis of normality is accepted, False otherwise
    """
    stat, p_value = scipy.stats.jarque_bera(r)
    return p_value > level

In [36]:
is_normal(hfi)

False

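As written, is_normal still returns a single answer for the whole DataFrame. One way to get a per-column answer is to apply it to each column with aggregate: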

In [39]:
hfi.aggregate(is_normal)

Convertible Arbitrage     False
CTA Global                 True
Distressed Securities     False
Emerging Markets          False
Equity Market Neutral     False
Event Driven              False
Fixed Income Arbitrage    False
Global Macro              False
Long/Short Equity         False
Merger Arbitrage          False
Relative Value            False
Short Selling             False
Funds Of Funds            False
dtype: bool

Alternatively, we can update the toolkit version of is_normal to handle a DataFrame directly:

In [40]:
ptk.is_normal(hfi)

Convertible Arbitrage     False
CTA Global                 True
Distressed Securities     False
Emerging Markets          False
Equity Market Neutral     False
Event Driven              False
Fixed Income Arbitrage    False
Global Macro              False
Long/Short Equity         False
Merger Arbitrage          False
Relative Value            False
Short Selling             False
Funds Of Funds            False
dtype: bool
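The notebook does not show how ptk.is_normal was changed. One possible approach, sketched here as an assumption rather than the toolkit's actual code, is to dispatch on the input type and reuse the column-wise aggregate approach shown above. The demo data (evenly spaced normal quantiles next to a Student t sample) are made up for illustration.

```python
import numpy as np
import pandas as pd
import scipy.stats


def is_normal(r, level=0.01):
    """
    Applies the Jarque-Bera test at the given level (1% by default).
    Returns a boolean Series (one entry per column) for a DataFrame,
    otherwise a single boolean: True if normality is not rejected.
    """
    if isinstance(r, pd.DataFrame):
        # Test each column separately, reusing this function on every Series
        return r.aggregate(lambda col: is_normal(col, level=level))
    _, p_value = scipy.stats.jarque_bera(r)
    return p_value > level


# Hypothetical demo data: evenly spaced normal quantiles (close to normal)
# next to a fat-tailed Student t sample
rng = np.random.default_rng(5)
probs = np.linspace(0.001, 0.999, 500)
demo = pd.DataFrame({
    "normal_quantiles": scipy.stats.norm.ppf(probs),
    "student_t": rng.standard_t(3, size=500),
})

print(is_normal(demo))
```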

Now let's test the normality of the FFME returns:

In [41]:
ffme = ptk.get_ffme_returns()
ptk.is_normal(ffme)

<= 0      False
Lo 30     False
Med 40    False
Hi 30     False
Lo 20     False
Qnt 2     False
Qnt 3     False
Qnt 4     False
Hi 20     False
Lo 10     False
Dec 2     False
Dec 3     False
Dec 4     False
Dec 5     False
Dec 6     False
Dec 7     False
Dec 8     False
Dec 9     False
Hi 10     False
dtype: bool

In [43]:
ptk.skewness(ffme)

<= 0           NaN
Lo 30     3.086756
Med 40    1.115321
Hi 30     0.456423
Lo 20     3.629829
Qnt 2     1.929089
Qnt 3     0.955631
Qnt 4     0.682897
Hi 20     0.345472
Lo 10     4.410739
Dec 2     2.840439
Dec 3     1.951865
Dec 4     1.879751
Dec 5     0.918593
Dec 6     1.035633
Dec 7     0.740747
Dec 8     0.672436
Dec 9     0.460350
Hi 10     0.233445
dtype: float64

In [44]:
ptk.kurtosis(ffme)

<= 0            NaN
Lo 30     32.745147
Med 40    15.467620
Hi 30     12.323991
Lo 20     38.285414
Qnt 2     22.673174
Qnt 3     14.219654
Qnt 4     12.789570
Hi 20     11.847243
Lo 10     46.845008
Dec 2     31.508124
Dec 3     22.540679
Dec 4     22.428206
Dec 5     13.868490
Dec 6     15.013275
Dec 7     12.953947
Dec 8     13.074687
Dec 9     12.833217
Hi 10     10.694654
dtype: float64
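The `<= 0` portfolio has NaN skewness and kurtosis, so the False that ptk.is_normal reports for it most likely reflects an undefined test statistic rather than a genuine rejection: any comparison with NaN evaluates to False. A sketch of that mechanism, assuming a hypothetical constant column (zero variance turns the moments into 0/0) and self-contained copies of the skewness and is_normal functions defined earlier:

```python
import numpy as np
import pandas as pd
import scipy.stats

def skewness(r):
    demeaned_r = r - r.mean()
    sigma_r = r.std(ddof=0)
    return (demeaned_r**3).mean() / sigma_r**3

def is_normal(r, level=0.01):
    _, p_value = scipy.stats.jarque_bera(r)
    return p_value > level

rng = np.random.default_rng(2)
# Hypothetical frame: one ordinary column and one constant (degenerate) column
df = pd.DataFrame({"ok": rng.normal(size=200), "flat": np.zeros(200)})

print(skewness(df))             # "flat" is NaN: 0 / 0
print(df.aggregate(is_normal))  # "flat" is False only because NaN > 0.01 is False
print(np.nan > 0.01)            # False
```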