# Analysing and Visualising the Distributions of Hedge Fund Returns

We will look at monthly hedge fund index returns and check whether they are normally distributed. The check matters because traders sometimes assume normality, and that assumption can be disastrous when it does not hold. We will do so by calculating the skewness and kurtosis of the data.


**Skewness** can be calculated as:

$$ S(R) = \frac{E[ (R-E(R))^3 ]}{\sigma_R^3} $$


**Kurtosis** can be calculated as:

$$ K(R) = \frac{E[ (R-E(R))^4 ]}{\sigma_R^4} $$

## Import Necessary libraries

In [None]:
# Data Analysis
import numpy as np 
import pandas as pd 

# Data Visualisation 
import matplotlib.pyplot as plt 
import seaborn as sns

## Read in the data

In [None]:
hedge_data = pd.read_csv('../input/python/edhec-hedgefundindices.csv',
                           header=0, index_col=0, parse_dates=True)
hedge_data.shape

In [None]:
# Let's look at the monthly data
hedge_data.head()

**File Description**: The file contains monthly returns, starting in 1997, for indices that track various kinds of hedge fund strategies. EDHEC-Risk produces these indices and makes them freely available on its website. The returns are quoted in percent, so the first order of business is to convert them to decimal fractions and change the index to a monthly period format.

In [None]:
# Convert from percent to decimal returns
hedge_data = hedge_data/100
hedge_data.index = hedge_data.index.to_period('M')
hedge_data

This is much better and we can start our analysis now. 

## Distribution Analysis

I will start by comparing the mean and the median. For a normal distribution the two are equal, so in a sample drawn from one they should be close.

In [None]:
# Collect mean and median data 
pd.concat([hedge_data.mean(), hedge_data.median(), hedge_data.mean() > hedge_data.median()], 
          axis="columns")

The mean is lower than the median in almost all cases, which suggests negative skewness.
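
Why a mean below the median points to a long left tail can be seen on synthetic data. The sketch below is only an illustration under stated assumptions: the sample is a negated exponential draw, not hedge fund data, and the names `sample`, `sample_mean` and `sample_median` are made up for this example.

```python
import numpy as np

# Synthetic left-skewed sample (an assumption for illustration only):
# negating exponential draws puts the long tail on the left.
rng = np.random.default_rng(0)
sample = -rng.exponential(scale=1.0, size=10_000)

sample_mean = sample.mean()
sample_median = np.median(sample)

# The few very negative observations drag the mean below the median.
print(f"mean   = {sample_mean:.3f}")
print(f"median = {sample_median:.3f}")
print("mean < median:", sample_mean < sample_median)
```

The comparison is a quick heuristic rather than a proof of skewness, which is why the skewness itself is estimated next.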

### Estimating the skewness of the data

In [None]:
# Make a skewness function 
def skewness(r):
    '''
        ARGS:
            Series or Dataframe
        
        RETURNS: 
            Float or a series data with the calculated skewness
    '''
    
    # Calculate the demeaned returns 
    demeaned_r = r - r.mean()
    
    # Use the population standard deviation, ddof=0
    sigma_r = r.std(ddof=0)
    
    # Calculate the expectation of the demeaned returns raised to the third power
    exp = (demeaned_r**3).mean()
    
    # Calculate the skew
    skew = exp/sigma_r**3
    return skew

In [None]:
# Calculate the skewness and sort the results
skewness(hedge_data).sort_values()

A normal distribution has a skewness of zero, whereas the hedge fund data shows strongly negative skewness. A quicker way to compute this is SciPy's built-in `skew` function.

In [None]:
# Using the stats library 
import scipy.stats as st
st.skew(hedge_data)

These values are unsorted (they follow the column order), but they match the ones computed by our own function.
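
Rather than comparing the numbers by eye, the match can be checked programmatically. The sketch below assumes a small synthetic DataFrame (`demo`, a hypothetical stand-in for `hedge_data`) and repeats the `skewness` function so that it runs on its own.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

def skewness(r):
    """Population skewness, same formula as the function defined earlier."""
    demeaned_r = r - r.mean()
    sigma_r = r.std(ddof=0)
    return (demeaned_r**3).mean() / sigma_r**3

# Hypothetical stand-in for hedge_data: two synthetic return columns.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "symmetric": rng.normal(0.005, 0.02, size=500),
    "left_skewed": 0.03 - rng.lognormal(mean=-4.0, sigma=0.8, size=500),
})

manual = skewness(demo)  # pandas Series indexed by column name
scipy_skew = pd.Series(st.skew(demo.to_numpy()), index=demo.columns)

# st.skew uses population moments by default (bias=True), so the two
# results should agree up to floating-point error.
skews_match = np.allclose(manual, scipy_skew)
print(pd.concat([manual, scipy_skew], axis="columns", keys=["manual", "scipy"]))
print("match:", skews_match)
```

Wrapping the SciPy output in a Series keeps the column labels, which the bare array returned by `st.skew` does not carry.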

### Estimating the Kurtosis 

The calculation is almost identical to the one for skewness, with the third power replaced by the fourth, so we can write a function for this as well.

In [None]:
# Make a kurtosis function 
def kurtosis(r):
    '''
        ARGS:
            Series or Dataframe
        
        RETURNS: 
            Float or a series data with the calculated kurtosis
    '''
    
    # Calculate the demeaned returns 
    demeaned_r = r - r.mean()
    
    # Use the population standard deviation, ddof=0
    sigma_r = r.std(ddof=0)
    
    # Calculate the expectation of the demeaned returns raised to the fourth power
    exp = (demeaned_r**4).mean()

    # Calculate the kurtosis
    kurt = exp/sigma_r**4
    return kurt

In [None]:
# Calculate the kurtosis and sort the results
kurtosis(hedge_data).sort_values(ascending=False)

The kurtosis of a normal distribution is three, and the hedge fund returns have much higher kurtosis, which indicates fat tails.
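
One caveat when cross-checking against SciPy: `scipy.stats.kurtosis` returns *excess* kurtosis by default (`fisher=True`), that is, the kurtosis minus three, so a normal sample comes out near zero rather than near three. The sketch below uses a synthetic normal sample (an assumption for illustration, not the hedge fund data) to show both conventions.

```python
import numpy as np
import scipy.stats as st

# Synthetic normal sample; the seed and size are arbitrary choices.
rng = np.random.default_rng(1)
normal_sample = rng.normal(size=100_000)

# fisher=False gives the raw kurtosis, comparable to the kurtosis() function above.
raw_kurt = st.kurtosis(normal_sample, fisher=False)

# The default (fisher=True) subtracts 3, giving excess kurtosis.
excess_kurt = st.kurtosis(normal_sample)

print(f"raw kurtosis    = {raw_kurt:.3f}")
print(f"excess kurtosis = {excess_kurt:.3f}")
```

Passing `fisher=False` is therefore the setting to use when comparing SciPy's output with the hand-written function.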

## Jarque-Bera Statistical Test

In [None]:
# Using the built in function 
st.jarque_bera(hedge_data['CTA Global'])
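
The call returns a test statistic and a p-value. The Jarque-Bera statistic combines the two quantities computed earlier, $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, and under the null hypothesis of normality it is asymptotically chi-squared distributed with two degrees of freedom. As a rough sketch, the statistic can be rebuilt by hand and compared with SciPy's result. The fat-tailed sample `x` is synthetic and purely an assumption for illustration.

```python
import numpy as np
import scipy.stats as st

# Synthetic fat-tailed "returns" (Student t with 4 degrees of freedom).
rng = np.random.default_rng(7)
x = rng.standard_t(df=4, size=2_000) * 0.01

n = x.size
s = st.skew(x)                    # population skewness
k = st.kurtosis(x, fisher=False)  # raw (not excess) kurtosis

# Jarque-Bera statistic and its chi-squared(2) p-value, computed by hand.
jb_manual = n / 6 * (s**2 + (k - 3) ** 2 / 4)
p_manual = st.chi2.sf(jb_manual, df=2)

# SciPy's implementation for comparison.
jb_scipy, p_scipy = st.jarque_bera(x)

print(f"manual: JB = {jb_manual:.2f}, p = {p_manual:.3g}")
print(f"scipy : JB = {jb_scipy:.2f}, p = {p_scipy:.3g}")
```

A small p-value means the observed skewness and kurtosis are unlikely under normality, so the hypothesis of normality is rejected.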

In [None]:
# Function to apply the Jarque-Bera test and 
# report whether normality is rejected for a return series
def is_normal(r, level=0.01):
    '''
        ARGS:
            Series data
        RETURNS:
            True if the hypothesis of normality is not rejected at the given level, False otherwise
    '''
    statistic, p_val = st.jarque_bera(r)
    return p_val > level

In [None]:
is_normal(hedge_data)

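The command above pools every column into a single sample, because `jarque_bera` treats its input as one flattened array, and that is not what we want. Let's apply the test to each column separately.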

In [None]:
hedge_data.aggregate(is_normal)

Only for CTA Global does the test fail to reject normality at the 1% level; for every other index, normality is rejected.
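
The difference between the pooled call and the column-wise call can be checked on data where the answer is known in advance. The sketch below is an illustration under assumptions: `demo` is a synthetic two-column DataFrame, not the hedge fund data, and `is_normal` is repeated (with an explicit `np.asarray` conversion) so that the block runs on its own.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

def is_normal(r, level=0.01):
    """True if the Jarque-Bera test does not reject normality at `level`."""
    statistic, p_val = st.jarque_bera(np.asarray(r))
    return p_val > level

# Synthetic data: one Gaussian column and one clearly skewed column.
rng = np.random.default_rng(3)
demo = pd.DataFrame({
    "gaussian": rng.normal(size=5_000),
    "skewed": rng.exponential(size=5_000),
})

# Passing the whole 2-D array is the same as testing one flattened sample.
values = demo.to_numpy()
stat_pooled, p_pooled = st.jarque_bera(values)
stat_flat, p_flat = st.jarque_bera(values.ravel())
print("pooled equals flattened:", np.isclose(stat_pooled, stat_flat))

# Applying the function column by column gives one verdict per series.
per_column = demo.apply(is_normal)
print(per_column)
```

`DataFrame.apply` is used here in place of `aggregate`; for a function that maps each column to a single value, the two return one result per column.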