# Skewness and Kurtosis

## Overview

In this section we introduce two more statistics that describe the data, namely <a href="https://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm">skewness and kurtosis</a>.
The first statistic measures the lack of symmetry of the observed data. The second measures 
whether the data are heavy-tailed or light-tailed relative to the normal distribution [2]. Data sets with high kurtosis tend to have heavy tails, or outliers, whereas data sets with low kurtosis tend to have light tails, or a lack of outliers.

## Skewness and Kurtosis

### Skewness

The skewness $k$ measures the lack of symmetry of a distribution. For a random variable $X$ with mean $\mu$ and standard deviation $\sigma$, it is defined as [1]



\begin{equation}
k = \frac{E\left[\left(X-\mu\right)^3\right]}{\sigma^3}
\end{equation}

The skewness of the normal distribution is zero, since the distribution is symmetric. A negative value indicates a longer left tail and a positive value a longer right tail.
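As a rough illustration of the definition above, the sketch below replaces the expectations with sample averages and compares the result with `scipy.stats.skew`. The helper name `sample_skewness`, the seed, and the sample size are assumptions made for this example only.

```python
import numpy as np
from scipy.stats import skew


def sample_skewness(x):
    """Skewness from the definition, with sample averages in place of expectations."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered**2)  # second central moment (biased variance estimate)
    m3 = np.mean(centered**3)  # third central moment
    return m3 / m2**1.5        # sigma^3 is the variance raised to the power 3/2


# Hypothetical test data: a normal sample, for which the skewness should be near 0
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=2.0, size=10_000)

manual = sample_skewness(data)
library = skew(data)  # default settings use the same biased moment estimates
print("manual: {}, scipy: {}".format(manual, library))
```

With its default arguments, `skew` uses the same biased moment estimates, so the two values should agree up to floating point error.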

### Kurtosis

The kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. 
That is, data sets with high kurtosis tend to have heavy tails, or outliers, while data sets with low kurtosis tend to have light tails, or a lack of outliers [2]. 
It is given by

\begin{equation}
b_2 = \frac{E\left[\left(X-\mu\right)^4\right]}{\sigma^4}
\end{equation}

The kurtosis of the normal distribution is 3 [2]. Thus, in many cases the following definition, often called the excess kurtosis, is used so that the normal distribution has a value of zero

\begin{equation}
\text{excess kurtosis} = b_2 - 3 = \frac{E\left[\left(X-\mu\right)^4\right]}{\sigma^4} - 3
\end{equation}
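The two kurtosis definitions above can be checked numerically. The following is a minimal sketch, assuming sample averages stand in for the expectations; the helper `sample_kurtosis` and its `excess` flag are hypothetical names introduced for this example. In `scipy.stats.kurtosis`, the `fisher` argument switches between the two conventions.

```python
import numpy as np
from scipy.stats import kurtosis


def sample_kurtosis(x, excess=True):
    """Kurtosis from the definition; subtracts 3 when excess=True."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered**2)  # second central moment
    m4 = np.mean(centered**4)  # fourth central moment
    b2 = m4 / m2**2            # sigma^4 is the squared variance
    return b2 - 3.0 if excess else b2


# Hypothetical test data: a normal sample
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=2.0, size=10_000)

plain = sample_kurtosis(data, excess=False)   # close to 3 for normal data
excess = sample_kurtosis(data, excess=True)   # close to 0 for normal data

print("kurtosis: {} (scipy: {})".format(plain, kurtosis(data, fisher=False)))
print("excess kurtosis: {} (scipy: {})".format(excess, kurtosis(data, fisher=True)))
```

Note that `fisher=True` is the default in scipy, so calling `kurtosis(data)` without arguments returns the excess kurtosis.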

### Example 1

The following example shows how to calculate skewness and kurtosis with Python.

In [3]:
import numpy as np
from scipy.stats import kurtosis, skew

In [4]:
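# Draw 10000 samples from a normal distribution with mean 0 and standard deviation 2
x = np.random.normal(0, 2, 10000)

# scipy's kurtosis subtracts 3 by default (Fisher's definition), so a normal sample gives about 0
print('kurtosis of normal distribution (should be 0): {}'.format(kurtosis(x)))
print('skewness of normal distribution (should be 0): {}'.format(skew(x)))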

kurtosis of normal distribution (should be 0): 0.05469279202667954
skewness of normal distribution (should be 0): -0.04329816932976951
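For contrast with the normal case, the sketch below applies the same two functions to an asymmetric, heavy-tailed sample. The choice of the exponential distribution, the seed, and the sample size are assumptions made for illustration. The exponential distribution has a theoretical skewness of 2 and an excess kurtosis of 6, so both estimates should be clearly positive.

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Hypothetical example data: exponential samples are right-skewed and heavy-tailed
rng = np.random.default_rng(42)
y = rng.exponential(scale=1.0, size=200_000)

y_skew = skew(y)      # theoretical value for the exponential distribution: 2
y_kurt = kurtosis(y)  # excess kurtosis; theoretical value: 6

print('skewness of exponential sample: {}'.format(y_skew))
print('excess kurtosis of exponential sample: {}'.format(y_kurt))
```

The estimates fluctuate from sample to sample, and the kurtosis estimate in particular converges slowly because it depends on the fourth power of the largest observations.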


## Summary

This section introduced two statistics, skewness and kurtosis. Skewness measures the lack of symmetry of the observed data, 
while kurtosis measures how heavy or light the tails of the data are relative to 
the normal distribution. Taken together, the two coefficients can be used to assess the deviation of the 
observed data from the normal distribution.

## References

1. Larry Wasserman, _All of Statistics. A Concise Course in Statistical Inference_, Springer 2003.
2. <a href="https://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm">Measures of skewness and kurtosis</a>