In [ ]:
from __future__ import print_function  # must be the first statement in the cell
import math
import scipy as sp
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# required for interactive plotting
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
import numpy.polynomial as np_poly

from IPython.display import Math
from IPython.display import Latex

initialization  
$ \newcommand{\E}[1]{\mathbb{E}\left[#1\right]}$  
$ \newcommand{\V}[1]{\mathbb{V}\left[#1\right]}$
$ \newcommand{\P}{\mathbb{P}}$

Definition
==========

The fourth central moment is a measure of the heaviness of the tails of a distribution, compared with the normal distribution of the same variance. Since it is the expectation of a fourth power, the fourth central moment, where defined, is always non-negative, and except for a point distribution it is strictly positive. The fourth central moment of a normal distribution is $3\sigma^4$.

The kurtosis κ is defined to be the normalised fourth central moment minus 3, $\kappa = \mu_4/\sigma^4 - 3$ (equivalently, it is the fourth cumulant divided by the square of the variance). Some authorities do not subtract 3, but it is usually more convenient to have the normal distribution at the origin of coordinates.[4][5] If a distribution has heavy tails, the kurtosis will be high (such distributions are sometimes called leptokurtic); conversely, light-tailed distributions (for example, bounded distributions such as the uniform) have low kurtosis (sometimes called platykurtic).

The kurtosis can be positive without limit, but κ must be greater than or equal to $\gamma^2 - 2$, where γ is the skewness; equality holds only for binary (two-point) distributions. For unbounded skewed distributions not too far from normal, κ tends to lie somewhere between $\gamma^2$ and $2\gamma^2$.
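A minimal sketch of the definition as a sample statistic, assuming NumPy and SciPy are installed. The helper `excess_kurtosis` is hypothetical; it uses the plain moment estimator (central moments divided by n), which is also what `scipy.stats.kurtosis` computes with its defaults `fisher=True, bias=True`. The seed and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

def excess_kurtosis(x):
    """Moment-based excess kurtosis m4 / m2**2 - 3 (biased estimator)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d**2)   # second central sample moment
    m4 = np.mean(d**4)   # fourth central sample moment
    return m4 / m2**2 - 3.0

rng = np.random.default_rng(0)
sample = rng.standard_normal(10_000)
print(excess_kurtosis(sample))   # close to 0 for a normal sample
print(stats.kurtosis(sample))    # same value from SciPy
```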

Interpretations
===============
* Tail weight
* Peakedness (width of the peak)
* Lack of shoulders (distribution peaks and tails, not in between)

Measures
==========

If $\mu_i$ represents the $i^{th}$ moment about the mean and $\sigma^2 = \mu_2$ is the variance, then
$$
\operatorname{Kurt}[X]
= \frac{\mu_4}{\sigma^4}
= \frac{ \E{(X-\mu)^4} }
       { \left( \E{(X-\mu)^2} \right)^2}
$$
* This number measures heavy tails, not peakedness; hence the "peakedness" interpretation is misleading.
* For this measure, higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations.
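To see why this measure is driven by the tails, note that $\mu_4/\sigma^4 = \E{Z^4}$ with $Z = (X-\mu)/\sigma$, and observations with $|Z| \le 1$ can contribute at most 1 to that average. The sketch below, assuming NumPy, splits a sample's $\beta_2$ into the part from $|z| \le 1$ and the part from $|z| > 1$; the helper `kurtosis_split`, the seed and the sample sizes are illustrative choices.

```python
import numpy as np

def kurtosis_split(x):
    """Return (beta2, part from |z| <= 1, part from |z| > 1) for a sample."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()   # standardise with the n-divisor std
    z4 = z**4
    inner = np.sum(z4[np.abs(z) <= 1]) / len(z)
    outer = np.sum(z4[np.abs(z) > 1]) / len(z)
    return inner + outer, inner, outer

rng = np.random.default_rng(1)
for name, sample in [("normal", rng.standard_normal(50_000)),
                     ("laplace", rng.laplace(size=50_000))]:
    beta2, inner, outer = kurtosis_split(sample)
    print(f"{name:8s} beta2={beta2:.2f} center={inner:.2f} tails={outer:.2f}")
```

The "center" part is bounded by 1 for any sample, so a large $\beta_2$ has to come from the tails.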

Kurtosis is bounded below by the squared [skewness](skewness.ipynb) plus one:  
$$
 \frac{\mu_4}{\sigma^4} \geq \left(\frac{\mu_3}{\sigma^3}\right)^2 + 1.
$$
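The bound can be checked on data, assuming NumPy. Sample central moments (dividing by n) are the moments of the empirical distribution, so the inequality holds exactly for every sample, and a sample taking only two distinct values is a binary distribution that attains equality. The helper `moment_ratios` and the exponential test data are illustrative choices.

```python
import numpy as np

def moment_ratios(x):
    """Return (skewness**2 + 1, beta2) using n-divisor central moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d**k) for k in (2, 3, 4))
    skew_sq = m3**2 / m2**3
    beta2 = m4 / m2**2
    return skew_sq + 1.0, beta2

rng = np.random.default_rng(2)
lower, beta2 = moment_ratios(rng.exponential(size=1_000))
print(lower, beta2)   # beta2 is at least skew**2 + 1

# A two-point (binary) sample attains the bound exactly
lower, beta2 = moment_ratios([0, 0, 0, 1])
print(lower, beta2)
```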

There is no upper limit.

Titbits
=======

* Kurtosis of any univariate normal distribution = 3
* Excess kurtosis:
  * is equal to kurtosis - 3
  * = 0 for normal
  * indicates how the tails compare with those of a normal distribution (an excess kurtosis of 0 does not by itself imply normality)
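In SciPy the two conventions are selected by the `fisher` flag of `scipy.stats.kurtosis`: `fisher=False` returns $\beta_2$ (Pearson's definition) and `fisher=True`, the default, returns the excess kurtosis. A quick check on simulated normal data, where the seed and sample size are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.standard_normal(20_000)

pearson = stats.kurtosis(x, fisher=False)  # beta2, about 3 for normal data
fisher = stats.kurtosis(x, fisher=True)    # excess kurtosis, about 0
print(pearson, fisher, pearson - fisher)   # the difference is 3
```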

On the meaning
==============
The following notes are based on [DeCarlo (1997)][DeCarlo1997].

[DeCarlo1997]: http://dx.doi.org/10.1037/1082-989X.2.3.292 "On the meaning and use of kurtosis. DeCarlo, Lawrence T. Psychological Methods, Vol 2(3), Sep 1997, 292-307."

$$
\text{Let }
\beta_2 = \frac{ \E{(X-\mu)^4} }
               { \left( \E{(X-\mu)^2} \right)^2} 
$$

Note that $\beta_2 - 3$ is the excess kurtosis.  
* If $\beta_2 - 3 > 0$,
  * the distribution has positive kurtosis (leptokurtic)
  * Taller
  * Peak: Higher than normal
  * Tails: Heavier than normal
* If $\beta_2 - 3 < 0$,
  * the distribution has negative kurtosis (platykurtic)
  * Flatter
  * Peak: Lower than normal
  * Tails: Lighter than normal
  
> The $t_5$ distribution shows the pattern of higher-lower-higher on each side, which is a common characteristic of distributions with excess kurtosis

...

> The uniform distribution crosses the normal twice on each side of the mean
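Both quoted patterns can be checked numerically, assuming SciPy. The sketch rescales the $t_5$ distribution (variance 5/3) and the uniform on $[-\sqrt{3}, \sqrt{3}]$ to unit variance, then counts sign changes of each density minus the standard normal density on the positive half-line. The helper `crossings` and the grid are illustrative choices. Two sign changes per side correspond to higher-lower-higher for $t_5$ and lower-higher-lower for the uniform.

```python
import numpy as np
from scipy import stats

# Grid on the positive half-line (all three densities are symmetric about 0)
x = np.linspace(0.0013, 8.0, 4001)

normal = stats.norm.pdf(x)
# t with 5 df has variance 5/3; rescale it to unit variance
t5 = stats.t.pdf(x, df=5, scale=np.sqrt(3 / 5))
# uniform on [-sqrt(3), sqrt(3)] also has unit variance
uniform = stats.uniform.pdf(x, loc=-np.sqrt(3), scale=2 * np.sqrt(3))

def crossings(f, g):
    """Number of sign changes of f - g along the grid."""
    s = np.sign(f - g)
    return int(np.count_nonzero(np.diff(s)))

print("t5 vs normal:", crossings(t5, normal))            # higher-lower-higher
print("uniform vs normal:", crossings(uniform, normal))  # lower-higher-lower
```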

Examples of leptokurtic symmetric distributions:
1. Logistic distribution [$\beta_2 - 3 = 1.2$]
2. Laplace distribution  [$\beta_2 - 3 = 3$]
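SciPy's distribution objects report closed-form moments through `.stats(moments=...)`, where `'k'` requests the excess kurtosis. A short sketch, assuming SciPy, that should reproduce the two values above and adds the normal, the uniform and the $t_5$ distribution from the quotes for comparison:

```python
from scipy import stats

# Excess kurtosis (beta2 - 3) from SciPy's closed-form moments
dists = {
    "normal": stats.norm(),
    "uniform": stats.uniform(),
    "logistic": stats.logistic(),
    "laplace": stats.laplace(),
    "t, 5 df": stats.t(df=5),
}
excess = {name: float(dist.stats(moments='k')) for name, dist in dists.items()}
for name, value in excess.items():
    print(f"{name:10s} {value:+.2f}")
```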

A simplified explanation
-------------------------

* Tailedness and peakedness are both part of kurtosis, because kurtosis reflects a movement of probability mass that keeps the variance unchanged.
* For positive kurtosis, mass moves from the shoulders to the peak and tails; for negative kurtosis, it moves from the peak and tails to the shoulders.
* Kurtosis reflects an excess (a lightness) in the tails, the peak, or both, in the case of positive (negative) kurtosis.
* An approach using influence functions shows that kurtosis primarily reflects the tails, with the center (peak) having a smaller influence.
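One way to picture "moving mass while keeping the variance unchanged" is a scale mixture of two zero-mean normals, each chosen with probability 1/2, whose variances $v_1, v_2$ average to 1. Its fourth moment is $3(v_1^2 + v_2^2)/2$, so $\beta_2 \ge 3$, with equality only when $v_1 = v_2$; roughly, the small-variance component adds mass near the center and the large-variance component adds mass in the tails. This mixture and the hypothetical helper `mixture_excess_kurtosis` are an illustration chosen here, not an example from the source; NumPy is assumed.

```python
import numpy as np

def mixture_excess_kurtosis(v1, v2):
    """Excess kurtosis of 0.5*N(0, v1) + 0.5*N(0, v2), requiring (v1 + v2)/2 == 1."""
    if not np.isclose((v1 + v2) / 2, 1.0):
        raise ValueError("keep the total variance equal to 1")
    m2 = (v1 + v2) / 2              # variance of the mixture
    m4 = 3 * (v1**2 + v2**2) / 2    # E[X^4]; each N(0, v) has E[X^4] = 3 v^2
    return m4 / m2**2 - 3

for v1 in (1.0, 0.75, 0.5, 0.25):
    print(v1, 2 - v1, mixture_excess_kurtosis(v1, 2 - v1))

# Monte Carlo check of the case v1 = 0.5, v2 = 1.5 (exact value 0.75)
rng = np.random.default_rng(4)
n = 200_000
pick = rng.random(n) < 0.5
x = np.where(pick, rng.normal(0, np.sqrt(0.5), n), rng.normal(0, np.sqrt(1.5), n))
d = x - x.mean()
print(np.mean(d**4) / np.mean(d**2)**2 - 3)
```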

Misconceptions
--------------

1. Kurtosis as simply peakedness
1. On Tailedness and Peakedness
1. Kurtosis and Variance

**Kurtosis as simply peakedness**  
[Kaplansky 1945][Kaplansky1945] gave examples of densities with a lower peak than the normal but positive excess kurtosis, and vice versa.





[Kaplansky1945]: http://dx.doi.org/10.1080/01621459.1945.10501856  "A Common Error concerning Kurtosis, Kaplansky I, Journal of the American Statistical Association, 1945"

**On tailedness and peakedness**  
Many textbooks describe positive kurtosis as indicating peakedness and light tails (rather than heavy tails) and vice versa for negative kurtosis.

**Kurtosis and Variance**  
* Positive (negative) kurtosis is sometimes described as indicating large (small) variance.
* However, $\beta_2$ is normalised by the squared variance, so it is scale-free and unaffected by the variance.
* Kurtosis reflects the shape of the distribution *apart* from its variance.
* Say $N_1$ is standard normal ($\sigma^2 = 1$), $N_2$ has $\sigma^2 = 0.5$ and $N_3$ has $\sigma^2 = 2$. All three have the same kurtosis, $\beta_2 = 3$. Hence larger (smaller) variance does not imply positive (negative) kurtosis.
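The $N_1, N_2, N_3$ example can be reproduced on simulated data, assuming NumPy and SciPy: rescaling one standard normal sample to variances 1, 0.5 and 2 leaves `scipy.stats.kurtosis(..., fisher=False)` unchanged up to floating-point rounding. The seed and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
z = rng.standard_normal(10_000)

# Same shape, three different variances (1, 0.5, 2), as in the text
for var in (1.0, 0.5, 2.0):
    x = np.sqrt(var) * z
    print(var, x.var(), stats.kurtosis(x, fisher=False))
```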

Use of Kurtosis
-----------------

* Mean, variance - location and variability of the distribution
* Skewness, Kurtosis - shape of the distribution
* Kurtosis and skewness can be tested to check the normality of distributions, even with sample sizes as small as nine.
  * Omnibus tests include:
     1. Shapiro-Wilk test 
     1. D'Agostino & Pearson $K^2$ test.
* Multivariate testing can be preceded by checking the univariate normality of each variable. This is a necessary but not sufficient condition for multivariate normality.
* Robustness: tests of means are affected more by skewness, whereas tests of variances are affected more by kurtosis.
* Outliers: positive kurtosis can arise either because outliers are present or because the underlying distribution is non-normal (in which case heavy-tailed non-normal distributions can be considered as alternatives to the normal)[?].
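SciPy implements the tests named above. A hedged usage sketch on simulated data, where the sample size, the seed and the Laplace alternative are arbitrary choices: `scipy.stats.normaltest` is the D'Agostino & Pearson $K^2$ omnibus test, `scipy.stats.shapiro` is the Shapiro-Wilk test, and `scipy.stats.kurtosistest` tests kurtosis alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
samples = {
    "normal": rng.standard_normal(1_000),
    "laplace": rng.laplace(size=1_000),   # heavy-tailed, excess kurtosis 3
}

for name, x in samples.items():
    k2, p_k2 = stats.normaltest(x)    # D'Agostino & Pearson K^2 (skewness + kurtosis)
    w, p_sw = stats.shapiro(x)        # Shapiro-Wilk
    z_k, p_k = stats.kurtosistest(x)  # test of kurtosis alone
    print(f"{name:8s} K2 p={p_k2:.3g}  SW p={p_sw:.3g}  kurtosis z={z_k:.2f}")
```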

Limitations
-----------

1. More than one distributional shape can correspond to a single value of kurtosis.
1. Kurtosis cannot be used when the fourth moment is not finite.
1. It does not necessarily allow comparisons between non-normal distributions, only comparisons with the normal distribution.
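The first limitation is easy to illustrate with SciPy's closed-form moments: a Laplace distribution, a $t$ distribution with 6 degrees of freedom and a gamma distribution with shape 2 all have excess kurtosis 3, although the gamma is strongly right-skewed and the other two are symmetric. These three distributions are an illustrative choice, not taken from the source.

```python
from scipy import stats

# Three quite different shapes, all with excess kurtosis 3
candidates = {
    "laplace (symmetric)": stats.laplace(),
    "t, 6 df (symmetric)": stats.t(df=6),
    "gamma, shape 2 (right-skewed)": stats.gamma(a=2),
}
for name, dist in candidates.items():
    skew, exkurt = (float(v) for v in dist.stats(moments='sk'))
    print(f"{name:30s} skew={skew:.3f} excess kurtosis={exkurt:.3f}")
```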

For the uniform distribution U(-w,+w), the mean is $\mu = 0$, so
$$
\E{(x-\mu)^4} = \frac{1}{2w}\int_{-w}^{+w} x^4 dx
= \frac{w^5 - (-w)^5}{10w} = \frac{w^4}{5}  
$$
$$
\E{(x-\mu)^2} = \frac{1}{2w}\int_{-w}^{+w} x^2 dx
= \frac{w^3 - (-w)^3}{6w} = \frac{w^2}{3}  
$$
$$
\begin{aligned}
\beta_2 &= \frac{w^4}{5} \cdot \frac{3^2}{w^4} = \frac{9}{5}\\
\Rightarrow \beta_2 - 3 &= -\frac{6}{5} = -1.2
\end{aligned}
$$
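The derivation can be double-checked numerically, assuming SciPy: integrating $x^4$ and $x^2$ against the density $1/(2w)$ with `scipy.integrate.quad` gives $\beta_2 \approx 1.8$ for any half-width $w$. The helper `uniform_beta2` and the chosen values of `w` are illustrative.

```python
from scipy import integrate

def uniform_beta2(w):
    """beta2 of U(-w, w) by numerical integration (the mean is 0)."""
    density = 1.0 / (2.0 * w)
    m4, _ = integrate.quad(lambda x: x**4 * density, -w, w)
    m2, _ = integrate.quad(lambda x: x**2 * density, -w, w)
    return m4 / m2**2

for w in (0.5, 1.0, 3.0):
    print(w, uniform_beta2(w), uniform_beta2(w) - 3)   # about 1.8 and -1.2 for every w
```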

In [ ]:
def compute_normal(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    exponent = (-1./2)*((x - mu)/sigma)**2
    scaling_factor = 1./(math.sqrt(2*math.pi)*sigma)   # normalising constant uses 2*pi
    return scaling_factor * math.exp(exponent)

def show_platykurtic(sigma=1.0, mu=0.0, width_rect=1.7):
    """Overlay a normal density and a uniform density U(mu - width_rect, mu + width_rect).

    The two have equal variance when width_rect = sigma*sqrt(3).
    """
    x = np.linspace(-4, 4, 100)
    y = [compute_normal(xx, mu, sigma) for xx in x]
    plt.plot(x, y, 'r.', label='normal')

    # uniform density: height 1/(2*width_rect) on [mu - width_rect, mu + width_rect]
    ht_uniform = 1./(2.*width_rect)
    plt.plot([mu - width_rect, mu - width_rect, mu + width_rect, mu + width_rect],
             [0, ht_uniform, ht_uniform, 0],
             'b-', label='uniform')
    plt.legend()
    plt.title('uniform: b2 - 3 = -1.2')
    plt.show()

# sigma and width_rect must stay positive to avoid division by zero
interactive(show_platykurtic,
            sigma=(0.1, 5, 0.1),
            mu=(-1, 1, 0.1),
            width_rect=(0.1, 4, 0.1))
    

References
==========

1. DeCarlo, L. T. (1997). On the meaning and use of kurtosis. *Psychological Methods*, 2(3), 292-307. [link](http://www.columbia.edu/~ld208/psymeth97.pdf)