from __future__ import print_function  # must be the first statement in the cell

import math
import scipy as sp
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# required for interactive plotting
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
import numpy.polynomial as np_poly

from IPython.display import Math
from IPython.display import Latex

initialization  
$ \newcommand{\E}[1]{\mathbb{E}\left[#1\right]}$  
$ \newcommand{\V}[1]{\mathbb{V}\left[#1\right]}$
$ \newcommand{\P}{\mathbb{P}}$

[[wiki](http://www.wikiwand.com/en/Skewness)]

todo:
* [Coskewness](http://www.wikiwand.com/en/Coskewness)

General
=======
* The third central moment is a measure of the lopsidedness of the distribution.
* Any symmetric distribution will have a third central moment, if defined, of zero.
* The normalised third central moment is called the skewness, often denoted $\gamma$.  
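A minimal sketch of these two quantities, assuming the population (divide-by-$n$) form of the moments. The helper names `central_moment` and `skewness` and the toy data sets are illustrative, not taken from any library:

```python
import numpy as np

def central_moment(x, r):
    """r-th central moment of the data in x (population form, divides by n)."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** r)

def skewness(x):
    """Normalised third central moment: mu_3 / sigma**3."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

symmetric = [1, 2, 3, 4, 5]   # symmetric about its mean, 3
lopsided = [1, 1, 1, 2, 10]   # one long tail on the right

print(central_moment(symmetric, 3))  # 0.0
print(skewness(lopsided) > 0)        # True
```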

Left and right Skewness
========================
* *Left-Skewed, Left-tailed, Negative-skew*  
  A distribution that is skewed to the left (the tail of the distribution is longer on the left) will have a negative skewness.
* *Right-Skewed, Right-tailed, Positive-skew*  
  A distribution that is skewed to the right (the tail of the distribution is longer on the right) will have a positive skewness.
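To see the sign convention on data, a small sketch (hypothetical data set and `skewness` helper, population moments) mirrors a right-tailed sample to obtain a left-tailed one. Mirroring flips the sign of every deviation, so it flips the sign of the third central moment and leaves the variance unchanged:

```python
import numpy as np

def skewness(x):
    """Moment skewness m3 / m2**1.5 (population form)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

right_tailed = np.array([1, 2, 2, 3, 3, 3, 15])  # long tail to the right
left_tailed = -right_tailed                     # mirror image: long tail to the left

print(skewness(right_tailed) > 0)   # True: right-skewed, positive skewness
print(skewness(left_tailed) < 0)    # True: left-skewed, negative skewness
print(np.isclose(skewness(left_tailed), -skewness(right_tailed)))  # True
```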

Interpretation
==============
* For a unimodal distribution, negative skew indicates that the tail on the left side of the probability density function is longer or fatter than the right side; skewness alone does not distinguish between these two shapes.
* Conversely, positive skew indicates that the tail on the right side is longer or fatter than the left side.
* In cases where one tail is long but the other tail is fat, skewness does not obey a simple rule.
  * For example, a zero value indicates that the tails on both sides of the mean balance out. This is the case for a symmetric distribution, but it is also true for an asymmetric distribution whose asymmetries even out, such as one tail being long but thin and the other being short but fat.
* Further, skewness is also difficult to interpret for multimodal distributions and discrete distributions.
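A hand-constructed example of the balancing case, treating five points as an empirical distribution. The values (chosen for illustration, not from the source) have mean $0$ and third central moment exactly $0$, yet they are not symmetric about the mean: the left side has two moderately distant points, while the right side has one distant point and two near ones.

```python
import numpy as np

x = np.array([-9, -7, 2, 4, 10], dtype=float)  # hand-picked, clearly asymmetric
d = x - x.mean()                               # the mean is exactly 0
m3 = np.mean(d**3)                             # (-729 - 343 + 8 + 64 + 1000) / 5

print(x.mean(), m3)                            # 0.0 0.0
print(np.array_equal(np.sort(-x), np.sort(x))) # False: not symmetric about 0
```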


Relationship with mean, median
==============================

* An older notion is the nonparametric skew, $(\mu - \nu)/\sigma$,  
  where $\mu, \nu, \sigma$ are the mean, median and standard deviation.
* Under this notion, positive skew means mean > median and negative skew means mean < median.
* The modern (moment) definition of skewness does not, in general, have the same sign as the nonparametric skew.
* If a distribution is symmetric and its mean exists, then mean = median and the skewness (if defined) is zero.
  * If it is also unimodal, then mean = median = mode.
* However, zero skewness does not imply mean = median.
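A sketch of the sign disagreement, using a hand-picked five-point data set (an assumption for illustration) and population moments. The moment skewness is positive, while the mean ($0.2$) lies below the median ($2$), so the nonparametric skew is negative:

```python
import numpy as np

def moment_skew(x):
    """Moment skewness m3 / m2**1.5 (population form)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def nonparametric_skew(x):
    """(mean - median) / standard deviation, with the population sd."""
    x = np.asarray(x, dtype=float)
    return (x.mean() - np.median(x)) / x.std()

x = [-9, -7, 2, 4, 11]            # hand-picked: mean 0.2 < median 2
print(moment_skew(x) > 0)         # True
print(nonparametric_skew(x) < 0)  # True: the two measures disagree in sign
```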

Pearson's moment coefficient of skewness
=========================================

$$
\gamma_1
= \E{\left(\frac{X-\mu}{\sigma}\right)^3}
= \frac{\mu_3}{\sigma^3}
= \frac{ \E{(X-\mu)^3} }
       { (\E{ (X-\mu)^2 })^{3/2} }
= \frac{\kappa_3}{\kappa_2^{3/2}}
$$
Here $\mu_3$ is the third central moment and $\kappa_r$ is the $r$-th cumulant, so $\kappa_2 = \sigma^2$ and $\kappa_3 = \mu_3$.
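For a discrete distribution, $\gamma_1$ can be computed directly from the probability mass function. A minimal sketch, assuming a Bernoulli($p$) variable as the example, whose skewness has the known closed form $(1-2p)/\sqrt{p(1-p)}$; the helper `pmf_skewness` is hypothetical:

```python
import numpy as np

def pmf_skewness(values, probs):
    """gamma_1 = mu_3 / sigma**3 for a discrete distribution given by its pmf."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mu = np.sum(probs * values)
    mu2 = np.sum(probs * (values - mu) ** 2)  # variance, equal to kappa_2
    mu3 = np.sum(probs * (values - mu) ** 3)  # third central moment, equal to kappa_3
    return mu3 / mu2 ** 1.5

p = 0.2
gamma1 = pmf_skewness([0, 1], [1 - p, p])
print(np.isclose(gamma1, (1 - 2 * p) / np.sqrt(p * (1 - p))))  # True
```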

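If $Y$ is the sum of $n$ IID copies of $X$, then
* the third cumulant of $Y$ is $n$ times that of $X$,
* the second cumulant of $Y$ is $n$ times that of $X$.

Hence
$$
\text{Skew}[Y] = \frac{n\kappa_3}{(n\kappa_2)^{3/2}} = \frac{\text{Skew}[X]}{\sqrt{n}}
$$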
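A numeric check of how the skewness of a sum scales, assuming Bernoulli($p$) summands so that $Y$ is Binomial($n$, $p$). The skewness is computed from the exact pmf rather than a closed form; `binomial_skewness` is a hypothetical helper:

```python
import math

def binomial_skewness(n, p):
    """Skewness of Binomial(n, p), computed directly from its pmf."""
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    m2 = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    m3 = sum((k - mean) ** 3 * w for k, w in enumerate(pmf))
    return m3 / m2 ** 1.5

p = 0.2
bernoulli = binomial_skewness(1, p)  # a single X ~ Bernoulli(p)
for n in (4, 16, 64):
    # Y = X_1 + ... + X_n ~ Binomial(n, p); compare with Skew[X] / sqrt(n)
    print(n, binomial_skewness(n, p), bernoulli / math.sqrt(n))
```

The two columns agree: the third cumulant grows like $n$, while the normalising factor $\kappa_2^{3/2}$ grows like $n^{3/2}$, leaving a net factor of $1/\sqrt{n}$.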

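If $G_1 = k_3 / k_2^{3/2}$, where $k_2, k_3$ are the unbiased sample estimators of the second and third cumulants (the k-statistics) computed from a sample of size $n$, then for samples drawn from a normal distribution
$$
\V{G_1}= \frac{6n ( n - 1 )}{ ( n - 2 )( n + 1 )( n + 3 ) }
$$

An approximate alternative is $6/n$, but this is inaccurate for small samples.  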
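A quick comparison of the exact expression with the $6/n$ approximation. This is plain arithmetic; `var_G1` is just a name for the formula above:

```python
def var_G1(n):
    """Variance of G_1 for a sample of size n from a normal distribution."""
    return 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))

for n in (5, 10, 30, 100, 1000):
    print(n, round(var_G1(n), 4), round(6 / n, 4))
```

For small $n$ the approximation overstates the variance (at $n = 5$ the formula gives $0.8\overline{3}$ versus $1.2$), and the ratio of the two tends to 1 as $n$ grows.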

Let
$$
\begin{array}{llr}
\overline{x} & & \color{gray}{\text{sample mean}}\\
m_2
&= \frac{1}{n} \sum_{i=1}^n (x_i-\overline{x})^2
& \color{gray}{\text{sample second central moment}}
\\
m_3
&= \frac{1}{n} \sum_{i=1}^n (x_i-\overline{x})^3
& \color{gray}{\text{sample third central moment}}
\\
s^3
&= \left[
      \frac{1}{n-1} \sum_{i=1}^n (x_i-\overline{x})^2
    \right]^{3/2}
& \color{gray}{\text{cube of the sample standard deviation}}
\\
b_1 &= m_3 / s^3
\end{array}
$$

Then, for samples from a normal distribution, $b_1$ has the smallest variance of the three estimators:
$$
\V{b_1} < \V{\frac{m_3}{m_2^{3/2}}} < \V{G_1} 
$$
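A Monte Carlo sketch of this ordering, assuming standard normal samples of size $n = 10$ and a fixed seed; the helper `skew_estimators` is hypothetical. Here $g_1 = m_3/m_2^{3/2}$ uses the divide-by-$n$ moments, $s^2$ is the divide-by-$(n-1)$ sample variance, and $G_1$ uses the k-statistics $k_2 = s^2$ and $k_3 = n^2 m_3 / ((n-1)(n-2))$:

```python
import numpy as np

def skew_estimators(x):
    """Return (g1, G1, b1) computed along the last axis of x (one sample per row)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    d = x - x.mean(axis=-1, keepdims=True)
    m2 = np.mean(d**2, axis=-1)               # divide-by-n second central moment
    m3 = np.mean(d**3, axis=-1)               # divide-by-n third central moment
    s2 = np.sum(d**2, axis=-1) / (n - 1)      # sample variance, also k2
    k3 = n**2 * m3 / ((n - 1) * (n - 2))      # unbiased estimator of kappa_3
    g1 = m3 / m2**1.5
    G1 = k3 / s2**1.5
    b1 = m3 / s2**1.5
    return g1, G1, b1

rng = np.random.default_rng(0)
n = 10
samples = rng.standard_normal((20_000, n))   # 20,000 normal samples of size n
g1, G1, b1 = skew_estimators(samples)

exact = 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))
print(np.var(b1), np.var(g1), np.var(G1))     # increasing order
print(exact)                                  # exact V[G1] for normal samples
```

Within any single sample the three estimators differ only by constant factors, $b_1 = ((n-1)/n)^{3/2} g_1$ and $G_1 = \sqrt{n(n-1)}/(n-2)\, g_1$, so the ordering of their variances holds exactly; the simulated $\V{G_1}$ should land near the exact value.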


Applications
============

* Many models assume normally distributed data,
  * i.e. data that are symmetric about the mean.
* The normal distribution has a skewness of zero.
* In reality, data points may not be perfectly symmetric.
* So, the skewness of a dataset indicates whether large deviations from the mean are more likely to be positive or negative.
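A sketch of that idea, assuming a log-normal distribution as a stand-in for right-skewed data (the distribution, the seed and the $2\sigma$ threshold are illustrative choices). With positive skewness, large deviations from the mean turn up on the positive side far more often than on the negative side:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # right-skewed sample

mu, sd = x.mean(), x.std()
d = x - mu
skew = np.mean(d**3) / sd**3        # moment skewness, population form
far_above = np.mean(d > 2 * sd)     # fraction of large positive deviations
far_below = np.mean(d < -2 * sd)    # fraction of large negative deviations
print(skew, far_above, far_below)
```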

Other Measures of Skewness
==========================

* Pearson's first skewness coefficient (mode skewness)  
$$\frac{ \text{mean - mode} }{ \text{standard deviation}}$$


* Pearson's second skewness coefficient (median skewness)  
$$
S = 3 \frac{ \mu - \nu } { \sigma } 
$$
where $\mu, \nu, \sigma$ are the mean, median and standard deviation.
This is the nonparametric skew scaled by 3.
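A worked example, assuming an exponential distribution with rate 1, for which mean $= 1$, median $= \ln 2$, mode $= 0$ and standard deviation $= 1$ (its moment skewness $\gamma_1$ is $2$):

```python
import math

# Exponential distribution with rate 1 (closed-form summary statistics)
mean, median, mode, sd = 1.0, math.log(2), 0.0, 1.0

pearson_first = (mean - mode) / sd          # mode skewness
pearson_second = 3 * (mean - median) / sd   # median skewness
nonparametric = (mean - median) / sd        # older nonparametric skew

print(pearson_first)                        # 1.0
print(round(pearson_second, 4))             # 0.9206
print(math.isclose(pearson_second, 3 * nonparametric))  # True
```

All three measures agree in sign here but not in magnitude ($1$, about $0.92$, and $\gamma_1 = 2$), so they are not interchangeable.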