# Moments

## Intuition
MBA coursework rarely discusses statistical moments by name. While we talk about means, variance, standard deviation, and sometimes covariance regularly, we don't often talk about what they mean.

The intuition behind statistical moments is that, given a collection of data, we can summarize its shape (where it is centered, how spread out it is, whether it leans to one side, and how heavy its tails are) by performing a few routine calculations.

https://stats.stackexchange.com/questions/132914/what-exactly-are-moments-how-are-they-derived

We'll discuss four moments: (1) Mean, (2) Variance, (3) Skewness, and (4) Kurtosis.

### Mean
The mean is best described as the "center of mass" of a distribution relative to 0.

### Variance
Variance just shows us how spread apart our distribution is relative to the mean. We'll also discuss Standard Deviation, Covariance, and Correlation.

### Skewness
Skewness shows us whether a distribution is lopsided: whether its longer tail extends to the left or to the right of the mean.

### Kurtosis
Kurtosis shows us how concentrated the distribution is near its center compared with its tails.

## Explanation
This part's going to be a bit long, since we need to describe the formulas for each moment. But let's start off with a **standard** formula and move forward from there.

We can standardize statistical moments with this general formula:

$$
\frac{E[(X - \mu)^n]}{\sigma^n} = \tilde{\mu}_n
$$

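For $n = 1$ (the mean) the numerator is $E[X - \mu] = 0$, so the formula equals 0, and for $n = 2$ (the variance) it is $\sigma^2 / \sigma^2 = 1$. The standardized form is therefore uninformative for the first two moments, but it serves as a general formula for higher-order statistical moments such as skewness and kurtosis.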
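
To make the general formula concrete, here is a minimal sketch that evaluates it on a small data set. The function name `standardized_moment` and the data in `sample_data` are illustrative assumptions, and the sketch uses population ($1/N$) estimates throughout.

```python
def standardized_moment(data, n):
    """n-th standardized moment of data, using population (1/N) estimates.

    Hypothetical helper for illustration only.
    """
    size = len(data)
    mu = sum(data) / size
    # n-th central moment: the average of (x - mu)^n
    central_n = sum((x - mu) ** n for x in data) / size
    # Population standard deviation: square root of the 2nd central moment
    sigma = (sum((x - mu) ** 2 for x in data) / size) ** 0.5
    return central_n / sigma ** n


sample_data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data: mean 5, sigma 2

first = standardized_moment(sample_data, 1)    # 0.0, deviations cancel out
second = standardized_moment(sample_data, 2)   # 1.0, sigma^2 / sigma^2
third = standardized_moment(sample_data, 3)    # 0.65625, biased skewness
fourth = standardized_moment(sample_data, 4)   # 2.78125, Pearson kurtosis

print(first, second, third, fourth)
```

The first two results are the same for any data set with nonzero spread, which is why only the third and fourth standardized moments are reported in practice.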

### Mean
By "center of mass" we mean the balance point of the data: the value around which the deviations of all observations cancel out.

#### Expected Value Notation
$$
E[X]
$$

#### Summation Notation
$$
\frac{1}{N} \sum_{i=1}^N x_i
$$

In [1]:
def mean(array):
    return sum(array) / len(array)
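
The "center of mass" idea can be checked numerically: deviations from the mean sum to zero, so the data balance at that point. The sketch below repeats the `mean` function so that it runs on its own; the values in `weights` are made up for illustration.

```python
def mean(array):
    return sum(array) / len(array)


weights = [1, 2, 3, 10]   # illustrative data, not from the text

center = mean(weights)                       # 16 / 4 = 4.0
residuals = [x - center for x in weights]    # [-3.0, -2.0, -1.0, 6.0]
balance = sum(residuals)                     # 0.0: the two sides cancel

print(center, residuals, balance)
```

One large observation (10) is balanced by three smaller ones, which is also why the mean is sensitive to outliers.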

### Variance
#### Expected Value Notation
$$
E[(X-\mu)^2]
$$

#### Summation Notation
##### Population
$$
\frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2
$$

##### Sample
$$
\frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2
$$

In [2]:
def var(array, sample=True):
    """
    Step 1: Calculate the mean of the observations.
    Step 2: Calculate the dispersion: the sum of squared differences
            between each observation and the mean.
    Step 3: Calculate the degrees of freedom. Because the mean is
            itself estimated from the data, one degree of freedom is
            used up, so by default we divide by n - 1.
    """

    mean_x = mean(array)
    dispersion = sum((x - mean_x) ** 2 for x in array)
    n = len(array)

    # The sample estimate divides by n - 1; the population formula by n.
    if sample:
        n = n - 1

    return dispersion / n
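
As a quick sanity check, the population and sample formulas can be compared against Python's standard `statistics` module. This is a sketch: it restates a compact version of `var` so that it runs on its own, and the data in `scores` are made up.

```python
import statistics


def var(array, sample=True):
    # Compact restatement of the variance function for a standalone example.
    mean_x = sum(array) / len(array)
    dispersion = sum((x - mean_x) ** 2 for x in array)
    n = len(array) - 1 if sample else len(array)
    return dispersion / n


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data: mean 5, squared deviations sum to 32

population_var = var(scores, sample=False)   # 32 / 8 = 4.0
sample_var = var(scores, sample=True)        # 32 / 7, roughly 4.571

# The standard library implements the same two formulas.
stdlib_population = statistics.pvariance(scores)
stdlib_sample = statistics.variance(scores)

print(population_var, stdlib_population)
print(sample_var, stdlib_sample)
```

The sample estimate is always the larger of the two, and the gap shrinks as the number of observations grows.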

#### NB Standard Deviation
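While not a moment, the standard deviation is important to discuss. Intuitively, it carries the same information as variance, but it is expressed in the same units as the data (and the mean) rather than in squared units.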

$$
Var(X)^{1/2}
$$

In [3]:
def std(array, sample=True):
    return var(array, sample) ** 0.5
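
The point about units can be illustrated by measuring the same quantity on two scales. The sketch below uses the standard library's `statistics.pvariance` and `statistics.pstdev` rather than the functions defined here, and the heights are made-up values.

```python
import math
import statistics

heights_cm = [150, 160, 170, 180]           # illustrative data in centimetres
heights_m = [h / 100 for h in heights_cm]   # the same heights in metres

# Changing units by a factor of 100 scales the variance by 100^2 ...
var_ratio = statistics.pvariance(heights_cm) / statistics.pvariance(heights_m)

# ... but scales the standard deviation by only 100, like the data itself.
std_ratio = statistics.pstdev(heights_cm) / statistics.pstdev(heights_m)

print(math.isclose(var_ratio, 10_000, rel_tol=1e-9))
print(math.isclose(std_ratio, 100, rel_tol=1e-9))
```

This is why a standard deviation can be read directly against the mean ("165 cm, give or take about 11 cm"), while a variance cannot.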

#### NB Covariance
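Similarly, covariance is a measure of joint dispersion for two variables: it is positive when they tend to move in the same direction and negative when they tend to move in opposite directions.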

##### Definition
$$
\textrm{Cov}(x,y) \equiv \sigma_{x,y}
$$

##### Expected Value Notation
$$
E[(x-\mu_x)(y-\mu_y)]
$$

##### Summation Notation
###### Population
$$
\frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})
$$

###### Sample
$$
\frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})
$$



In [4]:
def cov(array_x, array_y, sample=True):
    """
    This takes the same form as variance. Instead of squaring
    each deviation from the mean, we multiply the deviation of x
    by the matching deviation of y.
    """

    mean_x = mean(array_x)
    mean_y = mean(array_y)

    # Pair up observations; both series must have the same length.
    dispersion = sum((x - mean_x) * (y - mean_y) for x, y in zip(array_x, array_y))

    n = len(array_x)

    # The sample estimate divides by n - 1; the population formula by n.
    if sample:
        n = n - 1

    return dispersion / n
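
A small example shows how the sign of the covariance tracks the direction of the relationship, and that the covariance of a series with itself is its variance. The sketch restates a compact `cov` so that it runs on its own; `hours` and `output` are made-up series.

```python
def cov(array_x, array_y, sample=True):
    # Compact restatement of the covariance function for a standalone example.
    mean_x = sum(array_x) / len(array_x)
    mean_y = sum(array_y) / len(array_y)
    dispersion = sum((x - mean_x) * (y - mean_y) for x, y in zip(array_x, array_y))
    n = len(array_x) - 1 if sample else len(array_x)
    return dispersion / n


hours = [1, 2, 3, 4, 5]       # illustrative data
output = [2, 4, 6, 8, 10]     # rises with hours

moving_together = cov(hours, output)          # 20 / 4 = 5.0
moving_apart = cov(hours, output[::-1])       # -5.0: output falls as hours rise
population_cov = cov(hours, output, False)    # 20 / 5 = 4.0
self_cov = cov(hours, hours)                  # 2.5, the sample variance of hours

print(moving_together, moving_apart, population_cov, self_cov)
```

The magnitude (5.0) depends on the units of both series, so it says little about the strength of the relationship by itself.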

#### NB Correlation
Correlation is related to covariance. It standardizes covariance by dividing it by the product of the two standard deviations, which gives a unitless value between -1 and 1.

$$
\rho_{xy} = \frac{\sigma_{x,y}}{\sigma_x\sigma_y}
$$

In [5]:
def correlation(x, y, sample=True):
    return cov(x, y, sample) / (std(x, sample) * std(y, sample)) 
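
The benefit of standardizing shows up when one series changes units: the covariance changes, but the correlation does not. The sketch below is self-contained and uses the shortcut that the variance of a series is its covariance with itself; the temperature and sales figures are made up.

```python
import math


def cov(xs, ys, sample=True):
    # Compact restatement of covariance for a standalone example.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1 if sample else len(xs))


def correlation(xs, ys, sample=True):
    # Var(x) equals Cov(x, x), so the standard deviations come from cov too.
    std_x = cov(xs, xs, sample) ** 0.5
    std_y = cov(ys, ys, sample) ** 0.5
    return cov(xs, ys, sample) / (std_x * std_y)


temps_c = [10, 15, 20, 25, 30]               # illustrative data, Celsius
sales = [12, 18, 21, 30, 33]                 # illustrative data
temps_f = [c * 9 / 5 + 32 for c in temps_c]  # same temperatures, Fahrenheit

cov_c = cov(temps_c, sales)
cov_f = cov(temps_f, sales)            # larger by a factor of 9/5
r_c = correlation(temps_c, sales)
r_f = correlation(temps_f, sales)      # unchanged by the unit conversion
r_population = correlation(temps_c, sales, sample=False)

print(math.isclose(cov_f, cov_c * 9 / 5))
print(math.isclose(r_c, r_f), math.isclose(r_c, r_population))
```

The last comparison also shows that the `sample` flag has no effect on the correlation, because the $N$ or $N-1$ divisor cancels between the numerator and the denominator.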

### Skewness
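Skewness is essentially a measure of how asymmetric a distribution is around its mean. Positive skewness indicates a longer right tail, typically with most observations clustered to the left of the mean; negative skewness indicates the opposite.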

#### Expected Value Notation
$$
\frac{\mu_3}{\sigma^3} = \frac{E[(X - \mu)^3]}{\sigma^3} = \tilde{\mu}_3
$$

#### Summation Notation
##### Equation (Biased)
$$
\begin{align}
&s_{biased} = \frac{ \frac{1}{n} \sum_{i=1}^n(x_i - \bar{x})^3} {\left( \sqrt{ \frac{1}{n} \sum_{i=1}^n(x_i - \bar{x})^2 } \right)^3}
\end{align}
$$

##### Equation (Un-biased)
$$
s_{unbiased} = \left(s_{biased}\right)\left(\frac{ \sqrt{n(n-1)}} {n-2}\right)
$$


In [6]:
def skewness(array, bias=False):
    """
    Step 1: Calculate the mean of the observations.
    Step 2: Calculate the third central moment: the average cubed
            difference between each observation and the mean.
    Step 3: Divide by the population standard deviation cubed.
    Step 4: Unless bias=True, apply the sample-size correction.
    """

    n = len(array)
    mean_x = mean(array)
    third_moment = sum((x - mean_x) ** 3 for x in array) / n
    den = std(array, False) ** 3

    # bias=False (the default) requests the bias-corrected estimate.
    if bias:
        bias_correction = 1
    else:
        bias_correction = (n * (n - 1)) ** .5 / (n - 2)

    return third_moment / den * bias_correction
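
To see how the sign of the skewness follows the longer tail, the sketch below evaluates the two summation formulas on symmetric, right-tailed, and left-tailed data. The function `sample_skewness` and its `adjust` flag are hypothetical names used only for this example, and the data are made up.

```python
def sample_skewness(data, adjust=False):
    """Biased skewness; adjust=True applies the sqrt(n(n-1)) / (n-2) factor.

    Hypothetical helper for illustration only.
    """
    n = len(data)
    mean_x = sum(data) / n
    m2 = sum((x - mean_x) ** 2 for x in data) / n   # 2nd central moment
    m3 = sum((x - mean_x) ** 3 for x in data) / n   # 3rd central moment
    g1 = m3 / m2 ** 1.5
    if adjust:
        g1 *= (n * (n - 1)) ** 0.5 / (n - 2)
    return g1


symmetric = [1, 2, 3, 4, 5]
right_tailed = [2, 4, 4, 4, 5, 5, 7, 9]       # a few large values pull right
left_tailed = [-x for x in right_tailed]      # mirror image

skew_symmetric = sample_skewness(symmetric)            # 0.0
skew_right = sample_skewness(right_tailed)             # 0.65625
skew_left = sample_skewness(left_tailed)               # -0.65625
skew_right_adjusted = sample_skewness(right_tailed, adjust=True)

print(skew_symmetric, skew_right, skew_left, skew_right_adjusted)
```

The adjustment factor is greater than 1 for any sample size where it is defined ($n \geq 3$), so the adjusted value is always further from zero than the biased one.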

### Kurtosis
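As with skewness, we generally don't think about kurtosis in absolute terms the way we do for mean or variance. Instead, we standardize it so that it measures how heavy the tails of the distribution are relative to its center. Let's use Pearson's kurtosis, under which a normal distribution has a value of 3.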

#### Expected Value Notation
$$
\frac{\mu_4}{\sigma^4} = \frac{E[(X - \mu)^4]}{\sigma^4} = \tilde{\mu}_4
$$

#### Summation Notation
##### Equation (Biased)
$$
k_1 = \frac{ \frac{1}{n} \sum_{i=1}^n(x_i - \bar{x})^4 } 
           { \left(\frac{1}{n} \sum_{i=1}^n(x_i-\bar{x})^2\right)^2}
$$

##### Equation (Un-biased)
$$
k_0 = \frac{ n-1 } { (n-2)(n-3) } ((n+1)k_1-3(n-1)) + 3
$$



In [7]:
def kurtosis(array, bias=False):
    """
    Step 1: Calculate the mean of the observations.
    Step 2: Calculate the fourth central moment: the average of the
            differences between each observation and the mean,
            raised to the fourth power.
    Step 3: Divide by the population variance squared.
    Step 4: Unless bias=True, apply the sample-size correction.
    """

    n = len(array)
    mean_x = mean(array)
    fourth_moment = sum((x - mean_x) ** 4 for x in array) / n
    den = std(array, False) ** 4
    kurt = fourth_moment / den

    # bias=False (the default) requests the bias-corrected estimate.
    if bias:
        result = kurt
    else:
        result = ((n - 1) / ((n - 2) * (n - 3))) * ((n + 1) * kurt - 3 * (n - 1)) + 3

    return result
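
Pearson's kurtosis is easiest to read against the normal distribution's value of 3. The sketch below applies the biased formula to two extreme made-up data sets; `pearson_kurtosis` is a hypothetical name used only for this example.

```python
def pearson_kurtosis(data):
    """Biased Pearson kurtosis: 4th central moment over squared variance.

    Hypothetical helper for illustration only.
    """
    n = len(data)
    mean_x = sum(data) / n
    m2 = sum((x - mean_x) ** 2 for x in data) / n   # 2nd central moment
    m4 = sum((x - mean_x) ** 4 for x in data) / n   # 4th central moment
    return m4 / m2 ** 2


two_point = [-1, 1, -1, 1]       # no tails at all: every value is 1 from the mean
with_outlier = [0] * 9 + [10]    # one extreme value dominates the 4th powers

flat = pearson_kurtosis(two_point)       # 1.0
heavy = pearson_kurtosis(with_outlier)   # 657 / 81, roughly 8.11

# Excess kurtosis subtracts the normal distribution's value of 3.
flat_excess = flat - 3
heavy_excess = heavy - 3

print(flat, heavy, flat_excess, heavy_excess)
```

Because deviations are raised to the fourth power, a single outlier is enough to push the value well above 3, which is why kurtosis is usually read as a measure of tail weight.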