# 6. Moments

## Brief summary

### Summaries of a distribution

#### Median

We say that $c$ is a *median* of an r.v. $X$ if $P(X \leq c) \geq 1/2$ and $P(X \geq c) \geq 1/2$. (The simplest way this can happen is if the CDF of $X$ hits 1/2 exactly at $c$, but we know that some CDFs have jumps.)
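The two-sided condition can be checked directly for a discrete distribution with finite support. The sketch below is illustrative only: the helper name `is_median`, the tolerance `tol`, and the fair-die example are assumptions, not part of the original notes.

```python
import numpy as np

def is_median(c, values, pmf, tol=1e-12):
    """Check P(X <= c) >= 1/2 and P(X >= c) >= 1/2 for a discrete r.v."""
    values = np.asarray(values, dtype=float)
    pmf = np.asarray(pmf, dtype=float)
    p_le = pmf[values <= c].sum()   # P(X <= c)
    p_ge = pmf[values >= c].sum()   # P(X >= c)
    # tol guards against floating-point rounding when a sum is exactly 1/2
    return bool(p_le >= 0.5 - tol and p_ge >= 0.5 - tol)

# Fair six-sided die: the CDF jumps, and every c in [3, 4] satisfies the definition
die_values = np.arange(1, 7)
die_pmf = np.full(6, 1 / 6)
median_checks = {c: is_median(c, die_values, die_pmf) for c in (2, 3, 3.5, 4, 5)}
print(median_checks)
```

The die example shows why the definition says "a median" rather than "the median": when the CDF is flat at height 1/2, a whole interval of values qualifies.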

#### Mode

For a discrete r.v. $X$, we say that $c$ is a *mode* of $X$ if it maximizes the PMF: $P(X=c) \geq P(X=x)$ for all $x$. For a continuous r.v. $X$ with PDF $f$, we say that $c$ is a mode if it maximizes the PDF: $f(c) \geq f(x)$ for all $x$.

- Let $X$ have mean $\mu$. The value of $c$ that minimizes the mean squared error $E(X-c)^2$ is $c = \mu$.
- Let $m$ be a median of $X$. A value of $c$ that minimizes the mean absolute error $E|X-c|$ is $c = m$.
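These two facts can be illustrated numerically by replacing expectations with averages over a large sample and searching a grid of candidate values of $c$. This is only a rough sketch: the Expo(1) sample, the seed, and the grid are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10_000)   # right-skewed sample, so mean and median differ

grid = np.linspace(0.0, 3.0, 601)             # candidate values of c (step 0.005)
mse = np.array([np.mean((x - c) ** 2) for c in grid])    # empirical mean squared error
mae = np.array([np.mean(np.abs(x - c)) for c in grid])   # empirical mean absolute error

c_mse = grid[np.argmin(mse)]   # grid minimizer of the squared error
c_mae = grid[np.argmin(mae)]   # grid minimizer of the absolute error
print(c_mse, np.mean(x))       # the two numbers should nearly agree
print(c_mae, np.median(x))     # the two numbers should nearly agree
```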

### Kinds of moments

Let $X$ be a r.v. with mean $\mu$ and variance $\sigma^2$. For any positive integer $n$, the $n$th *moment* of $X$ is $E(X^n)$, the $n$th *central moment* is $E((X-\mu)^n)$, and the $n$th *standardized moment* is $E(\big(\frac{X-\mu}{\sigma}\big)^n)$. Throughout the previous sentence, "if it exists" is left implicit.

- The mean is the first moment.
- The variance is the second central moment.

#### Skewness

The *skewness* of a r.v. $X$ with mean $\mu$ and variance $\sigma^2$ is the third standardized moment of $X$:

\begin{equation}
Skew(X) = E\big(\frac{X-\mu}{\sigma}\big)^3.
\end{equation}

#### Symmetry of a r.v.

We say that a r.v. $X$ has a *symmetric distribution about* $\mu$ if $X - \mu$ has the same distribution as $\mu - X$. We also say that $X$ is symmetric or that the distribution of $X$ is symmetric; these all have the same meaning.

- A continuous r.v. $X$ with PDF $f$ is symmetric about $\mu$ if and only if $f(x) = f(2\mu-x)$ for all $x$.

#### Kurtosis

The *kurtosis* of a r.v. $X$ with mean $\mu$ and variance $\sigma^2$ is a shifted version of the fourth standardized moment of $X$:

\begin{equation}
Kurt(X) = E\big(\frac{X-\mu}{\sigma}\big)^4 - 3.
\end{equation}
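As a rough check of these definitions, the standardized moments of a continuous distribution can be computed by numerical integration. The helper `standardized_moment` and its arguments are hypothetical names introduced for this sketch; the integration limits must be supplied by the caller.

```python
import numpy as np
from scipy.stats import expon, norm
from scipy.integrate import quad

def standardized_moment(dist, n, lower, upper):
    """n-th standardized moment E(((X - mu)/sigma)^n) of a frozen scipy distribution."""
    mu, sigma = dist.mean(), dist.std()
    integrand = lambda x: ((x - mu) / sigma) ** n * dist.pdf(x)
    value, _ = quad(integrand, lower, upper)   # discard the error estimate
    return value

skew_expo = standardized_moment(expon(), 3, 0, np.inf)
kurt_expo = standardized_moment(expon(), 4, 0, np.inf) - 3       # shifted by 3, as in the definition
kurt_norm = standardized_moment(norm(), 4, -np.inf, np.inf) - 3
print(skew_expo, kurt_expo, kurt_norm)
```

For the Expo(1) distribution the skewness is 2 and the kurtosis is 6. For the standard Normal the kurtosis is 0, which is the reason for subtracting 3.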

### Sample moments

Let $X_1, ..., X_n$ be i.i.d. random variables. The $k$th *sample moment* is the r.v.

\begin{equation}
M_k = \frac{1}{n} \sum_{j=1}^{n} X_j^k.
\end{equation}

The *sample mean* $\bar{X}_n$ is the first sample moment:

\begin{equation}
\bar{X}_n = \frac{1}{n} \sum_{j=1}^{n} X_j.
\end{equation}

In contrast, the *population mean* or *true mean* is $E(X_j)$, the mean of the distribution from which the $X_j$ were drawn.

#### Mean and variance of sample mean

Let $X_1, ..., X_n$ be i.i.d. r.v.s with mean $\mu$ and variance $\sigma^2$. Then the sample mean $\bar{X}_n$ is unbiased for estimating $\mu$. That is,

\begin{equation}
E(\bar{X}_n) = \mu.
\end{equation}

The variance of $\bar{X}_n$ is given by

\begin{equation}
Var(\bar{X}_n) = \frac{\sigma^2}{n}.
\end{equation}
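Both results can be illustrated by simulation: draw many independent samples of size $n$, compute the sample mean of each, and look at the mean and variance of those sample means. The Normal distribution, the parameter values, and the seed below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 3.0, 25
num_reps = 100_000

# Each row is one sample X_1, ..., X_n; each row mean is one realization of the sample mean
samples = rng.normal(loc=mu, scale=sigma, size=(num_reps, n))
xbar = samples.mean(axis=1)

print(xbar.mean(), mu)               # empirical E(sample mean) vs. mu
print(xbar.var(), sigma**2 / n)      # empirical Var(sample mean) vs. sigma^2 / n
```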

#### Sample variance and sample standard deviation

Let $X_1, ..., X_n$ be i.i.d. random variables with mean $\mu$ and variance $\sigma^2$. The *sample variance* is the r.v. 

\begin{equation}
S_n^2 = \frac{1}{n-1} \sum_{j=1}^{n} (X_j-\bar{X}_n)^2.
\end{equation}

The sample variance $S_n^2$ is unbiased for estimating $\sigma^2$, i.e.,

\begin{equation}
E(S_n^2) = \sigma^2.
\end{equation}

The *sample standard deviation* is the square root of the sample variance.
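The role of the $n-1$ in the denominator can be seen in a small simulation that compares it with dividing by $n$. In NumPy the divisor is controlled by the `ddof` argument of `var`. The sample size, variance, and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n, num_reps = 4.0, 5, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(num_reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)   # divides by n - 1, i.e. S_n^2
s2_biased = samples.var(axis=1, ddof=0)     # divides by n (NumPy's default)

print(s2_unbiased.mean(), sigma2)                 # close to sigma^2
print(s2_biased.mean(), (n - 1) / n * sigma2)     # close to (n - 1)/n * sigma^2, too small on average
```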

#### Sample skewness and sample kurtosis

We can define the *sample skewness* to be

\begin{equation}
\frac{\frac{1}{n} \sum_{j=1}^{n} (X_j-\bar{X}_n)^3} {S_n^3},
\end{equation}

and the *sample kurtosis* to be

\begin{equation}
\frac{\frac{1}{n} \sum_{j=1}^{n} (X_j-\bar{X}_n)^4} {S_n^4} - 3.
\end{equation}


### Moment generating functions

The *moment generating function* (MGF) of a r.v. $X$ is $M(t) = E(e^{tX})$, as a function of $t$, if this is finite on some open interval $(-a, a)$ containing $0$. Otherwise we say the MGF of $X$ does not exist.

#### Bernoulli MGF

For $X \sim Bern(p)$, $M(t) = E(e^{tX}) = pe^t + q$, which is finite for all values of $t$.

#### Geometric MGF

For $X \sim Geom(p)$,

\begin{equation}
M(t) = E(e^{tX}) = \sum_{k=0}^{\infty} e^{tk}q^kp = p\sum_{k=0}^{\infty}(qe^t)^k = \frac{p}{1-qe^t}
\end{equation}

for $qe^t < 1$, i.e., for $t$ in $(-\infty, \log(1/q))$.
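A quick numerical check of the geometric series: sum the first terms of $E(e^{tX})$ and compare with the closed form. The PMF $q^k p$ on $k = 0, 1, 2, \dots$ is coded by hand here because `scipy.stats.geom` uses support starting at 1. The values of `p`, `t`, and the truncation point are arbitrary choices for this sketch.

```python
import numpy as np

p = 0.3
q = 1 - p
t = 0.2                      # must satisfy t < log(1/q), which is about 0.357 here
k = np.arange(0, 2000)       # truncation point; the omitted tail is negligible

# Partial sum of sum_k e^{tk} q^k p, written as p * sum_k (q e^t)^k
series = p * np.sum((q * np.exp(t)) ** k)
closed_form = p / (1 - q * np.exp(t))
print(series, closed_form)
```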

#### Uniform MGF

Let $U \sim Unif(a,b)$. Then the MGF of $U$ is 

\begin{equation}
M(t) = E(e^{tU}) = \frac{1}{b-a} \int_a^b e^{tu}du = \frac{e^{tb}-e^{ta}} {t(b-a)}
\end{equation}

for $t \neq 0$, and $M(0) = 1$.
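The closed form can be compared with direct numerical integration. The function name `unif_mgf` and the parameter values are assumptions made for this sketch. Note the separate branch for $t = 0$, where the formula would otherwise divide by zero.

```python
import numpy as np
from scipy.integrate import quad

def unif_mgf(t, a, b):
    """Closed-form MGF of Unif(a, b), with the t = 0 case handled separately."""
    if t == 0:
        return 1.0
    return (np.exp(t * b) - np.exp(t * a)) / (t * (b - a))

a, b, t = -1.0, 3.0, 0.7
numeric, _ = quad(lambda u: np.exp(t * u) / (b - a), a, b)   # E(e^{tU}) by integration
print(numeric, unif_mgf(t, a, b))
```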

### Moments via derivatives of the MGF

Given the MGF of $X$, we can get the $n$th moment of $X$ by evaluating the $n$th derivative of the MGF at $0$: $E(X^n) = M^{(n)}(0)$.
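As a small worked instance, take the Bernoulli MGF $M(t) = pe^t + q$ from above. Every derivative of $M$ equals $pe^t$, so

\begin{equation}
E(X) = M'(0) = p, \quad E(X^2) = M''(0) = p, \quad Var(X) = E(X^2) - (E(X))^2 = p - p^2 = pq.
\end{equation}

This agrees with the fact that $X^n = X$ for a Bernoulli r.v., so $E(X^n) = p$ for every $n \geq 1$.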

#### MGF determines the distribution

The MGF of a random variable determines its distribution: if two r.v.s have the same MGF, they must have the same distribution.

#### MGF of a sum of independent r.v.s

If $X$ and $Y$ are independent, then the MGF of $X+Y$ is the product of the individual MGFs:

\begin{equation}
M_{X+Y}(t) = M_X(t)M_Y(t).
\end{equation}
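The product rule can be verified exactly for discrete r.v.s with finite support, since the PMF of a sum of independent r.v.s is the convolution of the individual PMFs. The helper `mgf_from_pmf` and the choice of a Bernoulli r.v. plus a fair die roll are illustrative assumptions.

```python
import numpy as np

def mgf_from_pmf(t, values, pmf):
    """E(e^{tX}) for a discrete r.v. with finite support."""
    return np.sum(np.exp(t * values) * pmf)

# X ~ Bern(0.3) and Y is a fair die roll, assumed independent
x_values, x_pmf = np.array([0, 1]), np.array([0.7, 0.3])
y_values, y_pmf = np.arange(1, 7), np.full(6, 1 / 6)

# PMF of X + Y by convolution; index 0 corresponds to the smallest total, 0 + 1 = 1
sum_values = np.arange(1, 8)
sum_pmf = np.convolve(x_pmf, y_pmf)

t = 0.25
mgf_sum = mgf_from_pmf(t, sum_values, sum_pmf)
mgf_product = mgf_from_pmf(t, x_values, x_pmf) * mgf_from_pmf(t, y_values, y_pmf)
print(mgf_sum, mgf_product)   # the two values should agree up to rounding
```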

#### Binomial MGF

A $Bin(n,p)$ r.v. is a sum of $n$ independent $Bern(p)$ r.v.s, each with MGF $pe^t+q$, so the MGF of a $Bin(n,p)$ r.v. is the product of $n$ such factors:

\begin{equation}
M(t) = (pe^t+q)^n.
\end{equation}
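One way to sanity-check this formula is to differentiate it numerically at $0$ and compare with the known Binomial mean $np$ and variance $npq$. The sketch below uses central finite differences, which are approximations rather than exact derivatives. The step size `h` and the parameter values are arbitrary choices.

```python
import numpy as np

n, p = 10, 0.3
q = 1 - p

def binom_mgf(t):
    """MGF of Bin(n, p)."""
    return (p * np.exp(t) + q) ** n

h = 1e-4  # step size for central differences
first_moment = (binom_mgf(h) - binom_mgf(-h)) / (2 * h)                      # approximates M'(0) = E(X)
second_moment = (binom_mgf(h) - 2 * binom_mgf(0.0) + binom_mgf(-h)) / h**2   # approximates M''(0) = E(X^2)
variance = second_moment - first_moment**2

print(first_moment, n * p)     # approximately np
print(variance, n * p * q)     # approximately npq
```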

#### Negative Binomial MGF

An $NBin(r,p)$ r.v. is a sum of $r$ i.i.d. $Geom(p)$ r.v.s, each with MGF $\frac{p}{1-qe^t}$ for $qe^t < 1$, so the MGF of $X \sim NBin(r,p)$ is

\begin{equation}
M(t) = \big(\frac{p}{1-qe^t}\big)^r
\end{equation}

for $qe^t < 1$.

#### Normal MGF

The MGF of a standard Normal r.v. Z is 

\begin{equation}
M_Z(t) = E(e^{tZ}) = \int_{-\infty}^{\infty} e^{tz}\frac{1}{\sqrt{2\pi}} e^{-z^2/2}dz = e^{t^2/2}.
\end{equation}

Thus, the MGF of $X = \mu + \sigma Z \sim \mathcal{N}(\mu, \sigma^2)$ is

\begin{equation}
M_X(t) = e^{\mu t}M_Z(\sigma t) = e^{\mu t}e^{(\sigma t)^2/2} = e^{\mu t + \frac{1}{2} \sigma^2 t^2}.
\end{equation}
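The closed form can be compared against numerical integration of $e^{tx}$ times the $\mathcal{N}(\mu, \sigma^2)$ density. In this sketch the two exponentials are merged into one before integrating, a precaution against overflow at large $|x|$. The function name and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

def normal_mgf_numeric(t, mu, sigma):
    """E(e^{tX}) for X ~ N(mu, sigma^2), by numerical integration."""
    def integrand(x):
        # e^{tx} times the Normal density, combined into a single exponential
        exponent = t * x - 0.5 * ((x - mu) / sigma) ** 2
        return np.exp(exponent) / (sigma * np.sqrt(2 * np.pi))
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

mu, sigma, t = 1.5, 2.0, 0.4
numeric = normal_mgf_numeric(t, mu, sigma)
closed_form = np.exp(mu * t + 0.5 * sigma**2 * t**2)
print(numeric, closed_form)
```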

#### Exponential MGF

The MGF of $X \sim Expo(1)$ is 

\begin{equation}
M(t) = E(e^{tX}) = \int_{0}^{\infty} e^{tx}e^{-x}dx = \int_{0}^{\infty} e^{-x(1-t)}dx = \frac{1}{1-t}\ \text{for}\ t < 1.
\end{equation}

So the MGF of $Y = X/\lambda \sim Expo(\lambda)$ is 

\begin{equation}
M_Y(t) = M_X\big(\frac{t}{\lambda}\big) = \frac{\lambda}{\lambda-t}\ \text{for}\ t < \lambda.
\end{equation}
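Expanding $\frac{\lambda}{\lambda - t} = \sum_{n=0}^{\infty} \frac{t^n}{\lambda^n}$ for $|t| < \lambda$ and matching coefficients of $t^n/n!$ suggests $E(Y^n) = n!/\lambda^n$. The sketch below compares that expression with direct numerical integration for a few values of $n$. The rate `lam` and the range of `n` are arbitrary choices.

```python
import math
import numpy as np
from scipy.integrate import quad

lam = 2.0
moments = []
for n in range(1, 5):
    # E(Y^n) by integrating y^n against the Expo(lam) density
    numeric, _ = quad(lambda y: y**n * lam * np.exp(-lam * y), 0, np.inf)
    from_mgf = math.factorial(n) / lam**n   # coefficient read off the MGF's Taylor series
    moments.append((n, numeric, from_mgf))
    print(n, numeric, from_mgf)
```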


## Python examples

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, uniform, poisson, expon, binom
from scipy.integrate import quad
from scipy import optimize
from numpy.random import choice

%matplotlib inline
%load_ext autoreload
%autoreload 2

### Moments

In [2]:
# The 6th moment of a N(0, 1)
g = lambda x: x**6*norm.pdf(x)
print(quad(g, -np.inf, np.inf))   # (the integral, an estimate of the absolute error)

(15.000000000000004, 4.4229308200406246e-09)


In [3]:
# The 2nd moment of a Unif(-1, 1)
h = lambda x: x**2*uniform.pdf(x, loc=-1.0, scale=2.0)
print(quad(h, -1, 1))

(0.3333333333333333, 3.700743415417188e-15)


In [4]:
# The 2nd moment of X ~ Pois(7)
g = lambda k: k**2*poisson.pmf(k, 7)
print(np.sum([g(x) for x in np.arange(100)]))   # The total contribution of all the terms after k=100 is negligible

56.0


In [5]:
# The 6th sample moment of 100 i.i.d. N(0,1) r.v.s
X = norm.rvs(size=100)
print(np.mean(X**6))

10.841877935


In [6]:
# Using the sample mean and sample variance to estimate the true mean and true variance
# Generate 1000 examples from a N(0, 1)
Z = norm.rvs(size=1000)
print(np.mean(Z))   # ~= 0
print(np.var(Z))    # ~= 1 (np.var divides by n by default; pass ddof=1 for the unbiased sample variance)

-0.0457364860542
1.04088400025


In [7]:
# Sample skewness and sample kurtosis; ddof=1 makes np.std match S_n in the definitions above
skew = lambda X: np.mean((X - np.mean(X)) ** 3) / np.std(X, ddof=1) ** 3
kurt = lambda X: np.mean((X - np.mean(X)) ** 4) / np.std(X, ddof=1) ** 4 - 3

### Medians and modes

In [8]:
# Find the median of the Expo(1) by finding a root of CDF(x) - 1/2
g = lambda x: expon.cdf(x) - 1/2.
print(optimize.brentq(g, 0, 1))    # (function, lowerbound, upperbound)

0.69314718056


In [9]:
# Find the median of the Expo(1)
print(expon.ppf(1/2.))   # calls quantile function

0.69314718056


In [10]:
# Find the mode of the Gamma(6,1) distribution by maximizing the PDF,
# which is proportional to x^5 * e^(-x); we minimize its negative
h = lambda x: -x**5*np.exp(-x)
print(optimize.minimize(h, 1).x)

[ 5.00000002]


In [11]:
# Find the median of the Bin(50, 0.2) distribution 
n, p = 50, 0.2
print(np.argmax(binom.cdf(np.arange(n + 1), n, p) >= 0.5))   # smallest k with P(X <= k) >= 1/2

10


### Dice simulation

In [12]:
# Simulate the total of 6 fair dice, num_trials times (one row per trial)
num_trials = 10**6
rolls = np.random.randint(1, 7, size=(num_trials, 6))   # the upper bound 7 is exclusive
results = rolls.sum(axis=1)

In [13]:
print(np.mean(results == 18))   # estimated probability that the total of the 6 dice is 18

0.073564
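The simulated value can be compared with an exact calculation. Since the six dice are independent, the PMF of their total is the six-fold convolution of the single-die PMF. This is a minimal sketch; the variable names are assumptions, not part of the original notebook.

```python
import numpy as np

die_pmf = np.full(6, 1 / 6)   # PMF of one fair die on the values 1, ..., 6

# Convolve six copies to get the PMF of the total; index 0 corresponds to a total of 6
total_pmf = np.array([1.0])
for _ in range(6):
    total_pmf = np.convolve(total_pmf, die_pmf)

p_18 = total_pmf[18 - 6]      # exact P(total = 18)
print(p_18)
```

The exact probability is $3431/46656 \approx 0.0735$, consistent with the simulation result above.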
