# Moments

In these notes, we explore how the moments of a random variable shed light on its distribution. Most people know the two most common moment-based summaries of a distribution: the mean $E(X)$, which is the first moment, and the variance $E(X^2) - (EX)^2$, which is the second central moment. They summarize the average value of $X$ and how spread out its distribution is. But there is much more to a distribution than its mean and variance.

Key Ideas:

- Moment generating functions are like a bridge between sequences of numbers and the world of calculus.
- Starting with a sequence of numbers, we create a continuous function (the generating function) that encodes the sequence.

The _moment generating function_ (MGF) of a random variable $X$ is $M(t) = E(e^{tX})$, as a function of $t$, if this is finite on some open interval $(-a, a)$ containing $0$. Otherwise we say the MGF of $X$ does not exist.

> A natural question at this point is “What is the interpretation of $t$?” The answer is that $t$ has no interpretation in particular; it’s just a bookkeeping device that we introduce in order to be able to use calculus instead of working with a discrete sequence of moments.

The first moment is the mean, which we can compute by differentiating the definition with respect to $t$ (the existence of the MGF on an open interval around $0$ justifies moving the derivative inside the expectation):
$$\frac{d M(t)}{dt} = \frac{d}{dt} E(\exp(tX)) = E \frac{d}{dt}(\exp(tX)) = E(X \exp(tX))$$
Setting $t=0$ gives the mean,
$$M^{(1)}(0) = E(X)$$
Differentiating once more, we obtain the second moment:
\begin{align}
M^{(2)}(t) & = E(X^2 \exp(t X))\\
M^{(2)}(0) & = E(X^2)
\end{align}
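
As a concrete check of this differentiation recipe, here is a minimal sketch using a fair six-sided die. The die is an illustrative assumption, not an example from these notes; any distribution with a finite MGF would work the same way.

```python
import sympy as sp

t = sp.symbols('t', real=True)

# Assumed example: X is a fair six-sided die, so P(X = k) = 1/6 for k = 1..6.
faces = range(1, 7)
prob = sp.Rational(1, 6)

# MGF from the definition M(t) = E(e^{tX}).
M = sum(prob * sp.exp(t * k) for k in faces)

# Differentiate, then set t = 0 to read off the raw moments.
first_moment = sp.diff(M, t, 1).subs(t, 0)
second_moment = sp.diff(M, t, 2).subs(t, 0)

# Direct computation of E(X) and E(X^2) for comparison.
mean_direct = sum(prob * k for k in faces)
second_direct = sum(prob * k**2 for k in faces)

print(first_moment, mean_direct)
print(second_moment, second_direct)
```

Both routes should give the same two numbers, which is the point of the construction: the derivatives of $M$ at $0$ carry the moments.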

## Why are they called "moments"?

First, watch this [video](https://youtu.be/lwO0V5FitAo), and make sure you understand the relationship between the moment of inertia and the distribution of point masses. In probability, moments quantify three attributes of a distribution: location, shape, and scale.

Many distributions have parameters that are called "location", "scale", or "shape" because they control their respective attributes, but some do not. 

## Different distributions have different moment generating functions

For $X \sim Geom(p)$ with $q = 1 - p$, we have, for $qe^t < 1$ (that is, $t < -\log q$),
$$M(t) = E(e^{tX}) = \sum_{k=0}^\infty e^{tk}q^k p = p \sum_{k=0}^\infty (qe^t)^k = \frac{p}{1-qe^t}$$
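
To see this formula in use, the sketch below differentiates the geometric MGF symbolically and recovers the mean and variance. It assumes $q = 1 - p$ and the convention used in the sum above, where $X$ counts failures before the first success ($k$ starts at $0$).

```python
import sympy as sp

p = sp.symbols('p', positive=True)
t = sp.symbols('t', real=True)
q = 1 - p

# Geometric MGF from the formula above, valid where q*e^t < 1.
M = p / (1 - q * sp.exp(t))

# Raw moments from derivatives at t = 0.
mean = sp.simplify(sp.diff(M, t, 1).subs(t, 0))
second = sp.simplify(sp.diff(M, t, 2).subs(t, 0))

# Variance from the first two raw moments.
variance = sp.simplify(second - mean**2)

print(mean)      # mathematically equal to q/p
print(variance)  # mathematically equal to q/p**2
```

The printed expressions may be arranged differently by sympy, but they simplify to $q/p$ and $q/p^2$.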

The moment generating function of a standard normal random variable $X$ is
$$M_X(t) = E(e^{tX}) = \int_{-\infty}^\infty e^{tx} \frac{1}{\sqrt{2\pi}} e^{-x^2/2} d x = e^{t^2/2}$$
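
The integral can be evaluated by completing the square in the exponent. A short derivation, which uses only the fact that a normal density with mean $t$ and variance $1$ integrates to $1$:

\begin{align}
tx - \frac{x^2}{2} & = -\frac{(x-t)^2}{2} + \frac{t^2}{2} \\
M_X(t) & = e^{t^2/2} \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} e^{-(x-t)^2/2} d x = e^{t^2/2}
\end{align}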

## Relationship between moment generating functions and moments

The moment generating function is so named because it can be used to find the moments of the distribution. The series expansion of $e^{tX}$ is
$$e^{tX} = 1 + tX + \frac{t^2X^2}{2!} + \frac{t^3X^3}{3!} + \cdots + \frac{t^n X^n}{n!} + \cdots$$
Hence, taking expectations term by term,
\begin{align}
M_X(t) & = E(e^{tX}) = 1 + t E(X) + \frac{t^2 E(X^2)}{2!} + \frac{t^3E(X^3)}{3!} + \cdots + \frac{t^nE(X^n)}{n!} + \cdots \\
& = 1 + t m_1 + \frac{t^2m_2}{2!} + \cdots,
\end{align}
where $m_n$ is the $n$th moment. Differentiating $M_X(t)$ $i$ times with respect to $t$ and setting $t=0$, we obtain the $i$th moment about the origin, $m_i$.
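
The sketch below illustrates both readings of the expansion for the standard normal MGF $e^{t^2/2}$ given earlier in these notes: the moments are recovered once from the Taylor coefficients and once from derivatives at $0$. The truncation at $t^6$ is an arbitrary choice for the illustration.

```python
import sympy as sp

t = sp.symbols('t', real=True)

# MGF of a standard normal random variable, as given in these notes.
M = sp.exp(t**2 / 2)

# Taylor polynomial of M around t = 0, keeping terms up to t^6.
poly = sp.series(M, t, 0, 7).removeO()

# The coefficient of t^n is m_n / n!, so m_n = n! * coefficient.
from_series = [poly.coeff(t, n) * sp.factorial(n) for n in range(7)]

# The same moments from repeated differentiation at t = 0.
from_derivatives = [sp.diff(M, t, n).subs(t, 0) for n in range(7)]

print(from_series)
print(from_derivatives)
```

The two lists should coincide, with all odd moments equal to zero because the standard normal density is symmetric about $0$.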

Accordingly, the $k$th moment of a random variable $X$ with density function $f(x)$ (written $m_k$ above) is
$$\mu_k = E[X^k] = \int_{-\infty}^\infty x^k f(x) dx$$

So the $k$th moment of a random variable $X$ is just __the expectation of the random variable $X^k$__.
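
As a small illustration of this integral, the sketch below computes the first few raw moments of a standard normal random variable directly from its density. The choice of the standard normal and of $k = 0, \dots, 4$ is only for illustration.

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Density of the standard normal distribution.
f = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)

# k-th raw moment as the integral of x^k f(x) over the real line.
moments = [
    sp.simplify(sp.integrate(x**k * f, (x, -sp.oo, sp.oo)))
    for k in range(5)
]

print(moments)
```

These values agree with what differentiating the MGF $e^{t^2/2}$ at $t=0$ gives, so the integral and the MGF describe the same sequence of moments.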

__Remark__: To gain intuition for moment generating functions, one can always rely on the Taylor series expansion around the point $0$. We want a function (our moment generating function) with the following properties:

* It reflects an infinite sum of weighted raw moments.
* It encodes a countable sequence of numbers or events, since all measurements have to be countable (or discrete).
* It maps to a well-behaved function (such as a continuous function, or even one with higher-order derivatives, $C^k$).
* The universe might be continuous, infinite, and uncountable, but our lives are discrete, finite, and countable.
* Moments refer to how probability mass is distributed around the 'central point' (or initial moment $t=0$).
* If $e^{tx}$ is the measurement in the _time_ dimension, then moments can be regarded as the measurement in the _space_ dimension.

### From 0th moment to kth moment 

Since $x^0 = 1$ for any number $x$, the zeroth raw, central, and standardized moments are all
$$\mu_0 = E[X^0] = \int_{-\infty}^\infty f(x) dx = 1,$$
which is just the integral of the _probability density function_ $f(x)$.

The first raw moment is 
$$\mu_1 = E[X] = \int_{-\infty}^\infty x f(x) dx$$

The second raw moment is
$$\mu_2 = E[X^2] = \int_{-\infty}^\infty x^2 f(x) dx$$

However, we are usually more interested in the _second central moment_ than in the second raw moment. It is called the variance (here $\mu_x = E[X]$ is the mean):
$$V[X] = \int_{-\infty}^\infty (x- \mu_x)^2 f(x) dx$$

The variance can also be computed from the raw moments:
$$V(X) = E(X^2) - (EX)^2$$
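
To check that the two expressions for the variance agree, here is a minimal sketch with $X \sim \text{Uniform}(0, 1)$. The uniform distribution is an assumed example chosen because its integrals are easy; it does not appear elsewhere in these notes.

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Assumed example: X ~ Uniform(0, 1), with density f(x) = 1 on [0, 1].
f = sp.Integer(1)
support = (x, 0, 1)

mean = sp.integrate(x * f, support)       # first raw moment
second = sp.integrate(x**2 * f, support)  # second raw moment

# Variance as the second central moment.
var_central = sp.integrate((x - mean)**2 * f, support)

# Variance from the raw moments, E(X^2) - (EX)^2.
var_raw = second - mean**2

print(mean, second, var_central, var_raw)
```

Both `var_central` and `var_raw` should come out to the same value, which is the identity stated above.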

### Law of the unconscious statistician (LOTUS)

In probability theory and statistics, the law of the unconscious statistician (LOTUS) is a theorem used to calculate the expected value of a function $g(X)$ of a random variable $X$ when one knows the probability distribution of $X$ but not the distribution of $g(X)$:
$$E[g(X)] = \int_{-\infty}^\infty g(x) f_X(x) dx$$
Applying LOTUS to $g(x)^k$ gives every moment of $g(X)$,
$$E[g(X)^k] = \int_{-\infty}^\infty g(x)^k f_X(x) dx,$$
and, by the _uniqueness theorem for moment generating functions_, if the MGF of $g(X)$ exists then these moments determine its distribution.
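
A minimal sketch of LOTUS in action, under the assumed example $X \sim \text{Uniform}(0, 1)$ and $g(x) = x^2$ (neither is from these notes). It computes $E[g(X)]$ once with the density of $X$ and once the long way, by first deriving the density of $Y = X^2$, which is $1/(2\sqrt{y})$ on $(0, 1)$.

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# Assumed example: X ~ Uniform(0, 1) and g(x) = x**2.
f_X = sp.Integer(1)
g = x**2

# LOTUS: integrate g(x) against the density of X.
lotus = sp.integrate(g * f_X, (x, 0, 1))

# Long route: P(Y <= y) = sqrt(y), so Y = X**2 has density 1/(2*sqrt(y)) on (0, 1).
f_Y = 1 / (2 * sp.sqrt(y))
direct = sp.integrate(y * f_Y, (y, 0, 1))

print(lotus, direct)
```

The two results should match; LOTUS simply lets us skip the step of working out `f_Y`.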





### Summary

<img src="images/moments.png">


```python
import sympy as S
from sympy import stats

p, t = S.symbols('p t', positive=True)
x = stats.Binomial('x', 10, p)  # binomial distribution with n = 10 trials
mgf = stats.E(S.exp(t*x))       # moment generating function E(e^{tx})
print(S.simplify(stats.E(x)))          # mean
print(S.simplify(stats.moment(x, 1)))  # first raw moment, same as the mean
print(S.simplify(stats.moment(x, 2)))  # second raw moment
```

Output:

```
10*p
10*p
10*p*(9*p + 1)
```


### References

* Understanding Moments by [Gregory Gundersen](https://gregorygundersen.com/blog/2020/04/11/moments/)