## Mean

### Arithmetic mean
Given a bunch of real numbers $x_1, x_2, \dots, x_N$, their *arithmetic mean* is defined as:

\begin{equation}
\bar{x}_A = \frac{1}{N}\sum_{i=1}^{N}x_i.
\end{equation}

`numpy.mean` computes the arithmetic mean:

```python
from __future__ import division  # makes / true division in Python 2; no effect in Python 3
import numpy as np
import scipy.stats as stats

# Make results reproducible
np.random.seed(123)

# Draw N samples from the standard normal distribution.
N = 15
s = np.random.normal(size=N)

xbar_a = sum(s)/N
print("Arithmetic mean: Numpy = {0}, Reproduced = {1}".format(np.mean(s), xbar_a))
```

```
Arithmetic mean: Numpy = -0.204016703602, Reproduced = -0.204016703602
```


### Geometric mean

The *geometric mean* of a bunch of positive numbers $x_1, \dots, x_N$ is defined as

\begin{equation}
\bar{x}_G = (x_1x_2\dots x_N)^{1/N}
\end{equation}

Taking the log of both sides:

\begin{equation}
\log{\bar{x}_G} = \frac{1}{N}\sum_{i=1}^{N}\log{x_i},
\end{equation}

we see that $\log{\bar{x}_G}$ is the arithmetic mean of $\log(x_1), \dots, \log(x_N)$.

`scipy.stats.gmean` computes the geometric mean:

```python
# Draw N positive samples from a lognormal distribution.
s_pos = np.random.lognormal(size=N)

xbar_g = np.prod(s_pos)**(1/N)
print("Geometric mean: Scipy = {0}, Reproduced = {1}".format(stats.gmean(s_pos), xbar_g))
```

```
Geometric mean: Scipy = 1.34103894485, Reproduced = 1.34103894485
```
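The log identity above also suggests a more robust way to compute $\bar{x}_G$: forming the product first can overflow (or underflow) floating point, whereas averaging the logs stays in range. A minimal sketch, using a hypothetical helper `gmean_via_logs` and made-up inputs chosen so that `np.prod` overflows:

```python
import numpy as np

def gmean_via_logs(x):
    """Hypothetical helper: geometric mean as exp(mean(log(x))); assumes all x > 0."""
    x = np.asarray(x, dtype=float)
    return np.exp(np.mean(np.log(x)))

# Made-up data: two large positive values whose product exceeds the float64 range.
big = np.array([1e200, 1e200])

with np.errstate(over="ignore"):               # silence the expected overflow warning
    naive = np.prod(big) ** (1 / len(big))     # the product overflows to inf first

stable = gmean_via_logs(big)                   # works in log space and stays finite

print("Naive product:", naive)
print("Via logs:     ", stable)
```

The naive version returns `inf`, while the log-based version returns a value close to $10^{200}$, which is the true geometric mean of the two inputs.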


The geometric mean of non-negative numbers is always less than or equal to their arithmetic mean. Let's show this for $N=2$:

\begin{align*}
\bar{x}_A^2 - \bar{x}_G^2
    &= \left[\frac{x_1 + x_2}{2}\right]^2 - \left[(x_1x_2)^{1/2}\right]^2 \\
    &= \frac{x_1^2+x_2^2+2x_1x_2-4x_1x_2}{4} \\
    &= \frac{x_1^2+x_2^2-2x_1x_2}{4} \\
    &= \frac{(x_1 - x_2)^2}{4} \\
    &\ge 0,
\end{align*}

which in turn implies $\bar{x}_A \ge \bar{x}_G$, since both $\bar{x}_A$ and $\bar{x}_G$ are non-negative by definition. Equality holds if and only if $x_1 = x_2$, i.e., when all observations are the same.
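The $N=2$ identity derived above can be spot-checked numerically. The helper `am_gm_gap` is hypothetical and exists only for this check; it returns both sides of $\bar{x}_A^2 - \bar{x}_G^2 = (x_1 - x_2)^2/4$ for a few example pairs, including an equal pair where both sides vanish:

```python
import math

def am_gm_gap(x1, x2):
    """Hypothetical helper: return both sides of x_A^2 - x_G^2 = (x1 - x2)^2 / 4 (assumes x1, x2 >= 0)."""
    xbar_a = (x1 + x2) / 2
    xbar_g = math.sqrt(x1 * x2)
    return xbar_a**2 - xbar_g**2, (x1 - x2)**2 / 4

for x1, x2 in [(4.0, 1.0), (0.7, 3.1), (2.5, 2.5)]:
    lhs, rhs = am_gm_gap(x1, x2)
    print("x1={0}, x2={1}: lhs={2:.6f}, rhs={3:.6f}".format(x1, x2, lhs, rhs))
```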

This result also holds for $N>2$ and it has an interesting implication for stock returns. Suppose we are given daily returns $R_1, R_2, \dots, R_T$ of a stock and asked what the "average" daily return is. It may seem natural to just report the arithmetic mean $R_A=(R_1+R_2+\dots+R_T)/T$. Let's see what happens if we apply $R_A$ for $T$ consecutive days to the initial price $S_0$:

\begin{align*}
(1+R_A)^T S_0
    &= \left[1 + \frac{R_1+R_2+\dots+R_T}{T}\right]^TS_0 \\
    &= \left[\frac{(1+R_1)+(1+R_2)+\dots+(1+R_T)}{T}\right]^T S_0 \\
    &\ge \left[\left[(1+R_1)(1+R_2)\dots(1+R_T)\right]^{1/T}\right]^T S_0 \\
    &= (1+R_1)(1+R_2)\dots(1+R_T) S_0 \\
    &= S_T,
\end{align*}

where $S_T$ is the stock price on the last day. The inequality on the third line is the arithmetic-geometric mean inequality applied to $1+R_1, \dots, 1+R_T$, which are non-negative because a price cannot fall below zero. The last line follows from the definition of returns, $R_t = S_t/S_{t-1} - 1$, so the product telescopes to $S_T/S_0$. Therefore, by compounding the arithmetic mean of returns, we overestimate the total return over $T$ days (the two agree only when all daily returns are equal)! It is more appropriate to report the geometric mean $R_G$:

\begin{equation}
R_G \equiv \left[(1+R_1)(1+R_2)\dots(1+R_T)\right]^{1/T} - 1,
\end{equation}

which satisfies $S_T=(1+R_G)^TS_0$.
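To make the overestimate concrete, here is a sketch with two hypothetical daily returns, +50% followed by -50%, and a made-up starting price $S_0 = 100$. The arithmetic mean return is zero, yet the price ends lower:

```python
import numpy as np

# Hypothetical daily returns: +50% on day 1, then -50% on day 2.
R = np.array([0.5, -0.5])
T = len(R)
S0 = 100.0  # made-up initial price

S_T = S0 * np.prod(1 + R)               # actual final price: 100 * 1.5 * 0.5 = 75
R_A = np.mean(R)                        # arithmetic mean return: 0.0
R_G = np.prod(1 + R) ** (1 / T) - 1     # geometric mean return

print("Actual S_T:      ", S_T)
print("Compounding R_A: ", S0 * (1 + R_A) ** T)
print("Compounding R_G: ", S0 * (1 + R_G) ** T)
print("R_A = {0:.4f}, R_G = {1:.4f}".format(R_A, R_G))
```

Compounding $R_A$ gives back $S_0 = 100$ rather than the realized $S_T = 75$, while compounding $R_G \approx -13.4\%$ reproduces $S_T$ (up to rounding).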

### Harmonic mean

The *harmonic mean* of a bunch of positive real numbers $x_1, \dots, x_N$ is defined as

\begin{equation}
\bar{x}_H = \frac{N}{\sum_{i=1}^{N}1/x_i}.
\end{equation}

Inverting both sides:

\begin{equation}
\frac{1}{\bar{x}_H} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{x_i},
\end{equation}

we see that the reciprocal of the harmonic mean is the arithmetic mean of the reciprocals of the observations.

`scipy.stats.hmean` computes the harmonic mean:

```python
xbar_h = N / np.sum(1/s_pos)
print("Harmonic mean: Scipy = {0}, Reproduced = {1}".format(stats.hmean(s_pos), xbar_h))
```

```
Harmonic mean: Scipy = 0.755112784567, Reproduced = 0.755112784567
```


Consider a strategy where we spend \$1000 on shares of a stock every day. The higher the price of the stock, the fewer shares we get, and vice versa. If we do this for $N$ days, how much have we paid for each share on average? Let $S_i$ denote the stock price on the $i$th day. The total number of shares bought is

\begin{equation}
\text{Total shares} = \sum_{i=1}^{N}\frac{1000}{S_i},
\end{equation}

at a total cost of

\begin{equation}
\text{Total cost} = N \times 1000.
\end{equation}

Therefore the average stock price is given by

\begin{equation}
\text{Average stock price}
    = \frac{\text{Total cost}}{\text{Total shares}}
    = \frac{N \times 1000}{\sum_{i=1}^{N}1000/S_i}
    = \frac{N}{\sum_{i=1}^{N}1/S_i},
\end{equation}

which is the harmonic mean of the prices.
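A sketch of the same bookkeeping with three hypothetical daily prices (\$10, \$20 and \$40) and the \$1000 daily budget from the text, compared against `scipy.stats.hmean`:

```python
import numpy as np
from scipy import stats

budget = 1000.0                              # dollars spent each day
prices = np.array([10.0, 20.0, 40.0])        # hypothetical daily prices S_i

shares = budget / prices                     # shares bought each day: 100, 50, 25
total_cost = budget * len(prices)            # 3000 dollars in total
avg_price_paid = total_cost / shares.sum()   # 3000 / 175

print("Average price paid:", avg_price_paid)
print("Harmonic mean:     ", stats.hmean(prices))
print("Arithmetic mean:   ", prices.mean())
```

The average price paid (about 17.14) matches the harmonic mean and sits below the arithmetic mean of the prices (about 23.33), because more shares are bought on the cheap days.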

The harmonic mean is at most the geometric mean. Let's check this for $N=2$:

\begin{align*}
\bar{x}_G^2 - \bar{x}_H^2
    &= \left[(x_1x_2)^{1/2}\right]^2 - \left[\frac{2}{1/x_1 + 1/x_2}\right]^2 \\
    &= x_1x_2 - \left[\frac{2x_1x_2}{x_1 + x_2}\right]^2 \\
    &= \frac{x_1x_2\left[(x_1 + x_2)^2 - 4 x_1x_2\right]}{(x_1 + x_2)^2} \\
    &= \frac{x_1x_2(x_1 - x_2)^2}{(x_1 + x_2)^2} \\
    &\ge 0.
\end{align*}
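This $N=2$ identity can likewise be confirmed on a few example pairs. `gm_hm_gap` is a hypothetical helper that returns both sides of $\bar{x}_G^2 - \bar{x}_H^2 = x_1x_2(x_1-x_2)^2/(x_1+x_2)^2$; comparisons use a tolerance because $\bar{x}_H$ involves floating-point division:

```python
import math

def gm_hm_gap(x1, x2):
    """Hypothetical helper: both sides of x_G^2 - x_H^2 = x1*x2*(x1 - x2)^2 / (x1 + x2)^2 (assumes x1, x2 > 0)."""
    xbar_g = math.sqrt(x1 * x2)
    xbar_h = 2 / (1 / x1 + 1 / x2)
    lhs = xbar_g**2 - xbar_h**2
    rhs = x1 * x2 * (x1 - x2)**2 / (x1 + x2)**2
    return lhs, rhs

for x1, x2 in [(4.0, 1.0), (0.3, 7.0), (3.0, 3.0)]:
    lhs, rhs = gm_hm_gap(x1, x2)
    print("x1={0}, x2={1}: lhs={2:.6f}, rhs={3:.6f}".format(x1, x2, lhs, rhs))
```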

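Since both means are positive, this implies $\bar{x}_G \ge \bar{x}_H$, with equality if and only if $x_1 = x_2$. For general $N$, applying $\bar{x}_A \ge \bar{x}_G$ to the reciprocals $1/x_1, \dots, 1/x_N$ gives $1/\bar{x}_H \ge 1/\bar{x}_G$, so again $\bar{x}_H \le \bar{x}_G$, with equality if and only if $x_1=x_2=\dots=x_N$. To summarize: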

\begin{equation}
\bar{x}_H \le \bar{x}_G \le \bar{x}_A.
\end{equation}
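The chain can also be checked empirically for $N>2$, although a numerical check is of course not a proof. The sketch below draws random positive samples of varying size from a lognormal distribution (the seed and number of trials are arbitrary choices) and counts violations using `scipy.stats.hmean`, `scipy.stats.gmean` and `numpy.mean`; it also confirms that the three means coincide when all observations are equal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, independent of np.random.seed(123) above

violations = 0
for _ in range(1000):
    n = rng.integers(2, 50)                 # sample size between 2 and 49
    x = rng.lognormal(size=n)               # positive observations
    h, g, a = stats.hmean(x), stats.gmean(x), np.mean(x)
    if not (h <= g <= a):
        violations += 1

print("Violations of H <= G <= A:", violations)

# Equal observations: all three means agree (up to floating-point rounding).
x_const = np.full(5, 2.0)
print(stats.hmean(x_const), stats.gmean(x_const), np.mean(x_const))
```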