
The general form of the arithmetic mean is well known from school, where we learnt that the mean $\bar{x}_{n}$ of $n$ numbers $\{x_1, x_2, ..., x_n\}$ is

$\bar{x}_{n} = \frac{x_1 + x_2 + ... + x_n}{n}$.

One problem with this formula shows up when the data keeps arriving. Suppose we have calculated the average of one billion numbers, and then someone gives us a new number and asks for the updated average. Do we have to sum a billion numbers all over again? That would be a waste of time, and fortunately there is a better way.

We can rewrite the equation as

$\bar{x}_{n} = \frac{x_1 + x_2 + ... + x_{n-1}}{n} \times \frac{n-1}{n-1} + \frac{x_n}{n}$,

which leads to the formula

$\bar{x}_{n} = \bar{x}_{n-1} \frac{n-1}{n} + \frac{x_n}{n}$.

The result above is all we wished for, but let's go a little further. Since $\frac{n-1}{n} = 1 - \frac{1}{n}$, we can write it as

$\bar{x}_{n} = \bar{x}_{n-1} + \frac{1}{n}(x_n - \bar{x}_{n-1})$.

We see that the new average is just the previous one, calculated with our first billion numbers, plus a correction term: the difference between the new number and the previous mean, divided by $n$ (here 1,000,000,001). A pretty simple and nice formula!
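
As a quick sanity check of the update rule, here is a minimal sketch with small numbers. The function name `update_mean` is only an illustration, not part of the class defined later in this notebook.

```python
def update_mean(prev_mean: float, n: int, new_value: float) -> float:
    """Return the mean of n values, given the mean of the first n - 1 and the n-th value."""
    return prev_mean + (new_value - prev_mean) / n


# The mean of 1, 2, 3 is 2.0; now a fourth value, 6, arrives.
new_mean = update_mean(2.0, 4, 6.0)
print(new_mean)  # 3.0, the same as (1 + 2 + 3 + 6) / 4
```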

It's amusing that an _online_ calculation in the same spirit as what we did for the arithmetic mean is also available for the generalized mean

$\bar{x}_{n} = \left( \frac{1}{n}\sum_{i=1}^{n} x_{i}^{m} \right)^{(1/m)}$:

$\bar{x}_{n}^{m} = \bar{x}_{n-1}^{m} + \frac{1}{n}(x_n^{m} - \bar{x}_{n-1}^{m})$,

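which covers the harmonic ($m = -1$), quadratic ($m = 2$), and other cases, with the geometric mean recovered in the limit $m \to 0$. Actually, for the more general form of the mean

$\bar{x}_{n} = f^{-1}\left( \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right)$,

we have

$f\left(\bar{x}_{n}\right) = f \left(\bar{x}_{n-1}\right) + \frac{1}{n}\left(f\left(x_n\right) - f\left(\bar{x}_{n-1}\right)\right)$,

where $f(x) = x^m$ gives the generalized mean and $f(x) = \log x$ gives the geometric mean.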
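
The general form can be sketched in a few lines. The class name `OnlineQuasiMean` and its attributes are hypothetical, chosen for this illustration, and the sketch assumes that `f` is invertible on the data (positive inputs for the two examples shown). It keeps the running arithmetic mean of $f(x_i)$ and applies $f^{-1}$ only when the mean is read.

```python
import math
from typing import Callable


class OnlineQuasiMean:
    """Running mean of the form f_inv(average of f(x_i)). Hypothetical helper."""

    def __init__(self, f: Callable[[float], float], f_inv: Callable[[float], float]):
        self.f = f
        self.f_inv = f_inv
        self.n = 0
        self.f_mean = 0.0  # running arithmetic mean of f(x_i)

    def update(self, new_value: float) -> float:
        # Same update as for the arithmetic mean, applied to f(x).
        self.n += 1
        self.f_mean += (self.f(new_value) - self.f_mean) / self.n
        return self.f_inv(self.f_mean)


# Geometric mean: f = log, f_inv = exp.
geometric = OnlineQuasiMean(math.log, math.exp)
for value in [1.0, 2.0, 4.0]:
    geo = geometric.update(value)
print(geo)  # close to 2.0, the cube root of 1 * 2 * 4

# Harmonic mean: f(x) = 1/x, which is its own inverse.
harmonic = OnlineQuasiMean(lambda x: 1 / x, lambda y: 1 / y)
for value in [1.0, 2.0, 4.0]:
    har = harmonic.update(value)
print(har)  # close to 12/7 = 3 / (1 + 1/2 + 1/4)
```

Storing the mean of $f(x_i)$ rather than $\bar{x}_n$ itself is a design choice of this sketch: it avoids one evaluation of $f$ per update.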


Let's move on and do some numerical exercises to check what we've been saying!


**TODO**: TODO the sections below

In [0]:
import numpy as np

In [0]:
class OnlineMean:
  """Running mean, updated one value at a time."""

  def __init__(self, x: float):
    self.n = 1
    self.mean = x

  def reset_params(self, x: float):
    # Restart from a single observation x.
    self.n = 1
    self.mean = x

  def update_arithmetic_mean(self, new_value: float) -> float:
    self.n = self.n + 1
    self.mean = self.mean + (new_value - self.mean)/self.n
    return self.mean

  def update_generalized_mean(self, new_value: float, m: float) -> float:
    # Assumes positive values and m != 0.
    self.n = self.n + 1
    self.mean = (self.mean**m + (new_value**m - self.mean**m)/self.n)**(1/m)
    return self.mean

In [17]:
online_calculator = OnlineMean(1)
numbers_sequence =  [2, 3, 4]

for new_value in numbers_sequence:
  online_calculator.update_arithmetic_mean(new_value)
  print(online_calculator.mean)

1.5
2.0
2.5


In [19]:
(1 + 2 + 3 + 4)/4

2.5

In [18]:
online_calculator = OnlineMean(1)
numbers_sequence =  [2, 3, 4]

for new_value in numbers_sequence:
  online_calculator.update_generalized_mean(new_value, 3)
  print(online_calculator.mean)

1.6509636244473134
2.2894284851066637
2.924017738212866


In [15]:
((1**3 + 2**3 + 3**3 + 4**3)/4)**(1/3)

2.924017738212866
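
The checks above use only four numbers. A slightly larger check, sketched below under a few assumptions (a seed of 0, 1000 uniform samples in $[1, 10]$, $m = 3$, and a relative tolerance of $10^{-9}$, all chosen arbitrarily), compares both online updates with the batch formulas. It uses only the standard library and inlines the update rules so that it runs on its own.

```python
import math
import random
import statistics

rng = random.Random(0)  # arbitrary seed, only for reproducibility
samples = [rng.uniform(1.0, 10.0) for _ in range(1000)]  # positive values only
m = 3

# Online updates, starting from the first sample (n = 1).
mean = samples[0]
power_mean = samples[0]
for n, x in enumerate(samples[1:], start=2):
    mean = mean + (x - mean) / n
    power_mean = (power_mean**m + (x**m - power_mean**m) / n) ** (1 / m)

# Batch formulas over the whole list.
batch_mean = statistics.fmean(samples)
batch_power_mean = statistics.fmean([x**m for x in samples]) ** (1 / m)

# Online and batch results should agree up to floating-point rounding.
print(math.isclose(mean, batch_mean, rel_tol=1e-9))
print(math.isclose(power_mean, batch_power_mean, rel_tol=1e-9))
```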

Non-stationary mean