## In this quick overview, we look at what an exponentially weighted average is and why it is more efficient than a regular moving average.

![](https://i.ytimg.com/vi/NxTFlzBjS-4/maxresdefault.jpg)

> Exponentially weighted averages are used in several optimization algorithms, such as Adam.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
data = pd.read_csv("../input/weather-dataset-rattle-package/weatherAUS.csv")
data.head()

### We are going to look at Sydney's 2016 temperature data.

In [None]:
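# Parse the Date column so the .dt accessor can be used
data["Date"] = pd.to_datetime(data["Date"])
# Select Sydney's records for 2016 (despite its name, sydney_feb covers the whole year)
sydney_feb = data[(data["Date"].dt.year == 2016) & (data["Location"] == "Sydney")]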


In [None]:

plt.scatter(sydney_feb["Date"],sydney_feb["MaxTemp"])

# Exponentially Weighted Averages


In statistics, a moving average is generally used to smooth out noise in time series data, but it can be helpful in deep learning too: the gradient updates can be smoothed from one iteration to the next.

Just like this:
![](https://eloquentarduino.github.io/wp-content/uploads/2020/04/SGD.jpg)

### Calculating a moving average is inefficient in both memory and time: it has to keep the whole window and recompute its average at every iteration.

### To resolve this, we can use exponentially weighted averages. They are not as precise as the traditional moving average, but they give good estimates and are very efficient.
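The difference in bookkeeping can be sketched as follows. This is a minimal illustration, not the notebook's own code: `moving_average_stream`, `ewa_stream` and the `temps` list are hypothetical names and values chosen for the example. A running sum could reduce the per-step time of the windowed version, but it would still need to keep the last `window` values, whereas the exponentially weighted version keeps a single number.

```python
from collections import deque


def moving_average_stream(values, window):
    """Simple moving average: keeps the last `window` values in memory."""
    buf = deque(maxlen=window)  # oldest value is dropped automatically
    out = []
    for x in values:
        buf.append(x)
        out.append(sum(buf) / len(buf))  # re-averages the buffer every step
    return out


def ewa_stream(values, beta):
    """Exponentially weighted average: keeps only one running value."""
    v = 0.0  # V_0 = 0
    out = []
    for x in values:
        v = beta * v + (1 - beta) * x  # constant work per step
        out.append(v)
    return out


# Hypothetical temperatures, for illustration only
temps = [20.0, 22.0, 21.0, 25.0, 24.0]
sma = moving_average_stream(temps, window=2)
ewa = ewa_stream(temps, beta=0.5)
```

Both functions return one smoothed value per input point, so their outputs can be plotted against the same dates.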

#### To start with, let's say we have:

> $ V_{0} = 0 $

> $ \theta_1, \theta_2, \theta_3, \dots, \theta_n = \text{data points} $

> $ \beta = 0.9$  (We will discuss $\beta$ later.)

#### Furthermore,


> $ V_{1} = \beta V_{0} + (1 - \beta)\theta_1$ <br>
> $ V_{2} = \beta V_{1} + (1 - \beta)\theta_2$ <br>
> $ V_{3} = \beta V_{2} + (1 - \beta)\theta_3$

> And so on...

#### So we can formulate:

# $ V_t = \beta V_{t-1} + (1 - \beta)\theta_t $

## It calculates a decaying average of the past data points:

#### This can be seen by plugging $V_2$ into $V_3$:

 ## $  V_{3} = (1 - \beta)\theta_3 + \beta ((1 - \beta)\theta_2 + \beta V_{1})$
 
Here, $V_1$ can be replaced too.
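Carrying the substitution all the way down (with $V_0 = 0$) gives $V_3 = (1 - \beta)(\theta_3 + \beta\theta_2 + \beta^2\theta_1)$, so each older point is weighted by one more factor of $\beta$. The sketch below checks numerically that the recursive update and this unrolled weighted sum agree; `ewa_recursive`, `ewa_unrolled` and the sample `theta` values are hypothetical names and data used only for illustration.

```python
import numpy as np


def ewa_recursive(theta, beta):
    """Apply V_t = beta * V_{t-1} + (1 - beta) * theta_t, starting from V_0 = 0."""
    v = 0.0
    for x in theta:
        v = beta * v + (1 - beta) * x
    return v


def ewa_unrolled(theta, beta):
    """Same quantity as an explicit weighted sum over all past points."""
    t = len(theta)
    # The newest point gets weight (1 - beta); each step back multiplies by beta
    weights = (1 - beta) * beta ** np.arange(t - 1, -1, -1)
    return float(np.dot(weights, theta))


# Hypothetical data points, for illustration only
theta = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
beta = 0.9
difference = abs(ewa_recursive(theta, beta) - ewa_unrolled(theta, beta))
```

The two results should differ only by floating-point rounding.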

### Here, $ \beta $ is a parameter that controls how quickly the weights of past points decay.

#### The higher $\beta$ is, the longer the decaying window. For example, if $\beta = 0.9$, it roughly averages over the past 10 points.

Let's call the window $ w $:

$ w \approx 1 / (1 - \beta) $

for $ \beta = 0.9 $,<br>
$ w \approx 1 / (1 - 0.9)$ <br>
$ w \approx 10 $

for $ \beta = 0.5 $,<br>
$ w \approx 1 / (1 - 0.5)$ <br>
$ w \approx 2 $

#### For $ \beta = 0.5 $, it roughly averages over the last two points at each iteration.
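The rule of thumb $ w \approx 1 / (1 - \beta) $ can be checked numerically. In the sketch below, `effective_window` is a hypothetical helper, not something defined elsewhere in this notebook. It also prints $\beta^w$, the relative weight of a point $w$ steps back; for $\beta$ close to 1 this is roughly $1/e \approx 0.37$, which is one way to see why points older than $w$ steps contribute little.

```python
import math


def effective_window(beta):
    """Approximate number of past points the average covers: 1 / (1 - beta)."""
    return 1.0 / (1.0 - beta)


for beta in (0.5, 0.9, 0.98):
    w = effective_window(beta)
    # beta ** w is the weight of a point w steps back, relative to the newest point
    print(f"beta={beta}: window ~ {round(w)}, beta**w ~ {beta ** w:.3f}")

one_over_e = 1 / math.e  # reference value for comparison
```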

In [None]:
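def weighted_average(series,beta):
    # v[0] holds the initial value V_0 = 0; it is dropped from the result
    v = [0]
    n = series.shape[0]
    for i in range(n):
        # V_t = beta * V_{t-1} + (1 - beta) * theta_t
        v.append(beta*v[-1] + (1-beta)*series.iloc[i])

    return v[1:]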

In [None]:
fig, ax = plt.subplots(1,3,figsize=(18,4))
beta = [0.9,0.5,0.98]
for i in range(3):
    ax[i].scatter(sydney_feb["Date"],sydney_feb["MaxTemp"])
    a1 = weighted_average(sydney_feb["MaxTemp"],beta[i])
    ax[i].plot(sydney_feb["Date"],a1,c="r")
    ax[i].set_ylim(0,sydney_feb["MaxTemp"].max()+5)
    ax[i].set_title("Beta = {}".format(beta[i]))
    

# Bias correction

### Notice something strange? Yup, the averaged line starts out near 0.

> #### It's because we started from $V_0 = 0$.
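A commonly used fix is to divide each $V_t$ by $1 - \beta^t$, which rescales the early values and tends to 1 as $t$ grows. The sketch below is an assumption about how that could look here, not the notebook author's implementation: `weighted_average_corrected` is a hypothetical function name, and it takes any iterable of numbers rather than a pandas Series.

```python
def weighted_average_corrected(values, beta):
    """Exponentially weighted average with bias correction V_t / (1 - beta**t)."""
    v = 0.0  # V_0 = 0, the source of the early bias
    out = []
    for t, x in enumerate(values, start=1):
        v = beta * v + (1 - beta) * x
        out.append(v / (1 - beta ** t))  # correction factor fades as t grows
    return out


# Hypothetical constant series: the corrected average should stay at the data level
temps = [20.0, 20.0, 20.0]
corrected = weighted_average_corrected(temps, beta=0.9)
```

On a constant series the uncorrected values would start at 2.0 and climb slowly, while the corrected ones stay at about 20 from the first step.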

## #ToDo