# Dynamic systems with unlimited memory

In this post we build on our preceding discussion of dynamic systems and discuss dynamic systems with unlimited memory.  This type of dynamic system is used throughout the sciences and engineering, in particular in the area of *automatic control*.  In machine learning such dynamic system models are the bread and butter of so-called *Recurrent Neural Networks*.

As in our prior posts (e.g., on [Markov chains](https://blog.dgsix.com/posts/markov_chains/markov_chains.html), [recurrence relations](https://blog.dgsix.com/posts/recurrence_relations/recurrence_relations.html), and basic [dynamic systems with *limited* memory](https://blog.dgsix.com/posts/dynamic_systems_limited_memory/dynamic_systems_limited_memory.html)) here we will deal with defining dynamic systems over a generic ordered input sequence $x_1,\,x_2,\,...,x_P$.

You can skip around this document to particular subsections via the hyperlinks below.

-  [Computing a running sum the smart way](#running-sum)
-  [A general definition](#definition)
-  [A whole bunch of examples](#examples)
-  [What does "unlimited" really mean?](#unlimited-meaning)

In [1]:
## this code cell will not be shown in the HTML version of this notebook
# imports from custom library for animations #
from library import exponential_average_animator
from library import history_animators
from library import plot_input_with_hidden_together
from library import plot_input_with_hidden_separate
from library import plot_hidden_histogram

# import standard libs
import numpy as np
import pandas as pd
from IPython.display import clear_output

# path to data
datapath = '../../datasets/plain_timeseries/'

# This is needed to compensate for matplotlib notebook's tendency to blow up images when plotted inline
%matplotlib notebook
from matplotlib import rcParams
rcParams['figure.autolayout'] = True

%load_ext autoreload
%autoreload 2

<a id='running-sum'></a>
## Computing a running sum the smart way

Here we will tease out the basic idea behind a dynamic system with unlimited memory by exploring a super simple example: computing a *running sum* of input numbers *on the fly* - i.e., as they arrive.  So suppose our input sequence arrives - in order - one element at a time.  That is, $x_1$ arrives first, then $x_2$, then $x_3$, and so on.  To compute a "running sum" of these numbers we sum up all the elements received so far each time a new element arrives.  That is, when the $p^{th}$ number arrives we want to compute the sum $h_p$ of the numbers $x_1,\,x_2,\,...,x_p$.

A lazy approach to doing this would be to just sum up our numbers over and over again as each new element arrives, as shown below.

<center> <h3>The lazy way to compute a running sum</h3> </center>

\begin{array}
\
\text{sum of the first $1$ elements:} \,\,\,\,\,\,\,\,\,\,\,\,\, h_1 = x_1 \\
\text{sum of the first $2$ elements:} \,\,\,\,\,\,\,\,\,\,\,\,\, h_2 = x_1 + x_2 \\
\text{sum of the first $3$ elements:} \,\,\,\,\,\,\,\,\,\,\,\,\, h_3 = x_1 + x_2 + x_3 \\
\text{sum of the first $4$ elements:} \,\,\,\,\,\,\,\,\,\,\,\,\, h_4 = x_1 + x_2 + x_3 + x_4 \\
\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\
\text{sum of the first $p$ elements:} \,\,\,\,\,\,\,\,\,\,\,\,\, h_{p} = x_1 + x_2 + x_3 + x_4 + \cdots  + x_p \\
\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\
\end{array}

But this is clearly very wasteful - both computationally and in terms of storage (to compute $h_p$ we need to store every single number from $x_1$ to $x_p$).  For example, when computing the third sum $h_3 = x_1 + x_2 + x_3$ we waste computation, since we have already computed the sum $h_2 = x_1 + x_2$ previously.  A more efficient *recursive* way of computing the sum $h_3$ would instead be $h_3 = h_2 + x_3$.  This recursion then carries over to the next running sum $h_4$ as well: instead of computing $h_4 = x_1 + x_2 + x_3 + x_4$ we can instead re-use the work we did to compute $h_3 = x_1 + x_2 + x_3$ previously and compute the sum simply as $h_4 = h_3 + x_4$.  This recursion holds at each subsequent level of computation, as shown below.

<center> <h3>The right way to compute a running sum</h3> </center>

\begin{array}
\
\text{sum of the first $1$ elements:} \,\,\,\,\,\,\,\,\,\,\,\,\, h_1 = x_1 \\
\text{sum of the first $2$ elements:} \,\,\,\,\,\,\,\,\,\,\,\,\, h_2 = h_1 + x_2 \\
\text{sum of the first $3$ elements:} \,\,\,\,\,\,\,\,\,\,\,\,\, h_3 = h_2 + x_3 \\
\text{sum of the first $4$ elements:} \,\,\,\,\,\,\,\,\,\,\,\,\, h_4 = h_3 + x_4 \\
\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\
\text{sum of the first $p$ elements:} \,\,\,\,\,\,\,\,\,\,\,\,\, h_{p} = h_{p-1} + x_p \\
\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\
\end{array}

As we can see, our general recursion $h_p = h_{p-1} + x_p$ is a very efficient way to compute running sums: not only do we save immense computation, but only *two* numbers need ever be stored in memory (as opposed to $p$ with the lazy way), namely $h_{p-1}$ and $x_p$.
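To make the comparison concrete, here is a minimal Python sketch of both approaches.  The function names `running_sum_lazy` and `running_sum_recursive` are our own illustrative choices, and starting the recursion from a sum of zero before any input arrives is an assumption that reproduces $h_1 = x_1$.

```python
def running_sum_lazy(x):
    """Recompute every running sum from scratch (re-adds old elements each time)."""
    return [sum(x[:p + 1]) for p in range(len(x))]


def running_sum_recursive(x):
    """Compute each running sum from the previous one: h_p = h_{p-1} + x_p."""
    h = []
    h_prev = 0  # assumed starting value: the sum before any input has arrived
    for x_p in x:
        h_prev = h_prev + x_p  # only h_{p-1} and x_p are needed here
        h.append(h_prev)
    return h


x = [3, 1, 4, 1, 5]
print(running_sum_lazy(x))       # [3, 4, 8, 9, 14]
print(running_sum_recursive(x))  # [3, 4, 8, 9, 14]
```

Both functions return the same sequence of sums, but the recursive version touches each input exactly once.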

This recursive manner of computing a running sum is a simple example of a *dynamic system with unlimited memory*.  Here the memory of this system is deemed "unlimited" since - at every instance - the value $h_p$ captures something about (here, the *sum* of) *every value $x_1,\,x_2,\,...,x_p$* as

\begin{equation}
h_p = h_{p-1} + x_p = x_1 + x_2 + \cdots + x_p.
\end{equation}

In the jargon of dynamic systems and machine learning the update $h_p$ is often called a *hidden state* of the system.  It might be more aptly referred to as a "summarizer" or "accumulator" since it *summarizes* some aspect of the input sequence $x_1,\,x_2,\,...,x_p$ - in this case by literally *summing* them up.

Note how this differs from a [dynamic system with *limited* memory](https://blog.dgsix.com/posts/dynamic_systems_limited_memory/dynamic_systems_limited_memory.html) - here we have no "window" defining a subset of input values used in computing $h_p$.

<a id='definition'></a>
## A general definition

General dynamic systems with unlimited memory look something like the recursive formula for a running sum given in Equation (1), only instead of a simple sum any functional form can be used as

\begin{equation}
h_p = f\left(h_{p-1},x_p \right).
\end{equation}

Regardless of the function chosen (as we show [formally below](#unlimited-meaning)), the hidden state $h_p$ of such a system *always* summarizes the entire input sequence $x_1,\,x_2,\,...,x_p$.  As we will see via a range of examples below, the facet(s) of the input sequence summarized in $h_p$ depends entirely on how the function $f$ is chosen (or - more generally - *learned*).
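As a rough sketch of this definition, the helper below (a hypothetical `run_system`, not part of any library) steps through an input sequence and applies a user-supplied update function $f$.  We assume the initial condition $h_1 = x_1$, which matches the running sum above; other initializations are possible.

```python
def run_system(f, x):
    """Return hidden states h_1, ..., h_P of h_p = f(h_{p-1}, x_p), assuming h_1 = x_1."""
    h = [x[0]]
    for x_p in x[1:]:
        h.append(f(h[-1], x_p))
    return h


x = [2, -1, 5, 3]

# running sum: f(h, x) = h + x
print(run_system(lambda h_prev, x_p: h_prev + x_p, x))  # [2, 1, 6, 9]

# running maximum: f(h, x) = max(h, x)
print(run_system(max, x))  # [2, 2, 5, 5]
```

Swapping in a different $f$ changes what the hidden state summarizes while the stepping logic stays the same.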

Notice that, strictly in terms of the formulae, when we compare this to a general order $D=1$ [dynamic system with *limited* memory](https://blog.dgsix.com/posts/dynamic_systems_limited_memory/dynamic_systems_limited_memory.html)

\begin{equation}
h_p = f\left(x_p\right)
\end{equation}

the only difference lies in the latter's lack of recursion on the hidden state $h_p$.  It is this recursion that gives the former "unlimited" memory, and the latter a "limited"  memory.

<a id='examples'></a>
## A whole bunch of examples

Here we describe a range of examples of dynamic systems with unlimited memory.  Some of these - particularly the *exponential average*, *running maximum*, and *running histogram* examples - have natural analogs in the [limited memory case](https://blog.dgsix.com/posts/dynamic_systems_limited_memory/dynamic_systems_limited_memory.html).

#### <span style="color:#a50e3e;">Example 1: </span>  Running mean 

Instead of computing a running sum, say we wanted to compute a running *mean* of our input numbers.  To do this the "lazy way" we would literally average all $p$ numbers when the input $x_p$ arrives as 

\begin{equation}
h_{p} = \frac{x_1 + x_2 + \cdots + x_{p-1} +  x_p}{p}.
\end{equation}

This approach - of course - would suffer the same sort of computational and storage issues described with the running sum above.  But with a little re-arranging of this formula

\begin{equation}
h_{p} = \frac{p-1}{p}\frac{x_1 + x_2 + \cdots + x_{p-1}}{p-1} + \frac{1}{p}x_p = \frac{p-1}{p}h_{p-1} + \frac{1}{p}x_p
\end{equation}

we can determine a hidden state recursion for the running mean as $h_p = \frac{p-1}{p}h_{p-1} + \frac{1}{p}x_p$.

Computing the running mean in this way solves both the computation and storage problems, and is another example of a dynamic system with unlimited memory.  Here the hidden state $h_p$ *summarizes* $x_1,\,x_2,\,...,x_p$ by (efficiently) computing its *mean*.
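A minimal sketch of this recursion in Python is given below.  The function name `running_mean` is our own, and the starting value of zero is an assumption; it is harmless because at $p = 1$ the weight $\frac{p-1}{p}$ on the previous state is zero.

```python
def running_mean(x):
    """Running mean via h_p = ((p - 1) / p) * h_{p-1} + (1 / p) * x_p."""
    h = []
    h_prev = 0.0  # assumed starting value; it receives zero weight when p = 1
    for p, x_p in enumerate(x, start=1):
        h_prev = (p - 1) / p * h_prev + x_p / p
        h.append(h_prev)
    return h


x = [2.0, 4.0, 6.0, 8.0]
h = running_mean(x)

# compare the final hidden state with the "lazy" mean of all inputs
print(h[-1], sum(x) / len(x))
```

Up to floating point rounding, each $h_p$ agrees with the mean of the first $p$ inputs computed the lazy way.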

#### <span style="color:#a50e3e;">Example 2: </span>  Exponential average

A basic but very popular generalization of the running mean $h_p = \frac{p-1}{p}h_{p-1} + \frac{1}{p}x_p$ is called the *exponential average*

\begin{equation}
h_p = \alpha h_{p-1} + (1 - \alpha) x_p
\end{equation}

$\,\,\,$ where $0 \leq \alpha \leq 1$.  This is a popular time series *smoother* (analogous to the [moving average](https://blog.dgsix.com/posts/moving_averages/Moving_averages.html) only with unlimited memory).  It is also popularly used in [momentum accelerated gradient descent](https://jermwatt.github.io/machine_learning_refined/notes/3_First_order_methods/3_8_Momentum.html).  It is called an 'exponential average' because if one "rolls back" the recursion on $h_p$ one can see that $h_p$ *summarizes* the input $x_1,\,x_2,\,...,x_p$ as an <a href="https://en.wikipedia.org/wiki/Exponential_smoothing" target="_blank">exponential average</a>.
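To see where the exponents come from, suppose (as an assumption, since the initial condition is a free choice) that we start the recursion at $h_1 = x_1$.  Rolling the recursion back then gives $h_p = \alpha^{p-1}x_1 + (1-\alpha)\sum_{j=2}^{p}\alpha^{p-j}x_j$, so older inputs are weighted by higher powers of $\alpha$.  The sketch below, with illustrative function names of our own, checks numerically that the recursive and rolled back forms agree.

```python
def exp_average(x, alpha):
    """Hidden states of h_p = alpha * h_{p-1} + (1 - alpha) * x_p, assuming h_1 = x_1."""
    h = [x[0]]
    for x_p in x[1:]:
        h.append(alpha * h[-1] + (1 - alpha) * x_p)
    return h


def exp_average_rolled_back(x, alpha):
    """Final state h_P written directly as a weighted sum of all inputs."""
    P = len(x)
    total = alpha ** (P - 1) * x[0]
    for j in range(2, P + 1):
        total += (1 - alpha) * alpha ** (P - j) * x[j - 1]
    return total


x = [1.0, 3.0, 2.0, 5.0, 4.0]
alpha = 0.9
print(exp_average(x, alpha)[-1])
print(exp_average_rolled_back(x, alpha))
```

The two printed values should match up to floating point rounding.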

Below we animate the production of an exponential average (in orange) for a time series input (in black).  Here $\alpha$ has been set to $\alpha = 0.9$.  

In [10]:
## This code cell will not be shown in the HTML version of this notebook
# load in data
csvname = datapath + 'ford_data.csv'
data = pd.read_csv(csvname)
x = np.array(data['Close'])    # date: 1980 to 2017

# exponential average function
def exponential_average(x,alpha):
    # initialize hidden state with the first input, h_1 = x_1
    h = [x[0]]
    for p in range(1,len(x)):
        # get next element of input series
        x_p = x[p]
        
        # make next hidden state
        h_p = alpha*h[-1] + (1 - alpha)*x_p
        h.append(h_p)
    return np.array(h)

# produce exponential average time series
alpha = 0.9
h = exponential_average(x,alpha)

# run animator
demo = exponential_average_animator.Visualizer()
demo.animate_exponential_ave(x,h,savepath='videos/animation_1.mp4')
clear_output()

In [34]:
## This code cell will not be shown in the HTML version of this notebook
from IPython.display import HTML
HTML("""
<video width="1000" height="400" controls loop>
  <source src="videos/animation_1.mp4" type="video/mp4">
  </video>
""")

#### <span style="color:#a50e3e;">Example 3. </span>  The running Riemann sum 

In the instance that $x_1,\,x_2,\,...,x_p$ are $p$ ordered evaluations of a function spaced $\frac{1}{T}$ apart, a slight adjustment to the running sum gives an approximation to the one-dimensional integral or 'area under the curve', known as a *Riemann sum*.  As illustrated in the figure below the *Riemann sum* approximates the area under a curve by a series of equally spaced rectangles whose heights are equally spaced evaluations of the function.

<figure>
<p>
<img src= 'images/riemann.png' width="70%" height="70%" alt=""/>
</p>
<figcaption> 
</figcaption>
</figure>

The Riemann sum of a function up to the $p^{th}$ evaluation $x_p$ is just the sum of the area of the rectangles defined by it and its predecessors, that is 

\begin{equation}
h_{p} = \frac{1}{T}x_1 + \frac{1}{T}x_2 + \cdots + \frac{1}{T}x_{p-1} + \frac{1}{T}x_{p}
\end{equation}

which - like the running sum (here we are just multiplying each term by $\frac{1}{T}$) - can be defined in terms of its predecessor simply as 

\begin{equation}
\begin{aligned}
h_{p} &= \left(\frac{1}{T}x_1 + \frac{1}{T}x_2 + \cdots + \frac{1}{T}x_{p-1}\right) + \frac{1}{T}x_{p}  \\
&= h_{p-1} + \frac{1}{T}x_{p}.
\end{aligned}
\end{equation}

Here the state variable $h_{p}$ summarizes the input from $x_1$ through $x_{p}$ in that it is precisely the Riemann sum of the rectangles with these heights.
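As a small illustration, the sketch below applies this recursion to samples of a test function of our own choosing, $g(t) = 2t$ on the interval $[0,1]$, evaluated at the points $t = \frac{1}{T}, \frac{2}{T}, \ldots, 1$.  The function name `running_riemann_sum`, the test function, and the value $T = 1000$ are all illustrative assumptions.

```python
def running_riemann_sum(x, T):
    """Running Riemann sum via h_p = h_{p-1} + (1 / T) * x_p."""
    h = []
    h_prev = 0.0  # assumed starting value: zero area before any input arrives
    for x_p in x:
        h_prev = h_prev + x_p / T
        h.append(h_prev)
    return h


T = 1000
x = [2 * (p / T) for p in range(1, T + 1)]  # g(t) = 2t sampled 1/T apart
h = running_riemann_sum(x, T)

# the exact area under g(t) = 2t on [0, 1] is 1; the final state should be close
print(h[-1])
```

Each intermediate state $h_p$ approximates the area under the curve up to the $p^{th}$ sample, so the whole list traces out a running integral.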

#### <span style="color:#a50e3e;">Example 4: </span>  Running maximum 

We can efficiently compute the maximum of an input series on the fly via the recursion

\begin{equation}
h_p = \text{maximum}\left(h_{p-1},x_p\right).
\end{equation}

Here the hidden state $h_p$ *summarizes* the input $x_1,\,x_2,\,...,x_p$ by its *maximum value*.

Below we show an example with an input series (in blue), and its corresponding running maximum in dark orange.

In [19]:
## This code cell will not be shown in the HTML version of this notebook
# an example input sequence of ordered data 
x = []
for t in range(50):
    x_t = 0.035*t*np.sin(0.5*t)
    x.append(x_t)  

# running maximum
h = [x[0]]
for p in range(1,len(x)):
    # get previous hidden state (the running max so far) and current point
    h_prev = h[-1]
    x_p = x[p]

    # make next hidden state
    h_p = np.maximum(h_prev,x_p)
    h.append(h_p)
    
plotter = plot_input_with_hidden_together.Plotter(hidden_name = 'running max')
animator = history_animators.Animator()
animator.animate_plot(plotter,x,h,num_frames = 100,savepath='videos/animation_2.mp4',fps=15)

In [33]:
## This code cell will not be shown in the HTML version of this notebook
from IPython.display import HTML
HTML("""
<video width="1000" height="400" controls loop>
  <source src="videos/animation_2.mp4" type="video/mp4">
  </video>
""")

#### <span style="color:#a50e3e;">Example 5. </span> Running count of zero-crossings

Many cheap guitar tuners work by feeding in an audio signal - which consists of a sine wave or a sum of sine waves - and determining its pitch by counting the number of times the signal crosses zero over a short range of its input.  The process of counting the number of zero crossings of a sine wave can be easily modeled as a dynamic system with unlimited memory.  For a centered and digitized sine wave like the one shown below in blue, we simply scan through the input sequence two points at a time looking for places where $x_{p-1} < 0$ and $x_{p} > 0$, or vice versa.  Hence a dynamic system can be formed where $h_{p}$ is a running count of the number of zero crossings of the series as

\begin{equation}
h_{p} = h_{p-1} + \mathcal{I}_{0}\left(x_{p},x_{p-1}\right)
\end{equation}

where $\mathcal{I}_{0}$ is a simple indicator function that equals $1$ if the two points $x_{p-1}$ and $x_{p}$ straddle $0$, and is equal to zero otherwise. 
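Note that this update uses the previous input $x_{p-1}$ in addition to $h_{p-1}$ and $x_p$.  One possible way to write it in the form $h_p = f\left(h_{p-1},x_p\right)$ is to let the hidden state carry a pair: the count so far and the most recent input.  The sketch below takes this approach; the name `zero_cross_update` and the pair representation are our own illustrative assumptions.

```python
def zero_cross_update(state, x_p):
    """One step of the system; state is the pair (count so far, previous input)."""
    count, x_prev = state
    # the two points straddle zero if they have strictly opposite signs
    crossed = (x_prev < 0 and x_p > 0) or (x_prev > 0 and x_p < 0)
    return (count + int(crossed), x_p)


x = [1.0, 0.5, -0.5, -1.0, 0.25, 0.75, -0.1]

state = (0, x[0])  # assumed initial condition: no crossings seen yet
counts = [state[0]]
for x_p in x[1:]:
    state = zero_cross_update(state, x_p)
    counts.append(state[0])

print(counts)  # [0, 0, 1, 1, 2, 2, 3]
```

With this choice each update needs only the previous state and the newly arrived input.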

Below we show an example with an input series (in blue), and its corresponding running number of zero-crossings in dark orange.

In [25]:
## This code cell will not be shown in the HTML version of this notebook
# an example input sequence of ordered data 
x = []
for t in range(50):
    x_t = np.sin(0.5*t)
    x.append(x_t)  

# zero-crossing indicator
def zero_cross_counter(x_t,x_t_minus_1):
    # determine if a zero crossing has occurred (the two points straddle zero)
    cross = 0
    if (x_t_minus_1 < 0 and x_t > 0) or (x_t_minus_1 > 0 and x_t < 0):
        cross = 1
    return cross

# running count of zero crossings, starting from a count of zero
h = [0]
for t in range(1,len(x)):
    # get previous count and the two most recent points
    h_t = h[-1]
    x_t_minus_1 = x[t-1]
    x_t = x[t]
    
    # make next element
    cross = zero_cross_counter(x_t,x_t_minus_1)
    h.append(h_t + cross)
    
plotter = plot_input_with_hidden_separate.Plotter()
animator = history_animators.Animator()
animator.animate_plot(plotter,x,h,num_frames = 100,savepath='videos/animation_3.mp4',fps=10)

In [32]:
## This code cell will not be shown in the HTML version of this notebook
from IPython.display import HTML
HTML("""
<video width="1000" height="400" controls loop>
  <source src="videos/animation_3.mp4" type="video/mp4">
  </video>
""")

#### <span style="color:#a50e3e;">Example 6. </span>  Running normalized histogram

If one is willing to discretize the range of input values (to create a set of bins) we can design $f$ to accumulate an approximate distribution or *histogram* of values in an input series.  In particular, below we animate an example of a running average *normalized histogram* (in orange) of an input series (in blue).

In [30]:
## This code cell will not be shown in the HTML version of this notebook
from collections import Counter
import numpy as np

# an example input sequence of ordered data 
x = []
for t in range(50):
    x_t = np.sin(t) + t*0.2
    x.append(x_t)  
    
# histogram history
def update_histogram(h_t,x_t,alpha):
    # update h
    for key in h_t.keys():
        h_t[key]=h_t[key]*(1 - alpha)
    
    # round to the nearest tenth (one decimal place) to find the bin
    x_t = np.round(x_t,1)
    
    # ceiling / floor
    if x_t < 0:
        x_t = 0
    if x_t > 10:
        x_t = 10
    
    # one-hot encode
    h_t[x_t] += alpha

    return h_t

# initialize hidden (histogram) state
bins = np.unique(np.array([np.round(a,1) for a in np.linspace(0,10,10000)]))
h_t = {a:0 for a in bins}

# update hidden state
import copy
h_all = [copy.deepcopy(h_t)]
n = 1
for x_t in x:
    alpha = 0.1
    h_t = update_histogram(h_t,x_t,alpha)
    h_all.append(copy.deepcopy(h_t))
    n+=1

animator = history_animators.Animator()
plotter = plot_hidden_histogram.Plotter()
animator.animate_plot(plotter,x,h_all,num_frames = 100,savepath='videos/animation_4.mp4',fps=10)

In [35]:
## This code cell will not be shown in the HTML version of this notebook
from IPython.display import HTML
HTML("""
<video width="1000" height="400" controls loop>
  <source src="videos/animation_4.mp4" type="video/mp4">
  </video>
""")

<a id='unlimited-meaning'></a>
## What does "unlimited" really mean?

In each of the examples above we saw how the state variable $h_{p}$ *provides a summary of all preceding input $x_1$ through $x_{p}$*.  We can see that this is true *for every dynamic system with unlimited memory* by 'rolling back' the general update step.  If we do so one time - plugging the formula for the previous state $h_{p-1} = f\left(h_{p-2},x_{p-1}\right)$ into the formula $h_{p} = f\left(h_{p-1},x_{p}\right)$ - we can see dependence on both $x_{p}$ and $x_{p-1}$ 

\begin{equation}
h_{p} = f\left(f\left(h_{p-2},x_{p-1}\right),x_{p}\right)
\end{equation}

Continuing in this fashion we can 'roll back' all the way to $h_1$ 

\begin{equation}
h_{p} = f\left(f\left(f\left(\cdots f\left(h_{1},x_{2}\right),x_3\right)\cdots,x_{p-1}\right),x_{p}\right)
\end{equation}

which exposes the fact that $h_{p}$ is dependent on all prior values $x_2$ to $x_{p}$, and on $x_1$ as well if we simply set the initial condition $h_1 = x_1$.  In general then, when we say that '$h_{p}$ provides a summary of all preceding input $x_1$ through $x_{p}$' we mean exactly the statement above.  Another common way of saying this is that such a system has a complete 'memory' of all input preceding it.
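The rolled back form is exactly a left fold of $f$ over the input sequence, which in Python can be sketched with `functools.reduce`.  The update function `f` below (a decaying sum) is a hypothetical choice for illustration only, and we assume the initial condition $h_1 = x_1$.

```python
from functools import reduce


def f(h_prev, x_p):
    """A hypothetical update function chosen only for illustration."""
    return 0.5 * h_prev + x_p


def final_state_recursive(f, x):
    """Step through h_p = f(h_{p-1}, x_p) one input at a time, with h_1 = x_1."""
    h = x[0]
    for x_p in x[1:]:
        h = f(h, x_p)
    return h


def final_state_rolled_back(f, x):
    """Evaluate f(f(...f(x_1, x_2)..., x_{p-1}), x_p) as a single left fold."""
    return reduce(f, x[1:], x[0])


x = [1.0, 1.0, 1.0, 1.0]
print(final_state_recursive(f, x))    # 1.875
print(final_state_rolled_back(f, x))  # 1.875

# changing only the very first input still changes the final hidden state
print(final_state_rolled_back(f, [9.0, 1.0, 1.0, 1.0]))  # 2.875
```

The last line illustrates the dependence on the earliest input: even though $x_1$ arrived three steps ago, its value still influences the final state.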

How valuable is this "unlimited memory" summarizing ability of the hidden state $h_p$?  As mentioned previously, this *completely* depends on the function $f$ chosen (or - in the case of Recurrent Neural Networks - *learned*).  When $f$ is a simple sum, an average, etc., what is summarized about an input series is not all that distinctive, and thus the fact that these systems have "unlimited memory" is not very valuable.  The more intricate the function $f$ the more interesting (and more useful) the summarizing variable of an unlimited memory dynamic system can be.