# Simple control theory

## Introduction

This notebook covers control theory at a very basic level. We will present the basic principles of control theory and a very simple application with a PID controller.
A large part of this notebook was inspired by this page: https://onion.io/2bt-pid-control-python/ and this page: https://jmanton.wordpress.com/2016/08/13/poles-rocs-and-z-transforms/ as well as https://see.stanford.edu/materials/lsoeldsee263/13-lin-sys.pdf. All credit goes to the original authors.

### What is control theory

Control theory is about manipulating a system (sometimes called a process), or a model of a system, through a set of inputs so that the output matches some expected value.
The system can also be subject to arbitrary external inputs that may vary in time (perturbations due to external factors, such as temperature dropping in winter, light fading at night or water flow increasing when it rains).
To help us control the process, we get feedback on what is happening, so that we can compute and apply the best possible response to the current situation. Unfortunately, the sensor that measures the output of the process might be subject to noise.

![title](data/PID_en-1024x364.jpg)

* In practice, the output of the process that we would like to control is called $y(t)$
* The desired output value of the process is called $r(t)$
* The measured output value of the process is called $m(t)$
* The difference between the desired value and the measure is called $e(t)$, i.e. $e(t)=r(t)-m(t)$
* The command sent to the process is called $u(t)$

Control theory is about how to compute the best $u(t)$ for each $t$ so that the outcome of the process $y(t)$ is as close as possible to $r(t)$, the desired value.
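
To make the notation concrete, here is a minimal sketch of a discrete PID loop. The process model `y[n+1] = 0.9*y[n] + 0.1*u[n]`, the gains and the function name `simulate_pid` are all hypothetical choices made for illustration, and the sensor is assumed to be noiseless so that $m(t) = y(t)$.

```python
def simulate_pid(r, kp, ki, kd, n_steps, y0=0.0):
    """Simulate a discrete PID loop on a toy first-order process.

    The process y[n+1] = 0.9*y[n] + 0.1*u[n] is a hypothetical stand-in,
    not a model of any real system.
    """
    y = y0
    integral = 0.0
    previous_error = 0.0
    outputs = []
    for _ in range(n_steps):
        m = y                            # measurement m(t), noiseless here
        e = r - m                        # error e(t) = r(t) - m(t)
        integral += e                    # running sum approximates the integral term
        derivative = e - previous_error  # finite difference approximates the derivative
        u = kp * e + ki * integral + kd * derivative   # command u(t)
        y = 0.9 * y + 0.1 * u            # the toy process reacts to the command
        previous_error = e
        outputs.append(y)
    return outputs


# Constant set point r = 1 with arbitrary illustrative gains
outputs = simulate_pid(r=1.0, kp=2.0, ki=0.5, kd=0.0, n_steps=200)
print(outputs[-1])
```

With these particular gains the loop is stable and the output settles on the set point; other gains or another process model may oscillate or diverge.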


### What control theory is not

The task of "optimal control" can be made arbitrarily complex, for instance when there are multiple inputs and outputs, or when the measure itself is an indirect measurement subject to noise with unusual statistical properties.

Here we will mostly put aside the signal reconstruction problem, as it is the main topic of many of our other notebooks. In particular, we will come back to the statistical aspects of process estimation in the notebook on recursive least squares and Kalman filtering.

Part of control theory is about model estimation. But estimating a model from a set of inputs and outputs, and deciding how to optimally sample the system, is a different problem that is more general than control theory.

Still, many aspects of control theory require evaluating or taking into account a subset of the properties of the system, especially temporal properties: how much does the current input influence the short-term output, or the longer-term output? Is there a time delay in the system during which nothing from the input is reflected in the output?
We will touch upon dynamical system modeling a bit in this intro.

### Control theory and constraints

Another important aspect that makes control theory interesting, even in the case of noiseless measurements, a single input and a single output, is the type of constraint one would like to impose on $y(t)$. In some cases, you want $y(t)$ to reach $r(t)$ in the minimum amount of time.
Sometimes you would like to avoid too large an overshoot while applying corrections. There are numerous parameters:

![title](data/pid_performance-768x336.png)

## Simple dynamical system modeling
(Most of the content here comes from https://jmanton.wordpress.com/2016/08/13/poles-rocs-and-z-transforms/)

In order to play with examples of systems, we will introduce a simple mathematical model for dynamical systems and try to dive into some of the interesting mathematical aspects.

### Step 1: Linear Time Invariant systems

We can consider that a dynamical system has inputs and outputs, and that the inputs can influence the outputs over time.
As we really like digital systems, we will assume that time is quantized into time steps.

Digital systems operate on sequences of numbers: a sequence of numbers goes into a system and a sequence of numbers comes out. For example, if $x[n]$ denotes an input sequence then an example of a discrete-time system is $y[n] = x[n] + \frac{1}{2} x[n-1]$.

How can a specific input sequence be written down, and how can the output sequence be computed?

The direct approach is to specify each individual $x[n]$, for example

\begin{align*}
    x[n] &= 0 \; \text{for} \; n < 0 \\
    x[0] &= 1 \\
    x[1] &= 2 \\
    x[2] &= 3 \\
    \text{and} \; x[n] &= 0 \; \text{for} \; n > 2
\end{align*}

Direct substitution then determines the output:
\begin{align*}
y[0] &= x[0] + \frac{1}{2} x[-1] &= 1 + \frac{1}{2} \cdot 0 &= 1 \\
y[1] &= x[1] + \frac{1}{2} x[0] &= 2 + \frac{1}{2} \cdot 1 &= 2.5\\
& \dots
\end{align*}
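
The direct substitution above can be sketched in a few lines of Python. The dict-based representation (missing indices are treated as zero) and the helper name `fir_output` are illustrative assumptions, not part of the original material.

```python
from fractions import Fraction


def fir_output(x, n):
    """Return y[n] = x[n] + 1/2 * x[n-1] for a sequence stored as a dict.

    Indices missing from the dict are treated as zero.
    """
    return x.get(n, 0) + Fraction(1, 2) * x.get(n - 1, 0)


x = {0: 1, 1: 2, 2: 3}                      # x[n] = 0 everywhere else
y = {n: fir_output(x, n) for n in range(-1, 5)}
for n, value in y.items():
    print(f"y[{n}] = {value}")
```

Exact fractions are used so that the values can be compared directly with the hand computation.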

The system $y[n] = x[n] + \frac{1}{2} x[n-1]$ has a number of nice properties, two of which are linearity and time invariance. This is often expressed in textbooks by saying the system is LTI, where the L in LTI stands for Linear and the TI stands for Time Invariant.
When first learning about discrete-time digital systems, usually attention is restricted to linear time-invariant systems, and indeed, it is for such systems that the Z-transform proves most useful.

#### Impulse response

Any LTI system has the property that the output is the convolution of the input with the impulse response.
In digital signal processing, the Kronecker Delta function can be represented as a sequence or discrete function on $\mathbb{Z}$ (the integers):

\begin{align*}
    \delta [n] = \begin{cases} 0, \forall &n \neq 0\\
                               1, &n=0
                 \end{cases}
\end{align*}
The function is also referred to as an impulse, or unit impulse.

For example, the impulse response of $y[n] = x[n] + \frac{1}{2} x[n-1]$ is found by injecting the input $x[0] = 1$ and $x[n] = 0 \; \text{for} \; |n| \geq 1$. This impulse as input produces the output $y[0] = 1, y[1] = \frac{1}{2}$ and $y[n] = 0 \; \text{for} \; n \notin \{0,1\}$.


It is common to denote the impulse response using the letter $h$, that is, $h[0] = 1, h[1] = \frac{1}{2}$ and $h[n] = 0 \; \text{for} \; n \notin \{0,1\}$.


#### Convolution

We recall that discrete (non circular) convolution between a sequence $x[n]$ and a sequence $h[n]$ reads:
\begin{align*}
  c[n] = (h \star x)[n] = \sum_{k=-\infty}^{+\infty} h[k] x[(n-k)]
\end{align*}
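
As a sanity check, the sum above can be evaluated literally for finitely supported sequences. This is a minimal sketch; the dict representation and the name `convolution_at` are assumptions made for the example.

```python
from fractions import Fraction


def convolution_at(h, x, n):
    """Evaluate c[n] = sum_k h[k] * x[n-k] for finitely supported sequences.

    Sequences are dicts from integer index to value; absent indices are zero,
    so the infinite sum reduces to the indices where h is non-zero.
    """
    return sum(hk * x.get(n - k, 0) for k, hk in h.items())


h = {0: Fraction(1), 1: Fraction(1, 2)}   # impulse response of the example system
x = {0: 1, 1: 2, 2: 3}                    # example input
c = [convolution_at(h, x, n) for n in range(5)]
print(c)
```

The values agree with the output obtained earlier by direct substitution, as expected for an LTI system.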

#### Formal Laurent series as a convenient writing tool

Let’s be honest: writing $h[0] = 1, h[1] = \frac{1}{2}$ and $h[n] = 0 \; \forall n \notin \{0,1\}$ is tedious!
An alternative is to use a formal Laurent series to represent a sequence. Here, “formal” means that we do not try to evaluate this series.
A Laurent series just means instead of limiting ourselves to polynomials in a dummy variable $t$, we allow negative powers of $t$, and we allow an infinite number of non-zero terms. (The variable $t$ has nothing to do with time; it will be replaced in Step 3 by $z^{-1}$ where it will be referred to as the Z-transform. To emphasise, $t$ is just a dummy variable, a placeholder.)

Precisely, the sequence
\begin{align*}
    h[0] = 1, h[1] = \frac{1}{2}, h[n] = 0 \; \forall n \notin \{0,1\}
\end{align*}

can be encoded as the following polynomial:
\begin{align*}
    1 + \frac{1}{2} t
\end{align*}

isn’t that much more convenient to write?

In general, an arbitrary sequence $h[n]$ can be encoded as the formal series $\sum_{n=-\infty}^{\infty} h[n] t^n$. The sole purpose of the $t^n$ term is as a placeholder for placing the coefficient $h[n]$.
In other words, if we are given the formal series $2 t^{-5} + 3 + t^7$ then we immediately know that that is our secret code for the sequence that consists of all zeros except for the -5th term which is 2, the 0th term which is 3 and the 7th term which is 1.

Being able to associate letters with encoded sequences is convenient, hence people will write

\begin{align*}
    x(t) = 2 t^{-5} + 3 + t^7
\end{align*}

to mean

\begin{align*}
    x[-5] = 2, x[0] = 3, x[7] = 1 \; \text{and} \; x[n] = 0 \; \forall n \notin \{-5,0,7\}
\end{align*}

Do not try to evaluate $x(t)$ for some value of $t$ though: that would make no sense at this stage. (Mathematically, we are dealing with a commutative ring that involves an indeterminate $t$. If you are super-curious, see [Formal Laurent series on wikipedia](https://en.wikipedia.org/wiki/Formal_power_series#Formal_Laurent_series))

Now, there is an added benefit!

If the input to our system is

\begin{align*}
x(t) = 1 + 2t + 3t^2
\end{align*}

and the impulse response is

\begin{align*}
h(t) = 1 + \frac{1}{2} t
\end{align*}

then there is a “simple” way to determine the output: just multiply these two series together!
The multiplication rule for formal Laurent series is equivalent to the convolution rule for two sequences. Let’s check:

\begin{align*}
y(t) &= h(t)x(t) \\
&= (1+\frac{1}{2} t) (1 + 2t + 3t^2) \\
&= 1 + 2t + 3t^2 + \frac{1}{2} t + t^2 + \frac{3}{2} t^3 \\
&= 1 + \frac{5}{2} t + 4 t^2 + \frac{3}{2} t^3
\end{align*}

The last line corresponds to $y[0]=1, y[1]=\frac{5}{2}, y[2] = 4, y[3] = \frac{3}{2}$.

This is really the output of our system if the input were $x[n] = 0 \forall n < 0, x[0] = 1, x[1] = 2, x[2] = 3, x[n] = 0 \forall n > 2.$

A formal Laurent series provides a convenient encoding of a discrete-time sequence.
Convolution of sequences corresponds to multiplication of formal Laurent series.
Do not think of formal Laurent series as functions of a variable. (They are elements of a commutative ring.) Just treat the variable as a placeholder.
In particular, there is no need (and it makes no sense) to talk about ROC (regions of convergence) if we restrict ourselves to formal Laurent series.
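
The encoding of Step 1 can be sketched by storing a series as a mapping from exponent to coefficient, which also handles negative powers. The representation and the name `multiply_series` are illustrative assumptions; only series with finitely many non-zero terms are covered.

```python
from fractions import Fraction


def multiply_series(p, q):
    """Multiply two formal Laurent series with finitely many non-zero terms.

    A series is a dict mapping an exponent of t (possibly negative)
    to its coefficient.
    """
    product = {}
    for i, pi in p.items():
        for j, qj in q.items():
            # t^i * t^j = t^(i+j): coefficients accumulate on the summed exponent
            product[i + j] = product.get(i + j, 0) + pi * qj
    return product


h = {0: Fraction(1), 1: Fraction(1, 2)}   # h(t) = 1 + t/2
x = {0: 1, 1: 2, 2: 3}                    # x(t) = 1 + 2t + 3t^2
y = multiply_series(h, x)
print(sorted(y.items()))

x_neg = {-5: 2, 0: 3, 7: 1}               # x(t) = 2t^-5 + 3 + t^7
y_neg = multiply_series(h, x_neg)
print(sorted(y_neg.items()))
```

Note that the variable $t$ is never evaluated: the code only adds exponents and multiplies coefficients, which is exactly the placeholder point of view.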

### Step 2: Analytic Functions

This is where the fun begins. We were told not to treat a formal Laurent series as a function, but it is very tempting to do so… so let’s do just that! The price we pay though is that we do need to start worrying about ROC (regions of convergence) for otherwise we could end up with incorrect answers.


#### Impulse response: FIR vs IIR
To motivate further our desire to break the rules, consider the following LTI system:

\begin{align*}
y[n] = \frac{1}{2} y[n-1] + x[n]
\end{align*}

Such a system has “memory” or an “internal state” because the current output $y[n]$ depends on a previous output $y[n-1]$ and hence the system must somehow remember previous values of the output, recursively and arbitrarily far into the past.
This type of system, where an output can depend on inputs from arbitrarily long ago, is called an infinite impulse response (IIR) system.
In contrast, the LTI system seen in Step 1 depended only on the last two inputs, so its impulse response was finite, hence the name finite impulse response (FIR) system.

In practice, we can assume that in the distant past the system has been “reset to zero” and hence the output will be zero up until the first time the input is non-zero.

The impulse response is determined by setting $x[0]$ to 1 and observing the output: $h[0] = 1, h[1] = \frac{1}{2}, h[2] = \frac{1}{4}$ and in general $h[n] = \frac{1}{2^n} \forall n \geq 0$.

Encoding this using a formal Laurent series gives the alternative representation 

\begin{align*}
h(t) = \sum_{n=0}^\infty \frac{1}{2^n} t^n = 1 + \frac{1}{2} t + \frac{1}{4} t^2 + \cdots.
\end{align*}

There is a more “efficient” representation of $h(t) = \sum_{n=0}^\infty \frac{1}{2^n} t^n$ based on the following Taylor series expansion 

\begin{align*}
\frac1{1-\alpha t} = 1 + \alpha t + \alpha^2 t^2 + \alpha^3 t^3 + \cdots.
\end{align*}

Indeed, if we were to break the rules and treat h(t) as a function, we might be tempted to write $h(t) = \frac{1}{1 - \frac{1}{2} t}$ instead of $h(t) = \sum_{n=0}^\infty \frac{1}{2^n} t^n$. It turns out that, with some care, it is mathematically legitimate to do this. (Furthermore, Step 3 will explain why it is beneficial to go to this extra trouble.)
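
The impulse response of this recursive system can be checked numerically. The sketch below assumes the system is at rest before the input starts ($y[-1] = 0$); the function name `iir_response` is hypothetical.

```python
from fractions import Fraction


def iir_response(x, alpha=Fraction(1, 2)):
    """Run y[n] = alpha * y[n-1] + x[n] on a list x[0], x[1], ...

    The system is assumed to be at rest (y[-1] = 0) before the input starts.
    """
    y = []
    previous = Fraction(0)
    for xn in x:
        previous = alpha * previous + xn
        y.append(previous)
    return y


impulse = [1] + [0] * 7          # x[0] = 1, then zeros
h = iir_response(impulse)
print(h)
```

The first samples match $h[n] = \frac{1}{2^n}$, and the response never becomes exactly zero, which is what "infinite impulse response" refers to.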


#### Convergence of power series and analytic functions


We know we can encode a sequence using a formal Laurent series, and we can reverse this operation to recover the sequence from the formal Laurent series. In Step 2 then, we just have to consider when we can encode a formal Laurent series as some type (what type?) of function, meaning in particular that it is possible to determine uniquely the formal Laurent series given the function.

Power series (and Laurent series) are studied in complex analysis: recall that every power series has a (possibly zero) radius of convergence, and within that radius of convergence, a power series can be evaluated. Furthermore, within the radius of convergence, a power series (with real or complex coefficients) defines a complex analytic function. The basic idea is that a formal Laurent series might (depending on how quickly the coefficients die out to zero) represent an analytic function on a part of the complex plane.

If the formal Laurent series only contains non-negative powers of t then it is a power series

\begin{align*}
f(t) = \sum_{n=0}^\infty \alpha_n t^n 
\end{align*}

and from complex analysis we know that there exists an $R$, the radius of convergence (which is possibly zero or possibly infinite), such that the sum converges if $|t| < R$ and the sum diverges if $|t| > R$. In particular, $f(t)$ is an analytic function on the open disk $\{ t \in \mathbb{C} \mid |t| < R \}$.

#### What is an analytic/holomorphic function
One can see real analytic / complex holomorphic functions as a class of very smooth functions that strengthens the properties of $C^{\infty}$ functions. Not only are those functions infinitely differentiable, but their Taylor expansion also converges to the function itself in a neighborhood of every point of their domain.

From [Wikipedia](https://en.wikipedia.org/wiki/Analytic_function):

Formally, a function $f$ is real analytic on an open set $D$ in the real line if for any $x_{0}\in D$ one can write

\begin{align*}
f(x)=\sum _{n=0}^{\infty }a_{n}\left(x-x_{0}\right)^{n}=a_{0}+a_{1}(x-x_{0})+a_{2}(x-x_{0})^{2}+a_{3}(x-x_{0})^{3}+\cdots 
\end{align*}

in which the coefficients $ a_{0},a_{1},\dots $ are real numbers and the series is convergent to $f(x)$ for $x$ in a neighborhood of $x_{0}$.

Alternatively, an analytic function is an infinitely differentiable function such that the Taylor series at any point $x_{0}$ in its domain

\begin{align*}
T(x)=\sum _{n=0}^{\infty }{\frac {f^{(n)}(x_{0})}{n!}}(x-x_{0})^{n}
\end{align*}

converges to $f(x)$ for $x$ in a neighborhood of $x_{0}$ pointwise. Recall that pointwise convergence is weaker than uniform convergence. Without going into too much detail, one of the biggest differences is that the pointwise limit of a sequence of continuous functions may be a discontinuous function only if the convergence is not uniform.

The definition of a complex analytic function is obtained by replacing, in the definitions above, "real" with "complex" and "real line" with "complex plane". A function is complex analytic if and only if it is holomorphic i.e. it is complex differentiable. For this reason the terms "holomorphic" and "analytic" are often used interchangeably for such functions.

If the formal Laurent series only contains non-positive powers, that is, $f(t) = \sum_{n=-\infty}^0 \alpha_n t^n$, then we can consider it to be a power series in $t^{-1}$ and apply the above result.
Since $f(t) = \sum_{n=0}^\infty \alpha_{-n} \left(\frac{1}{t}\right)^n$, there exists a $\tilde R$ (possibly zero or possibly infinite) such that $f(t)$ is an analytic function on $\{ t \in \mathbb{C} \mid |t| > \tilde R \}$. Indeed, from above, the power series in $t^{-1}$ converges when $|t^{-1}| < R$, which is equivalent to $|t| > R^{-1}$, hence defining $\tilde R = R^{-1}$ verifies the claim.

In the general case, a formal Laurent series decomposes as $\sum_{n=-\infty}^{-1} \alpha_n t^n + \sum_{n=0}^\infty \alpha_n t^n$.
Both sums must converge if it is to define an analytic function, hence in the general case, a formal Laurent series defines an analytic function on a domain of the form $\{ t \in \mathbb{C} \mid \tilde R < |t| < R \}$.


#### Analytic function and ROC

The encoding of a sequence as an analytic function is therefore straightforward in principle: given a formal Laurent series $f(t) = \sum_{n=-\infty}^\infty \alpha_n t^n$, determine the constants $\tilde R$ and $R$, and provided the region $\{ t \in \mathbb{C} \mid \tilde R < |t| < R \}$ is non-empty, we can use the analytic function $f(t)$ to encode the sequence $\alpha_n$.

We must specify the ROC
\begin{align*}
    \{ t \in \mathbb{C} \mid \tilde R < |t| < R \}
\end{align*}
together with $f(t)$ whenever we wish to define a sequence in this way; without knowing the ROC, we do not know the domain of definition of $f(t)$, that is, we would not know for what values of t does $f(t)$ describe the sequence we want. It is possible for a function $f(t)$ to describe different sequences depending on the ROC!


#### ROC and its role

Let’s study a function that is described by different series depending on the region of the complex plane considered.

Let’s define
\begin{align*}
    f(t) = \sum_{n=0}^\infty t^n
\end{align*}

From the Taylor expansion seen earlier, we are tempted to write $f(t) = \frac1{1-t}$. But wait! This is not entirely true.
If $|t| < 1$ then certainly $\sum_{n=0}^\infty t^n = \frac1{1-t}$.
Yet if $|t| > 1$ then the series $\sum_{n=0}^\infty t^n$ diverges, so it cannot equal $\frac1{1-t}$.
The function $t \mapsto \frac1{1-t}$ for $|t| > 1$ does not describe the power series $\sum_{n=0}^\infty t^n$. It looks like a perfectly good function though, so what series does it describe?


The region $|t| > 1$ is unbounded. (In fact, it is an open disc centred at the point at infinity.)
This motivates us to replace $t$ by $\tau = t^{-1}$ so that the domain becomes $|\tau| < 1$ and we can attempt looking for a power series in $\tau$.

Precisely,
\begin{align*}
    \frac1{1-t} &= \frac1{1-\tau^{-1}} \\
    &= \frac{-\tau}{1-\tau} \\
    &= -\tau \sum_{n=0}^\infty \tau^n \\
    &= -\sum_{n=1}^\infty \tau^n \\
    &= \sum_{n=-\infty}^{-1} (-1) t^n.
\end{align*}

Therefore, the single function $t \mapsto \frac1{1-t}$ actually encodes two different series depending on whether we treat it as a function on $|t|<1$ or a function on $|t|>1$.
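
The two expansions can be compared numerically with partial sums. This is only an illustrative sketch: the evaluation points and the number of terms are arbitrary choices, and the function names are hypothetical.

```python
def inner_partial_sum(t, n_terms):
    """Partial sum of sum_{n>=0} t^n, the expansion valid for |t| < 1."""
    return sum(t ** n for n in range(n_terms))


def outer_partial_sum(t, n_terms):
    """Partial sum of sum_{n<=-1} (-1) t^n, the expansion valid for |t| > 1."""
    return sum(-(t ** -n) for n in range(1, n_terms + 1))


def f(t):
    return 1 / (1 - t)


inside = 0.5            # |t| < 1
outside = 2.0 + 1.0j    # |t| > 1

print(abs(inner_partial_sum(inside, 60) - f(inside)))    # tiny: the inner series matches f
print(abs(outer_partial_sum(outside, 60) - f(outside)))  # tiny: the outer series matches f
print(abs(inner_partial_sum(outside, 60)))               # huge: wrong series for this region
```

Each series reproduces $\frac1{1-t}$ only on its own region, which is why the ROC has to be stated together with the function.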

Readers remembering their complex analysis will not find this bizarre because a holomorphic (i.e., complex differentiable) function generally requires more than one power series to represent it. A power series about a point is only valid up until a pole is encountered, after which another point must be chosen and a power series around that point used to continue the description of the function. When we change points, the coefficients of the power series will generally change.

In the above example, the first series 
\begin{align*}
\sum_{n=0}^\infty t^n
\end{align*}
is a power series around the origin while the second series
\begin{align*}
\sum_{n=-\infty}^{-1} (-1) t^n
\end{align*}
is a power series around the point at infinity and therefore naturally has different coefficients. (Thankfully there are only these two possibilities to worry about: a power series about a different point $c \in \mathbb{C} $ would look like $\sum_{n=0}^\infty \alpha_n (t-c)^n$ and is not of the form we described in Step 1. Only when $c=0$ or $c = \infty$ does the power series match up with the formal Laurent series in Step 1.)

While it might seem that introducing analytic functions is an unnecessary complication, it actually makes certain calculations simpler! Such a phenomenon happened in Step 1: we discovered convolution of sequences became multiplication of Laurent series (and usually multiplication is more convenient than convolution). In Step 3 we will see how the impulse response of $y[n] = \frac12 y[n-1] + x[n]$ can be written down immediately.



### Step 3: The Z-transform

#### Introduction 

Define 
\begin{align*}
    y(t) = \sum_{n=-\infty}^\infty y[n] t^n
\end{align*}
to be the output of an LTI system, and
\begin{align*}
    x(t) = \sum_{n=-\infty}^\infty x[n] t^n
\end{align*}

to be its input. Clearly, the constraint $y[n] = \frac12 y[n-1] + x[n]$ implies a relationship between $x(t)$ and $y(t)$, but how can we determine this relationship?

Note that $y[n] = \frac12 y[n-1] + x[n]$ is actually an infinite number of constraints, one for each $n$. The first step is to think of these as a single constraint on the whole sequences $\{\cdots,x[-2],x[-1],x[0],x[1],x[2],\cdots\}$ and $\{\cdots,y[-2],y[-1],y[0],y[1],y[2],\cdots\}$. We can do this either intuitively or rigorously, arriving at the same answer either way.


Intuitively, look at this table:

| $y[n]$ | $\cdots$ | $y[-1]$ | $y[0]$ | $y[1]$ | $\cdots$ |
|---|---|---|---|---|---|
| $\frac12 y[n-1]$ | $\cdots$ | $\frac12 y[-2]$ | $\frac12 y[-1]$ | $\frac12 y[0]$ | $\cdots$ |
| $x[n]$ | $\cdots$ | $x[-1]$ | $x[0]$ | $x[1]$ | $\cdots$ |

One way of expressing $y[n] = \frac12 y[n-1] + x[n]$ is by saying that the first row of the table is equal to the second row of the table plus the third row of the table, where rows are to be added elementwise.

#### A bit of linear algebra

Rigorously, what has just happened is that we are treating a sequence as a vector in an infinite-dimensional vector space: just like $\begin{pmatrix} y[-1] \\ y[0] \\ y[1] \end{pmatrix}$ is a vector in $\mathbb{R}^3$, we can think of $\begin{pmatrix}\vdots \\ y[-1] \\ y[0] \\ y[1] \\ \vdots \end{pmatrix}$ as a vector in $\mathbb{R}^\infty$. Each of the three rows of the table described above is simply a vector.

To be able to write the table compactly in vector form, we need some way of going from the vector $\begin{pmatrix}\vdots \\ y[-1] \\ y[0] \\ y[1] \\ \vdots \end{pmatrix}$ to the shifted-one-place-to-the-right version of it, namely $\begin{pmatrix}\vdots \\ y[-2] \\ y[-1] \\ y[0] \\ \vdots \end{pmatrix}$. In linear algebra, we know that a matrix can be used to map one vector to another vector provided the operation is linear, and in fact, shifting a vector is indeed a linear operation. In abstract terms, there exists a linear operator S $\colon \mathbb{R}^\infty \rightarrow \mathbb{R}^\infty$ that shifts a sequence one place to the right:

\begin{align*}
& S & \begin{pmatrix}\vdots \\ y[-1] \\ y[0] \\ y[1] \\ \vdots \end{pmatrix} = \begin{pmatrix}\vdots \\ y[-2] \\ y[-1] \\ y[0] \\ \vdots \end{pmatrix} \\
=& \begin{pmatrix}\ddots & \cdots & \cdots & \cdots & \cdots & \cdots \\
               \vdots & \ddots & \cdots & \cdots & \cdots & \cdots \\
               \vdots & 1 & 0 & \ddots & \ddots & \vdots \\
               \vdots & 0 & 1 & 0 & \ddots & \vdots \\
               \vdots & \ddots & 0 & 1 & 0 & \vdots \\
               \cdots & \cdots & \cdots & \cdots & \cdots & \ddots \end{pmatrix}
               & \begin{pmatrix}\vdots \\ y[-1] \\ y[0] \\ y[1] \\ \vdots \end{pmatrix} = \begin{pmatrix}\vdots \\ y[-2] \\ y[-1] \\ y[0] \\ \vdots \end{pmatrix}
\end{align*}

Letting $\mathbf{x}$ and $\mathbf{y}$ denote the vectors $\begin{pmatrix}\vdots \\ x[-1] \\ x[0] \\ x[1] \\ \vdots \end{pmatrix}$ and $\begin{pmatrix}\vdots \\ y[-1] \\ y[0] \\ y[1] \\ \vdots \end{pmatrix}$ respectively, we can rigorously write our system as $\mathbf{y} = \frac12 S \mathbf{y} + \mathbf{x}$. This is precisely the same as what we did when we said the first row of the table is equal to the second row of the table plus the third row of the table.

Without looking at conditions of validity for now, we can even go a little further:

\begin{align*}
    \mathbf{y} &= \frac12 S \mathbf{y} + I \mathbf{x} \quad \text{with } \quad I\mathbf{y} = \mathbf{y} \\
    \left( I - \frac12 S \right) \mathbf{y} &= \mathbf{x}\\
    \mathbf{y} &=  \left( I - \frac12 S \right)^{-1} \mathbf{x}
\end{align*}

This equation tells us the output $\mathbf{y}$ is a function of the input $\mathbf{x}$ — but how useful is it? (Does the inverse $\left( I - \frac12 S \right)^{-1}$ exist, and even if it does, how can we evaluate it?)
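
On a finite window the questions above have a simple answer. The sketch below truncates $S$ to an $N \times N$ matrix and assumes the system is at rest before index 0, so that the truncation loses nothing for inputs starting at time 0; the names `matvec` and `solve_system` are hypothetical.

```python
from fractions import Fraction

N = 6  # size of the finite window; the true operator is infinite-dimensional

# Truncated shift: ones on the first sub-diagonal, so (S y)[n] = y[n-1]
S = [[1 if row == col + 1 else 0 for col in range(N)] for row in range(N)]


def matvec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def solve_system(x):
    """Solve (I - 1/2 S) y = x by forward substitution.

    I - 1/2 S is lower triangular with ones on the diagonal, so each y[n]
    follows from the already computed y[n-1].
    """
    y = []
    for n, xn in enumerate(x):
        y.append(xn + (Fraction(1, 2) * y[n - 1] if n > 0 else 0))
    return y


x = [1, 2, 3, 0, 0, 0]
y = solve_system(x)
shifted = matvec(S, y)
# Residual of y = 1/2 S y + x, entry by entry
residual = [y[n] - Fraction(1, 2) * shifted[n] - x[n] for n in range(N)]
print(y)
print(residual)
```

In this truncated setting the inverse always exists because the matrix is triangular with a non-zero diagonal; the infinite-dimensional case is where the care about convergence comes in.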


#### Linear system impulse response estimation

While in some cases the expression $\mathbf{y} = \left( I - \frac12 S \right)^{-1} \mathbf{x}$ might actually be useful, often it is easier to use series rather than vectors to represent sequences: we will see that the linear operator $S$ is equivalent to multiplication by $t$, which is very convenient to work with!

More precisely, as we have the equivalence
\begin{align*}
    y(t) &= \sum_{n=-\infty}^\infty y[n] t^n \\
    \Leftrightarrow \mathbf{y} &= \begin{pmatrix} \vdots \\ y[-1] \\ y[0]\\ y[1]\\ \vdots \end{pmatrix}
\end{align*}

Both represent exactly the same sequence, hence we can write

\begin{align*}
    \mathbf{y} &= \frac12 S \mathbf{y} + \mathbf{x} \\
    \Leftrightarrow y(t) &= \frac12 t\,y(t) + x(t)
\end{align*}
as well as
\begin{align*}
    \frac12 t\,y(t) &= \frac12 t \sum_{n=-\infty}^\infty y[n] t^n \\
    \Leftrightarrow \frac12 t\,y(t) &= \sum_{n=-\infty}^\infty \frac12 y[n-1] t^n    
\end{align*}

which allows us to clearly identify the second row of the previous table.

Now, from Step 2, we know that we can think of $x(t)$ and $y(t)$ as analytic functions (provided we are careful about the ROC). Therefore, we feel entitled to continue with

\begin{align*}
    y(t) &= \frac12 t\,y(t) + x(t) \\
    \Leftrightarrow y(t) &= \frac1{1-\frac12t} x(t) \\
    \Leftrightarrow h(t) &= \frac1{1-\frac12t} \quad \text{by identification from} \quad y(t) = h(t) x(t) \\
\end{align*}    
    
Since, by definition, the impulse response $h(t)$ satisfies $y(t) = h(t) x(t)$, we seem to have immediately found the impulse response of the original system; at the very least, it agrees with the longer calculation performed at the start of Step 2.
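
One way to convince ourselves of this identification without evaluating anything is to multiply the first terms of the series $\sum_n \frac{1}{2^n} t^n$ by the denominator $1 - \frac12 t$: the product should be $1$ up to a single truncation term. This is a sketch with coefficient lists; `multiply_polynomials` is a hypothetical helper.

```python
from fractions import Fraction


def multiply_polynomials(p, q):
    """Multiply two polynomials given as coefficient lists [c0, c1, c2, ...]."""
    result = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            result[i + j] += pi * qj
    return result


N = 8
h_truncated = [Fraction(1, 2 ** n) for n in range(N)]   # first N terms of h(t)
denominator = [Fraction(1), Fraction(-1, 2)]            # 1 - t/2
product = multiply_polynomials(denominator, h_truncated)
print(product)
```

All intermediate coefficients cancel and only the constant $1$ and a remainder in $t^N$ survive; the remainder's coefficient shrinks as $N$ grows.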

#### Remarks on validity of system estimation
Actually, the only way to justify rigorously that the above manipulation is valid is to check that the answer we have found really is the correct answer. Indeed, to be allowed to perform the manipulations we must assume the existence of a domain on which both $x(t)$ and $y(t)$ are analytic. Assuming that $y(t)$ is "nice" and then proving, under that assumption, that $y(t)$ is "nice" is circular: it does not show that $y(t)$ really is nice.

A rigorous justification would look something like the following:
First we must make some assumption about the input sequence, otherwise we have no idea what ROC to use.

If we are interested in inputs that are uniformly bounded and which are zero up until time 0 then we can safely assume that $x(t)$ is analytic on $|t| < 1$.

Since $1-\frac12t$ is non-zero whenever $|t| < 1$, we know $h(t) = \frac1{1-\frac12t}$ is analytic on $|t| < 1$.

Therefore it is easy to show that the product $y(t) = h(t) x(t)$ will be analytic on $|t| < 1$.
This means that $y(t)$ can be safely expanded as a power series in a neighbourhood of the origin, and the coefficients of that power series are what we believe the output of the system will be.

To check this really is the output of the system, it suffices to show that $y(t) = \frac12 t\,y(t) + x(t)$ for $|t| < 1$.
This is straightforward:

\begin{align*}
    &\frac12 t\, y(t) + x(t) \\
    =& \left(\frac12 t\, h(t) + 1\right) x(t) \\
    =& \left( \frac{\frac12 t}{1-\frac12t} + \frac{1-\frac12 t}{1-\frac12 t}\right) x(t) \\
    =& \frac1{1-\frac12t} x(t) \\
    =& h(t)x(t) \\
    =& y(t)
\end{align*}

as required, where every manipulation can be seen to be valid for $|t| < 1$ (there are no divisions by zero or other bad things occurring).

The remark above shows it is (straightforward but) tedious to verify rigorously that we are allowed to perform the manipulations we want. It is much simpler to go the other way and define a system directly in terms of $h(t)$ and its ROC. Then, provided the ROC of $x(t)$ overlaps with the ROC of $h(t)$, the output $y(t)$ is given by $h(t)x(t)$ on the overlap, which can be correctly expanded as a Laurent series with regard to the ROC, and the output sequence read off.

All that remains is to introduce the Z-transform and explain why engineers treat $z^{-1}$ as a "time shift operator".

#### Z-transform

The Z-transform is simply doing the same as what we have been doing, but using $z^{-1}$ instead of $t$. Why $z^{-1}$ rather than $z$? Just convention (and perhaps a bit of convenience too, e.g., it leads to stable systems being those with all poles inside the unit circle, rather than outside).

In fact, sometimes $z$ is used instead of $z^{-1}$, hence you should always check what conventions an author or lecturer has decided to adopt.

The rule that engineers are taught is that when you "take the Z-transform" of $y[n] = \frac12 y[n-1] + x[n]$ then you replace
* $y[n]$ by $Y(z)$
* $x[n]$ by $X(z)$
* $y[n-1]$ by $z^{-1} Y(z)$

The reason this rule works was justified at great length above: recall that as an intermediate mental step we can think of the input and output as vectors, and this led to thinking of them instead as series, because multiplication by $t = z^{-1}$ will then shift the sequence one place to the right.

Thus, $Y(z) = \frac12 z^{-1} Y(z) + X(z)$ is asserting that the sequence $y[n]$ is equal to the sequence $x[n]$ plus a half times the sequence $y[n]$ shifted to the right by one place, which is equivalent to the original description $y[n] = \frac12 y[n-1] + x[n]$.

### Step 4: Poles and Zeros

A straightforward but nonetheless rich class of causal LTI systems can be written in the form

\begin{align*}
    y[n] = a_1 y[n-1] + a_2 y[n-2] + \cdots + a_p y[n-p] + b_0 x[n] + b_1 x[n-1] + \cdots + b_q x[n-q]
\end{align*}

or equivalently, in matrix form:

\begin{align*}
\begin{pmatrix}\vdots \\ y[-1] \\ y[0] \\ y[1] \\ \vdots \end{pmatrix} & = A & \begin{pmatrix}\vdots \\ y[-1] \\ y[0] \\ y[1] \\ \vdots \end{pmatrix} & + & B & \begin{pmatrix}\vdots \\ x[-1] \\ x[0] \\ x[1] \\ \vdots \end{pmatrix} \\
\begin{pmatrix}\vdots \\ y[-1] \\ y[0] \\ y[1] \\ \vdots \end{pmatrix} & = 
   \begin{pmatrix}\ddots & \cdots & \cdots & \cdots & \cdots & \cdots \\
               \vdots & \ddots & \cdots & \cdots & \cdots & \cdots \\
               \vdots & a_1 & 0 & \ddots & \ddots & \vdots \\
               \vdots & a_2 & a_1 & 0 & \ddots & \vdots \\
               \vdots & \ddots & a_2 & a_1 & 0 & \vdots \\
               \cdots & \cdots & \cdots & \cdots & \cdots & \ddots \end{pmatrix} &
           \begin{pmatrix}\vdots \\ y[-1] \\ y[0] \\ y[1] \\ \vdots \end{pmatrix} & + &
    \begin{pmatrix}\ddots & \cdots & \cdots & \cdots & \cdots & \cdots \\
               \vdots & \ddots & \cdots & \cdots & \cdots & \cdots \\
               \vdots & b_1 & b_0 & \ddots & \ddots & \vdots \\
               \vdots & b_2 & b_1 & b_0 & \ddots & \vdots \\
               \vdots & \ddots & b_2 & b_1 & b_0 & \vdots \\
               \cdots & \cdots & \cdots & \cdots & \cdots & \ddots \end{pmatrix} &
           \begin{pmatrix}\vdots \\ x[-1] \\ x[0] \\ x[1] \\ \vdots \end{pmatrix} \\
    \mathbf{y} & = A  &\mathbf{y} & + & B & \mathbf{x} \\
    \Leftrightarrow \mathbf{y} & = & & (I - A)^{-1} B & \mathbf{x}  
\end{align*}

Importantly, this class is “closed” in that if you connect two such systems together (the output of one connects to the input of the other) then the resulting system can again be written in the same way (but generally with larger values of p,q). Another important feature of this class of systems is that they can be readily implemented in hardware.

Applying the Z-transform to such a system shows that the impulse response in the Z-domain is a rational function. Note that the product of two rational functions is again a rational function, demonstrating that this class of systems is closed under composition, as stated above. Most of the time, it suffices to work with rational impulse responses.
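
The closure property can be checked on a small example. The sketch below assumes systems at rest before the first sample and writes the transfer function as $\frac{b_0 + b_1 t + \cdots}{1 - a_1 t - a_2 t^2 - \cdots}$ with $t = z^{-1}$; the function names and the two example systems are hypothetical.

```python
from fractions import Fraction


def run_system(a, b, x):
    """Run y[n] = sum_k a_k y[n-k] + sum_k b_k x[n-k] on a list x.

    a = [a_1, ..., a_p], b = [b_0, ..., b_q]; the system is assumed to be
    at rest before the first sample.
    """
    y = []
    for n in range(len(x)):
        acc = Fraction(0)
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        y.append(acc)
    return y


def multiply_polynomials(p, q):
    result = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            result[i + j] += pi * qj
    return result


def cascade(a1, b1, a2, b2):
    """Coefficients of the system obtained by feeding system 1 into system 2."""
    d1 = [Fraction(1)] + [-ak for ak in a1]    # denominator 1 - a_1 t - ...
    d2 = [Fraction(1)] + [-ak for ak in a2]
    d = multiply_polynomials(d1, d2)           # denominators multiply
    b = multiply_polynomials(b1, b2)           # numerators multiply
    a = [-dk for dk in d[1:]]                  # back to the a_k sign convention
    return a, b


a1, b1 = [Fraction(1, 2)], [Fraction(1)]                    # y[n] = 1/2 y[n-1] + x[n]
a2, b2 = [Fraction(1, 3)], [Fraction(1), Fraction(1, 2)]    # y[n] = 1/3 y[n-1] + x[n] + 1/2 x[n-1]
x = [1, 2, 3, 0, 0, 0, 0, 0]

two_stage = run_system(a2, b2, run_system(a1, b1, x))
a, b = cascade(a1, b1, a2, b2)
one_stage = run_system(a, b, x)
print(a, b)
print(two_stage == one_stage)
```

The combined system has $p = 2$ and $q = 1$, larger than either stage taken alone, as the text announces.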

If $H(z) = \frac{f(z)}{g(z)}$ is rational (and in reduced form, with no common factors) then the zeros of $H(z)$ are the solutions of the polynomial equation $f(z)=0$ while the *poles* are the solutions of the polynomial equation $g(z)=0$.
More generally, if $H(z)$ is analytic, then the zeros are the solutions of $H(z)=0$ and the poles are the solutions of $\frac1{H(z)} = 0$. Note that a pole of $H(z)$ is a zero of its inverse: if we invert a system, poles become zeros and zeros become poles.

Poles are important because poles are what determine the regions of convergence and hence they determine when our manipulations in Step 3 are valid. This manifests itself in poles having a physical meaning: as we will see, the closer a pole gets to the unit circle, the less stable a system becomes.

Real-world systems are causal: the output cannot depend on future input. The impulse response $h[n]$ of a causal system is zero for $n < 0$. Therefore, the formal Laurent series (Step 1) representation of $h[n]$ has no negative powers of $t$.

Its region of convergence will therefore be of the form $|t| < R$ for some $R > 0$. (If $R=0$ then the system would blow up!) Since $z = t^{-1}$, the ROC of a causal transfer function $H(z)$ will therefore be of the form $|z| > R$ for some $R < \infty$.

If $H(z)$ is rational then it has only a finite number of poles, and it suffices to choose $R$ to be the largest of the magnitudes of the poles.

Let’s look at an unstable system: $y[n] = 2 y[n-1] + x[n]$. This system is clearly unstable because its impulse response is readily seen to be $1, 2, 4, 8, 16, 32, \cdots$.

We put in a bounded signal ($x[0]=1$ and $x[n] = 0$ for $|n| \geq 1$) yet obtain an unbounded output ($y[n] = 2^n$ for $n \geq 0$). Taking the Z-transform gives:

\begin{align*}
    Y(z) &= 2 z^{-1} Y(z) + X(z) \\
    \Leftrightarrow H(z) &= Y(z)/X(z) \\
    \Leftrightarrow H(z) &= \frac1{1-2z^{-1}} \\
    \Leftrightarrow H(z) &= \sum_{n=0}^\infty 2^n z^{-n} \quad \text{for } \quad |z|>2
\end{align*}
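As a quick numerical check of this example, here is a minimal sketch (the function names are hypothetical) that simulates the recursion for a unit impulse and evaluates partial sums of $\sum_n 2^n z^{-n}$ inside and outside the region $|z| > 2$.

```python
def impulse_response(n_samples):
    """Simulate y[n] = 2*y[n-1] + x[n] for a unit impulse x[0] = 1."""
    y = []
    previous = 0.0
    for n in range(n_samples):
        x_n = 1.0 if n == 0 else 0.0
        current = 2.0 * previous + x_n
        y.append(current)
        previous = current
    return y

def partial_sum(z, n_terms):
    """Partial sum of sum_n 2^n z^-n, truncated to n_terms terms."""
    return sum((2.0 / z) ** n for n in range(n_terms))

h = impulse_response(6)
print(h)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]

# Inside the ROC (|z| > 2) the partial sums settle on 1 / (1 - 2/z).
print(partial_sum(4.0, 50), 1.0 / (1.0 - 2.0 / 4.0))

# Outside the ROC (|z| < 2) the partial sums keep growing.
print(partial_sum(1.5, 10), partial_sum(1.5, 50))
```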

If we put a signal that starts at time zero (i.e., $x[n] = 0$ for $n < 0$) into a causal system we will get an output even if the system is unstable. This can be seen by analyzing the ROC of $X$ and $H$ and deducing the ROC of $Y$:

| |$X(z)$|$H(z)$|$Y(z) = H(z)X(z)$|
|:---:|:---:|:---:|:---:|
|ROC|$\lvert z \rvert > R_X$|$\lvert z \rvert > R_H$|$\lvert z \rvert > \max\{R_X,R_H\}$|

If we put a signal that started at time $-\infty$ into our system, then there might be no solution! (Intuitively, it means the system will have to have blown up before reaching time 0.)

|Expression| ROC | Corresponding series | Corresponding sequence|
|:---:|:---:|:---:|:---:|
|$X(z) = \frac1{1-z}$ | $\lvert z \rvert < 1$ | $X(z)=\sum_{n=-\infty}^{0} z^{-n}$ | $x[n]=\begin{cases}1 & \text{if } n \leq 0 \\ 0 & \text{if } n > 0\end{cases}$|
|$H(z) = \frac1{1-2z^{-1}}$|$\lvert z \rvert > 2$|$H(z)=\sum_{n=0}^\infty 2^n z^{-n}$|$h[n] = 2^n$ for $n \geq 0$|

We cannot form the product $Y(z) = H(z) X(z)$ because there is no value of $z$ for which both $H(z)$ and $X(z)$ are valid: one’s ROC is $|z|>2$ while the other’s is $|z| < 1$.



#### What is a bounded sequence

There are many different definitions of a sequence being bounded. Three examples are:

* $\exists M, \; \forall n, \; |x[n]| < M$
* $\sum_{n=-\infty}^\infty | x[n] | < \infty$
* $\sum_{n=-\infty}^{\infty} | x[n] |^2 < \infty$

Of these three, the second is the easiest to check given only $X(z)$: if the ROC of $X(z)$ includes $|z|=1$ then, by definition (see a complex analysis textbook), $\sum_{n=-\infty}^\infty x[n] z^{-n}$ is absolutely convergent for $|z|=1$, meaning $\sum_{n=-\infty}^\infty |x[n] z^{-n}| < \infty$, hence $\sum_{n=-\infty}^\infty |x[n]| < \infty$ because $|z|=1$ implies $|z^{-n}| = 1$ for all $n$.

This type of boundedness is called "bounded in $l^1$". For the reverse direction, recall that the radius of convergence $R$ is such that a power series in $t$ will converge for all $|t| < R$ and diverge for all $|t| > R$. 

Therefore, if the closure of the largest possible ROC of $X(z)$ does not contain the unit circle then $x[n]$ must be unbounded in $l^1$. (The boundary case is inconclusive: the power series $x(t) = \sum_{n=1}^\infty \frac1{n^2} t^n$ has a ROC $|t| < 1$ yet $x[n] = \frac1{n^2}$ is bounded in $l^1$. On the other hand, $x(t) = \sum_{n=0}^\infty t^n$ has a ROC $|t| <1$ and $x[n] = 1$ is unbounded in $l^1$.)

A sequence $x[n]$ is bounded in $l^1$ if the largest possible ROC of $X(z)$ includes the unit circle $|z|=1$. A sequence $x[n]$ is unbounded in $l^1$ if the closure of the largest possible ROC does not include the unit circle.
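The two boundary examples can be compared numerically. This is a small sketch (the helper name is hypothetical, and both sums start at $n = 1$ for simplicity) showing that the partial sums of $|x[n]|$ settle for $x[n] = \frac1{n^2}$ but keep growing for $x[n] = 1$.

```python
import math

def l1_partial_sum(sequence, n_terms):
    """Sum of |x[n]| for n = 1 .. n_terms."""
    return sum(abs(sequence(n)) for n in range(1, n_terms + 1))

for n_terms in (10, 1000, 100000):
    print(
        n_terms,
        l1_partial_sum(lambda n: 1.0 / n**2, n_terms),  # approaches pi^2 / 6
        l1_partial_sum(lambda n: 1.0, n_terms),         # grows without bound
    )

print(math.pi**2 / 6)
```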

#### How to characterize a stable system

If the ROC of the transfer function $H(z)$ includes the unit circle, then an $l^1$-bounded input will produce an $l^1$-bounded output. Indeed, the ROC of both $H(z)$ and $X(z)$ will include the unit circle, therefore, the ROC of $Y(z)$ will include the unit circle and $y[n]$ will be $l^1$-bounded.

If all the poles of a causal $H(z)$ are within the unit circle then the ROC will include the unit circle, hence a bounded input will produce a bounded output. Engineers therefore call a system stable if all the poles of $H(z)$ are inside the unit circle.
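This criterion is easy to test numerically. The sketch below assumes the denominator of $H(z)$ is given as a list of coefficients in increasing powers of $z^{-1}$ (the helper names `poles_of` and `is_stable` are hypothetical), and uses `numpy.roots` to find the poles.

```python
import numpy as np

def poles_of(den):
    """Poles of H(z), with `den` the denominator in increasing powers of z^-1.

    den[0] + den[1] z^-1 + ... + den[p] z^-p = 0 is equivalent (for z != 0) to
    den[0] z^p + den[1] z^(p-1) + ... + den[p] = 0, which np.roots solves.
    """
    return np.roots(den)

def is_stable(den):
    """True if every pole lies strictly inside the unit circle."""
    return bool(np.all(np.abs(poles_of(den)) < 1.0))

print(is_stable([1.0, -2.0]))       # 1 - 2 z^-1, pole at z = 2
print(is_stable([1.0, -0.5]))       # 1 - 0.5 z^-1, pole at z = 0.5
print(is_stable([1.0, -1.0, 0.5]))  # poles at 0.5 +/- 0.5j
```

The first system is the unstable example $y[n] = 2 y[n-1] + x[n]$ from above; the other two have all their poles inside the unit circle.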

#### Resonating system
Note that it follows from earlier remarks that if $H(z)$ has a pole outside the unit circle then the impulse response will be unbounded. If $H(z)$ has a pole on the unit circle then the impulse response will be resonant at the corresponding frequency indefinitely.

For example, if $H(z) = \frac1{1-az^{-1}}$ where $|a|=1$ then $h[n] = a^n$ for $n \geq 0$.

This does not die away because $|h[n]| = |a^n| = 1$ for all $n \geq 0$.

By writing $a = e^{\jmath \omega}$ we get $h[n] = e^{\jmath \omega n}$ and therefore we think of this as a resonance at frequency $\omega$.

Furthermore, if $a = r e^{\jmath \omega}$ where $|r| < 1$, then $h[n] = a^n = r^n e^{\jmath \omega n}$ and we see that there is a damped resonance at frequency $\omega$. As the pole gets closer to the unit circle (that is, $r \rightarrow 1$) the impulse response takes longer to die out. Engineers care about pole placement!
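To make the effect of pole placement concrete, here is a minimal sketch (the helper name, the frequency and the 1% threshold are arbitrary choices for illustration) that counts how many samples the envelope $r^n$ of $h[n] = r^n e^{\jmath \omega n}$ needs to fall to 1% of its initial value.

```python
import cmath
import math

def samples_to_decay(r, threshold=0.01):
    """Smallest n with |h[n]| = r**n at or below `threshold`, for 0 < r < 1."""
    return math.ceil(math.log(threshold) / math.log(r))

omega = math.pi / 4  # hypothetical resonance frequency, in radians per sample

for r in (0.5, 0.9, 0.99):
    a = r * cmath.exp(1j * omega)
    h_10 = a ** 10  # h[n] = a^n evaluated at n = 10
    # The modulus of h[10] only depends on r, not on omega.
    print(r, abs(h_10), r ** 10, samples_to_decay(r))
```

The number of samples grows quickly as $r$ approaches 1, which matches the statement that the impulse response takes longer to die out.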

## On/Off control

On/Off control is probably the simplest kind of control one can set up. Given a measure $m(t)$ and a target $r(t)$ compute $u(t)$ simply depending on the sign of $e(t)=r(t)-m(t)$:

\begin{align*}
    u(t) = \begin{cases}
    1 \; \text{if} \quad r(t)-m(t) > 0\\
    0 \; \text{otherwise}
    \end{cases}
\end{align*}


![title](data/on-off-control.jpg)

This type of control is easy to set up, but it does not take previous inputs into account and, by design, it often results in oscillating patterns. The frequency of the oscillations depends on the behaviour of the process. If the process has some kind of inertia, or is smooth in the temporal domain, this method can be adequate. Most fridges work like this, for instance.

However, if the input is reflected in the output after a very short time, you might enter a very high-frequency regime where you constantly switch the system on and off. That can damage an electric relay, for instance.
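The oscillating behaviour can be reproduced with a small simulation. This is only a sketch: the first-order "heater" process and all its constants (`ambient`, `gain`, `leak`, `dt`) are hypothetical and were picked so that the process reacts quickly to the command.

```python
def on_off_command(target, measure):
    """u(t) = 1 if r(t) - m(t) > 0, else 0."""
    return 1 if target - measure > 0 else 0

def simulate_on_off(target=20.0, steps=200, dt=0.1):
    """Toy first-order process: heats up when u = 1, leaks heat towards ambient."""
    temperature = 15.0
    ambient, gain, leak = 10.0, 30.0, 1.0
    commands, measures = [], []
    for _ in range(steps):
        u = on_off_command(target, temperature)
        temperature += dt * (gain * u - leak * (temperature - ambient))
        commands.append(u)
        measures.append(temperature)
    return commands, measures

commands, measures = simulate_on_off()

# Count how many times the command changes between consecutive steps.
switches = sum(1 for a, b in zip(commands, commands[1:]) if a != b)
print("number of on/off switches:", switches)
print("temperature range after start-up:", min(measures[5:]), max(measures[5:]))
```

The measure stays close to the target, but the command switches very frequently, which is the behaviour that can wear out a relay.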

## On/Off with hysteresis

On/Off control with hysteresis fixes some of the issues of simple On/Off control, in particular when the process reacts quickly to inputs and one does not want to adjust the command at a high frequency. The price is a small loss of accuracy in how close the measure $m(t)$ stays to the target $r(t)$.

The solution is simply to take some margin: set the command back to 0 only when the measure reaches target + margin, and set it to 1 only when the measure falls to target - margin.

In [1]:
import dot2tex
import pydot

graph = pydot.Dot(graph_type='digraph', rankdir="LR")
# Add one node per controller state
graph.add_node(pydot.Node('LOW_REACHED', label='$$u(t)=1$$'))
graph.add_node(pydot.Node('HIGH_REACHED', label='$$u(t)=0$$'))

# Add the transitions (edges) between the two states
graph.add_edge(pydot.Edge('LOW_REACHED', 'HIGH_REACHED', label='$$r(t)-m(t) < -margin$$'))
graph.add_edge(pydot.Edge('HIGH_REACHED', 'LOW_REACHED', label='$$r(t)-m(t) > margin$$'))

# Export to tex
texcode = dot2tex.dot2tex(graph.to_string(),format='tikz',texmode='math',crop=True)
#with open("test.tex", "w") as f: 
#    f.write(texcode) 

#pdflatex ./test.tex

![title](data/hysteresis_cycle.png)
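The two-state cycle above can be written as a small function. The following is a minimal sketch: the controller follows the margin rule described in this section, while the first-order "heater" process and its constants are hypothetical and only serve to count how often the command switches.

```python
def hysteresis_command(target, measure, previous_command, margin):
    """Two-state controller: switch only when the error leaves [-margin, margin]."""
    error = target - measure
    if error > margin:       # measure fell below target - margin
        return 1
    if error < -margin:      # measure rose above target + margin
        return 0
    return previous_command  # inside the band: keep the current state

def count_switches(margin, target=20.0, steps=200, dt=0.1):
    """Run a toy first-order process and return the number of command switches."""
    temperature, u = 15.0, 0
    ambient, gain, leak = 10.0, 30.0, 1.0
    switches = 0
    for _ in range(steps):
        new_u = hysteresis_command(target, temperature, u, margin)
        switches += int(new_u != u)
        u = new_u
        temperature += dt * (gain * u - leak * (temperature - ambient))
    return switches

print("switches without margin:", count_switches(margin=0.0))
print("switches with margin 2: ", count_switches(margin=2.0))
```

A larger margin trades accuracy around the target for fewer switches of the actuator.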

## What is a PID

PID stands for Proportional Integral Derivative. Each of those three terms names a unit that takes the difference between the target and the measure into account in a different way.

The first noticeable difference between a PID controller and the On/Off controllers above is that, instead of allowing only a binary output (fully on with $u(t)=1$ or fully off with $u(t)=0$), it allows a much more fine-grained command.

A PID controller adjusts how hard the actuator should work so that the variable of interest $y(t)$ stays as close as possible to the desired value $r(t)$, with little variation.
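As an illustration, here is a minimal sketch of the usual textbook discrete form, $u = K_p e + K_i \sum e \, \Delta t + K_d \frac{\Delta e}{\Delta t}$. The class name, the gains and the toy first-order process are all assumptions made for this example, and the command is deliberately left unsaturated to keep the code short.

```python
class PID:
    """Textbook discrete PID: u = kp*e + ki*sum(e*dt) + kd*(e - e_prev)/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.previous_error = None

    def update(self, target, measure):
        error = target - measure
        self.integral += error * self.dt
        if self.previous_error is None:
            derivative = 0.0  # no history yet on the first call
        else:
            derivative = (error - self.previous_error) / self.dt
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

dt = 0.1
pid = PID(kp=2.0, ki=1.0, kd=0.1, dt=dt)  # hypothetical gains
temperature, ambient = 15.0, 10.0

for _ in range(300):
    u = pid.update(target=20.0, measure=temperature)
    # Toy first-order process: driven by u, leaking towards ambient.
    temperature += dt * (u - (temperature - ambient))

print("final temperature:", temperature)
print("final command:", u)
```

With these values the integral term ends up supplying the constant command needed to hold the target against the leak, which a purely proportional controller could not do without a residual error.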




We recall that discrete circular convolution reads:
\begin{align*}
  (s_1 \star s_2)[n] = \sum_{k=0}^{N-1} s_1[k] s_2[(n-k)\%N]
\end{align*}


It is interesting to notice that this convolution operator can be expressed as a linear operation, $(s_1 \star s_2) = S_2 s_1$ where we have:
\begin{align}
    S_2 = \begin{pmatrix}
    s_2[0] & s_2[N-1] & \cdots & s_2[1] \\
    s_2[1] & s_2[0] & \cdots & s_2[2] \\
    \vdots & \vdots & \vdots & \vdots \\
    s_2[N-1] & s_2[N-2] & \cdots & s_2[0] \\
    \end{pmatrix}
\end{align}
And one can easily see that this matrix is a circulant matrix. As stated in the notebook "SparseFourierApproximation" circulant matrices can be diagonalized in a Fourier basis. That makes the discrete Fourier transform a very useful tool to compute the convolution between a system input, and the system's impulse response.

To be more precise, one can show that the pair of matrices $F, F^{-1}$ diagonalizes any circulant matrix $C$, so that we can write $C = F D F^{-1}$ or, alternatively, $D = F^{-1} C F$, with $D$ diagonal.
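The equivalence between the three views of circular convolution (direct sum, circulant matrix product, Fourier domain product) can be checked with NumPy. This is a small sketch on random signals; the variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
s1 = rng.standard_normal(N)
s2 = rng.standard_normal(N)

# Direct definition: (s1 * s2)[n] = sum_k s1[k] s2[(n - k) % N]
direct = np.array(
    [sum(s1[k] * s2[(n - k) % N] for k in range(N)) for n in range(N)]
)

# Circulant matrix S2[n, k] = s2[(n - k) % N]
n_idx, k_idx = np.indices((N, N))
S2 = s2[(n_idx - k_idx) % N]

# Fourier route: pointwise product of the DFTs, then inverse DFT.
via_fft = np.fft.ifft(np.fft.fft(s1) * np.fft.fft(s2)).real

print(np.allclose(direct, S2 @ s1))
print(np.allclose(direct, via_fft))
```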

We recall that the discrete Fourier transform reads:
\begin{align*}
  X[k]= \sum_{n=0}^{N-1} x[n]e^{-2 \pi j \frac{kn}{N}}
\end{align*}
And the inverse discrete Fourier transform reads:
\begin{align*}
  x[n]= \frac{1}{N} \sum_{k=0}^{N-1} X[k]e^{2 \pi j \frac{kn}{N}}
\end{align*}

Or alternatively, if we write the discrete Fourier transform as a matrix F:
$X = F^{-1} x$ and $x = F X$ where we have:

\begin{align}
    F = \frac{1}{N} \begin{pmatrix}
    1 & 1 & \cdots & 1\\
    1 & e^{2\pi j\frac{1}{N}} & \cdots & e^{2\pi j\frac{N-1}{N}}\\
    \vdots & \vdots & \vdots & \vdots \\
    1 & e^{2\pi j\frac{N-1}{N}} & \cdots & e^{2\pi j\frac{(N-1)(N-1)}{N}}\\
    \end{pmatrix}
\end{align}
where the row index is $n$ and the column index is $k$,
and
 
\begin{align}
    F^{-1} = \begin{pmatrix}
    1 & 1 & \cdots & 1\\
    1 & e^{-2\pi j\frac{1}{N}} & \cdots & e^{-2\pi j\frac{N-1}{N}}\\
    \vdots & \vdots & \vdots & \vdots \\
    1 & e^{-2\pi j\frac{N-1}{N}} & \cdots & e^{-2\pi j\frac{(N-1)(N-1)}{N}}\\
    \end{pmatrix}
\end{align}
where the row index is $k$ and the column index is $n$.
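These two matrices can be built explicitly and compared with NumPy's FFT. The sketch below follows the convention used here ($X = F^{-1} x$, $x = F X$) and checks that $F^{-1} C F$ is diagonal for a random circulant matrix $C$; the variable names are arbitrary.

```python
import numpy as np

N = 8
n = np.arange(N)

# DFT matrix (called F^{-1} in the text): X = F_inv @ x
F_inv = np.exp(-2j * np.pi * np.outer(n, n) / N)
# Inverse DFT matrix (called F in the text): x = F @ X
F = np.exp(2j * np.pi * np.outer(n, n) / N) / N

rng = np.random.default_rng(1)
x = rng.standard_normal(N)
c = rng.standard_normal(N)

# Circulant matrix whose first column is c: C[i, j] = c[(i - j) % N]
C = c[(n[:, None] - n[None, :]) % N]

# D = F^{-1} C F should be diagonal, with the DFT of c on its diagonal.
D = F_inv @ C @ F

print(np.allclose(F_inv @ x, np.fft.fft(x)))
print(np.allclose(F @ F_inv, np.eye(N)))
print(np.allclose(D, np.diag(np.fft.fft(c))))
```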


In [2]:
from IPython.display import IFrame
IFrame("doc/linear_dynamical_systems.pdf", width=800, height=600)

## From control theory to deep learning
This chapter was inspired by a very cool blog post on PID emulation with LSTMs:
https://towardsdatascience.com/emulating-a-pid-controller-with-long-short-term-memory-part-1-bb5b87165b08