## Processing Sequences Using RNNs and CNNs

### Recurrent Neural Network

A recurrent neural network (RNN) recognizes patterns in sequences of data, such as time series, speech, text, and video.

**Recurrent layer**

At each time step t (also called a frame), the recurrent neuron receives the inputs $x_t$ as well as its own output from the previous time step, $\hat{y}_{t-1}$.

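For a layer of recurrent neurons, the outputs at time step t are

$\hat{y}_t = f(W_x x_t + W_{\hat{y}} \hat{y}_{t-1} + b)$

$f$ is the activation function and $b$ is the bias vector.

$W_x$ is the weight matrix for the inputs at the current time step (a weight vector in the case of a single neuron).

$W_{\hat{y}}$ is the weight matrix for the outputs of the previous time step, t-1.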
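The layer equation above can be sketched in NumPy as a single time step. This is a minimal illustration, not a trained model: the layer sizes, the random weights, the choice of `tanh` for $f$, and the function name `recurrent_layer_step` are all assumptions made for the example.

```python
import numpy as np

def recurrent_layer_step(x_t, y_prev, W_x, W_y, b, activation=np.tanh):
    """One time step of a recurrent layer: y_t = f(W_x x_t + W_y y_{t-1} + b)."""
    return activation(W_x @ x_t + W_y @ y_prev + b)

rng = np.random.default_rng(0)
n_inputs, n_neurons = 3, 5                      # illustrative sizes (assumed)
W_x = rng.normal(size=(n_neurons, n_inputs))    # weights for the current inputs
W_y = rng.normal(size=(n_neurons, n_neurons))   # weights for the previous outputs
b = np.zeros(n_neurons)

x_t = rng.normal(size=n_inputs)
y_prev = np.zeros(n_neurons)                    # no previous output at the first step
y_t = recurrent_layer_step(x_t, y_prev, W_x, W_y, b)
print(y_t.shape)  # (5,)
```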

**Input and output sequences**

Sequence-to-sequence network: takes a sequence of inputs and produces a sequence of outputs, e.g. forecasting daily power consumption.

Sequence-to-vector network: takes a sequence of inputs and ignores all outputs except the last one, e.g. scoring the sentiment of a movie review.

Vector-to-sequence network: is fed the same input vector at every time step and outputs a sequence, e.g. image captioning.

Encoder-decoder: an encoder (sequence-to-vector) followed by a decoder (vector-to-sequence), e.g. translating from one language to another.
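The difference between these configurations is mostly a matter of which inputs are fed in and which outputs are kept. The sketch below unrolls a simple `tanh` recurrent layer over a sequence; the helper `run_rnn`, the sizes, and the random weights are hypothetical and chosen only for illustration.

```python
import numpy as np

def run_rnn(inputs, W_x, W_y, b):
    """Unroll a simple tanh recurrent layer over a sequence and return every output."""
    y = np.zeros(W_y.shape[0])                  # initial "previous output"
    outputs = []
    for x_t in inputs:                          # one iteration per time step
        y = np.tanh(W_x @ x_t + W_y @ y + b)
        outputs.append(y)
    return np.stack(outputs)                    # shape: (n_steps, n_neurons)

rng = np.random.default_rng(1)
n_steps, n_inputs, n_neurons = 7, 1, 4          # illustrative sizes (assumed)
W_x = rng.normal(size=(n_neurons, n_inputs))
W_y = rng.normal(size=(n_neurons, n_neurons))
b = np.zeros(n_neurons)
sequence = rng.normal(size=(n_steps, n_inputs))

# Sequence-to-sequence: keep the output at every time step.
seq_to_seq = run_rnn(sequence, W_x, W_y, b)

# Sequence-to-vector: ignore all outputs except the last one.
seq_to_vec = seq_to_seq[-1]

# Vector-to-sequence: feed the same input vector at every time step.
vector = rng.normal(size=n_inputs)
vec_to_seq = run_rnn(np.tile(vector, (n_steps, 1)), W_x, W_y, b)

print(seq_to_seq.shape, seq_to_vec.shape, vec_to_seq.shape)
```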

### Training RNNs

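Time series concepts that come up when preparing and forecasting sequence data:

- seasonality: a pattern that repeats at a fixed period (e.g. daily or weekly)
- trend: a long-term increase or decrease in the level of the series
- differencing: subtracting an earlier value from each value (e.g. $X_t - X_{t-1}$) to remove trend or seasonality
- moving averages: the average of the series over a sliding window, which smooths out short-term fluctuations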
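A small NumPy sketch can make differencing and moving averages concrete. The synthetic series below (a linear trend plus a sine wave with period 12) and the window length are assumptions chosen so that the effect of each operation is easy to see.

```python
import numpy as np

t = np.arange(24)
# Synthetic series (assumed for illustration): linear trend + seasonality of period 12.
series = 2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12)

# First-order differencing: X_t - X_{t-1}. Removes the linear trend's growth.
diff1 = np.diff(series)

# Seasonal differencing with lag 12: X_t - X_{t-12}. Cancels the seasonal pattern.
seasonal_diff = series[12:] - series[:-12]

# Moving average over one full season: smooths out the seasonality, leaving the trend.
window = 12
moving_avg = np.convolve(series, np.ones(window) / window, mode="valid")

print(diff1.shape, seasonal_diff.shape, moving_avg.shape)
```

With this series, the seasonal difference is constant (the trend gains 24 over 12 steps), and the 12-step moving average is a straight line because the sine wave averages to zero over a full period.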

### ARMA model family

The ARMA family combines two main types of models: the AR (AutoRegressive) model and the MA (Moving Average) model.

1.  **AutoRegressive (AR) Model**
    
    The AutoRegressive model specifies that the output variable depends linearly on its own previous values.
    
    $X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \epsilon_t$
    
    - $X_t$ is the value of the time series at time t.
    - c is a constant.
    - $\phi_1, \phi_2, \ldots, \phi_p$ are the parameters of the model, one per lag, where p is the order of the model.
    - $\epsilon_t$ is the white noise error term at time t, typically assumed to be normally distributed with mean 0 and variance $\sigma^2$.
2. **Moving Average (MA) Model**
    
    The Moving Average model specifies that the output variable depends linearly on the current and past values of a stochastic (white noise) term.
    
    $X_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q}$
    
    - $X_t$ is the value of the time series at time t.
    - c is a constant.
    - $\epsilon_t, \epsilon_{t-1}, \ldots, \epsilon_{t-q}$ are white noise error terms.
    - $\theta_1, \theta_2, \ldots, \theta_q$ are the parameters of the model, one per lagged error term, where q is the order of the model.

Combining the AR and MA models gives the ARMA(p, q) model:

$X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q}$

- $X_t$ is the value of the time series at time t.
- c is a constant.
- $\phi_1, \phi_2, \ldots, \phi_p$ are the parameters of the AR part of the model.
- $\theta_1, \theta_2, \ldots, \theta_q$ are the parameters of the MA part of the model.
- $\epsilon_t$ is the white noise term at time t.
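To see how the ARMA equation above generates data, the following sketch simulates it directly. The function `simulate_arma`, the coefficient values, and the convention that values before the start of the series are treated as zero are all assumptions for illustration; this is not a fitting procedure.

```python
import numpy as np

def simulate_arma(phi, theta, c=0.0, sigma=1.0, n=500, seed=0):
    """Simulate X_t = c + sum_i phi_i X_{t-i} + eps_t + sum_j theta_j eps_{t-j}.

    Terms that would reach before t = 0 are simply dropped (treated as zero),
    which is an assumption made to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    eps = rng.normal(0.0, sigma, size=n)        # white noise with variance sigma^2
    x = np.zeros(n)
    for t in range(n):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x[t] = c + ar + eps[t] + ma
    return x, eps

# ARMA(1, 1) with illustrative coefficients.
x_arma, eps_arma = simulate_arma(phi=[0.5], theta=[0.3], c=1.0, n=50)

# Pure AR(2) and pure MA(1) are special cases with an empty coefficient list.
x_ar, _ = simulate_arma(phi=[0.6, -0.2], theta=[], n=50)
x_ma, eps_ma = simulate_arma(phi=[], theta=[0.4], n=50)
```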

**Differencing + ARMA = ARIMA (AutoRegressive Integrated Moving Average) model**

**Seasonality + ARIMA = SARIMA (Seasonal AutoRegressive Integrated Moving Average) model**
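As a sketch of what "integrated" means, assume first-order differencing (the symbols $X'_t$ and $s$ are introduced here for illustration). The ARMA model is then applied to the differenced series rather than to $X_t$ itself:

$X'_t = X_t - X_{t-1}$

$X'_t = c + \phi_1 X'_{t-1} + \cdots + \phi_p X'_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$

Forecasts of $X'_t$ are summed (integrated) to get back to the scale of $X_t$. For a series with seasonal period $s$, the seasonal counterpart of the first equation is $X_t - X_{t-s}$, which is the differencing step a seasonal model such as SARIMA builds on.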

### LSTM

An LSTM (Long Short-Term Memory) network is a special kind of RNN, capable of learning long-term dependencies.

<div>
  <img src="Images/LSTM.png" alt="LSTM" style="width: 500px; height: 300px;">
</div>
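One step of an LSTM cell can be sketched in NumPy as follows. This uses a common formulation with forget, input, and output gates plus a candidate cell state. Stacking the four gates into one weight matrix `W`, the sizes, and the random weights are assumptions for illustration, not a description of any particular library's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x_t] to four stacked gate pre-activations."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * n:1 * n])     # forget gate: how much of the old cell state to keep
    i = sigmoid(z[1 * n:2 * n])     # input gate: how much of the candidate to write
    g = np.tanh(z[2 * n:3 * n])     # candidate cell state
    o = sigmoid(z[3 * n:4 * n])     # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g        # long-term state
    h_t = o * np.tanh(c_t)          # short-term state, also the output
    return h_t, c_t

rng = np.random.default_rng(0)
n_inputs, n_units, n_steps = 3, 4, 6            # illustrative sizes (assumed)
W = rng.normal(scale=0.1, size=(4 * n_units, n_units + n_inputs))
b = np.zeros(4 * n_units)

h = np.zeros(n_units)
c = np.zeros(n_units)
for x_t in rng.normal(size=(n_steps, n_inputs)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because the cell state `c_t` is updated additively (scaled by the forget gate) rather than being squashed through an activation at every step, information can persist across many time steps.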

Variants of LSTM:

- peephole connections
- coupled forget and input gates
- GRU (Gated Recurrent Unit)
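For comparison with the LSTM, a single GRU step can be sketched as below. The GRU merges the cell state and hidden state into one vector and uses an update gate and a reset gate. The function name `gru_step`, the separate weight matrices, and the sizes are assumptions for this example; conventions for which side of the update gate multiplies the old state differ between references.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU step with update gate z and reset gate r."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx + b_z)                 # update gate
    r = sigmoid(W_r @ hx + b_r)                 # reset gate
    # Candidate state: the reset gate decides how much of h_prev feeds into it.
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]) + b_h)
    # Blend the old state and the candidate with a single gate.
    return (1.0 - z) * h_prev + z * h_cand

rng = np.random.default_rng(0)
n_inputs, n_units = 3, 4                        # illustrative sizes (assumed)
shape = (n_units, n_units + n_inputs)
W_z, W_r, W_h = (rng.normal(scale=0.1, size=shape) for _ in range(3))
b_z = b_r = b_h = np.zeros(n_units)

h = np.zeros(n_units)
for x_t in rng.normal(size=(5, n_inputs)):
    h = gru_step(x_t, h, W_z, W_r, W_h, b_z, b_r, b_h)
print(h.shape)  # (4,)
```

Using `1 - z` and `z` as the two blending weights plays the same role as coupling the forget and input gates in an LSTM: the state only forgets as much as it is about to write.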

An excellent blog post for understanding LSTMs: [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)