### Sequence Modelling

The need for RNNs arises from the fact that feedforward neural networks (FNNs) are unable to capture time-based dependencies in data.

RNNs are used frequently, and have produced strong results, in tasks such as NLP, machine translation and algorithmic trading.

In an RNN, each output depends on the previous computations, which allows the network to capture dependencies in sequences (e.g. in language, the next word depends on the previous words).



To model sequences, we need to:
 1. Handle variable-length sequences
 2. Track long-term dependencies
 3. Maintain information about order
 4. Share parameters across the sequence

Today: Recurrent Neural Networks (RNNs) as an approach to sequence modeling problems

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

A recurrent neural network (RNN) is used for sequential data problems. It iterates through the sequence elements and maintains a state containing information about what it has seen so far.

In effect, an RNN is a type of neural network that has an internal state.

![title](rnn.jpg)

Notation for the RNN: <br/>
1. $x_{t}$ is the input at time step $t$. For example, with time steps counted from zero, $x_{1}$ could be a one-hot vector corresponding to the second word of a sentence.
2. $s_{t}$ is the hidden state at time step $t$. It’s the “memory” of the network. $s_{t}$ is calculated from the previous hidden state and the input at the current step: $s_{t} = f(U x_{t} + W s_{t-1})$. The function $f$ is usually a nonlinearity such as tanh or ReLU. $s_{-1}$, which is required to calculate the first hidden state, is typically initialized to all zeros.
3. $o_{t}$ is the output at step $t$. For example, if we wanted to predict the next word in a sentence, it would be a vector of probabilities across our vocabulary: <br/>
$o_{t} = \mathrm{softmax}(V s_{t})$.
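To make the two formulas concrete, here is a minimal sketch of a single time step in NumPy. The sizes, the random weights and the `word_id` index are assumptions made up for illustration, and tanh is picked as the nonlinearity $f$.

```python
import numpy as np

rng = np.random.default_rng(4)
vocab_size, hidden_size = 5, 3  # toy sizes, not taken from the text

# Randomly initialised weights (illustrative only)
U = rng.normal(scale=0.5, size=(hidden_size, vocab_size))
W = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
V = rng.normal(scale=0.5, size=(vocab_size, hidden_size))

word_id = 2                         # hypothetical index of the current word
x_t = np.eye(vocab_size)[word_id]   # one-hot input vector
s_prev = np.zeros(hidden_size)      # initial hidden state, all zeros

s_t = np.tanh(U @ x_t + W @ s_prev)  # s_t = f(U x_t + W s_{t-1})
logits = V @ s_t
o_t = np.exp(logits - logits.max())
o_t /= o_t.sum()                     # o_t = softmax(V s_t)

# With a one-hot input, U @ x_t simply selects one column of U
print(np.allclose(U @ x_t, U[:, word_id]))
print(o_t.shape, round(float(o_t.sum()), 6))
```

Because `o_t` comes out of a softmax, its entries are positive and sum to one, so it can be read as a distribution over the vocabulary.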

RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations.

Unfolding (or unrolling) the network means writing it out for the full sequence.

For example, if the sequence we care about is a sentence of 5 words, the network would be unrolled into a 5-layer neural network, one layer for each word. 
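The 5-word case can be sketched as follows: a loop over the sequence and an explicitly unrolled chain of five "layers" should give the same final state, since both reuse the same weights. The sizes, the random weights and the helper `step` are illustrative assumptions, with tanh as the nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, vocab_size, num_words = 4, 6, 5  # toy sizes chosen for illustration

# Shared parameters, scaled down so tanh does not saturate
U = rng.normal(scale=0.5, size=(hidden_size, vocab_size))
W = rng.normal(scale=0.5, size=(hidden_size, hidden_size))

# A "sentence" of 5 one-hot word vectors
word_ids = rng.integers(0, vocab_size, size=num_words)
xs = np.eye(vocab_size)[word_ids]

def step(s_prev, x_t):
    """One recurrence step: s_t = tanh(U x_t + W s_{t-1})."""
    return np.tanh(U @ x_t + W @ s_prev)

# Rolled form: a loop over the sequence
s = np.zeros(hidden_size)
for x_t in xs:
    s = step(s, x_t)
s_loop = s

# Unrolled form: five explicit "layers", one per word, all reusing U and W
s0 = np.zeros(hidden_size)
s_unrolled = step(step(step(step(step(s0, xs[0]), xs[1]), xs[2]), xs[3]), xs[4])

print(np.allclose(s_loop, s_unrolled))  # True
```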

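U, W and V remain the same at each step. Sharing them keeps the number of parameters independent of the sequence length. Even so, an RNN is typically more memory-intensive and needs to be trained for longer than an FNN, because the steps must be processed in order and the intermediate states kept for backpropagation.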

Reason for using the same weights at each step: separate parameters for each step cannot generalize to sequence lengths that were not encountered during training.
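A small sketch of this argument, under made-up sizes: `run_shared` reuses one weight pair at every step and therefore accepts any sequence length, while the hypothetical `run_separate` holds one weight pair per position and has nothing to apply beyond the longest length it was built for. Neither function is trained here; the point is only the parameter bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 4  # illustrative sizes

# Shared parameters: one U and one W, whatever the sequence length
U = rng.normal(scale=0.5, size=(hidden_size, input_size))
W = rng.normal(scale=0.5, size=(hidden_size, hidden_size))

def run_shared(xs):
    """Apply the same U and W at every time step."""
    s = np.zeros(hidden_size)
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)
    return s

# Hypothetical alternative: a separate (U_t, W_t) pair per position,
# created for a maximum "training" length of 4
max_train_len = 4
per_step = [(rng.normal(size=U.shape), rng.normal(size=W.shape))
            for _ in range(max_train_len)]

def run_separate(xs):
    """Use position-specific weights; undefined beyond max_train_len."""
    if len(xs) > len(per_step):
        raise ValueError("no parameters for positions beyond the training length")
    s = np.zeros(hidden_size)
    for (U_t, W_t), x_t in zip(per_step, xs):
        s = np.tanh(U_t @ x_t + W_t @ s)
    return s

short_seq = rng.random((3, input_size))
long_seq = rng.random((10, input_size))

shared_params = U.size + W.size
separate_params = sum(u.size + w.size for u, w in per_step)
print(shared_params, separate_params)  # 28 112
print(run_shared(short_seq).shape, run_shared(long_seq).shape)
```

Calling `run_separate(long_seq)` raises a `ValueError`, whereas `run_shared` handles the 10-step sequence with the same 28 parameters.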

Intuitively, we can think of this as having a sequence ($x_{1}$, $x_{2}$, ..., $x_{t}$) for which we are trying to find $P(x_{t+1} \mid x_{t}, x_{t-1}, x_{t-2}, \ldots, x_{1})$.
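This conditional is the building block of the chain rule of probability, which factorizes the probability of a whole sequence of length $T$ (first line; for $t = 1$ the factor is simply $P(x_{1})$). The second line is a sketch of how the RNN output could stand in for each factor, assuming $o_{t}$ is trained to predict the next element.

```latex
P(x_{1}, \ldots, x_{T}) = \prod_{t=1}^{T} P(x_{t} \mid x_{t-1}, \ldots, x_{1})

P(x_{t+1} \mid x_{t}, \ldots, x_{1}) \approx o_{t} = \mathrm{softmax}(V s_{t})
```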


RNN model topologies:
1. One-to-One (one input, one output)
2. One-to-Many (one input, many outputs)
3. Many-to-One (many inputs, one output)
4. Many-to-Many (multiple inputs and outputs, where number of inputs = number of outputs)
5. Many-to-Many (multiple inputs and outputs, where number of inputs != number of outputs)
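Two of these topologies can be sketched from the same recurrence by changing only which states are read out. This is a rough illustration with made-up sizes and random weights: many-to-one reads an output from the final state only, and many-to-many with equal lengths reads one output per time step. The helpers `softmax` and `hidden_states` are defined here for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
timesteps, input_size, hidden_size, output_size = 6, 3, 5, 2  # illustrative sizes

U = rng.normal(scale=0.5, size=(hidden_size, input_size))
W = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
V = rng.normal(scale=0.5, size=(output_size, hidden_size))

def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def hidden_states(xs):
    """Return the hidden state after every time step, shape (len(xs), hidden_size)."""
    s = np.zeros(hidden_size)
    states = []
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)
        states.append(s)
    return np.stack(states)

xs = rng.random((timesteps, input_size))
states = hidden_states(xs)

# Many-to-one: a single output read from the final state (e.g. a sequence label)
many_to_one = softmax(V @ states[-1])

# Many-to-many, equal lengths: one output per time step (e.g. per-word tags)
many_to_many = np.stack([softmax(V @ s_t) for s_t in states])

print(many_to_one.shape)   # (2,)
print(many_to_many.shape)  # (6, 2)
```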

In [6]:
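# Pseudocode for a simple RNN (here the output doubles as the next state):
input_sequence = []  # vector representation of the input, one x_t per time step
s_t = 0  # initial hidden state
for x_t in input_sequence:
    # combine the current input with the previous state
    o_t = activation(dot(U, x_t) + dot(W, s_t) + b)
    s_t = o_t  # the output becomes the state for the next step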

### NumPy Implementation of a Simple RNN

In [30]:
import numpy as np

timesteps = 100 # number of timesteps in the input sequence
input_features = 32 # dimensionality of input feature space
output_features = 64 # dimensionality of output feature space

inputs = np.random.random((timesteps, input_features))

state_t = np.zeros((output_features, ))

U = np.random.random((output_features, input_features))
W = np.random.random((output_features, output_features))
V = np.random.random((output_features, output_features))
b = np.random.random((output_features, ))

In [31]:
from scipy.special import softmax

success_outputs = []

for x_t in inputs:
    state_t = np.tanh(np.dot(U, x_t) + np.dot(W, state_t) + b)
    o_t = softmax(np.dot(V, state_t))
    success_outputs.append(o_t)

In [32]:
final_output_sequence = np.concatenate(success_outputs, axis=0)
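Since each `o_t` is a 1-D vector, `np.concatenate(..., axis=0)` joins them end to end into one flat array of length `timesteps * output_features`. The sketch below uses stand-in probability vectors (not the outputs from the cells above) to compare that layout with `np.stack`, a swapped-in alternative that keeps one row per time step.

```python
import numpy as np

timesteps, output_features = 100, 64  # same sizes as in the cells above
rng = np.random.default_rng(3)

# Stand-in for the per-step outputs: one probability vector per time step
raw = rng.random((timesteps, output_features))
per_step_outputs = [row / row.sum() for row in raw]

flat = np.concatenate(per_step_outputs, axis=0)  # shape (6400,)
grid = np.stack(per_step_outputs, axis=0)        # shape (100, 64)

# The flat array can be reshaped back into one row per time step
recovered = flat.reshape(timesteps, output_features)
print(flat.shape, grid.shape, np.allclose(recovered, grid))
```

With the flat layout, a slice such as `[:100]` spans the whole first time step (64 values) plus part of the second, rather than 100 time steps.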

In [33]:
final_output_sequence[:100]

array([1.44227454e-03, 1.08536523e-03, 4.43459078e-05, 5.37241939e-02,
       8.54148853e-02, 7.83333548e-04, 1.44987756e-02, 3.24604339e-04,
       5.73214573e-03, 5.65427507e-03, 1.04581250e-02, 3.41979746e-05,
       1.76262880e-03, 1.11982628e-04, 1.59859904e-05, 1.08534095e-05,
       8.23672608e-05, 1.60411720e-03, 1.36498807e-02, 2.46639595e-02,
       2.46634069e-04, 5.16599592e-03, 3.42905761e-03, 1.43004569e-03,
       9.78768236e-06, 3.72034768e-04, 1.69641486e-02, 4.86755356e-03,
       8.41974386e-05, 5.60756902e-05, 7.82076218e-03, 6.64289437e-03,
       7.18332788e-03, 1.77527097e-03, 1.63904767e-01, 1.56633307e-03,
       7.30257035e-04, 6.67034392e-03, 3.02139586e-04, 1.02356066e-03,
       3.86993331e-01, 3.10809787e-04, 9.70005012e-04, 2.59502030e-02,
       4.30630990e-04, 6.06291098e-06, 8.33450206e-03, 1.34531544e-03,
       4.24672398e-03, 3.76258316e-04, 1.04474604e-02, 1.15546668e-04,
       5.39744547e-03, 1.35795488e-03, 1.15677659e-03, 7.19408596e-02,
      

In [34]:
inputs



array([[0.36442242, 0.80936581, 0.91212408, ..., 0.7241877 , 0.04158229,
        0.26876221],
       [0.23420591, 0.27569583, 0.40458744, ..., 0.26082021, 0.33589759,
        0.55422131],
       [0.57100382, 0.89665303, 0.82936618, ..., 0.12513132, 0.48662522,
        0.99469631],
       ...,
       [0.51047852, 0.97326324, 0.89371166, ..., 0.88828527, 0.4564248 ,
        0.01728221],
       [0.58678564, 0.95309764, 0.25554656, ..., 0.51015233, 0.1723988 ,
        0.6965287 ],
       [0.732482  , 0.59642324, 0.47269101, ..., 0.7254335 , 0.48181679,
        0.98339069]])