# Recurrent Neural Network (RNN)

1. **Objective** : Translate the English sentence "I go" into the Hindi sentence "मैं जाता हूँ" using a recurrent neural network (RNN).

2. **Process of RNN - Encoding**
    - **Step 1 :** Convert the input tokens into embeddings. In this example, each embedding is a single number. \
    Let x("I") = $x_1$ = 1 \
    Let x("go") = $x_2$ = 2
    - **Step 2 :** Decide the number of hidden layers in the neural network and the number of hidden units in each layer. \
    Let hidden state size = s = 2. Number of layers of the neural network = n = 1
    - **Step 3 :** Initialize the first hidden state $h_0$ based on the hidden size s. It is a matrix of dimensions s x n = 2 x 1: the number of rows is the number of hidden units in the layer, and the number of columns is the number of layers in the neural network. **The matrix is like a neural network standing erect.** In this example, $h_0$ is set to zeros.
    - **Step 4 :** Recurrence relation for the hidden state: \
    $h_t = \tanh(Wh_{t-1} + Ux_t + b)$ \
    where $\tanh()$ is the activation function, W is the weight matrix for the hidden state (s x s), U is the weight matrix for the input (s x 1 here), and b is the bias (s x 1).
    - **Step 5 :** Initialize the weights and biases of the neural network. In practice they are initialized randomly; this example uses the fixed values below.
    $$
    W = \begin{bmatrix}
    0.3 & -0.1 \\
    0 & 0.2
    \end{bmatrix}_{2 \times 2}
    $$

    $$
    U = \begin{bmatrix}
    0.5 \\
    0.7
    \end{bmatrix}_{2 \times 1}
    $$

    $$
    b = \begin{bmatrix}
    0 \\
    0
    \end{bmatrix}_{2 \times 1}
    $$
                  

In [2]:
import numpy as np
# Initialization of the weight and biases for the neural network
W = np.array([[0.30, -0.10], [0, 0.20]])
h0 = np.array([[0.0], [0.0]])
U = np.array([[0.50], [0.70]])
b = np.array([[0.0], [0.0]])
x1 = 1
x2 = 2
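
Step 5 mentions random initialization, while the cell above hard-codes the values. A minimal sketch of what a random initialization could look like, assuming small normally distributed weights and zero biases. The seed, the 0.1 scale, and the `_rand` names are illustrative choices, not part of the walkthrough:

```python
import numpy as np

s = 2          # hidden state size, as in Step 2
input_dim = 1  # each token embedding is a single number in this example

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily, for reproducibility

# Small random weights; the 0.1 scale is an assumption
W_rand = rng.normal(loc=0.0, scale=0.1, size=(s, s))
U_rand = rng.normal(loc=0.0, scale=0.1, size=(s, input_dim))
b_rand = np.zeros((s, 1))   # biases started at zero (an assumption)
h0_rand = np.zeros((s, 1))  # initial hidden state

print(W_rand.shape, U_rand.shape, b_rand.shape, h0_rand.shape)
```

The shapes match the fixed matrices used in the rest of the example, so either set of parameters could be plugged into the recurrence from Step 4.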

### Encoding

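- **Step 6 :** Encoding - Calculate the hidden states $h_1$ and $h_2$ using the formula in Step 4, starting from $h_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

$h_1 = \tanh(Wh_0 + Ux_1 + b) = \tanh\left(\begin{bmatrix} 0.5 \\ 0.7 \end{bmatrix}\right)$

$$
    h_1 \approx \begin{bmatrix}
    0.462 \\
    0.604
    \end{bmatrix}
$$

$h_2 = \tanh(Wh_1 + Ux_2 + b) \approx \tanh\left(\begin{bmatrix} 1.078 \\ 1.521 \end{bmatrix}\right)$

$$
    h_2 \approx \begin{bmatrix}
    0.793 \\
    0.909
    \end{bmatrix}
$$


In [3]:
# Apply the recurrence from Step 4: matrix product, input term, bias, then tanh element-wise
h1 = np.tanh(np.matmul(W, h0) + U * x1 + b)
h2 = np.tanh(np.matmul(W, h1) + U * x2 + b)

# Round to 3 decimals for display only
print("h1:\n", np.round(h1, 3))
print("h2:\n", np.round(h2, 3))

h1:
 [[0.462]
 [0.604]]
h2:
 [[0.793]
 [0.909]]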
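
The two hidden-state updates above are the same recurrence applied once per token, so they can be written as a loop. The sketch below assumes the formula from Step 4, including the $\tanh$ activation; the helper name `encode` is hypothetical:

```python
import numpy as np

def encode(inputs, W, U, b, h0):
    """Apply h_t = tanh(W h_{t-1} + U x_t + b) to each input in order.

    Returns the list [h_1, h_2, ...] of hidden states.
    """
    h = h0
    states = []
    for x_t in inputs:
        h = np.tanh(W @ h + U * x_t + b)
        states.append(h)
    return states

# Same parameters as in the walkthrough
W = np.array([[0.30, -0.10], [0.0, 0.20]])
U = np.array([[0.50], [0.70]])
b = np.zeros((2, 1))
h0 = np.zeros((2, 1))

states = encode([1, 2], W, U, b, h0)  # x1 = 1 ("I"), x2 = 2 ("go")
for t, h in enumerate(states, start=1):
    print(f"h{t}:\n{h}")
```

Writing the encoder this way makes it clear that the same `W`, `U`, and `b` are reused at every time step, whatever the sentence length.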


### Decoding

- **Step 7 :** Create embeddings for the Hindi words.
  - Let y(Go) = $y_1$ = 0.5, y("मैं") = $y_2$ = 1, y("जाता") = $y_3$ = 1.1, y("हूँ") = $y_4$ = 0.9, y(EOS) = $y_5$ = 0.0
- **Step 8 :** The first decoder hidden state $S_0$ is a copy of the last hidden state of the encoder: $S_0 = h_2$, the value computed in Step 6.
- **Step 9 :** Initialize the output weights $V$ and the output bias $c$
- **Step 10 :** Calculate the output using the formula $\hat{y} = Vh_2 + c$
- **Step 11 :** Calculate the loss using the formula $L = -\sum_{i} y_i \log(\hat{y}_i)$, where $y_i$ is the target and $\hat{y}_i$ is the prediction for output $i$
- **Step 12 :** Update the weights and biases using the formula $W = W - \alpha \frac{\partial L}{\partial W}$, where $\alpha$ is the learning rate. The other parameters are updated the same way.
- **Step 13 :** Repeat steps 6 to 12 for all the sentences in the training data
- **Step 14 :** Predict the output for the test data
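
The output, loss, and update steps above can be sketched for a single prediction. This is only an illustration under several assumptions that are not in the walkthrough: the values of `V` and `c` are random placeholders, a softmax turns the output into probabilities so that the logarithm in the loss is defined, the target is the one-hot vector for "मैं", the learning rate is 0.1, and only `V` and `c` are updated (no backpropagation through the encoder):

```python
import numpy as np

# Encoder state h_2, recomputed from the Step 4 recurrence with the example weights
W = np.array([[0.30, -0.10], [0.0, 0.20]])
U = np.array([[0.50], [0.70]])
h1 = np.tanh(U * 1)           # h_0 and b are zero, so W h_0 + b vanishes
h2 = np.tanh(W @ h1 + U * 2)

vocab = ["Go", "मैं", "जाता", "हूँ", "EOS"]

# Hypothetical output parameters (assumed values)
rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(len(vocab), 2))
c = np.zeros((len(vocab), 1))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def loss_and_probs(V, c, h, target):
    y_hat = softmax(V @ h + c)               # output, turned into probabilities
    loss = -np.sum(target * np.log(y_hat))   # L = -sum_i y_i log(y_hat_i)
    return loss, y_hat

# One-hot target: the first word to predict is assumed to be "मैं"
target = np.zeros((len(vocab), 1))
target[vocab.index("मैं")] = 1.0

loss_before, y_hat = loss_and_probs(V, c, h2, target)

# For softmax + cross-entropy, dL/d(logits) = y_hat - target
alpha = 0.1
grad_logits = y_hat - target
V = V - alpha * grad_logits @ h2.T   # dL/dV = (y_hat - target) h^T
c = c - alpha * grad_logits          # dL/dc = (y_hat - target)

loss_after, _ = loss_and_probs(V, c, h2, target)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

A full training loop would also propagate the gradient back into `W`, `U`, and `b` and repeat the update over many sentence pairs.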

#### A few observations

1. The hidden state at time t is a function of the hidden state at time t-1, the input at time t and the bias term.
2. W and U are the weight matrices (parameters). They are randomly initialized at the beginning and then trained on the training corpus so that the loss is minimized.
3. b is the bias term. It is also randomly initialized and then trained.
4. $x_t$ is the input at time t. It is the embedding of the token at time t.
5. $h_t$ is the hidden state at time t. It is the memory of the network at time t.
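
One way to see that the hidden state acts as memory is to feed the same two tokens in a different order and compare the final states. A small sketch, assuming the recurrence from Step 4 and the example weights; the helper name `final_state` is hypothetical:

```python
import numpy as np

W = np.array([[0.30, -0.10], [0.0, 0.20]])
U = np.array([[0.50], [0.70]])
b = np.zeros((2, 1))

def final_state(inputs):
    """Return the last hidden state after reading all inputs in order."""
    h = np.zeros((2, 1))
    for x_t in inputs:
        h = np.tanh(W @ h + U * x_t + b)
    return h

h_forward = final_state([1, 2])   # "I", then "go"
h_reversed = final_state([2, 1])  # "go", then "I"

print(np.allclose(h_forward, h_reversed))  # False: the order of the tokens matters
```

Because each $h_t$ depends on $h_{t-1}$, the final state summarizes the sequence rather than just the set of tokens.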