# Recurrent Neural Networks

---

## NLP - Natural Language Processing:
- Represent the total *vocabulary* as an ordered array, the *dictionary*
- Represent each word in the *dictionary* as a **one-hot encoded** vector based on the position of that word in the *dictionary*
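
A minimal sketch of the one-hot representation described above. The five-word `dictionary` and the helper name `one_hot` are hypothetical and only serve as an illustration; a real dictionary would hold thousands of words.

```python
import numpy as np

# Hypothetical toy dictionary (ordered), used only for illustration.
dictionary = ["a", "cat", "dog", "sat", "the"]
word_to_index = {word: i for i, word in enumerate(dictionary)}

def one_hot(word, word_to_index):
    """Return a vector of zeros with a single 1 at the word's dictionary position."""
    vec = np.zeros(len(word_to_index))
    vec[word_to_index[word]] = 1.0
    return vec

x = one_hot("dog", word_to_index)  # "dog" sits at position 2 of the dictionary
```

The vector length equals the dictionary size, so each word is identified purely by where the single 1 sits.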

---

## Unidirectional RNN Model:
- Represent input and output as a sequence: $x^{<1>}, x^{<2>}, \ldots, x^{<T_x>} \rightarrow y^{<1>}, y^{<2>}, \ldots, y^{<T_y>}$
- The length of the input sequence is denoted $T_x$, and the length of the output sequence $T_y$
- Feed each input into the network in turn to predict $\hat{y}^{<t>}$. Information from previous inputs is passed to the current time step as the activation $a^{<t-1>}$.
$$a^{<0>} \rightarrow \downarrow x^{<1>} \begin{bmatrix} 0\\0\\0\\0\end{bmatrix} \uparrow \hat{y}^{<1>} \rightarrow a^{<1>} \rightarrow \downarrow x^{<2>} \begin{bmatrix} 0\\0\\0\\0\end{bmatrix} \uparrow \hat{y}^{<2>} \rightarrow \ldots \rightarrow a^{<T_x-1>} \rightarrow \downarrow x^{<T_x>} \begin{bmatrix} 0\\0\\0\\0\end{bmatrix} \uparrow \hat{y}^{<T_y>}$$

- $x^{<1>}$ and $a^{<0>}$ are used to calculate $\hat{y}^{<1>}$ 
- $x^{<2>}$ and $a^{<1>}$ are used to calculate $\hat{y}^{<2>}$ 
- $x^{<T_x>}$ and $a^{<T_x-1>}$ are used to calculate $\hat{y}^{<T_y>}$

## Forward Propagation:
$$ a^{<t>} = g(W_{aa}a^{<t-1>} + W_{ax}x^{<t>} + b_a) $$
  
$$ \hat{y}^{<t>} = g(W_{ya}a^{<t>} + b_y) $$
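
The two equations above can be sketched as a loop over time steps. This is a minimal sketch under stated assumptions, not a definitive implementation: the notes use a generic $g$ for both equations, while the code assumes `tanh` for the activation and softmax for the output. The function names, layer sizes, and random weights are hypothetical.

```python
import numpy as np

def softmax(z):
    """Column-wise softmax, shifted by the max for numerical stability."""
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def rnn_forward(xs, a0, Waa, Wax, Wya, ba, by):
    """Run the forward pass over a list of column-vector inputs."""
    a = a0
    activations, predictions = [], []
    for x in xs:
        # a<t> = g(Waa a<t-1> + Wax x<t> + ba), with g assumed to be tanh
        a = np.tanh(Waa @ a + Wax @ x + ba)
        # y_hat<t> = g(Wya a<t> + by), with g assumed to be softmax
        y_hat = softmax(Wya @ a + by)
        activations.append(a)
        predictions.append(y_hat)
    return activations, predictions

# Hypothetical sizes: 5-word dictionary, 4 hidden units, 3 time steps.
n_x, n_a, n_y, T_x = 5, 4, 5, 3
rng = np.random.default_rng(0)
Waa = rng.standard_normal((n_a, n_a)) * 0.1
Wax = rng.standard_normal((n_a, n_x)) * 0.1
Wya = rng.standard_normal((n_y, n_a)) * 0.1
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))
a0 = np.zeros((n_a, 1))                         # initial activation a<0>
xs = [np.eye(n_x)[:, [t]] for t in range(T_x)]  # one-hot column vectors

activations, predictions = rnn_forward(xs, a0, Waa, Wax, Wya, ba, by)
```

The same weight matrices are reused at every time step; only the activation `a` changes as the sequence is consumed.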

## Backpropagation:
- Cross-entropy loss function, summed over all time steps:

$$L(\hat{y}, y) = \sum_{t=1}^{T_y} L^{<t>}(\hat{y}^{<t>}, y^{<t>})$$
where:
$$ L^{<t>}(\hat{y}^{<t>}, y^{<t>}) = -y^{<t>}\log \hat{y}^{<t>} - (1-y^{<t>}) \log (1-\hat{y}^{<t>}) $$
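
The loss above can be sketched for scalar predictions as follows. The helper names `step_loss` and `sequence_loss`, the clipping constant `eps`, and the sample values are assumptions added for illustration; the clipping only guards against `log(0)` and is not part of the formula.

```python
import numpy as np

def step_loss(y_hat, y, eps=1e-12):
    """Binary cross-entropy for a single time step."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # assumption: avoid log(0)
    return float(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

def sequence_loss(y_hats, ys):
    """Sum the per-step losses over all T_y time steps."""
    return sum(step_loss(y_hat, y) for y_hat, y in zip(y_hats, ys))

# Hypothetical predictions and labels for T_y = 3.
y_hats = [0.9, 0.2, 0.7]
ys = [1, 0, 1]
total = sequence_loss(y_hats, ys)
```

Backpropagation through time differentiates this summed loss with respect to the shared weights, so every time step contributes to each gradient.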

---

## Different Types of RNN Architectures:
- One-to-One (Generic NN)
- One-to-Many (Generation, e.g. music generation)
    - One input with many outputs, where each output is fed back into the network as an *input*
- Many-to-One (Classification)
- Many-to-Many 
    - Same length for Input and Output
    - Different lengths for Input and Output (e.g. language translation), built from an Encoder part followed by a Decoder part
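
As an illustration of the one-to-many case, where each output is fed back in as the next input, here is a minimal sketch. It assumes the output and input share the same dictionary size, uses `tanh` for the activation, and picks the highest-scoring index greedily; real generators usually sample from the output distribution instead. The function name `generate` and all sizes and weights are hypothetical.

```python
import numpy as np

def generate(x0, a0, Waa, Wax, Wya, ba, by, steps):
    """Produce `steps` output indices from a single starting input."""
    x, a = x0, a0
    outputs = []
    for _ in range(steps):
        a = np.tanh(Waa @ a + Wax @ x + ba)
        scores = Wya @ a + by
        idx = int(np.argmax(scores))  # assumption: greedy choice, no sampling
        outputs.append(idx)
        # Feed the chosen output back in as the next one-hot input.
        x = np.zeros_like(x0)
        x[idx] = 1.0
    return outputs

# Hypothetical sizes: 5 possible tokens, 4 hidden units.
n, n_a = 5, 4
rng = np.random.default_rng(1)
Waa = rng.standard_normal((n_a, n_a)) * 0.1
Wax = rng.standard_normal((n_a, n)) * 0.1
Wya = rng.standard_normal((n, n_a)) * 0.1
ba, by = np.zeros((n_a, 1)), np.zeros((n, 1))
x0 = np.zeros((n, 1))
x0[0] = 1.0                      # single starting input
a0 = np.zeros((n_a, 1))

outputs = generate(x0, a0, Waa, Wax, Wya, ba, by, steps=6)
```

A many-to-one model would run the same recurrence but read only the prediction from the final time step.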