## Clarification of A Recurrent Neuron and A Layer of Recurrent Neurons
Figures `15-1` and `15-2` indicate each quantity's dimension through typography: a symbol that is **not boldface**
denotes a **scalar** (like the $y, y_{(t-3)}\,, y_{(t-2)}\,, y_{(t-1)}\,, y_{(t)}$ in figure `15-1`), while a **boldface** symbol
denotes a **vector** (like the $\mathbf{y}, \mathbf{y}_{(0)}\,, \mathbf{y}_{(1)}\,, \mathbf{y}_{(2)}$ in figure `15-2`).
![](./figs/fig.15-1.png)
![](./figs/fig.15-2.png)


More explicitly:

- a recurrent neuron's output is a scalar
- a layer of recurrent neurons is a group of recurrent neurons working together, and its output is a vector with one component per neuron

## Recurrent Neuron
For a single recurrent neuron, we have

- $y_{(t)} \in \mathbb{R}$ for all time $t$.
- $\mathbf{x}_{(t)} \in \mathbb{R}^{n_{\,\text{inputs}}}\;\;$ for all time $t$.
- A single neuron's parameters are a vector $\mathbf{w_x} \in \mathbb{R}^{n_{\,\text{inputs}}}\;\;$ and two scalars $w_y \in \mathbb{R}, b \in \mathbb{R}$
- The formula connecting all these together is the following: $$y_{(t)} = \mathbf{w_x} \cdot \mathbf{x}_{(t)} + w_y y_{(t-1)} + b$$
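
The recurrence above can be unrolled over a short sequence in a few lines of plain Python. This is only a minimal sketch: the function name `recurrent_neuron`, the parameter values, and the choice $y_{(-1)} = 0$ for the initial output are assumptions made for illustration, and no activation function is applied, matching the formula as written here.

```python
def recurrent_neuron(xs, w_x, w_y, b, y_prev=0.0):
    """Unroll y_(t) = w_x . x_(t) + w_y * y_(t-1) + b over a sequence.

    xs is a list of input vectors (one per time step); y_prev is the
    assumed initial output y_(-1).
    """
    outputs = []
    for x in xs:
        # Dot product of the weight vector with the current input vector.
        dot = sum(w * x_i for w, x_i in zip(w_x, x))
        # The previous scalar output feeds back through the scalar w_y.
        y_prev = dot + w_y * y_prev + b
        outputs.append(y_prev)
    return outputs


# Hypothetical parameters with n_inputs = 2.
w_x = [1.0, 2.0]
w_y = 0.5
b = 1.0

# Three time steps of 2-dimensional inputs.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

ys = recurrent_neuron(xs, w_x, w_y, b)
print(ys)  # one scalar per time step: [2.0, 4.0, 6.0]
```

Each entry of `ys` is a single number, which is exactly the "not boldface" case: one neuron, one scalar output per time step.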

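## A Layer of Recurrent Neurons
For a single instance, we have

- $\mathbf{y}_{(t)} \in \mathbb{R}^{n_{\,\text{neurons}}}\;\;\;$ for all time $t$.
- $\mathbf{x}_{(t)} \in \mathbb{R}^{n_{\,\text{inputs}}}\;\;$ for all time $t$.
- Parameters: matrices $\mathbf{W_x} \in \mathbb{R}^{n_{\,\text{inputs}} \times n_{\,\text{neurons}}}\;\;$ and $\mathbf{W_y} \in \mathbb{R}^{n_{\,\text{neurons}} \times n_{\,\text{neurons}}}\;\;$, and a bias vector $\mathbf{b} \in \mathbb{R}^{n_{\,\text{neurons}}}\;\;$. Each neuron receives the whole previous output vector $\mathbf{y}_{(t-1)}$, which is why the scalar $w_y$ of a single neuron becomes a matrix here. Column $j$ of $\mathbf{W_x}$, column $j$ of $\mathbf{W_y}$ and the entry $b_j$ are the parameters of neuron $j$.
- The formula connecting all these together is the following: $$\mathbf{y}_{(t)} = \mathbf{W_x}^{\top} \mathbf{x}_{(t)} + \mathbf{W_y}^{\top} \mathbf{y}_{(t-1)} + \mathbf{b}$$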
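
As a shape check, here is a small NumPy sketch of one layer unrolled over time. It assumes the usual parameter shapes for a layer of recurrent neurons: an input weight matrix `W_x` of shape `(n_inputs, n_neurons)`, a recurrent weight matrix `W_y` of shape `(n_neurons, n_neurons)`, and a bias vector `b` of shape `(n_neurons,)`. The function name `recurrent_layer`, the sizes, the random values, and the zero initial output are all illustrative assumptions, and no activation function is applied.

```python
import numpy as np


def recurrent_layer(xs, W_x, W_y, b, y_prev=None):
    """Unroll y_(t) = W_x^T x_(t) + W_y^T y_(t-1) + b over a sequence.

    xs has shape (n_steps, n_inputs); the result has shape
    (n_steps, n_neurons).
    """
    if y_prev is None:
        # Assumed initial output y_(-1) = 0.
        y_prev = np.zeros(b.shape[0])
    outputs = []
    for x in xs:
        # Every neuron sees the full input vector and the full previous
        # output vector, hence two matrix-vector products.
        y_prev = W_x.T @ x + W_y.T @ y_prev + b
        outputs.append(y_prev)
    return np.stack(outputs)


# Hypothetical sizes and random parameters.
n_steps, n_inputs, n_neurons = 4, 3, 2
rng = np.random.default_rng(0)
W_x = rng.normal(size=(n_inputs, n_neurons))
W_y = rng.normal(size=(n_neurons, n_neurons))
b = rng.normal(size=n_neurons)
xs = rng.normal(size=(n_steps, n_inputs))

ys = recurrent_layer(xs, W_x, W_y, b)
print(ys.shape)  # (n_steps, n_neurons)
```

With `n_neurons = 1` the matrices collapse to a weight vector and a single recurrent weight, which recovers the single-neuron case.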


In [1]:
import tensorflow.keras as keras

In [5]:
simpleRNN_layer = keras.layers.SimpleRNN(5, input_shape=(28*28,))

In [10]:
simpleRNN_layer.trainable_weights

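[]

The list is empty because the layer has not been built yet: Keras creates a layer's weights only once the layer is actually applied to an input of known shape. Note also that a recurrent layer expects each instance to have shape `(time steps, features)`, so a flat `(28*28,)` input would be rejected as soon as the layer is called on data.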
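
One way to inspect the weights is to call the layer on a dummy batch of shape `(batch, time steps, features)` first. The sketch below assumes a working TensorFlow 2 installation; the sizes `n_steps = 4` and `n_inputs = 3` are arbitrary choices for illustration, not values from these notes. Keras also applies a `tanh` activation by default, which the formulas in these notes leave out.

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical sizes: 4 time steps, 3 features per step, 5 neurons.
n_steps, n_inputs, n_neurons = 4, 3, 5

layer = keras.layers.SimpleRNN(n_neurons)

# Before the layer has seen any input, it has no weights.
n_before = len(layer.trainable_weights)

# Calling the layer on a dummy batch builds it and creates the weights.
outputs = layer(tf.zeros((1, n_steps, n_inputs)))

# Input weights, recurrent weights, bias (in that order).
weight_shapes = [tuple(w.shape) for w in layer.trainable_weights]
print(n_before)       # 0
print(weight_shapes)  # [(3, 5), (5, 5), (5,)]
```

The three shapes are `(n_inputs, n_neurons)`, `(n_neurons, n_neurons)` and `(n_neurons,)`: an input weight matrix, a recurrent weight matrix and a bias vector. By default the layer returns only the last time step's output, so `outputs` has shape `(1, n_neurons)`.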

## Memory Cells
In a more sophisticated setting, there is also something called a **hidden state**, usually denoted $\mathbf{h}_{(t)}\,.$
The common practice is to

- let $\mathbf{h}_{(t)} = f(\mathbf{h}_{(t-1)}\,, \mathbf{x}_{(t)}\,)$ for some function $f$
- let $\mathbf{y}_{(t)} = g(\mathbf{h}_{(t-1)}\,, \mathbf{x}_{(t)}\,)$ for some function $g$.

In what we discussed above (the simplest case), the output $\mathbf{y}_{(t)}$ itself plays the role of the hidden state, so no separate $\mathbf{h}_{(t)}$ appeared. Later in this chapter, we will encounter more sophisticated RNNs whose hidden state is distinct from their output.
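
The split into a state function $f$ and an output function $g$ can be sketched generically. The helper `run_cell` and the toy scalar update `step` below are hypothetical names and values chosen for illustration. Passing the same function as both `f` and `g` reproduces the simple case, where the output and the hidden state coincide.

```python
def run_cell(f, g, h0, xs):
    """Unroll a memory cell over a sequence.

    At each step the output is y_(t) = g(h_(t-1), x_(t)) and the new
    state is h_(t) = f(h_(t-1), x_(t)). Returns all outputs and the
    final state.
    """
    h = h0
    ys = []
    for x in xs:
        y = g(h, x)  # output computed from the previous state
        h = f(h, x)  # state update computed from the same two inputs
        ys.append(y)
    return ys, h


def step(h, x):
    # Toy scalar update with an assumed recurrent weight of 0.5.
    return 0.5 * h + x


xs = [1.0, 2.0, 3.0]

# Simple RNN case: f and g are the same function, so y_(t) == h_(t).
ys, h_final = run_cell(step, step, 0.0, xs)
print(ys)       # [1.0, 2.5, 4.25]
print(h_final)  # 4.25
```

A more sophisticated cell would pass a different `g`, so that what it exposes as output is no longer identical to what it remembers.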