# Deep learning architectures with TensorFlow

This notebook provides an introduction to various deep learning architectures in TensorFlow. Each section includes explanations and code examples to help us understand and implement these models.

In [1]:
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN, LSTM, GRU, Bidirectional, Input, Concatenate
from tensorflow.keras.layers import Conv2D, MaxPooling2D, AveragePooling2D, DepthwiseConv2D, SeparableConv2D, Flatten
from tensorflow.keras.utils import to_categorical
import numpy as np
from sklearn.model_selection import train_test_split

### Feedforward neural networks (FFNN)
Feedforward neural networks are the simplest type of artificial neural network architecture. Information flows in one direction, from input to output, without any cycles or loops. Key components:
- Layers: Consist of an input layer, one or more hidden layers, and an output layer.
- Activation functions: Introduce non-linearities to the model.

In [2]:
# Generate synthetic data (random features and random binary labels, so accuracy near 0.5 is expected)
np.random.seed(42)
X_ffnn = np.random.rand(1000, 20)
y_ffnn = np.random.randint(2, size=1000)

# Split the data
X_train_ffnn, X_test_ffnn, y_train_ffnn, y_test_ffnn = train_test_split(X_ffnn, y_ffnn, test_size=0.2, random_state=42)

# Define the FFNN model
model_ffnn = Sequential()
model_ffnn.add(Dense(64, activation='relu', input_shape=(X_train_ffnn.shape[1],)))
model_ffnn.add(Dense(32, activation='relu'))
model_ffnn.add(Dense(1, activation='sigmoid'))

# Compile the model
model_ffnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Fit the model
model_ffnn.fit(X_train_ffnn, y_train_ffnn, epochs=10, batch_size=32, validation_split=0.1)

# Evaluate the model
loss_ffnn, accuracy_ffnn = model_ffnn.evaluate(X_test_ffnn, y_test_ffnn)
print(f"FFNN accuracy: {accuracy_ffnn}")

# Predict
predictions_ffnn = model_ffnn.predict(X_test_ffnn)
print(f"FFNN predictions: {predictions_ffnn[:5]}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
FFNN accuracy: 0.5099999904632568
FFNN predictions: [[0.49172392]
 [0.51922053]
 [0.5610242 ]
 [0.5892413 ]
 [0.6545411 ]]


**FFNN model syntax**:

In TensorFlow, defining a neural network model involves specifying the architecture of layers in a sequential manner. 
- **Defining a sequential model** (`Sequential()`): The sequential model in TensorFlow's Keras API is a linear stack of layers: each layer is added one after the other, and the output of one layer is the input to the next. It's a straightforward way to build a model when the architecture can be described as a simple chain of layers.
- **Adding layers to the model**: Each neural network layer is added to the model sequentially using the `add()` method. The most common type of layer in FFNNs is the dense layer. The general syntax:
    
    ```python
    model.add(Dense(units, activation=activation_function, input_shape=input_shape))
    ```
    
    - **Dense layers:** The `Dense` layer is a fully connected layer, meaning each neuron in this layer receives input from all the neurons in the previous layer. In FFNNs, Dense layers are the building blocks that connect layers together.
        - **Units**: The number of neurons in the layer.
        - **Activation functions** (`activation`): The activation function that determines how the outputs of the layer are transformed before being passed to the next layer. Common activation functions include `'relu'` for hidden layers and `'sigmoid'` or `'softmax'` for output layers.
        - **Input shape** (`input_shape=(n,)`): Defines the shape of the input data. It is only required for the first layer in a sequential model so the model knows what kind of input to expect. `n` is the number of features in the input data. Here, it's `(20,)` indicating that each input sample has 20 features.
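
To make the role of `units`, `activation`, and the input shape concrete, here is a minimal NumPy sketch of what a stack of dense layers computes. It is an illustration under stated assumptions, not Keras's implementation: the helper `dense` and the randomly initialised weights are hypothetical stand-ins for a trained model with the same 20 → 64 → 32 → 1 layout as the example above.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, W, b, activation):
    """One fully connected layer: activation(x @ W + b)."""
    return activation(x @ W + b)

rng = np.random.default_rng(0)
layer_sizes = [20, 64, 32, 1]      # input features, two hidden layers, one output
activations = [relu, relu, sigmoid]

# Random weights stand in for trained ones (illustration only).
params = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

x = rng.random((5, 20))            # a batch of 5 samples with 20 features each
out = x
for (W, b), act in zip(params, activations):
    out = dense(out, W, b, act)

# Each layer has n_in * n_out weights plus n_out biases.
n_params = sum(W.size + b.size for W, b in params)
print(out.shape)    # (5, 1)
print(n_params)     # 3457
```

Each layer's weight matrix has shape `(inputs, units)`, which is why only the first layer needs to be told the input size: every later layer can infer it from the `units` of the layer before it.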


**Understanding the FFNN model architecture**:
An FFNN typically consists of multiple layers:
- **Input layer:** The first layer that takes in the data. In TensorFlow, this is implicitly defined by the `input_shape` argument in the first `Dense` layer.
- **Hidden layers:** Intermediate layers where the learning happens. The number of hidden layers and the number of neurons in each layer can be adjusted depending on the complexity of the problem. Each hidden layer is a dense layer, where each neuron receives input from all the neurons in the previous layer.
- **Output layer:** The final layer that produces the network's output. The output layer is also a dense layer; its number of neurons and its activation function depend on the nature of the task:
  - **Regression:** Often uses a single neuron with no activation function or a linear activation function.
  - **Binary classification:** Typically uses a single neuron with a `sigmoid` activation function.
  - **Multi-class classification:** Uses multiple neurons (equal to the number of classes) with a `softmax` activation function.
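
The three output-layer choices above can be compared on a few hand-picked values. This is a small NumPy sketch; the input values (`logits_binary`, `logits_multi`) are made up for illustration and do not come from the model in this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical pre-activation values of the output layer for a few samples.
logits_binary = np.array([[-2.0], [0.0], [3.0]])               # one output neuron
logits_multi = np.array([[1.0, 2.0, 0.5], [0.0, 0.0, 0.0]])    # three output neurons

regression_out = logits_binary           # linear activation: values pass through unchanged
binary_out = sigmoid(logits_binary)      # one probability per sample, in (0, 1)
multi_out = softmax(logits_multi)        # one probability per class, each row sums to 1

print(binary_out.ravel())
print(multi_out.sum(axis=1))
print(multi_out.argmax(axis=1))          # predicted class index per sample
```

A sigmoid output of exactly 0.5 corresponds to a pre-activation of 0, and a softmax row with equal inputs assigns every class the same probability.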
  

### Recurrent neural networks (RNN)
Recurrent neural networks are designed to recognize patterns in sequences of data, such as time series or natural language. Key components:
- Recurrent layers: Process sequences by maintaining a hidden state that is updated at each time step.

In [3]:
# Generate synthetic sequential data (labels are random, so accuracy near 1/3 is expected)
timesteps = 10
input_dim = 8
num_classes = 3

X_rnn = np.random.rand(1000, timesteps, input_dim)
y_rnn = np.random.randint(num_classes, size=1000)

# One-hot encode the labels
y_rnn = to_categorical(y_rnn, num_classes)

# Split the data
X_train_rnn, X_test_rnn, y_train_rnn, y_test_rnn = train_test_split(X_rnn, y_rnn, test_size=0.2, random_state=42)

# Define the RNN model
model_rnn = Sequential()
model_rnn.add(SimpleRNN(64, activation='tanh', input_shape=(timesteps, input_dim)))
model_rnn.add(Dense(16, activation='relu'))
model_rnn.add(Dense(num_classes, activation='softmax'))

# Compile the model
model_rnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model_rnn.fit(X_train_rnn, y_train_rnn, epochs=10, batch_size=32, validation_split=0.1)

# Evaluate the model
loss_rnn, accuracy_rnn = model_rnn.evaluate(X_test_rnn, y_test_rnn)
print(f"RNN accuracy: {accuracy_rnn}")

# Predict
predictions_rnn = model_rnn.predict(X_test_rnn)
print(f"RNN predictions: {predictions_rnn[:5]}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
RNN accuracy: 0.36000001430511475
RNN predictions: [[0.24902198 0.27167076 0.4793073 ]
 [0.59757    0.15059231 0.25183764]
 [0.39393646 0.3676065  0.23845702]
 [0.28344953 0.54798806 0.16856241]
 [0.29030985 0.32991418 0.37977606]]


**RNN model syntax:**

In TensorFlow, defining an RNN involves specifying the architecture with layers that can handle sequences of data, unlike the FFNN which only considers independent data points.

- **Adding an RNN layer:** The RNN layer is designed to process sequences of data, where each input data point depends on the previous ones. In TensorFlow, this is achieved using the `SimpleRNN` layer. The general syntax:

    ```python
    model.add(SimpleRNN(units, activation=activation_function, input_shape=(timesteps, input_dim)))
    ```

    - **RNN layer**: The `SimpleRNN()` layer defines a basic RNN cell.
        - **Units**: The number of units (or neurons) in the RNN cell. This determines the dimensionality of the output space, which is the hidden state of the RNN.
        - **Activation**: The activation function for the RNN cell, typically `'tanh'` or `'relu'`. `tanh` is the usual choice because it keeps the hidden state bounded between -1 and 1, which helps stop values from growing without limit as the state is fed back at every time step.
        - **Input shape** (`input_shape=(timesteps, input_dim)`): Defines the shape of the input data.
            - **Timesteps**: The number of time steps in the sequence. This indicates how many past observations the RNN should consider when making a prediction. For example, in a time series, if we are predicting the next value based on the previous 10 values, then `timesteps=10`.
            - **Input dimension** (`input_dim`): The number of features per time step, that is, the size of the input vector at each time step. For example, if each time step is a word represented by a 50-dimensional vector, then `input_dim=50`.
- **Adding dense layers:** RNN models can also include multiple layers, such as additional Dense layers, after the RNN layer to further process the output.


**Understanding the RNN model architecture:**
An RNN processes sequences of data, making it different from the FFNN in how it handles input data and passes information through layers.
- **Input layer (implicit):** For RNNs, the input layer handles sequential data, meaning the model receives data in the form of sequences. The input layer is not explicitly defined as a separate layer in TensorFlow but is specified by the `input_shape` parameter in the first RNN layer, which includes both the number of timesteps and the number of features per timestep.

- **RNN layer:**
    - **Recurrent nature:** Unlike FFNNs, RNNs have a recurrent structure where the output from the previous time step is fed back into the network as input for the next time step. This allows RNNs to maintain a hidden state that captures temporal dependencies in the data.
    - **Hidden state:** The hidden state in an RNN carries information from one time step to the next, allowing the model to remember previous inputs in the sequence. This is crucial for tasks like time series prediction or text generation, where the context of previous data points influences the output.
- **Hidden layers:** After the sequence data has been processed by the RNN layer, dense layers can be added to refine the output further. These layers act as fully connected layers that help in further extracting features and combining the information from the sequence data processed by the RNN.
- **Output layer:** The final Dense layer in the model often serves as the output layer. Similar to FFNNs, the output layer in an RNN is designed according to the nature of the task. For example, a softmax activation is typically used in the output layer for classification tasks where the goal is to predict a class label for the entire sequence or for each time step.
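
The recurrence described above can be written out in a few lines of NumPy. This is a minimal sketch of a tanh RNN under stated assumptions: the function `simple_rnn_forward` and the weight names `W_x`, `W_h`, and `b` are hypothetical, the weights are random rather than trained, and the sizes match the `SimpleRNN(64)` example on inputs of shape `(10, 8)`.

```python
import numpy as np

def simple_rnn_forward(x, W_x, W_h, b):
    """Run a tanh RNN over x of shape (batch, timesteps, input_dim).

    Returns the hidden state at every step, shape (batch, timesteps, units).
    """
    batch, timesteps, _ = x.shape
    units = W_h.shape[0]
    h = np.zeros((batch, units))             # initial hidden state
    states = []
    for t in range(timesteps):
        # The new state mixes the current input with the previous state.
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states, axis=1)

rng = np.random.default_rng(0)
timesteps, input_dim, units = 10, 8, 64
W_x = rng.normal(scale=0.1, size=(input_dim, units))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(units, units))       # hidden-to-hidden weights
b = np.zeros(units)

x = rng.random((4, timesteps, input_dim))
all_states = simple_rnn_forward(x, W_x, W_h, b)
last_state = all_states[:, -1, :]            # the final state, passed to the dense layers

n_params = W_x.size + W_h.size + b.size
print(all_states.shape)   # (4, 10, 64)
print(last_state.shape)   # (4, 64)
print(n_params)           # 4672
```

The same weights are reused at every time step, so the parameter count depends on `units` and `input_dim` but not on `timesteps`.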

### Long short-term memory (LSTM)
LSTMs are a type of RNN that can learn long-term dependencies, making them effective for sequence prediction problems. Key components:
- Memory cells: Allow the network to retain information over longer periods.
- Gates: Control the flow of information into and out of the memory cell.

In [4]:
# Define the LSTM model
model_lstm = Sequential()
model_lstm.add(LSTM(64, activation='tanh', input_shape=(timesteps, input_dim)))
model_lstm.add(Dense(16, activation='relu'))
model_lstm.add(Dense(num_classes, activation='softmax'))

# Compile the model
model_lstm.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model_lstm.fit(X_train_rnn, y_train_rnn, epochs=10, batch_size=32, validation_split=0.1)

# Evaluate the model
loss_lstm, accuracy_lstm = model_lstm.evaluate(X_test_rnn, y_test_rnn)
print(f"LSTM accuracy: {accuracy_lstm}")

# Predict
predictions_lstm = model_lstm.predict(X_test_rnn)
print(f"LSTM predictions: {predictions_lstm[:5]}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
LSTM accuracy: 0.30000001192092896
LSTM predictions: [[0.35574389 0.31959754 0.3246586 ]
 [0.35195228 0.32256836 0.3254794 ]
 [0.3559693  0.31903145 0.32499927]
 [0.3299029  0.34588078 0.3242163 ]
 [0.3440122  0.32979408 0.3261938 ]]


**LSTM model syntax:**

In TensorFlow, defining an LSTM model involves specifying layers that can handle sequences of data, particularly when those sequences have long-term dependencies. The LSTM layer is a type of recurrent neural network (RNN) layer designed to better capture these dependencies compared to a standard RNN.

- **Adding an LSTM layer:** The LSTM layer is particularly effective for sequences with long-term dependencies, thanks to its ability to maintain a more complex internal state than a simple RNN. In TensorFlow, this is achieved using the `LSTM` layer. The general syntax:

    ```python
    model.add(LSTM(units, activation=activation_function, input_shape=(timesteps, input_dim)))
    ```

    - **LSTM layer**: The `LSTM()` layer defines an LSTM cell, which is a specialized form of an RNN cell.
        - **Units**: The number of units (or neurons) in the LSTM cell. This defines the dimensionality of the output space, which is also the size of the hidden state that the LSTM maintains.
        - **Activation**: The activation applied to the candidate cell values and to the cell state when producing the output, typically `'tanh'`. The forget, input, and output gates use a separate sigmoid function, which Keras exposes through the `recurrent_activation` argument (default `'sigmoid'`).
        - **Input shape** (`input_shape=(timesteps, input_dim)`): Defines the structure of the input data.
            - **Timesteps**: This indicates the number of time steps in the input sequence, essentially representing how many past observations the LSTM should consider.
            - **Input dimension** (`input_dim`): The number of features per time step, representing the dimensionality of the input vector at each time step.
- **Adding dense layers:** After the LSTM layer has processed the sequence data, additional Dense layers can be included to further refine the output. These Dense layers act like the ones in FFNNs and can help in learning complex patterns in the data.


**Understanding the LSTM model architecture:**
An LSTM model is designed to handle sequences of data, particularly where long-term dependencies and relationships across time steps are crucial.
- **Input layer (implicit):** In LSTM models, the input layer is implicit, meaning it is defined within the first LSTM layer via the `input_shape` parameter.
- **LSTM layer:**
    - **Long-term dependency handling:** The LSTM layer improves upon the standard RNN by using mechanisms like forget gates and memory cells to capture long-term dependencies, making it particularly useful for tasks where context from earlier in the sequence affects the output significantly.
    - **Hidden state and cell state:** Unlike a simple RNN, an LSTM maintains two states: the hidden state and the cell state. The cell state acts as the long-term memory that is carried across time steps with only gated changes, while the hidden state is the gated view of that memory that the layer outputs at each time step.
    - **Gate mechanisms:** LSTMs use three gates (input, forget, and output) to manage the flow of information. This allows LSTMs to effectively decide which pieces of information from previous time steps should influence the current state and output.
- **Hidden layers:** After the LSTM layer has processed the sequence data, additional dense layers can be added to further process the data. These layers are fully connected and help in refining the features extracted by the LSTM layer.
- **Output layer:** The output layer is usually a dense layer with an activation function suited to the specific task.
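
The interplay of the three gates, the cell state, and the hidden state can be sketched as a single time step in NumPy. This follows the standard textbook LSTM equations and is not a copy of Keras's internals: the function `lstm_step`, the block ordering (input, forget, candidate, output), and the random weights are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold four blocks (i, f, g, o) side by side."""
    units = h_prev.shape[1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[:, 0 * units:1 * units])   # input gate: how much new content to write
    f = sigmoid(z[:, 1 * units:2 * units])   # forget gate: how much old memory to keep
    g = np.tanh(z[:, 2 * units:3 * units])   # candidate values for the cell state
    o = sigmoid(z[:, 3 * units:4 * units])   # output gate: how much memory to expose
    c = f * c_prev + i * g                   # updated cell state (long-term memory)
    h = o * np.tanh(c)                       # updated hidden state (the layer's output)
    return h, c

rng = np.random.default_rng(0)
timesteps, input_dim, units = 10, 8, 64
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

x = rng.random((4, timesteps, input_dim))
h = np.zeros((4, units))
c = np.zeros((4, units))
for t in range(timesteps):
    h, c = lstm_step(x[:, t, :], h, c, W, U, b)

n_params = W.size + U.size + b.size
print(h.shape, c.shape)   # (4, 64) (4, 64)
print(n_params)           # 18688
```

With four weight blocks instead of one, this layer has four times as many parameters as a simple RNN with the same `units` and `input_dim`.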

### Gated recurrent units (GRU)
GRUs are similar to LSTMs but with a simpler architecture. They are effective for capturing dependencies in sequential data. Key components:
- Update and reset gates: Simplify the control mechanism compared to LSTMs.

In [5]:
# Define the GRU model
model_gru = Sequential()
model_gru.add(GRU(64, activation='tanh', input_shape=(timesteps, input_dim)))
model_gru.add(Dense(16, activation='relu'))
model_gru.add(Dense(num_classes, activation='softmax'))

# Compile the model
model_gru.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model_gru.fit(X_train_rnn, y_train_rnn, epochs=10, batch_size=32, validation_split=0.1)

# Evaluate the model
loss_gru, accuracy_gru = model_gru.evaluate(X_test_rnn, y_test_rnn)
print(f"GRU accuracy: {accuracy_gru}")

# Predict
predictions_gru = model_gru.predict(X_test_rnn)
print(f"GRU predictions: {predictions_gru[:5]}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
GRU accuracy: 0.3100000023841858
GRU predictions: [[0.4073248  0.29250658 0.3001687 ]
 [0.3788755  0.32337958 0.29774496]
 [0.4199705  0.28967035 0.2903591 ]
 [0.28629386 0.38366592 0.3300402 ]
 [0.32596502 0.33576128 0.3382737 ]]


**GRU model syntax:**

In TensorFlow, defining a GRU model involves creating layers that process sequential data with a focus on capturing dependencies over time. The GRU is a type of RNN that is a simplified version of the LSTM, designed to achieve similar performance with fewer parameters.

- **Adding a GRU layer:** The GRU layer is designed to handle sequences of data, much like the LSTM, but with a simplified architecture that often leads to faster training times. In TensorFlow, the `GRU` layer is used to implement this. The general syntax:

    ```python
    model.add(GRU(units, activation=activation_function, input_shape=(timesteps, input_dim)))
    ```

    - **GRU layer**: The `GRU()` layer defines a GRU cell, which is a type of RNN cell designed to capture dependencies in sequential data.
        - **Units**: The number of units (or neurons) in the GRU cell. This determines the dimensionality of the hidden state output.
        - **Activation**: The activation function used within the GRU cell. The `tanh` activation function is commonly used as it effectively captures temporal dependencies in the data.
        - **Input shape** (`input_shape=(timesteps, input_dim)`): Defines the structure of the input data.
            - **Timesteps**: The number of time steps in the input sequence.
            - **Input dimension** (`input_dim`): The number of features per time step.
- **Adding dense layers:** After the GRU layer has processed the sequence data, additional Dense layers can be included to further refine and transform the output.


**Understanding the GRU model architecture:**
A GRU model processes sequences of data with a focus on capturing dependencies across time steps, similar to LSTM but with a more streamlined approach.
- **Input layer (implicit):** The input layer is not explicitly defined as a separate layer but is implied by the `input_shape` parameter in the first GRU layer.
- **GRU layer:**
    - **Gate mechanisms:** The GRU layer uses gates, similar to LSTM, but combines the forget and input gates into a single update gate. This simplification reduces the complexity and computational cost while still allowing the model to capture important temporal dependencies.
    - **Hidden state:** The GRU maintains a single hidden state (unlike LSTM, which has both a hidden state and a cell state). This hidden state is updated at each time step, capturing the relevant information from previous time steps.
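    - **Simplification:** A GRU has two gates (update and reset) instead of the LSTM's three, and no separate cell state. The update gate takes over the roles of the forget and input gates, and the reset gate controls how much of the previous hidden state is used when computing the new candidate state; there is no output gate. As a result, GRUs have fewer parameters than LSTMs with the same number of units, which often leads to faster training while remaining effective for many sequence-based tasks.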
- **Hidden layers:** Similar to FFNNs and LSTM models, additional dense layers can be added after the GRU layer to further process and refine the output.
- **Output layer:** The output layer is usually a dense layer with an activation function depending on the specific task.
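
For comparison with the LSTM, here is a minimal NumPy sketch of one GRU time step using a common textbook formulation. The function `gru_step`, the parameter names, and the random weights are hypothetical, and the convention used for mixing the old and candidate states is one of two equivalent ones found in the literature. Keras's `GRU` may report a slightly different parameter count, depending on how its implementation handles biases.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU time step with update gate z and reset gate r."""
    z = sigmoid(x_t @ p["W_z"] + h_prev @ p["U_z"] + p["b_z"])   # update gate
    r = sigmoid(x_t @ p["W_r"] + h_prev @ p["U_r"] + p["b_r"])   # reset gate
    # The reset gate scales the previous state before it enters the candidate.
    h_cand = np.tanh(x_t @ p["W_h"] + (r * h_prev) @ p["U_h"] + p["b_h"])
    # The update gate blends the previous state with the candidate.
    return z * h_prev + (1.0 - z) * h_cand

rng = np.random.default_rng(0)
timesteps, input_dim, units = 10, 8, 64

params = {}
for block in ("z", "r", "h"):
    params[f"W_{block}"] = rng.normal(scale=0.1, size=(input_dim, units))
    params[f"U_{block}"] = rng.normal(scale=0.1, size=(units, units))
    params[f"b_{block}"] = np.zeros(units)

x = rng.random((4, timesteps, input_dim))
h = np.zeros((4, units))                     # the single state a GRU maintains
for t in range(timesteps):
    h = gru_step(x[:, t, :], h, params)

n_params = sum(v.size for v in params.values())
print(h.shape)     # (4, 64)
print(n_params)    # 14016
```

In this formulation the GRU has three weight blocks where the LSTM has four, which is where the parameter saving comes from.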


### Bidirectional RNN (Bi-RNN)
Bidirectional RNNs process the input data in both forward and backward directions, capturing context from both ends of the sequence. Key components:
- Bidirectional layer: Wraps an RNN layer to process inputs in both directions.

In [6]:
# Define the Bidirectional RNN model
model_birnn = Sequential()
model_birnn.add(Bidirectional(SimpleRNN(64, activation='tanh'), input_shape=(timesteps, input_dim)))
model_birnn.add(Dense(16, activation='relu'))
model_birnn.add(Dense(num_classes, activation='softmax'))

# Compile the model
model_birnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model_birnn.fit(X_train_rnn, y_train_rnn, epochs=10, batch_size=32, validation_split=0.1)

# Evaluate the model
loss_birnn, accuracy_birnn = model_birnn.evaluate(X_test_rnn, y_test_rnn)
print(f"Bi-RNN accuracy: {accuracy_birnn}")

# Predict
predictions_birnn = model_birnn.predict(X_test_rnn)
print(f"Bi-RNN predictions: {predictions_birnn[:5]}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Bi-RNN accuracy: 0.36500000953674316
Bi-RNN predictions: [[0.36458915 0.19103023 0.44438058]
 [0.4731815  0.2130693  0.3137492 ]
 [0.21945472 0.5448359  0.23570925]
 [0.17167735 0.54557097 0.28275162]
 [0.295507   0.45927334 0.24521965]]


**Bidirectional RNN model syntax:**

In TensorFlow, a Bidirectional RNN (Bi-RNN) model is used to process sequences of data in both forward and backward directions, capturing dependencies from past and future time steps. This can be particularly useful in tasks where context from both the past and future improves predictions.

- **Adding a bidirectional RNN layer:** The `Bidirectional` wrapper in TensorFlow allows us to create a model where the RNN processes the input sequence in both forward and backward directions. The general syntax for adding a bidirectional RNN layer is:

    ```python
    model.add(Bidirectional(SimpleRNN(units, activation=activation_function), input_shape=(timesteps, input_dim)))
    ```
    We can easily replace SimpleRNN with LSTM or GRU to create a bidirectional LSTM (Bi-LSTM) or bidirectional GRU (Bi-GRU). For example:
    ```python
    model.add(Bidirectional(LSTM(units, activation=activation_function), input_shape=(timesteps, input_dim)))
    ```
    Or:
    ```python
    model.add(Bidirectional(GRU(units, activation=activation_function), input_shape=(timesteps, input_dim)))
    ```

    - **Bidirectional wrapper**: `Bidirectional` is a wrapper layer that takes any recurrent layer (like `SimpleRNN`, `LSTM`, or `GRU`) and creates a bidirectional version of it.
        - **Units**: The number of units (or neurons) in the RNN cell within the `SimpleRNN` layer.
        - **Activation**: The activation function used within the RNN cell. For RNNs, `tanh` is often used to capture temporal dependencies.
        - **Input shape** (`input_shape=(timesteps, input_dim)`): Defines the structure of the input data.
            - **Timesteps**: The number of time steps in the input sequence.
            - **Input dimension** (`input_dim`): The number of features at each time step.
- **Adding dense layers:** After the bidirectional RNN layer, dense layers are added to further process the sequence data.


**Understanding the Bi-RNN model architecture:**
A Bidirectional RNN processes sequences of data in both forward and backward directions, which can be particularly beneficial when the context from both directions is essential for accurate predictions.
- **Input layer (implicit):** In a bidirectional RNN model, the input layer is implied by the `input_shape` parameter in the first bidirectional RNN layer.
- **Bidirectional RNN layer:**
    - **Bidirectional processing:** The bidirectional layer processes the sequence data in two directions: forward (from past to future) and backward (from future to past).
    - **Combining outputs:** The outputs from both directions are combined according to the wrapper's `merge_mode` argument. The default, `'concat'`, concatenates the forward and backward outputs (so the output has twice as many features as `units`); `'sum'`, `'ave'`, and `'mul'` add, average, or multiply them instead.
    - **Recurrent layer variants:** While the example uses `SimpleRNN`, we can also use `LSTM` or `GRU`.
- **Hidden layers:** After the bidirectional RNN layer, additional Dense layers can be added to further process and refine the output.
- **Output layer:** The final Dense layer in the model typically serves as the output layer.
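
The two-direction processing and the ways of combining the results can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the wrapper's actual code: `rnn_last_state` and `init_weights` are hypothetical helpers, the weights are random, and each direction gets its own independent set of weights.

```python
import numpy as np

def rnn_last_state(x, W_x, W_h, b):
    """Run a tanh RNN over x (batch, timesteps, input_dim) and return the final state."""
    h = np.zeros((x.shape[0], W_h.shape[0]))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b)
    return h

def init_weights(rng, input_dim, units):
    return (rng.normal(scale=0.1, size=(input_dim, units)),
            rng.normal(scale=0.1, size=(units, units)),
            np.zeros(units))

rng = np.random.default_rng(0)
timesteps, input_dim, units = 10, 8, 64
fwd_weights = init_weights(rng, input_dim, units)   # forward-direction RNN
bwd_weights = init_weights(rng, input_dim, units)   # backward-direction RNN

x = rng.random((4, timesteps, input_dim))
h_fwd = rnn_last_state(x, *fwd_weights)              # reads the sequence first to last
h_bwd = rnn_last_state(x[:, ::-1, :], *bwd_weights)  # reads the sequence last to first

merged_concat = np.concatenate([h_fwd, h_bwd], axis=-1)
merged_sum = h_fwd + h_bwd
merged_ave = (h_fwd + h_bwd) / 2.0

print(merged_concat.shape)   # (4, 128)
print(merged_sum.shape)      # (4, 64)
print(merged_ave.shape)      # (4, 64)
```

Concatenation doubles the feature dimension seen by the next layer, while summing or averaging keeps it equal to `units`.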

### Nested RNN
A Nested RNN, also known as a stacked RNN, is a deep RNN architecture where multiple RNN layers are stacked on top of each other. This architecture is particularly useful for capturing hierarchical patterns in the data, as the nested structure allows the model to learn both fine-grained and coarse-grained temporal dependencies by having each RNN layer process the output from the previous RNN layer. Key components include:
- Nested layer: Combines multiple RNN cells in a hierarchical manner, where each RNN cell processes different levels of temporal abstraction.

In [7]:
# Define the nested RNN model
model_nested_rnn = Sequential()
model_nested_rnn.add(SimpleRNN(64, activation='tanh', return_sequences=True, input_shape=(timesteps, input_dim)))
model_nested_rnn.add(SimpleRNN(32, activation='tanh'))
model_nested_rnn.add(Dense(16, activation='relu'))
model_nested_rnn.add(Dense(num_classes, activation='softmax'))

# Compile the model
model_nested_rnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model_nested_rnn.fit(X_train_rnn, y_train_rnn, epochs=10, batch_size=32, validation_split=0.1)

# Evaluate the model
loss_nested_rnn, accuracy_nested_rnn = model_nested_rnn.evaluate(X_test_rnn, y_test_rnn)
print(f"Nested RNN accuracy: {accuracy_nested_rnn}")

# Predict
predictions_nested_rnn = model_nested_rnn.predict(X_test_rnn)
print(f"Nested RNN predictions: {predictions_nested_rnn[:5]}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Nested RNN accuracy: 0.38499999046325684
Nested RNN predictions: [[0.3316957  0.17422909 0.49407518]
 [0.46093673 0.11252645 0.42653683]
 [0.3877326  0.3594785  0.2527889 ]
 [0.31242576 0.3363809  0.35119343]
 [0.3203868  0.4260071  0.2536061 ]]


**Nested RNN model syntax:**

In TensorFlow, a Nested RNN model uses multiple layers of RNNs, where each RNN layer can capture different levels of temporal dependencies. This structure is beneficial when working with sequences that exhibit hierarchical patterns, such as text, speech, or time-series data.

- **Adding a nested RNN layer:** A nested RNN architecture consists of multiple RNN layers stacked on top of each other, where each subsequent layer processes the sequence output from the previous layer. The general syntax for adding a nested RNN layer is:

    ```python
    model.add(SimpleRNN(units_1, activation=activation_function, return_sequences=True, input_shape=(timesteps, input_dim)))
    model.add(SimpleRNN(units_2, activation=activation_function))
    ```
    We can replace `SimpleRNN` with `LSTM` or `GRU` to create a nested LSTM or nested GRU architecture. For example:
    ```python
    model.add(LSTM(units_1, activation=activation_function, return_sequences=True, input_shape=(timesteps, input_dim)))
    model.add(LSTM(units_2, activation=activation_function))
    ```
    Or:
    ```python
    model.add(GRU(units_1, activation=activation_function, return_sequences=True, input_shape=(timesteps, input_dim)))
    model.add(GRU(units_2, activation=activation_function))
    ```

    - **RNN layer**: The first RNN layer is configured with `return_sequences=True`, which ensures that it outputs a sequence (rather than a single vector). This sequence is then passed as input to the next RNN layer.
        - **Units**: The number of units (or neurons) in each RNN cell.
        - **Activation**: The activation function used within the RNN cells.
        - **Return sequences**: If set to `True`, the RNN layer will return the full sequence of outputs for each input time step. This is essential for passing the output sequence from one RNN layer to another.
        - **Input shape** (`input_shape=(timesteps, input_dim)`): Defines the structure of the input data and is only specified in the first RNN layer.
            - **Timesteps**: The number of time steps in the input sequence.
            - **Input dimension** (`input_dim`): The number of features at each time step. 
- **Adding dense layers:** After the nested RNN layers, dense layers are added to further process the sequence data.


**Understanding the nested RNN model architecture:**
The Nested RNN model is designed to capture complex hierarchical temporal patterns by stacking multiple RNN layers. Each layer learns different levels of abstraction, which helps the model understand both short-term and long-term dependencies in the data.
- **Input layer (implicit):** In a nested RNN model, the input layer is implied by the `input_shape` parameter in the first RNN layer.
- **Subsequent RNN layers:**
    - **Stacked processing:** Each subsequent RNN layer processes the output of the previous RNN layer. In a nested RNN, each layer receives the entire sequence from the layer below.
    - **Hidden states:** Each RNN layer maintains its own hidden state, capturing different levels of temporal dependencies.
    - **Recurrent layer variants:** While the example uses `SimpleRNN`, we can also use `LSTM` or `GRU`.
- **Hidden layers:** After the nested RNN layers, additional Dense layers can be added to further process and refine the output.
- **Output layer:** The final Dense layer in the model typically serves as the output layer.
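
The effect of `return_sequences` on the shapes flowing between stacked layers can be traced with a small NumPy sketch. The function `rnn_forward` is a hypothetical stand-in for a recurrent layer, with a `return_sequences` parameter that mirrors the Keras argument of the same name; the weights are random and the sizes follow the 64-unit and 32-unit layers in the example above.

```python
import numpy as np

def rnn_forward(x, W_x, W_h, b, return_sequences=False):
    """Tanh RNN over x (batch, timesteps, features).

    Returns every hidden state if return_sequences is True, else only the last one.
    """
    h = np.zeros((x.shape[0], W_h.shape[0]))
    states = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states, axis=1) if return_sequences else h

def init_weights(rng, n_in, units):
    return (rng.normal(scale=0.1, size=(n_in, units)),
            rng.normal(scale=0.1, size=(units, units)),
            np.zeros(units))

rng = np.random.default_rng(0)
timesteps, input_dim = 10, 8
layer1 = init_weights(rng, input_dim, 64)   # first RNN layer: 8 features -> 64 units
layer2 = init_weights(rng, 64, 32)          # second RNN layer: 64 features -> 32 units

x = rng.random((4, timesteps, input_dim))
seq_out = rnn_forward(x, *layer1, return_sequences=True)         # full sequence for the next RNN
final_out = rnn_forward(seq_out, *layer2, return_sequences=False)  # single vector for dense layers

print(seq_out.shape)     # (4, 10, 64)
print(final_out.shape)   # (4, 32)
```

The second layer needs a 3D input with a time axis, which is why every recurrent layer except the last one in the stack returns its full sequence.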


### Convolutional neural networks (CNNs)
CNNs are designed to process visual data, such as images, though they can also be applied to other types of data. CNNs are particularly effective at capturing spatial hierarchies in images by using convolutional layers, pooling layers, and fully connected layers. Key components:
- Convolutional layers: Extract features from input data.
- Pooling layers: Reduce the spatial dimensions of the data.
- Flatten layers: Transform the 2D (or higher-dimensional) feature maps into a 1D vector.

In [8]:
# Generate synthetic image data
X_cnn = np.random.rand(1000, 64, 64, 3)
y_cnn = np.random.randint(num_classes, size=1000)
y_cnn = to_categorical(y_cnn, num_classes)

# Split the data
X_train_cnn, X_test_cnn, y_train_cnn, y_test_cnn = train_test_split(X_cnn, y_cnn, test_size=0.2, random_state=42)

# Define the CNN model
model_cnn = Sequential()

# First convolutional block
model_cnn.add(Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
model_cnn.add(MaxPooling2D((2, 2)))

# Second convolutional block using separable convolution
model_cnn.add(SeparableConv2D(64, (3, 3), activation='relu'))
model_cnn.add(AveragePooling2D(pool_size=(2, 2)))

# Third convolutional block using depthwise convolution
model_cnn.add(DepthwiseConv2D((3, 3), activation='relu'))
model_cnn.add(MaxPooling2D(pool_size=(2, 2)))

# Flatten the output and add dense layers
model_cnn.add(Flatten())
model_cnn.add(Dense(64, activation='relu'))
# Output layer
model_cnn.add(Dense(num_classes, activation='softmax'))

# Compile the model
model_cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model_cnn.fit(X_train_cnn, y_train_cnn, epochs=10, batch_size=32, validation_split=0.1)

# Evaluate the model
loss_cnn, accuracy_cnn = model_cnn.evaluate(X_test_cnn, y_test_cnn)
print(f"CNN accuracy: {accuracy_cnn}")

# Predict
predictions_cnn = model_cnn.predict(X_test_cnn)
print(f"CNN predictions: {predictions_cnn[:5]}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CNN accuracy: 0.33000001311302185
CNN predictions: [[0.3267816  0.36631304 0.30690533]
 [0.32674113 0.36648828 0.3067706 ]
 [0.3267772  0.36633396 0.30688888]
 [0.32680443 0.3663261  0.3068695 ]
 [0.32680878 0.36653906 0.30665213]]
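
To see how the CNN defined above shrinks the spatial dimensions from layer to layer, the output shapes can be worked out by hand. The sketch below uses plain Python and assumes the settings used in the model: no padding, a convolution stride of 1, and pooling windows whose stride equals their size. The helpers `conv_out` and `pool_out` are hypothetical names introduced for this illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output length of an unpadded convolution along one axis."""
    return (size - kernel) // stride + 1

def pool_out(size, pool):
    """Output length of an unpadded pooling window whose stride equals its size."""
    return (size - pool) // pool + 1

# (layer description, kind, window size, output channels or None to keep them)
layers = [
    ("Conv2D 3x3, 32 filters", "conv", 3, 32),
    ("MaxPooling2D 2x2", "pool", 2, None),
    ("SeparableConv2D 3x3, 64 filters", "conv", 3, 64),
    ("AveragePooling2D 2x2", "pool", 2, None),
    ("DepthwiseConv2D 3x3", "conv", 3, None),   # one filter per channel keeps the channel count
    ("MaxPooling2D 2x2", "pool", 2, None),
]

height, width, channels = 64, 64, 3             # input image shape
trace = []
for name, kind, window, out_channels in layers:
    step = conv_out if kind == "conv" else pool_out
    height, width = step(height, window), step(width, window)
    if out_channels is not None:
        channels = out_channels
    trace.append((name, (height, width, channels)))
    print(f"{name:<34} -> {(height, width, channels)}")

flattened = height * width * channels
print("Flatten ->", flattened)                  # 2304
```

The final feature map is 6 x 6 x 64, so the `Flatten` layer hands a vector of 2304 values to the first dense layer.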


**CNN model syntax:**

In TensorFlow, a CNN model is built from one or more convolutional blocks, followed by a `Flatten` layer and dense layers that turn the extracted features into a prediction.

- **Adding convolutional block:** A convolutional block is a fundamental building block of CNNs that typically consists of a convolutional layer followed by a pooling layer. It helps the model learn and condense features while reducing the spatial dimensions of the data. The general syntax for adding a convolutional block is:

    ```python
    model.add(Conv2D(filters, kernel_size, activation='relu', input_shape=(height, width, channels)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    ```

    The components of a convolutional block:
    - **Convolutional layer** (1D, 2D, 3D): Extracts features from the input by sliding a filter across the data. The choice between 1D, 2D, and 3D convolutional layers depends on the type and structure of our data. `filters` define the number of feature maps, while the `kernel_size` determines the dimensions of the convolutional window.
        - 1D convolutional layer (`Conv1D`):  1D convolutional layers are ideal for processing sequences where each time step or position is represented by a vector. Common applications include time series data, audio signals, and natural language text. A 1D convolutional layer applies a filter (or kernel) along the temporal or sequential axis. This filter slides over the sequence to detect patterns or features at different positions. 
        - 2D convolutional layer (`Conv2D`): 2D convolutional layers are used for image data or any data with spatial relationships. They apply a filter over the two spatial dimensions of the input (width and height). The filter extracts features like edges, textures, and shapes.
            - Separable and depthwise convolutions are used instead of standard convolutional layers (like `Conv2D`) when we need to reduce computational cost and model size without significantly sacrificing accuracy. They are particularly useful in scenarios where efficiency is crucial, such as in mobile applications or real-time processing.
                1. Separable convolution (`SeparableConv2D`): Separable convolution breaks the convolution operation into two simpler steps: a depthwise convolution (applies a single convolutional filter to each input channel separately) and a pointwise convolution (combines the outputs of the depthwise convolution by applying a 1x1 convolution across all channels). Its syntax: `model.add(SeparableConv2D(filters, kernel_size, activation='relu', input_shape=(height, width, channels)))`
                2. Depthwise convolution (`DepthwiseConv2D`): Depthwise convolution is a step within separable convolution that applies a convolutional filter to each input channel individually, rather than across all channels as in standard convolution. It is particularly useful in mobile and embedded devices where computational resources are limited. Its syntax: `model.add(DepthwiseConv2D(kernel_size, activation='relu', input_shape=(height, width, channels)))`
        - 3D convolutional layer (`Conv3D`):  3D convolutional layers handle data with three dimensions, including spatial and temporal dimensions. They are suitable for applications involving volumes or sequences of 2D frames. A 3D convolutional layer applies a filter over three dimensions (depth, height, width).
    - **Activation function**: Introduces non-linearity into the model, allowing it to learn more complex patterns.
    - **Pooling layer**: Reduces the dimensionality (width and height) of the data, which helps in reducing the computational load and controlling overfitting. Options include:
        - Max pooling (`MaxPooling2D`): Takes the maximum value from a feature map within a defined window (e.g., 2x2). This operation retains the most prominent features while reducing the spatial dimensions.
        - Average pooling (`AveragePooling2D`): Takes the average value from a feature map within a defined window. This operation smooths out the feature map by averaging values, which can be useful when you want to reduce the risk of overfitting by smoothing the feature representations.

- **Transition from convolutional to dense layers:** This involves converting the multi-dimensional output of the convolutional layers into a one-dimensional representation that can be used by dense (fully connected) layers for further processing. The goal is to aggregate the features learned by convolutional layers into a format suitable for classification or regression.
    - **Adding flatten layers** (`Flatten()`): The flatten layer is used to convert the multi-dimensional output of convolutional layers into a one-dimensional vector, which can then be fed into dense (fully connected) layers. It bridges the convolutional layers and the dense layers. We can use it when we need to feed the full-dimensional feature maps into dense layers without any reduction.
    - **Adding global pooling layer**: Global pooling layers can be used as an alternative to the flatten layer. These layers perform a downsampling operation, where they reduce each feature map to a single value by taking the maximum or average of all values in the map. We can use it when we want to simplify the model by reducing the number of parameters.
        - Global maximum pooling (`GlobalMaxPooling2D()`): Reduces each feature map to its maximum value.
        - Global average pooling (`GlobalAveragePooling2D()`): Reduces each feature map to its average value.
- **Adding dense layers:** After the output of the convolutional layers has been flattened or pooled into a vector, dense layers are added to further process the extracted features and produce the prediction.
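
To make the pooling options above concrete, here is a minimal NumPy sketch (not Keras code) that applies 2x2 max pooling, 2x2 average pooling, and the two global pooling variants to a single 4x4 feature map. The feature map values and variable names are made up for illustration; the Keras layers perform the same reduction per channel on whole batches.

```python
import numpy as np

# A single 4x4 feature map with made-up values (one channel, no batch axis)
feature_map = np.array([
    [1.0,  2.0,  5.0,  6.0],
    [3.0,  4.0,  7.0,  8.0],
    [9.0, 10.0, 13.0, 14.0],
    [11.0, 12.0, 15.0, 16.0],
])

# Split the map into non-overlapping 2x2 windows.
# After the reshape and axis swap, windows[i, j] is the 2x2 window
# at block-row i and block-column j.
windows = feature_map.reshape(2, 2, 2, 2).swapaxes(1, 2)

# Local pooling: one value per window, so 4x4 becomes 2x2
max_pooled = windows.max(axis=(2, 3))    # analogous to MaxPooling2D((2, 2))
avg_pooled = windows.mean(axis=(2, 3))   # analogous to AveragePooling2D((2, 2))

# Global pooling: one value for the whole feature map
global_max = feature_map.max()           # analogous to GlobalMaxPooling2D()
global_avg = feature_map.mean()          # analogous to GlobalAveragePooling2D()

print("Max pooled:\n", max_pooled)       # [[ 4.  8.] [12. 16.]]
print("Average pooled:\n", avg_pooled)   # [[ 2.5  6.5] [10.5 14.5]]
print("Global max:", global_max)         # 16.0
print("Global average:", global_avg)     # 8.5
```

Local pooling keeps a coarse spatial layout (a 2x2 grid here), while global pooling discards it entirely and keeps one number per feature map.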

**Understanding the CNN model architecture:**
The CNN model stacks convolutional blocks that learn increasingly abstract spatial features, then passes the result to dense layers that make the prediction. Its components are:
- **Input layer (implicit):** In a CNN model, the input layer is implied by the `input_shape` parameter in the first convolution layer.
- **Convolutional blocks:** Sequential layers of convolution and pooling that extract and condense features.
- **Flatten layer or global pooling layer:** Converts the data into a format suitable for dense layers. Flatten provides a simple 1D vector, while Global Pooling layers provide a summarized vector of feature maps.
- **Dense layers:** Further process the flattened or pooled data to make predictions.
- **Output layer:** The final Dense layer provides the final predictions, typically using a softmax activation function for classification tasks.
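
The components listed above can be traced by hand for the CNN defined earlier. The sketch below is plain Python arithmetic, not a Keras API: the helper names are hypothetical, and it assumes the Keras defaults used in that model (stride 1, `'valid'` padding, 2x2 pooling windows, bias terms enabled). The numbers should line up with what `model_cnn.summary()` reports, but treat them as a back-of-the-envelope check rather than a replacement for it.

```python
# Hand-computed trace of the CNN defined earlier: 64x64 RGB input, 3x3 kernels,
# stride 1, 'valid' padding, 2x2 pooling, 3 output classes.
# All helper names below are illustrative, not part of Keras.

def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping pooling; incomplete windows are dropped."""
    return size // pool

def conv2d_params(c_in, filters, k=3):
    return k * k * c_in * filters + filters          # kernel weights + biases

def separable_params(c_in, filters, k=3):
    return k * k * c_in + c_in * filters + filters   # depthwise + pointwise + biases

def depthwise_params(c_in, k=3):
    return k * k * c_in + c_in                       # one k x k filter per channel + biases

def dense_params(n_in, units):
    return n_in * units + units                      # weights + biases

rows = []
size, channels = 64, 3

# First block: standard convolution + max pooling
size = conv_out(size)
rows.append(("Conv2D(32)", (size, size, 32), conv2d_params(channels, 32)))
channels = 32
size = pool_out(size)
rows.append(("MaxPooling2D", (size, size, channels), 0))

# Second block: separable convolution + average pooling
size = conv_out(size)
rows.append(("SeparableConv2D(64)", (size, size, 64), separable_params(channels, 64)))
standard_alternative = conv2d_params(channels, 64)   # same block with a plain Conv2D
channels = 64
size = pool_out(size)
rows.append(("AveragePooling2D", (size, size, channels), 0))

# Third block: depthwise convolution (channel count unchanged) + max pooling
size = conv_out(size)
rows.append(("DepthwiseConv2D", (size, size, channels), depthwise_params(channels)))
size = pool_out(size)
rows.append(("MaxPooling2D", (size, size, channels), 0))

# Transition to dense layers
flat_features = size * size * channels
rows.append(("Flatten", (flat_features,), 0))
rows.append(("Dense(64)", (64,), dense_params(flat_features, 64)))
rows.append(("Dense(3)", (3,), dense_params(64, 3)))

for name, shape, params in rows:
    print(f"{name:<20} {str(shape):<14} {params:>8}")

total_params = sum(params for _, _, params in rows)
gap_dense_params = dense_params(channels, 64)        # Dense(64) after global pooling instead of Flatten

print("Total parameters:", total_params)
print("SeparableConv2D(64) vs Conv2D(64):", rows[2][2], "vs", standard_alternative)
print("Dense(64) after Flatten vs after global pooling:", rows[7][2], "vs", gap_dense_params)
```

Two things stand out in the trace: the separable block needs far fewer weights than a standard convolution with the same number of filters, and the first dense layer after `Flatten` holds most of the model's parameters, which is why a global pooling layer is a common way to shrink the model.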


### Multimodal neural network (RNN + CNN)

A multimodal neural network can process different types of data simultaneously by combining various neural network architectures, such as RNNs for sequential data (e.g., time series, text) and CNNs for spatial data (e.g., images). The outputs from these separate models are then concatenated and processed together to make predictions. Key components:
- Concatenation: Merges the outputs of the RNN and CNN models.

In [9]:
# Generate synthetic sequential data for RNN
X_rnn = np.random.rand(1000, 10, 8)
y_rnn = np.random.randint(3, size=1000)
y_rnn = to_categorical(y_rnn, 3)

# Generate synthetic image data for CNN
X_cnn = np.random.rand(1000, 64, 64, 3)
y_cnn = y_rnn  # Assuming same labels for simplicity

# Define the RNN branch
input_rnn = Input(shape=(10, 8))
model_rnn = SimpleRNN(64, activation='tanh')(input_rnn)
model_rnn = Dense(32, activation='relu')(model_rnn)

# Define the CNN branch
input_cnn = Input(shape=(64, 64, 3))
model_cnn = Conv2D(32, (3, 3), activation='relu')(input_cnn)
model_cnn = MaxPooling2D((2, 2))(model_cnn)
model_cnn = Flatten()(model_cnn)
model_cnn = Dense(32, activation='relu')(model_cnn)

# Concatenate the outputs of the RNN and CNN branches
concatenated_model = Concatenate()([model_rnn, model_cnn])
concatenated_model = Dense(32, activation='relu')(concatenated_model)
output = Dense(3, activation='softmax')(concatenated_model)

# Create the model
model = Model(inputs=[input_rnn, input_cnn], outputs=output)

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Split both inputs and the shared labels in a single call so the samples stay aligned
X_train_rnn, X_test_rnn, X_train_cnn, X_test_cnn, y_train, y_test = train_test_split(
    X_rnn, X_cnn, y_rnn, test_size=0.2, random_state=42)

# Fit the model
model.fit([X_train_rnn, X_train_cnn], y_train, epochs=10, batch_size=32, validation_split=0.1)

# Evaluate the model
loss, accuracy = model.evaluate([X_test_rnn, X_test_cnn], y_test)
print(f"Multimodal model accuracy: {accuracy}")

# Predict
predictions = model.predict([X_test_rnn, X_test_cnn])
print(f"Multimodal model predictions: {predictions[:5]}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Multimodal model accuracy: 0.3149999976158142
Multimodal model predictions: [[0.1088253  0.74524546 0.14592925]
 [0.26525834 0.48776522 0.24697645]
 [0.4892044  0.32656553 0.18423003]
 [0.36213967 0.22869022 0.40917012]
 [0.27750468 0.5452279  0.17726749]]


**Multimodal neural network syntax:**

In TensorFlow, a multimodal neural network can combine RNN and CNN branches, each processing different types of data before merging their outputs for final predictions. The architecture allows distinct processing paths (RNN for sequence data, CNN for spatial data) that are specialized for their respective inputs.


1. **Defining RNN and CNN branches:**
    - **RNN branch:** Handles sequential data like time series or text.
        - **Input layer** (`Input(shape=(timesteps, features))`): The `Input` layer defines the shape of the sequential data, specifying the time steps and the number of features per step. For example, `Input(shape=(10, 8))` expects sequences of 10 time steps, each with 8 features.
        - **Recurrent layer** (`RecurrentLayer(units, activation='activation_function')`): Processes the sequential data with a recurrent layer such as `SimpleRNN`, `LSTM`, or `GRU`. `units` is the number of neurons in the recurrent layer. For example, `SimpleRNN(64, activation='tanh')` adds a simple RNN layer with 64 units and a `tanh` activation function.
        - **Dense layer** (`Dense(units, activation='activation_function')`): Further processes the output from the recurrent layer.
    - **CNN branch:** Handles spatial data like images.
        - **Input layer** (`Input(shape=(height, width, channels))`): The `Input` layer defines the shape of the image data, such as `Input(shape=(64, 64, 3))` for 64x64 RGB images.
        - **Convolutional layer** (`ConvLayer(filters, kernel_size, activation='activation_function')`): Extracts features from the images by applying filters across the input. Filters are the number of convolutional filters applied and Kernel size is the size of the filter (e.g., 3x3). For example, `Conv2D(32, (3, 3), activation='relu')` adds a convolutional layer with 32 filters and a 3x3 kernel.
        - **Pooling layer** (`PoolingLayer(pool_size)`): Reduces the spatial dimensions of the feature maps, helping in downsampling and feature condensation. Pool size is the dimension of the pooling window (e.g., 2x2). For example, `MaxPooling2D((2, 2))`.
        - **Flatten layer** (`Flatten()`): Converts the multi-dimensional output of the CNN into a 1D vector for further processing. In our example, it converts the 3D feature maps (height, width, channels) produced by the pooling layer into a 1D vector.
        - **Dense layer** (`Dense(units, activation='activation_function')`): Further processes the flattened output.
2. **Concatenation of RNN and CNN outputs:** After each branch has processed its respective input data, their outputs are concatenated to form a combined representation that merges the insights from both branches.
    - **Concatenate layer** (`Concatenate()([output_rnn, output_cnn])`): Joins the two branch outputs along the last axis. In our example, each branch ends in a 32-unit dense layer, so the two 32-dimensional vectors become one 64-dimensional vector.
3. **Dense layers and output:**
    - **Further dense layers:** After concatenation, additional dense layers can be added to further process the combined data.
    - **Output layer:** The final dense layer typically uses a softmax activation function for classification.
4. **Model creation and compilation:**
    - **Model** (`Model(inputs=[input_rnn, input_cnn], outputs=final_output)`): The `Model` class ties the multimodal model together: it takes the input layers of both the RNN and CNN branches and the output layer from step 3.
    - **Compilation:** The model is compiled using an optimizer, loss function, and evaluation metrics. 
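
The four steps above can be wrapped into one function. The following is a minimal sketch rather than the notebook's exact model: `build_multimodal_model` and its parameter names are hypothetical, and two layers are deliberately swapped to show that the branches are interchangeable (`LSTM` instead of `SimpleRNN` in the sequence branch, `GlobalAveragePooling2D` instead of `Flatten` in the image branch).

```python
from tensorflow.keras import Model
from tensorflow.keras.layers import (Input, LSTM, Dense, Conv2D, MaxPooling2D,
                                     GlobalAveragePooling2D, Concatenate)

def build_multimodal_model(timesteps=10, features=8, image_shape=(64, 64, 3), num_classes=3):
    """Hypothetical helper: builds and compiles a two-branch (sequence + image) classifier."""
    # Step 1a: sequence branch (LSTM used here in place of SimpleRNN)
    input_seq = Input(shape=(timesteps, features))
    x_seq = LSTM(64)(input_seq)
    x_seq = Dense(32, activation='relu')(x_seq)

    # Step 1b: image branch (global average pooling used here in place of Flatten)
    input_img = Input(shape=image_shape)
    x_img = Conv2D(32, (3, 3), activation='relu')(input_img)
    x_img = MaxPooling2D((2, 2))(x_img)
    x_img = GlobalAveragePooling2D()(x_img)
    x_img = Dense(32, activation='relu')(x_img)

    # Step 2: merge the two 32-dimensional branch outputs into one 64-dimensional vector
    merged = Concatenate()([x_seq, x_img])

    # Step 3: dense layers and softmax output
    merged = Dense(32, activation='relu')(merged)
    output = Dense(num_classes, activation='softmax')(merged)

    # Step 4: create and compile the model
    model = Model(inputs=[input_seq, input_img], outputs=output)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

model_sketch = build_multimodal_model()
model_sketch.summary()
```

The returned model is trained and evaluated the same way as in the cell above, by passing the two inputs as a list, for example `model_sketch.fit([X_train_rnn, X_train_cnn], y_train, ...)`.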