# Imports

In [22]:
import tensorflow as tf
from tensorflow.keras import layers ## For adding layers
from tensorflow.keras.datasets import imdb ## IMDB review data
from tensorflow.keras.preprocessing.sequence import pad_sequences ## For padding
import numpy as np ## For shape check

# Loading data

In [21]:
vocab_size = 10000  # Consider the top 10,000 most frequent words
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=vocab_size)  ## Divide into train and test data

* At this stage, `train_data` contains the movie reviews, with each word of a review mapped to an integer index.
* For example, if the review is 'This is a great movie', then the corresponding entry in `train_data` could look like [44,31,21,18,11], one numerical value per word.
* If the word 'movie' is not among the 10,000 (`vocab_size`) most frequent words, it does not keep its own index. With its default settings, `imdb.load_data` replaces such words with a reserved out-of-vocabulary index (2) rather than dropping them; to keep the running example simple, we treat the word as dropped, so the entry becomes [44,31,21,18].
* There could also be outliers. For example, a review might contain [100011,11,21,35]; here 100011 is greater than `vocab_size`, so it should be eliminated. Since `num_words=vocab_size` already caps the indices, this is mainly a safety check, and we use the following function for it:

# Function to remove outliers

In [29]:
def filter_data(data):
    ## A word_id (the index of a word) is kept only if it is less than vocab_size; otherwise it is skipped.
    return [[word_id for word_id in review if word_id < vocab_size] for review in data]

train_data = filter_data(train_data) ## Apply the function to train data
test_data = filter_data(test_data)   ## Apply the function to test data
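
To see what the filter does on a small case, here is a minimal sketch on made-up data. The helper name `filter_reviews` and the toy reviews are hypothetical; the helper takes `vocab_size` as an argument so that the snippet runs on its own.

```python
def filter_reviews(reviews, vocab_size):
    """Keep only the word indices that are strictly below vocab_size."""
    return [[word_id for word_id in review if word_id < vocab_size]
            for review in reviews]

# Made-up reviews; 100011 plays the role of the out-of-range index.
toy_reviews = [[44, 31, 21, 18], [100011, 11, 21, 35]]
toy_filtered = filter_reviews(toy_reviews, vocab_size=10000)

print(toy_filtered)  # [[44, 31, 21, 18], [11, 21, 35]]
```

Only the out-of-range index is removed; every other index stays in its original order.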

* We can verify that the function works:

In [26]:
print(f"Max index in filtered train_data: {max(max(review) for review in train_data)}")
print(f"Min index in filtered train_data: {min(min(review) for review in train_data)}")


Max index in filtered train_data: 9999
Min index in filtered train_data: 1


* Here the max index is less than vocab_size, so no out-of-range indices (outliers) remain in the data.

# Padding

* Next, we need every review to have the same length.
* To do this, we call the `pad_sequences` function on `train_data` and set the maximum review length (`max_length`).
* It appends 0s to any review shorter than 250 words (`padding='post'`) and truncates any review longer than 250 words; by default, the truncation removes words from the beginning of the review.
* In our example of 'This is a great movie', after padding, the vector would look like [44,31,21,18,0,0,0........] (remember that 'movie' is not in frequent words)

In [27]:
max_length = 250  ## Set maximum length for padding
train_data = pad_sequences(train_data, maxlen=max_length, padding='post')  ## Apply the padding function for train data
test_data = pad_sequences(test_data, maxlen=max_length, padding='post')   ## Apply the padding function for test data
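
The padding rule can be sketched in plain Python. This is only an illustration of the behaviour assumed here for `padding='post'` with the default truncation from the front, not the Keras implementation; `pad_post` and the toy sequences are hypothetical.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad short sequences with `value` at the end; keep the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # drop items from the front if the sequence is too long
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded

toy_sequences = [[44, 31, 21, 18], [5, 6, 7, 8, 9, 10, 11, 12]]
toy_padded = pad_post(toy_sequences, maxlen=6)

print(toy_padded)  # [[44, 31, 21, 18, 0, 0], [7, 8, 9, 10, 11, 12]]
```

The short review gains trailing zeros, while the long one loses its first two entries, so both end up with exactly `maxlen` values.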


* Let's check the shape of the padded train_data

In [28]:
np.shape(train_data)

(25000, 250)

* This implies that the padded train data contains 25000 reviews, each represented by 250 word indices (including any padding zeros).

# Create Model

* Let's create the model using the Sequential class. In TensorFlow, `tf.keras.Sequential` is a class that represents a linear stack of layers. In other words, it defines a neural network model as a sequence of layers where each layer has exactly one input tensor and one output tensor. The data flows sequentially through these layers from input to output.
* Here, we will be using 5 layers; the details of each layer are given below:


## Layer 1 - Embedding:
An embedding is a mapping from discrete objects, such as words or identifiers, to vectors of real numbers. In NLP, word embeddings are vector representations of words in a continuous vector space, capturing semantic relationships between words.

### Usage in TensorFlow:
In TensorFlow, the `tf.keras.layers.Embedding` layer creates word embeddings. It takes integer indices representing words as input and outputs dense vectors (embeddings) for each word.

### Parameters:
- `input_dim`: Size of the vocabulary, i.e., total number of unique words.
- `output_dim`: Dimensionality of the dense embedding vectors.
- `input_length`: Length of input sequences. (Recent Keras versions deprecate this argument; the layer infers the length from its input.)

### How it Works:
1. **Internal Lookup Table Creation:** The layer initializes an internal lookup table with shape `(vocab_size, output_dim)` mapping word indices to embedding vectors.
2. **Transforming Word Indices to Embeddings:** Given an input sequence of word indices, the layer converts each index to its corresponding dense embedding vector using the lookup table.
3. **Output:** The output is a sequence of dense embedding vectors, each of dimension `output_dim`.

By utilizing word embeddings, neural networks can learn semantic representations crucial for various NLP tasks.

**Example:**
For instance, if the input sequence is `[44, 31, 21, 18, 0, 0, 0, ...]`, the layer retrieves the embedding vectors for indices 44, 31, 21, 18, and so on from its lookup table, resulting in a sequence of embedding vectors, e.g., `[vector_44, vector_31, vector_21, vector_18, vector_0, ...]`.
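
The lookup itself amounts to row indexing into a matrix. The sketch below uses NumPy with a randomly initialised table and deliberately tiny, hypothetical sizes (`toy_vocab_size`, `toy_output_dim`); in the real layer the table is learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_vocab_size, toy_output_dim = 50, 4  # tiny, made-up sizes for illustration
lookup_table = rng.normal(size=(toy_vocab_size, toy_output_dim))

indices = np.array([44, 31, 21, 18, 0, 0])
embedded = lookup_table[indices]  # one table row per word index

print(embedded.shape)  # (6, 4)
```

Both padding positions (index 0) map to the same row of the table, so they receive identical vectors.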

## Layer 2 - Convolutional Layer (Conv1D):
The Convolutional Layer (Conv1D) is a type of layer commonly used in convolutional neural networks (CNNs) for processing one-dimensional sequences, such as time series or text data.

### Parameters:
- `filters`: Number of filters (or kernels) to apply to the input. Each filter produces one feature map in the output.
- `kernel_size`: Size of the convolutional window, determining the number of neighboring elements considered at a time during the convolution operation.
- `activation`: Activation function applied to the output of the convolution operation.

### How it Works:
1. **Convolution Operation:** The Conv1D layer performs a 1D convolution operation on the input sequence using the specified number of filters and kernel size. Each filter slides across the input sequence, computing dot products with the corresponding elements in the input.
2. **Summation under the Window:** After computing the dot products, the convolution operation performs a summation under the window, aggregating the results to produce a single value for each position in the output feature map.
3. **Feature Extraction:** The convolution operation extracts features from the input sequence, capturing patterns or local dependencies within the data.
4. **Activation Function (ReLU):** The ReLU activation function is applied element-wise to the output of the convolution operation. It replaces any negative values with zero, introducing non-linearity and enabling the network to learn complex relationships in the data. The ReLU function is defined as:
ReLU(x) = max(0, x)


Convolutional layers are effective for capturing local patterns and dependencies in sequential data, making them suitable for tasks such as text classification and time series forecasting.

**Example:**
For instance, for the padded review `[44, 31, 21, 18, 0, 0, 0, ...]`, the Conv1D layer receives the corresponding sequence of embedding vectors (not the raw indices), applies its filters to extract features from that sequence, and then applies the ReLU activation function to introduce non-linearity.

**Output Format:**
The output of the Conv1D layer consists of feature maps, one per filter. For a single review, Keras returns them as a two-dimensional array of shape `(output_length, filters)`. With the default `padding='valid'`, the output length is `input_length - kernel_size + 1`. For example, with 64 filters, a kernel size of 5, and an input sequence length of 250, the output length is 250 - 5 + 1 = 246, so the output consists of 64 feature maps, each with 246 values.
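
A rough NumPy sketch of the sliding-window computation may help with the shapes. It assumes no padding and a stride of 1, uses random weights, and uses 8 input channels instead of the model's 128 to stay small; `conv1d_valid_relu` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def conv1d_valid_relu(x, kernels, bias):
    """x: (steps, channels); kernels: (kernel_size, channels, filters); bias: (filters,)."""
    kernel_size, _, filters = kernels.shape
    out_steps = x.shape[0] - kernel_size + 1
    out = np.empty((out_steps, filters))
    for t in range(out_steps):
        window = x[t:t + kernel_size]  # (kernel_size, channels)
        # Dot product of the window with every filter, summed over the window.
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(250, 8))          # 250 positions, 8 channels (assumed)
kernels = rng.normal(size=(5, 8, 64))  # kernel_size=5, 64 filters
bias = np.zeros(64)

feature_maps = conv1d_valid_relu(x, kernels, bias)
print(feature_maps.shape)  # (246, 64)
```

Each column of `feature_maps` is one feature map, and the ReLU guarantees that no entry is negative.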


## Layer 3 - Global Max Pooling Layer (GlobalMaxPooling1D):
The Global Max Pooling Layer (GlobalMaxPooling1D) is a pooling layer commonly used in convolutional neural networks (CNNs) for processing one-dimensional sequences. It performs a global maximum pooling operation across the entire sequence, reducing the spatial dimensions to a single value per feature map.

### How it Works:
1. **Maximum Pooling Operation:** The GlobalMaxPooling1D layer computes the maximum value across each feature map along the temporal (sequence) dimension.
2. **Feature Aggregation:** The maximum value for each feature map serves as a summary statistic, capturing the most salient feature within the sequence.
3. **Output:** The output of the GlobalMaxPooling1D layer is a one-dimensional vector, with each element representing the maximum value of the corresponding feature map.

Global max pooling is useful for capturing the most important features within the input sequence, often used as a dimensionality reduction technique before feeding the data into fully connected layers for further processing.

**Example:**
Suppose the input to the GlobalMaxPooling1D layer consists of 64 feature maps, each with a length of 246 values. After applying the global max pooling operation, the output would be a one-dimensional vector containing 64 values, each representing the maximum value of the corresponding feature map.

**Output Format:**
The output of the GlobalMaxPooling1D layer is a one-dimensional vector, where the length of the vector corresponds to the number of feature maps in the input.
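
In array terms, global max pooling is a maximum along the sequence axis. A minimal NumPy sketch, using random values as a stand-in for the Conv1D output of one review (the array name and contents are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
toy_feature_maps = rng.normal(size=(246, 64))  # (sequence positions, feature maps)

pooled = toy_feature_maps.max(axis=0)  # largest value of each feature map

print(pooled.shape)  # (64,)
```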


## Layer 4 - Dense Layer:
The Dense Layer is a fully connected layer in a neural network, where each neuron is connected to every neuron in the previous layer. It is commonly used for learning non-linear relationships in the data.

### Parameters:
- `units`: Number of neurons or units in the layer.
- `activation`: Activation function applied to the output of the layer.

### How it Works:
1. **Weighted Sum Calculation:** The Dense layer computes the weighted sum of the inputs from the previous layer along with biases for each neuron.
2. **Activation Function:** The activation function (e.g., ReLU) is applied element-wise to the output of the weighted sum, introducing non-linearity to the model.
3. **Output:** The output of the Dense layer is a vector of size `units`, where each element represents the output of a neuron in the layer.

Dense layers are versatile and can be used in various neural network architectures for tasks such as classification, regression, and more.

**Example:**
Suppose we have a Dense layer with 64 units and ReLU activation. If the input to this layer is a vector of size 100, the output is a vector of size 64; the output size depends only on `units`, not on the input size.

**Output Format:**
The output of the Dense layer is a vector of size `units`, where each element represents the output of a neuron in the layer.
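
The weighted sum followed by ReLU can be written in a couple of lines of NumPy. The weights below are random placeholders and `dense_relu` is a hypothetical helper; a trained layer would hold learned values instead.

```python
import numpy as np

def dense_relu(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, followed by ReLU."""
    return np.maximum(inputs @ weights + bias, 0.0)

rng = np.random.default_rng(0)
inputs = rng.normal(size=100)         # input vector of size 100
weights = rng.normal(size=(100, 64))  # one column of weights per unit
bias = np.zeros(64)

dense_out = dense_relu(inputs, weights, bias)
print(dense_out.shape)  # (64,)
```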


## Layer 5 - Dense Layer for Binary Classification:
The Dense Layer with `sigmoid` activation is commonly used as the output layer for binary classification tasks, where the goal is to predict one of two classes (e.g., positive or negative).

### Parameters:
- `units`: Number of neurons or units in the layer.
- `activation`: Activation function applied to the output of the layer.

### How it Works:
1. **Weighted Sum Calculation:** The Dense layer computes the weighted sum of the inputs from the previous layer along with biases for each neuron.
2. **Activation Function (Sigmoid):** The sigmoid activation function is applied element-wise to the output of the weighted sum. It squashes the output values between 0 and 1, interpreting them as probabilities. In binary classification, a threshold (typically 0.5) is applied to these probabilities to make class predictions.
3. **Output:** The output of the Dense layer is a single scalar value representing the probability of belonging to the positive class.

Dense layers with sigmoid activation are suitable for binary classification tasks, such as sentiment analysis, spam detection, and medical diagnosis.

**Example:**
Suppose we have a Dense layer with 1 unit and sigmoid activation. If the input to this layer is a vector of size 64, the output would be a single scalar value between 0 and 1, representing the probability of the positive class.

**Output Format:**
The output of the Dense layer is a single scalar value representing the probability of belonging to the positive class.
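
As a small illustration of the sigmoid step, the sketch below maps a few made-up weighted sums to probabilities and applies a 0.5 threshold. The input values are arbitrary, and treating a probability of exactly 0.5 as negative is an assumption of this sketch.

```python
import math

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, 0.0, 3.0):  # made-up weighted sums
    p = sigmoid(z)
    label = "positive" if p > 0.5 else "negative"
    print(f"z={z:+.1f} -> p={p:.3f} -> {label}")
```

Large positive sums give probabilities close to 1, and large negative sums give probabilities close to 0.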


In [7]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=128, input_length=max_length),
    tf.keras.layers.Conv1D(filters=64, kernel_size=5, activation='relu'),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # Binary classification (positive/negative)
])
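
As a sanity check on this architecture, the number of trainable parameters can be worked out by hand. The arithmetic below assumes every layer uses a bias term (the Keras default for `Conv1D` and `Dense`); the variable names are only for illustration.

```python
n_vocab, embed_dim = 10000, 128
kernel_size, n_filters = 5, 64
dense_units = 64

embedding_params = n_vocab * embed_dim                          # lookup table
conv_params = kernel_size * embed_dim * n_filters + n_filters   # kernels + biases
dense_params = n_filters * dense_units + dense_units            # weights + biases
output_params = dense_units * 1 + 1                             # single sigmoid unit

total_params = embedding_params + conv_params + dense_params + output_params
print(total_params)  # 1325249
```

Almost all of the parameters sit in the embedding table, which is typical for text models with a large vocabulary.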

## Model Compilation:
After defining the architecture of the neural network, the next step is to compile the model. During compilation, you specify additional settings that determine how the model will be trained.

### Parameters:
- `optimizer`: Optimization algorithm used to update the weights of the neural network during training. Adam is a popular optimizer known for its adaptive learning rate capabilities.
- `loss`: Loss function used to measure the difference between the predicted outputs and the true labels during training. For binary classification tasks, binary cross-entropy is commonly used.
- `metrics`: List of metrics to evaluate the performance of the model during training and testing. Accuracy is a commonly used metric that measures the proportion of correctly classified samples.

### How it Works:
1. **Optimizer Selection:** The optimizer (`adam` in this case) determines the strategy used to update the weights of the neural network based on the computed gradients during backpropagation.
2. **Loss Function:** The loss function (`binary_crossentropy`) quantifies the difference between the predicted outputs and the true labels. It is specifically designed for binary classification tasks and penalizes deviations from the true labels.
3. **Metrics:** During training and evaluation, the model's performance is monitored using the specified metrics (e.g., accuracy). These metrics provide insights into how well the model is performing on the given task.

Model compilation is a crucial step in setting up the training process, as it defines the optimization strategy and evaluation criteria used to train and assess the model's performance.

**Example:**
In this example, the model is compiled using the Adam optimizer, binary cross-entropy loss function, and accuracy as the evaluation metric.



In [8]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
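
To make the loss more concrete, here is a minimal pure-Python sketch of binary cross-entropy averaged over a batch. The clipping constant `eps` and the toy labels and probabilities are assumptions for illustration; Keras's own implementation may differ in its numerical details.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(f"{confident_right:.4f} vs {confident_wrong:.4f}")
```

Confident correct predictions give a small loss, while confident wrong predictions are penalised heavily.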


## Model Training:
Once the model architecture is defined and compiled, the next step is to train the model on the training data.

### Parameters:
- `train_data`: Input training data, typically features or input sequences.
- `train_labels`: True labels corresponding to the training data.
- `epochs`: Number of complete passes over the training dataset. Each epoch is split into batches, and every batch involves one forward pass and one backward pass.
- `batch_size`: Number of samples processed simultaneously by the model during training. It determines the number of samples used to compute the gradient and update the weights in each optimization step.
- `validation_data`: Optional tuple containing validation data and labels. If provided, the model's performance on the validation set is evaluated at the end of each epoch.

### How it Works:
1. **Forward and Backward Pass:** For each batch within an epoch, the model performs a forward pass to compute the predicted outputs, followed by a backward pass to compute the gradients of the loss function with respect to the model parameters.
2. **Gradient Descent:** The optimization algorithm (e.g., Adam) uses the computed gradients to update the weights of the neural network, minimizing the loss function and improving the model's performance.
3. **Validation:** At the end of each epoch, if validation data is provided, the model's performance is evaluated on the validation set using the specified metrics (e.g., accuracy).
4. **Monitoring Training Progress:** Throughout the training process, the model's performance metrics on the training and validation sets are logged, allowing for monitoring of training progress and detection of overfitting or underfitting.

Training a neural network involves finding the optimal set of weights that minimize the loss function and generalize well to unseen data.

**Example:**
In this example, the model is trained for 5 epochs using a batch size of 64. Training data (`train_data`) and corresponding labels (`train_labels`) are provided. Additionally, the model's performance is evaluated on the test set (`test_data`) and test labels (`test_labels`) after each epoch.
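
With these settings, the number of weight updates follows from simple arithmetic. The sketch assumes the 25,000 training reviews reported by the shape check earlier and that the final, smaller batch is still used.

```python
import math

num_train_reviews = 25000
batch_size = 64
epochs = 5

steps_per_epoch = math.ceil(num_train_reviews / batch_size)           # batches per epoch
last_batch_size = num_train_reviews - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 391 40 1955
```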



In [9]:
epochs = 5
batch_size = 64

model.fit(train_data, train_labels, epochs=epochs, batch_size=batch_size,
          validation_data=(test_data, test_labels))


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x177dac46bb0>

## Model Evaluation:
After training the model, it's essential to evaluate its performance on unseen data to assess its generalization capabilities.

### Parameters:
- `test_data`: Input test data, typically features or input sequences.
- `test_labels`: True labels corresponding to the test data.

### How it Works:
1. **Forward Pass:** The model performs a forward pass on the test data to compute the predicted outputs.
2. **Loss Computation:** The loss function is computed using the predicted outputs and the true labels.
3. **Metric Calculation:** If specified during model compilation, additional metrics (e.g., accuracy) are calculated based on the predicted outputs and true labels.
4. **Evaluation:** The model's performance metrics, such as loss and accuracy, are returned as output.

Model evaluation provides insights into how well the trained model generalizes to unseen data and helps identify potential issues, such as overfitting or underfitting.

**Example:**
In this example, the model's performance is evaluated on the test data (`test_data`) and corresponding labels (`test_labels`). The evaluation results include the test loss and test accuracy.



In [11]:
test_loss, test_acc = model.evaluate(test_data, test_labels)
print(f"Test accuracy: {test_acc:.4f}")


Test accuracy: 0.8942
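
For reference, the accuracy metric can be reproduced by thresholding the predicted probabilities and counting matches. The sketch below uses made-up labels and probabilities and a hypothetical `accuracy` helper; it is not the Keras implementation.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(int(pred == y) for pred, y in zip(predictions, y_true))
    return correct / len(y_true)

toy_labels = [1, 0, 1, 1, 0]
toy_probs = [0.92, 0.15, 0.40, 0.77, 0.61]  # made-up model outputs

print(accuracy(toy_labels, toy_probs))  # 0.6
```

Here three of the five predictions match their labels; the third and fifth fall on the wrong side of the threshold.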
