<a href="https://colab.research.google.com/github/hurricane195/Intro-to-Deep-Learning/blob/Homework_5/HW5_P1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Problem 1: In this problem, similar to homework 3 problem 1, we focus on the language model we did in the lectures. However, we expand it to a much longer sequence. Here is the sequence:

“Next character prediction is a fundamental task in the field of natural language processing (NLP) that involves predicting the next character in a sequence of text based on the characters that precede it. This task is essential for various applications, including text auto-completion, spell checking, and even in the development of sophisticated AI models capable of generating human-like text.

At its core, next character prediction relies on statistical models or deep learning algorithms to analyze a given sequence of text and predict which character is most likely to follow. These predictions are based on patterns and relationships learned from large datasets of text during the training phase of the model.

One of the most popular approaches to next character prediction involves the use of Recurrent Neural Networks (RNNs), and more specifically, a variant called Long Short-Term Memory (LSTM) networks. RNNs are particularly well-suited for sequential data like text, as they can maintain information in 'memory' about previous characters to inform the prediction of the next character. LSTM networks enhance this capability by being able to remember long-term dependencies, making them even more effective for next character prediction tasks.

Training a model for next character prediction involves feeding it large amounts of text data, allowing it to learn the probability of each character's appearance following a sequence of characters. During this training process, the model adjusts its parameters to minimize the difference between its predictions and the actual outcomes, thus improving its predictive accuracy over time.

Once trained, the model can be used to predict the next character in a given piece of text by considering the sequence of characters that precede it. This can enhance user experience in text editing software, improve efficiency in coding environments with auto-completion features, and enable more natural interactions with AI-based chatbots and virtual assistants.

In summary, next character prediction plays a crucial role in enhancing the capabilities of various NLP applications, making text-based interactions more efficient, accurate, and human-like. Through the use of advanced machine learning models like RNNs and LSTMs, next character prediction continues to evolve, opening new possibilities for the future of text-based technology.”

Inspired by the course example, **train and validate a transformer model for learning the above sequence. Use sequence lengths of 10, 20, and 30 for your training. Feel free to adjust other network parameters. Report and compare training loss, validation accuracy, execution time for training, and computational and model size complexities against RNN-based approaches in Homework 3**.

In [None]:
# Using a modified example of Dr. Tabkhi's "RNN", available at https://github.com/HamedTabkhi/Intro-to-DL/blob/main/RNN.py
# Using a modified example of Dr. Tabkhi's "RNN-CharDataset", available at https://github.com/HamedTabkhi/Intro-to-DL/blob/main/RNN-CharDataset.py
# Using a modified example of Dr. Tabkhi's "transformer_encoder_nextcharactor", available at https://github.com/HamedTabkhi/Intro-to-DL/blob/main/transformer_encoder_nextcharactor.py
# Occasional help from ChatGPT on formatting, syntax, etc.
# Occasional help from Colab AI on formatting, syntax, etc.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from sklearn.model_selection import train_test_split
import time

In [None]:
# Check for CUDA support and set the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

In [None]:
# Sample text
# The string is split into two adjacent literals; Python concatenates them into one string.
text = (
    "Next character prediction is a fundamental task in the field of natural language processing (NLP) that involves predicting the next character in a sequence of text based on the characters that precede it. This task is essential for various applications, including text auto-completion, spell checking, and even in the development of sophisticated AI models capable of generating human-like text. At its core, next character prediction relies on statistical models or deep learning algorithms to analyze a given sequence of text and predict which character is most likely to follow. These predictions are based on patterns and relationships learned from large datasets of text during the training phase of the model. One of the most popular approaches to next character prediction involves the use of Recurrent Neural Networks (RNNs), and more specifically, a variant called Long Short-Term Memory (LSTM) networks. RNNs are particularly well-suited for sequential data like text, as they can maintain information in 'memory' about previous characters to inform the prediction of the next character. LSTM networks enhance this capability by being able to remember long-term dependencies, making them even more effective for next character prediction tasks. Training a model for next character prediction involves feeding it large amounts of text data, allowing it to learn the probability of each character's appearance following a sequence of characters. During this training process, the model adjusts its parameters to minimize the difference between its predictions and the actual outcomes, thus improving its predictive accuracy over time. Once trained, the model can be used to predict the next character in a given piece of text by considering the sequence of characters that precede it. "
    "This can enhance user experience in text editing software, improve efficiency in coding environments with auto-completion features, and enable more natural interactions with AI-based chatbots and virtual assistants. In summary, next character prediction plays a crucial role in enhancing the capabilities of various NLP applications, making text-based interactions more efficient, accurate, and human-like. Through the use of advanced machine learning models like RNNs and LSTMs, next character prediction continues to evolve, opening new possibilities for the future of text-based technology."
)
# Creating the character vocabulary (part of the data preprocessing step for a
# character-level text modeling task): build mappings between the characters
# in the text and numerical indices.

# set(text) collects the unique characters found in the text (duplicates are removed).
# sorted(...) returns them as a sorted list, so each character gets a stable index.
chars = sorted(set(text))
# Dictionary that maps each unique index (integer) back to its corresponding character.
ix_to_char = {i: ch for i, ch in enumerate(chars)}
# The reverse mapping: each character to its unique index (integer).
char_to_ix = {ch: i for i, ch in enumerate(chars)}
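
As a quick sanity check of the two mappings, the sketch below encodes a short string to indices and decodes it back. The helper names `encode` and `decode` and the short `sample_text` are illustrative assumptions, not part of the original notebook.

```python
# Minimal sketch: round trip through the character/index mappings.
sample_text = "next character prediction"  # stand-in for the full `text`

chars = sorted(set(sample_text))
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

def encode(s):
    """Map each character to its integer index (hypothetical helper)."""
    return [char_to_ix[ch] for ch in s]

def decode(indices):
    """Map integer indices back to a string (hypothetical helper)."""
    return "".join(ix_to_char[i] for i in indices)

encoded = encode("next")
print(encoded)
print(decode(encoded))  # next
```

Any character that does not occur in the training text has no entry in `char_to_ix`, so `encode` raises a `KeyError` for it.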

**MAXIMUM LENGTH OF INPUT SEQUENCES = 10**

---



In [None]:
# Preparing the dataset
max_length = 10  # Maximum length of input sequences
X = []
y = []
for i in range(len(text) - max_length):
    sequence = text[i:i + max_length]
    label = text[i + max_length]
    X.append([char_to_ix[char] for char in sequence])
    y.append(char_to_ix[label])

X = np.array(X)
y = np.array(y)
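
The loop above slides a fixed-size window over the text: each window of `max_length` characters is an input, and the character immediately after it is the label. A toy version, with the hypothetical names `toy_text` and `window` standing in for `text` and `max_length`, makes the pairing visible:

```python
# Minimal sketch of the sliding-window construction on a toy string.
toy_text = "hello"
window = 3  # plays the role of max_length

pairs = []
for i in range(len(toy_text) - window):
    context = toy_text[i:i + window]  # input characters
    target = toy_text[i + window]     # character to predict
    pairs.append((context, target))

print(pairs)  # [('hel', 'l'), ('ell', 'o')]
```

A text of length `n` therefore yields `n - max_length` samples, so longer windows produce slightly fewer training examples from the same text.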

In [None]:
# Splitting the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
# Converting data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.long)
y_train = torch.tensor(y_train, dtype=torch.long)
X_val = torch.tensor(X_val, dtype=torch.long)
y_val = torch.tensor(y_val, dtype=torch.long)

In [None]:
# Defining the Transformer model
class CharTransformer(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, nhead):
        super(CharTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        # Note: batch_first defaults to False, so the encoder reads its input as
        # (seq_len, batch, hidden_size). No positional encoding is added here.
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)  # (batch, seq_len) -> (batch, seq_len, hidden_size)
        transformer_output = self.transformer_encoder(embedded)
        output = self.fc(transformer_output[:, -1, :])  # Logits from the encoder output at index -1 of dim 1
        return output
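
One detail worth checking in the model above: `nn.TransformerEncoderLayer` defaults to `batch_first=False`, so it expects input shaped `(seq_len, batch, d_model)`, while `self.embedding(x)` produces `(batch, seq_len, d_model)`. The sketch below is a hypothetical variant, not the configuration that produced the results reported in this notebook. The class name `CharTransformerBF`, the learned positional embedding, and the vocabulary size of 44 are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CharTransformerBF(nn.Module):
    """Hypothetical variant: batch-first encoder plus learned positions."""

    def __init__(self, vocab_size, hidden_size, num_layers, nhead, max_len):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        # Learned positional embedding (an assumption; other encodings work too).
        self.pos_embedding = nn.Embedding(max_len, hidden_size)
        # batch_first=True makes the encoder read (batch, seq_len, hidden_size).
        layer = nn.TransformerEncoderLayer(hidden_size, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):
        # x: (batch, seq_len) of character indices
        positions = torch.arange(x.size(1), device=x.device)
        h = self.embedding(x) + self.pos_embedding(positions)
        h = self.encoder(h)          # attention runs over the characters of each sequence
        return self.fc(h[:, -1, :])  # logits from the last sequence position

# Shape check with random indices (vocabulary size 44 is an assumed value).
model_bf = CharTransformerBF(vocab_size=44, hidden_size=120, num_layers=3, nhead=2, max_len=10)
dummy = torch.randint(0, 44, (4, 10))
logits = model_bf(dummy)
print(logits.shape)  # torch.Size([4, 44])
```

Because `pos_embedding` adds `max_len * hidden_size` parameters, the parameter count of this variant depends on the maximum sequence length, unlike the model above.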

In [None]:
# Hyperparameters
hidden_size = 120
num_layers = 3
nhead = 2
learning_rate = 0.001
epochs = 100

In [None]:
# Model, loss, and optimizer
model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)



In [None]:
#count trainable parameters of the model
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

count_parameters(model)

1667348
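
The reported count can be reproduced by hand. The sketch below assumes the PyTorch default `dim_feedforward=2048` for `nn.TransformerEncoderLayer` and a vocabulary of 44 characters. The vocabulary size is not printed in the notebook; it is inferred here as the value that makes the total match.

```python
# Minimal sketch: parameter count of CharTransformer by component.
d_model = 120
num_layers = 3
dim_feedforward = 2048  # PyTorch default for nn.TransformerEncoderLayer
vocab_size = 44         # assumption: inferred so that the total matches

embedding = vocab_size * d_model
attn_in_proj = 3 * d_model * d_model + 3 * d_model   # Q, K, V weights + biases
attn_out_proj = d_model * d_model + d_model
ffn = (d_model * dim_feedforward + dim_feedforward) + (dim_feedforward * d_model + d_model)
layer_norms = 2 * (2 * d_model)                      # two LayerNorms, weight + bias each
per_layer = attn_in_proj + attn_out_proj + ffn + layer_norms
output_head = d_model * vocab_size + vocab_size

total = embedding + num_layers * per_layer + output_head
print(per_layer)  # 552248
print(total)      # 1667348
```

The two feed-forward matrices account for 493,688 of the 552,248 parameters in each layer, so `dim_feedforward` is the setting with the largest effect on model size. None of the terms depends on the input sequence length.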

In [None]:
# Training the model
start_time = time.time()
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()

    # Training accuracy
    with torch.no_grad():
        _, predicted_train = torch.max(output, 1)
        train_accuracy = (predicted_train == y_train).float().mean()

    # Validation
    model.eval()
    with torch.no_grad():
        val_output = model(X_val)
        val_loss = criterion(val_output, y_val)
        _, predicted_val = torch.max(val_output, 1)
        val_accuracy = (predicted_val == y_val).float().mean()

    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, Training Loss: {loss.item():.4f}, Training Accuracy: {train_accuracy.item():.4f}, Validation Loss: {val_loss.item():.4f}, Validation Accuracy: {val_accuracy.item():.4f}')

end_time = time.time()
training_time = end_time - start_time
print(f"Training time: {training_time} seconds")

Epoch 10, Training Loss: 2.5545, Training Accuracy: 0.2679, Validation Loss: 2.5211, Validation Accuracy: 0.2836
Epoch 20, Training Loss: 2.3817, Training Accuracy: 0.2674, Validation Loss: 2.4009, Validation Accuracy: 0.2878
Epoch 30, Training Loss: 2.2987, Training Accuracy: 0.2821, Validation Loss: 2.3347, Validation Accuracy: 0.2920
Epoch 40, Training Loss: 2.2609, Training Accuracy: 0.2805, Validation Loss: 2.3274, Validation Accuracy: 0.2962
Epoch 50, Training Loss: 2.2384, Training Accuracy: 0.2826, Validation Loss: 2.3164, Validation Accuracy: 0.2962
Epoch 60, Training Loss: 2.2282, Training Accuracy: 0.2832, Validation Loss: 2.3187, Validation Accuracy: 0.2878
Epoch 70, Training Loss: 2.2151, Training Accuracy: 0.2795, Validation Loss: 2.3143, Validation Accuracy: 0.2878
Epoch 80, Training Loss: 2.2096, Training Accuracy: 0.2842, Validation Loss: 2.3178, Validation Accuracy: 0.2920
Epoch 90, Training Loss: 2.2093, Training Accuracy: 0.2853, Validation Loss: 2.3161, Validation 
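
In the log above the validation loss barely changes after epoch 50 (2.3164 at epoch 50, 2.3161 at epoch 90) while the training loss keeps falling (2.2384 to 2.2093). A minimal sketch, using only the values printed above, picks the logged epoch with the lowest validation loss. The name `val_loss_log` is an assumption for illustration.

```python
# Validation losses copied from the log above (sequence length 10, epochs 10-90).
val_loss_log = {
    10: 2.5211, 20: 2.4009, 30: 2.3347, 40: 2.3274, 50: 2.3164,
    60: 2.3187, 70: 2.3143, 80: 2.3178, 90: 2.3161,
}

# Logged epoch with the lowest validation loss.
best_epoch = min(val_loss_log, key=val_loss_log.get)
print(best_epoch, val_loss_log[best_epoch])  # 70 2.3143

# Improvement between epoch 50 and the best logged epoch.
gain_after_50 = val_loss_log[50] - val_loss_log[best_epoch]
print(round(gain_after_50, 4))  # 0.0021
```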

In [None]:
# Prediction function
def predict_next_char(model, char_to_ix, ix_to_char, initial_str):
    model.eval()
    with torch.no_grad():
        initial_input = torch.tensor([char_to_ix[c] for c in initial_str[-max_length:]], dtype=torch.long).unsqueeze(0)
        prediction = model(initial_input)
        predicted_index = torch.argmax(prediction, dim=1).item()
        return ix_to_char[predicted_index]

# Predicting the next character
test_str = "This is a simple example to demonstrate how to predict the next char"
predicted_char = predict_next_char(model, char_to_ix, ix_to_char, test_str)
print(f"Predicted next character: '{predicted_char}'")

Predicted next character: 'e'
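
The prediction function returns a single character. Generating longer text amounts to appending each prediction to the context and predicting again. The sketch below is model-agnostic: `generate` and `toy_predict` are hypothetical names, and `toy_predict` stands in for a call to `predict_next_char`.

```python
def generate(predict_fn, seed, num_chars):
    """Greedy generation: repeatedly append the predicted next character."""
    out = seed
    for _ in range(num_chars):
        out += predict_fn(out)
    return out

def toy_predict(context):
    """Stand-in predictor: cycles a -> b -> c -> a based on the last character."""
    cycle = "abc"
    return cycle[(cycle.index(context[-1]) + 1) % len(cycle)]

print(generate(toy_predict, "a", 5))  # abcabc
```

With the trained model, the predictor could be passed as `lambda s: predict_next_char(model, char_to_ix, ix_to_char, s)`, since `predict_next_char` already truncates its input to the last `max_length` characters.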


**MAXIMUM LENGTH OF INPUT SEQUENCES = 20**

In [None]:
# Preparing the dataset
max_length = 20  # Maximum length of input sequences
X = []
y = []
for i in range(len(text) - max_length):
    sequence = text[i:i + max_length]
    label = text[i + max_length]
    X.append([char_to_ix[char] for char in sequence])
    y.append(char_to_ix[label])

X = np.array(X)
y = np.array(y)

In [None]:
# Splitting the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
# Converting data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.long)
y_train = torch.tensor(y_train, dtype=torch.long)
X_val = torch.tensor(X_val, dtype=torch.long)
y_val = torch.tensor(y_val, dtype=torch.long)

In [None]:
# Defining the Transformer model
class CharTransformer(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, nhead):
        super(CharTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        # Note: batch_first defaults to False, so the encoder reads its input as
        # (seq_len, batch, hidden_size). No positional encoding is added here.
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)  # (batch, seq_len) -> (batch, seq_len, hidden_size)
        transformer_output = self.transformer_encoder(embedded)
        output = self.fc(transformer_output[:, -1, :])  # Logits from the encoder output at index -1 of dim 1
        return output

In [None]:
# Hyperparameters
hidden_size = 120
num_layers = 3
nhead = 2
learning_rate = 0.001
epochs = 100

In [None]:
# Model, loss, and optimizer
model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)



In [None]:
#count trainable parameters of the model
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

count_parameters(model)

1667348

In [None]:
# Training the model
start_time = time.time()
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()

    # Training accuracy
    with torch.no_grad():
        _, predicted_train = torch.max(output, 1)
        train_accuracy = (predicted_train == y_train).float().mean()

    # Validation
    model.eval()
    with torch.no_grad():
        val_output = model(X_val)
        val_loss = criterion(val_output, y_val)
        _, predicted_val = torch.max(val_output, 1)
        val_accuracy = (predicted_val == y_val).float().mean()

    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, Training Loss: {loss.item():.4f}, Training Accuracy: {train_accuracy.item():.4f}, Validation Loss: {val_loss.item():.4f}, Validation Accuracy: {val_accuracy.item():.4f}')

end_time = time.time()
training_time = end_time - start_time
print(f"Training time: {training_time} seconds")

Epoch 10, Training Loss: 2.6529, Training Accuracy: 0.2611, Validation Loss: 2.6204, Validation Accuracy: 0.2426
Epoch 20, Training Loss: 2.4279, Training Accuracy: 0.2685, Validation Loss: 2.4677, Validation Accuracy: 0.2785
Epoch 30, Training Loss: 2.3206, Training Accuracy: 0.2748, Validation Loss: 2.4138, Validation Accuracy: 0.2511
Epoch 40, Training Loss: 2.2675, Training Accuracy: 0.2807, Validation Loss: 2.3698, Validation Accuracy: 0.2637
Epoch 50, Training Loss: 2.2378, Training Accuracy: 0.2918, Validation Loss: 2.3578, Validation Accuracy: 0.2743
Epoch 60, Training Loss: 2.2242, Training Accuracy: 0.2801, Validation Loss: 2.3555, Validation Accuracy: 0.2827
Epoch 70, Training Loss: 2.2210, Training Accuracy: 0.2849, Validation Loss: 2.3646, Validation Accuracy: 0.2848
Epoch 80, Training Loss: 2.2077, Training Accuracy: 0.2854, Validation Loss: 2.3673, Validation Accuracy: 0.2848
Epoch 90, Training Loss: 2.2064, Training Accuracy: 0.2902, Validation Loss: 2.3741, Validation 

In [None]:
# Prediction function
def predict_next_char(model, char_to_ix, ix_to_char, initial_str):
    model.eval()
    with torch.no_grad():
        initial_input = torch.tensor([char_to_ix[c] for c in initial_str[-max_length:]], dtype=torch.long).unsqueeze(0)
        prediction = model(initial_input)
        predicted_index = torch.argmax(prediction, dim=1).item()
        return ix_to_char[predicted_index]

# Predicting the next character
test_str = "This is a simple example to demonstrate how to predict the next char"
predicted_char = predict_next_char(model, char_to_ix, ix_to_char, test_str)
print(f"Predicted next character: '{predicted_char}'")

Predicted next character: 'a'


**MAXIMUM LENGTH OF INPUT SEQUENCES = 30**

In [None]:
# Preparing the dataset
max_length = 30  # Maximum length of input sequences
X = []
y = []
for i in range(len(text) - max_length):
    sequence = text[i:i + max_length]
    label = text[i + max_length]
    X.append([char_to_ix[char] for char in sequence])
    y.append(char_to_ix[label])

X = np.array(X)
y = np.array(y)

In [None]:
# Splitting the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
# Converting data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.long)
y_train = torch.tensor(y_train, dtype=torch.long)
X_val = torch.tensor(X_val, dtype=torch.long)
y_val = torch.tensor(y_val, dtype=torch.long)

In [None]:
# Defining the Transformer model
class CharTransformer(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, nhead):
        super(CharTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        # Note: batch_first defaults to False, so the encoder reads its input as
        # (seq_len, batch, hidden_size). No positional encoding is added here.
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)  # (batch, seq_len) -> (batch, seq_len, hidden_size)
        transformer_output = self.transformer_encoder(embedded)
        output = self.fc(transformer_output[:, -1, :])  # Logits from the encoder output at index -1 of dim 1
        return output

In [None]:
# Hyperparameters
hidden_size = 120
num_layers = 3
nhead = 2
learning_rate = 0.001
epochs = 100

In [None]:
# Model, loss, and optimizer
model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

In [None]:
#count trainable parameters of the model
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

count_parameters(model)

1667348

In [None]:
# Training the model
start_time = time.time()
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()

    # Training accuracy
    with torch.no_grad():
        _, predicted_train = torch.max(output, 1)
        train_accuracy = (predicted_train == y_train).float().mean()

    # Validation
    model.eval()
    with torch.no_grad():
        val_output = model(X_val)
        val_loss = criterion(val_output, y_val)
        _, predicted_val = torch.max(val_output, 1)
        val_accuracy = (predicted_val == y_val).float().mean()

    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, Training Loss: {loss.item():.4f}, Training Accuracy: {train_accuracy.item():.4f}, Validation Loss: {val_loss.item():.4f}, Validation Accuracy: {val_accuracy.item():.4f}')

end_time = time.time()
training_time = end_time - start_time
print(f"Training time: {training_time} seconds")

Epoch 10, Training Loss: 2.6614, Training Accuracy: 0.2627, Validation Loss: 2.6663, Validation Accuracy: 0.2521
Epoch 20, Training Loss: 2.4267, Training Accuracy: 0.2792, Validation Loss: 2.5325, Validation Accuracy: 0.2331
Epoch 30, Training Loss: 2.3246, Training Accuracy: 0.2728, Validation Loss: 2.4597, Validation Accuracy: 0.2331
Epoch 40, Training Loss: 2.2603, Training Accuracy: 0.2813, Validation Loss: 2.4243, Validation Accuracy: 0.2458
Epoch 50, Training Loss: 2.2335, Training Accuracy: 0.2925, Validation Loss: 2.4265, Validation Accuracy: 0.2500
Epoch 60, Training Loss: 2.2186, Training Accuracy: 0.2919, Validation Loss: 2.4331, Validation Accuracy: 0.2415
Epoch 70, Training Loss: 2.2126, Training Accuracy: 0.2983, Validation Loss: 2.4369, Validation Accuracy: 0.2436
Epoch 80, Training Loss: 2.2022, Training Accuracy: 0.2856, Validation Loss: 2.4384, Validation Accuracy: 0.2458
Epoch 90, Training Loss: 2.1975, Training Accuracy: 0.2866, Validation Loss: 2.4393, Validation 

In [None]:
# Prediction function
def predict_next_char(model, char_to_ix, ix_to_char, initial_str):
    model.eval()
    with torch.no_grad():
        initial_input = torch.tensor([char_to_ix[c] for c in initial_str[-max_length:]], dtype=torch.long).unsqueeze(0)
        prediction = model(initial_input)
        predicted_index = torch.argmax(prediction, dim=1).item()
        return ix_to_char[predicted_index]

# Predicting the next character
test_str = "This is a simple example to demonstrate how to predict the next char"
predicted_char = predict_next_char(model, char_to_ix, ix_to_char, test_str)
print(f"Predicted next character: '{predicted_char}'")

Predicted next character: 'a'
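
To compare the three runs side by side, the sketch below tabulates the epoch-80 metrics copied from the three logs above. Epoch 80 is used because it is the last logged epoch whose line is complete in all three outputs. The dictionary name `runs` is an assumption for illustration. Training time and the Homework 3 RNN figures are not included because they do not appear in the outputs shown here.

```python
# Metrics copied from the epoch-80 log lines of the three runs above.
runs = {
    10: {"train_loss": 2.2096, "val_loss": 2.3178, "val_acc": 0.2920},
    20: {"train_loss": 2.2077, "val_loss": 2.3673, "val_acc": 0.2848},
    30: {"train_loss": 2.2022, "val_loss": 2.4384, "val_acc": 0.2458},
}

# Print a small comparison table.
print(f"{'seq_len':>7} {'train_loss':>10} {'val_loss':>8} {'val_acc':>7}")
for seq_len, m in runs.items():
    print(f"{seq_len:>7} {m['train_loss']:>10.4f} {m['val_loss']:>8.4f} {m['val_acc']:>7.4f}")

# Sequence length with the highest validation accuracy at epoch 80.
best_seq_len = max(runs, key=lambda k: runs[k]["val_acc"])
print(best_seq_len)  # 10
```

On these logged values the training loss is nearly identical across the three sequence lengths, while the validation loss rises and the validation accuracy falls as the sequence length grows.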
