<a href="https://colab.research.google.com/github/elsaimo/ECGR4106/blob/main/Homework_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Problem 1:

sequence lengths of 10

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import time
from sklearn.model_selection import train_test_split

text = """Next character prediction is a fundamental task in the field of natural language processing (NLP) that involves predicting the next character in a sequence of text based on the characters that precede it. This task is essential for various applications, including text auto-completion, spell checking, and even in the development of sophisticated AI models capable of generating human-like text.
At its core, next character prediction relies on statistical models or deep learning algorithms to analyze a given sequence of text and predict which character is most likely to follow. These predictions are based on patterns and relationships learned from large datasets of text during the training phase of the model.
One of the most popular approaches to next character prediction involves the use of Recurrent Neural Networks (RNNs), and more specifically, a variant called Long Short-Term Memory (LSTM) networks. RNNs are particularly well-suited for sequential data like text, as they can maintain information in 'memory' about previous characters to inform the prediction of the next character. LSTM networks enhance this capability by being able to remember long-term dependencies, making them even more effective for next character prediction tasks.
Training a model for next character prediction involves feeding it large amounts of text data, allowing it to learn the probability of each character's appearance following a sequence of characters. During this training process, the model adjusts its parameters to minimize the difference between its predictions and the actual outcomes, thus improving its predictive accuracy over time.
Once trained, the model can be used to predict the next character in a given piece of text by considering the sequence of characters that precede it. This can enhance user experience in text editing software, improve efficiency in coding environments with auto-completion features, and enable more natural interactions with AI-based chatbots and virtual assistants.
In summary, next character prediction plays a crucial role in enhancing the capabilities of various NLP applications, making text-based interactions more efficient, accurate, and human-like. Through the use of advanced machine learning models like RNNs and LSTMs, next character prediction continues to evolve, opening new possibilities for the future of text-based technology."""

# Creating character vocabulary
chars = sorted(list(set(text)))
ix_to_char = {i: ch for i, ch in enumerate(chars)}
char_to_ix = {ch: i for i, ch in enumerate(chars)}

# Preparing the dataset
max_length = 10  # Maximum length of input sequences
X = []
y = []
for i in range(len(text) - max_length):
    sequence = text[i:i + max_length]
    label = text[i + max_length]
    X.append([char_to_ix[char] for char in sequence])
    y.append(char_to_ix[label])

X = np.array(X)
y = np.array(y)

# Splitting the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Converting data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.long)
y_train = torch.tensor(y_train, dtype=torch.long)
X_val = torch.tensor(X_val, dtype=torch.long)
y_val = torch.tensor(y_val, dtype=torch.long)

# Defining the Transformer model
class CharTransformer(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, nhead):
        super(CharTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)
        transformer_output = self.transformer_encoder(embedded)
        output = self.fc(transformer_output[:, -1, :])  # Get the output of the last Transformer block
        return output

# Hyperparameters
hidden_size = 256
num_layers = 3
nhead = 2
learning_rate = 0.0001
epochs = 100

# Model, loss, and optimizer
model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training the model
start_time = time.time()
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()
    end_time = time.time()
    training_time = end_time - start_time

    # Validation
    model.eval()
    with torch.no_grad():
        val_output = model(X_val)
        val_loss = criterion(val_output, y_val)
        _, predicted = torch.max(val_output, 1)
        val_accuracy = (predicted == y_val).float().mean()

    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, Loss: {loss.item()}, Validation Loss: {val_loss.item()}, Validation Accuracy: {val_accuracy.item()}')

# Prediction function
def predict_next_char(model, char_to_ix, ix_to_char, initial_str):
    model.eval()
    with torch.no_grad():
        initial_input = torch.tensor([char_to_ix[c] for c in initial_str[-max_length:]], dtype=torch.long).unsqueeze(0)
        prediction = model(initial_input)
        predicted_index = torch.argmax(prediction, dim=1).item()
        return ix_to_char[predicted_index]

# Counting model complexity
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Predicting the next character
test_str = "This is a simple example to demonstrate how to predict the next char"
predicted_char = predict_next_char(model, char_to_ix, ix_to_char, test_str)
print(f"Predicted next character: '{predicted_char}'")
print(f"Training time:{training_time} seconds")
print(f"Number of trainable parameters in the model: {count_parameters(model)}")

# Compute computational complexity
embedding_complexity = X_train.shape[1] * hidden_size
rnn_complexity = X_train.shape[1] * hidden_size * hidden_size
linear_complexity = hidden_size * len(chars)

total_complexity = embedding_complexity + rnn_complexity + linear_complexity
print(f"Total computational complexity: {total_complexity}")



Epoch 10, Loss: 2.848618984222412, Validation Loss: 2.737607002258301, Validation Accuracy: 0.2710084021091461
Epoch 20, Loss: 2.584132432937622, Validation Loss: 2.5279035568237305, Validation Accuracy: 0.2857142984867096
Epoch 30, Loss: 2.456926107406616, Validation Loss: 2.4384562969207764, Validation Accuracy: 0.27941176295280457
Epoch 40, Loss: 2.392003297805786, Validation Loss: 2.401851177215576, Validation Accuracy: 0.2710084021091461
Epoch 50, Loss: 2.343634843826294, Validation Loss: 2.380768060684204, Validation Accuracy: 0.2710084021091461
Epoch 60, Loss: 2.3137946128845215, Validation Loss: 2.352088212966919, Validation Accuracy: 0.28151261806488037
Epoch 70, Loss: 2.2839298248291016, Validation Loss: 2.334244966506958, Validation Accuracy: 0.287815123796463
Epoch 80, Loss: 2.263580799102783, Validation Loss: 2.3208870887756348, Validation Accuracy: 0.287815123796463
Epoch 90, Loss: 2.252814769744873, Validation Loss: 2.3126070499420166, Validation Accuracy: 0.281512618064

sequence lengths of 20

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import time
from sklearn.model_selection import train_test_split

text = """Next character prediction is a fundamental task in the field of natural language processing (NLP) that involves predicting the next character in a sequence of text based on the characters that precede it. This task is essential for various applications, including text auto-completion, spell checking, and even in the development of sophisticated AI models capable of generating human-like text.
At its core, next character prediction relies on statistical models or deep learning algorithms to analyze a given sequence of text and predict which character is most likely to follow. These predictions are based on patterns and relationships learned from large datasets of text during the training phase of the model.
One of the most popular approaches to next character prediction involves the use of Recurrent Neural Networks (RNNs), and more specifically, a variant called Long Short-Term Memory (LSTM) networks. RNNs are particularly well-suited for sequential data like text, as they can maintain information in 'memory' about previous characters to inform the prediction of the next character. LSTM networks enhance this capability by being able to remember long-term dependencies, making them even more effective for next character prediction tasks.
Training a model for next character prediction involves feeding it large amounts of text data, allowing it to learn the probability of each character's appearance following a sequence of characters. During this training process, the model adjusts its parameters to minimize the difference between its predictions and the actual outcomes, thus improving its predictive accuracy over time.
Once trained, the model can be used to predict the next character in a given piece of text by considering the sequence of characters that precede it. This can enhance user experience in text editing software, improve efficiency in coding environments with auto-completion features, and enable more natural interactions with AI-based chatbots and virtual assistants.
In summary, next character prediction plays a crucial role in enhancing the capabilities of various NLP applications, making text-based interactions more efficient, accurate, and human-like. Through the use of advanced machine learning models like RNNs and LSTMs, next character prediction continues to evolve, opening new possibilities for the future of text-based technology."""

# Creating character vocabulary
chars = sorted(list(set(text)))
ix_to_char = {i: ch for i, ch in enumerate(chars)}
char_to_ix = {ch: i for i, ch in enumerate(chars)}

# Preparing the dataset
max_length = 20  # Maximum length of input sequences
X = []
y = []
for i in range(len(text) - max_length):
    sequence = text[i:i + max_length]
    label = text[i + max_length]
    X.append([char_to_ix[char] for char in sequence])
    y.append(char_to_ix[label])

X = np.array(X)
y = np.array(y)

# Splitting the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Converting data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.long)
y_train = torch.tensor(y_train, dtype=torch.long)
X_val = torch.tensor(X_val, dtype=torch.long)
y_val = torch.tensor(y_val, dtype=torch.long)

# Defining the Transformer model
class CharTransformer(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, nhead):
        super(CharTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)
        transformer_output = self.transformer_encoder(embedded)
        output = self.fc(transformer_output[:, -1, :])  # Get the output of the last Transformer block
        return output

# Hyperparameters
hidden_size = 256
num_layers = 3
nhead = 2
learning_rate = 0.0001
epochs = 100

# Model, loss, and optimizer
model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training the model
start_time = time.time()
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()
    end_time = time.time()
    training_time = end_time - start_time

    # Validation
    model.eval()
    with torch.no_grad():
        val_output = model(X_val)
        val_loss = criterion(val_output, y_val)
        _, predicted = torch.max(val_output, 1)
        val_accuracy = (predicted == y_val).float().mean()

    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, Loss: {loss.item()}, Validation Loss: {val_loss.item()}, Validation Accuracy: {val_accuracy.item()}')

# Prediction function
def predict_next_char(model, char_to_ix, ix_to_char, initial_str):
    model.eval()
    with torch.no_grad():
        initial_input = torch.tensor([char_to_ix[c] for c in initial_str[-max_length:]], dtype=torch.long).unsqueeze(0)
        prediction = model(initial_input)
        predicted_index = torch.argmax(prediction, dim=1).item()
        return ix_to_char[predicted_index]

# Counting model complexity
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Predicting the next character
test_str = "This is a simple example to demonstrate how to predict the next char"
predicted_char = predict_next_char(model, char_to_ix, ix_to_char, test_str)
print(f"Predicted next character: '{predicted_char}'")
print(f"Training time:{training_time} seconds")
print(f"Number of trainable parameters in the model: {count_parameters(model)}")

# Compute computational complexity
embedding_complexity = X_train.shape[1] * hidden_size
rnn_complexity = X_train.shape[1] * hidden_size * hidden_size
linear_complexity = hidden_size * len(chars)

total_complexity = embedding_complexity + rnn_complexity + linear_complexity
print(f"Total computational complexity: {total_complexity}")



Epoch 10, Loss: 2.8427894115448, Validation Loss: 2.7786269187927246, Validation Accuracy: 0.22995780408382416
Epoch 20, Loss: 2.581909418106079, Validation Loss: 2.5715394020080566, Validation Accuracy: 0.2405063360929489
Epoch 30, Loss: 2.4562981128692627, Validation Loss: 2.4732918739318848, Validation Accuracy: 0.24894514679908752
Epoch 40, Loss: 2.38381028175354, Validation Loss: 2.436664581298828, Validation Accuracy: 0.24894514679908752
Epoch 50, Loss: 2.339353322982788, Validation Loss: 2.4177231788635254, Validation Accuracy: 0.26582279801368713
Epoch 60, Loss: 2.3023862838745117, Validation Loss: 2.4011006355285645, Validation Accuracy: 0.2552742660045624
Epoch 70, Loss: 2.2743022441864014, Validation Loss: 2.3890838623046875, Validation Accuracy: 0.2594936788082123
Epoch 80, Loss: 2.2631657123565674, Validation Loss: 2.3802390098571777, Validation Accuracy: 0.2531645596027374
Epoch 90, Loss: 2.2534725666046143, Validation Loss: 2.373762369155884, Validation Accuracy: 0.25949

sequence lengths of 30

In [3]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import time
from sklearn.model_selection import train_test_split

text = """Next character prediction is a fundamental task in the field of natural language processing (NLP) that involves predicting the next character in a sequence of text based on the characters that precede it. This task is essential for various applications, including text auto-completion, spell checking, and even in the development of sophisticated AI models capable of generating human-like text.
At its core, next character prediction relies on statistical models or deep learning algorithms to analyze a given sequence of text and predict which character is most likely to follow. These predictions are based on patterns and relationships learned from large datasets of text during the training phase of the model.
One of the most popular approaches to next character prediction involves the use of Recurrent Neural Networks (RNNs), and more specifically, a variant called Long Short-Term Memory (LSTM) networks. RNNs are particularly well-suited for sequential data like text, as they can maintain information in 'memory' about previous characters to inform the prediction of the next character. LSTM networks enhance this capability by being able to remember long-term dependencies, making them even more effective for next character prediction tasks.
Training a model for next character prediction involves feeding it large amounts of text data, allowing it to learn the probability of each character's appearance following a sequence of characters. During this training process, the model adjusts its parameters to minimize the difference between its predictions and the actual outcomes, thus improving its predictive accuracy over time.
Once trained, the model can be used to predict the next character in a given piece of text by considering the sequence of characters that precede it. This can enhance user experience in text editing software, improve efficiency in coding environments with auto-completion features, and enable more natural interactions with AI-based chatbots and virtual assistants.
In summary, next character prediction plays a crucial role in enhancing the capabilities of various NLP applications, making text-based interactions more efficient, accurate, and human-like. Through the use of advanced machine learning models like RNNs and LSTMs, next character prediction continues to evolve, opening new possibilities for the future of text-based technology."""

# Creating character vocabulary
chars = sorted(list(set(text)))
ix_to_char = {i: ch for i, ch in enumerate(chars)}
char_to_ix = {ch: i for i, ch in enumerate(chars)}

# Preparing the dataset
max_length = 30  # Maximum length of input sequences
X = []
y = []
for i in range(len(text) - max_length):
    sequence = text[i:i + max_length]
    label = text[i + max_length]
    X.append([char_to_ix[char] for char in sequence])
    y.append(char_to_ix[label])

X = np.array(X)
y = np.array(y)

# Splitting the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Converting data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.long)
y_train = torch.tensor(y_train, dtype=torch.long)
X_val = torch.tensor(X_val, dtype=torch.long)
y_val = torch.tensor(y_val, dtype=torch.long)

# Defining the Transformer model
class CharTransformer(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, nhead):
        super(CharTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)
        transformer_output = self.transformer_encoder(embedded)
        output = self.fc(transformer_output[:, -1, :])  # Get the output of the last Transformer block
        return output

# Hyperparameters
hidden_size = 256
num_layers = 3
nhead = 2
learning_rate = 0.0001
epochs = 100

# Model, loss, and optimizer
model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training the model
start_time = time.time()
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()
    end_time = time.time()
    training_time = end_time - start_time

    # Validation
    model.eval()
    with torch.no_grad():
        val_output = model(X_val)
        val_loss = criterion(val_output, y_val)
        _, predicted = torch.max(val_output, 1)
        val_accuracy = (predicted == y_val).float().mean()

    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, Loss: {loss.item()}, Validation Loss: {val_loss.item()}, Validation Accuracy: {val_accuracy.item()}')

# Prediction function
def predict_next_char(model, char_to_ix, ix_to_char, initial_str):
    model.eval()
    with torch.no_grad():
        initial_input = torch.tensor([char_to_ix[c] for c in initial_str[-max_length:]], dtype=torch.long).unsqueeze(0)
        prediction = model(initial_input)
        predicted_index = torch.argmax(prediction, dim=1).item()
        return ix_to_char[predicted_index]

# Counting model complexity
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Predicting the next character
test_str = "This is a simple example to demonstrate how to predict the next char"
predicted_char = predict_next_char(model, char_to_ix, ix_to_char, test_str)
print(f"Predicted next character: '{predicted_char}'")
print(f"Training time:{training_time} seconds")
print(f"Number of trainable parameters in the model: {count_parameters(model)}")

# Compute computational complexity
embedding_complexity = X_train.shape[1] * hidden_size
rnn_complexity = X_train.shape[1] * hidden_size * hidden_size
linear_complexity = hidden_size * len(chars)

total_complexity = embedding_complexity + rnn_complexity + linear_complexity
print(f"Total computational complexity: {total_complexity}")

Epoch 10, Loss: 2.7806179523468018, Validation Loss: 2.7803876399993896, Validation Accuracy: 0.23516948521137238
Epoch 20, Loss: 2.535264730453491, Validation Loss: 2.5980286598205566, Validation Accuracy: 0.23728813230991364
Epoch 30, Loss: 2.4268832206726074, Validation Loss: 2.5351672172546387, Validation Accuracy: 0.24152542650699615
Epoch 40, Loss: 2.3692431449890137, Validation Loss: 2.503262519836426, Validation Accuracy: 0.2330508530139923
Epoch 50, Loss: 2.329648971557617, Validation Loss: 2.4775948524475098, Validation Accuracy: 0.23516948521137238
Epoch 60, Loss: 2.2977283000946045, Validation Loss: 2.453031301498413, Validation Accuracy: 0.24152542650699615
Epoch 70, Loss: 2.2705113887786865, Validation Loss: 2.441943645477295, Validation Accuracy: 0.24576270580291748
Epoch 80, Loss: 2.259186267852783, Validation Loss: 2.429812431335449, Validation Accuracy: 0.24788135290145874
Epoch 90, Loss: 2.2428393363952637, Validation Loss: 2.42517352104187, Validation Accuracy: 0.24

# Problem 2:

1-

In [4]:
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import numpy as np
import requests
import time

# Step 1: Download the dataset
url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
response = requests.get(url)
text = response.text  # This is the entire text data

# Step 2: Prepare the dataset
sequence_length = 20
# Create a character mapping to integers
chars = sorted(list(set(text)))
char_to_int = {ch: i for i, ch in enumerate(chars)}
int_to_char = {i: ch for i, ch in enumerate(chars)}

# Encode the text into integers
encoded_text = [char_to_int[ch] for ch in text]

# Create sequences and targets
sequences = []
targets = []
for i in range(0, len(encoded_text) - sequence_length):
    seq = encoded_text[i:i+sequence_length]
    target = encoded_text[i+sequence_length]
    sequences.append(seq)
    targets.append(target)

# Convert lists to PyTorch tensors
sequences = torch.tensor(sequences, dtype=torch.long)
targets = torch.tensor(targets, dtype=torch.long)

# Step 3: Create a dataset class
class CharDataset(Dataset):
    def __init__(self, sequences, targets):
        self.sequences = sequences
        self.targets = targets

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, index):
        return self.sequences[index], self.targets[index]

# Instantiate the dataset
dataset = CharDataset(sequences, targets)

# Step 4: Create data loaders
batch_size = 1024
train_size = int(len(dataset) * 0.8)
test_size = len(dataset) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(dataset, [train_size, test_size])

# Check if CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

# Defining the Transformer model
class CharTransformer(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, nhead):
        super(CharTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)
        transformer_output = self.transformer_encoder(embedded.permute(1, 0, 2))  # Permute for Transformer
        output = self.fc(transformer_output[-1, :, :])  # Get the output of the last Transformer block
        return output

# Hyperparameters
hidden_size = 128
num_layers = 2
nhead = 1
learning_rate = 0.001
epochs = 100

# Model, loss, and optimizer
model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training the model
start_time = time.time()
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)  # Move to GPU if available
        optimizer.zero_grad()
        output = model(inputs)
        loss = criterion(output, targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    # Validation
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)  # Move to GPU if available
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += targets.size(0)
            correct += (predicted == targets).sum().item()
            val_loss += criterion(outputs, targets).item()

    end_time = time.time()
    training_time = end_time - start_time

    # Print statistics
    epoch_loss = running_loss / len(train_loader)
    epoch_val_loss = val_loss / len(test_loader)
    val_accuracy = correct / total
    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, Loss: {epoch_loss}, Validation Loss: {epoch_val_loss}, Validation Accuracy: {val_accuracy}')

# Prediction function
def predict_next_char(model, char_to_ix, ix_to_char, initial_str):
    model.eval()
    with torch.no_grad():
        initial_input = torch.tensor([char_to_int[c] for c in initial_str[-sequence_length:]], dtype=torch.long).unsqueeze(0).to(device)
        prediction = model(initial_input)
        predicted_index = torch.argmax(prediction, dim=1).item()
        return int_to_char[predicted_index]

# Counting model complexity
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Predicting the next character
test_str = "This is a simple example to demonstrate how to predict the next char"
predicted_char = predict_next_char(model, char_to_int, int_to_char, test_str)
print(f"Predicted next character: '{predicted_char}'")
print(f"Training time:{training_time} seconds")
print(f"Number of trainable parameters in the model: {count_parameters(model)}")

# Compute computational complexity
total_complexity = 0
for p in model.parameters():
    total_complexity += torch.prod(torch.tensor(p.shape))
print(f"Total trainable parameters in the model: {total_complexity}")

Epoch 10, Loss: 2.142408851910075, Validation Loss: 2.1330088781654286, Validation Accuracy: 0.361797601703463
Epoch 20, Loss: 2.0819983367526205, Validation Loss: 2.1063197457462275, Validation Accuracy: 0.36989353356494453
Epoch 30, Loss: 2.043982806282306, Validation Loss: 2.100745919647567, Validation Accuracy: 0.37206320744144344
Epoch 40, Loss: 2.0162520732628098, Validation Loss: 2.101905262251513, Validation Accuracy: 0.37256976353244425
Epoch 50, Loss: 1.9936746655527604, Validation Loss: 2.106878588505841, Validation Accuracy: 0.3728880421382943
Epoch 60, Loss: 1.9754796036339681, Validation Loss: 2.1117652742140884, Validation Accuracy: 0.3728252829765774
Epoch 70, Loss: 1.9597089430060954, Validation Loss: 2.1184874980821524, Validation Accuracy: 0.3716821696738765
Epoch 80, Loss: 1.9457861902790332, Validation Loss: 2.116636033998717, Validation Accuracy: 0.37302252605625913
Epoch 90, Loss: 1.9338065287388793, Validation Loss: 2.122593520978175, Validation Accuracy: 0.3716

2-

In [5]:
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import numpy as np
import requests
import time

# Step 1: Download the dataset
url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
response = requests.get(url)
text = response.text  # This is the entire text data

# Step 2: Prepare the dataset
sequence_length = 30
# Create a character mapping to integers
chars = sorted(list(set(text)))
char_to_int = {ch: i for i, ch in enumerate(chars)}
int_to_char = {i: ch for i, ch in enumerate(chars)}

# Encode the text into integers
encoded_text = [char_to_int[ch] for ch in text]

# Create sequences and targets
sequences = []
targets = []
for i in range(0, len(encoded_text) - sequence_length):
    seq = encoded_text[i:i+sequence_length]
    target = encoded_text[i+sequence_length]
    sequences.append(seq)
    targets.append(target)

# Convert lists to PyTorch tensors
sequences = torch.tensor(sequences, dtype=torch.long)
targets = torch.tensor(targets, dtype=torch.long)

# Step 3: Create a dataset class
class CharDataset(Dataset):
    def __init__(self, sequences, targets):
        self.sequences = sequences
        self.targets = targets

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, index):
        return self.sequences[index], self.targets[index]

# Instantiate the dataset
dataset = CharDataset(sequences, targets)

# Step 4: Create data loaders
batch_size = 128
train_size = int(len(dataset) * 0.8)
test_size = len(dataset) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(dataset, [train_size, test_size])

# Check if CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

# Defining the Transformer model
class CharTransformer(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, nhead):
        super(CharTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)
        transformer_output = self.transformer_encoder(embedded.permute(1, 0, 2))  # Permute for Transformer
        output = self.fc(transformer_output[-1, :, :])  # Get the output of the last Transformer block
        return output

# Hyperparameters
hidden_size = 256
num_layers = 3
nhead = 2
learning_rate = 0.0001
epochs = 100

# Model, loss, and optimizer
model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training the model
start_time = time.time()
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)  # Move to GPU if available
        optimizer.zero_grad()
        output = model(inputs)
        loss = criterion(output, targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    # Validation
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)  # Move to GPU if available
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += targets.size(0)
            correct += (predicted == targets).sum().item()
            val_loss += criterion(outputs, targets).item()

    end_time = time.time()
    training_time = end_time - start_time

    # Print statistics
    epoch_loss = running_loss / len(train_loader)
    epoch_val_loss = val_loss / len(test_loader)
    val_accuracy = correct / total
    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, Loss: {epoch_loss}, Validation Loss: {epoch_val_loss}, Validation Accuracy: {val_accuracy}')

# Prediction function
def predict_next_char(model, char_to_ix, ix_to_char, initial_str):
    model.eval()
    with torch.no_grad():
        initial_input = torch.tensor([char_to_int[c] for c in initial_str[-sequence_length:]], dtype=torch.long).unsqueeze(0).to(device)
        prediction = model(initial_input)
        predicted_index = torch.argmax(prediction, dim=1).item()
        return int_to_char[predicted_index]

# Counting model complexity
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Predicting the next character
test_str = "This is a simple example to demonstrate how to predict the next char"
predicted_char = predict_next_char(model, char_to_int, int_to_char, test_str)
print(f"Predicted next character: '{predicted_char}'")
print(f"Training time:{training_time} seconds")
print(f"Number of trainable parameters in the model: {count_parameters(model)}")

# Compute computational complexity
total_complexity = 0
for p in model.parameters():
    total_complexity += torch.prod(torch.tensor(p.shape))
print(f"Total trainable parameters in the model: {total_complexity}")

Epoch 10, Loss: 2.1566100698637674, Validation Loss: 2.1728921359254243, Validation Accuracy: 0.349419248407472
Epoch 20, Loss: 2.068687466886354, Validation Loss: 2.1750686431984345, Validation Accuracy: 0.35204170831969805
Epoch 30, Loss: 1.9842122393610555, Validation Loss: 2.210665318313441, Validation Accuracy: 0.3465323010852949
Epoch 40, Loss: 1.9076208169270163, Validation Loss: 2.2592574828531435, Validation Accuracy: 0.34182980459311524
Epoch 50, Loss: 1.84135387385637, Validation Loss: 2.314776359586557, Validation Accuracy: 0.335728662814415
Epoch 60, Loss: 1.783902613513441, Validation Loss: 2.3641325835858824, Validation Accuracy: 0.3347648527612037
Epoch 70, Loss: 1.7349737473516134, Validation Loss: 2.4176036843744697, Validation Accuracy: 0.3306720221631484
Epoch 80, Loss: 1.69286871736103, Validation Loss: 2.453466584268002, Validation Accuracy: 0.3273547224451189
Epoch 90, Loss: 1.6557343756407072, Validation Loss: 2.497006590791769, Validation Accuracy: 0.3224325669

3-

In [6]:
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import numpy as np
import requests
import time

# Step 1: Download the dataset
url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
response = requests.get(url)
text = response.text  # This is the entire text data

# Step 2: Prepare the dataset
sequence_length = 50
# Create a character mapping to integers
chars = sorted(list(set(text)))
char_to_int = {ch: i for i, ch in enumerate(chars)}
int_to_char = {i: ch for i, ch in enumerate(chars)}

# Encode the text into integers
encoded_text = [char_to_int[ch] for ch in text]

# Create sequences and targets
sequences = []
targets = []
for i in range(0, len(encoded_text) - sequence_length):
    seq = encoded_text[i:i+sequence_length]
    target = encoded_text[i+sequence_length]
    sequences.append(seq)
    targets.append(target)

# Convert lists to PyTorch tensors
sequences = torch.tensor(sequences, dtype=torch.long)
targets = torch.tensor(targets, dtype=torch.long)

# Step 3: Create a dataset class
class CharDataset(Dataset):
    def __init__(self, sequences, targets):
        self.sequences = sequences
        self.targets = targets

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, index):
        return self.sequences[index], self.targets[index]

# Instantiate the dataset
dataset = CharDataset(sequences, targets)

# Step 4: Create data loaders
batch_size = 512
train_size = int(len(dataset) * 0.8)
test_size = len(dataset) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(dataset, [train_size, test_size])

# Check if CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

# Defining the Transformer model
class CharTransformer(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, nhead):
        super(CharTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)
        transformer_output = self.transformer_encoder(embedded.permute(1, 0, 2))  # Permute for Transformer
        output = self.fc(transformer_output[-1, :, :])  # Get the output of the last Transformer block
        return output

# Hyperparameters
hidden_size = 128
num_layers = 3
nhead = 2
learning_rate = 0.0001
epochs = 100

# Model, loss, and optimizer
model = CharTransformer(len(chars), hidden_size, len(chars), num_layers, nhead).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training the model
start_time = time.time()
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)  # Move to GPU if available
        optimizer.zero_grad()
        output = model(inputs)
        loss = criterion(output, targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    # Validation
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)  # Move to GPU if available
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += targets.size(0)
            correct += (predicted == targets).sum().item()
            val_loss += criterion(outputs, targets).item()

    end_time = time.time()
    training_time = end_time - start_time

    # Print statistics
    epoch_loss = running_loss / len(train_loader)
    epoch_val_loss = val_loss / len(test_loader)
    val_accuracy = correct / total
    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, Loss: {epoch_loss}, Validation Loss: {epoch_val_loss}, Validation Accuracy: {val_accuracy}')

# Prediction function
def predict_next_char(model, char_to_ix, ix_to_char, initial_str):
    model.eval()
    with torch.no_grad():
        initial_input = torch.tensor([char_to_int[c] for c in initial_str[-sequence_length:]], dtype=torch.long).unsqueeze(0).to(device)
        prediction = model(initial_input)
        predicted_index = torch.argmax(prediction, dim=1).item()
        return int_to_char[predicted_index]

# Counting model complexity
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Predicting the next character
test_str = "This is a simple example to demonstrate how to predict the next char"
predicted_char = predict_next_char(model, char_to_int, int_to_char, test_str)
print(f"Predicted next character: '{predicted_char}'")
print(f"Training time:{training_time} seconds")
print(f"Number of trainable parameters in the model: {count_parameters(model)}")

# Compute computational complexity
total_complexity = 0
for p in model.parameters():
    total_complexity += torch.prod(torch.tensor(p.shape))
print(f"Total trainable parameters in the model: {total_complexity}")

Epoch 10, Loss: 2.2936684858231318, Validation Loss: 2.279041416054472, Validation Accuracy: 0.3193361695260211
Epoch 20, Loss: 2.2414844551786772, Validation Loss: 2.2446503026769795, Validation Accuracy: 0.3278985426034097
Epoch 30, Loss: 2.2080745593063873, Validation Loss: 2.2300227672681894, Validation Accuracy: 0.33130107724515734
Epoch 40, Loss: 2.1793267502568607, Validation Loss: 2.228518009185791, Validation Accuracy: 0.33255629424079547
Epoch 50, Loss: 2.151636112372354, Validation Loss: 2.231611459627064, Validation Accuracy: 0.3326504355154683
Epoch 60, Loss: 2.1247686391853424, Validation Loss: 2.2416981103223397, Validation Accuracy: 0.3312248676418507
Epoch 70, Loss: 2.099016552606124, Validation Loss: 2.2515804805886854, Validation Accuracy: 0.3306645029116551
Epoch 80, Loss: 2.073945065920748, Validation Loss: 2.261345985285733, Validation Accuracy: 0.3282168297701608
Epoch 90, Loss: 2.051377202133577, Validation Loss: 2.2764915484900867, Validation Accuracy: 0.326961

# Problem 3:

1

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

# English to French dataset
english_to_french = [
    ("I am cold", "J'ai froid"),
    ("You are tired", "Tu es fatigué"),
    ("He is hungry", "Il a faim"),
    ("She is happy", "Elle est heureuse"),
    ("We are friends", "Nous sommes amis"),
    ("They are students", "Ils sont étudiants"),
    ("The cat is sleeping", "Le chat dort"),
    ("The sun is shining", "Le soleil brille"),
    ("We love music", "Nous aimons la musique"),
    ("She speaks French fluently", "Elle parle français couramment"),
    ("He enjoys reading books", "Il aime lire des livres"),
    ("They play soccer every weekend", "Ils jouent au football chaque week-end"),
    ("The movie starts at 7 PM", "Le film commence à 19 heures"),
    ("She wears a red dress", "Elle porte une robe rouge"),
    ("We cook dinner together", "Nous cuisinons le dîner ensemble"),
    ("He drives a blue car", "Il conduit une voiture bleue"),
    ("They visit museums often", "Ils visitent souvent des musées"),
    ("The restaurant serves delicious food", "Le restaurant sert une délicieuse cuisine"),
    ("She studies mathematics at university", "Elle étudie les mathématiques à l'université"),
    ("We watch movies on Fridays", "Nous regardons des films le vendredi"),
    ("He listens to music while jogging", "Il écoute de la musique en faisant du jogging"),
    ("They travel around the world", "Ils voyagent autour du monde"),
    ("The book is on the table", "Le livre est sur la table"),
    ("She dances gracefully", "Elle danse avec grâce"),
    ("We celebrate birthdays with cake", "Nous célébrons les anniversaires avec un gâteau"),
    ("He works hard every day", "Il travaille dur tous les jours"),
    ("They speak different languages", "Ils parlent différentes langues"),
    ("The flowers bloom in spring", "Les fleurs fleurissent au printemps"),
    ("She writes poetry in her free time", "Elle écrit de la poésie pendant son temps libre"),
    ("We learn something new every day", "Nous apprenons quelque chose de nouveau chaque jour"),
    ("The dog barks loudly", "Le chien aboie bruyamment"),
    ("He sings beautifully", "Il chante magnifiquement"),
    ("They swim in the pool", "Ils nagent dans la piscine"),
    ("The birds chirp in the morning", "Les oiseaux gazouillent le matin"),
    ("She teaches English at school", "Elle enseigne l'anglais à l'école"),
    ("We eat breakfast together", "Nous prenons le petit déjeuner ensemble"),
    ("He paints landscapes", "Il peint des paysages"),
    ("They laugh at the joke", "Ils rient de la blague"),
    ("The clock ticks loudly", "L'horloge tic-tac bruyamment"),
    ("She runs in the park", "Elle court dans le parc"),
    ("We travel by train", "Nous voyageons en train"),
    ("He writes a letter", "Il écrit une lettre"),
    ("They read books at the library", "Ils lisent des livres à la bibliothèque"),
    ("The baby cries", "Le bébé pleure"),
    ("She studies hard for exams", "Elle étudie dur pour les examens"),
    ("We plant flowers in the garden", "Nous plantons des fleurs dans le jardin"),
    ("He fixes the car", "Il répare la voiture"),
    ("They drink coffee in the morning", "Ils boivent du café le matin"),
    ("The sun sets in the evening", "Le soleil se couche le soir"),
    ("She dances at the party", "Elle danse à la fête"),
    ("We play music at the concert", "Nous jouons de la musique au concert"),
    ("He cooks dinner for his family", "Il cuisine le dîner pour sa famille"),
    ("They study French grammar", "Ils étudient la grammaire française"),
    ("The rain falls gently", "La pluie tombe doucement"),
    ("She sings a song", "Elle chante une chanson"),
    ("We watch a movie together", "Nous regardons un film ensemble"),
    ("He sleeps deeply", "Il dort profondément"),
    ("They travel to Paris", "Ils voyagent à Paris"),
    ("The children play in the park", "Les enfants jouent dans le parc"),
    ("She walks along the beach", "Elle se promène le long de la plage"),
    ("We talk on the phone", "Nous parlons au téléphone"),
    ("He waits for the bus", "Il attend le bus"),
    ("They visit the Eiffel Tower", "Ils visitent la tour Eiffel"),
    ("The stars twinkle at night", "Les étoiles scintillent la nuit"),
    ("She dreams of flying", "Elle rêve de voler"),
    ("We work in the office", "Nous travaillons au bureau"),
    ("He studies history", "Il étudie l'histoire"),
    ("They listen to the radio", "Ils écoutent la radio"),
    ("The wind blows gently", "Le vent souffle doucement"),
    ("She swims in the ocean", "Elle nage dans l'océan"),
    ("We dance at the wedding", "Nous dansons au mariage"),
    ("He climbs the mountain", "Il gravit la montagne"),
    ("They hike in the forest", "Ils font de la randonnée dans la forêt"),
    ("The cat meows loudly", "Le chat miaule bruyamment"),
    ("She paints a picture", "Elle peint un tableau"),
    ("We build a sandcastle", "Nous construisons un château de sable"),
    ("He sings in the choir", "Il chante dans le chœur")
]
# Special tokens for the start and end of sequences
SOS_token = 0  # Start Of Sequence Token
EOS_token = 1  # End Of Sequence Token
max_length = 12

# Preparing the character to index mapping and vice versa for English and French
def build_vocab(sentences):
    vocab = set()
    for pair in sentences:
        english_sentence, french_sentence = pair
        for word in english_sentence.split():
            vocab.add(word)
        for word in french_sentence.split():
            vocab.add(word)
    return vocab

english_vocab = build_vocab(english_to_french)
french_vocab = english_vocab

char_to_index_english = {"SOS": SOS_token, "EOS": EOS_token, **{char: i+2 for i, char in enumerate(sorted(list(english_vocab)))}}
index_to_char_english = {i: char for char, i in char_to_index_english.items()}

char_to_index_french = {"SOS": SOS_token, "EOS": EOS_token, **{char: i+2 for i, char in enumerate(sorted(list(french_vocab)))}}
index_to_char_french = {i: char for char, i in char_to_index_french.items()}

class EnglishFrenchDataset(Dataset):
    """Custom Dataset class for handling English-French sentence pairs."""
    def __init__(self, dataset, char_to_index_english, char_to_index_french):
        self.dataset = dataset
        self.char_to_index_english = char_to_index_english
        self.char_to_index_french = char_to_index_french

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        english_sentence, french_sentence = self.dataset[idx]
        english_tensor = torch.tensor([self.char_to_index_english[word] for word in english_sentence.split()] + [EOS_token], dtype=torch.long)
        french_tensor = torch.tensor([self.char_to_index_french[word] for word in french_sentence.split()] + [EOS_token], dtype=torch.long)
        return english_tensor, french_tensor

english_french_dataset = EnglishFrenchDataset(english_to_french, char_to_index_english, char_to_index_french)
dataloader = DataLoader(english_french_dataset, batch_size=1, shuffle=True)

# Define the Transformer model
class TranslatorTransformer(nn.Module):
    def __init__(self, input_size, output_size, hidden_size, num_layers, nhead):
        super(TranslatorTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)
        transformer_output = self.transformer_encoder(embedded)
        output = self.fc(transformer_output)
        return output

# Hyperparameters
hidden_size = 128
num_layers = 3
nhead = 2
learning_rate = 0.0001
epochs = 100

# Model, loss, and optimizer
model = TranslatorTransformer(len(english_vocab) + 2, len(french_vocab) + 2, hidden_size, num_layers, nhead)  # Add 2 for SOS and EOS tokens
criterion = nn.CrossEntropyLoss(ignore_index=0)  # Ignore padding index
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(epochs):
    model.train()
    total_loss = 0.0
    for english_tensor, french_tensor in dataloader:
        optimizer.zero_grad()
        output = model(english_tensor)

        # Adjust the target tensor to include the whole french_tensor sequence but ensure it matches output length
        target_tensor = torch.cat((torch.ones_like(french_tensor[:, :1]) * SOS_token, french_tensor), dim=1)

        # Ensure the target tensor matches the output size if it's longer
        if target_tensor.size(1) > output.size(1):
            target_tensor = target_tensor[:, :output.size(1)]

        # Loss calculation - ensure target tensor is correct size
        loss = criterion(output.transpose(1, 2), target_tensor)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    if (epoch + 1) % 10 == 0:
        average_loss = total_loss / len(dataloader)
        print(f'Epoch {epoch + 1}, Training Loss: {average_loss:.4f}')

def translate_english_to_french(model, eng_sentence):
    model.eval()
    with torch.no_grad():
        eng_tensor = torch.tensor([char_to_index_english[word] for word in eng_sentence.split()] + [EOS_token], dtype=torch.long)
        output = model(eng_tensor.unsqueeze(0))  # Unsqueeze to add batch dimension
        _, predicted_indices = torch.max(output, dim=2)
        predicted_words = []
        for idx in predicted_indices.squeeze():
            if idx == EOS_token:
                break
            predicted_words.append(index_to_char_french[idx.item()])
        return ' '.join(predicted_words)

def evaluate_and_show_examples(model, dataloader, criterion, n_examples=5):
    model.eval()
    total_loss = 0
    total_sentences = 0
    correct_predictions = 0

    with torch.no_grad():
        for i, (input_tensor_english, target_tensor_french) in enumerate(dataloader):
            output = model(input_tensor_english)
            target_tensor = torch.cat((torch.full((target_tensor_french.size(0), 1), SOS_token, dtype=torch.long), target_tensor_french), dim=1)

            if target_tensor.size(1) > output.size(1):
                target_tensor = target_tensor[:, :output.size(1)]

            loss = criterion(output.transpose(1, 2), target_tensor)
            total_loss += loss.item()

            _, predicted_indices = torch.max(output, dim=2)
            correct_predictions += (predicted_indices == target_tensor).all(dim=1).sum().item()
            total_sentences += target_tensor.size(0)

            if i < n_examples:
                predicted_words = [index_to_char_french.get(idx.item(), '<UNK>') for idx in predicted_indices[0] if idx not in (SOS_token, EOS_token)]
                target_words = [index_to_char_french.get(idx.item(), '<UNK>') for idx in target_tensor_french[0] if idx not in (SOS_token, EOS_token)]
                input_words = [index_to_char_english.get(idx.item(), '<UNK>') for idx in input_tensor_english[0] if idx not in (SOS_token, EOS_token)]
                print(f'Input: {" ".join(input_words)}, Target: {" ".join(target_words)}, Predicted: {" ".join(predicted_words)}')

        average_loss = total_loss / len(dataloader)
        accuracy = correct_predictions / total_sentences
        print(f'Evaluation Loss: {average_loss:.4f}, Accuracy: {accuracy:.4f}')

# If using GPU, make sure your model is transferred to GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

evaluate_and_show_examples(model, dataloader, criterion, n_examples=5)

Epoch 10, Training Loss: 2.6836
Epoch 20, Training Loss: 1.9385
Epoch 30, Training Loss: 1.7398
Epoch 40, Training Loss: 1.6457
Epoch 50, Training Loss: 1.5904
Epoch 60, Training Loss: 1.5695
Epoch 70, Training Loss: 1.5399
Epoch 80, Training Loss: 1.5412
Epoch 90, Training Loss: 1.5087
Epoch 100, Training Loss: 1.5123
Input: You are tired, Target: Tu es fatigué, Predicted: la Nous es
Input: He waits for the bus, Target: Il attend le bus, Predicted: la Il attend dans bus
Input: He paints landscapes, Target: Il peint des paysages, Predicted: la Il peint
Input: He writes a letter, Target: Il écrit une lettre, Predicted: la Il construisons une
Input: She dances at the party, Target: Elle danse à la fête, Predicted: Ils Elle danse dans la
Evaluation Loss: 1.4244, Accuracy: 0.0000


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

# English to French dataset
english_to_french = [
    ("I am cold", "J'ai froid"),
    ("You are tired", "Tu es fatigué"),
    ("He is hungry", "Il a faim"),
    ("She is happy", "Elle est heureuse"),
    ("We are friends", "Nous sommes amis"),
    ("They are students", "Ils sont étudiants"),
    ("The cat is sleeping", "Le chat dort"),
    ("The sun is shining", "Le soleil brille"),
    ("We love music", "Nous aimons la musique"),
    ("She speaks French fluently", "Elle parle français couramment"),
    ("He enjoys reading books", "Il aime lire des livres"),
    ("They play soccer every weekend", "Ils jouent au football chaque week-end"),
    ("The movie starts at 7 PM", "Le film commence à 19 heures"),
    ("She wears a red dress", "Elle porte une robe rouge"),
    ("We cook dinner together", "Nous cuisinons le dîner ensemble"),
    ("He drives a blue car", "Il conduit une voiture bleue"),
    ("They visit museums often", "Ils visitent souvent des musées"),
    ("The restaurant serves delicious food", "Le restaurant sert une délicieuse cuisine"),
    ("She studies mathematics at university", "Elle étudie les mathématiques à l'université"),
    ("We watch movies on Fridays", "Nous regardons des films le vendredi"),
    ("He listens to music while jogging", "Il écoute de la musique en faisant du jogging"),
    ("They travel around the world", "Ils voyagent autour du monde"),
    ("The book is on the table", "Le livre est sur la table"),
    ("She dances gracefully", "Elle danse avec grâce"),
    ("We celebrate birthdays with cake", "Nous célébrons les anniversaires avec un gâteau"),
    ("He works hard every day", "Il travaille dur tous les jours"),
    ("They speak different languages", "Ils parlent différentes langues"),
    ("The flowers bloom in spring", "Les fleurs fleurissent au printemps"),
    ("She writes poetry in her free time", "Elle écrit de la poésie pendant son temps libre"),
    ("We learn something new every day", "Nous apprenons quelque chose de nouveau chaque jour"),
    ("The dog barks loudly", "Le chien aboie bruyamment"),
    ("He sings beautifully", "Il chante magnifiquement"),
    ("They swim in the pool", "Ils nagent dans la piscine"),
    ("The birds chirp in the morning", "Les oiseaux gazouillent le matin"),
    ("She teaches English at school", "Elle enseigne l'anglais à l'école"),
    ("We eat breakfast together", "Nous prenons le petit déjeuner ensemble"),
    ("He paints landscapes", "Il peint des paysages"),
    ("They laugh at the joke", "Ils rient de la blague"),
    ("The clock ticks loudly", "L'horloge tic-tac bruyamment"),
    ("She runs in the park", "Elle court dans le parc"),
    ("We travel by train", "Nous voyageons en train"),
    ("He writes a letter", "Il écrit une lettre"),
    ("They read books at the library", "Ils lisent des livres à la bibliothèque"),
    ("The baby cries", "Le bébé pleure"),
    ("She studies hard for exams", "Elle étudie dur pour les examens"),
    ("We plant flowers in the garden", "Nous plantons des fleurs dans le jardin"),
    ("He fixes the car", "Il répare la voiture"),
    ("They drink coffee in the morning", "Ils boivent du café le matin"),
    ("The sun sets in the evening", "Le soleil se couche le soir"),
    ("She dances at the party", "Elle danse à la fête"),
    ("We play music at the concert", "Nous jouons de la musique au concert"),
    ("He cooks dinner for his family", "Il cuisine le dîner pour sa famille"),
    ("They study French grammar", "Ils étudient la grammaire française"),
    ("The rain falls gently", "La pluie tombe doucement"),
    ("She sings a song", "Elle chante une chanson"),
    ("We watch a movie together", "Nous regardons un film ensemble"),
    ("He sleeps deeply", "Il dort profondément"),
    ("They travel to Paris", "Ils voyagent à Paris"),
    ("The children play in the park", "Les enfants jouent dans le parc"),
    ("She walks along the beach", "Elle se promène le long de la plage"),
    ("We talk on the phone", "Nous parlons au téléphone"),
    ("He waits for the bus", "Il attend le bus"),
    ("They visit the Eiffel Tower", "Ils visitent la tour Eiffel"),
    ("The stars twinkle at night", "Les étoiles scintillent la nuit"),
    ("She dreams of flying", "Elle rêve de voler"),
    ("We work in the office", "Nous travaillons au bureau"),
    ("He studies history", "Il étudie l'histoire"),
    ("They listen to the radio", "Ils écoutent la radio"),
    ("The wind blows gently", "Le vent souffle doucement"),
    ("She swims in the ocean", "Elle nage dans l'océan"),
    ("We dance at the wedding", "Nous dansons au mariage"),
    ("He climbs the mountain", "Il gravit la montagne"),
    ("They hike in the forest", "Ils font de la randonnée dans la forêt"),
    ("The cat meows loudly", "Le chat miaule bruyamment"),
    ("She paints a picture", "Elle peint un tableau"),
    ("We build a sandcastle", "Nous construisons un château de sable"),
    ("He sings in the choir", "Il chante dans le chœur")
]
# Special tokens for the start and end of sequences
SOS_token = 0  # Start Of Sequence Token
EOS_token = 1  # End Of Sequence Token
max_length = 12

# Preparing the character to index mapping and vice versa for English and French
def build_vocab(sentences):
    vocab = set()
    for pair in sentences:
        english_sentence, french_sentence = pair
        for word in english_sentence.split():
            vocab.add(word)
        for word in french_sentence.split():
            vocab.add(word)
    return vocab

english_vocab = build_vocab(english_to_french)
french_vocab = english_vocab

char_to_index_english = {"SOS": SOS_token, "EOS": EOS_token, **{char: i+2 for i, char in enumerate(sorted(list(english_vocab)))}}
index_to_char_english = {i: char for char, i in char_to_index_english.items()}

char_to_index_french = {"SOS": SOS_token, "EOS": EOS_token, **{char: i+2 for i, char in enumerate(sorted(list(french_vocab)))}}
index_to_char_french = {i: char for char, i in char_to_index_french.items()}

class EnglishFrenchDataset(Dataset):
    """Custom Dataset class for handling English-French sentence pairs."""
    def __init__(self, dataset, char_to_index_english, char_to_index_french):
        self.dataset = dataset
        self.char_to_index_english = char_to_index_english
        self.char_to_index_french = char_to_index_french

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        english_sentence, french_sentence = self.dataset[idx]
        english_tensor = torch.tensor([self.char_to_index_english[word] for word in english_sentence.split()] + [EOS_token], dtype=torch.long)
        french_tensor = torch.tensor([self.char_to_index_french[word] for word in french_sentence.split()] + [EOS_token], dtype=torch.long)
        return english_tensor, french_tensor

english_french_dataset = EnglishFrenchDataset(english_to_french, char_to_index_english, char_to_index_french)
dataloader = DataLoader(english_french_dataset, batch_size=1, shuffle=True)

# Define the Transformer model
class TranslatorTransformer(nn.Module):
    def __init__(self, input_size, output_size, hidden_size, num_layers, nhead):
        super(TranslatorTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)
        transformer_output = self.transformer_encoder(embedded)
        output = self.fc(transformer_output)
        return output

# Hyperparameters
hidden_size = 128
num_layers = 5
nhead = 4
learning_rate = 0.0001
epochs = 100

# Model, loss, and optimizer
model = TranslatorTransformer(len(english_vocab) + 2, len(french_vocab) + 2, hidden_size, num_layers, nhead)  # Add 2 for SOS and EOS tokens
criterion = nn.CrossEntropyLoss(ignore_index=0)  # Ignore padding index
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(epochs):
    model.train()
    total_loss = 0.0
    for english_tensor, french_tensor in dataloader:
        optimizer.zero_grad()
        output = model(english_tensor)

        # Adjust the target tensor to include the whole french_tensor sequence but ensure it matches output length
        target_tensor = torch.cat((torch.ones_like(french_tensor[:, :1]) * SOS_token, french_tensor), dim=1)

        # Ensure the target tensor matches the output size if it's longer
        if target_tensor.size(1) > output.size(1):
            target_tensor = target_tensor[:, :output.size(1)]

        # Loss calculation - ensure target tensor is correct size
        loss = criterion(output.transpose(1, 2), target_tensor)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    if (epoch + 1) % 10 == 0:
        average_loss = total_loss / len(dataloader)
        print(f'Epoch {epoch + 1}, Training Loss: {average_loss:.4f}')

def translate_english_to_french(model, eng_sentence):
    model.eval()
    with torch.no_grad():
        eng_tensor = torch.tensor([char_to_index_english[word] for word in eng_sentence.split()] + [EOS_token], dtype=torch.long)
        output = model(eng_tensor.unsqueeze(0))  # Unsqueeze to add batch dimension
        _, predicted_indices = torch.max(output, dim=2)
        predicted_words = []
        for idx in predicted_indices.squeeze():
            if idx == EOS_token:
                break
            predicted_words.append(index_to_char_french[idx.item()])
        return ' '.join(predicted_words)

def evaluate_and_show_examples(model, dataloader, criterion, n_examples=5):
    model.eval()
    total_loss = 0
    total_sentences = 0
    correct_predictions = 0

    with torch.no_grad():
        for i, (input_tensor_english, target_tensor_french) in enumerate(dataloader):
            output = model(input_tensor_english)
            target_tensor = torch.cat((torch.full((target_tensor_french.size(0), 1), SOS_token, dtype=torch.long), target_tensor_french), dim=1)

            if target_tensor.size(1) > output.size(1):
                target_tensor = target_tensor[:, :output.size(1)]

            loss = criterion(output.transpose(1, 2), target_tensor)
            total_loss += loss.item()

            _, predicted_indices = torch.max(output, dim=2)
            correct_predictions += (predicted_indices == target_tensor).all(dim=1).sum().item()
            total_sentences += target_tensor.size(0)

            if i < n_examples:
                predicted_words = [index_to_char_french.get(idx.item(), '<UNK>') for idx in predicted_indices[0] if idx not in (SOS_token, EOS_token)]
                target_words = [index_to_char_french.get(idx.item(), '<UNK>') for idx in target_tensor_french[0] if idx not in (SOS_token, EOS_token)]
                input_words = [index_to_char_english.get(idx.item(), '<UNK>') for idx in input_tensor_english[0] if idx not in (SOS_token, EOS_token)]
                print(f'Input: {" ".join(input_words)}, Target: {" ".join(target_words)}, Predicted: {" ".join(predicted_words)}')

        average_loss = total_loss / len(dataloader)
        accuracy = correct_predictions / total_sentences
        print(f'Evaluation Loss: {average_loss:.4f}, Accuracy: {accuracy:.4f}')

# If using GPU, make sure your model is transferred to GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Call the evaluate function
evaluate_and_show_examples(model, dataloader, criterion, n_examples=5)

Epoch 10, Training Loss: 2.6106
Epoch 20, Training Loss: 1.9636
Epoch 30, Training Loss: 1.7567
Epoch 40, Training Loss: 1.6537
Epoch 50, Training Loss: 1.6253
Epoch 60, Training Loss: 1.5812
Epoch 70, Training Loss: 1.5565
Epoch 80, Training Loss: 1.5593
Epoch 90, Training Loss: 1.5230
Epoch 100, Training Loss: 1.5045
Input: You are tired, Target: Tu es fatigué, Predicted: à Nous es
Input: She dreams of flying, Target: Elle rêve de voler, Predicted: un Elle rêve de
Input: We love music, Target: Nous aimons la musique, Predicted: la Nous aimons
Input: He paints landscapes, Target: Il peint des paysages, Predicted: Le Il peint
Input: The cat is sleeping, Target: Le chat dort, Predicted: sont Le Il dort
Evaluation Loss: 1.4212, Accuracy: 0.0000


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

# English to French dataset
english_to_french = [
    ("I am cold", "J'ai froid"),
    ("You are tired", "Tu es fatigué"),
    ("He is hungry", "Il a faim"),
    ("She is happy", "Elle est heureuse"),
    ("We are friends", "Nous sommes amis"),
    ("They are students", "Ils sont étudiants"),
    ("The cat is sleeping", "Le chat dort"),
    ("The sun is shining", "Le soleil brille"),
    ("We love music", "Nous aimons la musique"),
    ("She speaks French fluently", "Elle parle français couramment"),
    ("He enjoys reading books", "Il aime lire des livres"),
    ("They play soccer every weekend", "Ils jouent au football chaque week-end"),
    ("The movie starts at 7 PM", "Le film commence à 19 heures"),
    ("She wears a red dress", "Elle porte une robe rouge"),
    ("We cook dinner together", "Nous cuisinons le dîner ensemble"),
    ("He drives a blue car", "Il conduit une voiture bleue"),
    ("They visit museums often", "Ils visitent souvent des musées"),
    ("The restaurant serves delicious food", "Le restaurant sert une délicieuse cuisine"),
    ("She studies mathematics at university", "Elle étudie les mathématiques à l'université"),
    ("We watch movies on Fridays", "Nous regardons des films le vendredi"),
    ("He listens to music while jogging", "Il écoute de la musique en faisant du jogging"),
    ("They travel around the world", "Ils voyagent autour du monde"),
    ("The book is on the table", "Le livre est sur la table"),
    ("She dances gracefully", "Elle danse avec grâce"),
    ("We celebrate birthdays with cake", "Nous célébrons les anniversaires avec un gâteau"),
    ("He works hard every day", "Il travaille dur tous les jours"),
    ("They speak different languages", "Ils parlent différentes langues"),
    ("The flowers bloom in spring", "Les fleurs fleurissent au printemps"),
    ("She writes poetry in her free time", "Elle écrit de la poésie pendant son temps libre"),
    ("We learn something new every day", "Nous apprenons quelque chose de nouveau chaque jour"),
    ("The dog barks loudly", "Le chien aboie bruyamment"),
    ("He sings beautifully", "Il chante magnifiquement"),
    ("They swim in the pool", "Ils nagent dans la piscine"),
    ("The birds chirp in the morning", "Les oiseaux gazouillent le matin"),
    ("She teaches English at school", "Elle enseigne l'anglais à l'école"),
    ("We eat breakfast together", "Nous prenons le petit déjeuner ensemble"),
    ("He paints landscapes", "Il peint des paysages"),
    ("They laugh at the joke", "Ils rient de la blague"),
    ("The clock ticks loudly", "L'horloge tic-tac bruyamment"),
    ("She runs in the park", "Elle court dans le parc"),
    ("We travel by train", "Nous voyageons en train"),
    ("He writes a letter", "Il écrit une lettre"),
    ("They read books at the library", "Ils lisent des livres à la bibliothèque"),
    ("The baby cries", "Le bébé pleure"),
    ("She studies hard for exams", "Elle étudie dur pour les examens"),
    ("We plant flowers in the garden", "Nous plantons des fleurs dans le jardin"),
    ("He fixes the car", "Il répare la voiture"),
    ("They drink coffee in the morning", "Ils boivent du café le matin"),
    ("The sun sets in the evening", "Le soleil se couche le soir"),
    ("She dances at the party", "Elle danse à la fête"),
    ("We play music at the concert", "Nous jouons de la musique au concert"),
    ("He cooks dinner for his family", "Il cuisine le dîner pour sa famille"),
    ("They study French grammar", "Ils étudient la grammaire française"),
    ("The rain falls gently", "La pluie tombe doucement"),
    ("She sings a song", "Elle chante une chanson"),
    ("We watch a movie together", "Nous regardons un film ensemble"),
    ("He sleeps deeply", "Il dort profondément"),
    ("They travel to Paris", "Ils voyagent à Paris"),
    ("The children play in the park", "Les enfants jouent dans le parc"),
    ("She walks along the beach", "Elle se promène le long de la plage"),
    ("We talk on the phone", "Nous parlons au téléphone"),
    ("He waits for the bus", "Il attend le bus"),
    ("They visit the Eiffel Tower", "Ils visitent la tour Eiffel"),
    ("The stars twinkle at night", "Les étoiles scintillent la nuit"),
    ("She dreams of flying", "Elle rêve de voler"),
    ("We work in the office", "Nous travaillons au bureau"),
    ("He studies history", "Il étudie l'histoire"),
    ("They listen to the radio", "Ils écoutent la radio"),
    ("The wind blows gently", "Le vent souffle doucement"),
    ("She swims in the ocean", "Elle nage dans l'océan"),
    ("We dance at the wedding", "Nous dansons au mariage"),
    ("He climbs the mountain", "Il gravit la montagne"),
    ("They hike in the forest", "Ils font de la randonnée dans la forêt"),
    ("The cat meows loudly", "Le chat miaule bruyamment"),
    ("She paints a picture", "Elle peint un tableau"),
    ("We build a sandcastle", "Nous construisons un château de sable"),
    ("He sings in the choir", "Il chante dans le chœur")
]
# Special tokens for the start and end of sequences
SOS_token = 0  # Start Of Sequence Token
EOS_token = 1  # End Of Sequence Token
max_length = 12

# Preparing the character to index mapping and vice versa for English and French
def build_vocab(sentences):
    vocab = set()
    for pair in sentences:
        english_sentence, french_sentence = pair
        for word in english_sentence.split():
            vocab.add(word)
        for word in french_sentence.split():
            vocab.add(word)
    return vocab

english_vocab = build_vocab(english_to_french)
french_vocab = english_vocab

char_to_index_english = {"SOS": SOS_token, "EOS": EOS_token, **{char: i+2 for i, char in enumerate(sorted(list(english_vocab)))}}
index_to_char_english = {i: char for char, i in char_to_index_english.items()}

char_to_index_french = {"SOS": SOS_token, "EOS": EOS_token, **{char: i+2 for i, char in enumerate(sorted(list(french_vocab)))}}
index_to_char_french = {i: char for char, i in char_to_index_french.items()}

class EnglishFrenchDataset(Dataset):
    """Custom Dataset class for handling English-French sentence pairs."""
    def __init__(self, dataset, char_to_index_english, char_to_index_french):
        self.dataset = dataset
        self.char_to_index_english = char_to_index_english
        self.char_to_index_french = char_to_index_french

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        english_sentence, french_sentence = self.dataset[idx]
        english_tensor = torch.tensor([self.char_to_index_english[word] for word in english_sentence.split()] + [EOS_token], dtype=torch.long)
        french_tensor = torch.tensor([self.char_to_index_french[word] for word in french_sentence.split()] + [EOS_token], dtype=torch.long)
        return english_tensor, french_tensor

english_french_dataset = EnglishFrenchDataset(english_to_french, char_to_index_english, char_to_index_french)
dataloader = DataLoader(english_french_dataset, batch_size=1, shuffle=True)

# Define the Transformer model
class TranslatorTransformer(nn.Module):
    def __init__(self, input_size, output_size, hidden_size, num_layers, nhead):
        super(TranslatorTransformer, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        encoder_layers = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        embedded = self.embedding(x)
        transformer_output = self.transformer_encoder(embedded)
        output = self.fc(transformer_output)
        return output

# Hyperparameters
hidden_size = 512
num_layers = 5
nhead = 4
learning_rate = 0.0001
epochs = 200

# Model, loss, and optimizer
model = TranslatorTransformer(len(english_vocab) + 2, len(french_vocab) + 2, hidden_size, num_layers, nhead)  # Add 2 for SOS and EOS tokens
criterion = nn.CrossEntropyLoss(ignore_index=0)  # Ignore padding index
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

def translate_english_to_french(model, eng_sentence):
    model.eval()
    with torch.no_grad():
        eng_tensor = torch.tensor([char_to_index_english[word] for word in eng_sentence.split()] + [EOS_token], dtype=torch.long)
        output = model(eng_tensor.unsqueeze(0))  # Unsqueeze to add batch dimension
        _, predicted_indices = torch.max(output, dim=2)
        predicted_words = []
        for idx in predicted_indices.squeeze():
            if idx == EOS_token:
                break
            predicted_words.append(index_to_char_french[idx.item()])
        return ' '.join(predicted_words)

def evaluate_and_show_examples(model, dataloader, criterion, n_examples=5):
    model.eval()
    total_loss = 0
    total_sentences = 0
    correct_predictions = 0

    with torch.no_grad():
        for i, (input_tensor_english, target_tensor_french) in enumerate(dataloader):
            output = model(input_tensor_english)
            target_tensor = torch.cat((torch.full((target_tensor_french.size(0), 1), SOS_token, dtype=torch.long), target_tensor_french), dim=1)

            if target_tensor.size(1) > output.size(1):
                target_tensor = target_tensor[:, :output.size(1)]

            loss = criterion(output.transpose(1, 2), target_tensor)
            total_loss += loss.item()

            _, predicted_indices = torch.max(output, dim=2)
            correct_predictions += (predicted_indices == target_tensor).all(dim=1).sum().item()
            total_sentences += target_tensor.size(0)

            if i < n_examples:
                predicted_words = [index_to_char_french.get(idx.item(), '<UNK>') for idx in predicted_indices[0] if idx not in (SOS_token, EOS_token)]
                target_words = [index_to_char_french.get(idx.item(), '<UNK>') for idx in target_tensor_french[0] if idx not in (SOS_token, EOS_token)]
                input_words = [index_to_char_english.get(idx.item(), '<UNK>') for idx in input_tensor_english[0] if idx not in (SOS_token, EOS_token)]
                print(f'Input: {" ".join(input_words)}, Target: {" ".join(target_words)}, Predicted: {" ".join(predicted_words)}')

        average_loss = total_loss / len(dataloader)
        accuracy = correct_predictions / total_sentences
        print(f'Evaluation Loss: {average_loss:.4f}, Accuracy: {accuracy:.4f}')

evaluate_and_show_examples(model, dataloader, criterion, n_examples=5)



Input: They visit the Eiffel Tower, Target: Ils visitent la tour Eiffel, Predicted: starts parlons lire sings pour university
Input: I am cold, Target: J'ai froid, Predicted: movie J'ai fatigué university
Input: We watch movies on Fridays, Target: Nous regardons des films le vendredi, Predicted: un Tu soir fleurs oiseaux university
Input: The rain falls gently, Target: La pluie tombe doucement, Predicted: l'université drives speaks bloom university
Input: We are friends, Target: Nous sommes amis, Predicted: un radio shining university
Evaluation Loss: 6.1707, Accuracy: 0.0000


# Problem 4:

1

In [8]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence

# Special tokens and hyperparameters
SOS_token = 0  # Start Of Sequence Token
EOS_token = 1  # End Of Sequence Token
hidden_size = 512
num_layers = 5
nhead = 4
learning_rate = 0.0001
epochs = 100

# French to English dataset
french_to_english = [
    ("J'ai froid", "I am cold"),
    ("Tu es fatigué", "You are tired"),
    ("Il a faim", "He is hungry"),
    ("Elle est heureuse", "She is happy"),
    ("Nous sommes amis", "We are friends"),
    ("Ils sont étudiants", "They are students"),
    ("Le chat dort", "The cat is sleeping"),
    ("Le soleil brille", "The sun is shining"),
    ("Nous aimons la musique", "We love music"),
    ("Elle parle français couramment", "She speaks French fluently"),
    ("Il aime lire des livres", "He enjoys reading books"),
    ("Ils jouent au football chaque week-end", "They play soccer every weekend"),
    ("Le film commence à 19 heures", "The movie starts at 7 PM"),
    ("Elle porte une robe rouge", "She wears a red dress"),
    ("Nous cuisinons le dîner ensemble", "We cook dinner together"),
    ("Il conduit une voiture bleue", "He drives a blue car"),
    ("Ils visitent souvent des musées", "They visit museums often"),
    ("Le restaurant sert une délicieuse cuisine", "The restaurant serves delicious food"),
    ("Elle étudie les mathématiques à l'université", "She studies mathematics at university"),
    ("Nous regardons des films le vendredi", "We watch movies on Fridays"),
    ("Il écoute de la musique en faisant du jogging", "He listens to music while jogging"),
    ("Ils voyagent autour du monde", "They travel around the world"),
    ("Le livre est sur la table", "The book is on the table"),
    ("Elle danse avec grâce", "She dances gracefully"),
    ("Nous célébrons les anniversaires avec un gâteau", "We celebrate birthdays with cake"),
    ("Il travaille dur tous les jours", "He works hard every day"),
    ("Ils parlent différentes langues", "They speak different languages"),
    ("Les fleurs fleurissent au printemps", "The flowers bloom in spring"),
    ("Elle écrit de la poésie pendant son temps libre", "She writes poetry in her free time"),
    ("Nous apprenons quelque chose de nouveau chaque jour", "We learn something new every day"),
    ("Le chien aboie bruyamment", "The dog barks loudly"),
    ("Il chante magnifiquement", "He sings beautifully"),
    ("Ils nagent dans la piscine", "They swim in the pool"),
    ("Les oiseaux gazouillent le matin", "The birds chirp in the morning"),
    ("Elle enseigne l'anglais à l'école", "She teaches English at school"),
    ("Nous prenons le petit déjeuner ensemble", "We eat breakfast together"),
    ("Il peint des paysages", "He paints landscapes"),
    ("Ils rient de la blague", "They laugh at the joke"),
    ("L'horloge tic-tac bruyamment", "The clock ticks loudly"),
    ("Elle court dans le parc", "She runs in the park"),
    ("Nous voyageons en train", "We travel by train"),
    ("Il écrit une lettre", "He writes a letter"),
    ("Ils lisent des livres à la bibliothèque", "They read books at the library"),
    ("Le bébé pleure", "The baby cries"),
    ("Elle étudie dur pour les examens", "She studies hard for exams"),
    ("Nous plantons des fleurs dans le jardin", "We plant flowers in the garden"),
    ("Il répare la voiture", "He fixes the car"),
    ("Ils boivent du café le matin", "They drink coffee in the morning"),
    ("Le soleil se couche le soir", "The sun sets in the evening"),
    ("Elle danse à la fête", "She dances at the party"),
    ("Nous jouons de la musique au concert", "We play music at the concert"),
    ("Il cuisine le dîner pour sa famille", "He cooks dinner for his family"),
    ("Ils étudient la grammaire française", "They study French grammar"),
    ("La pluie tombe doucement", "The rain falls gently"),
    ("Elle chante une chanson", "She sings a song"),
    ("Nous regardons un film ensemble", "We watch a movie together"),
    ("Il dort profondément", "He sleeps deeply"),
    ("Ils voyagent à Paris","They travel to Paris"),
    ("Les enfants jouent dans le parc","The children play in the park"),
    ("Elle se promène le long de la plage","She walks along the beach"),
    ("Nous parlons au téléphone","We talk on the phone"),
    ("Il attend le bus","He waits for the bus"),
    ("Ils visitent la tour Eiffel","They visit the Eiffel Tower"),
    ("Les étoiles scintillent la nuit","The stars twinkle at night"),
    ("Elle rêve de voler", "She dreams of flying"),
    ("Nous travaillons au bureau", "We work in the office"),
    ("Il étudie l’histoire", "He studies history"),
    ("Ils écoutent la radio", "They listen to the radio"),
    ("Le vent souffle doucement", "The wind blows gently"),
    ("Elle nage dans l’océan", "She swims in the ocean"),
    ("Nous dansons au mariage", "We dance at the wedding"),
    ("Il gravit la montagne", "He climbs the mountain"),
    ("Ils font de la randonnée dans la forêt", "They hike in the forest"),
    ("Le chat miaule bruyamment", "The cat meows loudly"),
    ("Elle peint un tableau", "She paints a picture"),
    ("Nous construisons un château de sable", "We build a sandcastle"),
    ("Il chante dans le chœur", "He sings in the choir")
    ]


# Building separate vocabularies for French and English sentences
def build_vocab(sentences):
    vocab = {"SOS": SOS_token, "EOS": EOS_token}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

french_vocab = build_vocab(fr for fr, _ in french_to_english)
english_vocab = build_vocab(en for _, en in french_to_english)

# Custom Dataset class for handling French-English sentence pairs
class FrenchEnglishDataset(Dataset):
    def __init__(self, data, src_vocab, trg_vocab):
        self.data = data
        self.src_vocab = src_vocab
        self.trg_vocab = trg_vocab

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        src_sentence, trg_sentence = self.data[idx]
        src_indices = [self.src_vocab[word] for word in src_sentence.split()] + [EOS_token]
        trg_indices = [self.trg_vocab[word] for word in trg_sentence.split()] + [EOS_token]
        return torch.tensor(src_indices, dtype=torch.long), torch.tensor(trg_indices, dtype=torch.long)

# DataLoader with custom collate function to handle padding
def collate_fn(batch):
    src_batch, tgt_batch = zip(*batch)
    src_batch_padded = pad_sequence([torch.tensor(seq) for seq in src_batch], batch_first=True, padding_value=EOS_token)
    tgt_batch_padded = pad_sequence([torch.tensor(seq) for seq in tgt_batch], batch_first=True, padding_value=EOS_token)
    return src_batch_padded, tgt_batch_padded

dataset = FrenchEnglishDataset(french_to_english, french_vocab, english_vocab)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=collate_fn)

# Transformer model definition
class TranslatorTransformer(nn.Module):
    def __init__(self, input_vocab_size, output_vocab_size, hidden_size, num_layers, nhead):
        super(TranslatorTransformer, self).__init__()
        self.embedding_src = nn.Embedding(input_vocab_size, hidden_size)
        self.embedding_tgt = nn.Embedding(output_vocab_size, hidden_size)
        self.transformer = nn.Transformer(d_model=hidden_size, nhead=nhead, num_encoder_layers=num_layers,
                                          num_decoder_layers=num_layers, batch_first=True)
        self.fc_out = nn.Linear(hidden_size, output_vocab_size)

    def forward(self, src, tgt):
        src = self.embedding_src(src)
        tgt = self.embedding_tgt(tgt)
        output = self.transformer(src, tgt)
        output = self.fc_out(output)
        return output

# Model, loss, and optimizer setup
model = TranslatorTransformer(len(french_vocab), len(english_vocab), hidden_size, num_layers, nhead)
criterion = nn.CrossEntropyLoss(ignore_index=EOS_token)  # Ignore padding for loss calculation
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training loop with conditional loss printing
def train_model(model, dataloader, optimizer, criterion, epochs, print_every=10):
    model.train()
    for epoch in range(1, epochs + 1):
        total_loss = 0
        for src, tgt in dataloader:
            optimizer.zero_grad()
            tgt_input = tgt[:, :-1]
            tgt_output = tgt[:, 1:]
            outputs = model(src, tgt_input)
            outputs = outputs.reshape(-1, outputs.shape[-1])
            tgt_output = tgt_output.reshape(-1)
            loss = criterion(outputs, tgt_output)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        if epoch % print_every == 0:
            print(f'Epoch {epoch}: Loss {total_loss / len(dataloader)}')

# Run the training
train_model(model, dataloader, optimizer, criterion, epochs)

def evaluate_and_show_examples(model, dataloader, criterion, src_vocab, trg_vocab, n_examples=5):
    model.eval()
    total_loss = 0
    correct_predictions = 0
    total_sentences = 0

    idx_to_word_src = {idx: word for word, idx in src_vocab.items()}
    idx_to_word_trg = {idx: word for word, idx in trg_vocab.items()}

    with torch.no_grad():
        for i, (src, tgt) in enumerate(dataloader):
            output = model(src, tgt[:, :-1])  # Feed the target as input to the decoder, excluding the EOS token at the end
            output = output.reshape(-1, output.shape[-1])
            tgt_output = tgt[:, 1:].reshape(-1)  # Shifted target outputs
            loss = criterion(output, tgt_output)
            total_loss += loss.item()

            # Calculate accuracy
            _, predicted_indices = torch.max(output, dim=1)
            correct_predictions += (predicted_indices == tgt_output).sum().item()
            total_sentences += tgt_output.size(0)

            if i < n_examples:
                # Print examples
                src_words = [idx_to_word_src[idx.item()] for idx in src[0] if idx.item() in idx_to_word_src]
                trg_words = [idx_to_word_trg[idx.item()] for idx in tgt[0] if idx.item() in idx_to_word_trg]
                pred_words = [idx_to_word_trg[idx.item()] for idx in predicted_indices.view(tgt.size(0), -1)[0] if idx.item() in idx_to_word_trg]

                print(f'Input: {" ".join(src_words)}')
                print(f'Target: {" ".join(trg_words)}')
                print(f'Predicted: {" ".join(pred_words)}')
                print("")

        average_loss = total_loss / len(dataloader)
        accuracy = correct_predictions / total_sentences
        print(f'Evaluation Loss: {average_loss:.4f}, Accuracy: {accuracy:.4f}')

  src_batch_padded = pad_sequence([torch.tensor(seq) for seq in src_batch], batch_first=True, padding_value=EOS_token)
  tgt_batch_padded = pad_sequence([torch.tensor(seq) for seq in tgt_batch], batch_first=True, padding_value=EOS_token)


Epoch 10: Loss 2.117745326115535
Epoch 20: Loss 0.1091165960026093
Epoch 30: Loss 0.02690561402302522
Epoch 40: Loss 0.015432785145747356
Epoch 50: Loss 0.009864733673823185
Epoch 60: Loss 0.006846844863433104
Epoch 70: Loss 0.0052394594830007125
Epoch 80: Loss 0.004208378350505462
Epoch 90: Loss 0.0030893161355589446
Epoch 100: Loss 0.002516282779069092


2

In [9]:
evaluate_and_show_examples(model, dataloader, criterion, french_vocab, english_vocab, n_examples=5)

Input: Elle est heureuse EOS EOS EOS
Target: She is happy EOS EOS EOS
Predicted: is happy is is is

Input: Il peint des paysages EOS EOS
Target: He paints landscapes EOS EOS
Predicted: paints landscapes landscapes paints

Input: Nous aimons la musique EOS EOS
Target: We love music EOS EOS EOS EOS
Predicted: love music at music music music

Input: Il attend le bus EOS
Target: He waits for the bus EOS
Predicted: waits for the bus for

Input: Ils voyagent à Paris EOS
Target: They travel to Paris EOS
Predicted: travel to Paris Paris



  src_batch_padded = pad_sequence([torch.tensor(seq) for seq in src_batch], batch_first=True, padding_value=EOS_token)
  tgt_batch_padded = pad_sequence([torch.tensor(seq) for seq in tgt_batch], batch_first=True, padding_value=EOS_token)


Evaluation Loss: 0.0005, Accuracy: 0.6929
