<h1 style="text-align: center;">Lab 3</h1>
<h4 style="text-align: center;"><span style="text-decoration: underline">Submitted by :</span> BOULBEN Firdaous</h4>
<h4 style="text-align: center;"><span style="text-decoration: underline">Supervised by
:</span> Pr. EL AACHAK Lotfi</h4>

## Objective :
The main purpose behind this lab is to get familiar with Pytorch, to build deep
neural network architecture for Natural language process by using Sequence Models.

## Part 1: Classification Task 

### 1. Data Collection Using Scrapy and BeautifulSoup  

In [1]:
# Import necessary libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import random

In [2]:
# Scrape Arabic text from websites
def scrape_arabic_text(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    paragraphs = soup.find_all('p')
    texts = [p.text.strip() for p in paragraphs if p.text.strip()]
    return texts

In [3]:
# Some URLs for Arabic websites
urls = [
    "https://www.hespress.com/",
    "https://www.aljazeera.net/",
    "https://arabic.cnn.com/",
    "https://www.skynewsarabia.com/",
    "https://www.bbc.com/arabic"
]

In [4]:
# Collect data
text_data = []
for url in urls:
    texts = scrape_arabic_text(url)
    for text in texts:
        score = random.uniform(0, 10)  # Random score between 0 and 10
        text_data.append({"Text": text, "Score": score})

In [5]:
# Create a dataset
df = pd.DataFrame(text_data)
df.to_csv("arabic_dataset.csv", index=False)
print("Data collection completed!")

Data collection completed!


### 2. Preprocessing the Dataset Using NLP

In [10]:
# Import NLP libraries
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem.isri import ISRIStemmer
from sklearn.preprocessing import KBinsDiscretizer
import re

nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /usr/share/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [14]:
# Load dataset
df = pd.read_csv("arabic_dataset.csv")

In [15]:
# Preprocessing pipeline
def preprocess_text(text):
    # Remove non-Arabic characters
    text = re.sub(r'[^\u0600-\u06FF\s]', '', text)
    # Tokenization
    tokens = word_tokenize(text)
    # Remove stop words
    stop_words = set(stopwords.words('arabic'))
    tokens = [word for word in tokens if word not in stop_words]
    # Stemming
    stemmer = ISRIStemmer()
    tokens = [stemmer.stem(word) for word in tokens]
    return " ".join(tokens)

In [17]:
# Apply preprocessing
df['Processed_Text'] = df['Text'].apply(preprocess_text)
print(df.head(10))

                                                Text     Score  \
0  This website is using a security service to pr...  8.004689   
1  You can email the site owner to let them know ...  9.043101   
2  Cloudflare Ray ID: 8e89099e1f76bfd3\n•\n\n    ...  3.319954   
3  على وقع المجاعة التي تفرضها إسرائيل على غزة؛ ي...  2.846044   
4  مع الصمود الأسطوري لسكان قطاع غزة والمقاومة في...  8.610400   
5  تمر 75 يوما كاملة بين إعلان نتيجة الانتخابات ا...  1.229702   
6  قالت صحيفة لوموند إن دخول حرب أوكرانيا مرحلة ج...  3.615562   
7  قالت صحيفة نيويورك تايمز إن المقيمين بالولايات...  3.913238   
8  أوضح موقع موندويس الأميركي أن “مشروع إستر” بات...  8.785422   
9  قال تقرير بصحيفة نيويورك تايمز إن إدارة ترامب ...  4.483984   

                                      Processed_Text  
0                                                     
1                                                     
2                                                     
3  وقع جاع فرض رائيل غزة؛ يحك قال قصص حرب غذي شهد... 

In [18]:
# Discretize scores
discretizer = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')
df['Discrete_Score'] = discretizer.fit_transform(df[['Score']]).astype(int)

In [19]:
df.to_csv("preprocessed_dataset.csv", index=False)
print("Preprocessing completed!")

Preprocessing completed!


### 3. Train Models (RNN, Bi-RNN, GRU, LSTM)  

In [20]:
# Import PyTorch and supporting libraries
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer

In [21]:
# Load preprocessed dataset
df = pd.read_csv("preprocessed_dataset.csv")

In [22]:
# Split data
train_texts, test_texts, train_labels, test_labels = train_test_split(
    df['Processed_Text'], df['Discrete_Score'], test_size=0.2, random_state=42
)

In [23]:
# Tokenizer and vocabulary
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/625 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/996k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.96M [00:00<?, ?B/s]



In [31]:
# Dataset class
class ArabicDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts.iloc[idx])  # Convert to string
        label = self.labels.iloc[idx]
        encoding = self.tokenizer(
            text, 
            padding='max_length', 
            truncation=True, 
            max_length=self.max_length, 
            return_tensors='pt'
        )
        return {
            'input_ids': encoding['input_ids'].squeeze(),
            'attention_mask': encoding['attention_mask'].squeeze(),
            'label': torch.tensor(label, dtype=torch.long)
        }

In [33]:
# Validate dataset
df['Text'] = df['Text'].astype(str)  # Ensure all text entries are strings

df = df.dropna(subset=['Text', 'Score'])

In [35]:
# Prepare datasets
max_length = 128
train_dataset = ArabicDataset(train_texts, train_labels, tokenizer, max_length)
test_dataset = ArabicDataset(test_texts, test_labels, tokenizer, max_length)

# DataLoaders
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)

#### 3.1. RNN Implementation   

In [36]:
# Define RNN Model
class RNNModel(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, output_size):
        super(RNNModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.RNN(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, input_ids):
        x = self.embedding(input_ids)
        _, hidden = self.rnn(x)
        output = self.fc(hidden.squeeze(0))
        return output

In [37]:
# Initialize model, loss, and optimizer
vocab_size = tokenizer.vocab_size
embed_size = 128
hidden_size = 256
output_size = len(df['Discrete_Score'].unique())

rnn_model = RNNModel(vocab_size, embed_size, hidden_size, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn_model.parameters(), lr=0.001)

In [62]:
# Training loop
def train_model(model, train_loader, criterion, optimizer, epochs=10):
    model.train()
    for epoch in range(epochs):
        total_loss = 0
        for batch in train_loader:
            optimizer.zero_grad()
            input_ids = batch['input_ids']
            labels = batch['label']
            outputs = model(input_ids)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"Epoch {epoch+1}, Loss: {total_loss / len(train_loader)}")

In [64]:
# Train RNN
train_model(rnn_model, train_loader, criterion, optimizer, epochs=20)

Epoch 1, Loss: 1.5665134191513062
Epoch 2, Loss: 1.5722788572311401
Epoch 3, Loss: 1.625604510307312
Epoch 4, Loss: 1.5238451957702637
Epoch 5, Loss: 1.5186818440755208
Epoch 6, Loss: 1.5100441376368205
Epoch 7, Loss: 1.5751574436823528
Epoch 8, Loss: 1.5095823605855305
Epoch 9, Loss: 1.5385051568349202
Epoch 10, Loss: 1.537808895111084
Epoch 11, Loss: 1.5561861197153728
Epoch 12, Loss: 1.50415833791097
Epoch 13, Loss: 1.5343877077102661
Epoch 14, Loss: 1.5772500038146973
Epoch 15, Loss: 1.5103242794672649
Epoch 16, Loss: 1.5225967168807983
Epoch 17, Loss: 1.5281529029210408
Epoch 18, Loss: 1.5472896099090576
Epoch 19, Loss: 1.5260744094848633
Epoch 20, Loss: 1.5271563132603962


#### 3.2. Bi-RNN Implementation  

In [40]:
# Define Bi-RNN Model
class BiRNNModel(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, output_size):
        super(BiRNNModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.RNN(embed_size, hidden_size, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_size * 2, output_size)  # Multiply hidden_size by 2 for Bi-directional

    def forward(self, input_ids):
        x = self.embedding(input_ids)
        _, hidden = self.rnn(x)
        # Combine both directions' hidden states
        hidden = torch.cat((hidden[0], hidden[1]), dim=1)
        output = self.fc(hidden)
        return output

In [41]:
# Initialize the Bi-RNN Model
bi_rnn_model = BiRNNModel(vocab_size, embed_size, hidden_size, output_size)

In [65]:
# Train Bi-RNN
train_model(bi_rnn_model, train_loader, criterion, optimizer, epochs=20)

Epoch 1, Loss: 1.671460707982381
Epoch 2, Loss: 1.6615585486094158
Epoch 3, Loss: 1.6737428506215413
Epoch 4, Loss: 1.6803996960322063
Epoch 5, Loss: 1.6698662439982097
Epoch 6, Loss: 1.6580512523651123
Epoch 7, Loss: 1.6821167469024658
Epoch 8, Loss: 1.669476310412089
Epoch 9, Loss: 1.6685885588328044
Epoch 10, Loss: 1.6704107522964478
Epoch 11, Loss: 1.6645151774088542
Epoch 12, Loss: 1.680450201034546
Epoch 13, Loss: 1.6827476024627686
Epoch 14, Loss: 1.6653744379679363
Epoch 15, Loss: 1.6629727681477864
Epoch 16, Loss: 1.6507583061854045
Epoch 17, Loss: 1.6606574455897014
Epoch 18, Loss: 1.6654892762502034
Epoch 19, Loss: 1.6705024639765422
Epoch 20, Loss: 1.6804960171381633


#### 3.3. GRU Implementation  

In [43]:
# Define GRU Model
class GRUModel(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, output_size):
        super(GRUModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.gru = nn.GRU(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, input_ids):
        x = self.embedding(input_ids)
        _, hidden = self.gru(x)
        output = self.fc(hidden.squeeze(0))
        return output

In [44]:
# Initialize the GRU Model
gru_model = GRUModel(vocab_size, embed_size, hidden_size, output_size)

In [66]:
# Train GRU
train_model(gru_model, train_loader, criterion, optimizer, epochs=20)

Epoch 1, Loss: 1.629641016324361
Epoch 2, Loss: 1.6444669167200725
Epoch 3, Loss: 1.6286664406458538
Epoch 4, Loss: 1.6329840024312336
Epoch 5, Loss: 1.6425390640894573
Epoch 6, Loss: 1.6307514905929565
Epoch 7, Loss: 1.6245061953862507
Epoch 8, Loss: 1.628666599591573
Epoch 9, Loss: 1.6372473239898682
Epoch 10, Loss: 1.6371114651362102
Epoch 11, Loss: 1.6255133549372356
Epoch 12, Loss: 1.629776914914449
Epoch 13, Loss: 1.6360221306482952
Epoch 14, Loss: 1.636158029238383
Epoch 15, Loss: 1.6306153933207195
Epoch 16, Loss: 1.6296409765879314
Epoch 17, Loss: 1.6288022994995117
Epoch 18, Loss: 1.6370295683542888
Epoch 19, Loss: 1.636293927828471
Epoch 20, Loss: 1.6233958800633748


####   3.4. LSTM Implementation

In [49]:
# Define LSTM Model
class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, output_size):
        super(LSTMModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, input_ids):
        x = self.embedding(input_ids)
        _, (hidden, _) = self.lstm(x)
        output = self.fc(hidden.squeeze(0))
        return output

In [50]:
# Initialize the LSTM Model
lstm_model = LSTMModel(vocab_size, embed_size, hidden_size, output_size)

In [67]:
# Train LSTM
train_model(lstm_model, train_loader, criterion, optimizer, epochs=20)

Epoch 1, Loss: 1.6123337348302205
Epoch 2, Loss: 1.6152034997940063
Epoch 3, Loss: 1.6219777663548787
Epoch 4, Loss: 1.627034862836202
Epoch 5, Loss: 1.6129740873972576
Epoch 6, Loss: 1.6135566631952922
Epoch 7, Loss: 1.619318167368571
Epoch 8, Loss: 1.6219136714935303
Epoch 9, Loss: 1.6142181952794392
Epoch 10, Loss: 1.6133400599161785
Epoch 11, Loss: 1.6129741668701172
Epoch 12, Loss: 1.6241861979166667
Epoch 13, Loss: 1.6241861581802368
Epoch 14, Loss: 1.622834841410319
Epoch 15, Loss: 1.6228347619374592
Epoch 16, Loss: 1.6164906819661458
Epoch 17, Loss: 1.6082189877827961
Epoch 18, Loss: 1.6219136714935303
Epoch 19, Loss: 1.6168566544850667
Epoch 20, Loss: 1.6155484914779663


### 4. Evaluate Models  

In [52]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from nltk.translate.bleu_score import sentence_bleu

In [55]:
def evaluate_model(model, data_loader, criterion):
    model.eval()
    total_loss = 0
    predictions = []
    ground_truth = []
    bleu_scores = []

    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch['input_ids']
            labels = batch['label']

            outputs = model(input_ids)
            loss = criterion(outputs, labels)
            total_loss += loss.item()

            # Save predictions and labels for metrics
            preds = torch.argmax(outputs, dim=1).tolist()
            predictions.extend(preds)
            ground_truth.extend(labels.tolist())

            # BLEU: For demonstration, treating predictions as sequences of single tokens
            for pred, label in zip(preds, labels.tolist()):
                bleu_scores.append(sentence_bleu([[str(label)]], [str(pred)]))  # Modify if predictions are sequences

    # Standard Metrics
    accuracy = accuracy_score(ground_truth, predictions)
    precision = precision_score(ground_truth, predictions, average='weighted')
    recall = recall_score(ground_truth, predictions, average='weighted')
    f1 = f1_score(ground_truth, predictions, average='weighted')

    # BLEU Score
    avg_bleu = sum(bleu_scores) / len(bleu_scores)

    results = {
        'loss': total_loss / len(data_loader),
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'bleu_score': avg_bleu
    }
    
    # Display results row by row
    for metric, value in results.items():
        print(f"{metric.capitalize()}: {value:.4f}")

In [68]:
print("RNN Metrics:\n")
rnn_metrics = evaluate_model(rnn_model, test_loader, criterion)

RNN Metrics:

Loss: 1.6429
Accuracy: 0.2727
Precision: 0.0744
Recall: 0.2727
F1_score: 0.1169
Bleu_score: 0.2727


Corpus/Sentence contains 0 counts of 2-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().
  _warn_prf(average, modifier, msg_start, len(result))


In [69]:
print("Bi-RNN Metrics:\n")
bi_rnn_metrics = evaluate_model(bi_rnn_model, test_loader, criterion)

Bi-RNN Metrics:

Loss: 1.6324
Accuracy: 0.0909
Precision: 0.0101
Recall: 0.0909
F1_score: 0.0182
Bleu_score: 0.0909


Corpus/Sentence contains 0 counts of 2-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().
  _warn_prf(average, modifier, msg_start, len(result))


In [70]:
print("GRU Metrics:\n")
gru_metrics = evaluate_model(gru_model, test_loader, criterion)

GRU Metrics:

Loss: 1.6078
Accuracy: 0.2727
Precision: 0.0744
Recall: 0.2727
F1_score: 0.1169
Bleu_score: 0.2727


Corpus/Sentence contains 0 counts of 2-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().
  _warn_prf(average, modifier, msg_start, len(result))


In [71]:
print("LSTM Metrics:\n")
lstm_metrics = evaluate_model(lstm_model, test_loader, criterion)

LSTM Metrics:

Loss: 1.6178
Accuracy: 0.2727
Precision: 0.0744
Recall: 0.2727
F1_score: 0.1169
Bleu_score: 0.2727


Corpus/Sentence contains 0 counts of 2-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().
  _warn_prf(average, modifier, msg_start, len(result))


## Part 2: Transformer (Text generation) 

In [72]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from transformers import TextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments
import torch

### 1. Fine tune the pre-trained model (GPT2)  

#### 1.1. Load the Pre-trained GPT-2 Model   

In [73]:
# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Set the model in evaluation mode
model.eval()

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]



GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

#### 1.2. Load Shakespeare Dataset   

In [74]:
# Load the dataset
dataset_path = "/kaggle/input/shakespeare-text/text.txt"

# Read the text dataset
with open(dataset_path, 'r') as file:
    shakespeare_text = file.read()

# Check the first 500 characters
print(shakespeare_text[:500])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


#### 1.3. Fine-tuning GPT-2 on the Shakespeare Dataset   

In [76]:
# Prepare the dataset for fine-tuning
train_file = "/kaggle/working/shakespeare_train.txt"

# Write the dataset to a temporary file (required by TextDataset)
with open(train_file, "w") as file:
    file.write(shakespeare_text)

# Load the dataset using TextDataset
train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path=train_file,
    block_size=128  # Sequence length
)

# Set up data collator (used for padding and batching)
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # GPT-2 uses causal language modeling
)



In [77]:
# Set up training arguments
training_args = TrainingArguments(
    output_dir="./gpt2_finetuned",  # Where to save the fine-tuned model
    overwrite_output_dir=True,
    num_train_epochs=5,  # Train for 3 epochs (adjustable)
    per_device_train_batch_size=4,
    save_steps=500,  # Save model every 500 steps
    logging_steps=100,  # Log every 100 steps
    report_to=["none"]
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset
)

# Fine-tune the model
trainer.train()

  with torch.cuda.device(device), torch.cuda.stream(stream), autocast(enabled=autocast_enabled):


Step,Training Loss
100,3.8813
200,3.6586
300,3.6243
400,3.4907
500,3.3989
600,3.4144
700,3.3573
800,3.2696
900,3.2833
1000,3.2719


  with torch.cuda.device(device), torch.cuda.stream(stream), autocast(enabled=autocast_enabled):
  with torch.cuda.device(device), torch.cuda.stream(stream), autocast(enabled=autocast_enabled):
  with torch.cuda.device(device), torch.cuda.stream(stream), autocast(enabled=autocast_enabled):


TrainOutput(global_step=1650, training_loss=3.34943359375, metrics={'train_runtime': 499.9363, 'train_samples_per_second': 26.403, 'train_steps_per_second': 3.3, 'total_flos': 862263705600000.0, 'train_loss': 3.34943359375, 'epoch': 5.0})

### 2. Generate Text Using the Fine-tuned Model  

In [81]:
# Ensure the model and inputs are on the correct device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

In [102]:
# Define an input sentence
input_sentence = "As the sun set, the world seemed to quiet down"

# Encode the input sentence
input_ids = tokenizer.encode(input_sentence, return_tensors='pt').to(device)

In [103]:
# Generate new text (next tokens) using the fine-tuned model
generated_ids = model.generate(
    input_ids=input_ids,
    max_length=200,  # Length of the generated text
    num_return_sequences=1,  # Generate one sequence
    temperature=0.9,  # Add randomness
    top_k=50,  # Limit to the top 50 most likely next words
    top_p=0.95,  # Use top-p sampling for diversity
    no_repeat_ngram_size=2,  # Prevent repeating phrases
    pad_token_id=tokenizer.eos_token_id  # Avoid issues with padding
)

In [104]:
# Decode the generated ids to text
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

# Print the generated text
print(generated_text)

As the sun set, the world seemed to quiet down,
And the stars were still in their stars.

BENVOLIO:
O, I am a little too late!
I am not a man to be late. I have been
A little late, and am yet a very late;
But I will be so, for I must be a late
To be the late of the day. Come, come, my lord. What
You have done, you have made a mistake. You
Have made an error, sir, in your haste. Your
son, Angelo, is dead; and you, your son, are dead. Go, go, good sir; go. Away, away, home, Away! Away. Wherefore, what is your
goodly son?
What is his name? Angelo? What is he? I'll tell you. Angelo! what
is he, that is not Angelo: he is a poor,
