Name: Fabio Grassiotto  
RA: 890441

**Exercise: LoRA**

- Didactic exercise to understand how to fine-tune large models with limited resources.
- Apply it to the sentiment-analysis pre-exercise or to the second exercise, the language model, with a 3000-word vocabulary, embedding size and 2 layers, trained in the usual way (measure the training time per epoch).
- Modify your model to adopt LoRA in the embedding and in the 2 layers, then fine-tune it, that is, continue the previous training, keeping in mind that the original matrices stay frozen and weight updates are applied only to the LoRA matrices. Measure the training time per epoch.
- Finally, replace the original model with the new weights computed as W + LoRA.
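
For reference, the update described in the last two items can be written compactly. The notation below is the usual LoRA notation and is an assumption of this note, not part of the exercise text; $W_0$ is written with shape $d_{in} \times d_{out}$ so that it multiplies a row vector $x$ from the right.

$$
h = x W_0 + \frac{\alpha}{r}\, x B A, \qquad B \in \mathbb{R}^{d_{in} \times r},\; A \in \mathbb{R}^{r \times d_{out}}
$$

$$
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A
$$

Only $A$ and $B$ are trained, so each adapted layer contributes $r\,(d_{in} + d_{out})$ trainable parameters instead of $d_{in}\, d_{out}$.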

In [1]:
import os
import sys
import time
import re
from tqdm import tqdm
from collections import Counter
from sklearn.model_selection import train_test_split

# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

import warnings
warnings.filterwarnings("ignore")

In [2]:
# Global variables
# Vocabulary
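vocab_size = 15000  # the recorded outputs below (e.g. Embedding(15001, 64)) come from a 15000-word run; the exercise statement suggests 3000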
context_size = 9
pattern = r'\w+|[,;.:!?\']'

# Training
batch_size = 32
lr = 0.05
pretrain_epochs = 10
finetune_epochs = 50

# Model
embedding_dim = 64
hidden_dim = 200
dropout_rate = 0.2

# LoRA parameters
lora_r = 1         # Rank r of the low-rank update
lora_alpha = 1     # Scaling numerator (alpha)
lora_scaling = lora_alpha / lora_r  # Factor applied to B @ A

In [3]:
# Colab environment
IN_COLAB = 'google.colab' in sys.modules

if (IN_COLAB):
    # Google Drive
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)

    project_folder="/content/drive/MyDrive/Classes/IA024/Aula_5_6"
    os.chdir(project_folder)
    !ls -la

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## Download and load the dataset

We define two datasets: a pretraining dataset, with books by José de Alencar, and a second dataset with a single work by Machado de Assis.

In [4]:
def list_files(directory):
    file_list = []
    for filename in os.listdir(directory):
        file_path = os.path.join(directory, filename)
        if os.path.isfile(file_path):
            file_list.append(file_path)
    return file_list

In [5]:
# Check if download is necessary
if not os.path.exists("dataset/pretrain/67724.txt.utf-8"):
    print("Downloading Gutenberg texts")

    # José de Alencar - pretrain
    !wget https://www.gutenberg.org/ebooks/67724.txt.utf-8 -P dataset/pretrain/
    !wget https://www.gutenberg.org/ebooks/67725.txt.utf-8 -P dataset/pretrain/
    !wget https://www.gutenberg.org/ebooks/67740.txt.utf-8 -P dataset/pretrain/
    !wget https://www.gutenberg.org/ebooks/38496.txt.utf-8 -P dataset/pretrain/
    !wget https://www.gutenberg.org/ebooks/44540.txt.utf-8 -P dataset/pretrain/
    !wget https://www.gutenberg.org/ebooks/29040.txt.utf-8 -P dataset/pretrain/
    !wget https://www.gutenberg.org/ebooks/67831.txt.utf-8 -P dataset/pretrain/

    # Machado de Assis - fine-tuning
    !wget https://www.gutenberg.org/ebooks/54829.txt.utf-8 -P dataset/fine-tune/

### Cleaning the main text

In [6]:
def clean_text(text):
    start_marker = "*** START OF THE PROJECT GUTENBERG EBOOK"
    end_marker = "*** END OF THE PROJECT GUTENBERG EBOOK"
    text_start = text.find(start_marker)
    text_end = text.find(end_marker)

    text_content= text[text_start:text_end].replace('\r','')
    paragraphs = []
    for paragraph in text_content.split("\n\n"):
        paragraph = paragraph.replace('\n', ' ').strip()
        # Validation of length and index lines
        if (len(paragraph) > 10 and '....' not in paragraph):
            paragraphs.append(paragraph)
    return paragraphs

pretrain_texts = list_files("dataset/pretrain")
finetune_texts = list_files("dataset/fine-tune")

In [7]:
pretrain_cleaned_paragraphs=[]
finetune_cleaned_paragraphs=[]

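for file_path in pretrain_texts:
    # The Gutenberg files are UTF-8; the context manager also closes the file.
    with open(file_path, "r", encoding="utf-8") as f:
        text = f.read()
    cleaned = clean_text(text)
    pretrain_cleaned_paragraphs += cleaned

for file_path in finetune_texts:
    with open(file_path, "r", encoding="utf-8") as f:
        text = f.read()
    cleaned = clean_text(text)
    finetune_cleaned_paragraphs += cleaned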

print(f'Number of paragraphs (pretrain dataset): {len(pretrain_cleaned_paragraphs)}')
print(f'Number of paragraphs (fine-tune dataset): {len(finetune_cleaned_paragraphs)}')

Number of paragraphs (pretrain dataset): 9162
Number of paragraphs (fine-tune dataset): 1298


## Dataset analysis

In [8]:
# Count the words in the dataset
def count_words(texts):
    word_counts = Counter()
    for text in texts:
        word_counts.update(re.findall(pattern, text.lower()))
    return word_counts

word_counts_pretrain = count_words(pretrain_cleaned_paragraphs)
word_counts_finetune = count_words(finetune_cleaned_paragraphs)

print(f'Words on pretrain dataset: {len(word_counts_pretrain)}')
print(f'Words on fine-tune dataset: {len(word_counts_finetune)}')

Words on pretrain dataset: 22543
Words on fine-tune dataset: 10196
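
To make the counting step concrete, here is a minimal sketch of how the token pattern splits a sentence and how a frequency-ranked vocabulary is derived from the counts. The helper name `tokenize`, the toy sentence and `toy_vocab_size` are illustrative assumptions; the pattern is the one defined in the global variables.

```python
import re
from collections import Counter

# Same token pattern as the notebook: words, or one punctuation mark / apostrophe.
pattern = r"\w+|[,;.:!?']"

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(pattern, text.lower())

tokens = tokenize("Será d'aquelle, onde se referem as circumstancias.")
print(tokens)

# Toy corpus: ids start at 1 so that 0 stays free for unknown words.
toy_counts = Counter(tokenize("a casa, a rua, a casa."))
toy_vocab_size = 3  # hypothetical, only for this example
toy_vocab = {word: i for i, (word, _) in enumerate(toy_counts.most_common(toy_vocab_size), 1)}
print(toy_vocab)
```

Words that fall outside the most frequent entries (here `rua` and `.`) get no id and are later encoded as 0.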


## Building a vocabulary

In [9]:
pretrain_most_frequent_words = [word for word, count in word_counts_pretrain.most_common(vocab_size)]
pretrain_vocab = {word: i for i, word in enumerate(pretrain_most_frequent_words, 1)}

finetune_most_frequent_words = [word for word, count in word_counts_finetune.most_common(vocab_size)]
finetune_vocab = {word: i for i, word in enumerate(finetune_most_frequent_words, 1)}

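# Test: reuse the pretraining vocabulary for fine-tuning, so that token ids
# keep pointing at the same rows of the pretrained embedding table.
finetune_vocab = pretrain_vocab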

In [10]:
print("Pretrain:")
print(f"Most Frequent Words: {pretrain_most_frequent_words[:10]}")
print(f"Vocabulary Size: {len(pretrain_vocab)}")

print("Finetune:")
print(f"Most Frequent Words: {finetune_most_frequent_words[:10]}")
print(f"Vocabulary Size: {len(finetune_vocab)}")

Pretrain:
Most Frequent Words: [',', '.', 'a', 'o', 'de', 'que', 'e', 'se', ';', 'do']
Vocabulary Size: 15000
Finetune:
Most Frequent Words: [',', '.', 'a', 'que', 'de', 'e', 'o', ';', 'não', 'um']
Vocabulary Size: 15000


#### Encoding / decoding sentences

In [11]:
def encode_sentence(sentence, vocab):
    return [vocab.get(word, 0) for word in re.findall(pattern, sentence.lower())]

def decode_sentence(encoded_sentence, vocab):
    # Invert the vocabulary once instead of scanning it for every token.
    index_to_word = {code: word for word, code in vocab.items()}
    return [index_to_word.get(index, "<UNK>") for index in encoded_sentence]

seq = pretrain_cleaned_paragraphs[20]
spc = ' '
encoded = encode_sentence(seq, pretrain_vocab)
decoded = decode_sentence(encoded, pretrain_vocab)

print(f'Original Seq: {seq}')
print(f'Encoded: {encoded}')
print(f'Decoded: {decoded}')
print(f'Reconstructed Seq: {spc.join(decoded)}')

Original Seq: Será d'aquelle, onde se referem as circumstancias, á que attribuo a predilecção de meu espirito pela fórma litteraria do romance.
Encoded: [438, 39, 46, 195, 1, 60, 8, 5695, 19, 1178, 1, 22, 6, 7451, 3, 5696, 5, 62, 164, 50, 665, 1114, 10, 579, 2]
Decoded: ['será', 'd', "'", 'aquelle', ',', 'onde', 'se', 'referem', 'as', 'circumstancias', ',', 'á', 'que', 'attribuo', 'a', 'predilecção', 'de', 'meu', 'espirito', 'pela', 'fórma', 'litteraria', 'do', 'romance', '.']
Reconstructed Seq: será d ' aquelle , onde se referem as circumstancias , á que attribuo a predilecção de meu espirito pela fórma litteraria do romance .


## Dataset class

In [12]:
# Dataset class
class BagOfWordsDataset(Dataset):
  def __init__(self, paragraphs, vocab, context):
    self.paragraphs = paragraphs
    self.vocab = vocab
    self.context = context
    self.tokens, self.targets = self.setup()

  def __len__(self):
    return len(self.tokens)

  def __getitem__(self, idx):
    return torch.tensor(self.tokens[idx]), torch.tensor(self.targets[idx])
  
  def setup(self):
    tokens = []
    targets = []
    for paragraph in self.paragraphs:
      encoded = encode_sentence(paragraph, self.vocab)
      
      # Skip paragraphs too short to yield one (context, target) pair.
      if len(encoded) < self.context + 1:
        continue

      for i in range(len(encoded) - self.context):
        tks = encoded[i:i+self.context]
        tgt = encoded[i+self.context]
        # Only add if there are no unknown tokens in both context and target.
        bad_token = 0
        if not (bad_token in tks or tgt == bad_token):
          tokens.append(tks)
          targets.append(tgt)
    return tokens, targets
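
The windowing done in `setup()` can be illustrated on a tiny sequence. This is a sketch with a hypothetical helper `make_windows` and made-up token ids; it follows the same rule as the class above: every window whose context or target contains the unknown id 0 is dropped.

```python
def make_windows(encoded, context, unk_id=0):
    """Return (context, target) pairs, skipping any window that touches unk_id."""
    pairs = []
    for i in range(len(encoded) - context):
        ctx = encoded[i:i + context]
        tgt = encoded[i + context]
        if unk_id in ctx or tgt == unk_id:
            continue
        pairs.append((ctx, tgt))
    return pairs

encoded = [5, 8, 0, 3, 7, 2, 9]   # made-up ids, 0 marks an unknown word
pairs = make_windows(encoded, context=2)
print(pairs)
```

A single unknown token removes up to `context + 1` windows, which is why the number of dataset samples depends strongly on the vocabulary size.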


In [13]:
# Train/Validation split
pretrain_train_data, pretrain_val_data = train_test_split(pretrain_cleaned_paragraphs, test_size=0.2, random_state=18)
finetune_train_data, finetune_val_data = train_test_split(finetune_cleaned_paragraphs, test_size=0.2, random_state=18)

pretrain_train_dataset = BagOfWordsDataset(pretrain_train_data, pretrain_vocab, context_size)
pretrain_val_dataset = BagOfWordsDataset(pretrain_val_data, pretrain_vocab, context_size)
finetune_train_dataset = BagOfWordsDataset(finetune_train_data, finetune_vocab, context_size)
finetune_val_dataset = BagOfWordsDataset(finetune_val_data, finetune_vocab, context_size)

# Counting all Samples
print("Pretrain dataset:")
print(f"Training samples: {len(pretrain_train_data)}")
print(f"Validation samples: {len(pretrain_val_data)}")
print(f"Training dataset samples: {len(pretrain_train_dataset)}")
print(f"Validation dataset samples: {len(pretrain_val_dataset)}")
print()
print("Finetune dataset:")
print(f"Training samples: {len(finetune_train_data)}")
print(f"Validation samples: {len(finetune_val_data)}")
print(f"Training dataset samples: {len(finetune_train_dataset)}")
print(f"Validation dataset samples: {len(finetune_val_dataset)}")

Pretrain dataset:
Training samples: 7329
Validation samples: 1833
Training dataset samples: 120482
Validation dataset samples: 30919

Finetune dataset:
Training samples: 1038
Validation samples: 260
Training dataset samples: 15583
Validation dataset samples: 4353


In [14]:
tst_loader = DataLoader(pretrain_train_dataset, batch_size = 1, shuffle=True)
sample = next(iter(tst_loader))
print(sample)

[tensor([[   9,  249,   19, 6673, 6618, 5261,    4,  367,   17]]), tensor([3])]


In [15]:
# Train/val loaders (validation data does not need to be shuffled)
pretrain_train_loader = DataLoader(pretrain_train_dataset, batch_size=batch_size, shuffle=True)
pretrain_val_loader = DataLoader(pretrain_val_dataset, batch_size=batch_size, shuffle=False)
finetune_train_loader = DataLoader(finetune_train_dataset, batch_size=batch_size, shuffle=True)
finetune_val_loader = DataLoader(finetune_val_dataset, batch_size=batch_size, shuffle=False)

## Model (modified to enable or disable *low-rank adaptation*)
When LoRA is enabled, the three layers of the base model (embeddings, linear1 and linear2) have their weights frozen and only the LoRA matrices A and B are trained. When it is disabled, the three layers become trainable again. The freezing is handled by *enable_LoRA()* and *disable_LoRA()*, and *forward()* adds the low-rank path whenever LoRA is enabled.

In [16]:
class BengioModel(torch.nn.Module):
    def __init__(self):
        super(BengioModel, self).__init__()
        self.LoRA_enabled = False # Default
        self.vocab_size = vocab_size

        # LoRA parameters
        self.lora_alpha = lora_alpha
        self.lora_r = lora_r
        self.scaling = lora_scaling
        
        # Embeddings layer
        self.embeddings = nn.Embedding(vocab_size+1, embedding_dim)
        # First Linear Layer
        self.linear1 = nn.Linear(context_size * embedding_dim, hidden_dim, bias=True)
        # Activation and Dropout
        self.tanh = torch.nn.Tanh()
        self.dropout = torch.nn.Dropout(dropout_rate)
        # Second Linear Layer
        self.linear2 = nn.Linear(hidden_dim, vocab_size+1, bias=True)

        # LoRA matrices (B starts at zero, so B @ A adds nothing until it is trained)
        # LoRA on embeddings layer
        self.embeddings_lora_B = nn.Parameter(torch.zeros(vocab_size+1, self.lora_r), requires_grad=False)
        self.embeddings_lora_A = nn.Parameter(torch.randn(self.lora_r, embedding_dim), requires_grad=False)
        # LoRA on the first linear layer
        self.linear1_lora_B = nn.Parameter(torch.zeros(context_size*embedding_dim, self.lora_r), requires_grad=False)
        self.linear1_lora_A = nn.Parameter(torch.randn(self.lora_r, hidden_dim), requires_grad=False)
        # LoRA on the second linear layer
        self.linear2_lora_B = nn.Parameter(torch.zeros(hidden_dim, self.lora_r), requires_grad=False)
        self.linear2_lora_A = nn.Parameter(torch.randn(self.lora_r, vocab_size+1), requires_grad=False)

    def forward(self, inputs):
        # Embeddings
        embeds = self.embeddings(inputs)
        if (self.LoRA_enabled):
            one_hot = F.one_hot(inputs, self.vocab_size+1).to(torch.float32)
            embeddings_LoRA = one_hot @ (self.embeddings_lora_B @ self.embeddings_lora_A)
            embeddings_LoRA = embeddings_LoRA * self.scaling
            embeds = embeds + embeddings_LoRA

        # Flatten embeddings
        embeds = embeds.view(embeds.size(0), -1)
        
        # First linear layer
        out = self.linear1(embeds)
        if (self.LoRA_enabled):
            linear1_lora_out = embeds @ (self.linear1_lora_B @ self.linear1_lora_A)
            linear1_lora_out = linear1_lora_out * self.scaling
            out = out + linear1_lora_out
        
        activation = self.tanh(out)
        activation = self.dropout(activation)

        # Second linear layer
        out = self.linear2(activation)
        if (self.LoRA_enabled):
            linear2_lora_out = activation @ (self.linear2_lora_B @ self.linear2_lora_A)
            linear2_lora_out = linear2_lora_out * self.scaling
            out = out + linear2_lora_out

        return out
    
    def enable_LoRA(self):
        self.LoRA_enabled = True
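        # Freeze the base weight matrices. The biases of linear1 and linear2 are
        # not frozen here, so they keep training alongside the LoRA matrices.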
        print("Freezing Embeddings")
        self.embeddings.weight.requires_grad = False
        print("Freezing Layer 1")
        self.linear1.weight.requires_grad = False
        print("Freezing Layer 2")
        self.linear2.weight.requires_grad = False
        # Unfreeze LoRA parameters
        print("Unfreezing LoRA parameters")
        self.embeddings_lora_A.requires_grad = True
        self.embeddings_lora_B.requires_grad = True
        self.linear1_lora_A.requires_grad = True
        self.linear1_lora_B.requires_grad = True
        self.linear2_lora_A.requires_grad = True
        self.linear2_lora_B.requires_grad = True

    def disable_LoRA(self):
        self.LoRA_enabled = False
        print("Unfreezing Embeddings")
        self.embeddings.weight.requires_grad = True
        print("Unfreezing Layer 1")
        self.linear1.weight.requires_grad = True
        print("Unfreezing Layer 2")
        self.linear2.weight.requires_grad = True
        print("Freezing LoRA parameters")
        self.embeddings_lora_A.requires_grad = False
        self.embeddings_lora_B.requires_grad = False
        self.linear1_lora_A.requires_grad = False
        self.linear1_lora_B.requires_grad = False
        self.linear2_lora_A.requires_grad = False
        self.linear2_lora_B.requires_grad = False

    
    def apply_LoRA_weights(self):
        # Fold the scaled LoRA update into the base weights: W <- W + scaling * (B @ A).
        # nn.Linear stores its weight as (out_features, in_features), hence the transposes.
        with torch.no_grad():
            lora_embeddings_weights = (self.embeddings_lora_B @ self.embeddings_lora_A) * self.scaling
            lora_linear1_weights = (self.linear1_lora_B @ self.linear1_lora_A).transpose(0, 1) * self.scaling
            lora_linear2_weights = (self.linear2_lora_B @ self.linear2_lora_A).transpose(0, 1) * self.scaling
            self.embeddings.weight += lora_embeddings_weights
            self.linear1.weight += lora_linear1_weights
            self.linear2.weight += lora_linear2_weights
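
The embedding branch of `forward()` multiplies a one-hot tensor by the dense product `B @ A`. As a hedged side note, the same result can be obtained by indexing the rows of `B` and projecting them with `A`, which avoids materialising the one-hot tensor. The sketch below checks this on toy sizes (all sizes and names here are assumptions for the example, not the notebook's values).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_rows, dim, r = 11, 4, 2               # toy sizes, not the notebook's
B = torch.randn(vocab_rows, r)              # plays the role of embeddings_lora_B
A = torch.randn(r, dim)                     # plays the role of embeddings_lora_A
ids = torch.randint(0, vocab_rows, (3, 5))  # (batch, context) token ids

# Route 1, as in forward(): one-hot rows times the dense (vocab_rows x dim) product.
via_one_hot = F.one_hot(ids, vocab_rows).to(torch.float32) @ (B @ A)

# Route 2: look up the rows of B directly, then project with A.
via_lookup = F.embedding(ids, B) @ A

print(via_one_hot.shape, torch.allclose(via_one_hot, via_lookup, atol=1e-5))
```

Both routes give a tensor of shape (batch, context, dim); the lookup route only touches the rows of `B` that appear in the batch.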

In [17]:
model = BengioModel()

#### Basic model test

In [18]:
sample = next(iter(pretrain_train_loader))
input = sample[0]
target = sample[1]

print(input.shape)
print(target.shape)

output = model(input)
pred = output.argmax(dim=1)

print(pred)
print(target)

torch.Size([32, 9])
torch.Size([32])
tensor([11676, 14712, 12358,  1281, 11373,  4140,     5, 14324,  8178,  1939,
        13843,  2772, 11263,  3064,  6023,  2521,  9948,  1708,  3948,  4851,
         2365,  4647,    38, 12391,  2148,  7197,  3087, 10173,   361,  3092,
         8001,  2247])
tensor([  221,    14,     2,    11,    14,    18,   120,    25,    21,   156,
           30,     4,   565,    14,     7,     4,    16,     1,   543,   242,
           52,     1,  4881,     9,  3157,    56, 11521,    49,   804,    27,
          194,     2])


## Training and Evaluation

### Model training and evaluation functions

#### Function to count model parameters

In [19]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example usage (only parameters with requires_grad=True are counted):
total_params = count_parameters(model)
print(f'The model has a total of {total_params:,} trainable parameters.')

The model has a total of 4,090,665 trainable parameters.


#### Function for the initial model evaluation

In [20]:
def init_eval(model, train_loader):
    # Initial Perplexity and Loss
    # Before training
    model.eval()

    loss = 0
    perp = 0

    with torch.no_grad():
        for inputs, targets in tqdm(train_loader):
            inputs = inputs.to(device)
            targets = targets.to(device)
            outputs = model(inputs)
            loss += criterion(outputs, targets).item()

    loss /= len(train_loader)
    perp = torch.exp(torch.tensor(loss))

    print(f'Initial Loss: {loss:.4f}')
    print(f'Initial Perplexity: {perp:.4f}')
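
As a sanity check on the initial numbers, perplexity is the exponential of the mean cross-entropy, and a model that spreads probability uniformly over all classes has a loss of ln(number of classes). The sketch below assumes the 15001 output classes that appear in the recorded model printout; the helper name `perplexity` is illustrative.

```python
import math

def perplexity(mean_cross_entropy):
    """Perplexity is exp of the mean cross-entropy (natural log)."""
    return math.exp(mean_cross_entropy)

num_classes = 15001                      # assumption: taken from the recorded model printout
uniform_loss = math.log(num_classes)     # loss of a uniform prediction
uniform_perplexity = perplexity(uniform_loss)

print(f"uniform loss: {uniform_loss:.4f}, uniform perplexity: {uniform_perplexity:.1f}")
```

An untrained model should therefore start with a loss close to ln(15001), about 9.62, and a perplexity close to the number of classes.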

#### Function to train the model

In [21]:
def train(model, epochs, train_loader):
      # Training Loop
      model.train()

      # Overall training time stats
      epoch_time_total = 0
      epoch_time_fwd_total = 0
      
      for epoch in range(epochs):

            epoch_start = time.time()
            # Metrics
            epoch_loss = 0
            epoch_correct = 0
            epoch_samples = 0
            
            # Training times
            forward_time = 0

            for inputs, targets in tqdm(train_loader):
                  inputs = inputs.to(device)  # Move input data to the device
                  targets = targets.to(device)

                  # Forward pass
                  forward_start = time.time()
                  outputs = model(inputs)
                  forward_time += (time.time() - forward_start)

                  loss = criterion(outputs, targets)

                  # Backward pass and optimization
                  optimizer.zero_grad()
                  loss.backward()

                  optimizer.step()

                  # Loss
                  epoch_loss += loss.item()

                  # Predicted
                  predicted = outputs.argmax(dim=1)
                  epoch_correct += (predicted == targets).sum().item()
                  epoch_samples += targets.size(0)

            # Calculate average loss and accuracy for epoch
            avg_loss = epoch_loss / len(train_loader)
            acc = epoch_correct / epoch_samples

            # Perplexity
            perp = torch.exp(torch.tensor(avg_loss))

            epoch_end = time.time()
            epoch_time = epoch_end - epoch_start
            
            # Total training time
            epoch_time_total += epoch_time
            epoch_time_fwd_total += forward_time
            
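            # Print epoch statistics. Note that acc is a fraction in [0, 1], so a
            # logged "Accuracy: 0.13%" stands for 13% of the targets predicted correctly.
            print(f'Epoch [{epoch+1}/{epochs}], Epoch Time: {epoch_time:.2f}, Loss: {avg_loss:.4f}, Accuracy: {acc:.2f}%, Perplexity: {perp:.4f}')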
      
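      # Overall training average times. The "backward" figure is everything outside
      # the timed forward call: loss, backward pass, optimizer step and data loading.
      # On GPU the forward timing is only approximate, since CUDA kernels run
      # asynchronously and no torch.cuda.synchronize() is issued.
      epoch_time_avg = epoch_time_total / epochs
      epoch_time_fwd_avg = epoch_time_fwd_total / epochs
      epoch_time_bwd_avg = epoch_time_avg - epoch_time_fwd_avg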
      print()
      print(f'Average Times per Epoch: {epoch_time_avg:.2f}, Forward Pass: {epoch_time_fwd_avg:.2f}, Backward Pass: {epoch_time_bwd_avg:.2f}')


#### Function to evaluate on the validation set

In [22]:
def eval(model, val_loader):
    # Note: this name shadows Python's built-in eval(); it is kept because later cells call it.
    model.eval()

    loss_sum = 0.0
    total_sum = 0
    correct_sum = 0

    with torch.no_grad():
        for inputs, targets in tqdm(val_loader):
            inputs = inputs.to(device)
            targets = targets.to(device)

            outputs = model(inputs)
            # .item() keeps the running sum a plain float instead of a device tensor
            loss_sum += criterion(outputs, targets).item()

            # Get the predicted labels
            predicted = outputs.argmax(dim=1)

            total_sum += targets.size(0)
            correct_sum += (predicted == targets).sum().item()

    # Calculate accuracy (in percent)
    acc = 100 * correct_sum / total_sum

    # Perplexity is the exponential of the mean cross-entropy over batches
    average_loss = loss_sum / len(val_loader)
    average_perplexity = torch.exp(torch.tensor(average_loss))

    print(f'Test Accuracy: {acc:.2f}%')
    print(f'Average Loss: {average_loss:.2f}')
    print(f'Average Perplexity: {average_perplexity:.2f}')

### Training and evaluating the base model (without LoRA)

In [23]:
# Cross Entropy
criterion = nn.CrossEntropyLoss()

# Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr)

model.to(device)
print(model)

base_parameters = count_parameters(model)
print()
print(f'Base model parameters = {base_parameters}')

BengioModel(
  (embeddings): Embedding(15001, 64)
  (linear1): Linear(in_features=576, out_features=200, bias=True)
  (tanh): Tanh()
  (dropout): Dropout(p=0.2, inplace=False)
  (linear2): Linear(in_features=200, out_features=15001, bias=True)
)

Base model parameters = 4090665
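
The reported parameter count can be reproduced by hand from the layer shapes in the printout above. This is a small arithmetic sketch; the helper `linear_params` is a hypothetical name, and the sizes are read from the printed model.

```python
def linear_params(n_in, n_out, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

vocab_rows = 15001                      # rows of the embedding table in the printout
embedding_dim, context_size, hidden_dim = 64, 9, 200

embedding = vocab_rows * embedding_dim                               # 960,064
linear1 = linear_params(context_size * embedding_dim, hidden_dim)    # 115,400
linear2 = linear_params(hidden_dim, vocab_rows)                      # 3,015,201
total = embedding + linear1 + linear2

print(f"{total:,}")
```

The output layer alone holds roughly three quarters of the parameters, because its size grows with the vocabulary.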


In [24]:
print("Base Model - No LoRA")
print()
print("Initial Evaluation")
print()
init_eval(model, pretrain_train_loader)
print()
print("Training the Model")
print()
train(model, pretrain_epochs, pretrain_train_loader)
print()
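print("Evaluation on the Validation Dataset")
# Note: pretrain_train_loader is passed here, so the figures reported below are
# measured on the pretraining training split, not on pretrain_val_loader.
eval(model, pretrain_train_loader)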

Base Model - No LoRA

Initial Evaluation

100%|██████████| 3766/3766 [00:07<00:00, 537.84it/s]

Initial Loss: 9.6609
Initial Perplexity: 15691.1396

Training the Model

Epoch [1/10], Epoch Time: 15.70, Loss: 6.8787, Accuracy: 0.08%, Perplexity: 971.4037
Epoch [2/10], Epoch Time: 15.25, Loss: 6.2056, Accuracy: 0.09%, Perplexity: 495.5381
Epoch [3/10], Epoch Time: 14.74, Loss: 5.9914, Accuracy: 0.10%, Perplexity: 399.9790
Epoch [4/10], Epoch Time: 15.03, Loss: 5.8389, Accuracy: 0.11%, Perplexity: 343.4047
Epoch [5/10], Epoch Time: 14.91, Loss: 5.7130, Accuracy: 0.11%, Perplexity: 302.7672
Epoch [6/10], Epoch Time: 14.85, Loss: 5.6041, Accuracy: 0.12%, Perplexity: 271.5383
Epoch [7/10], Epoch Time: 14.78, Loss: 5.5037, Accuracy: 0.12%, Perplexity: 245.6085
Epoch [8/10], Epoch Time: 14.67, Loss: 5.4102, Accuracy: 0.13%, Perplexity: 223.6745
Epoch [9/10], Epoch Time: 14.51, Loss: 5.3231, Accuracy: 0.13%, Perplexity: 205.0199
Epoch [10/10], Epoch Time: 14.71, Loss: 5.2367, Accuracy: 0.13%, Perplexity: 188.0536

Average Times per Epoch: 14.91, Forward Pass: 1.70, Backward Pass: 13.21

Evaluation on the Validation Dataset

100%|██████████| 3766/3766 [00:08<00:00, 461.65it/s]

Test Accuracy: 15.68%
Average Loss: 4.98
Average Perplexity: 146.15


### Enabling LoRA and *fine-tuning* the model on the fine-tuning dataset

In [25]:
model.enable_LoRA()
print(model)

lora_parameters = count_parameters(model)

print()
print(f'LoRA model parameters = {lora_parameters}')
print(f'Percentage of base = {100 * lora_parameters/base_parameters:.2f}%')

Freezing Embeddings
Freezing Layer 1
Freezing Layer 2
Unfreezing LoRA parameters
BengioModel(
  (embeddings): Embedding(15001, 64)
  (linear1): Linear(in_features=576, out_features=200, bias=True)
  (tanh): Tanh()
  (dropout): Dropout(p=0.2, inplace=False)
  (linear2): Linear(in_features=200, out_features=15001, bias=True)
)

LoRA model parameters = 46243
Percentage of base = 1.13%
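
The figure of 46,243 trainable parameters can be broken down by hand. The sketch below assumes rank 1 and the layer sizes from the printed model; the helper `lora_params` is a hypothetical name. It also shows that part of the count comes from the two linear biases, which `enable_LoRA()` leaves trainable.

```python
def lora_params(n_in, n_out, rank):
    """Parameters of one LoRA pair: B is (n_in x rank), A is (rank x n_out)."""
    return rank * (n_in + n_out)

r = 1
vocab_rows, embedding_dim, context_size, hidden_dim = 15001, 64, 9, 200

lora = (lora_params(vocab_rows, embedding_dim, r)                     # embeddings
        + lora_params(context_size * embedding_dim, hidden_dim, r)    # linear1
        + lora_params(hidden_dim, vocab_rows, r))                     # linear2
biases = hidden_dim + vocab_rows        # linear1.bias and linear2.bias stay trainable
trainable = lora + biases
share = 100 * trainable / 4090665       # percentage of the base model

print(f"LoRA matrices: {lora}, biases: {biases}, total: {trainable}, share: {share:.2f}%")
```

About a third of the trainable parameters during fine-tuning are therefore biases and not LoRA weights.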


In [26]:
print("Base Model - LoRA enabled")
print("Train the model with the fine-tune dataset")
print()
train(model, finetune_epochs, finetune_train_loader)
print()
print("Evaluation on the Validation Dataset")
eval(model, finetune_val_loader)

Base Model - LoRA enabled
Train the model with the fine-tune dataset

Epoch [1/50], Epoch Time: 2.54, Loss: 6.4791, Accuracy: 0.09%, Perplexity: 651.3954
Epoch [2/50], Epoch Time: 2.46, Loss: 6.1002, Accuracy: 0.11%, Perplexity: 445.9691
Epoch [3/50], Epoch Time: 2.42, Loss: 6.0531, Accuracy: 0.11%, Perplexity: 425.4445
Epoch [4/50], Epoch Time: 2.50, Loss: 6.0224, Accuracy: 0.11%, Perplexity: 412.5819
Epoch [5/50], Epoch Time: 2.46, Loss: 5.9969, Accuracy: 0.11%, Perplexity: 402.1859
Epoch [6/50], Epoch Time: 2.40, Loss: 5.9689, Accuracy: 0.11%, Perplexity: 391.0882
Epoch [7/50], Epoch Time: 2.41, Loss: 5.9511, Accuracy: 0.12%, Perplexity: 384.1693
Epoch [8/50], Epoch Time: 2.37, Loss: 5.9436, Accuracy: 0.12%, Perplexity: 381.3236
Epoch [9/50], Epoch Time: 2.39, Loss: 5.9367, Accuracy: 0.11%, Perplexity: 378.6850
Epoch [10/50], Epoch Time: 2.40, Loss: 5.9209, Accuracy: 0.12%, Perplexity: 372.7441
Epoch [11/50], Epoch Time: 2.36, Loss: 5.9219, Accuracy: 0.12%, Perplexity: 373.1026
Epoch [12/50], Epoch Time: 2.42, Loss: 5.9086, Accuracy: 0.11%, Perplexity: 368.1720
Epoch [13/50], Epoch Time: 2.38, Loss: 5.9004, Accuracy: 0.12%, Perplexity: 365.2015
Epoch [14/50], Epoch Time: 2.37, Loss: 5.8870, Accuracy: 0.12%, Perplexity: 360.3291
Epoch [15/50], Epoch Time: 2.38, Loss: 5.8816, Accuracy: 0.12%, Perplexity: 358.3825
Epoch [16/50], Epoch Time: 2.34, Loss: 5.8682, Accuracy: 0.12%, Perplexity: 353.6273
Epoch [17/50], Epoch Time: 2.39, Loss: 5.8532, Accuracy: 0.12%, Perplexity: 348.3333
Epoch [18/50], Epoch Time: 2.38, Loss: 5.8434, Accuracy: 0.12%, Perplexity: 344.9329
Epoch [19/50], Epoch Time: 2.35, Loss: 5.8410, Accuracy: 0.12%, Perplexity: 344.1245
Epoch [20/50], Epoch Time: 2.39, Loss: 5.8326, Accuracy: 0.12%, Perplexity: 341.2381
Epoch [21/50], Epoch Time: 2.36, Loss: 5.8263, Accuracy: 0.12%, Perplexity: 339.1054
Epoch [22/50], Epoch Time: 2.33, Loss: 5.8235, Accuracy: 0.12%, Perplexity: 338.1449
Epoch [23/50], Epoch Time: 2.38, Loss: 5.8141, Accuracy: 0.12%, Perplexity: 334.9956
Epoch [24/50], Epoch Time: 2.38, Loss: 5.8082, Accuracy: 0.12%, Perplexity: 333.0078
Epoch [25/50], Epoch Time: 2.37, Loss: 5.8071, Accuracy: 0.12%, Perplexity: 332.6433
Epoch [26/50], Epoch Time: 2.39, Loss: 5.7899, Accuracy: 0.12%, Perplexity: 326.9748
Epoch [27/50], Epoch Time: 2.35, Loss: 5.7920, Accuracy: 0.12%, Perplexity: 327.6667
Epoch [28/50], Epoch Time: 2.38, Loss: 5.7719, Accuracy: 0.12%, Perplexity: 321.1365
Epoch [29/50], Epoch Time: 2.38, Loss: 5.7678, Accuracy: 0.12%, Perplexity: 319.8279
Epoch [30/50], Epoch Time: 2.37, Loss: 5.7545, Accuracy: 0.12%, Perplexity: 315.6187
Epoch [31/50], Epoch Time: 2.34, Loss: 5.7519, Accuracy: 0.12%, Perplexity: 314.8011
Epoch [32/50], Epoch Time: 2.36, Loss: 5.7446, Accuracy: 0.12%, Perplexity: 312.4868
Epoch [33/50], Epoch Time: 2.39, Loss: 5.7402, Accuracy: 0.12%, Perplexity: 311.1296
Epoch [34/50], Epoch Time: 2.36, Loss: 5.7440, Accuracy: 0.12%, Perplexity: 312.2980
Epoch [35/50], Epoch Time: 2.41, Loss: 5.7266, Accuracy: 0.12%, Perplexity: 306.9321
Epoch [36/50], Epoch Time: 2.40, Loss: 5.7211, Accuracy: 0.12%, Perplexity: 305.2431
Epoch [37/50], Epoch Time: 2.38, Loss: 5.7184, Accuracy: 0.13%, Perplexity: 304.4091
Epoch [38/50], Epoch Time: 2.35, Loss: 5.7091, Accuracy: 0.13%, Perplexity: 301.6022
Epoch [39/50], Epoch Time: 2.34, Loss: 5.7032, Accuracy: 0.12%, Perplexity: 299.8280
Epoch [40/50], Epoch Time: 2.36, Loss: 5.6995, Accuracy: 0.13%, Perplexity: 298.7159
Epoch [41/50], Epoch Time: 2.41, Loss: 5.6955, Accuracy: 0.13%, Perplexity: 297.5109
Epoch [42/50], Epoch Time: 2.38, Loss: 5.6835, Accuracy: 0.13%, Perplexity: 293.9886
Epoch [43/50], Epoch Time: 2.40, Loss: 5.6907, Accuracy: 0.12%, Perplexity: 296.0911
Epoch [44/50], Epoch Time: 2.34, Loss: 5.6818, Accuracy: 0.13%, Perplexity: 293.4756
Epoch [45/50], Epoch Time: 2.34, Loss: 5.6808, Accuracy: 0.12%, Perplexity: 293.1930
Epoch [46/50], Epoch Time: 2.35, Loss: 5.6642, Accuracy: 0.13%, Perplexity: 288.3504
Epoch [47/50], Epoch Time: 2.36, Loss: 5.6628, Accuracy: 0.13%, Perplexity: 287.9425
Epoch [48/50], Epoch Time: 2.39, Loss: 5.6561, Accuracy: 0.13%, Perplexity: 286.0380
Epoch [49/50], Epoch Time: 2.46, Loss: 5.6593, Accuracy: 0.12%, Perplexity: 286.9468
Epoch [50/50], Epoch Time: 2.43, Loss: 5.6547, Accuracy: 0.13%, Perplexity: 285.6438

Average Times per Epoch: 2.39, Forward Pass: 0.52, Backward Pass: 1.86

Evaluation on the Validation Dataset

100%|██████████| 137/137 [00:00<00:00, 304.30it/s]

Test Accuracy: 12.06%
Average Loss: 5.76
Average Perplexity: 316.01

### Disabling LoRA and evaluating the model with the new weights

In [27]:
model.apply_LoRA_weights()
model.disable_LoRA()
print(model)
count_parameters(model)

Unfreezing Embeddings
Unfreezing Layer 1
Unfreezing Layer 2
Freezing LoRA parameters
BengioModel(
  (embeddings): Embedding(15001, 64)
  (linear1): Linear(in_features=576, out_features=200, bias=True)
  (tanh): Tanh()
  (dropout): Dropout(p=0.2, inplace=False)
  (linear2): Linear(in_features=200, out_features=15001, bias=True)
)


4090665
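
The merge step relies on the fact that adding the scaled product `B @ A` to the frozen weight gives the same outputs as keeping the LoRA branch separate. Below is a minimal check on a single toy linear layer; all sizes, the `scaling` value and the random "trained" matrices are assumptions for the example, not values from this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_in, n_out, r, scaling = 6, 4, 2, 0.5   # toy values
layer = nn.Linear(n_in, n_out)
B = torch.randn(n_in, r)                 # stand-ins for trained LoRA matrices
A = torch.randn(r, n_out)
x = torch.randn(3, n_in)

with torch.no_grad():
    # Separate branch, as in forward() with LoRA enabled.
    before = layer(x) + (x @ (B @ A)) * scaling
    # Fold the update into the weight; nn.Linear stores it as (n_out, n_in).
    layer.weight += (B @ A).T * scaling
    # Plain forward pass with the merged weight.
    after = layer(x)

print(torch.allclose(before, after, atol=1e-5))
```

With dropout switched off (evaluation mode), the merged model should therefore reproduce the LoRA-enabled model up to floating-point error when both are evaluated on the same data.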

In [28]:
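print("Base Model - Weights = W + LoRA")
print("Evaluation on the Validation Dataset")
# Note: this evaluates the merged model on pretrain_val_loader (the José de Alencar
# validation split), not on finetune_val_loader.
eval(model, pretrain_val_loader)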

Base Model - Weights = W + LoRA
Evaluation on the Validation Dataset


100%|██████████| 967/967 [00:01<00:00, 513.01it/s]

Test Accuracy: 9.69%
Average Loss: 6.10
Average Perplexity: 444.83



