## Exercise: LoRA

- A didactic exercise for understanding how to fine-tune large models using few resources.
- Apply it to the sentiment-analysis pre-exercise or to the second exercise (the language model), with a 3000-word vocabulary, an embedding of a chosen size and 2 layers, trained the usual way (measure the training time per epoch).
- Modify your model to use LoRA on the embedding and on the 2 layers, then fine-tune it, i.e., continue the previous training. Remember that the original matrices stay frozen and weight updates are applied only to the LoRA matrices. Measure the training time per epoch.
- Finally, replace the original model's weights with the new weights computed as W + LoRA.
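A compact statement of what the exercise asks for, written with the same shape convention the `BengioModel` code uses further down (the notation itself is an assumption, not part of the original statement): each frozen weight receives a trainable low-rank update scaled by $\alpha / r$, and only $A$ and $B$ are optimized.

```latex
\begin{aligned}
\text{linear layer:}\quad & y = x W^{\top} + b + \frac{\alpha}{r}\, x\, B A,
  && W \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}},\;
     B \in \mathbb{R}^{d_{\text{in}} \times r},\;
     A \in \mathbb{R}^{r \times d_{\text{out}}} \\
\text{embedding:}\quad & e_i = E_i + \frac{\alpha}{r}\, (B A)_i,
  && E \in \mathbb{R}^{(V+1) \times d},\;
     B \in \mathbb{R}^{(V+1) \times r},\;
     A \in \mathbb{R}^{r \times d} \\
\text{merge:}\quad & W' = W + \frac{\alpha}{r}\, (B A)^{\top}, \qquad E' = E + \frac{\alpha}{r}\, B A
\end{aligned}
```

With $B$ initialized to zero the update starts at zero, so fine-tuning begins exactly from the base model. A rank-$r$ update of a $d_{\text{in}} \times d_{\text{out}}$ matrix has $r(d_{\text{in}} + d_{\text{out}})$ trainable entries instead of $d_{\text{in}} d_{\text{out}}$.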

In [1]:
import os
import sys
import random
import time
import re
import math
from tqdm import tqdm
from collections import Counter
from sklearn.model_selection import train_test_split

# Pytorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

import warnings
warnings.filterwarnings("ignore")

In [2]:
# Global variables
# Vocabulary
vocab_size = 3000
context_size = 9
pattern = r'\w+|[,;.:!?\']'

# Training
batch_size = 32
lr = 0.05
base_epochs = 10
lora_epochs = 10

# Model
embedding_dim = 64
hidden_dim = 200
dropout_rate = 0.2

# LoRA parameters
lora_r = 1         # Rank r of the low-rank update
lora_alpha = 1     # LoRA alpha; the update is scaled by alpha / r
lora_scaling = lora_alpha / lora_r

In [3]:
# Colab environment
IN_COLAB = 'google.colab' in sys.modules

if (IN_COLAB):
    %pip install colorama

    # Google Drive
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)

    project_folder="/content/drive/MyDrive/Classes/IA024/Aula_2_3"
    os.chdir(project_folder)
    !ls -la

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## Download and load the dataset

In [4]:
# Check if download is necessary
if not os.path.exists("67724.txt.utf-8"):
    print("Downloading Gutenberg texts")

    !wget https://www.gutenberg.org/ebooks/67724.txt.utf-8
    !wget https://www.gutenberg.org/ebooks/67725.txt.utf-8

### Cleaning the main text

In [5]:
text_1 = open("67724.txt.utf-8", "r", encoding="utf-8").read()
text_2 = open("67725.txt.utf-8", "r", encoding="utf-8").read()

def clean_text(text):
    start_marker = "*** START OF THE PROJECT GUTENBERG EBOOK"
    end_marker = "*** END OF THE PROJECT GUTENBERG EBOOK"
    text_start = text.find(start_marker)
    text_end = text.find(end_marker)

    text_content = text[text_start:text_end].replace('\r', '')
    paragraphs = []
    for paragraph in text_content.split("\n\n"):
        paragraph = paragraph.replace('\n', ' ').strip()
        # Skip very short paragraphs and table-of-contents lines (dot leaders "....")
        if (len(paragraph) > 10 and '....' not in paragraph):
            paragraphs.append(paragraph)
    return paragraphs

cleaned_paragraphs = clean_text(text_1)+clean_text(text_2)
print(f'Number of paragraphs: {len(cleaned_paragraphs)}')

Number of paragraphs: 4596
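A toy run of the same slicing and filtering logic, on a made-up string rather than a real Gutenberg file. Because `text[text_start:text_end]` starts *at* the start marker, the marker line itself ends up in `text_content` and, when it forms its own paragraph longer than 10 characters, passes the filter. The `skip_marker_line` option is a hypothetical adjustment, not what the notebook ran; it starts right after that line instead.

```python
def split_paragraphs(text, skip_marker_line=False):
    start_marker = "*** START OF THE PROJECT GUTENBERG EBOOK"
    end_marker = "*** END OF THE PROJECT GUTENBERG EBOOK"
    start = text.find(start_marker)
    if skip_marker_line:
        # Hypothetical variant: begin right after the line that holds the marker
        start = text.find("\n", start) + 1
    end = text.find(end_marker)
    content = text[start:end].replace("\r", "")
    paragraphs = []
    for paragraph in content.split("\n\n"):
        paragraph = paragraph.replace("\n", " ").strip()
        # Same filter as clean_text: drop short paragraphs and "...." index lines
        if len(paragraph) > 10 and "...." not in paragraph:
            paragraphs.append(paragraph)
    return paragraphs


toy = (
    "Header text\r\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXEMPLO ***\r\n\r\n"
    "Capitulo I ........ 5\r\n\r\n"
    "Um paragrafo de exemplo,\r\nquebrado em duas linhas.\r\n\r\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXEMPLO ***\r\n"
)
print(split_paragraphs(toy))
print(split_paragraphs(toy, skip_marker_line=True))
```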


## Dataset analysis

In [6]:
# Count the words in the dataset
def count_words(texts):
    word_counts = Counter()
    for text in texts:
        word_counts.update(re.findall(pattern, text.lower()))
    return word_counts

word_counts = count_words(cleaned_paragraphs)

len(word_counts)

11875
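To see what `pattern` keeps, here is `re.findall` on a short sentence adapted from the paragraph decoded further below: `\w+` matches runs of Unicode word characters (accented letters and the underscore included), the bracket class keeps the listed punctuation as separate tokens, and everything else, such as the hyphen, is discarded. This is why `lança-se` splits into `lança` and `se`, while `_Paquequer_` stays a single token with its underscores.

```python
import re

pattern = r"\w+|[,;.:!?\']"  # same pattern as the global-variables cell

sentence = "Ahi, o _Paquequer_ lança-se rapido!"
tokens = re.findall(pattern, sentence.lower())
print(tokens)  # ['ahi', ',', 'o', '_paquequer_', 'lança', 'se', 'rapido', '!']
```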

## Building a vocabulary

In [7]:
most_frequent_words = [word for word, count in word_counts.most_common(vocab_size)]
vocab = {word: i for i, word in enumerate(most_frequent_words, 1)}

In [8]:
print(f"Most Frequent Words: {most_frequent_words[:10]}")
print(f"Vocabulary Size: {len(vocab)}")

Most Frequent Words: [',', '.', 'a', 'que', 'o', 'de', 'e', 'se', ';', 'um']
Vocabulary Size: 3000


#### Sentence encoding / decoding

In [9]:
def encode_sentence(sentence, vocab):
    return [vocab.get(word, 0) for word in re.findall(pattern, sentence.lower())]

def decode_sentence(encoded_sentence, vocab):
    words = []
    for index in encoded_sentence:
        word = next((word for word, code in vocab.items() if code == index), "<UNK>")
        words.append(word)

    return words

seq = cleaned_paragraphs[20]
spc = ' '
encoded = encode_sentence(seq, vocab)
decoded = decode_sentence(encoded, vocab)

print(f'Original Seq: {seq}')
print(f'Encoded: {encoded}')
print(f'Decoded: {decoded}')
print(f'Reconstructed Seq: {spc.join(decoded)}')

Original Seq: Ahi, o _Paquequer_ lança-se rapido sobre o seu leito, e atravessa as florestas como o tapir, espumando, deixando o pello esparso pelas pontas de rochedo, e enchendo a solidão com o estampido de sua carreira. De repente, falta-lhe o espaço, foge-lhe a terra; o soberbo rio recúa um momento para concentrar as suas forças e precipita-se de um só arremesso, como o tigre sobre a presa.
Encoded: [235, 1, 5, 723, 0, 8, 762, 38, 5, 19, 324, 1, 7, 0, 23, 634, 28, 5, 2447, 1, 0, 1, 763, 5, 1776, 0, 269, 1065, 6, 486, 1, 7, 2448, 3, 687, 16, 5, 2449, 6, 17, 1777, 2, 6, 240, 1, 522, 29, 5, 612, 1, 0, 29, 3, 128, 9, 5, 1577, 99, 0, 10, 72, 18, 0, 23, 87, 591, 7, 2450, 8, 6, 10, 74, 0, 1, 28, 5, 592, 38, 3, 979, 2]
Decoded: ['ahi', ',', 'o', '_paquequer_', '<UNK>', 'se', 'rapido', 'sobre', 'o', 'seu', 'leito', ',', 'e', '<UNK>', 'as', 'florestas', 'como', 'o', 'tapir', ',', '<UNK>', ',', 'deixando', 'o', 'pello', '<UNK>', 'pelas', 'pontas', 'de', 'rochedo', ',', 'e', 'enchendo', 'a', 's
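The same vocabulary logic on a toy corpus (the corpus and `vocab_size = 2` are illustrative only): indices start at 1, so 0 is free for out-of-vocabulary words. Decoding here uses an inverted dictionary, an alternative to the linear `next(...)` scan in `decode_sentence` that returns the same words with one dictionary lookup per index.

```python
import re
from collections import Counter

pattern = r"\w+|[,;.:!?\']"
corpus = ["o rio e o mar", "o rio corre"]
vocab_size = 2  # toy size; the notebook uses 3000

counts = Counter()
for text in corpus:
    counts.update(re.findall(pattern, text.lower()))

most_frequent = [word for word, _ in counts.most_common(vocab_size)]
vocab = {word: i for i, word in enumerate(most_frequent, 1)}  # 0 is reserved for unknown words
inv_vocab = {i: word for word, i in vocab.items()}            # index -> word

def encode(sentence):
    return [vocab.get(word, 0) for word in re.findall(pattern, sentence.lower())]

def decode(indices):
    return [inv_vocab.get(i, "<UNK>") for i in indices]

encoded = encode("O rio e o mar")
print(vocab)                      # {'o': 1, 'rio': 2}
print(encoded, decode(encoded))   # [1, 2, 0, 1, 0] ['o', 'rio', '<UNK>', 'o', '<UNK>']
```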

## Dataset class

In [10]:
# Dataset class
class BagOfWordsDataset(Dataset):
  def __init__(self, paragraphs, vocab, context):
    self.paragraphs = paragraphs
    self.vocab = vocab
    self.context = context
    self.tokens, self.targets = self.setup()

  def __len__(self):
    return len(self.tokens)

  def __getitem__(self, idx):
    return torch.tensor(self.tokens[idx]), torch.tensor(self.targets[idx])
  
  def setup(self):
    tokens = []
    targets = []
    for paragraph in self.paragraphs:
      encoded = encode_sentence(paragraph, self.vocab)
      
      # If paragraph is smaller than the context, skip it.
      if len(encoded) < self.context + 1:
          continue

      for i in range(len(encoded) - self.context):
        tks = encoded[i:i+self.context]
        tgt = encoded[i+self.context]
        # Only add if there are no unknown tokens in both context and target.
        bad_token = 0
        if not (bad_token in tks or tgt == bad_token):
          tokens.append(tks)
          targets.append(tgt)
    return tokens, targets
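`setup()` turns each encoded paragraph into (context, target) pairs with a sliding window and drops any pair that contains index 0. The function below reproduces that loop standalone on a hypothetical encoded paragraph with `context = 3`, to make the windowing concrete.

```python
def make_windows(encoded, context, unk=0):
    """Sliding-window (context, target) pairs, skipping any window that touches `unk`."""
    tokens, targets = [], []
    if len(encoded) < context + 1:
        return tokens, targets
    for i in range(len(encoded) - context):
        window = encoded[i:i + context]
        target = encoded[i + context]
        if unk not in window and target != unk:
            tokens.append(window)
            targets.append(target)
    return tokens, targets


encoded = [5, 8, 2, 0, 7, 3, 9, 4]
print(make_windows(encoded, context=3))  # ([[7, 3, 9]], [4])
```

Four of the five possible windows are discarded because of a single unknown token: one index 0 removes up to `context + 1` training samples.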


In [11]:
# Train/Validation split
train_data, val_data = train_test_split(cleaned_paragraphs, test_size=0.2, random_state=18)

train_dataset = BagOfWordsDataset(train_data, vocab, context_size)
val_dataset = BagOfWordsDataset(val_data, vocab, context_size)

# Counting all Samples
print(f"Training samples: {len(train_data)}")
print(f"Validation samples: {len(val_data)}")
print()
print(f"Training dataset samples: {len(train_dataset)}")
print(f"Validation dataset samples: {len(val_dataset)}")

Training samples: 3676
Validation samples: 920

Training dataset samples: 24360
Validation dataset samples: 5851


In [12]:
tst_loader = DataLoader(train_dataset, batch_size = 1, shuffle=True)
sample = next(iter(tst_loader))
print(sample)

[tensor([[   4,   52,  852,  122,    3, 1493,   11, 1869,  868]]), tensor([7])]


In [13]:
# Train/val loaders
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=True)

## Model (modified to enable or disable *low-rank adaptation*)
When LoRA is enabled, the weights of the three base-model layers (embedding, linear_1 and linear_2) are frozen and only the LoRA matrices A and B are trained; the biases of linear_1 and linear_2 are not frozen. When LoRA is disabled, the three layers become trainable again and the LoRA matrices are frozen. The freezing logic lives in *enable_LoRA()* and *disable_LoRA()*, while *forward()* adds the scaled LoRA terms only when LoRA is enabled.
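A minimal sketch of the same freeze/unfreeze idea as a standalone wrapper around a single `nn.Linear`, following the notebook's shape convention (B is `in × r` and starts at zero, A is `r × out` and random). `LoRALinear` is a hypothetical helper, not part of the notebook. Unlike `enable_LoRA`, it also freezes the bias, so exactly `r * (in + out)` parameters train, and because B starts at zero the wrapped layer initially reproduces the base layer.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical wrapper: frozen nn.Linear plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 1, alpha: float = 1.0):
        super().__init__()
        self.base = base
        self.scaling = alpha / r
        # Freeze every base parameter (weight and bias)
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Notebook convention: delta = B @ A has shape (in_features, out_features)
        self.lora_B = nn.Parameter(torch.zeros(base.in_features, r))
        self.lora_A = nn.Parameter(torch.randn(r, base.out_features))

    def forward(self, x):
        # Evaluated left to right, i.e. (x @ B) @ A
        return self.base(x) + (x @ self.lora_B @ self.lora_A) * self.scaling


torch.manual_seed(0)
base = nn.Linear(576, 200)
layer = LoRALinear(base, r=1)
x = torch.randn(4, 576)

trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)                          # 776 = 1 * (576 + 200)
print(torch.allclose(layer(x), base(x)))  # True, because B starts at zero
```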

In [14]:
class BengioModel(torch.nn.Module):
    def __init__(self):
        super(BengioModel, self).__init__()
        self.LoRA_enabled = False # Default
        self.vocab_size = vocab_size

        # LoRA parameters
        self.lora_alpha = lora_alpha
        self.lora_r = lora_r
        self.scaling = lora_scaling
        
        # Embeddings layer
        self.embeddings = nn.Embedding(vocab_size+1, embedding_dim)
        # First Linear Layer
        self.linear1 = nn.Linear(context_size * embedding_dim, hidden_dim, bias=True)
        # Activation and Dropout
        self.tanh = torch.nn.Tanh()
        self.dropout = torch.nn.Dropout(dropout_rate)
        # Second Linear Layer
        self.linear2 = nn.Linear(hidden_dim, vocab_size+1, bias=True)

        # LoRA Matrixes
        # LoRA on embeddings layer
        self.embeddings_lora_B = nn.Parameter(torch.zeros(vocab_size+1, self.lora_r), requires_grad=False)
        self.embeddings_lora_A = nn.Parameter(torch.randn(self.lora_r, embedding_dim), requires_grad=False)
        # LoRA on the first linear layer
        self.linear1_lora_B = nn.Parameter(torch.zeros(context_size*embedding_dim, self.lora_r), requires_grad=False)
        self.linear1_lora_A = nn.Parameter(torch.randn(self.lora_r, hidden_dim), requires_grad=False)
        # LoRA on the second linear layer
        self.linear2_lora_B = nn.Parameter(torch.zeros(hidden_dim, self.lora_r), requires_grad=False)
        self.linear2_lora_A = nn.Parameter(torch.randn(self.lora_r, vocab_size+1), requires_grad=False)

    def forward(self, inputs):
        # Embeddings
        embeds = self.embeddings(inputs)
        if (self.LoRA_enabled):
            one_hot = F.one_hot(inputs, self.vocab_size+1).to(torch.float32)
            embeddings_LoRA = one_hot @ (self.embeddings_lora_B @ self.embeddings_lora_A)
            embeddings_LoRA = embeddings_LoRA * self.scaling
            embeds = embeds + embeddings_LoRA

        # Flatten embeddings
        embeds = embeds.view(embeds.size(0), -1)
        
        # First linear layer
        out = self.linear1(embeds)
        if (self.LoRA_enabled):
            linear1_lora_out = embeds @ (self.linear1_lora_B @ self.linear1_lora_A)
            linear1_lora_out = linear1_lora_out * self.scaling
            out = out + linear1_lora_out
        
        activation = self.tanh(out)
        activation = self.dropout(activation)

        # Second linear layer
        out = self.linear2(activation)
        if (self.LoRA_enabled):
            linear2_lora_out = activation @ (self.linear2_lora_B @ self.linear2_lora_A)
            linear2_lora_out = linear2_lora_out * self.scaling
            out = out + linear2_lora_out

        return out
    
    def enable_LoRA(self):
        self.LoRA_enabled = True
        # Freeze base model weights (the biases of linear1/linear2 stay trainable)
        print("Freezing Embeddings")
        self.embeddings.weight.requires_grad = False
        print("Freezing Layer 1")
        self.linear1.weight.requires_grad = False
        print("Freezing Layer 2")
        self.linear2.weight.requires_grad = False
        # Unfreeze LoRA parameters
        print("Unfreezing LoRA parameters")
        self.embeddings_lora_A.requires_grad = True
        self.embeddings_lora_B.requires_grad = True
        self.linear1_lora_A.requires_grad = True
        self.linear1_lora_B.requires_grad = True
        self.linear2_lora_A.requires_grad = True
        self.linear2_lora_B.requires_grad = True

    def disable_LoRA(self):
        self.LoRA_enabled = False
        print("Unfreezing Embeddings")
        self.embeddings.weight.requires_grad = True
        print("Unfreezing Layer 1")
        self.linear1.weight.requires_grad = True
        print("Unfreezing Layer 2")
        self.linear2.weight.requires_grad = True
        print("Freezing LoRA parameters")
        self.embeddings_lora_A.requires_grad = False
        self.embeddings_lora_B.requires_grad = False
        self.linear1_lora_A.requires_grad = False
        self.linear1_lora_B.requires_grad = False
        self.linear2_lora_A.requires_grad = False
        self.linear2_lora_B.requires_grad = False

    
    def apply_LoRA_weights(self):
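        # Merge the scaled LoRA update into the base weights (W <- W + scaling * BA).
        # nn.Linear stores its weight as (out, in), hence the transpose for the linear layers.
        # The LoRA matrices are not reset, so re-enabling LoRA afterwards would add the update twice.
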
        lora_embeddings_weights = (self.embeddings_lora_B @ self.embeddings_lora_A) * self.scaling
        lora_linear1_weights = (self.linear1_lora_B @ self.linear1_lora_A).transpose(0, 1) * self.scaling
        lora_linear2_weights = (self.linear2_lora_B @ self.linear2_lora_A).transpose(0, 1) * self.scaling
        self.embeddings.weight.data += lora_embeddings_weights
        self.linear1.weight.data += lora_linear1_weights
        self.linear2.weight.data += lora_linear2_weights
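In `forward`, the embedding update is computed as `one_hot @ (B @ A)`, which builds a `(batch, context, vocab_size + 1)` one-hot tensor. Multiplying by a one-hot row only selects a row, so indexing B directly with `F.embedding(inputs, B) @ A` should give the same result without the one-hot tensor. The check below uses random tensors with the notebook's shapes (an illustration, not code from the notebook).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, embedding_dim, r = 3000, 64, 1
B = torch.randn(vocab_size + 1, r)   # random here only so the comparison is non-trivial
A = torch.randn(r, embedding_dim)
inputs = torch.randint(0, vocab_size + 1, (32, 9))  # a batch shaped like train_loader's

# Notebook formulation: one-hot rows times the full (vocab+1, dim) update
one_hot = F.one_hot(inputs, vocab_size + 1).to(torch.float32)
via_one_hot = one_hot @ (B @ A)

# Row lookup: pick each token's row of B, then project with A
via_lookup = F.embedding(inputs, B) @ A

print(via_one_hot.shape, torch.allclose(via_one_hot, via_lookup, atol=1e-5))
```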

In [15]:
model = BengioModel()

#### Basic model test

In [16]:
sample = next(iter(train_loader))
input = sample[0]
target = sample[1]

print(input.shape)
print(target.shape)

output = model(input)
pred = output.argmax(dim=1)

print(pred)
print(target)

torch.Size([32, 9])
torch.Size([32])
tensor([1070,  284, 1289, 1302, 1778, 1751, 1253, 1855,  365, 2701,  801, 2274,
         610, 1739,  732,  720, 1156, 2264,  997, 1212, 1088, 1770, 1111, 1946,
        2015,  118,  717, 2730,  633,  863, 1518, 1828])
tensor([2127,  230,    6,  132,    4,    3,  160,   53,   21,   64,   25,  253,
          93,    8,    7, 1571,    2,    5,    5,  513,   36, 1092,   11, 1670,
          18,    2,    5,  442,   45,    3,  134,    2])


## Training and Evaluation

### Model Training and Evaluation Functions

#### Function for Counting Model Parameters

In [17]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example usage:
total_params = count_parameters(model)
print(f'The model has a total of {total_params:,} trainable parameters.')

The model has a total of 910,665 trainable parameters.
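The 910,665 figure can be checked by hand from the layer shapes printed by `print(model)`: embedding 3001×64, `linear1` 576→200 with bias, `linear2` 200→3001 with bias. The LoRA matrices are not counted because they are created with `requires_grad=False`, and `count_parameters` only counts trainable tensors.

```python
vocab_size, context_size, embedding_dim, hidden_dim = 3000, 9, 64, 200

embedding = (vocab_size + 1) * embedding_dim                        # 192,064
linear1 = context_size * embedding_dim * hidden_dim + hidden_dim    # 115,400 (weight + bias)
linear2 = hidden_dim * (vocab_size + 1) + (vocab_size + 1)          # 603,201 (weight + bias)
total = embedding + linear1 + linear2
print(f"{total:,}")  # 910,665
```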


#### Function for the Initial Model Evaluation

In [18]:
def init_eval(model):
    # Initial Perplexity and Loss
    # Before training
    model.eval()

    loss = 0
    perp = 0

    with torch.no_grad():
        for inputs, targets in tqdm(train_loader):
            inputs = inputs.to(device)
            targets = targets.to(device)
            outputs = model(inputs)
            loss += criterion(outputs, targets).item()

    loss /= len(train_loader)
    perp = torch.exp(torch.tensor(loss))

    print(f'Initial Loss: {loss:.4f}')
    print(f'Initial Perplexity: {perp:.4f}')
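A useful sanity check for the initial evaluation: a model that spreads probability uniformly over the `vocab_size + 1 = 3001` output classes has cross-entropy ln(3001) ≈ 8.007 and perplexity exactly 3001. The untrained model's initial loss reported below (8.0416, perplexity ≈ 3108) sits just above that uniform baseline, which is what one would expect from small random initial weights. The snippet only reproduces this arithmetic.

```python
import math

num_classes = 3000 + 1                  # vocab_size + 1 output logits
uniform_loss = math.log(num_classes)    # cross-entropy of a uniform prediction
uniform_perplexity = math.exp(uniform_loss)
print(round(uniform_loss, 4), round(uniform_perplexity, 1))

# Perplexity is exp(mean cross-entropy), as computed in init_eval
print(round(math.exp(8.0416), 1))       # using the initial loss reported below
```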

#### Function for Training the Model

In [19]:
def train(model, epochs):
      # Training Loop
      model.train()

      # Overall training time stats
      epoch_time_total = 0
      epoch_time_fwd_total = 0
      
      for epoch in range(epochs):

            epoch_start = time.time()
            # Metrics
            epoch_loss = 0
            epoch_correct = 0
            epoch_samples = 0
            
            # Training times
            forward_time = 0

            for inputs, targets in tqdm(train_loader):
                  inputs = inputs.to(device)  # Move input data to the device
                  targets = targets.to(device)

                  # Forward pass
                  forward_start = time.time()
                  outputs = model(inputs)
                  forward_time += (time.time() - forward_start)

                  loss = criterion(outputs, targets)

                  # Backward pass and optimization
                  optimizer.zero_grad()
                  loss.backward()

                  optimizer.step()

                  # Loss
                  epoch_loss += loss.item()

                  # Predicted
                  predicted = outputs.argmax(dim=1)
                  epoch_correct += (predicted == targets).sum().item()
                  epoch_samples += targets.size(0)

            # Calculate average loss and accuracy for epoch
            avg_loss = epoch_loss / len(train_loader)
            acc = epoch_correct / epoch_samples  # fraction in [0, 1], although the log line below labels it with '%'

            # Perplexity
            perp = torch.exp(torch.tensor(avg_loss))

            epoch_end = time.time()
            epoch_time = epoch_end - epoch_start
            
            # Total training time
            epoch_time_total += epoch_time
            epoch_time_fwd_total += forward_time
            
            # Print epoch statistics
            print(f'Epoch [{epoch+1}/{epochs}], Epoch Time: {epoch_time:.2f}, Loss: {avg_loss:.4f}, Accuracy: {acc:.2f}%, Perplexity: {perp:.4f}')
      
      # Overall training average times
      epoch_time_avg = epoch_time_total / epochs
      epoch_time_fwd_avg = epoch_time_fwd_total / epochs
      epoch_time_bwd_avg = epoch_time_avg - epoch_time_fwd_avg  # everything except the forward: backward, optimizer step, data loading, metrics
      print()
      print(f'Average Times per Epoch: {epoch_time_avg:.2f}, Forward Pass: {epoch_time_fwd_avg:.2f}, Backward Pass: {epoch_time_bwd_avg:.2f}')
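A caveat on the timings printed by `train`: CUDA kernels execute asynchronously, so on a GPU `time.time()` around `model(inputs)` mostly measures kernel launches, and unfinished forward work gets billed to the next phase. Synchronizing before reading the clock gives a fairer split. A minimal sketch, assuming a helper named `timed` (hypothetical, not in the notebook):

```python
import time
import torch


def timed(fn, *args, device=None):
    """Run fn(*args) and return (result, seconds), waiting for queued CUDA work."""
    sync = device is not None and device.type == "cuda"
    if sync:
        torch.cuda.synchronize(device)   # finish earlier work before starting the clock
    start = time.perf_counter()
    result = fn(*args)
    if sync:
        torch.cuda.synchronize(device)   # wait until fn's kernels have actually run
    return result, time.perf_counter() - start


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
layer = torch.nn.Linear(576, 200).to(device)
x = torch.randn(32, 576, device=device)
out, seconds = timed(layer, x, device=device)
print(out.shape, f"{seconds * 1e3:.3f} ms")
```

Inside `train`, a call like `outputs, t = timed(model, inputs, device=device)` could replace the three forward-timing lines.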


#### Function for Evaluation on the Validation Set

In [20]:
def eval(model):
    model.eval()

    loss_sum = 0
    total_sum = 0
    correct_sum = 0
    eval_round = 0

    loss = 0
    perp = 0

    with torch.no_grad():
        for inputs, targets in tqdm(val_loader):
            inputs = inputs.to(device)
            targets = targets.to(device)

            outputs = model(inputs)
            loss = criterion(outputs, targets)      
            loss_sum += loss

            # Get the predicted labels
            predicted = outputs.argmax(dim=1)

            total_sum += targets.size(0)
            correct_sum += (predicted == targets).sum().item()
            eval_round += 1

    # Calculate accuracy
    acc = 100 * correct_sum / total_sum

    # Calculate average perplexity
    average_loss = loss_sum / len(val_loader)
    average_perplexity = torch.exp(average_loss)

    print(f'Test Accuracy: {acc:.2f}%')
    print(f'Average Loss: {average_loss:.2f}')
    print(f'Average Perplexity: {average_perplexity:.2f}')
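`eval` (like `init_eval` and `train`) averages per-batch mean losses, so the last, partial validation batch (5851 = 182 × 32 + 27 samples) weighs as much as a full one. The effect is small here, but a true per-sample mean can be obtained by summing with `reduction='sum'` and dividing by the number of samples. A check on random logits (illustrative data only):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(59, 3001)           # 59 = 32 + 27: one full and one partial batch
targets = torch.randint(0, 3001, (59,))
batches = [(logits[:32], targets[:32]), (logits[32:], targets[32:])]

mean_ce = nn.CrossEntropyLoss()                  # per-batch mean, as in eval
sum_ce = nn.CrossEntropyLoss(reduction="sum")    # per-batch sum

mean_of_batch_means = sum(mean_ce(o, t) for o, t in batches) / len(batches)
per_sample_mean = sum(sum_ce(o, t) for o, t in batches) / len(targets)

print(mean_of_batch_means.item(), per_sample_mean.item())
print(torch.allclose(per_sample_mean, mean_ce(logits, targets)))
```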

### Training and Evaluation of the base model (without LoRA)

In [21]:
# Cross Entropy
criterion = nn.CrossEntropyLoss()

# Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr)

model.to(device)
print(model)

base_parameters = count_parameters(model)
print()
print(f'Base model parameters = {base_parameters}')

BengioModel(
  (embeddings): Embedding(3001, 64)
  (linear1): Linear(in_features=576, out_features=200, bias=True)
  (tanh): Tanh()
  (dropout): Dropout(p=0.2, inplace=False)
  (linear2): Linear(in_features=200, out_features=3001, bias=True)
)

Base model parameters = 910665
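The optimizer is built once from `model.parameters()`, which already includes the still-frozen LoRA matrices. This also works for the later LoRA phase because `torch.optim.SGD` skips any parameter whose `.grad` is `None`: frozen tensors receive no gradient and are left untouched, and once `requires_grad` is switched on the same optimizer starts updating them. This assumes PyTorch 2.x, where `zero_grad()` sets gradients to `None` by default. A small check with two toy parameters:

```python
import torch

w_frozen = torch.nn.Parameter(torch.ones(3), requires_grad=False)
w_train = torch.nn.Parameter(torch.ones(3))
optimizer = torch.optim.SGD([w_frozen, w_train], lr=0.05)


def step():
    optimizer.zero_grad()
    loss = (w_frozen * w_train).sum()
    loss.backward()
    optimizer.step()


step()
print(w_frozen.data, w_train.data)   # only w_train moved (1.0 -> 0.95)

w_frozen.requires_grad_(True)        # "unfreeze", as enable_LoRA does for the LoRA matrices
step()
print(w_frozen.data, w_train.data)   # both are now updated by the same optimizer
```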


In [22]:
print("Base Model - No LoRA")
print()
print("Initial Evaluation")
print()
init_eval(model)
print()
print("Training the Model")
print()
train(model, base_epochs)
print()
print("Evaluation on the Validation Dataset")
eval(model)

Base Model - No LoRA

Initial Evaluation



100%|██████████| 762/762 [00:01<00:00, 571.53it/s]


Initial Loss: 8.0416
Initial Perplexity: 3107.6384

Training the Model



100%|██████████| 762/762 [00:02<00:00, 272.97it/s]


Epoch [1/10], Epoch Time: 2.79, Loss: 6.5736, Accuracy: 0.08%, Perplexity: 715.9240


100%|██████████| 762/762 [00:02<00:00, 282.00it/s]


Epoch [2/10], Epoch Time: 2.71, Loss: 5.6918, Accuracy: 0.11%, Perplexity: 296.4340


100%|██████████| 762/762 [00:03<00:00, 239.79it/s]


Epoch [3/10], Epoch Time: 3.18, Loss: 5.4045, Accuracy: 0.13%, Perplexity: 222.4151


100%|██████████| 762/762 [00:03<00:00, 250.90it/s]


Epoch [4/10], Epoch Time: 3.04, Loss: 5.1889, Accuracy: 0.14%, Perplexity: 179.2778


100%|██████████| 762/762 [00:04<00:00, 154.24it/s]


Epoch [5/10], Epoch Time: 4.95, Loss: 4.9947, Accuracy: 0.16%, Perplexity: 147.6358


100%|██████████| 762/762 [00:05<00:00, 147.48it/s]


Epoch [6/10], Epoch Time: 5.17, Loss: 4.8240, Accuracy: 0.17%, Perplexity: 124.4591


100%|██████████| 762/762 [00:04<00:00, 159.33it/s]


Epoch [7/10], Epoch Time: 4.79, Loss: 4.6572, Accuracy: 0.18%, Perplexity: 105.3412


100%|██████████| 762/762 [00:04<00:00, 176.32it/s]


Epoch [8/10], Epoch Time: 4.33, Loss: 4.4906, Accuracy: 0.20%, Perplexity: 89.1755


100%|██████████| 762/762 [00:03<00:00, 197.23it/s]


Epoch [9/10], Epoch Time: 3.87, Loss: 4.3426, Accuracy: 0.21%, Perplexity: 76.9094


100%|██████████| 762/762 [00:03<00:00, 193.21it/s]


Epoch [10/10], Epoch Time: 3.95, Loss: 4.1908, Accuracy: 0.23%, Perplexity: 66.0756

Average Times per Epoch: 3.88, Forward Pass: 0.43, Backward Pass: 3.44

Evaluation on the Validation Dataset


100%|██████████| 183/183 [00:00<00:00, 424.88it/s]

Test Accuracy: 10.39%
Average Loss: 5.64
Average Perplexity: 280.23





### Enabling LoRA and *fine-tuning* the model

In [23]:
model.enable_LoRA()
print(model)

lora_parameters = count_parameters(model)

print()
print(f'LoRA model parameters = {lora_parameters}')
print(f'Percentage of base = {lora_parameters/base_parameters:.2%}')

Freezing Embeddings
Freezing Layer 1
Freezing Layer 2
Unfreezing LoRA parameters
BengioModel(
  (embeddings): Embedding(3001, 64)
  (linear1): Linear(in_features=576, out_features=200, bias=True)
  (tanh): Tanh()
  (dropout): Dropout(p=0.2, inplace=False)
  (linear2): Linear(in_features=200, out_features=3001, bias=True)
)

LoRA model parameters = 10243
Percentage of base = 1.12%
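The 10,243 trainable parameters break down into the six LoRA matrices (r = 1) plus the two `nn.Linear` biases, which `enable_LoRA` leaves trainable because it only freezes the `.weight` tensors. The arithmetic:

```python
vocab_size, context_size, embedding_dim, hidden_dim, r = 3000, 9, 64, 200, 1

lora_embeddings = (vocab_size + 1) * r + r * embedding_dim          # B: 3001 x r, A: r x 64
lora_linear1 = (context_size * embedding_dim) * r + r * hidden_dim  # B: 576 x r, A: r x 200
lora_linear2 = hidden_dim * r + r * (vocab_size + 1)                # B: 200 x r, A: r x 3001
lora_total = lora_embeddings + lora_linear1 + lora_linear2          # 7,042
biases = hidden_dim + (vocab_size + 1)                              # 3,201, still trainable

print(lora_total, biases, lora_total + biases)
print(f"{(lora_total + biases) / 910_665:.2%} of the base model")
```

Freezing the biases as well (for example, also setting `linear1.bias.requires_grad = False`) would leave 7,042 trainable parameters, about 0.77% of the base model.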


In [24]:
print("Base Model - LoRA enabled")
print("Continuing to train the model")
print()
train(model, lora_epochs)
print()
print("Evaluation on the Validation Dataset")
eval(model)

Base Model - LoRA enabled
Continuing to train the model



100%|██████████| 762/762 [00:05<00:00, 138.50it/s]


Epoch [1/10], Epoch Time: 5.51, Loss: 4.2173, Accuracy: 0.23%, Perplexity: 67.8506


100%|██████████| 762/762 [00:05<00:00, 138.36it/s]


Epoch [2/10], Epoch Time: 5.51, Loss: 4.1437, Accuracy: 0.24%, Perplexity: 63.0328


100%|██████████| 762/762 [00:06<00:00, 110.57it/s]


Epoch [3/10], Epoch Time: 6.90, Loss: 4.1023, Accuracy: 0.24%, Perplexity: 60.4804


100%|██████████| 762/762 [00:07<00:00, 95.43it/s] 


Epoch [4/10], Epoch Time: 7.99, Loss: 4.0659, Accuracy: 0.25%, Perplexity: 58.3173


100%|██████████| 762/762 [00:07<00:00, 104.89it/s]


Epoch [5/10], Epoch Time: 7.27, Loss: 4.0398, Accuracy: 0.26%, Perplexity: 56.8131


100%|██████████| 762/762 [00:05<00:00, 128.28it/s]


Epoch [6/10], Epoch Time: 5.95, Loss: 4.0226, Accuracy: 0.26%, Perplexity: 55.8461


100%|██████████| 762/762 [00:06<00:00, 114.61it/s]


Epoch [7/10], Epoch Time: 6.65, Loss: 4.0090, Accuracy: 0.26%, Perplexity: 55.0909


100%|██████████| 762/762 [00:05<00:00, 129.08it/s]


Epoch [8/10], Epoch Time: 5.91, Loss: 3.9992, Accuracy: 0.26%, Perplexity: 54.5566


100%|██████████| 762/762 [00:06<00:00, 120.57it/s]


Epoch [9/10], Epoch Time: 6.33, Loss: 3.9969, Accuracy: 0.26%, Perplexity: 54.4313


100%|██████████| 762/762 [00:06<00:00, 126.49it/s]


Epoch [10/10], Epoch Time: 6.03, Loss: 3.9858, Accuracy: 0.27%, Perplexity: 53.8273

Average Times per Epoch: 6.41, Forward Pass: 1.41, Backward Pass: 5.00
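The LoRA epochs were slower here (6.41 s vs 3.88 s on average) even though far fewer parameters are trained. One plausible contributor, not measured in this notebook: `forward` evaluates `x @ (B @ A)`, which materializes the full-size update (200×3001 for `linear2`) on every step, plus a one-hot tensor for the embeddings, and gradients still have to flow back through the frozen layers to reach the LoRA matrices of earlier layers. Multiplying in the other order, `(x @ B) @ A`, gives the same result with an `r`-wide intermediate. A sketch comparing both orderings and their multiply-add counts, using `linear2`'s shapes:

```python
import torch

torch.manual_seed(0)
batch, d_in, d_out, r = 32, 200, 3001, 1   # linear2 shapes from the notebook
x = torch.randn(batch, d_in)
B = torch.randn(d_in, r)
A = torch.randn(r, d_out)

full_first = x @ (B @ A)       # builds a d_in x d_out matrix first (as in forward)
low_rank_first = (x @ B) @ A   # keeps a batch x r intermediate

# Multiply-add counts for each ordering
macs_full_first = d_in * r * d_out + batch * d_in * d_out
macs_low_rank_first = batch * d_in * r + batch * r * d_out

print(torch.allclose(full_first, low_rank_first, atol=1e-3))
print(macs_full_first, macs_low_rank_first)
```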

Evaluation on the Validation Dataset


100%|██████████| 183/183 [00:00<00:00, 213.39it/s]

Test Accuracy: 10.24%
Average Loss: 5.66
Average Perplexity: 287.13





### Disabling LoRA and evaluating the model with the new weights

In [25]:
model.apply_LoRA_weights()
model.disable_LoRA()
print(model)
count_parameters(model)

Unfreezing Embeddings
Unfreezing Layer 1
Unfreezing Layer 2
Freezing LoRA parameters
BengioModel(
  (embeddings): Embedding(3001, 64)
  (linear1): Linear(in_features=576, out_features=200, bias=True)
  (tanh): Tanh()
  (dropout): Dropout(p=0.2, inplace=False)
  (linear2): Linear(in_features=200, out_features=3001, bias=True)
)


910665
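`apply_LoRA_weights` folds the update into the base weights; for the linear layers the product is transposed because `nn.Linear` stores its weight as `out × in`, while the notebook's `B @ A` is `in × out`. A small check with random tensors (shapes of `linear1`, r = 1, B made non-zero so the comparison is meaningful) that the merged layer reproduces the base + LoRA output:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_out, r, scaling = 576, 200, 1, 1.0
layer = nn.Linear(d_in, d_out)
B = torch.randn(d_in, r) * 0.1
A = torch.randn(r, d_out)
x = torch.randn(8, d_in)

with torch.no_grad():
    lora_out = layer(x) + (x @ B @ A) * scaling        # LoRA-enabled forward
    layer.weight += (B @ A).transpose(0, 1) * scaling  # merge, as apply_LoRA_weights does
    merged_out = layer(x)                              # plain forward after merging

print(torch.allclose(lora_out, merged_out, atol=1e-4))
```

The validation perplexity of the merged model (287.03) matches the LoRA-enabled one (287.13) only approximately; that small gap is consistent with floating-point rounding plus the shuffled `val_loader` changing the composition of the last, partial batch between the two evaluations.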

In [26]:
print("Base Model - Weights = W + LoRA")
print("Evaluation on the Validation Dataset")
eval(model)

Base Model - Weights = W + LoRA
Evaluation on the Validation Dataset


100%|██████████| 183/183 [00:00<00:00, 308.43it/s]

Test Accuracy: 10.24%
Average Loss: 5.66
Average Perplexity: 287.03



