# LoRA experiments

## Introduction
In this notebook, we will perform LoRA experiments

In [1]:
!pip install transformers tensorflow datasets tensorflow_addons

Collecting datasets
  Downloading datasets-2.20.0-py3-none-any.whl (547 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m547.8/547.8 kB[0m [31m9.4 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting tensorflow_addons
  Downloading tensorflow_addons-0.23.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (611 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m611.8/611.8 kB[0m [31m40.2 MB/s[0m eta [36m0:00:00[0m
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-16.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (40.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m40.8/40.8 MB[0m [31m34.7 MB/s[0m eta [36m0:00:00[0m
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m116.3/116.3 kB[0m [31m13.4 MB/s[0m eta [36m0:00:00[0m
Collecting requests (from transformers)
  Downloading requests-2.32.3-py3-no

## Load and Preprocess the  Dataset

In [2]:
from datasets import load_dataset

# Load the WMT16 English-German dataset
dataset = load_dataset('wmt16', 'de-en')

# Display an example
print(dataset['train'][0])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading readme:   0%|          | 0.00/11.1k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/282M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/267M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/277M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/343k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/475k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/4548885 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/2169 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/2999 [00:00<?, ? examples/s]

{'translation': {'de': 'Wiederaufnahme der Sitzungsperiode', 'en': 'Resumption of the session'}}


In [3]:
import tensorflow as tf
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google/flan-t5-base')

# Preprocess the dataset for input into the model
def preprocess_data(examples):
    inputs = [f'Translate English to German: {example["en"]}' for example in examples['translation']]
    targets = [example['de'] for example in examples['translation']]
    model_inputs = tokenizer(inputs, max_length=128, truncation=True, padding='max_length', return_tensors='tf')
    labels = tokenizer(targets, max_length=128, truncation=True, padding='max_length', return_tensors='tf').input_ids
    model_inputs['labels'] = labels
    decoder_inputs = tokenizer(targets, max_length=128, truncation=True, padding="max_length")
    model_inputs["decoder_input_ids"] = decoder_inputs["input_ids"]
    return model_inputs

tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

In [4]:


# Replace the dense layers with LoRA layers
class LoRALayer(tf.keras.layers.Layer):
    def __init__(self, dense, rank=4):
        super().__init__()
        self.dense = dense
        self.rank = rank

    def build(self, input_shape):
        self.w_a = self.add_weight(shape=(input_shape[-1], self.rank),
                                   initializer='random_normal',
                                   trainable=True, name='w_a')
        self.w_b = self.add_weight(shape=(self.rank, self.dense.units),
                                   initializer='random_normal',
                                   trainable=True, name='w_b')

    def call(self, inputs):
        original_output = self.dense(inputs)
        lora_output = tf.matmul(tf.matmul(inputs, self.w_a), self.w_b)
        self.dense.trainable = False
        return original_output + lora_output


## Train the Model

## Train the Model with Different Ranks and Batch Sizes

In [5]:
import tf_keras
import numpy as np
from transformers import TFAutoModelForSeq2SeqLM, AutoTokenizer
import tensorflow_addons as tfa
from tensorflow.keras.layers import Dense


def count_params(model):
    trainable_params = np.sum([np.prod(v.get_shape().as_list()) for v in model.trainable_weights])
    non_trainable_params = np.sum([np.prod(v.get_shape().as_list()) for v in model.non_trainable_weights])
    return trainable_params, non_trainable_params

# Define training configurations
ranks = [1, 4, 16]
batch_sizes = [8, 64, 128]
epochs = 2
results = {}

for rank in ranks:
    for batch_size in batch_sizes:
        print(f"Training with rank={rank}, batch_size={batch_size}")
        model = TFAutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-base')
        model.layers[0].trainable = False
        model.layers[1].trainable = False
        model.layers[2].trainable = False
        model.layers[3] = LoRALayer(model.get_layer('lm_head'))

        # Get the number of parameters
        trainable_params, non_trainable_params = count_params(model)

        # Print the number of parameters
        print(f"Trainable parameters: {trainable_params}")
        print(f"Non-trainable parameters: {non_trainable_params}")

        # Update the batch size

        train_dataset = dataset['train'].select(range(20000)).map(preprocess_data, batched=True)
        test_dataset = dataset['test'].select(range(1000)).map(preprocess_data, batched=True)

        train_dataset =  train_dataset.to_tf_dataset(
            columns=['input_ids', 'attention_mask', 'decoder_input_ids'],
            label_cols=['labels'],
            shuffle=True,
            batch_size=batch_size,
            collate_fn=None
        )

        test_dataset = test_dataset.to_tf_dataset(
            columns=['input_ids', 'attention_mask', 'decoder_input_ids'],
            label_cols=['labels'],
            shuffle=False,
            batch_size=batch_size,
            collate_fn=None
        )

        # Compile the model
        model.compile(optimizer=tf_keras.optimizers.Adam(learning_rate=1e-2),
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

        # Train the model
        history = model.fit(train_dataset, validation_data=test_dataset, epochs=epochs)
        results[(rank, batch_size)] = history.history



TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 



Training with rank=1, batch_size=8


config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


Trainable parameters: 24674304
Non-trainable parameters: 222903552


Map:   0%|          | 0/20000 [00:00<?, ? examples/s]

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Old behaviour: columns=['a'], labels=['labels'] -> (tf.Tensor, tf.Tensor)  
             : columns='a', labels='labels' -> (tf.Tensor, tf.Tensor)  
New behaviour: columns=['a'],labels=['labels'] -> ({'a': tf.Tensor}, {'labels': tf.Tensor})  
             : columns='a', labels='labels' -> (tf.Tensor, tf.Tensor) 


Epoch 1/2


Cause: for/else statement not yet supported


Cause: for/else statement not yet supported
Epoch 2/2
Training with rank=1, batch_size=64


All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


Trainable parameters: 24674304
Non-trainable parameters: 222903552


Map:   0%|          | 0/20000 [00:00<?, ? examples/s]

Epoch 1/2
Epoch 2/2
Training with rank=1, batch_size=128


All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


Trainable parameters: 24674304
Non-trainable parameters: 222903552
Epoch 1/2
Epoch 2/2
Training with rank=4, batch_size=8


All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


Trainable parameters: 24674304
Non-trainable parameters: 222903552
Epoch 1/2
Epoch 2/2
Training with rank=4, batch_size=64


All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


Trainable parameters: 24674304
Non-trainable parameters: 222903552
Epoch 1/2
Epoch 2/2
Training with rank=4, batch_size=128


All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


Trainable parameters: 24674304
Non-trainable parameters: 222903552
Epoch 1/2
Epoch 2/2
Training with rank=16, batch_size=8


All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


Trainable parameters: 24674304
Non-trainable parameters: 222903552
Epoch 1/2
Epoch 2/2
Training with rank=16, batch_size=64


All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


Trainable parameters: 24674304
Non-trainable parameters: 222903552
Epoch 1/2
Epoch 2/2
Training with rank=16, batch_size=128


All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


Trainable parameters: 24674304
Non-trainable parameters: 222903552
Epoch 1/2
Epoch 2/2


## Evaluate the Model

In [6]:
# Evaluate the model for each configuration
for (rank, batch_size), history in results.items():
    print(f"Results for rank={rank}, batch_size={batch_size}")
    print(history)


Results for rank=1, batch_size=8
{'loss': [1.1027005910873413, 0.24512912333011627], 'val_loss': [0.6509847044944763, 0.5663990378379822]}
Results for rank=1, batch_size=64
{'loss': [4.121983528137207, 0.5004836320877075], 'val_loss': [0.8912308216094971, 0.6750527024269104]}
Results for rank=1, batch_size=128
{'loss': [7.162280559539795, 0.7077131867408752], 'val_loss': [1.0696519613265991, 0.7991496920585632]}
Results for rank=4, batch_size=8
{'loss': [1.0993390083312988, 0.2436252385377884], 'val_loss': [0.6532699465751648, 0.5617694854736328]}
Results for rank=4, batch_size=64
{'loss': [4.088901519775391, 0.4996236264705658], 'val_loss': [0.8932406902313232, 0.6726864576339722]}
Results for rank=4, batch_size=128
{'loss': [7.20989990234375, 0.7070597410202026], 'val_loss': [1.0674861669540405, 0.7964359521865845]}
Results for rank=16, batch_size=8
{'loss': [1.0984629392623901, 0.24567732214927673], 'val_loss': [0.6624072194099426, 0.5576761364936829]}
Results for rank=16, batch_siz