# LoRA Fine-Tuning with Hugging Face and TensorFlow on FLAN-T5-base for WMT16 Translation

## Introduction
In this notebook, we will perform LoRA fine-tuning on the FLAN-T5-base model using the WMT16 sentiment analysis dataset. We will replace the dense layers with LoRA layers and fine-tune the model for translation.

In [None]:
!pip install transformers tensorflow datasets tensorflow_addons

[31mERROR: Could not find a version that satisfies the requirement tensorflow_addons (from versions: none)[0m[31m
[0m[31mERROR: No matching distribution found for tensorflow_addons[0m[31m
[0m

In [None]:
!pip install git+https://github.com/tensorflow/addons.git

Collecting git+https://github.com/tensorflow/addons.git
  Cloning https://github.com/tensorflow/addons.git to /tmp/pip-req-build-nswkw7in
  Running command git clone --filter=blob:none --quiet https://github.com/tensorflow/addons.git /tmp/pip-req-build-nswkw7in
  Resolved https://github.com/tensorflow/addons.git to commit d208d752e98c310280938efa939117bf635a60a8
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Collecting typeguard<3.0.0,>=2.7 (from tensorflow-addons==0.23.0.dev0)
  Downloading typeguard-2.13.3-py3-none-any.whl.metadata (3.6 kB)
Downloading typeguard-2.13.3-py3-none-any.whl (17 kB)
Building wheels for collected packages: tensorflow-addons
  Building wheel for tensorflow-addons (pyproject.toml) ... [?25l[?25hdone
  Created wheel for tensorflow-addons: filename=tensorflow_addons-0.23.0.dev0-cp312-cp312-linux_x86_64.whl size=512724 sha256=fb33d20ba05a

## Load and Preprocess the  Dataset

In [1]:
from datasets import load_dataset

# Load the WMT16 English-German dataset
# dataset = load_dataset('abisee/cnn_dailymail', '3.0.0')
dataset = load_dataset('wmt16', 'de-en')

# Display an example
print(dataset['train'][0])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md: 0.00B [00:00, ?B/s]

de-en/train-00000-of-00003.parquet:   0%|          | 0.00/282M [00:00<?, ?B/s]

de-en/train-00001-of-00003.parquet:   0%|          | 0.00/267M [00:00<?, ?B/s]

de-en/train-00002-of-00003.parquet:   0%|          | 0.00/277M [00:00<?, ?B/s]

de-en/validation-00000-of-00001.parquet:   0%|          | 0.00/343k [00:00<?, ?B/s]

de-en/test-00000-of-00001.parquet:   0%|          | 0.00/475k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/4548885 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/2169 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/2999 [00:00<?, ? examples/s]

{'translation': {'de': 'Wiederaufnahme der Sitzungsperiode', 'en': 'Resumption of the session'}}


In [2]:
import tensorflow as tf
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google/flan-t5-base')

# Preprocess the dataset for input into the model
def preprocess_data(examples):
    """ Preprocess the data for input into the model """
    inputs = [f'Translate English to German: {example["en"]}' for example in examples['translation']]
    # targe
    targets = [example['de'] for example in examples['translation']]
    model_inputs = tokenizer(inputs, max_length=128, truncation=True, padding='max_length', return_tensors='tf')
    labels = tokenizer(targets, max_length=128, truncation=True, padding='max_length', return_tensors='tf').input_ids
    model_inputs['labels'] = labels
    decoder_inputs = tokenizer(targets, max_length=128, truncation=True, padding="max_length")
    model_inputs["decoder_input_ids"] = decoder_inputs["input_ids"]
    return model_inputs


train_dataset = dataset['train'].select(range(20000)).map(preprocess_data, batched=True)
test_dataset = dataset['test'].select(range(1000)).map(preprocess_data, batched=True)

train_dataset = train_dataset.to_tf_dataset(
    columns=['input_ids', 'attention_mask', 'decoder_input_ids'],
    label_cols=['labels'],
    shuffle=True,
    batch_size=128, # larger means faster, but also more gpu. can be reduced to 8 or 16
    collate_fn=None
)

test_dataset = test_dataset.to_tf_dataset(
    columns=['input_ids', 'attention_mask', 'decoder_input_ids'],
    label_cols=['labels'],
    shuffle=False,
    batch_size=128, # larger means faster, but also more gpu
    collate_fn=None
)

tokenizer_config.json: 0.00B [00:00, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json: 0.00B [00:00, ?B/s]

Map:   0%|          | 0/20000 [00:00<?, ? examples/s]

TensorFlow and JAX classes are deprecated and will be removed in Transformers v5. We recommend migrating to PyTorch classes or pinning your version of Transformers.


Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Old behaviour: columns=['a'], labels=['labels'] -> (tf.Tensor, tf.Tensor)  
             : columns='a', labels='labels' -> (tf.Tensor, tf.Tensor)  
New behaviour: columns=['a'],labels=['labels'] -> ({'a': tf.Tensor}, {'labels': tf.Tensor})  
             : columns='a', labels='labels' -> (tf.Tensor, tf.Tensor) 


## Load the Pre-trained FLAN-T5 Model and Modify

In [5]:
#from transformers import TFAutoModelForSeq2SeqLM, AutoTokenizer
#import tensorflow_addons as tfa
#from tensorflow.keras.layers import Dense

# Load the model
#model = TFAutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-base')

In [9]:
from transformers import TFAutoModelForSeq2SeqLM, AutoTokenizer
from tensorflow.keras.layers import Dense
# Remove tensorflow_addons import - not needed for basic model loading

# Load the model - this works exactly the same
#model = TFAutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-base', from_tf=True)
#model = TFAutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
model = TFAutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base", from_pt=True)

pytorch_model.bin:   0%|          | 0.00/990M [00:00<?, ?B/s]

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFT5ForConditionalGeneration: ['decoder.embed_tokens.weight', 'encoder.embed_tokens.weight']
- This IS expected if you are initializing TFT5ForConditionalGeneration from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFT5ForConditionalGeneration from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [10]:
# create keras layer
# Replace the dense layers with LoRA layers
class LoRALayer(tf.keras.layers.Layer):
    def __init__(self, dense, rank=4):
        super().__init__()
        self.dense = dense # the actual dense layer. We also need it to set it as non-trainable
        self.rank = rank

    def build(self, input_shape):
        self.w_a = self.add_weight(shape=(input_shape[-1], self.rank),
                                   initializer='random_normal',
                                   trainable=True, name='w_a')
        self.w_b = self.add_weight(shape=(self.rank, self.dense.units),
                                   initializer='random_normal',
                                   trainable=True, name='w_b')

    def call(self, inputs):
        original_output = self.dense(inputs)
        lora_output = tf.matmul(tf.matmul(inputs, self.w_a), self.w_b)
        self.dense.trainable = False
        # the final dimensions need to be same as the initial
        return original_output + lora_output


In [None]:
model.summary()

Model: "tft5_for_conditional_generation"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 shared (Embedding)          multiple                  24674304  
                                                                 
 encoder (TFT5MainLayer)     multiple                  109628544 
                                                                 
 decoder (TFT5MainLayer)     multiple                  137949312 
                                                                 
 lm_head (Dense)             multiple                  24674304  
                                                                 
Total params: 247577856 (944.43 MB)
Trainable params: 247577856 (944.43 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [11]:
[layer for layer in model.decoder._flatten_layers()] # this shows all the layers in the model

[<transformers.models.t5.modeling_tf_t5.TFT5MainLayer at 0x7a9f80a7f2c0>,
 <tf_keras.src.layers.core.embedding.Embedding at 0x7a9f809d83e0>,
 <transformers.models.t5.modeling_tf_t5.TFT5Block at 0x7a9f80a7f620>,
 <transformers.models.t5.modeling_tf_t5.TFT5LayerSelfAttention at 0x7a9f80a7f860>,
 <transformers.models.t5.modeling_tf_t5.TFT5Attention at 0x7a9f80a7f980>,
 <tf_keras.src.layers.core.dense.Dense at 0x7a9f80a7fe90>,
 <tf_keras.src.layers.core.dense.Dense at 0x7a9f80a7ff50>,
 <tf_keras.src.layers.core.dense.Dense at 0x7a9f80a90140>,
 <tf_keras.src.layers.core.dense.Dense at 0x7a9f80a902f0>,
 <tf_keras.src.layers.regularization.dropout.Dropout at 0x7a9f80a7fe60>,
 <transformers.models.t5.modeling_tf_t5.TFT5LayerNorm at 0x7a9f80a90770>,
 <tf_keras.src.layers.regularization.dropout.Dropout at 0x7a9f80a7f950>,
 <transformers.models.t5.modeling_tf_t5.TFT5LayerCrossAttention at 0x7a9f80a90a70>,
 <transformers.models.t5.modeling_tf_t5.TFT5Attention at 0x7a9f80a90bf0>,
 <tf_keras.src.lay

In [12]:
import tf_keras
for ix, layer in enumerate(model.decoder._flatten_layers()):
    if isinstance(layer, tf_keras.src.layers.core.dense.Dense):
        layer.trainable = False
        layer = LoRALayer(layer)
    else:
        layer.trainable = False
model.get_layer('encoder').trainable = False
model.get_layer('shared').trainable = False
model.layers[3] = LoRALayer(model.get_layer('lm_head'))

In [13]:
model.summary()

Model: "tft5_for_conditional_generation_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 shared (Embedding)          multiple                  24674304  
                                                                 
 encoder (TFT5MainLayer)     multiple                  109628544 
                                                                 
 decoder (TFT5MainLayer)     multiple                  137949312 
                                                                 
 lm_head (Dense)             multiple                  24674304  
                                                                 
Total params: 247577856 (944.43 MB)
Trainable params: 24674304 (94.12 MB)
Non-trainable params: 222903552 (850.31 MB)
_________________________________________________________________


## Train the Model

In [None]:
# Compile the model
model.compile(optimizer=tf_keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Train the model
model.fit(train_dataset, validation_data=test_dataset, epochs=3)

Epoch 1/3
Epoch 2/3

In [2]:
print("{:x}".format(255))

ff


## Evaluate the Model

In [None]:
# Evaluate the model
model.evaluate(test_dataset)



2.61997127532959