<center><h1>Fine-tuning a Gemma model with Kaggle Docs data</h1></center>

<center><img src="https://res.infoq.com/news/2024/02/google-gemma-open-model/en/headerimage/generatedHeaderImage-1708977571481.jpg" width="400"></center>


# Introduction

This notebook will demonstrate three things:

1. How to fine-tune a Gemma model using LoRA
2. How to create a specialized class for querying the model about Kaggle features
3. Sample results from querying the model about Kaggle Docs

This work builds largely on the following sources:

1. Gemma Model Card, Kaggle Models, https://www.kaggle.com/models/google/gemma
2. Kaggle QA with Gemma - KerasNLP Starter, Kaggle Code, https://www.kaggle.com/code/awsaf49/kaggle-qa-with-gemma-kerasnlp-starter (Version 11)  
3. Fine-tune Gemma models in Keras using LoRA, Kaggle Code, https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora (Version 1)  
4. Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, LoRA: Low-Rank Adaptation of Large Language Models, ArXiv, https://arxiv.org/pdf/2106.09685.pdf
5. Abheesht Sharma, Matthew Watson, Parameter-efficient fine-tuning of GPT-2 with LoRA, https://keras.io/examples/nlp/parameter_efficient_finetuning_of_gpt2_with_lora/
6. Keras 3 API documentation / KerasNLP / Models / Gemma, https://keras.io/api/keras_nlp/models/gemma/
7. Kaggle Docs, Kaggle Dataset, https://www.kaggle.com/datasets/awsaf49/kaggle-docs  
8. TPUs in Keras, Kaggle Docs, https://www.kaggle.com/docs/tpu  

**Let's go**!


# What is Gemma?


Gemma is a collection of lightweight open generative AI models aimed mainly at developers and researchers. Created by Google DeepMind, the research lab that also developed Gemini, Gemma is available in 2B and 7B parameter sizes, each in a pretrained and an instruction-tuned variant, as follows:


| Model                  | Parameters | Tuned versions    | Description                                     | Recommended target platforms        |
|------------------------|------------|-------------------|-------------------------------------------------|-------------------------------------|
| `gemma_2b_en`          | 2.51B      | Pretrained        | 18-layer Gemma model (Gemma with 2B parameters) | Mobile devices and laptops          |
| `gemma_instruct_2b_en` | 2.51B      | Instruction tuned | 18-layer Gemma model (Gemma with 2B parameters) | Mobile devices and laptops          |
| `gemma_7b_en`          | 8.54B      | Pretrained        | 28-layer Gemma model (Gemma with 7B parameters) | Desktop computers and small servers |
| `gemma_instruct_7b_en` | 8.54B      | Instruction tuned | 28-layer Gemma model (Gemma with 7B parameters) | Desktop computers and small servers |




# What is LoRA?  

LoRA stands for Low-Rank Adaptation. It is a method for fine-tuning large language models (LLMs) that freezes the pretrained weights and injects trainable rank-decomposition matrices into the model's layers. This considerably reduces the number of trainable parameters during fine-tuning. According to the LoRA paper, compared with full fine-tuning of GPT-3 175B with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.
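
To make the idea concrete, here is a minimal NumPy sketch of a single LoRA-adapted projection. The layer sizes, the variable names (`W`, `A`, `B`) and the `lora_forward` helper are illustrative assumptions, not Gemma's real dimensions or KerasNLP code, and the scaling factor used in the paper is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, chosen small so the numbers are easy to check).
d_in, d_out, rank = 64, 32, 4

# Frozen pretrained weight: it is never updated during fine-tuning.
W = rng.normal(size=(d_in, d_out))

# Trainable low-rank factors. B starts at zero, so the adapted layer
# initially behaves exactly like the frozen one.
A = rng.normal(scale=0.01, size=(d_in, rank))
B = np.zeros((rank, d_out))


def lora_forward(x):
    """Frozen projection plus the low-rank update (x @ A) @ B."""
    return x @ W + (x @ A) @ B


x = rng.normal(size=(2, d_in))
full_params = W.size            # parameters updated by full fine-tuning
lora_params = A.size + B.size   # parameters updated with LoRA

print("full fine-tuning:", full_params, "LoRA:", lora_params)
```

Only `A` and `B` would receive gradient updates, so the trainable parameter count depends on the rank rather than on the full size of `W`.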

# How do we proceed?

To fine-tune with LoRA, we will follow these steps:

1. Install prerequisites
2. Load and process the data for fine-tuning
3. Initialize the code for Gemma causal language model (Gemma Causal LM)
4. Perform fine-tuning
5. Test the fine-tuned model with questions from the data used for fine-tuning and with additional questions

# Prerequisites


## Install packages

In [1]:
# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
!pip install -q -U keras-nlp
!pip install -q -U "keras>=3"  # quoted so the shell does not treat >= as a redirect

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-decision-forests 1.8.1 requires wurlitzer, which is not installed.
tensorflow 2.15.0 requires keras<2.16,>=2.15.0, but you have keras 3.1.1 which is incompatible.

## Import packages

In [2]:
import os
os.environ["KERAS_BACKEND"] = "jax" # you can also use tensorflow or torch
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00" # avoid memory fragmentation on JAX backend.
os.environ["JAX_PLATFORMS"] = ""
import keras
import keras_nlp

import numpy as np
import pandas as pd
from tqdm.notebook import tqdm
tqdm.pandas() # progress bar for pandas

import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, Markdown

2024-04-01 14:12:25.559428: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-01 14:12:25.559528: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-01 14:12:25.697509: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


## Configurations

In [3]:
class Config:
    seed = 42
    dataset_path = "/kaggle/input/kaggle-docs/questions_answers"
    preset = "gemma_2b_en" # name of pretrained Gemma
    sequence_length = 512 # max size of input sequence for training
    batch_size = 1 # size of the input batch in training
    epochs = 15 # number of epochs to train

Set the random seed for reproducibility.

In [4]:
keras.utils.set_random_seed(Config.seed)

# Load the data

In [5]:
df = pd.read_csv(f"{Config.dataset_path}/data.csv")
df.head()

| Unnamed: 0 | Question | Answer | Category |
|------------|----------|--------|----------|
| 0 | What are the different types of competitions a... | # Types of Competitions\n\nKaggle Competitions... | competition |
| 1 | What are the different competition formats on ... | There are handful of different formats competi... | competition |
| 2 | How to join a competition? | Before you start, navigate to the [Competition... | competition |
| 3 | How to form, manage, and disband teams in a co... | Everyone that competes in a Competition does s... | competition |
| 4 | How do I make a submission in a competition? | You will need to submit your model predictions... | competition |


Let's check the total number of rows in this dataset.

In [6]:
df.shape[0]

60

For convenience, we define the following question-answer template and apply it to every row:

In [7]:
template = "\n\nCategory:\nkaggle-{Category}\n\nQuestion:\n{Question}\n\nAnswer:\n{Answer}"
df["prompt"] = df.apply(lambda row: template.format(Category=row.Category,
                                                             Question=row.Question,
                                                             Answer=row.Answer), axis=1)
data = df.prompt.tolist()
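
As a quick illustration of what one training string looks like, the sketch below fills the same template with a hypothetical row. The question and answer text are made up for the example and are not taken from the dataset.

```python
template = "\n\nCategory:\nkaggle-{Category}\n\nQuestion:\n{Question}\n\nAnswer:\n{Answer}"

# Hypothetical row, made up for illustration (not taken from the dataset).
row = {
    "Category": "notebook",
    "Question": "What is a notebook?",
    "Answer": "A place to write and run code.",
}

# Each placeholder is replaced by the matching field of the row.
prompt = template.format(**row)
print(prompt)
```

Note that the template adds the `kaggle-` prefix itself, so the category is passed without it.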

## Template utility function

In [8]:
def colorize_text(text):
    for word, color in zip(["Category", "Question", "Answer"], ["blue", "red", "green"]):
        text = text.replace(f"\n\n{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text
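
A small usage sketch of this helper follows. The function is repeated so the snippet runs on its own, and the sample text is a hypothetical prompt. It also shows that a label is only colorized when it is preceded by a blank line, since the replacement matches `\n\n{word}:`.

```python
def colorize_text(text):
    # Same helper as in the notebook, repeated so this snippet is self-contained.
    for word, color in zip(["Category", "Question", "Answer"], ["blue", "red", "green"]):
        text = text.replace(f"\n\n{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text


# Hypothetical prompt in the template's layout.
sample = "\n\nCategory:\nkaggle-tpu\n\nQuestion:\nWhat is a TPU?\n\nAnswer:\n"
colored = colorize_text(sample)
print(colored)

# A label that is not preceded by a blank line is left untouched.
unchanged = colorize_text("Answer: 42")
print(unchanged)
```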

# Specialized class to query Gemma


We define a specialized class to query Gemma.

## Initialize the code for Gemma Causal LM

In [9]:
gemma_causal_lm = keras_nlp.models.GemmaCausalLM.from_preset(Config.preset)
gemma_causal_lm.summary()

Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Kaggle notebook...
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Kaggle notebook...
Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Kaggle notebook...
Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Kaggle notebook...
Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Kaggle notebook...
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.


## Define the specialized class

In [10]:
class GemmaQA:
    def __init__(self, max_length=512):
        self.max_length = max_length
        self.prompt = template
        self.gemma_causal_lm = gemma_causal_lm
        
    def query(self, category, question):
        response = self.gemma_causal_lm.generate(
            self.prompt.format(
                Category=category,
                Question=question,
                Answer=""), 
            max_length=self.max_length)
        display(Markdown(colorize_text(response)))
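
Judging by the outputs shown later in this notebook, the generated text contains the prompt followed by the model's continuation. If only the answer is needed, it could be separated with a small helper like the one below. `extract_answer` and `ANSWER_MARKER` are hypothetical names introduced here for illustration; they are not part of KerasNLP or of the class above.

```python
ANSWER_MARKER = "\n\nAnswer:\n"


def extract_answer(response: str) -> str:
    """Return the text after the first answer marker, or "" if no marker is found."""
    _, found, answer = response.partition(ANSWER_MARKER)
    return answer.strip() if found else ""


# Hypothetical generated text, laid out like the template used in this notebook.
response = (
    "\n\nCategory:\nkaggle-notebook"
    "\n\nQuestion:\nHow to run a notebook?"
    "\n\nAnswer:\nOpen it and click Run All."
)
print(extract_answer(response))
```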
        

## Gemma preprocessor


The preprocessing layer takes batches of strings and returns outputs in a ```(x, y, sample_weight)``` format, where each label in y is the id of the token that follows the corresponding position in x.

After preprocessing, the token id arrays have the shape ```(num_samples, sequence_length)```, as the output of the code below shows.

In [11]:
x, y, sample_weight = gemma_causal_lm.preprocessor(data[0:2])

In [12]:
print(x, y)

{'token_ids': Array([[   2,  109, 8606, ...,    0,    0,    0],
       [   2,  109, 8606, ...,    0,    0,    0]], dtype=int32), 'padding_mask': Array([[ True,  True,  True, ..., False, False, False],
       [ True,  True,  True, ..., False, False, False]], dtype=bool)} [[   109   8606 235292 ...      0      0      0]
 [   109   8606 235292 ...      0      0      0]]
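
The relationship between `x` and `y` in the printed output (the labels start one token later than the inputs) can be sketched in plain Python. This is a simplified illustration of the next-token shift with toy values, not the KerasNLP implementation; the short sequence and the use of `0` as the padding id are assumptions that mirror the output above.

```python
# Toy token ids; 0 is used as the padding id, as in the printed output above.
token_ids = [2, 109, 8606, 235292, 0, 0]
padding_mask = [True, True, True, True, False, False]

# Inputs drop the last position and labels drop the first,
# so y[i] is the token that follows x[i].
x = token_ids[:-1]
y = token_ids[1:]

# Only positions whose label is a real (non-padding) token carry weight.
sample_weight = padding_mask[1:]

print(x)
print(y)
print(sample_weight)
```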


# Perform fine-tuning with LoRA

## Enable LoRA for the model

The LoRA rank sets the inner dimension of the trainable low-rank matrices, and therefore the number of trainable parameters. A larger rank results in more parameters to train.
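
The effect of the rank on the parameter count can be estimated with simple arithmetic. The sketch below assumes one pair of low-rank matrices per adapted weight matrix. The helper name `lora_param_count` and the layer size are assumptions made for illustration and say nothing about which Gemma layers are actually adapted.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA pair: (d_in x rank) + (rank x d_out)."""
    return rank * (d_in + d_out)


# Layer size assumed for illustration.
d_in, d_out = 2048, 2048
full = d_in * d_out  # parameters in the frozen weight matrix itself

for rank in (4, 8, 16):
    added = lora_param_count(d_in, d_out, rank)
    print(f"rank={rank}: {added} trainable vs {full} frozen")
```

The count grows linearly with the rank, so doubling the rank doubles the number of trainable parameters per adapted matrix.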

In [13]:
# Enable LoRA for the model and set the LoRA rank to 4.
gemma_causal_lm.backbone.enable_lora(rank=4)
gemma_causal_lm.summary()

## Run the training sequence

In [14]:
gemma_causal_lm.preprocessor.sequence_length = Config.sequence_length 

# Compile the model with loss, optimizer, and metric
gemma_causal_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=8e-5),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# Train model
gemma_causal_lm.fit(data, epochs=Config.epochs, batch_size=Config.batch_size)

Epoch 1/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 65s 734ms/step - loss: 1.7209 - sparse_categorical_accuracy: 0.5241
Epoch 2/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 44s 728ms/step - loss: 1.6869 - sparse_categorical_accuracy: 0.5313
Epoch 3/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 44s 728ms/step - loss: 1.6175 - sparse_categorical_accuracy: 0.5417
Epoch 4/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 44s 728ms/step - loss: 1.5770 - sparse_categorical_accuracy: 0.5509
Epoch 5/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 44s 728ms/step - loss: 1.5537 - sparse_categorical_accuracy: 0.5552
Epoch 6/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 44s 728ms/step - loss: 1.5304 - sparse_categorical_accuracy: 0.5568
Epoch 7/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 44s 728ms/step - loss: 1.5028 - sparse_categorical_accuracy: 0.5630
Epoch 8/15
60/60

<keras.src.callbacks.history.History at 0x7cc610176080>
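
Because the accuracy metric is passed through `weighted_metrics`, the sample weights produced by the preprocessor are applied to it, so padded positions should not count toward the reported value. The pure-Python sketch below illustrates that idea on toy data. `masked_accuracy` is a hypothetical helper written for this explanation, not a Keras function.

```python
def masked_accuracy(y_true, y_pred, sample_weight):
    """Weighted fraction of correct predictions; zero-weight positions are ignored."""
    matches = [w * (t == p) for t, p, w in zip(y_true, y_pred, sample_weight)]
    total = sum(sample_weight)
    return sum(matches) / total if total else 0.0


# Toy labels and predictions; the last two positions are padding (weight 0).
y_true = [109, 8606, 235292, 0, 0]
y_pred = [109, 8606, 1234, 7, 7]
sample_weight = [1, 1, 1, 0, 0]

accuracy = masked_accuracy(y_true, y_pred, sample_weight)
print(accuracy)
```

Here two of the three real tokens are predicted correctly, and the wrong guesses on the padded positions have no effect.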

# Test the fine-tuned model

In [15]:
gemma_qa = GemmaQA()

## Sample 1

In [16]:
row = df.iloc[0]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-competition

**<font color='red'>Question:</font>**
What are the different types of competitions available on Kaggle?

**<font color='green'>Answer:</font>**
## Datasets

The Dataset page describes what a dataset is (essentially an archive of files), how to use it (e.g., download, upload and share datasets), and what to do with it (e.g., create a new notebook, run a model on it, and evaluate your model).

## Competitions

The Competition page describes what a competition is (essentially a timed Kaggle Dataset) in terms of the rules of the competition and the data provided to the contestants.

## Tracks

If you’re competing in a multi-track competition, Tracks is the page to find your track(s). Tracks are the individual components of the competition.

## Participants

Participants is the page of your competition participants.

## Results

Results is the page of all the results for your competition.

## Rules

Finally, the Competition page has an “Rules as a Files” section. This is where competitions can publish their rules.



## Sample 2

In [17]:
row = df.iloc[15]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-tpu

**<font color='red'>Question:</font>**
How to load and save model on TPU?

**<font color='green'>Answer:</font>**
You can load and save models on TPU devices by using TPUSharing.

## Getting Started

TPUSharing can be accessed via the [TPUSharing Console](https://www.kaggle.com/tpusher/tpu).

Once you have created the model on TPU device you can download the model from the console. You can then upload the model to the Kaggle Model Hub or download it from the hub.

## Saving a TPU Model

To save a model to a Tensor Processing Unit (TPU) device, follow these steps.

First, make sure you are connected to the internet and have a network connection that supports your network and model size.

Next, navigate to the "Upload Model" tab in the left-hand menu. Select the model file to upload, wait for the upload to finish, and you are ready to run the model on TPU!


## Loading a TPU Model

To load a saved model from TPU, follow these steps.

## Sample 3

In [18]:
row = df.iloc[25]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-noteboook

**<font color='red'>Question:</font>**
What are the different types of notebooks available on Kaggle?

**<font color='green'>Answer:</font>**
## What are Notebooks?

Kaggle notebooks are an integral aspect of the data science community on Kaggle. Notebooks are where you write and execute code to analyze data and create insights.

## Notebook types

The types of notebooks available on Kaggle are:

- A standard notebook that you can share privately or publicly.
- A notebook published to Kaggle. Public notebooks are accessible to anyone with the notebook URL; Kaggle notebooks are visible to all Kagglers. Kaggle notebooks are a great way to collaborate on projects or host your own data science challenge. You can learn more about using Kaggle notebooks in the rest of the notebook editor. Public Kaggle notebooks are visible to anyone with the notebook URL and can be forked. For more details see the "Who Can See This Notebook?" section below.
- A private Kaggle notebook. Private Kaggle notebooks are accessible only to those with the invite-only URL. This can be useful if you'd like to share an early version of a notebook project with feedback or collaborators.

## Notebook sharing

There are three types of sharing:

- A notebook is private if you share it with only your username.
- A notebook is publicly available on Kaggle if you share it with your username (this is the default behavior). Public Kaggle notebooks are visible to anyone with the notebook URL and can be forked. For details and to set up access permissions, see “What are the different types of notebooks available on Kaggle?” below.
- A notebook is a private notebook. You can share private notebooks with your username, with a public URL that is not shareable.

## What are the different types of notebooks available on Kaggle?

- **Public** notebooks can be accessed by everyone with the notebook's URL. This can be useful if you'd like to share an early version of a notebook project with feedback or collaborators.
- **Private** notebooks can be accessed by the username the notebook was created with. You can share private notebooks with your username, with a URL that is not shareable. This is useful if you'd like to share an early version of a notebook project with only your username, or with a small group of people.
- **Public Kaggle notebooks** can be accessed by everyone with Kaggle. This can be useful if you'd like to share an

## Unseen question(s)

In [19]:
category = "notebook"
question = "How to run a notebook?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-notebook

**<font color='red'>Question:</font>**
How to run a notebook?

**<font color='green'>Answer:</font>**
When a notebook is created, it's publicly available through Kaggle's website. To access a notebook, navigate to the relevant Competition or Dataset page. The code in the notebook will then run in a sandbox and any resulting files (e.g. models, predictions, or metrics) will be saved to S3.

## Run a notebook from a saved notebook URL

You can share a URL of any notebook and other people will be able to run that notebook. If you share the URL of a Notebook on a public Dataset, then other people who have access to the Dataset will be able to run that notebook.

## Run a notebook in a notebook sandbox from a saved notebook URL

If you want to run a Notebook without sharing a URL, you can run it in the notebook sandbox. The easiest way to access the notebook sandbox is to go to the website of a Dataset or Competition and click on "View Notebook" to open the notebook associated with that Competition or Dataset in a new tab. You can then copy and paste your notebook’s code there. However, if you want to run a Notebook that is not associated with a Dataset or Competition (e.g. you have it locally), you need to use a URL.

In [20]:
category = "discussions"
question = "How to create a discussion topic?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-discussions

**<font color='red'>Question:</font>**
How to create a discussion topic?

**<font color='green'>Answer:</font>**
## Create a Discussion Topic

To start a discussion topic on Kaggle, you can either post a new one in the [Discussions](https://www.kaggle.com/discussions) section or upvote an existing one on the site.

You can also search for existing Discussions in the sidebar.

You can search by tag (e.g. "python"), keyword (e.g. "model transfer learning"), or user (e.g. "Kaggle").

In [21]:
category = "competitions"
question = "What is a code competition?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-competitions

**<font color='red'>Question:</font>**
What is a code competition?

**<font color='green'>Answer:</font>**
Code competitions are an important part of Kaggle Datasets and Competitions. Code competitions provide an opportunity for participants to demonstrate the utility of the code they’ve written in a time limit format.

There are three types of code competitions: notebooks, models, and challenges. Notebooks competitions and models competitions are very similar, and both provide an opportunity for participants to demonstrate the utility of the code they’ve written in a time limit format. The primary difference is that models competitions allow participants to use models written for the competition in addition to the code that participants submit. Challenges competitions are a more limited version of notebooks competitions. They have limited functionality and are designed to be more lightweight and accessible for newcomers to data science.

# Conclusions



We demonstrated how to fine-tune a Gemma model using LoRA.   
We also created a class to query the fine-tuned model and tested it with questions from the training data as well as with new, unseen questions.