<a href="https://www.kaggle.com/code/gpreda/fine-tuning-gemma-2-model-using-lora-and-keras?scriptVersionId=205422280" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<center><h1>Fine-tuning Gemma 2 model using LoRA and Keras</h1></center>

<center><img src="https://res.infoq.com/news/2024/02/google-gemma-open-model/en/headerimage/generatedHeaderImage-1708977571481.jpg" width="400"></center>


# Introduction

This notebook demonstrates three things:

1. How to fine-tune a Gemma 2 model using LoRA
2. How to create a specialised class that queries the model about Kaggle features
3. Some results of querying the fine-tuned model about Kaggle Docs

This notebook builds largely on previous work. The sources are:

1. Gemma 2 Model Card, Kaggle Models, https://www.kaggle.com/models/google/gemma-2/
2. Kaggle QA with Gemma - KerasNLP Starter, Kaggle Code, https://www.kaggle.com/code/awsaf49/kaggle-qa-with-gemma-kerasnlp-starter (Version 11)  
3. Fine-tune Gemma models in Keras using LoRA, Kaggle Code, https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora (Version 1) 
4. Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, LoRA: Low-Rank Adaptation of Large Language Models, ArXiv, https://arxiv.org/pdf/2106.09685.pdf
5. Abheesht Sharma, Matthew Watson, Parameter-efficient fine-tuning of GPT-2 with LoRA, https://keras.io/examples/nlp/parameter_efficient_finetuning_of_gpt2_with_lora/
6. Keras 3 API documentation / KerasNLP / Models / Gemma, https://keras.io/api/keras_nlp/models/gemma/
7. Unlock the Power of Gemma 2: Prompt it like a Pro, https://www.kaggle.com/code/gpreda/unlock-the-power-of-gemma-2-prompt-it-like-a-pro  
8. Fine-tune Gemma using LoRA and Keras, https://www.kaggle.com/code/gpreda/fine-tune-gemma-using-lora-and-keras
9. Fine-tunning Gemma model with Kaggle Docs data, https://www.kaggle.com/code/gpreda/fine-tunning-gemma-model-with-kaggle-docs-data
10. Kaggle Docs, Kaggle Dataset, https://www.kaggle.com/datasets/awsaf49/kaggle-docs  


**Let's go**!


# What is Gemma 2?

Gemma is a collection of lightweight, advanced open models developed by Google, leveraging the same research and technology behind the Gemini models. These models are text-to-text, decoder-only large language models available in English, with open weights provided for both pre-trained and instruction-tuned versions. Gemma models excel in a range of text generation tasks, such as question answering, summarization, and reasoning. Their compact size allows for deployment in resource-constrained environments like laptops, desktops, or personal cloud infrastructure, making state-of-the-art AI models more accessible and encouraging innovation for all. 

Gemma 2 represents the 2nd generation of Gemma models. These models were trained on a dataset of text data that includes a wide variety of sources. The **27B** model was trained with **13 trillion** tokens, the **9B** model was trained with **8 trillion** tokens, and the **2B** model was trained with **2 trillion** tokens. Here is a summary of the key components of the training data:
* **Web Documents**: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content.
* **Code**: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions.
* **Mathematics**: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries.

To learn more about Gemma 2, follow this link: [Gemma 2 Model Card](https://www.kaggle.com/models/google/gemma-2).




# What is LoRA?  

**LoRA** stands for **Low-Rank Adaptation**. It is a method for fine-tuning large language models (LLMs) that freezes the pretrained weights of the LLM and injects trainable rank-decomposition matrices into its layers. This considerably reduces the number of trainable parameters during fine-tuning. According to the **LoRA** paper, compared with full fine-tuning of GPT-3 175B, LoRA can reduce the number of trainable parameters by **10,000 times** and the GPU memory requirement by 3 times.
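To make the idea concrete, here is a minimal pure-Python sketch of a LoRA-style update on a single weight matrix. It is not the KerasNLP implementation: the helper names (`matmul`, `lora_forward`), the tiny shapes, the `alpha` scaling value and all numbers are assumptions chosen only for illustration.

```python
# Toy LoRA-style forward pass on one weight matrix (pure Python, no Keras).
# All names, shapes and values here are illustrative assumptions.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, frozen_w, down, up, alpha=1.0):
    """Return x @ frozen_w + (alpha / rank) * (x @ down) @ up."""
    rank = len(up)  # `up` has shape (rank, d_out)
    base = matmul(x, frozen_w)
    update = matmul(matmul(x, down), up)
    scale = alpha / rank
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]

d_in, d_out, rank = 4, 3, 2
frozen_w = [[0.1 * (i + j) for j in range(d_out)] for i in range(d_in)]  # never updated
down = [[0.5] * rank for _ in range(d_in)]   # trainable, shape (d_in, rank)
up = [[0.0] * d_out for _ in range(rank)]    # trainable, zero-initialised

x = [[1.0, 2.0, 3.0, 4.0]]
out_base = matmul(x, frozen_w)
out_start = lora_forward(x, frozen_w, down, up)

# Pretend one training step changed `up`; the frozen weights stay untouched.
up[0][0] = 1.0
out_after = lora_forward(x, frozen_w, down, up)

print(out_start == out_base)  # the zero-initialised adapter is a no-op
print(out_after != out_base)  # a non-zero adapter changes the output
```

Only `down` and `up` would receive gradient updates in this picture, which is why the trainable parameter count depends on the rank rather than on the full size of `frozen_w`.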

# How do we proceed?

For fine-tuning with LoRA, we will follow these steps:

1. Install prerequisites
2. Load and process the data for fine-tuning
3. Initialize the code for Gemma causal language model (Gemma Causal LM)
4. Perform fine-tuning
5. Test the fine-tuned model with questions from the data used for fine-tuning and with additional questions

# Prerequisites


## Install packages

We start by installing the `keras-nlp`, `keras`, and `kagglehub` packages.

In [1]:
# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
!pip install -q -U keras-nlp
# Quote the requirement so the shell does not treat ">=3" as an output redirection.
!pip install -q -U "keras>=3"
!pip install -q -U kagglehub

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-decision-forests 1.8.1 requires wurlitzer, which is not installed.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-decision-forests 1.8.1 requires wurlitzer, which is not installed.
tensorflow 2.15.0 requires keras<2.16,>=2.15.0, but you have keras 3.6.0 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
keras-cv 0.8.2 requires keras-core, which is not installed.

## Import packages

Now we can import the packages we just installed. We also import `os`, so that we can set the environment variables needed for the Keras backend before `keras` is imported. We will use `jax` as `KERAS_BACKEND`.

Because we want to publish the model from the notebook, we also import `kagglehub` and read the Kaggle credentials from the notebook's Kaggle secrets.

In [2]:
import os
os.environ["KERAS_BACKEND"] = "jax" # you can also use tensorflow or torch
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00" # avoid memory fragmentation on JAX backend.
os.environ["JAX_PLATFORMS"] = ""
import keras
import keras_nlp
import kagglehub


from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
os.environ["KAGGLE_USERNAME"] = user_secrets.get_secret("kaggle_username")
os.environ["KAGGLE_KEY"] = user_secrets.get_secret("kaggle_key")

import numpy as np
import pandas as pd
from tqdm.notebook import tqdm
tqdm.pandas() # progress bar for pandas

import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, Markdown

2024-11-05 16:51:28.352361: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-05 16:51:28.352464: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-05 16:51:28.485385: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


## Configurations


We use a `Config` class to group the settings that control the fine-tuning process:
* random seed
* dataset path
* preset - name of the pretrained Gemma 2 model
* sequence length - maximum length of an input sequence for training
* batch size - number of samples in each training batch
* LoRA rank - rank of the LoRA matrices; a higher rank means more trainable parameters
* learning rate - learning rate used during training
* epochs - number of training epochs

In [3]:
class Config:
    seed = 42
    dataset_path = "/kaggle/input/kaggle-docs/questions_answers"
    preset = "gemma2_2b_en" # name of pretrained Gemma 2
    sequence_length = 512 # max size of input sequence for training
    batch_size = 1 # size of the input batch in training
    lora_rank = 5 # rank for LoRA, higher means more trainable parameters
    learning_rate = 8e-5 # learning rate used in train
    epochs = 15 # number of epochs to train

Set a random seed so that the results are reproducible.

In [4]:
keras.utils.set_random_seed(Config.seed)

# Load the data


We load the data we will use for fine-tuning.

In [5]:
df = pd.read_csv(f"{Config.dataset_path}/data.csv")
df.head()

| Unnamed: 0 | Question | Answer | Category |
|---|---|---|---|
| 0 | What are the different types of competitions a... | # Types of Competitions\n\nKaggle Competitions... | competition |
| 1 | What are the different competition formats on ... | There are handful of different formats competi... | competition |
| 2 | How to join a competition? | Before you start, navigate to the [Competition... | competition |
| 3 | How to form, manage, and disband teams in a co... | Everyone that competes in a Competition does s... | competition |
| 4 | How do I make a submission in a competition? | You will need to submit your model predictions... | competition |


Let's check the total number of rows in this dataset.

In [6]:
df.shape[0]

60
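With the dataset size known, the number of optimizer steps per epoch follows from the batch size. The sketch below redoes that arithmetic with the values from `Config` and the row count printed above; `steps_per_epoch` is a helper name introduced here for illustration only.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

num_samples = 60  # df.shape[0], as printed above
batch_size = 1    # Config.batch_size
epochs = 15       # Config.epochs

steps = steps_per_epoch(num_samples, batch_size)
total_steps = steps * epochs
print(f"{steps} steps per epoch, {total_steps} steps in total")
```

The per-epoch value corresponds to the `60/60` step counter shown in the training log later in the notebook.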

For convenience, we create the following template for QA and apply it to every row:

In [7]:
template = "\n\nCategory:\nkaggle-{Category}\n\nQuestion:\n{Question}\n\nAnswer:\n{Answer}"
df["prompt"] = df.apply(lambda row: template.format(Category=row.Category,
                                                             Question=row.Question,
                                                             Answer=row.Answer), axis=1)
data = df.prompt.tolist()

## Template utility function

In [8]:
def colorize_text(text):
    for word, color in zip(["Category", "Question", "Answer"], ["blue", "red", "green"]):
        text = text.replace(f"\n\n{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text
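To see what the template and `colorize_text` produce, here is a small self-contained sketch. It repeats the two definitions above so that it can run on its own; the category, question and answer strings are made-up placeholders, not rows from the dataset.

```python
# Same template and helper as above, repeated so the example is self-contained.
template = "\n\nCategory:\nkaggle-{Category}\n\nQuestion:\n{Question}\n\nAnswer:\n{Answer}"

def colorize_text(text):
    for word, color in zip(["Category", "Question", "Answer"], ["blue", "red", "green"]):
        text = text.replace(f"\n\n{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text

# Placeholder values, used only to illustrate the formatting.
sample_prompt = template.format(
    Category="example",
    Question="What is a placeholder question?",
    Answer="A placeholder answer.",
)
colored = colorize_text(sample_prompt)

print(repr(sample_prompt))  # raw training text, with explicit newlines
print(colored)              # Markdown with coloured section labels
```

Note that `colorize_text` only matches labels preceded by a blank line, which is why the template starts each section with `\n\n`.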

# Specialized class to query Gemma


We define a specialized class to query Gemma. But first, we need to initialize an object of the `GemmaCausalLM` class.

## Initialize the code for Gemma Causal LM

In [9]:
gemma_causal_lm = keras_nlp.models.GemmaCausalLM.from_preset(Config.preset)
gemma_causal_lm.summary()

normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.


## Define the specialized class

Here we define the specialized class `GemmaQA`.
Its `__init__` stores a reference to the `GemmaCausalLM` object created before, together with the prompt template and the maximum generation length.
The `query` method calls the `generate` method of `GemmaCausalLM` to produce the answer, based on a prompt that includes the category and the question.

In [10]:
class GemmaQA:
    def __init__(self, max_length=512):
        self.max_length = max_length
        self.prompt = template
        self.gemma_causal_lm = gemma_causal_lm
        
    def query(self, category, question):
        response = self.gemma_causal_lm.generate(
            self.prompt.format(
                Category=category,
                Question=question,
                Answer=""), 
            max_length=self.max_length)
        display(Markdown(colorize_text(response)))
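`query` fills the template with an empty `Answer`, so the model is asked to continue the text right after `Answer:`. The samples displayed later in the notebook show the prompt followed by the generated answer. The sketch below isolates those two steps without loading Gemma: `build_prompt` and `extract_answer` are hypothetical helpers, and the response string is a hand-written stand-in for model output.

```python
template = "\n\nCategory:\nkaggle-{Category}\n\nQuestion:\n{Question}\n\nAnswer:\n{Answer}"

def build_prompt(category, question):
    """Fill the template, leaving the answer empty for the model to complete."""
    return template.format(Category=category, Question=question, Answer="")

def extract_answer(response):
    """Return the text after the first 'Answer:' label, or '' if it is missing."""
    _, _, answer = response.partition("\n\nAnswer:\n")
    return answer.strip()

prompt = build_prompt("notebook", "How to run a notebook?")

# Stand-in for the string a generate call would return (prompt + continuation).
fake_response = prompt + "This text stands in for generated output."
answer = extract_answer(fake_response)

print(repr(prompt))
print(answer)
```

Separating the answer from the echoed prompt is useful if the response is to be processed further rather than only displayed.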
        

## Gemma preprocessor


This preprocessing layer will take in batches of strings, and return outputs in a ```(x, y, sample_weight)``` format, where the y label is the next token id in the x sequence.

From the code below, we can see that, after the preprocessor, the data shape is ```(num_samples, sequence_length)```.

In [11]:
x, y, sample_weight = gemma_causal_lm.preprocessor(data[0:2])

In [12]:
print(x, y)

{'token_ids': Array([[     2,    109,   8606, ...,  25688, 235290,  75676],
       [     2,    109,   8606, ...,    109,    688,   2299]],      dtype=int32), 'padding_mask': Array([[ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True]], dtype=bool)} [[   109   8606 235292 ... 235290  75676      1]
 [   109   8606 235292 ...    688   2299      1]]
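The relationship between `x` and `y` can be reproduced with a toy example. This is a simplified sketch of next-token label construction, not the KerasNLP preprocessor itself: the helper name `make_causal_lm_example` and the `START`/`END`/`PAD` values are assumptions, chosen to mirror the printed output above (where `x` begins with id 2 and `y` ends with id 1).

```python
START, END, PAD = 2, 1, 0  # illustrative special-token ids (assumed)

def make_causal_lm_example(token_ids, sequence_length):
    """Build (x, y, sample_weight) where y is x shifted left by one token."""
    # Keep room for the start and end markers, then pad to sequence_length + 1.
    ids = [START] + list(token_ids)[: sequence_length - 1] + [END]
    mask = [True] * len(ids)
    n_pad = sequence_length + 1 - len(ids)
    ids += [PAD] * n_pad
    mask += [False] * n_pad
    x = {"token_ids": ids[:-1], "padding_mask": mask[:-1]}
    y = ids[1:]               # label at position i is the token at position i + 1
    sample_weight = mask[1:]  # padded label positions get zero weight
    return x, y, sample_weight

x, y, sample_weight = make_causal_lm_example([109, 8606, 235292], sequence_length=6)
print(x["token_ids"])
print(y)
print(sample_weight)
```

Both `x["token_ids"]` and `y` have exactly `sequence_length` entries, which is why a batch has shape `(num_samples, sequence_length)`.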


# Perform fine-tuning with LoRA

## Enable LoRA for the model

The LoRA rank determines the number of trainable parameters: a larger rank results in a larger number of parameters to train.
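As a rough illustration of this dependence: a LoRA adapter on a weight matrix of shape `(d_in, d_out)` adds two factors of shapes `(d_in, rank)` and `(rank, d_out)`, that is `rank * (d_in + d_out)` trainable values. The layer shape below is hypothetical and chosen only for the arithmetic; it is not read from the Gemma 2 configuration, and `lora_param_count` is a name introduced here.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable values added by one LoRA adapter: two low-rank factors."""
    return rank * (d_in + d_out)

d_in, d_out = 2048, 2048  # hypothetical layer shape, for illustration only
frozen = d_in * d_out     # size of the frozen weight matrix

for rank in (1, 4, 5, 8, 16):
    trainable = lora_param_count(d_in, d_out, rank)
    share = 100 * trainable / frozen
    print(f"rank={rank:>2}: {trainable:>6,} trainable vs {frozen:,} frozen ({share:.2f}%)")
```

The count grows linearly with the rank, so doubling the rank doubles the number of trainable parameters for every adapted layer.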

In [13]:
# Enable LoRA for the model and set the LoRA rank to the value of lora_rank in Config (5).
gemma_causal_lm.backbone.enable_lora(rank=Config.lora_rank)
gemma_causal_lm.summary()

We see that only a small fraction of the parameters is trainable: about 2.6 billion parameters in total, of which only about 2.9 million are trainable.

## Run the training sequence

We set the `sequence_length` of the `GemmaCausalLM` preprocessor (taken from the configuration, 512).
We then compile the model with a loss, an optimizer and a metric.
The metric is `SparseCategoricalAccuracy`, which measures how often the predictions match the integer labels.
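Because the metric is passed through `weighted_metrics`, positions with zero sample weight (such as padding) do not count towards it. Here is a minimal pure-Python sketch of that computation with made-up logits and labels; `masked_sparse_accuracy` is an illustrative name, not a Keras function.

```python
def masked_sparse_accuracy(logits, labels, sample_weight):
    """Weighted share of positions whose arg-max logit equals the integer label."""
    correct = 0.0
    total = 0.0
    for row, label, weight in zip(logits, labels, sample_weight):
        predicted = max(range(len(row)), key=row.__getitem__)  # arg-max over the vocabulary
        correct += weight * (predicted == label)
        total += weight
    return correct / total if total else 0.0

# Four token positions over a toy vocabulary of three tokens (made-up numbers).
logits = [
    [0.1, 2.0, 0.3],  # predicts token 1
    [1.5, 0.2, 0.1],  # predicts token 0
    [0.0, 0.1, 3.0],  # predicts token 2
    [0.9, 0.8, 0.7],  # predicts token 0, but this position is padding
]
labels = [1, 2, 2, 0]
sample_weight = [1.0, 1.0, 1.0, 0.0]

accuracy = masked_sparse_accuracy(logits, labels, sample_weight)
print(round(accuracy, 4))
```

Two of the three weighted positions are predicted correctly here, and the padded position is ignored even though its prediction happens to match.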

In [14]:
# Set the sequence length from the config (512)
gemma_causal_lm.preprocessor.sequence_length = Config.sequence_length

# Compile the model with loss, optimizer, and metric
gemma_causal_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=Config.learning_rate),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# Train model
gemma_causal_lm.fit(data, epochs=Config.epochs, batch_size=Config.batch_size)

Epoch 1/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 88s 829ms/step - loss: 1.6834 - sparse_categorical_accuracy: 0.5354
Epoch 2/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 50s 826ms/step - loss: 1.5878 - sparse_categorical_accuracy: 0.5486
Epoch 3/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 50s 827ms/step - loss: 1.5177 - sparse_categorical_accuracy: 0.5575
Epoch 4/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 83s 1s/step - loss: 1.4742 - sparse_categorical_accuracy: 0.5677
Epoch 5/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 50s 827ms/step - loss: 1.4314 - sparse_categorical_accuracy: 0.5768
Epoch 6/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 50s 827ms/step - loss: 1.3843 - sparse_categorical_accuracy: 0.5854
Epoch 7/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 83s 1s/step - loss: 1.3279 - sparse_categorical_accuracy: 0.5973
Epoch 8/15
60/60

<keras.src.callbacks.history.History at 0x786a9c134430>

# Test the fine-tuned model

We instantiate an object of class GemmaQA. Because `gemma_causal_lm` was fine-tuned using LoRA, `gemma_qa` defined here will use the fine-tuned model.

In [15]:
gemma_qa = GemmaQA()

To start, we test the model with some of the data from the training set itself.

## Sample 1

In [16]:
row = df.iloc[0]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-competition

**<font color='red'>Question:</font>**
What are the different types of competitions available on Kaggle?

**<font color='green'>Answer:</font>**
There are two different types of competitions available on Kaggle:

# Data-only Competitions
Data-only competitions are the most common type of competition on Kaggle. They consist of a dataset and a set of metrics to evaluate. Participants are free to explore the data and use any means necessary to generate an answer. Data-only competitions are a great way to hone your skills and get a feel for the competition format.

# Model-only Competitions
Model-only competitions are a relatively new type of competition on Kaggle. They consist of a dataset, a set of metrics to evaluate, and a model that participants must beat. Model-only competitions are a more direct way to measure your skills as a modeler.

## Sample 2

In [17]:
row = df.iloc[15]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-tpu

**<font color='red'>Question:</font>**
How to load and save model on TPU?

**<font color='green'>Answer:</font>**
When saving a model on TPU, the model file is saved on the local TPU device. When loading a model, the TPU SDK will automatically look for the model file in the default directory on the TPU host.

## Saving a model on TPU

```python
# Save on TPU
model.save('/tmp/model')

# Load on TPU
print(tf.io.gfile.listdir('/tmp/model'))
```

## Loading a model on TPU

```python
# Save on TPU
model.save('/tmp/model')

# Load on TPU
print(tf.io.gfile.listdir('/tmp/model'))
```

## Tips & Tricks

- TPU-based models are typically larger than CPU-based models. It is recommended to save the model on TPU only if you need to load and use the model again.
- TPU-based models are typically larger than CPU-based models. It is recommended to save the model on TPU only if you need to load and use the model again.
- TPU-based models are typically larger than CPU-based models. It is recommended to save the model on TPU only if you need to load and use the model again.
- TPU-based models are typically larger than CPU-based models. It is recommended to save the model on TPU only if you need to load and use the model again.
- TPU-based models are typically larger than CPU-based models. It is recommended to save the model on TPU only if you need to load and use the model again.
- TPU-based models are typically larger than CPU-based models. It is recommended to save the model on TPU only if you need to load and use the model again.
- TPU-based models are typically larger than CPU-based models. It is recommended to save the model on TPU only if you need to load and use the model again.
- TPU-based models are typically larger than CPU-based models. It is recommended to save the model on TPU only if you need to load and use the model again.
- TPU-based models are typically larger than CPU-based models. It is recommended to save the model on TPU only if you need to load and use the model again.
- TPU-based models are typically larger than CPU-based models. It is recommended to save

## Sample 3

In [18]:
row = df.iloc[25]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-noteboook

**<font color='red'>Question:</font>**
What are the different types of notebooks available on Kaggle?

**<font color='green'>Answer:</font>**
There are two types of notebooks available on Kaggle:

- **Python**: A notebook written in the Python programming language. You can run code snippets and scripts, and use the Jupyter Notebook interface for data exploration and visualizations.

- **RMarkdown**: A notebook written in the R programming language. You can run code snippets and scripts, and use the RStudio interface for data exploration and visualizations. RMarkdown notebooks can be rendered as a static HTML document, and you can customize the look and feel of the document with stylesheets and formatting commands.

RMarkdown notebooks are the same as RStudio files, with the .Rmd extension. Python notebooks are the same as Jupyter Notebooks, with the .ipynb extension.

Both types of notebooks are a great way to learn new skills and share your work with the community.

## Unseen questions

In [19]:
category = "notebook"
question = "How to run a notebook?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-notebook

**<font color='red'>Question:</font>**
How to run a notebook?

**<font color='green'>Answer:</font>**
When you run a notebook, the code cells execute sequentially. Each line of a code cell is executed before the next one.

When you run a notebook, the code execution is paused after each code cell execution. This allows you to inspect the output, debug your code, or execute the next code cell immediately.

When you run a notebook, the output is displayed in the console and in the Notebook view.

If you want to execute a notebook synchronously, i.e. block the current thread until the notebook execution completes, use the command `Kernel.stop()` at the end of the notebook.

If you want to execute a notebook asynchronously, i.e. continue the current thread after the notebook execution completes, use the command `Kernel.flush()` at the end of the notebook.

You can also choose between synchronous and asynchronous execution by default in the Notebook editor. Go to Settings > Editor > Execution > Async or Sync Block. This setting pauses execution after each code block, but allows you to resume execution by clicking inside the code cell.

When you execute a Notebook, the console output is displayed in real-time. However, the output from Notebook code cells is only updated when the Notebook execution completes.

If you want to update the Notebook view immediately, you can flush the kernel asynchronously. This is done by default when you execute a Notebook synchronously (i.e. blocking the current thread until the notebook execution completes) using the `Kernel.flush()` call.

If you want to execute a Notebook asynchronously, i.e. continue the current thread after the notebook execution completes, use the command `Kernel.flush()` at the end of the notebook.

If you want to execute a Notebook asynchronously and simultaneously, i.e. continue the current thread after the notebook execution completes and update the Notebook view immediately, use the setting "Async Update" in the Notebook editor. Go to Settings > Editor > Execution > Async Update.

If you want to execute a Notebook asynchronously and simultaneously with a custom speed, go to Settings > Editor > Execution > Async Update > Speed and select a value.

If you want to execute a Notebook synchronously and simultaneously with a custom speed, go to Settings > Editor > Execution > Async Block > Speed and select a value.

If you want to execute a Notebook synchronously and simultaneously with a custom pause, go to Settings > Editor > Execution > Sync

In [20]:
category = "discussions"
question = "How to create a discussion topic?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-discussions

**<font color='red'>Question:</font>**
How to create a discussion topic?

**<font color='green'>Answer:</font>**
Discussions is a community forum where users can ask questions, make suggestions, and share ideas. You can create your own topics or reply to existing ones.

Discussions are a great way to connect with other data scientists, share your work, and learn new things. They are also a great way to stay up-to-date on the latest in data science.

Creating a discussion topic is easy. Just click on the “Start Discussion” button on any data science landing page (e.g., [this one](https://www.kaggle.com/competitions/covid19-us-election-2020)) or on your profile page, select a topic from the provided list, or type away in the “New Post” text box.

Discussions are a great way to connect with other data scientists, share your work, and learn new things. They are also a great way to stay up-to

In [21]:
category = "competitions"
question = "What is a code competition?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-competitions

**<font color='red'>Question:</font>**
What is a code competition?

**<font color='green'>Answer:</font>**
Code competitions are a great way to hone your skills and stay competitive in the
machine learning community. They typically take the form of a leaderboard where
participants compete to see who can get the highest score.

Some code competitions are open to any machine learning algorithm, while others
focus on a specific type of algorithm or problem domain. For example, the
TensorFlow Object Detection Challenge is an algorithm-focused code competition
while the YouTube Music Lyrics Prediction Challenge is a problem-domain focused
code competition.

Most code competitions are free to join, but some may require a paid subscription
to Kaggle Pro to participate.

## Types of code competitions

There are two main types of code competitions:

### Algorithm-focused code competitions

Algorithm-focused code competitions focus solely on the algorithm itself.
Participants are typically provided with a dataset and an evaluation metric and
the challenge is to find the best algorithm to achieve the highest possible
score. These types of competitions are a great way to test and improve your
algorithm skills.

Some examples of algorithm-focused code competitions include:

- TensorFlow Object Detection Challenge
- PyTorch Speech Recognition Challenge
- TensorFlow Natural Language Processing (NLP) Challenge

### Problem-domain-focused code competitions

Problem-domain-focused code competitions focus on a specific industry or domain
of machine learning. Participants are typically provided with a dataset and a
specific evaluation metric and the challenge is to find the best algorithm to
achieve the highest possible score. These types of competitions are a great way
to apply your algorithm skills in a real-world setting.

Some examples of problem-domain-focused code competitions include:

- YouTube Music Lyrics Prediction
- Google Smart City Challenge
- Kaggle for Good Competition

## How to Win

### Choose a Competition that Fits You

The first step to winning a code competition is to choose one that fits you.
There are a variety of different types of code competitions, and each one has
its own set of rules and guidelines.

Some common types of code competitions include:

- Algorithm-focused code competitions: These competitions focus solely on the
  algorithm itself. Participants are typically provided with a dataset and an
  evaluation metric and the challenge is to find the best algorithm to achieve
  the highest possible score.
- Problem-domain-focused code competitions: These competitions focus on a
  specific industry

In [22]:
category = "datasets"
question = "What are the steps to create a Kaggle dataset?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-datasets

**<font color='red'>Question:</font>**
What are the steps to create a Kaggle dataset?

**<font color='green'>Answer:</font>**
## Creating a Dataset

1. Create a free account on Kaggle.
2. Navigate to the `Datasets` tab and click on the "Create new dataset" button.
3. Select the "Public" option for the visibility.
4. Enter a descriptive name for your dataset.
5. Choose the data format for your dataset.
6. Upload your data. You can upload a CSV or a JSON file. If you are using a JSON file, you will need to configure your dataset to use a `Rows` schema.
7. Upload your data and then upload your metadata. The data and metadata files must be in the same folder.
8. Upload your data and then upload your metadata. The data and metadata files must be in the same folder.
9. If you are using a JSON file, you will need to configure your dataset to use a `Rows` schema.
10. If you are using a CSV file, you can optionally upload a sample. The sample will be used by the data exploration tool to help visualize the data.
11. If you are using a JSON file, you will need to configure your dataset to use a `Rows` schema.
12. If you are using a JSON file, you can optionally configure your dataset to use a `Columns` schema.
13. If you are using a JSON file, you can optionally configure your dataset to use a `Rows` schema.
14. If you are using a JSON file, you can optionally configure your dataset to use a `Columns` schema.
15. If you are using a JSON file, you can optionally configure your dataset to use a `Rows` schema.
16. If you are using a JSON file, you can optionally configure your dataset to use a `Columns` schema.
17. If you are using a JSON file, you can optionally configure your dataset to use a `Rows` schema.
18. If you are using a JSON file, you can optionally configure your dataset to use a `Columns` schema.
19. If you are using a JSON file, you can optionally configure your dataset to use a `Rows` schema.
20. If you are using a JSON file, you can optionally configure your dataset to use a `Columns` schema.
21

# Save the model

In [23]:
preset_dir = "./gemma2_2b_en_kaggle_docs"
gemma_causal_lm.save_to_preset(preset_dir)

# Publish Model on Kaggle as a Kaggle Model

We now publish the saved model as a Kaggle Model.

In [24]:
kaggle_username = os.environ["KAGGLE_USERNAME"]

kaggle_uri = f"kaggle://{kaggle_username}/gemma2-kaggle-docs/keras/gemma2_2b_en_kaggle_docs"
keras_nlp.upload_preset(kaggle_uri, preset_dir)

Uploading Model https://www.kaggle.com/models/gpreda/gemma2-kaggle-docs/keras/gemma2_2b_en_kaggle_docs ...
Starting upload for file .\gemma2_2b_en_kaggle_docs/task.json


Uploading: 100%|██████████| 2.98k/2.98k [00:00<00:00, 4.35kB/s]

Upload successful: .\gemma2_2b_en_kaggle_docs/task.json (3KB)
Starting upload for file .\gemma2_2b_en_kaggle_docs/config.json



Uploading: 100%|██████████| 782/782 [00:00<00:00, 1.30kB/s]

Upload successful: .\gemma2_2b_en_kaggle_docs/config.json (782B)
Starting upload for file .\gemma2_2b_en_kaggle_docs/tokenizer.json



Uploading: 100%|██████████| 591/591 [00:00<00:00, 965B/s]

Upload successful: .\gemma2_2b_en_kaggle_docs/tokenizer.json (591B)
Starting upload for file .\gemma2_2b_en_kaggle_docs/model.weights.h5



Uploading: 100%|██████████| 10.5G/10.5G [02:55<00:00, 59.7MB/s]

Upload successful: .\gemma2_2b_en_kaggle_docs/model.weights.h5 (10GB)
Starting upload for file .\gemma2_2b_en_kaggle_docs/metadata.json



Uploading: 100%|██████████| 143/143 [00:00<00:00, 232B/s]

Upload successful: .\gemma2_2b_en_kaggle_docs/metadata.json (143B)
Starting upload for file .\gemma2_2b_en_kaggle_docs/preprocessor.json



Uploading: 100%|██████████| 1.41k/1.41k [00:00<00:00, 2.20kB/s]

Upload successful: .\gemma2_2b_en_kaggle_docs/preprocessor.json (1KB)
Starting upload for file .\gemma2_2b_en_kaggle_docs/assets/tokenizer/vocabulary.spm



Uploading: 100%|██████████| 4.24M/4.24M [00:00<00:00, 5.58MB/s]

Upload successful: .\gemma2_2b_en_kaggle_docs/assets/tokenizer/vocabulary.spm (4MB)





Your model instance version has been created.
Files are being processed...
See at: https://www.kaggle.com/models/gpreda/gemma2-kaggle-docs/keras/gemma2_2b_en_kaggle_docs
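
The `kaggle_uri` used for the upload has four parts after the `kaggle://` scheme: the user name, the model name, the framework and the variation. A small sketch of assembling such a URI follows; `build_kaggle_uri` is a hypothetical helper and `your_username` is a placeholder.

```python
def build_kaggle_uri(username, model, framework, variation):
    """Join the four handle parts into a kaggle:// URI, rejecting empty or nested parts."""
    parts = [username, model, framework, variation]
    for part in parts:
        if not part or "/" in part:
            raise ValueError(f"invalid URI part: {part!r}")
    return "kaggle://" + "/".join(parts)

uri = build_kaggle_uri(
    "your_username",             # placeholder account name
    "gemma2-kaggle-docs",        # model name used in this notebook
    "keras",                     # framework
    "gemma2_2b_en_kaggle_docs",  # variation used in this notebook
)
print(uri)
```

Validating the parts before uploading avoids starting a multi-gigabyte upload with a malformed destination.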


# Conclusions



We demonstrated how to fine-tune a **Gemma 2** model using LoRA.   
We also created a class to query the **Gemma 2** model and tested it with some examples from the training data as well as with some new, unseen questions.   
We then saved the fine-tuned model as a Keras preset and published it as a Kaggle Model on the Kaggle Models platform.