<a href="https://www.kaggle.com/code/gpreda/fine-tuning-gemma-2-model-using-lora-and-keras?scriptVersionId=193572693" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<center><h1>Fine-tuning Gemma 2 model using LoRA and Keras</h1></center>

<center><img src="https://res.infoq.com/news/2024/02/google-gemma-open-model/en/headerimage/generatedHeaderImage-1708977571481.jpg" width="400"></center>


# Introduction

This notebook will demonstrate three things:

1. How to fine-tune a Gemma 2 model using LoRA
2. How to create a specialised class to query the model about Kaggle features
3. Some results of querying the fine-tuned model about Kaggle Docs

This notebook builds largely on previous work. The sources are:

1. Gemma 2 Model Card, Kaggle Models, https://www.kaggle.com/models/google/gemma-2/
2. Kaggle QA with Gemma - KerasNLP Starter, Kaggle Code, https://www.kaggle.com/code/awsaf49/kaggle-qa-with-gemma-kerasnlp-starter (Version 11)  
3. Fine-tune Gemma models in Keras using LoRA, Kaggle Code, https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora (Version 1) 
4. Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, LoRA: Low-Rank Adaptation of Large Language Models, ArXiv, https://arxiv.org/pdf/2106.09685.pdf
5. Abheesht Sharma, Matthew Watson, Parameter-efficient fine-tuning of GPT-2 with LoRA, https://keras.io/examples/nlp/parameter_efficient_finetuning_of_gpt2_with_lora/
6. Keras 3 API documentation / KerasNLP / Models / Gemma, https://keras.io/api/keras_nlp/models/gemma/
7. Unlock the Power of Gemma 2: Prompt it like a Pro, https://www.kaggle.com/code/gpreda/unlock-the-power-of-gemma-2-prompt-it-like-a-pro  
8. Fine-tune Gemma using LoRA and Keras, https://www.kaggle.com/code/gpreda/fine-tune-gemma-using-lora-and-keras
9. Fine-tunning Gemma model with Kaggle Docs data, https://www.kaggle.com/code/gpreda/fine-tunning-gemma-model-with-kaggle-docs-data
10. Kaggle Docs, Kaggle Dataset, https://www.kaggle.com/datasets/awsaf49/kaggle-docs  


**Let's go**!


# What is Gemma 2?

Gemma is a collection of lightweight, advanced open models developed by Google, leveraging the same research and technology behind the Gemini models. These models are text-to-text, decoder-only large language models available in English, with open weights provided for both pre-trained and instruction-tuned versions. Gemma models excel in a range of text generation tasks, such as question answering, summarization, and reasoning. Their compact size allows for deployment in resource-constrained environments like laptops, desktops, or personal cloud infrastructure, making state-of-the-art AI models more accessible and encouraging innovation for all. 

Gemma 2 represents the 2nd generation of Gemma models. These models were trained on a dataset of text data that includes a wide variety of sources. The **27B** model was trained with **13 trillion** tokens, the **9B** model was trained with **8 trillion** tokens, and the **2B** model was trained with **2 trillion** tokens. Here is a summary of the key components of the training data:
* **Web Documents**: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content.
* **Code**: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions.
* **Mathematics**: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries.

To learn more about Gemma 2, follow this link: [Gemma 2 Model Card](https://www.kaggle.com/models/google/gemma-2).




# What is LoRA?  

**LoRA** stands for **Low-Rank Adaptation**. It is a method for fine-tuning large language models (LLMs) that freezes the pretrained weights of the LLM and injects trainable rank-decomposition matrices into its layers. This considerably reduces the number of trainable parameters during fine-tuning. According to the **LoRA** paper, compared with fully fine-tuning GPT-3 175B, LoRA can reduce the number of trainable parameters by **10,000 times** and the GPU memory requirement by 3 times.
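
To make the idea concrete, here is a minimal NumPy sketch of a single linear layer adapted with LoRA. The sizes `d_in` and `d_out`, the rank `r`, and all variable names are illustrative assumptions and are not taken from Gemma; the zero initialization of `B` follows the LoRA paper, and the paper's scaling factor is left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 1024, 1024, 4  # illustrative sizes, not Gemma's

# Frozen pretrained weight (never updated during fine-tuning).
W = rng.normal(size=(d_out, d_in))

# Trainable low-rank factors: A is random, B starts at zero,
# so the adapted layer initially behaves exactly like the frozen one.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))

def lora_forward(x):
    """Compute (W + B @ A) @ x without materializing the full update."""
    return W @ x + B @ (A @ x)

x = rng.normal(size=(d_in,))
same_at_init = np.allclose(lora_forward(x), W @ x)

full_params = W.size            # parameters a full fine-tune would update
lora_params = A.size + B.size   # parameters LoRA actually trains
print(same_at_init, full_params, lora_params, full_params // lora_params)
```

For this toy layer the low-rank factors hold 8,192 values against 1,048,576 in the frozen matrix, which is why raising the rank raises the trainable parameter count.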

# How do we proceed?

For fine-tuning with LoRA, we will follow these steps:

1. Install prerequisites
2. Load and process the data for fine-tuning
3. Initialize the code for Gemma causal language model (Gemma Causal LM)
4. Perform fine-tuning
5. Test the fine-tuned model with questions from the data used for fine-tuning and with additional questions

# Prerequisites


## Install packages

We start by installing `keras-nlp` and `keras` packages.

In [1]:
# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
!pip install -q -U keras-nlp
!pip install -q -U "keras>=3"

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-decision-forests 1.8.1 requires wurlitzer, which is not installed.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-decision-forests 1.8.1 requires wurlitzer, which is not installed.
tensorflow 2.15.0 requires keras<2.16,>=2.15.0, but you have keras 3.5.0 which is incompatible.

## Import packages

Now we can import the packages we just installed. We will also import `os`, so that we can set the environment variables needed for the Keras backend before `keras` is imported. We will use `jax` as `KERAS_BACKEND`.

In [2]:
import os
os.environ["KERAS_BACKEND"] = "jax" # you can also use tensorflow or torch
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00" # avoid memory fragmentation on JAX backend.
os.environ["JAX_PLATFORMS"] = ""
import keras
import keras_nlp

import numpy as np
import pandas as pd
from tqdm.notebook import tqdm
tqdm.pandas() # progress bar for pandas

import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, Markdown

2024-08-22 09:26:28.851770: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-22 09:26:28.851897: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-22 09:26:28.971956: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


## Configurations


We use a `Config` class to group the information needed to control the fine-tuning process:
* random seed - seed used for reproducibility
* dataset path - location of the Kaggle Docs data
* preset - name of the pretrained Gemma 2 model
* sequence length - maximum size of the input sequence for training
* batch size - size of the input batch in training
* lora rank - rank for LoRA; a higher rank means more trainable parameters
* learning rate - learning rate used in training
* epochs - number of epochs for training

In [3]:
class Config:
    seed = 42
    dataset_path = "/kaggle/input/kaggle-docs/questions_answers"
    preset = "gemma2_2b_en" # name of pretrained Gemma 2
    sequence_length = 512 # max size of input sequence for training
    batch_size = 1 # size of the input batch in training
    lora_rank = 3 # rank for LoRA, higher means more trainable parameters
    learning_rate=8e-5 # learning rate used in train
    epochs = 10 # number of epochs to train

Set a random seed for results reproducibility.

In [4]:
keras.utils.set_random_seed(Config.seed)

# Load the data


We load the data we will use for fine-tuning.

In [5]:
df = pd.read_csv(f"{Config.dataset_path}/data.csv")
df.head()

| Unnamed: 0 | Question | Answer | Category |
|---|---|---|---|
| 0 | What are the different types of competitions a... | # Types of Competitions\n\nKaggle Competitions... | competition |
| 1 | What are the different competition formats on ... | There are handful of different formats competi... | competition |
| 2 | How to join a competition? | Before you start, navigate to the [Competition... | competition |
| 3 | How to form, manage, and disband teams in a co... | Everyone that competes in a Competition does s... | competition |
| 4 | How do I make a submission in a competition? | You will need to submit your model predictions... | competition |


Let's check the total number of rows in this dataset.

In [6]:
df.shape[0]

60

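For convenience, we create the following template for QA. Each row is rendered into a single prompt string that contains the category, the question, and the answer: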

In [7]:
template = "\n\nCategory:\nkaggle-{Category}\n\nQuestion:\n{Question}\n\nAnswer:\n{Answer}"
df["prompt"] = df.apply(lambda row: template.format(Category=row.Category,
                                                             Question=row.Question,
                                                             Answer=row.Answer), axis=1)
data = df.prompt.tolist()
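
To see what one training example looks like after the template is applied, the sketch below renders a single row. The row is hypothetical: the category and question mirror the dataset preview, while the answer is a placeholder and not text from the dataset.

```python
template = "\n\nCategory:\nkaggle-{Category}\n\nQuestion:\n{Question}\n\nAnswer:\n{Answer}"

# Hypothetical row; the answer text is a placeholder, not taken from the dataset.
example = {
    "Category": "competition",
    "Question": "How to join a competition?",
    "Answer": "<answer text>",
}

rendered = template.format(**example)
print(rendered)
```

Every section label is preceded by a blank line, which is the pattern the formatting helper in the next section relies on.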

## Template utility function

In [8]:
def colorize_text(text):
    for word, color in zip(["Category", "Question", "Answer"], ["blue", "red", "green"]):
        text = text.replace(f"\n\n{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text
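
A short usage sketch for the helper follows. The function is repeated so the snippet runs on its own, and the sample string is a hypothetical prompt built by hand in the same shape as the template.

```python
def colorize_text(text):
    # Same helper as above, repeated so this snippet runs on its own.
    for word, color in zip(["Category", "Question", "Answer"], ["blue", "red", "green"]):
        text = text.replace(f"\n\n{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text

sample = "\n\nCategory:\nkaggle-notebook\n\nQuestion:\nHow to run a notebook?\n\nAnswer:\n"
colored = colorize_text(sample)
print(colored)

# A label that is not preceded by a blank line is left untouched.
untouched = colorize_text("Answer: plain text")
```

Only labels that follow a blank line are wrapped, so the word "Answer" appearing inside running text is not recolored.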

# Specialized class to query Gemma


We define a specialized class to query Gemma. But first, we need to initialize an object of GemmaCausalLM class.

## Initialize the code for Gemma Causal LM

In [9]:
gemma_causal_lm = keras_nlp.models.GemmaCausalLM.from_preset(Config.preset)
gemma_causal_lm.summary()

Attaching 'model.safetensors' from model 'keras/gemma2/keras/gemma2_2b_en/1' to your Kaggle notebook...
Attaching 'model.safetensors.index.json' from model 'keras/gemma2/keras/gemma2_2b_en/1' to your Kaggle notebook...
Attaching 'metadata.json' from model 'keras/gemma2/keras/gemma2_2b_en/1' to your Kaggle notebook...
Attaching 'metadata.json' from model 'keras/gemma2/keras/gemma2_2b_en/1' to your Kaggle notebook...
Attaching 'task.json' from model 'keras/gemma2/keras/gemma2_2b_en/1' to your Kaggle notebook...
Attaching 'config.json' from model 'keras/gemma2/keras/gemma2_2b_en/1' to your Kaggle notebook...
Attaching 'model.safetensors' from model 'keras/gemma2/keras/gemma2_2b_en/1' to your Kaggle notebook...
Attaching 'model.safetensors.index.json' from model 'keras/gemma2/keras/gemma2_2b_en/1' to your Kaggle notebook...
Attaching 'metadata.json' from model 'keras/gemma2/keras/gemma2_2b_en/1' to your Kaggle notebook...
Attaching 'metadata.json' from model 'keras/gemma2/keras/gemma2_2b_e

## Define the specialized class

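Here we define the specialized class `GemmaQA`.
In `__init__`, we store a reference to the `GemmaCausalLM` object created before (the notebook-level `gemma_causal_lm` variable), together with the prompt template and the maximum generation length.
The `query` member function uses the `generate` member function of `GemmaCausalLM` to generate the answer, based on a prompt that includes the category and the question, and displays the result as Markdown.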

In [10]:
class GemmaQA:
    def __init__(self, max_length=512):
        self.max_length = max_length
        self.prompt = template
        self.gemma_causal_lm = gemma_causal_lm
        
    def query(self, category, question):
        response = self.gemma_causal_lm.generate(
            self.prompt.format(
                Category=category,
                Question=question,
                Answer=""), 
            max_length=self.max_length)
        display(Markdown(colorize_text(response)))
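
The class above reads the model from a notebook-level variable. As an alternative sketch, the variant below receives the model through its constructor and returns the text instead of displaying it, which makes it easy to try without loading Gemma. `InjectedGemmaQA` and `EchoModel` are hypothetical names introduced only for this illustration; the stub truncates by characters to stay small, whereas the real model counts tokens.

```python
TEMPLATE = "\n\nCategory:\nkaggle-{Category}\n\nQuestion:\n{Question}\n\nAnswer:\n{Answer}"

class EchoModel:
    """Hypothetical stand-in with the same generate(prompt, max_length) call shape."""
    def generate(self, prompt, max_length=512):
        # A causal LM returns the prompt followed by the generated continuation.
        return (prompt + "stub answer")[:max_length]

class InjectedGemmaQA:
    """Variant of GemmaQA that receives the model as a constructor argument."""
    def __init__(self, model, prompt=TEMPLATE, max_length=512):
        self.model = model
        self.prompt = prompt
        self.max_length = max_length

    def query(self, category, question):
        # Leave the Answer slot empty so the model completes it.
        text = self.prompt.format(Category=category, Question=question, Answer="")
        return self.model.generate(text, max_length=self.max_length)

qa = InjectedGemmaQA(EchoModel())
response = qa.query("notebook", "How to run a notebook?")
print(response)
```

With this shape, the same class could wrap either the base or the fine-tuned model, assuming the object passed in exposes a compatible `generate` method.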
        

## Gemma preprocessor


This preprocessing layer will take in batches of strings, and return outputs in a ```(x, y, sample_weight)``` format, where the y label is the next token id in the x sequence.

From the code below, we can see that, after the preprocessor, the data shape is ```(num_samples, sequence_length)```.

In [11]:
x, y, sample_weight = gemma_causal_lm.preprocessor(data[0:2])

In [12]:
print(x, y)

{'token_ids': Array([[   2,  109, 8606, ...,    0,    0,    0],
       [   2,  109, 8606, ...,    0,    0,    0]], dtype=int32), 'padding_mask': Array([[ True,  True,  True, ..., False, False, False],
       [ True,  True,  True, ..., False, False, False]], dtype=bool)} [[   109   8606 235292 ...      0      0      0]
 [   109   8606 235292 ...      0      0      0]]
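
The relationship between `x`, `y`, and `sample_weight` can be illustrated with a toy sequence in plain Python. This is a simplified sketch of the shifting idea, not the library's implementation: the token ids are taken from the printed output above, and treating id 0 as padding is an assumption made for the example.

```python
# Toy token ids; 0 is treated as the padding id here (an assumption for this sketch).
token_ids = [2, 109, 8606, 235292, 0, 0]   # the last two positions are padding
padding_mask = [t != 0 for t in token_ids]

# Inputs are every token except the last; labels are the same sequence shifted left by one.
x = token_ids[:-1]
y = token_ids[1:]

# A position contributes to the loss only when its label is a real token.
sample_weight = [1.0 if m else 0.0 for m in padding_mask[1:]]

for inp, label, w in zip(x, y, sample_weight):
    print(f"input={inp:>6}  label={label:>6}  weight={w}")
```

This matches the printed arrays, where `token_ids` starts with `2, 109, 8606` and the labels start one position later with `109, 8606, 235292`.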


# Perform fine-tuning with LoRA

## Enable LoRA for the model

The LoRA rank determines the number of trainable parameters. A larger rank results in a larger number of parameters to train.

In [13]:
# Enable LoRA for the model and set the LoRA rank to the lora_rank as set in Config (3).
gemma_causal_lm.backbone.enable_lora(rank=Config.lora_rank)
gemma_causal_lm.summary()

We see that only a small part of the parameters is trainable: about 2.6 billion parameters in total, of which only about 2.9 million are trainable.
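
As a quick check of the proportion, the arithmetic below uses the two rounded counts quoted above; the exact values come from the model summary and are not reproduced here.

```python
total_params = 2.6e9      # approximate total quoted above
trainable_params = 2.9e6  # approximate trainable count quoted above

fraction = trainable_params / total_params
print(f"{fraction:.4%} of the parameters are trainable")
```

With these rounded figures, roughly one parameter in nine hundred is updated during fine-tuning.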

## Run the training sequence

We set the `sequence_length` of the `GemmaCausalLM` preprocessor (512, from the configuration).
We compile the model with the loss, optimizer, and metric.
The metric is `SparseCategoricalAccuracy`, which calculates how often predictions match integer labels.
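
What the metric measures can be sketched in plain Python. This is an illustrative re-implementation under simplifying assumptions (one flat list of positions, weights of 0 or 1), not the Keras code; the function name and the toy values are hypothetical.

```python
def weighted_sparse_categorical_accuracy(y_true, logits, sample_weight):
    """Weighted fraction of positions whose arg-max prediction equals the integer label."""
    correct = 0.0
    total = 0.0
    for label, row, w in zip(y_true, logits, sample_weight):
        predicted = max(range(len(row)), key=row.__getitem__)  # arg-max over the vocabulary
        correct += w * (predicted == label)
        total += w
    return correct / total if total else 0.0

# Toy example: 4 positions, vocabulary of 3 tokens; the last position is padding.
y_true = [1, 0, 2, 0]
logits = [
    [0.1, 2.0, 0.3],   # predicts 1 -> correct
    [0.2, 0.1, 1.5],   # predicts 2 -> wrong
    [0.0, 0.4, 0.9],   # predicts 2 -> correct
    [3.0, 0.0, 0.0],   # predicts 0, but weight 0 so it is ignored
]
sample_weight = [1.0, 1.0, 1.0, 0.0]

accuracy = weighted_sparse_categorical_accuracy(y_true, logits, sample_weight)
print(accuracy)
```

Passing the metric through `weighted_metrics` is what lets padded positions be excluded in this way, so the reported accuracy reflects only real tokens.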

In [14]:
#set sequence length cf. config (512)
gemma_causal_lm.preprocessor.sequence_length = Config.sequence_length 

# Compile the model with loss, optimizer, and metric
gemma_causal_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=Config.learning_rate),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# Train model
gemma_causal_lm.fit(data, epochs=Config.epochs, batch_size=Config.batch_size)

Epoch 1/10
60/60 ━━━━━━━━━━━━━━━━━━━━ 88s 842ms/step - loss: 1.6867 - sparse_categorical_accuracy: 0.5355
Epoch 2/10
60/60 ━━━━━━━━━━━━━━━━━━━━ 83s 1s/step - loss: 1.6137 - sparse_categorical_accuracy: 0.5458
Epoch 3/10
60/60 ━━━━━━━━━━━━━━━━━━━━ 50s 837ms/step - loss: 1.5368 - sparse_categorical_accuracy: 0.5551
Epoch 4/10
60/60 ━━━━━━━━━━━━━━━━━━━━ 50s 838ms/step - loss: 1.4978 - sparse_categorical_accuracy: 0.5637
Epoch 5/10
60/60 ━━━━━━━━━━━━━━━━━━━━ 50s 837ms/step - loss: 1.4636 - sparse_categorical_accuracy: 0.5702
Epoch 6/10
60/60 ━━━━━━━━━━━━━━━━━━━━ 50s 838ms/step - loss: 1.4269 - sparse_categorical_accuracy: 0.5774
Epoch 7/10
60/60 ━━━━━━━━━━━━━━━━━━━━ 50s 838ms/step - loss: 1.3854 - sparse_categorical_accuracy: 0.5874
Epoch 8/10
60/60

<keras.src.callbacks.history.History at 0x7ab5e81d2800>

# Test the fine-tuned model

We instantiate an object of class GemmaQA. Because `gemma_causal_lm` was fine-tuned using LoRA, `gemma_qa` defined here will use the fine-tuned model.

In [15]:
gemma_qa = GemmaQA()

To start, we test the model with some of the data from the training set itself.

## Sample 1

In [16]:
row = df.iloc[0]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-competition

**<font color='red'>Question:</font>**
What are the different types of competitions available on Kaggle?

**<font color='green'>Answer:</font>**
Kaggle competitions are a great way to get started with machine learning. They are a structured way to learn new skills and techniques, and they are a great way to meet other data scientists.

There are two types of competitions on Kaggle: public and private.

## Public Competitions

Public competitions are open to anyone. They are a great way to learn new skills and techniques, and they are a great way to meet other data scientists.

Public competitions are a great way to get started with Kaggle competitions. They are a structured way to learn new skills and techniques, and they are a great way to meet other data scientists.

Public competitions are a great way to get started with Kaggle competitions. They are a structured way to learn new skills and techniques, and they are a great way to meet other data scientists.

Public competitions are a great way to get started with Kaggle competitions. They are a structured way to learn new skills and techniques, and they are a great way to meet other data scientists.

Public competitions are a great way to get started with Kaggle competitions. They are a structured way to learn new skills and techniques, and they are a great way to meet other data scientists.

Public competitions are a great way to get started with Kaggle competitions. They are a structured way to learn new skills and techniques, and they are a great way to meet other data scientists.

Public competitions are a great way to get started with Kaggle competitions. They are a structured way to learn new skills and techniques, and they are a great way to meet other data scientists.

Public competitions are a great way to get started with Kaggle competitions. They are a structured way to learn new skills and techniques, and they are a great way to meet other data scientists.

Public competitions are a great way to get started with Kaggle competitions. They are a structured way to learn new skills and techniques, and they are a great way to meet other data scientists.

Public competitions are a great way to get started with Kaggle competitions. They are a structured way to learn new skills and techniques, and they are a great way to meet other data scientists.

Public competitions are a great way to get started with Kaggle competitions. They are a structured way to learn new skills and techniques, and they are a great way to meet other data scientists.

Public competitions are

## Sample 2

In [17]:
row = df.iloc[15]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-tpu

**<font color='red'>Question:</font>**
How to load and save model on TPU?

**<font color='green'>Answer:</font>**
You can save and load models on TPU using the same syntax as on CPU.

## Saving a model

To save a model on TPU, use the `save_weights` method of the model object. This method takes a `tf.io.TFRecord` object as an argument.

```python
# Load the model from disk
model = tf.keras.models.load_model("saved_model.h5")

# Save the model to disk
model.save_weights("saved_model.h5")
```

## Loading a model

To load a model on TPU, use the `model_from_json` and `load_weights` methods of the model object.

```python
# Load the model from disk
model_json = tf.io.read_file("saved_model.json")
model_json = tf.io.gfile.GFile(model_json, "rb").read()
model = tf.keras.models.model_from_json(model_json)

# Load the weights from disk
model.load_weights("saved_model.h5")
```

## Saving and loading a model with multiple layers

If you have a model with multiple layers, you can save and load the model using the same syntax.

```python
# Load the model from disk
model = tf.keras.models.load_model("saved_model.h5")

# Save the model to disk
model.save_weights("saved_model.h5")
```

## Saving and loading a model with multiple models

If you have a model with multiple models, you can save and load the model using the same syntax.

```python
# Load the model from disk
model_1 = tf.keras.models.load_model("saved_model_1.h5")
model_2 = tf.keras.models.load_model("saved_model_2.h5")

# Save the model to disk
model_1.save_weights("saved_model.h5")
model_2.save_weights("saved_model.h5")
```

## Saving and loading a model with multiple models and layers

If you have a model with multiple models and multiple layers, you can save

## Sample 3

In [18]:
row = df.iloc[25]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-noteboook

**<font color='red'>Question:</font>**
What are the different types of notebooks available on Kaggle?

**<font color='green'>Answer:</font>**
Kaggle Notebooks are a powerful tool for data science. They allow you to run code and share interactive visualizations in a collaborative, version-controlled environment.

There are two types of Notebooks available on Kaggle:

- **Public Notebooks**: Notebooks that are publicly accessible. You can view and run the code and visualizations in the notebook.
- **Private Notebooks**: Notebooks that are only accessible to you. You can view and run the code and visualizations in the notebook, but other users cannot view or run it.

You can create a new Notebook of either type by clicking on the “New Notebook” button in the top right corner of the Kaggle platform.

You can also find and explore a wide variety of public and private Notebooks in the “Explore Notebooks” tab of the Kaggle platform.

## Public Notebooks

Public Notebooks are the most common type of Notebook on Kaggle. They are a great way to share your work with the community and learn new techniques from other data scientists.

Public Notebooks are accessible to anyone with a Kaggle account.

## Private Notebooks

Private Notebooks are a great way to work on sensitive or confidential projects. They are only accessible to you and cannot be viewed or run by anyone else.

Private Notebooks are accessible only to you.

## Collaborators

Public and Private Notebooks can be shared with collaborators. Collaborators can view and run the code and visualizations in the notebook, but they cannot edit or delete it.

You can invite collaborators by clicking on the “Invite Collaborators” button in the top right corner of the Notebook editor. You can also add collaborators from your contact list or from a list of Kaggle users.

## Sharing Notebooks

You can share a Public or Private Notebook with anyone by clicking on the “Share Notebook” button in the top right corner of the Notebook editor. You can also copy the link to the Notebook and share it directly.

## Creating a New Notebook

You can create a new Notebook of either type by clicking on the “New Notebook” button in the top right corner of the Kaggle platform.

## Notebook Editor

The Notebook editor is where you write and run your code and visualizations. It is a collaborative, version-controlled environment that allows you to work with others on your projects.

The Notebook editor is available in two modes:

- **Code Mode**: Code Mode is the default

## Unseen question(s)

In [19]:
category = "notebook"
question = "How to run a notebook?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-notebook

**<font color='red'>Question:</font>**
How to run a notebook?

**<font color='green'>Answer:</font>**
You can run a notebook in two ways:

1. From the Notebook Editor
2. From the Command Line

## Notebook Editor

You can run a notebook from the Notebook Editor by clicking on the "Run" button in the top right corner of the editor. This will execute the code in the notebook in a new tab.

If you have a large notebook with a lot of code, you may want to break it up into smaller chunks and run each chunk separately. To do this, you can click on the "Run" button in the top right corner of the editor for each chunk of code you want to run.

## Command Line

You can also run a notebook from the command line. To do this, open a terminal and navigate to the directory where your notebook is located. Then, run the command:

```bash
kaggle run notebook-name.ipynb
```

This will open a new tab in your browser and allow you to run the notebook from the command line.

If you encounter any errors or need to debug your notebook, you can also use the command line to interact with the notebook editor. For example, you can open a new tab in your browser and navigate to the notebook editor by running the command:

```bash
kaggle editor notebook-name.ipynb
```

This will open a new tab in your browser and allow you to edit and run your notebook from the command line.

## Running a Notebook in Parallel

If you have a large dataset or a lot of code, you may want to run your notebook in parallel. This will allow you to run multiple chunks of code at the same time, which can speed up the execution of your notebook.

To run a notebook in parallel, you can use the `kaggle run --parallel` command line flag. This will open a new tab in your browser and allow you to run the notebook from the command line.

## Running a Notebook in a Docker Container

If you are using a GPU, you can also run your notebook in a Docker container. This will allow you to run your notebook in a virtual machine with all the necessary hardware and software installed.

To run a notebook in a Docker container, you can use the `kaggle run --docker` command line flag. This will open a new tab in your browser and allow you to run the notebook from the command line.

##

In [20]:
category = "discussions"
question = "How to create a discussion topic?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-discussions

**<font color='red'>Question:</font>**
How to create a discussion topic?

**<font color='green'>Answer:</font>**
Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

## Creating a new discussion topic

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions are a great way to ask questions, share ideas, and get feedback on your work.

Discussions

In [21]:
category = "competitions"
question = "What is a code competition?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-competitions

**<font color='red'>Question:</font>**
What is a code competition?

**<font color='green'>Answer:</font>**
Code competitions are a great way to practice your coding skills. They are also a great way to learn new skills and techniques.

There are many different types of code competitions, but they all share some common features. First, they are usually held online. Second, they are usually timed. Third, they are usually scored based on how well you solve the problems.

Some code competitions are open to anyone, while others are invite-only for a select group of users. If you are interested in participating in a code competition, be sure to check out the rules and requirements for the competition you are interested in.

Once you have registered for the competition, you will be given a set of problems to solve within a certain amount of time. The problems will be scored based on how well you solve them.

Some code competitions also have a leaderboard where you can see how you compare to other participants. This can be a great motivator to keep you going until you have solved all the problems.

Code competitions are a great way to practice your coding skills and learn new techniques. If you are interested in participating, be sure to check out the different types of code competitions available.

In [22]:
category = "datasets"
question = "What are the steps to create a Kaggle dataset?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-datasets

**<font color='red'>Question:</font>**
What are the steps to create a Kaggle dataset?

**<font color='green'>Answer:</font>**
To create a Kaggle dataset, follow these steps:

1. Navigate to the Datasets page.
2. Click on the “Create Dataset” button.
3. Fill out the form with the following information:

- **Name**: The name of your dataset.
- **Description**: A description of your dataset.
- **License**: The license you want to apply to your dataset.
- **Tags**: A list of tags that describe your dataset.
- **Privacy**: The privacy setting you want to apply to your dataset.
- **Content**: The content of your dataset.

1. Click on the “Create Dataset” button.
2. Your dataset will be created and you will be redirected to the “Overview” page.
3. From here, you can edit your dataset at any time.

You can also create a dataset from scratch by clicking on the “Create Dataset” button on the Datasets page. This will take you through the same steps as above.

If you want to create a dataset from a CSV file, you can do so by clicking on the “Upload CSV” button on the Overview page of your dataset. This will open a modal where you can select the CSV file you want to upload.

If you want to create a dataset from a URL, you can do so by clicking on the “Create Dataset from URL” button on the Datasets page. This will open a modal where you can enter the URL of the dataset you want to create a copy of.

If you want to create a dataset from a notebook, you can do so by clicking on the “Create Dataset from Notebook” button on the Datasets page. This will open a modal where you can select the notebook you want to create a dataset from.

If you want to create a dataset from a private repository, you can do so by clicking on the “Create Dataset from Repository” button on the Datasets page. This will open a modal where you can select the repository you want to create a dataset from.

If you want to create a dataset from a private repository, you can do so by clicking on the “Create Dataset from Repository” button on the Datasets page. This will open a modal where you can select the repository you want to create a dataset from.

If you want to create a dataset from a private repository, you can do so by clicking on the

# Save the model

In [23]:
gemma_causal_lm.save("gemma2_2b_en_kaggle_docs.keras")

# Conclusions



We demonstrated how to fine-tune a **Gemma 2** model using LoRA.   
We also created a class to run queries against the **Gemma 2** model and tested it with some examples from the training data as well as with some new, unseen questions.   
Finally, we saved the fine-tuned model as a Keras model.