<center><h1>Fine-tuning a Gemma model with Kaggle Docs data</h1></center>

<center><img src="https://res.infoq.com/news/2024/02/google-gemma-open-model/en/headerimage/generatedHeaderImage-1708977571481.jpg" width="400"></center>


# Introduction

This notebook will demonstrate three things:

1. How to fine-tune a Gemma model using LoRA
2. How to create a specialized class for querying the model about Kaggle features
3. Some results of querying the fine-tuned model about Kaggle Docs

This notebook builds largely on previous work. The sources are:

1. Gemma Model Card, Kaggle Models, https://www.kaggle.com/models/google/gemma
2. Kaggle QA with Gemma - KerasNLP Starter, Kaggle Code, https://www.kaggle.com/code/awsaf49/kaggle-qa-with-gemma-kerasnlp-starter (Version 11)  
3. Fine-tune Gemma models in Keras using LoRA, Kaggle Code, https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora (Version 1)  
4. Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, LoRA: Low-Rank Adaptation of Large Language Models, ArXiv, https://arxiv.org/pdf/2106.09685.pdf
5. Abheesht Sharma, Matthew Watson, Parameter-efficient fine-tuning of GPT-2 with LoRA, https://keras.io/examples/nlp/parameter_efficient_finetuning_of_gpt2_with_lora/
6. Keras 3 API documentation / KerasNLP / Models / Gemma, https://keras.io/api/keras_nlp/models/gemma/
7. Kaggle Docs, Kaggle Dataset, https://www.kaggle.com/datasets/awsaf49/kaggle-docs  
8. TPUs in Keras, Kaggle Docs, https://www.kaggle.com/docs/tpu  

**Let's go**!


# What is Gemma?


Gemma is a collection of lightweight open generative AI models intended mainly for developers and researchers. Created by Google DeepMind, the research lab that also developed Gemini, Gemma is available in 2B and 7B parameter sizes, each in a pretrained and an instruction-tuned version, as follows:


| Model                  | Parameters | Tuned version     | Description                                      | Recommended target platforms        |
|------------------------|------------|-------------------|--------------------------------------------------|-------------------------------------|
| `gemma_2b_en`          | 2.51B      | Pretrained        | 18-layer Gemma model (Gemma with 2B parameters)  | Mobile devices and laptops          |
| `gemma_instruct_2b_en` | 2.51B      | Instruction tuned | 18-layer Gemma model (Gemma with 2B parameters)  | Mobile devices and laptops          |
| `gemma_7b_en`          | 8.54B      | Pretrained        | 28-layer Gemma model (Gemma with 7B parameters)  | Desktop computers and small servers |
| `gemma_instruct_7b_en` | 8.54B      | Instruction tuned | 28-layer Gemma model (Gemma with 7B parameters)  | Desktop computers and small servers |




# What is LoRA?  

LoRA stands for Low-Rank Adaptation. It is a method for fine-tuning large language models (LLMs) that freezes the pretrained weights and injects trainable rank-decomposition matrices into the model's layers. This considerably reduces the number of trainable parameters during fine-tuning. According to the LoRA paper, compared with fully fine-tuning GPT-3 175B, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.
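
To make the idea concrete, here is a minimal pure-Python sketch of a LoRA-adapted weight matrix. The toy dimensions, the scaling factor `alpha`, and the helper names `matmul`, `add_scaled`, and `effective_weight` are illustrative assumptions and not the KerasNLP implementation. Following the LoRA paper, `B` starts at zero, so the adapted weight initially equals the frozen one.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Return w + scale * delta, element-wise."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]


random.seed(0)
d, k, r, alpha = 8, 8, 2, 4  # toy sizes (assumed); real layers are far larger

# Frozen pretrained weight W (d x k): never updated during fine-tuning.
W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]
# Trainable low-rank factors: B (d x r) starts at zero, A (r x k) is random.
B = [[0.0] * r for _ in range(d)]
A = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]


def effective_weight(W, B, A):
    """W + (alpha / r) * B @ A, the weight the adapted layer applies."""
    return add_scaled(W, matmul(B, A), alpha / r)


# At initialization B is zero, so the adapted weight equals the frozen one.
unchanged_at_init = effective_weight(W, B, A) == W

# Changing B (by hand here, standing in for a gradient step) shifts the weight,
# while W itself is left untouched.
B[0][0] = 1.0
W_adapted = effective_weight(W, B, A)
print(unchanged_at_init, W_adapted != W)
```

Only `A` and `B` would receive gradients during training, which is why the trainable parameter count depends on the rank `r` rather than on the full size of `W`.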

# How do we proceed?

To fine-tune with LoRA, we will follow these steps:

1. Install prerequisites
2. Load and process the data for fine-tuning
3. Initialize the code for Gemma causal language model (Gemma Causal LM)
4. Perform fine-tuning
5. Test the fine-tuned model with questions from the data used for fine-tuning and with additional questions

# Prerequisites


## Install packages

In [1]:
# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
!pip install -q -U keras-nlp
# Quote the requirement so the shell does not treat ">" as an output redirect.
!pip install -q -U "keras>=3"

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-decision-forests 1.8.1 requires wurlitzer, which is not installed.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-decision-forests 1.8.1 requires wurlitzer, which is not installed.
tensorflow 2.15.0 requires keras<2.16,>=2.15.0, but you have keras 3.2.0 which is incompatible.

## Import packages

In [2]:
import os
os.environ["KERAS_BACKEND"] = "jax" # you can also use tensorflow or torch
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00" # avoid memory fragmentation on JAX backend.
os.environ["JAX_PLATFORMS"] = ""
import keras
import keras_nlp

import numpy as np
import pandas as pd
from tqdm.notebook import tqdm
tqdm.pandas() # progress bar for pandas

import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, Markdown

2024-04-10 11:02:37.753587: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-10 11:02:37.753684: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-10 11:02:37.877928: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


## Configurations

In [3]:
class Config:
    seed = 42
    dataset_path = "/kaggle/input/kaggle-docs/questions_answers"
    preset = "gemma_2b_en" # name of pretrained Gemma
    sequence_length = 512 # max size of input sequence for training
    batch_size = 1 # number of samples per training batch
    epochs = 15 # number of epochs to train

Set the random seed for reproducibility.

In [4]:
keras.utils.set_random_seed(Config.seed)

# Load the data

In [5]:
df = pd.read_csv(f"{Config.dataset_path}/data.csv")
df.head()

| Unnamed: 0 | Question | Answer | Category |
|------------|----------|--------|----------|
| 0 | What are the different types of competitions a... | # Types of Competitions\n\nKaggle Competitions... | competition |
| 1 | What are the different competition formats on ... | There are handful of different formats competi... | competition |
| 2 | How to join a competition? | Before you start, navigate to the [Competition... | competition |
| 3 | How to form, manage, and disband teams in a co... | Everyone that competes in a Competition does s... | competition |
| 4 | How do I make a submission in a competition? | You will need to submit your model predictions... | competition |


Let's check the total number of rows in this dataset.

In [6]:
df.shape[0]

60

For convenience, we build each training example from the following question-answer template:

In [7]:
template = "\n\nCategory:\nkaggle-{Category}\n\nQuestion:\n{Question}\n\nAnswer:\n{Answer}"
df["prompt"] = df.apply(lambda row: template.format(Category=row.Category,
                                                             Question=row.Question,
                                                             Answer=row.Answer), axis=1)
data = df.prompt.tolist()

## Template utility function

In [8]:
def colorize_text(text):
    for word, color in zip(["Category", "Question", "Answer"], ["blue", "red", "green"]):
        text = text.replace(f"\n\n{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text
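
As a quick illustration of how the template and `colorize_text` work together, the sketch below formats one prompt with an empty answer and colorizes its section labels. The category and question values are placeholders chosen for the example; the template and function are repeated so the block runs on its own.

```python
template = "\n\nCategory:\nkaggle-{Category}\n\nQuestion:\n{Question}\n\nAnswer:\n{Answer}"


def colorize_text(text):
    # Wrap each section label in a colored, bold HTML font tag for Markdown display.
    for word, color in zip(["Category", "Question", "Answer"], ["blue", "red", "green"]):
        text = text.replace(f"\n\n{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text


# Placeholder row, used only to illustrate the formatting.
prompt = template.format(Category="notebook", Question="How to run a notebook?", Answer="")
colored = colorize_text(prompt)
print(colored)
```

Because the replacement matches on the leading blank line plus label, only the three section headers are colorized and the body text is left unchanged.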

# Specialized class to query Gemma


We define a specialized class to query Gemma.

## Initialize the code for Gemma Causal LM

In [9]:
# Load the preset named in Config ("gemma_2b_en") rather than hard-coding it.
gemma_causal_lm = keras_nlp.models.GemmaCausalLM.from_preset(Config.preset)
gemma_causal_lm.summary()

Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Kaggle notebook...
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Kaggle notebook...
Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Kaggle notebook...
Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Kaggle notebook...
Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Kaggle notebook...
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.


## Define the specialized class

In [10]:
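class GemmaQA:
    """Thin wrapper that fills the QA template and queries the Gemma model."""

    def __init__(self, max_length=512):
        # Maximum sequence length passed to generate().
        self.max_length = max_length
        self.prompt = template
        # Reuses the global model, so queries reflect any fine-tuning applied to it.
        self.gemma_causal_lm = gemma_causal_lm

    def query(self, category, question):
        # Leave the answer slot empty so that the model completes it.
        response = self.gemma_causal_lm.generate(
            self.prompt.format(
                Category=category,
                Question=question,
                Answer=""),
            max_length=self.max_length)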
        display(Markdown(colorize_text(response)))
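
The `query` method displays the full response, prompt included. If only the generated answer is needed, a small helper along the following lines could split it off. This is a sketch: `extract_answer` and `ANSWER_MARKER` are hypothetical names that are not part of the notebook, and the sample response is made up for the example.

```python
ANSWER_MARKER = "\n\nAnswer:\n"  # the last section label of the QA template


def extract_answer(response: str) -> str:
    """Return the text after the first answer marker, or the whole response if absent."""
    _, marker, answer = response.partition(ANSWER_MARKER)
    return answer.strip() if marker else response.strip()


# Made-up response in the template's format, for illustration only.
sample = (
    "\n\nCategory:\nkaggle-notebook"
    "\n\nQuestion:\nHow to run a notebook?"
    "\n\nAnswer:\nOpen the editor and click Run."
)
print(extract_answer(sample))
```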
        

## Gemma preprocessor


The Gemma preprocessor layer takes batches of strings and returns outputs in an `(x, y, sample_weight)` format, where each `y` label is the id of the next token in the `x` sequence.

The code below shows that, after preprocessing, the data has shape `(num_samples, sequence_length)`.

In [11]:
x, y, sample_weight = gemma_causal_lm.preprocessor(data[0:2])

In [12]:
print(x, y)

{'token_ids': Array([[   2,  109, 8606, ...,    0,    0,    0],
       [   2,  109, 8606, ...,    0,    0,    0]], dtype=int32), 'padding_mask': Array([[ True,  True,  True, ..., False, False, False],
       [ True,  True,  True, ..., False, False, False]], dtype=bool)} [[   109   8606 235292 ...      0      0      0]
 [   109   8606 235292 ...      0      0      0]]
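
The printed arrays show that `y` is `x` shifted left by one token. The following pure-Python sketch reproduces that relationship on a short toy sequence. It reflects my reading of the output above rather than the library source: `make_causal_lm_example` is a hypothetical helper, and the sequence reuses the first few token ids from the output followed by padding.

```python
def make_causal_lm_example(packed_ids, padding_mask):
    """Split one packed sequence into inputs, next-token labels, and label weights."""
    x = packed_ids[:-1]   # every token except the last is an input
    y = packed_ids[1:]    # the label at each position is the following token
    # Padded label positions get zero weight so they do not contribute to the loss.
    sample_weight = [1.0 if keep else 0.0 for keep in padding_mask[1:]]
    return x, y, sample_weight


# Toy sequence: leading ids taken from the output above, then padding (id 0).
packed_ids = [2, 109, 8606, 235292, 0, 0]
padding_mask = [True, True, True, True, False, False]

x_toy, y_toy, w_toy = make_causal_lm_example(packed_ids, padding_mask)
print(x_toy)
print(y_toy)
print(w_toy)
```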


# Perform fine-tuning with LoRA

## Enable LoRA for the model

The LoRA rank determines the number of trainable parameters: a larger rank results in more parameters to train.
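
As a rough guide to how the rank scales the trainable parameter count, the sketch below counts the parameters LoRA adds to a single weight matrix. The 2048 x 2048 matrix size is an assumed value for illustration, and `lora_trainable_params` is a hypothetical helper; the total for the whole model is what `summary()` reports.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


d_in = d_out = 2048  # assumed layer size, for illustration only
full = d_in * d_out  # parameters in the frozen weight matrix

for rank in (4, 8, 16):
    n = lora_trainable_params(d_in, d_out, rank)
    print(f"rank={rank:>2}: {n:,} trainable vs {full:,} frozen ({full // n}x fewer)")
```

The count grows linearly with the rank, so doubling the rank doubles the number of trainable parameters per adapted matrix.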

In [13]:
# Enable LoRA for the model and set the LoRA rank to 4.
gemma_causal_lm.backbone.enable_lora(rank=4)
gemma_causal_lm.summary()

## Run the training sequence

In [14]:
gemma_causal_lm.preprocessor.sequence_length = Config.sequence_length 

# Compile the model with loss, optimizer, and metric
gemma_causal_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=8e-5),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# Train model
gemma_causal_lm.fit(data, epochs=Config.epochs, batch_size=Config.batch_size)

Epoch 1/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 66s 736ms/step - loss: 1.7209 - sparse_categorical_accuracy: 0.5241
Epoch 2/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 44s 729ms/step - loss: 1.6869 - sparse_categorical_accuracy: 0.5313
Epoch 3/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 44s 729ms/step - loss: 1.6175 - sparse_categorical_accuracy: 0.5417
Epoch 4/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 44s 729ms/step - loss: 1.5770 - sparse_categorical_accuracy: 0.5509
Epoch 5/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 44s 729ms/step - loss: 1.5537 - sparse_categorical_accuracy: 0.5552
Epoch 6/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 83s 1s/step - loss: 1.5304 - sparse_categorical_accuracy: 0.5568
Epoch 7/15
60/60 ━━━━━━━━━━━━━━━━━━━━ 44s 729ms/step - loss: 1.5028 - sparse_categorical_accuracy: 0.5630
Epoch 8/15
60/60 (output truncated)

<keras.src.callbacks.history.History at 0x7eca14080970>
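
To clarify what the loss in the log measures, here is a simplified pure-Python sketch of sparse categorical cross-entropy computed from logits and averaged over the non-padded positions. The helper names and the toy logits are assumptions made for this example, and the exact normalization Keras applies to weighted losses may differ from this plain weighted mean.

```python
import math


def cross_entropy_from_logits(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def masked_mean_loss(batch_logits, targets, weights):
    """Weighted mean of per-position losses; zero-weight (padded) positions are ignored."""
    total = sum(
        w * cross_entropy_from_logits(l, t)
        for l, t, w in zip(batch_logits, targets, weights)
    )
    denom = sum(weights)
    return total / denom if denom else 0.0


# Toy sequence of three positions over a vocabulary of three tokens (assumed values).
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [5.0, 1.0, 0.0]]
targets = [0, 1, 2]
weights = [1.0, 1.0, 0.0]  # the last position is padding

loss = masked_mean_loss(logits, targets, weights)
print(round(loss, 4))
```

Passing `from_logits=True` tells the loss that the model outputs raw scores, so the softmax is applied inside the loss, as in `cross_entropy_from_logits` here.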

# Test the fine-tuned model

In [15]:
gemma_qa = GemmaQA()

## Sample 1

In [16]:
row = df.iloc[0]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-competition

**<font color='red'>Question:</font>**
What are the different types of competitions available on Kaggle?

**<font color='green'>Answer:</font>**
## Types of Competitions

There are two main types of competitions on Kaggle:

- **Data science challenges**: These are typically data science challenges where the goal is to build a model that outperforms a provided baseline. Examples include building a model to predict the likelihood of a fraudulent transaction or building a model to predict the likelihood of a heart attack.

- **Classification challenges**: These are typically classification challenges where the goal is to build a model that outperforms a provided baseline. Examples include building a model to classify emails as spam or ham or building a model to classify images as cats or dogs.

## How Are the Kaggle Competitions Ranked?

The rankings on Kaggle are based on a combination of two metrics:

- **Submissions**: The total number of submissions made by a user.

- **Leaderboard Rank**: The rank of a user on the leaderboard of a given competition.

The leaderboard rank is calculated by taking the median rank of the top 100 users on the leaderboard. The median rank is calculated by first sorting the leaderboard by leaderboard rank and then taking the middle value of the leaderboard.

The combination of these two metrics is designed to encourage users to make high-quality submissions and to reward users who are consistently at the top of the leaderboard.

## Sample 2

In [17]:
row = df.iloc[15]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-tpu

**<font color='red'>Question:</font>**
How to load and save model on TPU?

**<font color='green'>Answer:</font>**
## Overview

TPU models are saved in SavedModel format. To load a model from SavedModel, you can use the `tpu.Model.from_saved_model` method.

To save a model, you can use the `tpu.Model.save_main_session()` method.

## Example

```python
# Load a model from SavedModel
model = tpu.Model.from_saved_model(tpu.saved_model_fn)
```

## TPU Model Format

TPU models are saved in SavedModel format. To load a model from SavedModel, you can use the tpu.Model.from_saved_model method.

To save a model, you can use the tpu.Model.save_main_session() method.

## TPU Model Format Overview

The SavedModel format is a model archive that contains all the information needed to run a model on TPU. It is a zip archive with the following structure:

- `model.pb` is the TensorFlow graph that describes the model.
- `checkpoint.index` is a binary file that contains the metadata for the checkpoints in the model.
- `config.json` is a JSON file that contains the configuration for the model.
- `variables.add_file_io_ops.json` is a JSON file that contains the metadata for the file I/O ops in the model.
- `variables.add_file_io_ops.pb` is a binary file that contains the code for the file I/O ops in the model.
- `variables.add_op_for_each_checkpoint.index` is a binary file that contains the metadata for the checkpoints in the model.
- `variables.add_op_for_each_checkpoint.json` is a JSON file that contains the metadata for the checkpoints in the model.
- `variables.add_op_for_each_checkpoint.pb` is a binary file that contains the code for the checkpoints in the model.
- `variables.add_op_for_each_checkpoint.summary.tsv` is a tab-separated-values file that contains the summary of the checkpoints in the model.
- `variables.add_op_for_each_checkpoint.summary.tsv` is a tab-separated-values

## Sample 3

In [18]:
row = df.iloc[25]
gemma_qa.query(row.Category,row.Question)



**<font color='blue'>Category:</font>**
kaggle-noteboook

**<font color='red'>Question:</font>**
What are the different types of notebooks available on Kaggle?

**<font color='green'>Answer:</font>**
## Notebooks

Kaggle Notebooks are interactive markdown documents that allow users to run code and see the results. Notebooks are a key part of the Kaggle experience, and are used by the community to share data science workflows, discuss techniques, and explore new ideas.

Kaggle Notebooks are hosted on nbviewer.com. You can access them from any web browser, or from the Kaggle mobile app.

You can create a Notebook from the Kaggle website or from the command line using the kaggle notebook command.

## Versions

Each Notebook has multiple versions. Each version is a snapshot of the Notebook at a specific point in time. You can view and compare the versions of a Notebook to see how it has changed over time.

You can access the version history of a Notebook from the Notebook editor.

You can create a new version of a Notebook from the Notebook editor.

You can delete a Notebook version from the Notebook editor.

## Versions and Sharing

Each Notebook version has a URL that looks like this:

```
https://www.kaggle.com/kylegordon/kaggle-noteboook-versions?version=1666666
```

You can share a Notebook version URL with others, and they will be prompted to view the Notebook as a guest.

You can also create a Notebook version from a URL that someone else has shared with you.

## Versions and Permissions

Each Notebook version has a URL that looks like this:

```
https://www.kaggle.com/kylegordon/kaggle-noteboook-versions?version=1666666
```

You can view and compare the versions of a Notebook to see how it has changed over time.

You can access the version history of a Notebook from the Notebook editor.

You can create a new version of a Notebook from the Notebook editor.

You can delete a Notebook version from the Notebook editor.

You can also delete a Notebook version from the Versions tab of the Notebook editor.

## Versions and Privacy

Each Notebook version has a URL that looks like this:

```
https://www.kaggle.com/kylegordon/kaggle-noteboook-versions?version=1666666
```

You can share a Notebook version URL with others

## Unseen questions

In [19]:
category = "notebook"
question = "How to run a notebook?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-notebook

**<font color='red'>Question:</font>**
How to run a notebook?

**<font color='green'>Answer:</font>**
## Installation

You can run notebooks on Kaggle without installing anything, but if you want to make use of the community-built add-ons marketplace or you have a team of users that you want to collaborate with on a notebook, then you should consider creating a notebook on Kaggle.

To install Kaggle and Notebook features in your own conda environment, run:

```
kaggle datasets login-or-create <USERNAME>
kaggle kernels --help
```

For a full list of command line arguments, run `kaggle kernels help <command` from the Kaggle command line.

## Run Locally

If you are running locally, you can run the notebook locally by running the following command from the notebook directory:

```python
!kaggle run --kernels /kaggle/notebooks/tutorial-saving-and-loading-models: Cells 1-X
```

The command will run the notebook from the first cell to the most recently executed cell. If you want to run a specific cell, you can specify the cell number after the `--kernels` flag.

If you want to run the notebook from scratch, you can run the notebook locally by running the following command from the notebook directory:

```python
kaggle run --kernels <USERNAME>/notebooks/tutorial-saving-and-loading-models:latest
```

The command will run the notebook from the first cell to the most recently executed cell.

If you want to run a specific cell, you can specify the cell number after the `--kernels` flag.

If you want to run the notebook from scratch, you can run the notebook locally by running the following command from the notebook directory:

```python
kaggle run --kernels <USERNAME>/notebooks/tutorial-saving-and-loading-models:latest
```

## Run on Kaggle

If you are running on Kaggle, you can run the notebook on Kaggle by clicking the “Run Notebook” button in the top right corner of the notebook editor.

If you are running locally, you can upload the notebook to Kaggle and run the notebook locally by running the following command from the notebook directory:

```python
kaggle runnotebooks <USERNAME>/notebooks/tutorial-saving-and-loading-models:latest
```

If you are running on Kaggle, you can run the notebook

In [20]:
category = "discussions"
question = "How to create a discussion topic?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-discussions

**<font color='red'>Question:</font>**
How to create a discussion topic?

**<font color='green'>Answer:</font>**
## Creating a Discussion

To create a new discussion topic, click on the "Discussions" tab and click on the "New Discussion" button.

You can create a new discussion by either uploading a dataset or by pasting a URL. If you upload a dataset, the discussion will be associated with the dataset. If you paste a URL, the discussion will be associated with the notebook that URL points to.

If you upload a dataset, the discussion will be associated with the dataset. If you paste a URL, the discussion will be associated with the notebook that URL points to.

If you are creating a discussion associated with a dataset, the discussion will be public by default. If you are creating a discussion associated with a notebook, the discussion will be private by default.

If you want to make the discussion public, click on the "Privacy" dropdown and select "Public".

If you want to make the discussion private, click on the "Privacy" dropdown and select "Private".

If you want to make the discussion a community discussion, click on the "Privacy" dropdown and select "Community".

If you want to make the discussion a team discussion, click on the "Team" dropdown and select "Team".

If you want to make the discussion a user-to-user conversation, click on the "Conversation" dropdown and select "User to User".

If you want to make the discussion a notebook discussion, click on the "Notebook" dropdown and select "Notebook".

If you want to create a discussion associated with a dataset, click on the "Dataset" dropdown and select the dataset you want to associate the discussion with.

If you want to create a discussion associated with a notebook, click on the "Notebook" dropdown and select the notebook you want to associate the discussion with.

If you are creating a discussion associated with a dataset, you will be presented with a list of all the datasets associated with the selected folder.

If you are creating a discussion associated with a notebook, you will be presented with a list of all the notebooks associated with the selected folder.

If you are creating a discussion associated with a dataset, you will be presented with a list of all the users who have access to the dataset.

If you are creating a discussion associated with a notebook, you will be presented with a list of all the users who have access to the notebook.

If you are creating

In [21]:
category = "competitions"
question = "What is a code competition?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-competitions

**<font color='red'>Question:</font>**
What is a code competition?

**<font color='green'>Answer:</font>**
Code competitions are a type of competition where participants are given a dataset and a set of tools to build machine learning models. The goal is to build the best model for a given task. Code competitions are a popular format for Kaggle Datasets because they encourage participants to use the tools and techniques available on Kaggle.

Code competitions are a great way to learn new tools and techniques, and to see how they can be applied to a real-world problem. They also provide a structured way to evaluate and improve your model, as you'll be competing against other participants who are using the same dataset and tools.

There are many benefits to participating in code competitions. First, they're a great way to learn how to use a dataset and a task to drive model development. Second, they're a great way to improve your model's performance. Third, they're a great way to meet other data scientists and machine learning enthusiasts. Fourth, they're a lot of fun!

## Types of Code Competitions

There are two main types of code competitions on Kaggle: public and private.

Public competitions are open to anyone and everyone. They're a great way to see what other people are working on in the community, and to learn from others' experiences. Public competitions are a great way to get exposure and recognition for your work.

Private competitions are invitation-only and typically invitees are part of a larger community or organization. They're a great way to collaborate with others who share your interest in the same topic. Private competitions are a great way to get feedback on your work from experts in the field.

## How to Participate in a Code Competition

To participate in a code competition, you'll need to create a competition notebook. This notebook will contain all of the code you use to build your model. You can then submit your notebook to the competition.

The first step is to find a competition that interests you. You can browse all of the current competitions on Kaggle Datasets by visiting the [code competitions page](https://www.kaggle.com/competitions/code/overview).

Once you've found a competition you'd like to enter, click on the "Participate" button. This will take you to a page where you can sign up for the competition.

The next step is to familiarize yourself with the dataset. This is the collection

In [22]:
category = "datasets"
question = "What are the steps to create a Kaggle dataset?"
gemma_qa.query(category,question)



**<font color='blue'>Category:</font>**
kaggle-datasets

**<font color='red'>Question:</font>**
What are the steps to create a Kaggle dataset?

**<font color='green'>Answer:</font>**
## Introduction

Kaggle Datasets is a repository of public datasets that are created by members of the Kaggle community. Datasets are used by the Kaggle community for a variety of purposes, including training machine learning models, evaluating models, and sharing data.

Kaggle Datasets is a great resource for anyone working with data.

## Getting Started

To get started creating a new dataset, navigate to the "Create Dataset" button in the left-hand navigation bar.

You can also access the "Create Dataset" menu from your profile dropdown.

You will be prompted to enter a name for your dataset, a description, and a license.

## Dataset Metadata

The metadata is the information about your dataset that will be displayed in the search results.

The metadata is what people will see when they search for your dataset.

## Metadata Fields

The following fields are required:

- **Name**: The name of your dataset. This is what people will see when they search for your dataset.
- **Description**: A short description of your dataset. This description will be displayed in the search results.
- **Tags**: Tags are keywords that describe your dataset. You can add as many as you want.
- **License**: The license of your dataset.
- **Data files**: The files that make up your dataset.
- **Acknowledgements**: A list of people or organizations that contributed to the creation of your dataset.
- **Metadata fields**: A list of the fields in your dataset.
- **Schema**: A diagram of the structure of your dataset.
- **Acknowledgements**: A list of people or organizations that contributed to the creation of your dataset.
- **Metadata fields**: A list of the fields in your dataset.
- **Schema**: A diagram of the structure of your dataset.
- **Data files**: The files that make up your dataset.
- **Versions**: The versions of your dataset that are currently published.

## Metadata Best Practices

- **Name**: Make sure the name of your dataset is descriptive and easy to remember.
- **Description**: Write a concise description of your dataset that includes the purpose, format, and size of the data.
- **Tags**: Use keywords that are relevant to your dataset.
- **License**: Choose a license that is appropriate for your dataset.
- **Data files**: Upload a copy

# Conclusions



We demonstrated how to fine-tune a Gemma model using LoRA.   
We also created a class to query the Gemma model and tested it both with examples from the training data and with new, unseen questions.