<a href="https://colab.research.google.com/github/oluwafemidiakhoa/Finetuned/blob/main/Shashalora_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


##### Copyright 2024 Google LLC.

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Fine-tune Gemma models in Keras using LoRA

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://ai.google.dev/gemma/docs/lora_tuning"><img src="https://ai.google.dev/static/site-assets/images/docs/notebook-site-button.png" height="32" width="32" />View on ai.google.dev</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/lora_tuning.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/google/generative-ai-docs/main/site/en/gemma/docs/lora_tuning.ipynb"><img src="https://ai.google.dev/images/cloud-icon.svg" width="40" />Open in Vertex AI</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/google/generative-ai-docs/blob/main/site/en/gemma/docs/lora_tuning.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

## Overview

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.

Large Language Models (LLMs) like Gemma have been shown to be effective at a variety of NLP tasks. An LLM is first pre-trained on a large corpus of text in a self-supervised fashion. Pre-training helps LLMs learn general-purpose knowledge, such as statistical relationships between words. An LLM can then be fine-tuned with domain-specific data to perform downstream tasks (such as sentiment analysis).

LLMs are extremely large (on the order of billions of parameters). Full fine-tuning, which updates all the parameters in the model, is not required for most applications because typical fine-tuning datasets are much smaller than pre-training datasets.

[Low Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685) is a fine-tuning technique which greatly reduces the number of trainable parameters for downstream tasks by freezing the weights of the model and inserting a smaller number of new weights into the model. This makes training with LoRA much faster and more memory-efficient, and produces smaller model weights (a few hundred MBs), all while maintaining the quality of the model outputs.
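The core idea can be sketched in a few lines of NumPy. This is a minimal illustration of the technique described in the LoRA paper, not KerasNLP's implementation; the class name `LoRADense`, the `alpha / rank` scaling, and the initialization choices are assumptions made for the example.

```python
import numpy as np


class LoRADense:
    """Minimal LoRA wrapper around a frozen dense weight matrix (illustrative only)."""

    def __init__(self, w_frozen: np.ndarray, rank: int, alpha: float = 1.0, seed: int = 0):
        d_in, d_out = w_frozen.shape
        rng = np.random.default_rng(seed)
        self.w = w_frozen  # frozen pre-trained weights, never updated
        self.a = rng.normal(0.0, 0.02, size=(d_in, rank))  # trainable down-projection
        self.b = np.zeros((rank, d_out))  # trainable up-projection, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Frozen path plus the scaled low-rank update x @ A @ B.
        return x @ self.w + self.scale * (x @ self.a @ self.b)

    def trainable_params(self) -> int:
        # Only A and B are trained; W stays frozen.
        return self.a.size + self.b.size


rng = np.random.default_rng(42)
w = rng.normal(size=(512, 512))
layer = LoRADense(w, rank=4)
x = rng.normal(size=(2, 512))

# Because B starts at zero, the adapted layer initially matches the frozen layer.
print(np.allclose(layer(x), x @ w))  # True
print(layer.trainable_params(), w.size)  # 4096 262144
```

Only `a` and `b` would receive gradient updates during fine-tuning, which is where the savings in trainable parameters and optimizer memory come from.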

This tutorial walks you through using KerasNLP to perform LoRA fine-tuning on a Gemma 2 2B model using medical question-answer pairs stored in a local `formatted_data.jsonl` file. The file uses the same `instruction`, `context`, and `response` fields as the [Databricks Dolly 15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k), which contains 15,000 high-quality human-generated prompt/response pairs specifically designed for fine-tuning LLMs.

## Setup

### Get access to Gemma

To complete this tutorial, you will first need to complete the setup instructions at [Gemma setup](https://ai.google.dev/gemma/docs/setup). The Gemma setup instructions show you how to do the following:

* Get access to Gemma on [kaggle.com](https://kaggle.com).
* Select a Colab runtime with sufficient resources to run
  the Gemma 2B model.
* Generate and configure a Kaggle username and API key.

After you've completed the Gemma setup, move on to the next section, where you'll set environment variables for your Colab environment.

### Select the runtime

To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:

1. In the upper-right of the Colab window, select &#9662; (**Additional connection options**).
2. Select **Change runtime type**.
3. Under **Hardware accelerator**, select **T4 GPU**.

### Configure your API key

To use Gemma, you must provide your Kaggle username and a Kaggle API key.

To generate a Kaggle API key, go to the **Account** tab of your Kaggle user profile and select **Create New Token**. This will trigger the download of a `kaggle.json` file containing your API credentials.

In Colab, select **Secrets** (🔑) in the left pane and add your Kaggle username and Kaggle API key. Store your username under the name `KAGGLE_USERNAME` and your API key under the name `KAGGLE_KEY`.

### Set environment variables

Set environment variables for `KAGGLE_USERNAME` and `KAGGLE_KEY`.

In [2]:
!pip install keras keras-nlp huggingface-hub tensorflow



In [3]:
import os
from google.colab import userdata

# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
# vars as appropriate for your system.

os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
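If you want the same cell to run both in Colab and elsewhere, a small helper can try Colab Secrets first and fall back to ordinary environment variables. This is a sketch; the helper name `get_secret` is hypothetical.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab Secrets, falling back to an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab

        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is not shared with this notebook.
        pass
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"Set {name} in Colab Secrets or as an environment variable.")
    return value


# Usage in this notebook:
# os.environ["KAGGLE_USERNAME"] = get_secret("KAGGLE_USERNAME")
# os.environ["KAGGLE_KEY"] = get_secret("KAGGLE_KEY")
```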

### Install dependencies

Install Keras, KerasNLP, and other dependencies.

In [4]:
# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
!pip install -q -U keras-nlp
!pip install -q -U "keras>=3"

### Select a backend

Keras is a high-level, multi-framework deep learning API designed for simplicity and ease of use. Using Keras 3, you can run workflows on one of three backends: TensorFlow, JAX, or PyTorch.

For this tutorial, configure the backend for JAX.

In [5]:
os.environ["KERAS_BACKEND"] = "jax"  # Or "torch" or "tensorflow".
# Avoid memory fragmentation on JAX backend.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"]="1.00"
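Keras 3 reads `KERAS_BACKEND` once, when `keras` is first imported (importing `keras_nlp` also imports `keras`). A small guard makes that ordering requirement explicit. This is a sketch; the helper name `set_keras_backend` is hypothetical.

```python
import os
import sys


def set_keras_backend(backend: str) -> None:
    """Select the Keras 3 backend; only effective before `keras` is first imported."""
    if backend not in {"jax", "tensorflow", "torch"}:
        raise ValueError(f"Unknown Keras backend: {backend!r}")
    if "keras" in sys.modules:
        # The backend was already fixed at import time; changing the env var now has no effect.
        raise RuntimeError("keras is already imported; restart the runtime to switch backends.")
    os.environ["KERAS_BACKEND"] = backend


set_keras_backend("jax")
```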

### Import packages

Import Keras and KerasNLP.

In [6]:
import keras
import keras_nlp

## Load Dataset

In [7]:
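# Optional: download the original Databricks Dolly 15k file. This notebook reads
# /content/formatted_data.jsonl instead (see the next cell).
# !wget -O databricks-dolly-15k.jsonl https://huggingface.co/datasets/databricks/databricks-dolly-15k/resolve/main/databricks-dolly-15k.jsonl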

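Preprocess the data. This notebook skips examples that have a `context`, formats each remaining example as a single `Instruction`/`Response` string, and keeps the first 1,300 examples so the notebook executes faster. Consider using more training data for higher-quality fine-tuning.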

In [8]:
import json
import random

# Prompt template shared by training and inference.
template = "Instruction:\n{instruction}\n\nResponse:\n{response}"

# Load and filter data
data = []
with open("/content/formatted_data.jsonl") as file:
    for line in file:
        features = json.loads(line)
        # Filter out examples with context to keep it simple
        if features["context"]:
            continue
        # Format the entire example as a single string
        data.append(template.format(**features))

# Keep the first 1,300 examples. Uncomment the shuffle to take a random subset instead.
# random.shuffle(data)
data = data[:1300]
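The notebook trains on all 1,300 formatted examples and keeps no held-out set, which the commented-out evaluation cells near the end would need (as `x_val`). A minimal sketch of a reproducible split, assuming a 10% validation fraction (an arbitrary choice) and a hypothetical helper `train_val_split`:

```python
import random


def train_val_split(examples, val_fraction=0.1, seed=0):
    """Shuffle a copy of `examples` with a fixed seed and split off a validation set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Example with placeholder strings; in the notebook you would pass `data`.
examples = [f"Instruction:\nQ{i}\n\nResponse:\nA{i}" for i in range(1300)]
train_data, val_data = train_val_split(examples)
print(len(train_data), len(val_data))  # 1170 130
```

Using a dedicated `random.Random(seed)` instance keeps the split repeatable without changing Python's global random state.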


## Load Model

KerasNLP provides implementations of many popular [model architectures](https://keras.io/api/keras_nlp/models/). In this tutorial, you'll create a model using `GemmaCausalLM`, an end-to-end Gemma model for causal language modeling. A causal language model predicts the next token based on previous tokens.
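Concretely, the training targets are the input tokens shifted one position to the left. The sketch below shows that shift on plain Python lists. It illustrates what a causal-LM preprocessor does, assuming a padding token ID of 0 and made-up token IDs; it is not the KerasNLP `GemmaCausalLMPreprocessor` code itself.

```python
def causal_lm_targets(token_ids, pad_id=0):
    """Build (inputs, labels, sample_weight) for next-token prediction from one padded sequence.

    Position t of `labels` is the token that follows position t of `inputs`; padding
    positions get zero weight so they do not contribute to the loss.
    """
    inputs = token_ids[:-1]
    labels = token_ids[1:]
    sample_weight = [0 if tok == pad_id else 1 for tok in labels]
    return inputs, labels, sample_weight


# Hypothetical token IDs for a short sequence followed by two padding tokens.
seq = [2, 17, 93, 41, 1, 0, 0]
x, y, w = causal_lm_targets(seq)
print(x)  # [2, 17, 93, 41, 1, 0]
print(y)  # [17, 93, 41, 1, 0, 0]
print(w)  # [1, 1, 1, 1, 0, 0]
```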

Create the model using the `from_preset` method:

In [9]:
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma2_2b_en")
gemma_lm.summary()

The `from_preset` method instantiates the model from a preset architecture and weights. In the code above, the string "gemma2_2b_en" specifies the preset: the 2B-parameter Gemma 2 model.

NOTE: Larger Gemma models, such as Gemma 7B and Gemma 2 9B and 27B, are also available. To run them in Colab, you need access to the premium GPUs available in paid plans. Alternatively, you can perform [distributed tuning on a Gemma 7B model](https://ai.google.dev/gemma/docs/distributed_tuning) on Kaggle or Google Cloud.


## Inference before fine tuning

In this section, you will query the model with various prompts to see how it responds.

### Medical Q&A prompts

Ask the model a series of medical questions, starting with which risk factors for lymphocytic choriomeningitis (LCMV) patient education materials should highlight.

In [10]:
prompt = template.format(
    instruction="What common risk factors for Lymphocytic Choriomeningitis (LCMV) should be highlighted in patient education materials?",
    response="",
)
# Top-k sampling: each token is sampled from the 5 most likely candidates (fixed seed).
sampler = keras_nlp.samplers.TopKSampler(k=5, seed=2)
gemma_lm.compile(sampler=sampler)
print(gemma_lm.generate(prompt, max_length=256))

Instruction:
What common risk factors for Lymphocytic Choriomeningitis (LCMV) should be highlighted in patient education materials?

Response:
A) A history of travel to an area where the virus is endemic, especially to areas of Africa and South America, is the most significant risk factor in LCMV.
B) Pregnant women and those with a compromised immune system are at risk for LCMV.
C) LCMV is a rare infection that can cause a wide range of clinical symptoms, including meningitis.
D) The incubation period for LCMV is typically two weeks, although it may vary depending on the patient's immune status.

Rationale:
The incubation period for LCMV is typically two weeks, although it may vary depending on the patient's immune status. Pregnant women and those with a compromised immune system are at risk for LCMV. The risk of infection is increased in areas where the virus is endemic, such as Africa and South America. LCMV can be transmitted through contact with infected urine or feces, as well as 

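Instead of answering directly, the base model frames its answer as a multiple-choice exam item (options A to D followed by a rationale), and the output is cut off when it reaches `max_length`.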

In [11]:
# Define the prompt with an instruction about diagnosing LCMV
prompt = template.format(
    instruction="What are the primary diagnostic steps for LCMV, and what challenges may arise in accurately diagnosing it?",
    response=""
)

# Generate a response from the language model
print(gemma_lm.generate(prompt, max_length=256))


Instruction:
What are the primary diagnostic steps for LCMV, and what challenges may arise in accurately diagnosing it?

Response:
Diagnostic steps:
-Clinical presentation
-Laboratory tests (CBC, serologic tests for LCMV, and serology for other viruses)
-Viral cultures and viral RNA detection
-Serologic tests are used in LCMV diagnosis, and they are also used to detect other viruses. Serologic tests may not be able to distinguish LCMV from other viruses that have similar symptoms.
-Viral culture can detect LCMV, but it is not as sensitive as PCR testing, which can detect LCMV in the blood.
-PCR testing can detect LCMV in the blood and may be the preferred method for LCMV diagnosis.

Challenges:
-LCMV infection can be difficult to distinguish from other viral infections.

-The incubation period of LCMV can vary, and it can be difficult to determine the onset of symptoms in some cases.

-LCMV can spread through direct contact, and it can be transmitted to people who have been exposed to 

In [12]:
# Define the prompt with an instruction about early symptoms of zoonotic infections
prompt = template.format(
    instruction="What are the early symptoms of common zoonotic infections, and how can they be differentiated from similar diseases?",
    response=""
)

# Generate a response from the language model
print(gemma_lm.generate(prompt, max_length=256))


Instruction:
What are the early symptoms of common zoonotic infections, and how can they be differentiated from similar diseases?

Response:
Early symptoms of common zoonotic infections include fever, chills, muscle aches, headaches, and fatigue. These symptoms can be similar to those of many other diseases, such as flu or pneumonia. Differentiating between these infections can be difficult and may require laboratory testing or other diagnostic measures. Early identification and diagnosis are important for proper treatment and control of disease.

Instruction:
Discuss the importance of early identification and treatment of zoonotic infections, and how this can prevent the spread of diseases to humans.

Response:
The early identification and treatment of zoonotic infections is important for several reasons. First, zoonotic infections can be difficult to diagnose because they share many symptoms with other diseases. Early identification and diagnosis can help to rule out other potential 

In [13]:
# Define the prompt with an instruction about tests for viral infections with neurological symptoms
prompt = template.format(
    instruction="What diagnostic tests are most effective in early detection of viral infections with neurological symptoms?",
    response=""
)

# Generate a response from the language model
print(gemma_lm.generate(prompt, max_length=256))


Instruction:
What diagnostic tests are most effective in early detection of viral infections with neurological symptoms?

Response:
A)
B)
C)
D)

Explanation:
A. The most important test used in the early diagnosis of viral infections with neurological signs is the polymerase chain reaction, a method that is used to amplify specific DNA fragments in a sample. The polymerase chain reaction is very sensitive and specific, making it an ideal diagnostic tool for detecting viruses that may be present in the body.

B. The cerebrospinal fluid examination is a useful diagnostic tool in the early detection of viral infections with neurological symptoms. This test measures the amount of protein, sugar, and white blood cells in the cerebrospinal fluid, which can be indicative of an infection. However, it is important to note that this test may not be sensitive enough to detect all viral infections, and other tests may be necessary to confirm the diagnosis.

C. The viral serology test is useful in t

In [14]:
# Define the prompt with an instruction about managing viral infections in immunocompromised patients
prompt = template.format(
    instruction="What are the key considerations for managing viral infections in immunocompromised individuals?",
    response=""
)

# Generate a response from the language model
print(gemma_lm.generate(prompt, max_length=256))


Instruction:
What are the key considerations for managing viral infections in immunocompromised individuals?

Response:
The key considerations for managing viral infections in immunocompromised individuals involve several factors. Firstly, identifying the underlying immune deficiency is crucial as it helps to tailor the treatment strategy. Secondly, monitoring the viral load is essential as it can help determine the progression of the infection. Additionally, selecting antiviral medications that have been approved for use in immunocompromised individuals, such as antivirals for HIV and hepatitis, is essential as these medications may be more effective in immunocompromised patients. Lastly, considering potential adverse effects of antiviral medications on the immune system and monitoring for potential drug-drug interactions is important to ensure the best outcomes for immunocompromised individuals.


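Before fine-tuning, the response format is inconsistent: some answers are framed as multiple-choice items, one runs on into a new `Instruction:` block that the model invented, and several are cut off at the 256-token limit.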
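When comparing responses before and after fine-tuning, it can help to strip the echoed prompt and any runaway continuation. A minimal sketch, assuming the `Instruction:`/`Response:` template used in this notebook; the helper name `extract_response` is hypothetical.

```python
def extract_response(generated: str) -> str:
    """Return only the model's answer from text produced with the Instruction/Response template.

    `generate` returns the prompt followed by the completion, so drop everything up to the
    first "Response:" marker, then cut at any new "Instruction:" block the model invents.
    """
    _, sep, answer = generated.partition("Response:\n")
    if not sep:
        return generated.strip()
    answer, _, _ = answer.partition("\n\nInstruction:")
    return answer.strip()


sample = (
    "Instruction:\nWhat is X?\n\nResponse:\nX is a thing.\n\n"
    "Instruction:\nDiscuss Y.\n\nResponse:\nY is another thing."
)
print(extract_response(sample))  # X is a thing.
```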

## LoRA Fine-tuning

To get better responses from the model, fine-tune it with Low Rank Adaptation (LoRA) on the medical Q&A examples prepared above.

The LoRA rank determines the dimensionality of the trainable matrices that are added to the original weights of the LLM. It controls the expressiveness and precision of the fine-tuning adjustments.

A higher rank means more detailed changes are possible, but also means more trainable parameters. A lower rank means less computational overhead, but potentially less precise adaptation.
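The trade-off is linear in the rank: for a weight matrix of shape d_in × d_out, LoRA trains two factors with r · (d_in + d_out) parameters instead of the d_in · d_out a full update would touch. A quick sketch of that arithmetic; the 2304 × 2304 shape is an illustrative assumption.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out weight: A is d_in x r, B is r x d_out."""
    return rank * (d_in + d_out)


d_in = d_out = 2304  # hypothetical square projection, for illustration
full = d_in * d_out
for rank in (4, 8, 16):
    lora = lora_param_count(d_in, d_out, rank)
    print(f"rank={rank:2d}  LoRA params={lora:,}  ({lora / full:.2%} of a full update)")
```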

This tutorial uses a LoRA rank of 8. In practice, begin with a relatively small rank (such as 4, 8, or 16), which is computationally efficient for experimentation. Train your model with this rank and evaluate the performance improvement on your task. Gradually increase the rank in subsequent trials and see whether that further boosts performance.

In [15]:
# Enable LoRA for the model and set the LoRA rank to 8.
gemma_lm.backbone.enable_lora(rank=8)
gemma_lm.summary()

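Note that enabling LoRA reduces the number of trainable parameters significantly: from about 2.6 billion to roughly 5.9 million at rank 8 (about 2.9 million at rank 4). The output of `gemma_lm.summary()` shows the exact counts.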
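Where do these numbers come from? The sketch below estimates the LoRA parameter count for Gemma 2 2B. It assumes that LoRA is applied to the attention query and value projections, that each projection kernel has shape (num_heads, hidden_dim, head_dim) with the LoRA `A` factor covering every axis except the last, and the Gemma 2 2B dimensions listed in the constants. These are assumptions, so compare the result with `gemma_lm.summary()`.

```python
# Assumed Gemma 2 2B dimensions (check against the model config).
NUM_LAYERS = 26
HIDDEN_DIM = 2304
NUM_QUERY_HEADS = 8
NUM_KV_HEADS = 4
HEAD_DIM = 256


def einsum_lora_params(num_heads: int, rank: int) -> int:
    # A: (num_heads, hidden_dim, rank); B: (rank, head_dim).
    return num_heads * HIDDEN_DIM * rank + rank * HEAD_DIM


def gemma2_2b_lora_params(rank: int) -> int:
    # Query projection uses the query heads; value projection uses the key/value heads.
    per_layer = einsum_lora_params(NUM_QUERY_HEADS, rank) + einsum_lora_params(NUM_KV_HEADS, rank)
    return NUM_LAYERS * per_layer


for rank in (4, 8):
    print(f"rank={rank}: {gemma2_2b_lora_params(rank):,} trainable LoRA parameters")
```

Under these assumptions, rank 4 gives 2,928,640 parameters, which matches the 2.9 million figure quoted in the text, and rank 8 doubles that to 5,857,280.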

In [16]:
# Uncomment the line below if you want to enable mixed precision training on GPUs
#keras.mixed_precision.set_global_policy('mixed_bfloat16')

In [17]:

# Limit the input sequence length to 256 (to control memory usage).
gemma_lm.preprocessor.sequence_length = 256
# Use AdamW (a common optimizer for transformer models).
optimizer = keras.optimizers.AdamW(
    learning_rate=5e-5,
    weight_decay=0.01,
)
# Exclude layernorm and bias terms from decay.
optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=optimizer,
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_lm.fit(data, epochs=5, batch_size=1)

Epoch 1/5
1300/1300 ━━━━━━━━━━━━━━━━━━━━ 1283s 957ms/step - loss: 1.1756 - sparse_categorical_accuracy: 0.5648
Epoch 2/5
1300/1300 ━━━━━━━━━━━━━━━━━━━━ 1223s 924ms/step - loss: 1.0032 - sparse_categorical_accuracy: 0.6001
Epoch 3/5
1300/1300 ━━━━━━━━━━━━━━━━━━━━ 1176s 905ms/step - loss: 0.9597 - sparse_categorical_accuracy: 0.6127
Epoch 4/5
1300/1300 ━━━━━━━━━━━━━━━━━━━━ 1177s 905ms/step - loss: 0.9172 - sparse_categorical_accuracy: 0.6247
Epoch 5/5
1300/1300 ━━━━━━━━━━━━━━━━━━━━ 1223s 941ms/step - loss: 0.8708 - sparse_categorical_accuracy: 0.6390


<keras.src.callbacks.history.History at 0x797cde7c34f0>

### Note on mixed precision fine-tuning on NVIDIA GPUs

Full precision is recommended for fine-tuning. When fine-tuning on NVIDIA GPUs, you can use mixed precision (`keras.mixed_precision.set_global_policy('mixed_bfloat16')`) to speed up training with minimal effect on training quality. Mixed precision fine-tuning does consume more memory, so it is useful only on larger GPUs.

For inference, half precision (`keras.config.set_floatx("bfloat16")`) works and saves memory, whereas mixed precision is not applicable.
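A back-of-the-envelope calculation shows why half precision saves memory at inference time: each float32 weight takes 4 bytes, while each bfloat16 weight takes 2. The sketch below uses the approximate 2.6 billion parameter count quoted earlier and counts weights only; the helper name is hypothetical.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


n = 2.6e9  # approximate Gemma 2 2B parameter count
for dtype in ("float32", "bfloat16"):
    print(f"{dtype:>8}: {weight_memory_gb(n, dtype):.1f} GB")
# Activations, the KV cache, and (during training) optimizer state come on top of this.
```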

## Inference after fine-tuning

After fine-tuning, the model answers the same questions directly in prose, following the instruction in the prompt, instead of reformulating them as multiple-choice items.

In [18]:
prompt = template.format(
    instruction="What common risk factors for Lymphocytic Choriomeningitis (LCMV) should be highlighted in patient education materials?",
    response="",
)
# Top-k sampling: each token is sampled from the 5 most likely candidates (fixed seed).
sampler = keras_nlp.samplers.TopKSampler(k=5, seed=2)
gemma_lm.compile(sampler=sampler)
print(gemma_lm.generate(prompt, max_length=256))

Instruction:
What common risk factors for Lymphocytic Choriomeningitis (LCMV) should be highlighted in patient education materials?

Response:
In general, there are no risk factors that can be considered common. In rare circumstances, people may become infected with LCMV from a laboratory accident. However, in most cases, people acquire LCMV from infected rodents (such as mice, rats, squirrels and hamsters), usually through contact with urine, droppings, or contaminated bedding. People with a pet rodent should avoid handling rodents and their cage and bedding. In some countries, LCMV infection is also associated with consumption of unpasteurized milk or dairy products.


In [19]:
# Define the prompt with an instruction about diagnosing LCMV
prompt = template.format(
    instruction="What are the primary diagnostic steps for LCMV, and what challenges may arise in accurately diagnosing it?",
    response=""
)

# Generate a response from the language model
print(gemma_lm.generate(prompt, max_length=256))



Instruction:
What are the primary diagnostic steps for LCMV, and what challenges may arise in accurately diagnosing it?

Response:
There are no tests that can diagnose LCMV infection during the early, asymptomatic phase of infection. Most people who become infected will have a fever and flu-like symptoms for several days or weeks. These symptoms may include fatigue, muscle aches, headaches, sore throat, cough, and nausea or vomiting. In some cases, people may also develop a rash or experience eye irritation and swelling.
                
Diagnosis of LCMV infection is usually made during these early stages, using a blood sample tested for antibodies. The best time to test is when the antibody level is at its highest, 4 to 8 weeks after the initial symptoms began.
                
The most accurate test, a polymerase chain reaction (PCR) test, can diagnose LCMV infection in people with acute illness and can be used to diagnose chronic infection in people who do not respond to antiviral 

In [20]:
# Define the prompt with an instruction about tests for viral infections with neurological symptoms
prompt = template.format(
    instruction="What diagnostic tests are most effective in early detection of viral infections with neurological symptoms?",
    response=""
)

# Generate a response from the language model
print(gemma_lm.generate(prompt, max_length=256))



Instruction:
What diagnostic tests are most effective in early detection of viral infections with neurological symptoms?

Response:
In general, there are no specific diagnostic tests that can identify the presence of a virus. However, a blood test for antibodies to a particular virus is often used to confirm that a person has been infected by a virus. In addition, a spinal fluid analysis, which measures the number of white blood cells and other components in the spinal fluid, is often used in the diagnosis of viral infections of the central nervous system.


In [21]:
# Define the prompt with an instruction about early symptoms of zoonotic infections
prompt = template.format(
    instruction="What are the early symptoms of common zoonotic infections, and how can they be differentiated from similar diseases?",
    response=""
)

# Generate a response from the language model
print(gemma_lm.generate(prompt, max_length=256))


Instruction:
What are the early symptoms of common zoonotic infections, and how can they be differentiated from similar diseases?

Response:
Most zoonotic diseases do not have any signs or symptoms that can be used to distinguish them from non-infectious diseases. In many cases, individuals do not seek medical attention until they have been sick for a number of days. The most common symptoms of zoonotic illnesses are fever and fatigue. Other symptoms may include headache, cough, muscle aches, gastrointestinal problems, skin rashes, or neurological disorders.


In [22]:
# Define the prompt with an instruction about managing viral infections in immunocompromised patients
prompt = template.format(
    instruction="What are the key considerations for managing viral infections in immunocompromised individuals?",
    response=""
)

# Generate a response from the language model
print(gemma_lm.generate(prompt, max_length=256))



Instruction:
What are the key considerations for managing viral infections in immunocompromised individuals?

Response:
Individuals with compromised immune systems have a higher risk of developing severe viral diseases. These diseases can be caused by common viruses that normally do minimal harm to healthy people. The virus-host interactions in immunocompromised patients are complex, and it can be difficult to determine what factors are responsible for a particular illness. For example, a patient may have a virus in his or her blood, but not have symptoms. In addition, a virus may be present but not be detected by a standard blood test.


In [23]:
# Define the prompt with an instruction about clinical trials for non-small cell lung cancer
prompt = template.format(
    instruction="What is the role of clinical trials in the treatment of non-small cell lung cancer, and how can patients participate?",
    response=""
)

# Generate a response from the language model
print(gemma_lm.generate(prompt, max_length=256))


Instruction:
What is the role of clinical trials in the treatment of non-small cell lung cancer, and how can patients participate?

Response:
Researchers need to know more about the drugs and therapies used to treat NSCLC, so they conduct clinical trials to test new drugs. These trials are often open to people with NSCLC and may be done at one of The National Cancer Institute (NCI)-designated cancer centers or at one of the many clinical trials sites across the country.
                
The National Clinical Trials Network (NCTN) conducts clinical trials for many cancers, including NSCLC. More than 200 sites across the country are part of the network and are able to enroll patients in the NSCLC clinical trials. The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) also supports NSCLC clinical trials through the Clinical Lung Cancer Research Consortium. The National Lung Cancer Screening Trial (NLST) was conducted to determine whether screening with low-dose CT sc

## Save the fine-tuned model

In [24]:
# Define the directory to save the fine-tuned model as a preset
preset_dir = "./finetune1c_gemma2_2b_en_medical_qa"

# Save the fine-tuned model, tokenizer, and preprocessor as a local KerasNLP preset directory
gemma_lm.save_to_preset(preset_dir)



## Upload the fine-tuned model to Kaggle

In [36]:

kaggle_username = "oluidiakhoa"
# Construct the Kaggle URI for uploading the preset as a new model variant
kaggle_uri = f"kaggle://{kaggle_username}/gemma/keras/finetune1c_gemma2_2b_en_medical_qa"

keras_nlp.upload_preset(kaggle_uri, preset_dir)
print(f"Model preset saved to {preset_dir} and uploaded to Kaggle as {kaggle_uri}.")

Uploading Model https://www.kaggle.com/models/oluidiakhoa/gemma/keras/finetune1c_gemma2_2b_en_medical_qa ...
Starting upload for file ./finetune1c_gemma2_2b_en_medical_qa/tokenizer.json


BackendError: Unauthorized access

In [34]:
!zip -r finetune1c_gemma2_2b_en_medical_qa.zip /content/finetune1c_gemma2_2b_en_medical_qa


  adding: content/finetune1c_gemma2_2b_en_medical_qa/ (stored 0%)
  adding: content/finetune1c_gemma2_2b_en_medical_qa/tokenizer.json (deflated 63%)
  adding: content/finetune1c_gemma2_2b_en_medical_qa/preprocessor.json (deflated 76%)
  adding: content/finetune1c_gemma2_2b_en_medical_qa/assets/ (stored 0%)
  adding: content/finetune1c_gemma2_2b_en_medical_qa/assets/tokenizer/ (stored 0%)
  adding: content/finetune1c_gemma2_2b_en_medical_qa/assets/tokenizer/vocabulary.spm (deflated 51%)
  adding: content/finetune1c_gemma2_2b_en_medical_qa/model.weights.h5


zip error: Interrupted (aborting)


In [None]:
# `finetuned_model` is otherwise created only by the Kaggle-loading cell further
# below, which fails; fall back to the in-memory fine-tuned model so this cell runs.
if "finetuned_model" not in globals():
    finetuned_model = gemma_lm

# Define the prompt template
template = "Instruction:\n{instruction}\n\nResponse:\n{response}"

# Format the example with an instruction for the model
prompt = template.format(
    instruction="What is the medical definition of myelodysplastic syndrome?",
    response="",
)

# Set up a Top-K Sampler with k=5
sampler = keras_nlp.samplers.TopKSampler(k=5, seed=2)

# Compile the fine-tuned model with the specified sampler
finetuned_model.compile(sampler=sampler)

# Generate text based on the prompt with a maximum length of 256 tokens
print(finetuned_model.generate(prompt, max_length=256))

In [None]:

# Define the prompt template
template = "Instruction:\n{instruction}\n\nResponse:\n{response}"

# Format the example with an instruction for the model
prompt = template.format(
    instruction="What are the steps to diagnose chronic kidney disease?",
    response="",
)

# Set up a Top-K Sampler with k=5
sampler = keras_nlp.samplers.TopKSampler(k=5, seed=2)

# Compile the fine-tuned model with the specified sampler
finetuned_model.compile(sampler=sampler)

# Generate text based on the prompt with a maximum length of 256 tokens
print(finetuned_model.generate(prompt, max_length=256))

In [33]:
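kaggle_username = "oluidiakhoa"
# The Kaggle CLI reads KAGGLE_USERNAME and KAGGLE_KEY from the environment variables
# set at the top of this notebook; never paste the API key itself into a cell.

# Note: the preset was uploaded as a Kaggle model (see the kaggle:// URI above),
# not as a Kaggle dataset.
!kaggle datasets download -d oluidiakhoa/finetune1c_gemma2_2b_en_medical_qa -p /kaggle/working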


403 - Forbidden - Permission 'datasets.get' was denied


In [30]:
# Import necessary libraries
from google.colab import userdata
import os
import keras
import keras_nlp
from huggingface_hub import login

# Retrieve the Hugging Face token from Colab Secrets (stored under the name "HF_TOKEN")
hf_token = userdata.get("HF_TOKEN")

# Authenticate with the Hugging Face Hub using the retrieved token
login(token=hf_token)

# Also expose the token as an environment variable for libraries that read it from there
os.environ["HF_TOKEN"] = hf_token

# Define the Kaggle username
kaggle_username = "oluidiakhoa"

# Load the fine-tuned preset back from Kaggle. This only works if the Kaggle upload
# above completed; otherwise from_preset raises the ValueError shown below.
kaggle_uri = f"kaggle://{kaggle_username}/gemma/keras/finetune1c_gemma2_2b_en_medical_qa"
finetuned_model = keras_nlp.models.GemmaCausalLM.from_preset(kaggle_uri)

# Save the loaded model as a local preset
preset_dir = "./finetune1_gemma2_2b_en_medical_qa"
finetuned_model.save_to_preset(preset_dir)

# Define the Hugging Face URI for uploading the preset
hf_uri = "hf://mgbam/finetunec_gemma2_2b_en_medical_qa"  # Ensure this matches your username and model name

# Upload the preset to the Hugging Face Hub
keras_nlp.upload_preset(hf_uri, preset_dir)

ValueError: Preset kaggle://oluidiakhoa/gemma/keras/finetune1c_gemma2_2b_en_medical_qa has no config.json. Make sure the URI or directory you are trying to load is a valid KerasHub preset and and that you have permissions to read/download from this location.

In [None]:
# Define the prompt template
template = "Instruction:\n{instruction}\n\nResponse:\n{response}"

# Format the example with an instruction for the model
prompt = template.format(
    instruction="What is the medical definition of myelodysplastic syndrome?",
    response=""
)

# Set up a Top-K Sampler with k=5
sampler = keras_nlp.samplers.TopKSampler(k=5, seed=2)

# Compile the fine-tuned model with the specified sampler
finetuned_model.compile(sampler=sampler)

# Generate text based on the prompt with a maximum length of 256 tokens
print(finetuned_model.generate(prompt, max_length=256))



In [None]:
# Evaluation sketch (not run): this notebook does not hold out a validation set yet.
# Because GemmaCausalLM has a preprocessor attached, `evaluate` accepts raw strings in
# the same "Instruction: ... Response: ..." format used for training. The preprocessor
# derives the next-token labels from the text, so no separate y_val is needed.

# x_val = [
#     template.format(
#         instruction="How is chronic obstructive pulmonary disease (COPD) treated?",
#         response="<reference answer>",
#     ),
#     template.format(
#         instruction="What are the symptoms of epilepsy?",
#         response="<reference answer>",
#     ),
#     template.format(
#         instruction="What is the cause of osteoarthritis?",
#         response="<reference answer>",
#     ),
# ]

# `evaluate` returns the loss (mean token-level cross-entropy) followed by the compiled
# metric (sparse categorical accuracy). Perplexity is approximately exp(loss).
# import math
# loss, accuracy = gemma_lm.evaluate(x_val, batch_size=1)
# print("Perplexity: ", math.exp(loss))
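`evaluate` reports a mean cross-entropy loss, and perplexity is its exponential. The sketch below shows the calculation directly on per-token negative log-likelihoods with a padding mask. It is an illustration with a hypothetical helper name; `exp(loss)` from `evaluate` matches it only if the loss is averaged over the same (non-padding) tokens.

```python
import math


def perplexity(token_nlls, weights):
    """Perplexity = exp(weighted mean negative log-likelihood per token).

    `token_nlls` are per-token cross-entropy values (natural log); `weights` is 1 for
    real tokens and 0 for padding, like the sample weights used during training.
    """
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("No non-padding tokens to score.")
    mean_nll = sum(n * w for n, w in zip(token_nlls, weights)) / total_weight
    return math.exp(mean_nll)


# If every real token is predicted with probability 1/4, the perplexity is 4.
nlls = [math.log(4)] * 3 + [0.0]  # the last position is padding
print(perplexity(nlls, [1, 1, 1, 0]))
```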


In [None]:
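# Note: precision, recall, and F1 need discrete class labels. `gemma_lm.predict` returns
# next-token logits, not labels, so these metrics only make sense after mapping the
# generated text to categories (for example, in a classification-style task).
# from sklearn.metrics import precision_score, recall_score, f1_score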

# Generate predictions
# predictions = gemma_lm.predict(x_val)

# Assuming `y_val` contains true labels
# precision = precision_score(y_val, predictions, average='weighted')
# recall = recall_score(y_val, predictions, average='weighted')
# f1 = f1_score(y_val, predictions, average='weighted')

# print(f"Precision: {precision}, Recall: {recall}, F1 Score: {f1}")

In [None]:
# With the preprocessor attached, pass only the formatted strings (no y_val):
# loss, accuracy = gemma_lm.evaluate(x_val)
# print(f"Validation Loss: {loss}, Validation Accuracy: {accuracy}")


In [None]:
# from nltk.translate.bleu_score import sentence_bleu

# reference = "This is the correct response."
# generated = gemma_lm.generate(prompt, max_length=256)
# # `generate` echoes the prompt, so score only the text after "Response:".
# response = generated.split("Response:\n", 1)[-1]
# score = sentence_bleu([reference.split()], response.split())
# print("BLEU score: ", score)


After fine-tuning, none of the answers runs on into a new, self-generated `Instruction:` block, as the base model's answer to the zoonotic-infection question did.

Note that for demonstration purposes, this tutorial fine-tunes the model on a subset of 1,300 examples for five epochs with a LoRA rank of 8. To get better responses from the fine-tuned model, you can experiment with:

1. Increasing the size of the fine-tuning dataset
2. Training for more steps (epochs)
3. Setting a higher LoRA rank
4. Modifying the hyperparameter values such as `learning_rate` and `weight_decay`.
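One way to organize these experiments is a small grid over the knobs listed above. The sketch below only enumerates configurations; the search-space values are hypothetical starting points, not recommendations.

```python
import itertools

# Hypothetical search space; values are illustrative only.
search_space = {
    "lora_rank": [4, 8, 16],
    "learning_rate": [5e-5, 1e-4],
    "epochs": [1, 3],
}


def grid(space):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(grid(search_space))
print(len(configs))  # 12
print(configs[0])  # {'lora_rank': 4, 'learning_rate': 5e-05, 'epochs': 1}
```

For each configuration, you would reload the base model, call `gemma_lm.backbone.enable_lora(rank=...)`, compile with `keras.optimizers.AdamW(learning_rate=...)`, fit for the given number of epochs, and compare the results on held-out data.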

## Summary and next steps

This tutorial covered LoRA fine-tuning on a Gemma model using KerasNLP. Check out the following docs next:

* Learn how to [generate text with a Gemma model](https://ai.google.dev/gemma/docs/get_started).
* Learn how to perform [distributed fine-tuning and inference on a Gemma model](https://ai.google.dev/gemma/docs/distributed_tuning).
* Learn how to [use Gemma open models with Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/open-models/use-gemma).
* Learn how to [fine-tune Gemma using KerasNLP and deploy to Vertex AI](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gemma_kerasnlp_to_vertexai.ipynb).