<center><img src="https://www.geeky-gadgets.com/wp-content/uploads/2025/03/google-gemma-3-advanced-ai-models.webp"></img></center>

# Introduction
This Notebook explores how to prompt Gemma 3 1B instruct (fine-tuned), using KerasNLP.  
In a previous Notebook, we explored the features of Gemma 3 1B (pre-trained): [Talk to Me Nice: Pro-Level Prompting for Gemma 3](https://www.kaggle.com/code/gpreda/talk-to-me-nice-pro-level-prompting-for-gemma-3)

## What is Gemma?
Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Now in its third generation, Gemma 3 comes in four sizes, **1B**, **4B**, **12B** and **27B**, each available in pretrained and instruction-tuned versions.   

The **4B**, **12B** and **27B** models bring an extended context window (up to **128K** tokens) as well as **multi-modality** (text and image). 

The **1B** model, although incredibly compact, is not only very fast but is also quite powerful.

As in the previous Notebook mentioned above, we will use a **1B** model, this time the **instruct** variant, on CPU only, without any accelerator.

## How will we test the model?

We will first initialize the model and then test it with simple common knowledge questions, followed by arithmetic and equations. Finally, we will test the model on writing code.


# Prerequisites

## Install packages

We will install Keras and KerasNLP.


In [1]:
!pip install -q -U keras-nlp
!pip install -q -U keras

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-decision-forests 1.10.0 requires tensorflow==2.17.0, but you have tensorflow 2.17.1 which is incompatible.

## Import packages

In [2]:
import keras
import keras_nlp
from keras_nlp.samplers import TopKSampler
from time import time

# Prepare the model

## Select the backend 

Keras is a high-level, multi-framework deep learning API designed for simplicity and ease of use.   
Keras 3 lets you choose the backend: 
* TensorFlow
* JAX
* PyTorch


For this Notebook, we will choose **JAX** as backend.

In [3]:
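import os
# Note: Keras reads KERAS_BACKEND when it is first imported, so in a fresh
# session this cell must run before the `import keras` cell above.
os.environ["KERAS_BACKEND"] = "jax"
# Fraction of GPU memory that JAX preallocates; not relevant for a CPU-only run.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.9"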
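
Because Keras picks its backend at import time, the order of the cells matters. The snippet below is a minimal sketch of a guard that makes this explicit; `select_keras_backend` is a hypothetical helper introduced here for illustration, not part of Keras.

```python
import os
import sys


def select_keras_backend(name="jax"):
    """Set KERAS_BACKEND, refusing to do so if Keras is already imported.

    Hypothetical helper: Keras reads the variable once, at import time,
    so setting it afterwards would silently have no effect.
    """
    if "keras" in sys.modules:
        raise RuntimeError(
            "Keras is already imported; restart the kernel and select "
            "the backend before the first `import keras`."
        )
    os.environ["KERAS_BACKEND"] = name
    return os.environ["KERAS_BACKEND"]
```

Calling `select_keras_backend("jax")` in the very first cell, before any Keras import, would be one way to use it.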

## Initialize the model

In [4]:
gemma_lm = keras_nlp.models.Gemma3CausalLM.from_preset("/kaggle/input/gemma3/keras/gemma3_instruct_1b/3")
tokenizer = keras_nlp.models.Gemma3Tokenizer.from_preset("/kaggle/input/gemma3/keras/gemma3_instruct_1b/3")

## Verify the model

In [5]:
gemma_lm.summary()

# Prompt the model

We define a simple prompt.

In [6]:
instructions = "You are an AI that answers in short, complete sentences only.\n"
prompt = (
    f"{instructions}"
    "Question: {question}\n"
    "Answer:"
)

output = gemma_lm.generate(
    prompt.format(question="What is the temperature of the Moon?"),
    max_length=40, 
    stop_token_ids=[tokenizer.token_to_id("\n")]
)


In [7]:
answer = output.replace(instructions, "").strip()
print(answer)

Question: What is the temperature of the Moon?
Answer: 0°C


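Well, the answer is not correct: the Moon's surface temperature swings widely between day and night, so no single value such as 0°C describes it. Let's try more questions.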

## Functions to generate and format the output

We define a function to generate the answer.

In [8]:
def generate_answer(question, 
                    instructions="You are an AI that answers in short, complete sentences only.\n", 
                    max_length=40):
    prompt = (
        f"{instructions}"
        "Question: {question}\n"
        "Answer:"
    )
    output = gemma_lm.generate(
        prompt.format(question=question),
        max_length=max_length, 
        stop_token_ids=[tokenizer.token_to_id("\n")]
    )    
    answer = output.replace(instructions, "").strip()
    return answer

Next, we define a function to format the output.

In [9]:
from IPython.display import display, Markdown

def colorize_text(text):
    for word, color in zip(["Reasoning", "Question", "Answer", "Explanation", "Total time"], ["blue", "red", "green", "darkblue",  "magenta"]):
        text = text.replace(f"{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text
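
One side effect of the plain `str.replace` approach is that a label is colored wherever it appears, so a phrase such as "Final Answer:" in the math outputs later in this Notebook gets split into two lines. The sketch below is one possible alternative that only colors a label at the start of a line; `colorize_labels` and `LABEL_COLORS` are names made up for this example.

```python
import re

# Same labels and colors as in colorize_text above.
LABEL_COLORS = {
    "Reasoning": "blue",
    "Question": "red",
    "Answer": "green",
    "Explanation": "darkblue",
    "Total time": "magenta",
}


def colorize_labels(text):
    """Color a known label only when it starts a line."""
    def paint(match):
        word = match.group(1)
        return f"\n\n**<font color='{LABEL_COLORS[word]}'>{word}:</font>**"

    # ^ with re.MULTILINE matches at the start of every line.
    pattern = r"^(" + "|".join(re.escape(w) for w in LABEL_COLORS) + r"):"
    return re.sub(pattern, paint, text, flags=re.MULTILINE)


sample = "Answer: 4\nFinal Answer: 4"
print(colorize_labels(sample))
```

In this sample only the first `Answer:` is colored; the `Final Answer:` line is left untouched.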

The following function combines both functions defined above.

In [10]:
def generate_format_answer(question, 
                    instructions="You are an AI that answers in short, complete sentences only.\n", 
                    max_length=40):
    t = time()
    answer = generate_answer(question, instructions, max_length)
    display(Markdown(colorize_text(f"{answer}\n\nTotal time: {round(time()-t, 2)} sec.")))

## Test the model with simple common knowledge questions

We will ask questions about history, geography, science and politics.

In [11]:
answer = generate_format_answer("When lived Alexander the Great?")



**<font color='red'>Question:</font>** When lived Alexander the Great?


**<font color='green'>Answer:</font>** He lived in the 4th century BC.



**<font color='magenta'>Total time:</font>** 4.98 sec.

In [12]:
answer = generate_format_answer("When lived Julius Caesar?")



**<font color='red'>Question:</font>** When lived Julius Caesar?


**<font color='green'>Answer:</font>** He lived in the 1st century BC.



**<font color='magenta'>Total time:</font>** 4.96 sec.

In [13]:
answer = generate_format_answer("When was the 1st Crusade?")



**<font color='red'>Question:</font>** When was the 1st Crusade?


**<font color='green'>Answer:</font>** 11th century.



**<font color='magenta'>Total time:</font>** 3.45 sec.

In [14]:
answer = generate_format_answer("When Henric VIII lived?")



**<font color='red'>Question:</font>** When Henric VIII lived?


**<font color='green'>Answer:</font>** 15th century.



**<font color='magenta'>Total time:</font>** 3.67 sec.

The answer is only partially correct: Henry VIII was born in the 15th century (1491), but he lived most of his life in the 16th century.

In [15]:
answer = generate_format_answer("When started the rule of King Charles III?")



**<font color='red'>Question:</font>** When started the rule of King Charles III?


**<font color='green'>Answer:</font>** 2022

---



**<font color='magenta'>Total time:</font>** 3.93 sec.

In [16]:
answer = generate_format_answer("In what continent is Albania?")



**<font color='red'>Question:</font>** In what continent is Albania?


**<font color='green'>Answer:</font>** Europe



**<font color='magenta'>Total time:</font>** 1.67 sec.

In [17]:
answer = generate_format_answer("With which country in Asia has Europe border?")



**<font color='red'>Question:</font>** With which country in Asia has Europe border?


**<font color='green'>Answer:</font>** Russia.



**<font color='magenta'>Total time:</font>** 2.0 sec.

In [18]:
answer = generate_format_answer("On what continents is Turkye?")



**<font color='red'>Question:</font>** On what continents is Turkye?


**<font color='green'>Answer:</font>** Turkey is located on Europe and Asia.



**<font color='magenta'>Total time:</font>** 4.16 sec.

In [19]:
answer = generate_format_answer("What compound powers nuclear reactors?")



**<font color='red'>Question:</font>** What compound powers nuclear reactors?


**<font color='green'>Answer:</font>** Uranium

<end_of_turn><end_of_turn>



**<font color='magenta'>Total time:</font>** 2.7 sec.

In [20]:
answer = generate_format_answer("Who build the 1st atomic bomb?")



**<font color='red'>Question:</font>** Who build the 1st atomic bomb?


**<font color='green'>Answer:</font>** The United States.



**<font color='magenta'>Total time:</font>** 2.64 sec.

In [21]:
answer = generate_format_answer("Who was the 46th president of United States?")



**<font color='red'>Question:</font>** Who was the 46th president of United States?


**<font color='green'>Answer:</font>** Joe Biden



**<font color='red'>Question:</font>** What is the



**<font color='magenta'>Total time:</font>** 3.75 sec.

There is an issue here: after giving the answer, the model keeps generating and starts a new question, as if it were continuing a sequence of Q & A pairs.

Let's test the model on slightly more complex tasks.
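
Until the generation itself is better constrained, the extra text can be trimmed after the fact. The following is a minimal sketch of such a clean-up step; `first_answer_only` is a hypothetical helper, and it assumes the raw output has the `Question: ... Answer: ...` shape seen in the cells above.

```python
def first_answer_only(text, end_marker="<end_of_turn>", next_label="Question:"):
    """Keep only the first Question/Answer pair of a generated string."""
    # Drop everything from the first end-of-turn marker onwards.
    text = text.split(end_marker, 1)[0]
    # Drop any further "Question:" that appears after the first "Answer:".
    answer_pos = text.find("Answer:")
    if answer_pos != -1:
        extra = text.find(next_label, answer_pos)
        if extra != -1:
            text = text[:extra]
    return text.strip()


raw = (
    "Question: Who was the 46th president of United States?\n"
    "Answer: Joe Biden\n\n"
    "Question: What is the"
)
print(first_answer_only(raw))
```

The helper could be applied to the string returned by `generate_answer` before it is passed to `colorize_text`.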

## Test the model with simple arithmetic

In [22]:
generate_format_answer("What is 144 + 25?")



**<font color='red'>Question:</font>** What is 144 + 25?


**<font color='green'>Answer:</font>** 169

<end_of_turn>



**<font color='magenta'>Total time:</font>** 3.36 sec.

In [23]:
generate_format_answer("What is the square root of 169?")



**<font color='red'>Question:</font>** What is the square root of 169?


**<font color='green'>Answer:</font>** 13



**<font color='magenta'>Total time:</font>** 2.35 sec.

In [24]:
generate_format_answer("What is the square of 14?")



**<font color='red'>Question:</font>** What is the square of 14?


**<font color='green'>Answer:</font>** 196



**<font color='red'>Question:</font>** What is the sum



**<font color='magenta'>Total time:</font>** 4.94 sec.

Again, the model starts to generate a further Q & A pair after the answer.
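
A possible cause is that the plain `Question:`/`Answer:` prompt does not use the turn markers that Gemma instruction-tuned models are trained on (the `<end_of_turn>` marker already shows up in some outputs above). The sketch below wraps a question in that chat format; `build_chat_prompt` is a hypothetical helper, and whether it removes the extra text with this particular preset has not been verified here.

```python
def build_chat_prompt(question, instructions=""):
    """Wrap a question in Gemma's user/model turn markers."""
    # Instructions, if any, are placed in the user turn ahead of the question.
    user_text = f"{instructions}\n{question}".strip()
    return (
        "<start_of_turn>user\n"
        f"{user_text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


print(build_chat_prompt(
    "What is the square of 14?",
    instructions="You are an AI that answers in short, complete sentences only.",
))
```

The resulting string would be passed to `gemma_lm.generate` in place of the formatted prompt. Using the id of `<end_of_turn>` (for example via `tokenizer.token_to_id`) as an additional stop token may also help, but this is an assumption to check against the tokenizer in use.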

## Test with more advanced math

In [25]:
prompt = """
You are an AI assistant designed to solve math problems.
Answer to the problem. Only when needed, include reasoning.
Question: {question}
Answer:
"""

In [26]:
t = time()
response = gemma_lm.generate(prompt.format(
    question="""Eduard and John are brothers. 
    They split 10 apples.
    Eduard has 2 more apples than John. 
    How many apples have each?"""), 
    max_length=400)
display(Markdown(colorize_text(f"{response}\n\nTotal time: {round(time()-t, 2)} sec.")))


You are an AI assistant designed to solve math problems.
Answer to the problem. Only when needed, include reasoning.


**<font color='red'>Question:</font>** Eduard and John are brothers. 
    They split 10 apples.
    Eduard has 2 more apples than John. 
    How many apples have each?


**<font color='green'>Answer:</font>**
Let's solve this problem step by step.
Let $E$ be the number of apples Eduard has and $J$ be the number of apples John has.
We are given that they split 10 apples, so $E + J = 10$.
We are also given that Eduard has 2 more apples than John, so $E = J + 2$.
Now we can substitute the second equation into the first equation:
$(J + 2) + J = 10$
$2J + 2 = 10$
$2J = 8$
$J = 4$
Now we can find the number of apples Eduard has:
$E = J + 2 = 4 + 2 = 6$
So Eduard has 6 apples and John has 4 apples.
We can check our answer:
$E + J = 6 + 4 = 10$
$E = J + 2 \Rightarrow 6 = 4 + 2$, which is true.
Final 

**<font color='green'>Answer:</font>** The final answer is $\boxed{6, 4}$<end_of_turn>



**<font color='magenta'>Total time:</font>** 116.39 sec.

In [27]:
t = time()
response = gemma_lm.generate(prompt.format(
    question="""2x + 5y = 12
                2y - x = 3
                Solve the equation for x and y. 
                Hint: simplify first x."""), 
    max_length=512)
display(Markdown(colorize_text(f"{response}\n\nTotal time: {round(time()-t, 2)} sec.")))


You are an AI assistant designed to solve math problems.
Answer to the problem. Only when needed, include reasoning.


**<font color='red'>Question:</font>** 2x + 5y = 12
                2y - x = 3
                Solve the equation for x and y. 
                Hint: simplify first x.


**<font color='green'>Answer:</font>**
x = 3
y = 2


**<font color='darkblue'>Explanation:</font>**
The given equations are:
2x + 5y = 12
2y - x = 3
We can solve this system of equations by substitution.
From the second equation, we can solve for x:
2y - x = 3
x = 2y - 3
Substitute this expression for x into the first equation:
2(2y - 3) + 5y = 12
4y - 6 + 5y = 12
9y - 6 = 12
9y = 18
y = 2
Now, substitute y = 2 into the equation x = 2y - 3:
x = 2(2) - 3
x = 4 - 3
x = 1
Therefore, x = 1 and y = 2.
The solution is x = 1, y = 2.
Final 

**<font color='green'>Answer:</font>** The final answer is 1, 2.
<end_of_turn>



**<font color='magenta'>Total time:</font>** 114.19 sec.

In [28]:
t = time()
response = gemma_lm.generate(prompt.format(
    question="""x^2 -5x + 6 = 0
                Solve the equation for x. 
                """), 
    max_length=275)
display(Markdown(colorize_text(f"{response}\n\nTotal time: {round(time()-t, 2)} sec.")))


You are an AI assistant designed to solve math problems.
Answer to the problem. Only when needed, include reasoning.


**<font color='red'>Question:</font>** x^2 -5x + 6 = 0
                Solve the equation for x. 
                


**<font color='green'>Answer:</font>**
The equation is a quadratic equation in the form ax^2 + bx + c = 0.
In this case, a = 1, b = -5, and c = 6.
We can use the quadratic formula to solve for x:
x = (-b ± √(b^2 - 4ac)) / 2a
Plugging in the values, we get:
x = (-(-5) ± √((-5)^2 - 4 * 1 * 6)) / (2 * 1)
x = (5 ± √(25 - 24)) / 2
x = (5 ± √1) / 2
x = (5 ± 1) / 2
x1 = (5 + 1) / 2 = 6 / 2 = 3
x2 = (5 - 1) / 2 = 4 / 2 = 2
Therefore, the solutions are x = 3 and x = 2.
The solutions are x =



**<font color='magenta'>Total time:</font>** 104.31 sec.
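
The three results above can be checked independently with a few lines of plain Python. This is only a verification sketch: the helper names are made up for the example, and the values are the ones reported in the model outputs.

```python
def satisfies_apples(e, j):
    """Eduard and John split 10 apples; Eduard has 2 more than John."""
    return e + j == 10 and e == j + 2


def satisfies_linear_system(x, y):
    """2x + 5y = 12 and 2y - x = 3."""
    return 2 * x + 5 * y == 12 and 2 * y - x == 3


def quadratic(x):
    """Left-hand side of x^2 - 5x + 6 = 0."""
    return x**2 - 5 * x + 6


print(satisfies_apples(6, 4))         # True
print(satisfies_linear_system(1, 2))  # True: the derived solution
print(satisfies_linear_system(3, 2))  # False: the value stated first
print(quadratic(2), quadratic(3))     # 0 0
```

The check confirms the derived solutions and shows that the `x = 3`, `y = 2` pair stated at the top of the linear-system answer does not satisfy the equations, while the `x = 1`, `y = 2` pair from the explanation does.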

## Write code

Let's test the model for writing code.

### Python code

In [29]:
prompt = """
You are an AI assistant designed to write simple Python code.
Please answer with the listing of the Python code.
Make sure to format the output using correct Markdown for code.
Question: {question}
Answer:
"""

In [30]:
t = time()
response = gemma_lm.generate(prompt.format(question="Please write a function in Python to calculate the area of a square with edge 'a'"), 
                             max_length=256)
display(Markdown(colorize_text(f"{response}\n\nTotal time: {round(time()-t, 2)} sec.")))


You are an AI assistant designed to write simple Python code.
Please answer with the listing of the Python code.
Make sure to format the output using correct Markdown for code.


**<font color='red'>Question:</font>** Please write a function in Python to calculate the area of a square with edge 'a'


**<font color='green'>Answer:</font>**
```python
def calculate_square_area(a):
    """
    Calculate the area of a square with edge 'a'.

    Args:
        a: The length of the side of the square.

    Returns:
        The area of the square.
    """
    area = a * a
    return area
```
<end_of_turn>



**<font color='magenta'>Total time:</font>** 53.97 sec.

In [31]:
t = time()
response = gemma_lm.generate(prompt.format(question="Write a function in Python to calculate the area of a circle of radius r"), 
                             max_length=256)
display(Markdown(colorize_text(f"{response}\n\nTotal time: {round(time()-t, 2)} sec.")))


You are an AI assistant designed to write simple Python code.
Please answer with the listing of the Python code.
Make sure to format the output using correct Markdown for code.


**<font color='red'>Question:</font>** Write a function in Python to calculate the area of a circle of radius r


**<font color='green'>Answer:</font>**
```python
import math

def calculate_circle_area(r):
  """
  Calculate the area of a circle.

  Args:
    r: The radius of the circle.

  Returns:
    The area of the circle.
  """
  area = math.pi * r**2
  return area
```
<end_of_turn>



**<font color='magenta'>Total time:</font>** 31.52 sec.

In [32]:
t = time()
response = gemma_lm.generate(prompt.format(question="Write a function in Python to calculate cosine similarity of two vectors."), 
                             max_length=256)
display(Markdown(colorize_text(f"{response}\n\nTotal time: {round(time()-t, 2)} sec.")))


You are an AI assistant designed to write simple Python code.
Please answer with the listing of the Python code.
Make sure to format the output using correct Markdown for code.


**<font color='red'>Question:</font>** Write a function in Python to calculate cosine similarity of two vectors.


**<font color='green'>Answer:</font>**
```python
import numpy as np

def cosine_similarity(vec1, vec2):
    """
    Calculate the cosine similarity of two vectors.

    Args:
        vec1 (numpy.ndarray): The first vector.
        vec2 (numpy.ndarray): The second vector.

    Returns:
        float: The cosine similarity of the two vectors.
    """
    dot_product = np.dot(vec1, vec2)
    magnitude_vec1 = np.linalg.norm(vec1)
    magnitude_vec2 = np.linalg.norm(vec2)
    if magnitude_vec1 == 0 or magnitude_vec2 == 0:
        return 0.0
    return dot_product / (magnitude_vec1 * magnitude_vec2)
```
<end_of_turn>



**<font color='magenta'>Total time:</font>** 68.27 sec.

The code is correct, carefully documented and clean for all the examples above.
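
A quick way to back this claim is to run the generated functions on known values. The sketch below copies the square and circle functions from the outputs above (docstrings omitted); the cosine similarity is rewritten with the standard library only, so that the check runs without NumPy, which is a deviation from the generated code.

```python
import math


def calculate_square_area(a):
    area = a * a
    return area


def calculate_circle_area(r):
    area = math.pi * r**2
    return area


def cosine_similarity(vec1, vec2):
    # Standard-library rewrite of the generated NumPy version (same formula).
    dot_product = sum(x * y for x, y in zip(vec1, vec2))
    magnitude_vec1 = math.sqrt(sum(x * x for x in vec1))
    magnitude_vec2 = math.sqrt(sum(y * y for y in vec2))
    if magnitude_vec1 == 0 or magnitude_vec2 == 0:
        return 0.0
    return dot_product / (magnitude_vec1 * magnitude_vec2)


assert calculate_square_area(3) == 9
assert math.isclose(calculate_circle_area(2), 4 * math.pi)
assert math.isclose(cosine_similarity([1, 0], [1, 0]), 1.0)      # parallel
assert math.isclose(cosine_similarity([1, 0], [0, 1]), 0.0, abs_tol=1e-12)  # orthogonal
assert cosine_similarity([0, 0], [1, 2]) == 0.0                  # zero-vector guard
```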

### C++ code

Let's now test the model on writing C++ code.

In [33]:
prompt = """
You are an AI assistant designed to write simple C++ code.
Please answer with the listing of the C++ code.
Make sure to format the output using correct Markdown for code.
Question: {question}
Answer:
"""

In [34]:
t = time()
response = gemma_lm.generate(prompt.format(question="Write a C++ function to calculate linear product of two vectors."), 
                             max_length=400)
display(Markdown(colorize_text(f"{response}\n\nTotal time: {round(time()-t, 2)} sec.")))


You are an AI assistant designed to write simple C++ code.
Please answer with the listing of the C++ code.
Make sure to format the output using correct Markdown for code.


**<font color='red'>Question:</font>** Write a C++ function to calculate linear product of two vectors.


**<font color='green'>Answer:</font>**
```cpp
#include <iostream>
#include <vector>

using namespace std;

// Function to calculate the product of two vectors
vector<int> productOfVectors(const vector<int>& vec1, const vector<int>& vec2) {
  vector<int> result;
  for (int i = 0; i < vec1.size(); ++i) {
    result.push_back(vec1[i] * vec2[i]);
  }
  return result;
}

int main() {
  vector<int> v1 = {1, 2, 3};
  vector<int> v2 = {4, 5, 6};
  vector<int> product = productOfVectors(v1, v2);
  cout << "Product: ";
  for (int i = 0; i < product.size(); ++i) {
    cout << product[i] << " ";
  }
  cout << endl;
  return 0;
}
```
<end_of_turn>



**<font color='magenta'>Total time:</font>** 94.0 sec.
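
The request for a "linear product" is ambiguous, and the generated C++ function returns the element-wise product of the two vectors. If the dot (inner) product was intended, the result would be a single number instead. The Python sketch below contrasts the two readings on the same vectors used in the generated `main`; both function names are chosen for this example.

```python
def elementwise_product(vec1, vec2):
    """What the generated C++ code computes: one product per position."""
    # zip stops at the shorter vector if the lengths differ.
    return [a * b for a, b in zip(vec1, vec2)]


def dot_product(vec1, vec2):
    """The other common reading: the sum of the pairwise products."""
    return sum(a * b for a, b in zip(vec1, vec2))


v1, v2 = [1, 2, 3], [4, 5, 6]
print(elementwise_product(v1, v2))  # [4, 10, 18]
print(dot_product(v1, v2))          # 32
```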

### Java code

Let's now test the model on writing Java code.

In [35]:
prompt = """
You are an AI assistant designed to write simple Java code.
Please answer with the listing of the Java code.
Make sure to format the output using correct Markdown for code.
Question: {question}
Answer:
"""

In [36]:
t = time()
response = gemma_lm.generate(prompt.format(question="Write a Java function to calculate the largest number in an array."), 
                             max_length=400)
display(Markdown(colorize_text(f"{response}\n\nTotal time: {round(time()-t, 2)} sec.")))


You are an AI assistant designed to write simple Java code.
Please answer with the listing of the Java code.
Make sure to format the output using correct Markdown for code.


**<font color='red'>Question:</font>** Write a Java function to calculate the largest number in an array.


**<font color='green'>Answer:</font>**
```java
import java.util.Arrays;

class Solution {
    /**
     * Calculates the largest number in an array.
     *
     * @param arr The array of numbers.
     * @return The largest number in the array.
     */
    public int findLargestNumber(int[] arr) {
        if (arr == null || arr.length == 0) {
            throw new IllegalArgumentException("Array cannot be null or empty.");
        }
        int largest = arr[0];
        for (int i = 1; i < arr.length; i++) {
            if (arr[i] > largest) {
                largest = arr[i];
            }
        }
        return largest;
    }
}
```
<end_of_turn>



**<font color='magenta'>Total time:</font>** 69.33 sec.

# Conclusions


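We tested the Gemma 3 instruct 1B model from Google DeepMind on a CPU-only machine.  
This compact model proved quite proficient with simple history, geography and general knowledge questions, with a few misses: the Moon temperature answer was wrong, the Henry VIII answer was only partially correct, and the model sometimes kept generating extra Q & A text after the answer.  
We then tested it with arithmetic, and it answered correctly.  
When prompted to solve simple math equations, it also did well: the derivations were correct, although for the linear system it first stated x = 3 before deriving x = 1, y = 2.  
It performed quite well at writing code in Python, C++ and Java. Note that the C++ function computes an element-wise product, which is one possible reading of the ambiguous "linear product" request.  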

# Suggestions to continue the work

You can continue this work. Here are a few suggestions:
* Run the same code on GPU. You should see a major improvement in speed.
* Quantize the model. Starting from 1GB, you will be able to reduce the model size and run it, with acceptable performance degradation, on a mobile platform or even on an edge device.
* Find other use cases.
* Include the model in a Retrieval Augmented Generation system.
* Fine-tune the model for a specific task.
* Combine fine-tuning with quantization to obtain a compact, adapted model for your own task.
