# Introduction

This notebook presents a functional prototype built with Gemma 3n, Google’s latest on-device, multimodal AI model. The goal: use compact, private, offline-ready AI to solve a real-world problem. Full demo, code, and technical details follow.

## What are the key features of Gemma 3n?

The key features of this new model from Google are:

1. On-Device Performance
Optimized for mobile and edge devices, Gemma 3n delivers real-time AI with minimal memory usage. The 5B and 8B models run like 2B and 4B models, thanks to innovations like Per-Layer Embeddings (PLE).

2. Mix’n’Match Model Scaling
A single model can act as multiple: the 4B version includes a 2B submodel, enabling dynamic tradeoffs between performance and efficiency. Developers can also create custom-sized submodels tailored to specific tasks.

3. Privacy-First and Offline-Ready
Gemma 3n runs entirely on-device, ensuring user data never leaves the device. This makes it ideal for privacy-sensitive applications and for use in low- or no-connectivity environments.

4. Multimodal Understanding
Supports text, image, audio, and enhanced video input, enabling powerful applications like voice interfaces, transcription, translation, visual recognition, and more—all locally.

5. Multilingual Proficiency
Strong performance across major global languages including Japanese, German, Korean, Spanish, and French, expanding access and inclusivity.



# Prepare the model

## Install prerequisites

In [1]:
!pip install timm --upgrade
!pip install accelerate
!pip install git+https://github.com/huggingface/transformers.git

Collecting timm
  Downloading timm-1.0.16-py3-none-any.whl.metadata (57 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m57.6/57.6 kB[0m [31m2.1 MB/s[0m eta [36m0:00:00[0m
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch->timm)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch->timm)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch->timm)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.5.147 (from torch->timm)
  Downloading nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cusolver-cu12==11.6.1.9 (from torch->timm)
  Downloading nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusp

## Import packages

In [2]:
from time import time
import kagglehub
import transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
from transformers import AutoProcessor, AutoModelForImageTextToText

2025-07-03 15:14:54.523979: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1751555694.769544      13 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1751555694.840120      13 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


## Load the model

In [3]:
GEMMA_PATH = kagglehub.model_download("google/gemma-3n/transformers/gemma-3n-e2b-it")
processor = AutoProcessor.from_pretrained(GEMMA_PATH)
model = AutoModelForImageTextToText.from_pretrained(GEMMA_PATH, torch_dtype="auto", device_map="auto")

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

## Test the model with a simple prompt

In [4]:
prompt = """What is the France capital?"""
input_ids = processor(text=prompt, 
                      return_tensors="pt").to(model.device, 
                                              dtype=model.dtype)

outputs = model.generate(**input_ids, 
                         max_new_tokens=32, 
                         disable_compile=True)
text = processor.batch_decode(
    outputs,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True
)
print(text[0])

What is the France capital?

Paris is the capital of France.

Final Answer: Paris


Let's wrap this inside a function.

In [5]:
def query_model(prompt, max_new_tokens=32):
    start_time = time()
    input_ids = processor(text=prompt, 
                          return_tensors="pt").to(model.device, 
                                                  dtype=model.dtype)
    
    outputs = model.generate(**input_ids, 
                             max_new_tokens=max_new_tokens, 
                             disable_compile=True)
    text = processor.batch_decode(
        outputs,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=True
    )
    total_time = round(time() - start_time, 2)
    response = text[0].split(prompt)[-1]
    return response, total_time
    


In [6]:
prompt = "Quelle est la capitale de la France?"
response, total_time = query_model(prompt, max_new_tokens=16)
print(f"Execution time: {total_time}")
print(f"Question: {prompt}")
print(f"Response: {response}")

Execution time: 3.44
Question: Quelle est la capitale de la France?
Response: 

Paris.


# Test the model 


## Let's start with some history questions

In [7]:
prompt = "When started WW2?"
response, total_time = query_model(prompt, max_new_tokens=32)
print(f"Execution time: {total_time}")
print(f"Question: {prompt}")
print(f"Response: {response}")

Execution time: 17.4
Question: When started WW2?
Response: 

WW2 started on September 1, 1939, with Germany's invasion of Poland.

Final Answer: The final answer is $\


It doesn't look too right, I would like to keep it as short as possible. Let's refine a bit the function, we will add a system prompt.

## Improve the query function

In [8]:
def query_model_v2(prompt, max_new_tokens=32):
    start_time = time()
    
    system_prompt = """
            You are a smart AI expert in aswering questions.
            Just answer to the point, do not elaborate.
            For example, if you are asked to provide a year, a name, a location,
            return just the information, without any other words.
            """
    messages = [
        {
            "role": "system",
            "content": [
                {"type": "text", "text": system_prompt}
            ],
            "role": "user",
            "content": [
                {"type": "text", "text": prompt}
            ]
        }
    ]
    
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt"
    ).to(model.device, dtype=model.dtype)

    # retrieve input length
    input_len = inputs["input_ids"].shape[-1]
    
    outputs = model.generate(**inputs, 
                             max_new_tokens=max_new_tokens, 
                             disable_compile=True)
    text = processor.batch_decode(
        # use input length to filter only the response from the output
        outputs[:, input_len:],
        # skip special tokens
        skip_special_tokens=True,
        # cleanup tokenization spaces
        clean_up_tokenization_spaces=True
    )
    total_time = round(time() - start_time, 2)
    response = text[0]
    return response, total_time

In [9]:
prompt = "What year started WW2?"
response, total_time = query_model_v2(prompt, max_new_tokens=12)
print(f"Execution time: {total_time}")
print(f"Question: {prompt}")
print(f"Response: {response}")

Execution time: 8.49
Question: What year started WW2?
Response: World War II started in **1939**. 


## Colorize the output

In [10]:
from IPython.display import display, Markdown

def colorize_text(text):
    for word, color in zip(["Reasoning", "Question", "Response", "Explanation", "Execution time"], ["blue", "red", "green", "darkblue",  "magenta"]):
        text = text.replace(f"{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text

In [11]:
prompt = "Between what years was Obama president?"
response, total_time = query_model_v2(prompt, max_new_tokens=32)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 14.68



**<font color='red'>Question:</font>** Between what years was Obama president?



**<font color='green'>Response:</font>** Barack Obama was president of the United States from **2009 to 2017**.


In [12]:
prompt = "Between what years was the 30 years war?"
response, total_time = query_model_v2(prompt, max_new_tokens=32)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 19.97



**<font color='red'>Question:</font>** Between what years was the 30 years war?



**<font color='green'>Response:</font>** The Thirty Years' War was fought between **1618 and 1648**. 

While it began in 1618 with

In [13]:
prompt = "Between what years was the WW1?"
response, total_time = query_model_v2(prompt, max_new_tokens=32)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 12.8



**<font color='red'>Question:</font>** Between what years was the WW1?



**<font color='green'>Response:</font>** World War I lasted from **1914 to 1918**. 


In [14]:
prompt = "What year was the Lepanto battle?"
response, total_time = query_model_v2(prompt, max_new_tokens=32)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 11.86



**<font color='red'>Question:</font>** What year was the Lepanto battle?



**<font color='green'>Response:</font>** The Battle of Lepanto took place in **1571**. 


In [15]:
prompt = "What happened in 1868 in Japan?"
response, total_time = query_model_v2(prompt, max_new_tokens=64)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 36.62



**<font color='red'>Question:</font>** What happened in 1868 in Japan?



**<font color='green'>Response:</font>** 1868 was a pivotal year in Japanese history, marking the end of the Edo period and the beginning of the Meiji Restoration. Here's a breakdown of the key events:

*   **The Boshin War (1868-1869):** This was a civil war between

Let's modify the query function to stop the generation after a maximum character number was reached.

## Add a custom stopping criteria

In [16]:
from transformers import StoppingCriteria, StoppingCriteriaList

class MaxCharLengthCriteria(StoppingCriteria):
    def __init__(self, tokenizer, max_chars, input_len):
        self.tokenizer = tokenizer
        self.max_chars = max_chars
        self.input_len = input_len

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the generated part
        gen_tokens = input_ids[:, self.input_len:]
        text = self.tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)[0]
        return len(text) >= self.max_chars

def query_model_v3(prompt, max_chars=128, max_new_tokens=64):
    start_time = time()
    
    system_prompt = """
            You are a smart AI expert in aswering questions.
            Just answer to the point, do not elaborate.
            For example, if you are asked to provide a year, a name, a location,
            return just the information, without any other words.
            """
    messages = [
        {
            "role": "system",
            "content": [
                {"type": "text", "text": system_prompt}
            ],
            "role": "user",
            "content": [
                {"type": "text", "text": prompt}
            ]
        }
    ]
    
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt"
    ).to(model.device, dtype=model.dtype)

    # retrieve input length
    input_len = inputs["input_ids"].shape[-1]
    
    stopping_criteria = StoppingCriteriaList([
        MaxCharLengthCriteria(processor, max_chars=max_chars, input_len=input_len)
    ])

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        stopping_criteria=stopping_criteria,
        disable_compile=True
    )

    text = processor.batch_decode(
        # use input length to filter only the response from the output
        outputs[:, input_len:],
        # skip special tokens
        skip_special_tokens=True,
        # cleanup tokenization spaces
        clean_up_tokenization_spaces=True
    )
    total_time = round(time() - start_time, 2)
    response = text[0]
    return response, total_time

In [17]:
prompt = "What happened in 1868 in Japan?"
response, total_time = query_model_v3(prompt, max_chars=128, max_new_tokens=32)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 18.6



**<font color='red'>Question:</font>** What happened in 1868 in Japan?



**<font color='green'>Response:</font>** 1868 was a pivotal year in Japanese history, marking the end of the Edo period and the beginning of the Meiji Restoration. Here'

In [18]:
prompt = "Who was the first American president?"
response, total_time = query_model_v3(prompt, max_new_tokens=32)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 17.5



**<font color='red'>Question:</font>** Who was the first American president?



**<font color='green'>Response:</font>** The first American president was **George Washington**. 

He served from 1789 to 1797.





## Let's ask some pop culture question

In [19]:
prompt = "In what novel the number 42 is important?"
response, total_time = query_model_v2(prompt, max_new_tokens=32)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 19.89



**<font color='red'>Question:</font>** In what novel the number 42 is important?



**<font color='green'>Response:</font>** The number 42 is famously important in Douglas Adams's science fiction comedy series, **The Hitchhiker's Guide to the Galaxy**. 



In [20]:
prompt = "Name the famous boyfriend of Yoko Ono."
response, total_time = query_model_v2(prompt, max_new_tokens=32)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 19.13



**<font color='red'>Question:</font>** Name the famous boyfriend of Yoko Ono.



**<font color='green'>Response:</font>** The famous boyfriend of Yoko Ono is **John Lennon**. 

They were married from 1969 to 1970 and a hugely

In [21]:
prompt = "Who was nicknamed 'The King' in music?"
response, total_time = query_model_v2(prompt, max_new_tokens=32)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 19.98



**<font color='red'>Question:</font>** Who was nicknamed 'The King' in music?



**<font color='green'>Response:</font>** There are many musicians who have been nicknamed "The King" in music, but the most famous and widely recognized is **Elvis Presley**. 

He earned the

In [22]:
prompt = "What actor played Sheldon in TBBT?"
response, total_time = query_model_v2(prompt, max_new_tokens=32)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 12.15



**<font color='red'>Question:</font>** What actor played Sheldon in TBBT?



**<font color='green'>Response:</font>** Jim Parsons played Sheldon Cooper in The Big Bang Theory (TBBT). 


In [23]:
prompt = "What acctress from `The Friends` married Brad Pitt?"
response, total_time = query_model_v2(prompt, max_new_tokens=16)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 12.48



**<font color='red'>Question:</font>** What acctress from `The Friends` married Brad Pitt?



**<font color='green'>Response:</font>** This is a trick question! There is no actress from "The Friends" who

## Math questions

In [24]:
prompt = "34 + 21"
response, total_time = query_model_v2(prompt, max_new_tokens=16)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 8.53



**<font color='red'>Question:</font>** 34 + 21



**<font color='green'>Response:</font>** 34 + 21 = 55


In [25]:
prompt = "49 x 27"
response, total_time = query_model_v2(prompt, max_new_tokens=16)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 10.56



**<font color='red'>Question:</font>** 49 x 27



**<font color='green'>Response:</font>** 49 x 27 = 1323

Here's

In [26]:
prompt = "Brian and Sarah are brothers. Brian is 5yo, Sarah is 6 years older. How old is Sarah?"
response, total_time = query_model_v2(prompt, max_new_tokens=32)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 22.33



**<font color='red'>Question:</font>** Brian and Sarah are brothers. Brian is 5yo, Sarah is 6 years older. How old is Sarah?



**<font color='green'>Response:</font>** Sarah is 6 years old. 

Since Brian is 5 and Sarah is 6 years older, Sarah is 5 + 6 = 1

In [27]:
prompt = "x + 2 y = 5; y - x = 1. What are x and y? Just return x and y."
response, total_time = query_model_v2(prompt, max_new_tokens=64)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 11.37



**<font color='red'>Question:</font>** x + 2 y = 5; y - x = 1. What are x and y? Just return x and y.



**<font color='green'>Response:</font>** x = 1
y = 2

In [28]:
prompt = "What is the total area of a sphere or radius 3? Just return the result."
response, total_time = query_model_v2(prompt, max_new_tokens=64)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 15.52



**<font color='red'>Question:</font>** What is the total area of a sphere or radius 3? Just return the result.



**<font color='green'>Response:</font>** 113.09733552923255


In [29]:
prompt = "A rectangle with diagonal 4 is circumscribed by a circle. What is the circle's area?"
response, total_time = query_model_v2(prompt, max_new_tokens=200)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 112.66



**<font color='red'>Question:</font>** A rectangle with diagonal 4 is circumscribed by a circle. What is the circle's area?



**<font color='green'>Response:</font>** Let the rectangle be $ABCD$, with sides $AB = x$ and $BC = y$. Since the rectangle is circumscribed by a circle, the diameter of the circle is equal to the length of the diagonal of the rectangle.
We are given that the diagonal of the rectangle is 4, so the diameter of the circle is 4. Thus, the radius of the circle is $r = \frac{4}{2} = 2$.
The area of the circle is given by the formula $A = \pi r^2$.
Since $r=2$, the area of the circle is $A = \pi (2^2) = 4\pi$.

Now, we write out the final answer.
The diagonal of the rectangle is 4, so the diameter of the circumscribed circle is 4.
Therefore, the radius of the circle is $r = \frac{4}{2} = 2$.
The area

## Multiple languages

In [30]:
#Romanian
prompt = "Cine este Mircea Cartarescu?"
response, total_time = query_model_v2(prompt, max_new_tokens=128)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 72.07



**<font color='red'>Question:</font>** Cine este Mircea Cartarescu?



**<font color='green'>Response:</font>** Mircea Cartarescu este unul dintre cei mai importanți și influenți scriitori români contemporani. Este recunoscut pentru stilul său unic, neconvențional, complex și poetic, care amestecă elemente de fantastic, suprarealism, postmodernism și postmodernism. 

Iată câteva aspecte cheie despre Mircea Cartarescu:

* **Stil distinctiv:** Caracterizat de folosirea unui limbaj bogat, evocativ și adesea neașteptat, cu metafore și simboluri complexe. Stilul lui Cartarescu este greu de

In [31]:
#Albanian
prompt = "Kush ishte Ismail Kadare?"
response, total_time = query_model_v2(prompt, max_new_tokens=128)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 72.8



**<font color='red'>Question:</font>** Kush ishte Ismail Kadare?



**<font color='green'>Response:</font>** Ismail Kadare ishte një shkruar shqiptar, i njohur me thellësinë e tij filozofike, shkrimin e tij të shkëlqyer dhe përdorimin e elementeve të historisë, mitologjisë dhe politikanisë shqiptare. Ai konsiderohet një nga më të rëndësishmit dhe më të vlerësuar shkruarës në shqip të shekullit të 20 dhe të 21.

**Këtu janë disa pika të rëndësish

In [32]:
#Japanese
prompt = "夏目漱石とは誰ですか?"
response, total_time = query_model_v2(prompt, max_new_tokens=128)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 75.15



**<font color='red'>Question:</font>** 夏目漱石とは誰ですか?



**<font color='green'>Response:</font>** 夏目漱石（なつめ そうせき、1867年9月19日 – 1916年4月29日）は、日本の近代文学を代表する作家です。明治時代から大正時代にかけて活躍し、日本の文学史に多大な影響を与えました。

**主な特徴と業績:**

*   **近代日本の文学の確立:** 西洋文学の影響を受けつつ、日本独自の文化や精神性を反映した作品を数多く残しました。
*   **多様な作品:** 小説、評論、随筆など、幅広いジャンルの作品を手

In [33]:
#Chinese
prompt = "马拉多纳是谁?"
response, total_time = query_model_v2(prompt, max_new_tokens=128)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 70.98



**<font color='red'>Question:</font>** 马拉多纳是谁?



**<font color='green'>Response:</font>** 迭戈·马拉多纳（Diego Armando Maradona，1960年10月30日－），是一位阿根廷足球运动员，被广泛认为是足球史上最伟大的球员之一。

以下是关于他的一些关键信息：

* **职业生涯：** 马拉多纳在职业生涯中效力于多个俱乐部，包括：
    * **博卡 Juniors (Boca Juniors):** 从年轻时就加入，成为俱乐部历史上最伟大的球员之一。
    * **拿破仑 (Napoli):** 效力于意大利球队，并带领他们赢得了意大利杯

In [34]:
#French
prompt = "Qui était Marguerite Yourcenar?"
response, total_time = query_model_v2(prompt, max_new_tokens=128)
display(Markdown(colorize_text(f"Execution time: {total_time}\n\nQuestion: {prompt}\n\nResponse: {response}")))



**<font color='magenta'>Execution time:</font>** 71.92



**<font color='red'>Question:</font>** Qui était Marguerite Yourcenar?



**<font color='green'>Response:</font>** Marguerite Yourcenar (1900-1984) était une écrivaine française majeure, reconnue mondialement pour son style raffiné, son intelligence et son exploration profonde de l'histoire et de la condition humaine. Elle est considérée comme l'une des plus grandes écrivaines du XXe siècle. Voici un résumé de sa vie et de son œuvre :

**Vie et parcours:**

* **Origines et éducation:** Née à Nancy, en Lorraine, en 1900, elle a reçu une éducation privilégiée et a été influencée par les idées des écrivains et

# Conclusions


Preliminary conclusion after testing the model with:
* History questions  
* Pop culture  
* Math (arithmetics, algebra, geometry)
* Multiple languages.
  
is that the model is performing reasonably well with easy and medium-level questions.

**Good points**:
- When prompted to answer to the point, the model tend to behave well.
- Math seems to be accurate.
- Language capability is extensive.

**Areas to improve**:
- Modify the output to stop at the end of a phrase.