# Python Code QA Using Gemma

<img src="https://miro.medium.com/v2/resize:fit:687/1*oUzj9cozTU4hPd4KfGB1Tw.png" width="200" alt="Gemma x Python">

# 1. Introduction

<br><b>1. Information about Gemma</b>
<br>Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well suited to a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources, such as a laptop, a desktop, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.

<br><b>2. Task selection and introduction</b>
<br>In this competition, we were given the following five tasks, and after weighing various considerations, our team chose one of them.

* Answer common questions about the Python programming language.
* Explain or teach basic data science concepts.
* Summarize Kaggle solution write-ups.
* Explain or teach concepts from Kaggle competition solution write-ups.
* Answer common questions about the Kaggle platform.

Task 5 is clearly a natural language question answering (NLQA) task, and this kind of task has the following characteristics:

a) Natural Language Understanding: NLQA systems parse and understand the natural language used by humans. Using deep learning and natural language processing (NLP) techniques, the system identifies the intent of a question and extracts its key information.
<br><br>b) Knowledge Acquisition and Integration: NLQA systems need to capture and integrate large amounts of knowledge. This can be achieved in a variety of ways, and the system must also filter and combine information to ensure accurate and useful answers.
<br><br>c) Reasoning and Interpretation: The system should be able to reason and interpret to a certain extent. This means it can analyze the deeper meaning of a question and generate a reasonable answer based on what is already known.
<br><br>d) Robustness and Adaptability: The system should be robust and able to cope with the diversity and complexity of language. At the same time, it should be adaptable, so that it can be continuously updated and optimized as new knowledge emerges and languages evolve.
<br><br>e) Interactivity and User Experience: The ultimate goal of an NLQA system is to provide a good user experience.


# 2. Preparation

In [None]:
%pip install -U keras-nlp
%pip install -U "keras>=3"

In [None]:
import tensorflow
from tensorflow.python.client import device_lib
print(tensorflow.__version__)
# print(device_lib.list_local_devices())
print("Num GPUs Available: ", len(tensorflow.config.list_physical_devices('GPU')))

In [1]:
import os

os.environ["KERAS_BACKEND"] = "torch"  # Or "jax" or "tensorflow".
# Avoid memory fragmentation; this setting only affects the JAX backend.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"]="1.00"


#from datasets import load_dataset
from IPython.display import display, Markdown

In [None]:
import keras
import keras_nlp
import numpy as np

In [3]:
keras.utils.set_random_seed(516)

In [4]:
template = "Instruction:\n{instruction}\n\nResponse:\n"
template_training = "Instruction:\n{instruction}\n\nResponse:\n{response}"
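
To make the two templates concrete, the short sketch below fills them with a made-up instruction and response (both are illustrative assumptions, not samples from the dataset). It shows that the inference prompt is simply the training text with the response left empty.

```python
# Same templates as in the cell above.
template = "Instruction:\n{instruction}\n\nResponse:\n"
template_training = "Instruction:\n{instruction}\n\nResponse:\n{response}"

# Hypothetical instruction/response pair, used only for illustration.
instruction = "How do I reverse a list in Python?"
response = "Use my_list[::-1] for a reversed copy, or my_list.reverse() to reverse in place."

# Prompt passed to the model at inference time: it ends right after "Response:".
prompt = template.format(instruction=instruction)

# Full text used for fine-tuning: the prompt followed by the expected answer.
training_example = template_training.format(instruction=instruction, response=response)

print(prompt)
print(training_example)
```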

In [6]:
def colorize_text(text):
    for word, color in zip(["Instruction"], ["blue"]):
        text = text.replace(f"{word}:\n", f"**<font color='{color}'>{word}:</font>**\n\n")
    for word, color in zip(["Response"], ["green"]):
        text = text.replace(f"\n\n{word}:", f"\n\n**<font color='{color}'>{word}:</font>**")
    return text


text_mock_input = """
mock_val = {mock_val}
def mock_input(prompt):
    print(prompt)
    print(mock_val)
    return mock_val
    
input = mock_input

"""

def execute_result_text(out, mock_input=None):
    python_code = out.split('```python')[1].split('```')[0]
    if mock_input is not None:
        python_code = text_mock_input.format(mock_val=mock_input) + python_code
    with open('sample.py', 'w') as f:
        f.write(python_code)

    print("# ================================")
    print("# Exec the generated code.")
    print("# ================================")
    print(python_code)
    print("# ================================")
    print("# Result")
    print("# ================================")

    !python sample.py
    !rm sample.py

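def result_text(out):
    # Return only the text that follows the "Response:" marker.
    # If the marker is missing, return the output unchanged.
    marker = "Response:\n"
    idx = out.find(marker)
    if idx == -1:
        return out
    return out[idx + len(marker):]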
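
The helper above relies on IPython's `!` shell syntax and assumes the output always contains a fenced Python block. As a minimal sketch of an alternative that also works outside a notebook, the code below extracts the first fenced block and runs it with `subprocess`. The names `extract_python_block` and `run_python` are hypothetical and not part of the original notebook.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

FENCE = "`" * 3  # triple backtick, built this way to keep this example readable


def extract_python_block(text):
    """Return the first fenced python block in `text`, or None if there is none."""
    opener = FENCE + "python"
    start = text.find(opener)
    if start == -1:
        return None
    start += len(opener)
    end = text.find(FENCE, start)
    if end == -1:
        end = len(text)  # unterminated fence: take everything that follows
    return text[start:end].strip()


def run_python(code, timeout=10):
    """Run `code` in a separate Python process and return its standard output."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "sample.py"
        script.write_text(code)
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return proc.stdout


# Usage with a hand-written stand-in for a model output.
sample_output = "Response:\n" + FENCE + "python\nprint(1 + 1)\n" + FENCE
code = extract_python_block(sample_output)
if code is not None:
    print(run_python(code))
```

Running generated code is risky in general, so the timeout and the temporary directory here are only a small precaution, not a sandbox.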

# 3. Load Model

In [8]:
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

2024-03-29 09:50:14.730948: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355

In [9]:
gemma_lm.summary()

# 4. Evaluate Gemma using CODAL-Bench samples

[Codal-Bench](https://huggingface.co/datasets/coseal/codal-bench)
<br><br>

<div align="center">
<img src="https://arxiv.org/html/2403.09032v1/x1.png" width="700" alt="Overview of CodeUltraFeedback">
</div>

### Example 1

Good Response for Python

In [12]:
instruction = "Write a Python function that takes a list of integers and returns a pair of elements which have the maximum product. For example, for [1, 5, 2, -7, 3] the correct answer is [5, 3] ."
prompt = template.format(instruction=instruction)

out = gemma_lm.generate(prompt, max_length=256)
display(Markdown(colorize_text(out)))

**<font color='blue'>Instruction:</font>**

Write a Python function that takes a list of integers and returns a pair of elements which have the maximum product. For example, for [1, 5, 2, -7, 3] the correct answer is [5, 3] .

**<font color='green'>Response:</font>**
def max_product(lst):
    max_product = 0
    for i in range(len(lst)):
        for j in range(i+1, len(lst)):
            product = lst[i] * lst[j]
            if product > max_product:
                max_product = product
    return max_product, lst[i], lst[j]
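
One way to check an answer like this is to compare it against a brute-force reference. The sketch below is not part of the original notebook, and the name `max_product_pair` is an assumption for illustration. It is worth noting that the generated function, as displayed, returns `lst[i]` and `lst[j]` after both loops have finished, so it reports the last visited elements rather than the pair that produced the maximum.

```python
from itertools import combinations


def max_product_pair(nums):
    """Return the pair of elements of `nums` with the largest product."""
    if len(nums) < 2:
        raise ValueError("need at least two elements")
    # Try every unordered pair and keep the one with the largest product.
    best = max(combinations(nums, 2), key=lambda pair: pair[0] * pair[1])
    return list(best)


print(max_product_pair([1, 5, 2, -7, 3]))  # [5, 3]
```

The brute-force search is quadratic in the list length, which is fine for checking small examples like the one in the instruction.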


### Example 2

Good Response for Python

In [11]:
instruction = "Develop a Python code snippet to print the diamond structure with the specified number of rows. The program should follow the Fibonacci sequence for the number of characters per row and validate input to ensure it is an odd number."
prompt = template.format(instruction=instruction)

out = gemma_lm.generate(prompt, max_length=256)
display(Markdown(colorize_text(out)))

**<font color='blue'>Instruction:</font>**

Develop a Python code snippet to print the diamond structure with the specified number of rows. The program should follow the Fibonacci sequence for the number of characters per row and validate input to ensure it is an odd number.

**<font color='green'>Response:</font>**
import sys
n = int(sys.argv[1])
if n % 2 == 0:
    print("Invalid input")
else:
    for i in range(n):
        for j in range(n):
            if i == 0 or i == n - 1:
                print("*", end="")
            elif j == 0 or j == n - 1:
                print("*", end="")
            else:
                print(" ", end="")
        print()


### Example 3

Poor Response for a Non-Python Instruction

In [13]:
instruction = "This is a self-explanatory question:Why does this thing bubble into my try catch's even when nothing is wrong? "
prompt = template.format(instruction=instruction)

out = gemma_lm.generate(prompt, max_length=256)
display(Markdown(colorize_text(out)))

**<font color='blue'>Instruction:</font>**

This is a self-explanatory question:Why does this thing bubble into my try catch's even when nothing is wrong? 

**<font color='green'>Response:</font>**
This is a very common question. The answer is that the compiler is smart enough to know that the try catch block is not going to throw an exception. So it doesn't need to do anything special.

The try catch block is a block of code that is executed if an exception is thrown. The try catch block is a block of code that is executed if an exception is thrown. The try catch block is a block of code that is executed if an exception is thrown. The try catch block is a block of code that is executed if an exception is thrown. The try catch block is a block of code that is executed if an exception is thrown. The try catch block is a block of code that is executed if an exception is thrown. The try catch block is a block of code that is executed if an exception is thrown. The try catch block is a block of code that is executed if an exception is thrown. The try catch block is a block of code that is executed if an exception is thrown. The try catch block is a block of code that is executed if an exception is thrown

### Example 4

Poor Response for a Non-Python Instruction

In [15]:
instruction = "Can you determine which number, when substituted as decimeters, will accurately convert the measurement of 26.1 millimeters? A) 0.261B) 261C) 2.61D) 2,610"
prompt = template.format(instruction=instruction)

out = gemma_lm.generate(prompt, max_length=256)
display(Markdown(colorize_text(out)))

**<font color='blue'>Instruction:</font>**

Can you determine which number, when substituted as decimeters, will accurately convert the measurement of 26.1 millimeters? A) 0.261B) 261C) 2.61D) 2,610

**<font color='green'>Response:</font>**
C) 2.61

Explanation:
The answer is C.
26.1 millimeters is equal to 2.61 decimeters.
26.1 millimeters = 2.61 decimeters
26.1 millimeters = 26.1 millimeters
26.1 millimeters = 26.1 millimeters
26.1 millimeters = 26.1 millimeters
26.1 millimeters = 26.1 millimeters
26.1 millimeters = 26.1 millimeters
26.1 millimeters = 26.1 millimeters
26.1 millimeters = 26.1 millimeters
26.1 millimeters = 26.1 millimeters
26.1 millimeters = 26.1 millimeters
26.1 millimeters = 26.1 millimeters
26.1 millimeters = 26.1 millimeters
26.
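
For reference, the conversion itself can be checked by hand: 1 decimeter is 100 millimeters, so 26.1 mm is 26.1 / 100 = 0.261 dm, which corresponds to option A rather than the option C chosen by the model. A minimal sketch of the arithmetic follows (the function name `mm_to_dm` is hypothetical):

```python
MM_PER_DM = 100  # 1 decimeter = 10 centimeters = 100 millimeters


def mm_to_dm(millimeters):
    """Convert a length in millimeters to decimeters."""
    return millimeters / MM_PER_DM


# 26.1 mm expressed in decimeters (approximately 0.261).
print(mm_to_dm(26.1))
```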

<br>

### Analysis of results
The dataset contains a variety of instructions, covering not only Python but also SQL, Java, and other coding-related topics.
<br>In analyzing the results, we noticed that response quality differed across the types of instructions.
<br>The Python-related instructions received good responses, as shown in Examples 1 and 2.
<br>In contrast, the non-Python instructions received poor answers, as shown in Examples 3 and 4.
<br>In the poor responses, the model repeats the same sentence in a loop until the length limit is reached.
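
The looping behavior can be measured with a simple heuristic. The sketch below is an assumption for illustration, not part of the original evaluation: it splits an output into sentence-like units and reports the fraction that duplicate an earlier unit. The name `repetition_ratio` is hypothetical.

```python
import re
from collections import Counter


def repetition_ratio(text):
    """Fraction of sentence-like units in `text` that repeat an earlier unit."""
    # Split after sentence-ending punctuation followed by whitespace, or on newlines.
    units = [u.strip() for u in re.split(r"(?<=[.!?])\s+|\n+", text) if u.strip()]
    if not units:
        return 0.0
    counts = Counter(units)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(units)


looping = "The try catch block is executed if an exception is thrown. " * 4
print(repetition_ratio(looping))             # 0.75
print(repetition_ratio("One. Two. Three."))  # 0.0
```

A ratio close to 1 indicates that most of the output is a repeated sentence, as in Examples 3 and 4, while a ratio near 0 indicates little or no repetition.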

### Future Research
Each programming language has its own grammar, and training data is composed separately for each one.
<br>In the fine-tuning phase, domain shift becomes the main problem when the training data changes.
<br>Generalization based on domain-agnostic tuning should be studied and developed.

<br>

# Reference
- [Keras NLP Gemma](https://keras.io/api/keras_nlp/models/gemma/)
- [Kaggle QA with Gemma - KerasNLP Starter](https://www.kaggle.com/code/awsaf49/kaggle-qa-with-gemma-kerasnlp-starter/notebook#Google-%E2%80%93-AI-Assistants-for-Data-Tasks-with-Gemma-with-KerasNLP-and-Keras)
- [🐍🤖Make A Smart Python Assistant with Gemma (INOICHAN)](https://www.kaggle.com/code/inoueu1/make-a-smart-python-assistant-with-gemma)
- [CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences](https://arxiv.org/abs/2403.09032)