# Create an Agent with Gemma 2 from Scratch

In this notebook, you will build a working agent from scratch with Gemma 2, the open-weights large language model family released by Google. Starting from the instruction-tuned `google/gemma-2-9b-it` checkpoint, the notebook sets up the environment and chats with the model directly. It then implements a ReAct-style agent that interleaves reasoning (Thought) with tool calls (Action, PAUSE, Observation), and finally automates that loop so the agent can answer questions by calling simple Python functions.

## References
* [AI Agents in LangGraph, Lesson 2: Build an Agent from Scratch](https://learn.deeplearning.ai/courses/ai-agents-in-langgraph/lesson/2/build-an-agent-from-scratch)
* [ReAct: Synergizing Reasoning and Acting in Language Models](https://react-lm.github.io)

## Set Up and Import Libraries

In [None]:
# !pip install --upgrade transformers

In [1]:
import os
# Expose both GPUs; this must happen before PyTorch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch
import transformers
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
from transformers import AutoModelForCausalLM, AutoTokenizer

# Prevent scaled_dot_product_attention from selecting the memory-efficient and flash kernels.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

print(f"Transformers Version: {transformers.__version__}")

Transformers Version: 4.45.1


## Load model

In [2]:
user_secrets = UserSecretsClient()
HUGGING_FACE_TOKEN = user_secrets.get_secret("HUGGING_FACE_TOKEN")
login(token=HUGGING_FACE_TOKEN, add_to_git_credential=True)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
print(model)

Token is valid (permission: fineGrained).
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the 'store' credential helper as default.

git config --global credential.helper store

Read https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage for more details.
Token has not been saved to git credential helper.
Your token has been saved to /root/.cache/huggingface/token
Login successful


Gemma2ForCausalLM(
  (model): Gemma2Model(
    (embed_tokens): Embedding(256000, 3584, padding_idx=0)
    (layers): ModuleList(
      (0-41): 42 x Gemma2DecoderLayer(
        (self_attn): Gemma2Attention(
          (q_proj): Linear(in_features=3584, out_features=4096, bias=False)
          (k_proj): Linear(in_features=3584, out_features=2048, bias=False)
          (v_proj): Linear(in_features=3584, out_features=2048, bias=False)
          (o_proj): Linear(in_features=4096, out_features=3584, bias=False)
          (rotary_emb): Gemma2RotaryEmbedding()
        )
        (mlp): Gemma2MLP(
          (gate_proj): Linear(in_features=3584, out_features=14336, bias=False)
          (up_proj): Linear(in_features=3584, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=3584, bias=False)
          (act_fn): PytorchGELUTanh()
        )
        (input_layernorm): Gemma2RMSNorm((3584,), eps=1e-06)
        (pre_feedforward_layernorm): Gemma2RMSNorm((3584,), 
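
The printed module tree is enough to sanity-check the model's size. The sketch below recomputes the parameter count from the layer shapes shown above. Because the printout is cut off, it assumes that each decoder layer has four RMSNorm weight vectors of width 3584, that there is one final norm, and that the output head shares its weights with `embed_tokens`. Note that `k_proj` and `v_proj` are half as wide as `q_proj`: Gemma 2 uses grouped-query attention, so several query heads share each key/value head.

```python
# Shapes copied from the printed Gemma2ForCausalLM module above.
HIDDEN = 3584
Q_OUT, KV_OUT = 4096, 2048   # k_proj/v_proj are narrower than q_proj (grouped-query attention)
INTERMEDIATE = 14336
VOCAB = 256000
NUM_LAYERS = 42
NORMS_PER_LAYER = 4          # assumption: input, post-attention, pre- and post-feedforward RMSNorms


def linear(in_features: int, out_features: int) -> int:
    """Weight count of a bias-free nn.Linear layer."""
    return in_features * out_features


attention = linear(HIDDEN, Q_OUT) + 2 * linear(HIDDEN, KV_OUT) + linear(Q_OUT, HIDDEN)
mlp = 3 * linear(HIDDEN, INTERMEDIATE)  # gate_proj, up_proj and down_proj have the same size
norms = NORMS_PER_LAYER * HIDDEN
per_layer = attention + mlp + norms

embedding = VOCAB * HIDDEN   # assumption: tied with the LM head, so counted once
final_norm = HIDDEN
total = NUM_LAYERS * per_layer + embedding + final_norm

bf16_gb = total * 2 / 1e9    # 2 bytes per bfloat16 weight, decimal gigabytes
print(f"parameters: {total:,}")                # parameters: 9,241,705,984
print(f"bfloat16 weights: {bf16_gb:.2f} GB")   # bfloat16 weights: 18.48 GB
```

Roughly 9.24 billion parameters at 2 bytes each is about 18.5 GB of weights, a footprint that `device_map="auto"` can split across the two visible GPUs when one is not enough.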


## Usage

In [3]:
messages = [
    { "role": "user", "content": "Write a hello world program" },
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids=input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))

The 'max_batch_size' argument of HybridCache is deprecated and will be removed in v4.46. Use the more precisely named 'batch_size' argument instead.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
```python
print("Hello, world!")
```

This program will print the text "Hello, world!" to the console. 

Here's how it works:

* **`print()`** is a built-in function in Python that displays output to the console.
* **`"Hello, world!"`** is a string literal, which is a sequence of characters enclosed in double quotes. This is the text that will be printed.


Let me know if you'd like to see this program in a different programming language!<end_of_turn>
<eos>
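
The decoded output above shows the turn markup that `apply_chat_template` produced. As a model-free illustration, the hypothetical `render_gemma_prompt` below rebuilds that string by hand. It assumes the Gemma 2 template behaves as the output suggests: a leading `<bos>`, one `<start_of_turn>{role}\n...<end_of_turn>\n` block per message with `assistant` renamed to `model`, and an open `model` turn when a generation prompt is requested. It skips the template's role checks, so use the real tokenizer template in practice; this only makes the format visible.

```python
from typing import Dict, List


def render_gemma_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hypothetical re-implementation of the Gemma 2 turn format, for illustration only."""
    parts = ["<bos>"]
    for message in messages:
        # Gemma calls the assistant role "model"; any other role name is passed through unchanged.
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Leave a model turn open so that generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(render_gemma_prompt([{"role": "user", "content": "Write a hello world program"}]))
```

Because only `assistant` is translated to `model`, a reply stored under any other role name (for example `system`) would be rendered as a turn with that literal name, which Gemma 2 does not officially support.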


## Implement a Chat Function for Gemma 2

In [4]:
history = []

def chat(user_message):
    """Send one user turn to Gemma 2, record the reply in `history`, and return it."""
    history.append({"role": "user", "content": user_message})
    prompt = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
    input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
    outputs = model.generate(input_ids=input_ids, max_new_tokens=512)
    # Decode only the newly generated tokens, dropping special tokens such as <end_of_turn> and <eos>.
    content = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()
    # Store the reply as "assistant"; Gemma's chat template renders that role as the "model" turn.
    message = {"role": "assistant", "content": content}
    history.append(message)
    return message
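
Gemma 2's chat template, as published with the checkpoint on the Hugging Face Hub, raises an error when the first message is a `system` message and requires `user` turns at even positions. It does not itself insist that odd positions are `assistant`: a `system` reply at an odd position is rendered as a literal `<start_of_turn>system` turn instead of a `model` turn. The hypothetical `check_gemma_history` below is a stricter pre-flight check (a sketch, not part of Transformers) that expects strictly alternating `user` and `assistant` roles, the form the template maps onto Gemma's user/model turns.

```python
from typing import Dict, List


def check_gemma_history(history: List[Dict[str, str]]) -> None:
    """Hypothetical check: raise ValueError unless roles alternate user/assistant from the start."""
    if history and history[0]["role"] == "system":
        raise ValueError("Gemma 2's chat template does not accept a system message")
    for index, message in enumerate(history):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"message {index} has role {message['role']!r}, expected {expected!r}")


check_gemma_history([
    {"role": "user", "content": "Hi, what is your name?"},
    {"role": "assistant", "content": "My name is Gemma."},
])  # passes silently

try:
    check_gemma_history([
        {"role": "user", "content": "Hi, what is your name?"},
        {"role": "system", "content": "My name is Gemma."},
    ])
except ValueError as err:
    print(err)  # message 1 has role 'system', expected 'assistant'
```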

In [5]:
from IPython.display import Markdown, display
message = chat("Hi, what is your name?")
display(Markdown(message["content"]))

My name is Gemma. I am an open-weights AI assistant.

In [10]:
message = chat("My name is Lonnie, nice to meet you.")
display(Markdown(message["content"]))

It's nice to meet you too, Lonnie!  

Is there anything else I can help you with today?

In [7]:
message = chat("Implement a hello world function using javascript.")
display(Markdown(message["content"]))

```javascript
function helloWorld() {
  console.log("Hello, world!");
}

helloWorld(); 
```

**Explanation:**

* **`function helloWorld() { ... }`**: This defines a function named `helloWorld`. Functions are blocks of code that perform a specific task.
* **`console.log("Hello, world!");`**: This line is inside the function. It uses `console.log` to print the text "Hello, world!" to the console (which is where you'll see the output in your browser's developer tools or in a terminal if you're running this code).
* **`helloWorld();`**: This line *calls* the `helloWorld` function, which executes the code inside it and prints the message.

**To run this code:**

1. **Save it as a `.js` file** (e.g., `hello.js`).
2. **Open your browser's developer console:** Usually, you can do this by pressing F12.
3. **Go to the "Console" tab.**
4. **Drag and drop the `.js` file into the console** or use the `console.log()` function to print the output.



Let me know if you have any other questions!

In [8]:
message = chat("How about implementing a welcome page in StreamLit?")
display(Markdown(message["content"]))

```python
import streamlit as st

st.title("Welcome to My Streamlit App!")

st.write("This is a simple welcome page.")

st.write("Explore more features by clicking the navigation bar.")
```

**Explanation:**

1. **Import Streamlit:** `import streamlit as st` imports the Streamlit library and gives it the alias `st` for easier use.

2. **Set the Title:** `st.title("Welcome to My Streamlit App!")` creates a large, centered title for your page.

3. **Add Text:** `st.write("This is a simple welcome page.")` displays a paragraph of text. You can use `st.write` for various data types like strings, lists, and even code.

4. **Add a Call to Action:** `st.write("Explore more features by clicking the navigation bar.")` encourages users to explore further.

**To run this code:**

1. **Make sure you have Streamlit installed:** `pip install streamlit`
2. **Save the code as a `.py` file** (e.g., `app.py`).
3. **Run the file from your terminal:** `streamlit run app.py`
4. **Open your web browser** and go to the address shown in the terminal (usually `http://localhost:8501`).

This will open your Streamlit app with the welcome page.



Let me know if you have any other questions or want to add more elements to your welcome page!

## Implement a ReAct Agent

In [9]:
class Agent:
    """Minimal ReAct agent that keeps the whole conversation with Gemma 2 in `self.messages`."""

    def __init__(self, system=""):
        self.system = system
        self.messages = []
        if self.system:
            # Gemma 2 has no system role, so the instructions are sent as the first user turn;
            # the model's acknowledgement becomes the first assistant turn.
            self.__call__(system)

    def __call__(self, message):
        self.messages.append({"role": "user", "content": message})
        result = self.execute()
        self.messages.append({"role": "assistant", "content": result})
        return result

    def execute(self):
        prompt = tokenizer.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
        outputs = model.generate(input_ids=input_ids, max_new_tokens=512)
        # Decode only the new tokens, without special tokens such as <end_of_turn> and <eos>.
        return tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()
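
`Agent` works around Gemma 2's lack of a system role by sending the system prompt as an ordinary user turn, which costs one extra generation just to obtain an acknowledgement. A common alternative, sketched below with a hypothetical helper `fold_system_prompt`, is to prepend the system text to the first user message so that no extra turn is needed. This is a swapped-in technique, not what this notebook does.

```python
from typing import Dict, List


def fold_system_prompt(system: str, messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Hypothetical helper: return a copy of `messages` with `system` prepended to the first user turn."""
    folded = [dict(m) for m in messages]  # copy each message so the caller's list is left untouched
    if system and folded and folded[0]["role"] == "user":
        folded[0]["content"] = f"{system}\n\n{folded[0]['content']}"
    return folded


msgs = [{"role": "user", "content": "How much does a toy poodle weigh?"}]
print(fold_system_prompt("You run in a loop of Thought, Action, PAUSE, Observation.", msgs)[0]["content"])
```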

In [11]:
prompt = """
You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run one of the actions available to you - then return PAUSE.
Observation will be the result of running those actions.

Your available actions are:

calculate:
e.g. calculate: 4 * 7 / 3
Runs a calculation and returns the number - uses Python so be sure to use floating point syntax if necessary

average_dog_weight:
e.g. average_dog_weight: Collie
returns average weight of a dog when given the breed

Example session:

Question: How much does a Bulldog weigh?
Thought: I should look the dogs weight using average_dog_weight
Action: average_dog_weight: Bulldog
PAUSE

You will be called again with this:

Observation: A Bulldog weights 51 lbs

You then output:

Answer: A bulldog weights 51 lbs
""".strip()

In [12]:
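def calculate(what):
    # WARNING: eval runs Python written by the model. Empty globals hide the builtins,
    # but this is still not a sandbox; only use it with trusted input.
    return eval(what, {"__builtins__": {}}, {})

def average_dog_weight(name):
    # Normalize the breed so "Toy Poodle", "toy poodle" and " Toy Poodle " all match.
    # (Testing `name in "Scottish Terrier"` would instead match any substring, even "".)
    breed = name.strip().lower()
    if "scottish terrier" in breed:
        return "Scottish Terriers average 20 lbs"
    elif "border collie" in breed:
        return "a Border Collies average weight is 37 lbs"
    elif "toy poodle" in breed:
        return "a toy poodles average weight is 7 lbs"
    else:
        return "An average dog weighs 50 lbs"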

known_actions = {
    "calculate": calculate,
    "average_dog_weight": average_dog_weight
}
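
`calculate` hands whatever the model writes to `eval`, so a reply such as `calculate: __import__('os').system(...)` could run arbitrary code. A safer option, sketched below as a hypothetical `safe_calculate`, parses the expression with Python's `ast` module and evaluates only numeric literals combined with the operators listed in `_OPERATORS`. This is a swapped-in technique, not part of the original tutorial; exponentiation is deliberately left out because expressions like `9 ** 9 ** 9` can exhaust memory.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}


def safe_calculate(expression: str) -> Number:
    """Evaluate numeric +, -, *, /, //, % expressions; raise ValueError for anything else."""

    def evaluate(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return evaluate(ast.parse(expression.strip(), mode="eval"))


print(safe_calculate("3 * 88 + 99"))  # 363
```

To use it, map `"calculate"` to `safe_calculate` in `known_actions`; the observations for the arithmetic used in this notebook stay the same.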

In [14]:
agent = Agent(prompt)
result = agent("How much does a toy poodle weigh?")
print(result)

Thought: I should look up the average weight of a toy poodle using the average_dog_weight action. 
Action: average_dog_weight: Toy Poodle
PAUSE


In [15]:
result = average_dog_weight("Toy Poodle")
print(result)

a toy poodles average weight is 7 lbs


In [16]:
next_prompt = "Observation: {}".format(result)
agent(next_prompt)

"Answer: A toy poodle's average weight is 7 lbs."

In [17]:
display(agent.messages)

[{'role': 'user',
  'content': 'You run in a loop of Thought, Action, PAUSE, Observation.\nAt the end of the loop you output an Answer\nUse Thought to describe your thoughts about the question you have been asked.\nUse Action to run one of the actions available to you - then return PAUSE.\nObservation will be the result of running those actions.\n\nYour available actions are:\n\ncalculate:\ne.g. calculate: 4 * 7 / 3\nRuns a calculation and returns the number - uses Python so be sure to use floating point syntax if necessary\n\naverage_dog_weight:\ne.g. average_dog_weight: Collie\nreturns average weight of a dog when given the breed\n\nExample session:\n\nQuestion: How much does a Bulldog weigh?\nThought: I should look the dogs weight using average_dog_weight\nAction: average_dog_weight: Bulldog\nPAUSE\n\nYou will be called again with this:\n\nObservation: A Bulldog weights 51 lbs\n\nYou then output:\n\nAnswer: A bulldog weights 51 lbs'},
 {'role': 'assistant',
  'content': "Okay, I und

In [18]:
agent = Agent(prompt)

In [19]:
question = """I have 2 dogs, a border collie and a scottish terrier. \
What is their combined weight"""
print(agent(question))

Thought: I need to find the average weight of each breed and then add them together. 
Action: average_dog_weight: Border Collie
PAUSE


In [20]:
next_prompt = "Observation: {}".format(average_dog_weight("Border Collie"))
print(next_prompt)

Observation: a Border Collies average weight is 37 lbs


In [21]:
print(agent(next_prompt))

Thought: Now I need the average weight of a Scottish Terrier and then add it to the Border Collie's weight.
Action: average_dog_weight: Scottish Terrier
PAUSE


In [22]:
next_prompt = "Observation: {}".format(eval("37 + 20"))
print(next_prompt)

Observation: 57


In [23]:
agent(next_prompt)

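'Thought:  Okay, now I add the two weights together.\nAction: calculate: 37 + 57\nPAUSE'

This step feeds the agent the precomputed sum (`eval("37 + 20")`, i.e. 57) as the observation, although the agent had asked for the Scottish Terrier's weight. The model therefore reads 57 as the terrier's weight and proposes `37 + 57`. The correct observation here is `average_dog_weight("Scottish Terrier")`; the automated `query` loop in the next section passes each tool result back in order and reaches the right answer of 57 lbs.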

## Automate the Action Loop

In [24]:
import re

# Matches lines such as "Action: calculate: 37 + 20" and captures the action name and its input.
action_re = re.compile(r'^Action: (\w+): (.*)$')

def query(question, max_turns=5):
    """Run the Thought/Action/Observation loop until the model stops requesting actions."""
    bot = Agent(prompt)
    next_prompt = question
    for _ in range(max_turns):
        result = bot(next_prompt)
        print(result)
        actions = [m for m in (action_re.match(line) for line in result.split('\n')) if m]
        if not actions:
            # No action requested: the reply is the final answer.
            return
        # Run the first requested action and feed its result back as an observation.
        action, action_input = actions[0].groups()
        if action not in known_actions:
            raise ValueError("Unknown action: {}: {}".format(action, action_input))
        print(" -- running {} {}".format(action, action_input))
        observation = known_actions[action](action_input)
        print("Observation:", observation)
        next_prompt = "Observation: {}".format(observation)
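
Because `query` is tied to the loaded Gemma model, its control flow is hard to test on its own. The sketch below separates the two: a hypothetical `react_loop` takes any `llm` callable that maps the latest prompt to a reply, plus a tool table shaped like `known_actions`, and the demo drives it with scripted replies modeled on the transcript below instead of a real model. As a small design change from `query`, it returns the final reply instead of `None`. The tools here are toy stand-ins, not the notebook's functions.

```python
import re
from typing import Callable, Dict, List, Optional

ACTION_RE = re.compile(r"^Action: (\w+): (.*)$")


def react_loop(question: str, llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]], max_turns: int = 5) -> Optional[str]:
    """Run Thought/Action/Observation turns until the model stops requesting actions."""
    next_prompt = question
    for _ in range(max_turns):
        reply = llm(next_prompt)
        matches = [m for m in (ACTION_RE.match(line.strip()) for line in reply.splitlines()) if m]
        if not matches:
            return reply  # no action requested: treat the reply as the final answer
        name, arg = matches[0].groups()
        if name not in tools:
            raise ValueError(f"Unknown action: {name}: {arg}")
        next_prompt = f"Observation: {tools[name](arg.strip())}"
    return None  # gave up after max_turns


# Scripted stand-in for the model: replays fixed replies and records every prompt it receives.
scripted = iter([
    "Thought: look up both breeds.\nAction: average_dog_weight: Border Collie\nPAUSE",
    "Thought: now the terrier.\nAction: average_dog_weight: Scottish Terrier\nPAUSE",
    "Thought: add them.\nAction: calculate: 37 + 20\nPAUSE",
    "Answer: The combined weight is 57 lbs.",
])
seen_prompts: List[str] = []


def scripted_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return next(scripted)


weights = {"Border Collie": "37 lbs", "Scottish Terrier": "20 lbs"}
tools = {
    "average_dog_weight": lambda breed: weights.get(breed, "50 lbs"),
    "calculate": lambda expr: str(sum(int(term) for term in expr.split("+"))),  # addition only, for the demo
}

QUESTION = "What is the combined weight of a border collie and a scottish terrier?"
result = react_loop(QUESTION, scripted_llm, tools)
print(result)  # Answer: The combined weight is 57 lbs.
```

Injecting the model as a plain callable keeps the loop logic testable offline; in the notebook, `llm` would wrap an `Agent` instance, for example `llm=Agent(prompt)`.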

In [25]:
question = """I have 2 dogs, a border collie and a scottish terrier. \
What is their combined weight"""
query(question)

Thought: I need to find the average weight of each breed and then add them together. 
Action: average_dog_weight: Border Collie
PAUSE
 -- running average_dog_weight Border Collie
Observation: a Border Collies average weight is 37 lbs
Thought: Now I need the average weight of a Scottish Terrier and then add it to the Border Collie's weight.
Action: average_dog_weight: Scottish Terrier
PAUSE
 -- running average_dog_weight Scottish Terrier
Observation: Scottish Terriers average 20 lbs
Thought:  Time to add the weights together.
Action: calculate: 37 + 20 
PAUSE
 -- running calculate 37 + 20 
Observation: 57
Answer: The combined weight of a Border Collie and a Scottish Terrier is 57 lbs.


In [26]:
question = """Calculate 3 * 88 + 99"""
query(question)

Thought: To get the answer, I need to multiply 3 by 88 and then add 99. 

Action: calculate: 3 * 88 + 99
PAUSE
 -- running calculate 3 * 88 + 99
Observation: 363
Answer: 363


In [28]:
question = """Do you know the result of 99 * 88 + 1289?"""
query(question)

Thought: I can calculate that using the 'calculate' action. 
Action: calculate: 99 * 88 + 1289
PAUSE
 -- running calculate 99 * 88 + 1289
Observation: 10001
Answer: 99 * 88 + 1289 = 10001
