# Dummy Agent Library

This notebook follows the Hugging Face [Dummy Agent Library notebook](https://huggingface.co/agents-course/notebooks/blob/main/dummy_agent_library.ipynb), except that it uses local models served by Ollama instead of the Hugging Face Hub.

In [None]:
import os
import json
import requests

In [None]:
!ollama list
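
The same information is available over HTTP, which is handy for checking from Python that the server is reachable and the model has been pulled. A minimal sketch, assuming Ollama's `GET /api/tags` endpoint and its `{"models": [{"name": ...}]}` response shape; `installed_models` and `fetch_installed_models` are hypothetical helper names, not part of the original notebook, and the sketch uses only the standard library:

```python
import json
from urllib.request import urlopen


def installed_models(tags_response):
    """Return the model names from an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]


def fetch_installed_models(host="http://localhost:11434", timeout=5):
    """Query a running Ollama server for its locally available models."""
    with urlopen(f"{host}/api/tags", timeout=timeout) as response:
        return installed_models(json.load(response))


# Offline example using a hand-written response body (no server needed)
sample = {"models": [{"name": "llama3.2:latest"}, {"name": "mistral:latest"}]}
print(installed_models(sample))
```

Calling `fetch_installed_models()` requires a running server; the offline example only exercises the parsing step.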

## LLM Chat

In [None]:
OLLAMA_HOST = "http://localhost:11434"
MODEL_NAME = "llama3.2:latest"

In [None]:
class OllamaClient:
    def __init__(self, model_name, host=OLLAMA_HOST):
        self.model_name = model_name
        self.host = host
        self.chat = ChatCompletions(self)
    
    def text_generation(self, prompt, max_new_tokens=100, stop=None):
        """Generate text using the Ollama API"""
        url = f"{self.host}/api/generate"
        
        payload = {
            "model": self.model_name,
            "prompt": prompt,
            "stream": False,
            "options": {
                "num_predict": max_new_tokens
            }
        }
        
        if stop:
            payload["options"]["stop"] = stop
        
        response = requests.post(url, json=payload)
        response.raise_for_status()
        data = response.json()
        return data["response"]

class ChatCompletions:
    def __init__(self, client):
        self.client = client
        self.completions = self
    
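    def create(self, messages, stream=False, max_tokens=1024):
        """Generate chat completions using the Ollama API"""
        if stream:
            # Ollama streams newline-delimited JSON, which response.json() below cannot parse
            raise NotImplementedError("Streaming is not supported by this minimal client")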
        url = f"{self.client.host}/api/chat"
        
        payload = {
            "model": self.client.model_name,
            "messages": messages,
            "stream": stream,
            "options": {
                "num_predict": max_tokens
            }
        }
        
        response = requests.post(url, json=payload)
        response.raise_for_status()
        data = response.json()
        
        # Create a response object in a similar format to OpenAI/HF
        return CompletionResponse(data)

class CompletionResponse:
    def __init__(self, data):
        self.choices = [Choice(data)]

class Choice:
    def __init__(self, data):
        self.message = Message(data["message"]["content"])

class Message:
    def __init__(self, content):
        self.content = content
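
With `stream=True`, Ollama's `/api/chat` returns newline-delimited JSON, one chunk per line, so a streamed reply cannot be read with a single `response.json()` call. A minimal sketch of reassembling the text, assuming each chunk carries a `message.content` fragment and a `done` flag; `join_chat_chunks` is a hypothetical helper, not part of the client above:

```python
import json


def join_chat_chunks(lines):
    """Concatenate the content fragments of a streamed /api/chat reply."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between chunks
        chunk = json.loads(line)  # accepts both str and bytes
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final chunk is marked with "done": true
    return "".join(parts)


# Offline example with hand-written chunks (no server needed)
chunks = [
    '{"message": {"role": "assistant", "content": "Par"}, "done": false}',
    '{"message": {"role": "assistant", "content": "is"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(join_chat_chunks(chunks))  # Paris
```

Against a live server, the lines could come from `requests.post(url, json=payload, stream=True).iter_lines()`.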

In [None]:
# Create Ollama client
client = OllamaClient(MODEL_NAME)

In [None]:
# Basic text generation
output = client.text_generation(
    "The capital of france is",
    max_new_tokens=100,
)

print(output)

In [None]:
# Using the Llama 3 chat format
prompt = """<|begin_of_text|><|start_header_id|>user<|end_header_id|>

The capital of france is<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
output = client.text_generation(
    prompt,
    max_new_tokens=100,
)

print(output)
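
By default, `/api/generate` applies the model's own prompt template to `prompt`, so the special tokens written by hand above end up wrapped in a second template. Ollama's API documents a `raw` flag that sends the prompt untouched. A minimal sketch of the request payload, where `build_generate_payload` is a hypothetical helper rather than part of `OllamaClient`:

```python
def build_generate_payload(model, prompt, max_new_tokens=100, stop=None, raw=False):
    """Build a JSON payload for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "raw": raw,  # True: skip Ollama's prompt templating
        "options": {"num_predict": max_new_tokens},
    }
    if stop:
        payload["options"]["stop"] = stop
    return payload


payload = build_generate_payload(
    "llama3.2:latest",
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|>",
    raw=True,
)
print(payload["raw"])  # True
```

Posting this payload instead of the default one is one way to check whether the hand-formatted prompt and the plain prompt really behave the same.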

In [None]:
# Using the chat completions API
output = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "The capital of france is"},
    ],
    stream=False,
    max_tokens=1024,
)

print(output.choices[0].message.content)

## Dummy Agent

In the previous sections, we saw that the core of an agent library is appending information to the system prompt. (Note that Ollama applies the chat template itself, which explains the identical outputs above.)

This system prompt is a bit more complex than the one we saw earlier, but it already contains:

1. Information about the tools
2. Cycle instructions (Thought → Action → Observation)
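
The first item, the tool description, does not have to be written by hand; it can be derived from the function itself. A minimal sketch, assuming the tool has a one-line docstring and simple type hints; `describe_tool` and the `JSON_TYPES` mapping are hypothetical, introduced only for illustration:

```python
import inspect
import json

# Hypothetical mapping from Python annotations to JSON type names
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn):
    """Render 'name: summary, args: {...}' for a Python function."""
    signature = inspect.signature(fn)
    args = {
        name: {"type": JSON_TYPES.get(param.annotation, "string")}
        for name, param in signature.parameters.items()
    }
    doc = inspect.getdoc(fn) or ""
    summary = doc.splitlines()[0] if doc else ""
    return f"{fn.__name__}: {summary}, args: {json.dumps(args)}"


def get_weather(location: str) -> str:
    """Get the current weather in a given location"""
    return f"the weather in {location} is sunny with low temperatures. \n"


print(describe_tool(get_weather))
```

Unannotated parameters fall back to `"string"` in this sketch, which is a simplification rather than a general rule.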

In [None]:
# System prompt with function calling example
SYSTEM_PROMPT = """Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have an `action` key (with the name of the tool to use) and an `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use:
```
{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}
```

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:
```
$JSON_BLOB
```
Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

You must always end your output with the following format:

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer. """

In [None]:
# Format with Llama 3.2 special tokens
prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{SYSTEM_PROMPT}
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

This is equivalent to the following code, which runs inside the chat method:

```python
from transformers import AutoTokenizer

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What's the weather in London ?"},
]
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```

The prompt is now:

In [None]:
print(prompt)
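
To make the correspondence between a message list and the raw prompt concrete, the rendering can be sketched in plain Python. This is a simplified illustration of the Llama 3 layout (header tokens, a blank line, the content, then `<|eot_id|>`), not the model's actual chat template, which may add further default system text; `render_llama3_prompt` is a hypothetical helper:

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages into a simplified Llama 3 style prompt string."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in London ?"},
]
print(render_llama3_prompt(messages))
```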

In [None]:
# Generate text but stop before the function would be called
output = client.text_generation(
    prompt,
    max_new_tokens=200,
    stop=["Observation:"]
)

print(output)

Let's now create a dummy `get_weather` function. In a real situation, you would call an API instead.

In [None]:
# Dummy function
def get_weather(location):
    return f"the weather in {location} is sunny with low temperatures. \n"

get_weather('London')

In [None]:
# Execute the function and continue generation.
# The location is hard-coded here for simplicity; a real agent would parse
# the JSON blob in `output` to find the tool name and its arguments.
# That parsing depends on the exact output format of your model.

# Combine the original prompt, the model output, and the function result
new_prompt = prompt + output + "Observation: " + get_weather('London')
print(new_prompt)
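
The call above hard-codes `'London'`. A minimal sketch of reading the tool name and arguments from the model output instead, assuming the model followed the prompt and emitted a single JSON object with `action` and `action_input` keys; `parse_action`, `run_action`, and the `TOOLS` registry are hypothetical names:

```python
import json


def get_weather(location):
    return f"the weather in {location} is sunny with low temperatures. \n"


# Hypothetical registry mapping tool names to Python callables
TOOLS = {"get_weather": get_weather}


def parse_action(text):
    """Extract (tool name, arguments) from the first JSON object in the text."""
    start = text.index("{")  # raises ValueError if the model emitted no JSON
    blob, _ = json.JSONDecoder().raw_decode(text[start:])
    return blob["action"], blob["action_input"]


def run_action(text):
    """Look up the requested tool and call it with the parsed arguments."""
    name, arguments = parse_action(text)
    return TOOLS[name](**arguments)


# Hand-written stand-in for a model output
sample_output = (
    "Thought: I need to check the weather in London.\n"
    "Action:\n"
    '{"action": "get_weather", "action_input": {"location": "London"}}\n'
)
print(run_action(sample_output))
```

Real model output is less predictable than this sample, so production code would want error handling around both the JSON parsing and the tool lookup.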

In [None]:
# Continue generation with the observation
final_output = client.text_generation(
    new_prompt,
    max_new_tokens=200,
)

print(final_output)
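
The manual steps above (generate, stop at `Observation:`, run the tool, append the result, generate again) can be folded into a loop. A minimal sketch, assuming a `generate(prompt, stop)` callable such as a thin wrapper around `client.text_generation`; `run_agent`, `fake_generate`, and the `TOOLS` registry are hypothetical, and the fake model exists only so the example runs offline:

```python
import json


def get_weather(location):
    return f"the weather in {location} is sunny with low temperatures. \n"


# Hypothetical registry mapping tool names to Python callables
TOOLS = {"get_weather": get_weather}


def run_agent(prompt, generate, max_steps=5):
    """Alternate between model generation and tool calls until a final answer."""
    for _ in range(max_steps):
        output = generate(prompt, stop=["Observation:"])
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        # Treat the first JSON object in the output as the requested action
        blob, _ = json.JSONDecoder().raw_decode(output[output.index("{"):])
        observation = TOOLS[blob["action"]](**blob["action_input"])
        # Feed the real tool result back to the model
        prompt += output + "Observation: " + observation
    raise RuntimeError("No final answer within max_steps")


def fake_generate(prompt, stop=None):
    """Stand-in for the model: requests the weather once, then answers."""
    if "Observation:" not in prompt:
        return (
            "Thought: I should check the weather.\nAction:\n"
            '{"action": "get_weather", "action_input": {"location": "London"}}\n'
        )
    return "Thought: I now know the final answer\nFinal Answer: It is sunny in London."


print(run_agent("Question: What's the weather in London ?\n", fake_generate))
```

With the client from this notebook, `generate` could be `lambda p, stop: client.text_generation(p, max_new_tokens=200, stop=stop)`.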