# Dummy Agent Library

Prerequisites:
- Create a token with read permissions on the Hugging Face website
- Request access to the model by following Meta's access requirements
- Create a `.env` file containing the token as `HF_TOKEN`
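
A missing or empty token tends to surface later as a confusing authentication error, so it can help to check for it up front. The sketch below assumes the token is stored under `HF_TOKEN`, the name the notebook reads with `os.getenv`. The helper `require_token` is hypothetical and not part of `huggingface_hub` or `python-dotenv`.

```python
import os


def require_token(name="HF_TOKEN", env=None):
    """Return the token stored under `name`, or raise a clear error if it is missing."""
    # `env` lets callers pass a plain dict for testing; by default the real
    # process environment is used (populated by load_dotenv() in the notebook).
    env = os.environ if env is None else env
    token = (env.get(name) or "").strip()
    if not token:
        raise RuntimeError(
            f"{name} is not set; add a line like {name}=<your token> to .env "
            "and call load_dotenv() before reading it"
        )
    return token


# Usage with a stand-in environment (a real run would read os.environ instead).
print(require_token(env={"HF_TOKEN": "example-token"}))
```

In the notebook, the returned value could be passed to `login(token=...)` in place of the direct `os.getenv("HF_TOKEN")` call.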


First, test text generation (other APIs may call this `complete`) and the chat feature.


In [10]:
import os
from dotenv import load_dotenv
from huggingface_hub import login
from huggingface_hub import InferenceClient

In [13]:
load_dotenv()
# login to the Hugging Face Hub
login(token=os.getenv("HF_TOKEN"))
client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


In [14]:
output = client.text_generation(
    "The capital of France is",
    max_new_tokens=100,
)

print(output)

 Paris. The capital of Italy is Rome. The capital of Spain is Madrid. The capital of Germany is Berlin. The capital of the United Kingdom is London. The capital of Australia is Canberra. The capital of China is Beijing. The capital of Japan is Tokyo. The capital of India is New Delhi. The capital of Brazil is Brasília. The capital of Russia is Moscow. The capital of South Africa is Pretoria. The capital of Egypt is Cairo. The capital of Turkey is Ankara. The


Note that generation continues until it hits the `max_new_tokens` limit of 100. The model stops on its own only when it predicts an `EOS` (End of Sequence) token, which did not happen here.
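
The two stopping conditions can be illustrated with a toy loop. This is a simplified sketch, not how the inference server is implemented: `EOS`, `generate`, and the two stand-in "models" are hypothetical names used only for illustration.

```python
EOS = "<eos>"  # stand-in for the model's end-of-sequence token


def generate(next_token, max_new_tokens):
    """Collect tokens until the model emits EOS or the token budget runs out."""
    tokens = []
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token == EOS:
            break  # the model chose to stop
        tokens.append(token)
    return tokens


def never_stops(tokens):
    """Toy model that never predicts EOS, like the bare completion prompt."""
    return "capital"


def stops_after_two(tokens):
    """Toy model that predicts EOS once two tokens have been produced."""
    return EOS if len(tokens) >= 2 else "Paris"


print(len(generate(never_stops, max_new_tokens=100)))      # 100
print(len(generate(stops_after_two, max_new_tokens=100)))  # 2
```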

However, when the prompt is wrapped in the model's chat template, using the special tokens shown in the next cell, the model stops naturally, as in a normal chat conversation.

In [15]:
prompt = """
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
The capital of France is<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

output = client.text_generation(
    prompt,
    max_new_tokens=100,
)

print(output)

...Paris!
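
The hand-written prompt above follows a regular pattern, so it can be generated from a list of messages. The helper below is a minimal sketch that reuses the special tokens from that cell. `build_llama3_prompt` is a hypothetical name, and the exact whitespace is an assumption: the model's own chat template remains the authoritative format and may add extra content.

```python
def build_llama3_prompt(messages, add_generation_prompt=True):
    """Wrap chat messages in the special tokens used in the hand-written prompt."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header asks the model to write the next turn.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n")
    return "".join(parts)


prompt = build_llama3_prompt([{"role": "user", "content": "The capital of France is"}])
print(prompt)
```

Because every turn ends with `<|eot_id|>`, the model has a natural end-of-turn marker to predict after its answer.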


We can also call the chat method directly, which is the recommended approach because it ensures a smooth transition between models.

In [23]:
output = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "The capital of France is"},
    ],
    stream=False,
    max_tokens=100,
)

print(output.choices[0].message.content)

Paris.


# Dummy agent
 
The agent's system prompt should encompass two critical components:

1. **Tool Specifications**
    - Detailed descriptions of available tools
    - Input/output formats
    - Usage constraints and requirements

2. **Reasoning Framework**
    - Thought: Analyze the task and plan the approach
    - Action: Execute the chosen tool or response
    - Observation: Process the results and determine next steps

This structured approach ensures consistent and effective agent responses.
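
The Thought/Action/Observation cycle can be sketched as a small loop. Everything here is illustrative: `scripted_model` is a hypothetical stand-in for an LLM that returns already-parsed steps, and `get_weather` is a dummy tool that returns a canned string rather than calling a real weather service.

```python
def get_weather(location):
    """Dummy tool: returns a canned string instead of calling a weather service."""
    return f"the weather in {location} is sunny with low temperatures."


TOOLS = {"get_weather": get_weather}


def scripted_model(history):
    """Hypothetical stand-in for an LLM: requests one tool call, then answers."""
    observations = [step for step in history if step["type"] == "observation"]
    if not observations:
        return {
            "type": "action",
            "thought": "I need the current weather.",
            "action": "get_weather",
            "action_input": {"location": "London"},
        }
    return {
        "type": "final",
        "thought": "I now know the final answer",
        "answer": observations[-1]["content"],
    }


def run_agent(model, max_steps=5):
    """Alternate Thought/Action and Observation until the model gives a final answer."""
    history = []
    for _ in range(max_steps):
        step = model(history)
        history.append(step)
        if step["type"] == "final":
            return step["answer"], history
        # Execute the requested tool and feed the result back as an Observation.
        result = TOOLS[step["action"]](**step["action_input"])
        history.append({"type": "observation", "content": result})
    raise RuntimeError("no final answer within max_steps")


answer, history = run_agent(scripted_model)
print(answer)
```

The `max_steps` cap is a design choice for the sketch: it keeps a misbehaving model from looping forever.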

In [24]:
SYSTEM_PROMPT = '''
    Answer the following questions as best you can. You have access to the following tools:

    get_weather: Get the current weather in a given location

    The way you use the tools is by specifying a json blob.
    Specifically, this json should have an `action` key (with the name of the tool to use) and an `action_input` key (with the input to the tool going here).

    The only values that should be in the "action" field are:
    get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
    example use : 

    {{
      "action": "get_weather",
      "action_input": {"location": "New York"}
    }}

    ALWAYS use the following format:

    Question: the input question you must answer
    Thought: you should always think about one action to take. Only one action at a time in this format:
    Action:

    $JSON_BLOB (inside markdown cell)

    Observation: the result of the action. This Observation is unique, complete, and the source of truth.
    ... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

    You must always end your output with the following format:

    Thought: I now know the final answer
    Final Answer: the final answer to the original input question

    Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer.
'''
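
Since the prompt asks the model to emit a JSON blob after `Action:`, the calling code needs a way to pull that blob out of the generated text. The following is a minimal sketch under that assumption. `parse_action` and `SAMPLE_OUTPUT` are hypothetical, and real model output may deviate from the requested format.

```python
import json

# Hand-written example of a reply that follows the format requested above.
SAMPLE_OUTPUT = """Thought: I need the current weather in London.
Action:

{
  "action": "get_weather",
  "action_input": {"location": "London"}
}
"""


def parse_action(text):
    """Return (tool_name, tool_input) from the first JSON object after 'Action:'."""
    marker = text.find("Action:")
    if marker == -1:
        return None  # no tool call, e.g. the reply is a final answer
    start = text.find("{", marker)
    if start == -1:
        return None
    # raw_decode parses one JSON value and ignores any text that follows it.
    blob, _ = json.JSONDecoder().raw_decode(text[start:])
    return blob["action"], blob["action_input"]


tool, tool_input = parse_action(SAMPLE_OUTPUT)
print(tool, tool_input)
```

Using `raw_decode` instead of a regular expression keeps the parser from being confused by braces that appear later in the text, such as in a following Observation.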

In [None]:
prompt=f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{SYSTEM_PROMPT}
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""


In [25]:
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What is the weather in London?"}
]

In [None]:
from transformers import AutoTokenizer

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What's the weather in London ?"},
]

# Load the model's tokenizer, which carries its chat template.
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct", token=os.getenv("HF_TOKEN")
)

# Render the messages as one prompt string that ends with an open assistant header.
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)