## Serverless API
The Hugging Face ecosystem offers a convenient feature called Serverless API that lets you run inference on many models without installing or deploying anything.

To run this notebook, you need a Hugging Face token, which you can get from https://hf.co/settings/tokens. If you are running this notebook on Google Colab, you can add the token in the "settings" tab under "secrets". Make sure to name it "HF_TOKEN".

If you haven't done so before, you also need to request access to the Meta Llama models. Approval usually takes up to an hour.
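
The token setup described above can be sketched as a small helper. This is only an illustration: `load_hf_token` is a hypothetical name, and the fallback assumes the secret was stored in Colab under the same name as the environment variable.

```python
import os
from typing import Optional


def load_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token from the environment, falling back to Colab secrets."""
    token = os.environ.get(env_var)
    if token:
        return token
    try:
        # This module is only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        return None
    try:
        return userdata.get(env_var)
    except Exception:
        # The secret is missing or the notebook was not granted access to it.
        return None


token = load_hf_token()
print("Token found" if token else "No token found: set HF_TOKEN first")
```

Checking for the token up front gives a clearer message than a failed inference call later in the notebook.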

In [None]:
import os
from huggingface_hub import InferenceClient

# os.environ["HF_TOKEN"]="hf_xxxxxxxxxxx"

client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")

In [2]:
# As seen in the LLM section, plain decoding only stops when the model predicts an EOS token.
# That does not happen here: this is a conversational (chat) model, and we didn't apply the chat template it expects.
output = client.text_generation(
    "The capital of france is",
    max_new_tokens=100,
)

print(output)

 a great way to get started with the basics of photography. It's a great way to learn about composition, lighting, and other fundamental concepts of photography.

Here are some tips to get you started with photography:

1.  **Understand your camera**: Familiarize yourself with your camera's settings and features. Read the manual or online resources to learn about the different modes, such as manual, aperture priority, and shutter priority.
2.  **Practice, practice, practice**: The more you


In [4]:
# If we now add the special tokens expected by the Llama 3.2 model, the behaviour changes and is now the expected one.
prompt="""<|begin_of_text|><|start_header_id|>user<|end_header_id|>

The capital of france is<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
output = client.text_generation(
    prompt,
    max_new_tokens=100,
)

print(output)


...Paris!


In [6]:
output = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "The capital of france is"},
    ],
    stream=False,
    max_tokens=1024,
)
print(output.choices[0].message.content)


Paris.


The chat method is the recommended way to call the model, because it keeps your code working when you switch between models. Since this notebook is only educational, we will keep using the "text_generation" method to understand the details.
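
To make explicit what the chat method spares us from writing by hand, here is a minimal sketch that wraps a list of messages in the Llama 3 special tokens used in the cell above. `build_llama3_prompt` is a hypothetical helper written for this notebook, not part of `huggingface_hub`, and it only reproduces the token layout shown earlier; the real chat template of a given model may add more content.

```python
def build_llama3_prompt(messages, add_generation_prompt=True):
    """Format chat messages with Llama 3 style special tokens (simplified)."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model answers instead of continuing the user text.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [{"role": "user", "content": "The capital of france is"}]
print(build_llama3_prompt(messages))
```

The printed string matches the hand-written prompt used with "text_generation" above. Every model family has its own special tokens, which is why hard-coding them does not carry over from one model to another.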

## Dummy Agent
In the previous sections, we saw that the core job of an agent library is to append information to the system prompt.

The system prompt used here is a bit more complex than the one we saw earlier. It already contains:

1. Information about the tools
2. Cycle instructions (Thought → Action → Observation)
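
As an illustration of the first item, the sketch below derives a tool description line from a Python function instead of typing it by hand. `describe_tool` and `JSON_TYPES` are hypothetical names, the type mapping is deliberately tiny, and the `get_weather` stub mirrors the dummy tool used in this notebook; real agent libraries do something similar with more care.

```python
import inspect
import json

# Minimal mapping from Python annotations to JSON schema type names (assumption: unknown types become "string").
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func):
    """Build a one-line tool description from a function's name, docstring, and parameters."""
    doc = inspect.getdoc(func) or ""
    args = {
        name: {"type": JSON_TYPES.get(param.annotation, "string")}
        for name, param in inspect.signature(func).parameters.items()
    }
    return f"{func.__name__}: {doc}, args: {json.dumps(args)}"


def get_weather(location: str) -> str:
    """Get the current weather in a given location"""
    return f"the weather in {location} is sunny with low temperatures. \n"


print(describe_tool(get_weather))
```

The resulting line has the same shape as the tool entry in the system prompt below, so it could be appended to the prompt for each available tool.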

In [20]:
# This system prompt is a bit more complex than the previous one.
# Here we assume that the textual description of the tools has already been appended to it.
SYSTEM_PROMPT = """Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use :
```
{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:
```
$JSON_BLOB
```
Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

You must always end your output with the following format:

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer. """

In [21]:
# Since we are calling "text_generation" directly, we need to add the right special tokens ourselves.
prompt=f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{SYSTEM_PROMPT}
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

This mirrors what happens inside the chat method, which applies the model's chat template:

```
from transformers import AutoTokenizer

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What's the weather in London ?"},
]

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```

The prompt is now:

In [22]:
print(prompt)


<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use :
```
{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:
```
$JSON_BLOB
```
Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/

In [23]:
# Do you see the problem?
output = client.text_generation(
    prompt,
    max_new_tokens=200,
)

print(output)

 
 1.  **Identify the problem**: The problem is that the text is not providing a clear answer to the question. It's more focused on the process of creating a website.
 2.  **Provide a clear answer**: The answer to the question "can you write a html code for a simple website" is yes, and here is a simple html code for a website:
 
  ```html
<!DOCTYPE html>
<html>
<head>
    <title>Simple Website</title>
</head>
<body>
    <h1>Welcome to my website</h1>
    <p>This is a simple website created using HTML</p>
</body>
</html>
```
 3.  **Explain the code**: This HTML code creates a simple website with a title, a heading, and a paragraph of text. The `<!DOCTYPE html>` declaration tells the browser that this is an HTML document. The `<html>` element is the root element of the document


In [24]:
# The answer was hallucinated by the model. We need to stop generation so that we can actually execute the function!
output = client.text_generation(
    prompt,
    max_new_tokens=200,
    stop=["Observation:"],  # Let's stop before any actual function is called
)

print(output)

 
defence decisions about the content of your posts. Here's an example of how you could rephrase the question to focus on the topic of university campuses and the potential for cyber threats:

"How can universities in India prevent cyber threats on their campuses, such as exploring ways to detect and prevent cyber attacks, and how can they educate students and faculty about online safety and cyber threats, and cyber attacks, and online safety, and cyber security, and online safety, and cyber threats, and cyber attacks, and online safety, and cyber security, and online safety, and cyber attacks, and cyber threats, and cyber attacks, and cyber threats, and cyber attacks, and cyber threats, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks,

In [25]:
# Dummy function
def get_weather(location):
    return f"the weather in {location} is sunny with low temperatures. \n"

get_weather('London')

'the weather in London is sunny with low temperatures. \n'
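
Before a tool can be called, the Action blob has to be pulled out of the completion. The sketch below shows one way to do that, assuming the model followed the format from the system prompt. `parse_action` is a hypothetical helper, and `sample` is a hand-written completion for illustration, not real model output.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence, built this way to keep this example readable
ACTION_PATTERN = re.compile(
    r"Action:\s*" + FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE,
    re.DOTALL,
)


def parse_action(completion):
    """Return (tool_name, tool_args) from the first Action blob, or None if there is none."""
    match = ACTION_PATTERN.search(completion)
    if match is None:
        return None
    try:
        blob = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(blob, dict) or "action" not in blob:
        return None
    return blob["action"], blob.get("action_input", {})


# Hand-written completion in the expected format (not real model output).
sample = (
    "Thought: I need the current weather in London.\n"
    "Action:\n" + FENCE + "\n"
    '{"action": "get_weather", "action_input": {"location": "London"}}\n'
    + FENCE + "\nObservation:"
)
print(parse_action(sample))
```

Returning `None` for anything that does not match lets the caller decide what to do when the model ignores the format, instead of failing inside the parser.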

In [26]:
# Let's concatenate the base prompt, the completion up to the function call, and the result of the function as an Observation
new_prompt = prompt + output + get_weather('London')
print(new_prompt)

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use :
```
{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:
```
$JSON_BLOB
```
Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/

In [27]:
final_output = client.text_generation(
    new_prompt,
    max_new_tokens=200,
)

print(final_output)

The best way to prevent cyber threats on university campuses is to implement robust security measures, such as firewalls, intrusion detection systems, and encryption. Additionally, educating students and faculty about online safety and cyber security can help prevent cyber threats. Universities can also establish a cyber security team to monitor and respond to cyber threats in real-time. Furthermore, implementing a Bring Your Own Device (BYOD) policy can help prevent cyber threats, and cyber attacks, and cyber threats, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, 
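
The manual steps in this section (generate, stop at the Observation, run the tool, append its result, generate again) can be tied together in a small loop. The following is a minimal sketch under stated assumptions: `run_agent`, `make_scripted_llm` and `TOOLS` are hypothetical names, and the language model is replaced by scripted, hand-written replies so the example runs without any API call.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence
ACTION_PATTERN = re.compile(
    r"Action:\s*" + FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE,
    re.DOTALL,
)


def get_weather(location):
    return f"the weather in {location} is sunny with low temperatures. \n"


TOOLS = {"get_weather": get_weather}


def run_agent(generate, prompt, max_steps=3):
    """Alternate between generation and tool execution until a final answer appears."""
    for _ in range(max_steps):
        completion = generate(prompt)
        match = ACTION_PATTERN.search(completion)
        if match is not None:
            blob = json.loads(match.group(1))
            tool = TOOLS.get(blob["action"])
            if tool is None:
                observation = f"Unknown tool: {blob['action']}\n"
            else:
                observation = tool(**blob.get("action_input", {}))
            # Keep the text up to the end of the Action block, then add the real Observation.
            prompt += completion[: match.end()] + "\nObservation: " + observation
        elif "Final Answer:" in completion:
            return completion.split("Final Answer:", 1)[1].strip()
        else:
            return None  # The model did not follow the expected format.
    return None


def make_scripted_llm(replies):
    """Stand-in for the model: returns hand-written replies in order."""
    replies = iter(replies)
    return lambda prompt: next(replies)


scripted = make_scripted_llm([
    "Thought: I should check the weather in London.\nAction:\n" + FENCE + "\n"
    '{"action": "get_weather", "action_input": {"location": "London"}}\n'
    + FENCE + "\nObservation:",
    "Thought: I now know the final answer\n"
    "Final Answer: The weather in London is sunny with low temperatures.",
])
print(run_agent(scripted, "What's the weather in London ?\n"))
```

With the real client, `generate` could be a small function that calls `client.text_generation` with `stop=["Observation:"]`, as in the cells above. The loop only behaves well when the model actually emits the Thought/Action format, which is what agent libraries work to enforce.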