# Orchestrating Hugging Face Transformers Agents

### First install all relevant packages

In [1]:
#%pip install --upgrade huggingface_hub transformers[agents] duckduckgo-search

### Then import the needed libraries

In [34]:
from transformers.agents import HfApiEngine, ReactJsonAgent, ReactCodeAgent, DuckDuckGoSearchTool, ManagedAgent

In [3]:
from huggingface_hub import login, InferenceClient 

In [4]:
import torch, json, os, re

### Log in to Hugging Face to use their infrastructure

In [5]:
hf_token = "<YOURTOKEN>"
login(hf_token, add_to_git_credential=True)

Token is valid (permission: read).
Your token has been saved in your configured git credential helpers (stor).
Your token has been saved to /home/moebius/.cache/huggingface/token
Login successful
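
A token pasted into a cell is easy to leak when the notebook is shared. As a minimal sketch, assuming the token has been exported in an environment variable called `HF_TOKEN`, the value could be read at runtime instead. The helper name `read_hf_token` is hypothetical and not part of `huggingface_hub`.

```python
import os

def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in `env_var`, or raise if it is missing or blank."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token

# Usage in the notebook (assumes `login` was imported as in the cell above):
# login(read_hf_token(), add_to_git_credential=True)
```

The explicit error is a design choice: a missing token fails at the login cell rather than later, when the first remote call is rejected.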


### Now define the repository you want to use

In [6]:
repo_id = "Qwen/Qwen2.5-72B-Instruct"

### Select the right remote engine

In [7]:
llm_engine = HfApiEngine(model=repo_id)  

### And set up the agent

##### How can I build an agent?

To initialize an agent, you need these arguments:

   1. An LLM to power your agent (Qwen2.5-72B-Instruct in this case). The agent is not the LLM itself; it is a program that uses an LLM as its engine.
   2. A system prompt: what the LLM engine will be prompted with to generate its output.
   3. A toolbox from which the agent picks tools to execute.
   4. A parser to extract from the LLM output which tools to call and with which arguments.
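
To make the four ingredients concrete, here is a toy sketch that wires them together without any library. Everything in it (`toy_llm`, `parse_action`, `TOOLBOX`, `SYSTEM_PROMPT`, `run_step`, and the `Action:` JSON format) is made up for illustration and is not how `transformers.agents` is implemented internally.

```python
import json

SYSTEM_PROMPT = "Answer with a Thought, then an Action given as JSON."

# Toolbox: maps a tool name to a plain Python callable.
TOOLBOX = {"add": lambda a, b: a + b}

def toy_llm(prompt: str) -> str:
    """Stand-in for a real LLM engine: always asks for the `add` tool."""
    return (
        "Thought: I should add the two numbers.\n"
        'Action: {"tool": "add", "args": {"a": 2, "b": 3}}'
    )

def parse_action(llm_output: str) -> tuple[str, dict]:
    """Extract the tool name and its arguments from the text after 'Action:'."""
    _, _, action_blob = llm_output.partition("Action:")
    action = json.loads(action_blob.strip())
    return action["tool"], action["args"]

def run_step(task: str):
    """One agent step: prompt the LLM, parse its output, call the chosen tool."""
    prompt = f"{SYSTEM_PROMPT}\nTask: {task}"
    tool_name, args = parse_action(toy_llm(prompt))
    return TOOLBOX[tool_name](**args)

result = run_step("What is 2 + 3?")
print(result)
```

A real agent repeats this step in a loop and feeds each tool result back into the next prompt until the LLM emits a final answer.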

In [8]:
#sp="""Explain your reasoning step by step"""
#system_prompt=sp

In [20]:
os.environ["SERPAPI_API_KEY"]="<SERPAPIKEY>" #Not really used yet

In [9]:
custom_tools=[]

In [32]:
web_agent = ReactJsonAgent(tools=[DuckDuckGoSearchTool()], llm_engine=llm_engine)

managed_web_agent = ManagedAgent(
    agent=web_agent,
    name="web_search",
    description="Executes the DuckDuckGo Search Tool"
)

manager_agent = ReactCodeAgent(
    tools=[], llm_engine=llm_engine, managed_agents=[managed_web_agent]
)

manager_agent.run("Who is the CEO of Hugging Face?")

Who is the CEO of Hugging Face?
=== Agent thoughts:
Thought: I will use the `web_search` team member to find the current CEO of Hugging Face.
>>> Agent is executing the code below:
web_search(request="Who is the current CEO of Hugging Face?")
====
You're a helpful agent named 'web_search'.
You have been submitted this task by your manager.
---
Task:
Who is the current CEO of Hugging Face?
---
You're helping your manager solve a wider task: so make sure to not provide a one-line answer, but give as much information as possible so that they have a clear understanding of the answer.

Your final_answer WILL HAVE to contain these parts:
### 1. Task outcome (short version):
### 2. Task outcome (extremely detailed version):
### 3. Additional context (if relevant):

Put all these in your final_answer too

'Clément Delangue'
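
The log shows that the manager does not forward the bare question: the managed agent receives it wrapped in a briefing that names the agent and frames the task. The sketch below reproduces only the first lines of that briefing to illustrate the idea. The function `build_managed_task` is hypothetical, and the real template lives inside the library and may differ.

```python
def build_managed_task(agent_name: str, task: str) -> str:
    """Wrap a manager's request in a briefing similar to the one printed above."""
    return (
        f"You're a helpful agent named '{agent_name}'.\n"
        "You have been submitted this task by your manager.\n"
        "---\n"
        "Task:\n"
        f"{task}\n"
        "---\n"
    )

briefing = build_managed_task("web_search", "Who is the current CEO of Hugging Face?")
print(briefing)
```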

In [33]:
manager_agent.run("How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?")


How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?
=== Agent thoughts:
Thought: To solve the problem, I first need to find out how many blocks (or layers) are in the BERT base encoder and how many are in the encoder architecture proposed in "Attention is All You Need". I will search the web for this information using the `web_search` tool.
>>> Agent is executing the code below:
web_search(request="number of layers in BERT base encoder")
====
You're a helpful agent named 'web_search'.
You have been submitted this task by your manager.
---
Task:
number of layers in BERT base encoder
---
You're helping your manager solve a wider task: so make sure to not provide a one-line answer, but give as much informati

Print outputs:

Last output from code snippet:
### 1. Task outcome (short version):

The number of layers in the encoder of the Transformer model as described in the paper 'Attention is All You Need' is 6.

### 2. Task outcome (extremely detailed version):

In the paper 'Attention is All You Need' by Vaswani et al. (2017), the authors introduced the Transformer model, which relies solely on self-attention mechanisms to process sequences. The Transformer architecture is composed of an encoder and a decoder, both of which are built from identical, stacked layers. Specifically, the encoder consists of 6 identical layers, each of which is composed of two sub-layers:

1. **Multi-Head Self-Attention Layer**: This layer is responsible for capturing dependencies between different positions in the input sequence. It uses multiple attention heads to allow the model to focus on different aspects of the input simultaneously.

2. **Feed-Forward Neural Netwo

6
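
The returned value can be checked by hand. BERT base stacks 12 encoder layers, and the encoder in "Attention is All You Need" stacks 6, as the sub-agent reported above. The variable names are illustrative only.

```python
bert_base_layers = 12            # encoder layers in the BERT base configuration
transformer_encoder_layers = 6   # encoder layers in "Attention is All You Need"

difference = bert_base_layers - transformer_encoder_layers
print(difference)  # prints 6
```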