# Creating a basic AI agent using the smolagents library

In the previous notebook we used semantic retrieval to find data in a large set of news articles.  
Now we want to expose this retriever as a tool that an LLM can call.

Your task is to use the [smolagents framework](https://huggingface.co/docs/smolagents/en/index) to create your first simple AI agent.

In [1]:
import os

import pandas as pd
from datasets import load_from_disk
from dotenv import load_dotenv
from sentence_transformers import SentenceTransformer
from smolagents import AzureOpenAIModel, CodeAgent, Tool

load_dotenv()

True

For reference, the cell below contains the retriever class, including the code that loads the index. You can either use it inside the tool or rewrite the code for the tool.

In [2]:
class InformationRetrieval:
    """Handles the retrieval of news articles from the corpus.

    Attributes:
        ds (Dataset): The news data corpus.
        embedding_model (SentenceTransformer): Embedding model for the text.
    """

    def __init__(self) -> None:
        """Initialize a new news data retriever."""
        self.ds = load_from_disk("../../data/financial_retrieval_corpus")

        self.embedding_model = SentenceTransformer(
            "sentence-transformers/all-MiniLM-L6-v2", token=False
        )
        self.load_index()

    def load_index(
        self, index_file: str = "../../data/financial_retrieval_corpus_index.all-MiniLM-L6-v2.faiss"
    ) -> None:
        """Load the index from a local file.

        Args:
            index_file (str): Path for the index file.
        """
        self.ds.load_faiss_index("embeddings", index_file)

    def search(self, query: str, top_k: int = 3) -> list[str]:
        """Search the corpus for the most similar articles.

        Args:
            query (str): Search query.
            top_k (int): Number of articles returned.

        Returns:
            list[str]: List of most similar articles.
        """
        query_embedding = self.embedding_model.encode(query)

        scores, retrieved_examples = self.ds.get_nearest_examples(
            "embeddings", query_embedding, k=top_k
        )
        return retrieved_examples["content"]

In [3]:
# Sanity Check:
retriever = InformationRetrieval()
retriever.search("Who is the CEO of Nvidia?")

['NVIDIA Announces Financial Results for Second Quarter Fiscal 2024 August 23, 20231\tRecord revenue of $13.51 billion, up 88% from Q1, up 101% from year ago2\tRecord Data Center revenue of $10.32 billion, up 141% from Q1, up 171% from year agoNVIDIA (NASDAQ: NVDA) today reported revenue for the second quarter ended July 30, 2023, of $13.51 billion, up 101% from a year ago and up 88% from the previous quarter.GAAP earnings per diluted share for the quarter were $2.48, up 854% from a year ago and up 202% from the previous quarter. Non-GAAP earnings per diluted share were $2.70, up 429% from a year ago and up 148% from the previous quarter. A new computing era has begun. Companies worldwide are transitioning from general-purpose to accelerated computing and generative AI,” said Jensen Huang, founder and CEO of NVIDIA.  NVIDIA GPUs connected by our Mellanox networking and switch technologies and running our CUDA AI software stack make up the computing infrastructure of generative AI.  Dur
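
To make the ranking step behind `search()` more concrete, here is a minimal sketch in plain Python. It is not the retriever's actual implementation: the real class delegates to a FAISS index over MiniLM embeddings, and the metric it uses depends on how that index was built. The names `cosine_similarity` and `top_k_indices` and the two-dimensional toy vectors are assumptions made purely for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_indices(
    query_vec: list[float], doc_vecs: list[list[float]], top_k: int = 3
) -> list[int]:
    """Return the positions of the top_k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:top_k]]


# Toy "embeddings" (hypothetical values; real embeddings have hundreds of dimensions).
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_query = [0.9, 0.1]

print(top_k_indices(toy_query, toy_docs, top_k=2))  # [0, 2]
```

The query points almost in the same direction as document 0, so that document ranks first, followed by document 2. A vector index does the same kind of ranking, only much faster on a large corpus.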

In [4]:
financial_retrieval_queries = pd.read_parquet("../../data/financial_retrieval_queries.parquet")
financial_retrieval_queries.head(5)

| Unnamed: 0 | query | answer | content_index |
|---|---|---|---|
| 0 | What was the increase in revenue from the prev... | Up 101% | 0 |
| 1 | What was the revenue in second quarter? | $13.51 billion | 0 |
| 2 | How many shares were repurchased in second qua... | 7.5 million shares | 0 |
| 3 | Who is the CEO of Nvidia? | Jensen Huang | 0 |
| 4 | What was the percentage increase in data cente... | Up 141% from Q1 | 0 |


## Task 1:

Create a class ```InformationRetrievalTool``` which inherits from [```Tool```](https://huggingface.co/docs/smolagents/v1.23.0/en/reference/tools#smolagents.Tool).  
You can reuse the class/code from earlier and check the [documentation](https://huggingface.co/docs/smolagents/tutorials/tools) for reference.

The child class needs to define:

<div style="border:1px solid #93c5fd; border-left:6px solid #3b82f6; background:#dbeafe; border-radius:6px; padding:12px 14px; color:#1e3a8a; font-family:system-ui,-apple-system,Segoe UI,Roboto,Ubuntu,Cantarell,Noto Sans,sans-serif;">  

* A class attribute - `name` - usually describes what the tool does.
* A class attribute - `description` - which is used to populate the agent’s system prompt.
* A class attribute - `inputs` - a dictionary that maps each argument name to a dictionary with keys `type` and `description`.  
    It contains information that helps the Python interpreter to make educated choices about the input.
* A class attribute - `output_type` - which specifies the output type.  
    The types for both `inputs` and `output_type` should be in Pydantic formats.
* A `forward()` method which contains the inference code to be executed.

</div>
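
Before writing the tool, it can help to see the expected shape of `inputs` checked in isolation. The sketch below is a standalone illustration and not part of smolagents: `check_inputs` is a hypothetical helper, and `ALLOWED_TYPES` is an assumed list of accepted type strings that you should verify against the documentation of your installed version.

```python
# Assumed set of accepted type strings; check the smolagents docs for your version.
ALLOWED_TYPES = {
    "string", "boolean", "integer", "number",
    "image", "audio", "array", "object", "any", "null",
}


def check_inputs(inputs: dict) -> list[str]:
    """Return a list of human-readable problems found in an `inputs` dictionary."""
    problems = []
    for arg_name, spec in inputs.items():
        # Every argument needs both a type and a description.
        for key in ("type", "description"):
            if key not in spec:
                problems.append(f"{arg_name}: missing '{key}'")
        # The type must be one of the allowed strings, not a Python type name.
        if "type" in spec and spec["type"] not in ALLOWED_TYPES:
            problems.append(f"{arg_name}: unknown type '{spec['type']}'")
    return problems


example_inputs = {
    "query": {"type": "string", "description": "the search query"},
    "top_k": {"type": "int"},  # deliberately wrong: no description, invalid type
}

for problem in check_inputs(example_inputs):
    print(problem)
```

Running the sketch reports two problems for `top_k`: the missing `description` and the type `int`, which would need to be `integer`.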

In [5]:
class InformationRetrievalTool(Tool):
    """An agentic tool for retrieving financial information."""

    # Your code here:
    name = "information_retrieval_tool"
    description = """
    This is a tool that searches a corpus of financial articles and retrieves the most relevant articles based on a given query.
    It can be used to find information on specific topics, events, or entities mentioned in the news articles.
    """
    inputs = {
        "query": {
            "type": "string",
            "description": "the search query to find relevant news articles",
        },
        "top_k": {
            "type": "integer",
            "description": "the number of top relevant articles to retrieve",
            "default": 3,
        },
    }
    output_type = "array"
    output_schema = {"type": "array", "items": {"type": "string"}}

    retriever = InformationRetrieval()

    def forward(self, query: str, top_k: int) -> list[str]:
        """Return the `top_k` articles most relevant to `query`."""
        results = self.retriever.search(query, top_k=top_k)
        return results

## Setting up the agent

Now that we have created our tool we can set up the agent.  
Below we have configured the connection to the LLM endpoint and the agent, which is very straightforward.  
[If you are curious, here is the prompt used by the smolagents `CodeAgent`.](https://github.com/huggingface/smolagents/blob/main/src/smolagents/prompts/code_agent.yaml)

In [6]:
# Load environment variables from .env file
load_dotenv()

llm_endpoint = "https://aa-dsa-training-msca.openai.azure.com/"

llm_model_name = "gpt-5-nano"
llm_deploy_name = "gpt-5-nano"
# llm_model_name = "gpt-5-mini"
# llm_deploy_name = "gpt-5-mini"
# llm_model_name = "gpt-5"
# llm_deploy_name = "gpt-5"

subscription_api_key = os.getenv("OPENAI_API_KEY")
api_version = "2024-12-01-preview"

client = AzureOpenAIModel(
    model_id=llm_model_name,
    api_key=subscription_api_key,
    api_version=api_version,
    azure_endpoint=llm_endpoint,
)
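
Note that `os.getenv` returns `None` when a variable is missing, so a misconfigured `.env` file may only surface later with a less obvious error. A small guard like the one below fails early with a clear message. `require_env` is a hypothetical helper, not part of smolagents or python-dotenv, and `DEMO_API_KEY` is a made-up variable used only to make the sketch runnable.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file.")
    return value


# Demonstration with a dummy variable (hypothetical name and value).
os.environ["DEMO_API_KEY"] = "dummy"
print(require_env("DEMO_API_KEY"))  # dummy
```

In the cell above, the same idea would read `subscription_api_key = require_env("OPENAI_API_KEY")`.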

In [7]:
information_retrieval_tool = InformationRetrievalTool()

agent = CodeAgent(
    tools=[
        information_retrieval_tool,
    ],
    model=client,
    stream_outputs=False,
)

## Query the agent

Now everything is set up and you can use the agent!
Feel free to play around and see how the agent reacts to different queries.

In [None]:
agent.run(
    "Search the financial information and tell me how many shares were repurchased in the second quarter."
)
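
Since `financial_retrieval_queries` already pairs each query with an expected answer, one simple way to judge the agent's replies is a normalized substring check. This is a rough sketch and not a proper evaluation metric: `normalize` and `answer_matches` are hypothetical helpers, and the agent reply in the example is invented for illustration.

```python
def normalize(text: object) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(str(text).lower().split())


def answer_matches(agent_output: object, expected: str) -> bool:
    """Check whether the expected answer appears verbatim in the agent output."""
    return normalize(expected) in normalize(agent_output)


# Invented agent reply, compared against an expected answer from the query table.
reply = "NVIDIA repurchased 7.5 million shares in the second quarter."
print(answer_matches(reply, "7.5 million shares"))  # True
print(answer_matches(reply, "Jensen Huang"))        # False
```

To apply it, you could loop over a few rows of `financial_retrieval_queries`, pass each `query` to `agent.run(...)`, and compare the result with the `answer` column. A substring check is strict about formatting, so a reply such as "7.5M shares" would count as a miss.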

## Task 2:

smolagents comes with a few select [built-in tools](https://huggingface.co/docs/smolagents/en/reference/default_tools).

Your task is to build an agent using one (or more) built-in tools and query the agent.  
Of course, you can still keep the previous information retrieval tool to see how they interact.

Please experiment!

In [None]:
# Your code here:
%pip install wikipedia-api

from smolagents import WikipediaSearchTool

wikipedia_tool = WikipediaSearchTool()

agent = CodeAgent(
    tools=[
        information_retrieval_tool,
        wikipedia_tool,
    ],
    model=client,
    stream_outputs=False,
)

agent.run(
    "Search my corpus for entities related to cryptocurrency and then look up their Wikipedia pages."
)