# Building an Exa (formerly Metaphor) Data Agent

This tutorial walks through the LLM tools provided by the [Exa API](https://exa.ai), which let LLMs search the Internet and retrieve HTML content.

To get started, you will need an [OpenAI API key](https://platform.openai.com/account/api-keys) and an [Exa API key](https://dashboard.exa.ai/overview).

We will import the relevant agents and tools and pass them our keys here:

In [1]:
# Set up OpenAI
import os
import openai
from llama_index.agent import OpenAIAgent

openai.api_key = os.environ["OPENAI_API_KEY"]

# Set up Exa tool
from llama_hub.tools.exa.base import ExaToolSpec

exa_tool = ExaToolSpec(
    api_key=os.environ["EXA_API_KEY"],
)

exa_tool_list = exa_tool.to_tool_list()
for tool in exa_tool_list:
    print(tool.metadata.name)

search
retrieve_documents
search_and_retrieve_documents
search_and_retrieve_highlights
find_similar
current_date
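
Later cells pick tools out of this list by position, which can silently select the wrong tool if the list order changes between library versions. Below is a minimal sketch of looking tools up by name instead. `pick_tools` is a hypothetical helper, not part of LlamaIndex, and the stand-in objects only mimic the `metadata.name` attribute used in the loop above so that the snippet runs without API keys.

```python
from types import SimpleNamespace


def pick_tools(tool_list, *names):
    """Return the tools whose metadata.name matches each requested name, in order."""
    by_name = {tool.metadata.name: tool for tool in tool_list}
    missing = [name for name in names if name not in by_name]
    if missing:
        raise KeyError(f"No tool named: {', '.join(missing)}")
    return [by_name[name] for name in names]


# Stand-ins for the real tool objects (assumption: only metadata.name is needed here)
fake_tool_list = [
    SimpleNamespace(metadata=SimpleNamespace(name=name))
    for name in [
        "search",
        "retrieve_documents",
        "search_and_retrieve_documents",
        "search_and_retrieve_highlights",
        "find_similar",
        "current_date",
    ]
]

# Select the date utility by name rather than by index
(date_tool,) = pick_tools(fake_tool_list, "current_date")
print(date_tool.metadata.name)
```

With the real `exa_tool_list`, the same helper could presumably be called as `pick_tools(exa_tool_list, "search_and_retrieve_documents")`, assuming each tool exposes `metadata.name` as in the loop above.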


## Testing the Exa tools

We've imported our OpenAI agent, set up the API key, and initialized our tool, checking the methods it has available. Let's test the tool before setting up our Agent.

All of the Exa search tools use the `AutoPrompt` option, in which Exa passes the query through an LLM to refine it before searching. The rewritten query is printed in the output of each call below.

In [2]:
exa_tool.search_and_retrieve_documents("machine learning transformers", num_results=3)

[Exa Tool] Autoprompt: Here is a great article about machine learning transformers:


[Document(id_='d9291e44-f359-466d-ae3b-20e2cdac90ea', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='The famous paper “Attention is all you need” in 2017 changed the way we were thinking about attention. With enough data, matrix multiplications, linear layers, and layer normalization we can perform state-of-the-art-machine-translation.Nonetheless, 2020 was definitely the year of transformers! From natural language now they are into computer vision tasks. How did we go from attention to self-attention? Why does the transformer work so damn well? What are the critical components for its success?R', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='8e140d86-adab-4665-af6d-93d2f2ce7743', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='\n \n \n

In [3]:
exa_tool.find_similar(
    "https://www.mihaileric.com/posts/transformers-attention-in-disguise/"
)

[{'title': 'A Deep Dive Into the Transformer Architecture — The Development of Transformer Models',
  'url': 'https://towardsdatascience.com/a-deep-dive-into-the-transformer-architecture-the-development-of-transformer-models-acbdf7ca34e0?gi=b4d77d2ab4db',
  'id': '60J3eIu_oZO9OEulMglxuw'},
 {'title': 'What is a Transformer?',
  'url': 'https://medium.com/inside-machine-learning/what-is-a-transformer-d07dd1fbec04',
  'id': 'uxGX5rLD8HXrmgQiyQIYyw'},
 {'title': 'The Transformer Model',
  'url': 'https://towardsdatascience.com/attention-is-all-you-need-e498378552f9?gi=92758857966b',
  'id': 'RKL4_dd9kKX_OThZCXo8Yg'}]

In [4]:
exa_tool.search_and_retrieve_documents(
    "This is the best explanation for machine learning transformers:", num_results=1
)

[Exa Tool] Autoprompt: Here is a great explanation for machine learning transformers:


[Document(id_='3a0f92c4-acfe-4198-8727-ed8f9b6ba5a2', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='\n \n \n \n Background: Representation Learning for NLP\n Enter the Transformer\n Transformers vs. CNNs \n Language\n Vision\n Multimodal Tasks\n \n \n Breaking down the Transformer \n Background \n One-hot encoding \n Overview\n Idea\n Example: Basic Dataset\n Example: NLP\n \n \n Dot product \n Algebraic Definition\n Geometric Definition\n Properties of the dot product\n \n \n Matrix multiplication as a series of dot products \n Matrix multiplication as a table lookup\n \n \n First order sequence model\n Second order sequenc', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]

We have separate tools to search for results, retrieve the content of results, and find pages similar to a given web page, plus a tool that combines search and document retrieval in a single call. We will test them in LLM Agents below:

### Using the Search and Retrieve documents tools in an Agent

We can create an agent with access to the above tools and start testing it out:

In [5]:
# Pass the full, unwrapped Exa tool list to the Agent for now; the wrapped tools come later
agent = OpenAIAgent.from_tools(
    exa_tool_list,
    verbose=True,
)

In [6]:
print(agent.chat("What are the best resturants in toronto?"))

Added user message to memory: What are the best resturants in toronto?
=== Calling Function ===
Calling function: search with args: {
  "query": "best restaurants in Toronto"
}
[Exa Tool] Autoprompt: Here is a great restaurant in Toronto to try:
Got output: [{'title': 'PATOIS • TORONTO', 'url': 'https://www.patoistoronto.com/', 'id': '5EC2l7fbaPoEydNVNwjc-A'}, {'title': 'Portuguese inspired seafood from around the world | Adega Restaurante', 'url': 'https://adegarestaurante.ca/', 'id': 'oQiAWWgzrU-ryPNmgj3UuA'}, {'title': 'Location', 'url': 'https://osteriagiulia.ca/', 'id': 'mpjelsyCOpNipFFI5AoZTQ'}, {'title': 'Enigma Yorkville | Modern European Restaurant in Toronto, ON', 'url': 'https://www.enigmayorkville.com/', 'id': 'jBOC2QfhTfuPjt0YdibEVA'}, {'title': 'Select A Restaurant', 'url': 'https://www.torontopho.com/', 'id': 'Hk6LQnLIZsCH8SYrNFoO2Q'}, {'title': 'Welcome to "Woodlot Toronto" restaurant! - Woodlot Toronto', 'url': 'https://woodlottoronto.com/', 'id': 'VUKoFW1gttmNySwHYhlg

In [8]:
print(agent.chat("tell me more about Osteria Giulia"))

Added user message to memory: tell me more about Osteria Giulia
Osteria Giulia is a restaurant located in Toronto, Ontario. It offers a unique dining experience with a focus on Italian cuisine. The restaurant is known for its warm and inviting atmosphere, making it a popular choice for both casual dining and special occasions.

The menu at Osteria Giulia features a variety of traditional Italian dishes, prepared with fresh and high-quality ingredients. From homemade pasta and risotto to wood-fired pizzas and seafood, there is something to satisfy every palate. The restaurant also offers a selection of fine wines to complement the flavors of the dishes.

One of the highlights of Osteria Giulia is its commitment to using locally sourced ingredients. The restaurant takes pride in supporting local farmers and suppliers, ensuring that each dish is made with the freshest and most sustainable ingredients available.

In addition to its delicious food, Osteria Giulia provides excellent service,

## Avoiding Context Window Issues

The above example shows the core uses of the Exa tool. We can easily retrieve a clean list of links related to a query, and then fetch the content of an article as a cleaned-up HTML extract. Alternatively, the `search_and_retrieve_documents` tool directly returns the documents from our search result.

The content of the articles is somewhat long compared to current LLM context windows. To allow retrieval and summary of many documents, we will set up another tool from LlamaIndex that loads text into a vector store and queries it for retrieval. This is where the `search_and_retrieve_documents` tool becomes particularly useful. The Agent can make a single query to retrieve a large number of documents, using a very small number of tokens, and then make queries to retrieve specific information from the documents.
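
To get a feel for the size problem, here is a rough sketch that estimates whether a batch of articles fits in a context window. The helper names, the 4-characters-per-token rule of thumb, the 4096-token window, and the article sizes are all assumptions for illustration; a real tokenizer will give different counts.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; real tokenizers will give different counts."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_in_context(documents, context_window=4096, reserved_for_answer=512):
    """Check whether all documents together fit in the assumed token budget."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= context_window - reserved_for_answer, total


# Assumption: three articles of roughly 8,000 characters each
articles = ["x" * 8000] * 3
ok, total_tokens = fits_in_context(articles)
print(ok, total_tokens)
```

Under these assumptions the three articles come to about 6,000 tokens, well over the budget, which is why loading them into a vector store and querying for excerpts is attractive.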

In [9]:
from llama_index.tools.tool_spec.load_and_search.base import LoadAndSearchToolSpec

# The search_and_retrieve_documents tool is the third in the tool list, as seen above
wrapped_retrieve = LoadAndSearchToolSpec.from_defaults(
    exa_tool_list[2],
)

Our wrapped retrieval tools split loading and reading into separate interfaces. We use `load` to load the documents into the vector store, and `read` to query the vector store. Let's try it out again:

In [10]:
wrapped_retrieve.load("This is the best explanation for machine learning transformers:")
print(wrapped_retrieve.read("what is a transformer"))
print(wrapped_retrieve.read("who wrote the first paper on transformers"))

[Exa Tool] Autoprompt: "Check out this article on the best explanation for machine learning transformers:
A transformer is a type of architecture used in Natural Language Processing (NLP) tasks. It is a breakthrough model that overcame the limitations of previous seq-to-seq models like RNNs in capturing long-term dependencies in text. The transformer architecture has become the foundation for various revolutionary models such as BERT, GPT, and T5, which have been widely used in NLP tasks.
Vaswani et al. wrote the first paper on transformers.
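
The load/read split can be illustrated without any API calls. The following is a toy sketch, not how `LoadAndSearchToolSpec` is implemented: `ToyLoadAndSearch` is a hypothetical class that stores documents on `load` and, on `read`, returns the best-matching document by simple word overlap rather than a vector index. The canned documents in `fake_fetch` are also made up for the example. The point is that the caller only sees a short status string from `load` and a small excerpt from `read`.

```python
import re


def _words(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyLoadAndSearch:
    """Hypothetical stand-in for a load-and-search wrapper around a retrieval function."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable: query -> list of document strings
        self._documents = []   # everything loaded so far

    def load(self, query):
        """Fetch documents and store them; return only a short status message."""
        self._documents.extend(self._fetch(query))
        return f"Content loaded! {len(self._documents)} documents stored."

    def read(self, query):
        """Return the stored document sharing the most words with the query."""
        if not self._documents:
            return "Nothing loaded yet."
        query_words = _words(query)
        return max(self._documents, key=lambda doc: len(query_words & _words(doc)))


def fake_fetch(query):
    # Assumption: canned documents in place of a real Exa search
    return [
        "A transformer is a neural network architecture built on self-attention.",
        "Vaswani et al. wrote the paper Attention Is All You Need in 2017.",
    ]


store = ToyLoadAndSearch(fake_fetch)
status = store.load("machine learning transformers")
answer = store.read("who wrote the paper")
print(status)
print(answer)
```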


## Creating the Agent

We are now ready to create an Agent that can use Exa's services to their full potential. We will use our wrapped read and load tools, as well as the `current_date` utility, for the following agent and test it out below:

In [11]:
# Pass the wrapped tools and the current_date utility,
# which is the sixth entry (index 5) in the tool list printed above
agent = OpenAIAgent.from_tools(
    [*wrapped_retrieve.to_tool_list(), exa_tool_list[5]],
    verbose=True,
)

In [12]:
print(
    agent.chat(
        "Can you summarize everything published in the last month regarding news on"
        " superconductors"
    )
)

Added user message to memory: Can you summarize everything published in the last month regarding news on superconductors
=== Calling Function ===
Calling function: search_and_retrieve_documents with args: {
  "query": "news on superconductors",
  "start_published_date": "2022-09-01",
  "end_published_date": "2022-09-30"
}
[Exa Tool] Autoprompt: Here is a recent article about superconductors:
Got output: Content loaded! You can now search the information using read_search_and_retrieve_documents

=== Calling Function ===
Calling function: read_search_and_retrieve_documents with args: {
  "query": "news on superconductors"
}
Got output: The journal Nature has retracted a ground-breaking paper claiming to show the first room-temperature superconductor. The retraction was made due to concerns around the data analysis and allegations of manipulated results. Superconductors are materials that exhibit no electrical resistance and have various applications, such as NMR machines, quantum computi

We asked the agent to summarize news on superconductors from the last month. The agent applied Exa's publication date filters when calling `search_and_retrieve_documents` (the `current_date` tool exists so that it can work out the current month), which loaded the matching documents into the vector store. It then read them using `read_search_and_retrieve_documents`.
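
The date filters in the log above are plain `YYYY-MM-DD` strings. Here is a small sketch of how such a window could be computed for "the last month". `last_month_window` is a hypothetical helper, and treating "last month" as the previous 30 days is an assumption. The resulting strings match the format of the `start_published_date` and `end_published_date` arguments shown in the function call.

```python
from datetime import date, timedelta


def last_month_window(today=None, days=30):
    """Return (start, end) ISO date strings covering the `days` days up to `today`."""
    today = today or date.today()
    start = today - timedelta(days=days)
    return start.isoformat(), today.isoformat()


# Fixed date used so the example is reproducible; omit it to use the real current date
start, end = last_month_window(date(2023, 11, 15))
print(start, end)
```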

We can make another query to the vector store to read from it again, now that the articles are loaded: