# API Search - Make our bot talk to any API

We have seen how well **GPT-4, intelligent agents, and detailed prompts** work together: this combination has consistently delivered strong results. To build on it, we want to connect it to other systems through their APIs. In essence, in this notebook we can build what OpenAI's ChatGPT calls 'GPTs.'

Envision a bot that seamlessly integrates with:

- **CRM Systems:** Including Dynamics, Salesforce, and HubSpot.
- **ERP Systems:** Such as SAP, Dynamics, and Oracle.
- **CMS Systems:** Including Adobe, Oracle, and other content management platforms.

The objective is to **connect our bot with data repositories**, minimizing data duplication as much as possible. These systems typically offer APIs, facilitating programmatic data access.

In this notebook, we develop an agent that can query an API to retrieve information and use it to answer questions. We will interact with the API of the HubSpot CRM (https://api.hubapi.com/).

In [1]:
import os
import requests
from time import sleep

from langchain.chat_models import AzureChatOpenAI
from langchain.callbacks.manager import CallbackManager
from langchain.agents import initialize_agent, AgentType
from langchain.tools import BaseTool
from langchain.chains import APIChain

from common.callbacks import StdOutCallbackHandler
from common.utils import num_tokens_from_string, reduce_openapi_spec
from common.prompts import APISEARCH_PROMPT_PREFIX

from IPython.display import Markdown, HTML, display  

from dotenv import load_dotenv
load_dotenv("credentials.env")


True

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

In [3]:
cb_handler = StdOutCallbackHandler()
cb_manager = CallbackManager(handlers=[cb_handler])

llm_1 = AzureChatOpenAI(deployment_name="gpt-4-32k", temperature=0, max_tokens=2000, callback_manager=cb_manager)
llm_2 = AzureChatOpenAI(deployment_name="gpt-35-turbo-16k", temperature=0, max_tokens=1000)

## The Logic

By now, you can probably infer what an API agent needs: give the LLM the API specification as part of its system prompt, then have an agent plan the steps needed to formulate the right API call.
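As a rough illustration of this idea, the sketch below assembles such a prompt from a spec and a question. The helper `build_api_prompt` and its wording are hypothetical; APIChain and the agent used later in this notebook build their own, different prompts.

```python
import json


def build_api_prompt(api_spec: dict, question: str, base_url: str) -> str:
    """Hypothetical helper: embed the API spec in the prompt and ask the model to plan the call."""
    spec_text = json.dumps(api_spec, indent=2)
    return (
        "You are given the following API documentation:\n"
        f"{spec_text}\n\n"
        f"All requests must go to {base_url}.\n"
        "Plan the API call(s) needed to answer the question, "
        "then give the full URL of the first request.\n\n"
        f"Question: {question}"
    )


# Toy spec for illustration only (not HubSpot's real spec)
toy_spec = {"paths": {"/crm/v3/objects/deals": {"get": {"summary": "List deals"}}}}
prompt = build_api_prompt(toy_spec, "List all deals", "https://api.hubapi.com")
print(prompt)
```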

Let's do that. But first, we need to understand the industry standard for describing APIs: Swagger/OpenAPI.


## Introduction to OpenAPI (formerly Swagger)

The OpenAPI Specification, previously known as the Swagger Specification, is a specification for a machine-readable interface definition language for describing, producing, consuming and visualizing web services. Previously part of the Swagger framework, it became a separate project in 2016, overseen by the OpenAPI Initiative, an open-source collaboration project of the Linux Foundation.

The OpenAPI Specification is an API description format for REST APIs. An OpenAPI file lets you describe your entire API, including:

- Available endpoints (for example, `/users`)
- Operations on each endpoint (`GET /users`, `POST /users`)
- Descriptions, contact information, license, terms of use, and other information
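To make these pieces concrete, here is a minimal, hand-written OpenAPI 3.0 document expressed as a Python dict (a made-up `/users` API, not HubSpot's spec), plus a small hypothetical helper, `list_operations`, that enumerates every operation. The nesting of `paths`, then HTTP method, then operation object is the part an LLM relies on to formulate a call.

```python
# A minimal, hand-written OpenAPI 3.0 document (illustrative; not HubSpot's real spec)
minimal_spec = {
    "openapi": "3.0.0",
    "info": {
        "title": "Users API",
        "version": "1.0.0",
        "contact": {"email": "api@example.com"},
        "license": {"name": "MIT"},
    },
    "paths": {
        "/users": {
            "get": {"summary": "List users"},
            "post": {"summary": "Create a user"},
        },
        "/users/{userId}": {
            "get": {
                "summary": "Get one user",
                "parameters": [
                    {"name": "userId", "in": "path", "required": True, "schema": {"type": "string"}}
                ],
            },
        },
    },
}

# HTTP methods allowed as keys of an OpenAPI 3.0 Path Item Object
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict) -> list:
    """Return (METHOD, path, summary) for every operation in an OpenAPI document."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method in HTTP_METHODS:  # skip path-level keys such as "parameters"
                ops.append((method.upper(), path, op.get("summary", "")))
    return ops


for method, path, summary in list_operations(minimal_spec):
    print(f"{method:6} {path:18} {summary}")
```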

### Let's get the OpenAPI (Swagger) spec from our desired API that we want to talk to

In [4]:
# Swagger for HubSpot contacts API
url = 'https://api.hubspot.com/api-catalog-public/v1/apis/crm/v3/objects/contacts'
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    spec = response.json()
else:
    spec = None
    print(f"Failed to retrieve data: Status code {response.status_code}")


Let's see how big this API specification is:

In [5]:
api_tokens = num_tokens_from_string(str(spec))
print("API spec size in tokens:",api_tokens)

API spec size in tokens: 7738
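At roughly 7,700 tokens for a single object type, the spec is sizable, and much of it (long descriptions, response schemas) is not needed to build a request. The sketch below shows one way to shrink a spec before handing it to the LLM. `trim_spec` and `rough_token_estimate` are hypothetical helpers, not LangChain's `reduce_openapi_spec`, and the 4-characters-per-token ratio is only a rule of thumb for English text.

```python
import json

# HTTP methods allowed as keys of an OpenAPI 3.0 Path Item Object
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def trim_spec(spec: dict) -> dict:
    """Keep servers, paths, methods, summaries and parameter names; drop everything else."""
    trimmed = {"servers": spec.get("servers", []), "paths": {}}
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:  # skip path-level keys such as "parameters"
                continue
            trimmed["paths"].setdefault(path, {})[method] = {
                "summary": op.get("summary", ""),
                # $ref parameters have no "name" key, so they are skipped here
                "parameters": [p["name"] for p in op.get("parameters", []) if "name" in p],
            }
    return trimmed


def rough_token_estimate(text: str) -> int:
    """Rule of thumb: about 4 characters per token for English text (approximation only)."""
    return len(text) // 4


# Illustrative spec fragment (made up; real HubSpot specs are much larger)
example_spec = {
    "servers": [{"url": "https://api.hubapi.com"}],
    "paths": {
        "/crm/v3/objects/contacts": {
            "get": {
                "summary": "List contacts",
                "description": "A long description. " * 50,
                "parameters": [{"name": "limit", "in": "query"}, {"name": "after", "in": "query"}],
                "responses": {"200": {"description": "Successful response. " * 100}},
            }
        }
    },
}

small_spec = trim_spec(example_spec)
print(rough_token_estimate(json.dumps(example_spec)), "->", rough_token_estimate(json.dumps(small_spec)))
```

In this notebook, the trimmed result could be passed as `api_spec=str(trim_spec(spec))`, at the cost of dropping details (such as request-body schemas) that some calls may need.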


## Creating a custom agent that uses the APIChain as a tool

To solve the above problem, we can build a ReAct agent that uses the APIChain as a tool to get the information. The agent makes as many calls as needed (through the chain tool) until it can answer the question.
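The loop such an agent runs can be sketched without any LLM at all. This sketch assumes the model answers in the `Action:` / `Action Input:` / `Final Answer:` text format visible in the agent trace further down. `parse_react_step` and `run_react` are simplified, hypothetical stand-ins for LangChain's agent executor and output parser, and the scripted LLM and fake tool exist only for illustration.

```python
import re
from typing import Callable, Dict


def parse_react_step(llm_output: str) -> tuple:
    """Parse one ReAct step written as 'Action: ... / Action Input: ...' or 'Final Answer: ...'."""
    final = re.search(r"Final Answer:\s*(.*)", llm_output, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    action = re.search(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", llm_output, re.DOTALL)
    if action:
        return ("action", action.group(1).strip(), action.group(2).strip())
    # Unparseable output: the kind of failure a retry loop has to guard against
    raise ValueError(f"Could not parse LLM output: {llm_output!r}")


def run_react(llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]],
              question: str, max_steps: int = 5) -> str:
    """Alternate LLM steps and tool calls until the LLM produces a final answer."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = parse_react_step(llm(scratchpad))
        if step[0] == "final":
            return step[1]
        _, tool_name, tool_input = step
        observation = tools[tool_name](tool_input)  # e.g. an APIChain-backed tool
        # Feed the observation back so the next LLM step can build on it
        scratchpad += f"Action: {tool_name}\nAction Input: {tool_input}\nObservation: {observation}\n"
    raise RuntimeError("No final answer within max_steps")


# Scripted "LLM" and fake tool, for illustration only
scripted_outputs = iter([
    "Thought: I need the deals.\nAction: @apideals\nAction Input: list all deals",
    "Thought: I now know the answer.\nFinal Answer: There is 1 deal.",
])
fake_llm = lambda prompt: next(scripted_outputs)
fake_tools = {"@apideals": lambda query: "1 deal found"}

result = run_react(fake_llm, fake_tools, "How many deals are there?")
print(result)  # There is 1 deal.
```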

In [6]:
class HubspotAPIContact(BaseTool):
    """APIChain as an agent tool"""
    
    name: str = "@apicontacts"
    description: str = "useful when the question includes the term: @apicontacts. E.g. querying about contact name, company, and e-mail.\n"

    llm: AzureChatOpenAI
    api_spec: str
    headers: dict = {}
    limit_to_domains: list = []
    verbose: bool = False
    
    def _run(self, query: str) -> str:
        
        chain = APIChain.from_llm_and_api_docs(
                            llm=self.llm,
                            api_docs=self.api_spec,
                            headers=self.headers,
                            verbose=self.verbose,
                            limit_to_domains=self.limit_to_domains
                )
        try:
            sleep(2) # This is optional to avoid possible TPM rate limits
            response = chain.run(query)
        except Exception as e:
            # Return the error as text so the agent can read it as an observation
            response = str(e)
        
        return response
            
    async def _arun(self, query: str) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("This Tool does not support async")

class HubspotAPICompany(BaseTool):
    """APIChain as an agent tool"""
    
    name: str = "@apicompanies"
    description: str = "useful when the question includes the term: @apicompanies. E.g. querying about company name, owner and contact information. \n"

    llm: AzureChatOpenAI
    api_spec: str
    headers: dict = {}
    limit_to_domains: list = []
    verbose: bool = False
    
    def _run(self, query: str) -> str:
        
        chain = APIChain.from_llm_and_api_docs(
                            llm=self.llm,
                            api_docs=self.api_spec,
                            headers=self.headers,
                            verbose=self.verbose,
                            limit_to_domains=self.limit_to_domains
                )
        try:
            sleep(2) # This is optional to avoid possible TPM rate limits
            response = chain.run(query)
        except Exception as e:
            # Return the error as text so the agent can read it as an observation
            response = str(e)
        
        return response
            
    async def _arun(self, query: str) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("This Tool does not support async")
    
class HubspotAPIDeal(BaseTool):
    """APIChain as an agent tool"""
    
    name: str = "@apideals"
    description: str = "useful when the question includes the term: @apideals. E.g. querying about deal status, deal stage and amount. \n"

    llm: AzureChatOpenAI
    api_spec: str
    headers: dict = {}
    limit_to_domains: list = []
    verbose: bool = False
    
    def _run(self, query: str) -> str:
        
        chain = APIChain.from_llm_and_api_docs(
                            llm=self.llm,
                            api_docs=self.api_spec,
                            headers=self.headers,
                            verbose=self.verbose,
                            limit_to_domains=self.limit_to_domains
                )
        try:
            sleep(2) # This is optional to avoid possible TPM rate limits
            response = chain.run(query)
        except Exception as e:
            # Return the error as text so the agent can read it as an observation
            response = str(e)
        
        return response
            
    async def _arun(self, query: str) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("This Tool does not support async")
    
class HubspotAPIProduct(BaseTool):
    """APIChain as an agent tool"""
    
    name: str = "@apiproducts"
    description: str = "useful when the question includes the term: @apiproducts. E.g. querying about product name, description, and price. \n"

    llm: AzureChatOpenAI
    api_spec: str
    headers: dict = {}
    limit_to_domains: list = []
    verbose: bool = False
    
    def _run(self, query: str) -> str:
        
        chain = APIChain.from_llm_and_api_docs(
                            llm=self.llm,
                            api_docs=self.api_spec,
                            headers=self.headers,
                            verbose=self.verbose,
                            limit_to_domains=self.limit_to_domains
                )
        try:
            sleep(2) # This is optional to avoid possible TPM rate limits
            response = chain.run(query)
        except Exception as e:
            # Return the error as text so the agent can read it as an observation
            response = str(e)
        
        return response
            
    async def _arun(self, query: str) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("This Tool does not support async")

In [7]:
def get_spec(url):
    response = requests.get(url)
    
    # Check if the request was successful
    if response.status_code == 200:
        spec = response.json()
    else:
        spec = None
        print(f"Failed to retrieve data: Status code {response.status_code}")
    return spec

Create customized tools for the contacts, companies, deals, and products APIs

In [8]:
access_token = os.environ["HUBSPOT_ACCESS_TOKEN"]
headers = {"Authorization": f"Bearer {access_token}"}

contact_tool = HubspotAPIContact(
    llm=llm_1, 
    api_spec=str(get_spec('https://api.hubspot.com/api-catalog-public/v1/apis/crm/v3/objects/contacts')),
    headers=headers,
    verbose=True,
    limit_to_domains=["https://api.hubapi.com/"]
)

company_tool = HubspotAPICompany(
    llm=llm_1, 
    api_spec=str(get_spec('https://api.hubspot.com/api-catalog-public/v1/apis/crm/v3/objects/companies')),
    headers=headers,
    limit_to_domains=["https://api.hubapi.com/"]
)

deal_tool = HubspotAPIDeal(
    llm=llm_1, 
    api_spec=str(get_spec('https://api.hubspot.com/api-catalog-public/v1/apis/crm/v3/objects/deals')),
    headers=headers,
    limit_to_domains=["https://api.hubapi.com/"]
)

product_tool = HubspotAPIProduct(
    llm=llm_1, 
    api_spec=str(get_spec('https://api.hubspot.com/api-catalog-public/v1/apis/crm/v3/objects/products')),
    headers=headers,
    limit_to_domains=["https://api.hubapi.com/"]
)
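The `limit_to_domains` argument restricts which hosts the chain is allowed to call, so a model-generated URL cannot send the request (and its bearer token) to an arbitrary server. A minimal sketch of such an allow-list check follows; `is_allowed_url` is hypothetical, compares scheme and host only, and LangChain's internal check may differ in detail.

```python
from urllib.parse import urlparse


def is_allowed_url(url: str, allowed_prefixes: list) -> bool:
    """Return True if the URL's scheme and host match one of the allowed entries."""
    parsed = urlparse(url)
    for allowed in allowed_prefixes:
        allowed_parsed = urlparse(allowed)
        # Compare scheme and host (netloc); the path and query string are ignored
        if parsed.scheme == allowed_parsed.scheme and parsed.netloc == allowed_parsed.netloc:
            return True
    return False


allowed = ["https://api.hubapi.com/"]
print(is_allowed_url("https://api.hubapi.com/crm/v3/objects/deals?limit=10", allowed))  # True
print(is_allowed_url("https://attacker.example.com/steal", allowed))                     # False
```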

In [9]:
tools = [contact_tool, company_tool, deal_tool, product_tool]

agent_executor = initialize_agent(
    tools,
    llm_1,  # Pay attention to the GPT model used by the agent,
    verbose=True,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
    agent_kwargs={'prefix':APISEARCH_PROMPT_PREFIX}, 
    callback_manager=cb_manager
)

print(agent_executor.agent.allowed_tools)

['@apicontacts', '@apicompanies', '@apideals', '@apiproducts']
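The tool descriptions tell the agent to use a tool when the question contains its @-tag, but the LLM can also choose a tool from the meaning of the question alone: the deals question below contains no tag, yet the agent picks `@apideals`. If you want deterministic behavior whenever a tag is present, a pre-filter like the hypothetical `mentioned_tools` below could narrow the tool list before the agent runs. This is an optional design choice, not something the notebook does.

```python
import re

TOOL_NAMES = ["@apicontacts", "@apicompanies", "@apideals", "@apiproducts"]


def mentioned_tools(question: str, tool_names: list) -> list:
    """Hypothetical pre-filter: return the tool names explicitly @-mentioned in the question."""
    mentions = set(re.findall(r"@\w+", question.lower()))
    # Preserve the order of tool_names so the result is stable
    return [name for name in tool_names if name in mentions]


print(mentioned_tools("Which @apideals are linked to @apicompanies ACME?", TOOL_NAMES))
# An empty result means "no tag": fall back to letting the LLM choose among all tools
print(mentioned_tools("Can you provide me with all existing sales deals?", TOOL_NAMES))
```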


In [18]:
QUESTION = '''Can you provide me with all existing sales deals?'''
# QUESTION = '''Can you provide me with all existing contacts?'''
# QUESTION = '''Can you provide me all sales products from the Hubspot system?'''
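Questions such as "all existing sales deals" can span more than one page of results, while a single APIChain run issues one request and may therefore only see the first page. The sketch below follows cursor-based paging and assumes a HubSpot-style list response of the form `{"results": [...], "paging": {"next": {"after": "..."}}}`. `fetch_all` is a hypothetical helper and the fake pages stand in for real API calls; in practice `fetch_page` would presumably issue a GET to `/crm/v3/objects/deals` with `limit` and `after` query parameters.

```python
from typing import Callable, Optional


def fetch_all(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Collect results page by page until no `paging.next.after` cursor is returned."""
    results, after = [], None
    while True:
        page = fetch_page(after)  # after=None requests the first page
        results.extend(page.get("results", []))
        after = page.get("paging", {}).get("next", {}).get("after")
        if not after:
            return results


# Fake pages keyed by cursor, standing in for real HTTP responses
fake_pages = {
    None: {"results": [{"id": "1"}, {"id": "2"}], "paging": {"next": {"after": "2"}}},
    "2": {"results": [{"id": "3"}]},
}
deals = fetch_all(lambda after: fake_pages[after])
print([d["id"] for d in deals])  # ['1', '2', '3']
```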

In [19]:
# LLM responses vary between runs, so retry once if the agent's output cannot be parsed according to the prompt instructions
for i in range(2):
    try:
        response = agent_executor.invoke(QUESTION) 
        break
    except Exception as e:
        print(e)
        response = str(e)
        continue



> Entering new AgentExecutor chain...
The user is asking for information about all existing sales deals. This information can be obtained using the @apideals tool.
Action: @apideals
Action Input: {"operation": "list"}
Observation: The API call lists the deals in the system. There is one deal with the id "10885438957". The deal is for "T-Shirts for hackathon event" with an amount of "1250". The deal is currently in the "appointmentscheduled" stage in the default pipeline. The deal was created on "2024-02-12T10:00:36.458Z" and last modified on "2024-02-12T10:00:43.970Z". The deal is not archived.
Thought: I have obtained the information about the existing sales deal from the @apideals tool. I can now provide this information to the user.
Final Answer: 
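The `for` loop above retries once and stores the last error as text. A slightly more general pattern is retry with exponential backoff, which also helps with the rate limits that the `sleep(2)` in the tools guards against. The `retry` helper below is a hypothetical sketch; unlike the loop above, it re-raises the final error instead of returning it as the response.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn(); on any exception wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({e}); retrying in {delay:.2f}s")
            time.sleep(delay)
    raise ValueError("attempts must be >= 1")


# Demo with a function that fails twice (like an unparseable LLM answer), then succeeds
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("could not parse LLM output")
    return "ok"


result = retry(flaky, attempts=3, base_delay=0.01)
print(result, calls["n"])  # ok 3
```

In this notebook it could wrap the agent call as `response = retry(lambda: agent_executor.invoke(QUESTION), attempts=2)`.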

The `response` below comes from an earlier run (cell 13) that asked for all existing contacts:

In [13]:
response

{'input': 'Can you provide me with all existing contacts?',
 'output': 'Here are all the existing contacts:\n\n1. Maria Johnson (Sample Contact) with email: emailmaria@hubspot.com\n2. Brian Halligan (Sample Contact) with email: bh@hubspot.com\n3. Olafs Rozitis with email: olafs.rozitis@contoso.com\n4. Kumar Kamei with email: kumar.kamei@bloomberg.com\n5. Sylvie Laramee with email: sylvie.laramee@apple.com\n6. Luis Saucedo with email: luis.saucedo@hubspot.com\n7. Kalyani Benjaree with email: kalyani.benjaree@microsoft.com\n8. Ivana Hadrabova with email: ivana.hadrabova@hubspot.com\n9. Nguyen Banh with email: nguyen.banh@telecom.de'}

**Great!** We now have an API agent that uses APIChain as a tool and can keep reasoning until it finds the answer, and it is reasonably fast as well.

# Summary

In this notebook, we learned how to create smart API agents for simple or complex APIs described by Swagger/OpenAPI specifications.
Once again, the key to success is the combination of agents with expert tools, GPT-4, and good prompts.

As homework, try to create a shopping assistant for the Etsy e-commerce site using the following API spec (you will need to register for free and create an API key):

- https://developers.etsy.com/documentation/
- https://www.etsy.com/openapi/generated/oas/3.0.0.json

# NEXT

The next notebook shows how to put everything together: how to combine the features of all the notebooks into a brain agent that can respond appropriately to any request.