# Internet and Websites Search using Bing API - Bing Chat Clone

In this notebook, we'll delve into the ways in which you can **boost your GPT Smart Search Engine with web search functionalities**, utilizing both Langchain and the Azure Bing Search API service.

As previously discussed in our other notebooks, **harnessing agents and tools is an effective approach**. We aim to leverage the capabilities of OpenAI's large language models (LLM), such as GPT-4 and its successors, to perform the heavy lifting of reasoning and researching on our behalf.

There are numerous instances where our Smart Search Engine needs internet access. For instance, we may wish to **enrich an answer with information available on the web**, **provide users with up-to-date information**, or **find information on a specific public website**. Regardless of the scenario, we require our engine to base its responses on search results.

By the conclusion of this notebook, you'll have a solid understanding of the Bing Search API basics, including **how to create a Web Search Agent using the Bing Search API**, and how these tools can strengthen your chatbot. Additionally, you'll learn about **Callback Handlers, their use, and their significance in bot applications**.

In [1]:
import os
import requests
from typing import Dict, List
from pydantic import BaseModel, Extra, root_validator

from langchain.chat_models import AzureChatOpenAI
from langchain.agents import AgentExecutor
from langchain.callbacks.manager import CallbackManager
from langchain.agents import initialize_agent, AgentType
from langchain.tools import BaseTool
from langchain.utilities import BingSearchAPIWrapper

from common.callbacks import StdOutCallbackHandler
from common.prompts import BING_PROMPT_PREFIX

from IPython.display import Markdown, HTML, display  

from dotenv import load_dotenv
load_dotenv("credentials.env")

def printmd(string):
    display(Markdown(string.replace("$","USD ")))

# GPT-4 models are necessary for this feature. GPT-35-turbo frequently fails to follow the system prompt instructions.
MODEL_DEPLOYMENT_NAME = "gpt-4-32k" 

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]
os.environ["OPENAI_API_TYPE"] = "azure"

## Introduction to Callback Handlers

The following explanation comes directly from the Langchain documentation about Callbacks ([HERE](https://python.langchain.com/docs/modules/callbacks/)):

**Callbacks**:<br>
LangChain provides a callbacks system that allows you to hook into the various stages of your LLM application. This is useful for logging, monitoring, streaming, and other tasks. You can subscribe to these events by using the callbacks argument available throughout the API. This argument is a list of handler objects.

**Callback handlers**:<br>
CallbackHandlers are objects that implement the CallbackHandler interface, which has a method for each event that can be subscribed to. The CallbackManager will call the appropriate method on each handler when the event is triggered.

--------------------
We will incorporate a handler for the callbacks, enabling us to observe the response as it streams and to gain insights into the Agent's reasoning process. This will prove incredibly valuable when we aim to stream the bot's responses to users and keep them informed about the ongoing process as they await the answer.

Our custom handler is in the file `common/callbacks.py`. Go and take a look at it.
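
To make the relationship between a handler and a manager concrete, here is a minimal, library-free sketch of the pattern. The classes `PrintingHandler` and `TinyCallbackManager` are hypothetical and written only for illustration; they are neither LangChain's implementation nor the contents of `common/callbacks.py`. The event name `on_llm_new_token` simply mirrors the kind of per-event method a handler exposes.

```python
from typing import List


class PrintingHandler:
    """Hypothetical handler: records and prints each token as it arrives."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def on_llm_new_token(self, token: str) -> None:
        self.tokens.append(token)
        print(token, end="", flush=True)


class TinyCallbackManager:
    """Hypothetical manager: forwards an event to every handler that implements it."""

    def __init__(self, handlers: List[object]) -> None:
        self.handlers = handlers

    def emit(self, event: str, *args) -> None:
        for handler in self.handlers:
            method = getattr(handler, event, None)
            if callable(method):  # handlers without this method are skipped
                method(*args)


handler = PrintingHandler()
manager = TinyCallbackManager(handlers=[handler])

# Simulate a model streaming its answer token by token.
for token in ["Oil ", "prices ", "rose."]:
    manager.emit("on_llm_new_token", token)
print()
```

The point of the design is that the code producing tokens never needs to know who is listening: adding a second handler (for example, one that writes to a websocket) would require no change to the streaming loop.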

In [6]:
cb_handler = StdOutCallbackHandler()
cb_manager = CallbackManager(handlers=[cb_handler])

# Declare the LLM object without streaming
llm = AzureChatOpenAI(deployment_name=MODEL_DEPLOYMENT_NAME, temperature=0, max_tokens=1000)

# Redeclare it with streaming enabled and the callback manager attached
# (comment out the line below if you do not want to see the responses being streamed)
llm = AzureChatOpenAI(deployment_name=MODEL_DEPLOYMENT_NAME, temperature=0, max_tokens=1000, streaming=True, callback_manager=cb_manager)

## Creating a custom tool - Bing Search API tool

LangChain already has a pre-built utility called BingSearchAPIWrapper ([HERE](https://github.com/hwchase17/langchain/blob/master/langchain/utilities/bing_search.py)). However, we are going to improve on it by using the `results` function instead of the `run` function; that way we get not only the text results, but also the title and link (source) of each snippet.

In [7]:
class MyBingSearch(BaseTool):
    """Tool for a Bing Search Wrapper"""
    
    name = "@bing"
    description = "useful when the question includes the term: @bing.\n"

    k: int = 5
    
    def _run(self, query: str) -> List[Dict]:
        bing = BingSearchAPIWrapper(k=self.k)
        return bing.results(query,num_results=self.k)
            
    async def _arun(self, query: str) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("This Tool does not support async")
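
The reason for preferring `results` is that each entry keeps its own source. The sketch below works on hand-written sample data, assuming the shape the wrapper returns (a list of dictionaries with `snippet`, `title`, and `link` keys). The sample entries and the helper names `snippets_only` and `format_with_citations` are hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Hypothetical sample data, shaped like the assumed output of `results`.
sample_results: List[Dict[str, str]] = [
    {"snippet": "First snippet.", "title": "Page A", "link": "https://example.com/a"},
    {"snippet": "Second snippet.", "title": "Page B", "link": "https://example.com/b"},
]


def snippets_only(results: List[Dict[str, str]]) -> str:
    """Roughly what a text-only search result gives: snippets with no sources."""
    return " ".join(r["snippet"] for r in results)


def format_with_citations(results: List[Dict[str, str]]) -> str:
    """Keep every snippet next to a numbered link to its source."""
    lines = []
    for i, r in enumerate(results, start=1):
        lines.append(f'{r["snippet"]} [{i}]({r["link"]})')
    return "\n".join(lines)


print(snippets_only(sample_results))
print(format_with_citations(sample_results))
```

With the structured form, the model can attach a citation to each statement, which is what produces the numbered source links in the answers further down.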

Now we create our ReAct agent, which uses our custom tool and our custom prompt `BING_PROMPT_PREFIX`.

In [8]:
tools = [MyBingSearch(k=5)]
agent_executor = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         agent_kwargs={'prefix':BING_PROMPT_PREFIX}, callback_manager=cb_manager, )

Try some of the questions below, or others that you might like.

In [9]:
QUESTION = "Create a list with the main facts on what is happening with the oil supply in the world right now?"
# QUESTION = "How much is 50 USD in Euros and is it enough for an average hotel in Madrid?"
# QUESTION = "My son needs to build a pinewood car for a pinewood derby, how do I build such a car?"
# QUESTION = "Who won the 2023 Super Bowl and who was the MVP?"
# QUESTION = "can I travel to Maui, Hawaii from Dallas, TX for 7 days with $7000 in the month of September, what are the best days to travel?"


# This complex question below needs gpt-4-32k (0613 version) in order to ensure a good answer. 
# ---------------
# QUESTION = """
# compare the number of job openings (provide the exact number) and the average salary within 15 miles of Dallas, TX, for these occupations:

# - ADN Registered Nurse 
# - Occupational therapist assistant
# - Dental Hygienist
# - Graphic Designer
# - Real Estate Agent


# Create a table with your findings. Place the sources on each cell.
# """

In [10]:
# Because LLM responses are not deterministic, we retry in a loop in case the answer cannot be parsed according to our prompt instructions
for i in range(2):
    try:
        response = agent_executor.run(QUESTION) 
        break
    except Exception as e:
        response = str(e)
        continue

The user is asking for current information about the global oil supply. I will need to perform a web search to gather the most recent data and facts on this topic.
Action: @bing
Action Input: What is happening with the oil supply in the world right now?
The search results provide some information about the current state of the global oil supply. The United States is the world's top oil consumer, with China being the second<sup><a href="https://www.forbes.com/sites/rrapier/2023/08/18/the-us-still-leads-in-global-petroleum-production--consumption/" target="_blank">[1]</a></sup>. There are contrasting forecasts about oil demand growth, with the IEA lowering its forecast and OPEC maintaining a stronger growth forecast<sup><a href="https://ww

In [11]:
printmd(response)

Here are the main facts about the current state of the global oil supply:

1. The United States is the world's top oil consumer, with China being the second<sup><a href="https://www.forbes.com/sites/rrapier/2023/08/18/the-us-still-leads-in-global-petroleum-production--consumption/" target="_blank">[1]</a></sup>.
2. There are contrasting forecasts about oil demand growth, with the IEA lowering its forecast and OPEC maintaining a stronger growth forecast<sup><a href="https://www.reuters.com/markets/commodities/iea-lowers-2024-oil-demand-growth-forecast-2023-08-11/" target="_blank">[2]</a></sup>.
3. Oil prices have been on a streak of gains due to increased demand forecasts<sup><a href="https://www.bloomberg.com/news/articles/2023-08-11/latest-oil-market-news-and-analysis-for-august-11" target="_blank">[3]</a></sup>.
4. High oil prices are expected to continue<sup><a href="https://www.cnn.com/2022/06/03/energy/oil-prices-what-next/index.html" target="_blank">[4]</a></sup>.
5. There was a decline in production to 9.7 million BPD in May 2020, but it has since bounced back to 11.3 million BPD<sup><a href="https://www.forbes.com/sites/rrapier/2021/10/22/the-us-oil-supply-is-still-out-of-balance/" target="_blank">[5]</a></sup>.
6. The U.S. oil production hit an all-time high of just below 13 million barrels per day (BPD) before the Covid-19 pandemic, but demand collapsed as the pandemic unfolded<sup><a href="https://www.forbes.com/sites/rrapier/2022/03/11/what-is-holding-back-us-oil-production/" target="_blank">[6]</a></sup>.
7. OPEC+ is also considering the early 2022 oil market, with the possibility of more Covid cases in winter and a lower-demand “shoulder” season in spring<sup><a href="https://www.forbes.com/sites/daneberhart/2021/11/13/why-are-oil-prices-so-high-when-the-us-remains-one-of-the-worlds-largest-producers/" target="_blank">[7]</a></sup>.
8. Oil demand is expected to grow from 2022 to 2030, rising by just under 7 million b/d, but the rate of growth is expected to slow through the period to less than 0.5 million b/d a year by 2030<sup><a href="https://www.forbes.com/sites/woodmackenzie/2021/07/15/is-the-world-sleepwalking-into-an-oil-supply-crunch/" target="_blank">[8]</a></sup>.
9. An outage on the largest oil pipeline to the United States from Canada could affect inventories at a key U.S. storage hub<sup><a
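
The retry loop used before printing the answer can be factored into a small reusable helper. This is only a sketch under stated assumptions: `run_with_retries` is a hypothetical name, and `flaky_run` is a stand-in for `agent_executor.run` that fails once and then succeeds, so the example runs without any model or API key.

```python
from typing import Callable


def run_with_retries(run: Callable[[str], str], question: str, attempts: int = 2) -> str:
    """Call `run(question)` up to `attempts` times.

    Returns the first successful answer, or the text of the last error
    if every attempt fails (the same fallback as the loop in the notebook).
    """
    response = ""
    for _ in range(attempts):
        try:
            return run(question)
        except Exception as e:  # e.g. an output-parsing error
            response = str(e)
    return response


# Hypothetical stand-in for `agent_executor.run`: fails once, then succeeds.
calls = {"count": 0}


def flaky_run(question: str) -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise ValueError("Could not parse LLM output")
    return f"Answer to: {question}"


print(run_with_retries(flaky_run, "Who won the 2023 Super Bowl?"))
```

Note that when all attempts fail, the caller receives the error text rather than an answer, so a production version would likely want to distinguish the two cases.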

## QnA to specific websites

There are several use cases where we want the smart bot to answer questions about a specific company's public website. There are two approaches we can take:

1. Create a crawler script that runs regularly, finds every page on the website, and pushes the documents to Azure Cognitive Search.
2. Since Bing has likely already indexed the public website, we can utilize Bing search targeted specifically to that site, rather than attempting to index the site ourselves and duplicate the work already done by Bing's crawler.
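
The second approach relies on Bing's `site:` operator, which restricts results to a single domain. In this notebook the agent composes such queries itself; the helper below, with the hypothetical name `build_site_query`, is only an illustrative sketch of what a site-restricted query string looks like.

```python
def build_site_query(terms: str, domain: str) -> str:
    """Restrict a search query to one domain using the `site:` operator."""
    domain = domain.strip()
    # Drop a leading scheme so that "https://microsoft.com/" becomes "microsoft.com".
    for prefix in ("https://", "http://"):
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
    domain = domain.rstrip("/")
    return f"{terms.strip()} site:{domain}"


print(build_site_query("quantum computing news", "microsoft.com"))
```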

Below are some sample questions related to specific sites. Take a look:

In [12]:
# QUESTION = "information on how to kill wasps in homedepot.com"
# QUESTION = "in target.com, find out the price of a Nespresso coffee machine and of a Keurig coffee machine"
QUESTION = "in microsoft.com, find out what is the latest news on quantum computing"
# QUESTION = "give me a list of the main points of the latest investor report from mondelezinternational.com"

In [13]:
# Because LLM responses are not deterministic, we retry in a loop in case the answer cannot be parsed according to our prompt instructions
for i in range(3):
    try:
        response = agent_executor.run(QUESTION) 
        break
    except Exception as e:
        response = str(e)
        continue

The user is asking for the latest news on quantum computing from the Microsoft website. I will use the `site:` special operator to search for this information specifically on microsoft.com.
Action: @bing
Action Input: quantum computing news site:microsoft.com
The search results provide several pieces of news about quantum computing from Microsoft. The first snippet talks about Microsoft achieving the first milestone towards a quantum supercomputer. The second snippet discusses Azure Quantum demonstrating formerly elusive quantum phenomena. The third snippet mentions new data available for Microsoft's quantum machine on the Azure Quantum platform. The fourth snippet discusses new Microsoft breakthroughs that bring general-purpose qu

In [14]:
printmd(response)

Here are some of the latest news on quantum computing from Microsoft:

1. Microsoft has achieved the first milestone towards a quantum supercomputer<sup><a href="https://cloudblogs.microsoft.com/quantum/2023/06/21/microsoft-achieves-first-milestone-towards-a-quantum-supercomputer/" target="_blank">[1]</a></sup>.
2. Azure Quantum has demonstrated formerly elusive quantum phenomena<sup><a href="https://news.microsoft.com/source/features/innovation/azure-quantum-majorana-topological-qubit/" target="_blank">[2]</a></sup>.
3. New data is available for Microsoft's quantum machine on the Azure Quantum platform<sup><a href="https://cloudblogs.microsoft.com/quantum/2022/11/17/microsofts-quantum-machine-new-data-available-today/" target="_blank">[3]</a></sup>.
4. Microsoft has made breakthroughs that bring general-purpose quantum computing closer to reality<sup><a href="https://news.microsoft.com/features/new-microsoft-breakthroughs-general-purpose-quantum-computing-moves-closer-reality/" target="_blank">[4]</a></sup>.

Is there anything else you would like to know?

In [17]:
# Uncomment if you want to take a look at the custom Bing search prompt (this is where the magic happens: a great system prompt + GPT-4)
# printmd(agent_executor.agent.llm_chain.prompt.template)

# Summary

In this notebook, we learned about Callback Handlers and how to stream the response from the LLM. We also learned how to create a Bing Chat clone using a clever prompt with specific search and formatting instructions.

The outcome is an agent capable of conducting intelligent web searches and performing research on our behalf. This agent provides us with answers to our questions along with appropriate URL citations and links!

# NEXT

The next notebook will show how we put everything together: how to use the features of all the notebooks to create a brain agent that can respond to any request accordingly.