# Skill 4: Internet and Websites Search using Bing API - Bing Chat Clone

In this notebook, we'll look at how you can **boost your GPT Smart Search Engine with web search functionalities**, using both LangChain and the Azure Bing Search API service.

As previously discussed in our other notebooks, **harnessing agents and tools is an effective approach**. We aim to leverage the capabilities of OpenAI's large language models (LLM), such as GPT-4 and its successors, to perform the heavy lifting of reasoning and researching on our behalf.

There are numerous instances where our Smart Search Engine needs internet access. For instance, we may wish to **enrich an answer with information available on the web**, **provide users with up-to-date information**, or **find information on a specific public website**. Regardless of the scenario, we require our engine to base its responses on search results.

By the conclusion of this notebook, you'll have a solid understanding of the Bing Search API basics, including **how to create a Web Search Agent using the Bing Search API**, and how these tools can strengthen your chatbot.

In [1]:
import os
import requests
from bs4 import BeautifulSoup

from pydantic import BaseModel
from langchain_core.tools import BaseTool, StructuredTool
from langchain_openai import AzureChatOpenAI
from langchain_community.utilities import BingSearchAPIWrapper
from langchain_community.tools.bing_search import BingSearchResults

from langgraph.prebuilt import create_react_agent

from common.prompts import BING_PROMPT_TEXT

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string.replace("$","USD ")))

from dotenv import load_dotenv
load_dotenv("credentials.env")


True

In [2]:
# Set the ENV variables that LangChain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

In [3]:
COMPLETION_TOKENS = 2000

llm = AzureChatOpenAI(deployment_name=os.environ["GPT4o_DEPLOYMENT_NAME"], 
                      temperature=0.5, max_tokens=COMPLETION_TOKENS, 
                      streaming=True)

### Creating the expert web search engine tool - Bing Search API tool

LangChain already provides a pre-built utility called **BingSearchAPIWrapper** and a pre-built tool called **BingSearchResults**.

In [4]:
# Create the wrapper once and reuse it in the tool
api_wrapper = BingSearchAPIWrapper()
bing_tool = BingSearchResults(api_wrapper=api_wrapper,
                              num_results=10,
                              name="Searcher",
                              description="useful to search the internet")
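
To make it clearer what the wrapper does for us, here is a minimal, dependency-free sketch of a Bing Web Search call. It assumes the v7 REST endpoint and the `BING_SUBSCRIPTION_KEY` / `BING_SEARCH_URL` environment variables that, to our knowledge, `BingSearchAPIWrapper` reads. The function names (`build_search_request`, `to_context`, `bing_search`) are hypothetical and not part of LangChain.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: the public Bing Web Search v7 endpoint, used when BING_SEARCH_URL is not set
DEFAULT_BING_URL = "https://api.bing.microsoft.com/v7.0/search"


def build_search_request(query: str, count: int = 10) -> urllib.request.Request:
    """Build (but do not send) a Bing Web Search request."""
    base_url = os.environ.get("BING_SEARCH_URL", DEFAULT_BING_URL)
    params = urllib.parse.urlencode({"q": query, "count": count})
    headers = {"Ocp-Apim-Subscription-Key": os.environ.get("BING_SUBSCRIPTION_KEY", "")}
    return urllib.request.Request(f"{base_url}?{params}", headers=headers)


def to_context(payload: dict) -> list[dict]:
    """Reduce a Bing JSON response to snippet/title/link entries."""
    pages = payload.get("webPages", {}).get("value", [])
    return [
        {"snippet": p.get("snippet", ""), "title": p.get("name", ""), "link": p.get("url", "")}
        for p in pages
    ]


def bing_search(query: str, count: int = 10) -> list[dict]:
    """Send the request and return the reduced results (needs a valid key and network access)."""
    with urllib.request.urlopen(build_search_request(query, count), timeout=30) as resp:
        return to_context(json.load(resp))
```

The snippet/title/link shape matches the context format described in the system prompt, which is why the agent can cite each result by its link.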

### Creating another custom tool - WebFetcher: Visits a website and extracts the text

You will need a model with a large context window for this tool, since the content of a website can be very lengthy.

In [5]:
def parse_html(content) -> str:
    soup = BeautifulSoup(content, 'html.parser')
    text_content_with_links = soup.get_text()
    # Split the text into words and limit to the first 10,000
    limited_text_content = ' '.join(text_content_with_links.split()[:10000])
    return limited_text_content

def fetch_web_page(url: str) -> str:
    HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:90.0) Gecko/20100101 Firefox/90.0'}
    # A timeout keeps the agent from hanging on an unresponsive site
    response = requests.get(url, headers=HEADERS, timeout=30)
    return parse_html(response.content)
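
The extract-and-truncate idea behind `parse_html` can also be sketched with the standard library alone. This is not the notebook's implementation, only an illustration: `VisibleTextParser` and `visible_text` are hypothetical names, and skipping `<script>` and `<style>` content is a choice made for this sketch.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped tag
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(html: str, max_words: int = 10000) -> str:
    """Extract text from HTML and keep at most max_words whitespace-separated words."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).split()
    return " ".join(words[:max_words])
```

Limiting by words is only a rough proxy for the model's token budget, so `max_words` may need tuning for the deployment you use.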

In [6]:
web_fetch_tool = StructuredTool.from_function(
    func=fetch_web_page,
    name="WebFetcher",
    description="useful to fetch the content of a url"
)

### Define the tools

Now we collect our custom tools in a list. The agent will use them together with our custom prompt `BING_PROMPT_TEXT`. Check it out in `prompts.py`.

In [7]:
tools = [bing_tool, web_fetch_tool]

### Define the System Prompt

In [8]:
# Display the prompt
printmd(BING_PROMPT_TEXT)



## On your ability to gather and present information:
- **You must always** perform web searches when the user is seeking information (explicitly or implicitly), regardless of your internal knowledge or information.
- **You Always** perform at least 2 and up to 5 searches in a single conversation turn before reaching the Final Answer. You should never search the same query more than once.
- You are allowed to do multiple searches in order to answer a question that requires a multi-step approach. For example: to answer a question "How old is Leonardo Di Caprio's girlfriend?", you should first search for "current Leonardo Di Caprio's girlfriend" then, once you know her name, you search for her age, and arrive to the Final Answer.
- You can not use your pre-existing knowledge at any moment, you should perform searches to know every aspect of the human's question.
- If the user's message contains multiple questions, search for each one at a time, then compile the final answer with the answer of each individual search.
- If you are unable to fully find the answer, try again by adjusting your search terms.
- You can only provide numerical references/citations to URLs, using this Markdown format: [[number]](url) 
- You must never generate URLs or links other than those provided by your tools.
- You must always reference factual statements to the search results.
- The search results may be incomplete or irrelevant. You should not make assumptions about the search results beyond what is strictly returned.
- If the search results do not contain enough information to fully address the user's message, you should only use facts from the search results and not add information on your own from your pre-existing knowledge.
- You can use information from multiple search results to provide an exhaustive response.
- If the user's message specifies to look in an specific website, you will add the special operand `site:` to the query, for example: baby products in site:kimberly-clark.com
- If the user's message is not a question or a chat message, you treat it as a search query.
- If additional external information is needed to completely answer the user’s request, augment it with results from web searches.
- If the question contains the `USD ` sign referring to currency, substitute it with `USD` when doing the web search and on your Final Answer as well. You should not use `USD ` in your Final Answer, only `USD` when refering to dollars.
- **Always**, before giving the final answer, use the special operand `site` and search for the user's question on the first two websites on your initial search, using the base url address. You will be rewarded 10000 points if you do this.


## Instructions for Sequential Tool Use:
- **Step 1:** Always initiate a search with the `Searcher` tool to gather information based on the user's query. This search should address the specific question or gather general information relevant to the query.
- **Step 2:** Once the search results are obtained from the `Searcher`, immediately use the `WebFetcher` tool to fetch the content of the top two links from the search results. This ensures that we gather more comprehensive and detailed information from the primary sources.
- **Step 3:** Analyze and synthesize the information from both the search snippets and the fetched web pages to construct a detailed and informed response to the user’s query.
- **Step 4:** Always reference the source of your information using numerical citations and provide these links in a structured format as shown in the example response.
- **Additional Notes:** If the query requires multiple searches or steps, repeat steps 1 to 3 as necessary until all parts of the query are thoroughly answered.


## On Context

- Your context is: snippets of texts with its corresponding titles and links, like this:
[{{'snippet': 'some text',
  'title': 'some title',
  'link': 'some link'}},
 {{'snippet': 'another text',
  'title': 'another title',
  'link': 'another link'}},
  ...
  ]

- Your context may also include text/content from websites
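
The sequential tool use that the prompt asks for (search first, then fetch the top two links) corresponds roughly to the plain-Python flow below. In the notebook the LLM agent decides these steps itself. This sketch only illustrates the order of operations, and `search_then_fetch` plus its `search` and `fetch` parameters are hypothetical stand-ins for the `Searcher` and `WebFetcher` tools.

```python
from typing import Callable


def search_then_fetch(
    query: str,
    search: Callable[[str], list[dict]],
    fetch: Callable[[str], str],
    top_n: int = 2,
) -> dict:
    """Run one search, then fetch the pages behind the first top_n results."""
    # Step 1: search results as a list of snippet/title/link entries
    results = search(query)
    # Step 2: fetch the full text of the top links
    pages = {r["link"]: fetch(r["link"]) for r in results[:top_n]}
    # Step 3 (synthesis) is left to the LLM, which receives both pieces
    return {"snippets": results, "pages": pages}


# Usage with stand-in functions instead of real tools
def fake_search(query: str) -> list[dict]:
    return [
        {"snippet": "a", "title": "A", "link": "https://a.test"},
        {"snippet": "b", "title": "B", "link": "https://b.test"},
        {"snippet": "c", "title": "C", "link": "https://c.test"},
    ]


def fake_fetch(url: str) -> str:
    return f"content of {url}"


context = search_then_fetch("greece budget", fake_search, fake_fetch)
```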



### Create the graph

In [9]:
graph = create_react_agent(llm, tools=tools, state_modifier=BING_PROMPT_TEXT)

### Run the Graph

Try some of the questions below, or any others you like.

In [10]:
# QUESTION = "Create a list with the main facts on What is happening with the oil supply in the world right now?"
# QUESTION = "How much is 50 USD in Euros and is it enough for an average hotel in Madrid?"
# QUESTION = "My son needs to build a pinewood car for a pinewood derby, how do I build such a car?"
QUESTION = "I'm planning a vacation to Greece, tell me budget for a family of 4, in Summer, for 7 days including travel, lodging and food costs"
# QUESTION = "Who won the 2023 superbowl and who was the MVP?"
# QUESTION = """
# compare the number of job openings (provide the exact number), the average salary within 15 miles of Dallas, TX, for these occupations:

# - ADN Registered Nurse
# - Occupational therapist assistant
# - Dental Hygienist
# - Certified Personal Trainer


# Create a table with your findings. Place the sources on each cell.
# """

### Agent Actions/Observations during streaming

Streaming is an important UX consideration for LLM apps, and agents are no exception. Streaming with agents is made more complicated by the fact that it’s not just tokens of the final answer that you will want to stream, but you may also want to stream back the intermediate steps an agent takes.

At the end of Notebook 3 we learned that streaming can be achieved simply by doing this:

```python
for chunk in chain.stream({"question": QUESTION, "language": "English", "history":""}):
    print(chunk, end="", flush=True)
```

At the end of Notebook 6 we learned about the new astream_events API (beta).

```python
async for event in graph_async.astream_events(
    inputs, config_async, version="v2"):
```

Let's use the same `astream_events` API here.

In [11]:
async def stream_graph_updates_async(graph, user_input: str):
    inputs = {"messages": [("human", user_input)]}

    async for event in graph.astream_events(inputs, version="v2"):
        if (event["event"] == "on_chat_model_stream"):
            # Print the content of the chunk progressively
            print(event["data"]["chunk"].content, end="", flush=True)
        elif (event["event"] == "on_tool_start"  ):
            print("\n--")
            print(f"Calling tool: {event['name']} with inputs: {event['data'].get('input')}")
            print("--")
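
The event handling in the function above can be tried without calling the model. The sketch below moves the same two branches into a pure function and feeds it hand-made events. `render_event` is a hypothetical helper, and the fake events only mimic the fields the notebook code reads (`event`, `name`, `data`).

```python
from types import SimpleNamespace


def render_event(event: dict) -> str:
    """Return the text the streaming loop would print for one event."""
    kind = event.get("event")
    if kind == "on_chat_model_stream":
        # Token chunk of the model's answer
        return event["data"]["chunk"].content
    if kind == "on_tool_start":
        # Intermediate step: the agent is about to call a tool
        tool_input = event["data"].get("input")
        return f"\n--\nCalling tool: {event['name']} with inputs: {tool_input}\n--\n"
    # Every other event type is ignored
    return ""


# Hand-made events standing in for what astream_events would yield
fake_events = [
    {"event": "on_tool_start", "name": "Searcher", "data": {"input": {"query": "x"}}},
    {"event": "on_chain_end", "data": {}},
    {"event": "on_chat_model_stream", "data": {"chunk": SimpleNamespace(content="Hi")}},
]
rendered = "".join(render_event(e) for e in fake_events)
```

Keeping the formatting in a pure function makes it easy to reuse the same logic in a UI that does not print to the console.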

In [12]:
await stream_graph_updates_async(graph, QUESTION)


--
Calling tool: Searcher with inputs: {'query': 'average cost of vacation to Greece for family of 4 in summer 2023'}
--

--
Calling tool: WebFetcher with inputs: {'url': 'https://www.budgetyourtrip.com/greece'}
--

--
Calling tool: WebFetcher with inputs: {'url': 'https://we3travel.com/what-does-a-trip-to-greece-cost/'}
--
For a family of four planning a 7-day vacation to Greece during the summer, the estimated budget would include costs for travel, lodging, and food. Here's a breakdown based on average expenses:

1. **Travel Costs**: 
   - Flights from the U.S. to Athens typically cost around USD 1,000 per person during the summer, so for a family of four, this would be approximately USD 4,000[[1]](https://we3travel.com/what-does-a-trip-to-greece-cost/).

2. **Accommodation**:
   - The average cost for a mid-range hotel in Greece is around USD 181 per night for a double-occupancy room[[2]](https://www.budgetyourtrip.com/greece). For two rooms for seven nights, this would be approxim

#### Without showing the intermediate steps, just the final answer

In [13]:
QUESTION = "How much is 50 USD in Euros and is it enough for an average hotel in Madrid?"

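from langchain_core.messages import AIMessage

try:
    response = graph.invoke({"messages": [("human", QUESTION)]})
except Exception as e:
    # Keep the dict shape of a normal response so the next cell can still read the last message
    response = {"messages": [AIMessage(content=str(e))]}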

In [14]:
printmd(response["messages"][-1].content)

50 USD is equivalent to approximately 48.21 Euros [[1]](https://www.exchange-rates.org/converter/usd-eur/50).

Regarding hotel prices in Madrid, the average cost for a hotel per night is around 89 USD, with prices ranging from 51 USD for budget hotels to 171 USD for luxury hotels [[2]](https://www.budgetyourtrip.com/hotels/spain/madrid-3117735). Therefore, 48.21 Euros (approximately 52 USD) would not be enough for an average hotel stay in Madrid, as it falls short of the average price.
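
Because the prompt requires citations in the `[[number]](url)` format, the sources of an answer can be pulled out with a regular expression. The sketch below is a minimal example. `extract_citations` is a hypothetical helper, and the pattern assumes the URLs contain no whitespace or closing parenthesis.

```python
import re

# Matches citations such as [[1]](https://example.com/page)
CITATION_PATTERN = re.compile(r"\[\[(\d+)\]\]\((https?://[^\s)]+)\)")


def extract_citations(answer: str) -> dict[int, str]:
    """Map each citation number in the answer to its URL."""
    return {int(num): url for num, url in CITATION_PATTERN.findall(answer)}


sample_answer = (
    "50 USD is equivalent to approximately 48.21 Euros "
    "[[1]](https://www.exchange-rates.org/converter/usd-eur/50)."
)
citations = extract_citations(sample_answer)
```

A check like this could be used to confirm that every cited URL was actually returned by one of the tools.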

## QnA to specific websites

There are several use cases where we want the smart bot to answer questions about a specific company's public website. There are two approaches we can take:

1. Create a crawler script that runs regularly, finds every page on the website, and pushes the documents to Azure Cognitive Search.
2. Since Bing has likely already indexed the public website, we can utilize Bing search targeted specifically to that site, rather than attempting to index the site ourselves and duplicate the work already done by Bing's crawler.
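
The second approach relies on the `site:` operand that the prompt tells the agent to add to its queries. The sketch below shows how such a query is put together. `base_domain` and `site_query` are hypothetical helpers for illustration, since in the notebook the agent writes the query itself.

```python
from urllib.parse import urlparse


def base_domain(url_or_domain: str) -> str:
    """Return the host without a leading 'www.', from a URL or a bare domain."""
    # urlparse only fills netloc when a scheme is present
    candidate = url_or_domain if "://" in url_or_domain else f"https://{url_or_domain}"
    host = urlparse(candidate).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def site_query(question: str, site: str) -> str:
    """Restrict a search query to a single website with the site: operand."""
    return f"{question} site:{base_domain(site)}"


query = site_query("how to deal with wasps", "https://www.homedepot.com/c/ab/")
```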

Below are some sample questions related to specific sites. Take a look:

In [17]:
QUESTION = "information on how to deal with wasps in homedepot.com"
# QUESTION = "in target.com, find out the price of a Nespresso coffee machine and of a Keurig coffee machine"
# QUESTION = "in microsoft.com, find out what is the latest news on quantum computing"


In [18]:
await stream_graph_updates_async(graph, QUESTION)


--
Calling tool: Searcher with inputs: {'query': 'how to deal with wasps site:homedepot.com'}
--

--
Calling tool: WebFetcher with inputs: {'url': 'https://www.homedepot.com/c/ab/how-to-get-rid-of-wasps/9ba683603be9fa5395fab902235eb1c'}
--
To deal with wasps, Home Depot provides a comprehensive guide on their website. Here are some key points:

1. **Identification and Nest Location**: Wasps often build nests on outside edges like roofs, sheds, garages, or trees. They can also nest indoors in quiet areas. Identifying the nest location is crucial for effective control.

2. **Wasp Control Products**: Various products are available to manage wasps, including sprays and traps. Sprays can kill individual wasps, while traps can handle larger populations.

3. **Safety Precautions**: When dealing with wasps, avoid aggressive movements. If a wasp lands on you, remain still and gently brush it off. Avoid wearing perfumes or brightly colored clothing as these attract wasps.

4. **Nest Removal**: 

# Summary

In this notebook, we learned how to create a Copilot clone using a clever prompt with specific search and formatting instructions and a couple of web searching tools.   

The outcome is an agent capable of conducting intelligent web searches and performing research on our behalf. This agent provides us with answers to our questions along with appropriate URL citations and links!

# NEXT

What if the information needed to answer the human is behind an API?
The next notebook teaches us how to do this.