<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_07_3_search_tools.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative Artificial Intelligence
**Module 7: LangChain: Agents**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 7 Material

* Part 7.1: Introduction to LangChain Agents [[Video]](https://www.youtube.com/watch?v=J5Vr___lSSs) [[Notebook]](t81_559_class_07_1_agents.ipynb)
* Part 7.2: Understanding LangChain Agent Tools [[Video]](https://www.youtube.com/watch?v=qMquBmteYw4) [[Notebook]](t81_559_class_07_2_tools.ipynb)
* **Part 7.3: LangChain Retrieval and Search Tools** [[Video]](https://www.youtube.com/watch?v=NB5qGPLoBBE) [[Notebook]](t81_559_class_07_3_search_tools.ipynb)
* Part 7.4: Constructing LangChain Agents [[Video]](https://www.youtube.com/watch?v=OJe5oHvrdHk) [[Notebook]](t81_559_class_07_4_more_agent.ipynb)
* Part 7.5: Custom Agents [[Video]](https://www.youtube.com/watch?v=IsJemVYSEdc) [[Notebook]](t81_559_class_07_5_custom_agent.ipynb)

# Google CoLab Instructions

The following code checks whether the notebook is running in Google CoLab. If it is, the code loads the OpenAI and Tavily API keys from CoLab secrets and installs the required libraries.

In [3]:
import os

try:
    from google.colab import drive, userdata
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

# API secrets (OpenAI and Tavily)
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
    os.environ["TAVILY_API_KEY"] = userdata.get('TAVILY_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain langchain_openai langchain_experimental duckduckgo-search langchainhub sentence-transformers chromadb

Note: not using Google CoLab
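Outside CoLab, the cell above sets no keys, so `OPENAI_API_KEY` and `TAVILY_API_KEY` must already exist as environment variables; otherwise the agent cells fail later with a less obvious error. A small helper like the hypothetical `load_secret` below (not part of the course code) is one way to fail early with a clear message. It is a sketch that assumes the CoLab lookup, when used, behaves like `userdata.get`.

```python
import os
from typing import Callable, Optional


def load_secret(name: str, colab_get: Optional[Callable[[str], Optional[str]]] = None) -> str:
    """Ensure the secret `name` is in os.environ and return its value.

    colab_get: a lookup such as google.colab.userdata.get when running in CoLab.
    Leave it as None elsewhere; the value must then already be an environment variable.
    """
    value = colab_get(name) if colab_get is not None else os.environ.get(name)
    if not value:
        raise KeyError(f"{name} is not set; add it as a CoLab secret or an environment variable")
    # Copy into the environment so LangChain integrations can find it
    os.environ[name] = value
    return value
```

In this notebook you might call `load_secret(key, userdata.get if COLAB else None)` for each of the two keys; the conditional expression never touches `userdata` when `COLAB` is `False`.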


# 7.3: LangChain Retrieval and Search Tools

In this section, we look at two forms of retrieval for LangChain agents: search tools and retrieval tools. Search tools let your agent query search engines, such as Google, for up-to-date information. Retrieval tools let you use RAG to augment your agent's knowledge with a vectorized document store, similar to the previous RAG examples in this course.

## Utilizing DuckDuckGo Search

DuckDuckGo is a search engine designed with a strong emphasis on user privacy. Unlike many mainstream search engines, DuckDuckGo does not track your searches or store personal information, ensuring a more secure and private online experience. Although you may not have heard of DuckDuckGo, it has gained popularity for its straightforward approach to search and its commitment to user privacy.

For many examples in this course, we will use [DuckDuckGo](https://duckduckgo.com/) for two reasons. First, its search is free to access and does not require an authentication key, so we can start coding without any account setup. Second, using DuckDuckGo keeps privacy issues in view and reinforces the importance of protecting personal data in our digital interactions. Working with it also broadens our familiarity with the range of tools available beyond the mainstream options.

In [4]:
from langchain_community.tools import DuckDuckGoSearchRun

search_tool = DuckDuckGoSearchRun()
search_tool.run("Who is the president of the USA?")

"The White House, official residence of the president of the United States, in July 2008. The president of the United States is the head of state and head of government of the United States, [1] indirectly elected to a four-year term via the Electoral College. [2] The officeholder leads the executive branch of the federal government and is the commander-in-chief of the United States Armed ... As the head of the government of the United States, the president is arguably the most powerful government official in the world. The president is elected to a four-year term via an electoral college system. Since the Twenty-second Amendment was adopted in 1951, the American presidency has been limited to a maximum of two terms.. Click on a president below to learn more about each presidency ... Joe Biden is the 46th president of the United States (2021- ). He was born on November 20, 1942, in Scranton, Pennsylvania, and he served as a U.S. senator representing Delaware from 1972 to 2009. He was v

In [5]:
from langchain import hub
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import AgentExecutor

MODEL = 'gpt-4o-mini'

llm = ChatOpenAI(
        model=MODEL,
        temperature=0.2,
        n=1
    )

search_tool = DuckDuckGoSearchRun()

prompt = hub.pull("hwchase17/openai-functions-agent")

tools = [search_tool]
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "Return the value of the DJIA as a floating point number, return just the number, no text or comments."})





[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `duckduckgo_search` with `{'query': 'DJIA current value'}`


[0m[36;1m[1;3mDow Jones Today: Get all information on the Dow Jones Index including historical chart, news and constituents. Dow Jones Industrial Average + Add to watchlist + Add an alert. DJI:DJI. Dow Jones Industrial Average. Actions. Add to watchlist; Add an alert; Price (USD) 42,063.36; Today's Change 38.17 / 0.09%; Shares traded 1.22bn; ... Nuclear fuel prices surge as west rues shortage of conversion facilities Sep 21 2024; Get Dow Jones Industrial Average (.DJI) real-time stock quotes, news, price and financial information from Reuters to inform your trading and investments Graph and download economic data for Dow Jones Industrial Average (DJIA) from 2014-09-22 to 2024-09-19 about stock market, average, industry, and USA. Dow Jones Industrial Average. ... The observations for the Dow Jones Industrial Average represent the daily index value at ma

{'input': 'Return the value of the DJIA as a floating point number, return just the number, no text or comments.',
 'output': '42063.36'}
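Even though the prompt asks for a floating point number, `agent_executor.invoke` returns a dictionary whose `'output'` value is a string, and an LLM may still add commas or extra words. The hypothetical `parse_number` helper below is a minimal sketch, assuming the first number in the reply is the one you want, for turning that reply into an actual `float`.

```python
import re


def parse_number(text: str) -> float:
    """Extract the first number from an LLM reply such as '42063.36' or 'DJIA: 42,063.36'."""
    # Optional minus sign, digits with optional thousands separators, optional decimals
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"No number found in: {text!r}")
    return float(match.group().replace(",", ""))


# Same shape as the dictionary returned by agent_executor.invoke above
result = {"input": "Return the value of the DJIA ...", "output": "42063.36"}
value = parse_number(result["output"])
print(value)
```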

## Using Tavily Search

We will now explore how to use [Tavily](https://tavily.com/) with LangChain. While this course will primarily focus on using DuckDuckGo due to its free access, it’s important to note that Tavily also offers a free version. The free plan of Tavily allows users to make up to 1,000 API calls per month without requiring a credit card, making it accessible for those who want to experience its advanced search capabilities tailored for large language models (LLMs).

Tavily offers several benefits. First, it is built for use with LLMs and aims to return relevant, factual, and contextually appropriate search results, which can help reduce hallucinations and improve the decisions AI agents make. Second, its results are well suited to serve as the retrieval step in retrieval-augmented generation (RAG), which depends on accurate and relevant source information.

Tavily also integrates with popular AI frameworks such as LangChain and LlamaIndex, so developers can add it to their existing workflows with little effort. Its search is fast and aggregates data from multiple sources for broad coverage. DuckDuckGo remains a practical and cost-effective choice for many users, but Tavily's additional features may noticeably improve the performance and accuracy of your AI applications.

The following code sends a sample query directly to Tavily. In an agent, the LLM would receive these results as additional information for forming its response.




In [6]:
from langchain_community.tools.tavily_search import TavilySearchResults

tool = TavilySearchResults()
tool.invoke({"query": "What happened in the latest burning man floods"})

[{'url': 'https://www.npr.org/2023/09/03/1197497458/the-latest-on-the-burning-man-flooding',
 {'url': 'https://www.nbcnews.com/news/us-news/live-blog/live-updates-burning-man-flooding-keeps-thousands-stranded-nevada-site-rcna103193',
  'content': "Profile\nSections\ntv\nFeatured\nMore From NBC\nFollow NBC News\nnews Alerts\nThere are no new alerts at this time\nBurning Man flooding keeps thousands stranded at Nevada site as authorities investigate 1 death\nBurning Man attendees struggling to get home\n70,000+ stuck at Burning Man: When will they be able to get out?\n Thousands still stranded at Burning Man after torrential rain\nBurning Man revelers unfazed by deluge and deep mud\nReuters\nThousands of Burning Man attendees partied hard on Sunday despite downpours that turned the Nevada desert where the annual arts and music festival takes place into a sea of sticky mud and led officials to order the multitudes to shelter in place.\n Neal Katyal warns hiking in the mud\ncan be 'worse t

We can add Tavily to an agent as a tool in exactly the same way we added DuckDuckGo.

In [7]:
from langchain import hub
from langchain.agents import create_tool_calling_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain.agents import AgentExecutor

# Reuses the `llm` defined in the DuckDuckGo agent example above
search_tool = TavilySearchResults()

prompt = hub.pull("hwchase17/openai-functions-agent")

tools = [search_tool]
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "Who is the oldest world leader?"})





[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `tavily_search_results_json` with `{'query': 'oldest world leader 2023'}`


[0m[36;1m[1;3m[{'url': 'https://www.9news.com.au/world/oldest-world-leaders-by-age-explained/fb257182-9e2b-4958-836a-c6794efd70ca', 'content': 'Irish President Michael\ufeff D Higgins is the sixth-oldest world leader on the planet. ... Al-Sabah became Emir of Kuwait in December 2023. He had been the oldest crown prince in the world until his ...'}, {'url': 'https://digg.com/data-viz/link/the-oldest-and-youngest-world-leaders-visualized-aOa48h0xAC', 'content': 'There are currently 13 world leaders who are women and their median age is 57. Finnish Prime Minister Sanna Marin is the youngest at 37-years-old while Bangladeshi Prime Minister Sheikh Hasina, who has been PM since 2009, is the oldest at 75-years-old. Official records show that the youngest head of government, in 2023, is Gabriel Boric ...'}, {'url': 'https://www.statista.com/stat

{'input': 'Who is the oldest world leader?',
 'output': 'As of 2023, the oldest world leader is Paul Biya, the President of Cameroon, who is 90 years old. Following him is Salman bin Abdulaziz Al Saud, the King of Saudi Arabia, who is also in his late 80s. Other notable older leaders include Sheikh Hasina, the Prime Minister of Bangladesh, who is 75 years old. \n\nFor more details, you can check the sources [here](https://www.washingtonpost.com/world/interactive/2024/biden-trump-age-global-leaders-comparison/).'}

## Using Agents with Retrieval

We can also give agents access to retrieval. We will reuse the RAG data from earlier in the course: randomly generated biographical sketches of people working at five fictional companies. We load these documents into a LangChain retriever using the same steps covered previously.

In [8]:
from langchain_core.documents import Document
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from langchain_community.vectorstores import Chroma
import requests

# Randomly generated bios for employees of five fictional companies
urls = [
    "https://data.heatonresearch.com/data/t81-559/bios/DD.txt",
    "https://data.heatonresearch.com/data/t81-559/bios/FT.txt",
    "https://data.heatonresearch.com/data/t81-559/bios/GS.txt",
    "https://data.heatonresearch.com/data/t81-559/bios/NGS.txt",
    "https://data.heatonresearch.com/data/t81-559/bios/TI.txt"
]

def chunk_text(text, chunk_size, overlap):
    """Split text into chunk_size-character windows that overlap by `overlap` characters."""
    # The window advances by chunk_size - overlap, so overlap must be less than chunk_size
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks

chunk_size = 900
overlap = 300

documents = []

for url in urls:
    print(f"Reading: {url}")
    response = requests.get(url)
    response.raise_for_status()  # Ensure we notice bad responses
    # Assumes the files are UTF-8; without this, requests may fall back to ISO-8859-1
    # and produce mojibake such as "â\x80\x99" in place of an apostrophe
    response.encoding = "utf-8"
    content = response.text
    chunks = chunk_text(content, chunk_size, overlap)
    for chunk in chunks:
        document = Document(page_content=chunk)
        documents.append(document)

# The manual chunks are at most 900 characters, so this second split
# (chunk_size=1000) mostly leaves them as they are
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Embed the chunks locally with a sentence-transformers model and index them in Chroma
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(docs, embedding_function)
retriever = db.as_retriever()  # Returns the closest matches (4 by default) for a query

Reading: https://data.heatonresearch.com/data/t81-559/bios/DD.txt
Reading: https://data.heatonresearch.com/data/t81-559/bios/FT.txt
Reading: https://data.heatonresearch.com/data/t81-559/bios/GS.txt
Reading: https://data.heatonresearch.com/data/t81-559/bios/NGS.txt
Reading: https://data.heatonresearch.com/data/t81-559/bios/TI.txt



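The `chunk_text` function above advances its window by `chunk_size - overlap` characters (600 here), so each 900-character chunk repeats the last 300 characters of the previous one, and the final chunk may be shorter. The sketch below is the same function with one addition, an explicit guard: with the original version, `overlap == chunk_size` makes `range()` raise on a zero step, and a larger overlap silently returns no chunks.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into windows of chunk_size characters; neighbors share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window moves each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With the notebook's settings, a 1,800-character file yields three chunks starting at offsets 0, 600, and 1,200.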

We first check whether the retriever can find text about one of the fictitious employees, Samantha Doyle.

In [9]:
retriever.invoke("who is Samantha Doyle")[0]

Document(page_content='the next generation of female tech leaders.\n\nSamantha Doyle is a seasoned Project Manager at Global Solutions, an innovative tech company known for pioneering smart city technologies. With over a decade of experience in the tech industry, Samantha has played a pivotal role in leading her team through successful launches of multiple high-profile sustainability projects. She holds a Masterâ\x80\x99s degree in Systems Engineering from MIT and has a passion for integrating eco-friendly practices into urban development. Outside of her professional life, Samantha is an avid rock climancer and enjoys mentoring young women interested in STEM careers, often volunteering her time at local high schools and community centers. Her dedication to both her career and community has made her a respected leader at Global Solutions and an influential figure in her field.\n\nSamantha Clarke is a seasoned Project Ma')
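`retriever.invoke` returns a list of `Document` objects ranked by embedding similarity to the query, and `[0]` picks the closest match. The toy sketch below illustrates that ranking idea only; it is not how Chroma works internally. It assumes simple word-count vectors compared with cosine similarity as a stand-in for the all-MiniLM-L6-v2 embeddings, and the documents are made-up strings modeled on the bios.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for a sentence embedding: counts of lower-cased words
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=4):
    # Rank every document by similarity to the query and keep the top k
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "Samantha Doyle is a Project Manager at Global Solutions.",
    "Samantha Clarke is a Project Manager.",
    "The Dow Jones rose today.",
]
print(retrieve("who is Samantha Doyle", docs)[0])
```

Real embeddings capture meaning rather than exact word overlap, but the retrieval step is the same: score every stored chunk against the query and return the best few.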

Next, we wrap the retriever as a tool. The tool description also tells the agent when it should use this retriever.

In [10]:
from langchain.tools.retriever import create_retriever_tool

retriever_tool = create_retriever_tool(
    retriever,
    # Name the agent uses to call this tool (it appears in the trace below)
    "langsmith_search",
    "Search for information about people who work for several companies. For any questions about people you do not know, you must use this tool!",
)

We can now ask the agent a question that requires both web search and the RAG documents:

> What are the job prospects in 2024 for Samantha Doyle's career? Return just a 2-sentence assessment of her career future.

To answer it, the agent needs the RAG data to determine Samantha's job, and then the search engine to find the 2024 prospects for that job.
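The two-step process described above can be sketched as a simple loop: a model chooses a tool call, the loop runs that tool, and the result is fed back until the model has enough information. This is a simplified illustration, not how `AgentExecutor` is implemented. The tools and the `scripted_model` function are hypothetical stand-ins (the fake model hardcodes its decisions instead of calling an LLM), and only the tool names are taken from the traces in this notebook. `max_steps` plays a role similar to `AgentExecutor`'s `max_iterations` setting.

```python
from typing import Callable, Dict, List, Optional


# Hypothetical stand-ins for the real Chroma retriever tool and DuckDuckGo tool
def bio_search(query: str) -> str:
    return "Samantha Doyle is a seasoned Project Manager at Global Solutions."


def web_search(query: str) -> str:
    return f"(web results for: {query})"


TOOLS: Dict[str, Callable[..., str]] = {
    "langsmith_search": bio_search,
    "duckduckgo_search": web_search,
}


def scripted_model(question: str, observations: List[str]) -> Optional[dict]:
    """Fake LLM: returns the next tool call, or None once it has enough information."""
    if not observations:
        # Step 1: look up the person in the RAG documents
        return {"name": "langsmith_search", "args": {"query": question}}
    if len(observations) == 1:
        # Step 2: search the web for the job found in step 1
        job = "Project Manager" if "Project Manager" in observations[0] else "unknown job"
        return {"name": "duckduckgo_search", "args": {"query": f"{job} job outlook 2024"}}
    return None  # Ready to write the final answer


def run_agent(question: str, model, tools: Dict[str, Callable[..., str]], max_steps: int = 5) -> List[str]:
    """Run the tools chosen by `model` until it stops or max_steps is reached."""
    observations: List[str] = []
    for _ in range(max_steps):
        call = model(question, observations)
        if call is None:
            break
        tool = tools[call["name"]]  # look the tool up by name, as in the trace
        observations.append(tool(**call["args"]))
    return observations


for obs in run_agent("Samantha Doyle's career prospects in 2024", scripted_model, TOOLS):
    print(obs)
```

In the real agent, the LLM makes these choices from the tool descriptions and then writes the final answer from the collected observations.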

In [11]:
from langchain import hub
from langchain.agents import create_tool_calling_agent
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import AgentExecutor

# Reuses the `llm` (gpt-4o-mini) defined earlier in this notebook
search_tool = DuckDuckGoSearchRun()

prompt = hub.pull("hwchase17/openai-functions-agent")

tools = [search_tool, retriever_tool]
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "What are the job prospects in 2024 for Samantha Doyle's career? Return just a 2-sentence assessment of her career future."})





[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `langsmith_search` with `{'query': 'Samantha Doyle career prospects 2024'}`


[0m[33;1m[1;3mutions with user-friendly interfaces has earned her multiple awards for innovation and leadership. Outside of her professional life, Samantha is an avid rock climber and volunteers her time mentoring young women interested in STEM careers, aiming to inspire the next generation of female tech leaders.

Samantha Doyle is a seasoned Project Manager at Global Solutions, an innovative tech company known for pioneering solutions in artificial intelligence and big data analytics. With over a decade of experience in the tech industry, Samantha has played a pivotal role in leading her team through successful launches of several high-profile projects aimed at enhancing cybersecurity measures for multinational corporations. A graduate of MIT with a degree in Computer Science, her expertise and forward-thinking approach have not only

{'input': "What are the job prospects in 2024 for Samantha Doyle's career? Return just a 2-sentence assessment of her career future.",
 'output': "Samantha Doyle's career prospects in 2024 appear promising, given her extensive experience in project management and her leadership in innovative tech projects related to smart city technologies. Her strong background in integrating complex systems and her commitment to mentoring young women in STEM positions her as a respected leader, likely to attract further opportunities for advancement and influence in her field."}