<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_07_3_search_tools.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative Artificial Intelligence
**Module 7: LangChain: Agents**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 7 Material

* Part 7.1: Introduction to LangChain Agents [[Video]](https://www.youtube.com/watch?v=J5Vr___lSSs) [[Notebook]](t81_559_class_07_1_agents.ipynb)
* Part 7.2: Understanding LangChain Agent Tools [[Video]](https://www.youtube.com/watch?v=qMquBmteYw4) [[Notebook]](t81_559_class_07_2_tools.ipynb)
* **Part 7.3: LangChain Retrieval and Search Tools** [[Video]](https://www.youtube.com/watch?v=NB5qGPLoBBE) [[Notebook]](t81_559_class_07_3_search_tools.ipynb)
* Part 7.4: Constructing LangChain Agents [[Video]](https://www.youtube.com/watch?v=OJe5oHvrdHk) [[Notebook]](t81_559_class_07_4_more_agent.ipynb)
* Part 7.5: Custom Agents [[Video]](https://www.youtube.com/watch?v=IsJemVYSEdc) [[Notebook]](t81_559_class_07_5_custom_agent.ipynb)

# Google CoLab Instructions

The following code ensures that Google CoLab is running and maps Google Drive if needed.

In [12]:
import os

try:
    from google.colab import drive, userdata
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

# OpenAI Secrets
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
    os.environ["TAVILY_API_KEY"] = userdata.get('TAVILY_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain langchain_openai langchain_experimental ddgs duckduckgo-search langchainhub sentence-transformers chromadb

Note: using Google CoLab
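
Outside of CoLab, the cell above does not set any keys, so `OPENAI_API_KEY` and `TAVILY_API_KEY` must already be present in the environment. The following is a minimal sketch of a check you could run first. The helper name `missing_keys` is hypothetical and is not part of the course code.

```python
import os

def missing_keys(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Keys that the examples in this notebook read from the environment.
required = ["OPENAI_API_KEY", "TAVILY_API_KEY"]
missing = missing_keys(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Checking up front gives a clearer message than the authentication error that would otherwise appear on the first model or search call.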


# 7.3: LangChain Retrieval and Search Tools

In this section we will look at two ways to give LangChain agents access to outside information: search tools and retrieval tools. Search tools let your agent query search engines such as Google or DuckDuckGo to obtain very current information. Retrieval tools use RAG to augment your agent's knowledge with a vectorized document store, similar to previous RAG examples in this course.

## Utilizing Duck Duck Go Search

DuckDuckGo is a search engine designed with a strong emphasis on user privacy. Unlike many mainstream search engines, DuckDuckGo does not track your searches or store personal information, ensuring a more secure and private online experience. Although you may not have heard of DuckDuckGo, it has gained popularity for its straightforward approach to search and its commitment to user privacy.

For many examples in this course, we will use [DuckDuckGo](https://duckduckgo.com/) for several important reasons. Firstly, its API is free to access and does not require an authentication key, simplifying the process and allowing us to dive straight into coding without the hassle of complex setups. Secondly, using DuckDuckGo will help us stay mindful of privacy issues and understand the importance of protecting personal data in our digital interactions. This practical experience will not only improve our technical skills but also broaden our knowledge of the diverse tools available in the digital ecosystem.

In [13]:
from langchain_community.tools import DuckDuckGoSearchRun

search_tool = DuckDuckGoSearchRun()
search_tool.run("Who is the president of the USA?")

'The president of the United States is the head of state and head of government of the United States, [1] indirectly elected to a four-year term via the Electoral College. [2] The president of the United States (POTUS) [B] is the head of state and head of government of the United States. The president directs the executive branch of the federal government and is the commander-in-chief of the United States Armed Forces. The president of the United States (POTUS) [11] is the head of state and head of government of the United States of America and the commander-in-chief of the United States Armed Forces. Apr 15, 2025 · Donald J. Trump is the President of the USA in 2025 after a landslide re-election victory. Explore his achievements, policies, and historic second term. Jan 20, 2025 · Donald Trump was sworn in Monday as the 47th president of the United States in one of the most remarkable political comebacks in U.S. history.'

In [14]:
from langchain import hub
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import AgentExecutor

# You can switch this to gpt-5-mini; however OpenAI will require you to validate
# your account.
# MODEL = 'gpt-5-mini'
MODEL = 'gpt-4o-mini'

llm = ChatOpenAI(
        model=MODEL,
        temperature=0.2,
        n=1
    )

search_tool = DuckDuckGoSearchRun()

prompt = hub.pull("hwchase17/openai-functions-agent")

tools = [search_tool]
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "Return the value of the DJIA as a floating point number, return just the number, no text or comments."})





> Entering new AgentExecutor chain...

Invoking: `duckduckgo_search` with `{'query': 'current DJIA value'}`


. DJI . DJIA . Constituents.The following table shows the annual development of the Dow Jones Index, which was calculated back to 1896.[39][40]. End-of-year closing values for DJIA . We can estimate what the DJIA “should” have been trading at (or is worth) based on the value of its current earnings and dividends and the projected growth in those earnings and dividends. As of December 27, 2024, DIA is trading at $433.21, reflecting the index ’ s current value . ... Current Stock Market Dow Jones: Index Today DJIA ... DJIA stock price reflects the current value of one of the most popular and recognizable stock market indices. ... we will check the current DJIA ... ... calculates the value that the DJIA ... Since the DJIA is (somewhat coincidentally) currently 9744, I conclude that it is likely about fairly valued.
Invoking:

{'input': 'Return the value of the DJIA as a floating point number, return just the number, no text or comments.',
 'output': '45544.88'}
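
Even when the prompt asks for just the number, the agent's `output` is still a string, and a model may occasionally wrap the value in text or add thousands separators. The following is a minimal sketch of a defensive parser. The helper `parse_float` is hypothetical and not part of LangChain, and the regular expression is only one reasonable choice.

```python
import re
from typing import Optional

def parse_float(text: str) -> Optional[float]:
    """Extract the first number from text, tolerating thousands separators."""
    match = re.search(r"[-+]?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))

print(parse_float("45544.88"))
print(parse_float("The DJIA is 45,544.88."))
print(parse_float("No value found"))
```

Returning `None` instead of raising lets the calling code decide whether to retry the agent or report a failure.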

## Using Tavily Search

We will now explore how to use [Tavily](https://tavily.com/) with LangChain. This course primarily uses DuckDuckGo because it is free to access, but Tavily also offers a free plan. The free plan allows up to 1,000 API calls per month without requiring a credit card, which is enough to try its search capabilities tailored for large language models (LLMs).

Using Tavily can offer several benefits. Firstly, Tavily is optimized for LLMs, providing highly relevant, factual, and contextually appropriate search results. This optimization helps in reducing hallucinations and improving the overall decision-making capabilities of AI agents. Secondly, Tavily supports retrieval-augmented generation (RAG), which is crucial for ensuring the accuracy and relevance of information used by LLMs.

Moreover, Tavily integrates with popular AI frameworks such as LangChain and LlamaIndex, so developers can add it to their existing workflows. It also aggregates data from multiple sources to broaden its coverage. DuckDuckGo remains a practical and cost-effective choice for many users, but Tavily's LLM-oriented features may improve the accuracy of your agent's answers.

The following code sends a sample search query to Tavily. An agent would pass these results to the LLM as additional context for forming its response.




In [15]:
from langchain_community.tools.tavily_search import TavilySearchResults

tool = TavilySearchResults()
tool.invoke({"query": "What happened in the latest burning man floods"})

[{'title': 'Burning Man attendees face more weather woes as thunderstorms ...',
  'url': 'https://www.theguardian.com/culture/2025/aug/26/burning-man-weather-storms',
  'content': 'It was confirmed that at least one major art installation was destroyed in the storm: an 8-ton inflatable thundercloud known as “Black Cloud” reportedly held together for 15 minutes before being ripped apart by winds. The piece, created by a Ukrainian-led team, was meant to symbolize the “specter of world war”.\n\nBy Monday, organizers confirmed that both the festival gates and the Black Rock City airport had reopened, allowing the event to resume as scheduled. [...] “There are some of the structures that are blown over,” Austin Matthew, an art creator at Burning Man, told Fox Weather. “Things had broken. And even just driving to our camp spot, there were all camps that were pretty much wiped out. Things were just laying on their side, completely annihilated at that point.”\n\nMatthew had arrived on Sunday a
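
Each Tavily result above is a dictionary with `title`, `url`, and `content` keys. The following is a minimal sketch of how such a list could be flattened into a context string for an LLM prompt. The helper `format_results` and the sample entries are hypothetical illustrations; the agent examples in this section handle this step for you.

```python
def format_results(results, max_chars=300):
    """Turn a list of search-result dicts into a numbered context string."""
    blocks = []
    for i, result in enumerate(results, start=1):
        title = result.get("title", "")
        url = result.get("url", "")
        snippet = result.get("content", "")[:max_chars]  # keep the prompt short
        blocks.append(f"[{i}] {title} ({url})\n{snippet}")
    return "\n\n".join(blocks)

# Hypothetical sample data in the same shape as the Tavily output.
sample = [
    {"title": "Example A", "url": "https://example.com/a", "content": "First snippet."},
    {"title": "Example B", "url": "https://example.com/b", "content": "Second snippet."},
]
print(format_results(sample))
```

Numbering the entries makes it easier to ask the model to cite which result supports each claim.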

We can add Tavily to an agent as a tool in exactly the same way we added DuckDuckGo.

In [16]:
from langchain import hub
from langchain.agents import create_tool_calling_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain.agents import AgentExecutor

search_tool = TavilySearchResults()

prompt = hub.pull("hwchase17/openai-functions-agent")

tools = [search_tool]
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "Who is the oldest world leader?"})





> Entering new AgentExecutor chain...

Invoking: `tavily_search_results_json` with `{'query': 'oldest world leader 2023'}`


[{'title': 'List of oldest living state leaders', 'url': 'https://en.wikipedia.org/wiki/List_of_oldest_living_state_leaders', 'content': '| 96 | Abdou Diouf | Senegal |  Prime Minister (1970–1980)  President (1981–2000) | 7 Sep 1935 | 89 years, 353 days | 24 Mar 2025 |\n| 97 | Henri de Coignac | Andorra | French Viguier (1982–1984) | 3 Oct 1935 | 89 years, 327 days | 5 Feb 2023 |\n| 98 | Mahmoud Abbas | Palestine |  Prime Minister of the Palestinian National Authority (2003)  President of the Palestinian National Authority (2005–present)  President of Palestine (2005–present) | 15 Nov 1935 | 89 years, 284 days | 14 Jul 2025 | [...] | 34 | Fernando Henrique Cardoso | Brazil | President (1995–2003) | 18 Jun 1931 | 94 years, 69 days | 18 Jun 2025 |\n| 35 | Khieu Samphan | Democratic Kampuchea |  Acting Prime Minister (1976)  C

{'input': 'Who is the oldest world leader?',
 'output': 'As of now, the oldest world leader is **Paul Biya**, the President of Cameroon, who was born in 1933 and is currently 91 years old. He has been in office for over 40 years and is the only current national leader in his 90s. \n\nFor more details, you can refer to the article [here](https://www.pewresearch.org/short-reads/2024/05/01/as-biden-and-trump-seek-reelection-who-are-the-oldest-and-youngest-current-world-leaders/).'}

## Using Agents with Retrieval

We can also utilize retrieval for agents. We will reuse the RAG data from earlier in the course: randomly generated biographical sketches of people working at five fictional companies. We use the same steps previously covered to load these documents into a LangChain retriever.

In [17]:
# Only the imports this cell actually uses are kept here.
from langchain.schema import Document
import requests

urls = [
    "https://data.heatonresearch.com/data/t81-559/bios/DD.txt",
    "https://data.heatonresearch.com/data/t81-559/bios/FT.txt",
    "https://data.heatonresearch.com/data/t81-559/bios/GS.txt",
    "https://data.heatonresearch.com/data/t81-559/bios/NGS.txt",
    "https://data.heatonresearch.com/data/t81-559/bios/TI.txt"
]

def chunk_text(text, chunk_size, overlap):
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks

chunk_size = 900
overlap = 300

documents = []

for url in urls:
    print(f"Reading: {url}")
    response = requests.get(url)
    response.raise_for_status()  # Ensure we notice bad responses
    content = response.text
    chunks = chunk_text(content, chunk_size, overlap)
    for chunk in chunks:
        document = Document(page_content=chunk)
        documents.append(document)

from langchain_text_splitters import CharacterTextSplitter
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from langchain.vectorstores import Chroma

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(docs, embedding_function)
retriever = db.as_retriever()

Reading: https://data.heatonresearch.com/data/t81-559/bios/DD.txt
Reading: https://data.heatonresearch.com/data/t81-559/bios/FT.txt
Reading: https://data.heatonresearch.com/data/t81-559/bios/GS.txt
Reading: https://data.heatonresearch.com/data/t81-559/bios/NGS.txt
Reading: https://data.heatonresearch.com/data/t81-559/bios/TI.txt
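
To see what the overlapping chunker in the cell above produces, it helps to run it on a very short string. The sketch below repeats the same sliding-window logic as `chunk_text`. The argument check is an addition for illustration, not part of the original function.

```python
def chunk_text(text, chunk_size, overlap):
    # Guard added for this sketch: the window must advance by at least 1.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window moves each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

demo = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(demo)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With `chunk_size = 900` and `overlap = 300`, the window advances 600 characters per chunk, so the first 300 characters of each chunk repeat the last 300 characters of the previous one. This overlap reduces the chance that a biography is split across chunks with no shared context.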


We test to see if the retriever can find text associated with one of the fictitious employees, Samantha Doyle.

In [18]:
retriever.invoke("who is Samantha Doyle")[0]

Document(metadata={}, page_content='the next generation of female tech leaders.\n\nSamantha Doyle is a seasoned Project Manager at Global Solutions, an innovative tech company known for pioneering smart city technologies. With over a decade of experience in the tech industry, Samantha has played a pivotal role in leading her team through successful launches of multiple high-profile sustainability projects. She holds a Masterâ\x80\x99s degree in Systems Engineering from MIT and has a passion for integrating eco-friendly practices into urban development. Outside of her professional life, Samantha is an avid rock climancer and enjoys mentoring young women interested in STEM careers, often volunteering her time at local high schools and community centers. Her dedication to both her career and community has made her a respected leader at Global Solutions and an influential figure in her field.\n\nSamantha Clarke is a seasoned Project Ma')
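
Conceptually, the retriever embeds the query as a vector and returns the stored chunks whose vectors are closest to it. The following is a toy sketch of that idea using cosine similarity on hand-written three-dimensional vectors. The vectors and names are hypothetical, and Chroma itself uses real embeddings, an approximate nearest-neighbor index, and its own configured distance metric.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(doc_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical toy "embeddings" standing in for real sentence vectors.
toy_store = {
    "bio_a": [0.9, 0.1, 0.0],
    "bio_b": [0.1, 0.9, 0.0],
    "bio_c": [0.7, 0.3, 0.1],
}
print(top_k([1.0, 0.0, 0.0], toy_store))  # ['bio_a', 'bio_c']
```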

We now construct the tool. As you can see, the tool description also tells the agent when it should make use of this retriever.

In [19]:
from langchain.tools.retriever import create_retriever_tool

retriever_tool = create_retriever_tool(
    retriever,
    "langsmith_search",
    "Search for information about people who work for several companies. For any questions about people you do not know, you must use this tool!",
)
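
A retriever tool pairs a name and a description, which the LLM reads when deciding which tool to call, with a function that runs the query and returns the matching text. The plain-Python sketch below illustrates that shape only. It is not LangChain's implementation, and the class `SimpleRetrieverTool`, the keyword-matching `fake_retrieve`, and the second corpus entry are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SimpleRetrieverTool:
    name: str                               # identifier the model uses to call the tool
    description: str                        # tells the model when to use the tool
    retrieve: Callable[[str], List[str]]    # query -> list of matching text chunks

    def run(self, query: str) -> str:
        # Join the matching chunks into one string for the model to read.
        return "\n\n".join(self.retrieve(query))

def fake_retrieve(query: str) -> List[str]:
    """Stand-in for a vector search: simple case-insensitive keyword match."""
    corpus = [
        "Samantha Doyle is a Project Manager at Global Solutions.",
        "Jane Example is an engineer at Example Corp.",  # hypothetical entry
    ]
    words = query.lower().split()
    return [text for text in corpus if any(w in text.lower() for w in words)]

tool = SimpleRetrieverTool(
    name="people_search",
    description="Search for information about people who work for several companies.",
    retrieve=fake_retrieve,
)
print(tool.run("Doyle"))
```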

We can now ask the agent a question that uses both search and the RAG documents. Consider the following question:

> What are the job prospects in 2024 for Samantha Doyle's career? Return just a 2-sentence assessment of her career future.

This question requires the RAG data to determine Samantha's job. The agent can then use the search engine to look up the 2024 prospects for that job.

In [20]:
from langchain import hub
from langchain.agents import create_tool_calling_agent
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import AgentExecutor

search_tool = DuckDuckGoSearchRun()

prompt = hub.pull("hwchase17/openai-functions-agent")

tools = [search_tool, retriever_tool]
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "What are the job prospects in 2024 for Samantha Doyle's career? Return just a 2-sentence assessment of her career future."})





> Entering new AgentExecutor chain...

Invoking: `langsmith_search` with `{'query': 'Samantha Doyle career prospects 2024'}`


utions with user-friendly interfaces has earned her multiple awards for innovation and leadership. Outside of her professional life, Samantha is an avid rock climber and volunteers her time mentoring young women interested in STEM careers, aiming to inspire the next generation of female tech leaders.

Samantha Doyle is a seasoned Project Manager at Global Solutions, an innovative tech company known for pioneering solutions in artificial intelligence and big data analytics. With over a decade of experience in the tech industry, Samantha has played a pivotal role in leading her team through successful launches of several high-profile projects aimed at enhancing cybersecurity measures for multinational corporations. A graduate of MIT with a degree in Computer Science, her expertise and forward-thinking approach have not only

{'input': "What are the job prospects in 2024 for Samantha Doyle's career? Return just a 2-sentence assessment of her career future.",
 'output': "Samantha Doyle's career prospects in 2024 appear promising, given her extensive experience in project management within the innovative tech sector, particularly in smart city technologies. Her leadership skills and commitment to mentoring young women in STEM will likely enhance her professional reputation and open up further opportunities for advancement."}