
# 🚀 Agenda

Tool calling is how LLMs evolve from *talking* to *doing*.


### 1. Why LLMs Need Tools
LLMs can think, but can they *act*?  
We’ll explore how tools let them reach beyond text: to search, calculate, and interact with the real world.

---

### 2. Build a Custom Tool from a Real API
We’ll turn a real API endpoint into an LLM tool, step by step.  
You’ll see how to define its name, description, input schema, and function.

---

### 3. How Tools Are Invoked
Behind every tool call is a reasoning loop: the model plans, names the right tool, and prepares the arguments.  
But remember: it’s not executing anything yet.

---

### 4. Binding Tools with the LLM
We’ll connect our tools to the model and show how the LLM is told which tools exist, so it can decide *when and how* to use each one.

---

### 5. The Hidden Truth: LLMs Don’t Run Tools
LLMs only decide **what** to call and **with what arguments**.  
It’s the agent (or a human) that actually runs the tool.

---

### 6. Meet the Agent: The Real Executor
The agent is the LLM’s partner: it interprets the model’s plan, executes the tools, and returns results.  
It’s the bridge between *thinking* and *doing*.

---

### 7. Native Tool Calling in Modern Models
Models like ChatGPT or Claude are fine-tuned for tool calling, so they natively output structured tool calls.  
We’ll see how effortless that feels.

---

### 8. When Models Don’t Have Native Support
Even if a model doesn’t support tool calling, smart prompting can simulate it.  
We’ll teach the model to reason its way to the right tool.

---





## Install OpenAI and LangChain dependencies


In [1]:
from warnings import filterwarnings
filterwarnings('ignore')

In [None]:
!pip install -q langchain==0.3.14
!pip install -q langchain-openai==0.3.0
!pip install -q langchain-community==0.3.14

## Install Data Extraction APIs

In [3]:
# to create custom tools
!pip install -q wikipedia==1.4.0
!pip install -q markitdown
# to highlight json
!pip install -q rich


## Enter OpenAI API Key

In [4]:
from getpass import getpass

OPENAI_KEY = getpass('Enter OpenAI API Key: ')

Enter OpenAI API Key: ··········


## Enter Tavily Search API Key

Get a free API key from [here](https://tavily.com/#api)

In [5]:
## Enter Tavily Search API Key

TAVILY_API_KEY = getpass('Enter Tavily Search API Key: ')

Enter Tavily Search API Key: ··········


## Enter WeatherAPI API Key

Get a free API key from [here](https://www.weatherapi.com/signup.aspx)

In [6]:
WEATHER_API_KEY = getpass('Enter WeatherAPI API Key: ')

Enter WeatherAPI API Key: ··········


## Setup Environment Variables

In [7]:
import os

os.environ['OPENAI_API_KEY'] = OPENAI_KEY
os.environ['TAVILY_API_KEY'] = TAVILY_API_KEY

## What is a tool?

**Tools** are how an LLM *acts*: they bridge reasoning with real-world capability.  
When a model decides to “do” something, it does it through a tool.

A tool is defined by:

- 🏷️ **Name** – the identifier the model uses to refer to the tool (e.g., `"search_web"`)
- 💬 **Description** – what the tool does, which tells the model when and why to use it
- 🧾 **Input schema** – a JSON schema defining *what arguments* the tool expects
- ⚙️ **Function** – the real Python function that executes the action  
- 🔁 **Return mode** – whether the tool’s result goes directly to the user or back to the model for reasoning
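
As a minimal sketch of the parts listed above, the following plain-Python dataclass models a tool. `ToolSpec`, `search_tool`, and the stub `fake_search` are hypothetical names used only for illustration; LangChain's own tool classes, used in the rest of this notebook, carry equivalent fields.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolSpec:
    """Hypothetical container mirroring the parts of a tool definition."""
    name: str                      # identifier the model uses to refer to the tool
    description: str               # tells the model when and why to use it
    input_schema: Dict[str, Any]   # JSON-schema-style description of the arguments
    func: Callable[..., Any]       # the Python function that does the work
    return_direct: bool = False    # True: result goes straight to the user


def fake_search(query: str) -> str:
    # Stub standing in for a real search call
    return f"results for: {query}"


search_tool = ToolSpec(
    name="search_web",
    description="Look up current information on the web.",
    input_schema={
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
    func=fake_search,
)

# The model only ever sees name, description and input_schema;
# func is called by whoever executes the tool.
print(search_tool.func(query="latest AI research"))
```

Keeping the function separate from the metadata is what lets the same description be sent to a model while the execution stays on your side.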


# Web Search Tool

### Exploring the Tavily Search Tool

The Tavily Search API is a search engine optimized for LLMs and RAG, aimed at efficient, quick, and persistent search results.

In [8]:
from langchain_community.tools.tavily_search import TavilySearchResults

tavily_tool = TavilySearchResults(max_results=8,
                                search_depth='advanced',
                                include_raw_content=True)

In [10]:
# Inspect all fields / parameters of the tool
# (model_dump() is the Pydantic v2 replacement for the deprecated .dict())
from pprint import pprint
pprint(tavily_tool.model_dump())


{'api_wrapper': {'tavily_api_key': SecretStr('**********')},
 'args_schema': <class 'langchain_community.tools.tavily_search.tool.TavilyInput'>,
 'description': 'A search engine optimized for comprehensive, accurate, and '
                'trusted results. Useful for when you need to answer questions '
                'about current events. Input should be a search query.',
 'exclude_domains': [],
 'handle_tool_error': False,
 'handle_validation_error': False,
 'include_answer': False,
 'include_domains': [],
 'include_images': False,
 'include_raw_content': True,
 'max_results': 8,
 'metadata': None,
 'name': 'tavily_search_results_json',
 'response_format': 'content_and_artifact',
 'return_direct': False,
 'search_depth': 'advanced',
 'tags': None,
 'verbose': False}


In [11]:
tavily_tool.args

{'query': {'description': 'search query to look up',
  'title': 'Query',
  'type': 'string'}}
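
The `args` dictionary above is effectively a JSON-schema fragment: each argument has a name, a type, and a description. Here is a minimal sketch of how an executor could check model-supplied arguments against such a schema before running a tool. The `validate_args` helper and the `JSON_TYPES` mapping are assumptions made for illustration, not part of LangChain (which validates through Pydantic), and the sketch treats every argument as required.

```python
# Schema copied from the shape of the `args` output shown above
schema = {
    "query": {"description": "search query to look up",
              "title": "Query",
              "type": "string"}
}

# Assumed mapping from JSON-schema type names to Python types
JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def validate_args(args: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name, spec in schema.items():
        if name not in args:
            problems.append(f"missing argument: {name}")
        elif not isinstance(args[name], JSON_TYPES[spec["type"]]):
            problems.append(f"{name} should be of type {spec['type']}")
    for name in args:
        if name not in schema:
            problems.append(f"unexpected argument: {name}")
    return problems


print(validate_args({"query": "Tell me about LLMs"}, schema))  # []
print(validate_args({"q": 42}, schema))
```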

In [36]:
results = tavily_tool.invoke("Tell me about LLMs")
results

[{'url': 'https://en.wikipedia.org/wiki/Large_language_model',
  'content': 'A large language model (LLM) is a language model trained with self-supervisedmachine learning on a vast amount of text, designed for natural language processing tasks, especially language generation.( The largest and most capable LLMs are generative pre-trained transformers (GPTs) and provide the core capabilities of chatbots such as ChatGPT, Gemini "Gemini (chatbot)"), Perplexity and Claude "Claude (language model)"). LLMs can be fine-tuned "Fine-tuning (deep learning)") for specific tasks or [...] An LLM is a type of foundation model (large X model) trained on language. LLMs can be trained in different ways. In particular, GPT models are first pretrained to predict the next word on a large amount of data, before being fine-tuned.(\n\n### Cost\n\n[edit]\n\nImage 8 [...] They consist of billions to trillions of parameters and operate as general-purpose sequence models, generating, summarizing, translating, and

In [37]:
results[1]['url']

'https://uit.stanford.edu/service/techtraining/ai-demystified/llm'

In [38]:
from markitdown import MarkItDown

md = MarkItDown()
doc_content = md.convert(results[1]['url'])


In [39]:
# doc_content was already produced by md.convert() in the previous cell
print(doc_content.title.strip())

AI Demystified: Introduction to large language models | University IT


In [40]:
print(doc_content.text_content)

[Skip to main content](#main-content)

[![Stanford](/themes/custom/stanford_uit/logo.svg)
University IT](/ "Stanford")
Main Menu

* [Explore services](https://uit.stanford.edu/services)
  + [View all services](/services)
  + [View services approved for High Risk Data](https://uit.stanford.edu/guide/riskclassifications#security-approved-services)
* [I want to ...](https://uit.stanford.edu)
  + [Use video conferencing tools](/videoconferencing)
  + [Get IT training](/service/techtraining/schedule)
  + [Create web forms and surveys](/service/gsuite/google-forms)
  + [Set up email](/emailcalendar/config)
  + [Set up two-step authentication](/service/authentication/twostep)
  + [Sponsor a SUNet ID](/service/sponsorship/person)
  + [Get software](https://software.stanford.edu)
  + [Connect to the network](/guide/connecting-to-network)
  + [Secure my mobile device](/service/mobiledevice/management)
  + [View website infrastructure options](/guide/website/infrastructure-options)
  + [Publish a

In [42]:
from markitdown import MarkItDown
from langchain_community.tools.tavily_search import TavilySearchResults
from tqdm import tqdm
from langchain_core.tools import tool

tavily_tool = TavilySearchResults(max_results=5,
                                  search_depth='advanced',
                                  include_answer=False,
                                  include_raw_content=True)
md = MarkItDown()

@tool
def search_web_extract_info(query: str) -> list:
    """Search the web for a query and extracts useful information from the search links"""
    results = tavily_tool.invoke(query)
    docs = []
    for result in tqdm(results):
        # Fetch each result URL and convert the page to Markdown text
        try:
            extracted_info = md.convert(result['url'])
            # title may be None for some pages, so fall back to an empty string
            text_title = (extracted_info.title or '').strip()
            text_content = extracted_info.text_content.strip()
            docs.append(text_title + '\n' + text_content)
        except Exception:
            # Skip pages that block scraping or fail to convert
            print('Extraction blocked for url: ', result['url'])

    return docs

In [45]:
# Tools are Runnables: call them with .invoke() rather than as plain functions
docs = search_web_extract_info.invoke('Claude LLM')

100%|██████████| 5/5 [00:01<00:00,  2.74it/s]

Extraction blocked for url:  https://www.reddit.com/r/technicalwriting/comments/1be9rla/the_claude_llm_is_an_absolute_gamechanger_for_my/
Extraction blocked for url:  https://en.wikipedia.org/wiki/Claude_(language_model)





In [46]:
from IPython.display import display, Markdown

display(Markdown(docs[0]))

What Is Claude AI? | IBM
[Artificial Intelligence](https://www.ibm.com/think/artificial-intelligence)
[IT automation](https://www.ibm.com/think/it-automation)

# What is Claude AI?

![A blue drawing of various cubes representing a computer network](https://assets.ibm.com/is/image/ibm/physicalassetmgmt768x768?ts=1725472445987&dpr=off)

## Authors

[Ivan Belcic](https://www.ibm.com/think/author/ivan-belcic)

Staff writer

[Cole Stryker](https://www.ibm.com/think/author/cole-stryker.html)

Staff Editor, AI Models

IBM Think

## What is Claude AI?

Claude AI (Claude) is a generative [artificial intelligence (AI)](https://www.ibm.com/topics/artificial-intelligence) [chatbot](https://www.ibm.com/topics/chatbots) and family of [large language models (LLMs)](https://www.ibm.com/topics/large-language-models) developed by the research firm Anthropic. Claude excels at [natural language processing (NLP)](https://www.ibm.com/topics/natural-language-processing) and is multimodal: it accepts text, audio and visual inputs and can answer questions, summarize documents and generate long-form text, diagrams, animations, program code and more.

Claude adheres to Anthropic’s *Constitutional AI* philosophy: a code of ethical norms that the firm believes differentiates Claude from competing AI models such as ChatGPT and Google’s Gemini. The principles of Constitutional AI are focused on AI safety, designed to guide Claude toward providing more helpful responses while avoiding harmful behaviors such as [AI bias](https://www.ibm.com/topics/ai-bias).

Claude 3, released in May 2024, includes one free and two premium AI [chatbots](https://www.ibm.com/topics/chatbots).

* **Claude 3.5 Sonnet** underpins the free version of Claude AI. Its emphasis on speed enables it to quickly process user queries and other tasks requiring urgent data retrieval. According to Anthropic, Claude 3.5 Sonnet is twice as fast as Claude 3 Opus, one of the two premium offerings.

* **Claude 3 Opus** is one of two Claude models currently available to Claude Pro users. It provides in-depth document processing and content generation services, specializing in complex tasks. While slower than Claude 3.5 Sonnet, Opus runs a lower risk of [hallucinations](https://www.ibm.com/topics/ai-hallucinations): when an AI model provides incorrect information as though it is factually correct.

* **Claude 3 Haiku** is the second premium Claude offering. It’s the smallest and fastest of the three and is ideal for use in summarizing long documents, real-time customer service and simple text generation.

## What is Claude used for?

Each of the three Claude 3 models has its own specialized use cases. In general, people can use Claude AI to help with a wide range of tasks, including:

* Question-answering and research
* Proofreading and editing
* Document summarization, including PDFs and Word documents
* Text and content generation
* Language translation
* Business plan creation
* Image and audio processing
* Code snippet generation and review

Unlike Claude 2 and 1, Claude 3 is multimodal: it can process image and audio content alongside text-based prompts. For example, Claude 3 can generate e-commerce product descriptions based on images. While Claude 3 cannot generate nontext content on its own, its multimodal integration is one of several new features that allow it to compete with GPT-4.

## How does Claude AI work?

Like Gemini and OpenAI’s ChatGPT, Anthropic’s Claude family of AI systems are based on the transformer architecture of neural network. But unlike its competitors, Claude applies the principles of Constitutional AI to govern its behavior.

* **Transformer models** excel at drawing connections between distant words in a user input sequence, enabling them to better understand context and generate long-form replies.

* **Constitutional AI** is a guiding set of harm reduction principles designed to make Claude more beneficial with less risk.

### What are transformer models?

Transformers are a type of AI model built for high-performance natural language processing. They work by applying complex mathematical algorithms to statistically predict the most likely response to a user query. The [workflow](https://www.ibm.com/topics/workflow) can be divided into four basic steps.

The transformer breaks up a user query into **tokens**. Each token represents either a whole word or a portion of a word. AI model pricing is typically represented as the cost per token. Claude Pro’s context window is 200,000 tokens[1](#footnotes1), meaning it can process user queries of up to 200,000 tokens in length.

1. Each token is plotted into a three-dimensional vector space via mathematical processes. Tokens that are assessed as more similar in meaning are plotted closer together in space, aiding LLMs in understanding user inputs. The result of this process is called a **vector embedding**.
2. Transformers such as Claude and GPT-4 apply **self-attention mechanisms** to self-direct resources on the most relevant portions of a user query and process context.
3. The model applies probabilistic algorithms to generate the **most likely response** to an input. AI models such as Claude don’t actually “know” anything; rather, they combine their training data with advanced statistics to yield the most probable outcomes to prompts.

### What is Constitutional AI?

Constitutional AI[2](#footnote2) is a set of [AI ethics](https://www.ibm.com/topics/ai-ethics) and safety principles created by AI startup Anthropic. When designing Claude, Anthropic sourced input from approximately 1,000 people, asking them to vote on and suggest rules for ethical [generative AI](https://www.ibm.com/topics/generative-ai) operation and [responsible AI](https://www.ibm.com/topics/responsible-ai) use. The final assembly of rules formed the basis of Claude’s training process.

The first three rules of Constitutional AI are:

* Choose the response that is the least dangerous or hateful.
* Choose the response that is as reliable, honest, and close to the truth as possible.
* Choose the response that best conveys clear intentions.

Where other models have their content reviewed by human trainers in a process called [reinforcement learning from human feedback (RLHF)](https://www.ibm.com/topics/rlhf), Claude’s was trained with RLHF as well as a second AI model. Reinforcement learning from AI feedback (RLAIF) tasked the “trainer” model with comparing Claude’s behavior against Constitutional AI and correcting it accordingly.

RLAIF [automates](https://www.ibm.com/topics/automation) the behavior-adjustment portion of the training process, making it cheaper and more efficient to encourage ethical behavior. The intended result is that Claude would [fine-tune](https://www.ibm.com/topics/fine-tuning) itself, learning to avoid harmful prompts while generating helpful replies to prompts it deems answerable.

## Who is Anthropic AI?

Anthropic is an AI startup founded in 2021 by several ex-OpenAI researchers and executives, including siblings Daniela and Dario Amodei. Amazon and Google have each invested billions in USD into the company, while OpenAI continues to enjoy backing from Microsoft.

The Amodei siblings parted ways with OpenAI in 2021, the year before OpenAI released GPT-3.5. This is the same AI model that continues to power the free ChatGPT AI tool today. Along with other former OpenAI researchers, the Amodei siblings founded Anthropic AI and began work on what would become Claude AI.

Anthropic’s defining feature is their stated approach to ethical AI, represented by the Constitutional AI training process.

## The benefits of Claude vs ChatGPT and Gemini

When releasing Claude 3, Anthropic AI conducted a series of LLM benchmarking tests to evaluate their models against those of their two primary competitors: OpenAI and Google. Both in those tests and otherwise, Claude demonstrated several key advantages:

* Larger context window
* Strong performance in many tests
* No input or output data retention

### Larger context window

Able to field prompts of up to 200,000 tokens (approximately 350 pages of text), Claude can remember and use more information when creating relevant answers. By comparison, GPT-4 Turbo and GPT-4o limit users to 128,000 tokens.

Claude’s ability to retain more information allows users to create detailed, data-packed prompts. The more data contained in the input sequence, the more relevant an AI model’s answer can be.

### Strong performance in many tests

When Anthropic tested Claude 3 against GPT-4 and Gemini 1.0[3](#footnote3), Claude 3 Opus was the top performer in all selected evaluation benchmarks. Gemini 1.0 Ultra came out on top in four of the six vision tests, though the Claude family of models performed comparably.

However, GPT-4o and Gemini 1.5 were not included in the testing pool. When revealing GPT-4o in May 2024[4](#footnote4), OpenAI conducted benchmarking that saw their new flagship model beat Claude 3 Opus in five out of six conducted tests.

### No input or output data retention

Users concerned about data privacy might appreciate Anthropic’s data retention policy[5](#footnote5): they state that all user inputs and outputs are deleted after 30 days. Google’s Gemini for Google Cloud data policy[6](#footnote6) says that the company will not train its models with user prompts.

By comparison, OpenAI can retain and use user data[7](#footnote7) to further train their models. Google’s Gemini Apps policies[8](#footnote8) permit the company to retain user data unless the user manually deactivates this option.

## Claude’s disadvantages

While Claude’s overall performance is strong when compared to the competition, it also has a handful of weaknesses that can delay its acceptance by the greater population.

* Limited image generation
* No internet browsing

### Limited image generation

Compared to GPT-4o, Claude is less able to create images. While Claude can produce interactive [flowcharts](https://www.ibm.com/think/topics/flowchart), [entity relationship diagrams](https://www.ibm.com/think/topics/entity-relationship-diagram) and graphs, it stops short of full image generation.

### No internet browsing

Due to Microsoft’s integration with Bing, GPT-4 is able to search the internet when answering user queries. While Claude is regularly updated with new training data, its knowledge base is always several months behind until Anthropic elects to open Claude up to the internet in the same way.

[Ebook

How to choose the right foundation model

Learn how to choose the right approach in preparing datasets and employing foundation models.

Read the ebook](https://www.ibm.com/account/reg/signup?formid=urx-52620)

## Resources

[AI models

Explore IBM Granite

Discover IBM® Granite™, our family of open, performant and trusted AI models, tailored for business and optimized to scale your AI applications. Explore language, code, time series and guardrail options.

Meet Granite](https://www.ibm.com/granite)

[Ebook

How to choose the right foundation model

Learn how to select the most suitable AI foundation model for your use case.

Read the ebook](https://www.ibm.com/account/reg/signup?formid=urx-52620)

[Article

Discover the power of LLMs

Dive into IBM Developer articles, blogs and tutorials to deepen your knowledge of LLMs.

Explore the articles](https://developer.ibm.com/technologies/large-language-models/)

[Report

IBM is named a Leader in Data Science & Machine Learning

Learn why IBM has been recognized as a Leader in the 2025 Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms.

Read the report](https://www.ibm.com/account/reg/signup?formid=urx-53728)

[Guide

The CEO’s guide to model optimization

Learn how to continually push teams to improve model performance and outpace the competition by using the latest AI techniques and infrastructure.

Read the guide](https://www.ibm.com/thought-leadership/institute-business-value/report/ceo-generative-ai/ceo-ai-model-optimization)

[Report

A differentiated approach to AI foundation models

Explore the value of enterprise-grade foundation models that
provide trust, performance and cost-effective benefits to
all industries.

Read the report](https://www.ibm.com/downloads/documents/us-en/107a02e94948f49f)

[Ebook

Unlock the power of generative AI and ML

Learn how to incorporate generative AI, machine learning and foundation models into your business operations for improved performance.

Read the ebook](https://www.ibm.com/account/reg/signup?formid=urx-52356)

[Report

AI in Action 2024

Read about 2,000 organizations we surveyed about their AI initiatives to discover what's working, what's not and how you can get ahead.

Read the report](https://www.ibm.com/account/reg/signup?formid=urx-53231)

Related solutions

IBM® watsonx Orchestrate™

Easily design scalable AI assistants and agents, automate repetitive tasks and simplify complex processes with IBM® watsonx Orchestrate™.

[Explore watsonx Orchestrate](https://www.ibm.com/products/watsonx-orchestrate)

Artificial intelligence solutions

Put AI to work in your business with IBM’s industry-leading AI expertise and portfolio of solutions at your side.

[Explore AI solutions](https://www.ibm.com/artificial-intelligence)

AI consulting and services

Reinvent critical workflows and operations by adding AI to maximize experiences, real-time decision-making and business value.

[Explore AI services](https://www.ibm.com/consulting/artificial-intelligence)

Take the next step

Whether you choose to customize pre-built apps and skills or build and deploy custom agentic services using an AI studio, the IBM watsonx platform has you covered.

[Explore watsonx Orchestrate](https://www.ibm.com/products/watsonx-orchestrate)

[Explore watsonx.ai](https://www.ibm.com/products/watsonx-ai/foundation-models)

##### Footnotes

1. [How large is Claude Pro's Context Window?](https://support.anthropic.com/en/articles/8606394-how-large-is-the-context-window-on-paid-claude-ai-plans), Anthropic, 2024

2. [Collective Constitutional AI: Aligning a Language Model with Public Input](https://www.anthropic.com/news/collective-constitutional-ai-aligning-a-language-model-with-public-input), Anthropic, 17 October 2023

3. [Introducing the next generation of Claude](https://www.anthropic.com/news/claude-3-family), Anthropic, 4 March 2024

4. [Hello GPT-4o](https://openai.com/index/hello-gpt-4o/), OpenAI, 13 May 2024

5. [How long do you store personal data?](https://privacy.anthropic.com/en/articles/10023548-how-long-do-you-store-my-data), Anthropic, 2024

6. [How Gemini for Google Cloud uses your data](https://cloud.google.com/gemini/docs/discover/data-governance), Google, 10 September 2024

7. [How your data is used to improve model performance](https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance), OpenAI, 17 September 2024

8. [Gemini Apps Privacy Hub](https://support.google.com/gemini/answer/13594961?hl=en#your_data), Google, 28 August 2024

### Build a Weather Tool

In [47]:
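import requests
from typing import Union

@tool
def get_weather(query: str) -> Union[dict, str]:
    """Search weatherapi to get the current weather."""
    base_url = "https://api.weatherapi.com/v1/current.json"
    # Let requests URL-encode the parameters (e.g. city names with spaces)
    params = {"key": WEATHER_API_KEY, "q": query}

    # A timeout keeps the tool from hanging if the API is unreachable
    response = requests.get(base_url, params=params, timeout=10)
    data = response.json()
    # A successful response contains a "location" block; error responses do not
    if data.get("location"):
        return data
    else:
        return "Weather Data Not Found"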
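
One detail of calling an HTTP API with a free-text query is URL encoding: place names can contain spaces or non-ASCII characters. A short sketch using only the standard library shows what an encoded query string looks like. The key `demo-key` is a placeholder, not a real credential.

```python
from urllib.parse import urlencode

base_url = "https://api.weatherapi.com/v1/current.json"

# Placeholder key for illustration only
params = {"key": "demo-key", "q": "New Delhi"}

# urlencode escapes characters that are not safe in a query string
complete_url = f"{base_url}?{urlencode(params)}"
print(complete_url)
# https://api.weatherapi.com/v1/current.json?key=demo-key&q=New+Delhi
```

Passing a `params` dictionary to `requests.get` applies equivalent encoding, which is usually simpler than building the URL by hand.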

In [48]:
get_weather.invoke("Hyderabad")

{'location': {'name': 'Hyderabad',
  'region': 'Telangana',
  'country': 'India',
  'lat': 17.3753,
  'lon': 78.4744,
  'tz_id': 'Asia/Kolkata',
  'localtime_epoch': 1764143793,
  'localtime': '2025-11-26 13:26'},
 'current': {'last_updated_epoch': 1764143100,
  'last_updated': '2025-11-26 13:15',
  'temp_c': 27.0,
  'temp_f': 80.6,
  'is_day': 1,
  'condition': {'text': 'Mist',
   'icon': '//cdn.weatherapi.com/weather/64x64/day/143.png',
   'code': 1030},
  'wind_mph': 8.9,
  'wind_kph': 14.4,
  'wind_degree': 89,
  'wind_dir': 'E',
  'pressure_mb': 1018.0,
  'pressure_in': 30.06,
  'precip_mm': 0.0,
  'precip_in': 0.0,
  'humidity': 48,
  'cloud': 0,
  'feelslike_c': 26.9,
  'feelslike_f': 80.3,
  'windchill_c': 27.4,
  'windchill_f': 81.3,
  'heatindex_c': 27.2,
  'heatindex_f': 81.0,
  'dewpoint_c': 12.4,
  'dewpoint_f': 54.4,
  'vis_km': 5.0,
  'vis_miles': 3.0,
  'uv': 6.7,
  'gust_mph': 10.3,
  'gust_kph': 16.6,
  'short_rad': 820.46,
  'diff_rad': 92.07,
  'dni': 1418.61,
  'gt

In [49]:
import rich

result = get_weather.invoke("Zurich")
rich.print_json(data=result)

### Building a Simple Math Tool

In [50]:
from pydantic import BaseModel, Field
from langchain_core.tools import StructuredTool

class CalculatorInput(BaseModel):
    a: float = Field(description="first number")
    b: float = Field(description="second number")


def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

# we could also use the @tool decorator from before
multiply = StructuredTool.from_function(
    func=multiply,
    name="multiply",
    description="use to multiply numbers",
    args_schema=CalculatorInput,
    return_direct=True
    )

# Let's inspect some of the attributes associated with the tool.
print(multiply.name)
print(multiply.description)
print(multiply.args)

multiply
use to multiply numbers
{'a': {'description': 'first number', 'title': 'A', 'type': 'number'}, 'b': {'description': 'second number', 'title': 'B', 'type': 'number'}}


In [51]:
multiply.invoke({"a": 2, "b": 3})

6.0

## 🧠 LLM Tool Calling with Custom Tools

- 🧰 Each tool has a **name**, **description**, and **input schema**, which help the model understand when and how to use it.

- 🚀 In this section, we’ll use the **custom tools** we built earlier and see if the LLM can:
  - Pick the right tool automatically  
  - Call it with the correct inputs  


### 💡 What Really Happens During Tool Calling

Tool calling doesn’t mean the model is *executing* code; it’s **deciding what should be executed**.

- The LLM’s job is to **produce structured arguments** that match a predefined schema (like `{"query": "latest AI research"}`).
- It’s essentially *planning* the action, not performing it.
- The actual tool execution happens outside the model, by the **agent or system** that interprets its output.
- Think of it this way:
  > The LLM *reasons*, the agent *acts*.

This separation keeps the model safe, predictable, and easy to control: it suggests actions, but never directly runs them.
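
The split between reasoning and acting can be sketched without any LLM at all. In the example below, `model_output` is a hand-written stand-in for what a model might emit, and `TOOLS` and `run_tool_call` are hypothetical names. The point is only that the model's contribution is data, and a separate piece of code decides whether and how to run it.

```python
import json


def multiply(a: float, b: float) -> float:
    return a * b


# Registry owned by the executor (the "agent"), not by the model
TOOLS = {"multiply": multiply}

# Hand-written stand-in for a model's structured tool call
model_output = '{"name": "multiply", "arguments": {"a": 23, "b": 34}}'


def run_tool_call(raw: str):
    """Parse a tool call emitted as JSON and execute it on the model's behalf."""
    call = json.loads(raw)
    func = TOOLS.get(call["name"])
    if func is None:
        # The executor can refuse anything it does not recognize
        raise ValueError(f"unknown tool: {call['name']}")
    return func(**call["arguments"])


print(run_tool_call(model_output))  # 782
```

Because execution goes through the registry, the executor is the place to add logging, permission checks, or a human approval step.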


In [53]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model="gpt-4o", temperature=0)

In [54]:
tools = [multiply, search_web_extract_info, get_weather]
chatgpt_with_tools = chatgpt.bind_tools(tools)

In [60]:
# LLMs are still not perfect in tool calling so you might need to play around with the following prompt
prompt = """
            Given only the tools at your disposal, mention tool calls for the following tasks:
            Do not change the query given for any search tasks
            1. What is 23 times 34
            2. What is the current weather in Delhi today
            3. What are the Agent types
         """

results = chatgpt_with_tools.invoke(prompt)

In [61]:
results


AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_0K8SbLmqrtFPZj05koUYWshx', 'function': {'arguments': '{"a": 23, "b": 34}', 'name': 'multiply'}, 'type': 'function'}, {'id': 'call_Is7n7KKxSQab8ZHpxwz7zg1E', 'function': {'arguments': '{"query": "Delhi"}', 'name': 'get_weather'}, 'type': 'function'}, {'id': 'call_N8ioyfeFfmvm23dCoMzA2O8Q', 'function': {'arguments': '{"query": "Agent types"}', 'name': 'search_web_extract_info'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 64, 'prompt_tokens': 172, 'total_tokens': 236, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_e819e3438b', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--5321839c-8017-4d08-8328-4adcf07c9b6d-0', tool_calls=[{'name': 'multip

In [62]:
results.tool_calls

[{'name': 'multiply',
  'args': {'a': 23, 'b': 34},
  'id': 'call_0K8SbLmqrtFPZj05koUYWshx',
  'type': 'tool_call'},
 {'name': 'get_weather',
  'args': {'query': 'Delhi'},
  'id': 'call_Is7n7KKxSQab8ZHpxwz7zg1E',
  'type': 'tool_call'},
 {'name': 'search_web_extract_info',
  'args': {'query': 'Agent types'},
  'id': 'call_N8ioyfeFfmvm23dCoMzA2O8Q',
  'type': 'tool_call'}]

In [63]:
multiply


StructuredTool(name='multiply', description='use to multiply numbers', args_schema=<class '__main__.CalculatorInput'>, return_direct=True, func=<function multiply at 0x786b0bf34e00>)

In [66]:
toolkit = {
    "multiply": multiply,
    "search_web_extract_info": search_web_extract_info,
    "get_weather": get_weather
}

for tool_call in results.tool_calls:
    selected_tool = toolkit[tool_call["name"].lower()]
    print(f"Calling tool: {tool_call['name']}")
    tool_output = selected_tool.invoke(tool_call["args"])
    print(tool_output)
    print()

Calling tool: multiply
782.0

Calling tool: get_weather
{'location': {'name': 'Delhi', 'region': 'Ontario', 'country': 'Canada', 'lat': 42.85, 'lon': -80.5, 'tz_id': 'America/Toronto', 'localtime_epoch': 1764145105, 'localtime': '2025-11-26 03:18'}, 'current': {'last_updated_epoch': 1764144900, 'last_updated': '2025-11-26 03:15', 'temp_c': 10.1, 'temp_f': 50.2, 'is_day': 0, 'condition': {'text': 'Mist', 'icon': '//cdn.weatherapi.com/weather/64x64/night/143.png', 'code': 1030}, 'wind_mph': 12.3, 'wind_kph': 19.8, 'wind_degree': 238, 'wind_dir': 'WSW', 'pressure_mb': 1006.0, 'pressure_in': 29.72, 'precip_mm': 0.0, 'precip_in': 0.0, 'humidity': 94, 'cloud': 0, 'feelslike_c': 7.5, 'feelslike_f': 45.5, 'windchill_c': 7.8, 'windchill_f': 46.1, 'heatindex_c': 9.5, 'heatindex_f': 49.0, 'dewpoint_c': 9.0, 'dewpoint_f': 48.3, 'vis_km': 10.0, 'vis_miles': 6.0, 'uv': 0.0, 'gust_mph': 19.5, 'gust_kph': 31.4, 'short_rad': 0, 'diff_rad': 0, 'dni': 0, 'gti': 0}}

Calling tool: search_web_extract_info


100%|██████████| 5/5 [00:02<00:00,  2.15it/s]

['What Is an Agent? Definition, Types, Responsibilities, and Future\nLatest News\n\n### [AI Receptionist for Business: A Complete Guide for 2025](https://devpumas.com/ai-receptionist-for-business/)\n\n### [Outsource Software Development Guide for Business Owners 2025](https://devpumas.com/outsource-software-development/)\n\n### [Web Design 2025: The Ultimate Guide to Stunning, High-Converting Websites](https://devpumas.com/web-design/)\n\n### [E-Commerce Workflow Automation: Everything You Need to Know in 2025](https://devpumas.com/e-commerce-workflow-automation/)\n\n### [Web Application Development: A Complete Beginner’s Guide in 2025](https://devpumas.com/web-application-development/)\n\n### [The benefits of using DevPumas ActionFigure for Social Media Creators](https://devpumas.com/devpumas-actionfigure-for-social-media/)\n\n### [DevPumas ActionFigure: Transform Your Photos into Hyper-Realistic Action Figures](https://devpumas.com/devpumas-actionfigure/)\n\n### [How to Use Web Scr
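
In a full agent loop, each tool result is sent back to the model tagged with the `id` of the call that requested it. The sketch below shows one way that step could look, under stated assumptions: the two stand-in functions replace the notebook's real tools (the weather stub returns a placeholder, not real data), the third call is a made-up failure case, and the `role` / `tool_call_id` / `content` message shape follows the OpenAI chat format rather than anything defined in this notebook.

```python
import json

# Stand-in tools (assumptions): plain functions instead of the notebook's StructuredTool objects
def multiply(a: float, b: float) -> float:
    return float(a * b)

def get_weather(query: str) -> dict:
    return {"location": query, "note": "placeholder, no API call made"}

toolkit = {"multiply": multiply, "get_weather": get_weather}

tool_calls = [
    {"name": "multiply", "args": {"a": 23, "b": 34}, "id": "call_0K8SbLmqrtFPZj05koUYWshx"},
    {"name": "get_weather", "args": {"query": "Delhi"}, "id": "call_Is7n7KKxSQab8ZHpxwz7zg1E"},
    # Hypothetical call to a tool that is not in the toolkit
    {"name": "send_email", "args": {}, "id": "call_hypothetical"},
]

def run_tool_calls(calls, toolkit):
    """Execute each requested call and wrap the result as a tool message."""
    messages = []
    for call in calls:
        tool = toolkit.get(call["name"].lower())
        if tool is None:
            content = f"Error: unknown tool '{call['name']}'"
        else:
            try:
                content = json.dumps(tool(**call["args"]))
            except Exception as exc:
                # Report the failure back to the model instead of crashing the loop
                content = f"Error: {exc}"
        messages.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return messages

tool_messages = run_tool_calls(tool_calls, toolkit)
for message in tool_messages:
    print(message)
```

Returning an error string for an unknown tool, rather than raising `KeyError` as a bare dictionary lookup would, gives the model a chance to correct itself on the next turn.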




In [67]:

tools


[StructuredTool(name='multiply', description='use to multiply numbers', args_schema=<class '__main__.CalculatorInput'>, return_direct=True, func=<function multiply at 0x786b0bf34e00>),
 StructuredTool(name='search_web_extract_info', description='Search the web for a query and extracts useful information from the search links', args_schema=<class 'langchain_core.utils.pydantic.search_web_extract_info'>, func=<function search_web_extract_info at 0x786b10578180>),
 StructuredTool(name='get_weather', description='Search weatherapi to get the current weather.', args_schema=<class 'langchain_core.utils.pydantic.get_weather'>, func=<function get_weather at 0x786b10578d60>)]

### Tool calling for LLMs without native support for tool or function calling

### ⚙️ Fine-Tuned vs. Prompt-Guided Tool Calling

Some models, such as the GPT models behind **ChatGPT**, are *fine-tuned* to understand tool schemas and to emit structured tool calls natively through dedicated API support.  
They treat tools as first-class citizens and request them reliably.

But what if your model isn't fine-tuned for tool calling?

You can still achieve similar behavior with **prompt engineering**:  
instructions in the prompt guide the model to *simulate* tool selection and argument generation in its text output.

In essence:
> Fine-tuned models *know* how to call tools.  
> Non-fine-tuned models can be *taught* to call tools, given the right prompt.


In [68]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import render_text_description

rendered_tools = render_text_description(tools)
print(rendered_tools)

multiply(a: float, b: float) -> float - use to multiply numbers
search_web_extract_info(query: str) -> list - Search the web for a query and extracts useful information from the search links
get_weather(query: str) -> list - Search weatherapi to get the current weather.
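
Each rendered line has the form `name(signature) - description`. To make that format concrete, here is a rough standard-library approximation built on `inspect`. The helper `render_tools_as_text` and the stub functions are hypothetical; this is not how `render_text_description` is implemented, only a sketch that produces lines of the same shape.

```python
import inspect

def multiply(a: float, b: float) -> float:
    """use to multiply numbers"""
    return a * b

def get_weather(query: str) -> list:
    """Search weatherapi to get the current weather."""
    return []  # stub: no API call is made in this sketch

def render_tools_as_text(functions):
    """Hypothetical helper: one 'name(signature) - description' line per function."""
    lines = []
    for fn in functions:
        signature = inspect.signature(fn)       # parameters and return annotation
        description = inspect.getdoc(fn) or ""  # docstring serves as the description
        lines.append(f"{fn.__name__}{signature} - {description}")
    return "\n".join(lines)

rendered = render_tools_as_text([multiply, get_weather])
print(rendered)
```

The point of this text form is that it can be pasted into any prompt, so even a model with no tool-calling API sees the tool names, argument names, and types.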


In [69]:
system_prompt = f"""\
You are an assistant that has access to the following set of tools.
Here are the names and descriptions for each tool:

{rendered_tools}

Given the user instructions, for each instruction do the following:
 - Return the name and input of the tool to use.
 - Return your response as a JSON blob with 'name' and 'arguments' keys.
 - The `arguments` should be a dictionary, with keys corresponding
   to the argument names and the values corresponding to the requested values.
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("user", "{input}")
    ]
)
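
One caveat with this pattern: `ChatPromptTemplate` treats `{name}` in its message strings as a template variable, so text interpolated by the f-string (here `rendered_tools`) would break the template if it ever contained literal curly braces, for example a JSON snippet inside a tool description. The sketch below illustrates the problem and the usual fix of doubling the braces. It uses plain `str.format` as a stand-in for the template engine, and both `escape_braces` and the `lookup` tool description are hypothetical.

```python
def escape_braces(text: str) -> str:
    """Hypothetical helper: double curly braces so they survive str.format-style templating."""
    return text.replace("{", "{{").replace("}", "}}")

# A made-up tool description that contains a literal JSON example
tool_text = 'lookup(query: str) -> dict - returns JSON such as {"status": "ok"}'

# {{input}} inside an f-string yields the literal placeholder {input}
unsafe_template = f"Tools:\n{tool_text}\n\nUser question: {{input}}"
safe_template = f"Tools:\n{escape_braces(tool_text)}\n\nUser question: {{input}}"

try:
    unsafe_template.format(input="What is 2.1 times 3.5")
    unsafe_failed = False
except (KeyError, ValueError, IndexError):
    # The braces in the tool text were read as a template variable
    unsafe_failed = True

filled = safe_template.format(input="What is 2.1 times 3.5")
print("unsafe template failed:", unsafe_failed)
print(filled)
```

The tool descriptions used in this notebook contain no braces, so the prompt above works as written; the escape only matters once descriptions start to include JSON or similar text.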

In [70]:
instructions = [
    {"input": "What is 2.1 times 3.5"},
    {"input": "What is the current weather in Greenland"},
    {"input": "Tell me about the current state of Agentic AI in the industry"},
]

In [71]:
from langchain_core.output_parsers import JsonOutputParser

# Pipeline: prompt -> model -> JSON parser (turns the model's text reply into a dict)
chain = prompt | chatgpt | JsonOutputParser()

In [72]:
responses = chain.map().invoke(instructions)

In [73]:
responses


[{'name': 'multiply', 'arguments': {'a': 2.1, 'b': 3.5}},
 {'name': 'get_weather', 'arguments': {'query': 'Greenland'}},
 {'name': 'search_web_extract_info',
  'arguments': {'query': 'current state of Agentic AI in the industry 2023'}}]
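
Because a prompt-guided model returns free-form text, nothing guarantees that the parsed JSON names a real tool or supplies the right argument names. A minimal validation sketch follows, assuming a hand-written `EXPECTED_ARGS` table (hypothetical; in the notebook this information lives in each tool's `args_schema`) and a hypothetical `validate_tool_call` helper. The last two sample responses are made-up failure cases, not model output.

```python
# Expected argument names per tool (assumption: mirrors the rendered tool signatures)
EXPECTED_ARGS = {
    "multiply": {"a", "b"},
    "search_web_extract_info": {"query"},
    "get_weather": {"query"},
}

def validate_tool_call(call):
    """Return a list of problems found in one parsed tool call (empty if it looks valid)."""
    if not isinstance(call, dict):
        return ["response is not a JSON object"]
    problems = []
    name = call.get("name")
    arguments = call.get("arguments")
    known_tool = isinstance(name, str) and name in EXPECTED_ARGS
    if not known_tool:
        problems.append(f"unknown tool: {name!r}")
    if not isinstance(arguments, dict):
        problems.append("'arguments' must be a dictionary")
    elif known_tool and set(arguments) != EXPECTED_ARGS[name]:
        problems.append(
            f"expected arguments {sorted(EXPECTED_ARGS[name])}, got {sorted(arguments)}"
        )
    return problems

responses = [
    {"name": "multiply", "arguments": {"a": 2.1, "b": 3.5}},
    {"name": "get_weather", "arguments": {"query": "Greenland"}},
    {"name": "get_weather", "arguments": {"city": "Nuuk"}},  # made-up: wrong argument name
    {"name": "book_flight", "arguments": {}},                 # made-up: tool does not exist
]

for response in responses:
    print(response["name"], "->", validate_tool_call(response) or "ok")
```

Running a check like this before execution keeps a malformed reply from reaching a real API, and the problem list can be fed back to the model as a correction prompt.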

In [74]:
toolkit = {
    "multiply": multiply,
    "search_web_extract_info": search_web_extract_info,
    "get_weather": get_weather
}

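# Same execution loop as before, but the parsed JSON uses the key "arguments"
# (as requested in our system prompt) instead of LangChain's "args".
for tool_call in responses:
    selected_tool = toolkit[tool_call["name"].lower()]
    print(f"Calling tool: {tool_call['name']}")
    tool_output = selected_tool.invoke(tool_call["arguments"])
    print(tool_output)
    print()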

Calling tool: multiply
7.3500000000000005

Calling tool: get_weather
{'location': {'name': 'Nuuk', 'region': 'Vestgronland', 'country': 'Greenland', 'lat': 64.183, 'lon': -51.75, 'tz_id': 'America/Nuuk', 'localtime_epoch': 1764145439, 'localtime': '2025-11-26 06:23'}, 'current': {'last_updated_epoch': 1764144900, 'last_updated': '2025-11-26 06:15', 'temp_c': 0.2, 'temp_f': 32.4, 'is_day': 0, 'condition': {'text': 'Partly cloudy', 'icon': '//cdn.weatherapi.com/weather/64x64/night/116.png', 'code': 1003}, 'wind_mph': 8.7, 'wind_kph': 14.0, 'wind_degree': 58, 'wind_dir': 'ENE', 'pressure_mb': 1000.0, 'pressure_in': 29.53, 'precip_mm': 0.0, 'precip_in': 0.0, 'humidity': 69, 'cloud': 50, 'feelslike_c': -4.0, 'feelslike_f': 24.8, 'windchill_c': -5.5, 'windchill_f': 22.0, 'heatindex_c': -1.1, 'heatindex_f': 30.1, 'dewpoint_c': -7.0, 'dewpoint_f': 19.5, 'vis_km': 10.0, 'vis_miles': 6.0, 'uv': 0.0, 'gust_mph': 12.9, 'gust_kph': 20.8, 'short_rad': 0, 'diff_rad': 0, 'dni': 0, 'gti': 0}}

Calling tool: search_web_extract_info

 40%|████      | 2/5 [00:01<00:02,  1.45it/s]

Extraction blocked for url:  https://www.wisdomtree.com/investments/blog/2025/04/21/agentic-ai-the-new-frontier-of-intelligence-that-acts


100%|██████████| 5/5 [00:02<00:00,  2.47it/s]

Extraction blocked for url:  https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027
["Agentic AI Trends 2025: Transform Business with AI Agents\n[![logo](/_next/static/media/logo.17192f99.svg)](/)\n\n[All Blogs](/blog)ProductivityCollaborationAINews\n\n[Talk to an Expert](https://cta-service-cms2.hubspot.com/web-interactives/public/v1/track/click?encryptedPayload=AVxigLKXmAiFtgXv71OwJKmu1CMJ1JFfaqq9Sz6UpjyCCmABXv9KQxUFMleG8bwAacj7QDAoXF8oNhS9QMFLRHjOihBTGq3kVMWmCzVxMV4w4BUBRd%2BnfgGh0azwDH14iyWnCMeaJBCq1f1rellPxWs%2FFznebBk4Vgcs%2BiJRiVGoUmg=&portalId=44494863)\n\n[Login](https://cta-service-cms2.hubspot.com/web-interactives/public/v1/track/click?encryptedPayload=AVxigLIhtw1u4dvQZ20Pz%2BQOuNvEN2wGCz%2F4ZWC9tCt1Mp%2BCXuE9134m8oROzmBilfbAjAr%2FZ5EfF2tgGLLV9W67hvNBlSO0o%2Bah6au9r4txIEzR4s7kUJ2MhyJUYpX1fd0yGebto%2FkYkCx5GsKkB1eWASmRLTPtM3rA9COgMjhCazyV&portalId=44494863)\n\n[Sign Up](https://cta-




In [75]:
for doc in tool_output:
    print(doc)
    print()

Agentic AI Trends 2025: Transform Business with AI Agents
[![logo](/_next/static/media/logo.17192f99.svg)](/)

[All Blogs](/blog)ProductivityCollaborationAINews

[Talk to an Expert](https://cta-service-cms2.hubspot.com/web-interactives/public/v1/track/click?encryptedPayload=AVxigLKXmAiFtgXv71OwJKmu1CMJ1JFfaqq9Sz6UpjyCCmABXv9KQxUFMleG8bwAacj7QDAoXF8oNhS9QMFLRHjOihBTGq3kVMWmCzVxMV4w4BUBRd%2BnfgGh0azwDH14iyWnCMeaJBCq1f1rellPxWs%2FFznebBk4Vgcs%2BiJRiVGoUmg=&portalId=44494863)

[Login](https://cta-service-cms2.hubspot.com/web-interactives/public/v1/track/click?encryptedPayload=AVxigLIhtw1u4dvQZ20Pz%2BQOuNvEN2wGCz%2F4ZWC9tCt1Mp%2BCXuE9134m8oROzmBilfbAjAr%2FZ5EfF2tgGLLV9W67hvNBlSO0o%2Bah6au9r4txIEzR4s7kUJ2MhyJUYpX1fd0yGebto%2FkYkCx5GsKkB1eWASmRLTPtM3rA9COgMjhCazyV&portalId=44494863)

[Sign Up](https://cta-service-cms2.hubspot.com/web-interactives/public/v1/track/click?encryptedPayload=AVxigLJuAq9xeWhJk8yPHtahUTu8pZzCPduO5mg3ZLtyb74cksLAjwjeE4dMs1xIXFDPoa%2F5AmxlUfiDsxhConmhC3G35v%2BABCZf1uGsm