# L10 — Tagging & Extraction (Modern LangChain Tools)

This notebook ports a 2023-era *function-calling* example to **current (2025/2026) practice**
using **LangChain tools** and **`bind_tools`**.

What this notebook teaches:
- Why `functions=` / `function_call=` are legacy
- How tools define **structured output contracts**
- How models emit structured data via `tool_calls`
- How this pattern scales to agents and LangGraph

Environment assumptions:
- Python 3.13
- langchain / langchain-core / langchain-openai (v1)
- gpt-5-mini


In [1]:
import os

from pydantic import BaseModel, Field

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import SystemMessage, HumanMessage, ToolMessage


In [2]:
MODEL = os.environ.get("OPENAI_MODEL", "gpt-5-mini")

llm = ChatOpenAI(
    model=MODEL,
    temperature=0,
)

print("Using model:", MODEL)


Using model: gpt-5-mini


## Why tools replace function calling

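Older notebooks relied on the OpenAI-specific `functions` and `function_call` request parameters, which OpenAI has since deprecated in favor of `tools` and `tool_choice`.
Modern LangChain exposes **tools** as first-class primitives that any tool-calling chat model can accept through `bind_tools`.

Tools:
- define the *shape* of valid output (name, description, argument schema)
- let the model choose when to emit structured data
- work uniformly across chat models, agents, and graphs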
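
To make the difference concrete, the sketch below contrasts the two payload shapes using plain dictionaries, with no API call. The legacy shape follows OpenAI's `function_call` field, where `arguments` is a JSON string; the modern shape follows the `tool_calls` entries printed later in this notebook. The helper `legacy_to_tool_call` and the id `"call_example"` are hypothetical and exist only for illustration.

```python
import json

# Legacy (2023): a single function call whose arguments arrive as a JSON *string*
legacy_function_call = {
    "name": "tag_text",
    "arguments": '{"sentiment": "neg", "language": "it"}',
}


def legacy_to_tool_call(function_call: dict, call_id: str) -> dict:
    """Hypothetical helper: reshape a legacy function_call into a tool_calls-style entry."""
    return {
        "name": function_call["name"],
        # In the modern shape the arguments are an already-parsed dict
        "args": json.loads(function_call["arguments"]),
        # Tool calls carry an id so a later result can be matched to the call
        "id": call_id,
        "type": "tool_call",
    }


tool_call = legacy_to_tool_call(legacy_function_call, call_id="call_example")
print(tool_call)
```

The `id` field is the practical difference: it lets one model turn carry several calls, each answered by its own `ToolMessage`.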


## Tagging schema

In [3]:
class TaggingResult(BaseModel):
    """Structured tagging output."""
    sentiment: str = Field(description="Sentiment label, e.g. pos / neg / neutral")
    language: str = Field(description="ISO 639-1 language code, e.g. en, it, fr")


## Tagging tool

In [4]:
@tool(
    description="Tag text with sentiment and language.",
    args_schema=TaggingResult,
)
def tag_text(sentiment: str, language: str) -> TaggingResult:
    # The model does the reasoning; the tool enforces structure.
    return TaggingResult(sentiment=sentiment, language=language)


## Bind tool to the model

In [5]:
model_with_tools = llm.bind_tools([tag_text])


## Example input

In [6]:
text = "non mi piace questo cibo"


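## Manual tool execution loop

The cells below perform by hand the steps that agent runtimes automate: the model emits a tool call, your code executes the tool, and the result goes back to the model as a `ToolMessage`.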


In [7]:
messages = [
    SystemMessage(content="Tag the user's text with sentiment and language."),
    HumanMessage(content=text),
]

ai_msg = model_with_tools.invoke(messages)

print("AI message:")
print(ai_msg)


AI message:
content='' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 220, 'prompt_tokens': 166, 'total_tokens': 386, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 192, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-5-mini-2025-08-07', 'system_fingerprint': None, 'id': 'chatcmpl-CuHo750m8vRYC4da7vj0Fc19Qllov', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None} id='lc_run--019b890c-792a-73a2-9582-96702887c0fb-0' tool_calls=[{'name': 'tag_text', 'args': {'sentiment': 'neg', 'language': 'it'}, 'id': 'call_W5r1p2JmTz0r2J8uYYyZaJbo', 'type': 'tool_call'}] usage_metadata={'input_tokens': 166, 'output_tokens': 220, 'total_tokens': 386, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 192}}


In [8]:
tool_messages = []

for call in ai_msg.tool_calls or []:
    print("\nTool call:", call)

    result = tag_text.invoke(call["args"])
    tool_messages.append(
        ToolMessage(
            content=result.model_dump_json(),
            tool_call_id=call["id"],
        )
    )



Tool call: {'name': 'tag_text', 'args': {'sentiment': 'neg', 'language': 'it'}, 'id': 'call_W5r1p2JmTz0r2J8uYYyZaJbo', 'type': 'tool_call'}
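
The loop above calls `tag_text` directly because only one tool is bound. With several tools, a common pattern is to dispatch on `call["name"]`. The sketch below shows that dispatch with plain Python functions standing in for tools, so it runs without a model. `TOOL_REGISTRY`, `run_tool_calls`, `tag_text_stub`, and `count_words` are hypothetical names, and the result dicts only mirror the `content` and `tool_call_id` fields of `ToolMessage`.

```python
import json


def tag_text_stub(sentiment: str, language: str) -> dict:
    """Stand-in for the real tool: echo the arguments back."""
    return {"sentiment": sentiment, "language": language}


def count_words(text: str) -> dict:
    """Second hypothetical tool, included only to make the registry plural."""
    return {"words": len(text.split())}


TOOL_REGISTRY = {"tag_text": tag_text_stub, "count_words": count_words}


def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Run each call by name and pair its result with the originating call id."""
    results = []
    for call in tool_calls:
        fn = TOOL_REGISTRY.get(call["name"])
        if fn is None:
            # Report unknown tools back to the caller instead of crashing the loop
            content = json.dumps({"error": f"unknown tool: {call['name']}"})
        else:
            content = json.dumps(fn(**call["args"]))
        results.append({"content": content, "tool_call_id": call["id"]})
    return results


calls = [
    {"name": "tag_text", "args": {"sentiment": "neg", "language": "it"}, "id": "call_1", "type": "tool_call"},
    {"name": "missing_tool", "args": {}, "id": "call_2", "type": "tool_call"},
]
tool_results = run_tool_calls(calls)
for item in tool_results:
    print(item)
```

With real tools, `fn(**call["args"])` would become `tool.invoke(call["args"])` and each result dict would become a `ToolMessage`, as in the cell above.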


In [9]:
final = model_with_tools.invoke(messages + [ai_msg] + tool_messages)

print("\nFinal response:")
print(final.content)



Final response:
{"sentiment":"neg","language":"it"}


## What to notice

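- The model *chose* to call the tool; nothing forced it (`finish_reason` was `tool_calls`)
- `content` was empty on the message that carried the tool call
- The structured data lived in `tool_calls[0]["args"]`
- The final message simply restated the tool output as JSON

For pure tagging, the arguments in `tool_calls` are already the result, so the second model call is optional.
Reading `tool_calls` is the modern counterpart of parsing a function call with `JsonOutputFunctionsParser`.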


## Key takeaways

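- `tools` / `tool_choice` replace the legacy OpenAI `functions` / `function_call` parameters
- Pydantic schemas define output contracts
- `bind_tools` is the standard integration point
- The same exchange (`AIMessage.tool_calls`, then `ToolMessage`) underlies agents and LangGraph, so the pattern carries over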


## Modern equivalent of `JsonOutputFunctionsParser`: extract tool arguments from a chain

In the legacy (2023) approach, you could do:

```python
tagging_chain = prompt | model_with_functions | JsonOutputFunctionsParser()
```

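That returned the parsed JSON **arguments** of the forced function call.

In the modern tools approach, the structured payload lives in `AIMessage.tool_calls`.
Below we build a **chain** that returns the arguments dict of the first tool call (similar to `JsonOutputFunctionsParser`).
Unlike the legacy chain, `bind_tools` does not force the call by default, so the parser raises if the model answers in plain text; passing `tool_choice` to `bind_tools` is one way to require the tool.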


In [10]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda

# Prompt (LCEL)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Think carefully, and then tag the text with sentiment and language."),
    ("user", "{input}"),
])

# Model bound to tools
model_with_tools_for_chain = llm.bind_tools([tag_text])

def tool_args_parser(ai_msg):
    """Return the first tool-call arguments dict (like JsonOutputFunctionsParser did for functions)."""
    calls = getattr(ai_msg, "tool_calls", None) or []
    if not calls:
        raise ValueError(f"Model did not call a tool. Message was: {ai_msg}")
    return calls[0]["args"]

# Runnable parser (modern equivalent)
tool_args_parser_runnable = RunnableLambda(tool_args_parser)

# Chain: prompt -> model(with tools) -> parse tool args
tagging_chain_tools = prompt | model_with_tools_for_chain | tool_args_parser_runnable


In [11]:
tagging_chain_tools.invoke({"input": "non mi piace questo cibo"})


{'sentiment': 'neg', 'language': 'it'}

### Notes

- This chain returns only the **structured arguments** (e.g., `{"sentiment": "...", "language": "..."}`),
  just like `JsonOutputFunctionsParser` did.
- If you want the full `AIMessage` (including tool call ids, etc.), run `prompt | model_with_tools_for_chain` without the parser.
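
Because the chain returns a plain dict, nothing checks that `sentiment` is one of the expected labels or that `language` looks like a two-letter code. A minimal sketch of a post-processing step follows. `ALLOWED_SENTIMENTS` and `validate_tags` are hypothetical, and the label set is an assumption taken from the field description (`pos`, `neg`, `neutral`).

```python
ALLOWED_SENTIMENTS = {"pos", "neg", "neutral"}  # assumed from the schema description


def validate_tags(args: dict) -> dict:
    """Normalize a tagging dict and raise ValueError on unexpected values."""
    sentiment = str(args.get("sentiment", "")).strip().lower()
    language = str(args.get("language", "")).strip().lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"Unexpected sentiment label: {sentiment!r}")
    # Shape check only: this does not verify that the code is a real language code
    if len(language) != 2 or not language.isalpha():
        raise ValueError(f"Expected a two-letter language code, got: {language!r}")
    return {"sentiment": sentiment, "language": language}


print(validate_tags({"sentiment": "Neg", "language": "IT"}))
```

Wrapped in a `RunnableLambda`, a function like this could be appended to the chain after the parser.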


## When to use `with_structured_output` (no tools)

If you only need **structured extraction/classification** (no external actions),
`with_structured_output` is often simpler than tools.

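Given a Pydantic class, it returns a validated **instance of that class** directly, with no manual tool loop and no tool-call parsing.
Depending on the provider and the `method` argument, it may still rely on tool calling or a JSON-schema mode internally.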


In [12]:
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate

class SentimentLanguage(BaseModel):
    sentiment: str = Field(description="Sentiment of the text: pos, neg, or neutral")
    language: str = Field(description="ISO 639-1 language code, e.g. en, it, fr")

prompt_structured = ChatPromptTemplate.from_messages([
    ("system", "Analyze the user's text and extract sentiment and language."),
    ("user", "{input}"),
])

structured_llm = llm.with_structured_output(SentimentLanguage)

tagging_chain_structured = prompt_structured | structured_llm


In [13]:
result = tagging_chain_structured.invoke({"input": "non mi piace questo cibo"})
print(result)
print(type(result))
print(result.model_dump())


sentiment='neg' language='it'
<class '__main__.SentimentLanguage'>
{'sentiment': 'neg', 'language': 'it'}


### Key differences vs tools

- **Tools + bind_tools**: best when you need the model to **trigger actions** (APIs, DB, functions) or do multi-step work.
- **with_structured_output**: best when you just want **reliable structured data** from the model.
