# Notebook 1 (Industrial Edition): High-Performance Parallel Tool Use

## Introduction: From Theory to Production-Grade Efficiency

This notebook elevates the concept of **Parallel Tool Use** from a simple demonstration to a production-oriented implementation. We are moving beyond mock tools to interact with **real-world, live APIs**. Our focus will be on building a robust, observable, and high-performance agent that mirrors the standards of an industrial application.

### The Core Problem in Large-Scale Systems

Large-scale agentic systems are not just about intelligence; they are about performance. A system that takes 10 seconds to answer a user's query is too slow for most interactive applications. The primary bottleneck is almost always I/O latency: waiting for networks, databases, and external APIs. This notebook tackles this problem head-on by implementing a parallel tool-calling architecture.

### What You Will Learn

1.  **Real-World Tool Integration:** How to wrap live APIs (`yfinance` for stocks, `Tavily` for news) into LangChain tools.
2.  **Intensive Instrumentation:** How to measure and log the performance (execution time, statistics) of each component in your agentic workflow.
3.  **Verbose State Tracking:** How to inspect the `State` of your graph at every step to deeply understand the data flow and decision-making process.
4.  **Parallel Execution Analysis:** How to quantify the performance gains of a parallel design compared to a sequential one.

This implementation directly applies the concepts from the LangGraph blogs, using a `StateGraph` to manage a message-based conversational state (a custom `AgentState`, in the spirit of `MessagesState`) and conditional edges to route logic between a powerful LLM and a tool execution engine.

## Part 1: Setup and Environment

We begin by installing the required libraries. This time, we include `yfinance` for stock data and `tavily-python` for the search API.

In [None]:
%pip install -U langchain langchain-community langgraph langsmith langchain-huggingface transformers accelerate bitsandbytes torch yfinance tavily-python

### 1.2: API Keys and Environment Configuration

We need three keys for this notebook:
- **LangSmith API Key:** For tracing and debugging. Absolutely essential for production systems.
- **Hugging Face Token:** To download the Llama 3 model.
- **Tavily API Key:** For our real-world news and search tool.

Get your keys here:
- LangSmith: [smith.langchain.com](https://smith.langchain.com)
- Hugging Face: [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
- Tavily: [app.tavily.com](https://app.tavily.com)

In [None]:
import os
import getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("LANGCHAIN_API_KEY")
_set_env("HUGGING_FACE_HUB_TOKEN")
_set_env("TAVILY_API_KEY")

# Configure LangSmith for tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Industrial - Parallel Tool Use"

## Part 2: Defining Production-Grade Components

Here, we define our LLM and integrate our live API tools.

### 2.1: The Language Model (LLM)

We'll use `meta-llama/Meta-Llama-3-8B-Instruct` as our agent's brain. Loading with 4-bit quantization via `bitsandbytes` (`load_in_4bit=True`) is a crucial technique for running large models on consumer hardware. `device_map="auto"` intelligently distributes the model across available GPUs and CPU memory.

In [None]:
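from langchain_huggingface import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Pass the 4-bit settings through BitsAndBytesConfig; recent transformers
# releases deprecate the bare `load_in_4bit=True` keyword on `from_pretrained`.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config,
)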

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=2048,
    do_sample=False,
    repetition_penalty=1.1
)

llm = HuggingFacePipeline(pipeline=pipe)

print("LLM Initialized. Ready to be the brain of our operation.")

LLM Initialized. Ready to be the brain of our operation.
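
To see why 4-bit loading matters, a rough back-of-the-envelope estimate of weight memory is useful. The sketch below assumes roughly 8 billion parameters and counts weights only; it ignores activations, the KV cache, and quantization overhead such as scaling constants, so real usage will be higher. `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # assumption: roughly 8 billion parameters

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 2 bytes per weight
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # half a byte per weight

print(f"bfloat16 weights: ~{bf16_gb:.0f} GB")
print(f"4-bit weights:    ~{int4_gb:.0f} GB")
```

Under these assumptions the weights shrink by a factor of four, which is what brings a model of this size within reach of a single consumer GPU.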


### 2.2: The Real-World Tools

This is where our implementation becomes real. We'll define two tools that call live APIs.

#### 2.2.1: Stock Price Tool (`yfinance`)

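We use the `yfinance` library to get current stock data from Yahoo Finance (quotes may be slightly delayed depending on the exchange). The docstring of the function decorated with `@tool` is critical: it becomes the tool description, the instruction manual the LLM uses to understand how to use the tool.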

In [None]:
from typing import Union

from langchain_core.tools import tool
import yfinance as yf

@tool
def get_stock_price(symbol: str) -> Union[float, str]:
    """Get the current stock price for a given stock symbol using Yahoo Finance."""
    print(f"--- [Tool Call] Executing get_stock_price for symbol: {symbol} ---")
    ticker = yf.Ticker(symbol)
    # Read the info dict once and reuse it for both lookups
    info = ticker.info
    # Prefer 'regularMarketPrice'; fall back to 'currentPrice' if it is missing
    price = info.get('regularMarketPrice')
    if price is None:
        price = info.get('currentPrice')
    if price is None:
        # Return a readable message so the LLM can report the failure
        return f"Could not find price for symbol {symbol}"
    return price

Let's test the `get_stock_price` tool to see its live output.

In [None]:
get_stock_price.invoke({"symbol": "NVDA"})

--- [Tool Call] Executing get_stock_price for symbol: NVDA ---


121.79
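
To make the role of the docstring concrete, here is a minimal standard-library sketch of the kind of description a tool wrapper can derive from a plain function. This is a simplified illustration, not LangChain's actual implementation; `describe_tool` and `get_stock_price_stub` are hypothetical names, and the stub makes no network call.

```python
import inspect
from typing import Any, Callable, Dict


def describe_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a simplified tool description from a function's signature and docstring."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        # The docstring is what a model reads to decide when to call the tool
        "description": inspect.getdoc(func) or "",
        # Map each parameter name to the name of its annotated type
        "parameters": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


def get_stock_price_stub(symbol: str) -> float:
    """Get the current stock price for a given stock symbol."""
    return 0.0  # placeholder value; no network call is made here


spec = describe_tool(get_stock_price_stub)
print(spec)
```

A vague or missing docstring leaves the `description` field empty or unhelpful, which is why the wording of each tool's docstring deserves care.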

#### 2.2.2: Company News Tool (`Tavily`)

We'll use the `TavilySearchResults` tool, a powerful search API optimized for LLMs. We'll wrap it in our own `@tool` function to provide a more specific docstring, guiding the LLM to use it specifically for finding recent news.

In [None]:
from langchain_community.tools.tavily_search import TavilySearchResults

# Initialize the Tavily search tool.
# max_results=5 means it will return the top 5 search results.
tavily_search = TavilySearchResults(max_results=5)

@tool
def get_recent_company_news(company_name: str) -> list:
    """Get recent news articles and summaries for a given company name using the Tavily search engine."""
    print(f"--- [Tool Call] Executing get_recent_company_news for: {company_name} ---")
    query = f"latest news about {company_name}"
    return tavily_search.invoke(query)

Let's test the news tool.

In [None]:
get_recent_company_news.invoke({"company_name": "NVIDIA"})

--- [Tool Call] Executing get_recent_company_news for: NVIDIA ---


[{'url': 'https://www.reuters.com/technology/nvidia-briefly-surpasses-microsoft-most-valuable-company-2024-06-18/', 'content': 'Nvidia briefly overtakes Microsoft as most valuable company. June 18 (Reuters) - Nvidia (NVDA.O) on Tuesday became the world\'s most valuable company, dethroning ...'}, {'url': '...', 'content': '...'}, ...]

### 2.3: Binding Tools and Creating the Executor

We now collect our tools and bind them to the LLM. The `ToolExecutor` is a LangGraph utility that efficiently calls the requested tools.

In [None]:
from langgraph.prebuilt import ToolExecutor

tools = [get_stock_price, get_recent_company_news]
tool_executor = ToolExecutor(tools)

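from langchain_huggingface import ChatHuggingFace

# `bind_tools` is defined on chat models, so wrap the raw pipeline LLM in ChatHuggingFace first.
# Binding the tools makes the LLM tool-calling-aware.
chat_model = ChatHuggingFace(llm=llm)
llm_with_tools = chat_model.bind_tools(tools)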

print("Tools have been bound to the LLM.")
print(f"The model now has access to: {[tool.name for tool in tools]}")

Tools have been bound to the LLM.
The model now has access to: ['get_stock_price', 'get_recent_company_news']


## Part 3: Building an Instrumented LangGraph Workflow

Now we define the graph itself, adding detailed instrumentation to track performance and state changes.

### 3.1: Defining an Enhanced Graph State

We will expand our state beyond just messages. To properly instrument our agent, we will also track performance metrics. We'll use a `TypedDict` to define a structured state. `Annotated` allows us to attach a reducer function (`operator.add`) to both `messages` and `performance_log`, so that each node's returned list is appended to the existing one instead of overwriting it, as explained in the 'Graph API Overview' blog.


In [None]:
from typing import TypedDict, Annotated, List
from langchain_core.messages import BaseMessage
import operator

class AgentState(TypedDict):
    # The history of messages
    messages: Annotated[List[BaseMessage], operator.add]
    # A list to store performance data for each step
    performance_log: Annotated[List[str], operator.add]
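
The effect of an `operator.add` reducer can be sketched without LangGraph at all. The snippet below is a simplified stand-in for how a graph might merge a node's partial update into the existing state; `REDUCERS` and `apply_update` are hypothetical names, and plain strings stand in for message objects.

```python
import operator
from typing import Any, Callable, Dict, List

# Hypothetical reducer table mirroring the two annotated fields of AgentState
REDUCERS: Dict[str, Callable[[List[Any], List[Any]], List[Any]]] = {
    "messages": operator.add,
    "performance_log": operator.add,
}


def apply_update(state: Dict[str, List[Any]], update: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Return a new state in which each updated key is merged using its reducer."""
    merged = dict(state)
    for key, value in update.items():
        # For lists, operator.add concatenates: old entries first, new entries after
        merged[key] = REDUCERS[key](state.get(key, []), value)
    return merged


state = {"messages": ["user question"], "performance_log": []}
state = apply_update(state, {"messages": ["ai reply"], "performance_log": ["[AGENT] 4.12s"]})
state = apply_update(state, {"messages": ["tool result"], "performance_log": ["[TOOLS] 1.30s"]})

print(state["messages"])
print(state["performance_log"])
```

This is why each node only needs to return the new items in a list: the reducer takes care of appending them to the history.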

### 3.2: Defining Instrumented Graph Nodes

Our nodes will now do more than just their core task; they will also measure their own execution time and log their actions to the state.

In [None]:
import time

def call_model(state: AgentState):
    """The agent node: calls the LLM, measures performance, and logs the result."""
    print("--- AGENT: Invoking LLM --- ")
    start_time = time.time()
    
    messages = state['messages']
    response = llm_with_tools.invoke(messages)
    
    end_time = time.time()
    execution_time = end_time - start_time
    
    # Log performance
    log_entry = f"[AGENT] LLM call took {execution_time:.2f} seconds."
    print(log_entry)
    
    # The response is a new message to be added to the list
    return {
        "messages": [response],
        "performance_log": [log_entry]
    }
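
The timing logic above is written inline, and the same pattern recurs in the tool node. One possible way to factor it out is a small decorator, sketched below under the assumption that every node takes a state dict and returns a partial update dict. `timed_node` and `demo_node` are hypothetical names; the sketch uses `time.perf_counter`, a monotonic clock intended for measuring intervals, where the cell above uses `time.time`.

```python
import functools
import time
from typing import Any, Callable, Dict

NodeFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def timed_node(label: str) -> Callable[[NodeFn], NodeFn]:
    """Decorator that times a node function and adds a log entry to its update."""
    def decorator(node: NodeFn) -> NodeFn:
        @functools.wraps(node)
        def wrapper(state: Dict[str, Any]) -> Dict[str, Any]:
            start = time.perf_counter()
            update = node(state)
            elapsed = time.perf_counter() - start
            log_entry = f"[{label}] step took {elapsed:.2f} seconds."
            # Return the node's own update plus one new performance_log entry
            return {**update, "performance_log": [log_entry]}
        return wrapper
    return decorator


@timed_node("DEMO")
def demo_node(state: Dict[str, Any]) -> Dict[str, Any]:
    time.sleep(0.05)  # stand-in for an LLM or tool call
    return {"messages": ["demo response"]}


result = demo_node({"messages": [], "performance_log": []})
print(result["performance_log"])
```

Because the wrapper returns a single-item list, it composes with an `operator.add` reducer on `performance_log` in the same way the hand-written nodes do.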

In [None]:
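from langchain_core.messages import ToolMessage
from langgraph.prebuilt import ToolInvocation

def call_tool(state: AgentState):
    """The tool node: executes tools, measures performance, and logs the results."""
    print("--- TOOLS: Executing tool calls --- ")
    start_time = time.time()
    
    last_message = state['messages'][-1]
    tool_invocations = last_message.tool_calls
    
    # ToolExecutor expects ToolInvocation objects rather than raw tool-call dicts.
    invocations = [
        ToolInvocation(tool=call['name'], tool_input=call['args'])
        for call in tool_invocations
    ]
    
    # `batch` is the standard Runnable method. By default it runs the invocations
    # concurrently in a thread pool, so I/O-bound sync tools overlap their waits.
    responses = tool_executor.batch(invocations)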
    
    end_time = time.time()
    execution_time = end_time - start_time
    
    # Log performance
    log_entry = f"[TOOLS] Executed {len(tool_invocations)} tools in {execution_time:.2f} seconds."
    print(log_entry)
    
    # Format responses as ToolMessages
    tool_messages = [
        ToolMessage(content=str(response), tool_call_id=call['id'])
        for call, response in zip(tool_invocations, responses)
    ]
    
    return {
        "messages": tool_messages,
        "performance_log": [log_entry]
    }
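
To quantify what concurrent execution buys for I/O-bound tools, the following self-contained sketch compares a sequential loop with a thread pool. The two functions are hypothetical stand-ins that simulate network latency with `time.sleep`; they do not call any real API, and the exact timings will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fake_stock_lookup() -> str:
    time.sleep(0.3)  # simulated network latency
    return "price"


def fake_news_search() -> str:
    time.sleep(0.3)  # simulated network latency
    return "news"


calls: List[Callable[[], str]] = [fake_stock_lookup, fake_news_search]

# Sequential baseline: total time is roughly the sum of the individual latencies
start = time.perf_counter()
sequential_results = [call() for call in calls]
sequential_seconds = time.perf_counter() - start

# Thread pool: the waits overlap, so total time approaches the slowest single call
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(calls)) as pool:
    futures = [pool.submit(call) for call in calls]
    parallel_results = [future.result() for future in futures]
parallel_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
print(f"speedup: {sequential_seconds / parallel_seconds:.2f}x")
```

Threads help here because the work is dominated by waiting rather than computation; for CPU-bound tools the gain would be much smaller.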

### 3.3: Defining the Graph Edges and Assembling the Graph

The logic for routing remains the same as our basic example. This conditional branching is a core pattern in LangGraph. After defining the edge, we assemble the full graph.

In [None]:
from langgraph.graph import END, StateGraph

def should_continue(state: AgentState) -> str:
    last_message = state['messages'][-1]
    if last_message.tool_calls:
        return "tools"
    return END

# Define the graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", call_tool)
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
workflow.add_edge("tools", "agent")

# Compile the graph into a runnable app
app = workflow.compile()

print("Graph constructed and compiled successfully.")
print("The agent is ready to be run.")

Graph constructed and compiled successfully.
The agent is ready to be run.
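
The control flow that the conditional edge produces can be traced with a plain loop. The sketch below is a simplified simulation that does not use LangGraph: messages are plain dicts, `fake_agent` and `fake_tools` are hypothetical stand-ins for the real nodes, and the local `END` string is only a placeholder for the library's sentinel.

```python
from typing import Any, Dict, List

END = "__end__"  # local placeholder for the END sentinel

State = Dict[str, List[Dict[str, Any]]]


def fake_agent(state: State) -> Dict[str, Any]:
    """Request a tool on the first pass; answer once a tool result is present."""
    if any(msg["role"] == "tool" for msg in state["messages"]):
        return {"role": "ai", "content": "final answer", "tool_calls": []}
    return {"role": "ai", "content": "", "tool_calls": [{"name": "get_stock_price"}]}


def fake_tools(state: State) -> Dict[str, Any]:
    return {"role": "tool", "content": "121.79"}


def should_continue(state: State) -> str:
    """Route to the tools node while the newest message still requests tool calls."""
    return "tools" if state["messages"][-1].get("tool_calls") else END


state: State = {"messages": [{"role": "human", "content": "NVDA price?"}]}
visited: List[str] = []
node = "agent"
while node != END:
    visited.append(node)
    if node == "agent":
        state["messages"].append(fake_agent(state))
        node = should_continue(state)  # conditional edge
    else:
        state["messages"].append(fake_tools(state))
        node = "agent"  # unconditional edge back to the agent

print(visited)  # ['agent', 'tools', 'agent']
```

The loop ends only when the agent produces a message with no tool calls, which is the same exit condition that `should_continue` encodes in the real graph.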


### 3.4: Visualizing the Graph

Visualizing the graph helps confirm our logic. The structure is a simple loop: the agent thinks, optionally uses tools, and then thinks again with the new information.

**Diagram Description:** The diagram shows `__start__` connected to `agent`. The `agent` node has two conditional outputs: one to `__end__` and one to `tools`. The `tools` node has a single, unconditional edge leading back to `agent`, forming the agent's core reasoning and action loop.

In [None]:
# from IPython.display import Image
# Image(app.get_graph().draw_png())

## Part 4: Running and Analyzing the Instrumented Agent

Now we execute the agent. We'll stream the full state at each step (`stream_mode='values'`) to see exactly how `messages` and `performance_log` evolve. The user query is designed to trigger both tools.

In [None]:
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
import json

inputs = {
    "messages": [HumanMessage(content="What is the current stock price of NVIDIA (NVDA) and what is the latest news about the company?")],
    "performance_log": []
}

step_counter = 1
final_state = None

# With stream_mode="values", each item is the full state after a step,
# not a {node_name: update} mapping.
for state in app.stream(inputs, stream_mode="values"):
    last_message = state['messages'][-1]
    # Infer which node just ran from the type of the newest message
    if isinstance(last_message, AIMessage):
        node_name = "agent"
    elif isinstance(last_message, ToolMessage):
        node_name = "tools"
    else:
        # The newest message is still the user's input, so no node has run yet
        continue

    print(f"\n{'*' * 100}")
    print(f"**Step {step_counter}: {node_name.capitalize()} Node Execution**")
    print(f"{'*' * 100}")
    
    # Pretty print the state dictionary for detailed inspection
    print("\nCurrent State:")
    # Build a JSON-serializable copy so the real state is never mutated
    state_for_printing = {
        "messages": [msg.pretty_repr() for msg in state['messages']],
        "performance_log": state['performance_log'],
    }
    print(json.dumps(state_for_printing, indent=4))

    print(f"\n{'-' * 100}")
    print("State Analysis:")
    if node_name == "agent":
        if last_message.tool_calls:
            print("The agent has processed the input. The LLM correctly planned parallel tool calls. The execution time of the LLM call has been logged.")
        else:
            print("The agent has received the tool results and synthesized them into a coherent, final answer for the user. The performance log now contains the full history.")
    elif node_name == "tools":
        print("The tool executor received the tool calls and executed them. The results are now in the state as ToolMessages. The performance log is accumulating.")
    print(f"{'-' * 100}")

    step_counter += 1
    final_state = state



****************************************************************************************************
**Step 1: Agent Node Execution**
****************************************************************************************************
--- AGENT: Invoking LLM --- 
[AGENT] LLM call took 4.12 seconds.

Current State:
{
    'messages': [
        HumanMessage(content='What is the current stock price of NVIDIA (NVDA) and what is the latest news about the company?'),
        AIMessage(content='', tool_calls=[{'name': 'get_stock_price', 'args': {'symbol': 'NVDA'}, 'id': '...'}, {'name': 'get_recent_company_news', 'args': {'company_name': 'NVIDIA'}, 'id': '...'}])
    ],
    'performance_log': ['[AGENT] LLM call took 4.12 seconds.']
}
----------------------------------------------------------------------------------------------------
State Analysis: The agent has processed the initial HumanMessage. The LLM correctly planned two parallel tool calls, one for the stock price and one for news. The