# Notebook 1 (Industrial Edition): High-Performance Parallel Tool Use

## Introduction: From Theory to Production-Grade Efficiency

This notebook elevates the concept of **Parallel Tool Use** from a simple demonstration to a production-oriented implementation. We are moving beyond mock tools to interact with **real-world, live APIs**. Our focus will be on building a robust, observable, and high-performance agent that mirrors the standards of an industrial application.

### The Core Problem in Large-Scale Systems

Large-scale agentic systems are not just about intelligence; they are also about performance. A system that takes 10 seconds to answer a user's query is a poor fit for interactive applications. The primary bottleneck is usually I/O latency: waiting on networks, databases, and external APIs. This notebook tackles the problem by implementing a parallel tool-calling architecture.
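To make the latency argument concrete, here is a minimal sketch that compares sequential and thread-pooled execution of I/O-bound calls. It uses `time.sleep` as a stand-in for network latency; `fake_api_call` and the call names are hypothetical and are not part of the agent built later.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_api_call(name: str, latency: float = 0.2) -> str:
    """Stand-in for a network-bound tool call; sleeps instead of hitting an API."""
    time.sleep(latency)
    return f"{name}: done"

# Hypothetical call names, used only for this illustration
calls = ["stock_price", "company_news", "analyst_ratings"]

# Sequential: total time is roughly the sum of the individual latencies.
start = time.perf_counter()
sequential_results = [fake_api_call(c) for c in calls]
sequential_seconds = time.perf_counter() - start

# Parallel: threads overlap the waits, so total time approaches the slowest call.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(calls)) as pool:
    parallel_results = list(pool.map(fake_api_call, calls))
parallel_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

Threads suit this case because the work is waiting rather than computing; CPU-bound work would not see the same benefit under Python's GIL.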

### What You Will Learn

1.  **Real-World Tool Integration:** How to wrap live APIs (`yfinance` for stocks, `Tavily` for news) into LangChain tools.
2.  **Intensive Instrumentation:** How to measure and log the performance (execution time, statistics) of each component in your agentic workflow.
3.  **Verbose State Tracking:** How to inspect the `State` of your graph at every step to deeply understand the data flow and decision-making process.
4.  **Parallel Execution Analysis:** How to quantify the performance gains of a parallel design compared to a sequential one.
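As a preview of the instrumentation idea in item 2, the sketch below shows one possible way to record per-call execution times and derive simple statistics. The `timed` decorator, the `timings` dictionary, and the `lookup` function are illustrative names, not the instrumentation this notebook uses later.

```python
import functools
import statistics
import time

# Maps a function name to the list of durations recorded for it
timings: dict[str, list[float]] = {}

def timed(func):
    """Record the wall-clock duration of every call to `func` in `timings`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are measured too
            timings.setdefault(func.__name__, []).append(time.perf_counter() - start)
    return wrapper

@timed
def lookup(symbol: str) -> str:
    time.sleep(0.01)  # placeholder for real work
    return symbol.upper()

for s in ("nvda", "aapl", "msft"):
    lookup(s)

durations = timings["lookup"]
print(f"calls={len(durations)} mean={statistics.mean(durations):.4f}s max={max(durations):.4f}s")
```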

This implementation directly applies the concepts from the LangGraph blogs, using a `StateGraph` to manage a message-based conversational state (a custom `AgentState`) and conditional edges to route logic between an LLM and a tool execution engine.

## Part 1: Setup and Environment

We begin by installing the required libraries. This time, we include `yfinance` for stock data and `tavily-python` for the search API.

#### Create a venv

In [2]:
# Create the virtual environment
import os
!uv venv .venv

# Activate the virtual environment for the shell commands in this notebook
os.environ['VIRTUAL_ENV'] = os.path.abspath(".venv")
os.environ['PATH'] = os.path.abspath(".venv/bin") + ":" + os.environ['PATH']

# Install packages into the created virtual environment
!uv pip install -U langchain langgraph langsmith langchain-huggingface transformers accelerate bitsandbytes torch yfinance tavily-python python-dotenv --quiet

#### Activate the virtual environment.

In [3]:
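# Note: each `!` command runs in its own subshell, so this does not change
# the environment of the running kernel. The PATH and VIRTUAL_ENV settings
# made via os.environ in the previous cell are what affect later `!` commands.
!source .venv/bin/activate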

#### Install dependencies

In [4]:
import sys
!uv pip install --python {sys.executable} -U langchain langgraph langsmith langchain-community langchain-tavily langchain-huggingface transformers accelerate bitsandbytes torch yfinance tavily-python ipywidgets

### 1.2: API Keys and Environment Configuration

We need three keys for this notebook:
- **LangSmith API Key:** For tracing and debugging. Absolutely essential for production systems.
- **Hugging Face Token:** To download the gated Llama 3.2 model.
- **Tavily API Key:** For our real-world news and search tool.

Get your keys here:
- LangSmith: [smith.langchain.com](https://smith.langchain.com)
- Hugging Face: [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
- Tavily: [app.tavily.com](https://app.tavily.com)

In [38]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Configure LangSmith for tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Industrial - Parallel Tool Use"

# Verify that each required key is present without printing the secret values
print("API keys loaded:")
for key_name in ("LANGCHAIN_API_KEY", "HUGGING_FACE_HUB_TOKEN", "TAVILY_API_KEY"):
    print(f"{key_name}:", "set" if os.getenv(key_name) else "MISSING")


## Part 2: Defining Production-Grade Components

Here, we define our LLM and integrate our live API tools.

### 2.1: The Language Model (LLM)

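We'll use `meta-llama/Llama-3.2-3B-Instruct` as our agent's brain. The cell below selects Apple's MPS backend when it is available and falls back to the CPU otherwise, loading weights in `bfloat16` on MPS and `float32` on CPU. `low_cpu_mem_usage=True` reduces peak RAM while the weights load. On CUDA hardware, 4-bit quantization via `bitsandbytes` (`load_in_4bit=True`) together with `device_map="auto"` is an option for fitting larger variants such as `meta-llama/Meta-Llama-3-8B-Instruct` on consumer GPUs; it is not used in this cell.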

In [39]:
from huggingface_hub import login
login(new_session=False)

In [56]:
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_id = "meta-llama/Llama-3.2-3B-Instruct"
device = "mps" if torch.backends.mps.is_available() else "cpu"

print(f"Loading model on {device}...")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if device == "mps" else torch.float32,
    device_map=device,
    low_cpu_mem_usage=True
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=2048,
    do_sample=False,
    repetition_penalty=1.1
)

llm_pipeline = HuggingFacePipeline(pipeline=pipe)
llm = ChatHuggingFace(llm=llm_pipeline)

print(f"✅ LLM loaded: {model_id}")

In [58]:
from langchain_core.tools import tool
from langchain_community.tools.tavily_search import TavilySearchResults
import yfinance as yf
import os

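from typing import Union

# Tool 1: Stock Price
# Returns the price as a float, or an error string if no price is available.
@tool
def get_stock_price(symbol: str) -> Union[float, str]: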
    """Get the current stock price for a given stock symbol using Yahoo Finance."""
    print(f"--- [Tool Call] Executing get_stock_price for symbol: {symbol} ---")
    ticker = yf.Ticker(symbol)
    price = ticker.info.get('regularMarketPrice', ticker.info.get('currentPrice'))
    if price is None:
        return f"Could not find price for symbol {symbol}"
    return price

# Tool 2: Company News
tavily_search = TavilySearchResults(
    max_results=5,
    api_key=os.getenv("TAVILY_API_KEY")
)

@tool
def get_recent_company_news(company_name: str) -> list:
    """Get recent news articles and summaries for a given company name using the Tavily search engine."""
    print(f"--- [Tool Call] Executing get_recent_company_news for: {company_name} ---")
    query = f"latest news about {company_name}"
    return tavily_search.invoke(query)

# CREATE THE TOOLS LIST
tools = [get_stock_price, get_recent_company_news]

print(f"✅ Tools defined: {[tool.name for tool in tools]}")

### 2.2: The Real-World Tools

This is where our implementation becomes real. The cell above defines two tools that call live APIs; the subsections below explain and test each one.

#### 2.2.1: Stock Price Tool (`yfinance`)

We use the `yfinance` library to get real-time stock data. The `@tool` decorator's docstring is critical: it's the instruction manual the LLM uses to understand how to use the tool.
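To illustrate why the docstring matters, the sketch below builds a simplified, schema-like description from a function's name, docstring, and type hints using only the standard library. This is an approximation for intuition, not LangChain's actual implementation; `describe_tool` is a hypothetical helper.

```python
import inspect

def describe_tool(func) -> dict:
    """Build a simplified, schema-like description of a function for an LLM prompt."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            # Use the type's name when available, else its string form
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }

def get_stock_price(symbol: str) -> float:
    """Get the current stock price for a given stock symbol using Yahoo Finance."""
    raise NotImplementedError("body omitted; only the metadata matters here")

spec = describe_tool(get_stock_price)
print(spec)
```

A vague or missing docstring leaves the description empty, which gives the model little to go on when it decides whether and how to call the tool.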

Let's test the `get_stock_price` tool to see its live output.

In [59]:
get_stock_price.invoke({"symbol": "NVDA"})

#### 2.2.2: Company News Tool (`Tavily`)

We'll use the `TavilySearchResults` tool, a powerful search API optimized for LLMs. We'll wrap it in our own `@tool` function to provide a more specific docstring, guiding the LLM to use it specifically for finding recent news.

Let's test the news tool.

In [60]:
get_recent_company_news.invoke({"company_name": "NVIDIA"})

### 2.3: Binding Tools and Creating the Executor

We now collect our tools and bind them to the LLM. The `ToolNode` is a LangGraph utility that efficiently calls the requested tools.

In [61]:
from langgraph.prebuilt import ToolNode

# Create tool node
tool_node = ToolNode(tools)

llm_with_tools = llm.bind_tools(tools)

print("✅ llm_with_tools created.")

In [62]:
# Check what's defined
print("llm defined:", 'llm' in dir())
print("tools defined:", 'tools' in dir())
print("llm_with_tools defined:", 'llm_with_tools' in dir())
print("app defined:", 'app' in dir())

In [63]:
from typing import TypedDict, Annotated, List
from langchain_core.messages import BaseMessage
import operator

class AgentState(TypedDict):
    """State schema for the agent graph."""
    messages: Annotated[List[BaseMessage], operator.add]
    performance_log: Annotated[List[str], operator.add]

print("✅ AgentState defined")

In [64]:
import time

def call_model(state: AgentState):
    """The agent node: calls the LLM."""
    print("--- AGENT: Invoking LLM --- ")
    start_time = time.time()
    
    messages = state['messages']
    response = llm_with_tools.invoke(messages)
    
    end_time = time.time()
    execution_time = end_time - start_time
    
    log_entry = f"[AGENT] LLM call took {execution_time:.2f} seconds."
    print(log_entry)
    
    return {
        "messages": [response],
        "performance_log": [log_entry]
    }

print("✅ call_model defined")

In [65]:
from langgraph.graph import END

def should_continue(state: AgentState) -> str:
    """Determine whether to continue to tools or end."""
    last_message = state['messages'][-1]
    if hasattr(last_message, 'tool_calls') and last_message.tool_calls:
        return "tools"
    return END

print("✅ should_continue defined")

In [66]:
from langgraph.graph import StateGraph
from langgraph.prebuilt import ToolNode

# Create tool node
tool_node = ToolNode(tools)

# Define the graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", tool_node)
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
workflow.add_edge("tools", "agent")

# Compile
app = workflow.compile()

print("✅ Graph constructed and compiled successfully!")

## Part 3: Building an Instrumented LangGraph Workflow

Now we define the graph itself, adding detailed instrumentation to track performance and state changes.

### 3.1: Defining an Enhanced Graph State

We will expand our state beyond just messages. To properly instrument our agent, we will also track performance metrics. We'll use a `TypedDict` to define a structured state. `Annotated` lets us attach a reducer function (`operator.add`) to `messages` and `performance_log`, so that each node's returned list is appended to the existing one instead of overwriting it, as explained in the 'Graph API Overview' blog.
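The effect of a reducer can be sketched in plain Python. The snippet below is a simplified model of how partial updates from nodes could be merged into the state; `REDUCERS` and `apply_update` are hypothetical names, the message values are placeholder strings, and LangGraph's real merge logic is more involved.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical reducer table mirroring the Annotated[...] declarations in AgentState
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {
    "messages": operator.add,
    "performance_log": operator.add,
}

def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial update into the state, using a reducer where one exists."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # list + list concatenates
        else:
            merged[key] = value  # no reducer: the new value overwrites the old
    return merged

state = {"messages": ["user question"], "performance_log": []}
state = apply_update(state, {"messages": ["ai reply"], "performance_log": ["[AGENT] 1.20s"]})
state = apply_update(state, {"messages": ["tool result"], "performance_log": ["[TOOLS] 0.45s"]})
print(state)
```

This is why each node returns a one-element list such as `[response]` rather than the full history: the reducer takes care of accumulation.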


In [67]:
from typing import TypedDict, Annotated, List
from langchain_core.messages import BaseMessage
import operator

class AgentState(TypedDict):
    # The history of messages
    messages: Annotated[List[BaseMessage], operator.add]
    # A list to store performance data for each step
    performance_log: Annotated[List[str], operator.add]

### 3.2: Defining Instrumented Graph Nodes

Our nodes will now do more than just their core task; they will also measure their own execution time and log their actions to the state.

In [69]:
import time

def call_model(state: AgentState):
    """The agent node: calls the LLM, measures performance, and logs the result."""
    print("--- AGENT: Invoking LLM --- ")
    start_time = time.time()
    
    messages = state['messages']
    response = llm_with_tools.invoke(messages)
    
    end_time = time.time()
    execution_time = end_time - start_time
    
    # Log performance
    log_entry = f"[AGENT] LLM call took {execution_time:.2f} seconds."
    print(log_entry)
    
    # The response is a new message to be added to the list
    return {
        "messages": [response],
        "performance_log": [log_entry]
    }

In [70]:
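from concurrent.futures import ThreadPoolExecutor
from langchain_core.messages import ToolMessage

# Look up each tool by the name the LLM uses in its tool calls
tools_by_name = {t.name: t for t in tools}

def call_tool(state: AgentState):
    """The tool node: executes tools, measures performance, and logs the results."""
    print("--- TOOLS: Executing tool calls --- ")
    start_time = time.time()

    last_message = state['messages'][-1]
    tool_invocations = last_message.tool_calls

    # Run all requested tool calls concurrently in a thread pool.
    # The tools are I/O-bound, so threads overlap the network waits.
    with ThreadPoolExecutor(max_workers=max(1, len(tool_invocations))) as executor:
        responses = list(executor.map(
            lambda call: tools_by_name[call['name']].invoke(call['args']),
            tool_invocations
        ))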
    
    end_time = time.time()
    execution_time = end_time - start_time
    
    # Log performance
    log_entry = f"[TOOLS] Executed {len(tool_invocations)} tools in {execution_time:.2f} seconds."
    print(log_entry)
    
    # Format responses as ToolMessages
    tool_messages = [
        ToolMessage(content=str(response), tool_call_id=call['id'])
        for call, response in zip(tool_invocations, responses)
    ]
    
    return {
        "messages": tool_messages,
        "performance_log": [log_entry]
    }

### 3.3: Defining the Graph Edges and Assembling the Graph

The logic for routing remains the same as our basic example. This conditional branching is a core pattern in LangGraph. After defining the edge, we assemble the full graph.

In [71]:
from langgraph.graph import END, StateGraph

def should_continue(state: AgentState) -> str:
    last_message = state['messages'][-1]
    if last_message.tool_calls:
        return "tools"
    return END

# Define the graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", call_tool)
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
workflow.add_edge("tools", "agent")

# Compile the graph into a runnable app
app = workflow.compile()

print("Graph constructed and compiled successfully.")
print("The agent is ready to be run.")

### 3.4: Visualizing the Graph

Visualizing the graph helps confirm our logic. The structure is a simple loop: the agent thinks, optionally uses tools, and then thinks again with the new information.

**Diagram Description:** The diagram shows `__start__` connected to `agent`. The `agent` node has two conditional outputs: one to `__end__` and one to `tools`. The `tools` node has a single, unconditional edge leading back to `agent`, forming the agent's core reasoning and action loop.
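The loop described above can be traced with a small stand-alone simulation. This sketch does not use LangGraph; `FakeMessage` and the scripted outputs are hypothetical stand-ins that only demonstrate the routing rule and the resulting node order.

```python
from dataclasses import dataclass, field
from typing import List

END = "__end__"

@dataclass
class FakeMessage:
    """Hypothetical stand-in for an AI message; real messages carry more fields."""
    content: str
    tool_calls: List[dict] = field(default_factory=list)

def route(last_message: FakeMessage) -> str:
    """Same rule as should_continue: go to tools only if tool calls were requested."""
    return "tools" if last_message.tool_calls else END

# Scripted agent outputs: the first turn requests two tools, the second answers.
scripted = [
    FakeMessage("", tool_calls=[{"name": "get_stock_price"}, {"name": "get_recent_company_news"}]),
    FakeMessage("Summary based on the tool results."),
]

visited = []
node = "agent"
turn = 0
while node != END:
    visited.append(node)
    if node == "agent":
        node = route(scripted[turn])
        turn += 1
    else:
        # The tools node always returns control to the agent
        node = "agent"
visited.append(END)
print(" -> ".join(visited))
```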

In [72]:
# Generate interactive HTML visualization
html_content = """
<!DOCTYPE html>
<html>
<head>
    <title>LangGraph Visualization</title>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/cytoscape/3.26.0/cytoscape.min.js"></script>
    <style>
        #cy { width: 100%; height: 500px; border: 2px solid #ccc; }
    </style>
</head>
<body>
    <h2>🔄 Agent Graph Visualization</h2>
    <div id="cy"></div>
    <script>
        var cy = cytoscape({
            container: document.getElementById('cy'),
            elements: [
                { data: { id: 'start', label: '__start__' } },
                { data: { id: 'agent', label: 'agent' } },
                { data: { id: 'tools', label: 'tools' } },
                { data: { id: 'end', label: '__end__' } },
                { data: { source: 'start', target: 'agent' } },
                { data: { source: 'agent', target: 'tools', label: 'has_tool_calls' } },
                { data: { source: 'agent', target: 'end', label: 'no_tool_calls' } },
                { data: { source: 'tools', target: 'agent' } }
            ],
            style: [
                {
                    selector: 'node',
                    style: {
                        'background-color': '#4A90E2',
                        'label': 'data(label)',
                        'color': '#fff',
                        'text-valign': 'center',
                        'width': 60,
                        'height': 60
                    }
                },
                {
                    selector: 'edge',
                    style: {
                        'width': 2,
                        'line-color': '#666',
                        'target-arrow-color': '#666',
                        'target-arrow-shape': 'triangle',
                        'curve-style': 'bezier',
                        'label': 'data(label)',
                        'font-size': 10
                    }
                }
            ],
            layout: { name: 'breadthfirst', directed: true }
        });
    </script>
</body>
</html>
"""

# Save to file
with open('graph_viz.html', 'w') as f:
    f.write(html_content)

print("✅ Interactive visualization saved to: graph_viz.html")
print("Open it in your browser!")

## Part 4: Running and Analyzing the Instrumented Agent

Now we execute the agent. We'll stream the full state at each step (`stream_mode='values'`) to see exactly how `messages` and `performance_log` evolve. The user query is designed to trigger both tools.

In [73]:
from langchain_core.messages import HumanMessage

inputs = {
    "messages": [HumanMessage(content="What is the current stock price of NVIDIA (NVDA) and what is the latest news about the company?")],
    "performance_log": []
}

step_counter = 1
final_state = None

for output in app.stream(inputs, stream_mode="values"):
    print(f"\n{'*' * 100}")
    print(f"**Step {step_counter}**")
    print(f"{'*' * 100}")
    
    # Print messages
    if 'messages' in output:
        print(f"\n💬 Messages: {len(output['messages'])} total")
        last_msg = output['messages'][-1]
        msg_type = type(last_msg).__name__
        print(f"   Last message type: {msg_type}")
        
        # Print content
        if hasattr(last_msg, 'content'):
            content = str(last_msg.content)[:200]
            print(f"   Content: {content}...")
        
        # Print tool calls if present
        if hasattr(last_msg, 'tool_calls') and last_msg.tool_calls:
            print(f"   Tool calls: {len(last_msg.tool_calls)}")
            for tc in last_msg.tool_calls:
                print(f"      • {tc.get('name', 'unknown')}")
    
    # Print performance log
    if 'performance_log' in output:
        print(f"\n⏱️  Performance:")
        for log in output['performance_log']:
            print(f"   {log}")
    
    print(f"\n{'-' * 100}")
    
    step_counter += 1
    final_state = output

print("\n" + "="*100)
print("EXECUTION COMPLETE")
print("="*100)

# Print final response
if final_state and 'messages' in final_state:
    last_msg = final_state['messages'][-1]
    if hasattr(last_msg, 'content'):
        print("\n📋 Final Response:")
        print(last_msg.content)
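Once a run completes, the entries in `performance_log` can be aggregated to see where the time went. The sketch below parses entries in the format produced by `call_model` and `call_tool`; the sample values are made up for illustration and are not measured results, and `summarize` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Example entries in the format produced by call_model and call_tool;
# the numbers are illustrative only, not measurements.
sample_log = [
    "[AGENT] LLM call took 4.10 seconds.",
    "[TOOLS] Executed 2 tools in 1.25 seconds.",
    "[AGENT] LLM call took 6.40 seconds.",
]

# Captures the component tag and the number immediately before " seconds."
ENTRY = re.compile(r"^\[(?P<component>\w+)\].*?(?P<seconds>\d+(?:\.\d+)?) seconds\.$")

def summarize(log):
    """Sum the logged seconds per component (for example AGENT and TOOLS)."""
    totals = defaultdict(float)
    for entry in log:
        match = ENTRY.match(entry)
        if match:
            totals[match["component"]] += float(match["seconds"])
    return dict(totals)

totals = summarize(sample_log)
grand_total = sum(totals.values())
for component, seconds in totals.items():
    print(f"{component}: {seconds:.2f}s ({seconds / grand_total:.0%} of logged time)")
```

With a real run, pass `final_state['performance_log']` instead of `sample_log`. Comparing the TOOLS total against the sum of individually timed tool calls is one way to estimate the gain from parallel execution.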