# 04: Tool Integration - Web Search

## Overview
In this notebook, we'll add web search to our BS detector. This addresses a key limitation: without search, the detector relies only on what the LLM already knows and can't check claims against current information.

## What We'll Learn
1. How to integrate tools in LangGraph
2. Conditional tool usage based on confidence
3. Evidence-based verdict revision
4. Cost-effective tool usage patterns

## Architecture Diagram

Let's visualize how tools fit into our graph:

In [None]:
import base64
from IPython.display import Image

# Enhanced graph with tool integration
tool_graph_diagram = """
graph TB
    Start([Claim Input]) --> Initial[Initial Check<br/>baseline detector]
    
    Initial --> Router{Confidence<br/>Check}
    
    Router -->|High Confidence<br/>≥70%| Format1[Format Output]
    
    Router -->|Low Confidence<br/><70%| Query[Generate<br/>Search Queries]
    
    Query --> Search[Web Search<br/>🔍 DuckDuckGo]
    
    Search --> Analyze[Analyze<br/>Evidence]
    
    Analyze --> Revise[Revise<br/>Verdict]
    
    Revise --> Format2[Format Output]
    
    Format1 --> End([Final Result])
    Format2 --> End
    
    classDef startEnd fill:#f9f,stroke:#333,stroke-width:2px
    classDef process fill:#bbf,stroke:#333,stroke-width:2px
    classDef decision fill:#fbb,stroke:#333,stroke-width:2px
    classDef tool fill:#bfb,stroke:#333,stroke-width:3px
    
    class Start,End startEnd
    class Initial,Query,Analyze,Revise,Format1,Format2 process
    class Router decision
    class Search tool
"""

def render_mermaid_diagram(graph_def):
    """Render a Mermaid diagram as a PNG via the mermaid.ink API."""
    graph_bytes = graph_def.encode("utf-8")
    # URL-safe Base64 avoids "+" and "/" characters, which are unsafe in a URL path
    base64_string = base64.urlsafe_b64encode(graph_bytes).decode("ascii")
    image_url = f"https://mermaid.ink/img/{base64_string}?type=png"
    return Image(url=image_url)

render_mermaid_diagram(tool_graph_diagram)

## Setup

First, let's import everything we need:

In [None]:
# Add parent directory to path
import sys
from pathlib import Path
sys.path.append(str(Path.cwd().parent))

# Import our modules
from modules.m1_baseline import check_claim
from modules.m2_langgraph import check_claim_with_graph
from modules.m4_tools import (
    BSDetectorState,
    check_claim_with_tools,
    create_bs_detector_with_tools
)
from tools.search_tool import (
    WebSearchTool,
    generate_search_queries,
    search_for_evidence
)
from config.llm_factory import LLMFactory

print("✅ Imports successful!")
print("\n🔧 Available tools:")
print("  - DuckDuckGo Web Search")
print("  - Query Generation")
print("  - Evidence Extraction")

## 1. Understanding Web Search Tools

Let's explore how the web search tool works:

In [None]:
# Test the search tool directly
print("🔍 Web Search Tool Demo\n")

# Create search tool
search_tool = WebSearchTool(max_results=2)

# Test claim
test_claim = "The Concorde could fly at Mach 2.04"
print(f"Claim: \"{test_claim}\"\n")

# Generate search queries
queries = generate_search_queries(test_claim)
print("Generated queries:")
for i, query in enumerate(queries, 1):
    print(f"  {i}. {query}")

# Perform search on first query
print(f"\n🌐 Searching: \"{queries[0]}\"...")
result = search_tool.search_web(queries[0])

if result["success"]:
    print("\n📄 Search Results:")
    print(result["results"][:300] + "...")
    
    # Extract facts
    facts = search_tool.extract_facts([result])
    print("\n💡 Extracted Facts:")
    for i, fact in enumerate(facts[:3], 1):
        print(f"  {i}. {fact}")
else:
    print(f"❌ Search failed: {result['error']}")
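
The `success` / `error` check in the cell above implies that the search tool returns a result dictionary instead of raising an exception. A minimal sketch of that pattern follows. `safe_search` and `flaky_backend` are hypothetical names used only for illustration; they are not part of `tools.search_tool`, and the real `WebSearchTool` may handle failures differently.

```python
from typing import Callable, Dict


def safe_search(backend: Callable[[str], str], query: str) -> Dict[str, object]:
    """Run a search backend and always return a result dict, never raise."""
    try:
        text = backend(query)
    except Exception as exc:  # e.g. network errors, rate limits, parsing failures
        return {"success": False, "query": query, "results": "", "error": str(exc)}
    return {"success": True, "query": query, "results": text, "error": None}


def flaky_backend(query: str) -> str:
    """Hypothetical backend that fails on blank queries."""
    if not query.strip():
        raise ValueError("empty query")
    return f"stub results for: {query}"


ok = safe_search(flaky_backend, "Concorde top speed")
failed = safe_search(flaky_backend, "   ")
print(ok["success"], failed["success"], failed["error"])
```

Returning a dictionary with a `success` flag lets downstream graph nodes decide what to do with a failed search without wrapping every call in `try`/`except`.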

## 2. The Enhanced State Model

Our new state tracks the entire fact-checking process:

In [None]:
# Examine the enhanced state
print("📊 Enhanced State Model:\n")

# Create a sample state
sample_state = BSDetectorState(
    claim="The Boeing 797 will have folding wings",
    confidence_threshold=70  # Search if confidence < 70%
)

print("State fields:")
for field, value in sample_state.model_dump().items():
    print(f"  {field}: {value}")

print("\n🔑 Key additions for tools:")
print("  - needs_search: Triggers tool usage")
print("  - search_queries: What to search for")
print("  - search_results: Raw search data")
print("  - extracted_facts: Key information")
print("  - evidence_supports_claim: Analysis result")
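
To make the shape of the state concrete, here is a rough stand-in built from the field names used in this notebook. It is a sketch, not the actual `BSDetectorState`: it uses a plain dataclass rather than the Pydantic model so it runs without extra dependencies, and every type and default below is an assumption.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class SketchState:
    """Stand-in for BSDetectorState; field types and defaults are guesses."""
    claim: str
    confidence_threshold: int = 70           # search if confidence is below this
    initial_verdict: Optional[str] = None
    initial_confidence: Optional[int] = None
    needs_search: bool = False                # set by the confidence check
    search_queries: List[str] = field(default_factory=list)
    search_results: List[dict] = field(default_factory=list)
    extracted_facts: List[str] = field(default_factory=list)
    evidence_supports_claim: Optional[bool] = None  # None until evidence is analyzed


state = SketchState(claim="The Boeing 797 will have folding wings")
for name, value in asdict(state).items():
    print(f"{name}: {value}")
```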

## 3. Comparing With and Without Tools

Let's run the baseline detector and the tool-enhanced detector on the same claim and compare the results:

In [None]:
# Test claim that needs fact-checking
test_claim = "SpaceX was founded in 2002 by Elon Musk"

print(f"🧪 Testing: \"{test_claim}\"\n")
print("=" * 60)

# Without tools (baseline)
print("\n1️⃣ WITHOUT TOOLS (Baseline):")
llm = LLMFactory.create_llm()
baseline_result = check_claim(test_claim, llm)
print(f"Verdict: {baseline_result['verdict']}")
print(f"Confidence: {baseline_result['confidence']}%")
print(f"Reasoning: {baseline_result['reasoning'][:150]}...")

# With tools
print("\n" + "=" * 60)
print("\n2️⃣ WITH TOOLS (Web Search):")
tools_result = check_claim_with_tools(test_claim)
print(f"Verdict: {tools_result['verdict']}")
print(f"Confidence: {tools_result['confidence']}%")
print(f"Used Search: {'✅ Yes' if tools_result['used_search'] else '❌ No'}")
print(f"\nReasoning: {tools_result['reasoning']}")

## 4. Tool Usage Patterns

The graph runs the search tool only when the initial check's confidence falls below `confidence_threshold` (70% in the diagram above):

In [None]:
# Test different confidence scenarios
test_scenarios = [
    # High confidence - should skip search
    "Airplanes need wings to fly",
    
    # Low confidence - should trigger search
    "The new Boeing 797 will use hydrogen fuel",
    
    # Fact-checkable - searches only if the initial confidence is below the threshold
    "The Wright brothers first flew in 1903",
]

print("🎯 Testing Conditional Tool Usage\n")

for claim in test_scenarios:
    print(f"Claim: \"{claim}\"")
    result = check_claim_with_tools(claim)
    
    print(f"  → Confidence: {result['confidence']}%")
    print(f"  → Used Search: {'✅ Yes' if result['used_search'] else '❌ No'}")
    print(f"  → Verdict: {result['verdict']}")
    print()
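
The routing decision itself can be sketched as a small function that inspects the state and returns the name of the next node. The node names `format_output` and `generate_queries` are hypothetical, as is the function name; the actual routing in `modules.m4_tools` may differ.

```python
def route_after_initial_check(state: dict) -> str:
    """Return the name of the next node based on the initial confidence."""
    threshold = state.get("confidence_threshold", 70)
    confidence = state.get("initial_confidence", 0)
    # At or above the threshold: trust the initial verdict and skip search
    if confidence >= threshold:
        return "format_output"
    # Below the threshold: gather evidence before committing to a verdict
    return "generate_queries"


print(route_after_initial_check({"initial_confidence": 95}))
print(route_after_initial_check({"initial_confidence": 40}))
print(route_after_initial_check({"initial_confidence": 70}))
```

In LangGraph, a function like this is typically registered with `add_conditional_edges`; how `m4_tools` wires it is not shown in this notebook.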

## 5. Visualizing the Tool-Enhanced Graph

Let's print the compiled graph's structure as Mermaid text:

In [None]:
# Create the graph
app = create_bs_detector_with_tools()

# Visualize the graph structure
print("🔀 Graph Structure:")
print(app.get_graph().draw_mermaid())

# You can copy the output above and paste into mermaid.live to see the graph

## 6. Deep Dive: Evidence Analysis

Let's see how evidence changes verdicts:

In [None]:
# Test a claim where evidence might change the verdict
claim = "The Antonov An-225 is still the largest aircraft in the world"

print(f"🔬 Deep Dive: \"{claim}\"\n")

# Run with detailed tracking
app = create_bs_detector_with_tools()
config = {"configurable": {"thread_id": "deep-dive"}}
state = BSDetectorState(claim=claim)

# Run the whole graph in one call; the returned state records each step
result = app.invoke(state.model_dump(), config)

# Show the journey
print("📍 Execution Path:")
print(f"1. Initial verdict: {result.get('initial_verdict')} (confidence: {result.get('initial_confidence')}%)")
print(f"2. Needed search: {result.get('needs_search')}")

if result.get('used_search'):
    print(f"3. Search queries: {len(result.get('search_queries', []))} generated")
    print(f"4. Facts found: {len(result.get('extracted_facts', []))}")
    print(f"5. Evidence supports: {result.get('evidence_supports_claim')}")

print(f"\n📊 Final Result:")
print(f"Verdict: {result.get('final_verdict')}")
print(f"Confidence: {result.get('final_confidence')}%")
print(f"\nReasoning:\n{result.get('final_reasoning')}")
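
The revision step can be illustrated with a simplified, rule-based sketch. In the actual module the LLM analyzes the evidence, so the function below is only an approximation of the idea: the name `revise_verdict`, the `boost` parameter, and the confidence numbers are all arbitrary assumptions.

```python
from typing import Optional, Tuple


def revise_verdict(
    initial_verdict: str,
    initial_confidence: int,
    evidence_supports_claim: Optional[bool],
    boost: int = 20,
) -> Tuple[str, int]:
    """Combine the initial verdict with the outcome of the evidence analysis."""
    if evidence_supports_claim is None:
        # Search failed or was inconclusive: keep the initial answer
        return initial_verdict, initial_confidence
    evidence_verdict = "LEGITIMATE" if evidence_supports_claim else "BS"
    if evidence_verdict == initial_verdict:
        # Evidence agrees: same verdict, higher confidence (capped at 100)
        return initial_verdict, min(100, initial_confidence + boost)
    # Evidence disagrees: switch to the evidence-backed verdict
    return evidence_verdict, 50 + boost


print(revise_verdict("LEGITIMATE", 55, True))
print(revise_verdict("LEGITIMATE", 55, False))
print(revise_verdict("BS", 60, None))
```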

## 7. Cost-Effective Tool Usage

Every search adds latency and extra processing, so the detector should search only when the initial check is uncertain:

In [None]:
# Analyze tool usage patterns
test_claims = [
    "Water boils at 100 degrees Celsius",  # High confidence, skip search
    "The Boeing 777X first flew in 2020",   # Specific fact, needs search
    "Aliens built the pyramids",            # Obvious BS, might skip search
    "Quantum computers can break Bitcoin",  # Complex claim, needs search
]

print("💰 Cost-Effective Tool Usage Analysis\n")

search_count = 0
for claim in test_claims:
    result = check_claim_with_tools(claim)
    if result['used_search']:
        search_count += 1
        status = "🔍 SEARCHED"
    else:
        status = "⏭️  SKIPPED"
    
    # Pad or truncate the claim to 40 characters so the columns line up
    print(f"{status} | {claim[:40]:<40} | Confidence: {result['confidence']}%")

print(f"\n📊 Summary:")
print(f"Total claims: {len(test_claims)}")
print(f"Searches performed: {search_count}")
print(f"Search rate: {search_count/len(test_claims)*100:.0f}%")
print(f"\n💡 Only searching when needed saves resources!")
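
The search rate depends directly on the confidence threshold. The sketch below shows the trade-off on made-up numbers: `initial_confidences` is hypothetical data, not output from the detector, and `search_rate` is an illustrative helper rather than part of the course modules.

```python
# Hypothetical initial confidences (percent) for a batch of claims
initial_confidences = [95, 60, 90, 45, 72, 68]


def search_rate(confidences, threshold):
    """Fraction of claims that fall below the threshold and trigger a search."""
    if not confidences:
        return 0.0
    searched = sum(1 for c in confidences if c < threshold)
    return searched / len(confidences)


for threshold in (50, 70, 90):
    rate = search_rate(initial_confidences, threshold)
    print(f"threshold={threshold}% -> search rate {rate:.0%}")
```

A higher threshold sends more claims to search, which costs more but leaves fewer low-confidence verdicts unchecked.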

## 8. Interactive Tool-Enhanced Detection

Try your own claims:

In [None]:
def interactive_tool_detection():
    """Interactive tool-enhanced BS detection"""
    print("🤖 Tool-Enhanced BS Detector")
    print("Type 'quit' to exit\n")
    
    while True:
        claim = input("\nEnter a claim to check: ").strip()
        
        if claim.lower() == 'quit':
            break
            
        if not claim:
            continue
        
        print("\n🔄 Processing...")
        result = check_claim_with_tools(claim)
        
        print(f"\n📊 Results:")
        print(f"Verdict: {result['verdict']}")
        print(f"Confidence: {result['confidence']}%")
        print(f"Used Web Search: {'Yes' if result['used_search'] else 'No'}")
        
        if result['used_search'] and result['sources']:
            print(f"Sources Consulted: {len(result['sources'])}")
        
        print(f"\nReasoning: {result['reasoning'][:300]}...")
        
        # Ask for feedback
        feedback = input("\nWas this helpful? (y/n): ").strip().lower()
        if feedback == 'y':
            print("Great! Tools make fact-checking more reliable.")
        elif feedback == 'n':
            print("Thanks for the feedback. Tools aren't perfect but help with facts!")
    
    print("\n👋 Thanks for testing!")

# Uncomment to run
# interactive_tool_detection()

## 9. Evaluating Tool Impact

Let's check whether tools improve accuracy on a small set of fact-checkable claims:

In [None]:
# Load evaluation framework
from modules.m3_evaluation import BSDetectorEvaluator
import os

# Test on fact-checkable claims
fact_checkable_claims = [
    {
        "claim": "The Concorde could fly at Mach 2.04",
        "verdict": "LEGITIMATE",
        "type": "historical fact"
    },
    {
        "claim": "The Boeing 747 has six engines",
        "verdict": "BS",
        "type": "technical fact"
    },
    {
        "claim": "SpaceX was founded in 2010",
        "verdict": "BS",
        "type": "date fact"
    }
]

print("📊 Tool Impact Evaluation\n")

# Test without tools
print("Without Tools:")
baseline_correct = 0
for item in fact_checkable_claims:
    result = check_claim_with_graph(item["claim"])
    if result["verdict"] == item["verdict"]:
        baseline_correct += 1
        print(f"✅ {item['type']}: Correct")
    else:
        print(f"❌ {item['type']}: Wrong")

# Test with tools
print("\nWith Tools:")
tools_correct = 0
for item in fact_checkable_claims:
    result = check_claim_with_tools(item["claim"])
    if result["verdict"] == item["verdict"]:
        tools_correct += 1
        print(f"✅ {item['type']}: Correct (search: {'Yes' if result['used_search'] else 'No'})")
    else:
        print(f"❌ {item['type']}: Wrong")

print(f"\n📈 Results:")
print(f"Baseline accuracy: {baseline_correct}/{len(fact_checkable_claims)} ({baseline_correct/len(fact_checkable_claims)*100:.0f}%)")
print(f"With tools accuracy: {tools_correct}/{len(fact_checkable_claims)} ({tools_correct/len(fact_checkable_claims)*100:.0f}%)")

if tools_correct > baseline_correct:
    print("\n✨ Tools improved accuracy on fact-checkable claims!")
else:
    print("\n🤔 Results vary - tools help most with current events and specific facts")
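
The two loops above apply the same comparison to different detectors, so the check can be factored into a helper that takes the detector as a parameter. This is a sketch: `count_correct` and `stub_detector` are hypothetical names, and the stub stands in for a real detector so the example runs without an LLM.

```python
from typing import Callable, Dict, List


def count_correct(check_fn: Callable[[str], Dict], dataset: List[Dict]) -> int:
    """Count items where the detector's verdict matches the expected verdict."""
    return sum(
        1 for item in dataset
        if check_fn(item["claim"])["verdict"] == item["verdict"]
    )


def stub_detector(claim: str) -> Dict:
    """Hypothetical detector that calls every claim LEGITIMATE."""
    return {"verdict": "LEGITIMATE"}


dataset = [
    {"claim": "The Concorde could fly at Mach 2.04", "verdict": "LEGITIMATE"},
    {"claim": "The Boeing 747 has six engines", "verdict": "BS"},
    {"claim": "SpaceX was founded in 2010", "verdict": "BS"},
]

correct = count_correct(stub_detector, dataset)
print(f"{correct}/{len(dataset)} correct")
```

Even a detector that always answers LEGITIMATE gets one of these three claims right, so a comparison on a set this small is a smoke test rather than a measurement.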

## Summary

### What We Learned
1. **Tool Integration**: Added DuckDuckGo search as a step in our LangGraph workflow
2. **Conditional Usage**: Tools activate based on confidence levels
3. **Evidence Analysis**: LLM analyzes search results to revise verdicts
4. **Cost Efficiency**: Only search when needed to save resources

### Key Patterns
- **Tool Node**: Dedicated node for tool execution
- **Routing Logic**: Conditional edges based on state
- **Evidence Flow**: Search → Extract → Analyze → Revise
- **Graceful Fallback**: Check each search result's `success` flag and handle failures instead of crashing

### Tool Benefits
- ✅ Fact-check current events
- ✅ Verify specific claims
- ✅ Provide source attribution
- ✅ Increase confidence on factual claims

### Next Steps
In Iteration 5, we'll add human-in-the-loop for cases where even tools aren't enough!

### 🎯 Challenge
Before moving on, try:
1. Testing claims about recent events
2. Adjusting the confidence threshold
3. Adding a new search engine tool

Remember: Tools extend agent capabilities from reasoning to real-world interaction!