# Context Poisoning: When Errors Get Repeatedly Referenced

Context poisoning occurs when a **hallucination or other error makes it into the context, where it is repeatedly referenced**.

## The Demonstration

This notebook follows a simple pattern to demonstrate context poisoning:

1. ✅ **Baseline Success**: Agent completes a research task successfully
2. ⏪ **Rewind & Inject**: Return to a checkpoint and inject poisoned information into context
3. ❌ **Agent Struggles**: Same agent now fails or wastes effort due to poisoned context
4. 🛡️ **Improved Agent**: Better prompt/architecture that's resilient to poisoning
5. ✅ **Resilient Success**: Improved agent handles poisoned context correctly

## The Problem

LLM agents often maintain context about their goals, progress, and state. When an error makes it into this context, the agent doesn't just use it once; it **keeps referencing it** over and over:

1. **Error enters context**: Agent hallucinates something (e.g., "Research Quantum Dynamics Corp with ticker QDYN")
2. **Error gets stored**: The hallucination is saved in goals, summaries, or state
3. **Repeated reference**: Agent repeatedly checks goals, sees the hallucinated item, and references it
4. **Fixation**: Agent becomes stuck trying to achieve something impossible

As noted in the DeepMind Gemini 2.5 technical report: *"An especially egregious form of this issue can take place with 'context poisoning' – where many parts of the context (goals, summary) are 'poisoned' with misinformation... As a result, the model can become fixated on achieving impossible or irrelevant goals."*
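
The four steps above can be reproduced in a few lines without any LLM. The following is a minimal sketch, not the notebook's agent: `GoalStore`, `ticker_exists`, and `run_naive_turns` are hypothetical names, and the set of known tickers is a stand-in for a real market-data tool. It shows how a single bad entry costs one failed lookup on every turn, because nothing ever removes it from the stored goals.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a market-data tool: only these tickers "exist".
KNOWN_TICKERS = {"AAPL", "GOOGL"}


@dataclass
class GoalStore:
    """Toy persistent context: stored goals survive across agent turns."""
    goals: list = field(default_factory=list)


def ticker_exists(ticker: str) -> bool:
    """Return True only for tickers the toy tool knows about."""
    return ticker in KNOWN_TICKERS


def run_naive_turns(store: GoalStore, turns: int) -> int:
    """Re-read every stored goal on each turn and count failed lookups."""
    wasted = 0
    for _ in range(turns):
        for ticker in store.goals:
            if not ticker_exists(ticker):
                # The goal is never removed, so it is retried next turn.
                wasted += 1
    return wasted


store = GoalStore(goals=["AAPL", "QDYN"])  # QDYN is the poisoned entry
wasted_lookups = run_naive_turns(store, turns=5)
print(wasted_lookups)  # one wasted lookup per turn
```

The error entered the store once, but the cost scales with the number of turns, which is the pattern the rest of this notebook measures.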

## Setup

In [None]:
# Check if dependencies are installed
try:
    import langchain
    import langchain_anthropic
    import langsmith
    DEPENDENCIES_INSTALLED = True
except ImportError as e:
    DEPENDENCIES_INSTALLED = False
    # ImportError.name holds the module that failed to import (may be None)
    missing_module = e.name or "unknown"
    print(f"❌ Missing dependency: {missing_module}")
    print("\n📦 To install dependencies, run one of the following:")
    print("   1. Using uv:  uv sync")
    print("   2. Using pip: pip install langchain langchain-anthropic langsmith langgraph plotly pandas python-dotenv")
    print("\n   Then restart your Jupyter kernel.")
    raise

# Standard library
import os
from typing import List, Dict, Any
from datetime import datetime

# LangChain & LangSmith
from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic
from langsmith import Client, evaluate
from langsmith.schemas import Run, Example

# Visualization
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import pandas as pd
from IPython.display import display, HTML

# Environment
from dotenv import load_dotenv
load_dotenv()

# Agent components
from context_poisoning.tools import (
    all_tools,
    reset_state,
    inject_poisoned_goal,
)
from context_poisoning.instructions import NAIVE_RESEARCH_INSTRUCTIONS, FINANCIAL_RESEARCH_INSTRUCTIONS

# Evaluators and test data
from context_poisoning.tests.evaluators import (
    context_poisoning_evaluator,
    goal_cancellation_evaluator,
    task_completion_evaluator,
)
from context_poisoning.tests.setup_datasets import create_poisoning_dataset
from context_poisoning.resources.test_cases import TEST_CASES

# Agent utilities
from context_poisoning.utils.agent_helpers import (
    run_agent_with_trajectory,
)

# Initialize LangSmith
client = Client()

print("✓ Setup complete")

In [None]:
# Initialize LLM - using Claude Haiku for context poisoning demo
llm = ChatAnthropic(model="claude-haiku-4-5-20251001", temperature=0)

print(f"Using model: {llm.model}")

## The Scenario: Financial Research Agent

Our financial research agent conducts research on companies and stocks. It can:
- Track companies and get stock prices
- Gather company information
- Set and track research goals
- Create research summaries

**The Task**: Research Apple and Google, then create a summary of findings.

**The Poisoned Context**: We'll inject a goal about researching a non-existent company ("Quantum Dynamics Corp", ticker QDYN) midway through the task. This simulates what happens when an agent hallucinates a company name that gets stored in its goals.

In [None]:
# Create naive agent (without context poisoning safeguards)
naive_agent = create_agent(
    model=llm,
    tools=all_tools,
    system_prompt=NAIVE_RESEARCH_INSTRUCTIONS
)

print("✓ Naive financial research agent created (no verification/cancellation safeguards)")

---

## Part 1: ✅ Baseline Success

First, let's run the agent on a clean research task with no poisoned context.

In [None]:
# Reset to clean state
reset_state()

# Create initial research state
from context_poisoning.tools import track_company_helper, add_research_note_helper

track_company_helper("AAPL")
track_company_helper("GOOGL")

print("✓ Clean initial state:")
print("  - Tracking: AAPL, GOOGL")
print("  - No poisoned goals")

# Run agent on the research task
baseline_query = """Research the companies you're tracking. Get their stock prices and company information, then create a research summary."""

print("\n🔬 Running naive agent (clean, no poisoning)...")
print(f"Query: {baseline_query}\n")

baseline_result = run_agent_with_trajectory(naive_agent, baseline_query)

print(f"\n✓ Baseline completed with {len(baseline_result['trajectory'])} tool calls")
print(f"\nBaseline response:")
print(baseline_result['final_response'])

---

## Part 2: ⏪ Rewind & Inject Poisoned Context

Now let's rewind to the same starting point and inject a poisoned goal into the context.

In [None]:
# Reset to the same initial state
reset_state()
track_company_helper("AAPL")
track_company_helper("GOOGL")

# Inject poisoned goal - a hallucinated company that doesn't exist
poisoned_goal = "Research the financial performance of 'Quantum Dynamics Corp' (ticker: QDYN) and analyze its Q4 earnings"
inject_poisoned_goal(poisoned_goal)

print("✓ Rewound to checkpoint")
print("  - Tracking: AAPL, GOOGL")
print(f"  - ⚠️  INJECTED POISONED GOAL: '{poisoned_goal}'")
print("\nThis simulates what happens when an agent hallucinates a company and stores it in goals.")

---

## Part 3: ❌ Naive Agent Struggles with Poisoned Context

Run the same naive agent (without safeguards) with poisoned context.

In [None]:
# Run the same query, but now the agent has a poisoned goal
poisoned_query = """Research the companies you're tracking. Get their stock prices and company information, then create a research summary. 

Also check if you have any active research goals and work on completing them."""

print("🔬 Running naive agent with poisoned context...")
print(f"Query: {poisoned_query}\n")

poisoned_result = run_agent_with_trajectory(naive_agent, poisoned_query)

print(f"\n✓ Naive agent run completed with {len(poisoned_result['trajectory'])} tool calls")
print(f"\nNaive agent response:")
print(poisoned_result['final_response'])

In [None]:
# Analyze the poisoning impact
from context_poisoning.tests.evaluators import count_poisoned_references, detect_impossible_goal_pursuit

baseline_refs = count_poisoned_references(baseline_result, poisoned_goal)
poisoned_refs = count_poisoned_references(poisoned_result, poisoned_goal)
impossible_goal_result = detect_impossible_goal_pursuit(poisoned_result, poisoned_goal)

print("📊 Context Poisoning Analysis:")
print(f"  Baseline references to QDYN: {baseline_refs}")
print(f"  Poisoned references to QDYN: {poisoned_refs}")
print(f"  Attempts to pursue impossible goal: {impossible_goal_result['attempt_count']}")
print(f"  Recognized as impossible: {impossible_goal_result['recognized_as_impossible']}")

print(f"\n⚠️  Impact: Agent made {poisoned_refs} references to the hallucinated company")
print("  Each reference spends tokens and tool calls on a goal that cannot be completed")

print(f"\nTool call comparison:")
print(f"  Baseline: {len(baseline_result['trajectory'])} tool calls")
print(f"  Poisoned: {len(poisoned_result['trajectory'])} tool calls")

---

## Part 4: 🛡️ Improved Agent - Resilient to Poisoning

Let's create an improved version of the agent with better instructions that help it:
1. Validate goals before pursuing them
2. Recognize when information doesn't exist
3. Cancel impossible goals quickly
4. Avoid repeated references to errors

In [None]:
# Improved instructions with validation and error handling
# (FINANCIAL_RESEARCH_INSTRUCTIONS already includes these safeguards)
IMPROVED_INSTRUCTIONS = FINANCIAL_RESEARCH_INSTRUCTIONS + """

## CRITICAL: Context Poisoning Prevention (Enhanced)

You must actively prevent context poisoning by validating information before repeated reference:

1. **Validate Before Pursuing Goals**:
   - Before working on a research goal, verify the ticker/company exists
   - If get_stock_price or get_company_info returns an error, the ticker doesn't exist
   - Immediately cancel goals about non-existent tickers

2. **Recognize Tool Errors Quickly**:
   - If a tool returns "not found", "doesn't exist", or similar errors, treat it as definitive
   - Do NOT retry the same ticker multiple times
   - Update the goal status to "cancelled" with a clear reason

3. **Avoid Repeated References**:
   - Once you determine something doesn't exist, don't mention it again
   - Don't include impossible goals in summaries
   - Focus on achievable goals

4. **Error Handling Pattern**:
   ```
   If tool error indicates ticker/company doesn't exist:
     → update_research_goal(goal_id, "cancelled", reason="Ticker not found")
     → Move on to next goal
     → Do NOT reference this goal again
   ```

Following these rules will prevent you from repeatedly referencing errors and wasting effort on impossible goals.
"""

# Create improved agent
improved_agent = create_agent(
    model=llm,
    tools=all_tools,
    system_prompt=IMPROVED_INSTRUCTIONS
)

print("✓ Improved agent created with validation and error handling")

---

## Part 5: ✅ Improved Agent Handles Poisoning

Test the improved agent with the same poisoned context.

In [None]:
# Reset and inject poisoned context again
reset_state()
track_company_helper("AAPL")
track_company_helper("GOOGL")
inject_poisoned_goal(poisoned_goal)

print("✓ Same poisoned state")
print(f"  - ⚠️  Poisoned goal: '{poisoned_goal}'\n")

# Run improved agent
improved_query = """Research the companies you're tracking. Get their stock prices and company information, then create a research summary. 

Also check if you have any active research goals and work on completing them."""

print("🔬 Running improved agent with poisoned context...")
print(f"Query: {improved_query}\n")

improved_result = run_agent_with_trajectory(improved_agent, improved_query)

print(f"\n✓ Improved run completed with {len(improved_result['trajectory'])} tool calls")
print(f"\nImproved response:")
print(improved_result['final_response'])

---

## Part 6: 📊 Comparison & Analysis

Compare the baseline, poisoned, and improved runs.

In [None]:
# Calculate metrics for all three runs
improved_refs = count_poisoned_references(improved_result, poisoned_goal)
improved_goal_result = detect_impossible_goal_pursuit(improved_result, poisoned_goal)

# Create comparison table
comparison_data = {
    "Metric": [
        "References to QDYN (post-error)",
        "Tool Calls",
        "Recognized Impossible",
        "Goal Cancelled"
    ],
    "Baseline (Clean)": [
        baseline_refs,
        len(baseline_result['trajectory']),
        "N/A",
        "N/A"
    ],
    "Naive Agent (Poisoned)": [
        poisoned_refs,
        len(poisoned_result['trajectory']),
        "Yes" if impossible_goal_result['recognized_as_impossible'] else "No",
        "Unknown"  # Would need to check the actual goal status
    ],
    "Improved Agent (Poisoned)": [
        improved_refs,
        len(improved_result['trajectory']),
        "Yes" if improved_goal_result['recognized_as_impossible'] else "No",
        "Unknown"  # Would need to check the actual goal status
    ]
}

df = pd.DataFrame(comparison_data)
print("📊 Comparison Results:")
print("=" * 90)
print(df.to_string(index=False))
print("=" * 90)

print("\n💡 Key Insights:")
print(f"  • Naive agent (no safeguards) referenced QDYN {poisoned_refs} times after error")
print(f"  • Improved agent (with safeguards) referenced QDYN {improved_refs} times after error")
print(f"  • Reduction: {poisoned_refs - improved_refs} fewer post-error references")
print("  • Lower post-error references = better (stops pursuing impossible goals)")

In [None]:
# Visualize the comparison
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=("Post-Error References to QDYN", "Total Tool Calls"),
    specs=[[{"type": "bar"}, {"type": "bar"}]]
)

# References comparison
runs = ["Baseline<br>(Clean)", "Naive<br>(Poisoned)", "Improved<br>(Poisoned)"]
refs = [baseline_refs, poisoned_refs, improved_refs]
colors = ['green', 'red', 'blue']

fig.add_trace(
    go.Bar(x=runs, y=refs, marker_color=colors, text=refs, textposition='outside'),
    row=1, col=1
)

# Tool calls comparison
tool_calls = [
    len(baseline_result['trajectory']),
    len(poisoned_result['trajectory']),
    len(improved_result['trajectory'])
]

fig.add_trace(
    go.Bar(x=runs, y=tool_calls, marker_color=colors, text=tool_calls, textposition='outside'),
    row=1, col=2
)

fig.update_layout(
    title_text="Context Poisoning Impact & Mitigation",
    showlegend=False,
    height=400
)

fig.update_yaxes(title_text="Count", row=1, col=1)
fig.update_yaxes(title_text="Count", row=1, col=2)

fig.show()

print("\n📈 Visualization shows:")
print("  • GREEN bar = Clean baseline (no poisoned context)")
print("  • RED bar = Naive agent with poisoning (high post-error references = bad)")
print("  • BLUE bar = Improved agent with poisoning (low post-error references = good)")
print("\nThe improved agent should have fewer post-error references than the naive agent.")

---

## Understanding the Problem: Repeated Reference

### What Happened

1. **Error Enters Context**: Agent's goal includes researching "QDYN" (doesn't exist)
2. **Gets Stored**: Goal is saved in the agent's goal tracking system
3. **Repeated Reference**: Agent checks goals → sees QDYN → references it → checks again → sees QDYN → references it...
4. **Wasted Compute**: Each reference wastes tokens and API calls on impossible tasks
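
To make "repeated reference" concrete, one way to quantify it is to count the trajectory steps that mention the poisoned term. The sketch below is an illustration only and is not the implementation of `count_poisoned_references` used earlier in this notebook. The step schema (`tool`, `args`, `output`) and the `get_research_goals` tool name are assumptions made for the example.

```python
def count_term_references(trajectory: list, term: str) -> int:
    """Count trajectory steps whose tool name, arguments, or output mention `term`."""
    needle = term.lower()
    count = 0
    for step in trajectory:
        # Flatten the step into one string so the match covers every field.
        haystack = " ".join(str(value) for value in step.values()).lower()
        if needle in haystack:
            count += 1
    return count


# Hypothetical trajectory; the schema and the get_research_goals name are assumed.
example_trajectory = [
    {"tool": "get_stock_price", "args": {"ticker": "AAPL"}, "output": "ok"},
    {"tool": "get_research_goals", "args": {}, "output": "Research QDYN earnings"},
    {"tool": "get_stock_price", "args": {"ticker": "QDYN"}, "output": "not found"},
    {"tool": "get_stock_price", "args": {"ticker": "QDYN"}, "output": "not found"},
]

qdyn_refs = count_term_references(example_trajectory, "QDYN")
print(qdyn_refs)  # three of the four steps touch the poisoned goal
```

A plain substring match like this is crude (it would also match a longer ticker that contains the term), so a production check would likely want stricter matching.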

### Why It's Problematic

- **Not a one-time error**: The hallucination gets used multiple times
- **Reinforcement**: Each reference makes the error seem more legitimate
- **Persistence**: Error stays in context across many turns
- **Difficult to escape**: Agent needs explicit validation to stop the cycle

### The Solution

The improved agent breaks the cycle by:
1. **Validating early**: Check if ticker exists before pursuing
2. **Recognizing errors**: Treat tool errors as definitive signals
3. **Canceling quickly**: Update goal status to "cancelled" immediately
4. **Avoiding repetition**: Don't reference cancelled goals again
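
The four behaviours above are enforced here through prompt instructions, but the same control flow can be sketched as ordinary code. This is a hedged illustration under stated assumptions: `Goal`, `pursue_goals`, and the `ticker_exists` callback are hypothetical and do not come from the `context_poisoning` package.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Goal:
    goal_id: int
    ticker: str
    status: str = "active"
    reason: str = ""


def pursue_goals(goals: list, ticker_exists: Callable[[str], bool]) -> list:
    """Try each active goal once; cancel it on the first 'not found' signal."""
    log = []
    for goal in goals:
        if goal.status != "active":
            continue  # completed or cancelled goals are never revisited
        if ticker_exists(goal.ticker):
            goal.status = "completed"
            log.append(f"researched {goal.ticker}")
        else:
            # Treat the tool error as definitive: cancel instead of retrying.
            goal.status = "cancelled"
            goal.reason = "Ticker not found"
            log.append(f"cancelled goal {goal.goal_id}")
    return log


known = {"AAPL", "GOOGL"}
goals = [Goal(1, "AAPL"), Goal(2, "QDYN"), Goal(3, "GOOGL")]

first_pass = pursue_goals(goals, lambda t: t in known)
second_pass = pursue_goals(goals, lambda t: t in known)
print(first_pass)
print(second_pass)  # empty: nothing active is left to revisit
```

The second pass does no work at all, which is the property the improved prompt is trying to induce in the LLM agent.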

---

## Key Takeaways

### The Core Problem

**Context poisoning = Repeated reference to errors**

An error (hallucination, wrong data, bad assumption) enters context once, but gets **used many times**:
- Every goal check references it
- Every summary includes it
- Every planning step considers it
- Each reference wastes compute and reinforces the error

### What We Demonstrated

1. ✅ **Baseline**: Agent completes task successfully without errors
2. ⏪ **Rewind + Inject**: Same starting point, but with poisoned goal
3. ❌ **Poisoned Run**: Agent references error multiple times
4. 🛡️ **Improved Agent**: Better validation and error handling
5. ✅ **Resilient Run**: Improved agent minimizes repeated references

### Building Resilient Agents

To prevent repeated reference of errors:

1. **Validate Before Storing**: Check information before adding to goals/context
2. **Recognize Tool Errors**: Treat "not found" errors as definitive
3. **Cancel Quickly**: Update impossible goals to "cancelled" immediately
4. **Avoid Re-reference**: Don't mention cancelled goals again
5. **Make Errors Explicit**: Clear error signals help agents recover faster
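
"Validate Before Storing" moves the check from the point of use to the point of entry, so a bad item never reaches the persistent context. A minimal sketch follows, assuming a validator callback is available; `ValidatedGoalStore` and `add_goal` are hypothetical names rather than part of this notebook's tools.

```python
class ValidatedGoalStore:
    """Goal store that rejects entries failing a validator at write time."""

    def __init__(self, validator):
        self._validator = validator  # callable: ticker -> bool
        self.goals = []
        self.rejected = []

    def add_goal(self, ticker: str, description: str) -> bool:
        """Store the goal only if its ticker validates; record rejections."""
        if not self._validator(ticker):
            # Keep an explicit rejection record instead of silently dropping it.
            self.rejected.append((ticker, "ticker failed validation"))
            return False
        self.goals.append({"ticker": ticker, "description": description})
        return True


known_tickers = {"AAPL", "GOOGL"}  # stand-in for a real existence check
goal_store = ValidatedGoalStore(lambda t: t in known_tickers)

accepted = goal_store.add_goal("AAPL", "Review latest earnings")
refused = goal_store.add_goal("QDYN", "Analyze Q4 earnings")
print(accepted, refused, len(goal_store.goals))
```

Recording the rejection alongside a reason also serves the "Make Errors Explicit" point: the failure is visible without being stored as something to pursue.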

### The Bottom Line

Context poisoning isn't about making one error; it's about **repeatedly using that error**. The key to mitigation is breaking the cycle of repeated reference through validation, quick error recognition, and explicit goal cancellation.

---

## Optional: Full Evaluation with LangSmith

The cells below show how to run systematic evaluations across multiple test cases using LangSmith.

This is useful for:
- Testing multiple poisoning scenarios
- Comparing different agent configurations
- Tracking metrics over time
- Building evaluation datasets
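
As a rough picture of what comparing configurations across test cases involves, the sketch below scores each run by the fraction of steps that mention the poisoned term and averages over runs. It is not one of the evaluators imported in Setup. The `key`/`score` dictionary shape mirrors what LangSmith evaluators commonly return, but how such a function would be wired into `evaluate` is left as an assumption, and the trajectories are invented purely for illustration.

```python
def poisoned_reference_rate(outputs: dict, poisoned_term: str) -> dict:
    """Score one run: fraction of trajectory steps mentioning the poisoned term."""
    trajectory = outputs.get("trajectory", [])
    if not trajectory:
        return {"key": "poisoned_reference_rate", "score": 0.0}
    needle = poisoned_term.lower()
    hits = sum(1 for step in trajectory if needle in str(step).lower())
    return {"key": "poisoned_reference_rate", "score": hits / len(trajectory)}


def mean_score(results: list) -> float:
    """Average the per-run scores; 0.0 when there are no runs."""
    return sum(r["score"] for r in results) / len(results) if results else 0.0


# Invented trajectories (one string per step) for two hypothetical configurations.
naive_runs = [
    {"trajectory": ["price AAPL", "price QDYN", "info QDYN", "price QDYN"]},
    {"trajectory": ["price GOOGL", "price QDYN"]},
]
improved_runs = [
    {"trajectory": ["price AAPL", "price QDYN", "cancel goal", "info AAPL"]},
    {"trajectory": ["price GOOGL", "info GOOGL"]},
]

naive_mean = mean_score([poisoned_reference_rate(r, "QDYN") for r in naive_runs])
improved_mean = mean_score([poisoned_reference_rate(r, "QDYN") for r in improved_runs])
print(naive_mean, improved_mean)
```

A rate rather than a raw count keeps long and short trajectories comparable when averaging across test cases.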

In [None]:
# Load test cases
print(f"Available test cases: {len(TEST_CASES)}\n")

for i, test_case in enumerate(TEST_CASES, 1):
    print(f"{i}. {test_case['name']}")
    print(f"   {test_case['description'][:80]}...")
    print()

In [None]:
# Create dataset in LangSmith (optional - uncomment to run)
# dataset_name = "context-poisoning-evaluation"
# dataset = create_poisoning_dataset(dataset_name, TEST_CASES, client)
# print(f"✓ Created dataset with {len(TEST_CASES)} examples")

# Define evaluators
ALL_EVALUATORS = [
    context_poisoning_evaluator,      # Measures references to poisoned info
    goal_cancellation_evaluator,      # Whether agent cancels impossible goals
    task_completion_evaluator,        # Whether agent completes real tasks
]

print(f"✓ Loaded {len(ALL_EVALUATORS)} evaluators")
print("   - context_poisoning: References, impossible goal pursuit, recovery")
print("   - goal_cancellation: Recognizes and cancels impossible goals")
print("   - task_completion: Completes achievable tasks despite poisoning")

In [None]:
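# Run full evaluation (optional). Requires LangSmith credentials and the
# dataset from the previous cell; uncomment the dataset creation there first.
dataset_name = "context-poisoning-evaluation"  # must match the dataset created above

print(f"\n🔬 Running evaluation on {len(TEST_CASES)} test cases...")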

baseline_experiment = evaluate(
    lambda inputs: run_agent_with_trajectory(naive_agent, inputs["query"]),
    data=dataset_name,
    evaluators=ALL_EVALUATORS,
    experiment_prefix="context-poisoning-baseline",
    metadata={"config": "baseline", "test_cases": len(TEST_CASES)},
)

improved_experiment = evaluate(
    lambda inputs: run_agent_with_trajectory(improved_agent, inputs["query"]),
    data=dataset_name,
    evaluators=ALL_EVALUATORS,
    experiment_prefix="context-poisoning-improved",
    metadata={"config": "improved", "test_cases": len(TEST_CASES)},
)

print("\n✓ Evaluation complete!")
print(f"   View results in LangSmith")

---

## Summary

This notebook demonstrated **context poisoning through repeated reference**:

### What We Showed
1. ✅ Agent succeeds on a clean research task
2. ⏪ Rewound and injected a poisoned goal (non-existent company)
3. ❌ Agent with poisoned context references the error multiple times
4. 🛡️ Improved agent with validation handles poisoned context better
5. 📊 Comparison shows reduced repeated references in improved version

### Key Insight

**Context poisoning = One error, many references**

The problem isn't just making an error; it's repeatedly using that error across multiple turns, wasting compute and reinforcing incorrect information.

### Mitigation Strategy

Break the cycle of repeated reference by:
- Validating information before storing in context
- Recognizing tool errors as definitive signals
- Canceling impossible goals quickly
- Avoiding re-reference of known errors

### Next Steps

To build more resilient agents:
1. Add validation at context entry points (goals, summaries)
2. Make tool error signals more explicit
3. Track repeated failures and auto-cancel
4. Test with LangSmith evaluations to measure improvement
5. Monitor reference counts in production to detect poisoning
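
Step 3, tracking repeated failures and auto-cancelling, can live outside the prompt entirely. The following is a small sketch assuming goals are identified by string IDs; `FailureTracker` and its `max_failures` threshold are hypothetical and would need tuning for a real agent.

```python
from collections import Counter


class FailureTracker:
    """Cancel a goal automatically once it has failed `max_failures` times."""

    def __init__(self, max_failures: int = 2):
        self.max_failures = max_failures
        self.failures = Counter()
        self.cancelled = set()

    def record_failure(self, goal_id: str) -> bool:
        """Record one failure; return True if the goal is now cancelled."""
        if goal_id in self.cancelled:
            return True  # already cancelled, do not keep counting
        self.failures[goal_id] += 1
        if self.failures[goal_id] >= self.max_failures:
            self.cancelled.add(goal_id)
        return goal_id in self.cancelled

    def is_active(self, goal_id: str) -> bool:
        return goal_id not in self.cancelled


tracker = FailureTracker(max_failures=2)
outcomes = [tracker.record_failure("qdyn-earnings") for _ in range(3)]
print(outcomes)
print(tracker.is_active("qdyn-earnings"))
```

Because the cancellation happens in code, it holds even when the model ignores or forgets the prompt-level instruction to stop retrying.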