# Attribution Detection in RAG Systems

**Lesson 13 - Interactive Notebook**

This notebook demonstrates how to detect whether LLM responses properly attribute claims to source documents.

## Learning Objectives
- Extract atomic claims from LLM responses
- Verify claims against retrieved context
- Measure attribution rate across test cases
- Identify attribution failures

## Execution Time & Cost
- **DEMO Mode**: ~2 minutes, $0.05 (20 test cases)
- **FULL Mode**: ~7 minutes, $2.00-3.00 (200 test cases)

**⚠️ COST WARNING**: Running FULL mode will use OpenAI API credits.

In [1]:
# Setup and imports
import json
import sys
from pathlib import Path

# Add parent directory to path for imports
sys.path.insert(0, str(Path.cwd().parent))

from datetime import datetime

from backend.rag_generation_eval import AttributionDetector

print("✅ Imports successful")

✅ Imports successful


In [3]:
# Configuration: DEMO or FULL mode
MODE = "DEMO"  # Change to "FULL" for comprehensive evaluation

# Fail fast on typos such as "demo", which would otherwise silently select FULL mode
if MODE not in ("DEMO", "FULL"):
    raise ValueError(f"MODE must be 'DEMO' or 'FULL', got {MODE!r}")

N_SAMPLES = 20 if MODE == "DEMO" else 200
ESTIMATED_COST = "$0.05" if MODE == "DEMO" else "$2.00-3.00"
ESTIMATED_TIME = "2 minutes" if MODE == "DEMO" else "7 minutes"

print(f"🔧 Mode: {MODE}")
print(f"📊 Samples: {N_SAMPLES}")
print(f"💰 Estimated Cost: {ESTIMATED_COST}")
print(f"⏱️  Estimated Time: {ESTIMATED_TIME}")

🔧 Mode: DEMO
📊 Samples: 20
💰 Estimated Cost: $0.05
⏱️  Estimated Time: 2 minutes


## Step 1: Load RAG Evaluation Test Suite

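We'll use the 500-case test suite stored at `lesson-13/data/rag_evaluation_suite.json`. The cell below loads it from `data/rag_evaluation_suite.json`, relative to the notebook's working directory.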

In [4]:
# Load test suite
test_suite_path = Path("data/rag_evaluation_suite.json")

if not test_suite_path.exists():
    raise FileNotFoundError(f"Test suite not found at {test_suite_path}")

with open(test_suite_path, "r", encoding="utf-8") as f:
    test_suite = json.load(f)

all_test_cases = test_suite["test_cases"]

# Sample based on mode
import random

random.seed(42)  # Reproducibility
test_cases = random.sample(all_test_cases, min(N_SAMPLES, len(all_test_cases)))

print(f"✅ Loaded {len(all_test_cases)} test cases from suite")
print(f"📋 Selected {len(test_cases)} test cases for {MODE} mode")
print("\n📊 Test Suite Statistics:")
for key, value in test_suite["statistics"].items():
    print(f"   {key}: {value}")

✅ Loaded 500 test cases from suite
📋 Selected 20 test cases for DEMO mode

📊 Test Suite Statistics:
   total_cases: 500
   gita_samples: 200
   recipe_samples: 200
   adversarial_samples: 100
   attribution_pass: 400
   attribution_fail: 100
   hallucination_none: 400
   hallucination_intrinsic: 40
   hallucination_extrinsic: 60
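
The statistics above can be sanity-checked before sampling. The sketch below is a minimal example, assuming the keys group into three partitions of the suite (by source, by attribution label, by hallucination type); that grouping is inferred from the key names, not documented by the suite. It also computes the majority-class baseline for the attribution label.

```python
# Statistics copied from the Step 1 output above.
statistics = {
    "total_cases": 500,
    "gita_samples": 200,
    "recipe_samples": 200,
    "adversarial_samples": 100,
    "attribution_pass": 400,
    "attribution_fail": 100,
    "hallucination_none": 400,
    "hallucination_intrinsic": 40,
    "hallucination_extrinsic": 60,
}

# Assumed grouping (inferred from key names): each group should sum to total_cases.
PARTITIONS = {
    "source": ["gita_samples", "recipe_samples", "adversarial_samples"],
    "attribution": ["attribution_pass", "attribution_fail"],
    "hallucination": [
        "hallucination_none",
        "hallucination_intrinsic",
        "hallucination_extrinsic",
    ],
}


def check_partitions(stats, partitions=PARTITIONS):
    """Return, per group, whether its counts add up to the suite total."""
    total = stats["total_cases"]
    return {
        name: sum(stats[key] for key in keys) == total
        for name, keys in partitions.items()
    }


def majority_baseline(stats):
    """Accuracy of always predicting the more common attribution label."""
    larger = max(stats["attribution_pass"], stats["attribution_fail"])
    return larger / stats["total_cases"]


print(check_partitions(statistics))
print(f"Majority-class baseline: {majority_baseline(statistics):.0%}")
```

Since 400 of the 500 cases are labelled as attributed, always predicting "attributed" would already score 80% on the full suite. That is worth keeping in mind when reading the 80% accuracy target in Step 5.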


## Step 2: Initialize Attribution Detector

The `AttributionDetector` extracts claims from responses and verifies them against context.

In [5]:
# Initialize detector
detector = AttributionDetector()

print("✅ AttributionDetector initialized")
print("\nDetector Methods:")
print("  - extract_claims(): Extract atomic claims from response")
print("  - verify_attribution(): Check claims against context")
print("  - calculate_attribution_rate(): Calculate overall attribution rate")

✅ AttributionDetector initialized

Detector Methods:
  - extract_claims(): Extract atomic claims from response
  - verify_attribution(): Check claims against context
  - calculate_attribution_rate(): Calculate overall attribution rate
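
To make the three methods concrete, here is a minimal stand-in written as plain functions. It is not the code in `backend.rag_generation_eval`: the sentence-splitting rule, the case-insensitive substring check, and the signature of `calculate_attribution_rate` are all assumptions chosen to mirror the behaviour visible in this notebook's outputs.

```python
import re


def extract_claims(response: str) -> list[str]:
    """Split a response into sentence-level claims (naive split after . ! or ?)."""
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    # Drop trailing punctuation and skip empty fragments.
    return [s.rstrip(".!?").strip() for s in sentences if s.strip(".!? ")]


def verify_attribution(claims: list[str], context: list[str]) -> dict:
    """Mark a claim as attributed only if it occurs verbatim in a context passage."""
    passages = [passage.lower() for passage in context]
    scores = [any(claim.lower() in p for p in passages) for claim in claims]
    return {"attribution_scores": scores}


def calculate_attribution_rate(scores: list[bool]) -> float:
    """Fraction of attributed claims; 0.0 when there are no claims."""
    return sum(scores) / len(scores) if scores else 0.0


context = ["The Bhagavad Gita is a Hindu scripture."]
claims = extract_claims("The Bhagavad Gita is a Hindu scripture. It was written in 300 BC.")
scores = verify_attribution(claims, context)["attribution_scores"]
print(claims)
print(scores, calculate_attribution_rate(scores))
```

With this stand-in, the first claim is found verbatim in the context and the second is not, so the attribution rate for the toy answer is 0.5.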


## Step 3: Example - Single Case Attribution

Let's see how attribution detection works on a single example.

In [6]:
# Example: GOOD attribution (from Gita dataset)
example_case = {
    "query": "What is the main teaching of the Bhagavad Gita?",
    "context": ["The Bhagavad Gita teaches dharma (duty), karma (action), and moksha (liberation)."],
    "answer": "The main teaching is dharma, karma, and moksha."
}

print("📖 Example Case: GOOD Attribution\n")
print(f"Query: {example_case['query']}")
print(f"\nContext: {example_case['context'][0]}")
print(f"\nAnswer: {example_case['answer']}")

# Extract claims
claims = detector.extract_claims(example_case['answer'])
print(f"\n✅ Extracted Claims: {claims}")

# Verify attribution
result = detector.verify_attribution(claims, example_case['context'])
print(f"\n📊 Attribution Scores: {result['attribution_scores']}")

# Calculate rate
rate = sum(result['attribution_scores']) / len(result['attribution_scores']) if result['attribution_scores'] else 0.0
print(f"📈 Attribution Rate: {rate * 100:.1f}%")

📖 Example Case: GOOD Attribution

Query: What is the main teaching of the Bhagavad Gita?

Context: The Bhagavad Gita teaches dharma (duty), karma (action), and moksha (liberation).

Answer: The main teaching is dharma, karma, and moksha.

✅ Extracted Claims: ['The main teaching is dharma, karma, and moksha']

📊 Attribution Scores: [False]
📈 Attribution Rate: 0.0%

Note: the detector scores this claim as unattributed (`[False]`, 0.0%) even though the context supports it. The claim paraphrases the context rather than repeating it word for word, which is consistent with the substring-matching limitation listed under Key Insights in the Summary.


In [7]:
# Example: BAD attribution (unattributed claim)
bad_example = {
    "query": "What is the Bhagavad Gita?",
    "context": ["The Bhagavad Gita is a Hindu scripture."],
    "answer": "The Bhagavad Gita is a Hindu scripture written in 300 BC by sage Vyasa in the Himalayas."
}

print("⚠️  Example Case: BAD Attribution (Hallucination)\n")
print(f"Query: {bad_example['query']}")
print(f"\nContext: {bad_example['context'][0]}")
print(f"\nAnswer: {bad_example['answer']}")

# Extract claims
claims = detector.extract_claims(bad_example['answer'])
print(f"\n✅ Extracted Claims: {claims}")

# Verify attribution
result = detector.verify_attribution(claims, bad_example['context'])
print(f"\n📊 Attribution Scores: {result['attribution_scores']}")
print("\n❌ Issues: '300 BC', 'Vyasa', 'Himalayas' are NOT in context!")

# Calculate rate
rate = sum(result['attribution_scores']) / len(result['attribution_scores']) if result['attribution_scores'] else 0.0
print(f"📈 Attribution Rate: {rate * 100:.1f}%")

⚠️  Example Case: BAD Attribution (Hallucination)

Query: What is the Bhagavad Gita?

Context: The Bhagavad Gita is a Hindu scripture.

Answer: The Bhagavad Gita is a Hindu scripture written in 300 BC by sage Vyasa in the Himalayas.

✅ Extracted Claims: ['The Bhagavad Gita is a Hindu scripture written in 300 BC by sage Vyasa in the Himalayas']

📊 Attribution Scores: [False]

❌ Issues: '300 BC', 'Vyasa', 'Himalayas' are NOT in context!
📈 Attribution Rate: 0.0%
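
The issues line in the cell above is hardcoded. As a rough sketch, the unsupported terms could instead be derived by comparing tokens. The helper `unsupported_terms` below is hypothetical and not part of `AttributionDetector`; it does plain lowercase token matching, so it also reports function words such as "in" and "by".

```python
import re


def unsupported_terms(answer: str, context: list[str]) -> list[str]:
    """Return answer tokens (in order, de-duplicated) that never occur in the context."""
    context_tokens = set(re.findall(r"[a-z0-9]+", " ".join(context).lower()))
    seen = set()
    missing = []
    for token in re.findall(r"[a-z0-9]+", answer.lower()):
        if token not in context_tokens and token not in seen:
            seen.add(token)
            missing.append(token)
    return missing


bad_example = {
    "context": ["The Bhagavad Gita is a Hindu scripture."],
    "answer": "The Bhagavad Gita is a Hindu scripture written in 300 BC by sage Vyasa in the Himalayas.",
}
missing = unsupported_terms(bad_example["answer"], bad_example["context"])
print(missing)
# ['written', 'in', '300', 'bc', 'by', 'sage', 'vyasa', 'himalayas']
```

The date, the author, and the location all show up in the result, which matches the issues listed by hand.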


## Step 4: Batch Evaluation on Test Suite

Now we'll evaluate attribution across all test cases in our sample.
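
One caveat about the sample: Step 1 draws the DEMO cases uniformly at random, so a smaller group such as the adversarial cases (100 of 500) can be represented by very few cases. The sketch below shows one possible alternative, proportional stratified sampling by `source`. The function `stratified_sample` and the toy case IDs are illustrative assumptions; the source names and the 200/200/100 split are taken from this notebook's outputs.

```python
import random
from collections import defaultdict


def stratified_sample(cases, n_samples, key="source", seed=42):
    """Sample about n_samples cases, keeping each group's share of the suite."""
    rng = random.Random(seed)  # local RNG, leaves the global random state untouched
    groups = defaultdict(list)
    for case in cases:
        groups[case.get(key)].append(case)

    sample = []
    for members in groups.values():
        # At least one case per group, otherwise proportional to group size.
        quota = max(1, round(n_samples * len(members) / len(cases)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Toy suite mirroring the 200/200/100 source split reported in Step 1.
toy_cases = (
    [{"id": f"gita_{i}", "source": "bhagavad_gita"} for i in range(200)]
    + [{"id": f"recipe_{i}", "source": "recipes"} for i in range(200)]
    + [{"id": f"adv_{i}", "source": "adversarial"} for i in range(100)]
)
sample = stratified_sample(toy_cases, 20)
per_source = {
    source: sum(1 for case in sample if case["source"] == source)
    for source in ("bhagavad_gita", "recipes", "adversarial")
}
print(per_source)
```

For a 20-case sample this yields 8 Gita, 8 recipe, and 4 adversarial cases. Because each quota is rounded independently, the total can differ slightly from `n_samples` for other group sizes.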

In [8]:
# Batch evaluation
from tqdm import tqdm

results = []
errors = 0

print(f"🔄 Evaluating attribution for {len(test_cases)} test cases...\n")

for test_case in tqdm(test_cases, desc="Attribution Detection"):
    try:
        # Extract fields
        answer = test_case.get("answer", "")
        context = test_case.get("context", [])
        ground_truth = test_case.get("labels", {}).get("is_attributed", None)
        
        # Evaluate attribution
        claims = detector.extract_claims(answer)
        attribution_result = detector.verify_attribution(claims, context)
        
        # Calculate attribution rate for this case
        attribution_scores = attribution_result["attribution_scores"]
        attribution_rate = (
            sum(attribution_scores) / len(attribution_scores)
            if attribution_scores
            else 0.0
        )
        
        # Classify as PASS/FAIL (>0.7 = attributed)
        is_attributed = attribution_rate > 0.7
        
        # Store result
        results.append({
            "id": test_case.get("id"),
            "source": test_case.get("source"),
            "query": test_case.get("query"),
            "answer": answer,
            "claims": claims,
            "attribution_rate": attribution_rate,
            "is_attributed": is_attributed,
            "ground_truth": ground_truth,
            "correct_classification": is_attributed == ground_truth if ground_truth is not None else None,
        })
        
    except Exception as e:
        errors += 1
        print(f"⚠️  Error on test case {test_case.get('id')}: {e}")

print("\n✅ Evaluation complete!")
print(f"   Total cases: {len(results)}")
print(f"   Errors: {errors}")

🔄 Evaluating attribution for 20 test cases...

Attribution Detection: 100%|██████████| 20/20 [00:00<00:00, 60393.15it/s]

✅ Evaluation complete!
   Total cases: 20
   Errors: 0
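
The PASS/FAIL rule in the batch loop has two edge cases that are easy to miss: a response with no extracted claims gets a rate of 0.0, and a rate of exactly 0.7 does not pass because the comparison is strict. The helper below restates the same rule in isolation. The name `classify_case` is ours, introduced only for illustration.

```python
ATTRIBUTION_THRESHOLD = 0.7  # same cut-off as the batch loop above


def classify_case(attribution_scores, threshold=ATTRIBUTION_THRESHOLD):
    """Return (attribution_rate, is_attributed) for one test case."""
    rate = (
        sum(attribution_scores) / len(attribution_scores)
        if attribution_scores
        else 0.0  # no claims extracted: treated as unattributed
    )
    return rate, rate > threshold  # strict comparison, so 0.7 itself fails


examples = {
    "3 of 4 claims supported": [True, True, True, False],
    "7 of 10 claims supported": [True] * 7 + [False] * 3,
    "no claims extracted": [],
}
for label, scores in examples.items():
    rate, is_attributed = classify_case(scores)
    print(f"{label}: rate={rate:.2f}, attributed={is_attributed}")
```

With a strict threshold of 0.7, a two-claim answer passes only if both claims are supported, so a single unmatched sentence is enough to fail a short response.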





## Step 5: Analyze Results

Let's compute metrics and analyze performance.

In [9]:
# Calculate metrics
total_cases = len(results)
attributed_cases = sum(1 for r in results if r["is_attributed"])
unattributed_cases = total_cases - attributed_cases

# Overall attribution rate
overall_attribution_rate = sum(r["attribution_rate"] for r in results) / total_cases if total_cases > 0 else 0.0

# Accuracy (if ground truth available)
cases_with_ground_truth = [r for r in results if r["correct_classification"] is not None]
correct_classifications = sum(1 for r in cases_with_ground_truth if r["correct_classification"])
accuracy = (
    correct_classifications / len(cases_with_ground_truth)
    if cases_with_ground_truth
    else 0.0
)

# By source (Gita vs recipes vs adversarial)
source_breakdown = {}
for result in results:
    source = result["source"]
    if source not in source_breakdown:
        source_breakdown[source] = {"total": 0, "attributed": 0}
    source_breakdown[source]["total"] += 1
    if result["is_attributed"]:
        source_breakdown[source]["attributed"] += 1

# Print metrics
print("📊 ATTRIBUTION DETECTION RESULTS")
print("=" * 50)
print("\n📈 Overall Metrics:")
print(f"   Total test cases: {total_cases}")
# Guard the percentages so an empty result list does not raise ZeroDivisionError
attributed_pct = attributed_cases / total_cases * 100 if total_cases else 0.0
unattributed_pct = unattributed_cases / total_cases * 100 if total_cases else 0.0
print(f"   Attributed: {attributed_cases} ({attributed_pct:.1f}%)")
print(f"   Unattributed: {unattributed_cases} ({unattributed_pct:.1f}%)")
print(f"   Avg Attribution Rate: {overall_attribution_rate * 100:.1f}%")
print(f"   Detection Accuracy: {accuracy * 100:.1f}% (vs ground truth)")

print("\n📂 Breakdown by Source:")
for source, stats in source_breakdown.items():
    rate = stats["attributed"] / stats["total"] * 100 if stats["total"] > 0 else 0
    print(f"   {source}: {stats['attributed']}/{stats['total']} ({rate:.1f}%)")

# Success criteria check
print("\n✅ SUCCESS CRITERIA:")
print("   Target: 80%+ attribution detection accuracy")
print(f"   Achieved: {accuracy * 100:.1f}%")
if accuracy >= 0.80:
    print("   ✅ PASS: Meets success criteria!")
else:
    print("   ⚠️  NEEDS IMPROVEMENT: Below 80% threshold")

📊 ATTRIBUTION DETECTION RESULTS
==================================================

📈 Overall Metrics:
   Total test cases: 20
   Attributed: 12 (60.0%)
   Unattributed: 8 (40.0%)
   Avg Attribution Rate: 60.0%
   Detection Accuracy: 65.0% (vs ground truth)

📂 Breakdown by Source:
   recipes: 0/7 (0.0%)
   bhagavad_gita: 12/12 (100.0%)
   adversarial: 0/1 (0.0%)

✅ SUCCESS CRITERIA:
   Target: 80%+ attribution detection accuracy
   Achieved: 65.0%
   ⚠️  NEEDS IMPROVEMENT: Below 80% threshold


## Step 6: Inspect Failure Cases

Let's examine cases where attribution detection failed.

In [10]:
# Find failure cases (incorrect classification vs ground truth)
failures = [r for r in results if r["correct_classification"] is False]

print(f"❌ Attribution Detection Failures: {len(failures)}\n")

# Show top 5 failures
for i, failure in enumerate(failures[:5]):
    print(f"\n{'=' * 60}")
    print(f"Failure #{i + 1}: {failure['id']}")
    print(f"Source: {failure['source']}")
    print(f"\nQuery: {failure['query']}")
    print(f"\nAnswer: {failure['answer'][:200]}..." if len(failure['answer']) > 200 else f"\nAnswer: {failure['answer']}")
    print(f"\nClaims: {failure['claims']}")
    print(f"\nAttribution Rate: {failure['attribution_rate'] * 100:.1f}%")
    print(f"Predicted: {'ATTRIBUTED' if failure['is_attributed'] else 'UNATTRIBUTED'}")
    print(f"Ground Truth: {'ATTRIBUTED' if failure['ground_truth'] else 'UNATTRIBUTED'}")

if not failures:
    print("✅ No failures detected! Perfect attribution detection.")

❌ Attribution Detection Failures: 7


============================================================
Failure #1: recipe_127
Source: recipes

Query: How do I make this dish?

Answer: To make this dish, you'll need: acorn squash, fresh ground pepper, nutmeg. Follow the recipe instructions.

Claims: ["To make this dish, you'll need: acorn squash, fresh ground pepper, nutmeg", 'Follow the recipe instructions']

Attribution Rate: 0.0%
Predicted: UNATTRIBUTED
Ground Truth: ATTRIBUTED

============================================================
Failure #2: recipe_179
Source: recipes

Query: What is the recipe for this dish?

Answer: To make this dish, you'll need: whole turkey, extra virgin olive oil, salt. Follow the recipe instructions.

Claims: ["To make this dish, you'll need: whole turkey, extra virgin olive oil, salt", 'Follow the recipe instructions']

Attribution Rate: 0.0%
Predicted: UNATTRIBUTED
Ground Truth: ATTRIBUTED

============================================================
Failure #3: recipe_177
Source: recipes

Query: What is the recipe for this dish?

Answer: To make this dish, you'll need: ground beef, bacon, cheese. Follow the recipe instructions.
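
The first two failures above are both false negatives: the detector predicts UNATTRIBUTED while the ground truth is ATTRIBUTED. Accuracy alone hides this direction, so a small tally over the `results` records can help. The sketch below uses the `is_attributed` and `ground_truth` keys from Step 4. The function name and the `toy_*` records are made up for illustration, and the two recipe records carry only the fields needed here.

```python
def error_breakdown(results):
    """Count confusion-matrix cells over results that have a ground-truth label."""
    counts = {
        "true_positive": 0,
        "true_negative": 0,
        "false_positive": 0,
        "false_negative": 0,
    }
    for record in results:
        truth = record.get("ground_truth")
        if truth is None:
            continue  # unlabelled cases cannot be scored
        predicted = record["is_attributed"]
        if predicted and truth:
            counts["true_positive"] += 1
        elif predicted and not truth:
            counts["false_positive"] += 1
        elif not predicted and truth:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    return counts


toy_results = [
    {"id": "recipe_127", "is_attributed": False, "ground_truth": True},
    {"id": "recipe_179", "is_attributed": False, "ground_truth": True},
    {"id": "toy_1", "is_attributed": True, "ground_truth": True},   # hypothetical
    {"id": "toy_2", "is_attributed": True, "ground_truth": False},  # hypothetical
    {"id": "toy_3", "is_attributed": False, "ground_truth": None},  # hypothetical, unlabelled
]
breakdown = error_breakdown(toy_results)
print(breakdown)
```

A high false-negative count points at a verification step that is too strict, while a high false-positive count would suggest it is too lenient.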



## Step 7: Export Results for Dashboard

Save results to JSON for integration with evaluation dashboard.

In [None]:
# Prepare export data
export_data = {
    "metadata": {
        "lesson": "Lesson 13 - RAG Generation & Attribution",
        "notebook": "attribution_detection.ipynb",
        "mode": MODE,
        "timestamp": datetime.now().isoformat(),
        "test_cases": len(results),
    },
    "metrics": {
        "overall_attribution_rate": overall_attribution_rate,
        "detection_accuracy": accuracy,
        "attributed_cases": attributed_cases,
        "unattributed_cases": unattributed_cases,
        "source_breakdown": source_breakdown,
    },
    "results": results,
}

# Save to results directory
output_path = Path("results/attribution_results.json")
output_path.parent.mkdir(parents=True, exist_ok=True)

with open(output_path, "w", encoding="utf-8") as f:
    json.dump(export_data, f, indent=2, ensure_ascii=False)

print(f"✅ Results exported to {output_path}")
print(f"📊 File size: {output_path.stat().st_size / 1024:.1f} KB")
print("\n🎯 Ready for dashboard integration!")
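
On the consuming side, a dashboard would read the file back and check that the expected sections are present. The following is a minimal sketch under stated assumptions: `load_attribution_results` is a hypothetical helper, and the round trip uses a temporary file with a reduced payload rather than the real export.

```python
import json
import tempfile
from pathlib import Path

# Top-level sections written by the export cell above.
REQUIRED_KEYS = {"metadata", "metrics", "results"}


def load_attribution_results(path):
    """Load an exported results file and check its top-level structure."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Export is missing top-level keys: {sorted(missing)}")
    return data


# Round trip with a reduced payload in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "attribution_results.json"
    payload = {
        "metadata": {"mode": "DEMO"},
        "metrics": {"detection_accuracy": 0.65},
        "results": [],
    }
    demo_path.write_text(json.dumps(payload), encoding="utf-8")
    loaded = load_attribution_results(demo_path)

print(loaded["metadata"]["mode"], loaded["metrics"]["detection_accuracy"])
```

Checking the structure at load time gives a clear error if the export format changes, instead of a `KeyError` deep inside the dashboard code.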

## Summary

**What we learned:**
1. How to extract atomic claims from LLM responses
2. How to verify claims against retrieved context
3. How to measure attribution rate across test cases
4. How to identify attribution failures and hallucinations

**Key Insights:**
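- Attribution detection accuracy depends on claim extraction quality
- Simple substring matching only works when a claim repeats the context word for word
- Semantic similarity is needed for paraphrased claims, such as the Step 3 example that scored 0.0% although its context supports it
- Gita Q&A had the higher attribution rate in the DEMO run (12/12 classified as attributed), likely because the answers are structured
- Recipe responses had the lower attribution rate (0/7 in the DEMO run), possibly because the LLM adds instructions; the recipe failures shown in Step 6 are labelled ATTRIBUTED in the ground truth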
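
To illustrate the gap between exact matching and paraphrase, here is a token-overlap score applied to the two Step 3 examples. Token overlap is a cheap lexical stand-in, not semantic similarity, and nothing here reflects how `AttributionDetector` works; the stopword list and function names are assumptions for this sketch.

```python
import re

# Small illustrative stopword list (an assumption, not taken from any library).
STOPWORDS = frozenset({"a", "an", "and", "by", "in", "is", "of", "the", "to"})


def content_tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def overlap_score(claim: str, context: list[str]) -> float:
    """Fraction of the claim's content tokens that also appear in the context."""
    claim_tokens = content_tokens(claim)
    if not claim_tokens:
        return 0.0
    context_tokens = content_tokens(" ".join(context))
    return len(claim_tokens & context_tokens) / len(claim_tokens)


good = overlap_score(
    "The main teaching is dharma, karma, and moksha",
    ["The Bhagavad Gita teaches dharma (duty), karma (action), and moksha (liberation)."],
)
bad = overlap_score(
    "The Bhagavad Gita is a Hindu scripture written in 300 BC by sage Vyasa in the Himalayas",
    ["The Bhagavad Gita is a Hindu scripture."],
)
print(f"good example: {good:.2f}, bad example: {bad:.2f}")
# good example: 0.60, bad example: 0.40
```

The two examples now separate (0.60 versus 0.40) instead of both scoring 0.0, but neither clears the 0.7 cut-off from Step 4, so any threshold for a softer score would need its own tuning. An embedding-based similarity would also treat "teaching" and "teaches" as related, which plain token overlap does not.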

**Next Steps:**
- Try [Context Utilization Notebook](./context_utilization.ipynb)
- Read [Hallucination Detection Tutorial](./hallucination_detection_rag.md)
- Explore [End-to-End RAG Evaluation](./end_to_end_rag_eval.md)