# LangChain vs Pydantic-AI Framework Comparison

## Overview

This notebook compares the LangChain and Pydantic-AI frameworks on the same symptom-diagnosis classification task.

**Both frameworks**:
- Use prompt engineering (no training/optimization)
- Support zero-shot and few-shot approaches
- Work with Ollama LLMs
- Log execution traces to MLflow

**Key differences**:
- **LangChain**: Template-based prompts with manual validation
- **Pydantic-AI**: Agent-based with built-in Pydantic validation and automatic retries

## Evaluation Criteria

1. **Accuracy**: Performance on the test split (the only split evaluated in this notebook)
2. **API Simplicity**: Code complexity and developer experience
3. **Type Safety**: IDE support and runtime validation
4. **Error Handling**: Robustness to invalid outputs

In [None]:
"""Import required modules for framework comparison."""
import pandas as pd
from symptom_diagnosis_explorer.commands.classify.evaluate import (
    EvaluateCommand,
    EvaluateRequest,
)
from symptom_diagnosis_explorer.models.model_development import FrameworkType

In [None]:
import nest_asyncio

# Apply nest_asyncio to allow nested event loops
# This fixes "RuntimeError: This event loop is already running" errors
# when using Pydantic-AI agents in Jupyter notebooks
nest_asyncio.apply()

In [None]:
"""Shared configuration for both frameworks."""

# Experiment configuration
PROJECT = "9-langchain"
MLFLOW_TRACKING_URI = "http://localhost:5001"

# Model configuration (same for both)
LM_MODEL = "ollama/qwen3:30b"

# Dataset configuration (same for both)
TEST_SIZE = 10

print("Configuration set")
print(f"LLM: {LM_MODEL}")
print(f"Test examples: {TEST_SIZE}")

In [None]:
"""Run LangChain experiment."""

print("=" * 80)
print("LANGCHAIN EXPERIMENT")
print("=" * 80)

# Evaluate on test
langchain_eval_request = EvaluateRequest(
    framework=FrameworkType.LANGCHAIN,
    model_name="comparison-langchain",
    split="test",
    eval_size=TEST_SIZE,
    experiment_name=f"/symptom-diagnosis-explorer/{PROJECT}/comparison-langchain",
    experiment_project=PROJECT,
    mlflow_tracking_uri=MLFLOW_TRACKING_URI,
)
langchain_eval_command = EvaluateCommand()
langchain_eval = langchain_eval_command.execute(langchain_eval_request)

print("\nLangChain Results:")
print(f"  Test Accuracy:       {langchain_eval.accuracy:.4f}")

In [None]:
"""Run Pydantic-AI experiment."""

print("\n" + "=" * 80)
print("PYDANTIC-AI EXPERIMENT")
print("=" * 80)

# Evaluate on test
pydantic_ai_eval_request = EvaluateRequest(
    framework=FrameworkType.PYDANTIC_AI,
    model_name="comparison-pydantic-ai",
    split="test",
    eval_size=TEST_SIZE,
    experiment_name=f"/symptom-diagnosis-explorer/{PROJECT}/comparison-pydantic-ai",
    experiment_project=PROJECT,
    mlflow_tracking_uri=MLFLOW_TRACKING_URI,
)
pydantic_ai_eval_command = EvaluateCommand()
pydantic_ai_eval = pydantic_ai_eval_command.execute(pydantic_ai_eval_request)

print("\nPydantic-AI Results:")
print(f"  Test Accuracy:       {pydantic_ai_eval.accuracy:.4f}")

In [None]:
"""Compare framework results."""

print("\n" + "=" * 80)
print("FRAMEWORK COMPARISON")
print("=" * 80)

# Create comparison DataFrame
comparison_df = pd.DataFrame({
    "Framework": ["LangChain", "Pydantic-AI"],
    "Test Accuracy": [
        langchain_eval.accuracy,
        pydantic_ai_eval.accuracy,
    ],
})

print("\nPerformance Comparison:")
print(comparison_df.to_string(index=False))

print("\n" + "-" * 80)
print("Accuracy Differences:")
test_diff = pydantic_ai_eval.accuracy - langchain_eval.accuracy

print(f"  Test:       Pydantic-AI is {test_diff:+.4f} vs LangChain")

## Analysis

### Performance
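Both frameworks use the same underlying LLM and similar prompting strategies, so accuracy differences are typically small. With only `TEST_SIZE = 10` test examples, a single prediction shifts accuracy by 0.10, so small gaps should not be over-interpreted. Differences may stem from: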
- Prompt template differences
- Output parsing/validation approaches
- Retry behavior on invalid outputs
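
To judge whether an accuracy gap is meaningful on a small test split, a confidence interval can help. The following is a minimal sketch using the Wilson score interval with only the standard library. The function name `wilson_interval` and the example counts are assumptions for illustration, not part of the project code or results from this notebook.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return an approximate 95% Wilson score interval for an accuracy estimate."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts for illustration only; not results from this notebook.
low, high = wilson_interval(correct=8, total=10)
print(f"8/10 correct -> 95% interval of roughly [{low:.2f}, {high:.2f}]")
```

If the intervals for the two frameworks overlap heavily, the observed difference is likely within sampling noise, and a larger `TEST_SIZE` would be needed to separate them.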

### Developer Experience

**LangChain**:
- ✅ Mature ecosystem with many integrations
- ✅ Extensive documentation and community support
- ❌ Manual output validation and parsing
- ❌ Template strings can be harder to maintain
- ❌ Limited type safety
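
To make "manual output validation and parsing" concrete, here is a minimal sketch of the checks a template-based pipeline typically has to perform by hand on a raw model response. It uses only the standard library. The function `parse_diagnosis`, the JSON field name `diagnosis`, and the label set are hypothetical and are not taken from the project code or from LangChain itself.

```python
import json

# Hypothetical label set for illustration; the real task defines its own diagnoses.
ALLOWED_LABELS = {"allergy", "migraine", "pneumonia"}


def parse_diagnosis(raw_output: str) -> str:
    """Parse a raw LLM response and return a validated diagnosis label."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response is not valid JSON: {raw_output!r}") from exc

    if not isinstance(payload, dict):
        raise ValueError("Response must be a JSON object")
    label = payload.get("diagnosis")
    if not isinstance(label, str):
        raise ValueError("Response is missing a string 'diagnosis' field")

    # Normalize before comparing so casing and whitespace do not cause failures.
    label = label.strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"Unknown diagnosis label: {label!r}")
    return label


print(parse_diagnosis('{"diagnosis": "Migraine"}'))
```

Each check here is something the developer must remember to write and keep in sync with the prompt, which is the maintenance cost this list refers to.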

**Pydantic-AI**:
- ✅ Full type safety with Pydantic models
- ✅ Built-in validation, with `ModelRetry` to ask the model to try again on invalid output
- ✅ Cleaner system prompts via decorators
- ✅ Better IDE autocomplete and type checking
- ✅ More reliable structured output (native mode)
- ❌ Newer framework with smaller ecosystem
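
The retry-on-invalid-output idea can be sketched without any framework. The example below is a plain-Python illustration of the general pattern (validate, feed the error back, ask again), not Pydantic-AI's actual API. The function `classify_with_retries`, the `ask_model` callable, and the label set are all hypothetical.

```python
from typing import Callable

# Hypothetical label set for illustration only.
ALLOWED_LABELS = {"allergy", "migraine", "pneumonia"}


def classify_with_retries(
    ask_model: Callable[[str], str],
    prompt: str,
    max_retries: int = 2,
) -> str:
    """Call the model, re-prompting with feedback whenever the label is invalid."""
    current_prompt = prompt
    for _ in range(max_retries + 1):
        label = ask_model(current_prompt).strip().lower()
        if label in ALLOWED_LABELS:
            return label
        # Feed the validation error back to the model before the next attempt.
        current_prompt = (
            f"{prompt}\n\nYour previous answer {label!r} was invalid. "
            f"Answer with one of: {sorted(ALLOWED_LABELS)}."
        )
    raise ValueError(f"No valid label after {max_retries + 1} attempts")


# Stand-in for a real LLM call: fails once, then returns a valid label.
responses = iter(["probably a cold?", "Allergy"])
print(classify_with_retries(lambda _prompt: next(responses), "Symptoms: sneezing"))
```

A framework with built-in retries handles this loop for you, which is one reason the two frameworks can differ slightly in accuracy even with the same model and prompt.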

### When to Use Each

**Choose LangChain if**:
- You need extensive integrations (vector stores, agents, tools)
- You prefer a mature, battle-tested framework
- You're already familiar with LangChain patterns

**Choose Pydantic-AI if**:
- Type safety and validation are critical
- You want simpler, more maintainable code
- You value modern Python patterns (decorators, type hints)
- Structured output reliability is important

### Conclusion

Both frameworks can handle prompt-based classification. Pydantic-AI offers stronger type safety and built-in validation, while LangChain provides a more mature ecosystem. The right choice depends on which of these matters more for your project.