# LangSmith Trace Performance Analysis

This notebook analyzes LangSmith trace exports to understand:
1. **Latency Distribution**: p50/p95/p99 metrics, outlier detection
2. **Bottleneck Identification**: Which nodes consume the most time
3. **Parallel Execution Verification**: Do validators run in parallel?

**Input**: JSON export file from `export_langsmith_traces.py`  
**Output**: CSV files with analysis results

## 1. Setup and Imports

In [None]:
import sys
from pathlib import Path

# Add the parent directory to the import path so analyze_traces can be imported
sys.path.append("..")

from analyze_traces import (
    load_from_json,
    analyze_latency_distribution,
    identify_bottlenecks,
    verify_parallel_execution,
)

## 2. Load Trace Data

Load the exported JSON file. Update the file path below to point to your export.
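
A wrong path otherwise only shows up as an error raised inside `load_from_json`. As a minimal sketch, a hypothetical helper `resolve_export_path` (not part of `analyze_traces`) can validate the path up front and fail with a clearer message. The `.json` suffix check is an assumption about how exports are named.

```python
from pathlib import Path


def resolve_export_path(path_str: str) -> Path:
    """Hypothetical helper: return the absolute export path or fail early."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Export file not found: {path}")
    # Assumes exports from export_langsmith_traces.py use a .json suffix
    if path.suffix.lower() != ".json":
        raise ValueError(f"Expected a .json export, got: {path.name}")
    return path
```

Usage: `dataset = load_from_json(str(resolve_export_path(export_file)))`. Converting back to `str` avoids assuming that `load_from_json` also accepts a `Path`.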

In [None]:
# Update this path to your export file
export_file = "../sample_traces_export.json"

print(f"Loading trace data from: {export_file}")
dataset = load_from_json(export_file)

print("\nLoaded:")
print(f"  - Workflows: {len(dataset.workflows)}")
print(f"  - Orphan traces: {len(dataset.orphan_traces)}")
print(f"  - Hierarchical data: {dataset.is_hierarchical}")

if dataset.workflows:
    print("\nSample workflow:")
    sample = dataset.workflows[0]
    print(f"  - Root: {sample.root_trace.name}")
    print(f"  - Duration: {sample.total_duration/60:.1f} minutes")
    print(f"  - Nodes: {list(sample.nodes.keys())}")

## 3. Latency Distribution Analysis

Calculate p50/p95/p99 latency percentiles and flag outlier workflows that fall outside the 7-23 minute range.
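
The percentile method used by `analyze_latency_distribution` is not shown in this notebook. The following is a minimal sketch under three assumptions: durations are already in minutes, percentiles use linear interpolation between the closest ranks (the same rule as NumPy's default for `numpy.percentile`), and the 7 and 23 minute bounds are inclusive. `percentile` and `summarize_durations` are hypothetical names.

```python
import math


def percentile(sorted_values: list[float], pct: float) -> float:
    """Linear-interpolation percentile over already-sorted values."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    rank = (len(sorted_values) - 1) * pct / 100
    lower, upper = math.floor(rank), math.ceil(rank)
    frac = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def summarize_durations(durations_min: list[float], low: float = 7.0, high: float = 23.0) -> dict:
    """Hypothetical summary: percentiles plus outliers outside [low, high] minutes."""
    values = sorted(durations_min)
    within = [d for d in values if low <= d <= high]
    return {
        "p50": percentile(values, 50),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
        "outliers_above": [d for d in values if d > high],
        "outliers_below": [d for d in values if d < low],
        "percent_within": 100 * len(within) / len(values),
    }
```

For example, `summarize_durations([5, 10, 15, 20, 25])` gives a p50 of 15 minutes, 60% of workflows within the range, and one outlier on each side. With only a handful of workflows, p95 and p99 are interpolated from the top two values, so treat them with caution.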

In [None]:
latency_dist = analyze_latency_distribution(dataset.workflows)

print("Latency Distribution Results:")
print(f"  p50 (median): {latency_dist.p50_minutes:.1f} minutes")
print(f"  p95: {latency_dist.p95_minutes:.1f} minutes")
print(f"  p99: {latency_dist.p99_minutes:.1f} minutes")
print(f"  Range: {latency_dist.min_minutes:.1f} - {latency_dist.max_minutes:.1f} minutes")
print(f"  Mean ± StdDev: {latency_dist.mean_minutes:.1f} ± {latency_dist.std_dev_minutes:.1f} minutes")
print("\nOutliers:")
print(f"  Above 23 min: {len(latency_dist.outliers_above_23min)} workflows")
print(f"  Below 7 min: {len(latency_dist.outliers_below_7min)} workflows")
print("\nClaim Validation:")
print(f"  % within 7-23 min range: {latency_dist.percent_within_7_23_claim:.1f}%")

## 4. Bottleneck Identification

Identify which nodes consume the most time across workflows.
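
This notebook does not show how `identify_bottlenecks` aggregates node timings. The sketch below makes two assumptions. First, each workflow is reduced to a hypothetical `(total_seconds, {node_name: seconds})` pair holding one duration per node. Second, nodes are ranked by their average share of workflow time.

```python
from collections import defaultdict


def rank_nodes(workflows: list[tuple[float, dict[str, float]]]) -> list[dict]:
    """Hypothetical aggregation: average duration and share of workflow time per node."""
    durations: dict[str, list[float]] = defaultdict(list)
    shares: dict[str, list[float]] = defaultdict(list)
    for total_seconds, nodes in workflows:
        for name, seconds in nodes.items():
            durations[name].append(seconds)
            # Percentage of this workflow's wall-clock time spent in the node
            shares[name].append(100 * seconds / total_seconds if total_seconds > 0 else 0.0)
    rows = [
        {
            "node_name": name,
            "execution_count": len(durations[name]),
            "avg_duration_seconds": sum(durations[name]) / len(durations[name]),
            "avg_percent_of_workflow": sum(shares[name]) / len(shares[name]),
        }
        for name in durations
    ]
    # Largest average share first, so rows[0] is the primary bottleneck
    rows.sort(key=lambda row: row["avg_percent_of_workflow"], reverse=True)
    return rows
```

Validators may overlap in time, so per-node shares can add up to more than 100% of a workflow. Read them as relative weights rather than as a partition of the total.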

In [None]:
bottleneck_analysis = identify_bottlenecks(dataset.workflows)

print("Bottleneck Analysis Results:")
print(f"  Primary bottleneck: {bottleneck_analysis.primary_bottleneck}")
print(f"  Top 3 bottlenecks: {', '.join(bottleneck_analysis.top_3_bottlenecks)}")
print("\nNode Performance Details:")
# Column widths (30 + 12 + 15 + 15 plus separators) match the 75-character rule
print(f"{'Node Name':<30} {'Exec Count':>12} {'Avg Duration':>15} {'% of Workflow':>15}")
print("-" * 75)

for node in bottleneck_analysis.node_performances[:10]:  # Top 10
    print(
        f"{node.node_name:<30} {node.execution_count:>12} "
        f"{node.avg_duration_seconds:>14.1f}s {node.avg_percent_of_workflow:>14.1f}%"
    )

## 5. Parallel Execution Verification

Check whether the validator nodes execute in parallel and estimate the time saved compared with running them sequentially.
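
The following sketch shows one way to decide whether validators overlapped. It assumes hypothetical `(start, end)` timestamps for each validator run in a workflow. The runs count as parallel when the latest start precedes the earliest end, meaning all runs share a common moment. Savings are estimated as the summed durations minus the observed wall-clock span. `verify_parallel_execution` may use a different criterion, such as a start-time-delta threshold.

```python
from datetime import datetime, timedelta


def analyze_validator_timing(spans: list[tuple[datetime, datetime]]) -> dict:
    """Hypothetical per-workflow check on validator (start, end) timestamps."""
    starts = [start for start, _ in spans]
    ends = [end for _, end in spans]
    # Time that would be spent if the validators ran one after another
    sequential_seconds = sum((end - start).total_seconds() for start, end in spans)
    # Observed wall-clock span from the first start to the last end
    wall_clock_seconds = (max(ends) - min(starts)).total_seconds()
    return {
        "is_parallel": max(starts) < min(ends),
        "start_time_delta_seconds": (max(starts) - min(starts)).total_seconds(),
        "sequential_time_seconds": sequential_seconds,
        "parallel_time_seconds": wall_clock_seconds,
        "time_savings_seconds": sequential_seconds - wall_clock_seconds,
    }


# Illustrative timestamps only: two validators started 2 s apart
t0 = datetime(2024, 1, 1, 12, 0, 0)
example = analyze_validator_timing([
    (t0, t0 + timedelta(seconds=60)),
    (t0 + timedelta(seconds=2), t0 + timedelta(seconds=50)),
])
print(example["time_savings_seconds"])  # 48.0
```

Here, 108 s of validator work finishes within 60 s of wall-clock time. Back-to-back runs whose start and end times merely touch are classified as sequential and yield zero savings.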

In [None]:
parallel_evidence = verify_parallel_execution(dataset.workflows)

print("Parallel Execution Verification Results:")
print(f"  Verdict: {'PARALLEL' if parallel_evidence.is_parallel else 'SEQUENTIAL'}")
print(f"  Confidence: {parallel_evidence.confidence.upper()}")
print("\nWorkflow Counts:")
print(f"  Parallel workflows: {parallel_evidence.parallel_confirmed_count}")
print(f"  Sequential workflows: {parallel_evidence.sequential_count}")
print("\nTiming Metrics:")
print(f"  Avg start time delta: {parallel_evidence.avg_start_time_delta_seconds:.1f}s")
print(f"  Avg sequential time: {parallel_evidence.avg_sequential_time_seconds:.1f}s")
print(f"  Avg parallel time: {parallel_evidence.avg_parallel_time_seconds:.1f}s")
print(f"  Avg time savings: {parallel_evidence.avg_time_savings_seconds:.1f}s ({parallel_evidence.avg_time_savings_seconds/60:.1f} min)")

## 6. Export Results to CSV

Save analysis results to CSV files for further analysis or reporting.
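
The export cell relies on each result object providing a `to_csv()` method that returns CSV text, but that method's implementation is not shown. As a minimal sketch, a hypothetical dataclass stand-in (`LatencySummary`, not the real `analyze_traces` type) could build the string with the standard `csv` module.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class LatencySummary:
    """Hypothetical stand-in for a result type that serializes itself to CSV."""

    p50_minutes: float
    p95_minutes: float
    p99_minutes: float

    def to_csv(self) -> str:
        buf = io.StringIO()
        # Use "\n" so text-mode writes do not produce "\r\r\n" on Windows
        writer = csv.DictWriter(
            buf, fieldnames=[f.name for f in fields(self)], lineterminator="\n"
        )
        writer.writeheader()
        writer.writerow(asdict(self))
        return buf.getvalue()
```

Because the string is later written through a text-mode file handle, platform newline translation happens once, at write time. This is why the sketch uses `"\n"` rather than the `csv` default of `"\r\n"`.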

In [None]:
output_dir = Path("../output")
output_dir.mkdir(exist_ok=True)

# Each analysis result serializes itself to CSV text via to_csv()
exports = [
    ("latency_distribution.csv", latency_dist, "latency distribution"),
    ("bottleneck_analysis.csv", bottleneck_analysis, "bottleneck analysis"),
    ("parallel_execution_analysis.csv", parallel_evidence, "parallel execution analysis"),
]

for filename, result, label in exports:
    csv_path = output_dir / filename
    csv_path.write_text(result.to_csv(), encoding="utf-8")
    print(f"✓ Exported {label} to: {csv_path}")

print(f"\n✓ All results exported to: {output_dir.absolute()}")

## Summary

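Performance analysis complete. The three CSV files (`latency_distribution.csv`, `bottleneck_analysis.csv`, `parallel_execution_analysis.csv`) are written to `../output`, the `output/` directory one level above this notebook.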