A 5-stage pipeline for automated Python bug detection combining AST analysis with LLM reasoning.
- Precision: 100% | Recall: 94.3% | F1: 97.1%
- Precision: 100% | Recall: 92% | F1: 95.8%
- Precision: 100% | Recall: 88% | F1: 93.6%
pip install openai --break-system-packages
# Simulation mode (free)
python -m src.main examples/buggy_ml_pipeline.py
# GPT-4 mode
export OPENAI_API_KEY="sk-..."
python -m src.main examples/buggy_ml_pipeline.py --use-llm
# Benchmarks
python -m src.paper_eval # 80-case synthetic
python -m data.run_real_eval --curated-only # Real-world
python -m data.run_real_eval --curated-only --use-llm # Real-world + GPT-4

Source Code → [Parse] → [Detect] → [Slice] → [Review] → [Aggregate] → Findings
              Stage 1   Stage 2    Stage 3   Stage 4    Stage 5
Novel Contributions: Adaptive Slicing, Confidence Calibration (Platt scaling), Cross-Detector Correlation
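
For readers unfamiliar with Platt scaling, here is a minimal sketch of the general technique: fit a sigmoid that maps a detector's raw score to a calibrated probability. This is not this project's implementation. The names `fit_platt` and `calibrate`, the sample scores, and the use of plain gradient descent (without the label smoothing from Platt's original formulation) are all assumptions made for illustration.

```python
import math


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss.

    Hypothetical helper: simplified Platt scaling, no target smoothing.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s  # d(log loss)/da for this sample
            grad_b += (p - y)      # d(log loss)/db for this sample
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw detector score to a probability with the fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Made-up raw scores and ground-truth labels (1 = real bug, 0 = false alarm).
scores = [0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
low = calibrate(0.2, a, b)
high = calibrate(0.9, a, b)
print(f"calibrated: score 0.2 -> {low:.2f}, score 0.9 -> {high:.2f}")
```

In practice the sigmoid would be fitted on a held-out set of labeled findings, so the calibrated confidence reflects how often a detector at that score level is actually right.
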
6 Bug Categories: None-deref, Division-by-zero, Index-OOB, Data Leakage, Shape Mismatch, Resource Leak
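
To give a sense of what an AST-based detector for one of these categories can look like, the sketch below flags division by a literal zero using Python's standard `ast` module. It is only an illustration of the idea: the function name `find_literal_zero_divisions` is hypothetical, and the project's own detectors presumably cover far more than literal divisors.

```python
import ast


def find_literal_zero_divisions(source):
    """Return sorted line numbers where /, // or % has a literal 0 divisor.

    Hypothetical helper: it only catches constant divisors, not variables
    that may be zero at runtime.
    """
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, (ast.Div, ast.FloorDiv, ast.Mod))
            and isinstance(node.right, ast.Constant)
            and node.right.value == 0
        ):
            hits.append(node.lineno)
    return sorted(hits)


sample = "a = 1\nb = a / 0\nc = a // 2\nd = a % 0\n"
flagged = find_literal_zero_divisions(sample)
print(flagged)
```

A purely syntactic check like this misses divisors that only become zero at runtime, which is the kind of case where the slicing and LLM review stages would be expected to help.
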
39 unit tests, 92 benchmark cases, 25 Python files, ~8500 LOC