AST-Guided LLM Code Review Framework v2

A five-stage pipeline for automated Python bug detection that combines AST analysis with LLM reasoning.

Results

80-Case Synthetic Benchmark (Simulation Mode)

Precision: 100% | Recall: 94.3% | F1: 97.1%

12-Case Curated Real-World (Simulation Mode)

Precision: 100% | Recall: 92% | F1: 95.8%

90-Case Real-World (GPT-4)

Precision: 100% | Recall: 88% | F1: 93.6%
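
The F1 figures above are the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R). A quick check using only the numbers listed in this section; the helper name `f1_score` is illustrative and not taken from the repository:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the Results section.
reported = {
    "80-case synthetic": (1.00, 0.943),
    "12-case curated": (1.00, 0.92),
    "90-case GPT-4": (1.00, 0.88),
}

# F1 as a percentage, rounded to one decimal place as in the README.
f1_percent = {
    name: round(100 * f1_score(p, r), 1) for name, (p, r) in reported.items()
}

for name, value in f1_percent.items():
    print(f"{name}: F1 = {value}%")
```

With precision fixed at 100%, F1 depends on recall alone, so the gap between benchmarks comes entirely from missed bugs rather than false alarms.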

Quick Start

pip install openai --break-system-packages

# Simulation mode (free)
python -m src.main examples/buggy_ml_pipeline.py

# GPT-4 mode
export OPENAI_API_KEY="sk-..."
python -m src.main examples/buggy_ml_pipeline.py --use-llm

# Benchmarks
python -m src.paper_eval                               # 80-case synthetic
python -m data.run_real_eval --curated-only             # Curated real-world (simulation)
python -m data.run_real_eval --curated-only --use-llm   # Curated real-world + GPT-4

Architecture

Source Code → [Parse] → [Detect] → [Slice] → [Review] → [Aggregate] → Findings
              Stage 1    Stage 2    Stage 3    Stage 4     Stage 5
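
To make the data flow concrete, here is a minimal sketch of the five stages wired together for a single bug category (division by zero). Everything in it is an assumption for illustration: the function names, the `Finding` dataclass, the fixed-size context window, and the string heuristic standing in for the LLM reviewer are hypothetical and do not reflect the code in `src`.

```python
import ast
from dataclasses import dataclass


@dataclass
class Finding:
    category: str
    line: int
    snippet: str
    confidence: float


def parse(source: str) -> ast.Module:
    """Stage 1: build the AST."""
    return ast.parse(source)


def detect(tree: ast.Module) -> list[ast.BinOp]:
    """Stage 2: flag divisions whose divisor is not a non-zero literal."""
    candidates = []
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Div, ast.FloorDiv)):
            divisor = node.right
            if isinstance(divisor, ast.Constant) and divisor.value != 0:
                continue  # dividing by a non-zero literal is safe
            candidates.append(node)
    return candidates


def slice_context(source: str, node: ast.BinOp, radius: int = 2) -> str:
    """Stage 3: keep a few lines around the candidate (fixed window, not adaptive)."""
    lines = source.splitlines()
    start = max(node.lineno - 1 - radius, 0)
    end = min(node.lineno + radius, len(lines))
    return "\n".join(lines[start:end])


def review(snippet: str) -> float:
    """Stage 4: placeholder for the LLM call; returns a confidence score."""
    # Hypothetical heuristic: a nearby zero check makes a real bug less likely.
    return 0.3 if ("== 0" in snippet or "!= 0" in snippet) else 0.9


def aggregate(findings: list[Finding], threshold: float = 0.5) -> list[Finding]:
    """Stage 5: drop low-confidence findings and order the rest by line."""
    kept = [f for f in findings if f.confidence >= threshold]
    return sorted(kept, key=lambda f: f.line)


def run_pipeline(source: str) -> list[Finding]:
    tree = parse(source)
    findings = []
    for node in detect(tree):
        snippet = slice_context(source, node)
        findings.append(Finding("division-by-zero", node.lineno, snippet, review(snippet)))
    return aggregate(findings)


example = """def mean(values):
    return sum(values) / len(values)

def safe_ratio(a, b):
    if b == 0:
        return 0.0
    return a / b
"""

findings = run_pipeline(example)
for f in findings:
    print(f"line {f.line}: {f.category} (confidence {f.confidence})")
```

In this sketch the unguarded division in `mean` survives aggregation, while the guarded one in `safe_ratio` is filtered out because the reviewer stand-in sees the zero check in its slice.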

Novel Contributions: Adaptive Slicing, Confidence Calibration (Platt scaling), Cross-Detector Correlation
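
Platt scaling maps a raw score to a probability by fitting a sigmoid to scores with known labels. The sketch below shows the general technique only: it fits p = sigmoid(a * score + b) by plain gradient descent on log loss and omits the label smoothing from Platt's original formulation. The function names and the toy scores and labels are made up for illustration and are not the framework's calibration code or data.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on mean log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score: float, a: float, b: float) -> float:
    """Map a raw detector score to a calibrated probability."""
    return sigmoid(a * score + b)


# Hypothetical held-out data: raw reviewer scores and whether the finding was a real bug.
raw_scores = [0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
is_real_bug = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_platt(raw_scores, is_real_bug)
calibrated = [calibrate(s, a, b) for s in raw_scores]
for s, p in zip(raw_scores, calibrated):
    print(f"raw {s:.1f} -> calibrated {p:.3f}")
```

Because the fitted map is monotonic when `a` is positive, calibration preserves the ranking of findings and only changes how the scores should be read as probabilities.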

6 Bug Categories: None-deref, Division-by-zero, Index-OOB, Data Leakage, Shape Mismatch, Resource Leak
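
As an illustration of what an AST-level detector for one of these categories might look like, the following sketch flags `open()` calls that are not used directly as a `with` context manager, a common source of resource leaks. It is a simplified assumption, not the repository's detector: the function name is hypothetical, and it would also flag files that are later closed by hand.

```python
import ast


def find_unmanaged_open_calls(source: str) -> list[int]:
    """Return line numbers of open() calls not used as a `with` context expression."""
    tree = ast.parse(source)

    # Collect the expressions that appear directly in `with ... as ...:` items.
    managed = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.With, ast.AsyncWith)):
            for item in node.items:
                managed.add(id(item.context_expr))

    # Any other call to the builtin name `open` is a leak candidate.
    flagged = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
            and id(node) not in managed
        ):
            flagged.append(node.lineno)
    return sorted(flagged)


sample = (
    "f = open('a.txt')\n"
    "data = f.read()\n"
    "with open('b.txt') as g:\n"
    "    more = g.read()\n"
)

leak_lines = find_unmanaged_open_calls(sample)
print(leak_lines)
```

A purely syntactic check like this over-reports, which is the kind of candidate a later review stage would be expected to confirm or dismiss.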

39 unit tests, 92 benchmark cases, 25 Python files, ~8500 LOC

Repository contents: bugsinpy_data, data, examples, knowledge_base, src, tests, README.md, fails.json, pyproject.toml, requirements.txt, run_benchmark.py, run_bugsinpy.py, run_real_eval.py
