A self-correcting Python code generation agent built with LangGraph. It plans a solution, writes implementation and tests, executes the tests in a sandboxed subprocess, and iterates on failures — up to 5 times — until all tests pass.
This is the core loop behind tools like Devin and Cursor's AI editing: generate → execute → observe → fix.
START
  │
  ▼
plan_solution      Think through the approach and edge cases before coding
  │
  ▼
generate_code      Write implementation + pytest tests in one file
  │
  ▼
execute_and_test   Run pytest in an isolated subprocess (10s timeout)
  │
  ├── all tests pass ─────────────────► write_explanation → END
  │
  ├── tests fail, attempts < 5 ───────► fix_code → execute_and_test (loop)
  │
  └── tests fail, max attempts ───────► write_explanation → END
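The branch out of execute_and_test is the only decision point in the graph. A minimal sketch of that routing logic in plain Python — the state keys `passed` and `attempts` are assumptions for illustration, not this repo's exact schema:

```python
MAX_ATTEMPTS = 5

def route_after_tests(state: dict) -> str:
    """Pick the next node after execute_and_test.

    Assumed state keys: 'passed' (bool) and 'attempts' (int).
    """
    if state["passed"]:
        return "write_explanation"      # all tests green
    if state["attempts"] < MAX_ATTEMPTS:
        return "fix_code"               # loop back and try again
    return "write_explanation"          # give up, explain what happened
```

In LangGraph, a function like this would be wired in via a conditional edge on the execute node.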
- **Sandboxed execution:** generated code runs in a subprocess with a hard timeout and a minimal environment — the parent process is never affected.
- **State tracking:** every code version is saved in `code_history`, so you can trace exactly how the agent debugged its way to the final solution.
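One way to keep that trail is to append each version rather than overwrite it. A sketch with a hypothetical state shape (field names are illustrative, not this repo's exact schema):

```python
from typing import List, TypedDict

class AgentState(TypedDict):
    task: str
    code: str
    attempts: int
    code_history: List[str]  # every generated version, oldest first

def record_version(state: AgentState, new_code: str) -> AgentState:
    # Append instead of overwriting so the full debug trail survives
    return {
        **state,
        "code": new_code,
        "attempts": state["attempts"] + 1,
        "code_history": state["code_history"] + [new_code],
    }
```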
git clone https://github.com/souvikghosh/coding-agent
cd coding-agent
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Add ANTHROPIC_API_KEY to .env

# Single task
python main.py "Implement binary search on a sorted list"
python main.py "Write an LRU cache with get and put in O(1)"
python main.py "Group anagrams from a list of strings"
# Interactive mode — run multiple tasks
python main.py --interactive
# Demo: 3 tasks of increasing difficulty
python examples/demo_tasks.py

═══════════════════════════════════════════════════════════
CODING AGENT
═══════════════════════════════════════════════════════════
Task: Implement binary search on a sorted list
Max fix attempts: 5
[Plan]
• Function takes a sorted list and target value
• Return index if found, -1 if not
• Handle empty list edge case
• Use left/right pointers, check midpoint each iteration
• Test: found, not found, empty, duplicates, single element
[Generate] Code written (42 lines)
[Execute] Running tests (attempt 1)...
[Execute] PASSED
═══════════════════════════════════════════════════════════
RESULT
═══════════════════════════════════════════════════════════
✓ PASSED — first try
Why plan before coding? Planning forces the LLM to think about edge cases before writing. This reduces the average number of fix iterations significantly — similar to how test-driven development reduces bugs.
Why subprocess over exec()? exec() runs in the parent process, so generated code can mutate the agent's globals, read secrets from the environment, or crash the agent itself. A subprocess is fully isolated.
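A minimal sketch of that isolation. To keep the example dependency-free it runs the file directly with the interpreter; the real agent invokes pytest on it instead:

```python
import os
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout: float = 10.0):
    """Execute `code` in a child Python process with a hard timeout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,  # hard kill after `timeout` seconds
            env={},           # minimal environment: no inherited secrets
        )
        return proc.returncode == 0, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    finally:
        os.unlink(path)
```

Whatever the generated code does — infinite loops, sys.exit, unhandled exceptions — the parent process only ever sees a return code and captured output.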
Why include tests in the same file? Self-contained files are easier to debug and iterate on. The LLM can see tests and implementation together when fixing failures, which produces better targeted fixes.
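For the binary-search task above, a generated file might look roughly like this (illustrative, not the agent's actual output):

```python
def binary_search(items, target):
    """Return the index of target in sorted items, or -1 if absent."""
    left, right = 0, len(items) - 1
    while left <= right:
        mid = (left + right) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

# pytest collects these test_* functions from the same file,
# so a failure message points straight at the code above it.
def test_found():
    assert binary_search([1, 3, 5, 7], 5) == 2

def test_not_found():
    assert binary_search([1, 3, 5, 7], 4) == -1

def test_empty():
    assert binary_search([], 1) == -1
```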
- LangGraph — execution graph with conditional fix loop
- LangChain — LLM abstraction (Anthropic Claude / OpenAI GPT-4o)
- pytest — test runner inside the sandbox
- subprocess — safe code execution with timeout