
Coding Agent

A self-correcting Python code generation agent built with LangGraph. It plans a solution, writes the implementation and tests, executes the tests in a sandboxed subprocess, and iterates on failures (up to 5 fix attempts) until all tests pass.

This is the core loop behind tools like Devin and Cursor's AI editing: generate → execute → observe → fix.


How it works

START
  │
  ▼
plan_solution          Think through the approach and edge cases before coding
  │
  ▼
generate_code          Write implementation + pytest tests in one file
  │
  ▼
execute_and_test       Run pytest in an isolated subprocess (10s timeout)
  │
  ├── all tests pass ─────────────────────► write_explanation → END
  │
  ├── tests fail, attempts < 5 ───────────► fix_code → execute_and_test (loop)
  │
  └── tests fail, max attempts ───────────► write_explanation → END
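
The conditional branch after execute_and_test boils down to one routing decision. A minimal sketch in plain Python (the state keys tests_passed and attempts, and the function name, are assumptions, not the repo's actual schema):

```python
# Sketch of the fix-loop routing decision (hypothetical state keys).
MAX_ATTEMPTS = 5

def route_after_tests(state: dict) -> str:
    """Pick the next node from test results and the attempt count."""
    if state["tests_passed"]:
        return "write_explanation"   # all tests pass -> explain and end
    if state["attempts"] < MAX_ATTEMPTS:
        return "fix_code"            # retry: fix the code, re-run tests
    return "write_explanation"       # max attempts reached -> report and end
```

In LangGraph this function would be registered with add_conditional_edges so the graph loops back to execute_and_test after each fix.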

Sandboxed execution: generated code runs in a subprocess with a hard timeout and a minimal environment — the parent process is never affected.
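
The sandbox amounts to subprocess.run with a hard timeout and an environment allow-list. A minimal sketch (the function name and allow-list are assumptions; the real agent invokes pytest on the generated file rather than executing it directly):

```python
import os
import subprocess
import sys

# Allow-list: only these variables reach the child, so no inherited secrets.
SAFE_VARS = ("PATH", "LANG", "LD_LIBRARY_PATH", "SYSTEMROOT")

def run_sandboxed(path: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run a generated Python file in an isolated child process."""
    env = {k: os.environ[k] for k in SAFE_VARS if k in os.environ}
    return subprocess.run(
        [sys.executable, path],
        capture_output=True,   # stdout/stderr feed the fix step
        text=True,
        timeout=timeout,       # hard cap: a hung child is killed
        env=env,               # minimal environment
    )
```

If the child hangs, subprocess.run raises TimeoutExpired and kills it; the parent process is never blocked or mutated.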

State tracking: every code version is saved in code_history, so you can trace exactly how the agent debugged to the final solution.
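
The history mechanism can be sketched as an append-only list inside the graph state (field and function names here are assumptions for illustration):

```python
from typing import TypedDict

class AgentState(TypedDict):
    """Sketch of the graph state; field names are hypothetical."""
    task: str
    code: str
    code_history: list[str]   # every version, oldest first
    attempts: int

def record_version(state: AgentState, new_code: str) -> AgentState:
    """Store each new version so every fix step stays traceable."""
    return {
        **state,
        "code": new_code,
        "code_history": state["code_history"] + [new_code],
    }
```

After a run, code_history[0] is the first generation and code_history[-1] is the version that finally passed (or the last failed attempt).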


Setup

git clone https://github.com/souvikghosh/coding-agent
cd coding-agent
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Add ANTHROPIC_API_KEY to .env

Usage

# Single task
python main.py "Implement binary search on a sorted list"
python main.py "Write an LRU cache with get and put in O(1)"
python main.py "Group anagrams from a list of strings"

# Interactive mode — run multiple tasks
python main.py --interactive

# Demo: 3 tasks of increasing difficulty
python examples/demo_tasks.py

Example output

═══════════════════════════════════════════════════════════
  CODING AGENT
═══════════════════════════════════════════════════════════
  Task: Implement binary search on a sorted list
  Max fix attempts: 5

[Plan]
  • Function takes a sorted list and target value
  • Return index if found, -1 if not
  • Handle empty list edge case
  • Use left/right pointers, check midpoint each iteration
  • Test: found, not found, empty, duplicates, single element

[Generate] Code written (42 lines)
[Execute] Running tests (attempt 1)...
[Execute] PASSED

═══════════════════════════════════════════════════════════
  RESULT
═══════════════════════════════════════════════════════════
  ✓ PASSED — first try

Key design decisions

Why plan before coding? Planning forces the LLM to enumerate edge cases before it writes any code, which cuts down the number of fix iterations needed, much as test-driven development catches bugs early.

Why subprocess over exec()? exec() runs generated code in the parent process, where it can mutate the agent's globals, read secrets from the environment, or crash the agent outright. A subprocess with a minimal environment and a hard timeout keeps the agent isolated from whatever the generated code does.
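
The isolation difference is easy to demonstrate:

```python
import subprocess
import sys

counter = {"value": 0}

# With exec(), generated code shares the parent's objects and can mutate them.
exec("counter['value'] = 999", {"counter": counter})
assert counter["value"] == 999   # parent state was changed in place

# With subprocess, the same assignment runs in a separate interpreter:
counter["value"] = 0
subprocess.run(
    [sys.executable, "-c", "counter = {'value': 999}"],
    check=True,
)
assert counter["value"] == 0     # parent process untouched
```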

Why include tests in the same file? Self-contained files are easier to debug and iterate on. The LLM can see tests and implementation together when fixing failures, which produces better targeted fixes.
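
A generated file in this style looks roughly like the following (an illustrative sketch, not the agent's actual output):

```python
# Implementation and pytest tests in one self-contained file.

def binary_search(items: list[int], target: int) -> int:
    """Return the index of target in a sorted list, or -1 if absent."""
    left, right = 0, len(items) - 1
    while left <= right:
        mid = (left + right) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

def test_found():
    assert binary_search([1, 3, 5, 7], 5) == 2

def test_not_found():
    assert binary_search([1, 3, 5, 7], 4) == -1

def test_empty():
    assert binary_search([], 1) == -1
```

When a test fails, the LLM receives this whole file plus the pytest traceback, so the fix prompt contains both the contract (tests) and the code that broke it.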


Tech stack

  • LangGraph — execution graph with conditional fix loop
  • LangChain — LLM abstraction (Anthropic Claude / OpenAI GPT-4o)
  • pytest — test runner inside the sandbox
  • subprocess — safe code execution with timeout
