A self-correcting Python code generation agent built with LangGraph. It plans a solution, writes implementation and tests, executes the tests in a sandboxed subprocess, and iterates on failures — up to 5 times — until all tests pass.
This is the core loop behind tools like Devin and Cursor's AI editing: generate → execute → observe → fix.
START
  │
  ▼
plan_solution      Think through the approach and edge cases before coding
  │
  ▼
generate_code      Write implementation + pytest tests in one file
  │
  ▼
execute_and_test   Run pytest in an isolated subprocess (10s timeout)
  │
  ├── all tests pass ─────────────────► write_explanation → END
  │
  ├── tests fail, attempts < 5 ───────► fix_code → execute_and_test (loop)
  │
  └── tests fail, max attempts ───────► write_explanation → END
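The branch out of execute_and_test is the only decision point in the graph. A minimal sketch of that routing logic in plain Python — the state keys `passed` and `attempts` are assumptions for illustration, not this repo's exact schema:

```python
MAX_ATTEMPTS = 5

def route_after_tests(state: dict) -> str:
    """Pick the next node after execute_and_test.

    Assumed state keys: 'passed' (bool) and 'attempts' (int).
    """
    if state["passed"]:
        return "write_explanation"      # all tests green
    if state["attempts"] < MAX_ATTEMPTS:
        return "fix_code"               # loop back and try again
    return "write_explanation"          # give up, explain what happened
```

In LangGraph, a function like this would be wired in via a conditional edge on the execute node.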
- **Sandboxed execution:** generated code runs in a subprocess with a hard timeout and a minimal environment — the parent process is never affected.
- **State tracking:** every code version is saved in `code_history`, so you can trace exactly how the agent debugged its way to the final solution.
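One way to keep that trail is to append each version rather than overwrite it. A sketch with a hypothetical state shape (field names are illustrative, not this repo's exact schema):

```python
from typing import List, TypedDict

class AgentState(TypedDict):
    task: str
    code: str
    attempts: int
    code_history: List[str]  # every generated version, oldest first

def record_version(state: AgentState, new_code: str) -> AgentState:
    # Append instead of overwriting so the full debug trail survives
    return {
        **state,
        "code": new_code,
        "attempts": state["attempts"] + 1,
        "code_history": state["code_history"] + [new_code],
    }
```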
git clone https://github.com/souvikghosh/coding-agent
cd coding-agent
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Add ANTHROPIC_API_KEY to .env

# Single task
python main.py "Implement binary search on a sorted list"
python main.py "Write an LRU cache with get and put in O(1)"
python main.py "Group anagrams from a list of strings"
# Interactive mode — run multiple tasks
python main.py --interactive
# Demo: 3 tasks of increasing difficulty
python examples/demo_tasks.py

═══════════════════════════════════════════════════════════
CODING AGENT
═══════════════════════════════════════════════════════════
Task: Implement binary search on a sorted list
Max fix attempts: 5
[Plan]
• Function takes a sorted list and target value
• Return index if found, -1 if not
• Handle empty list edge case
• Use left/right pointers, check midpoint each iteration
• Test: found, not found, empty, duplicates, single element
[Generate] Code written (42 lines)
[Execute] Running tests (attempt 1)...
[Execute] PASSED
═══════════════════════════════════════════════════════════
RESULT
═══════════════════════════════════════════════════════════
✓ PASSED — first try
Why plan before coding? Planning forces the LLM to think about edge cases before writing. This reduces the average number of fix iterations significantly — similar to how test-driven development reduces bugs.
Why subprocess over exec()? exec() runs in the parent process, so generated code can mutate the agent's globals, read secrets from the environment, or crash the agent itself. A subprocess is fully isolated.
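A minimal sketch of that isolation. To keep the example dependency-free it runs the file directly with the interpreter; the real agent invokes pytest on it instead:

```python
import os
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout: float = 10.0):
    """Execute `code` in a child Python process with a hard timeout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,  # hard kill after `timeout` seconds
            env={},           # minimal environment: no inherited secrets
        )
        return proc.returncode == 0, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    finally:
        os.unlink(path)
```

Whatever the generated code does — infinite loops, sys.exit, unhandled exceptions — the parent process only ever sees a return code and captured output.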
Why include tests in the same file? Self-contained files are easier to debug and iterate on. The LLM can see tests and implementation together when fixing failures, which produces better targeted fixes.
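For the binary-search task above, a generated file might look roughly like this (illustrative, not the agent's actual output):

```python
def binary_search(items, target):
    """Return the index of target in sorted items, or -1 if absent."""
    left, right = 0, len(items) - 1
    while left <= right:
        mid = (left + right) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

# pytest collects these test_* functions from the same file,
# so a failure message points straight at the code above it.
def test_found():
    assert binary_search([1, 3, 5, 7], 5) == 2

def test_not_found():
    assert binary_search([1, 3, 5, 7], 4) == -1

def test_empty():
    assert binary_search([], 1) == -1
```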
- LangGraph — execution graph with conditional fix loop
- LangChain — LLM abstraction (Anthropic Claude / OpenAI GPT-4o)
- pytest — test runner inside the sandbox
- subprocess — safe code execution with timeout