AI Coding Benchmarks

Standardized test prompts for evaluating AI coding assistants. Reproducible, transparent, open.

By RunAICode.ai — making AI tool reviews verifiable.

Why This Exists

Most AI coding tool reviews are subjective. We're changing that. Every benchmark in this repo is a concrete, reproducible task that any AI tool can attempt, with clear pass/fail criteria.

Benchmark Categories

  • Code Generation — Write functions from descriptions
  • Refactoring — Improve existing code structure
  • Debugging — Find and fix bugs in provided code
  • Multi-file — Changes that span multiple files
  • Testing — Generate meaningful test suites
  • DevOps — Infrastructure and automation tasks

How We Score

Each benchmark includes four components (a sketch of how they fit together follows this list):

  1. Task description — What the AI needs to do
  2. Input files — Starting code state
  3. Expected output — What correct completion looks like
  4. Evaluation criteria — How we score (correctness, code quality, speed)
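
To make the structure concrete, here is a minimal scoring sketch in Python. It assumes a hypothetical per-benchmark layout (an expected_output.txt file stored alongside the task description and input files) and an exact-match correctness check; none of these file names or choices are prescribed by this repo.

    # Hypothetical scoring sketch. The file name "expected_output.txt"
    # and the exact-match check are illustrative assumptions, not a
    # format this repo prescribes.
    from pathlib import Path

    def score_benchmark(bench_dir: Path, candidate_output: str) -> dict:
        """Score one attempt against the benchmark's expected output."""
        expected = (bench_dir / "expected_output.txt").read_text()
        return {
            # Correctness: does the tool's output match the expected output?
            "correctness": candidate_output.strip() == expected.strip(),
            # The code-quality and speed criteria listed above would need
            # their own rubric and timing harness.
        }

    # Example usage with an assumed directory layout:
    # score_benchmark(Path("benchmarks/code-generation/slugify"), tool_output)

In practice, correctness checks richer than exact string matching (for example, running a benchmark's test suite against the generated code) would slot into the same function.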

Results

See our benchmark results page for the latest head-to-head comparisons.

Contributing

Submit new benchmarks via PR. Include all four components listed above.

License

MIT


Part of the RunAICode collection
