Standardized test prompts for evaluating AI coding assistants. Reproducible, transparent, open.
By RunAICode.ai — making AI tool reviews verifiable.
Most AI coding tool reviews are subjective. We're changing that: every benchmark in this repo is a concrete, reproducible task that any AI tool can attempt, with clear pass/fail criteria. Benchmarks fall into six categories:
- Code Generation — Write functions from descriptions
- Refactoring — Improve existing code structure
- Debugging — Find and fix bugs in provided code
- Multi-file — Changes that span multiple files
- Testing — Generate meaningful test suites
- DevOps — Infrastructure and automation tasks
Each benchmark includes:
- Task description — What the AI needs to do
- Input files — Starting code state
- Expected output — What correct completion looks like
- Evaluation criteria — How we score (correctness, code quality, speed)
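As a rough sketch of how those four components could fit together in a scoring harness (the field names, directory-style benchmark name, and exact-match pass rule below are illustrative assumptions, not this repo's actual schema):

```python
# Hypothetical sketch: representing a benchmark's four components and
# scoring a submission. Field names and the exact-match pass rule are
# assumptions for illustration, not the repo's real format.

def evaluate(benchmark: dict, submission: str) -> dict:
    """Score a submission against a benchmark's pass/fail criterion."""
    passed = benchmark["expected_output"].strip() == submission.strip()
    return {"benchmark": benchmark["name"], "passed": passed}

benchmark = {
    "name": "code-generation/fizzbuzz",               # hypothetical ID
    "task": "Write a function that ...",              # Task description
    "input_files": {"main.py": "# starter code"},     # Input files
    "expected_output": "1\n2\nFizz",                  # Expected output
}

result = evaluate(benchmark, "1\n2\nFizz")
print(result)  # {'benchmark': 'code-generation/fizzbuzz', 'passed': True}
```

A real harness would add the other evaluation axes mentioned above (code quality, speed) alongside correctness; this sketch shows only the pass/fail core.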
See our benchmark results page for the latest head-to-head comparisons.
Submit new benchmarks via pull request; include all four components listed above.
MIT License
Part of the RunAICode collection