Autoresearch experiment by dondonz · Pull Request #4347 · graphql-java/graphql-java

dondonz · 2026-03-21T22:34:46Z

Let's see if Claude can find any performance gains. I plan to run this in the evenings while I'm asleep!

What is this?

An experiment applying Karpathy's Autoresearch pattern to graphql-java — specifically targeting ExecutableNormalizedOperationFactory performance.

The idea: an AI agent (Claude Code, Sonnet) runs in an autonomous loop overnight, making one small optimization per iteration, benchmarking it, and keeping only improvements. Think of it as automated performance tuning with a tight feedback loop.

What's in this PR

Three files in autoresearch/:

- program.md: strategic guidance for the AI agent (which files to modify, 10 ranked optimization strategies, and constraints: must pass tests, no new deps, preserve public API)
- run_benchmark.sh: runs ENF1Performance.benchMarkThroughput via JMH and extracts a single ops/sec number
- autoresearch.sh: the loop driver. It invokes claude --model sonnet non-interactively, runs tests, benchmarks, keeps improvements via git commit, and reverts regressions
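
The PR page does not show the contents of run_benchmark.sh, so the following is only a minimal sketch of how a single ops/sec number could be pulled out of JMH's plain-text result table. The function name extract_score is hypothetical, and the sketch assumes the usual JMH summary layout (Benchmark, Mode, Cnt, Score, Error, Units) arriving on stdin.

```shell
# Hypothetical helper, not taken from the PR: print the Score column for one
# benchmark from a JMH plain-text summary read on stdin.
extract_score() {
    local bench="$1"
    awk -v bench="$bench" '
        # Result rows start with the benchmark name and have several columns.
        NF >= 4 && index($1, bench) > 0 {
            score = $(NF - 1)                 # fallback: the field before Units
            for (i = 2; i <= NF; i++) {
                if ($i == "±") { score = $(i - 1) }   # Score precedes the error marker
            }
            print score
            exit
        }
    '
}

# Usage (assumes the JMH output was saved to a file first):
#   extract_score 'ENF1Performance.benchMarkThroughput' < jmh-output.txt
```

Reducing the benchmark to one number keeps the keep-or-revert decision simple, at the cost of ignoring the error margin JMH reports next to the score.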

How it works

For each iteration:

1. Claude reads program.md and the log of previous results.
2. Claude makes ONE focused code change under src/main/java/graphql/normalized/.
3. Run ./gradlew test; if it fails, revert the change.
4. Run the JMH benchmark; if the score improved, git commit, otherwise revert.
5. Repeat (50 iterations ≈ 5-7 hours).
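
The loop above can be sketched roughly as follows. This is not the actual autoresearch.sh: the five hook names (propose_change, run_tests, measure_score, keep_change, revert_change) are hypothetical placeholders that a real script would bind to the claude CLI, ./gradlew test, run_benchmark.sh and git.

```shell
# Sketch only: the caller must define the five hook functions named above.

# Succeeds when the candidate score is strictly greater than the baseline.
# awk is used because POSIX shell arithmetic is integer-only.
is_improvement() {
    awk -v new="$1" -v old="$2" 'BEGIN { exit !(new + 0 > old + 0) }'
}

# run_loop ITERATIONS BASELINE_SCORE
# Prints the best score seen once all iterations are done.
run_loop() {
    local iterations="$1" best="$2" i=1 score
    while [ "$i" -le "$iterations" ]; do
        propose_change                    # agent makes one focused edit
        if ! run_tests; then
            revert_change                 # failing tests: discard the edit
        elif score="$(measure_score)" && is_improvement "$score" "$best"; then
            keep_change "$score"          # e.g. commit the edit
            best="$score"
        else
            revert_change                 # no gain: discard the edit
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$best"
}
```

Comparing each candidate against the best score so far, rather than the original baseline, means every kept commit is a gain on top of the previous ones.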

How to run it locally

    git checkout claude/autoresearch-graphql-java-EA3If
    ./autoresearch/autoresearch.sh 50   # or 5 for a quick test

Requires: Claude Code CLI (claude on PATH), JDK 25.

Why ENF?

ExecutableNormalizedOperationFactory (ENF) is one of the most expensive operations in the graphql-java hot path. It has:

- ~960 lines of algorithmic code with a clear optimization surface (allocation, data structures, traversal)
- Multiple JMH benchmarks already in place (ENF1, ENF2, ENFExtraLarge, ENFDeepIntrospection)
- A comprehensive test suite (3,297 lines) to gate correctness

Three-file autoresearch framework targeting ExecutableNormalizedOperationFactory throughput: program.md (strategy), run_benchmark.sh (metric), autoresearch.sh (loop). https://claude.ai/code/session_01GfoPorZWo99NczxzJTYh9Q

- Use `claude --dangerously-skip-permissions --max-turns 20` for unattended operation
- Separate test run from benchmark run (avoid running tests twice)
- Add CLI availability check
- Improve logging with printf instead of echo -e
- Show percentage improvement in final summary

https://claude.ai/code/session_01GfoPorZWo99NczxzJTYh9Q
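
Two of the items in this commit message, the CLI availability check and the percentage improvement in the summary, could look something like the sketch below. The function names require_cmd and percent_improvement are made up for illustration; the commit's actual code is not shown on this page.

```shell
# Hypothetical helpers illustrating the commit message, not the PR's code.

# Fail early, with a message on stderr, when a required command is not on PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        printf 'error: required command not found: %s\n' "$1" >&2
        return 1
    fi
}

# Print the signed percentage change from a baseline score to a final score.
percent_improvement() {
    awk -v base="$1" -v final="$2" 'BEGIN {
        if (base + 0 == 0) { print "n/a"; exit }   # avoid dividing by zero
        printf "%+.2f%%\n", (final - base) / base * 100
    }'
}

# Usage:
#   require_cmd claude || exit 1
#   printf 'Improvement: %s\n' "$(percent_improvement "$baseline" "$best")"
```

printf is used instead of echo -e because its handling of escapes and format strings is consistent across shells.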

github-actions · 2026-03-21T22:43:55Z

Test Report

Test Results

Java Version	Total	Passed	Failed	Errors	Skipped
Java 11	5708 (±0)	5652 (±0)	0 (±0)	0 (±0)	56 (±0)
Java 17	5708 (±0)	5651 (±0)	0 (±0)	0 (±0)	57 (±0)
Java 21	5708 (±0)	5651 (±0)	0 (±0)	0 (±0)	57 (±0)
Java 25	5708 (±0)	5651 (±0)	0 (±0)	0 (±0)	57 (±0)
jcstress	32 (±0)	32 (±0)	0 (±0)	0 (±0)	0 (±0)
Total	22864 (±0)	22637 (±0)	0 (±0)	0 (±0)	227 (±0)

Code Coverage (Java 25)

Metric	Covered	Missed	Coverage	vs Master
Lines	28775	3120	90.2%	±0.0%
Branches	8354	1506	84.7%	±0.0%
Methods	7698	1222	86.3%	±0.0%

No per-class coverage changes detected.

Full HTML report: build artifact jacoco-html-report

Updated: 2026-03-21 22:43:54 UTC

dondonz · 2026-03-22T03:50:14Z

autoresearch/autoresearch.sh

+    echo "--- Asking Claude to make an optimization ---"
+    CLAUDE_OUTPUT=$(claude \
+        --model sonnet \
+        --dangerously-skip-permissions \


dondonz · 2026-03-22T06:26:55Z

It was fun to watch Claude making small changes and testing them out live. The next few runs will be much longer.

claude added 2 commits March 21, 2026 21:58

Add autoresearch setup for ENF performance optimization

4e4b444


dondonz closed this Mar 22, 2026

dondonz wants to merge 2 commits into master from claude/autoresearch-graphql-java-EA3If
