Autoresearch experiment #4347

Closed
dondonz wants to merge 2 commits into master from claude/autoresearch-graphql-java-EA3If
dondonz commented Mar 21, 2026

Let's see if Claude can find any performance gains. I plan to run this in the evenings while I'm asleep!

What is this?

An experiment applying Karpathy's Autoresearch pattern to graphql-java — specifically targeting ExecutableNormalizedOperationFactory performance.

The idea: an AI agent (Claude Code, Sonnet) runs in an autonomous loop overnight, making one small optimization per iteration, benchmarking it, and keeping only improvements. Think of it as automated performance tuning with a tight feedback loop.

What's in this PR

Three files in autoresearch/:

- `program.md`: Strategic guidance for the AI agent: which files to modify, 10 ranked optimization strategies, constraints (must pass tests, no new deps, preserve public API)
- `run_benchmark.sh`: Runs `ENF1Performance.benchMarkThroughput` via JMH and extracts a single ops/sec number
- `autoresearch.sh`: The loop driver: invokes `claude --model sonnet` non-interactively, runs tests, benchmarks, keeps improvements via `git commit`, reverts regressions
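
As a rough illustration of the "extracts a single ops/sec number" step, here is a minimal sketch. It is not the PR's actual `run_benchmark.sh`: the helper name `extract_score` and the sample table values are hypothetical, and it assumes JMH's plain-text result table, where a throughput row ends in `Score ± Error ops/s`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): read JMH's plain-text result table on
# stdin and print the Score column for the named benchmark.
extract_score() {
  local bench="$1"
  awk -v bench="$bench" '
    # Result rows carry the benchmark name in the first field and the unit in
    # the last. Counting from the right: Units, Error, "±", Score. If the row
    # has no error column, the score sits directly before the unit.
    index($1, bench) > 0 && $NF == "ops/s" {
      print (NF >= 6 ? $(NF - 3) : $(NF - 1))
      exit
    }
  '
}

# Made-up sample output, shaped like a JMH summary table.
sample='Benchmark                            Mode  Cnt     Score    Error  Units
ENF1Performance.benchMarkThroughput  thrpt    5  1234.567 ± 12.345  ops/s'

score=$(printf '%s\n' "$sample" | extract_score "ENF1Performance.benchMarkThroughput")
printf 'score=%s\n' "$score"
```

Counting fields from the right keeps the sketch working if a benchmark adds parameter columns between the name and the mode.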

How it works

for each iteration:

  1. Claude reads program.md + previous results log
  2. Claude makes ONE focused code change to src/main/java/graphql/normalized/
  3. ./gradlew test — if fails, revert
  4. JMH benchmark — if score improved, git commit; otherwise revert
  5. Repeat (50 iterations ≈ 5-7 hours)
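
The keep-or-revert decision in steps 3 and 4 can be sketched as below. This is a simulation under stated assumptions, not the PR's `autoresearch.sh`: the `is_improvement` helper is hypothetical, the scores are made up, and the `pass`/`fail` tokens stand in for a real `./gradlew test` run followed by a JMH measurement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) only when the candidate score is
# strictly above the best so far. awk does the floating-point comparison,
# which the shell's [ ] test cannot.
is_improvement() {
  awk -v best="$1" -v cand="$2" 'BEGIN { exit !(cand + 0 > best + 0) }'
}

# Simulated iterations: each entry is "test_status:score" (made-up values).
best=1000.0
kept=0
for result in pass:1012.4 fail:1100.0 pass:998.7 pass:1030.1; do
  status=${result%%:*}
  score=${result#*:}
  if [ "$status" != pass ]; then
    # Step 3: a failing test run discards the change, whatever the score.
    printf 'revert (tests failed)\n'
  elif is_improvement "$best" "$score"; then
    # Step 4: the real loop would git commit here.
    best=$score
    kept=$((kept + 1))
    printf 'keep   (%s ops/s)\n' "$score"
  else
    printf 'revert (%s ops/s is not above %s)\n' "$score" "$best"
  fi
done
printf 'best=%s kept=%s\n' "$best" "$kept"
```

A strict greater-than comparison treats a tie as a regression. A real run would probably also want a noise margin, since JMH scores vary between runs; the description does not say whether the script applies one.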

How to run it locally

git checkout claude/autoresearch-graphql-java-EA3If
./autoresearch/autoresearch.sh 50 # or 5 for a quick test

Requires: Claude Code CLI (claude on PATH), JDK 25.

Why ENF?

ExecutableNormalizedOperationFactory is one of the most expensive operations in the graphql-java hot path. It has:

- ~960 lines of algorithmic code with clear optimization surface (allocation, data structures, traversal)
- Multiple JMH benchmarks already in place (ENF1, ENF2, ENFExtraLarge, ENFDeepIntrospection)
- A comprehensive test suite (3,297 lines) to gate correctness

claude added 2 commits March 21, 2026 21:58
Three-file autoresearch framework targeting ExecutableNormalizedOperationFactory
throughput: program.md (strategy), run_benchmark.sh (metric), autoresearch.sh (loop).

https://claude.ai/code/session_01GfoPorZWo99NczxzJTYh9Q
- Use `claude --dangerously-skip-permissions --max-turns 20` for unattended operation
- Separate test run from benchmark run (avoid running tests twice)
- Add CLI availability check
- Improve logging with printf instead of echo -e
- Show percentage improvement in final summary

https://claude.ai/code/session_01GfoPorZWo99NczxzJTYh9Q
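
Two of the items in the second commit, the CLI availability check and the percentage in the final summary, might look roughly like this. The function names `require_cli` and `percent_improvement` are hypothetical and the numbers are made up; the committed script may do this differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail early when a required command is not on PATH.
require_cli() {
  command -v "$1" >/dev/null 2>&1 || {
    printf 'error: %s not found on PATH\n' "$1" >&2
    return 1
  }
}

# Hypothetical helper: percentage change from a baseline score to a final
# score, formatted with two decimals.
percent_improvement() {
  awk -v base="$1" -v final="$2" 'BEGIN { printf "%.2f", (final - base) / base * 100 }'
}

# The real script would check for `claude`; `sh` is used here so the sketch
# runs on any machine.
require_cli sh && printf 'sh is available\n'

pct=$(percent_improvement 1000 1050)
# printf handles the literal percent sign portably, unlike echo -e.
printf 'Improvement: %s%%\n' "$pct"
```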
@github-actions

Test Report

Test Results

| Java Version | Total | Passed | Failed | Errors | Skipped |
|---|---|---|---|---|---|
| Java 11 | 5708 (±0) | 5652 (±0) | 0 (±0) | 0 (±0) | 56 (±0) |
| Java 17 | 5708 (±0) | 5651 (±0) | 0 (±0) | 0 (±0) | 57 (±0) |
| Java 21 | 5708 (±0) | 5651 (±0) | 0 (±0) | 0 (±0) | 57 (±0) |
| Java 25 | 5708 (±0) | 5651 (±0) | 0 (±0) | 0 (±0) | 57 (±0) |
| jcstress | 32 (±0) | 32 (±0) | 0 (±0) | 0 (±0) | 0 (±0) |
| Total | 22864 (±0) | 22637 (±0) | 0 (±0) | 0 (±0) | 227 (±0) |

Code Coverage (Java 25)

| Metric | Covered | Missed | Coverage | vs Master |
|---|---|---|---|---|
| Lines | 28775 | 3120 | 90.2% | ±0.0% |
| Branches | 8354 | 1506 | 84.7% | ±0.0% |
| Methods | 7698 | 1222 | 86.3% | ±0.0% |

No per-class coverage changes detected.

Full HTML report: build artifact jacoco-html-report

Updated: 2026-03-21 22:43:54 UTC

echo "--- Asking Claude to make an optimization ---"
CLAUDE_OUTPUT=$(claude \
--model sonnet \
--dangerously-skip-permissions \

yolo

dondonz commented Mar 22, 2026

Was fun to see Claude making small changes and testing them out live - the next few runs will be much longer

dondonz closed this Mar 22, 2026