Closed
Three-file autoresearch framework targeting ExecutableNormalizedOperationFactory throughput: program.md (strategy), run_benchmark.sh (metric), autoresearch.sh (loop). https://claude.ai/code/session_01GfoPorZWo99NczxzJTYh9Q
- Use `claude --dangerously-skip-permissions --max-turns 20` for unattended operation
- Separate test run from benchmark run (avoid running tests twice)
- Add CLI availability check
- Improve logging with printf instead of echo -e
- Show percentage improvement in final summary
https://claude.ai/code/session_01GfoPorZWo99NczxzJTYh9Q
dondonz commented Mar 22, 2026
echo "--- Asking Claude to make an optimization ---"
CLAUDE_OUTPUT=$(claude \
    --model sonnet \
    --dangerously-skip-permissions \
Was fun to see Claude making small changes and testing them out live - the next few runs will be much longer
Let's see if Claude can find any performance gains. I plan to run this in the evenings while I'm asleep!
What is this?
An experiment applying Karpathy's Autoresearch pattern to graphql-java — specifically targeting ExecutableNormalizedOperationFactory performance.
The idea: an AI agent (Claude Code, Sonnet) runs in an autonomous loop overnight, making one small optimization per iteration, benchmarking it, and keeping only improvements. Think of it as automated performance tuning with a tight feedback loop.
What's in this PR
Three files in autoresearch/:
program.md — Strategic guidance for the AI agent: which files to modify, 10 ranked optimization strategies, constraints (must pass tests, no new deps, preserve public API)
run_benchmark.sh — Runs ENF1Performance.benchMarkThroughput via JMH and extracts a single ops/sec number
autoresearch.sh — The loop driver: invokes claude --model sonnet non-interactively, runs tests, benchmarks, keeps improvements via git commit, reverts regressions
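To make "extracts a single ops/sec number" concrete, here is a minimal sketch of one way to pull the score out of JMH's text result table. It assumes the default layout, where the result row ends in `ops/s` and the score sits just before an optional `±` error column. The function name `extract_score` is hypothetical, and this is not necessarily how the PR's run_benchmark.sh does it.

```shell
# Reads JMH text output on stdin and prints the throughput score for one benchmark.
# $1: benchmark name to look for, e.g. ENF1Performance.benchMarkThroughput
# Assumption: the result row ends with the units column "ops/s".
extract_score() {
  awk -v name="$1" '
    index($1, name) && $NF == "ops/s" {
      # With an error column the row ends: Score ± Error Units
      # Without one it ends:               Score Units
      score = ($(NF-2) == "±") ? $(NF-3) : $(NF-1)
      print score
      exit
    }'
}
```

A caller could pipe the benchmark log through it, for example `extract_score ENF1Performance.benchMarkThroughput < jmh-output.txt`, where the log file name is again only illustrative.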
How it works
for each iteration:
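The body of the loop is not reproduced here. As a rough sketch of the keep-or-revert decision described for autoresearch.sh, the helpers below compare a candidate ops/sec score against the current best and compute the percentage change. All function names are hypothetical, and the git commands mentioned in the comments are one plausible choice rather than what the script necessarily runs.

```shell
# Succeeds (exit status 0) only when the candidate score is strictly higher.
# awk is used because POSIX shell arithmetic has no floating point.
is_improvement() {
  awk -v base="$1" -v cand="$2" 'BEGIN { exit !(cand + 0 > base + 0) }'
}

# Prints the percentage change from baseline to candidate with two decimals.
pct_change() {
  awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.2f", (cand - base) / base * 100 }'
}

# Prints "keep" or "revert" for one iteration.
# $1: best score so far, $2: score measured after the latest change.
decide() {
  if is_improvement "$1" "$2"; then
    echo keep    # the driver would then commit, e.g. git commit -am "..."
  else
    echo revert  # the driver would then discard the change, e.g. git checkout -- .
  fi
}
```

A driver would call `decide "$best" "$score"` after the tests pass and the benchmark finishes, and update `best` only on `keep`. Treating an equal score as a revert keeps the history free of changes that show no measured gain.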
How to run it locally
git checkout claude/autoresearch-graphql-java-EA3If
./autoresearch/autoresearch.sh 50 # or 5 for a quick test
Requires: Claude Code CLI (claude on PATH), JDK 25.
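A quick way to confirm the prerequisites before starting a long run is to check that the commands resolve on PATH. The sketch below uses the POSIX `command -v` builtin; the helper name `require_cmd` is hypothetical, and it only checks that `java` exists, not that it is JDK 25.

```shell
# Returns 0 if the named command is on PATH; otherwise prints a message
# to stderr and returns 1 so the caller can decide whether to abort.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  printf 'Missing prerequisite: %s is not on PATH\n' "$1" >&2
  return 1
}
```

Usage could look like `require_cmd claude && require_cmd java || exit 1` at the top of a wrapper script.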
Why ENF?
ExecutableNormalizedOperationFactory is one of the most expensive operations in the graphql-java hot path. It has:
~960 lines of algorithmic code with clear optimization surface (allocation, data structures, traversal)
Multiple JMH benchmarks already in place (ENF1, ENF2, ENFExtraLarge, ENFDeepIntrospection)
A comprehensive test suite (3,297 lines) to gate correctness