chore: refresh JMH benchmark numbers + benchmark infra fixes (KOJAK-82) #46
Merged
endrju19 merged 6 commits on May 17, 2026
# Conflicts: # README.md
…(KOJAK-82)

Replaces the smoke-run numbers (fork=1, warmup=1, iter=2, n=2, scoreError=NaN) with a full publishable run (fork=2, warmup=3 × 10s, iter=5 × 30s, n=10).

## Headline (Kafka throughput, msg/s)

| batchSize | Smoke  | Full run     | Improvement vs baseline |
|-----------|--------|--------------|-------------------------|
| 10        | ~1,468 | ~1,825 ± 70  | 16.7× (was 13.5×)       |
| 50        | ~3,731 | ~4,184 ± 140 | 36.3× (was 32.3×)       |
| 100       | ~4,717 | ~5,128 ± 105 | 44.6× (was 41.0×)       |

All Kafka throughput error bars are <5% of score, so the multipliers are now statistically defensible. The smoke-run numbers were directionally correct but slightly conservative; the full run shows the optimization is even better than initially claimed.

## Benchmark infrastructure fixes (needed to land the rerun)

- okapi-benchmarks build.gradle.kts: bump JMH JVM heap to -Xmx8g (the previous default -Xmx2g OOMed inside throughput-mode microbenches)
- okapi-benchmarks build.gradle.kts: pass -Dliquibase.duplicateFileMode=WARN (okapi-postgres.jar and the fat JMH jar both carry the changelog at the same path; Liquibase 4.x treats this as an error by default; the files are identical so WARN is safe)
- DelivererMicroBenchmark.kt: subclass MockProducer to clear() history after every send. MockProducer retains every record sent for inspection; at ~1M ops/s for 30s × forks × iters that list grew to GBs and OOMed the JVM regardless of heap size. The fix discards history per call, since the microbench doesn't need to inspect what was sent.

## Files updated

- benchmarks/kafka-deliverbatch.json: replaced with full-config results
- benchmarks/results-kafka-deliverbatch.md: new Score +/- Error tables; removed "Statistical caveat" callout; tightened narrative; added HTTP companion table for full-run completeness
- README.md: refreshed throughput table (1,470 -> 1,825 / 4,720 -> 5,130), improvement claim (13-41x -> 17-45x), JDK note (25 -> 21)

Note on JDK delta: the smoke run was on JDK 25.0.2 (an anomaly: the SDKMAN default shifted between runs); this full run is on JDK 21.0.7. The CLAUDE.md target is JVM 21, so this matches what consumers will see.

DelivererMicroBenchmark.kafkaDeliver still produces high-variance results (error > score): JIT warmup interacts poorly with the Jackson-per-call deserialization. Not a blocker for KOJAK-82 (the throughput benchmarks are the publishable surface); a follow-up could switch the microbench to AverageTime mode or cache the deserialized DeliveryInfo.
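The run configuration and the two JVM-argument fixes from this commit message might be expressed roughly as follows in `okapi-benchmarks/build.gradle.kts`. This is a sketch only: it assumes the module applies the `me.champeau.jmh` Gradle plugin (the PR does not name the plugin), and the actual file may set these values differently.

```kotlin
// Hypothetical fragment of okapi-benchmarks/build.gradle.kts.
// Assumes the me.champeau.jmh plugin is already applied in the plugins block.
jmh {
    // Full publishable run described above: fork=2, warmup=3 x 10s, iter=5 x 30s.
    fork.set(2)
    warmupIterations.set(3)
    warmup.set("10s")
    iterations.set(5)
    timeOnIteration.set("30s")

    // JVM args passed to each forked benchmark JVM.
    jvmArgs.set(
        listOf(
            "-Xmx8g",                              // previous -Xmx2g OOMed in throughput mode
            "-Dliquibase.duplicateFileMode=WARN",  // identical changelog appears twice on the classpath
        ),
    )
}
```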
Replaces previous JMH run results with a re-run under the same config (fork=2, warmup=3, iter=5). Kafka throughput numbers move <3% vs the prior run, well within error bars, confirming reproducibility.

Kafka throughput (msg/s):

- batchSize=10 → ~1,790 (was ~1,825)
- batchSize=50 → ~4,132 (was ~4,184)
- batchSize=100 → ~5,181 (was ~5,128)

DelivererMicroBenchmark.kafkaDeliver now produces meaningful numbers (2.3M ± 19k ops/s, error <1%) thanks to the MockProducer.clear() fix shipped earlier in this PR. The previous run had error > score because the benchmark was hitting GC pressure from the MockProducer leak before the fix.
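The "<5% of score", "<3% vs prior run" and "within error bars" claims can be checked by hand from the numbers quoted in the two commit messages. A minimal Kotlin sketch of that arithmetic follows; the `Comparison` type and function names are illustrative and not part of the repository.

```kotlin
import kotlin.math.abs

/** One batch size: score and error from the first full run, score from the re-run (msg/s). */
data class Comparison(
    val batchSize: Int,
    val firstScore: Double,
    val firstError: Double,
    val rerunScore: Double,
)

/** Error margin as a percentage of the first run's score. */
fun relativeErrorPercent(c: Comparison): Double =
    c.firstError / c.firstScore * 100.0

/** Absolute run-to-run change as a percentage of the first run's score. */
fun runDeltaPercent(c: Comparison): Double =
    abs(c.rerunScore - c.firstScore) / c.firstScore * 100.0

/** True when the re-run score falls inside the first run's score ± error interval. */
fun withinErrorBars(c: Comparison): Boolean =
    abs(c.rerunScore - c.firstScore) <= c.firstError

// Values copied from the commit messages above.
val comparisons = listOf(
    Comparison(batchSize = 10, firstScore = 1825.0, firstError = 70.0, rerunScore = 1790.0),
    Comparison(batchSize = 50, firstScore = 4184.0, firstError = 140.0, rerunScore = 4132.0),
    Comparison(batchSize = 100, firstScore = 5128.0, firstError = 105.0, rerunScore = 5181.0),
)

fun main() {
    for (c in comparisons) {
        println(
            "batchSize=%d error=%.1f%% delta=%.1f%% withinErrorBars=%b".format(
                c.batchSize,
                relativeErrorPercent(c),
                runDeltaPercent(c),
                withinErrorBars(c),
            ),
        )
    }
}
```

The same ratio applied to the microbenchmark (19k / 2.3M) gives roughly 0.8%, consistent with the "error <1%" statement.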
…atch' into chore/kojak-82-full-jmh-rerun # Conflicts: # README.md
endrju19 added a commit that referenced this pull request on May 17, 2026
#48)

## Summary

Three independent fixes that make `./gradlew :okapi-benchmarks:jmh` complete cleanly. Before these, the JMH run OOMs partway through.

All three issues exist on main today; running the benchmark suite without these fixes will fail. No test or production code is touched: this is pure benchmark infrastructure.

## Fixes

### 1. Bump JMH JVM heap to `-Xmx8g`

Throughput-mode microbenchmarks call `deliver()` at ~1M ops/s; each call allocates Jackson + Kotlin reflection state for JSON deserialization. At the previous default `-Xmx2g` the allocation rate exceeds GC throughput and the JVM OOMs within the first measurement iteration.

### 2. Pass `-Dliquibase.duplicateFileMode=WARN` as JMH JVM arg

`okapi-postgres.jar` and the fat JMH jar both ship the changelog at the same classpath path (`com/softwaremill/okapi/db/postgres/changelog.xml`). Liquibase 4.x treats duplicate resources as an error by default, which aborts `PostgresBenchmarkSupport` setup. The two files are identical (same jar source on the classpath twice), so `WARN` is safe.

### 3. Subclass `MockProducer` in `DelivererMicroBenchmark` to `clear()` history after every `send()`

`MockProducer.history` (internal `sent` list) retains every record sent for inspection; there is no eviction. In throughput mode at ~1M ops/s for 30s × forks × iterations that list grew to GBs and OOMed the JVM regardless of heap size. Discarding per call is safe because the microbench doesn't inspect what was sent, only timing.

With this fix, `DelivererMicroBenchmark.kafkaDeliver` now produces meaningful numbers (~2.3M ops/s ± <1%) instead of `error > score`.

## Files

- `okapi-benchmarks/build.gradle.kts`: JVM args
- `okapi-benchmarks/src/jmh/kotlin/.../DelivererMicroBenchmark.kt`: MockProducer override

## Why a separate PR

These are pure infrastructure fixes, completely independent of any specific benchmark or transport implementation. Carved out from PR #46 (KOJAK-82) so they can land on main right away, without waiting for the Kafka deliverBatch (#40) review cycle. PR #46 will then contain only the refreshed JMH numbers.

## Test plan

- [x] `./gradlew :okapi-benchmarks:compileJmhKotlin` passes
- [x] Verified locally: full `./gradlew :okapi-benchmarks:jmh` run completes with `BUILD SUCCESSFUL` and no OOM
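For readers unfamiliar with fix 3, a `MockProducer` subclass that discards its history after each send could look roughly like the sketch below. It is not the code from `DelivererMicroBenchmark.kt`: the class name, the `String` key/value types, and the topic name are assumptions, and it uses the three-argument `MockProducer` constructor as found in the Kafka 3.x client (other client versions may need a different constructor).

```kotlin
import java.util.concurrent.Future
import org.apache.kafka.clients.producer.Callback
import org.apache.kafka.clients.producer.MockProducer
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.clients.producer.RecordMetadata
import org.apache.kafka.common.serialization.StringSerializer

// Hypothetical name; autoComplete = true so every send() completes immediately.
class NonRetainingMockProducer :
    MockProducer<String, String>(true, StringSerializer(), StringSerializer()) {

    // MockProducer.send(record) delegates to send(record, callback),
    // so overriding this overload covers both call styles.
    override fun send(
        record: ProducerRecord<String, String>,
        callback: Callback?,
    ): Future<RecordMetadata> {
        val result = super.send(record, callback)
        clear() // drop the retained record so history cannot grow without bound
        return result
    }
}

fun main() {
    val producer = NonRetainingMockProducer()
    repeat(1_000) { i ->
        // "bench-topic" is a placeholder topic name.
        producer.send(ProducerRecord("bench-topic", "key-$i", "value-$i"))
    }
    println("records retained: ${producer.history().size}")
}
```

One caveat worth checking against the client version in use: `clear()` resets more than the sent-record list, so this pattern is best suited to an auto-completing producer where nothing is left pending after `send()` returns.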
…atch' into chore/kojak-82-full-jmh-rerun
## What

Refreshes `benchmarks/kafka-deliverbatch.json` and `benchmarks/results-kafka-deliverbatch.md` with a full-config JMH run (fork=2, warmup=3, iter=5, n=10 samples per benchmark). Also lands three benchmark-infra fixes that were needed to make the run complete cleanly.

## Headline (Kafka throughput, msg/s)

All Kafka error bars <5% of score. Numbers reproduced across two independent runs (delta <3% between them).

## Benchmark infrastructure fixes (also in this PR)

Without these, the JMH run cannot complete:

- JMH JVM heap bumped to `-Xmx8g`. Throughput-mode microbenches were OOMing at the previous `-Xmx2g` because Jackson + Kotlin reflection allocate per call at ~1M ops/s rates.
- Added `-Dliquibase.duplicateFileMode=WARN` to JVM args. The fat JMH jar and `okapi-postgres.jar` both ship the changelog at the same path; Liquibase 4.x treats this as an error by default. Files are identical so WARN is safe.
- `MockProducer.history` cleared after each `send()` in `DelivererMicroBenchmark`. `MockProducer` retains every record sent for inspection; at ~1M ops/s for 30s × forks × iters that list grew to GBs and OOMed the JVM regardless of heap size. The microbench doesn't need to inspect what was sent, so discarding per call is safe. With this fix, `DelivererMicroBenchmark.kafkaDeliver` now produces meaningful numbers (2.3M ± 19k ops/s) instead of `error > score`.

## Files touched

- `benchmarks/kafka-deliverbatch.json`: full-config raw results
- `benchmarks/results-kafka-deliverbatch.md`: `Score ± Error` tables + microbench section + HTTP companion table
- `README.md`: refreshed throughput table
- `okapi-benchmarks/build.gradle.kts`: heap bump + Liquibase JVM arg
- `okapi-benchmarks/src/jmh/kotlin/.../DelivererMicroBenchmark.kt`: MockProducer override

## Notes

`DelivererMicroBenchmark.httpDeliver` benchmarks the WireMock-local HTTP path; numbers are dominated by loopback TCP cost, not library overhead.

## Base branch

Based on `feature/kojak-73-kafka-deliver-batch` (PR #40). Merge after #40.

## Test plan

- `./gradlew :okapi-benchmarks:jmh` completes with `BUILD SUCCESSFUL`; all benchmarks produce non-NaN error bars