perf: increase default benchmark sizes for retrieval_core and graph_analytics #41
Conversation
WalkthroughAdded an optional benchmark size override through the CLI, propagated Changes
Sequence Diagram(s)
sequenceDiagram
participant CLI as CLI (`chuck/__main__.py`)
participant Orchestrator as run_benchmarks (`chuck/benchmark.py`)
participant Runner as Benchmark Runner (`chuck/benchmarks/*`)
participant Task as TASK_SPEC / benchmark_task (`chuck/tasks/*`)
CLI->>Orchestrator: run_benchmarks(task?, size?)
Orchestrator->>Runner: runner(size=size)
Runner->>Task: benchmark_task(TASK_SPEC, seed, size=size)
Task->>Task: generate(...) / solve(...)
Task-->>Runner: results dict
Runner-->>Orchestrator: runner result
Orchestrator-->>CLI: aggregated results
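The call chain in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in `chuck/benchmark.py`: the two runner functions are hypothetical stand-ins for the modules under `chuck/benchmarks/*`, and only the `size` argument is modeled (the optional `task` filter is left out).

```python
from __future__ import annotations

from typing import Any, Callable


# Hypothetical stand-ins for the per-task runners in chuck/benchmarks/*.
def run_retrieval_core(size: int | None = None) -> dict[str, Any]:
    return {"task": "retrieval_core", "size": size}


def run_graph_analytics(size: int | None = None) -> dict[str, Any]:
    return {"task": "graph_analytics", "size": size}


RUNNERS: list[Callable[..., dict[str, Any]]] = [
    run_retrieval_core,
    run_graph_analytics,
]


def run_benchmarks(size: int | None = None) -> list[dict[str, Any]]:
    # Forward the same optional override to every runner.
    # None means "let each task use its own default size".
    return [runner(size=size) for runner in RUNNERS]
```

Calling `run_benchmarks()` leaves every task at its default, while `run_benchmarks(size=5000)` applies one override to all of them.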
Estimated code review effort: 3 (Moderate), ~20 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
|
@Aaryan-Dadu, kindly review this.
|
Could you please run the A/B comparison workflow for the same commit? |
|
|
Could you make the benchmark size configurable? By default keep it at |
One more thing: please add a screenshot comparing the same Python backend at the same commit. Just create two snapshots at the same state and compare them.
Please work on this too |
|
And please add Fixes #2 in your PR description to link the issue to the PR. |
On it. |
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@chuck/__main__.py`:
- Around line 40-45: The --size argument currently allows zero and negative
values; change validation so only positive integers are accepted by providing a
custom argparse type (e.g., a function named positive_int that parses int(x) and
raises argparse.ArgumentTypeError if value <= 0) and use it in
bench_parser.add_argument("--size", type=positive_int, ...); alternatively
perform the same check after parsing and raise a clear error/exit if args.size
is not None and args.size <= 0. Ensure the unique symbol
bench_parser.add_argument("--size", ...) is updated to reference the validator
(or the post-parse check validates args.size).
In `@chuck/benchmark.py`:
- Around line 46-49: The function run_benchmarks has trailing whitespace on the
blank/closing line causing pre-commit failures; open the run_benchmarks
definition and remove any trailing spaces (e.g., after the return line or the
subsequent blank line) so the block contains no trailing whitespace—check the
run_benchmarks(...) signature and the return [runner(size=size) for runner in
RUNNERS] line and ensure they end cleanly with no extra spaces.
In `@chuck/benchmarks/io_pipeline/__init__.py`:
- Around line 9-11: The run entrypoint currently forwards any integer size to
benchmark_task (TASK_SPEC) allowing 0 or negative sizes; add a guard in
run(size: int | None = None) that if size is not None and size <= 0 it raises a
ValueError with a clear message (or alternatively add the same check inside
benchmark_task) so only None or positive sizes are dispatched to benchmark_task.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 821c69f2-1f42-4104-94a3-267ccebe0f78
📒 Files selected for processing (12)
chuck/__main__.py
chuck/benchmark.py
chuck/benchmarks/compute_core/__init__.py
chuck/benchmarks/data_encoding/__init__.py
chuck/benchmarks/graph_analytics/__init__.py
chuck/benchmarks/io_pipeline/__init__.py
chuck/benchmarks/memory_index/__init__.py
chuck/benchmarks/memory_tier/__init__.py
chuck/benchmarks/ordering_core/__init__.py
chuck/benchmarks/prime_analytics/__init__.py
chuck/benchmarks/relational_fusion/__init__.py
chuck/benchmarks/retrieval_core/__init__.py
bench_parser.add_argument(
    "--size",
    type=int,
    default=None,
    help="Override the default benchmark size"
)
Validate --size as a positive integer.
Line 42 currently accepts 0 and negative values, which can produce invalid benchmark payload sizes.
Proposed fix
+def _positive_int(value: str) -> int:
+    parsed = int(value)
+    if parsed <= 0:
+        raise argparse.ArgumentTypeError("--size must be a positive integer")
+    return parsed
+
...
bench_parser.add_argument(
-    "--size",
-    type=int,
-    default=None,
-    help="Override the default benchmark size"
+    "--size",
+    type=_positive_int,
+    default=None,
+    help="Override the default benchmark size",
)
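To see how such a validator behaves at the command line, here is a small standalone sketch. It uses a plain top-level parser rather than the project's `bench_parser` subparser, and the `build_parser` helper and `prog` name are assumptions made for the example.

```python
import argparse


def positive_int(value: str) -> int:
    # int() raises ValueError on non-numeric input; argparse reports
    # that as an "invalid positive_int value" usage error on its own.
    parsed = int(value)
    if parsed <= 0:
        raise argparse.ArgumentTypeError("--size must be a positive integer")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical standalone parser used only for this demonstration.
    parser = argparse.ArgumentParser(prog="chuck-demo")
    parser.add_argument(
        "--size",
        type=positive_int,
        default=None,
        help="Override the default benchmark size",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"size override: {args.size}")
```

With this in place, omitting `--size` leaves `args.size` as `None`, `--size 100` yields the integer `100`, and `--size 0` or `--size -5` makes argparse print a usage error and exit with status 2.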
-def run() -> dict[str, Any]:
-    return benchmark_task(TASK_SPEC, seed=1_001)
+def run(size: int | None = None) -> dict[str, Any]:
+    # We pass the size override into the core benchmark_task logic
-def run() -> dict[str, Any]:
-    return benchmark_task(TASK_SPEC, seed=1_009)
+def run(size: int | None = None) -> dict[str, Any]:
+    return benchmark_task(TASK_SPEC, seed=1_009,size=size)
returning size=None? will that work?
Yes. In the benchmark_task function I added a conditional: if size is None, it falls back to the default benchmark_size.
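The fallback described in this reply could be sketched as below. The `TaskSpec` dataclass, its `benchmark_size` field, and the `resolve_size` helper are assumptions for illustration; the actual `benchmark_task` implementation is not shown in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    # Hypothetical shape of a task spec; field names are assumed.
    name: str
    benchmark_size: int


def resolve_size(spec: TaskSpec, size: int | None = None) -> int:
    # Compare against None explicitly so an explicit override is never
    # confused with "no override given".
    return spec.benchmark_size if size is None else size
```

Testing `size is None` rather than `not size` matters here: a truthiness check would silently turn an invalid `size=0` into the default instead of letting validation reject it.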
    type=int,
    default=None,
    help="Override the default benchmark size"
)
Please show the demo for this
|
CI checks are failing, resolve them too |
1216ad7 to 4542919
4542919 to 403c704
|
I am happy with the changes now! Thanks for your contribution |





Description
This PR addresses Issue #2.
The previous default sizes for retrieval_core and graph_analytics resulted in execution times under 10 ms. At this scale, OS scheduler noise caused a variance (Delta) of over 35%, making the benchmarks unreliable for performance comparison.
I have increased the default TASK_SPEC sizes so that the execution times are ~3.2 seconds for graph_analytics and ~0.99 seconds for retrieval_core.
The benchmark size is also configurable from the CLI through the --size argument.
Stability Benchmark Results
Data collected over 50 iterations per task using a custom tracking script to measure Relative Standard Deviation (RSD) and Maximum Delta.
[Results table and screenshots for graph_analytics and retrieval_core]
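For readers who want to reproduce this kind of stability measurement, a minimal sketch follows. It is not the custom tracking script used for this PR. RSD is computed as the sample standard deviation divided by the mean; "Maximum Delta" is assumed here to mean the spread between the slowest and fastest run relative to the mean, which may differ from the definition the script used.

```python
from __future__ import annotations

import statistics
import time
from typing import Callable


def stability_stats(samples: list[float]) -> dict[str, float]:
    """Summarize run-to-run variation for a list of timings in seconds."""
    mean = statistics.fmean(samples)
    # Relative Standard Deviation: sample stdev as a percentage of the mean.
    rsd = statistics.stdev(samples) / mean * 100.0
    # Assumed definition: (slowest - fastest) as a percentage of the mean.
    max_delta = (max(samples) - min(samples)) / mean * 100.0
    return {"mean_s": mean, "rsd_percent": rsd, "max_delta_percent": max_delta}


def time_runs(fn: Callable[[], object], iterations: int = 50) -> list[float]:
    # perf_counter is a monotonic, high-resolution clock suited to timing.
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings
```

A call such as `stability_stats(time_runs(some_runner))` then gives one row of the kind summarized above, where `some_runner` is whichever zero-argument benchmark callable is being measured.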
Fixes #2