perf: increase default benchmark sizes for retrieval_core and graph_analytics by notdarking · Pull Request #41 · iiitl/chuck

notdarking · 2026-04-11T22:10:36Z

Description

This PR addresses Issue #2.
The previous default sizes for retrieval_core and graph_analytics resulted in execution times under 10ms. At this scale, OS scheduler noise caused a variance (Delta) of over 35%, making the benchmarks unreliable for performance comparison.

I have increased the default TASK_SPEC sizes so that the execution times are ~3.2 and ~0.99 seconds for graph_analytics and retrieval_core respectively.

Also, the benchmark size is now configurable from the CLI using the --size argument.

Stability Benchmark Results

Data collected over 50 iterations per task using a custom tracking script to measure Relative Standard Deviation (RSD) and Maximum Delta.

Task	Size (Old)	Size (New)	Mean Time	Delta (Max Variance)	RSD (Std%)
`graph_analytics`	1,000	200,000	~0.98s	9.5% (was 70%+)	2.16%
`retrieval_core`	2,000	2,000,000	~3.28s	7.44% (was 40%+)	1.68%
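
The tracking script itself is not part of this PR, so the following is only a minimal sketch of how the two stability columns could be computed from a list of per-iteration timings. The helper name `stability_stats` and the sample timings are hypothetical, and the definition of maximum delta used here (spread between the slowest and fastest run, relative to the mean) is an assumption rather than the script's confirmed formula.

```python
import statistics


def stability_stats(times: list[float]) -> dict[str, float]:
    """Summarise run-to-run stability for one task's wall-clock timings."""
    mean = statistics.fmean(times)
    # Relative standard deviation: sample standard deviation as a percentage of the mean.
    rsd_pct = statistics.stdev(times) / mean * 100
    # ASSUMED definition of "max delta": (slowest - fastest) / mean, as a percentage.
    max_delta_pct = (max(times) - min(times)) / mean * 100
    return {"mean": mean, "rsd_pct": rsd_pct, "max_delta_pct": max_delta_pct}


# Made-up timings in seconds, purely for illustration (not measurements from this PR).
samples = [1.00, 1.02, 0.98, 1.01, 0.99]
stats = stability_stats(samples)
print(
    f"mean={stats['mean']:.3f}s "
    f"rsd={stats['rsd_pct']:.2f}% "
    f"max_delta={stats['max_delta_pct']:.2f}%"
)
```

With very short runs the same absolute jitter is divided by a much smaller mean, which is why both percentages shrink once the task size is raised.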

retrieval_core

Old	New

graph_analytics

Old	New

Fixes #2

Summary by CodeRabbit

New Features
- CLI bench command now accepts an optional --size to override benchmark size.
Chores
- Increased internal default sizes for analytics and retrieval tasks to support larger workloads.
- Benchmark entrypoints updated to accept and forward an optional size override for more flexible runs.


coderabbitai · 2026-04-11T22:10:52Z

Warning

Rate limit exceeded

@notdarking has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 44 seconds before requesting another review.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 653a6581-6c71-4d20-aa66-65c4a0d1cbc9

📥 Commits

Reviewing files that changed from the base of the PR and between 4542919 and 403c704.

📒 Files selected for processing (12)

chuck/__main__.py
chuck/benchmark.py
chuck/benchmarks/compute_core/__init__.py
chuck/benchmarks/data_encoding/__init__.py
chuck/benchmarks/graph_analytics/__init__.py
chuck/benchmarks/io_pipeline/__init__.py
chuck/benchmarks/memory_index/__init__.py
chuck/benchmarks/memory_tier/__init__.py
chuck/benchmarks/ordering_core/__init__.py
chuck/benchmarks/prime_analytics/__init__.py
chuck/benchmarks/relational_fusion/__init__.py
chuck/benchmarks/retrieval_core/__init__.py

Walkthrough

Added an optional benchmark size override through the CLI, propagated size into the benchmark orchestrator and runners, and increased TASK_SPEC sizes for graph_analytics (1_000 → 200_000) and retrieval_core (2_000 → 2_000_000). No other logic changed.

Changes

Cohort / File(s)	Summary
CLI `chuck/__main__.py`	Added `--size` integer option to `bench` and forward `size=args.size` into `run_benchmarks(...)`.
Benchmark orchestrator `chuck/benchmark.py`	`run_benchmarks` signature extended to `run_benchmarks(..., size: int
Benchmark runners (entrypoints) `chuck/benchmarks/compute_core/__init__.py`, `chuck/benchmarks/data_encoding/__init__.py`, `chuck/benchmarks/graph_analytics/__init__.py`, `chuck/benchmarks/io_pipeline/__init__.py`, `chuck/benchmarks/memory_index/__init__.py`, `chuck/benchmarks/memory_tier/__init__.py`, `chuck/benchmarks/ordering_core/__init__.py`, `chuck/benchmarks/prime_analytics/__init__.py`, `chuck/benchmarks/relational_fusion/__init__.py`, `chuck/benchmarks/retrieval_core/__init__.py`	Public `run()` signatures changed to `run(size: int | None = None)`.
Task specification updates `chuck/tasks/graph_analytics/task.py`, `chuck/tasks/retrieval_core/task.py`	Increased `TASK_SPEC` size constants: `graph_analytics` from `1_000` → `200_000`; `retrieval_core` from `2_000` → `2_000_000`. No other behavior changed.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (`chuck/__main__.py`)
    participant Orchestrator as run_benchmarks (`chuck/benchmark.py`)
    participant Runner as Benchmark Runner (`chuck/benchmarks/*`)
    participant Task as TASK_SPEC / benchmark_task (`chuck/tasks/*`)

    CLI->>Orchestrator: run_benchmarks(task?, size?)
    Orchestrator->>Runner: runner(size=size)
    Runner->>Task: benchmark_task(TASK_SPEC, seed, size=size)
    Task->>Task: generate(...) / solve(...)
    Task-->>Runner: results dict
    Runner-->>Orchestrator: runner result
    Orchestrator-->>CLI: aggregated results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: increasing default benchmark sizes for retrieval_core and graph_analytics tasks to address performance stability issues.
Linked Issues check	✅ Passed	All coding requirements from Issue `#2` are met: TASK_SPEC sizes increased (retrieval_core 2,000→2,000,000, graph_analytics 1,000→200,000), benchmark size is now CLI-configurable via --size argument, and changes achieve target execution times (retrieval_core ~0.99s, graph_analytics ~3.28s) with reduced variance.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to Issue `#2` requirements: updating TASK_SPEC constants, adding size parameter propagation through the benchmark pipeline, and adding CLI --size argument. No unrelated modifications detected.


notdarking · 2026-04-12T03:43:11Z

@Aaryan-Dadu, kindly review this.

Aaryan-Dadu · 2026-04-12T05:36:22Z

Could you please run the A/B comparison workflow for the same commit?

notdarking · 2026-04-12T07:51:36Z

> Could you please run the A/B comparison workflow for the same commit?

@Aaryan-Dadu

Aaryan-Dadu · 2026-04-12T08:24:03Z

Could you make the benchmark size configurable? By default keep it at 200_000

Aaryan-Dadu · 2026-04-12T08:25:11Z

> Could you please run the A/B comparison workflow for the same commit?
>
> @Aaryan-Dadu

One more thing: please add a screenshot for the same Python backend on the same commit. Just create two snapshots at the same state and compare those.

notdarking · 2026-04-12T09:40:30Z

Aaryan-Dadu · 2026-04-12T09:42:13Z

> Could you make the benchmark size configurable? By default keep it at 200_000

Please work on this too

Aaryan-Dadu · 2026-04-12T09:43:25Z

And please add Fixes #2 in your PR description to link the issue to the PR.

notdarking · 2026-04-12T09:46:33Z

> Could you make the benchmark size configurable? By default keep it at 200_000
>
> Please work on this too

On it.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@chuck/__main__.py`:
- Around line 40-45: The --size argument currently allows zero and negative
values; change validation so only positive integers are accepted by providing a
custom argparse type (e.g., a function named positive_int that parses int(x) and
raises argparse.ArgumentTypeError if value <= 0) and use it in
bench_parser.add_argument("--size", type=positive_int, ...); alternatively
perform the same check after parsing and raise a clear error/exit if args.size
is not None and args.size <= 0. Ensure the unique symbol
bench_parser.add_argument("--size", ...) is updated to reference the validator
(or the post-parse check validates args.size).

In `@chuck/benchmark.py`:
- Around line 46-49: The function run_benchmarks has trailing whitespace on the
blank/closing line causing pre-commit failures; open the run_benchmarks
definition and remove any trailing spaces (e.g., after the return line or the
subsequent blank line) so the block contains no trailing whitespace—check the
run_benchmarks(...) signature and the return [runner(size=size) for runner in
RUNNERS] line and ensure they end cleanly with no extra spaces.

In `@chuck/benchmarks/io_pipeline/__init__.py`:
- Around line 9-11: The run entrypoint currently forwards any integer size to
benchmark_task (TASK_SPEC) allowing 0 or negative sizes; add a guard in
run(size: int | None = None) that if size is not None and size <= 0 it raises a
ValueError with a clear message (or alternatively add the same check inside
benchmark_task) so only None or positive sizes are dispatched to benchmark_task.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 821c69f2-1f42-4104-94a3-267ccebe0f78

📥 Commits

Reviewing files that changed from the base of the PR and between 81db385 and 1216ad7.

📒 Files selected for processing (12)

chuck/__main__.py
chuck/benchmark.py
chuck/benchmarks/compute_core/__init__.py
chuck/benchmarks/data_encoding/__init__.py
chuck/benchmarks/graph_analytics/__init__.py
chuck/benchmarks/io_pipeline/__init__.py
chuck/benchmarks/memory_index/__init__.py
chuck/benchmarks/memory_tier/__init__.py
chuck/benchmarks/ordering_core/__init__.py
chuck/benchmarks/prime_analytics/__init__.py
chuck/benchmarks/relational_fusion/__init__.py
chuck/benchmarks/retrieval_core/__init__.py

coderabbitai · 2026-04-12T10:43:49Z

+    bench_parser.add_argument(
+    "--size",
+    type=int,
+    default=None,
+    help="Override the default benchmark size"
+    )


⚠️ Potential issue | 🟡 Minor

Validate --size as a positive integer.

Line 42 currently accepts 0 and negative values, which can produce invalid benchmark payload sizes.

Proposed fix

+def _positive_int(value: str) -> int:
+    parsed = int(value)
+    if parsed <= 0:
+        raise argparse.ArgumentTypeError("--size must be a positive integer")
+    return parsed
+
 ...
     bench_parser.add_argument(
-    "--size",
-    type=int,
-    default=None,
-    help="Override the default benchmark size"
+        "--size",
+        type=_positive_int,
+        default=None,
+        help="Override the default benchmark size",
     )
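
For anyone who wants to try the suggested validation in isolation, here is a self-contained sketch. The `build_parser` helper and the stripped-down `bench` subcommand are hypothetical; the real `chuck/__main__.py` may define more commands and options. Only `argparse.ArgumentTypeError` and the `type=` hook are standard library behaviour.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse `type=` callable that accepts only integers greater than zero."""
    try:
        parsed = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not an integer") from None
    if parsed <= 0:
        raise argparse.ArgumentTypeError("--size must be a positive integer")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical minimal parser, reduced to the one option under discussion.
    parser = argparse.ArgumentParser(prog="chuck")
    subparsers = parser.add_subparsers(dest="command")
    bench_parser = subparsers.add_parser("bench")
    bench_parser.add_argument(
        "--size",
        type=positive_int,
        default=None,
        help="Override the default benchmark size",
    )
    return parser


parser = build_parser()
print(parser.parse_args(["bench", "--size", "5000"]).size)  # parsed as an int
print(parser.parse_args(["bench"]).size)                    # stays None when omitted
```

When the `type=` callable raises `ArgumentTypeError`, argparse prints a usage error and exits with status 2, so `--size 0` or `--size -5` never reaches `run_benchmarks`.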


Aaryan-Dadu · 2026-04-12T10:45:33Z

-def run() -> dict[str, Any]:
-    return benchmark_task(TASK_SPEC, seed=1_001)
+def run(size: int | None = None) -> dict[str, Any]:
+    # We pass the size override into the core benchmark_task logic


remove this comment

Aaryan-Dadu · 2026-04-12T10:48:20Z

-def run() -> dict[str, Any]:
-    return benchmark_task(TASK_SPEC, seed=1_009)
+def run(size: int | None = None) -> dict[str, Any]:
+    return benchmark_task(TASK_SPEC, seed=1_009,size=size)


returning size=None? will that work?

Yes. In the benchmark_task function I have added a conditional: if size is None, it falls back to the default benchmark size.

Aaryan-Dadu · 2026-04-12T10:49:01Z

+    type=int,
+    default=None,
+    help="Override the default benchmark size"
+    )


Please show the demo for this

Screenshot added in PR

Aaryan-Dadu · 2026-04-12T10:49:38Z

CI checks are failing, resolve them too

Aaryan-Dadu · 2026-04-12T12:17:53Z

I am happy with the changes now! Thanks for your contribution

fix : changed data size in retrieval_core and graph_analytics (fixes i…

81db385

…iitl#2)

Aaryan-Dadu requested changes Apr 12, 2026


Comment thread chuck/tasks/retrieval_core/task.py

Comment thread chuck/tasks/graph_analytics/task.py

IronKommander mentioned this pull request Apr 12, 2026

Fixes #6 - perf: optimize python backend for task graph-analytics #35

Open

4 tasks

mini-walkerx Bot mentioned this pull request Apr 12, 2026

Fast Tasks Dominated by OS Noise #2

Open

coderabbitai Bot reviewed Apr 12, 2026


Aaryan-Dadu requested changes Apr 12, 2026


notdarking force-pushed the stability-patch-graph-and-retrieval branch from 1216ad7 to 4542919 Compare April 12, 2026 10:49

feat: make benchmark size configurable

403c704

notdarking force-pushed the stability-patch-graph-and-retrieval branch from 4542919 to 403c704 Compare April 12, 2026 10:59

notdarking requested a review from Aaryan-Dadu April 12, 2026 11:00

Aaryan-Dadu changed the base branch from main to dev April 12, 2026 12:18

Aaryan-Dadu merged commit b9fa979 into iiitl:dev Apr 12, 2026
2 checks passed

Aaryan-Dadu added the accepted-45 label Apr 12, 2026
