perf: increase default benchmark sizes for retrieval_core and graph_analytics#41

Merged
Aaryan-Dadu merged 2 commits into iiitl:dev from notdarking:stability-patch-graph-and-retrieval on Apr 12, 2026

@notdarking notdarking commented Apr 11, 2026

Description

This PR addresses Issue #2.
The previous default sizes for retrieval_core and graph_analytics resulted in execution times under 10ms. At this scale, OS scheduler noise caused a variance (Delta) of over 35%, making the benchmarks unreliable for performance comparison.

I have increased the default TASK_SPEC sizes so that the execution times are ~3.2 and ~0.99 seconds for graph_analytics and retrieval_core respectively.

Also, the benchmark size is now configurable from the CLI using the --size argument.

image

Stability Benchmark Results

Data collected over 50 iterations per task using a custom tracking script to measure Relative Standard Deviation (RSD) and Maximum Delta.

| Task | Size (Old) | Size (New) | Mean Time | Delta (Max Variance) | RSD (Std%) |
|---|---|---|---|---|---|
| graph_analytics | 1,000 | 200,000 | ~0.98s | 9.5% (was 70%+) | 2.16% |
| retrieval_core | 2,000 | 2,000,000 | ~3.28s | 7.44% (was 40%+) | 1.68% |
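
The tracking script itself is not part of this PR, so the following is only a sketch of how RSD and a maximum delta could be computed from a list of timings. The function name `stability_stats`, the sample timings, and the definition of delta as (slowest - fastest) / fastest are all assumptions, not the author's code.

```python
import statistics


def stability_stats(times: list[float]) -> dict[str, float]:
    """Summarize run-to-run stability for wall-clock timings in seconds."""
    if len(times) < 2:
        raise ValueError("need at least two timings")
    mean = statistics.fmean(times)
    # RSD: sample standard deviation expressed as a percentage of the mean.
    rsd_pct = statistics.stdev(times) / mean * 100.0
    # Assumed definition of "Delta": spread between the slowest and fastest
    # run, relative to the fastest run.
    max_delta_pct = (max(times) - min(times)) / min(times) * 100.0
    return {"mean": mean, "rsd_pct": rsd_pct, "max_delta_pct": max_delta_pct}


# Made-up timings for illustration only; these are not measurements from the PR.
sample = [1.00, 1.02, 0.98, 1.01, 0.99]
stats = stability_stats(sample)
print(f"mean={stats['mean']:.3f}s  RSD={stats['rsd_pct']:.2f}%  delta={stats['max_delta_pct']:.2f}%")
```

With a fixed absolute amount of scheduler jitter, both percentages shrink as the mean run time grows, which is the motivation for the larger default sizes.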

retrieval_core

| Old | New |
|---|---|
| [image: ret old] | [image: Retriever New] |

graph_analytics

| Old | New |
|---|---|
| [image: Graph Old] | [image: Graph New] |

Fixes #2

Summary by CodeRabbit

  • New Features

    • CLI bench command now accepts an optional --size to override benchmark size.
  • Chores

    • Increased internal default sizes for analytics and retrieval tasks to support larger workloads.
    • Benchmark entrypoints updated to accept and forward an optional size override for more flexible runs.

coderabbitai Bot commented Apr 11, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 653a6581-6c71-4d20-aa66-65c4a0d1cbc9

📥 Commits

Reviewing files that changed from the base of the PR and between 4542919 and 403c704.

📒 Files selected for processing (12)
  • chuck/__main__.py
  • chuck/benchmark.py
  • chuck/benchmarks/compute_core/__init__.py
  • chuck/benchmarks/data_encoding/__init__.py
  • chuck/benchmarks/graph_analytics/__init__.py
  • chuck/benchmarks/io_pipeline/__init__.py
  • chuck/benchmarks/memory_index/__init__.py
  • chuck/benchmarks/memory_tier/__init__.py
  • chuck/benchmarks/ordering_core/__init__.py
  • chuck/benchmarks/prime_analytics/__init__.py
  • chuck/benchmarks/relational_fusion/__init__.py
  • chuck/benchmarks/retrieval_core/__init__.py

Walkthrough

Added an optional benchmark size override through the CLI, propagated size into the benchmark orchestrator and runners, and increased TASK_SPEC sizes for graph_analytics (1_000 → 200_000) and retrieval_core (2_000 → 2_000_000). No other logic changed.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CLI: `chuck/__main__.py` | Added a `--size` integer option to `bench` and forwards `size=args.size` into `run_benchmarks(...)`. |
| Benchmark orchestrator: `chuck/benchmark.py` | `run_benchmarks` signature extended to `run_benchmarks(..., size: int …` |
| Benchmark runners (entrypoints): `chuck/benchmarks/.../__init__.py` (`chuck/benchmarks/compute_core/__init__.py`, `chuck/benchmarks/data_encoding/__init__.py`, `chuck/benchmarks/graph_analytics/__init__.py`, `chuck/benchmarks/io_pipeline/__init__.py`, `chuck/benchmarks/memory_index/__init__.py`, `chuck/benchmarks/memory_tier/__init__.py`, `chuck/benchmarks/ordering_core/__init__.py`, `chuck/benchmarks/prime_analytics/__init__.py`, `chuck/benchmarks/relational_fusion/__init__.py`, `chuck/benchmarks/retrieval_core/__init__.py`) | Public `run()` signatures changed to `run(size: int …` |
| Task specification updates: `chuck/tasks/graph_analytics/task.py`, `chuck/tasks/retrieval_core/task.py` | Increased TASK_SPEC size constants: graph_analytics from 1_000 → 200_000; retrieval_core from 2_000 → 2_000_000. No other behavior changed. |

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (`chuck/__main__.py`)
    participant Orchestrator as run_benchmarks (`chuck/benchmark.py`)
    participant Runner as Benchmark Runner (`chuck/benchmarks/*`)
    participant Task as TASK_SPEC / benchmark_task (`chuck/tasks/*`)

    CLI->>Orchestrator: run_benchmarks(task?, size?)
    Orchestrator->>Runner: runner(size=size)
    Runner->>Task: benchmark_task(TASK_SPEC, seed, size=size)
    Task->>Task: generate(...) / solve(...)
    Task-->>Runner: results dict
    Runner-->>Orchestrator: runner result
    Orchestrator-->>CLI: aggregated results
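
The call chain in the diagram can be sketched roughly as follows. This is a simplified illustration, not the repository's code: `benchmark_task` is reduced to a stub that only echoes its arguments (the real one also takes `TASK_SPEC`), and `run_example` with its seed value is a hypothetical runner. Only `RUNNERS`, `run_benchmarks`, and the `runner(size=size)` call shape come from the review text.

```python
from __future__ import annotations

from typing import Any, Callable


def benchmark_task(seed: int, size: int | None = None) -> dict[str, Any]:
    # Hypothetical stub: the real benchmark_task generates and solves a
    # workload. Here it only reports what it received.
    return {"seed": seed, "received_size": size}


def run_example(size: int | None = None) -> dict[str, Any]:
    # Runner entrypoint: forwards the optional override unchanged.
    return benchmark_task(seed=1, size=size)


RUNNERS: list[Callable[..., dict[str, Any]]] = [run_example]


def run_benchmarks(size: int | None = None) -> list[dict[str, Any]]:
    # Orchestrator: passes the same override to every registered runner.
    return [runner(size=size) for runner in RUNNERS]


print(run_benchmarks(size=5_000))  # override reaches the task
print(run_benchmarks())            # no override: the task receives None
```

Keeping `None` as the "no override" value all the way down means each task can keep its own default instead of the CLI imposing one global size.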

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: increasing default benchmark sizes for retrieval_core and graph_analytics tasks to address performance stability issues. |
| Linked Issues check | ✅ Passed | All coding requirements from Issue #2 are met: TASK_SPEC sizes increased (retrieval_core 2,000→2,000,000, graph_analytics 1,000→200,000), benchmark size is now CLI-configurable via --size argument, and changes achieve target execution times (retrieval_core ~0.99s, graph_analytics ~3.28s) with reduced variance. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to Issue #2 requirements: updating TASK_SPEC constants, adding size parameter propagation through the benchmark pipeline, and adding CLI --size argument. No unrelated modifications detected. |

Author

notdarking commented Apr 12, 2026

@Aaryan-Dadu, kindly review this.

@Aaryan-Dadu
Member

Could you please run the A/B comparison workflow for the same commit?

Comment thread chuck/tasks/retrieval_core/task.py
Comment thread chuck/tasks/graph_analytics/task.py
@notdarking
Author

> Could you please run the A/B comparison workflow for the same commit?

@Aaryan-Dadu

image image

@Aaryan-Dadu
Member

Could you make the benchmark size configurable? By default keep it at 200_000

@Aaryan-Dadu
Member

> > Could you please run the A/B comparison workflow for the same commit?
>
> @Aaryan-Dadu
>
> image image

One more thing: please add a screenshot comparing the same Python backend against itself at the same commit. Just create two snapshots in the same state and compare those.

@notdarking
Author

image

@Aaryan-Dadu
Member

> Could you make the benchmark size configurable? By default keep it at 200_000

Please work on this too.

@Aaryan-Dadu
Member

And please add Fixes #2 in your PR description to link the issue to the PR.

@notdarking
Author

notdarking commented Apr 12, 2026

> > Could you make the benchmark size configurable? By default keep it at 200_000
>
> Please work on this too

On it.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@chuck/__main__.py`:
- Around line 40-45: The --size argument currently allows zero and negative
values; change validation so only positive integers are accepted by providing a
custom argparse type (e.g., a function named positive_int that parses int(x) and
raises argparse.ArgumentTypeError if value <= 0) and use it in
bench_parser.add_argument("--size", type=positive_int, ...); alternatively
perform the same check after parsing and raise a clear error/exit if args.size
is not None and args.size <= 0. Ensure the unique symbol
bench_parser.add_argument("--size", ...) is updated to reference the validator
(or the post-parse check validates args.size).

In `@chuck/benchmark.py`:
- Around line 46-49: The function run_benchmarks has trailing whitespace on the
blank/closing line causing pre-commit failures; open the run_benchmarks
definition and remove any trailing spaces (e.g., after the return line or the
subsequent blank line) so the block contains no trailing whitespace—check the
run_benchmarks(...) signature and the return [runner(size=size) for runner in
RUNNERS] line and ensure they end cleanly with no extra spaces.

In `@chuck/benchmarks/io_pipeline/__init__.py`:
- Around line 9-11: The run entrypoint currently forwards any integer size to
benchmark_task (TASK_SPEC) allowing 0 or negative sizes; add a guard in
run(size: int | None = None) that if size is not None and size <= 0 it raises a
ValueError with a clear message (or alternatively add the same check inside
benchmark_task) so only None or positive sizes are dispatched to benchmark_task.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 821c69f2-1f42-4104-94a3-267ccebe0f78

📥 Commits

Reviewing files that changed from the base of the PR and between 81db385 and 1216ad7.

Comment thread chuck/__main__.py
Comment on lines +40 to +45
bench_parser.add_argument(
    "--size",
    type=int,
    default=None,
    help="Override the default benchmark size"
)

⚠️ Potential issue | 🟡 Minor

Validate --size as a positive integer.

Line 42 currently accepts 0 and negative values, which can produce invalid benchmark payload sizes.

Proposed fix
+def _positive_int(value: str) -> int:
+    parsed = int(value)
+    if parsed <= 0:
+        raise argparse.ArgumentTypeError("--size must be a positive integer")
+    return parsed
+
 ...
     bench_parser.add_argument(
-    "--size",
-    type=int,
-    default=None,
-    help="Override the default benchmark size"
+        "--size",
+        type=_positive_int,
+        default=None,
+        help="Override the default benchmark size",
     )
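
As a self-contained illustration of the suggestion above, a validator of this kind might look like the sketch below. The parser layout (`build_parser`, the `prog` name, the `command` destination) is assumed for the example and is not taken from `chuck/__main__.py`; only the `bench` subcommand, the `--size` flag, and its help text appear in the PR.

```python
from __future__ import annotations

import argparse


def positive_int(value: str) -> int:
    """argparse type callable that accepts only integers greater than zero."""
    try:
        parsed = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected an integer, got {value!r}") from None
    if parsed <= 0:
        raise argparse.ArgumentTypeError("--size must be a positive integer")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    # Assumed layout: one top-level parser with a "bench" subcommand.
    parser = argparse.ArgumentParser(prog="chuck")
    subparsers = parser.add_subparsers(dest="command", required=True)
    bench_parser = subparsers.add_parser("bench")
    bench_parser.add_argument(
        "--size",
        type=positive_int,
        default=None,
        help="Override the default benchmark size",
    )
    return parser


args = build_parser().parse_args(["bench", "--size", "5000"])
print(args.size)
```

Because the check lives in the `type` callable, argparse reports a bad value as a normal usage error and exits before any benchmark starts.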

Comment thread chuck/benchmark.py Outdated
Comment thread chuck/benchmarks/io_pipeline/__init__.py
-def run() -> dict[str, Any]:
-    return benchmark_task(TASK_SPEC, seed=1_001)
+def run(size: int | None = None) -> dict[str, Any]:
+    # We pass the size override into the core benchmark_task logic
Member

Remove this comment.

-def run() -> dict[str, Any]:
-    return benchmark_task(TASK_SPEC, seed=1_009)
+def run(size: int | None = None) -> dict[str, Any]:
+    return benchmark_task(TASK_SPEC, seed=1_009,size=size)
Member

Passing size=None? Will that work?

Author

Yes. In the benchmark_task function I added a conditional: if size is None, it falls back to the default benchmark size.
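
The body of `benchmark_task` is not shown in this thread, so the snippet below is only a guess at what such a fallback could look like. `TaskSpec`, `resolve_size`, and `GRAPH_SPEC` are hypothetical names, and the positive-size guard follows CodeRabbit's suggestion rather than anything confirmed to be in the merged code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    # Hypothetical stand-in for the project's TASK_SPEC objects.
    name: str
    size: int


def resolve_size(spec: TaskSpec, size: int | None = None) -> int:
    """Return the override if one was given, else the spec's default size."""
    if size is None:
        return spec.size
    if size <= 0:
        # Guard suggested in review; not confirmed to exist in the merged code.
        raise ValueError(f"size must be positive, got {size}")
    return size


GRAPH_SPEC = TaskSpec(name="graph_analytics", size=200_000)

print(resolve_size(GRAPH_SPEC))         # no override: uses the spec default
print(resolve_size(GRAPH_SPEC, 5_000))  # explicit override wins
```

Testing `size is None` rather than truthiness matters here: a bare `if not size` would silently treat `0` as "use the default" instead of rejecting it.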

Comment thread chuck/__main__.py
    type=int,
    default=None,
    help="Override the default benchmark size"
)
Member

Please show the demo for this

Author

Screenshot added in PR

@Aaryan-Dadu
Member

CI checks are failing; please resolve them too.

@notdarking force-pushed the stability-patch-graph-and-retrieval branch from 1216ad7 to 4542919 on April 12, 2026 10:49
@notdarking force-pushed the stability-patch-graph-and-retrieval branch from 4542919 to 403c704 on April 12, 2026 10:59
@notdarking requested a review from Aaryan-Dadu on April 12, 2026 11:00
@Aaryan-Dadu
Member

I am happy with the changes now! Thanks for your contribution

@Aaryan-Dadu Aaryan-Dadu changed the base branch from main to dev April 12, 2026 12:18
@Aaryan-Dadu Aaryan-Dadu merged commit b9fa979 into iiitl:dev Apr 12, 2026
2 checks passed
Development

Successfully merging this pull request may close these issues.

Fast Tasks Dominated by OS Noise

2 participants