feat: graphwalks token filter by nmayorga7 · Pull Request #115 · groq/openbench

nmayorga7 · 2025-08-20T19:37:44Z

Summary

Adds optional token-count–based filtering to the GraphWalks dataset pipeline.
Samples that exceed a user-specified max_context_size are now dropped during dataset preparation, constraining evals to contexts within a token budget.

What are you adding?

Changes Made

Extended record_to_sample in graphwalks.py to:
- Compute input token counts via get_token_count.
- Drop records whose token length exceeds max_context_size.
- Attach raw_input_tok_cnt to sample metadata for downstream metrics.
Updated get_dataset to accept max_context_size and pass it through to the mapper.
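
The filtering flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the record fields `prompt` and `answer`, the plain-dict sample shape, and the whitespace-based `whitespace_token_count` are all hypothetical stand-ins (the PR itself uses `get_token_count` and the project's own sample type).

```python
from typing import Callable, Optional


def whitespace_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated words,
    not real model tokens."""
    return len(text.split())


def record_to_sample(
    record: dict,
    max_context_size: Optional[int] = None,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> list[dict]:
    """Map one raw record to zero or one samples.

    Returning an empty list is how this sketch drops an over-budget record.
    """
    tok_cnt = count_tokens(record["prompt"])  # "prompt" is a hypothetical field
    if max_context_size is not None and tok_cnt > max_context_size:
        return []  # exceeds the budget: filter it out
    return [
        {
            "input": record["prompt"],
            "target": record["answer"],  # "answer" is a hypothetical field
            # Keep the count so downstream metrics can use it.
            "metadata": {"raw_input_tok_cnt": tok_cnt},
        }
    ]


def get_dataset(records: list[dict], max_context_size: Optional[int] = None) -> list[dict]:
    """Pass max_context_size through to the mapper and flatten the results."""
    samples: list[dict] = []
    for record in records:
        samples.extend(record_to_sample(record, max_context_size))
    return samples
```

With `max_context_size=None` nothing is filtered, which matches the option being described as optional; a record whose count equals the limit is kept, since only records that exceed it are dropped.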

Testing

Ran evaluation with and without -T max_context_size
I have run the existing test suite (pytest)
I have added tests for my changes
I have tested with multiple model providers (if applicable)
I have run pre-commit hooks (pre-commit run --all-files)

Checklist

My code follows the project's style guidelines
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (if applicable)
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Additional Context

This feature mirrors the MRCR evaluation’s token gating strategy, but is adapted for GraphWalks.
It provides a consistent mechanism for controlling dataset size relative to model context limits, and sets up future work on token-binned scoring.
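
As an illustration of what token-binned scoring could look like once raw_input_tok_cnt is attached to each sample, here is a small sketch. The bin edges, the `bin_label` and `mean_score_by_bin` helpers, and the `(token_count, score)` input shape are assumptions made for this example, not part of the PR.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical upper bin edges in tokens (assumption, not taken from the PR).
BIN_EDGES = [1_000, 10_000, 100_000]


def bin_label(tok_cnt: int) -> str:
    """Return the label of the first bin whose upper edge is >= tok_cnt."""
    idx = bisect_left(BIN_EDGES, tok_cnt)
    if idx == len(BIN_EDGES):
        return f">{BIN_EDGES[-1]}"
    return f"<={BIN_EDGES[idx]}"


def mean_score_by_bin(results: list[tuple[int, float]]) -> dict[str, float]:
    """Average per-sample scores within each token bin.

    `results` holds (raw_input_tok_cnt, score) pairs.
    """
    grouped: dict[str, list[float]] = defaultdict(list)
    for tok_cnt, score in results:
        grouped[bin_label(tok_cnt)].append(score)
    return {label: sum(scores) / len(scores) for label, scores in grouped.items()}
```

Reporting one mean per bin, rather than a single aggregate, makes it easier to see whether accuracy degrades as the input context grows.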

AarushSah · 2025-08-21T04:19:57Z

@claude please review

claude · 2025-08-21T04:20:10Z

Claude encountered an error.

Failed with exit code 128

I'll analyze this and get back to you.

AarushSah

Thanks for the PR! Left some comments.

src/openbench/datasets/graphwalks.py

src/openbench/scorers/graphwalks.py

nmayorga7 · 2025-08-23T03:24:09Z

Thanks for comments :)
See changes in latest commit:
fix: simplify tokenization, binning, and scoring logic

nmayorga7 added 6 commits August 18, 2025 16:45

updated Graphwalks token filtration

c9c09f0

chore: update token filtration

16af1ff

chore: increase max_tokens default

d0d5e5a

fix: register task_type arg

fb80b53

chore: comments

b6a8242

fix: imports

83dda83

nmayorga7 marked this pull request as ready for review August 20, 2025 22:04

nmayorga7 requested a review from AarushSah as a code owner August 20, 2025 22:04

nmayorga7 changed the title ~~Feat/graphwalks token filter~~ feat: graphwalks token filter Aug 20, 2025

AarushSah suggested changes Aug 21, 2025

nmayorga7 marked this pull request as draft August 23, 2025 02:23

fix: simplify tokenization, binning, and scoring logic

c939dc3

nmayorga7 marked this pull request as ready for review August 23, 2025 03:19

ruff format fix

a8c49e1

This comment was marked as outdated.

bug fix record_to_sample()

2edd213

This comment was marked as outdated.

nmayorga7 added 2 commits August 28, 2025 16:13

bug fix: make _parse_nodes() more robust

739f3a5

ruff format fix

de515cf

AarushSah approved these changes Aug 31, 2025

AarushSah merged commit e38658c into groq:main Aug 31, 2025
14 of 15 checks passed

github-actions bot mentioned this pull request Aug 31, 2025

chore: release 0.5.0 #132

Merged

AarushSah merged 11 commits into groq:main from nmayorga7:feat/graphwalks-token-filter
