feat(seer_grouping): Filter Seer grouping requests by token count instead of frame count #103997

yuvmen · 2025-11-25T19:49:25Z

Change the stacktrace length check to use token count instead of stacktrace frame count. We also do a sanity check on raw string length first: if it is below the token max, there is no point counting tokens, so the event just passes. We are keeping the old bypass for certain platforms so as not to change current behaviour; once we move to the v2 grouping model, we will enable this check for them as well.
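A minimal sketch of the two-step check described above, not the actual code in src/sentry/seer/similarity/utils.py. The function name, the `MAX_TOKEN_COUNT` value, and the whitespace tokenizer are all hypothetical stand-ins; the real tokenizer and limit are not shown in this PR text. It also assumes a token rarely covers less than one character, so a string shorter than the token limit cannot exceed it.

```python
from typing import Callable

# Hypothetical limit, for illustration only; the real value is not shown in this PR.
MAX_TOKEN_COUNT = 7000


def whitespace_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated chunks."""
    return len(text.split())


def stacktrace_exceeds_token_limit(
    stacktrace_text: str,
    max_token_count: int = MAX_TOKEN_COUNT,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> bool:
    """Return True if the stacktrace should be filtered out for being too long."""
    # Cheap sanity check: a string with fewer characters than the token limit
    # is assumed unable to exceed that limit, so skip the expensive tokenizer.
    if len(stacktrace_text) < max_token_count:
        return False

    # Only longer strings pay for the real token count.
    return count_tokens(stacktrace_text) > max_token_count
```

For example, with `max_token_count=10`, a 5-character string passes without the tokenizer ever running, while a 40-character string is tokenized and filtered only if it yields more than 10 tokens.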

cursor · 2025-11-25T19:51:25Z

src/sentry/seer/similarity/utils.py

-        report_token_count_metric(event, variants, "block_frames")
-        return True
+        report_token_count_metric(event, variants, "pass_string_length")
+        return False


Bug: String length compared to token count without conversion

The code compares string_length (measured in characters) directly against max_token_count (measured in tokens) at line 383. These are different units and cannot be meaningfully compared. Since one token typically represents ~4 characters, a stacktrace with several thousand characters could have far fewer tokens. This comparison will almost always be true, causing most stacktraces to skip the expensive token counting and pass immediately, defeating the token-based filtering logic.

Wrong analysis here. We compare them precisely because a token is ~4 characters, which means that if the string length is already below the max token count, the stacktrace can never exceed the token count limit. By this analysis we could even have made the threshold 4 times the max token count, so what we are doing is actually very conservative.

codecov · 2025-11-25T20:07:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #103997      +/-   ##
===========================================
+ Coverage   76.12%    80.64%   +4.52%     
===========================================
  Files        9312      9318       +6     
  Lines      397283    397962     +679     
  Branches    25357     25357              
===========================================
+ Hits       302433    320954   +18521     
+ Misses      94393     76551   -17842     
  Partials      457       457

lobsterkatie

Two questions, but otherwise LGTM!

src/sentry/seer/similarity/utils.py

lobsterkatie · 2025-11-26T23:48:04Z

src/sentry/seer/similarity/utils.py

-        # Exception-message-based grouping
-        or not hasattr(contributing_component, "frame_counts")


Why remove this fail-fast option?

Well, it seemed like I no longer need it to have frame_counts to be able to do this check, which only needs the raw stacktrace. But maybe if it doesn't have frame_counts, that's indicative of something more important?

Actually, now that variants all have a key property, you could just do something like `if 'stacktrace' not in contributing_variant.key or not contributing_component: ...` and that'd catch all the cases mentioned in the current version of the check.

nice, refactored the conditions there to this 👍

Now that there's no type-check ahead of the mypy-appeasement check, the comment about it doesn't make sense. I'd just remove it (the comment, that is, not the check).
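The fail-fast condition suggested in this thread can be sketched as follows. This is an illustration under stated assumptions, not the merged code: `Variant` is a hypothetical stand-in for Sentry's variant classes, `should_skip_length_check` is an invented name, and the key strings used in the usage note are made-up example values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical stand-in for a grouping variant that exposes a `key` property."""

    key: str


def should_skip_length_check(
    contributing_variant: Variant,
    contributing_component: Optional[object],
) -> bool:
    """Return True when there is no stacktrace to measure, so the check can bail out early."""
    # Variants whose key does not mention "stacktrace" (for example, message-based
    # grouping) and variants with no contributing component both fail fast.
    return "stacktrace" not in contributing_variant.key or not contributing_component
```

With these stand-ins, `should_skip_length_check(Variant(key="app_stacktrace"), object())` returns `False`, while a variant keyed `"app_message"` or a `None` component returns `True`.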

src/sentry/seer/similarity/utils.py

tests/sentry/grouping/seer_similarity/test_seer.py

cursor

Bug: Test expects `stacktrace_type` tag not provided by production code

The tests were updated to expect stacktrace_type in the metric tags passed to get_similarity_data_from_seer, but the production code in get_seer_similar_issues (in src/sentry/grouping/ingest/seer.py) only includes platform, model_version, training_mode, and hybrid_fingerprint in seer_request_metric_tags. Since stacktrace_type is never added to these tags, the test assertions will fail. Either the production code needs to be updated to include stacktrace_type, or these test expectations are incorrect.

tests/sentry/grouping/seer_similarity/test_seer.py#L70-L71

sentry/tests/sentry/grouping/seer_similarity/test_seer.py

Lines 70 to 71 in fcc165e

            "hybrid_fingerprint": False,
            "stacktrace_type": "system",

tests/sentry/grouping/seer_similarity/test_seer.py#L208-L209

sentry/tests/sentry/grouping/seer_similarity/test_seer.py

Lines 208 to 209 in fcc165e

            "hybrid_fingerprint": False,
            "stacktrace_type": "system",

yuvmen requested a review from a team as a code owner November 25, 2025 19:49

github-actions bot added the Scope: Backend label Nov 25, 2025

cursor bot reviewed Nov 25, 2025

vercel bot deployed to Preview November 25, 2025 19:51 View deployment

test fixes

2ed0bfb

vercel bot deployed to Preview November 25, 2025 22:18 View deployment

lobsterkatie approved these changes Nov 26, 2025

pr comments

e036cd6

vercel bot deployed to Preview December 1, 2025 18:36 View deployment

cursor bot reviewed Dec 1, 2025

vercel bot deployed to Preview December 1, 2025 19:18 View deployment

cursor bot reviewed Dec 1, 2025

test fixes

7df4650

yuvmen force-pushed the yuvmen/seer-grouping-token-count-filtering branch from fcc165e to 7df4650 Compare December 1, 2025 19:46

vercel bot deployed to Preview December 1, 2025 19:49 View deployment

change fail-fast conditions to check for contributing_variant.key inc…

af1c991

…luding `stacktrace` makes for a better condition and clear what we are looking for

vercel bot deployed to Preview December 1, 2025 22:34 View deployment

rephrase comment

8cbfab1

vercel bot deployed to Preview December 1, 2025 23:31 View deployment

yuvmen merged commit db9fc5e into master Dec 2, 2025
67 checks passed

yuvmen deleted the yuvmen/seer-grouping-token-count-filtering branch December 2, 2025 17:17
