fix(nodestore): Avoid full bigtable scans in get_multi by kcons · Pull Request #91976 · getsentry/sentry

kcons · 2025-05-20T20:40:05Z

Add an id_list deduplication so that we can correctly recognize cache hits and avoid full table scans in bigtable.
Add a test that demonstrates the fix.

We observed some timeouts in nodestore.get_multi calls, with lookups for a small number of IDs taking 40s+.
One commonality was that they were lookups that included duplicated IDs and cache hits; this is because we
recognize full cache hits by comparing cached records to len(id_list), return if they are equal, and otherwise
pass on id_list with cached IDs removed to bigtable get_bytes_multi. With dupes removed, id_list is empty,
and the resulting row_scan to bigtable is unconstrained, meaning it either won't finish, or will return irrelevant data.
Duplicates aren't visible in the result, so deduping early avoid this issue and any others that might result from repeated keys.

codecov · 2025-05-20T20:56:18Z

Codecov Report

Attention: Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.

⚠️ Parser warning

The parser emitted a warning. Please review your JUnit XML file:

Warning while parsing testcase attributes: Limit of string is 1000 chars, for name, we got 2083 at 1:157236 in /home/runner/work/sentry/sentry/.artifacts/pytest.junit.xml

Files with missing lines	Patch %	Lines
src/sentry/utils/kvstore/bigtable.py	33.33%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #91976      +/-   ##
==========================================
+ Coverage   81.60%   87.63%   +6.02%     
==========================================
  Files       10355    10355              
  Lines      587356   587463     +107     
  Branches    22586    22586              
==========================================
+ Hits       479306   514810   +35504     
+ Misses     107623    72226   -35397     
  Partials      427      427

wedamija

It might be a good idea to put a warning or something inside of _get_bytes_multi too if we pass an empty list of ids

kcons · 2025-05-20T22:37:36Z

@wedamija I think all the other many/multi implementations do a reasonable thing with an empty list.. thoughts on just returning here, maybe with a warning that you probably didn't mean to ask for nothing?

wedamija · 2025-05-20T23:01:37Z

@wedamija I think all the other many/multi implementations do a reasonable thing with an empty list.. thoughts on just returning here, maybe with a warning that you probably didn't mean to ask for nothing?

That seems reasonable to me!

…d expect

Add an id_list deduplication so that we can correctly recognize cache hits and avoid full table scans in bigtable. Add a test that demonstrates the fix. We observed some timeouts in nodestore.get_multi calls, with lookups for a small number of IDs taking 40s+. One commonality was that they were lookups that included duplicated IDs and cache hits; this is because we recognize full cache hits by comparing cached records to len(id_list), return if they are equal, and otherwise pass on id_list with cached IDs removed to bigtable get_bytes_multi. With dupes removed, id_list is empty, and the resulting row_scan to bigtable is unconstrained, meaning it either won't finish, or will return irrelevant data. Duplicates aren't visible in the result, so deduping early avoid this issue and any others that might result from repeated keys.

fix(nodestore): Avoid keyless bigtable scans in get_multi

74a0036

github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label May 20, 2025

wedamija approved these changes May 20, 2025

View reviewed changes

Add a warning to BigTableKVStorage.get_multi and make it do what you'…

6fe5bc1

…d expect

vercel Bot deployed to Preview May 20, 2025 23:06 View deployment

kcons merged commit d0ab122 into master May 21, 2025
60 checks passed

kcons deleted the kcons/nsfix branch May 21, 2025 15:54

github-actions Bot locked and limited conversation to collaborators Jun 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(nodestore): Avoid full bigtable scans in get_multi#91976

fix(nodestore): Avoid full bigtable scans in get_multi#91976
kcons merged 2 commits into
masterfrom
kcons/nsfix

kcons commented May 20, 2025

Uh oh!

codecov Bot commented May 20, 2025 •

edited

Loading

Uh oh!

wedamija left a comment

Uh oh!

kcons commented May 20, 2025

Uh oh!

wedamija commented May 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

kcons commented May 20, 2025

Uh oh!

codecov Bot commented May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

⚠️ Parser warning

Uh oh!

wedamija left a comment

Choose a reason for hiding this comment

Uh oh!

kcons commented May 20, 2025

Uh oh!

wedamija commented May 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented May 20, 2025 •

edited

Loading