Skip to content

fix(nodestore): Avoid full bigtable scans in get_multi#91976

Merged
kcons merged 2 commits into
masterfrom
kcons/nsfix
May 21, 2025
Merged

fix(nodestore): Avoid full bigtable scans in get_multi#91976
kcons merged 2 commits into
masterfrom
kcons/nsfix

Conversation

@kcons

@kcons kcons commented May 20, 2025

Copy link
Copy Markdown
Member

Add an id_list deduplication so that we can correctly recognize cache hits and avoid full table scans in bigtable.
Add a test that demonstrates the fix.

We observed some timeouts in nodestore.get_multi calls, with lookups for a small number of IDs taking 40s+.
One commonality was that they were lookups that included duplicated IDs and cache hits; this is because we
recognize full cache hits by comparing cached records to len(id_list), return if they are equal, and otherwise
pass on id_list with cached IDs removed to bigtable get_bytes_multi. With dupes removed, id_list is empty,
and the resulting row_scan to bigtable is unconstrained, meaning it either won't finish, or will return irrelevant data.
Duplicates aren't visible in the result, so deduping early avoid this issue and any others that might result from repeated keys.

@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label May 20, 2025
@codecov

codecov Bot commented May 20, 2025

Copy link
Copy Markdown

Codecov Report

Attention: Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.

⚠️ Parser warning

The parser emitted a warning. Please review your JUnit XML file:

Warning while parsing testcase attributes: Limit of string is 1000 chars, for name, we got 2083 at 1:157236 in /home/runner/work/sentry/sentry/.artifacts/pytest.junit.xml
Files with missing lines Patch % Lines
src/sentry/utils/kvstore/bigtable.py 33.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #91976      +/-   ##
==========================================
+ Coverage   81.60%   87.63%   +6.02%     
==========================================
  Files       10355    10355              
  Lines      587356   587463     +107     
  Branches    22586    22586              
==========================================
+ Hits       479306   514810   +35504     
+ Misses     107623    72226   -35397     
  Partials      427      427              

@wedamija wedamija left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to put a warning or something inside of _get_bytes_multi too if we pass an empty list of ids

@kcons

kcons commented May 20, 2025

Copy link
Copy Markdown
Member Author

@wedamija I think all the other many/multi implementations do a reasonable thing with an empty list.. thoughts on just returning here, maybe with a warning that you probably didn't mean to ask for nothing?

@wedamija

Copy link
Copy Markdown
Member

@wedamija I think all the other many/multi implementations do a reasonable thing with an empty list.. thoughts on just returning here, maybe with a warning that you probably didn't mean to ask for nothing?

That seems reasonable to me!

@kcons kcons merged commit d0ab122 into master May 21, 2025
60 checks passed
@kcons kcons deleted the kcons/nsfix branch May 21, 2025 15:54
andrewshie-sentry pushed a commit that referenced this pull request Jun 2, 2025
Add an id_list deduplication so that we can correctly recognize cache
hits and avoid full table scans in bigtable.
Add a test that demonstrates the fix.

We observed some timeouts in nodestore.get_multi calls, with lookups for
a small number of IDs taking 40s+.
One commonality was that they were lookups that included duplicated IDs
and cache hits; this is because we
recognize full cache hits by comparing cached records to len(id_list),
return if they are equal, and otherwise
pass on id_list with cached IDs removed to bigtable get_bytes_multi.
With dupes removed, id_list is empty,
and the resulting row_scan to bigtable is unconstrained, meaning it
either won't finish, or will return irrelevant data.
Duplicates aren't visible in the result, so deduping early avoid this
issue and any others that might result from repeated keys.
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 6, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants