perf(eventstream): Add caching layer, raise sort #103756
Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff            @@
##            master   #103756     +/-  ##
==========================================
  Coverage    80.59%    80.60%
==========================================
  Files         9274      9279      +5
  Lines       395956    396164    +208
  Branches     25250     25250
==========================================
+ Hits        319138    319320    +182
- Misses       76389     76415     +26
  Partials       429       429
```
Two changes:
1. Add a caching layer so that we can avoid hitting the DB for hot groups (a rough sketch follows below).
2. We don't need the sort in 95% of cases. Let's raise that into Python and only do it when necessary.
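As a rough illustration of the caching-layer idea, here is a minimal read-through cache in front of the group lookup. This is a sketch under assumptions, not the PR's implementation: the cache key format, the timeout, and the `fetch_group_times` helper are all hypothetical.

```python
# Hedged sketch of a read-through cache for hot groups. The key format,
# timeout, and fetch_group_times helper are hypothetical, not the PR's code.
from django.core.cache import cache

CACHE_TIMEOUT = 60  # seconds; a hot group hits postgres at most once per minute


def fetch_group_times(group_ids):
    """Stand-in for the real postgres lookup (hypothetical)."""
    raise NotImplementedError


def get_group_times(group_ids):
    results, missing = {}, []
    for group_id in group_ids:
        cached = cache.get(f"eventstream:group:{group_id}")  # hypothetical key
        if cached is not None:
            results[group_id] = cached
        else:
            missing.append(group_id)
    if missing:
        # Only the cache misses fall through to the database.
        for group_id, value in fetch_group_times(missing).items():
            cache.set(f"eventstream:group:{group_id}", value, CACHE_TIMEOUT)
            results[group_id] = value
    return results
```

With a cache like this in front of the lookup, the sort can also be skipped on the hot path and performed in Python only when a caller actually needs ordered results, per change 2.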
wedamija left a comment:
I have some broader concerns about how we're doing this. It looks like for every snuba query that uses SnubaQueryParams we're potentially making multiple calls to postgres? The caching will help here, but we've still added thousands of queries a second to the database. Let's see if this drops a lot, and then decide whether we need to revisit this solution in general.
I think it's surprising behaviour that just initializing SnubaQueryParams causes postgres queries.
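One common way to avoid that surprise, sketched here with a hypothetical stand-in class rather than Sentry's real SnubaQueryParams, is to defer the lookup with `functools.cached_property` so construction does no DB work:

```python
# Hypothetical sketch: defer DB access out of __init__ so constructing
# the params object is cheap; postgres is hit only on first access.
from functools import cached_property


def fetch_group_times(group_ids):
    """Stand-in for the real postgres lookup (hypothetical)."""
    raise NotImplementedError


class QueryParams:  # illustrative stand-in, not Sentry's SnubaQueryParams
    def __init__(self, group_ids):
        self.group_ids = group_ids  # no queries at construction time

    @cached_property
    def group_times(self):
        # Runs at most once per instance, on first attribute access.
        return fetch_group_times(self.group_ids)
```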
```python
# excerpt from the diff; assumes: from datetime import UTC, datetime, timedelta
running_data = {
    (group_id, datetime.now(UTC) + timedelta(minutes=1)) for group_id in group_ids
}
```
Should this just be a dict of <group_id>: <max_date>? Then when we add rows later, we just do running_data[group_id] = max(running_data[group_id], new_date)?
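A minimal, runnable sketch of that dict shape (the names and seed values are illustrative, not the PR's code):

```python
# Hypothetical sketch of the dict-based alternative: track the maximum
# date per group rather than accumulating (group_id, date) tuples in a set.
from datetime import UTC, datetime, timedelta

group_ids = [1, 2, 3]  # illustrative

running_data = {
    group_id: datetime.now(UTC) + timedelta(minutes=1) for group_id in group_ids
}


def add_row(group_id, new_date):
    # Keep only the latest date seen for each group.
    running_data[group_id] = max(running_data.get(group_id, new_date), new_date)
```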
I find the set easier to reason about, and I don't think there'll be that much duplication (we only expect one Redirect per group).