perf(eventstore): get_events fetches event data from nodestore #16230
Conversation
These queries yield 10+ columns from Snuba (including all of the tag data) and are frequently timing out in production. Instead we now fetch the minimal column set from Snuba, then use those IDs to populate the event bodies from nodestore. Users of eventstore.get_events() no longer need to manually bind nodes, since this is now done internally.
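Roughly, the new flow looks like the sketch below. The column list and the exact snuba/eventstore call signatures here are illustrative, not the shipped implementation:

```python
from sentry.utils import snuba

# Illustrative minimal column set; the real list lives in the eventstore backend.
MINIMAL_COLUMNS = ["event_id", "project_id", "group_id", "timestamp"]

def get_events(self, filter, orderby=None, limit=100, offset=0, referrer="eventstore.get_events"):
    # Only ask Snuba for the lightweight identifying columns...
    result = snuba.raw_query(
        selected_columns=MINIMAL_COLUMNS,
        start=filter.start,
        end=filter.end,
        conditions=filter.conditions,
        filter_keys=filter.filter_keys,
        orderby=orderby,
        limit=limit,
        offset=offset,
        referrer=referrer,
    )
    if "error" in result:
        return []
    events = [SnubaEvent(row) for row in result["data"]]
    # ...then hydrate the full event bodies from nodestore in one batch,
    # so callers no longer need to call bind_nodes themselves.
    self.bind_nodes(events)
    return events
```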
self.eventstore = SnubaDiscoverEventStorage()

def test_get_events(self):
These were unnecessary duplicates of snuba/test_backend.py
tests/snuba/models/test_event.py (outdated)
assert event.data._node_data is not None
assert event.data["user"]["id"] == u"user1"

def test_event_with_no_body(self):
Since we don't have sampling as a feature anymore, I don't think this is a valid use case. Much of our product already depends on the events being present in nodestore and would break if the event bodies were missing.
This code didn't previously work (user should be an interface of an event); it just happens not to be called in any situation where snuba data isn't populated.
src/sentry/models/event.py (outdated)

return self.snuba_data["message"]
return self.data.get("message")
from sentry.event_manager import EventManager
event_manager = EventManager(self.data.data)
Doesn't EventManager depend on Event itself? Could we avoid this dependency by moving get_search_message? search_message feels like a property of an event.
if "error" not in result:
    return [SnubaEvent(evt) for evt in result["data"]]
    events = [SnubaEvent(evt) for evt in result["data"]]
Could you elaborate on how you noticed the performance issue? Knowing that the timeout is around 30 seconds, I would be surprised if the culprit was fetching multiple columns; I would be less surprised if the issue was the select process, which means the queries are still likely to time out. Could this be related to the max_thread parameter?
I think this is more about increasing the speed of existing queries than reducing the count of timed-out queries (hopefully we don't have too many of those). I've seen a number of transactions that look similar to this: https://sentry.io/organizations/sentry/eventsv2/snuba:7d55be78a00d43329937c42fcba9d2d5. The second query is significantly slower than the first since it fetches all columns. I've changed the get_events implementation to only ever get the minimal columns from Snuba and the rest from nodestore.
| if "error" not in result: | ||
| return [SnubaEvent(evt) for evt in result["data"]] | ||
| events = [SnubaEvent(evt) for evt in result["data"]] | ||
| self.bind_nodes(events) |
This way we would be doing one request to bigtable per event. I guess it would be more efficient to rely on nodestore.get_multi, which makes one query and streams the results.
Also, would the performance improvement still be relevant if we fetch something like 1000 events (which means blasting bigtable with 1000 requests)?
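For reference, a sketch of a batched bind_nodes built on nodestore.get_multi; the NodeData attribute shape and the bind_data hook are assumptions about the surrounding code, not a definitive implementation:

```python
from sentry import nodestore

def bind_nodes(self, object_list, node_name="data"):
    # Collect the node ID for every event whose body still needs loading.
    object_node_list = [
        (item, getattr(item, node_name).id)
        for item in object_list
        if getattr(item, node_name).id is not None
    ]
    # One batched nodestore (bigtable) call instead of one request per event.
    node_results = nodestore.get_multi([node_id for _, node_id in object_node_list])
    for item, node_id in object_node_list:
        data = node_results.get(node_id) or {}
        getattr(item, node_name).bind_data(data)
```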
There were two places where we were previously fetching more than 100 events:
- unmerge: This fetches 500 events at a time. Since it was already doing bind_nodes immediately after the eventstore fetch (which has now been removed), there should be no difference in performance.
- nodestore deletions: Rewrote this to not use eventstore, since we only need to get 2 columns.
If we are doing a multi_get (which we are) I think we should be fine with that. But we should keep an eye on this.
The only place where we are adding a multi get that we weren't already doing (at some later point outside eventstore) is in the group deletions task; we should keep an eye on that.
| if "error" not in result: | ||
| return [SnubaEvent(evt) for evt in result["data"]] | ||
| events = [SnubaEvent(evt) for evt in result["data"]] | ||
| self.bind_nodes(events) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are doing a multi_get (which we are) I think we should be fine with that. But we should keep an eye on this.
event = SnubaEvent(
    {
        "timestamp": result["latest_event_timestamp"],
        "event_id": result["event_id"],
        "group_id": group_id,
        "project_id": project_id,
    }
)
event.bind_node_data()
What does this become after SnubaEvent is gone?
This has been fixed in #16286; I will rebase to pull in those changes.
| referrer="deletions.group", | ||
| orderby=["-timestamp", "-event_id"], | ||
| ) | ||
| )["data"] |
I don't think we should bypass eventstore to fetch events. See my comment on PR #16284.
Let's first see if loading everything from eventstore is a problem. Maybe we are fine with a multi_get. If it is a problem, we should revisit the issue of loading a small number of fields on a large number of events through eventstore.
I'll revert this commit, and we can observe if it proves to be a problem before attempting something like this.
This reverts commit 8c12bde.
fpacifici left a comment:
So after the discussion we had, I believe you are still going to support binding nodes in some cases. Right?
fpacifici left a comment:
Failing tests are related.
def get_unfetched_events(
    self, filter, orderby=None, limit=100, offset=0, referrer="eventstore.get_unfetched_events"
):
    raise NotImplementedError
A docstring here is especially important, to explain the difference from get_events.
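Something along these lines might do; the wording is only a suggestion:

```python
def get_unfetched_events(
    self, filter, orderby=None, limit=100, offset=0, referrer="eventstore.get_unfetched_events"
):
    """
    Same as get_events, but skips loading node (event body) data from nodestore.

    The returned events only carry the minimal Snuba columns; callers that
    need the full event payload must bind node data themselves. Intended for
    callers (e.g. deletions) that only need identifying fields and would
    otherwise pay for a nodestore round trip on every batch.
    """
    raise NotImplementedError
```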
src/sentry/eventstore/models.py (outdated)

return self._snuba_data[column]
return super(Event, self).location

def get_search_message(self, event_metadata=None, culprit=None):
Is this change actually needed for fetching data from nodestore?
It's a dependency. This was kind of a hack before since the message from snuba is different from the message from nodestore. We're no longer providing a way to explicitly say that you want the column to be loaded from snuba (a good thing since this was being abused before and fields should be the same regardless of the source they are loaded from). I'm going to try to improve this a bit more in a separate PR.
See #16473 for a more proper fix for the event.message / event.search_message behaviour
Get events from Snuba.
Get events from Snuba, with node data loaded.
"""
return self.get_events(
Is this meant to be a call to __get_events instead?
uhhh, yes it should be
Since search_message is distinct from message, we should have them as separate properties of Event, rather than use them interchangeably. search_message is the internal-only property used for search, while message is the publicly exposed property. We temporarily save search_message into the message field in event_manager.save() to maintain backward compatibility with the Django event format.
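A rough sketch of keeping the two side by side; the attribute names and the fallback are assumptions for illustration, not the shipped code:

```python
class Event(object):
    def __init__(self, data, search_message=None):
        self.data = data                        # event body (nodestore payload)
        self._search_message = search_message   # internal value computed at save time

    @property
    def message(self):
        # Publicly exposed message, as stored in the event body.
        return self.data.get("message")

    @property
    def search_message(self):
        # Internal-only value used for search; falls back to the stored message,
        # mirroring how event_manager.save() currently writes the search value
        # into the "message" field for backward compatibility.
        return self._search_message or self.message
```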
This reverts commit 7ee79ad.
This reverts the eventstore changes introduced in #16230, which bind node data. Since loading from nodestore is slower than expected, we will attempt this change again once node data caching is introduced. I am not reverting the entire commit since we still want to keep the search_message / message improvements that were included in the original.
These queries yield 10+ columns from Snuba (including all of the tag data) and are frequently taking a long time. Instead we now fetch the minimal column set from Snuba, then use those IDs to populate the event bodies from nodestore. Users of eventstore.get_events() should no longer manually bind nodes since this has already been done.
This depends on improvements to the message / search_message field (#16473).