Skip to content

Conversation

@lynnagara
Copy link
Member

@lynnagara lynnagara commented Dec 31, 2019

These queries yield 10+ columns from Snuba (including all
of the tag data) and are frequently taking a long time.
Instead we now fetch the minimal column set from Snuba, then use
those IDs to populate the event bodies from nodestore. Users of
eventstore.get_events() should no longer manually bind nodes since this
has already been done.

This depends on improvements to the message / search_message field #16473

These queries yield 10+ columns from Snuba (including all
of the tag data) and are frequently timing out in production.
Instead we now fetch the minimal column set from Snuba, then use
those IDs to populate the event bodies from nodestore. Users of
eventstore.get_events() should no longer manually bind nodes since this
has already been done.

self.eventstore = SnubaDiscoverEventStorage()

def test_get_events(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were unnecessary duplicates of snuba/test_backend.py

assert event.data._node_data is not None
assert event.data["user"]["id"] == u"user1"

def test_event_with_no_body(self):
Copy link
Member Author

@lynnagara lynnagara Dec 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't have sampling as a feature anymore, I don't think this is a valid use case. Much of our product already depends on the events being present in nodestore and would otherwise be breaking if the event bodies were missing.

@lynnagara lynnagara requested a review from a team December 31, 2019 19:29
@lynnagara lynnagara changed the title perf(eventstore): Fetch event data from nodestore perf(eventstore): get_events fetches event data from nodestore Dec 31, 2019
This code didn't previously work (user should be an interface of an
event), it just happens to not currently be called in any situation
where snuba data isn't populated.
@lynnagara lynnagara force-pushed the fetch-events-from-nodestore branch from 627bdc0 to 153e727 Compare December 31, 2019 21:58
@lynnagara lynnagara force-pushed the fetch-events-from-nodestore branch from ff86950 to 2b2b77f Compare December 31, 2019 22:16
return self.snuba_data["message"]
return self.data.get("message")
from sentry.event_manager import EventManager
event_manager = EventManager(self.data.data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't EventManager depending on Event itself ? Could we avoid this dependency by moving get_search_message ? search_message feels like a property of an event.


if "error" not in result:
return [SnubaEvent(evt) for evt in result["data"]]
events = [SnubaEvent(evt) for evt in result["data"]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on how you noticed the performance issue? Knowing that the timeout is around 30 seconds, I would be surprised if the culprit was fetching multiple columns instead and I would be less surprised if the issue was the select process, which means the queries are still likely to timeout. Could this be related to the max_thread parameter?

Copy link
Member Author

@lynnagara lynnagara Jan 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more about increasing the speed of existing queries, than reducing the count of timed out queries (hopefully we don't have too many of those). I've seen a number of transactions that look similar to this https://sentry.io/organizations/sentry/eventsv2/snuba:7d55be78a00d43329937c42fcba9d2d5. The 2nd query is significantly slower than the first since it's getting all columns . I've changed the get_events implementation to only ever get the minimal columns from Snuba and the rest from Nodestore.

if "error" not in result:
return [SnubaEvent(evt) for evt in result["data"]]
events = [SnubaEvent(evt) for evt in result["data"]]
self.bind_nodes(events)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way we would be doing one request to bigtable per event. I guess, it would be more efficient to rely on nodestore.get_multi that makes one query and streams the results.
Also would the performance improvement be still relevant if we fetch like 1000 events (which means blasting bigtable with 1000 requests)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were 2 place where we were previously fetching more than 100 events:

  • unmerge: This fetches 500 events at a time, since it was already doing bind_nodes immediately after the eventstore fetch (which has been removed) there should be no difference in performance
  • nodestore deletions: Rewrote to not use eventstore since we only need to get 2 columns

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are doing a multi_get (which we are) I think we should be fine with that. But we should keep an eye on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place we are adding a multi get where where we weren't (at some later point outside eventstore) is in the group deletions task, we should keep an eye on that.

@lynnagara lynnagara force-pushed the fetch-events-from-nodestore branch from 10f9d5a to 97882c0 Compare January 6, 2020 21:14
if "error" not in result:
return [SnubaEvent(evt) for evt in result["data"]]
events = [SnubaEvent(evt) for evt in result["data"]]
self.bind_nodes(events)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are doing a multi_get (which we are) I think we should be fine with that. But we should keep an eye on this.

Comment on lines 82 to 90
event = SnubaEvent(
{
"timestamp": result["latest_event_timestamp"],
"event_id": result["event_id"],
"group_id": group_id,
"project_id": project_id,
}
)
event.bind_node_data()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this become after SnubaEvent is gone ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been fixed in #16286, will rebase to pull in those changes

referrer="deletions.group",
orderby=["-timestamp", "-event_id"],
)
)["data"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should bypass eventstore to fetch events. See my comment on PR #16284.

Let's first see if loading everything from eventstore is a problem. Maybe we are fine with a multi_get. If it is a problem we should revisit the issue of loading a small number of fields on a big number of events through eventstore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll revert this commit, and we can observe if it proves to be a problem before attempting something like this.

Copy link
Contributor

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So after the discussion we had I believe you are going to still support binding nodes in some cases. RIght ?

Copy link
Contributor

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing tests are related.

def get_unfetched_events(
self, filter, orderby=None, limit=100, offset=0, referrer="eventstore.get_unfetched_events"
):
raise NotImplementedError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring here is specifically important to explain the difference with get_events

return self._snuba_data[column]
return super(Event, self).location

def get_search_message(self, event_metadata=None, culprit=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change actually needed for fetching data from nodestore ?

Copy link
Member Author

@lynnagara lynnagara Jan 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a dependency. This was kind of a hack before since the message from snuba is different from the message from nodestore. We're no longer providing a way to explicitly say that you want the column to be loaded from snuba (a good thing since this was being abused before and fields should be the same regardless of the source they are loaded from). I'm going to try to improve this a bit more in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #16473 for a more proper fix for the event.message / event.search_message behaviour

Get events from Snuba.
Get events from Snuba, with node data loaded.
"""
return self.get_events(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a call to __get_events instead ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhhh, yes it should be

Since search_message is distinct from message, we should have them as
separate properties of Event, rather than use them interchangably.

Search_message is the internal only property for search, while message is
the publicly exposed property.

We temporarily save search_message into the message field in
event_manager.save() to maintain backward compatibility with Django
event format.
@lynnagara lynnagara merged commit 6635af5 into master Jan 16, 2020
@lynnagara lynnagara deleted the fetch-events-from-nodestore branch January 16, 2020 22:03
lynnagara added a commit that referenced this pull request Jan 17, 2020
lynnagara added a commit that referenced this pull request Jan 17, 2020
This reverts eventstore changes introduced in #16230, which bind node data. Since loading from nodestore is slower than expected we will attempt this change again once node data caching is introduced.

I am not using reverting the entire commit since we still want to keep the search_message / message improvements that were included in the original.
@github-actions github-actions bot locked and limited conversation to collaborators Dec 19, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants