Large events strategy #70342

Open · lhunath opened this issue Apr 30, 2024 · 9 comments

lhunath commented Apr 30, 2024

Problem Statement

https://develop.sentry.dev/sdk/envelopes/#size-limits
Reading the docs on envelope size limits has me worried: what is the action taken by Sentry should my events reach the limit? The docs do not specify.
As such, I need to assume the worst-case scenario: the server will drop the envelope and all events will be lost.

Is this correct, or will the Sentry server trim the envelope intelligently until it fits within the limit? Since the network data has already been consumed by that point anyway, a blind drop saves the server nothing beyond storage.

If a drop /is/ what happens, will I learn about it somehow, such as through a drop event recorded in its stead?

Since I attach runtime logs to the event as breadcrumbs, there is a real possibility that a long or heavy runtime might lead to hitting the limit. It's quite important that I can be aware of this, rather than having a blind spot and unknowingly shipping a mechanism that silently ignores crashes from a certain segment of the user base.

Solution Brainstorm

We do have SentryOptions.maxBreadcrumbs, an imperfect guard. The default maximum is set to 100, which feels low as a count of log statements. Of course, it is also no guarantee that the envelope will stay within the server-defined limits. Furthermore, the documentation leaves undefined what happens when the limit is hit: will new breadcrumbs evict old ones (sounds good), or will new breadcrumbs be dropped once the limit is reached (sounds bad)?
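For reference, a minimal sketch of how that option is set today (the DSN is a placeholder; raising the ceiling only keeps more breadcrumbs around, so it shifts the size problem rather than solving it):

```swift
import Sentry

SentrySDK.start { options in
    options.dsn = "https://examplePublicKey@o0.ingest.sentry.io/0"  // placeholder DSN
    // Default is 100; a higher ceiling keeps more log breadcrumbs,
    // but also makes it easier to approach the envelope size limit.
    options.maxBreadcrumbs = 500
}
```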

We can use SentryOptions.beforeSend to introspect the event before send-out and assess its size. Theoretically this is an opportunity for us to vet it against the limit and trim it manually before sending, but there are two issues: [1] we do not know what the current limits are (we could hard-code the 1 MB event limit from the docs), [2] we do not know the byte size of the event. The best proxy is event.serialize(), which yields us a dictionary, but AFAIK Sentry itself uses an internal MsgPack implementation that we cannot access, so at best we can JSON-serialize it or over-count with something like event.serialize().description.utf8.count.
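A rough sketch of that workaround under those caveats (the DSN is a placeholder, the 1 MB figure is hard-coded from the docs, and JSON size is only an approximation of whatever encoding the SDK actually puts on the wire):

```swift
import Sentry

// Hard-coded from the documented 1 MB event limit; the real server limits
// are not exposed to the SDK, so this is only an approximation.
let approximateEventLimit = 1_000_000

SentrySDK.start { options in
    options.dsn = "https://examplePublicKey@o0.ingest.sentry.io/0"  // placeholder DSN
    options.beforeSend = { event in
        // event.serialize() yields a dictionary; JSON-encoding it gives a byte
        // count that may over- or under-estimate the actual envelope size.
        func estimatedSize(of event: Event) -> Int {
            let dict = event.serialize()
            guard JSONSerialization.isValidJSONObject(dict),
                  let data = try? JSONSerialization.data(withJSONObject: dict) else {
                return 0
            }
            return data.count
        }

        // Drop the oldest breadcrumbs until the estimate fits under the limit.
        // Re-serializing on every iteration is wasteful, but fine for a sketch.
        while estimatedSize(of: event) > approximateEventLimit,
              var crumbs = event.breadcrumbs, !crumbs.isEmpty {
            crumbs.removeFirst()
            event.breadcrumbs = crumbs
        }
        return event
    }
}
```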

If I had to assess who is best equipped to deal with the problem, then my list from best-to-worst would be:

  1. The Sentry server
  2. The Sentry SDK
  3. The app developer

And the strategies, from best to worst, would be:

  1. Trim the envelope intelligently/incrementally, dropping oldest/least relevant data points first.
  2. Replace with an "envelope limit reached" event that contains some basic details.
  3. Drop the envelope silently, ignoring the events with no recourse.

Are you willing to submit a PR?

No response


lhunath commented Apr 30, 2024

Worth mentioning getsentry/sentry-java#910


lhunath commented Apr 30, 2024

What's curious to note is that when I add > 100 breadcrumbs to the scope, Sentry stops invoking beforeSend:

[Sentry] [debug] [SentryCrashStackCursor_SelfThread:61] Retrieving backtrace with async swift stitching...
[Sentry] [debug] [SentryCrashStackCursor_SelfThread:78] Finished retrieving backtrace.

At this point, adding < 100 breadcrumbs pauses execution inside the beforeSend callback as expected.
Adding > 100 breadcrumbs skips the callback and proceeds directly to sending the event off:

[Sentry] [debug] [SentryFileManager:321] Writing envelope to path: /Users/hubstaff/Library/Developer/CoreSimulator/Devices/🆔/data/Containers/Data/Application/🆔/Library/Caches/io.sentry/a999994c4ff1a25927f709eb50500057d6cdae95/envelopes/1714509134.833000-00000-🆔.json
[...]

I should note that I have SentryOptions.maxBreadcrumbs set to 500, and still I am seeing this odd behaviour.

That clearly suggests a bug to me: my beforeSend should not be skipped, as it does important things.

As a result, it is now also impossible to use the beforeSend call-back to intelligently trim the size of the event to remain within the server-documented size limit.
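For reference, a minimal sketch of the kind of setup that shows this for me (placeholder DSN; the exact breadcrumb contents don't matter, and the addBreadcrumb spelling may vary slightly between SDK versions):

```swift
import Sentry

SentrySDK.start { options in
    options.dsn = "https://examplePublicKey@o0.ingest.sentry.io/0"  // placeholder DSN
    options.debug = true
    options.maxBreadcrumbs = 500
    options.beforeSend = { event in
        print("beforeSend invoked")  // not reached once > 100 breadcrumbs were added
        return event
    }
}

// Add more than 100 breadcrumbs, still well under the configured maximum of 500.
for i in 0..<150 {
    let crumb = Breadcrumb()
    crumb.message = "log line \(i)"
    SentrySDK.addBreadcrumb(crumb)
}

SentrySDK.capture(message: "test event")
```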


kahest commented May 2, 2024

Hey @lhunath thanks for reaching out - the product docs on size limits offer more details about the handling. In short, there's an overall size limit for the whole compressed/uncompressed event; events that exceed it are dropped. Beyond that, individual fields are trimmed as applicable.

The number of events dropped for exceeding the overall size limit is shown as part of the "Dropped" stat on the Stats page.

Regarding the breadcrumbs limit - yes, old breadcrumbs will be removed as new ones are added (FIFO, oldest dropped first).


kahest commented May 2, 2024

What's curious to note is that when I add > 100 breadcrumbs to the scope, Sentry stops invoking beforeSend:

This is a bug - thanks for opening a separate issue, we'll investigate!


lhunath commented May 2, 2024

Thank you, I regret I was unable to find this information on my own from reading the docs. Perhaps some additional pointers about the server behaviour and the dropped-events statistics could be added in the places where they are currently missing.

Unfortunately, the dropped-events stat is rather ambiguous, and there's very little clarity about what caused an event to be dropped or what it was about.
It would appear that the Sentry SDK is in a good position to attempt some form of drop recovery, for example by uploading a minimal stub with some minimally safe information about the event, when the drop results from a known and recoverable situation such as the envelope size limit.

At a minimum, it might be valuable to expose an onError callback to the developer, informing them of the circumstance and giving them an opportunity to accommodate.
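A hypothetical shape for such a hook, purely for illustration (nothing like onEventDropped or DropReason exists in the SDK today; both names are made up):

```swift
// Hypothetical API sketch - not part of the Sentry SDK today.
// `onEventDropped` and `DropReason` are invented names for illustration only.
enum DropReason {
    case envelopeSizeLimitExceeded(estimatedBytes: Int)
    case other(String)
}

// Imagined addition to SentryOptions:
// options.onEventDropped = { (event: Event, reason: DropReason) in
//     // The app could log the occurrence, or capture a minimal stub event
//     // carrying only the crash signature and a note about the drop.
// }
```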


kahest commented May 2, 2024

Thanks for the feedback about the docs - I'll see what we can do to make this info easier to find 👍

The overall topic would involve all SDKs, the API, ingestion, etc., so I'll cycle back to the larger team and may move this issue to a different repo later.


kahest commented May 6, 2024

related: #59320


getsantry bot commented May 6, 2024

Assigning to @getsentry/support for routing ⏲️

brianthi transferred this issue from getsentry/sentry-cocoa May 6, 2024

getsantry bot commented May 6, 2024

Routing to @getsentry/product-owners-stats for triage ⏲️
