Handle Rate Limiting for Replay Events #6710
Comments
Personally, I think option B is the safest & most "correct" one. Keeping stuff around when we hit rate limits is potentially problematic, so I think option B would cover us best.
I think it depends a bit on how long rate limits last -- if they are short, B/C are viable. If they are long (in the minutes), then maybe A. It would be great to track rate limit start/end events, but this also assumes that the user stays active through the rate limiting period.
Internal Notion doc with more input: https://www.notion.so/sentry/Replay-Rate-Limiting-Replay-Quality-853db9e906f44c6e94c1f416d7e2eb8f
@Lms24 I think in the short term option B sounds good. Long-term, a solution that combines B + C would be ideal, where 1) we can show the user when they get rate limited and 2) we keep the events that occur during the rate limiting. We should be able to buffer the events, turn off flushing for that time period, and reset the buffer if it gets too large?
Closing with the decision made; we can revisit if need be.
Problem Statement
For all Sentry events, we currently do nothing if an event is not ingested by the Sentry backend due to rate limits. We of course respect the retry-after time, during which we don't send events, but we don't retry sending the original event. This is fine for regular events (errors, transactions, sessions) as they are mostly atomic. For Replay, however, this is not the case, as we're sending multiple events in one replay. If one segment goes missing, the replay cannot be continued after this segment, as one (or multiple) diffs would be missing.
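For context, respecting a retry-after window roughly means tracking a "disabled until" timestamp from the server's response. A minimal sketch, assuming a Fetch-style `Retry-After` header in seconds and a 60s fallback; the class and field names here are illustrative, not the actual Sentry SDK internals:

```typescript
// Hypothetical sketch: tracking a rate-limit window from a 429 response.
// The 60s default and the class shape are assumptions for illustration.
class RateLimitState {
  private disabledUntil = 0;

  // Record a rate limit from a Retry-After header (seconds); fall back
  // to 60s when the header is missing or unparsable.
  update(retryAfterHeader: string | null, now: number = Date.now()): void {
    const seconds = Number(retryAfterHeader);
    const delayMs = Number.isFinite(seconds) && seconds > 0 ? seconds * 1000 : 60_000;
    this.disabledUntil = Math.max(this.disabledUntil, now + delayMs);
  }

  // True while events must not be sent.
  isRateLimited(now: number = Date.now()): boolean {
    return now < this.disabledUntil;
  }
}
```

The Replay-specific question is what to do with segments produced while `isRateLimited()` is true, which the options below address.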
Solution Brainstorm
We have a couple of options for how to handle replays and replay events when we hit a rate limit:
Option A: Splitting Replays
When we hit a rate limit, we pause the replay and, once the retry-after period has expired, start a new replay with a new checkout. The obvious question here is: can we link the two (or more) replays effectively? This will probably require additional complexity in the SDK and in the Sentry Replay UI, and possibly also in replay event ingestion (not sure here...).
Pros:
Cons:
Option B: Pausing the Replay
When we hit a rate limit, we pause the replay and continue the same replay after the rate limit period has expired. When we restart, we take a full snapshot, which should theoretically make it possible to continue the replay even though we obviously missed segments during the rate limit period. IIRC this should work out, and users would basically see a paused/inactive period of time.
Q: Can we show users in the UI that the "missing" segments are due to rate limits? What information do we need to pass along? And when?
Pros:
Cons:
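In SDK terms, option B could look roughly like the sketch below. All names here (`Recorder`, `stop`, `start`, the `fullSnapshot` flag) are hypothetical stand-ins, not the real Replay integration API:

```typescript
// Illustrative sketch of option B: stop recording for the rate-limit
// window, then resume the *same* replay with a fresh full snapshot.
// The interface and handler are assumptions, not Sentry internals.
interface Recorder {
  stop(): void;
  start(options: { fullSnapshot: boolean }): void;
}

class ReplayRateLimitHandler {
  constructor(private recorder: Recorder) {}

  onRateLimit(retryAfterMs: number): void {
    // Stop producing segments; anything recorded now would be dropped anyway.
    this.recorder.stop();
    // After the window, a full snapshot makes the recording self-contained
    // despite the gap, so the same replay can continue.
    setTimeout(() => this.recorder.start({ fullSnapshot: true }), retryAfterMs);
  }
}
```

The key design point is that the full snapshot on resume removes the dependency on the segments lost during the window.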
Option C: Retrying rate-limited Replay Requests
In order not to lose any segments, we could leave events that were rate-limited in the queue and retry sending them at a later time. There are implications around this, as we would potentially accumulate a lot of events in the queue which we'd try to re-send after the first rate limit period, in addition to newer segments. This increases the potential for more rate limits occurring at that time, therefore again increasing the amount of queued events, etc....
This could happen even if we only attempt to retry each request 1/2/3 times.
Pros:
Cons:
==> Is this scalable at all?
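One way to cap the accumulation risk described above is a bounded retry queue that gives up once it falls too far behind. A minimal sketch; the size limit and the reset-on-overflow policy are assumptions, not a decided design:

```typescript
// Sketch of a bounded retry queue for rate-limited segments. When the
// queue exceeds its cap, the whole buffer is reset, since a replay with
// missing segments cannot be continued anyway. The cap is illustrative.
class SegmentRetryQueue<T> {
  private queue: T[] = [];

  constructor(private maxSize: number = 100) {}

  // Returns false (and clears the queue) if the cap was exceeded.
  enqueue(segment: T): boolean {
    this.queue.push(segment);
    if (this.queue.length > this.maxSize) {
      this.queue = []; // too far behind: give up on this replay
      return false;
    }
    return true;
  }

  // Hand back all pending segments for re-sending after the window.
  drain(): T[] {
    const pending = this.queue;
    this.queue = [];
    return pending;
  }
}
```

A bound like this prevents the feedback loop where retried segments pile onto new ones and trigger further rate limits, at the cost of abandoning replays during long rate-limit periods.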
My strong feeling is that option B is probably the best, but I'm happy to hear everyone's opinions.