
feat(replay): Lower the flush max delay from 15 seconds to 5 seconds #6761

Merged
1 commit merged into master from feat-replay-change-flush-timeout-to-5-seconds on Jan 18, 2023

Conversation

@billyvg (Member) commented on Jan 12, 2023:

This means that we will wait at most 5 seconds before flushing. This should help reduce the number of segments lost to page reloads or closed tabs.
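
For context, a minimal configuration sketch of where these flush delays surface for users. The option names `flushMinDelay` / `flushMaxDelay` are assumed from the public Replay options rather than taken from this PR's diff, and the values shown are illustrative:

```ts
// Sketch only: assumes the Replay integration exposes `flushMinDelay` and
// `flushMaxDelay` options; the values mirror the defaults discussed in this PR.
import * as Sentry from '@sentry/browser';
import { Replay } from '@sentry/replay';

Sentry.init({
  dsn: '__DSN__',
  integrations: [
    new Replay({
      flushMinDelay: 5_000, // wait at least 5s after the last event before flushing
      flushMaxDelay: 5_000, // upper bound on how long a flush can be delayed (previously 15s)
    }),
  ],
});
```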

@billyvg marked this pull request as ready for review on January 12, 2023, 16:41
@mydea (Member) left a comment:


Should we just simplify our debounce implementation and get rid of this config, then? (i.e. have a single flush time option instead of two)
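
For illustration, a minimal sketch (not the SDK's actual code) of the two-delay debounce being discussed: a regular delay that resets on every event, plus a max delay that caps the total wait. The function and variable names here are hypothetical:

```ts
// Sketch of a debounced flush with two delays; not the SDK implementation.
type FlushFn = () => void;

function makeDebouncedFlush(flush: FlushFn, delay: number, maxDelay: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let maxTimer: ReturnType<typeof setTimeout> | undefined;

  const invoke = (): void => {
    if (timer) clearTimeout(timer);
    if (maxTimer) clearTimeout(maxTimer);
    timer = maxTimer = undefined;
    flush();
  };

  return function onEvent(): void {
    // The regular delay restarts on every new event...
    if (timer) clearTimeout(timer);
    timer = setTimeout(invoke, delay);
    // ...but the max delay is armed only once, so a steady stream of events
    // cannot postpone the flush forever.
    if (!maxTimer) {
      maxTimer = setTimeout(invoke, maxDelay);
    }
  };
}

// With this PR the two delays effectively converge (flush at most ~5s after
// the first buffered event), which is why a single option might suffice.
const scheduleFlush = makeDebouncedFlush(() => console.log('flush segment'), 5_000, 5_000);
scheduleFlush();
```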

@Lms24 (Member) commented on Jan 12, 2023:

Hmm but at the same time, this will increase the chance of running into rate limits, right?

@github-actions bot (Contributor) commented on Jan 12, 2023:

size-limit report 📦

| Path | Size |
| --- | --- |
| @sentry/browser - ES5 CDN Bundle (gzipped + minified) | 19.82 KB (+0.01% 🔺) |
| @sentry/browser - ES5 CDN Bundle (minified) | 61.44 KB (0%) |
| @sentry/browser - ES6 CDN Bundle (gzipped + minified) | 18.49 KB (+0.03% 🔺) |
| @sentry/browser - ES6 CDN Bundle (minified) | 54.74 KB (0%) |
| @sentry/browser - Webpack (gzipped + minified) | 20.22 KB (0%) |
| @sentry/browser - Webpack (minified) | 66.17 KB (0%) |
| @sentry/react - Webpack (gzipped + minified) | 20.24 KB (0%) |
| @sentry/nextjs Client - Webpack (gzipped + minified) | 47.51 KB (0%) |
| @sentry/browser + @sentry/tracing - ES5 CDN Bundle (gzipped + minified) | 26.73 KB (+0.03% 🔺) |
| @sentry/browser + @sentry/tracing - ES6 CDN Bundle (gzipped + minified) | 25.01 KB (-0.01% 🔽) |
| @sentry/replay ES6 CDN Bundle (gzipped + minified) | 43.11 KB (-0.83% 🔽) |
| @sentry/replay - Webpack (gzipped + minified) | 38.45 KB (-1% 🔽) |

@billyvg (Member, Author) commented on Jan 12, 2023:

> Hmm but at the same time, this will increase the chance of running into rate limits, right?

Yeah that's true

@billyvg (Member, Author) commented on Jan 12, 2023:

> Should we just simplify our debounce implementation and get rid of this config, then? (i.e. have a single flush time option instead of two)

Yeah, I think we should see how this performs with regard to missing segments first (and maybe see whether rate limiting increases?)

@Lms24 (Member) commented on Jan 12, 2023:

> see how this performs with regard to missing segments first (and maybe see whether rate limiting increases?)

I thought we don't yet have a view into rate-limiting client reports (thinking of our _admin discussion and how it only shows errors and transactions)? I mean, we can make this change, sure, but the question is whether we can actually tell if it improved anything.

@bruno-garcia (Member) commented:

> Hmm but at the same time, this will increase the chance of running into rate limits, right?

Yes, and if we get customers running into this due to higher loads, we can ask them to use this option to "reduce load".

@Lms24 (Member) left a comment:


> Yes, and if we get customers running into this due to higher loads, we can ask them to use this option to "reduce load".

Fair enough, let's try this

@bruno-garcia added the CI-Overhead-Measurements label (runs SDK overhead measurements on a PR) on Jan 16, 2023
@bruno-garcia (Member) commented:

Added label to trigger overhead measurement job


Commit: This means that we will only wait 5 seconds before flushing
@mydea force-pushed the feat-replay-change-flush-timeout-to-5-seconds branch from 0a97be7 to 94fed11 on January 17, 2023, 07:56
@github-actions bot (Contributor) commented on Jan 17, 2023:

Replay SDK metrics 🚀

Latest data for: 1f8f2bf
| Metric | Plain | +Sentry | Diff | Ratio | +Replay | Diff | Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LCP | 68.58 ms | 92.82 ms | +24.25 ms | +35.36 % | 131.28 ms | +38.45 ms | +41.43 % |
| CLS | 0.02 ms | 0.02 ms | +0.00 ms | +0.09 % | 0.02 ms | +0.00 ms | +0.96 % |
| CPU | 14.72 % | 15.15 % | +0.43 pp | +2.90 % | 27.44 % | +12.29 pp | +81.14 % |
| JS heap avg | 1.73 MB | 2.14 MB | +405.53 kB | +23.44 % | 3.3 MB | +1.17 MB | +54.70 % |
| JS heap max | 1.89 MB | 2.35 MB | +459.04 kB | +24.24 % | 4.44 MB | +2.09 MB | +88.73 % |

*) pp (percentage points): the absolute difference between two percentages.
Last updated: Tue, 17 Jan 2023 09:07:08 GMT

@mydea (Member) commented on Jan 17, 2023:

So both CPU and LCP seem a bit better (compared to #6611):

  • CPU: 25.43% vs. 34.75%
  • LCP: 116ms vs. 125ms

I wonder how stable that is - I'll rerun the overhead job to be sure. If that's correct, it would be nice, but I wonder if such a large CPU reduction is realistic with this PR? 🤔

Edit: Hmm, ok, so second run gives very different results 😅 156ms LCP & 42.50% CPU.
Third run: LCP 131, CPU 27.44.

So overall we got for this PR:

  • LCP: 116ms, 156ms, 131ms
  • CPU: 25%, 42%, 27%

@Lms24 (Member) commented on Jan 17, 2023:

> ok, so second run gives very different results

@vaind any idea how to get more consistent results? Should we increase the number of measurement runs to get more consistent averages?

@vaind (Contributor) commented on Jan 17, 2023:

> ok, so second run gives very different results
>
> @vaind any idea how to get more consistent results? Should we increase the number of measurement runs to get more consistent averages?

It's currently collected as an average over 10 subsequent runs, and there are checks on a (fairly low) standard deviation between runs. Locally the results are pretty stable regardless of when they're collected, so the issue here may be caused by running on Linux VMs, which are a shared resource. I think all the currently available public runners on GitHub Actions are VM-based, even the macOS ones, though I can give it a go and see if it's more stable throughout the day.

On mobile, we ran these on real devices so while there's variance still, it may be a bit lower.

Also, these need to be treated as a rough indicator of whether something may be wrong and needs closer inspection, e.g. by manually profiling and verifying the changes. It's not like measuring bundle size, which is super stable all the time.

Also, it may be better to compare the "Ratio" field, which compares against the current baseline. For the CPU usage, for example, the ratio doesn't change nearly as much as the absolute values (which are more influenced by the parallel load on the VM).

BTW, the table currently shows diff & ratio relative to the previous "scenario", so "+Replay" shows diff & ratio against "+Sentry"... Looking at it now, maybe we should change that so that both "+Replay" and "+Sentry" are compared directly to "Plain"; that would make it easier to read, I think.

@Lms24 (Member) commented on Jan 18, 2023:

> Looking at it now, maybe we should change that so that both "+Replay" and "+Sentry" are compared directly to "Plain"; that would make it easier to read, I think.

Agreed, would you mind opening a PR for that?

@billyvg merged commit 9ed9c23 into master on Jan 18, 2023
@billyvg deleted the feat-replay-change-flush-timeout-to-5-seconds branch on January 18, 2023, 14:49