-
-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Track stalls in the JavaScript event loop as measurements #1542
Conversation
As discussed on the grooming, this will be updated to send {
"measurements": {
"stall_count": {
"value": 2
},
"stall_total_time": {
"value": 247.875
},
"stall_longest_time": {
"value": 175.125
},
}
}
|
Requested review from @getsentry/team-webplatform due to how heavy JS this PR is along with being applicable to the rest of the JS universe. |
@@ -0,0 +1,328 @@ | |||
/* eslint-disable max-lines */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice approach - will do a more detailed review in a bit. What do you think we need to do to move this into sentry-javascript?
Also the monkey patching is not ideal, we should update tracing to make it easier to patch rather than patch the idle transaction this way tbh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I strongly agree the monkey patching is bad, but we reckoned that we're going to release this as a prerelease to see how it goes, if it's actually effective, and if it's valuable to customers, if it turns out well we can then move it over to the javascript repo and solidify it by updating tracing to actually support hooks to the start and finish methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a plan π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give me some more time to think about this implementation, but on a high level I like the direction.
After some deliberating. I will remove the transaction |
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gimme another day to reflect on this, but tbh we can probably ship it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spent the weekend thinking about it, and ok let's ship it. I would get a mobile team +1 just to confirm before you merge though. Also please make sure to address the default integration thing as well.
@@ -24,6 +25,7 @@ const DEFAULT_OPTIONS: ReactNativeOptions = { | |||
enableNativeCrashHandling: true, | |||
enableNativeNagger: true, | |||
autoInitializeNativeSdk: true, | |||
enableStallTracking: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thought, do we want to make it not default enabled for this release, ask users to try it out manually, and then enable by default in the release after that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bruno-garcia What do you say. Although this would be released in a beta, if we just enable it, it might work because the people who would be trying this out would be doing so just for it anyways? I'm fine with either.
Changed so tracing needs to be enabled for the |
if (endTimestamp && isIdleTransaction) { | ||
/* | ||
There is different behavior regarding child spans in a normal transaction and an idle transaction. In normal transactions, | ||
the child spans that aren't finished will be dumped, while in an idle transaction they're cancelled and finished. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is just the current state of affairs but we should have a chat with the team and align one way or the other.
Doesn't mean this blocks the PR, but it's something we need to address.
Co-authored-by: Abhijeet Prasad <aprasad@sentry.io>
Cases include multiple stalls, idleTransactions, and stalls outside the timestamp set by trimEnd
Co-authored-by: Abhijeet Prasad <aprasad@sentry.io>
Co-authored-by: Abhijeet Prasad <aprasad@sentry.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we see this flying?
π’ Type of change
π Description
Important TL;DR
hub.startTransaction
andtransaction.finish
along with routing idleTransactions throughReactNativeTracing
to instrument every transaction.trimEnd
functionality where the end timestamp of the transaction is set to the last finished span.startTimetamp
andendTimestamp
being custom set.measurements
in the if statement on line 32 here when you run the jest tests:sentry-react-native/test/integrations/stalltracking.test.ts
Lines 6 to 40 in 0d573be
How it works
Measures the stall time in the JS event loop by setting a timeout and seeing how long it strays from the expected time that it's supposed to run. By default, the "
minimumStallThreshold
" is set to50ms
, with each timeout running every50ms
which allows a margin of error ofminimumStallThreshold = 50ms
before the straying is considered a stall. The stall time is then calculated by taking the total difference from the time the timeout is supposed to be run βtotalTimeTaken - 50ms
.This is inspired by the
JSEventLoopWatchdog
used internally in React Native.Spans
We monkey patch the finish method of spans that have been added to the
spanRecorder
. Every time a span inside a transaction is finished, we log the last end timestamp in the chance thattrimEnd
is used and the transaction's end timestamp is set to the last span's end timestamp.Example Measurement
Why we don't use this method to measure slow and frozen JS frames:
We would need to set the timeout to 16.67ms (for 60fps) and log any iterations that take longer than that. However, in practice when doing so it ends up being incredibly flakey, with the majority of the event loop iterations taking longer than the 16.67ms needed for 60fps and it ends up logging almost all frames as slow. This is not accurate to what the fps performance monitor is logging when using RN in dev mode.
π‘ Motivation and Context
Tracking JS event loop stalls should be correlated to JS Frame drops. Although we would prefer to track frame drops to align the behavior with the native SDKs, this is the only way to track a measurement that represents app slowdowns on the JS layer that I can find.
This method is inspired by how React Native measures stalls internally.
π How did you test it?
Extensive unit tests, tested on Android and iOS sample app.
π Checklist
Next Steps
We also discussed logging each stall as a child span, which I believe would be massively helpful in pinpointing exactly when slowdowns happened in your app.