feat: Track stalls in the JavaScript event loop as measurements #1542

jennmueng · 2021-05-25T14:19:10Z

📢 Type of change

Bugfix
New feature
Enhancement
Refactoring

📜 Description

Important TL;DR

Tracks stalls in the JavaScript event loop as measurements attached to transactions.
Is enabled by default as a default integration. Can be disabled by disabling the default integration.
Will actually work on other Javascript platforms other than React Native. We should consider expending it if it turns out to work well.
Monkey patches the hub.startTransaction and transaction.finish along with routing idleTransactions through ReactNativeTracing to instrument every transaction.
Supports the trimEnd functionality where the end timestamp of the transaction is set to the last finished span.
Does not support startTimetamp and endTimestamp being custom set.

If you want to see it in action running on JS, try logging measurements in the if statement on line 32 here when you run the jest tests:

sentry-react-native/test/integrations/stalltracking.test.ts

Lines 6 to 40 in 0d573be

    
           it("Stall tracking detects a JS stall", async () => { 
        
             const stallTracking = new StallTracking(); 
        
             const transaction = new Transaction({ 
        
               name: "Test Transaction", 
        
             }); 
        
             transaction.initSpanRecorder(); 
        
             const finishTracking = stallTracking.registerTransactionStart(transaction); 
        
             // We create a very **nasty** object 
        
             const nastyObject: { value: string[] } = { value: [] }; 
        
             for (let i = 0; i < 100000; i++) { 
        
               nastyObject.value.push("nasty"); 
        
             } 
        
             // This works in sync, so it should stall the js event loop 
        
             for (let i = 0; i < 50; i++) { 
        
               JSON.parse(JSON.stringify(nastyObject)); 
        
             } 
        
             await new Promise<void>((resolve) => { 
        
               setTimeout(() => { 
        
                 const measurements = finishTracking(); 
        
                 expect(measurements).not.toEqual(null); 
        
                 if (measurements !== null) { 
        
                   expect(measurements.stall_count.value).toBeGreaterThan(0); 
        
                   expect(measurements.stall_longest_time.value).toBeGreaterThan(0); 
        
                   expect(measurements.stall_total_time.value).toBeGreaterThan(0); 
        
                 } 
        
                 resolve(); 
        
               }, 500); 
        
             });

How it works

Measures the stall time in the JS event loop by setting a timeout and seeing how long it strays from the expected time that it's supposed to run. By default, the "minimumStallThreshold " is set to 50ms, with each timeout running every 50ms which allows a margin of error of minimumStallThreshold = 50ms before the straying is considered a stall. The stall time is then calculated by taking the total difference from the time the timeout is supposed to be run — totalTimeTaken - 50ms.

This is inspired by the JSEventLoopWatchdog used internally in React Native.

Spans

We monkey patch the finish method of spans that have been added to the spanRecorder. Every time a span inside a transaction is finished, we log the last end timestamp in the chance that trimEnd is used and the transaction's end timestamp is set to the last span's end timestamp.

Example Measurement

{
	"measurements": {
	  "stall_count": { 
			"value": 2 
		},
	  "stall_total_time": { 
			"value": 247.875 
		}, 
	  "stall_longest_time": { 
			"value": 175.125 
		},
	}
}

Why we don't use this method to measure slow and frozen JS frames:

We would need to set the timeout to 16.67ms (for 60fps) and log any iterations that take longer than that. However, in practice when doing so it ends up being incredibly flakey, with the majority of the event loop iterations taking longer than the 16.67ms needed for 60fps and it ends up logging almost all frames as slow. This is not accurate to what the fps performance monitor is logging when using RN in dev mode.

💡 Motivation and Context

Tracking JS event loop stalls should be correlated to JS Frame drops. Although we would prefer to track frame drops to align the behavior with the native SDKs, this is the only way to track a measurement that represents app slowdowns on the JS layer that I can find.

This method is inspired by how React Native measures stalls internally.

💚 How did you test it?

Extensive unit tests, tested on Android and iOS sample app.

📝 Checklist

I reviewed submitted code
I added tests to verify changes
All tests passing
No breaking changes

Next Steps

We also discussed logging each stall as a child span, which I believe would be massively helpful in pinpointing exactly when slowdowns happened in your app.

src/js/measurements.ts

jennmueng · 2021-06-02T16:00:33Z

As discussed on the grooming, this will be updated to send

{
	"measurements": {
	  "stall_count": { 
			"value": 2 
		},
	  "stall_total_time": { 
			"value": 247.875 
		}, 
	  "stall_longest_time": { 
			"value": 175.125 
		},
	}
}

jennmueng · 2021-06-16T08:38:55Z

Requested review from @getsentry/team-webplatform due to how heavy JS this PR is along with being applicable to the rest of the JS universe.

AbhiPrasad · 2021-06-16T11:20:59Z

src/js/integrations/stalltracking.ts

@@ -0,0 +1,328 @@
+/* eslint-disable max-lines */


This is a nice approach - will do a more detailed review in a bit. What do you think we need to do to move this into sentry-javascript?

Also the monkey patching is not ideal, we should update tracing to make it easier to patch rather than patch the idle transaction this way tbh.

Yeah I strongly agree the monkey patching is bad, but we reckoned that we're going to release this as a prerelease to see how it goes, if it's actually effective, and if it's valuable to customers, if it turns out well we can then move it over to the javascript repo and solidify it by updating tracing to actually support hooks to the start and finish methods.

Sounds like a plan 👍

AbhiPrasad

Give me some more time to think about this implementation, but on a high level I like the direction.

package.json

src/js/index.ts

src/js/integrations/stalltracking.ts

test/integrations/stalltracking.test.ts

jennmueng · 2021-06-16T15:52:28Z

After some deliberating. I will remove the transaction endTimestamp support and reduce the span finish tracking to just the last one which will greatly reduce complexity.

jennmueng · 2021-06-17T12:00:22Z

Removed the complex support for endTimestamp and instead we just support trimEnd being used. Although due to the different behavior regarding unfinished spans, we handle idleTransactions and normal transactions differently. Please double check the logic I used for this here:

sentry-react-native/src/js/integrations/stalltracking.ts

Lines 149 to 187 in 9ddd3e9

    
               if (endTimestamp && isIdleTransaction) { 
        
                 /* 
        
                   There is different behavior regarding child spans in a normal transaction and an idle transaction. In normal transactions, 
        
                   the child spans that aren't finished will be dumped, while in an idle transaction they're cancelled and finished. 
        
                   Note: `endTimestamp` will always be defined if this is called on an idle transaction finish. This is because we only instrument 
        
                   idle transactions inside `ReactNativeTracing`, which will pass an `endTimestamp`. 
        
                 */ 
        
                 // There will be cancelled spans, which means that the end won't be trimmed 
        
                 const spansWillBeCancelled = transaction.spanRecorder 
        
                   ? transaction.spanRecorder.spans.some( 
        
                       (s) => 
        
                         s !== transaction && 
        
                         s.startTimestamp < endTimestamp && 
        
                         !s.endTimestamp 
        
                     ) 
        
                   : false; 
        
                 if (endWillBeTrimmed && !spansWillBeCancelled) { 
        
                   // the last span's timestamp will be used. 
        
                   const statsAtLastSpanFinish = this._statsAtTimestamp.get(transaction); 
        
                   if (statsAtLastSpanFinish) { 
        
                     statsOnFinish = statsAtLastSpanFinish.stats; 
        
                   } 
        
                 } else { 
        
                   // this endTimestamp will be used. 
        
                   statsOnFinish = this._getCurrentStats(transaction); 
        
                 } 
        
               } else if (endWillBeTrimmed) { 
        
                 // If `trimEnd` is used, and there is a span to trim to. If there isn't, then the transaction should use `endTimestamp` or generate one. 
        
                 const statsAtLastSpanFinish = this._statsAtTimestamp.get(transaction); 
        
                 if (statsAtLastSpanFinish) { 
        
                   statsOnFinish = statsAtLastSpanFinish.stats; 
        
                 } 
        
               } else if (!endTimestamp) { 
        
                 statsOnFinish = this._getCurrentStats(transaction); 
        
               }

Updated tests to test the latest implementation along with adding new tests:
- Multiple stalls are counted for
- Idletransaction with unfinished spans
- Unfinished spans in idle transactions
- Only measures stalls within the transaction duration when trimEnd is used

github-actions · 2021-06-17T12:02:18Z

	Messages
📖	Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against 9658ea0

AbhiPrasad

gimme another day to reflect on this, but tbh we can probably ship it

src/js/integrations/stalltracking.ts

AbhiPrasad

Spent the weekend thinking about it, and ok let's ship it. I would get a mobile team +1 just to confirm before you merge though. Also please make sure to address the default integration thing as well.

src/js/integrations/stalltracking.ts

src/js/sdk.ts

AbhiPrasad · 2021-06-21T11:30:17Z

src/js/sdk.ts

@@ -24,6 +25,7 @@ const DEFAULT_OPTIONS: ReactNativeOptions = {
  enableNativeCrashHandling: true,
  enableNativeNagger: true,
  autoInitializeNativeSdk: true,
+  enableStallTracking: true,


One thought, do we want to make it not default enabled for this release, ask users to try it out manually, and then enable by default in the release after that?

@bruno-garcia What do you say. Although this would be released in a beta, if we just enable it, it might work because the people who would be trying this out would be doing so just for it anyways? I'm fine with either.

jennmueng · 2021-06-21T14:32:33Z

Changed so tracing needs to be enabled for the StallTracking integration to be added along with tests for it.

src/js/integrations/stalltracking.ts

bruno-garcia · 2021-06-22T14:46:34Z

src/js/integrations/stalltracking.ts

+    if (endTimestamp && isIdleTransaction) {
+      /*
+        There is different behavior regarding child spans in a normal transaction and an idle transaction. In normal transactions,
+        the child spans that aren't finished will be dumped, while in an idle transaction they're cancelled and finished.


I think this is just the current state of affairs but we should have a chat with the team and align one way or the other.

Doesn't mean this blocks the PR, but it's something we need to address.

Co-authored-by: Abhijeet Prasad <aprasad@sentry.io>

Cases include multiple stalls, idleTransactions, and stalls outside the timestamp set by trimEnd

Co-authored-by: Abhijeet Prasad <aprasad@sentry.io>

src/js/integrations/stalltracking.ts

bruno-garcia

Can we see this flying?

jennmueng commented May 25, 2021

View reviewed changes

src/js/measurements.ts Outdated Show resolved Hide resolved

jennmueng commented May 25, 2021

View reviewed changes

src/js/measurements.ts Outdated Show resolved Hide resolved

jennmueng commented May 25, 2021

View reviewed changes

src/js/measurements.ts Show resolved Hide resolved

jennmueng requested a review from a team May 27, 2021 06:34

bruno-garcia reviewed May 31, 2021

View reviewed changes

src/js/measurements.ts Outdated Show resolved Hide resolved

jennmueng force-pushed the jenn/stalling branch from 9620b5e to 05d86dd Compare June 16, 2021 05:59

jennmueng changed the title ~~feat: Tracking JS event loop stalls as measurements~~ feat: Track stalls in the JavaScript event loop as measurements Jun 16, 2021

jennmueng force-pushed the jenn/stalling branch from 7858cb1 to 0d573be Compare June 16, 2021 08:35

jennmueng marked this pull request as ready for review June 16, 2021 08:36

jennmueng requested a review from HazAT as a code owner June 16, 2021 08:36

jennmueng requested review from a team and kamilogorek and removed request for a team June 16, 2021 08:37

jennmueng requested a review from AbhiPrasad June 16, 2021 08:39

AbhiPrasad reviewed Jun 16, 2021

View reviewed changes

AbhiPrasad requested changes Jun 16, 2021

View reviewed changes

jennmueng marked this pull request as draft June 16, 2021 16:40

jennmueng marked this pull request as ready for review June 17, 2021 12:01

AbhiPrasad reviewed Jun 17, 2021

View reviewed changes

AbhiPrasad approved these changes Jun 21, 2021

View reviewed changes

bruno-garcia reviewed Jun 22, 2021

View reviewed changes

jennmueng added 2 commits July 6, 2021 20:56

draft: Stalling

8d7ddbd

feat: Stall tracking on all transactions

aca0568

jennmueng and others added 17 commits July 6, 2021 20:56

feat: Allow stall tracking to be disabled via enableStallTracking

030d186

ref: Small comments

6b666ee

Suggestions from Abhi

b8294f5

Co-authored-by: Abhijeet Prasad <aprasad@sentry.io>

ref: Clean ReactNativeTracing

3cdb21c

fix: Fix stalling test

c6eff32

ref: Refactor and cleanup after code review

53c90b7

fix: We don't support ms integer timestamps

35c4e90

ref: Simplify stall tracking to just support trimEnd

f050b0d

test: Update test and add new cases

d867045

Cases include multiple stalls, idleTransactions, and stalls outside the timestamp set by trimEnd

Apply suggestions from code review

884c01e

Co-authored-by: Abhijeet Prasad <aprasad@sentry.io>

ref: Cleanup after code review

9ebced1

ref: Extract longest stall into variable

216b0ef

ref: Add class comment to setupOnce

b27be76

feat: Update stall interval implementation to minimumStallThreshold

1895ff8

feat: Check for tracing enabled before enabling StallTracking

8cdfe63

test: Add tests for stall tracking enable in init

31fa9c9

Remove useless comment

202a304

Co-authored-by: Abhijeet Prasad <aprasad@sentry.io>

jennmueng force-pushed the jenn/stalling branch from 6d2b93c to 202a304 Compare July 6, 2021 13:56

jennmueng added 2 commits July 8, 2021 14:13

ref: Refactor StallTracking integration to use single map

94f70d9

feat: Add max transaction limit

c593a20

jennmueng self-assigned this Jul 8, 2021

jennmueng added this to Waiting for Review in Mobile Platform Team Archived Jul 8, 2021

fix: Fix open handle in test

74c3644

AbhiPrasad reviewed Jul 8, 2021

View reviewed changes

src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved

bruno-garcia approved these changes Jul 14, 2021

View reviewed changes

ref: Use explicit for of loop in leak flush

9658ea0

AbhiPrasad approved these changes Jul 15, 2021

View reviewed changes

jennmueng merged commit e2895d7 into master Jul 16, 2021

Mobile Platform Team Archived automation moved this from Waiting for Review to Done Jul 16, 2021

jennmueng deleted the jenn/stalling branch July 16, 2021 10:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Track stalls in the JavaScript event loop as measurements #1542

feat: Track stalls in the JavaScript event loop as measurements #1542

jennmueng commented May 25, 2021 •

edited

jennmueng commented Jun 2, 2021

jennmueng commented Jun 16, 2021

AbhiPrasad Jun 16, 2021 •

edited

jennmueng Jun 16, 2021

AbhiPrasad Jun 16, 2021

AbhiPrasad left a comment

jennmueng commented Jun 16, 2021

jennmueng commented Jun 17, 2021

github-actions bot commented Jun 17, 2021 •

edited

AbhiPrasad left a comment

AbhiPrasad left a comment

AbhiPrasad Jun 21, 2021

jennmueng Jun 21, 2021

jennmueng commented Jun 21, 2021

bruno-garcia Jun 22, 2021

bruno-garcia left a comment

	it("Stall tracking detects a JS stall", async () => {
	const stallTracking = new StallTracking();

	const transaction = new Transaction({
	name: "Test Transaction",
	});
	transaction.initSpanRecorder();

	const finishTracking = stallTracking.registerTransactionStart(transaction);

	// We create a very nasty object
	const nastyObject: { value: string[] } = { value: [] };
	for (let i = 0; i < 100000; i++) {
	nastyObject.value.push("nasty");
	}

	// This works in sync, so it should stall the js event loop
	for (let i = 0; i < 50; i++) {
	JSON.parse(JSON.stringify(nastyObject));
	}

	await new Promise<void>((resolve) => {
	setTimeout(() => {
	const measurements = finishTracking();

	expect(measurements).not.toEqual(null);
	if (measurements !== null) {
	expect(measurements.stall_count.value).toBeGreaterThan(0);
	expect(measurements.stall_longest_time.value).toBeGreaterThan(0);
	expect(measurements.stall_total_time.value).toBeGreaterThan(0);
	}

	resolve();
	}, 500);
	});

feat: Track stalls in the JavaScript event loop as measurements #1542

feat: Track stalls in the JavaScript event loop as measurements #1542

Conversation

jennmueng commented May 25, 2021 • edited

📢 Type of change

📜 Description

Important TL;DR

How it works

Spans

Example Measurement

Why we don't use this method to measure slow and frozen JS frames:

💡 Motivation and Context

💚 How did you test it?

📝 Checklist

Next Steps

jennmueng commented Jun 2, 2021

jennmueng commented Jun 16, 2021

AbhiPrasad Jun 16, 2021 • edited

Choose a reason for hiding this comment

jennmueng Jun 16, 2021

Choose a reason for hiding this comment

AbhiPrasad Jun 16, 2021

Choose a reason for hiding this comment

AbhiPrasad left a comment

Choose a reason for hiding this comment

jennmueng commented Jun 16, 2021

jennmueng commented Jun 17, 2021

github-actions bot commented Jun 17, 2021 • edited

AbhiPrasad left a comment

Choose a reason for hiding this comment

AbhiPrasad left a comment

Choose a reason for hiding this comment

AbhiPrasad Jun 21, 2021

Choose a reason for hiding this comment

jennmueng Jun 21, 2021

Choose a reason for hiding this comment

jennmueng commented Jun 21, 2021

bruno-garcia Jun 22, 2021

Choose a reason for hiding this comment

bruno-garcia left a comment

Choose a reason for hiding this comment

jennmueng commented May 25, 2021 •

edited

AbhiPrasad Jun 16, 2021 •

edited

github-actions bot commented Jun 17, 2021 •

edited