Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Track stalls in the JavaScript event loop as measurements #1542

Merged
merged 33 commits into from
Jul 16, 2021

Conversation

jennmueng
Copy link
Member

@jennmueng jennmueng commented May 25, 2021

πŸ“’ Type of change

  • Bugfix
  • New feature
  • Enhancement
  • Refactoring

πŸ“œ Description

Important TL;DR

  • Tracks stalls in the JavaScript event loop as measurements attached to transactions.
  • Is enabled by default as a default integration. Can be disabled by disabling the default integration.
  • Will actually work on other Javascript platforms other than React Native. We should consider expending it if it turns out to work well.
  • Monkey patches the hub.startTransaction and transaction.finish along with routing idleTransactions through ReactNativeTracing to instrument every transaction.
  • Supports the trimEnd functionality where the end timestamp of the transaction is set to the last finished span.
  • Does not support startTimetamp and endTimestamp being custom set.
  • If you want to see it in action running on JS, try logging measurements in the if statement on line 32 here when you run the jest tests:
    it("Stall tracking detects a JS stall", async () => {
    const stallTracking = new StallTracking();
    const transaction = new Transaction({
    name: "Test Transaction",
    });
    transaction.initSpanRecorder();
    const finishTracking = stallTracking.registerTransactionStart(transaction);
    // We create a very **nasty** object
    const nastyObject: { value: string[] } = { value: [] };
    for (let i = 0; i < 100000; i++) {
    nastyObject.value.push("nasty");
    }
    // This works in sync, so it should stall the js event loop
    for (let i = 0; i < 50; i++) {
    JSON.parse(JSON.stringify(nastyObject));
    }
    await new Promise<void>((resolve) => {
    setTimeout(() => {
    const measurements = finishTracking();
    expect(measurements).not.toEqual(null);
    if (measurements !== null) {
    expect(measurements.stall_count.value).toBeGreaterThan(0);
    expect(measurements.stall_longest_time.value).toBeGreaterThan(0);
    expect(measurements.stall_total_time.value).toBeGreaterThan(0);
    }
    resolve();
    }, 500);
    });

How it works

Measures the stall time in the JS event loop by setting a timeout and seeing how long it strays from the expected time that it's supposed to run. By default, the "minimumStallThreshold " is set to 50ms, with each timeout running every 50ms which allows a margin of error of minimumStallThreshold = 50ms before the straying is considered a stall. The stall time is then calculated by taking the total difference from the time the timeout is supposed to be run β€” totalTimeTaken - 50ms.

This is inspired by the JSEventLoopWatchdog used internally in React Native.

Spans

We monkey patch the finish method of spans that have been added to the spanRecorder. Every time a span inside a transaction is finished, we log the last end timestamp in the chance that trimEnd is used and the transaction's end timestamp is set to the last span's end timestamp.

Example Measurement

{
	"measurements": {
	  "stall_count": { 
			"value": 2 
		},
	  "stall_total_time": { 
			"value": 247.875 
		}, 
	  "stall_longest_time": { 
			"value": 175.125 
		},
	}
}

Why we don't use this method to measure slow and frozen JS frames:

We would need to set the timeout to 16.67ms (for 60fps) and log any iterations that take longer than that. However, in practice when doing so it ends up being incredibly flakey, with the majority of the event loop iterations taking longer than the 16.67ms needed for 60fps and it ends up logging almost all frames as slow. This is not accurate to what the fps performance monitor is logging when using RN in dev mode.

πŸ’‘ Motivation and Context

Tracking JS event loop stalls should be correlated to JS Frame drops. Although we would prefer to track frame drops to align the behavior with the native SDKs, this is the only way to track a measurement that represents app slowdowns on the JS layer that I can find.

This method is inspired by how React Native measures stalls internally.

πŸ’š How did you test it?

Extensive unit tests, tested on Android and iOS sample app.

πŸ“ Checklist

  • I reviewed submitted code
  • I added tests to verify changes
  • All tests passing
  • No breaking changes

Next Steps

We also discussed logging each stall as a child span, which I believe would be massively helpful in pinpointing exactly when slowdowns happened in your app.

src/js/measurements.ts Outdated Show resolved Hide resolved
src/js/measurements.ts Outdated Show resolved Hide resolved
@jennmueng jennmueng requested a review from a team May 27, 2021 06:34
@jennmueng
Copy link
Member Author

As discussed on the grooming, this will be updated to send

{
	"measurements": {
	  "stall_count": { 
			"value": 2 
		},
	  "stall_total_time": { 
			"value": 247.875 
		}, 
	  "stall_longest_time": { 
			"value": 175.125 
		},
	}
}

@jennmueng jennmueng changed the title feat: Tracking JS event loop stalls as measurements feat: Track stalls in the JavaScript event loop as measurements Jun 16, 2021
@jennmueng jennmueng marked this pull request as ready for review June 16, 2021 08:36
@jennmueng jennmueng requested a review from HazAT as a code owner June 16, 2021 08:36
@jennmueng jennmueng requested review from a team and kamilogorek and removed request for a team June 16, 2021 08:37
@jennmueng
Copy link
Member Author

Requested review from @getsentry/team-webplatform due to how heavy JS this PR is along with being applicable to the rest of the JS universe.

@@ -0,0 +1,328 @@
/* eslint-disable max-lines */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice approach - will do a more detailed review in a bit. What do you think we need to do to move this into sentry-javascript?

Also the monkey patching is not ideal, we should update tracing to make it easier to patch rather than patch the idle transaction this way tbh.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I strongly agree the monkey patching is bad, but we reckoned that we're going to release this as a prerelease to see how it goes, if it's actually effective, and if it's valuable to customers, if it turns out well we can then move it over to the javascript repo and solidify it by updating tracing to actually support hooks to the start and finish methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a plan πŸ‘

Copy link
Member

@AbhiPrasad AbhiPrasad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give me some more time to think about this implementation, but on a high level I like the direction.

package.json Outdated Show resolved Hide resolved
src/js/index.ts Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
test/integrations/stalltracking.test.ts Outdated Show resolved Hide resolved
test/integrations/stalltracking.test.ts Outdated Show resolved Hide resolved
@jennmueng
Copy link
Member Author

After some deliberating. I will remove the transaction endTimestamp support and reduce the span finish tracking to just the last one which will greatly reduce complexity.

@jennmueng jennmueng marked this pull request as draft June 16, 2021 16:40
@jennmueng
Copy link
Member Author

  • Removed the complex support for endTimestamp and instead we just support trimEnd being used. Although due to the different behavior regarding unfinished spans, we handle idleTransactions and normal transactions differently. Please double check the logic I used for this here:

    if (endTimestamp && isIdleTransaction) {
    /*
    There is different behavior regarding child spans in a normal transaction and an idle transaction. In normal transactions,
    the child spans that aren't finished will be dumped, while in an idle transaction they're cancelled and finished.
    Note: `endTimestamp` will always be defined if this is called on an idle transaction finish. This is because we only instrument
    idle transactions inside `ReactNativeTracing`, which will pass an `endTimestamp`.
    */
    // There will be cancelled spans, which means that the end won't be trimmed
    const spansWillBeCancelled = transaction.spanRecorder
    ? transaction.spanRecorder.spans.some(
    (s) =>
    s !== transaction &&
    s.startTimestamp < endTimestamp &&
    !s.endTimestamp
    )
    : false;
    if (endWillBeTrimmed && !spansWillBeCancelled) {
    // the last span's timestamp will be used.
    const statsAtLastSpanFinish = this._statsAtTimestamp.get(transaction);
    if (statsAtLastSpanFinish) {
    statsOnFinish = statsAtLastSpanFinish.stats;
    }
    } else {
    // this endTimestamp will be used.
    statsOnFinish = this._getCurrentStats(transaction);
    }
    } else if (endWillBeTrimmed) {
    // If `trimEnd` is used, and there is a span to trim to. If there isn't, then the transaction should use `endTimestamp` or generate one.
    const statsAtLastSpanFinish = this._statsAtTimestamp.get(transaction);
    if (statsAtLastSpanFinish) {
    statsOnFinish = statsAtLastSpanFinish.stats;
    }
    } else if (!endTimestamp) {
    statsOnFinish = this._getCurrentStats(transaction);
    }

  • Updated tests to test the latest implementation along with adding new tests:

    • Multiple stalls are counted for
    • Idletransaction with unfinished spans
    • Unfinished spans in idle transactions
    • Only measures stalls within the transaction duration when trimEnd is used

@jennmueng jennmueng marked this pull request as ready for review June 17, 2021 12:01
@github-actions
Copy link
Contributor

github-actions bot commented Jun 17, 2021

Messages
πŸ“– Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against 9658ea0

Copy link
Member

@AbhiPrasad AbhiPrasad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gimme another day to reflect on this, but tbh we can probably ship it

src/js/integrations/stalltracking.ts Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
Copy link
Member

@AbhiPrasad AbhiPrasad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spent the weekend thinking about it, and ok let's ship it. I would get a mobile team +1 just to confirm before you merge though. Also please make sure to address the default integration thing as well.

src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/sdk.ts Show resolved Hide resolved
@@ -24,6 +25,7 @@ const DEFAULT_OPTIONS: ReactNativeOptions = {
enableNativeCrashHandling: true,
enableNativeNagger: true,
autoInitializeNativeSdk: true,
enableStallTracking: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought, do we want to make it not default enabled for this release, ask users to try it out manually, and then enable by default in the release after that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bruno-garcia What do you say. Although this would be released in a beta, if we just enable it, it might work because the people who would be trying this out would be doing so just for it anyways? I'm fine with either.

@jennmueng
Copy link
Member Author

Changed so tracing needs to be enabled for the StallTracking integration to be added along with tests for it.

src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
src/js/integrations/stalltracking.ts Outdated Show resolved Hide resolved
if (endTimestamp && isIdleTransaction) {
/*
There is different behavior regarding child spans in a normal transaction and an idle transaction. In normal transactions,
the child spans that aren't finished will be dumped, while in an idle transaction they're cancelled and finished.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just the current state of affairs but we should have a chat with the team and align one way or the other.

Doesn't mean this blocks the PR, but it's something we need to address.

@jennmueng jennmueng self-assigned this Jul 8, 2021
@jennmueng jennmueng added this to Waiting for Review in Mobile Platform Team Archived Jul 8, 2021
Copy link
Member

@bruno-garcia bruno-garcia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we see this flying?

@jennmueng jennmueng merged commit e2895d7 into master Jul 16, 2021
Mobile Platform Team Archived automation moved this from Waiting for Review to Done Jul 16, 2021
@jennmueng jennmueng deleted the jenn/stalling branch July 16, 2021 10:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants