Initial implementation of ANRv2 #2549

Merged
merged 21 commits into from
Mar 14, 2023

Conversation

romtsn
Member

@romtsn romtsn commented Feb 16, 2023

#skip-changelog

📜 Description

  • Introduces new pipeline for reporting historical events that should be backfilled with scope/options data that has been serialized to disk in the previous application run (Backfillable hint and BackfillingEventProcessor)
  • Introduces PersistingScopeObserver/PersistingOptionsObserver that persist scope and option values to disk and get notified whenever any of the observed data changes
  • New AnrV2Integration which works for API 30 and above and uses getHistoricalProcessExitReasons to report ANRs
    • Only the latest ANR is reported enriched with the serialized scope/options data from the previous run
    • All other historical ANRs are still reported, but without enrichment (only static data, like device params)
  • New AnrV2EventProcessor which takes care of backfilling/filling contexts/scope data for the ANRv2
  • As the historical exits list stores values in a ring buffer and does not remove them upon reading, we have to keep track of the last reported ANR timestamp - this is done in AndroidEnvelopeCache, where we store this timestamp in a marker file under cacheDir
  • Extracted common things to ContextUtils that are used both in AnrV2EventProcessor and DefaultAndroidEventProcessor
  • Added a bunch of equals/hashCode for the protocol classes to be able to assert against them in tests
  • Made UncaughtExceptionHint public. We were using DiskFlushNotification as a proxy for the uncaught exception handling logic; now it's more explicit (and we also use DiskFlushNotification for ANR events)
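
A minimal sketch of the marker-based de-duplication described above. It avoids the Android API and the SDK's real classes on purpose: ExitRecord, AnrToReport and AnrSelectionSketch are hypothetical stand-ins, and the strict "newer than the marker" comparison is an assumption, not necessarily what the PR does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one entry of the historical exit reasons list.
final class ExitRecord {
  final long timestampMs;
  final boolean isAnr;

  ExitRecord(long timestampMs, boolean isAnr) {
    this.timestampMs = timestampMs;
    this.isAnr = isAnr;
  }
}

// Hypothetical result: an ANR to report and whether it gets the persisted scope/options data.
final class AnrToReport {
  final long timestampMs;
  final boolean enrich;

  AnrToReport(long timestampMs, boolean enrich) {
    this.timestampMs = timestampMs;
    this.enrich = enrich;
  }
}

final class AnrSelectionSketch {
  // The exit list is not cleared on read, so entries at or before the marker are skipped.
  static List<AnrToReport> select(List<ExitRecord> exits, long lastReportedMs) {
    List<ExitRecord> fresh = new ArrayList<>();
    for (ExitRecord e : exits) {
      if (e.isAnr && e.timestampMs > lastReportedMs) {
        fresh.add(e);
      }
    }
    // Oldest first, so the last element is the latest ANR.
    fresh.sort(Comparator.comparingLong((ExitRecord e) -> e.timestampMs));

    List<AnrToReport> result = new ArrayList<>();
    for (int i = 0; i < fresh.size(); i++) {
      boolean isLatest = i == fresh.size() - 1;
      // Only the latest ANR is enriched with data from the previous run.
      result.add(new AnrToReport(fresh.get(i).timestampMs, isLatest));
    }
    return result;
  }
}
```

After reporting, the caller would store the newest timestamp as the new marker so the same entries are skipped on the next start.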

💡 Motivation and Context

#1796

💚 How did you test it?

Automated and manually

📝 Checklist

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

The PR targets an integration branch to simplify the review process. Other PRs to follow:

  • Make sure last known TX is correctly associated with the latest ANR
  • Mark last known session as abnormal
  • Parse the thread dump and transform it into sentry threads payload

@github-actions
Contributor

github-actions bot commented Mar 9, 2023

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against 6c51875

@github-actions
Contributor

github-actions bot commented Mar 9, 2023

Performance metrics 🚀

              Plain      With Sentry  Diff
Startup time  347.55 ms  362.60 ms    15.05 ms
Size          1.73 MiB   2.35 MiB     633.74 KiB

Previous results on branch: feat/new-anr-impl

Startup times

Revision  Plain      With Sentry  Diff
7a4e12e   269.74 ms  308.46 ms    38.71 ms
73f0b5d   323.22 ms  377.60 ms    54.38 ms
d504c21   363.02 ms  435.73 ms    72.71 ms
9ca7f05   368.14 ms  413.57 ms    45.43 ms
3b85ba6   363.41 ms  417.52 ms    54.11 ms

App size

Revision  Plain     With Sentry  Diff
7a4e12e   1.73 MiB  2.35 MiB     633.85 KiB
73f0b5d   1.73 MiB  2.35 MiB     633.85 KiB
d504c21   1.73 MiB  2.35 MiB     633.84 KiB
9ca7f05   1.73 MiB  2.35 MiB     633.85 KiB
3b85ba6   1.73 MiB  2.35 MiB     633.74 KiB

@codecov

codecov bot commented Mar 10, 2023

Codecov Report

Patch coverage: 69.67% and project coverage change: -0.28 ⚠️

Comparison is base (2078e71) 80.28% compared to head (bea3016) 80.01%.

❗ Current head bea3016 differs from pull request most recent head 73b5508. Consider uploading reports for the commit 73b5508 to get more accurate results

Additional details and impacted files
@@                Coverage Diff                @@
##             feat/anr-v2    #2549      +/-   ##
=================================================
- Coverage          80.28%   80.01%   -0.28%     
- Complexity          3990     4131     +141     
=================================================
  Files                327      339      +12     
  Lines              15017    15582     +565     
  Branches            1977     2089     +112     
=================================================
+ Hits               12057    12468     +411     
- Misses              2183     2234      +51     
- Partials             777      880     +103     
Impacted Files                                         Coverage Δ
...y/spring/jakarta/CachedBodyHttpServletRequest.java  80.00% <ø> (ø)
...y/spring/jakarta/CachedBodyServletInputStream.java  40.00% <ø> (ø)
...sentry/spring/jakarta/RequestPayloadExtractor.java  37.50% <ø> (ø)
...a/io/sentry/spring/jakarta/SentrySpringFilter.java  73.68% <ø> (ø)
...karta/SentrySpringServletContainerInitializer.java  0.00% <ø> (ø)
...ava/io/sentry/spring/jakarta/SentryUserFilter.java  93.33% <ø> (ø)
...racing/SentrySpanClientHttpRequestInterceptor.java  0.00% <0.00%> (ø)
.../spring/jakarta/webflux/SentryRequestResolver.java  71.42% <0.00%> (ø)
...ebflux/SentryWebFilterWithThreadLocalAccessor.java  0.00% <0.00%> (ø)
sentry/src/main/java/io/sentry/HubAdapter.java         9.23% <0.00%> (ø)
... and 69 more

... and 2 files with indirect coverage changes


final ApplicationNotResponding error =
    new ApplicationNotResponding(message, Looper.getMainLooper().getThread());
final Mechanism mechanism = new Mechanism();
mechanism.setType("ANRv2");
Member Author

Not sure if we should keep it as ANRv2 or change it to ANR? This way we can query on looker by mechanism I think and see who's sending new ANR events. But also, it'd be possible through the integration list.

The question here is rather: do the users care about this and want to see if the ANR has been reported using new mechanism?

@romtsn romtsn marked this pull request as ready for review March 10, 2023 09:52
@romtsn
Member Author

romtsn commented Mar 10, 2023

@markushi re. serializing collections - I've checked, and actually all of our collections on scope are concurrent already (even the Contexts class, which itself extends ConcurrentHashMap), so I think we're good here.

Member

@markushi markushi left a comment

Nice one, the earlier walkthrough definitely helped a lot when reviewing! I added a few minor comments about final keywords and one suggestion for the JSON serializer APIs.

A few more thoughts:

  1. I think we need to ensure that the same ANR won't be sent over and over again, ultimately DDoS-ing us in case something goes wrong along the whole process. I can see the marker file gets written in the AndroidEnvelopeCache after super.store(...) is called. Could it make sense to reverse the order here? Like first save the marker file and then store the envelope? If storing the envelope fails we'd lose the event, but we'd be safe from reporting the same ANR on every app start.

  2. The mechanism which ensures an ANR event is enriched with old disk data while no new data is written is A) adding the ANR integration at the beginning and B) having a single-threaded executor service, right?
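
To make the ordering suggested in point 1 concrete, here is a rough sketch of "marker first, envelope second". It is not the AndroidEnvelopeCache code: MarkerFirstCacheSketch and its EnvelopeStore interface are hypothetical, and writing the timestamp as a decimal string is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class MarkerFirstCacheSketch {
  // Hypothetical stand-in for the envelope storage step (super.store(...) in the comment above).
  interface EnvelopeStore {
    void store(byte[] envelope) throws IOException;
  }

  private final Path markerFile;
  private final EnvelopeStore delegate;

  MarkerFirstCacheSketch(Path markerFile, EnvelopeStore delegate) {
    this.markerFile = markerFile;
    this.delegate = delegate;
  }

  /** Returns true if the envelope was stored. Throws if the marker itself cannot be written. */
  boolean storeAnr(byte[] envelope, long anrTimestampMs) throws IOException {
    // 1. Record the timestamp first, so this ANR is not picked up again on the next start.
    Files.write(markerFile, Long.toString(anrTimestampMs).getBytes(StandardCharsets.UTF_8));
    try {
      // 2. Only then try to persist the envelope.
      delegate.store(envelope);
      return true;
    } catch (IOException e) {
      // The event is lost, but it will not be re-reported on every app start.
      return false;
    }
  }
}
```

The trade-off is the one named in the comment: a failed store drops one event instead of risking a resend loop.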

} catch (Throwable e) {
  options.getLogger().log(ERROR, "Error reading last ANR marker", e);
}
return 0L;
Member

I guess that's technically not correct, but safe since this is way before Android was released 😅

Member Author

Yeah pretty much my thoughts, but I didn't want to deal with nullable values here, so should be fine I guess 🤣
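
For context, a small sketch of the "0L instead of nullable" choice discussed here. LastAnrMarkerSketch is a hypothetical helper, not the SDK method, and the marker is assumed to contain a decimal timestamp. Returning 0L (the Unix epoch) on any failure means every real ANR timestamp compares as newer.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LastAnrMarkerSketch {
  /** Returns the stored timestamp in ms, or 0L if the marker is missing or unreadable. */
  static long readLastReported(Path markerFile) {
    try {
      if (Files.exists(markerFile)) {
        String content =
            new String(Files.readAllBytes(markerFile), StandardCharsets.UTF_8).trim();
        return Long.parseLong(content);
      }
    } catch (Throwable e) {
      // Mirrors the reviewed snippet: log the problem and fall through to the sentinel.
      System.err.println("Error reading last ANR marker: " + e);
    }
    return 0L;
  }
}
```

The cost of the sentinel is that a corrupted marker looks like "nothing reported yet", so older ANRs still in the exit list could be reported again.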

sentry/src/main/java/io/sentry/SentryOptions.java (outdated, resolved)
sentry/src/test/java/io/sentry/SentryTest.kt (resolved)
@romtsn
Member Author

romtsn commented Mar 13, 2023

The mechanism which ensures an ANR event is enriched with old disk data while no new data is written is A) adding the ANR integration at the beginning and B) having a single-threaded executor service, right?

Yes, that's right. Even though it's a bit brittle, I've added a super sophisticated integration test in 6c51875, but I think this one is necessary as it tests exactly this scenario end-to-end. Not sure if we can do something better for now without changing even more in the pipeline which I'd really not want to do :D
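
A toy illustration of why that ordering holds, assuming both steps go through the same single-threaded executor. OrderingSketch and the task labels are made up for this example; the real pipeline is more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class OrderingSketch {
  static List<String> run() {
    List<String> log = Collections.synchronizedList(new ArrayList<>());
    // One worker thread: tasks run sequentially, in submission order.
    ExecutorService executor = Executors.newSingleThreadExecutor();

    // Submitted first (ANR integration registered at the beginning):
    // reads the previous run's persisted scope/options and reports the ANR.
    executor.execute(() -> log.add("report ANR with previous-run scope"));

    // Submitted later: the observers persist data from the current run,
    // overwriting what the previous run left on disk.
    executor.execute(() -> log.add("persist current-run scope"));

    executor.shutdown();
    try {
      if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return log;
  }
}
```

The guarantee comes only from submission order on a single worker, which is why the setup is brittle: anything that writes to disk off that executor, or gets queued before the ANR task, would break it.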

@romtsn romtsn merged commit f346b26 into feat/anr-v2 Mar 14, 2023
@romtsn romtsn deleted the feat/new-anr-impl branch March 14, 2023 12:40
3 participants