
Optimize DuplicateEventDetectionEventProcessor performance. #1247

Merged
10 commits merged into main, Feb 18, 2021

Conversation

@maciejwalkowiak (Contributor)

📜 Description

Optimize a potential DuplicateEventDetectionEventProcessor performance issue by changing the backing store for captured objects from WeakHashMap to ConcurrentLinkedQueue.

💡 Motivation and Context

Using WeakHashMap to store objects opens up the possibility that the map grows to thousands of entries, causing performance loss when searching for a stored exception. This may happen either if references to exceptions are kept outside of the SDK code, or if GC does not run for a long time.

Changing the implementation to ConcurrentLinkedQueue gives the event processor control over how many exceptions are stored in memory.
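The bounded-queue idea described above can be sketched roughly as follows. This is a minimal illustration, not the PR's exact code: the class and method names are hypothetical, and the cap of 100 entries is taken from the review discussion below.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: a capped store of recently captured throwables.
// Unlike a WeakHashMap, which relies on GC to shrink, this evicts the
// oldest entries itself so memory use stays bounded.
final class BoundedCapturedObjects {
  private static final int MAX_SIZE = 100; // cap discussed in the PR review

  private final Queue<Throwable> queue = new ConcurrentLinkedQueue<>();

  boolean contains(Throwable t) {
    // Identity-based lookup; Throwable does not override equals().
    return queue.contains(t);
  }

  void add(Throwable t) {
    queue.add(t);
    // size() is O(n) on ConcurrentLinkedQueue; acceptable for a small cap.
    while (queue.size() > MAX_SIZE) {
      queue.poll(); // drop the oldest entry
    }
  }
}
```

The trade-off, raised later in the review, is that entries stay in memory until evicted by newer ones, even after the exception itself is long gone.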

💚 How did you test it?

Unit tests.

📝 Checklist

  • I reviewed the submitted code
  • I added tests to verify the changes
  • I updated the docs if needed
  • No breaking changes


```diff
   @Override
   public SentryEvent process(final @NotNull SentryEvent event, final @Nullable Object hint) {
     final Throwable throwable = event.getOriginThrowable();
     if (throwable != null) {
-      if (capturedObjects.containsKey(throwable)
+      if (capturedObjects.contains(throwable)
```
Member

Now that I read this allCauses call, it doesn't seem optimal.

Even with this refactor, if exceptions have many causes we'll be allocating new lists and running this thing for each captureException, and doing the lookup for each one of those. The 100 limit is multiplied by the number of inner exceptions.

Do we even need to allocate a list there? That while loop could be inlined here, and the contains check could run on each iteration.

And with this refactor, we'll build up a list of 100 entries and always keep them around, even long after the exception was collected. This means for a long-running process we'll always look up on the full collection. It's still an improvement in case a service throws a ton of exceptions and for some reason doesn't run GC because it has TB of RAM, but it does sound suboptimal in other scenarios.
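The inlining suggested in this comment could look roughly like the sketch below. It is an illustration only, assuming hypothetical names and an acyclic cause chain; it walks the causes directly instead of allocating an intermediate list via allCauses.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the inlined cause walk: check contains() on each
// cause as we traverse, rather than building a list of all causes first.
final class InlinedDuplicateCheck {
  private final Queue<Throwable> capturedObjects = new ConcurrentLinkedQueue<>();

  /** Returns true if the throwable or any of its causes was already seen. */
  boolean isDuplicate(Throwable throwable) {
    Throwable current = throwable;
    while (current != null) { // assumes the cause chain is acyclic
      if (capturedObjects.contains(current)) {
        return true; // early exit, no list allocation
      }
      current = current.getCause();
    }
    capturedObjects.add(throwable);
    return false;
  }
}
```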

Contributor Author

Not sure if I understand. allCauses is executed on the exception that is being captured now. We are not looking into the causes of all previously captured exceptions.

> And with this refactor, we'll build up a list of 100 entries and always keep them around, even long after the exception was collected. This means for a long-running process we'll always look up on the full collection. It's still an improvement in case a service throws a ton of exceptions and for some reason doesn't run GC because it has TB of RAM, but it does sound suboptimal in other scenarios.

That's true. Do you have a suggestion that would work in all scenarios? We could also store exceptions along with the time they were captured and, while DuplicateEventDetectionEventProcessor iterates to check for equality, also check whether an entry is older than X and, if so, remove it.
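The time-based eviction idea floated here could be sketched as below. All names, the map type, and the age threshold are hypothetical; the point is only to show stale entries being dropped during the duplicate scan.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: store the capture time per throwable and evict
// entries older than maxAgeMillis while scanning for duplicates.
final class TimestampedDuplicateCheck {
  private final Map<Throwable, Long> captureTimes = new ConcurrentHashMap<>();
  private final long maxAgeMillis;

  TimestampedDuplicateCheck(long maxAgeMillis) {
    this.maxAgeMillis = maxAgeMillis;
  }

  /** Time is passed in explicitly to keep the sketch deterministic. */
  boolean isDuplicate(Throwable throwable, long nowMillis) {
    // Evict stale entries as part of the same pass over the map.
    for (Iterator<Map.Entry<Throwable, Long>> it = captureTimes.entrySet().iterator(); it.hasNext(); ) {
      if (nowMillis - it.next().getValue() > maxAgeMillis) {
        it.remove();
      }
    }
    // putIfAbsent returns the previous value if the throwable was already seen.
    return captureTimes.putIfAbsent(throwable, nowMillis) != null;
  }
}
```

This bounds the window in time rather than in entry count, so long-lived processes don't accumulate entries for exceptions captured long ago.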

```diff
@@ -6,6 +6,7 @@
 * Enhancement: Support @SentrySpan and @SentryTransaction on classes and interfaces. (#1243)
 * Enhancement: Do not serialize empty collections and maps (#1245)
 * Ref: Simplify RestTemplate instrumentation (#1246)
+* Ref: Optimize DuplicateEventDetectionEventProcessor performance (#1247)
```
Contributor

This should also go under the breaking changes section.

Contributor Author

It's changing the algorithm that sits under the hood. I'm not sure why this would qualify as a breaking change.

Contributor

It's a behavior change: there's a bufferSize now, and there is the possibility of reporting the same exception twice, right?

Contributor Author

Unlikely: only if an application sends hundreds or thousands of errors per second. The buffer size is hidden from the user. I can add it to breaking changes if you like, but I don't believe we should bump the version because of this.

@codecov-io

codecov-io commented Feb 16, 2021

Codecov Report

Merging #1247 (4d21932) into main (9003653) will increase coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

```
@@             Coverage Diff              @@
##               main    #1247      +/-   ##
============================================
+ Coverage     74.91%   74.95%   +0.04%     
- Complexity     1743     1748       +5     
============================================
  Files           183      183              
  Lines          6118     6130      +12     
  Branches        609      610       +1     
============================================
+ Hits           4583     4595      +12     
  Misses         1256     1256              
  Partials        279      279              
```
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| .../sentry/DuplicateEventDetectionEventProcessor.java | 100.00% <100.00%> (ø) | 11.00 <6.00> (+1.00) |
| sentry/src/main/java/io/sentry/SentryOptions.java | 85.06% <100.00%> (+0.42%) | 119.00 <3.00> (+4.00) |
| sentry/src/main/java/io/sentry/Scope.java | 92.82% <0.00%> (ø) | 65.00% <0.00%> (ø%) |
| sentry/src/main/java/io/sentry/Attachment.java | 100.00% <0.00%> (ø) | 12.00% <0.00%> (ø%) |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9003653...4d21932. Read the comment docs.


```diff
 /** Deduplicates events containing throwable that has been already processed. */
 public final class DuplicateEventDetectionEventProcessor implements EventProcessor {
-  private final WeakHashMap<Throwable, Object> capturedObjects = new WeakHashMap<>();
+  private final ConcurrentLinkedQueue<Throwable> capturedObjects = new ConcurrentLinkedQueue<>();
```
Member

I'm not comfortable with this, to be honest. Can we discuss this before going ahead?

Contributor Author

Sure, sure, I was waiting for your blessing anyway :-) We must change the WeakHashMap though (https://kdowbecki.github.io/WeakHashMap-is-not-thread-safe/ - this likely contributed to our issues). One option would be to synchronize the WeakHashMap, but that comes with drawbacks too.

Contributor Author

There's no thread-safe WeakHashMap in the Java standard library, but perhaps we could include this dependency? https://github.com/raphw/weak-lock-free

Contributor Author

Or: considering that deduplication is necessary only with the Spring + logging framework integration, perhaps we could move this event processor to the Spring module and use ConcurrentReferenceHashMap from Spring Framework. That would solve the problem, assuming thread safety is the only issue.

Member

We discussed this on a call and decided to go with @maciejwalkowiak's suggestion of using Collections.synchronizedMap.

@maciejwalkowiak (Contributor Author)

WeakHashMap is not thread safe and can end up in an endless loop. We've discussed with @bruno-garcia and agreed to synchronize the map. This should make the performance issue go away.
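The approach agreed on the call can be sketched as below: wrap the WeakHashMap with Collections.synchronizedMap, which is a standard java.util API, so access is serialized while keys remain weakly referenced and GC can still shrink the map. The class and method names here are illustrative, not the PR's exact code.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of the agreed approach: a synchronized wrapper around WeakHashMap.
// Weak keys mean entries disappear once the throwable is garbage collected.
final class SynchronizedDuplicateCheck {
  private final Map<Throwable, Object> capturedObjects =
      Collections.synchronizedMap(new WeakHashMap<>());

  /** Returns true if this throwable was already captured; records it otherwise. */
  boolean isDuplicate(Throwable throwable) {
    // Note: containsKey + put is two separate synchronized calls, not one
    // atomic check-then-act; for deduplication a rare race is tolerable.
    if (capturedObjects.containsKey(throwable)) {
      return true;
    }
    capturedObjects.put(throwable, null);
    return false;
  }
}
```

This keeps the original GC-driven eviction behavior while fixing the thread-safety problem that the ConcurrentLinkedQueue rewrite was trying to work around.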

@maciejwalkowiak (Contributor Author)

@marandaneto please take a look at the recent changes. What do you think about this implementation?

Comment on lines +14 to +18
```kotlin
val options = SentryOptions().apply {
    if (enableDeduplication != null) {
        this.setEnableDeduplication(enableDeduplication)
    }
}
```
Contributor

You could init SentryOptions outside of getSut but still inside Fixture, and call the options setters directly in each test; I think that's the way we've been doing it so far.
getSut should only take params if the SUT (e.g. DuplicateEventDetectionEventProcessor) itself requires those params.

@marandaneto (Contributor)

marandaneto commented Feb 18, 2021

@marandaneto take a look please at the recent changes. What do you think about this implementation?

Looks good to me. I left a comment about the testing; it's a nit though, feel free to ignore it.
Please update the PR template, it's outdated now.
Also, let's add enable-deduplication to the docs.

@maciejwalkowiak merged commit cf436a0 into main on Feb 18, 2021
@maciejwalkowiak deleted the optimize-deduplication-event-processor branch on February 18, 2021 at 14:06