Skip to content

Conversation

@baywet
Copy link
Member

@baywet baywet commented Dec 1, 2020

related #90
improves performance by using pre-compiled regex

@baywet baywet self-assigned this Dec 1, 2020
@baywet baywet added this to the 2.0.0 milestone Dec 1, 2020
@baywet baywet merged commit 862acfd into dev Dec 1, 2020
@baywet baywet deleted the feature/regex-pre-compile branch December 1, 2020 19:41
Comment on lines +119 to +128
private Matcher protocolAndHostReplacementMatcher; // not static to avoid multithreading issues as the object is mutable
private JsonObject getBatchRequestObjectFromRequestStep(final MSBatchRequestStep batchRequestStep) {
final JsonObject contentmap = new JsonObject();
contentmap.add("id", new JsonPrimitive(batchRequestStep.getRequestId()));

final String url = batchRequestStep.getRequest().url().toString()
.replaceAll("(?i)^http[s]?:\\/\\/graph\\.microsoft\\.com\\/(?>v1\\.0|beta)\\/?", ""); // (?i) case insensitive
if(protocolAndHostReplacementMatcher == null) {
protocolAndHostReplacementMatcher = protocolAndHostReplacementPattern.matcher(batchRequestStep.getRequest().url().toString());
} else {
protocolAndHostReplacementMatcher = protocolAndHostReplacementMatcher.reset(batchRequestStep.getRequest().url().toString());
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't safe to do since getBatchRequestObjectFromRequestStep is called by getBatchRequestContent, which is a public method, and therefore can be called concurrently by multiple threads. You can do the following:

  1. synchronize the method
  2. use a thread local Matcher
  3. just do the Pattern.compile as a static and allocate a new Matcher each time in the loop
  4. pass in a a Matcher object from getBatchRequestContent that you initialize outside of the for loop

I would do 3 or 4 personally. I don't think allocating a matcher object is very expensive it's really the regex compilation that you want to avoid doing over and over.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm saying this after it's been merged... I've been very busy at work the past few weeks and haven't been able to watch github like I normally do during the day.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, I should have waited for your review before merging since you made the original comment, my bad.
I didn't know that java could do multithreading on a simple for loop. Or are you saying that callers could use the class in a multi-threaded context?
I've submitted #93 for you to review, according to this the change would represent a 50% performance drop. But that's already on 4 times less replaceAll and a pre-compilation that represents (with a big extrapolation) a 1400% improvement over the original code.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I meant the caller could use the class in a multi-threaded context. If there was a guarantee that getBatchRequestContent could never be called simultaneously it would be safe. Your new change looked good :)

If this were a very hot loop it'd probably make sense to go with option 3 (from my previous comment) but this really shouldn't be called that often where iit will really matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants