Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of STOMP message header encoding [SPR-14901] #19467

Closed
spring-projects-issues opened this issue Nov 11, 2016 · 15 comments
Closed
Assignees
Milestone

Comments

@spring-projects-issues
Copy link
Collaborator

@spring-projects-issues spring-projects-issues commented Nov 11, 2016

Christoph Dreis opened SPR-14901 and commented

Hey,

Problem

In recent loadtests I noticed quite some heap pressure coming from the encoding and escaping from STOMP headers that are sent to the client by our application. This is caused by mainly two things:

  • Escaping creates StringBuilder and String objects for every header and every of its values
  • Serializing the escaped header (and its values) requires to copy the bytes coming from StringCoding.safeTrim() every time.

Overall this creates around ~3-4GB of heap pressure in a 10 minute profile for us and is the TOP 1 reason for causing allocations right now. (I'd be happy to post a screenshot on Monday since I don't have it running on my laptop currently - if you need it).

Proposed solution

I thought a bit about a possible solution and came up with the idea to allow the StompEncoder (and StompDecoder) to be configured on the StompSubProtocolHandler. In the proposed solution this is done via a new StompWebSocketCodecRegistration object - consisting of both the encoder (and the decoder for consistency reasons). (I explicitly didn't call it a StompWebSocketCodec because it doesn't offer the actual decoding and encoding a real codec would offer.) In order to don't change too much contracts and allow a possible backport to 4.3.x I didn't create an interface for StompEncoder (and StompDecoder), but decided to go for encapsulating the header encoding via a new interface StompHeaderEncoder. Which in the end is the sole culprit for the allocations and thereby the interesting part.

With the proposed solution I am now able to specify a customized StompEncoder with a specialized version of a StompHeaderEncoder. In our case I would now write an implementation that makes use of a precompiled Map of headers and their byte representation, since we know them pretty much upfront. JMH microbenchmarks show a possible uplift of factor 20 with that sort of mechanism, but one could also think about a "trainable" header encoding.

Let me know what you think about the proposed solution. I'd be happy to adjust it according to your feedback.

Cheers,
Christoph


Affects: 4.3.4

Reference URL: #1236

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 14, 2016

Rossen Stoyanchev commented

We could open up the encodeHeaderString method of StompEncoder to be protected. That said before we go in that direction, I'm wondering if we can optimize the StompEncoder out of the box? In your case is it often the case the special characters are not present and would a scan to avoid creating a StringBuilder help? We could maybe cache encoded header names. What else did you have in mind with "trainable"?

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 15, 2016

Christoph Dreis commented

Hey Rossen,

thanks for your response. Let me go through your suggestions/questions:

We could open up the encodeHeaderString method of StompEncoder to be protected.
I have no strong feelings against that. I just find it a bit confusing to look at protected methods that have no obvious reason in the framework itself - e.g. no override in the framework itself. I found the extraction of the interface StompHeaderEncoder more expressive in that regards. But again - this is probably a matter of taste.

That said before we go in that direction, I'm wondering if we can optimize the StompEncoder out of the box? In your case is it often the case the special characters are not present and would a scan to avoid creating a StringBuilder help?
I'm not quite sure what you mean with "scan", but I assume you simply mean checking for the special characters with contains(). I did some JMH benchmarks comparing the current behaviour against the additional scan on top of escape with strings that have no special character:

Name Mode Count Score Error Unit
MyBenchmark.testNormal thrpt 20 3396526,547 ± 37841,739 ops/s
MyBenchmark.testPrescan thrpt 20 4915120,974 ± 46960,430 ops/s

Compared to a string with e.g. a newline:

Name Mode Count Score Error Unit
MyBenchmark.testNormal thrpt 20 3639070,889 ± 57429,971 ops/s
MyBenchmark.testPrescan thrpt 20 3119045,855 ± 35128,002 ops/s

So doing the check is of course faster when it hits, but when it doesn't it has some overhead. Additionally escaping is only 50% of the problem, I'd say. Creating the byte representation of the string is probably the even bigger allocation driver.

We could maybe cache encoded header names. What else did you have in mind with "trainable"?
"Trainable" in comparison to the precompiled solution. I thought about that at first, but there are some implications to that. Maintaining a cache during runtime means some sort of limit - otherwise you could run in the possibility of an infinite growing cache. E.g. when you have some sort of message counter like we do. So you need something like that I suppose:

private LinkedHashMap<String, byte[]> cache = new LinkedHashMap<String, byte[]>(CACHE_LIMIT, 0.75f, true) {
     @Override
     protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
          return size() > CACHE_LIMIT;
     }
};

That said, you need a possibility to set/tweak the cache limit. And this functionality adds the overhead of maintaining the cache. Here is another benchmark that compares my precompiled cache against the runtime/"trainable" cache:

Name Mode Count Score Error Unit
MyBenchmark.testRuntimeCache thrpt 20 2610029,315 ± 45693,279 ops/s
MyBenchmark.testPrecompiledCache thrpt 20 2902212,715 ± 32059,696 ops/s

All of that said, I think it still makes sense to allow the customization. If this is done via a protected method or like I did it is probably a matter of taste and I'd be happy to change it in whatever direction you think is best.

I generally like the idea of tweaking the default behaviour, but I didn't want to make assumptions based on my use-case that might make things worse for others. While doing an additional check for the special characters might be worth it, a cache has some implications I didn't want to put on others. I hope my benchmarks show this in a reasonable manner.

Let me know how we should proceed :-)

Cheers,
Christoph

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 15, 2016

Rossen Stoyanchev commented

What I meant by scanning is exactly what we are doing now but avoiding automatic creation and copying to a StringBuilder unless a special char is encountered.

Something like this (not tested!):

	private static final Map<Character, String> specialChars = new HashMap<>(4);
	
	static {
		specialChars.put('\\', "\\\\");
		specialChars.put(':', "\\c");
		specialChars.put('\n', "\\n");
		specialChars.put('\r', "\\r");
	}

	private String escape(String inString) {
		StringBuilder sb = null;
		for (int i = 0; i < inString.length(); i++) {
			char c = inString.charAt(i);
			String escapeValue = specialChars.get(c);
			if (escapeValue != null) {
				sb = getStringBuilder(sb, inString, i);
				sb.append(escapeValue);
			}
			else if (sb != null){
				sb.append(c);
			}
		}
		return (sb != null ? sb.toString() : inString);
	}

	private StringBuilder getStringBuilder(StringBuilder sb, String inString, int i) {
		if (sb == null) {
			sb = new StringBuilder(inString.length());
			sb.append(inString.substring(0, i));
		}
		return sb;
	}

I would assume the special chars are not present in a majority of cases in which case we'd typically avoid the StringBuilder instance, which was your first point. Hence my question how commonly special chars appear in your case? Either way this change should always perform as well or better.

For the cache, a Map<String, byte[]> for header names should be very effective since header names are very likely to be a finite, and reasonably small set (and yes it would need to be capped). A cache of header values is going to be more dynamic but still caching some subset of commonly used values may be useful.

I didn't see anything wrong with a StompHeaderEncoder abstraction. It's more that I don't see a lot of variations. Essentially we are discussing a performance optimization, so really just trying to keep it simple.

I'm not against making the decoder/encoder configurable either. Adding a setter on StompSubProtocolHandler to begin with is a straight-forward enough thing to do. Exposing a config option however is less so. This is an advanced option and by adding such options to the config it becomes slightly more complex. It's the added effect over time I'm more concerned about, not this particular change. Note that we'd also have to update the XML config to. Also, now or later, consider the same option for other places that use a StompEncoder/Decoder such as the StompBrokerRelayHandler (and shouldn't there be one place to configure a StompEncoder/Decoder from for all places that need it?)

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 15, 2016

Christoph Dreis commented

Again - thank you for the quick response!

I see - I just fired your proposal against JMH. Here are the results:

Name Mode Cnt Score Error Units
MyBenchmark.testRossen thrpt 20 3616851,998 ± 20387,389 ops/s
MyBenchmark.testContains thrpt 20 4694047,124 ± 274808,158 ops/s
MyBenchmark.testNormal thrpt 20 3342870,775 ± 35367,936 ops/s

Unfortunately, the additional Map.get() calls and the underlying hash creation eat up the performance benefit here - at least that's my assumption without profiling it too heavily.

For the cache(s): Would you create two caches then? One for the header names and one for the values? What would you say would be the subset of commonly used values in that case? I'm having trouble to define this subset in my mind currently. What would be your suggestions here?

For the StompHeaderEncoder abstraction: Agreed - the variations will be very low, I guess. Maybe therefore it doesn't justify the abstraction.

For the configuration: I didn't mention that our projects use Spring-Boot. I'm therefore having a bit of trouble to understand your statement:

Exposing a config option however is less so (straight-forward)
Maybe, I'm missing something but I currently see no way to set the StompEncoder on the StompSubProtocolHandler in Spring-Boot without giving an option in WebMvcStompEndpointRegistry to do so - at least not without exposing the protocol handler itself (which I find more problematic, don't you!?). Though I see and share your concerns that this increases complexity slightly.

What would be the best approach to proceed with this now? I'm more than happy to adjust my PR, but I'm currently feeling that I might not fully understand your goals and (future) quality concerns and the impact on a possible solution.

Cheers,
Christoph

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 15, 2016

Rossen Stoyanchev commented

Okay the specialChars map was maybe a bit too much. We could keep the explicit character comparison:

	private String escape(String inString) {
		StringBuilder sb = null;
		for (int i = 0; i < inString.length(); i++) {
			char c = inString.charAt(i);
			if (c == '\\') {
				sb = getStringBuilder(sb, inString, i);
				sb.append("\\\\");
			}
			else if (c == ':') {
				sb = getStringBuilder(sb, inString, i);
				sb.append("\\c");
			}
			else if (c == '\n') {
				sb = getStringBuilder(sb, inString, i);
				sb.append("\\n");
			}
			else if (c == '\r') {
				sb = getStringBuilder(sb, inString, i);
				sb.append("\\r");
			}
			else if (sb != null){
				sb.append(c);
			}
		}
		return (sb != null ? sb.toString() : inString);
	}

	private StringBuilder getStringBuilder(StringBuilder sb, String inString, int i) {
		if (sb == null) {
			sb = new StringBuilder(inString.length());
			sb.append(inString.substring(0, i));
		}
		return sb;
	}

For the caches, I guess a header name cache would be the place to start. You're right the header values probably wouldn't be ideally suited -- I was thinking of things like destinations but not enough there.

What does your testContains method do besides checking for the presence of each special char? How do you then escape?

For the configuration, the simplest option would be to expose setters on StompSubProcotolHandler. Then you can access the "subProtocolWebSocketHandler" bean, for example using a BeanPostProcessor, obtain sub-protocol handler from it, and use the setters. Not the cleanest of options but certainly doable.

The next option would be to expose it as you have done in the config but since StompEncoder/Decoder are also used in the StompBrokerRelayMessageHandler we need to consider whether that should be separately configurable or not.

Yet another option would be to expose the StompSubProtocolHandler as a bean in WebSocketMessageBrokerConfigurationSupport and have it passed into the WebMvcStompEndpointRegistry. Then Spring Boot could configure that bean through properties or even detect a bean of that type and use it instead of the default. This approach requires changes on both the Spring Framework and the Spring Boot side but it strikes a better balance.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 15, 2016

Christoph Dreis commented

Hey,

testContains is simply checking for the chars and doing the normal escape like it is now:

	private String escape(String inString) {
		if (!inString.contains("\\") && !inString.contains(":") && !inString.contains("\n") && !inString.contains("\r")) {
			return inString;
		}
		StringBuilder sb = new StringBuilder(inString.length());
		for (int i = 0; i < inString.length(); i++) {
			char c = inString.charAt(i);
			if (c == '\\') {
				sb.append("\\\\");
			}
			else if (c == ':') {
				sb.append("\\c");
			}
			else if (c == '\n') {
				sb.append("\\n");
			}
			else if (c == '\r') {
				sb.append("\\r");
			}
			else {
				sb.append(c);
			}
		}
		return sb.toString();
	}

I will compare this tomorrow against your new proposal and post the results. Moreover, I feel confident again that I can craft something new. Given the options you gave me, I will take a look at options 2 and 3 for a possible PR. I missed the "subProtocolWebSocketHandler" bean, so even if those options turn out to be more complex than expected I have a fallback available. Thank you.

Any wishes for the header cache? A LinkedHashMap solution with a size of 16 (or 32?) like the one above should be sufficient in my opinion!? What do you think?

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 15, 2016

Christoph Dreis commented

I couldn't wait for the benchmarks - here they are :)

For string "message-counter":

Benchmark Mode Cnt Score Error Units
MyBenchmark.testContains thrpt 20 9929309,961 ± 346618,869 ops/s
MyBenchmark.testNormal thrpt 20 7117740,952 ± 102970,861 ops/s
MyBenchmark.testRossen thrpt 20 13163373,422 ± 538732,859 ops/s

For string "message\n-counter"

Benchmark Mode Cnt Score Error Units
MyBenchmark.testContains thrpt 20 4576119,065 ± 77279,878 ops/s
MyBenchmark.testNormal thrpt 20 5139908,179 ± 54456,206 ops/s
MyBenchmark.testRossen thrpt 20 4938472,416 ± 41383,504 ops/s

I would say the small impact is something I could live with in the second benchmark. I still think the more common case are values without special chars and the improvement here is much higher with your new version (because of iterating only once through the string).

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Nov 16, 2016

Rossen Stoyanchev commented

Yes the case without special chars should be the norm. For the header cache 32 sounds good.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Dec 1, 2016

Christoph Dreis commented

Hey, is the latest work on #19100 affecting this ticket somehow? Especially in regards to the options you suggested!?

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Dec 1, 2016

Rossen Stoyanchev commented

Not really because we are still using the same underlying StompEncoder/Decoder, just plugging them into a newer version of Reactor Netty. That said I'm surprised I didn't set the fix version for this yet. Let me fix that!

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Dec 1, 2016

Rossen Stoyanchev commented

Also are you planning to update the PR which appears to be as it was first submitted, i.e. before the discussion that followed?

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Dec 1, 2016

Christoph Dreis commented

I investigated the Spring-Boot option and I think this involves some level of "indirection" that didn't feel good in my local tests. I will therefore go with option 2 and extend my initial PR to include the relay handler configuration. At least that's my current plan - unfortunately I'm without internet at home at the moment and can't work on that too much. Sorry for that.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Dec 3, 2016

Christoph Dreis commented

Finally had some time to work on this again. Unfortunately, even option 2 seems a bit more complicated than I thought. Adjusting the StompBrokerRelayRegistration in the same way is quite a bit different to the WebMvcEndpointRegistry flow - creating more and more questions where to put which code to don't handle both cases too differently. Long story short, I went with option 1 and just left the two setters for the encoder and decoder in StompSubProtocolHandler and did the discussed changes on the StompEncoder that should improve most of the default cases anyway.

Sorry for any inconveniences.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Dec 8, 2016

Christoph Dreis commented

Thank you for your patience on this one :)

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Dec 8, 2016

Rossen Stoyanchev commented

No worries, thanks for putting it all together and testing the impact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.