Improve performance of STOMP message header encoding [SPR-14901] #19467
Comments
Rossen Stoyanchev commented We could open up the
Christoph Dreis commented Hey Rossen, thanks for your response. Let me go through your suggestions/questions:
Compared to a string with e.g. a newline:
So doing the check is of course faster when it hits, but when it doesn't, it adds some overhead. Additionally, escaping is only 50% of the problem, I'd say. Creating the byte representation of the string is probably an even bigger allocation driver.
```java
// LRU cache from header strings to their encoded bytes. accessOrder = true keeps the
// least recently used entry first, so removeEldestEntry evicts it once the limit is exceeded.
private LinkedHashMap<String, byte[]> cache = new LinkedHashMap<String, byte[]>(CACHE_LIMIT, 0.75f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        return size() > CACHE_LIMIT;
    }
};
```

That said, you need a way to set or tweak the cache limit, and maintaining the cache adds its own overhead. Here is another benchmark that compares my precompiled cache against the runtime ("trainable") cache:
All of that said, I still think it makes sense to allow the customization. Whether this is done via a protected method or the way I did it is probably a matter of taste, and I'd be happy to change it in whatever direction you think is best. I generally like the idea of tweaking the default behaviour, but I didn't want to make assumptions based on my use case that might make things worse for others. While an additional check for the special characters might be worth it, a cache has implications I didn't want to impose on others. I hope my benchmarks show this reasonably well. Let me know how we should proceed :-) Cheers,
Rossen Stoyanchev commented What I meant by scanning is exactly what we are doing now, but without automatically creating and copying into a StringBuilder unless a special char is encountered. Something like this (not tested!):

```java
private static final Map<Character, String> specialChars = new HashMap<>(4);

static {
    specialChars.put('\\', "\\\\");
    specialChars.put(':', "\\c");
    specialChars.put('\n', "\\n");
    specialChars.put('\r', "\\r");
}

private String escape(String inString) {
    StringBuilder sb = null;
    for (int i = 0; i < inString.length(); i++) {
        char c = inString.charAt(i);
        String escapeValue = specialChars.get(c);
        if (escapeValue != null) {
            // Special char: make sure the builder exists, then append the escape sequence
            sb = getStringBuilder(sb, inString, i);
            sb.append(escapeValue);
        }
        else if (sb != null) {
            // Regular chars are only copied once escaping has started
            sb.append(c);
        }
    }
    // No special chars found: return the input as-is, no StringBuilder created
    return (sb != null ? sb.toString() : inString);
}

private StringBuilder getStringBuilder(StringBuilder sb, String inString, int i) {
    if (sb == null) {
        // Lazily create the builder and copy the unescaped prefix [0, i)
        sb = new StringBuilder(inString.length());
        sb.append(inString.substring(0, i));
    }
    return sb;
}
```

I would assume the special chars are not present in the majority of cases, in which case we'd typically avoid the StringBuilder instance, which was your first point. Hence my question: how commonly do special chars appear in your case? Either way, this change should always perform as well or better.

For the cache, a

I didn't see anything wrong with a StompHeaderEncoder abstraction. It's more that I don't see a lot of variations. Essentially we are discussing a performance optimization, so I'm really just trying to keep it simple.

I'm not against making the decoder/encoder configurable either. Adding a setter on StompSubProtocolHandler to begin with is a straightforward enough thing to do. Exposing a config option, however, is less so. This is an advanced option, and adding such options makes the config slightly more complex. It's the cumulative effect over time I'm more concerned about, not this particular change. Note that we'd also have to update the XML config. Also, now or later, consider the same option for other places that use a StompEncoder/Decoder, such as the StompBrokerRelayHandler (and shouldn't there be one place from which to configure a StompEncoder/Decoder for all places that need it?)
Christoph Dreis commented Again, thank you for the quick response! I see. I just ran your proposal through JMH. Here are the results:
Unfortunately, the additional Map.get() calls and the underlying hash computation eat up the performance benefit here; at least that's my assumption without profiling it too heavily.

For the cache(s): would you create two caches then, one for the header names and one for the values? What would be the subset of commonly used values in that case? I'm having trouble defining this subset in my mind currently. What would you suggest here?

For the StompHeaderEncoder abstraction: agreed, there will probably be very few variations. Maybe that means it doesn't justify the abstraction.

For the configuration: I didn't mention that our projects use Spring Boot. I'm therefore having a bit of trouble understanding your statement:
What would be the best way to proceed with this now? I'm more than happy to adjust my PR, but I currently feel that I might not fully understand your goals and (future) quality concerns, and their impact on a possible solution. Cheers,
Rossen Stoyanchev commented Okay, the specialChars map was maybe a bit too much. We could keep the explicit character comparison:

```java
private String escape(String inString) {
    StringBuilder sb = null;
    for (int i = 0; i < inString.length(); i++) {
        char c = inString.charAt(i);
        // Plain char comparisons: no hashing or boxing per char, unlike a Map lookup
        if (c == '\\') {
            sb = getStringBuilder(sb, inString, i);
            sb.append("\\\\");
        }
        else if (c == ':') {
            sb = getStringBuilder(sb, inString, i);
            sb.append("\\c");
        }
        else if (c == '\n') {
            sb = getStringBuilder(sb, inString, i);
            sb.append("\\n");
        }
        else if (c == '\r') {
            sb = getStringBuilder(sb, inString, i);
            sb.append("\\r");
        }
        else if (sb != null) {
            sb.append(c);
        }
    }
    // No special chars found: return the input as-is
    return (sb != null ? sb.toString() : inString);
}

private StringBuilder getStringBuilder(StringBuilder sb, String inString, int i) {
    if (sb == null) {
        // Lazily create the builder and copy the unescaped prefix [0, i)
        sb = new StringBuilder(inString.length());
        sb.append(inString.substring(0, i));
    }
    return sb;
}
```

For the caches, I guess a header name cache would be the place to start. You're right that the header values probably wouldn't be ideally suited; I was thinking of things like destinations, but there's not enough there. What does your testContains method do besides checking for the presence of each special char? How do you then escape?

For the configuration, the simplest option would be to expose setters on StompSubProtocolHandler. Then you can access the "subProtocolWebSocketHandler" bean, for example using a BeanPostProcessor, obtain the sub-protocol handler from it, and use the setters. Not the cleanest of options, but certainly doable.

The next option would be to expose it as you have done in the config, but since StompEncoder/Decoder are also used in the StompBrokerRelayMessageHandler, we need to consider whether that should be separately configurable or not.

Yet another option would be to expose the StompSubProtocolHandler as a bean in WebSocketMessageBrokerConfigurationSupport and have it passed into the WebMvcStompEndpointRegistry. Then Spring Boot could configure that bean through properties, or even detect a bean of that type and use it instead of the default. This approach requires changes on both the Spring Framework and the Spring Boot side, but it strikes a better balance.
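To try the explicit-comparison escaping outside the encoder, it can be wrapped in a small standalone class. This is a sketch, not Spring's StompEncoder: the class name `HeaderEscaper` is hypothetical, the if/else chain is swapped for an equivalent `switch`, and the prefix copy uses `StringBuilder.append(CharSequence, int, int)` instead of `substring()` so that no intermediate String is created for the prefix.

```java
// Hypothetical standalone holder for the escape logic discussed above (not Spring's StompEncoder).
final class HeaderEscaper {

    private HeaderEscaper() {
    }

    // Escapes the STOMP special chars ('\\', ':', '\n', '\r'), creating a
    // StringBuilder only once the first special char is found.
    static String escape(String inString) {
        StringBuilder sb = null;
        for (int i = 0; i < inString.length(); i++) {
            char c = inString.charAt(i);
            String escaped;
            switch (c) {
                case '\\':
                    escaped = "\\\\";
                    break;
                case ':':
                    escaped = "\\c";
                    break;
                case '\n':
                    escaped = "\\n";
                    break;
                case '\r':
                    escaped = "\\r";
                    break;
                default:
                    escaped = null;
            }
            if (escaped != null) {
                if (sb == null) {
                    // First special char: copy the clean prefix [0, i) without substring().
                    sb = new StringBuilder(inString.length());
                    sb.append(inString, 0, i);
                }
                sb.append(escaped);
            }
            else if (sb != null) {
                sb.append(c);
            }
        }
        // No special char seen: hand back the original instance.
        return (sb != null ? sb.toString() : inString);
    }
}
```

With this shape, `HeaderEscaper.escape("message-counter")` returns the very same String instance (the common, allocation-free case), while a value containing a newline comes back with the two characters `\n` in its place.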
Christoph Dreis commented Hey,
```java
private String escape(String inString) {
    // Fast path: each contains() scans the string, but if none of the four
    // special chars is present the input is returned without any allocation
    if (!inString.contains("\\") && !inString.contains(":") && !inString.contains("\n") && !inString.contains("\r")) {
        return inString;
    }
    // Slow path: rebuild the string, escaping each special char
    StringBuilder sb = new StringBuilder(inString.length());
    for (int i = 0; i < inString.length(); i++) {
        char c = inString.charAt(i);
        if (c == '\\') {
            sb.append("\\\\");
        }
        else if (c == ':') {
            sb.append("\\c");
        }
        else if (c == '\n') {
            sb.append("\\n");
        }
        else if (c == '\r') {
            sb.append("\\r");
        }
        else {
            sb.append(c);
        }
    }
    return sb.toString();
}
```
I will compare this against your new proposal tomorrow and post the results. Moreover, I feel confident again that I can put together something new. Given the options you gave me, I will take a look at options 2 and 3 for a possible PR. I had missed the "subProtocolWebSocketHandler" bean, so even if those options turn out to be more complex than expected, I have a fallback available. Thank you. Any wishes for the header cache? In my opinion, a LinkedHashMap solution with a size of 16 (or 32?) like the one above should be sufficient. What do you think?
Christoph Dreis commented I couldn't wait for the benchmarks, so here they are :) For the string "message-counter":
For the string "message\n-counter":
I would say the small regression in the second benchmark is something I could live with. I still think values without special chars are the more common case, and the improvement there is much higher with your new version (because it iterates through the string only once).
Rossen Stoyanchev commented Yes, the case without special chars should be the norm. For the header cache, 32 sounds good.
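A bounded header-name cache along the lines agreed here could look roughly like the following. It is a sketch under assumptions: `HeaderNameCache` and its methods are hypothetical names, the limit of 32 is taken from this thread, and escaping is left out for brevity (a real encoder would cache the already-escaped form). It builds on the access-ordered LinkedHashMap with `removeEldestEntry` shown earlier in the thread. Because `get()` on an access-ordered LinkedHashMap reorders entries, the class is not thread-safe on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a bounded LRU cache from header names to their UTF-8 bytes.
// Not thread-safe: with accessOrder = true even get() restructures the map,
// so an instance shared across threads needs external synchronization.
final class HeaderNameCache {

    static final int DEFAULT_CACHE_LIMIT = 32;

    private final Map<String, byte[]> cache;

    HeaderNameCache() {
        this(DEFAULT_CACHE_LIMIT);
    }

    HeaderNameCache(final int cacheLimit) {
        // accessOrder = true: the eldest entry is the least recently used one.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > cacheLimit;
            }
        };
    }

    // Returns the cached bytes; the array is shared, so callers must not modify it.
    byte[] getBytes(String headerName) {
        byte[] bytes = this.cache.get(headerName);
        if (bytes == null) {
            bytes = headerName.getBytes(StandardCharsets.UTF_8);
            this.cache.put(headerName, bytes);
        }
        return bytes;
    }

    int size() {
        return this.cache.size();
    }
}
```

The cache limit is a constructor argument here, which is one way to address the point raised earlier that the limit needs to be tweakable.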
Christoph Dreis commented Hey, does the latest work on #19100 affect this ticket in any way, especially with regard to the options you suggested?
Rossen Stoyanchev commented Not really, because we are still using the same underlying StompEncoder/Decoder, just plugging them into a newer version of Reactor Netty. That said, I'm surprised I didn't set the fix version for this yet. Let me fix that!
Rossen Stoyanchev commented Also, are you planning to update the PR? It appears to be as it was first submitted, i.e. before the discussion that followed.
Christoph Dreis commented I investigated the Spring Boot option, and I think it involves a level of "indirection" that didn't feel good in my local tests. I will therefore go with option 2 and extend my initial PR to include the relay handler configuration. At least that's my current plan; unfortunately, I'm without internet at home at the moment and can't work on it much. Sorry for that.
Christoph Dreis commented I finally had some time to work on this again. Unfortunately, even option 2 seems a bit more complicated than I thought. Adjusting the StompBrokerRelayRegistration in the same way is quite different from the WebMvcEndpointRegistry flow, raising more and more questions about where to put which code so that the two cases aren't handled too differently. Long story short, I went with option 1: I just kept the two setters for the encoder and decoder in StompSubProtocolHandler and made the discussed changes to the StompEncoder, which should improve most of the default cases anyway. Sorry for any inconvenience.
Christoph Dreis commented Thank you for your patience on this one :)
Rossen Stoyanchev commented No worries, thanks for putting it all together and testing the impact.
Christoph Dreis opened SPR-14901 and commented
Hey,
Problem
In recent load tests I noticed considerable heap pressure coming from the encoding and escaping of STOMP headers that our application sends to the client. This is caused mainly by two things:

- creating new `StringBuilder` and `String` objects for every header and each of its values
- calling `StringCoding.safeTrim()` every time

Overall this creates around 3-4 GB of heap pressure in a 10-minute profile for us and is currently the top source of allocations. (I'd be happy to post a screenshot on Monday if you need it, since I don't have it running on my laptop currently.)
Proposed solution
I thought a bit about a possible solution and came up with the idea of allowing the `StompEncoder` (and `StompDecoder`) to be configured on the `StompSubProtocolHandler`. In the proposed solution this is done via a new `StompWebSocketCodecRegistration` object, consisting of the encoder and, for consistency, the decoder. (I explicitly didn't call it a StompWebSocketCodec, because it doesn't offer the actual decoding and encoding a real codec would.) In order not to change too many contracts and to allow a possible backport to 4.3.x, I didn't create an interface for `StompEncoder` (and `StompDecoder`), but instead encapsulated the header encoding in a new interface, `StompHeaderEncoder`, which is ultimately the sole culprit for the allocations and thereby the interesting part.

With the proposed solution I am now able to specify a customized `StompEncoder` with a specialized `StompHeaderEncoder`. In our case I would write an implementation that uses a precompiled Map of headers and their byte representations, since we know them pretty much up front. JMH microbenchmarks show a possible improvement by a factor of 20 with that sort of mechanism, but one could also think about a "trainable" header encoding.

Let me know what you think about the proposed solution. I'd be happy to adjust it according to your feedback.
Cheers,
Christoph
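The precompiled-map idea from the issue description can be sketched as follows. The PR's actual `StompHeaderEncoder` signature is not shown in this thread, so everything here is an assumption: `HeaderBytesEncoder` and `PrecompiledHeaderEncoder` are hypothetical names, and the fallback stands in for whatever default escape-and-encode logic is in place.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract standing in for the PR's StompHeaderEncoder,
// whose exact signature is not shown in this thread.
interface HeaderBytesEncoder {

    byte[] encode(String headerPart);
}

// Serves bytes for header names/values known up front from a map built once at
// construction, and delegates everything else to a fallback encoder.
final class PrecompiledHeaderEncoder implements HeaderBytesEncoder {

    private final Map<String, byte[]> precompiled;

    private final HeaderBytesEncoder fallback;

    PrecompiledHeaderEncoder(Collection<String> knownHeaderParts, HeaderBytesEncoder fallback) {
        Map<String, byte[]> map = new HashMap<>();
        for (String part : knownHeaderParts) {
            // Precompute with the fallback itself, so cached and uncached output are identical.
            map.put(part, fallback.encode(part));
        }
        // Never modified after construction, so concurrent reads need no locking.
        this.precompiled = Collections.unmodifiableMap(map);
        this.fallback = fallback;
    }

    @Override
    public byte[] encode(String headerPart) {
        byte[] bytes = this.precompiled.get(headerPart);
        // Precompiled arrays are shared between calls; callers must treat them as read-only.
        return (bytes != null ? bytes : this.fallback.encode(headerPart));
    }
}
```

Compared with the LRU ("trainable") cache discussed in the comments, a precompiled map needs no eviction and no cache-limit setting, and it is read-only after construction. The trade-off is that it only helps for headers known in advance; anything else pays the full fallback cost.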
Affects: 4.3.4
Reference URL: #1236