Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of StringUtils.replace() if pattern is not found [SPR-15430] #19991

Closed
spring-issuemaster opened this issue Apr 10, 2017 · 4 comments

Comments

Projects
None yet
2 participants
@spring-issuemaster
Copy link
Collaborator

commented Apr 10, 2017

Christoph Dreis opened SPR-15430 and commented

Hey,

just noticed a small improvement for StringUtils.replace() in case the pattern that should be replaced is not found in the given String. I noticed this in our loadtests where lots of messages are send to user destinations, which are sanitized like this: StringUtils.replace(user, "/", "%2F")

Most of the time, users don't contain a slash, so there is no replacement needed. Please find the PR attached, that yields the following microbenchmark results:

MyBenchmark.testNew                                   thrpt   20  56148767,184 ± 571831,993   ops/s
MyBenchmark.testNew:·gc.alloc.rate                    thrpt   20         0,002 ±      0,003  MB/sec
MyBenchmark.testNew:·gc.alloc.rate.norm               thrpt   20        ? 10??                 B/op
MyBenchmark.testNew:·gc.count                         thrpt   20           ? 0               counts
MyBenchmark.testOld                                   thrpt   20  15643379,634 ± 404629,728   ops/s
MyBenchmark.testOld:·gc.alloc.rate                    thrpt   20      3698,512 ±     95,454  MB/sec
MyBenchmark.testOld:·gc.alloc.rate.norm               thrpt   20       248,000 ±      0,001    B/op
MyBenchmark.testOld:·gc.count                         thrpt   20       312,000               counts
MyBenchmark.testOld:·gc.time                          thrpt   20       183,000                   ms

Affects: 4.3.7

Referenced from: pull request #1384

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 11, 2017

Juergen Hoeller commented

Good point! I've addressed this as part of a larger revision, also initializing the StringBuilder capacity with the length of the input (in replace as well as deleteAny). We're enforcing the returning of the original String in our unit tests now.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 11, 2017

Christoph Dreis commented

Thanks for addressing this. I have one remark though. Setting the initial capacity for StringBuilder in replace might cause more allocations when the replacement is longer than the oldPattern. Which is the case for StringUtils.replace(user, "/", "%2F")

Benchmark                                                   Mode  Cnt         Score        Error   Units
MyBenchMark.testNew                                   thrpt   20  11015863,506 ± 197360,119   ops/s
MyBenchMark.testNew:·gc.alloc.rate                    thrpt   20      4285,222 ±     77,203  MB/sec
MyBenchMark.testNew:·gc.alloc.rate.norm               thrpt   20       408,000 ±      0,001    B/op
MyBenchMark.testNew:·gc.churn.PS_Eden_Space           thrpt   20      4275,075 ±    148,756  MB/sec
MyBenchMark.testNew:·gc.churn.PS_Eden_Space.norm      thrpt   20       406,958 ±      9,871    B/op
MyBenchMark.testNew:·gc.churn.PS_Survivor_Space       thrpt   20         0,261 ±      0,057  MB/sec
MyBenchMark.testNew:·gc.churn.PS_Survivor_Space.norm  thrpt   20         0,025 ±      0,005    B/op
MyBenchMark.testNew:·gc.count                         thrpt   20       357,000               counts
MyBenchMark.testNew:·gc.time                          thrpt   20       178,000                   ms
MyBenchMark.testOld                                   thrpt   20  11519565,769 ±  77319,784   ops/s
MyBenchMark.testOld:·gc.alloc.rate                    thrpt   20      3954,225 ±     26,527  MB/sec
MyBenchMark.testOld:·gc.alloc.rate.norm               thrpt   20       360,000 ±      0,001    B/op
MyBenchMark.testOld:·gc.churn.PS_Eden_Space           thrpt   20      3963,766 ±     86,836  MB/sec
MyBenchMark.testOld:·gc.churn.PS_Eden_Space.norm      thrpt   20       360,866 ±      7,397    B/op
MyBenchMark.testOld:·gc.churn.PS_Survivor_Space       thrpt   20         0,280 ±      0,059  MB/sec
MyBenchMark.testOld:·gc.churn.PS_Survivor_Space.norm  thrpt   20         0,025 ±      0,005    B/op
MyBenchMark.testOld:·gc.count                         thrpt   20       348,000               counts
MyBenchMark.testOld:·gc.time                          thrpt   20       182,000                   ms

Should be okay for most cases - I wanted to note it for completeness reasons though.

Cheers

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 11, 2017

Juergen Hoeller commented

Hmm, indeed, if we end up with a capacity less than 16 (the StringBuilder default), there could indeed be more allocations than before. This was primarily meant as an optimization for larger Strings where the builder's capacity always has to be grown... so strictly speaking, we could enforce a minimum of 16 to have no regression there. Probably better, for cases with a replacement larger than the original pattern, we could simply use a builder capacity of input size + 16 (analogous to the StringBuilder(String) constructor). Any objections?

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 11, 2017

Christoph Dreis commented

Actually the example above is a username larger than 16 characters. Here is one with less than 16 characters:

Benchmark                                                   Mode  Cnt         Score        Error   Units
MyBenchmark.testNew                                   thrpt   20  11689788,924 ± 298611,841   ops/s
MyBenchmark.testNew:·gc.alloc.rate                    thrpt   20      2764,230 ±     70,548  MB/sec
MyBenchmark.testNew:·gc.alloc.rate.norm               thrpt   20       248,000 ±      0,001    B/op
MyBenchmark.testNew:·gc.count                         thrpt   20       368,000               counts
MyBenchmark.testNew:·gc.time                          thrpt   20       171,000                   ms
MyBenchmark.testOld                                   thrpt   20  16474672,626 ± 142928,901   ops/s
MyBenchmark.testOld:·gc.alloc.rate                    thrpt   20      3267,337 ±     28,232  MB/sec
MyBenchmark.testOld:·gc.alloc.rate.norm               thrpt   20       208,000 ±      0,001    B/op
MyBenchmark.testOld:·gc.count                         thrpt   20       368,000               counts
MyBenchmark.testOld:·gc.time                          thrpt   20       176,000                   ms

Opting for size + 16 in case the replacement is larger than the original pattern might be the way to go, though for smaller Strings it may create more allocations than before (shown in the results below with something like new StringBuilder(newPattern.length() > oldPattern.length() ? inString.length() + 16 : inString.length()).

Benchmark                                                   Mode  Cnt         Score        Error   Units
MyBenchmark.testNew                                   thrpt   20  14444566,216 ± 111195,218   ops/s
MyBenchmark.testNew:·gc.alloc.rate                    thrpt   20      3305,505 ±     25,452  MB/sec
MyBenchmark.testNew:·gc.alloc.rate.norm               thrpt   20       240,000 ±      0,001    B/op
MyBenchmark.testNew:·gc.count                         thrpt   20       327,000               counts
MyBenchmark.testNew:·gc.time                          thrpt   20       163,000                   ms
MyBenchmark.testOld                                   thrpt   20  16699064,442 ± 469500,527   ops/s
MyBenchmark.testOld:·gc.alloc.rate                    thrpt   20      3439,320 ±     96,715  MB/sec
MyBenchmark.testOld:·gc.alloc.rate.norm               thrpt   20       216,000 ±      0,001    B/op
MyBenchmark.testOld:·gc.count                         thrpt   20       377,000               counts
MyBenchmark.testOld:·gc.time                          thrpt   20       179,000                   ms

I intentionally left this out of my initial PR, because it leaves you with the question what the best capacity for the StringBuilder is and what the most common case for users is. ;-) I personally have no strong feelings against size + 16, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.