Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ShallowEtagHeaderFilter should use a more efficiently dynamically resizing buffer than ResizableByteArrayOutputStream [SPR-12081] #16697

Closed
spring-issuemaster opened this issue Aug 14, 2014 · 5 comments

Comments

Projects
None yet
2 participants
@spring-issuemaster
Copy link
Collaborator

commented Aug 14, 2014

Craig opened SPR-12081 and commented

ShallowEtagHeaderFilter buffers the response in ResizableByteArrayOutputStream. When it needs to grows the response, it uses ResizableByteArrayOutputStream.resize(int):
https://github.com/spring-projects/spring-framework/blob/v4.0.6.RELEASE/spring-web/src/main/java/org/springframework/web/filter/ShallowEtagHeaderFilter.java#L245
https://github.com/spring-projects/spring-framework/blob/v4.0.6.RELEASE/spring-core/src/main/java/org/springframework/util/ResizableByteArrayOutputStream.java#L64

ResizableByteArrayOutputStream then creates a new buffer, copies the old buffer into the new buffer, then (implicitly) releases the old buffer.

It would be more efficient to use a linked list of buffers. The resize operation would create a new empty buffer and add it to the linked list of buffers. This approach results in less copying so less overall memory use (reducing the frequently of problems such as #15482 when due to buffer resizing).

For an example of this approach in use today, see Grails' StreamCharBuffer: https://github.com/grails/grails-core/blob/355a031bfacbdc94c60b4a8fe4131a500c8833cb/grails-encoder/src/main/groovy/org/grails/buffer/StreamCharBuffer.java Note that this approach is also in use in MyFaces, see https://issues.apache.org/jira/browse/MYFACES-3450


Affects: 4.0.6

Issue Links:

  • #12919 ShallowEtagHeaderFilter should make use of specified content length
@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Sep 26, 2014

Craig commented

#653

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 4, 2014

Juergen Hoeller commented

I suppose this is only really inefficient when the content length has not been specified? Since with a specified content length, we'll use an appropriately sized buffer in the first place...

I'm just wondering how important this is for the 4.1.x line. We'll start 4.2 work pretty soon and this kind of rearchitecting in the details would be a fine fit there from my perspective.

Juergen

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 6, 2014

Craig commented

Agreed - this is only important when the content length is not specified. However, for most responses, the content length isn't known ahead of time. I think that only when serving static content is the length known, for typical dynamic responses (gsp, jsp, velocity, freemarker, raw response.write, etc) there's no way to know so a lot of resizing happens. These types of responses are also the least cacheable by caching proxies so they're the most likely to be processed by the application server making this issue that much more important.

I'd love to have the non-API change parts made for 4.1.x. But if that's not possible, I'll still be more than happy to see the improvement made for 4.2.x. :-)

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 9, 2014

Brian Clozel commented

Hi Craig

This is something we can work on, for sure.
I've written a quick JMH benchmark to check what kind of improvement we could get here.

Initial results show that the current implementation performs equally or better that the proposal.

For 25Kb responses

Benchmark                                                      Mode  Samples       Score      Error  Units
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperFixed          thrpt        5  182496,538 ± 4991,415  ops/s
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperUnknown        thrpt        5  181891,086 ± 4292,277  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperFixed      thrpt        5  186418,003 ± 2680,324  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknown    thrpt        5  121349,954 ± 3360,872  ops/s

For 100Kb responses

Benchmark                                                      Mode  Samples      Score     Error  Units
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperFixed          thrpt        5  44293,692 ± 437,824  ops/s
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperUnknown        thrpt        5  44691,418 ± 651,204  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperFixed      thrpt        5  44758,725 ± 488,173  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknown    thrpt        5  28252,114 ± 404,786  ops/s

This benchmark (like many of those) may be flawed, so don't hesitate to point issues there.
Note that this benchmark measures throughput, but not memory use.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 16, 2014

Craig commented

Brian Clozel sorry for the delayed response - it took me a while to figure out why your test got such different results than what I expected. Turns out it's a rather small thing that makes a measurable difference.

In your test, you do:

bh.consume(baos.toByteArray());

which causes FastByteArrayOutputStream to do a "safe" byte array. It creates a new byte array, copies all the buffered data into it, releases all the previously buffered data, uses the newly created by array as the new buffer, then copies the buffer and returns it. This copying then returning the buffer is the slow part - toByteArrayUnsafe() returns the internal buffer instead of copying it. If you use baos.toByteArrayUnsafe() you'll get different results with FastByteArrayOutputStream beating ResizableByteArrayOutputStream with a fixed size and roughly tying for the unknown size.

In my pull request, toByteArray() is only used in the first commit in order to be safe (we can't rely on classes that extend ShallowEtagHeaderFilter and override isEligibleForEtag or generateETagHeaderValue to not modify the byte[]). So the lower performance your jmh test highlights would be present if only the first commit is merged.

However, the second commit eliminates this problem. toByteArray() is never called; the OutputStream is read as a stream (instead of a byte[]). So it's both safe and fast.

Finally, I think that it's much more realistic for little bits of data to be written many times rather than one bit of data written once. So I added these 2 tests:

@Benchmark
public void byteArrayWrapperUnknown100times(ResponseData data, Blackhole bh) throws IOException {
	ResizableByteArrayOutputStream baos = new ResizableByteArrayOutputStream(1024);
	for(int i=0; i < 100; i++){
		baos.write(data.bytes);
	}
	bh.consume(baos.toByteArray());
}

@Benchmark
public void fastByteArrayWrapperUnknown100times(ResponseData data, Blackhole bh) throws IOException {
	FastByteArrayOutputStream baos = new FastByteArrayOutputStream(1024);
	for(int i=0; i < 100; i++){
		baos.write(data.bytes);
	}
	bh.consume(baos.toByteArray());
}

@Benchmark
public void fastByteArrayWrapperUnknown100timesUnsafe(ResponseData data, Blackhole bh) throws IOException {
	FastByteArrayOutputStream baos = new FastByteArrayOutputStream(1024);
	for(int i=0; i < 100; i++){
		baos.write(data.bytes);
	}
	bh.consume(baos.toByteArrayUnsafe());
}

Using these tests, FastByteArrayOutputStream beats ResizableByteArrayOutputStream by ~50%. It even wins when using toByteArray() instead of toByteArrayUnsafe().

Benchmark                                                                    Mode  Samples       Score       Error  Units
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperFixed                        thrpt        5  153125.521 ▒  8547.369  ops/s
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperUnknown                      thrpt        5  152464.681 ▒  7378.370  ops/s
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperUnknown100times              thrpt        5     509.781 ▒    37.328  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperFixed                    thrpt        5  146966.577 ▒ 18102.712  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperFixedUnsafe              thrpt        5  293713.313 ▒ 26438.443  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknown                  thrpt        5   91566.986 ▒  5903.223  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknown100times          thrpt        5     531.457 ▒    78.856  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknown100timesUnsafe    thrpt        5     771.595 ▒    98.998  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknownUnsafe            thrpt        5  130245.591 ▒ 19222.186  ops/s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.