Fix stream hanging upon corrupted gzipped data #36

ophirorly · 2020-11-14T22:50:36Z

If the gzip header is corrupted, anyone trying to read from it will get stuck on the PipedInputStream.awaitSpace() function
This is a sample stack trace for such a case:

First we get an exception on the corrupted header:
java.util.zip.ZipException: Not in GZIP format at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165) at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79) at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91) at rawhttp.core.body.encoding.GZipUncompressorOutputStream.lambda$startReader$0(GZipUncompressorOutputStream.java:54) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

Then later, when trying to read the stream (via the LazyBodyReader) we get:
java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.io.PipedInputStream.awaitSpace(PipedInputStream.java:273) at java.io.PipedInputStream.receive(PipedInputStream.java:231) - locked <0x00000006d2ba5eb0> (a java.io.PipedInputStream) at java.io.PipedOutputStream.write(PipedOutputStream.java:149) at rawhttp.core.body.encoding.GZipUncompressorOutputStream.write(GZipUncompressorOutputStream.java:47) at java.io.FilterOutputStream.write(FilterOutputStream.java:97) at rawhttp.core.body.BodyConsumer$ChunkedBodyConsumer.lambda$consumeDataInto$2(BodyConsumer.java:102) at rawhttp.core.body.BodyConsumer$ChunkedBodyConsumer$$Lambda$46/1788247731.accept(Unknown Source) at rawhttp.core.body.ChunkedBodyParser.parseChunkedBody(ChunkedBodyParser.java:104) at rawhttp.core.body.BodyConsumer$ChunkedBodyConsumer.consumeDataInto(BodyConsumer.java:101) at rawhttp.core.body.BodyReader.writeDecodedTo(BodyReader.java:115) at rawhttp.core.body.BodyReader.decodeBody(BodyReader.java:200) at rawhttp.core.body.BodyReader.decodeBodyToString(BodyReader.java:212)

… gunzip it.

…rrupted.

renatoathaydes · 2020-11-17T09:03:20Z

Hi @ophirorly . Looks like a great improvement, but without a test, I can't know if this really fixes the problem. Can you write one?

I wrote all core tests in Kotlin, hope you're familiar with it? If not, I don't mind if you add a Java test.

You can look at existing tests like rawhttp.core.body.FileBodyTest and rawhttp.core.body.InputStreamChunkDecoderTest... I didn't add a test for gzip because I was just delegating that to the Java gzip stream, but you can create one new test class just for this.

ophirorly · 2020-11-21T13:44:28Z

Hi @renatoathaydes. Of course. I'll add a test when I'm able. Also - perhaps it'll also help to save the exception, and throw it in finishDecoding() so that the caller will have an indication he's got a corrupted body.
What do you think?

renatoathaydes · 2020-11-21T18:30:19Z

@ophirorly yeah, when the caller tries to get the body, it should definitely cause an Exception in this case.

2. Add tests

ophirorly · 2020-11-23T17:15:56Z

@renatoathaydes I've added code to throw an exception if the parsed body was corruped, and two tests for both scenarios.
They pass with the changes and fail without it (one of them hangs)

renatoathaydes · 2020-11-23T18:49:59Z

rawhttp-core/src/test/kotlin/rawhttp/core/body/encoding/BodyDecodingTest.kt

+            response.body.get().decodeBodyToString(StandardCharsets.UTF_8)
+        } catch(e: IOException) {}
+
+        true shouldBe true


Thanks for the effort... It helps a lot that you've written a little test, but I have to say that the tests, as they are, are really not good enough... sorry to be blunt.

We need to check how long it took for the test to give up, what the Exception type was, what the message was...

I will accept your changes but will write more tests myself... I will push my code to this PR (hopefully, if I have permission) so we can discuss later if you would like.

Thanks again for finding the bug and contributing a first solution!

Make sure caller gets the appropriate Exception on failures like this. Final test still failing.

renatoathaydes · 2020-11-23T22:14:46Z

@ophirorly I've pushed changes to the tests and then fixed the problems I found... I wanted to make sure the caller of getDecodedBody actually gets the appropriate Exception, which in this case is the Exception related to the GZIP Stream.
The second test you wrote is still failing, and I am not sure why... if you want to have a look, feel free to push a fix :) otherwise I will continue tomorrow.

renatoathaydes · 2020-11-23T22:16:26Z

rawhttp-core/src/main/java/rawhttp/core/body/BodyReader.java

-import java.nio.charset.Charset;
-import java.util.Iterator;
-import java.util.Optional;
-import java.util.OptionalLong;


I will revert import changes later! My IDE lost my import settings apparently.... I avoid * imports as I've had many bugs caused by the wrong classes being imported accidentally.

ophirorly · 2020-11-23T22:28:46Z

I think that the second test fails because you throw the WrappedException in the executor thread, which does nothing.
You need to save it and then throw it (or a new exception) where the caller would get it. I used the finishDecoding(), since it guarantees no race conditions, and the user have to call it.

Never mind. I didn't consider the retrival of the reader thread execution result.

ophirorly · 2020-11-23T23:20:25Z

@ophirorly I've pushed changes to the tests and then fixed the problems I found... I wanted to make sure the caller of getDecodedBody actually gets the appropriate Exception, which in this case is the Exception related to the GZIP Stream.
The second test you wrote is still failing, and I am not sure why... if you want to have a look, feel free to push a fix :) otherwise I will continue tomorrow.

I think it fails because the caller thread is not stuck on writing. It either already wrote all the data to the stream and is now waiting for the readerExecution to finish, or still writing. When you interrupt the thread - it catches the interrupted exception, instead of the execution result.

I'm not sure yet how to properly resolve it.

…tream

ophirorly · 2020-11-24T08:32:07Z

@renatoathaydes I've added a fix for the second issue.
The problem is that if the user thread is stuck on writing we'd want to interrupt it, but if it is waiting in finishDecoding() - we do not - we'd want it to get the runtime exception instead. I'm not sure about checking the stream state from within the reader thread, so I simply updated an error state ('caughtException') on the stream object, as in my initial fix.
Personally, I would pass the exception and use it in all cases for simplicity, but I guess opinions will vary on this issue

User of class will eventually receive the Exception.

renatoathaydes · 2020-11-24T19:55:38Z

The problem is that if the user thread is stuck on writing we'd want to interrupt it, but if it is waiting in finishDecoding()

@ophirorly right, I'd forgotten to close the stream, which would've called finishDecoding... thanks for finding that.

so I simply updated an error state ('caughtException')

The problem with that is that when you have multiple threads, updating a field like that is unsafe and prone to race conditions... But you're right: keeping a reference to the error seems to be the only way to re-throw it later... I've improved your fix a bit to use AtomicReference, and I think it's safe from race conditions now because the only way for us to get an InterruptedException at the finishDecoding method is if the reader thread interrupted the writer Thread that was waiting... and it only does that immediately after setting the reference to the error (I am surprised your fix worked because you updated a non-volatile field and immediately read it from another Thread, but the JVM seems very lenient regarding multi-threaded access to fields like this - just don't count on it :) ).

We actually needed to both keep a ref to the Error AND do as I did, interrupt the reader (the way you were doing before worked but the caller would get a misleading Pipe closed error) to stop it from hanging... I wish it were simpler, but this now seems to cover everything.

All tests are passing so I will merge as soon as I get a CI build green light.

Thanks for the contribution.

renatoathaydes · 2020-11-24T20:02:28Z

Looks like the CI is not building because this is a fork! Will merge and fix anything if necessary.

ophirorly · 2020-11-25T08:25:21Z

(I am surprised your fix worked because you updated a non-volatile field and immediately read it from another Thread, but the JVM seems very lenient regarding multi-threaded access to fields like this - just don't count on it :) ).

I think that in this scenario it's ok, because I would expect the JVM to put a memory barrier at that point (since the thread made a blocking call). However, putting a volatile is definitely better practice, and so is AtomicRef.

I also don't think there's a risk of significant race conditions - there is one reader thread that can only update the exception once, finishDecoding should also be called once, and reading the exception happens once, and it's after the reader has finished one way or another.
But again - I agree that even if I'm right - putting an AtomicReference is the better practice here.

We actually needed to both keep a ref to the Error AND do as I did, interrupt the reader (the way you were doing before worked but the caller would get a misleading Pipe closed error) to stop it from hanging... I wish it were simpler, but this now seems to cover everything.

Indeed.
Thank you for perfecting it and taking the time to explain it so throughly.

ophirorly added 2 commits November 13, 2020 01:37

If gzip stream parsing fails, stream the data as is without trying to…

0b3c69b

… gunzip it.

Fix for reading request/response bodymay hang when gzipped data is co…

b758214

…rrupted.

ophirorly changed the title ~~Skip gzip on failure~~ Fix stream hanging upon corrupted gzipped data Nov 14, 2020

ophirorly mentioned this pull request Nov 15, 2020

Reading from GZipUncompressorOutputStream can hang if the gzipped data is corrupted. #37

Closed

ophirorly added 2 commits November 16, 2020 15:34

initialize gzip stream in a different thread to avoid hanging

ae870c5

remove unneeded 'throws' decleration

b47bce4

1. Throw an exception if parsed body was corrupted

96c9df9

2. Add tests

renatoathaydes reviewed Nov 23, 2020

View reviewed changes

Improving tests for corrupted GZIP stream.

f56b012

Make sure caller gets the appropriate Exception on failures like this. Final test still failing.

renatoathaydes reviewed Nov 23, 2020

View reviewed changes

finish decoding

26d18c8

Return IOException when not blocked while handling a corrupted gzip s…

b05f7db

…tream

renatoathaydes added 2 commits November 24, 2020 20:45

Improved error logic in GZIP output stream.

c25b5f7

Don't log stacktrace.

ee578c0

User of class will eventually receive the Exception.

renatoathaydes merged commit 65c8674 into renatoathaydes:master Nov 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix stream hanging upon corrupted gzipped data #36

Fix stream hanging upon corrupted gzipped data #36

ophirorly commented Nov 14, 2020

renatoathaydes commented Nov 17, 2020

ophirorly commented Nov 21, 2020 •

edited

Loading

renatoathaydes commented Nov 21, 2020

ophirorly commented Nov 23, 2020 •

edited

Loading

renatoathaydes Nov 23, 2020

renatoathaydes commented Nov 23, 2020

renatoathaydes Nov 23, 2020

ophirorly commented Nov 23, 2020 •

edited

Loading

ophirorly commented Nov 23, 2020 •

edited

Loading

ophirorly commented Nov 24, 2020 •

edited

Loading

renatoathaydes commented Nov 24, 2020

renatoathaydes commented Nov 24, 2020

ophirorly commented Nov 25, 2020 •

edited

Loading

Fix stream hanging upon corrupted gzipped data #36

Fix stream hanging upon corrupted gzipped data #36

Conversation

ophirorly commented Nov 14, 2020

renatoathaydes commented Nov 17, 2020

ophirorly commented Nov 21, 2020 • edited Loading

renatoathaydes commented Nov 21, 2020

ophirorly commented Nov 23, 2020 • edited Loading

renatoathaydes Nov 23, 2020

Choose a reason for hiding this comment

renatoathaydes commented Nov 23, 2020

renatoathaydes Nov 23, 2020

Choose a reason for hiding this comment

ophirorly commented Nov 23, 2020 • edited Loading

ophirorly commented Nov 23, 2020 • edited Loading

ophirorly commented Nov 24, 2020 • edited Loading

renatoathaydes commented Nov 24, 2020

renatoathaydes commented Nov 24, 2020

ophirorly commented Nov 25, 2020 • edited Loading

ophirorly commented Nov 21, 2020 •

edited

Loading

ophirorly commented Nov 23, 2020 •

edited

Loading

ophirorly commented Nov 23, 2020 •

edited

Loading

ophirorly commented Nov 23, 2020 •

edited

Loading

ophirorly commented Nov 24, 2020 •

edited

Loading

ophirorly commented Nov 25, 2020 •

edited

Loading