Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky decode #3115

Merged
merged 2 commits into from Jan 27, 2020
Merged

Fix flaky decode #3115

merged 2 commits into from Jan 27, 2020

Conversation

mgibowski
Copy link
Contributor

@mgibowski mgibowski commented Jan 25, 2020

Fixes #2956

Test was failing when the generated string contained UTF-8 BOM character (that's why the minimum example looked like an empty String).

Last year fs2 started ignoring UTF-8 BOM characters:
typelevel/fs2#1484

So flakiness of this test is a successful discovery of the difference between the behavior of fs2 and http4s decoders.

Following the example of fs2, I made changes to the decode function dropping BOM characters and added an example test to verify that.

@rossabaker rossabaker added the testing label Jan 25, 2020
Copy link
Member

@rossabaker rossabaker left a comment

Great find! I've been meaning to clean this up and offer it to fs2, because it's not HTTP, but not gotten around to it. But this is a good step forward.

@@ -24,7 +26,8 @@ package object util {
else Pull.output1(outputString).as(None)
case Some((chunk, stream)) =>
if (chunk.nonEmpty) {
val bytes = chunk.toArray
val chunkWithoutBom = skipByteOrderMark(chunk)
Copy link
Member

@rossabaker rossabaker Jan 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably fast enough as is, but do we only need to check this on the first chunk? And is it incorrect if we drop it in the middle of a stream?

Copy link
Contributor Author

@mgibowski mgibowski Jan 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, checking only the first chunk is enough.

According to wikipedia:

its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8, or that it was converted to UTF-8 from a stream that contained an optional BOM.

So in both implementations (here and in fs2) if UTF-8 BOM is not at the very beginning of the stream - it will not be dropped and will remain in the result.

So we could optimize it possibly introducing some var to mark the first chunk... But hopefully that is not necessary.

@@ -55,5 +55,11 @@ class DecodeSpec extends Http4sSpec {
decoded must_== expected
}
}

"drop Byte Order Mark" in {
Copy link
Member

@rossabaker rossabaker Jan 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably exceedingly rare, but what if the BOM spans chunks?

Copy link
Contributor Author

@mgibowski mgibowski Jan 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given, that we exclude cases of it appearing in the middle of the stream, I don't think it would ever happen as we use unconsN. It might be more likely in fs2 implementation, where they use uncons1.

Copy link
Contributor Author

@mgibowski mgibowski Jan 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having written this it became clear to me that it is actually unnecessary to check chunk.size >= 3.
I took it from fs2, but because of unconsN it is redundant in our implementation.

@rossabaker rossabaker added bug and removed testing labels Jan 26, 2020
@rossabaker
Copy link
Member

@rossabaker rossabaker commented Jan 26, 2020

As far as I know, this isn't a bug that has bitten any users in practice, but the change to the code goes beyond the Testing label.

Copy link
Member

@ChristopherDavenport ChristopherDavenport left a comment

Agree we should have A rollover for across chunks

@rossabaker
Copy link
Member

@rossabaker rossabaker commented Jan 27, 2020

Do we want to merge this as is and follow up with another PR for the corner case of the corner case?

@ChristopherDavenport
Copy link
Member

@ChristopherDavenport ChristopherDavenport commented Jan 27, 2020

Yup.

@rossabaker rossabaker merged commit 50d3d32 into http4s:master Jan 27, 2020
2 checks passed
rossabaker added a commit to rossabaker/http4s that referenced this issue Jan 27, 2020
rossabaker pushed a commit to rossabaker/http4s that referenced this issue Apr 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants