Fix flaky decode #3115

mgibowski · 2020-01-25T18:43:22Z

The test was failing when the generated string contained a UTF-8 BOM character (that is why the minimal failing example looked like an empty String).

Last year fs2 started ignoring UTF-8 BOM characters:
typelevel/fs2#1484

So the flakiness of this test is a successful discovery of a difference in behavior between the fs2 and http4s decoders.

Following the example of fs2, I changed the decode function to drop BOM characters and added an example test to verify that.
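For readers unfamiliar with the issue, here is a minimal sketch of the idea on plain byte arrays. It is not the code from this PR (which works on fs2 chunks): `BomSketch` and `decodeUtf8` are hypothetical names, and `skipByteOrderMark` here only borrows the name used in the diff. It assumes nothing beyond the Scala and Java standard libraries.

```scala
import java.nio.charset.StandardCharsets

object BomSketch {
  // The UTF-8 encoding of U+FEFF, the byte order mark.
  val utf8Bom: Array[Byte] = Array(0xEF.toByte, 0xBB.toByte, 0xBF.toByte)

  // Illustrative helper: drops a leading UTF-8 BOM if one is present.
  def skipByteOrderMark(bytes: Array[Byte]): Array[Byte] =
    if (bytes.startsWith(utf8Bom)) bytes.drop(utf8Bom.length) else bytes

  // Decodes after stripping the BOM. Without the strip, the JVM decoder
  // keeps U+FEFF as an invisible character, so the result looks empty.
  def decodeUtf8(bytes: Array[Byte]): String =
    new String(skipByteOrderMark(bytes), StandardCharsets.UTF_8)
}
```

Decoding the three BOM bytes directly yields a one-character string that prints as nothing, which matches the "empty-looking" failing example described above.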

rossabaker

Great find! I've been meaning to clean this up and offer it to fs2, because it's not HTTP, but I haven't gotten around to it. This is a good step forward.

rossabaker · 2020-01-25T20:02:26Z

core/src/main/scala/org/http4s/util/package.scala

@@ -24,7 +26,8 @@ package object util {
          else Pull.output1(outputString).as(None)
        case Some((chunk, stream)) =>
          if (chunk.nonEmpty) {
-            val bytes = chunk.toArray
+            val chunkWithoutBom = skipByteOrderMark(chunk)


This is probably fast enough as is, but do we only need to check this on the first chunk? And is it incorrect if we drop it in the middle of a stream?

mgibowski · 2020-01-27

Yes, checking only the first chunk is enough.

According to Wikipedia:

its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8, or that it was converted to UTF-8 from a stream that contained an optional BOM.

So in both implementations (here and in fs2), if the UTF-8 BOM is not at the very beginning of the stream, it will not be dropped and will remain in the result.

So we could optimize it, possibly by introducing a var to mark the first chunk, but hopefully that is not necessary.
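To make the "first chunk only" behavior concrete, here is a rough sketch that models the stream as a `List[Array[Byte]]` instead of an fs2 stream. `FirstChunkSketch` and `dropLeadingBom` are hypothetical names, and the sketch assumes the BOM, if present, sits entirely inside the first chunk.

```scala
object FirstChunkSketch {
  private val utf8Bom: Array[Byte] =
    Array(0xEF.toByte, 0xBB.toByte, 0xBF.toByte)

  // Strips a BOM from the first chunk only. Every later chunk passes
  // through untouched, so a BOM in the middle of the stream is kept.
  // Assumption: a leading BOM never straddles the first chunk boundary.
  def dropLeadingBom(chunks: List[Array[Byte]]): List[Array[Byte]] =
    chunks match {
      case first :: rest if first.startsWith(utf8Bom) =>
        first.drop(utf8Bom.length) :: rest
      case other => other
    }
}
```

Matching on the head of the list plays the role of the "first chunk" marker, so no mutable flag is needed in this simplified model.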

rossabaker · 2020-01-25T20:02:58Z

tests/src/test/scala/org/http4s/util/DecodeSpec.scala

@@ -55,5 +55,11 @@ class DecodeSpec extends Http4sSpec {
        decoded must_== expected
      }
    }
+
+    "drop Byte Order Mark" in {


Probably exceedingly rare, but what if the BOM spans chunks?

mgibowski · 2020-01-27

Given that we exclude the case of it appearing in the middle of the stream, I don't think it would ever happen, as we use unconsN. It might be more likely in the fs2 implementation, where they use uncons1.

mgibowski · 2020-01-27

Having written this, it became clear to me that checking chunk.size >= 3 is actually unnecessary.
I took it from fs2, but because of unconsN it is redundant in our implementation.
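For the case where the BOM does span chunks, one possible approach is to buffer input until at least three bytes are available (or the input ends) before checking. The sketch below again models the stream as a `List[Array[Byte]]`; `SpanningBomSketch` and `dropLeadingBom` are hypothetical names, and this is not claimed to be what http4s or fs2 do.

```scala
import scala.annotation.tailrec

object SpanningBomSketch {
  private val utf8Bom: Array[Byte] =
    Array(0xEF.toByte, 0xBB.toByte, 0xBF.toByte)

  // Accumulates leading chunks until the buffer holds at least three
  // bytes or the input is exhausted, then strips a BOM from the buffer.
  def dropLeadingBom(chunks: List[Array[Byte]]): List[Array[Byte]] = {
    @tailrec
    def go(buffered: Array[Byte], remaining: List[Array[Byte]]): List[Array[Byte]] =
      remaining match {
        case head :: tail if buffered.length < utf8Bom.length =>
          go(buffered ++ head, tail)
        case _ =>
          val stripped =
            if (buffered.startsWith(utf8Bom)) buffered.drop(utf8Bom.length)
            else buffered
          // Avoid emitting an empty chunk when the buffer was only a BOM.
          if (stripped.isEmpty) remaining else stripped :: remaining
      }
    go(Array.emptyByteArray, chunks)
  }
}
```

Only the start of the stream is buffered, so the extra cost is bounded by a few small array copies regardless of stream length.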

rossabaker · 2020-01-26T15:38:11Z

As far as I know, this isn't a bug that has bitten any users in practice, but the change to the code goes beyond the Testing label.

ChristopherDavenport

Agreed, we should have a rollover for a BOM that spans chunks.

rossabaker · 2020-01-27T18:18:56Z

Do we want to merge this as is and follow up with another PR for the corner case of the corner case?

ChristopherDavenport · 2020-01-27T19:24:22Z

Yup.

mgibowski added 2 commits January 25, 2020 18:28

Drop UTF-8 BOM when decoding (just like fs2)

1b36293

Test dropping UTF-8 BOM

df26f73

rossabaker added the testing Issues related to tests label Jan 25, 2020

rossabaker reviewed Jan 25, 2020

rossabaker added bug Determined to be a bug in http4s and removed testing Issues related to tests labels Jan 26, 2020

ChristopherDavenport approved these changes Jan 26, 2020

rossabaker merged commit 50d3d32 into http4s:master Jan 27, 2020

rossabaker mentioned this pull request Jan 27, 2020

Deal with corner cases of BOM handling in decode #3122

Open

rossabaker added a commit to rossabaker/http4s that referenced this pull request Jan 27, 2020

[ci-skip] Add http4s#3115 to changelog

a29204a

rossabaker pushed a commit to rossabaker/http4s that referenced this pull request Apr 3, 2020

Backport http4s#3115: Drop UTF-8 BOM when decoding (just like fs2)

3d31e39
