Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More Efficient Ember Encoding #3546

Merged

Conversation

ChristopherDavenport
Copy link
Member

@ChristopherDavenport ChristopherDavenport commented Jun 28, 2020

  • It was previously UTF-8, we've switched to ASCII to match the gist, is that the correct encoding?

Copy link
Member

@rossabaker rossabaker left a comment

👍, with caution

}
// Final CRLF terminates headers and signals body to follow.
stringBuilder.append(CRLF)
stringBuilder.toString.getBytes(StandardCharsets.US_ASCII)
Copy link
Member

@rossabaker rossabaker Jun 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this (and line 76) can potentially throw, if there was something naughty in the model. Our model constructors guard against a lot of these, but they're not foolproof yet.

Copy link
Member

@rossabaker rossabaker Jun 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, there is an old tradition of smuggling ISO-8859-1 through this, and it achieves that 1:1 mapping between bytes and chars as ASCII. Perhaps that is the better choice.

Copy link
Member

@rossabaker rossabaker Jun 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC says:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

@rossabaker rossabaker added the enhancement Feature requests and improvements label Jun 29, 2020
@rossabaker
Copy link
Member

rossabaker commented Jun 29, 2020

Are we keeping this off 0.21 because it changes undefined (by us or spec) behavior? It looks binary compatible, and I wouldn't necessarily mind backporting it.

@ChristopherDavenport ChristopherDavenport changed the base branch from master to series/0.21 Jun 29, 2020
@ChristopherDavenport
Copy link
Member Author

ChristopherDavenport commented Jul 1, 2020

Updated Charset

@rossabaker rossabaker added this to the 0.21.7 milestone Jul 1, 2020
@rossabaker rossabaker merged commit 1ad8940 into http4s:series/0.21 Jul 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests and improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants