Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support unicode byte order mark in config.ru #1252

Merged
merged 1 commit into from Jun 3, 2018

Conversation

@mikegee
Copy link
Contributor

mikegee commented Apr 16, 2018

fixes #1021

@mikegee mikegee force-pushed the mikegee:fix-1021-unicode-bom-in-configru branch from 9b0ca89 to 7583330 Apr 16, 2018
@@ -34,7 +34,7 @@ class Builder
def self.parse_file(config, opts = Server::Options.new)
options = {}
if config =~ /\.ru$/
cfgfile = ::File.read(config)
cfgfile = ::File.read(config, mode: 'r:bom|utf-8')

This comment has been minimized.

Copy link
@jeremy

jeremy Apr 17, 2018

Member

Does this introduce a unstated requirement that config.ru is UTF-8?

This comment has been minimized.

Copy link
@mikegee

mikegee Apr 17, 2018

Author Contributor

Seems that way, doesn’t it? I’ll concoct a utf-16 config.ru or perhaps another encoding and see what happens.

Thanks for giving this PR a look!

This comment has been minimized.

Copy link
@mikegee

mikegee Apr 22, 2018

Author Contributor

This change does not introduce a new requirement. Instead it makes a previous requirement explicit.

When not told the encoding of the file, ::File.read assumes the content is UTF-8 and returns a string encoded as UTF-8 even when file is encoded as UTF-16. This fails because when the string is evaled, the first null byte is interpreted as EOF, which leads to a SyntaxError.

I'm not sure it's possible to infer the encoding of the config.ru file by looking at it, in general. I believe it is preferable to raise a bit earlier with a more reasonable error, as my proposed change would.

tested with Ruby 2.5.1

This comment has been minimized.

Copy link
@manveru

manveru Apr 22, 2018

Member

Sorry to disappoint, but ::File.read doesn't assume UTF-8 unless your LC_CTYPE environment variable or Encoding.default_external say so. So I don't think that this is backwards compatible.

This comment has been minimized.

Copy link
@mikegee

mikegee Apr 22, 2018

Author Contributor

Thanks for the clarification on that point, @manveru.

Unfortunately, changing my Encoding.default_external to match the content of my config.ru only leads to different problems.

First, the simple call to File.read fails with "ArgumentError: ASCII incompatible encoding needs binmode".

Second, File.read with mode: 'rb', doesn't assume the content matches my default external encoding and provides ASCII-8BIT instead. (In this case, eval raises SyntaxError as my comment above.)

Finally, if I explicitly declare the content type for File.read with mode: 'rb:utf-16le', it fails to successfully concatenate with the literal strings in the next method.
Encoding::CompatibilityError: incompatible character encodings: UTF-8 and UTF-16LE

Additional pointers on these issues are welcome.

This comment has been minimized.

Copy link
@manveru

manveru Apr 22, 2018

Member

I don't think we should support non-ASCII compatible encodings for source code. There is no way to set LC_* to UTF-16.
Also try to run the following file:

# Encoding: UTF-16LE
p 'hi'.encoding

And see that Ruby fails with UTF-16LE is not ASCII compatible (ArgumentError).

This comment has been minimized.

Copy link
@mikegee

mikegee Apr 23, 2018

Author Contributor

Thanks again, @manveru. I think I've got it now.

I was able to create a CP1252 config.ru with some accented characters and it worked as expected. Assuming config.ru is UTF-8 would be a breaking change.


Regarding the issue this PR is meant to solve, how do you feel about a solution like:

  • if the string encoding is UTF-8 after reading the file,
  • strip the BOM with String methods or a regexp

?

@mikegee mikegee force-pushed the mikegee:fix-1021-unicode-bom-in-configru branch from 7583330 to ece8625 Apr 29, 2018
@mikegee

This comment has been minimized.

Copy link
Contributor Author

mikegee commented Jun 1, 2018

@jeremy I addressed your feedback. Would you have a moment to check this out again?

@jeremy
jeremy approved these changes Jun 3, 2018
Copy link
Member

jeremy left a comment

Nice work @mikegee! Thanks for contributing.

@jeremy jeremy merged commit eac0f31 into rack:master Jun 3, 2018
5 checks passed
5 checks passed
ci/circleci: test-ruby-2.2 Your tests passed on CircleCI!
Details
ci/circleci: test-ruby-2.3 Your tests passed on CircleCI!
Details
ci/circleci: test-ruby-2.4 Your tests passed on CircleCI!
Details
ci/circleci: test-ruby-2.5 Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
jeremy added a commit that referenced this pull request Jun 3, 2018
@mikegee mikegee deleted the mikegee:fix-1021-unicode-bom-in-configru branch Jun 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.