Encoding issue with backticks #2258

Closed
mmustala opened this Issue Dec 2, 2014 · 6 comments


mmustala commented Dec 2, 2014

Hi,

I created a reduced example in which a string interpolated into backticks ends up with the wrong encoding.

filename = "neliö" # Any string with multibyte chars should do
File.write(filename, "content")
`file -b --mime #{filename}`
# => "ERROR: cannot open `neliö' (No such file or directory)\n"

Interpolating the file name into the backticks with #{} is the key. When the file name is written directly into the command, like this

`file -b --mime neliö`

there are no issues.

This relates to thoughtbot/cocaine#72 and thoughtbot/paperclip#1702

Member

enebo commented Dec 2, 2014

@mmustala Thanks for this.

Member

enebo commented Dec 2, 2014

@mmustala I am wondering what mode you are running this in and whether perhaps your repro is incomplete. If I reduce your snippet to:

filename = "neliö"

I reduced it to this because the parser never even makes it to the backtick line. So, I see the error you are mentioning without backticks + interpolation in the script. However, in this case we are correct since MRI 1.9 also raises the same error:

% ./bin/jruby --1.8 snippets/enc1.rb
% ./bin/jruby --1.9 snippets/enc1.rb
SyntaxError: snippets/enc1.rb:2: invalid multibyte char (US-ASCII)
% ./bin/jruby --2.0 snippets/enc1.rb

vs mri:

% mri18 snippets/enc1.rb
% mri19 snippets/enc1.rb
snippets/enc1.rb:1: invalid multibyte char (US-ASCII)
snippets/enc1.rb:1: invalid multibyte char (US-ASCII)
% mri20 snippets/enc1.rb

Mysteriously, 1.9.3 seems to output the syntax error twice, but we basically behave the same.

So I definitely believe we probably have a bug with backticks, interpolation, and encoding, but your repro is not it. Could you take another swipe at a repro? I would love to fix any missing encoding issue we have. A panda dies every time I see someone submit a workaround fix for JRuby :)

Author

mmustala commented Dec 2, 2014

Sorry, I had just taken those lines from my irb. In a file it requires the encoding comment on the first line.

#encoding: utf-8
filename = "neliö" # Any string with multibyte chars should do
File.write(filename, "content")
mime = `file -b --mime #{filename}`
puts mime
# => "ERROR: cannot open `neliö' (No such file or directory)\n"

This should run and reproduce the issue. I'm running JRuby 1.7.16.1 in the default --1.9 mode.
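
A variant of the repro that does not depend on the file utility may make the comparison easier. The sketch below is not part of the original report: it assumes a POSIX shell with printf available, and the helper name roundtrip_through_backticks is hypothetical. It interpolates the string into a backtick command and compares the raw bytes that come back with the bytes that went in, so the check does not depend on how the terminal renders the output.

```ruby
# encoding: utf-8

# Hypothetical helper (not from the original report): interpolates a string
# into a backtick command and returns the bytes the command echoes back.
# Assumes a POSIX shell where printf is available.
def roundtrip_through_backticks(str)
  `printf %s #{str}`.bytes
end

filename = "neliö" # contains the multibyte character "ö"
expected = filename.bytes
actual   = roundtrip_through_backticks(filename)

if actual == expected
  puts "bytes preserved"
else
  puts "bytes changed: expected #{expected.inspect}, got #{actual.inspect}"
end
```

If the interpolated string reaches the shell with different bytes, the second branch prints both byte arrays for comparison.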

Member

enebo commented Dec 3, 2014

AHA! This error message is the result of executing the command with the wrong encoding interpolation. I kept viewing this as an error while interpolating to create the backtick string. Now this makes a lot more sense.

enebo added a commit that referenced this issue Dec 3, 2014

Member

enebo commented Dec 3, 2014

Simple: our dxstr was not supplying the lexer's encoding when constructing the AST node. @mmustala thanks for providing this. We had really thought we were down to a few esoteric encoding problems and not something this massive...
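
To illustrate why a missing encoding matters, here is a rough analogy in plain Ruby. It is not JRuby's parser code. The thread does not say which encoding the node fell back to, so ISO-8859-1 is an assumption chosen only for illustration, and the helper name misread_as_latin1 is hypothetical. The same bytes read under a different encoding no longer spell the same file name.

```ruby
# encoding: utf-8

# Hypothetical helper: reinterprets a string's bytes as ISO-8859-1 (an
# assumed encoding, for illustration only) and transcodes the result to
# UTF-8, so every original byte becomes its own character.
def misread_as_latin1(str)
  str.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

name    = "neliö"
misread = misread_as_latin1(name)

puts name.length      # 5 characters; "ö" is a single character
puts name.bytes.length # 6 bytes; "ö" takes two bytes in UTF-8
puts misread.length   # 6 characters; the two bytes of "ö" became two characters
puts misread == name  # false
```

A command built from the misread string would ask the shell for a file whose name differs from the one that was written, which is consistent with the "No such file or directory" message in the report.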

@enebo enebo closed this Dec 3, 2014

Author

mmustala commented Dec 4, 2014

I remember that when I noticed this, I thought it could not be true: JRuby cannot have this bug. Then I read the source where backticks are handled, and it said something like "It is the user's responsibility to use the correct encoding". So I thought I just needed to change the encoding in my code.

But I'm happy to be able to delete those encoding parts from my code soon.
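
For readers stuck on an affected version, one possible alternative is to avoid the backtick literal altogether. The sketch below is not the workaround used in this thread (the thread does not show it) and has not been verified against JRuby 1.7.16.1. It relies on the array form of IO.popen, which passes arguments to the program without going through a shell. The helper name capture_without_shell is hypothetical, and the example assumes a printf executable is on the PATH.

```ruby
# encoding: utf-8

# Hypothetical helper: runs a program with an explicit argument list and
# returns its standard output. With an Array, IO.popen bypasses the shell,
# so no command string is built by interpolation.
def capture_without_shell(*argv)
  IO.popen(argv, &:read)
end

filename = "neliö"
output = capture_without_shell("printf", "%s", filename)
puts output.bytes == filename.bytes
```

For the command in the report, the call would be capture_without_shell("file", "-b", "--mime", filename). Because the arguments never pass through a shell, this form also sidesteps quoting problems with spaces in file names.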
