Psych fails with MBC strings in ASCII-8BIT #2901

Closed
nirvdrum opened this issue Apr 30, 2015 · 4 comments

@nirvdrum nirvdrum commented Apr 30, 2015

Generally, the JRuby version of psych can handle multibyte character (MBC) strings. However, if the string's encoding is ASCII-8BIT, as it is by default when reading from a socket, JRuby psych is no longer able to parse the YAML.

Simple example:

MRI:

> ruby -v -e 'require "yaml"; p YAML.load("nokogiri: 鋸".force_encoding("ASCII-8BIT"))'
ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
{"nokogiri"=>"鋸"}

JRuby:

> bin/jruby -v -e 'require "yaml"; p YAML.load("nokogiri: 鋸".force_encoding("ASCII-8BIT"))'
jruby 9.0.0.0-SNAPSHOT (2.2.2) 2015-04-30 d34f7e9 Java HotSpot(TM) 64-Bit Server VM 25.45-b02 on 1.8.0_45-b14 +jit [linux-amd64]
Psych::SyntaxError: (<unknown>): 'reader' unacceptable character '�' (0x8B) special characters are not allowed
in "'reader'", position 11 at line 0 column 0
         parse at org/jruby/ext/psych/PsychParser.java:219
  parse_stream at /home/nirvdrum/dev/workspaces/jruby/lib/ruby/stdlib/psych.rb:376
         parse at /home/nirvdrum/dev/workspaces/jruby/lib/ruby/stdlib/psych.rb:324
          load at /home/nirvdrum/dev/workspaces/jruby/lib/ruby/stdlib/psych.rb:251
         <top> at -e:1
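
A minimal sketch of where the 0x8B in the error comes from, plus a possible workaround on affected builds. It assumes the incoming bytes really are UTF-8; `raw`, `kanji_bytes`, and `doc` are illustrative names, not anything from psych. The character 鋸 is the three UTF-8 bytes E9 8B B8, so the byte the reader rejects is the second byte of that character.

```ruby
require "yaml"

# Same bytes a socket read would hand back: valid UTF-8, but tagged ASCII-8BIT.
raw = "nokogiri: 鋸".b

# The trailing character is three UTF-8 bytes; format them for inspection.
kanji_bytes = raw.bytes.last(3).map { |b| format("0x%02X", b) }
p kanji_bytes

# Workaround sketch: relabel (not transcode) the bytes as UTF-8 before parsing.
# Assumption: the sender actually produced UTF-8.
doc = YAML.load(raw.dup.force_encoding(Encoding::UTF_8))
p doc
```

`force_encoding` only changes the encoding tag, so this is safe only when the bytes are known to be UTF-8 already.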
@nirvdrum nirvdrum changed the title Psych fails with MBC in ASCII-8BIT Psych fails with MBC strings in ASCII-8BIT Apr 30, 2015
@headius headius added this to the JRuby 9.0.0.0.rc1 milestone May 5, 2015

@headius headius commented May 5, 2015

Looks like psych defaults to UTF-8 if the given encoding is not Unicode. Fixing.


@headius headius commented May 5, 2015

Fixed in psych proper by defaulting to UTF-8 when the encoding we discover is not Unicode.
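
The fallback described above could be sketched roughly as follows. This is not psych's actual code: `UNICODE_ENCODINGS` and `yaml_input_encoding` are hypothetical names, and the list of accepted encodings (UTF-8, UTF-16LE, UTF-16BE) is an assumption about which encodings count as Unicode for the parser.

```ruby
# Hypothetical helper, not part of psych: pick the encoding to decode YAML input with.
# Assumption: only these three encodings are treated as Unicode by the parser.
UNICODE_ENCODINGS = [Encoding::UTF_8, Encoding::UTF_16LE, Encoding::UTF_16BE].freeze

def yaml_input_encoding(source)
  # Keep a Unicode tag as-is; treat anything else (e.g. ASCII-8BIT) as UTF-8.
  UNICODE_ENCODINGS.include?(source.encoding) ? source.encoding : Encoding::UTF_8
end

binary = "nokogiri: 鋸".b                     # tagged ASCII-8BIT, as from a socket
utf16  = "nokogiri: 1".encode("UTF-16LE")     # already a Unicode encoding

p yaml_input_encoding(binary)
p yaml_input_encoding(utf16)
```

With this rule a binary-tagged string is decoded as UTF-8 instead of being rejected, while input that already carries a Unicode encoding is left alone.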


@enebo enebo commented Jul 1, 2015

@headius So this is fixed yeah?


@headius headius commented Jul 2, 2015

@enebo Yes, it's fixed in the main psych repo, but I'm not sure whether that has been released in a gem for us to depend on yet. @tenderlove Where do we stand on merging JRuby stuff to psych master and pushing a non-preview gem?

@enebo enebo modified the milestones: JRuby 9.0.0.0.rc2, JRuby 9.0.0.0 Jul 9, 2015
@enebo enebo added this to the JRuby 9.0.1.0 milestone Sep 2, 2015