Psych fails with MBC strings in ASCII-8BIT #2901

Closed
nirvdrum opened this Issue Apr 30, 2015 · 4 comments

Contributor

nirvdrum commented Apr 30, 2015

Generally, the JRuby version of psych can handle multibyte character (MBC) strings. However, if the string's encoding is ASCII-8BIT, as it is by default when reading from a socket, JRuby's psych can no longer parse the YAML.

Simple example:

MRI:

> ruby -v -e 'require "yaml"; p YAML.load("nokogiri: 鋸".force_encoding("ASCII-8BIT"))'
ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
{"nokogiri"=>"鋸"}

JRuby:

> bin/jruby -v -e 'require "yaml"; p YAML.load("nokogiri: 鋸".force_encoding("ASCII-8BIT"))'
jruby 9.0.0.0-SNAPSHOT (2.2.2) 2015-04-30 d34f7e9 Java HotSpot(TM) 64-Bit Server VM 25.45-b02 on 1.8.0_45-b14 +jit [linux-amd64]
Psych::SyntaxError: (<unknown>): 'reader' unacceptable character '�' (0x8B) special characters are not allowed
in "'reader'", position 11 at line 0 column 0
         parse at org/jruby/ext/psych/PsychParser.java:219
  parse_stream at /home/nirvdrum/dev/workspaces/jruby/lib/ruby/stdlib/psych.rb:376
         parse at /home/nirvdrum/dev/workspaces/jruby/lib/ruby/stdlib/psych.rb:324
          load at /home/nirvdrum/dev/workspaces/jruby/lib/ruby/stdlib/psych.rb:251
         <top> at -e:1
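
A possible caller-side workaround, sketched here for illustration only: re-tag the binary string as UTF-8 before handing it to YAML.load, provided its bytes are valid UTF-8. The helper name `retag_as_utf8` is hypothetical and not part of psych; the sketch assumes the data really is UTF-8, which ASCII-8BIT by itself does not guarantee.

```ruby
require "yaml"

# Same payload as in the report ("\u92F8" is the character after "nokogiri: "),
# tagged as binary the way data read from a socket would be.
raw = "nokogiri: \u92F8".dup.force_encoding(Encoding::ASCII_8BIT)

# Hypothetical helper (not part of psych): re-tag binary input as UTF-8
# when its bytes form valid UTF-8, otherwise return the string unchanged.
# force_encoding only changes the encoding tag, never the bytes.
def retag_as_utf8(str)
  candidate = str.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : str
end

doc = YAML.load(retag_as_utf8(raw))
p doc
```

If the bytes are not valid UTF-8, the helper returns the input untouched, so the parser sees exactly what it would have seen without the workaround.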

nirvdrum changed the title from "Psych fails with MBC in ASCII-8BIT" to "Psych fails with MBC strings in ASCII-8BIT" on Apr 30, 2015

nirvdrum added the JRuby 9000 label May 4, 2015

headius added this to the JRuby 9.0.0.0.rc1 milestone May 5, 2015

Member

headius commented May 5, 2015

Looks like psych defaults to UTF-8 if the given encoding is not Unicode. Fixing.

Member

headius commented May 5, 2015

Fixed in psych proper by defaulting to UTF-8 when the encoding we discover is not Unicode.
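
The fallback rule described in that comment can be sketched in Ruby as follows. This is only an illustration of the idea, not the actual change: the real fix is in psych's JRuby extension, and both the helper name `parser_encoding_for` and the exact list of encodings treated as Unicode are assumptions made for this sketch.

```ruby
# Encodings treated as Unicode for the purposes of this sketch (assumption).
UNICODE_ENCODINGS = [
  Encoding::UTF_8,
  Encoding::UTF_16LE,
  Encoding::UTF_16BE
].freeze

# Hypothetical helper, not psych's API: choose the encoding used to decode
# the input, falling back to UTF-8 for anything that is not Unicode.
def parser_encoding_for(str)
  UNICODE_ENCODINGS.include?(str.encoding) ? str.encoding : Encoding::UTF_8
end

binary = "nokogiri: \u92F8".dup.force_encoding(Encoding::ASCII_8BIT)
puts parser_encoding_for(binary)                             # binary input falls back to UTF-8
puts parser_encoding_for("a: b".encode(Encoding::UTF_16LE))  # Unicode input keeps its own encoding
```

Under this rule an ASCII-8BIT string is decoded as UTF-8, which matches the MRI behaviour shown in the original report.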

Member

enebo commented Jul 1, 2015

@headius So this is fixed, yeah?

Member

headius commented Jul 2, 2015

@enebo Yes, it's fixed in the main psych repo. I'm not sure whether that has been released in a gem for us to depend on yet. @tenderlove Where do we stand on merging JRuby stuff to psych master and pushing a non-preview gem?

enebo modified the milestones: JRuby 9.0.0.0.rc2, JRuby 9.0.0.0 Jul 9, 2015

enebo added this to the JRuby 9.0.1.0 milestone Sep 2, 2015
