Psych YAML parser cannot parse uppercase ÄÖÜ, but can parse lowercase äöü #483

Closed
spider-network opened this Issue · 8 comments

5 participants

@spider-network

My Env

java -version
java version "1.7.0_09"
OpenJDK Runtime Environment (IcedTea7 2.3.3) (7u9-2.3.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
ruby -v
jruby 1.7.1 (1.9.3p327) 2012-12-03 30a153b on OpenJDK 64-Bit Server VM 1.7.0_09-b30 [linux-amd64]

It also fails with JRuby 1.7.2, but it works with regular MRI ;-(

How to reproduce the bug

test.yml

de:
  test: "Ä"

test.rb

require 'yaml'

YAML.parse(open("test.yml").read)

puts "Done"

Error

ruby test.rb
Psych::SyntaxError: (<unknown>): 'reader' unacceptable character '?' (0x84) special characters are not allowed
in "'reader'", position 14 at line 0 column 0
         parse at org/jruby/ext/psych/PsychParser.java:225
  parse_stream at /home/vagrant/.rvm/rubies/jruby-1.7.2/lib/ruby/1.9/psych.rb:205
         parse at /home/vagrant/.rvm/rubies/jruby-1.7.2/lib/ruby/1.9/psych.rb:153
        (root) at test.rb:3

Does anyone know a workaround?

@spider-network

On my Mac I have the Oracle Java version and it works there, but it does not work on my Ubuntu server with OpenJDK ;-(

java -version
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06-434-11M3909)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01-434, mixed mode)

@BanzaiMan
Owner

Can you check your system encoding (e.g., locale)? Also, try enforcing UTF-8 in test.rb.

@enebo
Owner

I am a little confused about how this should work. File.open.read will read the file with a particular encoding, and YAML expects it to be UTF-8 or one of the two UTF-16 variants. So what happens if your default encoding is not UTF-* on the read? If it is ASCII or ASCII-8BIT, and the bytes of the accented characters do not end up as valid UTF-* characters, then you would see this error. I guess that could explain the error if, as Hiro suggests, your encoding is not UTF-8 on Ubuntu (LANG is also worth checking).
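To make the check concrete, here is a minimal sketch of how one might inspect the encodings Ruby assumes and read the file as UTF-8 explicitly instead of relying on the locale. The helper names `encoding_report` and `load_yaml_utf8` are hypothetical and not part of JRuby or Psych; whether this sidesteps the reported error on the affected machine is untested.

```ruby
require 'yaml'

# Hypothetical helper: show which encodings Ruby assumes when none is given.
def encoding_report
  internal = Encoding.default_internal
  {
    default_external: Encoding.default_external.name,
    default_internal: internal ? internal.name : nil
  }
end

# Hypothetical helper: read a YAML file as UTF-8 regardless of the locale,
# and fail early if the bytes are not valid UTF-8.
def load_yaml_utf8(path)
  text = File.open(path, 'r:UTF-8') { |f| f.read }
  raise ArgumentError, "#{path} is not valid UTF-8" unless text.valid_encoding?
  YAML.load(text)
end

p encoding_report
```

Calling `load_yaml_utf8("test.yml")` in place of `YAML.parse(open("test.yml").read)` would take the process-wide default out of the picture, which should help narrow down whether the locale is involved.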

@spider-network

locale

vagrant@precise64:~$ locale
LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=en_US

It also does not work when enforcing UTF-8:

vagrant@precise64:~/metrigo/jruby-bug/yaml$ ruby -v
jruby 1.7.1 (1.9.3p327) 2012-12-03 30a153b on OpenJDK 64-Bit Server VM 1.7.0_09-b30 [linux-amd64]
vagrant@precise64:~/metrigo/jruby-bug/yaml$ cat test.rb
# encoding: UTF-8
require 'yaml'

YAML.parse(open("test.yml").read)

puts "Done"
vagrant@precise64:~/metrigo/jruby-bug/yaml$
vagrant@precise64:~/metrigo/jruby-bug/yaml$ ruby test.rb
Psych::SyntaxError: (<unknown>): 'reader' unacceptable character '?' (0x84) special characters are not allowed
in "'reader'", position 14 at line 0 column 0
         parse at org/jruby/ext/psych/PsychParser.java:225
  parse_stream at /home/vagrant/.rvm/rubies/jruby-1.7.1/lib/ruby/1.9/psych.rb:205
         parse at /home/vagrant/.rvm/rubies/jruby-1.7.1/lib/ruby/1.9/psych.rb:153
        (root) at test.rb:4

@BanzaiMan
Owner

I still can't reproduce this. There is something else at play. Could you try without RVM?

$ ./bin/jruby -v
jruby 1.7.1 (1.9.3p327) 2012-12-03 30a153b on OpenJDK 64-Bit Server VM 1.7.0_09-b30 [linux-amd64]
$ cat test.yml 
de:
  test: "Ä"
$ cat test.rb
require 'yaml'

YAML.parse(open("test.yml").read)

puts "Done"
$ ./bin/jruby test.rb
Done
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ uname -a
Linux ip-10-196-35-92 3.2.0-31-virtual #50-Ubuntu SMP Fri Sep 7 16:36:36 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

@edzhelyov

I have a similar problem, but with translations (YAML files) from a 3rd party service. The file is downloaded, and the characters appear as <80>, <88> when I open it with vim.

I tried to reproduce the test case from the author and everything works for me, so I suspect that maybe it depends on the source encoding of the YAML file...

@edzhelyov

My problem is related to the encoding that RestClient (possibly Net::HTTP) sets on file attachments. In my case CRuby (1.9.3-p374) behaves differently from JRuby: CRuby sets the response.body encoding to UTF-8, while in JRuby it is "ASCII-8BIT".
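If the downloaded bytes really are UTF-8 and only the label differs, one possible workaround is to relabel the body before parsing. This is a sketch under that assumption; `as_utf8` is a hypothetical helper, and `raw` merely simulates an ASCII-8BIT response body rather than using RestClient itself.

```ruby
require 'yaml'

# Hypothetical helper: relabel a binary (ASCII-8BIT) string as UTF-8.
# The bytes are not changed; only the encoding tag on the copy is.
def as_utf8(body)
  text = body.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, 'body is not valid UTF-8' unless text.valid_encoding?
  text
end

# Simulated response body: UTF-8 bytes tagged as ASCII-8BIT.
raw = "de:\n  test: \"\xC3\x84\"\n".b

puts raw.encoding           # the binary label before relabeling
puts as_utf8(raw).encoding  # the label after relabeling
p YAML.load(as_utf8(raw))
```

The `valid_encoding?` guard matters here: `force_encoding` never fails on its own, so without the check a body in some other encoding would be passed to the parser silently mislabeled.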

Now I'm unsure whether this is actually a bug and in which cases it happens, as I don't understand how Net::HTTP should behave when downloading files, and I'm not aware of what the HTTP specification says about encoding in these cases.

I couldn't reproduce the exact behavior against a public domain, and it seems to happen only with specific server responses. So in some cases, unknown to me, the server can specify the file encoding, and CRuby acknowledges it.

If someone knows more on this subject I'm happy to discuss it further, so I can isolate a specific case.

@headius
Owner

I think there's an encoding mismatch at play here. None of us can reproduce this, and one commenter theorized it could be an issue with a badly-encoded YAML source. When the file is encoded as UTF-8 and read as UTF-8, it appears to parse just fine.
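One way a mismatch could produce the uppercase/lowercase asymmetry in the title, offered only as an unconfirmed hypothesis: if the UTF-8 bytes were decoded as a single-byte encoding such as ISO-8859-1, the second byte of "Ä" (0xC3 0x84) would become the C1 control character 0x84, which YAML's printable character set excludes, while the second byte of "ä" (0xC3 0xA4) would become a printable character. That would also be consistent with the reported "position 14" and "(0x84)". The sketch below checks only the byte arithmetic; both helper names are hypothetical.

```ruby
# YAML's printable set excludes the C1 controls 0x80-0x9F, except 0x85 (NEL).
def yaml_rejected_c1?(code_point)
  code_point.between?(0x80, 0x9F) && code_point != 0x85
end

# Assumption: the UTF-8 bytes are decoded as ISO-8859-1, where each byte
# becomes the code point with the same numeric value.
def rejected_when_read_as_latin1(utf8_string)
  utf8_string.bytes.select { |byte| yaml_rejected_c1?(byte) }
end

# The characters Ä Ö Ü ä ö ü, written as escapes so the example does not
# depend on the encoding of this source file.
["\u00C4", "\u00D6", "\u00DC", "\u00E4", "\u00F6", "\u00FC"].each do |char|
  rejected = rejected_when_read_as_latin1(char).map { |b| format('0x%02X', b) }
  summary = rejected.empty? ? 'no rejected bytes' : rejected.join(', ')
  puts "#{char.inspect}: #{summary}"
end
```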

If you can find a way for us to reproduce this, feel free to reopen.

@headius headius closed this