Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Regression: Encoding error saving YAML with Norwegian characters #1204

Closed
donv opened this Issue · 18 comments

4 participants

@donv
Collaborator

Hi!

Given the following example using JRuby 1.7.6 or jruby-1_7 head, ActiveRecord 3.2.15, and jdbc-postgres 1.3.2:

# encoding: utf-8
require 'rubygems'
require 'activerecord-jdbc-adapter'

ActiveRecord::Base.establish_connection :adapter => 'postgresql',
                                        :database => 'encoding_test'

ActiveRecord::Base.connection.execute <<ddl
DROP TABLE IF EXISTS my_models
ddl

ActiveRecord::Base.connection.execute <<ddl
CREATE TABLE my_models(
  id SERIAL NOT NULL PRIMARY KEY,
  comment VARCHAR(32)
)
ddl

class MyModel < ActiveRecord::Base ; end
MyModel.create! :comment => 'Så sære søtsaker?! (ÆØÅ)'.to_yaml

I get the following error:

moterom-pc:admin uwe$ jruby encoding_test.rb 
ArgumentError: invalid byte sequence in US-ASCII
                                              =~ at org/jruby/RubyRegexp.java:1676
                                     exec_insert at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-jdbc-adapter-1.3.2/lib/arjdbc/postgresql/adapter.rb:538
                                          insert at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/connection_adapters/abstract/database_statements.rb:90
                                          insert at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/connection_adapters/abstract/query_cache.rb:19
                                          insert at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/relation.rb:66
                                          create at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/persistence.rb:367
                                          create at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/timestamp.rb:58
                                          create at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/callbacks.rb:268
  _run__1053631033__create__429877854__callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:406
                                  __run_callback at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:405
                           _run_create_callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:390
                                   run_callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:81
                                          create at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/callbacks.rb:268
                                create_or_update at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/persistence.rb:348
                                create_or_update at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/callbacks.rb:264
    _run__1053631033__save__429877854__callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:406
                                  __run_callback at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:405
                             _run_save_callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:390
                                   run_callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:81
                                create_or_update at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/callbacks.rb:264
                                           save! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/persistence.rb:104
                                           save! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/validations.rb:56
                                           save! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/attribute_methods/dirty.rb:33
                                           save! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/transactions.rb:264
               with_transaction_returning_status at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/transactions.rb:313
                                     transaction at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/connection_adapters/abstract/database_statements.rb:192
                                     transaction at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/transactions.rb:208
               with_transaction_returning_status at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/transactions.rb:311
                                           save! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/transactions.rb:264
                                         create! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/validations.rb:41
                                          (root) at encoding_test.rb:20
moterom-pc:admin uwe$ 

Reverting to JRuby 1.7.4 works. Saving the string without converting to YAML also works.

This is pretty serious and blocks adoption of JRuby 1.7.6.

@ratnikov

Seems to me that the culprit is (sql.is_a?(String) && sql =~ /RETURNING "?\S+"?$/) part of https://github.com/jruby/activerecord-jdbc-adapter/blob/v1.3.2/lib/arjdbc/postgresql/adapter.rb#L538

Perhaps you can try to output the string and provide an easier reproduction case?

@donv
Collaborator

The string looks normal and has encoding utf-8. Using a literal string with the same content works fine.

@donv
Collaborator

Hi @ratnikov !

Have you verified the failing example? Does it fail for you? When I dump the yaml string, it looks completely normal, but if I generate the same string not using "to_yaml", it works fine.

@ratnikov

Seems to work for me:

ratnikov@ratnikov:~/crap$ ruby --version
jruby 1.7.6 (1.9.3p392) 2013-10-22 6004147 on OpenJDK Server VM 1.7.0-google-v5-54862216-54132845 [linux-i386]
ratnikov@ratnikov:~/crap$ cd regexp/
ratnikov@ratnikov:~/crap/regexp$ cat test.rb

encoding: utf-8

require 'rubygems'
require 'activerecord-jdbc-adapter'

ActiveRecord::Base.establish_connection :adapter => 'postgresql',
:database => 'encoding_test',
:username => 'encoding_test',
:password => 'secret'

ActiveRecord::Base.connection.execute <<ddl
DROP TABLE IF EXISTS my_models
ddl

ActiveRecord::Base.connection.execute <<ddl
CREATE TABLE my_models(
id SERIAL NOT NULL PRIMARY KEY,
comment VARCHAR(32)
)
ddl

class MyModel < ActiveRecord::Base ; end
MyModel.create! :comment => 'Så sære søtsaker?! (ÆØÅ)'.to_yaml

puts "models: #{MyModel.all.inspect}"
ratnikov@ratnikov:~/crap/regexp$ ruby test.rb
models: #]>
ratnikov@ratnikov:~/crap/regexp$

@donv
Collaborator

Thanks for testing. It is failing on three separate systems here.

@ratnikov

What's your ruby --version ?

@donv
Collaborator
$ jruby --version
jruby 1.7.6 (1.9.3p392) 2013-10-22 6004147 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18 [darwin-x86_64]

$ gem --version
2.1.9

activerecord (3.2.15)
activerecord-jdbc-adapter (1.3.2)
activerecord-jdbcpostgresql-adapter (1.3.2)
jdbc-postgres (9.2.1004)
@ratnikov
@ratnikov

Here's a condensed case (although I did set it up as unit test):

# encoding: utf-8
require 'test/unit'

class EncodingTest < Test::Unit::TestCase
  def test_psych_encoding
    File.open('ascii.rb', 'w') do |io|
      io.write <<-CODE
# encoding: US-ASCII

module Foo
  extend self
  def foo(value)
    "'\#{value}'".encoding
  end
end
      CODE
    end

    require 'ascii.rb'
    require 'psych'
    str = Psych.dump("'Så sære søtsaker?! (ÆØÅ)'")

    assert_equal Encoding.find("UTF-8"), Foo.foo(str), "Should still be utf, right?"
  end
end

Seems like due to Foo being written as ASCII the mixing of ASCII and UTF doesn't work very well. In the original example, the "'#{}'" comes from quoting the value before sending it off to the insert query.

@headius
Owner

Appears to be a problem with how it negotiates the final encoding for a US-ASCII string getting UTF-8 content interpolated into it.

@headius
Owner

This is probably a bug in cat19 logic. The dstr logic for string interpolation should be calling cat19 to properly negotiate character encodings while building the final string, and the final encoding here should be UTF-8 since US-ASCII can't hold those high-bit characters.

It's also possible, since psych is involved, that our psych impl is not properly setting some encoding detail on "str" coming out of #dump. If it's leaving the code range set to "7bit" that could caust cat19 to assume the string is all 7-bit characters rather than looking at encoding, and it would then shove them all together with US-ASCII encoding.

@ratnikov

JRuby.reference(value).getCodeRange reports 32 in both places, so this seems more likely the interpolation issues (although cannot rule out psych issue 100%, since it's possible that JRuby.reference does something funky and reports a different codeRange that will eventually be used. Although interpolating JRuby.reference(value) also yields ASCII encoding in the end as well).

@enebo
Owner

I just started to look at this one. Psych must be creating this string wrong. I can see the string is marked as 7bit and it clearly is not a 7bit string. If I hack it to unknown or valid then this works fine (it should be marked valid imho).

@enebo
Owner

AHA...found it. StringIO.write is unconditionally calling 1.8 impl of write which means it is not marking the internal string to unknown each time it adds more to the internal string. So it starts CR7 empty and then when junk (and stuff) is added it just still thinks it is CR.

@enebo enebo referenced this issue from a commit
@enebo enebo Related to #1204. Make Psych emit strings in proper encoding. IOOutpu…
…tStream can now explicitly specify what encoding it wants to write strings with
22e8c69
@enebo
Owner

Ok this last commit does not solve the problem but it definitely helps. YAML was emitting strings all as ASCII all the time. This now lets the emit match the proper UTF encoding it needs. StringIO is still broken however. My previous comment about StringIO thinking it is 7bit and no longer being it is still the issue.

@donv
Collaborator

Congratz on the 1.7.7 release! Do you think a fix for this will make it into 1.7.8? It is a blocker for us, keeping us on 1.7.4.

@headius headius closed this issue from a commit
@headius headius Re-port StringIO and methods required for it.
This incorporates missing flags, string modification, and encoding
logic into StringIO by doing a line-by-line behavior port from
MRI.

Fixes #1204.
Fixes #1242.
61014cf
@headius headius closed this in 61014cf
@headius
Owner

This is now fixed, but we need a test for it. Can you put one together, @donv or @enebo?

@donv
Collaborator

I have verified the fix in our production test system. It looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.