Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: Encoding error saving YAML with Norwegian characters #1204

Closed
donv opened this Issue Nov 7, 2013 · 18 comments

Comments

Projects
None yet
4 participants
@donv
Copy link
Member

donv commented Nov 7, 2013

Hi!

Given the following example using JRuby 1.7.6 or jruby-1_7 head, ActiveRecord 3.2.15, and jdbc-postgres 1.3.2:

# encoding: utf-8
require 'rubygems'
require 'activerecord-jdbc-adapter'

ActiveRecord::Base.establish_connection :adapter => 'postgresql',
                                        :database => 'encoding_test'

ActiveRecord::Base.connection.execute <<ddl
DROP TABLE IF EXISTS my_models
ddl

ActiveRecord::Base.connection.execute <<ddl
CREATE TABLE my_models(
  id SERIAL NOT NULL PRIMARY KEY,
  comment VARCHAR(32)
)
ddl

class MyModel < ActiveRecord::Base ; end
MyModel.create! :comment => 'Så sære søtsaker?! (ÆØÅ)'.to_yaml

I get the following error:

moterom-pc:admin uwe$ jruby encoding_test.rb 
ArgumentError: invalid byte sequence in US-ASCII
                                              =~ at org/jruby/RubyRegexp.java:1676
                                     exec_insert at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-jdbc-adapter-1.3.2/lib/arjdbc/postgresql/adapter.rb:538
                                          insert at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/connection_adapters/abstract/database_statements.rb:90
                                          insert at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/connection_adapters/abstract/query_cache.rb:19
                                          insert at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/relation.rb:66
                                          create at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/persistence.rb:367
                                          create at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/timestamp.rb:58
                                          create at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/callbacks.rb:268
  _run__1053631033__create__429877854__callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:406
                                  __run_callback at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:405
                           _run_create_callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:390
                                   run_callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:81
                                          create at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/callbacks.rb:268
                                create_or_update at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/persistence.rb:348
                                create_or_update at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/callbacks.rb:264
    _run__1053631033__save__429877854__callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:406
                                  __run_callback at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:405
                             _run_save_callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:390
                                   run_callbacks at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activesupport-3.2.15/lib/active_support/callbacks.rb:81
                                create_or_update at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/callbacks.rb:264
                                           save! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/persistence.rb:104
                                           save! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/validations.rb:56
                                           save! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/attribute_methods/dirty.rb:33
                                           save! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/transactions.rb:264
               with_transaction_returning_status at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/transactions.rb:313
                                     transaction at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/connection_adapters/abstract/database_statements.rb:192
                                     transaction at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/transactions.rb:208
               with_transaction_returning_status at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/transactions.rb:311
                                           save! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/transactions.rb:264
                                         create! at /Users/uwe/workspace/aifudis/admin/platform/jruby-1.7.6/lib/ruby/gems/shared/gems/activerecord-3.2.15/lib/active_record/validations.rb:41
                                          (root) at encoding_test.rb:20
moterom-pc:admin uwe$ 

Reverting to JRuby 1.7.4 works. Saving the string without converting to YAML also works.

This is pretty serious and blocks adoption of JRuby 1.7.6.

@ratnikov

This comment has been minimized.

Copy link
Contributor

ratnikov commented Nov 7, 2013

Seems to me that the culprit is (sql.is_a?(String) && sql =~ /RETURNING "?\S+"?$/) part of https://github.com/jruby/activerecord-jdbc-adapter/blob/v1.3.2/lib/arjdbc/postgresql/adapter.rb#L538

Perhaps you can try to output the string and provide an easier reproduction case?

@donv

This comment has been minimized.

Copy link
Member Author

donv commented Nov 7, 2013

The string looks normal and has encoding utf-8. Using a literal string with the same content works fine.

@donv

This comment has been minimized.

Copy link
Member Author

donv commented Nov 8, 2013

Hi @ratnikov !

Have you verified the failing example? Does it fail for you? When I dump the yaml string, it looks completely normal, but if I generate the same string not using "to_yaml", it works fine.

@ratnikov

This comment has been minimized.

Copy link
Contributor

ratnikov commented Nov 8, 2013

Seems to work for me:

ratnikov@ratnikov:/crap$ ruby --version
jruby 1.7.6 (1.9.3p392) 2013-10-22 6004147 on OpenJDK Server VM 1.7.0-google-v5-54862216-54132845 [linux-i386]
ratnikov@ratnikov:
/crap$ cd regexp/
ratnikov@ratnikov:~/crap/regexp$ cat test.rb

encoding: utf-8

require 'rubygems'
require 'activerecord-jdbc-adapter'

ActiveRecord::Base.establish_connection :adapter => 'postgresql',
:database => 'encoding_test',
:username => 'encoding_test',
:password => 'secret'

ActiveRecord::Base.connection.execute <<ddl
DROP TABLE IF EXISTS my_models
ddl

ActiveRecord::Base.connection.execute <<ddl
CREATE TABLE my_models(
id SERIAL NOT NULL PRIMARY KEY,
comment VARCHAR(32)
)
ddl

class MyModel < ActiveRecord::Base ; end
MyModel.create! :comment => 'Så sære søtsaker?! (ÆØÅ)'.to_yaml

puts "models: #{MyModel.all.inspect}"
ratnikov@ratnikov:/crap/regexp$ ruby test.rb
models: #<ActiveRecord::Relation [#<MyModel id: 1, comment: "--- Så sære søtsaker?! (ÆØÅ)\n">]>
ratnikov@ratnikov:
/crap/regexp$

@donv

This comment has been minimized.

Copy link
Member Author

donv commented Nov 8, 2013

Thanks for testing. It is failing on three separate systems here.

@ratnikov

This comment has been minimized.

Copy link
Contributor

ratnikov commented Nov 8, 2013

What's your ruby --version ?

@donv

This comment has been minimized.

Copy link
Member Author

donv commented Nov 8, 2013

$ jruby --version
jruby 1.7.6 (1.9.3p392) 2013-10-22 6004147 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18 [darwin-x86_64]

$ gem --version
2.1.9

activerecord (3.2.15)
activerecord-jdbc-adapter (1.3.2)
activerecord-jdbcpostgresql-adapter (1.3.2)
jdbc-postgres (9.2.1004)
@ratnikov

This comment has been minimized.

Copy link
Contributor

ratnikov commented Nov 8, 2013

Reproduced. I was using AR 4.0 first.

On Fri, Nov 8, 2013 at 1:03 PM, Uwe Kubosch notifications@github.comwrote:

$ jruby --version
jruby 1.7.6 (1.9.3p392) 2013-10-22 6004147 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18 [darwin-x86_64]

$ gem --version
2.1.9

activerecord (3.2.15)
activerecord-jdbc-adapter (1.3.2)
activerecord-jdbcpostgresql-adapter (1.3.2)
jdbc-postgres (9.2.1004)


Reply to this email directly or view it on GitHubhttps://github.com//issues/1204#issuecomment-28083825
.

@ratnikov

This comment has been minimized.

Copy link
Contributor

ratnikov commented Nov 8, 2013

Here's a condensed case (although I did set it up as unit test):

# encoding: utf-8
require 'test/unit'

class EncodingTest < Test::Unit::TestCase
  def test_psych_encoding
    File.open('ascii.rb', 'w') do |io|
      io.write <<-CODE
# encoding: US-ASCII

module Foo
  extend self
  def foo(value)
    "'\#{value}'".encoding
  end
end
      CODE
    end

    require 'ascii.rb'
    require 'psych'
    str = Psych.dump("'Så sære søtsaker?! (ÆØÅ)'")

    assert_equal Encoding.find("UTF-8"), Foo.foo(str), "Should still be utf, right?"
  end
end

Seems like due to Foo being written as ASCII the mixing of ASCII and UTF doesn't work very well. In the original example, the "'#{}'" comes from quoting the value before sending it off to the insert query.

@headius

This comment has been minimized.

Copy link
Member

headius commented Nov 11, 2013

Appears to be a problem with how it negotiates the final encoding for a US-ASCII string getting UTF-8 content interpolated into it.

@headius

This comment has been minimized.

Copy link
Member

headius commented Nov 11, 2013

This is probably a bug in cat19 logic. The dstr logic for string interpolation should be calling cat19 to properly negotiate character encodings while building the final string, and the final encoding here should be UTF-8 since US-ASCII can't hold those high-bit characters.

It's also possible, since psych is involved, that our psych impl is not properly setting some encoding detail on "str" coming out of #dump. If it's leaving the code range set to "7bit" that could caust cat19 to assume the string is all 7-bit characters rather than looking at encoding, and it would then shove them all together with US-ASCII encoding.

@ratnikov

This comment has been minimized.

Copy link
Contributor

ratnikov commented Nov 11, 2013

JRuby.reference(value).getCodeRange reports 32 in both places, so this seems more likely the interpolation issues (although cannot rule out psych issue 100%, since it's possible that JRuby.reference does something funky and reports a different codeRange that will eventually be used. Although interpolating JRuby.reference(value) also yields ASCII encoding in the end as well).

@enebo

This comment has been minimized.

Copy link
Member

enebo commented Nov 12, 2013

I just started to look at this one. Psych must be creating this string wrong. I can see the string is marked as 7bit and it clearly is not a 7bit string. If I hack it to unknown or valid then this works fine (it should be marked valid imho).

@enebo

This comment has been minimized.

Copy link
Member

enebo commented Nov 12, 2013

AHA...found it. StringIO.write is unconditionally calling 1.8 impl of write which means it is not marking the internal string to unknown each time it adds more to the internal string. So it starts CR7 empty and then when junk (and stuff) is added it just still thinks it is CR.

enebo added a commit that referenced this issue Nov 13, 2013

Related to #1204. Make Psych emit strings in proper encoding. IOOutpu…
…tStream can now explicitly specify what encoding it wants to write strings with
@enebo

This comment has been minimized.

Copy link
Member

enebo commented Nov 13, 2013

Ok this last commit does not solve the problem but it definitely helps. YAML was emitting strings all as ASCII all the time. This now lets the emit match the proper UTF encoding it needs. StringIO is still broken however. My previous comment about StringIO thinking it is 7bit and no longer being it is still the issue.

@donv

This comment has been minimized.

Copy link
Member Author

donv commented Nov 14, 2013

Congratz on the 1.7.7 release! Do you think a fix for this will make it into 1.7.8? It is a blocker for us, keeping us on 1.7.4.

@headius

This comment has been minimized.

Copy link
Member

headius commented Nov 21, 2013

This is now fixed, but we need a test for it. Can you put one together, @donv or @enebo?

@donv

This comment has been minimized.

Copy link
Member Author

donv commented Nov 22, 2013

I have verified the fix in our production test system. It looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.