New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String encoding differs between MRI and JRuby #4564

Closed
MSNexploder opened this Issue Apr 19, 2017 · 12 comments

Comments

Projects
None yet
3 participants
@MSNexploder
Contributor

MSNexploder commented Apr 19, 2017

Environment

JRuby 9.1.8.0 / 9.1.5.0 on OSX and Linux + graphql-ruby gem

Behavior

ATM i'm trying to get graphql-ruby (https://github.com/rmosolgo/graphql-ruby) to work on JRuby. The only real showstopper right now is that occasionally strings like "unit" or "limit" are binary encoded rather than UTF-8 or 7Bit ASCII which are properly encoding on MRI.

As can be seen on Travis (https://travis-ci.org/rmosolgo/graphql-ruby/jobs/223395132) this happens reproducible on every test run. Running the failing test cases isolated however works without any issues.

I'm working on a simpler repro case, but failed to create a truly minimal one.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 19, 2017

Member

I was unable to run your tests on my Fedora 25 environment with JRuby master. Are you able to reproduce this with JRuby master? (builds at ci.jruby.org).

Member

headius commented Apr 19, 2017

I was unable to run your tests on my Fedora 25 environment with JRuby master. Are you able to reproduce this with JRuby master? (builds at ci.jruby.org).

@headius headius added the JRuby 9000 label Apr 19, 2017

@MSNexploder

This comment has been minimized.

Show comment
Hide comment
@MSNexploder

MSNexploder Apr 19, 2017

Contributor

Just tested using jruby 9.1.9.0-SNAPSHOT (2.3.3) 2017-04-19 937cdf7 Java HotSpot(TM) 64-Bit Server VM 25.121-b13 on 1.8.0_121-b13 [darwin-x86_64] - still the same issue. The same string is properly UTF-8 encoded in MRI.

Sample stacktrace
Error:
GraphQL::Schema::Warden::hidding input types#test_0001_isn't present in introspection:
GraphQL::StringEncodingError: String "unit" was encoded as ASCII-8BIT! GraphQL requires UTF-8 encoding.
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/schema/default_type_error.rb:6:in `call'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/schema.rb:351:in `type_error'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/string_type.rb:12:in `block in (root)'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/scalar_type.rb:68:in `coerce_result'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:144:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:175:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:119:in `continue_resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:81:in `resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:33:in `block in resolve_selection'
    org/jruby/RubyHash.java:1343:in `each'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:28:in `resolve_selection'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:185:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:175:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:159:in `block in resolve_value'
    org/jruby/RubyArray.java:1734:in `each'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:151:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:119:in `continue_resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:81:in `resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:33:in `block in resolve_selection'
    org/jruby/RubyHash.java:1343:in `each'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:28:in `resolve_selection'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:185:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:119:in `continue_resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:81:in `resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:33:in `block in resolve_selection'
    org/jruby/RubyHash.java:1343:in `each'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:28:in `resolve_selection'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:15:in `execute'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/query/executor.rb:37:in `execute'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/query/executor.rb:18:in `result'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/query.rb:126:in `result'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/schema.rb:218:in `execute'
    /Users/stefan/Desktop/projects/graphql-ruby/spec/graphql/schema/warden_spec.rb:125:in `run_query'
    /Users/stefan/Desktop/projects/graphql-ruby/spec/graphql/schema/warden_spec.rb:121:in `query_with_mask'
    /Users/stefan/Desktop/projects/graphql-ruby/spec/graphql/schema/warden_spec.rb:464:in `block in test_0001_isn't present in introspection'
Contributor

MSNexploder commented Apr 19, 2017

Just tested using jruby 9.1.9.0-SNAPSHOT (2.3.3) 2017-04-19 937cdf7 Java HotSpot(TM) 64-Bit Server VM 25.121-b13 on 1.8.0_121-b13 [darwin-x86_64] - still the same issue. The same string is properly UTF-8 encoded in MRI.

Sample stacktrace
Error:
GraphQL::Schema::Warden::hidding input types#test_0001_isn't present in introspection:
GraphQL::StringEncodingError: String "unit" was encoded as ASCII-8BIT! GraphQL requires UTF-8 encoding.
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/schema/default_type_error.rb:6:in `call'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/schema.rb:351:in `type_error'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/string_type.rb:12:in `block in (root)'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/scalar_type.rb:68:in `coerce_result'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:144:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:175:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:119:in `continue_resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:81:in `resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:33:in `block in resolve_selection'
    org/jruby/RubyHash.java:1343:in `each'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:28:in `resolve_selection'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:185:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:175:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:159:in `block in resolve_value'
    org/jruby/RubyArray.java:1734:in `each'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:151:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:119:in `continue_resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:81:in `resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:33:in `block in resolve_selection'
    org/jruby/RubyHash.java:1343:in `each'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:28:in `resolve_selection'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:185:in `resolve_value'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:119:in `continue_resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:81:in `resolve_field'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:33:in `block in resolve_selection'
    org/jruby/RubyHash.java:1343:in `each'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:28:in `resolve_selection'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/execution/execute.rb:15:in `execute'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/query/executor.rb:37:in `execute'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/query/executor.rb:18:in `result'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/query.rb:126:in `result'
    /Users/stefan/Desktop/projects/graphql-ruby/lib/graphql/schema.rb:218:in `execute'
    /Users/stefan/Desktop/projects/graphql-ruby/spec/graphql/schema/warden_spec.rb:125:in `run_query'
    /Users/stefan/Desktop/projects/graphql-ruby/spec/graphql/schema/warden_spec.rb:121:in `query_with_mask'
    /Users/stefan/Desktop/projects/graphql-ruby/spec/graphql/schema/warden_spec.rb:464:in `block in test_0001_isn't present in introspection'
@MSNexploder

This comment has been minimized.

Show comment
Hide comment
@MSNexploder

MSNexploder Apr 19, 2017

Contributor

Looks like the test order is important.

Running tests using different seeds results in different tests failing (but it looks like using the same seed keeps failing the same tests).

E.g. using seed "62230" results in 7 errors, but "27871" yields only 6 errors.

Contributor

MSNexploder commented Apr 19, 2017

Looks like the test order is important.

Running tests using different seeds results in different tests failing (but it looks like using the same seed keeps failing the same tests).

E.g. using seed "62230" results in 7 errors, but "27871" yields only 6 errors.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 19, 2017

Member

Here's what I get when I try to run your tetsts:

Coverage may be inaccurate; set the "--debug" command line option, or do JRUBY_OPTS="--debug" or set the "debug.fullTrace=true" option in your .jrubyrc
DEPRECATION WARNING: alias_method_chain is deprecated. Please, use Module#prepend instead. From module, you can access the original method using super. (called from singleton class at /home/headius/projects/jruby/lib/ruby/gems/shared/gems/activerecord-jdbc-adapter-1.3.22/lib/arjdbc/jdbc/base_ext.rb:13)
NameError: uninitialized constant ActiveRecord::ConnectionAdapters::Column::Format
                                                                                     load_missing_constant at /home/headius/projects/jruby/lib/ruby/gems/shared/gems/activesupport-5.0.2/lib/active_support/dependencies.rb:551
                                                                                             const_missing at /home/headius/projects/jruby/lib/ruby/gems/shared/gems/activesupport-5.0.2/lib/active_support/dependencies.rb:203
                                                                                         <module:TypeCast> at /home/headius/projects/jruby/lib/ruby/gems/shared/gems/activerecord-jdbc-adapter-1.3.22/lib/arjdbc/jdbc/type_cast.rb:13

I'm not sure what's wrong with my setup :-(

Member

headius commented Apr 19, 2017

Here's what I get when I try to run your tetsts:

Coverage may be inaccurate; set the "--debug" command line option, or do JRUBY_OPTS="--debug" or set the "debug.fullTrace=true" option in your .jrubyrc
DEPRECATION WARNING: alias_method_chain is deprecated. Please, use Module#prepend instead. From module, you can access the original method using super. (called from singleton class at /home/headius/projects/jruby/lib/ruby/gems/shared/gems/activerecord-jdbc-adapter-1.3.22/lib/arjdbc/jdbc/base_ext.rb:13)
NameError: uninitialized constant ActiveRecord::ConnectionAdapters::Column::Format
                                                                                     load_missing_constant at /home/headius/projects/jruby/lib/ruby/gems/shared/gems/activesupport-5.0.2/lib/active_support/dependencies.rb:551
                                                                                             const_missing at /home/headius/projects/jruby/lib/ruby/gems/shared/gems/activesupport-5.0.2/lib/active_support/dependencies.rb:203
                                                                                         <module:TypeCast> at /home/headius/projects/jruby/lib/ruby/gems/shared/gems/activerecord-jdbc-adapter-1.3.22/lib/arjdbc/jdbc/type_cast.rb:13

I'm not sure what's wrong with my setup :-(

@MSNexploder

This comment has been minimized.

Show comment
Hide comment
@MSNexploder

MSNexploder Apr 19, 2017

Contributor

For some reason you are running the Rails 5.0 tests while my setup is running the 4.2 ones by default. Pretty sure my setup is the broken one.

Nonetheless you should be able to run the tests using appraisal:
bundle exec appraisal rails_4.2 rake test

(just trying for myself I had to downgrade concurrent-ruby - bundle exec appraisal rails_4.2 bundle update concurrent-ruby)

Contributor

MSNexploder commented Apr 19, 2017

For some reason you are running the Rails 5.0 tests while my setup is running the 4.2 ones by default. Pretty sure my setup is the broken one.

Nonetheless you should be able to run the tests using appraisal:
bundle exec appraisal rails_4.2 rake test

(just trying for myself I had to downgrade concurrent-ruby - bundle exec appraisal rails_4.2 bundle update concurrent-ruby)

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 20, 2017

Member

Ahh ok. I just did "bundle install" and it went for the Rails 5 gems by default. I'm re-trying with 4.2 now.

Member

headius commented Apr 20, 2017

Ahh ok. I just did "bundle install" and it went for the Rails 5 gems by default. I'm re-trying with 4.2 now.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 24, 2017

Member

Well I struggled to get a reproduction and had little luck, but I was able to reproduce your erratic test results.

Anything you can do to reduce this down to just a few tests would be helpful, I'm trying to cut pieces out as I go, while keeping failures, but I have not made a lot of progress.

Member

headius commented Apr 24, 2017

Well I struggled to get a reproduction and had little luck, but I was able to reproduce your erratic test results.

Anything you can do to reduce this down to just a few tests would be helpful, I'm trying to cut pieces out as I go, while keeping failures, but I have not made a lot of progress.

@MSNexploder

This comment has been minimized.

Show comment
Hide comment
@MSNexploder

MSNexploder Apr 24, 2017

Contributor

I finally managed to create a more minimal reproduction 🎉

First I extracted one of the failing tests:

# frozen_string_literal: true
require 'ostruct'
require 'graphql'

QueryType = GraphQL::ObjectType.define do
  name "Query"

  field :unit, types.String
end

Schema = GraphQL::Schema.define do
  query QueryType
  resolve_type ->(obj, ctx) { QueryType }
end

module TestData
  def unit
    nil
  end
end

query_string = '{ Query: __type(name: "Query") { fields { name }}}'
puts Schema.execute(query_string, root_value: TestData, variables: {})

But once I removed the (now useless) TestData module everything works.

After that it was pretty straightforward to get graphql out of the picture:

s_string = :unit.to_s
puts s_string.encoding # => US-ASCII

module Test
  def unit
    nil
  end
end

puts s_string.encoding # => ASCII-8BIT

Interestingly this only breaks if the method is defined within a module or class - top-level functions work without any problem.

Contributor

MSNexploder commented Apr 24, 2017

I finally managed to create a more minimal reproduction 🎉

First I extracted one of the failing tests:

# frozen_string_literal: true
require 'ostruct'
require 'graphql'

QueryType = GraphQL::ObjectType.define do
  name "Query"

  field :unit, types.String
end

Schema = GraphQL::Schema.define do
  query QueryType
  resolve_type ->(obj, ctx) { QueryType }
end

module TestData
  def unit
    nil
  end
end

query_string = '{ Query: __type(name: "Query") { fields { name }}}'
puts Schema.execute(query_string, root_value: TestData, variables: {})

But once I removed the (now useless) TestData module everything works.

After that it was pretty straightforward to get graphql out of the picture:

s_string = :unit.to_s
puts s_string.encoding # => US-ASCII

module Test
  def unit
    nil
  end
end

puts s_string.encoding # => ASCII-8BIT

Interestingly this only breaks if the method is defined within a module or class - top-level functions work without any problem.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 25, 2017

Member

Oh excellent! Ok, this gives me enough to proceed. I have my suspicions.

Member

headius commented Apr 25, 2017

Oh excellent! Ok, this gives me enough to proceed. I have my suspicions.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 25, 2017

Member

OMG I just realized the original string changes encoding. That's super wrong. It must be a String stored in the Symbol, returned directly for to_s, and then later updated by the symbol to have ASCII-8BIT encoding.

Member

headius commented Apr 25, 2017

OMG I just realized the original string changes encoding. That's super wrong. It must be a String stored in the Symbol, returned directly for to_s, and then later updated by the symbol to have ASCII-8BIT encoding.

@headius headius closed this in 01cd84d Apr 25, 2017

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 25, 2017

Member

Wow, this is a great find!

We have long had issues with encoding of symbols. In MRI, all symbols are keyed off their raw bytes, but in the table they store the original encoding. This allows looking them up with a simple key (rather than bytes + encoding) but sometimes leads to a race in deciding the symbol encoding for bytes which are the same but which should be encoded differently.

In our IR we addressed this by having our symbol logic always force newly-created symbols to have the encoding we wanted. Unfortunately if the symbol already existed -- perhaps with a different encoding -- we would mutate the original symbol to have the encoding. So that's bad.

Making matters worse, the underlying "ByteList" is shared by the original symbol and any to_s representation of it. Updating the encoding in the symbol ended up updating it in any to_s strings throughout the system, which was the root cause of this gnarly little issue.

Fixing that led to us realizing that we had many other paths incorrectly using java.lang.String as a representation of the symbol, so I did additional work to actually propagate the original ByteList through as-is. This should be much closer to matching MRI, and it fixes your issue along the way.

Thanks so much for the extra work in reducing this! It was a simple matter to find the problem, and the fix is a great one to have.

Member

headius commented Apr 25, 2017

Wow, this is a great find!

We have long had issues with encoding of symbols. In MRI, all symbols are keyed off their raw bytes, but in the table they store the original encoding. This allows looking them up with a simple key (rather than bytes + encoding) but sometimes leads to a race in deciding the symbol encoding for bytes which are the same but which should be encoded differently.

In our IR we addressed this by having our symbol logic always force newly-created symbols to have the encoding we wanted. Unfortunately if the symbol already existed -- perhaps with a different encoding -- we would mutate the original symbol to have the encoding. So that's bad.

Making matters worse, the underlying "ByteList" is shared by the original symbol and any to_s representation of it. Updating the encoding in the symbol ended up updating it in any to_s strings throughout the system, which was the root cause of this gnarly little issue.

Fixing that led to us realizing that we had many other paths incorrectly using java.lang.String as a representation of the symbol, so I did additional work to actually propagate the original ByteList through as-is. This should be much closer to matching MRI, and it fixes your issue along the way.

Thanks so much for the extra work in reducing this! It was a simple matter to find the problem, and the fix is a great one to have.

@MSNexploder

This comment has been minimized.

Show comment
Hide comment
@MSNexploder

MSNexploder Apr 26, 2017

Contributor

Great to see this fixed!

And many thanks for the additional insights.

Contributor

MSNexploder commented Apr 26, 2017

Great to see this fixed!

And many thanks for the additional insights.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment