Spec ./spec/rspec/support/encoded_string_spec.rb:98 passes on MRI when it shouldn't #163

yorickpeterse · 2015-01-31T13:24:06Z

This is an odd case. The spec https://github.com/rspec/rspec-support/blob/master/spec/rspec/support/encoded_string_spec.rb#L98 seems to pass on MRI and fail on Rubinius, but it makes little sense as to why it passes on MRI.

The problem on Rubinius seems to be that it's comparing "\x80" with "?", both encoded using Emacs-Mule. So see what data is used I "instrumented" the code using good ol' p:

diff --git a/spec/rspec/support/encoded_string_spec.rb b/spec/rspec/support/encoded_string_spec.rb
index 6c2b6ce..53c242c 100644
--- a/spec/rspec/support/encoded_string_spec.rb
+++ b/spec/rspec/support/encoded_string_spec.rb
@@ -98,6 +98,11 @@ module RSpec::Support
             it 'forces the encoding and replaces invalid characters with the REPLACE string' do
               resulting_string = build_encoded_string(string, no_converter_encoding).to_s
               expected_string  = EncodedString::REPLACE.force_encoding(no_converter_encoding)
+
+              p build_encoded_string(string, no_converter_encoding).to_s
+              p resulting_string
+              p expected_string
+
               expect(resulting_string).to be_identical_string(expected_string)
             end
           end

On MRI the output of this test is as following:

Run options: include {:focus=>true, :locations=>{"./spec/rspec/support/encoded_string_spec.rb"=>[98]}}

Randomized with seed 6728

RSpec::Support::EncodedString
  #to_s
    when no converter is known for an encoding
"\x80"
"?"
"?"
      forces the encoding and replaces invalid characters with the REPLACE string

Finished in 0.0013 seconds (files took 0.08671 seconds to load)
1 example, 0 failures

Randomized with seed 6728

On Rubinius the output is as following:

Run options: include {:focus=>true, :locations=>{"./spec/rspec/support/encoded_string_spec.rb"=>[98]}}

Randomized with seed 40770

RSpec::Support::EncodedString
  #to_s
    when no converter is known for an encoding
"\x80"
"\x80"
"?"
      forces the encoding and replaces invalid characters with the REPLACE string (FAILED - 1)

Failures:

  1) RSpec::Support::EncodedString#to_s when no converter is known for an encoding forces the encoding and replaces invalid characters with the REPLACE string
     Failure/Error: expect(resulting_string).to be_identical_string(expected_string)
       expected "\x80" (Emacs-Mule) to be identical to "?" (Emacs-Mule)
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/fail_with.rb:29:in `fail_with'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/handler.rb:40:in `handle_failure'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/handler.rb:50:in `handle_matcher'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/handler.rb:27:in `with_matcher'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/handler.rb:48:in `handle_matcher'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/expectation_target.rb:54:in `to'
     # ./spec/rspec/support/encoded_string_spec.rb:106:in `__script__'
     # kernel/common/eval.rb:101:in `instance_exec'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:177:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:385:in `with_around_and_singleton_context_hooks'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:343:in `with_around_example_hooks'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/hooks.rb:474:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/hooks.rb:612:in `run_around_example_hooks_for'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/hooks.rb:474:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:343:in `with_around_example_hooks'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:385:in `with_around_and_singleton_context_hooks'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:174:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:548:in `run_examples'
     # kernel/bootstrap/array.rb:97:in `map'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:544:in `run_examples'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:512:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:513:in `run'
     # kernel/bootstrap/array.rb:97:in `map'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:513:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:513:in `run'
     # kernel/bootstrap/array.rb:97:in `map'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:513:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:110:in `run_specs'
     # kernel/bootstrap/array.rb:97:in `map'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:110:in `run_specs'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/configuration.rb:1526:in `with_suite_hooks'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:109:in `run_specs'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/reporter.rb:62:in `report'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:108:in `run_specs'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:86:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:70:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:38:in `invoke'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/exe/rspec:4:in `__script__'
     # kernel/common/kernel.rb:497:in `load'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bin/rspec:23:in `__script__'
     # kernel/delta/code_loader.rb:66:in `load_script'
     # kernel/delta/code_loader.rb:152:in `load_script'
     # kernel/loader.rb:655:in `script'
     # kernel/loader.rb:809:in `main'

Finished in 0.00769 seconds (files took 0.36474 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/rspec/support/encoded_string_spec.rb:98 # RSpec::Support::EncodedString#to_s when no converter is known for an encoding forces the encoding and replaces invalid characters with the REPLACE string

Randomized with seed 40770

What is interesting here is that the local variable resulting_string appears to be correct (per the test), but calling the method used to build this variable's value results in a different string, regardless of calling it before or after the variable assignment.

If I only add p build_encoded_string(string, no_converter_encoding).to_s to the test it seems to output "?" correctly. This suggests some sort of race condition is going on somewhere. I'm currently trying to see where that might be.

The text was updated successfully, but these errors were encountered:

yorickpeterse · 2015-01-31T13:37:42Z

There are several other places with EncodedString modifies input strings using String#force_encoding. While this doesn't cause the problem of this issue it could potentially muck with user input in ways a user wouldn't expect.

yorickpeterse · 2015-01-31T13:42:35Z

The following snippet is basically what EncodedString does:

"\x80".force_encoding('Emacs-Mule').encode(:invalid => :replace)

On MRI this returns "?" encoded as Emacs-Mule, on Rubinius this returns "\x80", also encoded as as Emacs-Mule. This suggests our encoding conversion API is apparently different compared to MRI.

@brixen You being Mr Encoding himself, does any of the above ring a bell?

See rspec/rspec-support#163 for the background story.

yorickpeterse · 2015-01-31T14:38:31Z

JRuby 1.17.18 seems to have the same problem, it also returns "\x80". However, JRuby 9000 does return "?" here.

myronmarston · 2015-01-31T16:55:00Z

@bf4 recently did alot of research into encoding errors, particularly on MRI, and revamped these specs, so he's best equipped to answer your questions about these specs. @bf4, can you chip in here?

bf4 · 2015-01-31T17:35:49Z

The dup on REPLACE is a good call.

Besides the comment in the referenced test about the meaning of invalid: :replace, another change in Ruby 2.0 is that the default (i.e. no magic comments) __ENCODING__ for a ruby script changed from US-ASCII to UTF-8. See https://bugs.ruby-lang.org/issues/6200 https://github.com/ruby/ruby/blob/aacc35e144/encoding.c#L1741

This can cause some interactions in specs where, for example, the differ spec has a magic comment # encoding: utf-8 so all strings created in that file have a default encoding of UTF-8, totally independent of what the external encoding is. But, any strings created in the differ, which has no magic comment, and returned to the differ_spec, will be US-ASCII in Ruby < 2.0. Things like this can cause some test discrepancies (which I haven't quite decided how to handle)

@yorickpeterse I haven't read the whole history here in great detail. Did you note that I got this spec from rubyspec https://github.com/rubyspec/rubyspec/blob/91ce9f6549/core/string/shared/encode.rb#L12 ?

I think the end state of EncodedString conversions will be something like (off the top of my head, ignoring any invalid bytes)

if compatible_encoding = Encoding.compatible?(string.encoding, @encoding)
  string.encode(compatible_encoding)
elsif compatible_encoding = Encoding.compatible?(string.encoding, Encoding.default_external) 
  string.encode(compatible_encoding)
else
  string.force_encoding(Encoding.default_external) # cannot convert, so force
end

(I still haven't fixed the original bug that got me into this rabbit hole..., an ArgumentError on EncodedString#split

      describe '#split' do
        context 'when the string has an invalid byte sequence' do
          let(:message_with_invalid_byte_sequence) { "\xEF \255 \xAD I have bad bytes".force_encoding(utf8_encoding) }

          it 'normally raises an ArgumentError' do
            expect {
              message_with_invalid_byte_sequence.split("\n")
            }.to raise_error(ArgumentError)
          end

          it 'replaces invalid bytes with the REPLACE string' do
            resulting_array = build_encoded_string(message_with_invalid_byte_sequence, utf8_encoding).split("\n")
            expect(resulting_array).to match [
              a_string_identical_to("? ? ? I have bad bytes")
            ]
          end
        end

)

bf4 · 2015-05-17T13:10:54Z

I believe this has been resolved and may be closed

See rspec/rspec-support#163 for the background story.

yorickpeterse mentioned this issue Jan 31, 2015

Don't modify EncodedString::REPLACE in place. #164

Merged

yorickpeterse pushed a commit to rubinius/rubinius that referenced this issue Jan 31, 2015

Failing spec for String#encode with Emacs-Mule.

aac96d1

See rspec/rspec-support#163 for the background story.

myronmarston closed this as completed May 17, 2015

eregon pushed a commit to ruby/spec that referenced this issue Jun 4, 2015

Failing spec for String#encode with Emacs-Mule.

50005d2

See rspec/rspec-support#163 for the background story.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spec ./spec/rspec/support/encoded_string_spec.rb:98 passes on MRI when it shouldn't #163

Spec ./spec/rspec/support/encoded_string_spec.rb:98 passes on MRI when it shouldn't #163

yorickpeterse commented Jan 31, 2015

yorickpeterse commented Jan 31, 2015

yorickpeterse commented Jan 31, 2015

yorickpeterse commented Jan 31, 2015

myronmarston commented Jan 31, 2015

bf4 commented Jan 31, 2015

bf4 commented May 17, 2015

Spec ./spec/rspec/support/encoded_string_spec.rb:98 passes on MRI when it shouldn't #163

Spec ./spec/rspec/support/encoded_string_spec.rb:98 passes on MRI when it shouldn't #163

Comments

yorickpeterse commented Jan 31, 2015

yorickpeterse commented Jan 31, 2015

yorickpeterse commented Jan 31, 2015

yorickpeterse commented Jan 31, 2015

myronmarston commented Jan 31, 2015

bf4 commented Jan 31, 2015

bf4 commented May 17, 2015