Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec ./spec/rspec/support/encoded_string_spec.rb:98 passes on MRI when it shouldn't #163

Closed
yorickpeterse opened this issue Jan 31, 2015 · 6 comments

Comments

@yorickpeterse
Copy link
Contributor

This is an odd case. The spec https://github.com/rspec/rspec-support/blob/master/spec/rspec/support/encoded_string_spec.rb#L98 seems to pass on MRI and fail on Rubinius, but it makes little sense as to why it passes on MRI.

The problem on Rubinius seems to be that it's comparing "\x80" with "?", both encoded using Emacs-Mule. So see what data is used I "instrumented" the code using good ol' p:

diff --git a/spec/rspec/support/encoded_string_spec.rb b/spec/rspec/support/encoded_string_spec.rb
index 6c2b6ce..53c242c 100644
--- a/spec/rspec/support/encoded_string_spec.rb
+++ b/spec/rspec/support/encoded_string_spec.rb
@@ -98,6 +98,11 @@ module RSpec::Support
             it 'forces the encoding and replaces invalid characters with the REPLACE string' do
               resulting_string = build_encoded_string(string, no_converter_encoding).to_s
               expected_string  = EncodedString::REPLACE.force_encoding(no_converter_encoding)
+
+              p build_encoded_string(string, no_converter_encoding).to_s
+              p resulting_string
+              p expected_string
+
               expect(resulting_string).to be_identical_string(expected_string)
             end
           end

On MRI the output of this test is as following:

Run options: include {:focus=>true, :locations=>{"./spec/rspec/support/encoded_string_spec.rb"=>[98]}}

Randomized with seed 6728

RSpec::Support::EncodedString
  #to_s
    when no converter is known for an encoding
"\x80"
"?"
"?"
      forces the encoding and replaces invalid characters with the REPLACE string

Finished in 0.0013 seconds (files took 0.08671 seconds to load)
1 example, 0 failures

Randomized with seed 6728

On Rubinius the output is as following:

Run options: include {:focus=>true, :locations=>{"./spec/rspec/support/encoded_string_spec.rb"=>[98]}}

Randomized with seed 40770

RSpec::Support::EncodedString
  #to_s
    when no converter is known for an encoding
"\x80"
"\x80"
"?"
      forces the encoding and replaces invalid characters with the REPLACE string (FAILED - 1)

Failures:

  1) RSpec::Support::EncodedString#to_s when no converter is known for an encoding forces the encoding and replaces invalid characters with the REPLACE string
     Failure/Error: expect(resulting_string).to be_identical_string(expected_string)
       expected "\x80" (Emacs-Mule) to be identical to "?" (Emacs-Mule)
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/fail_with.rb:29:in `fail_with'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/handler.rb:40:in `handle_failure'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/handler.rb:50:in `handle_matcher'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/handler.rb:27:in `with_matcher'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/handler.rb:48:in `handle_matcher'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-expectations-0e91a2814751/lib/rspec/expectations/expectation_target.rb:54:in `to'
     # ./spec/rspec/support/encoded_string_spec.rb:106:in `__script__'
     # kernel/common/eval.rb:101:in `instance_exec'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:177:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:385:in `with_around_and_singleton_context_hooks'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:343:in `with_around_example_hooks'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/hooks.rb:474:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/hooks.rb:612:in `run_around_example_hooks_for'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/hooks.rb:474:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:343:in `with_around_example_hooks'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:385:in `with_around_and_singleton_context_hooks'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example.rb:174:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:548:in `run_examples'
     # kernel/bootstrap/array.rb:97:in `map'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:544:in `run_examples'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:512:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:513:in `run'
     # kernel/bootstrap/array.rb:97:in `map'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:513:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:513:in `run'
     # kernel/bootstrap/array.rb:97:in `map'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/example_group.rb:513:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:110:in `run_specs'
     # kernel/bootstrap/array.rb:97:in `map'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:110:in `run_specs'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/configuration.rb:1526:in `with_suite_hooks'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:109:in `run_specs'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/reporter.rb:62:in `report'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:108:in `run_specs'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:86:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:70:in `run'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/lib/rspec/core/runner.rb:38:in `invoke'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bundler/gems/rspec-core-655a52b03fd0/exe/rspec:4:in `__script__'
     # kernel/common/kernel.rb:497:in `load'
     # /home/yorickpeterse/.gem/rbx/2.1.0/bin/rspec:23:in `__script__'
     # kernel/delta/code_loader.rb:66:in `load_script'
     # kernel/delta/code_loader.rb:152:in `load_script'
     # kernel/loader.rb:655:in `script'
     # kernel/loader.rb:809:in `main'

Finished in 0.00769 seconds (files took 0.36474 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/rspec/support/encoded_string_spec.rb:98 # RSpec::Support::EncodedString#to_s when no converter is known for an encoding forces the encoding and replaces invalid characters with the REPLACE string

Randomized with seed 40770

What is interesting here is that the local variable resulting_string appears to be correct (per the test), but calling the method used to build this variable's value results in a different string, regardless of calling it before or after the variable assignment.

If I only add p build_encoded_string(string, no_converter_encoding).to_s to the test it seems to output "?" correctly. This suggests some sort of race condition is going on somewhere. I'm currently trying to see where that might be.

@yorickpeterse
Copy link
Contributor Author

There are several other places with EncodedString modifies input strings using String#force_encoding. While this doesn't cause the problem of this issue it could potentially muck with user input in ways a user wouldn't expect.

@yorickpeterse
Copy link
Contributor Author

The following snippet is basically what EncodedString does:

"\x80".force_encoding('Emacs-Mule').encode(:invalid => :replace) 

On MRI this returns "?" encoded as Emacs-Mule, on Rubinius this returns "\x80", also encoded as as Emacs-Mule. This suggests our encoding conversion API is apparently different compared to MRI.

@brixen You being Mr Encoding himself, does any of the above ring a bell?

yorickpeterse pushed a commit to rubinius/rubinius that referenced this issue Jan 31, 2015
@yorickpeterse
Copy link
Contributor Author

JRuby 1.17.18 seems to have the same problem, it also returns "\x80". However, JRuby 9000 does return "?" here.

@myronmarston
Copy link
Member

@bf4 recently did alot of research into encoding errors, particularly on MRI, and revamped these specs, so he's best equipped to answer your questions about these specs. @bf4, can you chip in here?

@bf4
Copy link
Contributor

bf4 commented Jan 31, 2015

The dup on REPLACE is a good call.

Besides the comment in the referenced test about the meaning of invalid: :replace, another change in Ruby 2.0 is that the default (i.e. no magic comments) __ENCODING__ for a ruby script changed from US-ASCII to UTF-8. See https://bugs.ruby-lang.org/issues/6200 https://github.com/ruby/ruby/blob/aacc35e144/encoding.c#L1741

This can cause some interactions in specs where, for example, the differ spec has a magic comment # encoding: utf-8 so all strings created in that file have a default encoding of UTF-8, totally independent of what the external encoding is. But, any strings created in the differ, which has no magic comment, and returned to the differ_spec, will be US-ASCII in Ruby < 2.0. Things like this can cause some test discrepancies (which I haven't quite decided how to handle)

@yorickpeterse I haven't read the whole history here in great detail. Did you note that I got this spec from rubyspec https://github.com/rubyspec/rubyspec/blob/91ce9f6549/core/string/shared/encode.rb#L12 ?

I think the end state of EncodedString conversions will be something like (off the top of my head, ignoring any invalid bytes)

if compatible_encoding = Encoding.compatible?(string.encoding, @encoding)
  string.encode(compatible_encoding)
elsif compatible_encoding = Encoding.compatible?(string.encoding, Encoding.default_external) 
  string.encode(compatible_encoding)
else
  string.force_encoding(Encoding.default_external) # cannot convert, so force
end

(I still haven't fixed the original bug that got me into this rabbit hole..., an ArgumentError on EncodedString#split

      describe '#split' do
        context 'when the string has an invalid byte sequence' do
          let(:message_with_invalid_byte_sequence) { "\xEF \255 \xAD I have bad bytes".force_encoding(utf8_encoding) }

          it 'normally raises an ArgumentError' do
            expect {
              message_with_invalid_byte_sequence.split("\n")
            }.to raise_error(ArgumentError)
          end

          it 'replaces invalid bytes with the REPLACE string' do
            resulting_array = build_encoded_string(message_with_invalid_byte_sequence, utf8_encoding).split("\n")
            expect(resulting_array).to match [
              a_string_identical_to("? ? ? I have bad bytes")
            ]
          end
        end

)

@bf4
Copy link
Contributor

bf4 commented May 17, 2015

I believe this has been resolved and may be closed

eregon pushed a commit to ruby/spec that referenced this issue Jun 4, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants