Undesired string literal replacements when UTF8 locale not set in ENV #3311

Closed
johanlunds opened this issue Jul 11, 2016 · 4 comments

@johanlunds

We upgraded RuboCop from 0.38.0 to 0.41.2 in our app. Suddenly RuboCop wanted to rewrite string literals that contain non-ASCII characters, for example:

  • original: "En kort men bra text varför jag ska försöka erövra denna trofé"
  • replacement: "En kort men bra text varf\u00F6r jag ska f\u00F6rs\u00F6ka er\u00F6vra denna trof\u00E9"

However, the original is valid UTF-8 and much more readable.

It turns out that our locale in ENV was not set up for UTF-8. See below for a reproducible example.

This is more of an FYI, since this is new behavior introduced somewhere between 0.38.0 and 0.41.2 that could be very confusing and irritating to end users. I don't know whether it should be fixed, just documented, or left alone. It took me some time to figure out that RuboCop works as desired once my locale is set correctly.
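For anyone checking their own setup, here is a minimal sketch of how to see which locale variable is actually in effect. It assumes a POSIX system, where the character-set category is resolved as LC_ALL, then LC_CTYPE, then LANG, and where Ruby derives `Encoding.default_external` from that locale. LANGUAGE is only consulted for message translation, so it should not influence the encoding. The helper names `effective_ctype_locale` and `utf8_locale?` are hypothetical and not part of RuboCop.

```ruby
# Hypothetical helpers for inspecting the locale setup; not part of RuboCop.

# POSIX precedence for the character-type category:
# LC_ALL overrides LC_CTYPE, which overrides LANG.
CTYPE_VARIABLES = %w[LC_ALL LC_CTYPE LANG].freeze

# Returns the locale name that decides the character set.
# Falls back to "C" when none of the variables is set or all are empty.
def effective_ctype_locale(env = ENV)
  name = CTYPE_VARIABLES.find { |key| !env[key].to_s.empty? }
  name ? env[name] : "C"
end

# True when the effective locale names a UTF-8 codeset,
# e.g. "en_US.UTF-8" or "sv_SE.utf8". Requires Ruby 2.4+ for match?.
def utf8_locale?(env = ENV)
  effective_ctype_locale(env).match?(/\.utf-?8(@.+)?\z/i)
end

puts "effective locale:          #{effective_ctype_locale}"
puts "Encoding.default_external: #{Encoding.default_external}"
puts "UTF-8 locale?              #{utf8_locale?}"
```

If the last line prints `false`, the environment is in the same state as the failing run in the reproduction steps.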


Expected behavior

  • Both of the test runs below pass.
  • RuboCop's output does not depend on the locale set in ENV.

Actual behavior

One of the tests fails after changing the locale in ENV.

Steps to reproduce the problem

vagrant@vagrant-machine:~/temp/rubocop$ export LC_ALL=en_US.UTF-8
vagrant@vagrant-machine:~/temp/rubocop$ export LANG=en_US.UTF-8
vagrant@vagrant-machine:~/temp/rubocop$ export LANGUAGE=en_US.UTF-8
vagrant@vagrant-machine:~/temp/rubocop$ bundle exec rspec ./spec/rubocop/cop/style/string_literals_spec.rb
Run options:
  include {:focus=>true}
  exclude {:broken=>#<Proc:./spec/spec_helper.rb:33>}

All examples were filtered out; ignoring {:focus=>true}

Randomized with seed 17674
....................................................

Finished in 0.17152 seconds (files took 0.48332 seconds to load)
52 examples, 0 failures

Randomized with seed 17674

vagrant@vagrant-machine:~/temp/rubocop$ export LANGUAGE=en_US
vagrant@vagrant-machine:~/temp/rubocop$ export LANG=en_US
vagrant@vagrant-machine:~/temp/rubocop$ export LC_ALL=en_US
vagrant@vagrant-machine:~/temp/rubocop$ bundle exec rspec ./spec/rubocop/cop/style/string_literals_spec.rb
Run options:
  include {:focus=>true}
  exclude {:broken=>#<Proc:./spec/spec_helper.rb:33>}

All examples were filtered out; ignoring {:focus=>true}

Randomized with seed 9774
....................................F...............

Failures:

  1) RuboCop::Cop::Style::StringLiterals configured with single quotes preferred autocorrects words with non-ascii chars
     Failure/Error: expect(new_source).to eq("'España'")

       expected: "'Espa\u00F1a'"
            got: "\"Espa\\u00F1a\""

       (compared using ==)
     # ./spec/rubocop/cop/style/string_literals_spec.rb:177:in `block (3 levels) in <top (required)>'

Finished in 0.16386 seconds (files took 0.52398 seconds to load)
52 examples, 1 failure

Failed examples:

rspec ./spec/rubocop/cop/style/string_literals_spec.rb:175 # RuboCop::Cop::Style::StringLiterals configured with single quotes preferred autocorrects words with non-ascii chars

Randomized with seed 9774
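
The `got:` value in the failure above has the same shape that Ruby's own `String#inspect` produces when the default external encoding cannot represent a character. The sketch below demonstrates that core Ruby behavior only. Whether RuboCop or Parser goes through `inspect` at this point is an assumption, and `with_external_encoding` is a hypothetical helper written for this illustration. The literal uses a `\u` escape so the snippet does not depend on how the file itself is read.

```ruby
# Hypothetical helper: temporarily switch the default encodings,
# run the block, then restore the previous values.
def with_external_encoding(encoding)
  original_external = Encoding.default_external
  original_internal = Encoding.default_internal
  original_verbose = $VERBOSE
  $VERBOSE = nil # Ruby warns about changing the defaults when run with -w
  Encoding.default_external = encoding
  Encoding.default_internal = nil # inspect would prefer this over the external one
  yield
ensure
  Encoding.default_external = original_external
  Encoding.default_internal = original_internal
  $VERBOSE = original_verbose
end

word = "Espa\u00F1a" # a UTF-8 string containing one non-ASCII character

utf8_view  = with_external_encoding(Encoding::UTF_8)    { word.inspect }
ascii_view = with_external_encoding(Encoding::US_ASCII) { word.inspect }

puts utf8_view  # the character is printed as-is
puts ascii_view # prints "Espa\u00F1a", with the character escaped
```

The string itself is identical in both cases. Only its printed representation changes with the external encoding, which is why the escaped form is surprising in an autocorrected source file.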

RuboCop version

vagrant@vagrant-machine:~/temp/rubocop$ ./bin/rubocop -V
0.41.2 (using Parser 2.3.1.2, running on ruby 2.3.0 x86_64-linux)

vagrant@vagrant-machine:~/temp/rubocop$ git rev-parse HEAD
77ebfd6113e1a0077b400c937c28be5e82022c22
@Drenmi
Collaborator

Drenmi commented Jul 12, 2016

I can confirm this is happening even with a # encoding: utf-8 directive at the top of the file.

@deivid-rodriguez
Contributor

I fixed the previous issues related to this, so I'll try to have a look at this one too.

bbatsov added a commit that referenced this issue Sep 29, 2016
[Fix #3311] Detect encoding incompatibilities to spare bad autocorrection
@johanlunds
Author

❤️

Neodelf pushed a commit to Neodelf/rubocop that referenced this issue Oct 15, 2016
When the external encoding is non unicode, but there are pure unicode
characters in the source, parser replaces those with their equivalent
control sequences, ending up in a bad autocorrection.

Detect those cases.

Also added a couple of regression tests and enabled running the whole
test suite against ASCII external encoding.
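
As a rough illustration of what "detect those cases" could mean, the sketch below checks whether a piece of text can be represented in a given encoding before anything is rewritten. This is not the code from the commit, which is not shown in this thread. `representable_in?` is a hypothetical helper, and the only Ruby API it relies on is `String#encode` raising a subclass of `EncodingError` when a character has no equivalent in the target encoding.

```ruby
# Hypothetical check, not RuboCop's actual implementation:
# can +text+ be converted to +encoding+ without losing characters?
def representable_in?(text, encoding)
  text.encode(encoding)
  true
rescue EncodingError
  # Covers Encoding::UndefinedConversionError and related failures.
  false
end

word = "Espa\u00F1a"

puts representable_in?(word, Encoding::UTF_8)      # true
puts representable_in?(word, Encoding::ISO_8859_1) # true, Latin-1 contains this character
puts representable_in?(word, Encoding::US_ASCII)   # false, so a rewrite would need escapes
```

A cop could use a check of this kind against `Encoding.default_external` to skip the autocorrection instead of emitting escaped text.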