Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Ruby 2.4 casecmp? method #100

Open
alejandrovelez7 opened this issue Feb 21, 2020 · 8 comments
Open

Support for Ruby 2.4 casecmp? method #100

alejandrovelez7 opened this issue Feb 21, 2020 · 8 comments

Comments

@alejandrovelez7
Copy link

Is your feature request related to a problem? Please describe.

The suggestion for casecmp can be updated to use the newer Ruby 2.4 method casecmp? instead of casecmp.zero? which requires the user to know the contract for the casecmp method. The ? variation returns truthy or falsy values and is more succinct.

Example:
str.casecmp('ABC').zero? vs. str.casecmp?('ABC')

Describe the solution you'd like

Would it be possible to include casecmp? as a potential Good alternative in the casecmp cop?

Describe alternatives you've considered

Removing the .zero? variation completely might not be preferable if we are supporting versions of Ruby before 2.4.

Additional context

N/A

@alejandrovelez7 alejandrovelez7 changed the title Ruby 2.4 casecmp? method Support for Ruby 2.4 casecmp? method Feb 21, 2020
@bbatsov
Copy link
Contributor

bbatsov commented Feb 21, 2020

Great idea! I guess we'll have to update this cop.

@koic
Copy link
Member

koic commented Feb 21, 2020

I have also considered this method. However this is a performance cop, it is doubtful that casecmp? Is a good case.
https://gist.github.com/koic/bb85f03a534edffc6a52fcdc66e47568

casecmp? is still slower than casecmp.zero? in Ruby 2.7.

# bench.rb
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report('upcase')   { 'String'.upcase == 'string' }
  x.report('downcase') { 'String'.downcase == 'string' }
  x.report('casecmp')  { 'String'.casecmp('string').zero? }
  x.report('casecmp?') { 'String'.casecmp?('string') }
  x.compare!
end
% ruby -v
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-darwin17]
% ruby bench.rb
Warming up --------------------------------------
              upcase   200.689k i/100ms
            downcase   197.741k i/100ms
             casecmp   225.196k i/100ms
            casecmp?   164.628k i/100ms
Calculating -------------------------------------
              upcase      4.208M (± 7.5%) i/s -     21.072M in   5.038656s
            downcase      4.329M (± 5.1%) i/s -     21.752M in   5.039575s
             casecmp      5.091M (± 3.7%) i/s -     25.447M in   5.005460s
            casecmp?      3.014M (± 5.9%) i/s -     15.146M in   5.047086s

Comparison:
             casecmp:  5090889.2 i/s
            downcase:  4328831.1 i/s - 1.18x  slower
              upcase:  4207647.1 i/s - 1.21x  slower
            casecmp?:  3013803.1 i/s - 1.69x  slower

koic added a commit to koic/ruby that referenced this issue Feb 28, 2020
## Background

`String#casecmp?` method was suggested to RuboCop Performance
to use as a fast method.
rubocop/rubocop-performance#100

However `String#casecmp?` is always slower than `String#casecmp("arg").zero?`.

## Summary

This PR improves `String#casecmp?` and `Symbol#casecmp?` performance
when comparing ASCII characters.
OTOH, in the case of Non-ASCII characters, performance is slightly
degraded by added condition.

And I found ruby#1668 which is essentially the same.

## Benchmark

The following is a benchmark case for `String#casecmp?` with similar cases.

```ruby
require 'benchmark/ips'

puts '* ASCII'

Benchmark.ips do |x|
  x.report('upcase')        { 'String'.upcase == 'string' }
  x.report('downcase')      { 'String'.downcase == 'string' }
  x.report('casecmp.zero?') { 'String'.casecmp('string').zero? }
  x.report('casecmp?')      { 'String'.casecmp?('string') }
  x.compare!
end

puts '* Non-ASCII'

Benchmark.ips do |x|
  x.report('upcase')        { 'äö '.upcase == ('ÄÖÜ') }
  x.report('downcase')      { 'äö '.downcase == ('ÄÖÜ') }
  x.report('casecmp.zero?') { 'äö '.casecmp('ÄÖÜ').zero? }
  x.report('casecmp?')      { 'äö '.casecmp?('ÄÖÜ') }
  x.compare!
end
```

### Before

```console
* ASCII
Warming up --------------------------------------
              upcase   200.428k i/100ms
            downcase   197.503k i/100ms
       casecmp.zero?   216.244k i/100ms
            casecmp?   171.257k i/100ms
Calculating -------------------------------------
              upcase      4.326M (± 2.9%) i/s -     21.646M in       5.008511s
            downcase      4.350M (± 2.3%) i/s -     21.923M in       5.042694s
       casecmp.zero?      4.998M (± 3.7%) i/s -     25.084M in       5.025253s
            casecmp?      3.090M (± 2.9%) i/s -     15.584M in       5.047497s

Comparison:
       casecmp.zero?:  4998357.4 i/s
            downcase:  4349766.0 i/s - 1.15x  slower
              upcase:  4325765.3 i/s - 1.16x  slower
            casecmp?:  3090194.2 i/s - 1.62x  slower

* Non-ASCII
Warming up --------------------------------------
              upcase   137.352k i/100ms
            downcase   136.735k i/100ms
       casecmp.zero?   206.341k i/100ms
            casecmp?    96.326k i/100ms
Calculating -------------------------------------
              upcase      2.215M (± 2.7%) i/s -     11.126M in       5.027582s
            downcase      2.199M (± 2.5%) i/s -     11.076M in       5.038781s
       casecmp.zero?      4.709M (± 1.8%) i/s -     23.729M in       5.041362s
            casecmp?      1.315M (± 1.6%) i/s -      6.646M in       5.054479s

Comparison:
       casecmp.zero?:  4708502.7 i/s
              upcase:  2214590.5 i/s - 2.13x  slower
            downcase:  2199478.9 i/s - 2.14x  slower
            casecmp?:  1315326.8 i/s - 3.58x  slower
```

### After

`casecmp?` performance for ASCII characters will be improved.

```console
* ASCII
Warming up --------------------------------------
              upcase   203.640k i/100ms
            downcase   203.605k i/100ms
       casecmp.zero?   217.033k i/100ms
            casecmp?   227.628k i/100ms
Calculating -------------------------------------
              upcase      4.284M (± 1.4%) i/s -     21.586M in       5.039916s
            downcase      4.228M (± 1.6%) i/s -     21.175M in       5.009956s
       casecmp.zero?      5.202M (± 2.7%) i/s -     26.044M in       5.010598s
            casecmp?      5.308M (± 7.4%) i/s -     26.405M in       5.005934s

Comparison:
            casecmp?:  5308087.4 i/s
       casecmp.zero?:  5201808.4 i/s - same-ish: difference falls within
            error
              upcase:  4283916.1 i/s - 1.24x  slower
            downcase:  4227698.3 i/s - 1.26x  slower

* Non-ASCII
Warming up --------------------------------------
              upcase   134.896k i/100ms
            downcase   133.699k i/100ms
       casecmp.zero?   204.155k i/100ms
            casecmp?    90.405k i/100ms
Calculating -------------------------------------
              upcase      2.160M (± 5.7%) i/s -     10.792M in       5.013980s
            downcase      2.013M (±12.2%) i/s -      9.894M in       4.999781s
       casecmp.zero?      4.457M (± 4.8%) i/s -     22.253M in       5.004385s
            casecmp?      1.257M (± 3.2%) i/s -      6.328M in       5.038371s

Comparison:
       casecmp.zero?:  4457430.6 i/s
              upcase:  2160130.1 i/s - 2.06x  slower
            downcase:  2012931.9 i/s - 2.21x  slower
            casecmp?:  1257370.8 i/s - 3.55x  slower
```

`Symbol#casecmp?` have similar results.

## Other Information

This is an off-topic for this PR.
I inform RuboCop Performance that `String#casecmp` and `String#casecmp?`
behave differently when using Non-ASCII characters.

```ruby
'äöü'.casecmp('ÄÖÜ').zero? #=> false
'äöü'.casecmp?('ÄÖÜ')      #=> true
```
koic added a commit to koic/ruby that referenced this issue Feb 28, 2020
## Background

`String#casecmp?` method was suggested to RuboCop Performance
to use as a fast method.
rubocop/rubocop-performance#100

However `String#casecmp?` is always slower than `String#casecmp("arg").zero?`.

## Summary

This PR improves `String#casecmp?` and `Symbol#casecmp?` performance
when comparing ASCII characters.
OTOH, in the case of Non-ASCII characters, performance is slightly
degraded by added condition.

And I found ruby#1668 which is essentially the same.

## Benchmark

The following is a benchmark case for `String#casecmp?` with similar cases.

```ruby
require 'benchmark/ips'

puts '* ASCII'

Benchmark.ips do |x|
  x.report('upcase')        { 'String'.upcase == 'string' }
  x.report('downcase')      { 'String'.downcase == 'string' }
  x.report('casecmp.zero?') { 'String'.casecmp('string').zero? }
  x.report('casecmp?')      { 'String'.casecmp?('string') }
  x.compare!
end

puts '* Non-ASCII'

Benchmark.ips do |x|
  x.report('upcase')        { 'äö '.upcase == ('ÄÖÜ') }
  x.report('downcase')      { 'äö '.downcase == ('ÄÖÜ') }
  x.report('casecmp.zero?') { 'äö '.casecmp('ÄÖÜ').zero? }
  x.report('casecmp?')      { 'äö '.casecmp?('ÄÖÜ') }
  x.compare!
end
```

### Before

```console
* ASCII
Warming up --------------------------------------
              upcase   200.428k i/100ms
            downcase   197.503k i/100ms
       casecmp.zero?   216.244k i/100ms
            casecmp?   171.257k i/100ms
Calculating -------------------------------------
              upcase      4.326M (± 2.9%) i/s -     21.646M in       5.008511s
            downcase      4.350M (± 2.3%) i/s -     21.923M in       5.042694s
       casecmp.zero?      4.998M (± 3.7%) i/s -     25.084M in       5.025253s
            casecmp?      3.090M (± 2.9%) i/s -     15.584M in       5.047497s

Comparison:
       casecmp.zero?:  4998357.4 i/s
            downcase:  4349766.0 i/s - 1.15x  slower
              upcase:  4325765.3 i/s - 1.16x  slower
            casecmp?:  3090194.2 i/s - 1.62x  slower

* Non-ASCII
Warming up --------------------------------------
              upcase   137.352k i/100ms
            downcase   136.735k i/100ms
       casecmp.zero?   206.341k i/100ms
            casecmp?    96.326k i/100ms
Calculating -------------------------------------
              upcase      2.215M (± 2.7%) i/s -     11.126M in       5.027582s
            downcase      2.199M (± 2.5%) i/s -     11.076M in       5.038781s
       casecmp.zero?      4.709M (± 1.8%) i/s -     23.729M in       5.041362s
            casecmp?      1.315M (± 1.6%) i/s -      6.646M in       5.054479s

Comparison:
       casecmp.zero?:  4708502.7 i/s
              upcase:  2214590.5 i/s - 2.13x  slower
            downcase:  2199478.9 i/s - 2.14x  slower
            casecmp?:  1315326.8 i/s - 3.58x  slower
```

### After

`casecmp?` performance for ASCII characters will be improved.

```console
* ASCII
Warming up --------------------------------------
              upcase   203.640k i/100ms
            downcase   203.605k i/100ms
       casecmp.zero?   217.033k i/100ms
            casecmp?   227.628k i/100ms
Calculating -------------------------------------
              upcase      4.284M (± 1.4%) i/s -     21.586M in       5.039916s
            downcase      4.228M (± 1.6%) i/s -     21.175M in       5.009956s
       casecmp.zero?      5.202M (± 2.7%) i/s -     26.044M in       5.010598s
            casecmp?      5.308M (± 7.4%) i/s -     26.405M in       5.005934s

Comparison:
            casecmp?:  5308087.4 i/s
       casecmp.zero?:  5201808.4 i/s - same-ish: difference falls within error
              upcase:  4283916.1 i/s - 1.24x  slower
            downcase:  4227698.3 i/s - 1.26x  slower

* Non-ASCII
Warming up --------------------------------------
              upcase   134.896k i/100ms
            downcase   133.699k i/100ms
       casecmp.zero?   204.155k i/100ms
            casecmp?    90.405k i/100ms
Calculating -------------------------------------
              upcase      2.160M (± 5.7%) i/s -     10.792M in       5.013980s
            downcase      2.013M (±12.2%) i/s -      9.894M in       4.999781s
       casecmp.zero?      4.457M (± 4.8%) i/s -     22.253M in       5.004385s
            casecmp?      1.257M (± 3.2%) i/s -      6.328M in       5.038371s

Comparison:
       casecmp.zero?:  4457430.6 i/s
              upcase:  2160130.1 i/s - 2.06x  slower
            downcase:  2012931.9 i/s - 2.21x  slower
            casecmp?:  1257370.8 i/s - 3.55x  slower
```

`Symbol#casecmp?` have similar results.

## Other Information

This is an off-topic for this PR.
I inform RuboCop Performance that `String#casecmp` and `String#casecmp?`
behave differently when using Non-ASCII characters.

```ruby
'äöü'.casecmp('ÄÖÜ').zero? #=> false
'äöü'.casecmp?('ÄÖÜ')      #=> true
```
koic added a commit to koic/ruby that referenced this issue Feb 29, 2020
## Background

`String#casecmp?` method was suggested to RuboCop Performance
to use as a fast method.
rubocop/rubocop-performance#100

However `String#casecmp?` is always slower than `String#casecmp("arg").zero?`.

## Summary

This PR improves `String#casecmp?` and `Symbol#casecmp?` performance
when comparing ASCII characters.
OTOH, in the case of Non-ASCII characters, performance is slightly
degraded by added condition.

And I found ruby#1668 which is essentially the same.

## Benchmark

The following is a benchmark case for `String#casecmp?` with similar cases.

```ruby
require 'benchmark/ips'

puts '* ASCII'

Benchmark.ips do |x|
  x.report('upcase')        { 'String'.upcase == 'string' }
  x.report('downcase')      { 'String'.downcase == 'string' }
  x.report('casecmp.zero?') { 'String'.casecmp('string').zero? }
  x.report('casecmp?')      { 'String'.casecmp?('string') }
  x.compare!
end

puts '* Non-ASCII'

Benchmark.ips do |x|
  x.report('upcase')        { 'äö '.upcase == ('ÄÖÜ') }
  x.report('downcase')      { 'äö '.downcase == ('ÄÖÜ') }
  x.report('casecmp.zero?') { 'äö '.casecmp('ÄÖÜ').zero? }
  x.report('casecmp?')      { 'äö '.casecmp?('ÄÖÜ') }
  x.compare!
end
```

### Before

```console
* ASCII
Warming up --------------------------------------
              upcase   200.428k i/100ms
            downcase   197.503k i/100ms
       casecmp.zero?   216.244k i/100ms
            casecmp?   171.257k i/100ms
Calculating -------------------------------------
              upcase      4.326M (± 2.9%) i/s -     21.646M in       5.008511s
            downcase      4.350M (± 2.3%) i/s -     21.923M in       5.042694s
       casecmp.zero?      4.998M (± 3.7%) i/s -     25.084M in       5.025253s
            casecmp?      3.090M (± 2.9%) i/s -     15.584M in       5.047497s

Comparison:
       casecmp.zero?:  4998357.4 i/s
            downcase:  4349766.0 i/s - 1.15x  slower
              upcase:  4325765.3 i/s - 1.16x  slower
            casecmp?:  3090194.2 i/s - 1.62x  slower

* Non-ASCII
Warming up --------------------------------------
              upcase   137.352k i/100ms
            downcase   136.735k i/100ms
       casecmp.zero?   206.341k i/100ms
            casecmp?    96.326k i/100ms
Calculating -------------------------------------
              upcase      2.215M (± 2.7%) i/s -     11.126M in       5.027582s
            downcase      2.199M (± 2.5%) i/s -     11.076M in       5.038781s
       casecmp.zero?      4.709M (± 1.8%) i/s -     23.729M in       5.041362s
            casecmp?      1.315M (± 1.6%) i/s -      6.646M in       5.054479s

Comparison:
       casecmp.zero?:  4708502.7 i/s
              upcase:  2214590.5 i/s - 2.13x  slower
            downcase:  2199478.9 i/s - 2.14x  slower
            casecmp?:  1315326.8 i/s - 3.58x  slower
```

### After

`casecmp?` performance for ASCII characters will be improved.

```console
* ASCII
Warming up --------------------------------------
              upcase   206.368k i/100ms
            downcase   205.971k i/100ms
       casecmp.zero?   225.636k i/100ms
            casecmp?   232.030k i/100ms
Calculating -------------------------------------
              upcase      4.337M (± 1.3%) i/s -     21.875M in       5.045187s
            downcase      4.297M (± 1.5%) i/s -     21.627M in       5.033750s
       casecmp.zero?      5.240M (± 1.4%) i/s -     26.399M in       5.038613s
            casecmp?      5.548M (± 1.5%) i/s -     27.844M in       5.019895s

Comparison:
            casecmp?:  5547954.3 i/s
       casecmp.zero?:  5240466.7 i/s - 1.06x  slower
              upcase:  4336522.3 i/s - 1.28x  slower
            downcase:  4297370.8 i/s - 1.29x  slower

### Non-ASCII
Warming up --------------------------------------
              upcase   141.841k i/100ms
            downcase   141.233k i/100ms
       casecmp.zero?   215.401k i/100ms
            casecmp?    94.625k i/100ms
Calculating -------------------------------------
              upcase      2.238M (± 2.1%) i/s -     11.205M in       5.010247s
            downcase      2.243M (± 1.3%) i/s -     11.299M in       5.038967s
       casecmp.zero?      4.611M (± 2.6%) i/s -     23.048M in       5.001970s
            casecmp?      1.255M (± 2.4%) i/s -      6.340M in       5.055611s

Comparison:
       casecmp.zero?:  4610838.3 i/s
            downcase:  2242621.9 i/s - 2.06x  slower
              upcase:  2237553.1 i/s - 2.06x  slower
            casecmp?:  1254739.1 i/s - 3.67x  slower
```

`Symbol#casecmp?` have similar results.

## Other Information

This is an off-topic for this PR.
I inform RuboCop Performance that `String#casecmp` and `String#casecmp?`
behave differently when using Non-ASCII characters.

```ruby
'äöü'.casecmp('ÄÖÜ').zero? #=> false
'äöü'.casecmp?('ÄÖÜ')      #=> true
```
@AlexWayfer
Copy link

I've reported about this cop in 2017: rubocop/rubocop#4277 (comment)

Ruby >= 2.4: casecmp doesn't work with Unicode, casecmp? is slower than downcase + ==, so downcase + == is good compromise.

@AlexWayfer
Copy link

And there are no actions in fast-ruby, sadly: fastruby/fast-ruby#160

@zverok
Copy link

zverok commented May 5, 2020

Funnily enough, downcase is just plain wrong in edge cases (and that's why casecmp? is slower, probably):

str = "Straße"
str2 = str.upcase
# => "STRASSE" 
str.downcase == str2.downcase
# => false 
# ...because...
[str.downcase, str2.downcase]
# => ["straße", "strasse"] 
# ...so, you might want to...
[str.downcase(:fold), str2.downcase(:fold)]
# => ["strasse", "strasse"]
# ...but you can just
str.casecmp?(str2)
# => true 

So, it seems that:

  1. this cop shouldn't be in rubocop-performance (the three alternatives aren't equivalent at all)
  2. but suggesting casecmp? instead of comparing downcased is a good candidate for main Rubocop's Lint/ department.

@AlexWayfer
Copy link

@zverok good case. Can we use upcase and == instead of downcase? It should work for your examples and be faster than casecmp?.

@zverok
Copy link

zverok commented May 6, 2020

@AlexWayfer counter-example would be EXACTLY the same :) German's "lowercase sharp s" ("ß") generally uppercased as "SS", but "uppercase sharp s" still exists (used for avoiding ambiguity in uppercase passport data, for example). So, peculiarly...

"STRASSE" == "STRAẞE"
# => false
"STRASSE".upcase == "STRAẞE".upcase
# => false 
"STRASSE".casecmp?("STRAẞE")
# => true 

Generally, case folding is a "right" thing to do for case-insensitive comparison, and it is complicated, and there is no "simple shortcut" around it, that's why we need it.

@AlexWayfer
Copy link

@AlexWayfer counter-example would be EXACTLY the same :) German's "lowercase sharp s" ("ß") generally uppercased as "SS", but "uppercase sharp s" still exists (used for avoiding ambiguity in uppercase passport data, for example). So, peculiarly...

Wow, it's interesting. These chars look similar in GitHub, but not in my terminal:

image

Thank you. Then, if Ruby (and other languages, I've checked) works so, let's move this cop as suggested by @zverok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants