Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make String#remove and String#remove! accept multiple arguments #17383

Merged
merged 1 commit into from
Nov 3, 2014

Conversation

rwz
Copy link
Contributor

@rwz rwz commented Oct 24, 2014

Quite often when I use String#remove I need to remove multiple patterns from the string at once. This change makes method handle multiple arguments without breaking backwards compatibility.

Before:

["this", "that", /and that/].each do |pattern|
  text.remove! pattern
end

After:

text.remove! "this", "that", /and that/

@@ -264,6 +264,11 @@ def test_remove
assert_equal "Summer", "Fast Summer".remove!(/Fast /)
end

def test_multiple_remove
assert_equal "This is a good day to die".remove(" to ", /die/), "This is a good day"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps add coverage to ensure the dup is used (e.g. original string is not mutated on the remove)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've merely copied the style of existing test, but I guess you're right. Testing that does makes more sense. Will add it.

@egilburg
Copy link
Contributor

Behavior looks nice to me, unfortunately implementation is almost 3 times slower now for likely the most common case (gsub with single String instance):

require 'benchmark'

class String
  def gsub_remove(pattern)
    gsub pattern, ''
  end

  def union_regex_remove(*patterns)
    dup.gsub!(Regexp.union(patterns), '')
  end

  def non_dup_union_regex_remove(*patterns)
    gsub(Regexp.union(patterns), '')
  end

  def hybrid_regex_remove(*patterns)
    pattern = patterns.one? ? patterns.first : Regexp.union(patterns)
    dup.gsub!(pattern, '')
  end

  def non_dup_hybrid_regex_remove(*patterns)
    pattern = patterns.one? ? patterns.first : Regexp.union(patterns)
    gsub(pattern, '')
  end
end

str = 'hello world'
n = 100_000

Benchmark.bm do |x|
  x.report("gsub_remove") do
    n.times do
      str.gsub_remove('world')
    end
  end

  x.report("union_regex_remove") do
    n.times do
      str.union_regex_remove('world')
    end
  end

  x.report("non_dup_union_regex_remove") do
    n.times do
      str.non_dup_union_regex_remove('world')
    end
  end


  x.report("hybrid_regex_remove") do
    n.times do
      str.hybrid_regex_remove('world')
    end
  end

  x.report("non_dup_hybrid_regex_remove") do
    n.times do
      str.hybrid_regex_remove('world')
    end
  end
end
       user     system      total        real
gsub_remove  0.370000   0.000000   0.370000 (  0.373440)
union_regex_remove  1.000000   0.010000   1.010000 (  1.009617)
non_dup_union_regex_remove  0.950000   0.020000   0.970000 (  0.964599)
hybrid_regex_remove  0.520000   0.000000   0.520000 (  0.517759)
non_dup_hybrid_regex_remove  0.510000   0.000000   0.510000 (  0.516304)
=> [  0.370000   0.000000   0.370000 (  0.373440)
,   1.000000   0.010000   1.010000 (  1.009617)
,   0.950000   0.020000   0.970000 (  0.964599)
,   0.520000   0.000000   0.520000 (  0.517759)
,   0.510000   0.000000   0.510000 (  0.516304)
]

Seems the slowness comes due to use of Regexp.union. The use of dup.gsub! over gsub doesn't seem to have meaningful difference.

So the below optimization reduces penalty from 170% worse to 40% worse:

def remove!(*patterns)
  pattern = patterns.one? patterns.first : Regexp.union(patterns)
  pattern.gsub!(pattern '')
end

Still significantly slower, unfortunately, and also a bit more complex.

@rwz
Copy link
Contributor Author

rwz commented Oct 25, 2014

I did some benchmarking and seems like naive each approach performs much better than Regexp.union. Based on the results I've changed implementation to use hybrid since in most common single argument usecase it's only 1.18 times slower than the original.

# benchmark.rb

require "benchmark/ips"

class String
  def remove_original(pattern)
    gsub pattern, ""
  end

  def remove_union(*patterns)
    gsub Regexp.union(patterns), ""
  end

  def remove_each(*patterns)
    dup.tap do |copy|
      patterns.each do |pattern|
        copy.gsub! pattern, ""
      end
    end
  end

  def remove_hybrid(*patterns)
    patterns.one?? gsub(patterns.first, "") : remove_each(*patterns)
  end
end

Benchmark.ips do |x|
  x.report "original"  do
    "This is a good day to die".remove_original("day to die")
  end

  x.report "union" do
    "This is a good day to die".remove_union("day to die")
  end

  x.report "each" do
    "This is a good day to die".remove_each("day to die")
  end

  x.report "hybrid" do
    "This is a good day to die".remove_hybrid("day to die")
  end

  x.compare!
end

Here're the results using Ruby 2.1.3 on my MBA:

Calculating -------------------------------------
            original     19039 i/100ms
               union      7988 i/100ms
                each     14385 i/100ms
              hybrid     17234 i/100ms
-------------------------------------------------
            original   262255.7 (±4.7%) i/s -    1313691 in   5.020144s
               union    94790.4 (±3.9%) i/s -     479280 in   5.064012s
                each   179523.3 (±4.2%) i/s -     906255 in   5.057517s
              hybrid   221898.8 (±4.0%) i/s -    1120210 in   5.056392s

Comparison:
            original:   262255.7 i/s
              hybrid:   221898.8 i/s - 1.18x slower
                each:   179523.3 i/s - 1.46x slower
               union:    94790.4 i/s - 2.77x slower

@rwz
Copy link
Contributor Author

rwz commented Oct 25, 2014

In two-arguments case each is almost two times faster than union.

Benchmark.ips do |x|
  x.report "union" do
    "This is a good day to die".remove_union("day ", /to die/)
  end

  x.report "each" do
    "This is a good day to die".remove_each("day ", /to die/)
  end

  x.report "hybrid" do
    "This is a good day to die".remove_hybrid("day ", /to die/)
  end

  x.compare!
end
Calculating -------------------------------------
               union      5827 i/100ms
                each     10298 i/100ms
              hybrid      9300 i/100ms
-------------------------------------------------
               union    63621.4 (±5.1%) i/s -     320485 in   5.051326s
                each   122117.6 (±4.1%) i/s -     617880 in   5.068228s
              hybrid   110248.9 (±4.2%) i/s -     558000 in   5.070665s

Comparison:
                each:   122117.6 i/s
              hybrid:   110248.9 i/s - 1.11x slower
               union:    63621.4 i/s - 1.92x slower

gsub! pattern, ''
# Alters the string by removing all occurrences of the patterns.
def remove!(*patterns)
return gsub!(patterns.first, "") if patterns.one?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid early return.

if/else is prefered for this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use the .each method there should be no performance benefit for checking .one? . Can just use the each.

Eugene

On Oct 25, 2014, at 2:03 PM, Rafael Mendonça França notifications@github.com wrote:

In activesupport/lib/active_support/core_ext/string/filters.rb:

end

  • Alters the string by removing all occurrences of the pattern. Short-hand for String#gsub!(pattern, '').

  • def remove!(pattern)
  • gsub! pattern, ''
  • Alters the string by removing all occurrences of the patterns.

  • def remove!(*patterns)
  • return gsub!(patterns.first, "") if patterns.one?
    Avoid early return.

if/else is prefered for this case.


Reply to this email directly or view it on GitHub.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelfranca removed the condition altogether.

@rwz
Copy link
Contributor Author

rwz commented Oct 25, 2014

Checking for number of arguments was only making each slower:

require "benchmark/ips"

class String
  def remove_original!(pattern)
    gsub! pattern, ""
  end

  def remove_each!(*patterns)
    patterns.each do |pattern|
      gsub! pattern, ""
    end

    self
  end

  def remove_hybrid!(*patterns)
    if patterns.one?
      gsub! patterns.first, ""
    else
      patterns.each do |pattern|
        gsub! pattern, ""
      end
    end

    self
  end
end

Benchmark.ips do |x|
  x.report "original" do
    "This is a good day to die".remove_original!("day to die")
  end

  x.report "each" do
    "This is a good day to die".remove_each!("day to die")
  end

  x.report "hybrid" do
    "This is a good day to die".remove_hybrid!("day to die")
  end

  x.compare!
end
Calculating -------------------------------------
            original     18514 i/100ms
                each     17285 i/100ms
              hybrid     16519 i/100ms
-------------------------------------------------
            original   240848.9 (±5.0%) i/s -    1203410 in   5.009171s
                each   221128.7 (±5.2%) i/s -    1106240 in   5.017479s
              hybrid   209163.8 (±4.9%) i/s -    1057216 in   5.067494s

Comparison:
            original:   240848.9 i/s
                each:   221128.7 i/s - 1.09x slower
              hybrid:   209163.8 i/s - 1.15x slower

@rwz
Copy link
Contributor Author

rwz commented Oct 31, 2014

Hey, is there anything else to do to get this merged? Are there still any problems? /cc @rafaelfranca

@rafaelfranca rafaelfranca merged commit 73ad151 into rails:master Nov 3, 2014
rafaelfranca added a commit that referenced this pull request Nov 3, 2014
Make `String#remove` and `String#remove!` accept multiple arguments

Conflicts:
	activesupport/CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants