match_array is taking too long to finish #1161

RicardoTrindade · 2020-02-13T15:12:24Z

Subject of the issue

I was writing some tests in my project, using match_array for some expectations, and noticed that it causes some tests to never finish.

Your environment

Ruby version: ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux]
rspec-expectations version: 3.9.0

Steps to reproduce

Try running this test locally and see whether it finishes (it will most likely fail regardless):

    it do
      a = Array.new(50) { rand(1...9) }
      b = Array.new(50) { rand(1...9) }
      expect(a).to match_array(b)
    end

Expected behavior

The test should finish running (regardless of pass/fail)

Actual behavior

The test never finishes or takes too long to finish (with smaller arrays)

pirj · 2020-02-13T16:10:59Z

This is somewhat related to #685 and #577.
With those in mind, do you think there is a good algorithm that would handle matching in linear time?

diei · 2021-03-25T22:38:01Z

I ran into the same problem. I use Ruby 2.7.1 and rspec-expectations 3.10.1. Even arrays with only 30 items take far too long to finish.

pirj · 2021-03-25T22:46:59Z

Would you like to analyse the time complexity of the algorithm, @diei?
Does whether it fails or matches affect the time to complete?
Let's dissect this.
Here is the source to get you going:

rspec-expectations/lib/rspec/matchers/built_in/contain_exactly.rb

Line 83 in 43bf64b

values_match?(safe_sort(expected), safe_sort(actual))

diei · 2021-03-26T00:36:46Z

I think the complexity is factorial.

best_solution_for_pairing is called once for each element of actuals:

rspec-expectations/lib/rspec/matchers/built_in/contain_exactly.rb

Line 236 in 43bf64b

actuals.each do |actual_index|

Inside best_solution_for_pairing, a new PairingsMaximizer is created and find_best_solution is called on it, with the actuals and expecteds arrays each reduced by one item:

rspec-expectations/lib/rspec/matchers/built_in/contain_exactly.rb

Line 288 in 43bf64b

solution + self.class.new(modified_expecteds, modified_actuals).find_best_solution

In the best case, match_array finishes in milliseconds, even with arrays of 1000 items.
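To make the factorial claim concrete, here is a toy model of that recursion shape. It is only a sketch, not the PairingsMaximizer code: `count_pairing_calls` is a hypothetical name, and it assumes the worst case where every remaining candidate has to be tried at every level before recursing on a problem one item smaller.

```ruby
# Toy model only: counts the calls made by an exhaustive pairing search that
# tries every remaining candidate and then recurses on a problem that is one
# item smaller. It does not call into rspec-expectations.
def count_pairing_calls(remaining)
  return 1 if remaining.zero?

  calls = 1 # this call itself
  remaining.times { calls += count_pairing_calls(remaining - 1) }
  calls
end

(1..8).each do |n|
  puts format("n=%d calls=%d", n, count_pairing_calls(n))
end
```

The count follows calls(n) = 1 + n * calls(n - 1), which grows in proportion to n!, so under this model even a few dozen ambiguous items are out of reach.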

pirj · 2021-03-26T16:15:53Z

So it seems that sorting doesn't always help, since sort order doesn't necessarily correlate with matching, e.g. match_array(/ab/, /bc/, /cd/)?
Would it be possible to detect when we're comparing literals, so that sorting would always help (i.e. a < b < c implies that a === b is false)?
Or can we check elements one by one, e.g. if the first doesn't match, skip the rest?

There might be an algorithm out there in CS even for a complex case where sorting won't help, I'm just personally unaware which one would fit best.
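One known candidate for the general case is maximum bipartite matching with augmenting paths, which needs no ordering at all and only relies on `===`. The sketch below is an illustration under assumptions, not rspec-expectations code: `pairable?` and `try_pair` are hypothetical names, it only answers pass/fail, and it ignores how failure messages would be built.

```ruby
# Hypothetical helper: tries to give expected[e_idx] an actual item, moving
# previously paired expected items elsewhere when needed (augmenting path).
# owner[a_idx] holds the expected index currently paired with actual[a_idx].
def try_pair(e_idx, expected, actual, owner, visited)
  actual.each_index do |a_idx|
    next if visited[a_idx]
    next unless expected[e_idx] === actual[a_idx]

    visited[a_idx] = true
    if owner[a_idx].nil? || try_pair(owner[a_idx], expected, actual, owner, visited)
      owner[a_idx] = e_idx
      return true
    end
  end
  false
end

# Hypothetical helper: true when every expected item can be paired with a
# distinct actual item via `===`.
def pairable?(expected, actual)
  return false unless expected.size == actual.size

  owner = Array.new(actual.size)
  expected.each_index.all? do |e_idx|
    try_pair(e_idx, expected, actual, owner, Array.new(actual.size, false))
  end
end

# A first-match greedy pass would pair /a/ with "ab" and then get stuck;
# the augmenting step re-routes /a/ to "a" instead.
puts pairable?([/a/, /ab/], ["ab", "a"])
puts pairable?([/a/, /a/], ["a", "b"])
```

Each search for one expected item does at most on the order of n² `===` checks, so the whole thing is polynomial (roughly cubic in the worst case) rather than factorial.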

diei · 2021-04-07T10:59:29Z

I'm sorry, but I don't have the capacity to dig into this as deeply as needed.

…ents obey transitivity

This is a proof-of-concept approach for addressing issue rspec#1161.

The current implementation of ContainExactly runs in O(n!) in the worst case. In practice, it runs in O(n log n) when the elements are comparable and sorting results in a match. The crux of the problem is that some elements don't obey transitivity. As a result, knowing that sorting actual and expected doesn't result in a match *doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements in a particular example's expected and actual obey transitivity. That looks like this:

    expect(a).to contain_exactly(*b).transitive

This runs in O(n log n) time. More practically, it means that common use cases for contain_exactly will enjoy a massive speedup. Previously, users had examples where comparing arrays of 30 integers "never finishes." Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s on my machine.
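The sort-then-compare idea behind that commit message can be sketched in plain Ruby. This is an illustration only; the PR's actual `.transitive` implementation may differ. `same_elements_sorted?` is a hypothetical name, and the sketch assumes plain values compared with `==`, such as integers.

```ruby
# Hypothetical helper: sorts both arrays and compares them element by element.
# Returns true or false when the elements can be ordered, and nil when they
# cannot (for example regexes or mixed types), meaning the shortcut does not
# apply and a slower pairing search would be needed.
def same_elements_sorted?(expected, actual)
  return false unless expected.size == actual.size

  expected.sort == actual.sort
rescue ArgumentError, NoMethodError
  nil
end

a = Array.new(10_000) { rand(1...9) }
puts same_elements_sorted?(a, a.shuffle)
```

Sorting dominates the cost, so the whole check is O(n log n) for comparable elements.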

bclayman-sq · 2021-10-06T23:20:01Z

@pirj I think your intuition is exactly right. I've put out a small proof of concept PR to explore this idea and how it might work in code!

@genehsu

Speed up the ContainExactly matcher by pre-emptively matching up corresponding elements in the expected and actual arrays. This addresses rspec#1006, rspec#1161. This PR is a collaboration between me and @genehsu based on a couple of our earlier PRs and discussion that resulted: 1) rspec#1325 2) rspec#1328 Co-authored-by: Gene Hsu (@genehsu)
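One plausible reading of "pre-emptively matching up corresponding elements" is to cancel out items that are plainly equal on both sides, so that only the leftovers reach the expensive pairing search. The sketch below is an assumption about the idea, not the PR's code, and `strip_exact_matches` is a hypothetical name. It is only safe when an expected item that equals an actual item could not be needed to match some other item, which holds for plain literals but not for every matcher object.

```ruby
# Hypothetical helper: removes pairs of items that are equal (by `hash` and
# `eql?`) from both arrays and returns what is left over on each side.
def strip_exact_matches(expected, actual)
  surplus = Hash.new(0) # value => unpaired copies remaining in actual
  actual.each { |item| surplus[item] += 1 }

  leftover_expected = expected.reject do |item|
    next false unless surplus[item].positive?

    surplus[item] -= 1
    true
  end

  leftover_actual = surplus.flat_map { |item, count| [item] * count }
  [leftover_expected, leftover_actual]
end

expected_left, actual_left = strip_exact_matches([1, 2, 2, /a/], [2, 1, "a", 3])
p expected_left
p actual_left
```

With mostly literal data the leftovers are small or empty, so any remaining exhaustive search runs on a much smaller input.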

pirj mentioned this issue Jun 16, 2020

Let's change the default for RSpec/PredicateMatcher rubocop/rubocop-rspec#919

Open

ojab mentioned this issue Sep 10, 2021

include matcher is not diffable enough #1321

Closed

bclayman-sq mentioned this issue Oct 6, 2021

Improve ContainExactly matcher speed when elements obey transitivity #1325

Closed

pirj mentioned this issue Oct 26, 2021

Speed up the ContainExactly matcher #1328

Closed

bclayman-sq mentioned this issue Oct 29, 2021

Speed up the ContainExactly matcher #1333

Open
