Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

match_array is taking too long to finish #1161

Open
RicardoTrindade opened this issue Feb 13, 2020 · 7 comments
Open

match_array is taking too long to finish #1161

RicardoTrindade opened this issue Feb 13, 2020 · 7 comments

Comments

@RicardoTrindade
Copy link

Subject of the issue

Was writing some tests in my project, was using match_array to make some expectations and noticed that it causing some tests to never finish.

Your environment

  • Ruby version: ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux]
  • rspec-expectations version: 3.9.0

Steps to reproduce

Try and run this test locally and see if it finishes (it will most likely fail nonetheless)

    it do
      a = Array.new(50) { rand(1...9) }
      b = Array.new(50) { rand(1...9) }
      expect(a).to match_array(b)
    end

Expected behavior

The test should finish running (regardless of pass/fail)

Actual behavior

The test never finishes or takes too long to finish (with smaller arrays)

@pirj
Copy link
Member

pirj commented Feb 13, 2020

Kind of relates to #685 and #577
Do you think with those things in mind there a good algorithm that would handle matching in linear time?

@diei
Copy link

diei commented Mar 25, 2021

I ran into same problem. I use Ruby 2.7.1 and rspec-expectations 3.10.1. Even arrays with only 30 items takes much too long to finish.

@pirj
Copy link
Member

pirj commented Mar 25, 2021

Would you like to make an analysis of the time complexity of the algorithm, @diei?
Does the fact that it fails or matches affect the time to complete?
Let's dissect this.
Here is the source to get you going

values_match?(safe_sort(expected), safe_sort(actual))

@diei
Copy link

diei commented Mar 26, 2021

I think the complexity is factorial.

best_solution_for_pairing is called actuals size times:

And in best_solution_for_pairing a new PairingsMaximizer is created and find_best_solution is called with the actuals and expected arrays reduced by one item.

solution + self.class.new(modified_expecteds, modified_actuals).find_best_solution

In best case match_array finishes in milliseconds, even with arrays having 1000 items.

@pirj
Copy link
Member

pirj commented Mar 26, 2021

So it seems that sorting doesn't really help, as e.g. there's not always a correlation between sorting and matching match_array(/ab/, /bc/, /cd/)?
Would it be possible to understand when we're comparing literals so sorting would always help (i.e. if a < b < c means that a === b is false)?
Or can we check elements one by one, e.g. if the first didn't match - skip the rest?

There might be an algorithm out there in CS even for a complex case where sorting won't help, I'm just personally unaware which one would fit best.

@diei
Copy link

diei commented Apr 7, 2021

I'm sorry, but I have not the capacity to dig into this as deep as needed.

bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 6, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity.

That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."

Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 6, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity.

That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."

Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 6, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 6, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 6, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 6, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
@bclayman-sq
Copy link
Contributor

@pirj I think your intuition is exactly right. I've put out a small proof of concept PR to explore this idea and how it might work in code!

bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 7, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 7, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 7, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 8, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 8, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 8, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 9, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 9, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 9, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 9, 2021
…ents obey transitivity

This is a proof of concept approach for addressing issue rspec#1161.

The current implementation for ContainExactly runs in O(n!).  In practice,
it runs in O(n log n) when the elements are comparable and sorting result
in a match.

The crux of the problem is that some elements don't obey transitivity.  As
a result, knowing that sorting actual and expected doesn't result in a match
*doesn't* guarantee that expected and actual don't match.

This proof of concept provides a way for the user to indicate that the elements
in a particular example's expected and actual obey transitivity. That looks like this:

expect(a).to contain_exactly(*b).transitive

And runs in O(n log n) time.  More practically, this means that common
use cases for contains_exactly will enjoy a massive speedup.  Previously,
users have examples where comparing arrays of 30 integers "never finishes."
Using `.transitive` here with arrays of 10,000 integers runs in < 0.1s
on my machine.
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 29, 2021
Speed up the ContainExactly matcher by pre-emptively matching up corresponding elements in the expected and actual arrays.

This addresses rspec#1006, rspec#1161.

This PR is a collaboration between me and @genehsu based on
a couple of our earlier PRs and discussion that resulted:
1) rspec#1325
2) rspec#1328

Co-authored-by: Gene Hsu (@genehsu)
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 29, 2021
Speed up the ContainExactly matcher by pre-emptively matching up corresponding elements in the expected and actual arrays.

This addresses rspec#1006, rspec#1161.

This PR is a collaboration between me and @genehsu based on
a couple of our earlier PRs and discussion that resulted:
1) rspec#1325
2) rspec#1328

Co-authored-by: Gene Hsu (@genehsu)
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 29, 2021
Speed up the ContainExactly matcher by pre-emptively matching up corresponding elements in the expected and actual arrays.

This addresses rspec#1006, rspec#1161.

This PR is a collaboration between me and @genehsu based on
a couple of our earlier PRs and discussion that resulted:
1) rspec#1325
2) rspec#1328

Co-authored-by: Gene Hsu (@genehsu)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants