CollectionAssert.AreEquivalent is extremely slow #2799
Several of us made changes to the collection asserts in that period. I'll take a look to see if one of mine caused this. |
Actually, I think that my change in #2600 has caused the problem (I've not had time to check it yet, as I'm halfway through a VS upgrade). In the case above we will have to run through the entire remaining collection for every match we want to remove. Before my change we would find each match immediately as the first element. |
The ideal situation might be to implement |
I'm wondering if the various collection constraints should possibly be separated in their implementation, separately optimized and only then possibly recombined to remove duplication. |
That sounds like a good idea. Either way, the collection equivalence constraint itself will probably need to use HashSet.SetEquals eventually to get to the standard O(n+m) people expect. It won't be able to do that unless all our internal NUnit equality comparers implement IEqualityComparer. |
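For concreteness, a minimal sketch of the adaptation discussed above — the type and member names are illustrative, not NUnit's actual API:

```csharp
using System;
using System.Collections.Generic;

// Wrap an element comparison plus a matching hash in IEqualityComparer<T>,
// so HashSet<T>.SetEquals can compare two collections in O(n + m).
sealed class DelegateEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _getHashCode;

    public DelegateEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        _equals = equals;
        _getHashCode = getHashCode;
    }

    public bool Equals(T x, T y) => _equals(x, y);

    // Must agree with Equals: items that compare equal must hash identically,
    // which is why transitive element equality is a prerequisite for this path.
    public int GetHashCode(T obj) => _getHashCode(obj);
}

static class SetEquivalence
{
    public static bool SetEqual<T>(
        IEnumerable<T> actual, IEnumerable<T> expected, IEqualityComparer<T> comparer)
        => new HashSet<T>(actual, comparer).SetEquals(expected);
}
```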
Implementing IEqualityComparer means either (1) forgetting about tolerance or (2) making tolerance a constructor argument for comparers and using a new one either for every comparison or perhaps caching a few of them. Could get complicated. |
That's an interesting question. Is it even meaningful to test for set equality with tolerance? I don't know what set equality means without transitive element equality comparisons, and if you have tolerance, you don't have transitive equality. My attempt at searching didn't turn anything up. It would be cool to know.

We'd only need to adapt to IEqualityComparer when using a hash table. We'd only use a hash table for operations that compare two collections. A hash table in theory requires transitive equality comparisons and transitive hash codes. This is what the IEqualityComparer contract demands, so if it's inherently possible to use a hash table for the comparison, it's inherently possible to adapt our element comparer to IEqualityComparer.

Let's say there is a hypothetical comparison we want NUnit to do between two collections which is not inherently possible to losslessly adapt to IEqualityComparer. If that is the case, that would mean the comparison cannot even in theory make use of a hash table. We would then have to fall back to something slower. But before we worry about that, does such a comparison meaningfully exist? |
I think set equality with tolerance does make sense. Essentially, "is it possible to match every item in set A with one or more items in set B using equality function F such that no item in set B is left unmatched". But I think that's always going to be O(n*m) in the general case.

Even items that implement

I don't think I'd expect NUnit to try to solve any of that for me out of the box.

ETA: As @jnm2 says, all uses of "set" in this answer should really be replaced with "bag", as there may be duplicate items. |
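For the transitive-equality case, the bag comparison AreEquivalent needs can be done with a counting dictionary in O(n + m), assuming a hash-capable IEqualityComparer<T> is available — a sketch, not NUnit's actual code:

```csharp
using System.Collections.Generic;

static class BagEquivalence
{
    // Count occurrences of each distinct item in 'actual', then consume the
    // counts while walking 'expected'; the bags are equivalent iff nothing is
    // left over on either side.
    public static bool BagEqual<T>(
        IEnumerable<T> actual, IEnumerable<T> expected, IEqualityComparer<T> comparer)
    {
        var counts = new Dictionary<T, int>(comparer);
        foreach (T item in actual)
        {
            counts.TryGetValue(item, out int count);
            counts[item] = count + 1;
        }

        foreach (T item in expected)
        {
            if (!counts.TryGetValue(item, out int count) || count == 0)
                return false; // 'expected' contains an item 'actual' lacks

            counts[item] = count - 1;
        }

        foreach (int remaining in counts.Values)
            if (remaining != 0)
                return false; // 'actual' had extra occurrences

        return true;
    }
}
```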
@herebebeasties Good point. I was talking about set equality, but

We'd have to have an algorithm that not only did the O(m*n) initial mapping but also proceeded by trying every combination of overlaps to see if it could match the items 1:1. It seems like that would have factorial complexity on top of the n*m, depending on how connected the mapping is (worse with larger tolerances).

This logic doesn't sound familiar, so I wonder if NUnit is currently returning wrong results for

How does it sound to implement the logic to first determine whether the element comparison is transitive, and if so, use |
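For what it's worth, matching items 1:1 under a non-transitive "within tolerance" relation is exactly bipartite matching, which augmenting-path algorithms solve in polynomial rather than factorial time. A self-contained sketch over doubles (an editorial illustration, not NUnit code):

```csharp
using System;

static class ToleranceEquivalence
{
    // Kuhn's augmenting-path algorithm: O(n^3) worst case for n items per side,
    // regardless of how connected the tolerance mapping is.
    public static bool AreEquivalentWithin(double[] actual, double[] expected, double tolerance)
    {
        if (actual.Length != expected.Length) return false;
        int n = actual.Length;

        var matchedActual = new int[n]; // expected[j] -> matched actual index, or -1
        for (int j = 0; j < n; j++) matchedActual[j] = -1;

        bool TryMatch(int i, bool[] visited)
        {
            for (int j = 0; j < n; j++)
            {
                if (visited[j] || Math.Abs(actual[i] - expected[j]) > tolerance) continue;
                visited[j] = true;
                // Take expected[j] if it is free, or re-route its current partner.
                if (matchedActual[j] == -1 || TryMatch(matchedActual[j], visited))
                {
                    matchedActual[j] = i;
                    return true;
                }
            }
            return false;
        }

        for (int i = 0; i < n; i++)
            if (!TryMatch(i, new bool[n]))
                return false; // some actual item has no available partner

        return true;
    }
}
```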
Also, any takers? If this interests you, please jump in! |
Well, this went quite a bit quicker than I anticipated! @ggeurts has started a PR already. |
Yes, I have started work on a PR, but still need to work out a way to get a hash code provider where that is possible for equivalence operations. |
@ggeurts I'm thinking we need IChainComparer to have something like a
|
For the enumerable, array, dictionary, and similar comparers, custom hash code implementations will be needed for sorted and unsorted scenarios. I am experimenting with int? IChainComparer.GetHashCode(object o) implementations. |
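To sketch the sorted/unsorted distinction: an order-insensitive hash needs a commutative combine, so that equivalent bags hash alike regardless of enumeration order. The member names here are illustrative, not the actual IChainComparer shape:

```csharp
using System;
using System.Collections;

static class CollectionHashes
{
    // Order-sensitive combine: suitable where sequence order matters.
    public static int OrderedHash(IEnumerable items, Func<object, int> elementHash)
    {
        int hash = 17;
        foreach (object item in items)
            hash = unchecked(hash * 31 + (item is null ? 0 : elementHash(item)));
        return hash;
    }

    // Order-insensitive combine: addition commutes, so two bags with the same
    // elements hash identically in any order, as equivalence tests require.
    public static int UnorderedHash(IEnumerable items, Func<object, int> elementHash)
    {
        int hash = 0;
        foreach (object item in items)
            hash = unchecked(hash + (item is null ? 0 : elementHash(item)));
        return hash;
    }
}
```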
@ggeurts Since the hash code provider can be picked without examining a particular |
Is the collection element type always known? |
@ggeurts No. You may find this PR quite relevant: https://github.com/nunit/nunit/pull/2501/files#diff-f6b841c6c7eed9838e33cf9082f26944R73 |
@ggeurts In fact, as it seemed to me in that PR, analyzing the args to decide if the element type can be known in advance is most likely the key to an efficient implementation. The current inefficient implementation has the virtue of actually working for object[] arguments containing mixed types, but that's probably not something that really happens a lot. |
So it makes sense to provide a more generic way to attempt to get an IEqualityComparer<T> for a collection argument, based on the pull request mentioned earlier. I will look into that.
|
@ggeurts Yes, I agree. The full set of criteria (as far as I've figured out) for a more efficient approach is:
You could probably cross off the third point by providing special IEqualityComparers for such types. |
Extended NUnitEqualityComparer with hash code calculations to speed up collection equivalence tests
Fix spelling mistake
I have extended NUnitEqualityComparer and related classes (such as the chain comparers and EqualityAdapter) with hash code generation. For most collection equivalence tests the complexity is now O(log(n)) instead of O(n*m). The exception is where external comparers are used that are based on IComparer or Comparison delegates, because it is not possible to calculate unique hash codes for those cases. It is possible to speed up collection equivalence tests further by providing CollectionTally with an optimized IEqualityComparer implementation rather than having it always use NUnitEqualityComparer for comparisons. However, I prefer to first wait for review of the changes committed so far. |
@jnm2 @CharliePoole Is ggeurts@d893d83 ok for review or do I need to break it down into smaller commits? |
@ggeurts Sorry, I haven't had a lot of free time! That is difficult to review all at once. How much work do you think it is to break it into smaller descriptive commits? |
I will break down the changes into smaller commits to simplify review. However, my spare time is limited as well. It may take a short while to get there. |
Hello team, bumping this issue. Actually, this issue is not a big problem for us anymore, because we decided not to use NUnit on one of our projects. But I remember that I spent several days figuring out what was wrong in my continuous integration setup when executing super simple tests there. I was so surprised when I saw the actual reason the test execution was hanging. To save other people the trouble, let's get the fix reviewed. Thanks! |
I would like to bump this - it's still open, and I definitely still see the issue in 3.12.0 |
From #3825, this has gotten even worse in 3.13. The example code from @NightElfik:

```csharp
[Test]
public void NUnitRepro() {
    var dict = new Dictionary<int, string>();
    for (int i = 0; i < 10000; ++i) {
        dict.Add(i, i.ToString());
    }
    CollectionAssert.AreEquivalent(dict.Values.AsEnumerable(),
        Enumerable.Range(0, 10000).Select(j => j.ToString()));
}
```

My timings for various versions of NUnit:
We should probably dust off PR #2830 and see if we can get it finished. |
I've been profiling the code and going through it all day. The fact that our constraints are not generic and take in objects is part of the problem, but most of it is a build-up of small features that we have accepted over time. The recent increase in time looks to be caused by the recursion detection that was added as a fix recently. Other increases are the addition of new comparers for new types and fixes for equivalence. @NightElfik, @nvborisenko and @cidthecoatrack, both of your code examples determine equivalence of two collections that are in the same order. Switching the above example to the following runs in 0.037 sec:

```csharp
[Test]
public void CollectionAreEqualTests()
{
    var dict = new Dictionary<int, string>();
    for (int i = 0; i < 10000; ++i)
    {
        dict.Add(i, i.ToString());
    }
    CollectionAssert.AreEqual(dict.Values.AsEnumerable(),
        Enumerable.Range(0, 10000).Select(j => j.ToString()));
}
```
|
I have used an OrderBy expression, coupled with Is.EqualTo, as a workaround for some time now, so the recommendation is valid. But the beauty of Is.EquivalentTo is that it compares what is missing and what is extra, not just that index i is different. Could we maybe extend the equivalence to do some internal sorting of some kind, and then essentially do the same as Is.EqualTo, but instead of kicking out at the first unmatched index, keep assessing? |
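The workaround mentioned looks roughly like this — a sketch, assuming the elements are mutually comparable; note it gives up EquivalentTo's missing/extra reporting:

```csharp
// Sort both sides first, then use the linear Is.EqualTo sequence comparison.
Assert.That(actual.OrderBy(x => x), Is.EqualTo(expected.OrderBy(x => x)));
```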
It may be important to remember that NUnit's |
It's separate from most of the ideas discussed in this issue and corresponding PR, but something I've been trying to wrap my head around recently is how it might look to make the internal

@rprouse's point of furthering generic constraint usage could also help there (though I suspect it would make it harder to port to 3.13.x in a non-breaking way). |
I made a small breakthrough this morning. If the two collections are in the same order it is really slow, but if the two lists are the reverse of each other, then it is fast.

This takes 22 seconds:

```csharp
[Test]
public void IntCollectionAreEquivalentTests()
{
    var actual = Enumerable.Range(0, SIZE);
    var expected = Enumerable.Range(0, SIZE);
    CollectionAssert.AreEquivalent(actual, expected);
}
```

This takes 0.006 seconds:

```csharp
[Test]
public void IntCollectionReversedAreEquivalentTests()
{
    var actual = Enumerable.Range(0, SIZE);
    var expected = Enumerable.Range(0, SIZE).Select(i => SIZE - i - 1);
    CollectionAssert.AreEquivalent(actual, expected);
}
```

Initial thinking is that

```csharp
public void TryRemove(object o)
{
    // Scans the remaining missing items from the back of the list; when the
    // two collections are in the same order, every match sits at the front,
    // so each call walks the whole list and the tally degrades to O(n^2).
    for (int index = _missingItems.Count - 1; index >= 0; index--)
    {
        if (ItemsEqual(_missingItems[index], o))
        {
            _missingItems.RemoveAt(index);
            return;
        }
    }
    _extraItems.Add(o);
}
```

I expect that the

Then we can apply an optimization by sorting the two collections in the tally if they are sortable. Determining if the collections are sortable is the hard part. My thinking there is that a collection is sortable if,

I'll do some experimentation with that, but I'd appreciate it if the @nunit/framework-team can poke holes in my sortable logic. I'm sure I am missing something 😄 |
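A sketch of the sorting idea under the simplest assumption — that the elements implement IComparable<T>; the hard "is it sortable" detection discussed above is elided, and this is illustrative rather than the proposed NUnit change:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SortedEquivalence
{
    // If both collections can be sorted with the same ordering, equivalence
    // reduces to two O(n log n) sorts plus one linear pass, replacing the
    // quadratic tally.
    public static bool AreEquivalent<T>(IEnumerable<T> actual, IEnumerable<T> expected)
        where T : IComparable<T>
    {
        List<T> a = actual.ToList();
        List<T> e = expected.ToList();
        if (a.Count != e.Count) return false;

        a.Sort();
        e.Sort();
        for (int i = 0; i < a.Count; i++)
            if (a[i].CompareTo(e[i]) != 0)
                return false;
        return true;
    }
}
```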
What about the non-generic |
Regarding the ordering: I noted above - #2799 (comment) - (and later lost track of it) that I introduced this problem when I reversed the iteration so that we delete from the back of the list instead of the front (the example at the time was comparing two large collections which mostly consisted of 0s). So reversing the iteration back to the original code will probably make performance worse for collections consisting of few distinct values. |
Verification of 10000 string elements takes 10 seconds. Verification of 100k elements may take more than 30 minutes (I cannot wait until it finishes). This behavior takes place in NUnit 3.10 and is not reproducible in 3.9.