Option for custom compare function #241

dtorres-sf · 2021-04-13T05:52:29Z

This is a PR to address comments in #185. There are a few changes this introduces:

New option called iterable_compare_func that takes a function pointer to compare two items. It function takes two parameters and should return True if it is a match, False if it is not a match or raise CannotCompare if it is unable to compare the two. If CannotCompare is raised then it will revert back to comparing in order. If iterable_compare_func is not provided or set to None the behavior defaults to comparing items in order.
A new report item called iterable_item_moved this will only ever be added if there is a custom compare function.

Because of # 2 above there was a need to track both the t2 and t1 path which is why some additional options are needed in model.py functions.

For # 2 above we need to also save the new_value as part of this report item. This is because values_changed is applied before items are moved in the Delta. And to move items we remove it and add the new_value back in the new position.

The new compare function takes two items of an iterable and should return True it matching, False if no match, and raise CannotCompare if unable to compare the two items. The default behavior is the same as before which is comparing each item in order. If the compare function raises CannotCompare then behavior reverts back to the default in order. This also introduces a new report key which is `iterable_item_moved` to track if iterable items have moved.

seperman · 2021-04-13T22:42:57Z

Hi @dtorres-sf
Thanks a lot for taking the time to making this PR. It is definitely in the right direction. I left some comments.

seperman · 2021-04-13T22:11:57Z

tests/test_ignore_order.py

@@ -779,3 +779,166 @@ def test_ignore_order_and_group_by(self):
        diff2 = DeepDiff(t1, t2, group_by='id', ignore_order=True)
        expected2 = {'iterable_item_added': {"root['BB']['ate'][1]": 'Brownies'}}
        assert expected2 == diff2
+
+    def test_compare_func(self):
+        t1 = {


Since the test objects are not small, could you please move them as fixtures to the tests/fixtures directory?

seperman · 2021-04-13T22:15:13Z

tests/test_ignore_order.py

+        replay_diff = DeepDiff(recreated_t2, t2)
+        assert replay_diff.to_dict() == {}


It can be replaced with assert t2 == recreated_t2

seperman · 2021-04-13T22:16:31Z

tests/test_ignore_order.py

@@ -779,3 +779,166 @@ def test_ignore_order_and_group_by(self):
        diff2 = DeepDiff(t1, t2, group_by='id', ignore_order=True)
        expected2 = {'iterable_item_added': {"root['BB']['ate'][1]": 'Brownies'}}
        assert expected2 == diff2
+
+    def test_compare_func(self):


Could you please also add smaller unit tests with much smaller test objects? It will make it easier to debug in the future. The current test will be a better fit for test_delta.py. Perhaps we could add a new test class there something like class TestDeltaCompareFunc and add the tests that include delta there.

seperman · 2021-04-13T22:29:21Z

deepdiff/diff.py

+        This will compare in sequence order.
+        """
+
+        return [((i, i), (x, y)) for i, (x, y) in enumerate(


seperman · 2021-04-13T22:32:04Z

deepdiff/diff.py

+                if(id(y) not in y_matched):
+                    matches.append(((-1, j), (ListItemRemovedOrAdded, y)))
+            return matches
+        except CannotCompare:


Should we log to the user that the CannotCompare is raised or let the user's iterable_compare_func log if they want it?

I think it will be common that CannotCompare is raised, since the most basic case of a simple list of numbers or strings will most likely not want to use a compare function, and would revert to comparing in order. Since it will be so common that this will be raised, I would say probably let the creator of the compare function determine when logging is necessary.

seperman · 2021-04-13T22:36:08Z

deepdiff/diff.py

+                for j, y in enumerate(level.t2):
+
+                    if(self.iterable_compare_func(x, y)):
+                        y_matched.add(id(y))


Using the ID is cool but there could be issues when 2 things have different IDs but they are equal. Hence we usually hash the objects using DeepHash and use the hash instead. The goal is that even if the IDs are different, the hashes of 2 equal objects should be the same. I used to be concerned about the hashing being a bottleneck but so far I have not seen any bottlenecks from it.

I have updated to use hashes, can you please take a look and make sure I am hashing these objects correctly?

deepdiff/diff.py

seperman · 2021-04-14T15:55:48Z

Hi @dtorres-sf
Also you may want to look at how the delta works with "report_repetition=True". It keeps track of indexes of the item in the t1 and t2. That is essentially the iterable_item_moved that you have made.

- Use DeepHash instead of id when tracking elements - Add and cleanup unit tests - Pass in level object to compare in order to get path if desired

dtorres-sf · 2021-04-18T23:50:43Z

@seperman I have made the changes and also added level as a parameter to the compare function. The thinking is you might want to compare different levels differently. See a test case test_compare_func_path_specific.

Regarding report_repetition case, I was going through that and did not figure out exactly how to translate that to iterable item moved. It looks like it recreates the old iterable in order to try and get values, is that right? Do you think it is possible to have the new iterable_item_moved only store indexes and effectively replay them? What would be the best way to go about this, rather than storing the new_value as I am doing now in order to just correctly know what the value and index of the item that is moved?

seperman

LGTM!

seperman · 2021-04-29T00:53:01Z

@dtorres-sf
I'm adding this feature to work when ignore_order=True too. What I'm wondering is if we should stop using the compare function when any error is raised an not just CannotCompare. That will simplify what the user is expected to do. What do you think?

seperman · 2021-04-29T02:23:27Z

Actually scratch that. I will leave it up to the user to raise CannotCompare.

dtorres-sf · 2021-04-29T02:54:32Z

@seperman Yeah, I thought about that, I think it could go either way. As the application developer I think I would want to know if my compare function crashed rather than an implicit assumption that it should revert back to a native compare function (essentially bypassing the user compare function). If a user wants to they could always wrap their compare function body in a try/except and raise CannotCompare if it failed for any reason.

seperman changed the base branch from master to dev April 13, 2021 18:49

seperman reviewed Apr 13, 2021

View reviewed changes

Code review comments.

d587cf7

- Use DeepHash instead of id when tracking elements - Add and cleanup unit tests - Pass in level object to compare in order to get path if desired

seperman approved these changes Apr 28, 2021

View reviewed changes

seperman merged commit 31ad1d0 into seperman:dev Apr 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Option for custom compare function #241

Option for custom compare function #241

dtorres-sf commented Apr 13, 2021 •

edited

seperman commented Apr 13, 2021

seperman Apr 13, 2021

seperman Apr 13, 2021

seperman Apr 13, 2021

seperman Apr 13, 2021

seperman Apr 13, 2021

dtorres-sf Apr 18, 2021

seperman Apr 13, 2021 •

edited

dtorres-sf Apr 18, 2021

seperman commented Apr 14, 2021

dtorres-sf commented Apr 18, 2021

seperman left a comment

seperman commented Apr 29, 2021

seperman commented Apr 29, 2021

dtorres-sf commented Apr 29, 2021

		replay_diff = DeepDiff(recreated_t2, t2)
		assert replay_diff.to_dict() == {}

Option for custom compare function #241

Option for custom compare function #241

Conversation

dtorres-sf commented Apr 13, 2021 • edited

seperman commented Apr 13, 2021

seperman Apr 13, 2021

Choose a reason for hiding this comment

seperman Apr 13, 2021

Choose a reason for hiding this comment

seperman Apr 13, 2021

Choose a reason for hiding this comment

seperman Apr 13, 2021

Choose a reason for hiding this comment

seperman Apr 13, 2021

Choose a reason for hiding this comment

dtorres-sf Apr 18, 2021

Choose a reason for hiding this comment

seperman Apr 13, 2021 • edited

Choose a reason for hiding this comment

dtorres-sf Apr 18, 2021

Choose a reason for hiding this comment

seperman commented Apr 14, 2021

dtorres-sf commented Apr 18, 2021

seperman left a comment

Choose a reason for hiding this comment

seperman commented Apr 29, 2021

seperman commented Apr 29, 2021

dtorres-sf commented Apr 29, 2021

dtorres-sf commented Apr 13, 2021 •

edited

seperman Apr 13, 2021 •

edited