Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option for custom compare function #241

Merged
merged 2 commits into from Apr 28, 2021

Conversation

dtorres-sf
Copy link

@dtorres-sf dtorres-sf commented Apr 13, 2021

This is a PR to address comments in #185. There are a few changes this introduces:

  1. New option called iterable_compare_func that takes a function pointer to compare two items. It function takes two parameters and should return True if it is a match, False if it is not a match or raise CannotCompare if it is unable to compare the two. If CannotCompare is raised then it will revert back to comparing in order. If iterable_compare_func is not provided or set to None the behavior defaults to comparing items in order.
  2. A new report item called iterable_item_moved this will only ever be added if there is a custom compare function.

Because of # 2 above there was a need to track both the t2 and t1 path which is why some additional options are needed in model.py functions.

For # 2 above we need to also save the new_value as part of this report item. This is because values_changed is applied before items are moved in the Delta. And to move items we remove it and add the new_value back in the new position.

The new compare function takes two items of an iterable and should return True
it matching, False if no match, and raise CannotCompare if unable to compare the two items.

The default behavior is the same as before which is comparing each item in order.

If the compare function raises CannotCompare then behavior reverts back to the default in order.

This also introduces a new report key which is `iterable_item_moved` to track if iterable items have moved.
@seperman seperman changed the base branch from master to dev April 13, 2021 18:49
@seperman
Copy link
Owner

Hi @dtorres-sf
Thanks a lot for taking the time to making this PR. It is definitely in the right direction. I left some comments.

@@ -779,3 +779,166 @@ def test_ignore_order_and_group_by(self):
diff2 = DeepDiff(t1, t2, group_by='id', ignore_order=True)
expected2 = {'iterable_item_added': {"root['BB']['ate'][1]": 'Brownies'}}
assert expected2 == diff2

def test_compare_func(self):
t1 = {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the test objects are not small, could you please move them as fixtures to the tests/fixtures directory?

Comment on lines 943 to 944
replay_diff = DeepDiff(recreated_t2, t2)
assert replay_diff.to_dict() == {}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be replaced with assert t2 == recreated_t2

@@ -779,3 +779,166 @@ def test_ignore_order_and_group_by(self):
diff2 = DeepDiff(t1, t2, group_by='id', ignore_order=True)
expected2 = {'iterable_item_added': {"root['BB']['ate'][1]": 'Brownies'}}
assert expected2 == diff2

def test_compare_func(self):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also add smaller unit tests with much smaller test objects? It will make it easier to debug in the future. The current test will be a better fit for test_delta.py. Perhaps we could add a new test class there something like class TestDeltaCompareFunc and add the tests that include delta there.

This will compare in sequence order.
"""

return [((i, i), (x, y)) for i, (x, y) in enumerate(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if(id(y) not in y_matched):
matches.append(((-1, j), (ListItemRemovedOrAdded, y)))
return matches
except CannotCompare:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log to the user that the CannotCompare is raised or let the user's iterable_compare_func log if they want it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be common that CannotCompare is raised, since the most basic case of a simple list of numbers or strings will most likely not want to use a compare function, and would revert to comparing in order. Since it will be so common that this will be raised, I would say probably let the creator of the compare function determine when logging is necessary.

deepdiff/diff.py Outdated
for j, y in enumerate(level.t2):

if(self.iterable_compare_func(x, y)):
y_matched.add(id(y))
Copy link
Owner

@seperman seperman Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the ID is cool but there could be issues when 2 things have different IDs but they are equal. Hence we usually hash the objects using DeepHash and use the hash instead. The goal is that even if the IDs are different, the hashes of 2 equal objects should be the same. I used to be concerned about the hashing being a bottleneck but so far I have not seen any bottlenecks from it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated to use hashes, can you please take a look and make sure I am hashing these objects correctly?

deepdiff/diff.py Show resolved Hide resolved
deepdiff/diff.py Show resolved Hide resolved
@seperman
Copy link
Owner

Hi @dtorres-sf
Also you may want to look at how the delta works with "report_repetition=True". It keeps track of indexes of the item in the t1 and t2. That is essentially the iterable_item_moved that you have made.

- Use DeepHash instead of id when tracking elements
- Add and cleanup unit tests
- Pass in level object to compare in order to get path if desired
@dtorres-sf
Copy link
Author

@seperman I have made the changes and also added level as a parameter to the compare function. The thinking is you might want to compare different levels differently. See a test case test_compare_func_path_specific.

Regarding report_repetition case, I was going through that and did not figure out exactly how to translate that to iterable item moved. It looks like it recreates the old iterable in order to try and get values, is that right? Do you think it is possible to have the new iterable_item_moved only store indexes and effectively replay them? What would be the best way to go about this, rather than storing the new_value as I am doing now in order to just correctly know what the value and index of the item that is moved?

Copy link
Owner

@seperman seperman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@seperman seperman merged commit 31ad1d0 into seperman:dev Apr 28, 2021
@seperman
Copy link
Owner

@dtorres-sf
I'm adding this feature to work when ignore_order=True too. What I'm wondering is if we should stop using the compare function when any error is raised an not just CannotCompare. That will simplify what the user is expected to do. What do you think?

@seperman
Copy link
Owner

Actually scratch that. I will leave it up to the user to raise CannotCompare.

@dtorres-sf
Copy link
Author

@seperman Yeah, I thought about that, I think it could go either way. As the application developer I think I would want to know if my compare function crashed rather than an implicit assumption that it should revert back to a native compare function (essentially bypassing the user compare function). If a user wants to they could always wrap their compare function body in a try/except and raise CannotCompare if it failed for any reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants