Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong diff in case of removing a non last index of an array #100

Open
Glignos opened this issue Jan 8, 2018 · 4 comments
Open

Wrong diff in case of removing a non last index of an array #100

Glignos opened this issue Jan 8, 2018 · 4 comments

Comments

@Glignos
Copy link
Member

Glignos commented Jan 8, 2018

Suppose we have an array

old = ['a','b','c']
new = ['b','c']

# In this case I figured from the documentation that you should expect
# the diff to be ('remove','',[(0,'a')])

# but instead I'm getting:
[('change', [0], ('a', 'b')), ('change', [1], ('b', 'c')), ('remove', '', [(2, 'c')])]

Is that an error or did I misinterpret the usage of the library?

@jirikuncar
Copy link
Member

@Glignos the current algorithm for comparing lists is very simple. In your case you would get a better result if you use sets instead of lists.

You can propose a change of following lines

intersection = list(range(0, min(len_first, len_second)))
addition = list(range(min(len_first, len_second), len_second))
deletion = list(reversed(range(min(len_first, len_second), len_first)))

@jirikuncar
Copy link
Member

I would propose to use difflib.SequenceMatcher that could do the job quite well.

@lukeolson13
Copy link

I'm having the same issue, except with more complex data (dictionaries nested in dictionaries). I obviously have no idea what kind of fix this would be, but would love to see the functionality!

@nj-vs-vh
Copy link

nj-vs-vh commented Jan 25, 2024

Hey everyone! I found this thread when looking for a solution to calculate diffs between JSON documents. My schema contains a large list of objects at the top of the tree, so the subject issue is a deal breaker. I've prototyped my solution based on some of dictdiffer's code and on difflib.SequenceMatcher, as proposed by @jirikuncar.

The main difficulty is in patch: I store all indices as they appear in the source array, and as I perform the patching I need to keep track of accumulating offset (e.g. if I delete a range, all subsequent indices need to be reduced by the length of deleted segment). Also, I don't need to invert diffs, so I didn't implement this functionality.

Here's the link to my code and here's a little demo. My code uses a different format (structured dicts instead of tuples) but the idea is the same:

a
['a', 'b', 'c', 'y', 'e', 'f', 'x']
b
['b', 'c', 'x', 'e', 'f']

diff
{'path': [], 'action': 'remove_range', 'start': 0, 'length': 1}
{'path': [3], 'action': 'change', 'new': 'x'}
{'path': [], 'action': 'remove_range', 'start': 6, 'length': 1}

Works with documents as well

a
['a', 'b', 'c', {'nested': [{'value': 'y', 'other': 123}]}, 'e', 'f']
b
['extra', 'a', 'b', 'c', {'nested': [{'value': 'x'}]}, 'e', 'f']

diff
{'path': [], 'action': 'add_range', 'start': 0, 'values': ['extra']}
{'path': [3, 'nested', 0, 'value'], 'action': 'change', 'new': 'x'}
{'path': [3, 'nested', 0], 'action': 'remove', 'keys': ['other']}

One difference is that my diffs are not reversible (but I don't think it'll be too hard to add this in case I need it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants