dyff can't recognize fine grained differences for list items if len(list)>1 #117

Open
Self-Perfection opened this issue Oct 26, 2020 · 2 comments

@Self-Perfection
Contributor

Consider the following example:

- hosts: localhost
  gather_facts: yes
  vars:
    - work_dir: /tmp
    - env: stage

Let's delete just the env key: sed '/env:/d' old.yml > new.yml. In that case dyff shows a nice and concise difference:

$ dyff between *yml
     _        __  __
   _| |_   _ / _|/ _|  between /tmp/test_dyff/cleaned/new.yml
 / _' | | | | |_| |_       and /tmp/test_dyff/cleaned/old.yml
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned one difference
        |___/

0.vars
  + one list entry added:
    - env: stage

But adding just one more top-level list item, like the following:

- hosts: localhost
  gather_facts: yes
  vars:
    - work_dir: /tmp
    - env: stage

- gather_facts: no

breaks difference detection:

     _        __  __
   _| |_   _ / _|/ _|  between /tmp/test_dyff/cleaned/new.yml
 / _' | | | | |_| |_       and /tmp/test_dyff/cleaned/old.yml
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned one difference
        |___/

(root level)
- one list entry removed:   + one list entry added:
  - hosts: localhost          - hosts: localhost
  │ gather_facts: yes         │ gather_facts: yes
  │ vars:                     │ vars:
  │ - work_dir: /tmp          │ - work_dir: /tmp
                              │ - env: stage

This does not seem right.

@HeavyWombat self-assigned this Oct 27, 2020
@HeavyWombat
Member

The behavior is actually intended. At least, this was the original idea for dyff, because in the project that sparked this tool, the order of entries in a list was of great importance. The problem arises with so-called "simple lists", where it is difficult to decide whether a list entry was modified or merely moved within the list. So comparing index by index (entry one with entry one in the other list, and so on) was not an option. The only exception to that rule was for simple lists with exactly one entry each. In this edge case, it was easy to compare the only entry in one list with the only entry in the other.
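To make the rule above concrete, here is a minimal Python sketch of the matching logic as this comment describes it. It is not dyff's actual implementation: the function name `diff_simple_list` and the result keys are made up for illustration, the YAML documents are written as plain Python literals, and duplicate entries are ignored for brevity.

```python
def diff_simple_list(old, new):
    """Compare two lists whose entries carry no identifying key.

    Entries are only matched when they are exactly equal. Everything else is
    reported as removed or added, except when both lists hold exactly one
    entry: then the two entries are paired for a detailed comparison.
    """
    if len(old) == 1 and len(new) == 1:
        pairs = [] if old[0] == new[0] else [(old[0], new[0])]
        return {"removed": [], "added": [], "compare": pairs}
    removed = [entry for entry in old if entry not in new]
    added = [entry for entry in new if entry not in old]
    return {"removed": removed, "added": added, "compare": []}


# The two plays from the issue, without and with the env entry.
play_without_env = {
    "hosts": "localhost",
    "gather_facts": True,
    "vars": [{"work_dir": "/tmp"}],
}
play_with_env = {
    "hosts": "localhost",
    "gather_facts": True,
    "vars": [{"work_dir": "/tmp"}, {"env": "stage"}],
}
extra_play = {"gather_facts": False}

# One top-level entry each: the plays are paired and can be compared in depth.
one_entry = diff_simple_list([play_without_env], [play_with_env])

# Two top-level entries each: the changed play shows up as removed plus added.
two_entries = diff_simple_list(
    [play_without_env, extra_play],
    [play_with_env, extra_play],
)
```

With one top-level entry the plays end up under `compare`, which is what lets the difference be narrowed down to `0.vars`. With two entries the changed play appears once under `removed` and once under `added`, mirroring the second output in the report.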

Long story short, I can understand both behavior preferences. There are three kinds of styles:

  • Only compare list entries if it is certain that they can really be compared, e.g. only one entry in simple lists, and entries with the same name (or id, or key) in named-entry lists.
  • Progressive style, where in simple lists the list index is assumed to be relevant and lists are compared by index.
  • Content-specific style, where the tool would try to guess whether there is a similar entry in the other list at a different index and would compare them. This was always my idea for an improvement; however, it could be very error-prone in more complex scenarios.

What I could do is the following: introduce a command-line flag to configure the comparison style for lists that are not easy to compare, which to be honest are mostly simple lists. The options would be safe (current style), index (second bullet point), or best guess for the content-specific comparison.
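As a rough illustration of how the index and best guess styles listed above could differ, here is a hedged Python sketch. Nothing in it is taken from dyff: the function names, the JSON-based `similarity` score, and the `threshold` value of 0.6 are all assumptions chosen for the example.

```python
import json
from difflib import SequenceMatcher


def similarity(a, b):
    """Score two entries between 0 and 1 by comparing their JSON text (assumed metric)."""
    text_a = json.dumps(a, sort_keys=True)
    text_b = json.dumps(b, sort_keys=True)
    return SequenceMatcher(None, text_a, text_b).ratio()


def pair_by_index(old, new):
    """Index style: entry i is always compared with entry i of the other list."""
    common = min(len(old), len(new))
    pairs = [(old[i], new[i]) for i in range(common) if old[i] != new[i]]
    return pairs, old[common:], new[common:]


def pair_by_best_guess(old, new, threshold=0.6):
    """Best guess style: pair each unmatched entry with its most similar candidate."""
    removed = [entry for entry in old if entry not in new]
    added = [entry for entry in new if entry not in old]
    pairs = []
    for entry in list(removed):
        if not added:
            break
        best = max(added, key=lambda candidate: similarity(entry, candidate))
        if similarity(entry, best) >= threshold:
            pairs.append((entry, best))
            removed.remove(entry)
            added.remove(best)
    return pairs, removed, added


play_without_env = {
    "hosts": "localhost",
    "gather_facts": True,
    "vars": [{"work_dir": "/tmp"}],
}
play_with_env = {
    "hosts": "localhost",
    "gather_facts": True,
    "vars": [{"work_dir": "/tmp"}, {"env": "stage"}],
}
extra_play = {"gather_facts": False}

old_doc = [play_without_env, extra_play]
new_doc = [play_with_env, extra_play]

by_index = pair_by_index(old_doc, new_doc)
by_guess = pair_by_best_guess(old_doc, new_doc)

# Index style misreads a list that merely gained an entry at the front.
shifted = pair_by_index(["a", "b"], ["x", "a", "b"])
```

For the documents from the report, both styles pair the two versions of the first play, so a detailed comparison becomes possible. The `shifted` case shows the downside of the index style: inserting one entry at the front makes every following entry look modified, which is the ambiguity between "modified" and "moved" described earlier in the thread.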

@Self-Perfection
Contributor Author

Indeed this is not easy, I see it now. I'd hoped dyff would find the simplest possible change (set of operations) that transforms one input file into the other. It turns out this is a complicated computer science task. For instance, here is a paper that proposes a tree comparison algorithm and tests it on XML files.
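For a single flat list, the "simplest set of operations" idea can be sketched with a classic longest-common-subsequence diff, shown below in Python. This is only an illustration under that simplifying assumption: the function name `edit_script` and the operation labels are invented here, and the sketch does not attempt the tree comparison that nested YAML documents would need.

```python
def edit_script(old, new):
    """Return a shortest list of keep/remove/add steps turning old into new."""
    n, m = len(old), len(new)
    # lcs[i][j] holds the length of the longest common subsequence
    # of old[i:] and new[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start, keeping entries on the common subsequence.
    ops = []
    i = j = 0
    while i < n and j < m:
        if old[i] == new[j]:
            ops.append(("keep", old[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("remove", old[i]))
            i += 1
        else:
            ops.append(("add", new[j]))
            j += 1
    ops.extend(("remove", entry) for entry in old[i:])
    ops.extend(("add", entry) for entry in new[j:])
    return ops


# The vars list from the example, without and with the env entry.
vars_ops = edit_script(
    [{"work_dir": "/tmp"}],
    [{"work_dir": "/tmp"}, {"env": "stage"}],
)
letters_ops = edit_script(["a", "b", "c"], ["a", "c", "d"])
```

Applied to the `vars` list, the script keeps `work_dir` and adds `env`. Extending the same idea to whole documents, where an entry can itself contain changed lists and maps, is where the tree algorithms come in.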

Apparently implementing my wish is harder than I anticipated.

@HeavyWombat added the enhancement (New feature or request) label Feb 19, 2021