Skip to content
This repository was archived by the owner on Oct 10, 2020. It is now read-only.

[merged] Atomic/diff.py: Use go-mtree for file comparisons#777

Closed
baude wants to merge 1 commit intoprojectatomic:masterfrom
baude:diff_mtree
Closed

[merged] Atomic/diff.py: Use go-mtree for file comparisons#777
baude wants to merge 1 commit intoprojectatomic:masterfrom
baude:diff_mtree

Conversation

@baude
Copy link
Member

@baude baude commented Dec 1, 2016

The previous algorithm for comparing files used python's
dircmp and is considered to be a shallow comparision. This
allowed distinctly small possibilities that two files being
compared could be different but not caught.

We now use go-mtree to do the comparison. This can emulate the
shallow comparison we had before but we can also adding a
sha256digest as part of the comparison using the new --keywords
option.

Also, made slight tweaks to gomtree functions in Atomic.util
so we debug and influence the return of JSON data.

This solves #761

Atomic/diff.py Outdated
diffp.add_argument("-k", "--keywords", nargs='+',
choices=['link', 'nlink', 'mode', 'type', 'time', 'uid', 'gid', 'size', 'sha256digest'],
default=['link', 'nlink', 'mode', 'type', 'time', 'uid', 'gid', 'size'],
help=_("iExclusive keywords to be used for file level comparision"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typos?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, vim command :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@twaugh
Copy link

twaugh commented Dec 1, 2016

This works well for me.

diffp.add_argument("-k", "--keywords", nargs='+',
choices=['link', 'nlink', 'mode', 'type', 'time', 'uid', 'gid', 'size', 'sha256digest'],
default=['link', 'nlink', 'mode', 'type', 'time', 'uid', 'gid', 'size'],
help=_("Exclusive keywords to be used for file level comparision"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comparison

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you add "all"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhatdan I can do 'all' but if we ever add two algo's for digesting, they both would then get used. Adding for now as it makes sense.

Atomic/diff.py Outdated
self.chroot_left = chroot_left
self.chroot_right = chroot_right
self.delta(self.compare)
self.chroot_left = [] #chroot_left
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are useless, either describe or remove the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

self.parse_mtree_json()

def parse_mtree_json(self):
def extra(_result): #pylint: disable=unused-variable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variables are used, aren't they, remove pylint?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the use of eval below, pylint doesnt recognize they are used.

Leaving until better solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I thought they might have been left over.

def extra(_result): #pylint: disable=unused-variable
self.right.append(_result['path'])

def missing(_result): #pylint: disable=unused-variable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the use of eval below, pylint doesnt recognize they are used.

Leaving until better solution.

def missing(_result): #pylint: disable=unused-variable
self.left.append(_result['path'])

def modified(_result): #pylint: disable=unused-variable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the use of eval below, pylint doesnt recognize they are used.

Leaving until better solution.

_print_diff(self.right)
if len(self.common_diff):
util.write_out("\nCommon files that are different:")
util.write_out("\nCommon files that are different: (reason)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you are doing here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I supply a reason in the output to show why things are different. Example:

Files only in centos:latest:
     usr/bin/view
     usr/bin/ex
     usr/bin/rvi
     usr/bin/vi
     etc/virc
     usr/bin/rview

Files only in 19f8d8b7147b:
     run/secrets

Common files that are different: (reason)
     var/lib/rpm/Basenames (time)
     var/lib/rpm/Packages (time)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does that happen. This looks like it will just output
"(reason)"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That LOC is just to print the header(title). The rest comes in the _print_diff method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you handle this if there are multiple reasons?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup!

--help
--json
--display
--keywords, -k
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to automatically expand the keywords.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure that out ... do you know how to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok we can merge and I will look to add this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great

**--json**
Output in the form of JSON.

**-k** **--keywords**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give a list of keywords available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man page should include list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


Compare files by 'sha256digests' and 'time' between images 'foo1' and 'foo2'

atomic diff foo1 foo2 --keywords sha256digest time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the handling of multiple keywords does not work well. What happens if I do

atomic diff --keywords sha256digest time foo1 foo2

Does it work
Or if I had a container named time.

atomic diff --keywords sha256digest time foo1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhatdan

It only works if if the keywords are the last arguments. Verified this in argparse as well. Example:

[bbaude@bbaude atomic (diff_mtree)]$ sudo ./atomic diff -k sha256digest centos:latest 19f8d8b7147b
atomic diff: argument -k/--keywords: invalid choice: 'centos:latest' (choose from 'all', 'link', 'nlink', 'mode', 'type', 'time', 'uid', 'gid', 'size', 'sha256digest')
Try 'atomic diff --help' for more information.

^^ Fails. Keywords have to be last.

[bbaude@bbaude atomic (diff_mtree)]$ sudo ./atomic diff centos:latest 19f8d8b7147b -k sha256digest

Ideas and/or suggestions welcome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make it a comma separated list? Or do it the way we did for TOP?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  -o [{time,stime,ppid,uid,gid,user,group}], --optional [{time,stime,ppid,uid,gid,user,group}]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW using "--" between the keyword list and the image names also works, but that's not very easy to use/discover.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but not easy to understand for most.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the ability to use -k one or more times. One keyword per -k

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do the "," separated list?

-k time -k stime -k ppid

Versus

-k time,ktime,ppid

But I guess it is not that big a deal

Most people will just do -k all, or is this the default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-all is the default. I thought we agreed to mimic the top usage which is one option per keyword. Want different?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I though atomic top did both, but if it use -o foo -o bar, then LGTM

# logic can be added to a "cleanup stack", by cascading function calls
# within traps. See tests/integration/test_mount.sh for an example.
trap teardown EXIT
if [ -e ${GOMTREE} ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to indicate a skip here by returning 77.

Also, if you want these tests to run on the Atomic Host platforms until gomtree is pulled in automatically, you can simply add to the YAML:

packages:
  - gomtree

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlebon is that done with an exit 77 or elsewise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly. See e.g. test_verify.sh.

@baude baude force-pushed the diff_mtree branch 3 times, most recently from c1c6641 to 7c4d8ea Compare December 5, 2016 14:54
The previous algorithm for comparing files used python's
dircmp and is considered to be a shallow comparision.  This
allowed distinctly small possibilities that two files being
compared could be different but not caught.

We now use go-mtree to do the comparison.  This can emulate the
shallow comparison we had before but we can also adding a
sha256digest as part of the comparison using the new --keywords
option.

Also, made slight tweaks to gomtree functions in Atomic.util
so we debug and influence the return of JSON data.

This solves projectatomic#761
@rhatdan
Copy link
Member

rhatdan commented Dec 5, 2016

@rh-atomic-bot r+

@rh-atomic-bot
Copy link

📌 Commit 171cba2 has been approved by rhatdan

@rh-atomic-bot
Copy link

⌛ Testing commit 171cba2 with merge 1db4288...

@rh-atomic-bot
Copy link

☀️ Test successful - status-atomicjenkins
Approved by: rhatdan
Pushing 1db4288 to master...

@rh-atomic-bot rh-atomic-bot changed the title Atomic/diff.py: Use go-mtree for file comparisons [merged] Atomic/diff.py: Use go-mtree for file comparisons Dec 5, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants