[Datumaro] Diff with exact annotation matching #1989

Merged
merged 11 commits into from Sep 2, 2020


zhiltsov-max (Contributor) commented Aug 5, 2020

Motivation and context

It can be hard to find tools that check two datasets for exact equality. This patch adds that capability to Datumaro.

  • Added the project ediff command for direct comparison of datasets
    • Optional annotation fields (e.g. label for boxes, id, group) can be excluded from the comparison
    • Specific annotation and item attributes (e.g. frame, is_crowd) can be excluded from the comparison
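The exclusion options can be pictured as dropping the ignored keys from both annotations before an exact == comparison. The sketch below is a minimal illustration under stated assumptions, not Datumaro's implementation: it assumes annotations are plain dicts with top-level fields (type, label, id, group, points, ...) and a nested attributes dict, and the helper names strip_ignored and annotations_equal are hypothetical.

```python
from typing import Any, Dict, Iterable

# Hypothetical annotation model (an assumption, not Datumaro's classes):
# a dict of top-level fields ("type", "label", "id", "group", "points", ...)
# plus a nested "attributes" dict.
Annotation = Dict[str, Any]


def strip_ignored(ann: Annotation, ignored_fields: Iterable[str] = (),
                  ignored_attrs: Iterable[str] = ()) -> Annotation:
    """Return a copy of `ann` without the ignored fields and attributes."""
    fields = set(ignored_fields)
    attrs = set(ignored_attrs)
    result = {k: v for k, v in ann.items()
              if k not in fields and k != "attributes"}
    if "attributes" not in fields:
        # Keep only the attributes that are not excluded from comparison.
        result["attributes"] = {k: v for k, v in ann.get("attributes", {}).items()
                                if k not in attrs}
    return result


def annotations_equal(a: Annotation, b: Annotation,
                      ignored_fields: Iterable[str] = (),
                      ignored_attrs: Iterable[str] = ()) -> bool:
    """Exact equality of two annotations after dropping the ignored keys."""
    # Materialize once so generators are not consumed by the first call.
    fields, attrs = set(ignored_fields), set(ignored_attrs)
    return strip_ignored(a, fields, attrs) == strip_ignored(b, fields, attrs)
```

For example, two boxes that differ only in id, group and is_crowd compare unequal by default but equal with ignored_fields={"id", "group"} and ignored_attrs={"is_crowd"}, which mirrors the -if id -if group -ia is_crowd options in the example command.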

Example output:

# Compare two projects,
# ignoring the annotation .id and .group fields and the 'is_crowd' annotation attribute,
# and ignoring the item 'frame' and 'id' attributes.
# Here, a project with CVAT XML annotations was compared with one holding COCO instance annotations.
datum project ediff test_project2 -if id -if group -ia is_crowd -iia frame -iia id

Found:
The first project has 0 unmatched items
The second project has 0 unmatched items
25 item conflicts
15 matching annotations
0 mismatching annotations

Output has been saved to 'diff.json' # has a list of differences
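The summary counters can be approximated as follows. This is a hypothetical sketch, not the code that produced the output above: it assumes items are keyed by (id, subset), that annotations are plain dicts paired greedily by exact equality, and that an "item conflict" is a shared item whose non-ignored attributes differ or whose annotations cannot all be paired. The real tool's definitions of these counters may differ.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical data model (an assumption for illustration):
# (item id, subset) -> {"attributes": {...}, "annotations": [ann, ...]},
# where annotations are plain dicts compared with ==.
Item = Dict[str, Any]
Dataset = Dict[Tuple[str, str], Item]


def pair_annotations(anns_a: List[dict], anns_b: List[dict]) -> Tuple[int, int]:
    """Greedily pair exactly equal annotations; return (matched, unpaired)."""
    remaining = list(anns_b)
    matched = 0
    for ann in anns_a:
        if ann in remaining:
            remaining.remove(ann)  # each annotation of B is used at most once
            matched += 1
    unpaired = (len(anns_a) - matched) + len(remaining)
    return matched, unpaired


def compare_datasets(a: Dataset, b: Dataset,
                     ignored_item_attrs: Iterable[str] = ()) -> Dict[str, Any]:
    ignored = set(ignored_item_attrs)

    def item_attrs(item: Item) -> dict:
        # Drop the ignored item attributes (e.g. "frame") before comparing.
        return {k: v for k, v in item["attributes"].items() if k not in ignored}

    conflicts: List[Tuple[str, str]] = []
    matching = mismatching = 0
    for key in sorted(set(a) & set(b)):
        matched, unpaired = pair_annotations(a[key]["annotations"],
                                             b[key]["annotations"])
        matching += matched
        mismatching += unpaired
        if unpaired or item_attrs(a[key]) != item_attrs(b[key]):
            conflicts.append(key)

    return {
        "unmatched_a": sorted(set(a) - set(b)),  # items only in the first dataset
        "unmatched_b": sorted(set(b) - set(a)),  # items only in the second dataset
        "conflicts": conflicts,
        "matching_annotations": matching,
        "mismatching_annotations": mismatching,
    }
```

The returned dict can be written out with json.dump, which serializes the (id, subset) tuples as lists; the file name diff.json matches the example output, but the schema used here is invented for illustration.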

How to test:

  1. Create a pair of Datumaro projects (e.g. by importing a CVAT task)
  2. datum project ediff -p project1/ project2/
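The options used in these examples (-p plus the repeatable -if, -ia and -iia ignore flags) map naturally onto argparse's append action. The sketch below is hypothetical: it is not the PR's build_ediff_parser, and the long option names, the dest names and the "." default for -p are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the ediff options shown in this PR."""
    parser = argparse.ArgumentParser(prog="datum project ediff")
    parser.add_argument("other_project_dir",
                        help="Directory of the project to compare with")
    # Assumed default: the first project is the one in the current directory.
    parser.add_argument("-p", "--project", dest="project_dir", default=".",
                        help="Directory of the first project")
    # action="append" lets each ignore flag be repeated, e.g. -if id -if group.
    parser.add_argument("-if", "--ignore-field", dest="ignored_fields",
                        action="append", default=[],
                        help="Annotation field to ignore (e.g. id, group)")
    parser.add_argument("-ia", "--ignore-attr", dest="ignored_attrs",
                        action="append", default=[],
                        help="Annotation attribute to ignore (e.g. is_crowd)")
    parser.add_argument("-iia", "--ignore-item-attr", dest="ignored_item_attrs",
                        action="append", default=[],
                        help="Item attribute to ignore (e.g. frame)")
    return parser
```

For instance, build_parser().parse_args(["test_project2", "-if", "id", "-if", "group"]) yields ignored_fields == ["id", "group"] and project_dir == ".".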

How has this been tested?

Unit tests

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

zhiltsov-max changed the title from "[WIP] [Datumaro] Diff with exact annotation matching" to "[Datumaro] Diff with exact annotation matching" on Aug 18, 2020
zhiltsov-max (Contributor, Author)

Ready for testing.

coveralls commented Aug 18, 2020

Pull Request Test Coverage Report for Build 7260

  • 196 of 278 (70.5%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.05%) to 69.98%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
datumaro/datumaro/cli/contexts/project/diff.py | 0 | 1 | 0.0%
datumaro/datumaro/components/extractor.py | 3 | 4 | 75.0%
datumaro/datumaro/cli/contexts/project/__init__.py | 9 | 40 | 22.5%
datumaro/datumaro/components/operations.py | 182 | 231 | 78.79%

Totals Coverage Status:
  Change from base Build 7253: 0.05%
  Covered Lines: 12119
  Relevant Lines: 16868


@@ -567,6 +566,73 @@ def diff_command(args):

    return 0

def build_ediff_parser(parser_ctor=argparse.ArgumentParser):
Contributor (review comment on this line):

ediff -> compare?

nmanovic previously approved these changes Sep 2, 2020

nmanovic (Contributor) left a comment:

I would rename ediff to compare or something like that.

nmanovic merged commit 98c06a3 into develop on Sep 2, 2020
nmanovic deleted the zm/ann-diff-exact-match branch on September 2, 2020 19:22

3 participants