Skip duplicate files #2

Closed
roskakori opened this issue Aug 18, 2016 · 1 comment

@roskakori
Owner

Goal: duplicate files are excluded from the analysis.

To detect duplicates efficiently, the following logic can be used:

  1. Build a file_size_to_paths_map where each file size maps to a list of source files having this size.
  2. Build a set of duplicate paths: for each size in file_size_to_paths_map that has more than one path, compare the files of that size with each other. To compare two files, use filecmp.cmp(..., shallow=False).
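
The two steps above could be sketched roughly as follows. This is only an illustration and not necessarily the code that ended up in the project: the function name `find_duplicate_paths` and the `unique_paths` helper list are made up for this example, and it assumes all paths point to existing, readable files.

```python
import filecmp
import os
from collections import defaultdict


def find_duplicate_paths(paths):
    """Return the set of paths whose content duplicates an earlier path."""
    # Step 1: group paths by file size; only files of equal size can be identical.
    file_size_to_paths_map = defaultdict(list)
    for path in paths:
        file_size_to_paths_map[os.path.getsize(path)].append(path)

    # Step 2: compare contents only within groups that hold several paths.
    duplicate_paths = set()
    for same_size_paths in file_size_to_paths_map.values():
        if len(same_size_paths) < 2:
            continue
        unique_paths = []  # one representative per distinct content (hypothetical helper)
        for path in same_size_paths:
            if any(
                filecmp.cmp(path, unique_path, shallow=False)
                for unique_path in unique_paths
            ):
                duplicate_paths.add(path)
            else:
                unique_paths.append(path)
    return duplicate_paths
```

Because the first file seen with a given content is kept as the representative, only the later copies end up in the returned set, so one copy of each file would still be analyzed.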
@roskakori roskakori self-assigned this Oct 6, 2016
@roskakori roskakori added this to the v0.8 milestone Oct 6, 2016
roskakori added a commit that referenced this issue Oct 6, 2016
…teria.

 Use the option ``--duplicates`` to still count duplicate source code.
@roskakori
Owner Author

Implemented and scheduled for release with version 0.8.
