`--hash-unmatched` seems to scan the whole dataset, like `--hash-uniques` #614
I can make […] (see lines 839 to 842 in 675089d).
I wonder if there is something else subtly wrong in the code. It appears that when […] (see lines 855 to 859 in 675089d).
Could someone please explain what exactly is being done here? What is the idea behind this special case?
`group->n_inodes == 1` is also true for groups that simply consist of a single file. This condition will cause all single-file groups to be hashed if `--merge-directories` is also set. Thus, check that we are actually dealing with a group of hardlinks (and not just a single-file group that did not cause an early return because we are also doing `--hash-unmatched`). Fixes sahib#614.
`group->n_inodes == 1` is also true for groups that simply consist of a single file. This condition will cause all single-file groups to be hashed if `--merge-directories` is also set. Additionally, the whole `group->n_inodes == 1` condition is redundant, because reaching this point without taking the earlier branch already means that `group->n_clusters == 1` and therefore `group->n_inodes == 1`. Thus, check that we are actually dealing with a group of hardlinks (and not just a single-file group that did not cause an early return because we are also doing `--hash-unmatched`). Fixes sahib#614.
Fixes sahib#614 (albeit in a hacky way).
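For illustration, here is a minimal stand-alone sketch (in C, but not actual rmlint source) of the guard change the commit message above describes. The field names `n_inodes` and `n_clusters` come from that message; the `Group` layout and the `n_files` field are assumptions made for this sketch. As the next comment notes, this suggested fix later turned out to be wrong.

```c
/* Sketch of the guard change described in the commit message above.
 * Not rmlint source: Group and n_files are illustrative assumptions. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int n_files;    /* hypothetical: number of paths in the group */
    int n_inodes;   /* distinct inodes in the group */
    int n_clusters; /* hardlink clusters in the group */
} Group;

static bool is_hardlink_group_old(const Group *g) {
    /* Old condition: also true for a plain single-file group,
     * which survives to this point under --hash-unmatched. */
    return g->n_inodes == 1;
}

static bool is_hardlink_group_new(const Group *g) {
    /* Suggested condition: require several paths sharing one inode. */
    return g->n_inodes == 1 && g->n_files > 1;
}

int main(void) {
    Group single_file = {1, 1, 1}; /* one path, one inode */
    Group hardlinks   = {3, 1, 1}; /* three paths, one shared inode */

    printf("single file: old=%d new=%d\n",
           is_hardlink_group_old(&single_file),
           is_hardlink_group_new(&single_file)); /* old wrongly says yes */
    printf("hardlinks:   old=%d new=%d\n",
           is_hardlink_group_old(&hardlinks),
           is_hardlink_group_new(&hardlinks));
    return 0;
}
```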
Disregard the comment above (the suggested fix is wrong); see the proper analysis in the linked PR.
rmlint version
v2.10.1-281-g58d29ec1
(gui/setup.py was patched locally to fix packaging.version.InvalidVersion: Invalid version: '2.10.1.Ludicrous.Lemur', see #608)

dataset
I have a 30-something TB dataset that consists of ~20 TB of uniques and ~11 TB of size twins:
actual behavior
Basic rmlint invocation, without `--hash-unmatched` (ignore `--without-fiemap`; it is only there to speed up preprocessing, and progress bars were also trimmed):

Control rmlint invocation, with `--hash-uniques`:

Now, with `--hash-unmatched`:

expected behavior
Isn't `--hash-unmatched` supposed to only scan size twins (i.e. 12 TB at most)?
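For reference, the expected behavior described above amounts to something like the following toy sketch (plain C, not rmlint code; file names and sizes are illustrative only): only files that share their size with at least one other file ever get hashed, while files with a unique size are never read.

```c
/* Toy illustration of the expected --hash-unmatched behavior:
 * hash only "size twins", skip files with a unique size.
 * Not rmlint code; all names and numbers are made up. */
#include <stdio.h>

typedef struct {
    const char *path;
    long long size;
} File;

static void hash_file(const File *f) {
    /* Stand-in for the expensive full read + checksum. */
    printf("hashing %s (%lld bytes)\n", f->path, f->size);
}

int main(void) {
    File files[] = {
        {"a.bin", 4096}, {"b.bin", 4096}, /* size twins: should be hashed */
        {"c.bin", 123},                   /* unique size: should be skipped */
    };
    int n = (int)(sizeof(files) / sizeof(files[0]));

    for (int i = 0; i < n; i++) {
        int has_size_twin = 0;
        for (int j = 0; j < n; j++) {
            if (i != j && files[i].size == files[j].size) {
                has_size_twin = 1;
                break;
            }
        }
        if (has_size_twin) {
            hash_file(&files[i]); /* the ~11-12 TB of twins in this report */
        }
        /* else: unique size, never read (the ~20 TB that should be skipped) */
    }
    return 0;
}
```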