I use ZFS to store all my files in one big pool. Sometimes I have duplicates, which I identify with `rdfind` in a dry run; I check `results.txt` manually and, if I'm happy with it, re-run `rdfind` to actually delete the duplicate files. This means `rdfind` not only has to read the remaining files in full at least twice to compute its checksums, it also makes no use of the per-block checksums that are an inherent feature of ZFS (and that are calculated anyway, at write time).
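The ZFS command `zdb` gives an indication of how this could work. It can list which files (and their ZFS object IDs) are on a given ZFS file system (here `minitank/fw/video/gopro`), and it can then be queried for, say, the checksum of each block of ZFS object 115 (file `G0020421.JPG`).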
Having `rdfind` call the `zdb` command is not a very good idea (except perhaps for a PoC), as the output format is clearly not meant for automatic processing (it is also said not to be backward compatible across ZFS releases). But `zdb` does no real magic; AFAIK it just queries the ZFS API and prints the resulting information as text.
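For a PoC along those lines, a minimal sketch in Python could pull the checksum values out of text that was captured from `zdb`. It assumes the checksums show up as `cksum=` followed by four colon-separated hex words; the function name `extract_block_checksums` is made up for illustration, and the exact `zdb` output layout may differ between ZFS releases.

```python
import re

# Assumption: each block pointer line carries a token of the form
# "cksum=<hex>:<hex>:<hex>:<hex>". This is not a stable, documented format.
CKSUM_RE = re.compile(r"cksum=([0-9a-fA-F]+(?::[0-9a-fA-F]+){3})")


def extract_block_checksums(zdb_text: str) -> list[str]:
    """Return every cksum value found in the given text, in order of appearance.

    Hypothetical helper: it does not run zdb itself, it only scans text
    that the caller has already captured.
    """
    return [m.group(1).lower() for m in CKSUM_RE.finditer(zdb_text)]
```

Note that this collects every `cksum=` token it sees, so if the captured text also lists indirect blocks, a real PoC would have to filter those out before comparing files.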
So with the block checksums that `zdb` exposes, reading the actual files to check for equality would be unnecessary, which would make `rdfind` even quicker.
In a small experiment I could confirm that the `cksum` values are stable across different ZFS file systems (bar the checksum of the last block, because of a shorter block length on one of the file systems) and also within a single file system for a simple copy of the file (with a different name, access times etc.).
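To illustrate the idea, here is a rough sketch of how duplicate candidates could be grouped once the per-block checksums are available, without reading any file data. `FileInfo` and `group_duplicate_candidates` are hypothetical names, not part of `rdfind` or ZFS, and the sketch assumes the checksum lists were obtained elsewhere.

```python
from collections import defaultdict
from typing import NamedTuple


class FileInfo(NamedTuple):
    path: str
    size: int            # logical file size in bytes
    block_cksums: tuple  # per-block checksums, in block order


def group_duplicate_candidates(files):
    """Group files whose size and full list of block checksums are identical."""
    groups = defaultdict(list)
    for f in files:
        # Files only land in the same group if every block checksum matches.
        groups[(f.size, f.block_cksums)].append(f.path)
    # Keep only groups with more than one member.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Because the whole checksum list is part of the key, two copies whose last block differs (as in the experiment with different block lengths) would simply not be reported. That errs on the side of missing a duplicate rather than deleting a file that is not one. Presumably the same applies when the file systems use different compression or checksum settings, and depending on the checksum algorithm a match may still only be a strong hint rather than proof of equality.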