Releases: markfasheh/duperemove

v0.14.1

25 Nov 11:29
9912c03

This release includes a couple of contributions that fix bugs introduced in v0.14.

v0.14

20 Nov 18:49
ebc8c1f

Notable changes:

  • Batching has been reimplemented on top of the dedupe_seq.
  • The "scan" phase has been reimplemented (see 8264336 for details).
  • Filesystem locking has been implemented. See f3947e9 for details.

This release focuses on the "scan" phase: restructuring it, removing bugs, and improving its performance, as well as paving the way for future features. More bugs probably joined the party in the meantime, sadly.

Special thanks to Sergei Trofimovich for his kind insights, help, and contributions.

v0.13

29 Sep 10:07
9996a96

Notable changes:

  • Add a new dedupe option: [no]rescan_files. It will increase performance in some use cases.
  • New behaviors from v0.12 have been consolidated. Extent-based lookup is always enabled, as is fiemap. The v2 hashfile is no longer supported.
  • Hashfiles are now updated after deduplication to reflect the new physical offsets. This avoids (re)deduplicating extents in some cases.
  • Partial mode has been enhanced to support batching. The overall performance of this mode (which was previously known as "block-based mode") has been improved.
  • All files are now opened in read-only mode.
  • Hashfile version has been increased to reflect the new database behaviors. Previous hashfiles are not compatible.
  • Always compute a hash for the entire file. This lets us deduplicate identical files easily, regardless of their extent mappings.
  • Deduplicating only parts of a file can be disabled using the [no]only_whole_files dedupe option.
  • Hashfiles with unsupported features or hash algorithms are now recreated transparently. Migration of the old content is not implemented.
  • Relative exclude patterns are no longer silently ingested. Such patterns are now rebuilt on top of the current working directory.
  • Batching is now set to 1024 by default.
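
The handling of relative exclude patterns described above can be pictured with a small Python sketch. This is an illustration only, not duperemove's actual code (duperemove is written in C); the function name `absolutize_pattern` and the `cwd` parameter are hypothetical, and it assumes that "rebuilt on top of the current working directory" means prefixing the pattern with that directory.

```python
import os
from typing import Optional


def absolutize_pattern(pattern: str, cwd: Optional[str] = None) -> str:
    """Return an absolute form of an exclude pattern.

    Hypothetical helper: absolute patterns are returned unchanged,
    relative ones are prefixed with the working directory.
    """
    base = cwd if cwd is not None else os.getcwd()
    if os.path.isabs(pattern):
        return pattern
    # Assumption: a plain join is enough; no normalization is attempted.
    return os.path.join(base, pattern)
```

With this sketch, a pattern such as `cache/*` given while working in `/data` would become `/data/cache/*`, while an already absolute pattern is left alone.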

v0.12

15 Jul 09:25
58ab87f

Notable changes:

  • Duplication lookup is now based on extents. This leads to a massive performance increase. Block-based lookup is still possible via --dedupe-options=partial.
  • Following that change, a new hashfile format has been introduced. The previous hashfile format is still supported when extent lookup is disabled, though this is not recommended.
  • Batching has been implemented. When enabled with the -B <batchsize> option, duperemove will run the deduplication phase every <batchsize> scanned files. This is meant to help when running duperemove on large datasets, with a small block size, or on memory-constrained systems.
  • All hash algorithms have been removed and replaced by xxh128. This variant is as robust as murmur3 while being faster. The --hash option for choosing a hash function has been removed. Hashfiles built with other algorithms must be removed.
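
As a rough illustration of the batching behavior described above, the following Python sketch runs a deduplication step every `batch_size` scanned files. It is not duperemove's implementation (which is written in C): the names `scan_in_batches`, `scan`, and `dedupe` are hypothetical, and the final flush of a partial batch is an assumption.

```python
from typing import Callable, Iterable, List


def scan_in_batches(
    paths: Iterable[str],
    batch_size: int,
    scan: Callable[[str], None],
    dedupe: Callable[[List[str]], None],
) -> None:
    """Scan files one by one and run dedupe() every batch_size files."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")

    pending: List[str] = []
    for path in paths:
        scan(path)              # hash the file (placeholder callback)
        pending.append(path)
        if len(pending) >= batch_size:
            dedupe(pending)     # deduplicate what has been scanned so far
            pending = []        # start a fresh batch, bounding memory use

    # Assumption: files left over in an incomplete batch are still deduplicated.
    if pending:
        dedupe(pending)
```

Because only one batch of scan results is held at a time, peak memory in this sketch depends on the batch size rather than on the total number of files.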

Duperemove v0.11.3

10 Aug 06:34
  • Increase open file limit. (#269)
  • Create hash database file with 600 permission for improved security. (#262)
  • Read more data per pread. For the v2 hashfile format, this reduces the overall number of syscalls made, which in turn results in better performance.
  • Fix truncated file handling, eliminating an infinite loop case. (#255)
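
To illustrate what creating a database file with 600 permissions means, here is a minimal Python sketch. It is not taken from duperemove (which is written in C); `create_private_file`, `file_mode`, `demo`, and the file name `hashfile.db` are hypothetical names chosen for the example.

```python
import os
import stat
import tempfile


def create_private_file(path: str) -> None:
    """Create a new empty file readable and writable by its owner only."""
    # The mode is passed at creation time, so the file never exists with
    # wider permissions. The process umask can only clear bits, not add them.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)


def file_mode(path: str) -> int:
    """Return the permission bits of a file, e.g. 0o600."""
    return stat.S_IMODE(os.stat(path).st_mode)


def demo() -> int:
    """Create a private file in a temporary directory and return its mode."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "hashfile.db")  # hypothetical file name
        create_private_file(path)
        return file_mode(path)
```

Setting the mode in the `os.open` call, rather than changing it afterwards, avoids a window in which other users could open the file.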