Skip to content

๐Ÿญ identifies common or duplicates across different hosts

Notifications You must be signed in to change notification settings

razum2um/xxhashdir_comm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

7 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

xxhashdir_comm

CI

Problem

  • Sometimes you did a backup by "hey, let's just rsync it somewhere"
  • Now you struggle to merge such "backups" to save space across hosts and drives?

This helps to identify common or duplicates across different hosts using collected xxhashdir results (plaintext in format: \d{0,20} .* with first column is xxhash checksum)

Howto

Prepare files with checksums

# on remote host
xxhashdir . > remote.xxhashdir
# on local host
xxhashdir . > local.xxhashdir
scp remote:remote.xxhashdir remote.xxhashdir

Usage

# ๐Ÿš€ to get common files (sources are mostly different)
# you likely want to know this to delete duplicates first, then copy rest
xxhashdir_comm --common local.xxhashdir remote.xxhashdir

# ๐Ÿš€ to get different files (sources are mostly equal)
# you likely want to know this to merge uniq files from second into first, then delete the second at all
xxhashdir_comm --only-second local.xxhashdir remote.xxhashdir

Why not _

  • rsync can delete files on reciever, but relies only on filenames and mtime
  • fdupes works only locally
  • zfs snapshot + zfs diff is perfect but also only local and requires to be a common dataset initially
  • incremental backups - you don't always bother to have

Why use it

  • stdout can be reprocessed with sed/grep/whatever again
  • unix way
  • having fun with rust

Further plans

  • Unify output with standart comm utility (columns & accept -1/2/3)
  • Consider xxhashdir with bytesize input and compare bytesizes

About

๐Ÿญ identifies common or duplicates across different hosts

Topics

Resources

Stars

Watchers

Forks

Languages