Verify GPFS FS integrity by maintaining full database of file contents checksums. Proven in production on multi-PiB FSs

pskopnik/lsdf-checksum

lsdf-checksum

lsdf-checksum is a distributed system to compute checksums of file content in large-scale file systems with the goal of verifying file data integrity.

lsdf-checksum uses gocraft/work with a Redis backend as a queue system and stores all its data in a MySQL / MariaDB database. It operates on IBM Spectrum Scale file systems.

The project was initially developed as part of a practical course at the Steinbuch Centre for Computing (SCC) at the Karlsruhe Institute of Technology (KIT). Most slides of the presentation concluding the course are available, although they may be difficult to follow without the accompanying talk.

Motivation

As the total size of a storage system grows, its error rate (corrupted bits per unit of time) grows as well. This problem has been discussed in the literature, for example in Rosenthal, David S. H. "Keeping bits safe: how hard can it be?" Communications of the ACM 53.11 (2010): 47-55.

lsdf-checksum has been designed for use with the several large file systems operated by the SCC, especially the Large Scale Data Facility (LSDF) and GridKa. The goal is to regularly compute and store a checksum for each file in the file system. If a file has not been changed by a user since the last run and yet its checksum has changed, a warning is issued. The system must be run regularly, so that a corrupted file is detected while it is still possible to restore it from a backup.
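The warning rule described above can be sketched in a few lines of Go. This is only an illustration, not the project's actual code: the FileRecord type, its fields and the isSuspect function are hypothetical names, and the sketch assumes that "not changed by a user" is detected by comparing modification times between two runs.

```go
package main

import "fmt"

// FileRecord is a hypothetical per-file record kept between runs.
// The real database schema of lsdf-checksum may look different.
type FileRecord struct {
	Path     string
	ModTime  int64  // modification time in Unix seconds (assumed change indicator)
	Checksum string // hex-encoded checksum of the file content
}

// isSuspect reports whether the content checksum changed although the
// modification time stayed the same, i.e. no user write explains the change.
func isSuspect(prev, cur FileRecord) bool {
	return prev.ModTime == cur.ModTime && prev.Checksum != cur.Checksum
}

func main() {
	prev := FileRecord{Path: "/data/a", ModTime: 1546300800, Checksum: "aa"}
	cur := FileRecord{Path: "/data/a", ModTime: 1546300800, Checksum: "bb"}

	if isSuspect(prev, cur) {
		fmt.Println("warning: checksum mismatch for", cur.Path)
	}
}
```

A file whose modification time differs between the two runs is treated as legitimately changed in this sketch; its new checksum would simply replace the stored one.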

The file systems are powered by IBM Spectrum Scale, and lsdf-checksum uses snapshots to work on a static version of the file system during each run. IBM Spectrum Scale includes a policy engine, which is used to compile a list of all files in the file system together with some of their metadata.

Building

The lsdf-checksum project has two primary commands:

  • lsdf-checksum-master is the master component of the system. This command contains the functionality for managing and performing checksum runs. It also allows querying the metadata database for checksum mismatches.
  • lsdf-checksum-worker is the lightweight worker component of the system. Workers receive work packs containing files to be checksummed. After the files have been read, their checksums are sent back to the master.

The binaries are built using a recent Go version (tested with go1.12). Execute the following commands in the root folder of this repository; Go will fetch all dependencies. The two binaries are written to the current working directory.

go build ./cmd/lsdf-checksum-master
go build ./cmd/lsdf-checksum-worker

Neither binary depends on significant runtime libraries (only basic ones such as libc are required). Both binaries provide help texts via --help. Calling a binary with only the --help-man flag outputs a man page for the command.
