Siegfried is a signature-based file format identification tool, implementing:
- the National Archives UK's PRONOM file format signatures
- freedesktop.org's MIME-info file format signatures
- the Library of Congress's FDD file format signatures (beta).
sf file.ext
sf DIR
sf -csv file.ext | DIR  // Output CSV rather than YAML
sf -json file.ext | DIR  // Output JSON rather than YAML
sf -droid file.ext | DIR  // Output DROID CSV rather than YAML
sf -nr DIR  // Don't scan subdirectories
sf -z file.zip | DIR  // Decompress and scan zip, tar, gzip, warc, arc
sf -hash md5 file.ext | DIR  // Calculate md5, sha1, sha256, sha512, or crc hash
sf -sig custom.sig file.ext  // Use a custom signature file
sf -  // Scan stream piped to stdin
sf -name file.ext -  // Provide filename when scanning stream
sf -f myfiles.txt  // Scan list of files
sf -version  // Display version information
sf -home c:\junk -sig custom.sig file.ext  // Use a custom home directory
sf -serve hostname:port  // Server mode
sf -throttle 10ms DIR  // Pause for duration (e.g. 1s) between file scans
sf -multi 256 DIR  // Scan multiple (e.g. 256) files in parallel
sf -log [comma-sep opts] file.ext | DIR  // Log errors etc. to stderr (default) or stdout
sf -log e,w file.ext | DIR  // Log errors and warnings to stderr
sf -log u,o file.ext | DIR  // Log unknowns to stdout
sf -log d,s file.ext | DIR  // Log debugging and slow messages to stderr
sf -log p,t DIR > results.yaml  // Log progress and time while redirecting results
sf -log fmt/1,c DIR > results.yaml  // Log instances of fmt/1 and chart results
sf -replay -log u -csv results.yaml  // Replay results file, convert to csv, log unknowns
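The -multi option bounds how many files are scanned at once. The Go sketch below shows one common way to get that behaviour, a buffered channel used as a counting semaphore. It is an illustration under assumptions, not siegfried's own scheduler; `scanParallel` and the `scan` callback are hypothetical stand-ins for real identification.

```go
package main

import (
	"fmt"
	"sync"
)

// scanParallel runs scan over every path with at most multi calls in flight,
// loosely mirroring what a flag like -multi controls (hypothetical helper,
// not siegfried code). Results are returned in input order.
func scanParallel(paths []string, multi int, scan func(string) string) []string {
	if multi < 1 {
		multi = 1
	}
	results := make([]string, len(paths))
	sem := make(chan struct{}, multi) // counting semaphore with multi slots
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		sem <- struct{}{} // blocks while multi scans are already running
		go func(i int, p string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this scan ends
			results[i] = scan(p)     // each goroutine writes only its own slot
		}(i, p)
	}
	wg.Wait()
	return results
}

func main() {
	out := scanParallel([]string{"a.pdf", "b.doc", "c.txt"}, 2, func(p string) string {
		return p + ": scanned" // placeholder for a real identification step
	})
	fmt.Println(out)
}
```

Preallocating the results slice and giving each goroutine a fixed index keeps output order stable without extra locking.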
By default, siegfried uses the latest PRONOM signatures without buffer limits (i.e. it may do full file scans). To use MIME-info or LOC signatures, or to add buffer limits or other customisations, use the roy tool to build your own signature file.
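If you post-process sf -json output, a small Go program can pull out the files that were not identified. This sketch assumes only a subset of the JSON layout: a `files` array whose entries have a `filename` and `matches` with `ns` and `id` fields, with `UNKNOWN` as the id for unidentified files. Check those names against output from your own sf version before relying on it. The sample data is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result models an ASSUMED subset of the sf -json layout: a "files" array
// whose entries carry a "filename" and a list of "matches".
type result struct {
	Files []struct {
		Filename string `json:"filename"`
		Matches  []struct {
			NS string `json:"ns"`
			ID string `json:"id"`
		} `json:"matches"`
	} `json:"files"`
}

// unknowns returns the names of files with at least one match whose id is
// "UNKNOWN" (the assumed marker for unidentified files).
func unknowns(data []byte) ([]string, error) {
	var r result
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	var names []string
	for _, f := range r.Files {
		for _, m := range f.Matches {
			if m.ID == "UNKNOWN" {
				names = append(names, f.Filename)
				break // report each file once
			}
		}
	}
	return names, nil
}

func main() {
	// Illustrative sample, not real sf output.
	sample := []byte(`{"files":[
	  {"filename":"a.pdf","matches":[{"ns":"pronom","id":"fmt/276"}]},
	  {"filename":"b.bin","matches":[{"ns":"pronom","id":"UNKNOWN"}]}]}`)
	names, err := unknowns(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [b.bin]
}
```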
With go installed:
go get github.com/richardlehane/siegfried/cmd/sf
sf -update
Or, without go installed:
Download a pre-built binary from the releases page. Unzip to a location in your system path. Then run:
sf -update
Homebrew:
brew install mistydemeo/digipres/siegfried
Or, for the most recent updates, you can install from this fork:
brew install richardlehane/digipres/siegfried
Ubuntu/Debian (64 bit):
wget -qO - https://bintray.com/user/downloadSubjectPublicKey?username=bintray | sudo apt-key add -
echo "deb http://dl.bintray.com/siegfried/debian wheezy main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update && sudo apt-get install siegfried
FreeBSD:
pkg install siegfried
Arch Linux:
git clone https://aur.archlinux.org/siegfried.git
cd siegfried
makepkg -si
- update LOC signatures to 2017-09-28
- update PRONOM signatures to v93
- version information for MIME-info signatures (freedesktop.org and tika-mimetypes) now recorded in mime-info.json file and presented in results
- new sets file for PRONOM extensions. This creates sets like @.doc and @.txt (i.e. all PUIDs with those extensions), allowing commands like:
roy build -limit @.doc,@.docx
roy inspect @.txt
sf -log @.pdf,o DIR
- update freedesktop.org signatures to v1.9
- out of memory error when using sf -z on compressed files that contain very large files; reported by Terry Jolliffe
- report errors that occur during file decompression. Previously, only fatal errors encountered when a compressed file is first opened were reported. Now errors that are encountered while attempting to walk the contents of a compressed file are also reported.
- report errors for 'roy inspect' when roy can't find anything to inspect; reported by Ross Spencer
- continue on error flag (-coe) can now be used to continue scans despite fatal file errors that would normally cause scanning to halt. This may be useful e.g. for big directory scans over unreliable networks. Usage:
sf -coe DIR.
- update PRONOM signatures to v92
- file scanning is now restricted to regular files (i.e. not symlinks, sockets, devices etc.). Reported by Henk Vanstappen.
- windows longpath fix now works for paths that appear short
- the sf -update flag can now be used to download/update non-PRONOM signatures. Options are "loc", "tika", "freedesktop", "pronom-tika-loc", "deluxe" and "archivematica". To update a non-PRONOM signature, include the signature name as an argument after the flags e.g.
sf -update freedesktop
This command will overwrite 'default.sig' (the default signature file that sf loads). You can preserve your default signature file by providing an alternative signature file name with the -sig flag e.g.
sf -sig notdefault.sig -update loc
If you use one of the signature options as a filename (with or without a .sig extension), you can omit the signature argument i.e.
sf -update -sig loc.sig
is equivalent to
sf -sig loc.sig -update loc
Feature requested by Ross Spencer.
- sf -update now does SHA-256 hash verification of updates, and communication with the update server is via HTTPS.
- update PRONOM signatures to v91
- fixes to config package where global variables are polluted by subsequent calls to the Add(Identifier) function
- fix to reader package where a panic was triggered by an illegal slice access in some cases
- roy add now takes a -nobyte flag to omit byte signatures from the identifier; requested by Nick Krabbenhoeft
- update Tika MIMEInfo signatures to 1.16
- update LOC to 2017-06-10
- sf now accepts multiple files or directories as input e.g.
sf myfile1.doc mydir myfile3.txt
- LOC signature update
- code re-organisation to export reader and writer packages
- sf -replay can now take lists of results files with
sf -replay -f list-of-results.txt
- the command sf -replay - now works on Windows as expected e.g.
sf myfiles | sf -replay -json -
- text matcher not allocating hits to correct identifiers; fixes #101
- YAML field containing a quote was not escaped; reported by Ross Spencer
- PRONOM v90 update
- the -home flag was being overridden for roy subcommands due to interaction with other flags
- signature updates for PRONOM, LOC and tika-mimetypes
- roy inspect accepts space- as well as comma-separated lists of formats e.g.
roy inspect fmt/1 fmt/2
- log files that match particular formats with -log fmt/1,@set2 (a comma-separated list of format IDs/format sets). These can be mixed with regular log options e.g.
sf -log fmt/1,c DIR
- generate a summary view of formats matched during a scan with -log chart (or just -log c)
- replay scans from results files with sf -replay: load one or more results files to replay logging or to convert to a different output format e.g.
sf -replay -csv results.yaml
or
sf -replay -log unknown,chart,stdout results1.yaml results2.csv
- compare results with the roy compare subcommand: view the difference between two or more results e.g.
roy compare results1.yaml results2.csv droid.csv ...
- roy sets creates pronom-all.json, pronom-families.json, and pronom-types.json sets files;
- roy sets -changes creates a pronom-changes.json sets file from a PRONOM release-notes.xml file;
- roy sets -list @set1,@set2 lists the contents of a comma-separated list of format sets
- roy inspect releases provides a summary view of a PRONOM release-notes.xml file
- the sf - command now scans stdin e.g.
cat mypdf.pdf | sf -
You can pass a filename in to supplement the analysis with the -name flag e.g.
cat myfile.pdf | sf -name myfile.pdf -
In previous versions of sf, the dash argument signified treating stdin as a newline-separated list of filenames for scanning. Use the new -f flag for this e.g.
sf -f myfiles.txt
or
cat myfiles.txt | sf -f -
Change requested by pm64.
- some files cause endless scanning due to large numbers of signature hits; reported by workflowsguy
- null bytes can be written to output due to bad zip filename decoding; reported by Tim Walsh
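The PRONOM extension sets described in the changelog above (@.doc, @.txt and so on) amount to inverting a PUID-to-extensions table. Below is a minimal Go sketch of that grouping. It uses a small hand-written map as hypothetical input rather than data parsed from PRONOM, and `extensionSets` is an illustrative name, not roy's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// extensionSets groups PUIDs by file extension into sets named "@.<ext>",
// e.g. "@.doc" -> every PUID that lists "doc". Hypothetical helper; the
// input map stands in for extension data that would come from PRONOM.
func extensionSets(exts map[string][]string) map[string][]string {
	sets := make(map[string][]string)
	for puid, list := range exts {
		for _, e := range list {
			// Normalise ".DOC" and "doc" to the same set name.
			key := "@." + strings.ToLower(strings.TrimPrefix(e, "."))
			sets[key] = append(sets[key], puid)
		}
	}
	for k := range sets {
		sort.Strings(sets[k]) // map iteration order is random; sort for stable output
	}
	return sets
}

func main() {
	// Illustrative subset only.
	sets := extensionSets(map[string][]string{
		"fmt/39":    {"doc"},
		"fmt/40":    {"doc"},
		"x-fmt/111": {"txt"},
	})
	fmt.Println(sets["@.doc"], sets["@.txt"]) // [fmt/39 fmt/40] [x-fmt/111]
}
```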
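The continue on error flag (-coe) changes what happens when a file error would otherwise halt a scan. The difference between halting and continuing can be sketched in Go as follows. `scanAll`, `fileError` and the `scan` callback are hypothetical names. For simplicity this sketch treats every error as fatal, whereas sf distinguishes fatal errors from ones it simply reports.

```go
package main

import "fmt"

// fileError records a per-file failure (hypothetical type for this sketch).
type fileError struct {
	Path string
	Msg  string
}

func (e fileError) Error() string { return e.Path + ": " + e.Msg }

// scanAll calls scan on each path in order. With continueOnError false it
// stops at the first error; with continueOnError true it records the error
// and moves on, which is the behaviour a flag like -coe enables.
func scanAll(paths []string, scan func(string) error, continueOnError bool) (done []string, errs []error) {
	for _, p := range paths {
		if err := scan(p); err != nil {
			errs = append(errs, err)
			if !continueOnError {
				return done, errs
			}
			continue
		}
		done = append(done, p)
	}
	return done, errs
}

func main() {
	scan := func(p string) error {
		if p == "b.doc" {
			return fileError{Path: p, Msg: "read failed"} // simulated fatal error
		}
		return nil
	}
	paths := []string{"a.pdf", "b.doc", "c.txt"}
	done, errs := scanAll(paths, scan, false)
	fmt.Println(done, len(errs)) // [a.pdf] 1
	done, errs = scanAll(paths, scan, true)
	fmt.Println(done, len(errs)) // [a.pdf c.txt] 1
}
```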
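The SHA-256 verification of updates noted above follows a standard pattern: hash the downloaded bytes and compare the hex digest with an expected value. The Go sketch below shows only that check using the standard library. `verifySHA256` is a hypothetical helper, and this sketch does not cover how sf obtains the expected digest from the update server.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifySHA256 reports whether data hashes to the expected hex digest.
// Generic technique only (hypothetical helper, not sf's update code);
// the comparison is case-insensitive on the expected digest.
func verifySHA256(data []byte, wantHex string) bool {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) == strings.ToLower(wantHex)
}

func main() {
	// Standard SHA-256 test vector for the input "abc".
	const abc = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
	fmt.Println(verifySHA256([]byte("abc"), abc)) // true
}
```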
See the CHANGELOG for the full history.
Copyright 2017 Richard Lehane
Licensed under the Apache License, Version 2.0
Join the Google Group for updates, signature releases, and help.
Like siegfried and want to get involved in its development? That'd be wonderful! There are some notes on the wiki to get you started, and please get in touch.
Thanks Ross for https://github.com/exponential-decay/skeleton-test-suite-generator and http://exponentialdecay.co.uk/sd/index.htm, both are very handy!
Thanks Misty for the brew and Ubuntu packaging
Thanks Steffen for the FreeBSD and Arch Linux packaging