- update PRONOM to v97
- zs flag now activates -z flag
- details text in PRONOM identifier
- roy panic when building signatures with empty sequences. Reported by Greg Lepore
- a new Wikidata identifier, harvesting information from the Wikidata Query Service. Implemented by Ross Spencer.
- select which archive types (zip, tar, gzip, warc, or arc) are unpacked using the -zs flag (sf -zs tar,zip). Implemented by Ross Spencer.
- update LOC signatures to 2020-09-21
- update tika-mimetypes signatures to v1.24
- update freedesktop.org signatures to v2.0
- incorrect basis for some signatures with multiple patterns. Reported and fixed by Ross Spencer.
- utc flag returns file modified dates in UTC e.g. sf -utc FILE | DIR. Requested by Dragan Espenschied
- new cost and repetition flags to control segmentation when building signatures
- update PRONOM to v96
- update LOC signatures to 2019-12-18
- update tika-mimetypes signatures to v1.23
- update freedesktop.org signatures to v1.15
- XML namespaces detected by prefix on root tag, as well as default namespace (for mime-info spec)
- panic when scanning certain MS-CFB files. Reported separately by Mike Shallcross and Euan Cochrane
- file with many FF xx sequences grinds to a halt. Reported by Andy Foster
- -f flag now scans directories, as well as files. Requested by Harry Moss
- update LOC signatures to 2019-06-16
- update tika-mimetypes signatures to v1.22
- filenames with "?" were parsed as URLs; reported by workflowsguy
- update PRONOM to v95
- update LOC signatures to 2019-05-20
- update tika-mimetypes signatures to v1.21
- .docx files with .doc extensions panic due to bug in division of hints in container matcher. Thanks to Jean-Séverin Lair for reporting and sharing samples and to VAIarchief for additional report with example.
- mime-info signatures panic on some files due to duplicate entries in the freedesktop and tika signature files; spotted during an attempt at pair coding with Ross Spencer... thanks Ross and sorry for hogging the laptop! #125
- update LOC signatures to 2019-01-06
- update tika-mimetypes signatures to v1.20
- container matching can now match against directory names. Thanks Ross Spencer for reporting and for the sample SIARD signature file. Thanks Dave Clipsham, Martin Hoppenheit and Phillip Tommerholt for contributions on the ticket.
- fixes to travis.yml for auto-deploy of debian release; #124
- print configuration defaults with
- update PRONOM to v94
- LOC identifier fixed after regression in v1.7.9
- remove skeleton-suite files triggering malware warnings by adding to .gitignore; reported by Dave Rice
- release built with Go 1.11, which includes a fix for a CIFS error that caused files to be skipped during file walk; reported by Maarten Savels
- save defaults in a configuration file: use the -setconf flag to record any other flags used into a config file. These defaults will be loaded each time you run sf. E.g. sf -multi 16 -setconf then sf DIR (loads the new multi default)
- use -conf filename to save or load from a named config file. E.g. sf -multi 16 -serve :5138 -conf srv.conf -setconf and then sf -conf srv.conf
- -yaml flag so that, if you set json/csv in the default config :(, you can override with YAML instead. Choose the YAML!
- roy compare -join options that join on filepath now work better when comparing results with mixed Windows and Unix paths
- exported decompress package to give more functionality for users of the golang API; requested by Byron Ruth
- update LOC signatures to 2018-06-14
- update freedesktop.org signatures to v1.10
- update tika-mimetype signatures to v1.18
- misidentifications of some files e.g. ODF presentation due to sf quitting early on strong matches. Have adjusted this algorithm to make sf wait longer if there is evidence (e.g. from filename) that the file might be something else. Reported by Jean-Séverin Lair
- read and other file errors caused sf to hang; reports by Greg Lepore and Andy Foster; fix contributed by Ross Spencer
- bug reading streams where EOF was returned for reads exactly adjacent to the end of file
- bug in mscfb library (race condition for concurrent access to a global variable)
- some matches result in extremely verbose basis fields; reported by Nick Krabbenhoeft. Partly fixed: basis field now reports a single basis for a match but work remains to speed up matching for these cases.
- update LOC signatures to 2017-09-28
- update PRONOM signatures to v93
- version information for MIME-info signatures (freedesktop.org and tika-mimetypes) now recorded in mime-info.json file and presented in results
- new sets file for PRONOM extensions. This creates sets like @.doc and @.txt (i.e. all PUIDs with those extensions). Allows you to run commands like roy build -limit @.doc,@.docx, roy inspect @.txt and sf -log @.pdf,o DIR
- update freedesktop.org signatures to v1.9
- out of memory error when using sf -z on compressed files that contain very large files; reported by Terry Jolliffe
- report errors that occur during file decompression. Previously, only fatal errors encountered when a compressed file is first opened were reported. Now errors that are encountered while attempting to walk the contents of a compressed file are also reported.
- report errors for 'roy inspect' when roy can't find anything to inspect; reported by Ross Spencer
- continue on error flag (-coe) can now be used to continue scans despite fatal file errors that would normally cause scanning to halt. This may be useful e.g. for big directory scans over unreliable networks. Usage: sf -coe DIR
- update PRONOM signatures to v92
- file scanning is now restricted to regular files (i.e. not symlinks, sockets, devices etc.). Reported by Henk Vanstappen.
- windows longpath fix now works for paths that appear short
- sf -update flag can now be used to download/update non-PRONOM signatures. Options are "loc", "tika", "freedesktop", "pronom-tika-loc", "deluxe" and "archivematica". To update a non-PRONOM signature, include the signature name as an argument after the flags e.g. sf -update freedesktop. This command will overwrite 'default.sig' (the default signature file that sf loads). You can preserve your default signature file by providing an alternative, e.g. sf -sig notdefault.sig -update loc. If you use one of the signature options as a filename (with or without a .sig extension), you can omit the signature argument i.e. sf -update -sig loc.sig is equivalent to sf -sig loc.sig -update loc. Feature requested by Ross Spencer.
- sf -update now does SHA-256 hash verification of updates, and communication with the update server is via HTTPS.
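
As a rough illustration of the SHA-256 verification step mentioned above, the following Go sketch checks a downloaded payload against an expected hex digest. It is a minimal example under stated assumptions, not the actual update code: the function name verifySHA256 and the way the expected digest is supplied are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verifySHA256 (hypothetical helper) reports whether payload hashes to the
// expected hex-encoded SHA-256 digest. A malformed digest is an error.
func verifySHA256(payload []byte, expectedHex string) (bool, error) {
	want, err := hex.DecodeString(expectedHex)
	if err != nil {
		return false, fmt.Errorf("bad digest: %w", err)
	}
	got := sha256.Sum256(payload)
	return bytes.Equal(got[:], want), nil
}

func main() {
	// Stand-in for a downloaded signature file and its published digest.
	payload := []byte("example signature file contents")
	sum := sha256.Sum256(payload)
	digest := hex.EncodeToString(sum[:])

	ok, err := verifySHA256(payload, digest)
	fmt.Println(ok, err)
}
```

A client would typically refuse to overwrite its signature file when the check returns false.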
- update PRONOM signatures to v91
- fixes to config package where global variables are polluted with subsequent calls to the Add(Identifier) function
- fix to reader package where panic triggered by illegal slice access in some cases
- roy add now takes a -nobyte flag to omit byte signatures from the identifier; requested by Nick Krabbenhoeft
- update Tika MIMEInfo signatures to 1.16
- update LOC to 2017-06-10
- no changes since v1.7.3, repairing Travis-CI auto-deploy of Debian packages
- sf now accepts multiple files or directories as input e.g. sf myfile1.doc mydir myfile3.txt
- LOC signature update
- code re-organisation to export reader and writer packages
- sf -replay can now take lists of results files with sf -replay -f list-of-results.txt
- the command sf -replay - now works on Windows as expected e.g. sf myfiles | sf -replay -json -
- text matcher not allocating hits to correct identifiers; fixes #101
- unescaped YAML field contains quote; reported by Ross Spencer
- PRONOM v90 update
- the -home flag was being overridden for roy subcommands due to interaction with other flags
- signature updates for PRONOM, LOC and tika-mimetypes
- roy inspect accepts space-separated as well as comma-separated lists of formats e.g. roy inspect fmt/1 fmt/2
- log files that match particular formats with -log fmt/1,@set2 (comma separated list of format IDs/format sets). These can be mixed with regular log options e.g.
- generate a summary view of formats matched during a scan with -log chart (or just
- replay scans from results files with sf -replay: load one or more results files to replay logging or to convert to a different output format e.g. sf -replay -csv results.yaml or sf -replay -log unknown,chart,stdout results1.yaml results2.csv
- compare results with roy compare subcommand: view the difference between two or more results e.g. roy compare results1.yaml results2.csv droid.csv ...
- roy sets creates pronom-all.json, pronom-families.json, and pronom-types.json sets files;
- roy sets -changes creates a pronom-changes.json sets file from a PRONOM release-notes.xml file;
- roy sets -list @set1,@set2 lists contents of a comma-separated list of format sets
- roy inspect releases provides a summary view of a PRONOM release-notes.xml file
- sf - command now scans stdin e.g. cat mypdf.pdf | sf -. You can pass a filename in to supplement the analysis with the -name flag, e.g. cat myfile.pdf | sf -name myfile.pdf -. In previous versions of sf, the dash argument signified treating stdin as a newline separated list of filenames for scanning. Use the new -f flag for this e.g. sf -f myfiles.txt or cat myfiles.txt | sf -f -; change requested by pm64
- some files cause endless scanning due to large numbers of signature hits; reported by workflowsguy
- null bytes can be written to output due to bad zip filename decoding; reported by Tim Walsh
- enable -hash, -z, and -log flags for -serve and -multi modes
- new hash, z, and sig params for -serve mode (to control per-request)
- enable droid output in -serve mode
- GET requests in -serve mode now just percent encoded (with base64 option as a param)
- -serve mode landing page now includes example forms
- code re-organisation using /internal directory to hide internal packages
- Identify method now returns a slice rather than channel of IDs (siegfried pkg change)
- graph implicit and missing priorities with roy inspect implicit-priorities and roy inspect missing-priorities
- error parsing mimeinfo signatures with double backslashes (e.g. rtf signatures)
- new sets files (pronom-families.json and pronom-types.json) automatically created from PRONOM classifications. Removed redundant sets (database, audio, etc.).
- debbuilder.sh fix: debian packages were copying roy data to wrong directory
- roy inspect priorities command now includes "orphan" fmts in graphs
- update PRONOM urls from apps. to www.
- roy inspect FMT command now inspects sets e.g. roy inspect @pdfa
- roy inspect priorities command generates graphs of priority relations
- container matcher running when empty (i.e. for freedesktop/tika signature files and when -nocontainer flag used with PRONOM)
- -doubleup flag preventing signature extensions loading: since v1.3.0 signature extensions included with the -extend flag haven't been loading properly due to interaction with the doubles filter (which prevents byte signatures loading for formats that also have container signatures defined)
- use fwac rather than wac package for performance
- roy inspect FMT command speed up by building without reports and without the doubles filter
- -reports flag removed for roy harvest and roy build commands
- -reports flag changed for roy inspect command, now a boolean that, if set, will cause the signature(s) to be built from the PRONOM report(s), rather than the DROID XML file. This is slower but can be a more accurate representation.
- roy inspect FMT command now gives details of all signatures, including container signatures
- misidentification: x-fmt/45 files misidentified as fmt/40 due to repetition of elements in container file
- roy build -noreports includes blank extensions that generate false matches; reported by Ross Spencer
- poor performance on unknowns due to interaction of -bof/-eof flags with known BOF/EOF calculation; reported by Ross Spencer
- unnecessary warnings for mimeinfo identifier
- add fddXML.zip to .gitattributes to preserve newlines
- various Go Report Card issues
- Travis and Appveyor CI automated deployment to Github releases and Bintray
- PRONOM v85 signatures
- LICENSE.txt, CHANGELOG.md
- Go Report Card
- golang.org/x/image/riff bug (reported here)
- misspellings reported by Go Report Card
- ineffectual assignments reported by Go Report Card
- implement Library of Congress FDD signatures (beta)
- implement RIFF matcher
- -multi flag replaces -nopriority; based on report by Ross Spencer
- change to -z output: use hash as filepath separator (and unix slash for webarchives); requested by Ross Spencer
- parsing fmt/837 signature; reported by Sarah Romkey
- implement freedesktop.org MIME-info signatures (and the Apache Tika variant)
- implement XML matcher
- file name matcher now supports glob patterns as well as file extensions
- default signature file now "default.sig" (was "pronom.sig")
- changes to YAML and JSON output: "ns" (for namespace) replaces "id", and "id" replaces "puid"
- changes to CSV output: multi-identifiers now displayed in extra columns, not extra rows
- summarise os errors; requested by Ross Spencer
- code quality: vendor external packages; implemented by Misty de Meo
- code quality: refactor textmatcher package
- code quality: refactor siegreader package
- code quality: documentation
- speed regression in TIFF mis-identification patch last release
- measure time elapsed with -log time
- percent encode file URIs in droid output
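
For readers unfamiliar with percent-encoded file URIs, here is a minimal Go sketch of the general idea using the standard net/url package. It is only an illustration, not the code used for the droid output: the helper name fileURI is hypothetical, and it assumes an absolute, slash-separated path.

```go
package main

import (
	"fmt"
	"net/url"
)

// fileURI (hypothetical helper) builds a file URI from an absolute,
// slash-separated path. url.URL percent-encodes characters that are not
// allowed in a URI path, such as spaces and "?".
func fileURI(absPath string) string {
	u := url.URL{Scheme: "file", Path: absPath}
	return u.String()
}

func main() {
	fmt.Println(fileURI("/data/my report?.pdf"))
	// prints file:///data/my%20report%3F.pdf
}
```

Encoding the path this way keeps a filename containing "?" from being read as the start of a query string.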
- long windows directory paths (further work on bug fixed in 1.4.2); reported by Ross Spencer
- mscfb panic; reported by Ross Spencer
- TIFF mis-identifications due to an early halt error
- new -throttle flag; requested by Ross Spencer
- errors logged to stderr by default (to quieten use -log ""); requested by Ross Spencer
- mscfb update: lazy reading
- webarchive update: decode Transfer-Encoding and Content-Encoding; requested by Dragan Espenschied
- -log replaces -debug, -slow, -unknown and -known flags (see usage above)
- highlight empty file/stream with error and warning
- negative text match overrides extension-only plain text match
- new MIME matcher; requested by Dragan Espenschied
- support warc continuations
- add all.json and tiff.json sets
- minor speed-up
- report less redundant basis information
- report error on empty file/stream
- scan within warc and arc files with -z flag; requested by Dragan Espenschied
- sf -slow FILE | DIR reports slow signatures
- sf -version describes signature file; requested by Michelle Lindlar
- quit scanning earlier on known unknowns
- don't include byte signatures where formats have container signatures (unless -doubleup flag is given); fixes a mis-identification reported by Ross Spencer
- sf -debug output simplified
- roy -limit and -exclude now operate on text and default zip matches
- roy -nopriority re-configured to return more results
- upgraded versions of sf panic when attempting to read old signature files; reported by Stefan
- panic mmap'ing files over 1GB on Win32; reported by Duncan
- reporting extensions for folders with "."; reported by Ross Spencer
- -noext flag to roy to suppress extension matching; requested by Greg Lepore
- -known and -unknown flags for sf to output lists of recognised and unknown files respectively; requested by Greg Lepore
- support annotation of sets.json files; requested by Greg Lepore
- add warning when -extendc is used without -extend
- report container extensions in details; reported by Ross Spencer
- text matcher (i.e. sf README will now report a 'Plain Text File' result)
- -notext flag to suppress text matcher (roy build -notext)
- all outputs now include file last modified time
- -hash flag with choice of md5, sha1, sha256, sha512, crc (e.g. sf -hash md5 FILE)
- -droid flag to mimic droid output (sf -droid FILE)
- detect encoding of zip filenames; reported by Dragan Espenschied
- mscfb reported by Dragan Espenschied
- scan within archive formats (zip, tar, gzip) with -z flag
- format sets (e.g. roy build -exclude @pdfa)
- support bitmask patterns
- leaner, faster signature format
- mirror bof patterns as eof patterns where both roy -bof and -eof limits set
- json output
- server mode
- single quote YAML output
- optimisations (mmap, multithread, etc.)
- csv output
- periodic priority checking to stop searches earlier
- range/distance/choices bugfix
- change to signature file format
- roy (r2d2 rename) signature customisation
- parse Droid signature (not just PRONOM reports)
- support extension signatures
- support multiple identifiers
- config package
- license info in srcs (no change to license; this allows for attributing authorship for non-Richard contribs)
- default home change to "$HOME/siegfried" (no longer ".siegfried")
- mscfb bugfixes
- container matching
- cross-compile was broken (because of use of os/user). Now doing native builds on the three platforms so the download binaries should all work now.
- bug in processing code caused really bad matching profile for MP3 sigs. No need to update the tool for this, but please do a sieg -update to get the latest signature file.
- sf command line: descriptive output in YAML, including basis for matches
- optimisations inc. initial BOF loop before main matching loop
- sf command line changes: -version and -update flags now enabled
- over-the-wire updates of signature files from www.itforarchivists.com/siegfried
- replaced ac matcher with wac matcher
- re-write of bytematcher code
- some benchmarks slower but fewer really poor edge cases (see cmd/sieg/testdata/bench_results.txt)... so a win!
- but still too slow!
- an Identifier type that controls the matching process and stops on best possible match (i.e. no longer require a full file scan for all files)
- name/extension matching
- a custom reader (pkg/core/siegreader)
- benchmarks (cmd/sieg/testdata)
- simplifications to the sieg command and signature file
- optimisations that have boosted performance (see cmd/sieg/testdata/bench_results.txt). But still too slow!
- First release. Parses PRONOM signatures and performs byte matching. Bare bones CLI. Glacially slow!