- update LOC signatures to 2017-09-28
- update PRONOM signatures to v93
- version information for MIME-info signatures (freedesktop.org and tika-mimetypes) now recorded in mime-info.json file and presented in results
- new sets file for PRONOM extensions. This creates sets like @.doc and @.txt (i.e. all PUIDs with those extensions). Allows you to do commands like `roy build -limit @.doc,@.docx`, `roy inspect @.txt` and `sf -log @.pdf,o DIR`
- update freedesktop.org signatures to v1.9
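
The extension sets described above can be pictured as a grouping of PUIDs by file extension. The following is a minimal Python sketch of that idea, not roy's actual implementation: the `build_extension_sets` helper and the placeholder PUIDs (`fmt/A`, `fmt/B`, `fmt/C`) are hypothetical and exist only for illustration.

```python
from collections import defaultdict

def build_extension_sets(puid_extensions):
    """Group PUIDs by extension, keyed in the '@.ext' style used for set names."""
    sets = defaultdict(list)
    for puid, extensions in puid_extensions.items():
        for ext in extensions:
            # Normalise case and any leading dot before building the set name.
            sets["@." + ext.lower().lstrip(".")].append(puid)
    # Sort each set so the output is stable.
    return {name: sorted(puids) for name, puids in sets.items()}

# Placeholder data (hypothetical); real values would come from PRONOM.
sample = {
    "fmt/A": ["doc"],
    "fmt/B": ["doc", "dot"],
    "fmt/C": ["txt"],
}

extension_sets = build_extension_sets(sample)
print(extension_sets["@.doc"])
```

A set name such as `@.doc` then stands for every PUID grouped under that extension.
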
- out of memory error when using `sf -z` on compressed files that contain very large files; reported by Terry Jolliffe
- report errors that occur during file decompression. Previously, only fatal errors encountered when a compressed file is first opened were reported. Now errors that are encountered while attempting to walk the contents of a compressed file are also reported.
- report errors for 'roy inspect' when roy can't find anything to inspect; reported by Ross Spencer
- continue on error flag (-coe) can now be used to continue scans despite fatal file errors that would normally cause scanning to halt. This may be useful e.g. for big directory scans over unreliable networks. Usage: `sf -coe DIR`
- update PRONOM signatures to v92
- file scanning is now restricted to regular files (i.e. not symlinks, sockets, devices etc.). Reported by Henk Vanstappen.
- windows longpath fix now works for paths that appear short
- `sf -update` flag can now be used to download/update non-PRONOM signatures. Options are "loc", "tika", "freedesktop", "pronom-tika-loc", "deluxe" and "archivematica". To update a non-PRONOM signature, include the signature name as an argument after the flags e.g. `sf -update freedesktop`. This command will overwrite 'default.sig' (the default signature file that sf loads). You can preserve your default signature file by providing an alternative `-sig` target e.g. `sf -sig notdefault.sig -update loc`. If you use one of the signature options as a filename (with or without a .sig extension), you can omit the signature argument i.e. `sf -update -sig loc.sig` is equivalent to `sf -sig loc.sig -update loc`. Feature requested by Ross Spencer.
- `sf -update` now does SHA-256 hash verification of updates and communication with the update server is via HTTPS.
- update PRONOM signatures to v91
- fixes to config package where global variables are polluted with subsequent calls to the Add(Identifier) function
- fix to reader package where panic triggered by illegal slice access in some cases
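
The SHA-256 verification of updates mentioned above amounts to comparing the digest of the downloaded bytes with an expected value. This is a minimal Python sketch of that general technique, not sf's own code: the `verify_sha256` helper is hypothetical, and how sf obtains the expected hash is not described here, so the sketch simply takes it as a hex string.

```python
import hashlib
import hmac

def verify_sha256(payload: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of payload matches expected_hex."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(actual, expected_hex.strip().lower())

# Placeholder bytes standing in for a downloaded signature file.
payload = b"example signature file contents"
published = hashlib.sha256(payload).hexdigest()

print(verify_sha256(payload, published))         # digest of the untouched payload
print(verify_sha256(payload + b"x", published))  # digest of a modified payload
```

An update would only be written to disk when the check succeeds.
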
- `roy build` and `roy add` now take a `-nobyte` flag to omit byte signatures from the identifier; requested by Nick Krabbenhoeft
- update Tika MIMEInfo signatures to 1.16
- update LOC to 2017-06-10
- no changes since v1.7.3, repairing Travis-CI auto-deploy of Debian packages
- sf now accepts multiple files or directories as input e.g. `sf myfile1.doc mydir myfile3.txt`
- LOC signature update
- code re-organisation to export reader and writer packages
- `sf -replay` can now take lists of results files with the `-f` flag e.g. `sf -replay -f list-of-results.txt`
- the command `sf -replay -` now works on Windows as expected e.g. `sf myfiles | sf -replay -json -`
- text matcher not allocating hits to correct identifiers; fixes #101
- unescaped YAML field contains quote; reported by Ross Spencer
- PRONOM v90 update
- the -home flag was being overridden for roy subcommands due to interaction with other flags
- signature updates for PRONOM, LOC and tika-mimetypes
- `roy inspect` accepts space as well as comma-separated lists of formats e.g. `roy inspect fmt/1 fmt/2`
- log files that match particular formats with `-log fmt/1,@set2` (comma separated list of format IDs/format sets). These can be mixed with regular log options e.g. `-log unknown,fmt/1,chart`
- generate a summary view of formats matched during a scan with `-log chart` (or just `-log c`)
- replay scans from results files with `sf -replay`: load one or more results files to replay logging or to convert to a different output format e.g. `sf -replay -csv results.yaml` or `sf -replay -log unknown,chart,stdout results1.yaml results2.csv`
- compare results with `roy compare` subcommand: view the difference between two or more results e.g. `roy compare results1.yaml results2.csv droid.csv ...`
- `roy sets` subcommand: `roy sets` creates pronom-all.json, pronom-families.json, and pronom-types.json sets files; `roy sets -changes` creates a pronom-changes.json sets file from a PRONOM release-notes.xml file; `roy sets -list @set1,@set2` lists contents of a comma-separated list of format sets
- `roy inspect releases` provides a summary view of a PRONOM release-notes.xml file
- the `sf -` command now scans stdin e.g. `cat mypdf.pdf | sf -`. You can pass a filename in to supplement the analysis with the `-name` flag. E.g. `cat myfile.pdf | sf -name myfile.pdf -`. In previous versions of sf, the dash argument signified treating stdin as a newline separated list of filenames for scanning. Use the new `-f` flag for this e.g. `sf -f myfiles.txt` or `cat myfiles.txt | sf -f -`; change requested by pm64
- some files cause endless scanning due to large numbers of signature hits; reported by workflowsguy
- null bytes can be written to output due to bad zip filename decoding; reported by Tim Walsh
- enable -hash, -z, and -log flags for -serve and -multi modes
- new hash, z, and sig params for -serve mode (to control per-request)
- enable droid output in -serve mode
- GET requests in -serve mode now just percent encoded (with base64 option as a param)
- -serve mode landing page now includes example forms
- code re-organisation using /internal directory to hide internal packages
- Identify method now returns a slice rather than channel of IDs (siegfried pkg change)
- graph implicit and missing priorities with `roy inspect implicit-priorities` and `roy inspect missing-priorities`
- error parsing mimeinfo signatures with double backslashes (e.g. rtf signatures)
- new sets files (pronom-families.json and pronom-types.json) automatically created from PRONOM classifications. Removed redundant sets (database, audio, etc.).
- debbuilder.sh fix: debian packages were copying roy data to wrong directory
- roy inspect priorities command now includes "orphan" fmts in graphs
- update PRONOM urls from apps. to www.
- roy inspect FMT command now inspects sets e.g. roy inspect @pdfa
- roy inspect priorities command generates graphs of priority relations
- container matcher running when empty (i.e. for freedesktop/tika signature files and when -nocontainer flag used with PRONOM)
- -doubleup flag preventing signature extensions loading: since v1.3.0 signature extensions included with the -extend flag haven't been loading properly due to interaction with the doubles filter (which prevents byte signatures loading for formats that also have container signatures defined)
- use fwac rather than wac package for performance
- roy inspect FMT command speed up by building without reports and without the doubles filter
- -reports flag removed for roy harvest and roy build commands
- -reports flag changed for roy inspect command, now a boolean that, if set, will cause the signature(s) to be built from the PRONOM report(s), rather than the DROID XML file. This is slower but can be a more accurate representation.
- roy inspect FMT command now gives details of all signatures, including container signatures
- misidentification: x-fmt/45 files misidentified as fmt/40 due to repetition of elements in container file
- roy build -noreports includes blank extensions that generate false matches; reported by Ross Spencer
- poor performance for unknowns due to interaction of -bof/-eof flags with known BOF/EOF calculation; reported by Ross Spencer
- unnecessary warnings for mimeinfo identifier
- add fddXML.zip to .gitattributes to preserve newlines
- various Go Report Card issues
- Travis and Appveyor CI automated deployment to Github releases and Bintray
- PRONOM v85 signatures
- LICENSE.txt, CHANGELOG.md
- Go Report Card
- golang.org/x/image/riff bug (reported here)
- misspellings reported by Go Report Card
- ineffectual assignments reported by Go Report Card
- implement Library of Congress FDD signatures (beta)
- implement RIFF matcher
- -multi flag replaces -nopriority; based on report by Ross Spencer
- change to -z output: use hash as filepath separator (and unix slash for webarchives); requested by Ross Spencer
- parsing fmt/837 signature; reported by Sarah Romkey
- implement freedesktop.org MIME-info signatures (and the Apache Tika variant)
- implement XML matcher
- file name matcher now supports glob patterns as well as file extensions
- default signature file now "default.sig" (was "pronom.sig")
- changes to YAML and JSON output: "ns" (for namespace) replaces "id", and "id" replaces "puid"
- changes to CSV output: multi-identifiers now displayed in extra columns, not extra rows
- summarise os errors; requested by Ross Spencer
- code quality: vendor external packages; implemented by Misty de Meo
- big file handling
- file handle leak; reported by Ross Spencer
- mscfb; reported by Ross Spencer
- code quality: refactor textmatcher package
- code quality: refactor siegreader package
- code quality: documentation
- speed regression in TIFF mis-identification patch last release
- measure time elapsed with -log time
- percent encode file URIs in droid output
- long windows directory paths (further work on bug fixed in 1.4.2); reported by Ross Spencer
- mscfb panic; reported by Ross Spencer
- TIFF mis-identifications due to an early halt error
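
Percent-encoding a file path for use in a file URI, as mentioned above for the droid output, can be sketched in a few lines of Python. This illustrates the general technique only: the `file_uri` helper is hypothetical, and the exact URI form that sf emits is not confirmed here.

```python
from urllib.parse import quote

def file_uri(posix_path: str) -> str:
    """Build a file URI from an absolute POSIX path, percent-encoding unsafe characters."""
    # quote() leaves '/' unescaped by default, so path separators are preserved.
    return "file://" + quote(posix_path)

print(file_uri("/data/my file.pdf"))
```

Spaces and other reserved characters in the path are escaped, while the directory separators stay readable.
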
- new -throttle flag; requested by Ross Spencer
- errors logged to stderr by default (to quieten use -log ""); requested by Ross Spencer
- mscfb update: lazy reading
- webarchive update: decode Transfer-Encoding and Content-Encoding; requested by Dragan Espenschied
- long windows paths; reported by Ross Spencer
- 32-bit file size overflow; reported by Ross Spencer
- -log replaces -debug, -slow, -unknown and -known flags (see usage above)
- highlight empty file/stream with error and warning
- negative text match overrides extension-only plain text match
- new MIME matcher; requested by Dragan Espenschied
- support warc continuations
- add all.json and tiff.json sets
- minor speed-up
- report less redundant basis information
- report error on empty file/stream
- scan within warc and arc files with -z flag; requested by Dragan Espenschied
- sf -slow FILE | DIR reports slow signatures
- sf -version describes signature file; requested by Michelle Lindlar
- quit scanning earlier on known unknowns
- don't include byte signatures where formats have container signatures (unless -doubleup flag is given); fixes a mis-identification reported by Ross Spencer
- sf -debug output simplified
- roy -limit and -exclude now operate on text and default zip matches
- roy -nopriority re-configured to return more results
- upgraded versions of sf panic when attempting to read old signature files; reported by Stefan
- panic mmap'ing files over 1GB on Win32; reported by Duncan
- reporting extensions for folders with "."; reported by Ross Spencer
- -noext flag to roy to suppress extension matching; requested by Greg Lepore
- -known and -unknown flags for sf to output lists of recognised and unknown files respectively; requested by Greg Lepore
- support annotation of sets.json files; requested by Greg Lepore
- add warning when -extendc is used without -extend
- report container extensions in details; reported by Ross Spencer
- text matcher (i.e. sf README will now report a 'Plain Text File' result)
- -notext flag to suppress text matcher (roy build -notext)
- all outputs now include file last modified time
- -hash flag with choice of md5, sha1, sha256, sha512, crc (e.g. sf -hash md5 FILE)
- -droid flag to mimic droid output (sf -droid FILE)
- detect encoding of zip filenames; reported by Dragan Espenschied
- mscfb; reported by Dragan Espenschied
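
The checksum choices listed for the -hash flag above (md5, sha1, sha256, sha512, crc) can be illustrated with Python's standard library. This is a sketch under stated assumptions rather than a description of sf's implementation: the `checksum` helper is hypothetical, and "crc" is assumed here to mean CRC-32 rendered as eight hex digits.

```python
import hashlib
import zlib

def checksum(algorithm: str, data: bytes) -> str:
    """Return a hex checksum of data for one of the listed algorithm names."""
    if algorithm == "crc":
        # Assumption: "crc" is taken to mean CRC-32, formatted as 8 hex digits.
        return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    if algorithm in ("md5", "sha1", "sha256", "sha512"):
        return hashlib.new(algorithm, data).hexdigest()
    raise ValueError(f"unsupported algorithm: {algorithm}")

for name in ("md5", "sha1", "sha256", "sha512", "crc"):
    print(name, checksum(name, b"hello"))
```

For large files the same digests would normally be computed incrementally over chunks rather than from a single in-memory byte string.
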
- scan within archive formats (zip, tar, gzip) with -z flag
- format sets (e.g. roy build -exclude @pdfa)
- support bitmask patterns
- leaner, faster signature format
- mirror bof patterns as eof patterns where both roy -bof and -eof limits set
- (mscfb) reported by Pascal Aantz
- race condition in scorer (affected tip golang)
- user documentation
- bugfixes (mscfb, match/wac and sf)
- QA using comparator
- json output
- server mode
- single quote YAML output
- optimisations (mmap, multithread, etc.)
- csv output
- periodic priority checking to stop searches earlier
- range/distance/choices bugfix
- change to signature file format
- roy (r2d2 rename) signature customisation
- parse Droid signature (not just PRONOM reports)
- support extension signatures
- support multiple identifiers
- config package
- license info in srcs (no change to license; this allows for attributing authorship for non-Richard contribs)
- default home change to "$HOME/siegfried" (no longer ".siegfried")
- mscfb bugfixes
- container matching
- cross-compile was broken (because of use of os/user). Now doing native builds on the three platforms so the download binaries should all work now.
- bug in processing code caused really bad matching profile for MP3 sigs. No need to update the tool for this, but please do a sieg -update to get the latest signature file.
- sf command line: descriptive output in YAML, including basis for matches
- optimisations inc. initial BOF loop before main matching loop
- sf command line changes: -version and -update flags now enabled
- over-the-wire updates of signature files from www.itforarchivists.com/siegfried
- replaced ac matcher with wac matcher
- re-write of bytematcher code
- some benchmarks slower but fewer really poor edge cases (see cmd/sieg/testdata/bench_results.txt)... so a win!
- but still too slow!
- an Identifier type that controls the matching process and stops on best possible match (i.e. no longer require a full file scan for all files)
- name/extension matching
- a custom reader (pkg/core/siegreader)
- benchmarks (cmd/sieg/testdata)
- simplifications to the sieg command and signature file
- optimisations that have boosted performance (see cmd/sieg/testdata/bench_results.txt). But still too slow!
- First release. Parses PRONOM signatures and performs byte matching. Bare bones CLI. Glacially slow!