- enable -hash, -z, and -log flags for -serve and -multi modes
- new hash, z, and sig params for -serve mode (to control per-request)
- enable droid output in -serve mode
- GET requests in -serve mode now just percent encoded (with base64 option as a param)
- -serve mode landing page now includes example forms
- code re-organisation using /internal directory to hide internal packages
- Identify method now returns a slice rather than channel of IDs (siegfried pkg change)
- graph implicit and missing priorities with roy inspect implicit-priorities and roy inspect missing-priorities
- error parsing mimeinfo signatures with double backslashes (e.g. rtf signatures)
- new sets files (pronom-families.json and pronom-types) automatically created from PRONOM classifications. Removed redundant sets (database, audio, etc.).
- debbuilder.sh fix: debian packages were copying roy data to wrong directory
- roy inspect priorities command now includes "orphan" fmts in graphs
- update PRONOM urls from apps. to www.
- roy inspect FMT command now inspects sets e.g. roy inspect @pdfa
- roy inspect priorities command generates graphs of priority relations
- container matcher running when empty (i.e. for freedesktop/tika signature files and when -nocontainer flag used with PRONOM)
- -doubleup flag preventing signature extensions loading: since v1.3.0 signature extensions included with the -extend flag haven't been loading properly due to interaction with the doubles filter (which prevents byte signatures loading for formats that also have container signatures defined)
- use fwac rather than wac package for performance
- roy inspect FMT command speed up by building without reports and without the doubles filter
- -reports flag removed for roy harvest and roy build commands
- -reports flag changed for roy inspect command, now a boolean that, if set, will cause the signature(s) to be built from the PRONOM report(s), rather than the DROID XML file. This is slower but can be a more accurate representation.
- roy inspect FMT command now gives details of all signatures, including container signatures
- misidentification: x-fmt/45 files misidentified as fmt/40 due to repetition of elements in container file
- roy build -noreports includes blank extensions that generate false matches; reported by Ross Spencer
- poor performance for unknowns due to interaction of -bof/-eof flags with known BOF/EOF calculation; reported by Ross Spencer
- unnecessary warnings for mimeinfo identifier
- add fddXML.zip to .gitattributes to preserve newlines
- various Go Report Card issues
- Travis and Appveyor CI automated deployment to GitHub releases and Bintray
- PRONOM v85 signatures
- LICENSE.txt, CHANGELOG.md
- Go Report Card
- golang.org/x/image/riff bug (reported here)
- misspellings reported by Go Report Card
- ineffectual assignments reported by Go Report Card
- implement Library of Congress FDD signatures (beta)
- implement RIFF matcher
- -multi flag replaces -nopriority; based on report by Ross Spencer
- change to -z output: use hash as filepath separator (and unix slash for webarchives); requested by Ross Spencer
- parsing fmt/837 signature; reported by Sarah Romkey
- implement freedesktop.org MIME-info signatures (and the Apache Tika variant)
- implement XML matcher
- file name matcher now supports glob patterns as well as file extensions
- default signature file now "default.sig" (was "pronom.sig")
- changes to YAML and JSON output: "ns" (for namespace) replaces "id", and "id" replaces "puid"
- changes to CSV output: multi-identifiers now displayed in extra columns, not extra rows
- summarise os errors; requested by Ross Spencer
- code quality: vendor external packages; implemented by Misty de Meo
- big file handling
- file handle leak; reported by Ross Spencer
- mscfb; reported by Ross Spencer
- code quality: refactor textmatcher package
- code quality: refactor siegreader package
- code quality: documentation
- speed regression introduced by the TIFF mis-identification patch in the last release
- measure time elapsed with -log time
- percent encode file URIs in droid output
- long Windows directory paths (further work on bug fixed in 1.4.2); reported by Ross Spencer
- mscfb panic; reported by Ross Spencer
- TIFF mis-identifications due to an early halt error
- new -throttle flag; requested by Ross Spencer
- errors logged to stderr by default (to quieten use -log ""); requested by Ross Spencer
- mscfb update: lazy reading
- webarchive update: decode Transfer-Encoding and Content-Encoding; requested by Dragan Espenschied
- long Windows paths; reported by Ross Spencer
- 32-bit file size overflow; reported by Ross Spencer
- -log replaces -debug, -slow, -unknown and -known flags (see usage above)
- highlight empty file/stream with error and warning
- negative text match overrides extension-only plain text match
- new MIME matcher; requested by Dragan Espenschied
- support warc continuations
- add all.json and tiff.json sets
- minor speed-up
- report less redundant basis information
- report error on empty file/stream
- scan within warc and arc files with -z flag; requested by Dragan Espenschied
- sf -slow FILE | DIR reports slow signatures
- sf -version describes signature file; requested by Michelle Lindlar
- quit scanning earlier on known unknowns
- don't include byte signatures where formats have container signatures (unless -doubleup flag is given); fixes a mis-identification reported by Ross Spencer
- sf -debug output simplified
- roy -limit and -exclude now operate on text and default zip matches
- roy -nopriority re-configured to return more results
- upgraded versions of sf panic when attempting to read old signature files; reported by Stefan
- panic mmap'ing files over 1GB on Win32; reported by Duncan
- reporting extensions for folders with "."; reported by Ross Spencer
- -noext flag to roy to suppress extension matching; requested by Greg Lepore
- -known and -unknown flags for sf to output lists of recognised and unknown files respectively; requested by Greg Lepore
- support annotation of sets.json files; requested by Greg Lepore
- add warning when -extendc is used without -extend
- report container extensions in details; reported by Ross Spencer
- text matcher (i.e. sf README will now report a 'Plain Text File' result)
- -notext flag to suppress text matcher (roy build -notext)
- all outputs now include file last modified time
- -hash flag with choice of md5, sha1, sha256, sha512, crc (e.g. sf -hash md5 FILE)
- -droid flag to mimic droid output (sf -droid FILE)
- detect encoding of zip filenames; reported by Dragan Espenschied
- mscfb; reported by Dragan Espenschied
- scan within archive formats (zip, tar, gzip) with -z flag
- format sets (e.g. roy build -exclude @pdfa)
- support bitmask patterns
- leaner, faster signature format
- mirror bof patterns as eof patterns where both roy -bof and -eof limits set
- mscfb; reported by Pascal Aantz
- race condition in scorer (affected tip golang)
- user documentation
- bugfixes (mscfb, match/wac and sf)
- QA using comparator
- json output
- server mode
- single quote YAML output
- optimisations (mmap, multithread, etc.)
- csv output
- periodic priority checking to stop searches earlier
- range/distance/choices bugfix
- change to signature file format
- roy (r2d2 rename) signature customisation
- parse Droid signature (not just PRONOM reports)
- support extension signatures
- support multiple identifiers
- config package
- license info in source files (no change to license; this allows authorship to be attributed for contributions not made by Richard)
- default home change to "$HOME/siegfried" (no longer ".siegfried")
- mscfb bugfixes
- container matching
- cross-compile was broken (because of use of os/user). Now doing native builds on the three platforms so the download binaries should all work now.
- bug in processing code caused really bad matching profile for MP3 sigs. No need to update the tool for this, but please do a sieg -update to get the latest signature file.
- sf command line: descriptive output in YAML, including basis for matches
- optimisations inc. initial BOF loop before main matching loop
- sf command line changes: -version and -update flags now enabled
- over-the-wire updates of signature files from www.itforarchivists.com/siegfried
- replaced ac matcher with wac matcher
- re-write of bytematcher code
- some benchmarks slower but fewer really poor edge cases (see cmd/sieg/testdata/bench_results.txt)... so a win!
- but still too slow!
- an Identifier type that controls the matching process and stops on best possible match (i.e. no longer require a full file scan for all files)
- name/extension matching
- a custom reader (pkg/core/siegreader)
- benchmarks (cmd/sieg/testdata)
- simplifications to the sieg command and signature file
- optimisations that have boosted performance (see cmd/sieg/testdata/bench_results.txt). But still too slow!
- First release. Parses PRONOM signatures and performs byte matching. Bare bones CLI. Glacially slow!