Description
When an LoC identifier is built with PRONOM the sources are added as follows:
- All LoC sigs and ids are generated.
- The PRONOM identifiers we want for LoC records that have a PUID are then generated and attached.
Signatures and IDs are then returned to the identifier when it's built.
For the byte matcher this is done here.
OGG is a good example format here: the LoC identifier is expected to hold one LoC record with a byte pattern, plus one PRONOM sequence that should also match against it.
If we test against the OGG skeleton the ideal result is as follows:
$ ./sf ogg/fmt-203-signature-id-504.ogg
---
siegfried : 1.8.0
scandate : 2020-06-07T21:40:23-04:00
signature : default.sig
created : 2020-06-07T21:39:38-04:00
identifiers :
- name : 'loc'
details : 'fddXML.zip (2016-12-13, DROID_SignatureFile_V96.xml, container-signature-20200121.xml)'
---
filename : 'ogg/fmt-203-signature-id-504.ogg'
filesize : 62
modified : 2020-06-07T15:57:34-04:00
errors :
matches :
- ns : 'loc'
id : 'fdd000026'
format : 'Ogg File Format'
full : 'Ogg File Format'
mime : 'application/ogg'
basis : 'extension match ogg; byte match at 0, 6 (signature 2/2)'
warning :
The Place() function inside the identifier looks up the IDs index and returns the position and total number of signatures for the matching pattern.
Again, the ideal for these indexes (I believe) should be as follows:
Indexes IDs (base.Place() [fdd000019 fdd000019 fdd000022 fdd000022 fdd000022 fdd000022 fdd000022 fdd000026 fdd000026 fdd000027 fdd000027 fdd000031 fdd000031]
Note that the entries for each identifier are contiguous within the slice, i.e. the 0026 entries sit next to each other, the 0027 entries follow each other, etc.
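The contiguity property described above can be checked mechanically. The following is a minimal sketch, not siegfried code: `contiguous` is a hypothetical helper written for this issue, and the `ideal` slice is copied from the expected index above.

```go
package main

import "fmt"

// ideal is the expected ordering of the IDs index quoted above.
var ideal = []string{
	"fdd000019", "fdd000019",
	"fdd000022", "fdd000022", "fdd000022", "fdd000022", "fdd000022",
	"fdd000026", "fdd000026",
	"fdd000027", "fdd000027",
	"fdd000031", "fdd000031",
}

// contiguous reports whether every ID in ids appears in one unbroken run.
// Hypothetical helper for illustration only; it is not part of siegfried.
func contiguous(ids []string) bool {
	seen := make(map[string]bool)
	for i, id := range ids {
		if i > 0 && ids[i-1] == id {
			continue // still inside the current run
		}
		if seen[id] {
			return false // this ID already had a run that ended earlier
		}
		seen[id] = true
	}
	return true
}

func main() {
	fmt.Println(contiguous(ideal))                   // true
	fmt.Println(contiguous([]string{"a", "b", "a"})) // false
}
```

A check like this could be run over the slice after the identifier is built to confirm that the ordering is what the position lookup expects.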
Actual:
What we're seeing currently is:
Indexes IDs (base.Place() [fdd000019 fdd000022 fdd000022 fdd000022 fdd000026 fdd000027 fdd000031 fdd000019 fdd000027 fdd000022 fdd000022 fdd000031 fdd000026]
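Note that the 0026 entries sit at index 4 and at the last index (12) of the slice, with other identifiers in between. Siegfried stops looping through the index as soon as it reaches a different identifier, so when it calculates the position and total number it never finds the second (or nth) entry it is supposed to find.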
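To illustrate why the ordering matters, here is a minimal sketch of a position lookup that only scans neighbouring entries. It is an assumption about how such a lookup could behave, not a copy of siegfried's base.Place(); the function name `place` and its signature are hypothetical, and the two slices are copied from the indexes quoted in this issue.

```go
package main

import "fmt"

// ideal is the expected ordering of the IDs index.
var ideal = []string{
	"fdd000019", "fdd000019",
	"fdd000022", "fdd000022", "fdd000022", "fdd000022", "fdd000022",
	"fdd000026", "fdd000026",
	"fdd000027", "fdd000027",
	"fdd000031", "fdd000031",
}

// actual is the ordering currently observed.
var actual = []string{
	"fdd000019", "fdd000022", "fdd000022", "fdd000022", "fdd000026",
	"fdd000027", "fdd000031", "fdd000019", "fdd000027", "fdd000022",
	"fdd000022", "fdd000031", "fdd000026",
}

// place returns the 1-based position of ids[idx] within its run of
// identical neighbours, and the length of that run. It stops scanning
// as soon as it meets a different ID, so entries for the same ID that
// are not adjacent are never counted.
// Hypothetical sketch; not siegfried's implementation.
func place(ids []string, idx int) (pos, total int) {
	id := ids[idx]
	before, after := 0, 0
	for i := idx - 1; i >= 0 && ids[i] == id; i-- {
		before++
	}
	for i := idx + 1; i < len(ids) && ids[i] == id; i++ {
		after++
	}
	return before + 1, before + after + 1
}

func main() {
	pos, total := place(ideal, 8) // second fdd000026 entry
	fmt.Printf("ideal:  signature %d/%d\n", pos, total)

	pos, total = place(actual, 12) // second fdd000026 entry
	fmt.Printf("actual: signature %d/%d\n", pos, total)
}
```

Under these assumptions the ideal ordering gives 2/2 for the second fdd000026 entry, while the observed ordering gives 1/1 for the same entry, because its sibling at index 4 is never reached.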
The primary impact here seems to be cosmetic. The byte pattern being used and the result returned are still accurate, but instead of:
basis : 'extension match ogg; byte match at 0, 6 (signature 2/2)'
we get:
basis : 'extension match ogg; byte match at 0, 6'
Without the (signature 2/2) value we cannot tell which pattern matched specifically, and we would like to know that so that we can audit the results in more detail.
---
siegfried : 1.8.0
scandate : 2020-06-07T21:38:21-04:00
signature : default.sig
created : 2020-06-07T21:35:35-04:00
identifiers :
- name : 'loc'
details : 'fddXML.zip (2016-12-13, DROID_SignatureFile_V96.xml, container-signature-20200121.xml)'
---
filename : 'ogg/fmt-203-signature-id-504.ogg'
filesize : 62
modified : 2020-06-07T15:57:34-04:00
errors :
matches :
- ns : 'loc'
id : 'fdd000026'
format : 'Ogg File Format'
full : 'Ogg File Format'
mime : 'application/ogg'
basis : 'extension match ogg; byte match at 0, 6'
warning :
I've attached some sample files to make recreating this a little easier.
- OGG skeleton: fmt-203-signature-id-504.zip
- Restricted FDD set, which is enough to recreate the issue without being the whole set: restricted-set-fddXML.zip
- OGG-only FDD record: ogg_fddXML.zip