It might be nice to fix this in a way that all implementations of Identifiers could benefit from, e.g. by adding a sort feature to parseable.go in the internal/identifier package.
Nice. I'm pretty confident that works as well. I've submitted your fix on the vanilla develop branch along with a proposed unit test for it. Hopefully that's not looking too bad.
When an LoC identifier is built with PRONOM the sources are added as follows:
Signatures and IDs are then returned to the identifier when it's built.
For the byte matcher this is done here.
OGG is a good example format here as the LoC identifier expects to have one LoC record with a byte-pattern and one PRONOM sequence that should match against it as well.
If we test against the OGG skeleton the ideal result is as follows:
The Place() function inside the identifier looks up the IDs index, and returns the position and total number of signatures for the matching pattern.
Again, the ideal for these indexes (I believe) should be as follows:
Note that all of the identifiers run contiguously within the slice, i.e. 0026 entries are next to each other, 0027 entries follow each other, etc.
Actual:
What we're seeing currently is:
Note that 0026 entries are 4th and last in the slice. Siegfried will stop looping through the indexes to calculate the position and total number before it finds the 2nd-n identifier it is supposed to find.
The primary impact here seems to be visual. The binary pattern being used and the result returned is still accurate, but instead of:
basis : 'extension match ogg; byte match at 0, 6 (signature 2/2)'
the result will be:
basis : 'extension match ogg; byte match at 0, 6'
Because the (signature 2/2) value is missing, we can't see which pattern matched specifically, which we'd like to know so that we can audit the results in more detail.
I've some sample files here to make recreating this a little easier.
OGG skeleton: fmt-203-signature-id-504.zip
Restricted FDD set, which is enough to recreate the issue without being the whole set: restricted-set-fddXML.zip
OGG only FDD record: ogg_fddXML.zip