
This issue was moved to a discussion.


Possible to determine EOF from BOF from basis? #194

Closed
ross-spencer opened this issue Jun 2, 2022 · 2 comments

Comments

@ross-spencer
Collaborator

Given this example from a TGA, 'extension match tga; byte match at 4261283, 18', is Siegfried saying it read 4 MB to reach the last 18 bytes of the file, or did it seek to 18 bytes from the end of the file? TGA is identified by TRUEVISION-XFILE.<null> in the last 18 bytes of the file, so only 18 bytes are needed.

Does it matter? I imagined the max offsets creating a small window on either side of the file payload, e.g. at most 1000 bytes from BOF or at most 500 bytes from EOF. Does establishing this through SF alone require knowing which values are BOF offsets and which are EOF offsets?

```yaml
filename : 'MARBLES.TGA'
filesize : 4261301
modified : 2022-06-02T13:14:16+02:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'fmt/402'
    format  : 'Truevision TGA Bitmap'
    version : '2.0'
    mime    :
    basis   : 'extension match tga; byte match at 4261283, 18'
    warning :
```
@richardlehane
Owner

It didn't read 4 MB to match. This pattern would have been found during an EOF scan; it's just that in the basis field all offsets are reported as BOF offsets. The second value (18) is the length of the pattern match. Sometimes a signature requires multiple patterns to match, e.g. a BOF pattern and an EOF pattern. In those cases you'll get a list of offset, length pairs, e.g. byte match at [[0 14] [1822 2]].
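A downstream tool that wants these offsets has to parse both forms described above out of the basis string. The Go sketch below is a hypothetical parser: `parseBasis`, `Match`, and the regular expressions are illustrative names, and it only recognises the two wordings quoted in this thread ("byte match at 4261283, 18" and "byte match at [[0 14] [1822 2]]"). Other basis wordings Siegfried may emit are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Match is one (offset, length) pair from a basis string.
// Per the thread, offsets are always reported relative to BOF.
type Match struct {
	Offset int64
	Length int64
}

var (
	// Single-pattern form, e.g. "byte match at 4261283, 18".
	singleRe = regexp.MustCompile(`^byte match at (\d+), (\d+)`)
	// One "[offset length]" pair inside the list form,
	// e.g. "byte match at [[0 14] [1822 2]]".
	pairRe = regexp.MustCompile(`\[(\d+) (\d+)\]`)
)

func toMatch(off, length string) (Match, error) {
	o, err := strconv.ParseInt(off, 10, 64)
	if err != nil {
		return Match{}, err
	}
	l, err := strconv.ParseInt(length, 10, 64)
	if err != nil {
		return Match{}, err
	}
	return Match{Offset: o, Length: l}, nil
}

// parseBasis extracts the byte-match pairs from a basis string such as
// "extension match tga; byte match at 4261283, 18". It returns nil if
// the basis contains no byte match.
func parseBasis(basis string) ([]Match, error) {
	for _, part := range strings.Split(basis, ";") {
		part = strings.TrimSpace(part)
		if !strings.HasPrefix(part, "byte match at ") {
			continue // e.g. "extension match tga"
		}
		if m := singleRe.FindStringSubmatch(part); m != nil {
			match, err := toMatch(m[1], m[2])
			if err != nil {
				return nil, err
			}
			return []Match{match}, nil
		}
		var out []Match
		for _, m := range pairRe.FindAllStringSubmatch(part, -1) {
			match, err := toMatch(m[1], m[2])
			if err != nil {
				return nil, err
			}
			out = append(out, match)
		}
		if len(out) == 0 {
			return nil, fmt.Errorf("unrecognised byte match: %q", part)
		}
		return out, nil
	}
	return nil, nil
}

func main() {
	fmt.Println(parseBasis("extension match tga; byte match at 4261283, 18"))
	// [{4261283 18}] <nil>
	fmt.Println(parseBasis("byte match at [[0 14] [1822 2]]"))
	// [{0 14} {1822 2}] <nil>
}
```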

You can of course convert it to an EOF offset by deducting it from the file size. If you want to establish a max window of BOF/EOF offsets, perhaps you could convert to EOF and assume it is an EOF offset if lower?
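The conversion and the "EOF if lower" heuristic can be sketched in a few lines of Go. `classify` and `Position` are hypothetical names. `Offset` for an EOF guess is the file size minus the BOF offset, as suggested above. `EndToEOF` additionally subtracts the match length, on the assumption that EOF offsets should be measured PRONOM-style, from the end of the sequence to the end of the file; for the TGA footer that value is 0.

```go
package main

import "fmt"

// Position is a guess at where a single match sits in the file.
type Position struct {
	Anchor   string // "BOF" or "EOF"
	Offset   int64  // distance between the chosen anchor and the start of the match
	EndToEOF int64  // bytes between the end of the match and EOF (assumed PRONOM-style EOF offset)
}

// classify applies the heuristic from this thread: convert the BOF offset
// reported in the basis field to an EOF offset by subtracting it from the file
// size, and treat the match as EOF-anchored if that value is lower.
func classify(bofOffset, length, size int64) Position {
	fromEOF := size - bofOffset // distance from the start of the match to EOF
	endToEOF := fromEOF - length
	if fromEOF < bofOffset {
		return Position{Anchor: "EOF", Offset: fromEOF, EndToEOF: endToEOF}
	}
	return Position{Anchor: "BOF", Offset: bofOffset, EndToEOF: endToEOF}
}

func main() {
	// Values from the MARBLES.TGA output in this thread.
	fmt.Printf("%+v\n", classify(4261283, 18, 4261301))
	// {Anchor:EOF Offset:18 EndToEOF:0}

	// Hypothetical 5000-byte file with a match at [0 14].
	fmt.Printf("%+v\n", classify(0, 14, 5000))
	// {Anchor:BOF Offset:0 EndToEOF:4986}
}
```

This is a guess about which scan found the match, not knowledge of it: a BOF pattern that sits past the midpoint of a small file, or a variable-offset pattern, would be labelled EOF.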

@ross-spencer
Collaborator Author

> You can of course convert it to an EOF offset by deducting it from the file size. If you want to establish a max window of BOF/EOF offsets, perhaps you could convert to EOF and assume it is an EOF offset if lower?

And now I'm having déjà vu! There must be something in previous emails or demystify issues about this.

I'll need to improve the subroutine here: it doesn't handle this case, a single EOF sequence.

Ideally, I'd prefer the indicator to be provided by a tool like Siegfried; it might be something for DROID to decide on too (related to digital-preservation/droid#773). But conceptually, I suspect what I'd like is wrong. I'm essentially thinking about this in terms of signature development, where the signature explicitly tells Siegfried/DROID that a sequence is BOF or EOF, and maybe those should be kept distinct so they can be optimized too. As a consumer of that information, though, Siegfried's concerns, and what it displays back to the user, are different. (But because we can put a heuristic into a tool like demystify or A.N. Other tool to work it out, it doesn't matter!)

Okay, thanks Richard. I'm thinking out loud here, and caught between two different issues. I'll close this and keep thinking about it.

Repository owner locked and limited conversation to collaborators Jun 3, 2022
@richardlehane richardlehane converted this issue into discussion #196 Jun 3, 2022

