Function to return reader of nested member #119
Hi! I am looking to implement a function that would ideally leverage the recursive unpacking and decompression this package already does. The function signature would look something like this:
func ReadMember(path string, member string) (io.ReadCloser, error)
For example, given a (contrived) archive:
The use case is to dynamically read out portions of an archive given the semantics of Siegfried.
A more general alternative would be a function that walks the members of an input, though it would need to be limited to the leaves of the hierarchy.
Do you have a suggestion on how to implement this given the components available in this package?
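To make the "leaves only" walk concrete, here is a rough sketch that uses just the Go standard library, not siegfried's own unpacking code. It assumes ZIP containers only, detects a nested archive by its leading magic bytes (a crude stand-in for real format identification, and one that would also descend into ZIP-based formats such as DOCX), buffers each member in memory, and joins nested names with "#". The names walkLeaves, buildZip, and entry are hypothetical.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// entry is one file to place in a demo archive (hypothetical helper type).
type entry struct {
	name string
	data []byte
}

// buildZip creates an in-memory ZIP holding the given entries, in order.
func buildZip(entries ...entry) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, e := range entries {
		w, err := zw.Create(e.name)
		if err != nil {
			panic(err)
		}
		if _, err := w.Write(e.data); err != nil {
			panic(err)
		}
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// walkLeaves calls fn for every non-archive member reachable from a ZIP held
// in memory, descending into nested ZIPs. Paths join member names with "#",
// a separator assumed for this sketch.
func walkLeaves(archive []byte, prefix string, fn func(path string, content []byte) error) error {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return err
	}
	for _, f := range zr.File {
		if f.FileInfo().IsDir() {
			continue // directory entries carry no data
		}
		rc, err := f.Open()
		if err != nil {
			return err
		}
		data, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return err
		}
		path := prefix + f.Name
		// Crude container test: the ZIP local file header magic.
		if bytes.HasPrefix(data, []byte("PK\x03\x04")) {
			if err := walkLeaves(data, path+"#", fn); err != nil {
				return err
			}
			continue
		}
		if err := fn(path, data); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	inner := buildZip(entry{"deep.txt", []byte("x")})
	outer := buildZip(entry{"a.txt", []byte("y")}, entry{"inner.zip", inner})
	err := walkLeaves(outer, "", func(path string, content []byte) error {
		fmt.Printf("%s (%d bytes)\n", path, len(content))
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```

The callback never sees inner.zip itself, only the files inside it, which is the "leaves only" behaviour described above.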
Thanks Richard. I should have stated this up front: yes, file format detection that relies specifically on the standards is necessary for my use case. My team and I are building a data catalog and archive for biomedical data. We currently use Archivematica as a pipeline to prepare archive packages, and it uses PRONOM as its file format standard.
I need to look at the roy tool more, but an unrelated question is how to add support for "unofficial" or non-registered file formats. We have genomic data files that we are cataloging, such as VCF and FASTQ files. Am I right in assuming that I can create a custom signature file that includes a detection mechanism for these formats?
I've had a look at this again this morning and it is definitely possible, but unfortunately I think that at the moment any solution would be pretty ugly and involve a lot of copy/paste of non-exported bits of the siegfried codebase: specifically the decompress.go file within the cmd/sf package and the internal/siegreader package (which is what you'd need to get an io.Reader). I'm currently working on a new release and will look at either exporting some of this stuff so it can be used externally or creating a helper function for this use case within the top-level siegfried package.
Re. a custom signature file - yes you'd use the roy tool for this. See this wiki page for instructions.
Basically the steps are:
You can invoke sf with custom signatures using the -sig flag. E.g.
Hi Byron - v1.7.9, released today, now exports a decompress package that you should be able to use for your purposes. I left the siegreader package internal but exposed a public Reader() method on the siegreader.Buffer type. You can already get Buffers from the main siegfried package, and with this new method you can now create io.Readers from those Buffers.
See this gist for a worked example along the lines of the ReadMember func you proposed.
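The gist itself is not reproduced in this thread. Purely as an illustration of the idea, here is a minimal sketch that uses only the Go standard library rather than siegfried's decompress package. It assumes ZIP containers only, buffers nested archives in memory, and addresses nested members by joining names with "#" (a convention chosen for this sketch). ReadMember follows the signature proposed in the issue; readMember, zipOf, and memberText are hypothetical helpers.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"os"
	"strings"
)

// ReadMember reads the archive at path and opens the named member.
// Nested members are written as "inner.zip#notes.txt" (assumed convention).
func ReadMember(path string, member string) (io.ReadCloser, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return readMember(data, member)
}

// readMember does the work on an in-memory ZIP so it can recurse.
func readMember(archive []byte, member string) (io.ReadCloser, error) {
	name, rest, nested := strings.Cut(member, "#")
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	for _, f := range zr.File {
		if f.Name != name {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		if !nested {
			return rc, nil // leaf member: hand the stream to the caller
		}
		// Nested archive: ZIP needs random access, so buffer it first.
		inner, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		return readMember(inner, rest)
	}
	return nil, fmt.Errorf("member %q not found", name)
}

// zipOf builds an in-memory ZIP with a single member, for the demo below.
func zipOf(name string, content []byte) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(name)
	if err != nil {
		panic(err)
	}
	if _, err := w.Write(content); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// memberText reads a member fully and returns it as a string.
func memberText(archive []byte, member string) (string, error) {
	rc, err := readMember(archive, member)
	if err != nil {
		return "", err
	}
	defer rc.Close()
	b, err := io.ReadAll(rc)
	return string(b), err
}

func main() {
	outer := zipOf("inner.zip", zipOf("notes.txt", []byte("hello")))
	text, err := memberText(outer, "inner.zip#notes.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

A version built on siegfried would replace the ZIP-only lookup with the exported decompress package and siegreader Buffers, which cover more container and compression formats than this sketch does.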