Where path is the path to the source file and member is the name of the member within that file whose byte stream is returned in the io.ReadCloser. For now, member would be the filename returned by Siegfried, which delimits paths with # when denoting nested files.
For example, given a (contrived) archive:
Calling ReadMember("foo.zip", "foo.zip#dir/bar.zip#baz.csv.gz#baz.csv") would return an io.ReadCloser yielding the decompressed contents of baz.csv.
The use case is to dynamically read out portions of an archive given the semantics of Siegfried.
A more general alternative would be a function that walks the members of an input, though it would need to be limited to the leaves of the hierarchy.
Do you have a suggestion on how to implement this given the components available in this package?
Thanks for the issue. I'll have a think about this. But just to clarify - do you need siegfried at all for its file format ID functionality, or are you just trying to replicate some of the ancillary file walk/unpacking functionality from the command line tool? In other words, if you know the member path ahead of time, do you also know ahead of time which formats you need to unpack?
Thanks Richard. I should have stated this up front: yes, file format detection that relies on the standards is necessary for my use case. My team and I are building a data catalog and archive for biomedical data. We are currently using Archivematica as a pipeline to prepare archive packages, and it uses PRONOM as the file format standard.
I need to look at the Roy tool more, but an unrelated question is how to add support for "unofficial" or non-registered file formats. We have genomic data files that we are cataloging such as VCF and FASTQ files. My assumption is that I can create a custom signature file that includes a detection mechanism for these formats?
I've had a look at this again this morning and it is definitely possible, but unfortunately I think at the moment any solution would be pretty ugly and involve a lot of copy/paste of non-exported bits of the siegfried codebase: specifically the decompress.go file within the cmd/sf package and the internal/siegreader package (which is what you'd need to get an io.Reader). I'm currently working on a new release and will look at either exporting some of this stuff so it can be used externally or creating a helper function for this use case within the top level siegfried package.
Re. a custom signature file - yes you'd use the roy tool for this. See this wiki page for instructions.
Hi Byron - v1.7.9, released today, now exports a decompress package that you should be able to use for your purposes. I left the siegreader package internal but exposed a public Reader() method on the siegreader.Buffer type. You can already get Buffers from the main siegfried package, and with this new method you can now create io.Readers from those Buffers.
See this gist for a worked example along the lines of the ReadMember func you proposed.