Function to return reader of nested member #119
Hi Byron |
Thanks Richard. I should have stated this up front, yes the file format detection specifically relying on the standards is necessary for my use case. My team and I are building a data catalog and archive for biomedical data. We are currently using Archivematica as a pipeline to prepare archive packages and it uses PRONOM as the file format standard. I need to look at the Roy tool more, but an unrelated question is how to add support for "unofficial" or non-registered file formats. We have genomic data files that we are cataloging such as VCF and FASTQ files. My assumption is that I can create a custom signature file that includes a detection mechanism for these formats? |
I've had a look at this again this morning and it is definitely possible, but unfortunately I think at the moment any solution would be pretty ugly and involve a lot of copy/paste of non-exported bits of the siegfried codebase: specifically the decompress.go file within the cmd/sf package and the internal/siegreader package (which is what you'd need to get an io.Reader). I'm currently working on a new release and will look at either exporting some of this stuff so it can be used externally or creating a helper function for this use case within the top-level siegfried package. Re. a custom signature file: yes, you'd use the roy tool for this. See this wiki page for instructions. Basically the steps are:
You can invoke sf with custom signatures using the -sig flag. E.g. |
Thank you. Having looked through the codebase before, I was going to start there anyway. I will trace my way back from the command entrypoint. re: sig. Great I will try this out. I appreciate it. |
Hi Byron - v1.7.9, released today, now exports a decompress package that you should be able to use for your purposes. I left the siegreader package internal but exposed a public Reader() method on the siegreader.Buffer type. You can already get Buffers from the main siegfried package, and with this new method you can now create io.Readers from those Buffers. See this gist for a worked example along the lines of the ReadMember func you proposed. |
@richardlehane Thank you, this looks great and I really appreciate you adding support for it. I hope to give it a try tomorrow. |
Hi! I am looking to implement a function that would ideally leverage the recursive unpacking and decompression this package already does. The function signature would look something like this:
Where `path` would be the path to the source file and `member` would be the name of the member within the file whose byte stream will be returned in the `io.ReadCloser`. For now, `member` would be the `filename` returned by Siegfried that delimits paths by `#` when denoting nested files.

For example, given a (contrived) archive:
Calling `ReadMember("foo.zip", "foo.zip#dir/bar.zip#baz.csv.gz#baz.csv")` would return an `io.ReadCloser` that would be the decompressed contents of `baz.csv`.

The use case is to dynamically read out portions of an archive given the semantics of Siegfried.

A more general alternative would be a function that walks the members of an input, but it would need to be limited to leaves in the hierarchy.
Do you have a suggestion on how to implement this given the components available in this package?