Walk over nested Archives #164

Closed
alixaxel opened this issue Apr 28, 2019 · 2 comments

@alixaxel
Contributor

What would you like to have changed?

I'd like to have the ability to Walk over io.ReadClosers and not just file paths.

Why is this feature a useful, necessary, and/or important addition to this project?

Although I understand this is somewhat of a niche need, it's conceivable that a ZIP file could contain more ZIP files nested within it. Providing such a method would make it possible to inspect the contents of the inner ZIP/Tar/Rar, something along the lines of:

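err := archiver.Walk(path, func(file archiver.File) error {
  if filepath.Ext(file.Name()) == ".zip" {
    // WalkDeep is the proposed function; it does not exist in the package yet.
    // Returning its error propagates failures from the inner walk.
    return archiver.WalkDeep(file.ReadCloser, func(inner archiver.File) error {
      return nil
    })
  }

  // non-archive file
  return nil
})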

I think (but might be wrong) it's not possible to read just the inner ZIP's EOCD (end of central directory record) and correctly map each entry to the correct byte offsets, because of the potentially several layers of compression.

@torgabor

I think this is a great idea, not just for the nested reads you mentioned, but for any case where you have the archive not as a file on disk, but abstracted away as an io.Reader. The only difficulty with the implementation seems to be that the code currently uses the file extension to infer the type of file, so implementing this feature would require header-based format autodetection.

@mholt
Owner

mholt commented Jan 2, 2022

I think #302, which will soon become v4 of this package, allows this because every file that you walk can be handled with an arbitrary callback function, and that function could simply start another walk. I rewrote the entire thing and got rid of the reliance on file extensions as well (except for optionally matching unknown files to formats, which can use the extension, peeking the stream, or both). We can reopen this issue if it remains unresolved and needs more discussion.

@mholt mholt closed this as completed Jan 2, 2022