Rewrite entire package #99

Merged: mholt merged 19 commits on Nov 7, 2018

mholt (Owner) commented Nov 6, 2018

Alright, this little library has become quite an important tool for me over the last couple years and it's time to give it some TLC.

For background, see #90 (and frankly all other open and even closed issues).

This change supersedes all open PRs. Any open PRs will need to be reconsidered and, if still relevant, rebased.

Closes #90. <-- main issue
Closes #22.
Closes #39.
Closes #58.
Closes #61.
Closes #68.
Closes #80.
Closes #97.

Format progress

  • .zip
  • .tar
  • .gz
  • .bz2
  • .xz
  • .lz4
  • .sz
  • .tar.gz
  • .tar.bz2
  • .tar.xz
  • .tar.lz4
  • .tar.sz
  • .rar (extract-only)

Other features progress

  • CLI
  • Port existing tests

Highlights

  • Archivers are created like this (most fields are optional; there are also DefaultZip, etc., with sane defaults that are ready to use):
z := archive.Zip{
	CompressionLevel:       flate.DefaultCompression,
	MkdirAll:               true,
	SelectiveCompression:   true,
	ContinueOnError:        false,
	OverwriteExisting:      false,
	ImplicitTopLevelFolder: true,
}
  • They can create an archive like this:
err := z.Archive([]string{"file1.txt", "../file2.txt"}, "test.zip")
  • When creating an archive, the file extension MUST match the archive type.

  • They can open and extract whole archives like this:

err = z.Unarchive("test.zip", "/Users/matt/Desktop")
  • They can traverse/walk/inspect archives without extracting them:
err = z.Walk("test.zip", func(f archive.File) error {
	zfh, ok := f.Header.(zip.FileHeader)
	if ok {
		fmt.Println(zfh.Name)
	}
	return nil
})
  • Inside a walk function, you can even read the contents of files if you want (yep, that's right, they're io.ReadClosers! - they get closed for you when the walk function returns):
_, err = io.Copy(w, f)
  • They can extract single files (or folders, recursively) from an archive:
err = z.Extract("test.zip", "testdata/myfile.txt", "/Users/matt/Desktop")
  • They can even read and write archives in a streaming fashion, file-by-file, to any writer or from any reader:
// STREAMING WRITE EXAMPLE:
err = z.Create(out)
if err != nil {
	return err
}
defer z.Close()
// ... open a file (or any io.ReadCloser) and get its os.FileInfo,
// for example with file.Stat(), into fileInfo ...
err = z.Write(archive.File{
	FileInfo: archive.FileInfo{
		FileInfo:   fileInfo,
		CustomName: "name/in/archive.txt",
	},
	ReadCloser: file,
})

// STREAMING READ EXAMPLE:
// (can be any io.Reader in general, but zip requires io.ReaderAt and knowing the size)
err = z.Open(file, fileInfo.Size())
if err != nil {
	return err
}
defer z.Close()

for {
	f, err := z.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		return err
	}
	// you can access the header now through f.Header or
	// read the contents as f is also an io.ReadCloser, then
	// close f when finished
	if err := f.Close(); err != nil {
		return err
	}
}

It should work similarly for the other archive formats.
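
The note in the streaming read example about zip needing io.ReaderAt and a size can be illustrated with the standard library alone. The sketch below does not use this package; buildZip, readZip and zipRoundTrip are hypothetical helper names. It shows that zip.NewReader takes a random-access reader plus the total length, because the zip central directory is stored at the end of the archive.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// buildZip writes name -> content pairs into an in-memory zip archive.
func buildZip(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, content := range files {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(w, content); err != nil {
			return nil, err
		}
	}
	// Close writes the central directory; without it the archive is unreadable.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readZip returns the contents of every entry, keyed by entry name.
// It needs random access and the total size, not just a forward-only reader.
func readZip(r io.ReaderAt, size int64) (map[string]string, error) {
	zr, err := zip.NewReader(r, size)
	if err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out[f.Name] = string(data)
	}
	return out, nil
}

// zipRoundTrip builds an archive in memory and reads it straight back.
func zipRoundTrip(files map[string]string) (map[string]string, error) {
	data, err := buildZip(files)
	if err != nil {
		return nil, err
	}
	return readZip(bytes.NewReader(data), int64(len(data)))
}

func main() {
	got, err := zipRoundTrip(map[string]string{"a.txt": "alpha", "dir/b.txt": "beta"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(got), got["a.txt"], got["dir/b.txt"])
}
```

Formats such as tar can be read from a plain io.Reader because their headers precede each entry, which is why only zip carries this extra requirement.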

Single-file compression and decompression are possible, too.

This PR includes a fully functional CLI. One feature I would still like to add is the ability to stream compression and decompression through stdin and stdout, but that can come later; it should be very easy to add.
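
As a rough idea of what stream-based compression could look like, here is a minimal sketch built only on the standard library's compress/gzip. It is not the CLI's implementation; compress, decompress and gzipRoundTrip are hypothetical names. Because both functions accept any io.Reader and io.Writer, a command-line tool could pass os.Stdin and os.Stdout to them.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// compress reads everything from r and writes a gzip stream to w.
func compress(w io.Writer, r io.Reader) error {
	zw := gzip.NewWriter(w)
	if _, err := io.Copy(zw, r); err != nil {
		zw.Close()
		return err
	}
	// Close flushes buffered data and writes the gzip footer.
	return zw.Close()
}

// decompress reads a gzip stream from r and writes the plain bytes to w.
func decompress(w io.Writer, r io.Reader) error {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return err
	}
	defer zr.Close()
	_, err = io.Copy(w, zr)
	return err
}

// gzipRoundTrip compresses s into a buffer and decompresses it again.
func gzipRoundTrip(s string) (string, error) {
	var packed, plain bytes.Buffer
	if err := compress(&packed, strings.NewReader(s)); err != nil {
		return "", err
	}
	if err := decompress(&plain, &packed); err != nil {
		return "", err
	}
	return plain.String(), nil
}

func main() {
	out, err := gzipRoundTrip("hello, archiver")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```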

We can potentially implement options to configure symlink handling and even scan for zip-slip patterns and prevent an ugly extraction (but we cannot fix zip-slip implicitly, unfortunately).
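
A zip-slip scan could, for example, check each entry name against the destination before anything is written. The sketch below is one possible approach, not something this PR implements; safeJoin is a hypothetical helper. It only catches ".." path traversal and does not look at symlinks, which would need separate handling.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins an archive entry name onto dest and rejects any result
// that would land outside dest (the "zip-slip" pattern).
func safeJoin(dest, name string) (string, error) {
	// Join also cleans the path, resolving any ".." segments lexically.
	target := filepath.Join(dest, name)
	rel, err := filepath.Rel(dest, target)
	if err != nil {
		return "", err
	}
	// A relative path that is ".." or starts with "../" has escaped dest.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("illegal entry path %q escapes %q", name, dest)
	}
	return target, nil
}

func main() {
	for _, name := range []string{"docs/a.txt", "../../etc/passwd", "a/../b.txt"} {
		p, err := safeJoin("/tmp/out", name)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", p)
	}
}
```

Rejecting the entry, rather than silently rewriting its path, matches the point above that zip-slip can be detected and prevented but not fixed implicitly.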

archive/tar.go Outdated
}

// Compile-time checks to ensure type implements desired interfaces.
var (

komuw commented Nov 6, 2018

I learn something new every day.
thanks for this. 👍
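
For readers who have not seen the idiom that comment refers to, here is a minimal, generic sketch. The countingWriter type is hypothetical and unrelated to the types in archive/tar.go. Assigning a typed nil pointer to a blank variable of an interface type makes the compiler verify that the type implements the interface, at no runtime cost.

```go
package main

import (
	"fmt"
	"io"
)

// countingWriter is a hypothetical type used only for illustration.
// It discards its input and counts the bytes it was given.
type countingWriter struct{ n int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += len(p)
	return len(p), nil
}

func (c *countingWriter) Close() error { return nil }

// Compile-time checks: if *countingWriter ever stops satisfying one of
// these interfaces, the build fails here instead of at a distant call site.
var (
	_ io.Writer      = (*countingWriter)(nil)
	_ io.Closer      = (*countingWriter)(nil)
	_ io.WriteCloser = (*countingWriter)(nil)
)

func main() {
	c := &countingWriter{}
	fmt.Fprint(c, "hello")
	fmt.Println(c.n)
}
```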

mholt (Owner, Author) commented Nov 6, 2018

I decided to leave the package name the same, rather than introduce a different import path, etc.

However, the command name is now "arc" (I know it's not 100% unique or original -- but I didn't like typing out archiver so often).

This will be Archiver 3.0.

@mholt mholt changed the title Rewrite entire package (WIP - not ready for merge) Rewrite entire package (needs review) Nov 6, 2018
The archivers' walks now skip the output archive if it is contained in
one of the source directory trees. Previously, a zip archive would keep
growing until it filled the disk, and a tar archive would include
about 10 KB of itself.
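
The idea in this commit message can be sketched with the standard library as follows. This is an illustration under assumptions, not the commit's code: listSources and demoSkipOutput are hypothetical names, and paths are compared as absolute strings, which is a simplification (os.SameFile would be more robust against symlinks and hard links).

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// listSources walks root and returns every non-directory entry except
// the archive being written (output).
func listSources(root, output string) ([]string, error) {
	absOut, err := filepath.Abs(output)
	if err != nil {
		return nil, err
	}
	var files []string
	err = filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.IsDir() {
			return nil
		}
		absPath, err := filepath.Abs(path)
		if err != nil {
			return err
		}
		if absPath == absOut {
			return nil // never feed the archive into itself
		}
		files = append(files, path)
		return nil
	})
	return files, err
}

// demoSkipOutput creates a temporary tree holding one source file and the
// output archive, then returns the base names that listSources selects.
func demoSkipOutput() ([]string, error) {
	dir, err := os.MkdirTemp("", "walkskip")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	out := filepath.Join(dir, "out.zip")
	for _, p := range []string{filepath.Join(dir, "a.txt"), out} {
		if err := os.WriteFile(p, []byte("x"), 0o600); err != nil {
			return nil, err
		}
	}
	files, err := listSources(dir, out)
	if err != nil {
		return nil, err
	}
	names := make([]string, len(files))
	for i, f := range files {
		names[i] = filepath.Base(f)
	}
	return names, nil
}

func main() {
	names, err := demoSkipOutput()
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```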

Tar now closes the Reader/Writer wrapper last, after the inner
tar writer/reader has been closed. Otherwise the result is a corrupted
archive that archiver can open but the OS cannot.
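
The close ordering described in this commit message can be shown with the standard library's archive/tar and compress/gzip. This is a sketch of the general principle, not the commit's code; writeTarGz and readFirst are hypothetical names. The tar writer's Close emits the end-of-archive blocks into the gzip writer, so the gzip writer has to be closed afterwards for that trailer to reach the output.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// writeTarGz builds an in-memory .tar.gz that contains a single file.
func writeTarGz(name string, content []byte) ([]byte, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf) // outer wrapper
	tw := tar.NewWriter(gw)    // inner tar writer

	hdr := &tar.Header{
		Typeflag: tar.TypeReg,
		Name:     name,
		Mode:     0o644,
		Size:     int64(len(content)),
	}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write(content); err != nil {
		return nil, err
	}
	// Inner first: tar's Close writes the end-of-archive blocks into gw.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	// Outer last: gzip's Close flushes everything, including that trailer.
	if err := gw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readFirst returns the name and content of the first entry in a .tar.gz.
func readFirst(data []byte) (string, string, error) {
	gr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return "", "", err
	}
	defer gr.Close()
	tr := tar.NewReader(gr)
	hdr, err := tr.Next()
	if err != nil {
		return "", "", err
	}
	body, err := io.ReadAll(tr)
	if err != nil {
		return "", "", err
	}
	return hdr.Name, string(body), nil
}

func main() {
	data, err := writeTarGz("hello.txt", []byte("hi there"))
	if err != nil {
		panic(err)
	}
	name, body, err := readFirst(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, body)
}
```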
# Conflicts:
#	archiver_test.go
#	bz2.go
#	cmd/archiver/main.go
#	gz.go
@mholt mholt removed the help wanted label Nov 7, 2018
@mholt mholt self-assigned this Nov 7, 2018
@mholt mholt changed the title Rewrite entire package (needs review) Rewrite entire package Nov 7, 2018
@mholt mholt merged commit c8b6307 into master Nov 7, 2018
sameersbn pushed a commit to sameersbn/kube-prod-runtime that referenced this pull request Nov 12, 2018
anguslees pushed a commit to vmware-archive/kube-prod-runtime that referenced this pull request Nov 12, 2018
sameersbn pushed a commit to sameersbn/kube-prod-runtime that referenced this pull request Nov 13, 2018
see mholt/archiver#99

(cherry picked from commit e34ed02)
Signed-off-by: Sameer Naik <sameer@bitnami.com>
@mholt mholt deleted the rewrite branch November 16, 2018 02:05