Explicitly zero fields instead of overwriting.
klauspost committed Nov 6, 2016
1 parent 9c4b0f5 commit 56cfeba
Showing 2 changed files with 49 additions and 27 deletions.
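The commit replaces a wholesale `*z = Writer{...}` assignment in `init` (see the gzip.go hunk below) with explicit per-field resets, so the `sync.Pool` fields and the buffers they cache survive a reset. A minimal, hypothetical sketch of the difference; the `writer` type, field names, and buffer size here are illustrative, not pgzip's own:

```go
package main

import (
	"fmt"
	"sync"
)

// writer is a stand-in for a type that caches buffers in a sync.Pool,
// loosely mirroring the situation in pgzip's Writer.
type writer struct {
	buf  []byte
	pool sync.Pool
}

// resetByOverwrite replaces the whole struct; the pool is replaced too,
// so any buffers it had accumulated are dropped and must be reallocated.
// (go vet also warns about copying a sync.Pool this way.)
func (w *writer) resetByOverwrite() {
	*w = writer{}
}

// resetByZeroing clears only the fields that need resetting and leaves
// the pool alone, installing its New function once.
func (w *writer) resetByZeroing() {
	w.buf = nil
	if w.pool.New == nil {
		w.pool.New = func() interface{} {
			return make([]byte, 0, 64<<10)
		}
	}
}

func main() {
	w := &writer{}
	w.resetByZeroing()
	b := w.pool.Get().([]byte) // allocated by New
	w.pool.Put(b[:0])          // handed back for reuse
	w.resetByZeroing()         // the pool, and typically its buffer, survive
	fmt.Println("reused buffer capacity:", cap(w.pool.Get().([]byte)))
}
```

The lazy `if pool.New == nil` installation mirrors how the commit handles `dictFlatePool` in the hunk below.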
33 changes: 27 additions & 6 deletions README.md
@@ -3,15 +3,26 @@ pgzip

Go parallel gzip compression/decompression. This is a fully gzip compatible drop in replacement for "compress/gzip".

-This will split compression into blocks that are compressed in parallel. This can be useful for compressing big amounts of data. The output is a standard gzip file.
+This will split compression into blocks that are compressed in parallel.
+This can be useful for compressing big amounts of data. The output is a standard gzip file.

-The gzip decompression is modified so it decompresses ahead of the current reader. This means that reads will be non-blocking if the decompressor can keep ahead of your code reading from it. CRC calculation also takes place in a separate goroutine.
+The gzip decompression is modified so it decompresses ahead of the current reader.
+This means that reads will be non-blocking if the decompressor can keep ahead of your code reading from it.
+CRC calculation also takes place in a separate goroutine.

-You should only use this if you are (de)compressing big amounts of data, say **more than 1MB** at the time, otherwise you will not see any benefit, and it will likely be faster to use the internal gzip library.
+You should only use this if you are (de)compressing big amounts of data,
+say **more than 1MB** at the time, otherwise you will not see any benefit,
+and it will likely be faster to use the internal gzip library
+or [this package](https://github.com/klauspost/compress).

-It is important to note that this library creates and reads *standard gzip files*. You do not have to match the compressor/decompressor to get the described speedups, and the gzip files are fully compatible with other gzip readers/writers.
+It is important to note that this library creates and reads *standard gzip files*.
+You do not have to match the compressor/decompressor to get the described speedups,
+and the gzip files are fully compatible with other gzip readers/writers.

-A golang variant of this is [bgzf](https://godoc.org/github.com/biogo/hts/bgzf), which has the same feature, as well as seeking in the resulting file. The only drawback is a slightly bigger overhead compared to this and pure gzip. See a comparison below.
+A golang variant of this is [bgzf](https://godoc.org/github.com/biogo/hts/bgzf),
+which has the same feature, as well as seeking in the resulting file.
+The only drawback is a slightly bigger overhead compared to this and pure gzip.
+See a comparison below.

[![GoDoc][1]][2] [![Build Status][3]][4]

@@ -22,7 +33,14 @@ A golang variant of this is [bgzf](https://godoc.org/github.com/biogo/hts/bgzf),

Installation
====
-```go get github.com/klauspost/pgzip```
+```go get github.com/klauspost/pgzip/...```
+
+You might need to get/update the dependencies:
+
+```
+go get -u github.com/klauspost/compress
+go get -u github.com/klauspost/crc32
+```

Usage
====
@@ -36,6 +54,9 @@ with

# Changes

+* Oct 6, 2016: Fixed an issue if the destination writer returned an error.
+* Oct 6, 2016: Better buffer reuse, should now generate less garbage.
+* Oct 6, 2016: Output does not change based on write sizes.
* Dec 8, 2015: Decoder now supports the io.WriterTo interface, giving a speedup and less GC pressure.
* Oct 9, 2015: Reduced allocations by ~35 by using sync.Pool. ~15% overall speedup.
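
Per the README context above, pgzip is a drop-in replacement for "compress/gzip". A short, hedged usage sketch of compressing and then decompressing a file; the paths are illustrative and error handling is kept minimal:

```go
package main

import (
	"io"
	"log"
	"os"

	gzip "github.com/klauspost/pgzip" // drop-in for "compress/gzip"
)

func main() {
	in, err := os.Open("input.dat") // illustrative path
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	out, err := os.Create("input.dat.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// Compression: blocks are compressed in parallel; output is standard gzip.
	zw := gzip.NewWriter(out)
	if _, err := io.Copy(zw, in); err != nil {
		log.Fatal(err)
	}
	if err := zw.Close(); err != nil { // finishes the stream
		log.Fatal(err)
	}

	// Decompression: pgzip reads ahead of the caller in a separate goroutine.
	f, err := os.Open("input.dat.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	zr, err := gzip.NewReader(f)
	if err != nil {
		log.Fatal(err)
	}
	defer zr.Close()

	if _, err := io.Copy(io.Discard, zr); err != nil {
		log.Fatal(err)
	}
}
```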

43 changes: 22 additions & 21 deletions gzip.go
@@ -11,6 +11,7 @@ import (
	"hash"
	"io"
	"sync"
+	"time"

	"github.com/klauspost/compress/flate"
	"github.com/klauspost/crc32"
@@ -83,7 +84,7 @@ func (z *Writer) SetConcurrency(blockSize, blocks int) error {
	z.blockSize = blockSize
	z.results = make(chan result, blocks)
	z.blocks = blocks
-	z.dstPool = sync.Pool{New: func() interface{} { return make([]byte, 0, blockSize+(blockSize)>>2) }}
+	z.dstPool = sync.Pool{New: func() interface{} { return make([]byte, 0, blockSize+(blockSize)>>4) }}
	return nil
}
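
The hunk above shows `SetConcurrency(blockSize, blocks int) error` sizing the destination-buffer pool from the block size. A brief sketch of how a caller might tune it; the 1 MB block size and 16 concurrent blocks are arbitrary example values, not recommendations from the package:

```go
package main

import (
	"log"
	"os"

	"github.com/klauspost/pgzip"
)

func main() {
	out, err := os.Create("tuned.gz") // illustrative path
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	zw := pgzip.NewWriter(out)
	// 1 MB blocks, up to 16 blocks compressed concurrently (example values).
	if err := zw.SetConcurrency(1<<20, 16); err != nil {
		log.Fatal(err)
	}
	if _, err := zw.Write([]byte("hello pgzip")); err != nil {
		log.Fatal(err)
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}
}
```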

@@ -141,28 +142,28 @@ func (z *Writer) init(w io.Writer, level int) {
	} else {
		digest = crc32.NewIEEE()
	}

-	*z = Writer{
-		Header: Header{
-			OS: 255, // unknown
-		},
-		w: w,
-		level: level,
-		digest: digest,
-		pushedErr: make(chan struct{}, 0),
-		results: make(chan result, z.blocks),
-		blockSize: z.blockSize,
-		blocks: z.blocks,
-	}
-	z.dictFlatePool = sync.Pool{
-		New: func() interface{} {
+	z.Header = Header{OS: 255}
+	z.w = w
+	z.level = level
+	z.digest = digest
+	z.pushedErr = make(chan struct{}, 0)
+	z.results = make(chan result, z.blocks)
+	z.err = nil
+	z.closed = false
+	z.Comment = ""
+	z.Extra = nil
+	z.ModTime = time.Time{}
+	z.wroteHeader = false
+	z.currentBuffer = nil
+	z.buf = [10]byte{}
+	z.prevTail = nil
+	z.size = 0
+	if z.dictFlatePool.New == nil {
+		z.dictFlatePool.New = func() interface{} {
			f, _ := flate.NewWriterDict(w, level, nil)
			return f
-		},
+		}
	}
+	bs := z.blockSize
+	z.dstPool = sync.Pool{New: func() interface{} { return make([]byte, 0, bs+(bs)>>4) }}

}

// Reset discards the Writer z's state and makes it equivalent to the
@@ -463,7 +464,7 @@ func (z *Writer) Flush() error {

// UncompressedSize will return the number of bytes written.
// pgzip only, not a function in the official gzip package.
-func (z Writer) UncompressedSize() int {
+func (z *Writer) UncompressedSize() int {
	return z.size
}
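
The final hunk gives `UncompressedSize` a pointer receiver; per the doc comment in the hunk, it is a pgzip-only accessor for the number of uncompressed bytes written. A small illustrative use (the file name is hypothetical):

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/klauspost/pgzip"
)

func main() {
	out, err := os.Create("count.gz") // illustrative path
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	zw := pgzip.NewWriter(out)
	if _, err := zw.Write(make([]byte, 123456)); err != nil {
		log.Fatal(err)
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}
	// Number of uncompressed bytes fed to the writer.
	fmt.Println("uncompressed bytes:", zw.UncompressedSize())
}
```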

