Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/zip: Writer.Copy references entire Reader data #65499

Closed
jba opened this issue Feb 3, 2024 · 3 comments
Closed

archive/zip: Writer.Copy references entire Reader data #65499

jba opened this issue Feb 3, 2024 · 3 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@jba
Copy link
Contributor

jba commented Feb 3, 2024

Go version

go1.22-20240109-RC01 linux/amd64

Output of go env in your module/workspace:

GO111MODULE='on'
GOARCH='amd64'
GOBIN=''
GOCACHE='/usr/local/google/home/jba/.cache/go-build'
GOENV='/usr/local/google/home/jba/.config/go/env'
GOEXE=''
GOEXPERIMENT='fieldtrack,boringcrypto'
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/usr/local/google/home/jba/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/usr/local/google/home/jba/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org'
GOROOT='/usr/lib/google-golang'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/google-golang/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22-20240109-RC01 cl/597041403 +dcbe772469 X:fieldtrack,boringcrypto'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/usr/local/google/home/jba/repos/work/go.mod'
GOWORK='/usr/local/google/home/jba/repos/work/go.work'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1055306189=/tmp/go-build -gno-record-gcc-switches'

What did you do?

The following program is zipmem.go:

package main

import (
	"archive/zip"
	"bytes"
	"io"
	"log"
	"os"
	"path/filepath"
	"runtime"
	"runtime/pprof"
)

func main() {
	file := filepath.Join(runtime.GOROOT(), "lib/time/zoneinfo.zip")
	data, err := os.ReadFile(file)
	if err != nil {
		log.Fatal(err)
	}
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		log.Fatal(err)
	}
	zw := zip.NewWriter(io.Discard)
	if err := zw.Copy(zr.File[0]); err != nil {
		log.Fatal(err)
	}
	writeHeapProfile()
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}
}

func writeHeapProfile() {
	const filename = "heap.prof"
	f, err := os.Create(filename)
	if err != nil {
		log.Fatal(err)
	}
	runtime.GC()
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
	if err := f.Close(); err != nil {
		log.Fatal(err)
	}
}

Shell command:

> go build zipmem.go && ./zipmem && go tool pprof zipmem heap.prof

Inside pprof:

(pprof) top

What did you see happen?

Showing nodes accounting for 1761.95kB, 100% of 1761.95kB total
      flat  flat%   sum%        cum   cum%
 1024.17kB 58.13% 58.13%  1024.17kB 58.13%  archive/zip.(*Reader).init
  737.78kB 41.87%   100%   737.78kB 41.87%  os.ReadFile
         0     0%   100%  1024.17kB 58.13%  archive/zip.NewReader
         0     0%   100%  1761.95kB   100%  main.main
         0     0%   100%  1761.95kB   100%  runtime.main

What did you expect to see?

The memory profile is written after a GC should have freed up the zip.Reader. So it's surprising to see the memory allocated by os.ReadFile is still there.

I believe the reason is this line in Writer.Copy:

fw, err := w.CreateRaw(&f.FileHeader)

CreateRaw stores the FileHeader pointer in the Writer w. The code that populates the FileHeader takes care to ensure that it shares no memory with the zip.Reader from which it came. Unfortunately, FileHeader is embedded in File, which does refer to the underlying reader:

type File struct {
    FileHeader
    ...
    zipr         io.ReaderAt // the argument to NewReader
    ...
}

I believe the pointer to the embedded FileHeader causes the entire File struct to be retained, including the bytes used for the reader.


The problem can be worked around by copying the embedded FileHeader. Luckily, Writer.Copy is a convenience method whose functionality can be written outside the standard library:

func zipWriterCopy(w *zip.Writer, f *zip.File) error {
	r, err := f.OpenRaw()
	if err != nil {
		return err
	}
	fh := f.FileHeader
	fw, err := w.CreateRaw(&fh)
	if err != nil {
		return err
	}
	_, err = io.Copy(fw, r)
	return err
}

I believe Writer.Copy should be fixed in this way. I don't think the fix should be in Writer.CreateRaw because users may depend on the identity of its argument. But its documentation should be updated to alert users to the problem.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/560238 mentions this issue: archive/zip: reduce memory held by Writer.Copy

@seankhliao seankhliao added Performance NeedsFix The path to resolution is known, but the work has not been done. labels Feb 3, 2024
gopherbot pushed a commit that referenced this issue Feb 6, 2024
Make a copy of the argument File's FileHeader, and pass a pointer
to the copy to CreateRaw.

Passing the pointer directly causes the entire `File` to be referenced
by the receiver. The `File` includes a reference to the `ReaderAt`
underlying the `Reader`, so all its memory, which may be the entire
contents of the archive, is prevented from being garbage-collected.

Also, explain the issue in the doc comment for CreateRaw. We
cannot change its behavior because someone may depend on the
preserving the identity of its argument pointer.

For #65499.

Change-Id: Ieb4963a0ea30539d597547d3511accbd8c6b5c5a
Reviewed-on: https://go-review.googlesource.com/c/go/+/560238
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
Make a copy of the argument File's FileHeader, and pass a pointer
to the copy to CreateRaw.

Passing the pointer directly causes the entire `File` to be referenced
by the receiver. The `File` includes a reference to the `ReaderAt`
underlying the `Reader`, so all its memory, which may be the entire
contents of the archive, is prevented from being garbage-collected.

Also, explain the issue in the doc comment for CreateRaw. We
cannot change its behavior because someone may depend on the
preserving the identity of its argument pointer.

For golang#65499.

Change-Id: Ieb4963a0ea30539d597547d3511accbd8c6b5c5a
Reviewed-on: https://go-review.googlesource.com/c/go/+/560238
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
@seankhliao seankhliao added this to the Unplanned milestone Jul 13, 2024
@andrewmbenton
Copy link
Contributor

It looks like this was merged.

@ianlancetaylor
Copy link
Contributor

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

5 participants