Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/xml: Decoder allocates (and does not release) memory for every control <?...?> encountered #37211

Open
DrRibosome opened this issue Feb 13, 2020 · 1 comment
Milestone

Comments

@DrRibosome
Copy link

@DrRibosome DrRibosome commented Feb 13, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.7 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jcrawford/.cache/go-build"
GOENV="/home/jcrawford/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jcrawford/git/gowork"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/jcrawford/opt/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/jcrawford/opt/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/jcrawford/git/djtest/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build917252283=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Parse an xml file that includes multiple <?xml version="1.0" encoding="ISO-8859-1" ?> control tags in the same file.

For each encountered tag, a new reader will be created and will not be released (see below for details).

It is possible to trigger this with a degenerate xml consisting of only repeated tags. As an example, I created a file of ~400kb in size containing 9030 lines of:

$ head bad.xml 
<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml version="1.0" encoding="ISO-8859-1" ?>
package main

import (
	"encoding/xml"
	"golang.org/x/net/html/charset"
	"io"
	"os"
	"runtime/pprof"
)

func main() {
	path := "bad.xml"

	f, err := os.Open(path)
	if err != nil {
		panic(err)
	}

	dec := xml.NewDecoder(f)
	dec.CharsetReader = charset.NewReaderLabel

	type Unused struct{}

	for {
		var value Unused
		err = dec.Decode(&value)
		if err == io.EOF {
			break
		}
	}

	heapOut, err := os.Create("heap")
	if err != nil {
		panic(err)
	}
	pprof.WriteHeapProfile(heapOut)
}

What did you expect to see?

Memory usage somewhat constant

What did you see instead?

Memory usage grows for every <?...?> tag encountered.

Profiling the heap with pprof, the problem appears to be that for every <?...?> tag encountered in xml.Decoder.rawToken, a new *charset.Reader is created through charset.NewReaderLabel. The decoder then switches to this new reader with xml.Decoder.switchToReader (and also wraps it in another *bufio.Reader). However, it looks like references to the previous readers (created when we encountered previous <?...?> tags) are never garbage collected. As a result, parsing a large file with enough <?...?> can exhaust the system memory.

As an example, running go tool pprof heap on the heap file produced above shows 44 MB of usage for the 9030 line example (which was ~400kb in size)

(pprof) top
Showing nodes accounting for 44.67MB, 100% of 44.67MB total
Showing top 10 nodes out of 12
      flat  flat%   sum%        cum   cum%
   25.10MB 56.18% 56.18%    25.10MB 56.18%  golang.org/x/text/transform.NewReader
   19.58MB 43.82%   100%    19.58MB 43.82%  bufio.NewReaderSize
         0     0%   100%    19.58MB 43.82%  bufio.NewReader
         0     0%   100%    44.67MB   100%  encoding/xml.(*Decoder).Decode
         0     0%   100%    44.67MB   100%  encoding/xml.(*Decoder).DecodeElement
         0     0%   100%    44.67MB   100%  encoding/xml.(*Decoder).Token
         0     0%   100%    44.67MB   100%  encoding/xml.(*Decoder).rawToken
         0     0%   100%    19.58MB 43.82%  encoding/xml.(*Decoder).switchToReader
         0     0%   100%    44.67MB   100%  encoding/xml.(*Decoder).unmarshal
         0     0%   100%    25.10MB 56.18%  golang.org/x/net/html/charset.NewReaderLabel
@DrRibosome DrRibosome changed the title encoding/xml decoder creates and does not release bufio.Reader for every control <? encountered encoding/xml decoder creates and does not release bufio.Reader for every control <?...?> encountered Feb 13, 2020
@DrRibosome DrRibosome changed the title encoding/xml decoder creates and does not release bufio.Reader for every control <?...?> encountered encoding/xml decoder allocates (and does not release) memory for every control <?...?> encountered Feb 13, 2020
@DrRibosome DrRibosome changed the title encoding/xml decoder allocates (and does not release) memory for every control <?...?> encountered encoding/xml Decoder allocates (and does not release) memory for every control <?...?> encountered Feb 13, 2020
@dmitshur dmitshur changed the title encoding/xml Decoder allocates (and does not release) memory for every control <?...?> encountered encoding/xml: Decoder allocates (and does not release) memory for every control <?...?> encountered Feb 15, 2020
@dmitshur

This comment has been minimized.

Copy link
Member

@dmitshur dmitshur commented Feb 15, 2020

/cc @rsc per owners.

@dmitshur dmitshur added this to the Backlog milestone Feb 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.