New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/xml: very low performance in xml parser #21823

Open
243083df opened this Issue Sep 9, 2017 · 22 comments

Comments

Projects
None yet
@243083df
Copy link

243083df commented Sep 9, 2017

What version of Go are you using (go version)?

1.9

Does this issue reproduce with the latest release?

True

What operating system and processor architecture are you using (go env)?

Windows

What did you do?

I trying to parse large files with SAX with go, and get decadent performance.
I rewrite code in C#, and get maximum performance.

file, err := os.Open(filename)
handle(err)
defer file.Close()
buffer := bufio.NewReaderSize(file, 1024*1024*256) // 33554432
decoder := xml.NewDecoder(buffer)
for {
        t, _ := decoder.Token()
        if t == nil {
            break
        }
        switch se := t.(type) {
        case xml.StartElement:
            if se.Name.Local == "House" {
                house := House{}
                err := decoder.DecodeElement(&house, &se)
                handle(err)
            }
        }
    }
using (XmlReader reader = XmlReader.Create(filename)
            {
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Element:
                            if (reader.Name == "House")
                            {
                                //Code
                            }
                            break;
                    }
                }
            }

What did you expect to see?

Mature and fast xml parser in golang.

What did you see instead?

The bottleneck in SAX xml parsing with go is CPU, instead of low HDD io performance.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

ianlancetaylor commented Sep 9, 2017

Can you show us a complete, standalone example program? And provide your input file, or at least tell us about it?

What are the actual performance numbers?

Thanks.

@ianlancetaylor ianlancetaylor changed the title Very low performance in xml parser encoding/xml: very low performance in xml parser Sep 9, 2017

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Sep 9, 2017

@243083df

This comment has been minimized.

Copy link
Author

243083df commented Sep 9, 2017

File is very big, around 10-50 millions records.
The xml file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<LIST>
<ELEMENT ATTRIBUTE1="" ATTRIBUTE2="" /><ELEMENT ATTRIBUTE1="" ATTRIBUTE2="" />
</LIST>
@a-h

This comment has been minimized.

Copy link
Contributor

a-h commented Sep 10, 2017

I was curious, since I use .NET and Go regularly, so I turned this into a reproduction at https://github.com/a-h/sax

On my MacOS machine, I found that the Go version was much slower (around 20 seconds) than the .NET Core 2.0 one (around 3 seconds) for a file with 10 million elements in it laid out as per the example above.

I put the timing output in the README.md. I noted that the CPU was at 100% during Go execution.

I added an SVG of the Go CPU profile output to the repo.

screen shot 2017-09-10 at 22 55 33

@243083df

This comment has been minimized.

Copy link
Author

243083df commented Sep 11, 2017

I think, @a-h's benchmark enough to investigate. I used same code.

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Sep 12, 2017

Change https://golang.org/cl/63390 mentions this issue: unicode: speed-up is16/is32

gopherbot pushed a commit that referenced this issue Sep 12, 2017

unicode: speed-up is16/is32
Avoid division in common case. There are 5438 ranges in unicode/tables.go
4110 of them have stride 1.
Stride 1 case got significantly faster. Other stride is a bit slower.
Measured by

import (
	"testing"
	"unicode"
)

func BenchmarkDiv1(b *testing.B) {
	rtb := &unicode.RangeTable{
		R16: []unicode.Range16{
			{0xa800, 0xdfff, 1}, // or 3
		},
	}
	for i := 0; i < b.N; i++ {
		unicode.Is(rtb, rune(0xc700))
	}
}

Div1-6  15.6ns ± 1%   9.9ns ± 1%  -36.54%  (p=0.000 n=10+10)
Div3-6  15.5ns ± 1%  16.1ns ± 1%   +3.67%  (p=0.000 n=10+10)

Helps a bit with xml parsing from issue #21823

XMLsax-6   30.9s ± 0%   29.6s ± 0%  -4.15%  (p=0.000 n=10+9)

Change-Id: Ibac1a91d7b9474d0c134b0add83e56caa62daa20
Reviewed-on: https://go-review.googlesource.com/63390
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@243083df

This comment has been minimized.

Copy link
Author

243083df commented Sep 13, 2017

By the way. Is call len(p) in https://github.com/golang/go/blob/master/src/bufio/bufio.go#L196 necessarily?

@mattn

This comment has been minimized.

Copy link
Member

mattn commented Sep 13, 2017

xml.NewDecoder already create bufio.Reader.

@243083df

This comment has been minimized.

Copy link
Author

243083df commented Sep 14, 2017

@mattn Yes, but its create bufio.Reader without minimum read value.
bufio.Reader with minimum read value around 64Mb, speed up parsing 2Gb xml from 2m40s to 2m5s(around 20%, but its not stable) on HDD.

@robfordww

This comment has been minimized.

Copy link

robfordww commented Nov 4, 2017

A lot of speed could be gained if the parser was referring to the xml byte slice instead of copying everything. I am working on such an parser for go, based on the rapidxml library in c++

@saleem-mirza

This comment has been minimized.

Copy link

saleem-mirza commented Dec 2, 2017

I am experiencing same issue. Parsing 1 GB XML takes minutes which C# version completes in few seconds ( < 10)

@suntong

This comment has been minimized.

Copy link

suntong commented Dec 31, 2017

So the go sax parser is about 6~7 times slower than the C#, i.e., much room to improve. Watching the development on this...

Anyone can confirm, whether unmarshaling into defined data structures, if possible, can speed things up? I always use the sax parser way, maybe it is time to go with the DOM parser way, as my files are not terribly big. I went with the sax parser way only because intuitively thinking, it should be faster than the DOM parser way.

@a-h

This comment has been minimized.

Copy link
Contributor

a-h commented Jan 3, 2018

Just checked the performance of various versions using Docker containers to see if 1.10beta1 is likely to improve performance (I thought I saw a few perf improvements to Unicode handling in the changelog).

docker run -it --rm -v `pwd`:/go/src/github.com/a-h/sax golang:1.10beta1 /bin/bash
docker run -it --rm -v `pwd`:/go/src/github.com/a-h/sax golang:1.9.2 /bin/bash
docker run -it --rm -v `pwd`:/go/src/github.com/a-h/sax golang:1.8 /bin/bash

Results

1.10: 0m21.383s
1.9.2: 0m21.281s
1.8: 0m27.370s

So, looks like no improvement this version.

@suntong

This comment has been minimized.

Copy link

suntong commented Jan 3, 2018

Well done!

Has anyone done a profile to see exactly where the pain-point is?
Maybe that might help pushing things a bit?

@a-h

This comment has been minimized.

Copy link
Contributor

a-h commented Jan 4, 2018

@a-h

This comment has been minimized.

Copy link
Contributor

a-h commented Mar 25, 2018

It's possible to reduce the amount of calls to the UTF8 DecodeRune etc. by keeping a cache of the names the decoder has already seen, on the basis that it's very likely those names will be seen again. This could be limited to a reasonable level, e.g. 10,000 XML names to stop it using too much RAM.

Over 1,000,000 XML elements, I saw an improvement from 1.632s to 1.431s.

$ go test -bench=Decoder -cpuprofile profile_cpu.out
processed authors:  1000000
goos: darwin
goarch: amd64
pkg: encoding/xml
BenchmarkDecoder-4             1        1437269859 ns/op        518663400 B/op   8000059 allocs/op
PASS
ok      encoding/xml    1.632s
$ go test -bench=Decoder -cpuprofile profile_cpu.out
processed authors:  1000000
goos: darwin
goarch: amd64
pkg: encoding/xml
BenchmarkDecoder-4             1        1211802156 ns/op        518656056 B/op   8000053 allocs/op
PASS
ok      encoding/xml    1.431s
func (d *Decoder) isName(s []byte) bool {
	// Check the cache first.
	if _, ok := d.names[string(s)]; ok {
		return true
	}
	// If it's not in the cache, add it if it's valid.
	v := isName(s)
	if v {
		d.names[string(s)] = true
	}
	return v
}

// Get name: /first(first|second)*/
// Do not set d.err if the name is missing (unless unexpected EOF is received):
// let the caller provide better context.
func (d *Decoder) name() (s string, ok bool) {
	d.buf.Reset()
	if !d.readName() {
		return "", false
	}

	// Now we check the characters.
	b := d.buf.Bytes()
	if !d.isName(b) {
		d.err = d.syntaxError("invalid XML name: " + string(b))
		return "", false
	}
	return string(b), true
}

@ianlancetaylor - is it worth me pursuing that as a change?

@a-h

This comment has been minimized.

Copy link
Contributor

a-h commented Mar 25, 2018

My benchmark is:

func BenchmarkDecoder(b *testing.B) {
	b.ReportAllocs()
	count := 1000000
	buffer := strings.NewReader(`<authors>` + authors(count) + `</authors`)
	for i := 0; i < b.N; i++ {
		buffer.Seek(0, io.SeekStart)
		decoder := NewDecoder(buffer)
		var authors = 0
		for {
			tok, _ := decoder.Token()
			if tok == nil {
				break
			}
			switch se := tok.(type) {
			case StartElement:
				if se.Name.Local == "author" {
					authors++
				}
			}
		}
		if authors != count {
			panic("failed to report correct number of authors")
		}
		fmt.Println("processed authors: ", authors)
	}
}

func authors(count int) string {
	buf := bytes.NewBufferString("")
	for i := 0; i < count; i++ {
		buf.WriteString(`<author name="Alan Watt" />`)
	}
	return buf.String()
}
@243083df

This comment has been minimized.

Copy link
Author

243083df commented Mar 28, 2018

@a-h I think it would be better to use slice cache isead of Read|Unread byte

@a-h

This comment has been minimized.

Copy link
Contributor

a-h commented Mar 29, 2018

Do you mean something that reads into a buffer like this? https://github.com/a-h/lexical/blob/master/input/stream.go

@nussjustin

This comment has been minimized.

Copy link
Contributor

nussjustin commented Apr 12, 2018

@a-h The code could be further optimized like this:

// Get name: /first(first|second)*/
// Do not set d.err if the name is missing (unless unexpected EOF is received):
// let the caller provide better context.
func (d *Decoder) name() (s string, ok bool) {
	d.buf.Reset()
	if !d.readName() {
		return "", false
	}

	// Now we check the characters.
	b := d.buf.Bytes()
    if s, ok = d.names[string(b)]; ok {
		return s, ok
	}
	if !isName(b) {
		d.err = d.syntaxError("invalid XML name: " + string(b))
		return "", false
	}
    s = string(b)
    d.names[s] = s
	return s, true
}

This way we can avoid the isName check and reuse the string from the d.names map without allocation.

For the existing unmarshal benchmark this gives me

name         old time/op    new time/op    delta
Unmarshal-8    11.6µs ± 1%    11.7µs ± 3%     ~     (p=0.990 n=12+14)

name         old alloc/op   new alloc/op   delta
Unmarshal-8    8.27kB ± 0%    9.13kB ± 0%  +10.40%  (p=0.000 n=15+15)

name         old allocs/op  new allocs/op  delta
Unmarshal-8       190 ± 0%       166 ± 0%  -12.63%  (p=0.000 n=15+15)

No change on the time, but at work we have a tool that parses multiple XML files, each around 250MB+, and reusing the allocated names saves about 2 seconds per file (14s -> 12s).

In my case I shaved off another second by making copyValue accept both []byte and string, avoiding an allocation for each attribute value.

But again, this doesn't reflect in the existing benchmark:

name         old time/op    new time/op    delta
Unmarshal-8    11.6µs ± 1%    11.4µs ± 1%  -1.72%  (p=0.000 n=12+12)

name         old alloc/op   new alloc/op   delta
Unmarshal-8    8.27kB ± 0%    8.11kB ± 0%  -1.93%  (p=0.000 n=15+15)

name         old allocs/op  new allocs/op  delta
Unmarshal-8       190 ± 0%       186 ± 0%  -2.11%  (p=0.000 n=15+15)
@tamerh

This comment has been minimized.

Copy link

tamerh commented Nov 15, 2018

Hi, any update on this or workaround solution? i also observed that it is slower than java.

@webern

This comment has been minimized.

Copy link

webern commented Dec 23, 2018

I have also found my way here after seeing a very large box in my profile for encoding/xml (*Decoder) unmarshal. Golang XML 'feels' a lot slower than C++ rapidxml, pugixml and expat, though I have not benchmarked.

@tamerh

This comment has been minimized.

Copy link

tamerh commented Jan 23, 2019

Hi all and @243083df, @l-we, @webern, @ajruckman, @qwantix, @evt

Regarding this issue you can try my library

https://github.com/tamerh/xml-stream-parser

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment