
JSON Decoding Performance #44833

Closed
albertnanda opened this issue Mar 6, 2021 · 4 comments

@albertnanda

Go version: `go version go1.14 linux/amd64`

`go env` output:

```
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/nlp/.cache/go-build"
GOENV="/home/nlp/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/nlp/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build531942507=/tmp/go-build -gno-record-gcc-switches"
```

I was quite surprised to see that JSON decoding in Python is faster than in Go (roughly 1.4x to 1.6x by wall-clock time in the runs below). Here is my setup. To generate the JSON file, run the following code:

```python
import pandas as pd
d = [{'sentences': ['Sent', 'Sent1', 'Sent2'], 'content_id': 'String', 'id': 'String', 'date': '2000-03-28', 'label': '1', 'categories': ['String']}]
d = d * 200000
df = pd.DataFrame(d)
df.to_json("./dummy.json", orient='records', lines=True)
```

Here are the Python and Go programs being compared:

```python
import json
with open("./dummy.json") as f:
    for index, line in enumerate(f):
        if index == 100000:
            break
        json.loads(line)
```

```go
package main

import (
	"encoding/json"
	"log"
	"os"
)

func main() {
	file, err := os.Open("./dummy.json")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	data := map[string]interface{}{}
	decoder := json.NewDecoder(file)
	for i := 0; i < 100000; i++ {
		// Decode reuses the same map; every line has the same keys.
		if err := decoder.Decode(&data); err != nil {
			log.Fatal(err)
		}
	}
}
```

Results (`time` output):

| Program | real | user | sys |
|---|---|---|---|
| Python | 0m0.425s | 0m0.266s | 0m0.094s |
| Go (`encoding/json`) | 0m0.591s | 0m0.438s | 0m0.250s |

Python still seems to outperform Go when the decoded values are kept rather than discarded. Note that this Go version uses the third-party `github.com/goccy/go-json` package instead of `encoding/json`:

```python
import json
result = []
with open("./dummy.json") as f:
    for index, line in enumerate(f):
        if index == 100000:
            break
        result.append(json.loads(line))
```

```go
package main

import (
	"log"
	"os"

	"github.com/goccy/go-json"
)

func main() {
	file, err := os.Open("./dummy.json")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	// Keep every decoded line, mirroring result.append(...) in the Python version.
	result := make([]map[string]interface{}, 100000)
	decoder := json.NewDecoder(file)
	for i := 0; i < 100000; i++ {
		if err := decoder.Decode(&result[i]); err != nil {
			log.Fatal(err)
		}
	}
}
```

Results (`time` output):

| Program | real | user | sys |
|---|---|---|---|
| Python | 0m0.738s | 0m0.500s | 0m0.172s |
| Go (`github.com/goccy/go-json`) | 0m1.156s | 0m1.016s | 0m0.328s |
@ALTree
Member

ALTree commented Mar 6, 2021

Note that Python's JSON decoder is written in C, not pure Python, so you're comparing Go with C.

@fzipp
Contributor

fzipp commented Mar 6, 2021

How did you run these benchmarks? This looks like `time` output. I hope you did not measure `go run`.
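Timing `go run` also measures building the program. A minimal sketch, assuming the same `./dummy.json` file, that reads the file into memory first and times only the decode loop with `time.Since`; `timeDecode` is a hypothetical helper. Build it once with `go build` and run the binary; the Python script would need the same treatment (file preloaded, only the loop timed) for a fair comparison. For more rigorous numbers, Go's standard tool is a `testing.B` benchmark run with `go test -bench`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log"
	"os"
	"time"
)

// timeDecode (hypothetical helper) decodes up to n JSON values from an
// in-memory buffer and returns how many were decoded and how long the
// decode loop alone took. Disk reads happen before the timer starts.
func timeDecode(data []byte, n int) (int, time.Duration, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	start := time.Now()
	count := 0
	for ; count < n; count++ {
		var v map[string]interface{}
		if err := dec.Decode(&v); err != nil {
			if errors.Is(err, io.EOF) {
				break // input exhausted before n values
			}
			return count, time.Since(start), err
		}
	}
	return count, time.Since(start), nil
}

func main() {
	data, err := os.ReadFile("./dummy.json")
	if err != nil {
		log.Fatal(err)
	}
	count, elapsed, err := timeDecode(data, 100000)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("decoded %d values in %v\n", count, elapsed)
}
```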

@dsnet
Member

dsnet commented Mar 6, 2021

See #40128, and #40128 (comment) in particular. There's work on a new decoder that is significantly faster.

@dsnet
Member

dsnet commented Mar 6, 2021

Going to close this as a duplicate of the other, more specific performance-related issues, since this issue doesn't present a concrete suggestion for improving performance.

@dsnet dsnet closed this as completed Mar 6, 2021
@golang golang locked and limited conversation to collaborators Mar 6, 2022
5 participants