Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix number parsing #27

Merged
merged 5 commits into from
Jan 18, 2021
Merged

Fix number parsing #27

merged 5 commits into from
Jan 18, 2021

Conversation

klauspost
Copy link
Collaborator

@klauspost klauspost commented Jan 15, 2021

Remove GOLANG_NUMBER_PARSING and remove the imprecise parsing and fix up the actual number parsing in Go.

By default, everything that looked like a number would be accepted and a lot of errors were not caught.

Uints will now actually be used if numbers are above maximum int64 and below uint64 with no float point markers.

Even with all the additional checks we are still faster:

λ benchcmp before.txt after.txt
benchmark                               old ns/op     new ns/op     delta
BenchmarkParseNumber/Pos/63bit-32       91.9          75.9          -17.41%
BenchmarkParseNumber/Neg/63bit-32       106           77.2          -27.17%
BenchmarkParseNumberFloat-32            190           72.5          -61.84%
BenchmarkParseNumberFloatExp-32         212           98.6          -53.49%
BenchmarkParseNumberBig-32              401           175           -56.36%
BenchmarkParseNumberRandomBits-32       420           230           -45.24%
BenchmarkParseNumberRandomFloats-32     305           172           -43.61%

... and full benchmarks:

benchmark                                             old ns/op      new ns/op      delta
BenchmarkApache_builds-32                             137091         139556         +1.80%
BenchmarkCanada-32                                    30705862       19000003       -38.12%
BenchmarkCitm_catalog-32                              1921474        2093471        +8.95%
BenchmarkGithub_events-32                             77611          77873          +0.34%
BenchmarkGsoc_2018-32                                 1220291        1215097        -0.43%
BenchmarkInstruments-32                               366747         374568         +2.13%
BenchmarkMarine_ik-32                                 27410259       18343775       -33.08%
BenchmarkMesh-32                                      8200018        5896043        -28.10%
BenchmarkMesh_pretty-32                               9793413        6947830        -29.06%
BenchmarkNumbers-32                                   1967319        1213924        -38.30%
BenchmarkRandom-32                                    1072071        1042956        -2.72%
BenchmarkTwitter-32                                   645530         645529         -0.00%
BenchmarkTwitterescaped-32                            1014456        1022548        +0.80%

Remove `GOLANG_NUMBER_PARSING` and remove the imprecise parsing and fix up the actual number parsing in Go.

By default, everything that looked like a number would be accepted and a lot of errors were not caught.

Uints will now actually be used if numbers are above maximum int64 and below uint64 with no float point markers.

Even with all the additional checks we are still faster:

```
λ benchcmp before.txt after.txt
benchmark                               old ns/op     new ns/op     delta
BenchmarkParseNumber/Pos/63bit-32       91.9          75.9          -17.41%
BenchmarkParseNumber/Neg/63bit-32       106           77.2          -27.17%
BenchmarkParseNumberFloat-32            190           72.5          -61.84%
BenchmarkParseNumberFloatExp-32         212           98.6          -53.49%
BenchmarkParseNumberBig-32              401           175           -56.36%
BenchmarkParseNumberRandomBits-32       420           230           -45.24%
BenchmarkParseNumberRandomFloats-32     305           172           -43.61%
```
This was referenced Jan 15, 2021
These numbers were measured on a MacBook Pro equipped with a 3.1 GHz Intel Core i7.
Also, to make it a fair comparison, the constant `GOLANG_NUMBER_PARSING` was set to `false` (default is `true`)
These numbers were measured on a MacBook Pro equipped with a 3.1 GHz Intel Core i7.
Also, to make it a fair comparison, the constant `GOLANG_NUMBER_PARSING` was set to `false` (default is `true`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This remark is now no longer applicable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the numbers aren't updated, since I still only have AVX2 available, but it should be fair to expect that we are pretty close to the numbers.

Copy link
Contributor

@fwessels fwessels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement, nice work!

@karlmcguire
Copy link

karlmcguire commented Jan 15, 2021

Thank you @fwessels and @klauspost! We're going to start using simdjson-go in Dgraph once this is merged.

@fwessels
Copy link
Contributor

Cool, that's nice to hear. What sort of speed-ups are you getting?

@karlmcguire
Copy link

karlmcguire commented Jan 15, 2021

Cool, that's nice to hear. What sort of speed-ups are you getting?

About 30% improvement over encoding/json just by using Iter.Root and converting the root element into map[string]interface{} or []interface{} and passing it to our parser.

And anywhere from 75%-100% speed improvement with a one-pass manual parser. We're holding off on merging that one because of the added complexity, but in the future I plan to simplify it quite a bit. I wasn't as familiar with your API when I initially wrote it.

@fwessels
Copy link
Contributor

That is nice to see. And if you have suggestions for our API, then we would be open to that.

@klauspost klauspost merged commit d846a83 into minio:master Jan 18, 2021
@klauspost klauspost deleted the fix-number-parsing branch January 18, 2021 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants