Fix number parsing #27
Remove `GOLANG_NUMBER_PARSING`, remove the imprecise parsing, and fix up the actual number parsing in Go.

By default, everything that looked like a number would be accepted and a lot of errors were not caught.

Uints will now actually be used if numbers are above the maximum int64 and within the uint64 range, with no floating-point markers.

Even with all the additional checks we are still faster:

```
λ benchcmp before.txt after.txt
benchmark                              old ns/op     new ns/op     delta
BenchmarkParseNumber/Pos/63bit-32      91.9          75.9          -17.41%
BenchmarkParseNumber/Neg/63bit-32      106           77.2          -27.17%
BenchmarkParseNumberFloat-32           190           72.5          -61.84%
BenchmarkParseNumberFloatExp-32        212           98.6          -53.49%
BenchmarkParseNumberBig-32             401           175           -56.36%
BenchmarkParseNumberRandomBits-32      420           230           -45.24%
BenchmarkParseNumberRandomFloats-32    305           172           -43.61%
```
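To illustrate the type selection described above, here is a minimal sketch built on the standard `strconv` package. It is not the code from this PR: `Kind`, `validJSONNumber`, and `classifyNumber` are hypothetical names made up for the example. Treating integers beyond the uint64 range as floats is also an assumption, not something the description states.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Kind is a hypothetical tag for the Go type chosen for a JSON number.
type Kind int

const (
	Invalid Kind = iota
	Int
	Uint
	Float
)

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// validJSONNumber checks s against the JSON number grammar:
// -? (0 | [1-9][0-9]*) (\.[0-9]+)? ([eE][+-]?[0-9]+)?
func validJSONNumber(s string) bool {
	i := 0
	if i < len(s) && s[i] == '-' {
		i++
	}
	if i >= len(s) || !isDigit(s[i]) {
		return false
	}
	if s[i] == '0' {
		i++ // a leading zero must stand alone
	} else {
		for i < len(s) && isDigit(s[i]) {
			i++
		}
	}
	if i < len(s) && s[i] == '.' {
		i++
		start := i
		for i < len(s) && isDigit(s[i]) {
			i++
		}
		if i == start {
			return false // at least one digit must follow '.'
		}
	}
	if i < len(s) && (s[i] == 'e' || s[i] == 'E') {
		i++
		if i < len(s) && (s[i] == '+' || s[i] == '-') {
			i++
		}
		start := i
		for i < len(s) && isDigit(s[i]) {
			i++
		}
		if i == start {
			return false // the exponent needs digits
		}
	}
	return i == len(s)
}

// classifyNumber rejects malformed input, then prefers int64, then uint64,
// and falls back to float64 (assumed behavior for out-of-range integers).
func classifyNumber(s string) (Kind, error) {
	if !validJSONNumber(s) {
		return Invalid, fmt.Errorf("invalid JSON number %q", s)
	}
	if !strings.ContainsAny(s, ".eE") {
		if _, err := strconv.ParseInt(s, 10, 64); err == nil {
			return Int, nil
		}
		if _, err := strconv.ParseUint(s, 10, 64); err == nil {
			return Uint, nil
		}
	}
	if _, err := strconv.ParseFloat(s, 64); err != nil {
		return Invalid, err
	}
	return Float, nil
}

func main() {
	for _, s := range []string{"42", "9223372036854775808", "1.5", "012"} {
		k, err := classifyNumber(s)
		fmt.Println(s, k, err)
	}
}
```

The grammar check runs first because `strconv` on its own is more lenient than JSON: for example, `ParseInt` accepts a leading `+` and leading zeros.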
These numbers were measured on a MacBook Pro equipped with a 3.1 GHz Intel Core i7.
Also, to make it a fair comparison, the constant `GOLANG_NUMBER_PARSING` was set to `false` (default is `true`)
This remark is now no longer applicable
Well, the numbers aren't updated, since I still only have AVX2 available, but it should be fair to expect that we are pretty close to the numbers.
Great improvement, nice work!
Thank you @fwessels and @klauspost! We're going to start using simdjson-go in Dgraph once this is merged.
Cool, that's nice to hear. What sort of speed-ups are you getting?
About 30% improvement over And anywhere from 75%-100% speed improvement with a one-pass manual parser. We're holding off on merging that one because of the added complexity, but in the future I plan to simplify it quite a bit. I wasn't as familiar with your API when I initially wrote it.
That is nice to see. And if you have suggestions for our API, then we would be open to that.
... and full benchmarks: