Switch the simdjson code to On Demand #272

Merged: 3 commits into kostya:master, Oct 13, 2020
Conversation

@lemire (Contributor) commented Oct 13, 2020

The simdjson library is already included in the kostya/benchmarks repository. However, simdjson can be run in two modes: the DOM API and the On Demand API. kostya/benchmarks currently uses the DOM API, but this benchmark is not a scenario that favors a DOM API: building a full DOM is clearly unnecessary to satisfy the benchmark, and the On Demand API is better suited. This PR therefore switches the code from the DOM API to the On Demand API; we expect better performance. While at it, we also tweaked the Makefile so that it checks out a specific commit of the simdjson repository, ensuring that no matter what happens upstream, the code in kostya/benchmarks won't break.

Credit to @jkeiser
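For reference, here is a minimal sketch of the difference between the two styles on this benchmark's coordinate objects. It is illustrative only: the exact calls vary across simdjson versions, and the "coordinates"/"x" field names are taken from the benchmark's test data, so treat this as a sketch rather than the code in this PR.

    #include "simdjson.h"
    using namespace simdjson;

    // DOM style: materialize the whole document tree, then navigate it.
    double sum_x_dom(const padded_string &json) {
        dom::parser parser;
        dom::element doc = parser.parse(json);
        double sum = 0;
        dom::array coords = doc["coordinates"];
        for (dom::element point : coords) {
            sum += double(point["x"]);
        }
        return sum;
    }

    // On Demand style: walk the input forward and extract only what is
    // needed, without building a tree first.
    double sum_x_ondemand(const padded_string &json) {
        ondemand::parser parser;
        ondemand::document doc = parser.iterate(json);
        double sum = 0;
        for (ondemand::object point : doc["coordinates"].get_array()) {
            double x = point["x"].get_double();
            sum += x;
        }
        return sum;
    }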

nuald merged commit e9ac736 into kostya:master on Oct 13, 2020
@nuald (Collaborator) commented Oct 13, 2020

The new numbers will be provided with the upcoming maintenance update (it's coming soon).

@kostya (Owner) commented Oct 14, 2020

Question: does "on demand" mean it depends on the order of x, y, z, or not?

@TkTech commented Oct 14, 2020

That is correct; the downside of the On Demand API is that it is forward-only. You can skip elements, but you must go in order.

@kostya (Owner) commented Oct 14, 2020

That looks like quite a limited use case.

@beached (Contributor) commented Oct 15, 2020

Doesn't that mean that

{"y": 0.49939, "x": 0.499274, "z": 0.499865}

will fail to parse?

@lemire (Contributor, Author) commented Oct 15, 2020

fail to parse?

If the input fails to match the expected JSON dialect, then an exception (simdjson::simdjson_error) is thrown.

Alternatives in this repository might segfault given such a variation of the dialect.
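To make the failure mode concrete, here is a hypothetical minimal test (assuming exceptions are enabled, as in the benchmark code). With the strictly forward-only lookup being discussed, the out-of-order "y" lands us in the catch block; other simdjson versions or lookup modes may instead tolerate the reordering.

    #include "simdjson.h"
    #include <cstdio>
    using namespace simdjson;

    int main() {
        // "y" appears first, violating the x, y, z order the code expects.
        padded_string json = R"({"y": 0.49939, "x": 0.499274, "z": 0.499865})"_padded;
        ondemand::parser parser;
        try {
            ondemand::document doc = parser.iterate(json);
            double x = doc["x"].get_double();
            double y = doc["y"].get_double();
            double z = doc["z"].get_double();
            std::printf("x=%f y=%f z=%f\n", x, y, z);
        } catch (const simdjson_error &e) {
            // A strictly forward-only lookup cannot rewind to find "x"
            // once it has skipped past "y".
            std::printf("rejected: %s\n", e.what());
        }
        return 0;
    }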

@nuald (Collaborator) commented Oct 16, 2020

@beached Yes, the version in this PR fails to parse that JSON. I found it quite confusing too, so I fixed that in the merged PR #273. It still uses the On Demand API, but no longer has (x, y, z) order sensitivity (so it's closer to SAX-like parsing now). I think order insensitivity is a good trait for a JSON parser to have, and it's worth testing, so I created ticket #274.

@lemire If you don't mind, I've added the DOM parser code too (slightly improved), so users can now compare the performance of both implementations. The numbers are posted in the aforementioned PR. I think the current behaviour of the On Demand algorithm has its uses, but at the same time it's quite risky, as any change in the content could break the parsing code. And a lot of people use JSON precisely because of its flexibility; otherwise, other formats like MessagePack or Protocol Buffers would be more appropriate. I'm not going to argue about it, though; I've fixed the order sensitivity based on the simdjson documentation.
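For the curious, the order-insensitive shape is roughly the following: visit each field once, in whatever order it appears, and dispatch on the key. This is a sketch based on the simdjson On Demand field-iteration API, not the exact code in the merged PR.

    #include "simdjson.h"
    #include <string_view>
    using namespace simdjson;

    // SAX-like, order-insensitive extraction of one benchmark point.
    // The benchmark's point objects contain only numeric fields, so
    // converting every value to double is safe here.
    void read_point(ondemand::object point, double &x, double &y, double &z) {
        for (auto field : point) {
            std::string_view key = field.unescaped_key();
            double v = double(field.value());
            if (key == "x")      x = v;
            else if (key == "y") y = v;
            else if (key == "z") z = v;
        }
    }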

@beached (Contributor) commented Oct 16, 2020

The only time I have seen order matter is in something like GeoJSON, which does have similarities to this benchmark. But there the point was encoded as 3 doubles in an array.

@jkeiser commented Oct 16, 2020

If we're going to specify that points are order-insensitive, which is totally fine, we should run some points through the benchmark that have a different order. @lemire noted there are implementations here that crash when that happens, indicating they may be order-sensitive. It does make a real difference: an order-sensitive parser can be objectively faster, so we'd be comparing apples and oranges.

Another performance-impacting variation is missing fields: if there is no z, do we expect the JSON to be rejected, or for z to be treated as if it were 0? This is not academic: rejecting the input will definitely take extra code and impact performance. We probably need to validate that existing parsers handle this as well, whichever choice is made. The code in this PR, for example, would reject any input with a missing x, y or z. A SAX-like implementation would generally have to do extra work to validate this.
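To illustrate that extra work, a SAX-like validator might track which of the three fields it has seen and reject incomplete objects. Everything below (names included) is a hypothetical sketch, not code from any of the benchmarks.

    #include <stdexcept>
    #include <string_view>

    struct Point { double x = 0, y = 0, z = 0; };

    // Record a parsed field and remember that it was present.
    inline void on_field(Point &p, unsigned &seen, std::string_view key, double v) {
        if (key == "x")      { p.x = v; seen |= 1u; }
        else if (key == "y") { p.y = v; seen |= 2u; }
        else if (key == "z") { p.z = v; seen |= 4u; }
    }

    // Called when the object closes: rejecting a missing field costs an
    // extra check (plus the bookkeeping above) that a lenient parser skips.
    inline void on_object_end(unsigned seen) {
        if (seen != 7u) throw std::runtime_error("missing x, y or z");
    }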

@jkeiser commented Oct 16, 2020

Aaand I see that you anticipated the need to test :)

@beached (Contributor) commented Oct 16, 2020

Curious: I did not see any changes to the implementations of the JSON test. Did any of them fail the test portion when given an out-of-order member? Or is that a future PR?

@lemire (Contributor, Author) commented Oct 16, 2020

an order sensitive parser can be objectively faster

I don't think that's true. At least not in the current scenario.

Suppose that you have inputs like this...

    {
      "x": 0.007869781476396165,
      "y": 0.08747439841428784,
      "z": 0.69873150960533,

Normally, you check for "x", then you check for "y", and then you check for "z". But when you check for "x", you have to have a branch for the case where "x" is not found... and in that branch (which is never taken) you can easily go looking for "y" and "z". There is no reason for this key-reordering code to be slower if it is (effectively) never called. Now, that's treating "x", "y", and "z" as arbitrary keys; of course, they are not arbitrary, and you have even more room for optimization if you'd like.
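A sketch of that pattern, with a hypothetical forward-only field stream standing in for a real parser (all names here are made up for illustration):

    #include <cstddef>
    #include <optional>
    #include <string_view>
    #include <utility>
    #include <vector>

    // Stand-in for a forward-only parser: already-tokenized (key, value)
    // pairs plus a cursor that only moves forward.
    struct FieldStream {
        std::vector<std::pair<std::string_view, double>> fields;
        std::size_t pos = 0;

        // Forward-only lookup: succeeds only if `key` is the next field.
        std::optional<double> next(std::string_view key) {
            if (pos < fields.size() && fields[pos].first == key)
                return fields[pos++].second;
            return std::nullopt;
        }
    };

    struct Point { double x, y, z; };

    // Fallback: match keys in any order (taken only on unusual input).
    Point read_any_order(const FieldStream &s) {
        Point p{};
        for (const auto &[k, v] : s.fields) {
            if (k == "x") p.x = v;
            else if (k == "y") p.y = v;
            else if (k == "z") p.z = v;
        }
        return p;
    }

    // Fast path assumes the default x, y, z order; the out-of-order branch
    // costs nothing when it is (effectively) never taken.
    Point read_point(FieldStream s) {
        if (auto x = s.next("x"))
            if (auto y = s.next("y"))
                if (auto z = s.next("z"))
                    return {*x, *y, *z};
        return read_any_order(s);
    }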

@jkeiser commented Oct 16, 2020

That's true; if you optimize around a "default ordering", your only cost would be some extra code for the non-default case.

@lemire (Contributor, Author) commented Oct 16, 2020

@jkeiser So the code currently checked in is suboptimal. It is fine code based on our documentation, so no blame... but it is still suboptimal. We could be at least 10% faster. (I am not complaining, just observing.)

@nuald (Collaborator) commented Oct 16, 2020

I've added tests for order sensitivity (#275); no other JSON parser required code changes, so all parsers but simdjson On Demand are already order-insensitive. I think that's expected, as JSON's RFC 7159 states that "an object is an unordered collection of zero or more name/value pairs".

@beached I meant that the previous PR had order-insensitivity tests for simdjson only. The new one (#275) has tests for all the other benchmarks.

@jkeiser commented Oct 16, 2020

Thanks tons @nuald!

@jkeiser commented Oct 16, 2020

Hey @nuald, I couldn't find an explanation of what the energy metric aims to measure, but I noticed simdjson is using significantly less energy even though it's the same speed as some others. Any idea why that might be? My conjecture is that it's measuring the initial read from disk as low-power, since simdjson presently requires the entire file to be read in before parsing, causing parsing to block entirely on the initial read from disk (we're working on that :).

It also brings up a metric that might highlight some other parsers' strengths in this regard: time to first result. A truly streaming parser should be able to produce the first point much earlier than the others, and there are real applications for a parser that can stream results quickly.

I was going to suggest that such parsers could use less memory as well, because they don't need the whole input file in memory; but unless you can parse faster than the file can be loaded, buffer management is probably tricky. Not truly sure on that one.

@beached (Contributor) commented Oct 16, 2020

Isn't the memory/energy measured for all of them after they read the file into a buffer? That has come up often, and it is done so that the buffer's memory is consistent and the read is not part of the measurement. If you look at a couple of them, the memory used is essentially just the size of the buffer in memory.

@lemire (Contributor, Author) commented Oct 16, 2020

@beached My understanding is indeed that they all start from a pre-loaded JSON string, so there is no disk read during the benchmark for any of them.

@jkeiser Look at the benchmarks again: I think you load the file, then tell some program that you are about to parse, then parse, then tell it you are done.

@nuald (Collaborator) commented Oct 16, 2020

The energy is measured from the PP0 package (the CPU; see the references at http://web.eece.maine.edu/~vweaver/projects/rapl/rapl_support.html), and essentially it reports how much energy the CPU spent on that benchmark. Memory and disk usage do not directly affect the energy metrics, though. I guess simdjson has lower energy consumption because it uses SSE instructions in a slightly different way than other benchmarks with the same speed. CPU internals are quite tricky and, unfortunately, I couldn't find any proper references; however, I can point to this one (https://stackoverflow.com/a/19723106):

A side-by-side comparison of SSE2 (2-way SIMD) and AVX (4-way SIMD) did in fact show that AVX had a noticably higher power consumption and higher processor temperatures.

@jkeiser As for using memory mapping to parse JSON from files (I guess that's what you meant by "streaming parsers"): that's out of scope for these benchmarks. Not all languages support memory mapping, and file operation speeds can vary a lot too, so it wouldn't be fair to include those operations as part of the benchmark. @lemire is absolutely right: we don't start benchmarking until the file is loaded into memory.

@jkeiser commented Oct 16, 2020

The energy is measured from the PP0 package (the CPU; see the references at http://web.eece.maine.edu/~vweaver/projects/rapl/rapl_support.html), and essentially it reports how much energy the CPU spent on that benchmark. Memory and disk usage do not directly affect the energy metrics, though. I guess simdjson has lower energy consumption because it uses SSE instructions in a slightly different way than other benchmarks with the same speed. CPU internals are quite tricky and, unfortunately, I couldn't find any proper references; however, I can point to this one (https://stackoverflow.com/a/19723106):

A side-by-side comparison of SSE2 (2-way SIMD) and AVX (4-way SIMD) did in fact show that AVX had a noticably higher power consumption and higher processor temperatures.

@nuald Thanks for the reference! Regarding the SIMD instructions, I'm not sure that's what's affecting it: simdjson should be using the Haswell implementation with explicit AVX2 instructions, so I'd expect it to use more power than the others if that were the main factor.

Now I'm wondering if the power reduction is related to fewer and cheaper memory accesses. simdjson is heavily optimized to avoid branch misprediction, meaning it takes fewer mispredicted paths that would perform "wasted" memory accesses. simdjson also works pretty hard to keep cache locality up (everything is stored or written sequentially as you go through the parse).

I'm mostly just curious at this point; obviously I'm pleased as well, since it fits well with my hopes and dreams of reducing global warming through JSON parsing :)

@jkeiser As for using memory mapping to parse JSON from files (I guess that's what you meant by "streaming parsers"): that's out of scope for these benchmarks. Not all languages support memory mapping, and file operation speeds can vary a lot too, so it wouldn't be fair to include those operations as part of the benchmark. @lemire is absolutely right: we don't start benchmarking until the file is loaded into memory.

I'm actually talking more about I/O stream interfaces, rather than memory mapping. Almost all languages support a "stream" interface that lets you read a limited amount of data at a time from an input source (see Ruby's read(len)). Parsers in those languages generally take such a stream as input, like JSON.parse(IO.open("file.txt")). Internally, these parsers might read the whole file into memory, or they might read it piece by piece, using a loop:

while (buf = input.read(1024))
  # Parse the next 1024 bytes, throw away buf, and move to the next 1024 bytes...
end
(I'm very, very sure I just told you stuff you already know, so please excuse my overexplaining :) The first sentence was probably enough.)

I know disks have variable loading speeds, and that sucks for a benchmark. What about having a benchmark "harness" program that loads the file and then pipes it via STDIN to the implementation? It could even be a shell script. That way the file-load cost is not included in the benchmark time, but the implementation's strategy for reading JSON from a stream would still be measured.
(Many languages could use a string stream, but in some languages that still won't count the cost: it'll just pass a pointer to part of the string. Using STDIN guarantees you have to allocate at least one buffer somewhere to read into; see the sketch below.)
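On the implementation side, the minimum such a harness would force is something like this sketch: reading the document from STDIN in fixed-size chunks guarantees at least one real buffer copy is measured (hypothetical code, not part of the benchmarks).

    #include <cstdio>
    #include <string>

    int main() {
        std::string json;
        char chunk[1024];
        std::size_t n;
        // Read STDIN piece by piece; a truly streaming parser could consume
        // each chunk here instead of accumulating the whole document first.
        while ((n = std::fread(chunk, 1, sizeof(chunk), stdin)) > 0) {
            json.append(chunk, n);
        }
        std::fprintf(stderr, "read %zu bytes\n", json.size());
        // ... parse `json` (or the individual chunks) and report results ...
        return 0;
    }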

If you'd prefer, I can file an issue with this stuff in it so discussion can continue there :) I know this is a lot of talk for a closed issue.

@nuald (Collaborator) commented Oct 16, 2020

@jkeiser Yes, please open a ticket. I think the idea is worth considering, but I'm not sure about the practical applications. In general, end users want to read the whole JSON document before proceeding, and while streaming support can reduce the memory footprint, it shouldn't affect speed much (and could even be slower).

@jkeiser commented Oct 16, 2020

@nuald Will do! The reason I care is that a low, or constant, memory profile is critical for servers, which is one of the main places we read JSON. Lower memory allows for more concurrent workers, and constant memory leads to longer uptime (because you eliminate one of the major sources of crashes/restarts, the "slow leak" that kills your server after 2-4 weeks).

I'd like to be able to use benchmarks like this to select what implementation I want to use, and I need information like that in order to decide :)

@beached (Contributor) commented Oct 16, 2020

I think as far as memory goes, you can subtract the size of the data file from the numbers and get a good measure of what each library uses.

nuald pushed a commit to nuald/benchmarks that referenced this pull request Oct 31, 2020
* Updating simdjson to 0.6 API (on demand).

* Making things safer by specifying the commit + removing unnecessary std:: qualifiers.

* Minor changes following review
nuald added a commit that referenced this pull request Dec 3, 2020
* Re-run bf mandel tests (addressing #258).

* Maintenance update (#261)

* Re-run bf mandel tests (addressing #258).
* Preparation for Perl 7.
* Lua module conflicts have been resolved.

* Update README.md

Fixed UPDATE date.

* Maintenance update (#265)

* Added workaround for Intel JCC bug (#263).

* Maintenance update.

* Added JCC bugfix flag to clang. (#266)

* Vala numbers have been updated. (#268)

* Update test_dawjsonlink.cpp (#271)

Updated code to reflect API changes in v2 of DAW JSON Link

* Maintenance update with the renamed tests (#269). (#270)

* Switch the simdjson code to On Demand (#272)

* Updating simdjson to 0.6 API (on demand).

* Making things safer by specifying the commit + removing unnecessary std:: qualifiers.

* Minor changes following review

* Maintenance update and simdjson test changes (#273)

* Maintenance update

* Maintenance update and simdjson test changes.

* The JSON key ordering tests have been added. (#275)

* Boost.JSON test has been added and the old test has been renamed. (#277)

* Updated to use better numbers for floating point ops, changed Boost.JSON test to use doubles. (#279)

* Improved precisions of the tests (#281)

* Median calculations have been added.

* Try another formatting.

* Changed memory calculations.

* Printing has been moved out from JSON benchmarks.

* Updated results.

* TOC updated.

* Typo fix.

* Added printer to Scheme langs.

* Racket code has been fixed to use idiomatic structs instead of macros.

* C++ and Kotlin tests have been updated.

* D tests have been fixed to use proper initializations.

* Go and Elixir tests have been improved.

* Nim micro-optimizations have been removed.

* Julia test has been updated.

* All Haskell tests have been updated.

* Lua tests have been improved.

* OCaml tests have been improved.

* SML tests have been improved.

* Fsharp test has been improved.

* Tcl tests have been improved.

* Perl tests have been updated to use v5.32.

* Other tests have been fixed.

* Haskell networking has been fixed.

* Fixed JS code.

Co-authored-by: Darrell Wright <beached@users.noreply.github.com>
Co-authored-by: Daniel Lemire <lemire@gmail.com>
nuald added a commit to nuald/benchmarks that referenced this pull request Dec 4, 2020 (the same commits as listed above)