Faster and more correct serialization #1168

Merged: 19 commits into master from dlemire/new_formatters on Sep 23, 2020

Conversation

@lemire (Member) commented Sep 12, 2020

Currently, in simdjson, we have the recently introduced simdjson::minify, which can convert a JSON element back to a minified string. Internally, it works with C++ streams. There are three problems with C++ streams:

  1. It is slow. C++ streams are surprisingly slow.
  2. It is locale-sensitive.
  3. It is lossy. Outputting a float to a stream does not guarantee that it can later be reparsed exactly. (A minimal illustration of points 2 and 3 appears below.)
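
To make points 2 and 3 concrete, here is a minimal, simdjson-independent sketch showing how default stream formatting truncates a double and how a non-default locale (the de_DE example is illustrative and assumes that locale is installed) produces output that is not valid JSON:

#include <iostream>
#include <sstream>

int main() {
  std::ostringstream ss;
  double x = 0.1234567890123456; // 16 significant digits
  ss << x;
  std::cout << ss.str() << std::endl; // prints "0.123457": default 6-digit precision, not exactly reparseable
  // ss.imbue(std::locale("de_DE.UTF-8")); ss.str(""); ss << x;
  // would print "0,123457" -- a comma decimal separator, which is not valid JSON
  return 0;
}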

This PR fixes these issues with simdjson::minify. It adds a new equivalent function, to_string, since that is the standard way to convert objects to strings in C++11. I would also add to_chars to be C++17-ish, but I stopped short of it because C++17 is not our primary concern.
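
For illustration, here is a rough usage sketch; it assumes the dom::parser API and the to_string overload for DOM elements that this PR introduces, so treat it as indicative rather than definitive:

#include <iostream>
#include <string>
#include "simdjson.h"

int main() {
  simdjson::dom::parser parser;
  simdjson::dom::element doc;
  std::string json = R"({"name":"simdjson","score":0.1})";
  auto error = parser.parse(json).get(doc);
  if (error) { std::cerr << error << std::endl; return 1; }
  // Pre-existing stream-based path (goes through std::ostream):
  std::cout << simdjson::minify(doc) << std::endl;
  // Added by this PR: locale-independent, precise serialization to a std::string:
  std::cout << simdjson::to_string(doc) << std::endl;
  return 0;
}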

This PR is backward compatible: there should be no API-breaking changes. The serialized output will differ, but that is fine: the new output should be more precise and locale-independent.

The code is almost identical to the previous code: I reworked @jkeiser's code in a stream-free manner. The result is a 10x performance boost (see numbers below). This is still very far from optimal speed, but it is probably good enough to eliminate serialization as a bottleneck.

As stated elsewhere, this should be in 0.6 as I think it is of some importance.

$ ./benchmark/bench_dom_api --benchmark_filter=serialize
2020-09-12 00:22:12
Running ./benchmark/bench_dom_api
Run on (4 X 3100 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x4)
Load Average: 0.24, 0.08, 0.05
-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
serialize_twitter/repeats:10_mean                     9945291 ns      9903803 ns           10 Gigabytes=47.1487M/s docs=100.981/s
serialize_twitter/repeats:10_median                   9910065 ns      9859544 ns           10 Gigabytes=47.3558M/s docs=101.425/s
serialize_twitter/repeats:10_stddev                    104686 ns       104290 ns           10 Gigabytes=486.462k/s docs=1.04188/s
serialize_twitter/repeats:10_max                     10212314 ns     10180782 ns           10 Gigabytes=47.5373M/s docs=101.813/s
serialize_twitter_to_string/repeats:10_mean            956152 ns       953911 ns           10 Gigabytes=490.768M/s docs=1048.33/s
serialize_twitter_to_string/repeats:10_median          956150 ns       953941 ns           10 Gigabytes=490.749M/s docs=1048.28/s
serialize_twitter_to_string/repeats:10_stddev            3272 ns         3041 ns           10 Gigabytes=1.56657M/s docs=3.34634/s
serialize_twitter_to_string/repeats:10_max             961239 ns       958123 ns           10 Gigabytes=493.872M/s docs=1054.96/s
serialize_twitter_string_builder/repeats:10_mean       868574 ns       866894 ns           10 Gigabytes=540.028M/s docs=1.15355k/s
serialize_twitter_string_builder/repeats:10_median     869201 ns       866731 ns           10 Gigabytes=540.127M/s docs=1.15376k/s
serialize_twitter_string_builder/repeats:10_stddev       2921 ns         1887 ns           10 Gigabytes=1.17572M/s docs=2.51144/s
serialize_twitter_string_builder/repeats:10_max        872932 ns       869419 ns           10 Gigabytes=541.723M/s docs=1.15717k/s

There are other benefits to this approach: it opens the door to adding a prettifier.

Fixes #933: Tune the string serialization for performance (do not use C++ streams).

According to @plokhotnyuk, the floating-point serialization could probably be even better, but that can be handled in the future.

@plokhotnyuk commented Sep 12, 2020

@lemire Have you considered using Raffaello Giulietti's amazing work, "The Schubfach way to render doubles", to serialize 64-bit floating-point numbers into the shortest and most precise text representation?

Here is the original implementation for Java from the authors.

Here is a C++ implementation of the algorithm by Alexander Bolz.

Here is an adaptation of the Schubfach way in one of the fastest JSON serializers for Scala.

BTW, there is also the Dragonbox algorithm, with a C++ implementation, which is based on the Schubfach algorithm.
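
For context, the property these algorithms provide is "shortest round-trip" formatting: the fewest decimal digits that still parse back to exactly the same double. As an illustration of that guarantee, unrelated to simdjson's own code, C++17's std::to_chars behaves the same way when no precision is specified (this assumes a standard library with floating-point to_chars support):

#include <charconv>
#include <cstdio>

int main() {
  double x = 0.1;
  char buf[64];
  // Without an explicit precision, std::to_chars emits the shortest decimal
  // string that parses back to exactly the same double.
  auto result = std::to_chars(buf, buf + sizeof(buf), x);
  *result.ptr = '\0';
  std::printf("%s\n", buf); // prints "0.1"
  return 0;
}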

@lemire (Member, Author) commented Sep 12, 2020

@plokhotnyuk Do you have a link to a C/C++ implementation along with benchmarks?

At this point in time, I am not even trying to achieve best performance, let alone optimal. I am moving us off the C++ streams. Period.

If there is ready and tested code that I could adopt, please share... but it needs to be mature code.

@jkeiser (Member) commented Sep 12, 2020

I am still flabbergasted that C++ streams are this slow. I figured it'd be 50% worse, or something, and I could live with that at the time, but wow. Just wow.

@lemire (Member, Author) commented Sep 12, 2020

> I am still flabbergasted that C++ streams are this slow.

I hear that they are the root cause of the fires on the West Coast.

@lemire (Member, Author) commented Sep 12, 2020

@plokhotnyuk Useful pointers. Thanks.

@lemire (Member, Author) commented Sep 12, 2020

@plokhotnyuk I have looked at the C++ implementations and none of them look mature enough. It may well be great work, but I don't trust the look of the code, and I do not want to spend the time at this point to examine it.

@lemire marked this pull request as ready for review on September 13, 2020 at 01:36
@lemire requested a review from jkeiser on September 13, 2020 at 01:36
@lemire (Member, Author) commented Sep 13, 2020

@jkeiser This is ready for review. This modifies your code, so please review.

It can be improved. At this point, it is enough that it is correct, and much faster.

@jkeiser (Member) left a comment

OK, this is just a thought. Don't take it as a request for change, just mull it over.

If we were to rename mini_formatter -> json_formatter, and expose it, we could support string, cout, and FILE * in one interface:

template<typename T=string>
struct json_formatter {
  T out;
private:
  void write(const char *buf, size_t len);
};

// Might need template<> in front of each of these
void json_formatter<string>::write(const char *buf, size_t len) { out.append(buf, len); }
void json_formatter<ostream &>::write(const char *buf, size_t len) { out << string_view(buf, len); }
void json_formatter<FILE *>::write(const char *buf, size_t len) { fwrite(buf, 1, len, out); }

template<typename T>
inline ostream &operator<<(ostream &out, T value) { json_formatter<ostream &> f{out}; f << value; return out; }

inline json_formatter &operator<<(json_formatter &out, dom::element value) { ... }
inline json_formatter &operator<<(json_formatter &out, dom::array value) { ... }
inline json_formatter &operator<<(json_formatter &out, dom::object value) { ... }
inline json_formatter &operator<<(json_formatter &out, dom::document value) { ... }
inline json_formatter &operator<<(json_formatter &out, error_code value) { ... }
template<typename T>
inline json_formatter &operator<<(json_formatter &out, simdjson_result<T> value) {
  T actual;
  error_code error = value.get(actual);
  if (error) { out << error; }
  else { out << actual; }
  return out;
}

This gives you the option to keep the formatter around between invocations, which I think is one of
the reasons you wrote string_builder :)

Usage:

// string
json_formatter formatter;
formatter << doc;
// formatter.out has the string

// ostream &
cout << doc;

// FILE *
json_formatter(stdout) << doc;

@jkeiser (Member) left a comment

A few suggestions, but nothing that screams "needs a fix." It's good as-is, backwards-compatible, and brings substantial benefit. The only drawback is that it has to malloc while printing, and for large files it will spend a lot of time waiting on cache misses, which the old one theoretically did not; but I don't know how important this is.

If we want to fix that, we can fix it later. This is way better than what we had!

@lemire (Member, Author) commented Sep 14, 2020

I'll take your concerns point by point and try to address them. It is important to get these things right.

@lemire (Member, Author) commented Sep 14, 2020

> OK, this is just a thought. Don't take it as a request for change, just mull it over.

This might be totally fine, but I don't want to mix API design into this PR. I am mostly preoccupied with getting the fundamentals right. So what I will do is hide everything away in "internal" and make it private, so that we have the freedom to tweak the API later.

@lemire (Member, Author) commented Sep 14, 2020

@jkeiser My latest commit hides string_builder and mini_formatter away in the internal namespace, and I have put a clear warning that they are not part of our public API.

So we can take a second pass and expose whatever we like later.

@lemire (Member, Author) commented Sep 14, 2020

> This gives you the option to keep the formatter around between invocations, which I think is one of
> the reasons you wrote string_builder :)

My thinking was as follows. Suppose that you have an application where you routinely grab a subset of a JSON document and pass it along. The way we do things currently, we would constantly be allocating memory to hold the string content, even if all of your strings are about the same size. In such a scenario, you'd want to keep reusing your buffer. But my thinking is not very relevant since I don't have a real use case for it. So let us hide all this where the users won't see it. This way, we can change it later.
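
As a purely hypothetical sketch of that reuse pattern, assuming the internal string_builder from this PR exposes append(), str(), and clear() (it is an internal, non-public API, and the exact signatures here are guesses):

#include <string>
#include <vector>
#include "simdjson.h"

// Serialize many DOM elements while reusing a single growing buffer.
std::vector<std::string> serialize_all(const std::vector<simdjson::dom::element> &elements) {
  simdjson::internal::string_builder<> sb; // internal API: subject to change
  std::vector<std::string> out;
  for (auto elem : elements) {
    sb.clear();                  // keep the previously allocated capacity
    sb.append(elem);             // write elem as minified JSON into the buffer
    out.emplace_back(sb.str());  // assumed to return a view over the buffer
  }
  return out;
}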

@jkeiser (Member) commented Sep 14, 2020

> This gives you the option to keep the formatter around between invocations, which I think is one of
> the reasons you wrote string_builder :)
>
> My thinking was as follows. Suppose that you have an application where you routinely grab a subset of a JSON document and pass it along. The way we do things currently, we would constantly be allocating memory to hold the string content, even if all of your strings are about the same size. In such a scenario, you'd want to keep reusing your buffer. But my thinking is not very relevant since I don't have a real use case for it. So let us hide all this where the users won't see it. This way, we can change it later.

It's a good thought, and I agree: anytime we have allocation, we should let people reuse it.

@jkeiser (Member) commented Sep 14, 2020

Looks good, thanks :)

@lemire (Member, Author) commented Sep 23, 2020

Merging.

@lemire merged commit 60c139a into master on Sep 23, 2020
@lemire deleted the dlemire/new_formatters branch on September 23, 2020 at 14:00