Faster and more correct serialization #1168

Merged: 19 commits into master from dlemire/new_formatters on Sep 23, 2020

Conversation

@lemire (Member) commented Sep 12, 2020

Currently, in simdjson, we have the recently introduced simdjson::minify, which can convert a JSON element back to a minified string. Internally, it works with C++ streams. There are three problems with C++ streams:

  1. It is slow. C++ streams are surprisingly slow.
  2. It is locale-sensitive.
  3. It is lossy. Outputting a float to a stream does not guarantee that it can later be reparsed exactly. (A minimal illustration of points 2 and 3 appears below.)
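
To make points 2 and 3 concrete, here is a minimal, simdjson-independent sketch showing how default stream formatting truncates a double and how a non-default locale (the de_DE example is illustrative and assumes that locale is installed) produces output that is not valid JSON:

#include <iostream>
#include <sstream>

int main() {
  std::ostringstream ss;
  double x = 0.1234567890123456; // 16 significant digits
  ss << x;
  std::cout << ss.str() << std::endl; // prints "0.123457": default 6-digit precision, not exactly reparseable
  // ss.imbue(std::locale("de_DE.UTF-8")); ss.str(""); ss << x;
  // would print "0,123457" -- a comma decimal separator, which is not valid JSON
  return 0;
}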

This PR fixes these issues with simdjson::minify. It adds a new equivalent function, to_string, since that is the standard way to convert objects to strings in C++11. I would also add to_chars to be C++17-ish, but I stopped short of it because C++17 is not our primary concern.
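
For illustration, here is a rough usage sketch; it assumes the dom::parser API and the to_string overload for DOM elements that this PR introduces, so treat it as indicative rather than definitive:

#include <iostream>
#include <string>
#include "simdjson.h"

int main() {
  simdjson::dom::parser parser;
  simdjson::dom::element doc;
  std::string json = R"({"name":"simdjson","score":0.1})";
  auto error = parser.parse(json).get(doc);
  if (error) { std::cerr << error << std::endl; return 1; }
  // Pre-existing stream-based path (goes through std::ostream):
  std::cout << simdjson::minify(doc) << std::endl;
  // Added by this PR: locale-independent, precise serialization to a std::string:
  std::cout << simdjson::to_string(doc) << std::endl;
  return 0;
}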

This PR is backward compatible: there should be no API-breaking changes. The serialized output will differ, but that is fine: the new output should be more precise and locale-independent.

The code is almost identical to the previous code: I reworked @jkeiser's code in a stream-free manner. The result is a 10x performance boost (see numbers below). This is still very far from optimal speed, but it is probably good enough to eliminate serialization as a bottleneck.

As stated elsewhere, this should be in 0.6 as I think it is of some importance.

$ ./benchmark/bench_dom_api --benchmark_filter=serialize
2020-09-12 00:22:12
Running ./benchmark/bench_dom_api
Run on (4 X 3100 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x4)
Load Average: 0.24, 0.08, 0.05
-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
serialize_twitter/repeats:10_mean                     9945291 ns      9903803 ns           10 Gigabytes=47.1487M/s docs=100.981/s
serialize_twitter/repeats:10_median                   9910065 ns      9859544 ns           10 Gigabytes=47.3558M/s docs=101.425/s
serialize_twitter/repeats:10_stddev                    104686 ns       104290 ns           10 Gigabytes=486.462k/s docs=1.04188/s
serialize_twitter/repeats:10_max                     10212314 ns     10180782 ns           10 Gigabytes=47.5373M/s docs=101.813/s
serialize_twitter_to_string/repeats:10_mean            956152 ns       953911 ns           10 Gigabytes=490.768M/s docs=1048.33/s
serialize_twitter_to_string/repeats:10_median          956150 ns       953941 ns           10 Gigabytes=490.749M/s docs=1048.28/s
serialize_twitter_to_string/repeats:10_stddev            3272 ns         3041 ns           10 Gigabytes=1.56657M/s docs=3.34634/s
serialize_twitter_to_string/repeats:10_max             961239 ns       958123 ns           10 Gigabytes=493.872M/s docs=1054.96/s
serialize_twitter_string_builder/repeats:10_mean       868574 ns       866894 ns           10 Gigabytes=540.028M/s docs=1.15355k/s
serialize_twitter_string_builder/repeats:10_median     869201 ns       866731 ns           10 Gigabytes=540.127M/s docs=1.15376k/s
serialize_twitter_string_builder/repeats:10_stddev       2921 ns         1887 ns           10 Gigabytes=1.17572M/s docs=2.51144/s
serialize_twitter_string_builder/repeats:10_max        872932 ns       869419 ns           10 Gigabytes=541.723M/s docs=1.15717k/s

There are other benefits to this approach: it opens the door to adding a prettifier.

Fixes #933: Tune the string serialization for performance (do not use C++ streams).

According to @plokhotnyuk, the floating-point serialization could probably be even better, but that can be handled in the future.

@plokhotnyuk commented Sep 12, 2020

@lemire Have you considered using Raffaello Giulietti's amazing work, "The Schubfach way to render doubles", to serialize 64-bit floating-point numbers into the shortest and most precise text representation?

Here is the original implementation for Java from the authors.

Here is a C++ implementation of the algorithm by Alexander Bolz.

Here is an adaptation of the Schubfach way in one of the fastest JSON serializers for Scala.

BTW, there is also the Dragonbox algorithm, with a C++ implementation, which is based on the Schubfach algorithm.
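
For context, the property these algorithms provide is "shortest round-trip" formatting: the fewest decimal digits that still parse back to exactly the same double. As an illustration of that guarantee, unrelated to simdjson's own code, C++17's std::to_chars behaves the same way when no precision is specified (this assumes a standard library with floating-point to_chars support):

#include <charconv>
#include <cstdio>

int main() {
  double x = 0.1;
  char buf[64];
  // Without an explicit precision, std::to_chars emits the shortest decimal
  // string that parses back to exactly the same double.
  auto result = std::to_chars(buf, buf + sizeof(buf), x);
  *result.ptr = '\0';
  std::printf("%s\n", buf); // prints "0.1"
  return 0;
}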

@lemire (Member, Author) commented Sep 12, 2020

@plokhotnyuk Do you have a link to a C/C++ implementation along with benchmarks?

At this point in time, I am not even trying to achieve best performance, let alone optimal. I am moving us off the C++ streams. Period.

If there is ready and tested code that I could adopt, please share... but it needs to be mature code.

@jkeiser (Member) commented Sep 12, 2020

I am still flabbergasted that C++ streams are this slow. I figured it'd be 50% worse, or something, and I could live with that at the time, but wow. Just wow.

@lemire (Member, Author) commented Sep 12, 2020

> I am still flabbergasted that C++ streams are this slow.

I hear that they are the root cause of the fires on the West Coast.

@lemire (Member, Author) commented Sep 12, 2020

@plokhotnyuk Useful pointers. Thanks.

@lemire (Member, Author) commented Sep 12, 2020

@plokhotnyuk I have looked at the C++ implementations and none of them look mature enough. It may well be great work, but I don't trust the look of the code, and I do not want to spend the time at this point to examine it.

@lemire marked this pull request as ready for review on September 13, 2020 at 01:36
@lemire requested a review from jkeiser on September 13, 2020 at 01:36
@lemire (Member, Author) commented Sep 13, 2020

@jkeiser This is ready for review. This modifies your code, so please review.

It can be improved. At this point, it is enough that it is correct, and much faster.

@jkeiser (Member) left a comment

OK, this is just a thought. Don't take it as a request for change, just mull it over.

If we were to rename mini_formatter -> json_formatter, and expose it, we could support string, cout, and FILE * in one interface:

template<typename T=string>
struct json_formatter {
  T out;
private:
  void write(const char *buf, size_t len);
};

// Might need template<> in front of each of these
void json_formatter<string>::write(const char *buf, size_t len) { out.append(buf, len); }
void json_formatter<ostream &>::write(const char *buf, size_t len) { out << string_view(buf, len); }
void json_formatter<FILE *>::write(const char *buf, size_t len) { fwrite(buf, 1, len, out); }

template<typename T>
inline ostream &operator<<(ostream &out, T value) { json_formatter<ostream &> f{out}; f << value; return out; }

inline json_formatter &operator<<(json_formatter &out, dom::element value) { ... }
inline json_formatter &operator<<(json_formatter &out, dom::array value) { ... }
inline json_formatter &operator<<(json_formatter &out, dom::object value) { ... }
inline json_formatter &operator<<(json_formatter &out, dom::document value) { ... }
inline json_formatter &operator<<(json_formatter &out, error_code value) { ... }
template<typename T>
inline json_formatter &operator<<(json_formatter &out, simdjson_result<T> value) {
  T actual;
  error_code error = value.get(actual);
  if (error) { out << error; }
  else { out << actual; }
  return out;
}

This gives you the option to keep the formatter around between invocations, which I think is one of
the reasons you wrote string_builder :)

Usage:

// string
json_formatter formatter;
formatter << doc;
// formatter.out has the string

// ostream &
cout << doc;

// FILE *
json_formatter(stdout) << doc;

@jkeiser (Member) left a comment

A few suggestions, but nothing that screams "needs a fix." It's good as-is, backwards-compatible, and brings substantial benefit. The only drawback is that it has to malloc while printing, and for large files it will spend a lot of time waiting on cache misses, which the old one theoretically did not; but I don't know how important this is.

If we want to fix that, we can fix it later. This is way better than what we had!

@lemire (Member, Author) commented Sep 14, 2020

I'll take your concerns point by point and try to address them. It is important to get these things right.

@lemire (Member, Author) commented Sep 14, 2020

> OK, this is just a thought. Don't take it as a request for change, just mull it over.

This might be totally fine, but I don't want to mix API design into this PR. I am mostly preoccupied with getting the fundamentals right. So what I will do is hide everything away in "internal" and make it private, so that we have the freedom to tweak the API later.

@lemire (Member, Author) commented Sep 14, 2020

@jkeiser My latest commit hides string_builder and mini_formatter away in the internal namespace, and I have put a clear warning that they are not part of our public API.

So we can take a second pass and expose whatever we like later.

@lemire (Member, Author) commented Sep 14, 2020

> This gives you the option to keep the formatter around between invocations, which I think is one of
> the reasons you wrote string_builder :)

My thinking was as follows. Suppose that you have an application where you routinely grab a subset of a JSON document and pass it along. The way we do things currently, we would constantly be allocating memory to hold the string content, even if all of your strings are about the same size. In such a scenario, you'd want to keep reusing your buffer. But my thinking is not very relevant since I don't have a real use case for it. So let us hide all this where the users won't see it. This way, we can change it later.
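
As a purely hypothetical sketch of that reuse pattern, assuming the internal string_builder from this PR exposes append(), str(), and clear() (it is an internal, non-public API, and the exact signatures here are guesses):

#include <string>
#include <vector>
#include "simdjson.h"

// Serialize many DOM elements while reusing a single growing buffer.
std::vector<std::string> serialize_all(const std::vector<simdjson::dom::element> &elements) {
  simdjson::internal::string_builder<> sb; // internal API: subject to change
  std::vector<std::string> out;
  for (auto elem : elements) {
    sb.clear();                  // keep the previously allocated capacity
    sb.append(elem);             // write elem as minified JSON into the buffer
    out.emplace_back(sb.str());  // assumed to return a view over the buffer
  }
  return out;
}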

@jkeiser (Member) commented Sep 14, 2020

> This gives you the option to keep the formatter around between invocations, which I think is one of
> the reasons you wrote string_builder :)
>
> My thinking was as follows. Suppose that you have an application where you routinely grab a subset of a JSON document and pass it along. The way we do things currently, we would constantly be allocating memory to hold the string content, even if all of your strings are about the same size. In such a scenario, you'd want to keep reusing your buffer. But my thinking is not very relevant since I don't have a real use case for it. So let us hide all this where the users won't see it. This way, we can change it later.

It's a good thought, and I agree: anytime we have allocation, we should let people reuse it.

@jkeiser (Member) commented Sep 14, 2020

Looks good, thanks :)

@lemire (Member, Author) commented Sep 23, 2020

Merging.

@lemire merged commit 60c139a into master on Sep 23, 2020
@lemire deleted the dlemire/new_formatters branch on September 23, 2020 at 14:00