
Performance enhancements (mainly) to json_object_to_json_string() #208

Conversation

rgerhards
Contributor

This patch set optimizes various aspects of generating the output JSON. Each patch is commented in depth. Performance is increased by a factor of ~3 for my use case.

Special consideration should go to 4969ffd, which introduces a new method of out-of-memory handling that replaces the inconsistent previous approach. Again, the commit comment has all the details.

This function and its helpers used to call sprintbuf() for constant
strings. That is expensive, because the call ends up in vprintf(),
which is unnecessary as the string needs no formatting at all.

In some other places of the code, printbuf_memappend() was used in
such cases. This is the right route to follow, because it greatly
decreases processing time.

This patch makes that replacement and keeps using sprintbuf() only
where it really is required. In my testing, this resulted in a very
noticeable performance improvement (n *times* faster, where n
depends on the details of what is formatted and with which options).
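As a rough illustration of the replacement (the helper functions below are mine, not taken from the patch), appending a constant string with printbuf_memappend() skips the format-string machinery entirely:

```c
#include <string.h>
#include "printbuf.h"

/* Before: even a constant separator goes through format-string parsing. */
static void emit_separator_slow(struct printbuf *pb)
{
	sprintbuf(pb, ", ");              /* ends up in the v*printf() machinery */
}

/* After: constant strings are appended as raw bytes, no format parsing. */
static void emit_separator_fast(struct printbuf *pb)
{
	printbuf_memappend(pb, ", ", 2);
}
```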
We avoid writing to "memory" just in order to check the loop
termination condition. While this looks like it makes no real
difference, the profiler (valgrind's callgrind tool) tells us it
actually does. Also, wall-clock time measured with time(1) is
consistently a bit lower than with the previous method. As the change
doesn't hurt anything (in terms of more complex code or so), I think
it is worth it.

Do not call memcpy() for single-byte values; writing them directly
involves less overhead and is therefore faster. The effect was
verified with valgrind's callgrind profiler and wall-clock timing.
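A minimal sketch of the idea (not the patch's exact code; it assumes the buffer has already been grown by one byte plus the terminator):

```c
#include "printbuf.h"

/* Append a single byte by direct assignment instead of a memcpy() call. */
static void append_one_byte(struct printbuf *pb, char c)
{
	/* memcpy(pb->buf + pb->bpos, &c, 1);   -- function-call overhead for one byte */
	pb->buf[pb->bpos++] = c;               /* direct store: no call, no length handling */
}
```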
The regular printbuf calls always keep the string properly
terminated. However, this is not required while the printable string
is being constructed and thus is simply unnecessary overhead. I have
added a new function which avoids writing the terminator until the
result is finally finished. Both the profiler (valgrind's callgrind)
and wall-clock timing show this is a clear improvement.
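A sketch of the idea (helper names are illustrative, buffer growth is elided here and works as in printbuf_memappend()): the usual NUL written after every append is immediately overwritten by the next append, so it can be deferred to the very end.

```c
#include <string.h>
#include "printbuf.h"

/* During construction: append raw bytes only. */
static void append_unterminated(struct printbuf *pb, const char *s, int len)
{
	memcpy(pb->buf + pb->bpos, s, len);
	pb->bpos += len;              /* no pb->buf[pb->bpos] = '\0' here */
}

/* When the result is finished: terminate exactly once. */
static void finish_string(struct printbuf *pb)
{
	pb->buf[pb->bpos] = '\0';
}
```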
The previous method relied on return codes to convey failure;
however, most often the return code was not checked. This could
lead to several issues, among them malformedness of the generated
string. Also, it is very unlikely to really get into an OOM
condition and, if so, it is extremely likely that the rest of
the (calling) app will not behave correctly anyway.

This patch takes a different approach: if we run out of memory,
we ignore the data for which no memory could be allocated. In essence,
this can result in the same random malformedness as the previous
approach, but we do not try to signal failure back to the original
caller. Note that with the previous approach this did not work
reliably either. The benefit of this approach is that we gain some
performance, have simpler code and have a clearly defined failure
state.
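A hedged sketch of that failure policy (names and growth logic are illustrative, not the patch's code): on allocation failure the data is silently dropped and the printbuf stays in a well-defined state.

```c
#include <stdlib.h>
#include <string.h>
#include "printbuf.h"

/* If the buffer cannot be grown, drop the data instead of returning an
 * error code that callers would mostly ignore anyway. */
static void append_or_drop(struct printbuf *pb, const char *s, int len)
{
	if (pb->size <= pb->bpos + len) {
		char *t = realloc(pb->buf, pb->bpos + len + 1);
		if (t == NULL)
			return;               /* OOM: ignore this piece of data */
		pb->buf = t;
		pb->size = pb->bpos + len + 1;
	}
	memcpy(pb->buf + pb->bpos, s, len);
	pb->bpos += len;
}
```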
String escaping is a surprisingly performance-intensive operation.
I spent many hours in the profiler, and the root problem seems
to be that there is no easy way to detect the character classes
that need to be escaped, the root cause being that these
characters are spread all over the ASCII table. I tried
several approaches, including call tables, restructuring
the case condition, different types of if conditions and
reordering the if conditions.

What worked out best is this: the regular case is that a
character must not be escaped, so we want to process that as fast
as possible. In order to detect this as quickly as possible, we
have a lookup table that tells us whether a char needs escaping
("needsEscape", below). This table has a slot for each ASCII code.
Note that it uses chars, because anything larger causes worse cache
behaviour and anything smaller requires bit indexing and masking
operations, which are also comparatively costly; plain chars work
best. What we then do is a single lookup into the table to detect
if we need to escape a character. If we do, we go into the depth of
actual escape detection. But if we do NOT need to escape, we just
quickly advance the index and are done with that char. Note that it
may look like the extra table lookup costs performance, but quite
the contrary is the case: we get a more than 30% performance
increase from it (compared to the latest version of the code that
did not do the lookup).
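A sketch of the lookup-table idea (the table contents and escape handling below are illustrative, not the patch's exact code; the range initializer is a GCC/Clang extension):

```c
#include "printbuf.h"

/* 1 marks characters that need escaping: control characters, the quote
 * and the backslash. One slot per possible byte value. */
static const char needsEscape[256] = {
	[0 ... 0x1f] = 1,   /* control characters */
	['"']  = 1,
	['\\'] = 1,
};

static void escape_string(struct printbuf *pb, const char *s, int len)
{
	int start = 0, i = 0;

	while (i < len) {
		unsigned char c = (unsigned char)s[i];
		if (!needsEscape[c]) {  /* common case: one lookup, advance, done */
			i++;
			continue;
		}
		/* flush the unescaped run collected so far, then escape s[i] */
		printbuf_memappend(pb, s + start, i - start);
		if (c == '"')
			printbuf_memappend(pb, "\\\"", 2);
		else if (c == '\\')
			printbuf_memappend(pb, "\\\\", 2);
		else
			sprintbuf(pb, "\\u%04x", c);  /* control chars as \u00XX */
		start = ++i;
	}
	printbuf_memappend(pb, s + start, i - start);
}
```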
With the new approach we can simplify the code a bit. As a side
effect, we get a very slight further performance increase.

The default printbuf size of 32 bytes is very conservative and
can cause a lot of malloc() activity for all but trivial JSON objects.
The new json_global_set_printbuf_initial_size() API permits an
application to set a new default size.
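A usage sketch for that API. The function name comes from this patch set; whether it is available in a given json-c release is an assumption to check against your headers.

```c
#include <stdio.h>
#include <json-c/json.h>

int main(void)
{
	/* Start printbufs at 4 KiB instead of 32 bytes so large objects do
	 * not trigger a cascade of realloc() calls during serialization. */
	json_global_set_printbuf_initial_size(4096);

	struct json_object *obj = json_object_new_object();
	json_object_object_add(obj, "msg", json_object_new_string("hello"));
	printf("%s\n", json_object_to_json_string(obj));
	json_object_put(obj);
	return 0;
}
```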

Restored the int return type for the *_to_json_string formatting
functions, as otherwise we would break a public interface. The
performance impact is minimal, as the result is never evaluated.
@rgerhards
Contributor Author

@rgerhards rgerhards closed this Nov 18, 2015
@rgerhards rgerhards deleted the enh-perf-json_object_object_to_json_string branch November 18, 2015 16:07