Conversation

droberts195
Contributor

RapidJSON hasn't had a new release for 5 years, but lots of
improvements have been committed to its master branch, so we
should pick up those improvements. (The commit we were
previously using wasn't a numbered release either, so it's
not like we're doing anything dodgier than we were before.)

Tencent/rapidjson@0d4517f
is the commit of RapidJSON that this PR incorporates into the
ml-cpp repo.

Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

LGTM

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 5, 2021
Currently PyTorch inference commands are sent as a series
of JSON documents back-to-back with no delimiter in between.
This conflicts with a change to the way
rapidjson::kParseStopWhenDoneFlag works in the newer version
of RapidJSON that we want to upgrade to on the C++ side.
The newer version requires either a character or end-of-file
to follow a JSON document before it considers parsing it to
be complete. Since we don't want to end the stream, we have
to send another character. To minimise data transfer we could
send the opening brace of the next document, not knowing what
the next document will contain yet. But it's nicer to send a
newline character and let the next document be sent in full
when it's ready. Even with the old version of RapidJSON
sending a newline after each command should not cause a
problem, as trailing whitespace is acceptable in JSON.

Relates elastic/ml-cpp#2106
droberts195 added a commit to elastic/elasticsearch that referenced this pull request Nov 5, 2021
@droberts195 droberts195 merged commit 8f46bcf into elastic:main Nov 5, 2021
@droberts195 droberts195 deleted the upgrade_rapidjson branch November 5, 2021 13:51
droberts195 added a commit to droberts195/ml-cpp that referenced this pull request Nov 8, 2021
In elastic#2106 it became apparent that the behaviour of RapidJSON's
own IStreamWrapper class had changed between the 2017 version
we used previously and the latest 2021 version. In the older
version it was unbuffered and in the newer version it was
buffered.

An attempt was made to work around this in elastic#2106 by specifying
a single-character buffer; however, in debug builds this fell foul
of an assertion that the buffer size must be at least 4 characters.

This PR adds a separate unbuffered istream wrapper class that
we can use with RapidJSON in cases where we don't want the
wrapper to consume extra characters beyond the end of the
document that's being parsed. Having a separate class should
also reduce the inefficiency of taking one character at a
time, as there's no need to call a stream method to refill
the buffer on every read.
droberts195 added a commit that referenced this pull request Nov 8, 2021