Conversation

droberts195
Contributor

RapidJSON hasn't had a new release for 5 years, but lots of
improvements have been committed to its master branch, so we
should pick up those improvements. (The commit we were
previously using wasn't a numbered release either, so it's
not like we're doing anything dodgier than we were before.)

Tencent/rapidjson@0d4517f
is the commit of RapidJSON that this PR incorporates into the
ml-cpp repo.

Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

LGTM

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 5, 2021
Currently PyTorch inference commands are sent as a series
of JSON documents back-to-back with no delimiter in between.
This conflicts with a change to the way
rapidjson::kParseStopWhenDoneFlag works in the newer version
of RapidJSON that we want to upgrade to on the C++ side.
The newer version requires either a character or end-of-file
to follow a JSON document before it considers parsing it to
be complete. Since we don't want to end the stream, we have
to send another character. To minimise data transfer we could
send the opening brace of the next document, not knowing what
the next document will contain yet. But it's nicer to send a
newline character and let the next document be sent in full
when it's ready. Even with the old version of RapidJSON
sending a newline after each command should not cause a
problem, as trailing whitespace is acceptable in JSON.

Relates elastic/ml-cpp#2106
droberts195 added a commit to elastic/elasticsearch that referenced this pull request Nov 5, 2021
@droberts195 droberts195 merged commit 8f46bcf into elastic:main Nov 5, 2021
@droberts195 droberts195 deleted the upgrade_rapidjson branch November 5, 2021 13:51
droberts195 added a commit to droberts195/ml-cpp that referenced this pull request Nov 8, 2021
In elastic#2106 it became apparent that the behaviour of RapidJSON's
own IStreamWrapper class had changed between the 2017 version
we used previously and the latest 2021 version. In the older
version it was unbuffered and in the newer version it was
buffered.

An attempt was made to work around this in elastic#2106 by specifying
a single-character buffer; however, in debug builds this fell foul
of an assertion that the buffer size must be at least 4 characters.

This PR adds a separate unbuffered istream wrapper class that
we can use with RapidJSON in cases where we don't want the
wrapper to consume extra characters beyond the end of the
document that's being parsed. Having a separate class should
also reduce the inefficiency of taking one character at a
time, as there's no need to call a stream method to refill
the buffer on every read.
droberts195 added a commit that referenced this pull request Nov 8, 2021