[ML] Upgrading RapidJSON #2106
Merged
Conversation
RapidJSON hasn't had a new release for 5 years, but lots of improvements have been committed to its master branch, so we should pick up those improvements. (The commit we were previously using wasn't a numbered release either, so it's not like we're doing anything dodgier than we were before.) Tencent/rapidjson@0d4517f is the commit of RapidJSON that this PR incorporates into the ml-cpp repo.
dimitris-athanasiou approved these changes on Nov 4, 2021:
LGTM
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request on Nov 5, 2021
droberts195 added a commit to elastic/elasticsearch that referenced this pull request on Nov 5, 2021:
Currently PyTorch inference commands are sent as a series of JSON documents back-to-back with no delimiter in between. This conflicts with a change to the way rapidjson::kParseStopWhenDoneFlag works in the newer version of RapidJSON that we want to upgrade to on the C++ side. The newer version requires either a character or end-of-file to follow a JSON document before it considers parsing of that document to be complete. Since we don't want to end the stream, we have to send another character. To minimise data transfer we could send the opening brace of the next document, without yet knowing what that document will contain, but it's nicer to send a newline character and let the next document be sent in full when it's ready. Even with the old version of RapidJSON, sending a newline after each command should not cause a problem, as trailing whitespace is acceptable in JSON. Relates elastic/ml-cpp#2106
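As a rough illustration of the behaviour described above, here is a minimal sketch (not the ml-cpp code itself; the request_id field and the surrounding main are made up for the example) of reading newline-separated commands back-to-back with kParseStopWhenDoneFlag:

```cpp
#include <rapidjson/document.h>
#include <rapidjson/istreamwrapper.h>

#include <iostream>
#include <sstream>

int main() {
    // Two commands separated by newlines, as the Java side now sends them.
    // The newline after each document gives the parser a character beyond the
    // closing brace, so kParseStopWhenDoneFlag can treat the document as
    // complete without waiting for more input.
    std::istringstream commands{"{\"request_id\":\"a\"}\n{\"request_id\":\"b\"}\n"};
    rapidjson::IStreamWrapper wrapper{commands};

    rapidjson::Document doc;
    // Each ParseStream call stops at the end of one document; the loop ends
    // when only whitespace/end-of-stream remains and the parse reports an error.
    while (!doc.ParseStream<rapidjson::kParseStopWhenDoneFlag>(wrapper).HasParseError()) {
        std::cout << "parsed command " << doc["request_id"].GetString() << '\n';
    }
    return 0;
}
```

Note that the stock rapidjson::IStreamWrapper in the newer version reads ahead into an internal buffer, which is the behaviour the follow-up commits below had to work around.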
droberts195 added a commit to droberts195/ml-cpp that referenced this pull request on Nov 8, 2021
droberts195 added a commit that referenced this pull request on Nov 8, 2021:
In #2106 it became apparent that the behaviour of RapidJSON's own IStreamWrapper class had changed between the 2017 version we used previously and the latest 2021 version: the older version was unbuffered, whereas the newer version is buffered. An attempt was made to work around this in #2106 by specifying a single-character buffer; however, in debug builds this fell foul of an assertion that the buffer size must be at least 4 characters. This PR adds a separate unbuffered istream wrapper class that we can use with RapidJSON in cases where we don't want the wrapper to consume extra characters beyond the end of the document that's being parsed. Having a separate class should also reduce the inefficiency of taking one character at a time, as there's no need to call a stream method to refill a buffer on every read.
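For illustration only, an unbuffered wrapper along these lines could look roughly like the sketch below. The class and member names are hypothetical, not the actual ml-cpp class; it simply satisfies RapidJSON's input stream concept by taking one character at a time straight from the std::istream, with the write-side methods stubbed out as RapidJSON's own read-only wrappers do.

```cpp
#include <rapidjson/rapidjson.h> // RAPIDJSON_ASSERT

#include <cstddef>
#include <istream>

// Hypothetical unbuffered wrapper satisfying RapidJSON's input stream concept.
// Because it never reads ahead, no characters beyond the end of the parsed
// document are consumed from the underlying stream.
class CUnbufferedIStreamWrapper {
public:
    using Ch = char;

    explicit CUnbufferedIStreamWrapper(std::istream& stream) : m_Stream{stream} {}

    // Look at the next character without consuming it; '\0' signals end-of-stream.
    Ch Peek() const {
        int c = m_Stream.peek();
        return c == std::istream::traits_type::eof() ? '\0' : static_cast<Ch>(c);
    }

    // Consume and return the next character.
    Ch Take() {
        int c = m_Stream.get();
        if (c == std::istream::traits_type::eof()) {
            return '\0';
        }
        ++m_Count;
        return static_cast<Ch>(c);
    }

    // Number of characters consumed so far, used in parse error offsets.
    std::size_t Tell() const { return m_Count; }

    // Write-side methods are never called for a read-only stream.
    Ch* PutBegin() { RAPIDJSON_ASSERT(false); return nullptr; }
    void Put(Ch) { RAPIDJSON_ASSERT(false); }
    void Flush() { RAPIDJSON_ASSERT(false); }
    std::size_t PutEnd(Ch*) { RAPIDJSON_ASSERT(false); return 0; }

private:
    std::istream& m_Stream;
    std::size_t m_Count{0};
};
```

A wrapper like this can be passed to Document::ParseStream in place of rapidjson::IStreamWrapper wherever read-ahead past the document boundary would swallow the start of the next command.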