
Protocol Version 5: JSON Client Side Performance #5150

Open · bchavez opened this issue Nov 25, 2015 · 4 comments

@bchavez (Contributor) commented Nov 25, 2015

So, I was thinking a lot about the performance of my new C# driver https://github.com/bchavez/RethinkDb.Driver (and this probably applies to the Java driver too).

The V4 protocol requires clients to buffer the JSON query as a string in memory just to determine its size before sending: [TOKEN][LENGTH][QUERY].

This can be bad under heavy load, since those strings are allocated on the heap and can cause GC memory pressure to build up fast. A better approach for C# (and possibly Java) is to write the query directly to the network stream, i.e. [TOKEN]{QUERY}|postamble|. This would eliminate a lot of memory-pressure issues on the client side.
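
Here's roughly what I mean, as a C# sketch (`QueryWriter`, `WriteV4`, `WriteStreaming`, and the `serialize` callback are made-up names for illustration, not code from my driver):

```csharp
using System;
using System.IO;
using System.Text;

static class QueryWriter
{
    // V4 framing: the whole JSON query is materialized up front, because
    // its byte length must be written before the payload.
    // (Assumes a little-endian host, matching the protocol's byte order.)
    public static void WriteV4(Stream socket, long token, string json)
    {
        byte[] payload = Encoding.UTF8.GetBytes(json); // full query on the heap
        socket.Write(BitConverter.GetBytes(token), 0, 8);          // 8-byte token
        socket.Write(BitConverter.GetBytes(payload.Length), 0, 4); // 4-byte length
        socket.Write(payload, 0, payload.Length);
    }

    // Proposed length-less framing: the serializer writes straight onto the
    // network stream, so no intermediate string or byte[] copy of the query
    // is ever allocated.
    public static void WriteStreaming(Stream socket, long token, Action<TextWriter> serialize)
    {
        socket.Write(BitConverter.GetBytes(token), 0, 8);
        using (var writer = new StreamWriter(socket, new UTF8Encoding(false), 4096, leaveOpen: true))
        {
            serialize(writer); // e.g. a JSON serializer walking the query term tree
        }
    }
}
```

The streaming variant trades the big per-query allocation for a small, reusable write buffer.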

Security-wise, I think it's still the same: in V4, if someone wanted to fill the RethinkDB connection buffer to its maximum size, they could just specify that LENGTH up front anyway.

@mlucy added this to the subsequent milestone Nov 25, 2015
@mlucy (Member) commented Nov 25, 2015

Thanks for the suggestion! Tagging as a ReQL_proposal and putting into subsequent since it's an API change.

@danielmewes (Member) commented

That's very reasonable, since the JSON-encoded query has a well-defined, self-delimiting format anyway. We wouldn't even need the postamble.

We also already have code in the server to perform efficient buffered network reads for cases where we don't know how much to read in advance. I think we could re-use this to read the data from the socket while parsing it into a JSON tree.
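
The server side is C++, but the pattern is easy to sketch. Here's the same idea in C# using Json.NET's incremental reader (illustration only, not actual server or driver code):

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

static class StreamingQueryReader
{
    // Yields one JSON tree per top-level value on the stream. JsonTextReader
    // pulls more bytes from the socket only as the parser needs them, so no
    // length prefix is required and no full-query buffer is kept around.
    public static IEnumerable<JToken> ReadQueries(Stream socket)
    {
        var reader = new JsonTextReader(new StreamReader(socket))
        {
            SupportMultipleContent = true // several top-level JSON values in a row
        };
        while (reader.Read())
            yield return JToken.ReadFrom(reader); // consumes exactly one value
    }
}
```

The parser only ever holds its own read buffer plus the partially built tree, which is what makes the length prefix unnecessary.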

@asakatida (Contributor) commented

Could this be enabled in version 5 per query by sending a magic value for the length? Or could a new set of flags in the handshake negotiate the format on a per-connection basis?

I was thinking about this from the receiving side, in case it also gets implemented for query responses. The current format has the nice property that I can decode the buffer asynchronously and get the next token without waiting, since the length prefix tells me exactly where each response ends.

Also, my understanding of the JSON standard is that there may be whitespace before or after the query representation. That would cause problems if a poorly written encoder is used and the whitespace gets treated as the start of the next token.
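
For the magic-value option, here's roughly what I had in mind (C# sketch; `0xFFFFFFFF` is an arbitrary stand-in, since no actual magic value has been picked):

```csharp
using System.IO;

static class FrameReader
{
    const uint StreamingSentinel = 0xFFFFFFFF; // arbitrary stand-in value

    // Returns the raw query bytes for a classic V4 frame, or null when the
    // sentinel says the JSON should be parsed incrementally off the stream.
    public static byte[] ReadFrame(BinaryReader reader, out long token)
    {
        token = reader.ReadInt64();        // 8-byte query token
        uint length = reader.ReadUInt32(); // 4-byte length field
        if (length == StreamingSentinel)
            return null;                   // length-less: hand off to a streaming parser
        return reader.ReadBytes(checked((int)length)); // buffered V4 path
    }
}
```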

@danielmewes (Member) commented

> Could this be enabled in version 5 per query by sending a magic value for the length? Or could a new set of flags in the handshake negotiate the format on a per-connection basis?

That's a good option, I think.

I'm not really sure if we even want to support the new length-less format for responses from the server. It might be hard to support efficiently in some drivers, but on the other hand it would make the protocol more symmetric.
On the server, it could save some memory, and could potentially lead to a small reduction in latency as we can start sending the query result over the socket while we're still serializing the JSON response. It's also worth noting that we currently parallelize JSON encoding on the server for large responses, and I'm not sure how that part would interact with streaming the response directly onto a (buffered) socket.

If we implement this, the new encoding could be turned on or off independently for the sending and receiving directions through the new handshake you propose.
We would like to enhance the protocol handshake anyway, in order to support future protocol extensions more easily. We might end up doing that as part of #4519.
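
Hypothetically, the negotiated state could boil down to one flag per direction (names invented here; nothing has been specified yet):

```csharp
// Invented names, nothing specified yet: each direction of the connection
// would opt into the length-less encoding independently during the handshake.
class ProtocolFeatureFlags
{
    public bool LengthlessQueries { get; set; }   // client -> server framing
    public bool LengthlessResponses { get; set; } // server -> client framing
}
```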

> Also, my understanding of the JSON standard is that there may be whitespace before or after the query representation. That would cause problems if a poorly written encoder is used and the whitespace gets treated as the start of the next token.

That's a good point. I don't think this will usually happen in practice, and we could explicitly forbid such whitespace in the protocol specification anyway.
