
Protocol Version 5: JSON Client Side Performance #5150

Open · bchavez opened this issue Nov 25, 2015 · 4 comments

@bchavez (Contributor) commented Nov 25, 2015

So, I was thinking a lot about the performance of my new C# driver https://github.com/bchavez/RethinkDb.Driver (and this probably applies to the Java driver too).

The V4 protocol requires clients to buffer the JSON query as a string in memory just to determine its size before sending: [TOKEN][LENGTH][QUERY].

This can be bad under heavy load, since those strings are allocated on the heap and can cause GC memory pressure to build up fast. A better approach for C# (and possibly Java) is to write the query directly to the network stream, i.e. [TOKEN]{QUERY}|postamble|. This would eliminate a lot of memory-pressure issues on the client side.
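
Here's roughly what I mean, as a C# sketch (`QueryWriter`, `WriteV4`, `WriteStreaming`, and the `serialize` callback are made-up names for illustration, not code from my driver):

```csharp
using System;
using System.IO;
using System.Text;

static class QueryWriter
{
    // V4 framing: the whole JSON query is materialized up front, because
    // its byte length must be written before the payload.
    // (Assumes a little-endian host, matching the protocol's byte order.)
    public static void WriteV4(Stream socket, long token, string json)
    {
        byte[] payload = Encoding.UTF8.GetBytes(json); // full query on the heap
        socket.Write(BitConverter.GetBytes(token), 0, 8);          // 8-byte token
        socket.Write(BitConverter.GetBytes(payload.Length), 0, 4); // 4-byte length
        socket.Write(payload, 0, payload.Length);
    }

    // Proposed length-less framing: the serializer writes straight onto the
    // network stream, so no intermediate string or byte[] copy of the query
    // is ever allocated.
    public static void WriteStreaming(Stream socket, long token, Action<TextWriter> serialize)
    {
        socket.Write(BitConverter.GetBytes(token), 0, 8);
        using (var writer = new StreamWriter(socket, new UTF8Encoding(false), 4096, leaveOpen: true))
        {
            serialize(writer); // e.g. a JSON serializer walking the query term tree
        }
    }
}
```

The streaming variant trades the big per-query allocation for a small, reusable write buffer.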

Security-wise, I think it's still the same: in V4, if someone wanted to fill the RethinkDB connection buffer to its maximum size, they could just specify that LENGTH up front anyway.

@mlucy added this to the subsequent milestone Nov 25, 2015
@mlucy (Member) commented Nov 25, 2015

Thanks for the suggestion! Tagging as a ReQL_proposal and putting into subsequent since it's an API change.

@danielmewes (Member) commented

That's very reasonable, since the JSON-encoded query has a well-defined, self-delimiting format anyway. We wouldn't even need the postamble.

We also already have code in the server to perform efficient buffered network reads for cases where we don't know how much to read in advance. I think we could re-use this to read the data from the socket while parsing it into a JSON tree.
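
The server side is C++, but the pattern is easy to sketch. Here's the same idea in C# using Json.NET's incremental reader (illustration only, not actual server or driver code):

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

static class StreamingQueryReader
{
    // Yields one JSON tree per top-level value on the stream. JsonTextReader
    // pulls more bytes from the socket only as the parser needs them, so no
    // length prefix is required and no full-query buffer is kept around.
    public static IEnumerable<JToken> ReadQueries(Stream socket)
    {
        var reader = new JsonTextReader(new StreamReader(socket))
        {
            SupportMultipleContent = true // several top-level JSON values in a row
        };
        while (reader.Read())
            yield return JToken.ReadFrom(reader); // consumes exactly one value
    }
}
```

The parser only ever holds its own read buffer plus the partially built tree, which is what makes the length prefix unnecessary.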

@asakatida (Contributor) commented

Could this be enabled in version 5 per query by sending a magic value for the length? Or could a new set of flags in the handshake negotiate the format on a per-connection basis?

I was thinking about this from the receiving side, in case it also gets implemented for query responses. The current format has the nice property that I can decode the buffer asynchronously and get the next token without waiting, since the length prefix tells me exactly where each response ends.

Also, my understanding of the JSON standard is that there may be whitespace before or after the query representation. That would cause problems if a poorly written encoder is used and the whitespace gets treated as the start of the next token.
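
For the magic-value option, here's roughly what I had in mind (C# sketch; `0xFFFFFFFF` is an arbitrary stand-in, since no actual magic value has been picked):

```csharp
using System.IO;

static class FrameReader
{
    const uint StreamingSentinel = 0xFFFFFFFF; // arbitrary stand-in value

    // Returns the raw query bytes for a classic V4 frame, or null when the
    // sentinel says the JSON should be parsed incrementally off the stream.
    public static byte[] ReadFrame(BinaryReader reader, out long token)
    {
        token = reader.ReadInt64();        // 8-byte query token
        uint length = reader.ReadUInt32(); // 4-byte length field
        if (length == StreamingSentinel)
            return null;                   // length-less: hand off to a streaming parser
        return reader.ReadBytes(checked((int)length)); // buffered V4 path
    }
}
```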

@danielmewes (Member) commented

> Could this be enabled in version 5 per query by sending a magic value for the length? Or could a new set of flags in the handshake negotiate the format on a per-connection basis?

That's a good option, I think.

I'm not really sure if we even want to support the new length-less format for responses from the server. It might be hard to support efficiently in some drivers, but on the other hand it would make the protocol more symmetric.
On the server, it could save some memory, and could potentially lead to a small reduction in latency as we can start sending the query result over the socket while we're still serializing the JSON response. It's also worth noting that we currently parallelize JSON encoding on the server for large responses, and I'm not sure how that part would interact with streaming the response directly onto a (buffered) socket.

If we implement this, the new encoding could be turned on or off independently for the sending and receiving directions through the new handshake you propose.
We would like to enhance the protocol handshake anyway, in order to support future protocol extensions more easily. We might end up doing that as part of #4519.
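
Hypothetically, the negotiated state could boil down to one flag per direction (names invented here; nothing has been specified yet):

```csharp
// Invented names, nothing specified yet: each direction of the connection
// would opt into the length-less encoding independently during the handshake.
class ProtocolFeatureFlags
{
    public bool LengthlessQueries { get; set; }   // client -> server framing
    public bool LengthlessResponses { get; set; } // server -> client framing
}
```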

> Also, my understanding of the JSON standard is that there may be whitespace before or after the query representation. That would cause problems if a poorly written encoder is used and the whitespace gets treated as the start of the next token.

That's a good point. I don't think this will usually happen in practice, and we could explicitly forbid such whitespace in the protocol specification anyway.
