JSON-RPC over IPC should envelope messages. #4647
It should be noted that this problem makes it much easier for attackers to write injection attacks, since writing a good parser is so difficult. The current
No, only specific whitespace characters are allowed between tokens (space, horizontal tab, line feed and carriage return), while in strings it has to be escaped. I like the idea of using ... A few details to take care of in the parser:
Cross posting from the Go issue:
Continuing the conversation here rather than at Go-Ethereum since that issue was closed. Some of the quotes here are from that, but I want to consolidate the conversation. cc @bas-vk so all participants are here
All current documentation I have found says that the communication with nodes is standard JSON-RPC which is well defined and specced out to be UTF-8. If this is not true, it should be well documented everywhere that communication with an Ethereum node is a subset of JSON-RPC and it should be documented what exactly the constraints are. It is a fatal flaw to assume that those that come after you will understand the constraints that you were operating under and someone may decide to not hex-encode a payload which is reasonable and sound (hex encoding is terrible for bandwidth). Also, already there are at least a couple messages that don't comply with this assertion including
As I brought up with them, Mist's dechunker has bugs, the same as the web3.js client. Their custom dechunker may work against geth, but it doesn't work against all standards compliant JSON serializers. They can fix these bugs one at a time as they are found/noticed and new clients come out, but without a standard/spec to follow authors of clients are going to end up coding to "something that works with Geth/Parity" rather than, "something that will work with future Ethereum nodes". Also, it is my understanding that they already ran into such a bug where they needed to handle newlines between messages but they weren't. This is the problem when you are writing your own tokenizer/parser, it is sooo easy to get it wrong.
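To illustrate why this kind of boundary-guessing is fragile, here is a simplified sketch in the same spirit as those dechunkers (not the actual web3.js or Mist code): splitting on "}{" silently breaks as soon as that byte sequence shows up inside a string value.

```js
// Simplified illustration of a boundary-guessing dechunker (hypothetical code).
function naiveDechunk(data) {
  return data
    .replace(/\}\{/g, '}|--|{') // guess: two messages always meet as ...}{...
    .split('|--|');
}

var stream =
  '{"jsonrpc":"2.0","id":1,"result":"a}{b"}' + // "}{" inside a string value
  '{"jsonrpc":"2.0","id":2,"result":"ok"}';

naiveDechunk(stream).forEach(function (part) {
  try { console.log('parsed id', JSON.parse(part).id); }
  catch (e) { console.log('parse failed on fragment:', part); }
});
// The first message gets split in the middle of its string value and fails to parse.
```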
Sure, once the bytes are on the wire there are guarantees by the transport layer that the bytes will arrive on the other side in order. However, a good API should be able to handle clients that have bugs and aren't written perfectly. It should be able to return useful error messages to the client when it does something wrong and it should be able to recover from errors cleanly.
This is my biggest contention with the arguments against this proposal. Just because something has a line item reference on a Wikipedia page doesn't mean it is a standard that one should follow. In my research, go-lang is the only thing that supports this "streaming JSON standard" you are talking about. I couldn't find anything for Node and it appears Parity (Rust) had to roll their own as well. I have worked with a lot of JSON over the years in a number of languages and I have never seen "streaming JSON" as a thing that sane people do before. I would be very surprised if something this hard to code against made it through a good RFC process. It is possible I missed something, but these results suggest that everyone except Go developers has to write a custom parser, and because of all of the problems I mentioned above, writing such a thing is non-trivial. Parity has effectively had to write its own JSON tokenizer which will get its channel into a bad state if a malformed message shows up on the wire, and web3.js and Mist both have bugs in their implementations because they are doing a simple regex to try to guess message boundaries. Here are some first-page Google results when searching for streaming JSON:

My hope is that the connections to ethereum nodes can be based on well defined standards, rather than "do exactly what geth does, and assume they won't change, and go read go-lang source code to figure out exactly what the constraints are, and also understand some other high-level constraints that aren't documented anywhere."
For reference:
Parity de-concatenator: https://github.com/ethcore/json-ipc-server/blob/master/src/validator.rs#L21-L64
web3.js de-concatenator: https://github.com/ethereum/web3.js/blob/develop/lib/web3/ipcprovider.js#L84-L89
Mist de-concatenator: https://github.com/ethereum/mist/blob/master/modules/ipc/dechunker.js#L24-L32
ethrpc de-concatenator (doesn't actually have one; trying to build one is what led to this issue): https://github.com/AugurProject/ethrpc/blob/master/index.js#L367-L384
The spec was defined by the JavaScript web3, and agreed by all three original implementations. It was not led by Geth. So the fact that Geth can do it properly but you, God forbid, need to write a parser shouldn't be laid at our feet.

The hex encoding protocol was documented and standardized on our wiki page before any implementation existed. Don't say someone coming later doesn't have the info, because everything's there. The hex was again something that JavaScript needed and not enforced by Geth. We just play by the standard defined.

There are a number of streaming JavaScript parsers; look them up on npm or Google. The fact that someone decided to roll their own as opposed to using an existing library is ludicrous.
And saying that you couldn't find a streaming JSON npm module is a flat out lie: https://www.npmjs.com/search?q=streaming+json
Going over the results of the NPM query you linked one at a time for the first full page: https://github.com/rust-lang-nursery/rustc-serialize appears to just be a JSON deserializer; it doesn't appear to do anything with regard to newline delimited or concatenated JSON streams. Perhaps I am overlooking something in the documentation?
FWIW, I encountered these issues when I tried to write ethrpc's IPC implementation originally (about a year ago now) -- I was also unable to find any good JavaScript library solutions. I agree that IPC messages should be enveloped. Furthermore, I think that the appropriate time to make a change like this is now, while there are only a couple widely-used clients (geth and parity).
@karalabe Which module, specifically, deals with concatenated streaming JSON? I browsed those links and did not find one. Also, can we please avoid accusations of lies etc.? Presumably, the only interest anyone here has is in future-proofing Ethereum's IPC / making it work as well as possible.
@karalabe I'm assuming you are referring to this documentation? https://github.com/ethereum/wiki/wiki/JSON-RPC#hex-value-encoding

Unfortunately it starts with "at present", implying that in the future other serialized data types may be sent. This is the kind of problem I'm warning against, in that future developers may decide that hex value encoding is sub-optimal (it largely is for string data) and decide to add a third datatype that is a UTF-8 encoded string. This would be a very reasonable course of action without knowledge of the injection attack vector. IMO, robust code should make it hard for future developers to accidentally create injection attacks, not rely on future developers being knowledgeable and aware of all attack vectors. If you are intentionally adding code or a segment of a spec to protect against attack, then you should be explicit about why the code/spec exists. At the least, the documentation should state that values are all hex encoded to protect against injection attacks, similar to how URLs are URL encoded.

Also, if hex encoding is required of all values passed, then it should be done universally and special cases like net_version, web3_clientVersion and shh_version should be deprecated. This allows the security code (encoding/decoding) to live in one place in the code base near the wire rather than having to be re-implemented for every method producer/consumer. This reduces the chance that future developers will break the pattern or accidentally do it wrong. By having exceptions to the rule already in the codebase it further increases the likelihood that future developers will not follow it. They will see net_version and think, "apparently I can safely ignore the encoding section, it must be out of date".

Finally, right at the top of that page it clearly says:

It uses JSON (RFC 4627) as a data format

A security conscious developer looking to write robust code that interops with Geth/Parity will read this and attempt to implement an RFC 4627 compliant parser. This will result in failure for all of the reasons above. If Ethereum wants to keep the constraints you have laid out, please at least document them in the Wiki at the top and make it clear that it isn't an RFC 4627 compliant data format but rather a subset format. If you like I can try to make these changes to the documentation, should this proposal fail for Geth and Parity (the two primary clients).

By the way, is the whole "concatenated messages when sent over IPC" documented anywhere? If not, I think it should be; if so, can someone please point me at these docs? I figured out how to talk over IPC via trial and error and looking at source code because I couldn't find any documentation.

In summary: coding a client to-spec (rather than to-Geth or to-Parity) at the moment is incredibly hard. I recommend either fixing the spec to reflect reality, or fixing reality to match the spec.
Please read https://en.m.wikipedia.org/wiki/Recursive_descent_parser if you are unfamiliar with how parsers work or how one can be implemented to correctly detect the end of a JSON message from an infinite data stream. UTF-8 data streams are similarly not a problem; you just have to work with runes and not bytes.
@karalabe AFAIK everyone who has attempted to implement a client for geth's IPC agrees that some form of enveloping is needed. (@frozeman / Parity crew, please chime in if this isn't accurate.) In other words: you are building a product for developers, all of whom are telling you that a feature in your product is broken. Your response cannot seriously be to say, "it's working, you all just don't know how parsers work". @obscuren: does @karalabe represent the geth development team's view on this? If so, I guess this is a lost cause; we'll drop support for IPC and just use websockets.
Sorry if I misrepresented my concern here; the problem isn't that it is not possible to write a JSON parser that works for valid input sequences. My issues are:
The UTF-8 problem is similarly solvable, but it greatly complicates the matter since you can't just take the byte stream and treat it like a string; your bytes-to-UTF-8 conversion needs to be done inside the streaming processor. Again, solvable problems, but hard to get right (as seen by all current implementations except Geth, which is doing whatever Go does under the hood).
To get a better understanding of the problem I tried to create a JavaScript client and came up with this snippet. I'm not a JS developer, but I have a hard time understanding what the problem actually is. So far all languages that I have worked with have had some kind of stream encoder/decoder support, either through the standard library or through some external library. I can see the benefit of delimiters, message size or something similar in case you don't have such tools available. But going in that direction means that all platforms that have support for JSON stream decoders won't be able to use the ipc socket directly but will need to do something special. I'm pretty sure that will make other developers unhappy.
I also build DApps that use the ipc socket and I have never thought of this feature as broken.
I do agree with @MicahZoltu here. The current IPC streaming is error prone and not clearly defined. That's also the reason why I had to come up with this quirk: https://github.com/ethereum/web3.js/blob/develop/lib/web3/ipcprovider.js#L84-L89

Together with @bas-vk I implemented the IPC, and I do remember bringing up the possibility to properly send END characters, though I came up with that workaround then and we never looked at it again.

@bas-vk The issue is that when you receive a JSON object which is larger than the buffer of the socket, the JSON comes in chunks. Then receiving and parsing becomes problematic. That's the reason why I had to come up with that dechunker. You probably never encountered that with your snippet as you don't receive large JSON results. That's easy to test by polling for 100 created filters.

So I do agree that this can become an issue and we should solve it. My proposal is that we end JSON objects with a specific delimiter. This would give us backwards compatibility in Mist, is a small change, and prevents future hacks from breaking the stream.
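As a minimal illustration of the chunking problem described above (a hypothetical client reading the raw socket, not any existing implementation), parsing each 'data' event on its own starts throwing as soon as a response no longer fits into a single chunk:

```js
var net = require('net');
var client = net.createConnection('socket.ipc');

client.on('data', function (chunk) {
  // A single logical JSON-RPC response can arrive as several 'data' events.
  try {
    console.log('got response for id', JSON.parse(chunk.toString('utf8')).id);
  } catch (e) {
    console.log('chunk is not a complete JSON message (' + chunk.length + ' bytes)');
  }
});
```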
Are you trying to solve your problem here, or creating a loud enough echo chamber to silence legitimate solution attempts?
Power games and threats won't help your case; on the contrary, it will just piss people off from even trying to help you and explain what is wrong with your approach.

This Go server side program gradually sends larger and larger JSON messages in a single stream, starting from about 32KB, going up by a bit more than 32KB in each iteration. It stops when it reaches one JSON being larger than 32MB. That is clearly above any possible protocol chunking that may occur.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

func main() {
	listener, err := net.Listen("unix", "socket.ipc")
	if err != nil {
		return
	}
	defer listener.Close()

	for {
		conn, err := listener.Accept()
		if err != nil {
			fmt.Println("Listen failed", err)
			return
		}
		defer conn.Close()

		out := json.NewEncoder(conn)

		// Send out JSON messages gradually growing from a few bytes to over 32 megabytes
		msg := struct{ Items []string }{}
		for batch := 1; batch < 1000; batch++ {
			for i := 0; i < 1000; i++ {
				msg.Items = append(msg.Items, "0123456789abcdef0123456789abcdef")
			}
			if err := out.Encode(msg); err != nil {
				fmt.Println("Encode failed", err)
				break
			}
		}
	}
}
```

This program, adapted from @bas-vk's code, simply reads input JSONs one after the other, emitting the number of items it sees.

```js
var oboe = require('oboe');
var net = require('net');

var client = net.createConnection("socket.ipc");
client.on("connect", function() {
	oboe(client).done(function(msg) {
		console.log(msg.Items.length);
	})
});
```
@frozeman @MicahZoltu @tinybike Pray tell me which part of the above JavaScript is so horribly convoluted that you decided to roll your own parser, and then so loudly argue that it's too complicated? The entire parsing is literally one single line of code.
OK, I had a call with all parties involved here in the Go team and came to the conclusion that the issue is a non-issue. Due to the fact that they JSON-marshal everything properly, injections which split the JSON aren't possible. See: https://play.golang.org/p/DHsCSo9ZgN This hopefully is also the case for Parity. In addition, @bas-vk found this great JSON parser library, which parses character by character and completes objects this way, which is way better than my dechunker. I wholeheartedly recommend using one of these libraries: https://oboejs.com or the former SAX parser https://github.com/dscape/clarinet
Just to clarify here: there can be no newlines in a JSON payload. See the specification: they are encoded. If anyone still has issues with JSON parsing, I'll gladly help out with any questions. Hit me up over Gitter! If there are any non-compliant JSON encoders (as in, if Geth or Parity or CPP-ethereum or whatever spits out non-correct JSON), please let me know; I'd gladly pursue that, since it's a de-facto flaw and possibly a vulnerability.
@holiman To clarify the clarification: they have to be escaped in strings, but if you follow the link to the ECMA-404 PDF you will find the exact definition of whitespace: space, horizontal tab, line feed and carriage return are all allowed between tokens.

So yes, newlines can in fact occur in a valid JSON payload, though that only happens if you do a pretty-print of some kind.
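A quick illustration of that point: pretty-printed output is perfectly valid JSON and contains literal newlines between its tokens, while the compact form contains none.

```js
var msg = { jsonrpc: '2.0', id: 1, result: '0x0' };

// Pretty-printed: valid JSON with literal newlines between its tokens.
console.log(JSON.stringify(msg, null, 2).includes('\n')); // true

// Compact form: no raw newlines anywhere (strings may only carry them as \n escapes).
console.log(JSON.stringify(msg).includes('\n')); // false
```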
Yes, I was not clear enough. With JSON payload, I meant within a String element in a JSON payload.
This is a good starting point. Thank you for the code example. I'm not sure giant arrays containing the same 16 bytes over and over again make for a particularly representative or challenging test case though. Let's try the same simple test again, just with more data; if it works, then we can try adding some actual variation to the data as well. Here's a real RPC request to geth, which we make thousands of times every day as part of Augur's initial markets loading:
Here's the modified Go program in a Gist, with the output of the above RPC request pasted in. Here's the output from the JavaScript code snippet you posted:
Output from the Go program:
Stopping here because the program crashed. The next step would be to add some real variation to the data (both its contents and its length). (I should add, I'm not a Go developer, so please let me know if I'm messing something up / if this is an unfair example.)
That isn't a fair example. You append fuckedString (length: 427584) in each iteration of the loop, serialize it to JSON and then send it. That data stream is enormous and I'm pretty sure you hit an OS limit (there is an OS-dependent limit on what unix sockets can buffer before writing more to them fails), and adding a delimiter, length indication or whatever doesn't make any difference.
I'm not sure it's a socket limitation; I'd expect it to block. But having a 500MB single JSON may nonetheless not be the fairest test case for RPC responses. It would be nice to explore whether there are any buffer limitations in place inside the JS library, but I'd bet on hitting that. Also I'd check system memory to see if you are hitting some nodejs limits, denying further memory to be allocated in your test case.
I feel like this conversation is getting off track. My goal with this issue wasn't to get help writing a concatenated JSON parser in a particular language; that is something I am capable of. The purpose of this issue was to try to improve the client developer experience by encouraging Geth/Parity to offer an easier to implement API. A quick side-note about oboe: it doesn't handle UTF surrogates split on buffer boundaries; this is one of the many subtle bugs that make implementing concatenated JSON parsers difficult. However, I am not looking to use this issue to troubleshoot this particular problem, nor any other problem I have brought up.

I believe the point of disconnect between the two "camps" here is "how common is concatenated JSON". Based on some of the comments I have seen throughout this thread, there seems to be a misconception that "Streaming JSON" is synonymous with "Concatenated JSON". I don't believe this to be the case, and Google and NPM searches seem to agree with me on this. Streaming JSON, in most cases, refers to single well defined JSON objects that are received and processed as a stream of bytes/characters rather than as a single collection of bytes/characters, with the JSON object processed piecemeal as it comes in rather than all at once. This type of "stream" processing is very important when you are trying to receive/parse a 4GB streamed JSON array and you want to parse, handle, and discard elements as they come in, or if you are trying to write a message stream processor that receives large JSON objects and you want to avoid old-gen memory pressure in a garbage collected language.

The thing that many of you appear to be talking about is Concatenated JSON. To the best of my knowledge this is fairly uncommon, because the simpler (and more performant) solution to dealing with this sort of thing is enveloping, or at least a sentinel sequence. I actually did go looking for a JavaScript concatenated JSON parser before opening up this issue. I didn't try oboe because the documentation describes it as a stream JSON processor (common use of the word) and nowhere does it claim to support concatenated JSON. There is even an open issue requesting support for concatenated JSON: jimhigson/oboe.js#44 (comment) (the oboe author calls it Multi-JSON). The fact that it works at all for concatenated JSON (ignoring the bug I found) is surprising.

Are there other arguments against enveloping besides the belief that Concatenated JSON is common/easy? If so, I would like to hear them. If not, then we should focus any further discussion on that.

Regarding the ease of concatenated JSON parsing, my arguments are generally around a spec/RFC compliant concatenated JSON parser, which means it must support UTF-8, whitespace (outside of tokens), etc. If Geth/Parity are asserting that this is not necessary, then I request that the documentation be updated to not only refer to the JSON-RPC spec but also include the additional constraints that the Ethereum JSON-RPC spec holds (e.g., ASCII-7 only, no whitespace, etc.). All of this started because I tried to write a spec compliant parser, ran into trouble after trouble, and couldn't find a library that would do it for me (correctly).

TL;DR: My argument is that a Concatenated JSON parser is hard to implement correctly, hard to find libraries for, and not commonly used, and should be traded in for an easier to implement protocol. The primary counter argument is that it is common and finding libraries is easy.
I believe this is because of a misconception of the definition of "Streaming JSON".

oboe bug (for reference only; please don't try to troubleshoot this particular problem in this thread, I'm only including it for reference):

output:
The following RPC methods return non-hex-encoded strings, which (if they are not strictly ASCII-7) are in principle vulnerable to the splitting issue @MicahZoltu raised:
One workaround that doesn't involve changing IPC itself is to convert to hexadecimal character codes prior to transmission, then re-encode as UTF-8 upon receipt. Augur supports full UTF-8 for things like market descriptions this way. Anyway, all this said, AFAIK this issue is not directly relevant to Augur, so I'm not going to keep pushing. My goal here is just to make sure the geth and parity teams are aware of this issue; it's up to you guys to decide what (if anything) to do about it.
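A minimal sketch of that workaround (the helper names are illustrative, not Augur's actual functions): hex-encode UTF-8 strings before they enter an RPC payload and decode them again on receipt, so only the ASCII subset ever touches the wire.

```js
// Hypothetical helpers demonstrating the hex round-trip described above.
function utf8ToHex(str) {
  return '0x' + Buffer.from(str, 'utf8').toString('hex');
}

function hexToUtf8(hex) {
  return Buffer.from(hex.replace(/^0x/, ''), 'hex').toString('utf8');
}

var description = 'Will it snow in Reykjavík on 2017-12-31? ☃';
var encoded = utf8ToHex(description);            // plain 0x... ASCII on the wire
console.log(hexToUtf8(encoded) === description); // true
```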
Regarding the UTF-8 concerns: all non-ASCII UTF-8 codepoints begin with a header byte that has at least the first two bits set, followed by 1 to 3 continuation bytes whose high bits are exactly 10. When parsing JSON, if all you are concerned with are things like depth-changing tokens (braces, brackets and string quotes, which are all plain ASCII), you can therefore scan the raw byte stream without ever decoding the multi-byte sequences.
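A sketch of what that observation buys you (illustrative only, not any client's actual code): because UTF-8 continuation bytes always have their high bits set to 10, the ASCII structural bytes can be matched directly against the raw byte stream, so a message-boundary scanner never has to decode characters at all.

```js
// Hypothetical byte-level splitter: tracks string state and brace/bracket depth
// on raw bytes and emits each complete top-level JSON value it encounters.
function makeSplitter(onMessage) {
  var depth = 0, inString = false, escaped = false, scanned = 0, pending = Buffer.alloc(0);
  return function onData(chunk) {
    pending = Buffer.concat([pending, chunk]);
    for (var i = scanned; i < pending.length; i++) {
      var b = pending[i];
      if (inString) {
        if (escaped) escaped = false;
        else if (b === 0x5c) escaped = true;        // backslash starts an escape
        else if (b === 0x22) inString = false;      // unescaped quote ends the string
      } else if (b === 0x22) inString = true;       // quote starts a string
      else if (b === 0x7b || b === 0x5b) depth++;   // '{' or '['
      else if ((b === 0x7d || b === 0x5d) && --depth === 0) { // '}' or ']'
        onMessage(pending.slice(0, i + 1).toString('utf8'));  // one complete message
        pending = pending.slice(i + 1);
        i = -1;                                     // keep scanning the remaining bytes
      }
    }
    scanned = pending.length;                       // the next chunk starts scanning here
  };
}

// Usage sketch against an IPC socket:
var net = require('net');
var client = net.createConnection('socket.ipc');
client.on('data', makeSplitter(function (json) {
  console.log('message for id', JSON.parse(json).id);
}));
```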
After reading all the comments we are happy to add
Crossposted ethereum/go-ethereum#3702, since this is a systemic ecosystem issue.
When receiving JSON-RPC requests over WS or HTTP, the underlying protocol deals with ensuring that messages are well separated and that the connection can gracefully recover from bad bytes put onto the wire. While domain sockets/named pipes will deal with ordering and guaranteeing arrival, they do not do anything to separate messages. This leads to a cascade of problems. The first is that the developer wanting to utilize the IPC needs to write code that can identify the end of one message and the beginning of another. Unfortunately, the only way to correctly deal with this is to write or use a streaming JSON parser that can stop when it reaches the end of an object. However, most off-the-shelf parsers will error, indicating malformed JSON, if you try to parse a byte stream that has multiple separate JSON payloads concatenated together (such as the JSON-RPC over IPC protocol).
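For instance, the stock JSON.parse in JavaScript (and its equivalent in most standard libraries) rejects a concatenated payload outright instead of stopping at the end of the first object:

```js
// Two valid JSON-RPC messages back to back on the wire:
var wire = '{"jsonrpc":"2.0","id":1,"result":"0x1"}{"jsonrpc":"2.0","id":2,"result":"0x2"}';

JSON.parse(wire); // throws a SyntaxError rather than returning the first message
```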
Even if the user writes a fully JSON compliant streaming parser, it also needs to deal with the fact that JSON is UTF-8, which means a multi-byte character (or surrogate pair) may cross buffer boundaries. This means that not only does the user have to deal with the fact that messages themselves may cross buffer boundaries, but also with the fact that individual characters may cross buffer boundaries. This is a solvable problem, but it introduces a lot of complexity for the developer to deal with.
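A small sketch of that failure mode (illustrative, not taken from any client): if each chunk is decoded to a string on its own, a multi-byte character split across two reads turns into replacement characters; a stateful decoder has to carry the partial character across the boundary.

```js
var { StringDecoder } = require('string_decoder');

var bytes = Buffer.from('{"name":"€"}', 'utf8'); // '€' is three bytes: e2 82 ac
var chunk1 = bytes.slice(0, 10);                 // ends in the middle of '€'
var chunk2 = bytes.slice(10);

// Naive: decoding each chunk separately mangles the split character.
console.log(chunk1.toString('utf8') + chunk2.toString('utf8')); // {"name":"���"}

// Correct: a stateful decoder buffers the partial character between chunks.
var decoder = new StringDecoder('utf8');
console.log(decoder.write(chunk1) + decoder.write(chunk2));     // {"name":"€"}
```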
Most critical, I think, is the fact that I don't believe there is a recoverable way to write a parser for end-on-end (concatenated) JSON without some kind of enveloping. Imagine someone sent this over HTTP, WS and IPC:
```
{ jsonrpc: "2.0", id: 5, method: "truncat
{ jsonrpc: "2.0", id: 6, method: "complete", params: "}{" }
```
With HTTP and WS, the channel would remain open and the receiver would be able to respond with an appropriate error message indicating that one of the messages was malformed. For IPC, however, there is no correct way to deal with this. As far as the receiver is concerned, it is in the middle of parsing a JSON string until it receives the next quote, at which point it will see some invalid JSON following it. Unfortunately, the parser doesn't know where to read up to in order to "skip to the next message" and start over. It could try looking for something like }{, but as you can see in this example the }{ would be in the middle of another message (inside a string); the parser doesn't know that, so it would fail again. Hypothetically you could keep repeating this process (look for some sort of boundary character sequence) until you find a valid message, but that is incredibly complex and I'm not entirely certain there isn't an edge case that would still break it.
Proposal
I propose that a version 2 (or are we up to 3?) of the JSON-RPC over IPC be built that envelopes messages with a sentinel header and a length prefix. The length prefix would allow parsers to simply wait until they have the whole message before trying to process any JSON and if the JSON or UTF-8 inside the envelope is invalid it can throw away the whole message without any trouble. The sentinel would be a special byte sequence that would need to be escaped if found anywhere in the payload (could be as little as one byte, but that would require more frequent escaping) and would allow the channel to recover cleanly from malformed input by simply reading until the next sentinel.
Re: sentinel, is a 0 byte valid anywhere in a JSON string? If not, then the null byte could be used as the sentinel without need for escaping.
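To make the proposal concrete, here is one possible shape of such an envelope (an illustrative sketch only, not a spec; it takes the answer to the question above to be "no, a raw 0x00 byte cannot appear anywhere in valid JSON text", so a NUL sentinel needs no escaping):

```js
var SENTINEL = 0x00; // never appears in valid JSON text, so no payload escaping needed

// Frame a message: sentinel byte, 4-byte big-endian length, then the UTF-8 JSON payload.
function frame(jsonText) {
  var payload = Buffer.from(jsonText, 'utf8');
  var header = Buffer.alloc(5);
  header.writeUInt8(SENTINEL, 0);
  header.writeUInt32BE(payload.length, 1);
  return Buffer.concat([header, payload]);
}

// Stateful reader: waits for a whole frame before parsing, and resynchronizes on the
// next sentinel so one garbage frame never poisons the rest of the channel.
function makeReader(onMessage, onError) {
  var buf = Buffer.alloc(0);
  return function onData(chunk) {
    buf = Buffer.concat([buf, chunk]);
    while (buf.length >= 5) {
      if (buf[0] !== SENTINEL) {            // desynchronized: skip ahead to the next sentinel
        var next = buf.indexOf(SENTINEL);
        buf = next === -1 ? Buffer.alloc(0) : buf.slice(next);
        continue;
      }
      var length = buf.readUInt32BE(1);
      if (buf.length < 5 + length) return;  // wait for the rest of the frame
      var payload = buf.slice(5, 5 + length);
      buf = buf.slice(5 + length);
      try {
        onMessage(JSON.parse(payload.toString('utf8')));
      } catch (e) {
        onError(e);                         // bad frame discarded, channel still usable
      }
    }
  };
}
```

With a length prefix the reader never has to tokenize JSON just to find a message boundary, and with the sentinel a malformed or truncated frame costs exactly one message rather than the whole connection.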
Notes
There are several Ethereum JSON-RPC over IPC clients that I know of at this point (Geth, Parity, Web3, Ethrpc) and so far I haven't seen anyone fully solve this problem. I'm not entirely certain what Geth is doing since it appears to be relying on something from go-lang, but I would be mildly surprised if it correctly deals with all of the problems associated with un-enveloped IPC payloads.