Having the length of a "seq" or "map" be tied to the allocated capacity is dangerous. #115
Comments
This not only affects bincode, but cbor as well. |
I managed to fix this in my slow decoder because it has an intermediate representation. But CBOR does suffer from this bug in the direct decoder for the same reasons that bincode suffers from it. I can't think of another way to fix this either. A breaking change is unfortunate, but it looks like it might be limited to decoder implementors rather than users of decodable. |
It might be worth considering (potentially in addition to the changes proposed by TyOverby) having an additional limit here. Otherwise, making decoders OOM-free will involve either arbitrary caps on seq & map initial capacity or simply disabling pre-allocation entirely. Also, a complementary (but more involved) solution to this issue may be structures to specify a per-member limit on the number of elements (looking at many existing protocols, limits on the lengths of things that would otherwise be unbounded are fairly common). This last bit should be able to live somewhere other than the rustc-serialize crate, however. |
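The "arbitrary caps on seq & map initial capacity" idea above can be made concrete: clamp the wire-supplied length before pre-allocating, and let the collection grow normally for honest inputs. A minimal sketch; `MAX_PREALLOC_BYTES` and `bounded_capacity` are hypothetical names, not part of rustc-serialize:

```rust
use std::cmp;
use std::mem;

// Hypothetical cap on up-front allocation (not a rustc-serialize API).
const MAX_PREALLOC_BYTES: usize = 1 << 20; // 1 MiB

// Clamp a wire-supplied element count to a bounded pre-allocation size.
fn bounded_capacity<T>(claimed_len: usize) -> usize {
    let elem = cmp::max(mem::size_of::<T>(), 1);
    cmp::min(claimed_len, MAX_PREALLOC_BYTES / elem)
}

fn main() {
    // An attacker claims usize::MAX elements; we pre-allocate at most
    // 1 MiB worth and rely on Vec's normal growth for larger honest inputs.
    let v: Vec<u64> = Vec::with_capacity(bounded_capacity::<u64>(usize::MAX));
    println!("prealloc = {}", bounded_capacity::<u64>(usize::MAX));
    drop(v);
}
```

The cap only bounds the initial allocation; a genuinely large (but real) input still decodes correctly, just with a few extra reallocations along the way.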
Bump. This makes any decoder with decent performance unusable in network protocols. Could we incorporate @TyOverby's fix and bump the version number to 0.4.0? |
CC: @alexcrichton Old discussion here: #121 |
This may end up just being something that's just a case of where serde is recommended over rustc-serialize, unfortunately. |
Is rustc-serialize de facto deprecated? If so, this is going to be a terrible hit to the usability of bincode. Right now rustc-serialize is the only choice for bincode users on stable and beta releases of rust. Without it, those users would be forced to move to serde.
I've already had several people open up bugs on Bincode because they used serde and made mistakes in their implementations. In fact, when @erickt graciously added serde support to bincode, he didn't implement serialize/deserialize for structs in the bincode tests, meaning that we are only able to run our tests on nightly.

Serde is great, and I love that I can support both rustc-serialize and serde at the same time, but forcing users into serde is not fun at all. In fact, if people are only doing serialization / deserialization using bincode, then I would argue that serde is strictly worse (performance & ease of maintainability).

If the core team is going to stop supporting rustc-serialize and fully support serde, I'd understand, but it would be an incredibly disappointing decision, especially for those of us that already depend on it and would like bug fixes like this one. |
cc me |
@TyOverby We're in the process of making it more ergonomic to use serde on stable Rust -- basically support for custom derive on stable. I think this addresses your ease of maintainability concern? It may also make it possible to have an evolving, out-of-tree rustc_serialize setup, but ... we really don't want to further splinter the ecosystem. Could you elaborate on the performance worries with serde? Are they fundamental? In terms of the core team stance, the jury is still out; we're trying to make stable derive ergonomic in general, but haven't yet worked closely with @erickt on the guts of serde itself. Meanwhile, the existing rustc_serialize is effectively frozen -- breaking changes to the internal compiler version aren't really an option, as far as I can see. cc @erickt |
Hm, I missed that the breakage here is for decoders only. Perhaps there is some wiggle room -- we should think about it. @alexcrichton, maybe we can discuss at the next libs meeting? |
Yes that sounds prudent, I've added an agenda item. |
Hi @aturon, thanks for looking into this! My main concern about serde is the maintainability aspect. Right now I have roughly 200 lines of struct definitions in a game that I'm working on, and switching from rustc-serialize to serde would be a significant maintenance burden.

I don't think the performance issue is fundamental, but I remember seeing in erickt's blog post that bincode on rustc-serialize was faster than on serde. I haven't done any benchmarks recently. |
It would be really great if this could be fixed. Over at @maidsafe we're using rustc_serialize and bincode to serialise all messages between p2p network peers. Currently, anyone can crash another peer by sending them a malicious packet. The options for fixing this are:
None of these solutions are very appealing so I hope it's not considered too much effort to fix the root problem in rustc_serialize. I'd be happy to spend time submitting PRs to any crates.io crates that this breaks. |
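The "anyone can crash another peer" claim is easy to see: a length prefix is only a few bytes on the wire yet can claim an absurd element count. A sketch under assumed framing (the little-endian `u64` prefix and the function name are illustrative, not bincode's exact format):

```rust
// Illustrative only: parse a claimed element count from an 8-byte prefix.
// Endianness and naming are assumptions, not bincode's exact wire format.
fn claimed_len(prefix: &[u8; 8]) -> u64 {
    u64::from_le_bytes(*prefix)
}

fn main() {
    // The entire malicious "packet" is 8 bytes...
    let malicious = [0xff_u8; 8];
    // ...but it claims u64::MAX elements. A decoder that forwards this
    // straight into Vec::with_capacity will try to reserve it all at once.
    println!("claims {} elements", claimed_len(&malicious));
}
```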
@TyOverby / @canndrew: It really is unfortunate how painful it is to use Syntex on stable rust. I'm currently working on an RFC to enhance Rust's support for code generators that mitigates some of the issues we currently have that hopefully will cut down on most of the major pain points.

@TyOverby / @aturon: I haven't re-run performance tests in quite some time. Last time though, serde was slower at serialization (2143MB/s vs 3278MB/s), and slightly faster at deserialization (310MB/s vs 291MB/s). I didn't put too much effort into trying to optimize my bincode implementation, so there might be plenty of performance lying around.

@TyOverby / @canndrew: The problem with bumping rustc_serialize is that the rust macro is really tied to the specific function signature. To keep with Rust's backwards compatibility promise, we'd have to add |
@erickt Wouldn't bumping the major version fit with back-compat promises? |
@TyOverby |
@sfackler I don't think that |
Ok, the libs team (cc @rust-lang-nursery/libs) discussed this in triage yesterday, and the conclusion was that we're interested in fixing this but would like to avoid breaking changes here at all costs. Along those lines it seems like there are two ways to fix this without breaking the API. Namely:
@TyOverby do you have an idea which solution may be the best to move forward with? |
Thanks for meeting and taking this into consideration! I think that option 2 would be a very good compromise. |
Ok! Would you be up for sending a PR for that? |
or am I forgetting that it already exists... |
I'll modify #122 |
Just kidding, here it is #150 |
superseded by #150 |
er, fixed in that PR |
The discussion seems to imply that Serde doesn't have this problem... but it does serde-rs/serde#744. |
For many decoders (like JSON), calling `read_seq` with the length of items in a JSON array or map is fine because the JSON array or map is already in memory, so having a `Vec` or `HashMap` initialize `with_capacity` isn't that bad.

However, in a bug found by @jmesmon, the bincode decoder can be tricked into getting `Vec` to preallocate more memory than the system has, causing the program to crash. I put a lot of restrictions in Bincode to prevent DOS attacks, with the goal of being able to use Bincode on a public port for gamedev, but the automatic "capacity is the same as length" behavior would let any attacker take down any program using bincode to decode.

I've implemented a fix here. This would be a breaking change for people that implement their own `Decoder`s because the type signature differs, but elements that are `Decodable` won't need to change. Furthermore, the `#[derive(Decodable)]` plugin shouldn't need to change because it doesn't emit any `read_map` or `read_seq` calls.
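A minimal sketch of the hazard described above and the shape of a fix (illustrative functions, not rustc-serialize's actual signatures): the unsafe path ties allocation directly to the attacker-controlled length, while the safe path grows only as real input arrives.

```rust
// Illustrative only: not rustc-serialize's actual decoder signatures.

// Trusting the wire length ties allocation to attacker input:
// with_capacity(len_from_wire) can request more memory than exists.
fn decode_vec_unsafe(len_from_wire: usize) -> Vec<u8> {
    Vec::with_capacity(len_from_wire)
}

// Safer: start small and grow only as bytes actually arrive, so memory
// use is bounded by the real input size, not the claimed length.
fn decode_vec_safe(len_from_wire: usize, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for &b in payload.iter().take(len_from_wire) {
        out.push(b);
    }
    out
}

fn main() {
    // Honest small input: either path is fine.
    let _small = decode_vec_unsafe(3);
    // Attacker claims 1_000_000 elements but only sends 3 bytes:
    // the safe path allocates for 3, not 1_000_000.
    let v = decode_vec_safe(1_000_000, &[1, 2, 3]);
    println!("{:?}", v);
}
```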