Having the length of a "seq" or "map" be tied to the allocated capacity is dangerous. #115
Comments
TyOverby referenced this issue May 31, 2015 (merged): Add a test case that exhibits over-capacity out-of-memory error #4
This not only affects bincode, but cbor as well.
BurntSushi commented May 31, 2015
I managed to fix this in my slow decoder because it has an intermediate representation. But CBOR does suffer from this bug in the direct decoder, for the same reasons that bincode suffers from it. I can't think of another way to fix this either. A breaking change is unfortunate, but it looks like it might be limited to decoder implementors rather than users of `Decodable`.
It might be worth considering (potentially in addition to the changes proposed by TyOverby) having an additional […]. Otherwise, making decoders OOM-free will involve either arbitrary caps on seq & map initial capacity, or simply disabling pre-allocation entirely. Also, a complementary (but more involved) solution to this issue may be allowing structures to specify a per-member limit on the number of elements (looking at many existing protocols, limits on the lengths of things that would otherwise be unbounded are fairly common). This last bit should be able to live somewhere other than the rustc-serialize crate, however.
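The capped-preallocation option mentioned above can be sketched roughly like this (a hypothetical length-prefixed decoder; the function name, wire format, and `MAX_PREALLOC_BYTES` value are all invented for illustration and are not rustc-serialize's actual API):

```rust
use std::convert::TryInto;

// Illustrative cap on up-front allocation; the value is invented, not
// anything rustc-serialize or bincode actually uses.
const MAX_PREALLOC_BYTES: usize = 4096;

// Decode a little-endian length-prefixed sequence of u32s: trust the claimed
// length for iteration, but clamp the initial allocation so a hostile length
// prefix cannot force a huge allocation before any payload is read.
fn decode_u32_seq(mut input: &[u8]) -> Option<Vec<u32>> {
    let len = u32::from_le_bytes(input.get(..4)?.try_into().ok()?) as usize;
    input = &input[4..];
    let cap = len.min(MAX_PREALLOC_BYTES / std::mem::size_of::<u32>());
    let mut out = Vec::with_capacity(cap);
    for _ in 0..len {
        let word = u32::from_le_bytes(input.get(..4)?.try_into().ok()?);
        input = &input[4..];
        out.push(word); // grows past `cap` only as real payload arrives
    }
    Some(out)
}
```

With this shape, a four-byte packet claiming a four-billion-element sequence costs at most `MAX_PREALLOC_BYTES` of preallocation and then fails cleanly as soon as the payload runs out.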
TyOverby referenced this issue Jun 17, 2015 (closed): Add capacity bounds to `read_map` and `read_seq`. #121
TyOverby referenced this issue Mar 10, 2016 (merged): Respect size limits in read_seq and read_map. #63
canndrew commented Mar 11, 2016
Bump. This makes any decoder with decent performance unusable in network protocols. Could we incorporate @TyOverby's fix and bump the version number to 0.4.0?
CC @alexcrichton. Old discussion here: #121
This may just end up being a case where serde is recommended over rustc-serialize, unfortunately.
Is rustc-serialize de facto deprecated? If so, this is going to be a terrible hit to the usability of bincode. Right now rustc-serialize is the only choice for bincode users on stable and beta releases of Rust. Without […] to […]. I've already had several people open up bugs on bincode because they used serde and made mistakes in their implementations. In fact, when @erickt graciously added serde support to bincode, he didn't implement serialize/deserialize for structs in the bincode tests, meaning that we are only able to run our tests on nightly.

Serde is great, and I love that I can support both rustc-serialize and serde at the same time, but forcing users into serde is not fun at all. In fact, if people are only doing serialization/deserialization using bincode, then I would argue that serde is strictly worse (performance & ease of maintainability). If the core team is going to stop supporting rustc-serialize and fully support serde, I'd understand, but it would be an incredibly disappointing decision, especially for those of us that already depend on it and would like bug fixes like this one.
aturon commented Mar 12, 2016
cc me
aturon commented Mar 12, 2016
@TyOverby We're in the process of making it more ergonomic to use serde on stable Rust -- basically, support for custom derive on stable. I think this addresses your ease-of-maintainability concern? It may also make it possible to have an evolving, out-of-tree rustc_serialize setup, but ... we really don't want to further splinter the ecosystem.

Could you elaborate on the performance worries with serde? Are they fundamental?

In terms of the core team stance, the jury is still out; we're trying to make stable derive ergonomic in general, but haven't yet worked closely with @erickt on the guts of serde itself. Meanwhile, the existing rustc_serialize is effectively frozen -- breaking changes to the internal compiler version aren't really an option, as far as I can see.

cc @erickt
aturon commented Mar 12, 2016
Hm, I missed that the breakage here is for decoders only. Perhaps there is some wiggle room -- we should think about it. @alexcrichton, maybe we can discuss at the next libs meeting?
Yes, that sounds prudent; I've added an agenda item.
Hi @aturon, thanks for looking into this! My main concern about serde is the maintainability aspect. Right now I have roughly 200 lines of struct definitions in a game that I'm working on, and switching from […].

I don't think the performance issue is fundamental, but I remember seeing in erickt's blog post that bincode on rustc-serialize was faster than on serde. I haven't done any benchmarks recently.
canndrew commented Mar 15, 2016
It would be really great if this could be fixed. Over at @maidsafe we're using rustc_serialize and bincode to serialise all messages between p2p network peers. Currently, anyone can crash another peer by sending them a malicious packet. The options for fixing this are: […]

None of these solutions are very appealing, so I hope it's not considered too much effort to fix the root problem in rustc_serialize. I'd be happy to spend time submitting PRs to any crates.io crates that this breaks.
@TyOverby / @canndrew: It really is unfortunate how painful it is to use Syntex on stable Rust. I'm currently working on an RFC to enhance Rust's support for code generators that mitigates some of the issues we currently have, which hopefully will cut down on most of the major pain points.

@TyOverby / @aturon: I haven't re-run performance tests in quite some time. Last time, though, serde was slower at serialization (2143 MB/s vs 3278 MB/s), and slightly faster at deserialization (310 MB/s vs 291 MB/s). I didn't put too much effort into trying to optimize my bincode implementation, so there might be plenty of performance lying around.

@TyOverby / @canndrew: The problem with bumping rustc_serialize is that the Rust macro is really tied to the specific function signature. To keep with Rust's backwards-compatibility promise, we'd have to add […]
@erickt Wouldn't bumping the major version fit with back-compat promises?
sfackler commented Mar 16, 2016
@TyOverby […]
@sfackler I don't think that […]
Ok, the libs team (cc @rust-lang-nursery/libs) discussed this in triage yesterday, and the conclusion was that we're interested in fixing this, but would like to avoid breaking changes here at all costs. Along those lines, it seems like there are two ways to fix this without breaking the API, namely: […]

@TyOverby do you have an idea which solution may be the best to move forward with?
Thanks for meeting and taking this into consideration! I think that option 2 would be a very good compromise.
Ok! Would you be up for sending a PR for that?
or am I forgetting that it already exists...
I'll modify #122
Just kidding, here it is: #150
superseded by #150
alexcrichton closed this Mar 30, 2016
er, fixed in that PR
The discussion seems to imply that serde doesn't have this problem... but it does: serde-rs/serde#744.
TyOverby commented May 31, 2015
For many decoders (like JSON), calling `read_seq` with the length of items in a JSON array or map is fine, because the JSON array or map is already in memory, so having a Vec or HashMap initialize `with_capacity` isn't that bad.

However, in a bug found by @jmesmon, the bincode decoder can be tricked into getting `Vec` to preallocate more memory than the system has, causing the program to crash. I put a lot of restrictions into bincode to prevent DOS attacks, with the goal of being able to use bincode on a public port for gamedev, but the automatic "capacity is the same as length" behavior would let any attacker take down any program using bincode to decode.

I've implemented a fix here. This would be a breaking change for people that implement their own `Decoder`s, because the type signature differs, but elements that are `Decodable` won't need to change. Furthermore, the `#[derive(Decodable)]` plugin shouldn't need to change because it doesn't emit any `read_map` or `read_seq` calls.
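To make the hazard concrete, here is a hedged sketch (the function name and the little-endian length-prefixed wire format are invented for illustration, not bincode's actual code). The vulnerable pattern is `Vec::with_capacity(len)` where `len` comes straight off the wire; the sketch below instead grows the vector incrementally, so memory use is bounded by bytes actually present in the input:

```rust
use std::convert::TryInto;

// Hypothetical decoder for a length-prefixed byte vector. The vulnerable
// version of this function would do
//     let mut out = Vec::with_capacity(len); // len is attacker-controlled!
// allocating before any payload is verified. Growing incrementally instead
// bounds memory by the bytes actually consumed from the input.
fn decode_bytes(mut input: &[u8]) -> Option<Vec<u8>> {
    let len = u64::from_le_bytes(input.get(..8)?.try_into().ok()?) as usize;
    input = &input[8..];
    let mut out = Vec::new(); // no with_capacity(len): len is untrusted
    for _ in 0..len {
        out.push(*input.first()?);
        input = &input[1..];
    }
    Some(out)
}
```

Under this scheme, eight hostile bytes claiming a `u64::MAX`-element vector cost nothing up front; the decoder fails as soon as the payload runs out, at the price of losing the preallocation speedup for honest inputs (which is exactly the performance trade-off debated in this thread).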