Fixed OOM on malicious/malformed thrift #172
Merged
This PR is built on top of jorgecarleitao/parquet-format-safe#1 and eliminates OOMs and some panics when reading malformed/malicious thrift.

A big thanks to @evanrichter, who responsibly disclosed this to me privately, and to the contributors at https://users.rust-lang.org/t/how-to-avoid-oom-when-reading-untrusted-sources/79263?u=jorgecarleitao, who significantly clarified the different aspects of reading from untrusted sources.
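The core OOM vector in thrift-like formats is a length prefix read from untrusted input and then used to pre-allocate (e.g. `Vec::with_capacity(len)`). Below is a minimal sketch of the general technique discussed in the linked thread, not the actual code in parquet-format-safe: reject lengths above a limit, and let the buffer grow only as real bytes arrive by reading through `Read::take`. The function name `read_bytes_bounded` and the constant `MAX_FIELD_LEN` are hypothetical.

```rust
use std::io::{self, Read};

/// Hypothetical upper bound on a single length-prefixed field (an assumption,
/// not a value taken from parquet2 or parquet-format-safe).
const MAX_FIELD_LEN: u64 = 16 * 1024 * 1024;

/// Reads exactly `len` bytes from `reader` without trusting `len` for allocation.
fn read_bytes_bounded<R: Read>(reader: &mut R, len: u64) -> io::Result<Vec<u8>> {
    if len > MAX_FIELD_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "declared length exceeds limit",
        ));
    }
    // Start empty instead of `Vec::with_capacity(len)`: `take` caps how many
    // bytes are pulled, and the Vec only grows as bytes are actually read, so a
    // huge declared length backed by a short input cannot force a huge allocation.
    let mut buf = Vec::new();
    reader.by_ref().take(len).read_to_end(&mut buf)?;
    if buf.len() as u64 != len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "input shorter than declared length",
        ));
    }
    Ok(buf)
}
```

A malicious header claiming a multi-gigabyte string therefore costs at most the size of the input actually present (or is rejected outright by the limit), instead of an up-front allocation of the claimed size.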
This still does not protect us from OOMs caused by zip bombs (e.g. very large but repetitive strings that compress extremely well); addressing those will require a second read parameter when decompressing. There are also still a lot of panics around.
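One possible shape for that second parameter, assuming it ends up as a cap on the number of decompressed bytes, is sketched below. This is a hypothetical helper, not an existing parquet2 API: `decoder` stands in for any streaming decompressor that implements `std::io::Read`, and `max_out` is a caller-chosen limit.

```rust
use std::io::{self, Read};

/// Reads all output of `decoder` (e.g. a streaming decompressor), refusing to
/// produce more than `max_out` bytes. Hypothetical helper for illustration.
fn read_to_end_limited<R: Read>(decoder: R, max_out: u64) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    // Allow one byte past the limit so "exactly at the limit" can be told apart
    // from "would have produced more"; memory use stays bounded by max_out + 1.
    decoder
        .take(max_out.saturating_add(1))
        .read_to_end(&mut out)?;
    if out.len() as u64 > max_out {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "decompressed size exceeds limit",
        ));
    }
    Ok(out)
}
```

Because the check is applied to the decoder's output rather than to the compressed input, a tiny compressed page that expands to gigabytes is stopped after `max_out + 1` bytes instead of exhausting memory.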
On a positive note, fuzzing did not trigger any UB, so the first line of defense (`forbid(unsafe_code)` and careful review of dependencies) is working as intended.

On a side note, after some digging, it appears that the ability to OOM and/or panic on malicious data is a recurring problem in a significant part of this ecosystem (thrift, flatbuffers).
I plan to create an advisory against parquet2 to inform our users, but this will take some time, since there are a bunch of places where we currently panic (and the ecosystem seems to be ok with this). For this reason, we agreed not to have an embargo at this point.
The long-term idea here is that a reader that does not panic/OOM on invalid data allows these formats to be used safely beyond heavily sandboxed/restricted environments.
For now, the primary goal is to avoid OOMs, since an OOM aborts the process by default.
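For the panic side of a non-panicking reader, the usual pattern is to replace indexing and unchecked arithmetic with fallible operations that return an error. A minimal sketch, not the decoder used by parquet2: decoding the unsigned LEB128 varints that the thrift compact protocol uses for integers and lengths, returning `None` on truncated or overlong input instead of panicking. The name `decode_varint` is hypothetical.

```rust
/// Decodes an unsigned LEB128 varint from the start of `input`, returning the
/// value and the number of bytes consumed. Returns `None` (never panics) on
/// truncated input or on an encoding that does not fit in a u64.
fn decode_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    // A u64 needs at most 10 groups of 7 bits; iterating instead of indexing
    // means a short slice can never cause an out-of-bounds panic.
    for (i, &byte) in input.iter().enumerate().take(10) {
        let payload = u64::from(byte & 0x7f);
        // The 10th byte may only carry the single remaining bit (bit 63);
        // anything larger would overflow, so reject it.
        if i == 9 && payload > 1 {
            return None;
        }
        // Shift is at most 63 here, so this shift cannot overflow.
        value |= payload << (7 * i as u32);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    // Ran out of input, or the continuation bit was still set after 10 bytes.
    None
}
```

The caller then maps `None` to an error of its choosing, so malformed input surfaces as a `Result::Err` rather than aborting the thread.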
Closes #173