
Fixed OOM on malicious/malformed thrift #172

Merged
Merged 1 commit into main from fix_panics on Aug 10, 2022

Conversation

Owner

@jorgecarleitao jorgecarleitao commented Aug 10, 2022

This PR is built on top of jorgecarleitao/parquet-format-safe#1 and eliminates OOMs and some panics when reading malformed/malicious thrift.

A big thanks to @evanrichter, who responsibly disclosed this to me privately, and to the contributors at https://users.rust-lang.org/t/how-to-avoid-oom-when-reading-untrusted-sources/79263?u=jorgecarleitao, who significantly clarified the different aspects of reading from untrusted sources.

This still does not protect us from OOMs caused by zip bombs, and there are still a lot of panics around; we will need a second read parameter to address those that arise when decompressing (e.g. very large but highly repetitive strings).
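The general defense against this class of OOM can be sketched as follows. This is not parquet2's actual code; `read_bounded` is a hypothetical helper, and `max_size` mirrors the limit parameter proposed in #173. The key idea: never allocate based on an attacker-controlled declared length without first checking it against a caller-supplied bound.

```rust
use std::io::{Cursor, ErrorKind, Read};

/// Hypothetical helper: read `declared_len` bytes, but reject the request
/// before allocating if it exceeds the caller's `max_size` budget.
/// A malicious header claiming a multi-gigabyte length thus fails with an
/// error instead of aborting the process on allocation.
fn read_bounded(
    reader: &mut impl Read,
    declared_len: usize,
    max_size: usize,
) -> std::io::Result<Vec<u8>> {
    if declared_len > max_size {
        return Err(std::io::Error::new(
            ErrorKind::InvalidData,
            "declared length exceeds the configured maximum",
        ));
    }
    // Safe to allocate: `declared_len` is within budget.
    let mut buf = vec![0u8; declared_len];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    let data = b"hello world";

    // A valid, in-budget read succeeds.
    let mut cursor = Cursor::new(&data[..]);
    assert_eq!(read_bounded(&mut cursor, 5, 1024)?, b"hello");

    // A malicious declared length is rejected before any allocation happens.
    let mut cursor = Cursor::new(&data[..]);
    assert!(read_bounded(&mut cursor, usize::MAX, 1024).is_err());
    Ok(())
}
```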

On a positive note, fuzzing did not uncover any undefined behavior, so the first line of defense (forbid(unsafe_code) and careful review of dependencies) is working as intended.

On a side note: after some digging, the ability to OOM and/or panic on malicious data turns out to be a recurring problem across a significant part of this ecosystem (thrift, flatbuffers).

I plan to create an advisory against parquet2 to inform our users, but this will be a process, since there are many places where we currently panic (and the ecosystem seems to be OK with this). For this reason, we agreed not to have an embargo at this point.

The long-term idea is that a reader that does not panic or OOM on invalid data allows these formats to be used safely beyond heavily sandboxed/restricted environments.

For now, the primary goal is to avoid OOM, since an allocation failure aborts the process by default.
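For context on why OOM is worse than a panic: Rust's infallible allocation paths (e.g. `Vec::with_capacity`) abort the whole process when the allocator fails, and an abort cannot be caught. The standard library's `Vec::try_reserve` offers a fallible alternative; a minimal sketch (`alloc_checked` is an illustrative name, not a parquet2 API):

```rust
use std::collections::TryReserveError;

/// Illustrative helper: allocate a zeroed buffer of `len` bytes, surfacing
/// allocator failure (or capacity overflow) as an error instead of aborting.
fn alloc_checked(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // `try_reserve` returns Err on failure rather than aborting the process.
    buf.try_reserve(len)?;
    // Capacity is already reserved, so this resize cannot allocate again.
    buf.resize(len, 0);
    Ok(buf)
}

fn main() {
    // A reasonable request succeeds.
    assert_eq!(alloc_checked(16).unwrap().len(), 16);
    // An absurd request fails with an error rather than killing the process.
    assert!(alloc_checked(usize::MAX).is_err());
}
```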

Closes #173

@codecov-commenter

Codecov Report

Merging #172 (df20b3b) into main (1291b1e) will decrease coverage by 0.04%.
The diff coverage is 79.48%.

@@            Coverage Diff             @@
##             main     #172      +/-   ##
==========================================
- Coverage   74.02%   73.98%   -0.05%     
==========================================
  Files          74       74              
  Lines        3650     3675      +25     
==========================================
+ Hits         2702     2719      +17     
- Misses        948      956       +8     
| Impacted Files | Coverage Δ |
|---|---|
| src/bloom_filter/read.rs | 0.00% <ø> (ø) |
| src/error.rs | 18.18% <0.00%> (ø) |
| src/indexes/index.rs | 64.10% <ø> (ø) |
| src/indexes/intervals.rs | 89.79% <ø> (ø) |
| src/metadata/row_metadata.rs | 46.66% <ø> (ø) |
| src/metadata/schema_descriptor.rs | 75.47% <ø> (ø) |
| src/read/compression.rs | 96.34% <ø> (ø) |
| src/schema/io_message/from_message.rs | 71.18% <ø> (ø) |
| src/schema/io_thrift/from_thrift.rs | 82.14% <ø> (ø) |
| src/schema/io_thrift/to_thrift.rs | 100.00% <ø> (ø) |
| ... and 23 more | |


@jorgecarleitao jorgecarleitao changed the title Fixed OOM on malicious/invalid thrift Fixed OOM on malicious/malformed thrift Aug 10, 2022
@jorgecarleitao jorgecarleitao merged commit 3e86352 into main Aug 10, 2022
@jorgecarleitao jorgecarleitao deleted the fix_panics branch August 10, 2022 20:13
@evanrichter
Contributor

Nice to see this merged! I'll pull in your changes and run the fuzzer again :D

jorgecarleitao added a commit that referenced this pull request Aug 15, 2022
Decoding currently panics in many cases. This is a continuation of #172 (cc @evanrichter).
Many decoders are now fallible, since invalid input could cause them to panic.
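The pattern behind "making decoders fallible" can be sketched in miniature (illustrative code, not parquet2's actual decoders): instead of indexing into a buffer, which panics on short input, the decoder validates the length and returns a `Result`, so callers can handle malformed data gracefully.

```rust
/// Illustrative error type for a fallible decoder.
#[derive(Debug)]
enum DecodeError {
    UnexpectedEndOfBuffer,
}

/// Decode a little-endian u32 from the front of `buf`.
/// `buf[..4]` would panic on short input; `get(..4)` returns None instead,
/// which we surface as an error.
fn decode_u32_le(buf: &[u8]) -> Result<u32, DecodeError> {
    let bytes = buf.get(..4).ok_or(DecodeError::UnexpectedEndOfBuffer)?;
    Ok(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

fn main() {
    // Well-formed input decodes.
    assert_eq!(decode_u32_le(&[1, 0, 0, 0]).unwrap(), 1);
    // Truncated input yields an error, not a panic.
    assert!(decode_u32_le(&[1, 0]).is_err());
}
```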
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add max_size to get_page_stream and get_page_iterator
3 participants