Fixed OOM on malicious/malformed thrift #172

jorgecarleitao · 2022-08-10T16:53:57Z

This PR is built on top of jorgecarleitao/parquet-format-safe#1 and eliminates OOMs and some panics! when reading malformed/malicious thrift.

A big thanks to @evanrichter that responsibility disclosed this privately to me, and to contributors at https://users.rust-lang.org/t/how-to-avoid-oom-when-reading-untrusted-sources/79263?u=jorgecarleitao that significantly clarified the different aspects of reading from untrusted sources.

This still does not protect us from zip bombs in OOM, and there are still a lot of panics around - we will require a second read parameter to address them when decompressing (e.g. very large but repetitive strings).

On a positive note, fuzzing did not UB, so the first line of defense (forbid(unsafe_code) and careful review of dependencies) is working as intended.

On a side note, after some digging, the ability to OOM and/or panic on malicious data is a recurrent problem on a significant part of this ecosystem (thrift, flatbuffers).

I plan to create an advisory against parquet2 to inform our users, but this will be a process since there are a bunch of places where we currently panic (and the ecosystem seems to be ok with this). For this reason we agreed to not have an embargo at this point.

The long term idea here is that having a reader that does not panic/OOM on invalid data allows these formats to be safely used beyond very sandboxed / restricted environments.

For now, the primary goal is to not have OOM, since it by default aborts the process.

Close #173

codecov-commenter · 2022-08-10T16:58:03Z

Codecov Report

Merging #172 (df20b3b) into main (1291b1e) will decrease coverage by 0.04%.
The diff coverage is 79.48%.

@@            Coverage Diff             @@
##             main     #172      +/-   ##
==========================================
- Coverage   74.02%   73.98%   -0.05%     
==========================================
  Files          74       74              
  Lines        3650     3675      +25     
==========================================
+ Hits         2702     2719      +17     
- Misses        948      956       +8

Impacted Files	Coverage Δ
src/bloom_filter/read.rs	`0.00% <ø> (ø)`
src/error.rs	`18.18% <0.00%> (ø)`
src/indexes/index.rs	`64.10% <ø> (ø)`
src/indexes/intervals.rs	`89.79% <ø> (ø)`
src/metadata/row_metadata.rs	`46.66% <ø> (ø)`
src/metadata/schema_descriptor.rs	`75.47% <ø> (ø)`
src/read/compression.rs	`96.34% <ø> (ø)`
src/schema/io_message/from_message.rs	`71.18% <ø> (ø)`
src/schema/io_thrift/from_thrift.rs	`82.14% <ø> (ø)`
src/schema/io_thrift/to_thrift.rs	`100.00% <ø> (ø)`
... and 23 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

evanrichter · 2022-08-10T20:14:16Z

nice to see this merged! I'll pull in your changes and run the fuzzer again :D

@evanrichter

Decoding currently panics in many cases. This is a continuation of #172 (cc @evanrichter). Many decoders are now fallible, since invalid input could cause them to panic.

Fixed panics

df20b3b

jorgecarleitao added the bug Something isn't working label Aug 10, 2022

jorgecarleitao mentioned this pull request Aug 10, 2022

Add max_size to get_page_stream and get_page_iterator #173

Closed

jorgecarleitao changed the title ~~Fixed OOM on malicious/invalid thrift~~ Fixed OOM on malicious/malformed thrift Aug 10, 2022

jorgecarleitao merged commit 3e86352 into main Aug 10, 2022

jorgecarleitao deleted the fix_panics branch August 10, 2022 20:13

jorgecarleitao mentioned this pull request Aug 15, 2022

Made decoding fallible #178

Merged

jorgecarleitao added a commit that referenced this pull request Aug 15, 2022

Made decoding fallible (#178)

7caafa4

Decoding currently panics in many cases. This is a continuation of #172 (cc @evanrichter). Many decoders are now fallible, since invalid input could cause them to panic.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed OOM on malicious/malformed thrift #172

Fixed OOM on malicious/malformed thrift #172

jorgecarleitao commented Aug 10, 2022 •

edited

Loading

codecov-commenter commented Aug 10, 2022

evanrichter commented Aug 10, 2022

Fixed OOM on malicious/malformed thrift #172

Fixed OOM on malicious/malformed thrift #172

Conversation

jorgecarleitao commented Aug 10, 2022 • edited Loading

codecov-commenter commented Aug 10, 2022

Codecov Report

evanrichter commented Aug 10, 2022

jorgecarleitao commented Aug 10, 2022 •

edited

Loading