Skip to content

Commit

Permalink
fuzz: account for Unicode class size in compiler
Browse files Browse the repository at this point in the history
This improves the precision of the "expression too big" regex
compilation error. Previously, it was not considering the heap usage
from Unicode character classes.

It's possible this will make some regexes fail to compile that
previously compiled. However, this is a bug fix. If you do wind up
seeing this though, feel free to file an issue, since it would be good
to get an idea of what kinds of regexes no longer compile but did.

This was found by OSS-fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33579
  • Loading branch information
BurntSushi committed Apr 22, 2021
1 parent 6d95a6f commit 41f14c2
Show file tree
Hide file tree
Showing 3 changed files with 33 additions and 1 deletion.
12 changes: 12 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,15 @@
1.4.6 (2021-04-22)
==================
This is a small patch release that fixes the compiler's size check on how much
heap memory a regex uses. Previously, the compiler did not account for the
heap usage of Unicode character classes. Now it does. It's possible that this
may make some regexes fail to compile that previously did compile. If that
happens, please file an issue.

* [BUG OSS-fuzz#33579](https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33579):
Some regexes can use more heap memory than one would expect.


1.4.5 (2021-03-14)
==================
This is a small patch release that fixes a regression in the size of a `Regex`
Expand Down
10 changes: 9 additions & 1 deletion src/compile.rs
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ pub struct Compiler {
suffix_cache: SuffixCache,
utf8_seqs: Option<Utf8Sequences>,
byte_classes: ByteClassSet,
extra_inst_bytes: usize,
}

impl Compiler {
Expand All @@ -54,6 +55,7 @@ impl Compiler {
suffix_cache: SuffixCache::new(1000),
utf8_seqs: Some(Utf8Sequences::new('\x00', '\x00')),
byte_classes: ByteClassSet::new(),
extra_inst_bytes: 0,
}
}

Expand Down Expand Up @@ -420,6 +422,8 @@ impl Compiler {
}

fn c_class(&mut self, ranges: &[hir::ClassUnicodeRange]) -> ResultOrEmpty {
use std::mem::size_of;

assert!(!ranges.is_empty());
if self.compiled.uses_bytes() {
Ok(Some(CompileClass { c: self, ranges: ranges }.compile()?))
Expand All @@ -429,6 +433,8 @@ impl Compiler {
let hole = if ranges.len() == 1 && ranges[0].0 == ranges[0].1 {
self.push_hole(InstHole::Char { c: ranges[0].0 })
} else {
self.extra_inst_bytes +=
ranges.len() * (size_of::<char>() * 2);
self.push_hole(InstHole::Ranges { ranges: ranges })
};
Ok(Some(Patch { hole: hole, entry: self.insts.len() - 1 }))
Expand Down Expand Up @@ -795,7 +801,9 @@ impl Compiler {
fn check_size(&self) -> result::Result<(), Error> {
use std::mem::size_of;

if self.insts.len() * size_of::<Inst>() > self.size_limit {
let size =
self.extra_inst_bytes + (self.insts.len() * size_of::<Inst>());
if size > self.size_limit {
Err(Error::CompiledTooBig(self.size_limit))
} else {
Ok(())
Expand Down
12 changes: 12 additions & 0 deletions tests/regression_fuzz.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,15 @@ fn fuzz1() {
fn empty_any_errors_no_panic() {
assert!(regex_new!(r"\P{any}").is_err());
}

// This tests that a very large regex errors during compilation instead of
// using gratuitous amounts of memory. The specific problem is that the
// compiler wasn't accounting for the memory used by Unicode character classes
// correctly.
//
// See: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33579
#[test]
fn big_regex_fails_to_compile() {
let pat = "[\u{0}\u{e}\u{2}\\w~~>[l\t\u{0}]p?<]{971158}";
assert!(regex_new!(pat).is_err());
}

0 comments on commit 41f14c2

Please sign in to comment.