Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API to mitigate exponential backtracking (ReDoS) #151

Open
fujimotos opened this issue Jun 25, 2020 · 3 comments
Open

Add API to mitigate exponential backtracking (ReDoS) #151

fujimotos opened this issue Jun 25, 2020 · 3 comments

Comments

@fujimotos
Copy link

Problem

Onigmo is a backtracking regex engine. For this reason, it can sometime
become exceptionally slow, in particular, with this class of inputs:

regex : ^(a+)+$
string: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab

Provide a library API that can limit the amount of backtracking, so
that it can exit gracefully from such expensive cases.

Steps to Reproduce

Attached is a simple script to reproduce this issue. You should be
able to confirm that Onigmo hangs up with:

$ cc -o sample sample.c libonigmo.a
$ ./sample

Expected Solution

PCRE2 provides pcre2_set_match_limit() that allows callers to set a
hard limit on recursive backtracking. If it exceeds the limit, it returns
PCRE_ERROR_MATCHLIMIT, aborting the match evaluation.

https://www.pcre.org/current/doc/html/pcre2_set_match_limit.html

It would be desriable if Onigmo provides a similar tunable parameter
as well.

Attachment (reproduciable code)

#include <string.h>
#include "onigmo.h"

int main(void)
{
  unsigned char *start, *range, *end;
  regex_t *reg;
  OnigErrorInfo einfo;
  OnigRegion *region;

  UChar *pattern = (UChar *) "^(a+)+$";
  UChar *str     = (UChar *) "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab";

  onig_new(&reg, pattern, pattern + strlen((char *) pattern),
    ONIG_OPTION_DEFAULT, ONIG_ENCODING_ASCII, ONIG_SYNTAX_DEFAULT,
    &einfo);

  region = onig_region_new();
  end   = str + strlen((char *)str);
  start = str;
  range = end;
  onig_search(reg, str, end, start, range, region, ONIG_OPTION_NONE);
  return 0;
}
@edsiper
Copy link

edsiper commented Jun 25, 2020

FYI: Fluentd and Fluent Bit relies heavily on onigmo, having this feature in place would be great.

@k-takata
Copy link
Owner

You can enable such check by defining USE_COMBINATION_EXPLOSION_CHECK in regint.h:

Onigmo/regint.h

Line 138 in a06a42b

/* #define USE_COMBINATION_EXPLOSION_CHECK */ /* (X*)* */

@fujimotos
Copy link
Author

@k-takata What do you think about Kosako's comment on this thread:

If USE_COMBINATION_EXPLOSION_CHECK flag in src/regint.h enabled, it solves several patterns.
However, it does not solve all pattern cases and it become slow.
Therefore, I was not able to use it.

kkos/oniguruma#13 (comment)

There is, we presume, a fair question as to how it is well trenched for production
usage. Indeed, this flag seems to be deprecated in the upstream project.

@fujimotos fujimotos reopened this Oct 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants