
Vulnerable Regular Expression #3359

Closed
cristianstaicu opened this Issue Sep 5, 2017 · 15 comments

Comments

9 participants
@cristianstaicu commented Sep 5, 2017

The following regular expression used in words.js is vulnerable to ReDoS:

/[a-z][A-Z]|[A-Z]{2,}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/

This regex is called by multiple user-facing methods such as "camelCase". The slowdown is moderately low: around 2 seconds of matching time for 50,000 characters. However, I would still suggest one of the following:

  • remove the regex,
  • anchor the regex,
  • limit the number of characters that can be matched by the repetition,
  • limit the input size.

If needed, I can provide an actual example showing the slowdown.
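For reference, a minimal repro sketch of the slowdown (assuming Node.js; the 20,000-character input is illustrative, scaled down from the 50,000 mentioned above):

```javascript
// A long all-uppercase run never satisfies [A-Z]{2,}[a-z], so the
// engine backtracks the unbounded quantifier at every start position:
// roughly O(n^2) work in total.
const re = /[a-z][A-Z]|[A-Z]{2,}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/;

const evil = 'A'.repeat(20000); // no lowercase, digits, or symbols anywhere

const t0 = Date.now();
const matched = re.test(evil);  // false, but only after quadratic backtracking
const elapsed = Date.now() - t0;

console.log(matched, `${elapsed} ms`); // time roughly quadruples when input doubles
```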

@jdalton jdalton added the invalid label Sep 5, 2017

@jdalton (Member) commented Sep 5, 2017

Hi @cristianstaicu!

50,000 characters? That's a very big string for the use cases of camelCase and friends. I'm up for improving the regexp though. Would you like to create a PR?

@jdalton jdalton closed this Sep 5, 2017

@cristianstaicu (Author) commented Sep 5, 2017

I agree that for the general case that is a very large value, but security is about corner cases. Consider that the camelCase export alone (https://www.npmjs.com/package/lodash.camelcase) has 237 dependents. I am pretty sure it can produce unexpected results in some of those packages, for example in what seems to be the most popular one:
https://www.npmjs.com/package/css-loader

It appears to apply camelCase to parts extracted from arbitrary CSS files from the internet. I am pretty sure there are nastier cases than this, such as applying it to HTTP headers, which an attacker can easily control.

I would like to submit a PR, but you have to tell me if any of the following options are acceptable:

  1. Transform the regex into:

/[a-z][A-Z]|[A-Z]{2,2000}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/

This effectively limits the word size to 2,000 characters.

  2. Check that the string parameter in the "words" function has at most 20,000 characters.

If, as you say, most use cases involve far fewer characters, this should impact none of them; but my prediction is that you will soon get bug reports from angry users about random errors popping up in their applications. :) Any other ideas?
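Option 2 might be sketched roughly as follows; `guardedWords`, `MAX_WORDS_LENGTH`, and the naive splitter are hypothetical illustration names, not lodash API:

```javascript
// Hypothetical input-size guard (option 2): reject oversized input
// before the vulnerable regex ever runs.
const MAX_WORDS_LENGTH = 20000;

function guardedWords(string, words) {
  if (string.length > MAX_WORDS_LENGTH) {
    throw new RangeError(
      `words(): input of ${string.length} chars exceeds ${MAX_WORDS_LENGTH}`
    );
  }
  return words(string);
}

// Demo with a trivial splitter standing in for _.words:
const naiveWords = (s) => s.split(/[^a-zA-Z0-9]+/).filter(Boolean);
console.log(guardedWords('foo bar', naiveWords)); // [ 'foo', 'bar' ]
```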

@jdalton (Member) commented Sep 5, 2017

Limiting the size is fine if there's no other way.
The regexp is a monster so if there's a way to improve that, I'd prefer that route.

@gibson042 (Contributor) commented Sep 6, 2017

The regexp is a monster so if there's a way to improve that, I'd prefer that route.

Well, it seems to break down into the following alternatives:

  • lowercase a-z, followed by uppercase
  • two or more uppercase A-Z, followed by lowercase
  • 0-9 followed by A-Z (any case)
  • A-Z (any case) followed by 0-9
  • any character that is not a space or 0-9 or A-Z (any case)

So the most straightforward simplification just drops the unbounded quantifier and introduces more negated classes:

// only tweak number-adjacent classes:
/[a-z][A-Z]|[A-Z]{2}[a-z]|[0-9][^0-9 ]|[^0-9 ][0-9]|[^a-zA-Z0-9 ]/

// or tweak all three prefixes:
/[a-z][^a-z ]|[A-Z][0-9]|[A-Z]{2}[^A-Z ]|[0-9][^0-9 ]|[^a-zA-Z0-9 ]/

It's still ugly, but that seems hard to avoid given the lowercase-after-one-uppercase exception.
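A quick sanity check (a sketch, assuming plain Node.js) that the two rewrites agree with the original as boolean tests on a handful of representative inputs:

```javascript
// Compare the original regex with the two proposed rewrites as
// boolean tests over some representative inputs.
const original     = /[a-z][A-Z]|[A-Z]{2,}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/;
const tweakNumbers = /[a-z][A-Z]|[A-Z]{2}[a-z]|[0-9][^0-9 ]|[^0-9 ][0-9]|[^a-zA-Z0-9 ]/;
const tweakAll     = /[a-z][^a-z ]|[A-Z][0-9]|[A-Z]{2}[^A-Z ]|[0-9][^0-9 ]|[^a-zA-Z0-9 ]/;

const samples = [
  'fooBar', 'FOOBar', 'abc123', '12abc', 'foo_bar',
  'hello world', 'ABC', 'Ab', 'HTMLParser', '', '  ',
];

const allAgree = samples.every(
  (s) => original.test(s) === tweakNumbers.test(s)
      && original.test(s) === tweakAll.test(s)
);
console.log(allAgree); // true for these samples
```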

@viralanomaly commented Aug 21, 2018

Is there a plan to limit the size? The file in question has not changed since well before this issue was raised.

@manuel-jasso commented Aug 22, 2018

Hi @jdalton, my company is tightening security and we need to address this issue. If checking the string parameter length is an acceptable solution, would you accept a PR with this change?

@jdalton (Member) commented Aug 22, 2018

I'm still inclined to think of this issue as more or less expected. By that I mean: if you're running camelCase over 50 kB of text, it's reasonable that performance won't be great. Performance tends to decrease with larger payloads.

If checking the string parameter length is an acceptable solution, would you accept a PR with this change?

I'd prefer improving the performance of the existing regexp over capping strings at an arbitrary length. The limit would be a last resort if no other solution can be found.

@manuel-jasso commented Aug 22, 2018

Thanks for responding 😄 .

I'm still inclined to think of this issue as more or less expected.

Totally agree. However, the security team is asking me, "Can you guarantee that nobody will pass large strings?" I can guarantee that for my own code, but I cannot guarantee it for third-party code unless the library itself enforces the limit.

I know this would be a larger change, but how about adding some kind of opt-in config parameter to lodash so that it runs in a more secure mode at run time, even if a little slower? That would be a good trade-off between security and performance.

If you prefer to fix the RegEx, are the changes proposed by @cristianstaicu or @gibson042 acceptable?

@jdalton (Member) commented Aug 22, 2018

Totally agree, however the security team is asking me "can you guarantee that nobody will pass large strings" and I can guarantee that for my own code [...] how about adding some kind of opt-in config parameter to lodash so that it works in a more secure mode at run time, even though it would be a little slower?

You're probably better off using your own methods for this.

If you prefer to fix the RegEx, are the changes proposed by @cristianstaicu or @gibson042 acceptable?

No idea. You'll want to patch against 4.17.10 and ensure unit tests still pass.

@ccanning commented Aug 23, 2018

I think I finally have a fully working regex that guards against the all-caps attack and still uses your original matching patterns; I just changed that alternative to a lookbehind with non-greedy matching. It also avoids an arbitrary string-length limit.

/[a-z][A-Z]|(?<=[A-Z][A-Z]+?[a-z])|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/; 
@thinkkevin commented Aug 24, 2018

[A-Z]{2,}[a-z] makes sense when using a regex to greedily match a string, but it is wasted effort if you only want to test a string. For regex.test purposes it is equivalent to [A-Z]{2}[a-z].

I tried three regexes in the browser:

The first: /[a-z][A-Z]|[A-Z]{2,}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/
The second: /[a-z][A-Z]|[A-Z]{2}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/
The third: /[A-Z]{2}[a-z]/

[screenshot: browser console timings for the three regexes, Aug 23, 2018]
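The equivalence half of that claim can be spot-checked with a sketch like this (assuming Node.js):

```javascript
// For a boolean .test, any match of [A-Z]{2,}[a-z] contains a match of
// [A-Z]{2}[a-z] ending at the same lowercase letter, so the two full
// regexes accept exactly the same strings.
const greedy  = /[a-z][A-Z]|[A-Z]{2,}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/;
const bounded = /[a-z][A-Z]|[A-Z]{2}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/;

const samples = ['HTMLParser', 'ABCdef', 'AAAA', 'Ab', 'fooBar', 'x1', ''];
const agree = samples.every((s) => greedy.test(s) === bounded.test(s));
console.log(agree); // true
```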

@manuel-jasso commented Aug 24, 2018

Hi @jdalton, as you can see there is quite some excitement and interest to solve this issue! 😄

This is work in progress using the simplest RegEx with the smallest possible change (removing the comma):
manuel-jasso:prevent-redos#diff-001d0647fb00f8336795faccdec19a31R280

I added some tests to prove _.words can still parse huge words successfully, especially the kind of words that exercise the slow-performing RegEx alternative, and it does so in less than 250 ms (I'm testing two cases and asserting each takes less than 500 ms):
manuel-jasso:prevent-redos#diff-c1129c8b045390789fa8ff62f2c6b4a9R25211

Needless to say, all previous v4.17.10 _.words tests pass! 👍

What do you think so far? If this looks good I will clean up the code and submit a PR.

BTW for reference: https://github.com/sola-da/ReDoS-vulnerabilities
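The one-character change described above (dropping the comma from `{2,}`) can be sketched as follows; the regex literals are reproduced from this thread, not copied from the actual lodash diff:

```javascript
// Before and after the comma-removal fix to the word-detection regex.
const unpatched = /[a-z][A-Z]|[A-Z]{2,}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/;
const patched   = /[a-z][A-Z]|[A-Z]{2}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/;

// The patched form handles a pathological 50k-character input quickly,
// since {2} never backtracks over a long uppercase run.
const huge = 'A'.repeat(50000);
const t0 = Date.now();
const patchedResult = patched.test(huge); // false, returned quickly
console.log(`patched: ${Date.now() - t0} ms`);

// On ordinary inputs both forms agree:
for (const s of ['camelCase', 'HTMLParser', 'plain words']) {
  console.log(s, unpatched.test(s), patched.test(s));
}
```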

manuel-jasso pushed a commit to manuel-jasso/lodash that referenced this issue Aug 28, 2018

Manuel Jasso
Prevent ReDoS
To fix lodash#3359, modified reHasUnicodeWord to remove an unnecessary comma which made the regex greedy; this is only a test regex, not a matching regex. Added unit tests; this now runs in under 5 ms instead of over 1000 ms for huge 50k+ character words.



@jdalton jdalton added bug and removed invalid labels Aug 31, 2018

jdalton added a commit that referenced this issue Aug 31, 2018

Prevent ReDoS
To fix #3359, modified reHasUnicodeWord to remove an unnecessary comma which made the regex greedy; this is only a test regex, not a matching regex. Added unit tests; this now runs in under 5 ms instead of over 1000 ms for huge 50k+ character words.
@DyzLecticus commented Oct 29, 2018

Has the fix for this issue been merged and released?

@jdalton (Member) commented Oct 29, 2018

Hi @DyzLecticus!

Yes. It's patched in v4.17.11.

@ddillard commented Feb 4, 2019

FYI, I submitted to get a CVE for this issue. I'll add a comment once the CVE ID has been issued.

opichals added a commit to opichals/gherkin-lint that referenced this issue Feb 11, 2019
