Thoughts about using an eslint plugin for REDOS detection? #1201

davisjam · 2018-04-04T15:54:38Z

This is a question for discussion, not an issue.

I am working on a vuln-regex-detector module.
I will be creating an eslint plugin based on this module in the next day or two. It will make one synchronous HTTP query for each regex it finds in the source code. Queries go to a server I host at Virginia Tech which has a database mapping regexes to pre-computed "is vulnerable?" answers. The server is running the server side of the code in this project.

Once it's ready, does anyone have concerns with adding my REDOS plugin to this project's eslint config?

joshbruce · 2018-04-04T15:59:48Z

UziTech · 2018-04-04T18:09:25Z

It would be really nice to get some sort of ReDoS test on travis. Could we make it so the server side code runs on travis instead of making http requests?

davisjam · 2018-04-04T18:25:11Z

@UziTech

Could we make it so the server side code runs on travis instead of making http requests?

Testing a regex is expensive (potentially 60 seconds+ per regex) so memoizing them is helpful. Then that memoized result has to live somewhere.

My impression (could be wrong) is that Travis isn't a good place for permanent storage.
Is there something wrong with HTTP requests?

UziTech · 2018-04-04T18:48:24Z

Is there something wrong with HTTP requests?

In my experience http requests from travis seem to be the least reliable part of travis and "one synchronous HTTP query for each regex" could be a lot of http requests for every pr.

Could we download the memoized results and check the regexes ourselves or send a bunch of regexes at once to limit the number of http requests?

davisjam · 2018-04-04T19:00:05Z

Could we download the memoized results and check the regexes ourselves?

100MB+, not a good option.

or send a bunch of regexes at once to limit the number of http requests?

I don't think you can batch nicely with an eslint plugin. The "is vulnerable?" function gets called for each regex node in the AST and doesn't have a bigger sense of the context.

If you want to set up a post-eslint "security scanner" that runs as part of npm run test or npm run lint, you could do batching that way.

Could be a lot of http requests for every PR

About 100 requests.

grep '/.*/' lib/marked.js | grep -v '\/\/' | wc -l
98

UziTech · 2018-04-04T19:04:44Z

I would rather it be separate from eslint so we could run tests, run redos tests, then run lint. I feel like that would make it easier to see what failed in travis.

davisjam · 2018-04-05T02:11:12Z

@UziTech The eslint plugin is now available here.

Here are the steps I took:

Clone marked to my laptop (several miles from my server and using wifi)
Add the eslint plugin and enable the rule
Time how long it takes to run npm run lint with and without the rule

Here's what I found:

Without plugin, linting takes 2 seconds
With plugin, linting takes 16 seconds

See Slack for additional notes.

styfle · 2018-04-05T12:57:28Z

@davisjam First of all, this is amazing work!

However, I agree with UziTech that this is less than ideal for every build to fire off HTTP requests. Especially for a "lint" test. Linting should be fast and it would be very odd to be working on code in a Plane without wifi and you can't run lint for formatting errors.

Can this be a separate build step? Why does it need to be an eslint plugin?

UziTech · 2018-04-05T13:10:14Z

I agree with @styfle. I have my IDE running eslint in the background and underlining code as I type, it would be pretty annoying to have to wait 16 seconds to get that visual cue as I'm typing.

davisjam · 2018-04-05T13:16:10Z

@styfle @UziTech

it would be very odd to be working on code in a Plane without wifi and you can't run lint for formatting errors.

If the server is inaccessible, the vuln-regex-detector module returns "INVALID" and the eslint plugin ignores that regex. So there wouldn't be linter errors. It would still be slow though.

Can this be a separate build step? Why does it need to be an eslint plugin?

There's no particular reason it has to be an eslint plugin, other than convenience.

I was exploring an eslint plugin for REDOS detection, along the lines of eslint-plugin-safe-regex and eslint-plugin-security.

eslint handles the jobs any such tool needs:

identifying the files to scan
giving you an AST to traverse

Identifying REDOS issues without false positives requires either (1) expensive local computation or (2) hitting a server. So running this at the same time as the other linting rules (e.g. for use in an IDE as @UziTech notes) is not ideal because of the performance hit.

Does anyone know of a good pattern for running this kind of check during the programming cycle?

What are your thoughts about adding it as part of npm run test, implemented as a separate scan for regexes on a designated set of files of interest (in this case, lib/marked.js)? This approach enables batching so it should be faster.

styfle · 2018-04-05T13:27:51Z

npm run test:regex would be good

davisjam · 2018-04-05T13:58:42Z

How about this?

"test:regex": "eslint --plugin vuln-regex-detector --rule '\"vuln-regex-detector/no-vuln-regex\": 2' lib/marked.js",

It's still slow because it's making synchronous requests for each regex. But since it's off the linter fast path I imagine that's OK now.

If acceptable, I'll open a PR.

davisjam · 2018-04-05T14:07:11Z

Time estimate: From an AWS micro instance, it takes 25 seconds to run npm run test:regex.

I'm working on a local cache of results so on dev machines the overhead would be minimal. On Travis it would still be in the "cost of 100 queries" range.

styfle · 2018-04-05T14:10:37Z

One other thought I just had: since this database is remotely managed, we can't easily tell if a given PR introduced a vulnerability or if the database was updated and the PR was the first to run CI.

Have you thought about how to solve this problem?

davisjam · 2018-04-05T14:37:22Z

@styfle I don't see a good way around that with the "query a remote server" configuration. The blame may be indiscriminate in general. If devs run npm run test:regex before pushing it will help, since the server may get to it and produce an answer before the query is run by Travis.

If developers really cared, the vuln-regex-detector module is in the client subdir of the vuln-regex-detector project. Developers could use aspects of the larger project as an application instead (see the bin/check-tree.pl script). But I'd rather see npm run test:regex incorporated into projects instead -- I trust Travis.

davisjam · 2018-04-05T15:01:14Z

Another thought: Because we can't easily distinguish between "this PR introduced a vulnerable regex" and "some other PR did so and we're the first PR to notice", we might ship a release with vulnerable regexes and only notice afterward.

That's not a great situation. But it's better than not noticing.

UziTech · 2018-04-05T15:04:37Z

@styfle I think that is a problem we will have to assess on a case by case bases. Maybe set the regex test to allow failure on travis and check it manually.

@davisjam I'm curious what the chance is for a false positive (invulnerable regex seen as vulnerable)?

The eslint plugin might not be the best solution in our case since we have so many regexes and especially since we construct the regexes.

I would think a better way for us to test the regexes would be to run the code (so the regexes are constructed) then send those regexes to the server (possibly with fewer http requests).

something like:

const vulnRegexDetector = require('vuln-regex-detector');
const marked = require('../lib/marked.js');

for(const name in marked.Lexer.rules) {
	const regex = marked.Lexer.rules[name];
	vulnRegexDetector.test(regex)
		.then((result) => {
			if (result === vulnRegexDetector.responses.vulnerable) {
				console.log('Regex is vulnerable');
			} else if (result === vulnRegexDetector.responses.safe) {
				console.log('Regex is safe');
			} else {
				console.log('Not sure if regex is safe or not');
			}
		});
}

UziTech · 2018-04-05T15:06:13Z

That only tests the block level normal rules. We should probably refactor marked so the regexes are easier to pull out for testing.

davisjam · 2018-04-05T15:11:59Z

I'm curious what the chance is for a false positive (invulnerable regex seen as vulnerable)?

There are no false positives (famous last words).
The server only marks a regex as vulnerable if it demonstrates catastrophic backtracking to its own satisfaction. See here for details. The only risk of a false positive that I can think of is an absurdly long attack string that might lead to timeouts due to the cost of string manipulation.

It might have false negatives though, most obviously if the detectors aren't aware of a class of vulnerabilities.

I would think a better way for us to test the regexes would be to run the code (so the regexes are constructed) then send those regexes to the server

This approach requires that we keep the test in sync with the code. The approach I proposed will test every regex, albeit somewhat slowly in a CI context. But I don't view 30-60 seconds as a huge problem given the overhead of CI in general.

For npm run test:regex on dev machines, as I said above I'm working on a local cache so only new regexes would be queried (should take <2 seconds).

UziTech · 2018-04-05T15:21:09Z

The problem with the eslint approach is that it won't catch regexes that are constructed into vulnerable regexes.

example:

block.vuln = edit(/(test)+/)
  .replace('test', /a+/)
  .getRegex();

the eslint plugin will test /(test)+/ and /a+/ which are not vulnerable but marked is actually using /(a+)+/ which is vulnerable according to your example.

davisjam · 2018-04-05T15:22:42Z

Entirely correct. When I said "there are no false positives" I meant for the regexes it considers -- namely statically defined ones that can be identified in an eslint pass.

Is this a concern in marked?

UziTech · 2018-04-05T15:28:44Z

Is this a concern in marked?

Yes, the example above is the way most of the regexes in marked are constructed.

I propose that when we modularize marked we create a Regex "class" that holds all the regexes and handles the construction so it will be trivial to test them all.

davisjam · 2018-04-05T15:33:01Z

Yes, the example above is the way most of the regexes in marked are constructed.

I didn't know that. The vulnerable regexes I've seen in marked are not constructed in this way. My initial scan used a static analysis similar to that of eslint.

No static analysis will be perfect, but it seems like the npm run test:regex implementation I indicated here is far better than nothing and requires no additional dev investment. You could swap in a dynamic implementation later.

UziTech · 2018-04-05T16:01:05Z

@davisjam Is there a way to send multiple regexes in one http request with the npm package?

davisjam · 2018-04-05T16:06:10Z

multiple regexes in one http request

Not yet. See this issue.

styfle · 2018-04-05T16:16:48Z

I thought about this problem a little more and I think the best way to solve this problem is to accept a version parameter (or a date parameter) in the API call.

This way, we can run the check for every PR and a failure will tell us that the new regex in the PR caused the failure.

Then once a week/month, we can create a new PR to bump the version parameter (or date) parameter so that new vulnerabilities are checked in this new PR and can be fixed before merging.

It requires manual updates but it avoids the scenario where we break a build for something like docs that isn't even touching the regex code.

Thoughts?

davisjam · 2018-04-05T16:38:24Z

It requires manual updates but it avoids the scenario where we break a build for something like docs that isn't even touching the regex code.

Since we're talking about security issues here, I would rather break the build as early as possible.

styfle · 2018-04-05T17:19:16Z

I guess we'll have to click through on PRs that fail to see if its the text:regex step. Then I'll just tag @davisjam to confirm if it really is a problem with the PR 😉

davisjam · 2018-04-05T17:24:20Z

For npm run test:regex on dev machines, as I said above I'm working on a local cache so only new regexes would be queried (should take <2 seconds).

vuln-regex-detector v1.0.2 (eslint-plugin-vuln-regex-detector v1.0.3) now maintains a local persistent cache in the FS.

The first npm run test:regex on a "sample dev machine" (AWS micro) takes 30 seconds.
This warms up the local cache, so a subsequent npm run test:regex is resolved locally and takes 2.5 seconds. That's about the same cost as running plain old npm run lint (2.1 seconds).

In a CI setting you'll pay the full cost of a cold cache.

davisjam · 2018-04-05T17:26:08Z

I guess we'll have to click through on PRs that fail to see if its the test:regex step.

Seems reasonable to me. We can discuss details in Slack.

- 'npm run test:redos' now scans for REDOS issues - added a Travis stage for 'security scan' Fixes: markedjs#1201 (a step towards it, anyway)

joshbruce added question proposal labels Apr 4, 2018

davisjam mentioned this issue Apr 5, 2018

Offering my REDOS tools nodejs/security-wg#208

Closed

davisjam mentioned this issue Apr 15, 2018

test: security scan #1220

Merged

davisjam added a commit to davisjam/marked that referenced this issue Apr 17, 2018

test: security scan

02b1343

- 'npm run test:redos' now scans for REDOS issues - added a Travis stage for 'security scan' Fixes: markedjs#1201 (a step towards it, anyway)

styfle closed this as completed in #1220 Apr 17, 2018

davisjam mentioned this issue May 1, 2018

Gfm tables #1245

Merged

6 tasks

zhenalexfan pushed a commit to zhenalexfan/MarkdownHan that referenced this issue Nov 8, 2021

test: security scan

f65b787

- 'npm run test:redos' now scans for REDOS issues - added a Travis stage for 'security scan' Fixes: markedjs#1201 (a step towards it, anyway)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thoughts about using an eslint plugin for REDOS detection? #1201

Thoughts about using an eslint plugin for REDOS detection? #1201

davisjam commented Apr 4, 2018

joshbruce commented Apr 4, 2018

UziTech commented Apr 4, 2018

davisjam commented Apr 4, 2018

UziTech commented Apr 4, 2018

davisjam commented Apr 4, 2018

UziTech commented Apr 4, 2018

davisjam commented Apr 5, 2018 •

edited

styfle commented Apr 5, 2018

UziTech commented Apr 5, 2018

davisjam commented Apr 5, 2018

styfle commented Apr 5, 2018

davisjam commented Apr 5, 2018

davisjam commented Apr 5, 2018

styfle commented Apr 5, 2018

davisjam commented Apr 5, 2018

davisjam commented Apr 5, 2018 •

edited

UziTech commented Apr 5, 2018

UziTech commented Apr 5, 2018

davisjam commented Apr 5, 2018 •

edited

UziTech commented Apr 5, 2018

davisjam commented Apr 5, 2018

UziTech commented Apr 5, 2018

davisjam commented Apr 5, 2018 •

edited

UziTech commented Apr 5, 2018

davisjam commented Apr 5, 2018

styfle commented Apr 5, 2018

davisjam commented Apr 5, 2018

styfle commented Apr 5, 2018

davisjam commented Apr 5, 2018 •

edited

davisjam commented Apr 5, 2018

Thoughts about using an eslint plugin for REDOS detection? #1201

Thoughts about using an eslint plugin for REDOS detection? #1201

Comments

davisjam commented Apr 4, 2018

joshbruce commented Apr 4, 2018

UziTech commented Apr 4, 2018

davisjam commented Apr 4, 2018

UziTech commented Apr 4, 2018

davisjam commented Apr 4, 2018

UziTech commented Apr 4, 2018

davisjam commented Apr 5, 2018 • edited

styfle commented Apr 5, 2018

UziTech commented Apr 5, 2018

davisjam commented Apr 5, 2018

styfle commented Apr 5, 2018

davisjam commented Apr 5, 2018

davisjam commented Apr 5, 2018

styfle commented Apr 5, 2018

davisjam commented Apr 5, 2018

davisjam commented Apr 5, 2018 • edited

UziTech commented Apr 5, 2018

UziTech commented Apr 5, 2018

davisjam commented Apr 5, 2018 • edited

UziTech commented Apr 5, 2018

davisjam commented Apr 5, 2018

UziTech commented Apr 5, 2018

davisjam commented Apr 5, 2018 • edited

UziTech commented Apr 5, 2018

davisjam commented Apr 5, 2018

styfle commented Apr 5, 2018

davisjam commented Apr 5, 2018

styfle commented Apr 5, 2018

davisjam commented Apr 5, 2018 • edited

davisjam commented Apr 5, 2018

davisjam commented Apr 5, 2018 •

edited

davisjam commented Apr 5, 2018 •

edited

davisjam commented Apr 5, 2018 •

edited

davisjam commented Apr 5, 2018 •

edited

davisjam commented Apr 5, 2018 •

edited