Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

route regex match fails for large URIs #7728

Closed
skambashi opened this issue Jul 26, 2019 · 13 comments

Comments

@skambashi
Copy link

commented Jul 26, 2019

Description:
We've noticed that requests with a very long URI crashes our envoy service for routes defined using a regex matcher.

We're not sure if it's due to some overflow bug in Envoy's regex parser, but ideally Envoy should not crash because of a long URI.

Repro steps:
Define a route with a match regex like the following:

"match": {
    "regex": "/asdf/.*"
}

and then make a request with a large URI:

val longString = "a" * (50 * 1024)
client.send("GET", "/asdf/{longString}")

We've gotten around it by using a prefix matcher instead, but this appears to be a potential DoS vulnerability if not a security issue.

@dio

This comment has been minimized.

Copy link
Member

commented Jul 26, 2019

cc. @jmarantz

@jmarantz

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2019

Yeah regexes are like that. See the recent fun that happened at Cloudflare:

https://blog.cloudflare.com/cloudflare-outage/

and for more detail in:

https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/

I think you will be a lot happier with prefix-match :)

So I wouldn't be surprised that a regex can cause Envoy to get very slow. I also might expect in some cases the regex library might throw during a match operation (not sure about this), and this path may not be well tested in Envoy. If that's an actual possibility I'd say the action-item for this bug is to repro and test a case where matching throws.

Another course of action is to add RE2 capability to Envoy. RE2 has a more restricted regex language which prevents catastrophic backtracking and also executes the regexes it does accept much more quickly than std::regex.

@PiotrSikora

This comment has been minimized.

Copy link
Member

commented Jul 26, 2019

@jmarantz erm, Cloudflare's regex was much more complex ((?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s|-|~|!|{}|\|\||\+)*.*(?:.*=.*)))) and failed because of backtracking.

Regex in this issue (/asdf/.*) is as trivial as it gets, and it's not acceptable for it to be a DoS vector. We should really move away from std::regex to RE2 or PCRE2 with limited backtracking.

@mattklein123

This comment has been minimized.

Copy link
Member

commented Jul 26, 2019

For this reason I hate that we support regex matching at all. This type of situation is IMO almost unavoidable, though I agree with @PiotrSikora we should do better on this simple case.

@mattklein123 mattklein123 added this to the 1.12.0 milestone Jul 26, 2019

@jmarantz

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2019

Agreed. In any case I'll also note that we've observed that std::regex does memory allocations during matching, so it's possible that this crash could have been an OOM. I'm pretty sure RE2 does not do allocations during matching @yanavlasov might know for sure.

@yanavlasov

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2019

Yes, it is OOM. Is this place too public to give details?

@mattklein123 mattklein123 self-assigned this Jul 28, 2019

@mattklein123

This comment has been minimized.

Copy link
Member

commented Jul 28, 2019

std::regex has known issues and is not safe for general purpose untrusted use. I think the POR here is going to be to add a new "safe_regex" type using https://github.com/google/re2 and deprecate the existing regex matchers in various places. I can take this on.

@lizan

This comment has been minimized.

Copy link
Member

commented Jul 30, 2019

FWIW this is the underlying bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86164

@jplevyak

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2019

I tested with libstdc++ and libc++.

libstdc++ crashes OOM at ~50K string length, and takes ~1000usec / regex_match.
libc++ doesn't OOM but at 50K takes ~9500usecs / regex_match.

So libc++ may not be a great solution for this issue since 1msec of CPU is quite a lot.

Re2 in comparision takes ~15useconds / FullMatch independent of string size.

@mattklein123

This comment has been minimized.

Copy link
Member

commented Jul 31, 2019

@jplevyak nice, thanks for looking at that. FYI I'm working on re2.

@lizan

This comment has been minimized.

Copy link
Member

commented Jul 31, 2019

@jplevyak thanks for the benchmark, we're in same page that adding re2 support (via different field safe_regex to keep the API backward compatibility) is the way to go.

I agree either of std::regex implementation in libstdc++ or libc++ is not great. The point of using libc++ is not because it doesn't OOM nor perform better, it is more because that's what we does in fuzzing test, so fuzzers weren't able to catch issue like this.

mattklein123 added a commit that referenced this issue Aug 8, 2019

introduce safe regex matcher based on re2 engine
The libstdc++ std::regex implementation is not safe in all cases
for user provided input. This change deprecates the used of std::regex
in all user facing paths and introduces a new safe regex matcher with
an explicitly configurable engine, right now limited to Google's re2
regex engine. This is not a drop in replacement for std::regex as all
language features are not supported. As such we will go through a
deprecation period for the old regex engine.

Fixes #7728

Signed-off-by: Matt Klein <mklein@lyft.com>
@mhite

This comment has been minimized.

Copy link

commented Aug 19, 2019

Is it safe to assume that the issue potentially exists in configurations that define virtual_clusters match patterns (and not just route matches)?

@mattklein123

This comment has been minimized.

Copy link
Member

commented Aug 19, 2019

@mhite yes. See #7878 for WIP fix.

@htuch htuch closed this in #7878 Aug 23, 2019

htuch added a commit that referenced this issue Aug 23, 2019

introduce safe regex matcher based on re2 engine (#7878)
The libstdc++ std::regex implementation is not safe in all cases
for user provided input. This change deprecates the used of std::regex
in all user facing paths and introduces a new safe regex matcher with
an explicitly configurable engine, right now limited to Google's re2
regex engine. This is not a drop in replacement for std::regex as all
language features are not supported. As such we will go through a
deprecation period for the old regex engine.

Fixes #7728

Signed-off-by: Matt Klein <mklein@lyft.com>

mattklein123 pushed a commit to envoyproxy/data-plane-api that referenced this issue Aug 23, 2019

data-plane-api(CircleCI)
introduce safe regex matcher based on re2 engine (#7878)
The libstdc++ std::regex implementation is not safe in all cases
for user provided input. This change deprecates the used of std::regex
in all user facing paths and introduces a new safe regex matcher with
an explicitly configurable engine, right now limited to Google's re2
regex engine. This is not a drop in replacement for std::regex as all
language features are not supported. As such we will go through a
deprecation period for the old regex engine.

Fixes envoyproxy/envoy#7728

Signed-off-by: Matt Klein <mklein@lyft.com>

Mirrored from https://github.com/envoyproxy/envoy @ eff020170c6267e6c8dc235473f7fc85c5b1e07d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.