Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable regular expressions in Painless by default #20397

Closed
clintongormley opened this issue Sep 9, 2016 · 19 comments
Closed

Disable regular expressions in Painless by default #20397

clintongormley opened this issue Sep 9, 2016 · 19 comments
Assignees
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement v5.0.0-beta1

Comments

@clintongormley
Copy link

Painless is designed to be fast and safe. The one chink in its armour is Java regular expressions. A pathological regular expression could run in quadratic time or even throw a StackOverflowError.

I think we should disable Java regular expressions by default and add a setting which allows them to be enabled.

Nice to have: support Lucene regular expressions for matching without capturing, as they have a safety mechanism built in.

@s1monw
Copy link
Contributor

s1monw commented Sep 9, 2016

I don't know how complex it would be but maybe we can enable lucene regular expressions by default and only if the user enables it allow complex ones?

@nik9000
Copy link
Member

nik9000 commented Sep 9, 2016

I think we should disable Java regular expressions by default and add a setting which allows them to be enabled.

A node level setting or something sent with the request like we're talking about for assert? If a node level setting then a dynamic one or a static one?

I don't know how complex it would be but maybe we can enable lucene regular expressions by default and only if the user enables it allow complex ones?

Lucene's syntax is not a drop in replacement for java. It'd be nice if we only had a single regex syntax but I can see the allure as I know they are safe.

I'm also happy to investigate any alternative that lets us ship a safe regex engine out of the box. All of this stuff is experimental so it is allowed to change during the 5.x line and

@nik9000
Copy link
Member

nik9000 commented Sep 9, 2016

Whatever we do, I'm happy to take the issue.

@clintongormley
Copy link
Author

A node level setting or something sent with the request like we're talking about for assert? If a node level setting then a dynamic one or a static one?

I think it should be a static node level setting, like the settings which enable/disable scripting.

Lucene's syntax is not a drop in replacement for java. It'd be nice if we only had a single regex syntax but I can see the allure as I know they are safe.

Yeah, the bugger is that the regex syntax is already used by Java rx. I don't know how feasible it would be to have /.../ to mean Lucene rx, but (eg) have java/.../ mean Java rx.

@nik9000
Copy link
Member

nik9000 commented Sep 9, 2016

Yeah, the bugger is that the regex syntax is already used by Java rx. I don't know how feasible it would be to have /.../ to mean Lucene rx, but (eg) have java/.../ mean Java rx.

I think because we've marked it experimental we should allow ourselves to do this if we think it is right. I don't think it'd be impossible to do something like that.

@nik9000 nik9000 self-assigned this Sep 9, 2016
@nik9000
Copy link
Member

nik9000 commented Sep 9, 2016

Ok - I've self assigned this. I'll make the setting soon-ish.

@jdconrad
Copy link
Contributor

jdconrad commented Sep 10, 2016

Just to note, here's the regex that @clintongormley mentioned that was causing the slowness
'xxxxxxxxxxxxxxxx'=~/(x*x*)+y/

@jdconrad
Copy link
Contributor

@rjernst I know you had some ideas about the way options should be implemented for languages... Curious to hear your thoughts since whatever option is added here would likely be how other options are done in the future as well (asserts, loop counter, etc.)

@nik9000
Copy link
Member

nik9000 commented Sep 12, 2016

I think this one has to be a node level non-dynamic setting because it is part of the sandbox protections. I'd love asserts to be a request level setting because it is part of the debugging process. I guess I just think those are different things.

@jdconrad
Copy link
Contributor

jdconrad commented Sep 12, 2016

^^ Agreed these are different. I realize that I have done all of zero settings work so I will defer on this to others that know that code better than I do. I guess there's always the concern that the setting will end up being different on different nodes and crazy things happen?

@nik9000
Copy link
Member

nik9000 commented Sep 12, 2016

I guess there's always the concern that the setting will end up being different on different nodes and crazy things happen?

I think this is a more general problem. It'd be nice if elasticsearch.yaml could contain just the connection information and you could use the API to configure all the settings but this doesn't play nice with settings that we don't want to be changed easily like this one. So far for those settings we've just made them non-dynamic so they must be read from elasticsearch.yaml and lived with the possibility that they differ on different nodes.

@rjernst
Copy link
Member

rjernst commented Sep 12, 2016

I don't view loop counter as something we need to "protect" with security. DOS isn't something we can really protect against (and there are already numerous other ways to do this on an open cluster, for example with nasty prefix queries).

As far as asserts or other debugging features, I don't think these should be flags at all. Instead, they should not be available in scripts for real apis (eg in score scripts etc), but we should have an api similar to _analyze where you can run a script once with some given input (and that input would depend on which context it is expected to be run in, for example, with a score script, you would have to pass in the doc variable essentially). In this api, we can always have debugging features available (eg asserts, logging, whatever).

@nik9000
Copy link
Member

nik9000 commented Sep 12, 2016

DOS isn't something we can really protect against (and there are already numerous other ways to do this on an open cluster, for example with nasty prefix queries).

I think the goal of issues like this one are to do more to protect against it.

@rjernst
Copy link
Member

rjernst commented Sep 12, 2016

I think the goal of issues like this one are to do more to protect against it.

I disagree. I think the goal of issues like this are to not have trappy features that perform poorly.

@nik9000
Copy link
Member

nik9000 commented Sep 12, 2016

I think the goal of issues like this are to not have trappy features that perform poorly.

If that were the goal then we'd expose this setting on the request level or at least make it dynamic. Or maybe remove regex support entirely. Instead @clintongormley wants this static so it can be tightly controlled by an administrator.

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Sep 12, 2016
Adds a new node level, non-dynamic setting, `script.painless.regex.enabled`
can be used to enable regexes.

Closes elastic#20397
@nik9000 nik9000 reopened this Sep 12, 2016
@nik9000
Copy link
Member

nik9000 commented Sep 12, 2016

I've pushed #20427 to master which makes the non-dynamic setting we talk about in the top of the issue. I've reopened this because we're still hashing out if that is exactly what we want. If it turns out to be what we want we can close then. If not then we can implement it on top of the start I've made in #20427.

nik9000 added a commit that referenced this issue Sep 12, 2016
Adds a new node level, non-dynamic setting, `script.painless.regex.enabled`
can be used to enable regexes.

Closes #20397
nik9000 added a commit that referenced this issue Sep 12, 2016
Adds a new node level, non-dynamic setting, `script.painless.regex.enabled`
can be used to enable regexes.

Closes #20397
@dakrone
Copy link
Member

dakrone commented Sep 14, 2016

@nik9000 can we remove the blocker label from this since we've pushed something addressing this?

@nik9000 nik9000 removed the blocker label Sep 14, 2016
@nik9000
Copy link
Member

nik9000 commented Sep 14, 2016

I suspect we should, yeah.

@clintongormley
Copy link
Author

Given that the title of this issue has been dealt with, I'm going to close it. Further discussions can happen on their own tickets.

@clintongormley clintongormley added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache and removed :Plugin Lang Painless labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement v5.0.0-beta1
Projects
None yet
Development

No branches or pull requests

6 participants