Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement RegularExpressionQuotingRule #3252

Open
wants to merge 11 commits into
base: 1.11.x
Choose a base branch
from
Open

Conversation

staabm
Copy link
Contributor

@staabm staabm commented Jul 18, 2024

implements the idea described in phpstan/phpstan#11338 (comment)

closes phpstan/phpstan#11338

@staabm
Copy link
Contributor Author

staabm commented Jul 18, 2024

//cc @Seldaek

@staabm
Copy link
Contributor Author

staabm commented Jul 18, 2024

as pointed out by Jordi in phpstan/phpstan#11338 (comment) we could also think about introspecting every string-concatenation which contains a preg_quote function invocation.

that way this rule would also work for composer/pcre and nette/strings automatically.

this could report false-positives though when someone is building a regex out of several independent variables which get concatenated together and also brings the need to check every string concat in the code-base.


alternatively we could invoke the rule on all CallLike (or FuncCall|MethodCall) instead of just FuncCall.
that way we would at least cover calls to e.g. composer/pcre and nette/utils which use the preg_quote call directly in the call, like in

if (Preg::match('{fo'. preg_quote($s) .'+}', $string, $matches))
{ 
  // ...
} 

one last option could be to create a helper service which can be re-used from rules (similar to RegexArrayShapeMatcher - and implement separate new rules which wire the same underlying service.


Ondrej, what do you think about this use-case/implications?

@clxmstaab clxmstaab force-pushed the quote branch 2 times, most recently from d13063a to 18a2fbb Compare July 18, 2024 12:50
@staabm staabm marked this pull request as ready for review July 18, 2024 13:03
@phpstan-bot
Copy link
Collaborator

This pull request has been marked as ready for review.

@Seldaek
Copy link
Contributor

Seldaek commented Jul 18, 2024

this could report false-positives though when someone is building a regex out of several independent variables which get concatenated together and also brings the need to check every string concat in the code-base.

one last option could be to create a helper service which can be re-used from rules (similar to RegexArrayShapeMatcher - and implement separate new rules which wire the same underlying service.

On further consideration, because this is such a rare problem in real life (unless you stick to / for regex delimiters, but that's just a bad idea IMO additionally because we're in a web environment full of slashes that have to be escaped if they become special) I think it's maybe not worth too much complexity/effort/CPU cycles.

Just my 2c though :)

@staabm
Copy link
Contributor Author

staabm commented Jul 18, 2024

but wrong escaping - at best with user provided input - would be a security vulnerability, right?
(in case the end-user can influence the regex pattern, because of wrong escaping)

@Seldaek
Copy link
Contributor

Seldaek commented Jul 18, 2024

You could trigger a crash or in really bad cases maybe exploit something further down if you can break a regex validation or so but that's unsure how.. It's not like the good old days anymore where you had the e/execute modifier to call arbitrary functions anymore. That's gone as of php 7.0

@ondrejmirtes
Copy link
Member

Devil's advocate here: is this really a common mistake we should spend our time on?

Also: it'd be much more powerful to express this in the typesystem (but also a bit more complex). If we were able to carry "this is a preg_quoted string with such and such delimiter" in a subclass of ConstantStringType, we'd be able to check more code - that for example first saves the result of preg_quote in a variable and only then uses it in Concat in an argument to preg_match.

But other than that, I'm not sure even if this PR is worth it. I'd much rather have other existing functions like preg_match_all (shouldn't be too hard) or preg_replace_callback be aware of the awesome array shape inference from regexes :)

@staabm
Copy link
Contributor Author

staabm commented Jul 19, 2024

I don't have a strong opinion on this one. it was easy to implement and sounded useful.

I am ok with closing if you guys don't see value in it

@staabm
Copy link
Contributor Author

staabm commented Jul 19, 2024

But other than that, I'm not sure even if this PR is worth it. I'd much rather have other existing functions like preg_match_all (shouldn't be too hard) or preg_replace_callback be aware of the awesome array shape inference from regexes :)

its not forgotten - I promise ;-D

@staabm
Copy link
Contributor Author

staabm commented Jul 19, 2024

one last data-point, then I am silent

(unless you stick to / for regex delimiters, but that's just a bad idea IMO additionally because we're in a web environment full of slashes that have to be escaped if they become special)

at least on github.com we have ~650k files which use / as delimiter
https://github.com/search?q=%22preg_match%28%27%2F%22+language%3APHP&type=code&l=PHP

@Seldaek
Copy link
Contributor

Seldaek commented Jul 19, 2024

Yeah I know it's extremely common because some languages like JS have native regex types with / delimiters.. But just a terrible idea IMO

@ondrejmirtes
Copy link
Member

@Seldaek Sorry, I'm too slow. How is JS syntax relevant to quoting strings inside regexes?

@Seldaek
Copy link
Contributor

Seldaek commented Jul 19, 2024

I mean in JS you have to define regexes as /foo/, not as strings with any choice of delimiter. Also sed and probably other old tools propagated the use of / as a default delimiter.

That means / is a very common delimiter, but it is not escaped by default by preg_quote, as it is not a metacharacter. That makes it quite common IMO that people do preg_quote and forget to pass '/' as second argument. And if they do that they might experience crashes/problems at runtime if the user content contains /.

@staabm
Copy link
Contributor Author

staabm commented Jul 19, 2024

Should I rebase, or will it be closed?

@ondrejmirtes
Copy link
Member

Alright, I get it, so preg_quote($s) is always wrong because it doesn't even fit the /-delimited regex.

The problem most often isn't using the wrong delimiter '#' . preg_quote($s, '/') . '#' but using no delimiter in the call.

In that case this is definitely a useful addition!

src/Rules/Regexp/RegularExpressionQuotingRule.php Outdated Show resolved Hide resolved
src/Rules/Regexp/RegularExpressionQuotingRule.php Outdated Show resolved Hide resolved
src/Rules/Regexp/RegularExpressionQuotingRule.php Outdated Show resolved Hide resolved
src/Rules/Regexp/RegularExpressionQuotingRule.php Outdated Show resolved Hide resolved
src/Rules/Regexp/RegularExpressionQuotingRule.php Outdated Show resolved Hide resolved
@staabm
Copy link
Contributor Author

staabm commented Jul 20, 2024

btw: the new rule reports 7 real world bugs in the drupal CI :-)

src/Rules/Regexp/RegularExpressionQuotingRule.php Outdated Show resolved Hide resolved
}

if (!isset($normalizedArgs[1])) {
return RuleErrorBuilder::message(sprintf('Call to preg_quote() is missing delimiter %s to be effective.', $patternDelimiter))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't sound really true in context of check for delimiters which get properly escaped by default.

If the delimiter of the regex is one of those characters in the list, then a single-arg call to preg_quote is OK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code was not reached for the case mentioned.

I refactored the code to make it more reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants