Skip to content

makes mb_ereg clear the $regs argument on failure #2046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

ju1ius
Copy link
Contributor

@ju1ius ju1ius commented Jul 30, 2016

When mb_ereg failed to match, it didn't update the $regs argument.
Now it will always set it to the empty array.

Fixes bug #72711

When `mb_ereg` failed to match, it didn't update the `$regs` argument.
Now it will always set it to the empty array.

Fixes bug #72711
@@ -49,7 +49,8 @@ With $regs arg:

Warning: mb_ereg(): empty pattern in %s on line %d
bool(false)
NULL
array(0) {
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, that change (yield empty array instead of null) makes sense, but I'm not sure whether it should be part of this bug fix. Sticking with the current behavior might avoid BC breaks.

Copy link
Contributor Author

@ju1ius ju1ius Jul 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmb69 The function did not yield null, it just didn't modify the zval you passed as an argument when the match fails. If it was a string it stays a string, if it was not defined it will be null, etc...

This behavior was not consitent with preg_match, was highly counter-intuitive and looked like a bug to me:

$pattern = '(a)\1';
mb_ereg($pattern, 'aa', $matches);
// $matches = ['aa', 'a']
mb_ereg($pattern, 'WTF?', $matches);
// $matches = ['aa', 'a'] <-- obviously, this ain't right...

preg_match($pattern, 'aa', $matches);
// $matches = ['aa', 'a']
preg_match($pattern, 'WTF?', $matches);
// $matches = [] <-- that's more like it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But of course, I can rebase on another branch if it's too much a BC break.
Just tell me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function did not yield null, it just didn't modify the zval you passed as an argument when the match fails. If it was a string it stays a string, if it was not defined it will be null, etc...

Ah, now I understand! Thanks.

I agree that the behavior of preg_match() makes more sense, but mb_ereg() behaves like ereg(). Changing the behavior of the former, but not that of the latter, would be inconsistent. However, I don't think changing ereg() makes sense, as it has been deprecated as of PHP 7.0.0.

Existing code using (mb_)ereg() falls into two categories: code that does properly respect the current behavior (either by setting $regs to an explicit default before calling (mb_)ereg(), or by checking the return value of (mb_)ereg() before accessing $regs), and code that doesn't. The suggested change would only affect the latter, so one might argue that the BC break is acceptable (we might even fix such broken code).

So, I'm not sure what to do here (except wrt. the docs, which have to be fixed).

What do others think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think changing ereg() makes sense, as it has been deprecated as of PHP 7.0.0.

Actually, it had already been deprecated as of PHP 5.3.0, and removed as of PHP 7.0.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonetheless, the type declaration of the $regs parameter in the function prototype is array, what doesn't match the current behavior. However, the function prototype doesn't match the function's behavior anyway, what can be observed with strict_types=1.

@cmb69 Sorry I don't understand exactly what you mean. Should I close this PR or do you think it is useful ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I close this PR or do you think it is useful ?

I still think it is useful, as I prefer the behavior it introduces (same as preg_match()), and as it would fix the issue with the currently wrong function prototype (array $regs). However, firstly I considered it important to fix the docs (which I considered badly written), and secondly I'm still unsure to which PHP version this PR should be applied to (I would suggest at least master, i.e. PHP 7.2, but maybe even targetting PHP 5.6+ is appropriate).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, firstly I considered it important to fix the docs

Totally agree on that. I was thinking myself of adding an entire chapter dedicated to the Oniguruma pattern syntax and the similarities/differences with PCRE.
Do you think I should start ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think I should start ?

Yes, please! :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please! :-)

@cmb69 Okay I'm on it! 😉

@staabm
Copy link
Contributor

staabm commented Jul 31, 2016

Should mb_ereg be removed/deprecated like ereg was?

@cmb69
Copy link
Member

cmb69 commented Jul 31, 2016

Should mb_ereg be removed/deprecated like ereg was?

I don't think so. mb_ereg() does support several multibyte encodings, but preg_match() only supports UTF-8, so one would have to temporarily convert to UTF-8 for regexps matching.

@ju1ius
Copy link
Contributor Author

ju1ius commented Jul 31, 2016

Should mb_ereg be removed/deprecated like ereg was?

@staabm The naming is indeed infortunate. Combined to the rather spartan documentation, it leads people (as I did) into thinking the mb_ereg* family is a kind of underpowered version of ereg, for when you need to match those japanese characters.
Nothing could be farther from the truth. The regexp engine used by the mbstring extension is Oniguruma, used by the Ruby language, the SublimeText, TextMate, Atom editors among others.
It is just as good as PCRE and supports 30 different encodings, not just UTF-8.
Also PCRE's UTF-8 mode has a severe limitation: it is not memory safe if the string contains invalid UTF-8 sequences. Therefore it has to internally validate the encoding of the entire string before even being able to start matching. Which means using preg_match repeatedly on the same input string in utf-8 mode is an O(n^2) operation. More on the subject in #2035.

:trollface::trollface::trollface::trollface:
So if you're like me and think that, in 2016, having a default regexp engine that's neither multibyte-aware nor able to perform well on UTF-8 is a problem, you could in fact argue that the preg_ functions should be deprecated in favour of mb_ereg.
:trollface::trollface::trollface::trollface:

But that's not directly related to this issue 😉

@php-pulls
Copy link

Comment on behalf of cmb at php.net:

I've merged this PR into PHP 7.1+ now. I'm still not comfortable with changing the behavior for older versions, but 7.1 seems to be fine as it's still in beta phase.

@php-pulls php-pulls closed this Aug 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants