Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect regex for floating point numbers #2711

Closed
diesirae68 opened this issue Aug 26, 2023 · 11 comments
Closed

Incorrect regex for floating point numbers #2711

diesirae68 opened this issue Aug 26, 2023 · 11 comments
Labels
bug Documentation contains incorrect information good first issue Good for newcomers Status: Verified

Comments

@diesirae68
Copy link
Contributor

diesirae68 commented Aug 26, 2023

From manual page: https://www.php.net/manual/en/language.types.float.php
Hi! I think the regex for floating point numbers is wrong.

Please, replace

DNUM ([0-9]*(_[0-9]+)*[\.]{LNUM}) | ({LNUM}[\.][0-9]*(_[0-9]+)*)

with

DNUM (([0-9]+(_[0-9]+)*)?\.{LNUM}) | ({LNUM}\.([0-9]+(_[0-9]+)*)?)

The regex specified in the documentation accepts such numbers as correct, but they are incorrect:
_123.456
123._456

And need to fix this article in all languages. Thanx!

@Girgias Girgias added bug Documentation contains incorrect information good first issue Good for newcomers Status: Verified labels Aug 30, 2023
Mayuresh-22 added a commit to Mayuresh-22/doc-en that referenced this issue Sep 2, 2023
@nathansalter
Copy link

I actually think the REGEX can be simplified further to this:

LNUM = [0-9](_?[0-9]+)*
DNUM = [+-]?{LNUM}(\.{LNUM})?
EXPONENT_DNUM = {DNUM}[Ee]{DNUM}

Which might make it a bit easier to read

@Girgias
Copy link
Member

Girgias commented Nov 9, 2023

I actually think the REGEX can be simplified further to this:

LNUM = [0-9](_?[0-9]+)*
DNUM = [+-]?{LNUM}(\.{LNUM})?
EXPONENT_DNUM = {DNUM}[Ee]{DNUM}

Which might make it a bit easier to read

This doesn't seem to accept -.5 as a valid float.

@diesirae68
Copy link
Contributor Author

I actually think the REGEX can be simplified further to this:

LNUM = [0-9](_?[0-9]+)*
DNUM = [+-]?{LNUM}(\.{LNUM})?
EXPONENT_DNUM = {DNUM}[Ee]{DNUM}

Which might make it a bit easier to read

This doesn't seem to accept -.5 as a valid float.

You're right. But [+-]?(({LNUM})?\.{LNUM})|({LNUM}\.({LNUM})?) works correctly (this is a shorter expression than I originally wrote).

You can check this code:

$LNUM = '[0-9]+(_[0-9]+)*';
$DNUM = "[+-]?(({$LNUM})?\.{$LNUM})|({$LNUM}\.({$LNUM})?)";
var_dump(preg_match("/^{$DNUM}$/", '-.5'));

@Girgias
Copy link
Member

Girgias commented Nov 22, 2023

This looks correct, can you make a PR?

@thegallagher
Copy link

Sorry to jump in here at the last minute, but I'd like to offer an alternative opinion on this.

I believe the purpose of this section of the documentation is to describe the structure of a PHP float literal. The description doesn't need to be useful for other purposes (eg. input validation) and it doesn't need to be a valid PCRE regular expression.

The grammar used by the PHP lexer is:

LNUM            [0-9]+(_[0-9]+)*
DNUM            ({LNUM}?"."{LNUM})|({LNUM}"."{LNUM}?)
EXPONENT_DNUM   (({LNUM}|{DNUM})[eE][+-]?{LNUM})

The above grammar or something derived from it should be used to correct the documentation.

While obviously correct, it is unclear:

  • that both DNUM and EXPONENT_DNUM define complete float literals
  • that LNUM defines only part of a float literal
  • what LNUM and DNUM are

To clarify the above grammar and for consistency with the integer documentation, I offer the following alternative:

digits     : [0-9]+(_[0-9]+)*

decimal    : (digits? "." digits)
           | (digits "." digits?)

scientific : (digits | decimal) [eE][+-]? digits

float      : decimal
           | scientific

For me, the above grammar is easier to read and therefore better achieves the purpose I've described for the documentation, but I'd like to hear other thoughts.

@diesirae68
Copy link
Contributor Author

Sorry to jump in here at the last minute, but I'd like to offer an alternative opinion on this.

I believe the purpose of this section of the documentation is to describe the structure of a PHP float literal. The description doesn't need to be useful for other purposes (eg. input validation) and it doesn't need to be a valid PCRE regular expression.

The grammar used by the PHP lexer is:

LNUM            [0-9]+(_[0-9]+)*
DNUM            ({LNUM}?"."{LNUM})|({LNUM}"."{LNUM}?)
EXPONENT_DNUM   (({LNUM}|{DNUM})[eE][+-]?{LNUM})

The above grammar or something derived from it should be used to correct the documentation.

While obviously correct, it is unclear:

  • that both DNUM and EXPONENT_DNUM define complete float literals
  • that LNUM defines only part of a float literal
  • what LNUM and DNUM are

To clarify the above grammar and for consistency with the integer documentation, I offer the following alternative:

digits     : [0-9]+(_[0-9]+)*

decimal    : (digits? "." digits)
           | (digits "." digits?)

scientific : (digits | decimal) [eE][+-]? digits

float      : decimal
           | scientific

For me, the above grammar is easier to read and therefore better achieves the purpose I've described for the documentation, but I'd like to hear other thoughts.

Your description of the floating point number format would be ideal for documentation. I would choose this description.

@Girgias
Copy link
Member

Girgias commented Nov 24, 2023

People, if you have suggestions, you can just send a PR.

@diesirae68
Copy link
Contributor Author

People, if you have suggestions, you can just send a PR.

Done! :)

@diesirae68
Copy link
Contributor Author

Do I need to submit a PR for documentation in other languages?

@Girgias
Copy link
Member

Girgias commented Nov 27, 2023

Do I need to submit a PR for documentation in other languages?

You can, but this change is tracked via our RevCheck for all the languages via: http://doc.php.net

@Girgias
Copy link
Member

Girgias commented Nov 27, 2023

Fixed via #2967

@Girgias Girgias closed this as completed Nov 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Documentation contains incorrect information good first issue Good for newcomers Status: Verified
Projects
None yet
Development

No branches or pull requests

4 participants