Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Bigint shorthand (123n) for GMP objects #5930

Closed
wants to merge 1 commit into from

Conversation

TysonAndre
Copy link
Contributor

@TysonAndre TysonAndre commented Aug 3, 2020

(i.e. Arbitrary-Precision integers - https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic)

Motivations:

  • https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt
    was a similar approach another dynamically typed language recently took to
    support convenient arbitrary precision.

  • Supporting bigints as anything other than objects in PHP's type system
    seemed from the discussion thread to have several drawbacks:

    • Native bigints by default would cause a B.C. break for extensions or
      php user code relying on float behavior.
    • Decrease in performance
    • Require updating opcache's inferences to support big integers
  • GMP objects already overrides numeric operators.

Implementation: This effectively makes 123_456n
a shorthand for gmp_init('123456')

Related to https://externals.io/message/77863

TODO:

  1. Support LibTomMath or another library backend
    to implement http://php.net/gmp,
    (related to the original RFC PR thread)
    GMP is LGPL or GPL, which looks like it would cause issues with packagers. (similar to readline - https://externals.io/message/104383#104385)

  2. Make GMP always-on and use the C library LibTomMath
    instead of GMP by default instead, unless that is impossible/impractical.
    (It wouldn't make sense to me to have a special syntax for big integers
    that only worked some of the time)

    Disable any PHP functions that do not have equivalents in LibTomMath.

    (or make BigInt be a distinct class from GMP that always uses LibTomMath)

  3. Consider a class_alias to make BigInt an alias of GMP

TODO: Support hexadecimal literals and binary literals,
if there is interest in this.


Drawbacks:

  • Objects cannot be used as array keys("Illegal offset type")
    even if they define __toString(). So $array[$bigint] = value won't work,
    $array[(string)$bigint] would have to be used.

  • Can't be used in constant expressions (such as parameter defaults, class constants, property defaults, etc.)
    (objects are forbidden in the resulting value of constant expressions for all object types)

  • Many developers/users may want arbitrary precision for all integers
    in a future major version and for that to continue working with scalar type
    hints, and that would be incompatible with this proposal.

    However, there may also be objections to changing the default.
    It seems likely that keeping integers as finite precision would be
    useful for opcache and the JIT to continue to efficiently optimize code.

(i.e. Arbitrary-Precision integers)

Motivations:
+ https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt
  was a similar approach another dynamically typed language recently took to
  support convenient arbitrary precision.
+ Supporting bigints as anything other than objects in PHP's type system
  seemed from the discussion thread to have several drawbacks:

  - Native bigints by default would cause a B.C. break for extensions or
    php user code relying on float behavior.
  - Decrease in performance
  - Require updating opcache's inferences to support big integers
+ GMP objects already overrides numeric operators.

Implementation: This effectively makes `123_456n`
a shorthand for `gmp_init('123456')`

Related to https://externals.io/message/77863

TODO:
1. Support LibTomMath or another library backend
   to implement http://php.net/gmp,
   (related to the original RFC PR thread)
   GMP is LGPL, which would cause issues with packagers.
2. Make GMP always-on and use the C library LibTomMath
   instead of GMP by default instead, unless that is impossible/impractical.
   (It wouldn't make sense to me to have a special syntax for big integers
   that only worked some of the time)

   Disable any PHP functions that do not have equivalents in LibTomMath.

   (or make BigInt be a distinct class that always uses LibTomMath)
3. Consider a class_alias to make BigInt an alias of GMP

TODO: Support hexadecimal literals and binary literals,
if there is interest in this.

-------

Drawbacks:
+ Objects cannot be used as array keys("Illegal offset type")
  even if they define `__toString()`
+ Many developers/users may want arbitrary precision for all integers
  in a future major version and for that to continue working with scalar type
  hints, and that would be incompatible with this proposal.

  However, there may also be objections to changing the default.
  It seems likely that keeping integers as finite precision would be
  useful for opcache and the JIT to continue to efficiently optimize code.
@javiereguiluz
Copy link
Contributor

Random comment: I see that 123n notation is used by JavaScript ... but for consistency with our own "special number notation" (0b..., 0x... and 0...) we might consider 0n... for this big number notation.

@Girgias
Copy link
Member

Girgias commented Aug 4, 2020

I'm not sure how to feel about this, it does seem reasonable but we did remove support for hexadecimal strings in PHP 7: https://wiki.php.net/rfc/remove_hex_support_in_numeric_strings or am I misunderstanding the TODO comment?

@TysonAndre
Copy link
Contributor Author

I'm not sure how to feel about this, it does seem reasonable but we did remove support for hexadecimal strings in PHP 7: https://wiki.php.net/rfc/remove_hex_support_in_numeric_strings or am I misunderstanding the TODO comment?

is_numeric('0x123') was true in php 5 and that RFC made it false in php 7. This PR is unrelated to that. The TODO in this PR is just something I didn't bother implementing until getting feedback to see if this was worth pursuing - e.g. 0x10n and 0b1000n would be equivalent to 16n, which would be equivalent to gmp_init('16') but more readable.

Random comment: I see that 123n notation is used by JavaScript ... but for consistency with our own "special number notation" (0b..., 0x... and 0...) we might consider 0n... for this big number notation.

I'd prefer it as a suffix, so that hex, octal, binary, and decimal could be clearly used in a manner users are familiar with, but 0nx10 is definitely possible to implement but harder to read.

zend_bool is_octal = lnum[0] == '0';

/* Digits 8 and 9 are illegal in octal literals. */
if (is_octal) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should throw instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does throw (this is C, not C++), and is based on the adjacent ordinary LNUM parsing. zend_throw_exception(zend_ce_parse_error, "Invalid numeric literal", 0);

For token_get_all, the choice to not throw without the TOKEN_PARSE flag is deliberate - tools such as php-parser couldn't work if token_get_all threw instead of returning the T_ERROR token.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean to not allow octals in bigints at all - reject any input starting with 0 except 0n

Copy link
Contributor Author

@TysonAndre TysonAndre Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the token_get_all() implementation in ext/tokenizer/tokenizer.c clears this exception - I was wondering how it worked for ordinary numbers

		/* Normal token_get_all() should not throw. */
		zend_clear_exception();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean to not allow octals in bigints at all - reject any input starting with 0 except 0n

That makes no sense, the GMP extension accepts Hex, Octal and Binary numbers just fine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But usecases vs. possible mistakes? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd find it more consistent to allow any integer token to have a bigint equivalent by adding the suffix n to it.

Possibly use cases would be cryptography with known hex values, arbitrary-precision math based on octal/hex/binary reference material, etc - e.g. https://en.wikipedia.org/wiki/Mersenne_prime

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants