[Proposal] Bigint shorthand (123n) for GMP objects #5930

TysonAndre · 2020-08-03T20:10:42Z

(i.e. Arbitrary-Precision integers - https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic)

Motivations:

+ JavaScript's BigInt
  (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt)
  is a similar approach that another dynamically typed language recently
  took to support convenient arbitrary precision.
+ Supporting bigints as anything other than objects in PHP's type system
  seemed, from the discussion thread, to have several drawbacks:
  - Native bigints by default would cause a BC break for extensions or
    PHP user code relying on float behavior (integer overflow currently
    produces a float).
  - A decrease in performance.
  - opcache's inferences would need to be updated to support big integers.
+ GMP objects already override numeric operators.

Implementation: This effectively makes `123_456n`
a shorthand for `gmp_init('123456')`.
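
As a rough illustration of what the shorthand would stand for, the sketch below uses only the existing GMP API and assumes ext/gmp is loaded. The `123_456n` literal is the proposed syntax and does not parse in current PHP, so it appears only in comments.

```php
<?php
// Assumes the GMP extension (ext/gmp) is enabled.
// Proposed syntax: $a = 123_456n;   Today's equivalent:
$a = gmp_init('123456');
$b = gmp_init('2');

// GMP objects overload arithmetic operators, so the results are GMP objects too.
$sum   = $a + $b;
$power = $b ** 100;   // far beyond PHP_INT_MAX on 64-bit platforms

echo gmp_strval($sum), "\n";    // 123458
echo gmp_strval($power), "\n";  // 1267650600228229401496703205376
```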

Related to https://externals.io/message/77863

TODO:

1. Support LibTomMath or another library backend
   to implement http://php.net/gmp
   (related to the original RFC PR thread).
   GMP is LGPL or GPL, which looks like it would cause issues with packagers (similar to readline - https://externals.io/message/104383#104385).
2. Make GMP always-on and use the C library LibTomMath
   instead of GMP by default, unless that is impossible/impractical.
   (It wouldn't make sense to me to have a special syntax for big integers
   that only worked some of the time.)
   Disable any PHP functions that do not have equivalents in LibTomMath
   (or make BigInt a distinct class from GMP that always uses LibTomMath).
3. Consider a class_alias to make BigInt an alias of GMP.

TODO: Support hexadecimal literals and binary literals,
if there is interest in this.

Drawbacks:

+ Objects cannot be used as array keys ("Illegal offset type"),
  even if they define `__toString()`. So `$array[$bigint] = value` won't work;
  `$array[(string)$bigint]` would have to be used.
+ They can't be used in constant expressions (such as parameter defaults, class constants, property defaults, etc.),
  because objects of any type are forbidden in the resulting value of a constant expression.
+ Many developers/users may want arbitrary precision for all integers
  in a future major version, and for that to continue working with scalar type
  hints; that would be incompatible with this proposal.

  However, there may also be objections to changing the default.
  It seems likely that keeping integers as finite precision would be
  useful for opcache and the JIT to continue to efficiently optimize code.
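
The array-key drawback can be reproduced with today's GMP objects. This is a minimal sketch assuming ext/gmp is loaded; `$big` stands in for a value that a bigint literal would produce, and the variable names are only illustrative.

```php
<?php
// Assumes ext/gmp is enabled.
$big   = gmp_init('12345678901234567890');
$index = [];

$rejected = false;
try {
    $index[$big] = 'value';   // PHP 8 throws a TypeError; PHP 7 only emits a warning
} catch (TypeError $e) {
    $rejected = true;
}

// Workaround from the proposal: cast to string before using it as a key.
$index[(string)$big] = 'value';

var_dump(array_keys($index));
```

The value is deliberately larger than `PHP_INT_MAX`: a numeric string that fits in a native integer would be converted to an integer key, which may or may not be what the caller wants.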


javiereguiluz · 2020-08-04T08:54:14Z

Random comment: I see that 123n notation is used by JavaScript ... but for consistency with our own "special number notation" (0b..., 0x... and 0...) we might consider 0n... for this big number notation.

Girgias · 2020-08-04T10:41:24Z

I'm not sure how to feel about this, it does seem reasonable but we did remove support for hexadecimal strings in PHP 7: https://wiki.php.net/rfc/remove_hex_support_in_numeric_strings or am I misunderstanding the TODO comment?

TysonAndre · 2020-08-05T03:58:43Z

> I'm not sure how to feel about this, it does seem reasonable but we did remove support for hexadecimal strings in PHP 7: https://wiki.php.net/rfc/remove_hex_support_in_numeric_strings or am I misunderstanding the TODO comment?

`is_numeric('0x123')` was true in PHP 5 and that RFC made it false in PHP 7. This PR is unrelated to that. The TODO in this PR is just something I didn't bother implementing until I got feedback on whether this was worth pursuing, e.g. `0x10n` and `0b10000n` would be equivalent to `16n`, which would be equivalent to `gmp_init('16')` but more readable.

> Random comment: I see that 123n notation is used by JavaScript ... but for consistency with our own "special number notation" (0b..., 0x... and 0...) we might consider 0n... for this big number notation.

I'd prefer it as a suffix, so that hex, octal, binary, and decimal could all be written in a manner users are already familiar with. `0nx10` is definitely possible to implement, but it is harder to read.
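
The equivalence between bases described in this comment can be checked with the existing `gmp_init()`, which already recognizes these prefixes when its base argument is left at the default of 0. A small sketch, assuming ext/gmp is loaded; the suffixed literals in the comments are the proposed syntax, not current PHP.

```php
<?php
// Assumes ext/gmp is enabled. With the default base of 0, gmp_init() picks the
// base from the prefix: 0x = hexadecimal, 0b = binary, leading 0 = octal.
$dec = gmp_init('16');       // would be written 16n
$hex = gmp_init('0x10');     // would be written 0x10n
$bin = gmp_init('0b10000');  // would be written 0b10000n

// GMP objects overload comparison operators as well.
var_dump($hex == $dec);  // bool(true)
var_dump($bin == $dec);  // bool(true)
```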

mvorisek · 2020-08-06T18:24:36Z

Zend/zend_language_scanner.l

+	zend_bool is_octal = lnum[0] == '0';
+
+	/* Digits 8 and 9 are illegal in octal literals. */
+	if (is_octal) {


we should throw instead

This does throw (this is C, not C++), and is based on the adjacent ordinary LNUM parsing: `zend_throw_exception(zend_ce_parse_error, "Invalid numeric literal", 0);`

For token_get_all, the choice to not throw without the TOKEN_PARSE flag is deliberate - tools such as php-parser couldn't work if token_get_all threw instead of returning the T_ERROR token.

I mean to not allow octals in bigints at all - reject any input starting with 0 except 0n

And the token_get_all() implementation in ext/tokenizer/tokenizer.c clears this exception - I was wondering how it worked for ordinary numbers

	/* Normal token_get_all() should not throw. */
	zend_clear_exception();
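
The difference between the two tokenizer modes discussed here can be observed from userland with an ordinary (non-bigint) invalid octal literal. This is a small sketch assuming ext/tokenizer is enabled; it only checks whether an exception reaches the caller, not which tokens come back.

```php
<?php
// Assumes ext/tokenizer is enabled. 089 is an invalid octal literal.
$code = '<?php $x = 089;';

// Without TOKEN_PARSE, the exception raised by the scanner is cleared internally.
$lexOnlyThrew = false;
try {
    $tokens = token_get_all($code);
} catch (ParseError $e) {
    $lexOnlyThrew = true;
}

// With TOKEN_PARSE, the parser runs and the ParseError reaches the caller.
$parseThrew = false;
try {
    token_get_all($code, TOKEN_PARSE);
} catch (ParseError $e) {
    $parseThrew = true;
}

var_dump($lexOnlyThrew, $parseThrew);
```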

> I mean to not allow octals in bigints at all - reject any input starting with 0 except 0n

That makes no sense, the GMP extension accepts Hex, Octal and Binary numbers just fine

But usecases vs. possible mistakes? :)

I'd find it more consistent to allow any integer token to have a bigint equivalent by adding the suffix n to it.

Possible use cases would be cryptography with known hex values, arbitrary-precision math based on octal/hex/binary reference material, etc. - e.g. https://en.wikipedia.org/wiki/Mersenne_prime
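
Both sides of this exchange can be illustrated with the existing GMP API. The sketch below assumes ext/gmp is loaded and uses the Mersenne prime 2^127 - 1 as the hex use case, then shows the kind of octal surprise the "possible mistakes" remark refers to.

```php
<?php
// Assumes ext/gmp is enabled.
// Use case: 2^127 - 1 written in hex is a 7 followed by 31 f digits.
$fromHex = gmp_init('0x7' . str_repeat('f', 31));
$fromPow = gmp_pow(2, 127) - 1;
var_dump($fromHex == $fromPow);

// Possible mistake: a leading zero selects octal, so '010' is eight, not ten.
$octal = gmp_init('010');
echo gmp_strval($octal), "\n";   // 8
```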

mvorisek reviewed Aug 6, 2020


cmb69 added the RFC label Aug 21, 2020

TysonAndre closed this Sep 18, 2021

TysonAndre mentioned this pull request Feb 14, 2022

Unify integer range across platforms #8016

Closed
