PHP RFC: More precise float value handling #1455

yohgaki · 2015-08-04T23:20:46Z

PHP RFC: More precise float value handling
https://wiki.php.net/rfc/precise_float_value

Magic number is better to be defined in a .h
Any suggestions?

TODO: Add/modify tests. Add PG(precision) 0 mode. Use PG(serialize_precision) for WDDX/XMLRPC.

How it works:

OLD Behavior:

MacBook-Pro:github-php-src yohgaki$ ./php-bin -d serialize_precision=17 -r '$a=["v"=>123450000000000000000000000000.1234567890123]; var_dump(json_encode($a), serialize($a), var_export($a, true));';
string(28) "{"v":1.2344999999999999e+29}"
string(39) "a:1:{s:1:"v";d:1.2344999999999999E+29;}"
string(42) "array (
  'v' => 1.2344999999999999E+29,
)"

NEW Behavior:

MacBook-Pro:github-php-src yohgaki$ ./php-bin -r '$a=["v"=>123450000000000000000000000000.1234567890123]; var_dump(json_encode($a), serialize($a), var_export($a, true));';
string(16) "{"v":1.2345e+29}"
string(27) "a:1:{s:1:"v";d:1.2345E+29;}"
string(30) "array (
  'v' => 1.2345E+29,
)"

… to be improved. Any suggestion where to define it?

nikic · 2015-08-05T12:56:06Z

ext/json/json_encoder.c

+		php_0cvt(d, 17, '.', 'e', num);
+	} else {
+		php_gcvt(d, JSON_G(precision), '.', 'e', num);
+	}


Instead of checking this everywhere, maybe just support ndigits=-1 in php_gcvt? This should allow us to directly use this mode in printf etc by passing -1 (or EG(serialize_precision)).

yeah definitely agree with that. it would be much better to handle that in php_gcvt. there is no need for ndigits (17 in php_0cvt) for mode 0 because it's not used as you can see here:

php-src/Zend/zend_strtod.c

Line 3940 in f6d196a

ndigits = 0;

so everything (checking -1) can be done in php_gcvt ... :)

OK. Sounds very reasonable. I'll change this.

Any comments fixed size buffer max definition? I just followed spprintf()'s code, but there are number of codes that calculate buffer size. I would like to use fixed size buffer for better performance.

Actually I was wrong and didn't look to php_gcvt properly (I just check zend_dtoa). It uses ndigits for selecting if the number should be printed in E-style so it's important:

php-src/main/snprintf.c

Line 163 in f6d196a

if ((decpt >= 0 && decpt > ndigit) || decpt < -3) { /* use E-style */

I'm not sure that we should ignore that and also not sure if (-1) is a good way how to change mode because it means disallowing precision setting (selecting when the E-style is used). Maybe something like ini "serialization_precision_mode" would be better but it's maybe too many ini settings...

Good point, looks like dtoa doesn't directly deal with stuff like scientific notation. Given that the dtoa 0 mode (or precision -1 mode) is supposed to return an accurate result, I don't think it's necessary to have separate ini settings for this -- we should simply automatically choose an appropriate ndigits for these checks if it was -1 originally. IIRC ndigits=17 is enough for an accurace result and I don't think this will produce overlong results, but not sure on that count. This will need some testing...

weltling · 2015-08-10T09:32:14Z

I'd have two comments

not a hardcoded value, but a constant should be used for default precision. Fe see https://msdn.microsoft.com/en-us/library/6bs3y5ya.aspx DBL_DIG
to round up the title, the comparison of doubles using DBL_EPSILON should be implemented

Thanks.

bukka · 2015-08-12T18:36:48Z

I have been looking to it in the last week or so and done few tests. I created a repo for that testing the original dtoa and added few results to README:

https://github.com/bukka/dtoa

I think that due to IEEE 754, it shouldn't go over 17 on mode 0. You can also see that from few tests but it’s logical as double can’t contain more valid digits which is picked by the rounding algorithm. It doesn't actually matter too much as this is just for setting where we print exponential notation. It probably makes sense to use 17 to match the logic that is applied on mode 2 when 17 is used for serialization.

@weltling
The DBL_DIG would have to be incremented because it is rounded down and the result is 15 on most platforms. However 16 digits can be safely used. The thing is that you have good chance for 17 as well as you can see in the few tests. But all of that doesn’t matter here because it is completely ignored by mode 0. The only point here is where to choose exponential notation and when fixed-point notation.

About DBL_EPSILON, I'm not sure what you try to suggest? For mode 0, all rounding is handled by dtoa so we don't have to care about that at all. The algorithm is nicely explained in ‘‘How to Print Floating-Point Numbers Accurately,’’ from G. L. Steele and J. L. Or did I miss anything?

weltling · 2015-08-12T22:31:08Z

@bukka

However 16 digits can be safely used. The thing is that you have good chance for 17 as well as you can see in the few tests.

There are hard doubts on this statement. The behavior is tightly bound to the concrete hardware. Plus, some aspects can differ depending on whether 64- or 32-bit mode is running.

About DBL_EPSILON, I'm not sure what you try to suggest?

The RFC claims to implement "more precise float value handling overall". Thus i was not about dtoa, but about the direct comparison.

double a = .0000023452345234581702983740;
int is_zero = a < DBL_EPSILON;

Thanks.

bukka · 2015-08-13T19:46:27Z

@weltling

There are hard doubts on this statement. The behavior is tightly bound to the concrete hardware.
Plus, some aspects can differ depending on whether 64- or 32-bit mode is running.

What I meant by "most platforms" is of course all platforms conforming IEEE 754 (having binary64) which I believe are the most platforms. That should be exactly the same on 32-bit. Even wiki says the same :) https://en.wikipedia.org/wiki/Double-precision_floating-point_format

Anyway as I said it's irrelevant for mode 0 as dtoa handles all of that for us. The precision is just used in php_gcvt to decide whether to use an exponential or fixed-point notation.

The RFC claims to implement "more precise float value handling overall". Thus i was not about dtoa, but about the direct comparison.

It is actually just about using different mode or default precision passed to dtoa. It's in the proposal section ;) Maybe the first sentence of the RFC should be changed though... :)

weltling · 2015-08-13T22:29:24Z

@bukka

Ok, so if it's not improvement "overall", then i was wondering for nothing why the epsilon part is missing :)

But with DBL_DIG - yes, i was referring to your exact statement which was insufficient. Still having hardcoded 17 where it should be like zero or precisely DBL_DIG is confusing.

Btw. _php_cvt() is unlikely to be inlined, but probably a justified sacrifice for not breaking ABI.

Thanks.

…preciated

nikic · 2016-02-24T12:53:00Z

main/spprintf.c

-						if (precision < 0)
-							precision = 0;
+						if (precision < -1)
+							precision = -1;


I don't think it's quite as simple as that. In particular precision is used for more than just floating point numbers. E.g. if you do something like ("%.*s", -1, str) it would try to use -1 as the string length. I think it would be best to drop the validation here and instead perform for the individual format specifiers.

Furthermore this is just spprintf -- we have similar code in snprintf as well and maybe a few other places.

Thin this was removed already before by Yasuo

php-pulls · 2016-06-26T13:51:06Z

Comment on behalf of bukka at php.net:

Superseded by #1956

yohgaki added 2 commits August 5, 2015 08:12

Initial patch for 0 mode float conversion. The magic number is better…

71ef7c1

… to be improved. Any suggestion where to define it?

Add JSON_G(precision)

08ac858

nikic reviewed Aug 5, 2015
View reviewed changes

laruence added the RFC label Aug 6, 2015

yohgaki and others added 9 commits August 30, 2015 17:13

Simply use ndigit for flag for zend_dtoa mode

0e32bc7

Enable 0 mode for echo/print

75eb88c

Add cast

64cc35b

Fix mode when precision=0. Add test

7fc62f5

Remove unneeded output from test

9e61d2f

Remove unneeded WS change

01e59c6

Add better value to test

d713c7b

Merge branch 'master' into master-precise-float

3af3348

Avoid magic number. NUM_BUF_SIZE may be in header. Suggestions are ap…

baf3207

…preciated

nikic reviewed Feb 24, 2016
View reviewed changes

bukka mentioned this pull request Jun 26, 2016

RFC: Precise float values handling #1956

Merged

php-pulls closed this Jun 26, 2016

bukka mentioned this pull request Feb 2, 2023

Bring minimum precision inline with spprintf #10475

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

PHP RFC: More precise float value handling #1455

PHP RFC: More precise float value handling #1455

Uh oh!

yohgaki commented Aug 4, 2015

Uh oh!

nikic Aug 5, 2015

Uh oh!

bukka Aug 5, 2015

Uh oh!

yohgaki Aug 6, 2015

Uh oh!

bukka Aug 6, 2015

Uh oh!

nikic Aug 6, 2015

Uh oh!

weltling commented Aug 10, 2015

Uh oh!

bukka commented Aug 12, 2015

Uh oh!

weltling commented Aug 12, 2015

Uh oh!

bukka commented Aug 13, 2015

Uh oh!

weltling commented Aug 13, 2015

Uh oh!

nikic Feb 24, 2016

Uh oh!

bukka Jun 26, 2016

Uh oh!

php-pulls commented Jun 26, 2016

Uh oh!

Uh oh!

PHP RFC: More precise float value handling #1455

PHP RFC: More precise float value handling #1455

Uh oh!

Conversation

yohgaki commented Aug 4, 2015

Uh oh!

nikic Aug 5, 2015

Choose a reason for hiding this comment

Uh oh!

bukka Aug 5, 2015

Choose a reason for hiding this comment

Uh oh!

yohgaki Aug 6, 2015

Choose a reason for hiding this comment

Uh oh!

bukka Aug 6, 2015

Choose a reason for hiding this comment

Uh oh!

nikic Aug 6, 2015

Choose a reason for hiding this comment

Uh oh!

weltling commented Aug 10, 2015

Uh oh!

bukka commented Aug 12, 2015

Uh oh!

weltling commented Aug 12, 2015

Uh oh!

bukka commented Aug 13, 2015

Uh oh!

weltling commented Aug 13, 2015

Uh oh!

nikic Feb 24, 2016

Choose a reason for hiding this comment

Uh oh!

bukka Jun 26, 2016

Choose a reason for hiding this comment

Uh oh!

php-pulls commented Jun 26, 2016

Uh oh!

Uh oh!