Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP RFC: More precise float value handling #1455

Closed
wants to merge 11 commits into from

Conversation

6 participants
@yohgaki
Copy link
Contributor

yohgaki commented Aug 4, 2015

PHP RFC: More precise float value handling
https://wiki.php.net/rfc/precise_float_value

Magic number is better to be defined in a .h
Any suggestions?

TODO: Add/modify tests. Add PG(precision) 0 mode. Use PG(serialize_precision) for WDDX/XMLRPC.

How it works:

OLD Behavior:

MacBook-Pro:github-php-src yohgaki$ ./php-bin -d serialize_precision=17 -r '$a=["v"=>123450000000000000000000000000.1234567890123]; var_dump(json_encode($a), serialize($a), var_export($a, true));';
string(28) "{"v":1.2344999999999999e+29}"
string(39) "a:1:{s:1:"v";d:1.2344999999999999E+29;}"
string(42) "array (
  'v' => 1.2344999999999999E+29,
)"

NEW Behavior:

MacBook-Pro:github-php-src yohgaki$ ./php-bin -r '$a=["v"=>123450000000000000000000000000.1234567890123]; var_dump(json_encode($a), serialize($a), var_export($a, true));';
string(16) "{"v":1.2345e+29}"
string(27) "a:1:{s:1:"v";d:1.2345E+29;}"
string(30) "array (
  'v' => 1.2345E+29,
)"

yohgaki added some commits Aug 4, 2015

Initial patch for 0 mode float conversion. The magic number is better…
… to be improved. Any suggestion where to define it?
php_0cvt(d, 17, '.', 'e', num);
} else {
php_gcvt(d, JSON_G(precision), '.', 'e', num);
}

This comment has been minimized.

Copy link
@nikic

nikic Aug 5, 2015

Member

Instead of checking this everywhere, maybe just support ndigits=-1 in php_gcvt? This should allow us to directly use this mode in printf etc by passing -1 (or EG(serialize_precision)).

This comment has been minimized.

Copy link
@bukka

bukka Aug 5, 2015

Member

yeah definitely agree with that. it would be much better to handle that in php_gcvt. there is no need for ndigits (17 in php_0cvt) for mode 0 because it's not used as you can see here:

ndigits = 0;

so everything (checking -1) can be done in php_gcvt ... :)

This comment has been minimized.

Copy link
@yohgaki

yohgaki Aug 6, 2015

Author Contributor

OK. Sounds very reasonable. I'll change this.

Any comments fixed size buffer max definition? I just followed spprintf()'s code, but there are number of codes that calculate buffer size. I would like to use fixed size buffer for better performance.

This comment has been minimized.

Copy link
@bukka

bukka Aug 6, 2015

Member

Actually I was wrong and didn't look to php_gcvt properly (I just check zend_dtoa). It uses ndigits for selecting if the number should be printed in E-style so it's important:

if ((decpt >= 0 && decpt > ndigit) || decpt < -3) { /* use E-style */

I'm not sure that we should ignore that and also not sure if (-1) is a good way how to change mode because it means disallowing precision setting (selecting when the E-style is used). Maybe something like ini "serialization_precision_mode" would be better but it's maybe too many ini settings...

This comment has been minimized.

Copy link
@nikic

nikic Aug 6, 2015

Member

Good point, looks like dtoa doesn't directly deal with stuff like scientific notation. Given that the dtoa 0 mode (or precision -1 mode) is supposed to return an accurate result, I don't think it's necessary to have separate ini settings for this -- we should simply automatically choose an appropriate ndigits for these checks if it was -1 originally. IIRC ndigits=17 is enough for an accurace result and I don't think this will produce overlong results, but not sure on that count. This will need some testing...

*/
PHP_INI_BEGIN()
STD_PHP_INI_ENTRY("json.precision", "-1", PHP_INI_ALL, OnSetJsonPrecision, precision, zend_json_globals, json_globals)
PHP_INI_END()

This comment has been minimized.

Copy link
@nikic

nikic Aug 5, 2015

Member

Why does JSON have its own precision setting? Please use serialize_precision for this as well.

This comment has been minimized.

Copy link
@bukka

bukka Aug 5, 2015

Member

I'd actually prefer json precision as it's more flexible. the rfc also contains voting option for that so think it's fine to leave it (it can be removed if it's rejected though).

This comment has been minimized.

Copy link
@nikic

nikic Feb 24, 2016

Member

Sure it's more flexible, but imho we should have very strong reasons for introducing new ini settings. More than just "it might be useful to someone ... maybe". Do you have some particular use-case in mind where it would be important to print inaccurate JSON output?

If there are indeed cases where controlling the precision of JSON floats is of paramount importance, it might be preferable to add this as an option to json_encode rather than having a global setting for it.

This comment has been minimized.

Copy link
@bukka

bukka Jun 26, 2016

Member

Dropped in #1956

@laruence laruence added the RFC label Aug 6, 2015

@weltling

This comment has been minimized.

Copy link
Contributor

weltling commented Aug 10, 2015

I'd have two comments

Thanks.

@bukka

This comment has been minimized.

Copy link
Member

bukka commented Aug 12, 2015

I have been looking to it in the last week or so and done few tests. I created a repo for that testing the original dtoa and added few results to README:

https://github.com/bukka/dtoa

I think that due to IEEE 754, it shouldn't go over 17 on mode 0. You can also see that from few tests but it’s logical as double can’t contain more valid digits which is picked by the rounding algorithm. It doesn't actually matter too much as this is just for setting where we print exponential notation. It probably makes sense to use 17 to match the logic that is applied on mode 2 when 17 is used for serialization.

@weltling
The DBL_DIG would have to be incremented because it is rounded down and the result is 15 on most platforms. However 16 digits can be safely used. The thing is that you have good chance for 17 as well as you can see in the few tests. But all of that doesn’t matter here because it is completely ignored by mode 0. The only point here is where to choose exponential notation and when fixed-point notation.

About DBL_EPSILON, I'm not sure what you try to suggest? For mode 0, all rounding is handled by dtoa so we don't have to care about that at all. The algorithm is nicely explained in ‘‘How to Print Floating-Point Numbers Accurately,’’ from G. L. Steele and J. L. Or did I miss anything?

@weltling

This comment has been minimized.

Copy link
Contributor

weltling commented Aug 12, 2015

@bukka

However 16 digits can be safely used. The thing is that you have good chance for 17 as well as you can see in the few tests.

There are hard doubts on this statement. The behavior is tightly bound to the concrete hardware. Plus, some aspects can differ depending on whether 64- or 32-bit mode is running.

About DBL_EPSILON, I'm not sure what you try to suggest?

The RFC claims to implement "more precise float value handling overall". Thus i was not about dtoa, but about the direct comparison.

double a = .0000023452345234581702983740;
int is_zero = a < DBL_EPSILON;

Thanks.

@bukka

This comment has been minimized.

Copy link
Member

bukka commented Aug 13, 2015

@weltling

There are hard doubts on this statement. The behavior is tightly bound to the concrete hardware.
Plus, some aspects can differ depending on whether 64- or 32-bit mode is running.

What I meant by "most platforms" is of course all platforms conforming IEEE 754 (having binary64) which I believe are the most platforms. That should be exactly the same on 32-bit. Even wiki says the same :) https://en.wikipedia.org/wiki/Double-precision_floating-point_format

Anyway as I said it's irrelevant for mode 0 as dtoa handles all of that for us. The precision is just used in php_gcvt to decide whether to use an exponential or fixed-point notation.

The RFC claims to implement "more precise float value handling overall". Thus i was not about dtoa, but about the direct comparison.

It is actually just about using different mode or default precision passed to dtoa. It's in the proposal section ;) Maybe the first sentence of the RFC should be changed though... :)

@weltling

This comment has been minimized.

Copy link
Contributor

weltling commented Aug 13, 2015

@bukka

Ok, so if it's not improvement "overall", then i was wondering for nothing why the epsilon part is missing :)

But with DBL_DIG - yes, i was referring to your exact statement which was insufficient. Still having hardcoded 17 where it should be like zero or precisely DBL_DIG is confusing.

Btw. _php_cvt() is unlikely to be inlined, but probably a justified sacrifice for not breaking ABI.

Thanks.

if (precision < 0)
precision = 0;
if (precision < -1)
precision = -1;

This comment has been minimized.

Copy link
@nikic

nikic Feb 24, 2016

Member

I don't think it's quite as simple as that. In particular precision is used for more than just floating point numbers. E.g. if you do something like ("%.*s", -1, str) it would try to use -1 as the string length. I think it would be best to drop the validation here and instead perform for the individual format specifiers.

Furthermore this is just spprintf -- we have similar code in snprintf as well and maybe a few other places.

This comment has been minimized.

Copy link
@bukka

bukka Jun 26, 2016

Member

Thin this was removed already before by Yasuo

if (mode == 0) {
ndigit = 17;
}
digits = zend_dtoa(value, mode, ndigit, &decpt, &sign, NULL);

This comment has been minimized.

Copy link
@nikic

nikic Feb 24, 2016

Member

Not everything goes though php_gcvt. Did you check how precision=-1 will work for php_fcvt/php_ecvt (and how it should work)? I think the RFC should also be more clear on how this change interacts with both userland and internal printf functions.

This comment has been minimized.

Copy link
@bukka

bukka Jun 26, 2016

Member

I have been checking that and haven't found any case that would be using EG(precision) and either 'f', 'F', 'e' or 'E' format which is the only place that calls php_conv_fp - the func that calls php_fcvt/php_ecvt. Of course I might have missed it so pls let me know if you are aware of any such case.

@php-pulls

This comment has been minimized.

Copy link

php-pulls commented Jun 26, 2016

Comment on behalf of bukka at php.net:

Superseded by #1956

@php-pulls php-pulls closed this Jun 26, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.