Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Float fields and sanitizer do not handle scientific notation #1502

Closed
MetaTunes opened this issue Jan 7, 2022 · 18 comments
Closed

Float fields and sanitizer do not handle scientific notation #1502

MetaTunes opened this issue Jan 7, 2022 · 18 comments

Comments

@MetaTunes
Copy link

MetaTunes commented Jan 7, 2022

Short description of the issue

$sanitizer->float() does not handle scientific notation even though scientific notation is rendered in InputfieldFloat.

Expected behavior

For example, $sanitzer->float('-1.253E-5') should just return the value unchanged (but as a float , not a string).

Actual behavior

$sanitizer->float('-1.253E-5') returns -1.2535. Entering scientific notation in a float field can therefore result in a grossly inaccurate result. This is particularly an issue since $sanitizer->float(-0.00001253) returns -1.253E-5 so if you enter a number like -0.00001253 into a float field and save it twice, you get -1.2535.

Screenshots/Links that demonstrate the issue

float2

Suggestion for a possible fix

Various regex matches in Sanitizer.php remove any non-numeric characters (lines 3950, 3955, 3957).
For example, the regex in line 3957 - preg_replace('/[^0-9]/', '', substr($str, $pos + 1)); removes any non-numeric character and thus the exponent term 'E-'.
These regexes should leave any terms 'E-', 'E+', 'e-' or 'e+' in place and then line 3960 if(!$options['getString']) $value = floatval($value); should do the required conversion.
However, the string needs to be checked that the exponentiation is at the end (and only once), so the fix is not entirely straighforward.

In any case, there is a short-cut that can be used to check for any internally-produced scientific notation, such as in the
InputfieldFloat rendering. Add, after line 3909, if((string)floatval($value) === $value) $value = floatval($value); or if((string)(float)$value === $value) $value = (float)$value; (or maybe both, to be certain; I'm not quite sure how locales are handled and can't easily test that). That will bypass all the unnecessary regex for simple cases like the example given above. It will not deal with all manually-entered scientific notation, e.g. -1.253E-2, as (float) will convert that to -0.001253 which is not the same string.

Steps to reproduce the issue

  1. Create a new field of type 'float', add it to a template
  2. Create/open a page with that template and enter a number such as -0.00001253 in it
  3. Save the page - the field displays -1.253E-5
  4. Save the page again; the field displays -1.2535.

Setup/Environment

SERVER DETAILS
ProcessWire: 3.0.190
PHP: 8.0.13
Webserver: Apache/2.4.35 (Win64) OpenSSL/1.1.1l
MySQL Server: 5.7.24
MySQL Client: mysqlnd 8.0.13

MODULE DETAILS
AdminPageFieldEditLinks: 3.1.4
ConnectPageFields: 0.3.3
CronjobDatabaseBackup: 1.2.4
FieldtypeCombo: 0.0.7
FieldtypeMeasurement: 0.0.8
FieldtypeRepeaterMatrix: 0.0.8
FieldtypeRuntimeOnly: 0.1.8
FieldtypeTable: 0.2.2
InputfieldCombo: 0.0.7
InputfieldMeasurement: 0.0.7
InputfieldRepeaterMatrix: 0.0.8
InputfieldTable: 0.2.2
PageAutosave: 0.0.6
ProcessDatabaseBackups: 0.0.6
ProcessTracyAdminer: 1.1.3
ProcessWireUpgrade: 0.1.1
ProcessWireUpgradeCheck: 0.0.9
SelectOncePerTable: 0.2.0
TracyDebugger: 4.22.19

ryancramerdesign added a commit to processwire/processwire that referenced this issue Jan 7, 2022
@ryancramerdesign
Copy link
Member

@MetaTunes Thanks, InputfieldFloat isn't intended to use scientific notation, so I have pushed an update to correct it. I've also updated the float sanitizer to support scientific notation so that it can properly sanitize values using it, and optionally convert to non-scientific notation.

@MetaTunes
Copy link
Author

I have tested that. It seems to deal with the negative exponentiation correctly, but not positive exponentiation. Entering a value such as 100000000000000 (1 followed by 14 zeroes) is initially converted to 1.0E+14 and then to 1.014 - an unusual case, but certainly not impossible. In any case, the field (and sanitizer) ought to accept and properly treat any entry such as 1.2E+3 (or 1.2e+3) correctly - it does so now with 1.2E-3 (and 1.2e-3).

ryancramerdesign added a commit to processwire/processwire that referenced this issue Jan 7, 2022
@ryancramerdesign
Copy link
Member

@MetaTunes I've added support for the "E+" now as well. Let me know if you can think of any other examples it should recognize.

@MetaTunes
Copy link
Author

MetaTunes commented Jan 7, 2022

Thanks @ryancramerdesign. That mostly looks pretty good. 1 with 14 zeroes works OK, but 15 or more zeroes (?!) misses out the E+ so 1000000000000000 becomes 115.

Also, ideally, it should accept E (or e) without the +
6.022E+2, 6.022e+2, 6.022e2 and 6.022E2 are all equivalent. When entering lab results, fewer keystrokes is best, so permitting the omission of the + would be helpful (and quite normal).

I'm using this in the context of a new app which uses my FieldtypeMeasurement and has a lot of scientific formulae in it, so hopefully if thaere are any remaining bugs I will find them.

EDIT: Perhaps worth mentioning that both (float) typcasting and floatval() recognise 6.022e2

@ryancramerdesign
Copy link
Member

@MetaTunes I can increase the number of digits, but have to have a hard limit somewhere. What do you recommend the limit be?

I'll add support for "E" without the "E+".

@ryancramerdesign
Copy link
Member

@MetaTunes I think I might need more examples as the one you mentioned seems to work okay here (unless I'm not reading it right)?

$value = 1000000000000000;
echo $value . "\n";
echo $sanitizer->float($value) . "\n";
echo $sanitizer->float($value, [ 'getString' => true ]) . "\n";

Result:

1000000000000000
1.0E+15
1000000000000000

@MetaTunes
Copy link
Author

Hi @ryancramerdesign . You are right that $sanitizer seems ok with the long numbers, but the inputfield renders them wrongly. If you enter 1 followed by 14 or less zeroes, it renders exactly that (not the scientific notation 1E+14, but that is consistent with sanitizer and floatval). 1 followed by 15 zeroes gives 115.
There seem to be a number of other issues with the inputfield:

  • 1E-2 renders 12 (similarly for any n in 1E-n)
  • but 10E-2 renders 0.1 correctly
  • and 1.0E-2 renders 0.01 correctly
  • 10E-14 renders 0.0000000000001 (I would have thought it should keep the scientific notation - $sanitizer correctly gives 1.0e-13)
  • 10E-20 renders 0 ($sanitzer correctly gives 1.0e-19 - it moves to scientific notation from 17 zeroes)
  • 1E+2 renders 12 etc.

I coud go on, but I guess you get the picture - $sanitizer seems to be OK and is consistent with floatval() but the inputfield is not OK.

ryancramerdesign added a commit to processwire/processwire that referenced this issue Jan 11, 2022
@ryancramerdesign
Copy link
Member

@MetaTunes Thanks, I've tried to cover all the cases you mentioned in the latest commit. Please let me know if you find any more. Also note that I had to disable rounding to support some of the cases you mentioned so you'll likely have to do the same. It is an option when editing your float field, you set the precision to a negative number, i.e. -1.

@MetaTunes
Copy link
Author

MetaTunes commented Jan 11, 2022

Hi @ryancramerdesign - I'm not sure what's gone wrong now, but now both $sanitizer and InputfieldFloat seem to be broken
E.g:

  • $sanitizer(-1.253E-5) returns -1.0e-5 (as does the Inputfield with decimals cleared, at least initially)
  • -1.253E-5 in inputfield returns -0 (with any value, including -1, entered as decimals, and also if the decimals is cleared and then save hit twice (as 2 is the default) )
  • -1.253E-5 in a ProTable double or float field returns -1.0E-5, so it's just broken my app again (fortunately only in dev!), plus I don't see a way to change the decimals in those fields (assuming that has the desired effect).

Sorry to be the bearer of bad news! I do wonder if it's all a bit over-complicated. After all, if a user is entering numbers in scientific notation (or if php is rendering them thus, for very large or small floats), they wouldn't normally include other formatting like thousand groups etc, so I don't know what sanitzation is needed beyond (float) or floatval(). I understand that they are not locale-aware, however. There is a possible locale-aware solution in the php manual - which I have illustrated below:

float6

EDIT:
Actually that parseFloat() handles thousands as well. Including

    $floatString = str_replace($LocaleInfo["thousands_sep"] , "", $floatString);
    $floatString = str_replace($LocaleInfo["decimal_point"] , ".", $floatString);

might be better still?

@ryancramerdesign
Copy link
Member

ryancramerdesign commented Jan 11, 2022

@MetaTunes One step at a time. Let's not venture into locale stuff just yet. If the previous examples are working, then that's good news. For your new example, I can tell you where it's coming from, though I don't understand why. Try this:

echo round(-1.253E-5, 5); // outputs: -1.0E-5

Since the $sanitizer->float() uses a round() call, that's where that particular result is coming from. Though I'm not sure what to do about it.

@MetaTunes
Copy link
Author

echo round(-1.253E-5, 5); // outputs: -1.0E-5
True
So, to produce an accurate result, the rounding needs to be the number of digits after the decimal point plus the negative exponent (3 + 5 = 8)
echo round(-1.253E-5, 8); // outputs: -1.253e-5
For consistency, for positive exponents, the exponent can be deducted from the number of decimals input (with a min of 0)

ryancramerdesign added a commit to processwire/processwire that referenced this issue Jan 11, 2022
@ryancramerdesign
Copy link
Member

@MetaTunes Thanks, I've tried to apply that to the latest commit, focusing on just the sanitizer now. And it seems to correct this particular result at least. Please let me know of more numbers you find that aren't working.

@MetaTunes
Copy link
Author

Replace line 3992 with something like:

$d = strlen(ltrim(strstr(stristr("$value", 'E', true), '.'), '.'));
$v = (int) ltrim(stristr("$value", 'E'), 'E+');
if(strpos($str, 'E')) $options['precision'] = max(0, $d - $v);

?

@MetaTunes
Copy link
Author

Thanks a lot @ryancramerdesign . As far as I can see, $sanitizer and the ProTable fields are working OK in my environment. InputfieldFloat still needs fixing though (-1.253E-5 returning 0).
At least, that is the case for locale setlocale(LC_ALL, "English", "en") - $sanitizer->float() returns the same values as parseFloat(). However I'm not sure if sanitizer deals with other locales correctly.

ryancramerdesign added a commit to processwire/processwire that referenced this issue Jan 12, 2022
@ryancramerdesign
Copy link
Member

@MetaTunes I've pushed some additional updates that might correct these issues. After updating, check that in your float field settings (Details tab) that the "number of decimal digits to round to" is set to -1, otherwise a value like -1.253E-5 may very well round to 0. Also check on your "Input" tab that the input type is set to "text" rather than "number". I've noticed that the HTML5 number input won't accept -1.253E-5 as a value for some reason. Whenever I try to use that number in the HTML5 number input the browser tells me "Please enter a valid value" (in Chrome). This is without min, max attributes set, and I also experimented with removing the step attribute, but apparently it just doesn't like -1.253E-5 though strangely it will accept -1.253E-4.

@MetaTunes
Copy link
Author

MetaTunes commented Jan 13, 2022

HI @ryancramerdesign . That all seems pretty good. The only slight wrinkle I can see is that the inputfield rounds to 6 significant digits, regardless of the decimals setting. I presume that is because the SQL schema uses the float type, not double. My app is actually using ProTable for these sort of fields (at the moment) and these do not suffer from the 6 digit restriction because I am using the 'double' type. I assume there was a good reason for using 'float' rather than 'double' in the schema, but it is a bit restrictive (and is not a restriction of the php float type) - I wonder if there could be an option to choose**?
**EDIT - warning of course that switching will truncate or introduce spurious digits!

@ryancramerdesign
Copy link
Member

@MetaTunes It's okay to change it to a double directly in the DB schema if you want to. I have done this on the rare occasion when it was helpful and it works just fine.

@MetaTunes
Copy link
Author

No further issues have arisen. Many thanks @ryancramerdesign

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants