On number parsing #86

Open
glennsl opened this issue Mar 4, 2023 · 4 comments

@glennsl
Contributor

glennsl commented Mar 4, 2023

This touches on the discussion in #83, but more broadly covers the current options for parsing numbers in JavaScript. The aim is to inform which semantics we should expose, not just how we should type and organize the JS API.

Currently, I believe the only options for parsing numbers are parseInt/parseFloat, either directly through Float.parseFloat and Float.parseInt, or as the underlying API used in Float.fromString and Int.fromString. While there unfortunately aren't any good options, in my opinion parseInt and parseFloat are the worst of the bad options. For easy comparison and to facilitate efficient discussion, I've outlined the pros and cons of the options I know of below.

Note that I've only covered the differences between the options. Peculiarities common to all of them are that:

  • They accept scientific notation, and this can't be turned off (although parseInt only reads the digits before the e).
  • They signal parse failure with NaN, so a failure can't be distinguished from an actual, valid NaN value.
  • They don't support group separators.
  • They don't provide locale-aware parsing.
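A quick way to see the first two points in practice, comparing parseFloat with unary-plus coercion. This is a minimal sketch; the inputs are arbitrary examples:

```javascript
// Scientific notation is always honored; there is no flag to disable it.
console.log(parseFloat("1e3"), +"1e3"); // 1000 1000
console.log(parseFloat("2.5E-1"), +"2.5E-1"); // 0.25 0.25

// Failure is reported as NaN, which is also an ordinary float value.
console.log(parseFloat("abc"), +"abc"); // NaN NaN

// parseInt only reads integer digits, so it stops at the "e".
console.log(parseInt("1e3")); // 1
```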

parseInt/parseFloat

The core of the problem with these functions is described by the following quote from MDN:

parseFloat() picks the longest substring starting from the beginning that generates a valid number literal. If it encounters an invalid character, it returns the number represented up to that point, ignoring the invalid character and all characters following it.

While this may not seem like a big issue, since it just ignores certain kinds of "mistakes", keep in mind that there isn't just one number format used across the world. These functions will only parse a very simple number format, close to but not entirely the same as JavaScript number literals, and if they encounter anything that doesn't fit that format, they just ignore the rest. For example, if any kind of group separator is used, it and everything that follows will be silently ignored. The same goes for using a decimal separator other than a period.

In short, these invocations will all return 15:

parseInt("15,123");
parseInt("15 123");
parseInt("15 * 3");
parseInt("15px");
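The same prefix rule applies to parseFloat, and with a non-English number format the result is silently wrong rather than rejected. A small sketch; the two formatted strings below are illustrative inputs, not taken from any particular locale API:

```javascript
// German-style formatting: "." groups thousands and "," marks decimals,
// so this string means 1234.56.
const german = "1.234,56";
console.log(parseFloat(german)); // 1.234, stopping silently at ","

// A space used as group separator: this string means 1234.5.
const spaced = "1 234,5";
console.log(parseFloat(spaced)); // 1, stopping silently at the space

// Unary-plus coercion rejects both instead of returning a partial number.
console.log(+german, +spaced); // NaN NaN
```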

Pros

  • Simple functions that are easy to bind to.
  • Zero-cost bindings.
  • Accepts optional radix argument.

Cons

  • Will accept any string that starts with something that can be parsed as a number.
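For reference, this is how the radix argument listed among the pros behaves in standard JavaScript. Note that the prefix rule applies here too: the first digit that is invalid for the radix ends the parse.

```javascript
console.log(parseInt("ff", 16)); // 255
console.log(parseInt("101", 2)); // 5
console.log(parseInt("0x1F")); // 31: without a radix, "0x" selects base 16
console.log(parseInt("0x1F", 10)); // 0: in base 10, parsing stops at "x"
console.log(parseInt("0b101")); // 0: there is no "0b" prefix support
console.log(parseInt("129", 8)); // 10: "9" is not an octal digit, so only "12" is read
```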

Number coercion

There are a number of ways to trigger number coercion on a string, such as applying a numeric operator to it (other than binary +, which concatenates). E.g. if value is a string, +value will return a number.

This improves on parseInt and parseFloat, most notably by rejecting input that isn't wholly parsable, such as all the examples in the section above. It does come with a few extra cons, though most of them can be worked around.
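Running the parseInt examples from the previous section through unary-plus coercion shows the difference: every one of them is rejected with NaN instead of yielding 15.

```javascript
const examples = ["15,123", "15 123", "15 * 3", "15px"];

for (const s of examples) {
  // parseInt keeps the leading "15"; unary + requires the whole string to be numeric.
  console.log(JSON.stringify(s), parseInt(s), +s);
}
// Each line prints the input, then 15, then NaN.

// Other numeric operators coerce too, except binary +, which concatenates.
console.log("15" * 1, "15" - 0); // 15 15
console.log("15" + 1); // "151" (string concatenation)
```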

Pros

  • Only accepts wholly parsable strings.

Cons

  • No explicit radix option, but it does correctly parse numbers prefixed with 0x, 0o or 0b.
  • No int-specific variant. And while it's easy to create a wrapper function that rejects numbers that are not whole, it's not possible to reject strings such as "123.0" (which coerces to the whole number 123) without a pre-processing step.
  • Empty or whitespace-only strings are converted to 0 (this can easily be worked around with a wrapper function, though).
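The workarounds mentioned in the cons above could look roughly like this. This is a minimal sketch: the function names are hypothetical, undefined stands in for "no result", and the regular expression is just one possible pre-processing step that accepts only optionally signed decimal digits.

```javascript
// Hypothetical wrapper: coerce, but treat blank input and NaN as failure.
// Assumes s is a string.
function floatFromString(s) {
  if (s.trim() === "") return undefined; // coercion alone would give 0 here
  const n = +s;
  return Number.isNaN(n) ? undefined : n;
}

// Hypothetical int variant, version 1: reject results that are not whole.
// "123.0" still passes, because it coerces to the whole number 123.
function intFromStringLoose(s) {
  const n = floatFromString(s);
  return n !== undefined && Number.isInteger(n) ? n : undefined;
}

// Version 2: pre-process with a regex so only plain decimal integers pass.
// This also rejects "123.0", "1e3" and "0x10".
function intFromStringStrict(s) {
  return /^\s*[+-]?\d+\s*$/.test(s) ? +s : undefined;
}

console.log(floatFromString("   ")); // undefined
console.log(intFromStringLoose("123.0")); // 123
console.log(intFromStringStrict("123.0")); // undefined
console.log(intFromStringStrict(" -42 ")); // -42
```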

Number constructor

This works very similarly to number coercion, because that is the conversion mechanism it actually uses. But it's slightly slower. And when used as a constructor (with new), it creates a Number object rather than a primitive.
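A short illustration of the primitive/object distinction, and of Number() following the same rules as coercion (prefixed literals accepted, empty string converted to 0):

```javascript
const a = Number("42"); // called as a function: returns a primitive
const b = new Number("42"); // called with new: returns a wrapper object

console.log(typeof a, typeof b); // number object
console.log(a === 42); // true
console.log(b === 42); // false: an object is never === a primitive
console.log(b.valueOf() === 42); // true

// Same conversion rules as coercion.
console.log(Number("0x1F"), Number("0b101"), Number("0o17")); // 31 5 15
console.log(Number("15px"), Number("")); // NaN 0
```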

Pros

  • Same as number coercion
  • Easy to bind to, though it doesn't map as naturally to ReScript

Cons

  • Same as number coercion
  • Slightly slower

@glennsl
Contributor Author

glennsl commented Mar 4, 2023

What I propose, then, is:

  1. Use number coercion instead of parseInt and parseFloat in Int.fromString and Float.fromString.
  2. Discourage the use of Float.parseInt and Float.parseFloat.
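To make proposal 1 concrete, here is a rough JavaScript comparison of a parseFloat-based fromString and a coercion-based one. Both functions are hypothetical stand-ins written for this comparison, not the library's actual implementation, and both use undefined to mean "no result":

```javascript
// Hypothetical stand-in for a parseFloat-based fromString.
function fromStringViaParseFloat(s) {
  const n = parseFloat(s);
  return Number.isNaN(n) ? undefined : n;
}

// Hypothetical stand-in for the proposed coercion-based fromString.
function fromStringViaCoercion(s) {
  if (s.trim() === "") return undefined; // coercion alone would return 0 here
  const n = +s;
  return Number.isNaN(n) ? undefined : n;
}

for (const s of ["2.5", " 2.5 ", "2.5kg", "2,5", "0x10", ""]) {
  console.log(JSON.stringify(s), fromStringViaParseFloat(s), fromStringViaCoercion(s));
}
```

With these stand-ins, "2.5kg" and "2,5" yield 2.5 and 2 from the parseFloat version but undefined from the coercion version, while "0x10" yields 0 from the parseFloat version (it stops at the x) and 16 from the coercion version.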

@aspeddro
Contributor

aspeddro commented Mar 4, 2023

The problem with coercion is that it doesn't give the same results as parseInt. If the function is named parseInt, users will expect it to work like parseInt in JS.

I propose returning an option.

@glennsl
Contributor Author

glennsl commented Mar 5, 2023

I'm not proposing that parseInt and parseFloat should be replaced with similarly named functions that do number coercion. I'm proposing that they're left as-is, precisely because users will expect it to work like in JS, but to actively discourage their use.

Then I'm also proposing that number coercion is used instead of parseInt and parseFloat in Int.fromString and Float.fromString specifically, since they don't have a JS equivalent and can therefore have better semantics.

@glennsl
Contributor Author

glennsl commented Mar 25, 2023

A related bug report in rescript-compiler, which was just closed as stale: rescript-lang/rescript-compiler#3732
