Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoiding the precision loss footgun #15

Closed
harrysarson opened this issue May 24, 2019 · 9 comments
Closed

Avoiding the precision loss footgun #15

harrysarson opened this issue May 24, 2019 · 9 comments

Comments

@harrysarson
Copy link
Contributor

UnitMath is shaping up nicely :) I have a question:

From the README (v0.3)

This means that if your custom type cannot be represented exactly by a floating-point number, you must use the two-argument unit(value, unitString) to construct units:

// Correct
let d = unit(Decimal('3.1415926535897932384626433832795', 'rad'))
let f = unit(Fraction(1, 2), 'kg') 
// Incorrect
let d = unit('3.1415926535897932384626433832795 rad') // Will lose precision
let f = unit('1 / 2 kg') // Parse error

This seems like an unnecessary footgun to introduce and I wonder if it is not possible to provide a better alternative.

@harrysarson
Copy link
Contributor Author

What I propose (largely thinking out loud) is the following (for one-argument constructor).

  • If the first character of the input is a number, split the input on the first whitespace character, give everything before the space to the constructor and parse everything after the space as the unit.
  • If the first character of the input is not a number parse the entire input as a unit.

The means that unit('1 / 2 kg') will still error (trying to parse unit of / 2 kg) but unit('1/2 kg') will work.
It is also now impossible to accidentally lose precision.

Thoughts?

@ericman314
Copy link
Owner

ericman314 commented May 24, 2019

I appreciate you giving this some thought, Harry. Would indeed be nice if that could work, especially for the Decimal.js case. The reason I wrote that message in the readme was that I considered it a good compromise between two extremes. On the one extreme, we could attempt to parse those custom types as you suggest, by passing everything up to the first whitespace to the type.conv function. But without knowing the details of that library, there's no guarantee the library even has the ability to parse a string. (And if it does, what if the custom type itself has whitespace?) On the other extreme, we could disallow the single-string constructor entirely, but I chose not to do that because I would have had to rewrite all the built-in unit definitions, which use the single-string version. The user is already required to provide a function to convert a number to the custom type, so I considered that that was where I would draw the line. UnitMath will convert a IEEE 754 number to your custom type, since that is what it already does with the built-in units, but anything beyond that and you'll have to use the two-argument constructor.

Another possibility is to allow the user to control how the custom type is parsed from a string, thus allowing the single string constructor, by providing a type.parse function. This would accept a string, consume the custom type portion of the string, and return the custom type and the unconsumed portion of the string. But that seems like unnecessary complexity. I feel like if the user has already gone to lengths to extend a custom type, they will be capable of constructing units of that type without losing precision.

One more possibility that might be a better compromise: The line below in Parser.js converts the parsed numeric portion of the string to a number, then to the custom type:

unit.value = options.type.conv(parseFloat(valueStr))

We could skip the parseFloat step when there is a custom type.conv. In the case of Decimal.js at least, parseFloat is unnecessary since Decimal.js can convert strings directly to Decimals. Since the user is technically only required to provide a function to convert IEEE 754 numbers to the custom type, we would need to provide an option to re-enable parseFloat if the user so desires. The parser would still not work with 1/2, but at least it would not unnecessarily throw away extra digits of Decimals.

@ericman314
Copy link
Owner

I've adjusted things so that those lines in Parser.js look like this now:

UnitMath/src/Parser.js

Lines 194 to 198 in 3260233

unit.value = options.type.parse(valueStr)
if (options.type.parse._IS_UNITMATH_DEFAULT_FUNCTION && !options.type.conv._IS_UNITMATH_DEFAULT_FUNCTION) {
// User supplied a conv function but not a parse. Use default parse, THEN use conv.
unit.value = options.type.conv(unit.value)
}

Each of the type. functions does this:

Function Converts
type.parse (default) string -> number
type.parse (user-supplied) string -> custom
type.conv number -> custom

If there is a custom type.parse, it will convert the numeric portion of the input string to the custom type in one step. If there is no custom type.parse, then it converts the input string to a number, and then converts that to the custom type. This means that:

unit('3.1415926535897932384626433832795 rad')

results in no loss of precision when type.parse is provided. Unfortunately, this only works for numbers that are in decimal or scientific notation. For fractions and complex numbers, you still have to use the two-argument constructor, but now both arguments can be strings:

unit('1 / 2', 'kg') 
unit('4 + 3i', 'kVA')

Do you think this is acceptable?

@harrysarson
Copy link
Contributor Author

I think this hits the nail on the head:

The parser would still not work with 1/2, but at least it would not unnecessarily throw away extra digits of Decimals.

Any solution that results in no loss of precision would be great in my book 👍

Do you think this is acceptable?

Yes, with one caveat:

If there is no custom type.parse, then it converts the input string to a number, and then converts that to the custom type.

My opinion is that this should throw an error.


I would go even further and say that simply throwing an error if the one argument constructor is invoked when using a custom type is acceptable. However, the solution you propose is much better than this.

I am arguing that if the constructor throws an error, a user can pull up the unitmath README and realise what they need to change. If the constructor silently loses precision then the user will have hard to track down bugs appearing in their program sometime in the future.

@josdejong
Copy link

Interesting discussion. Would it work to just have a parse option which should be able to parse both string or number input into the custom numeric type? So you don't need a separate parse and conv?

@josdejong
Copy link

And maybe it's just fine not to accept things like unit('1 / 2 m') to be parsed into a fraction? You can always parse via unit(fraction('1/2'), 'm'), and if needed, the user can write it's own wrapper to support this type of input when needed. It's quite an edge case I think.

@ericman314
Copy link
Owner

@harrysarson, I was a little uneasy about that as well. I will add an error as appropriate to ensure that the string --> number --> custom conversion path is never invoked.

Would it work to just have a parse option which should be able to parse both string or number input into the custom numeric type?

@josdejong, possibly we could do that, but I thought it would be good to have two separate functions so that the user doesn't have to do their own type checking if they don't want to. Once Harry's latest suggestion is implemented, conv and parse will be used for totally different purposes in the library, so I think it's good to keep them logically separated as well. But you could use the same function for each. For the two custom types I've looked at, Decimal.js and Fraction.js, both are capable of converting either a string or a number to the custom type. The conv function will always be required, and will throw an error immediately if not supplied, but parse will be optional and will not throw unless you try to invoke a constructor where the value is a string.

So in summary, this is what will happen:

If the user supplies conv and parse:

unit('rad') // Works
unit('3.14159 rad') // Works
unit('3.14159', 'rad') // Works
unit(3.14159, 'rad') // Works
unit(Decimal('3.14159', 'rad') // Works
unit('1 / 2 rad') // Parse error
unit('1 / 2', 'rad') // Works
unit(Fraction('1 / 2'), 'rad') // Works

If the user only supplies conv:

unit('rad') // Works
unit('3.14159 rad') // Parse error
unit('3.14159', 'rad') // Parse error
unit(3.14159, 'rad') // Works
unit(Decimal('3.14159', 'rad') // Works
unit('1 / 2 rad') // Parse error
unit('1 / 2', 'rad') // Parse error
unit(Fraction('1 / 2'), 'rad') // Works

@josdejong
Copy link

I get it. Maybe though in that case it's not needed to embed parse into the library, since the user can wrap the constructor himself too? Like unit(parse('1 / 2 rad')). It would be nice to have just a single "parse" function, that's much easier to understand. If it's needed it's needed of course, but it would be great to keep the API as plain and simple as possible.

@ericman314
Copy link
Owner

After considering all your comments and giving this a lot of thought, I have put together a PR that combines parse and conv, and adds error messages when omitting required type functions. I would appreciate any feedback you have. #17

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants