More lenient float multipleOf validation #878

@tdamsma tdamsma commented Nov 5, 2021

Many users have run into floating point precision errors when validating multipleOf, e.g. #818, #810, #687, #185, #320, #247. Technically, it can be argued that it is silly to check whether any number is an integer multiple of e.g. 0.1: ask silly questions, get silly answers. This approach is a bit unhelpful though.

Looking at other implementations in JavaScript and Python, other libraries struggle with this too.

Checking if 10.1 is a multiple of 0.1 yields mixed results:

>>> import fastjsonschema
>>> fastjsonschema.validate({"multipleOf": 0.1}, 10.1)
10.1

>>> import jsonschema_rs
>>> jsonschema_rs.JSONSchema.from_str('{"multipleOf": 0.1}').is_valid(10.1)
False
>>> jsonschema_rs.JSONSchema({"multipleOf": 0.1}).is_valid(10.1)

https://jsonschema.dev/ fails validation
https://www.jsonschemavalidator.net/ accepts it

And react-jsonschema-form accepts it.

So there is no real consensus among the libraries. Still, I find the number of issues a strong indication that the current behaviour of this library is unexpected. This PR modifies the multipleOf behaviour to take a float tolerance (epsilon) into account.
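To make the idea concrete, here is a minimal sketch of what a tolerance-based check could look like next to a strict one. It is not the code from this PR; the function names and the `rel_tol` default are assumptions chosen for illustration.

```python
import math


def is_multiple_strict(instance: float, divisor: float) -> bool:
    # Strict float check: the quotient must be exactly a whole number.
    quotient = instance / divisor
    return int(quotient) == quotient


def is_multiple_with_tolerance(instance: float, divisor: float, rel_tol: float = 1e-9) -> bool:
    # Lenient check (hypothetical helper): accept the quotient if it lies
    # within a small tolerance of the nearest integer.
    quotient = instance / divisor
    return math.isclose(quotient, round(quotient), rel_tol=rel_tol, abs_tol=rel_tol)


print(is_multiple_strict(10.1, 0.1))
print(is_multiple_with_tolerance(10.1, 0.1))
print(is_multiple_with_tolerance(10.15, 0.1))
```

The choice of tolerance is the contentious part: any fixed value accepts some inputs that are not exact multiples.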

Member

Julian commented Nov 5, 2021

Hi. Thanks for this but I'm going to politely decline.

As I mentioned in the other issues, if someone wants non-float behavior, there is already a way to get it via Decimals.

Otherwise, this change just moves which floats are incorrect multiples from some to others.
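As a rough illustration of the Decimal route mentioned above, the standard library alone can parse both the schema and the instance into `Decimal` values, after which the remainder is exact. This sketch deliberately avoids calling jsonschema itself; how the library handles mixed float and Decimal inputs is not shown here.

```python
import json
from decimal import Decimal

# parse_float=Decimal makes json.loads build Decimal objects from the
# literal text instead of converting it to binary floats.
schema = json.loads('{"multipleOf": 0.1}', parse_float=Decimal)
instance = json.loads("10.1", parse_float=Decimal)

# With exact decimal arithmetic the remainder is exactly zero.
remainder = instance % schema["multipleOf"]
print(remainder == 0)
```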

If you indeed are interested in this, I'd point you to this PR which claims there is a better algorithm entirely for us to use.

I don't really think that looking at the number of tickets filed is a good metric for surprisability (and I'm not sure I believe surprise is the right metric at all in this case, this library implements a spec, so for better or worse its compliance with the spec is what trumps).

But just looking at issues filed is going to be biased against the non-current behavior -- people who expect this behavior aren't filing tickets saying things worked as expected :)

But do appreciate the PR nonetheless.

@Julian Julian closed this Nov 5, 2021
Author

tdamsma commented Nov 8, 2021

@Julian, I understand and sympathize with your points, but am going to try to convince you otherwise nonetheless.

I see now that you have been involved in discussions on the jsonschema spec itself, without reaching a meaningful course of action. I just see many valid points raised, linked issues, calls for clarification of the spec, other languages struggling with this, etc.

My take is that it is up to each language implementation to interpret this as it sees fit (correct me if I'm wrong here). The spec offers no real guidance here (and whilst I think that should be improved, that is another discussion), so for the Python implementation I think Python's best practices should be the guide.

And that is where it becomes a bit unclear, I guess. At first sight, Python works like this:

>>> 0.1*round(10.1/0.1)
10.100000000000001
>>> 0.1*(10.1/0.1)
10.1

And it seems fair to reason that 10.1 is indeed not an integer multiple of 0.1.

But digging a bit deeper, both python itself and the native json library already do some implicit rounding:

>>> import json
>>> v = json.loads("0.100000000000000001")
>>> print(v)
0.1
>>> print(f"{v:.20g}")
0.10000000000000000555
>>> print(json.dumps(v))
0.1

There are three occurrences of implicit rounding here:

  • the input is rounded to the nearest float.
  • when printing the float, the value is by default represented as 0.1. This behaviour was explicitly introduced in Python: https://docs.python.org/3/tutorial/floatingpoint.html
  • when converting it back into JSON, the number is also represented as 0.1

So the underlying philosophy seems (to a certain degree) to be to treat float(0.1) as exactly 1/10.

So even though the behaviour below is strict/correct/explainable:

>>> 1/10 == 0.1
True
>>> 10.1 / 101 == 0.1
False

This is also the case (validation passes without error):

jsonschema.validate(instance=json.loads("10.00000000000000005"), schema={"multipleOf": 1})

If I really cared, I should have used a Decimal representation:

>>> import simplejson
>>> import jsonschema
>>> jsonschema.validate(instance=simplejson.loads("10.00000000000000005", use_decimal=True), schema={"multipleOf": 1})

Failed validating 'multipleOf' in schema:
    {'multipleOf': 1}

On instance:
    Decimal('10.00000000000000005')

So whilst you have argued that anyone who wants more correct multipleOf behaviour should use Decimal and not float, I would like to argue that it is more Pythonic to silently add some leniency to multipleOf validation (equal to the float representation error), just like Python displays float(0.1) differently from Decimal(0.1). If you really care about exactness here, use Decimal; otherwise do some rounding and get on with it.

> this change just moves which floats are incorrect multiples from some to others.

Not really, it just checks (or should check) if the number is equal to the nearest representable float.

Now there are many different ways to implement this, and perhaps one based on Decimal would feel less arbitrary:

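from decimal import Decimal

instance = 10.1
dB = 0.1
# Convert via repr() so each float becomes its shortest decimal form ('10.1', '0.1')
instance_d = Decimal(repr(instance))
dB_d = Decimal(repr(dB))
quotient = instance_d / dB_d
# Validation fails when the quotient is not a whole number
failed = int(quotient) != quotient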
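Wrapped into a function, the snippet above can be compared directly with a plain float check. The helper name `is_multiple_via_repr` is hypothetical, and this is only a sketch of the idea, not code from the PR or the library.

```python
from decimal import Decimal


def is_multiple_via_repr(instance: float, divisor: float) -> bool:
    # Go through repr() so that 10.1 becomes Decimal('10.1') rather than
    # the exact binary value stored in the float.
    quotient = Decimal(repr(instance)) / Decimal(repr(divisor))
    return quotient == quotient.to_integral_value()


for instance, divisor in [(10.1, 0.1), (0.3, 0.1), (10.15, 0.1)]:
    strict = (instance / divisor).is_integer()
    lenient = is_multiple_via_repr(instance, divisor)
    print(f"{instance} multipleOf {divisor}: float check={strict}, repr-based check={lenient}")
```

The repr-based version accepts 10.1 and 0.3 and still rejects 10.15, which is a genuine non-multiple.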

If the user's intention actually is to use an arbitrary-precision decimal representation, they can do so by using the decimal module explicitly, something they should be doing anyway, as this holds:

>>> Decimal(1) / Decimal(0.1)
Decimal('9.999999999999999444888487687')
>>> 1 / 0.1
10.0

In the current implementation this leads to arbitrary behaviour, where some numbers are multiples of some unrepresentable numbers and others are not.
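One way to see this split is to scan the values k/10 and record which ones a strict float comparison accepts as multiples of 0.1. The check below is an assumed stand-in that mirrors the `int(quotient) != quotient` comparison used earlier in this comment, applied to plain floats.

```python
def strict_float_multiple(instance: float, divisor: float) -> bool:
    # Assumed strict check: the float quotient must be exactly a whole number.
    quotient = instance / divisor
    return int(quotient) == quotient


# k / 10 is the float nearest to the decimal value 0.1, 0.2, ..., 10.0.
passing = [k for k in range(1, 101) if strict_float_multiple(k / 10, 0.1)]
failing = [k for k in range(1, 101) if k not in passing]

print("accepted k (k/10 is a multiple of 0.1):", passing)
print("rejected k:", failing)
```

For example, k=10 (1.0) is accepted while k=3 (0.3) is rejected, even though both are multiples of 0.1 in decimal terms.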

Member

Julian commented Nov 12, 2021

Apologies for not having more time to respond in detail, but there's no implicit rounding in your examples -- what Python does in recent versions is a very targeted, very specific display change -- https://bugs.python.org/issue1580

It reprs floats by using the shortest input that produces the given float. But no rounding is happening, 0.1 isn't a representable float, so you always have gotten the same float, before or after the repr change. And absolutely no tolerance is added -- it's strictly a display change.
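A short sketch of that distinction, using only the standard library: `repr` picks the shortest string that round-trips to the same float, while `Decimal(x)` reveals the exact binary value that was stored all along.

```python
from decimal import Decimal

x = 0.1
y = float("0.100000000000000001")  # a longer literal that maps to the same float

# The shortest string that converts back to exactly this float.
print(repr(x))

# The exact value held by the float; it is not 1/10.
exact = Decimal(x)
print(exact)

# The display is lossless: parsing the repr gives back the identical float.
round_trips = float(repr(x)) == x
print(round_trips, x == y)
```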

There's as far as I know no precedent anywhere in Python for introducing some implicit tolerance when doing float operations unless the user explicitly asks for it, and what jsonschema does matches that -- it does float operations and gives the same answers Python does if you do them yourself:

>>> v = 10.00000000000000005 / 1
>>> print(f"{v:.50g}")
10
