Inconsistent results for min() and max() with math.nan as argument #88536

joellarose · 2021-06-10T01:54:16Z

BPO	44370
Nosy	@rhettinger, @mdickinson, @stevendaprano, @serhiy-storchaka, @sweeneyde

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-06-10.01:54:15.874>
labels = ['interpreter-core', 'type-bug', '3.9']
title = 'Inconsistent results for min() and max() with math.nan as argument'
updated_at = <Date 2021-06-13.09:23:18.604>
user = 'https://bugs.python.org/joellarose'

bugs.python.org fields:

activity = <Date 2021-06-13.09:23:18.604>
actor = 'rhettinger'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2021-06-10.01:54:15.874>
creator = 'joel.larose'
dependencies = []
files = []
hgrepos = []
issue_num = 44370
keywords = []
message_count = 14.0
messages = ['395497', '395498', '395501', '395506', '395507', '395509', '395521', '395561', '395576', '395621', '395623', '395624', '395736', '395737']
nosy_count = 6.0
nosy_names = ['rhettinger', 'mark.dickinson', 'steven.daprano', 'serhiy.storchaka', 'Dennis Sweeney', 'joel.larose']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue44370'
versions = ['Python 3.9']

joellarose · 2021-06-10T01:54:16Z

If math.nan is the first argument for either max() or min(), the result is always nan, regardless of the other values in the results. However, if nan is in any other position in the arguments list, the result is always what you would expect if nan was not in the arguments list.

E.g.
from math import nan, inf

>>> max(nan, 5, 3, 0)
nan

>>> min(nan, 5, 3, 0)
nan

>>> min(nan, 5, 3, 0, inf, -inf)
nan

>>> max(nan, 5, 3, 0, inf, -inf)
nan

>>> max(inf, 5, 3, 0, nan, -inf)
inf

>>> max(8, inf, 5, 3, 0, nan, -inf)
inf

>>> min(8, inf, 5, 3, 0, nan, -inf)
-inf

>>> min(8, nan, inf, 5, 3, 0, nan, -inf)
-inf

>>> max(8, nan, inf, 5, 3, 0, nan, -inf)
inf

>>> max(8, 5, 3, 0, nan)
8

As you can see, the results for min() and max() are inconsistent for nan depending on whether it's the first argument or not.

I would prefer for min() and max() to ignore nan unless all arguments are nan. However, the other path for consistent behaviour is to return nan if any of the arguments are nan.

joellarose · 2021-06-10T01:57:36Z

Forgot to mention the environment:
Python version: 3.9.0
OS: Windows 10

I have not tested this on any other version of python.

joellarose · 2021-06-10T02:13:24Z

The same problem occurs if the argument is a list. The same inconsistency happens depending on the position in the list that nan happens to be.

>>> max([5, nan, 3, 0, 8, -10])
8

>>> min([5, nan, 3, 0, 8, -10])
-10

>>> min([nan, 5, 3, 0, 8, -10])
nan

>>> max([nan, 5, 3, 0, 8, -10])
nan

Passing a tuple with the same values produces the same inconsistency.

For the examples above, replacing the lists with sets with the same values (i.e. replace [] with {}) always results in nan. This may have to do with the hash value of nan always making the first value in iteration be nan given the sample space.

sweeneyde · 2021-06-10T06:33:24Z

This is essentially the same issue, but with sorted(): https://bugs.python.org/issue36095

The trouble is that the implementation of min() is roughly equivalent to:

iterable = iter(iterable)
current_min = next(iterable)
for x in iterable:
    if x < current_min:
        current_min = x
return current_min

NaN < number and number < NaN both return false. Thus including NaNs make floats not totally ordered, so there's no perfect way to sort them or take a max/min. This leads to some unexpected result, as you've identified. So you could say that the real "bug" is in floats more broadly.

This behavior was originally implemented to comply with IEEE 754. Even if that was a mistake, it would be very difficult to change this behavior in case someone's code relies on exactly using this implantation. I also don't like the idea of special-casing the behavior of min/max for floats, since right now the behavior is completely agnostic of the types of the items.

sweeneyde · 2021-06-10T06:35:09Z

implantation --> implementation

mdickinson · 2021-06-10T06:55:00Z

See also bpo-11986, which this report is essentially a duplicate of.

I think it may be time to implement math.ieee754_total_order (after suitable bikeshedding about the name), so that one can do

   sorted(my_list_of_floats, key=math.ieee754_total_order)

and

   max(my_list_of_floats, key=math.ieee754_total_order)

stevendaprano · 2021-06-10T11:45:15Z

math.ieee754_total_order sounds good to me, but how would you get the minimum?

Wikipedia doesn't have much on the 2019 revision to IEEE-754 except to say that the min/max rules have been overhauled again, but without giving much detail.

https://en.wikipedia.org/wiki/IEEE_754

Also relevant:

https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf

serhiy-storchaka · 2021-06-10T18:29:47Z

The idea of math.ieee754_total_order looks interesting, but how would it work with min/max?

In https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf there is a comparison of several standards and implementations of symmetric and associative min/max. There are two options: propagate NaN and ignore it (treat it as missing data). It differs between different standards and implementations but in particular standard or implementation the same rule is used for min and max.

If ieee754_total_order(NAN) < ieee754_total_order(1), then min(1, NAN) -> NAN ("propagate") and max(1, NAN) -> 1 ("missing data").
If ieee754_total_order(NAN) > ieee754_total_order(1), then min(1, NAN) -> 1 ("missing data") and max(1, NAN) -> NAN ("propagate").

rhettinger · 2021-06-10T21:37:21Z

We should consider biting the bullet and revising the default NaN sort order. It has been a perpetual irritant.

Have NaNs always compare to less than any other float value.
When comparing two distinct NaNs, use the NaN payload
and fall back to object id if necessary.
When comparing two identical NaNs, always return False.

That preserves the IEEE-754 guarantee that NaNs never compare equal to one another -- which is one of the few properties that people rely on.

IEEE-754 doesn't concern itself at all with the notion of object identity, so giving two NaNs a deterministic but unequal ordering isn't at odds with the standard.

IMHO, making min(), max(), and sorted() both deterministic and commutative is a big enough win to warrant foregoing one of the less important properties of NaN. User care about determinism and commutativity more than they care that float('NaN') > 10 and float('NaN') < 10 both return False.

“Logic clearly dictates that the needs of the many outweigh the needs of the few.” -- Spock, The Wrath of Khan ;-)

Python is practical language, so we should consider making a choice based on pragmatism rather than strict standard compliance. Bug reports like this are the tip of the iceberg. Given the prevalence of NaNs being used as a placeholder for missing data, it is likely that people struggle with this every day.

The original design of NaNs was to let invalid intermediate results flow through a calculation and not require error testing at every step. But in practice, NaNs are not commonly used this way. Mainly, they are used as the float equivalent of None in an array of floats. We should support this reality. IMO it is a small disaster that NaNs completely wreck sorting and do so silently. From user POV, an arcane property of NaNs is an inadequate justification for the misbehavior.

mdickinson · 2021-06-11T08:50:11Z

We should consider biting the bullet and revising the default NaN sort order.

If we went that route, I think we wouldn't need to consider payload or identity. We could just do:

    NaN < NaN      ->  False
    NaN < non-NaN  ->  True
non-NaN < NaN      ->  False
non-NaN < non-NaN  ->  usual numeric comparison

That then satisfies the axioms for a total ordering, albeit that the implied equality isn't Python's == (and it can't be, if we're going to keep the property that NaN != NaN): all NaNs compare equal for the equality determined by the above order, and the stability of the sort means that those NaNs will retain their order relative to each other in the sorted output.

Making NaN < non-NaN return True (which happens under both the proposal above and Raymond's more elaborate proposal) would be a break with IEEE 754, though.

There's also a somewhat arbitrary choice to be made here: do we consider NaNs to be negative or positive? That is, do we want NaNs to sort to the beginning of the list, or the end?

mdickinson · 2021-06-11T09:01:35Z

That is, do we want NaNs to sort to the beginning of the list, or the end?

FWIW, NumPy chooses to sort NaNs to the end of the list: https://numpy.org/doc/stable/reference/generated/numpy.sort.html

serhiy-storchaka · 2021-06-11T09:10:19Z

Maybe add two key functions for making NaN either the smallest or the largest number?

min(1, NAN, key=nan_is_smallest) -> NAN
max(1, NAN, key=nan_is_smallest) -> 1
sorted([1, NAN, 2], key=nan_is_smallest) -> [NAN, 1, 2]
min(1, NAN, key=nan_is_largest) -> 1
max(1, NAN, key=nan_is_largest) -> NAN
sorted([1, NAN, 2], key=nan_is_largest) -> [1, 2, NAN]

rhettinger · 2021-06-13T09:17:41Z

There's also a somewhat arbitrary choice to be made here:
do we consider NaNs to be negative or positive?
That is, do we want NaNs to sort to the beginning of the
list, or the end?

I had suggested putting NaNs at the beginning because that is how None was sorted in Python 2. But that doesn't really need to be a guide. Likely the best choice is to follow numpy's decision, placing NaNs at the end.

rhettinger · 2021-06-13T09:23:18Z

Maybe add two key functions for making NaN either the smallest
or the largest number?

SQL does this with NULLS FIRST or NULLS LAST but it is a nuisance. I think it better to opt for simplicity and choose a default.

joellarose mannequin added 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Jun 10, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

iritkatriel added 3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes and removed 3.9 only security fixes labels Apr 10, 2023

hugovk mentioned this issue Mar 24, 2024

max(a,b) sometimes depends on order of args #117200

Closed

Eclips4 mentioned this issue Jun 7, 2024

Anomalous behaviour with nan values in min/max functions #120205

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent results for min() and max() with math.nan as argument #88536

Inconsistent results for min() and max() with math.nan as argument #88536

joellarose mannequin commented Jun 10, 2021

joellarose mannequin commented Jun 10, 2021

joellarose mannequin commented Jun 10, 2021

joellarose mannequin commented Jun 10, 2021

sweeneyde commented Jun 10, 2021

sweeneyde commented Jun 10, 2021

mdickinson commented Jun 10, 2021

stevendaprano commented Jun 10, 2021

serhiy-storchaka commented Jun 10, 2021

rhettinger commented Jun 10, 2021

mdickinson commented Jun 11, 2021

mdickinson commented Jun 11, 2021

serhiy-storchaka commented Jun 11, 2021

rhettinger commented Jun 13, 2021

rhettinger commented Jun 13, 2021

Inconsistent results for min() and max() with math.nan as argument #88536

Inconsistent results for min() and max() with math.nan as argument #88536

Comments

joellarose mannequin commented Jun 10, 2021

joellarose mannequin commented Jun 10, 2021

joellarose mannequin commented Jun 10, 2021

joellarose mannequin commented Jun 10, 2021

sweeneyde commented Jun 10, 2021

sweeneyde commented Jun 10, 2021

mdickinson commented Jun 10, 2021

stevendaprano commented Jun 10, 2021

serhiy-storchaka commented Jun 10, 2021

rhettinger commented Jun 10, 2021

mdickinson commented Jun 11, 2021

mdickinson commented Jun 11, 2021

serhiy-storchaka commented Jun 11, 2021

rhettinger commented Jun 13, 2021

rhettinger commented Jun 13, 2021