Remove cmp parameter to list.sort() and builtin.sorted() #46104

gvanrossum · 2008-01-09T00:02:06Z

BPO	1771
Nosy	@rhettinger, @mdickinson, @bitdancer
Files	nocmp.diff: Remove cmp from C files.

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2008-01-30.20:16:11.740>
created_at = <Date 2008-01-09.00:02:05.825>
labels = ['interpreter-core']
title = 'Remove cmp parameter to list.sort() and builtin.sorted()'
updated_at = <Date 2010-07-20.20:34:50.482>
user = 'https://github.com/gvanrossum'

bugs.python.org fields:

activity = <Date 2010-07-20.20:34:50.482>
actor = 'metal'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2008-01-30.20:16:11.740>
closer = 'rhettinger'
components = ['Interpreter Core']
creation = <Date 2008-01-09.00:02:05.825>
creator = 'gvanrossum'
dependencies = []
files = ['9155']
hgrepos = []
issue_num = 1771
keywords = []
message_count = 26.0
messages = ['59575', '59578', '59579', '59867', '59877', '59885', '59937', '59939', '61877', '62476', '62477', '62480', '62481', '95973', '95975', '95982', '96024', '96026', '96034', '96058', '96062', '102019', '102022', '102030', '102031', '110969']
nosy_count = 7.0
nosy_names = ['rhettinger', 'mark.dickinson', 'dtorp', 'LeWiemann', 'r.david.murray', 'tixxit', 'metal']
pr_nums = []
priority = 'normal'
resolution = 'accepted'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue1771'
versions = ['Python 3.0']

gvanrossum · 2008-01-09T00:02:05Z

We should either change the API so you can pass in a '<' comparator, or
get rid of it completely (since key=... takes care of all use cases I
can think of).

rhettinger · 2008-01-09T00:37:53Z

I support removal; however, there is an uncommon corner-case that is
well served by __cmp__ where a primary key is sorted ascending and a
secondary key is sorted descending -- that case is a PITA with the key=
function because you need a way to invert the sense of the reversed
input (there are tricks for this but they are obscure).

rhettinger · 2008-01-09T00:42:20Z

Forgot to mention, the easy work-around is to do two consecutive sorts
and take advantage of the guaranteed stability:

  l.sort(key=secondary, reverse=True)
  l.sort(key=primary)
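
A minimal sketch of the two-pass workaround above. The records and the `primary`/`secondary` helpers are made up for illustration; the only thing relied on is that `list.sort()` is stable.

```python
# Hypothetical data: (name, age) pairs.
records = [("bob", 30), ("alice", 25), ("bob", 25), ("alice", 30)]

def primary(record):
    """Primary key: name, sorted ascending."""
    return record[0]

def secondary(record):
    """Secondary key: age, sorted descending."""
    return record[1]

l = list(records)
# Sort by the secondary key first (descending) ...
l.sort(key=secondary, reverse=True)
# ... then by the primary key. Stability keeps the secondary order
# among records whose primary keys are equal.
l.sort(key=primary)

print(l)
```

The least significant key is sorted first and the most significant key last.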

rhettinger · 2008-01-13T22:30:36Z

Let's do this. It is a nice simplification and makes the sort tools
easier to learn and use.

rhettinger · 2008-01-14T00:22:26Z

Patch attached for the C files making them much cleaner and a bit
faster. Still needs the related tests to be deleted and the docs
updated.

gvanrossum · 2008-01-14T02:33:53Z

Cool! Doesn't it feel good to rip out stuff? :-)

I do hope that you'll make sure all tests pass (-uall) before submitting
this.

rhettinger · 2008-01-14T23:24:36Z

Yes, it does feel great. The code is cleaner and faster. The API is
simple and matches all the other key= functions in
min/max/nsmallest/nlargest/groupby.

After more thought, I would like to make one more change and require the
arguments to be keywords such as sort(key=str.lower) but not
sort(str.lower).

The issue is that the cmp= interface has been around so long that it is
ingrained into our thinking and in our code. Having to write out the
keyword makes the intent explicit and will avoid accidentally passing in a
cmp= function when a key= function was intended. In Py3.1, the
restriction could be relaxed and l.sort(f) could be accepted for
l.sort(key=f).
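
For readers on a current Python 3, here is a small sketch of what the keyword requirement looks like in practice. It assumes present-day CPython behaviour, where `key` is still keyword-only and a positional argument to `list.sort()` raises `TypeError`; the word list is made up.

```python
words = ["banana", "Apple", "cherry"]

# Spelling out the keyword makes the intent explicit.
words.sort(key=str.lower)

# Passing the function positionally is rejected instead of being
# silently interpreted as a cmp= or key= function.
try:
    ["b", "A"].sort(str.lower)
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None

print(words)
print(type(positional_error).__name__)
```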

For the 2-to-3 tool, I wrote a converter that automatically transitions
code currently using a custom compare function:

2.6 code: s.sort(cmp=lambda p, q: cmp(p.lower(), q.lower()))
3.0 code: s.sort(key=CmpToKey(lambda p, q: cmp(p.lower(), q.lower())))

Ideally, the automatic conversion would be accompanied by a suggestion
to manually rewrite to something like:

3.0 code: s.sort(key=str.lower)

--- converter code ---

def CmpToKey(mycmp):
    'Convert a cmp= function into a key= function'
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __cmp__(self, other):
            return mycmp(self.obj, other.obj)
    return K
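
In later Python versions a helper of this kind ships in the standard library as `functools.cmp_to_key`. The sketch below uses it in place of the hand-written recipe; `compare_lower` and the sample list are illustrative names, and the `(a > b) - (a < b)` expression stands in for the removed builtin `cmp()`.

```python
from functools import cmp_to_key

def compare_lower(p, q):
    """Old-style comparison: negative, zero, or positive."""
    a, b = p.lower(), q.lower()
    return (a > b) - (a < b)  # replacement for Python 2's cmp(a, b)

s = ["banana", "Apple", "cherry"]

# Mechanical conversion of a cmp-style function.
via_cmp = sorted(s, key=cmp_to_key(compare_lower))

# The manual rewrite suggested in the comment above.
via_key = sorted(s, key=str.lower)

print(via_cmp, via_key)
```

Both calls produce the same order; the key-function version is shorter and calls `str.lower` once per element.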

gvanrossum · 2008-01-14T23:35:45Z

> After more thought, I would like to make one more change and require the
> arguments to be keywords such as sort(key=str.lower) but not
> sort(str.lower).

Works for me. (To be honest I thought key was already required to be a
keyword. :-)

rhettinger · 2008-01-30T20:15:45Z

Checked-in rev 60453.

LeWiemann · 2008-02-17T01:45:55Z

Is this really necessary?

I see that the sorting code gets a little simpler, but I believe that
there are valid use cases for cmp out there, where using a key would at
least be cumbersome. So why remove it when it's useful?

For instance, if you have an algorithm to determine the order in which
any two elements should occur (and for some reason the algorithm
satisfies transitivity) then it's usable as the cmp function, but
turning it into a key function may be complicated (and adversely affect
performance); you might end up having to map each element to a number
or tuple describing the ordering properties of the element, which can
be non-trivial. Besides, it can also be non-obvious.

LeWiemann · 2008-02-17T01:49:36Z

"Non-obvious" to the novice that this technique can be used, and non-
obvious to the reader of the program.

I'm envisioning key functions that return long sequences of -1/0/1
integers, or numbers between 0 and 2**50... Not good.

Instead of removing cmp, I'd suggest simply placing a note in the
documentation saying that key is preferred over cmp, to encourage
readability.

rhettinger · 2008-02-17T05:27:27Z

FWIW, an object with a complex element-to-element comparison can define
an __lt__() method and have it work with sort, min, max, etc.
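
A rough illustration of that suggestion, using a hypothetical `Version` class that defines only `__lt__()`. In Python 3, `max()` still works because `a > b` falls back to the reflected `b < a`.

```python
class Version:
    """Hypothetical example type ordered by its numeric components."""

    def __init__(self, text):
        self.text = text
        self.parts = tuple(int(piece) for piece in text.split("."))

    def __lt__(self, other):
        # The only comparison defined on the class.
        return self.parts < other.parts

versions = [Version("1.10"), Version("1.2"), Version("1.9")]

ordered = [v.text for v in sorted(versions)]
smallest = min(versions).text
largest = max(versions).text

print(ordered, smallest, largest)
```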

LeWiemann · 2008-02-17T05:33:25Z

I know, but I don't always want to define the comparison on the object
itself, because it's not an intrinsic feature of the object. It's just
the way I happen to sort it right now. (That's especially likely if
the ordering algorithm is complex.)

tixxit · 2009-12-04T22:26:55Z

I am not sure I understand the reasoning behind removing the cmp
parameter (and agree with Lea Wiemann). Trying to wedge a proper
comparison into the key parameter is clumsy and unreadable (as can be
seen in the 2to3 example above). The intrinsic ordering on objects does
not necessarily match up with the way you want to sort them. For
example, a natural intrinsic order on 2 points in 2d is lexicographical,
however you often want to sort by angular order relative to some other
point instead. Clearly this can never be put in __cmp__ or __lt__,
because the sorted order is relative to some other unknown point. Trying
to do this with the key function doesn't make sense; it would not be
clear you are sorting by angular order and you'd have to instantiate a
bunch of wrapper objects just to do basic sorting. Another quick example
would be sorting hyperplanes by intersection on a ray. Sorting points
along a direction given by a vector.

I understand removing redundant features from a language, but I just
can't see how key replaces this functionality in a readable or efficient
way. This highlights an important class of cases (since it was mentioned
that none could be thought of) in which we wish to make comparisons
between values where a comparison (<, > or ==) is more numerically
sound, more efficient, or the only option (perhaps the ordering is
defined explicitly) than computing the exact values (e.g. angle). As far
as it seems, the only way to do this with key is by following the
example given and creating a class solely to wrap each object that
overrides __cmp__, which is certainly non-obvious (i.e. there is no one,
obvious way to do it).

gvanrossum · 2009-12-04T22:49:38Z

Can someone provide a code sample to make this argument more
understandable for those of us who don't compare points by angular order
for a living... :-)

I'm not sure what the 2to3 example (I presume you mean msg59937) shows
except that conversion from a cmp function to a key function may require
you to actually think...

Also, for all of you asking for cmp back, I hope you realize that
sorting N values using a custom cmp function makes about N log N calls
to cmp, whereas using a custom key calls the key function only N
times. This means that even if your cmp function is faster than the
best key function you can write, the advantage is lost as N increases
(which is just where you'd like it to matter most :-).
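
The call counts can be checked with a small experiment. This is only a sketch: the counters, the seeded random data and the use of `functools.cmp_to_key` (added to the standard library after this discussion) are assumptions made for illustration.

```python
import random
from functools import cmp_to_key

cmp_calls = 0
key_calls = 0

def counting_cmp(a, b):
    """cmp-style function that counts how often it is called."""
    global cmp_calls
    cmp_calls += 1
    return (a > b) - (a < b)

def counting_key(x):
    """Key function that counts how often it is called."""
    global key_calls
    key_calls += 1
    return x

random.seed(0)
data = [random.random() for _ in range(1000)]

by_cmp = sorted(data, key=cmp_to_key(counting_cmp))
by_key = sorted(data, key=counting_key)

# The key function runs exactly once per element; the comparison
# function runs once per comparison made by the sort.
print("key calls:", key_calls)
print("cmp calls:", cmp_calls)
```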

rhettinger · 2009-12-05T07:04:45Z

FWIW, we had a long discussion on comp.lang.python and the net result
was that no use cases were found that required a cmp function. One
complex case (sorting recursive tree structures) at first appeared to
need a cmp-function but was found to be simpler and faster using a
key-function. The net result of the conversation was the feeling that
people who have grown-up using cmp-functions in either Python, C or some
other language feel like they've lost something but really haven't. In
contrast, people who use SQL or spreadsheet database tools find that key
functions come naturally since neither supports cmp-functions, instead
preferring the user to specify primary and secondary key functions.

Also, it was pointed out that the removal of cmp-functions in sorted() and
list.sort() was part of a larger effort to remove all forms of cmp from
the whole language (i.e. the builtin cmp function is gone and so is the
__cmp__ magic method). Rich comparisons have completely supplanted all
uses of cmp-functions in the language as a whole -- having multiple ways
to do it was confusing.

In converting code from 2-to-3, we have found two sticky cases.

The first occurs when an API had exposed cmp functions to the end-user
(for example, unittest.getTestCaseNames() and unittest.makeSuite() have
an optional sortUsing parameter that allows the user to specify a
cmp-function). To support that use case (so that end-user API's would
not have to be changed), we added a CmpToKey() tool which automatically
converts cmp-functions to key functions. This tool is referenced in the
docs and it could be added to the 2-to-3 converter.

The second case occurs when a primary key is sorted ascending and a
secondary key is sorted descending. The technique for that is to take
advantage of sort stability and do two sorts:

   s.sort(key=secondary, reverse=True)
   s.sort(key=primary)   
   # now sorted by primary ascending, secondary descending

That technique is going to be documented in an update of the sorting
how-to. It doesn't seem to arise much in practice and the cmp function
equivalent seems to be harder for beginners to write (though at least it
can be done with a single cmp-function and a single sort).
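
For comparison, the same ordering can be had in a single pass by wrapping the secondary key in an object whose `<` is inverted. The `Descending` class and the records below are hypothetical, a sketch of one such trick rather than anything proposed in this thread.

```python
class Descending:
    """Wrap a value so that it sorts in reverse order."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Tuple comparison checks equality before ordering.
        return self.value == other.value

    def __lt__(self, other):
        # Inverted on purpose: a larger value sorts first.
        return other.value < self.value

# Hypothetical (name, age) records.
records = [("bob", 30), ("alice", 25), ("bob", 25), ("alice", 30)]

# Primary key ascending, secondary key descending, one sort.
single_pass = sorted(records, key=lambda r: (r[0], Descending(r[1])))

# The two-pass technique that relies on sort stability.
two_pass = sorted(records, key=lambda r: r[1], reverse=True)
two_pass.sort(key=lambda r: r[0])

print(single_pass)
```

When the secondary key is numeric, negating it in the key tuple does the same job without a wrapper.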

mdickinson · 2009-12-06T11:57:29Z

Tom, I think I'm missing your point: all three of the examples you give
seem like perfect candidates for a key-based sort rather than a
comparison-based one. For the first example, couldn't you do something
like:

def direction(pt1, pt2):
    """angle of line segment from point 1 to point 2"""
    return atan2(pt2.y - pt1.y, pt2.x - pt1.x)

my_points.sort(key=lambda pt: direction(reference_pt, pt))

? How would having a cmp keyword argument make this any easier or
simpler?

Here's the best example I can think of for which key-based sorting is
problematic: imagine that the Decimal type doesn't exist, and that you
have triples (sign, coefficient_string, exponent) representing
arbitrary-precision base 10 floating-point numbers. It's fairly tricky
to come up with a key function that maps these triples to some existing
ordered type, so that they can be sorted in natural numerical order.
The problem lies in the way that the sort order for the coefficient
string and exponent depends on the value of the sign (one way for
positive numbers, reversed for negative numbers). But it's not a big
deal to define a wrapper for cases like this.

tixxit · 2009-12-06T16:24:41Z

Mark: I think your example actually helps illustrate my point. My point was
that computing the angle directly is less efficient or not as nice
numerically. For instance, if you are sorting points by angle relative to an
extreme point you could do something like this (a first step in the Graham
Scan) - be prepared for non-Python 3.0 code ;)

from functools import partial
from random import random

def turn(p, q, r):
    """Return -1, 0, or 1 if p,q,r forms a right, straight, or left turn
respectively."""
    return cmp((q[0] - p[0])*(r[1] - p[1]) - (r[0] - p[0])*(q[1] - p[1]), 0)

pts = [(random(), random()) for i in xrange(10)]
i = min(xrange(len(pts)), key=lambda i: pts[i])
p = pts.pop(i)
pts.sort(cmp=partial(turn, p))

Here our comparison function requires only 2 multiplications and 5
additions/subtractions. This function is nice especially if you are using
arbitrary precision or rational numbers as it is exact.
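
A Python 3 rendering of the snippet above might look like the following. It is a sketch, not the original author's code: `functools.cmp_to_key` replaces the `cmp=` argument, a sign expression replaces the builtin `cmp()`, and fixed integer points replace the random ones so the result is reproducible.

```python
from functools import cmp_to_key, partial

def turn(p, q, r):
    """Return -1, 0, or 1 if p, q, r form a right, straight, or left turn."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])
    return (cross > 0) - (cross < 0)  # stands in for cmp(cross, 0)

# Fixed sample points (assumed for illustration).
pts = [(1, 0), (0, 0), (1, 1), (0, 1), (1, -1)]

# Pick the lexicographically smallest point as the reference point.
i = min(range(len(pts)), key=lambda i: pts[i])
p = pts.pop(i)

# Sort the remaining points with the exact, multiplication-only test.
pts.sort(key=cmp_to_key(partial(turn, p)))

print(p, pts)
```

With integer or `Fraction` coordinates every comparison is exact, which is the property argued for above.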


mdickinson · 2009-12-06T17:27:31Z

Ah. Thanks for the explanation; I see your point. I guess you do just
have to use a CmpToKey-type wrapper for this sort of comparison.

Though for this *particular* example, it seems to me that you could still use a key function

lambda q: (q[0] - p[0])/(q[1]-p[1]),

which would be even more efficient. This is assuming that your extreme
point p has minimal second coordinate amongst points being sorted, which
as I understand it is how the Graham Scan typically starts. There's the
minor difficulty of dealing with points with the *same* second coordinate
as p, where I guess the key value should be some form of +Infinity. I can
also see that this might be problematic if you're working with a type
that's exact for addition and multiplication but not for division (e.g.,
Decimal with unbounded precision).
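
One way the key function could be fleshed out, under the assumptions stated above: p has the minimal second coordinate, and any point level with p lies to its right, so that +Infinity is the correct key there. `slope_key` and the sample points are made-up names for this sketch.

```python
import math

def slope_key(p):
    """Build a key function ordering points by direction as seen from p.

    Assumes p has the minimal second coordinate and that points with
    the same second coordinate as p lie to its right.
    """
    def key(q):
        dx, dy = q[0] - p[0], q[1] - p[1]
        if dy == 0:
            return math.inf  # level with p: treated as +Infinity
        return dx / dy       # decreases as the angle grows from 0 to pi
    return key

p = (0, 0)
pts = [(0, 1), (1, 0), (-1, 1), (1, 1)]

# The ratio shrinks as the angle grows, so reverse=True gives
# counterclockwise order.
by_slope = sorted(pts, key=slope_key(p), reverse=True)

# Reference ordering computed from the angle itself.
by_angle = sorted(pts, key=lambda q: math.atan2(q[1] - p[1], q[0] - p[0]))

print(by_slope)
```

The division is the weak spot noted above: with a type that is exact for addition and multiplication but not for division, the key may no longer order the points exactly.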

It would be interesting to see some relative timings for the sort stage of
the Graham scan (on a million random points, say): key function versus cmp
function.

tixxit · 2009-12-07T15:46:21Z

If the equal min y-coords are handled, I think it'd be quicker too. As Guido
noted, O(n) function calls is better than O(n log n) =] Though the general
case is still unhandled. And, though it doesn't help my case, the Graham
Scan can also be performed on points sorted lexicographically too, by
constructing the upper & lower hull separately, hehe.

Now, I understand cmp on the whole was removed from the language. Using
__lt__, __eq__, etc. really is more natural. However, having an explicit cmp
function for sorting makes sense to me. At the very least, it is more
obvious and natural for some problems - though I respect that using a key
func. is often faster. In some rare (though "rare" is very subjective) cases
it is required; packing a cmp function into __cmp__ in a wrapper object is
really just a hard-to-read cmp function and highlights the need for cmp. I
would love to see it added for min/max too, since I find I
often use a simple reduce function in place of a min(lst, cmp=...).
Enforcing proper comparisons (<, >, ==, etc) makes sense, but would having
the cmp function live, so to speak, in sorting really be that bad? Just
inform the user in the docs that key is preferred and often faster.

gvanrossum · 2009-12-07T18:10:28Z

Python's sort implementation is carefully written to only use the "<"
comparison, ever. So a cmp really isn't the most natural way to specify
a comparison. (This should really be documented somewhere -- I know
it because Tim Peters & I shared an office while he did most of the work
on Python's sort.)
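
This can be observed with a class whose other ordering methods refuse to run. `OnlyLessThan` is a hypothetical test class, and the sketch assumes CPython's `sorted()`, which behaves as described here.

```python
class OnlyLessThan:
    """Hypothetical type that supports "<" and rejects other orderings."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value < other.value

    def _forbidden(self, other):
        raise AssertionError("sort used a comparison other than <")

    # Any use of >, <= or >= during sorting would raise.
    __gt__ = __le__ = __ge__ = _forbidden

items = [OnlyLessThan(v) for v in (3, 1, 2)]

# Completes without tripping any of the forbidden operators.
result = [item.value for item in sorted(items)]

print(result)
```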

metal · 2010-03-31T16:36:58Z

I have a tree:

      A
     / \
    B   C
   / \
  D   E

which is implemented as a dict

tree = {
  'A': set(['B', 'C']),
  'B': set(['D', 'E']), 
  'C': set(),
  'D': set(),
  'E': set(),
}

I want to sort the nodes.

and I don't know how to write a key function for sort() in this situation

so I write a cmp function:

sorted(tree, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0)

and it gets ['A', 'C', 'B', 'E', 'D'].

How to convert cmp to key really confused me, and it surely needs more typing time.

So I disagree with the removal.

dtorp · 2010-03-31T17:40:05Z

> sorted(tree, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0)
>
> and it gets ['A', 'C', 'B', 'E', 'D'].

That cmp function is nonsense and isn't even close to being correct:

>>> from random import shuffle
>>> for i in range(10):
... 	t = list(tree)
... 	shuffle(t)
... 	print sorted(t, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0)
	
['E', 'C', 'B', 'D', 'A']
['A', 'D', 'C', 'B', 'E']
['C', 'B', 'E', 'D', 'A']
['E', 'D', 'A', 'C', 'B']
['A', 'B', 'D', 'E', 'C']
['D', 'A', 'E', 'C', 'B']
['C', 'D', 'A', 'B', 'E']
['A', 'C', 'B', 'D', 'E']
['A', 'C', 'B', 'E', 'D']
['A', 'C', 'B', 'D', 'E']

> How to convert cmp to key really confused
> me, and it surely needs more typing time.

Just cut and paste the recipe. Simple.

metal · 2010-03-31T19:14:28Z

Sorry, I ripped the code out of a mess and forgot that the tree is "leaflized" as

tree = {
  'A': set(['B', 'C', 'D', 'E']),
  'B': set(['D', 'E']), 
  'C': set(),
  'D': set(),
  'E': set(),
}

I don't want to talk about the actual problem. I think sometimes
it's hard to give an "absolute" weight value as a key; in this example,
it is the relationship in a graph. Then the user has to "copy and paste the recipe" to get a cmp function which should already be there. We are
all adults here, so why not SIMPLY tell newbies to use key instead of cmp unless they know what they are doing?

Thanks for reply.

bitdancer · 2010-03-31T20:57:43Z

cmp is gone. Its chances of coming back are close enough to zero that an assertAlmostEqual test will pass :). The rest of the discussion should move to one of the general python lists.

metal · 2010-07-20T20:34:50Z

Shame on me: after a long time I realized that the problem referenced in my old post (http://bugs.python.org/msg102019) was actually topological sorting. It can't be done with Python's sort(), which doesn't support partial orders. Trying to use the cmp parameter is completely wrong. And cmp would mislead people like me into sorting a partial order, evil! Now I absolutely agree with the removal of cmp, viva Raymond Hettinger!
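
For anyone landing here with the same problem: newer Python versions (3.9 and later) include `graphlib.TopologicalSorter` in the standard library. A minimal sketch using the tree from the earlier comment follows. Note that `TopologicalSorter` treats each mapping value as the set of nodes that must come first, so the result is reversed to list parents before children.

```python
from graphlib import TopologicalSorter

# The tree from the earlier comment: node -> set of children.
tree = {
    "A": {"B", "C"},
    "B": {"D", "E"},
    "C": set(),
    "D": set(),
    "E": set(),
}

# static_order() emits each node after all nodes in its mapped set,
# so with this mapping children come out before their parents.
children_first = list(TopologicalSorter(tree).static_order())

# Reverse to get every parent ahead of its children.
parents_first = children_first[::-1]

print(parents_first)
```

Several orders are valid for a partial order, so the exact output may vary; only the parent-before-child constraint is guaranteed.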

gvanrossum added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jan 9, 2008

rhettinger self-assigned this Jan 13, 2008

rhettinger closed this as completed Jan 30, 2008

ezio-melotti transferred this issue from another repository Apr 10, 2022
