
Remove cmp parameter to list.sort() and builtin.sorted() #46104

Closed
gvanrossum opened this issue Jan 9, 2008 · 26 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@gvanrossum
Member

BPO 1771
Nosy @rhettinger, @mdickinson, @bitdancer
Files
  • nocmp.diff: Remove cmp from C files.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2008-01-30.20:16:11.740>
    created_at = <Date 2008-01-09.00:02:05.825>
    labels = ['interpreter-core']
    title = 'Remove cmp parameter to list.sort() and builtin.sorted()'
    updated_at = <Date 2010-07-20.20:34:50.482>
    user = 'https://github.com/gvanrossum'

    bugs.python.org fields:

    activity = <Date 2010-07-20.20:34:50.482>
    actor = 'metal'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2008-01-30.20:16:11.740>
    closer = 'rhettinger'
    components = ['Interpreter Core']
    creation = <Date 2008-01-09.00:02:05.825>
    creator = 'gvanrossum'
    dependencies = []
    files = ['9155']
    hgrepos = []
    issue_num = 1771
    keywords = []
    message_count = 26.0
    messages = ['59575', '59578', '59579', '59867', '59877', '59885', '59937', '59939', '61877', '62476', '62477', '62480', '62481', '95973', '95975', '95982', '96024', '96026', '96034', '96058', '96062', '102019', '102022', '102030', '102031', '110969']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'mark.dickinson', 'dtorp', 'LeWiemann', 'r.david.murray', 'tixxit', 'metal']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1771'
    versions = ['Python 3.0']

    @gvanrossum
    Member Author

    We should either change the API so you can pass in a '<' comparator, or
    get rid of it completely (since key=... takes care of all use cases I
    can think of).

    @gvanrossum gvanrossum added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jan 9, 2008
    @rhettinger
    Contributor

    I support removal; however, there is an uncommon corner-case that is
    well served by __cmp__ where a primary key is sorted ascending and a
    secondary key is sorted descending -- that case is a PITA with the key=
    function because you need a way to invert the sense of the reversed
    input (there are tricks for this but they are obscure).

    @rhettinger
    Contributor

    Forgot to mention, the easy work-around is to do two consecutive sorts
    and take advantage of the guaranteed stability:

      l.sort(key=secondary, reverse=True)
      l.sort(key=primary)
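Here is a minimal Python 3 sketch of the two-pass technique. It uses made-up `(name, score)` records, so the data are hypothetical. When the secondary key happens to be numeric, one of the tricks alluded to above is to negate it inside a tuple key. That gives the same order in a single pass, but it does not work for strings.

```python
from operator import itemgetter

# Hypothetical records: (name, score). Goal: name ascending, score descending.
records = [("bob", 3), ("alice", 7), ("bob", 9), ("alice", 2), ("carol", 5)]

# Two passes: secondary key first (descending), then the primary key.
# Stability keeps the score order within each group of equal names.
two_pass = list(records)
two_pass.sort(key=itemgetter(1), reverse=True)
two_pass.sort(key=itemgetter(0))

# One pass: only possible here because the secondary key is numeric.
one_pass = sorted(records, key=lambda r: (r[0], -r[1]))

print(two_pass)
# [('alice', 7), ('alice', 2), ('bob', 9), ('bob', 3), ('carol', 5)]
```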

    @rhettinger
    Contributor

    Let's do this. It is a nice simplification and makes the sort tools
    easier to learn and use.

    @rhettinger rhettinger self-assigned this Jan 13, 2008
    @rhettinger
    Contributor

    Patch attached for the C files making them much cleaner and a bit
    faster. Still needs the related tests to be deleted and the docs
    updated.

    @gvanrossum
    Member Author

    Cool! Doesn't it feel good to rip out stuff? :-)

    I do hope that you'll make sure all tests pass (-uall) before submitting
    this.

    @rhettinger
    Contributor

    Yes, it does feel great. The code is cleaner and faster. The API is
    simple and matches all the other key= functions in
    min/max/nsmallest/nlargest/groupby.

    After more thought, I would like to make one more change and require the
    arguments to be keywords such as sort(key=str.lower) but not
    sort(str.lower).

    The issue is that the cmp= interface has been around so long that it is
    ingrained into our thinking and in our code. Having to write-out the
    keyword makes the intent explicit and will avoid accidentally passing in a
    cmp= function when a key= function was intended. In Py3.1, the
    restriction could be relaxed and l.sort(f) could be accepted for
    l.sort(key=f).
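In current Python 3 versions, `key` is still keyword-only for both `list.sort()` and `sorted()`. The relaxation floated above never happened. A small sketch follows; the word list is made up.

```python
words = ["banana", "Apple", "cherry"]

# key= works as a keyword argument, for sorted() and list.sort() alike.
print(sorted(words, key=str.lower))  # ['Apple', 'banana', 'cherry']

# Passed positionally, the function is rejected outright instead of being
# mistaken for a cmp function.
try:
    words.sort(str.lower)
    positional_rejected = False
except TypeError as exc:
    positional_rejected = True
    print("TypeError:", exc)
```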

    For the 2-to-3 tool, I wrote a converter that automatically transitions
    code currently using a custom compare function:

    2.6 code: s.sort(cmp=lambda p, q: cmp(p.lower(), q.lower()))
    3.0 code: s.sort(key=CmpToKey(lambda p, q: cmp(p.lower(), q.lower())))

    Ideally, the automatic conversion would be accompanied by a suggestion
    to manually rewrite to something like:

    3.0 code: s.sort(key=str.lower)

    --- converter code ---

    def CmpToKey(mycmp):
        'Convert a cmp= function into a key= function'
        class K(object):
            def __init__(self, obj, *args):
                self.obj = obj
            def __cmp__(self, other):
                return mycmp(self.obj, other.obj)
        return K
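The recipe above relies on `__cmp__`, which Python 3 no longer consults. A minimal Python 3 sketch of the same idea defines the rich comparison methods instead, each delegating to the old-style function. The standard library later shipped an equivalent as `functools.cmp_to_key` (Python 2.7 and 3.2+), which is what you would normally reach for. The small `cmp` helper below only stands in for Python 2's removed builtin.

```python
import functools

def CmpToKey(mycmp):
    """Convert a cmp= function into a key= function (Python 3 sketch)."""
    class K:
        def __init__(self, obj):
            self.obj = obj
        # Each rich comparison delegates to the old-style cmp function.
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
    return K

def cmp(a, b):
    """Stand-in for Python 2's builtin cmp(), which Python 3 removed."""
    return (a > b) - (a < b)

s = ["pear", "Apple", "banana"]
by_recipe = sorted(s, key=CmpToKey(lambda p, q: cmp(p.lower(), q.lower())))
by_stdlib = sorted(s, key=functools.cmp_to_key(lambda p, q: cmp(p.lower(), q.lower())))
by_key = sorted(s, key=str.lower)  # the manual rewrite suggested above
print(by_recipe)  # ['Apple', 'banana', 'pear']
```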

    @gvanrossum
    Member Author

    After more thought, I would like to make one more change and require the
    arguments to be keywords such as sort(key=str.lower) but not
    sort(str.lower).

    Works for me. (To be honest I thought key was already required to be a
    keyword. :-)

    @rhettinger
    Contributor

    Checked-in rev 60453.

    @LeWiemann
    Mannequin

    LeWiemann mannequin commented Feb 17, 2008

    Is this really necessary?

    I see that the sorting code gets a little simpler, but I believe that
    there are valid use cases for cmp out there, where using a key would at
    least be cumbersome. So why remove it when it's useful?

    For instance, if you have an algorithm to determine the order in which
    any two elements should occur (and for some reason the algorithm
    satisfies transitivity) then it's usable as the cmp function, but
    turning it into a key function may be complicated (and adversely affect
    performance); you might end up having to map each element to a number
    or tuple describing the ordering properties of the element, which can
    be non-trivial. Besides, it can also be non-obvious.

    @LeWiemann
    Mannequin

    LeWiemann mannequin commented Feb 17, 2008

    "Non-obvious" to the novice that this technique can be used, and non-
    obvious to the reader of the program.

    I'm envisioning key functions that return long sequences of -1/0/1
    integers, or numbers between 0 and 2**50... Not good.

    Instead of removing cmp, I'd suggest simply placing a note in the
    documentation saying that key is preferred over cmp, to encourage
    readability.

    @rhettinger
    Contributor

    FWIW, an object with a complex element-to-element comparison can define
    an __lt__() method and have it work with sort, min, max, etc.
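Here is a small illustration of that remark. The `Version` class is hypothetical and defines only `__lt__`. That single method is enough for `sorted()`, `min()` and `max()`; `max()`'s `>` test falls back to the other operand's `__lt__`.

```python
class Version:
    """Hypothetical example: an object that knows how to order itself."""
    def __init__(self, text):
        self.text = text
        self.parts = tuple(int(n) for n in text.split("."))

    def __lt__(self, other):
        # Compare numerically, part by part, so "1.10" sorts after "1.9.1".
        return self.parts < other.parts

    def __repr__(self):
        return f"Version({self.text!r})"

versions = [Version("1.10"), Version("1.2"), Version("1.9.1")]
print(sorted(versions))              # [Version('1.2'), Version('1.9.1'), Version('1.10')]
print(min(versions), max(versions))  # Version('1.2') Version('1.10')
```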

    @LeWiemann
    Mannequin

    LeWiemann mannequin commented Feb 17, 2008

    I know, but I don't always want to define the comparison on the object
    itself, because it's not an intrinsic feature of the object. It's just
    the way I happen to sort it right now. (That's especially likely if
    the ordering algorithm is complex.)

    @tixxit
    Mannequin

    tixxit mannequin commented Dec 4, 2009

    I am not sure I understand the reasoning behind removing the cmp
    parameter (and I agree with Lea Wiemann). Trying to wedge a proper
    comparison into the key parameter is clumsy and unreadable (as can be
    seen in the 2to3 example above). The intrinsic ordering on objects does
    not necessarily match up with the way you want to sort them. For
    example, a natural intrinsic order on 2 points in 2d is lexicographical,
    however you often want to sort by angular order relative to some other
    point instead. Clearly this can never be put in __cmp__ or __lt__,
    because the sorted order is relative to some other unknown point. Trying
    to do this with the key function doesn't make sense; it would not be
    clear you are sorting by angular order and you'd have to instantiate a
    bunch of wrapper objects just to do basic sorting. Other quick examples
    would be sorting hyperplanes by their intersection with a ray, or sorting
    points along a direction given by a vector.

    I understand removing redundant features from a language, but I just
    can't see how key replaces this functionality in a readable or efficient
    way. This highlights an important class of cases (since it was mentioned
    that none could be thought of) in which we wish to make comparisons
    between values where a comparison (<, > or ==) is more numerically
    sound, more efficient, or the only option (perhaps the ordering is
    defined explicitly) than computing the exact values (e.g. the angle). As
    far as I can see, the only way to do this with key is by following the
    example given and creating a class solely to wrap each object that
    overrides __cmp__, which is certainly non-obvious (i.e. there is no one
    obvious way to do it).

    @gvanrossum
    Member Author

    Can someone provide a code sample to make this argument more
    understandable for those of us who don't compare points by angular order
    for a living... :-)

    I'm not sure what the 2to3 example (I presume you mean msg59937) shows
    except that conversion from a cmp function to a key function may require
    you to actually think...

    Also, for all of you asking for cmp back, I hope you realize that
    sorting N values using a custom cmp function makes about N log N
    calls to cmp, whereas using a custom key calls the key function only N
    times. This means that even if your cmp function is faster than the
    best key function you can write, the advantage is lost as N increases
    (which is just where you'd like it to matter most :-).
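The call counts are easy to observe. The sketch below wraps a trivial key function and a trivial cmp function (both hypothetical) with counters and sorts the same random data twice. The exact number of comparisons depends on the input and on the sort implementation, so only the rough relationship is meaningful.

```python
import functools
import random

calls = {"key": 0, "cmp": 0}

def key_func(x):
    calls["key"] += 1          # called once per element
    return x

def cmp_func(a, b):
    calls["cmp"] += 1          # called once per comparison
    return (a > b) - (a < b)

random.seed(0)
data = [random.random() for _ in range(10_000)]

by_key = sorted(data, key=key_func)
by_cmp = sorted(data, key=functools.cmp_to_key(cmp_func))

# 'key' is exactly len(data); 'cmp' is typically on the order of N log2 N.
print(calls)
```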

    @rhettinger
    Contributor

    FWIW, we had a long discussion on comp.lang.python and the net result
    was that no use cases were found that required a cmp function. One
    complex case (sorting recursive tree structures) at first appeared to
    need a cmp-function but was found to be simpler and faster using a
    key-function. The net result of the conversation was the feeling that
    people who have grown-up using cmp-functions in either Python, C or some
    other language feel like they've lost something but really haven't. In
    contrast, people who use SQL or spreadsheet database tools find that key
    functions come naturally since neither supports cmp-functions, instead
    preferring the user to specify primary and secondary key functions.

    Also, it was pointed-out the removal of cmp-functions in sorted() and
    list.sort() was part of a larger effort to remove all forms of cmp from
    the whole language (i.e. the builtin cmp function is gone and so is the
    __cmp__ magic method). Rich comparisons have completely supplanted all
    uses of cmp-functions in the language as a whole -- having multiple ways
    to do it was confusing.

    In converting code from 2-to-3, we have found two sticky cases.

    The first occurs when an API had exposed cmp functions to the end-user
    (for example, unittest.getTestCaseNames() and unittest.makeSuite() have
    an optional sortUsing parameter that allows the user to specify a
    cmp-function). To support that use case (so that end-user API's would
    not have to be changed), we added a CmpToKey() tool which automatically
    converts cmp-functions to key functions. This tool is referenced in the
    docs and it could be added to the 2-to-3 converter.
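The end-user API case can be sketched this way. A function whose public signature already accepts a cmp-style callable can convert it internally, so callers need not change. `get_case_names` and `sort_using` are hypothetical names, and `functools.cmp_to_key` is the standard-library descendant of the `CmpToKey()` tool. For what it's worth, Python 3's `unittest.TestLoader.sortTestMethodsUsing` is still a cmp-style function.

```python
import functools

def three_way_cmp(a, b):
    """Old-style comparison: negative, zero or positive."""
    return (a > b) - (a < b)

def get_case_names(names, sort_using=three_way_cmp):
    """Hypothetical API that kept its cmp-style parameter after the port."""
    # Convert the caller's cmp function once; sorted() itself only sees a key.
    return sorted(names, key=functools.cmp_to_key(sort_using))

names = ["test_b", "test_a", "test_c"]
print(get_case_names(names))  # ['test_a', 'test_b', 'test_c']
print(get_case_names(names, sort_using=lambda a, b: three_way_cmp(b, a)))  # reversed
```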

    The second case occurs when a primary key is sorted ascending and a
    secondary key is sorted descending. The technique for that is to take
    advantage of sort stability and do two sorts:

       s.sort(key=secondary, reverse=True)
       s.sort(key=primary)   
       # now sorted by primary ascending, secondary descending

    That technique is going to be documented in an update of the sorting
    how-to. It doesn't seem to arise much in practice and the cmp function
    equivalent seems to be harder for beginners to write (though at least it
    can be done with a single cmp-function and a single sort).
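For comparison, here is a sketch of the single-cmp-function version, run through `functools.cmp_to_key` in Python 3. The rows are hypothetical, and the secondary key is a string, so the negation trick is not available. It produces the same order as the two stable sorts.

```python
import functools

# Hypothetical rows: (primary, secondary)
rows = [(2, "x"), (1, "a"), (2, "z"), (1, "c"), (2, "y")]

def primary_asc_secondary_desc(r1, r2):
    """Single cmp function: primary ascending, secondary descending."""
    if r1[0] != r2[0]:
        return -1 if r1[0] < r2[0] else 1
    if r1[1] != r2[1]:
        return 1 if r1[1] < r2[1] else -1  # reversed sense for the secondary key
    return 0

one_sort = sorted(rows, key=functools.cmp_to_key(primary_asc_secondary_desc))

# The two-sort technique described above.
two_sorts = list(rows)
two_sorts.sort(key=lambda r: r[1], reverse=True)
two_sorts.sort(key=lambda r: r[0])

print(one_sort)  # [(1, 'c'), (1, 'a'), (2, 'z'), (2, 'y'), (2, 'x')]
```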

    @mdickinson
    Member

    Tom, I think I'm missing your point: all three of the examples you give
    seem like perfect candidates for a key-based sort rather than a
    comparison-based one. For the first example, couldn't you do something
    like:

    from math import atan2

    def direction(pt1, pt2):
        """angle of line segment from point 1 to point 2"""
        return atan2(pt2.y - pt1.y, pt2.x - pt1.x)

    my_points.sort(key=lambda pt: direction(reference_pt, pt))

    ? How would having a cmp keyword argument make this any easier or
    simpler?

    Here's the best example I can think of for which key-based sorting is
    problematic: imagine that the Decimal type doesn't exist, and that you
    have triples (sign, coefficient_string, exponent) representing
    arbitrary-precision base 10 floating-point numbers. It's fairly tricky
    to come up with a key function that maps these triples to some existing
    ordered type, so that they can be sorted in natural numerical order.
    The problem lies in the way that the sort order for the coefficient
    string and exponent depends on the value of the sign (one way for
    positive numbers, reversed for negative numbers). But it's not a big
    deal to define a wrapper for cases like this.
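The wrapper mentioned above can be made concrete under stated assumptions. The class name `DecimalTriple` is hypothetical. The sign is 0 for positive and 1 for negative, as in `Decimal.as_tuple()`, and coefficients are plain digit strings. For a fixed sign, magnitudes order by (adjusted exponent, significant digits). The negative case flips that order, and a plain key function cannot express the flip for strings.

```python
class DecimalTriple:
    """Hypothetical wrapper ordering (sign, coefficient_string, exponent)
    triples by numeric value; sign is 0 for positive, 1 for negative."""

    def __init__(self, triple):
        self.triple = triple
        sign, coeff, exp = triple
        digits = coeff.lstrip("0")
        if not digits:
            self._cls, self._mag = 0, None       # zero (of either sign)
        else:
            # Adjusted exponent (position of the leading digit), then the
            # significant digits with trailing zeros removed; compared as a
            # tuple, this orders nonzero magnitudes.
            adjusted = len(digits) - 1 + exp
            self._cls = -1 if sign else 1
            self._mag = (adjusted, digits.rstrip("0"))

    def __lt__(self, other):
        if self._cls != other._cls:
            return self._cls < other._cls        # negatives < zero < positives
        if self._cls == 0:
            return False                         # all zeros compare equal
        if self._cls > 0:
            return self._mag < other._mag        # positive: bigger magnitude is bigger
        return other._mag < self._mag            # negative: the sense flips

# 1.23, -5, 0, 1.230, -4.5, 90, -100
triples = [(0, "123", -2), (1, "5", 0), (0, "0", 3), (0, "1230", -3),
           (1, "45", -1), (0, "9", 1), (1, "1", 2)]
ordered = [w.triple for w in sorted(map(DecimalTriple, triples))]
print(ordered)
```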

    @tixxit
    Mannequin

    tixxit mannequin commented Dec 6, 2009

    Mark: I think your example actually helps illustrate my point. My point was
    that computing the angle directly is less efficient or not as nice
    numerically. For instance, if you are sorting points by angle relative to an
    extreme point you could do something like this (a first step in the Graham
    Scan) - be prepared for non-Python 3.0 code ;)

    from functools import partial
    from random import random
    
    def turn(p, q, r):
        """Return -1, 0, or 1 if p, q, r form a right, straight, or left turn,
        respectively."""
        return cmp((q[0] - p[0])*(r[1] - p[1]) - (r[0] - p[0])*(q[1] - p[1]), 0)
    
    pts = [(random(), random()) for i in xrange(10)]
    i = min(xrange(len(pts)), key=lambda i: pts[i])
    p = pts.pop(i)
    pts.sort(cmp=partial(turn, p))

    Here our comparison function requires only 2 multiplications and 5
    additions/subtractions. This function is nice especially if you are using
    arbitrary precision or rational numbers as it is exact.
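For readers following along in Python 3, here is a hedged port of the snippet above. `cmp` and `xrange` no longer exist, so the sign is computed directly and the comparison function is adapted with `functools.cmp_to_key`. The fixed random seed is only there to make the run repeatable.

```python
import random
from functools import cmp_to_key, partial

def turn(p, q, r):
    """Return -1, 0, or 1 if p, q, r form a right, straight, or left turn."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])
    return (cross > 0) - (cross < 0)   # stands in for Python 2's cmp(cross, 0)

random.seed(42)
pts = [(random.random(), random.random()) for _ in range(10)]
i = min(range(len(pts)), key=lambda i: pts[i])   # lexicographically smallest point
p = pts.pop(i)
pts.sort(key=cmp_to_key(partial(turn, p)))       # clockwise around p
```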

    (Replying to Mark Dickinson's comment of Sun, Dec 6, 2009 at 6:57 AM.)



    @mdickinson
    Member

    Ah. Thanks for the explanation; I see your point. I guess you do just
    have to use a CmpToKey-type wrapper for this sort of comparison.

    Though for this *particular* example, it seems to me that you could still use a key function

    lambda q: (q[0] - p[0])/(q[1]-p[1]),

    which would be even more efficient. This is assuming that your extreme
    point p has minimal second coordinate amongst points being sorted, which
    as I understand it is how the Graham Scan typically starts. There's the
    minor difficulty of dealing with points with the *same* second coordinate
    as p, where I guess the key value should be some form of +Infinity. I can
    also see that this might be problematic if you're working with a type
    that's exact for addition and multiplication but not for division (e.g.,
    Decimal with unbounded precision).

    It would be interesting to see some relative timings for the sort stage of
    the Graham scan (on a million random points, say): key function versus cmp
    function.
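The following is a rough harness for trying that comparison yourself. It is not code from the thread, and it reports no numbers here. It assumes distinct integer points and picks p as the lowest point (leftmost on ties), as described above. The key uses exact `Fraction` cotangents, so both orders agree exactly. A float key like the one in the comment above would probably be faster than this `Fraction` version, but it can mis-order nearly collinear points. Timings depend heavily on the machine and the Python version.

```python
import random
import timeit
from fractions import Fraction
from functools import cmp_to_key

def turn(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 = left turn, -1 = right turn."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])
    return (cross > 0) - (cross < 0)

def sort_by_cmp(p, pts):
    # Counterclockwise around p: q sorts before r when r is to the left of p->q.
    return sorted(pts, key=cmp_to_key(lambda q, r: turn(p, r, q)))

def sort_by_key(p, pts):
    # Cotangent of each point's angle around p, computed exactly.
    # Assumes p has the minimal y coordinate (leftmost on ties).
    def cot(q):
        dy = q[1] - p[1]
        if dy == 0:
            return (1, 0)  # same height as p: plays the role of +Infinity
        return (0, Fraction(q[0] - p[0], dy))
    # A larger cotangent means a smaller angle, so sort in descending order.
    return sorted(pts, key=cot, reverse=True)

def make_points(n, seed=0):
    """n distinct integer points; p is the lowest one (leftmost on ties)."""
    rng = random.Random(seed)
    pts = rng.sample([(x, y) for x in range(300) for y in range(300)], n)
    p = min(pts, key=lambda q: (q[1], q[0]))
    pts.remove(p)
    return p, pts

if __name__ == "__main__":
    p, pts = make_points(10_000)
    assert sort_by_key(p, pts) == sort_by_cmp(p, pts)
    for fn in (sort_by_key, sort_by_cmp):
        seconds = timeit.timeit(lambda: fn(p, pts), number=3)
        print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```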

    @tixxit
    Mannequin

    tixxit mannequin commented Dec 7, 2009

    If the equal min y-coords are handled, I think it'd be quicker too. As Guido
    noted, O(n) function calls is better than O(n log n) =] Though the general
    case is still unhandled. And, though it doesn't help my case, the Graham
    Scan can also be performed on points sorted lexicographically too, by
    constructing the upper & lower hull separately, hehe.

    Now, I understand cmp on the whole was removed from the language. Using
    __lt__, __eq__, etc. really is more natural. However, having an explicit cmp
    function for sorting makes sense to me. At the very least, it is more
    obvious and natural for some problems - though I respect that using a key
    func. is often faster. In some rare (though "rare" is very subjective) cases
    it is required; packing a cmp function into __cmp__ in a wrapper object is
    really just a hard-to-read cmp function and highlights the need for cmp. I
    would actually love to see it added for min/max too, since I find I
    often use a simple reduce function in place of a min(lst, cmp=...).
    Enforcing proper comparisons (<, >, ==, etc) makes sense, but would having
    the cmp function live, so to speak, in sorting really be that bad? Just
    inform the user in the docs that key is preferred and often faster.

    @gvanrossum
    Member Author

    Python's sort implementation is carefully written to only use the "<"
    comparison, ever. So a cmp really isn't the most natural way to specify
    a comparison. (This should really be documented somewhere -- I only know
    it because Tim Peters & I shared an office while he did most of the work
    on Python's sort.)
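This is easy to observe in CPython with an instrumented wrapper. The hypothetical class below logs every comparison operator that gets called. After a sort, only `<` is expected to show up in the log.

```python
import random

class OnlyLess:
    """Wraps a value and logs every comparison operator that is used."""
    ops = []

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        OnlyLess.ops.append("<")
        return self.value < other.value

    # Any other comparison would also show up in the log.
    def __gt__(self, other):
        OnlyLess.ops.append(">")
        return self.value > other.value

    def __le__(self, other):
        OnlyLess.ops.append("<=")
        return self.value <= other.value

    def __ge__(self, other):
        OnlyLess.ops.append(">=")
        return self.value >= other.value

    def __eq__(self, other):
        OnlyLess.ops.append("==")
        return self.value == other.value

random.seed(1)
items = [OnlyLess(random.randrange(100)) for _ in range(200)]
items.sort()
print(set(OnlyLess.ops))  # expected: {'<'}
```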

    @metal
    Mannequin

    metal mannequin commented Mar 31, 2010

    I have a tree:

        A
       / \
      B   C
     / \
    D   E

    which is implemented as a dict

    tree = {
      'A': set(['B', 'C']),
      'B': set(['D', 'E']), 
      'C': set(),
      'D': set(),
      'E': set(),
    }

    I want to sort the nodes.

    and I don't know how to write a key function for sort() in this situation

    so I wrote a cmp function:

    sorted(tree, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0)

    and it gets ['A', 'C', 'B', 'E', 'D'].

    How to convert cmp to key really confused me, and it surely needs more typing time.

    So I disagree with the removal.

    @dtorp
    Mannequin

    dtorp mannequin commented Mar 31, 2010

    sorted(tree, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0)

    and it gets ['A', 'C', 'B', 'E', 'D'].

    That cmp function is nonsense and isn't even close to being correct:

    >>> from random import shuffle
    >>> for i in range(10):
    ... 	t = list(tree)
    ... 	shuffle(t)
    ... 	print sorted(t, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0)
    	
    ['E', 'C', 'B', 'D', 'A']
    ['A', 'D', 'C', 'B', 'E']
    ['C', 'B', 'E', 'D', 'A']
    ['E', 'D', 'A', 'C', 'B']
    ['A', 'B', 'D', 'E', 'C']
    ['D', 'A', 'E', 'C', 'B']
    ['C', 'D', 'A', 'B', 'E']
    ['A', 'C', 'B', 'D', 'E']
    ['A', 'C', 'B', 'E', 'D']
    ['A', 'C', 'B', 'D', 'E']

    How to convert cmp to key really confused
    me, and it surely needs more typing time.

    Just cut and paste the recipe. Simple.

    @metal
    Mannequin

    metal mannequin commented Mar 31, 2010

    Sorry, I ripped the code out of a mess and forgot that the tree is "leaflized" as:

    tree = {
      'A': set(['B', 'C', 'D', 'E']),
      'B': set(['D', 'E']), 
      'C': set(),
      'D': set(),
      'E': set(),
    }

    I don't want to talk about the actual problem. I think it is sometimes
    hard to give an "absolute" weight value as a key; in this example, the
    ordering is a relationship in a graph. Then the user has to "copy and paste the recipe" to get a cmp function that should already be there. We are
    all adults here, so why not SIMPLY tell the newbie: don't use cmp(), use key(), unless you know what you are doing.

    Thanks for reply.

    @bitdancer
    Member

    cmp is gone. Its chances of coming back are close enough to zero that an assertAlmostEqual test will pass :). The rest of the discussion should move to one of the general Python lists.

    @metal
    Mannequin

    metal mannequin commented Jul 20, 2010

    Shame on me: after a long time I realized that the problem referenced in my old post (http://bugs.python.org/msg102019) was actually topological sorting. It can't be done with Python's sort(), which doesn't support partial orders. Trying to use the cmp parameter is completely wrong, and cmp would mislead people like me into sorting a partial order. Evil! Now I absolutely agree with the removal of cmp. Viva Raymond Hettinger!
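A partial order such as "ancestor before descendant" calls for a topological sort rather than `sort()`. A minimal sketch uses the standard library's `graphlib` (Python 3.9+, which postdates this thread) on the corrected tree above. `TopologicalSorter` expects each node mapped to its predecessors, so the descendant sets are inverted first. Any valid order may come out, since ties are not fixed.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# The corrected ("leaflized") tree from the thread: node -> set of descendants.
tree = {
    "A": {"B", "C", "D", "E"},
    "B": {"D", "E"},
    "C": set(),
    "D": set(),
    "E": set(),
}

# TopologicalSorter wants node -> predecessors: every descendant must come
# after each of its ancestors.
graph = {node: set() for node in tree}
for parent, descendants in tree.items():
    for child in descendants:
        graph[child].add(parent)

order = list(TopologicalSorter(graph).static_order())
print(order)  # one valid order; which one is not guaranteed
```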

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    4 participants