Comparing and Sorting: Additions and reordering
* Reorder sections to make more sense when read sequentially
* In each section, mention
  - what changed
  - why it changed
  - how it changed
  - how to fix it
* Reuse the Person class in "The cmp Argument" to make the example
  more natural (though it is longer this way, sadly)
* Add a new (empty) section -- Unorderable Types
encukou committed Sep 21, 2016
1 parent f694cc2 commit db8ea1f
Showing 1 changed file with 182 additions and 91 deletions.
273 changes: 182 additions & 91 deletions source/comparisons.rst
@@ -1,83 +1,71 @@
Comparing and Sorting
---------------------

Comparing and sorting undergo a large number of changes in Python 3, but much
of the functionality described below has been available since Python 2.4.
Python 3 is strict when comparing objects of disparate types. It also drops
*cmp*-based comparison and sorting in favor of rich comparisons
and key-based sorting, modern alternatives that have been available at least
since Python 2.4.
Details and porting strategies follow.
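
The strictness about disparate types can be seen in a minimal sketch like the
one below. It assumes the code runs on Python 3; the variable names are
illustrative only and do not come from the guide.

```python
# Assumes Python 3: ordering comparisons between unrelated types
# raise TypeError instead of returning an arbitrary result.
try:
    1 < "one"
except TypeError:
    ordering_failed = True
else:
    ordering_failed = False

# The same applies when sorting a list of mixed types.
try:
    sorted([3, "two", 1])
except TypeError:
    mixed_sort_failed = True
else:
    mixed_sort_failed = False

# Equality comparisons between unrelated types are still allowed.
values_equal = (1 == "one")

print(ordering_failed, mixed_sort_failed, values_equal)
```

On Python 2 both ordering operations would succeed silently, which is why such
code tends to surface only when it is first run on Python 3.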

In short, the ``__cmp__()`` special method is never called, there is no ``cmp``
parameter to any of the sorting-related functions, and there is no builtin
``cmp()`` function.
Unorderable Types
~~~~~~~~~~~~~~~~~


The ``cmp`` Argument
~~~~~~~~~~~~~~~~~~~~

* :ref:`Fixer <python-modernize>`: *None*
* Prevalence: Common

In Python 2, ``.sort()`` and ``sorted()`` accept a ``cmp`` argument that
influences the sort order. The ``cmp`` argument is a function
that returns -1, 0 or 1 when comparing two objects. For example::

    >>> def compare(a, b):
    ...     """Compare objects from last letter to first"""
    ...     return cmp(a[::-1], b[::-1])
    >>> animals = ['dog', 'cat', 'horse', 'cow']
    >>> sorted(animals, cmp=compare)
    ['horse', 'dog', 'cat', 'cow']

In Python 3, ``cmp`` is gone. Instead of ``cmp`` there is a ``key`` parameter,
which takes a function that returns the key under which to sort.

The main difference is that, instead of a function that compares
two values directly, there is a function that returns a single value, which
is then compared. The same example implemented with the ``key``
parameter::

    >>> def keyfunction(item):
    ...     """Key for comparison that returns reversed string"""
    ...     return item[::-1]
    >>> animals = ['dog', 'cat', 'horse', 'cow']
    >>> sorted(animals, key=keyfunction)
    ['horse', 'dog', 'cat', 'cow']

Using the ``key`` parameter is easier and faster: a ``cmp`` function
has to be called multiple times for each item being sorted,
while a ``key`` function is called only once per item.

Another advantage of key functions is that they are easily
written as lambdas. Again, the same example as before::

    >>> animals = ['dog', 'cat', 'horse', 'cow']
    >>> sorted(animals, key=lambda item: item[::-1])
    ['horse', 'dog', 'cat', 'cow']


The ``cmp`` Function
~~~~~~~~~~~~~~~~~~~~

* :ref:`Fixer <python-modernize>`: *None*
* Prevalence: Common

Since having both ``__cmp__()`` and rich comparison methods goes against the
principle that there should be only one obvious way to do something, Python 3
ignores the ``__cmp__()`` method. Also, the ``cmp()`` function is gone.

If you really need the ``cmp()`` functionality, you could use the expression::

    (a > b) - (a < b)

as the equivalent of ``cmp(a, b)``, but rich comparisons give you a good way
to handle these changes.
XXX

Rich Comparisons
~~~~~~~~~~~~~~~~

* :ref:`Fixer <python-modernize>`: *None*
* Prevalence: Common

Suppose that you have a class to represent a person with ``__cmp__()``
implemented::
The :meth:`~py2:object.__cmp__` special method is no longer honored in Python 3.

In Python 2, ``__cmp__(self, other)`` implemented comparison between two
objects, returning a negative value if ``self < other``, positive if
``self > other``, and zero if they were equal.

This approach of representing comparison results is common in C-style
languages. But, early in Python 2 development, it became apparent that
only allowing three cases for the relative order of objects is too limiting.

This led to the introduction of *rich comparison* methods, which assign a
special method to each operator:

======== ============
Operator Method
======== ============
==       ``__eq__``
!=       ``__ne__``
<        ``__lt__``
<=       ``__le__``
>        ``__gt__``
>=       ``__ge__``
======== ============

Each takes the same two arguments as *cmp*, and must either return a result
value (typically Boolean), raise an exception, or return ``NotImplemented``
to signal that the operation is not defined.
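
As a rough illustration of the ``NotImplemented`` protocol, consider the
sketch below. The ``Version`` class is hypothetical and exists only for this
example; the behaviour shown for the mixed-type comparison assumes Python 3.

```python
class Version(object):
    """Hypothetical example class, not part of the guide."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        if not isinstance(other, Version):
            # Tell Python this comparison is not defined here.
            return NotImplemented
        return self.number == other.number

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.number < other.number


smaller = Version(1) < Version(2)      # regular Boolean result
equal_to_str = (Version(1) == "1")     # falls back to identity, so False

# Assumes Python 3: when both operands return NotImplemented for an
# ordering operator, the interpreter raises TypeError.
try:
    Version(1) < 3
except TypeError:
    mixed_ordering_failed = True
else:
    mixed_ordering_failed = False
```

Returning ``NotImplemented`` rather than raising directly gives the other
operand a chance to handle the comparison with its reflected method.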

In Python 3, the *cmp* style of comparisons was dropped.
All objects that implemented ``__cmp__`` must be updated to implement *all* of
the rich methods instead.
(There is one exception: on Python 3, ``__ne__`` will, by default, delegate to
``__eq__`` and return the inverted result. However, this is *not* the case
in Python 2.)
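
The ``__ne__`` exception can be checked with a small sketch. The ``Point``
class is a hypothetical example, and the result shown assumes Python 3.

```python
class Point(object):
    """Hypothetical example class that defines only __eq__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    # No __ne__ here. On Python 3, ``!=`` is derived from __eq__.
    # On Python 2, ``!=`` would fall back to an identity-based
    # comparison, so two equal points would also compare as "not equal".


a = Point(1, 2)
b = Point(1, 2)
are_equal = (a == b)
are_different = (a != b)   # False on Python 3
print(are_equal, are_different)
```

Code that must run on both versions should therefore still define
``__ne__`` explicitly.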

To avoid the hassle of providing all six functions, you can implement
``__eq__``, ``__ne__``, and only one of the ordering operators, and use the
:func:`functools.total_ordering` decorator to fill in the rest.
Note that the decorator is not available in Python 2.6. If you need
to support that version, you'll need to supply all six methods.

The ``@total_ordering`` decorator does come at the cost of somewhat slower
execution and more complex stack traces for the derived comparison methods,
so defining all six explicitly may be necessary in some cases even if
Python 2.6 support is dropped.

As an example, suppose that you have a class to represent a person with
``__cmp__()`` implemented::

    class Person(object):
        def __init__(self, firstname, lastname):
@@ -90,36 +78,32 @@ implemented::
        def __repr__(self):
            return "%s %s" % (self.first, self.last)

If the only thing you need to support is sorting, you just need to change
the ``__cmp__()`` implementation to ``__lt__()``. The previous example would
look like this::
With ``total_ordering``, the class would become::

    from functools import total_ordering

    @total_ordering
    class Person(object):

        def __init__(self, firstname, lastname):
            self.first = firstname
            self.last = lastname

        def __eq__(self, other):
            return ((self.last, self.first) == (other.last, other.first))

        def __ne__(self, other):
            return not (self == other)

        def __lt__(self, other):
            return ((self.last, self.first) < (other.last, other.first))

        def __repr__(self):
            return "%s %s" % (self.first, self.last)

Since Python 2.7 and 3.2 there is a simple way to support the other comparison
operators without implementing each of them separately: the
``@total_ordering`` decorator from the ``functools`` module.

If you want to use the ``@total_ordering`` decorator, your class only has to
implement one of ``__lt__()``, ``__le__()``, ``__gt__()``, or ``__ge__()``
and, in addition, it should implement ``__eq__()``. If these conditions are
satisfied, you can use ``@total_ordering`` to gain the rest of the comparison
operators in your class.

Final implementation might look like this::

from functools import total_ordering
If ``total_ordering`` cannot be used, or if efficiency is important,
all methods can be given explicitly::

@total_ordering
    class Person(object):

        def __init__(self, firstname, lastname):
@@ -129,13 +113,120 @@ Final implementation might look like this::
        def __eq__(self, other):
            return ((self.last, self.first) == (other.last, other.first))

        def __ne__(self, other):
            return ((self.last, self.first) != (other.last, other.first))

        def __lt__(self, other):
            return ((self.last, self.first) < (other.last, other.first))

        def __le__(self, other):
            return ((self.last, self.first) <= (other.last, other.first))

        def __gt__(self, other):
            return ((self.last, self.first) > (other.last, other.first))

        def __ge__(self, other):
            return ((self.last, self.first) >= (other.last, other.first))

        def __repr__(self):
            return "%s %s" % (self.first, self.last)

But sometimes it might be better to implement all six comparison methods
manually, because the easy solution with ``@total_ordering`` comes at
the cost of slower execution and more complex stack traces for the
derived comparison methods.

The ``cmp`` Function
~~~~~~~~~~~~~~~~~~~~

* :ref:`Fixer <python-modernize>`: *None*
* Prevalence: Common

As part of the move away from *cmp*-style comparisons, the :func:`py2:cmp`
function was removed in Python 3.

If it is necessary (usually to conform to an external API), it can be provided
by this code::

    def cmp(x, y):
        """
        Replacement for built-in function cmp that was removed in Python 3

        Compare the two objects x and y and return an integer according to
        the outcome. The return value is negative if x < y, zero if x == y
        and strictly positive if x > y.
        """

        return (x > y) - (x < y)

The expression used is not straightforward, so if you need the functionality,
we recommend adding the full, documented function to your project's utility
library.
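
A short usage sketch of the replacement follows. It repeats the function so
that the snippet is self-contained; the sample values are illustrative only.

```python
def cmp(x, y):
    """Replacement for the built-in cmp removed in Python 3."""
    # Each comparison yields a bool; subtracting bools gives -1, 0 or 1.
    return (x > y) - (x < y)


less = cmp(1, 2)         # x < y  -> -1
greater = cmp("b", "a")  # x > y  ->  1
same = cmp(3.0, 3)       # x == y ->  0
print(less, greater, same)
```

The expression works because ``True`` and ``False`` behave as the integers
1 and 0 in arithmetic.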


The ``cmp`` Argument
~~~~~~~~~~~~~~~~~~~~

* :ref:`Fixer <python-modernize>`: *None*
* Prevalence: Uncommon

In Python 2, the ``.sort()`` method and the ``sorted()`` function have a
``cmp`` parameter, which determines the sort order. The argument for ``cmp``
is a function that, like all *cmp*-style functions, returns a negative, zero,
or positive result depending on the order of its two arguments.

For example, given a list of instances of a Person class (defined above)::

    >>> actors = [Person('Eric', 'Idle'),
    ...           Person('John', 'Cleese'),
    ...           Person('Michael', 'Palin'),
    ...           Person('Terry', 'Gilliam'),
    ...           Person('Terry', 'Jones')]
    ...

one way to sort it by last name in Python 2 would be::

    >>> def cmp_last_name(a, b):
    ...     """Compare names by last name"""
    ...     return cmp(a.last, b.last)
    ...
    >>> sorted(actors, cmp=cmp_last_name)
    [John Cleese, Terry Gilliam, Eric Idle, Terry Jones, Michael Palin]

This function is called many times, O(*n* log *n*), during the sort.

As an alternative to *cmp*, sorting functions can take a keyword-only ``key``
parameter, a function that returns the key under which to sort::

    >>> def keyfunction(item):
    ...     """Key for comparison by last name"""
    ...     return item.last
    ...
    >>> sorted(actors, key=keyfunction)
    [John Cleese, Terry Gilliam, Eric Idle, Terry Jones, Michael Palin]

The advantage of this approach is that this function is called only once for
each item.
When simple types such as tuples, strings, and numbers are used for keys,
the many comparisons are then handled by optimized C code.
Also, in most cases key functions are more readable than *cmp*: usually,
one thinks of sorting by some aspect of an object (such as last name),
rather than by comparing individual objects.
The main disadvantage is that the old *cmp* style is commonly used in
C-language APIs, so external libraries are likely to provide similar functions.
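
The "called only once for each item" claim can be checked with a sketch like
the following. The ``Person`` class here is a minimal stand-in with the same
``first`` and ``last`` attributes, and ``counting_key`` and ``key_calls`` are
hypothetical names used only for this illustration.

```python
class Person(object):
    """Minimal stand-in for the Person class used in the text."""

    def __init__(self, firstname, lastname):
        self.first = firstname
        self.last = lastname


actors = [Person('Eric', 'Idle'),
          Person('John', 'Cleese'),
          Person('Michael', 'Palin'),
          Person('Terry', 'Gilliam'),
          Person('Terry', 'Jones')]

key_calls = []  # records every invocation of the key function


def counting_key(person):
    """Key function that also logs each call."""
    key_calls.append(person.last)
    return person.last


by_last_name = sorted(actors, key=counting_key)
last_names = [person.last for person in by_last_name]

# One key call per item, regardless of how many comparisons the sort makes.
print(len(key_calls) == len(actors))
print(last_names)
```

The comparisons themselves then happen between plain strings, which is the
case handled by the optimized C code mentioned above.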

In Python 3, the ``cmp`` parameter was removed, and only ``key`` (or no
argument at all) can be used.

There is no fixer for this change.
However, discovering it is straightforward: calling ``sort`` with the
``cmp`` argument raises ``TypeError`` in Python 3.
Each *cmp* function must be replaced by a *key* function.
There are two ways to do this:

* If the function did a common operation on both arguments, and then compared
  the results, replace it by just the common operation.
  In other words, ``cmp(f(a), f(b))`` should be replaced with ``f(item)``.
* If the above does not apply, wrap the *cmp*-style function with
  :func:`functools.cmp_to_key`. See its documentation for details.
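
Both strategies can be sketched as follows. The word list and the two
*cmp*-style functions are made up for this illustration, and ``cmp`` is
defined locally because the built-in no longer exists in Python 3.

```python
from functools import cmp_to_key


def cmp(x, y):
    """Local replacement for the removed built-in cmp."""
    return (x > y) - (x < y)


words = ["date", "fig", "apple", "kiwi", "pear"]


# Strategy 1: the old function applied a common operation (len) to both
# arguments, so the key function is just that operation.
def cmp_by_length(a, b):
    return cmp(len(a), len(b))


by_length = sorted(words, key=len)


# Strategy 2: the comparison does not reduce to one common operation
# (shorter words first, ties in reverse alphabetical order), so the
# cmp-style function is wrapped with cmp_to_key.
def cmp_length_then_reverse_alpha(a, b):
    return cmp(len(a), len(b)) or cmp(b, a)


by_wrapped = sorted(words, key=cmp_to_key(cmp_length_then_reverse_alpha))

print(by_length)
print(by_wrapped)
```

Wrapping with ``cmp_to_key`` keeps the old function unchanged but still calls
it for every comparison, so the first strategy is usually preferable where it
applies.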

The ``cmp_to_key`` function is not available in Python 2.6, so if you need
to support that version, you'll need to copy it `from Python sources`_.

.. _from Python sources: https://hg.python.org/cpython/file/2.7/Lib/functools.py
