# The Python Data Model

If you learned another object-oriented language before Python, you may have found it strange to call `len(collection)` instead of `collection.len()`. This apparent oddity is the tip of an iceberg that, when properly understood, is the key to everything we call *Pythonic*. The iceberg is called the [Python Data Model](https://docs.python.org/3/reference/datamodel.html), and it describes the API that you can use to make your own objects play well with the most idiomatic language features.

To create pleasant, intuitive and expressive libraries and APIs, you need to leverage the Python Data Model, so that your objects behave consistently with the built-in objects in the language. 

The Data Model is based on a set of fundamental interfaces. Python is a dynamically typed language, so to implement an interface you just code methods with the required names and signatures. You are not required to implement an interface fully if a partial implementation covers your use cases. The Data Model interfaces all use method names prefixed and suffixed with `__` (two underscores), such as `__add__` or `__len__`. These are known as special methods, magic methods or *dunder* methods (after *double underscore*).

The special method names allow your objects to implement, support, and interact with basic language constructs such as:

* Iteration
* Collections
* Attribute access
* Operator overloading
* Function and method invocation
* Object creation and destruction
* String representation and formatting
* Managed contexts (i.e., `with` blocks)

To see the Data Model in action, we'll implement a subset of the **Sequence** interface, which describes the behavior of strings, lists, tuples, arrays and many other Python types.


## A Pythonic Card Deck

Let's code a package to represent decks of playing cards. First, we'll create a simple class to represent an individual card. A card will be a record with two data attributes and no methods. Python has a factory to make such simple classes: ``collections.namedtuple``.

In [1]:
import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])

Note that the ``namedtuple`` factory takes two arguments: the name of the class to create and a sequence of attribute names.

The ``Card`` class can then be instantiated as usual:

In [2]:
beer_card = Card('7', 'diamonds')
beer_card

Card(rank='7', suit='diamonds')

The string representation of a ``Card`` instance is so explicit that you can clone a card by applying ``eval()`` to its `repr()`:

In [3]:
my_card = eval(repr(beer_card))
my_card == beer_card

True
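
Because `namedtuple` builds a subclass of `tuple`, a `Card` gets more than a readable `repr()`. The short sketch below (the variable names are only illustrative) shows access by name and by position, unpacking, and the fact that instances are immutable:

```python
import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])

card = Card('7', 'diamonds')

# Fields are readable by name or by position, like any tuple.
print(card.rank, card[1])  # 7 diamonds

# Tuple unpacking works too.
rank, suit = card

# Instances are immutable: assigning to a field raises AttributeError.
try:
    card.rank = '8'
except AttributeError:
    print('cards are read-only')

# _replace builds a modified copy and leaves the original untouched.
upgraded = card._replace(rank='A')
print(upgraded)  # Card(rank='A', suit='diamonds')
```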

We are now ready to code the class to represent a deck of cards. I'll call it ``FrenchDeck`` since that is the formal name of the set of 52 cards with 4 suits used not only in France but in most of the Western world.

In [4]:
class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()

    def __init__(self):
        self._cards = [Card(rank, suit) for suit in self.suits
                                        for rank in self.ranks]

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]

The code is short but it does a lot.

The trickiest part is the use of a list comprehension in the initializer to build a list of cards by computing the Cartesian product of the lists `suits` and `ranks`. The logic of that list comprehension is explained in chapter 2 of Fluent Python, but right now we just want to focus on the external behavior of the class, so please believe that ``self._cards`` holds a list of 52 ``Card`` instances, as you'll see right away.

In [5]:
deck = FrenchDeck()

len(deck)

52

The ``len`` built-in function knows how to handle a ``FrenchDeck`` because we implemented the ``__len__`` special method. This is consistent with how built-in collections work, and saves the user from memorizing arbitrary method names for common operations ("How to get the number of items? Is it `.length()`, `.size()` or what?").

The `__getitem__` special method supports the use of ``[]`` and provides a lot of functionality.

We can get any card by index, as usual. For example, first and last:

In [6]:
deck[0], deck[-1]

(Card(rank='2', suit='spades'), Card(rank='A', suit='hearts'))

We can also use slice notation to retrieve a subset of the cards:

In [7]:
deck[:3]

[Card(rank='2', suit='spades'),
 Card(rank='3', suit='spades'),
 Card(rank='4', suit='spades')]

Here's how to use the third parameter of a slice to get just the Aces in the deck: start at card index 12 and take every 13th card from that point onwards.

In [8]:
deck[12::13]

[Card(rank='A', suit='spades'),
 Card(rank='A', suit='diamonds'),
 Card(rank='A', suit='clubs'),
 Card(rank='A', suit='hearts')]
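
To see why one `__getitem__` method handles both indexes and slices, it may help to look at what Python actually passes to it. The `SliceSpy` class below is a hypothetical helper, not part of the deck code: it simply returns the argument it receives.

```python
class SliceSpy:
    """Hypothetical helper: returns whatever [] passes to __getitem__."""

    def __getitem__(self, position):
        return position


spy = SliceSpy()
print(spy[0])       # 0
print(spy[:3])      # slice(None, 3, None)
print(spy[12::13])  # slice(12, None, 13)

# slice.indices() resolves the missing bounds for a sequence of a given length.
aces = list(range(*slice(12, None, 13).indices(52)))
print(aces)  # [12, 25, 38, 51]
```

Since `FrenchDeck.__getitem__` hands `position` straight to the underlying `list`, the list does all the work of interpreting the `slice` object.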

The `in` operator also works with our `FrenchDeck` instances. This behavior can be optimized by implementing a `__contains__` method, but if you only provide a `__getitem__` method, Python falls back to scanning the collection from item 0 until it finds a match or reaches the end.

In [9]:
Card('Q', 'hearts') in deck

True

In [10]:
Card('Z', 'clubs') in deck

False
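
For a 52-card deck the sequential scan is perfectly adequate. Still, here is a minimal sketch of what a `__contains__` optimization could look like. `FastLookupDeck` is a hypothetical variant, and it assumes the set of cards in the deck never changes after construction:

```python
import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])


class FastLookupDeck:
    """Hypothetical variant of FrenchDeck with a set-backed `in` test."""
    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()

    def __init__(self):
        self._cards = [Card(rank, suit) for suit in self.suits
                                        for rank in self.ranks]
        # Assumption: no cards are added or removed later,
        # so this set stays in sync with the list.
        self._card_set = set(self._cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]

    def __contains__(self, card):
        # Hash lookup instead of a linear scan through __getitem__.
        return card in self._card_set


fast_deck = FastLookupDeck()
print(Card('Q', 'hearts') in fast_deck)  # True
print(Card('Z', 'clubs') in fast_deck)   # False
```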

### Iteration

Our card decks are also iterable. Implementing an `__iter__` method to return a custom iterator is the optimal way to achieve this. But, as a fallback, Python knows how to iterate over any collection that implements `__getitem__` and accepts integer indexes starting at 0:

In [11]:
for card in deck:
    print(card)

Card(rank='2', suit='spades')
Card(rank='3', suit='spades')
Card(rank='4', suit='spades')
Card(rank='5', suit='spades')
Card(rank='6', suit='spades')
Card(rank='7', suit='spades')
Card(rank='8', suit='spades')
Card(rank='9', suit='spades')
Card(rank='10', suit='spades')
Card(rank='J', suit='spades')
Card(rank='Q', suit='spades')
Card(rank='K', suit='spades')
Card(rank='A', suit='spades')
Card(rank='2', suit='diamonds')
Card(rank='3', suit='diamonds')
Card(rank='4', suit='diamonds')
Card(rank='5', suit='diamonds')
Card(rank='6', suit='diamonds')
Card(rank='7', suit='diamonds')
Card(rank='8', suit='diamonds')
Card(rank='9', suit='diamonds')
Card(rank='10', suit='diamonds')
Card(rank='J', suit='diamonds')
Card(rank='Q', suit='diamonds')
Card(rank='K', suit='diamonds')
Card(rank='A', suit='diamonds')
Card(rank='2', suit='clubs')
Card(rank='3', suit='clubs')
Card(rank='4', suit='clubs')
Card(rank='5', suit='clubs')
Card(rank='6', suit='clubs')
Card(rank='7', suit='clubs')
Card(rank='8', suit='clubs')
...
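
The two routes to iteration can be compared side by side with a toy class. `Countdown` and `IterCountdown` are hypothetical names used only for this sketch: the first relies on the `__getitem__` fallback, where Python asks for index 0, 1, 2... until an `IndexError` is raised, and the second provides an explicit `__iter__`.

```python
class Countdown:
    """Hypothetical sequence-like class that only implements __getitem__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if not 0 <= index < self.start:
            raise IndexError(index)  # signals the end of the fallback iteration
        return self.start - index


class IterCountdown(Countdown):
    """Same data, but with an explicit __iter__ method."""

    def __iter__(self):
        return iter(range(self.start, 0, -1))


print(list(Countdown(3)))      # [3, 2, 1]  via repeated __getitem__ calls
print(list(IterCountdown(3)))  # [3, 2, 1]  via __iter__
```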

By supporting iteration, we can leverage many functions in the standard library that work with iterables, like `enumerate()`, `reversed()` as well as the constructor for `list` and several other collection types.

In [12]:
list(enumerate(reversed(deck), 1))

[(1, Card(rank='A', suit='hearts')),
 (2, Card(rank='K', suit='hearts')),
 (3, Card(rank='Q', suit='hearts')),
 (4, Card(rank='J', suit='hearts')),
 (5, Card(rank='10', suit='hearts')),
 (6, Card(rank='9', suit='hearts')),
 (7, Card(rank='8', suit='hearts')),
 (8, Card(rank='7', suit='hearts')),
 (9, Card(rank='6', suit='hearts')),
 (10, Card(rank='5', suit='hearts')),
 (11, Card(rank='4', suit='hearts')),
 (12, Card(rank='3', suit='hearts')),
 (13, Card(rank='2', suit='hearts')),
 (14, Card(rank='A', suit='clubs')),
 (15, Card(rank='K', suit='clubs')),
 (16, Card(rank='Q', suit='clubs')),
 (17, Card(rank='J', suit='clubs')),
 (18, Card(rank='10', suit='clubs')),
 (19, Card(rank='9', suit='clubs')),
 (20, Card(rank='8', suit='clubs')),
 (21, Card(rank='7', suit='clubs')),
 (22, Card(rank='6', suit='clubs')),
 (23, Card(rank='5', suit='clubs')),
 (24, Card(rank='4', suit='clubs')),
 (25, Card(rank='3', suit='clubs')),
 (26, Card(rank='2', suit='clubs')),
 (27, Card(rank='A', suit='diamonds')),
 ...

Another powerful function that works with iterables is `sorted`. It builds a sorted list from iterables that generate a series of comparable values.

In [13]:
sorted(deck)

[Card(rank='10', suit='clubs'),
 Card(rank='10', suit='diamonds'),
 Card(rank='10', suit='hearts'),
 Card(rank='10', suit='spades'),
 Card(rank='2', suit='clubs'),
 Card(rank='2', suit='diamonds'),
 Card(rank='2', suit='hearts'),
 Card(rank='2', suit='spades'),
 Card(rank='3', suit='clubs'),
 Card(rank='3', suit='diamonds'),
 Card(rank='3', suit='hearts'),
 Card(rank='3', suit='spades'),
 Card(rank='4', suit='clubs'),
 Card(rank='4', suit='diamonds'),
 Card(rank='4', suit='hearts'),
 Card(rank='4', suit='spades'),
 Card(rank='5', suit='clubs'),
 Card(rank='5', suit='diamonds'),
 Card(rank='5', suit='hearts'),
 Card(rank='5', suit='spades'),
 Card(rank='6', suit='clubs'),
 Card(rank='6', suit='diamonds'),
 Card(rank='6', suit='hearts'),
 Card(rank='6', suit='spades'),
 Card(rank='7', suit='clubs'),
 Card(rank='7', suit='diamonds'),
 Card(rank='7', suit='hearts'),
 Card(rank='7', suit='spades'),
 Card(rank='8', suit='clubs'),
 Card(rank='8', suit='diamonds'),
 Card(rank='8', suit='hearts'),
 ...

We can define custom sorting criteria by implementing a function to produce a key from each item in the series, and passing it as the `key=` argument to `sorted`. Here is a function that implements the "spades high" ordering, where cards are sorted by rank and, within each rank, spades is the highest suit, followed by hearts, diamonds and clubs:

In [14]:
def spades_high(card):
    rank_value = FrenchDeck.ranks.index(card.rank)
    return (rank_value, card.suit)

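As written, `spades_high` produces the highest key value for the Ace of spades and the lowest for the 2 of clubs. The suit names are compared as strings, and it happens that their alphabetical order (`'clubs' < 'diamonds' < 'hearts' < 'spades'`) matches the desired suit ranking: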

In [15]:
spades_high(Card('A', 'spades')), spades_high(Card('2', 'clubs'))

((12, 'spades'), (0, 'clubs'))

In [16]:
sorted(deck, key=spades_high)

[Card(rank='2', suit='clubs'),
 Card(rank='2', suit='diamonds'),
 Card(rank='2', suit='hearts'),
 Card(rank='2', suit='spades'),
 Card(rank='3', suit='clubs'),
 Card(rank='3', suit='diamonds'),
 Card(rank='3', suit='hearts'),
 Card(rank='3', suit='spades'),
 Card(rank='4', suit='clubs'),
 Card(rank='4', suit='diamonds'),
 Card(rank='4', suit='hearts'),
 Card(rank='4', suit='spades'),
 Card(rank='5', suit='clubs'),
 Card(rank='5', suit='diamonds'),
 Card(rank='5', suit='hearts'),
 Card(rank='5', suit='spades'),
 Card(rank='6', suit='clubs'),
 Card(rank='6', suit='diamonds'),
 Card(rank='6', suit='hearts'),
 Card(rank='6', suit='spades'),
 Card(rank='7', suit='clubs'),
 Card(rank='7', suit='diamonds'),
 Card(rank='7', suit='hearts'),
 Card(rank='7', suit='spades'),
 Card(rank='8', suit='clubs'),
 Card(rank='8', suit='diamonds'),
 Card(rank='8', suit='hearts'),
 Card(rank='8', suit='spades'),
 Card(rank='9', suit='clubs'),
 Card(rank='9', suit='diamonds'),
 Card(rank='9', suit='hearts'),
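
The `spades_high` key works because the suit names sort alphabetically in the intended order. If that coincidence feels fragile, one alternative is to make the suit ranking explicit. This is only a sketch: the `suit_values` mapping and the `spades_high_explicit` name are assumptions introduced here, and `ranks` duplicates `FrenchDeck.ranks` so the block runs on its own.

```python
import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])

# Same list as FrenchDeck.ranks, repeated here to keep the example standalone.
ranks = [str(n) for n in range(2, 11)] + list('JQKA')

# Assumed suit weights: a higher number means a stronger suit.
suit_values = dict(spades=3, hearts=2, diamonds=1, clubs=0)


def spades_high_explicit(card):
    """Sort key: rank first, then the explicit suit weight."""
    rank_value = ranks.index(card.rank)
    return (rank_value, suit_values[card.suit])


print(spades_high_explicit(Card('A', 'spades')))  # (12, 3)
print(spades_high_explicit(Card('2', 'clubs')))   # (0, 0)
```

With this version, changing the suit ranking is a matter of editing `suit_values` rather than renaming suits.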


### Monkey patching

The standard library provides many functions that operate on sequences. For example, picking a random item is as simple as this:

In [25]:
import random
random.choice(deck)

Card(rank='5', suit='diamonds')

This is a live notebook, so each time you run the code above, the random choice will be computed again, producing different results.

How about shuffling? Let's try it next.

In [18]:
try:
    random.shuffle(deck)
except TypeError as e:  # this error is expected!
    print(repr(e))  
else:
    print('The deck was shuffled!')

TypeError("'FrenchDeck' object does not support item assignment",)


The first time you run this notebook you should see an exception above: `TypeError: 'FrenchDeck' object does not support item assignment`. The problem is that `random.shuffle` works by rearranging the items in-place, but it can only do `deck[i] = card` if the sequence implements the `__setitem__` method. 
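
To make the requirement concrete, here is a simplified Fisher-Yates shuffle. It is a sketch of the general technique, not the actual source of `random.shuffle`, and `naive_shuffle` is a name made up for this example. The swap on the last line of the loop is where item assignment becomes necessary:

```python
import random


def naive_shuffle(seq):
    """Simplified Fisher-Yates shuffle (illustrative sketch only)."""
    for i in range(len(seq) - 1, 0, -1):
        j = random.randint(0, i)         # inclusive on both ends
        seq[i], seq[j] = seq[j], seq[i]  # needs __getitem__ and __setitem__


numbers = list(range(10))
naive_shuffle(numbers)
print(sorted(numbers) == list(range(10)))  # True: same items, rearranged

try:
    naive_shuffle((1, 2, 3))  # tuples do not implement __setitem__
except TypeError as e:
    print(repr(e))
```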

We could redefine the whole `FrenchDeck` class here, but let's do a *monkey patch*, just for fun. Monkey patching is changing classes or modules at run time. To enable shuffling, we can create a function that puts a card in a certain position of a deck: 

In [19]:
def put(deck, index, card):
    deck._cards[index] = card

Note how `put` is tricky: it assigns to the "private" attribute `deck._cards`. Monkey patches are naughty. They often touch the intimate parts of the target objects.

Now we can patch the `FrenchDeck` class to insert the `put` function as its `__setitem__` method:

In [20]:
FrenchDeck.__setitem__ = put

Now we can shuffle the deck and get the first five cards to verify:

In [27]:
random.shuffle(deck)
deck[:5]

[Card(rank='J', suit='clubs'),
 Card(rank='Q', suit='diamonds'),
 Card(rank='3', suit='hearts'),
 Card(rank='8', suit='diamonds'),
 Card(rank='2', suit='clubs')]

Again, in a live notebook such as this, each time you run the cell above you should get a different result.

If you want to disable item assignment to experiment further, you can delete `__setitem__` from the `FrenchDeck` class. Then `random.shuffle` will stop working. Uncomment and run the next cell to try this.

In [22]:
# del FrenchDeck.__setitem__
# random.shuffle(deck)  # <-- this will break

Monkey patching has a bad reputation among Pythonistas. Monkey patches are often tightly bound to the implementation details of the patched code, so we apply them only as a last resort. 

However, some important Python projects use this technique to great effect. For example, the `gevent` networking library [uses monkey patching extensively](http://www.gevent.org/intro.html#monkey-patching) to make the Python standard library support highly concurrent network I/O.

Here, monkey patching was a didactic device to illustrate these ideas:

* Classes are objects too, so you can add attributes to them at run-time.
* Methods are merely functions assigned to class attributes.
* What makes a method "special" is naming: Python recognizes a fixed set of special method names, such as `__setitem__`.
* Many standard operations are implemented by special methods. For example, getting and setting items in sequences triggers the `__getitem__` and `__setitem__` methods.
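
These points can be checked in isolation with a throwaway class. `Greeter` and `shout` are hypothetical names used only for this sketch:

```python
class Greeter:
    def __init__(self, name):
        self.name = name


def shout(self):
    return self.name.upper() + '!'


g = Greeter('deck')  # instance created before any patching

# A plain function assigned to a class attribute becomes a method,
# even for instances that already exist.
Greeter.shout = shout
print(g.shout())  # DECK!

# Using a special name makes the built-in len() work.
Greeter.__len__ = lambda self: len(self.name)
print(len(g))  # 4

# Special methods are looked up on the class, not on the instance,
# so an instance attribute with the same name is ignored by len().
g.__len__ = lambda: 99
print(len(g))  # 4
```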

Most of the special method names supported by Python are in the [Data model](https://docs.python.org/3/reference/datamodel.html) chapter of the Python Language Reference.

### What we saw so far

The `FrenchDeck` example demonstrates how smart use of Python features lets us go very far with just a little coding.

`FrenchDeck` implicitly inherits from `object`, but its functionality is not inherited: it comes from leveraging the Data Model and composition. The `__len__` and `__getitem__` methods delegate all the work to a `list` object, `self._cards`. Note that my code never calls special methods directly. They are called by the interpreter.

By implementing the special methods `__len__` and `__getitem__`, our class behaves like a basic Python sequence, allowing it to benefit from core language features such as: 

* The `len()` built-in function.
* Item access.
* Slicing.
* Iteration.
* Several functions that accept sequences or iterables (e.g. `list`, `enumerate`, `sorted`, `random.choice`).

By adding `__setitem__`, the deck became mutable, thus supporting `random.shuffle`. When I first wrote this example years ago I actually did implement a `FrenchDeck.shuffle` method. But then I realized that I was already coding a sequence-like object, so I should just use the existing sequence shuffling function in the standard library.

The main point is this: if you follow the conventions of the Data Model, your users can take more advantage of the standard library. So your objects will be easier to use and more powerful at the same time.

The next example will show how special methods are used for operator overloading.

## Mathematical vectors

Programmers often think of a vector as a synonym for array, but we'll work with Euclidean vectors used in Math and Physics, like these:

<img src="img/vectors550x473.png">

The picture above illustrates a vector addition. We'd like to represent those objects in code, like this:

```python
v1 = Vector(2, 4)
v2 = Vector(2, 1)
v3 = v1 + v2
print(v3)  # --> Vector(4, 5)
```

Let's start with the basic methods every object should have: `__init__` and `__repr__` (for testing and debugging).

### Vector take #1: initialization and inspection

Our first step enables building and inspecting instances of `Vector`. We'll use an `array` of doubles to store the arguments passed to the constructor:

In [37]:
from array import array

class Vector:

    def __init__(self, *components):
        self._components = array('d', components)

    def __repr__(self):
        components_str = ', '.join(str(x) for x in self._components)
        return '{}({})'.format(self.__class__.__name__, components_str)

Like the standard Python console and debugger, IPython uses `repr(v1)` to render a `v1` object, triggering a call to `v1.__repr__()`:

In [38]:
v1 = Vector(2, 4)
v1

Vector(2.0, 4.0)

The `__repr__` method uses a generator expression to iterate over the array of floats to render each as a string.
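
A couple of consequences of this `__repr__` are worth trying out. The sketch below repeats the `Vector` class so it can run on its own; `Velocity` is a hypothetical subclass added only to show why the method reads the class name from `self.__class__.__name__` instead of hard-coding `'Vector'`.

```python
from array import array


class Vector:

    def __init__(self, *components):
        self._components = array('d', components)

    def __repr__(self):
        components_str = ', '.join(str(x) for x in self._components)
        return '{}({})'.format(self.__class__.__name__, components_str)


class Velocity(Vector):  # hypothetical subclass, for illustration only
    pass


v = Vector(3, 4)
print(repr(v))  # Vector(3.0, 4.0)

# With no __str__ defined, str() and print() fall back to __repr__.
print(str(v))   # Vector(3.0, 4.0)

# Containers use repr() to display their items.
print([v, v])   # [Vector(3.0, 4.0), Vector(3.0, 4.0)]

# The subclass gets an accurate repr for free.
print(Velocity(1, 0))  # Velocity(1.0, 0.0)
```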

### Detour: why `repr` and `__repr__`, `len` and `__len__`?

You may be wondering why we say that Python calls `repr` but we implement `__repr__`. With the `FrenchDeck` it was the same thing: we saw that `len(deck)` resulted in a call to `deck.__len__()`.

There is a practical reason: for built-in types, a call such as `len(obj)` does not invoke `obj.__len__()`. If `obj` is an instance of a variable-length built-in type coded in C, its memory representation has a struct named `PyVarObject` with an `ob_size` field. In that case, `len(obj)` just returns the value of the `ob_size` field, avoiding an expensive dynamic attribute lookup and method call. Only if `obj` is an instance of a user-defined type does `len()` call the `__len__()` special method, as a fallback.

A similar rationale explains why Java arrays have a `.length` attribute, while most Java collections implement `.length()` or `.size()` methods. The difference is that Python strives for consistency: it optimizes the operation of the fundamental built-in types, but allows our own types to behave consistently by implementing special methods.
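
The C-level shortcut for built-in types cannot be observed from pure Python, but the fallback for user-defined types can. In this sketch, `Playlist` and `Broken` are hypothetical classes: the first counts how often `len()` reaches its `__len__`, and the second shows that `len()` is more than an alias for the method, because it also validates the result.

```python
class Playlist:
    """Hypothetical user-defined collection that counts __len__ calls."""

    def __init__(self, tracks):
        self._tracks = list(tracks)
        self.len_calls = 0

    def __len__(self):
        self.len_calls += 1
        return len(self._tracks)


class Broken:
    """Hypothetical class whose __len__ returns an invalid value."""

    def __len__(self):
        return -1


playlist = Playlist(['intro', 'theme', 'outro'])
print(len(playlist))       # 3
print(playlist.len_calls)  # 1: the interpreter called __len__ for us

try:
    len(Broken())
except ValueError as e:
    print(repr(e))  # len() rejects a negative length
```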