# Chapter 1. The Python Data Model

The Python interpreter invokes special methods to perform basic object operations, often triggered by special syntax. The special method names are always written with leading and trailing double underscores (e.g., `__getitem__`). For example, the syntax `obj[key]` is supported by the `__getitem__` special method. In order to evaluate `my_collection[key]`, the interpreter calls `my_collection.__getitem__(key)`.

The special method names allow your objects to implement, support, and interact with basic language constructs such as:
- Iteration
- Collections
- Attribute access
- Operator overloading
- Function and method invocation
- Object creation and destruction
- String representation and formatting
- Managed contexts (i.e., `with` blocks)

### A Pythonic Card Deck

The following is a simple example demonstrating the power of two special methods, `__getitem__` and `__len__`.

**Example 1-1** is a class to represent a deck of playing cards.

*Example 1-1. A deck as a sequence of cards*

In [1]:
import collections

Card = collections.namedtuple('Card',['rank','suit'])

class FrenchDeck:
    ranks = [str(n) for n in range(2,11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()
    
    def __init__(self):
        self._cards = [Card(rank, suit) for suit in self.suits
                                        for rank in self.ranks]
        
    def __len__(self):
        return len(self._cards)
    
    def __getitem__(self, position):
        return self._cards[position]

In [2]:
beer_card = Card('7', 'diamonds')
beer_card

Card(rank='7', suit='diamonds')

In [3]:
deck = FrenchDeck()
len(deck)

52

Reading specific cards from the deck, say the first or the last, should be as easy as `deck[0]` or `deck[-1]`, and this is what the `__getitem__` method provides:

In [4]:
deck[0]

Card(rank='2', suit='spades')

In [5]:
deck[-1]

Card(rank='A', suit='hearts')

Python already has a function we can use to get a random item from a sequence: `random.choice`. We can just use it on a deck instance:

In [6]:
from random import choice
choice(deck)

Card(rank='K', suit='diamonds')

In [7]:
choice(deck)

Card(rank='K', suit='clubs')

In [8]:
choice(deck)

Card(rank='J', suit='hearts')

So far, we have seen two advantages of using special methods to leverage the Python data model:
- Users of your classes don't have to memorize arbitrary method names for standard operations (e.g., is it `.size()` or `.length()`?).
- It's easier to benefit from the Python standard library (like `random.choice`) than to reinvent the wheel.

Because `__getitem__` delegates to the `[]` operator of `self._cards`, our deck automatically supports slicing. Here's how to pick the top three cards from a brand new deck, and then pick just the aces by starting on index 12 and skipping 13 cards at a time:

In [9]:
deck[:3]

[Card(rank='2', suit='spades'),
 Card(rank='3', suit='spades'),
 Card(rank='4', suit='spades')]

In [10]:
deck[12::13]

[Card(rank='A', suit='spades'),
 Card(rank='A', suit='diamonds'),
 Card(rank='A', suit='clubs'),
 Card(rank='A', suit='hearts')]

Just by implementing the `__getitem__` special method, we made our deck iterable:

In [11]:
for card in deck[:10]:
    print(card)

Card(rank='2', suit='spades')
Card(rank='3', suit='spades')
Card(rank='4', suit='spades')
Card(rank='5', suit='spades')
Card(rank='6', suit='spades')
Card(rank='7', suit='spades')
Card(rank='8', suit='spades')
Card(rank='9', suit='spades')
Card(rank='10', suit='spades')
Card(rank='J', suit='spades')


The deck can also be iterated in reverse:

In [12]:
for card in reversed(deck[:10]):
    print(card)

Card(rank='J', suit='spades')
Card(rank='10', suit='spades')
Card(rank='9', suit='spades')
Card(rank='8', suit='spades')
Card(rank='7', suit='spades')
Card(rank='6', suit='spades')
Card(rank='5', suit='spades')
Card(rank='4', suit='spades')
Card(rank='3', suit='spades')
Card(rank='2', suit='spades')


Iteration is often implicit. If a collection has no `__contains__` method, the `in` operator does a sequential scan. Case in point: `in` works with our `FrenchDeck` class because it is iterable:

In [14]:
Card('Q', 'hearts') in deck

True

In [15]:
Card('7', 'beasts') in deck

False
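
The fallback behind both iteration and `in` can be seen in isolation. Here is a minimal sketch using a hypothetical `Squares` class that defines only `__getitem__` (no `__iter__`, no `__contains__`): Python calls it with 0, 1, 2, and so on, stopping when `IndexError` is raised. The `requested` list is just instrumentation added for this example.

```python
class Squares:
    """Hypothetical class: defines __getitem__, but no __iter__ or __contains__."""

    def __init__(self, n):
        self.n = n
        self.requested = []  # records every index Python asks for

    def __getitem__(self, index):
        self.requested.append(index)
        if index >= self.n:
            # IndexError tells the iteration machinery the sequence is exhausted
            raise IndexError(index)
        return index * index


sq = Squares(5)
print(list(sq))       # [0, 1, 4, 9, 16]: iteration falls back to sq[0], sq[1], ...
print(sq.requested)   # [0, 1, 2, 3, 4, 5]: index 5 raised IndexError

sq.requested.clear()
print(9 in sq)        # True: `in` scans sequentially...
print(sq.requested)   # [0, 1, 2, 3]: ...and stops at the first match
```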

How about sorting? A common system of ranking cards is by rank (with aces being highest), then by suit in the order of spades (highest), hearts, diamonds, and clubs (lowest). Here is a function that ranks cards by that rule, returning 0 for the 2 of clubs and 51 for the ace of spades:

In [16]:
suit_values = dict(spades=3, hearts=2, diamonds=1, clubs=0)

def spades_high(card):
    rank_value = FrenchDeck.ranks.index(card.rank)
    return rank_value * len(suit_values) + suit_values[card.suit]

Given `spades_high`, we can now list our deck in order of increasing rank:

In [17]:
for card in sorted(deck, key=spades_high)[:10]:
    print(card)

Card(rank='2', suit='clubs')
Card(rank='2', suit='diamonds')
Card(rank='2', suit='hearts')
Card(rank='2', suit='spades')
Card(rank='3', suit='clubs')
Card(rank='3', suit='diamonds')
Card(rank='3', suit='hearts')
Card(rank='3', suit='spades')
Card(rank='4', suit='clubs')
Card(rank='4', suit='diamonds')


Although `FrenchDeck` implicitly inherits from `object`, most of its functionality is not inherited, but comes from leveraging the data model and composition. By implementing the special methods `__len__` and `__getitem__`, our `FrenchDeck` behaves like a standard Python sequence, allowing it to benefit from core language features (e.g., iteration and slicing) and from the standard library, as shown by the examples using `random.choice`, `reversed`, and `sorted`. Thanks to composition, the `__len__` and `__getitem__` implementations can hand off all the work to a `list` object, `self._cards`.

So far, a `FrenchDeck` cannot be shuffled, because it is *immutable*: the cards and their positions cannot be changed, except by violating encapsulation and handling the `_cards` attribute directly. In Chapter 11, that will be fixed by adding a one-line `__setitem__` method.
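
A minimal sketch of that fix, assuming a `__setitem__` that simply delegates to `self._cards` (the Chapter 11 version may differ in details such as parameter names). Once item assignment works, `random.shuffle` can rearrange the deck in place, because it only needs `len(deck)`, `deck[i]`, and `deck[i] = card`.

```python
import collections
import random

Card = collections.namedtuple('Card', ['rank', 'suit'])


class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()

    def __init__(self):
        self._cards = [Card(rank, suit) for suit in self.suits
                                        for rank in self.ranks]

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]

    def __setitem__(self, position, card):
        # Assumed implementation: delegating to the list makes the deck mutable
        self._cards[position] = card


deck = FrenchDeck()
random.shuffle(deck)  # swaps items in place via __len__, __getitem__, __setitem__
print(deck[:3])       # three random cards
```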

### How Special Methods are Used

Special methods are meant to be called by the Python interpreter, and not by you. You don't write `my_object.__len__()`. You write `len(my_object)` and, if `my_object` is an instance of a user-defined class, then Python calls the `__len__` instance method you implemented.

But for built-in types like `list`, `str`, `bytearray`, and so on, the interpreter takes a shortcut: the CPython implementation of `len()` actually returns the value of the `ob_size` field in the `PyVarObject` C struct that represents any variable-sized built-in object in memory. This is much faster than calling a method.

More often than not, the special method call is implicit. `for i in x:` actually causes the invocation of `iter(x)`, which in turn may call `x.__iter__()` if that is available.
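
To watch the interpreter make these calls, here is a sketch with a hypothetical `Tracer` class whose special methods log their own names; the user code below never calls them directly.

```python
class Tracer:
    """Hypothetical class that records which special methods get invoked."""

    def __init__(self, items):
        self._items = list(items)
        self.calls = []

    def __len__(self):
        self.calls.append('__len__')
        return len(self._items)

    def __iter__(self):
        self.calls.append('__iter__')
        return iter(self._items)


t = Tracer('abc')
len(t)          # the len() built-in calls t.__len__()
for ch in t:    # the for statement calls iter(t), which calls t.__iter__()
    pass
print(t.calls)  # ['__len__', '__iter__']
```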

Normally, your code should not have many direct calls to special methods. Unless you are doing a lot of metaprogramming, you should be implementing special methods more often than invoking them explicitly.

The only special method that is frequently called by user code directly is `__init__`, to invoke the initializer of the superclass in your own `__init__` implementation.

If you need to invoke a special method, it is usually better to call the related built-in function (e.g., `len`, `iter`, `str`, etc.). These built-ins call the corresponding special method, but often provide other services, and, for built-in types, are faster than method calls.
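
One example of those extra services: `len()` validates the value returned by `__len__`, while a direct call just hands back whatever the method returns. A sketch with a deliberately broken, hypothetical `BadLength` class:

```python
class BadLength:
    """Hypothetical class whose __len__ violates the protocol."""

    def __len__(self):
        return -1


obj = BadLength()
print(obj.__len__())  # -1: the direct call performs no validation

try:
    len(obj)
except ValueError as exc:
    # the built-in rejects negative lengths
    print('ValueError:', exc)
```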

Avoid creating arbitrary, custom attributes with the `__foo__` syntax because such names may acquire special meanings in the future, even if they are unused today.

### Emulating Numeric Types

Several special methods allow user objects to respond to operators such as `+`. We will implement a class to represent two-dimensional vectors (Euclidean vectors like those used in math and physics).

```
>>> v1 = Vector(2,4)
>>> v2 = Vector(2,1)
>>> v1 + v2
Vector(4, 5)
```
Note how the `+` operator produces a `Vector` result, which is displayed in a friendly manner in the console.

The `abs` built-in function returns the absolute value of integers and floats, and the magnitude of `complex` numbers, so to be consistent, our API also uses `abs` to calculate the magnitude of a vector:
```
>>> v = Vector(3,4)
>>> abs(v)
5.0
```

We can also implement the `*` operator to perform scalar multiplication (i.e., multiplying a vector by a number to produce a new vector with the same direction and a multiplied magnitude):
```
>>> v * 3
Vector(9, 12)
>>> abs(v * 3)
15.0
```

**Example 1-2** is a `Vector` class implementing the operations just described, through the use of the special methods `__repr__`, `__abs__`, `__add__`, and `__mul__`.

*Example 1-2. A simple two-dimensional vector class*

In [18]:
from math import hypot

class Vector:
    
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
        
    def __repr__(self):
        return 'Vector(%r, %r)' % (self.x, self.y)
    
    def __abs__(self):
        return hypot(self.x, self.y)
    
    def __bool__(self):
        return bool(abs(self))
    
    def __add__(self, other):
        x = self.x + other.x
        y = self.y + other.y
        return Vector(x, y)
    
    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)

### String Representation

The `__repr__` special method is called by the `repr` built-in to get the string representation of the object for inspection. If we did not implement `__repr__`, vector instances would be shown in the console like `<Vector object at 0x10e100070>`.

The interactive console and debugger call `repr` on the results of the expressions evaluated, as does the `%r` placeholder in classic formatting with the % operator, and the `!r` conversion field in the new Format String Syntax used in the `str.format` method.

Note that in our `__repr__` implementation, we used `%r` to obtain the standard representation of the attributes to be displayed. This is good practice, because it shows the crucial difference between `Vector(1, 2)` and `Vector('1', '2')` - the latter would not work in the context of this example, because the constructor's arguments must be numbers, not `str`.
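
A quick check of this point, using a trimmed-down copy of the `Vector` class (only `__init__` and `__repr__`): the `%r` placeholder and the `!r` conversion field both go through `repr`, so numeric and string arguments are displayed differently.

```python
class Vector:
    """Trimmed-down copy of Example 1-2: only __init__ and __repr__."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        # %r calls repr() on each attribute, so '1' shows up with quotes
        return 'Vector(%r, %r)' % (self.x, self.y)


v = Vector(1, 2)
w = Vector('1', '2')
print(repr(v))               # Vector(1, 2)
print('{!r}'.format(w))      # Vector('1', '2')
print('%r and %r' % (v, w))  # Vector(1, 2) and Vector('1', '2')
```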

The strings returned by `__repr__` should be unambiguous and, if possible, match the source code necessary to re-create the object being represented. That is why our chosen representation looks like calling the constructor of the class (e.g., `Vector(3, 4)`).

Contrast `__repr__` with `__str__`, which is called by the `str()` constructor and implicitly used by the `print` function. `__str__` should return a string suitable for display to end users.

If you only implement one of these special methods, choose `__repr__`, because when no custom `__str__` is available, Python will call `__repr__` as a fallback.
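
The fallback can be sketched with two hypothetical classes: `OnlyRepr` defines just `__repr__`, so `str()` and `print` reuse it; `Both` also defines `__str__`, which then takes over for end-user display.

```python
class OnlyRepr:
    """Hypothetical class with __repr__ only."""

    def __repr__(self):
        return 'OnlyRepr()'


class Both:
    """Hypothetical class with both __repr__ and __str__."""

    def __repr__(self):
        return 'Both()'

    def __str__(self):
        return 'a friendly description'


print(str(OnlyRepr()))  # OnlyRepr(): str() falls back to __repr__
print(Both())           # a friendly description: print uses __str__
print([Both()])         # [Both()]: a list displays its items with repr
```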

### Arithmetic Operators

Example 1-2 implements two operators: `+` and `*`, to show basic usage of `__add__` and `__mul__`. Note that in both cases, the methods create and return a new instance of `Vector`, and do not modify either operand - `self` or `other` are merely read. This is the expected behavior of infix operators: to create new objects and not touch their operands.
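
A short check of that behavior, using an abridged copy of the `__init__`, `__repr__`, `__add__`, and `__mul__` methods from Example 1-2: the operators return fresh `Vector` instances and leave `v1` and `v2` as they were.

```python
class Vector:
    """Abridged copy of Example 1-2: just enough for + and *."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r, %r)' % (self.x, self.y)

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)


v1 = Vector(2, 4)
v2 = Vector(2, 1)
v3 = v1 + v2
print(v3)        # Vector(4, 5)
print(v1, v2)    # Vector(2, 4) Vector(2, 1): operands unchanged
print(v3 is v1)  # False: a new object was created
print(v1 * 3)    # Vector(6, 12)
```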

### Boolean Value of a Custom Type

Although Python has a `bool` type, it accepts any object in a boolean context, such as the expression controlling an `if` or `while` statement, or as operands to `and`, `or`, and `not`. To determine whether a value `x` is *truthy* or *falsy*, Python applies `bool(x)`, which always returns `True` or `False`. 

By default, instances of user-defined classes are considered truthy, unless either `__bool__` or `__len__` is implemented. Basically, `bool(x)` calls `x.__bool__()` and uses the result. If `__bool__` is not implemented, Python tries to invoke `x.__len__()`, and if that returns zero, `bool` returns `False`. Otherwise `bool` returns `True`.
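
A sketch of these rules with two hypothetical classes: `Plain` defines neither method, and `Bag` defines `__len__` but not `__bool__`, so its truth value follows its length.

```python
class Plain:
    """Hypothetical class with neither __bool__ nor __len__: always truthy."""


class Bag:
    """Hypothetical class with __len__ only, so bool() falls back to it."""

    def __init__(self, *items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


print(bool(Plain()))   # True
print(bool(Bag()))     # False: len() is 0
print(bool(Bag('x')))  # True: len() is 1

if not Bag():
    print('an empty Bag is falsy in an if statement')
```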

Note how special method `__bool__` allows your objects to be consistent with the truth value testing rules defined in the "Built-in Types" chapter of *The Python Standard Library* documentation.

> ### NOTE 
> A faster implementation of `Vector.__bool__` is this:
> ```
> def __bool__(self):
>     return bool(self.x or self.y)
> ```
> This is harder to read, but avoids the trip through `abs`, `__abs__`, the squares and square root. The explicit conversion to `bool` is needed because `__bool__` must return a boolean and `or` returns either operand as is: `x or y` evaluates to `x` if that is *truthy*, otherwise the result is `y`, whatever that is.
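
Two quick checks of that note. First, `or` returns one of its operands rather than `True` or `False`. Second, a `__bool__` that returns a non-`bool` value makes `bool()` raise `TypeError`. The `NotQuiteBool` class is hypothetical, built only to trigger that error.

```python
print(0 or 3)        # 3: the second operand, returned as is
print(0.0 or 0)      # 0: both are falsy, so the last operand is returned
print(bool(0 or 3))  # True


class NotQuiteBool:
    """Hypothetical class with a buggy __bool__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __bool__(self):
        # Bug on purpose: returns an operand, not a bool
        return self.x or self.y


try:
    bool(NotQuiteBool(0, 3))
except TypeError as exc:
    print('TypeError:', exc)
```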

### Overview of Special Methods

The "Data Model" chapter of *The Python Language Reference* lists 83 special method names, 47 of which are used to implement arithmetic, bitwise, and comparison operators.

As an overview of what is available, see Tables 1-1 and 1-2.

*Table 1-1. Special method names (operators excluded)*

| Category | Method Names|
| :------: | :---------: |
| String/bytes representation | `__repr__`, `__str__`, `__format__`, `__bytes__` |
| Conversion to number | `__abs__`, `__bool__`, `__complex__`, `__int__`, `__float__`, `__hash__`, `__index__` |
| Emulating collections | `__len__`, `__getitem__`, `__setitem__`, `__delitem__`, `__contains__` |
| Iteration | `__iter__`, `__reversed__`, `__next__` |
| Emulating Callables | `__call__` |
| Context Management | `__enter__`, `__exit__` |
| Instance Creation and Destruction | `__new__`, `__init__`, `__del__` |
| Attribute Management | `__getattr__`, `__getattribute__`, `__setattr__`, `__delattr__`, `__dir__` |
| Attribute Descriptors | `__get__`, `__set__`, `__delete__` |
| Class Services | `__prepare__`, `__instancecheck__`, `__subclasscheck__` |



*Table 1-2. Special method names for operators*

| Category | Method names and related operators |
| -------- | ---------------------------------- |
| Unary Numeric Operators | `__neg__` (`-`), `__pos__` (`+`), `__abs__` (`abs()`) |
| Rich Comparison Operators | `__lt__` (`<`), `__le__` (`<=`), `__eq__` (`==`), `__ne__` (`!=`), `__gt__` (`>`), `__ge__` (`>=`) |
| Arithmetic Operators | `__add__` (`+`), `__sub__` (`-`), `__mul__` (`*`), `__truediv__` (`/`), `__floordiv__` (`//`), `__mod__` (`%`), `__divmod__` (`divmod()`), `__pow__` (`**` or `pow()`), `__round__` (`round()`) |
| Reversed Arithmetic Operators | `__radd__`, `__rsub__`, `__rmul__`, `__rtruediv__`, `__rfloordiv__`, `__rmod__`, `__rdivmod__`, `__rpow__` |
| Augmented Assignment Arithmetic Operators | `__iadd__`, `__isub__`, `__imul__`, `__itruediv__`, `__ifloordiv__`, `__imod__`, `__ipow__` |
| Bitwise Operators | `__invert__` (`~`), `__lshift__` (`<<`), `__rshift__` (`>>`), `__and__` (`&`), `__or__` (`\|`), `__xor__` (`^`) |
| Reversed Bitwise Operators | `__rlshift__`, `__rrshift__`, `__rand__`, `__rxor__`, `__ror__` |
| Augmented Assignment Bitwise Operators | `__ilshift__`, `__irshift__`, `__iand__`, `__ixor__`, `__ior__` |

> ### TIP
> The reversed operators are fallbacks used when operands are swapped (`b * a` instead of `a * b`), while augmented assignments are shortcuts combining an infix operator with variable assignment (`a = a * b` becomes `a *= b`).

### Chapter Summary

By implementing special methods, your objects can behave like the built-in types, enabling the expressive coding style the community considers Pythonic. A basic requirement for a Python object is to provide usable string representations of itself, one used for debugging and logging, another for presentation to end users. That is why the special methods `__repr__` and `__str__` exist in the data model.

Emulating sequences is one of the most widely used applications of the special methods.

Thanks to operator overloading, Python offers a rich selection of numeric types, from the built-ins to `decimal.Decimal` and `fractions.Fraction`, all supporting infix arithmetic operators. Implementing operators, including reversed operators and augmented assignment, will be shown in Chapter 13 via enhancements of the `Vector` example.

# Chapter 2. An Array of Sequences

Most of the discussion in this chapter applies to sequences in general, from the familiar `list` to the `str` and `bytes` types that are new in Python 3.

Specific topics on lists, tuples, arrays, and queues are also covered here, but the focus on Unicode strings and byte sequences is deferred to Chapter 4. Also, the idea here is to cover the sequence types that are ready to use.

### Overview of Built-In Sequences

The standard library offers a rich selection of sequence types implemented in C: 

*Container sequences*
> `list`, `tuple`, and `collections.deque` can hold items of different types.

*Flat sequences*
> `str`, `bytes`, `bytearray`, `memoryview`, and `array.array` hold items of one type.

*Container sequences* hold references to the objects they contain, which may be of any type, while *flat sequences* physically store the value of each item within its own memory space, and not as distinct objects.

Thus, flat sequences are more compact, but they are limited to holding primitive values like characters, bytes, and numbers.
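
To make the contrast concrete, here is a small sketch comparing a `list` (container) with an `array.array` of C doubles (flat, typecode `'d'`). The list references objects of any type; the array's content is just the packed values, `itemsize` bytes each, and it refuses items of another type.

```python
import array

mixed = [1.5, 'text', None]                  # container: references to any objects
floats = array.array('d', [1.5, 2.5, 3.5])   # flat: raw C doubles

print([type(item).__name__ for item in mixed])  # ['float', 'str', 'NoneType']

# The array's memory holds only the packed values, itemsize bytes each
raw = floats.tobytes()
print(floats.itemsize, len(raw))  # e.g. 8 24 on common platforms

try:
    floats.append('text')  # a flat sequence accepts one type only
except TypeError as exc:
    print('TypeError:', exc)
```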

Another way of grouping sequence types is by mutability:

*Mutable sequences*
> `list`,`bytearray`,`array.array`, `collections.deque`, and `memoryview`

*Immutable sequences*
> `tuple`, `str`, and `bytes`


The most fundamental sequence type is the `list`: mutable and mixed-type.

Mastering list comprehensions opens the door to generator expressions, which - among other uses - can produce elements to fill up sequences of any type.

### List Comprehensions and Generator Expressions

A quick way to build a sequence is using a list comprehension (if the target is a `list`) or a generator expression (for all other kinds of sequences).

**Tip**: List comprehensions = *listcomps*, generator expressions = *genexps*

**List Comprehensions and Readability**

*Example 2-1. Build a list of Unicode codepoints from a string*

In [4]:
symbols = '$¢£¥€¤'
codes = []
for symbol in symbols:
    codes.append(ord(symbol))
codes

[36, 162, 163, 165, 8364, 164]

*Example 2-2. Build a list of Unicode codepoints from a string, take two*

In [5]:
symbols = '$¢£¥€¤'
codes = [ord(symbol) for symbol in symbols]
codes

[36, 162, 163, 165, 8364, 164]

A `for` loop may be used to do lots of different things: scanning a sequence to count or pick items, computing aggregates (sums, averages), or any number of other processing tasks. A listcomp is meant to do one thing only: build a new list.

It is possible to abuse list comprehensions to write truly incomprehensible code. If you are not doing something with the produced list, you should not use that syntax. Also, keep it short. If the list comprehension spans more than two lines, it is probably best to break it apart or rewrite as a plain old `for` loop. Use your best judgement.

**Syntax Tip**: In Python code, line breaks are ignored inside pairs of `[]`, `{}`, or `()`. So you can build multiline lists, listcomps, genexps, dictionaries, and the like without using the ugly `\` line continuation escape.

List comprehensions build lists from sequences or any other iterable type by filtering and transforming items. The `filter` and `map` built-ins can be composed to do the same, but readability suffers.

### Listcomps Versus map and filter

Listcomps do everything the `map` and `filter` functions do, without the contortions of the functionally challenged Python `lambda`.

*Example 2-3. The same list built by a listcomp and a map/filter composition*

In [6]:
symbols = '$¢£¥€¤'
beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
beyond_ascii

[162, 163, 165, 8364, 164]

In [7]:
beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))
beyond_ascii

[162, 163, 165, 8364, 164]

### Cartesian Products

Listcomps can generate lists from the Cartesian product of two or more iterables. The items that make up the Cartesian product are tuples made from items from every input iterable. The length of the resulting list equals the product of the lengths of the input iterables.

For example, imagine you need to produce a list of T-shirts available in two colors and three sizes. Example 2-4 shows how to produce that list using a listcomp. The result has six items.

*Example 2-4. Cartesian product using a list comprehension*

In [8]:
colors = ['black', 'white']
sizes = ['S','M','L']
tshirts = [(color, size) for color in colors for size in sizes]
tshirts

[('black', 'S'),
 ('black', 'M'),
 ('black', 'L'),
 ('white', 'S'),
 ('white', 'M'),
 ('white', 'L')]

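Note how the resulting list is arranged by color, then size, as if the `for` loops were nested in the same order as they appear in the listcomp:

In [10]: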
for color in colors:
    for size in sizes:
        print((color,size))

('black', 'S')
('black', 'M')
('black', 'L')
('white', 'S')
('white', 'M')
('white', 'L')


To get items arranged by size, then color, just rearrange the `for` clauses:

In [11]:
tshirts = [(color,size) for size in sizes
                        for color in colors]
tshirts

[('black', 'S'),
 ('white', 'S'),
 ('black', 'M'),
 ('white', 'M'),
 ('black', 'L'),
 ('white', 'L')]

Listcomps are a one-trick pony: they build lists. To fill up other sequence types, a genexp is the way to go.

### Generator Expressions

To initialize tuples, arrays, and other types of sequences, you could also start from a listcomp, but a genexp saves memory because it yields items one by one using the iterator protocol instead of building a whole list just to feed another constructor.

Genexps use the same syntax as listcomps, but are enclosed in parentheses rather than brackets. Example 2-5 shows basic usage of genexps to build a tuple and an array.

*Example 2-5. Initializing a tuple and an array from a generator expression*

In [12]:
symbols = '$¢£¥€¤'
tuple(ord(symbol) for symbol in symbols)

(36, 162, 163, 165, 8364, 164)

In [13]:
import array
array.array('I', (ord(symbol) for symbol in symbols))

array('I', [36, 162, 163, 165, 8364, 164])

Example 2-6 uses a genexp with a Cartesian product to print out a roster of T-shirts of two colors in three sizes. In contrast with Example 2-4, here the six-item list of T-shirts is never built in memory: the generator expression feeds the `for` loop producing one item at a time. If the two lists used in the Cartesian product had 1,000 items each, using a generator expression would save the expense of building a list with a million items just to feed the `for` loop.

*Example 2-6. Cartesian product in a generator expression*

In [14]:
colors = ['black','white']
sizes = ['S','M','L']
for tshirt in ('%s %s' % (c, s) for c in colors for s in sizes):
    print(tshirt)

black S
black M
black L
white S
white M
white L
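
A sketch of the 1,000-item scenario described before Example 2-6, with two hypothetical input lists: the genexp yields one tuple at a time, so counting the million combinations never materializes them in a list. `sys.getsizeof` reports only the generator object's own size (CPython-specific), which does not grow with the number of items produced.

```python
import sys

colors = ['color%d' % i for i in range(1000)]  # hypothetical inputs
sizes = ['size%d' % i for i in range(1000)]

combos = ((c, s) for c in colors for s in sizes)  # nothing computed yet
print(sys.getsizeof(combos))  # small and fixed, regardless of item count

count = sum(1 for _ in combos)    # consumes the generator one item at a time
print(count)                      # 1000000
print(next(combos, 'exhausted'))  # a generator can be consumed only once
```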


### Tuples Are Not Just Immutable Lists

Some Python texts present tuples as "immutable lists", but that is short selling them. Tuples do double duty: they can be used as immutable lists and also as records with no field names. This use is sometimes overlooked.

**Tuples as Records**

Tuples hold records: each item in the tuple holds the data for one field, and the position of the item gives its meaning.

If you think of a tuple just as an immutable list, the quantity and the order of the items may or may not be important. But when using a tuple as a collection of fields, the number of items is often fixed and their order is always vital.

Example 2-7 shows tuples being used as records. Note that in every expression, sorting the tuple would destroy the information because the meaning of each data item is given by its position in the tuple.

*Example 2-7. Tuples used as records*

In [15]:
lax_coordinates = (33.9425, -118.408056)
city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014)
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP','XDA205856')]
for passport in sorted(traveler_ids):
    print('%s/%s' % passport)

BRA/CE342567
ESP/XDA205856
USA/31195855


In [16]:
for country, _ in traveler_ids:
    print(country)

USA
BRA
ESP


### Tuple Unpacking

In Example 2-7, we assigned `('Tokyo', 2003, 32450, 0.66, 8014)` to `city`, `year`, `pop`, `chg`, `area` in a single statement. Then, in the `print('%s/%s' % passport)` call, the `%` operator assigned each item in the `passport` tuple to one slot in the format string. Those are two examples of *tuple unpacking*.

**Tip**: Tuple unpacking works with any iterable object. The only requirement is that the iterable yields exactly one item per variable in the receiving tuple, unless you use a star (`*`) to capture excess items as explained in "Using `*` to grab excess items". The term *tuple unpacking* is widely used by Pythonistas, but *iterable unpacking* is gaining traction.
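
A few examples of unpacking iterables that are not tuples, plus the error raised when the counts don't match (the values are arbitrary):

```python
a, b, c = 'xyz'                      # a str is iterable
print(a, b, c)                       # x y z

first, second = {'k1': 1, 'k2': 2}   # iterating a dict yields its keys
print(first, second)                 # k1 k2

x, y = (n * n for n in range(2))     # generators work too
print(x, y)                          # 0 1

try:
    p, q = [1, 2, 3]                 # one item too many
except ValueError as exc:
    print('ValueError:', exc)
```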

The most visible form of tuple unpacking is parallel assignment; that is, assigning items from an iterable to a tuple of variables, as you can see in this example:

In [18]:
lax_coordinates = (33.9425, -118.408056)
latitude, longitude = lax_coordinates # tuple unpacking
print(latitude)
print(longitude)

33.9425
-118.408056


An elegant application of tuple unpacking is swapping the values of variables without using a temporary variable:
```
>>> b, a = a, b
```

Another example of tuple unpacking is prefixing an argument with a star when calling a function:

In [19]:
divmod(20, 8)

(2, 4)

In [20]:
t = (20, 8)
divmod(*t)

(2, 4)

In [21]:
quotient, remainder = divmod(*t)
quotient, remainder

(2, 4)

The preceding code also shows a further use of tuple unpacking: enabling functions to return multiple values in a way that is convenient to the caller. For example, the `os.path.split()` function builds a tuple `(path, last_part)` from a filesystem path:

In [22]:
import os
_, filename = os.path.split('/Misc Programs/Jupyter Notebooks/Fluent Python Notes.ipynb')
filename

'Fluent Python Notes.ipynb'

When we care about only some parts of a tuple, a dummy variable like `_` can serve as a placeholder for the parts we don't need, as in the preceding example.

**Warning**: If you write internationalized software, `_` is not a good dummy variable because it is traditionally used as an alias to the `gettext.gettext` function, as recommended in the `gettext` module documentation. Otherwise, it's a nice name for a placeholder variable.

Another way of focusing on just some of the items when unpacking a tuple is to use the `*`, as we'll see.

**Using `*` to grab excess items**

Defining function parameters with `*args` to grab arbitrary excess arguments is a classic Python feature.

In Python 3, this idea was extended to apply to parallel assignment as well:

In [23]:
a, b, *rest = range(5)
a, b, rest

(0, 1, [2, 3, 4])

In [24]:
a, b, *rest = range(3)
a, b, rest

(0, 1, [2])

In [25]:
a, b, *rest = range(2)
a, b, rest

(0, 1, [])

In the context of parallel assignment, the `*` prefix can be applied to exactly one variable, but it can appear in any position:

In [26]:
a, *body, c, d = range(5)
a, body, c, d

(0, [1, 2], 3, 4)

In [28]:
*head, b, c, d = range(5)
head, b, c, d

([0, 1], 2, 3, 4)

Finally, a powerful feature of tuple unpacking is that it works with nested structures.

### Nested Tuple Unpacking

The tuple to receive an expression to unpack can have nested tuples, like `(a, b, (c, d))` and Python will do the right thing if the expression matches the nesting structure. Example 2-8 shows nested tuple unpacking in action.

*Example 2-8. Unpacking nested tuples to access the longitude*

In [29]:
metro_areas = [
    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
    ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
    ('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]

print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))
fmt = '{:15} | {:^9.4f} | {:^9.4f}'

for name, cc, pop, (latitude, longitude) in metro_areas:
    if longitude <= 0:
        print(fmt.format(name, latitude, longitude))

                |   lat.    |   long.  
Mexico City     |  19.4333  | -99.1333 
New York-Newark |  40.8086  | -74.0204 
Sao Paulo       | -23.5478  | -46.6358 


As designed, tuples are very handy. But there is a missing feature when using them as records: sometimes it is desirable to name the fields. That is why the `namedtuple` function was invented.

### Named Tuples

The `collections.namedtuple` function is a factory that produces subclasses of `tuple` enhanced with field names and a class name - which helps debugging.

**Tip**: Instances of a class that you build with `namedtuple` take exactly the same amount of memory as tuples because the field names are stored in the class. They use less memory than a regular object because they don't store attributes in a per-instance `__dict__`.
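
A small check of that tip, comparing a named tuple with an instance of a hypothetical plain class, `PlainCity`: the named tuple has no per-instance `__dict__`, while the regular object does. Exact byte counts from `sys.getsizeof` vary across Python versions, so they are printed rather than asserted.

```python
import sys
from collections import namedtuple

City = namedtuple('City', 'name country')


class PlainCity:
    """Hypothetical regular class with the same two fields."""

    def __init__(self, name, country):
        self.name = name
        self.country = country


nt = City('Tokyo', 'JP')
obj = PlainCity('Tokyo', 'JP')

print(hasattr(nt, '__dict__'))   # False: field names live in the class
print(hasattr(obj, '__dict__'))  # True: attributes stored per instance
print(sys.getsizeof(nt), sys.getsizeof(('Tokyo', 'JP')))  # typically equal in CPython
```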

Recall how we built the `Card` class in Example 1-1 in Chapter 1:
```
Card = collections.namedtuple('Card', ['rank','suit'])
```
Example 2-9 shows how we could define a named tuple to hold information about a city.

*Example 2-9. Defining and using a named tuple type*

In [30]:
from collections import namedtuple
City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
tokyo

City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))

In [31]:
tokyo.population

36.933

In [32]:
tokyo.coordinates

(35.689722, 139.691667)

In [33]:
tokyo[1]

'JP'

A named tuple type has a few attributes in addition to those inherited from `tuple`. Example 2-10 shows the most useful: the `_fields` class attribute, the class method `_make(iterable)`, and the `_asdict()` method.

*Example 2-10. Named tuple attributes and methods (continued from previous example)*

In [34]:
City._fields

('name', 'country', 'population', 'coordinates')

In [35]:
LatLong = namedtuple('Latlong', 'lat long')
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
delhi = City._make(delhi_data)
delhi._asdict()

OrderedDict([('name', 'Delhi NCR'),
             ('country', 'IN'),
             ('population', 21.935),
             ('coordinates', Latlong(lat=28.613889, long=77.208889))])

In [36]:
for key, value in delhi._asdict().items():
    print(key + ':', value)

name: Delhi NCR
country: IN
population: 21.935
coordinates: Latlong(lat=28.613889, long=77.208889)


**Notes**:
- `_fields` is a tuple with the field names of the class.
- `_make()` allows you to instantiate a named tuple from an iterable; `City(*delhi_data)` would do the same.
- `_asdict()` returns a mapping built from the named tuple instance: a `collections.OrderedDict` in Python 3.1 to 3.7 (as in the output above), and a regular `dict` since Python 3.8. That can be used to produce a nice display of city data.
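
A sketch checking these notes, with field values copied from Example 2-10: `_make` and star-unpacking build equal instances, and `_asdict()` returns something that is a `dict` on any Python 3 version, since `OrderedDict` is a `dict` subclass.

```python
from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')
LatLong = namedtuple('LatLong', 'lat long')

delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
d1 = City._make(delhi_data)
d2 = City(*delhi_data)
print(d1 == d2)               # True

info = d1._asdict()
print(isinstance(info, dict))  # True: a dict in 3.8+, an OrderedDict before
print(info['country'])         # IN
print(City._fields)            # ('name', 'country', 'population', 'coordinates')
```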

Now that we've explored the power of tuples as records, we can consider their second role as an immutable variant of the `list` type.

### Tuples as Immutable Lists

When using a `tuple` as an immutable variation of `list`, it helps to know how similar they actually are. As you can see in Table 2-1, `tuple` supports all `list` methods that do not involve adding or removing items, with one exception - tuple lacks the `__reversed__` method. However, that is just for optimization; `reversed(my_tuple)` works without it.
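
That claim is easy to verify: `tuple` defines no `__reversed__`, yet `reversed()` still works, because tuples provide `__len__` and `__getitem__`, the sequence protocol that `reversed()` falls back on.

```python
t = (10, 20, 30)

print(hasattr(tuple, '__reversed__'))  # False
print(hasattr(list, '__reversed__'))   # True
print(list(reversed(t)))               # [30, 20, 10]
print(t[::-1])                         # (30, 20, 10): slicing builds a reversed tuple
```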

*Table 2-1. Methods and attributes found in list or tuple (methods implemented by object are omitted for brevity)*

| Method | list | tuple | Description |
| ---- | ---- | ---- | ---- |
| `s.__add__(s2)` | X | X | `s + s2` - concatenation |
| `s.__iadd__(s2)` | X | --- | `s += s2` - in-place concatenation |
| `s.append(e)` | X | --- | Append one element after last |
| `s.clear()` | X | --- | Delete all items |
| `s.__contains__(e)` | X | X | `e in s` |
| `s.copy()` | X | --- | Shallow copy of the list |
| `s.count(e)` | X | X | Count occurrences of an element |
| `s.__delitem__(p)` | X | --- | Remove item at position `p` |
| `s.extend(it)` | X | --- | Append items from iterable `it` |
| `s.__getitem__(p)` | X | X | `s[p]` - get item at position |
| `s.__getnewargs__()` | --- | X | Support for optimized serialization with `pickle` |
| `s.index(e)` | X | X | Find position of first occurrence of `e` |
| `s.insert(p, e)` | X | --- | Insert element `e` before the item at position `p` |
| `s.__iter__()` | X | X | Get iterator |
| `s.__len__()` | X | X | `len(s)` - number of items |
| `s.__mul__(n)` | X | X | `s * n` - repeated concatenation |
| `s.__imul__(n)` | X | --- | `s *= n` - in-place repeated concatenation |
| `s.__rmul__(n)` | X | X | `n * s` - reversed repeated concatenation |
| `s.pop([p])` | X | --- | Remove and return last item or item at optional position `p` |
| `s.remove(e)` | X | --- | Remove first occurrence of element `e` by value |
| `s.reverse()` | X | --- | Reverse the order of the items in place |
| `s.__reversed__()` | X | --- | Get iterator to scan items from last to first |
| `s.__setitem__(p, e)` | X | --- | `s[p] = e` - put `e` in position `p`, overwriting existing item |
| `s.sort([key], [reverse])` | X | --- | Sort items in place with optional keyword arguments `key` and `reverse` |
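One practical consequence of the table: `tuple` has no `__iadd__`, so `t += x` cannot mutate the tuple. Python falls back to `t = t + x`, which builds a new object, whereas `list.__iadd__` extends the existing list. A small sketch:

```python
l = [1, 2]
t = (1, 2)
original_l, original_t = l, t   # keep references to the starting objects

l += [3]    # list.__iadd__ extends the same list in place
t += (3,)   # no tuple.__iadd__, so this runs t = t + (3,)

print(l is original_l)   # True: same list object, now mutated
print(t is original_t)   # False: t is bound to a brand-new tuple
print(original_t)        # (1, 2): the old tuple is unchanged
```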


Every Python programmer knows that sequences can be sliced using the `s[a:b]` syntax. We now turn to some less well-known facts about slicing.

### Slicing

A common feature of `list`, `tuple`, `str`, and all sequence types in Python is support for slicing operations, which are more powerful than most people realize.

In this section, we describe the *use* of these advanced forms of slicing.

**Why Slices and Range Exclude the Last Item**

The Pythonic convention of excluding the last item in slices and ranges works well with the zero-based indexing used in Python, C, and many other languages. Some convenient features of the convention are:
- It's easy to see the length of a slice or range when only the stop position is given: `range(3)` and `my_list[:3]` both produce three items.
- It's easy to compute the length of a slice or range when start and stop are given: just subtract `stop - start`.
- It's easy to split a sequence in two parts at any index `x`, without overlapping: simply get `my_list[:x]` and `my_list[x:]`. For example:

In [37]:
l = [10, 20, 30, 40, 50, 60]
l[:2]

[10, 20]

In [38]:
l[2:]

[30, 40, 50, 60]

In [39]:
l[:3]

[10, 20, 30]

In [40]:
l[3:]

[40, 50, 60]
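The bullet points above can be checked mechanically: a slice with in-range bounds has exactly `stop - start` items, and splitting at any index `x` gives two non-overlapping halves that rebuild the original list.

```python
l = [10, 20, 30, 40, 50, 60]
start, stop = 1, 4

# Length of a slice is stop - start (both bounds within range)
print(len(l[start:stop]) == stop - start)   # True

# Splitting at every possible index x, including 0 and len(l)
for x in range(len(l) + 1):
    assert l[:x] + l[x:] == l               # no overlap, nothing lost
```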

### Slice Objects

This is no secret, but worth repeating just in case: `s[a:b:c]` can be used to specify a stride or step `c`, causing the resulting slice to skip items. The stride can also be negative, returning items in reverse. Three examples make this clear:

In [41]:
s = 'bicycle'
s[::3]

'bye'

In [42]:
s[::-1]

'elcycib'

In [43]:
s[::-2]

'eccb'

Another example was shown in Chapter 1 when we used `deck[12::13]` to get all the aces in the unshuffled deck.

The notation `a:b:c` is only valid within `[]` when used as the indexing or subscript operator, and it produces a slice object: `slice(a, b, c)`. To evaluate the expression `seq[start:stop:step]`, Python calls `seq.__getitem__(slice(start, stop, step))`. Even if you are not implementing your own sequence types, knowing about slice objects is useful because it lets you assign names to slices, just like spreadsheets allow naming of cell ranges.
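To see the slice object that `__getitem__` actually receives, a throwaway class can simply return its key. `SliceProbe` and `EVERY_THIRD` are hypothetical names used only for illustration; `slice.indices(length)` is the built-in method that resolves `None` and negative values against a given sequence length.

```python
class SliceProbe:
    """Hypothetical helper: returns whatever __getitem__ receives."""
    def __getitem__(self, key):
        return key

probe = SliceProbe()
print(probe[1:4:2])     # slice(1, 4, 2)
print(probe[::-1])      # slice(None, None, -1)

# A named slice works on any sequence, like a named cell range
EVERY_THIRD = slice(None, None, 3)
print('bicycle'[EVERY_THIRD])             # bye

# Concrete (start, stop, step) for a sequence of length 7
print(slice(None, None, -2).indices(7))   # (6, -1, -2)
```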

Suppose you need to parse flat-file data like the invoice shown in Example 2-11. Instead of filling your code with hardcoded slices, you can name them. See how readable this makes the `for` loop at the end of the example.

*Example 2-11. Line items from a flat-file invoice*

In [48]:
invoice = """
0.....6.................................40........52...55........ 
1909  Pimoroni PiBrella                    $ 17.50    3   $ 52.50 
1489  6mm Tactile Switch x20               $ 4.95     2   $ 9.90 
1510  Panavise Jr. - PV-201                $ 28.00    1   $ 28.00 
1601  PiTFT Mini Kit 320x240               $ 34.95    1   $ 34.95
"""
SKU = slice(0, 6)
DESCRIPTION = slice(6, 40)
UNIT_PRICE = slice(40, 52)
QUANTITY = slice(52, 55)
ITEM_TOTAL = slice(55, None)
line_items = invoice.split('\n')[2:]
for item in line_items:
    print(item[UNIT_PRICE], item[DESCRIPTION])

   $ 17.50   Pimoroni PiBrella                 
   $ 4.95    6mm Tactile Switch x20            
   $ 28.00   Panavise Jr. - PV-201             
   $ 34.95   PiTFT Mini Kit 320x240            
 
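Named slices also make it easy to go one step further and turn each line into typed fields. This is a sketch under stated assumptions: `make_line` and `parse_line` are hypothetical helpers, and the lines are generated with `ljust`/`rjust` so that every field lands exactly inside the column ranges used in Example 2-11.

```python
SKU = slice(0, 6)
DESCRIPTION = slice(6, 40)
UNIT_PRICE = slice(40, 52)
QUANTITY = slice(52, 55)

def make_line(sku, description, price, qty):
    """Hypothetical helper: pad each field to the column widths above."""
    return (sku.ljust(6) + description.ljust(34)
            + f'$ {price:.2f}'.rjust(12) + str(qty).rjust(3))

def parse_line(line):
    """Hypothetical parser: extract typed fields using the named slices."""
    return {
        'sku': line[SKU].strip(),
        'description': line[DESCRIPTION].strip(),
        'unit_price': float(line[UNIT_PRICE].replace('$', '')),  # float() ignores padding
        'quantity': int(line[QUANTITY]),                          # int() ignores padding
    }

lines = [make_line('1909', 'Pimoroni PiBrella', 17.50, 3),
         make_line('1489', '6mm Tactile Switch x20', 4.95, 2)]
items = [parse_line(line) for line in lines]
total = sum(item['unit_price'] * item['quantity'] for item in items)
print(f'{total:.2f}')   # 62.40
```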


### Multidimensional Slicing and Ellipsis

The `[]` operator can also take multiple indexes or slices separated by commas. This is used, for instance, in the external NumPy package, where items of a two-dimensional `numpy.ndarray` can be fetched using the syntax `a[i, j]` and a two-dimensional slice obtained with an expression like `a[m:n, k:l]`. The `__getitem__` and `__setitem__` special methods that handle the `[]` operator simply receive the indices in `a[i, j]` as a tuple. In other words, to evaluate `a[i, j]`, Python calls `a.__getitem__((i, j))`.
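A minimal sketch of a two-dimensional container that relies on this behavior. `Grid` is a hypothetical class, not part of NumPy or the standard library: its `__getitem__` unpacks the tuple `(i, j)` and supports a row slice combined with a column index or slice. The last lines show that a built-in `list` rejects a tuple index.

```python
class Grid:
    """Hypothetical 2-D container: g[i, j] arrives as the tuple (i, j)."""
    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def __getitem__(self, key):
        i, j = key                      # comma-separated indexes form one tuple
        if isinstance(i, slice):        # e.g. g[0:2, 1:3]
            return [row[j] for row in self._rows[i]]
        return self._rows[i][j]

g = Grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(g[1, 2])        # 6
print(g[0:2, 1:3])    # [[2, 3], [5, 6]]

nums = [1, 2, 3]
try:
    nums[0, 1]        # built-in sequences accept one index or slice only
except TypeError as exc:
    print(exc)        # list indices must be integers or slices, not tuple
```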

The built-in sequence types in Python are one-dimensional, so they support only one index or slice, and not a tuple of them.

The ellipsis, written with three full stops (`...`) and not … (Unicode U+2026), is recognized as a token by the Python parser. It is an alias to the `Ellipsis` object, the single instance of the `ellipsis` class. As such, it can be passed as an argument to functions and as part of a slice specification, as in `f(a, ..., z)` or `a[i:...]`. NumPy uses `...` as a shortcut when slicing arrays of many dimensions; for example, if `x` is a four-dimensional array, `x[i, ...]` is a shortcut for `x[i, :, :, :]`.
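The claims about `...` are easy to verify in plain Python, without NumPy. `KeyProbe` is a hypothetical helper that returns the key passed to `__getitem__`, showing how `...` travels inside a tuple index or a slice.

```python
print(... is Ellipsis)        # True: the token and the name are one object
print(type(...).__name__)     # ellipsis

class KeyProbe:
    """Hypothetical helper: returns the key that __getitem__ receives."""
    def __getitem__(self, key):
        return key

x = KeyProbe()
print(x[1, ...])              # (1, Ellipsis)
print(x[1:...])               # slice(1, Ellipsis, None)
```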

Slices are not just useful to extract information from sequences; they can also be used to change mutable sequences in place, that is, without rebuilding them from scratch.

### Assigning to Slices

Mutable sequences can be grafted, excised, and otherwise modified in place using slice notation on the left side of an assignment statement or as the target of a `del` statement. The next few examples give an idea of the power of this notation:

In [1]:
l = list(range(10))
l

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [2]:
l[2:5] = [20, 30]
l

[0, 1, 20, 30, 5, 6, 7, 8, 9]

In [3]:
del l[5:7]
l

[0, 1, 20, 30, 5, 8, 9]

In [4]:
l[3::2] = [11, 22]
l

[0, 1, 20, 11, 5, 22, 9]

In [5]:
l[2:5] = 100

TypeError: can only assign an iterable

In [6]:
l[2:5] = [100]
l

[0, 1, 100, 22, 9]

> When the target of the assignment is a slice, the right side must be an iterable object, even if it has just one item.
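A few more variations on slice assignment: an empty slice inserts items without replacing anything, any iterable (even a string) can appear on the right side, and an extended slice (one with a step) requires an iterable of exactly the same length, otherwise Python raises `ValueError`.

```python
l = [0, 1, 2, 3]

# Assigning to an empty slice inserts at that position
l[1:1] = ['a', 'b']
print(l)              # [0, 'a', 'b', 1, 2, 3]

# Any iterable works on the right side; a string is split into characters
l[:1] = 'xy'
print(l)              # ['x', 'y', 'a', 'b', 1, 2, 3]

# Extended slices need a right side with the same number of items
try:
    l[::2] = [None]   # l[::2] has 4 items, the right side has 1
except ValueError as exc:
    print(exc)
```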

### Using `+` and `*` with Sequences

Python programmers expect that sequences support `+` and `*`. Usually both operands of `+` must be of the same sequence type; neither of them is modified, and a new sequence of the same type is created as the result of the concatenation.

To concatenate multiple copies of the same sequence, multiply it by an integer. Again, a new sequence is created:

In [7]:
l = [1, 2, 3]
l*5

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

In [8]:
5 * 'abcd'

'abcdabcdabcdabcdabcd'

Both `+` and `*` always create a new object, and never change their operands.
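A short illustration of both points: `+` returns a new list and leaves its operands alone, and mixing sequence types with `+` raises `TypeError`. The word "usually" matters because augmented assignment on a list is more permissive: `list.__iadd__` accepts any iterable, much like `extend`.

```python
a = [1, 2]
b = a + [3]
print(a, b, b is a)   # [1, 2] [1, 2, 3] False

tail = (3,)
try:
    a + tail          # list + tuple is rejected
except TypeError as exc:
    print(exc)        # can only concatenate list (not "tuple") to list

a += tail             # but list += accepts any iterable
print(a)              # [1, 2, 3]
```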

**Warning**: Beware of expressions like `a * n` when `a` is a sequence containing mutable items because the result may surprise you. For example, trying to initialize a list of lists as `my_list = [[]] * 3` will result in a list with three references to the same inner list, which is probably not what you want.

The next section covers the pitfalls of trying to use `*` to initialize a list of lists.

**Building Lists of Lists**

Sometimes we need to initialize a list with a certain number of nested lists, for example, to distribute students in a list of teams or to represent squares on a game board. The best way of doing so is with a list comprehension, as in Example 2-12.

*Example 2-12. A list with three lists of length 3 can represent a tic-tac-toe board*

In [10]:
board = [['_'] * 3 for i in range(3)]
board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [11]:
board[1][2] = 'x'
board

[['_', '_', '_'], ['_', '_', 'x'], ['_', '_', '_']]

A tempting but wrong shortcut is doing it like Example 2-13.

*Example 2-13. A list with three references to the same list is useless*

In [12]:
weird_board = [['_'] * 3] * 3
weird_board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [13]:
weird_board[1][2] = 'O'
weird_board

[['_', '_', 'O'], ['_', '_', 'O'], ['_', '_', 'O']]

The problem with Example 2-13 is that, in essence, it behaves like this code:

In [14]:
row = ['_'] * 3
board = []
for i in range(3):
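    board.append(row)

The same `row` is appended three times to `board`. On the other hand, the list comprehension from Example 2-12 is equivalent to this code:
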
In [15]:
board = []
for i in range(3):
    row = ['_'] * 3
    board.append(row)
    
board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [17]:
board[2][0] = 'X'
board

[['_', '_', '_'], ['_', '_', '_'], ['X', '_', '_']]

Each iteration builds a new `row` and appends it to `board`.
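The difference between the two boards can be confirmed with identity checks rather than by mutating them: the `*` version holds three references to a single inner list, while the comprehension creates a distinct list for each row.

```python
weird_board = [['_'] * 3] * 3
board = [['_'] * 3 for i in range(3)]

# Every row of weird_board is the very same list object
print(all(row is weird_board[0] for row in weird_board))   # True

# board has three distinct row objects
print(len({id(row) for row in board}))                     # 3
```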