# Chapter 2. An Array of Sequences

Understanding the variety of sequences available in Python saves us from reinventing the wheel, and their common interface inspires us to create APIs that properly support and leverage existing and future sequence types.

## Overview of Built-In Sequences


**Container sequences**

    list, tuple, and collections.deque can hold items of different types.
    
**Flat sequences**
    
    str, bytes, bytearray, memoryview, and array.array hold items of one type.


**Container sequences hold references to the objects they contain, which may be of any type, while flat sequences physically store the value of each item within their own memory space, and not as distinct objects.** Thus, flat sequences are more compact, but they are limited to holding primitive values like characters, bytes, and numbers.
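
The difference can be seen in a small sketch that uses only the standard library. The variable names (`mixed`, `flat`, `error`) are illustrative and not from the original notes.

```python
import array

# A container sequence holds references, so its items may be of mixed types.
mixed = [1, 'two', 3.0, (4, 5)]

# A flat sequence stores raw values of one type ('d' means C double).
flat = array.array('d', [1.0, 2.0, 3.0])

error = None
try:
    flat.append('four')   # a str cannot be stored as a double
except TypeError as exc:
    error = exc

print(type(error).__name__)   # the array rejected the value
print(len(flat))              # the array is unchanged
```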

Another way of grouping sequence types is by mutability:

**Mutable sequences**

    list, bytearray, array.array, collections.deque, and memoryview
    
**Immutable sequences**
 
     tuple, str, and bytes

![](https://ss1.bdstatic.com/70cFuXSh_Q1YnxGkpoWK1HF6hhy/it/u=2668458762,2133731349&fm=15&gp=0.jpg)

**Keeping in mind these common traits -- mutable versus immutable; container versus flat -- is helpful to extrapolate what you know about one sequence type to others.**
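
One way to probe the mutability grouping at runtime is to test against the `collections.abc.MutableSequence` abstract base class. This is only a sketch, and it checks just a few of the types listed above.

```python
from collections import abc, deque

mutable = [list(), bytearray(), deque()]
immutable = [tuple(), str(), bytes()]

# True for every object that is registered as a mutable sequence.
mutable_checks = [isinstance(obj, abc.MutableSequence) for obj in mutable]
immutable_checks = [isinstance(obj, abc.MutableSequence) for obj in immutable]

# All six are still sequences in the general sense.
sequence_checks = [isinstance(obj, abc.Sequence) for obj in mutable + immutable]

print(mutable_checks, immutable_checks)
```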

## List Comprehensions and Generator Expressions

A quick way to build a sequence is using a list comprehension or a generator expression.

> For brevity, many Python programmers refer to list comprehensions as `listcomps`, and generator expressions as `genexps`.

### List Comprehensions and Readability

In [1]:
symbols = '$¢£¥€¤'
codes = [ord(symbol) for symbol in symbols]
codes

[36, 162, 163, 165, 8364, 164]

A for loop may be used to do lots of different things: scanning a sequence to count or pick items, computing aggregates (sums, averages), or any number of other processing tasks. In contrast, a listcomp is meant to do one thing only: to build a new list.

> **Syntax Tip: In Python code, line breaks are ignored inside pairs of [], {}, or (). So you can build multiline lists, listcomps, genexps, dictionaries and the like without using the ugly \ line continuation escape.**

### Listcomps Versus map and filter

Listcomps do everything the `map` and `filter` functions do.

In [2]:
symbols = '$¢£¥€¤'
beyond_ascii = [ord(symbol) for symbol in symbols if ord(symbol) > 127]
beyond_ascii

[162, 163, 165, 8364, 164]

In [3]:
beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))
beyond_ascii

[162, 163, 165, 8364, 164]

I used to believe that `map` and `filter` were faster than the equivalent listcomps, but Alex Martelli pointed out that's not the case -- at least not in the preceding examples.
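
A rough way to check the speed claim yourself is the standard `timeit` module. This is a minimal sketch: the function names are made up for illustration, and the timings depend on the machine and Python version, so no expected numbers are shown.

```python
import timeit

symbols = '$¢£¥€¤'

def with_listcomp():
    return [ord(s) for s in symbols if ord(s) > 127]

def with_map_filter():
    return list(filter(lambda c: c > 127, map(ord, symbols)))

# Time each variant; both build the same list.
for fn in (with_listcomp, with_map_filter):
    seconds = timeit.timeit(fn, number=10_000)
    print(f'{fn.__name__:16} {seconds:.4f}s')
```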

### Cartesian Products

Listcomps can generate lists from the Cartesian product of two or more iterables. The items that make up the Cartesian product are tuples made from items from every input iterable.

For example, imagine you need to produce a list of T-shirts available in two colors and three sizes. The following cells show how to produce that list using a listcomp, first arranged by color and then by size.

In [4]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
tshirts = [(color, size) for color in colors for size in sizes]
tshirts

[('black', 'S'),
 ('black', 'M'),
 ('black', 'L'),
 ('white', 'S'),
 ('white', 'M'),
 ('white', 'L')]

In [5]:
tshirts = [(color, size) for size in sizes for color in colors]
tshirts

[('black', 'S'),
 ('white', 'S'),
 ('black', 'M'),
 ('white', 'M'),
 ('black', 'L'),
 ('white', 'L')]

## Generator Expressions

**To initialize tuples, arrays, and other types of sequences, you could also start from a listcomp, but a genexp saves memory because it yields items one by one using the iterator protocol instead of building a whole list just to feed another constructor.**
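
The memory difference can be sketched with `sys.getsizeof`, which reports the size of the list object or generator object itself (not of the items). The names and the choice of 10,000 squares are arbitrary.

```python
import array
import sys

squares_list = [n * n for n in range(10_000)]   # every item exists at once
squares_gen = (n * n for n in range(10_000))    # items are produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)   # exact sizes vary by Python version

# The genexp feeds the constructor directly, with no intermediate list.
# 'L' is the typecode for unsigned long.
squares_array = array.array('L', squares_gen)
```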

Genexps use the same syntax as listcomps, but are enclosed in parentheses rather than brackets.

In [6]:
symbols = '$¢£¥€¤'
(ord(s) for s in symbols)

<generator object <genexpr> at 0x110523048>

In [7]:
tuple(ord(s) for s in symbols)

(36, 162, 163, 165, 8364, 164)

Cartesian product in a generator expression.

In [8]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
((c, s) for c in colors for s in sizes)

<generator object <genexpr> at 0x11035bb88>

In [9]:
for tshirt in ((c, s) for c in colors for s in sizes):
    print(tshirt)

('black', 'S')
('black', 'M')
('black', 'L')
('white', 'S')
('white', 'M')
('white', 'L')


> The generator expression yields items one by one; a list with all six T-shirt variations is never produced in this example.

## Tuples Are Not Just Immutable Lists

### Tuples as Records

Tuples hold records: each item in the tuple holds the data for one field and the position of the item gives its meaning.

If you think of a tuple just as an immutable list, the quantity and the order of the items may or may not be important, depending on the context. **But when using a tuple as a collection of fields, the number of items is often fixed and their order is always vital.**

In [10]:
lax_coordinates = (33.9425, -118.408056)
city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014)

In [11]:
lax_coordinates

(33.9425, -118.408056)

### Tuple Unpacking

> Tuple unpacking works with any iterable object. The only requirement is that the iterable yields exactly one item per variable in the receiving tuple, unless you use a star (`*`) to capture excess items. The term tuple unpacking is widely used by Pythonistas.

The most visible form of tuple unpacking is `parallel assignment`: assigning items from an iterable to a tuple of variables.

In [12]:
lax_coordinates = (33.9425, -118.408056)
latitude, longitude = lax_coordinates

In [13]:
latitude

33.9425

In [14]:
longitude

-118.408056

An application of tuple unpacking is swapping the values of variables without using a temporary variable:

In [15]:
a = 100
b = 10

In [16]:
b, a = a, b

In [17]:
a, b

(10, 100)

In [18]:
_, a = (100, 200)

In [19]:
a

200

> If you write internationalized software, `_` is not a good dummy variable because it is traditionally used as an alias to the `gettext.gettext` function.

#### Using * to grab excess items

Defining function parameters with `*args` to grab arbitrary excess arguments is a classic Python feature. In Python 3, the same idea extends to parallel assignment.

In [20]:
a, b, *rest = range(5)

In [21]:
a, b, rest

(0, 1, [2, 3, 4])
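
The starred target does not have to come last. The following sketch shows it in other positions, and also shows the related use of `*` to unpack an iterable into function arguments. The variable names are illustrative.

```python
# The star can prefix exactly one target, in any position.
a, *body, c, d = range(5)
middle_result = (a, body, c, d)    # (0, [1, 2], 3, 4)

*head, b, c, d = range(5)
front_result = (head, b, c, d)     # ([0, 1], 2, 3, 4)

# The same star unpacks an iterable into separate function arguments.
t = (20, 8)
quotient, remainder = divmod(*t)   # same as divmod(20, 8)
print(middle_result, front_result, quotient, remainder)
```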

### Nested Tuple Unpacking

In [22]:
metro_areas = [
    ('Tokyo','JP',36.933,(35.689722,139.691667)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
]

In [23]:
for name, cc, pop, (latitude, longitude) in metro_areas:
    print(name, latitude, longitude)

Tokyo 35.689722 139.691667
Mexico City 19.433333 -99.133333


### Named Tuples

The `collections.namedtuple` function is a factory that produces subclasses of tuple enhanced with field names and a class name, which helps debugging.

In [24]:
import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])

In [25]:
City = collections.namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

In [26]:
tokyo

City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))

In [27]:
City._fields

('name', 'country', 'population', 'coordinates')
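
Besides `_fields`, named tuples offer the `_make` class method and the `_asdict` method. The sketch below reuses the `City` definition from above; the `LatLong` type and the Delhi data are sample values added for illustration.

```python
import collections

City = collections.namedtuple('City', 'name country population coordinates')
LatLong = collections.namedtuple('LatLong', 'lat long')

delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))

delhi = City._make(delhi_data)   # build an instance from any iterable
as_dict = delhi._asdict()        # mapping of field names to values

# Fields are reachable by name or by position.
print(delhi.name, delhi[1], delhi.coordinates.lat)
for field, value in as_dict.items():
    print(field + ':', value)
```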

### Tuples as Immutable Lists

When using a tuple as an immutable variation of list, it helps to know how similar they actually are. Tuple supports all list methods that do not involve adding or removing items, with one exception -- tuple lacks the `__reversed__` method.
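
A quick check of that claim, as a sketch run against CPython. Note that `reversed()` still works on a tuple, because it falls back to `__len__` and `__getitem__` when `__reversed__` is missing.

```python
t = (1, 2, 3)

list_has_it = hasattr(list, '__reversed__')
tuple_has_it = hasattr(tuple, '__reversed__')

# reversed() uses the sequence protocol when __reversed__ is absent.
backwards = tuple(reversed(t))

# Read-only methods and operators shared with list.
shared = (t.count(2), t.index(3), 2 in t, len(t))
print(list_has_it, tuple_has_it, backwards, shared)
```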

## Slicing

**A common feature of list, tuple, str, and all sequence types in Python is the support of slicing operations, which are more powerful than most people realize.**

### Why Slices and Range Exclude the Last Item

In [28]:
l = list(range(10, 70, 10))

In [29]:
l

[10, 20, 30, 40, 50, 60]

In [30]:
l[:2]

[10, 20]

In [31]:
l[2:]

[30, 40, 50, 60]

### Slice Objects

In [32]:
s = 'bicycle'
s[::3]

'bye'

In [33]:
s[::-1]

'elcycib'

**The notation a:b:c is only valid within [] when used as the indexing or subscript operator, and it produces a slice object: slice(a, b, c). To evaluate the expression seq[start:stop:step], Python calls `seq.__getitem__(slice(start, stop, step))`.** Even if you are not implementing your own sequence types, knowing about slice objects is useful because it lets you assign names to slices.

In [34]:
slice(0, 6)

slice(0, 6, None)
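
Named slices are handy for parsing fixed-width text. The sketch below is hypothetical: the record layout (6 columns of SKU, 22 of description, then the price) and the sample rows are invented for illustration.

```python
# Hypothetical fixed-width records, built with format specs so columns align.
rows = [
    ('1909', 'Pimoroni PiBrella', '$17.50'),
    ('1489', '6mm Tactile Switch', '$4.95'),
    ('1510', 'Panavise Jr.', '$28.00'),
]
lines = [f'{sku:<6}{desc:<22}{price}' for sku, desc, price in rows]

# Give each column a name instead of repeating magic numbers.
SKU = slice(0, 6)
DESCRIPTION = slice(6, 28)
PRICE = slice(28, None)

parsed = [
    (line[SKU].strip(), line[DESCRIPTION].strip(), line[PRICE])
    for line in lines
]
for item in parsed:
    print(item)
```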

### Multidimensional Slicing and Ellipsis

**The [] operator can also take multiple indexes or slices separated by commas. This is used in the external NumPy package, where items of a two-dimensional numpy.ndarray can be fetched using the syntax a[i, j] and a two-dimensional slice obtained with an expression like a[m:n, k:l].**
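
To see what Python passes when several indexes are given, here is a minimal sketch of a user-defined class. `Grid` is a hypothetical toy container, not NumPy's implementation: it only shows that `grid[i, j]` reaches `__getitem__` as a tuple.

```python
class Grid:
    """Hypothetical 2-D container showing what __getitem__ receives."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def __getitem__(self, index):
        if isinstance(index, tuple):          # grid[i, j] or grid[m:n, k:l]
            row_index, col_index = index
            rows = self._rows[row_index]
            if isinstance(row_index, slice):  # several rows were selected
                return [row[col_index] for row in rows]
            return rows[col_index]            # a single row was selected
        return self._rows[index]              # plain grid[i]


grid = Grid([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
single = grid[1, 2]       # index arrives as the tuple (1, 2)
block = grid[0:2, 1:3]    # index arrives as (slice(0, 2), slice(1, 3))
print(single, block)
```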

### Assigning to Slices

In [35]:
l = list(range(10))
l

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [36]:
l[2:5] = [20, 30]
l

[0, 1, 20, 30, 5, 6, 7, 8, 9]
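
Slices can also be the target of `del`, and can use a step. The following sketch continues from the same starting list. One thing to watch is that the right-hand side of a slice assignment must be an iterable, even when assigning a single item.

```python
l = list(range(10))
l[2:5] = [20, 30]     # replace three items with two
del l[5:7]            # delete a slice
after_del = list(l)   # [0, 1, 20, 30, 5, 8, 9]

l[3::2] = [11, 22]    # assign to every second item from index 3
after_step = list(l)  # [0, 1, 20, 11, 5, 22, 9]

rejected = False
try:
    l[2:5] = 100      # an int is not iterable
except TypeError:
    rejected = True

l[2:5] = [100]        # a one-item list works
print(after_del, after_step, l)
```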

### Using + and * with Sequences

Python programmers expect that sequences support + and *. Usually both operands of + must be of the same sequence type, and neither of them is modified, but a new sequence of the same type is created as the result of the concatenation.


In [37]:
l = list(range(1, 4))

In [38]:
l * 5

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

In [39]:
5 * 'abcd'

'abcdabcdabcdabcdabcd'

In [40]:
id(l), id(l*2)

(4568854792, 4568111112)

**Both + and * always create a new object, and never change their operands.**

> Beware of expressions like `a * n` when `a` is a sequence containing mutable items, because the result may surprise you. For example, trying to initialize a list of lists as `my_list = [[]] * 3` will result in a list with three references to the same inner list, which is probably not what you want.

### Building Lists of Lists

In [41]:
board = [['_'] * 3 for i in range(3)]

In [42]:
board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [43]:
board[1][1] = 'X'

In [44]:
board

[['_', '_', '_'], ['_', 'X', '_'], ['_', '_', '_']]
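
For contrast, here is a sketch of the tempting but wrong shortcut next to the listcomp version. The name `weird_board` is illustrative.

```python
# Wrong: the outer list holds three references to the same inner list.
weird_board = [['_'] * 3] * 3
weird_board[1][2] = 'O'    # the change shows up in every row

# Right: the listcomp builds a new inner list on each iteration.
board = [['_'] * 3 for i in range(3)]
board[1][2] = 'X'          # only the middle row changes

print(weird_board)
print(board)
```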

## Augmented Assignment with Sequences

The augmented assignment operators `+=` and `*=` behave very differently depending on the first operand. We will focus on augmented addition (`+=`) first, but the concepts also apply to `*=` and other augmented assignment operators.

The special method that makes `+=` work is `__iadd__`. However, if `__iadd__` is not implemented, Python falls back to calling `__add__`.

In [45]:
l = list(range(3))

For mutable sequences, it is a good bet that `__iadd__` is implemented and that `+=` happens in place. For immutable sequences, clearly there is no way for that to happen.
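
The fallback can be observed with two toy classes. Both `OnlyAdd` and `InPlace` are hypothetical and exist only for this sketch: the first defines just `__add__`, the second also defines `__iadd__`.

```python
class OnlyAdd:
    """Hypothetical type with __add__ but no __iadd__."""

    def __init__(self, items):
        self.items = tuple(items)

    def __add__(self, other):
        return OnlyAdd(self.items + tuple(other))   # always a new object


class InPlace:
    """Hypothetical type whose __iadd__ mutates and returns self."""

    def __init__(self, items):
        self.items = list(items)

    def __iadd__(self, other):
        self.items.extend(other)
        return self


a = OnlyAdd([1, 2])
a_before = a
a += [3]     # falls back to a = a.__add__([3]), rebinding the name

b = InPlace([1, 2])
b_before = b
b += [3]     # calls b.__iadd__([3]), same object

print(a is a_before, b is b_before)
```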

In [46]:
l = list(range(1, 4))

In [47]:
id(l)

4568112136

In [48]:
l *= 2

In [49]:
l

[1, 2, 3, 1, 2, 3]

In [50]:
id(l)

4568112136

> After multiplication, the list is the same object, with new items appended.

In [51]:
t = tuple(range(1, 4))

In [52]:
id(t)

4568869192

In [53]:
t *= 3

In [54]:
id(t)

4566922968

>After multiplication, a new tuple was created.

**Repeated concatenation of immutable sequences is inefficient, because instead of just appending new items, the interpreter has to copy the whole target sequence to create a new one with the new items concatenated.**

## list.sort and the sorted Built-In Function

**The `list.sort` method sorts a list in place -- that is, without making a copy. It returns None to remind us that it changes the target object, and does not create a new list. This is an important Python API convention: functions or methods that change an object in place should return None to make it clear to the caller that the object itself was changed, and no new object was created. The same behavior can be seen in the `random.shuffle` function.**

> The convention of returning None to signal in-place changes has a drawback: you cannot cascade calls to those methods. In contrast, methods that return new objects can be cascaded in the fluent interface style.

In contrast, the built-in function `sorted` creates a new list and returns it.

Both `list.sort` and `sorted` take two optional keyword-only arguments:

`reverse`
    If True, the items are returned in descending order. The default is False.
    
`key`
    A one-argument function that will be applied to each item to produce its sorting key.
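
A short sketch of both arguments in use, with a sample list of fruit names chosen for illustration:

```python
fruits = ['grape', 'raspberry', 'apple', 'banana']

alphabetical = sorted(fruits)                 # new list, original untouched
by_length = sorted(fruits, key=len)           # stable: ties keep input order
by_length_desc = sorted(fruits, key=len, reverse=True)
untouched = list(fruits)                      # snapshot before the in-place sort

result = fruits.sort()                        # sorts in place, returns None

print(alphabetical, by_length, by_length_desc, sep='\n')
print(result, fruits)
```

Because the sort is stable, 'grape' stays ahead of 'apple' in both length-based results: they have the same length, and `reverse=True` does not reorder ties.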

## Managing Ordered Sequences with bisect

The `bisect` module offers two main functions -- `bisect` and `insort` -- that use the binary search algorithm to quickly find and insert items in any sorted sequence.

### Searching with bisect

`bisect` does a binary search for needle in haystack -- which must be a sorted sequence -- to locate the position where needle can be inserted while maintaining haystack in ascending order.

In [55]:
import bisect
import sys

In [56]:
HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]

In [57]:
ROW_FMT = '{0:2d} @ {1:2d} {2}{0:<2d}'

def demo(bisect_fn):
    for needle in reversed(NEEDLES):
        position = bisect_fn(HAYSTACK, needle)
        offset = position * ' |'
        print(ROW_FMT.format(needle, position, offset))

In [58]:
demo(bisect.bisect_left)

31 @ 14  | | | | | | | | | | | | | |31
30 @ 13  | | | | | | | | | | | | |30
29 @ 12  | | | | | | | | | | | |29
23 @  9  | | | | | | | | |23
22 @  9  | | | | | | | | |22
10 @  5  | | | | |10
 8 @  4  | | | |8 
 5 @  2  | |5 
 2 @  1  |2 
 1 @  0 1 
 0 @  0 0 


In [59]:
demo(bisect.bisect_right)

31 @ 14  | | | | | | | | | | | | | |31
30 @ 14  | | | | | | | | | | | | | |30
29 @ 13  | | | | | | | | | | | | |29
23 @ 11  | | | | | | | | | | |23
22 @  9  | | | | | | | | |22
10 @  5  | | | | |10
 8 @  5  | | | | |8 
 5 @  3  | | |5 
 2 @  1  |2 
 1 @  1  |1 
 0 @  0 0 
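
A practical use of `bisect` is table lookup by numeric value. The sketch below converts test scores to letter grades. The breakpoints and grade letters are sample values, and `grade` is an illustrative helper.

```python
import bisect

def grade(score, breakpoints=(60, 70, 80, 90), grades='FDCBA'):
    # bisect (an alias of bisect_right) returns how many breakpoints
    # are less than or equal to the score.
    i = bisect.bisect(breakpoints, score)
    return grades[i]

results = [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
print(results)
```

Using `bisect_left` here instead would place a score that is exactly on a breakpoint, such as 70, in the lower grade.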


### Inserting with bisect.insort

`insort(seq, item)` inserts item into seq so as to keep seq in ascending order.

In [60]:
import bisect
import random

In [61]:
SIZE = 7

random.seed(1729)

my_list = []
for i in range(SIZE):
    new_item = random.randrange(SIZE*2)
    bisect.insort(my_list, new_item)
    print('%2d ->' % new_item, my_list)

10 -> [10]
 0 -> [0, 10]
 6 -> [0, 6, 10]
 8 -> [0, 6, 8, 10]
 7 -> [0, 6, 7, 8, 10]
 2 -> [0, 2, 6, 7, 8, 10]
10 -> [0, 2, 6, 7, 8, 10, 10]


Like `bisect`, `insort` takes optional `lo` and `hi` arguments to limit the search to a subsequence.

## When a List Is Not the Answer

The list is flexible and easy to use, but depending on specific requirements, there are better options. For example, if you need to store 10 million floating-point values, an array is much more efficient. On the other hand, if you are constantly adding and removing items from the ends of a list as a FIFO or LIFO data structure, a deque (double-ended queue) works faster.

### Arrays

If the list will only contain numbers, an `array.array` is more efficient than a list: it supports all mutable sequence operations, and additional methods for fast loading and saving such as `.frombytes` and `.tofile`.

A Python array is as lean as a C array. When creating an array, you provide a typecode, a letter that determines the underlying C type used to store each item in the array.
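
The effect of the typecode can be sketched as follows. The item size of `'d'` is 8 bytes on common platforms, which is an assumption about the C compiler rather than a guarantee of the language.

```python
import array

signed_bytes = array.array('b', [-128, 0, 127])   # 'b' is signed char
doubles = array.array('d', [0.5, 1.5])            # 'd' is double

print(signed_bytes.itemsize, doubles.itemsize)    # bytes used per item

overflowed = False
try:
    signed_bytes.append(200)    # does not fit in a signed char
except OverflowError:
    overflowed = True
print(overflowed)
```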

In [62]:
import array
import random

In [63]:
floats = array.array('d', (random.random() for i in range(10**7)))

In [64]:
floats[-1]

0.5963321947530882

In [65]:
fp = open('floats.bin', 'wb')
floats.tofile(fp)
fp.close()

In [66]:
floats2 = array.array('d')
f = open('floats.bin', 'rb')
floats2.fromfile(f, 10**7)
f.close()

In [67]:
floats2[-1]

0.5963321947530882

`array.tofile` and `array.fromfile` are easy to use and fast.

> Another fast and more flexible way of saving numeric data is the `pickle` module for object serialization. Saving an array of floats with `pickle.dump` is almost as fast as with `array.tofile`. However, pickle handles almost all built-in types, including complex numbers, nested collections, and even instances of user-defined classes automatically.

### MemoryView

The built-in `memoryview` class is a shared-memory sequence type that lets you handle slices of arrays without copying bytes. It was inspired by the NumPy library.

```
A memoryview is essentially a generalized NumPy array structure in Python itself. It allows you to share memory between data structures (things like PIL images, SQLite databases, NumPy arrays, etc.) without first copying. This is very important for large datasets.
```

The `memoryview.cast` method lets you change the way multiple bytes are read and written as units without moving bits around. `memoryview.cast` returns yet another `memoryview` object, always sharing the same memory.

In [68]:
numbers = array.array('h', [-2, -1, 0, 1, 2])
memv = memoryview(numbers)

In [69]:
len(memv)

5

In [70]:
memv[0]

-2

In [71]:
memv_oct = memv.cast('B')

In [72]:
memv_oct.tolist()

[254, 255, 255, 255, 0, 0, 1, 0, 2, 0]

In [73]:
memv_oct[5] = 4

In [74]:
numbers

array('h', [-2, -1, 1024, 1, 2])

Meanwhile, if you are doing advanced numeric processing in arrays, you should be using the NumPy and SciPy libraries.

### NumPy and SciPy

**For advanced array and matrix operations, NumPy and SciPy are the reason why Python became mainstream in scientific computing applications.**

NumPy implements multidimensional array and matrix types that hold not only numbers but also user-defined records, and provides efficient elementwise operations.

SciPy is a library, written on top of NumPy, offering many scientific computing algorithms from linear algebra, numerical calculus, and statistics.

In [75]:
import numpy as np

In [76]:
a = np.arange(12)

In [77]:
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [78]:
type(a)

numpy.ndarray

In [79]:
a.shape

(12,)

In [80]:
a.reshape(3, 4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

### Deque and Other Queues

The `.append` and `.pop` methods make a list usable as a stack or a queue (if you use `.append` and `.pop(0)`, you get FIFO behavior). But inserting and removing from the left of a list is costly because the entire list must be shifted.

**The class `collections.deque` is a thread-safe double-ended queue designed for fast inserting and removing from both ends.**

In [81]:
import collections

In [82]:
dq = collections.deque(range(10), maxlen=10)

In [83]:
dq

deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)

In [84]:
dq.rotate(3)

In [85]:
dq

deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)

In [86]:
dq.rotate(-4)

In [87]:
dq

deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)

In [88]:
dq.appendleft(-1)

In [89]:
dq

deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)

In [90]:
dq.extend([11, 22, 33])

In [91]:
dq

deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)

In [92]:
dq.extendleft([10, 20, 30, 40])

In [93]:
dq

deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8], maxlen=10)

**The append and popleft operations are atomic, so deque is safe to use as a FIFO queue in multithreaded applications without the need for locks.**
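
The two typical uses of a deque can be sketched as follows: as a FIFO queue, and as a bounded buffer that keeps only the most recent items. The names `queue` and `recent` are illustrative.

```python
import collections

# FIFO queue: append on the right, pop from the left.
queue = collections.deque()
for job in ('a', 'b', 'c'):
    queue.append(job)
served = [queue.popleft() for _ in range(3)]   # first in, first out

# Bounded deque: when full, appending discards items from the other end.
recent = collections.deque(maxlen=3)
for n in range(5):
    recent.append(n)

print(served, list(recent))
```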

## Chapter Summary

**Python sequences are often categorized as mutable or immutable, but it is also useful to consider a different axis: flat sequences and container sequences.** The former are more compact, faster, and easier to use, but are limited to storing atomic data such as numbers, characters, and bytes. Container sequences are more flexible, but may surprise you when they hold mutable objects, so you need to be careful to use them correctly with nested data structures.

List comprehensions and generator expressions are powerful notations to build and initialize sequences.

Tuples in Python play two roles: as records with unnamed fields and as immutable lists. When a tuple is used as a record, tuple unpacking is the safest, most readable way of getting at the fields. The new `*` syntax makes tuple unpacking even better by making it easier to ignore some fields and to deal with optional fields.

Sequence slicing is a favorite Python syntax feature, and it is even more powerful than many realize. Multidimensional slicing and ellipsis (`...`) notation, as used in NumPy, may also be supported by user-defined sequences.

Repeated concatenation as in `seq * n` is convenient and, with care, can be used to initialize lists of lists containing immutable items. **Augmented assignment with += and *= behaves differently for mutable and immutable sequences.** In the latter case, these operators necessarily build new sequences. But if the target sequence is mutable, it is usually changed in place.

The `sort` method and `sorted` built-in function are easy to use and flexible.

Beyond lists and tuples, the Python standard library provides `array.array`.