# Chapter 2: An Array of Sequences

Most of the discussion in this chapter deals with the variety of sequences in Python. Understanding the different sequences saves us from reinventing the wheel, and their common interface lets us create APIs that properly support and leverage existing and future sequence types.

In [None]:
%load_ext autoreload
%autoreload 2

## Overview of built-in sequences

*Container sequences:* `list`, `tuple` and `collections.deque` can hold items of different types. They hold references to the objects they contain.

*Flat sequences:* `str`, `bytes` and `array.array` can only hold items of one type. They physically store the value of each item in their own memory space, which makes them more compact but limits them to primitive values such as characters, bytes and numbers.

Alternatively, we can group them by mutability:

*Mutable sequences:* `list`, `collections.deque` and `array.array`.

*Immutable sequences:* `str`, `bytes` and `tuple`.
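A minimal sketch of both distinctions, using only the standard library. The variable names are illustrative and not taken from the book.

```python
from array import array

# Container sequence: holds references, so the items may have different types.
mixed = [1, "two", 3.0]

# Flat sequence: stores the values themselves and accepts a single type only.
floats = array("d", [1.0, 2.0])
try:
    floats.append("three")  # a str is not a float
except TypeError as exc:
    flat_error = type(exc).__name__

# Mutable sequences support item assignment, immutable ones do not.
mixed[0] = "one"
text = "abc"
try:
    text[0] = "x"
except TypeError as exc:
    immutable_error = type(exc).__name__

print(flat_error, immutable_error)
```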

## List Comprehensions and Generator Expressions

*List comprehensions* (listcomps) are meant for one thing only: building a new list. To fill other sequence types, use a *generator expression* (genexp).

Genexps save memory by yielding items one at a time using the iterator protocol, instead of building a whole list to feed to another constructor. 

Genexps use the same syntax as listcomps, but are enclosed in parentheses instead of brackets.

If the generator is the single argument in a function call, there is no need to duplicate the parentheses.
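The points above can be sketched as follows. The sample string and variable names are illustrative.

```python
from array import array

symbols = "abc"

# A genexp feeds the tuple constructor one item at a time,
# without building an intermediate list.
codes = tuple(ord(symbol) for symbol in symbols)

# array() takes two arguments here, so the genexp needs its own parentheses.
code_array = array("I", (ord(symbol) for symbol in symbols))

# The genexp is the single argument of sum(), so no extra parentheses are needed.
total = sum(ord(symbol) for symbol in symbols)

print(codes, code_array, total)
```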

## Tuples Are Not Just Immutable Lists

### Tuples as Records

Tuples hold records: each item in the tuple holds the data for one field, and the position of the item gives its meaning.

In [3]:
# Example 2-7
lax_coordinates = (33.9425, -118.408056)

city, year, pop, chg, area = ("Tokyo", 2003, 32450, 0.66, 8014)

traveler_ids = [("USA", "31195855"), ("BRA", "CE342567")]
for passport in sorted(traveler_ids):
    print("%s/%s" % passport)
    
for country, _ in traveler_ids:
    print(country)

BRA/CE342567
USA/31195855
USA
BRA


### Tuple Unpacking

In [4]:
latitude, longitude = lax_coordinates
print(latitude)
print(longitude)

33.9425
-118.408056


We can use `*` to grab excess items when unpacking a tuple. The `*` prefix can appear in any position.

In [5]:
a, b, *rest = range(5)

In [8]:
rest

[2, 3, 4]

In [9]:
a, *rest, b = range(5)

In [10]:
rest

[1, 2, 3]

Python will do the right thing if the expression matches the nesting structure.

In [16]:
a, b, (c, d) = ("a", "b", ("c", "d"))

### Named Tuples

The `collections.namedtuple` function is a factory that produces subclasses of `tuple` enhanced with field names and a class name.

Two parameters are required to create a `namedtuple`: a class name and the field names, given either as an iterable of strings or as a single space-delimited string. Field values are passed to the constructor as separate arguments (in contrast, the `tuple` constructor takes a single iterable).

In [17]:
from collections import namedtuple

City = namedtuple("City", "name country population coordinates")
tokyo = City("Tokyo", "JP", 36933, (35.689722, 139.691667))

In [18]:
tokyo

City(name='Tokyo', country='JP', population=36933, coordinates=(35.689722, 139.691667))

In [20]:
tokyo.population

36933

In [19]:
tokyo.coordinates

(35.689722, 139.691667)

### Tuples as immutable lists

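`tuple` supports the `list` methods that do not modify the sequence, such as `count` and `index`. Methods that add, remove or reorder items (`append`, `pop`, `sort` and so on) are absent.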
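A quick way to check this claim, sketched with `hasattr`. The list of method names probed here is a hand-picked sample, not an exhaustive comparison.

```python
t = (10, 20, 20, 30)

# Read-only methods shared with list work on tuples.
count_of_20 = t.count(20)
index_of_30 = t.index(30)

# Methods that add or remove items, or reorder in place, are not defined on tuple.
mutating = ("append", "extend", "insert", "remove", "pop", "clear", "sort")
missing = [name for name in mutating if not hasattr(t, name)]

print(count_of_20, index_of_30, missing)
```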

## Slicing

`s[a:b:c]` can be used to specify a stride or step `c`, causing the resulting slice to skip items. The stride can also be negative, returning items in reverse.

In [21]:
s = "bicycle"

In [22]:
s[::3]

'bye'

In [23]:
s[::-3]

'eyb'

In [24]:
s[::-1]

'elcycib'

In [31]:
from examples import FrenchDeck

deck = FrenchDeck()

# Getting only aces using slicing
deck[12::13]

[Card(rank='A', suit='spades'),
 Card(rank='A', suit='diamonds'),
 Card(rank='A', suit='clubs'),
 Card(rank='A', suit='hearts')]

The notation `a:b:c` is only valid within `[]` when used as the indexing or subscript operator, and it produces a **slice object** `slice(a, b, c)`. Slice objects are useful because they let you assign names to slices (see example 2-11 on p. 37).
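A minimal sketch of named slices, assuming a made-up fixed-width record layout. The column positions and the names `ITEM_CODE`, `DESCRIPTION` and `PRICE` are hypothetical and are not the book's example.

```python
# Hypothetical layout: columns 0-4 item code, 6-17 description, 18 onwards price.
ITEM_CODE = slice(0, 5)
DESCRIPTION = slice(6, 18)
PRICE = slice(18, None)

record = "A1001 Blue widget 9.99"

code = record[ITEM_CODE]
description = record[DESCRIPTION].strip()
price = float(record[PRICE])

# s[::3] and s[slice(None, None, 3)] are the same operation.
same = "bicycle"[slice(None, None, 3)] == "bicycle"[::3]

print(code, description, price, same)
```

Naming the slices keeps the column positions in one place, so the parsing code reads in terms of fields rather than magic numbers.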

### Multidimensional Slicing and Ellipses

Multidimensional slicing can fetch items in multidimensional arrays (e.g. in a 2-D `numpy.ndarray`) using the syntax `a[j, k]`, or using slices `a[m:n, k:l]`. To evaluate `a[j, k]`, NumPy calls `a.__getitem__((j, k))`. Note that built-in Python sequences only support one index or slice (not a tuple).

NumPy also supports the use of `...`, which is an alias of the `Ellipsis` object, the single instance of the `ellipsis` class. It is used as a shortcut when slicing higher-dimensional arrays, e.g. `x[i, ...]` is shorthand for `x[i, :, :, :]` (given that `x` is a 4-D array).
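To see what `__getitem__` actually receives without installing NumPy, here is a sketch with a hypothetical `IndexProbe` class that simply returns its index argument.

```python
class IndexProbe:
    """Hypothetical helper: returns whatever index it is given."""

    def __getitem__(self, index):
        return index


probe = IndexProbe()

pair = probe[1, 2]             # a tuple of two integers
slices = probe[1:3, 4:6]       # a tuple of two slice objects
with_ellipsis = probe[0, ...]  # a tuple containing the Ellipsis object

# Built-in sequences reject tuple indexes.
numbers = [1, 2, 3]
try:
    numbers[0, 1]
except TypeError as exc:
    list_error = type(exc).__name__

print(pair, slices, with_ellipsis, list_error)
```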

### Assigning to Slices

Mutable sequences can be grafted, excised and otherwise modified in place using slice notation on the left side of an assignment statement. Note that when the target of the assignment is a slice, the right side *must be an iterable*.

In [32]:
a = list(range(10))
a

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [33]:
a[3:5] = [20, 30]
a

[0, 1, 2, 20, 30, 5, 6, 7, 8, 9]

In [34]:
del a[5:7]
a

[0, 1, 2, 20, 30, 7, 8, 9]

In [35]:
# This will fail, right side not an iterable
a[:3] = 100

TypeError: can only assign an iterable

### Using + and * with Sequences

Both `+` and `*` always create a new object and never change their operands (which usually must be of the same sequence type). Note that there are pitfalls when using `*`, see p. 40.
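One well-known pitfall with `*`, and presumably the kind the page reference above has in mind, is multiplying a list that contains mutable items: the result holds repeated references to the same inner object. A short sketch with illustrative names:

```python
# Pitfall: the outer list holds three references to the SAME inner list.
weird_board = [["_"] * 3] * 3
weird_board[1][2] = "O"   # shows up in every row

# Fix: the listcomp evaluates ["_"] * 3 anew on each iteration.
board = [["_"] * 3 for _ in range(3)]
board[1][2] = "X"         # changes one row only

print(weird_board)
print(board)
```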

### Augmented Assignment with Sequences

The augmented assignment operators `+=` and `*=` behave very differently depending on the first operand. 

If `a` is a mutable sequence (which almost surely has the special method `__iadd__`, which is what `+=` calls), `a` will be modified in place.

If `a` is an immutable sequence, there is no `__iadd__`, so `a += b` falls back to `a = a + b`: the result is evaluated first and `a` is then rebound to it, which generally means a new object (see p. 41).
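A small sketch of the difference. It keeps a second reference to each original object so that identity can be checked with `is`; the variable names are illustrative.

```python
numbers = [1, 2, 3]
original_list = numbers
numbers += [4]                     # list.__iadd__ extends the list in place
list_same = numbers is original_list

point = (1, 2, 3)
original_tuple = point
point += (4,)                      # no __iadd__: a new tuple is built and rebound
tuple_same = point is original_tuple

print(list_same, original_list)    # the original list object was changed
print(tuple_same, original_tuple)  # the original tuple is untouched
```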

## `list.sort` and the `sorted` Built-In Function

The `list.sort` method sorts a list in place. It returns `None` to indicate that it has not created a new list. This is an **important Python API convention**: Functions or methods that change an object in place should return `None` to make it clear to the caller that the object itself was changed, and no new object was created.

The built-in function `sorted` creates a new list and returns it.

Both `list.sort` and `sorted` take two optional, keyword-only arguments:

* `reverse`: If `True`, items are returned in descending order (default `False`).

* `key`: A one-argument function that will be applied to each item to produce its sorting key.

In [36]:
fruits = ["apple", "banana", "grape", "raspberry"]
sorted(fruits)

['apple', 'banana', 'grape', 'raspberry']

In [37]:
sorted(fruits, key=len)

['apple', 'grape', 'banana', 'raspberry']

In [38]:
sorted(fruits, key=len, reverse=True)

['raspberry', 'banana', 'apple', 'grape']

In [39]:
fruits

['apple', 'banana', 'grape', 'raspberry']

In [44]:
# Sorts in place
fruits.sort(key=len) # Returns None, omitted by console
fruits

['apple', 'grape', 'banana', 'raspberry']

## Managing Ordered Sequences with `bisect`

Once sequences are sorted, they can be very efficiently searched.

The `bisect` module offers two main functions - `bisect` and `insort` - that use the binary search algorithm to quickly find and insert items in any sorted sequence. See pp. 47-49 for details.

A useful application of `bisect` is to perform table lookups by numeric values.

In [46]:
import bisect

def grade(score, breakpoints=None, grades="FDCBA"):
    if breakpoints is None:
        breakpoints = [60, 70, 80, 90]
    i = bisect.bisect(breakpoints, score)
    return grades[i]

[grade(score) for score in (33, 99, 77, 70, 89, 90, 100)]

['F', 'A', 'C', 'C', 'B', 'A', 'A']

`insort(seq, item)` inserts `item` into `seq` so as to keep `seq` in ascending order.

Both `bisect` and `insort` take optional `lo`, `hi` arguments to limit the search to a sub-sequence.
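A brief sketch of `insort` and of the `lo`/`hi` arguments, with made-up sample data:

```python
import bisect

scores = [10, 20, 30, 40]

# The sequence comes first, then the item to insert.
bisect.insort(scores, 25)

# Without limits, the whole list is searched.
full_position = bisect.bisect(scores, 25)

# With lo=0 and hi=2, only scores[0:2] is considered.
limited_position = bisect.bisect(scores, 25, 0, 2)

print(scores, full_position, limited_position)
```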

## When a List is Not the Answer

### Arrays

If the list will only contain numbers, an `array.array` is more efficient than a `list`.

When creating an `array`, you provide a typecode letter indicating the underlying C type used to store each item.
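For illustration, a sketch with two typecodes from the standard `array` module: `"d"` (C double) and `"b"` (signed char). The sample values are arbitrary.

```python
from array import array

floats = array("d", [0.5, 1.5, 2.5])    # "d": double-precision float
small_ints = array("b", [-1, 0, 127])   # "b": signed char, range -128 to 127

# The typecode fixes the item size and the range of accepted values.
try:
    small_ints.append(128)              # does not fit in a signed char
except OverflowError as exc:
    overflow_error = type(exc).__name__

print(floats.typecode, small_ints.itemsize, overflow_error)
```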

### Memory Views

The built-in `memoryview` class is a shared-memory sequence type that lets you handle slices of arrays without copying bytes (inspired by NumPy). See pp. 54-55 for details.
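A minimal sketch of the no-copy behaviour, assuming an array of signed shorts as the underlying buffer. This is not the book's example.

```python
from array import array

numbers = array("h", [-2, -1, 0, 1, 2])   # "h": signed short

view = memoryview(numbers)
window = view[1:4]     # slicing a memoryview gives another view, not a copy

window[0] = 100        # writes through to the underlying array

print(numbers)
print(window.tolist())
```

Because `window` shares memory with `numbers`, the assignment is visible through the array itself.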

### Deques and Other Queues

Inserting and removing from the left of a list is expensive, because every other item in the list has to be shifted. `collections.deque` is optimised for inserting and removing from both ends, and can be bounded.

Note that removing items from the middle of a `deque` is not as fast.

In [12]:
from collections import deque

dq = deque(range(10), maxlen=10)
dq

deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)

In [13]:
# n > 0 takes items from the right end and prepends them to the left end
dq.rotate(3)
print(dq)

# n < 0 takes items from left and appends them to the right end
dq.rotate(-3)
print(dq)

deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)
deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)


In [14]:
# dq is full (maxlen=10), so appending on the left discards the last element
dq.appendleft(-4)
dq

deque([-4, 0, 1, 2, 3, 4, 5, 6, 7, 8], maxlen=10)