# Overview of Built-In Sequences

The standard library offers a rich selection of sequence types implemented in C:
* Container sequences: 
    * *list*, *tuple*, and *collections.deque* can hold items of different types
    * hold references to the objects they contain, which may be of any type
* Flat sequences: 
    * *str*, *bytes*, *bytearray*, *memoryview*, and *array.array* hold items of one type
    * physically store the value of each item within its own memory space
    * more compact, but limited to holding primitive values like characters, bytes, and numbers
    
Another way of grouping sequence types is by mutability:
* Mutable sequences: *list*, *bytearray*, *array.array*, *collections.deque*, and *memoryview*
* Immutable sequences: *tuple*, *str*, and *bytes*
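
Both groupings can be checked with a short sketch. The variable names are illustrative and not from the original notes, and testing against `collections.abc.MutableSequence` is just one reasonable way to ask whether a type is mutable:

```python
from array import array
from collections import deque
from collections.abc import MutableSequence

# A container sequence holds references to objects of any type.
mixed = [1, 'two', 3.0, (4, 5)]

# A flat sequence stores values of a single primitive type.
doubles = array('d', [1.0, 2.0])
try:
    doubles.append('three')   # a str cannot be stored as a C double
    accepted = True
except TypeError:
    accepted = False

# Grouping a few of the built-in types by mutability.
is_mutable = {t.__name__: issubclass(t, MutableSequence)
              for t in (list, bytearray, deque, tuple, str, bytes)}
print(accepted, is_mutable)
```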

# List Comprehensions (Listcomps) and Generator Expressions

## List Comprehensions and Readability

In [None]:
# Example 2-1. Build a list of Unicode codepoints from a string
symbols = '$¢£¥€¤'
codes = []
for symbol in symbols:
    codes.append(ord(symbol))
codes

In [None]:
# Example 2-2. Build a list of Unicode codepoints from a string
# Use list comprehension
symbols = '$¢£¥€¤'
codes = [ord(symbol) for symbol in symbols]
codes

## Listcomps Versus map and filter

In [None]:
# Example 2-3. The same list built by a listcomp and a map/filter composition
symbols = '$¢£¥€¤'
beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
beyond_ascii

In [None]:
beyond_ascii = list(filter(lambda c : c > 127, map(ord, symbols)))
beyond_ascii

## Cartesian Products

In [None]:
# Example 2-4. Cartesian product using a list comprehension
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
# Generate a list of tuples arranged by color, then size
tshirts = [(color, size) for color in colors for size in sizes]
tshirts

In [None]:
# Note how the resulting list is arranged as if the for loops were nested in the same order as they appear in the listcomp 
for color in colors:
    for size in sizes:
        print((color, size))

In [None]:
# To get items arranged by size, then color, just rearrange the for clauses; adding a line break to the listcomp makes it easy to see how the result will be ordered.
tshirts = [(color, size) for size in sizes \
                         for color in colors]
tshirts

## Generator Expressions

To initialize tuples, arrays, and other types of sequences, you could also start from a listcomp, but a genexp saves memory because it yields items one by one using the iterator protocol instead of building a whole list just to feed another constructor.

Genexps use the same syntax as listcomps, but are enclosed in parentheses () rather than brackets [].
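
A rough way to see the memory difference is to compare the size of a list with the size of the equivalent generator object. This is a sketch with made-up names; `sys.getsizeof` reports only the container itself, so treat the numbers as an indication, not an exact measurement:

```python
import sys

squares_list = [n * n for n in range(10_000)]   # all items are built now
squares_gen = (n * n for n in range(10_000))    # nothing is computed yet

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

# The generator yields items one by one through the iterator protocol.
first = next(squares_gen)
second = next(squares_gen)
total = sum(squares_gen)   # consumes the remaining items (n = 2 .. 9999)

print(list_size, gen_size, first, second)
```

Note that a generator can be consumed only once: after `sum(squares_gen)` it is exhausted.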

In [None]:
# Example 2-5. Basic usage of genexps to build a tuple and an array
symbols = '$¢£¥€¤'
# Same as tuple((ord(symbol) for symbol in symbols)), we can remove extra parentheses
tuple(ord(symbol) for symbol in symbols)

In [None]:
import array
array.array('I', (ord(symbol) for symbol in symbols))

In [None]:
# Example 2-6. Cartesian product in a generator expression
# The six-item list of T-shirts is never built in memory: 
# the generator expression feeds the for loop producing one item at a time.
# If the two lists used in the Cartesian product had 1,000 items each,
# using a generator expression would save the expense of building a list
# with a million items just to feed the for loop
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
for tshirt in ('%s %s' % (c, s) for c in colors for s in sizes):
    print(tshirt)


# Tuples Are Not Just Immutable Lists

Some introductory texts about Python present tuples as “immutable lists,” but that is selling them short. Tuples do double duty: they can be used as immutable lists and also as records with no field names. This use is sometimes overlooked, so we will start with that.

## Tuples as Records

Tuples hold records: each item in the tuple holds the data for one field and the position of the item gives its meaning.

If you think of a tuple just as an immutable list, the quantity and the order of the items may or may not be important, depending on the context. But when using a tuple as a collection of fields, the number of items is often fixed and their order is always vital.

In [None]:
# Example 2-7. Tuples used as records

# Latitude and longitude of the Los Angeles International Airport.
lax_coordinates = (33.9425, -118.408056)

# Data about Tokyo: name, year, population (millions), population change (%), area (km2)
city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014)

# A list of tuples of the form (country_code, passport_number)
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567')]

# As we iterate over the list, passport is bound to each tuple
# The % formatting operator understands tuples and treats each item as a separate field.
for passport in sorted(traveler_ids):
    print('%s/%s' % passport)
    
# The for loop knows how to retrieve the items of a tuple separately; this is called “unpacking”.
# Here we are not interested in the second item, so it's assigned to _, a dummy variable
for country, _ in traveler_ids:
    print(country)


## Tuple Unpacking

The most visible form of tuple unpacking is parallel assignment; that is, assigning items from an iterable to a tuple of variables, as you can see in this example:

In [None]:
lax_coordinates = (33.9425, -118.408056)
latitude, longitude = lax_coordinates # tuple unpacking
latitude

In [None]:
longitude

An elegant application of tuple unpacking is swapping the values of variables without using a temporary variable:

In [None]:
a, b = 1, 2  # illustrative starting values so the cell runs on its own
b, a = a, b

Another example of tuple unpacking is prefixing an argument with a star when calling a function:

In [None]:
divmod(20, 8)

In [None]:
t = (20, 8)
divmod(*t)

In [None]:
quotient, remainder = divmod(*t)
quotient, remainder

The preceding code also shows a further use of tuple unpacking: enabling functions to return multiple values in a way that is convenient to the caller. For example, the *os.path.split()* function builds a tuple *(path, last_part)* from a filesystem path:

In [None]:
import os
_, filename = os.path.split('/home/luciano/.ssh/idrsa.pub')
filename

When we only care about certain parts of a tuple when unpacking, a dummy variable like _ is used as a placeholder, as in the preceding example.

Another way of focusing on just some of the items when unpacking a tuple is to use the *, as we’ll see right away.

## Using * to grab excess items

Defining function parameters with *args to grab arbitrary excess arguments is a classic Python feature.
In Python 3, this idea was extended to apply to parallel assignment as well:

In [None]:
a, b, *rest = range(5)
a, b, rest

In [None]:
a, b, *rest = range(3)
a, b, rest

In [None]:
a, b, *rest = range(2)
a, b, rest

In the context of parallel assignment, the * prefix can be applied to exactly one variable, but it can appear in any position:

In [None]:
a, *body, c, d = range(5)
a, body, c, d

In [None]:
*head, b, c, d = range(5)
head, b, c, d

Finally, a powerful feature of tuple unpacking is that it works with nested structures.

## Nested Tuple Unpacking

The tuple to receive an expression to unpack can have nested tuples, like (a, b, (c, d)), and Python will do the right thing if the expression matches the nesting structure. 

In [None]:
# Example 2-8. Unpacking nested tuples to access the longitude
metro_areas = [('Tokyo', 'JP', 36.933, (35.689722, 139.691667)), 
              ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
              ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
              ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
              ('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)),]
print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))
fmt = '{:15} | {:9.4f} | {:9.4f}'
for name, cc, pop, (latitude, longitude) in metro_areas:
    if longitude <= 0:
        print(fmt.format(name, latitude, longitude))

As designed, tuples are very handy. But there is a missing feature when using them as records: sometimes it is desirable to name the fields. That is why the namedtuple function was invented. Read on.

## Named Tuples

The collections.namedtuple function is a factory that produces subclasses of tuple enhanced with field names and a class name—which helps debugging.

In [None]:
# Example 2-9. Defining and using a named tuple type
from collections import namedtuple
# Two parameters are required to create a named tuple: 
# a class name and a list of field names, which can be given as 
# an iterable of strings or as a single space-delimited string. 
City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
tokyo

In [None]:
# You can access the fields by name or position.
tokyo.population

In [None]:
tokyo.coordinates

In [None]:
tokyo[1]

A named tuple type has a few attributes in addition to those inherited from tuple.

In [None]:
# Example 2-10. Named tuple attributes and methods (continued from the previous example)
# the _fields class attribute, a tuple with the field names of the class.
City._fields

In [None]:
LatLong = namedtuple('LatLong', 'lat long')
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))

# the class method _make(iterable)
# _make() allows you to instantiate a named tuple from an iterable; City(*delhi_data) would do the same.
delhi = City._make(delhi_data)

# the _asdict() instance method
# _asdict() returns a dict built from the named tuple instance (a collections.OrderedDict before Python 3.8). That can be used to produce a nice display of city data.
delhi._asdict()

In [None]:
for key, value in delhi._asdict().items():
    print(key + ':', value)
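
The claims in the comments above can be verified in a few lines. This sketch repeats the definitions so it runs on its own; the names `via_make`, `via_star`, and `field_names` are illustrative:

```python
from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')
LatLong = namedtuple('LatLong', 'lat long')
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))

via_make = City._make(delhi_data)   # build from any iterable
via_star = City(*delhi_data)        # unpack the iterable into arguments

same = via_make == via_star
is_tuple = isinstance(via_make, tuple)      # named tuples subclass tuple
field_names = tuple(via_make._asdict())     # keys follow the field order
print(same, is_tuple, field_names)
```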

Now that we’ve explored the power of tuples as records, we can consider their second role as an immutable variant of the list type.

## Tuples as Immutable Lists

When using a tuple as an immutable variation of list, it helps to know how similar they actually are. As you can see in Table 2-1, tuple supports all list methods that do not involve adding or removing items, with one exception: tuple lacks the \_\_reversed\_\_ method. However, that is just for optimization; reversed(my_tuple) works without it.
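
Without the table at hand, introspection gives a quick approximation of it. This is a sketch, not a replacement for Table 2-1; the exact contents of `list_only` depend on the Python version:

```python
# Attributes that list has and tuple does not.
list_only = sorted(set(dir(list)) - set(dir(tuple)))

has_reversed = (hasattr(list, '__reversed__'), hasattr(tuple, '__reversed__'))

# reversed() still works on a tuple through the sequence protocol.
backwards = list(reversed((1, 2, 3)))
print(list_only)
print(has_reversed, backwards)
```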

See the book for Table 2-1. Methods and attributes found in list or tuple (methods implemented by object are omitted for brevity).

Every Python programmer knows that sequences can be sliced using the s[a:b] syntax. We now turn to some less well-known facts about slicing. 

# Slicing

A common feature of list, tuple, str, and all sequence types in Python is the support of slicing operations, which are more powerful than most people realize.

In this section, we describe the use of these advanced forms of slicing. Their implementation in a user-defined class will be covered in Chapter 10, in keeping with our philosophy of covering ready-to-use classes in this part of the book, and creating new classes in Part IV.

## Why Slices and Range Exclude the Last Item

The Pythonic convention of excluding the last item in slices and ranges works well with the zero-based indexing used in Python, C, and many other languages. Some convenient features of the convention are:
* It’s easy to see the length of a slice or range when only the stop position is given: range(3) and my_list[:3] both produce three items.
* It’s easy to compute the length of a slice or range when start and stop are given: just subtract stop - start.
* It’s easy to split a sequence in two parts at any index x, without overlapping: simply get my_list[:x] and my_list[x:]. For example:

In [None]:
l = [10, 20, 30, 40, 50, 60]
l[:2] # split at 2

In [None]:
l[2:]

In [None]:
l[:3] # split at 3

In [None]:
l[3:]
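
The three conveniences listed above can also be stated as checks. A minimal sketch with illustrative names:

```python
items = [10, 20, 30, 40, 50, 60]
start, stop = 1, 4

len_from_stop = len(items[:3])             # only the stop position is given
len_from_bounds = len(items[start:stop])   # equals stop - start

# Splitting at any index x and joining the parts restores the sequence.
rejoined = all(items[:x] + items[x:] == items
               for x in range(len(items) + 1))
print(len_from_stop, len_from_bounds, rejoined)
```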

Now let’s take a close look at how Python interprets slice notation.

## Slice Objects

This is no secret, but worth repeating just in case: s[a:b:c] can be used to specify a stride or step c, causing the resulting slice to skip items. The stride can also be negative, returning items in reverse. Three examples make this clear:

In [None]:
s = 'bicycle'
s[::3]

In [None]:
s[::-1]

In [None]:
s[::-2]

The notation a:b:c is only valid within [] when used as the indexing or subscript operator, and it produces a slice object: slice(a, b, c). As we will see in “How Slicing Works”, to evaluate the expression seq[start:stop:step], Python calls seq.\_\_getitem\_\_(slice(start, stop, step)). Even if you are not implementing your own sequence types, knowing about slice objects is useful because it lets you assign names to slices, just like spreadsheets allow naming of cell ranges.
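
To see the slice object for yourself, a tiny class whose \_\_getitem\_\_ simply returns its argument is enough. `ShowIndex` and `FIRST_THREE` are hypothetical names made up for this sketch:

```python
class ShowIndex:
    """Hypothetical helper: returns whatever [] passes to __getitem__."""
    def __getitem__(self, index):
        return index

probe = ShowIndex()
received = probe[1:10:2]     # Python builds slice(1, 10, 2) for us

# A slice object can be given a name and reused.
FIRST_THREE = slice(None, 3)
word = 'bicycle'
named = word[FIRST_THREE]
literal = word[:3]
print(received, named, literal)
```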

Suppose you need to parse flat-file data like the invoice shown in Example 2-11. Instead of filling your code with hardcoded slices, you can name them. See how readable this makes the for loop at the end of the example.

In [None]:
invoice = """
0.....6.................................40........52...55........
1909  Pimoroni PiBrella                     $17.50    3    $52.50
1489  6mm Tactile Switch x20                 $4.95    2     $9.90
1510  Panavise Jr. - PV-201                 $28.00    1    $28.00
1601  PiTFT Mini Kit 320x240                $34.95    1    $34.95
"""
SKU = slice(0, 6)
DESCRIPTION = slice(6, 40)
UNIT_PRICE = slice(40, 52)
QUANTITY = slice(52, 55)
ITEM_TOTAL = slice(55, None)
line_items = invoice.split('\n')[2:]
for item in line_items:
    print(item[UNIT_PRICE], item[DESCRIPTION])

We’ll come back to slice objects when we discuss creating your own collections in “Vector Take #2: A Sliceable Sequence”. Meanwhile, from a user perspective, slicing includes additional features such as multidimensional slices and ellipsis (...) notation. Read on.

## Multidimensional Slicing and Ellipsis

The [] operator can also take multiple indexes or slices separated by commas. This is used, for instance, in the external NumPy package, where items of a two-dimensional numpy.ndarray can be fetched using the syntax a[i, j] and a two-dimensional slice obtained with an expression like a[m:n, k:l]. Example 2-22 later in this chapter shows the use of this notation. The \_\_getitem\_\_ and \_\_setitem\_\_ special methods that handle the [] operator simply receive the indices in a[i, j] as a tuple. In other words, to evaluate a[i, j], Python calls a.\_\_getitem\_\_((i, j)).

The built-in sequence types in Python are one-dimensional, so they support only one index or slice, and not a tuple of them.

The ellipsis, written with three full stops (...) and not … (Unicode U+2026), is recognized as a token by the Python parser. It is an alias to the Ellipsis object, the single instance of the ellipsis class. As such, it can be passed as an argument to functions and as part of a slice specification, as in f(a, ..., z) or a[i:...]. NumPy uses ... as a shortcut when slicing arrays of many dimensions; for example, if x is a four-dimensional array, x[i, ...] is a shortcut for x[i, :, :, :,]. See the Tentative NumPy Tutorial to learn more about this.
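
None of this requires NumPy to observe. The sketch below uses a hypothetical `ShowIndex` class (not part of any library) to reveal what [] receives, and confirms that a built-in list rejects a tuple of indexes:

```python
class ShowIndex:
    """Hypothetical helper: returns whatever [] passes to __getitem__."""
    def __getitem__(self, index):
        return index

probe = ShowIndex()
pair = probe[1, 2]            # the indexes arrive as the tuple (1, 2)
with_dots = probe[1:3, ...]   # a slice and the Ellipsis object, in a tuple
same_object = ... is Ellipsis

# Built-in sequences are one-dimensional and reject a tuple of indexes.
try:
    [10, 20, 30][0, 1]
    rejected = False
except TypeError:
    rejected = True
print(pair, with_dots, same_object, rejected)
```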

At the time of this writing, I am unaware of uses of Ellipsis or multidimensional indexes and slices in the Python standard library. If you spot one, let me know. These syntactic features exist to support user-defined types and extensions such as NumPy.

Slices are not just useful to extract information from sequences; they can also be used to change mutable sequences in place—that is, without rebuilding them from scratch.

## Assigning to Slices

Mutable sequences can be grafted, excised, and otherwise modified in place using slice notation on the left side of an assignment statement or as the target of a del statement. The next few examples give an idea of the power of this notation:

In [None]:
l = list(range(10))
l

In [None]:
l[2:5] = [20, 30]
l

In [None]:
del l[5:7]
l

In [None]:
l[3::2] = [11, 12]
l

In [None]:
# When the target of the assignment is a slice, the right side must be an iterable object, even if it has just one item.
l[2:5] = 100

In [None]:
l[2:5] = [100]
l

Everybody knows that concatenation is a common operation with sequences of any type. Any introductory Python text explains the use of + and * for that purpose, but there are some subtle details on how they work, which we cover next.

# Using + and * with Sequences

Python programmers expect that sequences support + and \*. Usually both operands of + must be of the same sequence type, and neither of them is modified; instead, a new sequence of the same type is created as a result of the concatenation.

To concatenate multiple copies of the same sequence, multiply it by an integer. Again, a new sequence is created:

In [None]:
l = [1, 2, 3]
l * 5

In [None]:
5 * 'abcd'

Both + and * always create a new object, and never change their operands.
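
A short check of both statements, with illustrative names: the operands are left untouched, and mixing sequence types with + is normally rejected.

```python
a = [1, 2]
b = [3]

joined = a + b       # a new list
repeated = a * 2     # another new list

# Concatenating different sequence types usually fails.
try:
    [1, 2] + (3,)
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(joined, repeated, a, b, mixed_ok)
```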

The next section covers the pitfalls of trying to use * to initialize a list of lists.

## Building Lists of Lists

Sometimes we need to initialize a list with a certain number of nested lists—for example, to distribute students in a list of teams or to represent squares on a game board. The best way of doing so is with a list comprehension, as in Example 2-12.

In [None]:
# Example 2-12. A list with three lists of length 3 can represent a tic-tac-toe board
board = [['_'] * 3 for i in range(3)]
board

In [None]:
board[1][2] = 'X'
board

In [None]:
# A tempting but wrong shortcut
# Example 2-13. A list with three references to the same list is useless
weird_board = [['_'] * 3] * 3
weird_board

In [None]:
# Placing a mark in row 1, column 2, reveals that all rows are aliases referring to the same object
weird_board[1][2] = 'O'
weird_board

The problem with Example 2-13 is that, in essence, it behaves like this code:

In [None]:
# The same row is appended three times to board.
row = ['_'] * 3
board = []
for i in range(3):
    board.append(row)
board

On the other hand, the list comprehension from Example 2-12 is equivalent to this code:

In [None]:
board = []
for i in range(3):
    row = ['_'] * 3
    board.append(row)
board

In [None]:
board[2][0] = 'X'
board

So far we have discussed the use of the plain + and * operators with sequences, but there are also the += and *= operators, which produce very different results depending on the mutability of the target sequence. The following section explains how that works.

# Augmented Assignment with Sequences

The augmented assignment operators += and *= behave very differently depending on the first operand. To simplify the discussion, we will focus on augmented addition first (+=), but the concepts also apply to *= and to other augmented assignment operators.

The special method that makes += work is \_\_iadd\_\_ (for “in-place addition”). However, if \_\_iadd\_\_ is not implemented, Python falls back to calling \_\_add\_\_. Consider this simple expression:

In [None]:
a += b

If a implements \_\_iadd\_\_, that will be called. In the case of mutable sequences (e.g., list, bytearray, array.array), a will be changed in place (i.e., the effect will be similar to a.extend(b)). However, when a does not implement \_\_iadd\_\_, the expression a += b has the same effect as a = a + b: the expression a + b is evaluated first, producing a new object, which is then bound to a. In other words, the identity of the object bound to a may or may not change, depending on the availability of \_\_iadd\_\_.

In general, for mutable sequences, it is a good bet that \_\_iadd\_\_ is implemented and that += happens in place. For immutable sequences, clearly there is no way for that to happen.
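
The difference is easy to observe by keeping a second reference to the original object. This is a minimal sketch with illustrative names:

```python
nums = [1, 2]
alias = nums
nums += [3]                        # list implements __iadd__
list_in_place = nums is alias      # same object, changed in place

point = (1, 2)
old_point = point
point += (3,)                      # tuple falls back to __add__
tuple_rebound = point is not old_point   # a new tuple was bound to point

has_iadd = (hasattr(list, '__iadd__'), hasattr(tuple, '__iadd__'))
print(alias, old_point, point, has_iadd)
```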

What I just wrote about += also applies to \*=, which is implemented via \_\_imul\_\_. The \_\_iadd\_\_ and \_\_imul\_\_ special methods are discussed in Chapter 13.

Here is a demonstration of \*= with a mutable sequence and then an immutable one:

In [None]:
l = [1, 2, 3]
# ID of the initial list
id(l)

In [None]:
l *= 2
l

In [None]:
# After multiplication, the list is the same object, with new items appended 
id(l)

In [None]:
t = (1, 2, 3)
# ID of the initial tuple 
id(t)

In [None]:
t *= 2
# After multiplication, a new tuple was created 
id(t)

Repeated concatenation of immutable sequences is inefficient, because instead of just appending new items, the interpreter has to copy the whole target sequence to create a new one with the new items concatenated.

We’ve seen common use cases for +=. The next section shows an intriguing corner case that highlights what “immutable” really means in the context of tuples.

## A += Assignment Puzzler

Try to answer without using the console: what is the result of evaluating the two expressions in Example 2-14? 

In [None]:
# Example 2-14. A riddle
t = (1, 2, [30, 40])
t[2] += [50, 60]

What happens next? Choose the best answer:
1. t becomes (1, 2, [30, 40, 50, 60]).
2. TypeError is raised with the message 'tuple' object does not support item assignment.
3. Neither.
4. Both 1 and 2.

When I saw this, I was pretty sure the answer was 2, but it’s actually 4, “Both 1 and 2”! Example 2-15 is the actual output from a Python 3.4 console (the result is the same in a Python 2.7 console).

In [None]:
# Example 2-15. The unexpected result: item t[2] is changed and an exception is raised
t = (1, 2, [30, 40])
t[2] += [50, 60]

In [None]:
t

Online Python Tutor is an awesome online tool to visualize how Python works in detail. Figure 2-3 is a composite of two screenshots showing the initial and final states of the tuple t from Example 2-15.

If you look at the bytecode Python generates for the expression s[a] += b (Example 2-16), it becomes clear how that happens. 

In [None]:
import dis
dis.dis('s[a] += b')

0, 3, 6, 7: Put the value of s[a] on TOS (Top Of Stack)

8, 11: Perform TOS += b. This succeeds if TOS refers to a mutable object (it’s a list, in Example 2-15).

12, 13: Assign s[a] = TOS. This fails if s is immutable (the t tuple in Example 2-15). 

This example is quite a corner case—in 15 years of using Python, I have never seen this strange behavior actually bite somebody.
I take three lessons from this:
* Putting mutable items in tuples is not a good idea.
* Augmented assignment is not an atomic operation—we just saw it throwing an exception after doing part of its job.
* Inspecting Python bytecode is not too difficult, and is often helpful to see what is going on under the hood.
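
To reproduce the puzzler without stopping the notebook, the exception can be caught and the tuple inspected afterwards. The sketch also shows that calling extend on the list avoids the item assignment altogether, so no exception is raised; the variable names are illustrative:

```python
t = (1, 2, [30, 40])

try:
    t[2] += [50, 60]      # the list is extended, then the assignment fails
    raised = False
except TypeError:
    raised = True

after_augmented = list(t[2])   # snapshot of the list inside the tuple

# No assignment to t[2] happens here, so nothing is raised.
t[2].extend([70])
print(raised, after_augmented, t)
```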

After witnessing the subtleties of using + and * for concatenation, we can change the subject to another essential operation with sequences: sorting.

# list.sort and the sorted Built-In Function

The *list.sort* method sorts a list in place—that is, without making a copy. It returns None to remind us that it changes the target object, and does not create a new list. This is an important Python API convention: functions or methods that change an object in place should return None to make it clear to the caller that the object itself was changed, and no new object was created. The same behavior can be seen, for example, in the *random.shuffle* function.

In contrast, the built-in function sorted creates a new list and returns it. In fact, it accepts any iterable object as an argument, including immutable sequences and generators (see Chapter 14). Regardless of the type of iterable given to sorted, it always returns a newly created list.
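
The contrast can be summarized in a few lines. This sketch uses made-up data to show the None return value of the in-place operations and the list returned by sorted for different kinds of iterables:

```python
import random

letters = list('python')
result_of_sort = letters.sort()            # sorts in place, returns None

deck = [1, 2, 3, 4]
result_of_shuffle = random.shuffle(deck)   # same convention

# sorted accepts any iterable and always returns a new list.
from_str = sorted('banana')
from_tuple = sorted((3, 1, 2))
from_gen = sorted(n % 3 for n in range(5))
print(letters, from_str, from_tuple, from_gen)
```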

Both list.sort and sorted take two optional, keyword-only arguments:
* reverse: If True, the items are returned in descending order (i.e., by reversing the comparison of the items). The default is False.
* key: A one-argument function that will be applied to each item to produce its sorting key. For example, when sorting a list of strings, key=str.lower can be used to perform a case-insensitive sort, and key=len will sort the strings by character length. The default is the identity function (i.e., the items themselves are compared).

The key optional keyword parameter can also be used with the min() and max() built-ins and with other functions from the standard library (e.g., itertools.groupby() and heapq.nlargest()).
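
A sketch of the key parameter at work in sorted, max, min, and heapq.nlargest. The word list is invented for the example and deliberately has no two words of the same length, so ties do not affect the results:

```python
import heapq

words = ['fig', 'Kiwi', 'apple', 'banana', 'Cherimoya']

default_order = sorted(words)                 # uppercase sorts before lowercase
ignoring_case = sorted(words, key=str.lower)  # case-insensitive sort

longest = max(words, key=len)
shortest = min(words, key=len)
two_longest = heapq.nlargest(2, words, key=len)
print(default_order, ignoring_case, longest, shortest, two_longest)
```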

Here are a few examples to clarify the use of these functions and keyword arguments:

In [None]:
fruits = ['grape', 'raspberry', 'apple', 'banana']
# This produces a new list of strings sorted alphabetically
sorted(fruits)

In [None]:
# Inspecting the original list, we see it is unchanged
fruits

In [None]:
# This is simply reverse alphabetical ordering
sorted(fruits, reverse = True)

In [None]:
# A new list of strings, now sorted by length. 
# Because the sorting algorithm is stable, "grape" and "apple", both of length 5,
# are in the original order
sorted(fruits, key = len)

In [None]:
# These are the strings sorted in descending order of length. 
# It is not the reverse of the previous result because the sorting is stable, 
# so again “grape” appears before “apple”.
sorted(fruits, key = len, reverse = True)

In [None]:
# So far, the ordering of the original fruits list has not changed
fruits

In [None]:
# This sorts the list in place, and returns None (which the console omits)
fruits.sort()

In [None]:
# Now fruits is sorted
fruits

Once your sequences are sorted, they can be very efficiently searched. Fortunately, the standard binary search algorithm is already provided in the bisect module of the Python standard library. We discuss its essential features next, including the convenient bisect.insort function, which you can use to make sure that your sorted sequences stay sorted.

# Managing Ordered Sequences with bisect

The bisect module offers two main functions—bisect and insort—that use the binary search algorithm to quickly find and insert items in any sorted sequence. 

## Searching with bisect

*bisect(haystack, needle)* does a binary search for needle in haystack — which must be a sorted sequence — to locate the position where needle can be inserted while maintaining haystack in ascending order. In other words, all items appearing up to that position are less than or equal to needle. You could use the result of bisect(haystack, needle) as the index argument to haystack.insert(index, needle)—however, using insort does both steps, and is faster.

In [None]:
# Example 2-17. bisect finds insertion points for items in a sorted sequence
import bisect
import sys

HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]

ROW_FMT = '{0:2d} @ {1:2d}    {2}{0:<2d}'

def demo(bisect_fn):
    for needle in reversed(NEEDLES):
        # Use the chosen bisect function to get the insertion position
        position = bisect_fn(HAYSTACK, needle)
        # Build a pattern of vertical bars proportional to the offset
        offset = position * '  |' 
        # Print formatted row showing needle and insertion point.
        print(ROW_FMT.format(needle, position, offset))

if __name__ == '__main__':
    # Choose the bisect function to use according to the last command-line argument.
    if sys.argv[-1] == 'left':
        bisect_fn = bisect.bisect_left
    else:
        bisect_fn = bisect.bisect

    # Print header with name of function selected.
    print('DEMO:', bisect_fn.__name__)
    print('haystack ->', ' '.join('%2d' % n for n in HAYSTACK))
    demo(bisect_fn)

The behavior of bisect can be fine-tuned in two ways.

* First, a pair of optional arguments, lo and hi, allow narrowing the region in the sequence to be searched when inserting. lo defaults to 0 and hi to the len() of the sequence.

* Second, bisect is actually an alias for bisect_right, and there is a sister function called bisect_left. Their difference is apparent only when the needle compares equal to an item in the list: bisect_right returns an insertion point after the existing item, and bisect_left returns the position of the existing item, so insertion would occur before it. With simple types like int this makes no difference, but if the sequence contains objects that are distinct yet compare equal, then it may be relevant. For example, 1 and 1.0 are distinct, but 1 == 1.0 is True.
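
Both fine-tuning options can be tried on small lists. This is a sketch with invented data; the insort_left and insort_right calls are used only to make the 1 versus 1.0 placement visible:

```python
import bisect

haystack = [1, 4, 5, 5, 8]
right = bisect.bisect_right(haystack, 5)    # after the existing 5s
left = bisect.bisect_left(haystack, 5)      # at the first existing 5
narrowed = bisect.bisect_left(haystack, 5, lo=3)   # search only haystack[3:]

# 1 and 1.0 compare equal, so the chosen function decides their order.
placed_left = [0, 1, 2]
bisect.insort_left(placed_left, 1.0)
placed_right = [0, 1, 2]
bisect.insort_right(placed_right, 1.0)

left_types = [type(x).__name__ for x in placed_left]
right_types = [type(x).__name__ for x in placed_right]
print(right, left, narrowed, left_types, right_types)
```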

An interesting application of bisect is to perform table lookups by numeric values—for example, to convert test scores to letter grades, as in Example 2-18. 

In [None]:
# Example 2-18. Given a test score, grade returns the corresponding letter grade
def grade(score, breakpoints = [60, 70, 80, 90], grades='FDCBA'):
    i = bisect.bisect(breakpoints, score)
    return grades[i]

[grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]

The code in Example 2-18 is from the bisect module documentation, which also lists functions to use bisect as a faster replacement for the index method when searching through long ordered sequences of numbers.
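
As an illustration of that idea, here is a lookup function adapted from the recipe in the bisect documentation. The name `sorted_index` is my own; it assumes the list is already sorted in ascending order:

```python
from bisect import bisect_left

def sorted_index(a, x):
    """Return the index of the leftmost item equal to x in sorted list a."""
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError(f'{x!r} is not in the list')

haystack = [1, 4, 5, 5, 8]
found = sorted_index(haystack, 5)
try:
    sorted_index(haystack, 6)
    missing_raises = False
except ValueError:
    missing_raises = True
print(found, missing_raises)
```

Unlike list.index, which scans from the start, this does a binary search, so it only pays off when the sequence is known to be sorted.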

These functions are not only used for searching, but also for inserting items in sorted sequences, as the following section shows.

## Inserting with bisect.insort

Sorting is expensive, so once you have a sorted sequence, it’s good to keep it that way. That is why bisect.insort was created.
insort(seq, item) inserts item into seq so as to keep seq in ascending order. See Example 2-19.

In [None]:
# Example 2-19. Insort keeps a sorted sequence always sorted
import bisect
import random

SIZE = 7

random.seed(1729)

my_list = []
for i in range(SIZE):
    new_item = random.randrange(SIZE * 2)
    bisect.insort(my_list, new_item)
    print('%2d ->' % new_item, my_list)

Like bisect, insort takes optional lo, hi arguments to limit the search to a sub-sequence. There is also an insort_left variation that uses bisect_left to find insertion points.

Much of what we have seen so far in this chapter applies to sequences in general, not just lists or tuples. Python programmers sometimes overuse the list type because it is so handy (I know I’ve done it). If you are handling lists of numbers, arrays are the way to go. The remainder of the chapter is devoted to them.

# When a List Is Not the Answer

The list type is flexible and easy to use, but depending on specific requirements, there are better options. For example, if you need to store 10 million floating-point values, an array is much more efficient, because an array does not actually hold full-fledged float objects, but only the packed bytes representing their machine values—just like an array in the C language. On the other hand, if you are constantly adding and removing items from the ends of a list as a FIFO data structure, a deque (double-ended queue) works faster.
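
For the queue case, a minimal sketch of collections.deque used from both ends (the job names are invented):

```python
from collections import deque

queue = deque()
for job in ('a', 'b', 'c'):
    queue.append(job)          # enqueue at the right end

first_out = queue.popleft()    # dequeue from the left end: FIFO order
queue.appendleft('urgent')     # items can also be added at the left
last_out = queue.pop()         # and removed from the right
remaining = list(queue)
print(first_out, last_out, remaining)
```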

For the remainder of this chapter, we discuss mutable sequence types that can replace lists in many cases, starting with arrays.

## Arrays

If the list will only contain numbers, an array.array is more efficient than a list: it supports all mutable sequence operations (including .pop, .insert, and .extend), and additional methods for fast loading and saving such as .frombytes and .tofile.

A Python array is as lean as a C array. When creating an array, you provide a typecode, a letter to determine the underlying C type used to store each item in the array. For example, b is the typecode for signed char. If you create an array('b'), then each item will be stored in a single byte and interpreted as an integer from –128 to 127. For large sequences of numbers, this saves a lot of memory. And Python will not let you put any number that does not match the type for the array.
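
The typecode restrictions are easy to try out. This sketch uses an array of typecode 'b' and records which appends are refused; the variable names are illustrative:

```python
from array import array

small = array('b', [-128, 0, 127])   # signed char: one byte per item
item_bytes = small.itemsize

try:
    small.append(128)       # out of range for a signed char
    overflow = False
except OverflowError:
    overflow = True

try:
    small.append(1.5)       # not an integer at all
    wrong_type = False
except TypeError:
    wrong_type = True
print(item_bytes, overflow, wrong_type, small)
```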

Example 2-20 shows creating, saving, and loading an array of 10 million floating-point random numbers.

In [None]:
from array import array # Import the array type.
from random import random
# Create an array of double-precision floats (typecode 'd') 
# from any iterable object—in this case, a generator expression.
floats = array('d', (random() for i in range(10**7)))
# Inspect the last number in the array.
floats[-1]

In [None]:
fp = open('floats.bin', 'wb')
# Save the array to a binary file
floats.tofile(fp)
fp.close()
# Create an empty array of doubles
floats2 = array('d')
fp = open('floats.bin', 'rb')
# Read 10 million numbers from the binary file
floats2.fromfile(fp, 10**7)
fp.close()
# Inspect the last number in the array
floats2[-1]
# Verify that the contents of the arrays match
floats2 == floats

As you can see, array.tofile and array.fromfile are easy to use. If you try the example, you’ll notice they are also very fast. A quick experiment shows that it takes about 0.1 s for array.fromfile to load 10 million double-precision floats from a binary file created with array.tofile. That is nearly 60 times faster than reading the numbers from a text file, which also involves parsing each line with the float built-in. Saving with array.tofile is about 7 times faster than writing one float per line in a text file. In addition, the size of the binary file with 10 million doubles is 80,000,000 bytes (8 bytes per double, zero overhead), while the text file has 181,515,739 bytes, for the same data.
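
The size comparison can be reproduced on a small scale. This is only a sketch under stated assumptions: it uses 1,000 items instead of 10 million, writes to a temporary directory, and picks repr as the text format, so the sizes will not match the figures quoted above. It does not attempt to reproduce the timings.

```python
import os
import tempfile
from array import array
from random import random

n = 1000  # far fewer items than the 10 million used in the text
floats = array('d', (random() for _ in range(n)))

with tempfile.TemporaryDirectory() as tmp:
    bin_path = os.path.join(tmp, 'floats.bin')
    txt_path = os.path.join(tmp, 'floats.txt')

    # Binary file: packed machine values, no per-item overhead.
    with open(bin_path, 'wb') as fp:
        floats.tofile(fp)

    # Text file: one float per line (format chosen for this sketch).
    with open(txt_path, 'w') as fp:
        for x in floats:
            fp.write(f'{x!r}\n')

    bin_size = os.path.getsize(bin_path)
    txt_size = os.path.getsize(txt_path)

    # Reading the text file back means parsing every line with float().
    with open(txt_path) as fp:
        from_text = array('d', (float(line) for line in fp))
```

Here bin_size is exactly n times floats.itemsize, while txt_size depends on how many digits each number needs.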

For the specific case of numeric arrays representing binary data, such as raster images, Python has the bytes and bytearray types discussed in Chapter 4.

As of Python 3.4, the array type does not have an in-place sort method like list.sort(). If you need to sort an array, use the sorted function to rebuild it sorted:
a = array.array(a.typecode, sorted(a))
To keep a sorted array sorted while adding items to it, use the bisect.insort function (as seen in “Inserting with bisect.insort”).
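
A short sketch of both techniques, using made-up sample values:

```python
import bisect
from array import array

a = array('i', [30, 10, 20])

# array has no in-place sort, so rebuild it from sorted().
a = array(a.typecode, sorted(a))

# insort finds the insertion point and calls a.insert() there,
# so the array stays sorted.
bisect.insort(a, 15)
```

Because bisect.insort only relies on indexing and .insert, it works with an array just as it does with a list.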

If you do a lot of work with arrays and don’t know about memoryview, you’re missing out. See the next topic.

## Memory Views

The built-in memoryview class is a **shared-memory** sequence type that lets you handle slices of arrays without copying bytes. It was inspired by the NumPy library (which we’ll discuss shortly in “NumPy and SciPy”). Travis Oliphant, lead author of NumPy, answers “When should a memoryview be used?” like this:
*A memoryview is essentially a generalized NumPy array structure in Python itself (without the math). It allows you to share memory between data-structures (things like PIL images, SQLlite databases, NumPy arrays, etc.) without first copying. This is very important for large data sets.*

Using notation similar to the array module, the memoryview.cast method lets you change the way multiple bytes are read or written as units without moving bits around (just like the C cast operator). memoryview.cast returns yet another memoryview object, always sharing the same memory.
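
As a rough illustration of sharing memory without copying (the sample data and variable names are assumptions made for this sketch), slicing a memoryview and casting it both produce views over the same buffer:

```python
from array import array

data = array('d', [0.0, 1.0, 2.0, 3.0])
view = memoryview(data)

# Slicing a memoryview yields another view on the same buffer, not a copy.
tail = view[2:]
tail[0] = 20.0  # writes through to data[2]

# cast() reinterprets the same memory as unsigned bytes.
as_bytes = view.cast('B')
byte_count = len(as_bytes)  # number of items times bytes per item
```

The write through tail is visible in data because no bytes were copied at any point.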

See Example 2-21 for an example of changing a single byte of an array of 16-bit integers.

In [None]:
# Example 2-21. Changing the value of an array item by poking one of its bytes
import array
numbers = array.array('h', [-2, -1, 0, 1, 2])
# Build memoryview from array of 5 short signed integers (typecode 'h').
memv = memoryview(numbers)
len(memv)

In [None]:
# memv sees the same 5 items in the array.
memv[0]

In [None]:
# Create memv_oct by casting the elements of memv to typecode 'B' (unsigned char).
memv_oct = memv.cast('B')
# Export elements of memv_oct as a list, for inspection.
memv_oct.tolist()

In [None]:
# Assign value 4 to byte offset 5.
memv_oct[5] = 4
# Note change to numbers: on a little-endian machine, a 4 in the most significant byte of a 2-byte integer is 1024.
numbers
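
The value that appears in numbers depends on the byte order of the machine. The sketch below repeats the experiment and computes the expected result from sys.byteorder; it assumes, as the example does, that typecode 'h' occupies 2 bytes.

```python
import sys
from array import array

numbers = array('h', [-2, -1, 0, 1, 2])
octets = memoryview(numbers).cast('B')
octets[5] = 4  # second byte of the item at index 2

# Byte offset 5 is the high byte of numbers[2] on little-endian machines
# and the low byte on big-endian machines.
expected = 4 * 256 if sys.byteorder == 'little' else 4
```

On a little-endian machine numbers[2] becomes 1024; on a big-endian one it becomes 4.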

We’ll see another short example with memoryview in the context of binary sequence manipulations with struct (Chapter 4, Example 4-4).
Meanwhile, if you are doing advanced numeric processing in arrays, you should be using the NumPy and SciPy libraries. We’ll take a brief look at them right away.

## NumPy and SciPy

Throughout this book, I make a point of highlighting what is already in the Python standard library so you can make the most of it. But NumPy and SciPy are so awesome that a detour is warranted.

For advanced array and matrix operations, NumPy and SciPy are the reason why Python became mainstream in scientific computing applications. NumPy implements multi-dimensional, homogeneous arrays and matrix types that hold not only numbers but also user-defined records, and provides efficient elementwise operations.

SciPy is a library, written on top of NumPy, offering many scientific computing algorithms from linear algebra, numerical calculus, and statistics. SciPy is fast and reliable because it leverages the widely used C and Fortran code base from the Netlib Repository. In other words, SciPy gives scientists the best of both worlds: an interactive prompt and high-level Python APIs, together with industrial-strength number-crunching functions optimized in C and Fortran.

As a very brief demo, Example 2-22 shows some basic operations with two-dimensional arrays in NumPy.

In [None]:
# Example 2-22. Basic operations with rows and columns in a numpy.ndarray
import numpy
# Build and inspect a numpy.ndarray with integers 0 to 11.
a = numpy.arange(12)
a

In [None]:
type(a)

In [None]:
# Inspect the dimensions of the array: this is a one-dimensional, 12-element array.
a.shape

In [None]:
# Change the shape of the array, adding one dimension, then inspect the result.
a.shape = 3, 4
a

In [None]:
# Get row at index 2
a[2]

In [None]:
# Get element at index 2, 1
a[2, 1]

In [None]:
# Get column at index 1
a[:, 1]

In [None]:
# Create a new array by transposing (swapping columns with rows).
a.transpose()

Having looked at flat sequences—standard arrays and NumPy arrays—we now turn to a completely different set of replacements for the plain old list: queues.

## Deques and Other Queues

The .append and .pop methods make a list usable as a stack or a queue (if you use .append and .pop(0), you get FIFO behavior). But inserting and removing from the left of a list (the 0-index end) is costly because the entire list must be shifted.
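
A small sketch of both usages with a plain list (the item values are arbitrary):

```python
# Stack (LIFO): append and pop both work on the right end.
stack = []
for item in ('a', 'b', 'c'):
    stack.append(item)
last_in = stack.pop()  # the most recently added item

# Queue (FIFO): append on the right, pop(0) from the left.
# pop(0) must shift every remaining item one position.
fifo = []
for item in ('a', 'b', 'c'):
    fifo.append(item)
first_in = fifo.pop(0)  # the oldest item
```

Both work, but only the stack usage avoids the cost of shifting the remaining items.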

The class collections.deque is a thread-safe double-ended queue designed for fast inserting and removing from both ends. It is also the way to go if you need to keep a list of “last seen items” or something like that, because a deque can be bounded—i.e., created with a maximum length—and then, when it is full, it discards items from the opposite end when you append new ones. Example 2-23 shows some typical operations performed on a deque.

In [None]:
from collections import deque
# The optional maxlen argument sets the maximum number of items allowed in this instance of deque; this sets a read-only maxlen instance attribute.
dq = deque(range(10), maxlen=10)
dq

In [None]:
# Rotating with n > 0 takes items from the right end and prepends them to the left; when n < 0 items are taken from left and appended to the right.
dq.rotate(3)
dq

In [None]:
dq.rotate(-4)
dq

In [None]:
# Appending to a deque that is full (len(dq) == dq.maxlen) discards items from the other end;
# note in the next line that the 0 is dropped
dq.appendleft(-1)
dq

In [None]:
# Adding three items to the right pushes out the leftmost -1, 1, and 2. 
dq.extend([11, 22, 33])
dq

In [None]:
# Note that extendleft(iter) works by appending each successive item of the iter argument to the left of the deque, therefore the final position of the items is reversed. 
dq.extendleft([10, 20, 30, 40])
dq

Note that deque implements most of the list methods, and adds a few specific to its design, like popleft and rotate. But there is a hidden cost: removing items from the middle of a deque is not as fast. It is really optimized for appending and popping from the ends.
The append and popleft operations are atomic, so deque is safe to use as a FIFO queue in multithreaded applications without the need for using locks.
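
A single-threaded sketch of the two typical uses mentioned in this section, a FIFO queue and a bounded list of “last seen items” (the job and page names are made up for illustration):

```python
from collections import deque

# FIFO: append on the right, popleft from the left.
pending = deque()
for job in ('parse', 'compile', 'link'):
    pending.append(job)
processed = [pending.popleft() for _ in range(len(pending))]

# Bounded history: with maxlen=3, appending a fourth item
# silently discards the oldest one from the opposite end.
recent = deque(maxlen=3)
for page in ('home', 'docs', 'blog', 'about'):
    recent.append(page)
```

Items come out of pending in the order they went in, and recent keeps only the three newest pages.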

Besides deque, other Python standard library packages implement queues:
* queue: This provides the synchronized (i.e., thread-safe) classes Queue, LifoQueue, and PriorityQueue. These are used for safe communication between threads. All three classes can be bounded by providing a maxsize argument greater than 0 to the constructor. However, they don’t discard items to make room as deque does. Instead, when the queue is full the insertion of a new item blocks—i.e., it waits until some other thread makes room by taking an item from the queue, which is useful to throttle the number of live threads.
* multiprocessing: Implements its own bounded Queue, very similar to queue.Queue but designed for interprocess communication. A specialized multiprocessing.JoinableQueue is also available for easier task management.
* asyncio: Newly added in Python 3.4, asyncio provides Queue, LifoQueue, and PriorityQueue (Python 3.4 also had a JoinableQueue, whose functionality was later merged into Queue) with APIs inspired by the classes contained in the queue and multiprocessing modules, but adapted for managing tasks in asynchronous programming.
* heapq: In contrast to the previous three modules, heapq does not implement a queue class, but provides functions like heappush and heappop that let you use a mutable sequence as a heap queue or priority queue.
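
Two of these behaviors can be sketched briefly: a bounded queue.Queue refuses new items when full instead of discarding old ones, and heapq turns a plain list into a priority queue. To avoid blocking in a single-threaded example, the sketch uses put_nowait, which raises queue.Full where put would wait; the task names are illustrative.

```python
import heapq
import queue

# A bounded Queue never discards items to make room.
bounded = queue.Queue(maxsize=2)
bounded.put('a')
bounded.put('b')
try:
    bounded.put_nowait('c')  # put('c') would block here instead
    was_full = False
except queue.Full:
    was_full = True

# heapq functions operate on an ordinary list.
heap = []
for priority, task in [(3, 'low'), (1, 'urgent'), (2, 'normal')]:
    heapq.heappush(heap, (priority, task))
ordered = [heapq.heappop(heap)[1] for _ in range(3)]
```

The heap always pops the smallest tuple first, so tasks come out in priority order.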

This ends our overview of alternatives to the list type, and also our exploration of sequence types in general—except for the particulars of str and binary sequences, which have their own chapter (Chapter 4). 

# Chapter Summary

Mastering the standard library sequence types is a prerequisite for writing concise, effective, and idiomatic Python code.

Python sequences are often categorized as mutable or immutable, but it is also useful to consider a different axis: flat sequences and container sequences. The former are more compact, faster, and easier to use, but are limited to storing atomic data such as numbers, characters, and bytes. Container sequences are more flexible, but may surprise you when they hold mutable objects, so you need to be careful to use them correctly with nested data structures.

List comprehensions and generator expressions are powerful notations to build and initialize sequences. If you are not yet comfortable with them, take the time to master their basic usage. It is not hard, and soon you will be hooked.

Tuples in Python play two roles: as records with unnamed fields and as immutable lists. When a tuple is used as a record, tuple unpacking is the safest, most readable way of getting at the fields. The new * syntax makes tuple unpacking even better by making it easier to ignore some fields and to deal with optional fields. Named tuples are not so new, but deserve more attention: like tuples, they have very little overhead per instance, yet provide convenient access to the fields by name and a handy ._asdict() to export the record as a dict (an OrderedDict before Python 3.8).
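
A compact recap of these points, using a hypothetical City record:

```python
from collections import namedtuple

City = namedtuple('City', 'name country population')
tokyo = City('Tokyo', 'JP', 36.933)

# Unpacking a record into variables.
name, country, population = tokyo

# The * syntax grabs whatever fields are left over.
first, *rest = tokyo

# A throwaway name prefixed with * ignores the fields in the middle.
head, *_, tail = range(5)

# Fields are accessible by name, and _asdict() exports the record.
by_name = tokyo.country
as_mapping = dict(tokyo._asdict())
```

Wrapping the result of _asdict() in dict() gives the same plain dictionary regardless of the Python version.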

Sequence slicing is a favorite Python syntax feature, and it is even more powerful than many realize. Multidimensional slicing and ellipsis (...) notation, as used in NumPy, may also be supported by user-defined sequences. Assigning to slices is a very expressive way of editing mutable sequences.

Repeated concatenation as in seq * n is convenient and, with care, can be used to initialize lists of lists containing immutable items. Augmented assignment with += and \*= behaves differently for mutable and immutable sequences. In the latter case, these operators necessarily build new sequences. But if the target sequence is mutable, it is usually changed in place—but not always, depending on how the sequence is implemented.
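
The difference can be checked with identity tests. The sketch below also contrasts a correctly initialized list of lists with one built by repeating a single inner list; the variable names are illustrative.

```python
# += on a list changes the existing object in place.
mutable = [1, 2, 3]
mutable_alias = mutable
mutable += [4]
list_changed_in_place = mutable is mutable_alias

# += on a tuple builds a new tuple and rebinds the name.
immutable = (1, 2, 3)
immutable_alias = immutable
immutable += (4,)
tuple_was_rebuilt = immutable is not immutable_alias

# Three independent rows: the inner * repeats immutable strings.
board = [['_'] * 3 for _ in range(3)]
board[0][0] = 'X'

# Three references to the same row: a change shows up in all of them.
shared = [['_'] * 3] * 3
shared[0][0] = 'X'
```

After these statements, mutable_alias sees the appended item, immutable_alias still refers to the original three-item tuple, and only shared shows the 'X' in every row.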

The sort method and the sorted built-in function are easy to use and flexible, thanks to the key optional argument they accept, with a function to calculate the ordering criterion. By the way, key can also be used with the min and max built-in functions. To keep a sorted sequence in order, always insert items into it using bisect.insort; to search it efficiently, use bisect.bisect.
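
A brief sketch of the key argument and the bisect functions, with arbitrary sample data:

```python
import bisect

words = ['fig', 'banana', 'kiwi']

# key works the same way for sorted, min, and max.
by_length = sorted(words, key=len)
longest = max(words, key=len)
shortest = min(words, key=len)

# insort keeps an already sorted list in order.
scores = [10, 20, 40]
bisect.insort(scores, 30)

# bisect returns the insertion point to the right of any equal items.
position = bisect.bisect(scores, 30)
```

Note that bisect.bisect assumes the sequence is already sorted; it does not check.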

Beyond lists and tuples, the Python standard library provides array.array. Although NumPy and SciPy are not part of the standard library, if you do any kind of numerical processing on large sets of data, studying even a small part of these libraries can take you a long way.

We closed by visiting the versatile and thread-safe collections.deque, comparing its API with that of list in Table 2-3 (see book) and mentioning other queue implementations in the standard library. 