# An array of sequences

ABC introduced many ideas we now consider “Pythonic”:
-  generic operations on sequences
-  built-in tuple and mapping types
-  structure by indentation
-  strong typing without variable declarations

Objects whose value can change are said to be mutable (M); objects whose value cannot change once they are created are called immutable (IM). Both kinds of sequence support read-only operations such as __getitem__, __contains__, index and count. Mutable sequences also have methods that change them in place, such as pop, append, extend, remove, __setitem__ and __delitem__.
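
As a quick check of these categories, here is a minimal sketch that uses the abstract base classes Sequence and MutableSequence from collections.abc. The text above does not mention them; they are used here only as a convenient way to test which operations a type supports:

```python
from collections.abc import MutableSequence, Sequence

# list and tuple are both registered as sequences...
print(isinstance([1, 2], Sequence), isinstance((1, 2), Sequence))  # True True
# ...but only list is a mutable sequence
print(isinstance([1, 2], MutableSequence))  # True
print(isinstance((1, 2), MutableSequence))  # False

# Read-only operations work on both kinds
t = (10, 20, 30, 20)
print(t.index(20), t.count(20), 30 in t)  # 1 2 True

# Item assignment needs __setitem__, which tuple does not have
try:
    t[0] = 99
    assigned = True
except TypeError:
    assigned = False
print(assigned)  # False
```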

The standard library offers a rich selection of sequence types implemented in C. 

-  Container sequences hold references to the objects they contain, which may be of any type
    -  list (M)
    -  tuple (IM)
    -  collections.deque (M)
-  Flat sequences physically store the value of each item within its own memory space, and not as distinct objects. Flat sequences are more compact, but they are limited to holding primitive values like characters, bytes and numbers.
    -  str (IM)
    -  bytes (IM)
    -  bytearray (M)
    -  memoryview (M)
    -  array.array (M)
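
The difference between the two families can be seen in a short sketch: a list happily mixes types, while an array.array with typecode 'd' (C double, chosen here only as an example) accepts numbers only:

```python
from array import array

# Container sequence: a list stores references, so items can be of any type
container = [1, 'two', [3.0]]
print(container)

# Flat sequence: array.array stores raw C doubles (typecode 'd') back to back
flat = array('d', [1.0, 2.5, 3.75])
print(flat.itemsize)  # bytes per item; typically 8 for a C double

# A flat sequence cannot hold arbitrary objects
try:
    flat.append('four')
    accepted = True
except TypeError:
    accepted = False
print(accepted)  # False
```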
    
## List comprehensions and generator expressions

A list comprehension (listcomp) is a concise way to build a list in Python, borrowing the set-builder notation used by mathematicians.

### For loop vs listcomps

A for loop may be used to do lots of different things: scanning a sequence to count or pick items, computing aggregates (sums, averages), or any number of other tasks. The for loop in the cell below builds up a list. In contrast, a listcomp is meant to do one thing only: to build a new list.

In [26]:
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()
word_lengths = []

#using a for loop
for word in words:
    if word != "the":
        word_lengths.append(len(word))
#note: there is no leading 3 because the word "the" is filtered out
print(word_lengths)

#using a list comprehension
word_lengths_comp = [len(word) for word in words if word != "the"]
print(word_lengths_comp)

[5, 5, 3, 5, 4, 4, 3]
[5, 5, 3, 5, 4, 4, 3]


### Listcomps versus map and filter

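A listcomp does everything map and filter do, usually more readably, because it avoids wrapping the logic in a lambda; combining map, filter and lambda is generally considered less Pythonic. map() or filter() can still be reasonable when the expression is too long or complicated for a comprehension, or when you already have a function defined and simply want to apply it.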

In [27]:
symbols = '$¢£¥€¤'
beyond_ascii_listcomp = [ord(s) for s in symbols if ord(s) > 127]
print(beyond_ascii_listcomp)
beyond_ascii_map = list(filter(lambda c: c > 127, map(ord, symbols)))
print(beyond_ascii_map)

[162, 163, 165, 8364, 164]
[162, 163, 165, 8364, 164]


### Cartesian products

Listcomps can generate lists from the Cartesian product of two or more iterables. The items that make up the Cartesian product are tuples made from items from every input iterable. The resulting list has a length equal to the lengths of the input iterables multiplied.

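colors = ['black', 'white']
sizes = ['S', 'M', 'L']
#arranged by color, then by size
tshirts = [(color, size) for color in colors for size in sizes]
print(tshirts)
#the equivalent nested for loops produce the same order
for color in colors:
    for size in sizes:
        print((color, size))
#swapping the for clauses arranges the tuples by size, then by color
tshirts = [(color, size) for size in sizes for color in colors]
print(tshirts)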
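
The order of the for clauses matches the order of the nested loops. For comparison, the standard library's itertools.product (not used in the original notes) builds the same Cartesian product lazily; a minimal sketch:

```python
import itertools

colors = ['black', 'white']
sizes = ['S', 'M', 'L']

by_color = [(color, size) for color in colors for size in sizes]
# product() varies the last iterable fastest, like the innermost for clause
by_product = list(itertools.product(colors, sizes))
print(by_product == by_color)  # True
print(len(by_color))           # 6, i.e. len(colors) * len(sizes)
```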

### Generator expressions

To initialize tuples, arrays and other types of sequences, you could also start from a listcomp, but a generator expression (genexp) saves memory because it yields items one by one using the iterator protocol instead of building a whole list just to feed another constructor. If the two lists used in the Cartesian product had a thousand items each, using a generator expression would save the expense of building a list with a million items just to feed the for loop.

In [28]:
symbols = '$¢£¥€¤'
print(tuple(ord(symbol) for symbol in symbols))
import array
array.array('I', (ord(symbol) for symbol in symbols))

(36, 162, 163, 165, 8364, 164)


array('I', [36, 162, 163, 165, 8364, 164])
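
The point about feeding a for loop from a Cartesian product can be sketched as follows; next() is used only to show that the genexp produces items on demand:

```python
colors = ['black', 'white']
sizes = ['S', 'M', 'L']

# The genexp produces one label at a time; no 6-item list is ever built
labels = (f'{c} {s}' for c in colors for s in sizes)
print(next(labels))  # black S  (only one item produced so far)

printed = []
for label in labels:  # the loop resumes where next() stopped
    printed.append(label)
    print(label)
```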

## Tuples are not just immutable lists

Tuples do double-duty: they can be used as immutable lists and also as records with no field names.

### Tuples as records

In [29]:
city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014)
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP', 'XDA205856')]
for passport in sorted(traveler_ids):
    print('%s/%s' % passport)
for country, _ in traveler_ids:
    print(country)
lax_coordinates = (33.9425, -118.408056)
# tuple unpacking
latitude, longitude = lax_coordinates 
print(latitude)
print(longitude)
import os
#os.path.split() function builds a tuple (path, last_part) from a filesystem path
_, filename = os.path.split('/home/luciano/.ssh/idrsa.pub')
print(filename)

BRA/CE342567
ESP/XDA205856
USA/31195855
USA
BRA
ESP
33.9425
-118.408056
idrsa.pub
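
Tuple unpacking also makes two other idioms possible: swapping variables without a temporary, and using * to pass a tuple's items as separate arguments. A minimal sketch:

```python
a, b = 1, 2
b, a = a, b           # swap values without a temporary variable
print(a, b)           # 2 1

print(divmod(20, 8))  # (2, 4)
t = (20, 8)
quotient, remainder = divmod(*t)  # * unpacks the tuple into two arguments
print(quotient, remainder)        # 2 4
```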


### Using * to grab excess items


In [30]:
a, b, *rest = range(5)
print(a, b, rest)
a, b, *rest = range(3)
print(a, b, rest)
a, b, *rest = range(2)
print(a, b, rest)

0 1 [2, 3, 4]
0 1 [2]
0 1 []
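
In parallel assignment the * prefix can be applied to exactly one variable, but that variable can appear in any position:

```python
a, *body, c, d = range(5)
print(a, body, c, d)  # 0 [1, 2] 3 4

*head, b, c, d = range(5)
print(head, b, c, d)  # [0, 1] 2 3 4
```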


### Nested tuple unpacking

The target that receives an expression to unpack can contain nested tuples, like (a, b, (c, d)), and Python will do the right thing if the expression matches the nesting structure.

In [31]:
metro_areas = [('Tokyo','JP',36.933,(35.689722,139.691667)), 
               ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)), 
               ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)), 
               ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)), 
               ('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)),]
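#print the header: a 15-character name column and two centred 9-character columns
print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))
#format each coordinate with 4 decimal places in a 9-character field
fmt = '{:15} | {:9.4f} | {:9.4f}'
#unpack the nested (latitude, longitude) tuple directly in the for loop
for name, cc, pop, (latitude, longitude) in metro_areas:
    #keep only cities in the Western Hemisphere
    if longitude <= 0:
        print(fmt.format(name, latitude, longitude))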

                |   lat.    |   long.  
Mexico City     |   19.4333 |  -99.1333
New York-Newark |   40.8086 |  -74.0204
Sao Paulo       |  -23.5478 |  -46.6358
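
When the value does not match the nesting structure of the target, Python raises ValueError instead of guessing. A minimal sketch:

```python
# Each target must match the shape of the value being unpacked
name, cc, pop, (lat, lon) = ('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
print(name, lat, lon)  # Tokyo 35.689722 139.691667

try:
    # the nested target (y, z) expects 2 items, but the nested tuple has 3
    x, (y, z) = 1, (2, 3, 4)
    matched = True
except ValueError:
    matched = False
print(matched)  # False
```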


### Named tuples

In [35]:
from collections import namedtuple
City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
print(tokyo)
print(tokyo.coordinates)
print(tokyo[1])

#A named tuple type has a few attributes in addition to those inherited from tuple. 
#the _fields class attribute, 
#the class method _make(iterable)
#the _asdict() instance method.

print(City._fields) #_fields: tuple with the field names
LatLong = namedtuple('LatLong', 'lat long')
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
delhi = City._make(delhi_data) #_make() builds an instance from an iterable, like City(*delhi_data)
#_asdict() maps each field name to its value
for key, value in delhi._asdict().items():
    print(key + ':', value)

City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
(35.689722, 139.691667)
JP
('name', 'country', 'population', 'coordinates')
name: Delhi NCR
country: IN
population: 21.935
coordinates: LatLong(lat=28.613889, long=77.208889)


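Tuples can also be used as immutable lists: they support the list operations that do not change the sequence, such as indexing, slicing, len, in, count, index, + and *.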
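
A short sketch of a tuple used as an immutable list: the read-only operations work, while the mutating methods simply do not exist:

```python
t = (3, 1, 4, 1, 5)
print(len(t), t[1:3], t.count(1), t.index(4))    # 5 (1, 4) 2 2
print(t + (9,), 4 in t)                          # (3, 1, 4, 1, 5, 9) True
print(tuple(reversed(t)))                        # (5, 1, 4, 1, 3)
print(hasattr(t, 'append'), hasattr(t, 'sort'))  # False False
```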

## Slicing

In [39]:
l=[10,20,30,40,50,60]
print(l[:2])
print(l[2:])
print(l[:3])
print(l[3:])
s = 'bicycle'
#take every 3rd letter
print(s[::3])
#a step of -1 walks backwards from the end, reversing the word
print(s[::-1])
#take every other letter, from the end to the beginning
print(s[::-2])

[10, 20]
[30, 40, 50, 60]
[10, 20, 30]
[40, 50, 60]
bye
elcycib
eccb
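
Two conventions explain these results: the stop index is excluded, so l[:x] and l[x:] split a list without overlap, and a slice can be stored as a slice object. slice.indices() (not used in the original notes) shows the concrete start, stop and step that a slice resolves to for a given length. A minimal sketch:

```python
l = [10, 20, 30, 40, 50, 60]
# Because the stop index is excluded, splitting at any position loses nothing
for x in range(len(l) + 1):
    assert l[:x] + l[x:] == l
# ...and the length of a slice is stop - start when both are within range
print(len(l[1:4]))  # 3

s = 'bicycle'
reverse_every_other = slice(None, None, -2)
print(reverse_every_other.indices(len(s)))  # (6, -1, -2)
print(s[reverse_every_other])               # eccb
```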


Slicing flat-file data:

In [42]:
invoice ="""
0.....6.................................40........52...55........
1909  Pimoroni PiBrella                     $17.50    3    $52.50
1489  6mm Tactile Switch x20                $4.95     2     $9.90
1510  Panavise Jr. - PV-201                 $28.00    1    $28.00
1601  PiTFT Mini Kit 320x240                $34.95    1    $34.95
"""
SKU = slice(0, 6)
DESCRIPTION = slice(6, 40)
UNIT_PRICE = slice(40, 52)
QUANTITY = slice(52, 55)
ITEM_TOTAL = slice(55, None)
line_items = invoice.split('\n')[2:]
for item in line_items:
    print(item[UNIT_PRICE], item[DESCRIPTION])

    $17.50   Pimoroni PiBrella                 
    $4.95    6mm Tactile Switch x20            
    $28.00   Panavise Jr. - PV-201             
    $34.95   PiTFT Mini Kit 320x240            
 


### Multi-dimensional slicing and ellipsis

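The [] operator can also take multiple indexes or slices separated by commas; NumPy arrays, for example, accept a[i, j] and a[m:n, k:l]. In that case the object's __getitem__ method receives the comma-separated items as a tuple. The built-in sequence types are one-dimensional, so they support only a single index or slice.

The ellipsis, written with three full stops ... and not Unicode U+2026, is recognized as a token by the Python parser. It is an alias for the Ellipsis object, the single instance of the ellipsis class. NumPy uses it as a shortcut when slicing multi-dimensional arrays: if x is a four-dimensional array, x[i, ...] is equivalent to x[i, :, :, :].

Slices can also be used to modify mutable sequences in place by assigning to them or deleting them, for example l[2:5] = [20, 30] and del l[5:7].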
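
To see what the [] operator actually passes to __getitem__, here is a sketch with a hypothetical helper class, ShowKey, whose __getitem__ just returns the key it receives:

```python
class ShowKey:
    """Hypothetical helper: __getitem__ returns whatever key it receives."""
    def __getitem__(self, key):
        return key

k = ShowKey()
print(k[1, 2])      # (1, 2)
print(k[1:4, ::2])  # (slice(1, 4, None), slice(None, None, 2))
print(k[..., 0])    # (Ellipsis, 0)

# A built-in list is one-dimensional, so a tuple key is rejected
row = [10, 20, 30]
key = (1, 2)
try:
    row[key]
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```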

In [43]:
l = list(range(10))
l[2:5] = [20, 30]
print(l)
del l[5:7]
print(l)
l[3::2] = [11, 22]
print(l)
#Using + and * with sequences
l=[1,2,3]
print(l*5)
print(5 * 'abcd')

[0, 1, 20, 30, 5, 6, 7, 8, 9]
[0, 1, 20, 30, 5, 8, 9]
[0, 1, 20, 11, 5, 22, 9]
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
abcdabcdabcdabcdabcd


### Building lists of lists
Sometimes we need to initialize a list with a certain number of nested lists, for example, to distribute students in a list of teams or to represent squares on a game board. The best way of doing so is with a list comprehension, like this:

In [49]:
#using listcomp
board = [['_'] * 3 for i in range(3)]
board[1][2] = 'X'
print(board)
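# Board that doesn't work: the outer list holds three references to the same inner list
weird_board = [['_'] * 3] * 3
weird_board[1][2] = 'O'
print(weird_board)
# Using a for loop
board = []
for i in range(3):
    row = ['_'] * 3  # a new list is built on each iteration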
    board.append(row)
board[2][0] = 'X'
print(board)

[['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]
[['_', '_', 'O'], ['_', '_', 'O'], ['_', '_', 'O']]
[['_', '_', '_'], ['_', '_', '_'], ['X', '_', '_']]
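
The weird board fails because multiplying the outer list copies references, not inner lists. Checking object identity makes this visible; a minimal sketch:

```python
weird_board = [['_'] * 3] * 3
# The outer list holds three references to one and the same inner list
print(all(row is weird_board[0] for row in weird_board))  # True

board = [['_'] * 3 for i in range(3)]
# The listcomp evaluates ['_'] * 3 anew on each iteration: three distinct lists
print(len({id(row) for row in board}))  # 3
```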


## Augmented assignment with sequences

The augmented assignment operators += and *= behave very differently depending on the first operand. The special method that makes += work is __iadd__ (for “in-place addition”); if __iadd__ is not implemented, Python falls back to calling __add__. The same applies to *=, which uses __imul__ and falls back to __mul__. Consider the expression a += b:

In [None]:
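a = [1, 2]
b = [3, 4]
a += b
#for a mutable sequence such as list, this works like:
#a.extend(b)  -> the original object is modified in place
t = (1, 2)
t += (3, 4)
#for an immutable sequence such as tuple, this works like:
#t = t + (3, 4)  -> a new object is created and bound to t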

In [51]:
#list i.e. mutable
l=[1,2,3]
print(l)
#original ID
print(id(l))
l*=2
# the original list is modified in place (list implements __imul__)
print(l)
print(id(l))
#using a tuple - i.e. immutable
t=(1,2,3)
#original ID
print(id(t))
#*= on a tuple falls back to __mul__, building a new tuple
t *= 2
#note the new ID
print(id(t))

[1, 2, 3]
4409327496
[1, 2, 3, 1, 2, 3]
4409327496
4408207184
4408247304
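
The fallback from __iadd__ to __add__ can be observed with two hypothetical classes, AddOnly and InPlace, written only for this sketch:

```python
class AddOnly:
    """Hypothetical class that implements only __add__."""
    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return AddOnly(self.items + other.items)


class InPlace(AddOnly):
    """Hypothetical subclass that also implements __iadd__."""
    def __iadd__(self, other):
        self.items.extend(other.items)
        return self  # the returned object is what gets bound to the name


x = AddOnly([1])
original_x = x
x += AddOnly([2])  # no __iadd__: Python evaluates x = x + AddOnly([2])
print(x is original_x, x.items)  # False [1, 2]

y = InPlace([1])
original_y = y
y += InPlace([2])  # __iadd__ mutates y and returns the same object
print(y is original_y, y.items)  # True [1, 2]
```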


## list.sort and the sorted built-in function

The list.sort method sorts a list in-place, that is, without making a copy. It returns None to remind us that it changes the target object, and does not create a new list. In contrast, the built-in function sorted creates a new list and returns it.

In [54]:
fruits = ['grape', 'raspberry', 'apple', 'banana']
#alphabetised new list
print(sorted(fruits))
#original
print(fruits)
#backwards new list
print(sorted(fruits, reverse=True))
#sorted by length new list
print(sorted(fruits, key=len))
#backwards by length new list
print(sorted(fruits, key=len, reverse=True))
#original again
print(fruits)
# sorting in place
fruits.sort()
#original is now sorted
print(fruits)

['apple', 'banana', 'grape', 'raspberry']
['grape', 'raspberry', 'apple', 'banana']
['raspberry', 'grape', 'banana', 'apple']
['grape', 'apple', 'banana', 'raspberry']
['raspberry', 'banana', 'grape', 'apple']
['grape', 'raspberry', 'apple', 'banana']
['apple', 'banana', 'grape', 'raspberry']
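
Because list.sort returns None, assigning its result is a common slip; sorted, by contrast, accepts any iterable and always returns a new list. A minimal sketch:

```python
fruits = ['grape', 'raspberry', 'apple', 'banana']
result = fruits.sort()
print(result)  # None: the list itself was sorted
print(fruits)  # ['apple', 'banana', 'grape', 'raspberry']

# sorted() accepts any iterable and always returns a new list
print(sorted('cab'))      # ['a', 'b', 'c']
print(sorted((3, 1, 2)))  # [1, 2, 3]
```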


### Searching with bisect
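
bisect(haystack, needle) performs a binary search for needle in haystack, which must be a sorted sequence, and returns the position where needle could be inserted while keeping haystack sorted. bisect.bisect is an alias for bisect_right: when needle equals existing items, it returns the position after them, whereas bisect_left returns the position before them. The demo below prints the insertion point for each needle; running it as a script with the command-line argument left switches it to bisect_left.
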
In [61]:
import bisect
import sys

HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]

ROW_FMT = '{0:2d} @ {1:2d}    {2}{0:<2d}'

def demo(bisect_fn):
    for needle in reversed(NEEDLES):
        #Use the chosen bisect function to get the insertion point.
        position = bisect_fn(HAYSTACK, needle)
        #Build a pattern of vertical bars proportional to the offset.
        offset = position * '  |' 
        #Print formatted row showing needle and insertion point.
        print(ROW_FMT.format(needle, position, offset))
        
if __name__ == '__main__':
    
    #Choose the bisect function to use according to the last command line argument.
    if sys.argv[-1] == 'left':
        bisect_fn = bisect.bisect_left 
    else:
        bisect_fn = bisect.bisect
    
    #Print header with name of function selected.
    print('DEMO:', bisect_fn.__name__)
    print('haystack ->', ' '.join('%2d' % n for n in HAYSTACK)) 
    demo(bisect_fn)

# Given a test score, grade returns the corresponding letter grade.
def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
    i = bisect.bisect(breakpoints, score)
    return grades[i]
[grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]

DEMO: bisect
haystack ->  1  4  5  6  8 12 15 20 21 23 23 26 29 30
31 @ 14      |  |  |  |  |  |  |  |  |  |  |  |  |  |31
30 @ 14      |  |  |  |  |  |  |  |  |  |  |  |  |  |30
29 @ 13      |  |  |  |  |  |  |  |  |  |  |  |  |29
23 @ 11      |  |  |  |  |  |  |  |  |  |  |23
22 @  9      |  |  |  |  |  |  |  |  |22
10 @  5      |  |  |  |  |10
 8 @  5      |  |  |  |  |8 
 5 @  3      |  |  |5 
 2 @  1      |2 
 1 @  1      |1 
 0 @  0    0 


['F', 'A', 'C', 'C', 'B', 'A', 'A']
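
The difference between bisect_left and bisect_right only shows up for needles equal to existing items. bisect_left also supports a fast membership test on a sorted list; the helper contains_sorted below is hypothetical, written for this sketch:

```python
import bisect

HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
print(bisect.bisect_left(HAYSTACK, 23))   # 9: before the existing 23s
print(bisect.bisect_right(HAYSTACK, 23))  # 11: after them


def contains_sorted(seq, x):
    """Hypothetical helper: binary-search membership test for a sorted seq."""
    i = bisect.bisect_left(seq, x)
    return i != len(seq) and seq[i] == x


print(contains_sorted(HAYSTACK, 21), contains_sorted(HAYSTACK, 22))  # True False
```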

In [65]:
import bisect
import random
SIZE=14

random.seed(1729)
my_list = []
for i in range(SIZE):
    #pick a random integer from range(SIZE * 2)
    new_item = random.randrange(SIZE*2)
    # insert into list, in correct order
    bisect.insort(my_list, new_item)
    print('%2d ->' % new_item, my_list)

20 -> [20]
 1 -> [1, 20]
13 -> [1, 13, 20]
16 -> [1, 13, 16, 20]
14 -> [1, 13, 14, 16, 20]
 5 -> [1, 5, 13, 14, 16, 20]
21 -> [1, 5, 13, 14, 16, 20, 21]
 1 -> [1, 1, 5, 13, 14, 16, 20, 21]
 5 -> [1, 1, 5, 5, 13, 14, 16, 20, 21]
27 -> [1, 1, 5, 5, 13, 14, 16, 20, 21, 27]
 9 -> [1, 1, 5, 5, 9, 13, 14, 16, 20, 21, 27]
16 -> [1, 1, 5, 5, 9, 13, 14, 16, 16, 20, 21, 27]
11 -> [1, 1, 5, 5, 9, 11, 13, 14, 16, 16, 20, 21, 27]
27 -> [1, 1, 5, 5, 9, 11, 13, 14, 16, 16, 20, 21, 27, 27]
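
insort combines the two steps that keep a list sorted: find the insertion point with bisect, then insert there. The helper manual_insort is hypothetical, spelling out those steps for comparison. The search is O(log n), but list.insert still has to shift the items after the insertion point:

```python
import bisect


def manual_insort(seq, item):
    """Hypothetical helper: the two steps of bisect.insort, spelled out."""
    seq.insert(bisect.bisect(seq, item), item)


a = [1, 5, 13, 20]
b = list(a)
for item in [16, 1, 27]:
    bisect.insort(a, item)
    manual_insort(b, item)
print(a)       # [1, 1, 5, 13, 16, 20, 27]
print(a == b)  # True
```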


## When a list is not the answer
The list type is flexible and easy to use, but depending on specific requirements there are better options. 

For example, if you need to store 10 million floating-point values, an **array** is much more efficient, because an array does not actually hold full-fledged float objects, but only the packed bytes representing their machine values, just like an array in the C language.

On the other hand, if you are constantly adding and removing items from the ends of a list, as in a FIFO or LIFO data structure, a **deque** (double-ended queue) works faster, especially at the left end.

### Arrays

If all you want to put in the list are numbers, an array.array is more efficient than a list: it supports all mutable sequence operations (including .pop, .insert and .extend), plus additional methods for fast loading and saving, such as .frombytes and .tofile.

Saving with array.tofile is about 7 times faster than writing one float per line in a text file. In addition, the size of the binary file with 10 million doubles is 80,000,000 bytes (8 bytes per double, zero overhead), while the text file has 181,515,739 bytes, for the same data.

In [66]:
from array import array
from random import random
# create an array of double-precision floats (typecode 'd') from any iterable
# object, in this case a generator expression
floats = array('d', (random() for i in range(10**7)))
#inspect the last number in the array
print(floats[-1])
#save the array to a binary file
with open('floats.bin', 'wb') as fp:
    floats.tofile(fp)
#create an empty array of doubles
floats2 = array('d')
#read 10 million numbers from the binary file
with open('floats.bin', 'rb') as fp:
    floats2.fromfile(fp, 10**7)
#inspect the last number in the array
print(floats2[-1])
#check that the contents match
print(floats2 == floats)

0.5243459839714878
0.5243459839714878
True
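
The "8 bytes per double, zero overhead" figure can be checked without touching the disk by using .tobytes and .frombytes, which work on the same packed representation as .tofile and .fromfile. A minimal sketch with a small array:

```python
from array import array

floats = array('d', [0.5, 1.25, 3.0])
raw = floats.tobytes()            # packed machine values, no per-object overhead
print(floats.itemsize, len(raw))  # 8 24 where a C double is 8 bytes

restored = array('d')
restored.frombytes(raw)
print(restored == floats)         # True
```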


## Memory views

The built-in memoryview class is a shared-memory sequence type that lets you handle slices of arrays without copying bytes. A memoryview is essentially a generalized NumPy array structure in Python itself (without the math). It allows you to share memory between data structures (things like PIL images, SQLite databases, NumPy arrays, etc.) without first copying. This is very important for large data sets.

In [76]:
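from array import array

# array of 5 signed short integers (typecode 'h', 2 bytes each)
numbers = array('h', [-2, -1, 0, 1, 2])
memv = memoryview(numbers)
print(len(memv))
print(memv[0])
# view the same memory as unsigned bytes (typecode 'B'); the byte order
# in the output below assumes a little-endian machine
memv_oct = memv.cast('B')
print(memv_oct.tolist())
print(memv_oct[5])
# set the high byte of the third element: 4 * 256 = 1024
memv_oct[5] = 4
print(numbers)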

5
-2
[254, 255, 255, 255, 0, 0, 1, 0, 2, 0]
0
array('h', [-2, -1, 1024, 1, 2])
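
Slicing a memoryview also shares memory: writing through the slice changes the original buffer, while bytes() makes an explicit copy. A minimal sketch using a bytearray as the underlying buffer:

```python
data = bytearray(b'abcdef')
view = memoryview(data)
middle = view[2:4]      # slicing a memoryview copies no bytes
middle[0] = ord('X')    # writing through the view changes data itself
print(data)             # bytearray(b'abXdef')
print(bytes(middle))    # b'Xd' (bytes() makes a copy)
```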


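NumPy is a third-party package (it must be installed separately) whose ndarray is a multi-dimensional, homogeneous array supporting the comma-separated indexing and slicing described earlier:

In [77]:
import numpy
a = numpy.arange(12)   # 1-D array with the integers 0 to 11
print(a)
print(type(a))
print(a.shape)
a.shape = 3, 4         # reshape in place to 3 rows x 4 columns
print(a)
print(a[2])            # row 2
print(a[2, 1])         # element at row 2, column 1
print(a[:, 1])         # column 1 from every row
print(a.transpose())   # swap rows and columns

[ 0  1  2  3  4  5  6  7  8  9 10 11]
<class 'numpy.ndarray'>
(12,)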
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[ 8  9 10 11]
9
[1 5 9]
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]


## Deques and other queues

The .append and .pop methods make a list usable as a stack or a queue: .append with .pop() gives LIFO (stack) behavior, and .append with .pop(0) gives FIFO (queue) behavior. But inserting and removing from the left of a list (the 0-index end) is costly because the entire list must be shifted.

The class collections.deque is a thread-safe double-ended queue designed for fast inserting and removing from both ends. It is also the way to go if you need to keep a list of “last seen items” or something similar, because a deque can be bounded: created with a maximum length, it discards items from the opposite end when it is full and you append new ones.

In [78]:
from collections import deque
dq = deque(range(10), maxlen=10)
print(dq)
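#rotate right by 3: the last 3 items move to the left end
dq.rotate(3)
print(dq)
#rotate left by 4: the first 4 items move to the right end
dq.rotate(-4)
print(dq)
#add -1 on the left; the deque is full, so 0 is discarded from the right
dq.appendleft(-1)
print(dq)
#add three values on the right, discarding three items from the left
dq.extend([11, 22, 33])
print(dq)
#extendleft adds items one by one on the left, so they end up in reverse order
dq.extendleft([10, 20, 30, 40])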
print(dq)

deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)
deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)
deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)
deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8], maxlen=10)
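
The stack and queue patterns described above, sketched side by side with a list and a deque; the last part shows a bounded deque keeping only the most recently seen items:

```python
from collections import deque

# list as a stack: append + pop() is LIFO
stack = []
for item in [1, 2, 3]:
    stack.append(item)
print(stack.pop())  # 3

# list as a queue: append + pop(0) is FIFO, but the remaining items shift left
queue_list = [1, 2, 3]
print(queue_list.pop(0))  # 1

# deque as a queue: popleft() removes from the left end in constant time
fifo = deque([1, 2, 3])
fifo.append(4)
print(fifo.popleft())  # 1

# bounded deque keeping only the 3 most recently seen items
last_seen = deque(maxlen=3)
for page in ['home', 'about', 'blog', 'contact']:
    last_seen.append(page)
print(list(last_seen))  # ['about', 'blog', 'contact']
```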


Besides deque, other Python Standard Library packages implement queues:

-  **queue**: Provides the synchronized (i.e. thread-safe) classes Queue, LifoQueue and PriorityQueue. These are used for safe communication between threads. All three classes can be bounded by providing a maxsize argument greater than 0 to the constructor. However, they don’t discard items to make room as deque does. Instead, when the queue is full, the insertion of a new item blocks: it waits until some other thread makes room by taking an item from the queue, which is useful to throttle the number of live threads.
-  **multiprocessing**: Implements its own bounded Queue, very similar to queue.Queue but designed for inter-process communication. There is also a specialized multiprocessing.JoinableQueue for easier task management.
-  **asyncio**: Added in Python 3.4, asyncio provides Queue, LifoQueue and PriorityQueue with APIs inspired by the classes in queue and multiprocessing, but adapted for managing tasks in asynchronous programming. Early releases also had a JoinableQueue, whose join and task_done methods were later merged into Queue.
-  **heapq**: In contrast to the previous three modules, heapq does not implement a queue class, but provides functions like heappush and heappop that let you use a mutable sequence as a heap queue or priority queue.
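
Two of these behaviors in a short sketch: heapq turning a plain list into a priority queue, and a bounded queue.Queue refusing a new item instead of discarding an old one. put_nowait is used here, rather than put, only so the example shows the full condition without blocking:

```python
import heapq
import queue

# heapq: a plain list used as a priority queue (smallest item first)
tasks = []
for priority, name in [(3, 'write'), (1, 'plan'), (2, 'code')]:
    heapq.heappush(tasks, (priority, name))
order = [heapq.heappop(tasks)[1] for _ in range(len(tasks))]
print(order)  # ['plan', 'code', 'write']

# queue.Queue: a bounded queue never discards items to make room
q = queue.Queue(maxsize=2)
q.put_nowait('a')
q.put_nowait('b')
try:
    q.put_nowait('c')  # put() would block here until another thread takes an item
    full_raised = False
except queue.Full:
    full_raised = True
print(full_raised, q.qsize())  # True 2
```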