# Chapter 2: An array of sequences

## Overview of built-in sequences

- Container vs. flat
  - Container sequences
    - list, tuple and collections.deque can hold items of different types
    - hold references to the objects they contain
  - Flat sequences
    - str, bytes, bytearray, memoryview and array.array hold items of one type
    - physically store the value of each item within the sequence's own memory space, not as distinct objects
    
- Mutable vs. Immutable
  - Mutable sequences
    - list, bytearray, array.array, collections.deque and memoryview
  - Immutable sequences
    - tuple, str and bytes
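A small sketch of both distinctions above. It assumes the standard `collections.abc` registrations (`list`, `bytearray` and `deque` as `MutableSequence`; `tuple`, `str` and `bytes` not). It also shows that a list stores a reference to a mutable item, while an `array.array` only accepts raw values of its declared type.

```python
import array
from collections import abc, deque

# Mutable vs. immutable: check against the abstract base classes
mutable = [list(), bytearray(), deque()]
immutable = [tuple(), str(), bytes()]
mutable_flags = [isinstance(s, abc.MutableSequence) for s in mutable]
immutable_flags = [isinstance(s, abc.MutableSequence) for s in immutable]
print(mutable_flags)    # [True, True, True]
print(immutable_flags)  # [False, False, False]

# Container sequence: the list holds a reference to `inner`, not a copy
inner = [1, 2]
container = [inner, 'text', 3.0]  # items of different types
inner.append(3)
print(container[0])  # [1, 2, 3]

# Flat sequence: array.array stores machine values of a single type ('i' = C int)
flat = array.array('i', [1, 2, 3])
rejected = False
try:
    flat.append('text')
except TypeError as exc:
    rejected = True
    print('TypeError:', exc)
```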

## List comprehensions (list-comps) and generator expressions (genexps)

- List-comps
  - Only meant to build a new list
  - In Python 3 the loop variable no longer leaks into the surrounding scope
  - Listcomps do everything the map and filter functions do, without the contortions of the functionally challenged Python lambda
- Genexps (generator expressions)
  - A genexp saves memory because it yields items one by one using the iterator protocol instead of building a whole list just to feed another constructor.
 

In [1]:
dummy = [x for x in 'alksdjf']

In [2]:
x

NameError: name 'x' is not defined

^ See: `x` no longer leaks. In Python 3, list comprehensions, generator expressions and their siblings, set and dict comprehensions, have their own local scope, like functions.

In [3]:
x = 'ABC'

In [7]:
dummy = [ord(x) for x in x]
print(dummy)

[65, 66, 67]


In [8]:
# the local variables do not mask the variables from the surrounding scope
print(x)

ABC


In [1]:
symbols = '$¢£¥€¤'

In [2]:
%%time
i = 0
while i <= 1000000:
    beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
    i += 1

CPU times: user 2.52 s, sys: 21.1 ms, total: 2.54 s
Wall time: 2.55 s


In [3]:
%%time
i = 0
while i <= 1000000:
    beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))
    i += 1

CPU times: user 3.04 s, sys: 25.3 ms, total: 3.07 s
Wall time: 3.09 s


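Hard to say which one is faster: the listcomp won this run (2.54 s vs. 3.07 s CPU time), but a single `%%time` run is a noisy benchmark.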
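A somewhat steadier comparison, sketched with the standard `timeit` module: each version is wrapped in a function, checked to return the same list, then timed over several trials, keeping the best. The function names are illustrative, and the absolute timings will differ from machine to machine.

```python
import timeit

symbols = '$¢£¥€¤'

def with_listcomp():
    # filtering and mapping in one expression
    return [ord(s) for s in symbols if ord(s) > 127]

def with_map_filter():
    # the same result via map/filter and a lambda
    return list(filter(lambda c: c > 127, map(ord, symbols)))

# Both approaches must agree before their timings are worth comparing
assert with_listcomp() == with_map_filter()

# timeit.repeat runs the callable `number` times per trial, `repeat` trials;
# the minimum trial is the least noisy estimate
for func in (with_listcomp, with_map_filter):
    best = min(timeit.repeat(func, number=50_000, repeat=5))
    print(f'{func.__name__}: {best:.3f} s')
```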

In [6]:
# Cartesian products
colors = ['black', 'white']
sizes = ['s', 'm', 'xxxxxl']
tshirts = [(color, size) for color in colors for size in sizes]
tshirts

[('black', 's'),
 ('black', 'm'),
 ('black', 'xxxxxl'),
 ('white', 's'),
 ('white', 'm'),
 ('white', 'xxxxxl')]
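The nested `for` clauses behave like nested loops, with the first clause as the outer loop. This sketch checks that against `itertools.product`, which yields the same pairs in the same order, and shows that swapping the clauses changes the grouping.

```python
import itertools

colors = ['black', 'white']
sizes = ['s', 'm', 'xxxxxl']

# First clause = outer loop: grouped by color, then by size
tshirts = [(color, size) for color in colors for size in sizes]
assert tshirts == list(itertools.product(colors, sizes))

# Swapping the clauses groups by size, then by color
by_size = [(color, size) for size in sizes for color in colors]
print(by_size[:2])  # [('black', 's'), ('white', 's')]
```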

In [7]:
symbols = '$¢£¥€¤'
tuple(ord(symbol) for symbol in symbols)

(36, 162, 163, 165, 8364, 164)

In [8]:
import array
array.array('I', (ord(symbol) for symbol in symbols))

array('I', [36, 162, 163, 165, 8364, 164])

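^ Genexps can feed constructors other than `list`, such as `tuple` and `array.array`. When the genexp is the only argument of a call, as in `tuple(...)`, it needs no extra parentheses; with more arguments, as in `array.array('I', ...)`, the parentheses are required.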

In [10]:
for tshirt in ('%s %s' % (c, s) for c in colors for s in sizes):
    print(tshirt)

black s
black m
black xxxxxl
white s
white m
white xxxxxl


^ The genexp yields items one by one; a list with all 6 t-shirt variations is never built in memory.
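To see the iterator protocol at work, the same genexp can be driven by hand with `next()`. This sketch also shows that a generator is single-use: once drained, it yields nothing more.

```python
colors = ['black', 'white']
sizes = ['s', 'm', 'xxxxxl']

tshirts = ('%s %s' % (c, s) for c in colors for s in sizes)

# next() asks the generator for exactly one item
first = next(tshirts)
print(first)  # black s

# list() drains whatever is left
rest = list(tshirts)
print(len(rest))  # 5

# The generator is now exhausted
print(list(tshirts))  # []
```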

## Tuples are not just immutable lists
- Tuples do double-duty:
    - can be used as immutable lists
    - also as records with no field names, each item in the tuple holds the data for one field and the position of the item gives its meaning
- Tuple unpacking
    - Swapping the values of variables without using a temporary variable
    - Prefixing an argument with a star when calling a function
    - Grabbing excess items with `*`: in a parallel assignment the * prefix can be applied to exactly one variable, but it can appear in any position
- Nested tuple unpacking
- Named tuple
    - `collections.namedtuple` function is a factory that produces subclasses of tuple enhanced with field names and a class name
        - _fields is a tuple with the field names of the class
        - _make() lets you instantiate a named tuple from an iterable
        - _asdict() returns a collections.OrderedDict (a plain dict since Python 3.8) built from the named tuple instance. It can be used to produce a nice display of city data
- Tuples as immutable lists
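A sketch of the two roles of tuples. The record below is hypothetical sample data (an airport code plus coordinates), where position alone gives each field its meaning. The second tuple shows that the non-mutating list methods work, while in-place changes are rejected.

```python
# Tuple as a record: (airport code, latitude, longitude), sample data only
record = ('LAX', 33.9425, -118.408056)
code, lat, lon = record

# Tuple as an immutable list: count() and index() are available...
digits = (1, 2, 2, 3)
print(digits.count(2), digits.index(3))  # 2 3

# ...but item assignment raises TypeError
try:
    digits[0] = 99
except TypeError as exc:
    print('TypeError:', exc)

print(hasattr(digits, 'append'))  # False
```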
        

In [15]:
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP', 'XDA205856')]
for passport in sorted(traveler_ids):
#     print('%s/%s' % passport)
    print('/'.join(passport))

BRA/CE342567
ESP/XDA205856
USA/31195855


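^ Unpacking: in the commented-out line, the `%` operator unpacks each `passport` tuple into the two `%s` fields; `'/'.join(passport)` prints the same result.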
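Unpacking also works directly in the `for` statement. In this sketch each pair is unpacked into two names, and `_` is used by convention for the value that is not needed.

```python
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP', 'XDA205856')]

# Each (country, passport_number) pair is unpacked by the for loop;
# `_` is the conventional name for a value we ignore
countries = []
for country, _ in sorted(traveler_ids):
    countries.append(country)
print(countries)  # ['BRA', 'ESP', 'USA']
```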

In [16]:
# swapping the values of variables without using a temporary variable
a = 1
b = 2
print(('a', a))
print(('b', b))
b, a = a, b
print(('a', a))
print(('b', b))

('a', 1)
('b', 2)
('a', 2)
('b', 1)


In [17]:
# prefixing an argument with a star when calling a function
t = (20, 7)
divmod(*t)

(2, 6)

In [20]:
a, b, *rest = range(5)
print((a, b, rest))

(0, 1, [2, 3, 4])


In [21]:
a, b, *rest = range(2)
print((a, b, rest))

(0, 1, [])


In [23]:
a, *middle, b = range(6)
print((a, middle, b))

(0, [1, 2, 3, 4], 5)


^ parallel assignment, the * prefix can be applied to exactly one variable, but it can appear in any position
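A sketch of the "any position, but only one" rule: the starred name can come first or sit in the middle, and a second starred name is rejected when the code is compiled (`compile()` is used here only so the error can be caught).

```python
# Starred name first: it takes everything before the last two items
*head, c, d = range(5)
print(head, c, d)  # [0, 1, 2] 3 4

# Starred name in the middle, unpacking a string
a, *body, e = 'xyz'
print(a, body, e)  # x ['y'] z

# Two starred names are ambiguous and fail at compile time
rejected = False
try:
    compile('a, *b, *c = range(5)', '<demo>', 'exec')
except SyntaxError as exc:
    rejected = True
    print('SyntaxError:', exc.msg)
```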

In [24]:
metro_areas = [
    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
    ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
    ('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]

In [25]:
print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))
fmt = '{:15} | {:9.4f} | {:9.4f}'
for name, cc, pop, (latitude, longitude) in metro_areas:
    if longitude <= 0:
        print(fmt.format(name, latitude, longitude))

                |   lat.    |   long.  
Mexico City     |   19.4333 |  -99.1333
New York-Newark |   40.8086 |  -74.0204
Sao Paulo       |  -23.5478 |  -46.6358


^ Nested unpacking

In [1]:
from collections import namedtuple
City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
tokyo

City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))

In [2]:
tokyo.population

36.933

In [3]:
tokyo.coordinates

(35.689722, 139.691667)

In [4]:
tokyo[1]

'JP'

^ Named tuple: fields can be read by name (`tokyo.population`) or by position (`tokyo[1]`).

In [6]:
City._fields

('name', 'country', 'population', 'coordinates')

In [13]:
LatLong = namedtuple('LatLong', 'lat long')
delhi = City('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))

In [14]:
delhi.coordinates.lat

28.613889

^ nested named tuple!

In [16]:
# _make() lets you instantiate a named tuple from an iterable
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
delhi = City._make(delhi_data)
delhi

City(name='Delhi NCR', country='IN', population=21.935, coordinates=LatLong(lat=28.613889, long=77.208889))

In [18]:
# _asdict() returns a collections.OrderedDict built from the named tuple instance (Python 3.7 and earlier; a plain dict since 3.8)
delhi._asdict()

OrderedDict([('name', 'Delhi NCR'),
             ('country', 'IN'),
             ('population', 21.935),
             ('coordinates', LatLong(lat=28.613889, long=77.208889))])
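The "nice display of city data" mentioned above can be sketched as a loop over `_asdict().items()`, which yields field names and values in field order. The `City` and `LatLong` definitions are repeated so the block runs on its own.

```python
from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')
LatLong = namedtuple('LatLong', 'lat long')
delhi = City._make(('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889)))

# One "field: value" line per field, in declaration order
lines = [f'{key}: {value}' for key, value in delhi._asdict().items()]
print('\n'.join(lines))
```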

## Slicing
- Slice objects
    - `s[a:b:c]` aka `seq[start:stop:step]` can be used to specify a stride or step `c`, causing the resulting slice to skip items. The stride can also be negative, returning items in reverse
    - Instead of filling your code with hard-coded slices, you can name them with the built-in `slice()` function
- Assign to slices
    - Mutable sequences can be grafted, excised and otherwise modified in-place using slice notation on the left side of an assignment statement or as the target of a del statement.

In [23]:
s = 'iloveyou'
s[1:6:2]

'lvy'

In [24]:
s[::-1]

'uoyevoli'

In [25]:
test_slice = slice(2, 5)
for i in ['alsdkjf', '10948aiuf', 'oizucvxjnar']:
    print(i[test_slice])

sdk
948
zuc
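Named slices pay off when parsing fixed-width text. In this sketch the record layout is hypothetical: a 4-character code, a 10-character name field and a price in the remaining columns. Each field is read with a descriptive slice instead of magic numbers.

```python
# Hypothetical fixed-width layout: code in cols 0-3, name in 4-13, price from 14
SKU = slice(0, 4)
NAME = slice(4, 14)
PRICE = slice(14, None)

lines = [
    'A001Widget    9.99',
    'B002Gadget    12.50',
]

parsed = [(line[SKU], line[NAME].strip(), float(line[PRICE])) for line in lines]
print(parsed)  # [('A001', 'Widget', 9.99), ('B002', 'Gadget', 12.5)]
```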


In [34]:
l = list(range(10))

In [35]:
del l[5:7]
l

[0, 1, 2, 3, 4, 7, 8, 9]

In [36]:
l[3::2] = [11, 22, 33]
l

[0, 1, 2, 11, 4, 22, 8, 33]

In [37]:
l[2:5] = 100

TypeError: can only assign an iterable

In [38]:
l[2:5] = [100]
l

[0, 1, 100, 22, 8, 33]

^ When the target of the assignment is a slice, the right-hand side must be an iterable object, even if it has just one item
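Two more slice-assignment rules, sketched below: a plain slice can be replaced by an iterable of a different length (so the list grows or shrinks), but an extended slice with a step must receive exactly as many items as it selects.

```python
l = list(range(6))  # [0, 1, 2, 3, 4, 5]

# Plain slice: 2 items replaced by 4, so the list grows
l[1:3] = ['a', 'b', 'c', 'd']
print(l)  # [0, 'a', 'b', 'c', 'd', 3, 4, 5]

# Assigning an empty list removes the slice, like del l[1:5]
l[1:5] = []
print(l)  # [0, 3, 4, 5]

# Extended slice: l[::2] selects 2 items, so 1 replacement item is an error
mismatch = False
try:
    l[::2] = [100]
except ValueError as exc:
    mismatch = True
    print('ValueError:', exc)
```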

## Using + and * with sequences
- Building lists of lists
    - The best way to initialize a list with a certain number of nested lists is with a list comprehension

In [40]:
5 * 'abcd'

'abcdabcdabcdabcdabcd'

In [41]:
5 * [1,2,3]

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

In [42]:
[1,2,3] * 5

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

In [48]:
my_list = [[]]
my_list

[[]]

In [49]:
my_list * 3

[[], [], []]

In [52]:
board = [['_']*3 for i in range(3)]
board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [54]:
board[0][2] = 'X'
board

[['_', '_', 'X'], ['_', '_', '_'], ['_', '_', '_']]

In [57]:
weird_board = [['_'] * 3] * 3
weird_board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [59]:
weird_board[0][2] = 'X'
weird_board

[['_', '_', 'X'], ['_', '_', 'X'], ['_', '_', 'X']]

^ Surprising at first: the outer list is made of three references to the same inner list, so assigning through one row shows up in all three.
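The shared-reference explanation can be checked directly with `is` and `id()`. This sketch rebuilds both boards: the listcomp evaluates `['_'] * 3` once per row, while `* 3` on the outer list copies one reference three times.

```python
board = [['_'] * 3 for i in range(3)]
weird_board = [['_'] * 3] * 3

print(board[0] is board[1])                   # False: distinct row objects
print(weird_board[0] is weird_board[1])       # True: the same row object
print(len({id(row) for row in weird_board}))  # 1
```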

## Augmented assignment with sequences
The special method that makes += work is `__iadd__` (for “in-place addition”). However, if `__iadd__` is not implemented, Python falls back to calling `__add__`. 
- In general, for mutable sequences it is a good bet that `__iadd__` is implemented and that `+=` happens in-place. For immutable sequences, clearly there is no way for that to happen.
- Repeated concatenation of immutable sequences is inefficient, because instead of just appending new items, the interpreter has to copy the whole target sequence to create a new one with the new items concatenated.
- Putting mutable items in tuples is not a good idea.
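The classic illustration of why mutable items in tuples are risky: `t[2] += [50, 60]` extends the list in place through `list.__iadd__`, then tries to store the result back into the tuple, which raises `TypeError`. The list is left modified anyway.

```python
t = (1, 2, [30, 40])

# Step 1 (succeeds): the inner list is extended in place
# Step 2 (fails): assigning the result to t[2] is not allowed on a tuple
try:
    t[2] += [50, 60]
except TypeError as exc:
    print('TypeError:', exc)

print(t)  # (1, 2, [30, 40, 50, 60]), changed despite the error
```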

In [60]:
l = [1,2,3]
print(id(l))
l *= 2
print(id(l))

4581127880
4581127880


In [61]:
l = (1,2,3)
print(id(l))
l *= 2
print(id(l))

4581110480
4581046312


^ With an immutable sequence, `*=` cannot work in place: Python builds a new tuple and rebinds `l` to it, so the id changes.

## list.sort and the sorted built-in function
- The `list.sort` method sorts a list in-place, that is, without making a copy.
- In contrast, the built-in function `sorted` creates a new list and returns it. In fact, `sorted` accepts any iterable object as argument, including immutable sequences and generators


In [6]:
fruits = ['apple', 'Orange', 'grape', 'pear', 'kiwi']

In [7]:
sorted(fruits)

['Orange', 'apple', 'grape', 'kiwi', 'pear']

In [9]:
sorted(fruits, key=len)

['pear', 'kiwi', 'apple', 'grape', 'Orange']

In [8]:
sorted(fruits, key=str.lower)

['apple', 'grape', 'kiwi', 'Orange', 'pear']
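A sketch of the in-place vs. copy difference: `sorted` leaves its argument untouched, while `list.sort` reorders the list and returns `None`. It also shows `reverse=True`, which keeps equal-length items in their original relative order because the sort is stable.

```python
fruits = ['apple', 'Orange', 'grape', 'pear', 'kiwi']

# sorted() builds a new list; fruits itself is unchanged
by_length = sorted(fruits, key=len, reverse=True)
print(by_length)  # ['Orange', 'apple', 'grape', 'pear', 'kiwi']
print(fruits)     # ['apple', 'Orange', 'grape', 'pear', 'kiwi']

# list.sort() works in place and returns None
result = fruits.sort(key=str.lower)
print(result)     # None
print(fruits)     # ['apple', 'grape', 'kiwi', 'Orange', 'pear']
```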

## Managing ordered sequences with `bisect`
- `bisect` and `insort` use the binary search algorithm to quickly find and insert items in any sorted sequence.
- A pair of optional arguments `lo` and `hi` allow narrowing the region in the sequence to be searched when inserting.
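A short sketch of the `bisect` functions on a small sorted list: `bisect` (same as `bisect_right`) returns the position after any equal items, `bisect_left` the position before them, and the optional `lo` argument (passed positionally here) skips the part of the list already known to be smaller.

```python
import bisect

scores = [10, 20, 20, 30]

print(bisect.bisect(scores, 20))       # 3: after the equal items
print(bisect.bisect_left(scores, 20))  # 1: before the equal items

# lo=2 restricts the search to scores[2:]
print(bisect.bisect(scores, 25, 2))    # 3

# insort finds the position and inserts, keeping the list sorted
bisect.insort(scores, 25)
print(scores)  # [10, 20, 20, 25, 30]
```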

In [12]:
import bisect

In [18]:
sorted_fruits = sorted(fruits, key=str.lower)

In [24]:
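bisect.bisect(sorted_fruits, 'avocado')

1

In [26]:
bisect.insort(sorted_fruits, 'avocado')

In [27]:
sorted_fruits

['apple', 'avocado', 'grape', 'kiwi', 'Orange', 'pear']

^ Caution: `bisect` compares items with `<`, so it assumes the sequence is sorted by that same natural ordering. `sorted_fruits` was sorted with `key=str.lower`, where 'Orange' comes after 'kiwi' even though `'Orange' < 'kiwi'` is True; the result above is right only because the search never compared against 'Orange'.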
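One workaround for a list kept in case-insensitive order, sketched below, is to keep a parallel list of lowercase keys and bisect that instead. The helper `insort_case_insensitive` is hypothetical, written for this example. (From Python 3.10, the `bisect` functions also accept a `key` argument.)

```python
import bisect

fruits = ['apple', 'grape', 'kiwi', 'Orange', 'pear']  # sorted case-insensitively
keys = [f.lower() for f in fruits]                     # parallel list of sort keys

def insort_case_insensitive(items, keys, new_item):
    """Hypothetical helper: insert new_item keeping both lists ordered by str.lower."""
    key = new_item.lower()
    index = bisect.bisect(keys, key)
    keys.insert(index, key)
    items.insert(index, new_item)

# A plain bisect.insort(fruits, 'Mango') would compare 'Mango' < 'apple'
# (uppercase sorts first) and put it at index 0
insort_case_insensitive(fruits, keys, 'Mango')
print(fruits)  # ['apple', 'grape', 'kiwi', 'Mango', 'Orange', 'pear']
```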

## When a list is not the answer
- `array` may be a better option than `list` when holding millions of floating-point values
    - `array.tofile` and `array.fromfile` are easy to use. If you try the example, you’ll notice they are also very fast
    - If you need to sort an array, use the sorted function to rebuild it sorted: `a = array.array(a.typecode, sorted(a))`
- `deque` (double-ended queue) could be a better choice when you are constantly adding and removing items
- `set` is optimized for fast membership checking (`if a in my_collection`)
- A `memoryview` is essentially a generalized NumPy array structure in Python itself (without the math). It allows you to share memory between data structures (things like PIL images, SQLite databases, NumPy arrays, etc.) without first copying
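A sketch of the `array` points above. It assumes a hypothetical workload of 100,000 random floats in a `'d'` (C double) array. The array is saved and reloaded with `tofile`/`fromfile`, which move the raw machine values with no text conversion, and a sorted copy is rebuilt with `sorted`, since `array.array` has no `sort` method.

```python
import array
import os
import tempfile
from random import random

# Hypothetical data: 100,000 random floats stored as C doubles
floats = array.array('d', (random() for _ in range(10**5)))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'floats.bin')
    with open(path, 'wb') as fp:
        floats.tofile(fp)           # write raw bytes
    floats2 = array.array('d')
    with open(path, 'rb') as fp:
        floats2.fromfile(fp, 10**5)  # read exactly this many items

print(floats2 == floats)  # True

# No in-place sort on arrays: rebuild a sorted copy with the same typecode
sorted_floats = array.array(floats.typecode, sorted(floats))
```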

In [28]:
import array

In [39]:
a = array.array('i', (1,2,3,4))
b = array.array('i', (10,))

print(a + b)
a += b
print(a)

array('i', [1, 2, 3, 4, 10])
array('i', [1, 2, 3, 4, 10])


In [46]:
d = ['hi', 'asd', 'hi', 'wo', 'yay', 'yay', 'yay']

In [47]:
count_result = {}
for i in d:
    if i not in count_result:
        count_result[i] = d.count(i)
print(count_result)

{'hi': 2, 'asd': 1, 'wo': 1, 'yay': 3}


In [52]:
a.extend(b)
a

array('i', [1, 2, 3, 4, 10, 10])

In [1]:
from collections import deque

In [2]:
dq = deque(range(10), maxlen=10)

In [3]:
dq

deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [4]:
dq.extend([10, 11, 12])

In [5]:
dq

deque([3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

In [6]:
dq.extendleft([1,2,3])

In [7]:
dq

deque([3, 2, 1, 3, 4, 5, 6, 7, 8, 9])

In [8]:
dq = deque(range(10))
print(dq)
dq.extendleft([1,2,3])
print(dq)

deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
deque([3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


In [9]:
dq.appendleft([1,2,3])

In [10]:
dq

deque([[1, 2, 3], 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [12]:
dq.appendleft((1,2,3))
print(dq)

deque([(1, 2, 3), [1, 2, 3], 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
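Two common ways to use a deque for "constantly adding and removing items", sketched below: a bounded buffer that keeps only the most recent items via `maxlen`, and a FIFO queue that appends on the right and pops from the left. Both ends are fast, whereas `list.pop(0)` has to shift every remaining item.

```python
from collections import deque

# Bounded buffer: once full, appending on the right drops items on the left
recent = deque(maxlen=3)
for n in range(1, 6):
    recent.append(n)
print(recent)  # deque([3, 4, 5], maxlen=3)

# FIFO queue: append on the right, popleft from the left
queue = deque(['a', 'b'])
queue.append('c')
first_out = queue.popleft()
print(first_out)  # a
print(queue)      # deque(['b', 'c'])
```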


[Why numbering should start at zero](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) by Prof. Dijkstra. Very interesting piece!