# Advanced Data Types

We can get a lot done with the built-in data types `int`, `float`, `str`, `list`, `dict`, and `set`.

This module introduces the `collections` standard library module, which provides us with these additional types:

- `OrderedDict`
- `defaultdict`
- `deque`
- `namedtuple`
- `Counter`

# The `collections` module

## Ordered Dictionaries

In [None]:
%%python2
# dictionaries did not retain insertion order prior to Python 3.6
d = {}
d['one'] = 3
d['two'] = 6
d['three'] = 0
for k, v in d.items():
    print k, v

In [None]:
%%python3
d = {}
d['one'] = 3
d['two'] = 6
d['three'] = 0
for k, v in d.items():
    print(k, v)

In [None]:
%%python2 
from collections import OrderedDict
d = OrderedDict()
d['one'] = 3
d['two'] = 6
d['three'] = 0
for k, v in d.items():
    print k, v

In [None]:
# In CPython 3.6 dicts retain insertion order; from Python 3.7 the language guarantees it
d = {}
d['one'] = 3
d['two'] = 6
d['three'] = 0
d

In [None]:
for k, v in d.items():
    print(k, v)

In [None]:
from collections import OrderedDict
d = OrderedDict()
d['one'] = 3
d['two'] = 6
d['three'] = 0


In [None]:
# destructively iterate over a dict (popitem() removes the most recently added item first)
while d:
    print(d.popitem())
d

In [None]:
# OrderedDict is less useful since Python 3.6, but it does have
# methods that plain dicts lack, such as move_to_end()...
from collections import OrderedDict
d = OrderedDict()
d['one'] = 3
d['two'] = 6
d['three'] = 0
print(d)

In [None]:
d.move_to_end('one')
print(d)

In [None]:
d.move_to_end('three', False)
print(d)

In [None]:
d.move_to_end?
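
Besides `move_to_end()`, two other differences from a plain `dict` are worth knowing: equality between two `OrderedDict` objects is order-sensitive, and `popitem()` accepts a `last` argument. A minimal sketch (the variable names are only for illustration):

```python
from collections import OrderedDict

# Plain dicts compare equal regardless of insertion order.
plain_equal = {'one': 3, 'two': 6} == {'two': 6, 'one': 3}

# Two OrderedDicts compare equal only if the order matches as well.
a = OrderedDict([('one', 3), ('two', 6)])
b = OrderedDict([('two', 6), ('one', 3)])
ordered_equal = (a == b)

# popitem(last=False) removes the oldest entry instead of the newest.
oldest = a.popitem(last=False)

print(plain_equal, ordered_equal, oldest)  # True False ('one', 3)
```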

In [None]:
# the bisect module performs binary search on sorted lists

In [1]:
import bisect, random
values = [random.random() for x in range(5)]
values

[0.5238816573180048,
 0.5402294647554153,
 0.11943930486692877,
 0.4828512911590215,
 0.9607712672763543]

In [2]:
values.sort()
values

[0.11943930486692877,
 0.4828512911590215,
 0.5238816573180048,
 0.5402294647554153,
 0.9607712672763543]

In [3]:
bisect.bisect(values, 0.5)

2

In [4]:
value = values[2]

In [5]:
bisect.bisect(values, value)

3

In [6]:
bisect.bisect_left(values, value)

2
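
The difference between `bisect()` and `bisect_left()` only shows up when the value is already in the list. A small sketch with duplicate values, which also uses `bisect.insort()` to insert while keeping the list sorted (the `scores` list is made up for illustration):

```python
import bisect

scores = [10, 20, 20, 40]

# bisect_left: index of the first element >= 20
left = bisect.bisect_left(scores, 20)
# bisect (an alias for bisect_right): index just past the last element <= 20
right = bisect.bisect(scores, 20)

# insort finds the position with bisect and inserts there
bisect.insort(scores, 30)

print(left, right, scores)  # 1 3 [10, 20, 20, 30, 40]
```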

# The `collections` module: Default Dictionaries

In [7]:
d = {}
d.get('a', 5)

5

In [8]:
print(d.get('a'))

None


In [9]:
d

{}

In [10]:
d.setdefault('b', 10)

10

In [11]:
d

{'b': 10}

## Default Dictionaries
* suppose we need a default value for any key that does not exist in the dictionary
 * we can use the __`get()`__ or __`setdefault()`__ method (or an exception handler), or we can use a __`defaultdict`__
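
To compare the options listed above side by side, here is a sketch that counts characters four ways. The word `'mississippi'` and the variable names are just illustrative choices:

```python
from collections import defaultdict

word = 'mississippi'

# 1. get(): read with a fallback value, then store the result
counts_get = {}
for ch in word:
    counts_get[ch] = counts_get.get(ch, 0) + 1

# 2. setdefault(): insert the default if the key is missing, then update
counts_setdefault = {}
for ch in word:
    counts_setdefault.setdefault(ch, 0)
    counts_setdefault[ch] += 1

# 3. exception handler: try the update, fall back on KeyError
counts_try = {}
for ch in word:
    try:
        counts_try[ch] += 1
    except KeyError:
        counts_try[ch] = 1

# 4. defaultdict(int): missing keys are created with int(), i.e. 0
counts_dd = defaultdict(int)
for ch in word:
    counts_dd[ch] += 1

print(counts_get)  # {'m': 1, 'i': 4, 's': 4, 'p': 2}
```

All four produce the same counts; the `defaultdict` version simply moves the "what if the key is missing" logic out of the loop body.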

In [13]:
s = 'thequickbrownfoxjumpsoverthelazydog'
result = {}
for ch in s:
    lst = result.setdefault(ch, [])
    lst.append(ch)
result


{'t': ['t', 't'],
 'h': ['h', 'h'],
 'e': ['e', 'e', 'e'],
 'q': ['q'],
 'u': ['u', 'u'],
 'i': ['i'],
 'c': ['c'],
 'k': ['k'],
 'b': ['b'],
 'r': ['r', 'r'],
 'o': ['o', 'o', 'o', 'o'],
 'w': ['w'],
 'n': ['n'],
 'f': ['f'],
 'x': ['x'],
 'j': ['j'],
 'm': ['m'],
 'p': ['p'],
 's': ['s'],
 'v': ['v'],
 'l': ['l'],
 'a': ['a'],
 'z': ['z'],
 'y': ['y'],
 'd': ['d'],
 'g': ['g']}

In [14]:
def mydefault():
    return 42

from collections import defaultdict

dd = defaultdict(mydefault)
dd['foo']

42

In [15]:
dd

defaultdict(<function __main__.mydefault()>, {'foo': 42})

In [16]:
list()

[]

In [17]:
s = 'thequickbrownfoxjumpsoverthelazydog'
result = defaultdict(list)
for ch in s:
    result[ch].append(ch)
result


defaultdict(list,
            {'t': ['t', 't'],
             'h': ['h', 'h'],
             'e': ['e', 'e', 'e'],
             'q': ['q'],
             'u': ['u', 'u'],
             'i': ['i'],
             'c': ['c'],
             'k': ['k'],
             'b': ['b'],
             'r': ['r', 'r'],
             'o': ['o', 'o', 'o', 'o'],
             'w': ['w'],
             'n': ['n'],
             'f': ['f'],
             'x': ['x'],
             'j': ['j'],
             'm': ['m'],
             'p': ['p'],
             's': ['s'],
             'v': ['v'],
             'l': ['l'],
             'a': ['a'],
             'z': ['z'],
             'y': ['y'],
             'd': ['d'],
             'g': ['g']})

# The `collections` module: Deque

## Deque
* double ended queue
* pronounced "deck"

Aside: definition of the word `queue`:

The letter q, followed by 4 letters patiently waiting their turn

In [18]:
from collections import deque
dq = deque(range(10), maxlen=10) # maxlen is optional
dq


deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]:
dq.rotate(3) # +n takes items from right, prepends to left, vice versa for -n
dq


deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6])

In [20]:
dq.rotate(-4)
dq


deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

In [21]:
dq.appendleft('a') # appending to full deque discards item(s) from other end
dq


deque(['a', 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]:
dq.append('end')

In [23]:
dq

deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 'end'])

In [24]:
dq.extend('bcd')
dq

deque([4, 5, 6, 7, 8, 9, 'end', 'b', 'c', 'd'])

In [25]:
dq.extendleft((-1, -2, -3))
dq

deque([-3, -2, -1, 4, 5, 6, 7, 8, 9, 'end'])

In [26]:
dq.pop() # same as list

'end'

In [27]:
dq.popleft()

-3

In [28]:
dq

deque([-2, -1, 4, 5, 6, 7, 8, 9])

#### Aside: append vs extend

In [29]:
foo = [1, 2]
foo.append([5, 6, 7])
foo

[1, 2, [5, 6, 7]]

In [30]:
foo = [1, 2]
foo.extend([5, 6, 7])  # same as foo += [5, 6, 7]
foo

[1, 2, 5, 6, 7]

In [31]:
foo = [1, 2]
foo.append('bcd')
foo

[1, 2, 'bcd']

In [32]:
foo = [1, 2]
foo.extend('bcd')  # like += 
foo

[1, 2, 'b', 'c', 'd']

`</aside>`

In [33]:
dq

deque([-2, -1, 4, 5, 6, 7, 8, 9])

In [34]:
dq.remove(4) # same as list
dq

deque([-2, -1, 5, 6, 7, 8, 9])

In [35]:
dq.reverse()
print(dq)

deque([9, 8, 7, 6, 5, -1, -2], maxlen=10)


In [36]:
dq.append(0)
dq

deque([9, 8, 7, 6, 5, -1, -2, 0])

In [37]:
del dq[2]

In [38]:
dq

deque([9, 8, 6, 5, -1, -2, 0])
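
One place where `maxlen` pays off is keeping only the most recent items of a stream. The sketch below computes a moving average that way; `moving_averages` is a hypothetical helper written for this example, not part of `collections`:

```python
from collections import deque

def moving_averages(values, window):
    """Yield the mean of the last `window` values seen so far."""
    recent = deque(maxlen=window)  # the oldest value is discarded automatically
    for value in values:
        recent.append(value)
        yield sum(recent) / len(recent)

averages = list(moving_averages([1, 2, 3, 4, 5], window=3))
print(averages)  # [1.0, 1.5, 2.0, 3.0, 4.0]
```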

# The `collections` module: Named Tuples


## Named Tuples
* tuples are quite handy, but they are missing a key feature when used as records: sometimes we want to name the fields
 * named tuples add field names while staying more memory-efficient than dictionaries, because instances store only the values; the field names live on the class rather than in each instance
* __`namedtuple()`__ returns not an individual object but a new class, customized for the given field names
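
Because the generated class is a tuple subclass, its instances are immutable. A short sketch of what that means in practice, using a made-up `Pixel` class (the name and fields are only for illustration):

```python
from collections import namedtuple

Pixel = namedtuple('Pixel', ['row', 'col'])  # hypothetical example class
p = Pixel(2, 5)

# Fields cannot be reassigned; the attempt raises AttributeError.
try:
    p.row = 10
    mutated = True
except AttributeError:
    mutated = False

# _replace() returns a new instance with the selected fields changed.
q = p._replace(row=10)

print(mutated, p, q)  # False Pixel(row=2, col=5) Pixel(row=10, col=5)
```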

In [39]:
from collections import namedtuple
Point = namedtuple('Point', 'x y')
# first argument is the name of the tuple class itself
# second argument is attribute names as an iterable of strings or a
# single space/comma-delimited string

In [42]:
point1 = Point(1, 3)
point1, type(point1)

(Point(x=1, y=3), __main__.Point)

In [43]:
issubclass(Point, tuple)

True

In [44]:
point2 = Point(x=-3, y=-2)
point2

Point(x=-3, y=-2)

In [47]:
print(point1[0], point1[1]) # what we would do if just a tuple

1 3


In [46]:
print(point1.x, point1.y) # much nicer, because fields are named

1 3


In [49]:
from collections import namedtuple
Coords = namedtuple('Coords', 'lat long')
City = namedtuple('City', 'name country population coords')
tokyo = City('Tokyo', 'JP', 36.933, Coords(lat=35.689722, long=139.691667))
tokyo

City(name='Tokyo', country='JP', population=36.933, coords=Coords(lat=35.689722, long=139.691667))

In [50]:
tokyo.population

36.933

In [51]:
tokyo.coords

Coords(lat=35.689722, long=139.691667)

In [52]:
tokyo[1]

'JP'

In [53]:
type(City)

type

In [54]:
type(tokyo)

__main__.City

In [55]:
City._fields

('name', 'country', 'population', 'coords')

In [57]:
for field in City._fields:
    print(field, getattr(tokyo, field))

name Tokyo
country JP
population 36.933
coords Coords(lat=35.689722, long=139.691667)


In [58]:
for i, field in enumerate(City._fields): # tuple containing field names
    print(i, field, getattr(tokyo, field), tokyo[i])

0 name Tokyo Tokyo
1 country JP JP
2 population 36.933 36.933
3 coords Coords(lat=35.689722, long=139.691667) Coords(lat=35.689722, long=139.691667)


# `<Aside name="function arguments">`

In [59]:
def foo(*args):
    print(args)

In [60]:
foo(1)

(1,)


In [61]:
foo(1,2,3)

(1, 2, 3)


In [62]:
def bar(a, b):
    print(a, b)

In [63]:
bar(*(1,2))

1 2


In [64]:
bar(*[1,2])

1 2


In [65]:
bar(*'fo')

f o


In [66]:
def wrapped(a, b, c):
    print(a, b, c)

def wrapper(*args):
    return wrapped(*args)

In [67]:
wrapper(1,2,3)

1 2 3


In [68]:
bar(1,2)

1 2


In [69]:
bar(b=2, a=1)

1 2


In [70]:
def foo2(**kwargs):
    print(kwargs)

In [71]:
foo2(a=1, b=2, c=3)

{'a': 1, 'b': 2, 'c': 3}


In [72]:
def wrapped(a, b, c):
    print(a, b, c)

def wrapper(*args, **kwargs):
    return wrapped(*args, **kwargs)

In [73]:
wrapper(1, 2, c=4)

1 2 4


In [74]:
def mysum(first, *rest):
    print('first, rest == ', first, rest)
    if rest:
        return first + mysum(*rest)
    else:
        return first

In [75]:
mysum(1,2,3,4,5)

first, rest ==  1 (2, 3, 4, 5)
first, rest ==  2 (3, 4, 5)
first, rest ==  3 (4, 5)
first, rest ==  4 (5,)
first, rest ==  5 ()


15

# `</Aside>`

In [77]:
LatLong = namedtuple('LatLong', 'lat long')
delhi_data = ('Delhi NCR', 'IN', 21.935,
              LatLong(28.613889, 77.2098889)) # tuple
delhi = City._make(delhi_data) # same as City(*delhi_data)
delhi

City(name='Delhi NCR', country='IN', population=21.935, coords=LatLong(lat=28.613889, long=77.2098889))

In [79]:
delhi2 = City(*delhi_data)
delhi2

City(name='Delhi NCR', country='IN', population=21.935, coords=LatLong(lat=28.613889, long=77.2098889))

In [80]:
delhi == delhi2 == delhi_data

True

In [81]:
d = delhi._asdict() # returns an OrderedDict (a plain dict since Python 3.8) built from the named tuple
d

OrderedDict([('name', 'Delhi NCR'),
             ('country', 'IN'),
             ('population', 21.935),
             ('coords', LatLong(lat=28.613889, long=77.2098889))])

In [82]:
City(**d)

City(name='Delhi NCR', country='IN', population=21.935, coords=LatLong(lat=28.613889, long=77.2098889))

See https://docs.python.org/3/library/dataclasses.html (`dataclasses`, Python 3.7+) for a mutable (read-write) implementation of something similar.
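
As a rough sketch of that alternative, here is a data class with some of the same fields as `City`. The class name `MutableCity` is hypothetical, and the updated population figure is just an illustrative value:

```python
from dataclasses import dataclass

@dataclass
class MutableCity:  # hypothetical name, loosely mirroring the City named tuple
    name: str
    country: str
    population: float

delhi_dc = MutableCity('Delhi NCR', 'IN', 21.935)
delhi_dc.population = 22.0  # allowed: data class fields are writable by default

print(delhi_dc)  # MutableCity(name='Delhi NCR', country='IN', population=22.0)
```

Unlike a named tuple, a data class instance is not a tuple, so it does not support indexing or unpacking.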

In [83]:
from sys import getsizeof

In [84]:
getsizeof({})

248

In [85]:
getsizeof(())

56

In [86]:
getsizeof([])

72

In [87]:
getsizeof(set())

232

In [88]:
getsizeof(deque())

640

In [89]:
getsizeof('')

49

In [90]:
getsizeof(delhi)

88

In [91]:
88 - 56

32

In [92]:
4 * 8 == 32

True
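
The arithmetic above suggests that each of the four fields costs 8 bytes, which is one pointer on a 64-bit build. The sketch below checks that idea, assuming CPython; the exact sizes are implementation details and vary between versions and platforms, so no output is shown. It also illustrates that `getsizeof()` is shallow: it measures the container, not the objects it refers to.

```python
import struct
from sys import getsizeof

pointer_size = struct.calcsize('P')  # bytes per pointer on this build

empty = getsizeof(())
four = getsizeof((1, 2, 3, 4))
per_item = (four - empty) // 4  # extra bytes per tuple slot

# A large string inside the tuple does not change the tuple's own size.
big = getsizeof(('x' * 10_000, 2, 3, 4))

print(pointer_size, per_item, four == big)
```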

There are 10 kinds of people in the world...

those who read binary
and those who don't


# Advanced Datatypes: Counters

## Counters
* __`dict`__ subclass for counting things
* collection where the things being counted are `dict` keys and the counts are `dict` values
* counts in a __`Counter`__ can be zero or negative
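
A typical use is counting words. The sketch below also shows that looking up a key that was never counted returns 0 rather than raising `KeyError`, and does not add the key (the sample `text` is made up):

```python
from collections import Counter

text = 'the quick brown fox jumps over the lazy dog the end'
word_counts = Counter(text.split())

top = word_counts.most_common(1)
missing = word_counts['cat']  # missing keys count as 0, no KeyError

print(top, missing, 'cat' in word_counts)  # [('the', 3)] 0 False
```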

In [93]:
from collections import Counter
c = Counter()
c

Counter()

In [94]:
dict(c)

{}

In [96]:
c = Counter('disagree and commit')
c

Counter({'d': 2,
         'i': 2,
         's': 1,
         'a': 2,
         'g': 1,
         'r': 1,
         'e': 2,
         ' ': 2,
         'n': 1,
         'c': 1,
         'o': 1,
         'm': 2,
         't': 1})

In [97]:
c = Counter({'red': 5, 'blue': -1})
c

Counter({'red': 5, 'blue': -1})

In [98]:
c['blue'] += 1
c

Counter({'red': 5, 'blue': 0})

In [99]:
c['green'] += 10

In [100]:
c

Counter({'red': 5, 'blue': 0, 'green': 10})

In [101]:
c = Counter(red=6, blue=5, green=3, pink=1, yellow=-3)
c.elements() # returns an iterator

<itertools.chain at 0x10929b250>

In [102]:
list(c.elements())

['red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'blue',
 'blue',
 'blue',
 'blue',
 'blue',
 'green',
 'green',
 'green',
 'pink']

In [103]:
c.most_common(3) # returns the n most common elements

[('red', 6), ('blue', 5), ('green', 3)]

In [104]:
c

Counter({'red': 6, 'blue': 5, 'green': 3, 'pink': 1, 'yellow': -3})

In [105]:
dict(c)

{'red': 6, 'blue': 5, 'green': 3, 'pink': 1, 'yellow': -3}

In [106]:
Counter(['red', 'red', 'blue', 'red', 'green'])

Counter({'red': 3, 'blue': 1, 'green': 1})

In [107]:
d = Counter(f5=0, pink=0, red=3, blue=5, green=2, yellow=1)
c.subtract(d) # preserves negative values (like -=)
c

Counter({'red': 3, 'blue': 0, 'green': 1, 'pink': 1, 'yellow': -4, 'f5': 0})

In [109]:
# The - operator does *not* preserve negative values
Counter(red=6, blue=5, green=-1) - Counter(red=3, blue=7, green=1)

Counter({'red': 3})

In [110]:
c.items()

dict_items([('red', 3), ('blue', 0), ('green', 1), ('pink', 1), ('yellow', -4), ('f5', 0)])

In [111]:
+c # generates new Counter, discarding 0s or negatives (Py3)

Counter({'red': 3, 'green': 1, 'pink': 1})

In [113]:
c = Counter(red=6, blue=5, green=3, pink=1, yellow=-3)
c = -c # discard positives and multiply remaining negatives by -1
c

Counter({'yellow': 3})

In [114]:
dct = {'green': 1, 'pink': 1, 'red': 3}
dct.update(red=1, green=5, pink=-2)
dct

{'green': 5, 'pink': -2, 'red': 1}

In [115]:
c = Counter({'green': 1, 'pink': 1, 'red': 3})
c.update(red=1, green=5, pink=-2) # updates the counts
c

Counter({'green': 6, 'pink': -1, 'red': 4})

In [122]:
c = Counter(a=3, b=1, c=4)
d = Counter(b=2, a=1, c=5)
c + d

Counter({'a': 4, 'b': 3, 'c': 9})

In [117]:
c - d

Counter({'a': 2})

In [118]:
c,d

(Counter({'a': 3, 'b': 1, 'c': 4}), Counter({'a': 1, 'b': 2, 'c': 5}))

I don't use the following too often myself, but we can also take the minimum and maximum count of each item across two counters.

In [119]:
c & d # min(c[x], d[x])

Counter({'a': 1, 'b': 1, 'c': 4})

In [120]:
c | d # max(c[x], d[x])

Counter({'a': 3, 'b': 2, 'c': 5})

Custom classes can override nearly all operators; the exceptions are

```python
and or not is
```
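
This is how `Counter` gives meaning to `+`, `-`, `&`, and `|`. A minimal sketch with a hypothetical `Money` class shows the same mechanism: `+` and `==` are overridden through special methods, while `is` always tests identity and `or` can only be influenced indirectly through `__bool__`:

```python
class Money:
    """Hypothetical example class that overrides + and ==."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)

    def __eq__(self, other):
        return isinstance(other, Money) and self.cents == other.cents

    def __bool__(self):
        return self.cents != 0

total = Money(150) + Money(50)     # calls __add__
same_value = total == Money(200)   # calls __eq__
same_object = total is Money(200)  # identity test, cannot be overridden
chosen = Money(0) or total         # `or` checks truthiness, then returns an operand

print(same_value, same_object, chosen is total)  # True False True
```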

# Lab

Open the [Advanced Data Types Lab][advanced-data-types-lab]

[advanced-data-types-lab]: ./advanced-data-types-lab.ipynb