## DefaultDict

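A dictionary expects a key to exist before you read or update its value. That's pretty reasonable most of the time, but sometimes you just want the dictionary to assume a default value whenever a new key shows up. The first example below shows the problem (a `KeyError`), the second works around it with `try`/`except`, and the rest use `defaultdict`, which calls a factory such as `int`, `float`, or a `lambda` to create the missing value.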

In [14]:
count = {}
count['duck'] = 0
animals = ['duck','duck','duck','goose']
for animal in animals:
    count[animal] += 1
    print(animal)


duck
duck
duck


KeyError: 'goose'

In [15]:
count = {}
animals = ['duck','duck','duck','goose', 'cow', 'cow', 'monkey']

for animal in animals:
    try:
        count[animal] += 1
    except KeyError:
        count[animal] = 1
count

{'duck': 3, 'goose': 1, 'cow': 2, 'monkey': 1}

In [25]:
from collections import defaultdict

count = defaultdict(float)  # float() returns 0.0, so the counts come out as floats
animals = ['duck','duck','duck','goose', 'cow', 'cow', 'monkey']

for animal in animals:
    count[animal] += 1
count

defaultdict(float, {'duck': 3.0, 'goose': 1.0, 'cow': 2.0, 'monkey': 1.0})

In [28]:
from collections import defaultdict
ice_cream = defaultdict(lambda: 'Vanilla')

ice_cream['Sarah'] = 'Chunk Monkey'
ice_cream['Abdul'] = 'Butter Pecan'
print(ice_cream['Sarah'])
print(ice_cream['Joe'])  # 'Joe' was never set, so the lambda supplies 'Vanilla'

Chunk Monkey
Vanilla


In [31]:
from collections import defaultdict
food_list = 'spam spam spam spam spam spam eggs spam'.split()
print(food_list)
food_counter = defaultdict(int)
for item in food_list:
    food_counter[item] += 1
print(food_counter)

['spam', 'spam', 'spam', 'spam', 'spam', 'spam', 'eggs', 'spam']
defaultdict(<class 'int'>, {'spam': 7, 'eggs': 1})
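
The same pattern works for grouping, not just counting. The sketch below is an illustration rather than part of the original notebook: the `observations` data and the `sounds_by_animal` name are made up. It uses `list` as the factory so that every new key starts with an empty list, and it also shows a caveat worth knowing: simply reading a missing key adds that key to the dictionary.

```python
from collections import defaultdict

# Hypothetical sample data: (animal, sound) pairs invented for this example.
observations = [('duck', 'quack'), ('goose', 'honk'),
                ('duck', 'squeak'), ('cow', 'moo')]

# list() returns [], so every new key starts with an empty list.
sounds_by_animal = defaultdict(list)
for animal, sound in observations:
    sounds_by_animal[animal].append(sound)

print(dict(sounds_by_animal))

# Caveat: merely reading a missing key inserts it with the default value.
print('monkey' in sounds_by_animal)   # not present yet
sounds_by_animal['monkey']            # returns [] and adds the key
print('monkey' in sounds_by_animal)   # now present
```

If that side effect is unwanted, `sounds_by_animal.get('monkey')` reads without inserting, just as it does on a regular dictionary.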


## Named Tuple

Sometimes you want to create a class, but the class only needs to store data, and you are lazy.

You could put the data in a dictionary, but every instance has the same fixed set of fields, so a dictionary is more flexible than you need. You could put the data in a tuple, but then you need to remember the order. What if you could have the simplicity of a tuple with the labels of a dictionary, and access each field by name? That's a **named tuple**.

In [35]:
from collections import namedtuple

alumni = namedtuple('Alumni', 'name age gender degree title salary employer')

alice = alumni(name='Alice',
               age=29,
               gender='F',
               degree='PhD',
               title='Data Scientist',
               salary=115000,
               employer='Thumbtack')
alice


Alumni(name='Alice', age=29, gender='F', degree='PhD', title='Data Scientist', salary=115000, employer='Thumbtack')

In [37]:
# Regular tuples: fields are only reachable by position
bob = ('Bob', 30, 'male')
print('Representation:', bob)

jane = ('Jane', 29, 'female')
print('\nField by index:', jane[0])

Representation: ('Bob', 30, 'male')

Field by index: Jane


In [42]:
from collections import namedtuple

friends = namedtuple('Friends', 'name age gender degree college')

johnny = friends(name='Johnny',
                 age=28,
                 gender='M',
                 degree='Economics',
                 college='University of North Carolina at Chapel Hill')
johnny.age, johnny.college


(28, 'University of North Carolina at Chapel Hill')
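
A named tuple is still a tuple, so it supports indexing and is immutable. The sketch below illustrates that with a hypothetical `Friend` type (a shortened version of the one above, with made-up field values) and two helper methods that `namedtuple` provides: `_replace` and `_asdict`.

```python
from collections import namedtuple

# Hypothetical record type for illustration; the fields and values are assumptions.
Friend = namedtuple('Friend', 'name age college')
johnny = Friend(name='Johnny', age=28, college='UNC')

# Fields are reachable by attribute or by position, like a regular tuple.
print(johnny.age, johnny[1])

# Named tuples are immutable: assigning to a field raises AttributeError.
try:
    johnny.age = 29
except AttributeError as err:
    print('Cannot modify:', err)

# _replace returns a new instance with the requested fields changed.
older_johnny = johnny._replace(age=29)
print(older_johnny)

# _asdict converts the record to a dictionary keyed by field name.
print(johnny._asdict())
```

Because `_replace` builds a new object, the original `johnny` is left unchanged, which is usually what you want for records that should not be edited by accident.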

## Deque
A deque (double-ended queue) is a lovely type of object that's designed for fast access at either end. A normal list is only efficient at adding and removing on the right, with methods like `append` and `pop`; inserting or removing at the left means shifting every other element. Deques are designed to be indifferent about sides.

In [45]:
from collections import deque

d = deque([1,2,3,4])
d.appendleft(3)
d


deque([3, 1, 2, 3, 4])

In [46]:
d.popleft()
d

deque([1, 2, 3, 4])

In [49]:
window = deque(maxlen=4)
for idx in range(10):
    window.append(idx)
    print(window)
print("---SWITCH---")
for idx in range(10):
    window.appendleft(idx)
    print(window)

deque([0], maxlen=4)
deque([0, 1], maxlen=4)
deque([0, 1, 2], maxlen=4)
deque([0, 1, 2, 3], maxlen=4)
deque([1, 2, 3, 4], maxlen=4)
deque([2, 3, 4, 5], maxlen=4)
deque([3, 4, 5, 6], maxlen=4)
deque([4, 5, 6, 7], maxlen=4)
deque([5, 6, 7, 8], maxlen=4)
deque([6, 7, 8, 9], maxlen=4)
---SWITCH---
deque([0, 6, 7, 8], maxlen=4)
deque([1, 0, 6, 7], maxlen=4)
deque([2, 1, 0, 6], maxlen=4)
deque([3, 2, 1, 0], maxlen=4)
deque([4, 3, 2, 1], maxlen=4)
deque([5, 4, 3, 2], maxlen=4)
deque([6, 5, 4, 3], maxlen=4)
deque([7, 6, 5, 4], maxlen=4)
deque([8, 7, 6, 5], maxlen=4)
deque([9, 8, 7, 6], maxlen=4)
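
One common use for a bounded deque like `window` is a sliding-window statistic. The following is a minimal sketch, assuming a hypothetical helper called `moving_averages` and made-up input values; it relies only on the `maxlen` behavior shown above, where the oldest item falls off once the deque is full.

```python
from collections import deque

def moving_averages(values, size):
    """Return the mean of the most recent `size` values after each new value."""
    window = deque(maxlen=size)
    averages = []
    for value in values:
        window.append(value)  # once full, the oldest value drops off the left
        averages.append(sum(window) / len(window))
    return averages

# Hypothetical readings, chosen only to make the arithmetic easy to follow.
print(moving_averages([10, 20, 30, 40, 50], size=3))
# prints [10.0, 15.0, 20.0, 30.0, 40.0]
```

The first two averages cover fewer than three values because the window has not filled up yet.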


## Generators
Generators aren't in the `collections` module; they are a built-in feature of the Python language. They're extremely powerful and solve a lot of problems for us.

Often in an analysis, we don't really want to load a whole dataset into memory. We really just want a cursor that knows where it is in the data. For instance, imagine I was trying to load all the books ever written into Python... that's too big for my RAM. However, if I just had an object that kept track of which book it was on and which page it needs to read next, I could load things page by page. That's exactly what a generator does (albeit, I've oversimplified a bit): it pauses at each `yield`, remembers its state, and resumes from there when asked for the next value.

We can use that to give us data over and over, without having to pre-generate all the data. Let's see an example.

In [51]:
def generate_numbers():
    """
    An infinite number generator
    """
    
    x = 0
    while True:
        x += 1
        yield x
my_generator = generate_numbers()
for iteration in range(10):
    next_number = next(my_generator)
    print(next_number)

1
2
3
4
5
6
7
8
9
10
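
To connect this back to the page-by-page idea, here is a minimal sketch of a generator that hands out a few lines at a time. The `read_pages` helper and the five-line "book" are hypothetical; `io.StringIO` stands in for a real file so the example runs on its own.

```python
import io
from itertools import islice

def read_pages(lines, lines_per_page):
    """Yield lists of up to `lines_per_page` lines, one page at a time."""
    iterator = iter(lines)
    while True:
        page = list(islice(iterator, lines_per_page))
        if not page:
            return  # ends the generator, so a for loop over it stops cleanly
        yield page

# Hypothetical stand-in for a large file: StringIO behaves like an open text file.
book = io.StringIO('line 1\nline 2\nline 3\nline 4\nline 5\n')

for page_number, page in enumerate(read_pages(book, lines_per_page=2), start=1):
    print(page_number, [line.strip() for line in page])
```

Only one page is held in memory at a time, so the same loop would work on a file far larger than RAM, under the assumption that each individual page fits.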
