# Collections module (Python)

Overview of the more common use cases for:
- defaultdict
- Counter
- deque
- namedtuple

Official documentation: https://docs.python.org/3/library/collections.html

### defaultdict

Extends `dict` by adding 'default' functionality.
Identical to `dict` *except* for the case where the accessed key doesn't yet exist.

Subclasses and inherits everything from `dict`.

In [142]:
from collections import defaultdict

In [143]:
# instantiate with int
inventory = defaultdict(int)
inventory["apples"] = 5
# int() is called if the key isn't found
print(f"How many bananas are in stock? {inventory['bananas']}")

# Note: inventory["bananas"] would raise a KeyError for a normal dict

How many bananas are in stock? 0


Another useful example, using `list` to group values under a key:

In [144]:
homerooms = defaultdict(list)
homerooms["Ms. Smith"].append("John")
homerooms["Ms. Smith"].append("Joey")
homerooms["Mr. Jones"].append("Jack")
homerooms["Ms. Smith"].append("Jill")

for k, v in homerooms.items():
    print(f"{k}: {v}")

Ms. Smith: ['John', 'Joey', 'Jill']
Mr. Jones: ['Jack']


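The argument passed to the constructor is stored as `default_factory`. It must be a callable that takes no arguments (or `None`). When a missing key is accessed with `[]`, the factory is called and its return value is stored under that key and returned. So `0` for `defaultdict(int)`, `[]` for `defaultdict(list)`, etc.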
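
To make that behaviour concrete, here is a minimal sketch (the `counts` name is just for illustration). It shows that a lookup with `[]` on a missing key also inserts the key, while `.get()` and `in` leave the dict untouched.

```python
from collections import defaultdict

counts = defaultdict(int)
print(counts.default_factory)    # <class 'int'>

# Reading a missing key with [] calls int() and stores the result.
value = counts["missing"]
print(value, dict(counts))       # 0 {'missing': 0}

# .get() and `in` do not call the factory and do not insert anything.
print(counts.get("other"))       # None
print("other" in counts)         # False
```

This matters when you only want to test for a key: use `in` or `.get()` so the dict doesn't fill up with default entries.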

You can also use functions, lambdas, and classes:

In [145]:
# function / lambda
def returns_555():
    return 555
returns_777 = lambda: 777

default_555_dict = defaultdict(returns_555)
default_777_dict = defaultdict(returns_777)
print(default_555_dict["anything"])
print(default_777_dict["at all"])

555
777


In [146]:
class StudentReport:
    last_id = 0

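    def __init__(self, attendance=0, classes=None):
        StudentReport.last_id += 1
        self.id = StudentReport.last_id
        self.attendance = attendance
        # avoid a mutable default argument: a shared [] would leak between instances
        self.classes = classes if classes is not None else []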
    
    def __repr__(self):
        return f"Id={self.id} | Attendance={self.attendance} | Classes={self.classes}"

student_dict = defaultdict(StudentReport)
student_dict["Katie"] = StudentReport(154, ["Biology", "Calculus"])
student_dict["Kelly"] = StudentReport(147, ["History", "Geometry", "Art"])
print(student_dict["Kevin"])

Id=3 | Attendance=0 | Classes=[]


### Counter

`Counter` takes an iterable (or a mapping) and counts how many times each distinct element occurs. The elements must be hashable.

In [147]:
from collections import Counter

In [148]:
# count chars in a string
ctr = Counter("to be or not to be that is the question")
print(ctr)

Counter({' ': 9, 't': 7, 'o': 5, 'e': 4, 'b': 2, 'n': 2, 'h': 2, 'i': 2, 's': 2, 'r': 1, 'a': 1, 'q': 1, 'u': 1})


You can update a counter with more 'data'.

In [149]:
ctr.update("whether tis nobler in the mind to suffer")
ctr.update("the slings and arrows of outrageous fortune")
ctr.update("or to take arms against a sea of troubles")
print(ctr)


Counter({' ': 30, 't': 18, 'o': 16, 'e': 15, 's': 12, 'r': 11, 'a': 10, 'n': 9, 'i': 7, 'h': 6, 'u': 6, 'f': 5, 'b': 4, 'l': 3, 'g': 3, 'w': 2, 'm': 2, 'd': 2, 'q': 1, 'k': 1})


`Counter` subclasses `dict` so you can do normal `dict` things too.

In [150]:
del ctr[" "]
print(f"counter contains {len(ctr)} distinct chars")
print(f"counter contains {ctr['t']} t's, {ctr['e']} e's and {ctr['x']} x's")

counter contains 19 distinct chars
counter contains 18 t's, 15 e's and 0 x's
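
One difference from `defaultdict(int)` is worth a quick sketch: a `Counter` returns `0` for a missing key without inserting it. The `letters` and `dd` names below are only for illustration.

```python
from collections import Counter, defaultdict

letters = Counter("abc")
dd = defaultdict(int, {"a": 1, "b": 1, "c": 1})

# Counter: a missing key reads as 0 and is NOT added
print(letters["x"], "x" in letters)   # 0 False

# defaultdict(int): a missing key reads as 0 and IS added
print(dd["x"], "x" in dd)             # 0 True
```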


`Counter` also has a few methods of its own: `most_common()`, `elements()`, `subtract()` and `total()` (the last one needs Python 3.10+).

In [151]:
print(f"most common chars: {ctr.most_common(5)}")
# elements() yields each key repeated count times, in first-seen order (not by frequency)
print("re-expanded in first-seen order: " + "".join(ctr.elements()))
# subtract() keeps zero and negative counts; the spaces in the argument leave ' ' at -3
ctr.subtract("toes toes toes toes")
print(f"{ctr.total()} total elements")

most common chars: [('t', 18), ('o', 16), ('e', 15), ('s', 12), ('r', 11)]
re-expanded in first-seen order: ttttttttttttttttttoooooooooooooooobbbbeeeeeeeeeeeeeeerrrrrrrrrrrnnnnnnnnnhhhhhhaaaaaaaaaaiiiiiiissssssssssssquuuuuuwwlllmmddfffffgggk
114 total elements
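
Counters also support arithmetic operators. The sketch below uses made-up `stock` and `sold` counters to contrast `subtract()`, which mutates in place and keeps negative counts, with the `-` operator, which returns a new `Counter` and drops anything at zero or below.

```python
from collections import Counter

stock = Counter(apples=3, pears=1)
sold = Counter(apples=1, pears=2)

# subtract() mutates in place and keeps zero/negative counts
remaining = stock.copy()
remaining.subtract(sold)
print(remaining)        # Counter({'apples': 2, 'pears': -1})

# the - operator returns a new Counter and drops counts <= 0
print(stock - sold)     # Counter({'apples': 2})

# + adds counts; & takes the minimum, | the maximum of each count
print(stock + sold)     # Counter({'apples': 4, 'pears': 3})
print(stock & sold)     # Counter({'apples': 1, 'pears': 1})
print(stock | sold)     # Counter({'apples': 3, 'pears': 2})
```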


### deque

Deque stands for double-ended queue.

It works much like a deque in other languages: appending and popping are fast (O(1)) at both ends.

In [152]:
from collections import deque

In [159]:
my_deque = deque([5,6])
print(f"initial state: {my_deque}")

# append
my_deque.append(7)
my_deque.appendleft(4)
print(f"you can append: {my_deque}")

# extend
my_deque.extend([8, 9, 10])
my_deque.extendleft([3])
print(f"you can extend: {my_deque}")

# pop
my_deque.pop()
my_deque.popleft()
print(f"you can pop {my_deque}")

initial state: deque([5, 6])
you can append: deque([4, 5, 6, 7])
you can extend: deque([3, 4, 5, 6, 7, 8, 9, 10])
you can pop deque([4, 5, 6, 7, 8, 9])
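
One detail the single-item `extendleft([3])` call doesn't reveal: `extendleft()` pushes items onto the left one at a time, so a multi-item argument ends up reversed. A small sketch, with illustrative names, that also shows the typical FIFO queue usage:

```python
from collections import deque

d = deque([4, 5])
d.extendleft([3, 2, 1])   # pushed left one by one: 3, then 2, then 1
print(d)                  # deque([1, 2, 3, 4, 5])

# FIFO queue: append on the right, pop from the left
queue = deque()
queue.append("first")
queue.append("second")
served = queue.popleft()
print(served, queue)      # first deque(['second'])
```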


Some deque-specific stuff:

In [None]:
# rotate
print("rotate():")
print(my_deque)
my_deque.rotate()
print(my_deque)
my_deque.rotate()
print(my_deque)

# max length
print("\nsetting maxlen=5")
print("once full, each append discards an item from the opposite end")
capped_deque = deque([1, 2], maxlen=5)
capped_deque.append(3)
print(capped_deque)
capped_deque.append(4)
print(capped_deque)
capped_deque.append(5)
print(capped_deque)
capped_deque.append(6)
print(capped_deque)

# reverse
print("\nreverse()")
abc_deque = deque("ABCDE")
print(abc_deque)
abc_deque.reverse()
print(abc_deque)

rotate():
deque([4, 5, 6, 7, 8, 9])
deque([9, 4, 5, 6, 7, 8])
deque([8, 9, 4, 5, 6, 7])

setting maxlen=5
once full, each append discards an item from the opposite end
deque([1, 2, 3], maxlen=5)
deque([1, 2, 3, 4], maxlen=5)
deque([1, 2, 3, 4, 5], maxlen=5)
deque([2, 3, 4, 5, 6], maxlen=5)

reverse()
deque(['A', 'B', 'C', 'D', 'E'])
deque(['E', 'D', 'C', 'B', 'A'])
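
A couple of variations on the above, as a rough sketch with illustrative names: `rotate()` accepts a step count (negative rotates left), and a bounded deque makes a simple "last N items" buffer.

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])
d.rotate(2)       # two steps to the right
print(d)          # deque([4, 5, 1, 2, 3])
d.rotate(-2)      # two steps to the left, back to the start
print(d)          # deque([1, 2, 3, 4, 5])

# keep only the 3 most recent values
recent = deque(maxlen=3)
for i in range(6):
    recent.append(i)
print(recent)     # deque([3, 4, 5], maxlen=3)

# appendleft on a full deque discards from the right end instead
recent.appendleft(99)
print(recent)     # deque([99, 3, 4], maxlen=3)
```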


### namedtuple

Tuples, but with named fields. The motivation, according to the docs, is to "allow for more readable, self-documenting code."

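So they fit cases where you want a lightweight, immutable record with named fields, but a full class would be overkill.

One of the use cases shown in the docs is processing SQL query results (and CSV rows).
This makes sense: each row comes back as a plain tuple, and wrapping it in a namedtuple lets you refer to columns by name instead of by position.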
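
A minimal sketch of that SQL use case, using an in-memory `sqlite3` database. The `employees` table, its columns, and the `Employee` type are hypothetical and exist only for this example.

```python
import sqlite3
from collections import namedtuple

# hypothetical record type matching the columns selected below
Employee = namedtuple("Employee", ["name", "title"])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ada", "Engineer"), ("Grace", "Admiral")],
)

rows = conn.execute("SELECT name, title FROM employees ORDER BY name").fetchall()
conn.close()

# _make() builds a namedtuple from any iterable, here one row tuple
employees = [Employee._make(row) for row in rows]
for emp in employees:
    print(f"{emp.name}: {emp.title}")
```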

In [170]:
from collections import namedtuple

In [None]:
StoreItem = namedtuple('StoreItem', ['name', 'price'])

# can instantiate with positional or named arguments
apple_item = StoreItem('apple', 1.99)
banana_item = StoreItem(name='banana', price=0.59)

print(apple_item)
print(banana_item)

StoreItem(name='apple', price=1.99)
StoreItem(name='banana', price=0.59)
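
Since a namedtuple is still a tuple, the usual tuple operations work alongside the named fields. A short sketch, redefining `StoreItem` so it runs on its own:

```python
from collections import namedtuple

StoreItem = namedtuple("StoreItem", ["name", "price"])
apple = StoreItem("apple", 1.99)

# access by field name or by index
print(apple.name, apple[1])     # apple 1.99

# unpack like a regular tuple
name, price = apple

# fields are read-only; assignment raises AttributeError
try:
    apple.price = 2.49
except AttributeError:
    print("namedtuples are immutable")

# _replace() returns a modified copy, _asdict() converts to a dict
discounted = apple._replace(price=1.49)
print(discounted)               # StoreItem(name='apple', price=1.49)
print(dict(apple._asdict()))    # {'name': 'apple', 'price': 1.99}
```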
