# Lecture 8: Python's collections

Python’s collections module provides a rich set of specialized container data types, each designed to solve a specific programming problem in a Pythonic and efficient way.

## Class: deque

Python’s deque was the first data structure added to the collections module.

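This sequence-like data type is a generalization of stacks and queues. It is designed to support memory-efficient, fast append and pop operations on both ends of the data structure. On a deque these operations take O(1) time, whereas the equivalent list.insert(0, x) and list.pop(0) take O(n) because every other element has to shift.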
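As a rough sketch of the two roles mentioned above (the variable names are only illustrative), the same deque type can act as a LIFO stack or a FIFO queue, depending on which end you pop from:

```python
from collections import deque

# Stack (LIFO): push and pop on the right end
stack = deque()
stack.append("a")
stack.append("b")
stack.append("c")
top = stack.pop()        # "c", the last item pushed

# Queue (FIFO): add on the right, remove from the left
queue = deque(["a", "b", "c"])
first = queue.popleft()  # "a", the first item added

# Appends work on the left end too
queue.appendleft("z")

print(top, first)        # c a
print(stack)             # deque(['a', 'b'])
print(queue)             # deque(['z', 'b', 'c'])
```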

In [1]:
from collections import deque

ticket_queue = deque()
print(ticket_queue)

print()

# People arrive at the queue
ticket_queue.append("Jane")
ticket_queue.append("John")
ticket_queue.append("Linda")
print(ticket_queue)

print()

# People buy their tickets in arrival order (first in, first out)
print(ticket_queue.popleft())
print(ticket_queue.popleft())
print(ticket_queue.popleft())

print()

# The queue is now empty, so popleft() raises IndexError
print(ticket_queue.popleft())

deque([])

deque(['Jane', 'John', 'Linda'])

Jane
John
Linda



IndexError: pop from an empty deque

In [2]:
from collections import deque

recent_files = deque(["core.py", "README.md", "__init__.py"], maxlen=3)

recent_files.appendleft("database.py")
print(recent_files)

recent_files.appendleft("requirements.txt")
print(recent_files)

deque(['database.py', 'core.py', 'README.md'], maxlen=3)
deque(['requirements.txt', 'database.py', 'core.py'], maxlen=3)
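Because a bounded deque silently discards items from the opposite end once it is full, it is a convenient way to keep only the last n items of a stream. The helper below, tail(), is a hypothetical name used only for illustration:

```python
from collections import deque

def tail(items, n=3):
    """Return the last n items of any iterable (hypothetical helper)."""
    # Once n items are stored, each new item pushes the oldest one out of the left end
    return list(deque(items, maxlen=n))

log = ["boot", "login", "open core.py", "edit", "save", "logout"]
print(tail(log))        # ['edit', 'save', 'logout']

# append() on a full deque discards from the left; appendleft() discards from the right
window = deque([1, 2, 3], maxlen=3)
window.append(4)
print(window)           # deque([2, 3, 4], maxlen=3)
```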


Deques also support sequence operations:

| Method | Description |
| --- | --- |
| .clear() | Remove all the elements from a deque |
| .copy() | Create a shallow copy of a deque |
| .count(x) | Count the number of deque elements equal to x |
| .remove(value) | Remove the first occurrence of value |

Other methods include .extend(), .extendleft(), .index(), .insert(), .reverse(), and .rotate().
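Here is a short sketch that exercises a few of these methods, plus .rotate() and .extendleft(), on an arbitrary sample deque:

```python
from collections import deque

numbers = deque([1, 2, 3, 2, 4])
twos = numbers.count(2)      # 2
numbers.remove(2)            # removes only the first 2
print(twos, numbers)         # 2 deque([1, 3, 2, 4])

backup = numbers.copy()      # shallow copy: an independent deque object

numbers.rotate(1)            # move the last item to the front
print(numbers)               # deque([4, 1, 3, 2])
numbers.rotate(-1)           # rotate back to the left

# extendleft() appends each item to the left in turn, so their order ends up reversed
numbers.extendleft([0, -1])
print(numbers)               # deque([-1, 0, 1, 3, 2, 4])

numbers.clear()
print(numbers, backup)       # deque([]) deque([1, 3, 2, 4])
```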

## Class: defaultdict

A dictionary subclass for constructing default values for missing keys and automatically adding them to the dictionary.

In [4]:
# A common problem you’ll face when you’re working with dictionaries in Python is how to handle missing keys. 
# If you try to access a key that doesn’t exist in a given dictionary, then you get a KeyError:

favorites = {"pet": "dog", "color": "blue", "language": "Python"}
print(favorites["fruit"])

KeyError: 'fruit'

In [None]:
# There are a few approaches to work around this issue. For example, you can use .setdefault()

# Since this key doesn’t exist in favorites, .setdefault() creates it and assigns it the value 'apple'
favorites = {"pet": "dog", "color": "blue", "language": "Python"}
print(favorites.setdefault("fruit", "apple"))
print(favorites)

print()

# If you call .setdefault() with an existing key, then the call won’t affect the dictionary
print(favorites.setdefault("pet", "cat"))
print(favorites)

In [None]:
# You can also use .get() to return a suitable default value if a given key is missing

favorites = {"pet": "dog", "color": "blue", "language": "Python"}
print(favorites.get("fruit", "apple"))
print(favorites)

# However, .get() doesn’t create the new key for you.

In [None]:
# A defaultdict calls its default factory to create a value whenever you access a missing key.
# Any callable that takes no arguments can be the factory. For example, int() returns 0,
# which makes a suitable counter for different objects:
from collections import defaultdict

counter = defaultdict(int)
print(counter)
print(counter["dogs"])
print(counter)

counter["dogs"] += 1
counter["dogs"] += 1
counter["dogs"] += 1
counter["cats"] += 1
counter["cats"] += 1
print(counter)
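A common second use case, sketched here with made-up sample data, is grouping values under a key by using list as the factory:

```python
from collections import defaultdict

pets = [
    ("dog", "Affenpinscher"),
    ("cat", "Abyssinian"),
    ("dog", "Terrier"),
    ("cat", "Persian"),
]

# list() is called for each missing key, so every new key starts with an empty list
by_species = defaultdict(list)
for species, breed in pets:
    by_species[species].append(breed)

print(dict(by_species))
# {'dog': ['Affenpinscher', 'Terrier'], 'cat': ['Abyssinian', 'Persian']}

# Only square-bracket lookups trigger the factory; .get() and "in" don't add keys
print(by_species.get("bird"))   # None
print("bird" in by_species)     # False
print(len(by_species))          # 2
```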

## Function: namedtuple()

A factory function that creates tuple subclasses with named fields, so you can access items by name while keeping the ability to access them by index.

In [None]:
from collections import namedtuple

Car = namedtuple('Car', ['color', 'mileage'])

my_car = Car('red', 3812.4)
print(my_car.color, my_car.mileage)
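Since a named tuple is still a tuple, the usual tuple operations keep working. A small sketch reusing the Car type from the cell above:

```python
from collections import namedtuple

Car = namedtuple("Car", ["color", "mileage"])
my_car = Car("red", 3812.4)

# Fields are reachable by name and by position
print(my_car.color == my_car[0])       # True
print(my_car.mileage == my_car[1])     # True

# It's still a tuple: unpacking, len(), and comparison with plain tuples all work
color, mileage = my_car
print(color, mileage)                  # red 3812.4
print(len(my_car))                     # 2
print(my_car == ("red", 3812.4))       # True
print(isinstance(my_car, tuple))       # True

# Keyword arguments are accepted too
print(Car(mileage=100, color="blue"))  # Car(color='blue', mileage=100)
```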

In [None]:
my_car

In [None]:
# Named tuples are immutable, so this assignment raises AttributeError
my_car.color = 'blue'

##### Attribute Name: _fields

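In [5]:
from collections import namedtuple

# Field names can also be given as a single space-separated string
Car = namedtuple('Car', 'color mileage')
# _fields is a tuple of field names, so you can extend it to build a related type
ElectricCar = namedtuple('ElectricCar', Car._fields + ('charge',))

ElectricCar('red', 1234, 45.0)

ElectricCar(color='red', mileage=1234, charge=45.0)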

##### Method: _asdict()

In [None]:
import json

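print(my_car._asdict())  # a regular dict on Python 3.8+ (an OrderedDict on 3.1 through 3.7)

# The field names become the JSON keys, so you don't have to retype them (and risk typos)
json.dumps(my_car._asdict())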

##### Method: _replace()

In [None]:
# _replace() returns a new Car with the given fields changed; my_car itself is unchanged
my_car._replace(color='blue')

##### Method: _make()

In [None]:
# _make() builds a new instance from any iterable
Car._make(['red', 999])
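Putting the three helper methods together, here is a small sketch with made-up rows. _make() is handy when the values already come in an iterable, such as a row read from a CSV file:

```python
from collections import namedtuple

Car = namedtuple("Car", "color mileage")

# _make() builds instances from iterables, e.g. parsed rows (sample data)
rows = [["red", 999], ["blue", 12.5]]
cars = [Car._make(row) for row in rows]
print(cars)       # [Car(color='red', mileage=999), Car(color='blue', mileage=12.5)]

# _replace() returns a NEW instance; the original is left unchanged
repainted = cars[0]._replace(color="green")
print(repainted)  # Car(color='green', mileage=999)
print(cars[0])    # Car(color='red', mileage=999)

# _asdict() maps field names to values
print(repainted._asdict())  # {'color': 'green', 'mileage': 999} on Python 3.8+
```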

## Class: OrderedDict

A dictionary subclass that keeps the key-value pairs ordered according to when the keys are inserted.

Sometimes you need your dictionaries to remember the order in which key-value pairs are inserted.

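Python’s regular dictionaries were unordered data structures for years. Since Python 3.7, the language guarantees that a regular dict preserves insertion order, but OrderedDict still offers extras such as .move_to_end() and order-sensitive equality checks.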

In [None]:
# In this example, you create an empty ordered dictionary by instantiating 'OrderedDict' without arguments.
# Next, you add key-value pairs to the dictionary as you would with a regular dictionary.
from collections import OrderedDict

life_stages = OrderedDict()

life_stages["childhood"] = "0-9"
life_stages["adolescence"] = "9-18"
life_stages["adulthood"] = "18-65"
life_stages["old"] = "+65"

for stage, years in life_stages.items():
    print(f"{stage}: {years}")

In [None]:
from collections import OrderedDict

letters = OrderedDict(b=2, d=4, a=1, c=3)
print(letters)

# Move b to the right end
letters.move_to_end("b")
print(letters)

# Move b to the left end
letters.move_to_end("b", last=False)
print(letters)

# Sort letters by key
for key in sorted(letters):
    letters.move_to_end(key)

print(letters)

In [None]:
# Another important difference between OrderedDict and a regular dictionary is how they compare for equality:
from collections import OrderedDict

# Regular dictionaries compare the content only
letters_0 = dict(a=1, b=2, c=3, d=4)
letters_1 = dict(b=2, a=1, d=4, c=3)
print(letters_0 == letters_1)

# Ordered dictionaries compare content and order
letters_0 = OrderedDict(a=1, b=2, c=3, d=4)
letters_1 = OrderedDict(b=2, a=1, d=4, c=3)
print(letters_0 == letters_1)

letters_2 = OrderedDict(a=1, b=2, c=3, d=4)
print(letters_0 == letters_2)
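.move_to_end() combined with .popitem(last=False) is enough to build a least-recently-used (LRU) cache. The LRUCache class below is a hypothetical, minimal sketch for illustration, not the standard library's functools.lru_cache:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used cache (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        # Mark the key as most recently used by moving it to the right end
        self.data.move_to_end(key)
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            # popitem(last=False) removes the leftmost, least recently used item
            self.data.popitem(last=False)

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")            # "a" is now the most recently used key
cache.put("c", 3)         # exceeds capacity, so "b" is evicted
print(list(cache.data))   # ['a', 'c']
```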

## Class: Counter

A dictionary subclass for counting hashable objects in an iterable: the objects are stored as keys and their counts as values.

Counting objects is a common operation in programming. Say you need to count how many times a given item appears in a list or iterable.

If your list is short, then counting its items can be straightforward and quick.

If you have a long list, then counting the items will be more challenging.

In [None]:
word = "mississippi"
counter = {}

for letter in word:
    if letter not in counter:
        counter[letter] = 0
    counter[letter] += 1

print(counter)

In [None]:
from collections import defaultdict

counter = defaultdict(int)

for letter in "mississippi":
    counter[letter] += 1

print(counter)

In [None]:
from collections import Counter

# A single line of code and you’re done.
print(Counter("mississippi"))

In [None]:
# Counter stores the objects as dictionary keys, so they need to be hashable

from collections import Counter

my_list = [1, 1, 2, 3, 3, 3, 4]
print(Counter(my_list))

# Lists aren't hashable, so this raises TypeError: unhashable type: 'list'
my_tuple = ([1], [1])
print(Counter(my_tuple))

In [None]:
from collections import Counter

letters = Counter("mississippi")
print(letters)

# Update the counts of m and i
letters.update(m=3, i=4)
print(letters)

# Add a new key-count pair
letters.update({"a": 2})
print(letters)

# Update with another counter
letters.update(Counter(["s", "s", "p"]))
print(letters)

In [None]:
# Another difference between Counter and dict is that accessing a missing key returns 0 instead of raising a KeyError:

from collections import Counter

letters = Counter("mississippi")
print(letters["a"])

In [None]:
from collections import Counter

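# A Counter works as a multiset: keys are the elements, values are their multiplicities.
# Pass a list, not a set literal, because a set would already have dropped the duplicates.
multiset = Counter([1, 1, 2, 3, 3, 3, 4, 4])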
print(multiset)

print(multiset.keys() == {1, 2, 3, 4})
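Two more Counter methods help when you treat a counter as a multiset: .most_common() ranks elements by count, and .elements() expands them back out. A short sketch:

```python
from collections import Counter

letters = Counter("mississippi")

# most_common(n) returns the n most frequent (element, count) pairs;
# ties keep the order in which the elements were first encountered
print(letters.most_common(2))      # [('i', 4), ('s', 4)]

# elements() repeats each element as many times as its count
print(sorted(letters.elements()))
# ['i', 'i', 'i', 'i', 'm', 'p', 'p', 's', 's', 's', 's']

# The counts add up to the length of the original string
print(sum(letters.values()))       # 11
```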

In [None]:
from collections import Counter

inventory = Counter(dogs=23, cats=14, pythons=7)

adopted = Counter(dogs=2, cats=5, pythons=1)
inventory.subtract(adopted)
print(inventory)

new_pets = {"dogs": 4, "cats": 1}
inventory.update(new_pets)
print(inventory)

inventory -= Counter(dogs=2, cats=3, pythons=1)
print(inventory)

new_pets = {"dogs": 4, "pythons": 2}
inventory += new_pets
print(inventory)
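One subtle difference in the example above is that .subtract() keeps zero and negative counts, while the - operator (and -=) keeps only positive counts. A sketch with made-up numbers:

```python
from collections import Counter

stock = Counter(apples=3, pears=1)
order = Counter(apples=1, pears=2)

# subtract() works in place and keeps zero and negative counts
remaining = stock.copy()
remaining.subtract(order)
print(remaining)        # Counter({'apples': 2, 'pears': -1})

# The binary - operator returns a new Counter without non-positive counts
print(stock - order)    # Counter({'apples': 2})

# Unary + also strips non-positive counts
print(+remaining)       # Counter({'apples': 2})
```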

## Class: ChainMap

A dictionary-like class that allows treating a number of mappings as a single dictionary object.

In [None]:
# ChainMap searches its mappings in order and returns the first match, so you can give
# a command-line proxy priority over the local and global proxy configuration.
from collections import ChainMap

cmd_proxy = {}  # The user doesn't provide a proxy
local_proxy = {"proxy": "proxy.local.com"}
global_proxy = {"proxy": "proxy.global.com"}

config = ChainMap(cmd_proxy, local_proxy, global_proxy)
print(config["proxy"])
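The same lookup rule applies to any layered settings. In this sketch (the setting names are made up), note that ChainMap keeps references to the original dictionaries instead of copying them, so later changes to those dictionaries show up in the chain:

```python
from collections import ChainMap

defaults = {"theme": "light", "language": "en"}
user_settings = {"theme": "dark"}

# Lookups search the mappings left to right and return the first match
settings = ChainMap(user_settings, defaults)
print(settings["theme"])      # dark (found in user_settings)
print(settings["language"])   # en (falls through to defaults)

# The underlying dicts are referenced, not copied
defaults["font"] = "mono"
print(settings["font"])       # mono
print(len(settings), sorted(settings))  # 3 ['font', 'language', 'theme']
```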

In [6]:
# The public .maps attribute holds the internal list of mappings
from collections import ChainMap

numbers = {"one": 1, "two": 2}
letters = {"a": "A", "b": "B"}

alpha_nums = ChainMap(numbers, letters)
print(alpha_nums)
# print(list(alpha_nums))

print(alpha_nums.maps)

ChainMap({'one': 1, 'two': 2}, {'a': 'A', 'b': 'B'})
[{'one': 1, 'two': 2}, {'a': 'A', 'b': 'B'}]


In [7]:
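# Additionally, ChainMap provides a .new_child() method, which returns a new ChainMap with an extra
# mapping in front, and a .parents property, which returns a ChainMap of every mapping except the first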

from collections import ChainMap

dad = {"name": "John", "age": 35}
mom = {"name": "Jane", "age": 31}
family = ChainMap(mom, dad)
print(family)

son = {"name": "Mike", "age": 0}
family = family.new_child(son)

for person in family.maps:
    print(person)

print()

print(family.parents)

ChainMap({'name': 'Jane', 'age': 31}, {'name': 'John', 'age': 35})
{'name': 'Mike', 'age': 0}
{'name': 'Jane', 'age': 31}
{'name': 'John', 'age': 35}

ChainMap({'name': 'Jane', 'age': 31}, {'name': 'John', 'age': 35})
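.new_child() and .parents map naturally onto nested scopes, such as variable lookup in an interpreter. The sketch below simulates a global scope and one local scope; the variable names are arbitrary:

```python
from collections import ChainMap

# Each new_child() call pushes a fresh innermost scope
global_scope = ChainMap({"x": 1, "y": 2})
local_scope = global_scope.new_child({"x": 10})

print(local_scope["x"], local_scope["y"])  # 10 2 (local x shadows global x)

# Writes go to the innermost mapping only
local_scope["z"] = 3
print(local_scope.maps[0])      # {'x': 10, 'z': 3}
print("z" in global_scope)      # False

# .parents drops the innermost scope, like leaving a block
print(local_scope.parents["x"]) # 1
```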


In [8]:
# A final feature to highlight in ChainMap is that mutating operations, such as:
# - updating keys
# - adding new keys
# - deleting existing keys
# - popping keys
# - clearing the dictionary
# act on the first mapping in the internal list of mappings:

from collections import ChainMap

numbers = {"one": 1, "two": 2}
letters = {"a": "A", "b": "B"}

alpha_nums = ChainMap(numbers, letters)
print(alpha_nums)

# Add a new key-value pair
alpha_nums["c"] = "C"
print(alpha_nums)

# Pop a key that exists in the first dictionary
print(alpha_nums.pop("two"))
print(alpha_nums)

# Deleting a key that exists only in a later mapping raises KeyError,
# because deletions act on the first mapping only:
# del alpha_nums["a"]

# Clear the dictionary
alpha_nums.clear()
print(alpha_nums)

ChainMap({'one': 1, 'two': 2}, {'a': 'A', 'b': 'B'})
ChainMap({'one': 1, 'two': 2, 'c': 'C'}, {'a': 'A', 'b': 'B'})
2
ChainMap({'one': 1, 'c': 'C'}, {'a': 'A', 'b': 'B'})
ChainMap({}, {'a': 'A', 'b': 'B'})


## collections: three base classes

Besides these specialized data types, 'collections' also provides three base classes that facilitate the creation of custom dictionaries, lists, and strings.

> Class: UserDict

A wrapper class around a dictionary object that facilitates subclassing dict.

> Class: UserList

A wrapper class around a list object that facilitates subclassing list.

> Class: UserString

A wrapper class around a string object that facilitates subclassing str.

Built-in types were designed and implemented with the open-closed principle in mind. This means that they’re open for extension but closed for modification.

Allowing modifications to the core features of these classes could break their invariants, so Python's core developers decided to protect them from modification.

In [9]:
# create a dictionary that automatically lowercases the keys when you insert them.
class LowerDict(dict):
    def __setitem__(self, key, value):
        key = key.lower()
        super().__setitem__(key, value)

ordinals = LowerDict({"FIRST": 1, "SECOND": 2})
ordinals["THIRD"] = 3
ordinals.update({"FOURTH": 4})

print(ordinals)
print(isinstance(ordinals, dict))

# This dictionary works correctly when you insert new keys using dictionary-style assignment with square brackets ([]).
# However, it doesn’t work when you pass an initial dictionary to the class constructor or when you use .update(),
# because the built-in dict.__init__() and dict.update() don't call your overridden .__setitem__().

# This means that you would need to override .__init__(), .update(), and probably some other methods
# for your custom dictionary to work correctly.

{'FIRST': 1, 'SECOND': 2, 'third': 3, 'FOURTH': 4}
True
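To see how much work the dict route takes, here is a partial sketch that overrides .__init__() and .update() so that both route through .__setitem__(). It is not a complete solution: methods it doesn't override, such as .setdefault(), still bypass the lowercasing.

```python
class LowerDict(dict):
    """Partial sketch: a dict subclass that lowercases keys on the paths it overrides."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        # Route the initial data through our own update()
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def update(self, *args, **kwargs):
        # dict.update() wouldn't call __setitem__, so insert each pair by hand
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

ordinals = LowerDict({"FIRST": 1, "SECOND": 2})
ordinals["THIRD"] = 3
ordinals.update({"FOURTH": 4})
print(ordinals)             # {'first': 1, 'second': 2, 'third': 3, 'fourth': 4}

# Methods that weren't overridden still bypass __setitem__
ordinals.setdefault("FIFTH", 5)
print("FIFTH" in ordinals)  # True
```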


In [10]:
from collections import UserDict

class LowerDict(UserDict):
    def __setitem__(self, key, value):
        key = key.lower()
        super().__setitem__(key, value)

ordinals = LowerDict({"FIRST": 1, "SECOND": 2})
ordinals["THIRD"] = 3
ordinals.update({"FOURTH": 4})

print(ordinals)
print(isinstance(ordinals, dict))

# Your custom dictionary now converts all the new keys into lowercase letters before
# inserting them, including the keys passed to the constructor and to .update().

# Note that since LowerDict doesn't inherit from dict directly,
# isinstance(ordinals, dict) returns False, unlike in the example above.

{'first': 1, 'second': 2, 'third': 3, 'fourth': 4}
False
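The same wrapper idea carries over to UserList and UserString. One practical difference is that the wrappers return instances of your subclass from operations like slicing or string methods, whereas direct subclasses of list or str get plain list or str results back. A sketch with hypothetical class names:

```python
from collections import UserList, UserString

class Stack(UserList):
    """Hypothetical list type with a push() method."""
    def push(self, item):
        self.data.append(item)   # the wrapped list lives in .data

class PlainStack(list):
    """The same idea built directly on list, for comparison."""

class Shout(UserString):
    """Hypothetical string type whose str() is always uppercase."""
    def __str__(self):
        return self.data.upper()

stack = Stack([1, 2, 3])
stack.push(4)
print(stack.data)                                 # [1, 2, 3, 4]

# Slicing a UserList subclass returns the subclass; slicing a list subclass returns a plain list
print(type(stack[:2]).__name__)                   # Stack
print(type(PlainStack([1, 2, 3])[:2]).__name__)   # list

# UserString methods also wrap their results in the subclass
greeting = Shout("hello")
print(greeting)                                   # HELLO
print(type(greeting.replace("h", "j")).__name__)  # Shout
```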


## Conclusion

The need for these wrapper classes was partially eclipsed by the ability to subclass the corresponding built-in data types directly.

However, as the LowerDict example shows, subclassing UserDict, UserList, or UserString is sometimes safer and less error-prone than subclassing dict, list, or str.

### References
1. [Python's collections: A Buffet of Specialized Data Types](https://realpython.com/python-collections-module/)