# Collections

## Prerequisite: What are containers in Python?
* An object that holds zero or more other objects
* Examples: `tuple`, `list`, `set`, `dict`
* Abstract base class is `collections.abc.Container` (`collections.Container` in Py2)
    * Supports the `__contains__` method which allows for the use of the `in` keyword (`x in y`)
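
As a minimal sketch of the point above: any class that defines `__contains__` works with `in` and is recognized as a `Container`. The `EvenNumbers` class is a hypothetical example made up for illustration, not something from the standard library.

```python
from collections.abc import Container

class EvenNumbers:
    """Hypothetical container: 'holds' every even integer without storing any."""
    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0

evens = EvenNumbers()
print(4 in evens)                    # the `in` keyword calls __contains__
print(3 in evens)
print(isinstance(evens, Container))  # recognized via the ABC's subclass hook

# The built-in containers pass the same check
print(all(isinstance(c, Container) for c in ((), [], set(), {})))
```

Note that `EvenNumbers` never inherits from `Container`; defining `__contains__` is enough for the `isinstance` check to succeed.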

## What are collections in Python?
* Specialized container types from the `collections` module, offered as alternatives to the general-purpose built-ins
* Examples: `namedtuple`, `Counter`, `OrderedDict`
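
A short sketch of what "specialized" can mean in practice, using two of the examples listed above. The sample data (`words`) is made up for illustration.

```python
from collections import Counter, OrderedDict

words = ["a", "b", "a", "c", "a", "b"]  # illustrative data

# Counter: a dict subclass specialized for tallying hashable items
counts = Counter(words)
print(counts["a"])            # count for one key
print(counts.most_common(1))  # most frequent item with its count

# OrderedDict: a dict subclass with extra order-manipulation methods
od = OrderedDict(a=1, b=2)
od.move_to_end("a")           # push key "a" to the end
print(list(od))
```

Both types subclass `dict`, so they can be passed anywhere a plain dictionary is expected.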



## `namedtuple`
* Factory function that generates a `tuple` subclass whose fields can be accessed by name as well as by index and iteration
* Why use it?
    * Lightweight: instances have no per-instance `__dict__`
    * **Requires no more memory than a tuple containing identical objects**

In [62]:
from collections import namedtuple
import sys

Point2D = namedtuple("Point2D", ["x", "y"]) # or namedtuple("Point2D", "x y") or namedtuple("Point2D", "x, y")
Point3D = namedtuple("Point3D", Point2D._fields + ("z",)) # extend the 2D field names with "z"
p1 = Point3D(8, y=2, z=5)

print(f"p1 = {p1}")
print(f"p1.y = {p1.y}")
print(f"namedtuple generated class: has dictionary attribute? {hasattr(p1, '__dict__')}")
print(f"namedtuple generated class: sizeof? {sys.getsizeof(p1)}")
print(f"tuple: sizeof? {sys.getsizeof((8, 2, 5))}")

p1 = Point3D(x=8, y=2, z=5)
p1.y = 2
namedtuple generated class: has dictionary attribute? False
namedtuple generated class: sizeof? 80
tuple: sizeof? 80
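
Beyond attribute access, a generated class keeps the usual tuple behavior. The sketch below redefines `Point3D` directly so it runs on its own; `_replace` and `_asdict` are part of the documented `namedtuple` API.

```python
from collections import namedtuple

Point3D = namedtuple("Point3D", ["x", "y", "z"])
p = Point3D(8, 2, 5)

x, y, z = p              # iterable: supports unpacking
first = p[0]             # indexable like any tuple
moved = p._replace(z=0)  # returns a new instance; p itself is unchanged
as_dict = p._asdict()    # field name -> value mapping

print(first, moved, as_dict)

# Instances are immutable, so assigning to a field raises AttributeError
try:
    p.x = 1
except AttributeError as exc:
    print(f"cannot assign: {exc}")
```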


In [111]:
import sys

class Point3D(tuple):
    # tuple.__new__ has already stored the values; __init__ only adds named attributes
    def __init__(self, it):
        if len(it) != 3:
            raise ValueError(f"3 values expected in iterable, {len(it)} received")
        self.x = it[0]
        self.y = it[1]
        self.z = it[2]

p2 = Point3D((4, 5, 6))
print(f"Manual class definition: has dictionary attribute? {hasattr(p2, '__dict__')}")
# Note: sys.getsizeof() does not count the separate per-instance __dict__, so the real footprint is larger
print(f"Manual class definition: sizeof? {sys.getsizeof(p2)}")

Manual class definition: has dictionary attribute? True
Manual class definition: sizeof? 88
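
For comparison, a hand-written tuple subclass can avoid the per-instance dictionary by declaring empty `__slots__` and exposing fields as read-only properties. This is a minimal sketch of that idea, not the code `namedtuple` actually generates; the name `SlimPoint3D` is hypothetical.

```python
import sys
from operator import itemgetter

class SlimPoint3D(tuple):
    __slots__ = ()  # no per-instance __dict__

    def __new__(cls, x, y, z):
        return super().__new__(cls, (x, y, z))

    # Read-only named access backed by the tuple's own storage
    x = property(itemgetter(0))
    y = property(itemgetter(1))
    z = property(itemgetter(2))

p3 = SlimPoint3D(4, 5, 6)
print(f"Slotted class: has dictionary attribute? {hasattr(p3, '__dict__')}")
print(f"Slotted class: same size as plain tuple? {sys.getsizeof(p3) == sys.getsizeof((4, 5, 6))}")
```

The trade-off is boilerplate: every field needs its own property, which is the repetition `namedtuple` removes.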


In [32]:
from collections import namedtuple
import csv

# import urllib.request
# url = "https://gist.githubusercontent.com/GoodmanSciences/c2dd862cd38f21b0ad36b8f96b4bf1ee/raw/1d92663004489a5b6926e944c1b3d9ec5c40900e/Periodic%2520Table%2520of%2520Elements.csv"
# with urllib.request.urlopen(url) as response:
#    data = response.read()

Element = namedtuple("Element", "AtomicNumber,Element,Symbol,AtomicMass,NumberofNeutrons,NumberofProtons,NumberofElectrons,Period,Group,Phase")
with open("./Periodic Table of Elements.csv", newline="") as f:  # newline="" is recommended by the csv module docs
    for el in map(Element._make, csv.reader(f)):
        print(el)


Element(AtomicNumber='1', Element='Hydrogen', Symbol='H', AtomicMass='1.007', NumberofNeutrons='0', NumberofProtons='1', NumberofElectrons='1', Period='1', Group='1', Phase='gas')
Element(AtomicNumber='2', Element='Helium', Symbol='He', AtomicMass='4.002', NumberofNeutrons='2', NumberofProtons='2', NumberofElectrons='2', Period='1', Group='18', Phase='gas')
Element(AtomicNumber='3', Element='Lithium', Symbol='Li', AtomicMass='6.941', NumberofNeutrons='4', NumberofProtons='3', NumberofElectrons='3', Period='2', Group='1', Phase='solid')
Element(AtomicNumber='4', Element='Beryllium', Symbol='Be', AtomicMass='9.012', NumberofNeutrons='5', NumberofProtons='4', NumberofElectrons='4', Period='2', Group='2', Phase='solid')
Element(AtomicNumber='5', Element='Boron', Symbol='B', AtomicMass='10.811', NumberofNeutrons='6', NumberofProtons='5', NumberofElectrons='5', Period='2', Group='13', Phase='solid')
Element(AtomicNumber='6', Element='Carbon', Symbol='C', AtomicMass='12.011', NumberofNeutrons

## `deque`
* "Generalization of stacks and queues (the name is pronounced 'deck' and is short for 'double-ended queue')"
* Implemented in C in CPython to keep performance overhead low
* Why use it?
    * **Thread-safe** appends and pops from either end
    * Memory-efficient appends and pops at both ends (**approximately O(1) in either direction**, vs. O(n) for `list.insert(0, x)` and `list.pop(0)`)
    * Additional features: `rotate`, `maxlen`, `extendleft`

In [110]:
from collections import deque
import timeit

def op_to_time1(n=100):
    x = deque()
    for _ in range(n):
        x.appendleft(0)
    return x

def op_to_time2(n=100):
    x = []
    for _ in range(n):
        x.append(0)
    return list(reversed(x))

def op_to_time3(n=100):
    x = []
    for _ in range(n):
        x.insert(0, 0)
    return x
        
%timeit op_to_time1()
%timeit op_to_time2()
%timeit op_to_time3()

5.33 µs ± 88.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.96 µs ± 77.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
18.1 µs ± 976 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
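
With only 100 items, the gap between O(1) and O(n) left-side inserts stays small. The sketch below uses the standard `timeit` module (so it also runs outside IPython) with larger sizes. The function names and the sizes are arbitrary choices for illustration, and absolute timings depend on the machine, so no output is shown.

```python
from collections import deque
import timeit

def fill_deque(n):
    d = deque()
    for i in range(n):
        d.appendleft(i)   # O(1) per insert
    return d

def fill_list(n):
    lst = []
    for i in range(n):
        lst.insert(0, i)  # O(n) per insert: every existing item shifts right
    return lst

for n in (1_000, 10_000):
    t_deque = timeit.timeit(lambda: fill_deque(n), number=5)
    t_list = timeit.timeit(lambda: fill_list(n), number=5)
    print(f"n={n}: deque {t_deque:.4f}s, list {t_list:.4f}s")
```

The list timings should grow faster than linearly as `n` increases, while the deque timings should grow roughly in proportion to `n`.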


In [114]:
# Once a bounded deque (maxlen set) is full, adding items discards the same number of items from the opposite end
x = deque([3, 4, 5], maxlen=10)
# extendleft() appends items to the left one at a time, which reverses their order; reversed() compensates
x.extendleft(reversed([0, 1, 2]))
# x.rotate(3)
print(x)

deque([0, 1, 2, 3, 4, 5], maxlen=10)
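
The cell above never fills the deque, so the `maxlen` discard behavior is not visible in it. A small sketch of that behavior and of `rotate`, with made-up values:

```python
from collections import deque

# A bounded deque keeps only the most recent maxlen items
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)  # once full, each append drops the leftmost item
print(recent)         # deque([2, 3, 4], maxlen=3)

# rotate(n) moves n items from the right end to the left end
d = deque([0, 1, 2, 3, 4, 5])
d.rotate(2)
print(d)              # deque([4, 5, 0, 1, 2, 3])
d.rotate(-2)          # a negative n rotates back the other way
print(d)              # deque([0, 1, 2, 3, 4, 5])
```

A bounded deque is a convenient way to keep a rolling window, for example the last few log lines or recent measurements.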


## `ChainMap`

## `Counter`

## `OrderedDict`

## `defaultdict`

## `UserDict`

## `UserList`

## `UserString`