# Notebook 10: Advanced Collections

**Series:** 25 Days of Data Tech: *From Python to Production*  
**Focus Today:** The `collections` toolbox for real-world data engineering patterns.

---

### What you'll learn
- `Counter` deep dive: frequency, set-like ops, arithmetic
- `defaultdict` patterns: grouping, nested structures
- `deque` in practice: fast queues, rotations, sliding windows
- `namedtuple` (and a note on `dataclasses`)
- `OrderedDict`: when it still matters
- `ChainMap` for layered configs
- `UserDict`/`UserList`/`UserString`: safe subclassing
- Practical recipes & performance notes


> **Why this matters**  
> These types are optimized for **clarity and performance**. They shrink boilerplate (grouping, counting), power streaming pipelines, and make intent obvious.


## 1. `Counter`: Frequency Maps with Superpowers


In [12]:
from collections import Counter

words = "to be or not to be that is the question to be".split()
c = Counter(words)
top3 = c.most_common(3)
c["to"], top3, len(c), list(c.elements())[:8]


(3,
 [('to', 3), ('be', 3), ('or', 1)],
 8,
 ['to', 'to', 'to', 'be', 'be', 'be', 'or', 'not'])

In [13]:
# Arithmetic & set-like operations on Counters
a = Counter("abracadabra")
b = Counter("alakazam")

add = a + b          # add counts; zero and negative results are dropped
sub = a - b          # subtract counts; results floor at 0 (no negatives)
inter = a & b        # min of each count (intersection)
union = a | b        # max of each count (union)

add, sub, inter, union


(Counter({'a': 9,
          'b': 2,
          'r': 2,
          'c': 1,
          'd': 1,
          'l': 1,
          'k': 1,
          'z': 1,
          'm': 1}),
 Counter({'b': 2, 'r': 2, 'a': 1, 'c': 1, 'd': 1}),
 Counter({'a': 4}),
 Counter({'a': 5,
          'b': 2,
          'r': 2,
          'c': 1,
          'd': 1,
          'l': 1,
          'k': 1,
          'z': 1,
          'm': 1}))
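
The operators above always discard zero and negative counts. As a complement, here is a minimal sketch of the in-place `subtract()` method, which keeps them, together with the unary `+` and `-` operators that filter a counter by sign. The stock and order figures are invented for illustration.

```python
from collections import Counter

# Hypothetical stock levels and orders, made up for this sketch.
stock = Counter(apples=3, pears=1)
orders = Counter(apples=1, pears=4)

remaining = stock.copy()
remaining.subtract(orders)   # in place; negative counts are kept

shortfall = -remaining       # unary minus: flips negatives, drops the rest
in_stock = +remaining        # unary plus: drops zero and negative counts

print(remaining["apples"], remaining["pears"])  # 2 -3
print(shortfall)                                # Counter({'pears': 3})
print(in_stock)                                 # Counter({'apples': 2})
```

Use `subtract()` when a deficit is meaningful (inventory, balances) and the `-` operator when only the surplus matters.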

## 2. `defaultdict`: Smart Defaults for Grouping


In [14]:
from collections import defaultdict

pairs = [("alice", 91), ("bob", 78), ("alice", 95), ("carol", 88)]
groups = defaultdict(list)
for name, score in pairs:
    groups[name].append(score)
dict(groups)


{'alice': [91, 95], 'bob': [78], 'carol': [88]}

In [15]:
# Nested defaultdicts for trees / counters
from collections import defaultdict

tree = lambda: defaultdict(tree)
root = tree()
root["us"]["oh"]["toledo"]["count"] = 3
# Convert to normal dict for pretty view
import json
json.loads(json.dumps(root))


{'us': {'oh': {'toledo': {'count': 3}}}}
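
One side effect of the patterns above is worth seeing in isolation: indexing a missing key on a `defaultdict` inserts it. The sketch below (key names are arbitrary) shows the insertion, two ways to probe without inserting, and how clearing `default_factory` turns the mapping back into a strict one.

```python
from collections import defaultdict

counts = defaultdict(int)
counts["seen"] += 1

# Indexing a missing key calls the factory and stores the result.
_ = counts["typo"]
print(dict(counts))          # {'seen': 1, 'typo': 0}

# Membership tests and .get() do not insert anything.
probe = defaultdict(int, seen=1)
has_other = "other" in probe
value = probe.get("other", 0)
print(has_other, value, dict(probe))   # False 0 {'seen': 1}

# Setting default_factory to None restores KeyError for missing keys.
probe.default_factory = None
try:
    probe["other"]
    raised = False
except KeyError:
    raised = True
print(raised)                # True
```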

## 3. `deque`: Fast Queues, Rotations, Windows


In [16]:
from collections import deque

dq = deque(maxlen=5)  # bounded queue
for x in range(7):
    dq.append(x)
left = dq[0]
rot = dq.copy(); rot.rotate(2)
(dq, left, rot)


(deque([2, 3, 4, 5, 6], maxlen=5), 2, deque([5, 6, 2, 3, 4], maxlen=5))
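
A bounded deque also works as a fixed-size history buffer. The following is a minimal sketch of a rolling mean built on `deque(maxlen=w)`; the function name `rolling_mean` and the sample values are illustrative, not part of the notebook.

```python
from collections import deque

def rolling_mean(values, w):
    """Yield the mean of the last w values once the window is full."""
    window = deque(maxlen=w)
    total = 0.0
    for v in values:
        if len(window) == w:
            total -= window[0]   # this value is evicted by the append below
        window.append(v)
        total += v
        if len(window) == w:
            yield total / w

means = list(rolling_mean([1, 2, 3, 4, 5], 3))
print(means)  # [2.0, 3.0, 4.0]
```

Keeping a running total avoids re-summing the window on every step, so each update costs O(1).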

In [17]:
# Sliding window max using deque (monotonic queue)
from collections import deque

def window_max(nums, w):
    q = deque()  # stores indices, values decreasing
    out = []
    for i, x in enumerate(nums):
        while q and nums[q[-1]] <= x:
            q.pop()
        q.append(i)
        if q[0] <= i - w:
            q.popleft()
        if i >= w - 1:
            out.append(nums[q[0]])
    return out

window_max([2,1,3,2,5,2,6,2], 3)


[3, 3, 5, 5, 6, 6]
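
A brute-force reference is a convenient way to sanity-check the monotonic-queue version. This sketch (the name `window_max_naive` is hypothetical) recomputes `max()` for every window, which costs O(n·w) instead of O(n) but is easy to verify by eye.

```python
def window_max_naive(nums, w):
    """Reference implementation: take max() of every length-w slice."""
    return [max(nums[i:i + w]) for i in range(len(nums) - w + 1)]

result = window_max_naive([2, 1, 3, 2, 5, 2, 6, 2], 3)
print(result)  # [3, 3, 5, 5, 6, 6]
```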

## 4. `namedtuple`: Lightweight Immutable Records


In [18]:
from collections import namedtuple

Point = namedtuple("Point", "x y")
p = Point(3, 4)
p.x, p.y, p._asdict(), p._replace(y=5)


(3, 4, {'x': 3, 'y': 4}, Point(x=3, y=5))

> **Note:** For richer models with defaults, validation, or methods, prefer `@dataclass`. Use `namedtuple` when you need a tiny immutable record with minimal overhead.
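
To make the trade-off in the note concrete, here is a small side-by-side sketch. `Point3D` and its `norm1` method are hypothetical names chosen for illustration; `defaults=` on `namedtuple` and `frozen=True` on `@dataclass` are standard-library features.

```python
from collections import namedtuple
from dataclasses import dataclass, FrozenInstanceError

# namedtuple: defaults apply to the rightmost fields.
Point = namedtuple("Point", "x y", defaults=[0])

@dataclass(frozen=True)
class Point3D:               # hypothetical record with a default and a method
    x: float
    y: float
    z: float = 0.0

    def norm1(self):
        return abs(self.x) + abs(self.y) + abs(self.z)

p = Point(3)                 # y falls back to 0
q = Point3D(1.0, -2.0)

print(p)                     # Point(x=3, y=0)
print(p == (3, 0))           # True: a namedtuple is still a tuple
print(q.norm1())             # 3.0

try:
    q.x = 5.0
    frozen = False
except FrozenInstanceError:
    frozen = True
print(frozen)                # True
```

The namedtuple unpacks and compares like a plain tuple, while the frozen dataclass stays a distinct type and carries behavior alongside the data.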


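## 5. `OrderedDict`: Still Useful in Python 3.7+
Built-in `dict` has guaranteed insertion order since Python 3.7, but `OrderedDict` still adds:
- `move_to_end(key, last=True)` to move an existing key to either end
- `popitem(last=True)` to pop from either end: LIFO by default, FIFO with `last=False` (plain `dict.popitem()` is LIFO only)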


In [19]:
from collections import OrderedDict

od = OrderedDict()
od["a"]=1; od["b"]=2; od["c"]=3
od.move_to_end("b", last=False)  # move 'b' to front
front = list(od.items())
popped_fifo = OrderedDict(od).popitem(last=False)
front, popped_fifo


([('b', 2), ('a', 1), ('c', 3)], ('b', 2))
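
One further difference from plain `dict` is equality: two `OrderedDict` objects compare equal only if their items are in the same order. A short sketch with arbitrary keys:

```python
from collections import OrderedDict

a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])

same_ordered = (a == b)              # order-sensitive between OrderedDicts
same_plain = (dict(a) == dict(b))    # plain dicts ignore order
same_mixed = (a == dict(b))          # OrderedDict vs dict ignores order too

print(same_ordered, same_plain, same_mixed)  # False True True
```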

## 6. `ChainMap`: Layered or Fallback Dicts
Great for configuration overlays (env → user → defaults).


In [20]:
from collections import ChainMap

defaults = {"region":"us-east-1", "threads":4}
user = {"threads":8}
env = {"region":"eu-west-1"}

cfg = ChainMap(env, user, defaults)  # lookup left-to-right
cfg["region"], cfg["threads"], list(cfg.maps)


('eu-west-1',
 8,
 [{'region': 'eu-west-1'},
  {'threads': 8},
  {'region': 'us-east-1', 'threads': 4}])
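
Lookups search every layer, but writes behave differently: they always go to the first map. The sketch below reuses the same style of config dicts to show that, plus `new_child()` for temporary overrides and the fact that a `ChainMap` is a live view of its layers. The `timeout` key is made up for the example.

```python
from collections import ChainMap

defaults = {"region": "us-east-1", "threads": 4}
user = {"threads": 8}

cfg = ChainMap(user, defaults)
cfg["region"] = "eu-west-1"          # stored in `user`, the first map
print(user)                          # {'threads': 8, 'region': 'eu-west-1'}
print(defaults["region"])            # us-east-1 (unchanged)

# new_child() pushes a fresh layer in front; parents skips it again.
scoped = cfg.new_child({"threads": 1})
print(scoped["threads"], cfg["threads"])   # 1 8
print(scoped.parents["threads"])           # 8

# The chain references the original dicts, so later changes show through.
defaults["timeout"] = 30             # hypothetical extra setting
print(cfg["timeout"])                # 30
```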

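## 7. `UserDict` / `UserList` / `UserString`: Safer Subclassing

Prefer these wrappers over subclassing built-in `dict`/`list`/`str` directly. In CPython, the built-in `dict` constructor and `update()` bypass an overridden `__setitem__`, whereas `UserDict` routes both through it. All three wrappers expose the underlying container as `.data`.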


In [21]:
from collections import UserDict

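class CaseInsensitiveDict(UserDict):
    # Keys are assumed to be strings and are stored lowercased.
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)
    def __getitem__(self, key):
        return super().__getitem__(key.lower())
    def __contains__(self, key):
        # Without this, "Content-Type" in cid would be False.
        return super().__contains__(key.lower())
    def __delitem__(self, key):
        super().__delitem__(key.lower())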

cid = CaseInsensitiveDict()
cid["Content-Type"] = "application/json"
cid["content-type"]


'application/json'
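
To see why the wrapper is the safer base class, compare the same `__setitem__` override on `dict` and on `UserDict`. This is a sketch of CPython behavior; the class names `LowerDict` and `LowerUserDict` are hypothetical.

```python
from collections import UserDict

class LowerDict(dict):                 # subclasses the built-in directly
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

class LowerUserDict(UserDict):         # same override on the wrapper
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

plain = LowerDict({"Accept": "*/*"})   # constructor skips the override
plain.update({"Host": "example.org"})  # so does update()
plain["Content-Type"] = "text/plain"   # only direct assignment is lowercased
print(sorted(plain))                   # ['Accept', 'Host', 'content-type']

wrapped = LowerUserDict({"Accept": "*/*"})
wrapped.update({"Host": "example.org"})
wrapped["Content-Type"] = "text/plain"
print(sorted(wrapped))                 # ['accept', 'content-type', 'host']
```

With the built-in base, every entry point (`__init__`, `update`, `setdefault`, and so on) would need its own override to keep the keys consistent.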

In [22]:
from collections import UserList

class BoundedList(UserList):
    def __init__(self, maxlen, iterable=()):
        super().__init__(iterable)
        self.maxlen = maxlen
    def append(self, item):
        if len(self.data) >= self.maxlen:
            self.data.pop(0)
        self.data.append(item)

bl = BoundedList(3, [1,2])
for x in [3,4,5]: bl.append(x)
list(bl)


[3, 4, 5]

## 8. Practical Recipes

- **Top-k stream**: `heapq.nlargest(k, Counter(tokens).items(), key=lambda kv: kv[1])`  
- **Group rows**: `defaultdict(list)` with `groups[key].append(row)`  
- **LRU-ish cache**: `OrderedDict` with `move_to_end(key)` on access and `popitem(last=False)` to evict  
- **Rolling window**: `deque(maxlen=w)` for fixed-size history  
- **Layer configs**: `ChainMap(env, user, defaults)`
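
Two of the recipes above, written out as a runnable sketch. The token string and the row dictionaries are invented sample data; `Counter.most_common(k)` is shown next to `heapq.nlargest` as the shorter spelling of the same top-k query.

```python
import heapq
from collections import Counter, defaultdict

# Top-k: sample tokens chosen so there is no tie at the cut-off.
tokens = "a b a c a b d".split()
counts = Counter(tokens)
k = 2

via_heap = heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
via_counter = counts.most_common(k)
print(via_heap)      # [('a', 3), ('b', 2)]
print(via_counter)   # [('a', 3), ('b', 2)]

# Group rows by a key field (hypothetical rows).
rows = [
    {"city": "toledo", "amount": 5},
    {"city": "akron", "amount": 7},
    {"city": "toledo", "amount": 2},
]
groups = defaultdict(list)
for row in rows:
    groups[row["city"]].append(row["amount"])
print(dict(groups))  # {'toledo': [5, 2], 'akron': [7]}
```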


## 9. Performance Notes

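- `deque` gives O(1) appends and pops at both ends; `list.insert(0, x)` and `list.pop(0)` are O(n)  
- `Counter(iterable)` counts with a C-accelerated helper in CPython; prefer it over a hand-written dict-counting loop  
- `defaultdict` avoids an explicit missing-key check in tight loops (note that indexing a missing key inserts it)  
- `namedtuple` is compact and hashable (when its fields are)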
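
The first note is easy to measure. Below is a rough `timeit` sketch that drains a list from the left with `pop(0)` and a deque with `popleft()`. The size `n` and repeat count are arbitrary, and absolute timings will vary by machine, so no expected numbers are shown.

```python
import timeit
from collections import deque

def drain_list(n):
    items = list(range(n))
    while items:
        items.pop(0)       # shifts every remaining element: O(n) per pop

def drain_deque(n):
    items = deque(range(n))
    while items:
        items.popleft()    # O(1) per pop

n = 20_000                 # arbitrary size for the comparison
t_list = timeit.timeit(lambda: drain_list(n), number=3)
t_deque = timeit.timeit(lambda: drain_deque(n), number=3)

print(f"list.pop(0):     {t_list:.4f}s")
print(f"deque.popleft(): {t_deque:.4f}s")
```

The gap typically widens as `n` grows, since the list version does quadratic work overall.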


## 10. Practice (Try first, then reveal solutions)

1. **top_k_words**: Using `Counter`, return the top-k words from a text, breaking ties alphabetically by word.  
2. **group_orders**: Given `(user, price)` pairs, build a total price per user using `defaultdict(float)`.  
3. **last_n_events**: Keep the last `n` events with `deque(maxlen=n)` and show current snapshot after each append.  
4. **rotate_cipher**: Use `deque` to implement a simple Caesar rotation for uppercase letters.  
5. **point_ops**: Define a `namedtuple("Point","x y")` and write `add(p,q)` and `dist(p,q)`.  
6. **recently_used**: Build a tiny LRU using `OrderedDict` with `capacity`; on `get/put`, move the key to the end, and when over capacity evict from the front (the least recently used key).  
7. **merge_layers**: Combine `defaults`, `user`, `runtime` dicts with `ChainMap` and show effective view; then convert to a real dict.  
8. **ci_dict**: Implement a case-insensitive dict using `UserDict` that preserves the *first* casing of keys for iteration.  
9. **bounded_list**: Using `UserList`, make a `BoundedList(maxlen)` that discards from the left on overflow (already shown; now add an `extend` method).  
10. **nested_dd**: Build a nested `defaultdict` tree to count occurrences by `country → city → day`.  
11. **counter_arith**: Given two Counters of words A and B, compute words more common in A than B and normalize to frequencies.  
12. **window_max**: Re-implement the sliding window max with `deque` (from §3) and test it on a sample.


## 11. Practice Solutions  
*(Click to reveal after solving.)*

<details>
<summary><strong>Solution 1️⃣: top_k_words</strong></summary>

```python
from collections import Counter

def top_k_words(text, k=3):
    toks = [t.lower() for t in text.split() if t.isalpha()]
    c = Counter(toks)
    # sort by count (descending), then alphabetically to break ties
    return sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```
</details>

<details>
<summary><strong>Solution 2️⃣: group_orders</strong></summary>

```python
from collections import defaultdict

def group_orders(pairs):
    totals = defaultdict(float)
    for user, price in pairs:
        totals[user] += float(price)
    return dict(totals)
```
</details>

<details>
<summary><strong>Solution 3️⃣: last_n_events</strong></summary>

```python
from collections import deque

def last_n_events(events, n=3):
    dq = deque(maxlen=n)
    snapshots = []
    for e in events:
        dq.append(e)
        snapshots.append(list(dq))
    return snapshots
```
</details>

<details>
<summary><strong>Solution 4️⃣: rotate_cipher</strong></summary>

```python
from collections import deque
import string

def rotate_cipher(s, k=3):
    letters = deque(string.ascii_uppercase)
    letters.rotate(-k)
    table = str.maketrans(string.ascii_uppercase, "".join(letters))
    return s.translate(table)
```
</details>

<details>
<summary><strong>Solution 5️⃣: point_ops</strong></summary>

```python
from collections import namedtuple
from math import hypot

Point = namedtuple("Point","x y")

def add(p, q):
    return Point(p.x + q.x, p.y + q.y)

def dist(p, q):
    return hypot(p.x - q.x, p.y - q.y)
```
</details>

<details>
<summary><strong>Solution 6️⃣: recently_used (LRU)</strong></summary>

```python
from collections import OrderedDict

class LRU:
    def __init__(self, capacity=3):
        self.cap = capacity
        self.od = OrderedDict()
    def get(self, key, default=None):
        if key not in self.od:
            return default
        self.od.move_to_end(key, last=True)
        return self.od[key]
    def put(self, key, value):
        if key in self.od:
            self.od.move_to_end(key, last=True)
        self.od[key] = value
        if len(self.od) > self.cap:
            self.od.popitem(last=False)
```
</details>

<details>
<summary><strong>Solution 7️⃣: merge_layers</strong></summary>

```python
from collections import ChainMap

def merge_layers(defaults, user, runtime):
    view = ChainMap(runtime, user, defaults)
    final = dict(view)  # materialize
    return view, final
```
</details>

<details>
<summary><strong>Solution 8️⃣: ci_dict (preserve first casing)</strong></summary>

```python
from collections import UserDict

class CIDict(UserDict):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self._keys = {}  # lower -> original casing
        self.update(*args, **kwargs)
    def __setitem__(self, key, value):
        lk = key.lower()
        if lk not in self._keys:
            self._keys[lk] = key
        super().__setitem__(lk, value)
    def __getitem__(self, key):
        return super().__getitem__(key.lower())
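    def __contains__(self, key):
        return key.lower() in self.data
    def __delitem__(self, key):
        lk = key.lower()
        super().__delitem__(lk)
        del self._keys[lk]
    def __iter__(self):
        # Yield keys in their first-seen casing; the inherited keys(),
        # items() and values() views build on __iter__ and __getitem__.
        for lk in self.data:
            yield self._keys[lk]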
```
</details>

<details>
<summary><strong>Solution 9️⃣: bounded_list (extend)</strong></summary>

```python
from collections import UserList

class BoundedList(UserList):
    def __init__(self, maxlen, iterable=()):
        super().__init__(iterable)
        self.maxlen = maxlen
    def append(self, x):
        if len(self.data) >= self.maxlen:
            self.data.pop(0)
        self.data.append(x)
    def extend(self, it):
        for x in it:
            self.append(x)
```
</details>

<details>
<summary><strong>Solution 🔟: nested_dd</strong></summary>

```python
from collections import defaultdict

def nested_counts(rows):
    tree = lambda: defaultdict(tree)
    root = tree()
    for country, city, day in rows:
        root[country][city][day] = root[country][city].get(day, 0) + 1
    return root
```
</details>

<details>
<summary><strong>Solution 1️⃣1️⃣: counter_arith</strong></summary>

```python
from collections import Counter

def more_common_in_A(A, B):
    ca, cb = Counter(A), Counter(B)
    diff = ca - cb
    total = sum(diff.values()) or 1
    return {w: c/total for w, c in diff.items()}
```
</details>

<details>
<summary><strong>Solution 1️⃣2️⃣: window_max</strong></summary>

```python
from collections import deque

def window_max(nums, w):
    q = deque()
    out = []
    for i, x in enumerate(nums):
        while q and nums[q[-1]] <= x:
            q.pop()
        q.append(i)
        if q[0] <= i - w:
            q.popleft()
        if i >= w - 1:
            out.append(nums[q[0]])
    return out
```
</details>
