In [None]:
%%HTML
<style>
    body {
        --vscode-font-family: "Roboto Slab"
    }
</style>

# Python built-in data structures

## Built‑in core containers (Python 3):

**Mutable:**
- list (mutable sequence)
- dict (hash map; insertion-ordered)
- set (mutable hash set)
- bytearray (mutable byte sequence)
- memoryview (view onto buffer data; mutable if underlying object is writable)

**Immutable:**
- tuple (immutable sequence)
- frozenset (immutable set)
- str (immutable text sequence)
- bytes (immutable byte sequence)
- range (immutable arithmetic sequence)
- memoryview (view onto buffer data; immutable if underlying object is read-only)
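
The mutable/immutable split above largely decides which containers are hashable, that is, usable as dict keys or set members. A minimal sketch of that relationship follows; `is_hashable` is a hypothetical helper introduced only for this illustration.

```python
# Check which built-in containers can be hashed (and so used as dict keys).
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for this demo)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

samples = {
    'list': [1, 2],
    'dict': {'a': 1},
    'set': {1, 2},
    'bytearray': bytearray(b'ab'),
    'tuple': (1, 2),
    'frozenset': frozenset({1, 2}),
    'str': 'ab',
    'bytes': b'ab',
    'range': range(3),
}
hashable = {name: is_hashable(value) for name, value in samples.items()}
for name, ok in hashable.items():
    print(f'{name:10} hashable: {ok}')

# A tuple is only hashable if every element it contains is hashable.
tuple_with_list = is_hashable((1, [2, 3]))
print('tuple containing a list:', tuple_with_list)
```

Immutability of the container alone is not sufficient: the last line shows a tuple that cannot be hashed because it holds a list.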


### frozenset


**What:** An immutable (hashable) version of a set. Cannot add/remove elements after creation.

**Why / Use cases:**
- Use as a dictionary key or as an element inside another set (regular `set` is unhashable)
- Stable snapshot of a set for caching / memoization
- Represent a fixed permission/feature bundle
- Safer to expose from an API when callers must not mutate

**Key properties:**
- Supports all non‑mutating set operations: union (`|`), intersection (`&`), difference (`-`), symmetric difference (`^`), subset tests
- Hashable (can live in other sets / dict keys)
- Missing mutators: `add`, `remove`, `discard`, `pop`, `clear`
- Construction cost ~ building a normal set

**When to choose:** Want set semantics + immutability + hashability.

In [1]:
# frozenset examples
a = frozenset([1, 2, 3])
b = frozenset([2, 3, 4])
print('Intersection:', a & b)          # frozenset({2, 3})
print('Is b a subset of a?', b <= a)   # False: 4 is not in a
print('Union:', b | a)
print('Symmetric difference:', b ^ a)

# Use as dict key (regular set would fail)
group_names = {a: 'group123', b: 'group234'}
print('Lookup by frozenset key:', group_names[a])

# Memoization key example
cache = {}
features = frozenset({'age', 'balance', 'region'})
cache[features] = 'computed_vector'
print('Cached value:', cache[features])

Intersection: frozenset({2, 3})
Is b a subset of a? False
Union: frozenset({1, 2, 3, 4})
Symmetric difference: frozenset({1, 4})
Lookup by frozenset key: group123
Cached value: computed_vector


### memoryview



**What:** A zero‑copy view over the memory of a bytes-like object (`bytes`, `bytearray`, `array.array`, `mmap`, etc.).

**Why / Use cases:**
- Avoid copying large binary buffers when slicing or passing to functions
- Efficient network/file protocol parsing (treat header vs payload as slices)
- Mutate underlying buffer via a view (if original is writable like `bytearray`)
- Reinterpret underlying bytes with different element formats via `cast()`

**Key properties:**
- Slicing returns another view (still zero‑copy)
- Writable only if the underlying object is writable
- `cast(format_code)` can reinterpret element size (e.g., to unsigned bytes `'B'`)
- Converting to `bytes(mv)` (or `mv.tobytes()`) makes a copy

**When to choose:** Handling large binary data streams, performance‑sensitive parsing, interoperating with C extensions / buffer protocol implementers.


In [2]:
# memoryview examples
buf = bytearray(b'ABCDE')
mv = memoryview(buf)
print('Original buffer:', buf)

# Slice without copy
sub = mv[1:4]
print('Slice (as list of byte values):', sub.tolist())

# Mutate through view
mv[0] = ord('a')   # modifies underlying bytearray
print('After modification via view:', buf)

from array import array
nums = array('I', [1, 2, 3])  # unsigned ints (platform dependent size, commonly 4 bytes)
mv_nums = memoryview(nums)

# Cast to bytes (unsigned char) to inspect raw representation
first_int_bytes = mv_nums.cast('B')[:4]
print('First int raw bytes:', list(first_int_bytes))  # [1, 0, 0, 0] on little-endian platforms with 4-byte ints

# Zero-copy pipeline demo: simulate parsing header + payload
packet = bytearray(b'HEADPAYLOAD1234')
view = memoryview(packet)
header_view = view[:4]     # slicing a view does not copy
payload_view = view[4:]
# tobytes() makes a copy; it is used here only to display the contents
print('Header (sliced view):', header_view.tobytes())
print('Payload (sliced view):', payload_view.tobytes())

# Force a copy
payload_copy = bytes(payload_view)
print('Copied payload equals original?', payload_copy == payload_view.tobytes())

Original buffer: bytearray(b'ABCDE')
Slice (as list of byte values): [66, 67, 68]
After modification via view: bytearray(b'aBCDE')
First int raw bytes: [1, 0, 0, 0]
Header (sliced view): b'HEAD'
Payload (sliced view): b'PAYLOAD1234'
Copied payload equals original? True
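
The "writable only if the underlying object is writable" property can be checked directly. The sketch below contrasts a view over `bytes` with a view over `bytearray`, then writes through a sliced view; the buffer contents are made up for illustration.

```python
# A view over immutable bytes is read-only; a view over bytearray is writable.
ro_view = memoryview(b'ABCDE')
rw_view = memoryview(bytearray(b'ABCDE'))
print('readonly flags:', ro_view.readonly, rw_view.readonly)

try:
    ro_view[0] = ord('a')
    write_failed = False
except TypeError as exc:
    write_failed = True
    print('Write rejected:', exc)

# Slices of a view share memory with the original buffer (no copy),
# so a same-length slice assignment writes through to it.
buf = bytearray(b'HEADPAYLOAD')
payload = memoryview(buf)[4:]
payload[:3] = b'pay'
print('Buffer after slice write:', buf)
```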


## Related / standard library container helpers:

- collections: deque, Counter, defaultdict, OrderedDict (mostly legacy), ChainMap, namedtuple
- types: SimpleNamespace
- array (array.array for numeric primitive arrays)
- heapq (functions implementing a heap on a list)
- queue: Queue, LifoQueue, PriorityQueue
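
As a brief sketch of the `array` entry in this list: `array.array` stores values of a single C type compactly and rejects anything else. The variable names and sample values below are illustrative only.

```python
from array import array

# 'd' = C double; every element must be a float-compatible number.
temps = array('d', [21.5, 22.0, 19.75])
temps.append(23.25)
print('values:', temps.tolist())
print('typecode:', temps.typecode, 'itemsize (bytes):', temps.itemsize)

# Type enforcement: a str is rejected, unlike in a list.
try:
    temps.append('hot')
    rejected = False
except TypeError:
    rejected = True
print('str rejected:', rejected)

# Round-trip through raw bytes (e.g. for binary file I/O).
raw = temps.tobytes()
restored = array('d')
restored.frombytes(raw)
print('round trip ok:', restored == temps)
```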



### collections: deque, Counter, defaultdict, OrderedDict (mostly legacy), ChainMap, namedtuple


#### namedtuple


What: Lightweight, immutable tuple subclass with named fields. Provides attribute access (`p.x`) while remaining memory‑efficient like a tuple.

Use cases:
- Small, immutable records (points, RGB colors, rows from CSV)
- Clearer code than raw tuples without full dataclass overhead
- Backwards‑compatible with tuple unpacking

Notes:
- Fields are read‑only; to "modify", use `_replace` to create a new instance
- Supports tuple operations (indexing, unpacking)


In [3]:
from collections import namedtuple

# Define a named tuple type
Person = namedtuple('Person', ['name', 'age', 'gender'])

# Create instances
alice = Person(name='Alice', age=30, gender='F')
bob = Person(name='Bob', age=25, gender='M')

# Access fields by name
print(alice.name, alice.age, alice.gender)

# Tuple unpacking
name, age, gender = bob
print('Unpacked:', name, age, gender)

# Create a modified copy
older_alice = alice._replace(age=31)
print('Older Alice:', older_alice)

Alice 30 F
Unpacked: Bob 25 M
Older Alice: Person(name='Alice', age=31, gender='F')
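
A few more namedtuple helpers tend to come up with small records. The sketch below uses a hypothetical `Point` type to show field defaults, `_fields`, `_asdict()`, `_make()`, and the read-only behaviour noted above.

```python
from collections import namedtuple

# defaults apply to the rightmost fields, so only z is optional here
Point = namedtuple('Point', ['x', 'y', 'z'], defaults=[0.0])

p = Point(1.0, 2.0)
print(p)
print('fields:', Point._fields)
print('as dict:', p._asdict())

# Build an instance from any iterable (e.g. a parsed CSV row)
q = Point._make([3.0, 4.0, 5.0])
print('from iterable:', q)

# Fields are read-only: assignment raises AttributeError
try:
    p.x = 9.0
    assignment_failed = False
except AttributeError:
    assignment_failed = True
print('assignment failed:', assignment_failed)
```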


#### defaultdict


What: dict that provides a default value for missing keys by calling a factory function (e.g., `list`, `int`, `set`).

Use cases:
- Grouping items: map key -> list of values
- Counting or frequency maps with `int` as default
- Building sets of related items

Notes:
- Indexing a missing key (`d[key]`) inserts it with the result of `default_factory()`; `key in d` and `d.get(key)` do not insert anything
- Convert to regular dict with `dict(d)` if you need a plain mapping


In [4]:
# defaultdict examples
from collections import defaultdict

# Group by first letter
groups = defaultdict(list)
for word in ['ant', 'apple', 'bear', 'banana', 'cat']:
    groups[word[0]].append(word)
print('Grouped words:', dict(groups))

# Frequency counting
counts = defaultdict(int)
for ch in 'abracadabra':
    counts[ch] += 1
print('Counts:', dict(counts))

# Set of related items
followers = defaultdict(set)
followers['alice'].add('bob')
followers['alice'].add('carol')
print('Followers:', {k: sorted(v) for k, v in followers.items()})

# Custom default via a lambda (a separate mapping, so `followers` is left intact)
levels = defaultdict(lambda: 1)
print('Default value as 1:', levels['new_user'])  # -> 1

Grouped words: {'a': ['ant', 'apple'], 'b': ['bear', 'banana'], 'c': ['cat']}
Counts: {'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1}
Followers: {'alice': ['bob', 'carol']}
Default value as 1: 1
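
One gotcha from the notes is worth seeing in action: reading a missing key by indexing silently grows the mapping, while membership tests and `.get()` do not. A minimal sketch (the `hits` counter and page names are made up):

```python
from collections import defaultdict

hits = defaultdict(int)
hits['home'] += 1

# Lookups that do NOT insert: membership tests and .get()
print("'about' in hits:", 'about' in hits)
print("hits.get('about'):", hits.get('about'))
keys_before = list(hits)

# Indexing DOES insert the default value
_ = hits['about']
keys_after = list(hits)
print('keys before indexing:', keys_before)
print('keys after indexing:', keys_after)

# Converting to a plain dict restores the usual KeyError behaviour
plain = dict(hits)
try:
    plain['missing']
    raised = False
except KeyError:
    raised = True
print('plain dict raises KeyError:', raised)
```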


#### Counter


What: A dict subclass for counting ***hashable*** items. Provides convenient methods for frequency analysis.

Use cases:
- Character/word frequency, top‑K elements
- Multiset arithmetic (addition/subtraction between Counters)
- Quick histogramming

Notes:
- `most_common(n)` returns top frequencies
- Supports arithmetic and `elements()` iterator


In [5]:
# Counter examples
from collections import Counter

s = 'abracadabra'
ctr = Counter(s)
print('Counts:', ctr)
print('Top 2:', ctr.most_common(2))

# Update counts from iterable
ctr.update('banana')
print('After update:', ctr)

# Arithmetic with Counters
c1 = Counter('aab')
c2 = Counter('abc')
print('c1 + c2:', c1 + c2)        # sum of counts
print('c1 - c2:', c1 - c2)        # subtract, non‑negative only

# Reconstruct elements
print('Elements:', ''.join(sorted(ctr.elements())))

Counts: Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
Top 2: [('a', 5), ('b', 2)]
After update: Counter({'a': 8, 'b': 3, 'r': 2, 'n': 2, 'c': 1, 'd': 1})
c1 + c2: Counter({'a': 3, 'b': 2, 'c': 1})
c1 - c2: Counter({'a': 1})
Elements: aaaaaaaabbbcdnnrr


#### OrderedDict


What: A dict subclass that preserves insertion order. Since Python 3.7, the built‑in `dict` also preserves insertion order, so `OrderedDict` is now mainly useful for its order‑specific operations.

Use cases:
- Order‑aware methods like `move_to_end`, `popitem(last=False)`
- LRU‑like structures where moving keys to front/back matters

Notes:
- Prefer plain `dict` unless you specifically need the extra methods


In [6]:
# OrderedDict examples
from collections import OrderedDict

od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
print('Initial:', list(od.items()))

# Move a key to the end (or beginning with last=False)
od.move_to_end('a')
print('After move_to_end("a"):', list(od.items()))

# Pop item from the beginning (queue-like)
first = od.popitem(last=False)
print('Popped first:', first)
print('Remaining:', list(od.items()))

Initial: [('a', 1), ('b', 2), ('c', 3)]
After move_to_end("a"): [('b', 2), ('c', 3), ('a', 1)]
Popped first: ('b', 2)
Remaining: [('c', 3), ('a', 1)]
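
The LRU‑like use case can be sketched with just `move_to_end` and `popitem(last=False)`. The `LRUCache` class below and its method names are hypothetical, meant only to illustrate the idea; for memoizing function calls, the standard library's `functools.lru_cache` is usually the simpler choice.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache sketch; class and method names are illustrative."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)          # newest entries live at the end
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used

    def keys(self):
        return list(self._data)

cache = LRUCache(capacity=2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')        # 'a' becomes the most recently used entry
cache.put('c', 3)     # over capacity: evicts 'b'
print('keys after eviction:', cache.keys())
```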


#### deque


What: A double‑ended queue with O(1) appends and pops from both ends.

Use cases:
- Implement queues, stacks, sliding windows
- BFS traversal, task scheduling
- Maintain fixed‑size windows with `maxlen`

Notes:
- Methods: `append`, `appendleft`, `pop`, `popleft`, `rotate`, optional `maxlen`


In [7]:
# deque examples
from collections import deque

# Queue behavior
q = deque()
q.append('task1')
q.append('task2')
print('Queue pop left:', q.popleft())
print('Queue now:', list(q))

# Stack behavior
stack = deque()
stack.append(1)
stack.append(2)
print('Stack pop:', stack.pop())

# Sliding window with fixed size
window = deque(maxlen=3)
for n in [1,2,3,4,5]:
    window.append(n)
    print('Window:', list(window))

# Rotate (useful for round‑robin)
ring = deque(['A','B','C'])
ring.rotate(1)
print('Rotated:', list(ring))

Queue pop left: task1
Queue now: ['task2']
Stack pop: 2
Window: [1]
Window: [1, 2]
Window: [1, 2, 3]
Window: [2, 3, 4]
Window: [3, 4, 5]
Rotated: ['C', 'A', 'B']
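
The BFS use case mentioned above relies on the same queue behaviour: `append` to enqueue, `popleft` to dequeue. A minimal sketch over a small made-up adjacency dict (`bfs_order` and `graph` are illustrative names):

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in breadth-first visiting order from start."""
    seen = {start}
    order = []
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()        # O(1) dequeue from the left
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return order

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
order = bfs_order(graph, 'A')
print('BFS order:', order)
```

Using a list with `pop(0)` here would also work, but each dequeue would then cost O(n) instead of O(1).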


#### ChainMap


What: Groups multiple dicts to view them as a single mapping. Lookups search each mapping in order; writes, updates, and deletions operate only on the first mapping.

Use cases:
- Layered configurations (defaults < env < CLI)
- Scoped variable lookups (e.g., nested contexts)

Notes:
- Non‑destructive: underlying dicts remain separate
- Can add temporary layers with `new_child()`


In [8]:
# ChainMap examples
from collections import ChainMap

defaults = {'timeout': 30, 'retries': 2}
user_cfg = {'timeout': 10}
cli_args = {'verbose': True}

cm = ChainMap(cli_args, user_cfg, defaults)
print('timeout:', cm['timeout'])   # 10 from user_cfg (overrides defaults)
print('retries:', cm['retries'])   # 2 from defaults
print('verbose:', cm['verbose'])   # from cli_args

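# Writes go to the first mapping (cli_args here), not to user_cfg
cm['timeout'] = 5
print('cli_args after update via ChainMap:', cli_args)
print('user_cfg is unchanged:', user_cfg)
print('timeout now resolves to:', cm['timeout'])

# Temporary scope
scoped = cm.new_child({'retries': 9})
print('scoped retries:', scoped['retries'])  # 9
print('original retries still:', cm['retries'])

timeout: 10
retries: 2
verbose: True
cli_args after update via ChainMap: {'verbose': True, 'timeout': 5}
user_cfg is unchanged: {'timeout': 10}
timeout now resolves to: 5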
scoped retries: 9
original retries still: 2


### SimpleNamespace


#### What and why
- What: `types.SimpleNamespace` is a tiny, mutable object whose attributes are stored in `__dict__`. It’s like a lightweight, attribute-accessible wrapper around a dict: access with `obj.x` instead of `obj["x"]`.
- Why/use cases:
  - Quick ad‑hoc records/structs in scripts and notebooks
  - Lightweight configuration trees (e.g., `cfg.db.uri`)
  - Converting dicts to objects for clearer, dot-style access
  - Prototyping when a full dataclass/class is overkill

Notes and gotchas:
- It’s mutable and allows creating new attributes at any time (typos won’t error; they create new attributes).
- It doesn’t enforce a schema or types; for stricter models prefer `dataclasses`, `pydantic`, or `attrs`.
- Convert to dict with `vars(obj)`; create from dict with `SimpleNamespace(**d)`.

In [9]:
# SimpleNamespace examples
from types import SimpleNamespace
import json

# Create and access attributes
cfg = SimpleNamespace(host="localhost", port=5432, debug=True)
print("cfg:", cfg)                 # namespace(host='localhost', port=5432, debug=True)
print("host:", cfg.host)

# Mutate and add attributes dynamically
cfg.debug = False
setattr(cfg, "timeout", 30)
print("updated cfg:", cfg)

# From dict -> SimpleNamespace, and back
payload = {"user": "alice", "roles": ["admin", "editor"]}
user = SimpleNamespace(**payload)
print("user:", user)
print("as dict:", vars(user))      # {'user': 'alice', 'roles': ['admin', 'editor']}

# Merge/update many keys at once via vars()
extra = {"active": True}
merged = SimpleNamespace(**{**vars(user), **extra})
print("merged:", merged)

# Nested namespaces for lightweight config trees
app = SimpleNamespace(
    db=SimpleNamespace(uri="postgres://localhost/db", retries=3),
    api=SimpleNamespace(base_url="https://api.example.com", token=None),
)
print("nested access:", app.db.uri)

# Bulk update nested fields
vars(app.db).update({"retries": 5, "pool_size": 10})
print("updated db:", app.db)

# JSON tip: convert to dict first
print("cfg JSON:", json.dumps(vars(cfg)))

# Gotcha: typos create new attributes silently
cfg.debg = True  # typo! creates a new attribute
print("has typo attr?", hasattr(cfg, "debg"), "debug:", cfg.debug)

cfg: namespace(host='localhost', port=5432, debug=True)
host: localhost
updated cfg: namespace(host='localhost', port=5432, debug=False, timeout=30)
user: namespace(user='alice', roles=['admin', 'editor'])
as dict: {'user': 'alice', 'roles': ['admin', 'editor']}
merged: namespace(user='alice', roles=['admin', 'editor'], active=True)
nested access: postgres://localhost/db
updated db: namespace(uri='postgres://localhost/db', retries=5, pool_size=10)
cfg JSON: {"host": "localhost", "port": 5432, "debug": false, "timeout": 30}
has typo attr? True debug: False
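
`SimpleNamespace(**d)` converts only the top level of a dict. For nested data, one common approach is to let `json.loads` build the namespaces through its `object_hook` parameter. The sketch below assumes every key is a valid identifier; the JSON payload and the `to_dict` helper are made up for illustration.

```python
import json
from types import SimpleNamespace

raw = '{"db": {"uri": "postgres://localhost/db", "retries": 3}, "debug": true}'

# object_hook is called for every decoded JSON object, innermost first
cfg = json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))
print('nested access:', cfg.db.uri, cfg.db.retries, cfg.debug)

def to_dict(obj):
    """Recursively convert nested namespaces back to plain dicts (illustrative helper)."""
    if isinstance(obj, SimpleNamespace):
        return {key: to_dict(value) for key, value in vars(obj).items()}
    if isinstance(obj, list):
        return [to_dict(item) for item in obj]
    return obj

round_trip = to_dict(cfg)
print('back to dict:', round_trip)
```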


### heapq (functions implementing a heap on a list)


#### What and why
- What: `heapq` implements a min-heap on top of a plain Python list. The smallest item is always at index 0, and pushes/pops are O(log n).
- Why/use cases:
  - Priority queue (always process the lowest-cost/earliest-deadline item next)
  - Top-K selection with `nlargest`/`nsmallest` without sorting the whole dataset
  - Streaming scenarios where you maintain a rolling top/bottom set efficiently
  - Graph/search algorithms (Dijkstra/A*, event simulation)

Notes and tips:
- It’s a min-heap; for a max-heap, push negated priorities (e.g., `(-priority, item)`).
- Prefer `heappushpop(h, x)` over a separate `heappush` followed by `heappop`; the combined call is more efficient. `heapreplace(h, x)` does the opposite order, popping first and then pushing, so the heap must be non-empty.
- Use `heapify(list_)` to transform a list into a heap in O(n).
- `nlargest(n, iterable, key=...)` and `nsmallest(...)` are often the simplest way to get top/bottom K.

In [10]:
# heapq examples
import heapq
from itertools import count

# 1) Basic push/pop (min-heap)
h = []
for x in [5, 1, 3, 7, 2]:
    heapq.heappush(h, x)
print('heap:', h)                 # internal structure (heap-ordered, not sorted)
print('min pop:', heapq.heappop(h))  # -> 1
print('after pop:', h)

# 2) Heapify existing list (O(n))
data = [9, 4, 6, 2, 8]
heapq.heapify(data)
print('heapified:', data)
heapq.heappush(data, 1)
print('after push 1:', data)
print('pop min:', heapq.heappop(data))

# 3) Top-K with nlargest/nsmallest
nums = [10, 1, 7, 3, 15, 6, 8]
print('top-3 largest:', heapq.nlargest(3, nums))
print('top-3 smallest:', heapq.nsmallest(3, nums))

# 4) Max-heap pattern via negation
maxh = []
for score in [10, 30, 20]:
    heapq.heappush(maxh, (-score, score))  # (negated key, original)
print('max pop:', heapq.heappop(maxh)[1])  # -> 30

# 5) Priority queue with (priority, tie-breaker, item)
pq = []
tie = count()  # strictly increasing counter to avoid comparing items on ties
heapq.heappush(pq, (2, next(tie), 'task-low'))
heapq.heappush(pq, (0, next(tie), 'task-high'))
heapq.heappush(pq, (1, next(tie), 'task-mid'))
while pq:
    prio, _, task = heapq.heappop(pq)
    print('run:', prio, task)

# 6) heappushpop / heapreplace contrast
h2 = [3, 5, 8]
heapq.heapify(h2)
print('h2 start:', h2)
print('heappushpop(h2, 4):', heapq.heappushpop(h2, 4))   # pushes 4, then pops min (<= 4)
print('after:', h2)
print('heapreplace(h2, 10):', heapq.heapreplace(h2, 10)) # pops min, then pushes 10
print('after:', h2)

heap: [1, 2, 3, 7, 5]
min pop: 1
after pop: [2, 5, 3, 7]
heapified: [2, 4, 6, 9, 8]
after push 1: [1, 4, 2, 9, 8, 6]
pop min: 1
top-3 largest: [15, 10, 8]
top-3 smallest: [1, 3, 6]
max pop: 30
run: 0 task-high
run: 1 task-mid
run: 2 task-low
h2 start: [3, 5, 8]
heappushpop(h2, 4): 3
after: [4, 5, 8]
heapreplace(h2, 10): 4
after: [5, 10, 8]
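
For the streaming use case, a bounded min-heap keeps only the K largest items seen so far, so memory stays O(K) regardless of stream length. A minimal sketch, where `top_k` and `readings` are illustrative names and `k` is assumed to be positive:

```python
import heapq

def top_k(stream, k):
    """Return the k largest items from an iterable, largest first."""
    heap = []  # min-heap holding the k largest items seen so far
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        else:
            # Push x, then drop the smallest; the heap size stays at k
            heapq.heappushpop(heap, x)
    return sorted(heap, reverse=True)

readings = [10, 1, 7, 3, 15, 6, 8]
top3 = top_k(iter(readings), 3)
print('streaming top-3:', top3)
```

For data that is already in memory, `heapq.nlargest(3, readings)` gives the same result with less code.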


In [1]:
import heapq

# heapq with complex objects (e.g., custom class with priority)

class Task:
    def __init__(self, priority, name):
        self.priority = priority
        self.name = name
    def __lt__(self, other):
        return self.priority < other.priority  # heapq uses < for ordering
    def __repr__(self):
        return f"Task(priority={self.priority}, name='{self.name}')"

tasks = [
    Task(3, 'write docs'),
    Task(1, 'fix bug'),
    Task(2, 'add feature'),
]

heapq.heapify(tasks)
print('Heapified tasks:', tasks)

# Pop tasks by priority
while tasks:
    print('Next task:', heapq.heappop(tasks))

Heapified tasks: [Task(priority=1, name='fix bug'), Task(priority=3, name='write docs'), Task(priority=2, name='add feature')]
Next task: Task(priority=1, name='fix bug')
Next task: Task(priority=2, name='add feature')
Next task: Task(priority=3, name='write docs')


In [2]:
import heapq

# Example: heapq with a custom key using wrapper tuples

class Task:
    def __init__(self, priority, name):
        self.priority = priority
        self.name = name
    def __repr__(self):
        return f"Task(priority={self.priority}, name='{self.name}')"

# Suppose we want to heapify by name instead of priority
tasks = [
    Task(3, 'write docs'),
    Task(1, 'fix bug'),
    Task(2, 'add feature'),
]

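# Use a wrapper: (key, tie-breaker, item). The index breaks ties so that two
# tasks with the same name are never compared directly (Task defines no __lt__).
heap = [(t.name, i, t) for i, t in enumerate(tasks)]
heapq.heapify(heap)

# Pop by name (alphabetical order)
while heap:
    name, _, task = heapq.heappop(heap)
    print('Next by name:', task)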

Next by name: Task(priority=2, name='add feature')
Next by name: Task(priority=1, name='fix bug')
Next by name: Task(priority=3, name='write docs')


### queue: Queue, LifoQueue, PriorityQueue

#### What and why
- What: The `queue` module provides thread-safe FIFO (`Queue`), LIFO (`LifoQueue`), and `PriorityQueue` classes with built-in locking, optional maxsize/backpressure, and task tracking.
- Why/use cases:
  - Producer/consumer pipelines across threads with automatic blocking when empty/full
  - Work scheduling with backpressure via `maxsize` to avoid unbounded memory
  - Prioritized task execution using `(priority, item)` tuples in `PriorityQueue`
  - Thread-safe stack semantics with `LifoQueue`

Notes and tips:
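- `put()` blocks when full; `get()` blocks when empty. Pass `timeout=` to bound the wait, or `block=False` (equivalently `put_nowait()` / `get_nowait()`) to fail immediately; both raise `queue.Full` or `queue.Empty` when the operation cannot complete.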
- Call `task_done()` after finishing an item so `q.join()` can know when all work is complete.
- For single-threaded usage, prefer simpler structures (`deque`, `heapq`). `queue` is about thread-safety.

In [11]:
# queue module examples: FIFO, LIFO, PriorityQueue
import threading
import time
from queue import Queue, LifoQueue, PriorityQueue

# 1) FIFO Queue with producer/consumer and task tracking
q = Queue(maxsize=3)  # small size to demonstrate backpressure
results = []

def producer():
    for i in range(5):
        q.put(i)            # blocks if q is full
        print('produced', i)
    q.put(None)             # sentinel to stop consumer

def consumer():
    while True:
        item = q.get()       # blocks if q is empty
        if item is None:
            q.task_done()
            break
        # simulate work
        time.sleep(0.05)
        results.append(item * 2)
        print('consumed', item)
        q.task_done()        # mark one task as done

t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
t_prod.start()
t_cons.start()
t_prod.join()  # all items (and the sentinel) have been put
q.join()       # wait until every item has been marked done
t_cons.join()
print('results:', results)

# 2) LifoQueue behaves like a stack
stack = LifoQueue()
for x in [1, 2, 3]:
    stack.put(x)
print('LIFO pop:', stack.get(), stack.get(), stack.get())

# 3) PriorityQueue with (priority, tie-breaker, item)
pq = PriorityQueue()
for seq, (prio, task) in enumerate([(2, 'low'), (0, 'high'), (1, 'mid')]):
    pq.put((prio, seq, task))  # seq breaks ties so task payloads are never compared
while not pq.empty():  # safe here only because no other thread is using pq
    prio, _, task = pq.get()
    print('run:', prio, task)
    pq.task_done()

produced 0
produced 1
produced 2
produced 3
consumed 0
produced 4
consumed 1
consumed 2
consumed 3
consumed 4
results: [0, 2, 4, 6, 8]
LIFO pop: 3 2 1
run: 0 high
run: 1 mid
run: 2 low
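
The blocking options from the notes can be sketched without any threads: a full queue rejects a non-blocking `put`, and an empty queue times out on `get`. The job names and the 0.1 second timeout are arbitrary choices for illustration.

```python
import queue

q = queue.Queue(maxsize=1)
q.put('job-1')

# Non-blocking put on a full queue raises queue.Full immediately
try:
    q.put('job-2', block=False)   # same as q.put_nowait('job-2')
    put_rejected = False
except queue.Full:
    put_rejected = True
print('second put rejected:', put_rejected)

first = q.get(timeout=0.1)
print('got:', first)

# get() with a timeout on an empty queue raises queue.Empty after waiting
try:
    q.get(timeout=0.1)
    get_timed_out = False
except queue.Empty:
    get_timed_out = True
print('get timed out:', get_timed_out)
```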


## Mutability quick note:

- Mutable: list, dict, set, bytearray, deque, array.array, memoryview (if underlying is writable)
- Immutable: tuple, frozenset, str, bytes, range