<center><h1> Python Data Structures </h1></center>
                                             
Largely influenced by [this Real Python post](https://realpython.com/python-data-structures/#collectionschainmap-search-multiple-dictionaries-as-a-single-mapping).

## defaultdict

In [13]:
from collections import defaultdict

d = defaultdict(list)
d[5] = 'hi'
d

defaultdict(list, {5: 'hi'})

A `defaultdict` calls its factory function to create a default value whenever a missing key is accessed.

In [14]:
print(d[6] == [])
d1 = defaultdict(lambda: 'missing')
print(d1[5] == 'missing')

True
True


You can do the same with a regular `dict` object by using the `setdefault` method.

In [16]:
d = {}
d.setdefault(5, []).append(4)
d

{5: [4]}
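
A common use for both approaches is grouping items under a key. The sketch below uses a made-up list of words (the variable names are illustrative only) and builds the same grouping once with `defaultdict(list)` and once with `dict.setdefault`.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]  # sample data

# defaultdict creates the empty list on first access to a missing key
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# a plain dict needs setdefault to supply the empty list explicitly
by_letter_plain = {}
for word in words:
    by_letter_plain.setdefault(word[0], []).append(word)

print(dict(by_letter))
print(by_letter_plain == by_letter)
```

Keep in mind that merely reading a missing key from a `defaultdict` inserts it, whereas `dict.get` leaves the dictionary unchanged.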

## ChainMap

The `collections.ChainMap` data structure groups multiple dictionaries into a single mapping. Lookups search the underlying mappings one by one until a key is found. Insertions, updates, and deletions only affect the first mapping added to the chain.

When managing application settings, you might have default settings, user-specific settings, and environment-specific settings. `ChainMap` allows you to overlay these settings hierarchically.

In [103]:
from collections import ChainMap

default_config = {'theme': 'light', 'language': 'English', 'timeout': 120}
user_config = {'language': 'French', 'timeout': 300}
env_config = {'theme': 'dark'}  # Imagine this comes from environment variables

config = ChainMap(env_config, user_config, default_config)

print(config['theme'])    
print(config['language']) 
print(config['timeout'])  

dark
French
300


In [62]:
# public attributes and methods (names without an underscore)
[i for i in dir(config) if '_' not in i]

['clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'maps',
 'parents',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

The `maps` attribute holds the chained dictionaries as a Python list, in lookup order.

In [104]:
config.maps

[{'theme': 'dark'},
 {'language': 'French', 'timeout': 300},
 {'theme': 'light', 'language': 'English', 'timeout': 120}]

The `parents` property returns a new `ChainMap` containing every mapping of the original chain except the first.

In [105]:
config.parents

ChainMap({'language': 'French', 'timeout': 300}, {'theme': 'light', 'language': 'English', 'timeout': 120})

Writes and deletions made through the `ChainMap` only affect the first mapping in the chain.

In [106]:
config['language'] = 'spanish'
config.maps

[{'theme': 'dark', 'language': 'spanish'},
 {'language': 'French', 'timeout': 300},
 {'theme': 'light', 'language': 'English', 'timeout': 120}]

In [109]:
try:
    del config['timeout']
except KeyError as e:
    print('Error ', e)

Error  "Key not found in the first mapping: 'timeout'"


Reversing Order

In [112]:
reversed_chain = ChainMap(*reversed(config.maps))
reversed_chain.maps

[{'theme': 'light', 'language': 'English', 'timeout': 120},
 {'language': 'French', 'timeout': 300},
 {'theme': 'dark', 'language': 'spanish'}]

`fromkeys` class method

In [70]:
ChainMap.fromkeys(["one", "two","three"])

ChainMap({'one': None, 'two': None, 'three': None})

In [71]:
ChainMap.fromkeys(["one", "two","three"], 0)

ChainMap({'one': 0, 'two': 0, 'three': 0})

When a key appears in more than one mapping, such as "dogs" and "cats" in the example below, the chain map only returns the first occurrence of that key. Internally, lookup operations search the input mappings in the same order they appear in the internal list of mappings, which is also the exact order you pass them into the class’s initializer.

This general behavior also applies to iteration:

In [72]:
for_adoption = {"dogs": 10, "cats": 7, "pythons": 3}
vet_treatment = {"dogs": 4, "cats": 3, "turtles": 1}
pets = ChainMap(for_adoption, vet_treatment)

for key, value in pets.items():
    print(key, "->", value)

dogs -> 10
cats -> 7
turtles -> 1
pythons -> 3


You can chain other types of mappings as well.

In [110]:
from collections import OrderedDict
numbers = OrderedDict(one=1, two=2)
letters = defaultdict(str, {"a": "A", "b": "B"})
c = ChainMap(numbers, letters)
print(c['a'])
c

A


ChainMap(OrderedDict({'one': 1, 'two': 2}), defaultdict(<class 'str'>, {'a': 'A', 'b': 'B'}))

ChainMap also implements `.new_child()`. This method optionally takes a mapping as an argument and returns a new ChainMap instance containing the input mapping followed by all of the current mappings in the underlying chain map:

In [91]:
c = ChainMap()
e = c.new_child()
e[4] = 5
d = c.new_child()
d[5] = 6
c

ChainMap({})

In [92]:
e

ChainMap({4: 5}, {})
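
One way to picture `new_child()` is as pushing a nested scope on top of an outer one. The following is a minimal sketch with hypothetical names (`global_scope`, `local_scope`): lookups fall through to the parent, while writes land only in the new first mapping.

```python
from collections import ChainMap

global_scope = ChainMap({"x": 1, "y": 2})

# new_child() puts a fresh mapping in front of the existing ones
local_scope = global_scope.new_child({"x": 10})

local_scope["z"] = 3       # written to the new first mapping only

print(local_scope["x"])    # found in the child mapping
print(local_scope["y"])    # falls through to the parent mapping
print(global_scope)        # the parent chain is unchanged
```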

## MappingProxyType

The `types.MappingProxyType` is a wrapper class in Python's `types` module that provides a read-only view of a mapping, typically a dictionary. This means you can access the data within the dictionary but cannot modify it through the proxy. The proxy is a live view rather than a copy, so changes made directly to the underlying dictionary remain visible through it. It is useful for exposing read-only mappings, enforcing encapsulation, and preventing unintended side effects in your programs.

In [121]:
from types import MappingProxyType

config = {'host': 'localhost', 'port': 8080}
config_proxy = MappingProxyType(config)

print(config_proxy['host'])  # Output: 'localhost'

try:
    config_proxy['host'] = '127.0.0.1'  # Raises TypeError
except Exception as e:
    print('Error: ', e)

localhost
Error:  'mappingproxy' object does not support item assignment
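
Because the proxy wraps the original dictionary instead of copying it, it behaves as a live view. The sketch below, with hypothetical names (`settings`, `view`, `snapshot`), shows the difference between wrapping the dictionary itself and wrapping a copy of it.

```python
from types import MappingProxyType

settings = {"host": "localhost", "port": 8080}   # hypothetical settings
view = MappingProxyType(settings)

# the proxy is a live view: changes to the underlying dict show through
settings["port"] = 9090
print(view["port"])

# wrapping a copy gives a snapshot that later changes cannot reach
snapshot = MappingProxyType(dict(settings))
settings["port"] = 7070
print(snapshot["port"], view["port"])
```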


**Encapsulation**

By providing a read-only view, you can protect the internal state of objects, ensuring that only controlled modifications occur.

In [123]:
class Settings:
    def __init__(self):
        self._options = {'debug': False, 'verbose': True}
        self.options = MappingProxyType(self._options)

settings = Settings()
print(settings.options['debug'])  # Output: False

try:
    settings.options['debug'] = True  # Raises TypeError
except Exception as e:
    print('Error ', e)

False
Error  'mappingproxy' object does not support item assignment


## array.array
The `array.array` class creates an array that holds elements of a single numeric type, chosen by a type code. Unlike lists, which can hold elements of different types, arrays enforce type consistency and store elements compactly as raw C values, which can save a significant amount of memory compared with a list of Python objects.

See the [documentation](https://docs.python.org/3/library/array.html) for the available type codes.

In [149]:
import array

In [147]:
array.array('u', ['s', 'b', 's', 'a'])

array('u', 'sbsa')

In [148]:
array.array('f', [12.0, 13.5])

array('f', [12.0, 13.5])
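
To make the type enforcement and the memory difference concrete, here is a rough sketch. The byte counts come from `sys.getsizeof` and are only an estimate that varies by platform and Python version; the variable names are illustrative.

```python
import array
import sys

numbers = list(range(1000))
ints = array.array("i", numbers)   # "i" is the type code for a signed C int

# rough estimate: the list stores pointers to separate int objects
list_bytes = sys.getsizeof(numbers) + sum(sys.getsizeof(n) for n in numbers)
array_bytes = sys.getsizeof(ints)
print(ints.itemsize, array_bytes, list_bytes)   # exact values vary by platform

# the declared type is enforced on every insert
rejected = False
try:
    ints.append("seven")
except TypeError as e:
    rejected = True
    print("Error ", e)
```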

## dataclass


In [150]:
from dataclasses import dataclass

In [181]:
@dataclass
class DataStructures:
    num_structes:int

@dataclass
class Algorithms:
    # no type annotation, so this is a plain class attribute, not a dataclass field
    num_algorithms = 5

In [182]:
ds = DataStructures(3)
print(ds)

DataStructures(num_structes=3)


In [183]:
ds.num_structes

3

In [185]:
alg = Algorithms()
print(alg)

Algorithms()
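
The empty `Algorithms()` repr above follows from how `@dataclass` discovers fields: only annotated class attributes count. A small sketch with two hypothetical classes, inspected with `dataclasses.fields`, makes the difference visible.

```python
from dataclasses import dataclass, fields

@dataclass
class Annotated:
    count: int = 5    # annotated: becomes a field and an __init__ parameter

@dataclass
class Unannotated:
    count = 5         # no annotation: stays a plain class attribute

print([f.name for f in fields(Annotated)])
print([f.name for f in fields(Unannotated)])
print(Annotated(7), Unannotated())
print(Annotated(7) == Annotated(7))   # the generated __eq__ compares fields
```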


## frozenset
A `frozenset` is an immutable set: once created, elements cannot be added or removed.

In [187]:
vowels = frozenset({"a", "e", "i", "o", "u"})
vowels

frozenset({'a', 'e', 'i', 'o', 'u'})

In [188]:
try:
    vowels.add("p")
except Exception as e:
    print('Error ', e)

Error  'frozenset' object has no attribute 'add'
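
A practical consequence of immutability is that a `frozenset` is hashable, so it can serve as a dictionary key or as a member of another set. The example below is a minimal sketch; the `labels` dictionary is made up for illustration.

```python
vowels = frozenset("aeiou")

# frozensets are hashable, so they work as dict keys
labels = {vowels: "vowels", frozenset("xyz"): "rare consonants"}
print(labels[frozenset("uoiea")])   # element order is irrelevant

# a regular set is unhashable and cannot be used as a key
try:
    {set("aeiou"): "vowels"}
except TypeError as e:
    print("Error ", e)

# set algebra still works and returns a new frozenset
with_y = vowels | {"y"}
print(type(with_y).__name__, sorted(with_y))
```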


## collections.Counter: Multiset

The `collections.Counter` class in the Python standard library implements a multiset, or bag, type that allows elements in the set to have more than one occurrence.

In [189]:
from collections import Counter

In [190]:
inventory = Counter()
loot = {"sword": 1, "bread": 3}
inventory.update(loot)
inventory

Counter({'bread': 3, 'sword': 1})

In [191]:
more_loot = {"sword": 1, "apple": 1}
inventory.update(more_loot)
inventory

Counter({'bread': 3, 'sword': 2, 'apple': 1})

In [192]:
len(inventory)

3

In [193]:
sum(inventory.values())

6
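
Beyond `update`, a `Counter` supports a few multiset-style operations. The sketch below continues the inventory idea with a hypothetical `eaten` counter and shows `most_common`, lookups of missing elements, and subtraction.

```python
from collections import Counter

inventory = Counter({"bread": 3, "sword": 2, "apple": 1})

top_two = inventory.most_common(2)   # highest counts first
print(top_two)

print(inventory["shield"])           # missing elements count as 0

eaten = Counter(bread=2, apple=1)
remaining = inventory - eaten        # results with a count <= 0 are dropped
print(remaining)
```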

## collections.deque

Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same `O(1)` performance in either direction.

Though `list` objects support similar operations, they are optimized for fast fixed-length operations and incur `O(n)` memory movement costs for `pop(0)` and `insert(0, v)` operations which change both the size and position of the underlying data representation.

In [194]:
from collections import deque

In [210]:
d = deque([1,2,3,4,5])
d

deque([1, 2, 3, 4, 5])

In [211]:
d.append(6)
d.appendleft(0)
d

deque([0, 1, 2, 3, 4, 5, 6])

In [212]:
list(d)

[0, 1, 2, 3, 4, 5, 6]

In [213]:
d.rotate(1)
d

deque([6, 0, 1, 2, 3, 4, 5])

In [214]:
d.rotate(-1)
d

deque([0, 1, 2, 3, 4, 5, 6])

In [215]:
deque(reversed(d))

deque([6, 5, 4, 3, 2, 1, 0])

In [220]:
d.index(3)

3

In [224]:
d.pop()

6

In [225]:
d.popleft()

0

In [226]:
d

deque([1, 2, 3, 4, 5])
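
A `deque` can also be bounded with the `maxlen` argument: once it is full, appending on one end discards an item from the other. This is a minimal sketch of a sliding window over some made-up readings.

```python
from collections import deque

recent = deque(maxlen=3)   # keeps only the three newest items

for reading in [10, 20, 30, 40, 50]:   # sample data
    recent.append(reading)             # the oldest item drops off the left when full
    print(list(recent), sum(recent) / len(recent))
```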

## queue

The [queue](https://docs.python.org/3/library/queue.html#module-queue) module implements multi-producer, multi-consumer queues. It is especially useful in threaded programming when information must be exchanged safely between multiple threads. The Queue class in this module implements all the required locking semantics.

In [249]:
from queue import Queue

In [253]:
q = Queue(maxsize=3)
q.put('a')
q.put('b')
q.put('c')

In [254]:
print('Empty: ', q.empty())
print('Size: ', q.qsize())
print('Full: ', q.full())

Empty:  False
Size:  3
Full:  True


In [271]:
q = Queue()

# creating a queue with 20 elements
for i in range(20):
    q.put(i)

print('Size before: ', q.qsize())
# q.join() blocks until every item in the queue has been processed.
# The count of unfinished tasks goes up whenever an item
# is added to the queue.

# The count goes down whenever a consumer calls task_done()
# to indicate that the item was retrieved and all work on it is complete.
for _ in range(q.qsize()):
    q.get()
    q.task_done()
q.join()
print('Size after: ', q.qsize())

Size before:  20
Size after:  0
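
The example above runs in a single thread. The following sketch shows the same `get()` / `task_done()` / `join()` protocol with actual worker threads. The `worker` function and the `None` sentinel used to stop each thread are conventions chosen for this illustration, not part of the `queue` API.

```python
import threading
from queue import Queue

def worker(tasks, results):
    """Square numbers from tasks until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:          # sentinel: no more work for this thread
            tasks.task_done()
            break
        results.put(item * item)
        tasks.task_done()

tasks, results = Queue(), Queue()
threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(3)]
for t in threads:
    t.start()

for n in range(10):
    tasks.put(n)
for _ in threads:
    tasks.put(None)               # one sentinel per worker

tasks.join()                      # wait until every item is marked done
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(results.qsize()))
print(squares)
```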


## multiprocessing.Queue

The [`multiprocessing.Queue`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue) class is a shared job queue that allows multiple producer and consumer processes to communicate. It is a process-safe queue implemented using pipes and locks, ensuring that data is transferred safely between processes without corruption or data loss.
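
As a rough sketch of how this might look in practice, the functions below pass numbers to worker processes through one `multiprocessing.Queue` and collect squared results through another. The names `square_worker` and `run_demo`, and the `None` sentinel, are assumptions made for this example.

```python
import multiprocessing as mp

def square_worker(tasks, results):
    """Read numbers from tasks until a None sentinel arrives."""
    while True:
        n = tasks.get()
        if n is None:
            break
        results.put((n, n * n))

def run_demo(numbers, num_workers=2):
    """Square a list of numbers using worker processes."""
    tasks, results = mp.Queue(), mp.Queue()
    workers = [
        mp.Process(target=square_worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(None)           # one sentinel per worker
    # collect the results before joining so workers can flush their pipes
    collected = dict(results.get() for _ in numbers)
    for w in workers:
        w.join()
    return collected
```

Call `run_demo([1, 2, 3, 4])` from under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start new processes by spawning. It should return a dictionary mapping each number to its square.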

## Heap
The heap is a specialized tree-based data structure that satisfies the heap property:

In a min-heap, for any given node N, the value of N is less than or equal to the values of its children. In a max-heap, for any given node N, the value of N is greater than or equal to the values of its children. Heaps are commonly used to implement priority queues, where the element with the highest (or lowest) priority is always at the front.



The [`heapq`](https://docs.python.org/3/library/heapq.html) module in Python provides an implementation of the min-heap queue algorithm. It offers a set of functions that let you maintain a list in the heap order, where the smallest element is always at the index position 0.
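
In the list representation used by `heapq`, the heap property means that `heap[k] <= heap[2*k + 1]` and `heap[k] <= heap[2*k + 2]` for every index `k` whose children exist. The helper `is_min_heap` below is a hypothetical function written for this illustration; it checks that invariant. The last lines show the common trick of negating values to simulate a max-heap.

```python
import heapq

def is_min_heap(items):
    """Return True if every parent is <= each of its children."""
    n = len(items)
    return all(
        items[k] <= items[child]
        for k in range(n)
        for child in (2 * k + 1, 2 * k + 2)
        if child < n
    )

data = [15, 3, 9, 8, 5, 2, 7, 12, 6]
print(is_min_heap(data))            # the unsorted list violates the invariant

heapq.heapify(data)
print(is_min_heap(data), data[0])   # smallest element is now at index 0

heapq.heappush(data, 1)
print(is_min_heap(data), data[0])   # the invariant survives a push

# simulate a max-heap by storing negated values
negated = [-x for x in [15, 3, 9]]
heapq.heapify(negated)
largest = -heapq.heappop(negated)
print(largest)
```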

Key Characteristics:
- **Efficient Operations**: Insertion and extraction of the smallest element are O(log n) operations.
- **In-place Transformation**: Functions like heapify() transform a list into a heap in-place.
- **Not a Complete Priority Queue**: While heapq provides the core functionality, it doesn't include all methods you'd expect from a full-featured priority queue (like item removal, priority updates).
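
One way to work around the missing removal and priority-update operations is lazy deletion, adapted here from the pattern described in the `heapq` documentation. Entries are marked as removed rather than deleted from the heap, and are skipped when popped. The function names and the `REMOVED` marker are part of this sketch, not of the `heapq` module.

```python
import heapq
import itertools

heap = []                      # list of [priority, count, task] entries
entry_finder = {}              # maps each task to its live entry
REMOVED = "<removed-task>"     # placeholder marking a removed entry
counter = itertools.count()    # tie-breaker so tasks are never compared

def add_task(task, priority=0):
    """Add a new task or update the priority of an existing one."""
    if task in entry_finder:
        remove_task(task)
    entry = [priority, next(counter), task]
    entry_finder[task] = entry
    heapq.heappush(heap, entry)

def remove_task(task):
    """Mark an existing task as removed; raises KeyError if absent."""
    entry = entry_finder.pop(task)
    entry[-1] = REMOVED

def pop_task():
    """Pop the lowest-priority live task, skipping removed entries."""
    while heap:
        priority, count, task = heapq.heappop(heap)
        if task is not REMOVED:
            del entry_finder[task]
            return task
    raise KeyError("pop from an empty priority queue")

add_task("write code", priority=2)
add_task("clean the house", priority=1)
add_task("write code", priority=0)   # update: the old entry is marked removed

first = pop_task()
second = pop_task()
print(first, "|", second)
```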


A heap is a good fit when tasks have different priorities and you need to process them in priority order.

In [273]:
import heapq

tasks = [
    (2, 'clean the house'),
    (1, 'write code'),
    (3, 'go shopping'),
]

heapq.heapify(tasks)

while tasks:
    priority, task = heapq.heappop(tasks)
    print(f'Priority {priority}: {task}')

Priority 1: write code
Priority 2: clean the house
Priority 3: go shopping


A heap also helps when you need to find the smallest or largest elements in a large dataset.

In [274]:
numbers = [15, 3, 9, 8, 5, 2, 7, 12, 6]

smallest = heapq.nsmallest(3, numbers)
largest = heapq.nlargest(3, numbers)

print(f'Smallest: {smallest}')
print(f'Largest: {largest}')

Smallest: [2, 3, 5]
Largest: [15, 12, 9]


`heapq.merge` efficiently merges multiple sorted inputs into a single sorted sequence, returned as an iterator.

In [275]:
list1 = [1, 3, 5]
list2 = [2, 4, 6]
list3 = [0, 7, 8]

merged = heapq.merge(list1, list2, list3)
print(list(merged))


[0, 1, 2, 3, 4, 5, 6, 7, 8]


In [285]:
numbers = [15, 3, 9, 8, 5, 2, 7, 12, 6]
heapq.heapify(numbers)
numbers

[2, 3, 7, 6, 5, 9, 15, 12, 8]

In [284]:
while numbers:
    num = heapq.heappop(numbers)
    print(f'Smallest {num}')

Smallest 2
Smallest 3
Smallest 5
Smallest 6
Smallest 7
Smallest 8
Smallest 9
Smallest 12
Smallest 15


## queue.PriorityQueue

[`queue.PriorityQueue`](https://docs.python.org/3/library/queue.html#queue.PriorityQueue) uses heapq internally and shares the same time and space complexities. The difference is that PriorityQueue is synchronized and provides locking semantics to support multiple concurrent producers and consumers.

Depending on your use case, this might be helpful, or it might just slow your program down slightly. In any case, you might prefer the class-based interface provided by PriorityQueue over the function-based interface provided by heapq:

In [286]:
from queue import PriorityQueue

In [295]:
q = PriorityQueue()
q.put((2, "code"))
q.put((1, "eat"))
q.put((3, "sleep"))

while not q.empty():
    next_item = q.get()
    print(next_item)

(1, 'eat')
(2, 'code')
(3, 'sleep')
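
Tuples work well as long as ties in priority can be broken by comparing the second element. When the payload is not orderable (dictionaries, for instance), the `queue` documentation suggests wrapping it in a class that ignores the payload during comparison. The sketch below follows that idea; the task payloads are made up.

```python
from dataclasses import dataclass, field
from queue import PriorityQueue
from typing import Any

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any = field(compare=False)   # excluded from ordering comparisons

q = PriorityQueue()
q.put(PrioritizedItem(2, {"task": "code"}))
q.put(PrioritizedItem(1, {"task": "eat"}))
q.put(PrioritizedItem(1, {"task": "drink"}))   # same priority, payload never compared

order = []
while not q.empty():
    order.append(q.get().item["task"])
print(order)
```

Items with equal priority come out in no guaranteed order, so do not rely on insertion order among ties.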
