# Python data structures (Real Python)

https://realpython.com/python-data-structures/#sets-and-multisets

## Dictionaries

Python’s dictionaries are indexed by keys that can be of any hashable type. A hashable object has a hash value that never changes during its lifetime (see __hash__), and it can be compared to other objects (see __eq__). Hashable objects that compare as equal must have the same hash value.
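
As a minimal sketch of that contract, the hypothetical `Point` class below (an illustration, not something from the article) defines `__eq__` and `__hash__` over the same fields, so equal instances share a hash and can serve as dictionary keys:

```python
class Point:
    """Hypothetical example class; the name and fields are illustrative only."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so equal objects share a hash.
        return hash((self.x, self.y))


labels = {Point(1, 2): "a"}
print(labels[Point(1, 2)])  # an equal (but distinct) instance finds the same entry: a

try:
    {[1, 2]: "b"}  # lists are mutable and therefore unhashable
except TypeError as exc:
    print(exc)
```

For the hash to stay constant, the fields used in `__hash__` should not be mutated while the object is in use as a key.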

There’s little reason not to use the standard dict implementation included with Python. However, specialized third-party dictionary implementations exist, such as skip lists or B-tree–based dictionaries.

Besides plain dict objects, Python’s standard library also includes a number of specialized dictionary implementations. These specialized dictionaries are all based on the built-in dictionary class (and share its performance characteristics) but also include some additional convenience features.

### collections.OrderedDict: Remember the Insertion Order of Keys

In [2]:
import collections
dic = collections.OrderedDict(one=1,two=2,three=3)
dic

OrderedDict([('one', 1), ('two', 2), ('three', 3)])

Before Python 3.8, you couldn’t iterate over a plain dict in reverse order using reversed(); only OrderedDict instances offered that functionality. Even from Python 3.8 onward, dict and OrderedDict objects aren’t exactly the same: OrderedDict instances have a .move_to_end() method that is unavailable on plain dict instances, as well as a more customizable .popitem() method than the one on plain dict instances.
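
The two OrderedDict-only features mentioned above can be sketched as follows (a small illustration; the variable name `od` is arbitrary):

```python
import collections

od = collections.OrderedDict(one=1, two=2, three=3)

od.move_to_end("one")                # move an existing key to the right end
print(list(od))                      # ['two', 'three', 'one']

od.move_to_end("three", last=False)  # or to the left end
print(list(od))                      # ['three', 'two', 'one']

print(od.popitem(last=False))        # pop from the left end: ('three', 3)
print(od.popitem())                  # default pops from the right end: ('one', 1)
```

A plain dict's `.popitem()` takes no argument and always removes the most recently inserted item.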

### collections.defaultdict: Return Default Values for Missing Keys

In [3]:
import collections
dd = collections.defaultdict(list)

# If the key is missing, defaultdict creates it and initializes it by calling the factory passed in (here, list)
dd["dogs"].append("Rufus")
dd["dogs"].append("Kathrin")
dd["dogs"].append("Mr Sniffles")

dd["dogs"]

['Rufus', 'Kathrin', 'Mr Sniffles']

### collections.ChainMap: Search Multiple Dictionaries as a Single Mapping

The collections.ChainMap data structure groups multiple dictionaries into a single mapping. Lookups search the underlying mappings one by one until a key is found. Insertions, updates, and deletions only affect the first mapping added to the chain:

In [7]:
import collections
dict1 = {'one':1, 'two':2}
dict2 = {'three':3, 'four':4}
chain = collections.ChainMap(dict1, dict2)

print(chain)

print(chain['four'])

ChainMap({'one': 1, 'two': 2}, {'three': 3, 'four': 4})
4
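
The cell above only performs lookups. A short sketch of the write behavior described earlier, where insertions, updates, and deletions touch only the first mapping, might look like this:

```python
import collections

dict1 = {"one": 1, "two": 2}
dict2 = {"three": 3, "four": 4}
chain = collections.ChainMap(dict1, dict2)

chain["four"] = 40   # written to dict1, even though "four" already lives in dict2
chain["five"] = 5    # new keys also land in dict1

print(dict1)          # {'one': 1, 'two': 2, 'four': 40, 'five': 5}
print(dict2)          # {'three': 3, 'four': 4}
print(chain["four"])  # 40, because dict1 is searched first

try:
    del chain["three"]  # "three" exists only in dict2, so the deletion fails
except KeyError:
    print("cannot delete a key that is not in the first mapping")
```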


### types.MappingProxyType: A Wrapper for Making Read-Only Dictionaries

MappingProxyType is a wrapper around a standard dictionary that provides a read-only view into the wrapped dictionary’s data. This class was added in Python 3.3 and can be used to create immutable proxy versions of dictionaries.

In [8]:
import types
writable = {'one':1, 'two':2}
read_only = types.MappingProxyType(writable)

print(read_only)

read_only['one'] = 11

{'one': 1, 'two': 2}


TypeError: 'mappingproxy' object does not support item assignment
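
Because the proxy is a view rather than a copy, changes made through the original dictionary remain visible. The sketch below illustrates this, and shows one possible way (an assumption about what you may want, not something the article prescribes) to get a snapshot that later writes cannot reach:

```python
import types

writable = {"one": 1, "two": 2}
read_only = types.MappingProxyType(writable)

writable["one"] = 11       # mutate the underlying dict directly
print(read_only["one"])    # 11: the proxy reflects the change

# Wrapping a private copy yields a snapshot nobody else holds a reference to.
frozen_copy = types.MappingProxyType(dict(writable))
writable["two"] = 22
print(frozen_copy["two"])  # 2: the copy is unaffected
print(read_only["two"])    # 22
```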

## Array Data Structures

### list: Mutable Dynamic Arrays

### tuple: Immutable Containers

### array.array: Basic Typed Arrays

Python’s array module provides space-efficient storage of basic C-style data types like bytes, 32-bit integers, floating-point numbers, and so on.

Arrays created with the array.array class are mutable and behave similarly to lists except for one important difference: they’re typed arrays constrained to a single data type.

In [9]:
import array
arr = array.array("f", (1.0, 2.3, 4.2, 6.7))

print(arr)

arr.append("Hello")

array('f', [1.0, 2.299999952316284, 4.199999809265137, 6.699999809265137])


TypeError: must be real number, not str
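
The rounded-looking values in the output above come from the `"f"` type code, which stores C single-precision floats. A small sketch comparing it with `"d"` (C doubles) is shown below; exact item sizes are platform-dependent, typically 4 and 8 bytes:

```python
import array

floats = array.array("f", (1.0, 2.3, 4.2, 6.7))
doubles = array.array("d", (1.0, 2.3, 4.2, 6.7))

# itemsize is the number of bytes used per element (platform-dependent).
print(floats.typecode, floats.itemsize)
print(doubles.typecode, doubles.itemsize)

# The raw buffer holds exactly itemsize bytes per element, with no per-object overhead.
print(len(floats.tobytes()) == floats.itemsize * len(floats))  # True

print(doubles[1] == 2.3)  # True: a Python float is a C double
print(floats[1] == 2.3)   # False: 2.3 was rounded to single precision
```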

### str: Immutable Arrays of Unicode Characters

### bytes: Immutable Arrays of Single Bytes

In [13]:
arr = bytes((1,2,3,4,5))
print(arr)

print(bytes((0,290)))

b'\x01\x02\x03\x04\x05'


ValueError: bytes must be in range(0, 256)

### bytearray: Mutable Arrays of Single Bytes

The bytearray type is a mutable sequence of integers in the range 0 ≤ x ≤ 255. The bytearray object is closely related to the bytes object, with the main difference being that a bytearray can be modified freely—you can overwrite elements, remove existing elements, or add new ones. The bytearray object will grow and shrink accordingly.
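
A brief sketch of those operations (overwrite, remove, add), plus a conversion back to an immutable bytes object:

```python
arr = bytearray((0, 1, 2, 3))
arr[1] = 23       # overwrite an element
print(arr)        # bytearray(b'\x00\x17\x02\x03')

del arr[0]        # remove an element; the array shrinks
arr.append(42)    # add an element; the array grows
print(arr)        # bytearray(b'\x17\x02\x03*')
print(len(arr))   # 4

frozen = bytes(arr)  # creates an immutable copy of the data
print(frozen)        # b'\x17\x02\x03*'

try:
    arr[0] = 256     # values outside 0..255 are rejected
except ValueError as exc:
    print(exc)
```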

## Records, Structs, and Data Transfer Objects

In [15]:
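import dis

# dis.dis() prints the disassembly itself and returns None, so no print() is needed.
# The tuple literal is loaded as a single constant, while the list is built at run time.
dis.dis(compile("(23, 'a', 'b', 'c')", "", "eval"))
dis.dis(compile("[23, 'a', 'b', 'c']", "", "eval"))

  1           0 LOAD_CONST               0 ((23, 'a', 'b', 'c'))
              2 RETURN_VALUE
  1           0 BUILD_LIST               0
              2 LOAD_CONST               0 ((23, 'a', 'b', 'c'))
              4 LIST_EXTEND              1
              6 RETURN_VALUE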


### dataclasses.dataclass: Python 3.7+ Data Classes

In [17]:
from dataclasses import dataclass

@dataclass
class Car:
    color: str
    mileage: float
    automatic: bool

car1 = Car("red", 3812.4, True)
print(car1)

Car(color='red', mileage=3812.4, automatic=True)


### collections.namedtuple: Convenient Data Objects

In [2]:
from collections import namedtuple
from sys import getsizeof

p1 = namedtuple("Point", "x y z")(1, 2, 3)
p2 = (1, 2, 3)

print(getsizeof(p1))
print(getsizeof(p2))
print(p1)

64
64
Point(x=1, y=2, z=3)


### typing.NamedTuple: Improved Namedtuples

It’s very similar to namedtuple, with the main difference being an updated syntax for defining new record types and added support for type hints.

In [5]:
from typing import NamedTuple

class Car(NamedTuple):
    color: str
    mileage: float
    automatic: bool

car1 = Car("red", 3812.4, True)
print(car1)
print(car1.mileage)

Car(color='red', mileage=3812.4, automatic=True)
3812.4


### struct.Struct: Serialized C Structs

Serialized structs are seldom used to represent data objects meant to be handled purely inside Python code. They’re intended primarily as a data exchange format rather than as a way of holding data in memory that’s only used by Python code.

In [9]:
from struct import Struct
from sys import getsizeof

MinhaStruct = Struct("i?f")
data = MinhaStruct.pack(23,False,43.9)

print(data)
print(getsizeof(data))

print(MinhaStruct.unpack(data))
print(getsizeof(MinhaStruct.unpack(data)))

b'\x17\x00\x00\x00\x00\x00\x00\x00\x9a\x99/B'
45
(23, False, 43.900001525878906)
64


### types.SimpleNamespace: Fancy Attribute Access

In [12]:
from types import SimpleNamespace
car1 = SimpleNamespace(color="red", mileage=3812.4, automatic=True)

print(car1.color)
car1.windshield = "broken"

red


### Records, Structs, and Data Objects in Python: Summary

As you’ve seen, there are quite a few options for implementing records or data objects. Which type should you use for data objects in Python? Generally, your decision will depend on your use case:

- If you have only a few fields, then using a plain tuple object may be okay if the field order is easy to remember or field names are superfluous. For example, think of an (x, y, z) point in three-dimensional space.

- If you need immutable fields, then plain tuples, collections.namedtuple, and typing.NamedTuple are all good options.

- If you need to lock down field names to avoid typos, then collections.namedtuple and typing.NamedTuple are your friends.

- If you want to keep things simple, then a plain dictionary object might be a good choice due to the convenient syntax that closely resembles JSON.

- If you need full control over your data structure, then it’s time to write a custom class with @property setters and getters.

- If you need to add behavior (methods) to the object, then you should write a custom class, either from scratch, or using the dataclass decorator, or by extending collections.namedtuple or typing.NamedTuple.

- If you need to pack data tightly to serialize it to disk or to send it over the network, then it’s time to read up on struct.Struct because this is a great use case for it!

If you’re looking for a safe default choice, then my general recommendation for implementing a plain record, struct, or data object in Python would be to use collections.namedtuple in Python 2.x and its younger sibling, typing.NamedTuple, in Python 3.
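
To make a few of these points concrete (immutable fields, locked-down field names, and added behavior), here is a minimal sketch built on typing.NamedTuple. The `Point3D` class and its `norm_squared` method are hypothetical names chosen for illustration:

```python
from typing import NamedTuple


class Point3D(NamedTuple):
    """Hypothetical record type, used only for illustration."""
    x: float
    y: float
    z: float = 0.0  # fields may have defaults

    def norm_squared(self) -> float:
        # Behavior can be added with ordinary methods.
        return self.x ** 2 + self.y ** 2 + self.z ** 2


p = Point3D(1.0, 2.0)
print(p)                 # Point3D(x=1.0, y=2.0, z=0.0)
print(p.norm_squared())  # 5.0

try:
    p.x = 10.0           # fields are read-only
except AttributeError:
    print("cannot assign to a NamedTuple field")

moved = p._replace(x=10.0)  # build a modified copy instead
print(moved)                # Point3D(x=10.0, y=2.0, z=0.0)
```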


## Sets and Multisets

A set is an unordered collection of objects that doesn’t allow duplicate elements. Typically, sets are used to quickly test a value for membership in the set, to insert or delete new values from a set, and to compute the union or intersection of two sets.

In a proper set implementation, membership tests are expected to run in fast O(1) time. Union, intersection, difference, and subset operations should take O(n) time on average. The set implementations included in Python’s standard library follow these performance characteristics.
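
The operations named above can be sketched with the built-in set type (results are sorted before printing only because sets have no defined order):

```python
vowels = {"a", "e", "i", "o", "u"}
letters = set("alice")            # builds {'a', 'l', 'i', 'c', 'e'}

print("e" in vowels)              # True: membership test
print(sorted(letters & vowels))   # intersection: ['a', 'e', 'i']
print(sorted(letters | vowels))   # union: ['a', 'c', 'e', 'i', 'l', 'o', 'u']
print(sorted(letters - vowels))   # difference: ['c', 'l']
print({"a", "e"} <= vowels)       # subset test: True

letters.add("x")                  # insert a value
letters.discard("a")              # delete a value (no error if it is missing)
print(sorted(letters))            # ['c', 'e', 'i', 'l', 'x']
```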

### frozenset: Immutable Sets

### collections.Counter: Multisets

The collections.Counter class in the Python standard library implements a multiset, or bag, type that allows elements in the set to have more than one occurrence.

In [14]:
from collections import Counter
inventory = Counter()

loot = {"sword": 1, "bread": 3}
inventory.update(loot)
print(inventory)

more_loot = {"sword": 1, "apple": 1}
inventory.update(more_loot)
print(inventory)

print(len(inventory))
print(sum(inventory.values()))

Counter({'bread': 3, 'sword': 1})
Counter({'bread': 3, 'sword': 2, 'apple': 1})
3
6


## Stacks (LIFOs)

### list: Simple, Built-In Stacks

Python’s lists are implemented as dynamic arrays internally, which means they occasionally need to resize the storage space for elements stored in them when elements are added or removed. The list over-allocates its backing storage so that not every push or pop requires resizing. As a result, you get an amortized O(1) time complexity for these operations.

The downside is that this makes their performance less consistent than the stable O(1) inserts and deletes provided by a linked list–based implementation (as you’ll see below with collections.deque). On the other hand, lists do provide fast O(1) time random access to elements on the stack, and this can be an added benefit.
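
A minimal sketch of a list used as a stack, pushing with `.append()` and popping with `.pop()` at the end of the list:

```python
stack = []
stack.append("eat")    # push
stack.append("sleep")
stack.append("code")

print(stack)           # ['eat', 'sleep', 'code']
print(stack.pop())     # 'code': last in, first out
print(stack[0])        # 'eat': O(1) random access by index
print(stack)           # ['eat', 'sleep']
```

Pushing and popping at the front instead (`insert(0, ...)` and `pop(0)`) would take O(n) time, because every other element has to be shifted.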

### collections.deque: Fast and Robust Stacks

The deque class implements a double-ended queue that supports adding and removing elements from either end in O(1) time (non-amortized). Because deques support adding and removing elements from either end equally well, they can serve both as queues and as stacks.

Python’s deque objects are implemented as doubly-linked lists, which gives them excellent and consistent performance for inserting and deleting elements but poor O(n) performance for randomly accessing elements in the middle of a stack.

In [26]:
from collections import deque
from random import shuffle

d = deque()
d.append('teste1')
d.append('teste2')
d.append('teste3')
d.append('teste4')

print(d)
print(shuffle(d))  # shuffle() reorders d in place and returns None
print(d)

deque(['teste1', 'teste2', 'teste3', 'teste4'])
None
deque(['teste1', 'teste3', 'teste2', 'teste4'])
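
The cell above does not exercise the stack and queue behavior itself. A short sketch of both ends in use:

```python
from collections import deque

d = deque()
d.append("a")
d.append("b")
d.append("c")

print(d.pop())      # 'c': popping from the right gives stack (LIFO) behavior
print(d.popleft())  # 'a': popping from the left gives queue (FIFO) behavior

d.appendleft("z")   # elements can also be added on the left
print(d)            # deque(['z', 'b'])
```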


### queue.LifoQueue: Locking Semantics for Parallel Computing



The LifoQueue stack implementation in the Python standard library is synchronized and provides locking semantics to support multiple concurrent producers and consumers.

Besides LifoQueue, the queue module contains several other classes that implement multi-producer, multi-consumer queues that are useful for parallel computing.
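
A single-threaded sketch of the LifoQueue interface is shown below. The locking only matters once several threads share the object, which this example does not attempt to demonstrate:

```python
from queue import Empty, LifoQueue

s = LifoQueue()
s.put("eat")
s.put("sleep")
s.put("code")

print(s.get())  # 'code'
print(s.get())  # 'sleep'
print(s.get())  # 'eat'

# A plain s.get() on an empty stack would block until another thread
# puts an item; get_nowait() raises queue.Empty instead.
try:
    s.get_nowait()
except Empty:
    print("stack is empty")
```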

## Queues (FIFOs)

Queues have a wide range of applications in algorithms and often help solve scheduling and parallel programming problems. A short and beautiful algorithm using a queue is breadth-first search (BFS) on a tree or graph data structure.
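
A minimal sketch of BFS using collections.deque as the FIFO queue follows. The `bfs` function and the adjacency-list `graph` are hypothetical examples, not code from the article:

```python
from collections import deque


def bfs(graph, start):
    """Return the nodes reachable from start, in breadth-first order."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()           # take the oldest pending node (FIFO)
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)   # newly found nodes wait at the back
    return visited


# Hypothetical graph as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}
print(bfs(graph, "a"))  # ['a', 'b', 'c', 'd']
```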

Scheduling algorithms often use priority queues internally. These are specialized queues. Instead of retrieving the next element by insertion time, a priority queue retrieves the highest-priority element. The priority of individual elements is decided by the queue based on the ordering applied to their keys. 
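
As a small sketch of that idea, the example below uses the standard library's heapq module with `(priority, name)` tuples. It assumes the convention that a lower number means a higher priority, and the task names are made up:

```python
import heapq

tasks = []
heapq.heappush(tasks, (2, "code"))
heapq.heappush(tasks, (1, "eat"))
heapq.heappush(tasks, (3, "sleep"))

order = []
while tasks:
    # heappop always returns the smallest tuple, regardless of insertion order.
    priority, name = heapq.heappop(tasks)
    order.append(name)
    print(priority, name)

# Printed order: 1 eat, 2 code, 3 sleep
```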

### list: Terribly Sloooow Queues

### collections.deque: Fast and Robust Queues

### queue.Queue: Locking Semantics for Parallel Computing

### multiprocessing.Queue: Shared Job Queues

## Priority Queues

### list: Manually Sorted Queues

### heapq: List-Based Binary Heaps

### queue.PriorityQueue: Beautiful Priority Queues