# Advanced Python - Building Scalable Applications

### Module 5

#### Sharing and Exchanging data between processes
 - Streaming data using ```Pipe``` and ```Queue```
 - Sharing counters and buffers using ```Value``` and ```Array```
 - Sharing Python lists and dictionaries using ```Manager```
 - Creating and managing shared memory using ```multiprocessing.shared_memory``` features

#### Profiling and Debugging Techniques in Python
 - Using `sys.getsizeof()`, `sys.getrefcount()`, `sys.getswitchinterval()`
 - Using `cProfile` and `timeit` modules
 - Using `line_profiler` and `Memray`
 - Using `inspect` and `pdb`
 - Using the `logging` module

In [None]:
from queue import Queue

q = Queue(10)


In [1]:
from multiprocessing import Queue

Queue?

Signature: Queue(maxsize=0)
Docstring: Returns a queue object
File:      /opt/anaconda3/lib/python3.12/multiprocessing/context.py
Type:      method

In [8]:
from multiprocessing import Pipe

r, w = Pipe()
print(r, w)

w.send("Hello, world")
r.recv()

<multiprocessing.connection.Connection object at 0x109a50050> <multiprocessing.connection.Connection object at 0x109a52b40>


'Hello, world'

In [6]:
# The pseudo-implementation of Pipe() construct:

class Connection:
    def __init__(self, queue):
        self.queue = queue
    
    def send(self, data):
        self.queue.append(data)

    def recv(self):
        return self.queue.popleft()

class Pipe:
    def __new__(cls, *args, **kwargs):
        from collections import deque
        queue = deque()
        c1 = Connection(queue)
        c2 = Connection(queue)
        return c1, c2

r, w = Pipe()

w.send("Hello world")

print(r.recv())


Hello world


#### Queue vs Pipe in multiprocessing module

- ```Queue``` acts as the multiprocessing equivalent of ```queue.Queue```: it offers the same ```put()``` / ```get()``` interface, but works across processes.
- Use ```Queue``` for capacity-limited streaming between processes (producer/consumer patterns using processes).

- ```Pipe``` is Python's abstraction over an OS-level channel between two endpoints (an anonymous pipe or a socket pair, depending on the platform and the ```duplex``` flag).
- Use ```Pipe``` for streaming data from one process to another, where buffering and flow control are managed by the OS.
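
The two patterns above can be sketched with real processes. This is a minimal illustration rather than a reference implementation: the function names and the `SENTINEL` end-of-stream marker are hypothetical, and the `fork` start method is an assumption (Linux/macOS) chosen so that the code also runs from a notebook cell.

```python
import multiprocessing as mp

# Assumption: a Unix-like OS where the "fork" start method is available.
ctx = mp.get_context("fork")

SENTINEL = None  # hypothetical end-of-stream marker


def producer(q, n):
    """Put n items on the queue, then signal the end of the stream."""
    for i in range(n):
        q.put(i)  # blocks while the queue is full (capacity limit)
    q.put(SENTINEL)


def squares_via_queue(n=10, capacity=3):
    """Consume items produced by a child process through a bounded Queue."""
    q = ctx.Queue(capacity)
    p = ctx.Process(target=producer, args=(q, n))
    p.start()
    results = []
    while (item := q.get()) is not SENTINEL:
        results.append(item * item)
    p.join()
    return results


def sender(conn, items):
    """Send each item over the write end of the pipe, then close it."""
    for item in items:
        conn.send(item)
    conn.close()


def collect_via_pipe(items):
    """Receive items sent by a child process through a one-way Pipe."""
    r, w = ctx.Pipe(duplex=False)  # r can only receive, w can only send
    p = ctx.Process(target=sender, args=(w, items))
    p.start()
    w.close()  # the parent keeps only the read end
    received = []
    try:
        while True:
            received.append(r.recv())
    except EOFError:  # raised once every copy of the write end is closed
        pass
    p.join()
    return received
```

Calling `squares_via_queue(5, capacity=2)` returns `[0, 1, 4, 9, 16]`, and `collect_via_pipe(["a", "b", "c"])` returns the same three items in order. The queue version needs an explicit end marker, while the pipe version relies on `EOFError` once the writer closes its end.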

#### ```Value``` class in multiprocessing module

- ```Value``` can be used to share a single fixed-width C value (e.g. an int or a float, selected by a typecode) amongst processes.

In [21]:
from multiprocessing import Value

v = Value("b", 129)  # "b" is a signed char (-128..127), so 129 wraps around to -127
v.value

-127
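
A shared counter is the typical use of ```Value```. The sketch below is one possible way to do it, assuming the `fork` start method (Linux/macOS); `add_many` and `count_with_processes` are hypothetical names used only for illustration.

```python
import multiprocessing as mp

# Assumption: a Unix-like OS where the "fork" start method is available.
ctx = mp.get_context("fork")


def add_many(counter, times):
    """Increment the shared counter `times` times."""
    for _ in range(times):
        # `+= 1` is a read-modify-write, so it is guarded by the Value's lock.
        with counter.get_lock():
            counter.value += 1


def count_with_processes(workers=4, times=1000):
    """Let several processes increment one shared C int and return the total."""
    counter = ctx.Value("i", 0)  # "i" = signed C int, initial value 0
    procs = [ctx.Process(target=add_many, args=(counter, times))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value
```

With the lock held around each increment, `count_with_processes(4, 500)` returns `2000`. Without `get_lock()`, concurrent increments can overwrite each other and the total may come out lower.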

In [15]:
n = 3468237462378642387462378462387462384762347823648723647823647823647823647823462378423784
print(n)

3468237462378642387462378462387462384762347823648723647823647823647823647823462378423784


The ```array.array``` is a sequence of *homogeneous* data in Python that implements the buffer protocol (that is, its data is laid out in contiguous memory locations).


In [28]:
from array import array

a = array('i', [111, 22, 33, 44, 55, 66])
print(a, type(a))

print(a[0], a[-1])

a[1] = 45
print(a)
a.append(100)
print(a)
print(a.pop())

a[1] = 4.5  # raises TypeError: an array of typecode 'i' only accepts integers

array('i', [111, 22, 33, 44, 55, 66]) <class 'array.array'>
111 66
array('i', [111, 45, 33, 44, 55, 66])
array('i', [111, 45, 33, 44, 55, 66, 100])
100


TypeError: 'float' object cannot be interpreted as an integer

The ```Array``` class in the multiprocessing module is the shared-memory equivalent of ```array.array```, with a fixed length.
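
As a rough sketch of how a shared ```Array``` might be used, the example below lets several processes square disjoint slices of one buffer in place. The helper names are hypothetical, and the `fork` start method is an assumption (Linux/macOS).

```python
import multiprocessing as mp

# Assumption: a Unix-like OS where the "fork" start method is available.
ctx = mp.get_context("fork")


def square_slice(shared, start, stop):
    """Square the elements shared[start:stop] in place."""
    for i in range(start, stop):
        shared[i] = shared[i] * shared[i]


def square_in_parallel(numbers, workers=2):
    """Square a list of ints using a shared Array split across processes."""
    shared = ctx.Array("i", numbers)  # fixed-length shared array of C ints
    n = len(numbers)
    chunk = max(1, (n + workers - 1) // workers)  # slice size per process
    procs = [ctx.Process(target=square_slice,
                         args=(shared, start, min(start + chunk, n)))
             for start in range(0, n, chunk)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return shared[:]  # slicing copies the shared data into a regular list
```

`square_in_parallel([1, 2, 3, 4, 5])` returns `[1, 4, 9, 16, 25]`. The child processes write straight into the shared buffer, so nothing has to be sent back to the parent.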


In [29]:
from multiprocessing import Manager

m = Manager()
m

<multiprocessing.managers.SyncManager at 0x10d881be0>

In [35]:
d = m.dict()
d["name"] = "John Doe"
d["score"] = 6.7
d["games"] = ["game-1", "game-2"]

dict(d)

print(d.keys(), d.values())

for k, v in d.items():
    print(k, v)

['name', 'score', 'games'] ['John Doe', 6.7, ['game-1', 'game-2']]
name John Doe
score 6.7
games ['game-1', 'game-2']


In [38]:
a = {"x": 10, "y": 60, "c": 254}
print(a, type(a))

b = m.dict(a)
print(b, type(b))

{'x': 10, 'y': 60, 'c': 254} <class 'dict'>
{'x': 10, 'y': 60, 'c': 254} <class 'multiprocessing.managers.DictProxy'>


In [41]:
values = [44, 55, [66, 77], (33, 44), "hello"]
print(values, type(values))

v = m.list(values)
print(v, type(v))

v[0] = 123
v.append([55, 66, 77, 88])
print(v)

[44, 55, [66, 77], (33, 44), 'hello'] <class 'list'>
[44, 55, [66, 77], (33, 44), 'hello'] <class 'multiprocessing.managers.ListProxy'>
[123, 55, [66, 77], (33, 44), 'hello', [55, 66, 77, 88]]


##### ```multiprocessing.Manager``` to create a shared `list` or `dict`
  - Provides a dictionary or a list that is shared amongst multiple processes.
  - Though this resembles shared memory, under the hood the list and dict live in a separate manager process; every access through a proxy is serialized and streamed across processes.

NOTE: Use this for scenarios where:
   1. Updates are generally done by one process while the other processes only read.
   2. Updates do not happen at a rapid rate.
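
The following sketch shows a manager dictionary being updated from child processes, and one consequence of the proxy design: a plain list stored inside the dict comes back as a local copy, so mutating it in place is not propagated. The names `record_score` and `manager_demo` are hypothetical, and the `fork` start method is an assumption (Linux/macOS).

```python
import multiprocessing as mp

# Assumption: a Unix-like OS where the "fork" start method is available.
ctx = mp.get_context("fork")


def record_score(shared, name, score):
    """Store one entry; the update is sent to the manager process."""
    shared[name] = score


def manager_demo():
    with ctx.Manager() as manager:
        d = manager.dict()
        procs = [ctx.Process(target=record_score, args=(d, name, score))
                 for name, score in [("alice", 6.7), ("bob", 8.1)]]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        scores = dict(d)  # copy the shared contents into a local dict

        d["games"] = ["game-1"]
        d["games"].append("game-2")  # mutates a local copy only
        lost = list(d["games"])

        games = d["games"]
        games.append("game-2")
        d["games"] = games  # reassigning pushes the change to the manager
        kept = list(d["games"])
    return scores, lost, kept
```

`manager_demo()` returns `({'alice': 6.7, 'bob': 8.1}, ['game-1'], ['game-1', 'game-2'])`. The in-place `append` is silently lost, whereas reading the list, changing it and assigning it back goes through the proxy.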
   

In [42]:
import sys
sys.stdin, sys.stdout, sys.stderr

(<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>,
 <ipykernel.iostream.OutStream at 0x107e066b0>,
 <ipykernel.iostream.OutStream at 0x108c52590>)

In [43]:
sys.maxsize

9223372036854775807

In [44]:
a = 100  # a = int(100)
print(sys.getsizeof(a))

28


In [48]:
a = list(range(100000))
print(sys.getsizeof(a))

800056


In [50]:
a = [10, 20, 56, 78, 99]
sys.getsizeof(a) + sum(sys.getsizeof(x) for x in a)

244

In [55]:
d = { str(v): v*v for v in range(1_000_000) }
print(len(d))

d["65536"]

sys.getsizeof(d)

1000000


30758320
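
The cells above show that `sys.getsizeof()` is shallow: it reports the container's own footprint, not the objects it refers to. A recursive helper can approximate the full size. The sketch below is one possible approach under simplifying assumptions: `deep_getsizeof` is a hypothetical name, and only the built-in containers are traversed.

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Approximate the size in bytes of obj plus the objects it contains."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # count an object referenced twice only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size
```

For `a = [10, 20, 56, 78, 99]` this gives the same number as the manual `sys.getsizeof(a) + sum(sys.getsizeof(x) for x in a)` calculation, and it also follows nested lists and dictionaries.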

In [None]:
sys.getrefcount
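
`sys.getrefcount()` returns the number of references to an object. In CPython the value is generally one higher than expected, because the argument passed to the call is itself a temporary reference, so comparing two readings is more informative than a single absolute number. A small sketch (the helper name `refcount_delta` is hypothetical):

```python
import sys


def refcount_delta(extra_refs=3):
    """Show how the reference count moves as references are added and dropped."""
    obj = object()
    before = sys.getrefcount(obj)
    holders = [obj] * extra_refs  # each list slot holds one more reference
    after = sys.getrefcount(obj)
    del holders  # dropping the list releases those references
    restored = sys.getrefcount(obj)
    return after - before, restored - before
```

`refcount_delta(3)` returns `(3, 0)`: three extra references while the list exists, and none once it is deleted.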

In [62]:
sys.getswitchinterval()

0.049999999999999996

In [61]:
sys.setswitchinterval(0.05)

# Reduce the switch interval for more responsive threads (they react to events sooner).
# Note that a smaller switch interval means more frequent thread switches, which can heavily reduce throughput.

# Increase the switch interval for better throughput, at the cost of responsiveness.
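
Because the switch interval is a process-wide setting, it can be convenient to change it only around a specific piece of work and restore it afterwards. The context manager below is a minimal sketch of that idea; `switch_interval` is a hypothetical helper, not part of the standard library.

```python
import sys
from contextlib import contextmanager


@contextmanager
def switch_interval(seconds):
    """Temporarily set the thread switch interval, then restore the old value."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield
    finally:
        sys.setswitchinterval(previous)  # always restore the old setting


# Usage: run a latency-sensitive section with a shorter interval.
with switch_interval(0.001):
    current = sys.getswitchinterval()  # close to 0.001 inside the block
```

The value read back may differ very slightly from the one that was set (as in the `0.049999999999999996` output above), so compare with a tolerance rather than with `==`.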
