In [1]:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

# Recommended Materials

* *Python official documentation*
* *The Python 3 Standard Library by Example*

# Zen of today

* Don't reinvent the wheel.
* Learn by doing and by making mistakes.
* The question is not how, but why.
* Code reviewing.

# Topics

* What is the standard library?
* Data structure related libraries: enum, collections
* Functional programming related libraries: itertools, functools
* File system related libraries: glob, os.path (not covered today)
* Serialization: pickle, csv
* OS related libraries: Pool, subprocess, threading

# What is the Standard Library

From Wikipedia:

A standard library in computer programming is the library made available across implementations of a programming language. These libraries are conventionally described in programming language specifications; however, contents of a language's associated library may also be determined (in part or whole) by more informal practices of a language's community.

For a module to be included in the standard library, it should solve some classical computer science problem, e.g. sorting, or some common design or engineering problem.

NumPy and pandas are not part of the standard library: they are third-party packages that must be installed separately, and they target numerical and data-analysis work rather than general-purpose programming problems.

To fully utilize the power of the standard library, we need a good understanding of computer science. However, this is not a must for a quant. So the best practice is learning by doing: try to solve some real-world problems and then come back.

# Data Structure Related

## Enum

The enum module provides an implementation of an enumeration type, with iteration and comparison capabilities. It can be used to create well-defined symbols for values, instead of using literal strings or integers.

The good part is that we don't need to use integers or strings as return values.

### Simple creation / Iteration / Comparison

In [6]:
import enum


## creation
class BugStatus(enum.Enum):
    
    new  = 7
    incomplete = 6
    invalid = 5
    wont_fix = 4
    in_progress = 3
    fix_commited = 2
    fix_released = 1

In [3]:
print('\nMember name: {}'.format(BugStatus.wont_fix.name))
print('Member value: {}'.format(BugStatus.wont_fix.value))


Member name: wont_fix
Member value: 4


In [7]:
## iteration
for status in BugStatus:
    print("{:15} = {}".format(status.name, status.value))

new             = 7
incomplete      = 6
invalid         = 5
wont_fix        = 4
in_progress     = 3
fix_commited    = 2
fix_released    = 1


In [11]:
## comparison

actual_state = BugStatus.wont_fix
desired_state = BugStatus.fix_released

In [13]:
## equality test
print("Equality:", actual_state == desired_state, actual_state == BugStatus.wont_fix)

Equality: False True


In [14]:
## identity 
print("Identity:", actual_state is desired_state, actual_state is BugStatus.wont_fix)

Identity: False True


In [17]:
## can we compare?
try:
    print("\n".join(" " + s.name for s in sorted(BugStatus)))
except TypeError as err:
    print(" Cannot sort:{}".format(err))

 Cannot sort:'<' not supported between instances of 'BugStatus' and 'BugStatus'


In [18]:
class ReturnState(enum.Enum):
    
    success = 0
    fail = 1
    

In [20]:
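## a simple example using this design
def modify_first_elemnts(obj, elm):
    try:
        obj[0] = elm
        return ReturnState.success
    except (TypeError, IndexError):
        ## immutable or non-indexable objects raise TypeError; an empty list raises IndexError
        return ReturnState.fail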

In [21]:
obj1 = [3]
obj2 = (3,)  ## a one-element tuple; (3) without the comma is just the integer 3

In [22]:
res = modify_first_elemnts(obj1, 2)

In [27]:
## C example
## e_state modify_first_elements(void *obj, void *element)

In [23]:
res

<ReturnState.success: 0>

In [24]:
res = modify_first_elemnts(obj2, 2)

In [25]:
res

<ReturnState.fail: 1>

There is no need to return integers or strings. Plain strings are manageable when there are only a few states, but they become error-prone as the number of states grows.

An example in C:

https://www.man7.org/linux/man-pages/man3/shm_open.3.html

In [26]:
## states can be compared
class BugStatus(enum.IntEnum):
    new = 7
    incomplete = 6
    invalid = 5
    wont_fix = 4
    in_progress = 3
    fix_committed = 2
    fix_released = 1
    
print('Ordered by value:')
print('\n'.join(' ' + s.name for s in sorted(BugStatus)))

Ordered by value:
 fix_released
 fix_committed
 in_progress
 wont_fix
 invalid
 incomplete
 new


In [28]:
## non-unique enumeration
import enum
class BugStatus(enum.Enum):
    new = 7
    incomplete = 6
    invalid = 5
    wont_fix = 4
    in_progress = 3
    fix_committed = 2
    fix_released = 1
    
    
    by_design = 4
    closed = 1

for status in BugStatus:
    print('{:15} = {}'.format(status.name, status.value))
    
## by_design and closed are aliases of existing members, so iteration skips them
print('\nSame: by_design is wont_fix: ',
      BugStatus.by_design is BugStatus.wont_fix)
print('Same: closed is fix_released: ',
      BugStatus.closed is BugStatus.fix_released)

new             = 7
incomplete      = 6
invalid         = 5
wont_fix        = 4
in_progress     = 3
fix_committed   = 2
fix_released    = 1

Same: by_design is wont_fix:  True
Same: closed is fix_released:  True


In [29]:
## enforced uniqueness
@enum.unique
class BugStatus(enum.Enum):
    new = 7
    incomplete = 6
    invalid = 5
    wont_fix = 4
    in_progress = 3
    fix_committed = 2
    fix_released = 1
    # This will trigger an error with unique applied.
    by_design = 4
    closed = 1

ValueError: duplicate values found in <enum 'BugStatus'>: by_design -> wont_fix, closed -> fix_released

In [33]:
## creating an enumeration programmatically from a space-separated string of names

names_str = 'fix_released fix_committed in_progress wont_fix invalid incomplete new'

BugStatus = enum.Enum(
    value = "BugStatus",
    names = (names_str)
)

In [34]:
print('Member: {}'.format(BugStatus.new))
print('\nAll members:')
for status in BugStatus:
    print('{:15} = {}'.format(status.name, status.value))

Member: BugStatus.new

All members:
fix_released    = 1
fix_committed   = 2
in_progress     = 3
wont_fix        = 4
invalid         = 5
incomplete      = 6
new             = 7


In [35]:
## or pass a list of (name, value) tuples

BugStatus = enum.Enum(
    value='BugStatus',
    names=[
        ('new', 7),
        ('incomplete', 6),
        ('invalid', 5),
        ('wont_fix', 4),
        ('in_progress', 3),
        ('fix_committed', 2),
        ('fix_released', 1),
    ],
)

print('All members:')
for status in BugStatus:
    print('{:15} = {}'.format(status.name, status.value))

All members:
new             = 7
incomplete      = 6
invalid         = 5
wont_fix        = 4
in_progress     = 3
fix_committed   = 2
fix_released    = 1


When to use an enumeration?
* The variable can take only a finite set of possible values.
* Named members make the code more readable.

Read:
https://stackoverflow.com/questions/22586895/python-enum-when-and-where-to-use

Enumeration values are not limited to integers; for more complex values, see the references.
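
As a minimal sketch of non-integer values, the enumeration below uses strings. The Currency class and its members are hypothetical names chosen for illustration. It also shows the two lookup forms: call syntax finds a member by value, index syntax finds it by name.

```python
import enum


class Currency(enum.Enum):
    # Hypothetical members; the values are strings instead of integers.
    usd = "US dollar"
    eur = "euro"
    jpy = "Japanese yen"


# Look up a member by value (call syntax) or by name (index syntax).
by_value = Currency("euro")
by_name = Currency["jpy"]

print(by_value, by_name.value)
```

Looking up a value that no member has, for example Currency("pound"), raises ValueError, which is a convenient way to validate input.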

## Collections

Data structures beyond list, tuple, and dict. They can be very useful in algorithmic interviews.

### ChainMap

Good to use as a context container. It is basically a list of dictionaries; a good mental model is to think of it as a sequence of dictionaries that are searched in order.

It is a good way to deal with command-line-like structures.

For example:

In [43]:
ls -a -l

total 36
drwxr-xr-x 3 matthew matthew  4096 Aug 13 23:46  ./
drwxr-xr-x 4 matthew matthew  4096 Aug  9 11:35  ../
-rw-r--r-- 1 matthew matthew   712 Aug  9 12:26 'Data Structure.ipynb'
drwxr-xr-x 2 matthew matthew  4096 Aug  9 12:23  .ipynb_checkpoints/
-rw-r--r-- 1 matthew matthew 17357 Aug 13 23:46  session5.ipynb


Using a ChainMap, we can store settings that come from different layers. Examples will be provided later.

In [37]:
## accessing values
import collections

a = {'a': 'A', 'c': 'C'}
b = {"b": "B", "c": "D"}

m = collections.ChainMap(a ,b)

In [38]:
m

ChainMap({'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})

It looks like a tuple of dictionaries:

In [44]:
(a, b)

({'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})

In [46]:
print("Individual Values")
print("a = {}".format(m["a"]))
print("b = {}".format(m["b"]))
print("c = {}".format(m["c"]))

Individual Values
a = A
b = B
c = C


Note that only the "c" from the first dictionary is shown. A lookup searches the dictionaries one by one, in order, and returns the first match.

In [47]:
## get values and keys
print("Key = {}".format(list(m.keys())))
print("Values = {}".format(list(m.values())))

Key = ['b', 'c', 'a']
Values = ['B', 'C', 'A']


Again, only the value from the first dictionary is returned.

In [48]:
## iteration
for k,v in m.items():
    print("{} = {}".format(k, v))

b = B
c = C
a = A


In [49]:
## check membership
"d" in m

False

In [50]:
## reordering
print(m.maps)

[{'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'}]


In [51]:
print(m["c"])

C


In [54]:
## reverse the search order on a copy
m_new = m.copy()
m_new.maps = list(reversed(m.maps))

In [56]:
m_new.maps

[{'b': 'B', 'c': 'D'}, {'a': 'A', 'c': 'C'}]

In [57]:
m_new["c"]

'D'

In [58]:
## updating values: the ChainMap holds references to the underlying dictionaries, not cached copies

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}
m = collections.ChainMap(a, b)

In [59]:
m

ChainMap({'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})

In [60]:
a["c"] = "E"

In [61]:
m

ChainMap({'a': 'A', 'c': 'E'}, {'b': 'B', 'c': 'D'})

In [62]:
## modifying values through m
a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}
m = collections.ChainMap(a, b)

m["c"] = "E"

In [63]:
a

{'a': 'A', 'c': 'E'}

In [64]:
b

{'b': 'B', 'c': 'D'}

a is modified and b stays unchanged: writes through a ChainMap always go to the first dictionary.

In [65]:
## new child. Most powerful feature of chained-map
a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}
m1 = collections.ChainMap(a, b)
m2 = m1.new_child()

In [66]:
m1

ChainMap({'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})

In [67]:
m2

ChainMap({}, {'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})

In [68]:
m2["c"] = "E"

In [69]:
m1

ChainMap({'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})

In [70]:
m2

ChainMap({'c': 'E'}, {'a': 'A', 'c': 'C'}, {'b': 'B', 'c': 'D'})

Why don't we just use an ordinary dictionary? Check:

https://stackoverflow.com/questions/23392976/what-is-the-purpose-of-collections-chainmap

Also, check the example script.
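
The layered-settings idea can be sketched as follows. This is only an illustration: the three dictionaries and their keys are hypothetical stand-ins for built-in defaults, a configuration file, and parsed command-line flags.

```python
import collections

# Hypothetical settings layers, from lowest to highest priority.
defaults = {"color": "auto", "long_format": False, "show_hidden": False}
config_file = {"color": "always"}
command_line = {"long_format": True, "show_hidden": True}  # e.g. parsed from `ls -a -l`

# The first mapping wins, so list the highest-priority layer first.
settings = collections.ChainMap(command_line, config_file, defaults)

# A temporary override lives in a new child and leaves the other layers untouched.
scoped = settings.new_child({"color": "never"})

print(settings["color"], settings["long_format"])
print(scoped["color"], defaults["color"])
```

Compared with merging everything into one dictionary, each layer stays intact, so it is still possible to tell where a value came from or to drop a layer later.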

### Counter

A Counter is a container that keeps track of how many times equivalent values are added.

In [84]:
## construction: passing a sequence container
collections.Counter(["a", "b", "c", "a", "b", "b"])

Counter({'a': 2, 'b': 3, 'c': 1})

In [85]:
## second way, passing dictionary
collections.Counter({"a": 2, "b":3, "c":1})

Counter({'a': 2, 'b': 3, 'c': 1})

In [86]:
## third way, passing arguments
collections.Counter( a = 2, b =3 , c = 1)

Counter({'a': 2, 'b': 3, 'c': 1})

In [90]:
## updates
c = collections.Counter()

In [91]:
c

Counter()

In [92]:
c.update("abcdaab")

In [93]:
c

Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})

In [94]:
c.update({"a" : 1, "d": 5})  ## note: update adds to the existing counts instead of replacing them

In [95]:
c

Counter({'a': 4, 'b': 2, 'c': 1, 'd': 6})

In [96]:
c.update({"a" : -1})

In [97]:
c

Counter({'a': 3, 'b': 2, 'c': 1, 'd': 6})

Negative numbers are accepted as well.

In [99]:
c.update({"a": 0.34})

In [100]:
c

Counter({'a': 3.34, 'b': 2, 'c': 1, 'd': 6})

Fractional numbers are accepted as well.

In [101]:
import collections

c = collections.Counter("abcdaab")

for letter in "abcde":
    print("{} : {}".format(letter, c[letter]))

a : 3
b : 2
c : 1
d : 1
e : 0


If a key is not present in the counter, its count is reported as 0 instead of raising a KeyError.

In [104]:
## access elements
c = collections.Counter("extremely")
c["z"] = 0
print(c)
print(list(c.elements()))  

Counter({'e': 3, 'x': 1, 't': 1, 'r': 1, 'm': 1, 'l': 1, 'y': 1, 'z': 0})
['e', 'e', 'e', 'x', 't', 'r', 'm', 'l', 'y']


Elements are returned in the order they were first encountered (Python 3.7 and later), and elements with a count less than or equal to 0 are not returned.

In [111]:
## count most common 
c = collections.Counter()
with open("./program_news.txt", "rt") as f:
    for line in f:
        print(line)
        c.update(line.rstrip().lower()) ## strip trailing whitespace and lowercase before counting letters


for letter, count in c.most_common(3):
    print("{}:{:>7}".format(letter, count))

Baruch MFE won the 2020 9th IAQF Student Competition; news article from Baruch MFE.

Baruch MFE won the 2020 Rotman International Trading Competition; news article from Baruch College.

The 2019 Baruch MFE 5th Year Career Development Report can be found here. Featured in The Wall Street Journal in December 2019.

Employment Statistics (December 2019 – May 2020)

Placement Rate: 30 of 30

Starting Salary: High 160K;  Low 95K;  Median 120K;  Average 122K

First Year Guaranteed Compensation: High 235K;  Low 95K;  Median 140K;  Average 147K

Employers (by type): Hedge Funds/Prop Trading/Asset Management: 50%, Investment Banks: 40%, FinTech/Tech: 10%

Employers (some with multiple hires; selected): AQR, Bank of America, Barclays, Beacon Platform, BlackRock, Citadel, Credit Suisse, Cubist, Goldman Sachs, IMC Trading, Millennium, Morgan Stanley, Point72, Quantitative Brokers, Societe Generale, Squarepoint Capital, TD Securities, UBS

Location: New York: 90%, US (Other) 10%

Internship Statist
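
Since the text file is not part of this notebook, here is a self-contained sketch of most_common on an in-memory string. The sample sentence is made up for illustration, and it counts words rather than letters.

```python
import collections

# Hypothetical in-memory text standing in for the file used above.
text = "the cat and the hat and the bat"

word_counts = collections.Counter(text.split())

# most_common(n) returns the n highest counts as (item, count) pairs.
top_two = word_counts.most_common(2)
for word, count in top_two:
    print("{}:{:>3}".format(word, count))
```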

In [115]:
## arithmetic
c1 = collections.Counter(["a", "b", "c", "a", "b", "b"])
c2 = collections.Counter("alphabet")

In [116]:
c1

Counter({'a': 2, 'b': 3, 'c': 1})

In [117]:
c2

Counter({'a': 2, 'l': 1, 'p': 1, 'h': 1, 'b': 1, 'e': 1, 't': 1})

In [118]:
## add
c1 + c2

Counter({'a': 4, 'b': 4, 'c': 1, 'l': 1, 'p': 1, 'h': 1, 'e': 1, 't': 1})

Addition keeps every key from both counters, much like a SQL full outer join.

In [122]:
## union: take the maximum count of each key
c1 | c2

Counter({'a': 2, 'b': 3, 'c': 1, 'l': 1, 'p': 1, 'h': 1, 'e': 1, 't': 1})

In [123]:
## intersection: take the minimum count of each key
c1 & c2

Counter({'a': 2, 'b': 1})

In [125]:
## subtraction: zero and negative counts are dropped from the result
c1 - c2

Counter({'b': 2, 'c': 1})

### Defaultdict

A generalization of the dict object: the behavior for missing keys can be specified when it is constructed.

In [127]:
def default_factor():
    return "default value"

d = collections.defaultdict(default_factor, foo = "bar")
d

defaultdict(<function __main__.default_factor()>, {'foo': 'bar'})

In [128]:
d["foo"]

'bar'

In [129]:
d["bar"]

'default value'
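
A common use of this behavior is grouping, sketched below with list as the default factory. The word list is made up for illustration.

```python
import collections

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Calling list() with no arguments returns [], so every missing key starts as an empty list.
by_initial = collections.defaultdict(list)
for word in words:
    by_initial[word[0]].append(word)

print(dict(by_initial))
```

Note that merely reading a missing key, such as by_initial["z"], inserts that key with the default value; use the in operator to test for membership without creating an entry.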

### Deque

A powerful data structure for algorithmic interviews. It has the properties of both a queue and a stack. I recommend using it for all queue and stack questions, so you don't have to remember the APIs of several different data structures.

In computer science, a stack or a queue is often used to store work to be done, as we will see later.

In [130]:
## basic operations
d = collections.deque("abcdefg")

In [132]:
## looks like a list
d

deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In [133]:
len(d)

7

In [134]:
d[2]

'c'

In [135]:
d[0]

'a'

In [136]:
d[-1]

'g'

In [138]:
## slicing is not supported
d[1:5]

TypeError: sequence index must be integer, not 'slice'

In [154]:
## extending
d1 = collections.deque()
d1.extend("abcdefg")
d1

deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In [140]:
d1.append("h")

In [141]:
d1

deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])

In [149]:
## add to left
d2 = collections.deque()
d2.extendleft(range(6))

In [150]:
d2

deque([5, 4, 3, 2, 1, 0])

In [151]:
d2.appendleft(6)

In [152]:
d2

deque([6, 5, 4, 3, 2, 1, 0])

In [153]:
## consuming from the right
d = collections.deque("abcdefg")
while True:
    try:
        print(d.pop(), end = "")
    except IndexError:
        break

gfedcba

In [156]:
## consuming from the left
d = collections.deque(range(6))
while True:
    try:
        print(d.popleft(), end='')
    except IndexError:
        break

012345

This is a classic multithreading pattern called the boss/worker model. The main program serves as the boss and assigns tasks to the workers, which are different threads. We will discuss it in more detail later.
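
A minimal sketch of the boss/worker idea with a deque as the task store follows. It assumes CPython, where deque.popleft and list.append are atomic operations; the squaring task and the choice of three workers are arbitrary, and production code would more commonly use queue.Queue.

```python
import collections
import threading

# The boss fills the task store before the workers start.
tasks = collections.deque(range(10))
results = []


def worker():
    # Each worker takes tasks from the left until the deque is empty.
    while True:
        try:
            n = tasks.popleft()
        except IndexError:
            break
        results.append(n * n)


workers = [threading.Thread(target=worker) for _ in range(3)]
for t in workers:
    t.start()
for t in workers:
    t.join()

# Completion order is not guaranteed, so sort before displaying.
print(sorted(results))
```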

In [157]:
## rotation

d = collections.deque(range(10))
print("Normal :", d)

d = collections.deque(range(10))
d.rotate(2)
print("Right rotation",d)

d = collections.deque(range(10))
d.rotate(-2)
print("Left rotation", d)

Normal : deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Right rotation deque([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
Left rotation deque([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])


### Named Tuple / Ordered Dictionary

In [164]:
## named tuple, looks like a struct in C

Person = collections.namedtuple("Person", "name age")

bob = Person(name = "Bob", age = 30)
print("\nRepresentation:", bob)

## by index
print("{} is {} years old".format(*bob))


Representation: Person(name='Bob', age=30)
Bob is 30 years old


In [165]:
print(bob.age, bob.name)

30 Bob


Very similar to struct in C.
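
One difference from a C struct is that a named tuple is immutable. The sketch below shows what happens on assignment and how _replace builds a modified copy instead.

```python
import collections

Person = collections.namedtuple("Person", "name age")
bob = Person(name="Bob", age=30)

# Fields cannot be reassigned; attempting it raises AttributeError.
try:
    bob.age = 31
    mutated = True
except AttributeError:
    mutated = False

# _replace returns a new tuple with the given fields changed.
older_bob = bob._replace(age=31)

print(mutated, bob.age, older_bob.age)
print(older_bob._asdict())
```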

In [184]:
## ordered dict
d = collections.OrderedDict()
d['a'] = "A"
d['b'] = "B"
d['c'] = "C"
for k, v in d.items():
    print(k, v)

a A
b B
c C


Does anyone find this data structure similar to something we discussed earlier?

### Array

It is essentially a C array: a compact sequence whose elements all share one numeric type. If you want to impress an interviewer you can use this data structure, since it stores standard numeric arrays more compactly than a Python list.

In [191]:
## initialization 
import array
import pprint

a = array.array('i', range(3))
print("Initial :", a)

Initial : array('i', [0, 1, 2])
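
The type code is enforced on every element, which is the main practical difference from a list. A short sketch, assuming the same 'i' (signed int) type code as above:

```python
import array

a = array.array("i", range(3))

# Every element must fit the type code; 'i' accepts only integers.
try:
    a.append(1.5)
    accepted_float = True
except TypeError:
    accepted_float = False

a.append(3)
print(a, accepted_float)
print("bytes per element:", a.itemsize)  # platform-dependent
```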


### Heapq

A module that implements a heap. Does anyone know what a heap is?

In [203]:
## sample data and a helper function that prints a heap as a tree
data = [19, 9, 4, 10, 11]

import math
import heapq
from io import StringIO
def show_tree(tree, total_width=36, fill=' '):
    """Pretty-print a tree."""
    output = StringIO()
    last_row = -1
    for i, n in enumerate(tree):
        if i:
            row = int(math.floor(math.log(i + 1, 2)))
        else:
            row = 0
        if row != last_row:
            output.write('\n')
        columns = 2 ** row
        col_width = int(math.floor(total_width / columns))
        output.write(str(n).center(col_width, fill))
        last_row = row
    print(output.getvalue())
    print('-' * total_width)
    print()

In [204]:
## creating a heap
heap = []
print(data)

[19, 9, 4, 10, 11]


In [205]:
for n in data:
    print("add {:>3}".format(n))
    heapq.heappush(heap, n)
    show_tree(heap)

add  19

                 19                 
------------------------------------

add   9

                 9                  
        19        
------------------------------------

add   4

                 4                  
        19                9         
------------------------------------

add  10

                 4                  
        10                9         
    19   
------------------------------------

add  11

                 4                  
        10                9         
    19       11   
------------------------------------



In [206]:
heap

[4, 10, 9, 19, 11]

Note that even though a heap is conceptually a tree-like structure, it can be represented as an array.

In [207]:
## heapify an existing list in place
data

[19, 9, 4, 10, 11]

In [208]:
heapq.heapify(data)

In [209]:
data

[4, 9, 19, 10, 11]

In [210]:
show_tree(data)


                 4                  
        9                 19        
    10       11   
------------------------------------



heapify rearranges the list in place so that it satisfies the min-heap property.

In [211]:
## accessing the content of the heap
for i in range(2):
    smallest = heapq.heappop(data)
    print("pop {:>3}".format(smallest))
    show_tree(data)

pop   4

                 9                  
        10                19        
    11   
------------------------------------

pop   9

                 10                 
        11                19        
------------------------------------



In [213]:
## push 
for i in range(4):
    print("push {:>3}".format(i))
    heapq.heappush(data, i)
    show_tree(data)

push   0

                 0                  
        10                19        
    11   
------------------------------------

push   1

                 0                  
        1                 19        
    11       10   
------------------------------------

push   2

                 0                  
        1                 2         
    11       10       19   
------------------------------------

push   3

                 0                  
        1                 2         
    11       10       19       3    
------------------------------------



# Functional programming related

## Functools

Manipulating functions as objects.

In [11]:
import functools
def myfunc(a, b=2):
    "Docstring for myfunc()."
    print(' called myfunc with:', (a, b))

def show_details(name, f, is_partial=False):
    "Show details of a callable object."
    print('{}:'.format(name))
    print(' object:', f)
    if not is_partial:
        print(' __name__:', f.__name__)
    if is_partial:
        print(' func:', f.func)
        print(' args:', f.args)
        print(' keywords:', f.keywords)
    return
show_details('myfunc', myfunc)
myfunc('a', 3)
print()

# Set a different default value for 'b', but require
# the caller to provide 'a'.
p1 = functools.partial(myfunc, b=4)
show_details('partial with named default', p1, True)
p1('passing a')
p1('override b', b=5)
print()
# Set default values for both 'a' and 'b'.
p2 = functools.partial(myfunc, 'default a', b=99)
show_details('partial with defaults', p2, True)
p2()
p2(b='override b')
print()
print('Insufficient arguments:')
p1()

myfunc:
 object: <function myfunc at 0x7f12945b90d0>
 __name__: myfunc
 called myfunc with: ('a', 3)

partial with named default:
 object: functools.partial(<function myfunc at 0x7f12945b90d0>, b=4)
 func: <function myfunc at 0x7f12945b90d0>
 args: ()
 keywords: {'b': 4}
 called myfunc with: ('passing a', 4)
 called myfunc with: ('override b', 5)

partial with defaults:
 object: functools.partial(<function myfunc at 0x7f12945b90d0>, 'default a', b=99)
 func: <function myfunc at 0x7f12945b90d0>
 args: ('default a',)
 keywords: {'b': 99}
 called myfunc with: ('default a', 99)
 called myfunc with: ('default a', 'override b')

Insufficient arguments:


TypeError: myfunc() missing 1 required positional argument: 'a'

In [13]:
?myfunc

In [14]:
??myfunc
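
A more practical use of functools.partial is specializing an existing function. The sketch below fixes the base argument of the built-in int; parse_binary is a name chosen for illustration.

```python
import functools

# Fix the keyword argument base=2 of the built-in int().
parse_binary = functools.partial(int, base=2)

print(parse_binary("1010"))

# A partial object can be passed anywhere a function is expected.
print(list(map(parse_binary, ["1", "10", "11"])))
```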

## Itertools

Different kinds of iterations:

In [17]:
## container
from itertools import *
for i in chain([1, 2, 3], ['a', 'b', 'c']):
    print(i, end=' ')

1 2 3 a b c 

In [18]:
## iterables
from itertools import *
def make_iterables_to_chain():
    yield [1, 2, 3]
    yield ['a', 'b', 'c']
for i in chain.from_iterable(make_iterables_to_chain()):
    print(i, end=' ')

1 2 3 a b c 

In [19]:
## zip
for i in zip([1, 2, 3], ['a', 'b', 'c']):
    print(i)

(1, 'a')
(2, 'b')
(3, 'c')


In [20]:
## zip with three iterables
for i in zip([1, 2, 3], ['a', 'b', 'c'], [5,5,6]):
    print(i)

(1, 'a', 5)
(2, 'b', 5)
(3, 'c', 6)


In [21]:
## zip_longest
from itertools import zip_longest

r1 = range(3)
r2 = range(2)
print('zip stops early:')
print(list(zip(r1, r2)))
r1 = range(3)
r2 = range(2)
print('\nzip_longest processes all of the values:')
print(list(zip_longest(r1, r2)))

zip stops early:
[(0, 0), (1, 1)]

zip_longest processes all of the values:
[(0, 0), (1, 1), (2, None)]


# OS related

Introduction: reading and writing data, and efficient computation.

## Thread vs. Process

Read: https://stackoverflow.com/questions/200469/what-is-the-difference-between-a-process-and-a-thread

Use top to illustrate.

Visualization:

http://insidejvmjava.blogspot.com/2018/12/process-vs-threads.html

Operating system:

https://www.tutorialspoint.com/operating_system/os_overview.htm

Concurrency:

https://techdifferences.com/difference-between-concurrency-and-parallelism.html

Hardware:

https://www.oreilly.com/library/view/designing-embedded-hardware/0596007558/ch01.html

Read basic system processor.

## Execution of threads

Revisit the deque example, but with the sleep removed.

Check the three thread scripts.

## Python GIL

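In CPython, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, which keeps the interpreter's internals thread-safe.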

Read:
https://en.wikipedia.org/wiki/Global_interpreter_lock

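As a result, Python threads do not speed up CPU-bound code. However, Python is a good scripting language, so we can use it to call code written in other languages that runs its own threads, as the NumPy timing below suggests (CPU time well above wall time).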
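
Threads are still useful when the work is mostly waiting. The sketch below uses time.sleep as a stand-in for a blocking I/O call, since sleeping releases the GIL; the 0.3 second wait and the four threads are arbitrary choices.

```python
import threading
import time


def wait_for_io(seconds):
    # time.sleep stands in for a blocking I/O call; it releases the GIL.
    time.sleep(seconds)


start = time.perf_counter()
threads = [threading.Thread(target=wait_for_io, args=(0.3,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The four waits overlap, so the total should be far below 4 * 0.3 = 1.2 s.
print("elapsed: {:.2f} s".format(elapsed))
```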

In [6]:
## numpy multithreading example
import numpy as np
A = np.eye(10000)

In [7]:
%%time
X = np.linalg.inv(A)

CPU times: user 54.1 s, sys: 1 s, total: 55.1 s
Wall time: 13.9 s


## Pool

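An easy-to-use multiprocessing model. In the timings below, two worker processes are barely faster than the plain loop, most likely because sending 50 million integers to the workers and back costs more than squaring them.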

In [241]:
from multiprocessing import Pool

In [214]:
def squared(x):
    return x**2

In [270]:
array1 = list(range(50000000))

In [271]:
%%time
res = [squared(x) for x in array1]

CPU times: user 11.3 s, sys: 360 ms, total: 11.7 s
Wall time: 11.7 s


In [273]:
%%time
pool  = Pool(2)
res = pool.map(squared, array1)
pool.close()
pool.join()

CPU times: user 4.13 s, sys: 1.94 s, total: 6.06 s
Wall time: 11 s


## Interprocess Communication

Read:

https://en.wikipedia.org/wiki/Inter-process_communication

Shared memory:

https://www.tutorialspoint.com/inter_process_communication/inter_process_communication_shared_memory.htm

Use cases:
* Service/client model
* Examples: shared memory, sockets, remote procedure calls

Why do we need it?

* Service/client model and service-oriented architecture.

Check:

https://docs.python.org/3/library/multiprocessing.shared_memory.html

This requires Python 3.8 or later.
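
A minimal sketch of the shared memory API follows. For brevity both handles live in the same process; a second process would attach by name in the same way. The 16-byte size and the payload are arbitrary.

```python
from multiprocessing import shared_memory

# Create a 16-byte block; the operating system assigns it a unique name.
writer = shared_memory.SharedMemory(create=True, size=16)
writer.buf[:5] = b"hello"

# A second handle attaches by name. Here it lives in the same process for
# brevity; another process would attach the same way.
reader = shared_memory.SharedMemory(name=writer.name)
received = bytes(reader.buf[:5])
print(received)

# Every handle must be closed, and the block unlinked exactly once.
reader.close()
writer.close()
writer.unlink()
```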

# Serialization

What is it?

Serialization is the process of turning an object in memory into a stream of bytes so you can do stuff like store it on disk or send it over the network.

Deserialization is the reverse process: turning a stream of bytes into an object in memory.

Why do we need it?

In [9]:
import pickle
import pprint

data1 = [{'a': 'A', 'b': 2, 'c': 3.0}]
print('BEFORE: ', end=' ')

pprint.pprint(data1)
data1_string = pickle.dumps(data1)
data2 = pickle.loads(data1_string)

print('AFTER : ', end=' ')
pprint.pprint(data2)
print('SAME? :', (data1 is data2))
print('EQUAL?:', (data1 == data2))

BEFORE:  [{'a': 'A', 'b': 2, 'c': 3.0}]
AFTER :  [{'a': 'A', 'b': 2, 'c': 3.0}]
SAME? : False
EQUAL?: True


In [10]:
data1_string

b'\x80\x03]q\x00}q\x01(X\x01\x00\x00\x00aq\x02X\x01\x00\x00\x00Aq\x03X\x01\x00\x00\x00bq\x04K\x02X\x01\x00\x00\x00cq\x05G@\x08\x00\x00\x00\x00\x00\x00ua.'
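
To store an object on disk rather than in a bytes object, pickle.dump and pickle.load work with an open binary file. The sketch below writes to a temporary directory; the file name data.pkl is a hypothetical choice.

```python
import os
import pickle
import tempfile

data = [{"a": "A", "b": 2, "c": 3.0}]

# Write the object to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(data, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == data, restored is data)
```

Only unpickle data from a source you trust, because loading a pickle can execute arbitrary code.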