## Quick Guide to Performance and Parallel Programming

There are many ways to improve the performance of your Python code.
The first thing to determine is what is limiting your computation. It could be CPU
speed (unlikely), memory limitations (out-of-core computing), or data
transfer speed (waiting on data to arrive for processing). If your code is pure Python,
you can try running it with PyPy, an alternative Python implementation
that employs a just-in-time compiler. If your code does not experience a massive
speed-up with PyPy, then something external to the code is probably
slowing it down (e.g., disk access or network access). If PyPy doesn't make
sense because you are using many compiled modules that PyPy does not support,
there are many diagnostic tools available.
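One rough way to tell a CPU-bound step from one that is waiting on disk or network is to compare wall-clock time with CPU time for the same call. The sketch below uses only the standard library; `measure`, `busy_sum`, and `waiting` are hypothetical names made up for this illustration, and `time.sleep` merely stands in for real I/O.

```python
import time


def measure(func, *args):
    """Return (wall_seconds, cpu_seconds) for a single call of func(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def busy_sum(n):
    """CPU-bound stand-in: plain arithmetic in a Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def waiting(seconds):
    """Stand-in for waiting on disk or network: the CPU sits idle."""
    time.sleep(seconds)


for label, func, arg in [("busy_sum", busy_sum, 500_000), ("waiting", waiting, 0.2)]:
    wall, cpu = measure(func, arg)
    # A ratio near 1 suggests CPU-bound work; a ratio near 0 suggests waiting.
    print(f"{label}: wall={wall:.3f}s cpu={cpu:.3f}s cpu/wall={cpu / wall:.2f}")
```

When CPU time is close to wall time, a faster interpreter or more cores may help; when CPU time is a small fraction of wall time, the bottleneck is more likely outside the code.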

Python has its own built-in profiler, cProfile, which you can invoke from the command
line as follows:


```bash
python -m cProfile -o program.prof my_program.py
```
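The `-o` option writes the statistics to a binary file rather than printing them, so a second step is needed to read them. A minimal sketch with the standard-library `pstats` module follows. To stay self-contained it creates its own small profile file from a hypothetical `fib_last` function instead of relying on an existing `my_program.py`.

```python
import cProfile
import os
import pstats
import tempfile


def fib_last(n):
    """Return the n-th Fibonacci number (hypothetical workload for this sketch)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# Write a profile to disk, similar to what `python -m cProfile -o program.prof` produces.
prof_path = os.path.join(tempfile.mkdtemp(), "program.prof")
profiler = cProfile.Profile()
profiler.enable()
fib_last(20_000)
profiler.disable()
profiler.dump_stats(prof_path)

# Load the saved profile and print the five entries with the largest cumulative time.
stats = pstats.Stats(prof_path)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
```

With a real run, replacing `prof_path` with the path passed to `-o` should be enough.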

### Testing with the Fibonacci sequence

In [2]:
def fib(n):
    """Build the first n + 1 terms of the Fibonacci sequence and print the first one."""
    a, b = 0, 1
    i = 0
    fib_list = []
    fib_list.append(b)
    while i < n:
        a, b = b, a + b
        fib_list.append(b)
        i += 1
    # Print only the first term (always 1) to keep the output short.
    print(fib_list[0])

In [9]:
%time fib(100000)

1
CPU times: user 254 ms, sys: 162 ms, total: 416 ms
Wall time: 414 ms


In [5]:
%prun fib(100000)

1
 

         100040 function calls in 0.445 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.347    0.347    0.357    0.357 <ipython-input-2-6bb8f8030fb8>:1(fib)
        1    0.088    0.088    0.445    0.445 <string>:1(<module>)
   100001    0.010    0.000    0.010    0.000 {method 'append' of 'list' objects}
        3    0.000    0.000    0.000    0.000 socket.py:333(send)
        1    0.000    0.000    0.445    0.445 {built-in method builtins.exec}
        3    0.000    0.000    0.000    0.000 iostream.py:195(schedule)
        2    0.000    0.000    0.000    0.000 iostream.py:366(write)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        2    0.000    0.000    0.000    0.000 iostream.py:300(_is_master_process)
        3    0.000    0.000    0.000    0.000 {method 'acquire' of '_thread.lock' objects}
        3    0.000    0.000    0.000    0.000 threading.py:1104(is_alive)
     

In [6]:
488*10e-3/0.445

10.96629213483146

## multiprocessing

Python has a multiprocessing module that is part of the standard library. This makes it easy
to spawn child worker processes that can break off and individually process small
parts of a big job. However, it is still your responsibility as the programmer to figure
out how to distribute the data for your algorithm.
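As an illustration of what distributing the data can mean in practice, the sketch below splits an index range into contiguous chunks and hands each chunk to a worker. The helpers `split_range` and `sum_of_squares` are hypothetical names invented for this example, and the chunk and process counts are arbitrary.

```python
from multiprocessing import Pool


def split_range(start, stop, n_chunks):
    """Split range(start, stop) into at most n_chunks contiguous (lo, hi) pairs."""
    size, extra = divmod(stop - start, n_chunks)
    chunks = []
    lo = start
    for k in range(n_chunks):
        # The first `extra` chunks get one additional element.
        hi = lo + size + (1 if k < extra else 0)
        if hi > lo:
            chunks.append((lo, hi))
        lo = hi
    return chunks


def sum_of_squares(bounds):
    """Worker: process one chunk independently of all the others."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


if __name__ == "__main__":
    chunks = split_range(0, 1_000_000, 8)
    # The context manager shuts the pool down when the block exits.
    with Pool(processes=4) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    print(sum(partial_sums))
```

The worker is defined at module level so that it can be pickled and sent to the child processes, and the `if __name__ == "__main__":` guard keeps the children from re-running the pool setup on platforms that start workers by re-importing the script.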

In [10]:
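def fib(n):
    """Build the first n + 1 terms of the Fibonacci sequence and return the first one."""
    a, b = 0, 1
    i = 0
    fib_list = []
    fib_list.append(b)
    while i < n:
        a, b = b, a + b
        fib_list.append(b)
        i += 1
    # Return only the first term (always 1) so little data is sent back.
    return fib_list[0]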

In [12]:
%time test = [fib(i) for i in range(100000,100020,1)]

CPU times: user 5.38 s, sys: 3.42 s, total: 8.8 s
Wall time: 8.8 s


In [15]:
from multiprocessing import Pool

In [19]:
pool = Pool(processes=8)


In [20]:
%time test2 = pool.map(fib, range(100000,100020,1))

CPU times: user 4.96 ms, sys: 2.62 ms, total: 7.59 ms
Wall time: 2.2 s
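The two wall times reported above (8.8 s for the list comprehension, 2.2 s for `pool.map` with 8 processes) can be turned into a speedup and a parallel efficiency. The short sketch below does only that arithmetic; `speedup` and `efficiency` are hypothetical helper names. Note that the `CPU times` line for `pool.map` counts only the parent process, so it says nothing about the work done in the workers.

```python
def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run finished."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, n_workers):
    """Fraction of the ideal n_workers-fold speedup that was achieved."""
    return speedup(serial_seconds, parallel_seconds) / n_workers


serial_wall = 8.8    # wall time of the list comprehension, in seconds
parallel_wall = 2.2  # wall time of pool.map, in seconds
n_workers = 8        # Pool(processes=8)

print(f"speedup: {speedup(serial_wall, parallel_wall):.1f}x")
print(f"efficiency: {efficiency(serial_wall, parallel_wall, n_workers):.0%}")
```

This gives a speedup of about 4x and an efficiency of about 50%. The timings alone do not say why the remaining half is lost; the number of physical cores and the cost of starting and feeding the workers are plausible contributors.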


In [18]:
pool.terminate()

## dask


In [92]:
!pip install dask


In [21]:
%%time
test3 = []
for i in range(100000,100020,1):
    a = fib(i)
    test3.append(a)
test3

CPU times: user 5.02 s, sys: 2.8 s, total: 7.82 s
Wall time: 7.81 s


In [29]:
dask.__version__

'0.18.2'

In [22]:
import dask
from dask.distributed import Client, progress
client = Client()

In [32]:
test4 = []
for i in range(100000,100020,1):
    a = dask.delayed(fib)(i)
    test4.append(a)
res = dask.delayed(test4)  # wrap the list of delayed calls in one Delayed object

In [33]:
res

Delayed('list-def7df13-c7e1-4d06-b03d-67363433e83b')

Open http://localhost:8787/status (the dask.distributed dashboard) to see the progress.

In [28]:
%time res.compute()

CPU times: user 460 ms, sys: 41.5 ms, total: 501 ms
Wall time: 3.05 s


[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

In [34]:
res = res.persist()  # start computation in the background

In [35]:
res.compute()

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

