# Introduction to efficient Python CPU programming

### Welcome to the SURF Jupyter Hub! 

JupyterHub provides a multiuser environment for Jupyter notebooks. The SURF JupyterHub service facilitates external courses, e.g. programming courses, and runs on our Lisa cluster.

Each course receives its own instance of JupyterHub. Users who log in will be provided with their own Jupyter Notebook Server, where they can create, download, upload and run notebooks. The service provides functionality for teachers to easily share notebooks, data and installations with their students.

For more information on the Hub and how to use it, please refer to:
https://servicedesk.surfsara.nl/wiki/display/WIKI/JupyterHub+for+education

### What is a Jupyter notebook?

The Jupyter Notebook is an open source web application that you can use to create and share documents that contain live code, equations, visualizations, and rich text (Markdown). 

Jupyter Notebook is maintained by the people at [Project Jupyter](https://jupyter.org/).

JupyterHub gives users access to computational environments and resources without burdening them with installation and setup tasks: notebooks are spawned remotely and you connect to them from your local PC.

This Jupyter notebook is running on a compute node on Lisa. Check it out!


#### With Jupyter we can do both rich text and code

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [2]:
!lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          24
On-line CPU(s) list:             0-23
Thread(s) per core:              1
Core(s) per socket:              12
Socket(s):                       2
NUMA node(s):                    4
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
Stepping:                        4
CPU MHz:                         2601.472
BogoMIPS:                        4600.00
Virtualization:                  VT-x
L1d cache:                       768 KiB
L1i cache:                       768 KiB
L2 cache:                        24 MiB
L3 cache:                        33 MiB
NUMA node0 CPU(s):               0,4,8,12,16,20
NUM

### Jupyter has some "magic" built-in commands

Here's how to list them:

In [5]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%

To know more about each %magic command, you can use:

In [25]:
%autoawait?

If you are interested in the source code of the command:

In [17]:
%autoawait??

### Let's see some of the more interesting "magic" builtin functions:

How to show the value of an environment variable set in the underlying Bash shell:

In [4]:
%env TEACHER_DIR

'/project/jhlsrf018'

You can also modify environment variables from a cell:

In [2]:
%env OMP_NUM_THREADS=16

env: OMP_NUM_THREADS=16


In [23]:
!export OMP_NUM_THREADS=16
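The two cells above are not equivalent: `%env` changes the environment of the running kernel, while `!export` runs in a short-lived child shell, so the setting is gone as soon as that shell exits. The sketch below reproduces both behaviours in plain Python. It assumes a POSIX `sh` is available, and `DEMO_THREADS` is a made-up variable name used only for illustration.

```python
import os
import subprocess

# Hypothetical variable name, used only for this demonstration.
VAR = "DEMO_THREADS"
os.environ.pop(VAR, None)

# Like `!export DEMO_THREADS=16`: the export happens in a child shell that
# exits immediately, so the environment of this process is unchanged.
subprocess.run(["sh", "-c", f"export {VAR}=16"], check=True)
seen_after_export = os.environ.get(VAR)

# Like `%env DEMO_THREADS=16`: this changes the current process itself,
# and child processes started afterwards inherit the value.
os.environ[VAR] = "16"
child_value = subprocess.run(
    ["sh", "-c", f"echo ${VAR}"], capture_output=True, text=True, check=True
).stdout.strip()

print(seen_after_export)  # None
print(child_value)        # 16
```

For settings such as `OMP_NUM_THREADS` that must be visible to code running in the kernel, `%env` (or `os.environ`) is therefore the safer choice.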

You can actually run BASH programs

In [26]:
%%bash
echo "Hello 1"
sleep 5
echo "Hello 2"

Hello 1
Hello 2


And write files directly from within the cell

In [9]:
%%writefile hello_world.py
if __name__ == "__main__":
    print("Hello World!")

Writing hello_world.py


And you can execute Python programs inside the notebook

In [10]:
%run ./hello_world.py

Hello World!


In [11]:
!python hello_world.py

Hello World!


## Examples of lightweight profiling for your code

 -  **%timeit** A very useful magic function (especially for this course!)
 -  **time** (module) This module provides various time-related functions.
 -  **cProfile** (module) This module is recommended for most users; it’s a C extension with reasonable overhead that makes it suitable for profiling long-running programs. Based on lsprof, contributed by Brett Rosen and Ted Czotter.

In [30]:
import random

a = [random.random() for i in range(100)]
b = [random.random() for i in range(100)]

%timeit a+b

429 ns ± 5.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
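Outside a notebook, a similar measurement can be made with the standard `timeit` module. This is only a sketch of the idea, not a description of how `%timeit` works internally; the repeat and loop counts are arbitrary choices.

```python
import random
import timeit

a = [random.random() for _ in range(100)]
b = [random.random() for _ in range(100)]

# Run the statement `loops` times per measurement and repeat the measurement 5 times.
loops = 100_000
totals = timeit.repeat("a + b", globals={"a": a, "b": b}, repeat=5, number=loops)

# Each entry in `totals` is the time for all loops; divide to get seconds per loop.
best_per_loop = min(totals) / loops
print(f"best of {len(totals)}: {best_per_loop * 1e9:.0f} ns per loop")
```

Reporting the minimum rather than the mean is a common convention here, since slower runs are usually caused by other activity on the machine.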


Let's use it now for something more interesting...

![matmul_diagram](https://upload.wikimedia.org/wikipedia/commons/e/eb/Matrix_multiplication_diagram_2.svg)

![Matmul_eqn](https://miro.medium.com/max/674/1*NFxuJRT-GltSryxfW_hdgw.png)

In [8]:
# Simple matrix multiplication with explicit Python loops
import numpy as np
from random import random

def explicit_matmul(A,B):
    C = [[0 for x in range(len(B[0]))] for y in range(len(A))]  # len(A) rows, len(B[0]) columns
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                C[i][j] += A[i][k] * B[k][j]
    return C

# Matrix multiplication with NumPy
def numpy_matmul(A,B):
    npA = np.array(A)
    npB = np.array(B)
    C = np.matmul(npA,npB)  # multiply the converted arrays, not the original lists
    return C

#Set matrix dimension
AX=AY=BX=BY=200

#Define Matrix A
A = [[random() for x in range(AX)] for y in range(AY)]

#Define Matrix B
B = [[random() for x in range(BX)] for y in range(BY)]

#And now time the two runs and compare the results
print("Explicit:")
%timeit explicit_matmul(A,B)

print("numpy:")
%timeit numpy_matmul(A,B)

Explicit:
1.51 s ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
numpy:
10.5 ms ± 44.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
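Timings are only meaningful if the result is correct, and the shape of the result matters once the matrices are not square: an (n x m) matrix times an (m x p) matrix gives an (n x p) result. The sketch below checks a triple-loop implementation against a product worked out by hand. `matmul_lists` is a name chosen here for illustration, not one of the notebook's functions.

```python
def matmul_lists(A, B):
    """Triple-loop product of an (n x m) and an (m x p) list-of-lists matrix."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]  # result has n rows and p columns
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C

A_small = [[1, 2, 3],
           [4, 5, 6]]      # 2 x 3
B_small = [[7, 8],
           [9, 10],
           [11, 12]]       # 3 x 2

C_small = matmul_lists(A_small, B_small)
print(C_small)  # [[58, 64], [139, 154]]
```

For example, the top-left entry is 1*7 + 2*9 + 3*11 = 58.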


In [37]:
import time 

start = time.perf_counter_ns()
explicit_matmul(A,B)
end = time.perf_counter_ns()

print(f"Time of function execution is {end - start} ns")

         10304 function calls in 0.204 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.001    0.000    0.001    0.000 896216640.py:11(<listcomp>)
        1    0.203    0.203    0.204    0.204 896216640.py:9(explicit_matmul)
    10202    0.001    0.000    0.001    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Time of function execution is 205996428 ns
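When several pieces of code need to be timed this way, the start/stop bookkeeping can be wrapped in a small context manager. This is a minimal sketch; `timed` is a hypothetical helper defined here, not part of the standard library.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Store the elapsed wall-clock time of the `with` body in results[label] (ns)."""
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        # Runs even if the body raises, so a timing is always recorded.
        results[label] = time.perf_counter_ns() - start

timings = {}
with timed("sum of squares", timings):
    total = sum(i * i for i in range(100_000))

for label, elapsed_ns in timings.items():
    print(f"{label}: {elapsed_ns / 1e6:.3f} ms")
```

Unlike `%timeit`, this measures a single run, so results will fluctuate more from one execution to the next.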


With Python we can also profile our functions using the "cProfile" module

In [38]:
import cProfile

cProfile.run('explicit_matmul(A,B)') #By default the run method prints to the std out


         10304 function calls in 0.198 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.001    0.000    0.001    0.000 896216640.py:11(<listcomp>)
        1    0.196    0.196    0.198    0.198 896216640.py:9(explicit_matmul)
    10202    0.001    0.000    0.001    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


         4 function calls in 0.200 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.199    0.199 4044252659.py:4(profiled_func)
        1    0.000    0.000    0.199    0.199 <string>:1(<module>)
        1    0.000    0.000    0.200    0.200 {built-in method builtins.exec}
        1    0.199    0.199    0.199    0.199 {method 'enable' of '_lsprof.Profiler' objects}




In [18]:
cProfile.run('explicit_matmul(A,B)',"my_perf_file.out") #With a filename, the stats are saved to that file instead of being printed

Now we can investigate the profiling data a little more closely with the **pstats.Stats class**

In [29]:
import pstats
from pstats import SortKey

p = pstats.Stats('my_perf_file.out')  #read in the profile data

#you can sort by the internal time
p.sort_stats(SortKey.TIME)
p.print_stats()

#you can sort by the number of calls
p.sort_stats(SortKey.CALLS)
p.print_stats()

#you can reverse the order
p.reverse_order()
p.print_stats()


Thu Nov 17 11:31:26 2022    my_perf_file.out

         40606 function calls in 1.532 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.525    1.525    1.532    1.532 /tmp/ipykernel_13713/433218466.py:5(explicit_matmul)
      200    0.003    0.000    0.003    0.000 /tmp/ipykernel_13713/433218466.py:6(<listcomp>)
    40402    0.003    0.000    0.003    0.000 {built-in method builtins.len}
        1    0.000    0.000    1.533    1.533 <string>:1(<module>)
        1    0.000    0.000    1.533    1.533 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Thu Nov 17 11:31:26 2022    my_perf_file.out

         40606 function calls in 1.532 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    40402    0.003    0.000    0.003    0.000 {built-in method builtins.len}
      200    0.003    

<pstats.Stats at 0x15389dbd6910>
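Full reports get long quickly. `pstats.Stats` can shorten file paths, limit the number of rows, and write to a stream other than standard output. The sketch below profiles a toy function (`busy`, a stand-in invented for this example) and captures a five-row report in a string.

```python
import cProfile
import io
import pstats
from pstats import SortKey

def busy(n):
    """Toy workload standing in for the function you want to profile."""
    return sum(len(str(i)) for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
busy(50_000)
profiler.disable()

# Send the report to a string buffer instead of standard output.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# strip_dirs() shortens file paths; print_stats(5) limits the report to 5 rows.
stats.strip_dirs().sort_stats(SortKey.CUMULATIVE).print_stats(5)

report = buffer.getvalue()
print(report)
```

Capturing the report as text makes it easy to save alongside other results or to search for a specific function name.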

cProfile can also be invoked as a script to profile another script:

```python -m cProfile [-o output_file] [-s sort_order] (-m module | myscript.py)```
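As a rough illustration of the command-line form, the sketch below writes a tiny throwaway script to a temporary directory and profiles it with `-s tottime`. The script name and contents are invented for the example; in practice you would point the command at your own script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical script contents, written to a temporary directory for the demo.
script_source = "print(sum(i * i for i in range(10_000)))\n"

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo_script.py"
    script.write_text(script_source)
    # Same as running: python -m cProfile -s tottime demo_script.py
    completed = subprocess.run(
        [sys.executable, "-m", "cProfile", "-s", "tottime", str(script)],
        capture_output=True, text=True, check=True,
    )

# The script's own output comes first, followed by the profile table.
print(completed.stdout)
```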


If you want to use cProfile quickly in your Jupyter notebook, you can define the following function

In [31]:
import cProfile

def do_profile(func):
    def profiled_func(*args, **kwargs):
        profile = cProfile.Profile()
        try:
            profile.enable()
            return func(*args, **kwargs)
        finally:
            # Always stop profiling and print the report, even if func raises
            profile.disable()
            profile.print_stats()
    return profiled_func

By decorating functions with ```@do_profile```, we can then analyse the behaviour of our code

In [36]:
# Matrix multiplication with NumPy
@do_profile
def numpy_matmul(A,B):
    npA = np.array(A)
    npB = np.array(B)
    C = np.matmul(npA,npB)  # multiply the converted arrays, not the original lists
    return C

@do_profile
def explicit_matmul(A,B):
    C = [[0 for x in range(len(B[0]))] for y in range(len(A))]  # len(A) rows, len(B[0]) columns
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                C[i][j] += A[i][k] * B[k][j]
    return C

#Set matrix dimension
AX=AY=BX=BY=100

#Define Matrix A
A = [[random() for x in range(AX)] for y in range(AY)]

#Define Matrix B
B = [[random() for x in range(BX)] for y in range(BY)]

res = numpy_matmul(A,B)

res = explicit_matmul(A,B)

         4 function calls in 0.010 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.008    0.008    0.010    0.010 896216640.py:2(numpy_matmul)
        2    0.002    0.001    0.002    0.001 {built-in method numpy.array}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


         10304 function calls in 0.190 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.000    0.000    0.000    0.000 896216640.py:11(<listcomp>)
        1    0.189    0.189    0.190    0.190 896216640.py:9(explicit_matmul)
    10202    0.001    0.000    0.001    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
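To profile only a few lines rather than a whole function, `cProfile.Profile` can also be used as a context manager (Python 3.8 and later). A minimal sketch, with `workload` as a stand-in function invented for this example:

```python
import cProfile
import pstats
from pstats import SortKey

def workload(n):
    """Toy workload: build and sort a list of strings."""
    return sorted(str(i) for i in range(n))

# Profiling is enabled on entering the block and disabled on leaving it.
with cProfile.Profile() as profiler:
    result = workload(20_000)

# Show the three entries with the largest internal time.
stats = pstats.Stats(profiler)
stats.sort_stats(SortKey.TIME).print_stats(3)
```

Compared with a decorator, this leaves the function definition untouched, which is convenient when the code under test lives in a module you do not want to edit.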




### References
* https://jupyter.org/
* https://ipython.readthedocs.io/en/stable/interactive/magics.html
* https://numpy.org/doc/stable/contents.html
* https://scipy-cookbook.readthedocs.io/items/ParallelProgramming.html
* https://docs.python.org/3/library/profile.html
