# Profiling: Finding bottlenecks in Python programs
**Contributors**: Simon Funke

*"First make it work. Then make it right. Then make it fast."* 
  
  Kent Beck

## Profiling allows us to measure resources used by sections of the program. 

Typical resources of interest are
* CPU
* Memory
* Disk I/O
* Network I/O
* ...

We focus on **CPU** profiling only.

## CPU: start simple, switch to more complex techniques if needed!

1. Manual timing
2. timeit module
3. Profiling

## Case study: filling a grid with point values

* Consider a rectangular 2D grid
<center>![lattice](pdf/grid_lattice.svg "Python")Grid lattice</center>
* Goal: a NumPy array `a[i,j]` that holds values at the grid points

# An implementation

In [4]:
from numpy import *

class Grid2D(object):
    def __init__(self,
                 xmin=0, xmax=1, dx=0.5,
                 ymin=0, ymax=1, dy=0.5):
        
        self.xcoor = arange(xmin, xmax+dx, step=dx)
        self.ycoor = arange(ymin, ymax+dy, step=dy)

    def gridloop(self, f):
        lx = size(self.xcoor)
        ly = size(self.ycoor)
        a = zeros((lx,ly))

        for i in range(lx):
            x = self.xcoor[i]
            for j in range(ly):
                y = self.ycoor[j]
                a[i,j] = f(x, y)
        return a

# Usage

Creating a new grid

In [5]:
g = Grid2D(dx=0.001, dy=0.001)

Computing grid values

In [6]:
def myfunc(x, y):
    return sin(x*y) + y

print("Computing grid values...")
a = g.gridloop(myfunc)
print("done")

Computing grid values...
done


# Manual timing

Use `time.time()` to measure the time spent in a code section.
```python
import time
t0 = time.time()
# execute code here
t1 = time.time()
print("Runtime: {}".format(t1-t0))
```
  

**Tips**:
* Simple statements should be placed in a loop.
* Make sure to have no other expensive programs running.
* Run the tests several times, choose the fastest. **Why?**
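
As a minimal sketch of these tips, the snippet below places a cheap statement in a loop, repeats the measurement a few times, and keeps the fastest run. The helper name `best_of` and the workload `cheap_statement` are illustrative assumptions. `time.perf_counter()` is used here instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring short intervals.

```python
import time

def best_of(func, repeats=5, loops=100_000):
    """Return the fastest total time (in seconds) over `repeats` runs of `loops` calls."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        for _ in range(loops):
            func()
        t1 = time.perf_counter()
        timings.append(t1 - t0)
    return min(timings)

def cheap_statement():
    # Hypothetical workload: too fast to time reliably with a single call
    return 3.0 * 4.0 + 1.0

fastest = best_of(cheap_statement)
print("Fastest run: {:.4} s for 100000 calls".format(fastest))
```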

## Python challenge

1. Download [grid.py (link)](https://raw.githubusercontent.com/UiO-INF3331/UiO-INF3331.github.io/crash-course/lectures/profiling/code/grid.py) and run it. Which parts of the program might be slow? Consider the calls made in the main function first.
2. Time the suspected slow code lines and compare to the total runtime.

## Timing of `grid.py`

Two parts that could potentially be slow: 
1. The initialisation `Grid2D(dx=0.001, dy=0.001)`
2. Calling the `g.gridloop(myfunc)` function.

We time these two parts separately to figure out how much time is spent in each.

### Timing the Grid2D initialisation

In [7]:
import time

for i in range(1, 4):
    t0 = time.time()
    g = Grid2D(dx=0.001, dy=0.001)
    t1 = time.time()
    print("Experiment {}, CPU time: {:.4} s".format(i, t1-t0))
print("Done")

Experiment 1, CPU time: 0.000124 s
Experiment 2, CPU time: 4.244e-05 s
Experiment 3, CPU time: 3.648e-05 s
Done


### Timing the `gridloop` function

In [8]:
import time

for i in range(1, 4):
    t0 = time.time()
    g.gridloop(myfunc)
    t1 = time.time()
    print("Experiment {}. CPU time: {:.4} s".format(i, t1-t0))
print("Done")

Experiment 1. CPU time: 1.111 s
Experiment 2. CPU time: 1.094 s
Experiment 3. CPU time: 1.088 s
Done


$\Rightarrow$ The `gridloop` function is the cause of the slow execution!

# The *timeit* module

## The *timeit* module (1)

The `timeit` module provides a convenient way of measuring the execution time of small code snippets.

In [15]:
import timeit
timeit.timeit(stmt="a+=1", setup="a=0")  

0.05626105699775508

By default the **accumulated time** for 1,000,000 statements is returned. This number can be changed with the `number` keyword:

In [16]:
timeit.timeit(stmt="a+=1",setup="a=0", number=1)

3.0810006137471646e-06

Use `timeit.repeat` to repeat the experiment multiple times:

In [17]:
timeit.repeat(stmt="a+=1",setup="a=0", number=10000, repeat=3)

[0.0005116499996802304, 0.0006180050004331861, 0.0005113170009281021]
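
Because each entry returned by `timeit.repeat` is the accumulated time for `number` executions, a per-statement estimate needs one more division. The following sketch takes the fastest of the repeated runs and divides it by `number`; the variable names are illustrative.

```python
import timeit

number = 10000
timings = timeit.repeat(stmt="a+=1", setup="a=0", number=number, repeat=3)

# Each entry is the accumulated time for `number` executions of the statement
best_total = min(timings)
per_statement = best_total / number
print("Best of 3: {:.3e} s per statement".format(per_statement))
```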

## Timing user-defined functions requires a trick

`timeit` isolates the global namespace, so we need to pass the functions through the setup routine:

In [20]:
timeit.repeat(stmt="g.gridloop(myfunc)",
              setup="from __main__ import g, myfunc", number=1, repeat=3)

[1.973618121999607, 1.9566386259994033, 1.9746652839985472]
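
Two alternatives avoid the import in `setup`. Both are standard `timeit` features: the `globals` argument (available since Python 3.5) and passing a callable instead of a string. In this sketch, `work` is a hypothetical stand-in for `g.gridloop(myfunc)`.

```python
import timeit

def work():
    # Hypothetical stand-in for an expensive call
    return sum(i * i for i in range(1000))

# Option 1: hand the statement a namespace containing the names it needs.
# In a script or notebook, `globals=globals()` exposes everything defined so far.
t_namespace = timeit.repeat(stmt="work()", globals={"work": work}, number=10, repeat=3)

# Option 2: pass a zero-argument callable instead of a string
t_callable = timeit.repeat(stmt=work, number=10, repeat=3)

print(min(t_namespace), min(t_callable))
```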

## IPython's %timeit magic

IPython provides a magic to simplify timing code in the IPython shell

In [23]:
%timeit g.gridloop(myfunc) 

1 loop, best of 3: 1.95 s per loop


The `timeit` magic automatically determines how often to repeat the experiment.

In [23]:
%timeit?

## Python challenges

1. Time the `gridloop` function with the timeit module. 
2. Repeat the timing experiment 10 times and create a histogram of the results (use the `matplotlib.pyplot.hist` function on the results).
3. **Bonus**: Explore the `timer` parameter of the `timeit` function. Try different timers and explain the differences.
4. **Bonus**: Can you improve the performance of the code?

## How can we speed up the program?

* `myfunc` is fairly straightforward
  ```python
  def myfunc(x, y):
    return sin(x*y) + y
  ```
  Might be difficult to improve.
* What about `gridloop`?

## Recall that `gridloop` was a function of the form

```python
def gridloop(self, f):
    lx = size(self.xcoor)
    ly = size(self.ycoor)
    a = zeros((lx,ly))

    for i in range(lx):
        x = self.xcoor[i]
        for j in range(ly):
            y = self.ycoor[j]
            a[i,j] = f(x, y)
    return a
```

It would be useful to see how much time is spent in each line!

## Line by line profiling

The `line_profiler` inspects the time spent in each line of a Python function.

**Usage**:

1. Install with `pip install line_profiler`
2. "Decorate" the function of interest with `@profile`:
    ```python
    @profile
    def gridloop(self, f):
        # ...
    ```
3. Run line profiler with:
    ```bash
    kernprof -l -v grid2d_lineprofile.py
    ```
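
`kernprof` makes the `profile` decorator available as a built-in, so a decorated script raises a `NameError` when it is run with plain `python`. A common workaround, sketched here with a hypothetical `slow_sum` function, is to fall back to a no-op decorator when `profile` is not defined:

```python
try:
    profile  # defined by kernprof when the script runs under `kernprof -l`
except NameError:
    def profile(func):
        # Fallback: return the function unchanged when not profiling
        return func

@profile
def slow_sum(n):
    # Hypothetical function of interest: sum of squares below n
    total = 0
    for i in range(n):
        total += i * i
    return total

print(slow_sum(1000))
```

With this guard in place the same file can be run both with `python` and with `kernprof -l -v`.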

## Demo

In [34]:
!kernprof -l -v code/grid2d_lineprofile.py

Wrote profile results to grid2d_lineprofile.py.lprof
Timer unit: 1e-06 s

Total time: 2.70093 s
File: code/grid2d_lineprofile.py
Function: gridloop at line 11

Line #      Hits         Time  Per Hit   % Time  Line Contents
    11                                               @profile
    12                                               def gridloop(self, f):
    13         1            8      8.0      0.0          lx = size(self.xcoor)
    14         1            4      4.0      0.0          ly = size(self.ycoor)
    15         1           13     13.0      0.0          a = zeros((lx,ly))
    16                                           
    17      1002          441      0.4      0.0          for i in range(lx):
    18      1001          545      0.5      0.0              x = self.xcoor[i]
    19   1003002       457434      0.5     16.9              for j in range(ly):
    20   1002001       542272      0.5     20.1                  y = self.ycoor[j]
    21   1002001 

**Conclusion:** Most of the time is spent in loops and indexing => **Vectorization might be the answer!**

## A vectorised Grid2D implementation

In [52]:
class VectorisedGrid2D(object):
    def __init__(self,
                 xmin=0, xmax=1, dx=0.5,
                 ymin=0, ymax=1, dy=0.5):
        
        self.xcoor = arange(xmin, xmax+dx, step=dx)
        self.ycoor = arange(ymin, ymax+dy, step=dy)

    def gridloop(self, f):
        return f(self.xcoor[:,newaxis], self.ycoor[newaxis,:])  # Vectorized grid evaluation 
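
The vectorised `gridloop` relies on NumPy broadcasting: a column vector of shape `(lx, 1)` combined with a row vector of shape `(1, ly)` yields an array of shape `(lx, ly)`. The following sketch checks this on a small grid against the explicit double loop. The grid sizes are arbitrary, and explicit `np.` imports are used here instead of `from numpy import *`.

```python
import numpy as np

def myfunc(x, y):
    return np.sin(x * y) + y

xcoor = np.linspace(0, 1, 3)   # 3 grid points in x
ycoor = np.linspace(0, 1, 5)   # 5 grid points in y

# Column vector (3, 1) combined with row vector (1, 5) broadcasts to (3, 5)
a_vec = myfunc(xcoor[:, np.newaxis], ycoor[np.newaxis, :])

# Reference result from the explicit double loop
a_loop = np.zeros((xcoor.size, ycoor.size))
for i, x in enumerate(xcoor):
    for j, y in enumerate(ycoor):
        a_loop[i, j] = myfunc(x, y)

print(a_vec.shape)
print(np.allclose(a_vec, a_loop))
```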

### Python challenge

1. Time the `VectorisedGrid2D` implementation.

## Timing the vectorised Grid2D

In [53]:
vg = VectorisedGrid2D(dx=0.001, dy=0.001)
timeit.repeat(stmt="vg.gridloop(myfunc)", setup="from __main__  import vg, myfunc", repeat=5, number=1)

[0.02455411600021762,
 0.025373734002641868,
 0.019323732998600462,
 0.017610052000236465,
 0.017593815999134677]

In [54]:
g = Grid2D(dx=0.001, dy=0.001)
timeit.repeat(stmt="g.gridloop(myfunc)", setup="from __main__  import g, myfunc", repeat=5, number=1)

[1.138242409000668,
 1.1633209499996156,
 1.1122100009997666,
 1.1192373499980022,
 1.1429081710011815]

**Vectorization yields a ca. 60 times speed improvement!**
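
The speed-up can be computed from the timings listed above by comparing the fastest run of each implementation. This is a small worked calculation; the variable names are illustrative.

```python
# Timings copied from the two experiments above (in seconds)
vectorised = [0.02455411600021762, 0.025373734002641868, 0.019323732998600462,
              0.017610052000236465, 0.017593815999134677]
looped = [1.138242409000668, 1.1633209499996156, 1.1122100009997666,
          1.1192373499980022, 1.1429081710011815]

# Compare the fastest run of each implementation
speedup = min(looped) / min(vectorised)
print("Speedup: {:.0f}x".format(speedup))
```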

## Profiling

A profile is a set of statistics that describes how often and for how long various parts of the program executed.

In [55]:
import cProfile
pr = cProfile.Profile()
res = pr.run("g.gridloop(myfunc)")  # res contains the statistics

## Viewing runtime statistics

In [56]:
res.print_stats()

         1002008 function calls in 1.295 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.428    0.428    1.295    1.295 <ipython-input-4-af8978036994>:11(gridloop)
  1002001    0.866    0.000    0.866    0.000 <ipython-input-6-cb0af04eab3f>:1(myfunc)
        1    0.000    0.000    1.295    1.295 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 fromnumeric.py:2659(size)
        1    0.000    0.000    1.295    1.295 {built-in method builtins.exec}
        1    0.001    0.001    0.001    0.001 {built-in method numpy.core.multiarray.zeros}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




The columns of the table are:
* **ncalls**: number of calls
* **tottime**: total time spent in the given function, excluding time spent in calls to sub-functions
* **percall** (first column): `tottime` divided by `ncalls`
* **cumtime**: cumulative time spent in this function and all sub-functions
* **percall** (second column): `cumtime` divided by the number of primitive (non-recursive) calls
* **filename:lineno(function)**: location and name of the function
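
By default the table is ordered by function name, which is rarely the most useful order. The standard `pstats` module can sort the entries and limit the output. The sketch below uses a hypothetical workload (`outer` calling `inner` many times, much like `gridloop` calls `myfunc`) and prints the entries with the largest cumulative time.

```python
import cProfile
import io
import pstats

def inner(x):
    return x * x

def outer(n):
    # Hypothetical workload: many cheap function calls
    return sum(inner(i) for i in range(n))

pr = cProfile.Profile()
pr.enable()
outer(10000)
pr.disable()

# Collect the report in a string so it can be printed or saved
stream = io.StringIO()
stats = pstats.Stats(pr, stream=stream)
stats.sort_stats("cumulative").print_stats(5)   # show at most 5 entries
report = stream.getvalue()
print(report)
```

Sorting by `"tottime"` instead highlights the functions in which the time is actually spent, rather than those that merely call expensive sub-functions.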