In [None]:
%run ../../DataFiles_and_Notebooks/talktools.py

# Making Python Faster

<img src="slides/warp1.jpg">


<h2> ... and using legacy code</h2>

- we've already seen `numexpr`, parallelization, etc.

- Python is _slow_ ... it's interpreted on the fly

- no static typing ... even integers are objects (bulky memory!)

- what if we want to write Python, but use it as a *glue* to fast C-code?
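
To make the "integers are objects" point concrete, here is a small sketch comparing the size of one Python `int` object with one raw integer stored in a standard-library `array`. It assumes CPython; the exact byte counts vary by version and platform, so none are quoted here.

```python
import sys
from array import array

# A Python int is a full object (reference count, type pointer, digits).
boxed_bytes = sys.getsizeof(12345)

# array('q') stores raw signed C long long values back to back,
# much like a plain C array would.
raw = array('q', range(1000))
raw_bytes = raw.itemsize

print(f"one Python int object:  {boxed_bytes} bytes")
print(f"one raw integer in array('q'): {raw_bytes} bytes")
```

The gap between the two numbers is the per-object overhead that typed, compiled code avoids.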

<pre>
Premature optimization is the root of all evil 
   -- Donald Knuth
</pre>

<pre>
C is premature optimization
   -- Josh Bloom
</pre>


## Profiling ##

You already know that Python comes with batteries included, and performance profiling is no exception.

You can keep track of how much time each *function* is taking up using tools from the standard library.

Here's the [documentation of `profile` and `cProfile`](http://docs.python.org/3/library/profile.html), but you probably won't need to use them directly. A profile is a set of statistics that describes how often and for how long various parts of the program executed. These statistics can be formatted into reports via the `pstats` module.

In [None]:
import cProfile
import re
cProfile.run('re.compile("ay250|berkeley")') # run a piece of code

In [None]:
import cProfile, pstats, io
pr = cProfile.Profile()
pr.enable()

# here's the code you want to profile
def waste_of_time(n=1000):
    [x for x in range(n)]
[waste_of_time(y) for y in range(10000)]
## end of code you want to profile

pr.disable()
s = io.StringIO()
sortby = 'cumulative'
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())

## There's a magic for that!

### `%timeit` to learn how long it takes a chunk of code to run

### `%prun` for function-by-function breakdown of code in your namespace

### `%run -p` for function-by-function breakdown of running a whole file
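
Outside a notebook, the standard-library `timeit` module gives measurements similar to `%timeit`. A minimal sketch follows; the function `build_list` is a made-up workload, and the `number` and `repeat` values are arbitrary choices.

```python
import timeit

def build_list(n=1000):
    """Toy workload: build a list of n integers."""
    return [x for x in range(n)]

# Run the call 200 times per trial, for 5 trials. Each entry of `trials`
# is the total time in seconds for one trial of 200 calls.
trials = timeit.repeat(build_list, number=200, repeat=5)

# The minimum is usually the least noisy estimate of the per-call cost.
best_per_call = min(trials) / 200
print(f"best of 5: {best_per_call * 1e6:.1f} microseconds per call")
```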

In [None]:
cd demos/profile

In [None]:
!cat sometask.py

In [None]:
import sometask

In [None]:
%prun sometask.execute()

In [None]:
%run -p sometask.py

With the `-D` flag, you can dump the profile to a binary file that external tools can use.

You can also produce this `.profile` file without Jupyter using:

```bash
    python -m cProfile -o sometask.profile sometask.py
```
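
A dumped profile is a binary file, so it is meant to be read back with `pstats` rather than inspected by hand. The rough sketch below profiles a stand-in function and round-trips the result through a file. `busy_work` and the temporary file path are hypothetical; with the demo you would pass `sometask.profile` to `pstats.Stats` instead.

```python
import cProfile
import io
import os
import pstats
import tempfile

def busy_work(n=200):
    """Stand-in workload; replace with the code you care about."""
    return [sum(range(i)) for i in range(n)]

# Collect a profile and dump it to a binary file, as `-o` / `-D` do.
profiler = cProfile.Profile()
profiler.enable()
busy_work()
profiler.disable()

path = os.path.join(tempfile.mkdtemp(), "example.profile")
profiler.dump_stats(path)

# Load the file back and print up to five entries by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(path, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```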

In [None]:
%run -p -D sometask.profile sometask.py

In [None]:
!head sometask.profile

In [None]:
!pip install snakeviz py-heat-magic

In [None]:
!snakeviz sometask.profile

In [None]:
%load_ext snakeviz

In [None]:
%%snakeviz
sometask.execute()

## Profiling line-by-line

    %lprun
    
This magic is not built into Jupyter; it is provided by the [line_profiler package by Robert Kern](http://pythonhosted.org/line_profiler/).

In [None]:
#!conda install line_profiler -y

In [None]:
%load_ext line_profiler

Run the code, but only run the line profiler on the function `square()`:

In [None]:
import sometask

In [None]:
!cat sometask.py

In [None]:
%lprun -f sometask.square sometask.execute()

In [None]:
%load_ext heat 

In [None]:
import numpy as np

In [None]:
%%heat 
import numpy as np
def expensive_square(x):
    x = x.copy()
    y = x.copy()
    for i in range(x.size):
        x[i] = x[i] ** 2
    
    del y
    return x

def cheap_square(x):
    return x**2

square = expensive_square
square = cheap_square

def execute():
    print("Squaring some numbers...")
    x = np.arange((5000))
    y = square(x)
    
execute()

## Profiling memory usage (line-by-line)

    %mprun
    
        
This magic is not built into Jupyter; it is provided by the [memory_profiler package by Fabian Pedregosa](https://pypi.python.org/pypi/memory_profiler).
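
If installing `memory_profiler` is not an option, the standard library's `tracemalloc` module gives a related but not identical view: it reports Python-level allocations grouped by source line, rather than process memory per executed line. A minimal sketch, with `make_copies` as a hypothetical workload:

```python
import tracemalloc

def make_copies(n=100_000):
    """Toy workload that allocates two lists of n integers."""
    original = list(range(n))
    duplicate = original.copy()
    return original, duplicate

tracemalloc.start()
kept = make_copies()  # keep a reference so the memory stays allocated
snapshot = tracemalloc.take_snapshot()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Show the three source lines responsible for the most allocated memory.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
print(f"current: {current / 1e6:.2f} MB, peak: {peak / 1e6:.2f} MB")
```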

In [None]:
!pip install  -U memory_profiler

In [None]:
%load_ext memory_profiler

In [None]:
%mprun?

In [None]:
cd demos/profile

In [None]:
import sometask

In [None]:
!cat sometask.py

In [None]:
%mprun -f sometask.square sometask.execute()

# Other tools for profiling:

[gprof](http://en.wikipedia.org/wiki/Gprof) -- command line profiling tool for C code. [GNU Gprof documentation](https://sourceware.org/binutils/docs/gprof/) is pretty good.

[valgrind](http://valgrind.org/docs/manual/cl-manual.html) -- a fairly complex suite of analysis tools. For profiling, its Callgrind tool records call graphs, and the output can be browsed with KCachegrind. "*Valgrind is an instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail.*"

[memray](https://github.com/bloomberg/memray) -- tracks memory allocations in Python code, in native extension modules, and in the Python interpreter itself. It can generate several different types of reports to help you analyze the captured memory usage data. Works on Linux.

[Timing and Profiling in IPython](http://pynash.org/2013/03/06/timing-and-profiling/) -- blog post at PyNash

As a worked example, let's calculate the sample variance:

$$\sigma^2 = \frac {\sum_{i=1}^N (x_i - \sum_{j=1}^N x_j/N)^2}{N - 1}. $$
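
As a quick sanity check of the formula, here is a small worked example compared against the standard library's `statistics.variance`, which also uses the $N - 1$ denominator. The data values are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample values

# Pass 1: the sample mean.
mean = sum(data) / len(data)

# Pass 2: the sum of squared deviations from that mean.
sum_sq = sum((x - mean) ** 2 for x in data)

sample_variance = sum_sq / (len(data) - 1)
print(mean, sum_sq, sample_variance)
print(statistics.variance(data))
```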

The problem here is that you must first pass over all the data (to get the mean) and then pass over the data again. There are several Pythonic ways to do this. Here are a few:

In [None]:
#%%writefile var.py
def variance(data):
    sample_mean = 0.0
    
    # 1st loop
    for x in data:
        sample_mean += float(x)
    
    sample_mean /= len(data)
    
    # second loop
    sum_of_squared_errors = 0.0
    for x in data:
        sum_of_squared_errors += (float(x) - sample_mean) ** 2
    
    return sum_of_squared_errors / (float(len(data)) - 1.0)
    
def variance0(data):
    sample_mean = sum(data) / len(data) # loop 1
    sum_of_squared_errors = sum((i - sample_mean) ** 2 for i in data) # loop 2
    return sum_of_squared_errors / (len(data) - 1)

import functools
def variance1(data):
    mean = float(functools.reduce(lambda x,y : x+y, data)) / len(data)
    return functools.reduce(lambda x,y: x+y, map(lambda xi: (xi-mean)**2, data))/ (len(data) - 1)

def execute():
    variance(range(100000))
    
def execute1():
    variance1(range(100000))

In [None]:
%timeit variance(range(100000))

In [None]:
%timeit variance0(range(100000))

In [None]:
%timeit variance1(range(100000))

We'd like to do this with just one pass over the data. Have a look at Welford's Method (1962):
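
Writing $\bar{x}_n$ for the mean of the first $n$ samples and $M_{2,n}$ for the running sum of squared deviations (with $\bar{x}_0 = 0$ and $M_{2,0} = 0$), the updates behind the method can be written as:

$$\bar{x}_n = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}, \qquad M_{2,n} = M_{2,n-1} + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n), \qquad \sigma^2 = \frac{M_{2,N}}{N - 1}. $$

Each sample is touched once, and only the running mean and $M_2$ need to be kept between steps. A direct implementation: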

In [None]:
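def online_variance(data):
    mean, M2 = 0.0, 0.0
    # n counts the samples seen so far (starting at 1)
    for n, d in enumerate(data, start=1):
        delta = d - mean
        mean += delta / n
        M2 += delta * (d - mean)
    return M2 / (n - 1)  # sample variance, N - 1 in the denominator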

In [None]:
%timeit online_variance(range(100000))

In [None]:
import var
import importlib
importlib.reload(var)

In [None]:
%load_ext line_profiler

In [None]:
%lprun -f var.variance var.execute()