## Make your Python Code Run Faster

Dr. David Perry


### Me
* Compute Integration Specialist at Research Platform Services (ResPlat)
* I ❤️ Python

### What's a ResPlat?
* Cloud & High Performance Computing
* Data Storage & Management
* Training & Community
* Part of University Services — here for all academic departments

If you're here, you probably already have the basics down from the intro course.

I touch on HPC and cloud here, but the details are covered in separate courses (or you can work them out on your own from the docs).

I'll mention the Pandas library, but that's a rich package in and of itself, enough to fill a whole other course.

<center><img src="resbaz.png" width="75%"></center>

### Related Courses

* Intro to Python
* HPC
* Cloud Computing
* Intro to Pandas

### Today
* Part 1: Profiling (why is my code slow)
* Part 2: Optimization (fix my slow code)
* Part 3: Parallelization (run multiple bits of code at once)
* Part 4: Cloud & High Performance Computing (f%^$ it, gimme a bigger computer please)

**Goal**: Less waiting, more research.

### The Buffet Disclaimer

<center><img src="daisies.jpg" width="75%"></center>

<center><img src="too_much.gif" width="75%"></center>

I'm going to unleash upon you a veritable feast of tools, technologies, jargon and acronyms. 

But just because it's on the table doesn't mean you have to eat it! 

And you'll probably forget most of the details when you wake up tomorrow.

That's okay! 

These slides are here for you later, Google is your friend, and we're here to help. If nothing else, remember that you don't have to tolerate slow code if you don't want to: you do have options.

### After the Hangover
* Link to slides/code: https://github.com/resbaz/high-performance-python-course
* I'm here to help: perry.d@unimelb.edu.au
    

## Part 1: Profiling
    

Before we do anything else, we should figure out where our code is slow. It's no use throwing a bunch of fancy tricks at the problem if you don't know where it lies. Profiling will help you figure out what's taking the most time in your code, and focus your efforts there.

### IPython Timing Magic

In [None]:
%time a = "abc"

In [None]:
%timeit -n 100 a = "abc"

The easiest thing is to just time your code. The `%time` and `%timeit` magics are built into Jupyter/IPython. `%timeit` runs your code multiple times and summarises the results, which smooths out run-to-run variation.

You can try different things, and see how it impacts execution time.
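Outside a notebook, the standard library's `timeit` module gives a similar measurement. The sketch below is only an illustration: `squares_loop` and `squares_builtin` are hypothetical functions made up for this example, and the run count of 200 is an arbitrary choice.

```python
import timeit


def squares_loop(n):
    """Sum of squares using an explicit loop (hypothetical example)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def squares_builtin(n):
    """Same result using the built-in sum() and a generator expression."""
    return sum(i * i for i in range(n))


# Run each candidate 200 times and record the total elapsed seconds.
loop_seconds = timeit.timeit(lambda: squares_loop(1000), number=200)
builtin_seconds = timeit.timeit(lambda: squares_builtin(1000), number=200)

print(f"loop:    {loop_seconds:.4f} s for 200 runs")
print(f"builtin: {builtin_seconds:.4f} s for 200 runs")
```

The numbers will differ between machines and runs, so compare the two results with each other rather than against a fixed target.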

### Python Profiler

In [None]:
import time

def batman():
    out = ""
    for i in range(15):
        out += "na"
    out += "\n batman!"
    time.sleep(1)
    print(out)

%prun batman()

Next level up is the built-in Python profiler. This will tell you what functions you're calling, how many times, and how long they take.

That can help, but is sometimes cryptic when it describes underlying Python functions.

BTW: If you have an IDE like PyCharm, it might give you other handy tools like call graphs and inline heatmaps.
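If you're not in a notebook, the same built-in profiler is available as the `cProfile` module, with `pstats` to sort and trim the report. This is a minimal sketch: `slow_step`, `fast_step` and `pipeline` are hypothetical stand-ins for your own functions.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    time.sleep(0.05)  # stand-in for slow work


def fast_step():
    return sum(range(1000))


def pipeline():
    for _ in range(3):
        slow_step()
        fast_step()


# Collect profiling data only while pipeline() runs.
profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Sort by cumulative time and keep the five most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that dominate the run at the top, which is usually where to look first.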

In [None]:
%load_ext line_profiler 
# ^ Not installed by default, you'll need to DIY - `pip install line_profiler`

import time
import math

def batman():
    out = ""
    for i in range(15):
        out += "na"
        a = i**i * math.log(i+1) * math.factorial(i*10)  # Oh no, math!
    out += "\n batman!"
    print(out)

%lprun -f batman batman()

More helpful, I think, is the line profiler, which reports how long you spend on each line of a function. You can even get a heatmap!

Line profiling is not built into Python; you'll have to install it.

In [None]:
%load_ext heat
# ^ Also not installed by default, 'pip install py-heat-magic'

In [None]:
%%heat
import time
import math

def batman():
    out = ""
    for i in range(15):
        out += "na"
        a = i**i * math.log(i+1) * math.factorial(i*10)  # Oh no, math!
    out += "\n batman!"
    print(out)
    
batman()

### Memory Profiling

In [None]:
%%writefile mem_profile_example.py

import random
def make_big_array():
    out = []
    for i in range(100000):
        out.append(random.random())
    return out


In [None]:
%load_ext memory_profiler
from mem_profile_example import make_big_array

%mprun -f make_big_array make_big_array()

Speed is one thing, but you also have a finite amount of memory (RAM) on your computer. If you're dealing with big datasets, your bottleneck might be memory, which limits how much data you can analyse at once, or maybe prevents you from loading your data altogether! Different Python libraries and programming techniques can have drastic impacts on memory usage, and if you run into problems you can use memory_profiler to investigate (there are other libraries for this as well).
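As a rough illustration of how much technique matters, the sketch below uses the standard library's `tracemalloc` (one of those other options, not `memory_profiler`) to compare peak memory when building a full list versus streaming values through a generator. The helper names `make_big_list`, `sum_without_list` and `peak_bytes` are hypothetical, made up for this example.

```python
import random
import tracemalloc


def make_big_list(n):
    """Hold all n random numbers in memory at once."""
    return [random.random() for _ in range(n)]


def sum_without_list(n):
    """Consume n random numbers one at a time, keeping only a running sum."""
    return sum(random.random() for _ in range(n))


def peak_bytes(func, *args):
    """Return the peak traced memory (in bytes) while func(*args) runs."""
    tracemalloc.start()
    result = func(*args)  # keep the result alive until we read the peak
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


list_peak = peak_bytes(make_big_list, 100_000)
stream_peak = peak_bytes(sum_without_list, 100_000)

print(f"list:      {list_peak / 1024:.0f} KiB peak")
print(f"generator: {stream_peak / 1024:.0f} KiB peak")
```

The exact figures depend on your Python version, but the list version has to hold every value at once while the generator version does not.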





### Learn More

* %time and %timeit (built-in): http://ipython.readthedocs.io/en/stable/interactive/magics.html
* Line Profiler: https://github.com/rkern/line_profiler
* ...with heat map: https://github.com/csurfer/pyheat
* Memory Profiler: https://github.com/pythonprofilers/memory_profiler


## Challenges!

Go here: http://go.unimelb.edu.au/dgh6

Select challenge (1 to 3), and click 'Clone' to start your own instance.

You can login using your University email, or create a (free) Microsoft account.

**Challenge 1**
* The second approach is a little slower because of function call overhead (a tradeoff with readability?)


**Challenge 2**
* The second approach is a little faster (maybe) because we avoid a call to 'append', and list comprehensions can be performed more efficiently by the Python interpreter.


**Challenge 3**
* Reading a CSV in chunks uses much less memory, but is a little slower.
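
The chunking idea from Challenge 3 can be sketched with the standard library alone. This is not the challenge solution, just an illustration under stated assumptions: the CSV is a tiny in-memory stand-in, the column name `value` and the helper `iter_chunks` are hypothetical. (pandas offers the same idea through the `chunksize` parameter of `read_csv`.)

```python
import csv
import io
from itertools import islice


def iter_chunks(reader, chunk_size):
    """Yield lists of up to chunk_size rows until the reader is exhausted."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))
reader = csv.DictReader(io.StringIO(csv_text))

total = 0
rows_seen = 0
for chunk in iter_chunks(reader, chunk_size=4):
    # Only one chunk of rows is held in memory at a time.
    total += sum(int(row["value"]) for row in chunk)
    rows_seen += len(chunk)

mean_value = total / rows_seen
print(mean_value)
```

Only running totals survive between chunks, which is why memory use stays low; the extra bookkeeping per chunk is where the small slowdown comes from.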