
# Speeding up your Python Code


## Pete Lawson, PhD

### October 10th, 2022

<i class="fa fa-github"> <a href ="https://github.com/pete-lawson">github.com/pete-lawson</a></i>

<i class="fa fa-envelope"> <a href ="mailto:plawson@jhu.edu">plawson@jhu.edu</a></i>

<img src="images/data-bytes-logo.png" align = "left" width = "300">


# Why speed up your `python` code?

**Lots to compute** - Your code is taking many minutes, hours, or days to run!

**Refactor** - Make your code more efficient, compact, and easier to read.

## Use List Comprehension

List comprehension is a more compact way to represent a for loop in Python code.

Let's take a look at a regular for loop:

In [1]:
list_length = 10
numbers = []

for val in range(list_length):
    numbers.append(val**2)

print(numbers)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


Now let's convert that to list comprehension syntax:

In [None]:
# List comprehension syntax (a template, not runnable code):
# new_list = [expression for item in iterable if condition]

In [4]:
# Our for loop code
list_length = 10
numbers = []

for val in range(list_length):
    numbers.append(val**2)

print(numbers)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [5]:
# Now as a list comprehension
list_length = 10

numbers = [val**2 for val in range(list_length)]

print(numbers)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
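
The syntax template also allows an optional `if` clause, which the squaring example does not use. As a minimal sketch (the even-number filter and the name `even_squares` are illustrative assumptions, not part of the original example), a condition keeps only the items that pass the test:

```python
list_length = 10

# Keep only the even values, then square them.
even_squares = [val**2 for val in range(list_length) if val % 2 == 0]

print(even_squares)
# [0, 4, 16, 36, 64]
```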


By not explicitly calling `append` on every iteration, you can speed things up!

By how much?

Maybe it's `time` for a lesson on timing our code!
![cat-clock](https://media3.giphy.com/media/xT1XGLSb5E1VjIUw4E/giphy.gif?cid=790b7611664fb51eaac4bf0c144b1eacfebb996c5d1dabc9&rid=giphy.gif&ct=g)
<p><small>Clock Cat Gif - Stephen Maurice Graham @smgdraws</small></p>

### The `time` library

In [6]:
import time

start = time.time()
# My Timed Code
end = time.time()

runtime = end - start

print(f'Runtime is {runtime:.3f} seconds')

Runtime is 0.000 seconds


#### Timing a for loop

In [7]:
import time 

list_length = 1000000
start = time.time()

numbers = []
for x in range(list_length):
    numbers.append(x**2)

end = time.time()
        
runtime_loop = end - start

print(f'\nRuntime is {runtime_loop:.3f} seconds')


Runtime is 0.333 seconds


#### Timing list comprehension

In [8]:
start = time.time()

numbers = [x**2 for x in range(list_length)]   

end = time.time()
runtime_list_comp = end - start

print(f'\nRuntime is {runtime_list_comp:.3f} seconds')
print(f'\nSquaring a list with list comprehension is {runtime_loop/runtime_list_comp:.2f} times faster than a for loop')


Runtime is 0.258 seconds

Squaring a list with list comprehension is 1.29 times faster than a for loop


#### Time for another timing command: `%%timeit` magic command

In [9]:
%%timeit -r 3 -n 10
numbers = [x**2 for x in range(list_length)]   

244 ms ± 3.79 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)
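
`%%timeit` only works inside IPython or Jupyter. In a plain Python script, the standard-library `timeit` module gives a comparable measurement. This is a minimal sketch: it assumes a smaller `list_length` than the notebook uses so that it finishes quickly, and the helper name `square_all` is illustrative.

```python
import timeit

list_length = 100_000  # smaller than the notebook value, to keep the sketch fast


def square_all():
    """Square every number in the range with a list comprehension."""
    return [x**2 for x in range(list_length)]


# 3 runs of 10 calls each, mirroring `-r 3 -n 10`.
run_times = timeit.repeat(square_all, repeat=3, number=10)

# Each entry is the total time for 10 calls, so divide to get a per-call time.
best_per_call = min(run_times) / 10

print(f'Best of 3 runs: {best_per_call * 1000:.2f} ms per call')
```

Note that `%%timeit` reports the mean and standard deviation across runs, while this sketch reports the fastest run, which is usually the least affected by other activity on the machine.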


### Using the `NumPy` Library - *Let's get close to the metal!*

<img src="https://upload.wikimedia.org/wikipedia/commons/0/03/Ordinateurs_centraux_348-3-006.jpg" alt="man with mainframe" style="height:400px;">
<small>IBM 360 mainframe computer, CPU light/switch panel, 4 visible Magnetic Tape Reader/Writers, "Selectric" "teletype" console, (not seen, cables under raised floor) <br>Photograph by Yves Tessier, distributed under a CC BY-SA 4.0 license.</small>

### `NumPy`: Scientific Computing in Python
NumPy consists of mathematical functions, linear algebra routines, and array operations, all running in well-optimized C code. 

Which is computer science speak for:
>**It's fast!**

Let's try implementing the original for loop in NumPy:

In [96]:
import numpy as np
start = time.time()

np_numbers = np.array(range(list_length))  # 1-D array of 0..list_length-1
np_numbers_squared = np.square(np_numbers)

end = time.time()

runtime_np = end - start

print(f'\nRuntime is {runtime_np:.3f} seconds')
print(f'\nSquaring a list with numpy is {runtime_loop/runtime_np:.2f} times faster than a for loop')


Runtime is 0.076 seconds

Squaring a list with numpy is 4.18 times faster than a for loop
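
Most of the time in the NumPy cell is likely spent converting the Python `range` into an array rather than on the squaring itself. As a sketch of an alternative (not the timing reported above), `np.arange` builds the same sequence directly in NumPy. The variable names are illustrative, and the explicit 64-bit dtype is an assumption made so the largest squares fit on every platform.

```python
import time

import numpy as np

list_length = 1_000_000

start = time.perf_counter()

# Build 0..list_length-1 directly as a NumPy array, skipping the Python range.
np_numbers = np.arange(list_length, dtype=np.int64)

# Element-wise square; equivalent to np.square(np_numbers).
np_numbers_squared = np_numbers ** 2

end = time.perf_counter()

print(f'Runtime is {(end - start) * 1000:.3f} milliseconds')
```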


If you have an Nvidia **G**raphical **P**rocessing **U**nit (GPU) ...
![nvidia-brand-logo](https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/01-nvidia-logo-horiz-500x200-2c50-d.png)

Then you can use `CuPy`!
![CuPy Logo](https://repository-images.githubusercontent.com/93458756/a85df740-1aec-4a39-a5c9-4d219c1ac0f7)

`CuPy` is a drop-in replacement for `NumPy` and `SciPy` that allows your code to automagically run on your GPU!

Why is `CuPy` faster?

![cpu-vs-gpu](https://qph.cf2.quoracdn.net/main-qimg-29198fbe7b18570f369a93a7dc3f999a-lq)

A **CPU** can do a few operations at once

A **GPU** can do many thousands of (simple) operations at once

Let's create a 2000 x 2000 array in NumPy, then square it!

In [None]:
n = 2000

start = time.time()

np_array = np.random.random((n, n))
np_array_squared = np.square(np_array)

end = time.time()

runtime_np_array = end - start

print(f'\nRuntime is {runtime_np_array*1000:.3f} milliseconds')

In [83]:
import cupy as cp

start = time.time()

cp_array = cp.random.random((n, n))
cp_array_squared = cp.square(cp_array)

end = time.time()

runtime_cp_array = end - start

print(f'\nRuntime is {runtime_cp_array*1000:.3f} milliseconds')
print(f'\nSquaring an array with CuPy is {runtime_np_array/runtime_cp_array:.2f} times faster than NumPy')


Runtime is 0.998 milliseconds

Squaring an array with CuPy is 48.01 times faster than NumPy
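
One caveat when timing GPU code with `time.time()`: CuPy launches GPU work asynchronously, so the timer can stop before the GPU has actually finished, and the first call to a function also includes one-time kernel compilation. The following is a hedged sketch, not the measurement reported above. It assumes CuPy and an Nvidia GPU are available, and the helper name `time_gpu_square` is hypothetical. It waits for the GPU with `cp.cuda.Stream.null.synchronize()` before stopping the clock.

```python
import time

try:
    import cupy as cp  # assumption: a CUDA-enabled CuPy install and an Nvidia GPU
except ImportError:
    cp = None


def time_gpu_square(n=2000):
    """Return the seconds taken to create and square an n x n array on the GPU."""
    start = time.perf_counter()

    cp_array = cp.random.random((n, n))
    cp_array_squared = cp.square(cp_array)

    # Block until all queued GPU work has finished, so it is included in the timing.
    cp.cuda.Stream.null.synchronize()

    return time.perf_counter() - start


if cp is not None:
    time_gpu_square()  # warm-up call: absorbs one-time kernel compilation
    runtime_cp_array = time_gpu_square()
    print(f'Runtime is {runtime_cp_array * 1000:.3f} milliseconds')
else:
    print('CuPy is not installed; skipping the GPU timing.')
```

With synchronization in place, the measured speed-up over NumPy may well be smaller than an unsynchronized timing suggests.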


## Working with data in `Pandas`

In [52]:
import pandas as pd

def getData(length):
    df = pd.DataFrame()
    df['age'] = np.random.randint(0, 100, length)
    df['cat_breed'] = np.random.choice(['Calico', 'Maine Coon', 'Siamese'], length)
    return(df)

df = getData(10_000)
df.head()

|   | age | cat_breed  |
|---|-----|------------|
| 0 | 16  | Calico     |
| 1 | 72  | Siamese    |
| 2 | 10  | Siamese    |
| 3 | 5   | Calico     |
| 4 | 26  | Maine Coon |


How many individuals between the ages of 30 and 45 have a Calico cat?

In [75]:
%%timeit -r 5 -n 10

# Computed with a loop
def cat_keeper(row):
    if((row['age']>= 30 and row['age']<=45) and row['cat_breed'] == 'Calico'):
        return(True)
    return(False)

matches = []       

for index, row in df.iterrows():
    matches.append(cat_keeper(row))

loop_match_count = sum(matches)

373 ms ± 1.93 ms per loop (mean ± std. dev. of 5 runs, 10 loops each)


In [71]:
%%timeit -r 5 -n 10

# Computed with vectorized operations
vector_match_count = sum(((df['age'] >= 30) & (df['age'] <= 45)) & (df['cat_breed'] == 'Calico'))

1.4 ms ± 83.3 µs per loop (mean ± std. dev. of 5 runs, 10 loops each)
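
The vectorized expression can be easier to read when each condition gets its own name. This is a minimal sketch, assuming a freshly generated DataFrame with the same two columns. `Series.between` is inclusive on both ends by default; the seeded generator and the names `in_age_range` and `is_calico` are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seeded so the sketch is reproducible
length = 10_000

df = pd.DataFrame({
    'age': rng.integers(0, 100, length),
    'cat_breed': rng.choice(['Calico', 'Maine Coon', 'Siamese'], length),
})

# Build one boolean Series per condition, then combine them element-wise.
in_age_range = df['age'].between(30, 45)  # inclusive on both ends by default
is_calico = df['cat_breed'] == 'Calico'

# True counts as 1, so summing the combined mask counts the matching rows.
vector_match_count = (in_age_range & is_calico).sum()

print(vector_match_count)
```

Calling `.sum()` on the boolean Series keeps the counting inside pandas, rather than iterating over the values with Python's built-in `sum`.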


## Summary

- Use a list comprehension rather than a for loop

- Use NumPy for simple list/array operations

- Use CuPy for simple list/array operations if:
  - The list or array is large enough
  - The operation is simple
  - You have an Nvidia GPU

- If you are using Pandas, use vectorized operations, not loops