# Python Performance Techniques
This notebook explores some techniques for improving the speed of Python programs.
Note the times below each cell, and compare them between implementations.
If you run the same cells on your computer, you'll get different times, but the *ratios* between different implementations should stay about the same. 

## Use built-in functions

In [8]:
%%time
for _ in xrange(1000):
    assert 'x' in 'abcdefghijklmnopqrstuvwxyz'

CPU times: user 142 µs, sys: 119 µs, total: 261 µs
Wall time: 181 µs


In [10]:
%%time
def find_char(c, string):
    for c1 in string:
        if c == c1:
            return True
    return False

for _ in xrange(1000):
    assert find_char('x', 'abcdefghijklmnopqrstuvwxyz')

CPU times: user 2.26 ms, sys: 633 µs, total: 2.9 ms
Wall time: 2.56 ms


In [12]:
%%time
def find_char(c, string):
    for i in range(len(string)):
        if string[i] == c:
            return True
    return False

for _ in xrange(1000):
    assert find_char('x', 'abcdefghijklmnopqrstuvwxyz')

CPU times: user 2.61 ms, sys: 848 µs, total: 3.46 ms
Wall time: 2.88 ms


In [14]:
%%time
def find_char(c, string):
    i = 0
    while i < len(string):
        if string[i] == c:
            return True
        i += 1
    return False

for _ in xrange(1000):
    assert find_char('x', 'abcdefghijklmnopqrstuvwxyz')

CPU times: user 4.16 ms, sys: 1.14 ms, total: 5.3 ms
Wall time: 4.4 ms


## Use the right data structure

In [75]:
%%time
import random

def generate_some_random_strings(count=1000):
    strings = []
    letters = string.letters
    for _ in xrange(count):
        word = ''.join(random.sample(letters, len(letters))[:random.randint(5, 10)])
        strings.append(word)
    return strings

random_string_array = generate_some_random_strings()
for _ in xrange(1000):
    w = random.choice(random_string_array)
    assert w in random_string_array

CPU times: user 33.3 ms, sys: 3.96 ms, total: 37.3 ms
Wall time: 34.6 ms


In [76]:
%%time
import random

random_string_array = generate_some_random_strings(2)
random_string_set = set(random_string_array)
for _ in xrange(1000):
    w = random.choice(random_string_array)
    assert w in random_string_set

CPU times: user 874 µs, sys: 556 µs, total: 1.43 ms
Wall time: 1.05 ms


## Re-use Computed Values

Here's a function that tests whether a candidate string is a member of a list of strings, ignoring case. (So `"Monday"` and `"monday"` are the same, for purposes of this test.) `candidate_string.lower()` is computed each time through the loop, even though its value stays the same each time.

In [59]:
%%time
import random
import string

def case_insensitive_membership_test(candidate_string, strings):
    """
    Examples:
    >>> days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
    >>> case_insensitive_membership_test('monday', days_of_week)
    True
    >>> case_insensitive_membership_test('TUESDAY', days_of_week)
    True
    >>> case_insensitive_membership_test('wEdNeSdAy', days_of_week)
    True
    >>> case_insensitive_membership_test('February', days_of_week)
    False
    """
    for w in strings:
        if candidate_string.lower() == w.lower():
            return True
    return False

import doctest
#doctest.run_docstring_examples(case_insensitive_membership_test, globals())

def generate_some_random_strings():
    strings = []
    letters = string.letters
    for _ in xrange(1000):
        word = ''.join(random.sample(letters, len(letters))[:random.randint(5, 10)])
        strings.append(word)
    return strings

strings = generate_some_random_strings() + ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
for _ in xrange(1000):
    assert case_insensitive_membership_test('monday', strings)

CPU times: user 446 ms, sys: 7.14 ms, total: 453 ms
Wall time: 452 ms


“[Hoisting](https://en.wikipedia.org/wiki/Loop-invariant_code_motion)” the computation of `candidate_string.lower()` outside the loop improves the performance.

In [77]:
%%time
import random
import string

def case_insensitive_membership_test(candidate_string, strings):
    """
    Examples:
    >>> days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
    >>> case_insensitive_membership_test('monday', days_of_week)
    True
    >>> case_insensitive_membership_test('TUESDAY', days_of_week)
    True
    >>> case_insensitive_membership_test('wEdNeSdAy', days_of_week)
    True
    >>> case_insensitive_membership_test('February', days_of_week)
    False
    """
    candidate_string = candidate_string.lower()
    for w in strings:
        if candidate_string == w.lower():
            return True
    return False

import doctest
#doctest.run_docstring_examples(case_insensitive_membership_test, globals())

strings = generate_some_random_strings() + ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
for _ in xrange(1000):
    assert case_insensitive_membership_test('monday', strings)

CPU times: user 312 ms, sys: 5.83 ms, total: 318 ms
Wall time: 315 ms


## Pre-process re-used data into a more efficient data structure

If we're searching for a number of different candidate strings within the *same* list, we can hoist the work that's repeated for each function call, outside the function. In this case, instead of calling `s.lower()` for each string in the list each time we search it, we create a data structure to which this has already been done, and re-use it for each search.

This *helps* the pattern of usage below, where there's one list that we search multiple times. It would *hurt* in a different usage pattern (not shown), where the list is different each time. (Why?)

In [60]:
%%time
def make_case_insensitive_list(strings):
    """Construct a data structure that can be searched quickly for a string, ignoring case."""
    return [s.lower() for s in strings]

def search_case_insensitive_list(candidate_string, lowercase_strings):
    """
    Examples:
    >>> days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
    >>> case_insensitive_list = make_case_insensitive_list(days_of_week)
    >>> search_case_insensitive_list('monday', case_insensitive_list)
    True
    >>> search_case_insensitive_list('TUESDAY', case_insensitive_list)
    True
    >>> search_case_insensitive_list('wEdNeSdAy', case_insensitive_list)
    True
    >>> case_insensitive_membership_test('February', days_of_week)
    False
    """
    string = string.lower()
    for w in lowercase_strings:
        if candidate_string == w:
            return True
    return False

import doctest
#doctest.run_docstring_examples(search_case_insensitive_list, globals())

strings = generate_some_random_strings() + ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
case_insensitive_list = make_case_insensitive_list(strings)
for _ in xrange(1000):
    assert search_case_insensitive_list('monday', case_insensitive_list)

CPU times: user 73.4 ms, sys: 3.64 ms, total: 77 ms
Wall time: 74.9 ms


Note that this simplifies the program to where we can apply the "use a built-in function" technique:

In [62]:
%%time
def make_case_insensitive_list(strings):
    """Construct a data structure that can be searched quickly for a string, ignoring case."""
    return [string.lower() for string in strings]

def search_case_insensitive_list(string, lowercase_strings):
    """
    Examples:
    >>> days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
    >>> case_insensitive_list = make_case_insensitive_list(days_of_week)
    >>> search_case_insensitive_list('monday', case_insensitive_list)
    True
    >>> search_case_insensitive_list('TUESDAY', case_insensitive_list)
    True
    >>> search_case_insensitive_list('wEdNeSdAy', case_insensitive_list)
    True
    >>> case_insensitive_membership_test('February', days_of_week)
    False
    """
    return string.lower() in lowercase_strings

import doctest
#doctest.run_docstring_examples(search_case_insensitive_list, globals())

strings = generate_some_random_strings() + ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
case_insensitive_list = make_case_insensitive_list(strings)
for _ in xrange(1000):
    assert search_case_insensitive_list('monday', case_insensitive_list)

CPU times: user 38.9 ms, sys: 5.73 ms, total: 44.6 ms
Wall time: 40.8 ms


And now we can apply "use the right data structure", for the fastest time yet: 30.8 ms, down from an initial 453 ms.

In [65]:
%%time
def make_case_insensitive_set(strings):
    """Construct a data structure that can be searched quickly for a string, ignoring case."""
    return set([string.lower() for string in strings])

def search_case_insensitive_set(string, lowercase_strings):
    """
    Examples:
    >>> days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
    >>> make_case_insensitive_set = make_case_insensitive_list(days_of_week)
    >>> search_case_insensitive_set('monday', case_insensitive_list)
    True
    >>> search_case_insensitive_set('TUESDAY', case_insensitive_list)
    True
    >>> search_case_insensitive_set('wEdNeSdAy', case_insensitive_list)
    True
    >>> search_case_insensitive_set('February', days_of_week)
    False
    """
    return string.lower() in lowercase_strings

import doctest
#doctest.run_docstring_examples(search_case_insensitive_list, globals())

strings = generate_some_random_strings() + ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
case_insensitive_list = make_case_insensitive_list(strings)
for _ in xrange(1000):
    assert search_case_insensitive_list('monday', case_insensitive_list)

CPU times: user 27.1 ms, sys: 3.73 ms, total: 30.8 ms
Wall time: 28.6 ms


## Caching (Memoization)

Here's another use of caching. `fib` is the naive implementation of the [Fibonacci function](https://en.wikipedia.org/wiki/Fibonacci_number). 

Using this function to compute `fib(n)` takes time proportional to $\phi^n$, where $\phi$ is the Golden Ration ${1 + \sqrt(5)} \over 2$. This is because it calls `fib(n-1)` once, `fib(n-2)` twice, `fib(n-3)` three times, `fib(n-4)` five times, and so on. In other words, it's *really slow*.

In [89]:
%%time
def fib(n):
    """
    Examples:
    >>> fib(0)
    0
    >>> fib(2)
    1
    >>> [fib(i) for i in xrange(10)]
    [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    """
    if n <= 1:
        global c
        if n == 0: c += 1
        return n
    else:
        return fib(n - 2) + fib(n - 1)

import doctest
#doctest.run_docstring_examples(fib, globals())

fib(30)

CPU times: user 424 ms, sys: 3.75 ms, total: 428 ms
Wall time: 428 ms


`fib(n)` is the same each time we call it. Instead of computing it each time, save the result and re-use it. This is an order of magnitude faster for `fib(30)`; the improvement is larger for larger numbers.

In [91]:
%%time

fib_cache = {}
# This is a dictionary from `n` to `fib(n)`. It's a global variable, so give it a name
# that is hopefully unique.

# the original fib function, except it calls itself via `cached_fib` instead of directly
def fib(n):
    """
    Examples:
    >>> fib(0)
    0
    >>> fib(2)
    1
    >>> [fib(i) for i in xrange(10)]
    [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    """
    if n <= 1:
        return n
    else:
        return cached_fib(n - 2) + cached_fib(n - 1)

def cached_fib(n):
    x = fib_cache.get(n)
    if x is None:
        x = fib(n)
        fib_cache[n] = x
    return x

import doctest
#doctest.run_docstring_examples(fib, globals())

fib(30)

CPU times: user 39 µs, sys: 19 µs, total: 58 µs
Wall time: 48.2 µs
