# Lab

Using timeit, compare the performance of searching text using a compiled versus an uncompiled regular expression:

In [1]:
text = '''The quick brown fox jumps over the lazy dog'''

In [2]:
import re
pattern = 'fox'
re_fox = re.compile(pattern)

In [3]:
# Compare these two approaches
re.search('fox', text)
re_fox.search(text)

<re.Match object; span=(16, 19), match='fox'>

In [4]:
%timeit re.search('fox', text)

904 ns ± 9.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [5]:
%timeit re_fox.search(text)

239 ns ± 4.64 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
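
Outside IPython, the same comparison can be sketched with the standard-library `timeit` module. This is a minimal sketch rather than a reproduction of the cells above: the call count `n` is an arbitrary choice, and wrapping each call in a `lambda` adds a small, roughly equal overhead to both measurements, so the absolute numbers will differ from the `%timeit` results.

```python
import re
import timeit

text = 'The quick brown fox jumps over the lazy dog'
re_fox = re.compile('fox')

n = 100_000  # number of calls per measurement (arbitrary choice)

# Module-level function: resolves the pattern on every call.
uncompiled = timeit.timeit(lambda: re.search('fox', text), number=n)

# Precompiled pattern object: calls the matcher directly.
compiled = timeit.timeit(lambda: re_fox.search(text), number=n)

print(f'uncompiled: {uncompiled / n * 1e9:.0f} ns per call')
print(f'compiled:   {compiled / n * 1e9:.0f} ns per call')
```

Each `timeit.timeit` call returns the total time in seconds for `n` calls, so dividing by `n` gives a per-call figure comparable in spirit to the `%timeit` output.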


Use cProfile and pstats to profile a script that searches the text 10,000 times with the uncompiled `re.search`.

In [None]:
%%file data/profiling/profiletest.py
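import re

def make_re(pattern):
    # Not called in this script; the decorator exercise later in the lab redefines it.
    return re.compile(pattern)

text = '''The quick brown fox jumps over the lazy dog'''

# Every iteration goes through the module-level re.search.
for x in range(10000):
    re.search('fox', text)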

In [6]:
!python -m cProfile -s time data/profiling/profiletest.py

         40096 function calls (40095 primitive calls) in 0.014 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.004    0.000    0.005    0.000 re.py:271(_compile)
    10000    0.003    0.000    0.011    0.000 re.py:180(search)
        1    0.003    0.003    0.014    0.014 profiletest.py:1(<module>)
    10000    0.002    0.000    0.002    0.000 {method 'search' of 're.Pattern' objects}
    10009    0.002    0.000    0.002    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 sre_compile.py:759(compile)
        1    0.000    0.000    0.000    0.000 sre_parse.py:475(_parse)
        1    0.000    0.000    0.000    0.000 sre_compile.py:536(_compile_info)
        1    0.000    0.000    0.000    0.000 sre_parse.py:174(getwidth)
        4    0.000    0.000    0.000    0.000 sre_parse.py:233(__next)
        1    0.000    0.000    0.000    0.000 sre_parse.py:919(pars
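
The command above sorts the report with `-s time`. The same kind of report can be produced programmatically with `pstats`, which the exercise asks for. The following is a sketch under assumptions: the function name `search_uncompiled` is hypothetical, and the report is captured in a string buffer only so it can be inspected afterwards.

```python
import cProfile
import io
import pstats
import re

text = 'The quick brown fox jumps over the lazy dog'

def search_uncompiled(n=10000):
    # Hypothetical helper: the same loop as profiletest.py, wrapped in a function.
    for _ in range(n):
        re.search('fox', text)

# Collect profile data only while the function runs.
profiler = cProfile.Profile()
profiler.enable()
search_uncompiled()
profiler.disable()

# Write the report to a buffer, sorted by internal time, top 5 entries only.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats('tottime').print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by `'tottime'` corresponds to the "Ordered by: internal time" header in the command-line output, and `strip_dirs()` shortens file paths so the columns stay readable.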

Instrument the function with the following decorator, then use %timeit to compare the overhead of profiling 1% of calls with that of profiling 100% of calls.

In [9]:
import random
import functools

def instrument(profiler, probability=0.10):
    '''Profile some of the calls to the decorated function.
    
    The default probability of profiling a call is 10%.
    '''
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < probability:
                try:
                    profiler.enable()
                    return func(*args, **kwargs)
                finally:
                    profiler.disable()
            else:
                return func(*args, **kwargs)
        return wrapper
    return decorator

In [13]:
import cProfile
p = cProfile.Profile()
@instrument(p, 0.0)  # probability of profiling each call; also try 0.01 and 1.0
def make_re(pattern):
    return re.compile(pattern)


In [14]:
%timeit for x in range(10000): make_re('fox')

10.2 ms ± 70.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [None]:
# Time for 10,000 calls at each profiling probability:
# 0%: 6.88 ms
# 1%: 9 ms
# 100%: 18 ms
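
The three measurements can also be collected in one run instead of editing the decorator argument by hand. The sketch below repeats the `instrument` decorator from above so that it is self-contained. The helper `time_with_probability` and its parameters are hypothetical names introduced for illustration, and taking the minimum over a few repeats is one possible choice, not what `%timeit` reports.

```python
import cProfile
import functools
import random
import re
import timeit

def instrument(profiler, probability=0.10):
    '''Profile some of the calls to the decorated function.'''
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < probability:
                try:
                    profiler.enable()
                    return func(*args, **kwargs)
                finally:
                    profiler.disable()
            else:
                return func(*args, **kwargs)
        return wrapper
    return decorator

def time_with_probability(probability, calls=10000, repeat=5):
    # Hypothetical helper: time `calls` invocations of an instrumented make_re.
    profiler = cProfile.Profile()

    @instrument(profiler, probability)
    def make_re(pattern):
        return re.compile(pattern)

    def run():
        for _ in range(calls):
            make_re('fox')

    # Best of `repeat` runs, in seconds.
    return min(timeit.repeat(run, number=1, repeat=repeat))

for probability in (0.0, 0.01, 1.0):
    seconds = time_with_probability(probability)
    print(f'{probability:>5.0%}: {seconds * 1e3:.2f} ms per 10000 calls')
```

Each probability gets a fresh `cProfile.Profile` so that data accumulated in one run does not carry over into the next.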

In [15]:
# Baseline: the same function without the decorator
def make_re(pattern):
    return re.compile(pattern)

In [16]:
%timeit for x in range(10000): make_re('fox')

7.63 ms ± 61.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
