# Lab

Using `%timeit`, compare the time taken to search text with a compiled versus an uncompiled regular expression:

In [1]:
text = '''The quick brown fox jumps over the lazy dog'''

In [2]:
import re
pattern = 'fox'
re_fox = re.compile(pattern)

In [3]:
# Compare these two approaches
re.search('fox', text)
re_fox.search(text)

<re.Match object; span=(16, 19), match='fox'>

In [4]:
%timeit re.search('fox', text)

513 ns ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [5]:
%timeit re_fox.search(text)

126 ns ± 1.89 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
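
Outside IPython, the same comparison can be run with the standard `timeit` module. The following is a minimal sketch, not part of the original lab: the call count `n` is an arbitrary choice, and absolute timings will vary with machine and Python version. The uncompiled call tends to be slower because `re.search` has to look the pattern up in the module's internal cache on every call before it can search.

```python
import re
import timeit

text = 'The quick brown fox jumps over the lazy dog'
re_fox = re.compile('fox')

n = 100_000  # arbitrary number of calls per measurement

# Module-level function: resolves the pattern through re's cache on every call.
uncompiled = timeit.timeit(lambda: re.search('fox', text), number=n)
# Precompiled pattern: calls the Pattern object's search method directly.
compiled = timeit.timeit(lambda: re_fox.search(text), number=n)

print(f'uncompiled: {uncompiled / n * 1e9:.0f} ns per call')
print(f'compiled:   {compiled / n * 1e9:.0f} ns per call')
```

The lambda adds a small, equal cost to both measurements, so the difference between the two figures is more meaningful than either figure on its own.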


Use `cProfile` and `pstats` to profile a function that uses an uncompiled `re.search` to search the text.

In [6]:
%%file data/profiling/profiletest.py
import re

text = '''The quick brown fox jumps over the lazy dog'''
for x in range(10000):
    re.search('fox', text)

Overwriting data/profiling/profiletest.py


In [7]:
%%prun
import re
text = '''The quick brown fox jumps over the lazy dog'''

for n in range(10_000):
    re.search('fox', text)

 

In [8]:
!python -m cProfile -s time data/profiling/profiletest.py

         40096 function calls (40095 primitive calls) in 0.009 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.003    0.000    0.007    0.000 re.py:198(search)
    10000    0.002    0.000    0.003    0.000 re.py:289(_compile)
        1    0.002    0.002    0.009    0.009 profiletest.py:1(<module>)
    10000    0.001    0.000    0.001    0.000 {method 'search' of 're.Pattern' objects}
    10009    0.001    0.000    0.001    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 sre_parse.py:493(_parse)
        1    0.000    0.000    0.000    0.000 sre_compile.py:759(compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:536(_compile_info)
        3    0.000    0.000    0.000    0.000 sre_parse.py:172(append)
        1    0.000    0.000    0.000    0.000 sre_parse.py:174(getwidth)
        1    0.000    0.000    0.000    0.000 sre_parse.py:937(pars
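
The cells above use `%%prun` and the `cProfile` command line. The same profile can also be collected and sorted in code with `cProfile.Profile` and `pstats.Stats`. The sketch below is one possible approach; the function name `search_many` and the limit of five rows are illustrative choices, not part of the lab.

```python
import cProfile
import io
import pstats
import re

text = 'The quick brown fox jumps over the lazy dog'

def search_many(n=10_000):
    """Hypothetical helper: run the uncompiled search n times."""
    for _ in range(n):
        re.search('fox', text)

# Collect profiling data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
search_many()
profiler.disable()

# Write the report to a string buffer so it can be inspected or saved.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# Sort by internal time, as `-s time` does on the command line,
# and print only the five most expensive entries.
stats.strip_dirs().sort_stats('tottime').print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by `'cumulative'` instead of `'tottime'` puts the functions with the largest total cost, including their callees, at the top.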

Instrument the function with the following decorator, then use `%timeit` to compare the overhead of profiling 1% of calls with the overhead of profiling 100% of calls.

In [9]:
import random
import functools

def instrument(profiler, probability=0.10):
    '''Profile some of the calls to the decorated function.
    
    The default probability of profiling a call is 10%.
    '''
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < probability:
                # Enable outside the try block so that disable() only runs
                # when enable() has actually succeeded.
                profiler.enable()
                try:
                    return func(*args, **kwargs)
                finally:
                    profiler.disable()
            else:
                return func(*args, **kwargs)
        return wrapper
    return decorator

In [12]:
import cProfile
p = cProfile.Profile()

@instrument(p, 0.01)
def many_searches(pattern, text, n=100):
    for i in range(n):
        re.search(pattern, text)

In [13]:
%timeit for x in range(100): many_searches('fox', text)

5.13 ms ± 95.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [None]:
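# Measured %timeit results (per loop of 100 calls to many_searches):
# no profiler (undecorated): 5.05 ms
# profiling   0% of calls:   5.05 ms
# profiling   1% of calls:   5.18 ms
# profiling 100% of calls:   11.1 ms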

In [12]:
def many_searches_noprof(pattern, text, n=100):
    for i in range(n):
        re.search(pattern, text)

In [13]:
%timeit for x in range(100): many_searches_noprof('fox', text)

5.05 ms ± 173 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
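
The comparison across sampling probabilities can also be scripted rather than run by editing the decorator argument by hand. The sketch below is one way to do that, under stated assumptions: it repeats the `instrument` decorator so that it runs on its own, uses a fresh profiler for each setting, and picks 1000 calls per measurement arbitrarily. Timings will differ from the figures recorded above.

```python
import cProfile
import functools
import random
import re
import timeit

text = 'The quick brown fox jumps over the lazy dog'

def instrument(profiler, probability=0.10):
    """Sampling decorator from the lab, repeated so this sketch is self-contained."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < probability:
                profiler.enable()
                try:
                    return func(*args, **kwargs)
                finally:
                    profiler.disable()
            return func(*args, **kwargs)
        return wrapper
    return decorator

def many_searches(pattern, text, n=100):
    for _ in range(n):
        re.search(pattern, text)

results = {}
for probability in (0.0, 0.01, 1.0):
    profiler = cProfile.Profile()  # fresh profiler for each setting
    sampled = instrument(profiler, probability)(many_searches)
    # Time 1000 decorated calls (an arbitrary count for illustration).
    seconds = timeit.timeit(lambda: sampled('fox', text), number=1000)
    results[probability] = seconds
    print(f'{probability:>5.0%}: {seconds:.3f} s for 1000 calls')
```

Because `random.random()` returns values in the range [0, 1), a probability of 0.0 never profiles and a probability of 1.0 always does, so those two settings bracket the overhead.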


In [20]:
p.print_stats(sort='time')

         32602200 function calls in 8.834 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  8110000    2.421    0.000    3.384    0.000 re.py:289(_compile)
  8110000    2.411    0.000    6.973    0.000 re.py:198(search)
    81100    1.844    0.000    8.816    0.000 <ipython-input-16-865406956896>:4(many_searches)
  8110000    1.177    0.000    1.177    0.000 {method 'search' of 're.Pattern' objects}
  8110000    0.963    0.000    0.963    0.000 {built-in method builtins.isinstance}
    81100    0.018    0.000    0.018    0.000 {method 'disable' of '_lsprof.Profiler' objects}


