# Module: Debugging and Profiling

## Using `pdb`

In [6]:
import pandas as pd

def do_something():
    df = pd.read_csv("https://s3.amazonaws.com/python-level-2/sales-funnel.csv")

    print(df.head())
    print(df[df['Status'] == 'pending'].count())
    
    df['Amount'] = df['Quantity'] * df['Price']
    print(df[df['Status'] == 'pending']['Amount'].sum())
    return df

## Let's step through
`import pdb; pdb.set_trace()`

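1. `l` (list the source around the current line)
2. `ll` (list the whole current function)
3. print values (`p <expression>`)
4. step into (`s`)
5. move up one stack frame (`u`)
6. investigate, introspect
7. go to the next line (`n`)
8. continue through (`c`)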
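To make the commands above concrete, here is a minimal sketch that drives `pdb` from a scripted list of commands instead of the keyboard, so it runs unattended. The function `scale` and the command script are made up for illustration. In class you would type the same commands at the `(Pdb)` prompt after `pdb.set_trace()`.

```python
import io
import pdb


def scale(values, factor):
    """Hypothetical function to step through."""
    total = sum(values)
    return total * factor


# Commands we would normally type at the (Pdb) prompt:
#   args    -> show the arguments of the current function
#   n       -> run the current line and stop at the next one
#   p total -> print the value of `total`
#   c       -> continue until the function returns
scripted_commands = io.StringIO("args\nn\np total\nc\n")
captured_output = io.StringIO()

# readrc=False skips any personal .pdbrc; nosigint=True leaves Ctrl-C handling alone.
debugger = pdb.Pdb(
    stdin=scripted_commands,
    stdout=captured_output,
    nosigint=True,
    readrc=False,
)

# runcall() stops on the first line of the function, then reads our commands.
result = debugger.runcall(scale, [1, 2, 3], 2)

transcript = captured_output.getvalue()
print(transcript)
print("result:", result)
```

The transcript shows the debugger's view at each stop, which is a handy way to rehearse a session before doing it live.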

## Exercise: Investigation
1. Find out what `tail` is
2. While in the pdb, find out what else you can do with `os.path`

In [10]:
import os


def get_path(filename):
    """Return file's path or empty string if no path."""
    head, tail = os.path.split(filename)
    return head


## Exercise
Fix the following to do a proper conversion from the base to the destination currency.

We will be using [this Rates API](https://ratesapi.io/documentation/) for currency exchange rates.

In [1]:
def get_rate(base_currency, to_currency):
    res = requests.get('https://api.ratesapi.io/api/latest')
    return res[to_currency]

In [8]:
get_rate('EUR', 'GBP')

## Lab
Let's understand and [fix this code](https://raw.githubusercontent.com/AllenDowney/ThinkPython/master/code/Card.py)

Drop `pdb`s throughout `card.py` to find out what is going on!

In [2]:
# %run -i 'card.py'

### Exercise A1
What's the issue?

```python
SyntaxError: invalid syntax
```

### A brief review of lambdas

In [20]:
f = lambda val: val * 2

In [22]:
sorted(['hello', 'banana' 'apple', 'coffee', 'hi', 'yoo'], key=len)

['hi', 'yoo', 'hello', 'coffee', 'bananaapple']
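Note that `'banana' 'apple'` in the call above has no comma between the two literals, so Python joins them into the single string `'bananaapple'`. As a further illustration of `key`, the sketch below sorts a word list (made up for this example) with lambdas rather than a built-in function.

```python
words = ["hello", "banana", "apple", "coffee", "hi", "yoo"]

# Sort by the last letter of each word; ties keep their original order
# because Python's sort is stable.
by_last_letter = sorted(words, key=lambda word: word[-1])

# A key can return a tuple: sort by length first, then alphabetically.
by_length_then_alpha = sorted(words, key=lambda word: (len(word), word))

print(by_last_letter)
print(by_length_then_alpha)
```

A `key` function is called once per element, and the elements are ordered by the values it returns.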

In [23]:
movies = [
    {"imdb": 7.2, "title": "The Little Rascals"},
    {"imdb": 8.5, "title": "The Usual Suspects"},
    {"imdb": 5.5, "title": "Pokemon The Movie"},
    {"imdb": 6.7, "title": "Guardians of the Galaxy 2"}
]

### Exercise
- Sort the movies by `imdb` score ascending
- Sort by `imdb` score descending

### Exercise A2
- Drop a `pdb` before the next failing line and find out what is going on.
- Why the `TypeError`?
- HINT: What is `self.cards`?
- HINT: Do a manual comparison of two of them
- HINT: There are two different ways to fix this next bug: Python 3 magic methods OR `key`

### Exercise A3
Find out what is going on in `find_defining_class`.

## Profiling Basics

In [33]:
import cProfile

In [37]:
# cProfile.run("2+2")
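`cProfile.run` prints its report ordered by "standard name", which is hard to scan once many functions are involved. One option, sketched below, is to collect the stats with `cProfile.Profile` and let `pstats` sort them by cumulative time. The function `slow_sum` is a stand-in invented for this example.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares in a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Write the report into a string buffer so we can inspect or trim it.
report_buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=report_buffer)
stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries only

report = report_buffer.getvalue()
print(report)
```

Sorting by `"cumulative"` puts the functions that dominate the total run time at the top. `"tottime"` instead highlights time spent inside each function, excluding its callees.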

### Exercise
Profile the following code

In [38]:
def recip_square(i):
    return 1./i**2

def approx_pi(n=10000000):
    val = 0.
    for k in range(1,n+1):
        val += recip_square(k)
    return (6 * val)**.5

### Exercise
Optimize it. What can we do? Let's come up with ideas.
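Before changing anything, it helps to record a baseline so that each idea can be compared against it. The sketch below times the original functions with `timeit`. The smaller `n` and the best-of-three choice are assumptions made to keep the run short, not part of the exercise.

```python
import timeit


def recip_square(i):
    return 1.0 / i**2


def approx_pi(n=10_000_000):
    val = 0.0
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** 0.5


# Use a smaller n than the default so the baseline finishes quickly.
small_n = 100_000

# Run the call three times and keep the fastest, which is the least noisy.
baseline_seconds = min(
    timeit.repeat(lambda: approx_pi(small_n), number=1, repeat=3)
)
estimate = approx_pi(small_n)

print(f"approx_pi({small_n}) = {estimate:.6f}")
print(f"best of 3: {baseline_seconds:.4f} s")
```

Any candidate optimization can then be timed the same way and checked against `estimate` to confirm it still returns the same value.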

In [39]:
cProfile.run("do_something()")

   Account                          Name            Rep       Manager  \
0   714466               Trantow-Barrows   Craig Booker  Debra Henley   
1   714466               Trantow-Barrows   Craig Booker  Debra Henley   
2   714466               Trantow-Barrows   Craig Booker  Debra Henley   
3   737550  Fritsch, Russel and Anderson   Craig Booker  Debra Henley   
4   146832                  Kiehn-Spinka  Daniel Hilton  Debra Henley   

       Product  Quantity  Price     Status  
0          CPU         1  30000  presented  
1     Software         1  10000  presented  
2  Maintenance         2   5000    pending  
3          CPU         1  35000   declined  
4          CPU         2  65000        won  
Account     4
Name        4
Rep         4
Manager     4
Product     4
Quantity    4
Price       4
Status      4
dtype: int64
105000
         18556 function calls (18282 primitive calls) in 0.223 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:li

        8    0.000    0.000    0.000    0.000 common.py:283(is_null_slice)
        1    0.000    0.000    0.000    0.000 common.py:302(infer_compression)
       25    0.000    0.000    0.000    0.000 common.py:329(apply_if_callable)
       15    0.000    0.000    0.000    0.000 common.py:348(is_datetime64_dtype)
       47    0.000    0.000    0.000    0.000 common.py:381(is_datetime64tz_dtype)
       38    0.000    0.000    0.000    0.000 common.py:422(is_timedelta64_dtype)
       64    0.000    0.000    0.000    0.000 common.py:456(is_period_dtype)
        1    0.000    0.000    0.000    0.000 common.py:49(is_url)
       57    0.000    0.000    0.000    0.000 common.py:492(is_interval_dtype)
        3    0.000    0.000    0.001    0.000 common.py:50(new_method)
       98    0.000    0.000    0.001    0.000 common.py:530(is_categorical_dtype)
       12    0.000    0.000    0.000    0.000 common.py:566(is_string_dtype)
       12    0.000    0.000    0.000    0.000 common.py:595(conditio

        1    0.000    0.000    0.000    0.000 inspect.py:72(isclass)
        1    0.000    0.000    0.000    0.000 interactiveshell.py:705(get_ipython)
        7    0.000    0.000    0.001    0.000 iostream.py:197(schedule)
        6    0.000    0.000    0.000    0.000 iostream.py:310(_is_master_process)
        6    0.000    0.000    0.000    0.000 iostream.py:323(_schedule_flush)
        6    0.000    0.000    0.001    0.000 iostream.py:386(write)
        7    0.000    0.000    0.000    0.000 iostream.py:93(_event_pipe)
        1    0.000    0.000    0.001    0.001 managers.py:1162(insert)
        2    0.000    0.000    0.000    0.000 managers.py:1238(reindex_indexer)
        2    0.000    0.000    0.000    0.000 managers.py:1284(<listcomp>)
        7    0.000    0.000    0.000    0.000 managers.py:132(__init__)
        7    0.000    0.000    0.000    0.000 managers.py:138(<listcomp>)
        2    0.000    0.000    0.001    0.000 managers.py:1427(take)
       20    0.000    0.000    

        1    0.000    0.000    0.000    0.000 ssl.py:482(__new__)
        1    0.000    0.000    0.000    0.000 ssl.py:486(_encode_hostname)
        1    0.000    0.000    0.058    0.058 ssl.py:494(wrap_socket)
        1    0.000    0.000    0.016    0.016 ssl.py:569(load_default_certs)
        2    0.000    0.000    0.000    0.000 ssl.py:710(verify_mode)
        1    0.000    0.000    0.000    0.000 ssl.py:718(verify_mode)
        1    0.000    0.000    0.016    0.016 ssl.py:723(create_default_context)
        1    0.000    0.000    0.057    0.057 ssl.py:983(_create)
        7    0.000    0.000    0.000    0.000 threading.py:1017(_wait_for_tstate_lock)
        7    0.000    0.000    0.000    0.000 threading.py:1071(is_alive)
        7    0.000    0.000    0.000    0.000 threading.py:513(is_set)
       42    0.000    0.000    0.000    0.000 typing.py:255(inner)
       84    0.000    0.000    0.000    0.000 typing.py:329(__hash__)
   126/42    0.000    0.000    0.000    0.000 typing.py:

In [48]:
"""Sorting a large, randomly generated string and writing it to disk"""
# Source: https://toucantoco.com/en/tech-blog/tech/python-performance-optimization
# Slightly modified for class-use
import random


def write_sorted_letters(nb_letters=2 * 10**6):
    random_string = ''
    for i in range(nb_letters):
        random_string += random.choice('abcdefghijklmnopqrstuvwxyz')
    sorted_string = sorted(random_string)

    with open("sorted_text.txt", "w") as sorted_text:
        for character in sorted_string:
            sorted_text.write(character)

### Profile it

In [49]:
cProfile.run("write_sorted_letters()")

         12460935 function calls in 3.791 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.045    1.045    3.783    3.783 <ipython-input-48-6c33e28a9206>:7(write_sorted_letters)
        1    0.008    0.008    3.791    3.791 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 _bootlocale.py:33(getpreferredencoding)
        1    0.000    0.000    0.000    0.000 codecs.py:186(__init__)
  2000000    0.894    0.000    1.292    0.000 random.py:250(_randbelow_with_getrandbits)
  2000000    0.840    0.000    2.306    0.000 random.py:285(choice)
        1    0.000    0.000    0.000    0.000 {built-in method _locale.nl_langinfo}
        1    0.000    0.000    3.791    3.791 {built-in method builtins.exec}
  2000000    0.174    0.000    0.174    0.000 {built-in method builtins.len}
        1    0.255    0.255    0.255    0.255 {built-in method builtins.sorted}
        1    0.002    0.002    0.002    0.002 

## Let's install `line_profiler`
https://github.com/pyutils/line_profiler

In [50]:
!pip install line_profiler

Collecting line_profiler
  Downloading line_profiler-3.1.0.tar.gz (45 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: line-profiler
  Building wheel for line-profiler (PEP 517) ... done
  Created wheel for line-profiler: filename=line_profiler-3.1.0-cp38-cp38-macosx_10_16_x86_64.whl size=52714 sha256=34806573d32897f80e7f8f707ba24d9c9e232241ff3ef38beefcbc4141010074
  Stored in directory: /Users/suneelchakravorty/Library/Caches/pip/wheels/8c/77/e9/c5e6acd4b2f433b82d0328e06eeef6d8c06a27db842d3cfb53
Successfully built line-profiler
Installing collected packages: line-profiler
Successfully installed line-profiler-3.1.0


In [55]:
%load_ext line_profiler

In [57]:
%lprun -f write_sorted_letters write_sorted_letters()

Where is the first bottleneck?

Where is the second bottleneck?