# Debugging Python code 🐛💻🐍
Created by [Ryan Parker](https://github.com/rparkr).  
Aug 2023

This notebook demonstrates different methods for debugging Python code. It focuses on live debugging rather than on logging, which is better suited to code monitoring and asynchronous debugging.

Each of the sections below has some code with a bug; we'll use different debugging techniques to find and fix the bugs.

## Types of errors
Programmers encounter two main types of errors: _syntax_ errors and _runtime_ errors.

Syntax errors occur when the Python interpreter cannot parse the code at all: for example, not indenting the body of a `for` loop, or forgetting to put a colon after a function definition (`def` statement). Syntax errors can usually be caught ahead of time by a linter; otherwise, they are easy to identify from the error message Python produces when it reaches code it cannot parse. (Referring to a package that was never imported is a related mistake that linters also flag, but Python treats it as a runtime error: the line parses fine and raises a `NameError` when it runs.)
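You can see a syntax error without running anything by handing source text to the built-in `compile()`, which only parses it. The two snippets below are hypothetical examples of the mistakes described above; the exact wording of `err.msg` varies between Python versions, so no output is shown.

```python
# Two snippets that the parser rejects before any of their code runs.
snippets = {
    "missing colon": "def greet(name)\n    return 'Hello, ' + name\n",
    "missing indentation": "for i in range(3):\nprint(i)\n",
}

errors = {}
for label, source in snippets.items():
    try:
        # compile() parses the source without executing it.
        compile(source, "<example>", "exec")
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        errors[label] = err
        print(f"{label}: {type(err).__name__} on line {err.lineno}: {err.msg}")
```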

Runtime errors occur in code that is valid Python but either fails when a particular statement is executed (in which case an `Exception` is raised describing the error) or uses incorrect logic (no `Exception` is raised, but the code doesn't do what is expected).

Of these error types, logic errors are the trickiest because they do not produce any error message. Debugging tools are especially valuable for locating and resolving logic errors.
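The two kinds of runtime error can be contrasted in a few lines. This is a minimal sketch: `average` is a hypothetical function with a deliberate logic bug, and `int("forty-two")` stands in for any statement that raises an exception.

```python
def average(values):
    # Logic error: the denominator should be len(values), not a hard-coded 2.
    return sum(values) / 2


# A runtime error that raises an exception: Python stops and reports the problem.
try:
    int("forty-two")
except ValueError as err:
    print(f"Exception raised: {err}")

# A logic error: no exception is raised, the answer is simply wrong.
print(average([2, 4, 6]))  # prints 6.0, although the mean is 4.0
```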

# Simple debugging: [Python Tutor](https://pythontutor.com/)
When first learning Python, it can be helpful to watch code execute one line at a time, especially when trying to understand flow control or recursive function behavior. [Python Tutor](https://pythontutor.com/) is a free online resource that visualizes code execution line by line, showing variable references, intermediate values, and final outputs. Python Tutor runs only pure-Python code and does not support imported packages, apart from a selection of modules from [Python's standard library](https://docs.python.org/3/library/index.html). Furthermore, at the time of writing, Python Tutor works only with Python 3.6 (released in 2016).

<span style="color: firebrick; font-weight: bold;">Note:</span> in general, I recommend using an IDE like Visual Studio Code for line-by-line code execution and debugging since VS Code has more advanced debugging features than Python Tutor and does not share its limitations. But if you come across some code online and want to quickly check how it works, Python Tutor can be a helpful tool.

The function below has an error. Try to identify it by copying the code and pasting it into [PythonTutor.com](https://pythontutor.com/python-debugger.html#mode=edit)'s code visualizer.

In [2]:
def fib_number(n: int) -> int:
    '''Calculate the `n`th Fibonacci number.
    
    The first two numbers of the sequence are (0, 1). The `n`th number
    is the sum of the previous two Fibonacci numbers. Thus, the second Fibonacci
    number is 0 + 1 = 1. The third is 1 + 1 = 2; and so on.

    Examples:
    ```python
    >>> fib_number(4)
    3
    >>> fib_number(10)
    55
    >>> fib_number(20)
    6765
    >>> fib_number(100)
    354224848179261915075
    ```
    '''
    if n < 3:
        return n
    else:
        return fib_number(n - 1) + fib_number(n - 2)

In [6]:
assert fib_number(10) == 55, f"Uh oh. fib_number(10) should be 55, but the result is {fib_number(10)}"

AssertionError: Uh oh. fib_number(10) should be 55, but the result is 89
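When Python Tutor isn't an option (for example, because your code imports third-party packages), a small decorator can print a similar call-by-call trace of a recursive function. The sketch below uses a hypothetical `trace_calls` decorator on a factorial function so it doesn't give away the `fib_number` bug. Wrapping `fib_number` the same way (`fib_number = trace_calls(fib_number)`) re-binds the global name, so its recursive calls are traced too.

```python
import functools


def trace_calls(func):
    """Print each call and its return value, indented by recursion depth."""
    depth = 0

    @functools.wraps(func)
    def wrapper(*args):
        nonlocal depth
        indent = "  " * depth
        print(f"{indent}{func.__name__}({', '.join(map(repr, args))})")
        depth += 1
        try:
            result = func(*args)
        finally:
            depth -= 1  # restore the depth even if func raises
        print(f"{indent}-> {result!r}")
        return result

    return wrapper


@trace_calls
def factorial(n: int) -> int:
    # The recursive call looks up the global name `factorial`, i.e. the wrapper.
    return 1 if n <= 1 else n * factorial(n - 1)


factorial(3)
```

Each nested call is indented one level deeper, and each `->` line shows the value returned at that level, which is usually enough to spot where a recursive result first goes wrong.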

# More complex debugging: VS Code
Visual Studio Code performs linting to identify problems ahead of time, offers line-by-line execution for quickly checking code flow, and has advanced debugging tools to track down and resolve errors.

## Example 1: complex code flow
The function in this example recursively searches a (nested) dictionary for keys that match a user's query. Depending on the arguments passed to the function and the complexity of the dictionary being searched, the code flow can be difficult to follow without the aid of line-by-line execution or a debugger. We'll use VS Code's debugging tools to follow the function as it searches for matching keys.

Background on the function:  
The core idea is searching for a term within the dictionary's keys. (`mydict.keys()` returns a view object rather than a list, but it can be iterated and filtered just like one.) Here’s [a helpful StackOverflow answer](https://stackoverflow.com/questions/3640359/regular-expressions-search-in-list/39593126#39593126) that shows how to search a list with a regular expression.
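Here is that idea with the standard `re` module (exact matching only, no fuzziness), applied to a small hypothetical dictionary whose key names mimic the forecast data used later in this section.

```python
import re

# Hypothetical sample; only the key names matter here.
weather_sample = {"time": [], "temperature_2m": [], "apparent_temperature": [], "windspeed_10m": []}

pattern = re.compile("temp")
# Iterating over a dict yields its keys; pattern.search returns a match (truthy) or None.
matches = list(filter(pattern.search, weather_sample))
print(matches)  # ['temperature_2m', 'apparent_temperature']
```

Using `pattern.search` rather than `pattern.match` matters here: `match` only succeeds at the start of a string, so it would miss `apparent_temperature`.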

The search function is handled by the `regex` package, which adds fuzzy matching functionality to Python’s basic `re` module. See this [blog post from Max Halford](https://maxhalford.github.io/blog/fuzzy-regex-matching-in-python/) for a brief explanation. More details are found in [the `regex` GitHub repo](https://github.com/mrabarnett/mrab-regex#approximate-fuzzy-matching-hg-issue-12-hg-issue-41-hg-issue-109).

In [None]:
# %pip install regex

In [14]:
import regex

def dict_search(
    search_term: str,
    search_dict: dict,
    fetch_all: bool=True,
    fuzzy_constraints: str='{s,i,2i+4s<=4}',
    search_dict_name: str=''):
    '''Search for a term within the keys of a (nested) dictionary.

    Parameters
    ----------
    search_term: str
        The term you want to find among the keys of a (potentially nested)
        dictionary.

    search_dict: dict
        The dictionary whose keys will be searched for a matching term. Leave
        this as the top-level dict variable, such as `my_dict`, not 
        `my_dict['subdict']`, since this function uses the top-level dict to
        trace down each level.

        If you want to search starting at a particular key (like 
        `my_dict['subdict']`), then use `my_dict['subdict']` for the
        search_dict parameter and specify the search_dict_name parameter
        as "my_dict['subdict']".

    fetch_all: bool, default=True
        If `True` (default), print all matching keys across all levels of
        the (nested) dictionary. If `False`, stop printing matches after the
        first level where a match is found.

    fuzzy_constraints: str, default='{s,i,2i+4s<=4}'
        Searches are based on "fuzzy matching" logic, where a matched dict key
        does not need to be an exact match. For example, a search for "price"
        would also return results for "pricing" or "prices".

        The constraints are:
        {substitutions, insertions, deletions, weighting_factor}.

        For example: {e<=3} will match any string where there are
        at most 3 differences ("errors") between the matched value and the
        searched term. {s<=3,i<=3,d<=3} will match any string where there are
        at most 3 substitutions (e.g., "run" has 1 substitution from "ran"),
        3 insertions (e.g., "strep" has 1 insertion from "step"), and
        3 deletions (e.g., "bach" has 1 deletion from "beach").

        The weighting factors enable you to set the cost of each kind of error,
        prioritizing some errors over others in the matching logic.
        For example, {s,i,d,2i+3d+4s<=5} allows substitutions, insertions,
        and deletions, where the cost of each insertion is 2, the cost of each
        deletion is 3, and the cost of each substition is 4, with the total
        cost limited to no more than 5, which allows 2 insertions, 1
        substitution, 1 deletion, or 1 insertion and 1 deletion together.

        For more info, see: https://github.com/mrabarnett/mrab-regex#approximate-fuzzy-matching-hg-issue-12-hg-issue-41-hg-issue-109

    search_dict_name: str, default=''
        The name of the dictionary variable that will be searched. If the
        search_dict parameter is a top-level dictionary (like `mydict`), this
        should be left at its default (the empty string ''). If the
        search_dict parameter is a sub-level in the dictionary (like
        `my_dict['subdict']`), then this should be a string of that variable
        name ("my_dict['subdict']").


    Returns
    -------
    `None`. Results are printed for each key that matches the search term.

    Notes
    -----
    Inspiration for this method came from the following two sources:
    - Searching through a list: https://stackoverflow.com/questions/3640359/regular-expressions-search-in-list/39593126#39593126
    - Fuzzy matching: https://maxhalford.github.io/blog/fuzzy-regex-matching-in-python/

    '''

    # Store the name of the dictionary being searched
    # Reference: https://bobbyhadz.com/blog/python-print-variable-name
    if search_dict_name == '':
        globals_dict = globals()
        search_dict_name = [var_name for var_name in globals_dict if (globals_dict[var_name] is search_dict)][0]
        del globals_dict

    # Perform the fuzzy-match search across dict keys
    pattern = f"({search_term})" + fuzzy_constraints # Default: "{s,i,2i+4s<=4}" 
    results_list = list(filter(
        lambda s: regex.search(pattern, s, regex.BESTMATCH),
        search_dict.keys())
    )

    if results_list:
        for item in results_list:            
            print(f"{search_dict_name}['{item}']")
        if fetch_all:
            # Recurse through nested dictionary levels
            for k in search_dict.keys():
                if type(search_dict[k]) == dict:
                    dict_search(search_term,
                                search_dict[k], 
                                fetch_all=fetch_all, 
                                fuzzy_constraints=fuzzy_constraints, 
                                search_dict_name=f"{search_dict_name}['{k}']")
                # Also search through lists if there are lists-of-dicts
                elif type(search_dict[k]) == list:
                    for n, subitem in enumerate(search_dict[k]):
                        if type(subitem) == dict:
                            dict_search(search_term,
                                        subitem, 
                                        fetch_all=fetch_all, 
                                        fuzzy_constraints=fuzzy_constraints, 
                                        search_dict_name=f"{search_dict_name}['{k}'][{n}]")

    else:
        # Recurse through nested dictionary levels
        for k in search_dict.keys():
            if type(search_dict[k]) == dict:
                dict_search(search_term,
                            search_dict[k], 
                            fetch_all=fetch_all, 
                            fuzzy_constraints=fuzzy_constraints, 
                            search_dict_name=f"{search_dict_name}['{k}']")
            # Also search through lists if there are lists-of-dicts
            elif type(search_dict[k]) == list:
                for n, subitem in enumerate(search_dict[k]):
                    if type(subitem) == dict:
                        dict_search(search_term,
                                    subitem,
                                    fetch_all=fetch_all,
                                    fuzzy_constraints=fuzzy_constraints,
                                    search_dict_name=f"{search_dict_name}['{k}'][{n}]")

Test the function using weather forecast data for Frisco, TX, from [Open-Meteo.com](https://open-meteo.com/en/docs), an open-source API for weather forecasts and weather history.

In [7]:
import requests     # make API calls (and other web requests)

r = requests.get(url=(
    'https://api.open-meteo.com/v1/forecast?'
    'latitude=33.1507'
    '&longitude=-96.8236'
    '&hourly='
        'temperature_2m,'
        'relativehumidity_2m,'
        'apparent_temperature,'
        'precipitation_probability,'
        'precipitation,'
        'weathercode,'
        'surface_pressure,'
        'cloudcover,'
        'visibility,'
        'windspeed_10m,'
        'winddirection_10m,'
        'temperature_180m,'
        'uv_index,'
        'is_day'
    '&current_weather=true'
    '&timezone=America%2FChicago'
    # '&past_days=3'
    '&forecast_days=7'))

forecast = r.json()

In [16]:
# Search for a weather variable
dict_search('temp', forecast, search_dict_name='forecast')

forecast['current_weather']['temperature']
forecast['hourly_units']['temperature_2m']
forecast['hourly_units']['apparent_temperature']
forecast['hourly_units']['temperature_180m']
forecast['hourly']['temperature_2m']
forecast['hourly']['apparent_temperature']
forecast['hourly']['temperature_180m']
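To practice stepping through the recursion in the debugger without installing `regex` or calling the API, a simplified, dependency-free variant can help. This is a sketch, not a drop-in replacement: `iter_matching_paths` is a hypothetical generator that uses exact `re` matching, always descends into every level (like `fetch_all=True`), and reports matches depth-first rather than level by level. The nested sample data is made up.

```python
import re
from typing import Any, Iterator


def iter_matching_paths(obj: Any, pattern: str, path: str = "") -> Iterator[str]:
    """Yield subscript paths (like "d['a'][0]['b']") to dict keys matching `pattern`."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            key_path = f"{path}[{key!r}]"
            if isinstance(key, str) and re.search(pattern, key):
                yield key_path
            # Recurse into the value whether or not the key matched.
            yield from iter_matching_paths(value, pattern, key_path)
    elif isinstance(obj, list):
        # Lists of dicts (or nested lists) are walked by index.
        for index, item in enumerate(obj):
            yield from iter_matching_paths(item, pattern, f"{path}[{index}]")


# Hypothetical nested data, loosely shaped like a forecast response.
nested_sample = {
    "current_weather": {"temperature": 31.2, "windspeed": 9.4},
    "hourly": {"time": ["2023-08-01T00:00"], "temperature_2m": [29.8]},
    "daily": [{"temperature_max": 36.0}],
}

for match_path in iter_matching_paths(nested_sample, "temp", "nested_sample"):
    print(match_path)
```

Because it is a generator, the caller decides how many matches to consume (for example, `next(...)` for just the first), and a breakpoint on the `yield key_path` line pauses at every match.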


## Example 2: PyTorch
Aligning tensor dimensions for operations in neural networks can be tricky. Packages like [`tensor-sensor`](https://github.com/parrt/tensor-sensor) can help with visualizing the tensor operations, but debugging tools can also be very valuable, especially for inspecting variables during execution. We'll explore how to use that feature in this section.

The code in this section trains a neural network that takes text embeddings as input and outputs a score (such as a star rating for a review, or a likelihood of default given a description of past credit activity).



Import packages

In [18]:
import pandas as pd
import torch
from torch.utils.data import DataLoader

Load a dataset with text and a numerical target

In [40]:
# download a dataset with text and corresponding numeric scores
# I learned about this data source from: https://huggingface.co/learn/nlp-course/chapter7/5?fw=pt
r = requests.get(url=(
    'https://datasets-server.huggingface.co/rows'
    '?dataset=amazon_reviews_multi'
    '&config=en'
    '&split=train'
    '&offset=0'
    '&limit=1000'))

rows = []
for row in r.json()['rows']:
    rows.append(row['row'])  # each item holds one review's fields under 'row'
df = pd.DataFrame(rows)
df.head()

| | review_id | product_id | reviewer_id | stars | review_body | review_title | language | product_category |
|---|---|---|---|---|---|---|---|---|
| 0 | en_0964290 | product_en_0740675 | reviewer_en_0342986 | 1 | Arrived broken. Manufacturer defect. Two of th... | I'll spend twice the amount of time boxing up ... | en | furniture |
| 1 | en_0690095 | product_en_0440378 | reviewer_en_0133349 | 1 | the cabinet dot were all detached from backing... | Not use able | en | home_improvement |
| 2 | en_0311558 | product_en_0399702 | reviewer_en_0152034 | 1 | I received my first order of this product and ... | The product is junk. | en | home |
| 3 | en_0044972 | product_en_0444063 | reviewer_en_0656967 | 1 | This product is a piece of shit. Do not buy. D... | Fucking waste of money | en | wireless |
| 4 | en_0784379 | product_en_0139353 | reviewer_en_0757638 | 1 | went through 3 in one day doesn't fit correct ... | bubble | en | pc |


In [None]:
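class SampleDataset(torch.utils.data.Dataset):
    '''Map-style dataset that pairs each input with its label.'''

    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        # Number of samples in the dataset
        return len(self.data)

    def __getitem__(self, index):
        # Return one (input, label) pair
        return self.data[index], self.labels[index]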


Set up DataLoader for processing the data
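PyTorch's `DataLoader` draws indices from a map-style dataset such as `SampleDataset` (anything with `__len__` and `__getitem__`) and groups the returned pairs into batches. As a rough, dependency-free picture of the unshuffled case, the sketch below uses hypothetical names (`ListDataset`, `iter_batches`) and toy data; the real `DataLoader` also collates items into tensors and supports shuffling and worker processes.

```python
from typing import Iterator, List, Sequence, Tuple


class ListDataset:
    """Hypothetical map-style dataset with the same protocol as SampleDataset."""

    def __init__(self, data: Sequence, labels: Sequence):
        self.data = data
        self.labels = labels

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> Tuple:
        return self.data[index], self.labels[index]


def iter_batches(dataset, batch_size: int) -> Iterator[Tuple[List, List]]:
    """Yield (inputs, labels) batches in index order; the last batch may be smaller."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        pairs = [dataset[i] for i in range(start, stop)]
        inputs, labels = zip(*pairs)  # transpose [(x, y), ...] into (xs, ys)
        yield list(inputs), list(labels)


reviews = [f"review {i}" for i in range(5)]  # toy inputs
stars = [1, 1, 1, 5, 4]                      # toy labels
for inputs, labels in iter_batches(ListDataset(reviews, stars), batch_size=2):
    print(inputs, labels)
```

Keeping the smaller final batch mirrors `DataLoader`'s default `drop_last=False`; setting a breakpoint inside `iter_batches` is a quick way to check batch shapes before any tensors are involved.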

Define the network architecture

Embed the text using `sentence_transformers`

Train the model

Evaluate performance using a metric like R²
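R² (the coefficient of determination) compares the model's squared errors with those of always predicting the mean: `R² = 1 - SS_res / SS_tot`. A minimal, dependency-free sketch follows; `r_squared` is a hypothetical helper and the star ratings are toy values. In practice, `sklearn.metrics.r2_score` computes the same quantity.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


stars_true = [1, 2, 3, 4, 5]
stars_pred = [1.5, 2, 2.5, 4, 5]
print(round(r_squared(stars_true, stars_pred), 4))  # 0.95
```

Here `SS_tot = 10` and `SS_res = 0.5`, so R² = 1 - 0.05 = 0.95. A model that always predicted the mean rating would score 0.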

# Python's built-in `pdb` module
Even if you don't have access to powerful debugging tools from an IDE like VS Code, you can always use Python's built-in [`pdb` module](https://docs.python.org/3/library/pdb.html) (short for "Python DeBugger").

To use `pdb` in Python 3.7+, add a call to `breakpoint()` on the line where you want to pause code execution. When that line runs, you enter the `pdb` interactive prompt (a REPL, or Read-Eval-Print Loop), where you can inspect variables and run other commands. To resume code execution, run the command `continue` (or its abbreviation `c`).
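Under the hood, `breakpoint()` calls `sys.breakpointhook()`, which by default starts `pdb` and honors the `PYTHONBREAKPOINT` environment variable (`PYTHONBREAKPOINT=0` disables every `breakpoint()` call). To make that mechanism visible without pausing the notebook, the sketch below temporarily swaps in a hypothetical hook, `record_locals_hook`, that snapshots the caller's local variables instead of opening a prompt. It assumes CPython, where `breakpoint()` is implemented in C and so adds no Python frame of its own.

```python
import sys

snapshots = []  # one dict of local variables per breakpoint() hit


def record_locals_hook(*args, **kwargs):
    # breakpoint() forwards any arguments to sys.breakpointhook(); they're ignored here.
    # Frame 0 is this hook, so frame 1 is the function that called breakpoint().
    caller = sys._getframe(1)
    snapshots.append(dict(caller.f_locals))


def running_total(values):
    total = 0
    for v in values:
        total += v
        breakpoint()  # calls sys.breakpointhook()
    return total


sys.breakpointhook = record_locals_hook
try:
    result = running_total([3, 4, 5])
finally:
    sys.breakpointhook = sys.__breakpointhook__  # restore the default (pdb) hook

print(result, [snap["total"] for snap in snapshots])  # 12 [3, 7, 12]
```

Restoring `sys.__breakpointhook__` in a `finally` block ensures later `breakpoint()` calls open `pdb` as usual, even if the traced code raises.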

To use `pdb` in Python versions before 3.7, do the following:
```python
import pdb
...
<YOUR CODE HERE>
...
pdb.set_trace()
```

and the Python debugger interactive prompt will launch when the `pdb.set_trace()` statement is executed.

Once you've entered the Python debugger, [use these commands](https://docs.python.org/3/library/pdb.html#debugger-commands) along with any of your own Python code to inspect variables and move around the code.
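Debugger commands can also be scripted, which is a handy way to see what each one prints without an interactive session. The sketch below gives `pdb.Pdb` an in-memory `stdin` holding two commands (`p greeting` to print a variable, then `c` to continue) and captures everything the debugger writes via `stdout`. `greet` is a hypothetical function used only for the demo; `readrc=False` skips any `.pdbrc` file and `nosigint=True` leaves the Ctrl-C handler untouched.

```python
import io
import pdb


def greet(name):
    greeting = f"Hello, {name}!"
    debugger.set_trace()  # pause here; commands come from `commands` below
    return greeting.upper()


# Feed pdb a script instead of typing at the (Pdb) prompt:
#   p greeting  -> print the value of a variable
#   c           -> continue running to the end
commands = io.StringIO("p greeting\nc\n")
transcript = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=transcript, readrc=False, nosigint=True)

result = greet("Frisco")
print(transcript.getvalue())
```

The transcript shows the current location, the `(Pdb)` prompts, and the printed value of `greeting`; the exact location lines differ slightly between Python versions.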

In [28]:
import pdb

from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, show
import pandas as pd

output_notebook()

In [44]:
# Get weather forecast data
df = pd.read_csv(
    'https://api.open-meteo.com/v1/forecast?'
    'latitude=33.2362'
    '&longitude=-96.8011'
    '&hourly=temperature_2m,'
    'apparent_temperature'
    '&format=csv', skiprows=2, parse_dates=['time'])  # parse the 'time' column as datetimes

pdb.set_trace()
# or, ...
# breakpoint()  # doesn't work in VS Code, Python 3.11.3

# Convert to datetime for proper representation in the plot
df.time = df.time.astype('datetime64[ns]')

source = ColumnDataSource(data=df)

# Plot data
p = figure(
        title='Temperature forecast for Prosper, TX',
        x_axis_label='Date', y_axis_label='Temperature (°C)',
        x_axis_type='datetime')

p.line(x='time', y='temperature_2m (°C)', legend_label='temp',
       color='firebrick', source=source)
p.line(x='time', y='apparent_temperature (°C)', legend_label='apparent temp',
       source=source)

hover = HoverTool()
hover.tooltips = [
    ('time', '@time{%Y-%m-%d %H:%M}'),
    ('temp °C', '@{temperature_2m (°C)}{0.0}'),
    ('feels like', '@{apparent_temperature (°C)}{0.0}')
]

hover.formatters = {'@time': 'datetime'}

p.add_tools(hover)

show(p)