# Using the error stack trace to debug code

In this course, the test cells often use assert statements to throw errors when the solution is not correct. The error message usually includes something about why the assertion failed, e.g. "function output is not a dictionary", "variable foo is the wrong size", "foo does not match the instructor's solution", etc. These errors usually point you directly towards what needs to be fixed. This type of error is not what this notebook focuses on.

Of course, these assertions aren't the only case where code can fail. The Python interpreter will throw errors any time it is unable to do what is asked of it. These error messages include a **traceback** (also called a stack trace), which shows the line of code Python was unable to execute and every function call that led to that line being attempted.
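
To make the idea of "all of the function calls" concrete, here is a minimal sketch that uses the standard-library `traceback` module to list the call chain recorded when an error occurs. The function names `outer_step` and `inner_step` are hypothetical and exist only for this illustration.

```python
import traceback

def outer_step():
    # Hypothetical caller: it does nothing wrong itself
    return inner_step()

def inner_step():
    # Hypothetical callee: this is the line Python cannot execute
    return 1 / 0

try:
    outer_step()
except ZeroDivisionError as err:
    # Each frame is one entry of the traceback, ordered from the
    # outermost call down to the line that actually failed
    frames = traceback.extract_tb(err.__traceback__)
    call_chain = [frame.name for frame in frames]
    print(call_chain)
```

The last entry of `call_chain` is the function where the error was raised, and the entries before it are the calls that led there. Reading the chain from the top is how you find the call that you wrote.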

First, let's import numpy and pandas.

In [None]:
import numpy as np
import pandas as pd

# Numpy example

Here is a simple example of how the stack trace works. The printsum function is dependent on the add function. 

In [None]:
def printsum(a,b):
    # This is
    # defined in an uneditable cell
    print(add(a,b))
    
def add(x,y):
    # This is
    # defined in an uneditable cell
    s = x+y
    return(s)

The variables "foo" and "bar" are loaded from a separate file. **For the sake of the example, assume that "foo" and "bar" were defined in an exercise solution, but the functions were defined in an uneditable block**. The values of bar *should* be the values of foo squared. Let's see if we can add them with **printsum** to get foo+foo\*\*2 displayed as output...

In [None]:
from foobar import foo, bar
printsum(foo, bar)

Oh, no! It didn't work. The bottom of the stack trace shows that x and y couldn't be broadcast together. "I never did anything with x or y, there must be a problem with the notebook!" might be the first thought to come to mind. However, examining the whole traceback shows something different. The problem started with the call to **printsum** and pertains to the **shape of two arrays**. The natural next step is to look at the shapes of the arrays fed into **printsum** and possibly the variables themselves.

In [None]:
print("foo's shape:", foo.shape)
print(foo)
print("bar's shape:", bar.shape)
print(bar)

The bar array somehow got transposed, but the numbers are correct. The real way to resolve this issue is to find the root cause wherever bar is defined, but since that only exists hypothetically, let's correct it here by modifying the call to **printsum**.

In [None]:
printsum(foo, bar.T)

Two common NumPy errors are broadcasting errors and type errors (recall that every element of an array has the same type). These basic errors might rear their ugly heads when calling functions that take array inputs. The bottom of the stack trace will often show lines of code from inside a library rather than from the notebook. This is OK. Use the error message to **find where code that you wrote** may have caused the problem.
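
Since the foobar file is not shown here, the following is a minimal sketch that reproduces the same kind of broadcasting error. The arrays `foo_demo` and `bar_demo` are stand-ins made up for this illustration and are not the actual contents of foo and bar.

```python
import numpy as np

# Hypothetical stand-ins for foo and bar
foo_demo = np.arange(6).reshape(2, 3)   # shape (2, 3)
bar_demo = (foo_demo ** 2).T            # accidentally transposed: shape (3, 2)

try:
    total = foo_demo + bar_demo
except ValueError as err:
    # The message names the two shapes that could not be combined
    print("ValueError:", err)
    print("shapes:", foo_demo.shape, bar_demo.shape)

# Transposing bar_demo back makes the shapes agree
total = foo_demo + bar_demo.T
print(total)
```

Catching the error and printing the shapes is only for demonstration. In an assignment, printing the shapes right before the failing call is usually enough.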

# Pandas example

Basic pandas errors are also often thrown when a function is called on a data frame that isn't structured as expected. The stack trace is useful for figuring out where the issue is in these cases as well. Some common errors are index errors, using the wrong version of loc/iloc, and problems with grouping or merging.

For this example, suppose that you have to write a function that vertically stacks some data frames. The column 'baz' **may** be named 'BAZ' in **some** of the inputs. Your function must handle this, and the returned data frame must name the column 'baz' throughout.

In [None]:
from foobar import df_foo, df_bar

In [None]:
def df_stack(dfs):
    # concatenate the dataframes in dfs.
    df_foobar = pd.concat(dfs, axis=0).fillna(0)
    # Add the columns named BAZ and baz together (remember NA were set to zero)
    baz_se = df_foobar.BAZ + df_foobar.baz
    # Drop the column with the wrong name
    df_foobar.drop('BAZ',axis=1, inplace=True)
    # Assign the baz values to the baz column
    df_foobar[['baz']] = baz_se
    return(df_foobar)

print(df_stack([df_foo, df_bar]))

Our function df_stack appears to work correctly. However, the test cells will often check "edge cases", which may reveal that the function can't handle specific types of inputs that it should be able to handle.

In [None]:
from foobar import df_test
# some other code
#
#
df = df_stack([df_foo, df_test])
# some more code
#
#

The function didn't work this time, even though the demo worked. Examining the traceback reveals that the call to **df_stack** was the origin of the error. This call led to the "baz_se = ..." line being attempted. That line resulted in a call to the pandas library that could not be executed. There appears to be an issue with the column names in **df_stack**. We can investigate further by looking at the column headers for each data frame in the list dfs.

Note: the debugging is done in a separate cell for demonstration purposes, but on an assignment, you would want to add this code to your original function definition.

Additional note: because the function call threw an error, the variable df never gets assigned. Any attempt to use it in subsequent cells will result in more errors. See below.

In [None]:
print(df)

This can be confusing, especially if the assignment is made in an uneditable cell or the name df is used in multiple namespaces. The find function in your browser (ctrl+F) can be helpful in figuring out where df should have been assigned but wasn't. Now on to the debugging. Remember we have an issue with the columns of the dataframes in dfs, so let's print them. 

In [None]:
def df_stack(dfs):
    # begin debug code
    for df in dfs:
        print(df.columns)
    # end debug code
    df_foobar = pd.concat(dfs, axis=0).fillna(0)
    baz_se = df_foobar.BAZ + df_foobar.baz
    df_foobar.drop('BAZ',axis=1, inplace=True)
    df_foobar[['baz']] = baz_se
    return(df_foobar)

print(df_stack([df_foo, df_bar]))

The demo still works with the print statement. Let's try the test again.

In [None]:
df = df_stack([df_foo, df_test])

We got the same error traceback, but this time we are armed with more information. The test list didn't have any columns named "BAZ"! By directly referencing the column, we require it to exist. By the same logic, if **none** of the data frames in dfs has a column named "baz", our code will also throw an error.

Somehow we need to change the names of the columns named "BAZ" to "baz" **without requiring that either name exists** (only one or the other), and this should probably be done before the concatenation.

In [None]:
def df_stack(dfs):
    # begin debug code
    for df in dfs:
        print(df.columns)
    # end debug code
    
#     Old implementation

#     df_foobar = pd.concat(dfs, axis=0).fillna(0)
#     baz_se = df_foobar.BAZ + df_foobar.baz
#     df_foobar.drop('BAZ',axis=1, inplace=True)
#     df_foobar[['baz']] = baz_se
    dfs_new = [df.rename(columns={'BAZ': 'baz'}) for df in dfs]
    df_foobar = pd.concat(dfs_new)
    return(df_foobar)

print(df_stack([df_foo, df_bar]))

In [None]:
df = df_stack([df_foo, df_test])

In [None]:
print(df)

The solution above uses the DataFrame.rename method to handle the naming issue. This method checks the column names for any of the keys in the dictionary. If a key is found, that column is renamed in the output, but the key **is not required to be present**.
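
The renaming behavior can be checked on its own with a minimal sketch. The data frames `df_upper` and `df_lower` below are made up for this illustration (they are not the contents of the foobar file), and `ignore_index=True` is an extra choice made here to get a clean row index.

```python
import pandas as pd

# Hypothetical inputs: one frame uses 'BAZ', the other already uses 'baz'
df_upper = pd.DataFrame({"id": [1, 2], "BAZ": [10, 20]})
df_lower = pd.DataFrame({"id": [3, 4], "baz": [30, 40]})

# rename leaves df_lower unchanged because it has no 'BAZ' column,
# and it does not raise an error for the missing key
renamed = [df.rename(columns={"BAZ": "baz"}) for df in (df_upper, df_lower)]
stacked = pd.concat(renamed, ignore_index=True)

print(stacked.columns.tolist())
print(stacked)
```

Because the names agree before the concatenation, no NaN values are introduced and there is no leftover 'BAZ' column to drop.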

# One frustrating type of assertion error

I know we said assertion errors weren't going to be the focus of this notebook, but one case is worth looking into. Suppose a function defined in an exercise solution returns a large array of outputs called stu_arr, and the instructor's solution returns an array called ins_arr.

In [None]:
from foobar import stu_arr, ins_arr
assert stu_arr.shape == ins_arr.shape, "Student array has incorrect shape"
assert (stu_arr == ins_arr).all(), "Student list did not match instructor list"

Ok, somewhere our function runs but doesn't compute the correct value. The next step is to find out where. Let's try printing the right answer, the generated answer, and a conditional array that checks for matches...

In [None]:
print(stu_arr)
print(ins_arr)
print(ins_arr == stu_arr)

That didn't do much good. Is there a problem in the test, because it looks like everything matches? The issue with this strategy is the output is truncated because the arrays are huge. We need to figure out where our code outputs the wrong answer so that we can investigate how it gets there. 

In [None]:
wrong = [i for i, c in enumerate(ins_arr != stu_arr) if c]
print(wrong)
print(stu_arr[wrong])
print(ins_arr[wrong])

The list wrong, built with a list comprehension, holds the indices of all of the mismatches. With this information, you can investigate the few edge cases that are causing the assertion failure. This technique will often quickly point you towards what isn't being handled correctly in your function. Since these arrays are just dummy examples, we can't go much further. However, this is a good starting point in your debugging process if you're having this type of assertion failure.
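
As an alternative to the list comprehension, NumPy can locate the mismatches directly. The sketch below uses `np.flatnonzero` on made-up arrays `ins_demo` and `stu_demo`, which are hypothetical stand-ins assumed to be one-dimensional like the arrays above appear to be.

```python
import numpy as np

# Hypothetical stand-ins: identical except at two positions
ins_demo = np.arange(10_000)
stu_demo = ins_demo.copy()
stu_demo[[17, 4242]] = -1

# Indices where the two arrays disagree
mismatch_idx = np.flatnonzero(ins_demo != stu_demo)

print(mismatch_idx)
print(stu_demo[mismatch_idx])
print(ins_demo[mismatch_idx])
```

For arrays with more than one dimension, `np.argwhere(ins_demo != stu_demo)` returns one row of coordinates per mismatch instead of flat positions.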

# Key Takeaways

 * If there are issues with failing hidden test cells, look at the **Grading report**. (Details -> View Grading Report)
     * This is where the error traceback you need to look at lives.
     * Sometimes this requires finding errors above the test cell where variable assignments fail.
 * Look at the whole error traceback
 * **Look at the whole error traceback**
 * Use the traceback to get clues as to why your code is failing. 
     * Which function call resulted in the error? Focus on functions where you actually wrote some code.
     * What type of error is it?
     * What attribute is causing the error (shape, column name, index out of bounds, variable name, etc)?
 * Use print or assert to investigate the source of the error. It is rarely useful to print out entire data frames or arrays, because those usually get truncated.
 * If there is an assertion error, look for the mismatches. They can usually point you in the right direction.
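
Since printing whole arrays or data frames is rarely useful, one option is to print a short summary of the attributes that tracebacks tend to complain about. The helper `summarize` below is a hypothetical sketch written for this notebook, not part of NumPy or pandas.

```python
import numpy as np
import pandas as pd

def summarize(name, obj):
    """Return a one-line summary of obj (hypothetical debugging helper)."""
    parts = [f"{name}: type={type(obj).__name__}"]
    if hasattr(obj, "shape"):
        parts.append(f"shape={obj.shape}")
    if hasattr(obj, "dtype"):
        parts.append(f"dtype={obj.dtype}")
    if hasattr(obj, "columns"):
        parts.append(f"columns={list(obj.columns)}")
    return ", ".join(parts)

# Made-up objects for demonstration
arr_demo = np.zeros((3, 2))
df_demo = pd.DataFrame({"baz": [1, 2]})

print(summarize("arr_demo", arr_demo))
print(summarize("df_demo", df_demo))
```

Calling a helper like this on the inputs of the failing function often answers the shape, type, and column-name questions in the list above without flooding the output.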

This notebook isn't intended to be a complete guide to debugging. Hopefully these strategies are helpful in completing the notebooks in this course.