# Understanding Errors

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jpn--/python-for-transportation-modeling/blob/master/course-content/basic-python/error-handling.ipynb)

*If you are running on Google Colab or a similar isolated environment, the following line can be used to install this notebook's dependencies.*

In [1]:
!python -m pip install "git+https://github.com/jpn--/python-for-transportation-modeling.git#egg=transportation-tutorials&subdirectory=example-package"

In [2]:
import os
import numpy as np
import pandas as pd
import transportation_tutorials as tt

A variety of things can go wrong when you run a piece of Python code: 
input files might be missing or corrupt, there might be typos or bugs in 
code you wrote or code provided by others, and so on.

When Python encounters a problem it does not know how to manage on its own,
it generally raises an *exception*.  Exceptions can be basic errors or
more complicated problems, and the message that accompanies an 
exception usually carries a good deal of information.
For example, consider this error:

In [3]:
for i in 1 to 5:
    print(i)

SyntaxError: invalid syntax (3683155868.py, line 1)

The ``SyntaxError`` tells you that the indicated bit of code isn't valid for
Python, and simply cannot be run.  It helpfully also adds a caret marker
pointing to the exact place where the problem was found.  In this case,
the problem is the "to" in the "for" loop, which is found in many other
languages, but not in Python.  
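For comparison, here is a minimal sketch of a valid Python version of that loop, using the built-in `range`. The invalid version is held in a string (the name `bad_source` is just for illustration) and passed to the built-in `compile`, which shows that a `SyntaxError` is detected before any of the code runs.

```python
# The invalid loop from above, kept as a string so this cell itself can run.
bad_source = "for i in 1 to 5:\n    print(i)\n"

try:
    compile(bad_source, "<example>", "exec")
except SyntaxError as err:
    # err.lineno is the line on which the parser gave up.
    syntax_error_line = err.lineno
    print("SyntaxError on line", syntax_error_line)

# A valid equivalent: range(1, 6) yields 1 through 5 (the end value is excluded).
values = []
for i in range(1, 6):
    values.append(i)
print(values)
```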

Obviously, even if the code is readable as valid Python code, there still
may be errors.  

In [4]:
speeds = {
    'rural highway': 70,
    'urban highway': 55,
    'residential': 30,
}

for i in speed_limits:
    print(speed_limits[i])

NameError: name 'speed_limits' is not defined

Here, the code itself is valid, but a `NameError` occurs because
there is an attempt to use a variable name that has not been defined
previously.  The error message itself is pretty self-explanatory.
But consider this:

In [5]:
road_types = ['rural highway', 'urban highway' 'residential']

In [6]:
for i in road_types:
    print(speeds[i])

70


KeyError: 'urban highwayresidential'

A `KeyError` occurs when using a key to get a value from a mapping
(i.e., a dictionary or a similar object), but the key cannot be
found.  Usually, the misbehaving key is also shown in the error message,
as in this case, although the value of the key may be unexpected.  Here,
it appears to be the last two items of the list mashed together.  This
happened due to a missing comma in the definition of the list earlier.
When that line with the missing comma was read, it was interpreted as a 
valid Python instruction: a list with two items, the second item being
two string values separated only by whitespace, which implies they are
to be concatenated.  It is only when this value is ultimately used in
the loop that Python discovers there is anything wrong.
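The concatenation can be confirmed directly by inspecting the list. The sketch below also shows one possible defensive habit, checking the keys before looping. The `unknown` list is an illustrative addition, not part of the original example.

```python
speeds = {
    'rural highway': 70,
    'urban highway': 55,
    'residential': 30,
}

# Adjacent string literals are joined into one string, so this list has two items.
road_types = ['rural highway', 'urban highway' 'residential']
print(len(road_types))
print(road_types[1])

# Checking the keys up front reports the problem close to the data definition,
# rather than partway through the loop.
unknown = [name for name in road_types if name not in speeds]
print(unknown)
```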

To demonstrate a more complicated example, we can attempt to read a 
file that does not exist, which will raise an exception like this:

In [7]:
pd.read_csv('path/to/non-existant/file.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'path/to/non-existant/file.csv'

There's a lot of output here, but the last line of the output is pretty clear
by itself: the file does not exist.  As a general rule of thumb, when something
you are running raises an exception, the message printed at the very bottom of the 
error output is the first place to look to try to find an explanation for what
happened and how to fix it.
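The same rule of thumb can be applied in code. This is a small sketch using the standard library `traceback` module, with the built-in `open` standing in for `pd.read_csv` so that it runs without any extra packages. It captures the full error output as text and picks out the final line.

```python
import traceback

try:
    # The built-in `open` stands in for `pd.read_csv` here.
    open('path/to/non-existant/file.csv')
except FileNotFoundError:
    report = traceback.format_exc()   # the full traceback, as one string
    last_line = report.strip().splitlines()[-1]

# The last line names the exception type and gives its message.
print(last_line)
```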

Sometimes, however, the explanation for the error is not quite as self-explanatory
as the `FileNotFoundError`. 

In [8]:
tt.problematic()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 912: invalid start byte

In this case, the error report is less clear.  The error type being raised is 
a `UnicodeDecodeError`, which gives us a hint of the problem: an attempt to read
some kind of Unicode text data from somewhere has failed.
But if you don't
know exactly what the `problematic` function is supposed to do, it might not
be obvious what is wrong.  It is in this situation that all the other data printed
along with the error can be valuable.  This additional output is called a "traceback",
because it traces the entire path through the code, from the `problematic` 
function call, through every sub-function called, to the point where the error
is encountered.  Every function call is shown with both the name of the file and the
name of the function.
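The file and function names in a traceback are also available programmatically. As a rough sketch, the two functions below (`load_table` and `read_raw`, both hypothetical) form a short call chain, and `traceback.extract_tb` lists each step between the original call and the point of failure.

```python
import traceback

def read_raw(path):
    # Hypothetical low-level helper; raises if the file is missing.
    with open(path) as f:
        return f.read()

def load_table(path):
    # Hypothetical wrapper, playing the role of a function like `problematic`.
    return read_raw(path)

try:
    load_table('path/to/non-existant/file.csv')
except FileNotFoundError as err:
    frames = traceback.extract_tb(err.__traceback__)
    called = [frame.name for frame in frames]
    # One line per step: the file, the line number, and the function name.
    for frame in frames:
        print(f"{frame.filename}, line {frame.lineno}, in {frame.name}")
```

The frames are listed from the outermost call to the innermost, so the last entry is the function in which the error was actually raised.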

For the most part, errors are unlikely to arise from bugs in major software packages,
such as `numpy` and `pandas`.  These packages are rigorously tested, and while 
it is possible to find a bug, it is generally unusual.  It is much more likely
that bugs or errors will arise from application-specific code.  Thus, it can be helpful
to scan through all of the various files and functions, and look for items that are
related to application-specific files.  In this case, we skip over all the lines
referencing `pandas` files, and focus on the other lines, which are found in the
`transportation_tutorials` package:

    .../transportation_tutorials/data/__init__.py in problematic()
         46         # When there are various lines of code intervening,
         47         # you might not get to see the relevant problem in the traceback
    ---> 48         result = pandas.read_csv(filename)
         49         return result
         50 

By default in a Jupyter notebook, when the source code is written in Python, the traceback
printout includes the offending line of code plus two lines before and after, to give some
context.  Sometimes that little snippet is enough to reveal the problem itself, but in
this case those lines include some comments, which don't really help us solve the problem.
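When the default context is too narrow, one option is to ask Python for a function's source directly using the standard library `inspect` module. The sketch below uses `textwrap.dedent` purely as a stand-in for whichever function appears in the traceback. This approach only works for functions written in Python whose source files are available.

```python
import inspect
import textwrap

# `textwrap.dedent` is only a stand-in; substitute the function of interest.
source_file = inspect.getsourcefile(textwrap.dedent)
lines, first_line = inspect.getsourcelines(textwrap.dedent)

print(source_file)
# Show the first few lines of the function with their line numbers in the file.
for offset, text in enumerate(lines[:5]):
    print(first_line + offset, text.rstrip())
```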

If you want to investigate further, you can open the filename shown in a text editor such
as Notepad++, and scroll to the indicated line number.  Doing that for this file reveals
some more context that should help diagnose this problem:

    .../transportation_tutorials/data/__init__.py in problematic()
         42     
         43     def problematic():
         44         filename = data('THIS-FILE-IS-CORRUPT')
         45         import pandas
         46         # When there are various lines of code intervening,
         47         # you might not get to see the relevant problem in the traceback
    ---> 48         result = pandas.read_csv(filename)
         49         return result

Well, that's helpful... it turns out we are loading a file that is intentionally corrupt,
with junk data in part of the file, as might happen on a botched download from a remote server.
If only diagnosing all errors were so easy!  Unfortunately (or, fortunately, depending
on your perspective), in real world applications, code probably won't attempt to load
a file that is intentionally corrupt and so clearly labelled as such.  
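To see how a single junk byte produces this kind of error, here is a small reproduction that does not depend on the tutorial's data file. The byte string `raw` is made up for illustration. The byte `0x97` cannot begin a character in UTF-8, so decoding fails at its position.

```python
# Made-up file contents with one stray byte (0x97) in the second line.
raw = b"speed,limit\n70\x97mph\n"

try:
    raw.decode('utf-8')
except UnicodeDecodeError as err:
    bad_position = err.start      # index of the first undecodable byte
    bad_byte = raw[err.start]     # the byte itself, as an integer
    reason = err.reason
    print(err)

print(bad_position, hex(bad_byte), reason)
```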

## How to Report a Problem

If you are unable to diagnose or solve a problem yourself, it may
make sense to enlist some help from a co-worker or outside professional.
When doing so, it is usually valuable not only to report what you were
trying to do when a problem occurred, but also to send the *entire traceback*
output from the problem as well.  This offers others the chance to 
follow along through the code, and often problems can be diagnosed 
easily by looking at the complete traceback, particularly if they also
have access to the same source code.

For more complicated problems, it may also be beneficial to share
additional system information. This is particularly common and generally
expected when you report issues with major packages such as numpy or 
pandas, but it can be useful for debugging other more localized problems 
as well.  You can access some basic information about your system and
your Anaconda Python installation by using the ``conda info`` command in 
a console or with the Anaconda Prompt on Windows.
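Some of the same information can be gathered from within Python itself. This is a minimal sketch using only the standard library. The choice of fields in `system_info` is an assumption about what is useful to include, not a required format.

```python
import platform

# A few basic facts about the interpreter and operating system.
system_info = {
    'python': platform.python_version(),
    'implementation': platform.python_implementation(),
    'platform': platform.platform(),
    'machine': platform.machine(),
}

for key, value in system_info.items():
    print(f"{key}: {value}")
```

If pandas is installed, `pd.show_versions()` prints a similar summary that also lists the versions of pandas and its dependencies.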

## Handling Errors

In simple code or analysis projects, most of the time you'll
just want to avoid having errors in your Python code. However,
if you are writing Python functions that are shared with others
or will be re-used in multiple places, it may be desirable or
necessary to handle errors, instead of just avoiding them.  To
do so, you can use a `try...except` statement. 

In [9]:
try:
    table = pd.read_csv('path/to/non-existant/file.csv')
except:  # a bare except catches every kind of error
    table = pd.DataFrame() # set to blank dataframe
print(table)

Empty DataFrame
Columns: []
Index: []


The `try...except` works like this: first, the code in the 
`try` block is run.  If an exception is raised while running
this code, execution immediately jumps to the start of the
`except` block and continues.  If no errors are raised, the code in
the `except` block is ignored.
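The order of execution can be made visible by recording each step as it happens. In this sketch, `parse_speed` is a hypothetical helper that converts text to an integer and keeps a list of the steps that actually ran.

```python
def parse_speed(text):
    """Hypothetical helper: convert text to an int, recording each step."""
    steps = []
    try:
        steps.append('try started')
        value = int(text)              # raises ValueError for non-numeric text
        steps.append('try finished')   # skipped when the line above raises
    except ValueError:
        steps.append('except ran')
        value = None
    return value, steps

print(parse_speed('55'))
print(parse_speed('fast'))
```

For `'55'` the `except` block never runs. For `'fast'` the rest of the `try` block is skipped and the `except` block runs instead.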

As shown above, this code will set the `table` variable to 
a blank dataframe for any kind of error.  It is also possible
(and often preferable) to be more discriminating in error processing,
only catching certain types of errors.  For example, we may
only want to recover like this when the file is missing; if it
is corrupt or something else is wrong, we want to know about it.
In that case, we can catch only `FileNotFoundError`, which will 
work as desired for the missing file:

In [10]:
try:
    table = pd.read_csv('path/to/non-existant/file.csv')
except FileNotFoundError:
    table = pd.DataFrame() # set to blank dataframe
print(table)

Empty DataFrame
Columns: []
Index: []


With the same handler, the error for the corrupt file is still raised:

In [11]:
try:
    table = pd.read_csv(tt.data('THIS-FILE-IS-CORRUPT'))
except FileNotFoundError:
    table = pd.DataFrame() # set to blank dataframe
print(table)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 912: invalid start byte

Alternatively, we can write different error handlers for 
the different kinds of errors we expect to encounter:

In [12]:
try:
    table = pd.read_csv(tt.data('THIS-FILE-IS-CORRUPT'))
except FileNotFoundError:
    table = pd.DataFrame() # set to blank dataframe
except UnicodeDecodeError:
    table = pd.DataFrame(['corrupt!'], columns=['data'])
print(table)

       data
0  corrupt!


There are a variety of other advanced techniques for error
handling described in the official 
[Python tutorial](https://docs.python.org/3/tutorial/errors.html) 
on this topic.
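As a brief taste of those techniques, the sketch below combines several of them: catching more than one exception type in a single handler, binding the exception to a name with `as`, and using the `else` and `finally` clauses. The function `read_first_line` is hypothetical, and the built-in `open` is used so that the example runs without pandas.

```python
def read_first_line(path):
    """Hypothetical helper showing `as`, `else`, and `finally`."""
    log = []
    try:
        f = open(path)
    except (FileNotFoundError, PermissionError) as err:
        # `err` is the exception object, so its type and message can be inspected.
        log.append(f"could not open: {type(err).__name__}")
        line = None
    else:
        # Runs only when the try block raised nothing.
        with f:
            line = f.readline()
        log.append('read ok')
    finally:
        # Runs in every case, whether or not an exception occurred.
        log.append('done')
    return line, log

print(read_first_line('path/to/non-existant/file.csv'))
```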