# Errors and Exceptions

Source: [The Software Carpentries](https://swcarpentry.github.io/python-novice-inflammation/09-errors/index.html)

Every programmer encounters errors, from those who are just beginning to those who have been programming for years. Encountering errors and exceptions can be very frustrating at times, and can make coding feel like a hopeless endeavour. However, understanding what the different types of errors are and when you are likely to encounter them can help a lot. Once you know why you get certain types of errors, they become much easier to fix.

Errors in Python have a very specific form, called a __traceback__. Let’s examine one:

In [None]:
# This code has an intentional error. You can type it directly or
# use it for reference to understand the error message below.
def favorite_ice_cream():
    ice_creams = [
        'chocolate',
        'vanilla',
        'strawberry'
    ]
    print(ice_creams[3])

favorite_ice_cream()

This particular traceback has two levels.
You can determine the number of levels by looking for the number of arrows on the left-hand side.
In this case:

1.  The first shows code from the cell above,
    with an arrow pointing to Line 11 (which is `favorite_ice_cream()`).

2.  The second shows some code in the function `favorite_ice_cream`,
    with an arrow pointing to Line 9 (which is `print(ice_creams[3])`).

The last level is the actual place where the error occurred.
The other level(s) show what function the program executed to get to the next level down.
So, in this case, the program first performed a
__function call__ to the function `favorite_ice_cream`.
Inside this function,
the program encountered an error on Line 9, when it tried to run the code `print(ice_creams[3])`.

> ## Long Tracebacks
>
> Sometimes, you might see a traceback that is very long
> -- sometimes they might even be 20 levels deep!
> This can make it seem like something horrible happened,
> but the length of the error message does not reflect severity;
> rather, it indicates that your program called many functions before it encountered the error.
> Most of the time, the actual place where the error occurred is at the bottom-most level,
> so you can skip down the traceback to the bottom.

So what error did the program actually encounter?
In the last line of the traceback,
Python helpfully tells us the category or type of error (in this case, it is an `IndexError`)
and a more detailed error message (in this case, it says "list index out of range").
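
Your own code can read these same pieces of information. The sketch below is an illustration added here, not part of the original lesson: it repeats the `favorite_ice_cream` mistake inside a `try`/`except` block, then uses the standard `traceback` module to count the levels. The variable names `error_type`, `error_message`, `frames`, and `levels` are our own.

```python
import traceback

def favorite_ice_cream():
    ice_creams = ['chocolate', 'vanilla', 'strawberry']
    print(ice_creams[3])  # only indices 0, 1 and 2 exist, so this fails

try:
    favorite_ice_cream()
except IndexError as err:
    error_type = type(err).__name__  # the category of error
    error_message = str(err)         # the more detailed message
    # One entry per level: the call site here, then the line inside the function.
    frames = traceback.extract_tb(err.__traceback__)
    levels = len(frames)
    print(error_type + ':', error_message)
    print('Levels:', levels, '| innermost function:', frames[-1].name)
```

The last line of a printed traceback is built from the same two strings, `error_type` and `error_message`.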

If you encounter an error and don't know what it means,
it is still important to read the traceback closely.
That way,
if you fix the error,
but encounter a new one,
you can tell that the error changed.
Additionally,
sometimes knowing *where* the error occurred is enough to fix it,
even if you don't entirely understand the message.

If you do encounter an error you don't recognize,
try looking at the
[official documentation on errors](http://docs.python.org/3/library/exceptions.html).
However,
note that you may not always be able to find the error there,
as it is possible to create custom errors.
In that case,
hopefully the custom error message is informative enough to help you figure out what went wrong.
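
To define a custom error, you only need a class that inherits from an existing exception. The example below is hypothetical: `InvalidPatientDataError` and `check_height` are invented for illustration. Inheriting from `ValueError` is one reasonable choice among several. An error like this will not appear in the official documentation, so its message has to explain the problem on its own.

```python
class InvalidPatientDataError(ValueError):
    """Hypothetical custom error for patient records that cannot be used."""

def check_height(height_m):
    """Return height_m unchanged, or raise if it is not a positive number of metres."""
    if height_m <= 0:
        raise InvalidPatientDataError(f'height must be positive, got {height_m}')
    return height_m

try:
    check_height(-1.8)
except InvalidPatientDataError as err:
    custom_message = str(err)
    print(type(err).__name__ + ':', custom_message)
```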


## Syntax Errors

When you forget a colon at the end of a line,
accidentally add one space too many when indenting under an `if` statement,
or forget a parenthesis,
you will encounter a __syntax error__.
This means that Python couldn't figure out how to read your program.
This is similar to forgetting punctuation in English:
for example,
this text is difficult to read there is no punctuation there is also no capitalization
why is this hard because you have to figure out where each sentence ends
you also have to figure out where each sentence begins
to some extent it might be ambiguous if there should be a sentence break or not

People can typically figure out what is meant by text with no punctuation,
but people are much smarter than computers.
If Python doesn't know how to read the program,
it will give up and inform you with an error.
For example:

In [None]:
def some_function()
    msg = 'hello, world!'
    print(msg)
     return msg

Here, Python tells us that there is a `SyntaxError` on line 1,
and even puts a little arrow in the place where there is an issue.
In this case the problem is that the function definition is missing a colon at the end.

Actually, the function above has *two* issues with syntax.
If we fix the problem with the colon,
we see that there is *also* an `IndentationError`,
which means that the lines in the function definition do not all have the same indentation:

In [None]:
def some_function():
    msg = 'hello, world!'
    print(msg)
     return msg
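
You can also ask Python to parse code without running it. The built-in `compile` function raises `SyntaxError`, or its subclass `IndentationError`, for code it cannot read. The helper `check_syntax` below is our own illustrative name. It replays the two fixes described above. The wording of `err.msg` varies between Python versions, so only the error type and line number are worth relying on.

```python
def check_syntax(source):
    """Return None if source parses, else (error type name, line number, message)."""
    try:
        compile(source, '<example>', 'exec')  # parses only; nothing is executed
    except SyntaxError as err:  # IndentationError and TabError are subclasses
        return type(err).__name__, err.lineno, err.msg
    return None

missing_colon = (
    "def some_function()\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "     return msg\n"
)
# Fix 1: add the colon. The extra space before `return` is still there.
extra_space = missing_colon.replace("some_function()", "some_function():")
# Fix 2: indent `return` with four spaces like the other lines.
fixed = extra_space.replace("     return", "    return")

for label, source in [('missing colon', missing_colon),
                      ('extra space', extra_space),
                      ('fixed', fixed)]:
    print(label, '->', check_syntax(source))
```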

## Tabs and Spaces

Some indentation errors are harder to spot than others.
In particular, mixing spaces and tabs can be difficult to spot
because they are both whitespace.
In the example below, the first two lines in the body of the function
`some_function` are indented with tabs, while the third line is indented with spaces.
If you're working in a Jupyter notebook, be sure to copy and paste this example
rather than trying to type it in manually because Jupyter automatically replaces
tabs with spaces.


In [None]:
def some_function():
	msg = 'hello, world!'
	print(msg)
    return msg

Visually, it is almost impossible to spot the error. Fortunately, Python 3 refuses to run code that mixes tabs and spaces ambiguously and reports an `IndentationError` (or its subclass `TabError`) instead.
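
Tabs and spaces look identical on screen, so it can help to have a program report which one each line uses. `indentation_report` below is a small illustrative helper and not a standard tool. For whole files, the standard library's `tabnanny` module offers a similar check. The sketch also uses `compile` to confirm that Python rejects the mixed example.

```python
def indentation_report(source):
    """Return (line number, 'tabs' or 'spaces') for every indented line."""
    report = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.startswith('\t'):
            report.append((lineno, 'tabs'))
        elif line.startswith(' '):
            report.append((lineno, 'spaces'))
    return report

# The example above, with the tabs written explicitly as \t.
mixed = "def some_function():\n\tmsg = 'hello, world!'\n\tprint(msg)\n    return msg\n"
print(indentation_report(mixed))

rejected = False
try:
    compile(mixed, '<example>', 'exec')
except IndentationError as err:  # TabError is a subclass of IndentationError
    rejected = True
    print(type(err).__name__, 'on line', err.lineno)
```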

## Variable Name Errors

Another very common type of error is called a `NameError`,
and occurs when you try to use a variable that does not exist.
For example:

In [None]:
print(a)

Variable name errors come with some of the most informative error messages,
which are usually of the form "name 'the_variable_name' is not defined".

Why does this error message occur?
That's a harder question to answer,
because it depends on what your code is supposed to do.
However,
there are a few very common reasons why you might have an undefined variable.
The first is that you meant to use a __string__, but forgot to put quotes around it:

In [None]:
print(hello)


The second reason is that you might be trying to use a variable that does not yet exist.
In the following example,
`count` should have been defined (e.g., with `count = 0`) before the for loop:


In [None]:
for number in range(10):
    count = count + number
print('The count is:', count)

Finally, the third possibility is that you made a typo when you were writing your code.
Let's say we fixed the error above by adding the line `Count = 0` before the for loop.
Frustratingly, this actually does not fix the error.
Remember that variables are __case-sensitive__,
so the variable `count` is different from `Count`. We still get the same error,
because we still have not defined `count`:

In [None]:
Count = 0
for number in range(10):
    count = count + number
print('The count is:', count)
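
One way to fix the loop is to define `count` before it starts. The name must be spelled exactly as it is used inside the loop, with a lower-case `c`:

```python
count = 0  # defined before the loop, with the same spelling used below
for number in range(10):
    count = count + number
print('The count is:', count)
```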

## Index Errors

Next up are errors having to do with containers (like lists and strings) and the items within them.
If you try to access an item in a list or a string that does not exist,
then you will get an error.
This makes sense:
if you asked someone what day they would like to get coffee,
and they answered "caturday",
you might be a bit annoyed.
Python gets similarly annoyed if you try to ask it for an item that doesn't exist:


In [None]:
letters = ['a', 'b', 'c']
print('Letter #1 is', letters[0])
print('Letter #2 is', letters[1])
print('Letter #3 is', letters[2])
print('Letter #4 is', letters[3])

Here,
Python is telling us that there is an `IndexError` in our code,
meaning we tried to access a list index that did not exist.
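
Two habits help you avoid `IndexError`. First, loop over the items themselves rather than hard-coding indices. Second, check an index against `len()` before you use it. In the sketch below, `get_letter` is a hypothetical helper. Returning `None` for a missing item is an assumption on our part; raising an error or returning a default value would be equally valid choices.

```python
letters = ['a', 'b', 'c']

# Valid indices run from 0 to len(letters) - 1, so letters[3] does not exist.
print('Valid indices:', list(range(len(letters))))

# Looping over the items themselves never asks for one that is missing.
for position, letter in enumerate(letters, start=1):
    print('Letter #' + str(position) + ' is', letter)

def get_letter(index):
    """Hypothetical helper: return letters[index], or None if it is out of range."""
    # Deliberately rejects negative indices, which Python would count from the end.
    if 0 <= index < len(letters):
        return letters[index]
    return None

print(get_letter(2), get_letter(3))
```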



## File Errors

The last kind of error we'll cover today
is the kind associated with reading and writing files.
If you try to read a file that does not exist,
you will receive a `FileNotFoundError` telling you so.
If you attempt to write to a file that was opened read-only, Python 3
raises an `io.UnsupportedOperation` error.
More generally, problems with input and output manifest as
`OSError`s. In Python 3, `IOError` is simply another name for `OSError`.



In [None]:
file_handle = open('myfile.txt', 'r')

One reason for receiving this error is that you specified an incorrect path to the file.
For example,
if I am currently in a folder called `myproject`,
and I have a file in `myproject/writing/myfile.txt`,
but I try to open `myfile.txt`,
this will fail.
The correct path would be `writing/myfile.txt`.
It is also possible that the file name or its path contains a typo.
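
Relative paths such as `myfile.txt` are looked up starting from the current working directory (`Path.cwd()`). Printing that directory is therefore often the quickest way to diagnose a `FileNotFoundError`. So that it can run anywhere, the sketch below recreates the `myproject/writing/myfile.txt` layout inside a temporary folder. `read_text_or_none` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

def read_text_or_none(path):
    """Return the file's text, or None (after printing a hint) if it is missing."""
    try:
        with open(path, 'r') as file_handle:
            return file_handle.read()
    except FileNotFoundError:
        print('No such file:', Path(path).resolve())
        print('Relative paths are resolved from:', Path.cwd())
        return None

# Build a throwaway copy of myproject/writing/myfile.txt.
with tempfile.TemporaryDirectory() as folder:
    myproject = Path(folder) / 'myproject'
    (myproject / 'writing').mkdir(parents=True)
    (myproject / 'writing' / 'myfile.txt').write_text('hello\n')

    wrong = read_text_or_none(myproject / 'myfile.txt')              # forgets writing/
    right = read_text_or_none(myproject / 'writing' / 'myfile.txt')  # correct path
```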

A related issue can occur if you use the "write" flag instead of the "read" flag.
Python will not give you an error if you try to open a file for writing
when the file does not exist.
However,
if you meant to open a file for reading,
but accidentally opened it for writing,
and then try to read from it,
you will get an `UnsupportedOperation` error
telling you that the file was not opened for reading:


In [None]:
file_handle = open('myfile.txt', 'w')
file_handle.read()
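
The sketch below shows the same mix-up in a form that runs anywhere, because it uses a temporary folder rather than the current one. It also checks two facts about the exception hierarchy: `io.UnsupportedOperation` is a kind of `OSError`, and in Python 3 `IOError` is the same class as `OSError`. Keep in mind that opening an existing file with `'w'` empties it first.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'myfile.txt')

    with open(path, 'w') as file_handle:  # 'w' creates the file (or empties it)
        file_handle.write('hello\n')
        try:
            file_handle.read()            # reading is not allowed in 'w' mode
        except io.UnsupportedOperation as err:
            read_error = err
            print(type(err).__name__ + ':', err)

    with open(path, 'r') as file_handle:  # reopen in 'r' mode to read it back
        contents = file_handle.read()

print('Contents:', repr(contents))
print('UnsupportedOperation is an OSError:', issubclass(io.UnsupportedOperation, OSError))
print('IOError is OSError:', IOError is OSError)
```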

These are the __most common errors with files__,
though many others exist.
If you get an error that you've never seen before,
searching the Internet for that error type
often reveals common reasons why you might get that error.

## Reading Error Messages

Read the Python code and the resulting traceback below, and answer the following questions:

1.  How many levels does the traceback have?
2.  What is the function name where the error occurred?
3.  On which line number in this function did the error occur?
4.  What is the type of error?
5.  What is the error message?

In [None]:
# This code has an intentional error. Do not type it directly;
# use it for reference to understand the error message below.
def print_message(day):
    messages = {
        'monday': 'Hello, world!',
        'tuesday': 'Today is Tuesday!',
        'wednesday': 'It is the middle of the week.',
        'thursday': 'Today is Donnerstag in German!',
        'friday': 'Last day of the week!',
        'saturday': 'Hooray for the weekend!',
        'sunday': 'Aw, the weekend is almost over.'
    }
    print(messages[day])

def print_friday_message():
    print_message('Friday')

print_friday_message()

## Identifying Syntax Errors

1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message. Is it a `SyntaxError` or an `IndentationError`?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.


In [None]:
def another_function
  print('Syntax errors are annoying.')
   print('But at least Python tells us about them!')
  print('So they are usually not too hard to fix.')

## Identifying Variable Name Errors

1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message.
   What type of `NameError` do you think this is?
   In other words, is it a string with no quotes,
   a misspelled variable,
   or a variable that should have been defined but was not?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.


In [None]:
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + 'b'
print(message)

## Identifying Index Errors

1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message. What type of error is it?
3. Fix the error.


In [None]:
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[4])


# Defensive Programming: Assertions

The first step toward getting the right answers from our programs
is to assume that mistakes *will* happen
and to guard against them.
This is called __defensive programming__
and the most common way to do it is to add
__assertions__ to our code
so that it checks itself as it runs.
An assertion is simply a statement that something must be true at a certain point in a program.
When Python sees one,
it evaluates the assertion's condition.
If it's true,
Python does nothing,
but if it's false,
Python raises an `AssertionError`, which halts the program immediately
and displays the error message if one is provided.
For example,
this piece of code halts as soon as the loop encounters a value that isn't positive:


In [None]:
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0
for num in numbers:
    assert num > 0.0, 'Data should only contain positive values'
    total += num
print('total is:', total)

## Fail early, fail often - Turn bugs into assertions

But assertions aren't just about catching errors:
they also help people understand programs.
Each assertion gives the person reading the program
a chance to check (consciously or otherwise)
that their understanding matches what the code is doing.

Most good programmers follow two rules when adding assertions to their code.

1. The first is, *fail early, fail often*.
The greater the distance between when and where an error occurs and when it's noticed,
the harder the error will be to debug,
so good code catches mistakes as early as possible.

2. The second rule is, *turn bugs into assertions or tests*.
Whenever you fix a bug, write an assertion that catches the mistake
should you make it again.
If you made a mistake in a piece of code,
the odds are good that you have made other mistakes nearby,
or will make the same mistake (or a related one)
the next time you change it.

Writing assertions to check that you haven't __regressed__
(i.e., haven't re-introduced an old problem)
can save a lot of time in the long run,
and helps to warn people who are reading the code
(including your future self)
that this bit is tricky.
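
Here is an illustration of turning a bug into an assertion. Suppose, hypothetically, that an earlier version of a `mean` function crashed with a confusing `ZeroDivisionError` when given an empty list. After the fix, an assertion records the assumption and fails early with a clear message if the mistake comes back. One caveat: Python skips `assert` statements when run with the `-O` option. Assertions are therefore better suited to catching programming mistakes than to validating user input.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # Regression guard for the (hypothetical) empty-list bug.
    assert len(values) > 0, 'mean() needs at least one value'
    return sum(values) / len(values)

print(mean([1.5, 2.5, 3.5]))

try:
    mean([])
except AssertionError as err:
    empty_list_message = str(err)
    print('AssertionError:', empty_list_message)
```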

## Test-Driven Development

An assertion checks that something is true at a particular point in the program.
The next step is to check the overall behavior of a piece of code,
i.e.,
to make sure that it produces the right output when it's given a particular input.
For example,
suppose we need to find where two or more time series overlap.
The range of each time series is represented as a pair of numbers,
which are the time the interval started and ended.
The output is the largest range that they all include:

![Graph showing three number lines and, at the bottom,
the interval that they overlap.](https://swcarpentry.github.io/python-novice-inflammation/fig/python-overlapping-ranges.svg)


Most novice programmers would solve this problem like this:

1.  Write a function `range_overlap`.
2.  Call it interactively on two or three different inputs.
3.  If it produces the wrong answer, fix the function and re-run that test.

This clearly works --- after all, thousands of scientists are doing it right now --- but
there's a better way:

1.  Write a short function for each test.
2.  Write a `range_overlap` function that should pass those tests.
3.  If `range_overlap` produces any wrong answers, fix it and re-run the test functions.

Writing the tests *before* writing the function they exercise
is called __test-driven development__ (TDD).
Its advocates believe it produces better code faster because:

1.  If people write tests after writing the thing to be tested,
    they are subject to confirmation bias,
    i.e.,
    they subconsciously write tests to show that their code is correct,
    rather than to find errors.
2.  Writing tests helps programmers figure out what the function is actually supposed to do.

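Here are three tests for `range_overlap`, written as assertions.
Running them now fails with a `NameError`, because we have not written `range_overlap` yet.
That is expected, since the tests come first: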


In [None]:
assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)
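
Below is one possible `range_overlap` that passes those assertions. It comes with the same checks rewritten as one short function per test, as the TDD steps above suggest. (Test runners such as pytest automatically collect functions whose names start with `test_`.) This is a sketch, not the lesson's own solution. It assumes each range is a `(start, end)` pair with `start < end`. Returning `None` when the ranges do not overlap, or only touch at a single point, is a design choice made here for illustration.

```python
def range_overlap(ranges):
    """Return the (start, end) interval shared by all ranges, or None if there is none."""
    assert len(ranges) > 0, 'need at least one range'
    max_start = max(start for start, _ in ranges)  # latest start
    min_end = min(end for _, end in ranges)        # earliest end
    if max_start >= min_end:                       # no overlap, or only a single point
        return None
    return (max_start, min_end)

def test_single_range():
    assert range_overlap([(0.0, 1.0)]) == (0.0, 1.0)

def test_two_ranges():
    assert range_overlap([(2.0, 3.0), (2.0, 4.0)]) == (2.0, 3.0)

def test_three_ranges():
    assert range_overlap([(0.0, 1.0), (0.0, 2.0), (-1.0, 1.0)]) == (0.0, 1.0)

def test_no_overlap():
    assert range_overlap([(0.0, 1.0), (5.0, 6.0)]) is None

for test in [test_single_range, test_two_ranges, test_three_ranges, test_no_overlap]:
    test()
print('All tests passed.')
```

In a real TDD workflow, the four `test_` functions would be written before `range_overlap`.
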

# Debugging 101
Once testing has uncovered problems, the next step is to fix them. Many novices do this by making more-or-less random changes to their code until it seems to produce the right answer, but that’s very inefficient (and the result is usually only correct for the one case they’re testing). The more experienced a programmer is, the more systematically they debug, and most follow some variation on the rules explained below.

## Know What It's Supposed to Do

The first step in debugging something is to
*know what it's supposed to do*.
"My program doesn't work" isn't good enough:
in order to diagnose and fix problems,
we need to be able to tell correct output from incorrect.
If we can write a test case for the failing case --- i.e.,
if we can assert that with *these* inputs,
the function should produce *that* result ---
then we're ready to start debugging.
If we can't,
then we need to figure out how we're going to know when we've fixed things.

But writing test cases for scientific software is frequently harder than
writing test cases for commercial applications,
because if we knew what the output of the scientific code was supposed to be,
we wouldn't be running the software:
we'd be writing up our results and moving on to the next program.
In practice,
scientists tend to do the following:

1.  *Test with simplified data.*
    Before doing statistics on a real data set,
    we should try calculating statistics for a single record,
    for two identical records,
    for two records whose values are one step apart,
    or for some other case where we can calculate the right answer by hand.

2.  *Test a simplified case.*
    If our program is supposed to simulate
    magnetic eddies in rapidly-rotating blobs of supercooled helium,
    our first test should be a blob of helium that isn't rotating,
    and isn't being subjected to any external electromagnetic fields.
    Similarly,
    if we're looking at the effects of climate change on speciation,
    our first test should hold temperature, precipitation, and other factors constant.

3.  *Compare to an oracle.*
    A __test oracle__
    is something whose results are trusted,
    such as experimental data, an older program, or a human expert.
    We use test oracles to determine if our new program produces the correct results.
    If we have a test oracle,
    we should store its output for particular cases
    so that we can compare it with our new results as often as we like
    without re-running that program.

4.  *Check conservation laws.*
    Mass, energy, and other quantities are conserved in physical systems,
    so they should be in programs as well.
    Similarly,
    if we are analyzing patient data,
    the number of records should either stay the same or decrease
    as we move from one analysis to the next
    (since we might throw away outliers or records with missing values).
    If "new" patients start appearing out of nowhere as we move through our pipeline,
    it's probably a sign that something is wrong.

5.  *Visualize.*
    Data analysts frequently use simple visualizations to check both
    the science they're doing
    and the correctness of their code.
    This should only be used for debugging as a last resort,
    though,
    since it's very hard to compare two visualizations automatically.
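
Two of these strategies translate directly into code. The sketch below first tests Python's `statistics` module against answers we can work out by hand (simplified data). It then checks a "conservation law" for a cleaning step. `drop_missing` and the sample records are made up for illustration.

```python
import math
import statistics

# Test with simplified data: cases whose answers we can compute by hand.
assert statistics.mean([7.0]) == 7.0                     # a single record
assert statistics.pstdev([5.0, 5.0]) == 0.0              # two identical records
assert math.isclose(statistics.pstdev([3.0, 4.0]), 0.5)  # two records one step apart

# Check a conservation law: cleaning may drop records, but never adds them.
def drop_missing(records):
    """Hypothetical cleaning step: discard records that contain a missing value."""
    return [record for record in records if None not in record]

raw = [[70, 1.8], [80, None], [150, 1.7]]
cleaned = drop_missing(raw)
assert len(cleaned) <= len(raw), 'records appeared out of nowhere'
print(len(raw), 'records in,', len(cleaned), 'records out')
```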

## Other Golden Rules

- Make it fail every time
- Make it fail fast
- Change one thing at a time, for a reason
- Keep track of what you've done
- Version control for the win
- Be humble
- Debug with a colleague


# Toy Example: Not Supposed to be the Same

You are assisting a researcher with Python code that computes the
Body Mass Index (BMI) of patients.  The researcher is concerned because
all patients seemingly have unusual and identical BMIs, despite having different
physiques.  BMI is calculated as **weight in kilograms**
divided by the square of **height in metres**.

- Use the debugging principles from this lesson to locate the problems in the code.
- What suggestions would you give the researcher for ensuring any later changes they make work correctly?


In [None]:
patients = [[70, 1.8], [80, 1.9], [150, 1.7]]

def calculate_bmi(weight, height):
    return weight / (height ** 2)

for patient in patients:
    weight, height = patients[0]
    bmi = calculate_bmi(height, weight)
    print("Patient's BMI is: %f" % bmi)