# Error Stories

## Objectives

- To troubleshoot Python errors/exceptions
- To use conditional logic to prevent errors
- To write code to test Python code

## Introduction

To err is human. And since writing code is a human activity, it's no less error-prone than anything else. Just as human stories tend to dramatize the protagonists' mishaps, missteps, and mistakes en route to their triumphs and successes, the stories we tell in code necessarily traffic in errors.

Of course, we want our code to work, which usually means "to work without obvious errors or bugs." But the important word there is _obvious_. A lot of work goes into making code appear free from errors. 

Part of that work involves writing code in such a way that it can respond to errors, and/or respond to situations that might cause errors and avert them.

Another part of that work involves testing code to confirm that it works as intended in a variety of scenarios.

We'll look at each of these strategies below.

#### How to Use this Notebook

This notebook is intended for you to work through independently, in order to review and clarify the concepts introduced on Python Camp Day 3, and to lay the groundwork for the activities on Python Camp Day 4. However, feel free to collaborate with others in working through it. It is also intended to serve as a resource you can return to review as necessary.

1. Read the documentation above each cell containing code and run the cell (`Ctrl+Enter` or `Cmd+Return`) to view the output.


2. Follow the prompts labeled `Try it out!` that ask you to write your own code in the provided blank cells.


3. (Hidden) solutions to these exercises follow the blank cells; click the toggle bar to expand the solution to compare with your approach.


4. Some prompts include alternative exercises (Parsons Problems) that will be linked from the prompt. These alternatives may help clarify concepts (especially if you find yourself struggling to keep up with all the syntax).


5. Optional annotations (labeled `For the curious...`) provide additional explanation and/or context for those who want them. Feel free to skip these sections if you like. As a beginner, it's important to maintain a balanced cognitive load: taking in too much information all at once can impede your progress toward understanding. This balance looks different for everyone, but we have tried to keep the main content focused on a few key concepts, tools, and techniques, while providing that additional context for those who might benefit from it.


6. Follow the instructions at the end to complete and submit a short, autograded assignment to test your knowledge. (You may submit the assignment as many times as you like.)



## I. Debugging

**Debugging** is the process of troubleshooting errors in code.

Apart from syntax errors, which will become less frequent as you grow more comfortable with Python's syntactic rules, most errors in Python arise from a mismatch between the logic of the code and the structure of the data or environment on which the code is being run.

A common source of errors lies in inconsistencies or unexpected elements within a dataset. In the code below, we'll attempt to modify our bookstore data by separating the name in the `instructor` field into a first name and last name. (That could make it easier to look up a given course by instructor, for instance.) 

In [14]:
import json
with open('../../../data/bookstore-data-summer-2023.json') as f:
    bkst_data = json.load(f)

Note that as written, this code will not run without error.

In [None]:
# Loop over courses in the dataset
for course in bkst_data:
    # Split each instructor name on white space
    first_name, last_name = course['instructor'].split()
    # Assign each part of the name to a new key in the dictionary course
    course['first_name'] = first_name
    course['last_name'] = last_name

#### Notes

Running this code produces an `AttributeError`. Until you gain some familiarity with Python exceptions, the name of the exception will be less useful than the message that follows it:

```
'NoneType' object has no attribute 'split'

```
Note also the green arrow pointing to the line of code that reads

```
first_name, last_name = course['instructor'].split()
```

The `AttributeError` tells us that something went wrong with the call to the `split()` method. The phrase `'NoneType' object` may seem opaque. But recall what we've learned about the `split()` method: that it's [defined](https://docs.python.org/3/library/stdtypes.html#str.split) to work with strings. That's the meaning of the "str" when we write it as `str.split()`: it tells us that Python _strings_ have access to this method. Other Python types do not.

An `AttributeError` occurs when we try to use a [method](https://gwu-libraries.github.io/python-camp/glossary.html#term-method) on a [type](https://gwu-libraries.github.io/python-camp/glossary.html#term-type) that doesn't "have" that method. We'll learn more about the concept of **attributes** on Day 4. For now, the important thing is to find out when and why the value of `course['instructor']` might _not_ be a string.
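As a minimal sketch of what the error message is saying, the built-in `hasattr()` function can be used to ask whether a value has a given method. The variable names and values below are illustrative and are not taken from the bookstore dataset.

```python
# Illustrative values: a name stored as a string, and a missing name stored as None
name = 'Dolsy Smith'
missing_name = None

# Strings have a split() method; None does not
string_has_split = hasattr(name, 'split')
none_has_split = hasattr(missing_name, 'split')

print(type(name), string_has_split)          # <class 'str'> True
print(type(missing_name), none_has_split)    # <class 'NoneType'> False
```

The second line of output shows where the `'NoneType'` in the error message comes from: it is the type of the value `None`.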



### I.1 Debugging with `print()`

One of the best ways to debug code is also one of the simplest: using the `print()` function.

If we know that an error is occurring inside a [for loop](https://gwu-libraries.github.io/python-camp/glossary.html#term-for-loop), but we don't know what data element triggered the error, we can take advantage of the fact that the loop will be **interrupted** when the error occurs.

By using `print()` to display the element we're interested in each time through the loop, we can observe where the loop stops: if we've put our `print()` in the right place, the last value printed should be the one that triggered the error.

#### Try it out!

Copy the buggy code from above and paste it into the cell below. Add one or more calls to the `print()` function and see if you can identify the source of the `AttributeError`.



In [None]:
# Your code here


One approach is to print the value of the `'instructor'` key each time through the loop.

```
# Loop over courses in the dataset
for course in bkst_data:
    print(course['instructor'])
    # Split each instructor name on white space
    first_name, last_name = course['instructor'].split()
    # Assign each part of the name to a new key in the dictionary course
    course['first_name'] = first_name
    course['last_name'] = last_name
```
Doing so reveals that the last value printed before the error is not a name but `None`. 

Note that `None` is _not_ a string; it's a special Python value, written without quotation marks and belonging to its own type (`NoneType`), that serves as a [null value](https://gwu-libraries.github.io/python-camp/glossary.html#term-null-value).

As the `AttributeError` informs us, we cannot use the `split` method on a `None` value. (There's not much you _can_ do with `None`; that's because it's a null: used to designate the absence of a value.)
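When debugging with `print()`, a null value and the string `'None'` look identical in the output. One way to tell them apart, sketched below with made-up sample values, is to print the result of the built-in `repr()` function, which displays strings with their quotation marks.

```python
# Made-up sample values (not taken from the bookstore dataset)
sample_values = ['Dolsy Smith', 'None', None]

for value in sample_values:
    # repr() keeps the quotation marks around strings
    print(repr(value), type(value))
```

Only the last value prints without quotation marks, and its type is shown as `NoneType` rather than `str`.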



### I.2 Preventing errors with `if`

There are more sophisticated ways of handling errors in Python, but one of the most straightforward is to include an `if` statement to check for the condition that causes the error.

In this case, the error is caused when `course['instructor']` is `None`. Python provides a concise way to guard against this: because `None` counts as false in a condition, we can simply write `if value_to_test:`, where `value_to_test` is a variable or other name that may or may not be null.
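As a side note, here is a short sketch of how this kind of check treats different values. One caveat worth knowing: `if value_to_test:` is also skipped for an empty string, so `is not None` is the more explicit test when that distinction matters. The sample values are invented for illustration.

```python
# Invented sample values: a name, an empty string, and a null
sample_values = ['Dolsy Smith', '', None]

truthy_results = []
explicit_results = []
for value_to_test in sample_values:
    # The concise check: skipped for None, but also for ''
    if value_to_test:
        print(repr(value_to_test), 'passes the concise check')
    else:
        print(repr(value_to_test), 'is skipped by the concise check')
    truthy_results.append(bool(value_to_test))
    # The explicit check: False only for None
    explicit_results.append(value_to_test is not None)

print(truthy_results)    # [True, False, False]
print(explicit_results)  # [True, True, False]
```

For the instructor names, either form prevents calling `split()` on `None`.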

In the code below, we've incorporated this check into our code. 

Note that the code _still_ fails, but this time we get a different error: a `ValueError`.

In [None]:
# Loop over courses in the dataset
for course in bkst_data:
    # Check for null values
    if course['instructor']:
        # Split each instructor name on white space
        first_name, last_name = course['instructor'].split()
        # Assign each part of the name to a new key in the dictionary course
        course['first_name'] = first_name
        course['last_name'] = last_name

#### Try it out!

Can you use our `print()` debugging technique to identify the cause of this `ValueError`?



In [22]:
# Your code here

#### Hint

The line 

```
first_name, last_name = course['instructor'].split()
```
expects a string for the value of `course['instructor']` that follows a familiar pattern: `'first_name last_name'`, where white space separates the first and last name.


In other words, this code works only if the result of the `split()` method is a list with two elements. And `str.split()` will produce a list with two elements _only if_ the string consists of exactly two parts separated by white space.
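To make the hint concrete, the sketch below calls `split()` on two sample names and counts the resulting elements. Only a result with exactly two elements can be unpacked into `first_name, last_name`.

```python
two_part_name = 'Dolsy Smith'
three_part_name = 'Guido van Rossum'

two_parts = two_part_name.split()
three_parts = three_part_name.split()

print(two_parts, len(two_parts))      # ['Dolsy', 'Smith'] 2
print(three_parts, len(three_parts))  # ['Guido', 'van', 'Rossum'] 3

# Unpacking works here because there are two names and two elements
first_name, last_name = two_parts
```

Attempting the same unpacking with `three_parts` would stop with `ValueError: too many values to unpack (expected 2)`.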



#### Try it out!

Reading the documentation for [`str.split()`](https://docs.python.org/3/library/stdtypes.html#str.split), can you identify an argument to the method that could prevent this error? How can we force `split` to return only _two_ elements?



In [None]:
# Your code here

Expand the cell below to see a possible solution.

In [27]:
# Loop over courses in the dataset
for course in bkst_data:
    # Check for null values
    if course['instructor']:
        # Split each instructor name on white space
        # Only split on the first occurrence of white space
        first_name, last_name = course['instructor'].split(maxsplit=1)
        # Assign each part of the name to a new key in the dictionary course
        course['first_name'] = first_name
        course['last_name'] = last_name

#### Notes

Note that the code in the provided solution runs without errors, but it doesn't necessarily solve the problem posed by the instructors' names in our dataset.

This code will handle names where the _last name_ (surname) contains spaces. But where a middle name or middle initial is given, or where the _first name_ (the given name) contains spaces, it will assign the wrong values to the `'last_name'` key. 

This fact illustrates an important point. 

> Code that runs without errors is not necessarily code that works as intended.
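The limitation described above can be seen directly in a short sketch. The second name is hypothetical and chosen to have a two-word given name. `str.rsplit()`, which splits from the right, is shown only for contrast and not as a fix, since it mishandles multi-word surnames instead.

```python
surname_with_space = 'Guido van Rossum'
given_name_with_space = 'Mary Ann Smith'  # hypothetical name

# Splitting once from the left keeps multi-word surnames together
left_a = surname_with_space.split(maxsplit=1)
left_b = given_name_with_space.split(maxsplit=1)
print(left_a)  # ['Guido', 'van Rossum']
print(left_b)  # ['Mary', 'Ann Smith']  (no error, but 'Ann' ends up in the last name)

# Splitting once from the right has the opposite trade-off
right_a = surname_with_space.rsplit(maxsplit=1)
right_b = given_name_with_space.rsplit(maxsplit=1)
print(right_a)  # ['Guido van', 'Rossum']  (no error, but 'van' ends up in the first name)
print(right_b)  # ['Mary Ann', 'Smith']
```

Neither rule is right for every name, and neither raises an error when it is wrong.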



## II. Testing code

Because of that fact, it's useful to create **tests** that can confirm that our code works as intended. We obviously can't test for every eventuality. But when developing our user stories, we can identify **test cases** that represent how our code should run under optimal conditions or in optimal scenarios.

In what follows, we'll write some tests for our name-parsing code above. Note that as of now, not all of our tests will succeed, because we haven't solved the problem of instructors with multiple first names or where first and middle names/initials are given. But identifying the test cases allows us to specify with confidence the kinds of situations for which our code _is_ expected to work, as well as to flag conditions where it doesn't work (and that we might want to address in future work).

### II.1 Writing testable code

Code tends to be easier to test when it's organized into discrete units. Writing **functions** is a great way to organize code in units that can be easily tested. (Functions also make our code more readable and easier to modify in response to new user stories, new datasets, etc.)

#### Try it out!

The following cell contains the function signature (the `def` line) and the `return` statement for a function called `extract_instructor_names`. 

This function takes as its argument a single dictionary, checks whether the value of the `instructor` key is non-null, and if so, adds `first_name` and `last_name` entries to the dictionary.

The function should do the same thing as the body of the `for` loop above, but without the loop: in other words, we want to be able to use our function to process a single course (not a list of courses).

See if you can fill out the body of the function below. 



In [30]:
def extract_instructor_names(course_dict):
    # Given a dictionary with an 'instructor' entry, 
    # splits the associated string into two parts
    # and assigns them as separate entries to the dictionary
    
    # Your code here
    
    return course_dict

In [32]:
def extract_instructor_names(course_dict):
    # Given a dictionary with an 'instructor' entry, 
    # splits the associated string into two parts
    # and assigns them as separate entries to the dictionary
    if course_dict['instructor']:
        first_name, last_name = course_dict['instructor'].split(maxsplit=1)
        # Assign each part of the name to a new key in the dictionary course
        course_dict['first_name'] = first_name
        course_dict['last_name'] = last_name
    
    return course_dict

### II.2 Writing tests

Now that we have a function that works for a single course dictionary, we can write some tests.

We'll be using the `assert` [keyword](https://gwu-libraries.github.io/python-camp/glossary.html#term-keyword), which you might have noticed in yesterday's autograded homework exercise.

`assert` is used almost exclusively in writing tests. Like an [if statement](https://gwu-libraries.github.io/python-camp/glossary.html#term-if-statement), an `assert` statement evaluates a given condition. But instead of being followed by a block of code to execute if the condition is `True`, `assert` does nothing if the condition is true. However, if the condition is `False`, it causes an `AssertionError`. 

#### Try it out!

Modify `x` in the code below so that the `assert` statement produces an error. Note that following the condition, we can provide (as a Python string) a message; this message will appear in the `AssertionError` itself.



In [36]:
x = 5
assert x > 4, 'x should be greater than 4.'

Using `assert`, below we create some tests for `extract_instructor_names`. 

Each `assert` statement represents a different condition we want to test for.

In [41]:
# Define some test data
test_data = {'instructor': 'Dolsy Smith'}
# Obtain a result by running the function
test_result = extract_instructor_names(test_data)
# Test the result against a condition
assert test_result['first_name'] == 'Dolsy', 'first_name should be "Dolsy"'
assert test_result['last_name'] == 'Smith', 'last_name should be "Smith"'

Here's another test, to make sure our function works when the value for `instructor` is `None`.

Note that our test data is not an example of a full course as represented in the `bkst_data` dataset. Our function only deals with the value of the `instructor` key, so that's the only key our `test_data` dictionary needs to contain. (We could add other keys, like `department` and `course` and `section`, but in this case they would not add anything to the test.)

In [42]:
# Define some test data
test_data = {'instructor': None}
# Obtain a result by running the function
test_result = extract_instructor_names(test_data)
# Test the result against a condition
assert 'first_name' not in test_result, 'first_name should not be present'
assert 'last_name' not in test_result, 'last_name should not be present'

#### Try it out!

Write a test for our function to verify its behavior on a name with three parts.



In [None]:
# Your code here

In [44]:
# Define some test data
test_data = {'instructor': 'Guido van Rossum'}
# Obtain a result by running the function
test_result = extract_instructor_names(test_data)
# Test the result against a condition
assert test_result['first_name'] == 'Guido', 'first_name should be "Guido"'
assert test_result['last_name'] == 'van Rossum', 'last_name should be "van Rossum"'
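
As the number of test cases grows, one option is to keep them in a list and loop over it. The sketch below assumes the `extract_instructor_names` solution shown earlier (repeated here so that the cell runs on its own); the `test_cases` list and its key names are illustrative.

```python
def extract_instructor_names(course_dict):
    # Same logic as the solution above
    if course_dict['instructor']:
        first_name, last_name = course_dict['instructor'].split(maxsplit=1)
        course_dict['first_name'] = first_name
        course_dict['last_name'] = last_name
    return course_dict

# Each test case pairs an input with the values we expect back
test_cases = [
    {'instructor': 'Dolsy Smith', 'first': 'Dolsy', 'last': 'Smith'},
    {'instructor': 'Guido van Rossum', 'first': 'Guido', 'last': 'van Rossum'},
]

for case in test_cases:
    # Build fresh test data for each case, then run the function
    test_result = extract_instructor_names({'instructor': case['instructor']})
    assert test_result['first_name'] == case['first'], 'unexpected first_name'
    assert test_result['last_name'] == case['last'], 'unexpected last_name'

print('All', len(test_cases), 'test cases passed.')
```

Adding a new scenario then means adding one more dictionary to `test_cases`, rather than copying the whole test.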

## Wrap up / Final exercise

In this homework, you practiced debugging errors with the `print()` function, and you used `if` statements to avert possible errors due to inconsistencies in the data. Finally, you worked with the `assert` keyword to construct tests for code (to ensure that your code is working as expected).


The final part of the homework for today is an [autograded exercise](./HW_3_GR.ipynb), designed to test your grasp of the concepts covered on Days 1, 2, and 3.