<a id='Top'></a>
# Class09: Designing and Debugging
## Learning Objectives
* [Understanding Error Messages](#Error)
* [Inserting print statements](#Print)
* [Commenting out code](#Comment)
* [Debuggers: A more sophisticated approach](#Debugger)
* [Unit testing](#Unit)
* [Refactoring](#Refactor)
* [Martian Challenge](#Challenge)

<a id='Error'></a>
## Understanding Error Messages
[Top of Notebook](#Top)

Python tries to be helpful, it really does. When your code crashes, it tells you what went wrong. The catch is that the traceback it prints is written in computerese. My advice here is to at least try to understand what Python is saying. This gets easier with practice.

Let's look at some examples.

In [5]:
x = 1 + 'cat'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Silly me. I tried to add 1 to a cat. Python starts by telling me I made a "TypeError," meaning a value has the wrong type (string, integer, list, tuple, etc.) for the operation I asked for. Python draws an arrow pointing at the offending statement (there was only one line of code in this case). Then it tells me that + is not supported between an integer and a string. You can't add numbers to text. Python is trying hard to tell me what is wrong.
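If you really did mean to combine a number with some text, convert one of them first so that both sides of + have the same type. A small sketch (the values are made up for illustration):

```python
# Turn the number into text; + then joins (concatenates) the two strings
label = str(1) + ' cat'
print(label)    # 1 cat

# Or turn text that looks like a number into an int; + then adds
total = 1 + int('2')
print(total)    # 3
```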

Let's try a slightly more complicated example with multiple lines of code.

In [None]:
# Create two integer variables
a = 2
b = 3

# Put the variables in a list, increment one of them and print the result.
c = [a, b]
c[1] = b + 1
print(c)

# Put the variables in a tuple, increment one of them and print the result. 
d = (a, b)
d[1] = b + 1
print(d)

Okay, this is a bit more cryptic. I made a list with two variables, a and b, then changed the second element of the list (the first element is element zero, remember?). Python had no problem with this.

Then I created a tuple the same way and tried changing its second element. The arrow in the error message points at the line d[1] = b + 1, and the message says something about a 'tuple' object not supporting item assignment. What Python is trying to say is that tuples are immutable: you can alter a list, but a tuple can't be changed once it has been created. If you want d to hold different values, you have to build a new tuple and assign it to d.
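Here is a minimal sketch of what "build a new tuple" looks like in practice, using the same a, b, and d as the cell above. Both approaches leave d pointing at a brand-new tuple; neither changes the old one in place.

```python
a = 2
b = 3
d = (a, b)

# d[1] = b + 1 would raise TypeError, so create a new tuple instead
d = (d[0], b + 1)
print(d)        # (2, 4)

# Another option: copy into a list, change the list, then convert back
temp = list(d)
temp[1] = 10
d = tuple(temp)
print(d)        # (2, 10)
```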

### Student Challenge 
Decipher the error message for the code below and explain what it means in the cell following the error message.

In [None]:
John_age = 10
Mary_age = 12
combined_age = Jon_age + Mary_age

Too easy? Try this one.

In [None]:
famous_books = ['Catcher in the Rye', 'Grapes of Wrath', 'The Martian']
for book in famous books
    print(book)

Okay. One more to see if you're getting the hang of debugging.

In [None]:
x = [23, 9, -3, 0, 5]
print(x[1])
print(x[5])

<a id='Print'></a>
## Inserting print statements
[Top of Notebook](#Top)

Sometimes reading error messages just isn't enough. Sometimes there is no error message, but the answer is wrong, so you know there is a mistake somewhere. Inserting print statements to check the values of variables at intermediate stages is a technique as old as programming, but it works!

Here is an example of code that runs without error but gives the wrong answer. The objective is to find the average number of letters per word in a string of text.

In [None]:
quote = '''If a hiker gets lost in the mountains, people will coordinate a search. If a train crashes, 
people will line up to give blood. If an earthquake levels a city, people all over the world will send 
emergency supplies. This is so fundamentally human that it's found in every culture without exception. 
Yes, there are assholes who just don't care, but they're massively outnumbered by the people who do.'''

num_words = 0
total = 0
for word in quote.split():
    total += len(word)
    num_words += 1
average = total/num_words
print('The average word length is:', average)

So what's wrong with this? You calculate an average by adding up numbers and dividing by how many numbers there are. The average of 3, 5, and 9 is (3 + 5 + 9)/3. In this code we divide the quote into words, add up the word lengths, and divide by the number of words. There is no error message, but the average length is wrong.

This is not a syntax error; it is a logic error. One way to catch logic errors is to insert print statements. Let's see if the split() method is doing what we intended.

In [None]:
quote = '''If a hiker gets lost in the mountains, people will coordinate a search. If a train crashes, 
people will line up to give blood. If an earthquake levels a city, people all over the world will send 
emergency supplies. This is so fundamentally human that it's found in every culture without exception. 
Yes, there are assholes who just don't care, but they're massively outnumbered by the people who do.'''

num_words = 0
total = 0
for word in quote.split():
    total += len(word)
    num_words += 1
    print(word)               # Here is the print statement added for debugging.
average = total/num_words
print('The average word length is:', average)

The split() method divides a string at whitespace (spaces, tabs, and line breaks). When we print the words, we see that punctuation stays attached to them. We want "care," to count as four letters, but right now it counts as five because the comma is included.

This is actually a tricky problem. Do we want to include apostrophes? Is "don't" a five-letter word? Is it two words? Do you have to expand the contractions? We'll skip that debate: we'll treat a contraction as one word and count the apostrophe, but we do want to eliminate the commas and periods.
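As an aside, here is one alternative to the replace() approach this notebook uses: the str.strip() method combined with string.punctuation from the standard library. strip() only removes characters from the two ends of a word, which happens to match the decision above: the apostrophe inside "don't" is kept, while a trailing comma or period is dropped. This is a sketch on one line of the quote, not a drop-in fix for the cells that follow.

```python
import string

text = "who just don't care, but they're massively outnumbered by the people who do."
words = text.split()
print(words[2], words[3], words[-1])          # don't care, do.

# strip() removes the listed characters only from the start and end of each word,
# so inner apostrophes survive while trailing commas and periods are removed
cleaned = [word.strip(string.punctuation) for word in words]
print(cleaned[2], cleaned[3], cleaned[-1])    # don't care do
```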

An easy way to alter a string is with the built-in [replace()](https://www.tutorialspoint.com/python/string_replace.htm) method.

In [None]:
# replace() example
pet = 'dig'
pet = pet.replace('i','o')
print(pet)
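Two details about replace() matter for the word-cleaning code: it replaces every occurrence (not just the first), and it returns a new string rather than changing the original, because strings, like tuples, are immutable. A small sketch continuing the pet example:

```python
phrase = 'dig, dig, dig.'

# replace() swaps every occurrence, not just the first one
print(phrase.replace('i', 'o'))     # dog, dog, dog.

# It returns a new string; the original is untouched
print(phrase)                       # dig, dig, dig.

# Calls can be chained to remove several characters in one line
cleaned = phrase.replace(',', '').replace('.', '')
print(cleaned)                      # dig dig dig
```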

We can use this method to replace all the commas and periods with empty strings.

In [None]:
quote = '''If a hiker gets lost in the mountains, people will coordinate a search. If a train crashes, 
people will line up to give blood. If an earthquake levels a city, people all over the world will send 
emergency supplies. This is so fundamentally human that it's found in every culture without exception. 
Yes, there are assholes who just don't care, but they're massively outnumbered by the people who do.'''

num_words = 0
total = 0
for word in quote.split():
    total += len(word)
    num_words += 1
    word = word.replace(",","")
    word = word.replace(".","")
    print(word)
average = total/num_words
print(average)

### Student Challenge
The code above fixed the words by replacing commas and periods with nothing ("" is a string of zero length). The debug print statement shows that those pesky characters are gone. But the average is the same? Grrrr! What's wrong now? Squashing bugs is hard!

See if you can find the logic error and calculate the correct average. Hint: You probably don't need to insert print statements if you carefully review the order in which the steps are performed.

<a id='Comment'></a>
## Commenting out code
[Top of Notebook](#Top)

Once you have fixed your code you could simply delete all the print statements you inserted while you were debugging. But then if a new bug pops up you'll have to reinsert them, so a simple alternative is to turn them into comments so that Python will ignore them. Below is an example where print statements have been inserted to check calculation results along the way and then commented out when the code was debugged.

In [None]:
a = 9 + 7
# print(a)
b = 12 - 3
# print(b)
print(a + b)
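A common variation on commenting out (a suggestion, not something the rest of this notebook relies on) is to guard the debugging prints with one flag, so you can switch them all on or off in a single place. DEBUG here is just an ordinary variable name chosen for this sketch, not anything special to Python.

```python
DEBUG = False   # change to True to see the intermediate values

a = 9 + 7
if DEBUG:
    print('a =', a)
b = 12 - 3
if DEBUG:
    print('b =', b)
print(a + b)    # 25
```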

<a id='Debugger'></a>
## Debuggers: A more sophisticated approach
[Top of Notebook](#Top)

Debugging your code using print statements is a simple and time-honored approach, but as your code grows longer, those extra print statements just add to the visual clutter. At some point most programmers reach for more sophisticated tools such as debuggers, which let you execute your program one line at a time and inspect the contents of all variables at any step. Learning to use a debugger is beyond the scope of this class, but you should be aware that [they exist](https://docs.python.org/3/library/pdb.html).

If you continue to pursue programming for fun or profit after this class, you should teach yourself how to use a debugger.

<a id='Unit'></a>
## Unit Testing
[Top of Notebook](#Top)

Another technique that is very hot in the programming world is "unit testing." The basic idea is to build tests right into your code so that every unit of code has its own test. This has three advantages. 

1. If you design the tests first, you can keep coding until the tests run without error. You have a solution when all your tests pass.
2. If you modify your code later to add some feature and you break something else (this happens a lot), then one of your unit tests is likely to fail, warning you that your new feature has introduced a bug.
3. If each unit (e.g., function) has its own test built in then you can reassemble the functions in new programs to solve different tasks knowing all the pieces work as designed.

Unit testing is a big topic -- books have been written about it -- but you can accomplish a fair amount with the simple "assert" statement. As usual, this is easiest to understand by working through a couple of examples. Basically, the assert statement asserts that something is true. If your code is working, the assertion evaluates to True and nothing happens. If it evaluates to False, Python raises an AssertionError to tell you the assertion failed.

In [None]:
# Assertion is true so nothing will print.
a = 2
b = 2
assert a==b

In [None]:
# Assertion is false so Python will report an error.
a = 2
b = 3
assert a==b
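An assert statement can also carry a message, written after a comma, that appears in the AssertionError when the check fails. That makes a failed test much easier to diagnose. In this sketch the try/except is only there so the cell keeps running after the failure; normally you would let the error stop the program.

```python
a = 2
b = 3

# The text after the comma becomes the AssertionError's message
try:
    assert a == b, f'expected a == b, but a is {a} and b is {b}'
except AssertionError as err:
    message = str(err)
    print('Assertion failed:', message)
```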

So what good is that? We hide assertions in our code to check automatically that everything is working the way we want. For example, suppose we wrote a function to square numbers. We add a few assertions to make sure the function returns correct values.

In [None]:
def square_it(x):
    return x*x

assert square_it(2) == 4
assert square_it(-4) == 16
assert square_it(0) == 0

Notice that we test that the function works for negative numbers as well as positive ones. Those assert statements are lying quietly in the code, ready to sound the alarm if a bug appears. They are like the indicator lights in your car that come on only if there is a problem. It doesn't hurt to insert a few of these tests whenever you write code, just to check that your functions are behaving themselves.
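As the number of checks grows, a common convention is to collect them in a function whose name starts with test_. Test runners such as pytest look for functions named this way and run them automatically, but in a notebook you can simply call the function yourself. A sketch using the square_it example:

```python
def square_it(x):
    return x * x

def test_square_it():
    '''All the checks for square_it, grouped in one place.'''
    assert square_it(2) == 4
    assert square_it(-4) == 16
    assert square_it(0) == 0
    assert square_it(1.5) == 2.25   # 1.5 * 1.5 is exactly 2.25 in floating point

test_square_it()   # silent if every assertion passes
print('All square_it tests passed')
```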

### Student challenge: Write the code to satisfy the assertion.
In the cell below, the assertions have already been written for a function that returns the last word in a string. Your challenge is to write the function named "last_word" that accepts a string as its input parameter and returns the last word of that string. If you code it correctly, the assert statements execute without error.

In other words, if you call your function like this:

last_word('All is well')

It should return the string 'well', which is the last word in the string.

Hint: recall that the split method applied to a string returns a list of words and that -1 is the index of the last element of any list, tuple or array.

In [None]:
# You fill in the rest of the function
def last_word(text):
    # Fill in your code here
    
    

assert last_word('This is a test') == 'test'
assert last_word('My dog has fleas') == 'fleas'

<a id='Refactor'></a>
## Refactoring Code
[Top of Notebook](#Top)

You've written some code and it passes the tests. But later on you realize that maybe your code isn't as readable as it could be, or perhaps the variables could have better names, or you discover you are using the same bits of code in several places, so it would make sense to write a function to replace the repetitive code. Refactoring is not debugging: you are improving code that already works, not fixing a bug. Refactoring is to code as editing is to a story. To quote the man who wrote the book on refactoring:

>"Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. It is a disciplined way to clean up code that minimizes the chances of introducing bugs. In essence when you refactor you are improving the design of code after it has been written." ~ Martin Fowler 1999

Refactoring goes hand in hand with unit testing, because the unit tests will alert you if you break something while refactoring.

### A simple example: compound interest

Suppose you want to know how much your student loan is going to cost you to repay. You know how much you borrowed, but there is the little matter of compound interest. When you pay back a loan, you have to pay the interest first before any of your payments go toward actually paying down your debt (I know, right?). The same formula applies to interest accruing in your savings account, but in your favor.

The formula for compound interest is:
$$ A = P\left(1 + \frac{r}{n}\right)^{nt} $$

* P  = principal amount (the initial amount you borrow or deposit)
* r  = annual rate of interest (as a decimal)
* t  = number of years the amount is deposited or borrowed for.
* A  = amount of money (debt or savings) accumulated after t years, including interest.
* n  = number of times the interest is compounded per year 

[This site](https://qrc.depaul.edu/StudyGuide2009/Notes/Savings%20Accounts/Compound%20Interest.htm) gives an example calculation. With P = 1500, r = 4.3% (so r = 4.3/100 = 0.043), n = 4, and t = 6, the result should be $1,938.84. We can use this known example to create an assertion statement to test our code.
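Plugging these values into the formula by hand shows where the expected answer comes from (intermediate values rounded):

$$ A = 1500\left(1 + \frac{0.043}{4}\right)^{4 \times 6} = 1500\,(1.01075)^{24} \approx 1500 \times 1.292558 \approx 1938.84 $$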

So let's write a function to calculate compound interest. The input parameters will be the quantities on the right side of the above equation, and the function should return A, the accumulated amount of money.

In [None]:
def compound_interest(P, r, n, t):
    import math
    A = P*math.pow((1 + (r/n)), n*t)
    return A

# Test statement to check the function is working correctly.
assert compound_interest(1500, 0.043, 4, 6) == 1938.84

The code looks right, but the assertion fails. So to debug we insert a print statement.

In [None]:
def compound_interest(P, r, n, t):
    import math
    A = P*math.pow((1 + (r/n)), n*t)
    return A

print(compound_interest(1500, 0.043, 4, 6))
assert compound_interest(1500, 0.043, 4, 6) == 1938.84

The code is correct!  The assertion fails because we are comparing the output of our function with a number that has been rounded to two decimal places. 

Technically, 1938.8368221341054 is not exactly equal to 1938.84, so the assertion fails. In fact, checking two floating-point numbers for exact equality is usually a bad idea. The computer stores each number with only a limited number of binary digits, so two values that ought to be equal might differ by a teensy-weensy bit, and the test for equality will fail.
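The classic demonstration: 0.1 and 0.2 cannot be stored exactly in binary, so their sum is not exactly the stored value of 0.3.

```python
total = 0.1 + 0.2
print(total)           # 0.30000000000000004
print(total == 0.3)    # False
```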

The solution is to test whether the numbers are acceptably close. In this case, we're happy if they agree within a penny. So we subtract the two numbers and check that their difference is less than 0.01. We take the absolute value of the difference because we don't know in advance which will be larger.

In [None]:
def compound_interest(P, r, n, t):
    import math
    A = P*math.pow((1 + (r/n)), n*t)
    return A

# Check the answer is good to within a penny
tolerance = 0.01
assert abs(compound_interest(1500, 0.043, 4, 6) - 1938.84) < tolerance

Yay! We've debugged our code to the point where we know it works for at least one test case. We should probably add more tests, and we would if the code were more complicated.
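If you do want to go a step further, here are two refinements, offered as suggestions rather than as the notebook's official version. The standard library's math.isclose performs the same "close enough" comparison, with the allowed difference passed as abs_tol. And a few extra test cases have answers that are easy to work out by hand: with zero interest the balance never changes, one year at 5% compounded once gives 1.05 times the principal, and with no time elapsed nothing accrues.

```python
import math

def compound_interest(P, r, n, t):
    A = P*math.pow((1 + (r/n)), n*t)
    return A

# Same check as before, written with math.isclose (abs_tol = largest allowed difference)
assert math.isclose(compound_interest(1500, 0.043, 4, 6), 1938.84, abs_tol=0.01)

# Zero interest: the balance should stay at the principal
assert math.isclose(compound_interest(1000, 0.0, 12, 5), 1000, abs_tol=0.01)

# One year, compounded once, at 5%: 1000 * 1.05 = 1050
assert math.isclose(compound_interest(1000, 0.05, 1, 1), 1050, abs_tol=0.01)

# No time elapsed: nothing accrues
assert math.isclose(compound_interest(2500, 0.07, 365, 0), 2500, abs_tol=0.01)

print('All compound_interest tests passed')
```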

Now that we have functioning code, we are going to refactor it. This is pretty simple code, but we can still make improvements. Let's start by adding a "docstring," a string placed as the first line of the function body. It is usually enclosed in triple quotes (here, triple single quotes) so it can span several lines.

In [7]:
def compound_interest(P, r, n, t):
    '''
    This function calculates the compound interest: 
       Input Parameters:
       P: principal amount (the initial amount you borrow or deposit)
       r: annual rate of interest (as a decimal)
       t: number of years the amount is deposited or borrowed for
       n: number of times the interest is compounded per year  
       Output Parameters:
       A: amount of money accumulated after t years, including interest.
    '''
    import math
    A = P*math.pow((1 + (r/n)), n*t)
    return A

tolerance = 0.01
assert abs(compound_interest(1500, 0.043, 4, 6) - 1938.84) < tolerance

If you run the code, the assertion still passes, so we didn't break the function. So what good is the docstring? Well, suppose you give this function to someone else and they want to know how to use it. They can use the help() function.

In [8]:
help(compound_interest)

Help on function compound_interest in module __main__:

compound_interest(P, r, n, t)
    This function calculates the compound interest: 
       Input Parameters:
       P: principal amount (the initial amount you borrow or deposit)
       r: annual rate of interest (as a decimal)
       t: number of years the amount is deposited or borrowed for
       n: number of times the interest is compounded per year  
       Output Parameters:
       A: amount of money accumulated after t years, including interest.



The help() function displayed the docstring without the user needing to read your code. Python's built-in functions have docstrings too. For example:

In [None]:
help(abs)

In [None]:
help(print)

So when you create functions that others might use, you should add docstrings.

There is more we could do to improve the code. For example, we could check that the arguments are valid. We wouldn't want someone to enter a value of zero for n, because we'd end up dividing by zero. Or we could check that the input parameters are numbers, not strings, i.e., 1.0 and not "1.0". But I think you get the idea. Once you get your code working, you should stop and check whether there are ways to improve the logic, the documentation, the efficiency, or the variable names, or to add error checking.
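Here is a minimal sketch of those two checks. Raising TypeError for non-numbers and ValueError for a non-positive n is a design choice made for this example, not a requirement. It uses raise rather than assert on purpose: Python skips assert statements entirely when run with the -O (optimize) flag, so asserts are meant for catching bugs, not for validating what a caller passes in.

```python
import math

def compound_interest(P, r, n, t):
    '''Return the amount accumulated after t years of compound interest.

    Sketch of the argument checks described above; the exception
    types chosen here are a design choice, not a requirement.
    '''
    # Reject strings such as "1.0" (and anything else that isn't an int or float)
    for name, value in (('P', P), ('r', r), ('n', n), ('t', t)):
        if not isinstance(value, (int, float)):
            raise TypeError(f'{name} must be a number, not {type(value).__name__}')

    # n is in a denominator, and a negative compounding count makes no sense either
    if n <= 0:
        raise ValueError('n must be greater than zero')

    return P * math.pow(1 + r / n, n * t)


print(round(compound_interest(1500, 0.043, 4, 6), 2))   # 1938.84

try:
    compound_interest(1500, 0.043, 0, 6)
except ValueError as err:
    print('ValueError:', err)   # ValueError: n must be greater than zero
```

Because the checks only sit in front of the original formula, the earlier assertion still passes, which is exactly the guarantee unit tests give you while refactoring.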

Before we move on to the Martian Challenge, let's use our function to look at the interest you are likely to pay on a typical student loan. This example comes from [this website](https://studentloanhero.com/featured/how-student-loan-interest-works/).

>To understand how compound interest works, let’s look at an example Direct Loan with a $10,000 balance and a 4.29% interest rate, which is the current rate for undergraduate loans.

>If this loan were compounded annually, 4.29% of the loan balance would be charged annually. In this case, the interest would be $429 once per year.

>However, student loans are not compounded annually, they are compounded daily. Rather than charge 4.29% once per year, that number is divided by 365 and compounds daily, or 0.0118% per day. Assuming a \$10,000 balance, that is $1.175 per day.

Let's check that with the function we wrote.

In [9]:
P = 10000 # principal
r = 4.29/100 # interest rate is 4.29 percent, which we convert to a decimal
n = 365 # Number of times per year the interest is compounded, which is daily
t = 1/365 # We are interested in interest added to our loan in one day, or 1/365th of a year.
new_balance = compound_interest(P, r, n, t)
interest = new_balance - P
print('The interest accrued ${0:.3f} per day.'.format(interest))

The interest accrued $1.175 per day.
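The same function can compare the two compounding schedules from the quote over a whole year. This is a sketch that redefines the notebook's compound_interest so the cell runs on its own. According to the quote, annual compounding should cost exactly $429; daily compounding should cost slightly more, because each day's interest starts earning interest itself the next day.

```python
import math

def compound_interest(P, r, n, t):
    A = P*math.pow((1 + (r/n)), n*t)
    return A

P = 10000
r = 0.0429

annual = compound_interest(P, r, 1, 1) - P     # compounded once per year
daily = compound_interest(P, r, 365, 1) - P    # compounded every day

print('Interest for one year, compounded annually: ${0:.2f}'.format(annual))
print('Interest for one year, compounded daily:    ${0:.2f}'.format(daily))
```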


The website goes on to demonstrate how much less you pay if you can accelerate your payments. It is worth reading for those of you with student debt.

Compound interest can work in your favor, too. Any money you save when you are early in your career will grow into a nice nest egg by the time you need to put *your* kids through college. So maybe they won't need to take out student loans!

<a id='Challenge'></a>
## Help Watney debug his code
[Top of Notebook](#Top)

Below are two functions, each designed to solve a different task. Each function has one or more errors. Your challenge is to fix them. You'll know you've succeeded when each function's assert statement executes without error.

In [None]:
def calorie_count(menu):
    '''Function to calculate the calories in a meal when passed a menu as a list. The calories of 
       various foods available to Watney are stored in a dictionary. The function iterates over the
       items in the menu and sums up the calories.'''
    
    # Items in the pantry (this is before he has only potatoes). 
    pantry = {'potato': 100, 'ketchup':20, 'energy_bar': 200, 'macncheese'; 350, 'chicken':300,
              'cereal': 225, 'eggs':125, 'dried_fruit':175, 'pudding':300, 'beans':90}
    
    # Initialize the variable holding the calorie total to zero
    total_calories = 0
    
    # Loop over the items in menu and sum the calories.
    for item in menu:
        total_calories += pantry{item}
    return total_calories


# Test the function
assert calorie_count(['potato', 'ketchup', 'energy_bar']) == 320      

To understand the next function, let's learn a couple of tricks. First, let's see how to convert a string into a list or tuple of its individual characters.

In [None]:
quote = 'Tell Commander Lewis, disco sucks.'

# Convert it to a list
quote_characters = list(quote)
print('As a list:')
print(quote_characters)
print()

# Or a tuple
quote_characters = tuple(quote)
print('As a tuple:')
print(quote_characters)

The next concept we need in order to understand the function below is modular division (usually called the modulo operation). For whole numbers, it returns the remainder left over after division. When you divide 10 by 3, 3 goes in three times with a remainder of 1, so 10 modulo 3 is 1. In Python, the percent sign is the modulo operator. For example:

In [None]:
print(10 % 3)
print(20 % 6)
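Two related facts that can come in handy. The built-in divmod() returns the quotient and the remainder together. And when the divisor is positive, Python's % never returns a negative number, so an index that steps backwards past the start also wraps around to the end (useful if you ever want to shift letters the other way).

```python
# divmod gives the quotient and the remainder together
print(divmod(10, 3))   # (3, 1)

# With a positive divisor, % always lands in the range 0 .. divisor - 1
print(-1 % 26)         # 25
print(-3 % 5)          # 2
```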

### The big trick
If I have a list with 26 elements (the alphabet) and I try to access it with an index that is too big, I'll get an IndexError. What I really want is to wrap around to the beginning. If I use the modulo operator to get the remainder when I divide my index by the length of the list, I'll get exactly what I want. Let's see this using a shorter list as an example.

In [None]:
# Make a list of five characters in the word hello.
test_list = list('Hello')
print(test_list)

In [None]:
index = 3
print(test_list[index])

index = 6
print(test_list[index % len(test_list)])

index = 8
print(test_list[index % len(test_list)])

### Student Challenge

Now you are in a position to understand what Mark is trying to do in the next function.  See if you can get it to work. 

Hint 1: It is easy to mix up () and [] if you're not careful.

Hint 2: Sometimes the problem is with the line right before the error message.

In [None]:
def mess_with_NASA(message, offset):
    '''
    Mark is bored (four years on Mars!), so he decides to screw with NASA a bit, and send his messages in code.
    Each day he picks a new code, just to keep the NASA engineers on their toes.
    The cipher is a simple letter substitution where every letter slides over by a certain number of places.
    For example, if the offset is 4, then a-->e, j-->n and letters at the end of the alphabet wrap back to the 
    start of the alphabet, z-->d. This function accepts a message and an offset and returns the coded version.
    '''
    
    # Turn a string with the alphabet into a tuple of the letters
    alphabet = tuple('abcdefghijklmnopqrstuvwzyz')
    
    # Check we didn't miss any letters (no reason not to build assertions inside functions)
    assert len(alphabet) == 26
    
    # Convert the message to lower case so it will be in our alphabet
    message = message.lower()
    
    # Split the message into characters and loop over them. Ignore any characters that are not
    # in our list of letters as they are spaces and punctuation, so they can just be copied unaltered.
    letters = list(message)
    coded_message = []
    for letter in letters:
        if letter in alphabet:
            ind = alphabet.index(letter) # Get the current position
            ind += offset # Add the offset
            mod_ind = ind % len(alphabet) # Modular division to perform wrap around indexing
            coded_message.append(alphabet(mod_ind))
        else:
            coded_message.append(letter)
    
    # Join the list of coded characters back into a string and return it to the user.
    coded_message = ''.join(coded_message)        
    return coded_message

# Test the function. With an offset of 1, each letter just shifts by one. a -->b, f-->g, etc.
assert mess_with_NASA('Hello, World!', 1) == 'ifmmp, zpsme!'        
            