# Intermediate Python: Programming

# Class 3

In our previous class, 
we explored how to make decisions using code,
how to write our own functions,
and how to connect these programming structures with for loops.

In this class, 
we'll learn additional ways to improve the usability and robustness of our functions.
By the end of this class, you should be able to:

- identify and correct different types of errors as reported by Python
- test and validate complex functions
- include help documentation and define defaults for a function
- apply assertions to defensively program

## Interpreting errors

You've probably noticed by now that a lot of time spent coding
is actually spent troubleshooting after receiving a warning or error.
Let's look at some of the error messages you're likely to receive,
and think about ways you may be able to resolve them.

> Errors are defined as problems with the basic way code is written.
Exceptions refer to problems detected while the code is running (like an undefined variable name)
that interrupt the flow of the program.
We'll discuss them in this section collectively.

Take the following code, 
which wraps an example we covered in our last class inside a function:
```python
def pos_neg(x):
    if x > 0:
        print(x, "is positive")
    elif x == 0: 
        print(x, "is zero")
    else:
        print(x, "is negative, will be converted")
        x = -x
    return x
        
num = "-3"

pos_neg(num)
```

This code will produce the following error message:
```python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-8ada265733cc> in <module>()
      9 num = "-3"
     10 
---> 11 pos_neg(num)

<ipython-input-14-8ada265733cc> in pos_neg(x)
      1 def pos_neg(x):
----> 2     if x > 0:
      3         print(x, "is positive")
      4     elif x == 0:
      5         print(x, "is zero")

TypeError: '>' not supported between instances of 'str' and 'int'
```

Error messages in Python are called tracebacks. 
A traceback includes enough information that you should be able to determine which part of your code created the error, 
and ideally,
how to correct it.
The number of arrows on the lefthand side indicates the number of levels in the traceback.
Here, there are two levels. 
The first arrow points to line 11,
in which the function is called.
This traces back to the second arrow,
in line 2,
which is where the actual error has occurred. 
The last line of the traceback tells us the type of error and how that error relates to our code. 
In this case, 
we're trying to use `>` to compare 0, an integer,
with `x`, a string (`num`, which we were using to test our function,
and has accidentally been assigned using quotation marks).
These different types of data can't be compared in this way.
We could resolve this by using an integer to test our code instead.
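
As a minimal sketch of that fix, the same function can be called with an integer rather than a string. The second call assumes the value genuinely arrives as text and converts it with the built-in `int()` first; that conversion step is an illustration and was not part of the original example.

```python
def pos_neg(x):
    if x > 0:
        print(x, "is positive")
    elif x == 0:
        print(x, "is zero")
    else:
        print(x, "is negative, will be converted")
        x = -x
    return x

# an integer can be compared with 0, so no TypeError is raised
num = -3
result = pos_neg(num)

# if the value really does arrive as text, convert it before calling
text_num = "-3"
converted = pos_neg(int(text_num))
```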

There are four main types of errors you'll see reported from Python:

### Syntax errors

Syntax errors are related to the basic symbols used to specify actions in code.
Examples include:

#### Missing a symbol

```python
def fahr_to_celsius(temp)
    return ((temp - 32) * (5/9))
```
A colon is missing at the end of the first line,
where the function is defined.
Python reports this as a `SyntaxError`,
with an explanation such as `invalid syntax` (Python 3.10 and later report `expected ':'` instead).

#### Indentation

```python
def fahr_to_celsius(temp):
return ((temp - 32) * (5/9))
```
The indentation at the beginning of the second line is missing, 
which is reported as `IndentationError` because `expected an indented block`.

Indentation matters in Python,
as whitespace (tabs and spaces) is how code chunks are organized into units.
Python 3 doesn't allow tabs and spaces to be mixed inconsistently for indentation, 
so many code editors automatically convert tabs to spaces.
Keep in mind that this isn't a failsafe against all possible errors of this type.
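
For comparison, here is a version of the function with both problems corrected: the colon is present and the body is indented. The test value of 32 is chosen only because its expected result (0 degrees Celsius) is easy to check by hand.

```python
def fahr_to_celsius(temp):
    # the colon ends the definition line; the body is indented by four spaces
    return ((temp - 32) * (5/9))

freezing = fahr_to_celsius(32)
print(freezing)
```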

### Variable name errors

Let's say you're trying to print something:
```python
print(octopus)
```
You'll see a `NameError`,
specifically, 
that `name 'octopus' is not defined`.
One common reason for this is that `octopus` isn't a variable, 
but rather a string around which you forgot to place quotation marks.

Another common problem is if you forgot to create a variable before using it,
or if you have a typo (e.g., capitalization mismatch) between variable references:
```python
name = "octopus"
print(Name)
```
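
Both situations can be resolved with small changes. This sketch shows one possible correction for each case described above:

```python
# quotation marks make octopus a string rather than a variable name
print("octopus")

# the variable is created before use, and the capitalization matches
name = "octopus"
print(name)
```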

### Index errors

If you are attempting to access an element in a sequence (such as a list or string),
but there is no element at the index position specified,
you'll receive an `IndexError`.
```python
name = "octopus"
name[7]
```
For this error, the explanation is `string index out of range`.
This means that the string,
`octopus`, does not have a character at index position 7.
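
The reason is that index positions start at 0, so a string of seven characters has positions 0 through 6. A short sketch of how you might check the valid range before indexing:

```python
name = "octopus"

# len() reports 7 characters, so valid index positions run from 0 to 6
length = len(name)
last_by_position = name[length - 1]

# a negative index counts backwards from the end of the string
last_by_negative = name[-1]

print(length, last_by_position, last_by_negative)
```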

### File errors

If you had a typo in a filename when trying to load data in one of our previous classes, 
you probably saw an error citing `OSError` (or `FileNotFoundError`, a more specific type of `OSError`).
```python
import numpy as np

data = np.loadtxt(fname="data/inflamation-01.csv", delimiter=",")
```
The explanation is that the file is `not found`,
which means that either the name of the file is wrong,
or its location isn't correct (e.g., if you forgot to include `data/` before the filename).
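
One way to narrow down which of these is the problem is to check the path before loading it. The sketch below uses `pathlib` from the standard library; `describe_path` is a hypothetical helper written for this illustration, not something used elsewhere in this class.

```python
from pathlib import Path

def describe_path(filename):
    # hypothetical helper: report whether a file exists before loading it
    path = Path(filename)
    if path.exists():
        return "found: " + str(path)
    # include the current working directory, since relative paths start there
    return "not found: " + str(path) + " (looked in " + str(Path.cwd()) + ")"

# the misspelled filename from the example above
print(describe_path("data/inflamation-01.csv"))
```

Seeing the working directory in the message can help reveal a missing `data/` prefix or a script being run from an unexpected location.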

> These are the most common errors you're likely to encounter, 
although this list is not exhaustive. 
The Python documentation explains [built-in errors and exceptions](https://docs.python.org/3/library/exceptions.html), 
although if you're using someone else's code, 
it's possible they've written a custom exception.

#### Challenge-errors

Using the error message included below the code, 
identify the type of error and why it is occurring.

*Code:*
```python
def print_message(day):
    messages = {
        "monday": "Hello, world!",
        "tuesday": "Today is tuesday!",
        "wednesday": "It is the middle of the week.",
        "thursday": "Today is Donnerstag in German!",
        "friday": "Last day of the week!",
        "saturday": "Hooray for the weekend!",
        "sunday": "Aw, the weekend is almost over."
    }
    print(messages[day])

def print_friday_message():
    print_message("Friday")

print_friday_message()
```
*Error:*
```python
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-33-ce8c0be5fdfb> in <module>()
     14     print_message("Friday")
     15 
---> 16 print_friday_message()

<ipython-input-33-ce8c0be5fdfb> in print_friday_message()
     12 
     13 def print_friday_message():
---> 14     print_message("Friday")
     15 
     16 print_friday_message()

<ipython-input-33-ce8c0be5fdfb> in print_message(day)
      9         "sunday": "Aw, the weekend is almost over."
     10     }
---> 11     print(messages[day])
     12 
     13 def print_friday_message():

KeyError: 'Friday'
```

We'll talk more about troubleshooting and debugging code in our next class,
but this framework for understanding errors should be useful as we increase the robustness of our functions.

## Testing and validating

When writing custom functions,
you are the one responsible for determining whether they are running correctly. 

What does it mean to run "correctly"? 
In general, 
this means ensuring the function works as expected,
which involves answering two different questions:

1. *Does the code perform the task we've specified?* 
Ensuring the code runs without error and in the manner you intend is referred to as **verification**,
and is confirmed through testing (often against a test dataset or task).
2. *Does the code meet the stated goal?*
Ensuring the code performs tasks that meet our specific need is referred to as **validation**,
and requires careful inspection of the tasks performed by the code,
as well as how they fit into the rest of the software being used.

Let's apply this framework to the challenge exercise from our last class,
in which you wrote a function to count the number of vowels in a string.
You verified the function was working correctly by using a few short example strings with known numbers of vowels.
To validate this function, however,
you would also need to confirm whether capitalization matters,
and whether the letter y should be included as a vowel.
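
A rough sketch of what that verification might look like follows. Here `count_vowels` is a hypothetical stand-in for the function you wrote last class, and it assumes particular answers to the validation questions: capitalization is ignored, and y is not counted as a vowel.

```python
def count_vowels(text):
    # hypothetical stand-in for last class's function
    # assumptions: capitalization is ignored and y is not counted as a vowel
    total = 0
    for letter in text.lower():
        if letter in "aeiou":
            total += 1
    return total

# verification: compare results with counts worked out by hand
print(count_vowels("octopus") == 3)
print(count_vowels("OCTOPUS") == 3)
print(count_vowels("rhythm") == 0)
```

Each line prints `True` if the function agrees with the hand count. Whether those hand counts reflect what is actually needed is the validation question.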

Since validation relies on external factors, 
we'll rely in this class on specifying tasks to the best of our ability,
and instead focus on testing through verification.
To examine this process,
we'll first write a function that offsets the data in an array by a specific value supplied by the user:

```python
import numpy as np

def offset_mean(data, target_mean_value):
    return (data - np.mean(data)) + target_mean_value
```

Next, we'll create a test dataset to use with the function to ensure it's working as expected:

```python
# create test dataset of only zeros
z = np.zeros((2,2))
print("test dataset:", z)
# test new function
print("offset data:", offset_mean(z, 3))
```

```
test dataset: [[0. 0.]
 [0. 0.]]
offset data: [[3. 3.]
 [3. 3.]]
```


Finally, we can assess the output from an inflammation dataset to ensure our approach seems reasonable.

```python
# use offset function on real data
data = np.loadtxt(fname="data/inflammation-01.csv", delimiter=",")
print("offset inflammation:", offset_mean(data, 0))
```

```
offset inflammation: [[-6.14875 -6.14875 -5.14875 ... -3.14875 -6.14875 -6.14875]
 [-6.14875 -5.14875 -4.14875 ... -5.14875 -6.14875 -5.14875]
 [-6.14875 -5.14875 -5.14875 ... -4.14875 -5.14875 -5.14875]
 ...
 [-6.14875 -5.14875 -5.14875 ... -5.14875 -5.14875 -5.14875]
 [-6.14875 -6.14875 -6.14875 ... -6.14875 -4.14875 -6.14875]
 [-6.14875 -6.14875 -5.14875 ... -5.14875 -5.14875 -6.14875]]
```


Let's run some basic summary statistics on the original and offset data to ensure the offset is working as expected:

```python
# confirm offset has worked
print('original min, mean, and max are:', np.min(data), np.mean(data), np.max(data))
offset_data = offset_mean(data, 0)
print('min, mean, and max of offset data are:',
      np.min(offset_data),
      np.mean(offset_data),
      np.max(offset_data))
# offset isn't exact, but is close
```

```
original min, mean, and max are: 0.0 6.14875 20.0
min, mean, and max of offset data are: -6.14875 2.842170943040401e-16 13.85125
```


Given that the mean of the original data is 6.14875,
we expect the offset summary stats to be:

- min = -6.14875 (original min of 0 - 6.14875)
- mean = 0 (original mean of 6.14875 - 6.14875)
- max = 13.85125 (original max of 20 - 6.14875)

The offset mean isn't exactly zero (because of floating-point rounding during calculations),
but it is very close.
Let's also check the standard deviation, 
this time subtracting one value from the other so it's easy to see at a glance whether they're the same:

```python
print('difference in standard deviations before and after:',
      np.std(data) - np.std(offset_data))
```

```
difference in standard deviations before and after: 0.0
```
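
Because of that rounding, comparing floating-point results with `==` can fail even when a function is working. One possible approach, sketched here with a small made-up array rather than the inflammation data, is NumPy's `np.isclose`, which checks whether two values agree within a small tolerance.

```python
import numpy as np

def offset_mean(data, target_mean_value):
    return (data - np.mean(data)) + target_mean_value

# small made-up dataset (an assumption for illustration)
sample = np.array([[0.1, 0.2], [0.3, 0.7]])
shifted = offset_mean(sample, 0)

# np.isclose allows for tiny rounding differences
mean_ok = np.isclose(np.mean(shifted), 0.0)
std_ok = np.isclose(np.std(sample), np.std(shifted))
print("mean is close to zero:", mean_ok)
print("standard deviation unchanged:", std_ok)
```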


Now that we are confident our function works as intended,
we should add documentation to record its purpose.
In the code above,
we used code comments to share information about the function.
While this technically documents our code, 
a more effective way is to embed help documentation within the code: 

```python
def offset_mean(data, target_mean_value):
    'offset array by value'
    return (data - np.mean(data)) + target_mean_value
```

While technically correct,
such a short description may not be useful in a few months.
We can include a bit more information if we use three sets of quotation marks,
which allow the text (known as a docstring) to span multiple lines:

```python
def offset_mean(data, target_mean_value):
    '''Return a new array containing the original data
       with its mean offset to match the desired value.'''
    return (data - np.mean(data)) + target_mean_value
```

This format allows us to access the help documentation,
in the same way we would another function:

```python
help(offset_mean)
```

```
Help on function offset_mean in module __main__:

offset_mean(data, target_mean_value)
    Return a new array containing the original data
    with its mean offset to match the desired value.
```
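
The text shown by `help` is stored with the function itself. As a small sketch, using a hypothetical `square` function that is not part of this lesson, the documentation string can also be read directly through the function's `__doc__` attribute:

```python
def square(value):
    '''Return value multiplied by itself.'''
    return value * value

# the documentation string is stored on the function object
print(square.__doc__)
```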



#### Challenge-docstring

In our last class, we created a function for temperature conversion.
Add a docstring to document the code:
```python
def fahr_to_celsius(temp):
    return ((temp - 32) * (5/9))
```

## Defining defaults

The next step in developing our function is to define defaults.
A default is the value a function uses for a parameter
unless you specify something different.

An example of a default can be seen in how we loaded our data:

```
np.loadtxt(fname="data/inflammation-01.csv", delimiter=",")
```

Without `delimiter=","`, 
the function will assume the columns are separated by whitespace.

We can include a default value in our custom function,
here setting the target mean to zero:

```python
def offset_mean(data, target_mean_value=0.0):
    '''Return a new array containing the original data with its mean offset to match the
       desired value (0 by default).
    Example: offset_mean([1, 2, 3], 0) => [-1, 0, 1]'''
    return (data - np.mean(data)) + target_mean_value
```

Of course, we should also update the documentation to ensure it still reflects the current state of the function.
It is also useful to include an example 
(input and output) of how the function is used.

We can use our function to explore how Python handles defaults,
by testing our function and still using two arguments:

```python
test_data = np.zeros((2, 2))
print("test data", test_data)
print("offset data", offset_mean(test_data, 3))
```

```
test data [[0. 0.]
 [0. 0.]]
offset data [[3. 3.]
 [3. 3.]]
```


Here we've also used `np.zeros`, 
a useful function for creating test datasets entirely of zeros 
(which allow mathematical manipulations to be more easily interpreted).

If we wanted to apply the default value of zero,
we would need to modify our test data:

```python
# modify data so default value can be applied with effect
more_data = 5 + np.zeros((2, 2))
print('data before mean offset:', more_data)
# offsetting data with default parameter
print('offset data:', offset_mean(more_data))
```

```
data before mean offset: [[5. 5.]
 [5. 5.]]
offset data: [[0. 0.]
 [0. 0.]]
```


Indeed, we can see the offset still working,
even though we haven't specified a value by which to offset.

Let's explore how Python matches values to parameters
using a simple example function:

```python
def display(a=1, b=2, c=3):
    print('a:', a, 'b:', b, 'c:', c)

print('no parameters:')
display()
print('one parameter:')
display(55)
print('two parameters:')
display(55, 66)
```

```
no parameters:
a: 1 b: 2 c: 3
one parameter:
a: 55 b: 2 c: 3
two parameters:
a: 55 b: 66 c: 3
```


The output above illustrates two rules for parameters:

- parameters are matched with values inside the parentheses from left to right 
- any parameter without a value assigned by the user is automatically given the default value

The example above relies solely on the relative position of values.
We can override the positional assumptions by naming a value as it's entered:

```python
display(c=77)
```

```
a: 1 b: 2 c: 77
```


Here, the default values for `a` and `b` remain unchanged,
while `c` has changed to suit our needs.
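
Positional and named values can also be combined in one call, as long as the positional values come first. A brief sketch, using a hypothetical `describe` function that returns its values instead of printing them:

```python
def describe(a=1, b=2, c=3):
    # hypothetical variant of display that returns a tuple
    return (a, b, c)

# 55 is matched to a by position, c is matched by name, b keeps its default
mixed = describe(55, c=77)
print(mixed)

# named values can be given in any order
reordered = describe(c=9, a=7)
print(reordered)
```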

> You may see references to values or parameters being "passed."
This is the same thing as saying "entering" or "giving" a value,
such as `c=77` in the example above. 
In this case, 77 is the value being passed.

With this understanding of default values,
you should now be better equipped to understand the help documentation associated with the functions we've already been using.

Executing `help(np.loadtxt)`, for example,
lets us know the function:

- has a parameter called `fname` that doesn't have a default value
- has a number of other parameters that do have default values
- needs to have the delimiter passed by name 
(e.g., `delimiter=","` rather than only `","`) 
because values are matched to parameters by position, and `delimiter` is not the second parameter

#### Challenge-parameters

What does the following code display when run, and why?

```python
def numbers(one, two=2, three, four=4):
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, three=3))
```

#### Challenge-parameters2

What does the following code display when run, and why?

```python
def func(a, b=3, c=6):
    print('a: ', a, 'b: ', b, 'c:', c)

func(-1, 2)
```

## Assertions and defensive programming

Understanding how to effectively debug is important for our ability to write code,
but is also useful for protecting ourselves from making future mistakes.
In this section, we'll explore defensive programming as a strategy to improve our effectiveness as coders.
Defensive programming assumes that mistakes will happen,
and provides information (via code) to guard against them.

Assertions are one of the main tools we can apply in programming defensively. 
Assertions are statements assessing whether something is true at a given point in a program.
If the statement is true, the program proceeds,
but if it is false, Python halts the program and reports an `AssertionError` along with a specified error message.

Let's take a look at how assertions work with a simple example.
We'll create two test datasets,
one that includes numbers that are all positive, 
and another that includes negative numbers: 

```python
numbers_pos = [1.5, 5.2, 3.5, 4.1]
numbers_neg = [1.5, 5.2, -3.5, 4.1]
```

Next, we'll write a for loop that adds the numbers together.

The assertion we include here requires that all data be positive.
Technically, an assertion only needs to state the condition 
(e.g., `assert n > 0.0`), 
but we include a message that makes the mistake understandable
without having to dig into the code.

```python
total = 0.0
for n in numbers_pos:
    assert n > 0.0, 'Data should only contain positive values'
    total += n
print('total is:', total)
```

```
total is: 14.299999999999999
```


The loop runs fine with `numbers_pos`, 
which we can treat like a positive control in an experimental setting.
If we replace it with `numbers_neg`, 
though, the assertion fails and the error is reported:

```python
total = 0.0
for n in numbers_neg:
    assert n > 0.0, 'Data should only contain positive values'
    total += n
print('total is:', total)
```

```python
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-14-7c882966c113> in <module>()
      1 total = 0.0
      2 for n in numbers_neg:
----> 3     assert n > 0.0, 'Data should only contain positive values'
      4     total += n
      5 print('total is:', total)

AssertionError: Data should only contain positive values
```

We can then use this error information to help us figure out how to use the code correctly.
While the code above is a loop, 
keep in mind that assertions are often included in functions.
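
As a sketch of what that might look like, the loop can be wrapped in a function. The name `total_positive` is hypothetical and chosen only for this illustration.

```python
def total_positive(values):
    # hypothetical function wrapping the loop above
    total = 0.0
    for n in values:
        assert n > 0.0, 'Data should only contain positive values'
        total += n
    return total

print('total is:', total_positive([1.0, 2.5, 4.0]))
```

Calling `total_positive` with a list containing a negative number would stop with the same `AssertionError` message, wherever in a larger program the function happened to be called.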

There are three types of assertions:

- *precondition:* something that must be true at the start of the function in order for it to work correctly
- *postcondition:* something that the function guarantees is true when it finishes
- *invariant:* something that is always true at a particular point inside a piece of code
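
The sketch below shows where each type might appear in a single function. `scale_to_unit` is a hypothetical example (it divides every value by the largest one), and the particular conditions checked are assumptions about what such a function should guarantee.

```python
def scale_to_unit(values):
    # preconditions: must hold before the function can do its work
    assert len(values) > 0, 'Need at least one value'
    assert all(v > 0 for v in values), 'Values must be positive'

    largest = max(values)
    scaled = []
    for v in values:
        ratio = v / largest
        # invariant: true at this point on every pass through the loop
        assert 0 < ratio <= 1, 'Scaled value out of range'
        scaled.append(ratio)

    # postconditions: what the function guarantees when it finishes
    assert len(scaled) == len(values), 'Output length should match input'
    assert max(scaled) == 1.0, 'Largest scaled value should be 1'
    return scaled

print(scale_to_unit([2.0, 4.0, 1.0]))
```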

#### Challenge-pre-post

Suppose you are writing a function called `average` that calculates the average of the numbers in a list. What pre-conditions and post-conditions would you write for it?

## Wrapping up

In this class, 
we explored types of errors in Python, 
testing/validating functions,
including help documentation and defaults for functions,
and assertions to program defensively.

In our next and final class,
we'll continue to develop our functions for other people to use 
by exploring debugging, test-driven development, 
and creating command-line programs from our Python scripts.