# Python part 2


# 5. Making choices

In our last lesson, we discovered something suspicious was going on in our inflammation data by drawing some plots. How can we use Python to automatically recognize the different features we saw, and take a different action for each? In this section, we’ll learn how to write code that runs only when certain conditions are true.

In [1]:
# if-statement to check whether number is greater than 100
num = 37
if num > 100: # notice colon
    print('greater') # notice indentation
else:
    print('not greater')
print('done')

not greater
done


Conditional statements don’t have to include an else. If there isn’t one, Python simply does nothing if the test is false:

In [2]:
# Conditional statements don't have to include an `else`
num = 54
print('before conditional')
if num > 100:
    print(num, 'is greater than 100')
print('after conditional')

before conditional
after conditional


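Check your conditional statements carefully to see which cases they cover and which they leave out. In the example above, nothing is printed for any `num` that is 100 or less. To handle several cases, we can chain tests with `elif` (short for “else if”) and finish with an `else` that catches everything else: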

In [3]:
# Chain multiple if-statements
num = -3

if num > 0:
    print(num, 'is positive')
elif num == 0: # To test equality, we use a double equals sign, rather than single for assigning values
    print(num, 'is zero')
else:
    print(num, 'is negative')


-3 is negative


In [4]:
# Optional: Difference between == and =
print(num == 4)
num = 4 
print(num == 4)

False
True


In [5]:
# True and False are special values in Python
type(num == 4)

bool

We can also combine tests using `and` and `or`. `and` is only true if both parts are true:

In [6]:
# Combine comparisons with 'and'
if (1 > 0) and (-1 >= 0): # best practice to use parentheses
    print('both parts are true')
else:
    print('at least one part is false')

at least one part is false


In [7]:
# Combine with 'or'
if (1 > 0) or (-1 >= 0):
    print('at least one test is true')

at least one test is true


## Checking our Data

1. Let's rerun the `inflammation_analysis.ipynb` 
1. Discuss data 

Let's catch the suspicious data

From the first couple of plots, we saw that maximum daily inflammation behaves strangely: it rises by exactly one unit a day. Wouldn’t it be a good idea to detect such behavior and report it as suspicious? Let’s do that! However, instead of checking every single day of the study, let’s merely check whether the maximum inflammation at the beginning (day 0) and in the middle (day 20) of the study equals the corresponding day number.

*Question: What alternative approach could we have taken?*
- Use a for-loop to check each day separately
- Use `numpy.diff()` to check whether each day-to-day difference equals 1

In [8]:
import numpy

# Let's inspect max values of the first dataset
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

max_inflammation = numpy.max(data, axis=0)
print(max_inflammation)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
 18. 19. 20. 19. 18. 17. 16. 15. 14. 13. 12. 11. 10.  9.  8.  7.  6.  5.
  4.  3.  2.  1.]


In [9]:
# Let's use day_0 == 0 and day_20 == 20
if (max_inflammation[0] == 0) and (max_inflammation[20] == 20):
    print('Suspicious looking maxima!')

Suspicious looking maxima!


In [10]:
# Example alternative solution
# numpy.all is true only if every day-to-day difference equals 1
if numpy.all(numpy.diff(max_inflammation[:20]) == 1):
    print("The series is monotonically increasing")
else:
    print("The series is not monotonically increasing")

The series is monotonically increasing
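
The for-loop approach mentioned in the question above can be sketched as follows. This is a minimal sketch, not part of the original notebook: it assumes a small made-up list, `example_maxima`, standing in for the real `max_inflammation` array, and a hypothetical flag variable `rises_by_one`.

```python
# Made-up stand-in for the first few values of max_inflammation
example_maxima = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# Assume the pattern holds until we find a day that breaks it
rises_by_one = True
for day in range(1, len(example_maxima)):
    # Compare each day with the day before it
    if example_maxima[day] - example_maxima[day - 1] != 1:
        rises_by_one = False

if rises_by_one:
    print('Each value is exactly one unit higher than the previous one')
else:
    print('The values do not rise by one unit a day')
```

The loop does the same comparison as `numpy.diff()`, just one day at a time, which makes it easier to add a message about which day broke the pattern.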


We also saw a different problem in the third dataset; the minima per day were all zero (looks like a healthy person snuck into our study).

In [11]:
data = numpy.loadtxt(fname='data/inflammation-03.csv', delimiter=',')

min_inflammation = numpy.min(data, axis=0)
print(min_inflammation)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [12]:
# Let's use sum == 0
if numpy.sum(min_inflammation) == 0:
    print('Minima add up to zero')

Minima add up to zero


We can check for both problems in a single conditional by using an `elif` branch.

### Optional: Let's test all our datasets

1. Find files
1. Create for-loop over filenames
1. Load data
1. Test data

In [13]:
import glob
import numpy

filenames = sorted(glob.glob('data/inflammation*.csv'))
for filename in filenames:
    data = numpy.loadtxt(fname=filename, delimiter=',')
    
    # Data to test
    max_inflammation = numpy.max(data, axis=0)
    min_inflammation = numpy.min(data, axis=0)
    
    if (max_inflammation[0] == 0) and (max_inflammation[20] == 20):
        print('Suspicious looking maxima in:', filename) 
    elif numpy.sum(min_inflammation) == 0:
        print('Minima add up to zero in:', filename) 
    else:
        print(filename, ' looks OK') 

Suspicious looking maxima in: data\inflammation-01.csv
Suspicious looking maxima in: data\inflammation-02.csv
Minima add up to zero in: data\inflammation-03.csv
Suspicious looking maxima in: data\inflammation-04.csv
Suspicious looking maxima in: data\inflammation-05.csv
Suspicious looking maxima in: data\inflammation-06.csv
Suspicious looking maxima in: data\inflammation-07.csv
Minima add up to zero in: data\inflammation-08.csv
Suspicious looking maxima in: data\inflammation-09.csv
Suspicious looking maxima in: data\inflammation-10.csv
Minima add up to zero in: data\inflammation-11.csv
Suspicious looking maxima in: data\inflammation-12.csv


In this way, we have asked Python to do something different depending on the condition of our data. Here we printed messages in all cases, but we could also imagine not using the `else` catch-all so that messages are only printed when something is wrong, freeing us from having to manually examine every plot for features we’ve seen before.

# 6. Creating functions


At this point, we’ve written code to 
- draw some interesting features in our inflammation data
- loop over all our data files to quickly draw these plots for each of them
- have Python make decisions based on what it sees in our data. 

But, our code is getting pretty long and complicated; what if we had thousands of datasets, and didn’t want to generate a figure for every single one?

Cutting and pasting it is going to make our code get very long and very repetitive, very quickly. We’d like a way to package our code so that it is easier to reuse, and Python provides for this by letting us define things called `functions` — a shorthand way of re-executing longer pieces of code. Let’s start by defining a function `fahr_to_celsius` that converts temperatures from Fahrenheit to Celsius:

In [14]:
# Simple example function to convert Fahrenheit to Celsius
def fahr_to_celsius(temp_F):
    temp_C = (temp_F - 32) * (5 / 9)
    return temp_C

Let’s try running our function. This command calls our function with 32 as the input and returns the converted value. In fact, calling our own function is no different from calling any other function:

In [15]:
fahr_to_celsius(32)

0.0

In [16]:
print('freezing point of water:', fahr_to_celsius(32), 'C')
print('boiling point of water:', fahr_to_celsius(212), 'C')

freezing point of water: 0.0 C
boiling point of water: 100.0 C


### Composing functions

Now that we’ve seen how to turn Fahrenheit into Celsius, we can also write the function to turn Celsius into Kelvin:

In [17]:
# Second example to convert Celsius to Kelvin
def celsius_to_kelvin(temp_C):
    return temp_C + 273.15 # you don't have to define a variable called temp_K

In [18]:
print('freezing point of water in Kelvin:', celsius_to_kelvin(0))

freezing point of water in Kelvin: 273.15


What about converting Fahrenheit to Kelvin? We could write out the formula, but we don’t need to. Instead, we can compose the two functions we have already created:

In [19]:
# Combine function to convert Fahrenheit to Kelvin
def fahr_to_kelvin(temp_F):
    temp_C = fahr_to_celsius(temp_F)
    temp_K = celsius_to_kelvin(temp_C)
    return temp_K

In [20]:
print('boiling point of water in Kelvin:', fahr_to_kelvin(212))

boiling point of water in Kelvin: 373.15


This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here — typically half a dozen to a few dozen lines — but they shouldn’t ever be much longer than that, or the next person who reads it (which can be yourself!) won’t be able to understand what’s going on. General guideline: a function should perform one task.

### Variable scope

In composing our temperature conversion functions, we created variables inside those functions: `temp_F`, `temp_C`, and `temp_K`. We refer to these as local variables because they no longer exist once the function is done executing. If we try to access their values outside of the function, we will encounter an error:

In [21]:
print('Again, temperature in Kelvin was:', temp_K)

NameError: name 'temp_K' is not defined

If you want to reuse the temperature in Kelvin after you have calculated it with fahr_to_kelvin, you can store the result of the function call in a variable:

In [22]:
temp_kelvin = fahr_to_kelvin(212.0)
print('temperature in Kelvin was:', temp_kelvin)

temperature in Kelvin was: 373.15
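
A related point, sketched here with a hypothetical function `add_offset` and made-up variable names: a variable assigned inside a function is separate from a variable with the same name outside it, so the function cannot accidentally overwrite the outer one.

```python
value = 10  # defined outside any function

def add_offset(number):
    # This 'value' is a new local variable; the outer 'value' is untouched
    value = number + 5
    return value

result = add_offset(1)
print('returned:', result)    # returned: 6
print('outer value:', value)  # outer value: 10
```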


### Optional: Defining defaults

We have passed parameters to functions in two ways: directly, as in `type(data)`, and by name, as in `numpy.loadtxt(fname='something.csv', delimiter=',')`. In fact, we can pass the filename to `loadtxt` without the `fname=`:

In [23]:
numpy.loadtxt('data/inflammation-01.csv', delimiter=',')

array([[0., 0., 1., ..., 3., 0., 0.],
       [0., 1., 2., ..., 1., 0., 1.],
       [0., 1., 1., ..., 2., 1., 1.],
       ...,
       [0., 1., 1., ..., 1., 1., 1.],
       [0., 0., 0., ..., 0., 2., 0.],
       [0., 0., 1., ..., 1., 1., 0.]])

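but we still need to say `delimiter=`. Without the name, `','` is matched to the second parameter of `loadtxt`, which is `dtype` rather than `delimiter`, so the call fails: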

In [24]:
numpy.loadtxt('data/inflammation-01.csv', ',')

SyntaxError: unexpected EOF while parsing (<unknown>, line 1)

In [25]:
help(numpy.loadtxt)

Help on function loadtxt in module numpy:

loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None, *, quotechar=None, like=None)
    Load data from a text file.
    
    Each row in the text file must have the same number of values.
    
    Parameters
    ----------
    fname : file, str, pathlib.Path, list of str, generator
        File, filename, list, or generator to read.  If the filename
        extension is ``.gz`` or ``.bz2``, the file is first decompressed. Note
        that generators must return bytes or strings. The strings
        in a list or produced by a generator are treated as lines.
    dtype : data-type, optional
        Data-type of the resulting array; default: float.  If this is a
        structured data-type, the resulting array will be 1-dimensional, and
        each row will be interpreted as an element of the array.  In this
        case, the number 

Optional arguments with defaults are handy: if we usually want a function to work one way, but occasionally need it to do something else, we can allow people to pass a parameter when they need to but provide a default to make the normal case easier. The example below shows how Python matches values to parameters:

In [26]:
def display_num(a=1, b=2, c=3):
    print('a:', a, 'b:', b, 'c:', c)

print('no parameters:')
display_num()
print('one parameter:')
display_num(55)
print('two parameters:')
display_num(55, 66)


no parameters:
a: 1 b: 2 c: 3
one parameter:
a: 55 b: 2 c: 3
two parameters:
a: 55 b: 66 c: 3


As this example shows, parameters are matched up from left to right, and any that haven’t been given a value explicitly get their default value. We can override this behavior by naming the value as we pass it in:

In [27]:
print('only setting the value of c')
display_num(c=77)

only setting the value of c
a: 1 b: 2 c: 77


In [28]:
# Adding a docstring
def display_number(a=1, b=2, c=3):
    ''' This is an example docstring
    
    
    '''
    print('a:', a, 'b:', b, 'c:', c)

In [29]:
help(display_number)

Help on function display_number in module __main__:

display_number(a=1, b=2, c=3)
    This is an example docstring



# Modular Code

1. Update `inflammation_analysis.ipynb` by introducing functions
2. Create a new script called `processing.py` and copy the functions there
3. Create a new file called `inflammation_analysis_refactored.ipynb` and import our functions to run the analysis

### Add docstrings

We should also write a docstring at the beginning of our function. This is a special type of comment that explains what the function does, what arguments it takes, and what it returns.
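
For instance, a docstring for the `fahr_to_celsius` function from section 6 might look like the sketch below. The layout (a summary line, then the parameters and the return value) is one common convention rather than a requirement, and the wording is only a suggestion.

```python
def fahr_to_celsius(temp_F):
    """Convert a temperature from Fahrenheit to Celsius.

    Parameters
    ----------
    temp_F : float
        Temperature in degrees Fahrenheit.

    Returns
    -------
    float
        Temperature in degrees Celsius.
    """
    return (temp_F - 32) * (5 / 9)

# help() shows the docstring, just as it does for numpy.loadtxt
help(fahr_to_celsius)
```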

# 7. Errors and Exceptions

Every programmer encounters errors, both those who are just beginning, and those who have been programming for years. Encountering errors and exceptions can be very frustrating at times, and can make coding feel like a hopeless endeavour. However, understanding what the different types of errors are and when you are likely to encounter them can help a lot. Once you know why you get certain types of errors, they become much easier to fix.

In [30]:
def favorite_ice_cream():
    ice_creams = [
        'chocolate', 
        'vanilla',
        'strawberry'
    ]
    print(ice_creams[3])
    
favorite_ice_cream()

IndexError: list index out of range

This particular traceback has two levels. You can determine the number of levels by looking for the number of arrows on the left hand side. In this case:

1. The first shows code from the cell above, with an arrow pointing to Line 9 (which is `favorite_ice_cream()`).

2. The second shows some code in the function `favorite_ice_cream`, with an arrow pointing to Line 7 (which is `print(ice_creams[3])`).

The last level is the actual place where the error occurred. The other level(s) show what function the program executed to get to the next level down. So, in this case, the program first performed a function call to the function `favorite_ice_cream`. Inside this function, the program encountered an error on Line 7, when it tried to run the code `print(ice_creams[3])`.

In [31]:
# SyntaxError
def some_function()
    msg = 'hello, world'
    assert len(msg) => 3
    print(msg)
     return msg

SyntaxError: invalid syntax (826306346.py, line 2)

In [32]:
# NameError
print(aa)

NameError: name 'aa' is not defined

In [33]:
# TypeError
number = 'one'
sum_values = number + 1

TypeError: can only concatenate str (not "int") to str

In [38]:
# File errors
file = open('inflammation-13.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'inflammation-13.csv'
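
Errors like this one can also be caught while the program runs. As a hedged sketch that goes slightly beyond the examples above, a `try`/`except` block lets the program respond to a missing file instead of stopping. The filename `no-such-file-for-demo.csv` and the variable `found` are made up for illustration, and the file is assumed not to exist.

```python
filename = 'no-such-file-for-demo.csv'  # made-up name, assumed not to exist

try:
    file = open(filename)
    file.close()
    found = True
except FileNotFoundError as error:
    # Python raises FileNotFoundError when the file cannot be found
    found = False
    print('could not open file:', error)

print('file found:', found)
```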

# 8. Defensive programming

The previous lessons have introduced the basic tools of programming: variables and lists, file I/O, loops, conditionals, and functions. What they haven’t done is show us how to tell whether a program is getting the right answer, and how to tell if it’s still getting the right answer as we make changes to it.

With that, I have some bad news for you: You will make mistakes! However, with that fact also comes knowledge. The first step toward getting the right answers from our programs is to assume that mistakes will happen and to guard against them. 

This is called defensive programming, and the most common way to do it is to add **assertions** to our code so that it checks itself as it runs. An assertion is simply a statement that something must be true at a certain point in a program. When Python sees one, it evaluates the assertion’s condition. If it’s true, Python does nothing, but if it’s false, Python halts the program immediately and prints the error message if one is provided. For example, the first cell below fails immediately because its condition is false, and the second halts as soon as the loop encounters a value that isn’t positive:

In [35]:
# Some example assertions
assert 0 > 1, "0 is not greater than 1"

AssertionError: 0 is not greater than 1

In [36]:
# Calculate sum of positive numbers
numbers = [1.5, 2.3, 0.7, -0.1, 4.4]
total = 0
for num in numbers:
    assert num > 0, 'Data should only contain positive values'
    total = total + num
print('total is', total)

AssertionError: Data should only contain positive values

### What should we test in our analysis?

- Are the data loaded correctly?
- Are the data within the expected range?

In [37]:
import numpy

array = numpy.array([1, 1])
empty_array = numpy.array([])

print(array)
print(empty_array)

print(array.size)
print(empty_array.size)

assert array.size > 0, 'Expected non-empty array'
assert empty_array.size == 0, 'Expected empty array'

[1 1]
[]
2
0
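
The second question, whether the data are within the expected range, can be checked in the same way. The sketch below uses a small made-up array, `example_data`, and assumes that inflammation values should never be negative. The upper limit of 20 is taken from the largest maximum seen in the first dataset and is only an illustrative threshold.

```python
import numpy

# Made-up data: 2 patients, 3 days
example_data = numpy.array([[0., 1., 2.],
                            [0., 3., 1.]])

assert example_data.size > 0, 'Expected non-empty array'
assert numpy.min(example_data) >= 0, 'Inflammation values should not be negative'
assert numpy.max(example_data) <= 20, 'Inflammation values should not exceed 20'
print('all checks passed')
```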


### Go to inflammation_analysis.ipynb for defensive programming

Write a function `detect_problems_defensive()` that adds pre- and post-conditions.
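
One possible shape for such a function is sketched below. It assumes the function receives an already-loaded 2D array and returns a message string. Both choices, along with the specific checks, are illustrative assumptions rather than a reference solution to the exercise.

```python
import numpy

def detect_problems_defensive(data):
    # Pre-conditions: check the input before doing any work
    assert data.ndim == 2, 'Expected a 2D array (patients x days)'
    assert data.shape[1] > 20, 'Expected more than 20 days of data'
    assert numpy.min(data) >= 0, 'Inflammation values should not be negative'

    max_inflammation = numpy.max(data, axis=0)
    min_inflammation = numpy.min(data, axis=0)

    if (max_inflammation[0] == 0) and (max_inflammation[20] == 20):
        message = 'Suspicious looking maxima'
    elif numpy.sum(min_inflammation) == 0:
        message = 'Minima add up to zero'
    else:
        message = 'Looks OK'

    # Post-condition: check the result before handing it back
    assert isinstance(message, str) and len(message) > 0, 'Expected a non-empty message'
    return message

# Made-up data: 3 patients, 40 days, all zeros
healthy = numpy.zeros((3, 40))
print(detect_problems_defensive(healthy))
```

Pre-conditions catch bad input early, with a message that says what was expected, and the post-condition guards against a later edit that forgets to set `message`.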