# Intermediate Python: Programming
# Class 4

So far in this course,
we've learned about different programming structures in Python to assist in automating tasks, 
and also explored ways to document and define defaults for functions.
In today's class, 
we'll complete the course by thinking about ways we can make our functions as robust and reusable as possible.

By the end of this class,
you should be able to:

- debug functions that produce errors
- understand principles of test-driven development
- apply assertions to program defensively
- write command-line programs for Python code

## Debugging 

In our previous class, 
we discussed testing and validating our code.
If you do identify a problem that either prevents the code from working,
or that prevents the code from accomplishing the task you had in mind,
you'll need to debug your code.

> If you would like additional explanations for the concepts covered in this section, more detail is available in the original lessons from [Software Carpentry](https://swcarpentry.github.io/python-novice-inflammation/11-debugging/index.html).

Debugging code associated with research analysis is particularly challenging.
We're writing code to find out an answer to a question, 
so validating that our answers are accurate is difficult.

Keeping in mind your overall goal for what the code should accomplish, 
a reasonable process for writing research software includes:

- *Testing with subset data:* Use a few example data points for testing before moving on to the entire dataset.
- *Testing with simplified data:* Use synthetic data, or subset to a simpler unit (e.g., only one chromosome instead of the entire genome).
- *Compare to known findings:* Use well-established reports from previously published literature, specifically from model systems, to confirm your software is finding equivalent results.
- *Check conservation laws:* Use summary statistics to confirm the number of samples included doesn't vary in unexpected ways. For example, if you are filtering out data, the number of data points should decrease rather than increase.
- *Visualize:* Although it's difficult to compare figures in an automated way (e.g., with a computer), we can use data visualizations to confirm our assumptions are being met. In fact, we modeled this approach in earlier classes in this course.
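As a rough illustration of the conservation check described above, the following sketch filters missing values out of a small dataset and confirms the number of data points changes as expected. The `measurements` list is made-up example data, not data from this course.

```python
# Hypothetical measurements for illustration; None marks a missing value.
measurements = [4.2, None, 5.1, 3.8, None, 6.0]

# Filter out the missing values.
filtered = [m for m in measurements if m is not None]

n_before = len(measurements)
n_after = len(filtered)
n_missing = measurements.count(None)

# Filtering should never increase the number of data points...
assert n_after <= n_before, 'Filtering increased the number of data points'
# ...and the points removed should be exactly the missing ones.
assert n_before - n_after == n_missing, 'Unexpected number of data points removed'

print('kept', n_after, 'of', n_before, 'data points')
```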

With these general steps in mind,
let's explore some specific approaches to assist in the debugging process.

- *Ensure failures are consistent:* Double check that you've executed all the code that your problem code needs to run, and that you're using the data you initially intended. It's easy to blame the code for not working, when it's actually a mistake we made when trying to run it.
- *Fail quickly and efficiently:* Minimize the time it takes to get the error to resurface, and isolate the portion of the code involved.
- *Change one thing at a time:* Make only one alteration before testing your code again, and be rational in choosing what change to try next.
- *Keep track of what you've done:* Don't make yourself repeat an experiment, and also be able to remember what happened when you last tried something.
- *Ask for help:* Whether someone in your lab group, other members of your computational community, or strangers online, you might be able to save yourself a lot of time and energy by leveraging someone else's expertise. As a bonus, it's possible that framing your problem in terms someone else could understand will help you figure it out on your own!

#### Challenge-debug

The following code computes the Body Mass Index (BMI) of patients. 
BMI is calculated as weight in kilograms divided by the square of height in metres.
The test results indicate all patients appear to have unusual and identical BMIs, 
despite having different physiques. 
What suggestions do you have to improve this code?

```python
patients = [[70, 1.8], [80, 1.9], [150, 1.7]]

def calculate_bmi(weight, height):
    return weight / (height ** 2)

for patient in patients:
    weight, height = patients[0]
    bmi = calculate_bmi(height, weight)
    print("Patient's BMI is: %f" % bmi)
```

#### Challenge-pair

Take one of the functions you've written for this course and deliberately introduce an error. Share that error with one of the other course participants and attempt to debug each other's errors.

## Assertions and defensive programming

Understanding how to effectively debug is important for our ability to write code,
but is also useful for protecting ourselves from making future mistakes.
In this section, we'll explore defensive programming as a strategy to improve our effectiveness as coders.
Defensive programming assumes that mistakes will happen,
and provides information (via code) to guard against them.

Assertions are one of the main tools we can apply in programming defensively. 
Assertions are statements assessing whether something is true at a given point in a program.
If the statement is true, the program proceeds,
but if it is false, Python halts the program and raises an `AssertionError` with the error message we specified.

Let's take a look at how assertions work with a simple example.
We'll create two test datasets,
one that includes numbers that are all positive, 
and another that includes negative numbers: 

```python
numbers_pos = [1.5, 5.2, 3.5, 4.1]
numbers_neg = [1.5, 5.2, -3.5, 4.1]
```

Next, we'll write a for loop that adds the numbers together.

The assertion we include here requires that all data should be positive.
Technically, an assertion only needs to state the criteria 
(e.g., `assert n > 0.0`), 
but we include a statement that makes the mistake understandable
without having to dig into the code.

```python
total = 0.0
for n in numbers_pos:
    assert n > 0.0, 'Data should only contain positive values'
    total += n
print('total is:', total)
```

```
total is: 14.299999999999999
```


The loop runs fine with `numbers_pos`, 
which we can treat like a positive control in an experimental setting.
If we replace it with `numbers_neg`, 
though, the assertion fails and Python reports an error:

```python
total = 0.0
for n in numbers_neg:
    assert n > 0.0, 'Data should only contain positive values'
    total += n
print('total is:', total)
```

```python
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-14-7c882966c113> in <module>()
      1 total = 0.0
      2 for n in numbers_neg:
----> 3     assert n > 0.0, 'Data should only contain positive values'
      4     total += n
      5 print('total is:', total)

AssertionError: Data should only contain positive values
```

We can then use this error information to help us figure out how to use the code correctly.
While the code above is a loop, 
keep in mind that assertions are often included in functions.

There are three types of assertions:

- *precondition:* something that must be true at the start of the function in order for it to work correctly
- *postcondition:* something that the function guarantees is true when it finishes
- *invariant:* something that is always true at a particular point inside a piece of code
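To make these three types concrete, here is a minimal sketch of a function that uses one of each. The function `mean_of_positive` is a hypothetical example written for illustration, and the particular conditions it checks are assumptions about what such a function should guarantee.

```python
def mean_of_positive(values):
    """Return the mean of a list of positive numbers (illustrative example)."""
    # Precondition: the function cannot work without at least one value.
    assert len(values) > 0, 'Need at least one value'

    total = 0.0
    for v in values:
        assert v > 0.0, 'Data should only contain positive values'
        total += v
        # Invariant: at this point in the loop, the running total is positive.
        assert total > 0.0, 'Running total should be positive'

    result = total / len(values)
    # Postcondition: the mean of positive numbers is itself positive.
    assert result > 0.0, 'Mean of positive values should be positive'
    return result


numbers_pos = [1.5, 5.2, 3.5, 4.1]
print('mean is:', mean_of_positive(numbers_pos))
```

Calling `mean_of_positive` with an empty list, or with a list containing a negative number, raises an `AssertionError` carrying the corresponding message.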

## Test-driven development

When writing a new function,
such as one named `range_overlap`,
the normal tendency is to:

- Write the function `range_overlap`.
- Call it interactively on two or three different inputs.
- If it produces the wrong answer, fix the function and re-run that test.

A better practice is to:

- Write a short function for each test.
- Write a `range_overlap` function that should pass those tests.
- If `range_overlap` produces any wrong answers, fix it and re-run the test functions.
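The test-first workflow might look something like the sketch below. The behavior assumed here for `range_overlap` (it takes a list of `(low, high)` pairs and returns the range common to all of them, or `None` if there is no overlap) is an assumption made for illustration, as are the names of the test functions.

```python
# Step 1: write a short function for each test, before the function exists.
def test_single_range():
    assert range_overlap([(0.0, 1.0)]) == (0.0, 1.0)

def test_two_overlapping_ranges():
    assert range_overlap([(0.0, 1.0), (0.5, 2.0)]) == (0.5, 1.0)

def test_no_overlap():
    assert range_overlap([(0.0, 1.0), (5.0, 6.0)]) is None

def test_touching_ranges():
    # Assumption: ranges that only touch at an endpoint do not overlap.
    assert range_overlap([(0.0, 1.0), (1.0, 2.0)]) is None


# Step 2: write a function that should pass those tests.
def range_overlap(ranges):
    """Return the (low, high) range common to all input ranges, or None."""
    assert len(ranges) > 0, 'Need at least one range'
    max_low = max(low for low, high in ranges)
    min_high = min(high for low, high in ranges)
    if max_low >= min_high:
        return None
    return (max_low, min_high)


# Step 3: run every test; fix the function and re-run if any assertion fails.
for test in [test_single_range, test_two_overlapping_ranges,
             test_no_overlap, test_touching_ranges]:
    test()
print('all tests passed')
```

Writing the tests first forces a decision about edge cases, such as touching ranges, before any implementation details get in the way.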

## Command-line programs

For this section, 
switch to the command line and download the code file:
[python-novice-inflammation-code.zip](http://swcarpentry.github.io/python-novice-inflammation/data/python-novice-inflammation-code.zip)
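
As a starting point, a command-line program can be as small as the sketch below, which reads its arguments from `sys.argv` and adds them together. This example is not taken from the downloadable lesson code; the script name `sum_numbers.py` and the function `main` are hypothetical.

```python
import sys


def main(args):
    """Add up numbers given as command-line arguments and print the total."""
    if not args:
        print('Usage: python sum_numbers.py NUMBER [NUMBER ...]')
        return None
    try:
        numbers = [float(a) for a in args]
    except ValueError:
        print('All arguments must be numbers')
        return None
    total = sum(numbers)
    print('total is:', total)
    return total


if __name__ == '__main__':
    # sys.argv[0] is the script name; the remaining items are the arguments.
    main(sys.argv[1:])
```

If saved as `sum_numbers.py`, this could be run from the command line as `python sum_numbers.py 1.5 2.5`. Keeping the work inside `main` means the same code can also be imported and tested from a notebook.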
