In [30]:
from IPython import display
import numpy

## Software Carpentry - Python part 2

12:40 - Recap of yesterday  
12:50 - Making choices (conditional statements)  
13:15 - Creating functions  
13:40 - Exercises in Breakout Room

**14:00 - Coffee break**  

14:15 - Errors and exceptions   
14:40 - Defensive programming  
15:05 - Debugging

**15:30 - Coffee break**  

15:45 - Command-line programs  
16:30 - Exercises in Breakout Room  
16:50 - Wrap up  

# Recap part 1

**Goal: Analyse inflammation data**
1. Python fundamentals on variables and lists
1. Import and analyse data with `numpy`
1. Create figure with subplots using `matplotlib.pyplot`
1. Repeat analysis in a for-loop together with `glob`

**Today**
1. Catch suspicious datasets with conditional statements
1. Create reusable function with the analysis



# 5. Making choices



# Exercise 5 - How many paths?

Consider this code:

```python
if 4 > 5:
    print('A')
elif 4 == 5:
    print('B')
elif 4 < 5:
    print('C')
```

Which of the following would be printed if you were to run this code? Why did you pick this answer?

1. A
1. B
1. C
1. B and C

In [31]:
# if-statement
num = 37
if num > 100: # notice colon
    print('greater') # notice indentation
else:
    print('not greater')
print('done')

not greater
done


In [32]:
num = 54
print('before conditional')
if num > 100:
    print(num, 'is greater than 100')
print('after conditional')

before conditional
after conditional


In [33]:
# Chain multiple if-statements
num = -3

if num > 0:
    print(num, 'is positive')
elif num == 0: 
    print(num, 'is zero')
else:
    print(num, 'is negative')

-3 is negative


In [34]:
# Difference between == and =
print(num == 4)
num = 4 
print(num == 4)

False
True


In [35]:
# Combine comparisons with 'and'
if (1 > 0) and (-1 >= 0): # best practice to use parentheses
    print('both parts are true')
else:
    print('at least one part is false')

at least one part is false


In [36]:
# Combine with 'or'
if (1 > 0) or (-1 >= 0):
    print('at least one test is true')

at least one test is true


In [37]:
# Let's inspect max values of the first dataset
data = numpy.loadtxt(fname='../data/inflammation-01.csv', delimiter=',')

max_inflammation = numpy.max(data, axis=0)
print(max_inflammation)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
 18. 19. 20. 19. 18. 17. 16. 15. 14. 13. 12. 11. 10.  9.  8.  7.  6.  5.
  4.  3.  2.  1.]


In [38]:
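# Let's use sum == 0 on the minima (computed here so the cell runs on its own)
min_inflammation = numpy.min(data, axis=0)
if numpy.sum(min_inflammation) == 0:
    print('Minima add up to zero')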

### Optional: Let's test all our datasets

1. Find files
1. Create for-loop over filenames
1. Load data
1. Test data

In [39]:
# Demonstrate how we can reuse a function to run the code below
import glob
import numpy

filenames = glob.glob('../data/inflammation*.csv')  # Store all the file names in memory

for filename in filenames:                       # Do the following code for each file name stored    
    data = numpy.loadtxt(fname=filename, delimiter=',')
    
    # Data to test
    max_inflammation = numpy.max(data, axis=0)
    min_inflammation = numpy.min(data, axis=0)
    
    if (max_inflammation[0] == 0) and (max_inflammation[20] == 20):
        print('Suspicious looking maxima in:', filename) 
    elif numpy.sum(min_inflammation) == 0:
        print('Minima add up to zero in:', filename) 
    else:
        print(filename, 'looks OK')

Suspicious looking maxima in: ../data/inflammation-02.csv
Suspicious looking maxima in: ../data/inflammation-10.csv
Minima add up to zero in: ../data/inflammation-11.csv
Suspicious looking maxima in: ../data/inflammation-06.csv
Suspicious looking maxima in: ../data/inflammation-01.csv
Suspicious looking maxima in: ../data/inflammation-05.csv
Minima add up to zero in: ../data/inflammation-03.csv
Suspicious looking maxima in: ../data/inflammation-12.csv
Suspicious looking maxima in: ../data/inflammation-07.csv
Minima add up to zero in: ../data/inflammation-08.csv
Suspicious looking maxima in: ../data/inflammation-09.csv
Suspicious looking maxima in: ../data/inflammation-04.csv


### Key points

* Use `if condition` to start a conditional statement, `elif condition` to provide additional tests, and `else` to provide a default.
* The bodies of the branches of conditional statements must be indented.
* Use `==` to test for equality.
* `X and Y` is only true if both `X` and `Y` are true.
* `X or Y` is true if either `X` or `Y`, or both, are true.
* `True` and `False` represent truth values.
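
The key points above can be condensed into one small sketch. The list of test values and the variable names `results`, `label`, `both` and `either` are made up for illustration; they are not part of the lesson data.

```python
# Classify a few numbers with an if/elif/else chain
results = []
for num in [-3, 0, 37]:
    if num > 0:          # first test
        label = 'positive'
    elif num == 0:       # additional test, note == for equality
        label = 'zero'
    else:                # default when no test above was true
        label = 'negative'
    results.append(label)

# 'and' needs both parts to be true, 'or' needs at least one
both = (1 > 0) and (-1 >= 0)
either = (1 > 0) or (-1 >= 0)

print(results)
print(both, either)
```

Only one branch of the chain runs for each number, even if a later test would also be true.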

### Questions?

In [40]:
# Turning our for loop into a function

# 6. What is a function and how to create it



In [41]:
# Demonstrate simple functions

# Explain arguments (Default and non-default parameters)

# A function that calls a function

# Examples: the same calculation written twice, first with cryptic names, then with descriptive ones
def s(p):
    a = 0
    for v in p:
        a += v
    m = a / len(p)
    d = 0
    for v in p:
        d += (v - m) * (v - m)
    return numpy.sqrt(d / (len(p) - 1))

def std_dev(sample):
    sample_sum = 0
    for value in sample:
        sample_sum += value

    sample_mean = sample_sum / len(sample)

    sum_squared_devs = 0
    for value in sample:
        sum_squared_devs += (value - sample_mean) * (value - sample_mean)

    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))

# Exercise 6 - Mixing Default and Non-Default Parameters 

Given the following code:

```python
def numbers(one, two=2, three, four=4):
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, three=3))
```

What do you expect will be printed? What is actually printed? What rule do you think Python is following?
1. `1234`
1. `one2three4`
1. `1239`
1. `SyntaxError`

In [42]:
# There will be a syntax error
def numbers(one, two=2, three, four=4):
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, three=3))

SyntaxError: non-default argument follows default argument (<ipython-input-42-23cd32300a79>, line 2)

# Solution

This example shows us Python's rule for ordering the parameters in a function definition.

The **SyntaxError** shows us the problem is in the definition of the function: 

<span style="color:red">SyntaxError: </span> non-default argument follows default argument

The defined parameters `two` and `four` are given **default values**. Because `one` and `three` are not given default values, they are required to be placed before any parameters that have default values.
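
One possible repair, sketched here as an assumption about the intended behaviour, is to move the parameters without defaults to the front of the parameter list. The call itself does not need to change, because `three` is passed by name.

```python
# Parameters without defaults come first, parameters with defaults follow
def numbers(one, three, two=2, four=4):
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, three=3))
```

With this ordering the definition is accepted and the call prints `1234`.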

### Key points

* Define a function using `def function_name(parameter)`.
* The body of a function must be indented.
* Call a function using `function_name(value)`.
* Variables defined within a function can only be seen and used within the body of the function.
* Use `help(thing)` to view help for something.
* Put docstrings in functions to provide help for that function.
* Specify default values for parameters when defining a function using `name=value` in the parameter list.
* Write for readability: descriptive names, as in `std_dev` compared with `s`, make a function easier to understand.

# Breakout Session 1

# Exercise 7 - Rescaling an array

Write a function `rescale` that takes an array as input and returns a corresponding array of values scaled to lie in the range 0.0 to 1.0. 

Hint: If `L` and `H` are the lowest and highest values in the original array, then the replacement for a value `v` should be `(v-L) / (H-L)`.

In [None]:
data = numpy.arange(10.0)
data

In [None]:
# Solution
def rescale(input_array):
    '''
    Takes a numpy array and returns a corresponding numpy array of values scaled to lie in the range of 0.0 to 1.0
    '''
    L = numpy.min(input_array)
    H = numpy.max(input_array)
    output_array = (input_array - L) / (H - L)
    return output_array

Run the commands `help(numpy.arange)` and `help(numpy.linspace)` to see how to use these functions to generate regularly-spaced values, then use those values to test your `rescale` function. Once you’ve successfully tested your function, add a docstring that explains what it does.

In [43]:
help(rescale)

Help on function rescale in module __main__:

rescale(input_array, low_val=0.0, high_val=1.0)
    rescales input array values to lie between low_val and high_val



In [None]:
rescale(numpy.arange(10.0))

### Optional
Rewrite the rescale function so that it scales data to lie between `0.0` and `1.0` by default, but will allow the caller to specify lower and upper bounds if they want. Compare your implementation to your neighbor’s: do the two functions always behave the same way?

In [None]:
# Solution
def rescale(input_array, low_val=0.0, high_val=1.0):
    """rescales input array values to lie between low_val and high_val"""
    L = numpy.min(input_array)
    H = numpy.max(input_array)
    intermed_array = (input_array - L) / (H - L)
    output_array = intermed_array * (high_val - low_val) + low_val
    return output_array

# 7. Errors and Exceptions



### Key points
* Tracebacks can look intimidating, but they give us a lot of useful information about what went wrong in our program, including where the error occurred and what type of error it was.
* An error having to do with the ‘grammar’ or syntax of the program is called a `SyntaxError`. If the issue has to do with how the code is indented, then it will be called an `IndentationError`.
* A `NameError` will occur when trying to use a variable that does not exist. 
* Containers like lists and strings will generate errors if you try to access items in them that do not exist. This type of error is called an `IndexError`.
* Trying to read a file that does not exist will give you a `FileNotFoundError`.
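
The sketch below triggers three of the error types listed above on purpose. The `try`/`except` blocks are used only so that the example keeps running after each error; the names `undefined_variable`, `letters` and the file name are hypothetical.

```python
caught = []

# NameError: the variable was never defined
try:
    print(undefined_variable)
except NameError as error:
    caught.append(type(error).__name__)

# IndexError: the list has only three items, so index 5 does not exist
letters = ['a', 'b', 'c']
try:
    print(letters[5])
except IndexError as error:
    caught.append(type(error).__name__)

# FileNotFoundError: assumes no file with this name exists in the current folder
try:
    open('this_file_does_not_exist_12345.csv')
except FileNotFoundError as error:
    caught.append(type(error).__name__)

print(caught)
```

Without the `try`/`except` blocks, each of these statements would stop the program and print a traceback whose last line names the error type.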


### Questions?

# 8. Defensive Programming


### Key points

* Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.
* Put assertions in programs to check their state as they run, and to help readers understand how those programs are supposed to work.
* Use preconditions to check that the inputs to a function are safe to use.
* Use postconditions to check that the output from a function is safe to use.
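
As a minimal sketch of these ideas, the function below adds assertions to a rescaling calculation. `rescale_checked` is a hypothetical name, and it works on a plain Python list rather than a numpy array so that the block needs no extra libraries.

```python
def rescale_checked(values):
    """Scale a list of numbers to the range 0.0 to 1.0, checking inputs and outputs."""
    # Preconditions: the input must be usable before we start
    assert len(values) > 0, 'Input must not be empty'
    low = min(values)
    high = max(values)
    assert high > low, 'Input must contain at least two different values'

    scaled = [(v - low) / (high - low) for v in values]

    # Postconditions: the result must have the promised properties
    assert len(scaled) == len(values), 'Output length differs from input length'
    assert min(scaled) == 0.0 and max(scaled) == 1.0, 'Output is not in the range 0.0 to 1.0'
    return scaled

print(rescale_checked([0, 5, 10]))
```

Calling `rescale_checked([3, 3, 3])` fails the second precondition with an `AssertionError` instead of dividing by zero further down.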

### Questions?

# Breakout Session 2

# Exercise 9 - Adding a Help Message
Create a copy of `readings_06.py` and modify it so that if no parameters are given (i.e., no action is specified and no filenames are given), it prints a message explaining how it should be used.

In [None]:
# Add a help message

# Exercise 10 - Adding a Default Action

Create a copy of `readings_06.py` and modify it so that if no action is given it displays the means of the data.

# Bonus exercise - Finding Particular Files

Using the `glob` module introduced earlier, write a simple version of `ls` that shows files in the current directory with a particular suffix. A call to this script could look like this:

```bash
$ python my_ls.py py
```

with a possible output of:

```bash
readings_01.py
readings_02.py
readings_03.py
...
```
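
One possible shape for such a script is sketched below. It is not the official solution; the function names `list_files` and `main` and the usage message are assumptions made for illustration.

```python
import glob
import sys


def list_files(suffix):
    """Return the sorted names in the current directory that end in '.' + suffix."""
    return sorted(glob.glob('*.' + suffix))


def main():
    # sys.argv[0] is the script name, sys.argv[1] should be the suffix
    if len(sys.argv) != 2:
        print('Usage: python my_ls.py suffix')
        return
    for filename in list_files(sys.argv[1]):
        print(filename)


if __name__ == '__main__':
    main()
```

Keeping the pattern matching in `list_files` separate from the argument handling in `main` makes it possible to try the function from a notebook without using the command line.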

# 9. Debugging

When your tests uncover problems or you encounter errors, the next step is to fix them. We can set up a guide on how to approach debugging your code:

1. **Know what your program is supposed to do.** The statement _"it doesn't work"_ is not very helpful; describe what you expected and what happened instead, for example by using the error traceback.
1. **Test with simplified (dummy) data.** Similarly to our inflammation analysis, first check your program with a single data set that you understand, before scaling up. 
1. **Test a simplified case.**  If our program is supposed to simulate magnetic eddies in rapidly-rotating blobs of supercooled helium, our first test should be a blob of helium that isn’t rotating, and isn’t being subjected to any external electromagnetic fields.
1. **Compare to an oracle.** An oracle is a program, device, data set, or human being against which the results of a test can be compared.
1. **Check conservation laws.** Mass, energy, and other quantities are conserved in physical systems, so they should be in programs as well. Similarly, if we are analyzing patient data, the number of records should either stay the same or decrease as we move from one analysis to the next (since we might throw away outliers or records with missing values).
1. **Visualize.** You can spot problems with the data analysis by frequently creating simple plots.
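
To illustrate "compare to an oracle", the sketch below checks a copy of the `std_dev` function from section 6 against `statistics.stdev` from the Python standard library, which also computes the sample standard deviation. The copy uses `math.sqrt` instead of `numpy.sqrt` so that the block only needs the standard library, and the sample values are made up.

```python
import math
import statistics


def std_dev(sample):
    sample_sum = 0
    for value in sample:
        sample_sum += value

    sample_mean = sample_sum / len(sample)

    sum_squared_devs = 0
    for value in sample:
        sum_squared_devs += (value - sample_mean) * (value - sample_mean)

    return math.sqrt(sum_squared_devs / (len(sample) - 1))


# Small dummy data set that is easy to reason about
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

ours = std_dev(sample)
oracle = statistics.stdev(sample)   # trusted reference result

# Compare with a tolerance, because floating-point results may differ in the last digits
assert math.isclose(ours, oracle), 'std_dev disagrees with the oracle'
print('std_dev agrees with the oracle')
```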

### Make it fail ...

**Every time**
There's nothing worse than an intermittent problem: if we have to call a function a dozen times to get a single failure, the odds are good that we’ll scroll past the failure when it actually occurs.
* Check the environment: correct folder, correct data, correct parameters, correct installed libraries, ...

**Fast**
* If it takes 20 min for a bug to pop up (for example, during a simulation), it becomes very difficult to resolve the issue effectively.
* The smaller the gap between cause and effect, the easier the connection is to find. Divide and conquer!


# 10. Command-Line Programs


### Key points
* The `sys` library connects a Python program to the system it is running on.
* The list `sys.argv` contains the command-line arguments that a program was run with.
* The pseudo-file `sys.stdin` connects to a program’s standard input.
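
The sketch below shows how `sys.argv` and `sys.stdin` typically fit together in a small program. The names `count_lines` and `main` and the script name `count_lines.py` are hypothetical. To keep the block safe to run in a notebook, the arguments and the input stream are passed in as parameters and simulated with `io.StringIO`.

```python
import io
import sys


def count_lines(stream):
    """Count the lines in an open file or file-like object."""
    count = 0
    for line in stream:
        count += 1
    return count


def main(argv, stdin):
    """argv has the same layout as sys.argv, stdin plays the role of sys.stdin."""
    filenames = argv[1:]          # argv[0] is the name of the script itself
    results = []
    if len(filenames) == 0:
        # No file names given: read from standard input instead
        results.append(('stdin', count_lines(stdin)))
    else:
        for filename in filenames:
            with open(filename) as reader:
                results.append((filename, count_lines(reader)))
    for name, count in results:
        print(name, count)
    return results


# What this Python process was actually started with
print('sys.argv is:', sys.argv)

# Simulated run with no file names and two lines of "standard input"
fake_stdin = io.StringIO('first line\nsecond line\n')
main(['count_lines.py'], fake_stdin)
```

In a real script saved as `count_lines.py`, the last two lines would be replaced by `main(sys.argv, sys.stdin)`.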

### Questions?