## Defensive programming

This should fail because the assertion requires every input to be positive, and the data contains `-0.001`.

In [1]:
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0
for num in numbers:
    assert num > 0.0, 'Data should only contain positive values'
    total += num
print('total is:', total)

AssertionError: Data should only contain positive values

Trying different kinds of assertions:
- precondition
    - something that must be true at the start of a function
- postcondition
    - something that must be true when the function finishes
- invariant
    - something that is always true at that particular point in the code
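As a small illustration of the third kind, here is a minimal sketch of an invariant checked inside a loop. The helper `running_total` is a made-up name for this example and is not part of the tutorial.

```python
def running_total(values):
    """Sum values that are expected to be non-negative (hypothetical helper)."""
    total = 0.0
    for value in values:
        # precondition on each input
        assert value >= 0.0, 'Values must be non-negative'
        previous = total
        total += value
        # invariant: at this point the running total has never decreased
        assert total >= previous, 'Running total decreased'
    # postcondition: a sum of non-negative values is itself non-negative
    assert total >= 0.0, 'Total should not be negative'
    return total

print('total is:', running_total([1.5, 2.3, 0.7]))
```

Passing a list that contains a negative number, such as `[1.0, -0.001]`, trips the first assertion before the invariant is ever reached.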

In [None]:
def normalize_rectangle(rect):
    """Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.
    Input should be of the format (x0, y0, x1, y1).
    (x0, y0) and (x1, y1) define the lower left and upper right corners
    of the rectangle, respectively."""
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates' # precondition
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates' # precondition
    assert y0 < y1, 'Invalid Y coordinates' # precondition

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = dx / dy
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = dx / dy
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid' # postcondition
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid' # postcondition

    return (0, 0, upper_x, upper_y)

Testing the preconditions

In [None]:
print(normalize_rectangle( (0.0, 1.0,2.0))) # should fail precondition on line 6

AssertionError: Rectangles must contain 4 coordinates

In [None]:
print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # should fail precondition on line 8

AssertionError: Invalid X coordinates

In [None]:
print(normalize_rectangle( (0.0, 1.0, 2.0, 1.0))) # should fail precondition on line 9

AssertionError: Invalid Y coordinates

Testing postconditions

In [None]:
print(normalize_rectangle( (0.0, 0.0, 1.0, 5.0))) # should pass postcondition because the rectangle is taller than it is wide

(0, 0, 0.2, 1.0)


In [None]:
print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0))) # should fail postcondition on line 21 because the rectangle is wider than it is tall

AssertionError: Calculated upper Y coordinate invalid

This tells us that there is something wrong with our calculations, so it's an assertion more to help the developer than the user. We can fix this error by correcting line 14 to divide dy by dx instead of dx by dy.

In [None]:
def normalize_rectangle(rect):
    """Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.
    Input should be of the format (x0, y0, x1, y1).
    (x0, y0) and (x1, y1) define the lower left and upper right corners
    of the rectangle, respectively."""
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = dy / dx
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = dx / dy
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'

    return (0, 0, upper_x, upper_y)

In [None]:
print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0))) # now that we corrected our function this test passes too

(0, 0, 1.0, 0.2)


Test-driven development of range overlap function

1. write a short function for each test
2. write a `range_overlap` function that should pass those tests
3. if `range_overlap` produces any wrong answers, fix it and rerun the test functions

In [None]:
def range_overlap(ranges): # define empty function
    pass

Now we can establish some tests. For now, these should fail because `range_overlap` doesn't do anything yet (it just returns `None`).

In [None]:
# test statements
assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0) # test 1
assert range_overlap([ (2.0, 3.0), (2.0, 4.0)]) == (2.0, 3.0) # test 2
assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0)]) == (0.0, 1.0) # test 3

AssertionError: 

All of the tests above use valid inputs and check that `range_overlap` does what we want. But what if the inputs don't overlap?

In [None]:
assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) is None # no overlap
assert range_overlap([ (0.0, 1.0), (1.0, 2.0)]) is None # touching but not overlapping

The tutorial expects these to fail as well because `range_overlap` is still empty, but they pass for me.

In [None]:
range_overlap([(0.0, 1.0), (1.0, 2.0)])

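The call shows no output because it returns `None`: a function whose body is only `pass` always returns `None`, so both comparisons with `None` above succeed. Moving on.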

Let's actually make this function.

In [None]:
def range_overlap(ranges):
    """Return common overlap among a set of [left, right] ranges."""
    max_left = 0.0
    min_right = 1.0
    for (left, right) in ranges:
        max_left = max(max_left, left)
        min_right = min(min_right, right)
    return (max_left, min_right)

And let's make a test function so we don't have to run each assertion separately

In [None]:
def test_range_overlap():
    assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0) # test 1
    assert range_overlap([ (2.0, 3.0), (2.0, 4.0)]) == (2.0, 3.0) # test 2
    assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0)]) == (0.0, 1.0) # test 3
    assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) is None # no overlap
    assert range_overlap([ (0.0, 1.0), (1.0, 2.0)]) is None # touching but not overlapping
    assert range_overlap([]) is None

In [None]:
test_range_overlap()

AssertionError: 

So we know we fail the second test. Because the first failing assertion stops the function, we don't know whether any later tests also fail, but at least we can go and solve this problem.
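One way to see every failing case at once, sketched here as an assumption rather than something the tutorial does, is to keep the inputs and expected values in a list and compare them one by one. The names `cases` and `failing_cases` are made up for this example, and `range_overlap` is repeated so the snippet runs on its own.

```python
def range_overlap(ranges):
    """Flawed version from the notes: starts from fixed values, not the data."""
    max_left = 0.0
    min_right = 1.0
    for (left, right) in ranges:
        max_left = max(max_left, left)
        min_right = min(min_right, right)
    return (max_left, min_right)

# (label, input, expected) triples mirroring the assertions in test_range_overlap
cases = [
    ('test 1', [(0.0, 1.0)], (0.0, 1.0)),
    ('test 2', [(2.0, 3.0), (2.0, 4.0)], (2.0, 3.0)),
    ('test 3', [(0.0, 1.0), (0.0, 2.0), (-1.0, 1.0)], (0.0, 1.0)),
    ('no overlap', [(0.0, 1.0), (5.0, 6.0)], None),
    ('touching', [(0.0, 1.0), (1.0, 2.0)], None),
    ('empty', [], None),
]

def failing_cases(func, cases):
    """Return the labels of every case whose result differs from the expected value."""
    return [label for label, ranges, expected in cases if func(ranges) != expected]

print(failing_cases(range_overlap, cases))
```

With the flawed function this prints `['test 2', 'no overlap', 'touching', 'empty']`, so four of the six cases fail, not just the second.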

The problem is that we initialize `max_left` and `min_right` with fixed values (0.0 and 1.0) that aren't related to the data! **Always initialize from data.**
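A possible corrected version, as a sketch: it takes its starting values from the first range and returns `None` when nothing overlaps. Treating an empty list as "no overlap" is an assumption taken from the last assertion in `test_range_overlap`.

```python
def range_overlap(ranges):
    """Return common overlap among a set of (left, right) ranges, or None."""
    if not ranges:
        return None  # assumption: no ranges means no overlap, as the last test expects
    # initialize from the first range instead of from fixed constants
    max_left, min_right = ranges[0]
    for (left, right) in ranges[1:]:
        max_left = max(max_left, left)
        min_right = min(min_right, right)
    if max_left >= min_right:
        return None  # disjoint or merely touching ranges
    return (max_left, min_right)

print(range_overlap([(2.0, 3.0), (2.0, 4.0)]))
```

With this definition, all six assertions in `test_range_overlap` should pass.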

## Debugging

Well I did type up thoughts as I worked on it, but then I forgot to save it before committing.

## Command-line programs

Again with the not saving...

Since I didn't bother with the data for the tutorial, I'm just going to put the code for this here rather than in a code chunk.

We can set up a script to run as a program. Let's say it's called `code.py`:
```
import sys
import numpy

def main():
    script = sys.argv[0]
    filename = sys.argv[1]
    data = numpy.loadtxt(filename, delimiter=',')
    for row_mean in numpy.mean(data, axis=1):
        print(row_mean)

if __name__ == '__main__':
    main()
```

Now we can run this from the terminal with `python code.py test.csv` to get the row means from the `test.csv` file.

The `if __name__ == '__main__': main()` portion of the script lets us distinguish between the file being imported (in which case `__name__` is the module name, i.e. the file name without `.py`) and the file being run as a script (in which case `__name__` is set to `'__main__'`).

### Handling multiple files

We can handle multiple input files by looping over `sys.argv[1:]`:

```
import sys
import numpy

def main():
    script = sys.argv[0]
    for filename in sys.argv[1:]:
        data = numpy.loadtxt(filename, delimiter = ',')
        for row_mean in numpy.mean(data, axis = 1):
            print(row_mean)

if __name__ == '__main__':
    main()
```



### Handling command-line flags
Now we can add flags:

```
import sys
import numpy

def main():
    script = sys.argv[0]
    action = sys.argv[1]
    filenames = sys.argv[2:]

    for filename in filenames:
        data = numpy.loadtxt(filename, delimiter = ',')

        if action == '--min':
            values = numpy.amin(data, axis = 1)
        elif action == '--mean':
            values = numpy.mean(data, axis = 1)
        elif action == '--max':
            values = numpy.amax(data, axis = 1)
        
        for val in values:
            print(val)

if __name__ == '__main__':
    main()
```

Now we can run commands like: `python code.py --max test.csv` to get the maximum values of each row of `test.csv`.

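But this setup has some issues. It assumes that exactly one flag is given and that it comes before the filenames. The `main()` function is also long and hard to read, and nothing checks that the action flag is one of the supported values: an unrecognized flag leaves `values` unassigned, so the loop over `values` fails with an `UnboundLocalError`. The version below validates the flag with an assertion and moves the per-file work into a `process()` function.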

```
import sys
import numpy

def main():
    script = sys.argv[0]
    action = sys.argv[1]
    filenames = sys.argv[2:]

    assert action in ['--min', '--mean', '--max'], \
        'Action is not one of --min, --mean, or --max: ' + action
    
    for filename in filenames:
        process(filename, action)

def process(filename, action):
    data = numpy.loadtxt(filename, delimiter = ',')

    if action == '--min':
        values = numpy.amin(data, axis = 1)
    elif action == '--mean':
        values = numpy.mean(data, axis = 1)
    elif action == '--max':
        values = numpy.amax(data, axis = 1)
    
    for val in values:
        print(val)

if __name__ == '__main__':
    main()
```
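The assertion still assumes the flag is the first argument. One alternative, which the tutorial notes above do not use, is the standard library's `argparse` module. The sketch below is only an illustration: the helper name `parse_args` and the idea of storing the chosen flag in `args.action` are assumptions made for this example.

```python
import argparse

def parse_args(argv):
    """Parse an action flag and any number of filenames (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description='Print a per-row statistic for CSV files.')
    # exactly one of the three flags must be given; each stores its own name in args.action
    group = parser.add_mutually_exclusive_group(required=True)
    for flag in ('--min', '--mean', '--max'):
        group.add_argument(flag, dest='action', action='store_const', const=flag)
    # zero or more filenames; an empty list is allowed
    parser.add_argument('filenames', nargs='*', help='CSV files to process')
    return parser.parse_args(argv)

args = parse_args(['--max', 'test.csv'])
print(args.action, args.filenames)
```

In a script this would be called as `parse_args(sys.argv[1:])`. A missing or unknown flag makes `argparse` print a usage message and exit, so the explicit assertion would no longer be needed.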


### Handling standard input
Now let's use standard input so that this can be used in a pipeline with inputs redirected to it.

For this we will use a new script `count_stdin.py`
```
import sys

count = 0
for line in sys.stdin:
    count += 1

print(count, 'lines in standard input')
```

We will use this a little differently: `python count_stdin.py < test.csv` rather than `python count_stdin.py test.csv`. The `<` redirects the file to standard input.
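Since `sys.stdin` behaves like any other open file, the counting loop can also be tried without a shell. In this sketch the function name `count_lines` and the use of `io.StringIO` as a stand-in for standard input are assumptions made for illustration.

```python
import io

def count_lines(stream):
    """Count the lines in any file-like object, such as sys.stdin."""
    count = 0
    for line in stream:
        count += 1
    return count

# io.StringIO stands in for standard input here
fake_stdin = io.StringIO('1,2,3\n4,5,6\n7,8,9\n')
print(count_lines(fake_stdin), 'lines in standard input')
```

This prints `3 lines in standard input`. In the real script the call would be `count_lines(sys.stdin)`.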

Now we can rewrite our previous program so that it loads data from `sys.stdin` if no filename is provided. We don't need to make any changes to the `process()` function, because `numpy.loadtxt` accepts an open file as well as a filename. We just need to tell the `main()` function how to handle a lack of filenames.

```
import sys
import numpy

def main():
    script = sys.argv[0]
    action = sys.argv[1]
    filenames = sys.argv[2:]
    assert action in ['--min', '--mean', '--max'], \
           'Action is not one of --min, --mean, or --max: ' + action
    if len(filenames) == 0:
        process(sys.stdin, action)
    else:
        for filename in filenames:
            process(filename, action)

def process(filename, action):
    data = numpy.loadtxt(filename, delimiter = ',')

    if action == '--min':
        values = numpy.amin(data, axis=1)
    elif action == '--mean':
        values = numpy.mean(data, axis=1)
    elif action == '--max':
        values = numpy.amax(data, axis=1)

    for val in values:
        print(val)

if __name__ == '__main__':
    main()
```