## Exceptions

Exception handling is a form of conditional execution, in the same category as `if` statements. The code in the `try` block runs first. If an error occurs while running that code, Python skips the rest of the `try` block and runs the code under the matching `except` block instead.

```python
a = 'year'
b = 2018
```

This gives an error because Python cannot add a string and an integer with `+`. Execution of the program stops at that point.

```python
a+b
```

```
TypeError: Can't convert 'int' object to str implicitly
```

The next cell tries the same addition and, if a `TypeError` occurs, converts both variables to strings and joins them instead. Execution of the program continues.

```python
try:
    combined = a+b
except TypeError:
    combined = str(a)+str(b)

print(combined)
```

```
year2018
```
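The same pattern is handy when reading data files, where a value that should be a number sometimes is not. The sketch below uses a hypothetical helper, `to_number` (not part of any library), that returns `None` instead of stopping the program when `float()` raises a `ValueError`.

```python
def to_number(text):
    """Convert text to a float, or return None if that is not possible.

    to_number is a hypothetical helper used only for illustration.
    """
    try:
        return float(text)
    except ValueError:
        # float() raises ValueError for strings such as 'n/a'
        return None

values = ['7', '8.5', 'n/a', '11']
numbers = [to_number(v) for v in values]
print(numbers)  # [7.0, 8.5, None, 11.0]
```

Catching only `ValueError`, rather than every possible exception, means that unrelated bugs still stop the program and get noticed.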


## Loops

Loops are a way of repeating commands over and over again.

A _while_ loop is an indefinite loop: it keeps repeating as long as its condition is true, so the number of repetitions is not fixed in advance. Be careful with while loops. If the condition never becomes false, the loop will try to run forever.

```python
i = 0
while i<5:
    i = i + 1
    print(i)
print('done with while loop')
```

```
1
2
3
4
5
done with while loop
```
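A while loop is the natural choice when the number of repetitions is not known in advance. This minimal sketch (with an arbitrarily chosen threshold of 1000) counts how many doublings it takes for a value to exceed the threshold; the condition is checked again before every pass through the loop.

```python
value = 1
steps = 0
# keep doubling as long as the condition is true
while value <= 1000:
    value = value * 2
    steps = steps + 1
print(steps, value)  # 10 1024
```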


A _for_ loop is a definite loop. It iterates over a list, an array or any other collection of values, one item at a time, and it stops when it reaches the end of the collection.

```python
name_list = ['Bob','Jane','Mary']
for name in name_list:
    print("Hello " + name)
print('done')
```

```
Hello Bob
Hello Jane
Hello Mary
done
```
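If you also need the position of each item, Python's built-in `enumerate()` function yields (index, value) pairs, so a for loop can unpack both at once. A short sketch using the same list:

```python
name_list = ['Bob', 'Jane', 'Mary']
greetings = []
for index, name in enumerate(name_list):
    # index counts up from 0 alongside each name
    greetings.append(str(index) + ': Hello ' + name)
for line in greetings:
    print(line)
```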


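The `np.arange()` function creates a sequence of evenly spaced numbers to loop through. With one argument it counts from 0 up to, but not including, that value; a second argument sets the start and a third sets the step size.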

```python
import numpy as np
print(np.arange(5))
```

```
[0 1 2 3 4]
```


```python
print(np.arange(2,5))
```

```
[2 3 4]
```


```python
print(np.arange(2,5,0.5))
```

```
[ 2.   2.5  3.   3.5  4.   4.5]
```
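With a non-integer step, `np.arange()` works in floating-point arithmetic, and rounding can occasionally make the number of elements differ from what you expect. The NumPy documentation suggests `np.linspace()` for such cases; it takes the start, the end point (included by default) and the number of values. This sketch reproduces the array above that way.

```python
import numpy as np

# 6 evenly spaced values from 2 to 4.5, end point included
values = np.linspace(2, 4.5, 6)
print(values)
print(np.allclose(values, np.arange(2, 5, 0.5)))  # True
```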


```python
for i in np.arange(5):
    print(i*2)
```

```
0
2
4
6
8
```
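For plain integer counting, Python's built-in `range()` accepts integer start, stop and step arguments much like `np.arange()`, so the same loop can be written without NumPy. A minimal sketch that collects the results in a list:

```python
doubled = []
for i in range(5):
    doubled.append(i * 2)  # same values as the np.arange loop above
print(doubled)  # [0, 2, 4, 6, 8]
```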


__Exercise__:

Write a for loop that prints out the cumulative sum of an array. For example, the cumulative sum of the array

```
[1,3,6,4,7]
```

would be:

```
[1,4,10,14,21]
```
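Once your loop works, you can check its output against NumPy's built-in `np.cumsum()`, which computes the same running total in a single call (this is a check, not a replacement for writing the loop):

```python
import numpy as np

arr = np.array([1, 3, 6, 4, 7])
print(np.cumsum(arr).tolist())  # [1, 4, 10, 14, 21]
```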

## Working with data files

In this tutorial, we will use data files as input. We will start with the most basic ways of importing data using core Python functions, then use NumPy and the data analysis package Pandas.

First we will use a small example data set to understand how to work with 2-D arrays.

Then, we will work with a larger data set from the 2013 West Coast Ocean Acidification cruise on the R/V Pt. Sur.

```python
import numpy as np
from scipy import stats
```

### MgO in tourmaline data

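The lowest-level way to use files as data input in Python is through file handles. The `open()` function does not read the file's contents by itself; it returns a file object (a "file handle") that refers to the open file and keeps track of the current position within it. Calling a method such as `.read()` on the handle loads the contents into memory (that is, the computer's short-term memory, where data can be accessed quickly).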

The first file that we will work with is data from Table 10.1 of McKillup and Dyar, Geostatistics Explained, Cambridge University Press, 2010.

Values are the weight percent of MgO present in tourmalines from three locations in Maine.
Each CSV file contains the same data in a different format.

```python
filename = 'data/week02_MgO_data_example/MgO_Maine.csv'
fhand = open(filename)
fhand.read()
```

```
'Mount Mica,Sebago Batholith,Black Mountain\n7,         4,               1\n8,         5,               2\n10,        7,               4\n11,        8,               5\n'
```

Here `\n` represents a newline character. Each line in a text file normally ends with one, but text editors do not display the character itself; they show it as a line break.

Once the file handle has been read to the end of the file, reading it again returns an empty string, because the handle's current position is already at the end.

```python
fhand.read()
```

```
''
```
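Because the handle remembers its position, there are two standard ways around this. The sketch below writes the first few lines of the sample data to a temporary file (a stand-in for `MgO_Maine.csv`, so it runs anywhere), then uses `fhand.seek(0)` to rewind the handle to the start, inside a `with` block that closes the file automatically when the block ends.

```python
import os
import tempfile

# Temporary stand-in for data/week02_MgO_data_example/MgO_Maine.csv
sample = ('Mount Mica,Sebago Batholith,Black Mountain\n'
          '7,         4,               1\n'
          '8,         5,               2\n')
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

with open(path) as fhand:
    first = fhand.read()    # reads to the end of the file
    second = fhand.read()   # already at the end: returns ''
    fhand.seek(0)           # move the position back to the start
    third = fhand.read()    # reads the whole file again
print(repr(second), first == third)  # '' True
print(fhand.closed)  # True: the with block closed the file

os.remove(path)  # clean up the temporary file
```

The original cells never close `fhand`; a `with` block is the usual way to make sure a file is closed even if an error occurs while reading it.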

To read the file again, open it again (or rewind the existing handle with `fhand.seek(0)`). The contents of a file can also be read line by line, which is useful when each line needs its own processing step.

```python
fhand = open(filename)
for line in fhand:
    print(line)
    print('next line')
```

```
Mount Mica,Sebago Batholith,Black Mountain

next line
7,         4,               1

next line
8,         5,               2

next line
10,        7,               4

next line
11,        8,               5

next line
```
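The blank lines in the output appear because each `line` still ends with its own `\n`, and `print()` adds another one. A typical per-line processing step strips that newline and splits the line on commas. This sketch uses `io.StringIO` as an in-memory stand-in for the file handle (it can be looped over line by line in the same way) and converts each data row to floats.

```python
import io

# In-memory stand-in for open(filename); same contents as MgO_Maine.csv
fhand = io.StringIO('Mount Mica,Sebago Batholith,Black Mountain\n'
                    '7,         4,               1\n'
                    '8,         5,               2\n'
                    '10,        7,               4\n'
                    '11,        8,               5\n')

header = fhand.readline().rstrip('\n').split(',')  # first line: column names
rows = []
for line in fhand:  # continues after the header line
    fields = line.rstrip('\n').split(',')
    rows.append([float(f) for f in fields])  # float() ignores surrounding spaces
print(header)
print(rows)
```

With a real file, the `io.StringIO(...)` call would be replaced by `open(filename)`.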


For numeric data, the NumPy package provides an easy way to load data from a text file directly into an array. With `np.genfromtxt` you can specify the number of header lines to skip, the delimiter that separates data values, and many other options describing the file format.

```python
data = np.genfromtxt(filename,skip_header=1,delimiter=',')
print(data)
```

```
[[  7.   4.   1.]
 [  8.   5.   2.]
 [ 10.   7.   4.]
 [ 11.   8.   5.]]
```


```python
type(data)
```

```
numpy.ndarray
```

This data set has two dimensions (rows and columns). Some data sets have more than two dimensions: an oceanographic data set might have separate dimensions for time, depth, latitude and longitude. Such a data set could be visualized as a series of cubes, one for each time step.
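As a sketch, a hypothetical gridded data set with 12 time steps, 5 depths, 3 latitudes and 4 longitudes (sizes chosen arbitrarily and filled with zeros) can be held in a single four-dimensional array:

```python
import numpy as np

# axes: (time, depth, latitude, longitude); sizes are arbitrary placeholders
grid = np.zeros((12, 5, 3, 4))
print(np.ndim(grid))   # 4
print(np.shape(grid))  # (12, 5, 3, 4)
print(grid[0].shape)   # (5, 3, 4), one depth-latitude-longitude cube per time step
```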

```python
np.ndim(data) # number of dimensions
```

```
2
```

For this two-dimensional data set, the "shape" is specified by the number of rows and columns.

```python
np.shape(data) # (rows, columns)
```

```
(4, 3)
```

To obtain just the number of rows or columns, use indexing.

```python
np.shape(data)[1] # just the columns
```

```
3
```
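NumPy arrays also carry this information as attributes, so `data.ndim` and `data.shape` give the same results as `np.ndim(data)` and `np.shape(data)`, and `data.size` gives the total number of values. In this sketch the array is typed in by hand to match the values loaded with `np.genfromtxt` above.

```python
import numpy as np

# same values as the MgO data loaded from the CSV file
data = np.array([[7., 4., 1.],
                 [8., 5., 2.],
                 [10., 7., 4.],
                 [11., 8., 5.]])
print(data.ndim)   # 2
print(data.shape)  # (4, 3)
print(data.size)   # 12
nrows, ncols = data.shape  # unpack rows and columns in one step
```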

Individual values, rows and columns can be accessed by indexing and "slicing" the array. The location of each value is given by two indices: the first is the row index, and the second is the column index. Indices start at 0, so `data[1,2]` is the value in the second row and third column.

```python
data[1,2]
```

```
2.0
```

A colon `:` selects all values along a dimension. To access all rows in column 0 (the first column):

```python
data[:,0]
```

```
array([  7.,   8.,  10.,  11.])
```

```python
data
```

```
array([[  7.,   4.,   1.],
       [  8.,   5.,   2.],
       [ 10.,   7.,   4.],
       [ 11.,   8.,   5.]])
```

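Indices can also be counted from the end of the rows and columns using negative numbers: `-1` is the last row or column, `-2` the second to last, and so on. So `data[-3,-2]` is the value in the third row from the end and the second column from the end.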

```python
data[-3,-2]
```

```
5.0
```
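Slices can also select a range with `start:stop`, where `stop` itself is excluded, and the row and column slices can be combined. This sketch, again on the MgO values typed in by hand, takes the first two rows of the last two columns and then the last row.

```python
import numpy as np

data = np.array([[7., 4., 1.],
                 [8., 5., 2.],
                 [10., 7., 4.],
                 [11., 8., 5.]])
block = data[0:2, 1:]     # rows 0 and 1, columns 1 to the end
print(block.tolist())     # [[4.0, 1.0], [5.0, 2.0]]
last_row = data[-1, :]    # negative index: the last row
print(last_row.tolist())  # [11.0, 8.0, 5.0]
```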

Many NumPy functions can also be applied along just the rows or just the columns. Take the `np.mean` function. The grand mean of all values is:

```python
np.mean(data) # grand mean
```

```
6.0
```

Taking the mean along `axis=0` gives the column means. The mean is taken over all the rows (dimension 0) within each column, so the result has one value for each column.

```python
np.mean(data,axis=0)
```

```
array([ 9.,  6.,  3.])
```

To obtain the mean of each row, use the `axis = 1` option.

```python
np.mean(data,axis=1)
```

```
array([ 4.,  5.,  7.,  8.])
```
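The `axis` option works the same way for many other NumPy reductions. In this sketch, `np.sum` and `np.max` give per-column totals and per-row maxima, and `np.std` with `ddof=1` gives the sample standard deviation of each column (dividing by n - 1 rather than n).

```python
import numpy as np

data = np.array([[7., 4., 1.],
                 [8., 5., 2.],
                 [10., 7., 4.],
                 [11., 8., 5.]])
print(np.sum(data, axis=0).tolist())  # column totals: [36.0, 24.0, 12.0]
print(np.max(data, axis=1).tolist())  # row maxima: [7.0, 8.0, 10.0, 11.0]
col_sd = np.std(data, axis=0, ddof=1)  # sample SD of each column
print(col_sd)
```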

#### Exercise: ANOVA (the hard way)

Calculate the sum of squares within groups and the sum of squares between groups, treating each column (location) as a group. Use these values to calculate F in an analysis of variance.
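For reference, the standard one-way ANOVA quantities are sketched below, assuming $k$ groups (here, the columns), where group $j$ has $n_j$ values $x_{ij}$ and mean $\bar{x}_j$, $\bar{x}$ is the grand mean, and $N$ is the total number of values.

$$
SS_{\text{between}} = \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{x}\right)^2,
\qquad
SS_{\text{within}} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2
$$

$$
F = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
$$

$F$ is then compared with the critical value of the F distribution with $k - 1$ and $N - k$ degrees of freedom.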

__Hint:__ the `np.tile` function may be useful.

```python
arr = [1,2,3]
np.tile(arr,(3,1)) # repeat arr 3 times down the rows, once across the columns
```

```
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
```
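One way the hint can be used: `np.tile` repeats the row of column (group) means once per observation, giving an array with the same shape as `data` that can be subtracted element by element. NumPy broadcasting produces the same result without explicit tiling. A sketch on the MgO values typed in by hand:

```python
import numpy as np

data = np.array([[7., 4., 1.],
                 [8., 5., 2.],
                 [10., 7., 4.],
                 [11., 8., 5.]])
col_means = np.mean(data, axis=0)               # one mean per column (group)
tiled = np.tile(col_means, (data.shape[0], 1))  # shape (4, 3), every row = col_means
deviations = data - tiled                       # each value minus its group mean
same = data - col_means                         # broadcasting: (4, 3) minus (3,)
print(tiled.shape)                              # (4, 3)
print(np.array_equal(deviations, same))         # True
```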

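The `stats` module can be used to find critical values of the F distribution (compare with tables). `stats.f.ppf(q, dfn, dfd)` returns the value below which a fraction `q` of the distribution lies, for `dfn` numerator and `dfd` denominator degrees of freedom.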

```python
stats.f.ppf(0.95,4,3)
```

```
9.1171822532464173
```
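To check a hand calculation, SciPy's `stats.f_oneway` runs a one-way ANOVA directly when given one sequence per group; in this sketch each column of the MgO data is one group. The critical value above uses 4 and 3 degrees of freedom as an illustration; for three groups of four values the degrees of freedom are $k - 1 = 2$ and $N - k = 9$, which is what the sketch uses.

```python
import numpy as np
from scipy import stats

data = np.array([[7., 4., 1.],
                 [8., 5., 2.],
                 [10., 7., 4.],
                 [11., 8., 5.]])
# pass each column (location) as a separate group
result = stats.f_oneway(data[:, 0], data[:, 1], data[:, 2])
print(result.statistic, result.pvalue)

# critical value at the 5% level for (2, 9) degrees of freedom
f_crit = stats.f.ppf(0.95, 2, 9)
print(result.statistic > f_crit)
```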