# Arrays in Python using the Numpy Module
The power of Python lies in the community-built modules that sit on top of the core Python programming language. These modules package reusable code that extends the usefulness of OOP tools by providing advanced functions and methods for the objects in a program.

Numpy is a key module used in meteorology because it handles array manipulations well (and quickly). This notebook will introduce you to the numpy module and many of its useful functions, and will continue to develop your Python language skills.

The first thing to do is copy over a text file that we will use throughout today's lecture. It is located in <span style="font-family:Courier">**/home/kgoebber/met330/python**</span> and is called <span style="font-family:Courier">**nao.txt**</span>.

## Importing modules
In order to use a module within your program, you first need to tell the computer to bring it in! This is done with an import statement. It is common to give the module a handle so that similarly named functions in different modules (e.g., a maximum function) don't overlap!

An example of an import statement for numpy is below. The common practice is to call the handle for the numpy module 'np'.

In [1]:
import numpy as np

If you have run the above cell, it appears that nothing has happened, but actually a lot has. You have just imported all of the functions that are contained within the numpy module. They are now sitting in the background, ready for you to get to work on some data!
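
As a small illustration of why the handle matters, the sketch below calls Python's built-in `max` and numpy's `np.max` side by side. The list `values` is a made-up example and is not part of the lecture data.

```python
import numpy as np

# Made-up example values (not from nao.txt)
values = [3, 1, 2]

# Python's built-in max and numpy's max are separate functions;
# the np. prefix makes it explicit which one is being called.
builtin_result = max(values)
numpy_result = np.max(values)

print(builtin_result, numpy_result)
print(max is np.max)  # the two names refer to different functions
```

Because the numpy version is always reached through `np.`, the two functions never collide even though they share a name.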

## Reading in data to Python
It is difficult to easily talk about reading data from a file into a Python program simply because there are so many ways to do so! In this example we'll use the numpy module to help us read in an ASCII (plain text) file that we can then use and manipulate to calculate or graph stuff to answer scientific questions!

In [2]:
nao_data = np.loadtxt('nao.txt', skiprows=1)

In the command above you are using a function (loadtxt) from the numpy module (np) to read in a file (nao.txt), skipping the first row (skiprows=1), and storing the data in a 2D array (nao_data).

How can I find out more about this function? Well, there are many ways, but two of the easiest would be to 1) google <span style="font-family:Courier"> **numpy loadtxt** </span> or 2) in a new cell below type <span style="font-family:Courier"> **np.loadtxt?** </span>. The second way will bring up the manual pages from the numpy module to a dialogue box at the bottom of your web browser.
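
If you do not have the file handy, the following sketch shows the same call on a tiny in-memory stand-in. It assumes a layout of one header line followed by a year and monthly values; the header text and the use of `io.StringIO` are illustrative choices, not the actual contents of nao.txt.

```python
import io
import numpy as np

# Hypothetical stand-in for a file such as nao.txt: one header row,
# then a year followed by monthly values (only three months here).
text = io.StringIO(
    "Year Jan Feb Mar\n"
    "1950 0.92 0.40 -0.36\n"
    "1951 0.08 0.70 -1.02\n"
)

# skiprows=1 skips the header line, which cannot be parsed as numbers
sample = np.loadtxt(text, skiprows=1)

print(sample.shape)   # rows and columns that were read
print(sample.dtype)   # loadtxt returns floats by default
```

Note that the year column is also read as a float, which is why the years print with a decimal point.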

In [3]:
# So what does the data look like?
print(nao_data)

[[  1.95000000e+03   9.20000000e-01   4.00000000e-01  -3.60000000e-01
    7.30000000e-01  -5.90000000e-01  -6.00000000e-02  -1.26000000e+00
   -5.00000000e-02   2.50000000e-01   8.50000000e-01  -1.26000000e+00
   -1.02000000e+00]
 [  1.95100000e+03   8.00000000e-02   7.00000000e-01  -1.02000000e+00
   -2.20000000e-01  -5.90000000e-01  -1.64000000e+00   1.37000000e+00
   -2.20000000e-01  -1.36000000e+00   1.87000000e+00  -3.90000000e-01
    1.32000000e+00]
 [  1.95200000e+03   9.30000000e-01  -8.30000000e-01  -1.49000000e+00
    1.01000000e+00  -1.12000000e+00  -4.00000000e-01  -9.00000000e-02
   -2.80000000e-01  -5.40000000e-01  -7.30000000e-01  -1.13000000e+00
   -4.30000000e-01]
 [  1.95300000e+03   3.30000000e-01  -4.90000000e-01  -4.00000000e-02
   -1.67000000e+00  -6.60000000e-01   1.09000000e+00   4.00000000e-01
   -7.10000000e-01  -3.50000000e-01   1.32000000e+00   1.04000000e+00
   -4.70000000e-01]
 [  1.95400000e+03   3.70000000e-01   7.40000000e-01  -8.30000000e-01
    1.3400

In [4]:
# More elegant printing
for row in nao_data:
    print(('%4d '+12*'%5.1f') % tuple(row))

# The first part of the print statement is a 
#   format statement (more on that later).
# Then we have to convert each row (a 1D array) to a tuple because the
#   % format operator expects a tuple holding one value per format code.

1950   0.9  0.4 -0.4  0.7 -0.6 -0.1 -1.3 -0.1  0.2  0.8 -1.3 -1.0
1951   0.1  0.7 -1.0 -0.2 -0.6 -1.6  1.4 -0.2 -1.4  1.9 -0.4  1.3
1952   0.9 -0.8 -1.5  1.0 -1.1 -0.4 -0.1 -0.3 -0.5 -0.7 -1.1 -0.4
1953   0.3 -0.5 -0.0 -1.7 -0.7  1.1  0.4 -0.7 -0.3  1.3  1.0 -0.5
1954   0.4  0.7 -0.8  1.3 -0.1 -0.2 -0.6 -1.9 -0.4  0.6  0.4  0.7
1955  -1.8 -1.1 -0.5 -0.4 -0.3 -1.1  1.8  1.1  0.3 -1.5 -1.3  0.2
1956  -0.2 -1.1 -0.1 -1.1  2.2  0.1 -0.8 -1.4  0.2  0.9  0.5  0.1
1957   1.1  0.1 -1.3  0.5 -0.8 -0.7 -1.2 -0.6 -1.7  1.3  0.7  0.1
1958  -0.5 -1.1 -2.0  0.4 -0.2 -1.4 -1.7 -1.6 -0.1  0.2  1.6 -0.7
1959  -0.9  0.7 -0.1  0.4  0.4  0.4  0.7  0.1  0.9  0.9  0.4  0.4
1960  -1.3 -1.9 -0.5  1.4  0.5 -0.2  0.3 -1.4  0.4 -1.7 -0.5  0.1
1961   0.4  0.5  0.6 -1.6 -0.4  0.9 -0.4  0.9  1.2  0.5 -0.6 -1.5
1962   0.6  0.6 -2.5  1.0 -0.1  0.2 -2.5  0.1 -0.4  0.4 -0.2 -1.3
1963  -2.1 -1.0 -0.4 -1.4  2.2 -0.4 -0.8 -0.6  1.8  0.9 -1.3 -1.9
1964  -0.9 -1.4 -1.2  0.4  0.5  1.3  1.9 -1.8  0.2  0.7 -0.0 -0.1
1965  -0.1
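
To see how the format string in the printing cell is put together, here is a minimal sketch that builds it step by step and applies it to the 1950 values shown above, typed in by hand as a plain tuple.

```python
# Build the same format string used in the printing cell:
# one 4-digit integer, a space, then twelve 5-character floats with 1 decimal
fmt = '%4d ' + 12 * '%5.1f'

# The 1950 row from the output above, entered by hand as a tuple
row = (1950, 0.92, 0.4, -0.36, 0.73, -0.59, -0.06,
       -1.26, -0.05, 0.25, 0.85, -1.26, -1.02)

# The % operator fills each format code with the matching tuple element
line = fmt % row

print(fmt.count('%'))   # number of format codes in the string
print(line)
```

Multiplying a string by an integer repeats it, which is how `12*'%5.1f'` produces one format code per month.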

## Working with Data with Numpy's Help

While seeing the data printed out is great, it's hard to determine how the data is actually stored. Luckily for us, numpy has some great functions that we can use to investigate the arrays. One of the most helpful is the function called <span style="font-family:Courier">**shape**</span>. Execute the following cell to see what the command yields.

In [5]:
nao_shape = np.shape(nao_data)
print(nao_shape)
# or
print(nao_data.shape) # This is the object-oriented style: shape is an attribute of the array

(64, 13)
(64, 13)


The <span style="font-family:Courier">**shape**</span> function gives us a tuple that describes the shape of the array, (rows, cols), that is, the number of index values along each dimension of the array. In this case it tells us that there are 64 rows and 13 columns. So, similar to Fortran, numpy reads the data into an array where the first index is the row index and the second is the column index.
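
Since the shape is an ordinary tuple, it can be unpacked into separate variables. The sketch below uses a small hand-made array (not the full NAO data) and also shows the related `ndim` and `size` attributes.

```python
import numpy as np

# Small hand-made array: 2 rows (years) by 3 columns
small = np.array([[1950.0, 0.92, 0.40],
                  [1951.0, 0.08, 0.70]])

n_rows, n_cols = small.shape   # shape is a tuple, so it can be unpacked
print(n_rows, n_cols)
print(small.ndim)              # number of dimensions
print(small.size)              # total number of elements (rows * cols)
```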

But we don't have to print out the whole array at one time. We can slice and dice the data any way that we desire using an array call and telling the program what element(s) we want out of our array.

### NOTE: Python uses zero-based arrays
Index values start counting at zero, so the ten index values of a ten-element array are [0,1,2,3,4,5,6,7,8,9] and **_not_** [1,2,3,4,5,6,7,8,9,10]

So.... <br> <span style="font-family:Courier">
1st element -> 0 <br>
2nd element -> 1 <br>
3rd element -> 2 <br>
. <br>
. <br>
. <br>
nth element -> n-1 <br> </span>
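
A quick way to convince yourself of this mapping is to try it on a short array. The array `years` below is a made-up four-element example.

```python
import numpy as np

# Made-up four-element array
years = np.array([1950, 1951, 1952, 1953])

first = years[0]               # 1st element -> index 0
third = years[2]               # 3rd element -> index 2
last = years[len(years) - 1]   # nth element -> index n-1
also_last = years[-1]          # negative indices count back from the end

print(first, third, last, also_last)
```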

In [6]:
# So let's get just all of the years, which are in the first column
# We specify all rows with the colon (:) and the particular column with 
#  the appropriate element number (0)
# nao_data[rows,cols]
print(nao_data[:,0])

[ 1950.  1951.  1952.  1953.  1954.  1955.  1956.  1957.  1958.  1959.
  1960.  1961.  1962.  1963.  1964.  1965.  1966.  1967.  1968.  1969.
  1970.  1971.  1972.  1973.  1974.  1975.  1976.  1977.  1978.  1979.
  1980.  1981.  1982.  1983.  1984.  1985.  1986.  1987.  1988.  1989.
  1990.  1991.  1992.  1993.  1994.  1995.  1996.  1997.  1998.  1999.
  2000.  2001.  2002.  2003.  2004.  2005.  2006.  2007.  2008.  2009.
  2010.  2011.  2012.  2013.]


In [7]:
# What if we wanted all of the nao values for the first year?
# Put in the proper array call below to get just the 1950 data.
# nao_data[rows,cols]
print(nao_data[0,1:])

[ 0.92  0.4  -0.36  0.73 -0.59 -0.06 -1.26 -0.05  0.25  0.85 -1.26 -1.02]


## Array range operation
We can also specify a range of values to use, instead of all or just a single element. However, we must note the Python behavior for this action (which we already saw in a previous lecture with the range function). To summarize:

<span style="font-family:Courier">
[0:2] is 0, 1    ->  mathematically we would write the set of numbers as [0,2) <br>
[0:3] is 0, 1, 2 ->  mathematically we would write the set of numbers as [0,3) <br>
[1:4] is 1, 2, 3 ->  mathematically we would write the set of numbers as [1,4) <br>
</span>

This can be used in one or both dimensions (or in however many dimensions the array contains).

So let's construct an array call that looks for the months June, July, and August for 1960 - 1980.

In [8]:
# This example extracts 1950 - 1959 for all months
print(nao_data[0:10,1:])
# Note: Not putting a number before or after the colon will result 
#       in using the first or last values in the dataset, respectively.

[[ 0.92  0.4  -0.36  0.73 -0.59 -0.06 -1.26 -0.05  0.25  0.85 -1.26 -1.02]
 [ 0.08  0.7  -1.02 -0.22 -0.59 -1.64  1.37 -0.22 -1.36  1.87 -0.39  1.32]
 [ 0.93 -0.83 -1.49  1.01 -1.12 -0.4  -0.09 -0.28 -0.54 -0.73 -1.13 -0.43]
 [ 0.33 -0.49 -0.04 -1.67 -0.66  1.09  0.4  -0.71 -0.35  1.32  1.04 -0.47]
 [ 0.37  0.74 -0.83  1.34 -0.09 -0.25 -0.6  -1.9  -0.44  0.6   0.4   0.69]
 [-1.84 -1.12 -0.53 -0.42 -0.34 -1.1   1.76  1.07  0.32 -1.47 -1.29  0.17]
 [-0.22 -1.12 -0.05 -1.06  2.21  0.1  -0.75 -1.37  0.24  0.88  0.51  0.1 ]
 [ 1.05  0.11 -1.26  0.49 -0.79 -0.72 -1.19 -0.55 -1.66  1.32  0.73  0.12]
 [-0.54 -1.06 -1.96  0.37 -0.24 -1.38 -1.73 -1.56 -0.07  0.16  1.64 -0.7 ]
 [-0.87  0.68 -0.15  0.36  0.39  0.4   0.74  0.06  0.88  0.89  0.41  0.44]]
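
One possible array call for the June, July, and August 1960 - 1980 selection mentioned earlier is sketched below. It assumes the same layout as nao_data (row 0 is 1950, column 0 is the year, columns 1-12 are January through December) and uses a placeholder array filled with zeros rather than the real NAO values.

```python
import numpy as np

# Placeholder array with the same layout as nao_data: 64 rows (1950-2013),
# column 0 holds the year, columns 1-12 hold January-December.
# The monthly values are zeros here, not real NAO data.
years = np.arange(1950, 2014)
demo = np.zeros((years.size, 13))
demo[:, 0] = years

# Rows: 1960 is index 10 and 1980 is index 30, so the slice is 10:31
# Columns: June, July, August are indices 6, 7, 8, so the slice is 6:9
jja = demo[10:31, 6:9]

print(jja.shape)
print(demo[10, 0], demo[30, 0])  # first and last year selected
```

Remember that the end value of each range is excluded, which is why the slices stop at 31 and 9 rather than 30 and 8.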


## Common Functions in numpy
Do you ever compute averages? Sums? Max? Min?

Numpy can help!

Mathematical Functions: http://docs.scipy.org/doc/numpy/reference/routines.math.html <br>

Statistical Functions: http://docs.scipy.org/doc/numpy/reference/routines.statistics.html <br>

In [9]:
# Let's compute the average value of NAO for the month of January.
# January is index value 1 (column 2)
jan_data = nao_data[:,1]
jan_mean = np.mean(jan_data)
print(jan_mean)

0.0625
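
Functions such as `np.mean` and `np.sum` also accept an `axis` argument, which lets you compute a value for every column (or row) in one call instead of slicing each one out. The sketch below uses a small made-up array so the arithmetic is easy to check by hand.

```python
import numpy as np

# Made-up 3-year, 2-month array (values chosen for easy arithmetic)
demo = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])

monthly_mean = np.mean(demo, axis=0)  # average down the rows: one value per column
yearly_sum = np.sum(demo, axis=1)     # sum across the columns: one value per row

print(monthly_mean)
print(yearly_sum)
```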


In [10]:
# Now let's compute the max NAO value in the dataset
# NOTE: we want to exclude the year, so we want to ignore the 
#       first column and begin with index value 1 (for column 2)
nao_values = nao_data[:,1:13]
max_nao_value = np.max(nao_values)
print(max_nao_value)

3.04


In [11]:
# So now we have a value, but when did this occur?
# Compare against the stored maximum rather than a typed-in number
row,col = np.where(nao_data == max_nao_value)
print(row[0])
print(col[0])

28
11


In [12]:
# So an NAO value of 3.04 occurred in Nov. 1978 as row 28 == 1978 
#   and column 11 refers to Nov.
print(nao_data[row[0],0])
print(nao_data[row[0],col[0]])

1978.0
3.04
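
An alternative to comparing values with `np.where` is to ask numpy directly for the position of the maximum. The sketch below, on a small made-up array, combines `np.argmax` (which returns a position in the flattened array) with `np.unravel_index` (which converts that position back to a row and column).

```python
import numpy as np

# Made-up small array of values (no year column)
demo = np.array([[0.5, -1.2, 0.3],
                 [1.1, 3.04, -0.7]])

flat_index = np.argmax(demo)                          # position in the flattened array
row, col = np.unravel_index(flat_index, demo.shape)   # convert back to (row, col)

print(row, col)
print(demo[row, col])
```

If the maximum occurs more than once, `np.argmax` reports only the first occurrence, whereas `np.where` returns all of them.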


In [13]:
# Find the minimum that occurs between 1990 - 1999 (rows 40 through 49) and where it occurs
min_nao_value = np.min(nao_values[40:50,:])
print(min_nao_value)
print(np.where(nao_data == min_nao_value))

-3.18
(array([43]), array([7]))


### Find a minimum (or maximum) with missing values
This is not a trivial problem. If your missing value is -999 or +999, the max and min functions will find those values, even though that is not what you intended. What needs to happen is to remove those values and leave a blank (or masked value) in their place.

The numpy function for a masked array is
```python
np.ma.masked_array
```
See the cell below to see how this works for the NAO data, where we need to replace the -9.99 values with blanks. Note: The blanks appear as dashes when printed out.

In [14]:
# Find the minimum to occur in the NAO dataset that is not a -9.99
# This is not as trivial as the previous problem, you'll need a 
#  masked array to accomplish this task

# What is a masked array?
print(nao_values[63,:])
ma_nao = np.ma.masked_array(nao_values, mask=(nao_values==-9.99))
print(ma_nao[63,:])

[ 0.35 -0.45 -1.61  0.69  0.57  0.52  0.67  0.97  0.24 -9.99 -9.99 -9.99]
[0.35 -0.45 -1.61 0.69 0.57 0.52 0.67 0.97 0.24 -- -- --]


In [15]:
print(np.min(nao_values[63,:]))

-9.99


In [16]:
print(np.min(ma_nao[63,:]))

-1.61
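
Once the missing values are masked, other statistics ignore them as well. The sketch below re-creates the last row printed above by hand and uses `np.ma.masked_values`, which is an alternative to the `==` comparison that matches the missing-value flag within a floating-point tolerance.

```python
import numpy as np

# The last row of the NAO values as printed above, entered by hand,
# with -9.99 marking the missing months
row = np.array([0.35, -0.45, -1.61, 0.69, 0.57, 0.52,
                0.67, 0.97, 0.24, -9.99, -9.99, -9.99])

# masked_values compares with a floating-point tolerance instead of ==
masked_row = np.ma.masked_values(row, -9.99)

print(masked_row.count())   # number of unmasked (valid) values
print(masked_row.min())     # minimum of the valid values only
print(masked_row.mean())    # mean of the valid values only
```

Without the mask, the three -9.99 entries would drag the mean far below any real NAO value, just as they did for the minimum.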
