# Intro to NumPy

## Objectives

- Make new NumPy arrays
- Perform arithmetic with arrays
- Use common array methods and NumPy functions
- Filter arrays using boolean arrays and "fancy" indexing

In [1]:
# add the "as np" so we don't have to type "numpy" each time
import numpy as np

## Making Arrays

At the beginning of the Python lesson we loaded data into NumPy from a text file.
Let's look at a few other ways to make arrays.
If you've got data in Python lists, those can be converted to arrays:

In [3]:
# list like yesterday
odds = [1, 3, 5, 7]
odds

[1, 3, 5, 7]

In [5]:
# pass the list to np.array
odds_arr = np.array(odds)
odds_arr

array([1, 3, 5, 7])

Lists and arrays have some similarities:

In [9]:
print('index', odds[1], odds_arr[1])
print('slice', odds[1:3], odds_arr[1:3])
print('length', len(odds), len(odds_arr))

index 3 3
slice [3, 5] [3 5]
length 4 4


In [10]:
for x in odds:
    print(x)
for x in odds_arr:
    print(x)

1
3
5
7
1
3
5
7


But some things are different:

### Question

How would you add 1 to every number in the `odds` list and make a new list with those (even) numbers? (You don't need to write code; instead, talk about how you'd do this.)

In [14]:
evens = []
for x in odds:
    y = x + 1
    evens.append(y)
evens

[2, 4, 6, 8]

With arrays this is simpler:

In [15]:
even_arr = odds_arr + 1
even_arr

array([2, 4, 6, 8])

Now suppose we want to add these odd and even numbers together. With lists that's another loop, but with arrays:

In [16]:
even_arr + odds_arr

array([ 3,  7, 11, 15])
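
For comparison, here is a minimal sketch of what the list version of that sum might look like. The names `odds_list`, `evens_list`, `joined`, and `sums` are made up for this illustration. Note that `+` on two lists joins them end to end rather than adding element by element, which is why the loop is needed.

```python
import numpy as np

# Hypothetical names for this illustration; the values match the lesson.
odds_list = [1, 3, 5, 7]
evens_list = [2, 4, 6, 8]

# With lists, + joins the two lists end to end.
joined = odds_list + evens_list

# Element-by-element addition of lists needs an explicit loop.
sums = []
for odd, even in zip(odds_list, evens_list):
    sums.append(odd + even)

# With arrays, + already works element by element.
sums_arr = np.array(odds_list) + np.array(evens_list)

print(joined)
print(sums)
print(sums_arr)
```

The loop and the array expression give the same four numbers; the array version just skips the bookkeeping.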

Suppose we did need to loop over an array to perform a complex calculation and then store the result in a new array. We can't append to arrays because they have a fixed size.

In [17]:
odds_arr.append(9)

AttributeError: 'numpy.ndarray' object has no attribute 'append'

We need to create the new array with the appropriate size and shape. (By the way, you can check the size and shape of an array with the `.size` and `.shape` attributes.)
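
As a quick illustration of those two attributes, here is a small sketch. The `table` array is made up for this example and is built from a nested list so that it has two dimensions.

```python
import numpy as np

odds_arr = np.array([1, 3, 5, 7])

# Made-up 2-D array: a list of two rows with three values each.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# .size is the total number of elements; .shape is a tuple with
# the length of each dimension.
print(odds_arr.size, odds_arr.shape)
print(table.size, table.shape)
```

Both are attributes, not methods, so there are no parentheses after `.size` or `.shape`.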

Functions for creating new arrays include `np.empty`, `np.ones`, `np.zeros`, and `np.arange`. The one thing you *have* to tell the first three is how big the array needs to be.

In [18]:
np.ones(5)

array([ 1.,  1.,  1.,  1.,  1.])

In [19]:
np.zeros((3, 3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

### Question

Why did I put an extra pair of parentheses in the call to `np.zeros`?

### Exercise

Which shape value is for the number of rows in the array, and which is for the number of columns?

In [20]:
np.zeros((4, 2))

array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

`np.arange` doesn't take a shape. Instead it takes start, stop, and step values to make an array of numbers that cover some range (the stop value itself is not included):

In [22]:
r = np.arange(0, 20, 2)
r

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
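
Coming back to the earlier problem of looping over an array and storing the results: here is a minimal sketch, assuming the calculation really does need a loop. A running total stands in for the "complex calculation", and the names `result` and `running_total` are made up for this example.

```python
import numpy as np

odds_arr = np.array([1, 3, 5, 7])

# Create the output array up front. np.empty does not initialize
# its contents, so every element must be assigned before it is used.
result = np.empty(odds_arr.size)

running_total = 0
for i, x in enumerate(odds_arr):
    running_total = running_total + x
    # Assign into an existing slot instead of appending.
    result[i] = running_total

print(result)
```

`np.empty` makes floating point arrays by default, so the totals come back as floats. `np.zeros` would work just as well here if you'd rather start from known values.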

See even more functions for making arrays at http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html.

## Array Methods and NumPy Functions

We're going to load some specially prepared precipitation data derived from the `precip_yearly.csv` file. It's saved in a special cross-platform NumPy binary format. Learn more about NumPy's special binary array storage format at http://docs.scipy.org/doc/numpy/reference/routines.io.html.

In [41]:
store = np.load('mean_ca_precip.npz')
years = store['years']
precip = store['precip']

In [29]:
print(years)
print(precip)

[1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014]
[ 17.09596591  21.51670391  26.1965896   20.63826347  21.75160714
  23.09702381  40.62267442  19.65480263  49.3564497   34.8197093
  35.98443243  48.1316129   27.39222222  28.57650794  20.21411765
  23.19271676  30.31596386  22.15284672  37.73311258  41.68419118
  18.8287218   22.28330189  23.02771812  28.7423913   38.86439394
  21.24044643  20.99722772  15.34122449]
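
The lesson doesn't show how `mean_ca_precip.npz` was made. As a rough sketch, a file like it could be written with `np.savez` and read back with `np.load`. The data, the file name `demo_precip.npz`, and the variable names below are all made up for this illustration, and the file is written to a temporary directory so nothing is left behind.

```python
import os
import tempfile

import numpy as np

# Made-up data standing in for the lesson's real arrays.
demo_years = np.array([2001, 2002, 2003])
demo_precip = np.array([20.5, 23.25, 30.0])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo_precip.npz')

    # Each keyword argument becomes a named array in the file.
    np.savez(path, years=demo_years, precip=demo_precip)

    # np.load on a .npz file returns a dictionary-like object.
    with np.load(path) as store:
        names = sorted(store.files)
        loaded_years = store['years']
        loaded_precip = store['precip']

print(names)
print(loaded_years)
print(loaded_precip)
```

The names used as keywords when saving are the same names you use in square brackets when loading.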


Arrays have methods attached to them for doing calculations and transformations with data. For example, you can get the mean of an array (as seen with Pandas):

In [30]:
precip.mean()

27.837604993879033

And the NumPy package (`np`) has functions that work on arrays, e.g. to calculate a logarithm:

In [31]:
np.log(precip)

array([ 2.83884252,  3.06882956,  3.26562923,  3.0271468 ,  3.07968765,
        3.13970377,  3.70432639,  2.97832172,  3.89906845,  3.55018359,
        3.58308641,  3.87393919,  3.31025911,  3.35258498,  3.00638125,
        3.1438383 ,  3.41167443,  3.09796601,  3.63053803,  3.73012195,
        2.93538346,  3.1038376 ,  3.13669863,  3.35837308,  3.66007851,
        3.05590721,  3.04439042,  2.73054362])

### Exercise

Use IPython's tab completion feature to compare available array methods (e.g. `precip.mean()`) with available NumPy functions (e.g. `np.log`). Do you notice any differences?

There are a massive number of routines in NumPy. For more info see http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs and http://docs.scipy.org/doc/numpy/reference/routines.html.
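
If you'd like to check one difference between methods and functions in code rather than by tab completion, here is a small sketch. The `values` array is made up for this example.

```python
import numpy as np

values = np.array([1.0, 2.0, 4.0])  # made-up data

# Some calculations exist both as a method and as a function.
mean_from_method = values.mean()
mean_from_function = np.mean(values)

# Others, like the logarithm, are only available as functions:
# arrays have no .log attribute.
has_log_method = hasattr(values, 'log')
logs = np.log(values)

print(mean_from_method, mean_from_function)
print(has_log_method)
print(logs)
```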

## Boolean Indexing

Let's examine which years had above-average precipitation and which had below-average precipitation.
First we'll need the average precipitation:

In [32]:
avg = precip.mean()

Then we can compare that average to the values in the `precip` array:

In [33]:
precip > avg

array([False, False, False, False, False, False,  True, False,  True,
        True,  True,  True, False,  True, False, False,  True, False,
        True,  True, False, False, False,  True,  True, False, False, False], dtype=bool)

The comparison creates a new array of boolean (True/False) values that is True where the precipitation that year was above average and False where the precipitation was below average.
We can use the boolean arrays to pull values out of other arrays:

In [34]:
above = precip > avg
print(precip[above])
print(years[above])

[ 40.62267442  49.3564497   34.8197093   35.98443243  48.1316129
  28.57650794  30.31596386  37.73311258  41.68419118  28.7423913
  38.86439394]
[1993 1995 1996 1997 1998 2000 2003 2005 2006 2010 2011]


You can use boolean indexing without first assigning the boolean array to a variable:

In [35]:
years[precip < avg]

array([1987, 1988, 1989, 1990, 1991, 1992, 1994, 1999, 2001, 2002, 2004,
       2007, 2008, 2009, 2012, 2013, 2014])

But if you do have a boolean array and want its opposite, you can use the `~` operator:

In [36]:
years[~above]

array([1987, 1988, 1989, 1990, 1991, 1992, 1994, 1999, 2001, 2002, 2004,
       2007, 2008, 2009, 2012, 2013, 2014])
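
One detail worth a quick sketch: `~` flips every value in a boolean array, so it selects everything the original comparison rejected, including any value exactly equal to the average. In the precipitation data no year sits exactly on the average, which is why `precip < avg` and `~above` give the same years. The made-up `rain` array below has one value equal to its mean to show where the two can differ.

```python
import numpy as np

# Made-up values; the second one equals the mean (25.0).
rain = np.array([10.0, 25.0, 30.0, 35.0])
rain_avg = rain.mean()

high = rain > rain_avg
low = rain < rain_avg

print(high)
print(~high)   # True wherever high is False
print(low)     # not the same as ~high here
print(rain[~high])
```

Every element lands in exactly one of `rain[high]` and `rain[~high]`, which is not guaranteed for `rain[high]` and `rain[low]`.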

We can also use boolean indexing to make assignments to arrays:

In [39]:
arr = np.random.normal(size=(4, 4))
arr

array([[ 0.52834365, -0.83129062,  0.60736072, -1.02060637],
       [-1.70055145,  0.39537391, -2.60654771,  0.18089374],
       [-0.10792992,  0.29998574,  0.13354803,  0.28557201],
       [ 0.8867229 , -0.58960566, -1.14392114,  0.7629729 ]])

In [40]:
arr[arr < 0] = 0
arr

array([[ 0.52834365,  0.        ,  0.60736072,  0.        ],
       [ 0.        ,  0.39537391,  0.        ,  0.18089374],
       [ 0.        ,  0.29998574,  0.13354803,  0.28557201],
       [ 0.8867229 ,  0.        ,  0.        ,  0.7629729 ]])
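
Note that this kind of assignment changes the array in place, so the original negative values are gone afterwards. If you need to keep them, one option is to work on a copy. Here is a minimal sketch with made-up data; the names `original` and `floored` are only for illustration.

```python
import numpy as np

original = np.array([-1.5, 0.5, -0.25, 2.0])  # made-up data

# Work on a copy so the original values are kept.
floored = original.copy()
floored[floored < 0] = 0

print(original)
print(floored)
```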

### Exercise

Write a function that clips data in an array to given low and high values.
For example, calling `clip(arr, 0, 1)` should return an array where values lower than zero have been replaced with 0 and values higher than one have been replaced with 1.