# Numpy
Duncan Callaway

This notebook gives a quick tour of `numpy`.  My objective is to give you a sense of how arrays in `numpy` work so that you can then contrast them to Pandas dataframes.  

A numpy array is a grid of elements *of the same data type*.  (Pandas builds on numpy but lets each column hold a different data type.)  The beauty of numpy, like MATLAB, is that you usually don't need to specify the data type you're working with -- numpy infers it from the values you provide.
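As a rough illustration of that type inference, the sketch below builds a few small arrays and prints the dtype numpy chose for each.  The variable names are made up for this example, and the exact integer width printed (for example `int64`) depends on your platform.

```python
import numpy as np

# NumPy infers one dtype for the whole array from the values it is given.
ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.5, 3.0])
mixed = np.array([1, 2.5])    # an int and a float together are promoted to float

print(ints.dtype, floats.dtype, mixed.dtype)

# The dtype can also be requested explicitly instead of inferred.
explicit = np.array([1, 2, 3], dtype=float)
print(explicit)               # [1. 2. 3.]
```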

In [1]:
import numpy as np
a = np.array([[1,2,3], [4,5,6]])
a

array([[1, 2, 3],
       [4, 5, 6]])

That's a numpy array.  

Q: What have we seen before that looks a lot like this?

A: a nested list.  

A big difference, though, is that we can index numpy arrays more cleanly than we can lists:

In [2]:
a[1,1]

5

What is the first index entry calling?  If you're not sure, play around with `a` to see if you can figure it out.

Answer: The row.

And so the second entry is calling the column.  Note this is the opposite of the order we used when we indexed into the nested list.  
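If you want to check the row-then-column order for yourself, here is a small sketch that rebuilds the same array and pulls out a few entries.  Using `shape` to confirm the layout is an addition for illustration, not something this notebook has covered yet.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)    # (2, 3): 2 rows, 3 columns
print(a[0, 2])    # row 0, column 2 -> 3
print(a[1, 0])    # row 1, column 0 -> 4
print(a[1])       # a single index selects a whole row -> [4 5 6]
```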

What happens when we try to add a different data type to the array?

In [3]:
a[1,1] = 'g'
a

ValueError: invalid literal for int() with base 10: 'g'

We can't make the assignment.  However, if we include the string in the original definition, we can:

In [4]:
a = np.array([[1,2,3], [4,5,'g']])
a

array([['1', '2', '3'],
       ['4', '5', 'g']], dtype='<U21')

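We can create the array -- but numpy coerces every element to a string so that they all share one type.  The dtype `'<U21'` denotes a little-endian Unicode string of up to 21 characters; it is not a mixed data type.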
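The sketch below puts the two cases side by side and adds one alternative, `dtype=object`, which is not used elsewhere in this notebook.  The names `ints`, `strings` and `objects` are made up for illustration.

```python
import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]])
try:
    ints[1, 1] = 'g'           # 'g' cannot be converted to an integer
except ValueError as err:
    print("assignment failed:", err)

# Built with a string from the start, every element is coerced to a string.
strings = np.array([[1, 2, 3], [4, 5, 'g']])
print(strings.dtype.kind)      # 'U' means Unicode string

# dtype=object keeps the original Python objects, at the cost of speed.
objects = np.array([[1, 2, 3], [4, 5, 'g']], dtype=object)
print(objects[1, 0] + 1)       # 5, because the element is still an int
```

An object array gives up most of numpy's speed advantages, so it is usually better to keep numeric and text data in separate arrays.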

What happens if we try to do mathematical operations on this type?

In [5]:
a[1,0]+1

TypeError: can only concatenate str (not "int") to str

Doesn't work.  Even though that entry was *originally* numeric, numpy stored it as a string so that every element has the same type as the letter 'g'.
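One way around this, sketched below, is to convert the strings back to numbers explicitly before doing arithmetic.  The name `first_row` is made up for this example, and the conversion only works on entries that actually look like integers.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 'g']])

# Convert a single element explicitly before doing arithmetic.
print(int(a[1, 0]) + 1)         # 5

# Convert a purely numeric row in one step with astype.
first_row = a[0].astype(int)
print(first_row + 1)            # [2 3 4]
```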

**Ok.  What's a slice?**

It's a portion of the array that we call with expressions involving a colon `:` in the index locations.

In [6]:
a[0:1, 0:2]

array([['1', '2']], dtype='<U21')

Perhaps you remember that slicing is not *inclusive* of the end index: the slice stops just before it.  To run through the last value, leave the end position in the slice syntax empty:

In [7]:
a[1:, 1:]

array([['5', 'g']], dtype='<U21')
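The same slices are easier to read on a purely numeric array.  The sketch below uses a hypothetical array `b` and also shows that a slice keeps both dimensions, while a plain integer index drops one.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

print(b[0:1, 0:2])    # row 0 only, columns 0 and 1 -> [[1 2]]
print(b[1:, 1:])      # from row 1 and column 1 to the end -> [[5 6]]
print(b[:, :2])       # an empty start means "from the beginning"

# A slice keeps both dimensions; an integer index drops one.
print(b[0:1, :].shape)    # (1, 3)
print(b[0, :].shape)      # (3,)
```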

You can also put a number after a *second* colon in the slice syntax; this is the step.  A negative step slices backwards.

In [8]:
a[:,1::-1]

array([['2', '1'],
       ['5', '4']], dtype='<U21')

Numbers other than 1 (or -1) after that second colon will pull data in steps.  For example, this skips the second column:

In [9]:
a[:,0::2]

array([['1', '3'],
       ['4', 'g']], dtype='<U21')
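To collect the step variations in one place, here is a short sketch on a hypothetical numeric array `b`.  Reversing the rows in the last line is an extra case added for illustration.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

print(b[:, ::2])      # every second column -> [[1 3] [4 6]]
print(b[:, ::-1])     # all columns, reversed -> [[3 2 1] [6 5 4]]
print(b[:, 1::-1])    # start at column 1 and step backwards -> [[2 1] [5 4]]
print(b[::-1, :])     # rows in reverse order -> [[4 5 6] [1 2 3]]
```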