# Arrays and Matrices

Now we know that lists are great at holding a lot of information. This is very useful, for instance, if we have text data (a list of names, sentences from a story, etc.). It can also be useful for doing math. Instead of doing math with single numbers, sometimes we want to do math with many numbers at once. This can be accomplished with a list, but the problem is, we have to loop through the list to make it work. And, if you have long lists (tens of thousands of elements) then it can be pretty slow too. 

If we have a python list and want to add 5 to every number in that list, we can do it 2 different ways: 
```python
#looping way
mylist = range(1,51)
newlist = []

for i in mylist:
    newlist.append(i+5)

#list comprehension
newlist = [x+5 for x in mylist]

```
While both solutions are pretty brief, it still feels like a lot of work for just adding 5 to some numbers.

The folks who created **Numpy** are trying to solve this problem. This package allows you to do all kinds of math on many numbers at once in a very concise way: 

```python
import numpy
#arange is the same as range, but for numpy arrays
myarray = numpy.arange(1,51)

#the numpy way
newarray = myarray + 5
```


Notice we don't need any loops! We can treat `myarray` as a single number. Saying `myarray + 5` adds 5 to every number in it. Let's see an actual example to play with:

In [132]:
import numpy

myarray = numpy.arange(1,51)
print type(myarray)
print len(myarray)
print myarray[:10]

newarray = myarray+5
print newarray[:10]


<type 'numpy.ndarray'>
50
[ 1  2  3  4  5  6  7  8  9 10]
[ 6  7  8  9 10 11 12 13 14 15]
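
To get a feel for the speed difference mentioned earlier, here is a rough sketch that times both approaches with Python's `timeit` module. The array size (100,000 numbers) and the helper names `add_with_loop` and `add_with_numpy` are just made up for this illustration, and the exact timings will vary from machine to machine.

```python
import timeit
import numpy as np

mylist = list(range(1, 100001))
myarray = np.arange(1, 100001)

def add_with_loop():
    # list comprehension: visits every element one at a time
    return [x + 5 for x in mylist]

def add_with_numpy():
    # numpy: adds 5 to the whole array in one step
    return myarray + 5

# both approaches produce the same numbers
same_result = np.array_equal(add_with_loop(), add_with_numpy())
print(same_result)

# run each version 20 times and report the total seconds taken
loop_seconds = timeit.timeit(add_with_loop, number=20)
numpy_seconds = timeit.timeit(add_with_numpy, number=20)
print(loop_seconds)
print(numpy_seconds)
```

On most machines the numpy version should take noticeably less time, but treat the numbers as a rough guide rather than a benchmark.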


This works for other math operations too:

In [None]:
shortarray = numpy.arange(1,11)
print shortarray
print shortarray - 1
print shortarray * 2
print shortarray / 2
print shortarray **2
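
Here is a small sketch of what these operations produce, using a shorter array (`small` is a name made up for this example). One thing to watch: with the Python 2 style `print` statements used in this document, dividing an integer array by an integer with `/` rounds down, while in Python 3 it gives decimals. Writing `//` or dividing by `2.0` makes the intent explicit in both versions.

```python
import numpy as np

small = np.arange(1, 6)       # 1, 2, 3, 4, 5

minus_one = small - 1         # 0, 1, 2, 3, 4
doubled = small * 2           # 2, 4, 6, 8, 10
squared = small ** 2          # 1, 4, 9, 16, 25
floored = small // 2          # floor division: 0, 1, 1, 2, 2
halved = small / 2.0          # true division: 0.5, 1.0, 1.5, 2.0, 2.5

print(minus_one)
print(doubled)
print(squared)
print(floored)
print(halved)
```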

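If we want to do more complicated things, like square roots, we can't use Python's standard `math.sqrt` function on numpy arrays, because it only knows how to handle one number at a time:

In [None]:
import math
math.sqrt(shortarray) #error! math.sqrt only works on single numbers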

Don't worry, `numpy` has its own versions of these functions that know how to work with arrays. We just have to use the function from the numpy package, not the standard Python one. You can find a list of them in the [documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).

In [None]:
print numpy.sqrt(shortarray)
print numpy.abs(shortarray*-1)


There are also functions for summarizing the numbers within an array. For instance, we can take the sum, mean, median, or product of an array. This example also shows how to make an array of any numbers you want.

In [None]:
#we can create an array of numbers like this: 
myarray = numpy.array([3,4,7,9,-4,6,15])

print numpy.mean(myarray) #mean
print numpy.sum(myarray) #add them up
print numpy.median(myarray) #median
print numpy.prod(myarray) #multiply all of them



Since we have to use the functions so much, by convention people rename `numpy` to `np`: 

In [None]:
import numpy as np
np.mean(myarray)

### Differences between numpy arrays and Python lists

So far arrays seem just like Python lists, except you can do math with them. But there are a couple of big differences.

First, Python lists can have a mixture of different kinds of data (text, numbers, more lists, etc.). Numpy arrays can only contain the same datatype. So, if you want integers, you have to have all integers. If you want floats, it has to be all floats. They can even hold strings, although there isn't much point most of the time. 

When you create an array, numpy will try to guess what the datatype is, based on what you input. You can check what the datatype is by using the `dtype` property of your array. When you create the array, you can also use the `dtype` argument to tell it what kind of data you want. Numpy has several different types of integers and floating point numbers <http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html>. You only need to care about this if you need huuuge numbers, or if you have huuuge arrays and want to save memory. 

In [None]:
myarray = np.array([1,3,4,5])
print myarray.dtype #int64 (integers, 64 bit)

myotherarray = np.array([4,7,6,0,3,2],dtype='float64')
print myotherarray.dtype
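
To see the guessing in action, here is a short sketch (the variable names are made up for this example). If even one number in the input is a float, the whole array becomes a float array. The sketch also uses the `astype` method, which makes a copy of an array with a different datatype.

```python
import numpy as np

all_ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.5])    # one float makes the whole array float

print(all_ints.dtype)   # an integer type (int64 on most systems)
print(mixed.dtype)      # float64

# converting floats to integers drops the decimal part
truncated = mixed.astype('int64')
print(truncated)        # values 1, 2, 3
```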

The other big difference is that arrays (unlike Python lists) can have more than 1 dimension. So, you can have rows and columns! This means you can have entire tables of numbers you can do math on, as easily as you can do single numbers. If you have rows and columns, you call it a "matrix". Numpy can also have arrays with 3,4,5, or 100 dimensions. Those are called "N-Dimensional arrays" or `ndarray`. 

To create a matrix, you can specify each row inside of square brackets `[]` like a list. You just have to enclose them all in another set of brackets. Think of it as 1 list, where each element is another list, which is a row in your matrix. Below we create a matrix that looks like this: 

|1|2|3|
|---|---|---|
|4|5|6|
|7|8|9|

In [None]:
mymatrix = np.array( [ [1,2,3], [4,5,6], [7,8,9] ] )
print mymatrix
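
As a quick sketch of "doing math on a whole table", the same operations and summary functions we used on flat arrays also work on a matrix. The names `doubled`, `shifted`, `total`, and `average` are made up for this example, and `ndim` is an array property that reports the number of dimensions.

```python
import numpy as np

mymatrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(mymatrix.ndim)          # 2 dimensions: rows and columns

doubled = mymatrix * 2        # every element times 2
shifted = mymatrix + 10       # every element plus 10
print(doubled)
print(shifted)

total = np.sum(mymatrix)      # adds up all 9 numbers: 45
average = np.mean(mymatrix)   # mean of all 9 numbers: 5.0
print(total)
print(average)
```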

### Indexing

Indexing from numpy arrays is similar to python lists. You use square brackets `[]`, and you can either ask for a single position, or a range of positions (slicing). Here are several examples. These are all exactly the same way you'd index a Python list. 

In [None]:
myarray = np.array([0,1,2,3,4,5,6,7,8,9])

print myarray[0]
print myarray[0:3]
print myarray[:4]
print myarray[3:]
print myarray[::2]
print myarray[::-1]

### Indexing Matrices

What if you have a matrix? How do you pull out information? Well, you have to specify both the rows and the columns that you want to pull from. You can specify a single index or a range, just like in the examples above. The only difference is that you specify the row index, then a comma, then the column index. 

mymatrix looks like this: 

|1|2|3|
|---|---|---|
|4|5|6|
|7|8|9|

```python
mymatrix = np.array([[1,2,3],[4,5,6],[7,8,9]])

mymatrix[0,0] #1 - first row, first column
mymatrix[2,1] #8 - third row, second column
```
You can specify a range of indexes for the rows or columns. If you want to include all numbers from a row or a column, just include a colon `:`

```python
mymatrix[0,:] #[1,2,3] - entire first row (first row, all columns)
mymatrix[:,0] #[1,4,7] - entire first column (all rows, first column)

mymatrix[:2,:] # first 2 rows 
```
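You can also use a range for the rows and the columns at the same time, which pulls out a smaller block of the matrix. A small sketch (the names `top_right` and `corners` are made up for this example):

```python
import numpy as np

mymatrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

top_right = mymatrix[:2, 1:]    # rows 0-1, columns 1-2 -> [[2, 3], [5, 6]]
corners = mymatrix[::2, ::2]    # every other row and column -> [[1, 3], [7, 9]]

print(top_right)
print(corners)
print(top_right.shape)          # (2, 2)
```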
Now you try. See if you can pull the following information from `mymatrix`:

* The middle column
* The last 2 rows
* The diagonal items (you'll need 3 different statements)




In [None]:
mymatrix = np.array([[1,2,3],[4,5,6],[7,8,9]])




### Boolean indexing

Another really handy thing that numpy arrays can do is "boolean" indexing. This allows you to pull out information from your array based on its *value*, instead of its position. 

First, let's see what happens if we take an array and use it as part of a conditional statement:

In [None]:
myarray = np.arange(10)
print myarray
print myarray>4

What's going on here? Notice that this creates a new array, and each item is either `True` or `False`, so the datatype is boolean. Notice that the boolean array has 10 elements, just like `myarray`. Also notice that for every element in `myarray` that's greater than 4, the corresponding element in the other array is `True`.

If we use a boolean array to index an array, then numpy will pull only the elements where the boolean array is `True`, like so:

In [None]:
bool_array = myarray>4

myarray[bool_array]

Notice that we get a new array with only 5 elements, and all those elements are greater than 4. Usually, we don't create a new variable like `bool_array`, and we put our conditional statement inside the brackets.

Also try changing the conditional statement (using `>,<,==,!=`) to see what results you get.

In [None]:
myarray[myarray>4] #give me only the elements of myarray that are greater than 4. 

It may seem funny to use `myarray` twice like this, but there's a good reason. The boolean array you put inside the brackets doesn't have to come from the original array. You can use 1 array to index another one. 

Let's imagine we have an array of reaction times (RTs) from a Psychology experiment. We also have another array that specifies the Block number from the experiment. We know that the first block is considered practice, so we only want the RTs from the non-practice blocks. So, we index our `RT` array using the values from `Block` to only include where Block>1:

In [None]:
Block = np.repeat([1,2,3,4],10) #another handy function, creates array with 1,1,1,1...2,2,2,2..
print Block
RT = np.random.rand(40) #just fake data
print RT
print len(RT) #40


#boolean indexing. Read it like this:
#'Give me all the elements of RT, wherever Block is greater than 1'
realRT = RT[Block>1] 

print len(realRT) #30

So we can use any boolean array for indexing. The only requirement is that it's the same size as the array you are indexing from. If `Block` were shorter or longer than `RT`, the cell above would not work. Now that you know you can use any array, the statement:
```python
myarray[myarray>4]
```
Should look a little less funny. 
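
Here is a small sketch that takes the idea one step further, using made-up reaction times (a shorter version of the `Block` and `RT` arrays, with fixed numbers so the results are predictable). It pulls out a single block, and it combines two conditions with `&` ("and"). Note that each condition needs its own parentheses when you combine them.

```python
import numpy as np

# 4 blocks with 3 trials each, and invented reaction times
Block = np.repeat([1, 2, 3, 4], 3)
RT = np.array([0.9, 0.8, 0.7, 0.5, 0.6, 0.4, 0.5, 0.3, 0.4, 0.6, 0.2, 0.3])

# 'Give me the elements of RT wherever Block equals 2'
block2_RT = RT[Block == 2]
print(block2_RT)              # the 3 trials from block 2
print(np.mean(block2_RT))     # their average, about 0.5

# combine conditions: non-practice trials that were faster than 0.5
fast_real = RT[(Block > 1) & (RT < 0.5)]
print(fast_real)              # 0.4, 0.3, 0.4, 0.2, 0.3
```

You can use `|` in the same way for "or".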

### Filling arrays

Numpy has some convenient functions for creating matrices of a certain size. For instance, you can create matrices of random numbers using `numpy.random.rand`. You just need to specify the number of rows and columns, and it will make a matrix of that size, where each element is a decimal number between 0 and 1. 

In [None]:
randoms = np.random.rand(3,4)

print randoms

You can always figure out the number of rows and columns of a matrix (or array) using the `shape` property. This will give you a tuple that tells you the number of rows, columns, and other dimensions. 

In [None]:
randoms.shape #(3,4)

For a flat (1-dimensional) array, the `shape` has only 1 number in it:

In [None]:
myarray.shape #(10,) 10 elements along a single dimension

### `zeros`,  `ones`, and `empty`

It is often useful to generate arrays of all zeros, all ones, or just empty arrays with no particular numbers that you fill in later. Numpy has a function for each of these. You have to specify the number of rows and columns inside square brackets:

In [None]:
print np.zeros([5,3]) #5 rows, 3 columns
print np.ones([3,2]) #3 rows, 2 columns
print np.empty([4,4]) #4 rows, 4 columns


Notice the empty array isn't really empty. `numpy.empty` just reserves the memory without setting any values, so the numbers you see are whatever happened to be sitting in that memory already. They often show up as really small numbers in scientific notation, but they can be anything, so always fill an empty array before you use it.

### Changing values using indexing

Just like in Python lists, we can use indexing to both pull information from arrays, and to *change* information in them. If we specify a single number, it will set multiple elements to that single number: 

In [None]:

allones = np.ones([5,5])
print allones

print ""
allones[:,0] = 3 #change the first column to all 3's

print allones



What if we want to change multiple elements to different values? We can do it just fine, as long as we specify the appropriate numbers. Using `allones` above, let's set the first row equal to 10,11,12,13,14.

In [None]:
allones[0,:] = [10,11,12,13,14]
print allones

This only works if we give it the right amount of numbers. If we want to change the whole row, we need 5 numbers. If we specify fewer than 5, it doesn't work:

In [None]:
allones[0,:] = [10,11,12,13] #error! we need 5 numbers, but we only gave it 4
print allones
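
If you'd like to see the error without your code stopping, one option is to wrap the assignment in `try`/`except`. This is just a sketch (the `worked` flag is made up for the example). Numpy raises a `ValueError` when the number of values doesn't match, and the array is left untouched.

```python
import numpy as np

allones = np.ones([5, 5])

try:
    allones[0, :] = [10, 11, 12, 13]   # only 4 values for a 5-element row
    worked = True
except ValueError as err:
    worked = False
    print(err)                          # numpy's message about mismatched shapes

print(worked)
print(allones[0, :])                    # the first row is still all ones
```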

A nice shortcut if you want to change *all* elements of your matrix is to just specify 1 colon `:`.

In [None]:
allones[:] = 2 #change everything to 2

print allones


### Missing values

Sometimes you want to specify missing data in an array. Numpy uses the value NaN ("Not a Number") to specify missing data. We can take an empty array and fill it entirely with missing values using indexing, like so:

In [None]:
emptymat = np.empty([4,4]) 

emptymat[:] = np.nan #notice we say np.nan, not just "nan"
print emptymat
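
Missing values affect the summary functions, so here is a small sketch with invented scores showing what to expect. A single NaN makes `np.mean` return NaN. You can find the missing values with `np.isnan` and leave them out using boolean indexing (`~` flips `True` and `False`), or use `np.nanmean`, which ignores them for you. Also note that NaN is a float, so it can only be stored in float arrays.

```python
import numpy as np

scores = np.array([2.0, np.nan, 4.0, 6.0])

plain_mean = np.mean(scores)
print(plain_mean)                        # nan, because one value is missing

missing = np.isnan(scores)               # True wherever the value is NaN
print(missing)

valid_mean = np.mean(scores[~missing])   # mean of 2, 4, 6 -> 4.0
print(valid_mean)

print(np.nanmean(scores))                # same answer, ignoring NaN: 4.0
```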

### Changing values based on boolean indexing

Now let's put it all together. We'll create an array of random integers from 0 to 9 using `numpy.random.randint` (this takes 3 arguments: a starting value, an ending value that is *not* included, and the dimensions of the matrix you want to generate).

In [None]:
randints = np.random.randint(0,10,[4,4])
print randints

Now let's change all numbers that are greater than 5 to zero. 

In [None]:
print randints
print ""

randints[randints>5] = 0

print randints
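
Assigning with boolean indexing changes the original array. If you'd rather keep the original and get a changed copy, one alternative is `np.where`, which takes a condition, the value to use where it's `True`, and the value to use where it's `False`. A sketch with fixed numbers instead of random ones (the name `cleaned` is made up for this example):

```python
import numpy as np

randints = np.array([[1, 7, 3], [9, 5, 6], [0, 8, 2]])

# how many numbers are greater than 5? (True counts as 1)
count = np.sum(randints > 5)
print(count)        # 4

# new array: 0 where the number is greater than 5, otherwise the original
cleaned = np.where(randints > 5, 0, randints)
print(cleaned)
print(randints)     # the original is unchanged
```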

### Replicating arrays and matrices

Sometimes we want to start with 1 array and repeat it a certain number of times. Notice I did this in the example above with the array `Block`. I started with a simple array, `[1,2,3,4]` and I wanted to repeat each element 10 times, to reflect that each block has 10 trials in it. I was able to do this with `numpy.repeat`. It takes 2 arguments: an array, and the number of times to repeat the elements. 

In [None]:
print np.repeat([1,2,3],5) #repeat each number 5 times



This works for matrices too. Let's repeat `mymatrix` from above. If you have a matrix, you'll probably want to include a third argument, `axis`. This tells numpy which dimension to repeat along (the rows or the columns). Rows correspond to axis 0, columns to axis 1. This matches the order used everywhere else: when you're indexing, you say `mymatrix[row,column]`, and `mymatrix.shape` gives `(3,3)`, where position 0 is the number of rows and position 1 is the number of columns.

In [None]:
print np.repeat(mymatrix,3) #if we exclude axis, then it flattens it to an array
print np.repeat(mymatrix,3,axis=0) #repeat each row 3 times
print np.repeat(mymatrix,3,axis=1) #repeat each column 3 times
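
The output is easier to follow on a tiny matrix, so here is a sketch with a 2x2 example (the name `small` is made up for this illustration):

```python
import numpy as np

small = np.array([[1, 2], [3, 4]])

flat = np.repeat(small, 2)             # no axis: flattened to 1, 1, 2, 2, 3, 3, 4, 4
rows = np.repeat(small, 2, axis=0)     # each row twice: [[1, 2], [1, 2], [3, 4], [3, 4]]
cols = np.repeat(small, 2, axis=1)     # each column twice: [[1, 1, 2, 2], [3, 3, 4, 4]]

print(flat)
print(rows)
print(cols)
```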


Sometimes you want to repeat an entire matrix a certain number of times, and arrange the copies in a grid. For this you want `np.tile`. You give it a matrix, then a tuple specifying how many times you want to repeat it row-wise and column-wise. Later you'll learn how this is useful for images.

In [None]:
print np.tile(mymatrix,(2,3)) #repeat 2 times along the rows, 3 times along the columns

print np.tile(mymatrix,(3,1)) #repeat 3 times along the rows only
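
To make the difference between `np.repeat` and `np.tile` concrete, here is a sketch on small inputs (the variable names are made up for this example). `repeat` copies each element in place, while `tile` copies the whole thing as a unit.

```python
import numpy as np

repeated = np.repeat([1, 2, 3], 2)     # 1, 1, 2, 2, 3, 3
tiled = np.tile([1, 2, 3], 2)          # 1, 2, 3, 1, 2, 3
print(repeated)
print(tiled)

small = np.array([[1, 2], [3, 4]])
grid = np.tile(small, (2, 3))          # 2 copies down, 3 copies across
print(grid)
print(grid.shape)                      # (4, 6)
```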

### Reshaping arrays

Let's say you want a matrix that's 10x10, and you want it to contain the numbers 1 through 100. You could type this by hand, but it would be super annoying! We know that `numpy.arange` can produce the numbers 1 through 100, but it produces just a flat, 1-dimensional array. Don't fret, we can use the `numpy.reshape` function to take that flat array and reshape it into a matrix. We just specify how many rows and how many columns we want it to be, using a tuple:


In [136]:
x = np.arange(1,101)
print x.shape #(100,)

print np.reshape(x,(10,10)) #easy peasy



(100,)
[[  1   2   3   4   5   6   7   8   9  10]
 [ 11  12  13  14  15  16  17  18  19  20]
 [ 21  22  23  24  25  26  27  28  29  30]
 [ 31  32  33  34  35  36  37  38  39  40]
 [ 41  42  43  44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60]
 [ 61  62  63  64  65  66  67  68  69  70]
 [ 71  72  73  74  75  76  77  78  79  80]
 [ 81  82  83  84  85  86  87  88  89  90]
 [ 91  92  93  94  95  96  97  98  99 100]]


This is great, but what if we wanted the numbers to count down the columns, instead of across the rows? We can just change the argument `order` to the value of `F`. The `F` stands for "Fortran", because this mirrors how the Fortran programming language produces matrices. It's OK, this is not intuitive, I don't expect you to know that off the top of your head!

In [140]:
print np.reshape(x,(10,10),order='F')

[[  1  11  21  31  41  51  61  71  81  91]
 [  2  12  22  32  42  52  62  72  82  92]
 [  3  13  23  33  43  53  63  73  83  93]
 [  4  14  24  34  44  54  64  74  84  94]
 [  5  15  25  35  45  55  65  75  85  95]
 [  6  16  26  36  46  56  66  76  86  96]
 [  7  17  27  37  47  57  67  77  87  97]
 [  8  18  28  38  48  58  68  78  88  98]
 [  9  19  29  39  49  59  69  79  89  99]
 [ 10  20  30  40  50  60  70  80  90 100]]
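
The two orderings are easier to compare on a small array, so here is a sketch with the numbers 1 through 6 (the variable names are made up for this example). It also shows a shortcut: if you pass `-1` for one of the dimensions, numpy works out that dimension for you.

```python
import numpy as np

x = np.arange(1, 7)                            # 1, 2, 3, 4, 5, 6

by_rows = np.reshape(x, (2, 3))                # fills across rows: [[1, 2, 3], [4, 5, 6]]
by_cols = np.reshape(x, (2, 3), order='F')     # fills down columns: [[1, 3, 5], [2, 4, 6]]
print(by_rows)
print(by_cols)

# -1 means 'figure this dimension out from the others'
auto = np.reshape(x, (3, -1))
print(auto.shape)                              # (3, 2)
```

The total number of elements has to stay the same, so 6 numbers can become 2x3 or 3x2, but not 4x2.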


### Just the beginning

This is just scratching the surface of what numpy can do. We could literally spend a whole term just focusing on numpy. I just want to familiarize you with the basics. You'll learn later that numpy is used in conjunction with a lot of different packages, and is particularly relevant for plotting and for images. Numpy is a fundamental package which lots of other packages depend on, so it's good to know the basic functionality. 

This also introduces you to the concept of thinking in terms of arrays and matrices. This is a very powerful way of doing math. You can literally take thousands or millions of numbers at once, and treat them just like 1 number. If you are familiar with Matlab, this is how Matlab does most things by default.