# Introduction to NumPy

Conventionally, numpy is imported as follows:

In [1]:
import numpy as np

Adding `... as np` to the import lets us access numpy's functionality using `np...` instead of `numpy...`. It is shorter to type and a very common convention.

## Creating Arrays

### Manually Creating Arrays

You can create arrays manually from normal Python lists:

In [2]:
a = np.array([0, 1, 2, 3])
print(a)

[0 1 2 3]


In MATLAB, you use semicolons to specify rows.

```matlab
a = [0, 1, 2; 3, 4, 5];
```

With numpy, you use a list for each row. A 2D array is constructed using a *list of lists*.

In [3]:
b = np.array([[0, 1, 2], # line break is not necessary, just makes it easy to see shape of the array
              [3, 4, 5]])
print(b)

[[0 1 2]
 [3 4 5]]


Arrays have several properties that are important:

In [4]:
print(b.ndim) # number of dimensions
print(b.shape) # number of elements along each dimension (e.g. num_rows, num_cols)
print(b.size) # total number of elements

2
(2, 3)
6


### Functions for Creating Arrays

Often we'll use built-in functions to create arrays instead of creating them manually.

Function names should be somewhat familiar to MATLAB users.

In [5]:
a = np.arange(10) # 0, 1, ... 9, a lot like Python's built-in `range()`
print(a)
b = np.arange(2, 10, 3) # start, end (exclusive), step
print(b)

[0 1 2 3 4 5 6 7 8 9]
[2 5 8]


Instead of specifying step size, you can specify the number of points you want in a range.

In [6]:
a = np.linspace(0, 1, 5) # start, end (inclusive), num-points
print(a)

[ 0.    0.25  0.5   0.75  1.  ]
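
The relationship between the number of points and the step size can be checked directly. The snippet below is a small sketch (the variable names are illustrative): because `linspace` includes both endpoints, `num` points leave `num - 1` gaps between them.

```python
import numpy as np

start, stop, num = 0.0, 1.0, 5

# linspace includes both endpoints, so there are (num - 1) gaps between points
points = np.linspace(start, stop, num)
step = (stop - start) / (num - 1)
print(step)  # 0.25

# retstep=True makes linspace also return the spacing it used
points_again, spacing = np.linspace(start, stop, num, retstep=True)
print(spacing)

# endpoint=False leaves out the end value, much like arange does
half_open = np.linspace(start, stop, num, endpoint=False)
print(half_open)
```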


There are several functions for creating common types of arrays.

Note that these functions take a "shape" argument: a *tuple* giving the size of each dimension of the array.

In [7]:
a = np.ones((5, 2)) # num_rows, num_cols
print(a)

[[ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]]


In [8]:
b = np.zeros((4, 3)) # takes a shape tuple, same as ones()
print(b)

[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]


In [9]:
c = np.eye(6) # identity matrix is always square, so just one parameter
print(c)

[[ 1.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  1.]]
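
The shape tuple is not limited to two entries. As a brief illustration (variable names are made up for this example), the same functions build 1D or 3D arrays, and an existing array's `shape` can be passed straight back in:

```python
import numpy as np

a = np.ones((2, 3, 4))    # three dimensions: the shape is a 3-element tuple
print(a.shape)            # (2, 3, 4)

v = np.zeros((5,))        # one dimension; note the trailing comma in the tuple
print(v.shape)            # (5,)

# an array's shape is itself a tuple, so it can be reused directly
same_shape = np.zeros(a.shape)
print(same_shape.shape)   # (2, 3, 4)
```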


## Indexing and Slicing

numpy arrays can be indexed and sliced similarly to Python lists, including negative indices that count from the end:

In [10]:
a = np.arange(10)
print(a)

print(a[0])
print(a[3])
print(a[-1])

[0 1 2 3 4 5 6 7 8 9]
0
3
9


Multi-dimensional arrays are indexed as `a[row, col]`:

In [11]:
a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7]])
print(a)

x = a[1, 2]
print(x)

[[0 1 2 3]
 [4 5 6 7]]
6


If you use a single index, that **row** is returned

In [12]:
row = a[1]
print(row)

[4 5 6 7]


If you want a **column**, you can use a colon instead of an index (similar to MATLAB)

In [13]:
print(a[:, 2])

[2 6]


The colon is really just a slice with default start and end -- read `a[:, 2]` as "array a, all rows, column 2".
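
Since the colon is an ordinary slice, the usual `start:stop:step` forms work in any position. A short sketch, using the same 2 x 4 array (the variable names are only for illustration):

```python
import numpy as np

a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7]])

middle_cols = a[:, 1:3]   # all rows, columns 1 and 2
print(middle_cols)

every_other = a[0, ::2]   # row 0, every second column
print(every_other)        # [0 2]

last_col = a[:, -1]       # all rows, last column
print(last_col)           # [3 7]
```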

**Note**: even though we asked for a "column," what we get has only 1 dimension

In [14]:
col = a[:, 2]
print(col.ndim)
print(col.shape)

1
(2,)


Indexing with a single integer removes that dimension from the result, so selecting one column of a 2D array gives a 1D array.

If you really need a column, you can use `np.newaxis`

In [15]:
real_col = col[:, np.newaxis]
print(real_col)
print(real_col.shape)

[[2]
 [6]]
(2, 1)
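
Another way to keep the column dimension, shown here as a sketch rather than a recommendation, is to index with a length-1 slice instead of an integer. A slice never removes a dimension, even when it selects a single element.

```python
import numpy as np

a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7]])

col_1d = a[:, 2]          # integer index: the column dimension is dropped
col_2d = a[:, 2:3]        # length-1 slice: the column dimension is kept

print(col_1d.shape)       # (2,)
print(col_2d.shape)       # (2, 1)

# both approaches hold the same values as the np.newaxis version
same = np.array_equal(col_2d, col_1d[:, np.newaxis])
print(same)               # True
```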


### Copies and Views

Slicing creates a *view* on the array.

The data is not copied, so modifying a slice modifies the original array.

In [16]:
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

b = a[0, 2:] # 0th row of a, from column 2 to the end
print(b)

b[0] = 1000
print(b)

print(a)

[3 4]
[1000    4]
[[   1    2 1000    4]
 [   5    6    7    8]]


This may seem surprising, but it avoids copying data, which saves time and memory when working with large arrays.

If you don't want to mess up the original array, you can explicitly copy an array (or view).

In [17]:
c = a[1].copy() # copy of row 1 of a

c[1] = 200

print(c)

print(a)

[  5 200   7   8]
[[   1    2 1000    4]
 [   5    6    7    8]]
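
When it is unclear whether two arrays share data, numpy can tell you. The following sketch uses `np.shares_memory` and the `base` attribute; the names `view` and `copy` are just illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

view = a[0, 2:]           # slicing: shares data with a
copy = a[0, 2:].copy()    # explicit copy: owns its own data

print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False

# a view remembers the array it was taken from; a copy has no base
print(view.base is a)              # True
print(copy.base is None)           # True
```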


### Fancy Indexing

If slices aren't flexible enough, you can index an array using a list (or an array) of indices. Indices may repeat and appear in any order:

In [18]:
a = np.arange(0, 100, 10)
print(a)

b = a[[4, 2, 9, 9]]
print(b)

[ 0 10 20 30 40 50 60 70 80 90]
[40 20 90 90]
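
Unlike slicing, fancy indexing returns a new array rather than a view. The sketch below illustrates the difference; note that *assigning through* a fancy index still writes to the original array.

```python
import numpy as np

a = np.arange(0, 100, 10)

picked = a[[4, 2, 9, 9]]   # fancy indexing produces a new array
picked[0] = -1
print(a[4])                # 40, the original is untouched

# assignment through a fancy index does modify the original
a[[0, 1]] = 5
print(a[:3])               # the first two entries are now 5
```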


## Elementwise Operations

Elementwise operations are applied to every element of an array independently, no matter what shape it is.

In [19]:
a = np.arange(5)
print(a)

b = a + 1 # adds one to each element
print(b)

c = 2 * a # multiplies each element by 2
print(c)

d = np.sqrt(a) # takes square root of each element
print(d)

e = np.square(a) # squares each element
print(e)

[0 1 2 3 4]
[1 2 3 4 5]
[0 2 4 6 8]
[ 0.          1.          1.41421356  1.73205081  2.        ]
[ 0  1  4  9 16]


**Note**: multiplying numpy arrays works a little differently than in MATLAB. Basic multiplication (with `*`) is *elementwise multiplication*.

In [20]:
a = np.array([0, 1, 2, 3])
b = np.array([5, 6, 7, 8])
print(a * b)

[ 0  6 14 24]
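
For MATLAB users looking for the matrix product, numpy provides the `@` operator (equivalently `np.matmul`). A minimal comparison, with small example matrices chosen only for illustration:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B    # like MATLAB's A .* B
matrix_prod = A @ B    # like MATLAB's A * B
print(elementwise)
print(matrix_prod)

a = np.array([0, 1, 2, 3])
b = np.array([5, 6, 7, 8])
dot = a @ b            # for 1D arrays, @ is the dot product
print(dot)             # 44
```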


Comparison operations are also elementwise. They produce boolean arrays, often called *masks*.

In [21]:
a = np.array([0, 1, 2, 3, 4])
mask = a > 2
print(mask)

[False False False  True  True]


These masks can be used for complex indexing.

In [22]:
b = np.array([5, 6, 7, 8, 9])
c = b[mask] # all elements of b where a > 2
print(c)

[8 9]
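
Masks can also be combined and used for assignment. The sketch below shows a few common patterns; the variable names are illustrative. Note that conditions are combined with `&`, `|`, and `~` rather than Python's `and`, `or`, and `not`, and each comparison needs its own parentheses.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])

# combine conditions with & (and), | (or), ~ (not)
middle = (a > 0) & (a < 4)
selected = a[middle]
print(selected)           # [1 2 3]

# a mask on the left-hand side assigns to the selected elements in place
a[a > 2] = 0
print(a)                  # [0 1 2 0 0]

# np.where chooses elementwise between two values based on a mask
labels = np.where(a > 0, 1, -1)
print(labels)
```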


numpy has built-in math functions such as `np.sin`, `np.log`, and `np.exp`. These are elementwise: they take an array and compute the function for each element in it.

In [23]:
a = np.linspace(0, 1, 5)

s = np.sin(a)
print(s)

[ 0.          0.24740396  0.47942554  0.68163876  0.84147098]


In [24]:
e = np.exp(a)
print(e)

[ 1.          1.28402542  1.64872127  2.11700002  2.71828183]


## Reductions

Reductions are operations that combine many elements into fewer, reducing the size of an array. Most of them have self-explanatory names.

In [25]:
a = np.array([0, 1, 2, 3, 4])

print(np.sum(a))
print(np.mean(a))
print(np.max(a))

10
2.0
4


**Note**: Take care when using these operations on multi-dimensional arrays.

By default, they will operate on the entire array, returning a scalar (single number). If you want to sum across rows or columns, there is the `axis` keyword argument.

To remember how this works, think of the array shape.

In [26]:
a = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]])
print(a.shape)

print(a.shape[0])
print(a.shape[1])

(2, 5)
2
5


First is the number of rows (2). You count rows by going down $\downarrow$

Second is the number of columns (5). You count columns by going right $\rightarrow$

This *direction* of accumulation corresponds to the *axis*.

In [27]:
column_sums = np.sum(a, axis=0) # sum *down* the rows (sum of each column)
print(column_sums)

[ 5  7  9 11 13]


In [28]:
row_sums = np.sum(a, axis=1) # sum *across* the columns (sum of each row)
print(row_sums)

[10 35]
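
One way to keep this straight, offered here as a rule of thumb, is that the axis you reduce over disappears from the shape. The sketch below uses a 3D array of ones so the effect is easy to read off; the array itself is only an example.

```python
import numpy as np

a = np.ones((2, 3, 4))

s0 = np.sum(a, axis=0)        # axis 0 (size 2) is removed
s1 = np.sum(a, axis=1)        # axis 1 (size 3) is removed
s2 = np.sum(a, axis=2)        # axis 2 (size 4) is removed
s02 = np.sum(a, axis=(0, 2))  # a tuple reduces over several axes at once

print(s0.shape)    # (3, 4)
print(s1.shape)    # (2, 4)
print(s2.shape)    # (2, 3)
print(s02.shape)   # (3,)
```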


**Note**: A reduction removes the axis it reduces over, so when you sum each row (`axis=1`), you get a 1D array, not a column vector.

In [29]:
not_col = np.sum(a, axis=1)
print(not_col.shape)

(2,)


These functions (`sum`, `mean`, `min`, `max`, etc.) take a keyword argument `keepdims` that keeps the reduced axis in the result with size 1. It's a shortcut so you don't have to use `np.newaxis`.

In [30]:
col = np.sum(a, axis=1)[:, np.newaxis]
print(col.shape)

(2, 1)


In [31]:
also_col = np.sum(a, axis=1, keepdims=True)
print(also_col.shape)

(2, 1)


## Broadcasting

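Broadcasting is a very powerful concept. It lets numpy combine arrays of different shapes in elementwise operations by treating dimensions of size 1 (or missing leading dimensions) as if they were repeated to match the other array.

The scipy-lectures page has a helpful illustration: http://www.scipy-lectures.org/intro/numpy/operations.html#broadcasting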

In [32]:
a = np.arange(0, 40, 10)[:, np.newaxis] # "column vector"
b = np.arange(3) # "row vector"

c = a + b # numpy expands dimensions of each to make elementwise addition work
print(c)

[[ 0  1  2]
 [10 11 12]
 [20 21 22]
 [30 31 32]]


We actually already used broadcasting without saying so: adding a scalar to an array broadcasts the scalar to every element.

In [33]:
a = np.ones((5, 5))

b = a + 2
print(b)

[[ 3.  3.  3.  3.  3.]
 [ 3.  3.  3.  3.  3.]
 [ 3.  3.  3.  3.  3.]
 [ 3.  3.  3.  3.  3.]
 [ 3.  3.  3.  3.  3.]]
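
A common practical use of broadcasting is subtracting per-column or per-row statistics from a 2D array. The following is a small sketch with made-up data; it also shows what happens when two shapes cannot be matched.

```python
import numpy as np

data = np.array([[1.0, 10.0, 100.0],
                 [3.0, 30.0, 300.0]])

col_means = data.mean(axis=0)                  # shape (3,)
centered = data - col_means                    # (2, 3) - (3,) -> (2, 3)
print(centered)

row_means = data.mean(axis=1, keepdims=True)   # shape (2, 1)
row_centered = data - row_means                # (2, 3) - (2, 1) -> (2, 3)
print(row_centered.shape)

# shapes that cannot be aligned raise an error
failed = False
try:
    data + np.arange(2)                        # (2, 3) + (2,) does not broadcast
except ValueError as err:
    failed = True
    print("cannot broadcast:", err)
```

This is where `keepdims=True` earns its place: without it, `row_means` would have shape `(2,)`, which cannot be broadcast against `(2, 3)`.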


## Array Shape Manipulation

You can flatten a multi-dimensional array:

In [34]:
a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7]])

b = a.ravel()
print(b)

[0 1 2 3 4 5 6 7]


Reshaping can be done in a few ways.

In [35]:
a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

print(a.shape)
print(a.size)

(3, 4)
12


You can explicitly specify the number of rows and columns (total size must match)

In [36]:
b = a.reshape(2, 6)
print(b)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]


You can also use `-1` for one of the dimensions, and numpy will infer its size from the total number of elements.

In [37]:
c = a.reshape(-1, 3)
print(c)

also_c = a.reshape(4, -1)
print(also_c)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]


**Note**: `reshape` returns a view whenever possible, so modifying the reshaped array can modify the original.

In [38]:
a = np.arange(12)
b = a.reshape(4, 3)

b[0] = 0

print(a)

[ 0  0  0  3  4  5  6  7  8  9 10 11]
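
`reshape` cannot always return a view. The sketch below shows one case where numpy has to copy: flattening a transposed array, whose elements are no longer laid out in the requested order in memory. `np.shares_memory` is used here to check which case applies.

```python
import numpy as np

a = np.arange(12)
b = a.reshape(4, 3)
print(np.shares_memory(a, b))      # True: a plain reshape is a view

t = b.T                            # transpose is a view with swapped axes
flat = t.reshape(-1)               # cannot be expressed as a view, so numpy copies
print(np.shares_memory(t, flat))   # False

flat[0] = 99
print(a[0])                        # 0, the original is unaffected
```

If code depends on changes propagating back to the original (or on them not propagating), it is safer to check with `np.shares_memory` or call `.copy()` explicitly.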
