# Working with NumPy

When you start working with numeric computations, you'll soon want to work with sequences of numbers such as **vectors**, collections of vectors such as **matrices**, and potentially even higher-dimensional structures such as **tensors**.

While you could imagine using **a list** to represent a vector, as in:

In [1]:
v1 = [1, 2, 3]
v2 = [4, 5, 6]

Performing common vector operations such as vector addition and dot products is cumbersome. For example, `+` concatenates two lists instead of adding them elementwise:

In [2]:
v3 = v1 + v2
print(v3)

[1, 2, 3, 4, 5, 6]

To add two lists elementwise, you have to write the loop yourself:

In [3]:
def vector_add(v1, v2):
    v3 = []
    for i in range(len(v1)):
        v3.append(v1[i] + v2[i])
    return v3

In [4]:
vector_add(v1, v2)

[5, 7, 9]
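
Dot products need a similar hand-written loop. Here is a minimal sketch, where `vector_dot` is a hypothetical helper name chosen for illustration:

```python
def vector_dot(v1, v2):
    """Dot product of two equal-length lists, computed with a plain loop."""
    total = 0
    for i in range(len(v1)):
        total += v1[i] * v2[i]
    return total


v1 = [1, 2, 3]
v2 = [4, 5, 6]
print(vector_dot(v1, v2))  # 1*4 + 2*5 + 3*6 = 32
```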

Fortunately, you can get access to a very powerful numeric array library for Python by installing the **NumPy** package. If you installed Python through the Anaconda distribution, **NumPy** is already installed!

To start using the **NumPy** package, you have to import it:

In [5]:
import numpy as np

# Creating a NumPy array

You can convert a list into **a NumPy array** that supports more complex numeric operations:

In [6]:
a = np.array([1, 2, 3])

In [7]:
a

array([1, 2, 3])

In [8]:
b = np.array([4, 5, 6])

Unlike lists, NumPy arrays support mathematical operations like those for vectors and matrices:

sum of two vectors

In [9]:
a + b

array([5, 7, 9])

dot product

In [10]:
a @ b

32
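
For 1-D arrays, `@` computes the same value as `np.dot`, or as multiplying elementwise and then summing. A quick sketch that checks all three agree for the arrays above:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

by_operator = a @ b          # matrix-multiplication operator on two 1-D arrays
by_function = np.dot(a, b)   # the equivalent function call
by_hand = (a * b).sum()      # elementwise product, then sum: 4 + 10 + 18

print(by_operator, by_function, by_hand)  # 32 32 32
```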

Above, you created a vector, or more specifically a **1-D array**.

In [11]:
x = np.array([1, 2, 3, 4, 5])

You can look up an array's number of dimensions with its `ndim` property

In [12]:
x.ndim

1

and get its **shape** with the `shape` property

In [13]:
x.shape

(5,)

You can create a **2-D array** by turning a **list of lists** into an array.

In [14]:
y = np.array([[0, 1, 2], [3, 4, 5]])  # 2 x 3 array

y

array([[0, 1, 2],
       [3, 4, 5]])

In [15]:
y.ndim

2

In [16]:
y.shape

(2, 3)

# Functions for creating arrays

While you can enter the elements of an array one by one, it is far more common to use one of NumPy's special functions for creating arrays.

## Range of integers

Just like the built-in `range()` function creates an iterable, `np.arange` lets you create a sequence of integers as a NumPy array:

In [17]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [18]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [21]:
np.arange(3, 30, 2)

array([ 3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29])
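
As with `range()`, the `stop` value is excluded. The sketch below checks that against `range()` and shows that `np.arange` also accepts a non-integer step. For non-integer steps, the NumPy documentation generally recommends `np.linspace` instead, since floating-point rounding can make the number of elements hard to predict:

```python
import numpy as np

# Like range(), np.arange excludes the stop value.
assert np.arange(3, 30, 2).tolist() == list(range(3, 30, 2))

# Unlike range(), np.arange also accepts a float step.
quarters = np.arange(0, 1, 0.25)
print(quarters)  # [0.   0.25 0.5  0.75]
```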

## Specifying start, end, and number of items

Alternatively, you might want to specify the start and end, along with **how many equally spaced elements** should fit in that range (both endpoints are included by default):

In [22]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [23]:
np.linspace(0, 1, 100)

array([0.        , 0.01010101, 0.02020202, 0.03030303, 0.04040404,
       0.05050505, 0.06060606, 0.07070707, 0.08080808, 0.09090909,
       0.1010101 , 0.11111111, 0.12121212, 0.13131313, 0.14141414,
       0.15151515, 0.16161616, 0.17171717, 0.18181818, 0.19191919,
       0.2020202 , 0.21212121, 0.22222222, 0.23232323, 0.24242424,
       0.25252525, 0.26262626, 0.27272727, 0.28282828, 0.29292929,
       0.3030303 , 0.31313131, 0.32323232, 0.33333333, 0.34343434,
       0.35353535, 0.36363636, 0.37373737, 0.38383838, 0.39393939,
       0.4040404 , 0.41414141, 0.42424242, 0.43434343, 0.44444444,
       0.45454545, 0.46464646, 0.47474747, 0.48484848, 0.49494949,
       0.50505051, 0.51515152, 0.52525253, 0.53535354, 0.54545455,
       0.55555556, 0.56565657, 0.57575758, 0.58585859, 0.5959596 ,
       0.60606061, 0.61616162, 0.62626263, 0.63636364, 0.64646465,
       0.65656566, 0.66666667, 0.67676768, 0.68686869, 0.6969697 ,
       0.70707071, 0.71717172, 0.72727273, 0.73737374, 0.74747
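
Because both endpoints are included, the spacing between consecutive values is `(stop - start) / (num - 1)`. A small sketch checking this, plus the `endpoint=False` option that leaves out the stop value:

```python
import numpy as np

start, stop, num = 0, 1, 5
pts = np.linspace(start, stop, num)

# Both endpoints are included, so the spacing is (stop - start) / (num - 1).
step = (stop - start) / (num - 1)
assert np.allclose(np.diff(pts), step)  # every gap equals 0.25

# endpoint=False drops the stop value; the spacing becomes (stop - start) / num.
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)  # [0.  0.2 0.4 0.6 0.8]
```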

## Common starting array: `ones`, `zeros`, and `random`

You will often want to start with an array of a specified shape (e.g., a 3 x 3 matrix) initialized with 0s, 1s, or random numbers.

### initialize to 0

In [24]:
np.zeros((3, 3))  # be careful - you pass in a tuple specifying the shape

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

### initialize to 1

In [25]:
np.ones((3, 4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
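
The comment about passing a tuple matters: the second positional parameter of `np.zeros` and `np.ones` is `dtype`, not a second dimension. Here is a sketch showing what happens otherwise, and how `dtype` sets the element type (the default is float):

```python
import numpy as np

ok = np.zeros((3, 3))  # shape passed as a single tuple
print(ok.shape)        # (3, 3)

try:
    np.zeros(3, 3)     # the second positional argument is dtype, not a dimension
except TypeError as err:
    print("TypeError:", err)

ints = np.ones((2, 4), dtype=int)  # dtype sets the element type
print(ints)
```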

### randomly initialized

In [29]:
np.random.rand(4, 2)  # random numbers drawn uniformly from [0, 1); dimensions are separate arguments, not a tuple

array([[0.63875217, 0.71972817],
       [0.47024438, 0.84033629],
       [0.68402181, 0.16422992],
       [0.29084573, 0.3317176 ]])

In [35]:
np.random.randn(4, 2)  # random numbers drawn from the standard normal distribution

array([[ 0.03481687, -0.42212612],
       [ 0.1503195 ,  0.12119077],
       [-0.67594363, -0.80449497],
       [-0.06626417,  0.03121792]])
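
Note that, unlike `np.zeros`, `np.random.rand` and `np.random.randn` take the dimensions as separate arguments rather than a tuple. Since NumPy 1.17, the documentation recommends the `Generator` interface for new code. Here is a sketch assuming a NumPy version that provides `np.random.default_rng`, seeded so the numbers are reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator -> reproducible numbers

u = rng.random((4, 2))           # uniform on [0, 1); shape given as a tuple
z = rng.standard_normal((4, 2))  # standard normal; shape given as a tuple

# Re-creating the generator with the same seed reproduces the same draws.
u_again = np.random.default_rng(seed=42).random((4, 2))
assert np.array_equal(u, u_again)
```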

## Other special matrices: identity matrix and diagonal matrix

On other occasions, you'll want to create an identity matrix (a square matrix with 1s on the diagonal and 0s everywhere else) or a diagonal matrix (a matrix whose off-diagonal elements are all 0s):

In [36]:
np.eye(4)  # 4 x 4 identity matrix

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [37]:
np.diag([1, 2, 3, 4]) # 4 x 4 matrix with 1, 2, 3, 4 as the diagonal entries

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
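
The two are related: an identity matrix is a diagonal matrix with all 1s on the diagonal. `np.diag` also works in the other direction, extracting the diagonal from a 2-D array. A short check:

```python
import numpy as np

# An identity matrix is a diagonal matrix whose diagonal entries are all 1.
assert np.array_equal(np.eye(4), np.diag(np.ones(4)))

# Given a 2-D array instead of a 1-D one, np.diag extracts the diagonal.
d = np.diag([1, 2, 3, 4])
print(np.diag(d))  # [1 2 3 4]

# The identity leaves a matrix unchanged under matrix multiplication.
m = np.array([[1, 2], [3, 4]])
assert np.array_equal(np.eye(2) @ m, m)
```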

## Reshaping arrays

You can create an array in one shape and then later **reshape** it. This can be a neat trick for creating a higher dimensional array with sequential content:

Create a flat sequential array

In [38]:
x = np.arange(15)

x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

Now reshape it into 3 x 5

In [39]:
x.reshape(3, 5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
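
Here is a sketch of two related details: passing `-1` for one dimension lets NumPy infer it from the total number of elements, and the new shape must hold exactly as many elements as the original:

```python
import numpy as np

x = np.arange(15)

m = x.reshape(3, -1)  # -1 lets NumPy infer that dimension: 15 / 3 = 5
print(m.shape)        # (3, 5)
print(x.shape)        # (15,): reshape returns a new array object; x keeps its shape

try:
    x.reshape(4, 4)   # 16 slots cannot hold 15 elements
except ValueError as err:
    print("ValueError:", err)
```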

We are going to revisit array shape manipulation in much more detail later.

# Indexing and slicing arrays

You can index and slice a 1-D array just like you would a list:

In [40]:
a = np.arange(10)

In [41]:
a[0]  # first element

0

In [42]:
a[-1] # last element

9

In [43]:
a[3:8] # slicing

array([3, 4, 5, 6, 7])

In [44]:
a[::-1] # reversing

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

When working with **multidimensional arrays**, you have more choices with indexing:

In [45]:
a = np.arange(12).reshape(3, 4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [46]:
a[0] # selects the first row

array([0, 1, 2, 3])

In [47]:
a[-1] # selects the last row

array([ 8,  9, 10, 11])

In [48]:
a[:2] # slice first two rows

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [49]:
a[0, 1] # selects row 0, column 1

1

In [50]:
a[2, 2]

10

In [51]:
a[1, :3] # row 1 (the second row), first three columns

array([4, 5, 6])

In [54]:
a[:, 2] # get column 2

array([ 2,  6, 10])

In [56]:
a[:,2:]

array([[ 2,  3],
       [ 6,  7],
       [10, 11]])

In [57]:
a[:,:2]

array([[0, 1],
       [4, 5],
       [8, 9]])

Slicing multiple dimensions together

In [58]:
a = np.arange(16).reshape(4, 4)

In [59]:
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [61]:
a[1:3, 1:3]

array([[ 5,  6],
       [ 9, 10]])
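
A related pitfall: `a[i, j]` and `a[i][j]` return the same element, but chained indexing does not work for selecting columns, because `a[:]` is simply the whole array. A short sketch:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# a[i, j] and a[i][j] give the same element...
assert a[1, 2] == a[1][2] == 6

# ...but chaining does NOT work for column selection:
print(a[:, 2])  # [ 2  6 10]: column 2
print(a[:][2])  # [ 8  9 10 11]: a[:] is the whole array, so [2] picks row 2
```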

# Modifying arrays

Just like lists, NumPy arrays are **mutable**, meaning you can change their contents after creation.

In [62]:
x = np.zeros((3, 5))
x

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

## Change one element at a time

In [63]:
x[1, 2] = 3

In [64]:
x

array([[0., 0., 0., 0., 0.],
       [0., 0., 3., 0., 0.],
       [0., 0., 0., 0., 0.]])

## Change an entire row

In [65]:
x[0] = 4

In [66]:
x

array([[4., 4., 4., 4., 4.],
       [0., 0., 3., 0., 0.],
       [0., 0., 0., 0., 0.]])

## Change an entire column

In [67]:
x[:, -1] = 5

In [68]:
x

array([[4., 4., 4., 4., 5.],
       [0., 0., 3., 0., 5.],
       [0., 0., 0., 0., 5.]])

## Placing another list into an array

In [69]:
x[-1] = [10, 20, 30, 40, 50]

In [70]:
x

array([[ 4.,  4.,  4.,  4.,  5.],
       [ 0.,  0.,  3.,  0.,  5.],
       [10., 20., 30., 40., 50.]])

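The assigned list must have the same length as the row it replaces; otherwise NumPy raises an error:

In [71]: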
x[0] = [1, 2, 3]

ValueError: cannot copy sequence with size 3 to array axis with dimension 5

## Copies vs Views

You saw that you can use indexing and slicing to assign values into an array. But what happens if you slice an array and then assign values into the slice?

In [72]:
x = np.arange(15).reshape(3, 5)

x

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [None]:
r = x[1] # second row

In [None]:
r[3] # 4th value in the second row

You can modify a value in the 1-D array.

In [None]:
r[3] = 100

In [None]:
r

What happens in the original array?

In [None]:
x

Notice that changing values of `r` results in changes in `x`!

This is because **basic indexing and slicing** return what's called a **view** of the array. A view is a bit like a window: you are seeing only part of the original array, but you are still looking at the original array itself!

Several NumPy operations (including slicing) return views. Views are a convenient way to keep working with the original array, and they save memory by avoiding unnecessary copies of the data.

However, there will be situations where you want to pass around a copy of the array so that you don't have to worry about your original array accidentally getting modified. In these cases, use the `copy()` method.

In [73]:
x = np.arange(15).reshape(3, 5)

r = x[0].copy()

# update all elements of r
r[:] = r * 100

print('r=')
print(r)
print('x=')
print(x)

r=
[  0 100 200 300 400]
x=
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
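
`np.shares_memory` offers a quick way to check whether two arrays overlap in memory. Here is a sketch contrasting a view of a row with an explicit copy (the variable names are illustrative):

```python
import numpy as np

x = np.arange(15).reshape(3, 5)

row_view = x[1]         # basic indexing -> view
row_copy = x[1].copy()  # explicit copy

print(np.shares_memory(x, row_view))  # True
print(np.shares_memory(x, row_copy))  # False

row_view[0] = -1        # writes through to x
row_copy[1] = -2        # does not touch x
print(x[1])             # [-1  6  7  8  9]
```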


# Working with arrays

## Elementwise operations

You can easily perform **elementwise** operations with NumPy arrays:

In [None]:
x = np.arange(5)
x

In [None]:
x + 3

In [None]:
x**2

In [None]:
y = np.ones(5) + 5
y

In [None]:
x - y

In [None]:
2 ** x

In [None]:
np.sqrt(x)
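
For reference, here is what those cells produce, written as a small standalone sketch. Note that subtracting a float array from an integer array yields floats:

```python
import numpy as np

x = np.arange(5)    # [0 1 2 3 4]
y = np.ones(5) + 5  # [6. 6. 6. 6. 6.]

print((x + 3).tolist())   # [3, 4, 5, 6, 7]
print((x**2).tolist())    # [0, 1, 4, 9, 16]
print((2**x).tolist())    # [1, 2, 4, 8, 16]
print((x - y).tolist())   # [-6.0, -5.0, -4.0, -3.0, -2.0]: int minus float -> float
```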

### Multiplying arrays

Multiplying two arrays with `*` does **not** perform matrix multiplication; it multiplies corresponding elements:

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 0], [0, 1]])

In [None]:
a * b

To get matrix multiplication, use the `@` operator:

In [None]:
a @ b
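
Because `b` above is the identity matrix, `a @ b` simply returns `a`, which can hide the difference between the two operators. A sketch with a non-identity matrix `c` (chosen here for illustration) makes it visible:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
c = np.array([[0, 1], [1, 0]])  # swaps columns when multiplied on the right

print((a * c).tolist())  # elementwise:    [[0, 2], [3, 0]]
print((a @ c).tolist())  # matrix product: [[2, 1], [4, 3]]

# Entry (0, 0) of the matrix product is row 0 of a dotted with column 0 of c.
assert (a @ c)[0, 0] == a[0] @ c[:, 0] == 2
```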

## Speed of operations

Aside from being far easier to work with, NumPy arrays are also much faster to operate on than lists!

Let's compare the time it takes to square a sequence of numbers from 0 to 99 (inclusive).

We'll use `%%timeit` Jupyter Magic to measure the average time it takes to complete the operation.

### List

In [None]:
x = list(range(100))

In [None]:
%%timeit
x2 = []
for v in x:
    x2.append(v**2)

#### Using list comprehension

There is a more succinct way to express loop-based operations like the one above, known as a **list comprehension**.

In [None]:
%%timeit
x2 = [v**2 for v in x] # this is equivalent to the above code

A list comprehension can be slightly faster than a full-fledged for-loop.

### NumPy array

In [None]:
x = np.arange(100)

In [None]:
%%timeit
x2 = x**2

You should see that NumPy is usually quite a bit faster!
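
`%%timeit` only works inside Jupyter/IPython. In a plain Python script, the standard-library `timeit` module can make a rough comparison. Here is a sketch; the absolute timings depend on your machine, so only the results, not the speeds, are checked:

```python
import timeit

import numpy as np

x_list = list(range(100))
x_arr = np.arange(100)

# Each callable is run `number` times; timeit returns the total seconds taken.
t_loop = timeit.timeit(lambda: [v**2 for v in x_list], number=10_000)
t_numpy = timeit.timeit(lambda: x_arr**2, number=10_000)

print(f"list comprehension: {t_loop:.4f} s")
print(f"NumPy:              {t_numpy:.4f} s")

# Whatever the timings, both approaches compute the same values.
assert [v**2 for v in x_list] == (x_arr**2).tolist()
```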

## Comparisons

You can perform comparisons on arrays to get an array of boolean values.

In [None]:
a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])

#### Equality

In [None]:
a == b

In [None]:
a == 3

#### Inequality

In [None]:
a > b

In [None]:
a < 4
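
Boolean arrays can be combined elementwise with `&` (and) and `|` (or); Python's `and`/`or` do not work elementwise on arrays. Because `&` and `|` bind more tightly than comparisons, each comparison needs parentheses. Here is a sketch, which also shows `np.any`, `np.all`, and counting `True` values with `sum`:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])

both = (a > 1) & (a < 4)       # elementwise AND; parentheses are required
either = (a == b) | (a == 3)   # elementwise OR
print(both)    # [False  True  True False]
print(either)  # [False  True  True  True]

print(np.any(a == b), np.all(a == b))  # True False
print((a == b).sum())                  # 2: each True counts as 1
```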

### Universal functions (ufuncs)

NumPy also comes with many functions that operate on an element-by-element basis; these are called **universal functions**, or ufuncs for short. Examples include `np.sqrt`, `np.sin`, and `np.log`.

In [None]:
a = np.arange(10)
a

In [None]:
np.sqrt(a)

In [None]:
np.sin(a)

In [None]:
np.log(a)
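
Note that `a` above contains 0, and `log(0)` is `-inf`; NumPy returns it along with a "divide by zero" `RuntimeWarning`. Here is a sketch that silences the warning with `np.errstate` when that result is expected, plus a two-argument ufunc, `np.maximum`:

```python
import numpy as np

a = np.arange(10)

# log(0) is -inf; errstate suppresses the "divide by zero" warning inside this block only.
with np.errstate(divide="ignore"):
    logs = np.log(a)
print(logs[0])  # -inf

# Two-argument ufunc: elementwise maximum of two arrays of the same shape.
b = np.full(10, 5)
print(np.maximum(a, b))  # [5 5 5 5 5 5 6 7 8 9]
```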

### Transposition

When working with a 2-D array (matrix), you will often need to **transpose** it. You can do this with `.T`:

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a

In [None]:
a.T # returns a transposed VIEW

Notice that transposition returns a **view**, so modifying the returned array also modifies the original:

In [None]:
b = a.T

b[-1, 0] = 100

In [None]:
b

In [None]:
a
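
`.T` swaps axes, so it has no effect on a 1-D array, which has only one axis. Here is a sketch of this common surprise, and of reshaping to 2-D first when you really need a column or row vector:

```python
import numpy as np

v = np.array([1, 2, 3])
print(v.T.shape)    # (3,): transposing a 1-D array changes nothing

col = v.reshape(3, 1)  # make it 2-D first, shape (3, 1)...
print(col.T.shape)     # (1, 3): ...then .T swaps the two axes
```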

# Reductions and statistics

One very common type of computation on arrays is the **reduction**: an operation that produces a smaller array (often a single number) as its result. Examples include sum, mean, max, and min.

In [None]:
x = np.arange(10)
x

In [None]:
np.sum(x)

Many operations are also available as methods on the array:

In [None]:
x.sum()

When working with arrays of two or more dimensions, you can specify the axis to perform the reduction over:

In [None]:
x = np.array([[1, 1], [2, 2]])
x

In [None]:
x.sum(axis=0) # reduces over axis=0 -> collapses the rows, giving one sum per column

In [None]:
x.sum(axis=1) # reduces over axis=1 -> collapses the columns, giving one sum per row

In [None]:
y = np.array([[1, 2, 8], [4, -3, 5]])

y

In [None]:
y.max(axis=0) # maximum of each column (reduces over the rows)

In [None]:
y.max() # maximum overall
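
Reducing over an axis removes that axis from the shape. Passing `keepdims=True` keeps it with length 1, which is convenient when you want to combine the result with the original array (this relies on broadcasting, covered later). A sketch:

```python
import numpy as np

y = np.array([[1, 2, 8], [4, -3, 5]])  # shape (2, 3)

print(y.max(axis=0))  # [4 2 8]: one value per column, shape (3,)
print(y.max(axis=1))  # [8 5]: one value per row, shape (2,)

# keepdims=True keeps the reduced axis with length 1.
row_max = y.max(axis=1, keepdims=True)  # shape (2, 1)
print(y - row_max)  # each row minus its own maximum
```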

### Statistics

Statistical computations are examples of reductions, and common ones such as the mean, median, and standard deviation are provided:

In [None]:
x = np.array([1, 2, 3, 5, 10, -5])

In [None]:
x.mean()

In [None]:
np.median(x)  # there is no x.median

In [None]:
x.std()  # population standard deviation (ddof=0 by default)

Let's try these on random samples from a normal distribution:

In [None]:
# 100 samples from the standard normal distribution (mean 0, standard deviation 1)
x = np.random.randn(100)

In [None]:
x.mean()

In [None]:
x.std()
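
By default, `std` divides by `n` (the population standard deviation). Passing `ddof=1` divides by `n - 1` instead, which gives the sample standard deviation. Here is a sketch verifying both against the formulas, using the small array from above:

```python
import numpy as np

x = np.array([1, 2, 3, 5, 10, -5])
n = x.size

# By default, std divides by n (population standard deviation, ddof=0).
pop_std = np.sqrt(((x - x.mean()) ** 2).sum() / n)
assert np.isclose(x.std(), pop_std)

# ddof=1 divides by n - 1 instead, giving the sample standard deviation.
sample_std = np.sqrt(((x - x.mean()) ** 2).sum() / (n - 1))
assert np.isclose(x.std(ddof=1), sample_std)

print(x.mean(), np.median(x))  # mean 16/6 (about 2.667), median (2 + 3) / 2 = 2.5
```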

# Fancy indexing

We have already looked at how to index into an array with an integer or a tuple of integers:

In [None]:
a = np.arange(12).reshape(2, 2, 3)
a

In [None]:
a[0]

In [None]:
a[0, 1, 2]

Let's now look at more advanced, or **fancy**, indexing.

## Indexing with boolean

Probably the most useful kind of fancy indexing is boolean indexing. Given an array:

In [None]:
a = np.array([-5, 10, 6, -8, -2, 3, 0])

suppose that we want to change all negative numbers to 0, an operation sometimes referred to as *rectification*. We could use a for-loop just like on a list:

In [None]:
for i in range(len(a)):
    if a[i] < 0:
        a[i] = 0
a

While this works, it relies on a Python for-loop and is therefore slow on large arrays. Earlier, we saw that comparison operators give us an array of `True` and `False` values:

In [None]:
a = np.array([-5, 10, 6, -8, -2, 3, 0])

In [None]:
a < 0

It turns out that you can **index into an array with a boolean array**:

In [None]:
a[a < 0] # selects the negative entries

Finally, you can use it directly as the target of an assignment!

In [None]:
a[a < 0] = 0   # make all negative numbers 0

a

The beauty of this is that it works on arrays with any number of dimensions!

In [None]:
x = np.array([[1, -5, 3, 10, 7, -5],[-5, 8, 9, -1, -10, -4]])
x

In [None]:
x < 0

In [None]:
x[x < 0]

In [None]:
x[x < 0] = 0

In [None]:
x
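
If you would rather keep the original array unchanged, `np.where(condition, a, b)` builds a new array that takes values from `a` where the condition holds and from `b` elsewhere. Here is a sketch of rectification done that way, checked against `np.maximum`:

```python
import numpy as np

x = np.array([[1, -5, 3, 10, 7, -5], [-5, 8, 9, -1, -10, -4]])

# np.where returns a new array; unlike x[x < 0] = 0, x itself is untouched.
rectified = np.where(x < 0, 0, x)

# np.maximum against 0 gives the same result for rectification.
assert np.array_equal(rectified, np.maximum(x, 0))
print(rectified)
print(x.min())  # -10: the original still has its negative values
```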

## Indexing with an array of integers

You can use an array of integers as index into an array:

In [None]:
a = np.arange(0, 100, 10)
a

In [None]:
a[[2, 3, 2, 4, 2]]

In [None]:
a[[2, 3, 9]]

You can also use this for assignment:

In [None]:
a[[2, 3, 9]] = -100

In [None]:
a
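
Here are two details worth knowing. Unlike slicing, indexing with an integer array returns a **copy**. When indices repeat, an in-place update such as `+=` is applied only once per unique index, whereas `np.add.at` accumulates every occurrence. A sketch:

```python
import numpy as np

a = np.arange(0, 100, 10)

# Integer-array indexing returns a copy, not a view.
picked = a[[2, 3, 9]]
picked[0] = -1
print(a[2])  # 20: unchanged

# With repeated indices, += is applied only once per unique index...
b = np.zeros(3, dtype=int)
b[[0, 0, 1]] += 1
print(b)  # [1 1 0]

# ...whereas np.add.at accumulates every occurrence.
c = np.zeros(3, dtype=int)
np.add.at(c, [0, 0, 1], 1)
print(c)  # [2 1 0]
```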

## Sorting an array

Given an array, you can get a sorted **copy** using `np.sort`

In [None]:
x = np.array([1, 4, 3, 0, 5])

In [None]:
np.sort(x)

The `.sort` method, on the other hand, sorts the array **in place**:

In [None]:
x.sort()

In [None]:
x

For higher-dimensional arrays, you can specify the **axis to sort along**:

In [None]:
a = np.array([[1, 8, 4, 5], [3, 9, 2, 4], [1, 2, 3, 0]])
a

In [None]:
np.sort(a, axis=0) # sort along axis 0: each column is sorted independently

In [None]:
np.sort(a, axis=1) # sort along axis 1: each row is sorted independently

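Here is a sketch that spells out both results for the array `a` above, and shows one way to get a descending order, since `np.sort` has no flag for it:

```python
import numpy as np

a = np.array([[1, 8, 4, 5], [3, 9, 2, 4], [1, 2, 3, 0]])

by_col = np.sort(a, axis=0)  # each column sorted top to bottom
by_row = np.sort(a, axis=1)  # each row sorted left to right
print(by_col.tolist())  # [[1, 2, 2, 0], [1, 8, 3, 4], [3, 9, 4, 5]]
print(by_row.tolist())  # [[1, 4, 5, 8], [2, 3, 4, 9], [0, 1, 2, 3]]

# For descending order, reverse the sorted result.
x = np.array([1, 4, 3, 0, 5])
descending = np.sort(x)[::-1]
print(descending)  # [5 4 3 1 0]
```
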
# On a tangent: keyword arguments

Notice that in some cases, we specify the name of the parameter when passing in an argument, for example `x.sum(axis=0)`. In Python, you can pass a value to a specific parameter by name using **keyword argument** notation.

For example, consider the following function:

In [None]:
def greeting(name, message='Hello!', n=1):
    for i in range(n):
        print('{} {}'.format(message, name))

In [None]:
greeting('Edgar')

In [None]:
greeting('Edgar', 'Morning!')

In [None]:
greeting('Edgar', 'Morning!', 10)

What if I wanted to use the default for `message` (that is, "Hello!") but wanted to specify the number of times the message prints? You can use keyword arguments!

In [None]:
greeting('Edgar', n=5)

Hence, keyword arguments are useful when you want to override the defaults for certain parameters while keeping the defaults for the rest!

Note that because `name` doesn't have a default value, you **must** always provide it. However, with keyword arguments you can pass it in any position:

In [None]:
greeting(n=10, name='John')

In [None]:
greeting(n=10, name='John', message='Hi!')
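
NumPy functions follow the same rules, which is why calls like `np.sort(a, axis=0)` work. A brief sketch:

```python
import numpy as np

a = np.array([[3, 1, 2], [9, 7, 8]])

by_keyword = np.sort(a, axis=0)  # axis given by name
by_position = np.sort(a, 0)      # same call, axis given positionally
assert np.array_equal(by_keyword, by_position)

# Naming `endpoint` makes the call readable without counting positions.
pts = np.linspace(0, 1, num=4, endpoint=False)
print(pts)  # [0.   0.25 0.5  0.75]
```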

# Broadcasting

One very interesting, potentially confusing, but ultimately useful feature of NumPy arrays is **broadcasting**. To understand what it is all about, let's start with the real basics: adding two arrays.

We already saw that you can add two arrays of the same shape, and the addition occurs **elementwise** as expected.

In [None]:
a = np.array([1, 2, 3])
b = np.array([1, 1, 1])

a + b

You can also create a *column vector* by preparing a 2-D array of shape N x 1:

In [None]:
a = np.array([[1], [2], [3]])
b = np.array([[1], [1], [1]])

In [None]:
a

In [None]:
b

Adding them is not a problem:

In [None]:
a + b

Now what would happen if we take the following two arrays:

In [None]:
a = np.array([[1, 2, 3]])
b = np.array([[1], [1], [1]])

In [None]:
a

In [None]:
b

and add them?

In terms of matrix algebra, you shouldn't be able to add two matrices of differing sizes, but...

In [None]:
a + b

What just happened?! Well, **broadcasting** happened.
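
Roughly, NumPy compares the two shapes starting from the trailing (rightmost) dimension. Two dimensions are compatible when they are equal or when one of them is 1, and missing leading dimensions count as 1. Each size-1 dimension is then stretched to match the other, so `a` with shape (1, 3) and `b` with shape (3, 1) combine into shape (3, 3). Here is a sketch checking this with `np.broadcast_shapes` (assuming NumPy 1.20 or later) and `np.broadcast_to`:

```python
import numpy as np

a = np.array([[1, 2, 3]])      # shape (1, 3)
b = np.array([[1], [1], [1]])  # shape (3, 1)

print(np.broadcast_shapes(a.shape, b.shape))  # (3, 3)

# Stretching each operand explicitly reproduces what a + b does implicitly.
a_big = np.broadcast_to(a, (3, 3))  # row repeated 3 times
b_big = np.broadcast_to(b, (3, 3))  # column repeated 3 times
assert np.array_equal(a + b, a_big + b_big)

# A scalar behaves like an array of shape () and broadcasts against anything.
print(a + 10)  # [[11 12 13]]
```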

# Array shape manipulation

To fully harness the power of broadcasting, you need to be able to manipulate the exact shape of an array.

## Flattening an array

Sometimes you want to flatten out an array into a long vector. You achieve this with `ravel`.

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a

In [None]:
b = a.ravel()

Beware that, whenever possible, this returns a **view**:

In [None]:
b[0] = 100

In [None]:
a
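
If you need an independent flat copy instead, `flatten` always returns one. A sketch contrasting the two:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = a.flatten()  # always returns a new copy
flat_copy[0] = 100
print(a[0, 0])           # 1: a is unchanged

flat_view = a.ravel()    # a view here, because a is contiguous in memory
flat_view[0] = 100
print(a[0, 0])           # 100: the change shows up in a
```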

Also pay special attention to the order in which elements are traversed when flattening: by default, NumPy uses row-major (C) order, so the **last dimension ravels out first**.

## Reshaping

We already saw that we can take a flat array and turn it into higher dimension array as long as the number of elements matches:

In [None]:
a = np.arange(24)
a

In [None]:
a.reshape([2, 3, 4])

You can also take a non-flat array and change its shape:

In [None]:
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
x

In [None]:
x.reshape([3, 4])
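
Note that reshaping a (4, 3) array to (3, 4) is not the same as transposing it: `reshape` refills the new shape in the same element order, while `.T` swaps the axes. A quick comparison:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])  # shape (4, 3)

r = x.reshape(3, 4)  # refills row by row: [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
t = x.T              # swaps axes:         [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]

print(r.shape == t.shape)    # True: same shape...
print(np.array_equal(r, t))  # False: ...but a different arrangement
```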

## Adding and reducing dimensions

Sometimes, you want to add a dimension into your array, most likely for the purpose of broadcasting.

Say that you want to add each of the values `[-1, 0, 1]` to the array `[1, 2, 3, 4]`, yielding `[0, 1, 2, 3]`, `[1, 2, 3, 4]`, and `[2, 3, 4, 5]`. This can be done with broadcasting:

In [None]:
x = np.array([-1, 0, 1])
y = np.array([1, 2, 3, 4])

However, as they stand, you cannot broadcast them together: they are both 1-D arrays, and their lengths (3 and 4) differ.

In [None]:
x

In [None]:
x.shape

In [None]:
y

In [None]:
y.shape

In [None]:
x + y

What you want is to turn both into 2-D arrays, making one a row vector and the other a column vector, so that they broadcast together.

In [None]:
x = x[np.newaxis, :] # add a singleton first dimension

In [None]:
x

In [None]:
x.shape

In [None]:
y = y[:, np.newaxis] # add a singleton second dimension

In [None]:
y

In [None]:
y.shape

Now they broadcast together. The result is a 4 x 3 array whose columns are `[0, 1, 2, 3]`, `[1, 2, 3, 4]`, and `[2, 3, 4, 5]`:

In [None]:
x + y
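
`np.newaxis` is an alias for `None`, and `reshape` with `-1` or `np.expand_dims` can add the same singleton dimensions. A sketch showing that these routes agree:

```python
import numpy as np

x = np.array([-1, 0, 1])
y = np.array([1, 2, 3, 4])

# Three equivalent ways to get a (1, 3) row and a (4, 1) column:
row_a, col_a = x[np.newaxis, :], y[:, np.newaxis]
row_b, col_b = x[None, :], y[:, None]  # np.newaxis is an alias for None
row_c, col_c = x.reshape(1, -1), y.reshape(-1, 1)

for row, col in [(row_a, col_a), (row_b, col_b), (row_c, col_c)]:
    assert row.shape == (1, 3) and col.shape == (4, 1)

# np.expand_dims does the same job with an explicit axis argument.
assert np.expand_dims(x, axis=0).shape == (1, 3)

print((row_a + col_a).shape)  # (4, 3)
```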

## Squeeze dimensions

Given an array, you can **squeeze out** singleton dimensions from an array with `.squeeze()` method.

In [None]:
x = np.random.randn(4, 1, 1, 3, 1)

In [None]:
x

In [None]:
x.shape

In [None]:
y = x.squeeze()

In [None]:
y

In [None]:
y.shape
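
By default, `squeeze` removes every length-1 axis. The `axis` argument restricts it to specific axes, and asking it to remove an axis whose length is not 1 raises an error. A sketch:

```python
import numpy as np

x = np.zeros((4, 1, 1, 3, 1))

print(x.squeeze().shape)             # (4, 3): all length-1 axes removed
print(x.squeeze(axis=1).shape)       # (4, 1, 3, 1): only axis 1 removed
print(x.squeeze(axis=(1, 4)).shape)  # (4, 1, 3)

try:
    x.squeeze(axis=0)  # axis 0 has length 4, so it cannot be squeezed
except ValueError as err:
    print("ValueError:", err)
```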

# NumPy Exercises

Here are a few NumPy array exercises to practice your array creation and manipulation skills.

### 1. Create a vector of size 10 filled with 0s.

### 2. Create a vector of integers starting from 0 and ending at 50

### 3. Reverse the following vector

In [None]:
x = np.array([1, 2, 3, 4, 5])

# do something on x to reverse it! Hint: slicing syntax with step?

### 4. Create a 4 x 4 matrix with values ranging from 0 to 15

### 5. Create a 4 x 4 matrix with values ranging from 3 to 18

### 6. Create a 5 x 5 identity matrix

### 7. Create a 2 x 2 x 3 array with random values

### 8. Find the pairwise differences between the elements of a vector

Express the pairwise difference as a matrix

So for `[1, 2, 3]`, the pairwise difference matrix is:

```
[[ 0,  1,  2],
 [-1,  0,  1],
 [-2, -1,  0]]
```

Hint: ...broadcasting?