# Introduction to NumPy

This is a quick introduction to Jupyter notebooks and Python. It combines key elements from several open source introductory courses on Jupyter and Python for scientific computing. Previous programming experience is assumed (elementary experience with MATLAB, R, etc. or C, C++, etc. is fine).

Author: Nicolai Riis, DTU Compute, July 2017.

Each notebook takes roughly 1 hour to go through (reading + exercises).

Consider skipping segments that are familiar to you and using the notebooks as a reference during the course.

## List of notebooks:
1) [Introduction to Jupyter notebooks and Python](1 - Introduction to Jupyter notebooks and Python.ipynb)

2) [Introduction to NumPy](2 - Introduction to NumPy.ipynb) - (Current)

3) [Introduction to Matplotlib](3 - Introduction to Matplotlib.ipynb)

4) [Debugging and more](4 - Debugging and more.ipynb)

### Licence:
These notebooks are released under the Attribution 3.0 Unported Licence (https://creativecommons.org/licenses/by/3.0/). This means you are free to share and adapt the content as you please as long as you give appropriate credit, link the licence and indicate if changes are made to the material.


For the original content that these notebooks are based upon please see:

* https://github.com/jhamrick/nbgrader-demo/tree/master/instructor/source/ps1
* https://github.com/jrjohansson/scientific-python-lectures


### Contact for errors:

[nabr@dtu.dk](mailto:nabr@dtu.dk)



---
<a id='1.1'></a>

### Table of contents

2.1 [Arrays](#2.1)

2.2 [Vectorized computation](#2.2)

2.3 [Linear Algebra](#2.3)

2.4 [Exercises](#2.4)

---
<a id='2.1'></a>

The `numpy` package (module) is used in almost all numerical computation using Python. It provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good.

To use `numpy` you need to import the module, using for example:

In [None]:
import numpy as np

Now, we have access to all NumPy functions via the variable `np` (this is the convention in the Scientific Python community for referring to NumPy). We can take a look at what this variable actually is, and see that it is in fact the `numpy` module (remember that you will need to have run the cell above before `np` will be defined!):

In [None]:
?np

NumPy is incredibly powerful and has many features, but this can be a bit intimidating when you're first starting to use it. If you are familiar with other scientific computing languages, the following guides may be of use:
* NumPy for Matlab Users: http://mathesaurus.sourceforge.net/matlab-numpy.html
* NumPy for R (and S-Plus) Users: http://mathesaurus.sourceforge.net/r-numpy.html

If not, don't worry! Here we'll go over the most common NumPy features.

---
<a id='2.1'></a>

## 2.1 Arrays

The core component of NumPy is the `ndarray`, which is pronounced like "N-D array" (i.e., 1-D, 2-D, ..., N-D). We'll use both the terms `ndarray` and "array" interchangeably. For now, we're going to stick to just 1-D arrays -- we'll get to multidimensional arrays later.

Arrays are very similar to `lists`. Let's first review how lists work. Remember that we can create them using square brackets:

In [None]:
mylist = [3, 6, 1, 0, 10, 3]
mylist

And we can access an element via its *index*. To get the first element, we use an index of 0:

In [None]:
print("The first element of 'mylist' is: " + str(mylist[0]))

To get the second element, we use an index of 1:

In [None]:
print("The second element of 'mylist' is: " + str(mylist[1]))

And so on.

Arrays work very similarly. The first way to create an array is from an already existing list:

In [None]:
myarray = np.array(mylist) # equivalent to np.array([3, 6, 1, 0, 10, 3])
myarray

<div class="alert alert-info">
Notice that <code>myarray</code> looks different than <code>mylist</code> -- it actually tells you that it's an array. If we take a look at the <i>types</i> of <code>mylist</code> and <code>myarray</code>, we will also see that one is a list and one is an array. Using <code>type</code> can be a very useful way to verify that your variables contain what you want them to contain:
</div>

In [None]:
# look at what type mylist is
type(mylist)

In [None]:
# look at what type myarray is
type(myarray)

We can get elements from a NumPy array in exactly the same way as we get elements from a list:

In [None]:
print("The first element of 'myarray' is: " + str(myarray[0]))
print("The second element of 'myarray' is: " + str(myarray[1]))

So far the `numpy.ndarray` looks awfully similar to a Python list (or nested list). Why not simply use Python lists for computations instead of creating a new array type?

There are several reasons:

* Python lists are very general. They can contain any kind of object and are dynamically typed. They do not support mathematical operations such as matrix and dot products. Implementing such functions for Python lists would not be very efficient because of the dynamic typing.
* NumPy arrays are **statically typed** and **homogeneous**. The type of the elements is determined when the array is created.
* NumPy arrays are memory efficient.
* Because of the static typing, mathematical functions such as multiplication and addition of `numpy` arrays can be implemented efficiently in a compiled language (C and Fortran are used).
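
To make the points in the list above concrete, here is a minimal sketch (the variable names `mixed` and `ints` are illustrative and not part of the original notebook). It shows that NumPy picks one common element type for mixed input, and how much memory the raw data of an array occupies:

```python
import numpy as np

# Mixed int/float input is upcast to a single common element type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)       # float64

# Each int64 element takes 8 bytes, so 1000 elements occupy 8000 bytes
ints = np.arange(1000, dtype=np.int64)
print(ints.itemsize)     # 8
print(ints.nbytes)       # 8000
```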

Using the `dtype` (data type) property of an `ndarray`, we can see what type the data of an array has:

In [None]:
myarray.dtype

We get an error if we try to assign a value of the wrong type to an element in a numpy array:

In [None]:
myarray[0] = "hello"
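
The error above occurs because the string cannot be converted to the array's element type. A related case that is easy to miss is a value that *can* be converted, but only with loss of information. The sketch below (with the illustrative name `int_array`) assumes an integer array and shows that a float is silently truncated on assignment:

```python
import numpy as np

int_array = np.array([3, 6, 1])   # integer dtype inferred from the input
int_array[0] = 2.7                # the float is truncated to fit the integer dtype
print(int_array[0])               # 2

try:
    int_array[0] = "hello"        # a non-numeric string cannot be converted
except ValueError as err:
    print("ValueError:", err)
```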

If we want, we can explicitly define the type of the array data when we create it, using the `dtype` keyword argument: 

In [None]:
M = np.array([1, 2, 3, 4], dtype=complex)

M

Common data types that can be used with `dtype` are: `int`, `float`, `complex`, `bool`, `object`, etc.

We can also explicitly define the bit size of the data types, for example: `int64`, `int16`, `float128`, `complex128`. Note that `float128` is not available on all platforms.
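
As a small sketch of explicit bit sizes (the names `small` and `as_float` are illustrative), the `itemsize` attribute reports the number of bytes per element, and `astype` returns a new array converted to another type:

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int16)
print(small.dtype, small.itemsize)        # int16 2

# astype returns a new array with the requested element type
as_float = small.astype(np.float64)
print(as_float.dtype, as_float.itemsize)  # float64 8
```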

### Using array-generating functions

For larger arrays it is impractical to initialize the data manually using explicit Python lists. Instead we can use one of the many functions in `numpy` that generate arrays of different forms. Some of the more common are:

In [None]:
# create a range

x = np.arange(0, 10, 1) # arguments: start, stop, step

x

#### linspace and logspace

In [None]:
# using linspace, both end points ARE included
np.linspace(0, 10, 25)

In [None]:
np.logspace(0, 10, 10, base=np.e)
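
The three functions above differ in how they treat the end point and the spacing. The following sketch (variable names are illustrative) checks this directly: `arange` excludes the stop value, `linspace` includes both end points, and `logspace` raises the base to evenly spaced exponents:

```python
import numpy as np

r = np.arange(0, 10, 1)                  # stop value 10 is excluded
lin = np.linspace(0, 10, 25)             # 25 points, both end points included
log = np.logspace(0, 10, 10, base=np.e)  # e**0, ..., e**10

print(r[-1])              # 9
print(lin[0], lin[-1])    # 0.0 10.0
print(len(lin))           # 25
# logspace is the base raised to linearly spaced exponents
print(np.allclose(log, np.exp(np.linspace(0, 10, 10))))  # True
```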

#### mgrid

In [None]:
x, y = np.mgrid[0:5, 0:5] # similar to meshgrid in MATLAB

In [None]:
x

In [None]:
y
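
To see what the two grids contain without printing them in full, this short sketch checks a single position: the first grid holds the row index and the second holds the column index at every position.

```python
import numpy as np

x, y = np.mgrid[0:5, 0:5]
print(x.shape, y.shape)   # (5, 5) (5, 5)

# x varies along the first axis (rows), y along the second (columns)
print(x[3, 1], y[3, 1])   # 3 1
```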

One thing that you can do with lists that you *cannot* do with NumPy arrays is adding and removing elements. For example, I can create a list and then add elements to it with `append`:

In [None]:
mylist = []
mylist.append(7)
mylist.append(2)
mylist

However, you *cannot* do this with NumPy arrays. If you tried to run the following code, for example:

```python
myarray = np.array([])
myarray.append(7)
```

You'd get an error like this:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-25-0017a7f2667c> in <module>()
      1 myarray = np.array([])
----> 2 myarray.append(7)

AttributeError: 'numpy.ndarray' object has no attribute 'append'
```

<div class="alert alert-info">To create a NumPy array, you must create an array with the correct shape <i>from the beginning</i>. However, the array doesn't have to have all the correct values from the very beginning: these you can fill in later.</div>

There are a few ways to create a new array with a particular shape:

* `np.empty(shape)` -- creates an array of shape `shape` without initializing its elements (the values are arbitrary until you set them)
* `np.zeros(shape)` -- creates an array of shape `shape` and sets all the elements to zero
* `np.ones(shape)` -- creates an array of shape `shape` and sets all the elements to one


In [None]:
arr = np.zeros((3, 4))
arr
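
A common pattern that follows from this is to allocate the array first and fill it in afterwards. The sketch below uses illustrative names (`squares`, `longer`). It also shows `np.append`, which may look like the list method but returns a new array instead of growing the existing one:

```python
import numpy as np

squares = np.zeros(5)          # allocate the final shape up front
for i in range(5):
    squares[i] = i ** 2        # fill in the values afterwards
print(squares.tolist())        # [0.0, 1.0, 4.0, 9.0, 16.0]

# np.append does not modify its input; it returns a new, longer array
longer = np.append(squares, 25.0)
print(squares.size, longer.size)   # 5 6
```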

<div class="alert alert-info">
The <i>shape</i> of the array is a very important concept. You can always get the shape of an array by accessing its <code>shape</code> attribute:
</div>

In [None]:
arr.shape

Note that for 1-D arrays, the shape returned by the `shape` attribute is still a tuple, even though it only has a length of one:

In [None]:
np.zeros(3).shape

This also means that we can *create* 1-D arrays by passing a length one tuple. Thus, the following two arrays are identical:

In [None]:
np.zeros((3,))

In [None]:
np.zeros(3)

<div class="alert alert-danger">There is a warning that goes with this, however: be careful to always use tuples to specify the shape when you are creating multidimensional arrays. For example, to create an array of zeros with shape <code>(3, 4)</code>, we <b>must</b> use <code>np.zeros((3, 4))</code>. The following <b>will not work</b>:</div>

```python
np.zeros(3, 4)
```

It will give an error like this:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-06beb765944a> in <module>()
----> 1 np.zeros(3, 4)

TypeError: data type not understood
```

This is because the second argument to `np.zeros` is the data type, so numpy thinks you are trying to create an array of zeros with shape `(3,)` and datatype `4`. It (understandably) doesn't know what you mean by a datatype of `4`, and so throws an error.
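
The following sketch contrasts the three calls (the names `ok` and `ints` are illustrative). The exact wording of the error message may differ between NumPy versions, so it is printed and not assumed:

```python
import numpy as np

ok = np.zeros((3, 4))          # shape given as a single tuple
print(ok.shape)                # (3, 4)

ints = np.zeros(3, int)        # the second positional argument is the dtype
print(ints.shape, ints.dtype.kind)   # (3,) i

try:
    np.zeros(3, 4)             # 4 is interpreted as a data type
except TypeError as err:
    print("TypeError:", err)
```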

<div class="alert alert-info">
Another important concept is the <i>size</i> of the array -- in other words, how many elements are in it. This is equivalent to the length of the array, for 1-D arrays, but not for multidimensional arrays. You can also see the total size of the array with the <code>size</code> attribute:
</div>

In [None]:
arr = np.zeros((3, 4))
arr.size

We can also create arrays and then reshape them into any shape, provided the new array has the same size as the old array:

In [None]:
arr = np.arange(32).reshape((8, 4))
arr
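
As a sketch of the size requirement (the name `flat` is illustrative): one dimension may be given as `-1`, in which case NumPy infers it from the total size, and a shape whose size does not match raises a `ValueError`.

```python
import numpy as np

flat = np.arange(32)
print(flat.reshape((8, 4)).shape)    # (8, 4)
print(flat.reshape((-1, 8)).shape)   # (4, 8), the -1 is inferred as 32 / 8

try:
    flat.reshape((5, 7))             # 5 * 7 = 35 does not match 32 elements
except ValueError as err:
    print("ValueError:", err)
```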

#### diag

In [None]:
# a diagonal matrix
np.diag([1,2,3])

In [None]:
# diagonal matrix with offset from the main diagonal
np.diag([1,2,3], k=1) 
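
Two details of `np.diag` are worth checking in a short sketch (names are illustrative): an offset of `k=1` makes the resulting matrix one row and column larger, and passing a 2-D array extracts its diagonal instead of building a matrix.

```python
import numpy as np

D = np.diag([1, 2, 3])
D_offset = np.diag([1, 2, 3], k=1)
print(D.shape, D_offset.shape)   # (3, 3) (4, 4)

# given a 2-D array, np.diag returns the diagonal as a 1-D array
print(np.diag(D).tolist())       # [1, 2, 3]
```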

#### random data

In [None]:
from numpy import random

In [None]:
# uniform random numbers in [0, 1)
random.rand(5,5)

In [None]:
# standard normal distributed random numbers
random.randn(5,5)
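
Random arrays differ on every run, which can make results hard to reproduce. A minimal sketch, assuming the legacy `numpy.random` interface used above and an arbitrarily chosen seed value of 0, shows how seeding makes the numbers repeatable:

```python
import numpy as np
from numpy import random

random.seed(0)            # the seed value 0 is an arbitrary choice
first = random.rand(3)

random.seed(0)            # resetting the seed restarts the same sequence
second = random.rand(3)

print(np.array_equal(first, second))          # True
print(((first >= 0) & (first < 1)).all())     # True
```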

## Accessing and modifying multidimensional array elements

To access or set individual elements of the array, we can index with a sequence of numbers:

In [None]:
# set the 3rd element in the 1st row to 0
arr[0, 2] = 0
arr

We can also access the element on its own, without the equals sign and the value to the right of it:

In [None]:
arr[0, 2]

We frequently will want to access ranges of elements. In NumPy, the first dimension (or *axis*) corresponds to the rows of the array, and the second axis corresponds to the columns.
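
A short sketch of row and column access, using an illustrative array named `grid` with the same shape as the one created earlier:

```python
import numpy as np

grid = np.arange(32).reshape((8, 4))

print(grid[0].tolist())        # first row: [0, 1, 2, 3]
print(grid[:, 1].tolist())     # second column: [1, 5, 9, 13, 17, 21, 25, 29]
print(grid[1:3, :].shape)      # rows 1 and 2, all columns: (2, 4)
```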

<div class="alert alert-warning">Note: be careful about modifying an array if what you really want is a <i>copy</i> of an array. Remember that in Python, variables are really just pointers to objects.</div>

For example, if I want to create a second array that multiplies every other value in `arr` by two, the following code will work but will have unexpected consequences:

In [None]:
arr = np.arange(10)
arr2 = arr
arr2[::2] = arr2[::2] * 2
print("arr:  " + str(arr))
print("arr2: " + str(arr2))

Note that `arr` and `arr2` both have the same values! This is because the line `arr2 = arr` doesn't actually copy the array: it just makes another pointer to the same object. To truly copy the array, we need to use the `.copy()` method:

In [None]:
arr = np.arange(10)
arr2 = arr.copy()
arr2[::2] = arr2[::2] * 2
print("arr:  " + str(arr))
print("arr2: " + str(arr2))
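
The same caution applies to slices. Basic slicing of a NumPy array returns a *view* onto the same data, not a copy. The sketch below (names `base`, `view` and `independent` are illustrative) uses `np.shares_memory` to check this:

```python
import numpy as np

base = np.arange(10)
view = base[::2]                      # basic slicing returns a view
print(np.shares_memory(base, view))   # True

view[0] = 100                         # writing through the view changes base
print(base[0])                        # 100

independent = base[::2].copy()        # an explicit copy owns its own data
independent[0] = -1
print(base[0])                        # 100
```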

## Indexing

<div class="alert alert-info">Also like lists, we can use "slicing" to get different parts of the array. Slices look like
<code>myarray[a:b:c]</code>, where <code>a</code>, <code>b</code>, and <code>c</code> are all optional (though you have to specify at least one). <code>a</code> is the index of the beginning of the slice, <code>b</code> is the index of the end of the slice (exclusive), and <code>c</code> is the step size.
</div>

Note that the exclusive slice indexing described above is different from some other languages you may be familiar with, like Matlab and R. In Python, `myarray[1:2]` returns only the second element of `myarray`, instead of the first and second elements.

First, let's quickly look at what is in our array (defined above), for reference:

In [None]:
print("myarray:", myarray)

Now, to get all elements except the first:

In [None]:
myarray[1:]

To get all elements except the last:

In [None]:
myarray[:-1]

To get all elements except the first and the last:

In [None]:
myarray[1:-1]

To get every other element of the array (beginning from the *first* element):

In [None]:
myarray[::2]

To get every other element of the array (beginning from the *second* element):

In [None]:
myarray[1::2]

And to reverse the array:

In [None]:
myarray[::-1]

Index slicing works exactly the same way for multidimensional "Matrix" arrays:

In [None]:
A = np.array([[n+m*10 for n in range(5)] for m in range(5)])

A

In [None]:
# a block from the original array
A[1:4, 1:4]

In [None]:
# strides
A[::2, ::2]
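
For reference, this sketch rebuilds the same matrix and prints the two slices as nested lists, so the selected rows and columns are easy to compare with the full array (the names `block` and `strided` are illustrative):

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])

block = A[1:4, 1:4]      # rows 1..3 and columns 1..3
strided = A[::2, ::2]    # every other row and every other column

print(block.tolist())    # [[11, 12, 13], [21, 22, 23], [31, 32, 33]]
print(strided.tolist())  # [[0, 2, 4], [20, 22, 24], [40, 42, 44]]
```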

---
<a id='2.2'></a>

## 2.2 Vectorized computation

Vectorizing code is the key to writing efficient numerical calculations with Python/NumPy. That means that as much of a program as possible should be formulated in terms of matrix and vector operations, like matrix-matrix multiplication.

### Scalar-array operations

One advantage of using NumPy arrays over lists is the ability to do a computation over the entire array. For example, if you were using lists and wanted to add one to every element of the list, here's how you would do it:

In [None]:
mylist = [3, 6, 1, 0, 10, 22]
mylist_plus1 = []
for x in mylist:
    mylist_plus1.append(x + 1)
mylist_plus1

Or, you could use a list comprehension:

In [None]:
mylist = [3, 6, 1, 0, 10, 22]
mylist_plus1 = [x + 1 for x in mylist]
mylist_plus1

In contrast, adding one to every element of a NumPy array is far simpler:

In [None]:
myarray = np.array([3, 6, 1, 0, 10, 22])
myarray_plus1 = myarray + 1
myarray_plus1

This won't work with normal lists. For example, if you ran `mylist + 1`, you'd get an error like this:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-5b3951a16990> in <module>()
----> 1 mylist + 1

TypeError: can only concatenate list (not "int") to list
```

We can do the same thing for subtraction, multiplication, etc.:

In [None]:
print("Subtraction: \t" + str(myarray - 2))
print("Multiplication:\t" + str(myarray * 10))
print("Squared: \t" + str(myarray ** 2))
print("Square root: \t" + str(np.sqrt(myarray)))
print("Exponential: \t" + str(np.exp(myarray)))

### Working with multiple arrays

We can also easily do these operations for multiple arrays. For example, let's say we want to add the corresponding elements of two lists together. Here's how we'd do it with regular lists:

In [None]:
list_a = [1, 2, 3, 4, 5]
list_b = [6, 7, 8, 9, 10]
list_c = [list_a[i] + list_b[i] for i in range(len(list_a))]
list_c

With NumPy arrays, we just have to add the arrays together:

In [None]:
array_a = np.array(list_a) # equivalent to np.array([1, 2, 3, 4, 5])
array_b = np.array(list_b) # equivalent to np.array([6, 7, 8, 9, 10])
array_c = array_a + array_b
array_c

<div class="alert alert-warning">
Note: make sure when adding arrays that you are actually working with arrays, because if you try to add two lists, you will <i>not</i> get an error. Instead, the lists will be concatenated:
</div>

In [None]:
list_a + list_b

Just as when we are working with a single array, we can add, subtract, divide, multiply, etc. several arrays together:

In [None]:
print("Subtraction: \t" + str(array_a - array_b))
print("Multiplication:\t" + str(array_a * array_b))
print("Exponent: \t" + str(array_a ** array_b))
print("Division: \t" + str(array_a / array_b))
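
Two details of these element-wise operations are worth a quick sketch (the names `short`, `long_`, `quotient` and `mismatch_raised` are illustrative): arrays whose lengths do not match raise a `ValueError`, and dividing integer arrays with `/` produces a float array.

```python
import numpy as np

long_ = np.array([1, 2, 3])
short = np.array([4, 5])

mismatch_raised = False
try:
    long_ + short                  # lengths 3 and 2 cannot be combined
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

# true division of integer arrays gives floating point results
quotient = np.array([1, 2]) / np.array([2, 4])
print(quotient.dtype)              # float64
print(quotient.tolist())           # [0.5, 0.5]
```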

Another very useful thing about NumPy is that it comes with many so-called "vectorized" operations. A vectorized operation (or computation) works across the entire array. For example, let's say we want to add together all the numbers in a list. In regular Python, we might do it like this:

In [None]:
mylist = [3, 6, 1, 10, 22]
total = 0
for number in mylist:
    total += number
total

Using NumPy arrays, we can just use the `np.sum` function:

In [None]:
# you can also just do np.sum(mylist) -- it converts it to an
# array for you!
myarray = np.array(mylist)
np.sum(myarray)

<div class="alert alert-info">
There are many other vectorized computations that you can do on NumPy arrays, including the product (<code>np.prod</code>), mean (<code>np.mean</code>), and variance (<code>np.var</code>). They all act essentially the same way as <code>np.sum</code> -- give the function an array, and it computes the relevant function across all the elements in the array.
</div>
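
As a quick sketch, here are these functions applied to the same numbers used in the summation example (the name `values` is illustrative). Note that `np.var` computes the population variance by default, that is, it divides by the number of elements:

```python
import numpy as np

values = np.array([3, 6, 1, 10, 22])

print(np.sum(values))     # 42
print(np.prod(values))    # 3960
print(np.mean(values))    # 8.4
print(np.var(values))     # approximately 55.44
```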

## Vectorizing functions

As mentioned several times by now, to get good performance we should try to avoid looping over elements in our vectors and matrices, and instead use vectorized algorithms. The first step in converting a scalar algorithm to a vectorized algorithm is to make sure that the functions we write work with vector inputs.

In [None]:
def Theta(x):
    """
    Scalar implementation of the Heaviside step function.
    """
    if x >= 0:
        return 1
    else:
        return 0

In [None]:
Theta(np.array([-3,-2,-1,0,1,2,3]))

OK, that didn't work because we didn't write the `Theta` function so that it can handle a vector input... 

To get a vectorized version of `Theta` we can use the NumPy function `vectorize`. In many cases it can automatically vectorize a function:

In [None]:
Theta_vec = np.vectorize(Theta)

In [None]:
Theta_vec(np.array([-3,-2,-1,0,1,2,3]))

We can also implement the function to accept a vector input from the beginning (requires more effort but might give better performance):

In [None]:
def Theta(x):
    """
    Vector-aware implementation of the Heaviside step function.
    """
    return 1 * (x >= 0)

In [None]:
Theta(np.array([-3,-2,-1,0,1,2,3]))

In [None]:
# still works for scalars as well
Theta(-1.2), Theta(2.6)
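
Another way to write a vector-aware version is with `np.where`, which selects between two values element by element. The sketch below uses the hypothetical names `theta_scalar` and `theta_where` (they are not part of the original notebook) and checks that both approaches give the same result. Keep in mind that `np.vectorize` is mainly a convenience: internally it still loops over the elements.

```python
import numpy as np

def theta_scalar(x):
    """Scalar Heaviside step function (hypothetical helper name)."""
    return 1 if x >= 0 else 0

def theta_where(x):
    """Vector-aware Heaviside step function built on np.where."""
    return np.where(x >= 0, 1, 0)

xs = np.array([-3, -2, -1, 0, 1, 2, 3])
looped = np.vectorize(theta_scalar)(xs)

print(looped.tolist())                            # [0, 0, 0, 1, 1, 1, 1]
print(np.array_equal(looped, theta_where(xs)))    # True
```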

### Matrix algebra

What about matrix multiplication? There are two ways. We can either use the `dot` function, which applies a matrix-matrix, matrix-vector, or inner vector multiplication to its two arguments:

In [None]:
A

In [None]:
np.dot(A, A)

In [None]:
v1 = np.arange(0, 5)

In [None]:
v1

In [None]:
np.dot(A, v1)

In [None]:
np.dot(v1, v1)
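
For 1-D and 2-D arrays such as these, the `@` operator (available since Python 3.5) gives the same results as `np.dot`. The following sketch rebuilds `A` and `v1` so that it can be run on its own:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
v1 = np.arange(0, 5)

print(np.array_equal(A @ A, np.dot(A, A)))   # True
print((A @ v1).tolist())                     # [30, 130, 230, 330, 430]
print(v1 @ v1)                               # 30
```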

Alternatively, we can cast the array objects to the type `matrix`. This changes the behavior of the standard arithmetic operators `+, -, *` to use matrix algebra.

In [None]:
M = np.matrix(A)
v = np.matrix(v1).T # make it a column vector

In [None]:
v

In [None]:
M * M

In [None]:
M * v

In [None]:
# inner product
v.T * v

In [None]:
# with matrix objects, standard matrix algebra applies
v + M*v

If we try to add, subtract or multiply objects with incompatible shapes we get an error:

In [None]:
v = np.matrix([1,2,3,4,5,6]).T

In [None]:
np.shape(M), np.shape(v)

In [None]:
M * v
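
The same shape check applies to plain arrays. In this sketch (the names `w` and `shape_error` are illustrative), multiplying a 5x5 array with a vector of length 6 using `np.dot` raises a `ValueError`:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
w = np.array([1, 2, 3, 4, 5, 6])

print(A.shape, w.shape)   # (5, 5) (6,)

shape_error = False
try:
    np.dot(A, w)          # inner dimensions 5 and 6 do not match
except ValueError as err:
    shape_error = True
    print("ValueError:", err)
```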

See also the related functions: `inner`, `outer`, `cross`, `kron`, `tensordot`. Try for example `help(np.kron)`.

---
<a id='2.3'></a>

## 2.3 Linear Algebra

There are loads of linear algebra matrix computations in NumPy. Most are under the `np.linalg` sub-module:

In [None]:
C = np.matrix([[1j, 2j], [3j, 4j]])
C

In [None]:
?np.linalg

In [None]:
np.linalg.inv(C) # equivalent to C.I 
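
The functions in `np.linalg` also work on plain arrays. A minimal sketch, assuming the same values as `C` above stored in an ordinary `ndarray` (the names `C_arr`, `C_inv`, `b` and `sol` are illustrative), checks the inverse, the determinant and the solution of a linear system:

```python
import numpy as np

C_arr = np.array([[1j, 2j], [3j, 4j]])

# the inverse times the original matrix should give the identity
C_inv = np.linalg.inv(C_arr)
print(np.allclose(C_arr @ C_inv, np.eye(2)))     # True

# determinant: (1j * 4j) - (2j * 3j) = -4 - (-6) = 2
print(np.isclose(np.linalg.det(C_arr), 2))       # True

# solve C_arr x = b without forming the inverse explicitly
b = np.array([1, 2])
sol = np.linalg.solve(C_arr, b)
print(np.allclose(C_arr @ sol, b))               # True
```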

---
<a id='2.4'></a>

## 2.4 Exercises

### Exercise 1: Euclidean distance (2 points)

Recall that the Euclidean distance $d$ is given by the following equation:

$$
d(a, b) = \sqrt{\sum_{i=1}^N (a_i - b_i) ^ 2}
$$

In NumPy, this is a fairly simple computation because we can rely on array computations and the `np.sum` function to do all the heavy lifting for us.

<div class="alert alert-success">
Complete the function <code>euclidean_distance</code> below to compute $d(a,b)$, as given by the equation above. Note that you can compute the square root using <code>np.sqrt</code>.
</div>

In [None]:
def euclidean_distance(a, b):
    """Computes the Euclidean distance between a and b.
    
    Hint: your solution can be done in a single line of code!
    
    Parameters
    ----------
    a, b : numpy arrays or scalars with the same size
    
    Returns
    -------
    the Euclidean distance between a and b
    
    """
    # YOUR CODE HERE
    raise NotImplementedError()

<div class="alert alert-warning">Remember that you need to execute the cell above (with your definition of <code>euclidean_distance</code>), and then run the cell below to check your answer. If you make changes to the cell with your answer, you will need to <i>first</i> re-run that cell, and <i>then</i> re-run the test cell to check your answer again.</div>

In [None]:
# add your own test cases in this cell!


In [None]:
from nose.tools import assert_equal, assert_raises

# check euclidean distance of size 3 integer arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
assert_equal(euclidean_distance(a, b), 5.196152422706632)

# check euclidean distance of size 4 float arrays
x = np.array([3.6, 7., 203., 3.])
y = np.array([6., 20.2, 1., 2.])
assert_equal(euclidean_distance(x, y), 202.44752406487959)

# check euclidean distance of scalars
assert_equal(euclidean_distance(1, 0.5), 0.5)

# check that an error is thrown if the arrays are different sizes
a = np.array([1, 2, 3])
b = np.array([4, 5])
assert_raises(ValueError, euclidean_distance, a, b)
assert_raises(ValueError, euclidean_distance, b, a)

print("Success!")

### Exercise 2: Border (3 points)

<div class="alert alert-success">
Write a function to create a 2D array of arbitrary shape. This array should have all zero values, except for the elements around the border (i.e., the first and last rows, and the first and last columns), which should have a value of one.
</div>

In [None]:
def border(n, m):
    """Creates an array with shape (n, m) that is all zeros
    except for the border (i.e., the first and last rows and
    columns), which should be filled with ones.

    Hint: you should be able to do this in three lines
    (including the return statement)

    Parameters
    ----------
    n, m: int
        Number of rows and number of columns

    Returns
    -------
    numpy array with shape (n, m)

    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# add your own test cases in this cell!

In [None]:
from numpy.testing import assert_array_equal
from nose.tools import assert_equal

# check a few small examples explicitly
assert_array_equal(border(1, 1), [[1]])
assert_array_equal(border(2, 2), [[1, 1], [1, 1]])
assert_array_equal(border(3, 3), [[1, 1, 1], [1, 0, 1], [1, 1, 1]])
assert_array_equal(border(3, 4), [[1, 1, 1, 1], [1, 0, 0, 1], [1, 1, 1, 1]])

# check a few large and random examples
for i in range(10):
    n, m = np.random.randint(2, 1000, 2)
    result = border(n, m)

    # check dtype and array shape
    assert_equal(result.dtype, np.float64)
    assert_equal(result.shape, (n, m))

    # check the borders
    assert (result[0] == 1).all()
    assert (result[-1] == 1).all()
    assert (result[:, 0] == 1).all()
    assert (result[:, -1] == 1).all()

    # check that everything else is zero
    assert np.sum(result) == (2*n + 2*m - 4)

print("Success!")