# NumPy

In [2]:
import numpy as np

## Creation of arrays
There are several ways of creating NumPy arrays. One way is to pass a (possibly nested) list to the `np.array` function:

In [49]:
np.array([1,2,3])   # one dimensional array

array([1, 2, 3])

A two-dimensional array can be created by listing the rows of the array:

In [50]:
np.array([[1,2,3], [4,5,6]])

array([[1, 2, 3],
       [4, 5, 6]])

Similarly, a three-dimensional array can be described as a list of lists of lists:

In [52]:
np.array([[[1,2], [3,4]], [[5,6], [7,8]]])

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])
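
To see how the nesting depth of the list maps to the array, one can inspect the `ndim` and `shape` attributes, which are discussed in more detail later in this material. A small sketch:

```python
import numpy as np

# Each additional level of list nesting adds one dimension
a1 = np.array([1, 2, 3])
a2 = np.array([[1, 2, 3], [4, 5, 6]])
a3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for arr in (a1, a2, a3):
    print(arr.ndim, arr.shape)
# 1 (3,)
# 2 (2, 3)
# 3 (2, 2, 2)
```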

There are some helper functions to create common types of arrays:

In [54]:
np.zeros((3,4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

By default the elements are floats. To get integer elements instead, use the `dtype` parameter:

In [55]:
np.zeros((3,4), dtype=int)

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
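
The `dtype` parameter is not specific to `zeros`: `np.array` accepts it too, and an existing array can be converted with the `astype` method, which returns a new array. A short sketch:

```python
import numpy as np

z = np.zeros((3, 4))
print(z.dtype)              # float64 is the default element type

zi = np.zeros((3, 4), dtype=int)
print(zi.dtype.kind)        # 'i' means a signed integer type

f = np.array([1, 2, 3], dtype=float)
print(f)                    # [1. 2. 3.]

g = f.astype(int)           # astype returns a converted copy; f is unchanged
print(g, g.dtype.kind)      # [1 2 3] i
```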

Similarly `ones` initializes all elements to one, `full` initializes all elements to a specified value, and `empty` leaves the elements uninitialized:

In [3]:
np.ones((2,3))

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [4]:
np.full((2,3), fill_value=7)

array([[7, 7, 7],
       [7, 7, 7]])

In [59]:
np.empty((2,4))

array([[0.00000000e+000, 5.30276956e+180, 5.05117710e-038,
        2.99587486e-066],
       [3.25667482e-086, 3.35709490e-143, 6.01433264e+175,
        6.93885958e+218]])
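
Because `empty` skips initialization, the values shown above are whatever happened to be in memory, and they can differ from run to run. A minimal sketch of the safe pattern is to overwrite every element before reading any of them. The sketch also shows that, when no `dtype` is given, `full` takes its element type from `fill_value`:

```python
import numpy as np

e = np.empty((2, 4))    # contents are arbitrary until assigned
e[:] = 1.5              # overwrite every element before use
print(e)

print(np.full((2, 3), 7).dtype.kind)    # 'i': an integer fill value gives integers
print(np.full((2, 3), 7.0).dtype.kind)  # 'f': a float fill value gives floats
```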

The `eye` function creates the identity matrix, that is, a matrix whose diagonal elements are one and whose non-diagonal elements are zero:

In [61]:
np.eye(5, dtype=int)

array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1]])
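
One way to see why it is called the identity matrix: multiplying a matrix by it leaves the matrix unchanged. The sketch below uses the `@` matrix multiplication operator, which is not otherwise discussed here, and also shows the `k` parameter of `eye`, which moves the diagonal of ones away from the main diagonal:

```python
import numpy as np

I = np.eye(3, dtype=int)
A = np.arange(9).reshape(3, 3)
print(np.array_equal(A @ I, A))   # True: A times the identity is A

# k=1 puts the ones on the first diagonal above the main diagonal
U = np.eye(3, k=1, dtype=int)
print(U)
# [[0 1 0]
#  [0 0 1]
#  [0 0 0]]
```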

The `arange` function works like Python's built-in `range` function, but produces a NumPy array:

In [5]:
np.arange(0,10,2)

array([0, 2, 4, 6, 8])
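
As with `range`, `arange` accepts one, two or three arguments, the stop value is excluded, and the step may be negative. A short sketch comparing the two:

```python
import numpy as np

print(np.arange(5))              # [0 1 2 3 4]
print(np.arange(10, 0, -3))      # [10  7  4  1]
print(list(range(10, 0, -3)))    # [10, 7, 4, 1]
```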

For non-integer ranges it is better to use `linspace`:

In [9]:
np.linspace(0, np.pi, 5)  # Evenly spaced range with 5 elements

array([ 0.        ,  0.78539816,  1.57079633,  2.35619449,  3.14159265])

With `linspace` one does not have to compute the length of the step; instead, one specifies the desired number of elements. By default, the endpoint is included in the result, unlike with `arange`.
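
The reason to prefer `linspace` for non-integer ranges is that `arange` computes the number of elements as the ceiling of `(stop - start) / step`, which is subject to floating-point rounding, so the result may unexpectedly contain an element at (or just past) the stop value. The sketch below shows this, assuming standard IEEE 754 double precision, together with the `endpoint` and `retstep` parameters of `linspace`:

```python
import numpy as np

# (1.3 - 1) / 0.1 rounds to slightly more than 3, so arange returns
# four elements, the last one being approximately the stop value 1.3
r = np.arange(1, 1.3, 0.1)
print(len(r), r)

# linspace: ask for 5 points; the endpoint 1 is included by default
x = np.linspace(0, 1, 5)
print(x)

# endpoint=False excludes the stop value, as arange does
print(np.linspace(0, 1, 5, endpoint=False))

# retstep=True also returns the computed step length
values, step = np.linspace(0, 1, 5, retstep=True)
print(step)    # 0.25
```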

### Arrays with random elements

To test our programs we might use real data as input. However, real data is not always available, and it may take time to gather. Instead, we can generate random numbers to use as a substitute. NumPy makes this easy: it can produce arrays of the desired shape filled with random numbers sampled from several different distributions, of which we mention only a few below. Random data can simulate real data better than, for example, ranges or constant arrays. Sometimes we also need random numbers in our programs, for example to choose a random subset of real data. Below are a few examples.

In [26]:
np.random.random((3,4))          # Elements are uniformly distributed in the half-open interval [0.0,1.0)

array([[ 0.36371077,  0.57019677,  0.43860151,  0.98837384],
       [ 0.10204481,  0.20887676,  0.16130952,  0.65310833],
       [ 0.2532916 ,  0.46631077,  0.24442559,  0.15896958]])

In [12]:
np.random.normal(0, 1, (3,4))    # Elements are normally distributed with mean 0 and standard deviation 1

array([[ 1.08789642, -0.60484571, -0.3100596 , -0.29957746],
       [-0.4185277 ,  1.40676248, -0.31368845,  0.73533905],
       [ 0.67562055,  0.10291608,  0.76654546,  0.3266169 ]])

In [17]:
np.random.randint(-2, 10, (3,4))  # Elements are uniformly distributed integers from the half-open interval [-2,10)

array([[ 6,  3,  9,  8],
       [ 9, -1,  2,  1],
       [ 7,  7,  5,  4]])
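
Random numbers can also be used to choose a subset of real data. One way to do that, sketched here with a made-up `data` array standing in for real data, is `np.random.choice`; with `replace=False` no element is picked twice:

```python
import numpy as np

data = np.arange(100, 200)      # hypothetical stand-in for real data
subset = np.random.choice(data, size=5, replace=False)
print(subset)                   # five distinct elements of data, in random order
```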

Sometimes it is useful to be able to recreate exactly the same data in every run of our program. For example, if our program has a bug that manifests itself only with certain input, then to debug the program it needs to behave deterministically. Random numbers are in fact generated deterministically from a starting point, so if we always start from the same point, we always get the same numbers. This starting point is usually an integer, and we call it a *seed*. Example of use:

In [23]:
np.random.seed(0)
print(np.random.randint(0, 100, 10))
print(np.random.normal(0, 1, 10))

[44 47 64 67 67  9 83 21 36 87]
[ 1.26611853 -0.50587654  2.54520078  1.08081191  0.48431215  0.57914048
 -0.18158257  1.41020463 -0.37447169  0.27519832]


If you run the above cell multiple times, it will always give the same numbers, unlike the earlier examples. Try rerunning them now!
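
A quick way to convince yourself of the reproducibility is to reset the seed and draw again; in this sketch the two draws are identical:

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(0, 100, 10)

np.random.seed(0)               # restart from the same starting point
second = np.random.randint(0, 100, 10)

print(np.array_equal(first, second))   # True
```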

The call to `np.random.seed` initializes the *global* random number generator, which `np.random.random`, `np.random.normal`, etc. all use. It is, however, possible to create new random number generators and use those to sample random numbers from a distribution. Example of usage:

In [29]:
new_generator = np.random.RandomState(seed=123)  # RandomState is a class, so we give the seed to its constructor
new_generator.randint(0, 100, 10)

array([66, 92, 98, 17, 83, 57, 86, 97, 96, 47])

You will see these used later in the materials and in the exercises, so that we can all agree on what the random input data is. How else could we agree whether a result is correct or not, if we can't agree on what the input is?
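
Separate generators are independent of the global one and of each other, so two generators created with the same seed produce the same sequence regardless of what else the program draws. The sketch also shows `np.random.default_rng`, the newer generator interface available since NumPy 1.17, whose integer method is called `integers` rather than `randint`:

```python
import numpy as np

g1 = np.random.RandomState(seed=123)
g2 = np.random.RandomState(seed=123)

x = g1.randint(0, 100, 10)
np.random.random(1000)          # draws from the global generator do not affect g2
y = g2.randint(0, 100, 10)
print(np.array_equal(x, y))     # True

rng = np.random.default_rng(seed=123)
z = rng.integers(0, 100, 10)    # ten integers from [0, 100)
print(z)
```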

## Array types and attributes

An array has several attributes: `ndim` tells the number of dimensions, `shape` tells the size in each dimension, `size` tells the number of elements, and `dtype` tells the element type. Let's create a helper function to explore these attributes:

In [72]:
def info(name, a):
    print("%s has dim %i, shape %s, size %i, and dtype %s:" % (name, a.ndim, a.shape, a.size, a.dtype))
    print(a)

In [73]:
b=np.array([[1,2,3], [4,5,6]])
info("b", b)

b has dim 2, shape (2, 3), size 6, and dtype int64:
[[1 2 3]
 [4 5 6]]


In [74]:
c=np.array([b, b])          # Creates a 3-dimensional array
info("c", c)

c has dim 3, shape (2, 2, 3), size 12, and dtype int64:
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]


In [75]:
d=np.array([[1,2,3,4]])                # a row vector
info("d", d)

d has dim 2, shape (1, 4), size 4, and dtype int64:
[[1 2 3 4]]
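
The attributes are related: `size` is the product of the entries of `shape`, and `ndim` is the length of `shape`. Note also that the default integer type is platform dependent, so the `int64` in the outputs above may appear as, for example, `int32` on some systems. A short sketch of the relations:

```python
import numpy as np

c = np.zeros((2, 2, 3))
print(c.size == np.prod(c.shape))   # True: 2 * 2 * 3 == 12
print(c.ndim == len(c.shape))       # True
```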


## Indexing, slicing and reshaping

### Indexing
Indexing a one-dimensional array works like indexing a Python list:

In [40]:
a=np.array([1,4,2,7,9,5])
print(a[1])
print(a[-2])

4
9


For a multi-dimensional array, the index is a comma-separated tuple instead of a single integer:

In [38]:
b=np.array([[1,2,3], [4,5,6]])
print(b)
print(b[1,2])    # row index 1, column index 2
print(b[0,-1])   # row index 0, column index -1

[[1 2 3]
 [4 5 6]]
6
3


In [39]:
# As with lists, elements can be modified through indexing
b[0,0] = 10
print(b)

[[10  2  3]
 [ 4  5  6]]
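
A single integer index picks out one row of a two-dimensional array, so `b[1][2]` gives the same element as `b[1, 2]`. The comma form is the usual NumPy idiom, since it indexes in one step instead of first creating the row. A sketch:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])
print(b[1])               # [4 5 6]: a single index selects a whole row
print(b[1, 2], b[1][2])   # 6 6
```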


### Slicing
Slicing works similarly to lists, but now we can slice along each dimension:

In [46]:
print(a)
print(a[1:3])
print(a[::-1])    # Reverses the array

[1 4 2 7 9 5]
[4 2]
[5 9 7 2 4 1]


In [45]:
print(b)
print(b[:,0])
print(b[0,:])
print(b[:,1:])

[[10  2  3]
 [ 4  5  6]]
[10  4]
[10  2  3]
[[2 3]
 [5 6]]


We can even assign to a slice:

In [47]:
b[:,1:] = 7
print(b)

[[10  7  7]
 [ 4  7  7]]
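
One important difference from lists: slicing a NumPy array does not copy the data but returns a *view* of the same memory, so modifying the slice also modifies the original array. The `copy` method gives an independent array when one is needed. A sketch:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])
col = b[:, 0]           # a view into b; no data is copied
col[:] = 0
print(b)                # the first column of b is now zero
# [[0 2 3]
#  [0 5 6]]

b2 = np.array([[1, 2, 3], [4, 5, 6]])
c = b2[:, 0].copy()     # an independent copy
c[:] = 0
print(b2)               # b2 is unchanged
```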


A common idiom is to extract rows or columns from an array:

In [51]:
print(b[:,0])    # First column
print(b[1,:])    # Second row

[10  4]
[4 7 7]


### Reshaping

When an array is reshaped, its number of elements stays the same, but they are reinterpreted to have a different shape. An example of this is to interpret a one-dimensional array as a two-dimensional array:

In [79]:
a=np.arange(9)
anew=a.reshape(3,3)
info("anew", anew)
info("a", a)

anew has dim 2, shape (3, 3), size 9, and dtype int64:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
a has dim 1, shape (9,), size 9, and dtype int64:
[0 1 2 3 4 5 6 7 8]
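
The new shape must have the same number of elements as the old one. One of the dimensions may be given as `-1`, in which case NumPy infers it from the array size. A sketch:

```python
import numpy as np

a = np.arange(12)
print(a.reshape(3, -1).shape)   # (3, 4): the -1 is inferred as 12 // 3

try:
    a.reshape(5, 3)             # 15 elements requested, but a has only 12
except ValueError as err:
    print("ValueError:", err)
```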


In [77]:
d=np.arange(4)             # 1d array
dr=d.reshape(1,4)          # row vector
dc=d.reshape(4,1)          # column vector
info("d", d)
info("dr", dr)
info("dc", dc)

d has dim 1, shape (4,), size 4, and dtype int64:
[0 1 2 3]
dr has dim 2, shape (1, 4), size 4, and dtype int64:
[[0 1 2 3]]
dc has dim 2, shape (4, 1), size 4, and dtype int64:
[[0]
 [1]
 [2]
 [3]]


<div class="alert alert-warning">
Note that the 1d array and the row and column vectors, which are 2d arrays, are fundamentally different objects, even though they look similar. They behave differently when we combine or otherwise operate on arrays of different shapes, as we shall see in the next section and later in this material.
</div>
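
A quick illustration of the difference, recreating `d`, `dr` and `dc` from the cell above: transposing (`.T`) a 1d array does nothing, whereas transposing a row vector gives a column vector. Adding a 1d array and a column vector even produces a 2d result, a preview of the broadcasting rules covered later:

```python
import numpy as np

d = np.arange(4)          # 1d array
dr = d.reshape(1, 4)      # row vector
dc = d.reshape(4, 1)      # column vector

print(d.T.shape)          # (4,): transposing a 1d array changes nothing
print(dr.T.shape)         # (4, 1): a transposed row vector is a column vector
print((d + dc).shape)     # (4, 4): 1d array plus column vector broadcasts to 2d
```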

An alternative way to create, for example, column or row vectors is to index with `np.newaxis`. Sometimes this is easier or more natural than the `reshape` method:

In [83]:
info("d", d)
info("drow", d[np.newaxis, :])
info("dcol", d[:, np.newaxis])

d has dim 1, shape (4,), size 4, and dtype int64:
[0 1 2 3]
drow has dim 2, shape (1, 4), size 4, and dtype int64:
[[0 1 2 3]]
dcol has dim 2, shape (4, 1), size 4, and dtype int64:
[[0]
 [1]
 [2]
 [3]]
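
`np.newaxis` is simply an alias for `None`, so `d[:, None]` is an equivalent, shorter spelling that is also common in NumPy code. A sketch checking this:

```python
import numpy as np

d = np.arange(4)
print(np.newaxis is None)                                   # True
print(np.array_equal(d[:, None], d[:, np.newaxis]))         # True
print(np.array_equal(d[np.newaxis, :], d.reshape(1, 4)))    # True
```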


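Note also that indexing with a list keeps the indexed dimension: for example, `b[[1], :]` returns a 2d array, whereas the single integer index `b[1, :]` returns a 1d array.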
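
The difference between a list index and a single integer index can be seen from the shapes of the results in this sketch:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])
print(b[1, :].shape)     # (3,): a single integer index drops the row dimension
print(b[[1], :].shape)   # (1, 3): a list index keeps it
print(b[:, [0]].shape)   # (2, 1): the same trick extracts a column as a column vector
```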

## Array concatenation, splitting and stacking

There are two ways of combining several arrays into one bigger array: `concatenate` and `stack`. `concatenate` takes n-dimensional arrays and returns an n-dimensional array, whereas `stack` takes n-dimensional arrays and returns an (n+1)-dimensional array. A few examples of these:

In [27]:
a=np.arange(2)
b=np.arange(2,5)
print("a has shape %s: %s" % (a.shape, a))
print("b has shape %s: %s" % (b.shape, b))
np.concatenate((a,b))  # concatenating 1d arrays

a has shape (2,): [0 1]
b has shape (3,): [2 3 4]


array([0, 1, 2, 3, 4])

In [31]:
c=np.arange(1,5).reshape(2,2)
print("c has shape %s:" % (c.shape,), c, sep="\n")
np.concatenate((c,c))   # concatenating 2d arrays

c has shape (2, 2):
[[1 2]
 [3 4]]


array([[1, 2],
       [3, 4],
       [1, 2],
       [3, 4]])

By default `concatenate` joins the arrays along axis 0. To join the arrays horizontally, add the parameter `axis=1`:

In [16]:
np.concatenate((c,c), axis=1)

array([[1, 2, 1, 2],
       [3, 4, 3, 4]])
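
NumPy also provides the shorthands `np.vstack` and `np.hstack`; for 2d arrays they are equivalent to `concatenate` along axis 0 and axis 1, respectively. A sketch checking this:

```python
import numpy as np

c = np.arange(1, 5).reshape(2, 2)
print(np.array_equal(np.vstack((c, c)), np.concatenate((c, c), axis=0)))  # True
print(np.array_equal(np.hstack((c, c)), np.concatenate((c, c), axis=1)))  # True
```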

If you want to concatenate arrays with different numbers of dimensions, for example to add a new row or column to a 2d array, you must first reshape the arrays to have the same number of dimensions:

In [20]:
print("New row:")
print(np.concatenate((c,a.reshape(1,2))))
print("New column:")
print(np.concatenate((c,a.reshape(2,1)), axis=1))

New row:
[[1 2]
 [3 4]
 [0 1]]
New column:
[[1 2 0]
 [3 4 1]]
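
Without the reshape the call fails, since `concatenate` requires all inputs to have the same number of dimensions. The sketch below shows the error and also that `np.newaxis` can replace the explicit reshape:

```python
import numpy as np

c = np.arange(1, 5).reshape(2, 2)
a = np.arange(2)

try:
    np.concatenate((c, a))          # 2d and 1d: dimension mismatch
except ValueError as err:
    print("ValueError:", err)

# a[np.newaxis, :] has shape (1, 2), the same as a.reshape(1, 2)
new_row = np.concatenate((c, a[np.newaxis, :]))
print(new_row)
```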


Using `stack` to create higher dimensional arrays from lower dimensional arrays:

In [32]:
np.stack((b,b))

array([[2, 3, 4],
       [2, 3, 4]])

In [33]:
np.stack((b,b), axis=1)

array([[2, 2],
       [3, 3],
       [4, 4]])
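
The `axis` parameter of `stack` tells where the new dimension is inserted in the result. With three input arrays of shape `(2, 4)` the new axis has length 3, as the resulting shapes in this sketch show:

```python
import numpy as np

x = np.zeros((2, 4))
arrays = (x, x, x)
print(np.stack(arrays, axis=0).shape)   # (3, 2, 4)
print(np.stack(arrays, axis=1).shape)   # (2, 3, 4)
print(np.stack(arrays, axis=2).shape)   # (2, 4, 3)
```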

The inverse operation of `concatenate` is `split`. Its second argument specifies either the number of equal parts the array is divided into, or explicitly the break points. Like `concatenate`, it works along axis 0 by default.

In [39]:
d=np.arange(12).reshape(6,2)
print("d:")
print(d)
d1,d2 = np.split(d, 2)
print("d1:")
print(d1)
print("d2:")
print(d2)

d:
[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]]
d1:
[[0 1]
 [2 3]
 [4 5]]
d2:
[[ 6  7]
 [ 8  9]
 [10 11]]


In [47]:
d=np.arange(12).reshape(2,6)
print("d:")
print(d)
parts=np.split(d, (2,3,5), axis=1)
for i, p in enumerate(parts):
    print("part %i:" % i)
    print(p)

d:
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]
part 0:
[[0 1]
 [6 7]]
part 1:
[[2]
 [8]]
part 2:
[[ 3  4]
 [ 9 10]]
part 3:
[[ 5]
 [11]]
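
Since `split` is the inverse of `concatenate`, joining the parts along the same axis restores the original array. When an integer number of parts does not divide the axis length evenly, `split` raises an error, while the related function `np.array_split` allows parts of unequal size. A sketch:

```python
import numpy as np

d = np.arange(12).reshape(2, 6)
parts = np.split(d, (2, 3, 5), axis=1)
print(np.array_equal(np.concatenate(parts, axis=1), d))   # True

try:
    np.split(d, 4, axis=1)      # 6 columns cannot be divided into 4 equal parts
except ValueError as err:
    print("ValueError:", err)

uneven = np.array_split(d, 4, axis=1)
print([p.shape for p in uneven])   # [(2, 2), (2, 2), (2, 1), (2, 1)]
```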


## Fast computation using universal functions

## Aggregations: max, min, sum, mean, standard deviation...

## Broadcasting

## Comparisons and masking

## Fancy indexing

## Sorting arrays

## Matrix operations
    