# Day 4: Introduction to NumPy

## Examples

### Why do we need it?

Let's try some basic array operations on Python's built-in sequence type, the list:

In [1]:
x = [1, 2, 3]
y = [3, 4, 5]
x + y  # Add arrays

[1, 2, 3, 3, 4, 5]

In [2]:
x * 2  # Multiply the array by integer

[1, 2, 3, 1, 2, 3]

In [3]:
x * y  # Multiply matrix by matrix

TypeError: can't multiply sequence by non-int of type 'list'

Is this the result we want? In terms of matrix calculations, no: `+` concatenates lists, `*` repeats them, and list-by-list multiplication is not defined at all.
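
For comparison, getting element-wise results from plain lists takes an explicit loop or comprehension. A minimal sketch (the names `elementwise_sum` and `elementwise_product` are illustrative and not part of the original notebook):

```python
x = [1, 2, 3]
y = [3, 4, 5]

# Pair up the items with zip and combine them one by one
elementwise_sum = [a + b for a, b in zip(x, y)]
elementwise_product = [a * b for a, b in zip(x, y)]

print(elementwise_sum)      # [4, 6, 8]
print(elementwise_product)  # [3, 8, 15]
```

This works, but every operation has to be spelled out by hand, which is the gap NumPy fills.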

### Array creation

In [4]:
import numpy as np

In [5]:
x_a = np.array(x)
y_a = np.array(y)

In NumPy you can of course build 2-dimensional matrices:

In [6]:
arr2d = np.array(
    [x, y]
)  # If we use nested lists, numpy can build multi-dimensional arrays from them!
arr2d

array([[1, 2, 3],
       [3, 4, 5]])

In [7]:
arr2d.shape

(2, 3)

... or 3-dimensional matrices:

In [8]:
arr3d = np.array([[x, y], [x, y]])  # Nested lists again
arr3d

array([[[1, 2, 3],
        [3, 4, 5]],

       [[1, 2, 3],
        [3, 4, 5]]])

In [None]:
arr3d.shape

In [9]:
arr3d = np.array(
    [arr2d, arr2d, arr2d]
)  # You can also use a list of arrays to create a new one
print(arr3d)
print(arr3d.shape)

[[[1 2 3]
  [3 4 5]]

 [[1 2 3]
  [3 4 5]]

 [[1 2 3]
  [3 4 5]]]
(3, 2, 3)


You can build `N`-dimensional arrays in NumPy! But there is a limit: in NumPy 1.x, `N` must be no greater than 32 (NumPy 2.0 raised the limit to 64).

In [10]:
N = 32
arr = np.array([1, 2, 3])
for i in range(N - 1):
    arr = np.expand_dims(arr, axis=-1)
print("Shape of arr: {}".format(arr.shape))
print("Number of dimensions of arr: {}".format(arr.ndim))

Shape of arr: (3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
Number of dimensions of arr: 32


OK, so we know how to build arrays in a simple way. Let's play with the arrays a little bit!

### Array operations

In [11]:
x_a + y_a

array([4, 6, 8])

In [12]:
x_a * 2

array([2, 4, 6])

In [13]:
x_a * y_a  # This is element-wise multiplication

array([ 3,  8, 15])

In [14]:
np.dot(
    x_a, y_a
)  # This is matrix multiplication. For 1-dimensional arrays it's just a dot product
# Later we will see the dot product of 2-dimensional matrices

26

In [15]:
x_a @ y_a  # This is an alternative way to calculate the dot product

26

Now that's way more intuitive in terms of matrix calculations.
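
To make the link between the two kinds of multiplication explicit: for 1-dimensional arrays, the dot product is the sum of the element-wise products. A small sketch (the name `manual_dot` is illustrative):

```python
import numpy as np

x_a = np.array([1, 2, 3])
y_a = np.array([3, 4, 5])

# Dot product written out: multiply element-wise, then sum
manual_dot = (x_a * y_a).sum()  # 1*3 + 2*4 + 3*5

print(manual_dot)                      # 26
print(manual_dot == np.dot(x_a, y_a))  # True
```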

And now it's time to show 2-d matrix multiplication:

In [16]:
print("arr2d:\n{}".format(arr2d))
print("arr2d transposed:\n{}".format(arr2d.T))
# We have to transpose the second argument, because otherwise the dimensions won't match
product = np.dot(arr2d, arr2d.T)
print("Product of these matrices:\n{}".format(product))
product = arr2d @ arr2d.T
print("Product of these matrices, alternative way of calculation:\n{}".format(product))

arr2d:
[[1 2 3]
 [3 4 5]]
arr2d transposed:
[[1 3]
 [2 4]
 [3 5]]
Product of these matrices:
[[14 26]
 [26 50]]
Product of these matrices, alternative way of calculation:
[[14 26]
 [26 50]]
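
The shape rule behind the transpose can be checked directly: an `(m, k)` matrix can be multiplied by a `(k, n)` matrix, and the result has shape `(m, n)`. A short sketch (the flag `shapes_match` is just an illustrative helper variable):

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [3, 4, 5]])  # shape (2, 3)

# (2, 3) @ (3, 2) -> (2, 2): the inner dimensions (3 and 3) match
print((arr2d @ arr2d.T).shape)  # (2, 2)

# (3, 2) @ (2, 3) -> (3, 3): the order of the operands matters
print((arr2d.T @ arr2d).shape)  # (3, 3)

# (2, 3) @ (2, 3): the inner dimensions (3 and 2) differ, so NumPy raises ValueError
try:
    arr2d @ arr2d
    shapes_match = True
except ValueError:
    shapes_match = False
print(shapes_match)  # False
```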


### Array slicing and indexing

In [17]:
x_a  # This is our array

array([1, 2, 3])

In [18]:
type(x_a)  # Type of the array

numpy.ndarray

In [19]:
x_a[0], x_a[1], x_a[2]  # Elements - we can use brackets [] to access them

(1, 2, 3)

In [20]:
x_a[0] = 5  # We can also substitute elements of arrays
x_a

array([5, 2, 3])

In [21]:
arr2d = np.array([x, y, x])  # Now let's create 2-dimensional array
arr2d

array([[1, 2, 3],
       [3, 4, 5],
       [1, 2, 3]])

In [22]:
arr2d[0]  # This is the first row

array([1, 2, 3])

In [23]:
print(
    "Type of array slice: {}\nDimensions of the slice: {}".format(
        type(arr2d[0]), arr2d[0].shape
    )
)

Type of array slice: <class 'numpy.ndarray'>
Dimensions of the slice: (3,)


In [24]:
arr2d[0, 2]  # Here we take the element from 0-th row and 2-nd column directly

3

In [25]:
arr2d[0][2]  # Here we first take 0-th row, then 2-nd element of this row

3

In [26]:
arr2d[
    :, 0
]  # Let's get the first column, colon means: "Take all items of this dimension"

array([1, 3, 1])

In [27]:
arr2d[
    0, :
]  # So here we take the first row and all the columns, but as we saw earlier, there is a shortcut

array([1, 2, 3])

In [28]:
arr2d  # Let's see again how this array looks...

array([[1, 2, 3],
       [3, 4, 5],
       [1, 2, 3]])

In [29]:
arr2d[:2, 1:]  # ...and try some more sophisticated indexing,
# here: take all rows up to row number 2 (without this row, remember about 0-indexing!)
# and all columns from column number 1 to the end (including this column)

array([[2, 3],
       [4, 5]])

In [30]:
arr2d[:2, 1:].shape

(2, 2)

In [31]:
arr2d[:-1, 0:-1]  # We can also take all rows up to the last row (excluding it)
# and all columns from column number 0 up to the last column (excluding it)

array([[1, 2],
       [3, 4]])
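
Slices also accept a third value, the step, just like Python lists. A brief sketch of the `start:stop:step` form on the same array (the variable names are illustrative):

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [3, 4, 5], [1, 2, 3]])

every_other_row = arr2d[::2]       # rows 0 and 2
reversed_columns = arr2d[:, ::-1]  # all rows, columns in reverse order

print(every_other_row)
print(reversed_columns)
```

Here `every_other_row` holds the two `[1, 2, 3]` rows, and `reversed_columns` holds each row read right to left.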

### Implemented ways to build standard arrays

Got some intuition about indexing already? Let's see how else we can build matrices!

In [32]:
range_arr = np.arange(
    5
)  # Just get all integers from zero to this argument (not including it)
print(range_arr)
range_arr = np.arange(3, 8)  # Get all integers from start to stop (stop is not included)
print(range_arr)
range_arr = np.arange(
    1, 10, 3
)  # Get all integers from start to stop with given step, stop is not included!
print(range_arr)

[0 1 2 3 4]
[3 4 5 6 7]
[1 4 7]


This is a good point to see how much faster NumPy is than pure-Python summation:

In [33]:
%%timeit -n 100
N = 10000
list_range = range(N)
sum(list_range)

The slowest run took 4.07 times longer than the fastest. This could mean that an intermediate result is being cached.
172 µs ± 78.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [46]:
%%timeit -n 100
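# N must already exist in the notebook namespace: names assigned inside a
# %%timeit cell (such as N = 10000 in the previous cell) are not kept afterwards
arr_range = np.arange(N)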
arr_range.sum()

The slowest run took 4.79 times longer than the fastest. This could mean that an intermediate result is being cached.
3.29 µs ± 2.06 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
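
For a like-for-like comparison outside a notebook, the standard-library `timeit` module can time both sums on the same `N`. The following is only a sketch: the names `py_list`, `np_arr`, `list_time` and `array_time` are illustrative, and the measured times depend on the machine, so no expected numbers are shown.

```python
import timeit

import numpy as np

N = 10_000
py_list = list(range(N))  # a real Python list
np_arr = np.arange(N)     # a NumPy array with the same values

# Both statements sum the same N values; the data is built outside the timed code
list_time = timeit.timeit(lambda: sum(py_list), number=100)
array_time = timeit.timeit(lambda: np_arr.sum(), number=100)

print("sum(list):   {:.6f} s for 100 runs".format(list_time))
print("array.sum(): {:.6f} s for 100 runs".format(array_time))
```

Building the data before timing keeps the measurement focused on the summation itself rather than on creating the list or array.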


But how exactly does `np.ndarray.sum()` work?

In [45]:
print(arr2d.sum())  # This is a sum of all elements
print(arr2d.sum(axis=0))  # Sum along axis 0 (down the rows): one sum per column
print(arr2d.sum(axis=1))  # Sum along axis 1 (across the columns): one sum per row

24
[ 5  8 11]
[ 6 12  6]
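
Because the array above is square, both results have the same length. A non-square array makes it easier to see which axis disappears. A small sketch (the names `m`, `col_sums` and `row_sums` are illustrative):

```python
import numpy as np

m = np.array([[1, 2, 3], [3, 4, 5]])  # shape (2, 3): 2 rows, 3 columns

col_sums = m.sum(axis=0)  # collapses axis 0 (rows)    -> shape (3,)
row_sums = m.sum(axis=1)  # collapses axis 1 (columns) -> shape (2,)

print(col_sums)  # [4 6 8]
print(row_sums)  # [ 6 12]
```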


In [36]:
linspace = np.linspace(
    0, 1
)  # This function creates an equally spaced sequence of numbers from start to stop
# Here we don't specify the step, but the number of items. Default is 50
print(linspace)
linspace = np.linspace(0, 1, endpoint=False)  # We can also exclude the stop value
print(linspace)
linspace = np.linspace(0, 1, 10, endpoint=False)
print(linspace)

[0.         0.02040816 0.04081633 0.06122449 0.08163265 0.10204082
 0.12244898 0.14285714 0.16326531 0.18367347 0.20408163 0.2244898
 0.24489796 0.26530612 0.28571429 0.30612245 0.32653061 0.34693878
 0.36734694 0.3877551  0.40816327 0.42857143 0.44897959 0.46938776
 0.48979592 0.51020408 0.53061224 0.55102041 0.57142857 0.59183673
 0.6122449  0.63265306 0.65306122 0.67346939 0.69387755 0.71428571
 0.73469388 0.75510204 0.7755102  0.79591837 0.81632653 0.83673469
 0.85714286 0.87755102 0.89795918 0.91836735 0.93877551 0.95918367
 0.97959184 1.        ]
[0.   0.02 0.04 0.06 0.08 0.1  0.12 0.14 0.16 0.18 0.2  0.22 0.24 0.26
 0.28 0.3  0.32 0.34 0.36 0.38 0.4  0.42 0.44 0.46 0.48 0.5  0.52 0.54
 0.56 0.58 0.6  0.62 0.64 0.66 0.68 0.7  0.72 0.74 0.76 0.78 0.8  0.82
 0.84 0.86 0.88 0.9  0.92 0.94 0.96 0.98]
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
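
If you want to know which step `np.linspace` ended up using, the `retstep=True` argument returns it alongside the values. A short sketch with 5 items instead of 50 to keep the output readable:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(0, 1, 5, retstep=True)
print(values)  # [0.   0.25 0.5  0.75 1.  ]
print(step)    # 0.25

# With endpoint=False the stop value is excluded and the spacing is (stop - start) / num
values_open, step_open = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(values_open)  # [0.  0.2 0.4 0.6 0.8]
print(step_open)    # 0.2
```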


In [37]:
ones = np.ones([3, 3])  # The argument indicates dimensions of target array
ones

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [38]:
zeros = np.zeros((5, 5), dtype=np.uint8)  # We can also pass dimensions as a tuple
# You can always specify type of data that the array contains
print(zeros)
print(type(zeros[0, 0]), zeros.dtype)

[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
<class 'numpy.uint8'> uint8


In [39]:
eye = np.eye(
    4
)  # This function creates a matrix with ones on the diagonal. In its simplest form it takes
# just one argument, the number of rows. In that case the result is a square identity matrix
eye

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [49]:
print(eye.dtype)
eye.dtype = np.uint16  # You can change the dtype attribute of an array in place...
print(eye.dtype)
print(eye)  # ...but be careful: the bytes are reinterpreted, not converted

float64
uint16
[[    0     0     0 16368     0     0     0     0     0     0     0     0
      0     0     0     0]
 [    0     0     0     0     0     0     0 16368     0     0     0     0
      0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0     0 16368
      0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0 16368]]


In [41]:
eye = np.eye(4)
print(eye.dtype)
eye = np.array(eye, dtype=np.uint16)  # This is much safer
print(eye.dtype)
print(eye)

float64
uint16
[[1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]]
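
Another common way to convert values is `ndarray.astype`, while `ndarray.view` reinterprets the raw bytes much like assigning to `dtype` does. The sketch below contrasts the two on a small 2x2 matrix (the names `eye_u16` and `eye_bytes` are illustrative):

```python
import numpy as np

eye = np.eye(2)  # float64 by default

# astype returns a new array with converted values; the original is unchanged
eye_u16 = eye.astype(np.uint16)
print(eye_u16.dtype)  # uint16
print(eye.dtype)      # float64

# view reinterprets the same bytes instead of converting values,
# so each 8-byte float64 becomes four 2-byte uint16 items
eye_bytes = eye.view(np.uint16)
print(eye_bytes.shape)  # (2, 8)
```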


In [42]:
new_arr = zeros  # Let's create a new matrix...
new_arr[1:-1, 1:-1] = ones  # ...and substitute a whole sub-matrix
new_arr

array([[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]], dtype=uint8)

In [43]:
print(zeros)  # Let's check our zeros matrix

[[0 0 0 0 0]
 [0 1 1 1 0]
 [0 1 1 1 0]
 [0 1 1 1 0]
 [0 0 0 0 0]]


This is very important. In Python, assignment binds a new name to the same object rather than copying it, so `new_arr` and `zeros` refer to one and the same array. To get an independent copy of an array, you have to ask for it explicitly:

In [44]:
zeros = np.zeros((5, 5), dtype=np.uint8)
new_arr = zeros.copy()
new_arr[1:-1, 1:-1] = ones
print(new_arr)
print(zeros)  # Now only new_arr has been modified

[[0 0 0 0 0]
 [0 1 1 1 0]
 [0 1 1 1 0]
 [0 1 1 1 0]
 [0 0 0 0 0]]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
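
You can check whether two names point at the same data with the `is` operator and `np.shares_memory`. The sketch below also shows that a basic slice is a view on the original array, not a copy (the names `alias`, `independent` and `window` are illustrative):

```python
import numpy as np

zeros = np.zeros((3, 3), dtype=np.uint8)

alias = zeros               # same object under a second name
independent = zeros.copy()  # new array with its own data
window = zeros[1:, 1:]      # basic slicing returns a view on the same data

print(alias is zeros)                        # True
print(np.shares_memory(zeros, independent))  # False
print(np.shares_memory(zeros, window))       # True

window[:] = 1  # writing through the view changes `zeros` too
print(zeros)
print(independent)  # still all zeros
```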


NumPy also has a nice module for random number generation:

In [None]:
random_array = np.random.random(
    [2, 3, 3]
)  # The argument is the shape of the array. The returned array consists of elements
# drawn from a uniform distribution on the half-open interval [0, 1)
random_array

In [None]:
a = 3
b = 15  # To change the interval just do some simple operations
random_new_interval = random_array * (b - a) + a
random_new_interval
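
The rescaling works because multiplying by `(b - a)` stretches the unit interval to the target width and adding `a` shifts it to the lower bound. A quick sketch that checks the bounds and shows `np.random.uniform`, which draws from a given interval directly (the names `unit_samples`, `rescaled` and `direct` are illustrative):

```python
import numpy as np

a, b = 3, 15
unit_samples = np.random.random((2, 3, 3))  # values in [0, 1)

# Scale by the interval width, then shift by the lower bound
rescaled = unit_samples * (b - a) + a

# np.random.uniform draws from the interval between a and b directly
direct = np.random.uniform(a, b, size=(2, 3, 3))

print(rescaled.min() >= a, rescaled.max() <= b)  # True True
print(direct.shape)  # (2, 3, 3)
```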

In [None]:
random_ints = np.random.randint(
    3, 8
)  # This returns a single integer randomly selected from the interval [3, 8), so 8 itself is never returned
print(random_ints)
random_ints = np.random.randint(3, 8, [3, 5])  # But this will give us a 3x5 array
print(random_ints)
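
Random results differ on every run, which makes them hard to reproduce. One option, assuming NumPy 1.17 or newer, is the `np.random.default_rng` generator with a fixed seed. This is a sketch of that alternative API rather than what the notebook itself uses, and the seed value 42 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value 42 is arbitrary

ints = rng.integers(3, 8, size=(3, 5))  # the upper bound 8 is excluded by default
print(ints.shape)  # (3, 5)
print(ints.min() >= 3, ints.max() <= 7)  # True True

# A generator created with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(ints, rng_again.integers(3, 8, size=(3, 5))))  # True
```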

### Let's see how to join arrays together!

In [None]:
rand_arr = np.random.random([4, 3])
print(rand_arr)
random_arr_sequence = [rand_arr] * 5

In [None]:
stacked = np.stack(random_arr_sequence)  # This function stacks all arrays
# in the sequence and creates new dimension
print(stacked.shape)
stacked = np.stack(random_arr_sequence, axis=1)  # You can also specify dimension,
# along which they will be stacked
print(stacked.shape)
concatenated = np.concatenate(
    random_arr_sequence
)  # This function joins matrices along the first dimension
print(concatenated.shape)
concatenated = np.concatenate(
    random_arr_sequence, axis=1
)  # But of course you can select it by hand
print(concatenated.shape)
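
The difference between the two functions is easiest to see in the resulting shapes: `np.stack` adds a new axis, while `np.concatenate` extends an existing one. A sketch with five arrays of shape `(4, 3)`, using zeros instead of random values so that only the shapes matter (the name `parts` is illustrative):

```python
import numpy as np

parts = [np.zeros((4, 3))] * 5  # five arrays of shape (4, 3)

print(np.stack(parts).shape)                # (5, 4, 3): new leading axis
print(np.stack(parts, axis=1).shape)        # (4, 5, 3): new axis at position 1
print(np.concatenate(parts).shape)          # (20, 3): rows appended, no new axis
print(np.concatenate(parts, axis=1).shape)  # (4, 15): columns appended
```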

### Finally, let's see how NumPy can describe our data:

In [None]:
random_ints = np.random.randint(3, 80, 10)
print(random_ints)
max_v = random_ints.max()
print("Maximum value in the array: {}".format(max_v))
argmax = np.argmax(random_ints)
print("Index of maximum value in the array: {}".format(argmax))
min_v = random_ints.min()
print("Minimum value in the array: {}".format(min_v))
argmin = np.argmin(random_ints)
print("Index of minimum value in the array: {}".format(argmin))

In [None]:
mean = random_ints.mean()
print("Mean of the array: {}".format(mean))
std = random_ints.std()
print("Standard deviation of the array: {}".format(std))
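
Since the random arrays above change on every run, here is the same set of statistics on a fixed array whose results can be verified by hand. The array `data` is an illustrative choice; note that `std()` computes the population standard deviation by default (`ddof=0`).

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print(data.max(), np.argmax(data))  # 9 7
print(data.min(), np.argmin(data))  # 2 0
print(data.mean())                  # 5.0
print(data.std())                   # 2.0 (population standard deviation, ddof=0)
print(data.std(ddof=1))             # sample standard deviation, divides by N - 1
```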