---
<h1><center>Arrays (NumPy)</center></h1>

---

# Introduction

In [67]:
import numpy as np

Above, we imported a new module, `numpy`. The `numpy` package is used in almost all numerical computing with Python. It provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good.

Below we cover only a small part of NumPy's features and functions. More information can be found on the [official site](http://www.numpy.org/).

# Table of Contents
- [The NumPy array object](#The-NumPy-array-object) 
- [NumPy array-generating functions](#NumPy-array-generating-functions)
- [Manipulating arrays](#Manipulating-arrays)
- [Operations on NumPy arrays](#Operations-on-NumPy-arrays)
- [Data processing](#Data-processing)
- [*Exercise 1.1*](#Exercise-1.1)
- [*Exercise 1.2*](#Exercise-1.2)
- [*Exercise 1.3*](#Exercise-1.3)
- [*Exercise 1.4*](#Exercise-1.4)

### The NumPy array object

[[back to top]](#Table-of-Contents)

To create new vector and matrix arrays from Python lists or tuples we can use the `numpy.array` function.

In [69]:
v = np.array([1,2,3,4])
print ("NumPy one dimensional array (vector):\nv =", v)

M = np.array([[1, 2], [3, 4], [5, 6]])
print ("\nNumPy two dimensional array (matrix) M:\n", M)

# The `v` and `M` objects are both of the type `ndarray` that the `numpy` module provides.
print ("\nTypes of v and M:", type(v), type(M))

NumPy one dimensional array (vector):
v = [1 2 3 4]

NumPy two dimensional array (matrix) M:
 [[1 2]
 [3 4]
 [5 6]]

Types of v and M: <class 'numpy.ndarray'> <class 'numpy.ndarray'>


The only difference between the `v` and `M` arrays is their shape. We can get the shape of an array through the `ndarray.shape` property, and the number of elements through the `ndarray.size` property. Equivalently, we could use the functions `numpy.shape` and `numpy.size`.

In [72]:
print ("v.shape:", v.shape)
print ("M.shape:", M.shape)

print ("\nv contains {} elements".format(v.size))
print ("M contains {} elements".format(M.size))

v.shape: (4,)
M.shape: (3, 2)

v contains 4 elements
M contains 6 elements
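As a quick check of the equivalence mentioned above, the module-level functions `np.shape` and `np.size` return the same values as the attributes, and `np.size` also accepts an axis argument. A minimal sketch:

```python
import numpy as np

v = np.array([1, 2, 3, 4])
M = np.array([[1, 2], [3, 4], [5, 6]])

# The module-level functions agree with the attributes
print("np.shape(M):", np.shape(M))   # (3, 2), same as M.shape
print("np.size(M):", np.size(M))     # 6, same as M.size

# np.size can also count the elements along a single axis
print("rows:", np.size(M, 0))        # 3
print("columns:", np.size(M, 1))     # 2

# Both functions also accept plain Python lists
print("np.shape([[1, 2, 3]]):", np.shape([[1, 2, 3]]))  # (1, 3)
```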


More properties of NumPy arrays:

In [53]:
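# `dtype` (data type) shows the type of the array's elements.
# The default integer type is platform dependent (int32 on Windows with NumPy < 2.0,
# int64 on most other systems), so the output below may differ on your machine.
print ("M data type:", M.dtype)

# `itemsize` returns the number of bytes per element
print ("M.itemsize:", M.itemsize) 

# `nbytes` returns the total number of bytes (size * itemsize)
print ("M.nbytes:", M.nbytes)

# `ndim` returns the number of dimensions
print ("M.ndim:", M.ndim)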

M data type: int32
M.itemsize: 4
M.nbytes: 24
M.ndim: 2


If we want, we can explicitly define the type of the array data when we create it, using the `dtype` keyword argument. Common data types that can be used with `dtype` are `int`, `float`, `complex`, `bool`, `object`, etc. We can also explicitly define the bit size of the data type, for example `int32`, `int16`, `float64` or `complex128`.

In [57]:
a = np.array([[1, 2], [3, 4]], dtype="float64")
print ("a:\n", a)
print ("\na.dtype:", a.dtype)

a:
 [[1. 2.]
 [3. 4.]]

a.dtype: float64
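To see how the bit size of a data type affects memory use, here is a small sketch comparing `int16` and `int64` versions of the same values. It also shows a `complex` array, and that assigning a float into an integer array truncates the value rather than changing the dtype:

```python
import numpy as np

# The same values stored with two different integer bit sizes
a16 = np.array([1, 2, 3], dtype=np.int16)
a64 = np.array([1, 2, 3], dtype=np.int64)
print(a16.dtype, a16.itemsize, a16.nbytes)  # int16 2 6
print(a64.dtype, a64.itemsize, a64.nbytes)  # int64 8 24

# `complex` maps to complex128: a 64-bit real part plus a 64-bit imaginary part
c = np.array([1, 2], dtype=complex)
print(c.dtype, c)  # complex128 [1.+0.j 2.+0.j]

# Assigning a float into an integer array truncates it; the dtype does not change
a16[0] = 9.7
print(a16, a16.dtype)  # [9 2 3] int16
```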


### NumPy array-generating functions

[[back to top]](#Table-of-Contents)

For larger arrays it is impractical to initialize the data manually using explicit Python lists. Instead, we can use one of the many NumPy functions that generate arrays of different forms. Some of the more common ones are:
* `arange`
* `linspace`
* `random.rand` and `random.randn`
* `diag`
* `ones`
* `zeros`
* `identity`

In [74]:
print ("numpy.arange:")
x_1 = np.arange(0, 10, 1)      # arguments: start, stop (excluded), step
print ("np.arange(0, 10, 1):\n", x_1)
print ("np.arange(-3, 3, 0.5):\n", np.arange(-3, 3, 0.5)) 

numpy.arange:
np.arange(0, 10, 1):
 [0 1 2 3 4 5 6 7 8 9]
np.arange(-3, 3, 0.5):
 [-3.  -2.5 -2.  -1.5 -1.  -0.5  0.   0.5  1.   1.5  2.   2.5]
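A few more `arange` variants, as a sketch: a single argument means "from 0 up to, but excluding, this value", a negative step counts down, and `dtype` can be set explicitly. For non-integer steps, the NumPy documentation suggests `linspace` instead, because floating-point rounding can make the number of elements returned by `arange` hard to predict.

```python
import numpy as np

# With a single argument, arange starts at 0 with step 1; stop is excluded
r1 = np.arange(5)
print(r1)   # [0 1 2 3 4]

# A negative step counts down (stop is still excluded)
r2 = np.arange(10, 0, -3)
print(r2)   # [10  7  4  1]

# The dtype can be set explicitly
r3 = np.arange(3, dtype=float)
print(r3)   # [0. 1. 2.]
```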


In [125]:
print ("numpy.linspace")
# with linspace and logspace, both end points ARE included (by default)
x_2 = np.linspace(0, 5, 9)   # arguments: start, stop, total number of points (including both end points)
print ("np.linspace(0, 5, 9):\n", x_2)

numpy.linspace
np.linspace(0, 5, 9):
 [0.    0.625 1.25  1.875 2.5   3.125 3.75  4.375 5.   ]
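`linspace` has a couple of useful keyword arguments, and the `logspace` function mentioned in the comment above works the same way on a logarithmic scale. A short sketch:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points
x, step = np.linspace(0, 5, 9, retstep=True)
print(step)   # 0.625, i.e. (5 - 0) / (9 - 1)

# endpoint=False excludes the stop value; the spacing becomes (stop - start) / num
y = np.linspace(0, 5, 5, endpoint=False)
print(y)      # [0. 1. 2. 3. 4.]

# logspace: points evenly spaced on a log scale, from 10**start to 10**stop
z = np.logspace(0, 3, 4)
print(z)      # the values 1, 10, 100 and 1000
```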


In [75]:
from numpy import random

print ("numpy.random.rand and numpy.random.randn:") 
# uniform random numbers in [0, 1), multiplied here by 10 to span [0, 10)
print ("random.rand(5,5):\n", random.rand(5,5) * 10)
# standard normal distributed random numbers
print ("random.randn(5,5):\n", random.randn(5,5))

numpy.random.rand and numpy.random.randn:
random.rand(5,5):
 [[9.66832442 0.4582758  0.64252103 3.94388931 3.46967367]
 [4.55751344 7.61840477 9.41196798 7.40155428 5.37179319]
 [5.11408012 9.79312763 5.14929972 5.80663039 1.4184351 ]
 [9.03391026 5.6798925  9.21637857 3.92886926 8.97018879]
 [5.3860677  3.21808388 8.65347748 7.61761933 0.1361533 ]]
random.randn(5,5):
 [[-0.0310009  -0.17673476  0.35656924 -1.5125911  -0.69324119]
 [ 0.86696009  1.49958089  1.95978315 -0.63974811 -0.99750565]
 [ 0.67088856 -0.67077254  2.24506476 -0.36130824  0.57282764]
 [-0.08129497 -0.75063689  0.3045234   0.71318213 -0.35455603]
 [-0.7486405  -1.16085551 -0.50259422 -0.57986373  0.32109917]]
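Because `rand` and `randn` produce different numbers on every run, the output above will not be reproduced exactly. For new code, the NumPy documentation recommends the `Generator` interface created with `np.random.default_rng`; seeding it makes results reproducible. A sketch (the seed value 42 is arbitrary):

```python
import numpy as np

# A seeded Generator makes the "random" numbers reproducible between runs
rng = np.random.default_rng(seed=42)
u = rng.random((2, 3))            # uniform floats in [0, 1)
g = rng.standard_normal((2, 3))   # standard normal (mean 0, std 1)
k = rng.integers(0, 10, size=5)   # integers in [0, 10)

# Re-creating the generator with the same seed reproduces the same sequence
rng2 = np.random.default_rng(seed=42)
print(np.array_equal(u, rng2.random((2, 3))))  # True
```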


In [127]:
print ("numpy.diag:")
# a diagonal matrix
print ("np.diag([1,2,3]):\n", np.diag([1,2,3]))
# diagonal with offset from the main diagonal
print ("np.diag([1,2,3], k=1):\n", np.diag([1,2,3], k=1)) 

numpy.diag:
np.diag([1,2,3]):
 [[1 0 0]
 [0 2 0]
 [0 0 3]]
np.diag([1,2,3], k=1):
 [[0 1 0 0]
 [0 0 2 0]
 [0 0 0 3]
 [0 0 0 0]]
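`np.diag` also works in the other direction: given a 2-D array, it extracts a diagonal instead of building a matrix. A small sketch with an example 3 x 3 matrix `D`:

```python
import numpy as np

D = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Given a 2-D array, np.diag returns a diagonal as a 1-D array
print(np.diag(D))        # [1 5 9]
print(np.diag(D, k=1))   # [2 6]  (first diagonal above the main one)
print(np.diag(D, k=-1))  # [4 8]  (first diagonal below the main one)
```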


In [128]:
print ("numpy.zeros, numpy.ones and numpy.identity:") 
# array with zeros
print ("np.zeros((3,3)):\n", np.zeros((3,3)))
# array of ones
print ("np.ones((3,3)):\n", np.ones((3,3)))
# identity matrix
print ("np.identity(4):\n", np.identity(4))  # argument is the size n of the n x n matrix

numpy.zeros, numpy.ones and numpy.identity:
np.zeros((3,3)):
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
np.ones((3,3)):
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
np.identity(4):
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
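Some related constructors, as a sketch: `zeros` and `ones` accept a `dtype` argument (they default to `float64`, as the output above shows), `np.eye` generalizes `identity` with a diagonal offset `k`, and `np.full` fills an array with a constant:

```python
import numpy as np

# zeros/ones default to float64; pass dtype for another type
z = np.zeros((2, 3), dtype=int)
print(z.dtype)  # the platform's default integer type

# np.eye is like identity but allows an offset k from the main diagonal
e = np.eye(3, k=1)
print(e)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]

# np.full fills an array of the given shape with a constant
f = np.full((2, 2), 7)
print(f)
# [[7 7]
#  [7 7]]
```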


### Manipulating arrays

[[back to top]](#Table-of-Contents)

**Indexing**

We can index elements in an array using square brackets and indices.

In [78]:
# v is a vector, and has only one dimension, taking one index
print ("v[0] =", v[0])

# M is a matrix, or a 2 dimensional array, taking two indices 
print ("\nM[1,1] =", M[1,1])

# If we omit an index of a multidimensional array, it returns the whole row (in general, an (N-1)-dimensional array)
print ("\nThe second row of M: M[1] =", M[1])
# The same thing can be achieved by using : instead of an index
print ("\nM[1,:] returns also the second row of M:", M[1,:]) # row 1

print ("\nThe second column of M: M[:,1] =", M[:,1]) # column 1

v[0] = 1

M[1,1] = 4

The second row of M: M[1] = [3 4]

M[1,:] returns also the second row of M: [3 4]

The second column of M: M[:,1] = [2 4 6]
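Indexing also accepts negative indices, which count from the end, and slices of the form `start:stop:step` with the stop index excluded. A short sketch using the same `v` and `M` values as above:

```python
import numpy as np

v = np.array([1, 2, 3, 4])
M = np.array([[1, 2], [3, 4], [5, 6]])

# Negative indices count from the end
print(v[-1])       # 4
print(M[-1, 0])    # 5 (last row, first column)

# Slices are start:stop:step, with stop excluded
print(v[1:3])      # [2 3]
print(v[::2])      # [1 3]
print(M[1:, 0])    # [3 5] (rows 1 onward, column 0)
```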


In [130]:
print ("Initial matrix M:")
print (M)

# We can assign new values to elements in an array using indexing:
M[0,0] = -1
print ("\nElement M[0,0] was reassigned to -1:")
print (M)

# Assignment also works for whole rows and columns
M[2,:] = 0
M[:,1] = -1
print ("\nThe third row was set to zeros, and then the second column elements were set to -1:")
print (M)

Initial matrix M:
[[1 2]
 [3 4]
 [5 6]]

Element M[0,0] was reassigned to -1:
[[-1  2]
 [ 3  4]
 [ 5  6]]

The third row was set to zeros, and then the second column elements were set to -1:
[[-1 -1]
 [ 3 -1]
 [ 0 -1]]
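One consequence of assigning through slices is worth knowing: basic slicing such as `N[0, :]` returns a *view* that shares memory with the original array, so modifying the slice modifies the original too. Call `.copy()` when an independent array is needed. A sketch with an example array `N`:

```python
import numpy as np

N = np.array([[1, 2], [3, 4], [5, 6]])

# Basic slicing returns a view that shares memory with N
row = N[0, :]
row[0] = 100
print(N[0, 0])                    # 100: N changed through the view
print(np.shares_memory(row, N))   # True

# .copy() returns an independent array
row_copy = N[1, :].copy()
row_copy[0] = -100
print(N[1, 0])                    # 3: N is unchanged
```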


We can also use index masks: if the index mask is a NumPy array of data type `bool`, then an element is selected (`True`) or not (`False`) depending on the value of the mask at that element's position:

In [131]:
x = np.arange(0, 10, 0.5)
print ("Initial array x:\n", x)

# Set the mask for filtering
mask = (5 < x) & (x < 7.5)   # & combines the two boolean arrays element-wise (logical AND)
print ("\nMask:\n", mask)

print ("\nElements of x satisfying mask conditions:\n", x[mask])

Initial array x:
 [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5.  5.5 6.  6.5 7.  7.5 8.  8.5
 9.  9.5]

Mask:
 [False False False False False False False False False False False  True
  True  True  True False False False False False]

Elements of x satisfying mask conditions:
 [5.5 6.  6.5 7. ]
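Conditions can be combined with the element-wise operators `&` (and), `|` (or) and `~` (not); the comparisons need parentheses because these operators bind more tightly than `<` and `>`. A mask can also appear on the left-hand side of an assignment. A sketch reusing the same `x`:

```python
import numpy as np

x = np.arange(0, 10, 0.5)

# Elements below 1 or above 8.5
outside = (x < 1) | (x > 8.5)
print(x[outside])         # [0.  0.5 9.  9.5]
print(x[~outside].size)   # 16, all remaining elements

# Boolean-mask assignment: set every element greater than 8 to 0
y = x.copy()
y[y > 8] = 0
print(y[-4:])             # 8.0 is kept; 8.5, 9.0 and 9.5 became 0
```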


A boolean index mask can be converted to positional indices using the `numpy.where` function.

In [132]:
# Indices of the elements that satisfy the condition
indices = np.where(x % 2 == 0)
print ("Indices of `True` elements:\n", indices)

new_x = x[indices] # equivalent to boolean indexing with x[x % 2 == 0]
print ("\nFiltered values:\n", new_x)

# With three arguments, np.where returns 1 where the condition holds and 0 elsewhere
ind = np.where(x % 2 == 0, 1, 0)
print ("\nx array, where positions satisfying the condition are marked as 1 and others as 0:\n", ind)

# This is handy for counting how many elements satisfy the condition
print ("\nHow many elements satisfy the condition:", ind.sum())

Indices of `True` elements:
 (array([ 0,  4,  8, 12, 16], dtype=int64),)

Filtered values:
 [0. 2. 4. 6. 8.]

x array, where positions satisfying the condition are marked as 1 and others as 0:
 [1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0]

How many elements satisfy the condition: 5
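Two related shortcuts, as a sketch: `np.where` with only a condition behaves like `np.nonzero`, and `np.count_nonzero` counts the `True` values of a mask directly. For a 2-D array, the result contains one index array per dimension:

```python
import numpy as np

x = np.arange(0, 10, 0.5)
cond = x % 2 == 0

# np.where(cond) with one argument is equivalent to np.nonzero(cond)
print(np.array_equal(np.where(cond)[0], np.nonzero(cond)[0]))  # True

# Counting directly on the boolean mask
print(np.count_nonzero(cond))   # 5

# For a 2-D array, np.where returns (row indices, column indices)
B = np.array([[1, 8], [9, 2]])
rows, cols = np.where(B > 5)
print(rows, cols)               # [0 1] [1 0]
```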


### Operations on NumPy arrays

[[back to top]](#Table-of-Contents)

We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers.

In [133]:
v1 = np.arange(0, 5)
print ("Operations on the array v1:", v1)

print ("\nv1 * 2:", v1 * 2)
print ("\nv1 + 2:", v1 + 2)

M1 = np.random.randint(10, size=(5, 5))
print ("\n\nOperations on the array M1:\n", M1)

print ("\nM1 * 2:\n", M1 * 2)
print ("\nM1 + 2:\n", M1 + 2)

Operations on the array v1: [0 1 2 3 4]

v1 * 2: [0 2 4 6 8]

v1 + 2: [2 3 4 5 6]


Operations on the array M1:
 [[4 9 6 3 4]
 [9 0 7 3 6]
 [9 1 7 1 7]
 [1 8 7 4 9]
 [7 3 9 1 8]]

M1 * 2:
 [[ 8 18 12  6  8]
 [18  0 14  6 12]
 [18  2 14  2 14]
 [ 2 16 14  8 18]
 [14  6 18  2 16]]

M1 + 2:
 [[ 6 11  8  5  6]
 [11  2  9  5  8]
 [11  3  9  3  9]
 [ 3 10  9  6 11]
 [ 9  5 11  3 10]]
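Other arithmetic operators behave the same way. Note that true division `/` always produces floats, even for integer arrays, while floor division `//` and modulo `%` keep integers. A sketch with the same `v1`:

```python
import numpy as np

v1 = np.arange(0, 5)

print(v1 / 2)    # [0.  0.5 1.  1.5 2. ]  true division gives floats
print(v1 // 2)   # [0 0 1 1 2]            floor division keeps integers
print(v1 % 2)    # [0 1 0 1 0]
print(v1 ** 2)   # [ 0  1  4  9 16]       exponentiation is element-wise too
```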


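When we add, subtract, multiply and divide arrays with each other, the default behaviour is element-wise. (`M1` is generated randomly, and the outputs below come from an earlier run, so they do not match the `M1` printed above.)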

In [18]:
# Element-wise multiplication
print ("v1 * v1:", v1 * v1)
print ("\nM1 * M1:\n", M1 * M1)

# If the shapes are compatible, NumPy broadcasts: v1 (shape (5,)) is multiplied element-wise with each row of M1 (shape (5, 5))
print ("\nM1.shape, v1.shape:", M1.shape, v1.shape)
print ("M1 * v1:\n", M1 * v1)

v1 * v1: [ 0  1  4  9 16]

M1 * M1:
 [[81 16  9 25  9]
 [25  9 49 81  1]
 [81 16 36  9 16]
 [64  9 81 64 16]
 [64 16 25 16  9]]

M1.shape, v1.shape: (5, 5) (5,)
M1 * v1:
 [[ 0  4  6 15 12]
 [ 0  3 14 27  4]
 [ 0  4 12  9 16]
 [ 0  3 18 24 16]
 [ 0  4 10 12 12]]
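Broadcasting only works when the trailing dimensions match (or one of them is 1). To scale each *row* by its own value, the vector can be reshaped into a column with `np.newaxis`. Since `*` is element-wise, a true matrix product uses the `@` operator. A sketch with a small example matrix `P`:

```python
import numpy as np

P = np.arange(6).reshape(3, 2)   # [[0 1] [2 3] [4 5]]
col = np.array([10, 20, 30])     # one value per row

# P * col would fail: shapes (3, 2) and (3,) do not broadcast.
# As a (3, 1) column, col scales each row of P by its own value.
print(P * col[:, np.newaxis])
# [[  0  10]
#  [ 40  60]
#  [120 150]]

# Matrix product (not element-wise) with the @ operator
w = np.array([1, 1])
print(P @ w)   # [1 5 9], the row sums, since w is all ones
```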


### Data processing

[[back to top]](#Table-of-Contents)

Often it is useful to store datasets in NumPy arrays. NumPy provides a number of functions to calculate statistics of datasets in arrays.

|Method|Description|
|-----|-----|
|`sum`|Sum of all the elements in the array or along an axis. Zero-length arrays have sum 0.
|`prod`| Product of all elements
|`mean`|Arithmetic mean. Zero-length arrays have NaN mean.
|`std, var`|Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator n).
|`min, max`|Minimum and maximum.
|`argmin, argmax`|Indices of minimum and maximum elements, respectively.
|`cumsum`|Cumulative sum of elements starting from 0.
|`cumprod`|Cumulative product of elements starting from 1.
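Most of these methods also take an `axis` argument. In the sketch below (using a small example array `S`), `axis=0` reduces down the rows (one result per column) and `axis=1` across the columns (one result per row). `argmax` returns an index into the flattened array, which `np.unravel_index` converts to a (row, column) pair, and `ddof=1` switches `std` to the n - 1 denominator:

```python
import numpy as np

S = np.array([[1, 2, 3],
              [4, 5, 6]])

print(S.sum(axis=0))   # [5 7 9]  one sum per column
print(S.sum(axis=1))   # [ 6 15]  one sum per row
print(S.max(axis=1))   # [3 6]

# argmax indexes the flattened array; unravel_index turns it into (row, column)
i = S.argmax()
row, col = np.unravel_index(i, S.shape)
print(i, row, col)     # 5 1 2

print(S.cumsum())      # [ 1  3  6 10 15 21]

# Population (ddof=0, the default) versus sample (ddof=1) standard deviation
print(S.std(), S.std(ddof=1))
```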

In [134]:
print ("M1.sum():", M1.sum())
print ("M1[:, 2].sum():", M1[:, 2].sum())
print ("(M1 + 1).prod():", (M1 + 1).prod())
print ("M1.mean():", M1.mean())

M1.sum(): 133
M1[:, 2].sum(): 36
(M1 + 1).prod(): 1073741824
M1.mean(): 5.32
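The value printed for `(M1 + 1).prod()` deserves a second look. It equals exactly 2**30, while the true product of 25 factors between 1 and 10 is far larger, so it is most likely the result of silent integer overflow: with a 32-bit integer type (as in the `int32` output earlier), NumPy integer arithmetic wraps around instead of raising an error. Whether this happens depends on the platform, because by default `prod` accumulates integer arrays narrower than the platform integer in the platform integer. The sketch below forces the accumulator type with the `dtype` argument to show the effect:

```python
import numpy as np

print(np.iinfo(np.int32).max)        # 2147483647, i.e. 2**31 - 1

a = np.full(31, 2, dtype=np.int32)   # thirty-one factors of 2; the true product is 2**31

# Forcing an int32 accumulator overflows and silently wraps around (to a negative value)
print(a.prod(dtype=np.int32))

# Accumulating in a wider type gives the correct result
print(a.prod(dtype=np.int64))        # 2147483648
```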


> ### Exercise

> Create the following `numpy` array:
> $$A = \begin{bmatrix}
 1      &  2      & \cdots &  10     \\
 11     &  12     & \cdots &  20     \\
 \vdots &  \vdots & \ddots &  \vdots \\
 91     &  92    & \cdots  &  100    \\
\end{bmatrix}
$$
>
> Name it `A`.

> * Use the array object to get the number of elements, rows and columns. 

> * Look up the help for the `numpy.mean` function and read it (link below). Calculate the mean of each row, of each column and of the whole matrix `A`. Store the results in the variables `rows_mean`, `columns_mean` and `whole_mean`, respectively.

> * https://numpy.org/doc/stable/reference/generated/numpy.mean.html?highlight=numpy%20mean#numpy.mean


In [138]:
# type your code here

A = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
              [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
              [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
              [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
              [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
              [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
              [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
              [91, 92, 93, 94, 95, 96, 97, 98, 99, 100]])
print (A)

# continue your code here

[[  1   2   3   4   5   6   7   8   9  10]
 [ 11  12  13  14  15  16  17  18  19  20]
 [ 21  22  23  24  25  26  27  28  29  30]
 [ 31  32  33  34  35  36  37  38  39  40]
 [ 41  42  43  44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60]
 [ 61  62  63  64  65  66  67  68  69  70]
 [ 71  72  73  74  75  76  77  78  79  80]
 [ 81  82  83  84  85  86  87  88  89  90]
 [ 91  92  93  94  95  96  97  98  99 100]]
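One possible way to complete the exercise, as a sketch rather than the only solution: build the same matrix with `np.arange(1, 101).reshape(10, 10)` and use the `axis` argument of `mean`. `rows_mean`, `columns_mean` and `whole_mean` are the names the exercise asks for; `n_elements`, `n_rows` and `n_cols` are illustrative names:

```python
import numpy as np

# The integers 1..100 reshaped row by row into 10 rows of 10 (same as the literal above)
A = np.arange(1, 101).reshape(10, 10)

n_elements = A.size             # 100
n_rows, n_cols = A.shape        # 10, 10

rows_mean = A.mean(axis=1)      # one mean per row:    5.5, 15.5, ..., 95.5
columns_mean = A.mean(axis=0)   # one mean per column: 46.0, 47.0, ..., 55.0
whole_mean = A.mean()           # 50.5

print(n_elements, n_rows, n_cols)
print(rows_mean)
print(columns_mean)
print(whole_mean)
```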


> ### Exercise

> How do you create a vector that has exactly 50 points and spans the range 11 to 23? Store the result in a variable named `range50`.

In [81]:
# type your code here

range50 = np.linspace(11,23,50)
range50

array([11.        , 11.24489796, 11.48979592, 11.73469388, 11.97959184,
       12.2244898 , 12.46938776, 12.71428571, 12.95918367, 13.20408163,
       13.44897959, 13.69387755, 13.93877551, 14.18367347, 14.42857143,
       14.67346939, 14.91836735, 15.16326531, 15.40816327, 15.65306122,
       15.89795918, 16.14285714, 16.3877551 , 16.63265306, 16.87755102,
       17.12244898, 17.36734694, 17.6122449 , 17.85714286, 18.10204082,
       18.34693878, 18.59183673, 18.83673469, 19.08163265, 19.32653061,
       19.57142857, 19.81632653, 20.06122449, 20.30612245, 20.55102041,
       20.79591837, 21.04081633, 21.28571429, 21.53061224, 21.7755102 ,
       22.02040816, 22.26530612, 22.51020408, 22.75510204, 23.        ])

> ### Exercise

> Use the `numpy.where` function to find all numbers less than the mean of the array `A`.

In [83]:
# type your code here

A = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
              [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
              [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
              [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
              [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
              [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
              [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
              [91, 92, 93, 94, 95, 96, 97, 98, 99, 100]])


In [89]:
amean = A.mean()   # 50.5

# np.where with only a condition returns a tuple of index arrays (row indices, column indices)
indx = np.where(A < amean)
A[indx]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])

In [92]:
A[np.where(A < A.mean())]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])
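The same selection can be written without `np.where`, by indexing with the boolean mask directly. A sketch, rebuilding `A` with `np.arange` (equivalent to the literal above):

```python
import numpy as np

A = np.arange(1, 101).reshape(10, 10)

# Boolean-mask indexing returns the same elements as indexing with np.where
below = A[A < A.mean()]
print(np.array_equal(below, A[np.where(A < A.mean())]))  # True
print(below.size, below.min(), below.max())               # 50 1 50
```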

In [94]:
np.median(A[np.where(A < A.mean())])

25.5

In [95]:
np.median([2,4,6,8])

5.0