# Chapter 2: Python arrays, tables, vectors, matrices

## Introduction

In data analysis, AI and numerical computation, it is common to gather numerical information into vectors and matrices.

Vectors and matrices are actually mathematical terms related to linear algebra.

$$ 
\vec{v} = 
    \begin{pmatrix} 
        x_1\\ x_2\\ \vdots \\ x_n
    \end{pmatrix}
     = (x_1, x_2, \ldots, x_n)^T 
$$


$$ 
A = 
    \begin{pmatrix}
        a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
        a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\
        \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
        a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
        \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
        a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn}
    \end{pmatrix}
$$ 

where $a_{ij}$ is the element in the $i^{th}$ row and $j^{th}$ column of the matrix $A$.

In this course, we treat a vector as a one-dimensional collection of numbers, like a list. Arrays and matrices are usually two-dimensional and hold information in rows and columns. Tables can also have more than two dimensions, which makes them harder to present visually.

## Numpy and Arrays

NumPy is a Python library for numerical computing. It handles large, multi-dimensional arrays and matrices and provides functions to operate on them.

Here are the most common functions:

We first import the NumPy library (assuming it is installed).

In [46]:
import numpy as np

An array is created with the array() command, which takes a list or a list of lists as its argument.

In [47]:
x = np.array([1, 3, 4, 5])
A = np.array([[1,3],[4,5]])
print(x)
print(A)

[1 3 4 5]
[[1 3]
 [4 5]]


Every array has a shape, which gives the number of elements along each dimension.

In [48]:
print(np.shape(x))
print(np.shape(A))

(4,)
(2, 2)


It is often more convenient to create arrays with zeros(), ones() and full(), which take the desired shape as an argument.

In [49]:
Z = np.zeros(5)
print(Z)
print(np.shape(Z))

Z2 = np.zeros( (4,5) )
print(Z2)
print(Z2.shape)

Y = np.ones( (2,3) )
print(Y)

F = np.full( (7,8), 11)
print(F)

[0. 0. 0. 0. 0.]
(5,)
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
(4, 5)
[[1. 1. 1.]
 [1. 1. 1.]]
[[11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]]


We use linspace() to create an array of evenly spaced numbers from a start value, an end value (included) and the number of elements.

In [50]:
x = np.linspace(0, 5, 10)
print(x)

[0.         0.55555556 1.11111111 1.66666667 2.22222222 2.77777778
 3.33333333 3.88888889 4.44444444 5.        ]


In contrast, arange() creates an evenly spaced array with the step length given as the third parameter. The end value is not included.

In [51]:
x2 = np.arange(0, 5, 0.2)
print(x2)

[0.  0.2 0.4 0.6 0.8 1.  1.2 1.4 1.6 1.8 2.  2.2 2.4 2.6 2.8 3.  3.2 3.4
 3.6 3.8 4.  4.2 4.4 4.6 4.8]
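
The practical difference between linspace() and arange() is mostly in how the end value is handled. The sketch below compares the two on a small range; the variable names are illustrative only.

```python
import numpy as np

# linspace: the number of points is given and the end value is included by default
pts = np.linspace(0, 1, 5)
print(pts)            # [0.   0.25 0.5  0.75 1.  ]

# arange: the step is given and the end value is excluded
steps = np.arange(0, 1, 0.25)
print(steps)          # [0.   0.25 0.5  0.75]

# endpoint=False makes linspace exclude the end value as well
pts_open = np.linspace(0, 1, 4, endpoint=False)
print(np.allclose(pts_open, steps))   # True
```

With non-integer steps, linspace() is usually the safer choice, because rounding errors can make the length of an arange() result hard to predict.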


randint() generates random integers in [a, b]. The upper bound passed to randint() is not included, which is why b+1 is used below.

In [52]:
a = 1
b = 6
amount = 50
nopat = np.random.randint(a, b+1, amount)
print(nopat)

[1 5 6 4 6 5 6 3 3 4 3 3 4 1 6 5 5 3 4 3 1 5 2 4 6 1 6 1 1 6 3 6 4 2 6 2 6
 2 3 3 4 6 4 5 3 2 1 6 6 2]
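
As an alternative sketch, NumPy's newer Generator interface can produce the same kind of numbers reproducibly. The seed value 42 and the name `rolls` are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value is an arbitrary choice

a, b = 1, 6
# endpoint=True makes the upper bound b inclusive, so no "b + 1" is needed
rolls = rng.integers(a, b, size=50, endpoint=True)

print(rolls.min() >= a, rolls.max() <= b)   # True True
```

Fixing the seed gives the same sequence on every run, which helps when results need to be reproduced.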


randn() produces normally distributed random numbers, $ X \sim \mathcal{N}(0,1)$.

In [53]:
x = np.random.randn(100)
print(x)

[ 1.55460126 -0.44810503 -1.87483573 -0.29858964 -0.31578272 -0.23224718
  0.0803487  -0.94073411 -0.07445872  1.13269397  0.04217753 -1.76134905
 -1.0237138  -0.5900761  -1.46483612  2.43241303  0.34382958  1.03623629
  0.56713424 -0.17795639 -0.17312412 -0.58381671  1.33945177  0.05230945
  0.55730428  0.31210471 -1.44297934  1.28859228 -0.2582429   1.5884062
 -0.67862802  1.05336658 -0.47150596 -0.51769381 -0.28712761 -1.44546481
 -1.68144502  1.5445426   0.23649187 -2.15028148 -0.34119315 -0.06399409
  0.41277574  0.67443338 -0.66373906  1.28221051  0.94642449 -1.33270105
 -0.4254615   0.28996499 -0.65881881 -0.86189962  0.04615967  1.52883651
 -0.71420495  1.44407387 -0.13908751 -0.62496732 -1.63491791  1.3912586
 -0.54603254 -0.03339613 -0.46193712  0.3949391  -0.05417613 -1.12503812
 -0.48062637  0.96616339  0.64095803 -0.31840161  0.12433156 -0.34819517
  0.0242189  -1.8194765   0.69040826  2.38953078  0.48790882  0.65572075
  0.51078523 -0.04045112 -0.65038536 -1.0731178   0.9

random() produces random numbers uniformly distributed in [0.0, 1.0).

In [54]:
x = np.random.random(10)
print(x)

[0.35328317 0.43928741 0.55809933 0.87582722 0.62479202 0.37108846
 0.56875806 0.07835187 0.0539792  0.9130812 ]


To determine the number of dimensions and the total number of elements of a matrix, use ndim and size.

In [55]:
print(A)
print(A.ndim)
print(A.size)

[[1 3]
 [4 5]]
2
4


The genfromtxt() command is used to read numerical data from text files such as CSV files.

In [56]:
# data = np.genfromtxt("data.csv", delimiter=",", skip_header=1)

Change the shape of an array with reshape(n,m). The total number of elements must stay the same.

In [57]:
A = np.arange(12)
print(np.shape(A))
print(A.reshape(3,4))
print(A.reshape(2,6))
print(A.reshape(2,3,2))

(12,)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]
[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]]
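
One dimension can be left for NumPy to work out by passing -1, and a shape that does not match the number of elements is rejected. A short sketch:

```python
import numpy as np

A = np.arange(12)

# -1 asks NumPy to infer that dimension from the total number of elements
B = A.reshape(3, -1)
print(B.shape)        # (3, 4)

# The element count must match, otherwise reshape raises a ValueError
try:
    A.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```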


Rows and columns can be repeated with repeat().

In [58]:
A = np.repeat([[1,2,3]],4,axis=0)
B = np.repeat([[1],[2],[3]],3,axis=1)
print(A)
print(B)


[[1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]]
[[1 1 1]
 [2 2 2]
 [3 3 3]]


Be careful when copying arrays: a plain assignment does not create a new array.

In [59]:
A = np.array([1, 2])
B = A
B[0] = 99
print(A)

[99  2]


Array A has also changed, because B and A refer to the same data. Use copy() to prevent this.

In [60]:
B = A.copy()
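
The same caution applies to slices: a slice of an array is a view on the original data, not a copy. The following sketch uses np.shares_memory() to make this visible; the names `view` and `safe` are illustrative.

```python
import numpy as np

A = np.array([1, 2, 3, 4])

view = A[:2]          # a slice is a view on the same data
view[0] = 99
print(A)              # [99  2  3  4]

safe = A[:2].copy()   # an explicit copy owns its data
safe[0] = -1
print(A)              # [99  2  3  4]  (unchanged by the copy)

print(np.shares_memory(A, view), np.shares_memory(A, safe))   # True False
```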

## Slicing matrices (indexing)
The first index is the row, the second is the column. Indexing starts at 0.

In [61]:
A = np.array([[1,2,3],[4,5,6]])
print(A[0,0]) 
print(A[0,1])

1
2


In [62]:
print(A[:,0]) # first column, ":" reads all rows

[1 4]


In [63]:
print(A[0,:]) # first row, ":" reads all columns

[1 2 3]


A step can also be given, using the form [start:end:step]. The end index is not included.

In [64]:
A = np.array([1,2,3,4,5,6,7,8,9])
print(A[0:6:2])

[1 3 5]
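
Any of start, end and step can be omitted, and negative values count from the end of the array. A few more cases, as a sketch:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

print(A[::3])     # every third element: [1 4 7]
print(A[::-1])    # reversed: [9 8 7 6 5 4 3 2 1]
print(A[-3:])     # last three elements: [7 8 9]
```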


We can also assign a number or an array to a slice.

In [65]:
A = np.array([[1,2,3],[4,5,6]])
A[0,0] = 17
A[1,:] = [11,12,13]

We can stack arrays with vstack()

In [66]:
new = np.vstack( (A,A) )
print(new)

[[17  2  3]
 [11 12 13]
 [17  2  3]
 [11 12 13]]


Or stack them horizontally with hstack().

In [67]:
new2 = np.hstack( (A,A) )
print(new2)

[[17  2  3 17  2  3]
 [11 12 13 11 12 13]]


We can also delete a row or a column with delete(). The last argument is the axis: 0 for rows, 1 for columns. The result is a new array and A itself is unchanged.

In [68]:
B = np.delete(A,[0],0)
print(B)
C = np.delete(A,[1],1)
print(C)

[[11 12 13]]
[[17  3]
 [11 13]]


There are different ways to iterate over an array:

In [69]:
A = np.array([1,2,3,4,5,6])
A = A.reshape(2,3)
n,m = np.shape(A)
for i in range(n):
    print("Row",i,"is",A[i,:])

Row 0 is [1 2 3]
Row 1 is [4 5 6]


In [70]:
for j in range(m): 
    print("Column",j,"is",A[:,j])

Column 0 is [1 4]
Column 1 is [2 5]
Column 2 is [3 6]


In [71]:
for i in range(n):
    for j in range(m):
        print("Element ({}, {}) is {}".format(i,j,A[i,j]))

Element (0, 0) is 1
Element (0, 1) is 2
Element (0, 2) is 3
Element (1, 0) is 4
Element (1, 1) is 5
Element (1, 2) is 6


In [72]:
for a in np.nditer(A):
    print(a)

1
2
3
4
5
6
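
When both the index and the value are needed, np.ndenumerate() can replace the nested loops. This is one possible sketch, not the only way to do it:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])

# ndenumerate yields the index tuple together with the value
for (i, j), value in np.ndenumerate(A):
    print("Element ({}, {}) is {}".format(i, j, value))
```

For large arrays, explicit Python loops are slow; the element-wise operations on whole arrays are usually preferable.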


## Calculations with matrices

Element-wise operations

In [73]:
A = np.array([[1,2],[3,4]])
B = np.array([[3,9],[4,-1]])

print(A+B)                      # Addition
print(A-B)                      # Subtraction
print(A*B)                      # Multiplication
print(A/B)                      # Division

[[ 4 11]
 [ 7  3]]
[[-2 -7]
 [-1  5]]
[[ 3 18]
 [12 -4]]
[[ 0.33333333  0.22222222]
 [ 0.75       -4.        ]]


Operation with a scalar

In [74]:
T = 5*np.ones((10,10))
print(T)

print(A-1)
print(B+2)

[[5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]]
[[0 1]
 [2 3]]
[[ 5 11]
 [ 6  1]]


Element-wise exponentiation

In [75]:
print(A**2)

[[ 1  4]
 [ 9 16]]


[Matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication) with matmul() or @

In [76]:
print( np.matmul(A,B) )
print( A @ B)

[[11  7]
 [25 23]]
[[11  7]
 [25 23]]
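
To see what matmul() computes, the product can be written out with explicit loops using the definition $c_{ij} = \sum_k a_{ik} b_{kj}$. This is only an illustration; in practice `@` should be used.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[3, 9], [4, -1]])

n, k = A.shape
k2, m = B.shape
assert k == k2   # inner dimensions must agree

C = np.zeros((n, m), dtype=A.dtype)
for i in range(n):
    for j in range(m):
        for p in range(k):
            C[i, j] += A[i, p] * B[p, j]

print(np.array_equal(C, A @ B))   # True
```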


[Dot product](https://en.wikipedia.org/wiki/Dot_product) (scalar product) with dot(), here of the random vector x created earlier with itself

In [77]:
x.dot(x)

3.090658108629503
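
The dot product is the sum of the element-wise products, so the dot product of a vector with itself is its squared length. A small sketch with two example vectors `u` and `v` (chosen here for illustration):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

print(u.dot(v))          # 32.0
print(np.sum(u * v))     # 32.0, the same value written as a sum of products

# Both expressions give the Euclidean length of u
print(np.sqrt(u.dot(u)), np.linalg.norm(u))
```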

Matrix multiplication between a matrix and a column vector

In [78]:
b = np.array([[5],[7]])
y = np.matmul(A,b)
print(y)

[[19]
 [43]]


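Use the NumPy functions rather than the math library for calculations such as the following, because they operate on every element of an array. Only the result of the last expression is displayed.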

In [79]:
np.sqrt(A) # the square root of each item
np.sin(A)
np.cos(A)
np.tan(A)
np.log(A)
np.exp(A)
np.log10(A)
np.log2(A)

array([[0.       , 1.       ],
       [1.5849625, 2.       ]])
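
The reason for preferring NumPy here is that the functions in the math library expect a single number. The sketch below shows what happens when a whole array is passed to math.sqrt():

```python
import math
import numpy as np

A = np.array([[1, 2], [3, 4]])

print(np.sqrt(A))        # works element-wise on the whole array

try:
    math.sqrt(A)         # math functions expect a single number
except TypeError as err:
    print("math.sqrt failed:", err)
```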

Comparison operators also apply to matrices element-wise.

In [80]:
print(A > 1)

[[False  True]
 [ True  True]]


We can also check whether all() or any() of the elements satisfy a condition. Only the value of the last expression, np.any(A>1), is displayed below; np.all(A>1) is False here.

In [81]:
np.all(A>1)
np.any(A>1)

True
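
Printing both results, and adding the axis argument, makes the behaviour of all() and any() easier to follow. A sketch using the same matrix:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

print(np.all(A > 1))           # False, because the element 1 fails the test
print(np.any(A > 1))           # True
print(np.all(A > 1, axis=0))   # per column: [False  True]
print(np.all(A > 1, axis=1))   # per row:    [False  True]
```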

A Boolean array can also be used as an index to extract the elements that satisfy the condition.

In [82]:
A = np.array([1,2,3,4,5,6,7])
B = A[A>3]
print(B)

[4 5 6 7]
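
The same Boolean mask can also be used to count, locate or overwrite the selected elements. A sketch, with `mask` and `C` as illustrative names:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])

mask = A > 3
print(mask.sum())               # 4 elements satisfy the condition
print(np.where(mask)[0])        # their indices: [3 4 5 6]

C = A.copy()
C[mask] = 0                     # assign through the mask
print(C)                        # [1 2 3 0 0 0 0]
```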


#### Example: solving a system of linear equations with matrices

Consider a system of two linear equations:
$$ 
    \left\{
        \begin{aligned}
            2x + y &= 11 \\
            -4x + 3y &= 3
        \end{aligned}
    \right.
$$

This can be represented by a vector and a matrix in the following form:
$$
    \begin{pmatrix}
        2x + y \\
        -4x + 3y
    \end{pmatrix}
    =
    \begin{pmatrix}
        11 \\
        3
    \end{pmatrix}
$$

It can be rewritten as:
$$
    \begin{pmatrix}
        2 & 1\\
        -4 & 3
    \end{pmatrix}
        \begin{pmatrix}
        x \\
        y
    \end{pmatrix} = 
        \begin{pmatrix}
        11 \\
        3
    \end{pmatrix} \quad \Longleftrightarrow \quad AX = b
$$

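We then solve for $ X = A^{-1}b $, either directly with np.linalg.solve() or by computing the inverse $A^{-1}$ explicitly with np.linalg.inv().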

In [83]:
A = np.array([[2,1],[-4,3]])
b = np.array([11,3])
X = np.linalg.solve(A,b)
print(X)

[3. 5.]


In [84]:
Ainv = np.linalg.inv(A)
print(np.matmul(Ainv,b))

[3. 5.]
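
A solution can be checked by substituting it back into the equations. The sketch below does this for the system above and also prints the determinant, which must be non-zero for a unique solution to exist.

```python
import numpy as np

A = np.array([[2, 1], [-4, 3]])
b = np.array([11, 3])

X = np.linalg.solve(A, b)

# Substitute the solution back: A X should reproduce b
print(np.allclose(A @ X, b))        # True

# A non-zero determinant means the system has exactly one solution
print(np.linalg.det(A))             # approximately 10
```

np.linalg.solve() is generally preferred over computing the inverse explicitly, since it avoids forming $A^{-1}$.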


#### Statistical indicators for matrices

Elementary statistical indicators can be computed to summarize the elements of a matrix. The common ones are sum(), prod(), min(), max(), mean(), median(), std() and var(). An example for sum() is provided below:

In [85]:
A = np.array([[1,2,3],[4,5,6]])
print(np.sum(A))
print(np.sum(A,0)) # column sums, summed along rows
print(np.sum(A,1)) # row sums, summed along the columns

21
[5 7 9]
[ 6 15]
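
The other indicators take the axis argument in the same way as sum(). A brief sketch with mean(), max() and std():

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])

print(np.mean(A))           # 3.5
print(np.mean(A, axis=0))   # column means: [2.5 3.5 4.5]
print(np.max(A, axis=1))    # row maxima: [3 6]
print(np.std(A, axis=1))    # population standard deviation of each row
```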
