# Chapter 2: Python arrays, tables, vectors, matrices

## Introduction

In data analysis, AI and numerical computation, it is common to gather numerical information into vectors and matrices.

Vectors and matrices are mathematical terms from linear algebra. A vector $\vec{v}$ with $n$ components and a matrix $A$ with $m$ rows and $n$ columns are written as

$$ \vec{v} = \begin{pmatrix} x_1\\ x_2\\ \vdots \\ x_n \end{pmatrix} = (x_1, x_2, \ldots, x_n)^T $$


$$ A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn}
\end{pmatrix}
$$ 

where $a_{ij}$ is the element in the $i^{th}$ row and $j^{th}$ column of the matrix $A$.

In this course, we treat a vector as a one-dimensional collection of numbers, much like a list. Arrays and matrices are usually two-dimensional and hold information in rows and columns. Tables (arrays) can also have more than two dimensions, which makes them harder to present visually.

## Numpy and Arrays

NumPy is a Python library for numerical computing. It handles large, multi-dimensional arrays and matrices and provides functions to operate on them.

The most common functions are introduced below.

We first import the NumPy library (assuming it is installed):

In [1]:
import numpy as np

An array is created with the array() command, which takes a list or a list of lists as its argument.

In [2]:
x = np.array([1, 3, 4, 5])
A = np.array([[1,3],[4,5]])
print(x)
print(A)

[1 3 4 5]
[[1 3]
 [4 5]]


Every array has a shape, which gives the number of elements along each dimension:

In [3]:
print(np.shape(x))
print(np.shape(A))

(4,)
(2, 2)


It is often more convenient to create arrays with zeros(), ones() and full(), which take the desired shape as an argument:

In [4]:
Z = np.zeros(5)
print(Z)
print(np.shape(Z))

Z2 = np.zeros( (4,5) )
print(Z2)
print(Z2.shape)

Y = np.ones( (2,3) )
print(Y)

F = np.full( (7,8), 11)
print(F)

[0. 0. 0. 0. 0.]
(5,)
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
(4, 5)
[[1. 1. 1.]
 [1. 1. 1.]]
[[11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]
 [11 11 11 11 11 11 11 11]]
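
The outputs above show `0.` and `1.` for zeros() and ones() but plain `11` for full(), which hints at different element types. The following is a minimal sketch of how the element type (dtype) can be inspected and set; the variable names are only illustrative.

```python
import numpy as np

Z = np.zeros(3)                # floating-point zeros by default
Zi = np.zeros(3, dtype=int)    # integer zeros when dtype is given
F = np.full((2, 2), 11)        # dtype is inferred from the fill value (integer)
Ff = np.full((2, 2), 11.0)     # a float fill value gives a float array

print(Z.dtype, Zi.dtype, F.dtype, Ff.dtype)

# zeros_like() creates zeros with the same shape and dtype as an existing array
print(np.zeros_like(F))
```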


We use linspace() to create an evenly spaced array of numbers from a start value, an end value and the number of elements. The end value is included.

In [5]:
x = np.linspace(0, 5, 10)
print(x)

[0.         0.55555556 1.11111111 1.66666667 2.22222222 2.77777778
 3.33333333 3.88888889 4.44444444 5.        ]
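
Because both end values are included, 10 points leave 9 gaps, so the spacing is (5 - 0) / 9 rather than 5 / 10. A short sketch that checks this, using the same start, end and count as above:

```python
import numpy as np

start, stop, num = 0, 5, 10
x = np.linspace(start, stop, num)

# With the end value included, num points leave num - 1 gaps
step = (stop - start) / (num - 1)
print(step)
print(np.allclose(np.diff(x), step))   # all neighbouring differences equal the step

# retstep=True returns the spacing together with the array
x_same, spacing = np.linspace(start, stop, num, retstep=True)
print(spacing)
```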


In contrast, arange() creates an evenly spaced array with a defined step length given as the third parameter. The end value is not included.

In [6]:
x2 = np.arange(0, 5, 0.2)
print(x2)

[0.  0.2 0.4 0.6 0.8 1.  1.2 1.4 1.6 1.8 2.  2.2 2.4 2.6 2.8 3.  3.2 3.4
 3.6 3.8 4.  4.2 4.4 4.6 4.8]
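
Note that the output stops at 4.8: arange() excludes the end value. If the end value is needed, one option is to switch to linspace() with the matching number of points, as in this small sketch:

```python
import numpy as np

x2 = np.arange(0, 5, 0.2)        # 25 values, the last one is 4.8
x_incl = np.linspace(0, 5, 26)   # 26 values, the last one is 5.0

print(len(x2), x2[-1])
print(len(x_incl), x_incl[-1])

# Apart from the extra end value, the two arrays agree
print(np.allclose(x_incl[:-1], x2))
```

With a non-integer step, rounding errors can make the length of an arange() result hard to predict, so linspace() is often the safer choice when the number of points matters.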


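Random integers in the closed interval [a, b] are generated with randint(). The upper bound of randint() is not included, which is why b+1 is passed as the second argument.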

In [7]:
a = 1
b = 6
amount = 50
nopat = np.random.randint(a, b+1, amount)
print(nopat)

[6 2 3 6 4 1 2 1 3 1 1 5 2 1 3 4 4 6 5 2 4 4 4 4 4 4 4 1 6 3 4 1 6 5 5 3 6
 4 1 3 1 4 1 1 2 4 5 1 2 1]
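
Every run of the cell above gives different numbers. As an alternative sketch, NumPy's newer Generator interface can be seeded to make the result reproducible, and its integers() method can include the upper bound directly. The seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

a, b = 1, 6
rng = np.random.default_rng(seed=42)   # arbitrary seed, assumed only for this example

# endpoint=True makes the upper bound b inclusive, so no b + 1 is needed
rolls = rng.integers(a, b, size=50, endpoint=True)
print(rolls.min() >= a, rolls.max() <= b)

# A generator created with the same seed produces the same sequence
rng2 = np.random.default_rng(seed=42)
rolls2 = rng2.integers(a, b, size=50, endpoint=True)
print(np.array_equal(rolls, rolls2))
```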


randn() generates normally distributed random numbers, $X \sim \mathcal{N}(0,1)$:

In [8]:
x = np.random.randn(100)
print(x)

[ 1.89203241e-05 -5.88640091e-01  1.43720596e+00 -4.88303153e-01
 -6.17491712e-01 -5.82361121e-01  2.28874726e+00  5.18434445e-01
  7.17206200e-01  1.64126360e+00  1.75208968e+00  1.14616926e+00
  1.80105865e-01 -5.67474559e-01  2.14541185e-01  5.02311210e-01
 -2.31797873e-01  1.41356728e+00 -2.32483903e-02  1.49524035e+00
  4.85650042e-01 -1.78396413e-01  4.71016018e-01 -2.31896207e+00
 -1.86362226e+00  1.44041284e+00 -1.43327336e+00 -1.24501248e-01
 -1.11933669e+00 -1.79901484e+00  7.38060687e-01 -7.66147382e-02
 -4.23839630e-01  1.51404933e-01 -5.81939010e-01  2.86823859e+00
 -8.39989673e-01 -5.91252609e-01 -1.44304145e+00 -1.90958908e-03
 -9.55204271e-01 -1.86995939e+00  3.83355337e-01  3.45175472e-01
 -1.40525464e+00  6.91806952e-01 -5.31808115e-02 -1.10618465e+00
 -7.32966033e-01 -1.39558164e-04 -1.76294282e+00 -4.04396675e-01
  5.78262198e-01 -1.10392613e+00  4.30827241e-02  9.68374207e-01
  2.42217308e-01  3.24629992e-01 -3.45966557e-01  1.49351302e+00
  1.31132320e+00  1.63709
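
Standard normal samples can be shifted and scaled to another normal distribution: if $Z \sim \mathcal{N}(0,1)$, then $\mu + \sigma Z \sim \mathcal{N}(\mu, \sigma^2)$. The sketch below checks this numerically; the seed and the values of `mu` and `sigma` are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)                # arbitrary seed, only for reproducibility
z = np.random.randn(100_000)     # samples from N(0, 1)
print(z.mean(), z.std())         # close to 0 and 1

mu, sigma = 10.0, 2.0            # hypothetical target mean and standard deviation
y = mu + sigma * z               # samples from N(mu, sigma^2)
print(y.mean(), y.std())         # close to mu and sigma
```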

random() produces random numbers uniformly distributed in the interval [0.0, 1.0):

In [9]:
x = np.random.random(10)
print(x)

[0.63143348 0.48428329 0.98526785 0.44005311 0.66395748 0.33740174
 0.75886083 0.54957216 0.98238617 0.48517088]


The number of dimensions and the total number of elements of an array are given by ndim and size:

In [10]:
print(A)
print(A.ndim)
print(A.size)

[[1 3]
 [4 5]]
2
4
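
Both attributes follow from the shape: ndim is the length of the shape tuple and size is the product of its entries. A brief sketch, including a three-dimensional array `T` introduced only for this example:

```python
import numpy as np

A = np.array([[1, 3], [4, 5]])

# ndim is the number of axes, i.e. the length of the shape tuple
print(A.ndim == len(A.shape))

# size is the total number of elements, i.e. the product of the shape entries
print(A.size == int(np.prod(A.shape)))

# A three-dimensional example: 2 * 3 * 4 = 24 elements
T = np.zeros((2, 3, 4))
print(T.ndim, T.size)
```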


The genfromtxt() command reads data from text files. The example assumes a comma-separated file data.csv with one header line in the working directory.

In [11]:
data = np.genfromtxt("data.csv", delimiter=",", skip_header=1)
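
To try genfromtxt() without a file on disk, the function can also read from a file-like object. The sketch below uses `io.StringIO` with made-up CSV content (the column names and values are hypothetical) and the keyword arguments `delimiter` and `skip_header`.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for a file such as data.csv
csv_text = "time,value\n0.0,1.5\n1.0,2.5\n2.0,4.0\n"

# skip_header=1 drops the header line, delimiter="," splits the columns
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(data.shape)    # 3 rows, 2 columns
print(data[:, 1])    # second column
```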

The shape of an array is changed with reshape(n, m). The total number of elements must stay the same.

In [None]:
A = np.arange(12)
print(np.shape(A))
print(A.reshape(3,4))
print(A.reshape(2,6))
print(A.reshape(2,3,2))
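
Two details of reshape() are worth a quick sketch: one dimension may be given as -1 so that NumPy infers it, and a target shape with a different number of elements raises an error.

```python
import numpy as np

A = np.arange(12)

# -1 lets NumPy infer the missing dimension: 12 elements / 3 rows = 4 columns
B = A.reshape(3, -1)
print(B.shape)

# reshape() returns a new array object; A itself keeps its shape
print(A.shape)

# 5 * 3 = 15 does not match the 12 elements, so this fails
try:
    A.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```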

Rows and columns can be repeated with repeat(). The axis argument selects the direction of the repetition:

In [None]:
A = np.repeat([[1,2,3]],4,axis=0)
B = np.repeat([[1],[2],[3]],3,axis=1)
print(A)
print(B)
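
As a small sketch of what to expect, the first call stacks the row four times and the second widens each one-element row to three elements. Without an axis argument, repeat() flattens the input and repeats element by element, which differs from np.tile(), a related function that repeats the whole sequence.

```python
import numpy as np

A = np.repeat([[1, 2, 3]], 4, axis=0)        # shape (4, 3): the row repeated 4 times
B = np.repeat([[1], [2], [3]], 3, axis=1)    # shape (3, 3): each column entry repeated 3 times
print(A.shape, B.shape)
print(B)

# Element-wise repetition versus repetition of the whole sequence
print(np.repeat([1, 2, 3], 2))   # [1 1 2 2 3 3]
print(np.tile([1, 2, 3], 2))     # [1 2 3 1 2 3]
```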


Be careful when copying arrays: a plain assignment does not create a copy, it only gives the same array a second name.

In [None]:
A = np.array([1, 2])
B = A
B[0] = 99
print(A)

Array A has changed as well! Use copy() to create an independent copy and prevent this:

In [46]:
B = A.copy()
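
The difference between a second name and a real copy can be checked directly. The sketch below also shows that a slice of an array is a view on the same data, so the same caution applies there; the names `C`, `D` and `view` are only illustrative.

```python
import numpy as np

A = np.array([1, 2])
B = A            # second name for the same array
C = A.copy()     # independent copy with its own data

B[0] = 99
print(A)         # [99  2]: changed through B
print(C)         # [1 2]: the copy is unaffected
print(B is A, np.shares_memory(A, C))

# A slice is a view: writing to it changes the original array
D = np.arange(5)
view = D[1:3]
view[0] = -1
print(D)         # [ 0 -1  2  3  4]
```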

## Slicing matrices (indexing)

The first index is the row, the second the column. Indexing starts from 0.

In [None]:
A = np.array([[1,2,3],[4,5,6]])
print(A[0,0]) 
print(A[0,1])

In [None]:
print(A[:,0]) # first column, ":" reads all rows

In [None]:
print(A[0,:]) # first row, ":" reads all columns

A step can also be given, using the form [start:end:step]. The end index is not included.

In [None]:
A = np.array([1,2,3,4,5,6,7,8,9])
print(A[0:6:2])
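
The same start:end:step notation works separately for rows and columns of a two-dimensional array, and negative values count from the end. A short sketch with a 3 x 4 example matrix `M`, introduced only for illustration:

```python
import numpy as np

M = np.arange(1, 13).reshape(3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

print(M[0:2, 1:3])   # rows 0-1 and columns 1-2: [[2 3], [6 7]]
print(M[:, ::2])     # every second column: [[1 3], [5 7], [9 11]]
print(M[-1, :])      # last row: [ 9 10 11 12]
print(M[::-1, 0])    # first column from bottom to top: [9 5 1]
```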