***
**Introduction to Machine Learning** <br>
__[https://slds-lmu.github.io/i2ml/](https://slds-lmu.github.io/i2ml/)__
***

# 3. Packages

## 3.3 Numpy Package

Official Website: https://numpy.org/

### 3.3.1 Array Creation

### a) Convert a list to a NumPy array => 'Vector'

In [16]:
import numpy as np

my_list = [1,2,3]
np.array(my_list)

array([1, 2, 3])

### b) Convert a list of lists to a NumPy array => 'Matrix'

In [17]:
my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
np.array(my_matrix)


array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

### c) arange(): evenly spaced integers within an interval

In [18]:
# The stop value (here 10) is excluded:
np.arange(0,10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]:
# The distance between the integers is adjustable:
np.arange(0,11,2)

array([ 0,  2,  4,  6,  8, 10])

### d) Array with zeros

In [20]:
# 1D
np.zeros(3)

array([0., 0., 0.])

In [21]:
# 2D
np.zeros((5,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

### e) Array with ones

In [23]:
# 1D
np.ones(3)

array([1., 1., 1.])

In [24]:
# 2D
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

### f) linspace(): equidistant numbers within an interval

In [25]:
# 3 numbers between 0 and 10 (10 is included)
np.linspace(0, 10, 3)

array([ 0.,  5., 10.])

In [26]:
# To compare with arange():
# 3 is the distance, not 3 numbers
# Integers instead of floats
np.arange(0, 10, 3)

array([0, 3, 6, 9])
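The difference between the two functions can be checked directly. The following is a minimal sketch; the names `points` and `steps` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# linspace: the third argument is the NUMBER of points; the stop value is included
points = np.linspace(0, 10, 5)

# arange: the third argument is the STEP size; the stop value is excluded
steps = np.arange(0, 10, 2.5)

print(points)  # 5 values, the last one is 10
print(steps)   # 4 values, 10 is not reached

# endpoint=False makes linspace exclude the stop value as well
print(np.linspace(0, 10, 4, endpoint=False))
```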

### g) eye(): identity matrix

In [27]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

### 3.3.2 Random numbers

### a) rand(): an array with a specified size and values from a uniform distribution over [0,1)

In [28]:
# 1D
np.random.rand(2)

array([0.95552993, 0.40964415])

In [29]:
# 2D
np.random.rand(3,3)

array([[0.31884859, 0.38797628, 0.5285368 ],
       [0.89377178, 0.77992624, 0.09595719],
       [0.61693628, 0.51100312, 0.10244393]])

In [30]:
# 3D
np.random.rand(2,2,2)

array([[[0.50100396, 0.1595289 ],
        [0.14919154, 0.66935693]],

       [[0.79224251, 0.36059737],
        [0.96975931, 0.09454554]]])

### b) randn(): similar to rand(), but the values are drawn from the standard normal distribution N(0,1)

In [32]:
np.random.randn(2)

array([0.70539226, 1.12425172])

### c) randint(): integers from range [a, b)

In [33]:
# 10 integers between 1 and 99 (100 is excluded; drawn with replacement)
np.random.randint(1,100,10)

array([18, 91, 20, 21, 20, 48, 45, 21, 85, 95])

### d) normal(): random numbers from an N(mu, sigma^2) distribution

In [34]:
# Arguments: mean mu, standard deviation sigma (not the variance), number of draws
np.random.normal(2,2,5)

array([-1.01045973,  3.33899999,  1.75641276,  3.42974568,  1.66340832])
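That the second argument is the standard deviation and not the variance can be checked empirically with a large sample. This is a rough sketch; the sample size of 100,000 and the name `sample` are arbitrary choices made for illustration.

```python
import numpy as np

# loc is the mean, scale is the standard deviation
sample = np.random.normal(loc=2, scale=2, size=100_000)

print(sample.mean())  # close to 2
print(sample.std())   # close to 2 (sigma), not 4 (sigma^2)
```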

### e) poisson(): random numbers from a Poisson distribution

In [35]:
np.random.poisson(5,10)

array([8, 6, 2, 4, 4, 7, 6, 6, 5, 5])

### f) shuffle(): random rearrangement of a sequence.

In [36]:
# It works 'in-place', i.e., the result need not be saved again
arr = np.arange(10)
print(arr)

[0 1 2 3 4 5 6 7 8 9]


In [37]:
np.random.shuffle(arr)
print(arr)

[4 2 0 8 6 7 1 9 3 5]


In [38]:
# Attention: np.sort() does NOT sort 'in-place'; it returns a sorted copy.
np.sort(arr)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [39]:
print(arr)

[4 2 0 8 6 7 1 9 3 5]
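For comparison, the array method `sort()` does work in-place. A small sketch with an illustrative array named `data`:

```python
import numpy as np

data = np.array([4, 2, 0, 8, 6])

sorted_copy = np.sort(data)   # returns a new, sorted array
print(data)                   # [4 2 0 8 6], unchanged

data.sort()                   # the method sorts the array itself and returns None
print(data)                   # [0 2 4 6 8]
```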


### g) seed(): set a seed for generating random numbers

In [41]:
print(np.random.randn(1))

[-1.64461804]


In [42]:
print(np.random.randn(1))

[-0.30679084]


In [43]:
np.random.seed(1123)
print(np.random.randn(1))

[0.505581]


In [44]:
np.random.seed(1123)
print(np.random.randn(1))

[0.505581]


⇒ Documentation on numpy.random:
https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html
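Newer NumPy versions additionally offer a generator object that carries its own seed instead of relying on the global state. The sketch below assumes NumPy 1.17 or later, where `np.random.default_rng` is available; the names `rng_a`, `rng_b`, `draw_a` and `draw_b` are illustrative.

```python
import numpy as np

# Assumption: NumPy >= 1.17, which provides np.random.default_rng
rng_a = np.random.default_rng(1123)
rng_b = np.random.default_rng(1123)

draw_a = rng_a.standard_normal(3)
draw_b = rng_b.standard_normal(3)

# Same seed, same sequence
print(np.array_equal(draw_a, draw_b))

# integers() excludes the upper bound by default, like randint()
print(rng_a.integers(1, 100, size=10))
```

Because each generator keeps its own state, seeding one of them does not affect random numbers drawn elsewhere in the program.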

### 3.3.3 Attributes & Methods

### a) reshape(): create an array with the same values, but in a new shape

In [45]:
# Start with a 1D array of 9 values
arr = np.arange(9)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [46]:
arr.reshape(3,3)

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [47]:
# The new dimension must match the original number of values
# (unlike R)
arr.reshape(3,4)

ValueError: cannot reshape array of size 9 into shape (3,4)
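When only one of the dimensions is known, `-1` can be passed for the other one and NumPy infers it from the number of values. A minimal sketch; the array `values` with 12 entries is chosen only for illustration.

```python
import numpy as np

values = np.arange(12)

# -1 asks NumPy to infer this dimension from the array size
three_rows = values.reshape(3, -1)
two_cols = values.reshape(-1, 2)

print(three_rows.shape)  # (3, 4)
print(two_cols.shape)    # (6, 2)
```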

### b) max(), min(), argmax(), argmin()


In [48]:
# 10 integers between 0 and 50
ranarr = np.random.randint(0,51,10)
ranarr

array([28,  0, 26, 24, 14, 19, 37, 15, 21, 34])

In [49]:
ranarr.max()

37

In [50]:
ranarr.argmax()

6

In [51]:
ranarr.min()

0

In [52]:
ranarr.argmin()

1
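For arrays with more than one dimension, `argmax()` and `argmin()` refer to the flattened array unless an axis is given. The sketch below uses a small hand-written matrix named `scores` (an illustrative name) so that the results are easy to verify.

```python
import numpy as np

scores = np.array([[28, 0, 26],
                   [24, 37, 15]])

# Without an axis, argmax() returns a position in the flattened array
flat_pos = scores.argmax()

# np.unravel_index converts it back to (row, column)
row, col = np.unravel_index(flat_pos, scores.shape)

print(flat_pos)                # 4
print(row, col)                # 1 1
print(scores.max(axis=0))      # column-wise maxima: [28 37 26]
print(scores.argmax(axis=1))   # position of the maximum in each row: [0 1]
```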

### c) shape: returns the dimensions of the array. It is an attribute, not a method.

In [53]:
# => therefore without brackets
arr.shape

(9,)

In [54]:
# For a 1D array, the built-in function len() gives the same number
# (in general, len() returns the size of the first axis)
len(arr)

9

The comma between 9 and ) seems unusual at first. The shape is a tuple, and Python writes a tuple with a single element with a trailing comma. See also:
https://stackoverflow.com/questions/46134891/why-an-extra-comma-in-the-shape-of-a-single-index-numpy-array
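The trailing comma is plain Python syntax and can be tried without NumPy. A short sketch; the names `not_a_tuple`, `one_tuple` and `vec` are illustrative.

```python
import numpy as np

not_a_tuple = (9)    # just the integer 9 in parentheses
one_tuple = (9,)     # a tuple with one element

print(type(not_a_tuple))  # <class 'int'>
print(type(one_tuple))    # <class 'tuple'>

vec = np.arange(9)
print(vec.shape == one_tuple)  # True
print(vec.ndim)                # 1, the number of dimensions
```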

### d) The shape after reshape() :

In [55]:
# it is now a matrix with one row
# (recognizable by the double square brackets)
arr.reshape(1,9)

array([[0, 1, 2, 3, 4, 5, 6, 7, 8]])

In [56]:
arr.reshape(1,9).shape

(1, 9)

In [57]:
# A matrix with one column:
arr.reshape(9,1).shape

(9, 1)

In [58]:
arr.reshape(9,1)

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8]])

### e) dtype: returns the data type of an array's values (attribute)


In [59]:
arr.dtype

dtype('int32')
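The default integer type depends on the platform and the NumPy version, so the output above may read `int64` on other machines. If a specific type is needed, it can be requested explicitly. A minimal sketch with illustrative names:

```python
import numpy as np

# Request a data type when creating the array
explicit = np.array([1, 2, 3], dtype=np.float32)
print(explicit.dtype)   # float32

# Convert an existing array; astype() returns a new array
ints = np.arange(3)
floats = ints.astype(np.float64)
print(floats.dtype)     # float64
print(floats)           # [0. 1. 2.]
```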

### f) type(): returns the type of a given input

In [60]:
# (not specific to NumPy)
type(arr)

numpy.ndarray

### 3.3.4 Indexing and Slicing

In [61]:
# Numbers between 0 and 10
arr = np.arange(0,11)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

### a) []-notation: very similar to Python lists

In [62]:
# selecting a single value
arr[8]

8

In [63]:
# selecting multiple values
arr[1:5]

array([1, 2, 3, 4])

### b) Broadcasting

A single value assigned to a slice of a NumPy array is broadcast to every selected element; this is not possible for Python lists:


In [64]:
# Set the first five values to 100...
arr[0:5] = 100
arr

array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9,  10])

In [65]:
# ... does not work with lists
plist = list(range(0,11))
plist[0:5] = 100

TypeError: can only assign an iterable
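With a plain list, the same effect requires spelling out the repeated value. A small sketch in pure Python; the list name `numbers` is illustrative.

```python
# Plain Python, no NumPy needed
numbers = list(range(11))

# A list slice accepts only an iterable, so the value must be repeated explicitly
numbers[0:5] = [100] * 5
print(numbers)  # [100, 100, 100, 100, 100, 5, 6, 7, 8, 9, 10]

# Caution: the iterable may have a different length, which resizes the list
numbers[0:5] = [0]
print(numbers)  # [0, 5, 6, 7, 8, 9, 10]
```

A NumPy array never changes its size through slice assignment, which is one reason the scalar form is unambiguous there.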

### c) Slicing

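NumPy behaves differently from Python lists here: a slice of an array is a view on the same data, not a copy, so changes to the slice also change the original array: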

In [71]:
# Reset the array to its original state:
arr = np.arange(0,11)

# Select the first 6 values:
slice_of_arr = arr[0:6]

# Change something in the slice, e.g., set everything to 99:
slice_of_arr[:] = 99

# Now something unexpected happens:
arr

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])
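Whether two arrays share the same data can be checked directly. The following is a minimal sketch; `base`, `view`, `py_list` and `py_slice` are illustrative names.

```python
import numpy as np

base = np.arange(0, 11)
view = base[0:6]          # basic slicing returns a view

print(np.shares_memory(base, view))  # True
print(view.base is base)             # True

# For comparison: slicing a Python list creates a new list
py_list = list(range(11))
py_slice = py_list[0:6]
py_slice[0] = 99
print(py_list[0])  # 0, the original list is unchanged
```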

### d) copy(): a real copy operation ("Deep-Copy")


In [72]:
arr_copy = arr.copy()
arr[:] = -55
arr_copy, arr

(array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10]),
 array([-55, -55, -55, -55, -55, -55, -55, -55, -55, -55, -55]))

![Screenshot_copy.png](attachment:Screenshot_copy.png)

### e) Indexing in 2D arrays, i.e. matrices

In [73]:
# Syntax: arr_2d[row_indices, col_indices]
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [74]:
# Second row
arr_2d[1,:] # Or, arr_2d[1]

array([20, 25, 30])

In [75]:
# Element in the second row, first column
arr_2d[1,0]

20

In [76]:
# 2x2-slice from the top right
arr_2d[:2,1:]

array([[10, 15],
       [25, 30]])

### f) 'Fancy Indexing': rows and columns can be selected in arbitrary order (similar to R).

In [81]:
# Create a new matrix and fill it with numbers
arr2d = np.zeros((5,5))
n_rows = arr2d.shape[0] # number of rows
# Fill each row with its row index
for i in range(n_rows):
    arr2d[i] = i
arr2d

array([[0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.]])

In [82]:
# You can now index arbitrarily:
arr2d[[3,4,2,1],:]

array([[3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.],
       [2., 2., 2., 2., 2.],
       [1., 1., 1., 1., 1.]])
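The same works for columns, and indices may be repeated. Unlike a basic slice, the result of fancy indexing is a copy of the data. A short sketch; `grid` and `picked` are illustrative names.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

# Columns in arbitrary order (and with repetition)
picked = grid[:, [3, 0, 0]]
print(picked)

# Changing the result does not affect the original array
picked[:] = -1
print(np.shares_memory(grid, picked))  # False
print(grid[0])                         # [0 1 2 3], unchanged
```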

### g) Select elements by their properties rather than their indices

In [83]:
# Numbers between 1 and 7
arr = np.arange(1,8)
# Which elements are larger than 4?
arr > 4

array([False, False, False, False,  True,  True,  True])

In [85]:
# For comparison: does not work with Python lists!
plist = list(range(1,11))
plist > 4

TypeError: '>' not supported between instances of 'list' and 'int'

In [84]:
# Create a boolean vector
bool_arr = arr>4
# Select the elements that are greater than 4
arr[bool_arr]

array([5, 6, 7])

In [86]:
# Shorter version:
arr[arr>4]

array([5, 6, 7])
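Several conditions can be combined, and a boolean mask can also appear on the left-hand side of an assignment. A minimal sketch; the names `values`, `middle` and `clipped` are illustrative.

```python
import numpy as np

values = np.arange(1, 8)

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required
middle = values[(values > 2) & (values < 6)]
print(middle)  # [3 4 5]

# Assign to all elements that satisfy a condition
clipped = values.copy()
clipped[clipped > 4] = 4
print(clipped)  # [1 2 3 4 4 4 4]

# np.where returns the indices at which the condition holds
print(np.where(values > 4)[0])  # [4 5 6]
```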

### 3.3.5 Arithmetic Operations

### a) Basic Arithmetic

In [91]:
arr = np.arange(0, 10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [90]:
arr + arr, arr * arr, arr - arr, arr + 3, arr**2

(array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18]),
 array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81]),
 array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
 array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12]),
 array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81], dtype=int32))

In [92]:
# Division 0/0 does not result in an error but a warning.
# The result is nan; more precisely np.nan
div = arr/arr
div

  div = arr/arr


array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

In [94]:
# Division 1/0 does not result in an error but a warning.
# The result is infinity; more precisely np.inf
infi = 1/arr
infi

  infi = 1/arr


array([       inf, 1.        , 0.5       , 0.33333333, 0.25      ,
       0.2       , 0.16666667, 0.14285714, 0.125     , 0.11111111])
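If such values are expected, the warnings can be silenced locally and the results inspected afterwards. This is one possible way to handle it, sketched with illustrative names (`numer`, `ratio`, `inverse`).

```python
import numpy as np

numer = np.arange(0, 5)

# np.errstate temporarily silences the floating-point warnings
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = numer / numer      # 0/0 -> nan
    inverse = 1 / numer        # 1/0 -> inf

# nan is not equal to itself, so test with np.isnan instead of ==
print(np.isnan(ratio))             # [ True False False False False]
print(np.isinf(inverse))           # [ True False False False False]
print(np.isfinite(inverse).all())  # False
```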

### b) Functions for arrays

NumPy's standard mathematical functions ("universal functions", ufuncs) work element-wise, so they can be applied directly to all (or selected) elements of an array:

In [95]:
# Square root
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [97]:
# Exponential function
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [99]:
# Maximum
np.max(arr) # or arr.max()

9

In [100]:
# Sine
np.sin(arr)

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

In [101]:
# Natural logarithm (log(0) yields -inf and a warning)
np.log(arr)

  np.log(arr)


array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436,
       1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458])

In [102]:
# For comparison: the functions of the math module don't work on Python lists
import math
math.sqrt(plist)

TypeError: must be real number, not list
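With the `math` module, a list has to be processed one element at a time, whereas the NumPy functions also accept a list and convert it to an array. A small sketch; the names `numbers`, `roots_loop` and `roots_np` are illustrative.

```python
import math
import numpy as np

numbers = list(range(1, 11))

# math.sqrt works on one number at a time, so a list needs an explicit loop
roots_loop = [math.sqrt(x) for x in numbers]

# NumPy functions accept the list directly and return an array
roots_np = np.sqrt(numbers)

print(type(roots_loop))  # <class 'list'>
print(type(roots_np))    # <class 'numpy.ndarray'>
print(np.allclose(roots_loop, roots_np))  # True
```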

⇒ see also https://docs.scipy.org/doc/numpy/reference/ufuncs.html