# Practical Machine Learning
# Lab 1

## Introduction to Numpy

### Numpy
- Numpy is the core library for scientific computing in Python.
- It provides a high-performance multidimensional array object and tools for working with these arrays.


In [1]:
# how to use/import:
import numpy as np

### Arrays

In [2]:
# Initialize an array using Python lists:
a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5 2 3]"

b = np.array([[1,2,3],[4,5,6]])    # Create a rank 2 array
print(b.shape)                     # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"

<class 'numpy.ndarray'>
(3,)
1 2 3
[5 2 3]
(2, 3)
1 2 4


In [19]:
# Initialize an array using Numpy functions:
a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[ 0.  0.]
                      #          [ 0.  0.]]"

b = np.ones((1,2))    # Create an array of all ones
print(b)              # Prints "[[ 1.  1.]]"

c = np.full((2,2), 7)  # Create a constant array (integer dtype, because 7 is an int)
print(c)               # Prints "[[7 7]
                       #          [7 7]]"

d = np.eye(2)         # Create a 2x2 identity matrix
print(d)              # Prints "[[ 1.  0.]
                      #          [ 0.  1.]]"

# Create an array filled with random values drawn from a uniform distribution over [0, 1)
random_array = np.random.random((2,2))
print(random_array)

# Create an array filled with values drawn from a Gaussian distribution,
# where mu is the mean and sigma the standard deviation
mu, sigma = 0, 0.2
random_array_gauss = np.random.normal(mu, sigma, (3, 2))
print(random_array_gauss)


[[ 0.  0.]
 [ 0.  0.]]
[[ 1.  1.]]
[[7 7]
 [7 7]]
[[ 1.  0.]
 [ 0.  1.]]
[[ 0.1049197   0.71708055]
 [ 0.92331477  0.43210395]]
[[ 0.17382326  0.21775636]
 [-0.190186    0.02337912]
 [-0.01128    -0.31673062]]


#### Array indexing

In [20]:
# Slicing:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]

# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(a[0, 1])   # Prints "2"
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "77"

# If you don't want to modify the original array, you can copy the subarray.
slice_copy = np.copy(a[:, 0:3])
slice_copy[0][0] = 100
print(slice_copy[0][0]) # => 100
print(a[0][0]) # => 1

2
77
100
1
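
The view-versus-copy distinction above can also be checked directly. The following is a minimal sketch that assumes `np.shares_memory` is available in the installed Numpy; the names `view` and `copied` are illustrative and not part of the lab.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]            # basic slicing returns a view into a
copied = a[:2, 1:3].copy()   # .copy() allocates separate memory

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, copied))  # False

view[0, 0] = 77     # writes through to a[0, 1]
copied[0, 1] = -1   # leaves a untouched
print(a[0, 1], a[0, 2])  # 77 3
```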


In [21]:
# You can also mix integer indexing with slice indexing.
# However, doing so will yield an array of lower rank than the original array.


# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)  # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)  # Prints "[[5 6 7 8]] (1, 4)"

# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)  # Prints "[ 2  6 10] (3,)"
print(col_r2, col_r2.shape)  # Prints "[[ 2]
                             #          [ 6]
                             #          [10]] (3, 1)"


[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[ 2  6 10] (3,)
[[ 2]
 [ 6]
 [10]] (3, 1)
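
When integer indexing has already dropped a dimension, the rank can be restored afterwards. The sketch below shows two common ways to do this, `np.newaxis` and `reshape`; the variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

col_r1 = a[:, 1]                        # rank 1, shape (3,)
col_as_column = col_r1[:, np.newaxis]   # rank 2, shape (3, 1)
col_reshaped = col_r1.reshape(-1, 1)    # same shape via reshape

print(col_as_column.shape)                        # (3, 1)
print(np.array_equal(col_as_column, a[:, 1:2]))   # True
print(np.array_equal(col_reshaped, a[:, 1:2]))    # True
```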


In [22]:
# Integer array indexing: When you index into numpy arrays using slicing, the resulting array view will always
# be a subarray of the original array. In contrast, integer array indexing lets you construct arbitrary arrays
# using the data from another array.

a = np.array([[1,2], [3, 4], [5, 6]])

# An example of integer array indexing.
# The returned array will have shape (3,):
print(a[[0, 1, 2], [0, 1, 0]])  # Prints "[1 4 5]"

# The above example of integer array indexing is equivalent to this:
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  # Prints "[1 4 5]"

# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]])  # Prints "[2 2]"

# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))  # Prints "[2 2]"

[1 4 5]
[1 4 5]
[2 2]
[2 2]
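
One common use of integer array indexing is to select or modify one element from each row of a matrix. The following is a small sketch; the array `cols` is a hypothetical choice of column per row, made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

rows = np.arange(4)             # [0 1 2 3], one index per row
cols = np.array([0, 2, 0, 1])   # hypothetical column to pick in each row

picked = a[rows, cols]          # elements a[0,0], a[1,2], a[2,0], a[3,1]
print(picked)                   # [ 1  6  7 11]

a[rows, cols] += 10             # modify the same elements in place
print(a[rows, cols])            # [11 16 17 21]
```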


In [23]:
# Boolean array indexing: Boolean array indexing lets you pick out arbitrary elements of an array.
# Frequently this type of indexing is used to select the elements of an array that satisfy some condition.

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)   # Find the elements of a that are bigger than 2;
                     # this returns a numpy array of Booleans of the same
                     # shape as a, where each slot of bool_idx tells
                     # whether that element of a is > 2.

print(bool_idx)      # Prints "[[False False]
                     #          [ True  True]
                     #          [ True  True]]"

# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx])  # Prints "[3 4 5 6]"

# We can do all of the above in a single concise statement:
print(a[a > 2])     # Prints "[3 4 5 6]"

[[False False]
 [ True  True]
 [ True  True]]
[3 4 5 6]
[3 4 5 6]
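
Boolean masks can also be combined and used on the left-hand side of an assignment. The sketch below assumes the same array `a` as above; the threshold values are arbitrary.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) or | (or); the parentheses are required.
mask = (a > 1) & (a < 5)
print(a[mask])          # [2 3 4]

# Assign through a mask: cap every element larger than 4 at 4.
capped = a.copy()
capped[capped > 4] = 4
print(capped.tolist())  # [[1, 2], [3, 4], [4, 4]]
```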


#### Data types

In [24]:
x = np.array([1, 2])   # Let numpy choose the datatype
print(x.dtype)         # Prints "int64" or "int32" depending on the platform ("int32" in the run below)

x = np.array([1.0, 2.0])   # Let numpy choose the datatype
print(x.dtype)             # Prints "float64"

x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
print(x.dtype)                         # Prints "int64"

int32
float64
int64
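
Because the default integer type can differ between platforms, it can help to set or convert the datatype explicitly. The following sketch uses `astype` for conversion; the sample values are made up for illustration.

```python
import numpy as np

mixed = np.array([1, 2.5])     # mixing ints and floats upcasts to float
print(mixed.dtype)             # float64

floats = np.array([1.7, -2.3])
as_int = floats.astype(np.int64)   # conversion truncates toward zero
print(as_int)                      # [ 1 -2]
print(as_int.dtype)                # int64

small = np.array([1, 2], dtype=np.int32)
print((small + 0.5).dtype)     # float64
```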


#### Array math

In [25]:
# Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads 
# and as functions in the numpy module:

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Element-wise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

# Element-wise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

# Element-wise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

# Element-wise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

# Element-wise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

[[  6.   8.]
 [ 10.  12.]]
[[  6.   8.]
 [ 10.  12.]]
[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]
[[  5.  12.]
 [ 21.  32.]]
[[  5.  12.]
 [ 21.  32.]]
[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]
[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]
[[ 1.          1.41421356]
 [ 1.73205081  2.        ]]


In [26]:
# Inner product. * is element-wise multiplication, not matrix multiplication. We instead use the dot function 
# to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices.
# dot is available both as a function in the numpy module and as an instance method of array objects:

x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))

# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))

219
219
[29 67]
[29 67]
[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
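
In Python 3.5+ with Numpy 1.10 or later, the `@` operator offers another way to write these products. The sketch below reuses the arrays from the cell above and contrasts `@` with the element-wise `*`.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

print(v @ w)               # 219
print(x @ v)               # [29 67]
print((x @ y).tolist())    # [[19, 22], [43, 50]]
print((x * y).tolist())    # [[5, 12], [21, 32]]  (element-wise, for contrast)
```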


#### Matrix operations

In [27]:
# The transpose of a matrix
my_array = np.array([[1, 2, 3], [4, 5, 6]])  # [[1, 2, 3],
                                             #  [4, 5, 6]]
print(my_array.T)  # => [[1, 4],
                   #     [2, 5],
                   #     [3, 6]]

# The inverse of a matrix
my_array = np.array([[1., 2.], [3., 4.]])
print(np.linalg.inv(my_array))  # => [[-2. ,  1. ],
                                #     [ 1.5, -0.5]]

[[1 4]
 [2 5]
 [3 6]]
[[-2.   1. ]
 [ 1.5 -0.5]]
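
A computed inverse can be sanity-checked by multiplying it with the original matrix. The sketch below uses `np.allclose` rather than exact equality because floating-point results are only approximately equal to the identity; the names `A` and `A_inv` are illustrative.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
A_inv = np.linalg.inv(A)

# A times its inverse should be (approximately) the identity matrix.
print(np.allclose(A @ A_inv, np.eye(2)))          # True

# The inverse of the transpose equals the transpose of the inverse.
print(np.allclose(np.linalg.inv(A.T), A_inv.T))   # True
```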


#### Operations along a specific dimension (axis)

In [28]:
x = np.array([[1, 2],[3, 4]])
# Sum along an axis
print(np.sum(x))          # Sum of all elements => 10
print(np.sum(x, axis=0))  # Sum over rows (one sum per column) => [4 6]
print(np.sum(x, axis=1))  # Sum over columns (one sum per row) => [3 7]

# If axis is a tuple of ints, the sum is performed over all of the axes in the tuple,
# instead of over a single axis or over all axes as before.
print(np.sum(x, axis=(0, 1)))  # Sum over both axes => 10

# Mean along an axis
y = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]],
              [[1, 2, 3, 4], [5, 6, 7, 8]],
              [[1, 2, 3, 4], [5, 6, 7, 8]]])
print(y.shape)  # => (3, 2, 4)
print(y)  # => [[[1 2 3 4]
          #      [5 6 7 8]]
          #     [[1 2 3 4]
          #      [5 6 7 8]]
          #     [[1 2 3 4]
          #      [5 6 7 8]]]
print(np.mean(y, axis=0))  # => [[1. 2. 3. 4.]
                           #     [5. 6. 7. 8.]]
print(np.mean(y, axis=1))  # => [[3. 4. 5. 6.]
                           #     [3. 4. 5. 6.]
                           #     [3. 4. 5. 6.]]

# Argmax along axis 1: the index of the largest value in each row
z = np.array([[10, 12, 5], [17, 11, 19]])
print(np.argmax(z, axis=1))  # => [1 2]

10
[4 6]
[3 7]
10
(3, 2, 4)
[[[1 2 3 4]
  [5 6 7 8]]

 [[1 2 3 4]
  [5 6 7 8]]

 [[1 2 3 4]
  [5 6 7 8]]]
[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]]
[[ 3.  4.  5.  6.]
 [ 3.  4.  5.  6.]
 [ 3.  4.  5.  6.]]
[1 2]
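
A useful way to read the `axis` argument is that the reduced axis disappears from the result's shape. The sketch below illustrates this on an array of shape (3, 2, 4), filled with `np.arange` for convenience, and shows the `keepdims=True` option, which keeps the reduced axis with size 1.

```python
import numpy as np

y = np.arange(24).reshape(3, 2, 4)   # shape (3, 2, 4)

print(np.mean(y, axis=0).shape)        # (2, 4)
print(np.mean(y, axis=1).shape)        # (3, 4)
print(np.mean(y, axis=2).shape)        # (3, 2)
print(np.mean(y, axis=(0, 2)).shape)   # (2,)

# keepdims=True keeps the reduced axis as a dimension of size 1.
print(np.mean(y, axis=1, keepdims=True).shape)   # (3, 1, 4)
```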


#### Broadcasting
Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

In [29]:
# add a vector (v) to every row of a matrix (m)
m = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = m + v
print(y)  # => [[ 2  2  4]
          #     [ 5  5  7]
          #     [ 8  8 10]
          #     [11 11 13]]

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower-rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in that dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had a shape equal to the elementwise maximum of the shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
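
The rules above can be traced on a few concrete shapes. The following is a minimal sketch: `col` and `bad` are hypothetical arrays introduced only for illustration, and `np.broadcast(...).shape` is used to report the broadcast shape without performing any arithmetic.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])  # shape (4, 3)
v = np.array([1, 0, 1])                                         # shape (3,)
col = np.array([[10], [20], [30], [40]])                        # shape (4, 1)

# (3,) is treated as (1, 3), which is compatible with (4, 3).
print(np.broadcast(m, v).shape)     # (4, 3)

# (1, 3) and (4, 1) are compatible; the result takes the larger size per dimension.
print(np.broadcast(v, col).shape)   # (4, 3)

# col behaves as if it were copied across the three columns of m.
print((m + col)[1])                 # [24 25 26]

# (4,) is treated as (1, 4); 4 and 3 differ and neither is 1, so this fails.
bad = np.array([1, 2, 3, 4])
try:
    m + bad
    compatible = True
except ValueError:
    compatible = False
print(compatible)                   # False
```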
