# Lab 0: Introduction to Numpy
Data Mining 2018/2019 <br>
Gosia Migut

**WHAT** This nonmandatory lab consists of several programming and insight exercises/questions. 

**WHY** The exercises are meant to prepare you for using Python and Numpy in this course. 

**HOW** Follow the exercises in this notebook either on your own or with a friend. Use [Mattermost][1] to discuss questions with your peers. For additional questions and feedback please consult the TAs during the lab session. 

* Step 1: Intro to Numpy
* Step 2: Practice with Numpy


[1]: https://mattermost.ewi.tudelft.nl/signup_user_complete/?id=ccffzw3cdjrkxkksq79qbxww7a

## Step 1: Array programming with NumPy

In this step we will show you the basics of array manipulation in [NumPy]. 

Our general advice is to read the manuals and tutorials on the web, so that you learn how to use modern array manipulation libraries and unleash their power.

There are many Python/NumPy tutorials available, like [this one]. 

[this one]: http://cs231n.github.io/python-numpy-tutorial/

[NumPy]: http://www.numpy.org/

In data mining we are dealing with massive amounts of data. Data is most often organised in tables. When all data elements in a table are of the same datatype (like an integer or a floating point number) the table can be represented with a homogeneous array.

Languages that are optimally suited for programming with data are therefore equipped with array data types that are an integral part of the language. Although arrays look a lot like Python lists, they are not the same, as shown in the following code.

In [None]:
import numpy as np

### Declaring a normal python list.

In [None]:
list1 = [1, 2, 3, 4]
type(list1)

### Making a numpy array using python lists.

In [None]:
array1 = np.array(list1)
array1

In [None]:
type(array1)

In [None]:
print(array1)

### We can also go multi dimensional (in this case 2D)

In [None]:
# Declare an extra list.
list2 = [11, 22, 33, 44]
# Combine the lists into a 2D list.
lists = [list1, list2]
lists

In [None]:
array2 = np.array(lists)
array2

### Just to make sure, we can also print the shapes.
We would obviously expect a (4,1) and a (4,2)... or would we?

In [None]:
print("Arr1: ", array1.shape)
print("Arr2: ", array2.shape)

The best way to think about NumPy arrays is that they consist of two parts, a data buffer which is just a block of raw elements, and a view which describes how to interpret the data buffer.

Here the shape (4,) means the array is indexed by a single index which runs from 0 to 3. 
In most situations the lack of a second dimension is not a problem. If it does turn into a problem (e.g. when you are trying to take the transpose of this vector) you can call `reshape`, as shown below, to generate a new view:

In [None]:
# Do note the double brackets, as the size is added as a tuple: (rows, columns)
array1 = array1.reshape((4,1))
array1.shape
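
To make the buffer/view idea concrete, here is a minimal sketch that checks whether a reshaped array still shares its data buffer with the original. The names `base` and `column` are illustrative and not part of the lab.

```python
import numpy as np

base = np.arange(4)            # shape (4,)
column = base.reshape((4, 1))  # shape (4, 1), a new view on the same buffer

# Both arrays look at the same block of memory.
print(np.shares_memory(base, column))

# Writing through the view is therefore visible in the original.
column[0, 0] = 99
print(base)
```

Only the description of how to walk through the buffer changes when you reshape; no data is moved.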

We happen to know what we stored in our array, but sometimes you do not. To ask for the data type of your array, you can call:


In [None]:
array2.dtype
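
Because an array is homogeneous, NumPy picks one data type that can hold every element. The sketch below (with illustrative names `ints`, `mixed` and `as_int`) shows how a single float changes the type of the whole array, and how `astype` converts it back.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float forces the whole array to float

print(ints.dtype)    # an integer type (the exact width is platform dependent)
print(mixed.dtype)   # float64

# astype returns a converted copy; the fractional part is truncated.
as_int = mixed.astype(int)
print(as_int)
```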

### Making specific arrays
There are also other ways to make specific arrays, such as:

In [None]:
# An uninitialized array of 5 floats (the contents are arbitrary, not necessarily zeros)
print("Ex1: ", np.empty(5))

# Array of 5 floating point zeros
print("Ex2: ", np.zeros(5))

# Array of 5 floating point ones
print("Ex3: ", np.ones(5))

# Array of 5 integer incrementing numbers
print("Ex4: ", np.arange(5))

# Start at 5, stop before 20 (the end point is excluded), in steps of 2
print("Ex5: ", np.arange(5, 20, 2))

# Making the identity matrix (ones on the diagonal)
print("Ex6: ")
print(np.eye(5))

### Mathematical operations


You can also apply basic mathematical operations to arrays. These operations work element by element.

In [None]:
array3 = np.array([[1, 2, 3, 4], [8, 9, 10, 11]])
array3

In [None]:
# Element-wise multiplication
array3 * array3

In [None]:
# Subtraction
array3 - 5

In [None]:
# Element-wise reciprocal
1 / array3

In [None]:
# Element-wise cube
array3 ** 3

#### You can also apply functions to all elements in an array at once

The nice thing about Numpy arrays is that they allow you to manipulate the data in arrays without writing explicit loops. For instance, look at the addition of all elements in an array:

In [None]:
a = np.random.rand(65536)

In [None]:
# Calculate the sum of the elements in an array with an explicit loop.
def loopsum(a):
    total = 0  # avoid the name `sum`, which would shadow the Python built-in
    for i in range(len(a)):
        total += a[i]
    return total

In [None]:
%timeit loopsum(a)
%timeit np.sum(a)

So the explicit loop takes about 10 ms versus about 30 µs for the Numpy version, which makes the explicit loop roughly 300 times slower (exact timings depend on your machine).
So in this course, make sure to use the built-in Numpy tools to manipulate and calculate with arrays.
Some built-in functions of Numpy can be found [here].
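
Outside a notebook the `%timeit` magic is not available. As a rough sketch, the same comparison can be made with the standard `timeit` module; the names `values`, `loop_sum`, `loop_time` and `numpy_time` are illustrative, and the measured numbers will vary per machine.

```python
import timeit
import numpy as np

values = np.random.rand(65536)

def loop_sum(arr):
    """Sum the elements with an explicit Python loop."""
    total = 0.0
    for x in arr:
        total += x
    return total

# Run each version 10 times and report the average time per call.
loop_time = timeit.timeit(lambda: loop_sum(values), number=10) / 10
numpy_time = timeit.timeit(lambda: np.sum(values), number=10) / 10

print(f"loop:  {loop_time * 1e3:.3f} ms")
print(f"numpy: {numpy_time * 1e3:.3f} ms")

# Both approaches agree up to floating point rounding.
print(np.isclose(loop_sum(values), np.sum(values)))
```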



[here]: https://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations



### Indexing Arrays

In [None]:
array4 = np.arange(0, 10)
array4

Indexing differs slightly from normal Python lists: an array offers two different ways to access the value at a position.

In [None]:
list3 = [[1, 2, 3], [4, 5, 6]]
array5 = np.array(list3)

# Watch the brackets closely.
print("List: ", list3[1][2])
# Array can use two different approaches
print("Array ", array5[1, 2])
print("Array ", array5[1][2])

### Slicing arrays
Sometimes you do not want the full array, but just parts of it. We can use array slicing for this.

In [None]:
# Show original array
print(array4)
# We want the 2nd to 5th element (indices 1 to 4):
print(array4[1:5])
# We can also use it to set the value of multiple entries:
array4[2:5] = 13
print(array4)

One important thing to note is that a slice is just another view of the same data. If you change something in the slice, it also changes in the original. This is nice for the memory efficiency of your program, but it can hurt you when overlooked.

In [None]:
array4 = np.arange(0, 10)
# Take a slice, consisting of the 3rd to 6th element (indices 2 to 5).
slice_array4 = array4[2:6]
# Assign 22 to every element of the slice at once (no loop needed)
slice_array4[:] = 22
print(slice_array4)
print(array4)

To prevent this, we can make a copy instead of a view, which reserves new memory for the object we are making.

In [None]:
array4 = np.arange(0, 10)
array5 = array4.copy()
print(array4)
print(array5)
array5[:] = 22
print("So did we make a copy?")
print(array4)
print(array5)
print("Seems we did.")
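
If you are unsure whether two arrays share data, you can ask NumPy directly. The following sketch uses `np.shares_memory` and the `.base` attribute; the names `data`, `view` and `copy` are illustrative.

```python
import numpy as np

data = np.arange(10)
view = data[2:6]          # basic slicing: a view
copy = data[2:6].copy()   # explicit copy: new memory

print(np.shares_memory(data, view))   # True
print(np.shares_memory(data, copy))   # False

# For a view, .base points at the array that owns the data.
print(view.base is data)
```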

#### 2D array slicing

In [None]:
array6 = np.array([[2, 4, 6], [8, 10, 12], [14, 16, 18]])
print(array6)
# Let's say you only want the upper-right 2x2 square of the above matrix
array6[:2, 1:]

#### Fancy Indexing
Sometimes you don't want to have every row, but perhaps skip a few entries. This is easily possible in Numpy. Let us assume we only want the 2nd, 3rd, 5th, and 7th row in the following example.

In [None]:
# Below we use a list comprehension (which you should have seen in Introduction to Programming as well)
# to generate an array with 10 rows, where row j is filled with the value j (so each column runs from 0 to 9).
array7 = np.array([[j for i in range(10)] for j in range(10)])
print(array7)
# As we start at index 0, we actually want the following rows [1, 2, 4, 6].
# Also note the double brackets below.
print(array7[[1, 2, 4, 6]])
# Fancy indexing can be combined with slicing: rows 1 and 2, first 5 columns.
print(array7[[1, 2], :5])
# It also works on columns: all rows, columns 2 and 4.
print(array7[:, [2, 4]])

You can do the above in any order you wish.

In [None]:
array7[[7, 3, 5, 2]]
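
Unlike basic slicing, indexing with a list of positions produces a copy rather than a view. A small sketch, with illustrative names `grid` and `picked`, makes the difference visible:

```python
import numpy as np

grid = np.arange(12).reshape((4, 3))
picked = grid[[0, 2]]     # fancy indexing: rows 0 and 2, as a copy

picked[:] = -1            # modify the result ...
print(grid)               # ... the original is unchanged
print(np.shares_memory(grid, picked))
```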

### Array Transposition

In [None]:
array8 = np.arange(40).reshape((8, 5))
array8

In [None]:
# If you want to transpose a matrix you can go two ways:
print(np.transpose(array8))
# And
print(array8.T)

### Array Processing

In [None]:
# Range from -5 up to (but not including) 5, in steps of 0.01
points = np.arange(-5, 5, 0.01)
print(points)

In [None]:
import matplotlib.pyplot as plt

# Return a meshgrid based on two vectors.

# It generates two matrices, each of shape (len(y), len(x)).
# This allows you to store the values needed to combine every value of x with every value of y.
# If you input the same vector twice, dy is the transpose of dx.
dx, dy = np.meshgrid(points, points)

z = (np.sin(dx) + np.sin(dy))
plt.imshow(z)
plt.colorbar()
plt.title('plot for sin(x) + sin(y)')
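
With 1000 points per axis the grids are too large to inspect by eye. The sketch below applies `np.meshgrid` to two tiny vectors (illustrative names `x`, `y`, `gx`, `gy`) so that you can read off what the two returned matrices contain.

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([10, 20])

gx, gy = np.meshgrid(x, y)
print(gx.shape, gy.shape)   # both (len(y), len(x))
print(gx)        # each row is a copy of x
print(gy)        # each column is a copy of y
print(gx + gy)   # entry [i, j] holds x[j] + y[i]
```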

#### Numpy Where

In [None]:
# Numpy Where

A = np.array([1, 2, 3, 4])
B = np.array([100, 200, 300, 400])

condition = np.array([True, True, False, False])

print(A[condition])
print(B[condition])

In [None]:
# Where my condition is met, choose A, else choose B
answer = [(A_val if cond else B_val) for A_val, B_val, cond in zip(A, B, condition)]
print(answer)

In [None]:
# Where my condition is met, choose A, else choose B
answer2 = np.where(condition, A, B)
print(answer2)
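
In practice the condition is usually computed from the data rather than written by hand. As a small sketch (the names `values`, `clipped` and `negative_idx` are illustrative), `np.where` can replace negative entries by zero, and with a single argument it returns the positions where the condition holds.

```python
import numpy as np

values = np.array([3, -1, 4, -5, 9])

# Build the condition from the data instead of writing it by hand.
clipped = np.where(values < 0, 0, values)
print(clipped)

# With a single argument, np.where returns the indices where the condition holds.
negative_idx = np.where(values < 0)[0]
print(negative_idx)
```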

#### Numpy Any & All

In [None]:
bool_arr = np.array([True, False, True, True])
# There exist quite useful functions for a concept we call Masking.

In [None]:
# If any value is true, return true (else false)
bool_arr.any()

In [None]:
# If all values are true, return true (else false)
bool_arr.all()
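
A mask is simply a boolean array with one entry per element, typically produced by a comparison. The following sketch, with illustrative names `scores` and `mask`, combines a mask with `any`, `all` and boolean indexing.

```python
import numpy as np

scores = np.array([0.2, 0.9, 0.5, 0.7])
mask = scores > 0.6          # boolean array, one entry per element

print(mask)
print(mask.any(), mask.all())
print(scores[mask])          # keep only the elements where mask is True
print(mask.sum())            # True counts as 1, so this counts the matches
```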

#### Numpy Unique and `in` checking

In [None]:
# Sometimes you just want to know all the unique values in a numpy array. Luckily, that function is already implemented for you.
letters = ['A', 'B', 'C', 'D', 'D', 'A', 'E', 'F', 'G', 'H', 'Z']
np.unique(letters)

In [None]:
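# We can also easily check, for each element of an array, whether it occurs in a 1D vector.
# np.isin is the current name for this check; np.in1d is the older one and is deprecated in NumPy 2.0.
np.isin(['X', 'C', 'M', 'Z'], letters)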

## Step 2: Practice Numpy


These are optional exercises which will get you familiar with Numpy. 

### Array Calculations and Array Indexing
In all exercises below you are not allowed to use a loop in python.

In [None]:
#Given two arrays A and B each of the same size calculate their sum (elementwise) and their product (elementwise)


In [None]:
#Calculate the mean of all elements in an array A without using the np.mean or np.average functions.


In [None]:
#Calculate the standard deviation of all elements in an array A without using np.var or np.std.

In [None]:
#Given an array A with shape (128,) calculate the sum of the first, third, fifth, etc elements (A[0]+A[2]+...).


In [None]:
#Given an array A with shape (1024,) make an array B containing only the first 512 elements of A. I.e. B-shape should be (512,).


In [None]:
#Given an array A with shape (1024,) make an array B containing only the elements A[22],A[23],...,A[42].


In [None]:
#Given an array A with shape (1024,) and an integer array I make an array B whose elements are A[I[0]], A[I[1]], ..., A[I[-1]]


In [None]:
#Given an array A with shape (N,) make an array with all elements of A in reverse order.


In [None]:
#Given an array A = np.random.rand(128) make an array B containing all elements in A that are less than or equal to 0.5

#### Two dimensional data arrays
In this course you will be working a lot with matrices and vectors. The following exercises will let you practice with those.
Given is data matrix X with shape (m,n)

In [None]:
#Select the i-th row from the matrix X. 

In [None]:
#Select the j-th column from the matrix X.

In [None]:
#Given a data matrix X with shape (m,n) calculate the vector M of shape (n,) where M[i] is the mean of the i-th column of X.


In [None]:
#Now subtract the mean vector you just calculated from all the rows in your matrix leading to the 
#data matrix X_0. Yes this can be done without a loop! Hint: look at array broadcasting.

In [None]:
#For the j-th column, select the row which has the largest value of all the elements in this column.
#Hint: look at the function np.argmax for this.

In [None]:
#In a lot of algorithms we start with a data matrix X and then we would like to make a matrix X′ that is matrix X
#but with a column prepended containing only the values 1. You can do that in a one-liner!

#### Tricks with Arrays

In [None]:
#Given an array (vector) A of shape (N,) make it into an array B of shape (N,1).

In [None]:
#Given an array (vector) A of shape (N,) make it into an array B of shape (1,N).

In [None]:
#Given an array A of shape (M,N) what is A[i]? What are the valid values for i?

In [None]:
#Let A35 be an array of shape (3,5) and let v5 be an array of shape (5,). Subtract v5 from each row in A35.

In [None]:
#Let A35 be an array of shape (3,5) and let v3 be an array of shape (3,). Subtract v3 from each column in A35.

#### Linear Algebra

Python 3.5 introduced the @ operator for matrix multiplication. Let A be an array of shape (m,n) and 
let B be an array of shape (n,k); then we can write A @ B for the matrix multiplication of A and B, which has shape (m,k). 
Note that there is a conceptual difference between a 1-dimensional array V of shape (N,) 
and a vector V as we know it from linear algebra. In linear algebra a vector with N elements has dimensions N×1. 
A ‘vector’ V as a numpy array has shape (N,).
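
As a minimal sketch of the shape rule for `@`, the example below multiplies two small matrices of ones (the concrete shapes are chosen only for illustration) and shows what happens when the inner dimensions do not match.

```python
import numpy as np

A = np.ones((2, 3))
B = np.ones((3, 4))

C = A @ B
print(C.shape)                              # (2, 4)
print(np.array_equal(C, np.matmul(A, B)))   # @ is shorthand for np.matmul on arrays

# Mismatched inner dimensions raise an error.
try:
    A @ A
except ValueError as err:
    print("cannot multiply:", err)
```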




In [None]:
#Calculate the inner product of two vectors v and w both of shape (N,).

In [None]:
#Calculate the product of a matrix A of shape (M,N) with a vector v of shape (N,).

In [None]:
#Let v be an array of shape (N,). What is the shape of v.T (or np.transpose(v))?

In [None]:
#Let A be an array of shape (3,5) and let v3 be an array of shape (3,). 
#We define v31 = v3.reshape(3,1). What is v3 @ A, v31 @ A and v31.T @ A?