# ECE-5424 / CS-5824 Advanced Machine Learning
# Assignment 0

The goal of this assignment is to get you familiar with **Python 3**, **NumPy**, and Jupyter Notebook. We will use them throughout the semester. If you don't have any experience with Python programming, we strongly recommend that you check out the comprehensive Python/NumPy tutorial from [Stanford CS231n](http://cs231n.github.io/python-numpy-tutorial/). If you are not familiar with Jupyter Notebook, please check out the tutorial [here](https://www.youtube.com/watch?v=HW29067qVWk) by Corey Schafer.

In this assignment, **you need to complete the following three sections**:
1. Numpy tutorial: read through this section, and make sure you understand the basics
2. Numpy exercises
3. One calculus and programming exercise.



## Submission guideline

1. Click the Save button at the top of the Jupyter Notebook.
2. Please make sure to have entered your Virginia Tech PID below.
3. Select Cell -> All Output -> Clear. This will clear all the outputs from all cells (but will keep the content of all cells).
4. Select Cell -> Run All. This will run all the cells in order.
5. Once you've rerun everything, select File -> Download as -> PDF via LaTeX
6. Look at the PDF file and make sure all your solutions are there, displayed correctly. 
7. Zip BOTH the PDF file and this notebook. Rem
8. Submit your zipped file.

### Please Write Your VT PID Here: rifatsm

# Section 1. NumPy Tutorial

The following NumPy tutorial is borrowed from [Stanford CS231n](http://cs231n.stanford.edu/). Please run through each cell, and make sure you understand the materials here.

"NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more"  
-https://docs.scipy.org/doc/numpy-1.15.0/user/whatisnumpy.html.

In [76]:
import numpy as np

Let's run through an example showing how powerful NumPy is. Suppose we have two lists a and b, each consisting of the first 100,000 non-negative integers, and we want to create a new list c whose *i*th element is a[i] + 2 * b[i].  

Without NumPy:

In [77]:
%%time
a = [i for i in range(100000)]
b = [i for i in range(100000)]

CPU times: user 17.6 ms, sys: 6.58 ms, total: 24.2 ms
Wall time: 23.3 ms


In [78]:
%%time
c = []
for i in range(len(a)):
    c.append(a[i] + 2 * b[i])

CPU times: user 77.7 ms, sys: 7.6 ms, total: 85.3 ms
Wall time: 84 ms


With NumPy:

In [79]:
%%time
a = np.arange(100000)
b = np.arange(100000)

CPU times: user 4.5 ms, sys: 4.46 ms, total: 8.96 ms
Wall time: 7.85 ms


In [80]:
%%time
c = a + 2 * b

CPU times: user 3.53 ms, sys: 3.36 ms, total: 6.89 ms
Wall time: 5.59 ms


In this run, the NumPy computation is roughly 15 times faster (84 ms versus 5.59 ms of wall time), and it takes fewer lines of code (and the code itself is more intuitive)!

Regular Python is much slower due to type checking and other overhead of needing to interpret code and support Python's abstractions.

For example, if we are doing some addition in a loop, constantly type checking in a loop will lead to many more instructions than just performing a regular addition operation. NumPy, using optimized pre-compiled C code, is able to avoid a lot of the overhead introduced.

The process we used above is **vectorization**. Vectorization refers to applying operations to arrays instead of just individual elements (i.e. no loops).

Why vectorize?
1. Much faster
2. Easier to read and fewer lines of code
3. More closely resembles mathematical notation

Vectorization is one of the main reasons why NumPy is so powerful.
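
To repeat the comparison outside of the `%%time` cells, here is a minimal sketch that times both versions with the standard-library `timeit` module. The helper names `loop_version` and `vectorized_version` are made up for this illustration, and the measured times will differ from machine to machine.

```python
import timeit

import numpy as np

N = 100_000


def loop_version(a, b):
    """Compute a[i] + 2 * b[i] with an explicit Python loop."""
    c = []
    for i in range(len(a)):
        c.append(a[i] + 2 * b[i])
    return c


def vectorized_version(a, b):
    """Compute the same quantity with a single array expression."""
    return a + 2 * b


a_list = list(range(N))
b_list = list(range(N))
a_arr = np.arange(N)
b_arr = np.arange(N)

# Both versions produce the same values.
assert np.array_equal(np.array(loop_version(a_list, b_list)),
                      vectorized_version(a_arr, b_arr))

# Time 5 repetitions of each version.
loop_time = timeit.timeit(lambda: loop_version(a_list, b_list), number=5)
vec_time = timeit.timeit(lambda: vectorized_version(a_arr, b_arr), number=5)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```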

## ndarray

ndarrays, n-dimensional arrays of a homogeneous data type, are the fundamental datatype used in NumPy. Because all elements share the same type and the array has a fixed size at creation, ndarrays offer less flexibility than Python lists, but they can be substantially more efficient in both runtime and memory. (Python lists are arrays of pointers to objects, adding a layer of indirection.)

The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

In [81]:
# Can initialize ndarrays with Python lists, for example:
a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5 2 3]"

b = np.array([[1, 2, 3],
              [4, 5, 6]])    # Create a rank 2 array
print(b.shape)                     # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"

c = np.array([
                [1,2],
                [3,4],
                [5,6]
])
print(c.shape)             # Prints "(3, 2)"

d = np.array([
                [1,2],
                [3,4],
                [5,6],
                [7,8]
])
print(d.shape)             # Prints "(4, 2)"

<class 'numpy.ndarray'>
(3,)
1 2 3
[5 2 3]
(2, 3)
1 2 4
(3, 2)
(4, 2)
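
As a small illustration of what "homogeneous data type" means in practice, the sketch below shows NumPy picking one dtype for the whole array. The variable names are chosen only for this example.

```python
import numpy as np

# Mixing ints and floats upcasts every element to a common floating-point dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# An integer array keeps its dtype; assigning a float truncates toward zero.
ints = np.array([1, 2, 3])
ints[0] = 2.9
print(ints)          # [2 2 3]

# A Python list, by contrast, can hold objects of different types.
py_list = [1, 2.5, "three"]
print([type(v).__name__ for v in py_list])  # ['int', 'float', 'str']
```

The silent truncation in the second case is a common source of bugs, so it is worth checking `dtype` whenever results look unexpectedly rounded.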


There are many other initializations that NumPy provides:

In [82]:
a = np.zeros((2, 2))   # Create an array of all zeros
print(a)               # Prints "[[ 0.  0.]
                       #          [ 0.  0.]]"

b = np.full((2, 2), 7)  # Create a constant array
print(b)                # Prints "[[7 7]
                        #          [7 7]]"

c = np.eye(2)         # Create a 2 x 2 identity matrix
print(c)              # Prints "[[ 1.  0.]
                      #          [ 0.  1.]]"

d = np.random.random((2, 2))  # Create an array filled with random values
print(d)                      # Might print "[[ 0.91940167  0.08143941]
                              #               [ 0.68744134  0.87236687]]"

[[0. 0.]
 [0. 0.]]
[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]
[[0.13316945 0.52341258]
 [0.75040986 0.66901324]]


How do we create a 2 by 2 matrix of ones?

In [83]:
a = np.ones((2, 2))    # Create an array of all ones
print(a)               # Prints "[[ 1.  1.]
                       #          [ 1.  1.]]"

[[1. 1.]
 [1. 1.]]


Keeping track of array shapes is useful: it helps with debugging, and knowing the dimensions will be very useful when computing gradients, among other things.

In [84]:
nums = np.arange(8)
print(nums)
print(nums.shape)

nums = nums.reshape((2, 4))
print('Reshaped:\n', nums)
print(nums.shape)

# The -1 in reshape corresponds to an unknown dimension that numpy will figure out,
# based on all other dimensions and the array size.
# Can only specify one unknown dimension.
# For example, sometimes we might have an unknown number of data points, and
# so we can use -1 instead without worrying about the true number.
nums = nums.reshape((4, -1))
print('Reshaped with -1:\n', nums)
print(nums.shape)

[0 1 2 3 4 5 6 7]
(8,)
Reshaped:
 [[0 1 2 3]
 [4 5 6 7]]
(2, 4)
Reshaped with -1:
 [[0 1]
 [2 3]
 [4 5]
 [6 7]]
(4, 2)


NumPy supports an object-oriented paradigm: ndarray has a number of methods and attributes, many of which mirror functions in the top-level NumPy namespace. For example, we can do both:

In [85]:
nums = np.arange(8)
print(nums.min())     # Prints 0
print(np.min(nums))   # Prints 0
print(nums.max())     # Prints 7
print(np.max(nums))   # Prints 7

0
0
7
7


## Array Operations/Math

NumPy supports many elementwise operations:

In [86]:
x = np.array([[1, 2],
              [3, 4]], dtype=np.float64)
y = np.array([[5, 6],
              [7, 8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

# Elementwise divide; both produce the array
# [[0.2        0.33333333]
#  [0.42857143 0.5       ]]
print(x / y)
print(np.divide(x,y))

# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]
[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]
[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[1.         1.41421356]
 [1.73205081 2.        ]]


How do we elementwise divide between two arrays?

In [87]:
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


Note that `*` is elementwise multiplication, not matrix multiplication. To compute inner products of vectors, multiply a vector by a matrix, or multiply matrices, we use the `dot` function instead. `dot` is available both as a function in the numpy module and as an instance method of array objects:



In [88]:
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

v = np.array([9, 10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))

# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))

219
219
[29 67]
[29 67]
[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
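
Python 3.5 and later also provide the `@` operator for matrix multiplication. For the 1-D and 2-D arrays used here it gives the same results as `dot`; the sketch below reuses the same values to show this.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

print(v @ w)   # Inner product of vectors; prints 219
print(x @ v)   # Matrix / vector product; prints [29 67]
print(x @ y)   # Matrix / matrix product; prints [[19 22]
               #                                  [43 50]]
```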


There are many useful functions built into NumPy, and often we're able to express them across specific axes of the ndarray:

In [89]:
x = np.array([[1, 2, 3], 
              [4, 5, 6]])

print(np.sum(x))          # Compute sum of all elements; prints "21"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[5 7 9]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[ 6 15]"

print(np.max(x, axis=1))  # Compute max of each row; prints "[3 6]" 

21
[5 7 9]
[ 6 15]
[3 6]


How can we compute the index of the max value of each row? This is useful, for example, for finding the class that corresponds to the maximum score for an input image.

In [90]:
x = np.array([[1, 2, 3], 
              [4, 5, 6]])

y = np.array([[1, 5, 3], 
              [4, 2, 6]])

print(np.argmax(x, axis=1)) # Compute index of max of each row; prints "[2 2]"
print(np.argmax(x, axis=0)) # Compute index of max of each column; prints "[1 1 1]"

print(np.argmax(y, axis=1)) # Compute index of max of each row; prints "[1 2]"
print(np.argmax(y, axis=0)) # Compute index of max of each column; prints "[1 0 1]"

[2 2]
[1 1 1]
[1 2]
[1 0 1]


Note that the axis you apply the operation over is removed from the shape of the result.
This is useful to keep in mind when you're trying to figure out which axis corresponds
to what.

For example:

In [91]:
x = np.array([[1, 2, 3], 
              [4, 5, 6]])

print(x.shape)               # Has shape (2, 3)
print(x.max(axis=0))         # Prints "[4 5 6]"
print((x.max(axis=0)).shape) # Taking the max over axis 0 has shape (3,)
                             # corresponding to the 3 columns.

# An array with rank 3
x = np.array([
              [
                [1, 2, 3], 
                [4, 5, 6]
              ],
              [
                [10, 23, 33], 
                [43, 52, 16]
              ]
             ])

print(x)
print(x.shape)               # Has shape (2, 2, 3)
print((x.max(axis=1)).shape) # Taking the max over axis 1 has shape (2, 3)

print((x.max(axis=(1, 2))))       # Can take max over multiple axes; prints [6 52]
print((x.max(axis=(1, 2))).shape) # Taking the max over axes 1, 2 has shape (2,)

print(x.max(axis=0))        # Prints [[10 23 33]
                            #         [43 52 16]]
print(x.max(axis=1))        # Prints [[ 4  5  6]
                            #         [43 52 33]]
print(x.max(axis=2))        # Prints [[ 3  6]
                            #         [33 52]]

(2, 3)
[4 5 6]
(3,)
[[[ 1  2  3]
  [ 4  5  6]]

 [[10 23 33]
  [43 52 16]]]
(2, 2, 3)
(2, 3)
[ 6 52]
(2,)
[[10 23 33]
 [43 52 16]]
[[ 4  5  6]
 [43 52 33]]
[[ 3  6]
 [33 52]]
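
If you would rather keep the reduced axis as a dimension of size 1, reduction functions such as `max`, `sum`, and `mean` accept `keepdims=True`. A short sketch of the difference in shapes:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

print(x.max(axis=1).shape)                 # Axis 1 is removed; prints (2,)
print(x.max(axis=1, keepdims=True).shape)  # Axis 1 is kept with size 1; prints (2, 1)
print(x.max(axis=1, keepdims=True))        # Prints [[3]
                                           #         [6]]
```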


## Indexing

NumPy also provides powerful indexing schemes.

In [92]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
print('Original:\n', a)

# Can select an element as you would in a 2 dimensional Python list
print('Element (0, 0) (a[0][0]):\n', a[0][0])   # Prints 1
# or as follows
print('Element (0, 0) (a[0, 0]) :\n', a[0, 0])  # Prints 1

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
print('Sliced (a[:2, 1:3]):\n', b)

c = a[:1, 2:5]
print('Sliced (a[:1, 2:5]):\n', c)

# Steps are also supported in indexing. The following reverses the first row:
print('Reversing the first row (a[0, ::-1]) :\n', a[0, ::-1]) # Prints [4 3 2 1]

print('Reversing the first two rows (a[:2, ::-1]) :\n', a[:2, ::-1]) # Prints [[4 3 2 1]
                                                                     #        [8 7 6 5]]

Original:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Element (0, 0) (a[0][0]):
 1
Element (0, 0) (a[0, 0]) :
 1
Sliced (a[:2, 1:3]):
 [[2 3]
 [6 7]]
Sliced (a[:1, 2:5]):
 [[3 4]]
Reversing the first row (a[0, ::-1]) :
 [4 3 2 1]
Reversing the first two rows (a[:2, ::-1]) :
 [[4 3 2 1]
 [8 7 6 5]]


Often, it's useful to select or modify one element from each row of a matrix. The following example employs **fancy indexing**, where we index into our array using an array of indices (say an array of integers or booleans):

In [93]:
# Create a new array from which we will select elements
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

print(a)  # Prints "[[ 1  2  3]
          #          [ 4  5  6]
          #          [ 7  8  9]
          #          [10 11 12]]"

# Create an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])  # Prints "[ 1  6  7 11]"

# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10

print(a)  # Prints "[[11  2  3]
          #          [ 4  5 16]
          #          [17  8  9]
          #          [10 21 12]]"


[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[ 1  6  7 11]
[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]


We can also use boolean indexing/masks. Suppose we want to set all elements greater than MAX to MAX:

In [94]:
MAX = 5
nums = np.array([1, 4, 10, -1, 15, 0, 5])
print(nums > MAX)            # Prints [False, False, True, False, True, False, False]

nums[nums > MAX] = MAX
print(nums)                  # Prints [1, 4, 5, -1, 5, 0, 5]

[False False  True False  True False False]
[ 1  4  5 -1  5  0  5]
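
The mask assignment above modifies `nums` in place. When a new array is preferred, the same cap can be written with `np.minimum` or `np.clip`; the sketch below is one possible alternative, not the only way to do it.

```python
import numpy as np

MAX = 5
nums = np.array([1, 4, 10, -1, 15, 0, 5])

capped = np.minimum(nums, MAX)       # Elementwise minimum against a scalar
clipped = np.clip(nums, None, MAX)   # Same cap; None means no lower bound

print(capped)   # Prints [ 1  4  5 -1  5  0  5]
print(clipped)  # Prints [ 1  4  5 -1  5  0  5]
print(nums)     # The original is unchanged; prints [ 1  4 10 -1 15  0  5]
```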


Finally, note that the indices in fancy indexing can appear in any order and even multiple times:

In [95]:
nums = np.array([1, 4, 10, -1, 15, 0, 5])
print(nums[[1, 2, 3, 1, 0]])  # Prints [4 10 -1 4 1]

[ 4 10 -1  4  1]


## Broadcasting

Many of the operations we've looked at above involved arrays of the same rank.  
However, many times we might have a smaller array and use that multiple times to update an array of a larger rank.  
For example, consider the example below, which subtracts the mean of each column from the elements of that column:

In [96]:
x = np.array([[1, 2, 3],
              [3, 5, 7]])
print(x.shape)  # Prints (2, 3)

col_means = x.mean(axis=0)
print(col_means)          # Prints [2. 3.5 5.]
print(col_means.shape)    # Prints (3,)
                          # Has a smaller rank than x!

mean_shifted = x - col_means
print('\n', mean_shifted)
print(mean_shifted.shape)  # Prints (2, 3)

(2, 3)
[2.  3.5 5. ]
(3,)

 [[-1.  -1.5 -2. ]
 [ 1.   1.5  2. ]]
(2, 3)


Or even just multiplying a matrix by 2:

In [97]:
x = np.array([[1, 2, 3],
              [3, 5, 7]])
print(x * 2) # Prints [[ 2  4  6]
             #         [ 6 10 14]]


[[ 2  4  6]
 [ 6 10 14]]


Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.

For example, when subtracting the columns above, we had arrays of shape (2, 3) and (3,).

1. These arrays do not have the same rank, so we prepend a 1 to the shape of the lower rank one to make it (1, 3).
2. (2, 3) and (1, 3) are compatible: in dimension 0 one of the arrays has size 1, and in dimension 1 both have size 3.
3. The arrays are compatible in all dimensions, so they can be broadcast together!
4. After broadcasting, each array behaves as if it had shape equal to (2, 3).
5. The smaller array will behave as if it were copied along dimension 0.
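
The rules above can be sketched as a small pure-Python helper that computes the broadcast shape of two shapes. The function name `broadcast_shape` is hypothetical and exists only for this illustration; NumPy performs this logic internally.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or raise ValueError."""
    # Rule 1: prepend 1s to the shorter shape until the lengths match.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        # Rules 2 and 3: sizes must be equal, or one of them must be 1.
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
        # Rule 4: where one size is 1, the output takes the other size.
        result.append(db if da == 1 else da)
    return tuple(result)


print(broadcast_shape((2, 3), (3,)))   # Prints (2, 3)
print(broadcast_shape((3, 1), (2,)))   # Prints (3, 2)

# Cross-check the first case against what NumPy actually produces.
assert broadcast_shape((2, 3), (3,)) == (np.ones((2, 3)) + np.ones(3)).shape
```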

Let's try to subtract the mean of each row!

In [98]:
x = np.array([[1, 2, 3],
              [3, 5, 7]])

row_means = x.mean(axis=1)
print(row_means)  # Prints [2. 5.]

# mean_shifted = x - row_means

[2. 5.]


To figure out what's wrong, we print some shapes:

In [99]:
x = np.array([[1, 2, 3],
              [3, 5, 7]])
print(x.shape)  # Prints (2, 3)

row_means = x.mean(axis=1)
print(row_means)        # Prints [2. 5.]
print(row_means.shape)  # Prints (2,)

# Results in the following error: ValueError: operands could not be broadcast together with shapes (2,3) (2,) 
# mean_shifted = x - row_means

(2, 3)
[2. 5.]
(2,)


What happened?

Answer: If we follow broadcasting rule 1, then we'd prepend a 1 to the shape of the smaller rank array to get (1, 2). However, the last dimensions of (2, 3) and (1, 2) don't match (3 versus 2, and neither is 1), and so we can't broadcast.

Take 2, reshaping the row means to get the desired behavior:

In [100]:
x = np.array([[1, 2, 3],
              [3, 5, 7]])
print(x.shape)  # Prints (2, 3)

row_means = x.mean(axis=1).reshape((-1, 1))
print(row_means)        # Prints [[2.], [5.]]
print(row_means.shape)  # Prints (2, 1)

mean_shifted = x - row_means
print(mean_shifted)
print(mean_shifted.shape)  # Prints (2, 3)

(2, 3)
[[2.]
 [5.]]
(2, 1)
[[-1.  0.  1.]
 [-2.  0.  2.]]
(2, 3)
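
Another common way to turn the row means into a column, shown here as a sketch, is to index with `np.newaxis`, which inserts a dimension of size 1 at that position.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [3, 5, 7]])

row_means = x.mean(axis=1)           # Shape (2,)
column = row_means[:, np.newaxis]    # Shape (2, 1)
print(column.shape)                  # Prints (2, 1)

mean_shifted = x - column            # Broadcasts (2, 3) against (2, 1)
print(mean_shifted)                  # Prints [[-1.  0.  1.]
                                     #         [-2.  0.  2.]]
```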


More broadcasting examples!

In [101]:
# Compute outer product of vectors
v = np.array([1, 2, 3])  # v has shape (3,)
w = np.array([4, 5])    # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
#print(np.reshape(v, (3, 1)))
print(np.reshape(v, (3, 1)) * w)

# Add a vector to each row of a matrix
x = np.array([[1, 2, 3], [4, 5, 6]])
# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),
# giving the following matrix:
# [[2 4 6]
#  [5 7 9]]
# print("start")
# print(x)
# print(v)
print(x + v)
# print("end")

# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:
# [[ 5  6  7]
#  [ 9 10 11]]
# print(x.T)
print((x.T + w).T)
# Another solution is to reshape w to be a column vector of shape (2, 1);
# we can then broadcast it directly against x to produce the same
# output.
print(x + np.reshape(w, (2, 1)))

[[ 4  5]
 [ 8 10]
 [12 15]]
[[2 4 6]
 [5 7 9]]
[[ 5  6  7]
 [ 9 10 11]]
[[ 5  6  7]
 [ 9 10 11]]


## Views vs. Copies

Unlike a copy, in a **view** of an array, the data is shared between the view and the array. Sometimes, our results are copies of arrays, but other times they can be views. Understanding when each is generated is important to avoid any unforeseen issues.

Views can be created by slicing an array, by reinterpreting the same data area with a different dtype (using `arr.view(dtype)`, not `arr.astype(dtype)`, which returns a copy by default), or by doing both.

In [102]:
x = np.arange(5)
print('Original:\n', x)  # Prints [0 1 2 3 4]

# Modifying the view will modify the array
view = x[1:3]
view[1] = -1
print('Array After Modified View:\n', x)  # Prints [0 1 -1 3 4]

Original:
 [0 1 2 3 4]
Array After Modified View:
 [ 0  1 -1  3  4]


In [103]:
x = np.arange(5)
view = x[1:3]
view[1] = -1

# Modifying the array will modify the view
print('View Before Array Modification:\n', view)  # Prints [1 -1]
x[2] = 10
print('Array After Modifications:\n', x)          # Prints [0 1 10 3 4]
print('View After Array Modification:\n', view)   # Prints [1 10]

View Before Array Modification:
 [ 1 -1]
Array After Modifications:
 [ 0  1 10  3  4]
View After Array Modification:
 [ 1 10]


However, if we use fancy indexing, the result will actually be a copy and not a view:

In [104]:
x = np.arange(5)
print('Original:\n', x)  # Prints [0 1 2 3 4]

# Modifying the result of the selection due to fancy indexing
# will not modify the original array.
copy = x[[1, 2]]
copy[1] = -1
print('Copy:\n', copy) # Prints [1 -1]
print('Array After Modified Copy:\n', x)  # Prints [0 1 2 3 4]

Original:
 [0 1 2 3 4]
Copy:
 [ 1 -1]
Array After Modified Copy:
 [0 1 2 3 4]


In [105]:
# Another example: indexing with a boolean mask also returns a copy
x = np.arange(5)
print('Original:\n', x)  # Prints [0 1 2 3 4]

copy = x[x >= 2]
print('Copy:\n', copy) # Prints [2 3 4]
x[3] = 10
print('Modified Array:\n', x)  # Prints [0 1 2 10 4]
print('Copy After Modified Array:\n', copy)  # Prints [2 3 4]

Original:
 [0 1 2 3 4]
Copy:
 [2 3 4]
Modified Array:
 [ 0  1  2 10  4]
Copy After Modified Array:
 [2 3 4]
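
When it is unclear whether a result is a view or a copy, `np.shares_memory` and the `base` attribute can help. The sketch below checks the three kinds of indexing used in this section, plus an explicit `.copy()`.

```python
import numpy as np

x = np.arange(5)

sliced = x[1:3]             # Basic slicing
fancy = x[[1, 2]]           # Integer-array (fancy) indexing
masked = x[x >= 2]          # Boolean-mask indexing
explicit = x[1:3].copy()    # Slice followed by an explicit copy

print(np.shares_memory(x, sliced))    # Prints True: a view
print(np.shares_memory(x, fancy))     # Prints False: a copy
print(np.shares_memory(x, masked))    # Prints False: a copy
print(np.shares_memory(x, explicit))  # Prints False: a copy
print(sliced.base is x)               # Prints True: the view refers back to x
```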


## Summary

1. NumPy is an incredibly powerful library for computation providing both massive efficiency gains and convenience.
2. Vectorize! Orders of magnitude faster.
3. Keeping track of the shape of your arrays is often useful.
4. Many useful math functions and operations built into NumPy.
5. Select and manipulate arbitrary pieces of data with powerful indexing schemes.
6. Broadcasting allows for computation across arrays of different shapes.
7. Watch out for views vs. copies.

# Section 2. Numpy Exercises

In this section, you need to complete **5** questions using NumPy. You are encouraged to consult the [NumPy documentation](https://docs.scipy.org/doc/numpy/) for more functions that might be helpful for the exercises. 
<span style="color:blue"><b>Note that you are NOT ALLOWED to use for-loops in these exercises.</b></span>

#### 1. Swap row 0 and row 3 of array $X$ *in one line of code*, and print out the result.

In [106]:
# Swap this array X
X = np.arange(25).reshape(5, 5)
# print(X)

# TODO: your code here
X[[0,3]] = X[[3,0]]   # Swapping row 0 and row 3
print(X)


[[15 16 17 18 19]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [ 0  1  2  3  4]
 [20 21 22 23 24]]


#### 2. Create a $3\times5$ array $X$ with random values. Print out $X$, as well as the largest value of each column.

In [107]:
# fix the random seed so that we can reproduce the same result.
np.random.seed(5)

# TODO: your code here

X = np.random.random((3,5))  # Creating 3x5 array with random value
print(X)   # Print X
print(np.max(X, axis=0)) # Print the largest value of each column

[[0.22199317 0.87073231 0.20671916 0.91861091 0.48841119]
 [0.61174386 0.76590786 0.51841799 0.2968005  0.18772123]
 [0.08074127 0.7384403  0.44130922 0.15830987 0.87993703]]
[0.61174386 0.87073231 0.51841799 0.91861091 0.87993703]


#### 3. Compute the Mean Squared Error (MSE): $MSE=\frac{1}{n}\sum^n_{i=1}(X_i-Y_i)^2$, where $X_i$ and $Y_i$ denote the $i$-th elements of $X$ and $Y$, and $n$ is the number of elements in $X$. Remember to print out the result.

In [108]:
X = np.array([0.7, 0.5, 0.2, 0.8, 0.9])
Y = np.array([0.2, 0.4, 0.6, 0.4, 1.0])

# TODO: your code here
MSE = (np.square(X-Y)).mean(axis=0)   # Computing Mean Square Error (MSE)
print(MSE)  # Print MSE

0.118


#### 4. Create a $5\times5$ random matrix $Z$, and normalize the matrix so that it has zero mean and unit variance. Print out the normalized mean and variance to verify your result.

Note: Due to the nature of floating-point arithmetic, your mean and variance won't be exactly zero and one after normalization. If you want, you can use `np.isclose` to check if they are close enough to your desired result.

In [109]:
# fix the random seed so that we can reproduce the same result.
np.random.seed(7)

# TODO: your code here
Z = np.random.random((5,5))    # Creating 5x5 Z Matrix
# print(Z)
# Normalize to zero mean unit variance
Z = (Z - Z.mean())/Z.std()   # Normalizing Z Matrix
# print(Z)
print(Z.mean())   # Print Normalized Mean
# print(np.isclose(Z.mean(),0))  # Checking Zero Mean with np.isclose   # Prints "True"
print(Z.var())   # Print Variance 
# print(np.isclose(Z.var(), 1))     # Checking Unit Variance with np.isclose  # Prints "True"


3.8191672047105384e-16
1.0


#### 5. Given the array $X$, for each entry, set the value to 0 if the original value is smaller than 10, and set the value to 1 otherwise. Remember to print out your result.

In [110]:
X = np.array([
 [20, 24, 31, 31, 28],
 [33, 13, 38,  0, 23],
 [23, 28, 28,  0,  7],
 [26, 10, 17, 19, 38],
 [37, 25,  5, 19, 15]])

# TODO: your code here
X[X < 10] = 0   # Set value to 0
X[X >= 10] = 1   # Set value to 1
print(X)  # Print X

[[1 1 1 1 1]
 [1 1 1 0 1]
 [1 1 1 0 0]
 [1 1 1 1 1]
 [1 1 0 1 1]]
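
The two in-place assignments above depend on their order: if the `>= 10` step ran first, the resulting 1s would then be caught by the `< 10` step and set to 0. As a sketch of an order-independent alternative, the threshold can be applied in a single expression that leaves `X` untouched.

```python
import numpy as np

X = np.array([
    [20, 24, 31, 31, 28],
    [33, 13, 38,  0, 23],
    [23, 28, 28,  0,  7],
    [26, 10, 17, 19, 38],
    [37, 25,  5, 19, 15]])

binary = np.where(X < 10, 0, 1)   # 0 where X < 10, 1 elsewhere
same = (X >= 10).astype(int)      # Boolean mask cast to integers

print(binary)
print(np.array_equal(binary, same))  # Prints True
```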


# Section 3. Calculus and Programming

Machine learning is not just about programming. You need to harness the power of Calculus, Linear Algebra and Probability to learn machine learning. In this section, you need to complete one math question, and write a short code snippet for it.

#### 1. Math question
Given $\sigma(x)=\frac{1}{1+e^{-x}}=\frac{e^x}{1+e^{x}}$, compute its first-order derivative $\frac{d}{dx}\sigma(x)$. Create a Markdown cell, and derive $\frac{d}{dx}\sigma(x)$ there.

Check this [tutorial](https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference) to learn how to write down equations in MathJax.


\begin{align} \frac{d}{dx}\sigma(x) & = \frac{d}{dx}\bigg[\frac{e^x}{1+e^x}\bigg]\\
& = \frac{1}{1+e^x}\frac{d}{dx}(e^x) + e^x\frac{d}{dx}\bigg(\frac{1}{1+e^x}\bigg)\\
& = \frac{e^x}{1+e^x} - \frac{e^{2x}}{(1+e^x)^2}\\
& = \frac{e^x(1+e^x) - e^{2x}}{(1+e^x)^2}\\
\therefore \frac{d}{dx}\sigma(x) & = \frac{e^x}{(1+e^x)^2} \end{align}
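
As an optional extra step, the result can be rewritten in terms of $\sigma$ itself. Using $1-\sigma(x)=\frac{1}{1+e^x}$, which follows from the definition of $\sigma(x)$ given in the question:

\begin{align} \frac{d}{dx}\sigma(x) & = \frac{e^x}{1+e^x}\cdot\frac{1}{1+e^x}\\
& = \sigma(x)\big(1-\sigma(x)\big) \end{align}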


#### 2. Implement your $\frac{d}{dx}\sigma(x)$ as a function `dx_sigma`, and print out your result by calling `dx_sigma(vector)`.

In [111]:
vector = np.array([0.3, 0.6, -0.5, 0.4, -0.8])

def dx_sigma(x):
    # TODO: your code here
    numerator = np.exp(x)
    denominator = (1+np.exp(x))**2
    res = numerator / denominator
    return res

print(dx_sigma(vector))

[0.24445831 0.22878424 0.23500371 0.24026075 0.2139097 ]
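
One way to sanity-check a derivative like this is to compare it against a central finite-difference approximation. The sketch below does that and also evaluates the equivalent form $\sigma(x)(1-\sigma(x))$. The helper names `sigma`, `dx_sigma_alt`, and `dx_sigma_numeric` and the step size `h` are assumptions made for this illustration, not part of the assignment.

```python
import numpy as np


def sigma(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def dx_sigma_alt(x):
    """Derivative written as sigma(x) * (1 - sigma(x))."""
    s = sigma(x)
    return s * (1.0 - s)


def dx_sigma_numeric(x, h=1e-5):
    """Central finite-difference approximation of the derivative."""
    return (sigma(x + h) - sigma(x - h)) / (2.0 * h)


vector = np.array([0.3, 0.6, -0.5, 0.4, -0.8])

# Closed form derived above: e^x / (1 + e^x)^2
closed_form = np.exp(vector) / (1.0 + np.exp(vector)) ** 2

print(np.allclose(closed_form, dx_sigma_alt(vector)))      # Prints True
print(np.allclose(closed_form, dx_sigma_numeric(vector)))  # Prints True
```

For very large positive inputs, `np.exp(x)` overflows and the closed form evaluates to `inf / inf`, so the $\sigma(x)(1-\sigma(x))$ form may be the safer choice there.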
