The sigmoid function can be implemented with either math.exp() or np.exp(). The cell below uses np.exp(); the rest of
this section explains why np.exp() is preferable to math.exp().

Exercise: Build a function that returns the sigmoid of a real number x, using np.exp(x) for the exponential function.
Reminder: $ \text{sigmoid}(x) = \frac{1}{1+e^{-x}} $, sometimes also known as the logistic function, is a non-linear
function used not only in Machine Learning (Logistic Regression) but also in Deep Learning.

In [1]:
import numpy as np

def sigmoid(x):
    """Sigmoid of x. x can be a scalar or a numpy array (np.exp is applied element-wise)."""
    s = 1 / (1 + np.exp(-x))
    return s

In [8]:
sigmoid(3)

0.9525741268224334
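For comparison, here is a minimal sketch of the same function written with math.exp(). The name `basic_sigmoid` is hypothetical. It handles a single real number, but raises a TypeError when given a numpy array, because math.exp() only accepts scalars.

```python
import math

import numpy as np


def basic_sigmoid(x):
    """Hypothetical sigmoid of a real number x, using math.exp (scalars only)."""
    return 1 / (1 + math.exp(-x))


print(basic_sigmoid(3))  # approximately 0.9526, like sigmoid(3)

try:
    basic_sigmoid(np.array([1, 2, 3]))
except TypeError as err:
    # math.exp cannot convert a multi-element array to a single float
    print("TypeError:", err)
```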

Actually, we rarely use the "math" library in deep learning because its functions only accept real numbers (scalars),
whereas in deep learning we mostly use matrices and vectors. This is why numpy is more useful.

In fact, if $ x=(x_1,x_2,...,x_n) $ is a row vector then `np.exp(x)` will apply the exponential function to every
element of $ x $. The output will thus be: $ np.exp(x)=(e^{x_1}, e^{x_2}, \dots, e^{x_n}) $
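A quick way to see the element-wise behaviour is to compare np.exp(x) with a Python loop that calls math.exp() on each entry. The two should agree up to floating-point rounding.

```python
import math

import numpy as np

x = np.array([1.0, 2.0, 3.0])

vectorized = np.exp(x)                          # one call, applied to every element
looped = np.array([math.exp(v) for v in x])     # one math.exp call per element

print(vectorized)
print(np.allclose(vectorized, looped))          # True
```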

In [10]:
arr = np.array([1, 2, 3])
print(sigmoid(arr))

[0.73105858 0.88079708 0.95257413]


In [11]:
def sigmoid_derivative(x):
    # Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x,
    # using sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)); sigmoid(x) is computed only once
    s = sigmoid(x)
    ds = s * (1 - s)
    return ds

In [13]:
arr = np.array([1, 2, 3])
print ("sigmoid_derivative(x) = " + str(sigmoid_derivative(arr)))

sigmoid_derivative(x) = [0.19661193 0.10499359 0.04517666]
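The formula used in `sigmoid_derivative` follows from the chain rule: since $ \sigma(x) = (1 + e^{-x})^{-1} $, its derivative is $ \frac{e^{-x}}{(1+e^{-x})^2} = \sigma(x)(1-\sigma(x)) $. As a sanity check, a central finite difference should closely match the analytic gradient. The step size `h` below is an arbitrary small value chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)


x = np.array([1.0, 2.0, 3.0])
h = 1e-5  # finite-difference step (arbitrary small value)

# Central difference: (f(x + h) - f(x - h)) / (2h) approximates f'(x)
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid_derivative(x)

print(np.max(np.abs(numeric - analytic)))  # should be very small (well below 1e-8)
```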


Two common numpy functions used in deep learning are `np.shape` and `np.reshape()`. `X.shape` returns the shape
(dimensions) of a matrix/vector X, and `X.reshape(...)` rearranges X into some other dimensions.
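A small illustration of both, using a hypothetical array `X` with 6 elements. Reshaping only rearranges the same elements, so the new shape must hold the same total number of elements.

```python
import numpy as np

X = np.arange(6)          # [0 1 2 3 4 5], shape (6,)
print(X.shape)            # (6,)

Y = X.reshape(2, 3)       # same 6 elements laid out as 2 rows x 3 columns
print(Y.shape)            # (2, 3)
print(Y)                  # [[0 1 2]
                          #  [3 4 5]]

print(np.shape(Y))        # the function form gives the same result: (2, 3)

# X.reshape(4, 2) would raise a ValueError: 6 elements cannot fill a 4 x 2 array
```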

In [14]:
"""
Implement image2vector() that takes an input of shape (length, height, 3) and returns a vector of shape
(length*height*3, 1). For example, if you would like to reshape an array v of shape (a, b, c) into a vector of
shape (a*b, c) you would do:

    v = v.reshape((v.shape[0] * v.shape[1], v.shape[2]))  # v.shape[0] = a ; v.shape[1] = b ; v.shape[2] = c
 """
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    v = image.reshape(image.shape[0]*image.shape[1] * image.shape[2], 1)
    return v

In [15]:
arr = np.array([[[ 0.67826139,  0.29380381],
        [ 0.90714982,  0.52835647],
        [ 0.4215251 ,  0.45017551]],

       [[ 0.92814219,  0.96677647],
        [ 0.85304703,  0.52351845],
        [ 0.19981397,  0.27417313]],

       [[ 0.60659855,  0.00533165],
        [ 0.10820313,  0.49978937],
        [ 0.34144279,  0.94630077]]])

print ("image2vector(image) = " + str(image2vector(arr)))

image2vector(image) = [[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]
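Two properties of this reshape are worth checking: numpy can infer one dimension when it is passed as -1, and reshaping the column vector back to the original shape recovers the image exactly. The random `image` below is a stand-in with an arbitrary shape.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((3, 3, 2))           # stand-in image of shape (length, height, depth)

v = image.reshape(-1, 1)                # -1 lets numpy infer length*height*depth
print(v.shape)                          # (18, 1)

restored = v.reshape(image.shape)       # undo the flattening
print(np.array_equal(restored, image))  # True
```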


### Normalising vectors
Another common technique in Machine Learning and Deep Learning is to normalize the data. It often leads to better
performance because gradient descent converges faster after normalization. Here, normalization means changing $ x $ to
$ \frac{x}{\|x\|} $, i.e. dividing each row vector of $ x $ by its norm. For example, the row $ (0, 3, 4) $ has norm 5
and becomes $ (0, 0.6, 0.8) $.

In [6]:
"""
Implement normalize_rows() to normalize the rows of a matrix. After applying this function to an input matrix x, each
row of x should be a vector of unit length (meaning length 1).
"""
def normalize_rows(x):
    """
    Argument:
    x -- A numpy matrix of shape (n, m)
    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    x_norm = np.linalg.norm(x, ord=2, axis=1, keepdims=True)  # row norms, shape (n, 1)
    x = x / x_norm
    return x

In [19]:
arr = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalize_rows(arr)))

normalizeRows(x) = [[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]
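A quick check is that every row of the result has length 1. Note that a row of all zeros has norm 0, so dividing by it yields `nan` (with a RuntimeWarning). The hypothetical `normalize_rows_safe` below, which leaves such rows unchanged, is one possible way to handle that case; it is not part of the exercise.

```python
import numpy as np


def normalize_rows_safe(x):
    """Hypothetical variant of normalize_rows that leaves all-zero rows untouched."""
    x_norm = np.linalg.norm(x, ord=2, axis=1, keepdims=True)   # shape (n, 1)
    x_norm = np.where(x_norm == 0, 1, x_norm)                  # avoid dividing by zero
    return x / x_norm


arr = np.array([[0, 3, 4],
                [1, 6, 4],
                [0, 0, 0]])
out = normalize_rows_safe(arr)

print(np.linalg.norm(out, axis=1))   # approximately [1. 1. 0.]
```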


### Broadcasting and the softmax function

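For a matrix $ x $ of shape $ (m, n) $, softmax exponentiates each entry and divides it by the sum of the exponentials
in its row, so every row of the result is positive and sums to 1:

$ \text{softmax}(x) = \begin{bmatrix} \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
\vdots & \ddots & \vdots \\ \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} $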

In [4]:
def softmax(x):
    """
    Calculates the softmax for each row of the input x. Your code should work for a row vector and also for matrices
    of shape (m,n).
    Argument:
    x -- A numpy matrix of shape (m,n)
    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (m,n)
    """
    x_exp = np.exp(x)                                 # element-wise exponential, shape (m,n)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)      # row sums, shape (m,1)
    s = x_exp / x_sum                                 # broadcasting divides each row by its own sum
    return s

In [5]:
arr = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0 ,0]])
print("softmax(x) = " + str(softmax(arr)))

softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
  1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
  8.01252314e-04]]
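One practical caveat: `np.exp` overflows to `inf` for large inputs (for example `np.exp(1000)`), which turns the softmax into `nan`. A common remedy, not used in the exercise, is to subtract each row's maximum before exponentiating. This leaves the result mathematically unchanged because the factor $ e^{-\max_j x_{ij}} $ cancels between numerator and denominator. A minimal sketch, with the hypothetical name `stable_softmax`:

```python
import numpy as np


def stable_softmax(x):
    """Row-wise softmax that subtracts each row's max to avoid overflow in np.exp."""
    shifted = x - np.max(x, axis=1, keepdims=True)   # largest entry in each row becomes 0
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)


arr = np.array([[9, 2, 5, 0, 0],
                [7, 5, 0, 0, 0]])
big = np.array([[1000.0, 1001.0, 1002.0]])

print(stable_softmax(arr))   # should match the softmax(x) output above
print(stable_softmax(big))   # finite probabilities that sum to 1
```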


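If you print the shapes of `x_exp = np.exp(x)`, `x_sum = np.sum(x_exp, axis=1, keepdims=True)` and `s` inside `softmax`
and rerun the softmax example, you will see that `x_sum` has shape `(2, 1)` while `x_exp` and `s` have shape `(2, 5)`.
`x_exp / x_sum` works thanks to numpy broadcasting: the single column of `x_sum` is stretched across the 5 columns of
`x_exp`.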
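The broadcasting step can be seen in isolation: dividing an array of shape (2, 5) by one of shape (2, 1) repeats the single column across all 5 columns, so each row is divided by its own sum.

```python
import numpy as np

x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])

x_exp = np.exp(x)                                # shape (2, 5)
x_sum = np.sum(x_exp, axis=1, keepdims=True)     # shape (2, 1); keepdims keeps the column axis
s = x_exp / x_sum                                # (2, 5) / (2, 1) broadcasts to (2, 5)

print(x_exp.shape, x_sum.shape, s.shape)         # (2, 5) (2, 1) (2, 5)
print(np.sum(s, axis=1))                         # each row sums to 1 (up to rounding)

# Without keepdims the sum has shape (2,), and (2, 5) / (2,) raises a ValueError
# because numpy aligns trailing dimensions: 5 and 2 do not match.
```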

### Vectorisation
In deep learning, you deal with very large datasets. Hence, a computationally inefficient function can become a huge
bottleneck in your algorithm and can result in a model that takes a very long time to run. To make sure that your code
is computationally efficient, you will use vectorization. For example, compare the following loop-based and vectorized
implementations of the dot, outer, elementwise and general dot products.

In [None]:
import time

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot+= x1[i]*x2[i]
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1),len(x2))) # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i]*x2[j]
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i]*x2[i]
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j]*x1[j]
toc = time.process_time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

In [None]:
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
gdot = np.dot(W,x1)
toc = time.process_time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
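With only 15 elements, both versions above finish in a fraction of a millisecond, so those timings are mostly noise. A fairer comparison uses much longer vectors. The sketch below assumes 1,000,000 random elements (an arbitrary size) and uses `time.perf_counter` for wall-clock timing. Exact numbers depend on the machine, but the vectorized `np.dot` is typically much faster than the Python loop.

```python
import time

import numpy as np

n = 1_000_000                             # arbitrary size, large enough for timing to be meaningful
rng = np.random.default_rng(0)
a = rng.random(n)
b = rng.random(n)
a_list, b_list = a.tolist(), b.tolist()   # plain Python lists for the loop version

# Loop-based dot product
tic = time.perf_counter()
loop_dot = 0.0
for i in range(n):
    loop_dot += a_list[i] * b_list[i]
loop_ms = 1000 * (time.perf_counter() - tic)

# Vectorized dot product
tic = time.perf_counter()
vec_dot = np.dot(a, b)
vec_ms = 1000 * (time.perf_counter() - tic)

print(f"loop: {loop_ms:.1f} ms, np.dot: {vec_ms:.3f} ms")
print("same result:", np.isclose(loop_dot, vec_dot))
```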

In [11]:
def l1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    Returns:
    loss -- the value of the L1 loss, i.e. the sum of |y - yhat| over all m entries
    """
    loss = np.sum(np.abs(y - yhat))
    return loss

In [12]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(l1(yhat,y)))

L1 = 1.1
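For reference, the L1 loss computed by `l1` is the sum of absolute differences. Plugging in the example values reproduces the printed result:

$$
\begin{aligned}
L_1(\hat{y}, y) &= \sum_{i=0}^{m-1} \left| y^{(i)} - \hat{y}^{(i)} \right| \\
&= |1 - 0.9| + |0 - 0.2| + |0 - 0.1| + |1 - 0.4| + |1 - 0.9| \\
&= 0.1 + 0.2 + 0.1 + 0.6 + 0.1 = 1.1
\end{aligned}
$$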


In [13]:
def l2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    Returns:
    loss -- the value of the L2 loss, i.e. the sum of (y - yhat) ** 2 over all m entries
    """
    loss = np.sum((y - yhat) ** 2)
    return loss

In [14]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(l2(yhat,y)))

L2 = 0.43
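Since $ \sum_j x_j^2 $ is the dot product of $ x $ with itself, the L2 loss can equivalently be written with `np.dot`, another example of replacing an explicit sum with a vectorized call:

```python
import numpy as np

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

diff = y - yhat
loss_dot = np.dot(diff, diff)          # sum of squared differences via a dot product
loss_sum = np.sum(diff ** 2)           # the form used in l2 above

print(loss_dot, np.isclose(loss_dot, loss_sum))
```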
