# <font color="darkblue">Python Basics with Numpy</font>

**Exercise**: Set `test` to `"Hello, World!"`.

In [1]:
test = "Hello, World!"

In [2]:
print ("test: " + test)

test: Hello, World!


## <font color="darkblue">Building basic functions with Numpy</font>

Numpy is the main package for scientific computing in Python. 

### Sigmoid function

Before using `np.exp()`, you will use `math.exp()` to see why `np.exp()` is preferable.

**Exercise**: Build a function that returns the sigmoid of a real number `x`. Use `math.exp(x)` for the exponential function.

**Reminder**:
$sigmoid(x) = \frac{1}{1+e^{-x}}$ is the logistic function.

In [3]:
# basic_sigmoid

import math

def basic_sigmoid(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar

    Return:
    s -- sigmoid(x)
    """
    
    s = 1 / (1 + math.exp(-x))
    return s

In [4]:
basic_sigmoid(3)

0.9525741268224334

#### Expected Output 
`basic_sigmoid(3) = 0.9525741268224334`

We rarely use the `math` library in deep learning because its functions accept only real numbers (scalars), whereas deep learning mostly works with matrices and vectors. This is why `numpy` is more useful.

In [5]:
### One reason why we use "numpy" instead of "math" in Deep Learning
# x = [1, 2, 3]
# basic_sigmoid(x) 
# Uncommenting and running the two lines above raises a TypeError, because x is a list and not a scalar.
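The failure described in the cell above can be reproduced safely by catching the exception. This is a minimal sketch that redefines `basic_sigmoid` so it is self-contained; the flag names `list_failed` and `array_failed` are illustrative only.

```python
import math
import numpy as np

def basic_sigmoid(x):
    """Scalar-only sigmoid built on math.exp."""
    return 1 / (1 + math.exp(-x))

# A plain Python list does not support unary minus, so -x fails first.
try:
    basic_sigmoid([1, 2, 3])
    list_failed = False
except TypeError as err:
    list_failed = True
    print("list input:", err)

# A numpy array can be negated, but math.exp only accepts a single number.
try:
    basic_sigmoid(np.array([1, 2, 3]))
    array_failed = False
except TypeError as err:
    array_failed = True
    print("array input:", err)
```

Both calls raise a `TypeError`, which is why the numpy version of the function is needed for vector inputs.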

In [6]:
import numpy as np

x = np.array([1, 2, 3])       # np.exp is applied to every element of x
print(np.exp(x))

[  2.71828183   7.3890561   20.08553692]


In [7]:
x = np.array([1, 2, 3])       # arithmetic operators also act elementwise
print("X = " + str(x + 3))
print("Y = " + str(1 / x))

X = [4 5 6]
Y = [ 1.          0.5         0.33333333]


**Exercise**: Implement the `sigmoid` function using numpy.

**Instructions**: The input can now be a real number, a vector, or a matrix. The data structures numpy uses to represent these shapes (vectors, matrices, ...) are called numpy arrays.

$$ \text{For } x \in \mathbb{R}^n \text{,     } \sigma(x) = \sigma\begin{pmatrix}
    x_1  \\
    x_2  \\
    ...  \\
    x_n  \\
\end{pmatrix} = \begin{pmatrix}
    \frac{1}{1+e^{-x_1}}  \\
    \frac{1}{1+e^{-x_2}}  \\
    ...  \\
    \frac{1}{1+e^{-x_n}}  \\
\end{pmatrix} $$

In [8]:
import numpy as np

def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    
    s = 1 / (1 + np.exp(-x))
    return s

In [9]:
x = np.array([1, 2, 3])
sigmoid(x)

array([ 0.73105858,  0.88079708,  0.95257413])

#### Expected Output 
`sigmoid([1,2,3]) : array([ 0.73105858,  0.88079708,  0.95257413])`

### Sigmoid gradient

**Exercise**: Implement the function `sigmoid_derivative()` to compute the gradient of the sigmoid function with respect to its input x. The formula is: $$\sigma'(x) = \sigma(x) (1 - \sigma(x))$$


In [10]:
def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.
    
    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """
    
    s = sigmoid(x)
    ds = s * (1 - s)
    return ds

In [11]:
x = np.array([1, 2, 3])
print ("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))

sigmoid_derivative(x) = [ 0.19661193  0.10499359  0.04517666]


**Expected Output** : `sigmoid_derivative([1,2,3]) = [ 0.19661193  0.10499359  0.04517666]`
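One way to gain confidence in the formula $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ is to compare it with a finite-difference approximation. The sketch below redefines both functions so it runs on its own; the step size `eps` is an arbitrary choice made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6  # illustrative step size, not a tuned value

# Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_derivative(x)

print("numeric  =", numeric)
print("analytic =", analytic)
print("match    =", np.allclose(numeric, analytic, atol=1e-8))
```

The two results should agree to many decimal places, which suggests the closed-form gradient is implemented correctly.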


### Reshaping arrays

- `X.shape` is used to get the shape (dimension) of a matrix/vector X.
- `X.reshape(...)` is used to reshape X into some other dimension.

For example, an image is represented by a 3D array of shape $(length, height, depth = 3)$. However, when you read an image as the input of an algorithm you convert it to a vector of shape $(length*height*3, 1)$. In other words, you "unroll", or reshape, the 3D array into a single column vector.


**Exercise**: Implement generic `image2vector()` that takes an input of shape (length, height, 3) and returns a vector of shape (length $\times$ height $\times$ 3, 1). 

For example, if you would like to reshape an array v of shape (a, b, c) into a matrix of shape (a\*b, c) you would do:

``` python
v = v.reshape((v.shape[0]*v.shape[1], v.shape[2])) 
# v.shape[0] = a ; v.shape[1] = b ; v.shape[2] = c
```
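The reshape pattern above can be tried on a small array. This sketch uses an illustrative array built with `np.arange` and also shows `reshape(-1, 1)`, where `-1` asks numpy to infer that dimension from the total number of elements.

```python
import numpy as np

a, b, c = 3, 3, 2
v = np.arange(a * b * c).reshape(a, b, c)   # illustrative data: 0, 1, ..., 17

# Merge the first two axes: (a, b, c) -> (a*b, c)
m = v.reshape(v.shape[0] * v.shape[1], v.shape[2])
print(m.shape)      # (9, 2)

# Unroll everything into one column: (a, b, c) -> (a*b*c, 1)
col = v.reshape(-1, 1)
print(col.shape)    # (18, 1)

# Elements keep their row-major order, so the first entries are unchanged
print(col[:4, 0])   # [0 1 2 3]
```

Reshaping only changes how the same elements are indexed; it does not reorder them.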

In [12]:
# image2vector FUNCTION

def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    
    Returns:
    v -- a vector of shape (length * height * depth, 1)
    """
    
    v = image.reshape(image.shape[0]*image.shape[1]*image.shape[2], 1)
    return v

In [13]:
# This is a 3 by 3 by 2 array

image = np.array([
        [[ 0.67826139,  0.29380381], [ 0.90714982,  0.52835647], [ 0.4215251 ,  0.45017551]],
        [[ 0.92814219,  0.96677647], [ 0.85304703,  0.52351845], [ 0.19981397,  0.27417313]],
        [[ 0.60659855,  0.00533165], [ 0.10820313,  0.49978937], [ 0.34144279,  0.94630077]]])

print ("image2vector(image) = \n" + str(image2vector(image)))

image2vector(image) = 
[[ 0.67826139]
 [ 0.29380381]
 [ 0.90714982]
 [ 0.52835647]
 [ 0.4215251 ]
 [ 0.45017551]
 [ 0.92814219]
 [ 0.96677647]
 [ 0.85304703]
 [ 0.52351845]
 [ 0.19981397]
 [ 0.27417313]
 [ 0.60659855]
 [ 0.00533165]
 [ 0.10820313]
 [ 0.49978937]
 [ 0.34144279]
 [ 0.94630077]]


### Normalizing rows

Another common technique in Machine Learning and Deep Learning is to normalize the data. It often leads to better performance because gradient descent converges faster after normalization.

Here, by normalization we mean changing x to $ \frac{x}{\| x\|} $ (dividing each row vector of x by its norm).

For example, if
$$x = \begin{bmatrix}
    0 & 3 & 4 \\
    2 & 6 & 4 \\
\end{bmatrix}$$
then `np.linalg.norm(x, axis = 1, keepdims = True)` gives
$$\| x\| = \begin{bmatrix}
    5 \\
    \sqrt{56} \\
\end{bmatrix}$$
and
$$x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}$$
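The worked example above can be checked numerically. This is a small sketch using the same matrix; it confirms the row norms $5$ and $\sqrt{56}$ and that every normalized row has length 1.

```python
import numpy as np

x = np.array([[0, 3, 4],
              [2, 6, 4]])

# keepdims=True keeps the result as a (2, 1) column so it divides row by row
x_norm = np.linalg.norm(x, axis=1, keepdims=True)
x_normalized = x / x_norm

print("row norms:\n" + str(x_norm))
print("normalized:\n" + str(x_normalized))
print("norms after normalization: " + str(np.linalg.norm(x_normalized, axis=1)))
```

After normalization each row norm should be 1 up to floating-point rounding.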


**Exercise**: Implement `normalizeRows()` to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1).

In [14]:
# normalizeRows

def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).
    
    Argument:
    x -- A numpy matrix of shape (n, m)
    
    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    
    x_norm = np.linalg.norm(x, axis = 1, keepdims = True)
    x = x / x_norm
    print("shape of x_vect =" + str(x.shape))
    print("shape of x_norm =" + str(x_norm.shape))
    return x

In [15]:
x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = \n" + str(normalizeRows(x)))

shape of x_vect =(2, 3)
shape of x_norm =(2, 1)
normalizeRows(x) = 
[[ 0.          0.6         0.8       ]
 [ 0.13736056  0.82416338  0.54944226]]


### Broadcasting and the softmax function

**Exercise**: Implement a `softmax` function. You can think of softmax as a normalizing function used when your algorithm needs to classify inputs into two or more classes.

**Instructions**:
- $ \text{for } X \in \mathbb{R}^{1\times n} \text{,     } softmax(X) = softmax(\begin{bmatrix}
    x_1  &&
    x_2 &&
    ...  &&
    x_n  
\end{bmatrix}) = \begin{bmatrix}
     \frac{e^{x_1}}{\sum_{j}e^{x_j}}  &&
    \frac{e^{x_2}}{\sum_{j}e^{x_j}}  &&
    ...  &&
    \frac{e^{x_n}}{\sum_{j}e^{x_j}} 
\end{bmatrix} $ 

- $\text{for a matrix } X \in \mathbb{R}^{m \times n} \text{,}$  
$$softmax(X) = softmax\begin{bmatrix}
    x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{m1} & x_{m2} & x_{m3} & \dots  & x_{mn}
\end{bmatrix} = \begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} $$

In [16]:
# softmax

def softmax(x):
    """Calculates the softmax for each row of the input x.

    Your code should work for a row vector and also for matrices of shape (m,n).

    Argument:
    x -- A numpy matrix of shape (m,n)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (m,n)
    """
    
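    x_exp = np.exp(x)                                  # elementwise exponential, shape (m,n)
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)   # row sums, shape (m,1)
    s = x_exp / x_sum                                  # x_sum is broadcast across the n columns

    # The result is named s so that it does not shadow the function name softmax
    print("dimension of input:   " + str(x.shape))
    print("dimension of row_sum: " + str(x_sum.shape))
    print("dimension of softmax: " + str(s.shape))
    return s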

In [17]:
x = np.array([
    [9, 2, 5, 0],
    [7, 5, 0, 0]])
print("softmax(x) = \n" + str(softmax(x)))

dimension of input:   (2, 4)
dimension of row_sum: (2, 1)
dimension of softmax: (2, 4)
softmax(x) = 
[[  9.81016419e-01   8.94571181e-04   1.79679425e-02   1.21067044e-04]
 [  8.79384465e-01   1.19011746e-01   8.01894834e-04   8.01894834e-04]]


**Note**: The shapes printed above show that `x_sum` has shape (2,1) while `x_exp` and the returned softmax have shape (2,4). The division `x_exp/x_sum` still works because of numpy broadcasting: the single column of `x_sum` is stretched across the 4 columns of `x_exp`, so each row is divided by its own sum.
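The role of `keepdims = True` in this broadcast can be seen in a short experiment. The sketch below repeats the softmax computation on the same input and then shows, as an illustration, that dropping `keepdims` produces a shape that cannot be broadcast against a (2,4) array; the flag name `failed` is illustrative only.

```python
import numpy as np

x = np.array([[9, 2, 5, 0],
              [7, 5, 0, 0]], dtype=float)

x_exp = np.exp(x)                               # shape (2, 4)
x_sum = np.sum(x_exp, axis=1, keepdims=True)    # shape (2, 1)
s = x_exp / x_sum                               # (2, 1) is stretched to (2, 4)

print(s.shape)
print(s.sum(axis=1))   # each row sums to 1, up to rounding

# Without keepdims the row sums have shape (2,), which numpy tries to align
# with the last axis (length 4), so the division is rejected.
try:
    x_exp / np.sum(x_exp, axis=1)
    failed = False
except ValueError as err:
    failed = True
    print("without keepdims:", err)
```

Keeping the summed axis as a dimension of size 1 is what lets numpy pair each row with its own sum.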

## <font color='darkblue'>Vectorization</font>

In [18]:
import numpy as np
x1 = np.random.randint(low = 0, high = 9, size = 100)
x2 = np.random.randint(low = 0, high = 9, size = 100)
print(x1)
print(x2)

[6 5 5 2 3 2 7 0 2 7 5 8 0 8 2 8 7 1 5 8 4 4 6 1 4 1 6 6 7 8 5 0 1 4 6 8 0
 4 0 5 8 1 6 8 3 5 6 2 5 2 1 7 3 5 0 7 7 4 7 3 5 0 7 2 4 7 4 4 8 3 8 1 8 0
 4 2 8 7 1 4 6 4 7 2 8 5 0 6 2 0 7 3 8 2 3 4 7 3 4 2]
[3 0 6 7 4 3 0 0 7 2 1 4 3 0 4 6 1 8 5 5 3 8 5 1 3 5 1 2 8 1 2 7 5 6 5 7 4
 4 6 7 3 4 3 2 4 6 6 3 3 4 6 8 2 7 7 2 1 2 8 7 0 5 6 8 5 6 8 6 4 6 3 0 3 5
 5 5 5 6 8 2 1 7 0 7 7 7 5 6 4 1 8 4 7 8 2 5 0 7 5 0]


In [19]:
import time

### CLASSIC DOT PRODUCT 
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i]*x2[i]
toc = time.process_time()
print ("dot product        " + str(dot))
print ("Computation time = " + str(1000*(toc - tic)) + "ms\n\n")

### CLASSIC OUTER PRODUCT
tic = time.process_time()
outer = np.zeros((len(x1), len(x2))) # Create a len(x1)*len(x2) zero matrix
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i]*x2[j]
toc = time.process_time()
print ("Outer product    =\n" + str(outer))
print ("Computation time = "  + str(1000*(toc - tic)) + "ms\n\n")


### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
assert(len(x1) == len(x2))
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i]*x2[i]
toc = time.process_time()
print ("Elementwise multiplication = \n" + str(mul))
print ("Computation time = " + str(1000*(toc - tic)) + "ms\n\n")


### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j]*x1[j]
toc = time.process_time()
print ("general dot prod = " + str(gdot))
print ("Computation time = " + str(1000*(toc - tic)) + "ms\n\n")

dot product        1801
Computation time = 0.12487299999996981ms


Outer product    =
[[ 18.   0.  36. ...,  42.  30.   0.]
 [ 15.   0.  30. ...,  35.  25.   0.]
 [ 15.   0.  30. ...,  35.  25.   0.]
 ..., 
 [  9.   0.  18. ...,  21.  15.   0.]
 [ 12.   0.  24. ...,  28.  20.   0.]
 [  6.   0.  12. ...,  14.  10.   0.]]
Computation time = 4.755086999999936ms


Elementwise multiplication = 
[ 18.   0.  30.  14.  12.   6.   0.   0.  14.  14.   5.  32.   0.   0.   8.
  48.   7.   8.  25.  40.  12.  32.  30.   1.  12.   5.   6.  12.  56.   8.
  10.   0.   5.  24.  30.  56.   0.  16.   0.  35.  24.   4.  18.  16.  12.
  30.  36.   6.  15.   8.   6.  56.   6.  35.   0.  14.   7.   8.  56.  21.
   0.   0.  42.  16.  20.  42.  32.  24.  32.  18.  24.   0.  24.   0.  20.
  10.  40.  42.   8.   8.   6.  28.   0.  14.  56.  35.   0.  36.   8.   0.
  56.  12.  56.  16.   6.  20.   0.  21.  20.   0.]
Computation time = 0.16686300000001708ms


general dot prod = [ 226.1020921   219.59727277  200.116

In [20]:
### VECTORIZED DOT PRODUCT
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("dot product      = " + str(dot))
print ("Computation time = " + str(1000*(toc - tic)) + "ms\n\n")

### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print ("outer = \n" + str(outer))
print ("Computation time = " + str(1000*(toc - tic)) + "ms\n\n")

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
print ("Elementwise multiplication = \n" + str(mul))
print ("Computation time = " + str(1000*(toc - tic)) + "ms\n\n")

### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
dot = np.dot(W,x1)
toc = time.process_time()
print ("general dot product = " + str(dot))
print ("Computation time    = " + str(1000*(toc - tic)) + "ms\n\n")

dot product      = 1801
Computation time = 0.08040600000003284ms


outer = 
[[18  0 36 ..., 42 30  0]
 [15  0 30 ..., 35 25  0]
 [15  0 30 ..., 35 25  0]
 ..., 
 [ 9  0 18 ..., 21 15  0]
 [12  0 24 ..., 28 20  0]
 [ 6  0 12 ..., 14 10  0]]
Computation time = 0.5502809999999858ms


Elementwise multiplication = 
[18  0 30 14 12  6  0  0 14 14  5 32  0  0  8 48  7  8 25 40 12 32 30  1 12
  5  6 12 56  8 10  0  5 24 30 56  0 16  0 35 24  4 18 16 12 30 36  6 15  8
  6 56  6 35  0 14  7  8 56 21  0  0 42 16 20 42 32 24 32 18 24  0 24  0 20
 10 40 42  8  8  6 28  0 14 56 35  0 36  8  0 56 12 56 16  6 20  0 21 20  0]
Computation time = 0.050516999999916656ms


general dot product = [ 226.1020921   219.59727277  200.11659373]
Computation time    = 2.718489000000046ms
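Single-run timings on 100-element arrays are noisy, so the numbers above can vary between runs. The sketch below is one possible way to make the comparison more robust: it checks that the loop and vectorized versions agree, then times many repetitions with the standard-library `timeit` module. The seed, the repetition count, and the helper name `loop_dot` are assumptions made for illustration.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, chosen only for repeatability
x1 = rng.integers(0, 9, size=100)
x2 = rng.integers(0, 9, size=100)
W = rng.random((3, len(x1)))

def loop_dot(u, v):
    """Dot product with an explicit Python loop."""
    total = 0
    for i in range(len(u)):
        total += u[i] * v[i]
    return total

# General dot product with explicit loops
gdot_loop = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot_loop[i] += W[i, j] * x1[j]

# The loop and vectorized versions should produce the same values
same_dot = loop_dot(x1, x2) == np.dot(x1, x2)
same_gdot = np.allclose(gdot_loop, np.dot(W, x1))
print("dot agrees:", same_dot, "| general dot agrees:", same_gdot)

# Average over many repetitions instead of timing a single call
n = 1000
t_loop = timeit.timeit(lambda: loop_dot(x1, x2), number=n)
t_vec = timeit.timeit(lambda: np.dot(x1, x2), number=n)
print("loop:       " + str(1000 * t_loop / n) + " ms per call")
print("vectorized: " + str(1000 * t_vec / n) + " ms per call")
```

The exact timings depend on the machine, so the sketch prints them without asserting which version is faster.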




## <font color='darkblue'>L1 and L2 loss functions</font>

**Exercise**: Implement the numpy vectorized version of the L1 loss. You may find the function `np.abs(x)` (absolute value of x) useful.

**Reminder**:
- The loss is used to evaluate the performance of your model. The bigger your loss is, the more different your predictions ($ \hat{y} $) are from the true values ($y$).

- L1 loss is defined as:
$$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=1}^m|y^{(i)} - \hat{y}^{(i)}| \end{align*}$$

In [21]:
def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L1 loss function defined above
    """
    
    loss = np.sum(np.abs(yhat - y))
    return loss

In [22]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y    = np.array([1, 0, 0, 1, 1])

print("L1 = " + str(L1(yhat,y)))

L1 = 1.1


**Expected Output**: `L1 = 1.1`

**Exercise**: Implement the L2 loss. 

**Reminder**:
- If $x = [x_1, x_2, ..., x_n]$, then `np.dot(x,x)` = $\sum_{j=1}^n x_j^{2}$.

- L2 loss is defined as $$\begin{align*} & L_2(\hat{y},y) = \sum_{i=1}^m(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}$$

In [23]:
def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L2 loss function defined above
    """
    diff = y - yhat
    loss = np.dot(diff, diff)   # the dot product already returns the scalar sum of squares
    return loss

In [24]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

print("L2 = " + str(L2(yhat,y)))

L2 = 0.43


**Expected Output**: `L2 = 0.43`
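The arithmetic behind both expected outputs can be traced step by step. This sketch reuses the same `yhat` and `y` and also checks the reminder that `np.dot(x,x)` equals the sum of squared entries; the variable names `l1`, `l2_dot`, and `l2_sum` are illustrative.

```python
import numpy as np

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])

diff = y - yhat
print("absolute errors: " + str(np.abs(diff)))   # about 0.1, 0.2, 0.1, 0.6, 0.1
print("squared errors:  " + str(diff ** 2))      # about 0.01, 0.04, 0.01, 0.36, 0.01

l1 = np.sum(np.abs(diff))      # 0.1 + 0.2 + 0.1 + 0.6 + 0.1
l2_dot = np.dot(diff, diff)    # dot product of diff with itself
l2_sum = np.sum(diff ** 2)     # explicit sum of squares

print("L1 = " + str(l1))
print("L2 via dot = " + str(l2_dot))
print("dot equals sum of squares: " + str(np.isclose(l2_dot, l2_sum)))
```

The single large error (0.6) contributes 0.36 of the 0.43 total in L2 but only 0.6 of the 1.1 total in L1, which illustrates how L2 weights large errors more heavily.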