# Python Function Implementations with Numpy : Neural Networks as Basic Building Blocks

## 1 - Building basic functions with numpy

### 1.1 - Sigmoid function, np.exp()

Build a function that returns the sigmoid of a real number x. Use math.exp(x) for the exponential function.

**Reminder**:
$sigmoid(x) = \frac{1}{1+e^{-x}}$ is sometimes also known as the logistic function. It is a non-linear function used not only in Machine Learning (Logistic Regression), but also in Deep Learning.

<img src="images/Sigmoid.png" style="width:500px;height:228px;">


In [1]:
import math

def basic_sigmoid(x):
    s = 1 / (1 + math.exp(-x))
    return s

basic_sigmoid(3)

0.9525741268224334
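
As a minimal sketch of the limitation of the `math` version: `math.exp` works on a single real number, so calling `basic_sigmoid` on a list fails with a `TypeError`. The names `values` and `failed` are illustrative only.

```python
import math

def basic_sigmoid(x):
    # Sigmoid of a single real number, using the math module
    return 1 / (1 + math.exp(-x))

values = [1, 2, 3]  # a plain Python list, not a single number

try:
    basic_sigmoid(values)  # negating a list is not defined, so this raises
    failed = False
except TypeError:
    failed = True

print(failed)
```

This is one reason to prefer numpy when the input may be a vector or a matrix.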

### 1.2 - Sigmoid function, np.exp() - Using numpy instead of math

Unlike math.exp(), np.exp() also accepts arrays. If $ x = (x_1, x_2, ..., x_n)$ is a row vector, then $np.exp(x)$ applies the exponential function to every element of x. The output is thus: $np.exp(x) = (e^{x_1}, e^{x_2}, ..., e^{x_n})$


In [2]:
import numpy as np

x = np.array([1, 2, 3])
print(np.exp(x))

[ 2.71828183  7.3890561  20.08553692]


In [3]:
??np.exp

Call signature:  np.exp(*args, **kwargs)
Type:            ufunc
String form:     <ufunc 'exp'>
File:            /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/numpy/__init__.py
Docstring:
exp(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])

Calculate the exponential of all elements in the input array.

Parameters
----------
x : array_like
    Input values.
out : ndarray, None, or tuple of ndarray and None, optional
    A location into which the result is stored. If provided, it must have
    a shape that the inputs broadcast to. If not provided or None,
    a freshly-allocated array is returned. A tuple (possible only as a
    keyword argument) must have length equal to the number of outputs.
where : array_like, 

## Implement the sigmoid function using numpy.

**Instructions**: x could now be either a real number, a vector, or a matrix. The data structures we use in numpy to represent these shapes (vectors, matrices...) are called numpy arrays. You don't need to know more for now.

$$
\text{For } x \in \mathbb{R}^n \text{,     } sigmoid(x) = sigmoid\begin{pmatrix}
    x_1  \\
    x_2  \\
    ...  \\
    x_n  \\
\end{pmatrix} = \begin{pmatrix}
    \frac{1}{1+e^{-x_1}}  \\
    \frac{1}{1+e^{-x_2}}  \\
    ...  \\
    \frac{1}{1+e^{-x_n}}  \\
\end{pmatrix}\tag{1}
$$


In [4]:
import numpy as np

def sigmoid(x):
    s = 1 / (np.exp(-x) + 1)
    return s



In [5]:
x = np.array([1, 2, 3])
sig_x = sigmoid(x)

isinstance(sig_x, np.ndarray)

True
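
The following sketch checks that a numpy-based sigmoid accepts a real number, a vector, and a matrix alike. The function is redefined here so the block runs on its own; the variable names are illustrative.

```python
import numpy as np

def sigmoid(x):
    # Element-wise sigmoid; works for scalars and arrays of any shape
    return 1 / (1 + np.exp(-x))

scalar_out = sigmoid(0.0)                    # sigmoid(0) is exactly 0.5
vector_out = sigmoid(np.array([1, 2, 3]))    # shape (3,)
matrix_out = sigmoid(np.zeros((2, 3)))       # shape (2, 3), every entry 0.5

print(scalar_out)
print(vector_out.shape)
print(matrix_out.shape)
```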

### 1.2 - Sigmoid gradient

Implement the function sigmoid_derivative() to compute the gradient of the sigmoid function with respect to its input x. The formula is: $$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{2}$$
You often code this function in two steps:

1. Set s to be the sigmoid of x. You might find your sigmoid(x) function useful.
2. Compute $\sigma'(x) = s(1-s)$


In [6]:
def sigmoid_derivative(x):
    s = sigmoid(x)
    ds = s * (1 - s)
    return ds

x = np.array([1, 2, 3])
print("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))


sigmoid_derivative(x) = [0.19661193 0.10499359 0.04517666]
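
One way to sanity-check formula (2) is to compare it with a numerical approximation of the derivative. The sketch below uses a central finite difference; the step size `eps` and the tolerance are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6  # small step for the finite difference (illustrative value)

# Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_derivative(x)

print(np.allclose(numeric, analytic, atol=1e-8))
```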


### 1.3 - Reshaping arrays : An insight into Image reading and Conversion into Numpy arrays

Two common numpy functions used in deep learning are [np.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html) and [np.reshape()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html).

- X.shape is used to get the shape (dimension) of a matrix/vector X.
- X.reshape(...) is used to reshape X into some other dimension.

For example, an image is typically represented by a 3D array of shape $(length, height, depth = 3)$. However, when you read an image as the input of an algorithm, you often convert it to a vector of shape $(length*height*3, 1)$. In other words, you "unroll", or reshape, the 3D array into a column vector.

<img src="images/image2vector_kiank.png" style="width:500px;height:300;">


In [7]:
def img2vector(image):
    
    ''' Argument:

    image -- a numpy array of shape (length, height, depth)
    
    Returns:
    v -- a vector of shape (length*height*depth, 1)
    default depth = 3 for R G B values 

    '''
    v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2] , 1)
    return v
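
As a small sketch of the same reshape on data that is easy to inspect, the block below builds a hypothetical 2x2 "image" with 3 channels. It uses `reshape(-1, 1)`, where `-1` asks numpy to infer the first dimension; this is an alternative spelling, not the function defined above.

```python
import numpy as np

# Hypothetical 2x2 "image" with 3 channels, filled with the values 0..11
toy_image = np.arange(12).reshape(2, 2, 3)

# -1 lets numpy infer the first dimension from the total number of elements
v = toy_image.reshape(-1, 1)

print(toy_image.shape)  # (2, 2, 3)
print(v.shape)          # (12, 1)
```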

In [8]:
# Read an Image and convert it to a vector
import numpy as np
import imageio
img = imageio.imread('images/img].jpeg')
print(img.shape)

# https://www.pluralsight.com/guides/importing-image-data-into-numpy-arrays

image = np.asarray(img)
print ("image2vector(image) = " + str(img2vector(image)))


(450, 800, 3)
image2vector(image) = [[25]
 [26]
 [12]
 ...
 [66]
 [85]
 [81]]




### 1.4 - Normalizing rows

Another common technique in Machine Learning and Deep Learning is to normalize the data. It often leads to better performance because gradient descent converges faster after normalization. Here, by normalization we mean changing x to $ \frac{x}{\| x\|} $ (dividing each row vector of x by its norm).


For example, if
$$x = \begin{bmatrix}
    0 & 3 & 4 \\
    2 & 6 & 4 \\
\end{bmatrix}\tag{3}$$
then
$$\| x\| = np.linalg.norm(x, axis = 1, keepdims = True) = \begin{bmatrix}
    5 \\
    \sqrt{56} \\
\end{bmatrix}\tag{4}$$
and
$$x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}\tag{5}$$

Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1).


In [9]:
def normalizeRows(x):

    x_norm = np.linalg.norm(x , axis = 1 , keepdims = True)
    # Divide x by its norm.
    x = x/x_norm
    return x

x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))

normalizeRows(x) = [[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]
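
A quick way to confirm the result is to recompute the norm of each normalized row, which should be 1. This is only a sketch; the function is repeated so the block is self-contained.

```python
import numpy as np

def normalizeRows(x):
    # Norm of each row, kept as a column so it broadcasts against x
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    return x / x_norm

x = np.array([[0, 3, 4],
              [1, 6, 4]])

row_norms = np.linalg.norm(normalizeRows(x), axis=1)
print(row_norms)
```

Note that a row containing only zeros has norm 0, so this version would divide by zero for such a row; handling that case is left out here.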


### 1.5 - Broadcasting and the softmax function

A very important concept to understand in numpy is "broadcasting". It is very useful for performing mathematical operations between arrays of different shapes. For the full details on broadcasting, you can read the official [broadcasting documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
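
As a minimal illustration of broadcasting, the sketch below divides a (2, 3) array by a (2, 1) array. The values and variable names are made up for the example.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])            # shape (2, 3)
row_sums = x.sum(axis=1, keepdims=True)    # shape (2, 1)

# The (2, 1) array is stretched across the 3 columns during the division
scaled = x / row_sums

print(row_sums.shape)
print(scaled.shape)
```

Passing `keepdims=True` keeps `row_sums` as a column, so each row of `x` is divided by its own sum.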

#### Implement a softmax function using numpy.

You can think of softmax as a normalizing function used when your algorithm needs to classify two or more classes. You will learn more about softmax in the second course of this specialization.

- $ \text{for } x \in \mathbb{R}^{1\times n} \text{, } softmax(x) = softmax(\begin{bmatrix}
  x_1 &&
  x_2 &&
  ... &&
  x_n
  \end{bmatrix}) = \begin{bmatrix}
  \frac{e^{x_1}}{\sum_{j}e^{x_j}} &&
  \frac{e^{x_2}}{\sum_{j}e^{x_j}} &&
  ... &&
  \frac{e^{x_n}}{\sum_{j}e^{x_j}}
  \end{bmatrix} $

- $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{,  $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$ $$softmax(x) = softmax\begin{bmatrix}
    x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{m1} & x_{m2} & x_{m3} & \dots  & x_{mn}
\end{bmatrix} = \begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} = \begin{pmatrix}
    softmax\text{(first row of x)}  \\
    softmax\text{(second row of x)} \\
    ...  \\
    softmax\text{(last row of x)} \\
\end{pmatrix} $$


In [10]:
def softmax(x):
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp , axis = 1 , keepdims = True)
    s = x_exp/x_sum
    return s

In [11]:
x_exp = np.exp(x)
x_exp

array([[  1.        ,  20.08553692,  54.59815003],
       [  2.71828183, 403.42879349,  54.59815003]])

In [12]:
x_sum = np.sum(x_exp , axis = 1 , keepdims = True)
x_sum

array([[ 75.68368696],
       [460.74522535]])

In [13]:
s = x_exp/x_sum
s

array([[0.01321289, 0.26538793, 0.72139918],
       [0.00589975, 0.8756006 , 0.11849965]])
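
The implementation above can overflow in `np.exp` when the inputs are large. A commonly used variant, which is not part of the original exercise, subtracts each row's maximum before exponentiating. The sketch below (with the hypothetical name `softmax_stable`) checks that both versions agree on the example matrix.

```python
import numpy as np

def softmax(x):
    x_exp = np.exp(x)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

def softmax_stable(x):
    # Subtracting each row's maximum leaves the result unchanged
    # mathematically but keeps np.exp from overflowing on large inputs.
    shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

x = np.array([[0, 3, 4],
              [1, 6, 4]])

print(np.allclose(softmax(x), softmax_stable(x)))
print(softmax_stable(x).sum(axis=1))  # each row sums to 1
```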

## 2 - Vectorization

In deep learning, you deal with very large datasets.

Hence, a computationally inefficient function can become a major bottleneck in your algorithm and result in a model that takes a very long time to run. To make sure that your code is computationally efficient, you will use vectorization. For example, compare the following implementations of the dot product, the outer product, and the general (matrix-vector) dot product.


In [14]:
import time 

x1 = [9, 2, 5, 0, 0, 7, 0, 0, 0]
x2 = [0, 2, 5, 0, 0, 7, 0, 0, 0]


### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot+= x1[i]*x2[i]
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")



### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1),len(x2))) # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i]*x2[j]
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")


### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])

for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j]*x1[j]
toc = time.process_time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")



dot = 78
 ----- Computation time = 0.053999999999998494ms
outer = [[ 0. 18. 45.  0.  0. 63.  0.  0.  0.]
 [ 0.  4. 10.  0.  0. 14.  0.  0.  0.]
 [ 0. 10. 25.  0.  0. 35.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0. 14. 35.  0.  0. 49.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.]]
 ----- Computation time = 0.0969999999997917ms
gdot = [12.94580715  8.95419293  9.18348223]
 ----- Computation time = 0.12100000000003774ms


In [15]:
x1 = [9, 2, 5, 0, 0, 7, 0, 0, 0]
W = np.random.rand(3,len(x1))     # Random numpy array of shape (3, len(x1))

print(len(W[0]))
print(len(W))

9
3


In [16]:
# Implementation of vectorization using numpy, with timing
x1 = [9, 2, 5, 0, 0, 7, 0, 0, 0]
x2 = [0, 2, 5, 0, 0, 7, 0, 0, 0]

tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()

print ("dot with numpy = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")


### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
dot = np.dot(W,x1)
toc = time.process_time()
print ("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

dot with numpy = 78
 ----- Computation time = 0.05500000000013827ms
outer = [[ 0 18 45  0  0 63  0  0  0]
 [ 0  4 10  0  0 14  0  0  0]
 [ 0 10 25  0  0 35  0  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 0 14 35  0  0 49  0  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0]]
 ----- Computation time = 0.08300000000005525ms
gdot = [14.09890273 14.13493541 11.00269118]
 ----- Computation time = 0.02799999999991698ms
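
With 9-element vectors the timings above are too small to separate the two approaches. The sketch below repeats the dot-product comparison on larger random vectors; the size `n`, the seed, and the use of `time.perf_counter` are choices made for this illustration, and the measured times will vary by machine.

```python
import time
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
n = 100_000                      # illustrative size, not from the original
a = rng.random(n)
b = rng.random(n)

# Explicit Python loop
tic = time.perf_counter()
loop_dot = 0.0
for i in range(n):
    loop_dot += a[i] * b[i]
toc = time.perf_counter()
loop_ms = 1000 * (toc - tic)

# Vectorized version
tic = time.perf_counter()
vec_dot = np.dot(a, b)
toc = time.perf_counter()
vec_ms = 1000 * (toc - tic)

print("loop: " + str(loop_ms) + " ms, numpy: " + str(vec_ms) + " ms")
```

Both versions compute the same value up to floating-point rounding; only the running time differs.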


**Note** that `np.dot()` performs a matrix-matrix or matrix-vector multiplication (or, for two 1-D arrays, their inner product). This is different from `np.multiply()` and the `*` operator (which is equivalent to `.*` in Matlab/Octave), which perform an element-wise multiplication.
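
A short sketch of the difference, using two small made-up vectors:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

dot_result = np.dot(a, b)         # 1*4 + 2*5 + 3*6 = 32
elementwise = np.multiply(a, b)   # [4, 10, 18], same as a * b

print(dot_result)
print(elementwise)
```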


### 2.1 - Implement the L1 and L2 loss functions

L1 Loss Function :

Implement the numpy vectorized version of the L1 loss. You may find the function abs(x) (absolute value of x) useful.

**Reminder**:

- The loss is used to evaluate the performance of your model. The bigger your loss is, the more different your predictions ($ \hat{y} $) are from the true values ($y$). In deep learning, you use optimization algorithms like Gradient Descent to train your model and to minimize the cost.
- L1 loss is defined as:
  $$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=0}^m|y^{(i)} - \hat{y}^{(i)}| \end{align*}\tag{6}$$


L2 Loss Function :

Implement the numpy vectorized version of the L2 loss. There are several ways of implementing the L2 loss, but you may find the function np.dot() useful. As a reminder, if $x = [x_1, x_2, ..., x_n]$, then `np.dot(x,x)` = $\sum_{j=1}^n x_j^{2}$.

- L2 loss is defined as $$\begin{align*} & L_2(\hat{y},y) = \sum_{i=0}^m(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}\tag{7}$$


In [17]:
def L1(yhat , y):
    loss = np.sum(np.abs(yhat - y))
    return loss

def L2(yhat , y):
    # For 1-D arrays, np.dot of the difference with itself is already the sum of squares
    diff = yhat - y
    loss = np.dot(diff , diff)
    return loss


In [18]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat,y)))

L1 = 1.1


In [19]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat,y)))

L2 = 0.43000000000000005
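
The L2 loss can also be written without `np.dot`. The sketch below uses `np.square` and `np.sum` (the name `L2_square` is hypothetical) and checks that it agrees with the dot-product form on the same data.

```python
import numpy as np

def L2_square(yhat, y):
    # Same loss as L2, written as a sum of squared element-wise differences
    return np.sum(np.square(yhat - y))

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

diff = yhat - y
print(np.isclose(L2_square(yhat, y), np.dot(diff, diff)))
```

Unlike the `np.dot` form, this version also works unchanged when `yhat` and `y` are matrices of the same shape.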
