# TensorFlow Developer Certificate Preparation
___
## Introduction to TensorFlow in Python - DataCamp - ML-Scientist-Career-Track - by Isaiah Hull
___
## Chapter 3.2 - Activation Functions

### 1. Activation functions
In the previous notebook, we discussed dense layers. We also briefly introduced the concept of an activation function through the sigmoid function. We will now return to activation functions.

### 2. What is an activation function?
A typical hidden layer consists of two operations:
- The **first** performs **matrix multiplication**, which is a ``linear operation``.
- The **second** applies an **activation function**, which is a ``nonlinear operation``.
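
The two operations can be sketched in plain Python, without TensorFlow, to make the order of the steps explicit. This is only an illustrative sketch: the helper names `sigmoid` and `hidden_layer`, and all the numbers, are assumptions chosen for the example and do not come from the course.

```python
import math

def sigmoid(z):
    """Nonlinear step: squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(inputs, weights, biases):
    """Forward pass of one dense layer (hypothetical helper)."""
    # Step 1 (linear): one weighted sum of the inputs per node,
    # which is what the matrix multiplication computes.
    linear = [
        sum(x * w for x, w in zip(inputs, node_weights)) + b
        for node_weights, b in zip(weights, biases)
    ]
    # Step 2 (nonlinear): apply the activation function to each node.
    return [sigmoid(z) for z in linear]

# Hypothetical values: 2 input features feeding 3 nodes
features = [0.5, -1.0]
weights = [[1.0, 2.0], [0.0, 0.0], [-1.0, 1.0]]  # one weight list per node
biases = [0.0, 0.0, 0.5]

activations = hidden_layer(features, weights, biases)
print(activations)
```

Without step 2, the layer output would be a purely linear function of the inputs.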

### 3. Why nonlinearities are important
Consider a simple model using the credit card data. The features are borrower age and credit card bill amount. The target variable is default.
![3.2.1](./figures/3.2.1.PNG)


Let's say we create a scatterplot of age and bill amount.   
![3.2.2](./figures/3.2.2.PNG)  

We can see that bill amount usually increases early in life and decreases later in life. This suggests that a high bill may mean something different for default depending on whether the borrower is young or old. If we want our model to capture this, it can't be linear. It must allow the impact of the bill amount to depend on the borrower's age. This is what an activation function makes possible.

### 4. A simple example
Let's look at a simple example, where we assume that the weight on age is 1 and the weight on the bill amount is 2. 

In [1]:
import numpy as np
import tensorflow as tf

# Define example borrower features
young, old = 0.3, 0.6
low_bill, high_bill = 0.1, 0.5

# Apply matrix multiplication step for all feature combinations
young_high = 1.0*young + 2.0*high_bill
young_low = 1.0*young + 2.0*low_bill
old_high = 1.0*old + 2.0*high_bill
old_low = 1.0*old + 2.0*low_bill

- Note that ages are divided by 100 and bill amounts are divided by 10,000. We then perform the matrix multiplication step for all combinations of features: young with a high bill, young with a low bill, old with a high bill, and old with a low bill.

In [2]:
# Difference in default predictions for young
print(young_high - young_low)

# Difference in default predictions for old
print(old_high - old_low)

0.8
0.8


- If we don't apply an activation function and we assume the bias is zero, we find that the impact of bill size on default does not depend on age. In both cases, moving from a low bill to a high bill raises the predicted value by 0.8. Note that our target is a binary variable that is equal to 1 when the borrower defaults; however, the model's final predictions will be real numbers between 0 and 1, where values over 0.5 will be treated as predicting default.
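
The reason is that the age term cancels when we subtract two linear predictions for the same borrower, leaving only the bill weight times the change in the bill: 2.0 × (0.5 − 0.1) = 0.8. The sketch below checks this for several ages. The helper `linear_prediction` and the two extra ages are assumptions added for illustration.

```python
# Weights and bill amounts from the example above
w_age, w_bill = 1.0, 2.0
low_bill, high_bill = 0.1, 0.5

def linear_prediction(age, bill):
    """Linear step only: no activation function, bias of zero."""
    return w_age * age + w_bill * bill

# 0.3 and 0.6 come from the example; 0.2 and 0.8 are hypothetical extras
ages = [0.3, 0.6, 0.2, 0.8]
differences = [
    linear_prediction(age, high_bill) - linear_prediction(age, low_bill)
    for age in ages
]

# Every entry is 0.8 (up to floating-point rounding), whatever the age
print([round(d, 6) for d in differences])
```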

- But what if we apply a sigmoid activation function? 

In [3]:
# Difference in default predictions for young
print(tf.keras.activations.sigmoid(young_high).numpy() - tf.keras.activations.sigmoid(young_low).numpy())

# Difference in default predictions for old
print(tf.keras.activations.sigmoid(old_high).numpy() - tf.keras.activations.sigmoid(old_low).numpy())

0.16337562
0.14204395


The impact of bill amount on default now depends on the borrower's age. In particular, we can see that the change in the predicted value for default is larger for young borrowers than it is for old borrowers.
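
The two differences can be cross-checked without TensorFlow by writing the sigmoid out by hand. This is a minimal sketch: the helper names `sigmoid` and `prediction` are assumptions, while the feature values and weights are the ones used above.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

young, old = 0.3, 0.6
low_bill, high_bill = 0.1, 0.5

def prediction(age, bill):
    """Weighted sum (weights 1.0 and 2.0, zero bias) followed by sigmoid."""
    return sigmoid(1.0 * age + 2.0 * bill)

young_diff = prediction(young, high_bill) - prediction(young, low_bill)
old_diff = prediction(old, high_bill) - prediction(old, low_bill)

print(young_diff, old_diff)
```

The results agree with the TensorFlow output to about five decimal places. The difference is smaller for old borrowers because their weighted sums are larger, and the sigmoid curve flattens as its input grows.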

### 5. The activation functions
In this notebook, we'll use the three most common activation functions: 
- sigmoid 
- relu
- softmax

### 6. The sigmoid activation function
- The sigmoid activation function is used primarily in the **output layer of binary classification problems**.
- When we use the low-level approach, we'll pass the weighted sum of the inputs (the products of weights and inputs, summed) into ``tf.keras.activations.sigmoid()``.
- When we use the high-level approach, we'll simply pass ``'sigmoid'`` as the ``activation`` argument to a Keras dense layer.  
![3.2.3](./figures/3.2.3.PNG)
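
The two approaches can be compared side by side. The sketch below is illustrative only: the input values, the weights, and the single-node layer size are assumptions, not values from the course.

```python
import tensorflow as tf

# Hypothetical inputs: two borrowers, features are (age / 100, bill / 10000)
inputs = tf.constant([[0.3, 0.5], [0.6, 0.1]], tf.float32)

# Low-level approach: explicit weights and bias, then the sigmoid activation
weights = tf.Variable([[1.0], [2.0]])
bias = tf.Variable(0.0)
product = tf.matmul(inputs, weights) + bias
low_level = tf.keras.activations.sigmoid(product)

# High-level approach: the Dense layer creates its own randomly initialized
# weights, so its numbers will differ from the low-level result
high_level = tf.keras.layers.Dense(1, activation='sigmoid')(inputs)

print(low_level.numpy())
print(high_level.numpy())
```

Both results have shape `(2, 1)` and contain values strictly between 0 and 1.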

### 7. The relu activation function
- We'll typically use the ``rectified linear unit`` or ``relu`` activation in the ``hidden layers``, that is, in layers **other than** the ``output layer``.
- When we use the low-level approach, we'll pass the weighted sum of the inputs into ``tf.keras.activations.relu()``.
- When we use the high-level approach, we'll simply pass ``'relu'`` as the ``activation`` argument to a Keras dense layer.
- This activation simply takes the maximum of the value passed to it and 0.  
![3.2.4](./figures/3.2.4.PNG)
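
The "maximum of the value and 0" rule is short enough to write out by hand. This plain-Python sketch mirrors what the relu activation computes for a single value. The helper name `relu` and the sample values are assumptions made for illustration.

```python
def relu(z):
    """Rectified linear unit: negative inputs become 0, others pass through."""
    return max(0.0, z)

# Hypothetical pre-activation values
values = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([relu(v) for v in values])  # → [0.0, 0.0, 0.0, 0.5, 2.0]
```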

### 8. The softmax activation function
- Finally, the ``softmax activation function`` is used in the ``output layer`` in classification problems with ``more than two classes``.
- The outputs from a softmax activation function can be interpreted as predicted class probabilities in multiclass classification problems.
- When we use the low-level approach, we'll pass the weighted sum of the inputs into ``tf.keras.activations.softmax()``.
- When we use the high-level approach, we'll simply pass ``'softmax'`` as the ``activation`` argument to a Keras dense layer.
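
To see why softmax outputs can be read as class probabilities, here is a plain-Python sketch of the standard softmax formula: exponentiate each score, then divide by the sum. The helper name `softmax` and the four scores are assumptions for illustration. Subtracting the maximum first is a common numerical-stability step that leaves the result unchanged.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score avoids overflow in exp()
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output-layer scores for a 4-class problem
scores = [2.0, 1.0, 0.1, -1.0]
probabilities = softmax(scores)

print(probabilities)
print(sum(probabilities))
```

Every output is positive, the outputs sum to 1, and the largest score receives the largest probability.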

### 9. Activation functions in neural networks
- Let's wrap up by applying some activation functions in a neural network. 
- We'll do this using the high-level approach, starting with an input layer. 
- We'll pass this to our first dense layer, which has 16 output nodes and a relu activation. 
- Dense layer 2 then reduces the number of nodes from 16 to 8 and applies a sigmoid activation. 
- Finally, we apply a softmax activation function in the output layer, since there are more than two output classes.

In [5]:
import tensorflow as tf

# Define input layer (borrower_features is not defined in this notebook, so the lines below are left commented out)
# inputs = tf.constant(borrower_features, tf.float32)

# Define dense layer 1
# dense1 = tf.keras.layers.Dense(16, activation = 'relu')(inputs)

# Define dense layer 2
# dense2 = tf.keras.layers.Dense(8, activation = 'sigmoid')(dense1)

# Define output layer
# outputs = tf.keras.layers.Dense(4, activation = 'softmax')(dense2)
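
Since `borrower_features` is not available here, the same network can be tried on placeholder data. The sketch below is a hedged stand-in, not the course's dataset: the random tensor with 5 rows and 3 features is an assumption, chosen only so that the layers have something to run on.

```python
import tensorflow as tf

# Hypothetical placeholder standing in for
# tf.constant(borrower_features, tf.float32): 5 borrowers, 3 features each
inputs = tf.random.uniform((5, 3), seed=0)

# Dense layer 1: 16 output nodes with a relu activation
dense1 = tf.keras.layers.Dense(16, activation='relu')(inputs)

# Dense layer 2: reduces 16 nodes to 8 and applies a sigmoid activation
dense2 = tf.keras.layers.Dense(8, activation='sigmoid')(dense1)

# Output layer: 4 classes, softmax turns the scores into probabilities
outputs = tf.keras.layers.Dense(4, activation='softmax')(dense2)

print(outputs.shape)
print(tf.reduce_sum(outputs, axis=1).numpy())
```

The weights are randomly initialized and untrained, so the individual probabilities are meaningless. The useful checks are that the output has one row per borrower and one column per class, and that each row sums to approximately 1.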