## CATEGORY 1 - CLASSIFICATION ON IRIS DATASET

We will build a classification model on this data using a neural network with TensorFlow's Keras API. For simplicity, let’s use ‘petal length’ and ‘petal width’ as the features, and only two species: ‘versicolor’ and ‘virginica’. Download the Iris flower dataset from the webpage.

## Prepare the Dataset

Import the Iris dataset into Python and subset the data to keep the relevant rows. Plot the data points using the two features as axes, distinguishing the two classes.
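As a rough sketch of the subsetting step, the snippet below uses the copy of the Iris data bundled with scikit-learn. That is an assumption made for illustration: the file downloaded from the webpage may have a different layout, and the names `X`, `y`, and `mask` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Columns 2 and 3 of the scikit-learn copy are petal length and petal width (cm).
features = iris.data[:, 2:4]
labels = iris.target  # 0 = setosa, 1 = versicolor, 2 = virginica

# Keep only the versicolor and virginica rows.
mask = labels != 0
X = features[mask]
y = (labels[mask] == 2).astype(int)  # 0 = versicolor, 1 = virginica

print(X.shape, y.shape)
print(np.bincount(y))  # number of rows per class
```

The two classes can then be drawn with one `plt.scatter` call per class, for example on `X[y == 0]` and `X[y == 1]`.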

## Design a Neural Network

We will build a neural network with a single hidden layer, and we will set the size of that hidden layer to 6 units.
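To make the layer sizes concrete, here is a minimal NumPy sketch of the parameters such a network would hold: 2 inputs, 6 hidden units, 1 output. It is written from scratch rather than with Keras, and the function name `init_params`, the small random initialization, and the column-per-example layout are assumptions for illustration.

```python
import numpy as np

def init_params(n_x=2, n_h=6, n_y=1, seed=0):
    """Return weights and biases for a 2-6-1 network (sizes are the defaults)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.01, size=(n_h, n_x)),  # input -> hidden
        "b1": np.zeros((n_h, 1)),
        "W2": rng.normal(scale=0.01, size=(n_y, n_h)),  # hidden -> output
        "b2": np.zeros((n_y, 1)),
    }

params = init_params()
for name, value in params.items():
    print(name, value.shape)

# Total number of trainable values: 2*6 + 6 + 6*1 + 1
n_params = sum(value.size for value in params.values())
print(n_params)
```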


## Forward Propagation
In the forward propagation step, we will use tanh as the activation function for the hidden layer and sigmoid as the activation function for the output layer.
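The forward pass described above can be sketched in NumPy as follows. This is an illustrative from-scratch version, not the Keras implementation; the names `forward`, `W1`, `b1`, `W2`, `b2` and the random placeholder inputs are assumptions, with one example per column of `X`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Hidden layer with tanh, output layer with sigmoid."""
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)    # first activation: tanh
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)    # second activation: sigmoid
    return A1, A2

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))  # placeholder inputs: 2 features, 5 examples
W1 = rng.normal(scale=0.01, size=(6, 2))
b1 = np.zeros((6, 1))
W2 = rng.normal(scale=0.01, size=(1, 6))
b2 = np.zeros((1, 1))

A1, A2 = forward(X, W1, b1, W2, b2)
print(A1.shape, A2.shape)
```

The output `A2` holds one value in (0, 1) per example, which can be read as the predicted probability of the positive class.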



## Compute the cost function

Compute the cost function to be minimized. We will use the cross-entropy cost, $-(y\log(p) + (1-y)\log(1-p))$, where $y$ is the true label and $p$ is the predicted probability.
More on the cross-entropy loss can be found [here](https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html).
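A small NumPy sketch of this cost, averaged over the examples, might look like the following. The function name `cross_entropy_cost` and the clipping constant `eps` (used only to avoid `log(0)`) are assumptions, and the label and probability arrays are placeholder values.

```python
import numpy as np

def cross_entropy_cost(p, y, eps=1e-12):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    p = np.clip(p, eps, 1 - eps)  # keep log() finite
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

y = np.array([1, 0, 1, 0])                # placeholder labels
p_uninformed = np.full(4, 0.5)            # always predicts 0.5
p_good = np.array([0.9, 0.1, 0.9, 0.1])   # confident and correct

cost_uninformed = cross_entropy_cost(p_uninformed, y)  # equals log(2)
cost_good = cross_entropy_cost(p_good, y)              # equals -log(0.9)
print(cost_uninformed, cost_good)
```

Predicting 0.5 everywhere gives a cost of $\log 2$, a useful baseline: a freshly initialized network should start close to that value.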


## Backward propagation

Compute the backward propagation step, in which we calculate the derivatives of the cost function with respect to the network parameters.
Print the cost every 1000 epochs.
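The sketch below puts the forward pass, the cross-entropy cost, and the backward pass together in a plain NumPy training loop that prints the cost every 1000 epochs. It is a from-scratch illustration, not the Keras workflow. The function name `train`, the learning rate, the epoch count, and the synthetic two-cluster toy data are all assumptions, with the toy data standing in for the Iris subset.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, n_h=6, lr=0.1, epochs=3000, seed=0):
    """Full-batch gradient descent. X is (n_x, m); Y is (1, m) with 0/1 labels."""
    rng = np.random.default_rng(seed)
    n_x, m = X.shape
    W1 = rng.normal(scale=0.01, size=(n_h, n_x))
    b1 = np.zeros((n_h, 1))
    W2 = rng.normal(scale=0.01, size=(1, n_h))
    b2 = np.zeros((1, 1))
    costs = []
    for epoch in range(epochs + 1):
        # Forward propagation: tanh hidden layer, sigmoid output.
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)

        # Cross-entropy cost (clipped to keep log() finite).
        P = np.clip(A2, 1e-12, 1 - 1e-12)
        cost = float(-np.mean(Y * np.log(P) + (1 - Y) * np.log(1 - P)))
        costs.append(cost)
        if epoch % 1000 == 0:
            print(f"epoch {epoch}: cost {cost:.4f}")

        # Backward propagation: derivatives of the cost.
        dZ2 = A2 - Y                        # sigmoid + cross-entropy
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)  # tanh derivative: 1 - tanh^2
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m

        # Gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return costs

# Synthetic toy data (NOT the Iris data): two Gaussian clusters.
rng = np.random.default_rng(1)
X0 = rng.normal(loc=-1.0, scale=0.5, size=(2, 50))
X1 = rng.normal(loc=1.0, scale=0.5, size=(2, 50))
X = np.hstack([X0, X1])
Y = np.hstack([np.zeros((1, 50)), np.ones((1, 50))])

costs = train(X, Y)
```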


## CATEGORY 2 - ACTIVATION FUNCTIONS
Now let's create some of the most frequently used activation functions and plot them. The sigmoid function is done for you; proceed along similar lines for the remaining functions. `helpers_04` may be downloaded from the webpage.

## Sigmoid (Logistic) Function

$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

$\sigma$ takes values in (0, 1). When the input $x$ is large and negative, $\sigma$ is close to 0. When $x$ is large and positive, $\sigma$ is close to 1. At $x=0$, $\sigma=0.5$.

In [0]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import itertools as it

import helpers_04

%matplotlib inline

In [0]:
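def show_me(xs, ys, ylim, cross):
    """Plot ys against xs and mark the point (0, cross) with a red dot."""
    plt.figure(figsize=(6, 4))

    # Grid plus highlighted x- and y-axes
    plt.grid(True, which='both')
    plt.axhline(y=0, color='y')
    plt.axvline(x=0, color='y')

    plt.plot(xs, ys)
    plt.plot(0, cross, 'ro')  # value of the function at x = 0

    plt.ylim(ylim)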

In [0]:
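xs = np.linspace(-10.0, 10.0, num=100)
sigmoid = tf.nn.sigmoid(xs)
# TensorFlow 1.x graph mode: evaluate the tensor inside a session.
# Under TensorFlow 2.x eager execution, sigmoid.numpy() would be used instead.
with tf.Session() as sess:
    ys = sess.run(sigmoid)
show_me(xs, ys, (-0.1, 1.15), .5)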

Pros:

* Easy derivative
* Behaves the way we think a neuron might: it is either off or outputting a value (up to a maximum)

Cons: 

* Not symmetric around 0 (the output is always positive), which causes issues when training
* Susceptible to vanishing gradients: when the input is large in magnitude (positive or negative), the function saturates and the derivative is close to zero.

##### Derivative

The derivative of the sigmoid is easy to calculate if you know the output:

$$
\frac{d\sigma}{dx} = \sigma \left(1 - \sigma \right)
$$
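As a quick sanity check of this formula, the NumPy sketch below compares $\sigma(1-\sigma)$ against a central finite-difference estimate. The helper names `sigmoid` and `sigmoid_grad` and the step size `h` are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative written in terms of the output: sigma * (1 - sigma)."""
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.linspace(-10.0, 10.0, num=100)
h = 1e-5
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2 * h)  # central difference

max_err = np.max(np.abs(numeric - sigmoid_grad(xs)))
print(max_err)
print(sigmoid_grad(0.0))    # largest slope, at x = 0
print(sigmoid_grad(10.0))   # nearly zero: the saturated region
```

The derivative peaks at 0.25 for $x=0$ and shrinks toward zero for large $|x|$, which is the vanishing-gradient behaviour listed under the cons.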


## Hyperbolic Tangent (Tanh)

$$
\tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{e^{2x} - 1}{e^{2x} + 1}
$$

Pros:
* Similar to sigmoid, but "stretched" so that its values lie in (-1, 1)
* Symmetric around 0, which helps for optimization

Cons:
* Still suffers from vanishing gradients

##### Derivative

$$
\frac{d\tanh(x)}{dx} = 1 - \tanh^{2}(x)
$$
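The following NumPy sketch checks two of the statements above: that tanh is a shifted and rescaled sigmoid, via the identity $\tanh(x) = 2\sigma(2x) - 1$, and that its derivative is $1 - \tanh^2(x)$. The helper names and the finite-difference step `h` are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - np.tanh(x) ** 2

xs = np.linspace(-10.0, 10.0, num=100)

# tanh as a "stretched" sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
stretched = 2.0 * sigmoid(2.0 * xs) - 1.0
same_curve = np.allclose(np.tanh(xs), stretched)
print(same_curve)

# Central finite-difference check of the derivative
h = 1e-5
numeric = (np.tanh(xs + h) - np.tanh(xs - h)) / (2 * h)
max_err = np.max(np.abs(numeric - tanh_grad(xs)))
print(max_err)
```

To plot the curve along the same lines as the sigmoid cell, `show_me(xs, np.tanh(xs), (-1.15, 1.15), 0)` would be one option; the y-limits there are a guess.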

## Rectified Linear Unit (ReLU)

$$
ReLU(x) = \max\left(0, x\right)
$$

Equivalent to:

$$
\begin{align*}
  ReLU(x) = \begin{cases}
    0 & \text{if $x\lt0$} \\
    x & \text{if $x\geq0$}
  \end{cases}
\end{align*}
$$

Pros:
* Incredibly easy to calculate output and derivative
* Doesn't suffer from vanishing gradient on positive side
* In practice, tends to be more useful than sigmoid/tanh as a general-purpose activation function

Cons:
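* Not symmetric
* Can cause exploding activations if not careful, because the output is unbounded on the positive side
* Gradient can "die" if not careful: a unit whose input stays negative outputs 0 and receives zero gradient, so it stops learning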

##### Derivative

$$
\begin{align*}
  \frac{dReLU}{dx} = \begin{cases}
    0 & \text{if $x\lt0$} \\
    1 & \text{if $x\geq0$}
  \end{cases}
\end{align*}
$$
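A minimal NumPy sketch of ReLU and the derivative convention used above (value 1 at $x = 0$) is shown below. The names `relu` and `relu_grad` and the sample inputs are assumptions for illustration.

```python
import numpy as np

def relu(x):
    """max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """0 for x < 0 and 1 for x >= 0, matching the piecewise definition."""
    return np.where(x >= 0, 1.0, 0.0)

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
ys = relu(xs)
grads = relu_grad(xs)
print(ys)
print(grads)
```

Every negative input maps to an output of 0 with gradient 0, which is how a unit can "die", while the gradient on the positive side stays at 1 however large the input becomes.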

## Leaky ReLU

$$
LReLU(x) = \max\left(\alpha x, x\right)
$$

Equivalent to:

$$
\begin{align*}
  LReLU(x) = \begin{cases}
    \alpha x & \text{if $x\lt0$} \\
    x & \text{if $x\geq0$}
  \end{cases}
\end{align*}
$$

Pros:
* Similar to ReLU, but doesn't "die".

Cons:
* Yet another hyper-parameter to tune.

##### Derivative

$$
\begin{align*}
  \frac{dLReLU}{dx} = \begin{cases}
    \alpha & \text{if $x\lt0$} \\
    1 & \text{if $x\geq0$}
  \end{cases}
\end{align*}
$$
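The same kind of sketch works for Leaky ReLU. The default `alpha=0.01` is an assumed value chosen for illustration, and the equivalence between the `max` form and the piecewise form relies on $0 \le \alpha \le 1$. The names `leaky_relu` and `leaky_relu_grad` are hypothetical.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """alpha * x for x < 0, x otherwise."""
    return np.where(x >= 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    """alpha for x < 0, 1 for x >= 0."""
    return np.where(x >= 0, 1.0, alpha)

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
ys = leaky_relu(xs)
grads = leaky_relu_grad(xs)
print(ys)
print(grads)

# For 0 <= alpha <= 1 the piecewise form equals max(alpha * x, x).
matches_max_form = np.allclose(ys, np.maximum(0.01 * xs, xs))
print(matches_max_form)
```

Unlike ReLU, the gradient for negative inputs is `alpha` rather than 0, so a unit keeps receiving a small learning signal.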
