
## Activation functions:
- Tanh
- Sigmoid
- Softmax
- ReLU
- Leaky ReLU
- Custom activation (e.g. x<sup>2</sup>)

### Why do we need activation functions?
The operations performed by Fully Connected and Convolution layers are linear (strictly speaking, affine) functions.
Stacking two or more linear layers on top of each other is pointless, because the same result can be obtained with a single linear layer:

---
f<sub>1</sub>(x) = a<sub>1</sub>x + b<sub>1</sub>

f<sub>2</sub>(x) = a<sub>2</sub>x + b<sub>2</sub>


---
f<sub>1</sub>(f<sub>2</sub>(x)) = a<sub>1</sub>(a<sub>2</sub>x+b<sub>2</sub>) + b<sub>1</sub>

f<sub>1</sub>(f<sub>2</sub>(x)) = (a<sub>1</sub>a<sub>2</sub>)x + (a<sub>1</sub>b<sub>2</sub> + b<sub>1</sub>)

---

f<sub>1</sub>(f<sub>2</sub>(x)) = a<sub>3</sub>x + b<sub>3</sub>

a<sub>3</sub> = a<sub>1</sub>a<sub>2</sub>,

b<sub>3</sub> = a<sub>1</sub>b<sub>2</sub> + b<sub>1</sub>

---
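The same collapse happens with weight matrices, which is what a fully connected layer applies to its input. Below is a minimal NumPy sketch in which random matrices `A1`, `A2` and bias vectors `b1`, `b2` (illustrative names mirroring the scalar derivation, not part of the notebook) stand in for two dense layers with no activation between them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Matrix versions of f2 (applied first) and f1 (applied second): f(v) = A v + b
A2, b2 = rng.normal(size=(4, 3)), rng.normal(size=4)   # f2: R^3 -> R^4
A1, b1 = rng.normal(size=(2, 4)), rng.normal(size=2)   # f1: R^4 -> R^2

def f1_of_f2(v):
    # Two stacked layers with no activation in between
    return A1 @ (A2 @ v + b2) + b1

# Collapsed single layer, mirroring a3 = a1*a2 and b3 = a1*b2 + b1
A3 = A1 @ A2
b3 = A1 @ b2 + b1

def f3(v):
    return A3 @ v + b3

v = rng.normal(size=3)
print(np.allclose(f1_of_f2(v), f3(v)))  # True
```

However many such layers are stacked, the result is still a single affine map `A3 v + b3`.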




To take advantage of multiple layers, we need to introduce non-linearity into the network:

- tanh:
$$\tanh(x) = \frac{e^x-e^{-x}}{e^x+e^{-x}}$$

- sigmoid:
$$\sigma(x) = \frac{1}{1+e^{-x}}$$

- Rectified Linear Unit (ReLU):
$$r(x) = \max(0, x)$$

- Leaky Rectified Linear Unit (LeakyReLU), with a fixed slope $a$ (typically $0 < a < 1$) for negative inputs:
$$lr(x) = \begin{cases} ax & \text{if } x < 0\\ x & \text{if } x \geq 0 \end{cases}$$

- softmax, which acts on a whole vector $x = (x_1, \dots, x_n)$ rather than on a single number:
$$s(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$$
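For reference, the formulas above translate directly into NumPy. This is a sketch independent of Keras. Two choices here are assumptions: the default leaky ReLU slope `a=0.01` (the notebook later uses 0.5), and subtracting `max(x)` inside softmax, a common trick for numerical stability that leaves the result unchanged because the factor $e^{-\max(x)}$ cancels between numerator and denominator. The identity $\tanh(x) = 2\sigma(2x) - 1$ is also checked, which explains why the two curves have the same S shape.

```python
import numpy as np

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    # a * x for negative inputs, x otherwise
    return np.where(x < 0, a * x, x)

def softmax(x):
    # Subtracting the max does not change the result but avoids overflow in exp
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.arange(-3, 3.5, 0.5)
print(np.allclose(tanh(x), np.tanh(x)))               # True
print(np.allclose(tanh(x), 2 * sigmoid(2 * x) - 1))   # True: tanh is a rescaled sigmoid
print(relu(np.array([-2.0, 0.0, 2.0])))               # [0. 0. 2.]
print(np.isclose(softmax(x).sum(), 1.0))              # True
```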


```python
import keras
from keras.layers import Activation, LeakyReLU
from keras.models import Sequential
import keras.backend as K

import matplotlib.pyplot as plt
import numpy as np
```

First, we define a set of evenly spaced points on the line y = x.

```python
# 13 evenly spaced points from -3 to 3 (the stop value 3.5 is excluded)
x = np.arange(-3, 3.5, 0.5)
plt.plot(x, x, '-')
plt.show()
```

Next, we build a Keras model with a single layer: the activation layer. In this example, the activation function is *tanh*.

We pass the points through the model, then plot the original points and the model outputs together to see how the activation transforms them.

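```python
K.clear_session()
# A model made of a single tanh activation layer; each sample is one number
model = Sequential([Activation('tanh', input_shape=(1,))])
# Reshape to (13, 1): 13 samples with one feature each
y_act = model.predict(x.reshape(-1, 1))

plt.plot(x, x, '-')
plt.plot(x, y_act, '--')
plt.show()
```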

Now let's repeat this for several activation functions and collect the outputs so we can plot them together.

```python
activations = ['tanh', 'sigmoid', 'relu']
y_acts = []
for activation in activations:
  K.clear_session()
  # Same single-layer model, with the activation selected by name
  model = Sequential([Activation(activation, input_shape=(1,))])
  y_act = model.predict(x.reshape(-1, 1))
  y_acts.append(y_act)
```

Leaky ReLU takes an extra parameter, the negative slope $a$, so instead of passing it by name to `Activation` we treat it separately with the dedicated `LeakyReLU` layer.

```python
K.clear_session()
# Negative slope a = 0.5: large, so the effect is easy to see in the plot
model = Sequential([LeakyReLU(0.5, input_shape=(1,))])
y_act = model.predict(x.reshape(-1, 1))
y_acts.append(y_act)
activations.append('leakyRelu')
```

Softmax normalizes across a whole vector, so if we pass each number by itself it always outputs 1, since $e^{x_i} / e^{x_i} = 1$.

Instead, we pass all the numbers together as a single sample and get the corresponding probabilities: each output lies in (0, 1) and the outputs sum to 1.
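Both claims can be checked without Keras. The sketch below redefines a standalone `softmax` (a helper written for this illustration, with the usual max-subtraction for stability) so that the snippet runs on its own.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

x = np.arange(-3, 3.5, 0.5)

# One number at a time: each "vector" has a single entry, so the output is always 1
print([float(softmax(np.array([xi]))[0]) for xi in x[:3]])  # [1.0, 1.0, 1.0]

# All 13 numbers together: a probability distribution over the points
p = softmax(x)
print(p.min() > 0, p.max() < 1, np.isclose(p.sum(), 1.0))  # True True True
print(np.all(np.diff(p) > 0))  # True: larger inputs get larger probabilities
```

Because the 13 probabilities must share a total of 1, the softmax curve in the combined plot stays close to zero compared with the other activations.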

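```python
K.clear_session()
# The input is a single sample containing all len(x) = 13 numbers
model = Sequential([Activation('softmax', input_shape=(len(x),))])
x_sm = np.expand_dims(x, 0)  # shape (1, 13)
y_act = model.predict(x_sm)
y_acts.append(y_act[0])  # take the single sample back out: shape (13,)
activations.append('softmax')
```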

```python
fig, ax = plt.subplots()
ax.plot(x, x, 'o', label='original data')

# One dashed curve per activation function
for y_act, activation in zip(y_acts, activations):
  ax.plot(x, y_act, '--', label=activation)

ax.legend(loc='lower right', fontsize='large')
plt.show()
```

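We can also define a custom activation as an ordinary Python function. Keras applies it to its own tensors, so the body should use operations those tensors support, such as arithmetic operators or functions from the Keras backend.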

```python
def sq_activation(x):
  # Element-wise square; ** works directly on Keras tensors
  return x**2

K.clear_session()
model = Sequential([Activation(sq_activation, input_shape=(1,))])
y_act = model.predict(x.reshape(-1, 1))
```

```python
fig, ax = plt.subplots()
ax.plot(x, x, '-', label='original data')
ax.plot(x, x**2, 'o', label='x^2 (NumPy)')
ax.plot(x, y_act, '--', label='sq_activation (Keras)')

ax.legend(loc='lower right', fontsize='large')
plt.show()
```