# Activation function options for a single neuron
> Coding the activation functions commonly used in deep learning to regulate the output of the basic processing unit (the neuron).

- toc: true
- badges: true
- hide_binder_badge: true
- comments: true
- categories: [deeplearning, python]
- hide: false
- search_exclude: true
- author: Omer

In this post, I discuss how to implement the activation function of a neuron using simple Python code. A neuron is the basic processing unit of any deep learning architecture. It receives an input x and a bias b, multiplies each by a weight (w1 and w0), adds the results together, and outputs the value y = w1.x + w0.b. This value is then passed through an activation function to produce the activated output (a).
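The weighted sum described above can be sketched in a few lines. This is only an illustration: the function name `neuron_output`, the parameter names `w1` and `w0`, and the weight values are assumptions made for this example, not part of any library.

```python
def neuron_output(x, b, w1, w0, activation=None):
    """Weighted sum of input x and bias b, optionally passed through an activation."""
    y = w1 * x + w0 * b      # pre-activation value y
    if activation is None:
        return y             # raw output, no activation applied
    return activation(y)     # activated output a


# Arbitrary (hypothetical) weights chosen so that y comes out negative
y = neuron_output(x=2.0, b=1.0, w1=0.5, w0=-3.0)
a = neuron_output(x=2.0, b=1.0, w1=0.5, w0=-3.0, activation=lambda v: max(0.0, v))
print(y, a)
```

Passing the activation in as an argument keeps the weighted sum separate from the choice of activation, which is the split the rest of this post relies on.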

As the name implies, an activation function determines how the neuron's output propagates to the next stage (another neuron) by mapping y to a. There are several reasons to use this *activated output*:
- to keep it in a specific range [low, high]
- to keep it positive
- to avoid very large values

Over the years, researchers have come up with many activation functions. Here we discuss four of the most commonly used:
1. __Sigmoid__
2. __Tanh__
3. __ReLU__
4. __Softmax__

The first three functions are typically used in the neurons of intermediate (hidden) layers. __Softmax__ is usually employed in the output layer.

We mathematically define our activation function as

\begin{equation*}
a = \theta(y)
\end{equation*}

where $\theta(y)$ represents the chosen activation function and $a$ represents the activated output that will be fed to the next-stage neuron.

To begin, we import required modules.

In [21]:
#collapse-hide

# We can either use e or exp
from math import e    # value of e. e.g., e**y 
#from math import exp # e as function. e.g., exp(y)

import numpy as np

In [22]:
#hide
# Sanity check: both spellings of the exponential give the same value
from math import exp
1 + e**-10
1 + exp(-10)

1.0000453999297625

## Sigmoid
- small changes in input near zero lead to small, smooth changes in output (activation)
- large positive or negative inputs saturate the output near 1 or 0
- activated output range (0, 1)

\begin{equation*}
\theta(y) = \frac{1}{1+e^{-y}}
\end{equation*}

In [12]:
def sigmoid(y):
    return 1/(1+e**-y)

Trying out different values of $y$, we can see that the activated output is always positive and never goes beyond 1 (the upper limit).

In [13]:
#collapse-hide
print(f'Sigmoid with w1.x+ w0.b = y = 0.0001: {sigmoid(0.0001):.3f}')
print(f'Sigmoid with w1.x+ w0.b = y = 1000  : {sigmoid(1000):.3f}')
print(f'Sigmoid with w1.x+ w0.b = y = -10   : {sigmoid(-10):.3f}')
print(f'Sigmoid with w1.x+ w0.b = y = -100  : {sigmoid(-100):.3f}')
print(f'Sigmoid with w1.x+ w0.b = y = -2    : {sigmoid(-2):.3f}')

Sigmoid with w1.x+ w0.b = y = 0.0001: 0.500
Sigmoid with w1.x+ w0.b = y = 1000  : 1.000
Sigmoid with w1.x+ w0.b = y = -10   : 0.000
Sigmoid with w1.x+ w0.b = y = -100  : 0.000
Sigmoid with w1.x+ w0.b = y = -2    : 0.119
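One caveat with the direct formula: for a large negative input such as $y = -1000$, `e**-y` becomes `e**1000`, which Python cannot represent as a float and reports as an `OverflowError`. A minimal sketch of a variant that avoids this is shown below. The name `stable_sigmoid` is hypothetical, and branching on the sign of $y$ is one common workaround rather than the only one.

```python
from math import exp


def stable_sigmoid(y):
    """Sigmoid that never exponentiates a positive number, so it cannot overflow."""
    if y >= 0:
        return 1.0 / (1.0 + exp(-y))   # exp(-y) <= 1 in this branch
    z = exp(y)                         # y < 0, so 0 <= z < 1
    return z / (1.0 + z)               # algebraically equal to 1 / (1 + exp(-y))


print(f'{stable_sigmoid(-1000):.3f}')
print(f'{stable_sigmoid(0):.3f}')
print(f'{stable_sigmoid(1000):.3f}')
```

Both branches compute the same mathematical function. The second simply multiplies the numerator and denominator by $e^{y}$.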


## Tanh
- activated output range [-1, 1]

\begin{equation*}
\theta(y) = \frac{e^{y} - e^{-y}}{e^{y} + e^{-y}}
\end{equation*}

In [14]:
def tanh(y):
    return (e**y - e**-y)/(e**y + e**-y)

Here again, we can see that with a tanh activation function the activated output stays in the range [-1, 1].

In [15]:
#collapse-hide
print(f'Tanh with w1.x+ w0.b = y = 0.0001: {tanh(0.0001):.3f}')
print(f'Tanh with w1.x+ w0.b = y = 100   : {tanh(100):.3f}')
print(f'Tanh with w1.x+ w0.b = y = -10   : {tanh(-10):.3f}')
print(f'Tanh with w1.x+ w0.b = y = -100  : {tanh(-100):.3f}')
print(f'Tanh with w1.x+ w0.b = y = -2    : {tanh(-2):.3f}')

Tanh with w1.x+ w0.b = y = 0.0001: 0.000
Tanh with w1.x+ w0.b = y = 100   : 1.000
Tanh with w1.x+ w0.b = y = -10   : -1.000
Tanh with w1.x+ w0.b = y = -100  : -1.000
Tanh with w1.x+ w0.b = y = -2    : -0.964
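Tanh is closely related to the sigmoid: $\tanh(y) = 2\,\sigma(2y) - 1$, where $\sigma$ is the sigmoid function. The sketch below checks this identity numerically against the standard library's `math.tanh`. The helper names `sigmoid_exp` and `tanh_from_sigmoid` are hypothetical and exist only for this comparison.

```python
from math import exp, tanh as math_tanh


def sigmoid_exp(y):
    """Plain sigmoid written with math.exp."""
    return 1.0 / (1.0 + exp(-y))


def tanh_from_sigmoid(y):
    """tanh expressed as a rescaled, shifted sigmoid: 2*sigmoid(2y) - 1."""
    return 2.0 * sigmoid_exp(2.0 * y) - 1.0


# The two columns should agree to within floating-point precision
for y in (-2, 0, 0.5, 3):
    print(f'{y:>4}: {tanh_from_sigmoid(y):.6f}  {math_tanh(y):.6f}')
```

In practice `math.tanh` is also the safer choice for very large inputs, since it returns ±1.0 where the hand-written exponential formula would overflow.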


## ReLU
The rectified linear unit (ReLU) is one of the most commonly used activation functions in deep learning architectures (CNN, RNN, etc.). It is mathematically defined as shown below, with an activation range of $[0, \infty)$.

\begin{equation*}
\theta(y) = \max(0, y)
\end{equation*}


In [16]:
def relu(y):
    return max(0,y)

As we can see, this function sets the activated output to zero when $y$ is negative and passes $y$ through unchanged otherwise.

In [17]:
#collapse-hide
print(f'ReLU with w1.x+ w0.b = y = 0.0001: {relu(0.0001):.3f}')
print(f'ReLU with w1.x+ w0.b = y = 100   : {relu(100):.3f}')
print(f'ReLU with w1.x+ w0.b = y = -10   : {relu(-10):.3f}')
print(f'ReLU with w1.x+ w0.b = y = -100  : {relu(-100):.3f}')
print(f'ReLU with w1.x+ w0.b = y = -2    : {relu(-2):.3f}')

ReLU with w1.x+ w0.b = y = 0.0001: 0.000
ReLU with w1.x+ w0.b = y = 100   : 100.000
ReLU with w1.x+ w0.b = y = -10   : 0.000
ReLU with w1.x+ w0.b = y = -100  : 0.000
ReLU with w1.x+ w0.b = y = -2    : 0.000
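The scalar version above handles one neuron at a time. When the outputs of a whole layer are held in a NumPy array, the same rule can be applied element-wise with `np.maximum`. The sketch below assumes that setting, and the name `relu_layer` is hypothetical.

```python
import numpy as np


def relu_layer(y):
    """Element-wise ReLU over an array of pre-activation values."""
    return np.maximum(0, y)   # compares each element of y with 0


# Same five values as in the scalar example
y = np.array([0.0001, 100, -10, -100, -2])
print(relu_layer(y))
```

Note that Python's built-in `max(0, y)` does not work element-wise on arrays, which is why `np.maximum` is used here.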


## Softmax 

As mentioned before, __softmax__ is usually employed in the output layer. It converts the outputs of all the neurons in that layer into values between 0 and 1 that sum to 1, so they can be read as probabilities. As an example, if there are 3 neurons in the output layer, the neuron with the largest softmax value is taken as the categorical output in response to an input __X__ to our neural network.

Let us first define the activation function for a single neuron $i$ as 

\begin{equation*}
\theta(y_{i}) = e^{y_{i}} ~~~~~~~~~~~~~~~~~~~~~~~(1)
\end{equation*}

We then normalize the activated output of each neuron by the combined activation of all $M$ neurons.
 
\begin{equation*}
\theta(y_{i}) = \frac {e^{y_{i}}} {\sum_{j=1}^{M} e^{y_{j}}} ~~~~~~~~~~~~~~~~~~~~~~~(2)
\end{equation*}

for $i = 1, \dots, M$.

Afterwards, we simply select the neuron with the largest normalized activated output.


In [19]:
def softmax(y):
    each_neuron = [e**i for i in y]  # compute exp for each individual neuron (eq-1 above)
    total = sum(each_neuron)  # combined activation of all neurons, computed once
    return [j/total for j in each_neuron]  # normalizing each neuron output by total (eq-2 above)

Here we show an example with 3 neurons in the output layer.

In [20]:
#collapse-hide
print('Softmax with w1.x+ w0.b = y = [-1,1,5]  : ', softmax([-1,1,5]))
print('Softmax with w1.x+ w0.b = y = [0,2,1]   : ', softmax([0 ,2,1]))
print('Softmax with w1.x+ w0.b = y = [-10,1,5] : ', softmax([-10,1,5]))
print('Softmax with w1.x+ w0.b = y = [5,1,5]   : ', softmax([5,1,5]))
print('Softmax with w1.x+ w0.b = y = [3,5,0]   : ', softmax([3,5,0]),'\n\n')

print('Softmax with argmax to select the winning neuron in output')
print('Softmax with w1.x+ w0.b = y = [-1,1,5]  : ', np.argmax(softmax([-1,1,5])))
print('Softmax with w1.x+ w0.b = y = [0,2,1]   : ', np.argmax(softmax([0 ,2,1])))
print('Softmax with w1.x+ w0.b = y = [-10,1,5] : ', np.argmax(softmax([-10,1,5])))
print('Softmax with w1.x+ w0.b = y = [5,1,5]   : ', np.argmax(softmax([5,1,5])))
print('Softmax with w1.x+ w0.b = y = [3,5,0]   : ', np.argmax(softmax([3,5,0])))


Softmax with w1.x+ w0.b = y = [-1,1,5]  :  [0.002428258029591338, 0.017942534803329198, 0.9796292071670795]
Softmax with w1.x+ w0.b = y = [0,2,1]   :  [0.09003057317038048, 0.665240955774822, 0.2447284710547977]
Softmax with w1.x+ w0.b = y = [-10,1,5] :  [3.0040020689707774e-07, 0.01798620455903037, 0.9820134950407627]
Softmax with w1.x+ w0.b = y = [5,1,5]   :  [0.4954626425778431, 0.009074714844313748, 0.4954626425778431]
Softmax with w1.x+ w0.b = y = [3,5,0]   :  [0.11849965453500957, 0.8756005950630876, 0.0058997504019027815]


Softmax with argmax to select the winning neuron in output
Softmax with w1.x+ w0.b = y = [-1,1,5]  :  2
Softmax with w1.x+ w0.b = y = [0,2,1]   :  1
Softmax with w1.x+ w0.b = y = [-10,1,5] :  2
Softmax with w1.x+ w0.b = y = [5,1,5]   :  0
Softmax with w1.x+ w0.b = y = [3,5,0]   :  1
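
The softmax defined above exponentiates the raw values, so inputs around 1000 would overflow. Because softmax is unchanged when the same constant is subtracted from every input, a common workaround is to subtract the largest input first. The sketch below illustrates this. The name `stable_softmax` is hypothetical, and it is offered as one possible variant rather than a replacement for the definition above.

```python
from math import exp


def stable_softmax(y):
    """Softmax on shifted inputs, so the largest exponent is exp(0) = 1."""
    shift = max(y)                        # largest input
    exps = [exp(v - shift) for v in y]    # every exponent is <= 0, so no overflow
    total = sum(exps)                     # combined activation of all neurons
    return [v / total for v in exps]      # normalized outputs, summing to 1


# [999, 1001, 1005] is [-1, 1, 5] shifted by 1000, so both give the same result
probs = stable_softmax([999, 1001, 1005])
print(probs)
print(sum(probs))
```

Note that in the tie case `[5,1,5]` above, `np.argmax` returns index 0 because it reports the first occurrence of the maximum.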
