# Activation function options for a single neuron
> Coding the activation functions commonly used in deep learning to regulate the output of a basic processing unit (neuron).

- toc: true
- badges: true
- hide_binder_badge: true
- hide_colab_badge: true
- hide_github_badge: true
- comments: true
- categories: [deeplearning, Julia 1.5]
- hide: false
- search_exclude: true
- author: Omer

In this post, I am going to discuss how to implement the activation function of a neuron using simple Julia code. A neuron is the basic processing unit of any deep learning architecture. It receives two weighted inputs ($x$ and $b$), adds them together, and outputs the value $y$. This value is then passed through an activation function to produce the activated output $A$.
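
The weighted sum described above can be sketched in one line. This is only an illustration: the function name `neuron` is hypothetical, and the weight names `w1` and `w0` are taken from the labels used in the printed examples later in this post.

```julia
# Pre-activation value y: the weighted sum of the input x and the bias b.
# `neuron`, `w1` and `w0` are illustrative names, not part of any library.
neuron(x, b, w1, w0) = w1 * x + w0 * b

y = neuron(2.0, 1.0, 0.5, 0.25)   # 0.5*2.0 + 0.25*1.0
println("y = $y")                 # y = 1.25
```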

As the name implies, an activation function determines how the neuron's output propagates to the next stage (another neuron) by mapping $y$ to $A$. There are several reasons to use this *activated output*:
- to keep it in a specific range [low, high]
- to keep it positive
- to avoid excessively large values

Over the years, researchers have come up with many activation functions. Here we will discuss the most commonly used ones:
1. __Sigmoid__
2. __Tanh__
3. __ReLU__
4. __Softmax__

The first three functions are typically used in the neurons of intermediate (hidden) layers. __Softmax__ is usually employed in the output layer.

We mathematically define our activation function as 

\begin{equation*}
A = \theta(y)
\end{equation*} 

where
$\theta(y)$ represents the chosen activation function and
$A$ represents the activated output that will be fed to the next-stage neuron.

> Note: In Julia, we do not need to import any library to achieve the following functionality.

## Sigmoid
- small changes in input lead to small changes in output (activation)
- extreme inputs push the output (activation) toward its limits, 0 or 1
- activated output range [0, 1]
\begin{equation*}
\theta(y) = \frac{1}{1+e^{-y}}
\end{equation*}

A single line function in Julia can be written as follows. 

In [69]:
sigmoid(y) = round( 1/(1+exp(-y)), digits=3)

sigmoid (generic function with 1 method)

> Important: The `round` function is used only to limit the result to 3 decimal places. It is not needed for the sigmoid itself and is merely a display convenience.

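Trying out different values of $y$, we can see that the activated output is never negative and never goes beyond 1 (the upper limit). The results printed as 0.0 for $y = -10$ and $y = -100$ are tiny positive numbers that the rounding hides.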
    

In [70]:
println("Sigmoid with w1.x+ w0.b = y = 0.0001: $(sigmoid(0.0001))") 
println("Sigmoid with w1.x+ w0.b = y = 1000  : $(sigmoid(1000))")
println("Sigmoid with w1.x+ w0.b = y = -10   : $(sigmoid(-10))")
println("Sigmoid with w1.x+ w0.b = y = -100  : $(sigmoid(-100))")
println("Sigmoid with w1.x+ w0.b = y = -2    : $(sigmoid(-2))")

Sigmoid with w1.x+ w0.b = y = 0.0001: 0.5
Sigmoid with w1.x+ w0.b = y = 1000  : 1.0
Sigmoid with w1.x+ w0.b = y = -10   : 0.0
Sigmoid with w1.x+ w0.b = y = -100  : 0.0
Sigmoid with w1.x+ w0.b = y = -2    : 0.119
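
The rounded results above reach 0.0 and 1.0 at the extremes. As a minimal sketch, assuming an unrounded variant of the function (the name `sigmoid_raw` is hypothetical and not part of the original code), the following applies the sigmoid element-wise to a vector and checks that the values for moderate inputs stay strictly between 0 and 1.

```julia
# Sigmoid without rounding; `sigmoid_raw` is an illustrative name.
sigmoid_raw(y) = 1 / (1 + exp(-y))

ys = [-10.0, -2.0, 0.0, 2.0, 10.0]
activated = sigmoid_raw.(ys)   # the dot applies the function to each element

println(activated)
println(all(a -> 0 < a < 1, activated))   # true
```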


> Important: The `$(...)` syntax interpolates the value of an expression into a string. It is a convenient way to use variables and function calls inside the string arguments of the `println` function.

## Tanh
- activated output range [-1, 1]
\begin{equation*}
\theta(y) = \frac{e^{y} - e^{-y}}{e^{y} + e^{-y}}
\end{equation*}

In [71]:
tanh(y) = round( (exp(y) - exp(-y))/(exp(y) + exp(-y)), digits=3)

tanh (generic function with 1 method)

Here again, we can see that with a tanh activation function, the activated output stays in the range [-1, 1].

In [72]:
#collapse-hide
println("tanh with w1.x+ w0.b = y = 0.0001: $(tanh(0.0001))")
println("tanh with w1.x+ w0.b = y = 100   : $(tanh(100))")
println("tanh with w1.x+ w0.b = y = -10   : $(tanh(-10))")
println("tanh with w1.x+ w0.b = y = -100  : $(tanh(-100))")
println("tanh with w1.x+ w0.b = y = -2    : $(tanh(-2))")

tanh with w1.x+ w0.b = y = 0.0001: 0.0
tanh with w1.x+ w0.b = y = 100   : 1.0
tanh with w1.x+ w0.b = y = -10   : -1.0
tanh with w1.x+ w0.b = y = -100  : -1.0
tanh with w1.x+ w0.b = y = -2    : -0.964
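
One caveat with translating the formula directly: for very large $|y|$, `exp` overflows to `Inf` and the ratio becomes `NaN`. The sketch below contrasts the direct formula with an algebraically equivalent form that only evaluates `exp` on a non-positive argument. The names `tanh_naive` and `tanh_stable` are hypothetical; Julia's built-in `Base.tanh` is used only as a reference for comparison.

```julia
# Direct translation of the formula; overflows for large |y|.
tanh_naive(y) = (exp(y) - exp(-y)) / (exp(y) + exp(-y))

# Equivalent form: tanh(y) = sign(y) * (1 - e^(-2|y|)) / (1 + e^(-2|y|)).
# exp is only called on a non-positive number, so it cannot overflow.
function tanh_stable(y)
    t = exp(-2 * abs(y))
    return sign(y) * (1 - t) / (1 + t)
end

println(tanh_naive(1000.0))    # NaN, because Inf / Inf is undefined
println(tanh_stable(1000.0))   # 1.0
println(isapprox(tanh_stable(-2.0), Base.tanh(-2.0)))   # true
```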


## ReLU
The rectified linear unit (ReLU) is one of the most commonly used activation functions in deep learning architectures (CNN, RNN, etc.). It is mathematically defined as shown below, with an activation range of $[0, \infty)$.

\begin{equation*}
\theta(y) = \max(0, y)
\end{equation*}


In [73]:
relu(y) = round(max(0, y), digits=1)

relu (generic function with 1 method)

As we can see, this function simply sets the activated output to zero when $y$ is negative and passes $y$ through unchanged otherwise.

In [74]:
#collapse-hide
println("Relu with w1.x+ w0.b = y = 0.0001: $(relu(0.0001))")
println("Relu with w1.x+ w0.b = y = 100   : $(relu(100))")
println("Relu with w1.x+ w0.b = y = -10   : $(relu(-10))")
println("Relu with w1.x+ w0.b = y = -100  : $(relu(-100))")
println("Relu with w1.x+ w0.b = y = -2    : $(relu(-2))")

Relu with w1.x+ w0.b = y = 0.0001: 0.0
Relu with w1.x+ w0.b = y = 100   : 100.0
Relu with w1.x+ w0.b = y = -10   : 0.0
Relu with w1.x+ w0.b = y = -100  : 0.0
Relu with w1.x+ w0.b = y = -2    : 0.0
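
Because the rounding to one decimal place hides small positive inputs, it can help to look at ReLU without it. The following is a minimal sketch under that assumption; the name `relu_raw` is hypothetical, and `zero(y)` is used so that the result has the same type as the input.

```julia
# ReLU without rounding; `relu_raw` is an illustrative name.
# zero(y) returns the zero of y's own type (0 for Int, 0.0 for Float64).
relu_raw(y) = max(zero(y), y)

ys = [-2.0, -0.5, 0.0, 0.5, 100.0]
println(relu_raw.(ys))   # [0.0, 0.0, 0.0, 0.5, 100.0]
```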


## Softmax 

As mentioned before, __softmax__ is usually employed in the output layer. As an example, if there are 3 neurons in the output layer, softmax converts their outputs into normalized values that sum to 1, which makes it easy to see which of the three neurons has the highest activated output. This is usually done to decide the categorical output in response to an input __X__ to our neural network.

Let us first define the activation function for a single neuron $i$ as 

\begin{equation*}
\theta(y_{i}) = e^{y_{i}} \tag{1}
\end{equation*}

We then normalize the activated output of each neuron by the combined activation of all $M$ neurons.
 
\begin{equation*}
\theta(y_{i}) = \frac{e^{y_{i}}}{\sum_{j=1}^{M} e^{y_{j}}} \tag{2}
\end{equation*}

for $i = 1, \dots, M$.

Afterwards, we simply select the neuron with the largest normalized activated output.


In [75]:
function softmax(y, win=false)
    each_neuron = Float64[] # declare an empty array of floats
    
    # compute exp for each individual neuron (eq. 1 above)
    for i in y
        push!(each_neuron, exp(i)) 
    end
    
    # normalize each neuron's output by the total (eq. 2 above) 
    total = sum(each_neuron)
    for j in eachindex(each_neuron)
        each_neuron[j] = each_neuron[j] / total 
    end
    
    if win # return only the index of the winning neuron
        _, id = findmax(round.(each_neuron, digits=3))
        return id
    else
        return round.(each_neuron, digits=3) # the dot applies round element-wise to the array
    end
end


softmax (generic function with 2 methods)

Here we show an example with 3 neurons in the output layer.

In [76]:
#collapse-hide
println("Softmax with w1.x+ w0.b = y = [-1,1,5] : $(softmax([-1,1,5]))")
println("Softmax with w1.x+ w0.b = y = [0,2,1]  : $(softmax([0,2,1] ))")
println("Softmax with w1.x+ w0.b = y = [-10,1,5]: $(softmax([-10,1,5]))")
println("Softmax with w1.x+ w0.b = y = [5,1,5]  : $(softmax([5,1,5]))")
println("Softmax with w1.x+ w0.b = y = [3,5,0]  : $(softmax([3,5,0]))")

println("\n\n Softmax with argmax to select the winning neuron in output \n")

println("Softmax with w1.x+ w0.b = y = [-1,1,5] : $(softmax([-1,1,5],true))")
println("Softmax with w1.x+ w0.b = y = [0,2,1]  : $(softmax([0,2,1],true ))")
println("Softmax with w1.x+ w0.b = y = [-10,1,5]: $(softmax([-10,1,5],true))")
println("Softmax with w1.x+ w0.b = y = [5,1,5]  : $(softmax([5,1,5],true))")
println("Softmax with w1.x+ w0.b = y = [3,5,0]  : $(softmax([3,5,0],true))")

Softmax with w1.x+ w0.b = y = [-1,1,5] : [0.002, 0.018, 0.98]
Softmax with w1.x+ w0.b = y = [0,2,1]  : [0.09, 0.665, 0.245]
Softmax with w1.x+ w0.b = y = [-10,1,5]: [0.0, 0.018, 0.982]
Softmax with w1.x+ w0.b = y = [5,1,5]  : [0.495, 0.009, 0.495]
Softmax with w1.x+ w0.b = y = [3,5,0]  : [0.118, 0.876, 0.006]


 Softmax with argmax to select the winning neuron in output 

Softmax with w1.x+ w0.b = y = [-1,1,5] : 3
Softmax with w1.x+ w0.b = y = [0,2,1]  : 2
Softmax with w1.x+ w0.b = y = [-10,1,5]: 3
Softmax with w1.x+ w0.b = y = [5,1,5]  : 1
Softmax with w1.x+ w0.b = y = [3,5,0]  : 2
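
As an aside, the same normalization can be sketched more compactly with broadcasting. The version below also subtracts the largest input before exponentiating, a common trick that leaves the result of eq. 2 unchanged but keeps `exp` from overflowing for large inputs. This is a swapped-in variant rather than the function defined above, and the name `softmax_stable` is hypothetical.

```julia
# Softmax over a vector of pre-activations y (eq. 2), using broadcasting.
# Subtracting maximum(y) cancels out in the ratio but prevents overflow in exp.
function softmax_stable(y)
    shifted = exp.(y .- maximum(y))
    return shifted ./ sum(shifted)
end

p = softmax_stable([3.0, 5.0, 0.0])
println(round.(p, digits=3))                # [0.118, 0.876, 0.006]
println(argmax(p))                          # 2, the index of the winning neuron
println(softmax_stable([1000.0, 1000.0]))   # [0.5, 0.5]
```

The winning neuron is picked with `argmax` on the unrounded values, so ties are only reported when the activations are exactly equal.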
