# Activation Functions

Activation functions map the value a neuron computes to the output it passes on, and so shape the predictions a Neural Network makes from the input it receives. An __activation function__ works by **"turning on"** or **"turning off"** a specific neuron in the network (or, depending on the type of activation function, by expressing how strongly it is turned on). The neuron's output is calculated as follows:
 <pre>
 
 </pre>
$output = \text{ActivationFunction}(input \cdot weight + bias)$

<pre>

</pre>
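
As a rough illustration of the formula above, here is a minimal sketch for a single neuron. The input, weight, and bias values are made up for the example, and `on_off` is a hypothetical stand-in for the simplest kind of activation (the step function described further down).

```python
import numpy as np

# Hypothetical values for a single neuron with three inputs (chosen only for illustration)
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.8, -0.5])
bias = 2.0

def on_off(z):
    # Simplest possible activation: "on" (1.0) for positive values, "off" (0.0) otherwise
    return 1.0 if z > 0 else 0.0

# input . weight + bias
z = np.dot(inputs, weights) + bias

# output = ActivationFunction(input . weight + bias)
output = on_off(z)

print(z, output)
```

Here the weighted sum is about 2.3, which is positive, so this neuron ends up "on".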




Here's a GIF that demonstrates how it works:

<pre></pre>

<img src="Images/NN_Gif.gif" width= 400 height=400 />

<pre></pre>
* The neurons in green are the ones that were activated for that specific input
* The neurons that aren't colored green are deactivated for that specific input

__The image below illustrates the structure of a NN with an activation function:__

<pre>

</pre>

<img src="Images/Activation_Function_model.JPEG" />

_Let's look at some types of activation functions._

## The Step Activation Function

This was one of the first activation functions used in Neural Networks, but it is no longer used in practice because it does not help the NN learn. The reason is that it cannot capture how close an input was to activating a neuron: whether the value is $-1$ or $-300$, the output is the same, so the model gets no signal about how near the neuron was to turning on.

_Here's a picture of how it works:_

<img src="Images/Step_function.PNG" width = 600 height = 600 />

*  ___In summary, if the value computed by the neuron is less than or equal to zero, it outputs $0$; otherwise it outputs $1$___

## The Linear Activation Function

This activation function is mostly used in the output layer of neural-network regression models. It's simply the function $y=x$.

_Here's a picture of how it works:_
<pre>

</pre>

<img src="Images/Linear_Function.PNG" width=600 height=600/>
<pre>

</pre>

* ___In summary, it outputs the neuron's result unchanged___

## The Sigmoid Activation Function

This function tends to $0$ as its input goes to negative infinity, equals $0.5$ when its input is $0$, and tends to $1$ as its input goes to positive infinity. It is well suited to a NN's learning, since it overcomes the step function's failure to capture how close a neuron was to activating. It also makes it possible to deal with non-linear data. It has one disadvantage: when a neuron's weights push its input far from zero, the sigmoid saturates, so the output sits almost exactly at $0$ or $1$, the gradient becomes tiny, and the neuron barely learns (it behaves much like a __dead neuron__ that keeps outputting the same value). Over time it was largely replaced by the **ReLU** activation function, which is cheaper to compute and also handles non-linear data.

_Here's an example of how it works:_
<pre>

</pre>


<img src="Images/Sigmoid_Function.PNG" width=600 height=600/>
<pre>

</pre>

* ___In summary, it outputs values between $0$ and $1$; when the neuron's output is $0$, the result is $0.5$___
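
To make the "how close was the neuron to activating" point concrete, here is a small sketch that compares the two functions on the same values. The helper names `step` and `sigmoid` and the sample values are assumptions made for this example only.

```python
import numpy as np

def step(x):
    # 1.0 where the value is positive, 0.0 otherwise
    return (np.asarray(x) > 0).astype(float)

def sigmoid(x):
    # Smooth curve between 0 and 1
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

values = np.array([-300.0, -1.0, 0.0, 1.0, 300.0])

print(step(values))     # -300 and -1 give exactly the same output
print(sigmoid(values))  # -300 and -1 give different outputs; 0 gives 0.5
```

The step function cannot tell $-1$ from $-300$, while the sigmoid output for $-1$ is clearly larger than the one for $-300$.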

## The Rectified Linear Unit Activation Function (ReLU)

This activation function is one of the most widely used today, since it can deal with non-linear data, supports the NN's learning process, and is cheaper to compute than the **Sigmoid Activation Function**. It is quite similar to the **Step Activation Function**: instead of outputting $1$ when the neuron's input is greater than $0$, it outputs the input itself.

_Here's an example of how it works:_
<pre>

</pre>


<img src="Images/ReLu_function.PNG" width=600 height=600/>
<pre>

</pre>

* ___In summary, when the neuron's output is less than or equal to $0$, it deactivates (outputs $0$); otherwise it passes the neuron's output through unchanged___

## The Softmax Activation Function

This activation function is really important for classification models, since it takes non-normalized data and outputs a normalized probability distribution. The math behind it is a little more complicated, but in code it's no big deal, and with the NumPy library its calculation can be written in a fast, vectorized way.

_Here's an example of how it works:_

<pre>

</pre>

<img src="Images/softmax.PNG" />

<pre>

</pre>

It's a bit confusing, isn't it?

Breaking the function down into small steps makes it much easier to understand.

In general, the function takes the result of each neuron in a layer and uses it as the exponent of a power with base __e (Euler's number)__. It then sums all of these exponentiated values and divides each one by that sum.

__It'll get even clearer in the code section.__
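
The steps described above can also be written as a formula. The symbols are notation assumed here for illustration: $z_i$ is the result of the $i$-th neuron and $n$ is the number of neurons in the layer.

$$
\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}
$$

A worked example for $z = (3, -1, 0)$, with values rounded to four decimal places:

$$
e^{3} \approx 20.0855, \quad e^{-1} \approx 0.3679, \quad e^{0} = 1, \quad \text{sum} \approx 21.4534
$$

$$
\text{softmax}(z) \approx \left( \frac{20.0855}{21.4534},\ \frac{0.3679}{21.4534},\ \frac{1}{21.4534} \right) \approx (0.9362,\ 0.0171,\ 0.0466)
$$

The three results are all positive and add up to $1$, which is what lets them be read as probabilities.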


So now, let's get to the fun part: **Code Time**

In [5]:
# Let's take a more practical approach and code the Activation Functions

# But first, let's set up the background needed to make use of the activation functions.


# Importing the libraries
import numpy as np

# Creating the class
class Dense_Layer():
    
    # Function responsible for initializing all of the important attributes of the class
    # n_input -> number of features each input sample has
    # n_neurons -> number of neurons in this layer
    def __init__(self,n_input,n_neurons):
        # The biases are an array of shape (1, n_neurons): one bias per neuron in the layer
        self.biases = np.ones((1,n_neurons))
        
        # The weights are a 2D array. The usual rule for the shape is (n_neurons, number of neurons of the previous layer),
        # but we store the transposed shape (n_input, n_neurons) so that np.dot(inputs, weights) works directly.
        # They get multiplied by 0.01 so that the weights initially don't have a great impact on the NN
        self.weights = 0.01*np.random.randn(n_input,n_neurons)
        
    # Function responsible for the calculations and the layer output
    def forward(self,inputs):
        outputs = np.dot(inputs,self.weights) + self.biases
        
        return outputs
        

# Batch of inputs
inputs = [[27.0,3.0,12.0,4.0],[16.0,97.0,82.0,7.0],[21.0,-1.2,0.8,0.0]]

#n_input -> number of features of the input
n_input = 4

#n_neurons -> number of neurons for that layer
n_neurons = 4

# Initialize the Dense Layer
layer_1 = Dense_Layer(n_input,n_neurons)

# Perform the forward pass ( calculation that gives us the output array )
output_1 = layer_1.forward(inputs)
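
To see how the shapes fit together in the forward pass when the number of neurons differs from the number of features, here is a standalone sketch. The sizes (3 samples, 4 features, 2 neurons) are assumptions chosen for the example, and the weights are random.

```python
import numpy as np

# Batch of 3 samples with 4 features each
batch = np.array([[27.0, 3.0, 12.0, 4.0],
                  [16.0, 97.0, 82.0, 7.0],
                  [21.0, -1.2, 0.8, 0.0]])

# Assumed layer sizes for illustration
n_input, n_neurons = 4, 2

weights = 0.01 * np.random.randn(n_input, n_neurons)  # shape (4, 2)
biases = np.ones((1, n_neurons))                       # shape (1, 2), broadcast over the batch

# (3, 4) dot (4, 2) -> (3, 2): one row per sample, one column per neuron
outputs = np.dot(batch, weights) + biases

print(outputs.shape)  # (3, 2)
```

The inner dimensions (4 and 4) have to match, which is why the weights are stored as `(n_input, n_neurons)`.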


In [41]:
# Now, it's activation time! Let's jump right in.

# The Step Activation Function

class Step_Activation():
        
    def forward(self,inputs):
        self.output = np.zeros((np.array(inputs).shape[0],np.array(inputs).shape[1]))
        
        for i,input_ in enumerate(inputs):
            for e,input__ in enumerate(input_):
                if input__ > 0:
                    self.output[i][e] = 1
        return self.output
        

Activation_1 = Step_Activation()
output = Activation_1.forward([[-1,-1,-1],[2,2,2],[3,-1,0]])
print(output)

[[0. 0. 0.]
 [1. 1. 1.]
 [1. 0. 0.]]


In [42]:
# The Linear Activation Function 

class Linear_Activation():
    
    def forward(self,inputs):
        
        return np.array(inputs)
    
Activation_1 = Linear_Activation()
output = Activation_1.forward([[-1,-1,-1],[2,2,2],[3,-1,0]])
print(output)

[[-1 -1 -1]
 [ 2  2  2]
 [ 3 -1  0]]


In [43]:
# The Sigmoid Activation Function

class Sigmoid_Activation():
    
    def forward(self,inputs):
        # Convert to a NumPy array so the formula applies element-wise
        inputs = np.array(inputs)
        
        self.output = 1/(1+np.exp(-inputs))
        
        return self.output

Activation_1 = Sigmoid_Activation()
output = Activation_1.forward([[-1,-1,-1],[2,2,2],[3,-1,0]])
print(output)

[[0.26894142 0.26894142 0.26894142]
 [0.88079708 0.88079708 0.88079708]
 [0.95257413 0.26894142 0.5       ]]


In [44]:
# The ReLU Activation Function

class ReLU_Activation():
    
    def forward(self,inputs):
        self.output = np.zeros((np.array(inputs).shape[0],np.array(inputs).shape[1]))
        inputs = np.array(inputs)
        
        for i,input_ in enumerate(inputs):
            for e,input__ in enumerate(input_):
                if input__ > 0:
                    self.output[i][e] = input__
        
        
        return self.output

Activation_1 = ReLU_Activation()
output = Activation_1.forward([[-1,-1,-1],[2,2,2],[3,-1,0]])
print(output)

[[0. 0. 0.]
 [2. 2. 2.]
 [3. 0. 0.]]


In [49]:
# There's a better way to do this in NumPy:

# The ReLU Activation Function 

class ReLU_Activation():
    
    def forward(self,inputs):
        inputs = np.array(inputs)
        
        self.output = np.maximum(0,inputs)
        
        return self.output

Activation_1 = ReLU_Activation()
output = Activation_1.forward([[-1,-1,-1],[2,2,2],[3,-1,0]])
print(output)

[[0 0 0]
 [2 2 2]
 [3 0 0]]
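
The same vectorized idea can be applied to the step function, replacing its nested loops with a single comparison. This is a sketch rather than part of the original notebook code, and the class name `Step_Activation_Vectorized` is hypothetical.

```python
import numpy as np

class Step_Activation_Vectorized():  # hypothetical name, for illustration only

    def forward(self, inputs):
        inputs = np.array(inputs)

        # The comparison gives True/False per element; casting to float gives 1.0/0.0
        self.output = (inputs > 0).astype(float)

        return self.output

activation = Step_Activation_Vectorized()
output = activation.forward([[-1,-1,-1],[2,2,2],[3,-1,0]])
print(output)
```

For this input it produces the same result as the loop-based `Step_Activation` above.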


In [61]:
# The Softmax Activation Function

class Softmax_Activation():
    
    def forward(self,inputs):
        inputs = np.array(inputs)
        
        exp_list = np.exp(inputs)
        
        probabilities = exp_list/np.sum(exp_list, axis=1, keepdims=True)
        
        return probabilities

Activation_1 = Softmax_Activation()
output = Activation_1.forward([[-1,-1,-1],[2,2,2],[3,-1,0]])
print(output)

[[0.33333333 0.33333333 0.33333333]
 [0.33333333 0.33333333 0.33333333]
 [0.93623955 0.01714783 0.04661262]]


There's just one small problem left to solve. The Softmax Activation takes non-normalized values as its input, and these can sometimes be large. Because Softmax relies on the exponential function, large inputs can cause an overflow. Let me show it in code:

In [62]:
value_1 = 10
value_2 = 100
value_3 = 1000
value_4 = 10000

print(np.exp(value_1))

22026.465794806718


In [63]:
print(np.exp(value_2))

2.6881171418161356e+43


In [64]:
print(np.exp(value_3))

inf




In [65]:
print(np.exp(value_4))

inf




__As we can see, we need to find a way to overcome this issue.__

We can do that by subtracting the highest value in each row from every value in that row before exponentiating. Since we're dealing with the exponential function and we normalize the values at the end, this shift doesn't change the result.
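
A short derivation of why the subtraction is safe. The notation is assumed here for illustration: $z_i$ is the $i$-th input value in a row and $m$ is the highest value in that row.

$$
\frac{e^{z_i - m}}{\sum_{j} e^{z_j - m}} = \frac{e^{z_i}\, e^{-m}}{e^{-m} \sum_{j} e^{z_j}} = \frac{e^{z_i}}{\sum_{j} e^{z_j}}
$$

The factor $e^{-m}$ appears in both the numerator and the denominator, so it cancels out. After the shift, the largest exponent is $0$, so every exponentiated value lies between $0$ and $1$ and cannot overflow.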

So let's do that in code:

In [71]:
# The Softmax Activation Function

class Softmax_Activation():
    
    def forward(self,inputs):
        inputs = np.array(inputs)
        
        # Subtract each row's highest value BEFORE exponentiating, so np.exp never receives a large number
        exp_list = np.exp(inputs-np.max(inputs,axis=1,keepdims=True))
        
        probabilities = exp_list/np.sum(exp_list, axis=1, keepdims=True)
        
        return probabilities

Activation_1 = Softmax_Activation()
output = Activation_1.forward([[-1,-1,-1],[2,2,2],[3,-1,0]])
print(output)

[[0.33333333 0.33333333 0.33333333]
 [0.33333333 0.33333333 0.33333333]
 [0.93623955 0.01714783 0.04661262]]
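
As a sanity check on the max-subtraction idea, here is a standalone sketch that feeds in values large enough to overflow a plain `np.exp`. The function name `stable_softmax` and the sample inputs are assumptions made for this example.

```python
import numpy as np

def stable_softmax(inputs):  # hypothetical helper, for illustration only
    inputs = np.array(inputs, dtype=float)

    # Shift each row so that its highest value becomes 0
    shifted = inputs - np.max(inputs, axis=1, keepdims=True)

    # All exponents are now <= 0, so every exponentiated value is in (0, 1]
    exp_values = np.exp(shifted)

    return exp_values / np.sum(exp_values, axis=1, keepdims=True)

large = stable_softmax([[1000.0, 1001.0, 1002.0]])
small = stable_softmax([[0.0, 1.0, 2.0]])

print(large)
print(small)
```

Both rows come out identical, because the two inputs differ only by a constant shift of 1000, and no `inf` shows up along the way.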


And to end this notebook, let's redo the process we did previously, this time making use of the __Activation Functions__

In [72]:
# Importing the libraries
import numpy as np

# Creating the class
class Dense_Layer():
    
    # Function responsible for initializing all of the important attributes of the class
    # n_input -> number of features each input sample has
    # n_neurons -> number of neurons in this layer
    def __init__(self,n_input,n_neurons):
        # The biases are an array of shape (1, n_neurons): one bias per neuron in the layer
        self.biases = np.ones((1,n_neurons))
        
        # The weights are a 2D array. The usual rule for the shape is (n_neurons, number of neurons of the previous layer),
        # but we store the transposed shape (n_input, n_neurons) so that np.dot(inputs, weights) works directly.
        # They get multiplied by 0.01 so that the weights initially don't have a great impact on the NN
        self.weights = 0.01*np.random.randn(n_input,n_neurons)
        
    # Function responsible for the calculations and the layer output
    def forward(self,inputs):
        outputs = np.dot(inputs,self.weights) + self.biases
        
        return outputs

# The ReLU Activation Function 

class ReLU_Activation():
    
    def forward(self,inputs):
        inputs = np.array(inputs)
        
        self.output = np.maximum(0,inputs)
        
        return self.output


# The Softmax Activation Function

class Softmax_Activation():
    
    def forward(self,inputs):
        inputs = np.array(inputs)
        
        # Subtract each row's highest value BEFORE exponentiating, so np.exp never receives a large number
        exp_list = np.exp(inputs-np.max(inputs,axis=1,keepdims=True))
        
        probabilities = exp_list/np.sum(exp_list,axis=1, keepdims=True)
        
        return probabilities


In [73]:
# Batch of inputs
inputs = [[27.0,3.0,12.0,4.0],[16.0,97.0,82.0,7.0],[21.0,-1.2,0.8,0.0]]

#n_input -> number of features of the input
n_input = 4

#n_neurons -> number of neurons for that layer
n_neurons = 4

# Initialize the Dense Layer
layer_1 = Dense_Layer(n_input,n_neurons)

# Perform the forward pass ( calculation that gives us the output array )
output_1 = layer_1.forward(inputs)

# Initialize the ReLU activation
activation_1 = ReLU_Activation() 

# Perform the activation through the forward pass

output_1_activated = activation_1.forward(output_1)

# Initialize the other Dense Layer

# Number of neurons for the second layer
n_neurons_2 = 4
# Number of features for the input: .shape returns a tuple with the shape of the array.
n_input_2 = output_1_activated.shape[1]

layer_2 = Dense_Layer(n_input_2,n_neurons_2)

output_2 = layer_2.forward(output_1_activated)

# Initialize the Softmax activation

activation_2 = Softmax_Activation()

# Perform the activation through the forward pass

output_2_activated = activation_2.forward(output_2)

print(output_2_activated)


[[0.23221253 0.2647992  0.24480698 0.25818129]
 [0.23652323 0.2677135  0.23793666 0.25782661]
 [0.23268647 0.26349917 0.24750048 0.25631387]]
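
Since Softmax outputs a probability distribution, one way to read the result above is to check that each row adds up to 1 and to pick the most probable class per row. The sketch below uses the printed values copied by hand. The weights are random and untrained, so the chosen class carries no real meaning yet, and the variable names are assumptions for this example.

```python
import numpy as np

# Values copied from the output printed above
probabilities = np.array([[0.23221253, 0.2647992,  0.24480698, 0.25818129],
                          [0.23652323, 0.2677135,  0.23793666, 0.25782661],
                          [0.23268647, 0.26349917, 0.24750048, 0.25631387]])

# Each row is one sample; its values should add up to (approximately) 1
row_sums = np.sum(probabilities, axis=1)

# Index of the highest probability in each row
predicted = np.argmax(probabilities, axis=1)

print(row_sums)
print(predicted)  # [1 1 1]
```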


Now that we have covered the __Activation Functions__, we have the basic structure of a Neural Network.

The next step is to see how well our model is doing. For that we need to calculate its __loss__, which gives us the right tools to adjust the model by analyzing this __loss__ and changing the __weights and biases__.

Perhaps you're starting to see the big picture of NNs. If not, don't worry, we still have a lot to cover.

> In the next notebook, we are going to learn about the __Loss__ in a Neural Network.