
## ReLU Activation Function from Scratch

The rectified linear activation function is simpler than the sigmoid. It is quite literally $y=x$, clipped at $0$ from the negative side: if $x$ is less than or equal to $0$, then $y$ is $0$; otherwise, $y$ is equal to $x$.

$$
y = \begin{cases} x, & x > 0 \\ 0, & x \leq 0 \end{cases}
$$

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/neural-networks-from-scratch/04-activation-function/images/1.png?raw=1' width='600'/>

At the time of writing, this simple yet powerful function is the most widely used activation function, mainly because of its speed and efficiency.

The ReLU activation function is extremely close to being a linear activation function while remaining nonlinear, due to the bend at 0. Simple as it is, this property is very effective.
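
To make the "nonlinear" claim concrete, here is a minimal sketch that checks additivity, a property every linear function satisfies: $f(a+b) = f(a) + f(b)$. The helper name `relu` and the sample values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x); illustrative helper, not the class defined later
    return np.maximum(0, x)

a, b = 2.0, -3.0

# A linear function would give the same result on both sides
lhs = relu(a + b)        # relu(-1.0) = 0.0
rhs = relu(a) + relu(b)  # 2.0 + 0.0 = 2.0

print(lhs, rhs)
```

The two sides differ because the inputs sit on opposite sides of the bend at 0, and that is exactly where ReLU stops behaving like $y=x$.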




## Setup

In [None]:
!pip install nnfs

In [12]:
from nnfs.datasets import spiral_data
import numpy as np
import nnfs
import matplotlib.pyplot as plt

nnfs.init()

## ReLU Activation 

Despite the fancy-sounding name, the rectified linear activation function is straightforward to code. The version closest to its definition is:

In [1]:
inputs = [0, 2, -1, 3.3, -2.7, 1.1, 2.2, -100]

output = []

for i in inputs:
  if i > 0:     # if the current value is greater than 0, append it
    output.append(i)
  else:         # otherwise, append 0
    output.append(0)

print(output)

[0, 2, 0, 3.3, 0, 1.1, 2.2, 0]


This can be written more simply, as we just need to take the larger of two values: 0 or the neuron's value.

For example:

In [2]:
inputs = [0, 2, -1, 3.3, -2.7, 1.1, 2.2, -100]

output = []

for i in inputs:
    output.append(max(0, i))

print(output)

[0, 2, 0, 3.3, 0, 1.1, 2.2, 0]


NumPy provides an element-wise equivalent, `np.maximum()`:

In [3]:
inputs = [0, 2, -1, 3.3, -2.7, 1.1, 2.2, -100]

output = np.maximum(0, inputs)

print(output)

[0.  2.  0.  3.3 0.  1.1 2.2 0. ]


This function compares each element of the input list (or array) with 0 and returns an array of the same shape filled with the larger of the two values.
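
The same call works unchanged on a batch of samples, since the scalar 0 is broadcast against every element. The sketch below assumes a small hand-written 2×3 array (the name `batch` and its values are made up for illustration) and shows that the shape is preserved.

```python
import numpy as np

# Hypothetical batch: 2 samples, 3 neuron outputs each
batch = np.array([[1.0, -2.0, 3.0],
                  [-4.0, 5.0, -6.0]])

# The scalar 0 is compared against every element of the 2D array
clipped = np.maximum(0, batch)

print(clipped.shape)
print(clipped)
```

This is why the activation class below can take a whole layer output at once without any explicit loop.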

We will use it in our new rectified linear activation class:

In [6]:
# ReLU activation class
class ReLU:
  # Forward pass
  def forward(self, inputs):
    # Calculate output values from input
    self.output = np.maximum(0, inputs)

    return self.output

In [7]:
relu = ReLU()
print(relu.forward(inputs))

[0.  2.  0.  3.3 0.  1.1 2.2 0. ]


Let’s apply this activation function to the dense layer’s outputs.

In [13]:
class Dense:

  def __init__(self, n_inputs, n_neurons):
    """Layer initialization: Initialize weights and biases"""
    # Weights have shape (n_inputs, n_neurons) rather than (n_neurons, n_inputs),
    # so the forward pass needs no transpose
    self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
    # Biases start at zero here; a nonzero bias is sometimes used to ensure a neuron fires initially
    self.biases = np.zeros((1, n_neurons))

  def forward(self, inputs):
    # Calculate output values from inputs, weights and biases
    self.output = np.dot(inputs, self.weights) + self.biases

# ReLU activation class
class ReLU:
  # Forward pass
  def forward(self, inputs):
    # Calculate output values from input
    self.output = np.maximum(0, inputs)

In [18]:
# Create dataset
X, y = spiral_data(samples=100, classes=3)

# Create Dense layer with 2 input features and 3 output values
dense1 = Dense(2, 3)

# Create ReLU activation (to be used with Dense layer)
relu = ReLU()

# Make a forward pass of our training data through this layer
dense1.forward(X)

# Forward pass through activation func.
# Takes in output from previous layer
relu.forward(dense1.output)

# Let's see output of the first few samples
print(f"Before ReLU:\n {dense1.output[:5]}")
print(f"After ReLU:\n {relu.output[:5]}")

Before ReLU:
 [[ 0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 1.2344425e-04 -4.2612613e-05  8.7741073e-06]
 [ 2.5981691e-04 -1.5019374e-04  2.1353657e-05]
 [ 4.0532288e-04 -3.3390927e-04  3.8064085e-05]
 [ 5.5259373e-04 -6.4534455e-04  6.0963972e-05]]
After ReLU:
 [[0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [1.2344425e-04 0.0000000e+00 8.7741073e-06]
 [2.5981691e-04 0.0000000e+00 2.1353657e-05]
 [4.0532288e-04 0.0000000e+00 3.8064085e-05]
 [5.5259373e-04 0.0000000e+00 6.0963972e-05]]


As you can see, negative values have been clipped (modified to be zero). That’s all there is to the rectified linear activation function used in the hidden layer. 
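
The clipping behaviour can also be checked programmatically rather than by eye. This is a minimal sketch that assumes random stand-in values (`pre`, generated with a seeded NumPy generator) instead of the actual dense layer output, so it runs without `nnfs`.

```python
import numpy as np

# Stand-in for a dense layer's output: small random values, 5 samples x 3 neurons
rng = np.random.default_rng(0)
pre = 0.01 * rng.standard_normal((5, 3))

# Apply ReLU
post = np.maximum(0, pre)

# No negative values survive the activation
no_negatives = bool((post >= 0).all())

# Values that were already positive pass through unchanged
positives_unchanged = bool(np.array_equal(post[pre > 0], pre[pre > 0]))

print(no_negatives, positives_unchanged)
```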

## Softmax Activation