# Perceptron
This notebook implements a practical exercise on perceptrons taken from an [Artificial Neural Network book](https://www.amazon.com.br/Neurais-Artificiais-Engenharia-Ci%C3%AAncias-Aplicadas/dp/8588098539). After a brief explanation of how the perceptron model works, a training algorithm is implemented and evaluated on a given dataset.

## The perceptron model

A perceptron is the simplest form of a neural network, used to classify patterns that are linearly separable. An illustration of the perceptron model is shown in the following picture.

<img src="Figures/perceptron.png">

Basically, it consists of a single neuron with adjustable synaptic weights and a bias. Let $x \in R^{m+1}$ be an extended version of an input signal $x_i \in R^{m}$, obtained by prepending a fixed input $x_0 = +1$ (for practical implementations). Likewise, let $w \in R^{m+1}$ be an extension of a randomly initialized weight vector $w_i \in R^m$, obtained by prepending the element $w_0 = -b$, where $b$ is the bias.

In this case, the signal $u$ is given by the following equation:

$\begin{equation}    
    u = \sum_{k = 1}^m x_k \cdot w_k - b = w_i^T \cdot x_i - b = w^T \cdot x
\end{equation}
$
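The equivalence between the two forms of $u$ can be checked numerically. The snippet below is only a minimal sketch: the values of `x_i`, `w_i` and `b` are hypothetical, and it assumes the bias enters the extended weight vector as $-b$, paired with the fixed input $+1$.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
x_i = np.array([0.5, -1.2, 2.0])   # input signal in R^m (m = 3)
w_i = np.array([0.4, 0.1, -0.3])   # weight vector in R^m
b = 0.25                           # bias

# Extended vectors in R^(m+1): fixed input +1 paired with the weight -b
x = np.concatenate(([1.0], x_i))
w = np.concatenate(([-b], w_i))

u_explicit = np.dot(w_i, x_i) - b  # w_i^T x_i - b
u_extended = np.dot(w, x)          # w^T x

print(u_explicit, u_extended)
```

Both expressions yield the same value, which is why the implementation can treat the bias as one more weight.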

#### Activation Function

In this model, the signal $u$ goes through an activation function $\sigma(\cdot)$, which comes in several flavours:

| Activation Type | Equation | 
| ----------------- | -------------------------| 
| linear | $\sigma(x) = x$ | 
| unipolar step (hard limiter) | $\sigma(x) = \left\{ \begin{array}{ll} 1  & \mbox{if } x \geq 0 \\ 0 & \mbox{if } x < 0 \end{array}\right.$ |  
| bipolar step | $\sigma(x) = \left\{\begin{array}{ll} 1  & \mbox{if } x \geq 0 \\ -1 & \mbox{if } x < 0 \end{array} \right.$ | 
| logistic | $\sigma(x, \beta) = \frac{1}{1+e^{-\beta x}}$ | 
| tanh | $\sigma(x, \beta) = \frac{1-e^{-\beta x}}{1+e^{-\beta x}}$ | 
| relu | $\sigma(x) = \left\{\begin{array}{ll} x  & \mbox{if } x \geq 0 \\ 0 & \mbox{if } x < 0 \end{array} \right.$ |

The logistic and tanh activation functions belong to the sigmoid group and are largely used in regression problems. For classification, the unipolar or bipolar steps are usually applied, given that their outputs take a finite set of integer values, which can be mapped to classes.
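As an illustration, the functions in the table can be written in vectorized form with NumPy. This is a sketch rather than the implementation used in this notebook; the function names and the default `beta=1.0` are assumptions. Note that the tanh expression in the table is algebraically equal to $\tanh(\beta x / 2)$.

```python
import numpy as np

def unipolar_step(u):
    # 1 where u >= 0, otherwise 0
    return np.where(u >= 0, 1, 0)

def bipolar_step(u):
    # 1 where u >= 0, otherwise -1
    return np.where(u >= 0, 1, -1)

def logistic(u, beta=1.0):
    # beta controls the slope around u = 0
    return 1.0 / (1.0 + np.exp(-beta * u))

def tanh_table(u, beta=1.0):
    # Same expression as in the table above
    return (1.0 - np.exp(-beta * u)) / (1.0 + np.exp(-beta * u))

def relu(u):
    # Keeps positive values and clips negative ones to 0
    return np.maximum(u, 0.0)

u = np.array([-2.0, 0.0, 2.0])
print(bipolar_step(u), unipolar_step(u), relu(u))
print(logistic(u), tanh_table(u))
```

The step functions return integers that map directly to class labels, while the logistic and tanh variants return continuous values.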

It is essential to be aware that, since an activation function has a dynamic range of operation, both the input and output signals should be preprocessed so that their ranges stay within that of the activation function. For example, when using the bipolar step, the input signal should range between $-1$ and $+1$.
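One common way to limit the range of a signal is min-max scaling. The sketch below is an assumption about how such preprocessing could be done (the notebook itself does not scale its inputs); `scale_to_range` is a hypothetical helper and the sample values are made up for illustration.

```python
import numpy as np

def scale_to_range(X, low=-1.0, high=1.0):
    """Linearly rescales each column of X to the interval [low, high]."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Avoid division by zero for constant columns (they map to `low`)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return low + (X - x_min) * (high - low) / span

# Made-up samples with three features on different scales
X_raw = np.array([
    [-0.5, 0.1, 4.0],
    [2.0, 0.9, 12.0],
    [0.3, 1.2, 7.5],
])
X_scaled = scale_to_range(X_raw)
print(X_scaled)
```

When a separate test set is scaled, the minimum and maximum computed on the training set would normally be reused so that both sets share the same transformation.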

#### Setting the output

Once an activation function is set, the output signal $y$ is defined as 

$y = \sigma(u)$

The single-neuron perceptron model is characterized by its ability to identify linearly separable classes because, when a bipolar step activation function is used, the class is predicted by the following equation.


$y = \left\{
        \begin{array}{ll}
            +1  & \mbox{if } u = w_i^Tx_i - b \geq 0\\
            -1 & \mbox{if } u = w_i^Tx_i - b < 0
    \end{array}
    \right.
$

Therefore, by training a single-neuron perceptron model, we are defining a hyperplane $w_i^Tx_i - b = 0$ that separates the two classes in $R^m$.
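As a small worked case (an illustration assuming $m = 2$ and $w_2 \neq 0$, which is not part of the exercise), the boundary reduces to a straight line in the plane, where $w_1, w_2$ and $x_1, x_2$ denote the components of $w_i$ and $x_i$:

$w_1 x_1 + w_2 x_2 - b = 0 \;\Rightarrow\; x_2 = \frac{b - w_1 x_1}{w_2}$

Points for which $w_1 x_1 + w_2 x_2 - b \geq 0$ are assigned to class $+1$, and all other points to class $-1$.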

#### Training algorithm

In order to present a training algorithm, this notebook uses a dataset extracted from the aforementioned [Artificial Neural Network book](https://www.amazon.com.br/Neurais-Artificiais-Engenharia-Ci%C3%AAncias-Aplicadas/dp/8588098539) which is stored in the *./Datasets* folder. 

The training set contains 3 features extracted from an oil distillation process and 1 target value indicating which of the 2 classes {P1 and P2}, denoted by [-1, 1] respectively, each sample belongs to. The test set contains only the features of another set of samples.
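For reference, the weight correction applied by the `train` method of the `Perceptron` class in this notebook can be written as follows. The symbols $\eta$ (learning rate), $d^{(k)}$ (target of the $k$-th sample) and $x^{(k)}$ (its extended input) are notation assumed here for illustration:

$w \leftarrow w + \eta \, (d^{(k)} - y) \, x^{(k)} \quad \mbox{whenever } y \neq d^{(k)}$

With bipolar targets, the factor $(d^{(k)} - y)$ equals $\pm 2$ on a misclassified sample, so each correction moves $w$ by $2\eta$ times the sample, in the direction that reduces the error on that sample.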





## Imports

In [1]:
import random 
import pandas as pd
import numpy as np

## Loading Dataset

In [2]:
df_train = pd.read_csv('./Datasets/ex3_6_train.tsv', sep='\t')
df_train.drop(['sample'], axis=1, inplace=True)
df_test = pd.read_csv('./Datasets/ex3_6_test.tsv', sep='\t')
df_test.drop(['sample'], axis=1, inplace=True)
print ("Train set shape: {}\nTest set shape: {}".format(df_train.shape, df_test.shape))
df_train.head()

Train set shape: (30, 4)
Test set shape: (10, 3)


|   | x1 | x2 | x3 | target |
| --- | --- | --- | --- | --- |
| 0 | -0.65 | 0.11 | 4.0 | -1.0 |
| 1 | -1.45 | 0.89 | 4.4 | -1.0 |
| 2 | 2.09 | 0.69 | 12.07 | -1.0 |
| 3 | 0.26 | 1.15 | 7.8 | 1.0 |
| 4 | 0.64 | 1.02 | 7.04 | 1.0 |


In [3]:
df_test.head()

|   | x1 | x2 | x3 |
| --- | --- | --- | --- |
| 0 | -0.37 | 0.06 | 5.99 |
| 1 | -0.78 | 1.13 | 5.59 |
| 2 | 0.3 | 0.56 | 5.82 |
| 3 | 0.78 | 1.06 | 8.07 |
| 4 | 0.16 | 0.8 | 6.3 |


In [4]:
x_train, y_train = df_train.drop(['target'], axis=1).values, df_train['target'].values
x_test = df_test.values

## Creating Perceptron Class

In [5]:
class Perceptron:
    def __init__(self, activation = 'tanh', learning_rate = 0.01, seed = None, beta = None): 
        self.activation = activation
        self.learning_rate = learning_rate
        self.seed = seed
        self.x = None
        self.x_pred = None
        self.w = None        
        self.beta = beta
        self.g = self.get_activation(self.activation, self.beta)
        
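    def get_activation(self, activation, beta = None):
        """Returns an activation function
            :param activation (str): the name of the function 
                ['linear', 'unipolar_step', 'bipolar_step', 
                'logistic', 'simmetric_ramp', 'tanh', 'relu']
            :param beta (float): slope ('logistic', 'tanh') or saturation
                limit ('simmetric_ramp'); falls back to 1.0 when None
            :return (lambda function): the activation function of a single argument
        """
        # beta is captured by the lambdas below so that every activation
        # can be called as g(u); 1.0 is an assumed default
        if beta is None:
            beta = 1.0
        if activation == 'linear':
            g = lambda x: x
        elif activation == 'unipolar_step':
            g = lambda x: 1 if x >= 0 else 0
        elif activation == 'bipolar_step':
            g = lambda x: 1 if x >= 0 else -1
        elif activation == 'logistic':
            g = lambda x: 1/(1 + np.exp(-beta*x))
        elif activation == 'simmetric_ramp':
            # Linear inside (-beta, beta), saturating at -beta and +beta outside
            g = lambda x: x if -beta < x < beta else (beta if x > 0 else -beta)
        elif activation == 'tanh':
            g = lambda x: (1 - np.exp(-beta*x))/(1 + np.exp(-beta*x))
        elif activation == 'relu':
            g = lambda x: x if x > 0 else 0
        else:
            # NotImplemented is a constant, not an exception, and cannot be raised
            raise NotImplementedError("Unknown activation: {}".format(activation))
        return g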
        
    def train(self, features, target, max_epochs = 30):
        """ Trains a single neuron perceptron model.
            The stopping test compares the output with the target exactly,
            so this method is meant for the step activations.
            :param features (np.array): an array containing training examples and their features
            :param target (np.array): the true values of the output 
            :param max_epochs (int): the maximum number of epochs to train the algorithm
        """
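        # Prepending the fixed input x_0 = 1 to each row of the features array
        self.x = np.array([np.concatenate(([1], i)) for i in features])
        
        # Seeding the generator when a seed was given, so runs can be reproduced
        if self.seed is not None:
            random.seed(self.seed)
        
        # Initializing the weights with a random uniform function (0, 1);
        # the weight paired with the fixed input starts at -1
        self.w = np.array([random.uniform(0,1) for i in np.arange(self.x.shape[1]-1)])
        self.w = np.concatenate(([-1], self.w))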
        
        epoch = 1
        print ("Epoch {} >> W = {}".format(epoch, self.w))
        
        # Starting training until max_epochs is reached or no error is found
        keep_training = True
        while (keep_training):          
            keep_training = False            
            for index, sample in enumerate(self.x):
                u = np.dot(sample, self.w)
                y = self.g(u)                
                if (y != target[index]):
                    self.w = self.w + self.learning_rate*(target[index]-y)*sample
                    keep_training = True
            epoch += 1            
            if epoch > max_epochs:
                keep_training = False
        print ("Epoch {} >> W = {}".format(epoch, self.w))
        
    def predict(self, features):
        """ Predicts the output of a set of test features from a pre-trained model
            :param features (np.array): the test set of features
            :return y_pred (np.array): the predicted output
        """
        self.x_pred = np.array([np.concatenate(([1], i)) for i in features])
        u = np.dot(self.x_pred,self.w)
        self.y_pred = list()
        for u_i in u:
            self.y_pred.append(self.g(u_i))
        self.y_pred = np.array(self.y_pred)
        return self.y_pred
        

In [6]:
for training_index in np.arange(1, 6, 1):
    print ("\n### Starting training ", training_index)
    p = Perceptron(activation='bipolar_step', learning_rate=0.01)        
    p.train(features=x_train, target=y_train, max_epochs=2000)      
    print ("Predictions: ", p.predict(x_test))



### Starting training  1
Epoch 1 >> W = [-1.          0.30750558  0.24407762  0.11136984]
Epoch 466 >> W = [ 3.14        1.60750558  2.53687762 -0.73363016]
Predictions:  [-1  1  1  1  1  1 -1  1 -1 -1]

### Starting training  2
Epoch 1 >> W = [-1.          0.85632945  0.97428431  0.38397214]
Epoch 457 >> W = [ 3.06        1.56512945  2.51268431 -0.71802786]
Predictions:  [-1  1  1  1  1  1 -1  1 -1 -1]

### Starting training  3
Epoch 1 >> W = [-1.          0.2570316   0.96509117  0.18488782]
Epoch 416 >> W = [ 3.          1.5364316   2.44869117 -0.70071218]
Predictions:  [-1  1  1  1  1  1 -1  1 -1 -1]

### Starting training  4
Epoch 1 >> W = [-1.          0.34629129  0.44733042  0.45380844]
Epoch 409 >> W = [ 2.98        1.53369129  2.43533042 -0.69659156]
Predictions:  [-1  1  1  1  1  1 -1  1 -1 -1]

### Starting training  5
Epoch 1 >> W = [-1.          0.63720448  0.07353498  0.46367355]
Epoch 461 >> W = [ 3.12        1.61240448  2.54213498 -0.73152645]
Predictions:  [-1  1  1  1

After running training and prediction 5 times on the given training and test sets, we can see that the outcome of each round can be different: the final weights and the number of training epochs vary, as shown in the log above, even though the complete prediction vectors in the log coincide. This is due to the fact that the weights of the perceptron are initialized randomly.
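To make the effect of the final weights concrete, the sketch below applies the bipolar decision rule with the weights logged for rounds 1 and 3 to the five test rows displayed by `df_test.head()`. The helper `predict_bipolar` is hypothetical and only mirrors the logic of `Perceptron.predict`; the numbers are copied from the outputs shown in this notebook.

```python
import numpy as np

# Final weights copied from the training log (weight of the fixed input first)
w_run1 = np.array([3.14, 1.60750558, 2.53687762, -0.73363016])
w_run3 = np.array([3.0, 1.5364316, 2.44869117, -0.70071218])

# The five test rows displayed by df_test.head()
x_head = np.array([
    [-0.37, 0.06, 5.99],
    [-0.78, 1.13, 5.59],
    [0.30, 0.56, 5.82],
    [0.78, 1.06, 8.07],
    [0.16, 0.80, 6.30],
])

def predict_bipolar(w, features):
    """Hypothetical helper: prepends the fixed input 1 and applies the bipolar step."""
    extended = np.hstack([np.ones((features.shape[0], 1)), features])
    return np.where(extended @ w >= 0, 1, -1)

pred_run1 = predict_bipolar(w_run1, x_head)
pred_run3 = predict_bipolar(w_run3, x_head)
print(pred_run1)
print(pred_run3)
```

Both weight vectors give the same labels for these five rows, matching the first five entries of the predictions in the log. The two vectors are also close to proportional, which suggests that the rounds ended on similar hyperplanes.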

Given that the number of epochs did not reach the maximum established in the call to `train` (2000), we can conclude that the perceptron managed to separate the two classes of the training set. As a result, it is possible to affirm that these classes are linearly separable.
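The link between early stopping and linear separability can be illustrated on toy data. The following sketch is not the notebook's `Perceptron` class: `train_perceptron` is a hypothetical standalone function, and the AND and XOR targets (in bipolar encoding), the learning rate and the seed are assumptions chosen for the illustration.

```python
import numpy as np

def train_perceptron(X, d, eta=0.1, max_epochs=1000, seed=0):
    """Hypothetical helper: perceptron training with a bipolar step activation.
    Returns the weights, the number of epochs run and a convergence flag."""
    rng = np.random.default_rng(seed)
    # Prepend the fixed input 1 to every sample
    Xe = np.hstack([np.ones((X.shape[0], 1)), X])
    # First weight starts at -1, the others are drawn uniformly from (0, 1)
    w = np.concatenate(([-1.0], rng.uniform(0, 1, X.shape[1])))
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, target in zip(Xe, d):
            y = 1 if np.dot(w, x) >= 0 else -1
            if y != target:
                w = w + eta * (target - y) * x
                errors += 1
        if errors == 0:
            # A full pass without corrections: every sample is classified correctly
            return w, epoch, True
    return w, max_epochs, False

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d_and = np.array([-1, -1, -1, 1])   # linearly separable
d_xor = np.array([-1, 1, 1, -1])    # not linearly separable

w_and, epochs_and, converged_and = train_perceptron(X, d_and)
w_xor, epochs_xor, converged_xor = train_perceptron(X, d_xor)
print("AND:", converged_and, epochs_and)
print("XOR:", converged_xor, epochs_xor)
```

On the AND targets the loop stops before `max_epochs`, while on the XOR targets every pass contains at least one misclassified sample, so the epoch limit is always reached.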

_______________