### Links
- https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/Neuron/index.html
- https://github.com/julienr/ipynb_playground
- http://ataspinar.com/2016/12/22/the-perceptron/
- https://www.quora.com/Why-do-we-use-the-derivatives-of-activation-functions-in-a-neural-network
- https://www.youtube.com/watch?v=OVHc-7GYRo4

# Perceptron
A *perceptron* is a mathematical model of a biological neuron.
<img src="https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/Neuron/images/bioneuron.jpg"/>

Each signal is represented as a numerical value.  
Each input value is multiplied by a value called a **weight**.

## Comparison to a real biological neuron

#### Biological model

- Electrical signals are modulated in various amounts at the synapses between the dendrites and axons.
- A neuron fires an output signal only when the total strength of the input signals exceeds a certain threshold.

<img src="https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/Neuron/images/bioneuron.jpg" height=30%/>

#### Mathematical model
- Each input is multiplied by a value called the weight.
- The output is calculated by taking the weighted sum of the inputs, which represents the total strength of the input signals, and applying a step function to that sum.

<img src="https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/Neuron/images/artificial.jpg" height=30%/>
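
The mathematical model above can be sketched in a few lines of Python. This is only an illustration, not the implementation used later in this document: the function names `step` and `neuron_output`, the `threshold` parameter, and the sample values are assumptions made for the example.

```python
import numpy as np

def step(total, threshold=0.0):
    # Fire (1) only when the total input strength exceeds the threshold
    return 1 if total > threshold else 0

def neuron_output(inputs, weights, threshold=0.0):
    # Weighted sum of the inputs, followed by the step function
    total = np.dot(inputs, weights)
    return step(total, threshold)

# Hypothetical example values
inputs = np.array([1.0, 0.0, 1.0])
weights = np.array([0.5, -0.2, 0.3])

# The weighted sum is about 0.8, which is above the threshold of 0.7
print(neuron_output(inputs, weights, threshold=0.7))
```

Raising the threshold above the weighted sum (for example to 0.9) makes the same neuron stay silent and return 0.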


## Main components
- Training input set - a list of input data supplied by the trainer to the perceptron. It must be accompanied by a matching training result set.
- Training result set - a list with the expected result for each entry of the training input set. It is prepared by the trainer, and the ANN compares its own output against it to see how close it is to the right value.
- Weights - values describing how much each input contributes to the output.
- Activation function.
- Derivative of the activation function.

## Perceptron implementation

First, we need to define our neural network.

<img src="http://ataspinar.com/wp-content/uploads/2016/11/perceptron_schematic_overview.png" />

Here you can see a simple perceptron neural network. Its basic elements are:  
1. $X$ is a list of input values for the *n*-th case.  
2. $R$ is a list of right values for the *n*-th case. That means that for the *n*-th case the neural network should produce $r_n$.
3. $W$ is a matrix of synaptic weights.  
4. An activation function, named $ \theta (z)$. The classic perceptron uses a unit step function; here we use the sigmoid function for $ \theta (z)$ (but you may choose any you want).

$$  \theta (z) =  \frac{1}{1 + e^{-z}}  $$ 

<img src="https://qph.fs.quoracdn.net/main-qimg-07066668c05a556f1ff25040414a32b7" />
5. A derivative function $S(x)$, the derivative of $ \theta (z) $. It measures the steepness of the graph of the function at a particular point. Once the input values have gone through the neural network and we have the activation output, we check how far that output deviates from the training result set. For that we simply calculate the difference between the right output *R* and the output *O* of the neural network:

  $$ \mathbf{error} = \mathbf{R} - \mathbf{O} $$
  
  Then we want to calculate how to change the weights so that the output of the neural network gets closer to the right output for the given training set.   
  That's where the derivative comes in! Together with the sign of the error, it tells us in which direction (- or +) to move each weight and by how much.

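$$ \frac{d}{dz}\theta(z) = \theta(z)(1 - \theta(z)) $$
  
  Because the output $o = \theta(z)$ is already known after the forward pass, the derivative of the sigmoid function can be written in terms of the output alone:
  
$$ S(o) = o(1 - o) $$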
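
The derivative formula can be checked numerically. The sketch below compares it with a central finite difference; the function names `sigmoid` and `sigmoid_derivative_from_output`, the step size `h`, and the sample points are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    # Activation function: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative_from_output(o):
    # Derivative expressed through the sigmoid output o = sigmoid(z)
    return o * (1.0 - o)

z = np.array([-2.0, 0.0, 2.0])
o = sigmoid(z)

# Central finite difference as an independent estimate of the slope
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid_derivative_from_output(o)

print(analytic)
print(np.allclose(numeric, analytic, atol=1e-8))
```

The slope is largest at $z = 0$, where it equals 0.25, and it shrinks towards zero as the output approaches 0 or 1.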

## Perceptron step by step
Mathematically,  a perceptron works the following way:


1. Training set $X$ (shape 3x3):
\begin{equation*}
\mathbf{X}  =  
\begin{vmatrix}
\mathbf{x _\mathbf{11}} & \mathbf{x _\mathbf{12}} & \mathbf{x _\mathbf{13}} \\
\mathbf{x _\mathbf{21}} & \mathbf{x _\mathbf{22}} & \mathbf{x _\mathbf{23}} \\
\mathbf{x _\mathbf{31}} & \mathbf{x _\mathbf{32}} & \mathbf{x _\mathbf{33}} \\
\end{vmatrix}
\end{equation*}

2. Training result set $R$ (shape 3x1):
\begin{equation*}
\mathbf{R} =  
\begin{vmatrix}
\mathbf{r_\mathbf{1}} \\
\mathbf{r_\mathbf{2}} \\
\mathbf{r_\mathbf{3}} \\
\end{vmatrix}
\end{equation*}

3. Synaptic weights $W$ (shape 3x1):
\begin{equation*}
\mathbf{W} =  \begin{vmatrix}
\mathbf{w_\mathbf{1}} \\
\mathbf{w_\mathbf{2}} \\
\mathbf{w_\mathbf{3}} \\
\end{vmatrix}
\end{equation*}

4. Weighted sum of the inputs, $X * W$:
    
\begin{equation*}
\mathbf{X} * \mathbf{W} =  \begin{vmatrix}
\mathbf{x _\mathbf{11}} & \mathbf{x _\mathbf{12}} & \mathbf{x _\mathbf{13}} \\
\mathbf{x _\mathbf{21}} & \mathbf{x _\mathbf{22}} & \mathbf{x _\mathbf{23}} \\
\mathbf{x _\mathbf{31}} & \mathbf{x _\mathbf{32}} & \mathbf{x _\mathbf{33}} \\
\end{vmatrix} * 
\begin{vmatrix}
\mathbf{w_\mathbf{1}} \\
\mathbf{w_\mathbf{2}} \\
\mathbf{w_\mathbf{3}} \\
\end{vmatrix} = 
\begin{vmatrix}
\mathbf{x _\mathbf{11}} * \mathbf{w_\mathbf{1}} + \mathbf{x _\mathbf{12}} * \mathbf{w_\mathbf{2}} + \mathbf{x _\mathbf{13}} * \mathbf{w_\mathbf{3}} \\
\mathbf{x _\mathbf{21}} * \mathbf{w_\mathbf{1}} + \mathbf{x _\mathbf{22}} * \mathbf{w_\mathbf{2}} + \mathbf{x _\mathbf{23}} * \mathbf{w_\mathbf{3}}\\
\mathbf{x _\mathbf{31}} * \mathbf{w_\mathbf{1}} + \mathbf{x _\mathbf{32}} * \mathbf{w_\mathbf{2}} + \mathbf{x _\mathbf{33}} * \mathbf{w_\mathbf{3}} \\
\end{vmatrix}
\end{equation*}

5. Calculate the activation function result. This is the actual output of the neural network.

\begin{equation*}
\mathbf{O} = \mathbf{sigmoid}(\mathbf{X} * \mathbf{W}) = 
\mathbf{sigmoid}\left(
\begin{vmatrix}
\mathbf{x _\mathbf{11}} * \mathbf{w_\mathbf{1}} + \mathbf{x _\mathbf{12}} * \mathbf{w_\mathbf{2}} + \mathbf{x _\mathbf{13}} * \mathbf{w_\mathbf{3}} \\
\mathbf{x _\mathbf{21}} * \mathbf{w_\mathbf{1}} + \mathbf{x _\mathbf{22}} * \mathbf{w_\mathbf{2}} + \mathbf{x _\mathbf{23}} * \mathbf{w_\mathbf{3}}\\
\mathbf{x _\mathbf{31}} * \mathbf{w_\mathbf{1}} + \mathbf{x _\mathbf{32}} * \mathbf{w_\mathbf{2}} + \mathbf{x _\mathbf{33}} * \mathbf{w_\mathbf{3}} \\
\end{vmatrix}\right)
\end{equation*}


\begin{equation*}
\mathbf{O}= 
\begin{vmatrix}
\mathbf{sigmoid}(\mathbf{x _\mathbf{11}} * \mathbf{w_\mathbf{1}} + \mathbf{x _\mathbf{12}} * \mathbf{w_\mathbf{2}} + \mathbf{x _\mathbf{13}} * \mathbf{w_\mathbf{3}}) \\
\mathbf{sigmoid}(\mathbf{x _\mathbf{21}} * \mathbf{w_\mathbf{1}} + \mathbf{x _\mathbf{22}} * \mathbf{w_\mathbf{2}} + \mathbf{x _\mathbf{23}} * \mathbf{w_\mathbf{3}})\\
\mathbf{sigmoid}(\mathbf{x _\mathbf{31}} * \mathbf{w_\mathbf{1}} + \mathbf{x _\mathbf{32}} * \mathbf{w_\mathbf{2}} + \mathbf{x _\mathbf{33}} * \mathbf{w_\mathbf{3}}) \\
\end{vmatrix}=
\begin{vmatrix}
\mathbf{o _\mathbf{1}} \\
\mathbf{o _\mathbf{2}} \\
\mathbf{o _\mathbf{3}} \\
\end{vmatrix}
\end{equation*}

6. Calculate the error.

\begin{equation*}
\mathbf{error} = \mathbf{R} - \mathbf{O} = 
\begin{vmatrix}
\mathbf{r_\mathbf{1}} \\
\mathbf{r_\mathbf{2}} \\
\mathbf{r_\mathbf{3}} \\
\end{vmatrix}
\mathbf{-}
\begin{vmatrix}
\mathbf{o _\mathbf{1}} \\
\mathbf{o _\mathbf{2}} \\
\mathbf{o _\mathbf{3}} \\
\end{vmatrix}
\mathbf{=}
\begin{vmatrix}
\mathbf{r_\mathbf{1}} - \mathbf{o _\mathbf{1}} \\
\mathbf{r_\mathbf{2}} - \mathbf{o _\mathbf{2}} \\
\mathbf{r_\mathbf{3}} - \mathbf{o _\mathbf{3}} \\
\end{vmatrix}
\mathbf{=}
\begin{vmatrix}
\mathbf{e_\mathbf{1}} \\
\mathbf{e_\mathbf{2}} \\
\mathbf{e_\mathbf{3}} \\
\end{vmatrix}
\end{equation*}

7. Adjust the synaptic weights. Here $\odot$ denotes element-wise multiplication: each error $e_n$ is scaled by $S(o_n)$ before the matrix product with $X^T$.

\begin{equation*}
\mathbf{W}^{'}
\mathbf{=}
\mathbf{W} + \mathbf{X}^{T}*(\mathbf{error} \odot \mathbf{S(O)})
\end{equation*}

\begin{equation*}
\mathbf{W}^{'}
\mathbf{=}
\begin{vmatrix}
\mathbf{w_\mathbf{1}} \\
\mathbf{w_\mathbf{2}} \\
\mathbf{w_\mathbf{3}} \\
\end{vmatrix}
\mathbf{+}
\begin{vmatrix}
\mathbf{x _\mathbf{11}} & \mathbf{x _\mathbf{21}} & \mathbf{x _\mathbf{31}} \\
\mathbf{x _\mathbf{12}} & \mathbf{x _\mathbf{22}} & \mathbf{x _\mathbf{32}} \\
\mathbf{x _\mathbf{13}} & \mathbf{x _\mathbf{23}} & \mathbf{x _\mathbf{33}} \\
\end{vmatrix}
\mathbf{*}
\left(
\begin{vmatrix}
\mathbf{e_\mathbf{1}} \\
\mathbf{e_\mathbf{2}} \\
\mathbf{e_\mathbf{3}} \\
\end{vmatrix}
\odot
\begin{vmatrix}
\mathbf{S(\mathbf{o _\mathbf{1}})} \\
\mathbf{S(\mathbf{o _\mathbf{2}})} \\
\mathbf{S(\mathbf{o _\mathbf{3}})} \\
\end{vmatrix}
\right)
\end{equation*}


8. Go to step 4 and repeat until all iterations are done. This is what is called "training": we adjust the weights so that the output of the neural network gets as close as possible to the right output from the training set.

Then, if you only need to ask the trained neural network what it thinks about some inputs $X(x_1,x_2,..x_n)$, you stop after step 5 and take the output of the activation function.
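
Steps 4 to 7 can be sketched with NumPy to make the shapes concrete. This is a minimal sketch under stated assumptions: the training data (the target simply copies the first input column), the random seed, and the number of iterations are hypothetical. The error is multiplied element-wise by the derivative before the matrix product with $X^T$, which is also how the `Perceptron` class later in this document computes the adjustment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical training data: the target equals the first input column
X = np.array([[0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])       # shape (3, 3)
R = np.array([[0.0], [1.0], [1.0]])   # shape (3, 1)

rng = np.random.default_rng(0)
W = 2 * rng.random((3, 1)) - 1        # shape (3, 1), values in [-1, 1)

initial_error = np.abs(R - sigmoid(X @ W)).mean()

for _ in range(1000):
    O = sigmoid(X @ W)                # steps 4-5: weighted sum, then activation
    error = R - O                     # step 6: shape (3, 1)
    W += X.T @ (error * O * (1 - O))  # step 7: element-wise scale, then X^T product

final_error = np.abs(R - sigmoid(X @ W)).mean()
print("mean absolute error before:", initial_error)
print("mean absolute error after: ", final_error)
```

Printing the two error values shows whether the repeated adjustments moved the outputs towards the training results.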

## When to use and when not to use
- https://www.coursera.org/lecture/neural-networks/what-perceptrons-can-t-do-15-min-SUTuA  
A perceptron works well as a binary threshold. It is usually used to classify data into two classes, so it is also known as a linear binary classifier.  
<img src="https://cdn-images-1.medium.com/max/1600/1*xsR57_PO8U7PB_ItLslLmA.png"/>
To get better results, supply more features to the perceptron as inputs. Features are properties that help tell one thing from another: for example, cats have long whiskers (vibrissae) and dogs usually do not, and dogs can be really huge while cats usually are not.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/Perceptron_example.svg/500px-Perceptron_example.svg.png"/> 

You should **NOT** use a perceptron if:
- You need to tell apart complex items and have only a few features that help draw a binary boundary.
- You lack inputs.
- You need to capture complex relationships between different features.
  - See the XOR problem (https://medium.com/@jayeshbahire/the-xor-problem-in-neural-networks-50006411840b)
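
The XOR limitation can be reproduced with a small experiment. The sketch below uses the classic perceptron learning rule with a step activation and a bias term, which is a different training rule from the sigmoid-based one described above; the function names `train_step_perceptron` and `accuracy`, the learning rate, and the epoch limit are assumptions made for this example.

```python
import numpy as np

def train_step_perceptron(X, y, epochs=100, lr=1.0):
    # Classic perceptron learning rule with a bias term and a step activation
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, target in zip(X, y):
            prediction = 1 if np.dot(xi, w) + b > 0 else 0
            update = lr * (target - prediction)
            if update != 0:
                w += update * xi
                b += update
                mistakes += 1
        if mistakes == 0:
            break  # every sample is classified correctly
    return w, b

def accuracy(X, y, w, b):
    # Fraction of samples whose predicted label matches the target
    predictions = (X @ w + b > 0).astype(int)
    return (predictions == y).mean()

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])  # linearly separable
y_xor = np.array([0, 1, 1, 0])  # not linearly separable

acc_and = accuracy(X, y_and, *train_step_perceptron(X, y_and))
acc_xor = accuracy(X, y_xor, *train_step_perceptron(X, y_xor))
print("AND accuracy:", acc_and)
print("XOR accuracy:", acc_xor)
```

AND is learned perfectly because a single straight line separates its two classes. No such line exists for XOR, so at least one of the four points is always misclassified, no matter how long the perceptron trains.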


## Coding
Now we implement a `Perceptron` class for any given training inputs.  
Then we will check whether the perceptron is able to tell cats from dogs. Let's assume the following features and rules:

HAS_TAIL = 1  
LIGHT_WEIGHT = 2  
HAS_MUSTACHE = 3  
HUGE = 4  
GOOD_AT_HEARING = 5  
SLEEP_A_LOT = 6  
VERY_STRONG = 7  
LITTLE_SIZE = 8  
FLEXIBLE = 9  

(In the code below the same features are numbered from 0, so that they can be used as array indices.)

We have these animals: a cat and a dog.  
Let's use these rules to train the perceptron:
1. Both dogs and cats have tails
2. Cats are usually light
3. Cats are more likely to have mustaches
4. Dogs can be really huge
5. Both dogs and cats can hear well
6. Cats sleep a lot
7. Dogs are usually very strong
8. Both cats and dogs can be small
9. Cats are flexible

In [71]:
# Our constants
HAS_TAIL = 0  
LIGHT_WEIGHT = 1  
HAS_MUSTACHE = 2  
HUGE = 3  
GOOD_AT_HEARING = 4  
SLEEP_A_LOT = 5  
VERY_STRONG = 6  
LITTLE_SIZE = 7
FLEXIBLE = 8

# Animals
CAT = 0
DOG = 1

In [73]:
import numpy as np

class Perceptron:
    """Single-layer perceptron with a sigmoid activation."""

    def __init__(self):
        # Seed the random generator so that every run starts from the same weights
        np.random.seed(1)

    def _sigmoid(self, x):
        # Activation function
        return 1 / (1 + np.exp(-x))

    def _sigmoid_derivative(self, x):
        # Derivative of the sigmoid; x must already be a sigmoid output
        return x * (1 - x)

    def train(self, training_set, training_result, iterations=1000):
        # Initialize the synaptic weights with random values in [-1, 1)
        rows, columns = training_set.shape
        self.synapsis_weights = 2 * np.random.random((columns, 1)) - 1

        for iteration in range(iterations):
            output = self.think(training_set)
            error = training_result - output
            # [M x N] . [N x 1] = [M x 1]: one adjustment per weight
            adjustment = np.dot(training_set.T, error * self._sigmoid_derivative(output))
            self.synapsis_weights += adjustment

    def think(self, inputs):
        # Forward pass: weighted sum of the inputs followed by the sigmoid
        return self._sigmoid(np.dot(inputs, self.synapsis_weights))
                            
        

In [80]:
# For a supplied 0/1 feature vector, detect which animals fit the features that are set
def rules(features):
    # Indices of the features that are set to 1 (works for any array shape)
    active = {int(i) for i in np.flatnonzero(features)}

    # Animals compatible with each feature
    animals_by_feature = {
        LITTLE_SIZE: (CAT, DOG),
        LIGHT_WEIGHT: (CAT,),
        HAS_TAIL: (CAT, DOG),
        HAS_MUSTACHE: (CAT,),
        HUGE: (DOG,),
        GOOD_AT_HEARING: (CAT, DOG),
        SLEEP_A_LOT: (CAT,),
        VERY_STRONG: (DOG,),
        FLEXIBLE: (CAT,),
    }

    # Start with no candidates; each active feature adds the animals it allows
    possible_animals = set()
    for feature, animals in animals_by_feature.items():
        if feature in active:
            possible_animals.update(animals)

    return possible_animals

In [82]:
# Let's put in some training data
# Each entry has 9 features; they can be any of those defined in the cells above.
#HAS_TAIL = 0  
#LIGHT_WEIGHT = 1  
#HAS_MUSTACHE = 2  
#HUGE = 3  
#GOOD_AT_HEARING = 4  
#SLEEP_A_LOT = 5  
#VERY_STRONG = 6  
#LITTLE_SIZE = 7
#FLEXIBLE = 8

#Training set
#Cat, Dog, Dog, Cat, Cat
animals_to_train = np.zeros((5,9))
defined_right_animals = np.array([[CAT],[DOG],[DOG],[CAT],[CAT,]])
#Cat
for feature in [HAS_TAIL, HAS_MUSTACHE,SLEEP_A_LOT, GOOD_AT_HEARING, LIGHT_WEIGHT, LITTLE_SIZE]:
    animals_to_train[0][feature] = 1

#Dog
for feature in (HAS_TAIL, LIGHT_WEIGHT, GOOD_AT_HEARING):
    animals_to_train[1][feature] = 1
    
#Dog#2
for feature in (HAS_TAIL, HUGE, GOOD_AT_HEARING, VERY_STRONG):
    animals_to_train[2][feature] = 1
    
#Cat#2
for feature in (HAS_TAIL, HUGE, HAS_MUSTACHE, SLEEP_A_LOT):
    animals_to_train[3][feature] = 1

#Cat#3
for feature in (FLEXIBLE, SLEEP_A_LOT):
    animals_to_train[4][feature] = 1
    
print(">> Training set:")
print(animals_to_train)

print(">> Training set right output animals")
print(defined_right_animals)

>> Training set:
[[1. 1. 1. 0. 1. 1. 0. 1. 0.]
 [1. 1. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 1. 1. 0. 1. 0. 0.]
 [1. 0. 1. 1. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 1.]]
>> Training set right output animals
[[0]
 [1]
 [1]
 [0]
 [0]]


In [83]:
# Let's train our network and ask it what it thinks about an unknown animal!

neuron = Perceptron()
# Train for 10000 iterations
neuron.train(animals_to_train, defined_right_animals,10000)


def show_features(features):
    _features = {
        0: 'HAS_TAIL',
        1: 'LIGHT_WEIGHT',
        2: 'HAS_MUSTACHE',
        3: 'HUGE',
        4: 'GOOD_AT_HEARING',
        5: 'SLEEP_A_LOT',
        6: 'VERY_STRONG',
        7: 'LITTLE_SIZE',
        8: "FLEXIBLE"
    }

    for index, is_set in enumerate(features):
        if is_set:
            print("Has", _features[index])
            
def show_animals():
    _animals={
        'CAT':CAT,
        'DOG':DOG
    }
    for animal in _animals:
        print("Animal",animal, _animals[animal])

#possible_features = [HAS_MUSTACHE, HAS_TAIL, HUGE, VERY_STRONG, SLEEP_A_LOT, GOOD_AT_HEARING, LIGHT_WEIGHT, LITTLE_SIZE]
possible_features = [HAS_TAIL, LITTLE_SIZE, FLEXIBLE, SLEEP_A_LOT]

unknown_animal_features = np.zeros((1,9))
for feature in possible_features:
    unknown_animal_features[0][feature] = 1
    
print("Unknown animal features:")
show_features(list(unknown_animal_features[0]))
print()
print("Our animals:")
show_animals()
print()
print("Possible animals with specified features:")
print(rules(unknown_animal_features))
print()
print("A perceptron thinks it is:")
print(neuron.think(unknown_animal_features))

Unknown animal features:
Has HAS_TAIL
Has SLEEP_A_LOT
Has LITTLE_SIZE
Has FLEXIBLE

Our animals:
Animal CAT 0
Animal DOG 1

Possible animals with specified features:
{0, 1}

A perceptron thinks it is:
[[0.00204806]]
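
The perceptron returns a value between 0 and 1 rather than a label. One way to turn it into an animal is to threshold it. The helper `classify` and the 0.5 cut-off below are assumptions made for illustration and are not part of the code above.

```python
import numpy as np

CAT, DOG = 0, 1  # same labels as defined earlier in this document

def classify(outputs, threshold=0.5):
    # Map each sigmoid output to a label: below the threshold -> CAT, otherwise DOG
    outputs = np.asarray(outputs)
    return np.where(outputs < threshold, CAT, DOG)

# The value printed by the perceptron above
labels = classify([[0.00204806]])
print(labels)
```

With this cut-off an output that close to 0 maps to `CAT`, which agrees with the training set, where the `SLEEP_A_LOT` and `FLEXIBLE` features appear only on cats.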
