# Neural Networks <a class="tocSkip">

**Given is the following single neuron perceptron. In this one-layer perceptron model, the neuron calculates a
weighted sum of inputs. Then, it applies a threshold to the result: if the sum is larger than zero, the output is 1.
Otherwise, the output is zero.** 

Consider the following examples, where Z is the desired output (indeed, this is the OR function).  

**In this question, you will apply the Perceptron update algorithm to automatically learn the network’s weights, so
that it classifies correctly all the training examples. The algorithm is simple:**

**Iterate through the training examples, one by one (if the last example was used, and the algorithm hasn’t converged
yet, start again from the first example, and so forth)**  

For every example i:  

* Calculate the net’s output Yi.  
* Multiply the error (Zi − Yi) by the learning rate η. Add this correction to any weight for which the input in the example was non-zero.  
That is, if for the current example i X1 = 1, then update W1 → W1 + η(Zi − Yi), etc.  
* If the network outputs the correct result for all of the training set examples, terminate.
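
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `output` and `update` are made up here, the fixed bias of -0.5 is taken from the worked solution further down, and `fractions.Fraction` is used instead of floats so that a net input of exactly 0 is not distorted by rounding.

```python
from fractions import Fraction

BIAS = Fraction(-1, 2)  # assumption: fixed bias of -0.5, as in the worked solution
ETA = Fraction(1, 5)    # learning rate 0.2


def output(w, x):
    """Threshold unit: 1 if the net input is strictly larger than zero, else 0."""
    net = BIAS + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net > 0 else 0


def update(w, x, z, eta=ETA):
    """Add eta * (z - y) to every weight whose input is non-zero."""
    error = z - output(w, x)
    return [wi + eta * error if xi != 0 else wi for wi, xi in zip(w, x)]


# One update with W1 = 0.1, W2 = 0.3 on the example X1 = 0, X2 = 1, Z = 1.
w = update([Fraction(1, 10), Fraction(3, 10)], (0, 1), 1)
print([float(wi) for wi in w])  # [0.1, 0.5]
```

Only `W2` changes in this call, because `X1` is zero for that example.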

# Training

**Apply the algorithm for the given training examples. Use learning rate η = 0.2. Assign the weights the initial
values W1 = 0.1, W2 = 0.3**  
**Give your results as specified in Table 1.**  
**You should expect to get the final weights within only a few passes over the training examples.**

The first training example ($i=0$) gives a net input of $-0.5 + 0.1 \cdot 0 + 0.3 \cdot 0 = -0.5$, which is passed through the hardlim (hard-limit threshold) activation function, resulting in an output of 0.  
Therefore, the error is $Z_0 - Y_0 = 0$, so the weights remain the same.  

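The second example, however, gives a net input of $-0.5 + 0.1 \cdot 0 + 0.3 \cdot 1 = -0.2$ and hence an output of 0, while the desired output is 1. The error is therefore $+1$, so the new weights are:  
$W_1 = 0.1$ (unchanged, since the corresponding input $X_1$ is zero)  
$W_2 = W_2 + η(Z_1 - Y_1) = 0.3 + 0.2 \cdot 1 = 0.5$  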

Similarly, we keep cycling through the training examples until all of them are classified correctly within a single pass.

| X1 | X2 | W1 (before) | W2 (before) | Z | (net input) Y | Error (Z − Y) | W1 (after) | W2 (after) |
|----|----|----|----|---|---|-------|----|----|
|0| 0| 0.1| 0.3| 0 |(-0.5) 0| 0| 0.1| 0.3|
|0| 1| 0.1| 0.3 |1 |(-0.2) 0| 1| 0.1| 0.5|
|1| 0| 0.1| 0.5 |1 |(-0.4) 0| 1| 0.3| 0.5|
|1| 1| 0.3| 0.5 |1 |(0.3) 1 |0 |0.3 |0.5|
|0| 0| 0.3| 0.5 |0 |(-0.5) 0| 0| 0.3| 0.5|
|0| 1| 0.3| 0.5 |1 |(0)  0  | 1| 0.3| 0.7|
|1| 0| 0.3| 0.7 |1 |(-0.2) 0| 1| 0.5| 0.7|
|1| 1| 0.5| 0.7 |1 |(0.7) 1 |0 |0.5 |0.7|
|0| 0| 0.5| 0.7 |0 |(-0.5) 0| 0| 0.5| 0.7|
|0| 1| 0.5| 0.7 |1 |(0.2) 1 |0 |0.5 |0.7|
|1| 0| 0.5| 0.7 |1 |(0) 0   |1 |0.7 |0.7|
|1| 1| 0.7| 0.7 |1 |(0.9) 1 |0 |0.7 |0.7|
|0| 0| 0.7| 0.7 |0 |(-0.5) 0| 0| 0.7| 0.7|
|0| 1| 0.7| 0.7 |1 |(0.2) 1 |0 |0.7 |0.7|
|1| 0| 0.7| 0.7 |1 |(0.2) 1 |0 |0.7 |0.7|
|1| 1| 0.7| 0.7 |1 |(0.9) 1 |0 |0.7 |0.7|

Since all examples were classified correctly during the last epoch (a full pass through the entire dataset), we terminate the learning process. The final weights are $W_1 = 0.7$ and $W_2 = 0.7$.
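
The hand computation can be cross-checked with a short script. The sketch below is an assumption-laden illustration rather than part of the exercise: the names `train` and `OR_EXAMPLES` and the `max_epochs` safety cap are invented here, and exact fractions are used because with binary floats a sum such as 0.1 + 0.2 is not exactly 0.3, which matters when a net input is exactly 0.

```python
from fractions import Fraction


def train(examples, w, eta, bias, max_epochs=100):
    """Cycle through the examples until one full pass has no errors.

    Returns (weights, epochs_used, converged).
    """
    w = list(w)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, z in examples:
            net = bias + sum(wi * xi for wi, xi in zip(w, x))
            y = 1 if net > 0 else 0          # threshold at zero
            error = z - y
            if error != 0:
                mistakes += 1
                # Correct only the weights whose input is non-zero.
                w = [wi + eta * error if xi != 0 else wi
                     for wi, xi in zip(w, x)]
        if mistakes == 0:
            return w, epoch, True
    return w, max_epochs, False


# The OR training set, in the order used in the table.
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, epochs, converged = train(
    OR_EXAMPLES,
    w=[Fraction(1, 10), Fraction(3, 10)],  # W1 = 0.1, W2 = 0.3
    eta=Fraction(1, 5),                    # learning rate 0.2
    bias=Fraction(-1, 2),                  # fixed bias of -0.5
)
print([float(wi) for wi in weights], epochs, converged)  # [0.7, 0.7] 4 True
```

The fourth pass is the first one without any error, which matches the last four rows of the table.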

# Gradient update rule

**The perceptron training algorithm is in fact a simple gradient descent update. In this question, you will derive
this algorithm.**

**The approach for training a perceptron here is to minimize a squared error function.**

* Give the definition of a squared error function, E, in terms of W1, W2, Xi1, Xi2 and Zi.  
* Each weight should now be updated by taking a small step in the opposite direction of its gradient (so as to
minimize the error):  

W' = W − η∇E(W)  

**Show how this translates into the algorithm that you applied in the previous question.**

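$$
E = \frac{1}{2}\sum_i E_i^2
= \frac{1}{2}\sum_i \left(Z_i - (-0.5 + W_1 X_{i1} + W_2 X_{i2})\right)^2
$$

Here $E_i = Z_i - (-0.5 + W_1 X_{i1} + W_2 X_{i2})$ is the error of the linear (pre-threshold) output on example $i$.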

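We take the gradient of the cost function with respect to the parameters $W_1$ and $W_2$. Since the weights are updated after every single example, only the term of the current example $i$ is differentiated:  

$$
\frac{\partial}{\partial W_1}\left(\tfrac{1}{2}E_i^2\right) = \left(Z_i - (-0.5 + W_1 X_{i1} + W_2 X_{i2})\right)(-X_{i1}) = -E_i X_{i1}
$$  

$$
\frac{\partial}{\partial W_2}\left(\tfrac{1}{2}E_i^2\right) = \left(Z_i - (-0.5 + W_1 X_{i1} + W_2 X_{i2})\right)(-X_{i2}) = -E_i X_{i2}
$$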


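The gradient descent update rule for $W_1$ and $W_2$ becomes:  

$$
W_1' = W_1 + η E_i X_{i1}
$$  
$$
W_2' = W_2 + η E_i X_{i2}
$$  

Replacing the linear error $E_i$ by the error of the thresholded output, $Z_i - Y_i$, gives:  

$$
W_1' = W_1 + η (Z_i - Y_i) X_{i1}
$$  
$$
W_2' = W_2 + η (Z_i - Y_i) X_{i2}
$$  

This is the same as the perceptron update rule: the inputs are binary, so the factor $X_{i1}$ (or $X_{i2}$) is either 0 or 1, and the correction $η(Z_i - Y_i)$ is applied exactly to those weights for which the input in the example was non-zero.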
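
As a sanity check on the derivation, the analytic per-example gradient $-E_i X_{ik}$ can be compared against a central finite-difference estimate. This is a minimal sketch with made-up helper names (`squared_error`, `analytic_gradient`, `numeric_gradient`); it uses the linear (pre-threshold) output, as in the squared error defined above, and assumes the fixed bias of -0.5.

```python
def squared_error(w, x, z, bias=-0.5):
    """Per-example squared error 0.5 * E_i**2 of the linear output."""
    net = bias + w[0] * x[0] + w[1] * x[1]
    return 0.5 * (z - net) ** 2


def analytic_gradient(w, x, z, bias=-0.5):
    """Gradient from the derivation: d(0.5 * E_i**2)/dW_k = -E_i * X_ik."""
    e = z - (bias + w[0] * x[0] + w[1] * x[1])
    return [-e * x[0], -e * x[1]]


def numeric_gradient(w, x, z, h=1e-6):
    """Central finite differences, one weight at a time."""
    grads = []
    for k in range(len(w)):
        up, down = list(w), list(w)
        up[k] += h
        down[k] -= h
        grads.append((squared_error(up, x, z) - squared_error(down, x, z)) / (2 * h))
    return grads


w, x, z = [0.1, 0.3], (1, 0), 1
analytic = analytic_gradient(w, x, z)
numeric = numeric_gradient(w, x, z)
print(analytic, numeric)
```

For this example `X2` is zero, so the second gradient component vanishes and a gradient step leaves `W2` untouched, which is the "non-zero input" condition of the perceptron rule.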

# Noisy examples

**In practice, the training examples may be noisy. Suppose that there are contradicting examples in the training
set: for example, an additional example, where X1 = 1, X2 = 1, Z = 0. How do you think this will affect the
algorithm’s behavior? (you are welcome to go ahead and try).**

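Such contradicting examples make the training set not linearly separable: the same input (X1 = 1, X2 = 1) is labelled both 1 and 0, so no choice of weights can classify both examples correctly. Since a single-layer perceptron can only separate linearly separable cases, every pass contains at least one misclassified example, the weights keep changing, and the algorithm never converges but runs indefinitely.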
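
This behaviour can be tried out with a small experiment. The sketch below is illustrative only: the names `mistakes_per_epoch` and `NOISY_EXAMPLES` and the choice of 50 passes are arbitrary assumptions, the contradicting example is simply appended to the end of the OR training set, and exact fractions are used to avoid floating-point rounding at the threshold.

```python
from fractions import Fraction

# OR training set plus the contradicting example X1 = 1, X2 = 1, Z = 0.
NOISY_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1), ((1, 1), 0)]


def mistakes_per_epoch(examples, w, eta, bias, epochs):
    """Run a fixed number of passes and count the errors made in each one."""
    w = list(w)
    history = []
    for _ in range(epochs):
        mistakes = 0
        for x, z in examples:
            net = bias + sum(wi * xi for wi, xi in zip(w, x))
            y = 1 if net > 0 else 0
            error = z - y
            if error != 0:
                mistakes += 1
                w = [wi + eta * error if xi != 0 else wi
                     for wi, xi in zip(w, x)]
        history.append(mistakes)
    return history


history = mistakes_per_epoch(
    NOISY_EXAMPLES,
    w=[Fraction(1, 10), Fraction(3, 10)],
    eta=Fraction(1, 5),
    bias=Fraction(-1, 2),
    epochs=50,
)
print(all(m >= 1 for m in history))  # True: no pass is ever error-free
```

Because the termination test requires an error-free pass, a practical implementation would need a cap on the number of passes, as this sketch does with `epochs`.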