In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
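# 'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in Matplotlib 3.6
plt.style.use('seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid')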
import numpy as np



In [None]:
# HTML is exposed by IPython.display; importing it from IPython.core.display is deprecated
from IPython.display import Image, IFrame, HTML, Latex, display

# Perceptron and theory

## Linear perceptron

A linear perceptron maps an input $x \in \mathbb{R}^n$ ($n$ values) to a single output, for instance a binary output $F(x) \in \{0,1\}$ (one value, which is either zero or one).

This function decomposes into two parts : 

1. **linear function** : defined by $n$ weights $a_1,...,a_n$ :
$$f(x) = a_1x_1 + a_2x_2 +...+ a_n x_n$$

Example : $n=2$, $(a_1,a_2) = (2,3)$ so that $f(x_1,x_2) = 2x_1 + 3x_2$ and $f(4,-1) = 5$.

In [None]:
x = np.array([4,-1])  #n-vector x
a = np.array([2,3])   #n weights

In [None]:
def f(x,a):
    return sum(x*a)

In [None]:
print(f(x,a))

2. **activation function** : it is a function $\varphi \colon \mathbb{R} \to \mathbb{R}$.

#### Example : Heaviside function

In [None]:
def H(y):
    if y<0:
        return 0
    else:
        return 1

In [None]:
H(-3)

In [None]:
H(f(x,a))

In [None]:
y=np.linspace(-6,6,300)
z=np.array([H(t) for t in y])
plt.plot(y,z,linewidth=7.0,color="red")

The output is the number $H(f(x)) \in \{0,1\}$.

#### Example : ReLu function

The Rectified Linear Unit function is the following :

In [None]:
def ReLu(y):
    if y<0:
        return 0
    else:
        return y

In [None]:
y=np.linspace(-6,6,300)
z=np.array([ReLu(t) for t in y])

In [None]:
plt.plot(y,z,linewidth=7.0,color="red")

#### Example : Sigmoid function

The sigmoid function is the following :

In [None]:
def sigmoid(y):
    return 1/(1+np.exp(-y))

In [None]:
y=np.linspace(-6,6,300)
z=np.array([sigmoid(t) for t in y])

In [None]:
plt.plot(y,z,linewidth=7.0,color="red")

#### Exercise : compute the derivative of the sigmoid function and plot its graph. 

In [None]:
def dsigmoid(y):
    return np.exp(-y)/(1+np.exp(-y))**2

In [None]:
y=np.linspace(-6,6,300)
z=np.array([sigmoid(t) for t in y])
dz=np.array([dsigmoid(t) for t in y])
#plt.plot(y,z,linewidth=7.0,color="red")
plt.plot(y,dz,linewidth=4.0,color="blue")
plt.show()
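
One way to check the answer to this exercise is to compare the closed form with a finite-difference estimate. The sketch below is self-contained: it redefines `sigmoid` and `dsigmoid` as in the cells above, while the helper `numerical_derivative` and its step `h` are assumptions introduced only for illustration. It also checks the identity $\sigma' = \sigma(1-\sigma)$.

```python
import numpy as np

def sigmoid(y):
    return 1 / (1 + np.exp(-y))

def dsigmoid(y):
    # closed form of the derivative, as in the cell above
    return np.exp(-y) / (1 + np.exp(-y))**2

def numerical_derivative(func, y, h=1e-5):
    # central difference estimate of func'(y) (hypothetical helper)
    return (func(y + h) - func(y - h)) / (2 * h)

y = np.linspace(-6, 6, 300)
# largest gap between the closed form and the numerical estimate
max_gap = np.max(np.abs(dsigmoid(y) - numerical_derivative(sigmoid, y)))
# largest gap between the closed form and sigmoid * (1 - sigmoid)
identity_gap = np.max(np.abs(dsigmoid(y) - sigmoid(y) * (1 - sigmoid(y))))
print(max_gap, identity_gap)
```

Both gaps should be tiny, which suggests that the formula used in `dsigmoid` is correct.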

### Neuron representation

A linear perceptron is represented with a **neuron** as follows : <img src="img/Perceptron1.png"> </img> 

### Example :

$n=2$, $(a_1,a_2) = (2,3)$ so that $f(x_1,x_2) = 2x_1 + 3x_2$ and the activation function Heaviside : the output is

- 1 if $2x_1+3x_2 \geq 0$ ;
- 0 otherwise

In [None]:
display(IFrame('https://www.geogebra.org/calculator/x63nersc?embed',900,400))

#### Exercise
How to find a perceptron separating blue circles from red squares ?
https://www.geogebra.org/calculator/zmyaznwf

In [None]:
display(IFrame('https://www.geogebra.org/calculator/zmyaznwf?embed',600,400))

## Affine perceptron

We introduce a bias $a_0$ : this is an $(n+1)$-th weight which defines an affine function $f(x_1,...,x_n) = a_1x_1+...+a_nx_n + a_0$. An affine perceptron is represented by a neuron as follows :
<img src="Perceptron2.png"> </img> 

Example : with 2 inputs, an affine perceptron splits a 2-dimensional space into 2 half planes.
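
As a minimal sketch of this definition, the function below evaluates an affine perceptron with Heaviside activation. The weights $(2,3)$ are those of the earlier example; the bias $a_0 = -6$ and the names `heaviside` and `affine_perceptron` are hypothetical choices made for illustration. With these values the boundary is the line $2x_1 + 3x_2 = 6$.

```python
import numpy as np

def heaviside(y):
    return 1 if y >= 0 else 0

def affine_perceptron(x, a, a0):
    # a = (a_1, ..., a_n) are the weights, a0 is the bias
    return heaviside(np.dot(a, x) + a0)

a = np.array([2, 3])
a0 = -6  # hypothetical bias, chosen only for this example

for point in [(4, -1), (3, 0), (1, 2), (0, 0)]:
    # points with 2*x1 + 3*x2 - 6 >= 0 are mapped to 1, the others to 0
    print(point, affine_perceptron(np.array(point), a, a0))
```

Changing `a0` translates the boundary line without changing its direction.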

## Perceptron theory

### Or, and, xor

In computer science, a boolean variable is a variable $x$ that has 1 of 2 possible values (TRUE or FALSE). In a boolean algebra, if $x$ and $y$ are boolean, we can define ```x OR y```.

We choose a graphic representation : TRUE is number 1, FALSE is number 0, ```x OR y``` is a point with $(x,y)$ coordinates in the plane. This point is a red square if ```x OR y = TRUE```, a blue circle otherwise.

<img src="img/or_perceptron1.png"> </img> 

1. Can I realize this operation ```x OR y``` with a perceptron ?

<img src="img/or_perceptron2.png"> </img> 

Yes ! For example, take the weights $(a_1,a_2,a_0) = (1,1,-1)$. 

2. In the same way, can I realize the operation ```x AND y``` with a perceptron ?

<img src="img/and_perceptron1.png"> </img> 

Yes ! For example, take the weights $(a_1,a_2,a_0) = (1,1,-1.5)$. 

3. In the same way, can I realize the operation ```x XOR y``` with a perceptron ?

<img src="img/xor_perceptron.png"> </img> 

No ! You cannot find a straight line that separates these two kinds of points.
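
The three answers above can be checked numerically. The sketch below computes the truth tables of the perceptrons with weights $(1,1,-1)$ and $(1,1,-1.5)$, then runs a coarse grid search for weights realizing ```x XOR y```. The helper names and the grid are assumptions made for illustration; an empty search result is consistent with the claim but is not a proof of it.

```python
import itertools
import numpy as np

def heaviside(y):
    return 1 if y >= 0 else 0

def perceptron(x, y, weights):
    a1, a2, a0 = weights
    return heaviside(a1 * x + a2 * y + a0)

def truth_table(weights):
    # outputs for the inputs (0,0), (0,1), (1,0), (1,1), in that order
    return [perceptron(x, y, weights)
            for x, y in itertools.product([0, 1], repeat=2)]

or_table = truth_table((1, 1, -1))
and_table = truth_table((1, 1, -1.5))

# coarse grid of candidate weights (a1, a2, a0), hypothetical range and step
grid = np.arange(-3, 3.25, 0.25)
xor_solutions = [w for w in itertools.product(grid, repeat=3)
                 if truth_table(w) == [0, 1, 1, 0]]

print(or_table, and_table, len(xor_solutions))
```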

The problem is to know whether the two sets of points are **linearly separable**.

<div class="alert alert-success" role="alert">
In an $n$-dimensional Euclidean space, two sets of points $A$ and $B$ are linearly separable if a hyperplane separates them : there exist $a_1,...,a_n,a_0$ such that for each $x \in A$, $\sum_{i=1}^n a_i x_i +a_0> 0$ and for each $x \in B$, $\sum_{i=1}^n a_i x_i + a_0 < 0$.     
    For finite sets, another way to say this is that their respective convex hulls are disjoint.
 </div>
 
 <div class="alert alert-danger" role="alert">	

In an $n$-dimensional Euclidean space, two sets of points $A$ and $B$ are linearly separable if there exists a perceptron that takes value 1 on $A$ and 0 on $B$.
</div>


A perceptron is a **linear classifier**.

#### Exercise 

Is it possible to realize the operation ```x OR y OR z``` with a perceptron ?

<img src="img/or_or_perceptron.png"> </img> 

### Wow, your perceptron is learning for the first time !

#### Learning rule

This learning rule is an example of supervised training, in which the learning rule is provided
with a set of examples of proper perceptron behavior: a collection of pairs $(x,t)$ where $x$ is an input and $t$ is the corresponding target output. As each input is applied to the network, the network output is compared
to the target. **The learning rule then adjusts the weights and biases
of the perceptron in order to move the perceptron output closer to the target.**

#### Test problem
These are 3  input/target pairs for our test problem :
$$x_1 = (1,2) \,;\, t_1 = 1 \qquad x_2 = (-1,2) \,;\, t_2 = 0 \qquad x_3 = (0,-1) \,;\, t_3 = 0$$

The perceptron for this problem should have two inputs and one output. To
simplify our development of the learning rule, we will begin with a network
without a bias, so that we are looking for two weights $a_1,a_2$. The activation function is Heaviside.

In [None]:
x=[];t=[]
x.append(np.array([1,2])) ; t.append(1)
x.append(np.array([-1,2])) ; t.append(0)
x.append(np.array([0,-1])) ; t.append(0)

#### Constructing Learning Rules
We set the weight vector $w = (a_1,a_2)$ to the following randomly generated values: $w = (1.0, -0.8)$. 

Then we execute the perceptron with the first input $x_1$ : output $y_1$ is equal to $0 \neq t_1$.



In [None]:
w = np.array([1.0,-0.8])
y = sum(w*x[0])
print(y)
y = H(y)
print(y)
y == t[0] #test if output y_1 is equal to t_1

**Rule** : if $t=1$ and $y=0$ then $w_{new} := w_{old} + x$.

Then we execute the perceptron with the second input $x_2$ and new weights:

In [None]:
w = w + x[0]
y = H(sum(w*x[1]))
y == t[1]

**Rule** : if $t=0$ and $y=1$ then $w_{new} := w_{old} - x$.

Then we execute the perceptron with the third input $x_3$ and new weights:

In [None]:
w = w - x[1]
y = H(sum(w*x[2]))
y == t[2]

In [None]:
y - t[2]

We apply the previous rule :

In [None]:
w = w - x[2]


Then we check :

In [None]:
for i in range(3):
    y = H(sum(w*x[i]))
    print(y == t[i])

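#### Unifying Learning Rules

Let $e = t - y$ be the error. The two rules above, together with the case $t = y$ where the weights are left unchanged, reduce to a single rule : $w_{new} := w_{old} + e\,x$.

This rule can be extended to train the bias by noting that a bias is simply
a weight whose input is always 1.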

#### Exercise :
Write a program that applies all the rules given above with different weights at start. Check the result.

Bonus : Represent with a 2d-graph the evolution of $w$ by plotting the corresponding linear classifier.  

In [None]:
n = len(x)
w = np.array([-0.5,-0.9])
for i in range(n):
    y = H(sum(w*x[i]))
    err = t[i]-y
    w = w + err * x[i]
for i in range(3):
    y = H(sum(w*x[i]))
    print(y == t[i])
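
A single pass over the three examples happens to be enough for this starting point, but in general several passes may be needed. The sketch below repeats passes until no example is misclassified and also trains a bias by appending a constant input 1 to each $x$. The function name `train_perceptron`, the cap `max_epochs` and the initial bias 0.0 are assumptions made for illustration.

```python
import numpy as np

def heaviside(y):
    return 1 if y >= 0 else 0

def train_perceptron(inputs, targets, w0, max_epochs=1000):
    # w0 holds the weights followed by the bias (weight of the constant input 1)
    w = np.array(w0, dtype=float)
    for epoch in range(max_epochs):
        n_errors = 0
        for x, t in zip(inputs, targets):
            x_aug = np.append(x, 1.0)          # augmented input (x, 1)
            err = t - heaviside(np.dot(w, x_aug))
            if err != 0:
                w = w + err * x_aug            # unified rule: w_new = w_old + e*x
                n_errors += 1
        if n_errors == 0:                      # a full pass without any error
            return w, epoch + 1
    return w, max_epochs

inputs = [np.array([1, 2]), np.array([-1, 2]), np.array([0, -1])]
targets = [1, 0, 0]
w_final, n_epochs = train_perceptron(inputs, targets, w0=[-0.5, -0.9, 0.0])
predictions = [heaviside(np.dot(w_final, np.append(x, 1.0))) for x in inputs]
print(w_final, n_epochs, predictions)
```

The last pass only confirms that every example is classified correctly, so the returned number of passes is one more than the number of passes that changed the weights.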

# Neural network

## 1. Layer

Now, given some input $(x_1,...,x_n)$, we organise multiple neurons in **layers**. 

<img src="https://www.researchgate.net/profile/Facundo_Bre/publication/321259051/figure/fig1/AS:614329250496529@1523478915726/Artificial-neural-network-architecture-ANN-i-h-1-h-2-h-n-o.png"> </img>

#### Example 0 : a neural network with 2 layers : 2 neurons (linear perceptron with ReLu activation function) on the 1st layer, 1 neuron (affine perceptron with Heaviside activation function) on the 2nd layer.

<img src="img/neural_layer_ex1.png"> </img>

**Exercise** : check that output is $1$ if input is $(4,7)$.

_Tip_ : we can use a matrix product : in Python, $A \times B$ is computed with this command : ```np.dot(A,B)```

In [None]:
def ReLu(y):
    if y<0:
        return 0
    else:
        return y

def H(y):
    if y<0:
        return 0
    else:
        return 1

In [None]:
X = np.array([4,7])
W1 = np.array([[2,-1],[-3,2]])
Y1 = np.dot(W1,X)
Y1 = [ReLu(y) for y in Y1]
W2 = np.array([4,5])
b2 = -1
Y2 = np.dot(W2,Y1)+b2
Y2 = H(Y2)
print(Y2)

### Example 1
<img src="img/neural_layer_ex2.png"> </img>



#### 1st layer, 1st neuron :
Is active if $-x+3y \geq 0$.

In [None]:
x=np.linspace(-6,6,300)
y=np.array([t/3 for t in x])
plt.plot(x,y,'k--')
plt.fill_between(x,y,np.max(y)+2,color="green",alpha=0.5)

plt.axis('equal')
plt.axis([-5,5,-3,3])
plt.show()

#### 1st layer, 2nd neuron :
Is active if $2x+y \geq 0$.

In [None]:
x=np.linspace(-6,6,300)
z=np.array([-2*t for t in x])
plt.plot(x,z,'k--')
plt.fill_between(x,z,np.max(z),color="green",alpha=0.5)

plt.axis('equal')
plt.axis([-5,5,-3,3])
plt.show()

#### 2nd layer, one neuron :

The output neuron realizes the boolean function ```x AND y``` (cf. previous course).

#### result of two layers :

The neural network has output equal to $1$ on the __intersection__ of the two half planes where the neurons of the 1st layer are equal to $1$. 

In [None]:
x=np.linspace(-6,6,300)
y=np.array([t/3 for t in x])
z=np.array([-2*t for t in x])
plt.plot(x,y,'k--')
plt.plot(x,z,'k--')
plt.fill_between(x,y,np.max(y)+2,color="green",alpha=0.3)
plt.fill_between(x,z,np.max(z),color="green",alpha=0.3)
plt.fill_between(x,z,np.max(z), where=y<z,color="red",alpha=1)
plt.fill_between(x,y,np.max(z), where=y>z,color="red",alpha=1)
plt.axis('equal')
plt.axis([-5,5,-3,3])
plt.show()

### Example 2
<img src="img/neural_layer_ex3.png"> </img>

Comments : the 1st layer is the same as before, and the output neuron is the ```OR``` neuron.

In [None]:
x=np.linspace(-6,6,300)
y=np.array([t/3 for t in x])
z=np.array([-2*t for t in x])
plt.plot(x,y,'k--')
plt.plot(x,z,'k--')
plt.fill_between(x,y,np.max(y)+2,color="red",alpha=1)
plt.fill_between(x,z,np.max(z),color="red",alpha=1)
#plt.fill_between(x,z,np.max(z), where=y<z,color="red",alpha=0.5)
#plt.fill_between(x,y,np.max(z), where=y>z,color="red",alpha=0.5)
plt.axis('equal')
plt.axis([-5,5,-3,3])
plt.show()

The neural network has output equal to $1$ on the __union__ of the two half planes where the neurons of the 1st layer are equal to $1$. 
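
Examples 1 and 2 can also be written as small functions and tested on a few points. The first layer below is read off the two inequalities $-x+3y \geq 0$ and $2x+y \geq 0$. The output weights $(1,1,-1.5)$ and $(1,1,-1)$ are assumed to be those of the ```AND``` and ```OR``` perceptrons seen earlier, since the exact values drawn in the figures are not restated in the text.

```python
import numpy as np

def heaviside(y):
    return 1 if y >= 0 else 0

# first layer: one row per neuron, no bias
W1 = np.array([[-1, 3],
               [2, 1]])

def first_layer(x, y):
    return np.array([heaviside(v) for v in np.dot(W1, np.array([x, y]))])

def network_and(x, y):
    # output neuron with the AND weights (1, 1, -1.5): intersection of half planes
    h = first_layer(x, y)
    return heaviside(h[0] + h[1] - 1.5)

def network_or(x, y):
    # output neuron with the OR weights (1, 1, -1): union of half planes
    h = first_layer(x, y)
    return heaviside(h[0] + h[1] - 1)

for point in [(1, 1), (3, 0), (-1, -1)]:
    print(point, network_and(*point), network_or(*point))
```

The point $(3,0)$ lies in only one of the two half planes, so it is accepted by the union network and rejected by the intersection network.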

### Exercise :
Find a neural network whose output is 1 on the red area, 0 otherwise.
<img src="img/neural_layer_ex4.png"> </img>

In [None]:
def H(y):
    if y<0:
        return 0
    else:
        return 1

def nnetwork(x,y):
    X = np.array([x,y])
    W1 = np.array([[-1,-3],[-3,5],[-5,-1]])
    B1 = np.array([7,7,-7])
    W2 = np.array([[1,1,1]])
    B2 = -3
    layer1 = np.dot(W1,X)+B1
    layer1 = np.array([H(t) for t in layer1])
    layer2 = np.dot(W2,layer1)+B2
    layer2 = H(layer2)
    output = layer2
    return output


In [None]:
n=100
xmin = -6
xmax = 5
ymin = -5
ymax = 5
X = np.linspace(xmin,xmax,n)
Y = np.linspace(ymin,ymax,n)
for x in X:
    for y in Y:
        if nnetwork(x,y) == 1:
            plt.plot(x,y,'.r')
plt.axis('equal')
plt.axis([xmin,xmax,ymin,ymax])
xtick=np.linspace(xmin,xmax,xmax-xmin+1)
ytick=np.linspace(ymin,ymax,ymax-ymin+1)
plt.xticks(xtick)
plt.yticks(ytick)
plt.show()

## 2. Theory

### How to realize ```XOR``` ?

#### Exercise : 
We have seen that one neuron is not sufficient to realize the operation ```XOR```. Find a neural network with two layers that realizes the operation ```XOR```.

Answer : 
<img src="img/xor_neuralnetwork.png"></img>
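
One possible choice of weights, not necessarily the one drawn in the figure, uses an ```OR``` neuron and an ```AND``` neuron on the first layer and an output neuron that fires when ```OR``` is true and ```AND``` is false. The sketch below is an assumption-based illustration of this idea with Heaviside activations.

```python
def heaviside(y):
    return 1 if y >= 0 else 0

def xor_network(x, y):
    h_or = heaviside(x + y - 1)            # first layer: OR neuron (1, 1, -1)
    h_and = heaviside(x + y - 1.5)         # first layer: AND neuron (1, 1, -1.5)
    return heaviside(h_or - h_and - 0.5)   # output: OR and not AND

xor_table = {(x, y): xor_network(x, y) for x in (0, 1) for y in (0, 1)}
print(xor_table)
```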

### Realizable sets : definition and properties

<div class="alert alert-success" role="alert">
A set $A$ is NN-realizable if there exists a neural network whose output is equal to $1$ in $A$, $0$ outside of $A$.
   </div>

<div class="alert alert-danger" role="alert">
Every convex $n$-polygon in $\mathbb{R}^2$ is NN-realizable with $n+1$ neurons. 
   </div>

<div class="alert alert-danger" role="alert">
If $A$ and $B$ are two NN-realizable sets then :
   <ol> 
       <li> $A \cup B$ is NN-realizable ;</li>
    <li> $A \cap B$ is NN-realizable ;</li>  
    <li> $\overline{A}$ is NN-realizable ;</li>
    <li> $A \backslash B$ is NN-realizable.</li>
    </ol>
    </div>

<div class="alert alert-danger" role="alert">
Every polygon in $\mathbb{R}^2$ is NN-realizable. Then, every Jordan curve (simple closed curve) can be approximated by a neural network. 
   </div>

### Universal approximation theorem 
Goal : approximate every continuous function $\mathbb{R} \to \mathbb{R}$ by a neural network. More precisely, let $f \colon [a;b] \to \mathbb{R}$ : we want to find a neural network whose output $F(x) \approx f(x)$ for all $x \in [a;b]$. To do this, assume that the output neuron has the identity $x \mapsto x$ activation function. Other neurons have Heaviside activation function.

#### Heaviside step functions

First trivial case :
<img src="img/step_function1.png"></img>

It is easy to shift the step to the left :
<img src="img/step_function2.png"></img>

or to the right :
<img src="img/step_function3.png"></img>

step down :
<img src="img/step_function4.png"></img>
<img src="img/step_function5.png"></img>

#### Rectangular functions
Just add two Heaviside step functions !
<img src="img/rect_function.png"></img>

#### Step functions
Just add some rectangular functions ! Be careful if the rectangles are contiguous.
<img src="img/step_function6.png"></img>

#### Continuous functions
Finally, notice that every continuous function $[a;b] \to \mathbb{R}$ can be uniformly approximated by a step function.

<img src="img/approx_function.png"></img>
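
The construction above can be sketched in code. The hidden layer has one Heaviside neuron $H(x - x_k)$ per subinterval of $[a;b]$, and the output neuron, with identity activation, adds the first height and then the successive jumps. The function names, the use of midpoints for the heights and the test function $\sin$ on $[0;\pi]$ are assumptions made for illustration.

```python
import numpy as np

def build_step_network(f, a, b, n):
    # split [a, b] into n equal subintervals; knots are their left ends
    edges = np.linspace(a, b, n + 1)
    knots = edges[:-1]
    midpoints = (edges[:-1] + edges[1:]) / 2
    heights = f(midpoints)                 # value of the step on each subinterval
    # output weights: first height, then the jump at each following knot
    out_weights = np.concatenate(([heights[0]], np.diff(heights)))
    return knots, out_weights

def evaluate(knots, out_weights, x):
    x = np.atleast_1d(x)
    # hidden layer: neuron k outputs H(x - x_k)
    hidden = (x[:, None] - knots[None, :] >= 0).astype(float)
    # output neuron: weighted sum with identity activation
    return hidden @ out_weights

knots, out_weights = build_step_network(np.sin, 0.0, np.pi, n=100)
x = np.linspace(0.0, np.pi, 1000)
max_error = np.max(np.abs(evaluate(knots, out_weights, x) - np.sin(x)))
print(max_error)
```

Increasing `n` adds hidden neurons and reduces the maximal error, which is the idea behind the uniform approximation.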

### In higher dimension :

#### Exercise :
Find a neural network that performs this 2-dimensional function :

<img src="img/step_function7.png"></img>

#### Answer : 

<img src="img/neural_2dim_step.png"></img>

## 3. Training rule for a 1-layer network

Let $W$ be the weight matrix : each row corresponds to a neuron of the layer. Let $b$ be the column matrix of biases : each row corresponds to a neuron of the layer. 

Let $e = t - y$ be the column matrix of errors (target minus output, one row per neuron), then the training rule for an input column $x$ is:

$$W^{new} = W^{old} + e \times x^T$$
$$b^{new} = b^{old} + e$$
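
A minimal sketch of one application of this rule is given below, assuming that the error is $e = t - y$ (target minus output, as for the single perceptron) and that the activation is Heaviside. The layer with 2 neurons and 3 inputs, and the numerical values, are hypothetical. The product $e \times x^T$ is an outer product, so it has the same shape as $W$.

```python
import numpy as np

def heaviside_vec(v):
    # Heaviside applied to each component
    return (v >= 0).astype(int)

def update_layer(W, b, x, t):
    # one application of the training rule for input x and target t
    y = heaviside_vec(W @ x + b)
    e = t - y                      # one error per neuron
    W_new = W + np.outer(e, x)     # e x^T: shape (neurons, inputs), like W
    b_new = b + e
    return W_new, b_new

# hypothetical layer: 2 neurons, 3 inputs
W = np.zeros((2, 3))
b = np.zeros(2)
x = np.array([1.0, -2.0, 0.5])
t = np.array([0, 1])

W_new, b_new = update_layer(W, b, x, t)
print(W_new)
print(b_new)
```

Only the row of the neuron whose output was wrong is modified; the other row is left unchanged because its error is zero.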

## 4. How to pick an architecture

Problem specifications help define the network in the following ways:
1. Number of network inputs = number of problem inputs
2. Number of neurons in output layer = number of problem outputs
3. Output layer transfer function choice at least partly determined by
problem specification of the outputs

**Exercise**

A single-layer neural network is to have six inputs and two outputs.
The outputs are to be limited to and continuous over the
range 0 to 1. What can you tell about the network architecture?
Specifically:
* How many neurons are required?
* What are the dimensions of the weight matrix?
* What kind of transfer functions could be used?
* Is a bias required?

**Answer**: 
* Two neurons, one for each output, are required.
* The weight matrix has two rows corresponding to the two neurons and
six columns corresponding to the six inputs. (The product is a two-element
vector.)
* Of the transfer functions we have discussed, the sigmoid transfer function
would be most appropriate.
* Not enough information is given to determine if a bias is required.
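
The shapes in this answer can be illustrated with a short sketch. The weights and the input below are arbitrary random values (an assumption made only to show the dimensions), and the sigmoid is used as the transfer function so that both outputs lie between 0 and 1.

```python
import numpy as np

def sigmoid(y):
    return 1 / (1 + np.exp(-y))

rng = np.random.default_rng(0)    # arbitrary values: only the shapes matter here
W = rng.normal(size=(2, 6))       # two neurons (rows), six inputs (columns)
b = np.zeros(2)                   # a bias may or may not be needed
x = rng.normal(size=6)            # one input vector with six components

output = sigmoid(W @ x + b)       # two-element vector with values in (0, 1)
print(W.shape, output.shape, output)
```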

## Problem

We have a classification problem with four classes of input vector. The four classes are : 
* class 1 : $x_1 = (1,1)$ and $x_2 = (1,2)$
* class 2 : $x_3 = (2,-1)$ and $x_4 = (2,0)$
* class 3 : $x_5 = (-1,2)$ and $x_6 = (-2,1)$
* class 4 : $x_7 = (-1,-1)$ and $x_8 = (-2,-2)$




a) Design a neural network to solve this problem.

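With Heaviside outputs, 2 neurons give $2^2 = 4$ possible output pairs, one per class. We therefore need 2 neurons, and we must check that each neuron can split the 4 classes into 2 sets of 2 classes with a straight line.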

<img src = "img/NNproblem1.png"> </img>

The answer is yes. 

Then we have to choose which value is expected according to the class of the input. Let us choose these target values :

* class 1 : $t_1 = t_2 = (0,0)$
* class 2 : $t_3 = t_4 = (0,1)$
* class 3 : $t_5 = t_6 = (1,0)$
* class 4 : $t_7 = t_8 = (1,1)$

Then we can  graphically find suitable weights for each neuron: $w_1 = (-3,-1)$ and $w_2 = (1,-2)$. 

It is easy to find correct bias by picking a point on each boundary line: $b_1 = 1$ and $b_2 = 0$. 
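
The weights and biases found graphically can be checked against the eight input/target pairs. The sketch below assumes Heaviside activations and stacks $w_1$ and $w_2$ as the rows of a matrix `W`.

```python
import numpy as np

W = np.array([[-3, -1],    # w_1
              [1, -2]])    # w_2
b = np.array([1, 0])       # b_1, b_2

inputs = np.array([[1, 1], [1, 2], [2, -1], [2, 0],
                   [-1, 2], [-2, 1], [-1, -1], [-2, -2]])
targets = np.array([[0, 0], [0, 0], [0, 1], [0, 1],
                    [1, 0], [1, 0], [1, 1], [1, 1]])

# each row of outputs is the pair of neuron outputs for one input
outputs = (inputs @ W.T + b >= 0).astype(int)
print(outputs)
print(np.array_equal(outputs, targets))
```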

b) Train a perceptron network to solve this problem
using the perceptron learning rule.

_Tip: be careful of the size of your matrix when you make a product. Here is an example of product:_
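
As a sketch of such a product, `np.outer(e, x)` builds the matrix $e \times x^T$ from two 1-dimensional arrays: with 2 neurons and 2 inputs, it has 2 rows and 2 columns, like $W$. The training loop below applies the 1-layer training rule to the eight points of the problem. Starting from zero weights and capping the number of passes at 1000 are assumptions made for illustration.

```python
import numpy as np

inputs = np.array([[1, 1], [1, 2], [2, -1], [2, 0],
                   [-1, 2], [-2, 1], [-1, -1], [-2, -2]])
targets = np.array([[0, 0], [0, 0], [0, 1], [0, 1],
                    [1, 0], [1, 0], [1, 1], [1, 1]])

W = np.zeros((2, 2))   # one row per neuron
b = np.zeros(2)

for epoch in range(1000):
    n_errors = 0
    for x, t in zip(inputs, targets):
        y = (W @ x + b >= 0).astype(int)   # Heaviside on each neuron
        e = t - y                           # error vector, one entry per neuron
        if np.any(e != 0):
            W = W + np.outer(e, x)          # shapes: (2,) outer (2,) gives (2, 2)
            b = b + e
            n_errors += 1
    if n_errors == 0:                       # a full pass without any error
        break

outputs = (inputs @ W.T + b >= 0).astype(int)
print(W, b)
print(np.array_equal(outputs, targets))
```

The weights found this way generally differ from those chosen graphically, since many pairs of boundary lines separate the four classes.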

## Gradient for a neural network

We consider a neural network with $n$ inputs $(x_1,...,x_n)$ and one output. That defines a function $F \colon \mathbb{R}^n \to \mathbb{R}$, $(x_1,...,x_n) \mapsto F(x_1,...,x_n)$.

Let $(a_1,...,a_m)$ be the set of weights of this neural network. Now we consider $(a_1,...,a_m)$ as variables and the function $\widetilde{F} \colon (a_1,...,a_m) \mapsto \widetilde{F}(a_1,...,a_m)$. 

In particular, we are interested in 
$$\frac{\partial \widetilde{F}}{\partial a_j}$$ for all $j \in \{1,...,m\}$ and then the gradient of a cost function $E = (\widetilde{F}-y_0)^2$.
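
Assuming that $y_0$ denotes the expected output for the input under consideration (it is not defined above), the chain rule gives the partial derivatives of the cost from those of $\widetilde{F}$ :

$$\frac{\partial E}{\partial a_j} = 2\left(\widetilde{F}(a_1,...,a_m)-y_0\right)\frac{\partial \widetilde{F}}{\partial a_j} \qquad \text{so that} \qquad \nabla E = 2\left(\widetilde{F}-y_0\right)\nabla \widetilde{F}$$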

## Gradient descent

#### Algorithm : 
To minimize a function $f$, gradient descent starts from a point $x_0$ and builds a sequence $x_0,x_1,...$ defined by this algorithm :
1. Compute gradient : $\nabla f(x_k)$ ;
2. Stopping criterion : stop if $||\nabla f(x_k)||<\varepsilon$ ;
3. Choose a step value $\alpha_k >0$ ;
4. Iteration : $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$.
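
The four steps can be sketched as a short self-contained function. The constant step, the tolerance `eps`, the cap `max_iter` and the test function $f(x,y) = x^2 + 3y^2$ are assumptions chosen for illustration; this sketch is independent of the local helper modules imported in the next cell.

```python
import numpy as np

def gradient_descent(grad_f, x0, step=0.1, eps=1e-6, max_iter=1000):
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(max_iter):
        g = grad_f(x)                      # 1. compute the gradient
        if np.linalg.norm(g) < eps:        # 2. stopping criterion
            break
        x = x - step * g                   # 3-4. constant step, then iteration
        path.append(x.copy())
    return x, path

def grad_f(p):
    # gradient of the illustrative function f(x, y) = x**2 + 3*y**2
    return np.array([2 * p[0], 6 * p[1]])

x_min, path = gradient_descent(grad_f, x0=[-1.0, -1.0])
print(x_min, len(path) - 1)
```

With this function, each iteration multiplies the first coordinate by $0.8$ and the second by $0.4$, so the sequence converges to the minimum $(0,0)$.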

In [None]:

from descent import *
from descente_stochastique import *
from descente_lot import *





def exemple1():
    # 2-var function
    def f(x, y):
        return x**2 + 3*y**2
    
    # handmade gradient
    def grad_f(x, y):
        g = [2*x, 6*y]
        return np.array(g)

    # Test
    print("--- gradient descent ---")
    # alternative starting point and step : X0 = np.array([2, 1]), my_step = 0.2
    X0 = np.array([-1, -1])
    my_step = 0.1
    display_descent(f, grad_f, X0, delta=my_step, nmax = 21)
    graphic_descent_2var_2d(f, grad_f, X0, delta=my_step, nmax = 10, zone = (-2.5,2.5,-1.5,1.5) ) 

    return

In [None]:
exemple1()

## First example with one neuron

### Model

We want to separate the plane according to these two sets of points : 

blue circles : (0, 3), (1, 1.5), (1, 4), (1.5, 2.5), (2, 2.5), (3, 3.5), (3.5, 3.25), (4, 3), (4, 4), (5, 4)

red squares : (1, 1), (2, 0.5), (2, 2), (3, 1.5), (3, 2.75), (4, 1), (4, 2.5), (4.5, 3), (5, 1), (5, 2.25).

with a single neuron perceptron : 

<img src="img/propagation_ex1.png"></img>

The activation function is the sigmoid function.

<img src="img/propagation_ex1_1.png"></img>

The cost function is 
$$E(a,b,c) = \frac{1}{N}\sum_{i=1}^N E_i(a,b,c)$$
where $E_i = (F(x_i,y_i)-t_i)^2$ and $t_i=1$ when $(x_i,y_i)$ is a red square, $t_i=0$ when $(x_i,y_i)$ is a blue circle. 

### Data : training set

In [None]:
blue_points = [(0, 3), (1, 1.5), (1, 4), (1.5, 2.5), (2, 2.5), (3, 3.5), (3.5, 3.25), (4, 3), (4, 4), (5, 4)]
red_points = [(1, 1), (2, 0.5), (2, 2), (3, 1.5), (3, 2.75), (4, 1), (4, 2.5), (4.5, 3), (5, 1), (5, 2.25)]

In [None]:
target = []
points = []
for x,y in blue_points:
    target.append(0)
    points.append((x,y))
    plt.scatter(x,y,color='blue')
for x,y in red_points:
    target.append(1)
    points.append((x,y))
    plt.scatter(x,y,color='red',marker='s')
plt.show()



### Gradient descent

We want to find the best weights $W=(a,b,c)$ by iteration : initialize $W_0 = (a_0,b_0,c_0)$, for example $W_0 = (0,1,-2)$ and fix a step value $\delta = 1$. The sequence of weights is defined by
$$W_{k+1} = W_k - \delta \cdot \nabla E(W_k)$$

The local error is :
$$E_i(a,b,c) = (\sigma(ax_i+by_i+c)-t_i)^2$$

and the error is the mean of the local errors, where $N$ is the number of points in the training set :

$$E(a,b,c) = \frac{1}{N}\sum_{i=1}^N E_i(a,b,c)$$

Notice that $\sigma' = \sigma(1-\sigma)$ so that 
$$\frac{\partial E_i}{\partial a}(a,b,c) = 2x_i \sigma_i(1-\sigma_i)(\sigma_i-t_i)$$
$$\frac{\partial E_i}{\partial b}(a,b,c) = 2y_i \sigma_i(1-\sigma_i)(\sigma_i-t_i)$$
$$\frac{\partial E_i}{\partial c}(a,b,c) = 2 \sigma_i(1-\sigma_i)(\sigma_i-t_i)$$
where $\sigma_i = \sigma(ax_i+by_i+c)$.

Finally, $$\nabla E(W_k) = \frac{1}{N}\sum_{i=1}^N \left[\frac{\partial E_i}{\partial a},\frac{\partial E_i}{\partial b},\frac{\partial E_i}{\partial c}\right]$$
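
The three partial derivatives can be checked numerically on one training point. The sketch below compares the formulas with central finite differences at $W_0 = (0,1,-2)$ for the blue point $(1, 1.5)$, whose target is $0$. The helper names and the step `h` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(y):
    return 1 / (1 + np.exp(-y))

def local_error(W, x, y, t):
    a, b, c = W
    return (sigmoid(a * x + b * y + c) - t)**2

def local_gradient(W, x, y, t):
    # the three formulas above, sharing the factor 2*sigma*(1-sigma)*(sigma-t)
    a, b, c = W
    s = sigmoid(a * x + b * y + c)
    common = 2 * s * (1 - s) * (s - t)
    return np.array([x * common, y * common, common])

def finite_difference(W, x, y, t, h=1e-6):
    # central differences, one weight at a time (hypothetical helper)
    W = np.array(W, dtype=float)
    grad = np.zeros(3)
    for j in range(3):
        step = np.zeros(3)
        step[j] = h
        grad[j] = (local_error(W + step, x, y, t)
                   - local_error(W - step, x, y, t)) / (2 * h)
    return grad

W0 = np.array([0.0, 1.0, -2.0])
analytic = local_gradient(W0, 1, 1.5, 0)
numeric = finite_difference(W0, 1, 1.5, 0)
print(analytic, numeric)
```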

In [None]:
def sigmoid(y):
    return 1/(1+np.exp(-y))

def p(y):
    return y*(1-y)

def dsigmoid(y):
    return p(sigmoid(y))

In [None]:
W = np.array([0,1,-2])


def gradE(W,x,y,t):
    sigma = sigmoid(np.dot(W[:2],np.array([x,y]))+W[2])
    gradEa = 2*x*sigma*(1-sigma)*(sigma-t)
    gradEb = 2*y*sigma*(1-sigma)*(sigma-t)
    gradEc = 2*sigma*(1-sigma)*(sigma-t)
    return gradEa,gradEb,gradEc

def E(x,y,W,t):
    return (sigmoid(np.dot(W[:2],np.array([x,y]))+W[2])-t)**2

def gradE_total(W):
    g = np.array([0,0,0])
    i=0
    for (x,y) in points:
        g = np.array(gradE(W,x,y,target[i])) + g
        i+=1
    g=g/(i)
    return g

def E_total(W):
    e = 0.0
    i=0
    for (x,y) in points:
        e = E(x,y,W,target[i]) + e
        i+=1
    e = e/(i)
    return e

In [None]:
#init
W = np.array([0,1,-2])
epoch = 1000  #change number of iterations
delta = 1
error = []

for i in range(epoch):
    W = W - delta*gradE_total(W)
    error.append(E_total(W))

print('(a,b,c) = '+str(W))

In [None]:
for x,y in blue_points:
    plt.scatter(x,y,color='blue')
for x,y in red_points:
    plt.scatter(x,y,color='red',marker='s')
h = np.array([-0.1,5.1])
v = (-W[0]*h-W[2])/W[1]  #equation of boundary line
plt.fill_between(h,v,4.1,color="blue",alpha=0.1)
plt.fill_between(h,v,0,color="red",alpha=0.1)

plt.plot(h,v,'-')
plt.title('epoch= '+str(epoch))
plt.show()
print('Error = '+ str(E_total(W)))

In [None]:
plt.plot(range(epoch),error)
plt.title('Evolution of error up to '+str(epoch)+ ' iterations')
plt.show()

## Second example : approximate a step function

We want to find a neural network realizing a function $F$ such that :
* if $x \in [0;2] \cup [6;8]$, $F(x)=0$ ;
* if $x \in [3;5]$, $F(x) = 1$

### Data : training set
We consider 10 blue circles on $[0;2]$, 10 blue circles on $[6;8]$ and 10 red squares on $[3;5]$.



In [None]:
blue_circles = []
target = []
X = np.linspace(0,2,10)

for x in X:
    blue_circles.append((x,0))

X = np.linspace(6,8,10)

for x in X:
    blue_circles.append((x,0))

red_squares = []

X = np.linspace(3,5,10)

for x in X:
    red_squares.append((x,1))

for x,y in blue_circles:
    plt.scatter(x,y,color='blue')
for x,y in red_squares:
    plt.scatter(x,y,color='red',marker='s')

points = blue_circles+red_squares
plt.show()

## Model

We choose this architecture of neural network :

<img src="img/propagation_ex2.png"></img>

with the sigmoid activation function. 

Thus, we are looking for 7 weights $(a_1,a_2,...,a_7)$. 

In [None]:
W = np.array([0.0,1.0,0.0,-1.0,1.0,1.0,-1.0])  # array, so that W - delta*gradient is well defined

def F(x,W):
    layer1 = [sigmoid(W[0]*x+W[1]),sigmoid(W[2]*x+W[3])]
    layer2 = sigmoid(W[4]*layer1[0]+W[5]*layer1[1]+W[6])
    return layer2

def gradF(x,W):
    # forward pass
    layer1 = [sigmoid(W[0]*x+W[1]),sigmoid(W[2]*x+W[3])]
    layer2 = sigmoid(W[4]*layer1[0]+W[5]*layer1[1]+W[6])
    # output neuron : derivatives with respect to W[4], W[5], W[6]
    dsigma2 = p(layer2)
    g5 = layer1[0]*dsigma2
    g6 = layer1[1]*dsigma2
    g7 = dsigma2
    # first layer, by the chain rule (no division by layer1, which may underflow to 0)
    dsigma11 = p(layer1[0])
    dsigma12 = p(layer1[1])
    g2 = W[4]*dsigma11*dsigma2
    g1 = x * g2
    g4 = W[5]*dsigma12*dsigma2
    g3 = x * g4
    grad = [g1,g2,g3,g4,g5,g6,g7]
    return grad

def gradE(W,x,y):
    # gradient of the local error (F(x,W)-y)**2
    output = F(x,W)
    grad = [2*(output-y)*g for g in gradF(x,W)]
    return grad


def E(W,x,t):
    return (F(x,W)-t)**2

def gradE_total(W):
    g = np.array([0,0,0,0,0,0,0])
    i=0
    for (x,y) in points:
        g = np.array(gradE(W,x,y)) + g
        i+=1
    return g/i

def E_total(W):
    e = 0.0
    i=0
    for (x,y) in points:
        e = E(W,x,y) + e
        i+=1
    e = e/(i)
    return e

In [None]:
epoch = 5000  #change number of iterations
delta = 1
error = []

for i in range(epoch):
    W = W - delta*gradE_total(W)
    error.append(E_total(W))

In [None]:
plt.plot(range(epoch),error)
plt.title('Evolution of error up to '+str(epoch)+ ' iterations')
plt.show()

In [None]:
for x,y in blue_circles:
    plt.scatter(x,y,color='blue')
for x,y in red_squares:
    plt.scatter(x,y,color='red',marker='s')
X = np.linspace(0,8,200)
Y = [F(x,W) for x in X]
plt.plot(X,Y,color='purple',linewidth=4)

plt.show()

# Handwritten numbers recognition with TensorFlow 

https://colab.research.google.com/drive/1bz0-4EU0N1NaLZkJKDyQU0E3W_f_Co8y?usp=sharing

# Text recognition
https://github.com/exo7math/deepmath-exo7/blob/master/pythontf2/python/tf2_texte_cours.py