# Adaline and Logistic Regression

We will implement the perceptron algorithm (Rosenblatt, 68), Adaline (Widrow and Hoff, 60) and logistic regression (Cox, 66), whose pseudocode is given below:

Perceptron:
`Input: Train, eta, MaxEp
init: w
epoch = 0
err = 1
m = len(Train)
while epoch <= MaxEp and err != 0
    err = 0
    for i in 1: m
        h <- w * x
        if (y * h <= 0)
            w <- w + eta * y * x
            err <- err + 1
     epoch <- epoch + 1
output: w`

Adaline:
`input: Train, eta, MaxEp
init : w
epoch=0
err=1
m = len(Train)
while epoch<=MaxEp and err!=0
    err=0
    for i in 1:m
        h <- w*x
        if(y*h<=0)
           err <- err+1
        w <- w + eta*(y-h)*x
    epoch <- epoch+1
output: w`

Logistic Regression:
`input: Train, eta, MaxEp
init : w
epoch=0
err=1
m = len(Train)
while epoch<=MaxEp and err!=0
    err=0
    for i in 1:m
        choose an example (x,y) from Train at random
        h <- w*x
        if(y*h<=0)
           err <- err+1
        w <- w + eta*y*(1-sigm(y*h))*x
    epoch <- epoch+1
output: w`
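
To make the difference between the three update rules concrete, the sketch below applies a single update of each rule to the same example, starting from zero weights. It assumes the input is augmented with a constant 1 so that `w[0]` plays the role of the bias; the function names (`score`, `perceptron_step`, `adaline_step`, `lr_step`) are illustrative and not part of the exercise.

```python
import math

def score(w, xa):
    # Dot product between the weights and the augmented input [1, x_1, ..., x_d]
    return sum(wj * xj for wj, xj in zip(w, xa))

def sigm(z):
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_step(w, x, y, eta):
    xa = [1.0] + list(x)
    if y * score(w, xa) <= 0:           # update only on a mistake
        return [wj + eta * y * xj for wj, xj in zip(w, xa)]
    return list(w)

def adaline_step(w, x, y, eta):
    xa = [1.0] + list(x)
    h = score(w, xa)                     # always update, scaled by the residual y - h
    return [wj + eta * (y - h) * xj for wj, xj in zip(w, xa)]

def lr_step(w, x, y, eta):
    xa = [1.0] + list(x)
    g = 1.0 - sigm(y * score(w, xa))     # always update, scaled by 1 - sigm(y * h)
    return [wj + eta * y * g * xj for wj, xj in zip(w, xa)]

w0 = [0.0, 0.0, 0.0]
x, y = [1.0, 1.0], 1
w_p = perceptron_step(w0, x, y, 0.1)
w_a = adaline_step(w0, x, y, 0.1)
w_l = lr_step(w0, x, y, 0.1)
print(w_p, w_a, w_l)
```

From zero weights the perceptron and Adaline take the same step here, while logistic regression takes a step half as large because `sigm(0) = 0.5`.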

1. Create a list of 4 elements corresponding to the logical AND example called `Train`:
$Train=\{((+1,+1),+1),((-1,+1),-1),((-1,-1),-1),((+1,-1),-1)\}$

Each element of the list is itself a list whose last entry is the class of the example and whose preceding entries are its coordinates.

    

In [17]:
Train=[[+1,+1,+1],[-1,+1,-1],[-1,-1,-1],[+1,-1,-1]]

2. Code the Perceptron, Adaline and LR (Logistic regression) programs

Hint: You can write a function that calculates the dot product between an example $\mathbf{x} = (x_1, \ldots, x_d)$ and the weight vector $\mathbf{w} = (w_0, w_1, \ldots, w_d)$: 
$h(\mathbf{x},\mathbf{w}) = w_0 + \sum_{j=1}^{d} w_j x_j$.
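
A small sketch of an equivalent way to compute $h$: prepend a constant 1 to $\mathbf{x}$, so that $w_0$ becomes an ordinary coefficient and $h$ is a plain dot product. The helper names `augment` and `dot`, and the numeric values, are assumptions made for illustration.

```python
def augment(x):
    # Prepend the constant input 1 that multiplies the bias w_0
    return [1.0] + list(x)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

w = [-0.5, 1.0, 2.0]   # (w_0, w_1, w_2), illustrative values
x = [3.0, 4.0]         # (x_1, x_2), illustrative values

# Explicit form: w_0 + sum_j w_j * x_j
h_explicit = w[0] + sum(w[j] * x[j - 1] for j in range(1, len(w)))
# Augmented form: <w, (1, x_1, ..., x_d)>
h_augmented = dot(w, augment(x))

print(h_explicit, h_augmented)
```

Both forms give the same value; the augmented form makes it easy to update the bias and the other weights with one vector operation.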


In [32]:
import numpy as np
import math
import random

def h(x,w):
    # The prediction of the model
    Pred=w[0]
    for i in range(1, len(w)):
        Pred += x[i-1]*w[i]
    return Pred


def Perceptron(Train,eta,MaxEp):
    # Perceptron Algorithm 
    d=len(Train[0])-1
    m=len(Train)
    W=[0.0 for i in range(d+1)]
    epoch = 0
    error = 1
    
    while epoch <= MaxEp and error != 0:
        error = 0
        for i in range(0, m):
            x = Train[i][:-1]
            y = Train[i][-1]
            pred = h(x, W)
            if pred * y <= 0:
                W = np.add(W, np.array([1,] + x) * y * eta)
                # Count only misclassified examples, as in the pseudocode
                error += 1
        epoch += 1
        
    return W

def Adaline(Train,eta,MaxEp):
    # Adaline Algorithm 
    d=len(Train[0])-1
    m=len(Train)
    W=[0.0 for i in range(d+1)]
    epoch = 0
    error = 1
    
    while epoch <= MaxEp and error != 0:
        error = 0
        for i in range(0, m):
            x = Train[i][:-1]
            y = Train[i][-1]            
            pred = h(x, W)
            if pred*y <= 0:
                error += 1 
            W = np.add(W, np.array([1,] + x) * eta * (y-pred))
        epoch += 1
        
    return W

def sig(x):
    return 1 / (1 + math.exp(-x))


def LR(Train,eta,MaxEp):
    # Logistic Regression Algorithm 
    d=len(Train[0])-1
    m=len(Train)
    W=[0.0 for i in range(d+1)]
    epoch = 0
    error = 1
    indexes = list(range(0,m))
        
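    # Unlike the pseudocode, this loop does not stop early when error == 0:
    # it always runs MaxEp + 1 epochs.
    while epoch<=MaxEp: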
        error = 0
        random.shuffle(indexes)
        for i in indexes:
            x = Train[i][:-1]
            y = Train[i][-1]
            pred = h(x, W)
            if pred*y <= 0:
                error += 1
            W = np.add(W, np.array([1,] + x) * eta * y * (1-sig(y*pred)))
        epoch += 1
        
    return W


3. Apply the three learning models to the logical AND, and calculate each model's error rate on this dataset.

Hint: You can write a function that takes a weight vector $\mathbf{w}$ and a set of examples $(\mathbf{x},y)$ and calculates the error rate of the model with weights $\mathbf{w}$ on that set.

In [33]:
W1 = Perceptron(Train, 0.1, 10)
W2 = Adaline(Train, 0.1, 10)
W3 = LR(Train, 0.1, 10)

def EmpiricalRisk(Test,W):
    E=0.0
    m=len(Test)
    # The empirical error of a model with weight W on a test set of size m
    for t in Test:
        pred = h(t[:-1], W)
        if pred * t[-1] <= 0:
            E += 1    
    return E/float(m)

print('Perceptron \n', W1, '\nError: ', EmpiricalRisk(Train, W1))
print('\nAdaline \n', W2, '\nError: ', EmpiricalRisk(Train, W2))
print('\nLR: \n', W3, '\nError: ', EmpiricalRisk(Train, W3))

Perceptron 
 [-0.1  0.1  0.1] 
Error:  0.0

Adaline 
 [-0.32703461  0.28199519  0.29815261] 
Error:  0.0

LR: 
 [-0.72984281  0.72087887  0.72724049] 
Error:  0.0
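
As a sanity check on the output above, the reported perceptron weights can be plugged into the sign of $h$ and compared with the AND labels. This is a minimal sketch; `predict` is a hypothetical helper, and ties at $h = 0$ are assigned to class $-1$ by assumption.

```python
Train = [[+1, +1, +1], [-1, +1, -1], [-1, -1, -1], [+1, -1, -1]]
W = [-0.1, 0.1, 0.1]   # perceptron weights printed above

def predict(x, w):
    # Class is the sign of h(x, w) = w_0 + sum_j w_j * x_j
    score = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if score > 0 else -1

predictions = [predict(t[:-1], W) for t in Train]
labels = [t[-1] for t in Train]
print(predictions, labels)
```

Only the example $(+1,+1)$ lands on the positive side of the separating line, which is exactly the AND function.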


4. We now study the behavior of the three models on four UCI datasets: http://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks), https://archive.ics.uci.edu/ml/datasets/spambase, https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29, https://archive.ics.uci.edu/ml/datasets/Ionosphere. These files are in the current repository under the names `sonar.txt`, `spam.txt`, `wdbc.txt` and `ionoshpere.txt`. The following `ReadCollection` function reads a file into the training-set format used above (a list of examples, each ending with its class).

In [61]:
from math import sqrt
import pandas as pd
import random
from sklearn.model_selection import train_test_split

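def Normalize(x):
    # Scale x in place to unit Euclidean (L2) norm
    norm=0.0
    for e in x:
        norm+=e**2
    norm=sqrt(norm)
    if norm==0.0:
        # All-zero vector: leave it unchanged to avoid dividing by zero
        return x
    for i in range(len(x)):
        x[i]/=norm
    return x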

# Read a file and return it as a shuffled list of examples [x_1, ..., x_d, y]
def ReadCollection(filename, class_mapping, f_clean):
    tag_df=pd.read_table(filename,sep=',',header=None)
    X=[]
    for e in range(len(tag_df)):
        x, cls = f_clean(list(tag_df.loc[e,:]))
        x=Normalize(x)
        x.insert(len(x),class_mapping[cls])
        X.append(x)

    random.shuffle(X)

    return X

def wdbc_clean(row):
    row.pop(0)
    cls = row.pop(0)
    return row, cls
                         
def default_clean(row):
    cls = row.pop(-1)
    return row, cls
                         
wdbc_map = {'M' : -1, 'B' : +1}                       
ionosphere_map = {'g' : -1, 'b' : +1}
sonar_map = {'R': -1, 'M': +1}                         
spam_map = {0: -1, 1: 1}

 5. Run the three models on these files with $\eta=0.01$ and $\eta=0.1$, and `MaxEp=500`.
 
 6. Report in the tables below the average test error rate, repeating each experiment 20 times. 
 
 <br>
 <br>
 

 <center> $\eta=0.01$, MaxEp$=500$ </center>


  | Collection | Perceptron | Adaline |    LR    |
  |------------|------------|---------|----------|
  |   WDBC     |0.1363636   |0.0874125|0.0972028 |                 
  | Ionosphere |0.1255682   |0.1056818| 0.08125  |
  |   Sonar    |0.2875      |0.2230769|0.2605769 |
  |   Spam     |0.1801911   |0.2457428|0.2244135 |
 
 <br><br>
  
  <center> $\eta=0.1$, MaxEp$=500$ </center>


  | Collection | Perceptron | Adaline |    LR    |
  |------------|------------|---------|----------|
  |   WDBC     |  0.1139868 |0.1045454|0.0853146 |                 
  | Ionosphere |  0.1136364 |0.1090909|0.0886363 |
  |   Sonar    |  0.2846154 | 0.2875  |0.2288462 |
  |   Spam     |  0.1935708 |0.2824066|0.1566029 |
  
  Hint: you can use the following function

In [63]:
from datetime import datetime

def EmpiricalRisk(Test,W):
    E=0.0
    m=len(Test)
    # The empirical error of a model with weight W on a test set of size m
    for t in Test:
        pred = h(t[:-1], W)
        if pred * t[-1] <= 0:
            E += 1    
    return E/float(m)

X = ReadCollection("spam.txt", spam_map, default_clean)

errP = errA = errL = 0.0
n = 0.1
MaxEp = 500
print('Exec start: ', datetime.now().time())
for i in range(20):
    x_train ,x_test = train_test_split(X,test_size=0.25)
    WLP = Perceptron(x_train,n,MaxEp)
    errP += EmpiricalRisk(x_test,WLP)
    WLA = Adaline(x_train,n,MaxEp)
    errA += EmpiricalRisk(x_test,WLA)
    WLR = LR(x_train,n,MaxEp)
    errL += EmpiricalRisk(x_test,WLR)

print('Exec end: ', datetime.now().time())
    
print("Err perceptron=", errP/float(20), "Err Adaline=", errA/float(20), "Err RL=", errL/float(20))

Exec start:  17:37:59.010756
Exec end:  18:18:06.964654
Err perceptron= 0.19357080799304954 Err Adaline= 0.2824066029539531 Err RL= 0.15660295395308427
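
The loop above hard-codes one dataset and three learners. A possible way to reuse it across collections and learning rates is to pass the learner as a function, as in the sketch below. Everything here is an assumption made for illustration: the helper names (`split`, `average_test_error`), the pure-Python split used in place of `train_test_split`, the fixed seed, and the small separable toy dataset standing in for the real files.

```python
import random

def score(x, w):
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def error_rate(test, w):
    # Fraction of examples with y * h(x, w) <= 0
    return sum(1 for *x, y in test if y * score(x, w) <= 0) / len(test)

def perceptron(train, eta, max_ep):
    w = [0.0] * len(train[0])            # d features + 1 bias
    for _ in range(max_ep + 1):
        errors = 0
        for *x, y in train:
            if y * score(x, w) <= 0:
                w = [wj + eta * y * xj for wj, xj in zip(w, [1.0] + x)]
                errors += 1
        if errors == 0:
            break
    return w

def split(data, test_size, rng):
    # Random hold-out split: returns (train, test)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(round(test_size * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]

def average_test_error(data, learner, n_runs=20, test_size=0.25, seed=0):
    # Average test error over n_runs random train/test splits
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        train, test = split(data, test_size, rng)
        total += error_rate(test, learner(train))
    return total / n_runs

# Toy data: class +1 when x1 + x2 > 0, else -1
toy = [[x1, x2, 1 if x1 + x2 > 0 else -1]
       for x1 in (-2, -1, 1, 2) for x2 in (-2, -1, 1, 2)]

avg = average_test_error(toy, lambda tr: perceptron(tr, 0.1, 50))
print("Average test error:", avg)
```

With the notebook's own functions, the same pattern would take `Perceptron`, `Adaline` or `LR` wrapped in a lambda that fixes `eta` and `MaxEp`.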
