# Adaline and Logistic Regression

We are interested in implementing the perceptron algorithm (Rosenblatt, 68), Adaline (Widrow and Hoff, 60) and Logistic Regression (Cox, 66), whose pseudocode is given below:

Perceptron:
`Input: Train, eta, MaxEp
init: w
epoch = 0
err = 1
m = len(Train)
while epoch <= MaxEp and err != 0
    err = 0
    for i in 1:m
        (x, y) <- Train[i]
        h <- w * x
        if (y * h <= 0)
            w <- w + eta * y * x
            err <- err + 1
    epoch <- epoch + 1
output: w`

Adaline:
`input: Train, eta, MaxEp
init: w
epoch = 0
err = 1
m = len(Train)
while epoch <= MaxEp and err != 0
    err = 0
    for i in 1:m
        (x, y) <- Train[i]
        h <- w * x
        if (y * h <= 0)
            err <- err + 1
        w <- w + eta * (y - h) * x
    epoch <- epoch + 1
output: w`

Logistic Regression:
`input: Train, eta, MaxEp
init: w
epoch = 0
err = 1
m = len(Train)
while epoch <= MaxEp and err != 0
    err = 0
    for i in 1:m
        choose an example (x, y) from Train at random
        h <- w * x
        if (y * h <= 0)
            err <- err + 1
        w <- w + eta * y * (1 - sigm(y * h)) * x
    epoch <- epoch + 1
output: w`
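
One way to read the three update rules above is as stochastic gradient steps $\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla_{\mathbf{w}} \ell$ on a per-example loss $\ell$. The losses below are the usual textbook choices and are an assumption here, since the exercise does not state them. $h$ stands for the model output on the current example, $\sigma$ for `sigm`, and $\mathbf{x}$ is taken with a constant 1 prepended so that $\nabla_{\mathbf{w}} h = \mathbf{x}$:

$$
\begin{aligned}
\text{Perceptron:} \quad & \ell = \max(0, -y h), & -\nabla_{\mathbf{w}} \ell &= y\,\mathbf{x} \ \text{ if } y h \le 0, \ \text{ else } \mathbf{0} \\
\text{Adaline:} \quad & \ell = \tfrac{1}{2} (y - h)^2, & -\nabla_{\mathbf{w}} \ell &= (y - h)\,\mathbf{x} \\
\text{Logistic Regression:} \quad & \ell = \ln\left(1 + e^{-y h}\right), & -\nabla_{\mathbf{w}} \ell &= y \left(1 - \sigma(y h)\right) \mathbf{x}
\end{aligned}
$$

For the perceptron, $y\,\mathbf{x}$ is taken as the subgradient at the non-differentiable point $y h = 0$. For Logistic Regression the result follows from $\frac{e^{-a}}{1 + e^{-a}} = 1 - \sigma(a)$ with $a = y h$. Multiplying each right-hand side by $\eta$ gives the corresponding update line of the pseudocode.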

1. Create a list of 4 elements, called `Train`, corresponding to the logical AND example:
$Train=\{((+1,+1),+1),((-1,+1),-1),((-1,-1),-1),((+1,-1),-1)\}$

Each element of the list is itself a list whose last entry is the class of the example and whose preceding entries are its coordinates.

    

In [3]:
Train=[[1,1,1], [-1,1,-1], [-1,-1,-1], [1,-1,-1]] # each row is [x1, x2, y]

2. Code the Perceptron, Adaline and LR (Logistic regression) programs

Hint: You can write a function that calculates the dot product between an example $\mathbf{x} = (x_1, \ldots, x_d)$ and the weight vector $\mathbf{w} = (w_0, w_1, \ldots, w_d)$: 
$h(\mathbf{x},\mathbf{w}) = w_0 + \sum_{j=1}^d w_j x_j$.


3. Apply the three learning models to the logical AND data, and compute the error rate of each model on this dataset.

Hint: You can write a function that takes a weight vector $\mathbf{w}$ and a set of examples $(\mathbf{x},y)$ and computes the error rate of the model with weights $\mathbf{w}$ on that set.

In [93]:
import numpy as np
import random as rd

def h_(x,w):
    # The prediction of the model
    Pred=w[0] + np.dot(x,w[1:])
    return Pred

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def Perceptron(Train,eta,MaxEp):
    # Perceptron Algorithm 
    d=len(Train[0])-1
    m=len(Train)
    W=[0.0 for i in range(d+1)] 
    
    epoque = 0
    err = 1
    while (epoque<=MaxEp and err !=0):
        err=0
        for i in range(0,m):
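            X, y = Train[i][0:d], Train[i][d]  # slicing copies the features, Train is left untouched
            h = h_(X,W)
            X.insert(0,1)  # prepend the constant input 1 that multiplies the bias W[0]
            if(y*h <= 0):  # misclassified (or on the boundary): correct W
                W = np.sum([W,np.dot(eta*y,X)], axis=0)  # W <- W + eta*y*X
                err = err + 1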
        epoque = epoque + 1    
    return W


def Adaline(Train,eta,MaxEp):
    # Adaline Algorithm 
    d=len(Train[0])-1
    m=len(Train)
    W=[0.0 for i in range(d+1)]
    
    epoque=0
    err=1
    while(epoque<=MaxEp and err!=0):
        err=0
        for i in range(0,m):
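            X, y = Train[i][0:d], Train[i][d]
            h = h_(X,W)
            X.insert(0,1)  # prepend the constant input 1 that multiplies the bias W[0]
            if(y*h<=0):  # only counts the mistakes; the update below is unconditional
                err = err+1
            W = np.sum([W, np.dot(eta*(y-h),X)], axis=0)  # W <- W + eta*(y-h)*X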
        epoque = epoque+1 
    return W

def LR(Train,eta,MaxEp):
    # Logistic Regression Algorithm
    d=len(Train[0])-1
    m=len(Train)
    W=[0.0 for i in range(d+1)]
    
    epoque=0
    err=1
    while(epoque<=MaxEp and err!=0):
        err=0
        for i in range(0,m):
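            e = rd.randint(0,m-1)  # pick an example (x,y) from Train at random
            X, y = Train[e][0:d], Train[e][d]
            h = h_(X,W)
            X.insert(0,1)  # prepend the constant input 1 that multiplies the bias W[0]
            if(y*h<=0):  # only counts the mistakes; the update below is unconditional
                err = err+1
            W = np.sum([W, np.dot(eta*y*(1-sigmoid(y*h)),X)] , axis=0)  # W <- W + eta*y*(1-sigmoid(y*h))*X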
        epoque = epoque+1    
    return W
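
To make the difference between the three update rules concrete, the following sketch applies one step of each rule to the same example, starting from the same weights. It is a minimal pure-Python illustration, not the notebook's implementation: `one_step` and the starting weights `w0` are hypothetical names and values chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_step(rule, w, x, y, eta):
    """Return the weights after one update of the given rule on (x, y)."""
    xb = [1.0] + list(x)                       # prepend the constant bias input
    h = sum(wj * xj for wj, xj in zip(w, xb))  # h = w0 + sum_j wj * xj
    if rule == "perceptron":
        step = eta * y if y * h <= 0 else 0.0  # update only on a mistake
    elif rule == "adaline":
        step = eta * (y - h)                   # update on every example
    elif rule == "lr":
        step = eta * y * (1.0 - sigmoid(y * h))
    else:
        raise ValueError("unknown rule: " + rule)
    return [wj + step * xj for wj, xj in zip(w, xb)]

w0 = [0.5, 0.5, 0.5]   # illustrative starting weights (assumption)
x, y = [1.0, 1.0], 1   # the positive example of the logical AND
steps = {rule: one_step(rule, w0, x, y, eta=0.1)
         for rule in ("perceptron", "adaline", "lr")}
for rule, w in steps.items():
    print(rule, [round(v, 4) for v in w])
```

Here the example is already correctly classified with $h = 1.5$. The perceptron leaves the weights unchanged, Adaline decreases them because $h$ overshoots the target $y = 1$, and Logistic Regression still takes a small step that increases the margin.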


In [94]:
WP = Perceptron(Train,1,20)

# Train2 = [[1,1,1], [-1,1,1], [-1,-1,-1], [1,-1,1]] #OR
# W2 = Perceptron(Train2,1,20)
# W2


WA = Adaline(Train,0.01,20)

# WA2 = Adaline(Train2,0.01,20)
# WA2

WL = LR(Train, 0.001, 20)


In [96]:
def EmpiricalRisk(data,w):
    nb_error=0
    for i in range(len(data)):
        X=data[i][:len(data[0])-1]
        y=data[i][len(data[0])-1]
        X.insert(0,1)
        if(y*np.dot(w,X)<=0):
            nb_error+=1
    return nb_error/len(data)
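
Step 3 asks for the error rate on the logical AND data. A minimal, self-contained sketch of that check for the perceptron is given here. The names `predict`, `perceptron_fit` and `error_rate` are hypothetical pure-Python stand-ins for `h_`, `Perceptron` and `EmpiricalRisk`, written so that the block runs on its own.

```python
def predict(w, x):
    # h(x, w) = w0 + sum_j wj * xj
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def perceptron_fit(train, eta, max_ep):
    d = len(train[0]) - 1
    w = [0.0] * (d + 1)
    epoch, err = 0, 1
    while epoch <= max_ep and err != 0:
        err = 0
        for row in train:
            x, y = row[:d], row[d]
            if y * predict(w, x) <= 0:   # mistake: move w towards y * x
                w = [w[0] + eta * y] + [wj + eta * y * xj
                                        for wj, xj in zip(w[1:], x)]
                err += 1
        epoch += 1
    return w

def error_rate(data, w):
    # fraction of examples with y * h(x, w) <= 0
    d = len(data[0]) - 1
    wrong = sum(1 for row in data if row[d] * predict(w, row[:d]) <= 0)
    return wrong / len(data)

and_data = [[1, 1, 1], [-1, 1, -1], [-1, -1, -1], [1, -1, -1]]
w_and = perceptron_fit(and_data, eta=1, max_ep=20)
print(w_and, error_rate(and_data, w_and))
```

With `eta=1` and zero initial weights, this run ends at $\mathbf{w} = (-1, 1, 1)$, which classifies all four points correctly, so the error rate is 0. With the notebook's own functions, the equivalent check would be `EmpiricalRisk(Train, WP)`.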

4. We now study the behavior of the three models on four UCI collections: http://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks), https://archive.ics.uci.edu/ml/datasets/spambase, https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29, https://archive.ics.uci.edu/ml/datasets/Ionosphere. These files are in the current repository under the names `sonar.txt`, `spam.txt`, `wdbc.txt` and `ionosphere.txt`. The `ReadCollection` function below (for `wdbc.txt`) and its variant `ReadCollection2` (for the other three files) read a file into the training-set format used above.

In [82]:
from math import sqrt
import pandas as pd
import random
from sklearn.model_selection import train_test_split

def Normalize(x):
    norm=0.0
    for e in x:
        norm+=e**2
    for i in range(len(x)):
        x[i]/=sqrt(norm)
    return x

# Read wdbc.txt into the list-of-lists training-set format used above
def ReadCollection(filename):
    tag_df=pd.read_table(filename,sep=',',header=None)
    Dic={'M': -1, 'B': +1}
    X=[]
    for e in range(len(tag_df)):
        x=list(tag_df.loc[e,:])
        x.pop(0)
        cls=x.pop(0)
        x=Normalize(x)
        x.insert(len(x),Dic[cls])
        X.append(x)

    
    random.shuffle(X)

    return X
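
# Read ionosphere.txt, sonar.txt or spam.txt (class label in the last column)
def ReadCollection2(filename):
    tag_df=pd.read_table(filename,sep=',',header=None)
    # Map the class labels of each file to -1/+1
    if(filename == 'ionosphere.txt'): 
        Dic={'g': -1, 'b': +1}
    elif(filename == 'sonar.txt'): 
        Dic={'M': -1, 'R': +1}
    elif(filename == 'spam.txt'): 
        Dic={0: -1, 1: +1} #1 and 0
    else:
        raise ValueError('unknown collection: ' + filename)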
    X=[]
    for e in range(len(tag_df)):
        x=list(tag_df.loc[e,:])
        cls=x.pop(len(x)-1)
        x=Normalize(x)
        x.insert(len(x),Dic[cls])
        X.append(x)

    
    random.shuffle(X)

    return X
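
`Normalize` rescales each feature vector to unit Euclidean norm, in place, and it divides by zero if a row contains only zeros. The following sketch shows a guarded variant under the assumption that such rows should simply be left unchanged; `normalize_safe` is a hypothetical name, and unlike `Normalize` it returns a new list.

```python
from math import sqrt

def normalize_safe(x):
    """Return a unit-norm copy of x; an all-zero vector is returned unchanged."""
    norm = sqrt(sum(e * e for e in x))
    if norm == 0.0:
        return list(x)
    return [e / norm for e in x]

print(normalize_safe([3.0, 4.0]))   # norm is 5, so the result has norm 1
print(normalize_safe([0.0, 0.0]))   # no division by zero
```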


In [90]:
wdbc = ReadCollection('wdbc.txt')
iono = ReadCollection2('ionosphere.txt')
sonar = ReadCollection2('sonar.txt')
spam = ReadCollection2('spam.txt')

 5. Run the three models on these files with $\eta=0.01$ and $\eta=0.1$ and `MaxEp=500`.
 
 6. Report in the tables below the average error rate on the test set, repeating each experiment 20 times.
 
 <br>
 <br>
 
 
 <center> $\eta=0.01$, MaxEp$=500$ </center>
    
    
  | Collection | Perceptron | Adaline |   LR   |
  |------------|------------|---------|--------|
  | WDBC       | 0.1007     | 0.0776  | 0.0927 |
  | Ionosphere | 0.1091     | 0.1023  | 0.0739 |
  | Sonar      | 0.2663     | 0.2442  | 0.2663 |
  | Spam       |            |         |        |
 
 <br><br>
  
  <center> $\eta=0.1$, MaxEp$=500$ </center>
    
    
  | Collection | Perceptron | Adaline |   LR   |
  |------------|------------|---------|--------|
  | WDBC       | 0.1280     | 0.0920  | 0.0867 |
  | Ionosphere | 0.1261     | 0.1045  | 0.0989 |
  | Sonar      | 0.3038     | 0.3010  | 0.2548 |
  | Spam       | 0.2076     | 0.2700  | 0.1633 |
  
  Hint: you can use the following function

In [107]:
def errAllModelsForEachDataSet(X,eta):
    errP=errA=errL=0.0
    for i in range(20):
        x_train ,x_test = train_test_split(X,test_size=0.25)
        WLP=Perceptron(x_train,eta,500)
        errP+=EmpiricalRisk(x_test,WLP)
        WLA=Adaline(x_train,eta,500)
        errA+=EmpiricalRisk(x_test,WLA)
        WLR=LR(x_train,eta,500)
        errL+=EmpiricalRisk(x_test,WLR)

    print("Err perceptron=",errP/float(20),"Err Adaline=",errA/float(20),"Err RL=",errL/float(20))
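
The averaging protocol used above (20 random train/test splits, mean test error) can also be written as a small generic helper. The sketch below is a pure-Python illustration with a fixed seed so that the splits are repeatable. `average_test_error`, `toy_risk` and the toy data are hypothetical, and the shuffle-and-cut split is only a stand-in for `train_test_split`, whose rounding of the test size may differ.

```python
import random

def average_test_error(data, fit, risk, n_runs=20, test_size=0.25, seed=0):
    """Average risk(test, fit(train)) over n_runs random train/test splits."""
    rng = random.Random(seed)          # fixed seed: the splits are repeatable
    n_test = max(1, round(test_size * len(data)))
    total = 0.0
    for _ in range(n_runs):
        rows = list(data)              # shallow copy, the caller's order is kept
        rng.shuffle(rows)
        test, train = rows[:n_test], rows[n_test:]
        total += risk(test, fit(train))
    return total / n_runs

def toy_risk(rows, w):
    # fraction of rows (x, y) with y * (w0 + w1 * x) <= 0
    return sum(1 for x, y in rows if y * (w[0] + w[1] * x) <= 0) / len(rows)

# Toy 1-D data labelled by the sign of x, with two constant "models"
toy = [(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1)] * 2
good = average_test_error(toy, lambda train: [0.0, 1.0], toy_risk)
bad = average_test_error(toy, lambda train: [0.0, -1.0], toy_risk)
print(good, bad)
```

With the notebook's functions, a call would look like `average_test_error(wdbc, lambda tr: Perceptron(tr, 0.1, 500), EmpiricalRisk)`, since `EmpiricalRisk` takes the data first and the weights second.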

Error for $\eta=0.1$ 

In [99]:
print("wdbc:")
errAllModelsForEachDataSet(wdbc,0.1)
print("ionosphere:")
errAllModelsForEachDataSet(iono,0.1)
print("sonar")
errAllModelsForEachDataSet(sonar,0.1)
print("spam")
errAllModelsForEachDataSet(spam,0.1)

Err perceptron= 0.12797202797202795 Err Adaline= 0.09195804195804197 Err RL= 0.0867132867132867


In [102]:
errAllModelsForEachDataSet(iono,0.1)

Err perceptron= 0.12613636363636363 Err Adaline= 0.10454545454545452 Err RL= 0.09886363636363636


In [103]:
errAllModelsForEachDataSet(sonar,0.1)

Err perceptron= 0.30384615384615393 Err Adaline= 0.30096153846153845 Err RL= 0.2548076923076924


In [104]:
errAllModelsForEachDataSet(spam,0.1)

Err perceptron= 0.2076455256298871 Err Adaline= 0.2700260642919201 Err RL= 0.16329278887923543


Error for $\eta=0.01$ 

In [None]:
print("wdbc:")
errAllModelsForEachDataSet(wdbc,0.01)
print("ionosphere:")
errAllModelsForEachDataSet(iono,0.01)
print("sonar")
errAllModelsForEachDataSet(sonar,0.01)
print("spam")
errAllModelsForEachDataSet(spam,0.01)

Err perceptron= 0.1006993006993007 Err Adaline= 0.07762237762237763 Err RL= 0.09265734265734266
Err perceptron= 0.1090909090909091 Err Adaline= 0.10227272727272725 Err RL= 0.07386363636363635
Err perceptron= 0.2663461538461539 Err Adaline= 0.2442307692307692 Err RL= 0.2663461538461538
