# Laboratoire 3 : Machines à vecteurs de support et réseaux neuronaux
#### Département du génie logiciel et des technologies de l’information

| Étudiants             |                                                         |
|-----------------------|---------------------------------------------------------|
| Jean-Philippe Decoste |  DECJ19059105                                           |
| Ahmad Al-Taher        |   ALTA22109307                                          |
| Stéphanie Lacerte     |   LACS06629109                                          |
| Cours                 | GTI770 - Systèmes intelligents et apprentissage machine |
| Session               | Automne 2018                                            |
| Groupe                | 2                                                       |
| Numéro du laboratoire | 02                                                      |
| Professeur            | Hervé Lombaert                                          |
| Chargé de laboratoire | Pierre-Luc Delisle                                      |
| Date                  | 30 oct 2018                                             |

In [12]:
import csv
import math
import os

import graphviz
import matplotlib.pyplot as plt
import numpy as np
from imageV3 import Image as imageFeat
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

from spam import Spam
import utilities
from tabulate import tabulate

In [13]:
def prepareDataset():
    print("PREPARING DATASETS")
    allData_length = len(list(csv.reader(open(r"galaxies.csv"))))
    progress = 0
    datas = []

    print("Reading Galaxies features:")
    utilities.printProgressBar(0, allData_length, prefix='Progress:', suffix='Complete', length=50)
    with open(r"galaxies.csv", 'r') as theFile:
        primitives = csv.reader(theFile, delimiter=',', quotechar='|')

        for row in primitives:
            progress += 1
            utilities.printProgressBar(progress+1, allData_length, prefix='Progress', suffix='Complete', length=50)

            values = [float(i) for i in row]
            datas.append(imageFeat(values))
            
    print("\n-> Done preparing the dataset\n")

    #3. Split dataset using model_selection
    for data in np.array(datas):
        features.append(data.features)
        answers.append(data.answer)

## SVM

In [84]:
def svm():
    
    validation = StratifiedShuffleSplit(n_splits=5, test_size=0.2)
    #linear
    c=[0.001,0.1,1.0,10.0]
    params = dict(kernel=['linear'], C=c ,class_weight=['balanced'], cache_size=[2048])
    grid = GridSearchCV(SVC(), param_grid=params, cv=validation, n_jobs=-1, iid=True)
    #Fit the feature to svm algo
    grid.fit(features, answers)
    #build table
    outPut = []
    for i in range(0, 4):
        outPut.append([grid.cv_results_['params'][i]['C'],
                          "{0:.2f}%".format(grid.cv_results_['mean_test_score'][i]*100)])
    #print table
    print("SVM linear")
    print(tabulate(outPut, headers=['Variable C','class_weight= {‘balanced’}']))
    #print raw 
    print("The best parameters are ", grid.best_params_," with a score of {0:.2f}%".format(float(grid.best_score_)* 100))
    #rbf
    params = dict(kernel=['rbf'], C=c, gamma=c ,class_weight=['balanced'], cache_size=[2048])
    grid = GridSearchCV(SVC(), param_grid=params, cv=validation, n_jobs=-1, iid=True)
    #Fit the feature to svm algo
    grid.fit(features, answers)
    #build table
    outPut = []
    outPut.append(['0.001',
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][0]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][1]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][2]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][3]*100)])
    outPut.append(['0.1',
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][4]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][5]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][6]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][7]*100)])
    outPut.append(['1.0',
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][8]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][7]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][9]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][10]*100)])
    outPut.append(['10.0',
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][11]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][12]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][13]*100),
                   "{0:.2f}%".format(grid.cv_results_['mean_test_score'][14]*100)])
    
    #print table
    print("SVM rbf")
    print(tabulate(outPut, headers=['Variable C','Ɣ=0.001','Ɣ=0.1','Ɣ=1.0','Ɣ=10.0']))
    print("The best parameters are ", grid.best_params_," with a score of {0:.2f}%".format(float(grid.best_score_)* 100))
    
    print("-> Done\n\n")

## Réseaux neuronaux

## Main

In [85]:
#1.A Read Galaxy features (name of file, path, n_split, test size, random state)
features = []
answers = []
prepareDataset()

#dont run because it is time consuming
svm()


PREPARING DATASETS
Reading Galaxies features:
Progress |**************************************************| 100.0% Complete
-> Done preparing the dataset

SVM linear
  Variable C  class_weight= {‘balanced’}
------------  ----------------------------
       0.001  78.84%
       0.1    80.09%
       1      80.00%
      10      79.99%
The best parameters are  {'C': 0.1, 'cache_size': 2048, 'class_weight': 'balanced', 'kernel': 'linear'}  with a score of 80.09%
SVM rbf
  Variable C  Ɣ=0.001    Ɣ=0.1    Ɣ=1.0    Ɣ=10.0
------------  ---------  -------  -------  --------
       0.001  51.89%     68.10%   51.89%   51.89%
       0.1    73.39%     80.66%   80.40%   78.23%
       1      79.90%     78.23%   82.70%   82.38%
      10      79.35%     81.68%   83.66%   82.12%
The best parameters are  {'C': 10.0, 'cache_size': 2048, 'class_weight': 'balanced', 'gamma': 0.1, 'kernel': 'rbf'}  with a score of 83.66%
-> Done




In [None]:
output 
SVM linear
  Variable C  class_weight= {‘balanced’}
------------  ----------------------------
       0.001  78.84%
       0.1    80.09%
       1      80.00%
      10      79.99%
The best parameters are  {'C': 0.1, 'cache_size': 2048, 'class_weight': 'balanced', 'kernel': 'linear'}  with a score of 80.09%
SVM rbf
  Variable C  Ɣ=0.001    Ɣ=0.1    Ɣ=1.0    Ɣ=10.0
------------  ---------  -------  -------  --------
       0.001  51.89%     68.10%   51.89%   51.89%
       0.1    73.39%     80.66%   80.40%   78.23%
       1      79.90%     78.23%   82.70%   82.38%
      10      79.35%     81.68%   83.66%   82.12%
The best parameters are  {'C': 10.0, 'cache_size': 2048, 'class_weight': 'balanced', 'gamma': 0.1, 'kernel': 'rbf'}  with a score of 83.66%
-> Done

## Introduction

## Question 1 
### Présentation de la méthode de validation

## Question 2
### Description du modèle élaboré

## Question 3
### Analyse de la phase d'entraînement

## Question 4
### Matrice des expérimentations

### Étude des hyperparamètres des modèles

## Question 5
### Matrice des expérimentations

### Études des hyperparamètres des modèles

## Question 6
### Discussion de l'impact de la taille des données

## Question 7
### Discussion de l'impact du bruit

## Question 8
### Formulation des recommandations

## Question 9
### Améliorations possibles

## Conclusion

## Bibliographie