This notebook uses the *Car Evaluation Data Set* ([https://archive.ics.uci.edu/ml/datasets/Car+Evaluation](https://archive.ics.uci.edu/ml/datasets/Car+Evaluation)) to test an SVM classifier.

The *dataset* has the following attributes:
* buying: vhigh, high, med, low. 
* maint: vhigh, high, med, low. 
* doors: 2, 3, 4, 5more. 
* persons: 2, 4, more. 
* lug_boot: small, med, big. 
* safety: low, med, high. 

And the following classes: unacc, acc, good, vgood.
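
Since every attribute is categorical with a small set of values, the size of the attribute space can be computed directly. The sketch below is only an illustration: the names `ATTRIBUTES`, `CLASSES` and `n_combinations` are assumptions introduced here, and the values are copied from the list above. The product comes out to 1728, which matches the number of rows reported when the data is loaded.

```python
from math import prod  # available from Python 3.8

# Attribute values copied from the list above (illustrative names).
ATTRIBUTES = {
    'buying': ['vhigh', 'high', 'med', 'low'],
    'maint': ['vhigh', 'high', 'med', 'low'],
    'doors': ['2', '3', '4', '5more'],
    'persons': ['2', '4', 'more'],
    'lug_boot': ['small', 'med', 'big'],
    'safety': ['low', 'med', 'high'],
}
CLASSES = ['unacc', 'acc', 'good', 'vgood']

# Number of distinct attribute combinations: 4 * 4 * 4 * 3 * 3 * 3
n_combinations = prod(len(values) for values in ATTRIBUTES.values())
print(n_combinations)
```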

**Loading the dataset**:

In [11]:
import numpy as np
from sklearn.preprocessing import LabelEncoder
from urllib.request import urlopen


url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
filedata = urlopen(url)
# read() returns bytes in Python 3, so decode before splitting
data = filedata.read().decode('utf-8')
# One row per non-empty line, one column per comma-separated value
dataset = np.array([s.split(',') for s in data.splitlines() if s.strip()])

# Convert the categorical values to numeric codes, one column at a time
le = LabelEncoder()
features = np.array([le.fit_transform(f) for f in dataset[:, :-1].T]).T
print(features.shape)

# Extract the column with the class labels
labels = dataset[:, -1]
print(labels.shape)

(1728, 6)
(1728,)
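
`LabelEncoder` assigns codes in sorted (alphabetical) order, so for a column such as buying the codes follow high, low, med, vhigh rather than the natural ranking. A possible alternative, shown here only as a sketch and not as the method used in this notebook, is `OrdinalEncoder` with an explicit category order. The `CATEGORY_ORDER` list and the `encode_ordinal` helper are hypothetical names, and the low-to-high orderings are an assumption based on the attribute list.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Assumed orderings from lowest to highest, one list per column.
CATEGORY_ORDER = [
    ['low', 'med', 'high', 'vhigh'],  # buying
    ['low', 'med', 'high', 'vhigh'],  # maint
    ['2', '3', '4', '5more'],         # doors
    ['2', '4', 'more'],               # persons
    ['small', 'med', 'big'],          # lug_boot
    ['low', 'med', 'high'],           # safety
]


def encode_ordinal(raw_features):
    """Map each categorical column to its position in CATEGORY_ORDER."""
    encoder = OrdinalEncoder(categories=CATEGORY_ORDER)
    return encoder.fit_transform(np.asarray(raw_features, dtype=object))


# Two hand-written rows in the same column order as the dataset.
sample = [
    ['vhigh', 'low', '2', '2', 'small', 'low'],
    ['low', 'med', '5more', 'more', 'big', 'high'],
]
encoded = encode_ordinal(sample)
print(encoded)
```

With the real data, `encode_ordinal(dataset[:, :-1])` would take the place of the `LabelEncoder` loop. Whether this changes the SVM results is not something this notebook measures.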


**Choosing the hyperparameters**:

In [19]:
from sklearn.metrics import classification_report as report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn import svm


# Split into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, shuffle=True)

# Set up every combination of hyperparameters to search
param_grid = [
    {'C': [1, 10, 100], 'kernel': ['linear']},
    {'C': [1, 10, 100], 'gamma': [0.1, 0.01, 0.001], 'kernel': ['rbf']},
]

svc = svm.SVC(gamma="scale")
clf = GridSearchCV(svc, param_grid, cv=5)
clf.fit(X_train, y_train)

means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))
print()
print("Best parameters found:", clf.best_params_)
y_true, y_pred = y_test, clf.predict(X_test)
print(report(y_true, y_pred))

0.721 (+/-0.015) for {'kernel': 'linear', 'C': 1}
0.725 (+/-0.018) for {'kernel': 'linear', 'C': 10}
0.726 (+/-0.028) for {'kernel': 'linear', 'C': 100}
0.818 (+/-0.037) for {'kernel': 'rbf', 'C': 1, 'gamma': 0.1}
0.699 (+/-0.001) for {'kernel': 'rbf', 'C': 1, 'gamma': 0.01}
0.699 (+/-0.001) for {'kernel': 'rbf', 'C': 1, 'gamma': 0.001}
0.973 (+/-0.013) for {'kernel': 'rbf', 'C': 10, 'gamma': 0.1}
0.719 (+/-0.015) for {'kernel': 'rbf', 'C': 10, 'gamma': 0.01}
0.699 (+/-0.001) for {'kernel': 'rbf', 'C': 10, 'gamma': 0.001}
0.993 (+/-0.011) for {'kernel': 'rbf', 'C': 100, 'gamma': 0.1}
0.901 (+/-0.032) for {'kernel': 'rbf', 'C': 100, 'gamma': 0.01}
0.706 (+/-0.007) for {'kernel': 'rbf', 'C': 100, 'gamma': 0.001}

Best parameters found: {'kernel': 'rbf', 'C': 100, 'gamma': 0.1}
              precision    recall  f1-score   support

         acc       0.99      0.97      0.98        73
        good       0.90      0.95      0.92        19
       unacc       1.00      1.00      1.0