Topic: **Car Evaluation Data Set**

Objective: Evaluate and classify car quality, assigning each car to one of four classes: **unacc** (unacceptable), **acc** (acceptable), **good**, or **vgood** (very good).

Dataset: 1728 samples describing different cars through 6 attributes: buying (purchase price), maint (maintenance cost), doors (number of doors), persons (passenger capacity), lug_boot (luggage boot size), and safety (estimated safety of the car).

In [24]:
# Libraries
from sklearn import neighbors, model_selection
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

In [25]:
# Load the data (CSV); car.data has no header row, so the column names are supplied here
url = 'https://raw.githubusercontent.com/ezefranca/22292-Deep-Learning-Aplicado/main/trabalho1/car.data'
column_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
car_data = pd.read_csv(url, names=column_names)
car_data.head()

,buying,maint,doors,persons,lug_boot,safety,class
0,vhigh,vhigh,2,2,small,low,unacc
1,vhigh,vhigh,2,2,small,med,unacc
2,vhigh,vhigh,2,2,small,high,unacc
3,vhigh,vhigh,2,2,med,low,unacc
4,vhigh,vhigh,2,2,med,med,unacc


In [26]:
# Convert categorical values to integer codes, numbered in order of first appearance
categorical_columns = ['buying', 'maint', 'lug_boot', 'safety']

for column in categorical_columns:
    column_map = {item: code for code, item in enumerate(car_data[column].unique())}
    car_data[column] = car_data[column].map(column_map)
car_data.head()

,buying,maint,doors,persons,lug_boot,safety,class
0,0,0,2,2,0,0,unacc
1,0,0,2,2,0,1,unacc
2,0,0,2,2,0,2,unacc
3,0,0,2,2,1,0,unacc
4,0,0,2,2,1,1,unacc
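
The loop above numbers each category in order of first appearance, so `vhigh` becomes 0 for `buying` while `low` becomes 0 for `safety`. Since k-NN compares samples by distance, a consistent low-to-high ordering may be preferable. The following is a minimal sketch of that alternative, not the encoding used in this notebook; the `ORDERED_LEVELS` dictionary, the `encode_ordinal` helper, and the two-row sample are illustrative assumptions.

```python
import pandas as pd

# Assumed low-to-high ordering of each categorical attribute
ORDERED_LEVELS = {
    'buying': ['low', 'med', 'high', 'vhigh'],
    'maint': ['low', 'med', 'high', 'vhigh'],
    'lug_boot': ['small', 'med', 'big'],
    'safety': ['low', 'med', 'high'],
}

def encode_ordinal(frame, ordered_levels=ORDERED_LEVELS):
    """Return a copy of frame with each listed column mapped to 0..n-1."""
    encoded = frame.copy()
    for column, levels in ordered_levels.items():
        mapping = {level: rank for rank, level in enumerate(levels)}
        encoded[column] = encoded[column].map(mapping)
    return encoded

# Tiny hand-written sample in the same layout as car.data (hypothetical rows)
sample = pd.DataFrame({
    'buying': ['vhigh', 'low'],
    'maint': ['vhigh', 'med'],
    'lug_boot': ['small', 'big'],
    'safety': ['low', 'high'],
})
encoded_sample = encode_ordinal(sample)
print(encoded_sample)
```

With this mapping a larger code always means a higher price, a bigger boot, or better safety, so distances between samples reflect the natural order of the levels.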


In [27]:
# Clean the doors and persons columns: map the open-ended values to 5 and cast to int
car_data['doors'] = car_data['doors'].replace({'5more': '5'}).astype(int)
car_data['persons'] = car_data['persons'].replace({'more': '5'}).astype(int)

In [28]:
# Split the dataset into train and test sets
arr = car_data.values
X = arr[:,0:6]
y = arr[:,6]

train_data, test_data, train_labels, test_labels = model_selection.train_test_split(X, y, test_size=0.2)
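
Because `train_test_split` shuffles randomly, the split above, and therefore the accuracy reported below, changes from run to run. A sketch of a reproducible, stratified variant follows. The seed value `42` and the toy arrays `X_toy` and `y_toy` are assumptions chosen for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for X and y: 20 samples, 2 features, imbalanced labels
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array(['unacc'] * 15 + ['acc'] * 5)

# random_state fixes the shuffle; stratify keeps class proportions in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)
print(X_train.shape, X_test.shape)
print(sorted(y_test.tolist()))
```

Applied to the notebook's data, the same two keyword arguments would be passed alongside `X` and `y`.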

In [29]:
# Fit a k-NN classifier on the training set and predict the test set
model = neighbors.KNeighborsClassifier(n_neighbors=3)
model.fit(train_data, train_labels)
prediction = model.predict(test_data)

# accuracy_score expects (y_true, y_pred); the result is a percentage
accuracy_score(test_labels, prediction) * 100

92.48554913294798
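
A single accuracy figure does not show which classes are confused with each other. The sketch below builds a per-class breakdown with `confusion_matrix`, using short hand-written label lists in place of `test_labels` and `prediction`. The values in `actual` and `predicted` are illustrative and are not results from this notebook.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

class_names = ['unacc', 'acc', 'good', 'vgood']

# Hand-written stand-ins for test_labels and prediction (hypothetical values)
actual = ['unacc', 'acc', 'acc', 'unacc', 'good', 'vgood']
predicted = ['unacc', 'acc', 'unacc', 'unacc', 'acc', 'vgood']

# Rows are actual classes, columns are predicted classes, both in class_names order
matrix = confusion_matrix(actual, predicted, labels=class_names)
print(matrix)
print(accuracy_score(actual, predicted))
```

Off-diagonal entries count the mistakes: in this toy case one `acc` car is predicted as `unacc` and the single `good` car is predicted as `acc`.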

In [70]:
# Show a sample of the predictions (actual vs. predicted class)
print("*-----------*-----------*")
print("*         Class         *")
print("*-----------*-----------*")
print("*   Actual | Predicted  *")
print("*-----------*-----------*")
for i in range(10):
    print("{}\t\t{}".format(test_labels[i], prediction[i]))
    print("*-----------*-----------*")

*-----------*-----------*
*         Class         *
*-----------*-----------*
*   Actual | Predicted  *
*-----------*-----------*
*-----------*-----------*
unacc		unacc
*-----------*-----------*
acc		acc
*-----------*-----------*
acc		unacc
*-----------*-----------*
unacc		unacc
*-----------*-----------*
unacc		unacc
*-----------*-----------*
unacc		unacc
*-----------*-----------*
unacc		unacc
*-----------*-----------*
unacc		unacc
*-----------*-----------*
acc		acc
*-----------*-----------*
unacc		unacc
*-----------*-----------*
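
The model above fixes `n_neighbors=3` by hand. One common way to compare values of k is cross-validation, sketched here on synthetic data from `make_classification` rather than on the car data. The candidate list, the 5 folds, and the synthetic dataset are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the encoded car features and labels
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_classes=3, random_state=0
)

candidate_ks = [1, 3, 5, 7, 9]
mean_scores = {}
for k in candidate_ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean 5-fold cross-validated accuracy for this k
    mean_scores[k] = cross_val_score(knn, X_demo, y_demo, cv=5).mean()

best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best k:", best_k)
```

On the notebook's data, `X_demo` and `y_demo` would be replaced by the training portion only, so that the test set stays untouched until the final evaluation.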
