## Comparison of Classification Techniques

### Classification Pipeline

Importing the packages and functions:

In [0]:
from sklearn.metrics import f1_score, recall_score, accuracy_score, precision_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

Importing the data. This dataset describes attributes of glass samples. The goal is to correctly classify the type of glass (vehicle glass, building glass, etc.) from the refractive index and the percentage content of several chemical elements, such as potassium and calcium.

More information about the dataset: [UCI](https://archive.ics.uci.edu/ml/datasets/Glass+Identification)

In [0]:
df = pd.read_csv('https://raw.githubusercontent.com/intelligentagents/aprendizagem-supervisionada/master/data/glass.csv')

Viewing and describing the dataset:

In [4]:
# Exploring the dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 11 columns):
id       214 non-null int64
 ri      214 non-null float64
 na      214 non-null float64
 mg      214 non-null float64
 al      214 non-null float64
 si      214 non-null float64
 k       214 non-null float64
 ca      214 non-null float64
 ba      214 non-null float64
 fe      214 non-null float64
 type    214 non-null int64
dtypes: float64(9), int64(2)
memory usage: 18.5 KB


In [5]:
df.head(5)

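| | id | ri | na | mg | al | si | k | ca | ba | fe | type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.52101 | 13.64 | 4.49 | 1.1 | 71.78 | 0.06 | 8.75 | 0.0 | 0.0 | 1 |
| 1 | 2 | 1.51761 | 13.89 | 3.6 | 1.36 | 72.73 | 0.48 | 7.83 | 0.0 | 0.0 | 1 |
| 2 | 3 | 1.51618 | 13.53 | 3.55 | 1.54 | 72.99 | 0.39 | 7.78 | 0.0 | 0.0 | 1 |
| 3 | 4 | 1.51766 | 13.21 | 3.69 | 1.29 | 72.61 | 0.57 | 8.22 | 0.0 | 0.0 | 1 |
| 4 | 5 | 1.51742 | 13.27 | 3.62 | 1.24 | 73.08 | 0.55 | 8.07 | 0.0 | 0.0 | 1 |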


Describing the dataset:

In [6]:
df.describe()

| | id | ri | na | mg | al | si | k | ca | ba | fe | type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 214.0 | 214.0 | 214.0 | 214.0 | 214.0 | 214.0 | 214.0 | 214.0 | 214.0 | 214.0 | 214.0 |
| mean | 107.5 | 1.518365 | 13.40785 | 2.684533 | 1.444907 | 72.650935 | 0.497056 | 8.956963 | 0.175047 | 0.057009 | 2.780374 |
| std | 61.920648 | 0.003037 | 0.816604 | 1.442408 | 0.49927 | 0.774546 | 0.652192 | 1.423153 | 0.497219 | 0.097439 | 2.103739 |
| min | 1.0 | 1.51115 | 10.73 | 0.0 | 0.29 | 69.81 | 0.0 | 5.43 | 0.0 | 0.0 | 1.0 |
| 25% | 54.25 | 1.516523 | 12.9075 | 2.115 | 1.19 | 72.28 | 0.1225 | 8.24 | 0.0 | 0.0 | 1.0 |
| 50% | 107.5 | 1.51768 | 13.3 | 3.48 | 1.36 | 72.79 | 0.555 | 8.6 | 0.0 | 0.0 | 2.0 |
| 75% | 160.75 | 1.519157 | 13.825 | 3.6 | 1.63 | 73.0875 | 0.61 | 9.1725 | 0.0 | 0.1 | 3.0 |
| max | 214.0 | 1.53393 | 17.38 | 4.49 | 3.5 | 75.41 | 6.21 | 16.19 | 3.15 | 0.51 | 7.0 |


Dropping the `id` column:

In [0]:
df = df.drop('id', axis=1)

Defining the independent and dependent variables:

In [0]:
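# Features: every column except the last one (`type`).
# With `id` dropped the frame has 10 columns, so a slice of `:10` would also pull the target into X.
X = df.iloc[:, :-1].values
# Target: the last column (`type`)
y = df.iloc[:, -1].values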
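
Selecting columns by position is easy to get wrong once a column has been dropped. As a minimal sketch, the same split can be done by column name instead. The toy frame below is made up for illustration, and the header-stripping step rests on an assumption: the leading spaces in the `df.info()` listing suggest that some column names may carry stray whitespace.

```python
import pandas as pd

# Toy frame with made-up values; the leading spaces in the headers are
# deliberate, to mimic what the df.info() listing hints at (an assumption).
toy = pd.DataFrame({
    " ri": [1.52, 1.51, 1.53],
    " na": [13.6, 13.9, 13.5],
    " type": [1, 2, 1],
})

# Normalise the headers so that name-based selection is reliable.
toy.columns = toy.columns.str.strip()

# Split by name: everything except `type` is a feature, `type` is the target.
feature_frame = toy.drop(columns="type")
X_toy = feature_frame.values
y_toy = toy["type"].values

print(list(feature_frame.columns), X_toy.shape, y_toy.shape)
```

Splitting by name keeps the target out of the feature matrix no matter how many columns were dropped beforehand.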

Splitting the dataset into training and test sets:

In [0]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
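
A plain random split does not guarantee that every class appears in the test set in the same proportion as in the full data, which matters for small, imbalanced datasets. The sketch below shows the `stratify` argument of `train_test_split` on synthetic labels (made up for illustration); it is a possible variation, not the split used in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 80 samples of class 0 and 20 of class 1.
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder features

# stratify=y_demo keeps the 80/20 class ratio in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Class counts in each partition.
train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print(train_counts, test_counts)
```

Stratification requires at least two samples per class, so very rare classes can make it fail.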


Creating the dictionary containing all the classifiers:

In [0]:
estimators = {'Decision Tree': DecisionTreeClassifier(criterion = 'entropy', random_state = 0),
              'KNN': KNeighborsClassifier(n_neighbors = 5, metric = 'euclidean'),
              'SVC': SVC(kernel = 'rbf', random_state = 0)}
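
KNN with a Euclidean metric and an RBF-kernel SVC both depend on distances between feature vectors, so features measured on larger scales can dominate the result. A common remedy is to standardise the features inside a scikit-learn pipeline. The sketch below illustrates this on synthetic data from `make_classification` (a stand-in, not the glass dataset); the names `scaled_knn` and `demo_accuracy` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class problem with 9 features (stand-in data).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=9, n_informative=5, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The scaler is fitted on the training data only, then applied before KNN.
scaled_knn = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
)
scaled_knn.fit(X_tr, y_tr)

demo_accuracy = scaled_knn.score(X_te, y_te)
print(f"accuracy on synthetic data: {demo_accuracy:.3f}")
```

Because a pipeline exposes `fit` and `predict` like any other estimator, one could be used as a value in the `estimators` dictionary without changing the evaluation loop.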

Creating the DataFrame that will store the final results of the classifiers:

In [0]:
df_results = pd.DataFrame(columns=['classificador', 'accuracy', 'precision', 'recall', 'f-measure'], index=None)
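
An alternative way to build such a results table is to collect one dictionary per model in a list and create the DataFrame once at the end, which lets pandas infer numeric dtypes for the metric columns. This is only a sketch: the model names and scores below are placeholders, not measured results.

```python
import pandas as pd

# Placeholder rows; in practice each dict would be filled inside the loop.
rows = [
    {"classifier": "model_a", "accuracy": 0.50, "f-measure": 0.40},
    {"classifier": "model_b", "accuracy": 0.75, "f-measure": 0.70},
]

# Build the frame in one step; metric columns come out as float64.
results_demo = pd.DataFrame(rows)
print(results_demo.dtypes)
```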

Iterating over the dictionary to train and evaluate each model:

In [16]:
for name, estim in estimators.items():

    # print("Training estimator {0}: ".format(name))

    # Train the classifier on the training set
    estim.fit(X_train, y_train)

    # Predict the test-set labels with the fitted model
    y_pred = estim.predict(X_test)

    # Store each classifier's metrics as a new row of the DataFrame
    df_results.loc[len(df_results), :] = [
        name,
        accuracy_score(y_test, y_pred),
        precision_score(y_test, y_pred, average='macro'),
        recall_score(y_test, y_pred, average='macro'),
        f1_score(y_test, y_pred, average='macro'),
    ]
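
With `average='macro'`, scikit-learn computes the metric separately for each class and takes the unweighted mean, so rare classes count as much as frequent ones. The small worked example below uses made-up labels to show this for precision, with the `'weighted'` average for contrast.

```python
import numpy as np
from sklearn.metrics import precision_score

# Made-up labels: class 0 is frequent, classes 1 and 2 are rare.
y_true = np.array([0, 0, 0, 0, 1, 1, 2])
y_hat = np.array([0, 0, 0, 1, 1, 1, 2])

# Per-class precision: class 0 -> 3/3, class 1 -> 2/3, class 2 -> 1/1.
per_class = precision_score(y_true, y_hat, average=None)

# Macro: plain mean of the per-class values, (1 + 2/3 + 1) / 3 = 8/9.
macro = precision_score(y_true, y_hat, average="macro")

# Weighted: mean weighted by class support, (4*1 + 2*(2/3) + 1*1) / 7.
weighted = precision_score(y_true, y_hat, average="weighted")

print(per_class, macro, weighted)
```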



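Displaying the final results. Keep in mind that a feature slice of `df.iloc[:, :10]` taken after `id` is dropped covers all 10 remaining columns, `type` included. Scores computed on such features, as in the run recorded below, are inflated by label leakage; the perfect Decision Tree score is a symptom of it.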

In [0]:
df_results

| | clf | acc | prec | rec | f1 |
|---|---|---|---|---|---|
| 0 | Decision Tree | 1.0 | 1.0 | 1.0 | 1.0 |
| 1 | KNN | 0.953488 | 0.923611 | 0.923611 | 0.923611 |
| 2 | SVC | 0.976744 | 0.988889 | 0.944444 | 0.96092 |
