## Case Study: Classification

### Machine Learning Pipeline

**Case study description.**

Importing the libraries:

In [0]:
from sklearn.metrics import f1_score, recall_score, accuracy_score, precision_score
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

Loading the data. The dataset describes animals through attributes that are mostly boolean (0 or 1); `legs` is a numeric count. The goal is therefore to classify each animal's type (fish, bird, etc.) from its characteristics (hair, feathers, tail, etc.).

More information at the source: [UCI](https://archive.ics.uci.edu/ml/datasets/Zoo)

In [2]:
df = pd.read_csv('https://raw.githubusercontent.com/intelligentagents/aprendizagem-supervisionada/master/data/zoo.csv')
# Summarize the dataset: column names, non-null counts and dtypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101 entries, 0 to 100
Data columns (total 18 columns):
name        101 non-null object
hair        101 non-null int64
feathers    101 non-null int64
eggs        101 non-null int64
milk        101 non-null int64
airborne    101 non-null int64
aquatic     101 non-null int64
predator    101 non-null int64
toothed     101 non-null int64
backbone    101 non-null int64
breathes    101 non-null int64
venomous    101 non-null int64
fins        101 non-null int64
legs        101 non-null int64
tail        101 non-null int64
domestic    101 non-null int64
catsize     101 non-null int64
type        101 non-null int64
dtypes: int64(17), object(1)
memory usage: 14.3+ KB


Dropping the feature that carries no predictive value for the model, the animal's name:

In [0]:
df = df.drop(['name'], axis = 1)  

Previewing the dataset:

In [4]:
df.head(5)

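|   | hair | feathers | eggs | milk | airborne | aquatic | predator | toothed | backbone | breathes | venomous | fins | legs | tail | domestic | catsize | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 0 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 4 | 1 | 0 | 1 | 1 |
| 2 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 4 |
| 3 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 0 | 0 | 1 | 1 |
| 4 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 1 | 0 | 1 | 1 |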


Defining the independent variables (the 16 feature columns, `X`) and the dependent variable (the `type` column, `y`):

In [0]:
X = df.iloc[:, :16].values
y = df.iloc[:, 16].values
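
The positional slices above depend on the column order staying fixed. As a minimal sketch of the same split done by column name, assuming a toy DataFrame (`toy`, with illustrative values that are not taken from the dataset) that mirrors the layout after `name` is dropped:

```python
import pandas as pd

# Toy frame standing in for df after dropping 'name' (values are illustrative only)
toy = pd.DataFrame({
    'hair': [1, 0, 0],
    'feathers': [0, 1, 0],
    'legs': [4, 2, 0],
    'type': [1, 2, 4],
})

# Select the target by name and treat every other column as a feature
X_toy = toy.drop(columns=['type']).values
y_toy = toy['type'].values

print(X_toy.shape, y_toy.shape)
```

Selecting by name keeps the split correct even if columns are later reordered or new features are added.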

Creating the training and test subsets (80% / 20%):

In [0]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
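
With 101 rows and `test_size = 0.2`, the test set ends up with 21 rows and the training set with 80. The sketch below checks this on placeholder arrays with the same shape as the zoo data; `X_demo` and `y_demo` are hypothetical stand-ins, not the real features or labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the zoo data (101 rows, 16 features)
X_demo = np.zeros((101, 16))
y_demo = np.arange(101) % 7 + 1  # hypothetical labels in the range 1..7

# Same split settings as the notebook cell above
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

print(X_tr.shape, X_te.shape)
```

A test set this small means each misclassified animal moves accuracy by almost five percentage points, which is worth keeping in mind when reading the final metrics.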


Creating a dictionary that holds all the classifiers:

In [0]:
estimators = {'Decision Tree': DecisionTreeClassifier(criterion = 'entropy', random_state = 0),
              'KNN': KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2),
              'Logistic Regression': LogisticRegression(random_state = 0),
              'Naive Bayes': GaussianNB(),
              'Random Forest': RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0),
              'SVC': SVC(kernel = 'rbf', random_state = 0)}

Creating the dataframe that will store each classifier's final results:

In [0]:
df_results = pd.DataFrame(columns=['clf', 'acc', 'prec', 'rec', 'f1'], index=None)

Iterating over the classifiers and computing the results for each one:

In [9]:
for name, estim in estimators.items():

    # print("Training estimator {0}: ".format(name))

    # Fit the classifier on the training set
    estim.fit(X_train, y_train)

    # Predict labels for the test set with the fitted model
    y_pred = estim.predict(X_test)

    # Store this classifier's metrics as a new row in the results dataframe
    df_results.loc[len(df_results), :] = [name, accuracy_score(y_test, y_pred), precision_score(y_test, y_pred, average = 'macro'),
                   recall_score(y_test, y_pred,  average = 'macro'), f1_score(y_test, y_pred,  average = 'macro')]
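
The `average = 'macro'` argument computes each metric per class and then takes the unweighted mean, so rare animal types count as much as common ones. A small worked sketch, using hypothetical three-class labels (`y_true`, `y_hat`) rather than the zoo data, compares a hand calculation with scikit-learn:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels for three classes (not from the zoo dataset)
y_true = [0, 0, 1, 1, 2, 2]
y_hat = [0, 0, 1, 2, 2, 2]

# Per-class precision: class 0 = 2/2, class 1 = 1/1, class 2 = 2/3
manual_precision = (2 / 2 + 1 / 1 + 2 / 3) / 3
# Per-class recall: class 0 = 2/2, class 1 = 1/2, class 2 = 2/2
manual_recall = (2 / 2 + 1 / 2 + 2 / 2) / 3

# scikit-learn's macro average should match the hand calculation
macro_precision = precision_score(y_true, y_hat, average='macro')
macro_recall = recall_score(y_true, y_hat, average='macro')

print(macro_precision, manual_precision)
print(macro_recall, manual_recall)
```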



Displaying the final results:

In [10]:
df_results

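|   | clf | acc | prec | rec | f1 |
|---|---|---|---|---|---|
| 0 | Decision Tree | 1 | 1 | 1 | 1 |
| 1 | KNN | 1 | 1 | 1 | 1 |
| 2 | Logistic Regression | 1 | 1 | 1 | 1 |
| 3 | Naive Bayes | 1 | 1 | 1 | 1 |
| 4 | Random Forest | 1 | 1 | 1 | 1 |
| 5 | SVC | 1 | 1 | 1 | 1 |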
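
These scores come from a single split with a 21-row test set, so they may be optimistic. One way to probe that, which is not part of the original notebook, is k-fold cross-validation. The sketch below assumes synthetic data from `make_classification` as a stand-in (`X_syn` and `y_syn` are hypothetical); with the real data, the `X` and `y` arrays defined earlier would take their place.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same number of rows and features as the zoo data
X_syn, y_syn = make_classification(n_samples=101, n_features=16, n_informative=8,
                                   n_redundant=0, n_classes=3, random_state=0)

# 5-fold cross-validation: one macro-F1 score per fold
clf = DecisionTreeClassifier(criterion='entropy', random_state=0)
scores = cross_val_score(clf, X_syn, y_syn, cv=5, scoring='f1_macro')

print(scores.mean(), scores.std())
```

Reporting the mean and spread across folds gives a sense of how much the metrics vary with the choice of split.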
