## Wine dataset

Source: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html

#### Data Set Characteristics:

| Property             | Value          |
| -------------------- | -------------- |
| Classes              | 3              |
| Samples per class    | [59, 71, 48]   |
| Samples total        | 178            |
| Dimensionality       | 13             |
| Features             | real, positive |

This is a copy of the UCI ML Wine recognition dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

The data are the results of a chemical analysis of wines grown in the same region in Italy by three different cultivators. Thirteen measurements were taken, one for each of the constituents found in the three types of wine.

Original Owners:

Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.

Citation:

Lichman, M. (2013). UCI Machine Learning Repository [https://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.



##### Attribute Information
- Alcohol
- Malic acid
- Ash
- Alcalinity of ash
- Magnesium
- Total phenols
- Flavanoids
- Nonflavanoid phenols
- Proanthocyanins
- Color intensity
- Hue
- OD280/OD315 of diluted wines
- Proline
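
The characteristics listed above can be checked directly against the object returned by `load_wine`. The following is a minimal sketch; the variable names `n_samples`, `n_features`, and `class_counts` are chosen here for illustration and do not appear in the original notebook.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# Shape of the feature matrix: (samples, features)
n_samples, n_features = wine.data.shape

# Number of samples in each class, in class-index order
class_counts = np.bincount(wine.target).tolist()

print(n_samples, n_features)   # total samples and dimensionality
print(class_counts)            # samples per class
print(wine.feature_names)      # the thirteen attribute names
```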

In [1]:
# Import libraries

import warnings
warnings.filterwarnings("ignore")  # suppress library warnings in the notebook output

# Load the dataset
from sklearn.datasets import load_wine
wine = load_wine()

In [2]:
# Example of accessing the data
X = wine.data[:, :] # Features of each sample
y = wine.target # Class of each sample

In [None]:
# We need to train the classifier and then test its performance on 'new' data.
# Here we split the data into training and test sets so we can evaluate performance afterwards.

In [3]:
# Split the data into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# Using this function is convenient, but it is not mandatory. You can also split your data manually.
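
As a rough illustration of the manual split mentioned in the comment above, the sketch below shuffles the row indices with NumPy and slices them. The `_manual` suffixes and the seed are assumptions made for this example; the resulting partition will not contain the same rows as the one produced by `train_test_split`.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
X, y = wine.data, wine.target

rng = np.random.default_rng(42)        # seed chosen arbitrarily for reproducibility
indices = rng.permutation(len(X))      # shuffled row indices

n_test = int(np.ceil(0.33 * len(X)))   # hold out roughly 33% of the rows, rounded up
test_idx, train_idx = indices[:n_test], indices[n_test:]

X_train_manual, X_test_manual = X[train_idx], X[test_idx]
y_train_manual, y_test_manual = y[train_idx], y[test_idx]

print(X_train_manual.shape, X_test_manual.shape)
```

Shuffling matters here because the rows of this dataset are ordered by class; slicing without shuffling would leave some classes out of one of the two sets.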

In [4]:
# Load and train the classifiers

# Random Forests

from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
y_pred = rfc.predict(X_test)
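
Because `RandomForestClassifier()` is created without a seed, its results can vary slightly between runs. The sketch below fixes `random_state` (the value 0 is an arbitrary assumption) and also prints the impurity-based feature importances; `rfc_seeded` and `ranked` are names introduced only for this example.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.33, random_state=42
)

# A fixed random_state makes the trained forest identical across runs
rfc_seeded = RandomForestClassifier(random_state=0)
rfc_seeded.fit(X_train, y_train)

# One importance value per feature; the values sum to 1
ranked = sorted(zip(rfc_seeded.feature_importances_, wine.feature_names), reverse=True)
for importance, name in ranked:
    print(f"{name:30s} {importance:.3f}")
```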

In [5]:
# Random Forest metrics

from sklearn.metrics import accuracy_score, recall_score, precision_score

rfc_acc = round(accuracy_score(y_test, y_pred), 6) # round() keeps 6 decimal places
rfc_recall = round(recall_score(y_test, y_pred, average='weighted'), 6)
rfc_precision = round(precision_score(y_test, y_pred, average='weighted'), 6)
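
To make the `average='weighted'` option more concrete, the sketch below recomputes weighted recall by hand on a small set of toy labels (invented for this example, not taken from the wine data). Each class's recall is weighted by its number of true samples, which is why weighted recall always coincides with accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Toy labels, made up for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_hat = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 0])

per_class = recall_score(y_true, y_hat, average=None)  # recall of each class
support = np.bincount(y_true)                          # true samples per class

# Weighted average: each class recall weighted by its support
manual_weighted = np.sum(per_class * support) / support.sum()
weighted = recall_score(y_true, y_hat, average='weighted')
accuracy = accuracy_score(y_true, y_hat)

print(per_class)
print(manual_weighted, weighted, accuracy)
```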

In [6]:
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

In [7]:
# KNN metrics

knn_acc = round(accuracy_score(y_test, y_pred), 6)
knn_recall = round(recall_score(y_test, y_pred, average='weighted'), 6)
knn_precision = round(precision_score(y_test, y_pred, average='weighted'), 6)

In [8]:
# Comparison
print('KNN vs Random Forest\n')
print('Classes: {0}\n'.format(wine.target_names))
print('Accuracy: {0} vs {1}'.format(knn_acc, rfc_acc))
print('Recall: {0} vs {1}'.format(knn_recall, rfc_recall))
print('Precision: {0} vs {1}'.format(knn_precision, rfc_precision))

KNN vs Random Forest

Classes: ['class_0' 'class_1' 'class_2']

Accuracy: 0.694915 vs 1.0
Recall: 0.694915 vs 1.0
Precision: 0.698231 vs 1.0
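
One plausible reason for the gap between the two models is feature scale: KNN relies on distances, and features with large values (proline is in the hundreds) dominate features with small values (hue is around 1), whereas tree-based models are insensitive to scale. The sketch below is not part of the original notebook; it standardizes the features inside a pipeline and compares the two KNN variants on the same split. The names `knn_raw` and `knn_scaled` are assumptions made for this example.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.33, random_state=42
)

# KNN on the raw features, as in the notebook
knn_raw = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# KNN on standardized features; the scaler is fitted on the training data only
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn_scaled.fit(X_train, y_train)

raw_acc = knn_raw.score(X_test, y_test)
scaled_acc = knn_scaled.score(X_test, y_test)
print('KNN accuracy, raw vs scaled: {0:.6f} vs {1:.6f}'.format(raw_acc, scaled_acc))
```

Wrapping the scaler and the classifier in one pipeline keeps the test set out of the scaling statistics, so the comparison stays fair.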


In [9]:
# Cross-validation
from sklearn.model_selection import cross_val_score
cv_rfc = cross_val_score(rfc, X, y)
cv_knn = cross_val_score(knn, X, y)
print('\nCross-validation: {0} vs {1}'.format(cv_knn, cv_rfc))


Cross-validation: [0.63888889 0.69444444 0.66666667 0.65714286 0.85714286] vs [0.97222222 0.94444444 0.94444444 1.         1.        ]
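
The call above does not pass `cv`, so the splitting strategy is implicit. In recent scikit-learn versions the default for a classifier is a 5-fold stratified split without shuffling, which is consistent with the five scores printed above. The sketch below makes that choice explicit and adds a shuffled variant; the seed and the variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

wine = load_wine()
X, y = wine.data, wine.target
knn = KNeighborsClassifier(n_neighbors=3)

# Default behaviour versus the same strategy written out explicitly
default_scores = cross_val_score(knn, X, y)
explicit_scores = cross_val_score(knn, X, y, cv=StratifiedKFold(n_splits=5))
print(np.allclose(default_scores, explicit_scores))

# Shuffled stratified folds with a fixed seed (arbitrary value)
shuffled_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
shuffled_scores = cross_val_score(knn, X, y, cv=shuffled_cv)
print(shuffled_scores)
```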


In [15]:
# Average the Random Forest cross-validation scores
sum_cv_rfc = 0
for cv_score in cv_rfc:
    sum_cv_rfc += cv_score
print('\nRandom Forest result: {0}'.format(sum_cv_rfc / len(cv_rfc)))


Random Forest result: 0.9722222222222221


In [16]:
# Average the KNN cross-validation scores
sum_cv_knn = 0
for cv_score in cv_knn:
    sum_cv_knn += cv_score
print('\nKNN result: {0}'.format(sum_cv_knn / len(cv_knn)))


KNN result: 0.7028571428571428
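
The two averaging loops can also be written with NumPy, which additionally gives the spread across folds. The sketch below reuses the fold scores printed earlier in this notebook, copied by hand into arrays whose names (`cv_knn_reported`, `cv_rfc_reported`) are assumptions made for this example.

```python
import numpy as np

# Fold scores copied from the cross-validation output above
cv_knn_reported = np.array([0.63888889, 0.69444444, 0.66666667, 0.65714286, 0.85714286])
cv_rfc_reported = np.array([0.97222222, 0.94444444, 0.94444444, 1.0, 1.0])

for name, scores in [("KNN", cv_knn_reported), ("Random Forest", cv_rfc_reported)]:
    # mean() replaces the manual sum loop; std() shows fold-to-fold variation
    print(f"{name}: mean={scores.mean():.4f}, std={scores.std():.4f}")
```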
