Can we detect fake wines from their chemical analysis? The data, available from the [UCI](https://archive.ics.uci.edu/) data repository, contain the chemical analysis of three cultivars of Italian wine. We want to develop a model that can automatically classify a wine sample into one of the three cultivars from the values of its analysis.

For this, the dataset has $13$ attributes (chemical measurements) recorded for $178$ samples. We will attempt to fit a multi-layer perceptron (MLP) model to this classification problem.

We begin by loading the data and examining the first few lines.

In [2]:
import pandas as pd
column_names = ["Cultivar", "Alchol", "Malic_Acid", "Ash", "Alcalinity_of_Ash",
                "Magnesium", "Total_phenols", "Falvanoids", "Nonflavanoid_phenols",
                "Proanthocyanins", "Color_intensity", "Hue", "OD280", "Proline"]
wine = pd.read_csv('wine_data.csv', names=column_names)
wine.head()

|   | Cultivar | Alchol | Malic_Acid | Ash | Alcalinity_of_Ash | Magnesium | Total_phenols | Falvanoids | Nonflavanoid_phenols | Proanthocyanins | Color_intensity | Hue | OD280 | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |


As an initial step of the exploratory data analysis (EDA), we compute the elementary statistics.

In [4]:
wine.describe().transpose()

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Cultivar | 178.0 | 1.938202 | 0.775035 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 |
| Alchol | 178.0 | 13.000618 | 0.811827 | 11.03 | 12.3625 | 13.05 | 13.6775 | 14.83 |
| Malic_Acid | 178.0 | 2.336348 | 1.117146 | 0.74 | 1.6025 | 1.865 | 3.0825 | 5.8 |
| Ash | 178.0 | 2.366517 | 0.274344 | 1.36 | 2.21 | 2.36 | 2.5575 | 3.23 |
| Alcalinity_of_Ash | 178.0 | 19.494944 | 3.339564 | 10.6 | 17.2 | 19.5 | 21.5 | 30.0 |
| Magnesium | 178.0 | 99.741573 | 14.282484 | 70.0 | 88.0 | 98.0 | 107.0 | 162.0 |
| Total_phenols | 178.0 | 2.295112 | 0.625851 | 0.98 | 1.7425 | 2.355 | 2.8 | 3.88 |
| Falvanoids | 178.0 | 2.02927 | 0.998859 | 0.34 | 1.205 | 2.135 | 2.875 | 5.08 |
| Nonflavanoid_phenols | 178.0 | 0.361854 | 0.124453 | 0.13 | 0.27 | 0.34 | 0.4375 | 0.66 |
| Proanthocyanins | 178.0 | 1.590899 | 0.572359 | 0.41 | 1.25 | 1.555 | 1.95 | 3.58 |


In [6]:
wine.shape

(178, 14)

### Data Preparation

We should perform a complete EDA, but since we have already decided to fit an MLP, we will skip this stage. The data preparation entails the following steps:

- First, separate the data into a matrix of explanatory variables and the response variable.
- Then, divide the data into a training set and a test set.
- Finally, normalize the data, since the attributes have very different magnitudes. For this, we fit a `StandardScaler` on the training data and then apply it to the test data; the two steps can also be combined in a `pipeline`. An alternative would be to use the function `scale` directly.
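
The scaling step can be sketched in isolation as follows. This is only an illustration under one stated assumption: the copy of the wine data bundled with scikit-learn (`load_wine`) stands in for `wine_data.csv`, and its cultivars are labelled 0, 1, 2 rather than 1, 2, 3. The names `X_demo`, `X_tr`, `X_te` and `scaler` are hypothetical and chosen so as not to clash with the notebook's own variables.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, scale

# Assumption: scikit-learn's bundled wine data replaces wine_data.csv
X_demo, y_demo = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# Column means and standard deviations are estimated from the training rows only
scaler = StandardScaler().fit(X_tr)
X_tr_std = scaler.transform(X_tr)
X_te_std = scaler.transform(X_te)  # test rows reuse the training statistics

# Training columns are centred exactly; test columns only approximately
print(np.round(X_tr_std.mean(axis=0), 2))
print(np.round(X_te_std.mean(axis=0), 2))

# On the training data alone, the function `scale` gives the same result
same_as_scale = np.allclose(X_tr_std, scale(X_tr))
print(same_as_scale)
```

The design point is that `scale` recomputes the statistics on whatever array it receives, so it offers no way to reuse the training means and standard deviations on the test set. A fitted `StandardScaler` does.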

In [12]:
X = wine.drop('Cultivar',axis=1)
y = wine['Cultivar']
X.head()

|   | Alchol | Malic_Acid | Ash | Alcalinity_of_Ash | Magnesium | Total_phenols | Falvanoids | Nonflavanoid_phenols | Proanthocyanins | Color_intensity | Hue | OD280 | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |


In [13]:
y.head()

0    1
1    1
2    1
3    1
4    1
Name: Cultivar, dtype: int64

In [18]:
# Split into a training set and a test set (by default 0.75 / 0.25)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
# Perform the normalization on the training data
from sklearn.preprocessing import StandardScaler
normaliser = StandardScaler()
normaliser.fit(X_train)
X_train = normaliser.transform(X_train)
X_test  = normaliser.transform(X_test)
X_train[:4,1:6]

array([[-0.22625261,  1.30755432,  0.47952357,  2.35062312, -1.13042535],
       [-0.40703722,  1.23204207,  0.03353312,  1.39099579,  0.84521948],
       [ 2.02451573,  0.325895  ,  0.47952357, -0.93952772, -0.96578828],
       [-0.5516649 ,  0.21262661, -1.39363634,  0.84263732,  1.61901371]])

In [19]:
X_test[:4,1:6]

array([[-0.97650873, -1.71293591, -0.26379385, -0.52825887,  0.12081638],
       [ 3.10018413, -0.99556948,  0.47952357, -0.93952772,  0.54887276],
       [-0.70533182,  0.325895  , -1.00711128,  0.56845808,  1.66840483],
       [ 0.72286657,  1.23204207,  1.07417751, -0.18553482, -1.21274389]])

### Train the MLP model

We use the multi-layer perceptron classifier, `MLPClassifier`, from scikit-learn's `neural_network` module.

In [20]:
from sklearn.neural_network import MLPClassifier

We can now create an instance of the model. 

Among the numerous possible parameters, we only define 

- the number of hidden layers,
- the number of neurons in each hidden layer.

For this, we pass a tuple whose $n$-th element is the number of neurons in hidden layer $n.$ Here we choose $3$ hidden layers with $13$ neurons each, and we limit the number of training iterations to $500.$

In [21]:
mlp = MLPClassifier(hidden_layer_sizes=(13,13,13),max_iter=500)

Having defined the model, we can now fit the training data, already prepared and normalized above.

In [22]:
mlp.fit(X_train,y_train)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(13, 13, 13), learning_rate='constant',
       learning_rate_init=0.001, max_iter=500, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

The output shows all the default values, as well as the architecture that we defined. All of these could be modified and tuned.
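
To see how `hidden_layer_sizes` translates into an actual network, one can inspect the weight matrices of a fitted model. The sketch below is an illustration, not part of the original analysis: it assumes the wine data bundled with scikit-learn (`load_wine`) as a stand-in for `wine_data.csv`, fits on all samples purely to obtain the weights, and uses the hypothetical names `demo_mlp` and `weight_shapes`.

```python
from sklearn.datasets import load_wine
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled wine data replaces wine_data.csv
X_demo, y_demo = load_wine(return_X_y=True)
X_demo = StandardScaler().fit_transform(X_demo)

demo_mlp = MLPClassifier(hidden_layer_sizes=(13, 13, 13), max_iter=500,
                         random_state=0)
demo_mlp.fit(X_demo, y_demo)

# One weight matrix per pair of consecutive layers:
# 13 inputs -> 13 -> 13 -> 13 hidden neurons -> 3 output classes
weight_shapes = [w.shape for w in demo_mlp.coefs_]
print(weight_shapes)

# n_layers_ counts the input and output layers as well as the hidden ones
print(demo_mlp.n_layers_)
```

Because the number of input attributes happens to equal the chosen hidden-layer width, the first three matrices are all $13 \times 13$. Only the last one, $13 \times 3$, reveals the three output classes.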


### Predictions and Model Evaluation

With the fitted model, we can now use the method `predict()` to make the actual predictions on the test data and print out the confusion matrix.

In [23]:
previsions = mlp.predict(X_test)

In [25]:
# confusion table
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test,previsions))

[[15  0  0]
 [ 0 18  0]
 [ 0  2 10]]


We observe $2$ misclassifications out of $45.$
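
The count can be read off the confusion matrix programmatically: rows are the true cultivars, columns the predicted ones, so every off-diagonal entry is an error. The sketch below simply re-enters the matrix printed above; the names `cm`, `labels` and `confusions` are illustrative.

```python
import numpy as np

labels = [1, 2, 3]  # cultivar labels, in the row/column order of the matrix
cm = np.array([[15, 0, 0],
               [0, 18, 0],
               [0, 2, 10]])

# Zero out the diagonal (correct predictions) to keep only the errors
off_diagonal = cm - np.diag(np.diag(cm))
n_errors = int(off_diagonal.sum())
n_samples = int(cm.sum())

# List each confusion as (true cultivar, predicted cultivar, count)
confusions = [(labels[i], labels[j], int(off_diagonal[i, j]))
              for i, j in np.argwhere(off_diagonal > 0)]

print(n_errors, "errors out of", n_samples)
print(confusions)
```

Both errors are samples of cultivar 3 predicted as cultivar 2.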

In [26]:
print(classification_report(y_test,previsions))

             precision    recall  f1-score   support

          1       1.00      1.00      1.00        15
          2       0.90      1.00      0.95        18
          3       1.00      0.83      0.91        12

avg / total       0.96      0.96      0.95        45



The classifier labels $43$ of the $45$ test samples correctly, an excellent accuracy of about $96\%.$
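
The figures in the report follow directly from the confusion matrix. As a worked check, the sketch below recomputes accuracy, precision, recall and F1 from the matrix printed earlier, using illustrative variable names.

```python
import numpy as np

# Rows: true cultivar, columns: predicted cultivar
cm = np.array([[15, 0, 0],
               [0, 18, 0],
               [0, 2, 10]])

correct = np.diag(cm)
accuracy = correct.sum() / cm.sum()    # 43 / 45
precision = correct / cm.sum(axis=0)   # correct / number predicted as each class
recall = correct / cm.sum(axis=1)      # correct / number truly in each class
f1 = 2 * precision * recall / (precision + recall)

print(round(float(accuracy), 3))
print(np.round(precision, 2), np.round(recall, 2), np.round(f1, 2))
```

For cultivar 2, precision is $18/20 = 0.90$ because two samples of cultivar 3 were also assigned to it. For cultivar 3, recall is $10/12 \approx 0.83$ for the same reason.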

### Conclusions

1. An MLP model with $3$ hidden layers, having $13$ neurons each, provides a classifier with an accuracy of about $96\%$ on this test split.
2. For a more reliable estimate, we should perform cross-validation.
3. Other supervised learning methods could be used here:
  - k-NN
  - SVM
  - Bagging, etc.
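
Points 2 and 3 can be explored together. The following is a minimal sketch rather than a definitive benchmark. It assumes that the wine data bundled with scikit-learn (`load_wine`) stands in for `wine_data.csv`, that 5-fold cross-validation is adequate, and that default hyperparameters are acceptable for the alternative classifiers. None of these choices comes from the analysis above. Each classifier is wrapped in a pipeline so that the scaler is refitted on the training folds only.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled wine data replaces wine_data.csv
X_demo, y_demo = load_wine(return_X_y=True)

# Candidate models; all but the MLP use library defaults (an assumption)
candidates = {
    "MLP": MLPClassifier(hidden_layer_sizes=(13, 13, 13), max_iter=500,
                         random_state=0),
    "k-NN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Bagging": BaggingClassifier(random_state=0),
}

cv_scores = {}
for name, clf in candidates.items():
    # Scaling lives inside the pipeline, so no test-fold information leaks in
    model = make_pipeline(StandardScaler(), clf)
    cv_scores[name] = cross_val_score(model, X_demo, y_demo, cv=5)
    print(f"{name}: mean accuracy {cv_scores[name].mean():.3f} "
          f"(std {cv_scores[name].std():.3f})")
```

The spread of the five fold scores gives an idea of how much a single-split estimate such as the $96\%$ above can vary.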