For this homework, we reuse a great dataset to practice your skills in PyTorch. We are going to compare classification and regression approaches. As you remember from the previous homework, our wine dataset contains 10 rating levels. Hence, we can also solve this problem as a classification task instead of a regression task. You can find the relevant files under the `wine` subdirectory in the session's folder, but the code below also downloads them for you. This dataset has been used in the following publication:

> P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. 'Modeling wine preferences by data mining from physicochemical properties.' *Decision Support Systems* 47(4):547-553.

The data comes as two CSV files: one for red wines and one for white wines. You can load them with `pandas`:

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')
# Use an absolute path so that re-running this cell does not fail.
%cd /content/gdrive/MyDrive/ML2022/session-2


In [None]:
import utils
import numpy as np
import torch
from torch import nn


# to get reproducible results:
torch.manual_seed(1234)
np.random.seed(1234)



## Preprocessing (recap)
The longer version of this part can be found in the previous homework. Load the dataset for the white wines (it's also included in the session's folder):

In [None]:
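# -nc (no-clobber) skips the download if the file already exists,
# instead of saving a duplicate such as winequality-white.csv.1
!wget -nc https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv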




In [None]:
import pandas as pd
white = pd.read_csv('winequality-white.csv', sep=';')
white.sample(10)

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1431 | 6.1 | 0.22 | 0.49 | 1.5 | 0.051 | 18.0 | 87.0 | 0.9928 | 3.3 | 0.46 | 9.6 | 5 |
| 445 | 7.1 | 0.32 | 0.32 | 11.0 | 0.038 | 16.0 | 66.0 | 0.9937 | 3.24 | 0.4 | 11.5 | 3 |
| 2816 | 7.2 | 0.17 | 0.41 | 1.6 | 0.052 | 24.0 | 126.0 | 0.99228 | 3.19 | 0.49 | 10.8 | 5 |
| 4049 | 6.8 | 0.16 | 0.36 | 1.3 | 0.034 | 32.0 | 98.0 | 0.99058 | 3.02 | 0.58 | 11.3 | 6 |
| 4779 | 6.0 | 0.59 | 0.0 | 0.8 | 0.037 | 30.0 | 95.0 | 0.99032 | 3.1 | 0.4 | 10.9 | 4 |
| 142 | 7.9 | 0.21 | 0.4 | 1.2 | 0.039 | 38.0 | 107.0 | 0.992 | 3.21 | 0.54 | 10.8 | 6 |
| 2703 | 6.5 | 0.23 | 0.36 | 16.3 | 0.038 | 43.0 | 133.0 | 0.99924 | 3.26 | 0.41 | 8.8 | 5 |
| 3252 | 7.1 | 0.26 | 0.37 | 5.5 | 0.025 | 31.0 | 105.0 | 0.99082 | 3.06 | 0.33 | 12.6 | 8 |
| 4282 | 5.7 | 0.26 | 0.24 | 17.8 | 0.059 | 23.0 | 124.0 | 0.99773 | 3.3 | 0.5 | 10.1 | 5 |
| 46 | 6.2 | 0.45 | 0.26 | 4.4 | 0.063 | 63.0 | 206.0 | 0.994 | 3.27 | 0.52 | 9.8 | 4 |


This dataset records, for a large number of (Portuguese) wines, a number of objective "physicochemical" properties, such as various kinds of acidity or the level of chlorides. The very last column ("quality") contains a subjective rating for each wine as an integer score, ranging from 1 ("very bad") to 10 ("excellent"). This score is the median of at least 3 evaluations made by wine experts.

We'll extract the "quality" column as our target $y$ and the remaining features as $X$:

In [None]:
y_label = white['quality'].values
white = white.drop(['quality'], axis=1)
X = white.values

Note that not all ratings are actually present in the data: apparently there were no really bad (or perfect) wines! We ignore this below for teaching purposes, but normally you would remove the empty classes.

In [None]:
np.unique(y_label)

array([3, 4, 5, 6, 7, 8, 9])
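If you do want to drop the empty classes, one option (a sketch, not required for this homework) is to map the observed ratings onto contiguous class indices with `np.unique(..., return_inverse=True)`, so that a classifier only needs as many outputs as there are observed ratings. The `ratings` array below is a made-up stand-in for `y_label`.

```python
import numpy as np

# Made-up stand-in for y_label: integer ratings with gaps (no 1, 2 or 10).
ratings = np.array([6, 5, 6, 7, 3, 9, 5, 8, 4, 6])

# `classes` holds the sorted distinct ratings; `y_idx` holds, for every
# sample, the position of its rating inside `classes` (0 .. len(classes) - 1).
classes, y_idx = np.unique(ratings, return_inverse=True)

print(classes)  # [3 4 5 6 7 8 9]
print(y_idx)    # [3 2 3 4 0 6 2 5 1 3]

# Predicted class indices map back to ratings by indexing `classes`.
assert np.array_equal(classes[y_idx], ratings)
```

A model trained on `y_idx` would then have 7 outputs instead of 10.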

**Task 1.** Divide the available data into a train set (60 %), a dev set (20 %) and a test set (20 %) by calling Scikit-learn's `train_test_split` function twice. It's important to **stratify** these splits by rating, so that the distribution of ratings is similar across the train, dev and test sets.

The `stratify` argument is crucial here. Verify the shapes:
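One way to get a stratified 60/20/20 split from two `train_test_split` calls is sketched below: first hold out 40 %, then cut that remainder in half, stratifying on the remaining labels the second time. The helper name `split_60_20_20` and the random data are hypothetical; the demo data only mimics the white-wine set's size (4898 rows, 11 features).

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_60_20_20(X, y, seed=1234):
  """Hypothetical helper: stratified 60/20/20 train/dev/test split."""
  # First call: keep 60 % for training, set aside 40 % for dev + test.
  x_train, x_rest, y_train, y_rest = train_test_split(
      X, y, test_size=0.4, stratify=y, random_state=seed)
  # Second call: split the remaining 40 % in half, stratifying again.
  x_dev, x_test, y_dev, y_test = train_test_split(
      x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
  return x_train, x_dev, x_test, y_train, y_dev, y_test


# Made-up data with the same shape as the white-wine features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4898, 11))
y_demo = rng.integers(3, 10, size=4898)  # ratings 3..9

x_tr, x_dv, x_te, y_tr, y_dv, y_te = split_60_20_20(X_demo, y_demo)
print(x_tr.shape, x_dv.shape, x_te.shape)  # (2938, 11) (980, 11) (980, 11)
```

Fixing `random_state` also keeps the split itself reproducible between runs.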

**Task 2.** One preprocessing step remains: normalization. If you inspect the feature values in the dataframe above, you'll notice that the features cover very different ranges (density stays close to 1, while total sulfur dioxide is in the hundreds). To account for that, it's best to normalize the data. Again, use `sklearn` to do it, and fit the scaler on the training set only before applying it to the dev and test sets.

In [None]:
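from sklearn.preprocessing import StandardScaler

# TODO (Task 2): fit a StandardScaler on the training features only,
# then use it to transform all three feature splits.
x_train = ...
x_dev = ...
x_test = ...

# TODO: scaled regression targets. StandardScaler expects 2-D input,
# so reshape the ratings to a column first, e.g. y_train.reshape(-1, 1).
y_train_sc = ...
y_dev_sc = ...

print(x_train.shape)
print(x_dev.shape)

(2938, 11)
(980, 11)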
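The fit-on-train-only pattern can be sketched on made-up arrays as follows. Scaling the targets is our reading of the `y_train_sc`/`y_dev_sc` names in the skeleton; if you do scale them, predictions have to be mapped back with `inverse_transform` before they are compared with the original ratings.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up features/targets standing in for the real splits.
rng = np.random.default_rng(0)
x_tr = rng.normal(loc=50.0, scale=10.0, size=(600, 11))
x_dv = rng.normal(loc=50.0, scale=10.0, size=(200, 11))
y_tr = rng.integers(3, 10, size=600).astype(float)

# Fit on the training split only, then reuse the same statistics for
# dev/test, so no information from those splits leaks into preprocessing.
x_scaler = StandardScaler().fit(x_tr)
x_tr_sc = x_scaler.transform(x_tr)
x_dv_sc = x_scaler.transform(x_dv)

# StandardScaler expects 2-D input, so the targets become a column.
y_scaler = StandardScaler().fit(y_tr.reshape(-1, 1))
y_tr_sc = y_scaler.transform(y_tr.reshape(-1, 1))

# Map (predicted) scaled targets back to the original rating scale.
y_back = y_scaler.inverse_transform(y_tr_sc).ravel()

# Per-column means on the training split are now (numerically) zero.
print(x_tr_sc.mean(axis=0).round(6))
```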


# Regression

In [None]:
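def np2set_reg(x, y, shuffle):
  # Wrap numpy arrays as float32 tensors and batch them (64 samples per batch).
  x = torch.tensor(x, dtype=torch.float)
  y = torch.tensor(y, dtype=torch.float)

  dataset = torch.utils.data.TensorDataset(x, y)
  iterator = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=shuffle)
  return iterator


# Only the training batches are shuffled. Train and dev use the scaled targets;
# the test set keeps the original ratings, so test predictions have to be
# inverse-transformed before computing MAE/MSE/accuracy.
r_train_iter = np2set_reg(x_train, y_train_sc, shuffle=True)
r_dev_iter = np2set_reg(x_dev, y_dev_sc, shuffle=False)
r_test_iter = np2set_reg(x_test, y_test, shuffle=False)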

**Task 3.** Starting from the code in the previous homework's notebook, train a linear regression model that predicts a wine's quality rating from all feature columns. Make sure that your results are **reproducible** by seeding correctly. Use the SGD optimizer with a learning rate of 0.01 and train for up to 200 epochs with early stopping (patience of 5): training should stop once the validation loss has not improved for 5 consecutive epochs. Evaluate the results on the test set with MAE, MSE and accuracy.

In [None]:
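from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import mean_absolute_error as mae
from sklearn.metrics import accuracy_score


def accuracy_from_floats(y_true, y_pred):
  # Round continuous (regression) predictions to the nearest integer rating,
  # flattening an (n, 1) model output to (n,), so they can be compared with
  # the true integer ratings.
  y_pred = np.around(np.asarray(y_pred).ravel()).astype(int)
  return accuracy_score(y_true, y_pred)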
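Early stopping with a patience of 5 can be written as a plain training loop that tracks the best dev loss, counts epochs without improvement and restores the best weights at the end. The helper `train_early_stopping` below is hypothetical: it is not taken from the course's `utils` module (whose API is not shown here), and the demo trains on made-up linear data.

```python
import copy

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_early_stopping(model, loss_fn, train_iter, dev_iter,
                         lr=0.01, max_epochs=200, patience=5):
  """Hypothetical helper: SGD with early stopping on the dev loss."""
  optimizer = torch.optim.SGD(model.parameters(), lr=lr)
  best_loss, best_state, bad_epochs = float('inf'), None, 0
  for epoch in range(max_epochs):
    model.train()
    for xb, yb in train_iter:
      optimizer.zero_grad()
      loss = loss_fn(model(xb), yb)
      loss.backward()
      optimizer.step()

    # Mean dev loss over all dev samples (batch means weighted by size).
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
      for xb, yb in dev_iter:
        total += loss_fn(model(xb), yb).item() * len(xb)
        n += len(xb)
    dev_loss = total / n

    if dev_loss < best_loss:
      best_loss, bad_epochs = dev_loss, 0
      best_state = copy.deepcopy(model.state_dict())
    else:
      bad_epochs += 1
      if bad_epochs >= patience:  # no improvement for `patience` epochs
        break
  model.load_state_dict(best_state)  # restore the best weights
  return best_loss, epoch + 1


torch.manual_seed(1234)
# Made-up regression data: 11 features, target = weighted sum + noise.
x = torch.randn(512, 11)
y = x @ torch.randn(11, 1) + 0.1 * torch.randn(512, 1)
tr = DataLoader(TensorDataset(x[:400], y[:400]), batch_size=64, shuffle=True)
dv = DataLoader(TensorDataset(x[400:], y[400:]), batch_size=64, shuffle=False)

model = nn.Linear(11, 1)  # linear regression: one output per sample
best, epochs_run = train_early_stopping(model, nn.MSELoss(), tr, dv)
print(f"best dev MSE {best:.4f} after {epochs_run} epochs")
```

Note that the targets have shape `(n, 1)` here, matching the model output; with `(n,)` targets, `nn.MSELoss` would broadcast the two shapes against each other and compute the wrong loss.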



**Task 4.** Add a hidden dense layer with 30 neurons and a ReLU activation between the input and the output layer. Is the result better?

**Task 5.** Add a second hidden dense layer with 30 neurons and a ReLU activation. Is the result better?
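The architectures for Tasks 3 to 5 differ only in the number of hidden layers, so a small builder around `nn.Sequential` can produce all of them. The function `make_model` is a hypothetical sketch; with `n_out=10` the same builder would also give the classification variants used later.

```python
import torch
from torch import nn


def make_model(n_in=11, n_out=1, n_hidden=0, width=30):
  """Hypothetical builder: `n_hidden` dense layers of `width` neurons with
  ReLU, followed by a linear output layer (n_hidden=0: plain linear model)."""
  layers, size = [], n_in
  for _ in range(n_hidden):
    layers += [nn.Linear(size, width), nn.ReLU()]
    size = width
  layers.append(nn.Linear(size, n_out))
  return nn.Sequential(*layers)


linear = make_model(n_hidden=0)      # Task 3: 11 -> 1
one_hidden = make_model(n_hidden=1)  # Task 4: 11 -> 30 -> 1
two_hidden = make_model(n_hidden=2)  # Task 5: 11 -> 30 -> 30 -> 1
print(two_hidden)
```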

# Classification

**Task 6.** Let's now approach this dataset with logistic regression. This time, you will predict the rating levels as **classes** (10 of them) instead of treating the output as a scalar. Make sure that your results are **reproducible** by seeding correctly. Use the SGD optimizer with a learning rate of 0.01 and train for up to 200 epochs with early stopping (patience of 5), restoring the best weights: training should stop once the validation loss has not improved for 5 epochs. Use accuracy to evaluate the intermediate results during training.

In [None]:
def np2iter_class(x, y, shuffle=True):
  # Features as float32; class labels as int64 ("long"), as required by
  # nn.CrossEntropyLoss. The ratings (3-9) are used directly as class indices.
  x = torch.tensor(x, dtype=torch.float)
  y = torch.tensor(y, dtype=torch.long)

  ds = torch.utils.data.TensorDataset(x, y)
  return torch.utils.data.DataLoader(ds, batch_size=64, shuffle=shuffle)

c_train_iter = np2iter_class(x_train, y_train, shuffle=True)
c_dev_iter = np2iter_class(x_dev, y_dev, shuffle=False)
c_test_iter = np2iter_class(x_test, y_test, shuffle=False)
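For the classification model, `nn.CrossEntropyLoss` takes raw logits (no softmax layer) and integer labels in `[0, num_classes)`. Because all ratings lie below 10, they can serve directly as class indices, which is why the model below has 10 outputs although only 7 ratings occur. The batch is made up, and `accuracy_on` is a hypothetical helper for computing accuracy over a `DataLoader`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(1234)
model = nn.Linear(11, 10)        # multinomial logistic regression: 10 logits
loss_fn = nn.CrossEntropyLoss()  # applies log-softmax internally

x = torch.randn(8, 11)                      # made-up batch of features
y = torch.tensor([3, 4, 5, 6, 7, 8, 9, 6])  # ratings used as class indices

logits = model(x)                # shape (8, 10)
loss = loss_fn(logits, y)        # y must be int64 and lie in [0, 10)
pred = logits.argmax(dim=1)      # predicted class index = predicted rating
acc = (pred == y).float().mean().item()


def accuracy_on(model, iterator):
  """Hypothetical helper: fraction of correctly classified samples."""
  model.eval()
  correct, n = 0, 0
  with torch.no_grad():
    for xb, yb in iterator:
      correct += (model(xb).argmax(dim=1) == yb).sum().item()
      n += len(yb)
  return correct / n


print(f"loss {loss.item():.3f}, accuracy {acc:.2f}")
print(accuracy_on(model, DataLoader(TensorDataset(x, y), batch_size=4)))
```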


**Task 7.** Add a hidden dense layer with 30 neurons and a ReLU activation. Is the result better?

In [None]:

# class_predictions = utils.test(classification_model, c_test_iter)
# class_predictions = np.argmax(class_predictions, axis=1)

# print("Accuracy:", accuracy_score(y_test, class_predictions),
#       ", MSE:", mse(y_test, class_predictions),
#       ", MAE:", mae(y_test, class_predictions))

**Task 8.** Add a second hidden dense layer with 30 neurons and a ReLU activation. Is the result better?

In [None]:

# class_predictions = utils.test(classification_model, c_test_iter)
# class_predictions = np.argmax(class_predictions, axis=1)

# print("Accuracy:", accuracy_score(y_test, class_predictions),
#       ", MSE:", mse(y_test, class_predictions),
#       ", MAE:", mae(y_test, class_predictions))

**Task 9.** Write a short report about your findings and answer the following questions:


1. Is the wine problem a classification task (or a regression task)?
2. What is the most suitable metric for our task (MSE, MAE, accuracy)?
3. Which model is the best performer overall?
4. Is 'deeper' always better?
5. Add any other findings you want to share.
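As a starting point for question 2, here is a toy comparison on made-up ratings: accuracy treats every wrong rating the same, while MAE and MSE grow with the size of the error (MSE more steeply, since errors are squared).

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

y_true = np.array([5, 6, 7, 6])
pred_a = np.array([5, 6, 7, 7])  # one prediction off by 1
pred_b = np.array([5, 6, 7, 3])  # one prediction off by 3

for name, pred in [("a", pred_a), ("b", pred_b)]:
  print(name,
        accuracy_score(y_true, pred),
        mean_absolute_error(y_true, pred),
        mean_squared_error(y_true, pred))
# a 0.75 0.25 0.25
# b 0.75 0.75 2.25
```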

