### Wine quality dataset

This dataset covers the red variants of the Portuguese wine "Vinho Verde". It records the amounts of various chemical substances present in each wine, which makes it possible to study their effect on the wine's quality. The dataset is available on Kaggle at:

https://www.kaggle.com/datasets/yasserh/wine-quality-dataset

In [None]:
%pip install pandas matplotlib seaborn tensorflow scikit-learn --no-cache-dir

In [None]:
# library import

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf


plt.rcParams["figure.dpi"] = 150

In [None]:
# load the dataset

df = pd.read_csv('data/WineQT.csv')

In [None]:
df.head(10)

In [None]:
# some information about the dataset
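
One possible way to fill in this cell is sketched below. It uses a small made-up DataFrame (`df_demo`, with illustrative column names and values, not the real data) so that it runs without the CSV file; in the notebook the same calls would be applied to `df`.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from data/WineQT.csv
df_demo = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2],
    "pH": [3.51, 3.20, 3.26, 3.16],
    "quality": [5, 5, 6, 7],
})

df_demo.info()                  # column names, dtypes, non-null counts
summary = df_demo.describe()    # count, mean, std, min, quartiles, max
missing = df_demo.isna().sum()  # number of missing values per column

print(summary)
print(missing)
```

`info()` gives a quick structural overview, while `describe()` and `isna().sum()` help spot odd ranges or missing values before any modelling.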



In [None]:
# distributions of each feature
# -> histograms

for col in df.columns:
    f, (ax1) = plt.subplots(1, 1, figsize=(6, 3) )
    v_dist_1 = df[col].values
    sns.histplot(v_dist_1, ax=ax1, color='orange', kde=True)

    media = df[col].mean()
    mediana = df[col].median()
    moda = df[col].mode().values[0]

    ax1.axvline(media, color='r', linestyle='--', label="Mean")
    ax1.axvline(mediana, color='g', linestyle='-', label="Median")
    ax1.axvline(moda, color='b', linestyle='-', label="Mode")
    ax1.legend()
    plt.grid()
    plt.title(col)

In [None]:
# analysis of how the available examples are distributed
# across the quality scores

plt.figure(figsize=(6, 6))
df.quality.value_counts().plot(kind='pie')
plt.legend()
plt.show()

In [None]:
# analysis of possible correlations
# using a correlation heatmap

plt.figure(figsize=(6, 6), dpi=150)
corr = df.corr()
sns.heatmap(
    corr, 
    xticklabels=corr.columns, 
    yticklabels=corr.columns, 
    cmap='viridis',
    annot=True,
    fmt=".1f"
)
plt.show()

## Let's create the dataset

Hints: 
- access only the raw values
- drop the columns you don't need as inputs (e.g. the target)
- reserve a share of the samples for the **test** dataset
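
The hints above could be followed roughly as in this sketch. The DataFrame `df_demo`, its columns, the 20% test share, and the random seed are all assumptions made for illustration; if the real file contains an identifier column such as `Id`, it would be dropped in the same way as the target.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real DataFrame (10 wines, 2 features + target)
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "alcohol": rng.uniform(8.0, 14.0, size=10),
    "pH": rng.uniform(2.9, 3.8, size=10),
    "quality": rng.integers(3, 9, size=10),
})

# Features: every column except the target, as a raw NumPy array
X = df_demo.drop(columns=["quality"]).values
# Target: the quality score
y = df_demo["quality"].values

# Hold out 20% of the rows for testing (assumed share)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

Fixing `random_state` keeps the split reproducible between runs, which makes later results easier to compare.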

## Do we need to scale the data?

MinMax scaling is a way to adjust your numbers so that they fit into a specific range, usually between 0 and 1. Imagine you have a bunch of different-sized sticks, and you want to compare them more easily. By using MinMax scaling, you shrink or stretch each stick so that the smallest one becomes exactly 0 units long, and the longest one becomes 1 unit long. All the other sticks get a size in between, depending on how long they were to begin with. This method makes it simpler to compare all the sticks because they now have a common scale to measure against.


<img src="assets/minmax.png" width=500>
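
In formula form, each value $x$ of a feature becomes $x' = (x - x_{min}) / (x_{max} - x_{min})$. The sketch below applies this to a few made-up "stick lengths" with scikit-learn's `MinMaxScaler` and checks the result by hand. The arrays are illustrative assumptions; in the notebook the scaler would be fitted on the training features.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up stick lengths: three for training, one unseen value for testing
X_train = np.array([[2.0], [4.0], [10.0]])
X_test = np.array([[6.0]])

scaler = MinMaxScaler()                         # default target range is [0, 1]
X_train_scaled = scaler.fit_transform(X_train)  # learn min and max on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same min and max

# Same result by hand: (x - min) / (max - min)
col_min, col_max = X_train.min(axis=0), X_train.max(axis=0)
manual = (X_train - col_min) / (col_max - col_min)

print(X_train_scaled.ravel())
print(X_test_scaled.ravel())
```

Fitting the scaler on the training split only, then reusing it on the test split, avoids leaking information about the test data into training.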

## Our first neural network!

A neural network for a classification problem is like a smart helper that learns to sort things into different buckets. Imagine you have a big pile of fruits and you want to separate them into baskets labeled apples, bananas, and oranges. A neural network looks at each fruit, learns what each fruit type looks like by examining features such as color and shape, and then decides which basket to put it in. The more fruit it sees, the better it gets at sorting correctly. So, a neural network helps us automatically sort or classify things into different groups based on what it has learned from examples. Note that the training plots further down track mean squared error and mean absolute error, which suggests that the quality score is treated here as a number to predict (a regression) rather than a bucket; the network learns from examples in the same way.


<img src="assets/neural_network.png" width=600 />

In [None]:
# create a simple model
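
A minimal sketch of what such a model might look like with `tf.keras` follows. The layer sizes, the number of input features (`n_features = 11`), and the choice of optimizer are assumptions; the loss and metric are picked to match the names used by the plotting cells in this notebook.

```python
import tensorflow as tf

n_features = 11  # assumed number of input columns; should match X_train.shape[1]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),  # single linear output: the predicted quality
])

model.compile(
    optimizer="adam",
    loss="mean_squared_error",
    metrics=[tf.keras.metrics.MeanAbsoluteError()],
)
model.summary()
```

A single linear output unit fits a numeric quality prediction; a classification variant would instead end with one unit per class and a softmax activation.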

In [None]:
# train the model
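
Training could be sketched as below. To stay runnable on its own, the example rebuilds a small model and uses random stand-in arrays; the epoch count and batch size are assumed values. The returned history is stored in `log`, the name the plotting cells read from.

```python
import numpy as np
import tensorflow as tf

# Random stand-in data; in the notebook these would be the scaled train/test arrays
rng = np.random.default_rng(0)
X_train = rng.random((64, 11))
y_train = rng.integers(3, 9, size=64).astype("float32")
X_test = rng.random((16, 11))
y_test = rng.integers(3, 9, size=16).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(11,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(
    optimizer="adam",
    loss="mean_squared_error",
    metrics=[tf.keras.metrics.MeanAbsoluteError()],
)

# `log.history` holds one value per epoch for each tracked quantity
log = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=5,       # assumed value, kept small for the demo
    batch_size=16,  # assumed value
    verbose=0,
)
print(sorted(log.history.keys()))
```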

In [None]:
# training visualization

plt.figure(figsize=(12, 4))
plt.title('Mean squared error')
plt.plot(log.history['loss'], label='train')
plt.plot(log.history['val_loss'], label='test')
plt.xlabel('epochs')
plt.ylabel('error')
plt.legend()
plt.show()

In [None]:
plt.figure(figsize=(12, 4))
plt.title('Mean absolute error')
plt.plot(log.history['mean_absolute_error'], label='train')
plt.plot(log.history['val_mean_absolute_error'], label='test')
plt.xlabel('epochs')
plt.ylabel('error')
plt.legend()
plt.show()

In [None]:
# test the model
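
One way to test the model is to evaluate it on the held-out data and look at a few predictions. The sketch below uses an untrained stand-in model and random data (both assumptions) only to show the calls; in the notebook the trained `model` and the real test arrays would be used.

```python
import numpy as np
import tensorflow as tf

# Random stand-in test data
rng = np.random.default_rng(1)
X_test = rng.random((16, 11))
y_test = rng.integers(3, 9, size=16).astype("float32")

# Untrained stand-in model, compiled with the same loss and metric as the plots
model = tf.keras.Sequential([
    tf.keras.Input(shape=(11,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(
    optimizer="adam",
    loss="mean_squared_error",
    metrics=[tf.keras.metrics.MeanAbsoluteError()],
)

# evaluate returns the loss followed by the compiled metrics
test_mse, test_mae = model.evaluate(X_test, y_test, verbose=0)
predictions = model.predict(X_test[:3], verbose=0)

print("test MSE:", test_mse)
print("test MAE:", test_mae)
print(predictions.ravel(), y_test[:3])
```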

In [None]:
# save the model
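
Saving and reloading could look like the sketch below, assuming a recent TensorFlow release that supports the `.keras` file format. The file name and temporary directory are hypothetical; the stand-in model replaces the trained one from the notebook.

```python
import os
import tempfile
import tensorflow as tf

# Stand-in model; in the notebook this would be the trained `model`
model = tf.keras.Sequential([
    tf.keras.Input(shape=(11,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")

# Hypothetical output path inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "wine_quality.keras")
model.save(path)

# Load it back to check that the file is usable
restored = tf.keras.models.load_model(path)
print(restored.count_params())
```

The scaler fitted on the training data is not part of the saved model, so it would need to be stored separately if predictions are to be made later on new wines.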