# **Module 2 Activity: Wine Classification**

In this notebook, we revisit our binary classification problem, but this time we classify a real-world dataset.

The dataset chosen for this activity is the Wine Quality dataset (https://archive.ics.uci.edu/ml/datasets/wine+quality). It contains information on more than 6,000 bottles of red and white wine. Your task is to develop a single-neuron classifier that can distinguish white wine from red wine with reasonable accuracy. Below we provide code to help load the files and prepare the dataset (this is also a good opportunity to learn more Python by example). We have also included the final function calls that we would like you to run to train and evaluate your classifier. Feel free to reuse code that you have already written or seen in previous notebooks.
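
As a reminder of what a single-neuron classifier computes, the model can be written as follows. This is a sketch of one common formulation, not necessarily the exact one used in earlier notebooks. The symbols $\mathbf{w}$ (weights), $b$ (bias), and $\sigma$ (sigmoid activation) are assumed names, while the 0.5 threshold and the labels (1 for red, 0 for white) match the code later in this notebook.

```latex
\hat{y} = \sigma(\mathbf{w}^\top \mathbf{x} + b), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\text{label} =
\begin{cases}
1 \ (\text{red})   & \text{if } \hat{y} > 0.5 \\
0 \ (\text{white}) & \text{otherwise}
\end{cases}
```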

In [1]:
# Download the wine .csv files from the data archive.
!rm -f winequality-red.csv winequality-white.csv
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv

--2023-12-12 19:26:08--  https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: ‘winequality-red.csv’

winequality-red.csv     [  <=>               ]  82.23K   212KB/s    in 0.4s    

2023-12-12 19:26:09 (212 KB/s) - ‘winequality-red.csv’ saved [84199]

--2023-12-12 19:26:09--  https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: ‘winequality-white.csv’

winequality-white.c     [   <=>              ] 258.23K   339KB/s    in 0.8s    

2023-12-12

In [2]:
# These are the packages required for this activity
import pandas as pd
import numpy as np

# Use Pandas to read the csv file into a dataframe.
# Note that the delimiter in this csv is a semicolon ";" rather than a comma ","
df_red = pd.read_csv('winequality-red.csv', delimiter=";")

# Since this is a classification task, we assign every red wine a label of 1
df_red["color"] = 1

# The .head() method is very useful for previewing our data.
df_red.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 | 1 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 | 1 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 | 1 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 | 1 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 | 1 |


In [3]:
df_white = pd.read_csv('winequality-white.csv', delimiter=";")
df_white["color"] = 0  # Assign white wine the label 0
df_white.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 | 0 |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 | 0 |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 | 0 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 | 0 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 | 0 |


In [4]:
# Now we combine our two dataframes
df = pd.concat([df_red, df_white])

# Shuffle the rows so that red and white wines are mixed together
df = df.sample(frac=1).reset_index(drop=True)
df.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3 | 0.3 | 0.36 | 10.0 | 0.042 | 33.0 | 169.0 | 0.9982 | 3.23 | 0.51 | 9.3 | 6 | 0 |
| 1 | 6.8 | 0.41 | 0.31 | 8.8 | 0.084 | 26.0 | 45.0 | 0.99824 | 3.38 | 0.64 | 10.1 | 6 | 1 |
| 2 | 6.2 | 0.47 | 0.21 | 1.0 | 0.044 | 13.0 | 98.0 | 0.99345 | 3.14 | 0.46 | 9.2 | 5 | 0 |
| 3 | 6.2 | 0.64 | 0.09 | 2.5 | 0.081 | 15.0 | 26.0 | 0.99538 | 3.57 | 0.63 | 12.0 | 5 | 1 |
| 4 | 6.9 | 0.25 | 0.24 | 1.8 | 0.053 | 6.0 | 121.0 | 0.993 | 3.23 | 0.7 | 11.4 | 5 | 0 |
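
Because `df.sample(frac=1)` draws a new random permutation on every run, the rows shown above will differ between runs. If reproducible output is wanted, `DataFrame.sample` accepts a `random_state` argument. The following is a small sketch on a made-up four-row dataframe; `toy` and its values are illustrative only, and the seed 42 is arbitrary.

```python
import pandas as pd

# Made-up rows for illustration only (not taken from the wine files).
toy = pd.DataFrame({"alcohol": [9.4, 9.8, 10.1, 11.4], "color": [1, 1, 0, 0]})

# Fixing random_state makes the shuffle repeatable across runs.
shuffled_a = toy.sample(frac=1, random_state=42).reset_index(drop=True)
shuffled_b = toy.sample(frac=1, random_state=42).reset_index(drop=True)

# Both shuffles contain the same rows in the same order.
print(shuffled_a.equals(shuffled_b))
```

The same argument could be passed in the cell above, for example `df.sample(frac=1, random_state=42)`, if a fixed row order is useful while debugging.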


In [5]:
# We choose three wine attributes to make our prediction
input_columns = ["citric acid", "residual sugar", "total sulfur dioxide"]
output_columns = ["color"]

# Extract the relevant features into our numpy arrays X and Y
X = df[input_columns].to_numpy()
Y = df[output_columns].to_numpy()
print("Shape of X:", X.shape)
print("Shape of Y:", Y.shape)
in_features = X.shape[1]

Shape of X: (6497, 3)
Shape of Y: (6497, 1)


In [6]:
X

array([[  0.36,  10.  , 169.  ],
       [  0.31,   8.8 ,  45.  ],
       [  0.21,   1.  ,  98.  ],
       ...,
       [  0.82,   1.2 , 120.  ],
       [  0.38,   8.1 , 176.  ],
       [  0.26,   9.2 , 199.  ]])
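
In the rows shown above, the three columns sit on very different scales (citric acid is a fraction near zero, while total sulfur dioxide reaches the hundreds), which can make gradient descent with a small learning rate slow to converge. One optional step, not part of the original activity, is to standardize each column to zero mean and unit variance. Below is a minimal sketch, where `standardize` and `X_demo` are hypothetical names and the demo rows are copied from the first three rows of the array above.

```python
import numpy as np

def standardize(features):
    """Scale each column to zero mean and unit variance (hypothetical helper)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Guard against division by zero for constant columns.
    std = np.where(std == 0, 1.0, std)
    return (features - mean) / std, mean, std

# First three rows of X as printed above.
X_demo = np.array([[0.36, 10.0, 169.0],
                   [0.31, 8.8, 45.0],
                   [0.21, 1.0, 98.0]])

X_scaled, mean, std = standardize(X_demo)
print(X_scaled.mean(axis=0))  # close to 0 for every column
print(X_scaled.std(axis=0))   # close to 1 for every column
```

If this step is used, the returned `mean` and `std` should be reused to scale any data the model is later evaluated on, so that training and evaluation see the same transformation.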

In [None]:
# Insert your code here.

# classification_model = ...

# Train the model...
learning_rate = 0.001
epochs = 200
#...

# We will use this function to evaluate the performance of our trained classifier
# Hint: the model you define above must have a .forward function to be compatible
# Hint: this evaluation function is identical to the ones in the previous notebooks
def evaluate_classification_accuracy(model, input_data, labels):
    # Count the number of correctly classified samples for the given model
    correct = 0
    num_samples = len(input_data)
    for i in range(num_samples):
        x = input_data[i,...]
        y = labels[i]
        y_predicted = model.forward(x)
        label_predicted = 1 if y_predicted > 0.5 else 0
        if label_predicted == y:
            correct += 1
    accuracy = correct / num_samples
    print("Our model predicted", correct, "out of", num_samples,
          "correctly for", accuracy*100, "% accuracy")
    return accuracy
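
To illustrate the interface that `evaluate_classification_accuracy` expects, here is one possible shape for a single-neuron model with a `.forward` method, trained with stochastic gradient descent. This is a sketch under assumptions, not the reference solution. The class name `SingleNeuronClassifier`, its `update` method, the `train` helper, the binary cross-entropy loss, and the synthetic data are all hypothetical choices made for this example. The learning rate and epoch count are picked for the toy data and differ from the values in the cell above.

```python
import numpy as np

class SingleNeuronClassifier:
    """Hypothetical single-neuron model: sigmoid(w . x + b)."""

    def __init__(self, in_features, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=in_features)
        self.b = 0.0

    def forward(self, x):
        # Weighted sum of the inputs followed by a sigmoid activation.
        z = np.dot(self.w, x) + self.b
        return 1.0 / (1.0 + np.exp(-z))

    def update(self, x, y, learning_rate):
        # Assumes binary cross-entropy loss, whose gradient with respect
        # to the pre-activation z is (prediction - label).
        error = self.forward(x) - y
        self.w -= learning_rate * error * x
        self.b -= learning_rate * error

def train(model, inputs, labels, learning_rate, epochs):
    # Plain stochastic gradient descent: one update per sample per epoch.
    for _ in range(epochs):
        for x, y in zip(inputs, labels):
            model.update(x, float(y[0]), learning_rate)

# Synthetic, well-separated two-class data (not the wine dataset).
rng = np.random.default_rng(1)
X_toy = np.vstack([rng.normal(-2.0, 1.0, size=(50, 3)),
                   rng.normal(2.0, 1.0, size=(50, 3))])
Y_toy = np.vstack([np.zeros((50, 1)), np.ones((50, 1))])

toy_model = SingleNeuronClassifier(in_features=3)
train(toy_model, X_toy, Y_toy, learning_rate=0.01, epochs=20)

# Same 0.5 threshold as evaluate_classification_accuracy.
toy_accuracy = np.mean([(toy_model.forward(x) > 0.5) == bool(y[0])
                        for x, y in zip(X_toy, Y_toy)])
print("Toy accuracy:", toy_accuracy)
```

A model with this interface could then be passed to the function above as `evaluate_classification_accuracy(classification_model, X, Y)` once it has been trained on the wine features.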