# Clasificación binaria

Otro tipo de problema que se suele resolver con redes neuronales es el de clasificación binaria. Ahora no tenemos un valor que predecir, sino que en función de las entradas tenemos que clasificar los datos en varias clases

Importamos la base de datos de tipos de vinos

In [1]:
from sklearn import datasets

wine = datasets.load_wine()

Podemos ver qué trae esta base de datos

In [2]:
wine.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names'])

La llave `DESCR` es una descripción de la base de datos

In [3]:
print(wine['DESCR'])

.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash  
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0

Además tiene las llaves `data` y `target` donde se encuentran los datos anteriormente descritos. La llave `feature_names` contiene los numbres de cada una de las características

Así que creamos un dataframe con los datos

In [4]:
import pandas as pd

wine_df = pd.DataFrame(wine['data'], columns=wine['feature_names'])
wine_df['target'] = wine['target']
wine_df.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline,target
0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0,0
1,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0,0
2,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0,0
3,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0,0
4,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0,0


Vemos las posibles clases que hay

In [5]:
print(wine.target_names)

['class_0' 'class_1' 'class_2']


Vemos cuantos elementos hay de cada clase

In [6]:
wine_df['target'].value_counts()

1    71
0    59
2    48
Name: target, dtype: int64

Por último vemos si hay algún dato faltante

In [7]:
wine_df.isnull().sum()

alcohol                         0
malic_acid                      0
ash                             0
alcalinity_of_ash               0
magnesium                       0
total_phenols                   0
flavanoids                      0
nonflavanoid_phenols            0
proanthocyanins                 0
color_intensity                 0
hue                             0
od280/od315_of_diluted_wines    0
proline                         0
target                          0
dtype: int64

## División de los datos en train y validacion

Para poder entrenar hemos visto que necesitamos dividir los datos en un conjunto de datos de entrenamiento y en un conjunto de datos de validación. Así que dividimos nuestros datos en estos dos conjuntos.

Primero vamos a ver cuantos datos tenemos

In [8]:
len(wine_df)

178

Como no tenemos muchos datos vamos a dividir el conjunto de datos en un 80% para entrenamiento entrenamiento y un 20% para validación

In [9]:
wine_train = wine_df.iloc[0:int(0.8*len(wine_df))]
wine_val = wine_df.iloc[int(0.8*len(wine_df)):]

len(wine_train), len(wine_val), len(wine_df), len(wine_train) + len(wine_val)

(142, 36, 178, 178)

Vemos cómo quedan los nuevos dataframes

In [10]:
wine_train.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline,target
0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0,0
1,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0,0
2,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0,0
3,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0,0
4,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0,0


In [11]:
wine_val.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline,target
142,13.52,3.17,2.72,23.5,97.0,1.55,0.52,0.5,0.55,4.35,0.89,2.06,520.0,2
143,13.62,4.95,2.35,20.0,92.0,2.0,0.8,0.47,1.02,4.4,0.91,2.05,550.0,2
144,12.25,3.88,2.2,18.5,112.0,1.38,0.78,0.29,1.14,8.21,0.65,2.0,855.0,2
145,13.16,3.57,2.15,21.0,102.0,1.5,0.55,0.43,1.3,4.0,0.6,1.68,830.0,2
146,13.88,5.04,2.23,20.0,80.0,0.98,0.34,0.4,0.68,4.9,0.58,1.33,415.0,2


Vemos que al dividir el dataframe, en el dataframe de validación los índices no empiezan en 0. Esto nos puede dar problemas más adelante, así que reseteamos los índices

In [12]:
wine_val = wine_val.reset_index()   # reset index
wine_val = wine_val.drop(columns=['index']) # drop old index

wine_val.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline,target
0,13.52,3.17,2.72,23.5,97.0,1.55,0.52,0.5,0.55,4.35,0.89,2.06,520.0,2
1,13.62,4.95,2.35,20.0,92.0,2.0,0.8,0.47,1.02,4.4,0.91,2.05,550.0,2
2,12.25,3.88,2.2,18.5,112.0,1.38,0.78,0.29,1.14,8.21,0.65,2.0,855.0,2
3,13.16,3.57,2.15,21.0,102.0,1.5,0.55,0.43,1.3,4.0,0.6,1.68,830.0,2
4,13.88,5.04,2.23,20.0,80.0,0.98,0.34,0.4,0.68,4.9,0.58,1.33,415.0,2


## Dataset y Dataloader

Creamos el dataset

In [13]:
import torch

class WineDataset(torch.utils.data.Dataset):
    def __init__(self, dataframe):
        cols = [col for col in dataframe.columns if col != 'target']
        self.parameters = dataframe[cols].values.astype('float32')
        self.targets = torch.tensor(wine_train['target'].values.astype('int'))
        self.targets = torch.nn.functional.one_hot(self.targets, 3)
        self.targets = self.targets.float()

    def __len__(self):
        return len(self.parameters)

    def __getitem__(self, idx):
        parameters = self.parameters[idx]
        target = self.targets[idx]
        return parameters, target

In [14]:
train_ds = WineDataset(wine_train)
valid_ds = WineDataset(wine_val)
len(train_ds), len(valid_ds)

(142, 36)

vemos que tienen el mismo tamaño que los dataframes de entrenamiento y validación

Vamos a ver una muestra

In [15]:
wine_train.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline,target
0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0,0
1,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0,0
2,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0,0
3,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0,0
4,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0,0


In [16]:
sample = train_ds[0]
print(f"len(sample): {len(sample)}")

parameters, target = sample
print(f"parameters: {parameters}\nparameters.dtype: {parameters.dtype}\nparameters.shape: {parameters.shape}\n\n")
print(f"target: {target}, target.dtype: {target.dtype}, target.shape: {target.shape}")

len(sample): 2
parameters: [1.423e+01 1.710e+00 2.430e+00 1.560e+01 1.270e+02 2.800e+00 3.060e+00
 2.800e-01 2.290e+00 5.640e+00 1.040e+00 3.920e+00 1.065e+03]
parameters.dtype: float32
parameters.shape: (13,)


target: tensor([1., 0., 0.]), target.dtype: torch.float32, target.shape: torch.Size([3])


Creamos ahora el dataloader

In [17]:
from torch.utils.data import DataLoader

BS_train = 32
BS_val = 1024

train_dl = DataLoader(train_ds, batch_size=BS_train, shuffle=True)
val_dl = DataLoader(valid_ds, batch_size=BS_val, shuffle=False)

Vemos un batch

In [18]:
batch = next(iter(train_dl))
parameters, target = batch[0], batch[1]
type(parameters), parameters.dtype, parameters.shape, type(target), target.shape

(torch.Tensor,
 torch.float32,
 torch.Size([32, 13]),
 torch.Tensor,
 torch.Size([32, 3]))

## Red Neuronal

Creamos una red neuronal para entrenarla

In [19]:
from torch import nn

class WineNeuralNetwork(nn.Module):
    def __init__(self, num_inputs, num_outputs, hidden_layers=[20, 8, 3]):
        super().__init__()
        self.network = torch.nn.Sequential(
            torch.nn.Linear(num_inputs, hidden_layers[0]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[0], hidden_layers[1]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[1], hidden_layers[2]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[2], num_outputs),
            torch.nn.Sigmoid()
        )

    def forward(self, x):
        probs = self.network(x)
        return probs

Vamos a ver qué tamaño necesitamos a la entrada y a la salida de la red

Un batch tiene unos parámetros con este tamaño

In [20]:
parameters.shape

torch.Size([32, 13])

Tenemos una matriz de tamaño 32x13. Como esto van a ser las entradas a la red neuronal, y esta matriz se va amultiplicar con la primera capa, al tener una multiplicación matricial, la primera capa debería ser de tamaño 13xX

Por otro lado, el mismo batch a la salida tiene un target con este tamaño

In [21]:
target.shape

torch.Size([32, 3])

32 es el tamaño del batch size, pero hay 3 clases, por lo que a la salida queremos que haya 3 neuronas

In [22]:
num_inputs = parameters.shape[1]
num_outputs = target.shape[1]
model = WineNeuralNetwork(num_inputs, num_outputs)

model

WineNeuralNetwork(
  (network): Sequential(
    (0): Linear(in_features=13, out_features=20, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=20, out_features=8, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=8, out_features=3, bias=True)
    (5): Sigmoid()
    (6): Linear(in_features=3, out_features=3, bias=True)
    (7): Sigmoid()
  )
)

Primero cogemos un batch del dataloader y se lo metemos a la red a ver si funciona y la hemos definido bien

In [23]:
probs = model(parameters)
probs.shape

torch.Size([32, 3])

Si se puede se manda la red a la GPU

In [24]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

model.to(device)

Using cuda device


WineNeuralNetwork(
  (network): Sequential(
    (0): Linear(in_features=13, out_features=20, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=20, out_features=8, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=8, out_features=3, bias=True)
    (5): Sigmoid()
    (6): Linear(in_features=3, out_features=3, bias=True)
    (7): Sigmoid()
  )
)

Ahora volvemos a probar a meterle un batch

In [25]:
parameters_gpu = parameters.to(device)
probs = model(parameters_gpu)
probs.shape

torch.Size([32, 3])

## Función de pérdida y optimizador

Definimos una función de pérdida y un optimizador

In [26]:
LR = 1e-3

loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LR)


## Ciclo de entrenamiento

Entrenamos la red

In [27]:
num_prints = 2

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # X and y to device
        X, y = X.to(device), y.to(device)

        # Compute prediction and loss
        probs = model(X)
        loss = loss_fn(probs, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % int(len(train_dl)/num_prints) == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def val_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            # X and y to device
            X, y = X.to(device), y.to(device)
            
            probs = model(X)
            test_loss += loss_fn(probs, y).item()
            correct += (probs.argmax(1) == y.argmax(1)).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

Entrenamos

In [28]:
epochs = 14
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dl, model, loss_fn, optimizer)
    val_loop(val_dl, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 0.633537  [    0/  142]
loss: 0.635070  [   64/  142]
loss: 0.691325  [   56/  142]
Test Error: 
 Accuracy: 0.0%, Avg loss: 0.773762 

Epoch 2
-------------------------------
loss: 0.643228  [    0/  142]
loss: 0.694668  [   64/  142]
loss: 0.627726  [   56/  142]
Test Error: 
 Accuracy: 0.0%, Avg loss: 0.773607 

Epoch 3
-------------------------------
loss: 0.650080  [    0/  142]
loss: 0.640202  [   64/  142]
loss: 0.691199  [   56/  142]
Test Error: 
 Accuracy: 0.0%, Avg loss: 0.773416 

Epoch 4
-------------------------------
loss: 0.674855  [    0/  142]
loss: 0.633068  [   64/  142]
loss: 0.636927  [   56/  142]
Test Error: 
 Accuracy: 0.0%, Avg loss: 0.773253 

Epoch 5
-------------------------------
loss: 0.666601  [    0/  142]
loss: 0.646861  [   64/  142]
loss: 0.646120  [   56/  142]
Test Error: 
 Accuracy: 0.0%, Avg loss: 0.773083 

Epoch 6
-------------------------------
loss: 0.706424  [    0/  142]
loss: 0.618986  [   64/  

# Función de pérdida

Aunque más adelante veremos más en detalle las funciones de pérdida, aquí hemos usado `BCELoss`. Esto es porque en la red neuronal hemos puesto una `Sigmoid` en la última capa

```Python
    class WineNeuralNetwork(nn.Module):
        def __init__(self, num_inputs, num_outputs, hidden_layers=[20, 8, 3]):
            super().__init__()
            self.network = torch.nn.Sequential(
                torch.nn.Linear(num_inputs, hidden_layers[0]),
                torch.nn.Sigmoid(),
                torch.nn.Linear(hidden_layers[0], hidden_layers[1]),
                torch.nn.Sigmoid(),
                torch.nn.Linear(hidden_layers[1], hidden_layers[2]),
                torch.nn.Sigmoid(),
                torch.nn.Linear(hidden_layers[2], num_outputs),
                torch.nn.Sigmoid()
            )

        def forward(self, x):
            probs = self.network(x)
            return probs
```

Sin embargo lo que se suele hacer es no poner esa `Sigmoid` en la última capa y usar como función de pérdida `BCEWithLogitsLoss` que ya incluye en ella la capa `Sigmoid`

```Python
    class WineNeuralNetwork(nn.Module):
        def __init__(self, num_inputs, num_outputs, hidden_layers=[20, 8, 3]):
            super().__init__()
            self.network = torch.nn.Sequential(
                torch.nn.Linear(num_inputs, hidden_layers[0]),
                torch.nn.Sigmoid(),
                torch.nn.Linear(hidden_layers[0], hidden_layers[1]),
                torch.nn.Sigmoid(),
                torch.nn.Linear(hidden_layers[1], hidden_layers[2]),
                torch.nn.Sigmoid(),
                torch.nn.Linear(hidden_layers[2], num_outputs),
                # torch.nn.Sigmoid()
            )

        def forward(self, x):
            probs = self.network(x)
            return probs
```

```Python
    loss_fn = nn.BCEWithLogitsLoss()
```

Podemos ver en la [documentación](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) que esto es así. Esto se hace porque a nivel computacional es más estable como explican

In [29]:
class WineNeuralNetwork2(nn.Module):
    def __init__(self, num_inputs, num_outputs, hidden_layers=[20, 8, 3]):
        super().__init__()
        self.network = torch.nn.Sequential(
            torch.nn.Linear(num_inputs, hidden_layers[0]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[0], hidden_layers[1]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[1], hidden_layers[2]),
            torch.nn.Sigmoid(),
            torch.nn.Linear(hidden_layers[2], num_outputs),
            #torch.nn.Sigmoid()
        )

    def forward(self, x):
        probs = self.network(x)
        return probs

num_inputs = parameters.shape[1]
num_outputs = target.shape[1]
model2 = WineNeuralNetwork2(num_inputs, num_outputs)

model2

WineNeuralNetwork2(
  (network): Sequential(
    (0): Linear(in_features=13, out_features=20, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=20, out_features=8, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=8, out_features=3, bias=True)
    (5): Sigmoid()
    (6): Linear(in_features=3, out_features=3, bias=True)
  )
)

In [30]:
model2.to(device)

WineNeuralNetwork2(
  (network): Sequential(
    (0): Linear(in_features=13, out_features=20, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=20, out_features=8, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=8, out_features=3, bias=True)
    (5): Sigmoid()
    (6): Linear(in_features=3, out_features=3, bias=True)
  )
)

In [31]:
LR = 1e-3

loss_fn2 = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LR)

In [32]:
epochs = 14
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dl, model2, loss_fn2, optimizer)
    val_loop(val_dl, model2, loss_fn2)
print("Done!")

Epoch 1
-------------------------------
loss: 0.596870  [    0/  142]
loss: 0.613399  [   64/  142]
loss: 0.603413  [   56/  142]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.471199 

Epoch 2
-------------------------------
loss: 0.673263  [    0/  142]
loss: 0.598584  [   64/  142]
loss: 0.664005  [   56/  142]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.471199 

Epoch 3
-------------------------------
loss: 0.628519  [    0/  142]
loss: 0.649880  [   64/  142]
loss: 0.641197  [   56/  142]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.471199 

Epoch 4
-------------------------------
loss: 0.654718  [    0/  142]
loss: 0.583769  [   64/  142]
loss: 0.649034  [   56/  142]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.471199 

Epoch 5
-------------------------------
loss: 0.641619  [    0/  142]
loss: 0.625088  [   64/  142]
loss: 0.607333  [   56/  142]
Test Error: 
 Accuracy: 100.0%, Avg loss: 0.471199 

Epoch 6
-------------------------------
loss: 0.626805  [    0/  142]
loss: 0.621660 