# Breast cancer classification using PyTorch Lightning

In this notebook, the code proposed for breast cancer classification is rewritten using PyTorch.

The dataset consists of breast tumor cases, of which 357 were benign and 212 malignant. It describes 10 characteristics of the cell nuclei (in the Kaggle file each one is reported as a mean, a standard error and a worst value, giving 30 input columns), which are used to estimate the probability that a tumor is cancerous.
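
As a quick sanity check on the class counts quoted above, the short sketch below computes the class balance and the accuracy of a trivial classifier that always predicts the majority class. The variable names are illustrative only; the counts are the ones stated in the text.

```python
# Class counts quoted in the text above.
n_benign = 357
n_malignant = 212
n_total = n_benign + n_malignant

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(n_benign, n_malignant) / n_total

print(f"samples: {n_total}")
print(f"malignant fraction: {n_malignant / n_total:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any accuracy reported for a trained model is best read against this baseline rather than against 50%.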

In [1]:
# Install the extra packages before importing them.
!pip install pytorch_lightning
!pip install livelossplot

import pathlib

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

import torch
import torch.nn.functional as F
from torch import nn
from torch.autograd import Variable
from torchvision.transforms import ToTensor

import pytorch_lightning as pl
from pytorch_lightning import LightningModule
from livelossplot import PlotLosses



The dataset was downloaded from [Kaggle](https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data).

In [2]:
from google.colab import drive
drive.mount('/content/gdrive',force_remount=True)

ruta='gdrive/My Drive/Maestría/Inteligencia Artificial/'

Mounted at /content/gdrive


In [3]:
data = pd.read_csv(ruta+"Cancer_data.csv")
del data['Unnamed: 32']


The explanatory variables and the response variable are extracted into separate arrays.

In [4]:
x = data.iloc[:,2:].values
y = data.iloc[:,1].values

In [5]:
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
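
`LabelEncoder` assigns integer codes following the sorted order of the class labels. Assuming the diagnosis column holds the strings `"B"` and `"M"` (as in the Kaggle file), benign would become 0 and malignant 1. The sketch below mirrors that coding scheme with plain NumPy on a hypothetical toy array; it is an illustration, not the encoder's actual implementation.

```python
import numpy as np

# Hypothetical toy labels; the real column is assumed to contain "B" and "M".
diagnosis = np.array(["M", "B", "B", "M", "B"])

# np.unique sorts the distinct labels and returns, for each element, its
# index into that sorted array: the same coding scheme LabelEncoder uses.
classes, codes = np.unique(diagnosis, return_inverse=True)

print(classes)  # sorted class labels
print(codes)    # integer code per sample
```

With this coding, the sigmoid output of the model can be read as the estimated probability of the malignant class.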

The training and test sets are generated with a fixed random seed to ensure reproducibility.

In [6]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.1, random_state = 12)
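
To make the size of each split concrete, the sketch below reproduces the arithmetic for `test_size = 0.1`. It assumes the 569 samples implied by the class counts in the introduction and assumes that a fractional test size is rounded up to a whole number of samples, which is how scikit-learn is understood to behave here.

```python
import math

n_samples = 357 + 212   # class counts given in the introduction
test_size = 0.1

# Assumed rounding rule: the test count is rounded up, the rest is training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(f"train: {n_train}, test: {n_test}")
print(f"one test sample is worth {100 / n_test:.2f} accuracy points")
```

With so few test samples, a single misclassification moves the reported accuracy by almost two percentage points.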


The data are standardized for the regression: the scaler is fitted on the training set only and then applied to both sets.

In [7]:
sc = StandardScaler()
x_train = torch.tensor(sc.fit_transform(x_train), dtype = torch.float32)
x_test = torch.tensor(sc.transform(x_test), dtype = torch.float32)
y_train = torch.tensor(np.array(y_train), dtype = torch.float32)
y_test = torch.tensor(np.array(y_test), dtype = torch.float32)
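
The fit/transform split used above can be sketched in plain NumPy. The arrays below are hypothetical stand-ins for the real features; the point is only that the mean and standard deviation come from the training rows and are then reused unchanged on the test rows.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 8 training rows and 3 test rows, 2 features.
train = rng.normal(loc=5.0, scale=2.0, size=(8, 2))
test = rng.normal(loc=5.0, scale=2.0, size=(3, 2))

# "fit": the statistics come from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": the same statistics are applied to both splits.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

Fitting on the training rows alone keeps information about the test set from leaking into preprocessing; as a consequence the scaled test columns will generally not have exactly zero mean and unit variance.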

PyTorch is used for the classification: the model stacks three dense (linear) layers, with a dropout layer after each of the first two and a sigmoid on the output.

In [8]:
class Logistic_Reg_model(LightningModule):
  def __init__(self, inputDim):
    super().__init__()
    # Three dense layers, with dropout (p = 0.1) after the first two.
    self.layer1 = torch.nn.Linear(inputDim, 16)
    self.layer2 = torch.nn.Dropout(0.1)
    self.layer3 = torch.nn.Linear(16, 16)
    self.layer4 = torch.nn.Dropout(0.1)
    self.layer5 = torch.nn.Linear(16, 1)

  def forward(self, x):
    y_pred = self.layer1(x)
    y_pred = self.layer2(y_pred)
    y_pred = self.layer3(y_pred)
    y_pred = self.layer4(y_pred)
    # The sigmoid maps the output to a probability; squeeze(1) drops the
    # trailing dimension so the result has shape (batch,).
    y_pred = torch.sigmoid(self.layer5(y_pred)).squeeze(1)
    return y_pred
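
Since no nonlinear activation sits between the linear layers, the stack above (with dropout inactive) reduces to a single affine map followed by a sigmoid, which is consistent with the class being named a logistic regression. The NumPy sketch below checks this on hypothetical random weights; the 30 input columns are an assumption about the CSV, and none of the values come from the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 30  # assumed number of input columns

# Hypothetical weights with the same shapes as the three Linear layers.
W1, b1 = rng.normal(size=(16, n_features)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)
W3, b3 = rng.normal(size=(1, 16)), rng.normal(size=1)

x = rng.normal(size=(5, n_features))  # 5 hypothetical samples

# Layer-by-layer pass (dropout omitted, as in evaluation mode).
stacked = ((x @ W1.T + b1) @ W2.T + b2) @ W3.T + b3

# Equivalent single affine map.
W = W3 @ W2 @ W1
b = W3 @ (W2 @ b1 + b2) + b3
collapsed = x @ W.T + b

print(np.allclose(stacked, collapsed))
```

Adding an activation such as ReLU between the layers would break this equivalence and turn the model into a small nonlinear network.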

In [9]:
inputDim = x_train.size(1)

In [10]:
model = Logistic_Reg_model(inputDim)
optim = torch.optim.Adam(model.parameters(),lr=0.01)
loss_function = torch.nn.BCELoss()
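
`BCELoss` expects probabilities (hence the sigmoid in `forward`) and averages the binary cross-entropy over the batch. A minimal hand-written version, using a hypothetical helper name and made-up values, may make the quantity being minimized easier to see:

```python
import numpy as np

def binary_cross_entropy(p, y):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

# Hypothetical predicted probabilities and true labels.
probs = [0.9, 0.2, 0.6]
labels = [1.0, 0.0, 1.0]
loss = binary_cross_entropy(probs, labels)

print(f"{loss:.4f}")
```

Confident correct predictions (0.9 for a positive) contribute little to the loss, while hesitant ones (0.6 for a positive) contribute considerably more.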

In [11]:
y_predicted = model(x_train)

In [12]:
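nepochs = 100
for epoch in range(nepochs):
  # Forward pass over the full training set (no mini-batches).
  y_predicted = model(x_train)
  loss = loss_function(y_predicted, y_train)
  # Backward pass and parameter update; gradients are cleared afterwards
  # so they do not accumulate into the next epoch.
  loss.backward()
  optim.step()
  optim.zero_grad()
  # Report the loss every 10 epochs.
  if (epoch + 1) % 10 == 0:
    print('epoch:', epoch + 1, 'loss=', loss.item())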

epoch: 10 loss= 0.12754739820957184
epoch: 20 loss= 0.07645468413829803
epoch: 30 loss= 0.06147556006908417
epoch: 40 loss= 0.057048261165618896
epoch: 50 loss= 0.05802769213914871
epoch: 60 loss= 0.0510437972843647
epoch: 70 loss= 0.05331026017665863
epoch: 80 loss= 0.053392380475997925
epoch: 90 loss= 0.05210573971271515
epoch: 100 loss= 0.0509592704474926


Next, the model is evaluated on the test set.

In [13]:
with torch.no_grad():
  y_predicted = model(x_test)
  # Round the predicted probabilities to the class labels 0 or 1.
  y_predicted_class = y_predicted.round()
  accuracy = (y_predicted_class.eq(y_test).sum()) / float(y_test.shape[0])
  print("Our accuracy is {}%".format(accuracy.item() * 100))

Our accuracy is 98.24561476707458%
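
One caveat about the evaluation cell: `torch.no_grad()` turns off gradient tracking but does not, by itself, switch dropout off. That is controlled by the module's training mode, which `model.eval()` changes. The sketch below illustrates the difference on a standalone dropout layer with a made-up input.

```python
import torch

torch.manual_seed(0)
drop = torch.nn.Dropout(0.1)
x = torch.ones(1000)

drop.train()                 # default mode after construction
with torch.no_grad():
    train_out = drop(x)      # some entries zeroed, the rest scaled by 1/0.9

drop.eval()                  # dropout becomes the identity
with torch.no_grad():
    eval_out = drop(x)

print("zeros in train mode:", int((train_out == 0).sum()))
print("zeros in eval mode:", int((eval_out == 0).sum()))
```

In this notebook that would amount to calling `model.eval()` before the test cell. The accuracy recorded above was obtained without that call, so a rerun in evaluation mode could give a slightly different figure.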
