# Skin Cancer Classification

The following notebook walks through the development of a CNN that classifies photographs of moles according to whether they are cancerous (malignant) or not (benign). The dataset consists of ".jpg" photographs split into folders by set (training or test) and by class (benign or malignant). The dataset looks challenging: the photographs were taken from different angles and under different lighting, and some images are harder to read because the patient had hair over the mole.

### Imports

In [2]:
import numpy as np
import pandas as pd
from PIL import Image
import os
import glob
import matplotlib.pyplot as plt
from keras.utils import to_categorical
np.random.seed(5)

Using TensorFlow backend.


## ETL process

### Load data

In [3]:
benign_train_folder = 'data/train/benign'
malignant_train_folder = 'data/train/malignant'

benign_test_folder = 'data/test/benign'
malignant_test_folder = 'data/test/malignant'

Function to read all the images in a folder.

In [4]:
def read(folder_path):
    # Collect the paths of every .jpg image in the folder
    data_path = os.path.join(folder_path, '*.jpg')
    folder = glob.glob(data_path)
    matrix = []
    for f in folder:
        # Force 3 channels (RGB) so every image has the same depth
        img = np.asarray(Image.open(f).convert("RGB"))
        matrix.append(img)
    # Stack into an array of shape (n_images, height, width, 3)
    matrix = np.asarray(matrix)
    return matrix

# Create data
X_benign_train = read(benign_train_folder)
X_malignant_train = read(malignant_train_folder)
X_benign_test = read(benign_test_folder)
X_malignant_test = read(malignant_test_folder)
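To see what the loader returns, here is a minimal sketch, assuming Pillow and NumPy are installed. It uses a hypothetical variant named `read_sorted` that sorts the paths (the order returned by `glob.glob` is arbitrary) and uses `np.stack`, which fails loudly if the images do not all share one size. Two synthetic 224x224 JPEGs written to a temporary folder stand in for real photographs.

```python
import glob
import os
import tempfile

import numpy as np
from PIL import Image


def read_sorted(folder_path):
    """Hypothetical variant of read(): load every .jpg in sorted order."""
    paths = sorted(glob.glob(os.path.join(folder_path, '*.jpg')))
    images = []
    for path in paths:
        with Image.open(path) as img:
            # convert("RGB") guarantees 3 channels even for grayscale files
            images.append(np.asarray(img.convert('RGB')))
    # np.stack raises an error if the images have different shapes
    return np.stack(images)


with tempfile.TemporaryDirectory() as tmp:
    # Two synthetic 224x224 images standing in for real mole photographs
    for i in range(2):
        Image.new('RGB', (224, 224), color=(i * 100, 50, 50)).save(
            os.path.join(tmp, f'img_{i}.jpg'))
    batch = read_sorted(tmp)

print(batch.shape)  # (2, 224, 224, 3)
```

Each image becomes one entry along the first axis, so the resulting array has shape (number of images, height, width, 3) with `uint8` pixel values.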

We create the labels for each set: 0 for benign and 1 for malignant.

In [5]:
Y_benign_train = np.zeros(X_benign_train.shape[0])
Y_malignant_train = np.ones(X_malignant_train.shape[0])
Y_benign_test = np.zeros(X_benign_test.shape[0])
Y_malignant_test = np.ones(X_malignant_test.shape[0])

We concatenate the benign and malignant sets and shuffle them, since the network would train worse if the training data were sorted by label. The same permutation `s` indexes both the images and the labels, so every image keeps its label.

In [6]:
X_train = np.concatenate((X_benign_train, X_malignant_train), axis = 0)
Y_train = np.concatenate((Y_benign_train, Y_malignant_train), axis = 0)
s = np.arange(X_train.shape[0])
np.random.shuffle(s)
X_train = X_train[s]
Y_train = Y_train[s]

X_test = np.concatenate((X_benign_test, X_malignant_test), axis = 0)
Y_test = np.concatenate((Y_benign_test, Y_malignant_test), axis = 0)
s = np.arange(X_test.shape[0])
np.random.shuffle(s)
X_test = X_test[s]
Y_test = Y_test[s]
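The key point of the shuffle is that one permutation is applied to both arrays. A minimal sketch with toy data, assuming NumPy only: each 1-pixel "image" stores its own label as its pixel value, so alignment can be checked after shuffling. It uses `np.random.default_rng` (a swapped-in generator, not the notebook's `np.random.seed` / `np.random.shuffle`), but the indexing idea is the same.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy stand-ins: 3 "benign" and 2 "malignant" samples, each a 1x1 RGB image
# whose pixel value equals its label, so alignment can be verified.
X = np.concatenate([np.zeros((3, 1, 1, 3)), np.ones((2, 1, 1, 3))], axis=0)
Y = np.concatenate([np.zeros(3), np.ones(2)], axis=0)

# A single permutation indexes both arrays, so each image keeps its label
perm = rng.permutation(X.shape[0])
X_shuffled, Y_shuffled = X[perm], Y[perm]

print(Y_shuffled)
print(X_shuffled[:, 0, 0, 0])  # identical to the line above
```

Shuffling `X` and `Y` with two independent calls would instead break the image-label pairing.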

Normalization and transformation of the outputs (labels).

In [7]:
# Turn labels into one-hot encoding (needed here: categorical_crossentropy
# with a 2-unit softmax output expects one-hot targets)
Y_train = to_categorical(Y_train, num_classes=2)
Y_test = to_categorical(Y_test, num_classes=2)

# Normalization: scale pixel values from [0, 255] to [0, 1]
X_train = X_train / 255.0
X_test = X_test / 255.0
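For integer labels 0 and 1, `to_categorical(y, num_classes=2)` yields the same matrix as indexing the 2x2 identity matrix with the labels. A small NumPy-only sketch of both transformations in this cell, on made-up values:

```python
import numpy as np

labels = np.array([0, 1, 1, 0])

# Row i of the 2x2 identity matrix is the one-hot vector for class i
one_hot = np.eye(2)[labels.astype(int)]
print(one_hot)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]

# Dividing uint8 pixels by 255.0 maps [0, 255] to floats in [0.0, 1.0]
pixels = np.array([0, 128, 255], dtype=np.uint8)
normalized = pixels / 255.0
print(normalized)
```

Column 0 of each one-hot row corresponds to "benign" and column 1 to "malignant", matching the two softmax outputs of the models below.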

## Model  

In [8]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, MaxPooling2D, AveragePooling2D
from keras import layers

We implement a model with 3 convolutional layers, each followed by a max-pooling layer; the result is flattened and passed to a fully connected layer of 30 neurons, followed by a 2-unit softmax output layer.

In [None]:
model = Sequential() #Test Acc:  0.760606050491333
#add model layers
model.add(Conv2D(128, kernel_size=11, activation='relu', input_shape=(224,224,3)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2, padding='same', data_format=None))

model.add(Conv2D(128, kernel_size=11, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2, padding='same', data_format=None))

model.add(Conv2D(64, kernel_size=11, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2, padding='same', data_format=None))

# Flatten the feature maps and classify with a small dense head
model.add(Flatten())
model.add(Dense(30, activation='relu'))
model.add(Dense(2, activation='softmax'))
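It helps to check how large the flattened vector gets. A plain-Python sketch of the shape arithmetic, assuming stride-1 `Conv2D` layers with Keras's default `padding='valid'` (output = size - kernel + 1) and the `MaxPooling2D(padding='same', strides=2)` layers above (output = ceil(size / 2)):

```python
import math


def conv_valid(size, kernel):
    # Conv2D, stride 1, padding='valid': output = size - kernel + 1
    return size - kernel + 1


def pool_same(size, stride=2):
    # MaxPooling2D with padding='same': output = ceil(size / stride)
    return math.ceil(size / stride)


size = 224
for kernel in (11, 11, 11):
    size = pool_same(conv_valid(size, kernel))
    print(size)  # prints 107, then 49, then 20

flatten_units = size * size * 64          # last Conv2D has 64 filters
dense_params = flatten_units * 30 + 30    # weights + biases of Dense(30)
print(flatten_units, dense_params)        # 25600 768030
```

So the `Flatten` layer outputs 25,600 values and the `Dense(30)` layer alone has 768,030 parameters.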

We compile with optimizer=adam and loss=categorical_crossentropy, then train the model for 3 epochs with a batch size of 16.

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, Y_train, validation_data=(X_test, Y_test), batch_size=16, epochs=3)

Finally, we evaluate the model on the test data.

In [None]:
test_loss, test_acc = model.evaluate(X_test, Y_test, batch_size=16)
print('Test Acc: ', test_acc)
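With one-hot labels and categorical_crossentropy, the "accuracy" metric compares the index of the highest predicted probability with the index of the true class. A NumPy sketch of that computation on hypothetical softmax outputs (the numbers are invented for illustration):

```python
import numpy as np

# Hypothetical softmax outputs for 4 test images and their one-hot labels
probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]])
y_true = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

# Predicted class = column with the highest probability
pred = probs.argmax(axis=1)
true = y_true.argmax(axis=1)
accuracy = np.mean(pred == true)
print(accuracy)  # 0.75
```

The third image is malignant but scored 0.6 for benign, so 3 of the 4 predictions are correct.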

In [None]:
from keras.applications.resnet50 import ResNet50

# ResNet50 trained from scratch (weights=None) with a 2-class softmax head.
# Note: `pooling` only takes effect when include_top=False, so it is ignored here.
model = ResNet50(include_top=True,
                 weights=None,
                 input_tensor=None,
                 input_shape=(224, 224, 3),
                 pooling='avg',
                 classes=2)

model.compile(optimizer='adam',  # alternative tried: Adam(lr)
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, Y_train,
          epochs=5, batch_size=16,
          # validation_split=0.2, verbose=2,
          # callbacks=[learning_rate_reduction]
          )
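This cell uses binary_crossentropy, while the first model used categorical_crossentropy. With one-hot targets and a 2-unit softmax (rows summing to 1), binary cross-entropy averaged over the two output units equals categorical cross-entropy. A NumPy sketch on made-up predictions, ignoring the small epsilon clipping Keras applies internally:

```python
import numpy as np

y = np.array([[1.0, 0.0], [0.0, 1.0]])       # one-hot labels
p = np.array([[0.8, 0.2], [0.35, 0.65]])     # softmax outputs (rows sum to 1)

# Categorical cross-entropy per sample: -sum_k y_k * log(p_k)
cce = -np.sum(y * np.log(p), axis=1)

# Binary cross-entropy per output unit, then averaged over the 2 units
bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p), axis=1)

print(np.allclose(cce, bce))  # True
```

Both units contribute the same term, -log(p of the true class), because the second unit's probability is exactly 1 minus the first.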

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
 592/2637 [=====>........................] - ETA: 27:45 - loss: 0.3902 - accuracy: 0.8091

In [None]:
test_loss, test_acc = model.evaluate(X_test, Y_test, batch_size=16)
print('Test Acc: ', test_acc)

In [None]:
from keras.models import Sequential
from keras import layers

model = Sequential()
# Input shape matches the 224x224 RGB images loaded above
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.BatchNormalization())

model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.BatchNormalization())

model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.BatchNormalization())

model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.BatchNormalization())

model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(256, activation='relu'))
# Two output units: benign vs. malignant
model.add(layers.Dense(2, activation='softmax'))
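This architecture adds BatchNormalization after each pooling layer. A simplified, training-mode sketch of what that layer computes per channel, assuming channels-last (NHWC) feature maps; it is not Keras's implementation and ignores the moving mean and variance Keras tracks for inference. The `eps=1e-3` default mirrors Keras's default `epsilon`, and the function name `batch_norm_train` is hypothetical.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Simplified training-mode batch norm over the (batch, H, W) axes."""
    # Per-channel statistics computed across the batch and spatial positions
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learnable scale (gamma) and shift (beta), one value per channel
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
feature_maps = rng.normal(loc=5.0, scale=3.0, size=(16, 8, 8, 4))  # NHWC
out = batch_norm_train(feature_maps, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=(0, 1, 2)).round(3))
print(out.std(axis=(0, 1, 2)).round(3))
```

With gamma = 1 and beta = 0, each channel comes out with mean close to 0 and standard deviation close to 1, whatever the scale of the incoming feature maps.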