# Skin Cancer Classification

In this project, we build a classification model from skin images labeled as cancer or non-cancer, obtained from Kaggle.com. We train the model with Keras, save it, and then integrate it into a website built with the Streamlit library. Given a new skin photo, the model predicts whether it shows cancer or not.

<img src='https://news.christianacare.org/wp-content/uploads/2024/05/GettyImages-1404462398.jpg' >
<a href='https://www.kaggle.com/datasets/kylegraupe/skin-cancer-binary-classification-dataset'>Click here to reach the dataset</a>

### Import Dataset

In [1]:
#!pip install opencv-python

In [2]:
import cv2
import pandas as pd
import os

In [3]:
labels=['Cancer','Non_Cancer']
img_path='Skin_Data/'

In [4]:
img_list=[]
label_list=[]
for label in labels:
    for img_file in os.listdir(img_path+label):
        img_list.append(img_path+label+'/'+img_file)
        label_list.append(label)     
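
`os.listdir` returns every entry in a folder, including hidden or non-image files, which would later make `cv2.imread` return `None`. A minimal sketch of a filtered listing, assuming only common image extensions should be kept; the helper name `list_images` and the extension set are illustrative and not part of the original notebook.

```python
import os

# Assumed set of image extensions; adjust to the dataset at hand.
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}

def list_images(root, labels):
    """Return (paths, labels) for image files under root/<label>/."""
    paths, targets = [], []
    for label in labels:
        folder = os.path.join(root, label)
        for name in sorted(os.listdir(folder)):
            # Compare extensions case-insensitively (.jpg and .JPG both occur).
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                paths.append(os.path.join(folder, name))
                targets.append(label)
    return paths, targets
```

Calling `list_images('Skin_Data', ['Cancer', 'Non_Cancer'])` would produce the same two lists as the loop in the cell, minus any files that are not images.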

In [5]:
df=pd.DataFrame({'img':img_list,'label':label_list})

In [6]:
df.head()

Unnamed: 0,img,label
0,Skin_Data/Cancer/1007-1.jpg,Cancer
1,Skin_Data/Cancer/1010-01.JPG,Cancer
2,Skin_Data/Cancer/1012-2.JPG,Cancer
3,Skin_Data/Cancer/1031-1.jpg,Cancer
4,Skin_Data/Cancer/1051-3(94).jpg,Cancer


In [7]:
df.tail()

Unnamed: 0,img,label
283,Skin_Data/Non_Cancer/953-1.JPG,Non_Cancer
284,Skin_Data/Non_Cancer/954-3.JPG,Non_Cancer
285,Skin_Data/Non_Cancer/955.JPG,Non_Cancer
286,Skin_Data/Non_Cancer/984.JPG,Non_Cancer
287,Skin_Data/Non_Cancer/986-1.JPG,Non_Cancer


In [8]:
import matplotlib.pyplot as plt

In [9]:
d={'Cancer':1,'Non_Cancer':0}

In [10]:
df['encode_label']=df['label'].map(d)

In [11]:
df.tail()

Unnamed: 0,img,label,encode_label
283,Skin_Data/Non_Cancer/953-1.JPG,Non_Cancer,0
284,Skin_Data/Non_Cancer/954-3.JPG,Non_Cancer,0
285,Skin_Data/Non_Cancer/955.JPG,Non_Cancer,0
286,Skin_Data/Non_Cancer/984.JPG,Non_Cancer,0
287,Skin_Data/Non_Cancer/986-1.JPG,Non_Cancer,0


In [12]:
import numpy as np

### Deep Learning

We will define the x (image) and y (label) data.

In [13]:
x=[]
for img in df['img']:
    img=cv2.imread(img)
    img=cv2.resize(img,(170,170)) # resize to 170x170 pixels
    img=img/255.0 # normalize pixel values to [0, 1]
    x.append(img)

In [14]:
x=np.array(x)
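
`cv2.imread` does not raise on a missing or corrupt file; it returns `None`, and the following `cv2.resize` call then fails with an unhelpful error. A minimal sketch of the checks worth making before an image enters `x`, using NumPy alone on an already-loaded and resized array; `to_model_input` is a hypothetical helper, and the synthetic array only stands in for a real photo.

```python
import numpy as np

def to_model_input(img, expected_shape=(170, 170, 3)):
    """Validate a loaded image and scale its pixels to [0, 1]."""
    if img is None:
        # cv2.imread signals an unreadable file by returning None.
        raise ValueError('image could not be read')
    if img.shape != expected_shape:
        raise ValueError(f'expected {expected_shape}, got {img.shape}')
    # float32 uses half the memory of the float64 array produced by img/255.0.
    return img.astype(np.float32) / 255.0

# Synthetic stand-in for a resized uint8 image.
fake = np.full((170, 170, 3), 255, dtype=np.uint8)
scaled = to_model_input(fake)
print(scaled.dtype, scaled.min(), scaled.max())
```

Note that OpenCV returns channels in BGR order. That is harmless as long as images passed to the model at prediction time are read the same way.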

In [15]:
y=df['encode_label']

Let's import train_test_split and hold out 20% of the data as test data.

In [16]:
from sklearn.model_selection import train_test_split

In [17]:
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=42)
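
The split sizes can be checked by hand. The sketch below assumes the 288 images implied by the index range 0 to 287 in the DataFrame output, scikit-learn's rounding of the test share (rounded up), and the Keras default batch size of 32 when `fit` is given arrays.

```python
import math

n_samples = 288          # rows 0..287 in the DataFrame output
test_size = 0.2
batch_size = 32          # Keras default when fit() receives arrays

n_test = math.ceil(test_size * n_samples)    # the test share is rounded up
n_train = n_samples - n_test
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_test, steps_per_epoch)
```

With so few test images, each one moves the validation accuracy by almost two percentage points, so small differences between epochs should not be over-interpreted.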

In [18]:
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D,Dropout,BatchNormalization,Reshape

CNN - Convolutional Neural Networks

In [19]:
model = Sequential()
model.add(Input(shape=(170,170,3)))
model.add(Conv2D(32,(3,3),activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(64,(3,3),activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Flatten())
model.add(Dense(128))
model.add(Dense(2,activation='softmax'))
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
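
The layer shapes and parameter counts of this network can be worked out without Keras. The sketch below assumes the Keras defaults used in the cell ('valid' padding, stride 1, non-overlapping 2x2 pooling); the helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

size, channels = 170, 3
params = 0
for filters in (32, 64):
    params += (3 * 3 * channels + 1) * filters   # kernel weights + one bias per filter
    size = pool_out(conv_out(size))
    channels = filters

flat = size * size * channels                    # length of the Flatten output
params += (flat + 1) * 128                       # Dense(128)
params += (128 + 1) * 2                          # Dense(2)

print(size, flat, params)
```

Almost all of the parameters sit in the first Dense layer, because it is connected to every value of the flattened feature map.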

In [20]:
history = model.fit(x_train,y_train,epochs=15,validation_data=(x_test,y_test),verbose=1)

Epoch 1/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 5s 289ms/step - accuracy: 0.5806 - loss: 11.8310 - val_accuracy: 0.7414 - val_loss: 0.6951
Epoch 2/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 2s 215ms/step - accuracy: 0.7215 - loss: 0.5766 - val_accuracy: 0.7586 - val_loss: 0.6118
Epoch 3/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 2s 216ms/step - accuracy: 0.7379 - loss: 0.5827 - val_accuracy: 0.7069 - val_loss: 0.5702
Epoch 4/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 2s 215ms/step - accuracy: 0.7514 - loss: 0.5417 - val_accuracy: 0.7759 - val_loss: 0.4773
Epoch 5/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 2s 219ms/step - accuracy: 0.7482 - loss: 0.4794 - val_accuracy: 0.7931 - val_loss: 0.4672
Epoch 6/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 2s 223ms/step - accuracy: 0.8129 - loss: 0.4089 - val_accuracy: 0.7931 - val_loss: 0.4456
Epoch 7/15
8/8 ━━━━━━━━━━━
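
`model.fit` returns a `History` object whose `history` attribute is a dict of per-epoch lists. A small sketch of picking the epoch with the lowest validation loss from such a dict; the values are copied from the six complete epochs in the log, and `best_epoch` is an illustrative helper rather than a Keras function.

```python
def best_epoch(history, key='val_loss'):
    """Return (1-based epoch, value) with the lowest value for key."""
    values = history[key]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Stand-in for history.history, filled with the logged values of epochs 1-6.
logged = {
    'val_loss':     [0.6951, 0.6118, 0.5702, 0.4773, 0.4672, 0.4456],
    'val_accuracy': [0.7414, 0.7586, 0.7069, 0.7759, 0.7931, 0.7931],
}

epoch, loss = best_epoch(logged)
print(epoch, loss)
```

In the notebook itself, the same helper could be called as `best_epoch(history.history)` once training has finished.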

In [21]:
model.save('my_cnn_model.h5')



We trained and saved our model. Now we can load it in any application.
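
To use the saved model, a new photo has to go through the same preprocessing as the training images (resize to 170x170, divide by 255), and the two softmax outputs have to be mapped back through the encoding `{'Cancer': 1, 'Non_Cancer': 0}`. A minimal sketch of the decoding step with NumPy only; `decode_softmax` is a hypothetical helper, and the probability rows are made up to stand in for the result of `model.predict`.

```python
import numpy as np

# Inverse of the notebook's mapping d = {'Cancer': 1, 'Non_Cancer': 0}.
INDEX_TO_LABEL = {0: 'Non_Cancer', 1: 'Cancer'}

def decode_softmax(probs):
    """Map rows of class probabilities, shape (n, 2), to (label, confidence)."""
    probs = np.asarray(probs)
    idx = probs.argmax(axis=1)
    return [(INDEX_TO_LABEL[int(i)], float(p[i])) for i, p in zip(idx, probs)]

# Made-up probabilities standing in for model.predict(batch).
example = [[0.2, 0.8], [0.9, 0.1]]
print(decode_softmax(example))
```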

## Transfer Learning

We trained a new model from scratch using a CNN. However, we can also reach our goal by taking a pre-trained model and adapting it to our own data. Two of the most widely used pre-trained models in image processing are ResNet50 and VGG16. Here, we will use the VGG16 model.

In [22]:
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D,Dropout,BatchNormalization,Reshape
from tensorflow.keras.applications import VGG16,ResNet50
from tensorflow.keras.preprocessing.image import ImageDataGenerator # reads images from directories in a single call

In [23]:
data_dir='Skin_Data'
img_width,img_height=224,224

# validation_split reserves 20% of the images for the 'validation' subset
train_datagen=ImageDataGenerator(rescale=1/255, validation_split=.20)

# target_size is given as (height, width)
train_datagenerator=train_datagen.flow_from_directory(directory=data_dir,target_size=(img_height,img_width),
                                class_mode='binary', subset='training')

# the validation subset must come from the generator that defines validation_split
test_datagenerator=train_datagen.flow_from_directory(directory=data_dir,target_size=(img_height,img_width),
                                class_mode='binary', subset='validation')

# VGG16 convolutional base with ImageNet weights, without its classifier head
base_model=VGG16(weights='imagenet', input_shape=(img_height,img_width,3),include_top=False)

model=Sequential()

model.add(base_model)
# freeze the pre-trained layers so that only the new Dense layers are trained
for layer in base_model.layers:
    layer.trainable=False

model.add(Flatten())
model.add(Dense(1024,activation='relu'))
model.add(Dense(1,activation='sigmoid'))

model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])

model.fit(train_datagenerator,epochs=10,validation_data=test_datagenerator)

Found 232 images belonging to 2 classes.
Found 56 images belonging to 2 classes.



Epoch 1/10
8/8 ━━━━━━━━━━━━━━━━━━━━ 34s 4s/step - accuracy: 0.7078 - loss: 5.1778 - val_accuracy: 0.2857 - val_loss: 1.7129
Epoch 2/10
8/8 ━━━━━━━━━━━━━━━━━━━━ 30s 4s/step - accuracy: 0.5554 - loss: 1.3742 - val_accuracy: 0.7143 - val_loss: 1.1237
Epoch 3/10
8/8 ━━━━━━━━━━━━━━━━━━━━ 30s 4s/step - accuracy: 0.6960 - loss: 0.7738 - val_accuracy: 0.8393 - val_loss: 0.4752
Epoch 4/10
8/8 ━━━━━━━━━━━━━━━━━━━━ 34s 4s/step - accuracy: 0.7899 - loss: 0.4413 - val_accuracy: 0.8393 - val_loss: 0.4198
Epoch 5/10
8/8 ━━━━━━━━━━━━━━━━━━━━ 31s 4s/step - accuracy: 0.8361 - loss: 0.3825 - val_accuracy: 0.8214 - val_loss: 0.4013
Epoch 6/10
8/8 ━━━━━━━━━━━━━━━━━━━━ 31s 4s/step - accuracy: 0.8443 - loss: 0.3397 - val_accuracy: 0.8571 - val_loss: 0.3820
Epoch 7/10
8/8 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x26cc3d3dc70>

In [24]:
model.summary()
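
`model.summary()` lists each layer's output shape and parameter count. The numbers to expect for this architecture can be worked out by hand; the sketch below assumes the standard VGG16 convolutional base (14,714,688 parameters, five 2x2 poolings, 512 output channels) and is a cross-check, not captured notebook output.

```python
base_params = 14_714_688             # VGG16 with include_top=False (frozen here)
side = 224 // 2 ** 5                 # five 2x2 max-pool layers: 224 -> 7
features = side * side * 512         # Flatten output of the last conv block

dense_1024 = (features + 1) * 1024   # weights + biases of Dense(1024)
dense_1 = (1024 + 1) * 1             # weights + bias of Dense(1)

trainable = dense_1024 + dense_1     # only the new head is trained
total = base_params + trainable
print(features, trainable, total)
```

Even with the base frozen, the new head alone has more trainable parameters than the whole CNN trained from scratch, which is worth keeping in mind with a dataset of only a few hundred images.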

In [25]:
model.save('my_cnn_tf_model.h5')
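
`flow_from_directory` assigns class indices in alphanumeric order of the folder names, so here `Cancer` maps to 0 and `Non_Cancer` to 1, the opposite of the encoding used for the first model. The sigmoid output of this model is therefore the probability of `Non_Cancer`. A minimal sketch of decoding it; `decode_sigmoid` and the 0.5 threshold are illustrative assumptions, and `train_datagenerator.class_indices` can be printed to confirm the mapping.

```python
def decode_sigmoid(p_non_cancer, threshold=0.5):
    """Turn the sigmoid output (probability of class index 1) into a label."""
    # Class indices follow sorted folder names: {'Cancer': 0, 'Non_Cancer': 1}.
    class_indices = {name: i for i, name in enumerate(sorted(['Non_Cancer', 'Cancer']))}
    index_to_label = {i: name for name, i in class_indices.items()}
    return index_to_label[int(p_non_cancer >= threshold)]

print(decode_sigmoid(0.12), decode_sigmoid(0.93))
```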



### Conclusion

We turned the file paths of the skin cancer images obtained from Kaggle.com into a DataFrame with the help of the pandas and os libraries. Then we read all the images with the OpenCV library, resized them, and normalized their pixel values. Afterward, we trained a CNN from scratch with Keras. As a second step, we took the pre-trained VGG16 model, froze its layers, and trained a new classification head on our data to adapt it to our purpose. Both methods can be used for image classification, but since training a model from scratch is a time-consuming process, the common approach is to use a pre-trained model and apply transfer learning.