Assignment: https://www.cs.cmu.edu/~10315/10315_S24_Mini_Project.pdf

**Title:** Prediction Confidence in Cancer Diagnoses

In [None]:
training_flag = False

**Model Card:**

**1) Task Input and Output**

Input: The model takes as input images of skin lesions. Each image is represented as a 3D array of pixel values, indicating the color intensity of each pixel.

Output: The model predicts a diagnosis for the skin lesion in the input image. The prediction is one of seven diagnostic categories, which cover both malignant and benign lesion types.
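
As a minimal sketch of this input/output contract, assuming the 100×100 RGB resize and the class order used later in this notebook. The probability vector is made up for illustration and is not the output of a trained model:

```python
import numpy as np

# Class order follows lesion_names_short later in this notebook.
classes = ['nv', 'mel', 'bkl', 'bcc', 'akiec', 'vasc', 'df']

# Input: one image as a height x width x channel array of 8-bit intensities.
image = np.zeros((100, 100, 3), dtype=np.uint8)

# Output: one probability per class (illustrative values, not model output).
probs = np.array([0.55, 0.20, 0.10, 0.05, 0.05, 0.03, 0.02])

predicted = classes[int(np.argmax(probs))]   # class with the highest probability
confidence = float(np.max(probs))            # confidence assigned to that class
print(predicted, confidence)
```

The "confidence" compared throughout this project is the largest entry of this probability vector.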

**2) Training Data**

The model is trained on the HAM10000 dataset, which contains 10,015 dermatoscopic images of skin lesions labeled with ground-truth diagnoses. The dataset covers seven types of skin lesions: melanoma, melanocytic nevi, basal cell carcinoma, actinic keratoses, benign keratosis-like lesions, dermatofibroma, and vascular lesions.

**3) Intended Use**

The model is intended to assist dermatologists and healthcare professionals in diagnosing skin lesions. It is designed to provide a preliminary assessment based on dermatoscopic images, which can help guide further clinical evaluation and treatment decisions. The primary users of the model would be healthcare professionals, including dermatologists, general practitioners, and other medical professionals involved in skin cancer diagnosis and treatment.  The model can be used as a decision support tool to aid in the early detection of skin cancer and other skin lesions. It can help prioritize cases for further evaluation and potentially reduce the number of unnecessary biopsies.  The model's predictions are based on the information present in the input images and do not capture all relevant clinical information, so it is important for healthcare professionals to use the model's predictions in conjunction with their clinical judgment and other diagnostic tools.

**4) Risks**

The dataset is imbalanced, with certain classes of skin lesions underrepresented compared to others, which could lead to biased model predictions and lower performance on minority classes. Additionally, some data could be mislabeled, as there may be diagnostic errors within the dataset.
The use of medical data also raises ethical considerations, including patient privacy and consent. It is also difficult to interpret the decisions of a CNN model, so we cannot be sure what exactly is influencing a decision. There is a risk of bias in the dataset, which could result in unfair or discriminatory outcomes. From a visual inspection of the dataset, a supermajority of the photos show white or fair-skinned patients, so the model will likely perform much worse for people with darker skin.
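
To make the imbalance concrete, the sketch below computes "balanced" class weights and the share of the largest class. The per-class counts are the ones commonly reported for HAM10000 and are an assumption here; they should be checked against `metadata['dx'].value_counts()` before being relied on.

```python
import numpy as np

# Per-class image counts commonly reported for HAM10000 (assumption: verify
# against metadata['dx'].value_counts() before relying on them).
counts = {'nv': 6705, 'mel': 1113, 'bkl': 1099, 'bcc': 514,
          'akiec': 327, 'vasc': 142, 'df': 115}

n_samples = sum(counts.values())
n_classes = len(counts)

# "Balanced" weighting: n_samples / (n_classes * count), so rare classes
# contribute more to the loss than common ones.
weights = {k: n_samples / (n_classes * c) for k, c in counts.items()}

# Share of the largest class: the accuracy of always predicting that class.
majority_share = max(counts.values()) / n_samples

for name, w in weights.items():
    print(f"{name:>5}: weight {w:.2f}")
print(f"majority-class share: {majority_share:.1%}")
```

Under these counts, a model that always predicts the largest class already reaches roughly two thirds accuracy, so overall accuracy alone says little about performance on the minority classes.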

**Introduction: What is your project about? What problem are you trying to solve? Describe the dataset that you used and the inputs/outputs of the problem.**

The goal of the project is to compare how confident different models are when classifying images of skin lesions by diagnosis.  To do so, we use logistic regression and a convolutional neural network.  For both model types, the input is a collection of multi-source dermatoscopic images of pigmented lesions and the output is a vector of classification probabilities over the diagnostic categories: actinic keratoses and intraepithelial carcinoma / Bowen's disease, basal cell carcinoma, benign keratosis-like lesions, dermatofibroma, melanoma, melanocytic nevi, and vascular lesions.  The diagnostic category with the highest probability is the model's predicted diagnosis, and we compare the level of confidence that each model has in its predictions.  We also compare how different activation functions within the convolutional neural network affect that confidence.

Experimental Question: How do different activation functions (e.g. softmax, ReLU, sigmoid) in a convolutional neural network impact the confidence in predicted classifications?

Techniques: We plan to implement a convolutional neural network and logistic regression to compare the relative confidence that each type of model has in its classification predictions.  The first model we use is logistic regression.  The logistic regression model, as we’ve covered in this course, is a discriminative classification model that returns, for each input, a value between 0 and 1 per class that models the probability of the data point belonging to that class.  The final classification is the class with the highest predicted probability, meaning the data point most likely belongs to that class.  We selected logistic regression because our project is essentially a classification task: sorting skin lesions by diagnosis.  The model is easy to interpret due to its simplicity and is typically less computationally demanding than more complex models, which makes it an appropriate baseline to compare other models against.
For the convolutional neural network, our experimental question (which activation function to use) guides how we tune the model before comparing its results with the logistic regression baseline.  We selected this model because it is considerably more expressive than logistic regression: it is a deep neural network with multiple layer types (e.g. convolutional layers, pooling layers) that can be tuned extensively.



**Methods:**

*Logistic Regression:* We use logistic regression as a baseline model because of its simplicity. Each image is passed to the model, which outputs a probability for each diagnostic category. The category with the highest probability is the predicted diagnosis.
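
How a multinomial logistic regression turns pixel features into class probabilities can be sketched as follows. The weights are random placeholders rather than fitted values, and the names `W`, `b`, and `softmax` are introduced here only for illustration:

```python
import numpy as np

def softmax(z):
    """Convert a vector of class scores into probabilities that sum to 1."""
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
n_features, n_classes = 100 * 100, 7    # flattened grayscale image, 7 diagnoses

x = rng.random(n_features)                                  # placeholder image
W = rng.normal(scale=0.01, size=(n_classes, n_features))    # placeholder weights
b = np.zeros(n_classes)                                     # placeholder biases

probs = softmax(W @ x + b)          # one linear score per class, then softmax
prediction = int(np.argmax(probs))  # index of the most probable class
print(probs.round(3), prediction)
```

Because each class score is a linear function of the pixels, the decision boundaries are linear, which is what keeps the model simple and cheap to fit.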

*Convolutional Neural Network:* The CNN experiments compare three activation functions (ReLU, softmax, and sigmoid) to assess their impact on model confidence and accuracy. The network has five convolutional layers interleaved with pooling, followed by two fully connected layers and an output layer. The architecture is built to extract features from the images, which makes it well suited to tasks like lesion classification.
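
The spatial size of the feature maps can be traced by hand from the kernel sizes, strides, and padding used in the model definitions later in this notebook. The helper `conv_out` below is a hypothetical name used only for this arithmetic sketch:

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "same":
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # "valid": no padding

size = 100                                  # 100x100 input images
size = conv_out(size, 11, 4, "valid")       # conv 1 -> 23
size = conv_out(size, 3, 2, "valid")        # pool   -> 11
size = conv_out(size, 5, 1, "same")         # conv 2 -> 11
size = conv_out(size, 3, 2, "valid")        # pool   -> 5
# convs 3-5 use stride 1 with "same" padding, so the size stays 5
size = conv_out(size, 3, 2, "valid")        # pool   -> 2

flat_features = size * size * 256           # 256 filters in the last conv layer
print(size, flat_features)
```

By the time the feature maps reach the first dense layer they are only 2×2, so the 4096-unit dense layers hold most of the network's parameters.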

*Comparative Analysis:* We will compare the confidence and reliability of predictions between the logistic regression model and the CNN, focusing on how different activation functions influence the CNN’s performance. This analysis will help identify the most effective model for accurate cancer image classification.
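
One way to quantify "reliability" is expected calibration error (ECE), which compares average confidence with accuracy inside confidence bins. This metric is not computed elsewhere in the notebook; the sketch below is an assumption about how such a comparison could be made, and `expected_calibration_error` is a hypothetical helper:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between confidence and accuracy over bins."""
    confidence = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by the fraction of samples in the bin
    return ece

# Toy example: 4 samples, 2 classes (made-up probabilities and labels)
toy_probs = np.array([[0.95, 0.05], [0.85, 0.15], [0.65, 0.35], [0.25, 0.75]])
toy_labels = np.array([0, 1, 0, 1])
toy_ece = expected_calibration_error(toy_probs, toy_labels)
print(toy_ece)
```

A model can be highly confident and still poorly calibrated, so a lower ECE, not a higher average confidence, is what indicates trustworthy probabilities.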

# Get Data

In [None]:
!git clone https://github.com/kimjanise/315-mini-project.git

In [None]:
import shutil
shutil.move('/content/315-mini-project/HAM10000_metadata.csv', '/content')
shutil.move('/content/315-mini-project/kaggle.json', '/content')

In [None]:
!pip install kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

In [None]:
!kaggle datasets download -d kmader/skin-cancer-mnist-ham10000

In [None]:
!unzip skin-cancer-mnist-ham10000.zip -d /content/skin-cancer-data

# Models

In [None]:
import numpy as np
from numpy.linalg import inv
import pandas as pd
import matplotlib.pyplot as plt
import math
import os
import re
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

import torchvision
from torchvision import transforms

In [None]:
# absolute path, matching where the file was moved in the "Get Data" cells
metadata = pd.read_csv('/content/HAM10000_metadata.csv')

**Data Processing:**

In [None]:
# lesion names are given in the description of the dataset
lesion_type_dict = {
    'nv': 'Melanocytic nevi',
    'mel': 'Melanoma',
    'bkl': 'Benign keratosis-like lesions',
    'bcc': 'Basal cell carcinoma',
    'akiec': 'Actinic keratoses',
    'vasc': 'Vascular lesions',
    'df': 'Dermatofibroma'
}

lesion_ID_dict = {
    'nv': 0,
    'mel': 1,
    'bkl': 2,
    'bcc': 3,
    'akiec': 4,
    'vasc': 5,
    'df': 6
}

lesion_names = ['Melanocytic nevi','Melanoma','Benign keratosis-like lesions',
               'Basal cell carcinoma','Actinic keratoses','Vascular lesions',
               'Dermatofibroma']

lesion_names_short = ['nv','mel','bkl','bcc','akiec','vasc','df']

metadata['lesion_type']=metadata['dx'].map(lesion_type_dict)
metadata['lesion_ID'] = metadata['dx'].map(lesion_ID_dict)

metadata['lesion_type'].value_counts()

In [None]:
import cv2
from cv2 import imread, resize

In [None]:
X = []
y = []
image_dirs = ['/content/skin-cancer-data/HAM10000_images_part_1',
              '/content/skin-cancer-data/HAM10000_images_part_2']
# map each image_id to its integer label once, instead of filtering the table per image
id_to_label = dict(zip(metadata['image_id'], metadata['lesion_ID']))

# import images from both folders
for image_dir in image_dirs:
    for fname_image in os.listdir(image_dir):
        fname_ID = fname_image.replace('.jpg','')
        # features: read the image and resize it to 100x100 pixels
        img = imread(os.path.join(image_dir, fname_image))
        img = resize(img, (100,100))
        X.append(img)
        # targets: integer class label for this image
        y.append(id_to_label[fname_ID])


In [None]:
from tensorflow.keras.utils import to_categorical
X = np.array(X)
y = np.array(y)
# one-hot encode the integer labels (7 classes)
y_onehot = to_categorical(y, num_classes=7)

In [None]:
from sklearn.model_selection import train_test_split
# stratify on the integer labels so both splits keep the class proportions
X_train, X_test, y_train, y_test = train_test_split(X, y_onehot, test_size=0.2, random_state=50, stratify=y)


In [None]:
fig, ax = plt.subplots(1, 7, figsize=(30, 30))
for i in range(7):
    ax[i].set_axis_off()
    ax[i].imshow(X_train[i][..., ::-1])  # OpenCV loads images as BGR; reverse the channels to RGB for display
    ax[i].set_title(lesion_names[np.argmax(y_train[i])])

In [None]:
from sklearn.utils.class_weight import compute_class_weight
y_id = np.array(metadata['lesion_ID'])

# compute weights for the loss function, because the problem is unbalanced
class_weights = np.around(compute_class_weight(class_weight='balanced',classes=np.unique(y_id),y=y),2)
class_weights = dict(zip(np.unique(y_id),class_weights))

**Logistic Regression Model:**

In [None]:
# dimensionality reduction and flattening to fit the input of a logistic regression model and enable convergence
from sklearn.linear_model import LogisticRegression
# average the color channels (grayscale), then flatten each 100x100 image into a 10,000-feature row
X_train_logreg = X_train.mean(axis=-1).reshape(len(X_train), -1)
y_train_logreg = np.argmax(y_train, axis=1)  # one-hot rows back to integer labels
X_test_logreg = X_test.mean(axis=-1).reshape(len(X_test), -1)
y_test_logreg = np.argmax(y_test, axis=1)

In [None]:
logreg = LogisticRegression(max_iter=10000, tol=0.1, class_weight=class_weights).fit(X_train_logreg, y_train_logreg)

In [None]:
# score() returns the mean accuracy on the test set as a single float
scores = logreg.score(X_test_logreg, y_test_logreg)
print("Accuracy: %.2f%%" % (scores*100))

In [None]:
from sklearn import metrics
y_pred_logreg = logreg.predict(X_test_logreg)
cnf_matrix = metrics.confusion_matrix(y_test_logreg, y_pred_logreg)
cnf_matrix

In [None]:
y_hat_logreg = logreg.predict_proba(X_test_logreg)

In [None]:
# Loading Pre-trained models

# modelSoftmax = Model()
# load_model_from_file = f'./content/pretrained_models/pretrained_softmax_model.h5'
# print(f'Loding model from {load_model_from_file}')
# modelSoftmax.load_state_dict(torch.load(load_model_from_file, map_location=torch.device(trainer.device)))

from keras.models import load_model
modelSoftmax = load_model('./content/pretrained_models/pretrained_softmax_model.h5')

# model_dir = '/content/pretrained_models'
# os.makedirs(model_dir, exist_ok=True)
# model_path = os.path.join(model_dir, 'pretrained_softmax_model.h5')  # h5 is a common format for Keras models
# modelSoftmax.save(model_path)

In [None]:
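# Accuracy Comparison
model_names = ["Logistic Regression", "CNN with ReLU", "CNN with Sigmoid", "CNN with Softmax"]
# fill both lists with one accuracy per model, in the same order as model_names, before plotting
model_train_accuracy = []
model_test_accuracy = []

# one group of bars per model (not per lesion class)
X_axis = np.arange(len(model_names))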
plt.bar(X_axis - 0.2, model_train_accuracy, 0.4, label = 'Train Accuracy')
plt.bar(X_axis + 0.2, model_test_accuracy, 0.4, label = 'Test Accuracy')
plt.xticks(X_axis, model_names)
plt.xlabel("Model Type")
plt.ylabel("Percentage")
plt.title("Train and Test Accuracy Across Models")
plt.legend()
plt.show()

In [None]:
# Prediction Confidence Comparison
# Keras predict() already returns class probabilities; the argmax gives the predicted class
y_hat_cnn_relu = modelRelu.predict(X_test)
y_pred_cnn_relu = np.argmax(y_hat_cnn_relu, axis=1)
y_hat_cnn_sigmoid = modelSigmoid.predict(X_test)
y_pred_cnn_sigmoid = np.argmax(y_hat_cnn_sigmoid, axis=1)
y_hat_cnn_softmax = modelSoftmax.predict(X_test)
y_pred_cnn_softmax = np.argmax(y_hat_cnn_softmax, axis=1)

def avg_confidence(y_hat, y_pred):
    # mean top-class probability over the samples predicted as each class (NaN if none)
    top = y_hat.max(axis=1)
    return np.array([top[y_pred == i].mean() if np.any(y_pred == i) else np.nan
                     for i in range(len(lesion_names_short))])

logreg_avg_confidence = avg_confidence(y_hat_logreg, y_pred_logreg)
cnn_relu_avg_confidence = avg_confidence(y_hat_cnn_relu, y_pred_cnn_relu)
cnn_sigmoid_avg_confidence = avg_confidence(y_hat_cnn_sigmoid, y_pred_cnn_sigmoid)
cnn_softmax_avg_confidence = avg_confidence(y_hat_cnn_softmax, y_pred_cnn_softmax)

X_axis = np.arange(len(lesion_names_short))
plt.bar(X_axis - 0.3, logreg_avg_confidence, 0.2, label = 'Logistic Regression')
plt.bar(X_axis - 0.1, cnn_relu_avg_confidence, 0.2, label = 'CNN with ReLU')
plt.bar(X_axis + 0.1, cnn_sigmoid_avg_confidence, 0.2, label = 'CNN with Sigmoid')
plt.bar(X_axis + 0.3, cnn_softmax_avg_confidence, 0.2, label = 'CNN with Softmax')
plt.xticks(X_axis, lesion_names_short)
plt.xlabel("Class")
plt.ylabel("Confidence in Prediction")
plt.title("Average Confidence in Class Predictions")
plt.legend()
plt.show()

In [None]:
# Training Loss Analysis
plt.figure(figsize=(10, 5))

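# each *_train_loss list holds one model's per-epoch training loss (e.g. history.history['loss']);
# every curve uses its own length so runs with different epoch counts still plot
plt.plot(range(1, len(relu_train_loss) + 1), relu_train_loss, color='b', label='CNN with ReLU')
plt.plot(range(1, len(sigmoid_train_loss) + 1), sigmoid_train_loss, color='r', label='CNN with Sigmoid')
plt.plot(range(1, len(softmax_train_loss) + 1), softmax_train_loss, color='g', label='CNN with Softmax')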

plt.title('Training Loss Across CNN Models')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

**CNN Models:**

In [None]:
import keras
from keras.models import Sequential, load_model, Model
from keras.callbacks import EarlyStopping,ModelCheckpoint
# from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Activation
# from keras.layers import Conv2D,BatchNormalization,MaxPool2D,Flatten,Dense
from keras.layers import Conv2D, MaxPool2D, Dense, Input, Activation, Dropout, GlobalAveragePooling2D, \
    BatchNormalization, concatenate, AveragePooling2D, Flatten
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator

CNN Model with ReLU:

In [None]:
modelRelu = Sequential([
    # 1st convolutional layer
    Conv2D(filters=96, kernel_size=(11,11), strides=(4,4), activation='relu', input_shape=(100,100,3)),
    BatchNormalization(),
    MaxPool2D(pool_size=(3,3), strides=(2,2)),
    # 2nd convolutional layer
    Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), activation='relu', padding="same"),
    BatchNormalization(),
    MaxPool2D(pool_size=(3,3), strides=(2,2)),
    # 3rd convolutional layer
    Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same"),
    BatchNormalization(),
    # 4th convolutional layer
    Conv2D(filters=384, kernel_size=(1,1), strides=(1,1), activation='relu', padding="same"),
    BatchNormalization(),
    # 5th convolutional layer
    Conv2D(filters=256, kernel_size=(1,1), strides=(1,1), activation='relu', padding="same"),
    BatchNormalization(),
    MaxPool2D(pool_size=(3,3), strides=(2,2)),
    Flatten(),
    # 6th, Dense layer
    Dense(4096, activation='relu'),
    Dropout(0.5),
    # 7th Dense layer
    Dense(4096, activation='relu'),
    Dropout(0.5),
    # 8th output layer
    Dense(7, activation='softmax')
])

In [None]:
# training
early_stopping_monitor = EarlyStopping(patience=100,monitor='val_accuracy')
model_checkpoint_callback = ModelCheckpoint(filepath='model.h5',
                                            save_weights_only=False,
                                            monitor='val_accuracy',
                                            mode='auto',
                                            save_best_only=True,
                                            verbose=1)
batch_size = 32
epochs = 100
optimizer = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-3)
modelRelu.compile(optimizer = optimizer, loss = 'categorical_crossentropy', metrics=['accuracy'])

datagen = ImageDataGenerator(zoom_range = 0.2, horizontal_flip=True, shear_range=0.2)

datagen.fit(X_train)

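# the batch size is set on the generator, which also shuffles by default
history=modelRelu.fit(datagen.flow(X_train,y_train,batch_size=batch_size), epochs=epochs, callbacks=[early_stopping_monitor,model_checkpoint_callback], validation_data=(X_test, y_test), class_weight=class_weights)
relu_train_loss = history.history['loss']  # per-epoch training loss for the loss plot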

In [None]:
scores = modelRelu.evaluate(X_test, y_test, verbose=1)
print("Accuracy: %.2f%%" % (scores[1]*100))

CNN Model with Sigmoid:

In [None]:
modelSigmoid = Sequential([
    # 1st convolutional layer
    Conv2D(filters=96, kernel_size=(11,11), strides=(4,4), activation='sigmoid', input_shape=(100,100,3)),
    BatchNormalization(),
    MaxPool2D(pool_size=(3,3), strides=(2,2)),
    # 2nd convolutional layer
    Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), activation='sigmoid', padding="same"),
    BatchNormalization(),
    MaxPool2D(pool_size=(3,3), strides=(2,2)),
    # 3rd convolutional layer
    Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='sigmoid', padding="same"),
    BatchNormalization(),
    # 4th convolutional layer
    Conv2D(filters=384, kernel_size=(1,1), strides=(1,1), activation='sigmoid', padding="same"),
    BatchNormalization(),
    # 5th convolutional layer
    Conv2D(filters=256, kernel_size=(1,1), strides=(1,1), activation='sigmoid', padding="same"),
    BatchNormalization(),
    MaxPool2D(pool_size=(3,3), strides=(2,2)),
    Flatten(),
    # 6th, Dense layer
    Dense(4096, activation='sigmoid'),
    Dropout(0.5),
    # 7th Dense layer
    Dense(4096, activation='sigmoid'),
    Dropout(0.5),
    # 8th output layer
    Dense(7, activation='softmax')
])

In [None]:
# training
early_stopping_monitor = EarlyStopping(patience=100,monitor='val_accuracy')
model_checkpoint_callback = ModelCheckpoint(filepath='model.h5',
                                            save_weights_only=False,
                                            monitor='val_accuracy',
                                            mode='auto',
                                            save_best_only=True,
                                            verbose=1)
batch_size = 32
epochs = 100
optimizer = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-3)
modelSigmoid.compile(optimizer = optimizer, loss = 'categorical_crossentropy', metrics=['accuracy'])

datagen = ImageDataGenerator(zoom_range = 0.2, horizontal_flip=True, shear_range=0.2)

datagen.fit(X_train)

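# the batch size is set on the generator, which also shuffles by default
history=modelSigmoid.fit(datagen.flow(X_train,y_train,batch_size=batch_size), epochs=epochs, callbacks=[early_stopping_monitor,model_checkpoint_callback], validation_data=(X_test, y_test), class_weight=class_weights)
sigmoid_train_loss = history.history['loss']  # per-epoch training loss for the loss plot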

In [None]:
scores = modelSigmoid.evaluate(X_test, y_test, verbose=1)
print("Accuracy: %.2f%%" % (scores[1]*100))

CNN Model with Softmax:

In [None]:
# hidden layers use softmax (applied across the last axis), matching the experiment described above
modelSoftmax = Sequential([
    # 1st convolutional layer
    Conv2D(filters=96, kernel_size=(11,11), strides=(4,4), activation='softmax', input_shape=(100,100,3)),
    BatchNormalization(),
    MaxPool2D(pool_size=(3,3), strides=(2,2)),
    # 2nd convolutional layer
    Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), activation='softmax', padding="same"),
    BatchNormalization(),
    MaxPool2D(pool_size=(3,3), strides=(2,2)),
    # 3rd convolutional layer
    Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='softmax', padding="same"),
    BatchNormalization(),
    # 4th convolutional layer
    Conv2D(filters=384, kernel_size=(1,1), strides=(1,1), activation='softmax', padding="same"),
    BatchNormalization(),
    # 5th convolutional layer
    Conv2D(filters=256, kernel_size=(1,1), strides=(1,1), activation='softmax', padding="same"),
    BatchNormalization(),
    MaxPool2D(pool_size=(3,3), strides=(2,2)),
    Flatten(),
    # 6th, Dense layer
    Dense(4096, activation='softmax'),
    Dropout(0.5),
    # 7th Dense layer
    Dense(4096, activation='softmax'),
    Dropout(0.5),
    # 8th output layer
    Dense(7, activation='softmax')
])

In [None]:
# training
early_stopping_monitor = EarlyStopping(patience=100,monitor='val_accuracy')
model_checkpoint_callback = ModelCheckpoint(filepath='model.h5',
                                            save_weights_only=False,
                                            monitor='val_accuracy',
                                            mode='auto',
                                            save_best_only=True,
                                            verbose=1)
batch_size = 32
epochs = 100
optimizer = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-3)
modelSoftmax.compile(optimizer = optimizer, loss = 'categorical_crossentropy', metrics=['accuracy'])

datagen = ImageDataGenerator(zoom_range = 0.2, horizontal_flip=True, shear_range=0.2)

datagen.fit(X_train)

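# the batch size is set on the generator, which also shuffles by default
history=modelSoftmax.fit(datagen.flow(X_train,y_train,batch_size=batch_size), epochs=epochs, callbacks=[early_stopping_monitor,model_checkpoint_callback], validation_data=(X_test, y_test), class_weight=class_weights)
softmax_train_loss = history.history['loss']  # per-epoch training loss for the loss plot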

In [None]:
scores = modelSoftmax.evaluate(X_test, y_test, verbose=1)
print("Accuracy: %.2f%%" % (scores[1]*100))

**Results and Discussion:**

The CNN with ReLU activation outperformed the other models, achieving the highest accuracy of 71%. This matches what we expected for ReLU in CNNs.

The sigmoid and softmax activations in the CNN performed poorly, with accuracies of 52% and 48%, respectively. Softmax is typically used only in the output layer for multi-class classification; applying it in the convolutional layers is a misapplication, which resulted in the lowest performance.
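
One plausible reason the sigmoid and softmax variants lag behind ReLU is that both squash activations into a narrow range. The toy comparison below uses random values rather than the trained networks, and assumes softmax is applied across the channel axis of a 256-channel layer:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=256)        # pre-activations for 256 channels at one pixel

relu = np.maximum(z, 0.0)
sigmoid = 1.0 / (1.0 + np.exp(-z))
softmax = np.exp(z - z.max())   # subtract the max for numerical stability
softmax /= softmax.sum()

# Softmax outputs must sum to 1, so their mean is 1 / 256 whatever the input.
print("mean activation:", relu.mean(), sigmoid.mean(), softmax.mean())

# The derivative of sigmoid is s * (1 - s), which never exceeds 0.25;
# ReLU passes a gradient of 1 wherever its input is positive.
sigmoid_grad = sigmoid * (1.0 - sigmoid)
relu_grad = (z > 0).astype(float)
print("max gradient:", relu_grad.max(), sigmoid_grad.max())
```

Stacked over several layers, the small sigmoid gradients and the tiny, mutually competing softmax activations both make it harder for the training signal to reach the early layers.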

The logistic regression model performed reasonably well given its simplicity compared to neural networks, with an accuracy of 60%. This surprised us, as we expected it to perform much worse than the CNNs. It outscored both the sigmoid and softmax CNNs, which makes it a useful baseline for comparison and shows that more complex models do not always yield better results.

Part of the ranking was expected: the CNN with ReLU outperformed the sigmoid and softmax variants, and logistic regression trailed the best CNN configuration, since a linear model cannot capture complex patterns in image data as effectively as a CNN.
The size of the gap for the softmax activation was unexpected, however. We expected its performance to be closer to that of the ReLU activation rather than the lowest of all the models.

**Conclusion**

Overall, the experimental results show the importance of choosing the right activation function and model architecture for the task at hand. CNNs with ReLU activation are generally more effective for image classification tasks, while logistic regression provided a baseline that two of the three CNN variants failed to beat. Even so, we were disappointed in the performance of the CNN with ReLU, as we expected a higher accuracy than 71%.


**References and Citations:**


https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000/data

https://www.kaggle.com/code/harinagasaiperisetla/skin-lesion-ham10000-using-cnn

https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/

https://machinelearningmastery.com/multinomial-logistic-regression-with-python/

In [None]:
# Model push
!curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
!sudo apt-get install git-lfs
!git lfs install

%cd /content/checkpoints/train
!git clone https://{username}:{password}@github.com/{username}/{project}.git

