This repository contains the code for a web application that detects melanoma from a given skin image.
You can try the app at: http://3.17.64.68:8501/
Skin cancer is the most prevalent type of cancer. Melanoma, specifically, is responsible for 75% of skin cancer deaths, despite being the least common skin cancer. The American Cancer Society estimates over 100,000 new melanoma cases will be diagnosed in 2020. It's also expected that almost 7,000 people will die from the disease. As with other cancers, early and accurate detection, potentially aided by data science, can make treatment more effective. Source: https://www.kaggle.com/c/siim-isic-melanoma-classification
The objective of this project is to identify melanoma in images of skin lesions. Using patient-level contextual information may help the development of image analysis tools, which could better support clinical dermatologists. In particular, we need to use images within the same patient and determine which are likely to represent a melanoma. In other words, we need to create a model that predicts the probability that the lesion in an image is malignant rather than benign. A value of 0 denotes benign, and 1 indicates malignant.
The dataset we use comes from the following source:
Kaggle SIIM-ISIC Melanoma Classification Challenge: https://www.kaggle.com/c/siim-isic-melanoma-classification
The dataset consists of images in:
- DICOM format
- JPEG format in JPEG directory
- TFRecord format in tfrecords directory
Additionally, there is metadata consisting of train, test, and submission files in CSV format.
The complete EDA of this dataset is available here.
In this project we used ResNeXt50 pretrained on ImageNet.
For training we resized all images to 224x224.
The script that converts all images to this format is available here.
We used 10-fold StratifiedKFold and created a new file containing the fold assignments. The script is available here.
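The idea behind stratified folds can be sketched in plain Python. This is only an illustration of the technique, not the repository's create_folds.py and not scikit-learn's StratifiedKFold implementation; the function name `assign_stratified_folds` and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def assign_stratified_folds(labels, n_folds=10, seed=42):
    """Return one fold index per sample, keeping class ratios similar across folds."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [None] * len(labels)
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin across folds, continuing
        # from where the previous class stopped so fold sizes stay balanced.
        for position, idx in enumerate(indices):
            folds[idx] = (position + offset) % n_folds
        offset += len(indices)
    return folds


# Imbalanced toy labels (hypothetical): 0 = benign, 1 = malignant.
labels = [0] * 180 + [1] * 20
folds = assign_stratified_folds(labels, n_folds=10)

for k in range(10):
    members = [i for i, f in enumerate(folds) if f == k]
    positives = sum(labels[i] for i in members)
    print(f"fold {k}: {len(members)} samples, {positives} malignant")
```

Each fold ends up with the same share of malignant samples as the full set, which matters when the positive class is rare and a plain random split could leave some folds with almost no positives.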
We used train.py to train this model on our dataset.
The Streamlit folder contains a Python script named app.py, a Streamlit app built around the trained model,
and prediction.py, which contains a predict function that takes an image and returns a prediction.
You can experiment with the following hyperparameters to see different results:
- resize_images.py: image size
- create_folds.py: number of folds
- train.py:
  - model used
  - augmentations
  - learning rate
  - optimizer
  - use of metadata
Python 3.7.6
CUDA version 10.2.89
cuDNN 7.6.5
Python packages are listed separately in requirements.txt or environment.yml.
You can install all required packages using
pip install -r requirements.txt
or, if you are using conda, you can create a virtual environment named pytorch that has all required libraries:
conda env create -f environment.yml
The steps below assume that the Kaggle API is installed.
cd data
kaggle competitions download -c siim-isic-melanoma-classification
python resize_images.py
python create_folds.py
python train.py
cd Streamlit
streamlit run app.py
For this particular problem, we evaluate the model using the area under the ROC curve (AUC). An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters: the true positive rate (TPR) and the false positive rate (FPR).
An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives. The following figure shows a typical ROC curve.
source: https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
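To make the effect of the threshold concrete, here is a minimal sketch that computes one ROC point (FPR, TPR) per threshold. The helper name `roc_point` and the labels and scores are made up for illustration; they are not outputs of this project's model.

```python
def roc_point(y_true, y_score, threshold):
    """Return (FPR, TPR) when scores >= threshold are classified as positive."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return fp / negatives, tp / positives


# Made-up labels (1 = malignant) and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]

# Lowering the threshold moves the point up and to the right on the ROC curve.
for threshold in (0.9, 0.6, 0.3):
    fpr, tpr = roc_point(y_true, y_score, threshold)
    print(f"threshold={threshold}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

On this toy data both rates rise as the threshold drops, which is the trade-off the ROC curve traces out.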
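AUC: 0.8990
This means that, given one randomly chosen malignant image and one randomly chosen benign image, the model assigns the higher malignancy score to the malignant one about 89.9% of the time. It is a measure of how well the model ranks images, not its confidence in any single prediction.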
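AUC can also be read as a ranking probability, and a small sketch shows this by comparing every positive score against every negative score. The function name `auc_by_pairs` and the data are illustrative assumptions; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used, and this quadratic loop is only meant to show the definition.

```python
def auc_by_pairs(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = malignant) and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]

auc = auc_by_pairs(y_true, y_score)
print(f"AUC = {auc:.4f}")
```

Here 6 of the 9 positive/negative pairs are ordered correctly, so the toy AUC is 6/9.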