## Classify skin lesions ##

**null hypothesis:** Dermatoscopic images of pigmented skin lesions do not differ between skin cancer diagnostic categories.

**alternative hypothesis:** Dermatoscopic images of pigmented skin lesions differ between skin cancer diagnostic categories.

**goals and success:** Classify dermatoscopic images of pigmented skin lesions into one of the skin cancer diagnostic categories with an accuracy higher than chance. There are 7 categories, so random guessing would be right about 1/7 ≈ 14.3% of the time; success therefore means an accuracy of 15% or greater.
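The 15% threshold assumes uniform guessing. HAM10000 is, however, strongly imbalanced (the `nv` category alone accounts for roughly two thirds of the images), so always predicting the most common category would already score far above 15%. A minimal sketch that computes both baselines from a list of labels, such as `ham_df['dx']` in the code below; `chance_baselines` is a hypothetical helper and the toy labels are made up:

```python
from collections import Counter

def chance_baselines(labels):
    """Return (uniform-guess accuracy, majority-class accuracy) for a list of labels."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    uniform = 1 / len(counts)                                # guess one of k categories at random
    majority = max(counts.values()) / sum(counts.values())  # always guess the most common category
    return uniform, majority

# Toy labels for illustration only (not the real HAM10000 counts)
toy = ["nv"] * 7 + ["mel", "bkl", "bcc", "akiec", "df", "vasc"]
uniform, majority = chance_baselines(toy)
print(f"uniform: {uniform:.1%}, majority: {majority:.1%}")  # uniform: 14.3%, majority: 53.8%
```

With the real metadata, `chance_baselines(ham_df['dx'])` gives the two numbers that a classifier's accuracy can be compared against.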

**risks or limitations:** This requires enough sample images in each diagnostic category. It also requires images within a diagnostic category to be similar enough to each other, and images from different categories to be distinct enough from each other.
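The first risk can be checked directly by counting images per diagnostic category and flagging any category below a minimum. A minimal sketch; `small_categories` is a hypothetical helper, and both the threshold of 100 images and the toy labels are illustrative assumptions, not values from this project:

```python
from collections import Counter

def small_categories(labels, min_per_class=100):
    """Return {category: count} for categories with fewer than min_per_class images."""
    counts = Counter(labels)
    return {dx: n for dx, n in sorted(counts.items()) if n < min_per_class}

# Toy labels for illustration only
toy = ["nv"] * 150 + ["mel"] * 120 + ["df"] * 40 + ["vasc"] * 30
print(small_categories(toy))  # {'df': 40, 'vasc': 30}
```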

Download the HAM10000 dataset (a multi-source collection of dermatoscopic images of common pigmented skin lesions) along with its metadata CSV file.

<img src="img_3.png" alt="pre-processing" title="pre-processing"/>

<img src="dx_samples_1.png" alt="Diagnostic category samples" title="Diagnostic category samples"/>

<img src="dx_samples_2.png" alt="Diagnostic category samples" title="Diagnostic category samples"/>

```python
from glob import glob
import os
import shutil

import pandas as pd

# Credit to Kevin Mader for this code
base_skin_dir = os.path.join(".", "data")
ham_df = pd.read_csv(os.path.join(base_skin_dir, "HAM10000_metadata.csv"))

# Map each image ID to its file path, e.g.
# "ISIC_0027419" -> "./data/HAM10000_images_part_1/ISIC_0027419.jpg"
imageid_path_dict = {os.path.splitext(os.path.basename(x))[0]: x
                     for x in glob(os.path.join(base_skin_dir, "*", "*.jpg"))}

ham_df["path"] = ham_df["image_id"].map(imageid_path_dict.get)

# Place the label (diagnosis) into the image file name by replacing the
# "ISIC" prefix, e.g. "ISIC_0027419.jpg" -> "akiec_0027419.jpg".
# This renames files in place, so run it only once; afterwards
# ham_df["path"] still holds the old names.
for dx, filename in zip(ham_df["dx"], ham_df["path"]):
    if not isinstance(filename, str):  # listed in the metadata but not found on disk
        continue
    folder, image_name = os.path.split(filename)
    os.rename(filename, os.path.join(folder, dx + image_name[len("ISIC"):]))

# Paths where the image files are currently stored
path_1 = os.path.join(base_skin_dir, "HAM10000_images_part_1")
path_2 = os.path.join(base_skin_dir, "HAM10000_images_part_2")

# Create a new directory to store all images together
path_img = os.path.join(base_skin_dir, "images")
os.makedirs(path_img, exist_ok=True)

# Copy every file in `path` into the combined image directory
def copy_files(path):
    for filename in os.listdir(path):
        full_filename = os.path.join(path, filename)
        if os.path.isfile(full_filename):
            shutil.copy(full_filename, os.path.join(path_img, filename))

copy_files(path_1)
copy_files(path_2)

# Create a new directory for images held out for testing
path_test = os.path.join(base_skin_dir, "test_images")
os.makedirs(path_test, exist_ok=True)

# Move the first `repeat` images of diagnosis `dx` into the test directory.
# Files are named "<dx>_<number>.jpg"; sorting makes the choice reproducible.
def move_files(dx, repeat):
    matches = sorted(f for f in os.listdir(path_img) if f.startswith(dx + "_"))
    for filename in matches[:repeat]:
        shutil.move(os.path.join(path_img, filename),
                    os.path.join(path_test, filename))

# Set aside these images for testing at the end
move_files("akiec", 3)
move_files("bcc", 2)
move_files("bkl", 2)
move_files("df", 2)
move_files("mel", 2)
move_files("nv", 2)
move_files("vasc", 2)
```
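Because the diagnosis is now the prefix of each file name, an image's label can be read back from its name, which also gives a quick check of how many images per category ended up in the hold-out folder. A minimal sketch; `label_from_filename` and `count_labels` are hypothetical helpers, and the regular expression assumes the `<dx>_<digits>.jpg` names produced by the renaming step:

```python
import re
from collections import Counter

# Assumed naming scheme: "<dx>_<digits>.jpg", e.g. "akiec_0027419.jpg"
LABEL_RE = re.compile(r"^([a-z]+)_\d+\.jpg$")

def label_from_filename(filename):
    """Return the diagnosis prefix of a renamed image file, or None if it doesn't match."""
    match = LABEL_RE.match(filename)
    return match.group(1) if match else None

def count_labels(filenames):
    """Count images per diagnosis, ignoring files that don't follow the naming scheme."""
    return Counter(label for label in map(label_from_filename, filenames) if label)

# In the notebook this could be count_labels(os.listdir("./data/test_images"))
example = ["akiec_0027419.jpg", "akiec_0025030.jpg", "nv_0031633.jpg", "notes.txt"]
print(count_labels(example))  # Counter({'akiec': 2, 'nv': 1})
```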

### Data augmentation ###
<img src="akiec_augmentations.png" alt="Actinic keratosis augmentations" title="Actinic keratosis augmentations"/>

<img src="bkl_augmentations.png" alt="Benign keratosis augmentations" title="Benign keratosis augmentations"/>

<img src="mel_augmentations.png" alt="Melanoma augmentations" title="Melanoma augmentations"/>
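The transforms behind these figures are not listed. As an assumption, a common choice for dermatoscopic images is flips and 90° rotations: a lesion has no natural "up", and these transforms only rearrange pixels without any interpolation. A minimal NumPy sketch that applies one of the eight flip/rotation combinations at random; `random_dihedral` is a hypothetical helper:

```python
import numpy as np

def random_dihedral(image, rng):
    """Return a copy of an (H, W, C) image under a random flip / 90-degree rotation.

    4 rotations times an optional flip give the 8 symmetries of a square.
    For non-square images an odd number of rotations swaps H and W.
    """
    k = int(rng.integers(4))                 # 0-3 quarter turns
    out = np.rot90(image, k=k, axes=(0, 1))
    if rng.integers(2):                      # flip left-right half of the time
        out = out[:, ::-1]
    return out.copy()

rng = np.random.default_rng(0)
image = np.arange(4 * 4 * 3).reshape(4, 4, 3)  # toy 4x4 RGB "image"
augmented = [random_dihedral(image, rng) for _ in range(8)]
print([a.shape for a in augmented])
```

Each call produces a new view of the same lesion, so every training epoch can see slightly different versions of an image without any new data.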

### Batch images ###
<img src="example_batch.png" alt="Example batch" title="Example batch"/>

<img src="example_batch2.png" alt="Example batch" title="Example batch"/>
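A batch stacks several equally sized images into a single array so the network processes them together. A minimal NumPy sketch, assuming the images are already resized to a common size with pixel values in [0, 255], and assuming normalization with the ImageNet channel statistics, the usual convention for an ImageNet-pretrained ResNet50 (the exact preprocessing used here is not stated). `make_batches` is a hypothetical helper:

```python
import numpy as np

# ImageNet per-channel (R, G, B) mean and standard deviation for pixels scaled to [0, 1]
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def make_batches(images, labels, batch_size):
    """Yield (batch, batch_labels); each batch is a float32 array of shape (B, C, H, W).

    images: list of (H, W, 3) arrays with values in [0, 255], all the same size.
    """
    for start in range(0, len(images), batch_size):
        batch = np.stack(images[start:start + batch_size]) / 255.0  # (B, H, W, C) in [0, 1]
        batch = (batch - IMAGENET_MEAN) / IMAGENET_STD              # normalize each channel
        yield batch.transpose(0, 3, 1, 2).astype(np.float32), labels[start:start + batch_size]

# Toy data: 10 random 8x8 RGB "images"
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(8, 8, 3)) for _ in range(10)]
labels = ["nv"] * 7 + ["mel"] * 3
batches = list(make_batches(images, labels, batch_size=4))
print([b.shape for b, _ in batches])  # the last batch holds the remaining 2 images
```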

### Deep learning ###

1. Create a convolutional neural network with a ResNet50 architecture
2. Freeze the earlier layers and train only the last layers (accuracy 84.3%)
3. Unfreeze and train the whole model (accuracy 87.6%)
4. Progressive resizing: switch to larger images, freeze again and train the last layers (accuracy 91.8%)
5. Unfreeze and train the whole model at the larger size (accuracy 93.1%)

### Most confused ###
<img src="pred_act_loss_prob.png" alt="Prediction Actual Loss Probability" title="Prediction Actual Loss Probability"/>
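Ranking images by their loss is a common way to find a model's worst mistakes. Assuming cross-entropy loss, the loss for one image is -log(p), where p is the probability the model gave to the actual category, so a confident wrong prediction gets a large loss. A minimal pure-Python sketch with made-up probabilities; `top_losses` is a hypothetical helper and the alphabetical class order is an assumption:

```python
import math

CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]  # assumed class order

def top_losses(probs, actual, k):
    """Return the k highest-loss samples as (index, predicted, actual, loss, prob_of_actual).

    probs: per-sample lists of class probabilities (e.g. softmax outputs, all > 0 for the
    true class); actual: true class indices. Loss is -log(probability of the true class).
    """
    rows = []
    for i, (p, y) in enumerate(zip(probs, actual)):
        pred = max(range(len(p)), key=p.__getitem__)  # index of the highest probability
        rows.append((i, CLASSES[pred], CLASSES[y], -math.log(p[y]), p[y]))
    return sorted(rows, key=lambda row: row[3], reverse=True)[:k]

# Made-up probabilities for three images (columns follow CLASSES)
probs = [
    [0.05, 0.05, 0.10, 0.00, 0.10, 0.70, 0.00],  # actual mel, predicted nv
    [0.00, 0.00, 0.05, 0.00, 0.05, 0.90, 0.00],  # actual nv, predicted nv
    [0.60, 0.10, 0.20, 0.00, 0.05, 0.05, 0.00],  # actual bkl, predicted akiec
]
actual = [4, 5, 2]
for row in top_losses(probs, actual, k=2):
    print(row)
```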

<img src="confusion_matrix.png" alt="Confusion Matrix" title="Confusion Matrix"/>
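A confusion matrix counts, for each actual category (row), how often each category was predicted (column); the largest off-diagonal counts are the "most confused" pairs. A minimal pure-Python sketch with made-up labels; `build_confusion_matrix` and `confused_pairs` are hypothetical helpers, not a specific library's API:

```python
from collections import Counter

CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]

def build_confusion_matrix(actual, predicted, classes=CLASSES):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def confused_pairs(actual, predicted, min_count=1):
    """List (actual, predicted, count) for misclassifications, most frequent first."""
    pairs = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return [(a, p, n) for (a, p), n in pairs.most_common() if n >= min_count]

# Made-up labels for illustration only
actual    = ["mel", "mel", "mel", "nv", "nv",  "bkl", "bkl"]
predicted = ["nv",  "nv",  "mel", "nv", "mel", "bkl", "nv"]
matrix = build_confusion_matrix(actual, predicted)
print(confused_pairs(actual, predicted))  # [('mel', 'nv', 2), ('nv', 'mel', 1), ('bkl', 'nv', 1)]
```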

### Last steps ###

- ISIC 2018 conference and competition: https://submission.challenge.isic-archive.com/#phase/5bee4312c5eaea4f24b5ec0c

- Deployed model: https://skintest.onrender.com/