# Skin AI
This notebook is a very simple example of training a skin AI to distinguish between types of lesions. In this example we distinguish vascular lesions from melanocytic nevi. To run a block of code, click into the box and press 'Shift'+'Enter'. The box below imports the libraries needed to run the rest of the notebook.

In [None]:
import glob
import os
import shutil
import random
from IPython.display import display, clear_output, Markdown

import cv2
import numpy as np
import pandas as pd
import torch
import torchvision
from matplotlib import pyplot as plt
from PIL import Image
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from torch import nn
from torchvision import transforms

from isic_utils import download_isic_images

# Download Data From ISIC
ISIC is a public dataset of skin lesions. We are downloading a small subset of vascular lesions and nevi.

In [None]:
if os.path.exists('./train_images'): shutil.rmtree('./train_images') 
if os.path.exists('./test_images'): shutil.rmtree('./test_images')

os.mkdir('./train_images')
os.mkdir('./train_images/nevus')
os.mkdir('./train_images/vascular_lesion')
os.mkdir('./test_images')
os.mkdir('./test_images/nevus')
os.mkdir('./test_images/vascular_lesion')

print('downloading unlabeled training images')
vasc_offset = download_isic_images('./train_images/', limit=25, dx='vascular lesion')
nev_offset = download_isic_images('./train_images/', limit=25, dx='nevus')
print('downloading vascular lesion test images')
vasc_offset = download_isic_images('test_images/vascular_lesion', limit=50, dx='vascular lesion', offset=vasc_offset)
print('downloading nevus test images')
nev_offset = download_isic_images('./test_images/nevus', limit=50, dx='nevus', offset=nev_offset)
print('downloading additional vascular lesion train images')
download_isic_images('train_images/vascular_lesion', limit=50, dx='vascular lesion', offset=vasc_offset)
print('downloading additional nevus train images')
download_isic_images('./train_images/nevus', limit=50, dx='nevus', offset=nev_offset)

# Label Data

## Let's learn the difference between vascular lesions and nevi
Run the code below, and it will walk through examples of the two types of lesions. For each example the code outputs the diagnosis, an image, and a response box. In the response box, press 'Enter' to proceed, or type 'q' and then press 'Enter' to quit. The code also ends once you have seen all available images. When you think you understand the two types, proceed to the next step to label some additional examples.

In [None]:
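vasc_images = list(glob.glob('./train_images/vascular_lesion/*.jpg'))
nev_images = list(glob.glob('./train_images/nevus/*.jpg'))
temp_df = pd.DataFrame(vasc_images + nev_images, columns=['image_path'])
# Build labels from the actual file counts in case fewer than 50 images were downloaded
temp_df['label'] = ['vascular_lesion'] * len(vasc_images) + ['nevus'] * len(nev_images)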
temp_df = temp_df.sample(frac=1)
for i, row in temp_df.iterrows():
    display(Markdown(f'# {row.label}'))
    display(Image.open(row.image_path).resize((512, 512)))
    resp = input()
    clear_output()
    if resp in {'q', 'quit', 'exit', 'break'}:
        break

## Your Turn
Label the images below by entering either 'v' for vascular lesion or 'n' for nevus. Enter 'q' to stop early.

In [None]:
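images = list(glob.glob('./train_images/*.jpg'))
random.shuffle(images)
data_df = pd.DataFrame(images, columns=['image_path'])
labels = []
quit_requested = False
for i, row in data_df.iterrows():
    # Keep asking about the same image until the response is recognized
    while True:
        display(Image.open(row.image_path).resize((512, 512)))
        resp = input()
        clear_output()
        if resp in {'vascular', 'vascular lesion', 'vasc', 'v'}:
            labels.append('vascular_lesion')
            break
        elif resp in {'nevus', 'nevi', 'nev', 'n'}:
            labels.append('nevus')
            break
        elif resp in {'q', 'quit', 'exit', 'break'}:
            quit_requested = True
            break
    # Leave the outer loop too, otherwise quitting only skips the current image
    if quit_requested:
        break
# Keep only the rows that were actually labeled so labels and rows stay aligned
data_df = data_df.iloc[:len(labels)]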

# Turn Data Into Dataframe
The code below combines the labels you generated with some additional labeled data into a DataFrame, which is a more convenient form for handling the data. It also separates the data into three "splits":
- 'train': the data used to train a classifier
- 'valid': short for validation, the data used to choose hyperparameters and pick the best model
- 'test': data that must not be used during development, so that its score reflects how the model performs on unseen data

In [None]:
vasc_train = glob.glob('./train_images/vascular_lesion/*.jpg')
nev_train = glob.glob('./train_images/nevus/*.jpg')
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
data_df = pd.concat([data_df, pd.DataFrame(vasc_train + nev_train, columns=['image_path'])], ignore_index=True)
# Labels: your manual labels first, then the known labels of the extra images
data_df['label'] = labels + ['vascular_lesion'] * len(vasc_train) + ['nevus'] * len(nev_train)
data_df['split'] = np.random.choice(['train', 'valid'], size=len(data_df))

vasc_test = glob.glob('./test_images/vascular_lesion/*.jpg')
nev_test = glob.glob('./test_images/nevus/*.jpg')
test_df = pd.DataFrame(vasc_test + nev_test, columns=['image_path'])
test_df['label'] = ['vascular_lesion'] * len(vasc_test) + ['nevus'] * len(nev_test)
test_df['split'] = 'test'
data_df = pd.concat([data_df, test_df], ignore_index=True)
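
The cell above assigns each row to 'train' or 'valid' independently at random, so the split sizes and the class balance within each split vary from run to run. As a point of comparison, and not what this notebook does, the sketch below assigns splits per class (a stratified assignment) so that both splits are guaranteed to contain every label. The helper name `assign_splits`, its parameters, and the toy labels are hypothetical and only serve the illustration.

```python
import random
from collections import Counter

def assign_splits(labels, valid_fraction=0.5, seed=0):
    """Assign 'train' or 'valid' within each class (hypothetical helper)."""
    rng = random.Random(seed)
    splits = [None] * len(labels)
    # sorted() keeps the class order, and therefore the result, reproducible
    for label in sorted(set(labels)):
        indices = [i for i, current in enumerate(labels) if current == label]
        rng.shuffle(indices)
        n_valid = round(len(indices) * valid_fraction)
        for position, i in enumerate(indices):
            splits[i] = 'valid' if position < n_valid else 'train'
    return splits

# Toy labels with an uneven class balance
toy_labels = ['vascular_lesion'] * 6 + ['nevus'] * 10
toy_splits = assign_splits(toy_labels)
split_counts = Counter(zip(toy_splits, toy_labels))
print(split_counts)
```

Counting `(split, label)` pairs like this is also a quick way to inspect the purely random assignment used in the notebook.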

# Create Handcrafted Feature Vectors
Before deep learning, features had to be extracted from images with handcrafted rules rather than a learning algorithm, and such features are often still useful today. Below we create histograms of the colors found in the lesion.

## Segmentation By Otsu Thresholding
Otsu thresholding works by choosing the threshold that minimizes intra-class variance. That is, when the pixels are split by the threshold, the combined variation within the background pixels and within the foreground pixels is as small as possible. A simple way to improve the result on noisy images is to blur the image before thresholding. Run the code below a number of times to see examples of the thresholding.

In [None]:
def otsu_threshold_segmenter(img, blur_size=None, process_size=512):
    if process_size:
        img = img.resize((process_size, process_size))
    img = np.array(img).mean(axis=-1).astype(np.uint8)
    if blur_size:
        img = cv2.GaussianBlur(img, (blur_size, blur_size), 0)
    return 255 - cv2.threshold(img, 0, 255 , cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]

# Vascular lesion example
display(Markdown('## Vascular Lesion'))
vasc_path = data_df[(data_df.split == 'train') & (data_df.label == 'vascular_lesion')].sample(1).image_path.values[0]
img = Image.open(vasc_path)
display(img.resize((256, 256)))
display(Image.fromarray(otsu_threshold_segmenter(img, blur_size=5)).resize((256, 256)))

# Nevus Example
display(Markdown('## Nevus Lesion'))
nevi_path = data_df[(data_df.split == 'train') & (data_df.label == 'nevus')].sample(1).image_path.values[0]
img = Image.open(nevi_path)
display(img.resize((256, 256)))
display(Image.fromarray(otsu_threshold_segmenter(img, blur_size=5)).resize((256, 256)))
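
To make the idea of minimizing intra-class variance concrete, here is a minimal brute-force sketch in plain Python. It is not how `cv2.threshold` is implemented; it simply tries every threshold on a toy list of gray values and keeps the one with the smallest weighted within-class variance. The function name `otsu_threshold` and the toy pixel values are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t minimizing the weighted intra-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    # Histogram of gray levels
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    total = len(pixels)

    best_t, best_score = 0, float('inf')
    for t in range(levels - 1):
        low = counts[:t + 1]
        high = counts[t + 1:]
        n_low, n_high = sum(low), sum(high)
        if n_low == 0 or n_high == 0:
            continue  # one class is empty, so this split is not meaningful
        mean_low = sum(v * c for v, c in enumerate(low)) / n_low
        mean_high = sum(v * c for v, c in enumerate(high, start=t + 1)) / n_high
        var_low = sum(c * (v - mean_low) ** 2 for v, c in enumerate(low)) / n_low
        var_high = sum(c * (v - mean_high) ** 2
                       for v, c in enumerate(high, start=t + 1)) / n_high
        # Weighted sum of the two class variances
        score = (n_low * var_low + n_high * var_high) / total
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Toy "image": five dark lesion pixels followed by five bright skin pixels
toy_pixels = [40, 42, 45, 41, 43, 200, 205, 198, 202, 201]
threshold = otsu_threshold(toy_pixels)
lesion_mask = [p <= threshold for p in toy_pixels]
print(threshold, lesion_mask)
```

Marking the pixels at or below the threshold as lesion mirrors the `255 - ...` inversion in `otsu_threshold_segmenter`, which turns the darker region into the foreground.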

## Get color histograms of lesions
As you likely noticed during the labeling task, vascular lesions tend to be redder than nevi, which tend to be more brown. This is because vascular lesions involve blood vessels, whereas nevi consist of melanocytes. The code below defines a function that counts the number of pixels within each color range to form a color histogram.

In [None]:
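def get_color_histogram(img, mask=None, bins_per_channel=10):
    if mask is None:
        pixels_of_interest = img.reshape(-1, 3)
    else:
        # np.bool8 was removed in NumPy 2.0; the builtin bool works across versions
        pixels_of_interest = img[mask.astype(bool)]
    # One histogram per RGB channel, concatenated into a single feature vector
    histograms = np.concatenate([np.histogram(pixels_of_interest[:, i], bins=bins_per_channel, range=(0, 255))[0] for i in range(3)])
    # Scale so the fullest bin is 1 (guarding against an empty mask)
    histograms = histograms.astype(np.float64) / max(float(histograms.max()), 1.0)
    return histograms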

# Vascular lesion example
display(Markdown('## Vascular Lesion'))
vasc_path = data_df[(data_df.split == 'train') & (data_df.label == 'vascular_lesion')].sample(1).image_path.values[0]
img = Image.open(vasc_path)
img_resized = np.array(img.resize((256, 256)))
seg_resized = np.array(Image.fromarray(otsu_threshold_segmenter(img, blur_size=5)).resize((256, 256)))
histogram = get_color_histogram(img_resized, seg_resized)
edges = np.linspace(0, 255, 10)
display(img.resize((256, 256)))
plt.figure(figsize=(4, 4))
plt.bar(edges, histogram[0:10])
plt.title('Red')
plt.show()
plt.figure(figsize=(4, 4))
plt.bar(edges, histogram[10:20])
plt.title('Green')
plt.show()
plt.figure(figsize=(4, 4))
plt.bar(edges, histogram[20:30])
plt.title('Blue')
plt.show()

# Nevus Example
display(Markdown('## Nevus Lesion'))
nevi_path = data_df[(data_df.split == 'train') & (data_df.label == 'nevus')].sample(1).image_path.values[0]
img = Image.open(nevi_path)
img_resized = np.array(img.resize((256, 256)))
seg_resized = np.array(Image.fromarray(otsu_threshold_segmenter(img, blur_size=5)).resize((256, 256)))
histogram = get_color_histogram(img_resized, seg_resized)
display(img.resize((256, 256)))
edges = np.linspace(0, 255, 10)
plt.figure(figsize=(4, 4))
plt.bar(edges, histogram[0:10])
plt.title('Red')
plt.show()
plt.figure(figsize=(4, 4))
plt.bar(edges, histogram[10:20])
plt.title('Green')
plt.show()
plt.figure(figsize=(4, 4))
plt.bar(edges, histogram[20:30])
plt.title('Blue')
plt.show()
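
As a small worked example of what the feature vector contains, the sketch below builds the same kind of histogram in plain Python for four hand-picked reddish pixels. The helper names `channel_histogram` and `color_histogram` and the pixel values are hypothetical, and the simple binning here may differ from `np.histogram` for values that fall exactly on a bin edge.

```python
def channel_histogram(values, bins=10, value_range=(0, 255)):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = value_range
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # min() places the top edge of the range into the last bin
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts

def color_histogram(pixels, bins=10):
    """Concatenate per-channel histograms and scale so the fullest bin is 1."""
    counts = []
    for channel in range(3):
        counts += channel_histogram([p[channel] for p in pixels], bins=bins)
    peak = max(counts)
    return [c / peak for c in counts] if peak else [0.0] * len(counts)

# Four reddish (R, G, B) pixels, roughly what a vascular lesion might contain
reddish = [(220, 40, 50), (200, 60, 70), (230, 30, 45), (210, 55, 60)]
feature = color_histogram(reddish)
print(feature[0:10])   # red bins
print(feature[10:20])  # green bins
print(feature[20:30])  # blue bins
```

The red counts land in the upper bins while the green and blue counts land in the lower bins, which is the pattern the bar plots above make visible for real images.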

# Train Traditional Classifier
The code below trains an SVM on the color histograms. The first block adds the color features to the DataFrame, while the second trains the classifier and reports its accuracy on each split.

In [None]:
features = []
for i, row in data_df.iterrows():
    img = Image.open(row.image_path)
    img_resized = np.array(img.resize((256, 256)))
    seg_resized = np.array(Image.fromarray(otsu_threshold_segmenter(img, blur_size=5)).resize((256, 256)))
    features.append(get_color_histogram(img_resized, seg_resized))
data_df['color_feature'] = features

In [None]:
x = list(data_df[data_df.split == 'train'].color_feature.values)
y = data_df[data_df.split == 'train'].label.values

val_x = list(data_df[data_df.split == 'valid'].color_feature.values)
val_y = data_df[data_df.split == 'valid'].label.values

test_x = list(data_df[data_df.split == 'test'].color_feature.values)
test_y = data_df[data_df.split == 'test'].label.values

model = SVC(kernel='linear').fit(x, y)
print('train_acc:', model.score(x, y))
print('valid_acc:', model.score(val_x, val_y))
print('test_acc:', model.score(test_x, test_y))
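
The cell above trains a single SVM and does not yet use the validation split to choose anything. A minimal sketch of how the 'valid' split could be used to pick the regularization strength `C` follows. It runs on synthetic two-cluster data rather than the notebook's histograms, and the candidate values for `C` and the helper `make_toy_features` are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_toy_features(n_per_class):
    """Two Gaussian blobs standing in for feature vectors (synthetic data)."""
    class_a = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 5))
    class_b = rng.normal(loc=1.5, scale=1.0, size=(n_per_class, 5))
    features = np.vstack([class_a, class_b])
    labels = np.array(['nevus'] * n_per_class + ['vascular_lesion'] * n_per_class)
    return features, labels

train_x, train_y = make_toy_features(40)
valid_x, valid_y = make_toy_features(40)

candidate_cs = [0.01, 0.1, 1.0, 10.0]
best_c, best_valid_acc = None, -1.0
for c in candidate_cs:
    candidate = SVC(kernel='linear', C=c).fit(train_x, train_y)
    valid_acc = candidate.score(valid_x, valid_y)
    # Keep the setting that does best on data the model was not trained on
    if valid_acc > best_valid_acc:
        best_c, best_valid_acc = c, valid_acc

print('best C:', best_c, 'valid_acc:', best_valid_acc)
```

The test split stays out of this loop, so the final test accuracy is still measured on data that played no part in choosing the model.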

# ImageNet Pretrained Model As Feature Generator
The code below is a simplified example of using a deep network. A model pretrained to classify ImageNet classes is used to generate features, which are then used to train an SVM, as was done for the handcrafted features.

In [None]:
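feature_model = torchvision.models.resnet18(pretrained=True)
# Replace the classification head so the model outputs its 512-dimensional features
feature_model.fc = nn.Identity()
for param in feature_model.parameters():
    param.requires_grad = False
# Switch to inference mode so batch norm uses its pretrained running statistics
feature_model.eval()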

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

preprocessing = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
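
To show what the last two preprocessing steps do to a single pixel, here is a small sketch of the arithmetic in plain Python: `ToTensor` scales 8-bit values to the range [0, 1], and `Normalize` then subtracts the per-channel mean and divides by the per-channel standard deviation. The function `normalize_pixel` is a hypothetical stand-in for illustration, not part of torchvision.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(s - m) / d for s, m, d in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]

white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print('white:', white)
print('black:', black)
```

After this step, pixel values are centered near zero on the same scale as the images the pretrained network was trained on.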

In [None]:
features = []
for i, row in data_df.iterrows():
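    # Convert to RGB so grayscale or RGBA files still produce three channels
    img = Image.open(row.image_path).convert('RGB')
    x = preprocessing(img).unsqueeze(0)
    # No gradients are needed for feature extraction
    with torch.no_grad():
        features.append(feature_model(x)[0].numpy())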
data_df['deep_feature'] = features

In [None]:
x = list(data_df[data_df.split == 'train'].deep_feature.values)
y = data_df[data_df.split == 'train'].label.values

val_x = list(data_df[data_df.split == 'valid'].deep_feature.values)
val_y = data_df[data_df.split == 'valid'].label.values

test_x = list(data_df[data_df.split == 'test'].deep_feature.values)
test_y = data_df[data_df.split == 'test'].label.values

model = SVC(kernel='linear').fit(x, y)
print('train_acc:', model.score(x, y))
print('valid_acc:', model.score(val_x, val_y))
print('test_acc:', model.score(test_x, test_y))