# Forked!
Original : https://www.kaggle.com/saife245/melanoma-detail-analysis-eda-ip-augmentation-model

Another great EDA kernel: https://www.kaggle.com/parulpandey/melanoma-classification-eda-starter

For this kernel, I'm also studying PyTorch.
https://www.kaggle.com/zzy990106/pytorch-5-fold-efficientnet-baseline

There's also apparently a PyTorch library of pre-trained models / architectures: timm

AutoAugment
https://arxiv.org/abs/1805.09501

Pre-processed 224x224 images
https://www.kaggle.com/arroqc/siic-isic-224x224-images

How to spot melanoma

The medical community has developed two ways to spot the early signs of melanoma, the most dangerous type of skin cancer. A person can use the ABCDE method and the ugly duckling method.

    The ABCDE method: Brown spots, marks, and moles are usually harmless. However, the first sign of melanoma can occur in what doctors call an atypical mole, or dysplastic nevus. To spot an atypical mole, check for the following:
        A: Asymmetry. If the two halves of a mole do not match, this can be an early indication of melanoma.
        B: Border. The edges of a harmless mole are even and smooth. If a mole has uneven edges, this can be an early sign of melanoma. The mole’s border may be scalloped or notched.
        C: Color. Harmless moles are a single shade, usually of brown. Melanoma can cause variation in shade, from tan, brown, or black to red, blue, or white.
        D: Diameter. Harmless moles tend to be smaller than dangerous ones, which are usually larger than a pencil’s eraser — around one-quarter of an inch, or 6 millimeters.
        E: Evolving. If a mole starts to change, or evolve, this can be a warning. Changes may involve shape, color, or elevation from the skin. Or, a mole may start to bleed, itch, or crust. 
         
![ABCDE](https://i.pinimg.com/originals/60/f6/d8/60f6d82a2edbc0c7e0e9df5308e654c3.jpg)


Melanoma stage grouping...

    Stage 0: This refers to melanoma in situ, which means melanoma cells are found only in the outer layer of skin or epidermis. This stage of melanoma is very unlikely to spread to other parts of the body.

    Stage I: The primary melanoma is still only in the skin and is very thin. Stage I is divided into 2 subgroups, IA or IB, depending on the thickness of the melanoma and whether a pathologist sees ulceration under a microscope.

    Stage II: Stage II melanoma is thicker than stage I melanoma, extending through the epidermis and further into the dermis, the dense inner layer of the skin. It has a higher chance of spreading. Stage II is divided into 3 subgroups—A, B, or C—depending on how thick the melanoma is and whether there is ulceration.
    Stage III: This stage describes melanoma that has spread locally or through the lymphatic system to a regional lymph node located near where the cancer started or to a skin site on the way to a lymph node, called “in-transit metastasis, satellite metastasis, or microsatellite disease.” The lymphatic system is part of the immune system and drains fluid from body tissues through a series of tubes or vessels. Stage III is divided into 4 subgroups—A, B, C, or D—depending on the size and number of lymph nodes involved with melanoma, whether the primary tumor has satellite or in-transit lesions, and if it appears ulcerated under a microscope.
    Stage IV: This stage describes melanoma that has spread through the bloodstream to other parts of the body, such as distant locations on the skin or soft tissue, distant lymph nodes, or other organs like the lung, liver, brain, bone, or gastrointestinal tract. Stage IV is further evaluated based on the location of distant metastasis:
        M1a: The cancer has only spread to distant skin and/or soft tissue sites.
        M1b: The cancer has spread to the lung.
        M1c: The cancer has spread to any other location that does not involve the central nervous system.
        M1d: The cancer has spread to the central nervous system, including the brain, spinal cord, and/or cerebrospinal fluid, or lining of the brain and/or spinal cord.

![Stages Melanoma](https://images.agoramedia.com/everydayhealth/gcms/Melanoma-Stages-722x406.jpg)

## What am I predicting?
* You are predicting a binary target for each image. 
* Your model should predict the probability (floating point) between 0.0 and 1.0 that the lesion in the image is malignant (the target). 
* In the training data, train.csv, the value 0 denotes benign, and 1 indicates malignant.
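
As a minimal sketch of the prediction format described above, assume a model emits one raw score (logit) per image. A sigmoid maps each score to a probability between 0.0 and 1.0. The helper name `logits_to_probabilities`, the image names, and the logits below are hypothetical; only the `image_name` and `target` column names come from train.csv.

```python
import numpy as np
import pandas as pd

def logits_to_probabilities(logits):
    """Map raw scores (logits) to probabilities in [0, 1] with a sigmoid."""
    logits = np.asarray(logits, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical image names and logits, for illustration only.
image_names = ["ISIC_0000001", "ISIC_0000002", "ISIC_0000003"]
logits = [-2.0, 0.0, 3.0]

# One row per image: identifier plus predicted probability of malignancy.
predictions = pd.DataFrame({
    "image_name": image_names,
    "target": logits_to_probabilities(logits),
})
print(predictions)
```

A value near 0 reads as "probably benign" and a value near 1 as "probably malignant", matching the 0/1 encoding in train.csv.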

# Imports

In [None]:
import os
import gc
import json
import math
import cv2
import PIL
import re
import numpy as np
import pandas as pd
from PIL import Image
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt
#from sklearn.metrics import cohen_kappa_score, accuracy_score
import scipy
from tqdm import tqdm
%matplotlib inline
#from keras.preprocessing import image
import glob
import tensorflow.keras.applications.densenet as dense
from kaggle_datasets import KaggleDatasets
import seaborn as sns
sns.set_style('whitegrid')

import missingno as msno

from plotly.offline import iplot
import cufflinks
cufflinks.go_offline()
cufflinks.set_config_file(world_readable=True, theme='pearl')

In [None]:
tf.__version__

## A few words about the dataset...

### What should I expect the data format to be?
The images are provided in DICOM format. This can be accessed using commonly-available libraries like **pydicom**, and contains both image and metadata. It is a commonly used medical imaging data format.

Images are also provided in **JPEG** and **TFRecord** format (in the jpeg and tfrecords directories, respectively). Images in TFRecord format have been resized to a uniform 1024x1024.

Metadata is also provided outside of the DICOM format, in CSV files. See the Columns section for a description.

## Columns...
* **image_name** - unique identifier, points to filename of related DICOM image
* **patient_id** - unique patient identifier
* **sex** - the sex of the patient (when unknown, will be blank)
* **age_approx** - approximate patient age at time of imaging
* **anatom_site_general_challenge** - location of imaged site
* **diagnosis** - detailed diagnosis information (train only)
* **benign_malignant** - indicator of malignancy of imaged lesion
* **target** - binarized version of the target variable
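
A quick sanity check on these columns might look like the sketch below. It uses a tiny hypothetical frame with the same column names as train.csv and assumes `benign_malignant` holds the strings `benign` / `malignant`. The rows are made up for illustration.

```python
import pandas as pd

# Tiny hypothetical frame with the same column names as train.csv.
sample = pd.DataFrame({
    "image_name": ["ISIC_A", "ISIC_B", "ISIC_C", "ISIC_D"],
    "sex": ["male", None, "female", "female"],
    "age_approx": [45.0, 60.0, None, 30.0],
    "benign_malignant": ["benign", "malignant", "benign", "benign"],
    "target": [0, 1, 0, 0],
})

# Count missing values per column (blank CSV cells are read as NaN/None).
missing_per_column = sample.isna().sum()

# `target` should be 1 exactly where `benign_malignant` is "malignant".
derived_target = (sample["benign_malignant"] == "malignant").astype(int)
labels_consistent = bool((derived_target == sample["target"]).all())

# Fraction of malignant rows, a quick view of class balance.
malignant_rate = sample["target"].mean()

print(missing_per_column)
print("labels consistent:", labels_consistent)
print("malignant rate:", malignant_rate)
```

The same three checks can be run on the real `train` frame once it is loaded.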

In [None]:
train = pd.read_csv('/kaggle/input/siim-isic-melanoma-classification/train.csv')
test = pd.read_csv('/kaggle/input/siim-isic-melanoma-classification/test.csv')

print('Train: ', train.shape)
print("Test:", test.shape)

In [None]:
train.head()

In [None]:
test.head()

In [None]:
msno.matrix(train);

## Non-Image EDA

In [None]:
vc = train.groupby("benign_malignant")["diagnosis"].value_counts().unstack()[train["diagnosis"].value_counts().sort_values().index]
display(vc)
vc.iplot(kind='bar', yTitle='Count', 
          linecolor='black', 
          opacity=0.7,
          theme='pearl',
          bargap=0.5,
          gridcolor='white',
          barmode = 'stack',
          title='Distribution of the Target column in the training set')

In [None]:
vc = train.groupby("benign_malignant")["sex"].value_counts(normalize=True).unstack()
vc.iplot(kind='bar', yTitle='Percentage', 
          linecolor='black', 
          opacity=0.7,
          theme='pearl',
          bargap=0.5,
          gridcolor='white',
          barmode = 'stack',
          title='Target vs Gender')

In [None]:
plt.figure(figsize=(12,5))

# This cell compares age by sex, so label the curves accordingly
sns.distplot(train.loc[train['sex'] == 'female', 'age_approx'], label = 'Female')

sns.distplot(train.loc[train['sex'] == 'male', 'age_approx'], label = 'Male')
plt.legend()

scipy.stats.ttest_ind(train.loc[train['sex'] == 'female', 'age_approx'], train.loc[train['sex'] == 'male', 'age_approx'], nan_policy='omit')

In [None]:
vc = train.groupby("age_approx")["benign_malignant"].value_counts().unstack()
vc.iplot(kind='bar', yTitle='Count', 
          linecolor='black', 
          opacity=0.7,
          theme='pearl',
          bargap=0.2,
          gridcolor='white',
          barmode = 'stack',
          title='Age vs Target')
vc = train.groupby("age_approx")["benign_malignant"].value_counts(normalize=True).unstack()
vc.iplot(kind='bar', yTitle='Percentage', 
          linecolor='black', 
          opacity=0.7,
          theme='pearl',
          bargap=0.2,
          gridcolor='white',
          barmode = 'stack',
          title='Age vs Target, normalized')

In [None]:
plt.figure(figsize=(12,5))

sns.distplot(train.loc[train['target'] == 0, 'age_approx'], label = 'Benign')

sns.distplot(train.loc[train['target'] == 1, 'age_approx'], label = 'Malignant')
plt.legend()

scipy.stats.ttest_ind(train.loc[train['target'] == 0, 'age_approx'], train.loc[train['target'] == 1, 'age_approx'], nan_policy='omit')
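
For reference, with its default `equal_var=True`, `scipy.stats.ttest_ind` computes Student's pooled-variance t-statistic, and `nan_policy='omit'` drops missing values first. The sketch below reproduces that statistic with NumPy only. The function name `pooled_t_statistic` and the sample values are hypothetical, and the p-value is not computed.

```python
import numpy as np

def pooled_t_statistic(a, b):
    """Two-sample Student's t-statistic with pooled variance, ignoring NaNs."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Drop missing values, mirroring nan_policy='omit'.
    a = a[~np.isnan(a)]
    b = b[~np.isnan(b)]
    n1, n2 = a.size, b.size
    # Unbiased sample variances (ddof=1).
    var1, var2 = a.var(ddof=1), b.var(ddof=1)
    # Pooled variance weights each group by its degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

# Made-up ages for two groups; the NaN is skipped.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0, np.nan]
t_stat = pooled_t_statistic(group_a, group_b)
print(t_stat)
```

A negative statistic means the first group has the lower mean. How large it must be to count as significant depends on the degrees of freedom, `n1 + n2 - 2`.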

In [None]:
vc = train["diagnosis"].value_counts()[::-1]
vc[vc.index != "unknown"].plot.barh()

In [None]:
vc = train.groupby("anatom_site_general_challenge")["benign_malignant"].value_counts().unstack()
vc.iplot(kind='bar', yTitle='Count', 
          linecolor='black', 
          opacity=0.7,
          theme='pearl',
          bargap=0.2,
          gridcolor='white',
          barmode = 'stack',
          title='Target vs Anatomical site')
vc = train.groupby("anatom_site_general_challenge")["benign_malignant"].value_counts(normalize=True).unstack()
vc.iplot(kind='bar', yTitle='Percentage', 
          linecolor='black', 
          opacity=0.7,
          theme='pearl',
          bargap=0.2,
          gridcolor='white',
          barmode = 'stack',
          title='Target vs Anatomical site, normalized')

# Seeing images

In [None]:
def display_training_curves(training, validation, title, subplot):
    # Plot training vs. validation values of one metric in the given subplot slot (e.g. 211)
    if subplot % 10 == 1:  # set up the subplots on the first call
        plt.subplots(figsize=(10,10), facecolor='#F0F0F0')
        plt.tight_layout()
    ax = plt.subplot(subplot)
    ax.set_facecolor('#F8F8F8')
    ax.plot(training)
    ax.plot(validation)
    ax.set_title('model '+ title)
    ax.set_ylabel(title)
    ax.set_xlabel('epoch')
    ax.legend(['train', 'valid.'])

def grid_display(list_of_images, no_of_columns=2, figsize=(15,15), title = False):
    num_images = len(list_of_images)
    no_of_rows = int(num_images / no_of_columns)
    fig, axes = plt.subplots(no_of_rows,no_of_columns, figsize=figsize)
    if no_of_rows == 1:
        list_axes = []
        list_axes.append(axes)
        axes = list_axes
    
    idx = 0
    idy = 0
    
    for i, img in enumerate(list_of_images):
        axes[idy][idx].imshow(img)
        axes[idy][idx].axis('off')
        if title:
            axes[idy][idx].set_title(title[i])
            
        if idx < no_of_columns - 1:
            idx+=1
        else:
            idx=0
            idy+=1
    fig.tight_layout()
    return fig
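
`grid_display` computes the row count by truncating division and tracks `idx` / `idy` by hand. That works when the number of images is a multiple of the column count, as in the cells below, but it appears to run out of rows otherwise. The sketch below shows one possible alternative for the index arithmetic only, using `math.ceil` and `divmod`. The helper names `grid_shape` and `grid_position` are hypothetical and not part of the original kernel.

```python
import math

def grid_shape(num_images, no_of_columns):
    """Rows and columns needed to fit every image, rounding rows up."""
    return math.ceil(num_images / no_of_columns), no_of_columns

def grid_position(i, no_of_columns):
    """(row, column) of the i-th image when filling the grid row by row."""
    return divmod(i, no_of_columns)

# 16 images in 4 columns fill a 4x4 grid; 5 images need a second row.
print(grid_shape(16, 4))
print(grid_shape(5, 4))
print([grid_position(i, 4) for i in range(5)])
```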

## Benign, Moles, Nevus

In [None]:
image_list = train[train['target'] == 0].sample(16)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
#show_images(image_all, cols=1)
fig = grid_display(image_all, 4, (15,15), title = range(16))

## Melanoma

In [None]:
image_list = train[train['target'] == 1].sample(16)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
fig = grid_display(image_all, 4, (15,15), title = range(16))

## Melanoma, torso

In [None]:
image_list = train[(train['anatom_site_general_challenge'] == 'torso') & (train['target'] == 1)].sample(16)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
fig = grid_display(image_all, 4, (15,15), title = range(16))

## Melanoma, Lower extremities

In [None]:
image_list = train[(train['anatom_site_general_challenge'] == 'lower extremity') & (train["target"] == 1)].sample(16)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
fig = grid_display(image_all, 4, (15,15), title = range(16))

## Melanoma, Upper extremity.

In [None]:
image_list = train[(train['anatom_site_general_challenge'] == 'upper extremity') & (train["target"] == 1)].sample(16)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
fig = grid_display(image_all, 4, (15,15), title = range(16))

## Melanoma, head/neck

In [None]:
image_list = train[(train['anatom_site_general_challenge'] == 'head/neck') & (train["target"] == 1)].sample(16)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
fig = grid_display(image_all, 4, (15,15), title = range(16))

## Melanoma, palms/soles

In [None]:
image_list = train[(train['anatom_site_general_challenge'] == 'palms/soles') & (train["target"] == 1)].sample(4)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
fig = grid_display(image_all, 4, (15,15), title = range(4))

## Benign, seborrheic keratosis

In [None]:
image_list = train[train['diagnosis'] == 'seborrheic keratosis'].sample(16)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
fig = grid_display(image_all, 4, (15,15), title = range(16))

## Benign, Lentigo

In [None]:
image_list = train[train['diagnosis'] == 'lentigo NOS'].sample(16)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
fig = grid_display(image_all, 4, (15,15), title = range(16))

## Benign, lichenoid keratosis...

In [None]:
image_list = train[train['diagnosis'] == 'lichenoid keratosis'].sample(16)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
fig = grid_display(image_all, 4, (15,15), title = range(16))

## Benign, Solar lentigo

In [None]:
image_list = train[train['diagnosis'] == 'solar lentigo'].sample(4)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
fig = grid_display(image_all, 4, (15,15), title = range(4))

## Atypical melanocytic proliferation

In [None]:
image_list = train[train['diagnosis'] == 'atypical melanocytic proliferation'].sample(1)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))

plt.imshow(img)
plt.axis('off');

## Benign, cafe-au-lait macule

In [None]:
image_list = train[train['diagnosis'] == 'cafe-au-lait macule'].sample(1)['image_name']
image_all=[]
for image_id in image_list:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    image_all.append(img)
plt.imshow(img)
plt.axis('off');

## Melanoma at different age groups

In [None]:
arr = [15.0,20.0,25.0,30.0,35.0,40.0,45.0,50.0,55.0,60.0,65.0,70.0,75.0,80.0,85.0,90.0]
image_all=[]
titles = ['At Age 15.0','At Age 20.0','At Age 25.0','At Age 30.0','At Age 35.0','At Age 40.0'
          ,'At Age 45.0','At Age 50.0','At Age 55.0','At Age 60.0','At Age 65.0','At Age 70.0'
          ,'At Age 75.0','At Age 80.0','At Age 85.0','At Age 90.0']
for i in arr:
    image_list = train[(train['age_approx'] == i) & (train["target"] == 1)].sample()['image_name']
    for image_id in image_list:
        image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
        img = np.array(Image.open(image_file))
        image_all.append(img)
fig = grid_display(image_all, 4, (15,15), title = titles)

# Is there a difference in histograms?

In [None]:
benign_images = train[train["target"] == 0].sample(10)["image_name"]
cancer_images = train[train["target"] == 1].sample(10)["image_name"]

benign_image_arr = []
cancer_image_arr = []

for image_id in benign_images:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    benign_image_arr.append(img)
    
for image_id in cancer_images:
    image_file = f'/kaggle/input/siim-isic-melanoma-classification/jpeg/train/'+image_id+'.jpg' 
    img = np.array(Image.open(image_file))
    cancer_image_arr.append(img)

In [None]:
reds = np.hstack([v[:, :, 0].ravel() for v in benign_image_arr])
greens = np.hstack([v[:, :, 1].ravel() for v in benign_image_arr])
blues = np.hstack([v[:, :, 2].ravel() for v in benign_image_arr])

plt.figure(figsize=(15, 8))
_ = plt.hist(reds, bins=256, color='red', alpha=0.5)
_ = plt.hist(greens, bins=256, color='green', alpha=0.5)
_ = plt.hist(blues, bins=256, color='blue', alpha=0.5)

_ = plt.xlabel('Intensity Value')
_ = plt.ylabel('Count')
_ = plt.legend(['Red_Channel', 'Green_Channel', 'Blue_Channel'])

print("R: {:.2f}, G: {:.2f}, B: {:.2f}".format(reds.mean(), greens.mean(), blues.mean()))

plt.show()

In [None]:
# Channel means scaled to [0, 1]; needs reds/greens/blues from the cell above
reds.mean()/255, greens.mean()/255, blues.mean()/255

In [None]:
reds = np.hstack([v[:, :, 0].ravel() for v in cancer_image_arr])
greens = np.hstack([v[:, :, 1].ravel() for v in cancer_image_arr])
blues = np.hstack([v[:, :, 2].ravel() for v in cancer_image_arr])

plt.figure(figsize=(15, 8))
_ = plt.hist(reds, bins=256, color='red', alpha=0.5)
_ = plt.hist(greens, bins=256, color='green', alpha=0.5)
_ = plt.hist(blues, bins=256, color='blue', alpha=0.5)

_ = plt.xlabel('Intensity Value')
_ = plt.ylabel('Count')
_ = plt.legend(['Red_Channel', 'Green_Channel', 'Blue_Channel'])

print("R: {:.2f}, G: {:.2f}, B: {:.2f}".format(reds.mean(), greens.mean(), blues.mean()))

plt.show()
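
The cells above stack every pixel of every sampled image into one long array per channel, which gets memory-hungry as the sample grows. A minimal sketch of an alternative, assuming each image is an H x W x 3 RGB array with values in 0..255, accumulates per-channel sums instead and returns the means scaled to [0, 1]. The function name `channel_means` and the synthetic test images are hypothetical.

```python
import numpy as np

def channel_means(images):
    """Per-channel (R, G, B) mean over all pixels of all images, scaled to [0, 1]."""
    totals = np.zeros(3, dtype=np.float64)
    pixel_count = 0
    for img in images:
        img = np.asarray(img, dtype=np.float64)
        # Flatten to (num_pixels, 3) and add each channel's sum.
        totals += img[:, :, :3].reshape(-1, 3).sum(axis=0)
        pixel_count += img.shape[0] * img.shape[1]
    return totals / pixel_count / 255.0

# Two synthetic images of different sizes: pure red (4 px) and pure green (12 px).
red_img = np.zeros((2, 2, 3), dtype=np.uint8)
red_img[:, :, 0] = 255
green_img = np.zeros((2, 6, 3), dtype=np.uint8)
green_img[:, :, 1] = 255

means = channel_means([red_img, green_img])
print(means)
```

Because the mean is taken over pixels rather than over images, larger images weigh more, which matches what the `np.hstack` version computes.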

# Data Augmentations Showcase

In [None]:
# img = Image.open(train_path + 'ISIC_2637011.jpg')

# light = transforms.Compose([
#     transforms.RandomErasing()
#     ])


# fig, axes = plt.subplots(1,2, figsize=(12, 6))
# axes[0].imshow(img)
# axes[1].imshow(transforms.RandomErasing()(np.array(img)))
# # axes[1].imshow(Cutout(scale=(0.05, 0.007), value=(0, 0))(np.array(img)))

# axes[0].axis('off')
# axes[1].axis('off')
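
In [None]:
# The commented-out cell above experiments with random erasing. Below is a
# NumPy-only sketch of the idea: blank out one random rectangle of the image.
# It is a simplification, not torchvision's RandomErasing (which also samples
# an aspect ratio). The name `random_erase` and its default parameters are
# assumptions made for illustration.

```python
import numpy as np

def random_erase(image, scale=(0.02, 0.2), value=0, rng=None):
    """Return a copy of `image` with one random rectangle filled with `value`.

    `scale` is the (min, max) fraction of the image area to erase. The
    rectangle keeps the image's own aspect ratio to keep the sketch short.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    area_fraction = rng.uniform(scale[0], scale[1])
    # Scaling both sides by sqrt(fraction) erases about `fraction` of the area.
    erase_h = max(1, min(h, int(round(np.sqrt(area_fraction) * h))))
    erase_w = max(1, min(w, int(round(np.sqrt(area_fraction) * w))))
    # Pick the top-left corner so the rectangle stays inside the image.
    top = int(rng.integers(0, h - erase_h + 1))
    left = int(rng.integers(0, w - erase_w + 1))
    out = image.copy()
    out[top:top + erase_h, left:left + erase_w] = value
    return out

# Synthetic uniform grey image standing in for a lesion photo.
img = np.full((64, 64, 3), 200, dtype=np.uint8)
erased = random_erase(img, rng=np.random.default_rng(0))
erased_fraction = (erased == 0).all(axis=-1).mean()
print(erased.shape, erased_fraction)
```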