<div align="center">
<font size="6"> SIIM-ISIC Melanoma Classification  </font>  
</div> 


<div align="center">
<font size="4"> Identify melanoma in lesion images  </font>  
</div> 

<img align="left" src="https://raw.githubusercontent.com/kabartay/kaggle-siim-isic-melanoma-classification/master/materials/logo.png" data-canonical-src="https://raw.githubusercontent.com/kabartay/kaggle-siim-isic-melanoma-classification/master/materials/logo.png" width="280" height="280" />

Skin cancer is the most prevalent type of cancer. **Melanoma**, specifically, is responsible for **75%** of skin cancer deaths, despite being the least common skin cancer. The American Cancer Society estimates over 100,000 new melanoma cases will be diagnosed in 2020. It's also expected that almost 7,000 people will die from the disease. As with other cancers, early and accurate detection—potentially aided by data science—can make treatment more effective.

Currently, dermatologists evaluate every one of a patient's moles to identify outlier lesions or “ugly ducklings” that are most likely to be melanoma. Existing AI approaches have not adequately considered this clinical frame of reference. Dermatologists could enhance their diagnostic accuracy if detection algorithms took into account “contextual” images from the same patient when determining which images represent a melanoma. If successful, such classifiers would be more accurate and could better support clinical dermatology work.

As the leading healthcare organization for informatics in medical imaging, the [Society for Imaging Informatics in Medicine (SIIM)](https://siim.org/)'s mission is to advance medical imaging informatics through education, research, and innovation in a multi-disciplinary community. SIIM is joined by the [International Skin Imaging Collaboration (ISIC)](https://www.isic-archive.com/), an international effort to improve melanoma diagnosis. The ISIC Archive contains the largest publicly available collection of quality-controlled dermoscopic images of skin lesions.

In this competition, you’ll identify melanoma in images of skin lesions. In particular, you’ll use images within the same patient and determine which are likely to represent a melanoma. Using patient-level contextual information may help the development of image analysis tools, which could better support clinical dermatologists.
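
One way to picture the patient-level context described above is to compare each lesion with the other lesions of the same patient. The sketch below is illustrative only: the toy table, the `lesion_score` column, and the helper `add_patient_context` are hypothetical, and it assumes a `patient_id` column like the one in the competition's `train.csv`.

```python
import pandas as pd

def add_patient_context(df, value_col, group_col="patient_id"):
    """Add per-patient mean, std, and z-score of `value_col` (hypothetical helper)."""
    out = df.copy()
    grouped = out.groupby(group_col)[value_col]
    out["patient_mean"] = grouped.transform("mean")
    # Population std (ddof=0) so single-lesion patients get 0 instead of NaN
    out["patient_std"] = grouped.transform(lambda s: s.std(ddof=0))
    # Avoid division by zero: patients with no spread get a z-score of 0
    safe_std = out["patient_std"].replace(0, 1.0)
    out["patient_z"] = (out[value_col] - out["patient_mean"]) / safe_std
    return out

# Toy data: patient IP_A has one lesion that stands out from the rest
toy = pd.DataFrame({
    "patient_id": ["IP_A", "IP_A", "IP_A", "IP_A", "IP_B"],
    "lesion_score": [0.10, 0.12, 0.11, 0.90, 0.50],
})
context = add_patient_context(toy, "lesion_score")
print(context)
```

In this toy table the fourth lesion of patient `IP_A` gets the largest z-score, which is the "ugly duckling" idea in its simplest form. A real pipeline would more likely compare image embeddings or model predictions than a single made-up score.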

Melanoma is a deadly disease, but if caught early, most melanomas can be cured with minor surgery. Image analysis tools that automate the diagnosis of melanoma could improve dermatologists' diagnostic accuracy. Better detection of melanoma has the potential to benefit millions of people.

<img align="left" src="https://raw.githubusercontent.com/kabartay/kaggle-siim-isic-melanoma-classification/master/materials/melanoma.png" data-canonical-src="https://raw.githubusercontent.com/kabartay/kaggle-siim-isic-melanoma-classification/master/materials/melanoma.png" width="1200" height="450" />


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# Walk the input directory; printing every path is disabled because the listing is too long

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        # print(os.path.join(dirname, filename))  # uncomment to list every file
        continue

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import os
import json
from pathlib import Path

import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
data_path = Path('/kaggle/input/siim-isic-melanoma-classification/')
train_path = data_path / 'train'
test_path = data_path / 'test'
print("train_path:", train_path)
print("test_path: ", test_path)

In [None]:
!ls /kaggle/input/siim-isic-melanoma-classification/

## Images

In [None]:
#!ls /kaggle/input/siim-isic-melanoma-classification/jpeg/train

In [None]:
#!ls /kaggle/input/siim-isic-melanoma-classification/jpeg/test

In [None]:
data_path = Path('/kaggle/input/siim-isic-melanoma-classification/')
im_train_path = data_path / 'jpeg' / 'train'
im_test_path = data_path / 'jpeg' / 'test'
print("train_path: ", im_train_path)
print("test_path:  ", im_test_path)

In [None]:
import tensorflow as tf

# Use the public Keras API bundled with TensorFlow; the private `tensorflow.python.keras`
# path and `keras.utils.np_utils` are not reliably available across releases
from tensorflow.keras import models, regularizers, layers, optimizers, losses, metrics
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
from tensorflow.keras.preprocessing import image

In [None]:
def image_show(im_num, im_folder, im_size):
    """
    Show the JPEG image ISIC_<im_num> from the 'train' or 'test' folder,
    resized to im_size x im_size pixels.
    """
    im_ind = 'ISIC'
    im_name = '{}_{}'.format(im_ind, im_num)
    if im_folder == 'train':
        im_dir = im_train_path
    elif im_folder == 'test':
        im_dir = im_test_path
    else:
        # Fail early instead of hitting an undefined im_dir below
        raise ValueError("im_folder must be 'train' or 'test'")
    im_path = str(im_dir / (im_name + '.jpg'))

    img = image.load_img(im_path, target_size=(im_size, im_size))  # e.g. target_size=(224, 224)
    plt.imshow(img)
    print(im_ind, "Image Number:", im_num)
    plt.show()

In [None]:
plt.figure(figsize = (10,10))
image_show(5225336,'train',224)

In [None]:
plt.figure(figsize = (10,10))
image_show(5224960,'test',224)

In [None]:
!ls /kaggle/input/siim-isic-melanoma-classification/test/ISIC_5224960.dcm

In [None]:
import pydicom

# Collect the paths of all DICOM files under the competition directory
PathDicom = '/kaggle/input/siim-isic-melanoma-classification/'
lstFilesDCM = []
for dirName, subdirList, fileList in os.walk(PathDicom):
    for filename in fileList:
        if filename.lower().endswith(".dcm"):  # keep only DICOM files
            lstFilesDCM.append(os.path.join(dirName, filename))

In [None]:
print(lstFilesDCM[0])

In [None]:
RefDs = pydicom.dcmread(lstFilesDCM[0])
RefDs

In [None]:
# Get ref file
RefDs = pydicom.dcmread(lstFilesDCM[0])

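# Rows and columns of the reference image, plus the total number of DICOM files found.
# Each file is a separate 2D lesion image, so the third value is a file count, not slices along a Z axis
ConstPixelDims = (int(RefDs.Rows), int(RefDs.Columns), len(lstFilesDCM))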
print(ConstPixelDims)

In [None]:
pat_name = RefDs.PatientName
display_name = pat_name.family_name + ", " + pat_name.given_name
print("Patient's name...:", display_name)
print("Patient id.......:", RefDs.PatientID)
print("Modality.........:", RefDs.Modality)
print("Study Date.......:", RefDs.StudyDate)

In [None]:
if 'PixelData' in RefDs:
    rows = int(RefDs.Rows)
    cols = int(RefDs.Columns)
    print("Image size.......: {rows:d} x {cols:d}, {size:d} bytes".format(rows=rows, cols=cols, size=len(RefDs.PixelData)))
    if 'PixelSpacing' in RefDs:
        print("Pixel spacing....:", RefDs.PixelSpacing) 

        
# use .get() if not sure the item exists, and want a default value if missing
print("Slice location...:", RefDs.get('SliceLocation', "(missing)"))

# plot the image using matplotlib
plt.figure(figsize = (10,10))
plt.imshow(RefDs.pixel_array, cmap=plt.cm.bone)
plt.show()

## Explore tables

In [None]:
train = pd.read_csv(data_path / 'train.csv')
test  = pd.read_csv(data_path / 'test.csv')
sub   = pd.read_csv(data_path / 'sample_submission.csv')

train.shape, test.shape, sub.shape

In [None]:
train.isna().sum()

In [None]:
# Mark missing values explicitly: 'na' for categorical columns, 0 for the approximate age
train['sex'] = train['sex'].fillna('na')
train['age_approx'] = train['age_approx'].fillna(0)
train['anatom_site_general_challenge'] = train['anatom_site_general_challenge'].fillna('na')

In [None]:
train.isna().sum()

In [None]:
train.head(10)

In [None]:
test.isna().sum()

In [None]:
test['anatom_site_general_challenge'] = test['anatom_site_general_challenge'].fillna('na')

In [None]:
test.isna().sum()

In [None]:
test.head(10)

In [None]:
train['sex'].value_counts().plot(kind='bar')

In [None]:
test['sex'].value_counts().plot(kind='bar')

In [None]:
train['sex'].isna().sum()

In [None]:
train['age_approx'].value_counts().plot(kind='bar')

In [None]:
test['age_approx'].value_counts().plot(kind='bar')

In [None]:
train['diagnosis'].value_counts().plot(kind='bar')

## Melanoma is rare, <2%
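
With so few positive cases, plain accuracy is uninformative: always predicting "benign" would already score above 98%. It can therefore help to quantify the imbalance and weight the classes. The following is a minimal sketch, assuming a binary `target` column like the one in `train.csv`. The toy labels and the helper name `balanced_class_weights` are hypothetical, and the formula is the common `n_samples / (n_classes * class_count)` heuristic, not something prescribed by the competition.

```python
import pandas as pd

def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} (hypothetical helper)."""
    counts = labels.value_counts()
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Toy labels: 2 positives out of 100, roughly mimicking a ~2% positive rate
toy_target = pd.Series([1] * 2 + [0] * 98, name="target")
positive_rate = toy_target.mean()
weights = balanced_class_weights(toy_target)
print(f"positive rate: {positive_rate:.2%}")
print(weights)
```

A dictionary like this could be passed as the `class_weight` argument of Keras `model.fit`, so that errors on the rare class count more during training.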

In [None]:
train['diagnosis'].value_counts()

In [None]:
print('Diagnosis                             Percent\n-----------------------------------------------')
print(train['diagnosis'].value_counts(normalize=True) * 100)

## All distributions in one figure, saved to disk

In [None]:
fig, axs = plt.subplots(4,2, figsize=(13,20))

# left train, right test

train['sex'].value_counts().plot(kind='bar', legend=True, ax=axs[0,0])
test['sex'].value_counts().plot(kind='bar', legend=True, ax=axs[0,1])

train['age_approx'].value_counts().plot(kind='bar', legend=True, ax=axs[1,0])
test['age_approx'].value_counts().plot(kind='bar', legend=True, ax=axs[1,1])

train['age_approx'].hist(bins=90, ax=axs[2,0])
test['age_approx'].hist(bins=90, ax=axs[2,1])
axs[2,0].set_xlabel('Age')
axs[2,1].set_xlabel('Age')

train['anatom_site_general_challenge'].value_counts().plot(kind='bar', legend=True, ax=axs[3,0])
test['anatom_site_general_challenge'].value_counts().plot(kind='bar', legend=True, ax=axs[3,1])


plt.savefig('data_sex_age_anatom.png',dpi=100)

plt.show()