![](https://i.ytimg.com/vi/CF24dVuQImU/maxresdefault_live.jpg)

#### This kernel covers basic exploratory data analysis and augmentations. Please give it an upvote if you like this notebook, as this is my first exercise in deep learning.

#### References

* https://www.kaggle.com/nxrprime/siim-d3-eda-augmentations-and-resnext#seven
* https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#define_the_model_and_metrics
* https://www.kaggle.com/parulpandey/melanoma-classification-eda-starter

# What is Melanoma?

* Melanoma is a type of skin cancer that occurs when pigment producing cells called melanocytes mutate and begin to divide uncontrollably.
* Most pigment cells develop in the skin. Melanomas can develop anywhere on the skin, but certain areas are more at risk than others. 
* In men, it is most likely to affect the chest and back. In women, the legs are the most common site. Other common sites of melanoma include the face.
* However, melanoma can also occur in the eyes and other parts of the body, including — on very rare occasions — the intestines.
* When this happens, it can be difficult to treat, and the outlook may be poor. 
* Risk factors for melanoma include overexposure to the sun, having fair skin, and a family history of melanoma, among others.

## Risk Factors

* Research into the exact causes of melanoma is ongoing.
* However, scientists do know that people with certain skin types are more prone to developing melanoma.

**The following factors may also contribute to an increased risk of skin cancer:**

1. A high density of freckles or a tendency to develop freckles following exposure to the sun.

2. A high number of moles or five or more atypical moles

3. The presence of actinic lentigines, also known as liver spots or age spots

4. Pale skin that does not tan easily and tends to burn.

5. Light eyes, red or light hair

6. High sun exposure, particularly if it produces blistering sunburn and if the exposure is intermittent rather than regular

7. Older age

8. Family or personal history of melanoma


## ABCDE examination

* The ABCDE examination of moles is an important method for revealing potentially cancerous lesions. 
* It describes five simple characteristics to check for in a mole (asymmetry, border, color, diameter and evolving) that can help a person either confirm or rule out melanoma:

![](https://discoverplasticsurgery.com/wp-content/uploads/2018/09/melanoma-risk-factors.jpg)

## Objective

* To identify melanoma in images of skin lesions. 
* In particular, you’ll use images within the same patient and determine which are likely to represent a melanoma. 
* Using patient-level contextual information may help the development of image analysis tools, which could better support clinical dermatologists.

## Dataset Info

* The dataset contains 33,126 dermoscopic training images of unique benign and malignant skin lesions from over 2,000 patients.
* Each image is associated with one of these individuals using a unique patient identifier.
* All malignant diagnoses have been confirmed via histopathology, and benign diagnoses have been confirmed using either expert agreement, longitudinal follow-up, or histopathology.

## Images

* The images are provided in three formats.

1) DICOM (Digital Imaging and Communications in Medicine) is the international standard to transmit, store, retrieve, print, process, and display medical imaging information. This can be accessed using libraries like pydicom.

2) JPEG

3) TFRecord

# Import Libraries

In [None]:
import os

from os import listdir  #returns a list that gives the names of the entries in the directory
from os.path import isfile,join

import pandas as pd
import numpy as np
import math  #stdlib math; the numpy.math alias is deprecated
import seaborn as sns
sns.set(style='darkgrid')
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')
plt.show()

#Plotly
import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import iplot
import cufflinks
cufflinks.go_offline()
cufflinks.set_config_file(world_readable=True,theme='pearl')

#To read a dicom image , we can use pydicom
import pydicom

#Disable warnings
import warnings
warnings.filterwarnings("ignore")

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

from tensorflow.keras.applications import ResNet50
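#Import from tensorflow.keras throughout so the layers and models come from the same Keras build
from tensorflow.keras.models import Sequential, Model,load_model
from tensorflow.keras.layers import Flatten,Dense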

In [None]:
DEVICE = 'GPU'

if DEVICE == "GPU":
    print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

## List directories

In [None]:
#The simplest way to get a list of entries in a directory is to use os.listdir()
#Pass in the directory you need the entries
os.listdir('../input/siim-isic-melanoma-classification')

* We have two CSV files, train.csv and test.csv
* Two folders of DICOM images, train and test
* A folder of TFRecord files
* A jpeg folder with train and test subfolders
* And a sample submission file

# Csv files

In [None]:
train = pd.read_csv('../input/siim-isic-melanoma-classification/train.csv')
test = pd.read_csv('../input/siim-isic-melanoma-classification/test.csv')

In [None]:
train.head()

In [None]:
test.head()

In [None]:
train.shape

* 33126 Images and 8 columns

## Columns

* **image_name** - unique identifier, points to filename of related DICOM image
* **patient_id** - unique patient identifier
* **sex** - the sex of the patient (when unknown, will be blank)
* **age_approx** - approximate patient age at time of imaging
* **anatom_site_general_challenge** - location of imaged site
* **diagnosis** - detailed diagnosis information (train only)
* **benign_malignant** - indicator of malignancy of imaged lesion
* **target** - binarized version of the target variable

# Missing values

In [None]:
#Count the missing values in each column and put them in a dataframe
missing_df = (train.isna().sum()
              .rename_axis('Column_Name')
              .reset_index(name='Missing_Values'))

fig = px.bar(missing_df,x='Missing_Values',y='Column_Name',orientation='h',
             text='Missing_Values',title='Missing values in train dataset')
fig.update_traces(textposition='outside')
fig.show()

#Same thing for test file
missing_df = (test.isna().sum()
              .rename_axis('Column_Name')
              .reset_index(name='Missing_Values'))

fig = px.bar(missing_df,x='Missing_Values',y='Column_Name',orientation='h',
             text='Missing_Values',title='Missing values in test dataset')
fig.update_traces(textposition='outside')
fig.show()

* Three columns in the train dataset have missing values: anatom_site_general_challenge, sex and age_approx.
* One column in the test dataset has missing values: anatom_site_general_challenge.

### Sex

* sex has only two unique values, male and female, which makes it difficult to impute its missing values.
* One method is to use the mode, which is male for this feature.
* Another is to relate it to other variables. We'll try this method and see if we can find something.

In [None]:
# We separate the non nan values and nan values in separate dataframe.

not_null_sex = train[train['sex'].notnull()].reset_index(drop=True)
nan_sex = train[train['sex'].isnull()].reset_index(drop=True)


In [None]:
not_null_sex.head()

In [None]:
fig = plt.figure(figsize=(15,6))

fig1 = sns.countplot(data=not_null_sex,hue='sex',x='anatom_site_general_challenge')

In [None]:
#Check the anatom site in missing values.

nan_sex['anatom_site_general_challenge'].unique()

* Sex and age are missing for two patients, ['IP_9835712', 'IP_5205991']. All rows with missing sex are benign, have diagnosis='unknown', and come from the anatomy sites listed above.
* Relating sex to the anatomy site shows no significant difference between the two sexes.
* So we go with the mode.

In [None]:
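#Impute missing values with the mode of sex
#Assign the result back; calling fillna(inplace=True) on a selected column is deprecated in recent pandas
train['sex'] = train['sex'].fillna(train['sex'].mode()[0])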

### Age

In [None]:
train['age_approx'].value_counts()

In [None]:
train['age_approx'].median()

* The mode is 45 and median is 50.
* It's best to use median to fill the missing values.

In [None]:
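#Impute missing values with the median
#Assign the result back; calling fillna(inplace=True) on a selected column is deprecated in recent pandas
train['age_approx'] = train['age_approx'].fillna(train['age_approx'].median())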

In [None]:
train['age_approx'].isna().sum()

### Anatom_site_general_challenge

* There are six anatomy sites in our data.
* The mode is torso, with 16845 values.
* There are 527 missing values in anatom_site_general_challenge.
* With more than 500 missing values in this feature, I will add another category, 'NK' (not known), as we can't predict what the anatomy site will be for the patient.
* The test dataset also has one column with missing values, anatom_site_general_challenge.
* As we fill 'NK' in place of the missing values in the training dataset, we'll do the same in the test dataset.

In [None]:
train['anatom_site_general_challenge'].value_counts()

In [None]:
train['anatom_site_general_challenge'] = train['anatom_site_general_challenge'].fillna('NK')
test['anatom_site_general_challenge'] = test['anatom_site_general_challenge'].fillna('NK')


### Let's check if there are any missing values left

In [None]:
print('Train : {}'.format(train.isna().sum().sum()))
print('Test : {}'.format(test.isna().sum().sum()))
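The three imputation choices made in this section (mode for sex, median for age_approx, an explicit 'NK' category for the anatomy site) can be sketched without pandas. This is only an illustration: the records below are made up, and the `observed` helper is a hypothetical name, not part of the notebook.

```python
from statistics import median, mode

# Made-up records for illustration only; None marks a missing value.
records = [
    {"sex": "male", "age_approx": 45.0, "site": "torso"},
    {"sex": "female", "age_approx": 50.0, "site": None},
    {"sex": None, "age_approx": None, "site": "lower extremity"},
    {"sex": "male", "age_approx": 60.0, "site": "torso"},
]

def observed(key):
    """Return the non-missing values of one field."""
    return [r[key] for r in records if r[key] is not None]

sex_fill = mode(observed("sex"))            # most frequent category
age_fill = median(observed("age_approx"))   # middle value, robust to outliers

imputed = [
    {
        "sex": r["sex"] if r["sex"] is not None else sex_fill,
        "age_approx": r["age_approx"] if r["age_approx"] is not None else age_fill,
        "site": r["site"] if r["site"] is not None else "NK",  # explicit "not known" category
    }
    for r in records
]

print(imputed[2])
```

Keeping 'NK' as its own category avoids guessing a site for several hundred rows, at the cost of one extra level in the feature.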

# EDA on the above features

## First, the target feature

**Our target feature has two categories** 
* **`Benign`**

* **`Malignant`**

<img src=https://chcsga.org/wp-content/uploads/2019/05/d.jpg width="500">

In [None]:
fig=plt.figure(figsize=(15,8))

labels = 'Benign','Malignant'

benign = train[train['benign_malignant']=='benign']
malignant = train[train['benign_malignant']=='malignant']
sizes = [len(benign),len(malignant)]

colors= ['lightskyblue','red']
#Plot
plt.pie(sizes,labels=labels,colors=colors,autopct='%1.1f%%',shadow=True,startangle=140)

plt.axis('equal');


* We have far more benign cases than malignant ones.
* About 98.2% of the cases in the train dataset are benign and only 1.8% are malignant.
* The classes are clearly imbalanced.
* We need to keep this in mind while building the model.
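To see why this imbalance matters for evaluation, here is a small sketch on toy labels that mirror the 98.2% / 1.8% split (not the real data). It compares accuracy with ROC AUC for a "model" that always answers benign. The `roc_auc` function is a hand-written pairwise version for illustration, not a library call.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels mirroring the reported class split: 982 benign (0), 18 malignant (1).
y_true = [0] * 982 + [1] * 18
always_benign = [0] * len(y_true)        # hard predictions
constant_scores = [0.0] * len(y_true)    # the same score for every image

print("accuracy:", accuracy(y_true, always_benign))
print("auc:", roc_auc(y_true, constant_scores))
```

The constant predictor reaches about 98% accuracy while its AUC stays at chance level, so accuracy on its own says little on this dataset.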

### How many patients are there in the dataset?

In [None]:
print("There are {} patients in our dataset.".format(train['patient_id'].nunique()))
print("And there are total {} dicom images in the same dataset.".format(train['image_name'].nunique()))

In [None]:
# Group by patient id and count the images for each patient

x = train.groupby(['patient_id'],as_index=False)['image_name'].count()
x.sort_values(by="image_name",ascending=False)

#### So the maximum number of images for a patient is 115 and the minimum is 2

### Sex

In [None]:
x = train.groupby(['sex'],as_index=False)['benign_malignant'].count()
x = x.set_index('sex')
x

In [None]:
sns.countplot(data=train,x='sex',hue='benign_malignant');

In [None]:
# In the test dataset

sns.countplot(data=test,x='sex');

* In both the train and test datasets, there are more males than females.
* Relating sex to the target in the train dataset, benign cases outnumber malignant ones for both sexes.

## Age

<img src=https://www.cancerresearchuk.org/sites/default/files/cancer-stats/cases_crude_mf_allcancer_i17/cases_crude_mf_allcancer_i17.png width="1000" height="700">

* The risk of melanoma increases as people age.
* The average age of people when it is diagnosed is 65.

In [None]:
def create_dist(df,title):
    fig = plt.figure(figsize=(15,6))

    #Share of rows at each age; explicit column names keep this independent of the pandas version
    x = df["age_approx"].value_counts(normalize=True).rename_axis('age').reset_index(name='proportion')
    ax = sns.barplot(data=x,y='proportion',x='age')
    ax.set(xlabel='Age', ylabel='Proportion')
    ax.set(title=title);

In [None]:
create_dist(train,"Age distribution in train dataset")

In [None]:
create_dist(test,"Age distribution in test dataset")

In [None]:
fig = plt.figure(figsize=(15,6))

ax = sns.countplot(data=train,x='age_approx',hue='benign_malignant');
ax.set(title='Age vs Target');

* In the train dataset, age roughly follows a Gaussian distribution; in the test dataset it does not.
* In both datasets, most patients are middle-aged.
* In the plot, malignant cases are not visible up to age 40 and show up from age 45 to 75.

## Anatomy sites

In [None]:
fig = px.histogram(train,y='anatom_site_general_challenge',height=500,width=800,color_discrete_sequence=['indianred'])
fig.show()

fig = px.histogram(train,x='anatom_site_general_challenge',color='benign_malignant',barmode='group',height=500,width=800)
fig.show()

* Most of the cases in the dataset are on the torso, followed by the extremities (upper and lower).
* There are few cases on the palms/soles and oral/genital sites.
* Only four locations show a visible number of malignant cases in the plot, although the number is small (torso, upper extremity, lower extremity and head/neck).

In [None]:
fig = px.histogram(test,y='anatom_site_general_challenge',height=500,width=800,color_discrete_sequence=['indianred'],title='Similar case in test dataset too')
fig.show()

## Skin lesions

In [None]:
fig = px.histogram(train,y='diagnosis',height=500,width=800,color_discrete_sequence=['goldenrod'],title='Diagnoses skin lesions')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

In [None]:
fig = px.histogram(train,x='diagnosis',color='benign_malignant',barmode='group',height=500,width=800)
fig.show()

So the diagnosis column has 9 distinct values, counting 'unknown', and not all of them are cancerous.

* **Nevus** :- A common pigmented skin lesion, usually developing during adulthood. In most cases, a nevus is benign and doesn't require treatment. Rarely, they turn into melanoma or other skin cancers. A nevus that changes shape, grows bigger or darkens should be evaluated for removal.

* **Melanoma** :- We already know what this is.

* **Seborrheic keratosis** :- A non-cancerous skin condition that appears as a waxy brown, black or tan growth. A seborrheic keratosis is one of the most common non-cancerous skin growths in older adults.

* **Lentigo NOS** :- A type of flat, benign pigmented spot that appears on your trunk, arms, and legs. Lentigo often starts at birth or during childhood. The spots can go away in time.

* **Lichenoid keratosis** :- Lichenoid keratosis is a skin condition that typically occurs as a single, small, raised plaque, thickened area, or papule. This condition is harmless. However, in some cases lichenoid keratosis can be mistaken for other kinds of skin conditions, including skin cancers.

* **Solar lentigo** :- Solar lentigo is caused by exposure to ultraviolet radiation from the sun. This type is common in people over age 40, but younger people can get it, too. It happens when UV radiation causes pigmented cells called melanocytes in the skin to multiply. Solar lentigo appears on sun-exposed areas of the body, like the face, hands, shoulders, and arms. The spots may grow over time.

* **Cafe-au-lait macule** :- A café-au-lait macule is a common birthmark, presenting as a hyperpigmented skin patch with a sharp border and diameter of > 0.5 cm.

* **Atypical melanocytic proliferation** :- Atypical Melanocytic lesions are irregular moles and skin spots that require further examination. The five visual characteristics used to identify an atypical melanocytic lesion are the same as the characteristics used to identify signs of invasive melanoma.

# Let's look at some images

In [None]:
#List the DICOM files in the train and test image folders
train_images_dir = '../input/siim-isic-melanoma-classification/train/'
train_images = listdir(train_images_dir)

test_images_dir = '../input/siim-isic-melanoma-classification/test/'
test_images = listdir(test_images_dir)

In [None]:
#Define a function to plot the first 10 images of a dataframe using pydicom

def plot_images(df):
    fig = plt.figure(figsize=(15,6))

    #enumerate from 1 because subplot positions are 1-based
    for i,image in enumerate(df['image_name'].head(10),start=1):
        ds = pydicom.dcmread(train_images_dir+image+'.dcm')
        fig.add_subplot(2,5,i)
        plt.imshow(ds.pixel_array)
    

In [None]:
#We sample 11 rows from train dataset
random = train.sample(n=11)
random = random.reset_index(drop=True)

#Plot the images
plot_images(random)

* The images have different sizes and lighting conditions and show different body parts.
* All of this needs to be considered when building the model.
* We need to perform scaling, resizing and some data augmentation.
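As a rough picture of what scaling and flip augmentation do to pixel values, here is a tiny sketch on a made-up 2x3 grayscale "image" stored as nested lists. It is only an illustration under that assumption; resizing is left out, and the function names are hypothetical rather than taken from any library.

```python
def rescale(image):
    """Map 0-255 pixel values to the 0-1 range."""
    return [[pixel / 255 for pixel in row] for row in image]

def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def vertical_flip(image):
    """Mirror the image top to bottom."""
    return image[::-1]

# A made-up 2x3 grayscale image; real images have three colour channels.
image = [[0, 128, 255],
         [64, 32, 16]]

print(horizontal_flip(image))
print(vertical_flip(image))
print(rescale(image)[0])
```

Flips keep the lesion itself unchanged, which is why they are a common first choice for skin images where orientation carries no meaning.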

#### As we are predicting benign and malignant cases, let's look at these images.

## Benign images

In [None]:
#Similarly, we sample random benign images
random = train[train['benign_malignant']=='benign'].sample(n=11)
random = random.reset_index(drop=True)

plot_images(random)

* In benign images, the lesion is concentrated and not spread out as in melanoma.
* Also, the diameter is smaller.

## Malignant images

In [None]:
#Similarly, we sample random malignant images
random = train[train['benign_malignant']=='malignant'].sample(n=11)
random = random.reset_index(drop=True)

plot_images(random)

* We can see the irregularities and changes in shape in the above images.
* Also note the diameter and asymmetry.
* However, it is difficult to classify the lesions just by looking at the images.

Skin lesions grow on various parts of the body.
The anatom_site_general_challenge feature gives six locations where images were taken.
Let's look at each site and study its benign and malignant images.

In [None]:
#define a function for plotting anatomy sites

def plot_anatomy(target,anatomy_site):
    #Filter once, then take up to four images so sparse sites do not raise a KeyError
    subset = train[(train['anatom_site_general_challenge']==anatomy_site) &
                   (train['benign_malignant']==target)]

    fig = plt.figure(figsize=(15,6))
    for i,image in enumerate(subset['image_name'].head(4)):
        ds = pydicom.dcmread(train_images_dir+image+'.dcm')
        fig.add_subplot(2,4,i+1)
        plt.imshow(ds.pixel_array)
        plt.title(target)
    plt.suptitle(anatomy_site)

In [None]:
plot_anatomy('benign','head/neck')
plot_anatomy('malignant','head/neck')

In [None]:
plot_anatomy('benign','upper extremity')
plot_anatomy('malignant','upper extremity')

In [None]:
plot_anatomy('benign','lower extremity')
plot_anatomy('malignant','lower extremity')

In [None]:
plot_anatomy('benign','torso')
plot_anatomy('malignant','torso')

In [None]:
plot_anatomy('benign','palms/soles')
plot_anatomy('malignant','palms/soles')

In [None]:
plot_anatomy('benign','oral/genital')
plot_anatomy('malignant','oral/genital')

* Malignant images show some irregularities as well as changes in shape and diameter.
* The difference can be clearly seen in the torso, palms/soles, oral/genital and extremity sites.


## Different diagnoses of skin lesions

![](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Pie_chart_of_incidence_and_malignancy_of_pigmented_skin_lesions.png/800px-Pie_chart_of_incidence_and_malignancy_of_pigmented_skin_lesions.png)

* From the above image, we see that nevus, keratosis and lentigo are all non-cancerous.
* Melanoma is shown in red and is malignant.
* We'll study some of these lesions below for better understanding by looking at the images.

In [None]:
def plot_diagnosis(skin_lesion):
    fig = plt.figure(figsize=(12,6))

    #Filter once and plot the first six images of this diagnosis
    images = train[train['diagnosis']==skin_lesion]['image_name'].head(6)
    for i,image in enumerate(images):
        ds = pydicom.dcmread(train_images_dir+image+'.dcm')
        fig.add_subplot(2,3,i+1)
        plt.imshow(ds.pixel_array)
    plt.suptitle(skin_lesion.upper())

In [None]:
plot_diagnosis('nevus')

* In the nevus images, the moles are concentrated and not spread out, showing no signs of malignancy.

In [None]:
plot_diagnosis('melanoma')

* Melanoma, as we all know, is dangerous. The skin lesion is spread out, and we can see the colour, the shape and the irregularity.

In [None]:
plot_diagnosis('seborrheic keratosis')

* Seborrheic keratosis is a noncancerous condition that can look a lot like melanoma.
* The growths look waxy as if they are painted onto the body.
* These do not typically cause symptoms, but some people dislike the way they look.

In [None]:
plot_diagnosis('lentigo NOS')

In [None]:
plot_diagnosis('lichenoid keratosis')

* It looks like scaly, dry patches on the skin.
* Almost 90 percent of people with lichenoid keratosis will have just one lesion or spot on the skin.

In [None]:
plot_diagnosis('solar lentigo')

* Solar lentigo appears on sun-exposed areas of the body, like the face, hands, shoulders, and arms. 
* The spots may grow over time. Solar lentigines are sometimes called liver spots or age spots.

#### There is only one image for ('cafe-au-lait macule') and ('atypical melanocytic proliferation')

In [None]:
fig = plt.figure(figsize=(10,6))

image = train[train['diagnosis']=='cafe-au-lait macule'].reset_index(drop=True)['image_name'][0]
ds = pydicom.dcmread(train_images_dir+image+'.dcm')
fig.add_subplot(1,2,1)
plt.imshow(ds.pixel_array)
plt.title('cafe-au-lait macule'.upper())

image = train[train['diagnosis']=='atypical melanocytic proliferation'].reset_index(drop=True)['image_name'][0]
ds = pydicom.dcmread(train_images_dir+image+'.dcm')
fig.add_subplot(1,2,2)
plt.imshow(ds.pixel_array)
plt.title('atypical melanocytic proliferation'.upper());


This is all for the EDA right now. The following model building is a work in progress and I'll be updating it soon.

# Model Building

## Using RepeatedStratifiedKFold

* Our target data is highly imbalanced, i.e. 98.2% benign and 1.8% malignant.
* We can use stratified cross validation here for imbalanced data classification.
* Stratification ensures that the proportion of benign to malignant samples found in the original distribution is respected in all the folds. Plain RepeatedKFold does not stratify, so we use RepeatedStratifiedKFold.

In [None]:
#Import
from sklearn.model_selection import RepeatedStratifiedKFold


In [None]:
def load_data_kfold(k):
    #Copy so that adding the file extension does not modify a view of train
    train_x = train[['image_name','target']].copy()
    train_x['image_name'] = train_x['image_name'].apply(lambda x: x + '.jpg')
    #Stratify on the target so every fold keeps the benign/malignant ratio
    splitter = RepeatedStratifiedKFold(n_splits=k, n_repeats=1, random_state=0)
    folds = list(splitter.split(train_x, train_x['target']))
    
    return folds,train_x

k = 3
folds,train_x = load_data_kfold(k)



In [None]:
folds
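To make the idea of keeping the class proportion in every fold concrete, here is a hand-rolled sketch that deals the indices of each class round-robin across the folds. It is a simplified illustration on toy labels, not scikit-learn's implementation, and `stratified_folds` is a hypothetical helper name.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each index to one of k validation folds, class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)  # round-robin within the class
    return folds

# Toy target: 982 benign (0) and 18 malignant (1), not the real data.
labels = [0] * 982 + [1] * 18
for fold in stratified_folds(labels, k=3):
    positives = sum(labels[i] for i in fold)
    print(len(fold), positives)
```

In scikit-learn, `StratifiedKFold` and `RepeatedStratifiedKFold` provide this behaviour, while `KFold` and `RepeatedKFold` split without looking at the labels.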

## ResNet50 Model

In [None]:
'''METRICS = [
      tf.keras.metrics.TruePositives(name='tp'),
      tf.keras.metrics.FalsePositives(name='fp'),
      tf.keras.metrics.TrueNegatives(name='tn'),
      tf.keras.metrics.FalseNegatives(name='fn'), 
      tf.keras.metrics.BinaryAccuracy(name='accuracy'),
      tf.keras.metrics.Precision(name='precision'),
      tf.keras.metrics.Recall(name='recall'),
      tf.keras.metrics.AUC(name='auc'),
]'''

def get_model():
    model =ResNet50(weights='imagenet',include_top=False,input_shape=(224,224,3))

    #Freeze the pretrained backbone
    for layer in model.layers:
        layer.trainable = False 

    x=Flatten()(model.output)
    #One sigmoid unit gives the probability of malignancy; softmax over a single unit always outputs 1
    output=Dense(1,activation='sigmoid')(x)

    model = Model(model.input,output)
    
    model.compile(
    'Adam',
    loss='binary_crossentropy',
    metrics=['accuracy',tf.keras.metrics.AUC(name='auc')],
    )
    
    return model
    

# Model Summary

In [None]:
model = get_model()
model.summary()



## Train model on each fold

### Initial parameters

In [None]:
class Config:
    BATCH_SIZE = 64
    EPOCHS = 10
    HEIGHT = 224
    WIDTH = 224

In [None]:
for j, (train_idx, val_idx) in enumerate(folds):
    
    print('\nFold ',j)
    print('///////////////////////////////////')
    X_train_cv = train_x.iloc[train_idx]
    #y_train_cv = y_train[train_idx]
    X_valid_cv = train_x.iloc[val_idx]
    #y_valid_cv= y_train[val_idx]
    
    #name_weights = "final_model_fold" + str(j) + "_weights.h5"
    #callbacks = get_callbacks(name_weights = name_weights, patience_lr=10)

    train_datagen=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255, 
                         rotation_range=360,
                         horizontal_flip=True,
                         vertical_flip=True)
    
    train_generator=train_datagen.flow_from_dataframe(
        dataframe=X_train_cv,
        directory='../input/siim-isic-melanoma-classification/jpeg/train/',
        x_col="image_name",
        y_col="target",
        class_mode="raw",
        batch_size=Config.BATCH_SIZE,
        target_size=(Config.HEIGHT, Config.WIDTH))

    validation_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

    valid_generator=validation_datagen.flow_from_dataframe(
        dataframe=X_valid_cv,
        directory='../input/siim-isic-melanoma-classification/jpeg/train/',
        x_col="image_name",
        y_col="target",
        class_mode="raw", 
        batch_size=Config.BATCH_SIZE,   
        target_size=(Config.HEIGHT, Config.WIDTH))
    
    model = get_model()
    
    #len() of a Keras generator is already the number of batches per epoch
    steps_per_epoch = len(train_generator)
    validation_steps = len(valid_generator)
    
    #fit accepts generators directly; fit_generator is deprecated in TF 2.x
    history = model.fit(train_generator,
                        steps_per_epoch=steps_per_epoch,
                        validation_data=valid_generator,
                        validation_steps=validation_steps,
                        epochs=Config.EPOCHS,
                        verbose=1)
    
    #print(model.evaluate(X_valid_cv['image_name'], X_valid_cv['target']))
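A note on steps per epoch: the length of a Keras data generator is already counted in batches, so it should not be divided by the batch size a second time. The arithmetic below is a sketch under assumed sizes (33,126 rows, 3 folds, batch size 64), and `steps_for` is a hypothetical helper.

```python
import math

def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

# Assumed sizes: 33,126 training rows split into 3 folds, batch size 64.
num_train = 33126 - 33126 // 3   # two thirds of the rows are used for training
batch_size = 64

steps = steps_for(num_train, batch_size)
shrunk = steps_for(steps, batch_size)  # dividing a batch count by the batch size again

print(num_train, steps, shrunk)
```

With these numbers a full pass needs 346 batches, while the double division leaves only 6, so each "epoch" would see under 2% of the training fold.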



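* We got a validation accuracy of about 96% after 3 folds. With 98.2% of the images being benign, accuracy alone says little, so AUC is a better metric to watch.
* The loss value was nan in that run. A likely cause is pairing a single-unit softmax output with sparse_categorical_crossentropy: softmax over one unit always returns 1, so a single sigmoid unit with binary_crossentropy is the usual choice for a binary target.
* The next step is prediction on the test dataset.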
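Here is a small numeric sketch of why the output activation matters for a single-unit head. The functions are plain-Python stand-ins written for illustration, not the Keras implementations.

```python
import math

def softmax(logits):
    """Normalise a list of logits into probabilities that sum to 1."""
    exps = [math.exp(z - max(logits)) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Squash one logit into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Log loss for one example, with clipping to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

for logit in (-3.0, 0.0, 3.0):
    # softmax over one value is always [1.0]; sigmoid follows the logit
    print(logit, softmax([logit]), round(sigmoid(logit), 4))

# For a malignant example (label 1), a confident correct score has the lower loss
print(binary_crossentropy(1, sigmoid(3.0)), binary_crossentropy(1, sigmoid(-3.0)))
```

Because the single-unit softmax output never changes, no gradient can move it, whereas the sigmoid output responds to the logit and gives the loss something to optimise.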

# Evaluation on test dataset

In [None]:
#Copy so that adding the file extension does not modify a view of test
test_x = test[['image_name']].copy()

test_x['image_name'] = test_x['image_name'].apply(lambda x: x + '.jpg')

In [None]:
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

test_generator = test_datagen.flow_from_dataframe(  
        dataframe=test_x,
        directory = '../input/siim-isic-melanoma-classification/jpeg/test/',
        x_col="image_name",
        batch_size=1,
        class_mode=None,
        shuffle=False,
        target_size=(Config.HEIGHT, Config.WIDTH),
        seed=0)




In [None]:
#predict accepts generators directly; predict_generator is deprecated in TF 2.x
preds = model.predict(test_generator,verbose=1)

In [None]:
#preds has shape (n, 1), so argmax would always return 0; threshold the score instead
predicted_class_indices = (preds.ravel() > 0.5).astype(int)

In [None]:
predicted_class_indices

In [None]:
len(preds)

In [None]:
len(predicted_class_indices)

# Creating Submission file

In [None]:
sub = pd.read_csv('/kaggle/input/siim-isic-melanoma-classification/sample_submission.csv')
sub

In [None]:
#The competition is scored on ROC AUC, so submit the predicted scores rather than hard labels
sub['target'] = preds.ravel()

In [None]:
sub.to_csv('submission.csv', index=False)

### Things to do

* Data Augmentations
* Training model with class weights

