# Data Preparation


This stage covers every operation needed to get the dataset ready for the modeling stage. The dataset is not yet ready for model training, so some data manipulation has to be performed on it first.

In [46]:
import pandas as pd
import os

In [47]:
df = pd.read_csv("../data/HAM10000_metadata.csv")

In [48]:
df.head()

Unnamed: 0,lesion_id,image_id,dx,dx_type,age,sex,localization
0,HAM_0000118,ISIC_0027419,bkl,histo,80.0,male,scalp
1,HAM_0000118,ISIC_0025030,bkl,histo,80.0,male,scalp
2,HAM_0002730,ISIC_0026769,bkl,histo,80.0,male,scalp
3,HAM_0002730,ISIC_0025661,bkl,histo,80.0,male,scalp
4,HAM_0001466,ISIC_0031633,bkl,histo,75.0,male,ear


## Lesion Name Abbreviations

In the metadata dataset, lesion names are abbreviated. The abbreviations are replaced with their full names so that the lesion types are easier to read in the analysis.

In [49]:
lessions = {
    'nv': 'Melanocytic nevi',
    'mel': 'Melanoma',
    'bkl': 'Benign keratosis-like lesions',
    'bcc': 'Basal cell carcinoma',
    'akiec': 'Actinic keratoses',
    'vasc': 'Vascular lesions',
    'df': 'Dermatofibroma'
}
# Map each abbreviation in 'dx' to its full lesion name
df['lession_type'] = df['dx'].map(lessions)
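A quick sanity check after a mapping like this is to look for abbreviations that the dictionary does not cover, since `map` leaves those rows empty. The snippet below is a minimal sketch on a toy dataframe; `sample`, `names` and the code `xyz` are made up for illustration and are not part of the HAM10000 metadata.

```python
import pandas as pd

# Toy stand-in for the metadata; 'xyz' is a made-up code with no entry in the mapping
sample = pd.DataFrame({"dx": ["bkl", "nv", "xyz"]})

# Shortened, hypothetical version of the abbreviation dictionary
names = {"bkl": "Benign keratosis-like lesions", "nv": "Melanocytic nevi"}
sample["lession_type"] = sample["dx"].map(names)

# Abbreviations missing from the dictionary end up as NaN; collect them
unmapped = sample.loc[sample["lession_type"].isna(), "dx"].unique().tolist()
print(unmapped)
```

An empty list means every abbreviation was translated.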

## Target Column 

The business understanding section set the business goal as skin lesion classification: the model will predict the type of a lesion from its image. The target variable is therefore the lesion type (dx). It is treated as a categorical variable and stored as integer category codes.

In [None]:
df['lession_type_id'] = pd.Categorical(df['lession_type']).codes
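`pd.Categorical` assigns codes in the sorted order of the category names, so it can be useful to keep a lookup from each code back to its name, for example to decode model predictions later. The following is a small sketch on a toy series; `labels`, `cat` and `code_to_name` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Toy labels standing in for the 'lession_type' column
labels = pd.Series(["Melanoma", "Dermatofibroma", "Melanoma", "Vascular lesions"])

cat = pd.Categorical(labels)

# Categories are sorted alphabetically; a code is the position in that list
code_to_name = dict(enumerate(cat.categories))

print(cat.codes.tolist())
print(code_to_name)
```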

## Adding Image Path

To feed the lesion images to the model together with their ground truth labels, the path of each image must be stored in the dataset.

In [50]:
image_data_path = "../data/images/raw/"
df["image_path"] = image_data_path + df["image_id"] + ".jpg"
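Since the paths are built by string concatenation, nothing guarantees that the files are actually on disk. One way to check is sketched below, using a temporary directory and two hypothetical image ids so that it runs without the real image folder; with the real data, the same `os.path.exists` check could be applied to `df["image_path"]`.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a single empty placeholder file; the second id has no file on purpose
    open(os.path.join(tmp_dir, "ISIC_0000001.jpg"), "wb").close()

    # Hypothetical ids, used only for this illustration
    sample = pd.DataFrame({"image_id": ["ISIC_0000001", "ISIC_0000002"]})
    sample["image_path"] = [
        os.path.join(tmp_dir, image_id + ".jpg") for image_id in sample["image_id"]
    ]

    # True where the file exists on disk
    sample["image_exists"] = sample["image_path"].map(os.path.exists)

    # Ids whose image file could not be found
    missing = sample.loc[~sample["image_exists"], "image_id"].tolist()

print(missing)
```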

In [52]:
df.head()

Unnamed: 0,lesion_id,image_id,dx,dx_type,age,sex,localization,image_path,lession_type,lession_type_id
0,HAM_0000118,ISIC_0027419,bkl,histo,80.0,male,scalp,../data/images/raw/ISIC_0027419.jpg,Benign keratosis-like lesions,2
1,HAM_0000118,ISIC_0025030,bkl,histo,80.0,male,scalp,../data/images/raw/ISIC_0025030.jpg,Benign keratosis-like lesions,2
2,HAM_0002730,ISIC_0026769,bkl,histo,80.0,male,scalp,../data/images/raw/ISIC_0026769.jpg,Benign keratosis-like lesions,2
3,HAM_0002730,ISIC_0025661,bkl,histo,80.0,male,scalp,../data/images/raw/ISIC_0025661.jpg,Benign keratosis-like lesions,2
4,HAM_0001466,ISIC_0031633,bkl,histo,75.0,male,ear,../data/images/raw/ISIC_0031633.jpg,Benign keratosis-like lesions,2


## Export New Dataframe


The dataset is now ready for model training. The enriched metadata is saved to a separate file so that it can be used in the next step.

In [53]:
df.to_csv("../data/df_skin.csv", index=False)
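The `index=False` argument matters when the file is read back: a CSV written with its index gains an extra `Unnamed: 0` column on reload. The sketch below shows the difference using an in-memory buffer instead of a real file; the toy dataframe and variable names are assumptions made for this example.

```python
import io

import pandas as pd

sample = pd.DataFrame({"image_id": ["ISIC_0027419"], "dx": ["bkl"]})

# Default behaviour: the index is written as an unnamed first column
with_index = pd.read_csv(io.StringIO(sample.to_csv()))

# index=False: only the data columns are written
without_index = pd.read_csv(io.StringIO(sample.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```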