# Data cleaning

**Import required libraries**

In [None]:
# import required libraries
import pandas as pd
import shutil
import os
from PIL import Image
import imagehash 
import re
import random

**Loading the scraped data from DermNet.**

In [None]:
# load and preview dataset
image_df = pd.read_csv('Data/data1-294.csv')
print(image_df.shape)
image_df.head()

(13992, 2)


|   | skin_disorder_name | images |
|---|---|---|
| 0 | acne affecting the back images | https://dermnetnz.org/assets/Uploads/acne/acne... |
| 1 | acne affecting the back images | https://dermnetnz.org/assets/Uploads/acne/acne... |
| 2 | acne affecting the back images | https://dermnetnz.org/assets/Uploads/acne/acne... |
| 3 | acne affecting the back images | https://dermnetnz.org/assets/Uploads/acne/acne... |
| 4 | acne affecting the back images | https://dermnetnz.org/assets/Uploads/acne/acne... |


## **<u>Acne</u>**

**Meaning**<br>
Acne is a common skin condition that occurs when hair follicles become clogged with oil and dead skin cells. This leads to the formation of pimples, blackheads, whiteheads, and sometimes deeper cysts. Acne usually appears on the face, neck, chest, back, and shoulders, and can affect people of all ages, although it is most common during puberty.<br>

**Causes**<br>
The causes of acne are multifactorial and can include hormonal imbalances, genetics, stress, certain medications, and an overproduction of sebum, the oily substance that lubricates the skin. Certain factors such as diet and hygiene practices have also been implicated in the development of acne, although the evidence for these is less clear.<br>

**Symptoms**<br>
The symptoms of acne can vary depending on the severity of the condition. Mild acne may only present with a few blackheads or whiteheads, while moderate acne can involve a combination of pimples, blackheads, and whiteheads. Severe acne may include deep, painful cysts that can lead to scarring. Acne can also have a significant impact on a person's self-esteem and mental health, particularly if it is severe or persistent.

**Treatment**<br>
Treatment options for acne depend on the severity of the condition. Mild acne can often be managed with over-the-counter topical treatments that contain benzoyl peroxide or salicylic acid: benzoyl peroxide kills acne-causing bacteria, and salicylic acid helps to unclog pores. More severe acne may require prescription medications, such as topical retinoids or oral antibiotics, which can help to unclog pores, reduce inflammation, and kill the bacteria that cause acne. In cases of severe, persistent acne, isotretinoin, a powerful oral medication, may be prescribed. Lifestyle modifications such as maintaining good hygiene practices, avoiding certain foods, and managing stress may also help in managing acne.


### **Cleaning Acne images**

**i. Copying acne images from the Images folder into their own folder**

In [None]:
# Labels representing acne in DermNet's scraped data
acne_labels = list(image_df[image_df['skin_disorder_name'].str.contains('acne')]['skin_disorder_name'].unique())

# Removing acne labels whose images will not be used because they are not clear
acne_labels.remove('infantile acne images')
acne_labels.remove('steroid acne images')

acne_labels

['acne affecting the back images',
 'acne affecting the face images',
 'acne and other follicular disorder images',
 'facial acne images']

In [None]:
# Getting the acne images file names (files whose name contains one of the acne labels)
original_acne_img = [image_name for image_name in os.listdir('Images/')
                     if any(label in image_name for label in acne_labels)]

# Confirming the number of acne images before any cleaning
print('There are', len(original_acne_img),'acne images')
original_acne_img[:5]

There are 702 acne images


['acne affecting the back images0.jpg',
 'acne affecting the back images1.jpg',
 'acne affecting the back images10.jpg',
 'acne affecting the back images11.jpg',
 'acne affecting the back images12.jpg']
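The cell above selects files by substring, so a file is kept whenever its name contains a label anywhere. Below is a minimal sketch of a stricter alternative. It assumes that every DermNet file name is the label followed by a numeric index and an extension, as in the output above. Under that assumption, the label can be recovered from the name and compared exactly against a list such as `acne_labels`. `split_image_name` and `select_by_labels` are hypothetical helpers and are not part of the notebook.

```python
import re

# Assumed naming scheme: "<label><index>.<ext>", e.g. "facial acne images12.jpg"
NAME_PATTERN = re.compile(r"^(?P<label>.*?)(?P<index>\d+)\.(?P<ext>\w+)$")


def split_image_name(file_name):
    """Return (label, index) for a DermNet-style file name, or None if it does not fit."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        return None
    return match.group("label"), int(match.group("index"))


def select_by_labels(file_names, labels):
    """Keep only the file names whose recovered label is exactly one of `labels`."""
    wanted = set(labels)
    selected = []
    for name in file_names:
        parsed = split_image_name(name)
        if parsed is not None and parsed[0] in wanted:
            selected.append(name)
    return selected


# Hypothetical file names for illustration
files = ["facial acne images12.jpg", "infantile acne images40.jpg", "acne affecting the back images0.jpg"]
print(select_by_labels(files, ["facial acne images", "acne affecting the back images"]))
```

In the notebook this would be called as `select_by_labels(os.listdir('Images/'), acne_labels)`. Names that do not follow the assumed scheme (for example `07Acne0811011 - Copy.jpg`) return `None` and are skipped rather than guessed at.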

In [None]:
# Creating a new folder with just acne images to make cleaning easier
folder_name = 'cleaned_images/acne_images/'


# Note📝: This step is important for reproducibility.
# If the folder is not deleted first, creating it raises FileExistsError when this cell is rerun.

# Checking if the folder exists and deleting it if it exists
if os.path.exists(folder_name):
    # deleting the folder and its contents
    shutil.rmtree(folder_name)

# Create the new folder (os.makedirs also creates 'cleaned_images/' if it is missing)
os.makedirs(folder_name)

# Copying the images into that folder
for img in original_acne_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join(folder_name, img)
    shutil.copy(origin, destination)

# Confirming that the number of acne images after copying them to a separate folder is still 702
acne_img = [image_name for image_name in os.listdir('cleaned_images/acne_images/')] 
print('There are', len(acne_img),'acne images.')

There are 702 acne images.
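The same "delete the folder if it exists, recreate it, copy the files" pattern is repeated for every condition below. Here is a minimal sketch of that pattern as one reusable function. `copy_into_fresh_folder` is a hypothetical helper, not part of the notebook. It uses `os.makedirs` so that missing parent folders are also created.

```python
import os
import shutil


def copy_into_fresh_folder(file_names, source_dir, target_dir):
    """Recreate `target_dir` from scratch and copy `file_names` from `source_dir` into it.

    Deleting the folder first keeps reruns reproducible; os.makedirs also creates
    any missing parent folders (e.g. 'cleaned_images/').
    """
    if os.path.exists(target_dir):
        shutil.rmtree(target_dir)
    os.makedirs(target_dir)

    for name in file_names:
        shutil.copy(os.path.join(source_dir, name), os.path.join(target_dir, name))

    # Return the number of files now in the target folder as a quick sanity check
    return len(os.listdir(target_dir))
```

For example, `copy_into_fresh_folder(original_acne_img, 'Images/', 'cleaned_images/acne_images/')` would perform the acne step above and return the image count.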


**ii. Combining the images into one folder**

In [None]:
# extra acne images
extra_acne = [image_name for image_name in os.listdir('extra_images/extra_acne_images')]
extra_acne[:5]

['07Acne081101.jpg',
 '07Acne0811011 - Copy.jpg',
 '07Acne0811011.jpg',
 '07AcnePittedScars.jpg',
 '07AcnePittedScars1 - Copy.jpg']

In [None]:
# Copying the extra images into the acne folder
for img in extra_acne:
    origin = os.path.join('extra_images/extra_acne_images/', img)
    destination = os.path.join('cleaned_images/acne_images/', img)
    shutil.copy(origin, destination)

# Confirming that the total number of acne images is 1427 before any cleaning
acne_img = [image_name for image_name in os.listdir('cleaned_images/acne_images/')] 
print('There are a total of', len(acne_img),'acne images.')

There are a total of 1427 acne images.


**iii. Removing duplicate images from the folder**

In [None]:
# Function for removing duplicated images.
# Two images are treated as duplicates when their average hashes are identical.
def drop_duplicated_images(folder):

    # Dictionary mapping each hash value to the path of the first image with that hash
    image_hashes = {}
    duplicated_images = []

    # Loop through all the image files in the folder
    for filename in os.listdir(folder):
        file_path = os.path.join(folder, filename)

        # Load the image and compute its hash with the average hash algorithm;
        # the with-block closes the file before it may be deleted below
        with Image.open(file_path) as image:
            hash_value = imagehash.average_hash(image)

        # Check if the same hash value is already in the dictionary
        if hash_value in image_hashes:
            # If it is, this image is a duplicate: record it and delete it
            duplicated_images.append(filename)
            os.remove(file_path)
        else:
            # Otherwise, add the hash value and file path to the dictionary
            image_hashes[hash_value] = file_path

    return duplicated_images

In [None]:
# Dropping duplicates
duplicated_images = drop_duplicated_images('cleaned_images/acne_images/')

# number of acne images after removing duplicated images (1109)
acne_img = [image_name for image_name in os.listdir('cleaned_images/acne_images/')] 
print('There are', len(acne_img),'acne images after removing duplicated images')

There are 1109 acne images after removing duplicated images
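The function above treats two images as duplicates only when their average hashes are identical. The average hash works as follows:

- It shrinks an image to a small grayscale grid (8×8 by default in `imagehash`).
- It sets one bit per pixel, depending on whether that pixel is brighter than the grid's mean.

Near-duplicates, such as re-saved or lightly edited copies, usually differ in only a few bits. Below is a minimal pure-Python sketch of near-duplicate detection by Hamming distance. It assumes the image has already been converted to grayscale and shrunk to a small grid. The threshold of 8 bits is an illustrative assumption, not a tuned value. With `imagehash` itself, subtracting two hashes (`hash_a - hash_b`) gives the same Hamming distance.

```python
def average_hash_bits(gray_grid):
    """Average hash of a small grayscale grid (list of rows of 0-255 values).

    Assumes the image was already converted to grayscale and shrunk;
    each bit is 1 when the pixel is brighter than the grid's mean.
    """
    pixels = [value for row in gray_grid for value in row]
    mean = sum(pixels) / len(pixels)
    return [1 if value > mean else 0 for value in pixels]


def hamming_distance(bits_a, bits_b):
    """Number of positions where the two hashes differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def is_near_duplicate(grid_a, grid_b, threshold=8):
    """Treat two images as near-duplicates when their hashes differ in at most `threshold` bits."""
    return hamming_distance(average_hash_bits(grid_a), average_hash_bits(grid_b)) <= threshold


# Toy 8x8 "images": a left/right split, the same split slightly brightened,
# and a top/bottom split
left_right = [[30] * 4 + [220] * 4 for _ in range(8)]
brighter = [[value + 10 for value in row] for row in left_right]
top_bottom = [[30] * 8 for _ in range(4)] + [[220] * 8 for _ in range(4)]

print(is_near_duplicate(left_right, brighter))    # True
print(is_near_duplicate(left_right, top_bottom))  # False
```

Exact hash matching, as used in the notebook, is the special case `threshold=0`. A larger threshold removes more near-copies, but it also risks discarding genuinely different images that happen to look alike at 8×8 resolution.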


The *acne and other follicular disorder images* label contains a mix of different conditions. Only the images labelled specifically as acne are kept; the others are dropped from the dataset.

In [None]:
# dropping those images from the acne_images folder
indexes_to_drop = [295, 296, 297, 298, 300, 303, 304, 307, 308, 309, 310, 311, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 325, 326, 328, 329, 330, 333, 337, 338, 339, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 354, 355, 359, 361, 362, 363, 364, 366, 367, 368, 371, 372, 373, 374, 375, 376, 378, 380, 381, 382, 384, 385, 387, 388, 389, 390, 392, 393, 395, 396, 397, 398, 402, 403, 405, 408, 409, 411, 413, 415, 416, 417, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 431, 432, 433, 434, 436, 437, 438, 441, 443, 444, 445, 446, 447]

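for filename in os.listdir('cleaned_images/acne_images/'):
    # Compare the whole number after 'images' (e.g. 'images295.jpg' -> 295),
    # so that index 295 does not also match a name such as 'images2950.jpg'
    match = re.search(r'images(\d+)', filename.lower())
    if match and int(match.group(1)) in indexes_to_drop:
        os.remove(os.path.join('cleaned_images/acne_images/', filename))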
            
print("Number of acne images left:", len(os.listdir('cleaned_images/acne_images/')))

Number of acne images left: 1000
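The cell above drops files by the number that follows `images` in the file name. A plain substring test such as `"images295" in name` also matches `images2950.jpg`. The sketch below shows the difference on hypothetical file names, assuming the `<label><index>.<ext>` naming seen earlier. `extract_index` is a hypothetical helper that compares the whole number instead.

```python
import re


def extract_index(file_name):
    """Return the full number that follows 'images' in a file name, or None."""
    match = re.search(r"images(\d+)", file_name.lower())
    return int(match.group(1)) if match else None


indexes_to_drop = {295, 296, 300}
# Hypothetical file names for illustration
files = ["acne and other follicular disorder images295.jpg",
         "facial acne images2950.jpg",
         "07Acne081101.jpg"]

# Substring matching wrongly flags the second file (it contains "images295")
substring_hits = [f for f in files if any(f"images{i}" in f.lower() for i in indexes_to_drop)]
# Comparing whole numbers flags only the intended file
exact_hits = [f for f in files if extract_index(f) in indexes_to_drop]

print(substring_hits)
print(exact_hits)
```

Because `\d+` is greedy, the regex always captures every digit after `images`, so index 295 can never match a longer number by prefix.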


## **<u>Atopic dermatitis (Eczema)</u>**

**Meaning**<br>
Atopic dermatitis, also known as eczema, is a chronic inflammatory skin condition that is characterized by dry, itchy, and inflamed patches of skin. It is a common condition that can affect people of all ages, but it is most common in infants and children. <br>

**Causes**<br>
The exact causes of atopic dermatitis are not fully understood, but it is believed to be a combination of genetic and environmental factors. People with atopic dermatitis often have a genetic predisposition to the condition, and environmental triggers such as allergens, irritants, and stress can exacerbate the symptoms.

**Symptoms**<br>
The symptoms of atopic dermatitis can vary depending on the severity of the condition. Mild cases may only present with dry, itchy skin, while more severe cases can lead to red, inflamed, and weeping skin lesions. In some cases, the skin may become thickened and scaly. Atopic dermatitis can also cause significant discomfort and interfere with a person's quality of life.

**Treatment**<br>
Treatment options for eczema include using gentle soaps and moisturizers, avoiding harsh chemicals and irritants, and taking short, lukewarm baths or showers. Prescription creams or ointments containing corticosteroids or immunosuppressants may be used for more severe cases of eczema. Antihistamines can also be helpful in reducing itching. <br>
Eczema flare-ups can often be prevented by avoiding triggers such as certain foods, allergens, and irritants. Regular use of moisturizers also helps to keep the skin hydrated and reduce the risk of flare-ups.<br>

### **Cleaning Eczema images**

**i. Copying eczema images from the Images folder into their own folder**

In [None]:
# Labels representing eczema in DermNet's scraped data.
eczema_labels = image_df[(image_df['skin_disorder_name'].str.contains('eczema')) | \
                         (image_df['skin_disorder_name'].str.contains('atopic dermatitis images')) |\
                         (image_df['skin_disorder_name'].str.contains('hand dermatitis images')) |\
                         (image_df['skin_disorder_name'] == 'dermatitis images') |\
                         (image_df['skin_disorder_name'].str.contains('nummular dermatitis images'))] \
                         ['skin_disorder_name'].unique()
len(eczema_labels)

8

In [None]:
# Getting the eczema images file names
eczema_img = [image_name for image_name in os.listdir('Images/') if ('eczema' in image_name) |
                                                                    ('atopic dermatitis images' in image_name) |
                                                                    ('hand dermatitis images' in image_name) | 
                                                                    (image_name.startswith('dermatitis images'))|
                                                                    ('nummular dermatitis images' in image_name)
                                                                     ] 

# Confirming the number of eczema images before any cleaning
print('There are', len(eczema_img),'eczema images.')
eczema_img[:5]

There are 631 eczema images.


['atopic dermatitis images1058.jpg',
 'atopic dermatitis images1059.jpg',
 'atopic dermatitis images1060.jpg',
 'atopic dermatitis images1061.jpg',
 'atopic dermatitis images1062.jpg']
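Eight labels feed the eczema folder, so it can help to check how many files each label contributes before cleaning. Below is a minimal sketch, assuming the `<label><index>.<ext>` naming seen above. It strips the trailing number and extension from each name and counts the resulting labels with `collections.Counter`. `count_by_label` is a hypothetical helper, and the sample names are illustrative.

```python
import re
from collections import Counter


def count_by_label(file_names):
    """Count files per label, assuming names look like '<label><index>.<ext>'."""
    counts = Counter()
    for name in file_names:
        # Drop the trailing index and extension, e.g. 'eczema images7.jpg' -> 'eczema images'
        label = re.sub(r"\d+\.\w+$", "", name)
        counts[label] += 1
    return counts


sample = ["atopic dermatitis images1058.jpg", "atopic dermatitis images1059.jpg",
          "nummular dermatitis images2001.jpg"]
print(count_by_label(sample).most_common())
```

Calling `count_by_label(eczema_img)` on the list above would show whether any single label dominates the class.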

In [None]:
# Creating a new folder with just eczema images to make cleaning easier
folder_name = 'cleaned_images/eczema_images/'

# Note📝: This step is important for reproducibility.
# If the folder is not deleted first, os.mkdir raises FileExistsError when this cell is rerun.

# Checking if the folder exists and deleting it if it exists
if os.path.exists(folder_name):
    # deleting the folder and its contents
    shutil.rmtree(folder_name)

# create the new folder
os.mkdir(folder_name)

# Copying the images into that folder
for img in eczema_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join(folder_name, img)
    shutil.copy(origin, destination)

In [None]:
# Confirming that the number of eczema images after moving them to a separate folder is still 631
eczema_img = [image_name for image_name in os.listdir('cleaned_images/eczema_images/')] 
print('There are', len(eczema_img),'eczema images.')

There are 631 eczema images.


**ii. Combining the images into one folder**

In [None]:
# Extra eczema images
extra_eczema = [image_name for image_name in os.listdir('extra_images/extra_eczema')]

# The folder has a mixture of images; keep only the eczema and dermatitis images
extra_eczema_images = [image_name for image_name in extra_eczema\
                        if ('dermatitis' in image_name) |\
                        ('eczema' in image_name)]
extra_eczema_images[:5]

['eczema234.jpg',
 'eczema289.jpg',
 'eczema301.jpg',
 'eczema403.jpg',
 'eczema500.jpg']
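The keyword filter above is case-sensitive, so a file named, for example, `Eczema12.jpg` would be skipped. Mixed-case names do occur in the extra folders (e.g. `07Acne081101.jpg`). Here is a minimal sketch of a case-insensitive version. `filter_by_keywords` and the sample names are hypothetical.

```python
def filter_by_keywords(file_names, keywords):
    """Keep file names that contain any of `keywords`, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    return [name for name in file_names if any(k in name.lower() for k in lowered)]


names = ["eczema234.jpg", "Eczema12.jpg", "Contact-Dermatitis3.jpg", "psoriasis9.jpg"]
print(filter_by_keywords(names, ["dermatitis", "eczema"]))
```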

In [None]:
# Copying the extra images into the eczema folder
for img in extra_eczema_images:
    origin = os.path.join('extra_images/extra_eczema_images_clean/', img)
    destination = os.path.join('cleaned_images/eczema_images/', img)
    shutil.copy(origin, destination)

# Confirming that the total number of eczema images is 1367 before any cleaning
eczema_img = [image_name for image_name in os.listdir('cleaned_images/eczema_images/')] 
print('There are a total of', len(eczema_img),'eczema images.')

There are a total of 1367 eczema images.


**iii. Removing duplicate images from the folder**

In [None]:
# Using a function created earlier to drop duplicates
duplicated_images = drop_duplicated_images('cleaned_images/eczema_images/')

# Confirming the number of images after dropping duplicates
eczema_img = [image_name for image_name in os.listdir('cleaned_images/eczema_images/')] 
print('There are', len(eczema_img),'eczema images after removing duplicated images.')

There are 1000 eczema images after removing duplicated images.


## **<u>Actinic keratosis</u>**
**Meaning** <br>
Actinic keratosis (AK) is a skin condition caused by long-term exposure to UV rays, which results in rough, scaly patches on the skin. It is considered a precancerous condition because it can develop into squamous cell carcinoma, a type of skin cancer.

**Causes** <br>
The primary cause of actinic keratosis is long-term exposure to UV rays from the sun or other sources such as tanning beds. People with fair skin, light-colored hair, and light-colored eyes are at a higher risk of developing AK. Other risk factors include a history of frequent sunburns, a weakened immune system, and exposure to chemicals such as coal tar or arsenic.

**Symptoms** <br>
The most common symptom of actinic keratosis is the formation of rough, scaly patches or lesions on the skin. These patches can be pink, red, or brown in color and may feel like sandpaper. They are usually found on areas of the skin that are frequently exposed to the sun, such as the face, scalp, ears, neck, hands, and arms. In some cases, the patches may itch or burn, and they may become inflamed or bleed if they are scratched or rubbed.

**Treatment** <br>
The treatment of actinic keratosis depends on the severity of the condition. Mild cases may be treated with topical creams or gels that contain medications such as imiquimod, fluorouracil, or diclofenac. These medications work by stimulating the immune system or causing the abnormal cells to die off. In more severe cases, cryotherapy (freezing the lesions with liquid nitrogen) or curettage (scraping off the lesions with a special tool) may be necessary. In rare cases where the lesions have developed into skin cancer, surgical removal may be required. It is also important to take steps to prevent further damage to the skin, such as wearing protective clothing and sunscreen, avoiding tanning beds, and staying out of the sun during peak hours.


**i. Copying actinic keratosis images into their own folder**

In [None]:
# Image labels containing 'keratosis' in DermNet's scraped data
print(image_df[image_df['skin_disorder_name'].str.contains('keratosis')]['skin_disorder_name'].unique())

['actinic keratosis affecting the face images'
 'actinic keratosis affecting the hand images'
 'actinic keratosis affecting the legs and feet images'
 'actinic keratosis affecting the scalp images'
 'actinic keratosis dermoscopy images'
 'actinic keratosis on the nose images'
 'actinic keratosis treated with imiquimod images'
 'granular parakeratosis images' 'keratosis pilaris images'
 'seborrhoeic keratosis dermoscopy images' 'seborrhoeic keratosis images'
 'solar keratosis affecting the face images'
 'solar keratosis affecting the hand images'
 'solar keratosis affecting the legs and feet images'
 'solar keratosis affecting the scalp images'
 'solar keratosis on the nose images'
 'solar keratosis treated with imiquimod images']


Actinic keratosis is also known as solar keratosis or senile keratosis, so the *solar keratosis* labels above are collected together with the *actinic keratosis* labels.

In [None]:
# Load the ISIC 2019 ground-truth labels used for the extra keratosis images
df = pd.read_csv('Data/ISIC_2019_Training_GroundTruth.csv')

# filter df to get rows where AK = 1.0
df1 = df.copy()
df1 = df1[df1['AK'] == 1.0]
df1['skin_disorder_name'] = df1['images']

# drop the unwanted columns from df
df1 = df1.drop(['MEL', 'NV', 'BCC', 'AK', 'BKL', 'DF', 'VASC', 'SCC', 'UNK'], axis=1)

# Loop through each .jpg file in the folder and store its base name (without extension) and its file name
img_names = []
image_paths = []

for file in os.listdir('extra_images/extra_actinic_keratosis_images'):
    if file.endswith(".jpg"):
        skin_disorder_name = file.split(".")[0]
        img_names.append(skin_disorder_name)
        image_paths.append(file)
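The ISIC 2019 ground-truth file has one row per image and a one-hot column per diagnosis (`MEL`, `NV`, `BCC`, `AK`, `BKL`, ...). Below is a minimal standard-library sketch of the same selection, assuming the image-name column is called `images` as in the cell above. It reads the rows, keeps the IDs whose diagnosis column is 1.0, and matches image files by exact file stem with a set lookup rather than scanning every ID for a substring. The sample CSV rows and the `ids_with_label` helper are hypothetical.

```python
import csv
import io
import os


def ids_with_label(csv_file, label, exclude_substring=None):
    """Return the set of image IDs whose one-hot `label` column equals 1.0."""
    ids = set()
    for row in csv.DictReader(csv_file):
        if float(row[label]) == 1.0:
            image_id = row["images"]
            # Optionally skip IDs such as '..._downsampled'
            if exclude_substring is None or exclude_substring not in image_id:
                ids.add(image_id)
    return ids


sample_csv = io.StringIO(
    "images,MEL,AK,BKL\n"
    "ISIC_0000001,0.0,1.0,0.0\n"
    "ISIC_0000002,0.0,0.0,1.0\n"
    "ISIC_0000003_downsampled,0.0,0.0,1.0\n"
)
ak_ids = ids_with_label(sample_csv, "AK")

# Exact stem lookup: 'ISIC_0000001.jpg' -> 'ISIC_0000001'
files = ["ISIC_0000001.jpg", "ISIC_0000002.jpg"]
print([f for f in files if os.path.splitext(f)[0] in ak_ids])
```

The `exclude_substring` parameter mirrors the BKL step later on, which drops rows whose ID contains `downsampled`.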

In [None]:
# Getting the keratosis images file names
keratosis_img = [image_name for image_name in os.listdir('Images/') if ('actinic keratosis' in image_name) | ('solar keratosis' in image_name)]
AK_img = [image_name for image_name in os.listdir('extra_images/extra_AK_and_BKL_images') if any(x in image_name for x in df1['images'].tolist())]
AK_img2 = [image_name for image_name in os.listdir('extra_images/extra_actinic_keratosis_images')]

# Checking if the folder exists and deleting it if it exists
if os.path.exists('cleaned_images/keratosis_images/'):
    # deleting the folder and its contents
    shutil.rmtree('cleaned_images/keratosis_images/')
    
# Creating a new folder with just keratosis images to make cleaning easier
os.mkdir('cleaned_images/keratosis_images/')
for img in keratosis_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join('cleaned_images/keratosis_images/', img)
    shutil.copy(origin, destination)

for img in AK_img:
    origin = os.path.join('extra_images/extra_AK_and_BKL_images/', img)
    destination = os.path.join('cleaned_images/keratosis_images/', img)
    shutil.copy(origin, destination)

for img in AK_img2:
    origin = os.path.join('extra_images/extra_actinic_keratosis_images/', img)
    destination = os.path.join('cleaned_images/keratosis_images/', img)
    shutil.copy(origin, destination)
    
# Number of keratosis images after copying them from all three sources into one folder
keratosis_img = [image_name for image_name in os.listdir('cleaned_images/keratosis_images/')] 
print('There are', len(keratosis_img),'actinic keratosis images')

There are 1391 actinic keratosis images
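`shutil.copy` silently overwrites a file that already exists at the destination. When several source folders are merged into `cleaned_images/keratosis_images/`, a file name shared between sources therefore keeps only the last copy. Below is a minimal sketch for spotting such collisions before copying. `find_name_collisions` is a hypothetical helper, and the source names and file names are illustrative.

```python
from collections import Counter


def find_name_collisions(sources):
    """Map each file name that appears in more than one source to the sources containing it.

    `sources` maps a source label (e.g. a folder path) to the list of file names it provides.
    """
    counts = Counter(name for names in sources.values() for name in set(names))
    return {
        name: sorted(source for source, names in sources.items() if name in names)
        for name, count in counts.items()
        if count > 1
    }


sources = {
    "dermnet": ["actinic keratosis affecting the face images10.jpg"],
    "isic": ["ISIC_0000001.jpg", "ISIC_0000002.jpg"],
    "extra_ak": ["ISIC_0000002.jpg", "ak_001.jpg"],
}
print(find_name_collisions(sources))
```

Passing the three file lists (before they are copied) would show whether the merged count is lower than the sum of the sources because of overwrites rather than genuine duplicates.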


**ii. Removing duplicate images from the folder**

In [None]:
# call function to drop duplicates from image folder 
duplicated_images = drop_duplicated_images('cleaned_images/keratosis_images/')

# number of images after removing duplicates
keratosis_img = [image_name for image_name in os.listdir('cleaned_images/keratosis_images/')] 
print('Number of actinic keratosis images after removing duplicated images:', len(keratosis_img))

Number of actinic keratosis images after removing duplicated images: 1000


## **<u>Benign Keratosis-like Lesions</u>**
**Meaning** <br>
Benign Keratosis-like Lesions (BKL) are a group of benign skin lesions that resemble actinic keratosis (AK) but are not classified as AK because they do not have the same degree of dysplasia. BKL lesions can appear as small, scaly, or waxy bumps on the skin, ranging in color from light tan to dark brown. They typically occur on areas of the skin that have been exposed to the sun, such as the face, neck, scalp, and hands. Examples of BKL lesions include seborrheic keratosis, solar lentigo, and lichen planus-like keratosis.
 
**Causes** <br>
The exact cause of BKL is not known, but it is believed to be related to long-term sun exposure. Other factors that may contribute to the development of BKL include a weakened immune system, age, and a history of other skin conditions.

**Symptoms**<br>
BKL lesions typically appear as small, scaly, or waxy bumps on the skin. They may be light tan to dark brown in color and may have a rough, textured surface. They can be single or multiple and can occur on any part of the body, but are most commonly found on the face, neck, scalp, and hands.

**Treatment**<br>
BKL lesions are usually benign and do not require treatment unless they are causing symptoms or affecting the patient's appearance. Treatment options may include cryotherapy (freezing the lesion with liquid nitrogen), curettage (scraping the lesion off the skin), or topical medications such as 5-fluorouracil or imiquimod. In some cases, BKL lesions may be biopsied to confirm the diagnosis or rule out other skin conditions. It is important to protect the skin from sun exposure and to seek medical attention for any suspicious skin lesions.

**i. Copying Benign Keratosis-like Lesions images into their own folder**

In [None]:
# filter df to get rows where BKL = 1.0
BKL_df = df.copy()
BKL_df = BKL_df[BKL_df['BKL'] == 1.0]
BKL_df = BKL_df[~BKL_df["images"].str.contains("downsampled")]
BKL_df['skin_disorder_name'] = BKL_df['images']

# drop the unwanted columns and rows from df
BKL_df = BKL_df.drop(['MEL', 'NV', 'BCC', 'AK', 'BKL', 'DF', 'VASC', 'SCC', 'UNK'], axis=1)
BKL_df = BKL_df[:1003]

# Getting the BKL images file names
BKL_img = [image_name for image_name in os.listdir('extra_images/extra_AK_and_BKL_images') if any(x in image_name for x in BKL_df['images'].tolist())]

# Checking if the folder exists and deleting it if it exists
if os.path.exists('cleaned_images/BKL_images/'):
    # deleting the folder and its contents
    shutil.rmtree('cleaned_images/BKL_images/')
    
# Creating a new folder with just BKL images to make cleaning easier
os.mkdir('cleaned_images/BKL_images/')
for img in BKL_img:
    origin = os.path.join('extra_images/extra_AK_and_BKL_images/', img)
    destination = os.path.join('cleaned_images/BKL_images/', img)
    shutil.copy(origin, destination)
    
# Number of BKL images after moving them to a separate folder
BKL_img = [image_name for image_name in os.listdir('cleaned_images/BKL_images/')] 
print('There are', len(BKL_img),'BKL images')

There are 1003 BKL images


**ii. Removing duplicated images from the folder**

In [None]:
# use the function to drop duplicates from image folder
duplicated_images2 = drop_duplicated_images('cleaned_images/BKL_images/')

# number of images after removing duplicates
BKL_img = [image_name for image_name in os.listdir('cleaned_images/BKL_images/')] 
print('Number of BKL images after removing duplicated images:', len(BKL_img))

Number of BKL images after removing duplicated images: 1000


## **<u>Melanoma</u>**
**Definition**<br>
Melanoma is a disease in which malignant (cancer) cells form in melanocytes (cells that color the skin). There are different types of cancer that start in the skin. Melanoma can occur anywhere on the skin. Unusual moles, exposure to sunlight, and health history can affect the risk of melanoma.

The most common type of melanoma is superficial spreading melanoma. It tends to spread across the surface of the skin, has uneven borders, and varies in color from brown to black, pink, or red.

Nodular melanoma is another type that grows down into deeper layers of the skin and may appear as a raised bump or growth.

Lentigo maligna melanoma tends to appear on parts of the body that get more sun, especially the face, and it often affects older people. It looks like a large, uneven dark patch on the surface of the skin.

Metastatic melanoma occurs when the cancer spreads, or metastasizes, to other parts of the body, possibly including the lymph nodes, organs, or bones.

Other, rarer types of melanoma also exist. Although melanoma most commonly affects the skin, some types affect internal tissues or the eyes.

**i. Copying Melanoma images into their own folder**

In [None]:
# Labels representing melanoma in DermNet's scraped data
melanoma_labels = image_df[image_df['skin_disorder_name'].str.contains('melanoma')]['skin_disorder_name'].unique()
print(melanoma_labels)

#number of labels representing melanoma
len(melanoma_labels)

['acral lentiginous melanoma images' 'amelanotic melanoma images'
 'hypomelanotic malignant melanoma images'
 'lentigo maligna melanoma images' 'melanoma in situ images'
 'melanoma of nail unit images' 'metastatic melanoma images'
 'nodular melanoma images' 'superficial spreading melanoma images']


9

In [None]:
# Getting the melanoma images file names
melanoma_img = [image_name for image_name in os.listdir('Images/') if 'melanoma' in image_name] 

# Checking if melanoma folder exists and deleting it if it exists
if os.path.exists('cleaned_images/melanoma/'):
    # deleting the folder and its contents
    shutil.rmtree('cleaned_images/melanoma/')
    
# Creating a new folder with just melanoma images to make cleaning easier
os.mkdir('cleaned_images/melanoma/')
for img in melanoma_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join('cleaned_images/melanoma/', img)
    shutil.copy(origin, destination)
    
for filename in os.listdir('extra_images/extra_melanoma_images/'):
    src_path = os.path.join('extra_images/extra_melanoma_images/', filename)
    dst_path = os.path.join('cleaned_images/melanoma/', filename)
    shutil.copy(src_path, dst_path)
    
# Number of melanoma images after moving them to a separate folder
melanoma_img = [image_name for image_name in os.listdir('cleaned_images/melanoma/')] 
print('There are', len(melanoma_img),'melanoma images')

There are 5074 melanoma images


**ii. Removing duplicated images from the folder**

In [None]:
# Dropping duplicates
duplicated_images = drop_duplicated_images('cleaned_images/melanoma/')

melanoma_img = [image_name for image_name in os.listdir('cleaned_images/melanoma/')] 
print('There are', len(melanoma_img),'melanoma images after removing duplicate images')

There are 4631 melanoma images after removing duplicate images


In [None]:
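# Function to randomly select 1000 images
def reduce_images(folder_path, n_images=1000, seed=None):
    # Get the list of image file names, sorted so that a given seed always
    # produces the same selection regardless of os.listdir order
    file_names = sorted(os.listdir(folder_path))

    # Shuffle the file names (pass an integer seed to make the selection reproducible)
    random.Random(seed).shuffle(file_names)

    # Select the first n_images file names
    selected_file_names = file_names[:n_images]

    # Create a new folder '<folder_path>_images' to store the selected images,
    # deleting it first if it exists so that the cell can be rerun
    selected_folder_path = f"{folder_path.rstrip('/')}_images"
    if os.path.exists(selected_folder_path):
        shutil.rmtree(selected_folder_path)
    os.mkdir(selected_folder_path)

    # Copy the selected images to the new folder
    for file_name in selected_file_names:
        file_path = os.path.join(folder_path, file_name)
        selected_file_path = os.path.join(selected_folder_path, file_name)
        shutil.copy(file_path, selected_file_path)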

In [None]:
# select 1000 images
reduce_images('cleaned_images/melanoma')
print('Number of melanoma images:', len([image_name for image_name in os.listdir('cleaned_images/melanoma_images/')]))

Number of melanoma images: 1000
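Without a fixed seed, a random selection picks a different 1000 images on every run, and the resulting dataset cannot be reproduced. Below is a minimal sketch of a reproducible selection, assuming a fixed seed is acceptable. It sorts the names first, because `os.listdir` order is arbitrary, and then draws them with `random.Random(seed).sample`. `select_subset` and the seed value 42 are illustrative assumptions.

```python
import random


def select_subset(file_names, n_images=1000, seed=42):
    """Reproducibly pick up to `n_images` names from `file_names`."""
    names = sorted(file_names)  # os.listdir order is arbitrary, so fix it first
    n_images = min(n_images, len(names))
    return random.Random(seed).sample(names, n_images)


# Hypothetical file names standing in for the 4631 deduplicated melanoma images
files = [f"melanoma images{i}.jpg" for i in range(4631)]
first = select_subset(files)
second = select_subset(list(reversed(files)))
print(len(first), first == second)
```

Using a dedicated `random.Random` instance keeps the selection independent of any other code that uses the module-level random generator.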


## **<u>Psoriasis </u>**

**Meaning**<br>
Psoriasis is a chronic autoimmune skin disorder characterized by the rapid buildup of skin cells that form thick, silvery scales and itchy, dry, and red patches on the skin. It is a non-contagious condition that can affect any part of the body, including the scalp, nails, and joints. Psoriasis occurs when the immune system mistakenly attacks healthy skin cells, causing the skin cells to grow too quickly and accumulate on the skin's surface. The condition is typically lifelong and can vary in severity from mild to severe. While there is no cure for psoriasis, there are treatments available that can help manage symptoms and improve quality of life. <br>

**Causes**<br>
The exact cause of psoriasis is not fully understood, but it is believed to be a combination of genetic, environmental, and immune system factors. Some of the known factors that can trigger or exacerbate psoriasis include:<br>

<ol>
  <li>Genetics: Psoriasis tends to run in families, suggesting a genetic component to the condition.</li>
  <li>Immune system dysfunction: Psoriasis is believed to be an autoimmune disorder, in which the immune system mistakenly attacks healthy skin cells, causing inflammation and other symptoms.</li>
  <li>Environmental factors: Certain environmental factors can trigger or worsen psoriasis, such as infections, injuries to the skin, stress, smoking, and alcohol consumption.</li>
  <li>Medications: Certain medications, such as lithium, beta-blockers, and antimalarials, can trigger or worsen psoriasis.</li>
  <li>Hormonal changes: Changes in hormone levels, such as those that occur during puberty, pregnancy, and menopause, can trigger or worsen psoriasis.</li>
</ol>

**Symptoms**<br>
Psoriasis symptoms can vary depending on the type and severity of the condition, but some common symptoms include:<br>
<ol>
  <li>Red, inflamed patches of skin: These patches may be covered with thick, silvery scales that may flake off or bleed if scratched.</li>
  <li>Dry, cracked skin: The affected skin may be dry and itchy, and may crack and bleed in severe cases.</li>
  <li>Thickened, pitted, or ridged nails: Psoriasis can affect the nails, causing them to become thickened, discolored, pitted, or ridged.</li>
  <li>Joint pain and stiffness: In some cases, psoriasis can also cause joint pain and stiffness, a condition called psoriatic arthritis.</li>
  <li>Itching and burning: Psoriasis patches may be itchy and burning, which can cause discomfort and distress.</li>
  <li>Soreness or discomfort: Psoriasis patches can be painful and tender to the touch.</li>
</ol>


**Treatment**<br>
While there is no cure for psoriasis, several treatments can help manage symptoms and reduce the frequency of flare-ups. Common options include:<br>
<ol>
  <li>Topical medications: These are creams, ointments, gels, or foams that are applied directly to the affected skin to reduce inflammation and itching. Topical medications may include corticosteroids, vitamin D analogues, retinoids, and tar preparations.</li>
  <li>Phototherapy: This involves exposing the skin to ultraviolet light to slow down the growth of affected skin cells and reduce inflammation. Phototherapy can be done in a doctor's office or at home using a special light box.</li>
  <li>Systemic medications: These are medications that are taken orally or by injection to suppress the immune system and reduce inflammation. Systemic medications may include methotrexate, cyclosporine, and biologics.</li>
  <li>Lifestyle changes: Making changes to your diet, reducing stress, and avoiding triggers such as smoking and alcohol consumption may help to reduce the frequency and severity of psoriasis flare-ups.</li>
  <li>Moisturizers: Applying moisturizers regularly can help to soothe dry, itchy skin and reduce the risk of flare-ups.</li>
</ol>

**i. Moving psoriasis images in the Images folder to their own folder**

In [None]:
# Labels representing psoriasis in DermNet's scraped data
psoriasis_labels = list(image_df[image_df['skin_disorder_name'].str.contains('psoriasis')]['skin_disorder_name'].unique())
print(psoriasis_labels)

# Count of labels representing psoriasis
len(psoriasis_labels)

['chronic plaque psoriasis images', 'facial psoriasis images', 'flexural psoriasis images', 'generalised pustular psoriasis images', 'genital psoriasis images', 'guttate psoriasis images', 'nail psoriasis images', 'palmoplantar psoriasis images', 'psoriasis affecting the face images', 'psoriasis of the scalp images', 'pustular psoriasis of the hand and feet images']


11

In [None]:
# Getting the psoriasis images file names
psoriasis_img = [image_name for image_name in os.listdir('Images/') if 'psoriasis' in image_name] 

# Checking if psoriasis folder exists and deleting it if it exists
if os.path.exists('cleaned_images/psoriasis/'):
    # deleting the folder and its contents
    shutil.rmtree('cleaned_images/psoriasis/')
    
# Creating a new folder with just psoriasis images to make cleaning easier
os.mkdir('cleaned_images/psoriasis/')
for img in psoriasis_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join('cleaned_images/psoriasis/', img)
    shutil.copy(origin, destination)
    
for filename in os.listdir('extra_images/extra_psoriasis_images/'):
    src_path = os.path.join('extra_images/extra_psoriasis_images/', filename)
    dst_path = os.path.join('cleaned_images/psoriasis/', filename)
    shutil.copy(src_path, dst_path)
    
# Number of psoriasis images after moving them to a separate folder
psoriasis_img = [image_name for image_name in os.listdir('cleaned_images/psoriasis/')] 
print('There are', len(psoriasis_img),'psoriasis images')

There are 1057 psoriasis images
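The same "delete the folder, recreate it, copy the files in" pattern is repeated for psoriasis, BCC, tinea and acne. As a minimal sketch, a hypothetical helper `rebuild_folder` (not part of the original notebook) could consolidate it. It uses `os.makedirs` so that a missing parent such as `cleaned_images/` is created too.

```python
import os
import shutil

def rebuild_folder(destination, sources):
    """Hypothetical helper: recreate `destination` from scratch and copy files into it.

    `sources` is a list of (source_folder, file_names) pairs; pass None as
    file_names to copy every file in that source folder.
    Returns the number of files in `destination` afterwards.
    """
    # Remove any output from a previous run so the cell can be rerun safely
    if os.path.exists(destination):
        shutil.rmtree(destination)
    # makedirs also creates missing parent folders (e.g. 'cleaned_images/')
    os.makedirs(destination)

    for source_folder, file_names in sources:
        if file_names is None:
            file_names = os.listdir(source_folder)
        for name in file_names:
            shutil.copy(os.path.join(source_folder, name), os.path.join(destination, name))

    return len(os.listdir(destination))
```

For the psoriasis cell above, the call would look like `rebuild_folder('cleaned_images/psoriasis/', [('Images/', psoriasis_img), ('extra_images/extra_psoriasis_images/', None)])`, using the filtered `psoriasis_img` list from before the copy.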


**ii. Removing duplicated images from the folder**

In [None]:
# Dropping duplicates
duplicated_images = drop_duplicated_images('cleaned_images/psoriasis/')

psoriasis_img = [image_name for image_name in os.listdir('cleaned_images/psoriasis/')] 
print('There are', len(psoriasis_img),'psoriasis images after removing duplicate images')

There are 1006 psoriasis images after removing duplicate images


In [None]:
# select 1000 images
reduce_images('cleaned_images/psoriasis')
print('Number of psoriasis images:', len([image_name for image_name in os.listdir('cleaned_images/psoriasis_images/')]))

Number of psoriasis images: 1000
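`reduce_images` is not defined in this part of the notebook, so how it chooses the 1000 images (and whether it writes them to a separate `*_images` folder, as the path in the `print` suggests) is not shown here. Assuming the goal is a random subset, the sketch below shows one reproducible way to pick the file names; `sample_file_names` is a hypothetical helper, not the notebook's function.

```python
import random

def sample_file_names(file_names, n=1000, seed=42):
    """Hypothetical helper: pick `n` file names at random, reproducibly.

    Sorting first makes the result independent of os.listdir order, and the
    fixed seed makes reruns select the same files.
    """
    names = sorted(file_names)
    if len(names) <= n:
        return names
    rng = random.Random(seed)
    return sorted(rng.sample(names, n))
```

Using a local `random.Random(seed)` instead of `random.seed` keeps the selection reproducible without affecting any other use of the `random` module in the notebook.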


## **<u>Basal cell carcinoma</u>**
**Meaning**<br>
Basal cell carcinoma (BCC) is a malignant tumor that arises from the basal cells in the skin. It is the most common type of skin cancer. Malignant tumors are more dangerous than benign tumors because they can grow and invade and destroy surrounding tissues, damaging the body's normal functions.

**Causes**<br>
Basal cell carcinoma is primarily caused by exposure to ultraviolet (UV) radiation from the sun or tanning beds. Prolonged UV exposure damages the DNA in skin cells, leading to mutations that can make the cells grow and divide uncontrollably and eventually form a tumor. Other factors that may increase the risk of developing this skin cancer include fair skin, a history of sunburns or intense sun exposure, a weakened immune system, a family history of skin cancer, and certain genetic conditions.

**Symptoms**<br>
Basal cell carcinoma typically presents as raised, pearly, or translucent bumps or lesions on the skin that may be pink, red, or white. These lesions can sometimes ulcerate or bleed, and may develop a crust or scab. BCC can also appear as a flat, scaly, or pigmented patch on the skin.

**Treatment**<br>
The treatment plan for basal cell carcinoma depends on the size, location, and extent of the tumor. Surgical removal is the primary treatment, using techniques such as excision, curettage and electrodesiccation, or Mohs surgery. For small tumors, excision may be sufficient, while Mohs surgery is recommended for larger or more advanced tumors to ensure complete removal. Radiation therapy may be used as an alternative to surgery in some cases. Systemic chemotherapy or immunotherapy is used only rarely, for advanced basal cell carcinoma that has spread to other parts of the body.

**i. Moving BCC images in the Images folder to their own folder**

In [None]:
# filter df to get rows where BCC = 1.0, excluding downsampled images
BCC_df = df.copy()
BCC_df = BCC_df[BCC_df['BCC'] == 1.0]
BCC_df = BCC_df[~BCC_df["images"].str.contains("downsampled")]
BCC_df['skin_disorder_name'] = BCC_df['images']

# drop the unwanted columns and rows from df
BCC_df = BCC_df.drop(['MEL', 'NV', 'BCC', 'AK', 'BKL', 'DF', 'VASC', 'SCC', 'UNK'], axis=1)
BCC_df = BCC_df[:1073]

# Getting the BCC images file names
BCC_img = [image_name for image_name in os.listdir('extra_images/extra_Bcc_images') if any(x in image_name for x in BCC_df['images'].tolist())]

# Checking if the folder exists and deleting it if it exists
if os.path.exists('cleaned_images/Bcc_images'):
    # deleting the folder and its contents
    shutil.rmtree('cleaned_images/Bcc_images')
    
# Creating a new folder with just BCC images to make cleaning easier
os.mkdir('cleaned_images/Bcc_images')
for img in BCC_img:
    origin = os.path.join('extra_images/extra_Bcc_images/', img)
    destination = os.path.join('cleaned_images/Bcc_images/', img)
    shutil.copy(origin, destination)
    
# Number of BCC images after moving them to a separate folder
BCC_img = [image_name for image_name in os.listdir('cleaned_images/Bcc_images')] 
print('There are', len(BCC_img),'BCC images')

There are 1073 BCC images
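The BCC cell matches files with `any(x in image_name for x in BCC_df['images'].tolist())`, which checks every id against every file and also accepts partial matches (for example a `_downsampled` variant of the same id, if such a file were present). Assuming the values in `BCC_df['images']` are file names without their extension, a set lookup on the file stem is exact and linear; `select_by_image_id` is a hypothetical helper and the ids in the test are made up.

```python
import os

def select_by_image_id(file_names, image_ids):
    """Hypothetical helper: keep files whose name without extension is in `image_ids`.

    A set lookup is O(1) per file, and comparing whole stems avoids
    partial matches such as 'ID_downsampled.jpg' for 'ID'.
    """
    wanted = set(image_ids)
    return [name for name in file_names if os.path.splitext(name)[0] in wanted]
```

In the cell above this would read `BCC_img = select_by_image_id(os.listdir('extra_images/extra_Bcc_images'), BCC_df['images'])`, under the extension assumption stated here.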


**ii. Removing duplicated images from the folder**

In [None]:
# use the function to drop duplicates from image folder
duplicated_images2 = drop_duplicated_images('cleaned_images/Bcc_images/')

# number of images after removing duplicates
BCC_img = [image_name for image_name in os.listdir('cleaned_images/Bcc_images')] 
print('Number of BCC images after removing duplicated images:', len(BCC_img))

Number of BCC images after removing duplicated images: 1000


## **<u>Tinea </u>**

**Meaning**<br>
Tinea is a fungal infection of the skin, hair, or nails; when it affects the scalp or body it is popularly known as ringworm. It is caused by a group of fungi called dermatophytes, which thrive on keratin, the tough protein that forms the outer layer of the skin, hair, and nails. Tinea can affect various parts of the body, including the feet (athlete's foot), groin (jock itch), scalp (tinea capitis), beard area (tinea barbae), and body (tinea corporis).<br>
Tinea infections are highly contagious and can spread through contact with infected skin or objects. Treatment for tinea typically involves antifungal medications, which can be applied topically or taken orally. It is also important to practice good hygiene and avoid sharing personal items, such as towels and clothing, to prevent the spread of infection. <br>

**Causes**<br>
Tinea is caused by a group of fungi called dermatophytes. These fungi can thrive on the skin, hair, or nails and cause infections in various parts of the body. The specific type of dermatophyte that causes tinea may vary depending on the affected area. <br>
Some of the common causes of tinea include:<br>

<ol>
  <li>Direct contact with an infected person or animal - Tinea can spread from person to person or from animal to person through direct contact with infected skin or hair.</li>
  <li>Sharing personal items - Sharing personal items such as towels, clothing, or hairbrushes can also spread tinea.</li>
  <li>Warm and humid environment - Dermatophytes thrive in warm and humid environments, making certain areas of the body more susceptible to tinea infections, such as the feet and groin.</li>
  <li>Weakened immune system - People with weakened immune systems, such as those with HIV or undergoing chemotherapy, may be more prone to tinea infections.</li>
  <li>Skin injury or irritation - Skin that is injured or irritated, such as from scratching or wearing tight-fitting clothing, may be more susceptible to tinea infections.</li>
</ol>
Preventing the spread of tinea involves good hygiene practices, such as keeping the skin clean and dry, avoiding sharing personal items, and wearing protective clothing in public areas such as locker rooms and swimming pools.

**Symptoms**<br>
The symptoms of tinea can vary depending on the affected area of the body. Some common symptoms of tinea infections include:<br>
<ol>
  <li>Itching - Tinea infections can cause intense itching, which can be worse at night.</li>
  <li>Scaling or flaking - Tinea infections can cause the skin to become scaly or flaky.</li>
  <li>Blisters - Some types of tinea infections, such as tinea pedis (athlete's foot), can cause small fluid-filled blisters.</li>
  <li>Hair loss - Tinea infections of the scalp can cause hair to become brittle and break off, resulting in hair loss.</li>
  <li>Thickened or discolored nails - Tinea infections of the nails can cause the nails to become thickened, discolored, and brittle.</li>
</ol>

**Treatment**<br>
The treatment for tinea infections typically involves antifungal medications, which can be applied topically or taken orally. The specific treatment will depend on the location and severity of the infection. Some common treatment options include:<br>
<ol>
  <li>Topical antifungal medications - These medications are applied directly to the skin or nails and include creams, ointments, sprays, and powders. Topical antifungal medications are often effective for mild to moderate tinea infections.</li>
  <li>Oral antifungal medications - These medications are taken by mouth and may be prescribed for more severe or widespread tinea infections. Oral antifungal medications include terbinafine, fluconazole, and itraconazole.</li>
  <li>Medicated shampoo - A medicated shampoo may be recommended for tinea infections of the scalp. These shampoos contain antifungal medication and are used to help control the infection and reduce symptoms.</li>
  <li>Removal of infected nails - In severe cases of tinea infections of the nails, the infected nail may need to be removed to allow for the application of antifungal medication to the underlying nail bed.</li>
</ol>

**i. Moving tinea images in the Images folder to their own folder**

In [None]:
# Labels representing tinea in DermNet's scraped data
tinea_labels = image_df[image_df['skin_disorder_name'].str.contains('tinea')]['skin_disorder_name'].unique()
tinea_labels

array(['tinea corporis images', 'tinea pedis images'], dtype=object)

In [None]:
# Getting the tinea images file names
tinea_img = [image_name for image_name in os.listdir('Images/') if 'tinea' in image_name]

# Checking if tinea folder exists and deleting it if it exists
if os.path.exists('cleaned_images/tinea/'):
    # deleting the folder and its contents
    shutil.rmtree('cleaned_images/tinea/')
    
# Creating a new folder with just tinea images to make cleaning easier
os.mkdir('cleaned_images/tinea/')
for img in tinea_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join('cleaned_images/tinea/', img)
    shutil.copy(origin, destination)
    
for filename in os.listdir('extra_images/extra_tinea_images/'):
    src_path = os.path.join('extra_images/extra_tinea_images/', filename)
    dst_path = os.path.join('cleaned_images/tinea/', filename)
    shutil.copy(src_path, dst_path)
    
# Number of tinea images after moving them to a separate folder
tinea_img = [image_name for image_name in os.listdir('cleaned_images/tinea/')] 
print('There are', len(tinea_img),'tinea images')

There are 1678 tinea images


**ii. Removing duplicated images from the folder**

In [None]:
# Dropping duplicates
duplicated_images = drop_duplicated_images('cleaned_images/tinea/')

tinea_img = [image_name for image_name in os.listdir('cleaned_images/tinea/')] 
print('There are', len(tinea_img),'tinea images after removing duplicate images')

There are 1661 tinea images after removing duplicate images


In [None]:
# select 1000 images
reduce_images('cleaned_images/tinea')
print('Number of tinea images:', len([image_name for image_name in os.listdir('cleaned_images/tinea_images/')]))

Number of tinea images: 1000



## **<u>Acne</u>**

**Meaning**<br>
Acne is a common skin condition that occurs when hair follicles become clogged with oil and dead skin cells. This leads to the formation of pimples, blackheads, whiteheads, and sometimes deeper cysts. Acne usually appears on the face, neck, chest, back, and shoulders, and can affect people of all ages, although it is most common during puberty.<br>

**Causes**<br>
The causes of acne are multifactorial and can include hormonal imbalances, genetics, stress, certain medications, and an overproduction of sebum, the oily substance that lubricates the skin. Certain factors such as diet and hygiene practices have also been implicated in the development of acne, although the evidence for these is less clear.<br>

**Symptoms**<br>
The symptoms of acne can vary depending on the severity of the condition. Mild acne may only present with a few blackheads or whiteheads, while moderate acne can involve a combination of pimples, blackheads, and whiteheads. Severe acne may include deep, painful cysts that can lead to scarring. Acne can also have a significant impact on a person's self-esteem and mental health, particularly if it is severe or persistent.

**Treatment**<br>
Treatment options for acne depend on the severity of the condition. Mild acne can often be managed with over-the-counter topical treatments that contain benzoyl peroxide or salicylic acid. These products work by reducing the amount of oil on the skin and unclogging pores. More severe acne may require prescription medications, such as topical retinoids or oral antibiotics, which can help to reduce inflammation and kill the bacteria that cause acne. In cases of severe, persistent acne, isotretinoin, a powerful oral medication, may be prescribed. Additionally, lifestyle modifications such as maintaining good hygiene practices, avoiding certain foods, and managing stress can also be helpful in managing acne.


### **Cleaning Acne images**


**Creating a dataframe with acne images from the data scraped from DermNet**

In [3]:
# Labels representing acne in DermNet's scraped data
acne_labels = list(image_df[image_df['skin_disorder_name'].str.contains('acne')]['skin_disorder_name'].unique())

# removing acne labels whose images will not be used because they are not clear
acne_labels.remove('infantile acne images')
acne_labels.remove('steroid acne images')

acne_labels

['acne affecting the back images',
 'acne affecting the face images',
 'acne and other follicular disorder images',
 'facial acne images']

In [4]:
# After removing two labels, four labels representing acne remain
len(acne_labels)

4

In [5]:
# Creating a dataframe with just acne labels for easier cleaning

acne_df = image_df[image_df['skin_disorder_name'].isin(acne_labels)]
acne_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 702 entries, 0 to 5023
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  702 non-null    object
 1   images              702 non-null    object
dtypes: object(2)
memory usage: 16.5+ KB


### **Extra acne images**

In [6]:
extra_acne = [image_name for image_name in os.listdir('extra_images/extra_acne_images')]
extra_acne[:5]

['07Acne081101.jpg',
 '07Acne0811011 - Copy.jpg',
 '07Acne0811011.jpg',
 '07AcnePittedScars.jpg',
 '07AcnePittedScars1 - Copy.jpg']

In [7]:
# Creating a dataframe for the extra acne images
extra_acne_df = pd.DataFrame({'skin_disorder_name': 'acne', 'images': extra_acne})
extra_acne_df.head()

Unnamed: 0,skin_disorder_name,images
0,acne,07Acne081101.jpg
1,acne,07Acne0811011 - Copy.jpg
2,acne,07Acne0811011.jpg
3,acne,07AcnePittedScars.jpg
4,acne,07AcnePittedScars1 - Copy.jpg


**i. Moving acne images in the Images folder to their own folder**

In [8]:
# Getting the acne images file names
original_acne_img = [image_name for image_name in os.listdir('Images/')
                     if any(label in image_name for label in acne_labels)]

# Confirming the number of acne images before any cleaning
print('There are', len(original_acne_img),'acne images')
original_acne_img[:5]

There are 702 acne images


['acne affecting the back images0.jpg',
 'acne affecting the back images1.jpg',
 'acne affecting the back images10.jpg',
 'acne affecting the back images11.jpg',
 'acne affecting the back images12.jpg']

In [9]:
# Creating a new folder with just acne images to make cleaning easier
folder_name = 'cleaned_images/acne_images/'

# Note📝: This step is important for reproducibility.
# os.mkdir raises an error if the folder already exists, so rerunning this cell
# would fail unless the folder from a previous run is deleted first.

# Checking if the folder exists and deleting it if it exists
if os.path.exists(folder_name):
    # deleting the folder and its contents
    shutil.rmtree(folder_name)

# create the new folder
os.mkdir(folder_name)

# Moving the images into that folder
for img in original_acne_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join(folder_name, img)
    shutil.copy(origin, destination)

In [10]:
# Confirming that the number of acne images after moving them to a separate folder is still 702
acne_img = [image_name for image_name in os.listdir('cleaned_images/acne_images/')] 
print('There are', len(acne_img),'acne images.')

There are 702 acne images.


**ii. Dropping links from the 'images' column in the acne_df and replacing them with the image name**

In [11]:
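# So that the two dataframes can match, we replace the image links in acne_df
# with the image file names.
# Note📝: the names are assigned by position, so this assumes os.listdir returned the files
# in the same order as the rows of acne_df; os.listdir does not guarantee any order.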

acne_images = pd.DataFrame(acne_img, columns=['images'])
acne_df = acne_df.copy()
acne_df.drop('images', axis=1, inplace=True)
acne_df['images'] = acne_images['images'].values
acne_df.head()

Unnamed: 0,skin_disorder_name,images
0,acne affecting the back images,acne affecting the back images0.jpg
1,acne affecting the back images,acne affecting the back images1.jpg
2,acne affecting the back images,acne affecting the back images10.jpg
3,acne affecting the back images,acne affecting the back images11.jpg
4,acne affecting the back images,acne affecting the back images12.jpg
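The output above pairs row 2 with `images10.jpg`, because the folder listing is sorted as text. The file names appear to end in the row's index in `image_df` (for example, `eczema_df` starts at index 1058 and its first file is `atopic dermatitis images1058.jpg`). Assuming that holds, a mapping built from that trailing number would attach names without relying on directory order. `map_index_to_file` is a hypothetical helper.

```python
import re

# Trailing number right before the extension, e.g. 'acne affecting the back images10.jpg' -> 10
TRAILING_NUMBER = re.compile(r'(\d+)\.[^.]+$')

def map_index_to_file(file_names):
    """Hypothetical helper: return {trailing number: file name} for names that end in a number."""
    index_to_file = {}
    for name in file_names:
        match = TRAILING_NUMBER.search(name)
        if match:
            index_to_file[int(match.group(1))] = name
    return index_to_file
```

With it, the assignment could become `acne_df['images'] = acne_df.index.map(map_index_to_file(original_acne_img))`; rows without a matching file would get `NaN`, which makes mismatches visible instead of silent.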


**iii. Joining the two dataframes**

In [12]:
# Creating a dataframe with all of the acne images

acne_df_complete = pd.concat([acne_df, extra_acne_df], axis=0).reset_index()
acne_df_complete.drop('index', axis=1, inplace=True)
print(acne_df_complete.info())
acne_df_complete.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1427 entries, 0 to 1426
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  1427 non-null   object
 1   images              1427 non-null   object
dtypes: object(2)
memory usage: 22.4+ KB
None


Unnamed: 0,skin_disorder_name,images
0,acne affecting the back images,acne affecting the back images0.jpg
1,acne affecting the back images,acne affecting the back images1.jpg
2,acne affecting the back images,acne affecting the back images10.jpg
3,acne affecting the back images,acne affecting the back images11.jpg
4,acne affecting the back images,acne affecting the back images12.jpg


**iv. Combining the images into one folder**

In [13]:
# This was done by moving the extra images into the acne folder
for img in extra_acne:
    origin = os.path.join('extra_images/extra_acne_images/', img)
    destination = os.path.join('cleaned_images/acne_images/', img)
    shutil.copy(origin, destination)

In [14]:
# Confirming that the total acne images is 1427 before any cleaning

acne_img = [image_name for image_name in os.listdir('cleaned_images/acne_images/')] 
print('There are a total of', len(acne_img),'acne images.')

There are a total of 1427 acne images.


**v. Removing duplicated images from the folder**

In [15]:
# Function for removing duplicated images.
def drop_duplicated_images(folder):
    """Delete images in `folder` whose average hash exactly matches an earlier image's hash.

    Returns the list of deleted file names.
    """
    # Map each hash value to the path of the first image that produced it
    image_hashes = {}
    duplicated_images = []

    # Loop through all the image files in the directory
    for filename in os.listdir(folder):
        path = os.path.join(folder, filename)

        # Compute the hash value of the image using the average hash algorithm;
        # the context manager closes the file so it can be deleted safely afterwards
        with Image.open(path) as image:
            hash_value = imagehash.average_hash(image)

        # Check if an identical hash value is already in the dictionary
        if hash_value in image_hashes:
            # If so, treat this file as a duplicate and delete it
            duplicated_images.append(filename)
            os.remove(path)
        else:
            # Otherwise, add the hash value and file path to the dictionary
            image_hashes[hash_value] = path

    return duplicated_images
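`drop_duplicated_images` removes only images whose hashes are identical. In `imagehash`, subtracting two `ImageHash` objects (`hash_a - hash_b`) gives their Hamming distance, which is how a similarity threshold (such as 8 bits) could be applied to catch near-duplicates like resized or re-compressed copies. The sketch below works on plain integers (an `ImageHash` can be converted with `int(str(hash_value), 16)`) so it runs without image files; `find_near_duplicates` is a hypothetical helper, and using it would change the image counts reported in this notebook.

```python
def find_near_duplicates(hashes, threshold=8):
    """Hypothetical helper: return names whose hash is within `threshold` bits of a kept image.

    `hashes` maps file name -> integer hash. Names are processed in sorted
    order so the result does not depend on os.listdir order.
    """
    kept = []          # (name, hash) pairs of the images that are kept
    duplicates = []
    for name in sorted(hashes):
        value = hashes[name]
        # Hamming distance = number of differing bits between the two hashes
        if any(bin(value ^ other).count('1') <= threshold for _, other in kept):
            duplicates.append(name)
        else:
            kept.append((name, value))
    return duplicates
```

Each image is compared with every kept image, so the cost grows quadratically; for folders of roughly a thousand images that is still small.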

In [16]:
# Dropping duplicates
duplicated_images = drop_duplicated_images('cleaned_images/acne_images/')
duplicated_images[:5]

['07Acne0811011.jpg',
 '07AcnePittedScars1.jpg',
 '17a4d1a917e5faa9f2675ebd83e526791.jpg',
 '19280a95363d065b1dfbc654e28530291.jpg',
 '2 (52)1.jpg']

In [17]:
acne_img = [image_name for image_name in os.listdir('cleaned_images/acne_images/')] 
print('There are', len(acne_img),'acne images after removing duplicated images')

There are 1109 acne images after removing duplicated images


In [18]:
# Getting the indexes of the duplicated images so that they can be dropped from acne_df_complete too.
duplicated_indexes = acne_df_complete.index[acne_df_complete['images'].isin(duplicated_images)].tolist()
duplicated_indexes[:10]

[369, 377, 394, 430, 448, 449, 450, 451, 452, 453]

In [19]:
# Dropping duplicated images from the dataframe.
acne_df_complete = acne_df_complete.copy()
acne_df_complete.drop(index=duplicated_indexes, inplace=True)
acne_df_complete.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1109 entries, 0 to 1425
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  1109 non-null   object
 1   images              1109 non-null   object
dtypes: object(2)
memory usage: 26.0+ KB


The 'acne affecting the back images' and 'acne affecting the face images' labels contain only acne images. The only change needed for them is renaming the label to acne.<br>
The 'acne and other follicular disorder images' label contains a mix of conditions. Only the images that show acne will be kept; the others will be dropped from the dataset.<br>
***
**Dealing with the mixed images in 'acne and other follicular disorder images'**<br>

After careful evaluation of the images, the indexes of the images that represent acne are:<br>
> **[299, 301, 302, 305, 306, 312, 324, 327, 331, 332, 334, 335, 336, 340, 352, 353, 356, 357, 358, 360, 365, 370, 379, 383, 386, 391, 394, 399, 400, 401, 404, 406, 407, 410, 412, 414, 418, 435, 439, 440, 442]**

In [20]:
# Note 📝: The indexes were confirmed to be the same after merging the two dataframes,
#          because acne_df's rows come first in acne_df_complete.

# indexes of the images in 'acne and other follicular disorder images'
indexes = acne_df[acne_df['skin_disorder_name'] == 'acne and other follicular disorder images'].index

# indexes of the acne images in 'acne and other follicular disorder images'
acne_indexes = [299, 301, 302, 305, 306, 312, 324, 327, 331, 332, 334, 335, 336,
                340, 352, 353, 356, 357, 358, 360, 365, 370, 379, 383, 386, 391,
                394, 399, 400, 401, 404, 406, 407, 410, 412, 414, 418, 435,
                439, 440, 442]

# indexes of the other follicular disorder images in 'acne and other follicular disorder images'.
# These indexes will be dropped; duplicates were already dropped above, so they are skipped here.
to_drop = [index for index in indexes
           if index not in acne_indexes and index not in duplicated_indexes]

# dropping indexes in to_drop
acne_df_complete.drop(to_drop, axis=0, inplace=True)

In [21]:
# After dropping non-acne images, we still have 1000 images left
acne_df_complete.shape

(1000, 2)

In [22]:
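# dropping those images from the acne_images folder

# Finding the image file names to be dropped from the folder.
# The pattern requires the index to be followed directly by the file extension, so index 30
# matches 'images30.jpg' but not 'images305.jpg' (a plain substring test would match both).
img_to_drop = []

for index in to_drop:
    for img_name in original_acne_img:
        if re.search(rf'images{index}\.[^.]+$', img_name):
            img_to_drop.append(img_name)

# Dropping those images from the acne_images folder
for filename in img_to_drop:
    os.remove(os.path.join("cleaned_images/acne_images/", filename))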

# Confirming that the number of images left is 1000
acne_img = [image_name for image_name in os.listdir('cleaned_images/acne_images/')] 
print('There are', len(acne_img),'acne images left.')

There are 1000 acne images left.


**vi. Changing the label to just acne**

In [23]:
acne_df_complete['skin_disorder_name'] = 'acne'
print(acne_df_complete.shape)
acne_df_complete.head()

(1000, 2)


Unnamed: 0,skin_disorder_name,images
0,acne,acne affecting the back images0.jpg
1,acne,acne affecting the back images1.jpg
2,acne,acne affecting the back images10.jpg
3,acne,acne affecting the back images11.jpg
4,acne,acne affecting the back images12.jpg
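Matching counts (1000 rows and 1000 files) do not guarantee that the CSV lists the same files that are in the folder, particularly since the file names were attached to rows by position. A quick set comparison can confirm it before saving; `compare_names` is a hypothetical helper.

```python
def compare_names(df_names, folder_names):
    """Hypothetical helper: return (missing_files, untracked_files).

    missing_files: names listed in the dataframe but absent from the folder.
    untracked_files: files in the folder that have no dataframe row.
    Two empty lists mean the CSV and the folder describe the same images.
    """
    df_set, folder_set = set(df_names), set(folder_names)
    return sorted(df_set - folder_set), sorted(folder_set - df_set)
```

Here it would be called as `compare_names(acne_df_complete['images'], os.listdir('cleaned_images/acne_images/'))`.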


**vii. Saving the acne_df_complete dataframe as a csv file**

In [24]:
acne_df_complete.to_csv('cleaned_data/acne.csv', index=False)

## **<u>Atopic dermatitis (Eczema)</u>**

**Meaning**<br>
Atopic dermatitis, also known as eczema, is a chronic inflammatory skin condition that is characterized by dry, itchy, and inflamed patches of skin. It is a common condition that can affect people of all ages, but it is most common in infants and children. <br>

**Causes**<br>
The exact causes of atopic dermatitis are not fully understood, but it is believed to be a combination of genetic and environmental factors. People with atopic dermatitis often have a genetic predisposition to the condition, and environmental triggers such as allergens, irritants, and stress can exacerbate the symptoms.

**Symptoms**<br>
The symptoms of atopic dermatitis can vary depending on the severity of the condition. Mild cases may only present with dry, itchy skin, while more severe cases can lead to red, inflamed, and weeping skin lesions. In some cases, the skin may become thickened and scaly. Atopic dermatitis can also cause significant discomfort and interfere with a person's quality of life.

**Treatment**<br>
Treatment options for eczema include using gentle soaps and moisturizers, avoiding harsh chemicals and irritants, and taking short, lukewarm baths or showers. Prescription creams or ointments containing corticosteroids or immunosuppressants may be used for more severe cases of eczema. Antihistamines can also be helpful in reducing itching. <br>
To help prevent eczema flare-ups, avoid triggers such as certain foods, allergens, and irritants. Regular use of moisturizers can also help to keep the skin hydrated and reduce the risk of flare-ups.<br>

### **Cleaning Eczema images**

**Creating a dataframe with eczema images from the data scraped from DermNet**

In [25]:
# Labels representing eczema in DermNet's scraped data.

eczema_labels = image_df[(image_df['skin_disorder_name'].str.contains('eczema')) | \
                         (image_df['skin_disorder_name'].str.contains('atopic dermatitis images')) |\
                         (image_df['skin_disorder_name'].str.contains('hand dermatitis images')) |\
                         (image_df['skin_disorder_name'] == 'dermatitis images') |\
                         (image_df['skin_disorder_name'].str.contains('nummular dermatitis images'))] \
                         ['skin_disorder_name'].unique()
len(eczema_labels)

8

In [26]:
# Creating a dataframe with just eczema labels for easier cleaning

eczema_df = image_df[image_df['skin_disorder_name'].isin(eczema_labels)]
eczema_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 631 entries, 1058 to 8989
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  631 non-null    object
 1   images              631 non-null    object
dtypes: object(2)
memory usage: 14.8+ KB


### **Extra eczema images**

In [27]:
extra_eczema = [image_name for image_name in os.listdir('extra_images/extra_eczema')]
extra_eczema[:5]

['0_0.jpg', '0_1.jpg', '0_10.jpg', '0_11.jpg', '0_12.jpg']

In [28]:
# The folder has a mixture of images, so we keep only the eczema images

extra_eczema_images = [image_name for image_name in extra_eczema\
                        if ('dermatitis' in image_name) |\
                        ('eczema' in image_name)]
extra_eczema_images[:5]

['eczema234.jpg',
 'eczema289.jpg',
 'eczema301.jpg',
 'eczema403.jpg',
 'eczema500.jpg']

In [29]:
# Copying these images into their own folder called 'extra_eczema_images_clean'
folder_name = 'extra_images/extra_eczema_images_clean/'

# Note📝: This step is important for reproducibility.
# os.mkdir raises an error if the folder already exists, so rerunning this cell
# would fail unless the folder from a previous run is deleted first.
         
# Checking if the folder exists and deleting it if it exists        
if os.path.exists(folder_name):
    # deleting the folder and its contents
    shutil.rmtree(folder_name)

# create the new folder
os.mkdir(folder_name)

for img in extra_eczema_images:
    origin = os.path.join('extra_images/extra_eczema/', img)
    destination = os.path.join(folder_name, img)
    shutil.copy(origin, destination)
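The same "delete the folder if it exists, recreate it, copy the selected files" pattern is repeated for every disorder in this notebook. A minimal sketch of a reusable helper follows; `recreate_and_copy` is a hypothetical name, not something defined elsewhere in the notebook, and it uses `os.makedirs` so that missing parent folders are created as well.

```python
import os
import shutil


def recreate_and_copy(src_dir, dst_dir, file_names):
    """Hypothetical helper: delete dst_dir if it exists, recreate it, and copy
    each file in file_names from src_dir into it. Returns the number of files
    now in dst_dir."""
    # Delete the destination folder so that reruns start from a clean state
    if os.path.exists(dst_dir):
        shutil.rmtree(dst_dir)
    # Create the folder together with any missing parent directories
    os.makedirs(dst_dir)
    for name in file_names:
        shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return len(os.listdir(dst_dir))
```

For example, `recreate_and_copy('extra_images/extra_eczema/', 'extra_images/extra_eczema_images_clean/', extra_eczema_images)` would reproduce the cell above and can safely be rerun.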

In [30]:
# Creating a dataframe for the extra eczema images, all labelled 'eczema'

extra_eczema_df = pd.DataFrame({'skin_disorder_name': 'eczema',
                                'images': extra_eczema_images})
print(extra_eczema_df.shape)
extra_eczema_df.head()

(736, 2)


Unnamed: 0,skin_disorder_name,images
0,eczema,eczema234.jpg
1,eczema,eczema289.jpg
2,eczema,eczema301.jpg
3,eczema,eczema403.jpg
4,eczema,eczema500.jpg


**i. Copying eczema images from the Images folder into their own folder**

In [31]:
# Getting the eczema images file names
eczema_img = [image_name for image_name in os.listdir('Images/') if ('eczema' in image_name) |
                                                                    ('atopic dermatitis images' in image_name) |
                                                                    ('hand dermatitis images' in image_name) | 
                                                                    (image_name.startswith('dermatitis images'))|
                                                                    ('nummular dermatitis images' in image_name)
                                                                     ] 

# Confirming the number of eczema images before any cleaning
print('There are', len(eczema_img),'eczema images.')
eczema_img[:5]

There are 631 eczema images.


['atopic dermatitis images1058.jpg',
 'atopic dermatitis images1059.jpg',
 'atopic dermatitis images1060.jpg',
 'atopic dermatitis images1061.jpg',
 'atopic dermatitis images1062.jpg']

In [32]:
# Creating a new folder with just eczema images to make cleaning easier
folder_name = 'cleaned_images/eczema_images/'

# Note📝: For reproducibility of the code, this step is important.
# If the folder is not deleted first, os.mkdir will raise an error when this cell is rerun.
# Checking if the folder exists and deleting it if it exists         
if os.path.exists(folder_name):
    # deleting the folder and its contents
    shutil.rmtree(folder_name)

# create the new folder
os.mkdir(folder_name)

# Copying the images into that folder
for img in eczema_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join(folder_name, img)
    shutil.copy(origin, destination)

In [33]:
# Confirming that the number of eczema images after moving them to a separate folder is still 631
eczema_img = [image_name for image_name in os.listdir('cleaned_images/eczema_images/')] 
print('There are', len(eczema_img),'eczema images.')

There are 631 eczema images.


**ii. Replacing the image links in the 'images' column of eczema_df with the image file names**

In [34]:
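# So that the two dataframes can match, we replace the image links in eczema_df with the image file names.
# Note: the names are assigned by position, so this relies on os.listdir returning the files in the same order as the rows.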

eczema_images = pd.DataFrame(eczema_img, columns=['images'])
eczema_df = eczema_df.copy()
eczema_df.drop('images', axis=1, inplace=True)
eczema_df['images'] = eczema_images['images'].values
eczema_df.head()

Unnamed: 0,skin_disorder_name,images
1058,atopic dermatitis images,atopic dermatitis images1058.jpg
1059,atopic dermatitis images,atopic dermatitis images1059.jpg
1060,atopic dermatitis images,atopic dermatitis images1060.jpg
1061,atopic dermatitis images,atopic dermatitis images1061.jpg
1062,atopic dermatitis images,atopic dermatitis images1062.jpg
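Assigning file names by position only works if `os.listdir` returns the files in the same order as the dataframe rows, and Python does not guarantee any particular order. The output above suggests each scraped file is named as its label followed by its row index (for example, `atopic dermatitis images1058.jpg` at index 1058). Assuming that convention holds for every file, a sketch that matches on that key instead; `attach_file_names` is a hypothetical helper:

```python
import os
import pandas as pd


def attach_file_names(df, file_names):
    """Return a copy of df whose 'images' column holds the file whose stem equals
    skin_disorder_name + row index. Assumes files are named '<label><index>.<ext>';
    raises KeyError if a matching file is missing."""
    # Map each file stem (name without extension) to the full file name
    stem_to_file = {os.path.splitext(name)[0]: name for name in file_names}
    # Build the expected stem for every row from its label and index
    keys = [f'{label}{idx}' for idx, label in df['skin_disorder_name'].items()]
    out = df.copy()
    out['images'] = [stem_to_file[key] for key in keys]
    return out
```

Usage: `eczema_df = attach_file_names(eczema_df, eczema_img)`. A `KeyError` would flag a row whose image was never saved, rather than silently shifting every later name by one.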


**iii. Joining the two dataframes**

In [35]:
# Creating a dataframe with all of the eczema images

eczema_df_complete = pd.concat([eczema_df, extra_eczema_df], axis=0).reset_index(drop=True)
eczema_df_complete.info()
eczema_df_complete.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1367 entries, 0 to 1366
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  1367 non-null   object
 1   images              1367 non-null   object
dtypes: object(2)
memory usage: 21.5+ KB


Unnamed: 0,skin_disorder_name,images
0,atopic dermatitis images,atopic dermatitis images1058.jpg
1,atopic dermatitis images,atopic dermatitis images1059.jpg
2,atopic dermatitis images,atopic dermatitis images1060.jpg
3,atopic dermatitis images,atopic dermatitis images1061.jpg
4,atopic dermatitis images,atopic dermatitis images1062.jpg


**iv. Combining the images into one folder**

In [36]:
# Copy the extra images into the eczema folder
for img in extra_eczema_images:
    origin = os.path.join('extra_images/extra_eczema_images_clean/', img)
    destination = os.path.join('cleaned_images/eczema_images/', img)
    shutil.copy(origin, destination)

In [37]:
# Confirming that the total number of eczema images is 1367 before any cleaning

eczema_img = [image_name for image_name in os.listdir('cleaned_images/eczema_images/')] 
print('There are a total of', len(eczema_img),'eczema images.')

There are a total of 1367 eczema images.


**v. Removing duplicated images from the folder**

In [38]:
# Using a function created earlier to drop duplicates

duplicated_images = drop_duplicated_images('cleaned_images/eczema_images/')
duplicated_images[:5]

['atopic eczema images1147.jpg',
 'atopic eczema images1148.jpg',
 'atopic eczema images1149.jpg',
 'atopic eczema images1150.jpg',
 'atopic eczema images1151.jpg']

In [39]:
# Confirming the number of images after dropping duplicates

eczema_img = [image_name for image_name in os.listdir('cleaned_images/eczema_images/')] 
print('There are', len(eczema_img),'eczema images after removing duplicated images.')

There are 1000 eczema images after removing duplicated images.


In [40]:
# Getting the indexes of the duplicated images so that they can be dropped from the eczema_df_complete too.

duplicated_indexes = [eczema_df_complete[eczema_df_complete['images'] == image_name].index[0] \
                      for image_name in eczema_df_complete['images']\
                      if image_name in duplicated_images]
duplicated_indexes[:10]

[89, 90, 91, 92, 93, 94, 95, 96, 97, 98]

In [41]:
# Dropping duplicated images from the dataframe.
eczema_df_complete = eczema_df_complete.copy()
eczema_df_complete.drop(duplicated_indexes, axis=0, inplace=True)
eczema_df_complete.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 1366
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  1000 non-null   object
 1   images              1000 non-null   object
dtypes: object(2)
memory usage: 23.4+ KB
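The index lookup in In [40] scans the whole dataframe once per image name. Assuming the image names in the dataframe are unique (which `.index[0]` already relies on), `Series.isin` builds the same selection as a single vectorized mask. A sketch with a hypothetical helper name:

```python
import pandas as pd


def drop_rows_for_files(df, removed_files, column='images'):
    """Return df without the rows whose `column` value is in removed_files."""
    # isin builds a boolean mask in one vectorized pass; ~ inverts it to keep the rest
    keep = ~df[column].isin(set(removed_files))
    return df[keep]
```

Usage: `eczema_df_complete = drop_rows_for_files(eczema_df_complete, duplicated_images)`. The original index labels are preserved, as with `drop`.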


**vi. Changing the label to just eczema**

In [42]:
eczema_df_complete['skin_disorder_name'] = 'eczema'
print(eczema_df_complete.shape)
eczema_df_complete.head()

(1000, 2)


Unnamed: 0,skin_disorder_name,images
0,eczema,atopic dermatitis images1058.jpg
1,eczema,atopic dermatitis images1059.jpg
2,eczema,atopic dermatitis images1060.jpg
3,eczema,atopic dermatitis images1061.jpg
4,eczema,atopic dermatitis images1062.jpg


**vii. Saving the eczema_df_complete dataframe as a csv file**

In [43]:
eczema_df_complete.to_csv('cleaned_data/eczema.csv', index=False)

## Actinic keratosis

**Meaning** <br>
Actinic keratosis (AK) is a skin condition caused by long-term exposure to UV rays, which results in rough, scaly patches on the skin. It is considered a precancerous condition because it can develop into squamous cell carcinoma, a type of skin cancer.

**Causes** <br>
The primary cause of actinic keratosis is long-term exposure to UV rays from the sun or other sources such as tanning beds. People with fair skin, light-colored hair, and light-colored eyes are at a higher risk of developing AK. Other risk factors include a history of frequent sunburns, a weakened immune system, and exposure to chemicals such as coal tar or arsenic.

**Symptoms** <br>
The most common symptom of actinic keratosis is the formation of rough, scaly patches or lesions on the skin. These patches can be pink, red, or brown in color and may feel like sandpaper. They are usually found on areas of the skin that are frequently exposed to the sun, such as the face, scalp, ears, neck, hands, and arms. In some cases, the patches may itch or burn, and they may become inflamed or bleed if they are scratched or rubbed.

**Treatment** <br>
The treatment of actinic keratosis depends on the severity of the condition. Mild cases may be treated with topical creams or gels that contain medications such as imiquimod, fluorouracil, or diclofenac. These medications work by stimulating the immune system or causing the abnormal cells to die off. In more severe cases, cryotherapy (freezing the lesions with liquid nitrogen) or curettage (scraping off the lesions with a special tool) may be necessary. In rare cases where the lesions have developed into skin cancer, surgical removal may be required. It is also important to take steps to prevent further damage to the skin, such as wearing protective clothing and sunscreen, avoiding tanning beds, and staying out of the sun during peak hours.


In [44]:
# image labels with the name keratosis
print(image_df[image_df['skin_disorder_name'].str.contains('keratosis')]['skin_disorder_name'].unique())

['actinic keratosis affecting the face images'
 'actinic keratosis affecting the hand images'
 'actinic keratosis affecting the legs and feet images'
 'actinic keratosis affecting the scalp images'
 'actinic keratosis dermoscopy images'
 'actinic keratosis on the nose images'
 'actinic keratosis treated with imiquimod images'
 'granular parakeratosis images' 'keratosis pilaris images'
 'seborrhoeic keratosis dermoscopy images' 'seborrhoeic keratosis images'
 'solar keratosis affecting the face images'
 'solar keratosis affecting the hand images'
 'solar keratosis affecting the legs and feet images'
 'solar keratosis affecting the scalp images'
 'solar keratosis on the nose images'
 'solar keratosis treated with imiquimod images']


Actinic keratosis is also known as solar keratosis or senile keratosis, so both the actinic keratosis and solar keratosis labels are kept.

In [45]:
# dataframe with actinic keratosis and solar keratosis labels
keratosis_df = image_df[(image_df['skin_disorder_name'].str.contains('actinic keratosis')) | \
                  (image_df['skin_disorder_name'].str.contains('solar keratosis'))]
print(keratosis_df.shape)
keratosis_df.head(2)

(427, 2)


Unnamed: 0,skin_disorder_name,images
504,actinic keratosis affecting the face images,https://dermnetnz.org/assets/Uploads/lesions/a...
505,actinic keratosis affecting the face images,https://dermnetnz.org/assets/Uploads/lesions/a...
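Because `Series.str.contains` interprets its pattern as a regular expression by default, the two synonym filters in In [45] can also be written as a single alternation. A small sketch on a few labels copied from the keratosis list above:

```python
import pandas as pd

# A few labels copied from the keratosis label list above
labels = pd.Series([
    'actinic keratosis affecting the face images',
    'solar keratosis on the nose images',
    'seborrhoeic keratosis images',
    'keratosis pilaris images',
])

# The '|' alternation matches either synonym (regex matching is the default)
ak_labels = labels[labels.str.contains('actinic keratosis|solar keratosis')].tolist()
print(ak_labels)
```

Applied to the scraped data, `image_df[image_df['skin_disorder_name'].str.contains('actinic keratosis|solar keratosis')]` selects the same rows as the two combined conditions.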


In [46]:
# extra keratosis dataframes
#first dataframe
df = pd.read_csv('Data/ISIC_2019_Training_GroundTruth.csv')

# filter df to get rows where AK = 1.0
df1 = df.copy()
df1 = df1[df1['AK'] == 1.0]
df1['skin_disorder_name'] = df1['images']

# drop the diagnosis columns from df1, keeping only 'images' and 'skin_disorder_name'
df1 = df1.drop(['MEL', 'NV', 'BCC', 'AK', 'BKL', 'DF', 'VASC', 'SCC', 'UNK'], axis=1)

# Loop through each .jpg file in the folder, using the file name without its extension as the skin_disorder_name
img_names = []
image_paths = []

for file in os.listdir('extra_images/extra_actinic_keratosis_images'):
    if file.endswith(".jpg"):
        skin_disorder_name = file.split(".")[0]
        img_names.append(skin_disorder_name)
        image_paths.append(file)

# Create a second Pandas DataFrame with the list of skin disorder names
df2 = pd.DataFrame({"skin_disorder_name": img_names, "images": image_paths})

# Merge the dataframes
AK_df = pd.concat([keratosis_df, df1, df2], axis=0)

# display the merged dataframe
print(AK_df.shape)
AK_df.head(4)

(1391, 2)


Unnamed: 0,skin_disorder_name,images
504,actinic keratosis affecting the face images,https://dermnetnz.org/assets/Uploads/lesions/a...
505,actinic keratosis affecting the face images,https://dermnetnz.org/assets/Uploads/lesions/a...
506,actinic keratosis affecting the face images,https://dermnetnz.org/assets/Uploads/lesions/a...
507,actinic keratosis affecting the face images,https://dermnetnz.org/assets/Uploads/lesions/a...


In [47]:
# Getting the keratosis images file names
keratosis_img = [image_name for image_name in os.listdir('Images/') if ('actinic keratosis' in image_name) | ('solar keratosis' in image_name)]
AK_img = [image_name for image_name in os.listdir('extra_images/extra_AK_and_BKL_images') if any(x in image_name for x in df1['images'].tolist())]
AK_img2 = [image_name for image_name in os.listdir('extra_images/extra_actinic_keratosis_images')]

# Checking if the folder exists and deleting it if it exists
if os.path.exists('cleaned_images/keratosis_images/'):
    # deleting the folder and its contents
    shutil.rmtree('cleaned_images/keratosis_images/')
    
# Creating a new folder with just keratosis images to make cleaning easier
os.mkdir('cleaned_images/keratosis_images/')
for img in keratosis_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join('cleaned_images/keratosis_images/', img)
    shutil.copy(origin, destination)

for img in AK_img:
    origin = os.path.join('extra_images/extra_AK_and_BKL_images/', img)
    destination = os.path.join('cleaned_images/keratosis_images/', img)
    shutil.copy(origin, destination)

for img in AK_img2:
    origin = os.path.join('extra_images/extra_actinic_keratosis_images/', img)
    destination = os.path.join('cleaned_images/keratosis_images/', img)
    shutil.copy(origin, destination)
    
# Confirming that the number of keratosis images after copying them to a separate folder is 1391, matching the rows of AK_df
keratosis_img = [image_name for image_name in os.listdir('cleaned_images/keratosis_images/')] 
print('There are', len(keratosis_img),'actinic keratosis images')

There are 1391 actinic keratosis images
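The `any(x in image_name for x in ...)` filter in In [47] compares every file name against every ISIC ID as a substring, so its cost grows with the product of the two list sizes. Assuming each ISIC file is named as its ID plus an extension (e.g. `ISIC_0072940.jpg`), an exact lookup of the file stem in a set takes linear time. Note this is stricter than a substring match: a file whose name merely contains an ID would no longer be selected. `select_by_ids` is a hypothetical helper:

```python
import os


def select_by_ids(file_names, ids):
    """Keep the file names whose stem (name without extension) is exactly one of ids."""
    wanted = set(ids)  # set membership is O(1) on average
    return [name for name in file_names if os.path.splitext(name)[0] in wanted]
```

Usage: `AK_img = select_by_ids(os.listdir('extra_images/extra_AK_and_BKL_images'), df1['images'])`.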


In [48]:
# call function to drop duplicates from image folder 
duplicated_images = drop_duplicated_images('cleaned_images/keratosis_images/')
duplicated_images[184:188]

['ISIC_0072940.jpg',
 'ISIC_0073068.jpg',
 'ISIC_0073157.jpg',
 'ISIC_0073198.jpg']

In [49]:
# number of images after removing duplicates
keratosis_img = [image_name for image_name in os.listdir('cleaned_images/keratosis_images/')] 
print('Number of actinic keratosis images after removing duplicated images:', len(keratosis_img))

# Remove duplicate images from the dataframe
# For the DermNet rows (the first keratosis_df), the image file name is the skin_disorder_name with the row index appended
mask = AK_df['skin_disorder_name'].str.contains('actinic keratosis') | AK_df['skin_disorder_name'].str.contains('solar keratosis')
AK_df.loc[mask, 'skin_disorder_name'] = AK_df.loc[mask, 'skin_disorder_name'] + AK_df.loc[mask].index.astype(str)
AK_df['skin_disorder_name'] = AK_df['skin_disorder_name'].apply(lambda x: x + '.jpg')
duplicated_df = AK_df[AK_df['skin_disorder_name'].isin(duplicated_images)]
merged_df = AK_df.merge(duplicated_df, on="skin_disorder_name", how="outer", indicator=True)
AK_df = merged_df.loc[merged_df["_merge"]=="left_only"].drop_duplicates(subset=["skin_disorder_name"]).drop(columns=['images_y', '_merge'])
print(f'Shape of Actinic keratosis dataframe: {AK_df.shape}')

Number of actinic keratosis images after removing duplicated images: 1000
Shape of Actinic keratosis dataframe: (1000, 2)


After removing duplicates, the image folder and the dataframe both contain 1000 entries.

In [50]:
# set every value of the skin_disorder_name column to 'actinic keratosis'
AK_df['skin_disorder_name'] = 'actinic keratosis'
AK_df = AK_df.rename(columns={'images_x': 'images'})
AK_df.head(4)

Unnamed: 0,skin_disorder_name,images
0,actinic keratosis,https://dermnetnz.org/assets/Uploads/lesions/a...
1,actinic keratosis,https://dermnetnz.org/assets/Uploads/lesions/a...
2,actinic keratosis,https://dermnetnz.org/assets/Uploads/lesions/a...
3,actinic keratosis,https://dermnetnz.org/assets/Uploads/lesions/a...
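The merge with `indicator=True` in In [49] keeps only the `left_only` rows, which is an anti-join. When the goal is simply to drop rows whose key appears in a list, a `~isin` mask yields the same rows without the `_x`/`_y` suffixes to clean up. One difference to keep in mind: an outer merge sorts rows by the join key, so row order can differ from the original frame, and `isin` does not remove repeated keys the way the extra `drop_duplicates` does. A toy comparison:

```python
import pandas as pd

df = pd.DataFrame({'skin_disorder_name': ['a.jpg', 'b.jpg', 'c.jpg'],
                   'images': ['u1', 'u2', 'u3']})
duplicated = ['b.jpg']

# Anti-join via merge: keep rows that exist only in the left frame
dup_df = df[df['skin_disorder_name'].isin(duplicated)]
merged = df.merge(dup_df, on='skin_disorder_name', how='outer', indicator=True)
via_merge = (merged.loc[merged['_merge'] == 'left_only']
             .drop(columns=['images_y', '_merge'])
             .rename(columns={'images_x': 'images'})
             .reset_index(drop=True))

# Same selection with a boolean mask
via_isin = df[~df['skin_disorder_name'].isin(duplicated)].reset_index(drop=True)

print(via_merge.equals(via_isin))
```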


In [51]:
# save AK dataframe to csv file
AK_df.to_csv('cleaned_data/AK.csv', index=False)

## Benign Keratosis-like Lesions

**Meaning** <br>
Benign Keratosis-like Lesions (BKL) are a group of benign skin lesions that resemble actinic keratosis (AK) but are not classified as AK because they do not have the same degree of dysplasia. BKL lesions can appear as small, scaly, or waxy bumps on the skin, ranging in color from light tan to dark brown. They typically occur on areas of the skin that have been exposed to the sun, such as the face, neck, scalp, and hands. Examples of BKL lesions include seborrheic keratosis, solar lentigo, and lichen planus-like keratosis.
 
**Causes** <br>
The exact cause of BKL is not known, but it is believed to be related to long-term sun exposure. Other factors that may contribute to the development of BKL include a weakened immune system, age, and a history of other skin conditions.

**Symptoms** <br>
BKL lesions typically appear as small, scaly, or waxy bumps on the skin. They may be light tan to dark brown in color and may have a rough, textured surface. They can be single or multiple and can occur on any part of the body, but are most commonly found on the face, neck, scalp, and hands.

**Treatment** <br>
BKL lesions are usually benign and do not require treatment unless they are causing symptoms or affecting the patient's appearance. Treatment options may include cryotherapy (freezing the lesion with liquid nitrogen), curettage (scraping the lesion off the skin), or topical medications such as 5-fluorouracil or imiquimod. In some cases, BKL lesions may be biopsied to confirm the diagnosis or rule out other skin conditions. It is important to protect the skin from sun exposure and to seek medical attention for any suspicious skin lesions.

In [52]:
# filter df to get rows where BKL = 1.0
BKL_df = df.copy()
BKL_df = BKL_df[BKL_df['BKL'] == 1.0]
BKL_df = BKL_df[~BKL_df["images"].str.contains("downsampled")]
BKL_df['skin_disorder_name'] = BKL_df['images']

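# drop the diagnosis columns, keeping only 'images' and 'skin_disorder_name'
BKL_df = BKL_df.drop(['MEL', 'NV', 'BCC', 'AK', 'BKL', 'DF', 'VASC', 'SCC', 'UNK'], axis=1)
# keep the first 1003 rows; 3 of these images turn out to be duplicates (removed below), leaving 1000
BKL_df = BKL_df.iloc[:1003]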

# display the dataframe
print(BKL_df.shape)
BKL_df.head(3)

(1003, 2)


Unnamed: 0,images,skin_disorder_name
1008,ISIC_0010491,ISIC_0010491
1544,ISIC_0012811,ISIC_0012811
1619,ISIC_0012998,ISIC_0012998


In [53]:
# Getting the BKL images file names
BKL_img = [image_name for image_name in os.listdir('extra_images/extra_AK_and_BKL_images') if any(x in image_name for x in BKL_df['images'].tolist())]

# Checking if the folder exists and deleting it if it exists
if os.path.exists('cleaned_images/BKL_images/'):
    # deleting the folder and its contents
    shutil.rmtree('cleaned_images/BKL_images/')
    
# Creating a new folder with just BKL images to make cleaning easier
os.mkdir('cleaned_images/BKL_images/')
for img in BKL_img:
    origin = os.path.join('extra_images/extra_AK_and_BKL_images/', img)
    destination = os.path.join('cleaned_images/BKL_images/', img)
    shutil.copy(origin, destination)
    
# Confirming that the number of BKL images after copying them to a separate folder is still 1003
BKL_img = [image_name for image_name in os.listdir('cleaned_images/BKL_images/')] 
print('There are', len(BKL_img),'BKL images')

There are 1003 BKL images


In [54]:
# drop duplicates from image folder
duplicated_images2 = drop_duplicated_images('cleaned_images/BKL_images/')
duplicated_images2

['ISIC_0027218.jpg', 'ISIC_0031511.jpg', 'ISIC_0032315.jpg']

In [55]:
# number of images after removing duplicates
BKL_img = [image_name for image_name in os.listdir('cleaned_images/BKL_images/')] 
print('Number of BKL images after removing duplicated images:', len(BKL_img))

# Remove duplicate images from the dataframe
BKL_df['skin_disorder_name'] = BKL_df['skin_disorder_name'].apply(lambda x: x + '.jpg')
duplicated_df = BKL_df[BKL_df['skin_disorder_name'].isin(duplicated_images2)]
merged_df = BKL_df.merge(duplicated_df, on="skin_disorder_name", how="outer", indicator=True)
BKL_df = merged_df.loc[merged_df["_merge"]=="left_only"].drop_duplicates(subset=["skin_disorder_name"]).drop(columns=['images_y', '_merge'])
print(f'Shape of BKL dataframe: {BKL_df.shape}')

Number of BKL images after removing duplicated images: 1000
Shape of BKL dataframe: (1000, 2)


In [56]:
# set every value of the skin_disorder_name column to 'Benign Keratosis-like Lesions'
BKL_df['skin_disorder_name'] = 'Benign Keratosis-like Lesions'
BKL_df = BKL_df.rename(columns={'images_x': 'images'})
BKL_df.head(4)

Unnamed: 0,images,skin_disorder_name
0,ISIC_0010491,Benign Keratosis-like Lesions
1,ISIC_0012811,Benign Keratosis-like Lesions
2,ISIC_0012998,Benign Keratosis-like Lesions
3,ISIC_0024312,Benign Keratosis-like Lesions


In [57]:
# save BKL dataframe to csv file
BKL_df.to_csv('cleaned_data/BKL.csv', index=False)

## Melanoma

**Definition**


Melanoma is a disease in which malignant (cancer) cells form in melanocytes (cells that color the skin). There are different types of cancer that start in the skin. Melanoma can occur anywhere on the skin. Unusual moles, exposure to sunlight, and health history can affect the risk of melanoma.

The most common type of melanoma is superficial spreading melanoma. It tends to spread across the surface of the skin, has uneven borders, and varies in color from brown to black, pink, or red.

Nodular melanoma is another type that grows down into deeper layers of the skin and may appear as a raised bump or growth.

Lentigo maligna melanoma tends to appear on parts of the body that get more sun, especially the face, and it often affects older people. It looks like a large, uneven dark patch on the surface of the skin.

Metastatic melanoma occurs when the cancer spreads, or metastasizes, to other parts of the body, possibly including the lymph nodes, organs, or bones.

Other, rarer types of melanoma also exist. Although melanoma most commonly affects the skin, some types affect internal tissues or the eyes.

### Cleaning Melanoma Images

**Creating a dataframe with melanoma images from the data scraped from DermNet**

In [58]:
# Labels representing melanoma in DermNet's scraped data
melanoma_labels = image_df[image_df['skin_disorder_name'].str.contains('melanoma')]['skin_disorder_name'].unique()
melanoma_labels

array(['acral lentiginous melanoma images', 'amelanotic melanoma images',
       'hypomelanotic malignant melanoma images',
       'lentigo maligna melanoma images', 'melanoma in situ images',
       'melanoma of nail unit images', 'metastatic melanoma images',
       'nodular melanoma images', 'superficial spreading melanoma images'],
      dtype=object)

In [59]:
#there are 9 labels representing melanoma
len(melanoma_labels)

9

In [60]:
# Creating a dataframe with just melanoma labels for easier cleaning

melanoma_df = image_df[image_df['skin_disorder_name'].isin(melanoma_labels)]
melanoma_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 469 entries, 473 to 12721
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  469 non-null    object
 1   images              469 non-null    object
dtypes: object(2)
memory usage: 11.0+ KB


#### Copying Melanoma images to their own folder

In [61]:
# Getting the melanoma images file names
melanoma_img = [image_name for image_name in os.listdir('Images/') if 'melanoma' in image_name] 

# Confirming the number of melanoma images before any cleaning
print('There are', len(melanoma_img),'melanoma images')
melanoma_img[:10]

There are 469 melanoma images


['acral lentiginous melanoma images473.jpg',
 'acral lentiginous melanoma images474.jpg',
 'acral lentiginous melanoma images475.jpg',
 'acral lentiginous melanoma images476.png',
 'acral lentiginous melanoma images477.jpg',
 'acral lentiginous melanoma images478.jpg',
 'acral lentiginous melanoma images479.jpg',
 'acral lentiginous melanoma images480.jpg',
 'acral lentiginous melanoma images481.jpg',
 'acral lentiginous melanoma images482.png']

In [62]:
# Creating a new folder with just melanoma images to make cleaning easier
folder_name = 'cleaned_images/melanoma_images/'

# Note: For reproducibility of the code, this step is important.
# If the folder is not deleted first, an error will occur when this cell is rerun.

# Checking if the folder exists and deleting it if it exists
if os.path.exists(folder_name):
    # deleting the folder and its contents
    shutil.rmtree(folder_name)

# Create the new folder, along with the parent directory if it does not exist
os.makedirs(folder_name)

# Copying the images into that folder
for img in melanoma_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join(folder_name, img)
    shutil.copy(origin, destination)

In [63]:
# Confirming that the total number of melanoma images is 469 before any cleaning

melanoma_img = [image_name for image_name in os.listdir('cleaned_images/melanoma_images/')] 
print('There are a total of', len(melanoma_img),'melanoma images.')

There are a total of 469 melanoma images.


#### Removing duplicated images

In [64]:
def drop_duplicated_images(folder):
    """Delete every image in `folder` whose average hash is identical to that of an
    image already seen, and return the names of the deleted files."""

    # Dictionary mapping each hash value to the path of the first image with that hash
    image_hashes = {}
    duplicated_images = []

    # Loop through all the image files in the directory
    for filename in os.listdir(folder):
        path = os.path.join(folder, filename)

        # Load the image and compute its hash using the average hash algorithm;
        # the `with` block closes the file before it may be deleted below
        with Image.open(path) as image:
            hash_value = imagehash.average_hash(image)

        # Check if exactly the same hash value is already in the dictionary
        if hash_value in image_hashes:
            # If so, treat the image as a duplicate and delete it
            duplicated_images.append(filename)
            os.remove(path)
        else:
            # Otherwise, add the hash value and file path to the dictionary
            image_hashes[hash_value] = path
    return duplicated_images
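`drop_duplicated_images` removes an image only when its average hash is identical to one already seen. `imagehash`'s `ImageHash` objects also support subtraction, which returns the Hamming distance (the number of differing bits), so near-duplicates such as recompressed or slightly resized copies could be caught with a threshold like `h1 - h2 <= 8`. The sketch below shows that greedy threshold logic on plain integers standing in for 64-bit hashes, so it runs without image files; the function names and the threshold value are assumptions to tune, not part of the original pipeline.

```python
def hamming_distance(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count('1')


def find_near_duplicates(hashes, threshold=8):
    """Greedy pass over (name, hash) pairs: a name is flagged as a duplicate when
    its hash is within `threshold` bits of a hash that has already been kept."""
    kept = []        # hashes of the images kept so far
    duplicates = []  # names flagged as near-duplicates
    for name, h in hashes:
        if any(hamming_distance(h, k) <= threshold for k in kept):
            duplicates.append(name)
        else:
            kept.append(h)
    return duplicates
```

With real images, `hamming_distance(h, k)` would be replaced by `h - k` on `imagehash.average_hash` results. Unlike the dictionary lookup, each new image is compared against every kept hash, so the pass is quadratic in the number of images.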

In [65]:
# Dropping duplicates
duplicated_images = drop_duplicated_images('cleaned_images/melanoma_images/')
duplicated_images[:5]

['hypomelanotic malignant melanoma images6320.jpg',
 'hypomelanotic malignant melanoma images6321.jpg',
 'hypomelanotic malignant melanoma images6322.jpg',
 'hypomelanotic malignant melanoma images6323.jpg',
 'hypomelanotic malignant melanoma images6324.jpg']

The labels acral lentiginous melanoma images, amelanotic melanoma images, hypomelanotic malignant melanoma images, lentigo maligna melanoma images, melanoma in situ images, melanoma of nail unit images, metastatic melanoma images, nodular melanoma images and superficial spreading melanoma images all contain genuine images of melanoma, so we now standardize the image labels to **melanoma**.

#### Changing image labels to Melanoma

In [66]:
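# Re-read the folder so that the images deleted as duplicates are excluded
melanoma_img = os.listdir('cleaned_images/melanoma_images/')

# convert list to pandas DataFrame
melanoma_df = pd.DataFrame(melanoma_img, columns=['image_path'])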

# add new column with value 'melanoma'
melanoma_df['skin_disorder_name'] = 'melanoma'
melanoma_df.head()

Unnamed: 0,image_path,skin_disorder_name
0,acral lentiginous melanoma images473.jpg,melanoma
1,acral lentiginous melanoma images474.jpg,melanoma
2,acral lentiginous melanoma images475.jpg,melanoma
3,acral lentiginous melanoma images476.png,melanoma
4,acral lentiginous melanoma images477.jpg,melanoma


In [67]:
# save melanoma dataframe to csv file
melanoma_df.to_csv('cleaned_data/melanoma.csv',index=False)
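A quick sanity check after saving is to compare the file names listed in the CSV with the files actually present in the image folder, so that a stale list (for example, one captured before duplicates were deleted) is caught. A standard-library sketch; `compare_csv_with_folder` is a hypothetical helper and assumes the CSV stores one file name per row in the given column:

```python
import csv
import os


def compare_csv_with_folder(csv_path, column, folder):
    """Return (names in the CSV but not in the folder, files in the folder but not in the CSV)."""
    with open(csv_path, newline='') as f:
        in_csv = {row[column] for row in csv.DictReader(f)}
    in_folder = set(os.listdir(folder))
    return sorted(in_csv - in_folder), sorted(in_folder - in_csv)
```

Usage: `missing, extra = compare_csv_with_folder('cleaned_data/melanoma.csv', 'image_path', 'cleaned_images/melanoma_images/')`; both lists should be empty.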

## **<u>Psoriasis </u>**

**Meaning**<br>
Psoriasis is a chronic autoimmune skin disorder characterized by the rapid buildup of skin cells that form thick, silvery scales and itchy, dry, and red patches on the skin. It is a non-contagious condition that can affect any part of the body, including the scalp, nails, and joints. Psoriasis occurs when the immune system mistakenly attacks healthy skin cells, causing the skin cells to grow too quickly and accumulate on the skin's surface. The condition is typically lifelong and can vary in severity from mild to severe. While there is no cure for psoriasis, there are treatments available that can help manage symptoms and improve quality of life. <br>

**Causes**<br>
The exact cause of psoriasis is not fully understood, but it is believed to be a combination of genetic, environmental, and immune system factors. Some of the known factors that can trigger or exacerbate psoriasis include:<br>

<ol>
  <li>Genetics: Psoriasis tends to run in families, suggesting a genetic component to the condition.</li>
  <li>Immune system dysfunction: Psoriasis is believed to be an autoimmune disorder, in which the immune system mistakenly attacks healthy skin cells, causing inflammation and other symptoms.</li>
  <li>Environmental factors: Certain environmental factors can trigger or worsen psoriasis, such as infections, injuries to the skin, stress, smoking, and alcohol consumption.</li>
  <li>Medications: Certain medications, such as lithium, beta-blockers, and antimalarials, can trigger or worsen psoriasis.</li>
  <li>Hormonal changes: Changes in hormone levels, such as those that occur during puberty, pregnancy, and menopause, can trigger or worsen psoriasis.</li>
</ol>

**Symptoms**<br>
Psoriasis symptoms can vary depending on the type and severity of the condition, but some common symptoms include:<br>
<ol>
  <li>Red, inflamed patches of skin: These patches may be covered with thick, silvery scales that may flake off or bleed if scratched.</li>
  <li>Dry, cracked skin: The affected skin may be dry and itchy, and may crack and bleed in severe cases.</li>
  <li>Thickened, pitted, or ridged nails: Psoriasis can affect the nails, causing them to become thickened, discolored, pitted, or ridged.</li>
  <li>Joint pain and stiffness: In some cases, psoriasis can also cause joint pain and stiffness, a condition called psoriatic arthritis.</li>
  <li>Itching and burning: Psoriasis patches may be itchy and burning, which can cause discomfort and distress.</li>
  <li>Soreness or discomfort: Psoriasis patches can be painful and tender to the touch.</li>
</ol>


**Treatment**<br>
There is no cure for psoriasis, but treatment can help manage symptoms and reduce flare-ups. The choice of treatment depends on the type and severity of the condition. Common options include:<br>
<ol>
  <li>Topical medications: These are creams, ointments, gels, or foams that are applied directly to the affected skin to reduce inflammation and itching. Topical medications may include corticosteroids, vitamin D analogues, retinoids, and tar preparations.</li>
  <li>Phototherapy: This involves exposing the skin to ultraviolet light to slow down the growth of affected skin cells and reduce inflammation. Phototherapy can be done in a doctor's office or at home using a special light box.</li>
  <li>Systemic medications: These are medications that are taken orally or by injection to suppress the immune system and reduce inflammation. Systemic medications may include methotrexate, cyclosporine, and biologics.</li>
  <li>Lifestyle changes: Making changes to your diet, reducing stress, and avoiding triggers such as smoking and alcohol consumption may help to reduce the frequency and severity of psoriasis flare-ups.</li>
  <li>Moisturizers: Applying moisturizers regularly can help to soothe dry, itchy skin and reduce the risk of flare-ups.</li>
</ol>

### **Cleaning Psoriasis images**


**Creating a dataframe with psoriasis images from the data scraped from DermNet**

In [68]:
# Labels representing psoriasis in DermNet's scraped data
psoriasis_labels = list(image_df[image_df['skin_disorder_name'].str.contains('psoriasis')]['skin_disorder_name'].unique())

psoriasis_labels

['chronic plaque psoriasis images',
 'facial psoriasis images',
 'flexural psoriasis images',
 'generalised pustular psoriasis images',
 'genital psoriasis images',
 'guttate psoriasis images',
 'nail psoriasis images',
 'palmoplantar psoriasis images',
 'psoriasis affecting the face images',
 'psoriasis of the scalp images',
 'pustular psoriasis of the hand and feet images']

In [69]:
# Count of labels representing psoriasis
len(psoriasis_labels)

11

In [70]:
# Creating a dataframe with just psoriasis labels for easier cleaning
# (isin keeps every row whose label is one of the 11 psoriasis labels)
psoriasis_df = image_df[image_df['skin_disorder_name'].isin(psoriasis_labels)]
psoriasis_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 522 entries, 2796 to 10354
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  522 non-null    object
 1   images              522 non-null    object
dtypes: object(2)
memory usage: 12.2+ KB
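The cell above keeps only the rows whose label is one of several values. `Series.isin` performs that membership test in a single call; the toy sketch below, with made-up labels and URLs, checks that it selects the same rows as `==` comparisons joined with `|`.

```python
import pandas as pd

# Toy stand-in for image_df; labels and URLs are made up for illustration
toy_df = pd.DataFrame({
    'skin_disorder_name': ['guttate psoriasis images', 'acne images',
                           'nail psoriasis images', 'tinea pedis images'],
    'images': ['u1', 'u2', 'u3', 'u4'],
})

# Collect the labels the same way psoriasis_labels is built
labels = list(toy_df.loc[toy_df['skin_disorder_name'].str.contains('psoriasis'),
                         'skin_disorder_name'].unique())

# Chained == comparisons joined with |
mask_chain = pd.Series(False, index=toy_df.index)
for label in labels:
    mask_chain = mask_chain | (toy_df['skin_disorder_name'] == label)

# One vectorised membership test
mask_isin = toy_df['skin_disorder_name'].isin(labels)

print(mask_chain.equals(mask_isin))          # True
print(toy_df[mask_isin]['images'].tolist())  # ['u1', 'u3']
```

`str.contains` is case-sensitive by default, which works here because the scraped labels are all lowercase.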


### **Extra Psoriasis images**

In [71]:
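# Copy the extra psoriasis images (file names containing 'psor') into their own folder
source_dir = 'extra_images/extra_psoriasis_images'
dest_dir = 'extra_images/psoriasis_images'

# Create the destination folder (and any missing parent folders) if it does not exist
os.makedirs(dest_dir, exist_ok=True)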

for filename in os.listdir(source_dir):
    if 'psor' in filename:
        shutil.copy(os.path.join(source_dir, filename), dest_dir)

In [72]:
extra_psoriasis = [image_name for image_name in os.listdir('extra_images/psoriasis_images')]
extra_psoriasis[:5]

['t-08psoriasisIntertrig020606.jpg',
 't-08psoriasisIntertrig0206061.jpg',
 't-psoriasis-Chronic-plaque-00325.jpg',
 't-psoriasis-digits-1.jpg',
 't-psoriasis-digits-3.jpg']

In [73]:
# Creating a dataframe for the extra psoriasis images

label = ['psoriasis' for img in extra_psoriasis]
extra_psoriasis_df = pd.DataFrame(extra_psoriasis, label).reset_index()
extra_psoriasis_df.columns = ['skin_disorder_name', 'images']
extra_psoriasis_df.head()

Unnamed: 0,skin_disorder_name,images
0,psoriasis,t-08psoriasisIntertrig020606.jpg
1,psoriasis,t-08psoriasisIntertrig0206061.jpg
2,psoriasis,t-psoriasis-Chronic-plaque-00325.jpg
3,psoriasis,t-psoriasis-digits-1.jpg
4,psoriasis,t-psoriasis-digits-3.jpg
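The extra images are labelled by listing a folder. A hypothetical helper, `folder_to_label_df`, sketches the same step with two additions that are assumptions rather than part of the original workflow: it keeps only common image extensions, and it sorts the file names so the row order does not depend on `os.listdir`, whose order is arbitrary.

```python
import os
import tempfile

import pandas as pd


def folder_to_label_df(folder, label, extensions=('.jpg', '.jpeg', '.png')):
    """Return one (label, file name) row per image file in `folder` (hypothetical helper)."""
    # Sorting makes the row order reproducible across machines and runs
    names = sorted(name for name in os.listdir(folder)
                   if name.lower().endswith(extensions))
    return pd.DataFrame({'skin_disorder_name': label, 'images': names})


# Demo on a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ['b_psoriasis.jpg', 'a_psoriasis.jpg', 'notes.txt']:
        open(os.path.join(tmp, name), 'w').close()
    demo_df = folder_to_label_df(tmp, 'psoriasis')

print(demo_df)
```

The non-image file `notes.txt` is skipped, and the two images come out in alphabetical order.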


**i. Moving psoriasis images in the Images folder to their own folder**

In [74]:
# Getting the psoriasis image file names (a file matches if it contains any psoriasis label)
original_psoriasis_img = [image_name for image_name in os.listdir('Images/')
                          if any(label in image_name for label in psoriasis_labels)]

# Confirming the number of psoriasis images before any cleaning
print('There are', len(original_psoriasis_img), 'psoriasis images')
original_psoriasis_img[:5]

There are 522 psoriasis images


['chronic plaque psoriasis images2796.jpg',
 'chronic plaque psoriasis images2797.jpg',
 'chronic plaque psoriasis images2798.jpg',
 'chronic plaque psoriasis images2799.jpg',
 'chronic plaque psoriasis images2800.jpg']

In [75]:
# Creating a new folder with just psoriasis images to make cleaning easier
folder_name = 'cleaned_images/psoriasis_images/'

# Note📝: For reproducibility of the code, this step is important.
# If the folder is not deleted first, rerunning this cell raises FileExistsError.

# Checking if the folder exists and deleting it if it exists
if os.path.exists(folder_name):
    # deleting the folder and its contents
    shutil.rmtree(folder_name)

# Create the new folder (and the parent 'cleaned_images' folder if it is missing)
os.makedirs(folder_name)

# Copying the images into that folder
for img in original_psoriasis_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join(folder_name, img)
    shutil.copy(origin, destination)

In [76]:
# Confirming that the number of psoriasis images after copying them to a separate folder is still 522
psoriasis_img = [image_name for image_name in os.listdir('cleaned_images/psoriasis_images/')] 
print('There are', len(psoriasis_img),'psoriasis images.')

There are 522 psoriasis images.
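Deleting and recreating the destination folder is what makes the copy cell safe to rerun. The sketch below wraps that pattern in a hypothetical `rebuild_folder` helper and runs it twice on a temporary folder with made-up file names, to show that the second run neither fails nor leaves extra files behind.

```python
import os
import shutil
import tempfile


def rebuild_folder(src_dir, dst_dir, file_names):
    """Recreate dst_dir, copy file_names into it and return the resulting file count
    (hypothetical helper)."""
    if os.path.exists(dst_dir):
        shutil.rmtree(dst_dir)          # start from an empty folder on every run
    os.makedirs(dst_dir)                # also creates missing parent folders
    for name in file_names:
        shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return len(os.listdir(dst_dir))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'Images')
    dst = os.path.join(tmp, 'cleaned_images', 'psoriasis_images')
    os.makedirs(src)
    names = [f'guttate psoriasis images{i}.jpg' for i in range(3)]
    for name in names:
        open(os.path.join(src, name), 'w').close()
    first_run = rebuild_folder(src, dst, names)
    second_run = rebuild_folder(src, dst, names)   # rerunning does not raise

print(first_run, second_run)  # 3 3
```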


**ii. Dropping links from the 'images' column in the psoriasis_df and replacing them with the image name**

In [77]:
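# So that the two dataframes can match, we dropped the image links in psoriasis_df
# and replaced them with the image names.
# Note: names are assigned by position, so this assumes os.listdir (whose order is arbitrary)
# returned the files in the same order as the rows of psoriasis_df; the first rows below agree.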

psoriasis_images = pd.DataFrame(psoriasis_img, columns=['images'])
psoriasis_df = psoriasis_df.copy()
psoriasis_df.drop('images', axis=1, inplace=True)
psoriasis_df['images'] = psoriasis_images['images'].values
psoriasis_df.head()

Unnamed: 0,skin_disorder_name,images
2796,chronic plaque psoriasis images,chronic plaque psoriasis images2796.jpg
2797,chronic plaque psoriasis images,chronic plaque psoriasis images2797.jpg
2798,chronic plaque psoriasis images,chronic plaque psoriasis images2798.jpg
2799,chronic plaque psoriasis images,chronic plaque psoriasis images2799.jpg
2800,chronic plaque psoriasis images,chronic plaque psoriasis images2800.jpg
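Assigning `psoriasis_images['images'].values` pairs names with rows by position. A name-based alternative is sketched below. It assumes, from the file names listed above, that every scraped image was saved as `<label><row index>.jpg`; it parses each name with a regular expression and looks rows up by `(label, index)` instead of by position. The pattern and helper names are illustrative, not part of the original notebook.

```python
import re

import pandas as pd

# Assumption: scraped files are named '<label><row index>.jpg'
FILENAME_PATTERN = re.compile(r'^(?P<label>.*?)(?P<index>\d+)\.jpg$')


def parse_image_name(name):
    """Split e.g. 'nail psoriasis images8573.jpg' into (label, row index); None if no match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group('label'), int(match.group('index'))


# Toy rows plus a deliberately shuffled folder listing
toy_df = pd.DataFrame({'skin_disorder_name': ['nail psoriasis images'] * 2 +
                                             ['guttate psoriasis images']},
                      index=[8573, 8574, 6001])
listing = ['guttate psoriasis images6001.jpg',
           'nail psoriasis images8574.jpg',
           'nail psoriasis images8573.jpg']

# Map each (label, index) key to its file, then look rows up by key rather than position
lookup = {parse_image_name(name): name for name in listing}
toy_df['images'] = [lookup[(label, idx)]
                    for idx, label in toy_df['skin_disorder_name'].items()]
print(toy_df['images'].tolist())
```

Even though the listing is shuffled, each row receives the file whose label and number match it.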


**iii. Joining the two dataframes**

In [78]:
# Creating a dataframe with all of the psoriasis images

psoriasis_df_complete = pd.concat([psoriasis_df, extra_psoriasis_df], axis=0, ignore_index=True)
print(psoriasis_df_complete.info())
psoriasis_df_complete.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 874 entries, 0 to 873
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  874 non-null    object
 1   images              874 non-null    object
dtypes: object(2)
memory usage: 13.8+ KB
None


Unnamed: 0,skin_disorder_name,images
0,chronic plaque psoriasis images,chronic plaque psoriasis images2796.jpg
1,chronic plaque psoriasis images,chronic plaque psoriasis images2797.jpg
2,chronic plaque psoriasis images,chronic plaque psoriasis images2798.jpg
3,chronic plaque psoriasis images,chronic plaque psoriasis images2799.jpg
4,chronic plaque psoriasis images,chronic plaque psoriasis images2800.jpg
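Calling `reset_index()` and then dropping the `index` column gives the same result as passing `ignore_index=True` to `pd.concat`. The toy frames below (made-up rows, one with a non-default index like `psoriasis_df`) check that both routes agree.

```python
import pandas as pd

# Toy stand-ins: a scraped frame with original row labels and an extra frame with a RangeIndex
scraped = pd.DataFrame({'skin_disorder_name': ['a', 'b'], 'images': ['a1.jpg', 'b1.jpg']},
                       index=[2796, 2797])
extra = pd.DataFrame({'skin_disorder_name': ['psoriasis'], 'images': ['t-1.jpg']})

# Route 1: reset the index, then drop the old labels that became an 'index' column
via_reset = pd.concat([scraped, extra], axis=0).reset_index().drop('index', axis=1)

# Route 2: discard the old labels directly and number rows 0..n-1
via_ignore = pd.concat([scraped, extra], axis=0, ignore_index=True)

print(via_reset.equals(via_ignore))  # True
```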


**iv. Combining the images into one folder**

In [79]:
# This was done by copying the extra images into the psoriasis folder
for img in extra_psoriasis:
    origin = os.path.join('extra_images/extra_psoriasis_images/', img)
    destination = os.path.join('cleaned_images/psoriasis_images/', img)
    shutil.copy(origin, destination)
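`shutil.copy` overwrites a destination file with the same name without warning, so an extra image that shared a name with a scraped image would silently replace it. A quick safeguard, sketched with a hypothetical `find_name_collisions` helper on temporary folders, is to intersect the two folder listings before copying.

```python
import os
import tempfile


def find_name_collisions(src_dir, dst_dir):
    """Return the sorted file names present in both folders (hypothetical helper)."""
    return sorted(set(os.listdir(src_dir)) & set(os.listdir(dst_dir)))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'extra_psoriasis_images')
    dst = os.path.join(tmp, 'psoriasis_images')
    os.makedirs(src)
    os.makedirs(dst)
    # Made-up file names; 'shared.jpg' exists in both folders
    for folder, names in [(src, ['t-psoriasis-digits-1.jpg', 'shared.jpg']),
                          (dst, ['chronic plaque psoriasis images2796.jpg', 'shared.jpg'])]:
        for name in names:
            open(os.path.join(folder, name), 'w').close()
    collisions = find_name_collisions(src, dst)

print(collisions)  # ['shared.jpg']
```

An empty list means the copy cannot overwrite any existing image.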

**v. Removing duplicated images from the folder**

In [80]:
# Function for removing duplicated images.
# Two images count as duplicates when their average hashes are identical;
# the first file seen is kept and later copies are deleted from the folder.

def drop_duplicated_images(folder):

    # Dictionary mapping each hash value to the path of the first image with that hash
    image_hashes = {}
    duplicated_images = []

    # Loop through all the image files in the directory
    for filename in os.listdir(folder):
        path = os.path.join(folder, filename)

        # Compute the hash value of the image using the average hash algorithm;
        # the with-block closes the file so that it can be deleted afterwards
        with Image.open(path) as image:
            hash_value = imagehash.average_hash(image)

        # Check if the same hash value is already in the dictionary
        if hash_value in image_hashes:
            # An identical hash already exists, so delete the duplicate image
            duplicated_images.append(filename)
            os.remove(path)
        else:
            # Otherwise, add the hash value and file path to the dictionary
            image_hashes[hash_value] = path

    return duplicated_images
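The function above treats two images as duplicates only when their average hashes are exactly equal. Re-saved or slightly edited copies may differ in a few bits, which is what a Hamming-distance threshold is for; in the `imagehash` library, subtracting two hashes (`hash_a - hash_b`) returns that distance. The dependency-free sketch below reimplements the idea on tiny hand-made grayscale grids so it runs without image files. The helpers are hypothetical, and a real pipeline would first shrink each image to an 8x8 grayscale thumbnail, as `imagehash.average_hash` does by default.

```python
def average_hash(pixels):
    """Return the average-hash bits of a grayscale image given as a list of rows.

    Each bit records whether a pixel is brighter than the image's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [value > mean for value in flat]


def hamming(bits_a, bits_b):
    """Count the positions where two equal-length bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def near_duplicates(hashes, threshold):
    """Return names whose hash is within `threshold` bits of an earlier kept hash.

    `hashes` maps file name -> bit list; the first image of each group is kept.
    """
    kept = []
    duplicates = []
    for name, bits in hashes.items():
        if any(hamming(bits, other) <= threshold for other in kept):
            duplicates.append(name)
        else:
            kept.append(bits)
    return duplicates


# Tiny 2x2 "images" with made-up pixel values
images = {
    'a.jpg': [[10, 200], [30, 220]],
    'b.jpg': [[10, 200], [180, 220]],   # one pixel changed: 1 bit away from a.jpg
    'c.jpg': [[200, 10], [220, 30]],    # mirrored: every bit differs from a.jpg
}
hashes = {name: average_hash(pixels) for name, pixels in images.items()}

print(near_duplicates(hashes, threshold=0))  # [] (exact matching finds nothing)
print(near_duplicates(hashes, threshold=1))  # ['b.jpg']
```

A threshold of 0 reproduces exact matching; larger thresholds catch more near-copies but also raise the risk of deleting distinct images that merely look alike, so the value is worth checking by eye on a sample.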

In [81]:
# Confirming the total number of psoriasis images

psoriasis_img = [image_name for image_name in os.listdir('cleaned_images/psoriasis_images/')] 
print('There are a total of', len(psoriasis_img),'psoriasis images.')

There are a total of 874 psoriasis images.


In [82]:
# Dropping duplicates
duplicated_images = drop_duplicated_images('cleaned_images/psoriasis_images/')
duplicated_images[:5]

['genital psoriasis images5363.jpg',
 'genital psoriasis images5364.jpg',
 'genital psoriasis images5376.jpg',
 'genital psoriasis images5385.jpg',
 'nail psoriasis images8573.jpg']

In [83]:
# Getting the indexes of the duplicated images so that they can be dropped from psoriasis_df_complete too.

duplicated_indexes = psoriasis_df_complete[
    psoriasis_df_complete['images'].isin(duplicated_images)].index.tolist()
duplicated_indexes[:10]

[143, 144, 156, 165, 219, 222, 337, 339, 340, 350]

In [84]:
# Dropping duplicated images from the dataframe.
psoriasis_df_complete = psoriasis_df_complete.copy()
psoriasis_df_complete.drop(index=duplicated_indexes, inplace=True)
psoriasis_df_complete.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 827 entries, 0 to 873
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  827 non-null    object
 1   images              827 non-null    object
dtypes: object(2)
memory usage: 19.4+ KB
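After duplicates are removed from both the folder and the dataframe, the two should list exactly the same files. A hypothetical `folder_df_mismatch` helper, sketched below on a temporary folder, reports file names that appear on only one side.

```python
import os
import tempfile

import pandas as pd


def folder_df_mismatch(folder, df, column='images'):
    """Return (files only in the folder, files only in the dataframe), both sorted."""
    on_disk = set(os.listdir(folder))
    in_df = set(df[column])
    return sorted(on_disk - in_df), sorted(in_df - on_disk)


with tempfile.TemporaryDirectory() as tmp:
    for name in ['a.jpg', 'b.jpg']:
        open(os.path.join(tmp, name), 'w').close()
    toy_df = pd.DataFrame({'images': ['a.jpg', 'c.jpg']})
    only_on_disk, only_in_df = folder_df_mismatch(tmp, toy_df)

print(only_on_disk, only_in_df)  # ['b.jpg'] ['c.jpg']
```

In the notebook, `folder_df_mismatch('cleaned_images/psoriasis_images/', psoriasis_df_complete)` should return two empty lists if the folder and the dataframe agree.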


**vi. Changing the label to just psoriasis**

In [85]:
psoriasis_df_complete['skin_disorder_name'] = 'psoriasis'
print(psoriasis_df_complete.shape)
psoriasis_df_complete.head()

(827, 2)


Unnamed: 0,skin_disorder_name,images
0,psoriasis,chronic plaque psoriasis images2796.jpg
1,psoriasis,chronic plaque psoriasis images2797.jpg
2,psoriasis,chronic plaque psoriasis images2798.jpg
3,psoriasis,chronic plaque psoriasis images2799.jpg
4,psoriasis,chronic plaque psoriasis images2800.jpg


**vii. Saving the psoriasis_df_complete dataframe as a csv file**

In [86]:
psoriasis_df_complete.to_csv('cleaned_data/psoriasis.csv', index=False)

## **<u>Basal cell carcinoma</u>**

**Meaning**<br>
Basal cell carcinoma (BCC) is a malignant tumor that arises from the basal cells in the skin. It is the most common type of skin cancer. Like other malignant tumors, it can invade and destroy surrounding tissue, which makes it more dangerous than a benign growth, although BCC usually grows slowly and rarely spreads to distant parts of the body.<br>

**Causes**<br>
Basal cell carcinoma is primarily caused by exposure to ultraviolet (UV) radiation from the sun or tanning beds. Prolonged UV exposure damages the DNA in skin cells, leading to mutations that can cause the cells to grow and divide uncontrollably, eventually forming a tumor. Other factors that may increase the risk of developing BCC include fair skin, a history of sunburns or intense sun exposure, a weakened immune system, a family history of skin cancer, and certain genetic conditions.<br>

**Symptoms**<br>
Basal cell carcinoma typically presents as a raised, pearly, or translucent bump or lesion on the skin that may be pink, red, or white in color. These lesions can sometimes ulcerate or bleed, and may develop a crust or scab. BCC can also appear as a flat, scaly, or pigmented patch on the skin.<br>

**Treatment**<br>
The treatment plan for basal cell carcinoma depends on the size, location, and extent of the tumor. Surgical removal is the primary treatment, using techniques such as excision, curettage and electrodesiccation, or Mohs surgery. Excision may be sufficient for small tumors, while Mohs surgery is recommended for larger or more advanced tumors to ensure complete removal. Radiation therapy may be used as an alternative to surgery in some cases. Systemic chemotherapy or immunotherapy is rarely needed and is reserved for advanced basal cell carcinoma that has spread to other parts of the body.<br>

In [87]:
import pandas as pd

In [88]:
extra_bcc = [image_name for image_name in os.listdir('Bcc_images')]
extra_bcc[:5]

['ISIC_0024331.jpg',
 'ISIC_0024332.jpg',
 'ISIC_0024345.jpg',
 'ISIC_0024360.jpg',
 'ISIC_0024403.jpg']

In [89]:
#Creating a dataframe for the extra bcc images

label =['Basal cell carcinoma' for img in extra_bcc]
extra_bcc_df = pd.DataFrame(extra_bcc, label).reset_index()
extra_bcc_df.columns =['skin_disorder_name', 'images']
extra_bcc_df.head()

Unnamed: 0,skin_disorder_name,images
0,Basal cell carcinoma,ISIC_0024331.jpg
1,Basal cell carcinoma,ISIC_0024332.jpg
2,Basal cell carcinoma,ISIC_0024345.jpg
3,Basal cell carcinoma,ISIC_0024360.jpg
4,Basal cell carcinoma,ISIC_0024403.jpg


In [90]:
# Confirming the number of bcc images before any cleaning
print('There are', len(extra_bcc_df),'Bcc images')
extra_bcc_df[:5]

There are 3323 Bcc images


Unnamed: 0,skin_disorder_name,images
0,Basal cell carcinoma,ISIC_0024331.jpg
1,Basal cell carcinoma,ISIC_0024332.jpg
2,Basal cell carcinoma,ISIC_0024345.jpg
3,Basal cell carcinoma,ISIC_0024360.jpg
4,Basal cell carcinoma,ISIC_0024403.jpg


> These images have no duplicates.

Taking the first 1000 images

In [91]:
new_bcc_df = extra_bcc_df[0:1000]

In [92]:
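# Forward slashes work on every OS; '\B' in a normal string is an invalid escape sequence
new_bcc_df.to_csv('cleaned_data/Bcc.csv', index=False)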

In [93]:
# Saving the first 1000 images to a new folder
import os
import shutil

# Define the source and destination directories
src_dir = 'Bcc_images'
dst_dir = 'cleaned_images/Bcc_images'

# Create the destination directory (and any missing parents) if it doesn't exist
os.makedirs(dst_dir, exist_ok=True)

# Copy exactly the 1000 images listed in new_bcc_df (and Bcc.csv),
# instead of re-listing the folder, so the folder and the csv always match
for image_name in new_bcc_df['images']:
    src_path = os.path.join(src_dir, image_name)
    dst_path = os.path.join(dst_dir, image_name)
    shutil.copy(src_path, dst_path)
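Taking the first 1000 names from `os.listdir` depends on the order in which the filesystem returns them. A hypothetical `pick_subset` helper sketches two reproducible alternatives: the first `n` names in sorted order, or a seeded random sample using Python's `random` module (already imported at the top of the notebook). Which one suits the dataset better is a judgment call; the sketch only shows that both give the same subset on every run.

```python
import random


def pick_subset(names, n, seed=None):
    """Pick n file names reproducibly (hypothetical helper).

    Without a seed, take the first n names in sorted order; with a seed,
    draw a random sample that is identical on every run.
    """
    ordered = sorted(names)
    if seed is None:
        return ordered[:n]
    return random.Random(seed).sample(ordered, min(n, len(ordered)))


names = ['ISIC_0024345.jpg', 'ISIC_0024331.jpg', 'ISIC_0024332.jpg']
print(pick_subset(names, 2))           # ['ISIC_0024331.jpg', 'ISIC_0024332.jpg']
print(pick_subset(names, 2, seed=42))  # same two names on every run
```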


## **<u>Tinea, popularly known as Ringworm (scalp and body)</u>**

**Meaning**<br>
Tinea is a type of fungal infection of the skin, hair, or nails. It is caused by a group of fungi called dermatophytes, which can thrive on the skin's keratin, a tough protein that forms the outer layer of the skin, hair, and nails.<br>

<br>It can affect various parts of the body, including the feet (athlete's foot), groin (jock itch), scalp (tinea capitis), beard area (tinea barbae), and body (tinea corporis).<br>

<br>Tinea infections are highly contagious and can spread through contact with infected skin or objects. Treatment for tinea typically involves antifungal medications, which can be applied topically or taken orally. It is also important to practice good hygiene and avoid sharing personal items, such as towels and clothing, to prevent the spread of infection. <br>

**Causes**<br>
Tinea is caused by a group of fungi called dermatophytes. These fungi can thrive on the skin, hair, or nails and cause infections in various parts of the body. The specific type of dermatophyte that causes tinea may vary depending on the affected area.

Common causes and risk factors for tinea include:<br>

<ol>
  <li>Direct contact with an infected person or animal - Tinea can spread from person to person or from animal to person through direct contact with infected skin or hair.</li>
  <li>Sharing personal items - Sharing personal items such as towels, clothing, or hairbrushes can also spread tinea.</li>
  <li>Warm and humid environment - Dermatophytes thrive in warm and humid environments, making certain areas of the body more susceptible to tinea infections, such as the feet and groin.</li>
  <li>Weakened immune system - People with weakened immune systems, such as those with HIV or undergoing chemotherapy, may be more prone to tinea infections.</li>
  <li>Skin injury or irritation - Skin that is injured or irritated, such as from scratching or wearing tight-fitting clothing, may be more susceptible to tinea infections.</li>
</ol>

Preventing the spread of tinea involves good hygiene practices, such as keeping the skin clean and dry, avoiding sharing personal items, and wearing protective clothing in public areas such as locker rooms and swimming pools.

**Symptoms**<br>
The symptoms of tinea can vary depending on the affected area of the body. Some common symptoms of tinea infections include:<br>
<ol>
  <li>Itching - Tinea infections can cause intense itching, which can be worse at night.</li>
  <li>Scaling or flaking - Tinea infections can cause the skin to become scaly or flaky.</li>
  <li>Blisters - Some types of tinea infections, such as tinea pedis (athlete's foot), can cause small fluid-filled blisters.</li>
  <li>Hair loss - Tinea infections of the scalp can cause hair to become brittle and break off, resulting in hair loss.</li>
  <li>Thickened or discolored nails - Tinea infections of the nails can cause the nails to become thickened, discolored, and brittle.</li>
</ol>


**Treatment**<br>
The treatment for tinea infections typically involves antifungal medications, which can be applied topically or taken orally. The specific treatment will depend on the location and severity of the infection. Some common treatment options include:<br>
<ol>
  <li>Topical antifungal medications - These medications are applied directly to the skin or nails and include creams, ointments, sprays, and powders. Topical antifungal medications are often effective for mild to moderate tinea infections.</li>
  <li>Oral antifungal medications - These medications are taken by mouth and may be prescribed for more severe or widespread tinea infections. Oral antifungal medications include terbinafine, fluconazole, and itraconazole.</li>
  <li>Medicated shampoo - A medicated shampoo may be recommended for tinea infections of the scalp. These shampoos contain antifungal medication and are used to help control the infection and reduce symptoms.</li>
  <li>Removal of infected nails - In severe cases of tinea infections of the nails, the infected nail may need to be removed to allow for the application of antifungal medication to the underlying nail bed.</li>
</ol>

In [94]:
# Labels representing ringworm (tinea) in DermNet's scraped data
ringworm_labels = image_df[image_df['skin_disorder_name'].str.contains('tinea')]['skin_disorder_name'].unique()
ringworm_labels

array(['tinea corporis images', 'tinea pedis images'], dtype=object)

In [95]:
#labels representing ringworms
len(ringworm_labels)

2

In [96]:
# Creating a dataframe with just ringworm labels for easier cleaning

ringworm_df = image_df[image_df['skin_disorder_name'].isin(ringworm_labels)]
ringworm_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 46 entries, 12879 to 12924
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  46 non-null     object
 1   images              46 non-null     object
dtypes: object(2)
memory usage: 1.1+ KB


### *Moving Ringworm images to their own folder*

In [97]:
# Getting the ringworm image file names
ringworm_img = [image_name for image_name in os.listdir('Images/') if 'tinea' in image_name]

# Confirming the number of ringworm images before any cleaning
print('There are', len(ringworm_img),'ringworm images')
ringworm_img[:10]

There are 46 ringworm images


['tinea corporis images12879.jpg',
 'tinea corporis images12880.jpg',
 'tinea corporis images12881.jpg',
 'tinea corporis images12882.jpg',
 'tinea corporis images12883.jpg',
 'tinea corporis images12884.jpg',
 'tinea corporis images12885.jpg',
 'tinea corporis images12886.jpg',
 'tinea corporis images12887.jpg',
 'tinea corporis images12888.jpg']

In [98]:
# Creating a new folder with just ringworm images to make cleaning easier
folder_name = 'cleaned_images/ringworm_images/'

# Note: exist_ok=True lets this cell be rerun without raising an error,
# and makedirs also creates the parent 'cleaned_images' folder if it is missing.
os.makedirs(folder_name, exist_ok=True)

# Copying the images into that folder
for img in ringworm_img:
    origin = os.path.join('Images/', img)
    destination = os.path.join(folder_name, img)
    shutil.copy(origin, destination)

In [99]:
# Confirming that the total number of ringworm images is still 46 before any cleaning

ringworm_img = [image_name for image_name in os.listdir('cleaned_images/ringworm_images/')] 
print('There are a total of', len(ringworm_img),'ringworm images.')

There are a total of 46 ringworm images.


In [101]:
# Dropping duplicates
duplicated_images = drop_duplicated_images('cleaned_images/ringworm_images/')
duplicated_images[:5]

[]

In [102]:
ringworm_img = [image_name for image_name in os.listdir('cleaned_images/ringworm_images/')]
print('There are', len(ringworm_img), 'ringworm images after removing duplicated images')

There are 46 ringworm images after removing duplicated images


In [103]:
# Define the source and destination directories
src_dir = 'cleaned_images/ringworm_images'
dst_dir = 'cleaned_images/tinea_images'

# Create the destination directory if it doesn't exist
if not os.path.exists(dst_dir):
    os.mkdir(dst_dir)

# Get the list of image names
extra_ringworm = [image_name for image_name in os.listdir(src_dir)]
new_tinea_df = extra_ringworm[0:1000]

# Copy the first 1000 images to the destination directory
for image_name in new_tinea_df:
    src_path = os.path.join(src_dir, image_name)
    dst_path = os.path.join(dst_dir, image_name)
    shutil.copy(src_path, dst_path)


In [104]:
# convert list to pandas DataFrame, using the same column names as the other csv files
# (the file names go in 'images'; the label goes in 'skin_disorder_name')
ringworm_df = pd.DataFrame({'skin_disorder_name': 'ringworm', 'images': ringworm_img})
ringworm_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46 entries, 0 to 45
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  46 non-null     object
 1   images              46 non-null     object
dtypes: object(2)
memory usage: 864.0+ bytes


In [105]:
ringworm_df = ringworm_df[0:1000]

In [106]:
ringworm_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46 entries, 0 to 45
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   skin_disorder_name  46 non-null     object
 1   images              46 non-null     object
dtypes: object(2)
memory usage: 864.0+ bytes


In [107]:
# save ringworm dataframe to csv file
ringworm_df.to_csv('cleaned_data/ringworm.csv',index=False)
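Each class is saved to its own csv by a separate cell, so headers can drift apart: for example, a file column named after a folder path instead of `images`, or an extra `Unnamed: 0` column when a csv is written without `index=False`. The sketch below uses a hypothetical `check_schema` helper, with the expected columns taken from `psoriasis.csv`, to read only the header row of each file and report any that differ. The demo files are made up.

```python
import os
import tempfile

import pandas as pd

EXPECTED_COLUMNS = ['skin_disorder_name', 'images']


def check_schema(csv_paths, expected=EXPECTED_COLUMNS):
    """Return {path: columns} for every csv whose header differs from `expected`."""
    problems = {}
    for path in csv_paths:
        columns = list(pd.read_csv(path, nrows=0).columns)  # header row only
        if columns != expected:
            problems[path] = columns
    return problems


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, 'psoriasis_demo.csv')
    bad = os.path.join(tmp, 'ringworm_demo.csv')
    pd.DataFrame({'skin_disorder_name': ['psoriasis'],
                  'images': ['a.jpg']}).to_csv(good, index=False)
    pd.DataFrame({'cleaned_images/tinea_images': ['b.jpg'],
                  'skin_disorder_name': ['ringworm']}).to_csv(bad, index=False)
    problem_files = {os.path.basename(path): cols
                     for path, cols in check_schema([good, bad]).items()}

print(problem_files)
```

Running the check over every file in `cleaned_data/` before combining them would surface such mismatches early.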