Mount Google Drive and change into the assignment directory, where the files are read and written

In [None]:
from google.colab import drive
drive.mount('/content/drive')

%cd /content/drive/MyDrive/Colab\ Notebooks/AI/Assignment/
%pwd

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/Colab Notebooks/AI/Assignment


'/content/drive/MyDrive/Colab Notebooks/AI/Assignment'

Import the libraries (os, shutil, random) used to copy files into the correct folders and to split them randomly into training and test sets

In [None]:
import os
import shutil
import random

Define where the images are read from, where they are written to, the train/test ratio, and a mapping from the abbreviations used in the filenames to the emotion names used by the FER classification model

In [None]:
INPUT_PATH = "./JAFFE/Original"
OUTPUT_PATH = "./JAFFE/Converted"

# 70% training, 30% testing
TRAIN_RATIO = 0.7

# Mapping from JAFFE abbreviations to emotion names
EMOTION_MAP = {
    "AN": "Anger",
    "DI": "Disgust",
    "FE": "Fear",
    "HA": "Happiness",
    "NE": "Neutral",
    "SA": "Sadness",
    "SU": "Surprise"
}

Create the Training and Test folders, then collect all the image files (JAFFE uses .tiff).

Shuffle the file list so that the split differs each time the notebook is run.

Finally, compute the split point (split_index) and slice the list into the training and test sets

In [None]:
# Make sure output folders exist
train_dir = os.path.join(OUTPUT_PATH, "Training")
test_dir = os.path.join(OUTPUT_PATH, "Test")
os.makedirs(train_dir, exist_ok=True)
os.makedirs(test_dir, exist_ok=True)

# Get all image filenames (case-insensitive match on the .tiff extension)
files = [f for f in os.listdir(INPUT_PATH) if f.lower().endswith(".tiff")]

# Shuffle with the random library so that the training/test split is not
# the same every time this notebook is run
random.shuffle(files)

# Perform the split into training and testing
split_index = int(len(files) * TRAIN_RATIO)
train_files = files[:split_index]
test_files = files[split_index:]

print("Length of files ", len(files))
print("Train files ", len(train_files))
print("Test files ", len(test_files))

Length of files  213
Train files  149
Test files  64
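
The split-point arithmetic can be checked in isolation. The sketch below is not part of the original pipeline: `split_files` and its `seed` parameter are hypothetical names introduced for illustration, and the 213 placeholder names simply stand in for the JAFFE images. Passing a seed makes the shuffle repeatable, which may help when comparing runs; leaving it as `None` keeps the run-to-run randomness described above.

```python
import random


def split_files(files, train_ratio, seed=None):
    """Shuffle a copy of `files` and cut it at int(len(files) * train_ratio).

    `seed` is an illustrative addition: None gives a different order on
    each run, while a fixed integer makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(files)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    split_index = int(len(shuffled) * train_ratio)
    return shuffled[:split_index], shuffled[split_index:]


# 213 placeholder names standing in for the JAFFE images
names = [f"img_{i}.tiff" for i in range(213)]
train, test = split_files(names, 0.7, seed=0)
print(len(train), len(test))  # 149 64
```

Because `int()` truncates, 213 * 0.7 = 149.1 becomes 149, which leaves the remaining 64 files for testing.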


Now that the Training and Test folders exist, create the emotion subfolders inside them. Each filename is split on periods, the first two characters of its second part give the emotion abbreviation, and `EMOTION_MAP` gives the name of the emotion folder. Disgust images and unrecognised abbreviations are skipped.

In [None]:
def process_file_list(file_list, destination):
    # Count filenames that cannot be parsed so they can be reported at the end
    malformed_filename_count = 0

    # Every image is being considered
    for filename in file_list:
        # Example filename: "KA.AN1.39.tiff" --> split on every "."
        parts = filename.split(".")

        if len(parts) < 2:
            malformed_filename_count += 1
            continue

        # Remove the number of the pose, e.g.: 'AN1' --> 'AN'
        emotion_abbreviation = parts[1][:2]

        # Map the abbreviation to the actual emotion name. Skip invalid
        # emotions not in the list or Disgust -- more information in report
        emotion = EMOTION_MAP.get(emotion_abbreviation)
        if emotion is None or emotion == "Disgust":
            continue

        # Create subfolder for this emotion, don't overwrite existing ones
        emotion_folder = os.path.join(destination, emotion)
        os.makedirs(emotion_folder, exist_ok=True)

        # Copy the file into the converted folder instead of moving it
        # to maintain the original structure
        src_path = os.path.join(INPUT_PATH, filename)
        dst_path = os.path.join(emotion_folder, filename)

        shutil.copy(src_path, dst_path)

    # Report skipped files, if any
    if malformed_filename_count:
        print(f"Skipped {malformed_filename_count} malformed filename(s)")
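
The filename parsing can be tried on its own before any files are copied. This is a minimal sketch: `emotion_from_filename` is a hypothetical helper that mirrors the parsing steps in `process_file_list`, and the map is repeated only so the snippet runs standalone.

```python
EMOTION_MAP = {
    "AN": "Anger",
    "DI": "Disgust",
    "FE": "Fear",
    "HA": "Happiness",
    "NE": "Neutral",
    "SA": "Sadness",
    "SU": "Surprise",
}


def emotion_from_filename(filename):
    """Return the emotion name for a JAFFE-style filename, or None."""
    parts = filename.split(".")  # "KA.AN1.39.tiff" -> ["KA", "AN1", "39", "tiff"]
    if len(parts) < 2:
        return None  # no second part to read the abbreviation from
    return EMOTION_MAP.get(parts[1][:2])  # "AN1" -> "AN" -> "Anger"


print(emotion_from_filename("KA.AN1.39.tiff"))  # Anger
print(emotion_from_filename("KA.XX1.39.tiff"))  # None
```

Using `dict.get` rather than indexing means an unknown abbreviation yields `None` instead of raising `KeyError`, which is what lets the loop skip such files.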

Run the conversion for both sets, printing a status message at each stage

In [None]:
print("Processing the Training set..")
process_file_list(train_files, train_dir)

print("Processing the Test set..")
process_file_list(test_files, test_dir)

print("Complete")

Processing the Training set..
Processing the Test set..
Complete


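Print the number of files in each split to confirm that the split worked as intended. These counts come from the file lists before filtering, so they still include the Disgust images that `process_file_list` skips.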

In [None]:
print(f"Training images: {len(train_files)}")
print(f"Test images: {len(test_files)}")

Training images: 149
Test images: 64
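
To check what actually landed on disk, the emotion folders can be counted directly. The following is a sketch rather than part of the original notebook: `count_images_per_class` is a hypothetical helper, and it assumes the `./JAFFE/Converted` layout created earlier (Training and Test folders containing one subfolder per emotion). No expected counts are shown because they depend on the random split.

```python
import os


def count_images_per_class(split_dir):
    """Return {emotion_folder: number_of_files} for one split directory."""
    counts = {}
    if not os.path.isdir(split_dir):
        return counts  # nothing converted yet
    for entry in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, entry)
        if os.path.isdir(class_dir):
            counts[entry] = sum(
                1
                for name in os.listdir(class_dir)
                if os.path.isfile(os.path.join(class_dir, name))
            )
    return counts


# Assumed layout: ./JAFFE/Converted/{Training,Test}/<Emotion>/<image>.tiff
for split in ("Training", "Test"):
    counts = count_images_per_class(os.path.join("./JAFFE/Converted", split))
    print(split, counts, "total:", sum(counts.values()))
```

Since files are copied with their original names, re-running the notebook with a new shuffle adds to the existing folders rather than replacing them, so totals larger than the split sizes may indicate leftovers from an earlier run.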
