# Initial Experimental Setup - Scenario 4 Builder

## Implementation

This notebook implements the code to build Scenario 4 of the experimental setup, as outlined in Section 3.2.1 of the bachelor thesis.

As a reminder, the Scenario 4 cases are composed of randomly selected data from the conventionally augmented and synthetically augmented datasets.

The code provided in this notebook was developed using the Kaggle platform.

## Step 1 - Importing Dependencies

- Importing the necessary libraries to execute the code.

In [None]:
from torchvision.datasets import ImageFolder
from torch.utils.data import ConcatDataset
import random
import os

## Step 2 - Defining Utility Functions

- Functions and variables to sample the datasets.

In [None]:
# Discrete classes for the analysis
discrete_class = {0 : "0_punching_hole",
                  1 : "1_welding_line",
                  2 : "2_crescent_gap",
                  3 : "3_water_spot",
                  4 : "4_oil_spot",
                  5 : "5_silk_spot",
                  6 : "6_inclusion",
                  7 : "7_rolled_pit",
                  8 : "8_crease",
                  9 : "9_waist_folding"}

# Desired final number of instances per class
target = {0: 860, 1: 826, 2: 856, 3: 816, 4: 870, 5: 584, 6: 862, 7: 981, 8: 967, 9: 904}


def count_instances(dataset):
    """
    Count the number of instances per class in a given dataset.
    Used to double-check the final results.
    """
    class_count = {}
    for _, label in dataset:
        if label in class_count:
            class_count[label] += 1
        else:
            class_count[label] = 1
    return class_count


def random_sample(class_count, dataset):
    """
    Randomly sample up to class_count[label] instances of each class.
    If a class has fewer instances than requested, all of them are kept.
    """
    # Group the instances by label in a single pass, so each image is loaded only once
    instances_by_label = {label: [] for label in class_count}
    for image, label in dataset:
        if label in instances_by_label:
            instances_by_label[label].append((image, label))

    subset = []
    for label, label_instances in instances_by_label.items():
        # Never request more instances than the class actually has
        sample_size = min(class_count[label], len(label_instances))
        subset.extend(random.sample(label_instances, sample_size))
    return subset
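
`random_sample` iterates over the dataset itself, so every image is decoded before any sampling takes place. As a lighter alternative, the following is a minimal sketch that samples indices from a plain list of labels instead. The helper name `sample_indices_per_class` and its `seed` parameter are hypothetical and not part of the original notebook. With `ImageFolder`, such a list of labels is available as the `targets` attribute, so the labels of the two folders could be concatenated in the same order used for `ConcatDataset`.

```python
import random
from collections import defaultdict


def sample_indices_per_class(labels, class_count, seed=None):
    """
    Return dataset indices holding at most class_count[label] randomly
    chosen instances of each class (hypothetical helper, not in the notebook).
    """
    rng = random.Random(seed)  # local generator, so the global random state is untouched

    # Group the positions of the instances by their label
    indices_by_label = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)

    selected = []
    for label, wanted in class_count.items():
        available = indices_by_label.get(label, [])
        # Never request more instances than the class actually has
        selected.extend(rng.sample(available, min(wanted, len(available))))
    return selected


# Toy labels standing in for augmentedDataset.targets + syntheticDataset.targets
toy_labels = [0, 0, 0, 1, 1, 2]
toy_indices = sample_indices_per_class(toy_labels, {0: 2, 1: 5, 2: 1}, seed=42)
print(sorted(toy_indices))
```

The returned indices could then be passed to `torch.utils.data.Subset` to build the sampled dataset without holding every image in memory, and a fixed `seed` makes the selection repeatable across runs.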

## Step 3 - Building the Scenario 4

- The conventionally augmented and synthetically augmented datasets are concatenated into a single pool.
- From this pool, up to `target[label]` instances of each class are randomly sampled to form Scenario 4.

In [None]:
augmentedDataset = ImageFolder(root='/path/to/the/conventional/augmented/images')
syntheticDataset = ImageFolder(root='/path/to/the/synthetic/augmented/images')

scenario4_dataset = ConcatDataset([augmentedDataset, syntheticDataset])
scenario4_dataset = random_sample(target, scenario4_dataset)

scenario_instances = count_instances(scenario4_dataset)
print(f"Instances per class: {scenario_instances}")
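
Because `random_sample` caps the sample size at the number of available instances, a class may end up with fewer images than requested in `target`. A minimal sketch of a check for this follows, assuming the counts have the dictionary form returned by `count_instances`. The helper name `find_shortfalls` and the toy counts are hypothetical and serve only as an illustration.

```python
def find_shortfalls(target_count, actual_count):
    """
    Return {label: missing} for every class with fewer instances than requested
    (hypothetical helper, not part of the original notebook).
    """
    shortfalls = {}
    for label, wanted in target_count.items():
        found = actual_count.get(label, 0)  # a class absent from the sample counts as 0
        if found < wanted:
            shortfalls[label] = wanted - found
    return shortfalls


# Toy counts for illustration only, not results from the thesis
toy_target = {0: 860, 1: 826, 2: 856}
toy_counts = {0: 860, 1: 800}
print(find_shortfalls(toy_target, toy_counts))  # {1: 26, 2: 856}
```

In the notebook, the same check could be run as `find_shortfalls(target, scenario_instances)`; an empty dictionary would mean every class reached its target.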

## Step 4 - Saving the Dataset

- Saving the Scenario 4 dataset in a specific location.

In [None]:
path = "/path/to/save"

# Save every sampled image into a folder named after its class
for i, (image, label) in enumerate(scenario4_dataset):
    folder_path = os.path.join(path, discrete_class[label])
    os.makedirs(folder_path, exist_ok=True)
    image_name = f"Class_{label}_Image_{i}.jpg"
    image.save(os.path.join(folder_path, image_name))
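
Saving with `image.save` re-encodes each image as JPEG. If the sampled images should stay byte-identical to their sources, one option is to copy the files instead. The sketch below assumes a list of `(file_path, label)` pairs, which is the form of the `samples` attribute of `ImageFolder`; this would require sampling file paths rather than loaded images. The helper name `copy_samples` and the temporary demo files are hypothetical.

```python
import os
import shutil
import tempfile


def copy_samples(samples, class_names, output_dir):
    """
    Copy each (file_path, label) pair into output_dir/<class name>/
    (hypothetical helper, not part of the original notebook).
    """
    copied = []
    for i, (file_path, label) in enumerate(samples):
        folder_path = os.path.join(output_dir, class_names[label])
        os.makedirs(folder_path, exist_ok=True)
        # Keep the source extension so the file name matches its content
        extension = os.path.splitext(file_path)[1]
        destination = os.path.join(folder_path, f"Class_{label}_Image_{i}{extension}")
        shutil.copyfile(file_path, destination)
        copied.append(destination)
    return copied


# Demo with a placeholder file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.jpg")
    with open(source, "wb") as handle:
        handle.write(b"placeholder bytes")
    demo_samples = [(source, 0), (source, 8)]
    demo_names = {0: "0_punching_hole", 8: "8_crease"}
    demo_copied = copy_samples(demo_samples, demo_names, os.path.join(tmp, "out"))
    demo_relative = [os.path.relpath(p, tmp) for p in demo_copied]
    demo_all_exist = all(os.path.isfile(p) for p in demo_copied)
print(demo_relative)
```

The file naming mirrors the `Class_{label}_Image_{i}` pattern used above, and `discrete_class` could be passed as `class_names`.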