# Data Preprocessing

## By Art Style
The goal is to resize the images to 224x224 pixels, the input size expected by ResNet.

The options are:
- Resize to 224x224
- Resize with padding to 224x224
- Segment into parts and break into 224x224

We will use the last two options because they do not distort the images. The concern with resizing to 224x224 is the loss of resolution, which may reduce the information in an image or eliminate useful features. Pointillism is an example: the style is characterized by small dots that together form the image, and if the resolution is too low the dots may no longer be visible. For this reason we will also break each image into a 3x3 grid. A finer grid would reduce the information loss further, but it is more computationally expensive, since an n x n grid produces n² tiles per image.
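
To make the resolution trade-off concrete, the sketch below compares the per-axis scale factor (output pixels per input pixel) of the three options. `option_scales` is a hypothetical helper written for illustration, and it assumes the grid is taken on the zero-padded square, as in the Segment step below.

```python
def option_scales(width: int, height: int, target: int = 224, grid: int = 3) -> dict:
    """Per-axis scale factors (output px / input px) for each preprocessing option."""
    side = max(width, height)  # side of the square after zero padding
    tile = side // grid        # side of one grid tile (remainder pixels are dropped)
    return {
        # Plain resize: each axis is scaled independently, so the aspect ratio changes
        "resize": (target / width, target / height),
        # Pad then resize: both axes share one factor, so shapes are preserved
        "pad": (target / side, target / side),
        # Pad, split into grid x grid tiles, then resize each tile
        "grid": (target / tile, target / tile),
    }


for name, (sx, sy) in option_scales(1500, 1000).items():
    print(f"{name:>6}: x={sx:.3f}, y={sy:.3f}")
```

For a 1500x1000 painting, plain resizing scales the axes by different amounts (0.149 and 0.224), which is the distortion we want to avoid; padding keeps both axes at 0.149; each tile of the 3x3 grid keeps 0.448, three times the linear detail of the padded image.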

The results will be uploaded to [Google Drive](https://drive.google.com/drive/folders/1fsx3uTF6Ho_kbfiKEjF-EojjG6GhN7is).

### Resize

We will scale each image to fit within 224x224 while keeping its aspect ratio, then fill the remaining border with zero (black) padding.
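
The geometry of this "fit, then pad" step can be sketched in a few lines. `pad_geometry` is a hypothetical helper that approximates what `ImageOps.pad` computes with its default centering; it is not Pillow's code, and Pillow's exact rounding may differ by a pixel between versions.

```python
def pad_geometry(width, height, target=(224, 224), centering=(0.5, 0.5)):
    """Approximate size and paste offset of the scaled image inside the padded canvas."""
    tw, th = target
    # Largest uniform scale at which the whole image still fits in the target
    scale = min(tw / width, th / height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Leftover space is split according to `centering` (0.5 = equal borders)
    left = round((tw - new_w) * centering[0])
    top = round((th - new_h) * centering[1])
    return (new_w, new_h), (left, top)


print(pad_geometry(1000, 500))  # ((224, 112), (0, 56))
print(pad_geometry(300, 600))   # ((112, 224), (56, 0))
```

A landscape image gets black bands above and below, a portrait image gets them left and right, and a square image is simply resized.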

```python
from PIL import Image, ImageOps
import os

# Input and output paths
input_folder = "E:\\wikiART\\"
output_folder = "E:\\wikiART224\\"

target_size = (224, 224)  # ResNet input size

# Iterate through each style folder
for style_folder in os.listdir(input_folder):
    style_path = os.path.join(input_folder, style_folder)

    # Skip stray files that sit next to the style folders
    if not os.path.isdir(style_path):
        continue

    # Create the output folder if it doesn't exist
    output_style_path = os.path.join(output_folder, style_folder)
    os.makedirs(output_style_path, exist_ok=True)

    # Iterate through the images in the style folder (case-insensitive extension)
    for image_file in os.listdir(style_path):
        if image_file.lower().endswith((".jpg", ".jpeg")):
            image_path = os.path.join(style_path, image_file)
            output_path = os.path.join(output_style_path, image_file)

            with Image.open(image_path) as img:
                # Fit inside 224x224 keeping the aspect ratio, pad the rest with black;
                # method=0 selects nearest-neighbour resampling
                padded_img = ImageOps.pad(img, target_size, method=0, color=0)

                # Save the resized image
                padded_img.save(output_path)

print("Resizing and saving completed.")
```


```text
Resizing and saving completed.
```


### Segment

We will add zero padding along the shorter dimension to give each image a 1:1 aspect ratio. We will then split it into a 3x3 grid and resize each tile to 224x224. A 3x3 grid was chosen because it is the smallest grid, beyond a single tile, that has a center tile.
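
The tile boxes for an N x N grid on a square image follow from integer division. `grid_boxes` is a hypothetical helper mirroring the crop arithmetic used below (`i` indexes columns, `j` rows). With an odd grid there is a tile, (1, 1) for 3x3, centered on the image; with a 2x2 grid the image center falls on the corner shared by all four tiles.

```python
def grid_boxes(side: int, num_splits: int = 3) -> dict:
    """Crop boxes (left, upper, right, lower) for a square image split into a grid."""
    step = side // num_splits  # integer tile size; up to num_splits - 1 px are dropped
    boxes = {}
    for i in range(num_splits):      # i indexes columns (x)
        for j in range(num_splits):  # j indexes rows (y)
            left, upper = i * step, j * step
            boxes[(i, j)] = (left, upper, left + step, upper + step)
    return boxes


print(grid_boxes(900)[(1, 1)])  # (300, 300, 600, 600), the center tile
```

When the side is not divisible by the grid size, the last pixel or two on the right and bottom edges are not covered, which is negligible next to the padding.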

```python
from PIL import Image, ImageOps
import os

# Input and output paths
input_folder = "E:\\wikiART\\"
output_folder = "E:\\wikiART9\\"

# Tile size and number of splits per side (square grid)
target_size = (224, 224)  # ResNet input size
num_splits = 3  # 3x3 grid

# Stop with an error if the input folder is missing
if not os.path.isdir(input_folder):
    raise FileNotFoundError(f"Input folder '{input_folder}' does not exist.")

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Iterate through each style folder
for style_folder in os.listdir(input_folder):
    style_path = os.path.join(input_folder, style_folder)

    # Skip anything that is not a style directory
    if not os.path.isdir(style_path):
        continue

    output_style_path = os.path.join(output_folder, style_folder)
    os.makedirs(output_style_path, exist_ok=True)

    # Iterate through the images in the style folder (case-insensitive extension)
    for image_file in os.listdir(style_path):
        if not image_file.lower().endswith((".jpg", ".jpeg")):
            continue
        image_path = os.path.join(style_path, image_file)
        # splitext keeps file names that contain extra dots intact
        output_path_prefix = os.path.join(output_style_path, os.path.splitext(image_file)[0])

        with Image.open(image_path) as img:
            # Pad the shorter side with black to reach a 1:1 aspect ratio
            side = max(img.size)
            padded_img = ImageOps.pad(img, (side, side), method=0, color=0)
            tile = side // num_splits  # tile edge length in pixels

            # Split into num_splits x num_splits tiles (i = column, j = row)
            for i in range(num_splits):
                for j in range(num_splits):
                    left, upper = i * tile, j * tile
                    cropped_img = padded_img.crop((left, upper, left + tile, upper + tile))

                    # Resize each tile to the ResNet input size (nearest-neighbour)
                    cropped_img = cropped_img.resize(target_size, resample=0)

                    # Save the tile
                    cropped_img.save(f"{output_path_prefix}_crop_{i}_{j}.jpg")

print("Resizing, padding, and splitting completed.")
```

print("Resizing, padding, and splitting completed.")


```text
Resizing, padding, and splitting completed.
```


## By Genre

Move the files into the class-folder layout that PyTorch expects (one subfolder per class, as used by torchvision's `ImageFolder`), using `labels.csv`.
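
A minimal sketch of this step, assuming `labels.csv` has a header row and one row per image with a file-name column and a genre column. The helper `organize_by_label` and the column names `filename` and `genre` are hypothetical, so they need adjusting to the real file.

```python
import csv
import os
import shutil


def organize_by_label(csv_path, src_dir, dst_dir, file_col="filename", label_col="genre"):
    """Move each file listed in csv_path into dst_dir/<label>/ and return the count moved.

    file_col and label_col are assumed column names, not taken from labels.csv.
    """
    moved = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            src = os.path.join(src_dir, row[file_col])
            if not os.path.isfile(src):
                continue  # skip rows whose image is missing
            class_dir = os.path.join(dst_dir, row[label_col])
            os.makedirs(class_dir, exist_ok=True)
            shutil.move(src, os.path.join(class_dir, os.path.basename(src)))
            moved += 1
    return moved
```

The sketch moves the files, as described above; swapping `shutil.move` for `shutil.copy2` would leave the originals in place.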