# Dataset preprocessing

- Image source: https://www.deviantart.com/where-is-waldo-wally/gallery/all

## Crop original images

The original images are cropped to remove redundant pixels and to make both dimensions multiples of 256, the side length of the sub-images created later.

Original dimension: `4130 x 2455` pixels\
Target dimension: `3584 x 2304` pixels
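
As a quick sanity check on these numbers, the sketch below verifies that the crop box used in the next cell fits inside the original image and yields dimensions that divide evenly by 256. The sizes and the box are taken from this notebook; the constant names (`ORIGINAL_SIZE`, `CROP_BOX`, `TILE`) are only illustrative.

```python
# Sizes and crop box taken from this notebook; the names are illustrative.
ORIGINAL_SIZE = (4130, 2455)      # width, height
CROP_BOX = (0, 70, 3584, 2374)    # left, upper, right, lower (PIL convention)
TILE = 256

left, upper, right, lower = CROP_BOX
cropped_size = (right - left, lower - upper)

# The box must lie inside the original image.
assert 0 <= left < right <= ORIGINAL_SIZE[0]
assert 0 <= upper < lower <= ORIGINAL_SIZE[1]

# Both cropped dimensions must be exact multiples of the tile size.
tiles_x, rem_x = divmod(cropped_size[0], TILE)
tiles_y, rem_y = divmod(cropped_size[1], TILE)
assert rem_x == 0 and rem_y == 0

print(cropped_size, tiles_x, tiles_y)  # (3584, 2304) 14 9
```

The crop drops 70 rows at the top, 81 rows at the bottom, and 546 columns on the right of each original image.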

In [1]:
import glob
import os
from PIL import Image


images = glob.glob('original/*_image.jpg')
print("Found %d images" % len(images))

# Make sure the output directory exists before saving.
os.makedirs('cropped', exist_ok=True)

for image_path in images:
    # File name without directory or extension, e.g. '47_image'.
    image_name = os.path.splitext(os.path.basename(image_path))[0]
    print("Processing %s..." % image_name)

    image = Image.open(image_path)
    # Box is (left, upper, right, lower): keeps 3584 x 2304 pixels.
    cropped = image.crop((0, 70, 3584, 2374))
    cropped.save('cropped/' + image_name + '.jpg', quality=95)

Found 60 images
Processing 47_image...
Processing 57_image...
Processing 46_image...
Processing 12_image...
Processing 38_image...
Processing 14_image...
Processing 19_image...
Processing 05_image...
Processing 35_image...
Processing 18_image...
Processing 04_image...
Processing 50_image...
Processing 36_image...
Processing 27_image...
Processing 13_image...
Processing 11_image...
Processing 23_image...
Processing 07_image...
Processing 20_image...
Processing 55_image...
Processing 32_image...
Processing 54_image...
Processing 49_image...
Processing 06_image...
Processing 08_image...
Processing 56_image...
Processing 01_image...
Processing 53_image...
Processing 33_image...
Processing 16_image...
Processing 37_image...
Processing 31_image...
Processing 34_image...
Processing 45_image...
Processing 40_image...
Processing 52_image...
Processing 59_image...
Processing 42_image...
Processing 21_image...
Processing 22_image...
Processing 10_image...
Processing 28_image...
Processing 29_i

## Create sub-images

Each cropped image and its corresponding mask are divided into `256 x 256` sub-images, which gives a grid of 14 x 9 = 126 tiles per `3584 x 2304` image.

A sub-image is saved only if its mask tile contains at least one white pixel.
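
The tile grid can be enumerated without loading any image. The following is a minimal sketch, assuming the cropped size stated earlier; `tile_boxes` is a hypothetical helper, not part of the notebook, and it produces boxes in the `(left, upper, right, lower)` form that `Image.crop` expects.

```python
def tile_boxes(width, height, tile=256):
    """Yield (x, y, box) for every full tile, column by column."""
    for x in range(width // tile):
        for y in range(height // tile):
            yield x, y, (x * tile, y * tile, (x + 1) * tile, (y + 1) * tile)


boxes = list(tile_boxes(3584, 2304))
print(len(boxes))   # 126
print(boxes[0])     # (0, 0, (0, 0, 256, 256))
print(boxes[-1])    # (13, 8, (3328, 2048, 3584, 2304))
```

Using floor division means any partial tile at the right or bottom edge would be skipped, which is why the crop step makes both dimensions exact multiples of 256.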

In [None]:
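import glob
import os
from PIL import Image


# glob does not guarantee any order, so sort both lists to keep each image
# aligned with its mask (assumes matching NN_image.jpg / NN_mask.jpg names).
images = sorted(glob.glob('cropped/*_image.jpg'))
masks = sorted(glob.glob('cropped/*_mask.jpg'))
print("Found %d images and %d masks" % (len(images), len(masks)))
assert len(images) == len(masks)

# Make sure the output directories exist before saving.
os.makedirs('images', exist_ok=True)
os.makedirs('masks', exist_ok=True)

for image_path, mask_path in zip(images, masks):
    # File name without directory or extension, e.g. '47_image'.
    image_name = os.path.splitext(os.path.basename(image_path))[0]
    print("Processing %s..." % image_name)

    image = Image.open(image_path)
    mask = Image.open(mask_path)

    assert image.size == mask.size
    width, height = image.size
    rangex = width // 256
    rangey = height // 256

    for x in range(rangex):
        for y in range(rangey):
            # Box is (left, upper, right, lower) of the current tile.
            bbox = (x * 256, y * 256, (x + 1) * 256, (y + 1) * 256)
            tile_name = '%s_%d_%d.jpg' % (image_name, x, y)

            # The check passes only when the binarized tile contains both
            # black and white pixels.
            sub_mask = mask.crop(bbox).convert('1')
            if sub_mask.getextrema() == (0, 255):
                sub_mask.save('masks/' + tile_name, quality=95)

                sub_image = image.crop(bbox)
                sub_image.save('images/' + tile_name, quality=95)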
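
Since `glob.glob` makes no promise about ordering (the crop log above is visibly unsorted), pairing images and masks purely by position depends on both lists arriving in the same order. One possible alternative is to pair them by their shared file-name prefix. This is only a sketch: `pair_by_prefix` is a hypothetical helper, and it assumes the `NN_image.jpg` / `NN_mask.jpg` naming implied by the glob patterns.

```python
import os


def pair_by_prefix(image_paths, mask_paths):
    """Pair each image with the mask sharing its prefix (hypothetical helper)."""
    def prefix(path, suffix):
        # '47_image' -> '47' when suffix is '_image'.
        name = os.path.splitext(os.path.basename(path))[0]
        if not name.endswith(suffix):
            raise ValueError("unexpected file name: %s" % path)
        return name[:-len(suffix)]

    masks = {prefix(p, '_mask'): p for p in mask_paths}
    pairs = []
    for image_path in sorted(image_paths):
        key = prefix(image_path, '_image')
        if key not in masks:
            raise KeyError("no mask found for %s" % image_path)
        pairs.append((image_path, masks[key]))
    return pairs


# Example with deliberately shuffled inputs.
images = ['cropped/47_image.jpg', 'cropped/05_image.jpg']
masks = ['cropped/05_mask.jpg', 'cropped/47_mask.jpg']
for image_path, mask_path in pair_by_prefix(images, masks):
    print(image_path, mask_path)
```

A missing mask raises an error here instead of silently shifting every later pair by one.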