# Image pre-processing
This notebook contains the code to pre-process images for our blur detection.
The input data consists of microscopy images (RGB JPEGs) from the MR4 set. From each input image we create two outputs: a sharp image (the resized original) and a blurry image (the same image with a Gaussian blur applied).

The following operations are performed:
1. read the images from the input folder and loop through them
2. resize each image so that its longer side matches the output dimension
3. create a blurred version of the resized image by applying a Gaussian blur
4. zero-pad (black) both versions to a square and save them
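
The resize-and-pad arithmetic can be illustrated with a small numeric sketch. It assumes the scaled size is the original size multiplied by `output_dim / max(height, width)` and rounded to the nearest integer, and that the scaled image is placed in the top-left corner of a black square. The helper names `scaled_shape` and `pad_to_square` are hypothetical, and only NumPy is used, so the actual pixel resampling is left out.

```python
import numpy as np


def scaled_shape(height, width, output_dim=200):
    """Return (height, width) after scaling the longer side to output_dim."""
    scale = output_dim / max(height, width)
    # Assumption: sizes are rounded to the nearest integer
    return int(round(height * scale)), int(round(width * scale))


def pad_to_square(img, output_dim=200):
    """Place img in the top-left corner of a black output_dim x output_dim canvas."""
    canvas = np.zeros((output_dim, output_dim, img.shape[2]), dtype=img.dtype)
    canvas[:img.shape[0], :img.shape[1]] = img
    return canvas


# A 480x640 (height x width) image scales to 150x200
new_h, new_w = scaled_shape(480, 640)
# Stand-in for the resized image: all-white pixels
resized = np.full((new_h, new_w, 3), 255, dtype=np.uint8)
padded = pad_to_square(resized)

print(new_h, new_w)                          # 150 200
print(padded.shape)                          # (200, 200, 3)
print(padded[149, 0, 0], padded[150, 0, 0])  # 255 0
```

The rows below the image content (or the columns to its right, for portrait images) are the zero padding; the content is not centred in the square.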

In [3]:
import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np
import glob
import math

In [5]:
IMAGE_FOLDER = '../samples/'
OUTPUT_FOLDER = '../processed/'
BLUR_SIZE = 9
OUTPUT_DIM = 200
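
`BLUR_SIZE` is the side length of the Gaussian kernel in pixels. OpenCV requires the kernel size to be positive and odd, and when the standard deviation is passed as 0 it derives one from the kernel size. The sketch below reproduces the formula documented for `cv.getGaussianKernel`; the helper name `default_sigma` is hypothetical.

```python
def default_sigma(ksize):
    """Sigma derived from an odd kernel size ksize when sigma is given as 0."""
    if ksize <= 0 or ksize % 2 == 0:
        raise ValueError("kernel size must be positive and odd")
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


print(round(default_sigma(9), 2))  # 1.7
```

With `BLUR_SIZE = 9` the effective standard deviation is therefore about 1.7 pixels, measured on the already resized image.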

In [7]:
# Load files from input folder
import os
dir_content = glob.glob(IMAGE_FOLDER+'*.jpg')
# Keep only the file name; os.path.basename also handles Windows path separators
input_filenames = [os.path.basename(x) for x in dir_content]
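
The `*.jpg` pattern can miss files with an upper-case or `.jpeg` extension on case-sensitive file systems. A minimal alternative sketch using `pathlib` is shown below; the helper name `list_jpg_names` and the set of accepted extensions are assumptions, not part of the original notebook.

```python
from pathlib import Path


def list_jpg_names(folder, extensions=(".jpg", ".jpeg")):
    """Return the sorted file names in folder whose extension matches, ignoring case."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
```

It could be used as `input_filenames = list_jpg_names(IMAGE_FOLDER)`. Sorting makes the processing order reproducible between runs.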

In [4]:
%%time
# Perform reading, resizing, blurring, saving

counter = 0
num_files = len(input_filenames)

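# Report progress about every 10% of the files; at least 1 so the modulo below never divides by zero
step = max(1, math.floor(0.1*num_files))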

for file in input_filenames:
    
    # Create empty black output images (uint8 matches the 8-bit images returned by cv.imread)
    blur_out_img = np.zeros((OUTPUT_DIM, OUTPUT_DIM, 3), dtype=np.uint8)
    orig_out_img = np.zeros((OUTPUT_DIM, OUTPUT_DIM, 3), dtype=np.uint8)
    
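    # Read the raw image into memory; cv.imread returns None if the file cannot be read
    raw_img = cv.imread(IMAGE_FOLDER+file)
    if raw_img is None:
        print('skipping unreadable file: {}'.format(file))
        counter += 1
        continue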

    # Scale the original image to fit in the output format
    scale = OUTPUT_DIM/max(raw_img.shape[0],raw_img.shape[1])
    img = cv.resize(raw_img, (0,0), fx=scale, fy=scale)
        
    # Apply blur
    blurred_img = cv.GaussianBlur(img, (BLUR_SIZE, BLUR_SIZE),0)
    
    # Copy both scaled images into the top-left corner of the black images (zero padding)
    blur_out_img[ :blurred_img.shape[0], :blurred_img.shape[1] ] = blurred_img
    orig_out_img[ :img.shape[0], :img.shape[1] ] = img
    
    # Save both versions under the same file name; the blur/ and orig/ subfolders must already exist
    cv.imwrite(OUTPUT_FOLDER+'blur/'+file, blur_out_img)
    cv.imwrite(OUTPUT_FOLDER+'orig/'+file, orig_out_img)
    
    if counter % step == 0:
        # Fraction of files processed so far (0.0 to 1.0), not a percentage
        fraction = round( counter / num_files,1 )
        print( '{} done'.format(fraction) )
    counter += 1
        
print('done.')

0.0 done
0.1 done
0.2 done
0.3 done
0.4 done
0.5 done
0.6 done
0.7 done
0.8 done
0.9 done
1.0 done
done.
CPU times: user 9min 34s, sys: 12 s, total: 9min 46s
Wall time: 9min 48s
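
The progress lines above are printed whenever the loop counter is a multiple of `step`. The sketch below shows at which loop indices a line would appear for a given number of files. It uses integer division and a lower bound of 1 for the step, which is an assumption made here so that folders with fewer than ten files do not cause a division by zero; the helper name `progress_points` is hypothetical.

```python
def progress_points(num_files):
    """Return the loop indices at which a progress line would be printed."""
    step = max(1, num_files // 10)  # never 0, even for fewer than 10 files
    return [i for i in range(num_files) if i % step == 0]


print(progress_points(5))         # [0, 1, 2, 3, 4]
print(len(progress_points(100)))  # 10
```

When the number of files is not a multiple of ten, the step is rounded down, so slightly more than ten progress lines can be printed.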
