Here we run some experiments with image resizing.  
We will use the `pillow` module (a fork of `PIL`), which is included with Anaconda 2.2.  
For more info: http://pillow.readthedocs.org/ 

Let us set up for parallelization:

In [1]:
from IPython.parallel import Client
rc = Client()
print("Running on %d engines" % len(rc.ids))

Running on 2 engines


In [2]:
%%px --local
import os
import os.path
from PIL import Image

Let us set the parameters in one place:

In [3]:
%%px --local
src_dir = "/kaggle/retina/sample" # source directory of images to resize 
trg_dir = "/kaggle/retina/resized" # target directory of the resized images 
prefix = "resized_" # string to prepend to the resized file name
hsize = 256 # horizontal size of the resized image
vsize = 256 # vertical size of the resized image  
all_files = [f for f in os.listdir(src_dir) if f.endswith(".jpeg")] # names of the images to resize
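
The file list above comes back in whatever order `os.listdir` returns, which is arbitrary, and it only matches the exact suffix `.jpeg`. The following is a minimal sketch of a more forgiving listing; the helper name `list_images` and the extra `.jpg` suffix are our own assumptions and are not used elsewhere in this notebook.

```python
import os
import tempfile

def list_images(directory, extensions=(".jpeg", ".jpg")):
    """Return the sorted names of files in `directory` whose suffix
    matches one of `extensions`, ignoring case (hypothetical helper)."""
    return sorted(f for f in os.listdir(directory)
                  if f.lower().endswith(extensions))

# Demonstrate on a throwaway directory instead of the Kaggle data.
demo_dir = tempfile.mkdtemp()
for name in ("b.jpeg", "a.JPEG", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

found = list_images(demo_dir)
print(found)
```

Sorting makes a choice such as "the first file" reproducible between runs and between engines.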

Let us use a single engine first.

**Load** an image:

In [4]:
filename = all_files[0]
filepath = os.path.join(src_dir, filename)
%timeit Image.open(filepath)
im = Image.open(filepath)

The slowest run took 79.37 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 134 µs per loop
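
Note that `Image.open` is lazy: it identifies the file and reads the header, while the pixel data is decoded only when it is first needed (or when `load()` is called). The timing above therefore probably does not include JPEG decoding. A small sketch of the difference, using a throwaway image of our own rather than the Kaggle data:

```python
import os
import tempfile
from PIL import Image

# Create a small JPEG so the example does not depend on the sample directory.
tmp_path = os.path.join(tempfile.mkdtemp(), "demo.jpeg")
Image.new("RGB", (640, 480), (200, 30, 30)).save(tmp_path, "JPEG")

im_demo = Image.open(tmp_path)    # reads the header only
header_size = im_demo.size        # size and mode are known without decoding
im_demo.load()                    # forces the pixel data to be decoded
pixel = im_demo.getpixel((0, 0))  # pixel access works on the decoded data
```

To time loading including the decode, one could run something like `%timeit Image.open(filepath).load()`.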


**Resize** the image with the default downsampling filter (nearest neighbour in this version of Pillow):  

In [5]:
%timeit im.resize((hsize, vsize))
resized_im = im.resize((hsize, vsize), Image.NEAREST)

The slowest run took 483.11 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 450 µs per loop


The LANCZOS anti-aliasing filter is recommended for downsampling by the PIL tutorial, but it is much slower:

In [6]:
%timeit im.resize((hsize, vsize), Image.LANCZOS) 

1 loops, best of 3: 191 ms per loop
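
To see why anti-aliasing matters, here is a rough sketch on a synthetic worst case: a checkerboard with one-pixel squares, halved in size. The test image is our own construction. Nearest neighbour keeps one source pixel per output pixel, so we would expect it to collapse the pattern to a single value, while LANCZOS averages neighbouring pixels and should give a mid-grey.

```python
from PIL import Image

# Build a 512x512 greyscale checkerboard whose squares are one pixel wide.
size = 512
raw = bytes(bytearray(255 * ((x + y) % 2)
                      for y in range(size) for x in range(size)))
board = Image.frombytes("L", (size, size), raw)

nearest = board.resize((256, 256), Image.NEAREST)
lanczos = board.resize((256, 256), Image.LANCZOS)

nearest_range = nearest.getextrema()            # (min, max) over all pixels
lanczos_center = lanczos.getpixel((128, 128))   # one pixel away from the border
print(nearest_range, lanczos_center)
```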


**Save** the resized image.  
A `quality` value above 95 is not recommended because it inflates the file size with minimal benefit, but file size is not a concern here.  
More info on file formats can be found here: http://pillow.readthedocs.org/handbook/image-file-formats.html

In [7]:
if not os.path.exists(trg_dir):
    os.makedirs(trg_dir)
    
resized_filepath = os.path.join(trg_dir, prefix + filename)
%timeit resized_im.save(resized_filepath, "JPEG", quality = 100) 

100 loops, best of 3: 5.43 ms per loop
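
The effect of `quality` on file size can be checked without touching the disk by saving into an in-memory buffer. This is a sketch on synthetic noise, which is close to a worst case for JPEG; real photographs will give different numbers. The helper `jpeg_size` is a name of our own.

```python
import io
import random
from PIL import Image

# Deterministic greyscale noise as a stand-in for image content.
rng = random.Random(0)
noise = bytes(bytearray(rng.randrange(256) for _ in range(128 * 128)))
im_noise = Image.frombytes("L", (128, 128), noise)

def jpeg_size(image, quality):
    """Return the number of bytes of `image` encoded as JPEG (hypothetical helper)."""
    buf = io.BytesIO()
    image.save(buf, "JPEG", quality=quality)
    return len(buf.getvalue())

sizes = {q: jpeg_size(im_noise, q) for q in (75, 95, 100)}
for q in sorted(sizes):
    print("quality=%d: %d bytes" % (q, sizes[q]))
```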


Let us define functions that do the above in one go:    

In [8]:
%%px --local
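def resize_method(filename, method):
    """Load src_dir/filename, resize it to (hsize, vsize) with the given
    resampling filter and save the result to trg_dir as a JPEG."""
    filepath = os.path.join(src_dir, filename)
    im = Image.open(filepath)
    resized_im = im.resize((hsize, vsize), method)
    resized_filepath = os.path.join(trg_dir, prefix + filename)
    resized_im.save(resized_filepath, "JPEG", quality=100)

def resize_NEAREST(filename):
    """Resize with the fast nearest-neighbour filter."""
    resize_method(filename, Image.NEAREST)

def resize_LANCZOS(filename):
    """Resize with the slower, anti-aliasing LANCZOS filter."""
    resize_method(filename, Image.LANCZOS)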

For quick and dirty experiments we can use the default downsampling. 
Here we create downsized copies of all files in the sample directory with two downsampling methods:

In [9]:
%timeit -n1 -r1 map(resize_NEAREST, all_files)
%timeit -n1 -r1 map(resize_LANCZOS, all_files)

1 loops, best of 1: 1.65 s per loop
1 loops, best of 1: 3.15 s per loop


Since the processing is dominated by CPU-bound decoding and resizing, we can benefit from parallelization:

In [10]:
v = rc[:]
%timeit -n1 -r1 v.map_sync(resize_NEAREST, all_files)
%timeit -n1 -r1 v.map_sync(resize_LANCZOS, all_files)

1 loops, best of 1: 1.02 s per loop
1 loops, best of 1: 1.89 s per loop
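
To put the timings in perspective, we can compute the speedup and the parallel efficiency on the 2 engines. This is plain arithmetic on the single-run numbers reported above, so it is only a rough indication.

```python
# (serial seconds, parallel seconds) copied from the outputs above
timings = {"NEAREST": (1.65, 1.02), "LANCZOS": (3.15, 1.89)}
n_engines = 2

results = {}
for method, (serial, parallel) in sorted(timings.items()):
    speedup = serial / parallel          # how many times faster
    efficiency = speedup / n_engines     # fraction of the ideal speedup
    results[method] = (speedup, efficiency)
    print("%s: speedup %.2fx, efficiency %.0f%%"
          % (method, speedup, 100 * efficiency))
# LANCZOS: speedup 1.67x, efficiency 83%
# NEAREST: speedup 1.62x, efficiency 81%
```

Both methods fall somewhat short of the ideal 2x, which is plausible given the overhead of distributing the tasks to the engines.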


