# Image iterators using the multiprocessing module
*Jonas Teuwen*

In this notebook we show how to combine the multiprocessing module in Python with iterators.

This can be useful in deep learning scripts when you, for instance, want to write an iterator which extracts and augments patches from your image on-the-fly to feed to your convolutional neural network (CNN).

We simulate the following problem:
- You have a list of images, in this case represented by filenames in `self.images`.
- We continuously load images and put them into a queue, ready to be read.
- `next` should return your next image; in this case we only output the filename.

Each process loads one of the images, but because the processes are separate we need some way to track which images have already been handed out and which have not. To implement such a counter we can use shared memory in Python. The multiprocessing `RawValue` wraps a `ctypes` object allocated in shared memory, which lets multiple processes access the same variable. There is one problem: race conditions. It can happen that the counter has not been updated yet when the next process reads it, so that process outputs the same image. To prevent this, we use a `Lock` to guard the counter whenever we read from or write to it.
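
Locking each read and each write separately is not quite enough if a process reads the counter and increments it in two separate steps: another process can slip in between and read the same value. A minimal sketch of a counter that reads and increments in a single locked step could look as follows. The names `AtomicCounter`, `get_and_incr`, `claim` and `run_demo` are hypothetical, and the sketch assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp

# Assumption: a POSIX system where the "fork" start method is available.
ctx = mp.get_context("fork")


class AtomicCounter(object):
    """Hypothetical helper: a shared integer with a locked read-and-increment."""

    def __init__(self, start=0):
        self.val = ctx.RawValue('i', start)
        self.lock = ctx.Lock()

    def get_and_incr(self):
        # Reading and incrementing happen under the same lock acquisition,
        # so no two processes can receive the same value.
        with self.lock:
            current = self.val.value
            self.val.value += 1
            return current


def claim(counter, n_claims, out_queue):
    # Each worker claims n_claims values and reports them to the parent.
    for _ in range(n_claims):
        out_queue.put(counter.get_and_incr())


def run_demo(n_procs=4, n_claims=250):
    counter = AtomicCounter()
    out_queue = ctx.Queue()
    procs = [ctx.Process(target=claim, args=(counter, n_claims, out_queue))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no worker blocks on a full pipe.
    claimed = [out_queue.get() for _ in range(n_procs * n_claims)]
    for p in procs:
        p.join()
    return sorted(claimed)


if __name__ == "__main__":
    claimed = run_demo()
    # Every value should have been claimed exactly once.
    print(claimed == list(range(len(claimed))))
```

If every value from `0` up to the number of claims appears exactly once, no index was handed out twice.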

Note that your output will not necessarily be in the same order as your input, as some of the processes might complete faster than others. For CNNs trained with stochastic gradient descent this is not a problem, as shuffling the training examples generally helps. Check out the Stochastic Gradient Descent Tricks (Sec. 4) at http://cilvr.cs.nyu.edu/diglib/lsml/bottou-sgd-tricks-2012.pdf.
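
If the input order does matter, for example at test time, one possible approach is to send the index along with each item and sort on it afterwards. The sketch below illustrates this under a few assumptions: `load` is a hypothetical stand-in for real image loading, `load_in_order` is a hypothetical helper, one process is started per filename (only sensible for a handful of files), and the `fork` start method is available.

```python
import multiprocessing as mp

# Assumption: a POSIX system where the "fork" start method is available.
ctx = mp.get_context("fork")


def load(index, name, out_queue):
    # Hypothetical stand-in for loading an image: just upper-case the name.
    out_queue.put((index, name.upper()))


def load_in_order(names):
    out_queue = ctx.Queue()
    procs = [ctx.Process(target=load, args=(i, name, out_queue))
             for i, name in enumerate(names)]
    for p in procs:
        p.start()
    # Results arrive in completion order, not in input order.
    results = [out_queue.get() for _ in names]
    for p in procs:
        p.join()
    # Sorting on the index restores the input order.
    return [item for _, item in sorted(results)]


if __name__ == "__main__":
    print(load_in_order(['image_0.jpg', 'image_1.jpg', 'image_2.jpg']))
```

Sorting requires collecting all results first, so this trades the on-the-fly behaviour for a deterministic order.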

For more information check https://docs.python.org/2/library/multiprocessing.html.

In [1]:
import numpy as np
from multiprocessing import Process, RawValue, Lock, Queue

In [2]:
class CounterMulti(object):
    def __init__(self, val=0):
        self.val = RawValue('i', val)
        self.lock = Lock()

    def incr(self):
        with self.lock:
            self.val.value += 1

    def value(self):
        with self.lock:
            return self.val.value

class MultiImageIter(object):
    def __init__(self):
        self.images = ['image_{}.jpg'.format(i) for i in range(10)]
        self.n_images = len(self.images)
        
        # We define a cursor to track if we have gone through the whole list already
        self.cursor = 0
        self.q = Queue(maxsize=2)
        
        self.counter = CounterMulti(0)
        self.procs = [Process(target=self.writer, args=(self.counter,)) for i in range(4)]

        for p in self.procs: 
            p.daemon = True
            p.start()
    
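    def writer(self, counter):
        while True:
            # Read and increment under one lock acquisition, so that no two
            # processes can claim the same index.
            with counter.lock:
                index = counter.val.value
                if index >= self.n_images:
                    return  # All images have been claimed; stop this worker.
                counter.val.value += 1
            # Put outside the lock so a full queue does not block the other workers.
            self.q.put(self.images[index])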
            
    def next(self):
        if self.cursor < self.n_images:
            self.cursor += 1
            image = self.q.get()
            # Here you can do stuff with your image
            return image
        else:
            raise StopIteration()
        

In [3]:
s = MultiImageIter()
vals = []
for i in range(10):
    vals.append(s.next())

In [4]:
vals

['image_1.jpg',
 'image_2.jpg',
 'image_3.jpg',
 'image_0.jpg',
 'image_5.jpg',
 'image_6.jpg',
 'image_4.jpg',
 'image_7.jpg',
 'image_9.jpg',
 'image_8.jpg']
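
The `next` method used above follows the Python 2 iterator convention. To use such an object directly in a `for` loop, including under Python 3, it also needs `__iter__` and `__next__`. A minimal sketch of an adapter is given below; `IterAdapter` and `FakeSource` are hypothetical names, and `FakeSource` is only a single-process stand-in for `MultiImageIter` so that the sketch runs on its own.

```python
class IterAdapter(object):
    """Hypothetical adapter for objects with a next() that raises StopIteration."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        return self

    def __next__(self):
        # Delegate to the wrapped object's next(); its StopIteration
        # propagates and ends the for loop.
        return self.source.next()

    next = __next__  # Keep the Python 2 spelling available as well.


class FakeSource(object):
    """Single-process stand-in for MultiImageIter (assumption, for illustration)."""

    def __init__(self, n_images):
        self.images = ['image_{}.jpg'.format(i) for i in range(n_images)]
        self.cursor = 0

    def next(self):
        if self.cursor < len(self.images):
            self.cursor += 1
            return self.images[self.cursor - 1]
        raise StopIteration()


# The adapter can now be consumed by any construct that expects an iterator.
vals = [name for name in IterAdapter(FakeSource(3))]
```

An alternative with the same effect would be to add `__iter__` and `__next__` to the iterator class itself.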