# Multiprocessing in Python: getting around the GIL

## What is the GIL?

https://wiki.python.org/moin/GlobalInterpreterLock

> In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

In practice, this means that multithreaded Python code can't run Python bytecode on more than one CPU at a time, so threads don't speed up CPU-bound work.
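A rough way to see this for yourself is to time the same CPU-bound loop run sequentially and then in threads. This is only a sketch: `count_down`, `run_sequential`, `run_threaded`, and the loop size are made-up names and numbers for illustration, and the exact timings depend on your machine and interpreter. On a standard CPython build with the GIL enabled, the two numbers tend to come out close to each other rather than the threaded run being several times faster.

```python
import threading
import time


def count_down(n):
    # Pure-Python, CPU-bound loop (illustrative workload only).
    while n > 0:
        n -= 1
    return n


def run_sequential(n, jobs):
    # Run the workload `jobs` times, one after another, and return the elapsed seconds.
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, jobs):
    # Run the same workload in `jobs` threads and return the elapsed seconds.
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, jobs = 2_000_000, 4
    print(f"sequential: {run_sequential(n, jobs):.2f}s")
    print(f"threaded:   {run_threaded(n, jobs):.2f}s")
```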

## MP Basics

Deserializing JSON is slow, but JSON is such a convenient storage format. I've been logging requests to a file (1M requests) with one JSON blob per line, and I want to do some basic analytics.

In [None]:
import collections
import json

def get_ip(line):
    item = json.loads(line)
    return item['IP']

counter = collections.Counter()
with open('requests.log') as handle:
    %time ips = list(map(get_ip, handle.readlines()))
    counter.update(ips)

counter.most_common(5)

In [None]:
from multiprocessing import Pool

counter = collections.Counter()
with open('requests.log') as handle:
    with Pool(4) as mp:
        %time ips = list(mp.map(get_ip, handle.readlines()))
        counter.update(ips)

counter.most_common(5)

## Something more fun

Counting IP addresses in half the time is useful, but not a lot of fun. There's also enough overhead in starting worker processes and passing every line to them and back that you don't see as big a boost from running in parallel as four workers might suggest.
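One way to trim some of that overhead is to hand work to the pool in batches and consume results as they arrive, instead of building one big list. The sketch below is an assumption-laden variant of the IP counter, not a measured improvement: `count_ips` and the `chunksize` of 1000 are illustrative choices, and the sample log lines are generated in memory. `Pool.map` already batches its input by default; `imap_unordered` sends one item at a time unless you pass `chunksize`.

```python
import collections
import json
from multiprocessing import Pool


def get_ip(line):
    # Same idea as the get_ip above: parse one JSON line and pull out the IP.
    return json.loads(line)['IP']


def count_ips(lines, pool, chunksize=1000):
    # `pool` is anything with Pool's imap_unordered method.
    # Results arrive in arbitrary order, which is fine for counting.
    counter = collections.Counter()
    for ip in pool.imap_unordered(get_ip, lines, chunksize=chunksize):
        counter[ip] += 1
    return counter


if __name__ == "__main__":
    # Made-up sample data standing in for requests.log.
    sample = [json.dumps({'IP': f'10.0.0.{i % 3}'}) for i in range(9000)]
    with Pool(4) as pool:
        print(count_ips(sample, pool).most_common(3))
```

Whether this actually beats plain `Pool.map` depends on the size of each line and the cost of `get_ip`, so it is worth timing both on your own data.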

I wanted a more fun example, but also something that makes better use of offloading computation to multiple processors. So I thought, image processing!

In [None]:
from IPython.core import display
from PIL import Image

import filters

with Image.open('./clementine.jpg') as img:
    %time img = filters.ascii(img)
    img.save('./ascii_clementine.jpg')

`filters` is a small library I wrote that converts a regular image into an ASCII-art version. In the example above, I'm converting [my girlfriend's cat](/files/clementine.jpg) into an [ASCII version](/files/ascii_clementine.jpg).

How do we offload this to multiple processors? Animated GIFs.

In [None]:
from gif_frames import read_frames, write_frames

def convert(frames):
    for frame in frames:
        yield filters.ascii(frame)

with Image.open('./kitten.gif') as img:
    %time ascii_frames = list(convert(read_frames(img)))

with open('./ascii_kitten.gif', 'wb') as handle:
    write_frames(handle, ascii_frames)

[kitten.gif](/files/kitten.gif) -> [ascii_kitten.gif](/files/ascii_kitten.gif)

In [None]:
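from multiprocessing import Manager, Process, JoinableQueue

def runner(q, results):
    # Pull (index, frame) pairs off the queue until the None sentinel arrives.
    # A blocking get() avoids workers quitting early because get_nowait()
    # saw a queue that had not been flushed yet.
    while True:
        item = q.get()
        if item is None:
            q.task_done()
            break
        index, frame = item
        # Store by index so the frames keep their original order.
        results[index] = filters.ascii(frame)
        q.task_done()

def convert_mp(frames):
    frames = list(frames)
    with Manager() as manager:
        q = JoinableQueue()
        results = manager.list([None] * len(frames))

        for item in enumerate(frames):
            q.put(item)

        pool = [Process(target=runner, args=(q, results))
                for x in range(4)]
        # One sentinel per worker tells it there is no more work.
        for p in pool:
            q.put(None)
        for p in pool:
            p.start()

        # Wait for every queued item to be processed, then reap the workers.
        q.join()
        for p in pool:
            p.join()

        return list(results)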

with Image.open('./kitten.gif') as img:
    %time ascii_frames = convert_mp(read_frames(img))

with open('./ascii_kitten2.gif', 'wb') as handle:
    write_frames(handle, ascii_frames)

[kitten.gif](/files/kitten.gif) -> [ascii_kitten2.gif](/files/ascii_kitten2.gif)

In [None]:
# First, you will need to be running
#   redis-server
# And:
#   celery -A celery_tasks worker
import celery_tasks

with Image.open('./kitten.gif') as img:
    %time ascii_frames = celery_tasks.ascii_filter.map(read_frames(img)).apply_async().get()

with open('./ascii_kitten3.gif', 'wb') as handle:
    write_frames(handle, ascii_frames)

[kitten.gif](/files/kitten.gif) -> [ascii_kitten3.gif](/files/ascii_kitten3.gif)

In [None]:
with Image.open('./kitten.gif') as img:
    with Pool(4) as mp:
        %time ascii_frames = mp.map(filters.ascii, read_frames(img))

with open('./ascii_kitten4.gif', 'wb') as handle:
    write_frames(handle, ascii_frames)

[kitten.gif](/files/kitten.gif) -> [ascii_kitten4.gif](/files/ascii_kitten4.gif)

## Shared state

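You can't share state through variables defined outside of your map function. Worker processes don't share memory with the parent: each one works on its own copy of those variables, so changes made in a worker never show up in the parent. With the built-in `map`, `foo` fills up as expected; with `Pool.map`, every `append` happens in a worker's copy of `bar`, and the parent's list stays empty.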

In [None]:
foo = []

def add_to_foo(i):
    foo.append(i)
    
list(map(add_to_foo, range(0, 10)))
print(foo)

In [None]:
bar = []

def add_to_bar(i):
    bar.append(i)

with Pool(4) as mp:
    list(mp.map(add_to_bar, range(0, 10)))
print(bar)
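The usual way around this is to have the worker function return its result and let the parent collect the return values, rather than mutating a global. A minimal sketch, assuming hypothetical helpers `double` and `collect` (neither comes from the cells above):

```python
from multiprocessing import Pool


def double(i):
    # Return the value instead of appending to a list in the worker.
    return i * 2


def collect(pool, values):
    # `pool` is anything with Pool's map method.
    # map() returns the results in the same order as the inputs.
    return list(pool.map(double, values))


if __name__ == "__main__":
    with Pool(4) as pool:
        baz = collect(pool, range(0, 10))
    print(baz)
```

If the workers really do need a shared container, a `Manager().list()` like the one used in the GIF example is another option, at the cost of a round trip to the manager process for every operation.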