In [2]:
square = lambda x: x*x

print ([ square(x) for x in range(1,10)])

[1, 4, 9, 16, 25, 36, 49, 64, 81]


# Multiprocessing

Multiprocessing is a very large and complex library that ships with Python by default. I am only going to talk about one of its many features. For more info visit the [library documentation](https://docs.python.org/2/library/multiprocessing.html)


## Mapping functions

In [part 2](Functional%20Concepts.ipynb) of this tutorial we looked at List comprehensions and applying a function to every item in a list using code like the following:

Python's built in `map` function is synonymous for the above list comprehension.



In [3]:
print( list(map(square, range(1,10)) ))

[1, 4, 9, 16, 25, 36, 49, 64, 81]


What if you have a huge list to run an operation on? It's going to take ages. Lets use the timer to find out how long.

In [4]:
import time
import math

start = time.clock()
#we are going to work out N factorial for every number between 1 and 100000
for i in map(math.factorial, range(1,10000)):
   pass 

end = time.clock()

print ("Took {} seconds".format(end-start) )

Took 77.082002 seconds


These days most machines have multiple cores. Lets use multiprocessing's Pool class to farm out work to all processors

In [21]:
from multiprocessing import Pool

p = Pool()

start = time.clock()

for i in p.map(math.factorial, range(1,10000)):
   pass 

end = time.clock()

print ("Took {} seconds".format(end-start) )

Took 0.18581500000000517 seconds


By creating a task pool that has the same number of worker threads as the current machine has CPU cores, we are able to work through the large list of jobs much faster!

## Conclusion

Multiprocessing with `Pool.map()` allows you to significantly boost the speed with which you process a large batch of data by farming it out across your CPU's cores.

Next we look at [Filesystem IO](TextFileIO.ipynb)