Comparison between Multiprocessing and Multithreading in Python
Some people say that NumPy releases the GIL (global interpreter lock), which is often regarded as an evil in Python, so multithreading with NumPy should be as fast as or faster than multiprocessing with NumPy. I ran a benchmark to check this, because in my experience multiprocessing has been better than multithreading in every case, even for data loading, which is an IO-bound task.
Environment and Experiment settings
The settings of these experiments are as follows.
- OS: Ubuntu 16.04
- CPU: 1 x Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
- Memory: 64 GB
- python: 2.7.12
- numpy: 1.11.3
Each experiment simply computes numpy.dot of an ndarray with itself, repeated for the given number of iterations, in each of n_workers workers; the workers are implemented with either multiprocessing or multithreading. I did not repeat each experiment to calculate statistics such as the mean or standard deviation. See the code in the same directory for more details.
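The setup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this directory: the function names (`work`, `run_workers`, `benchmark`), the matrix size of 500, and the use of Python 3 with the "fork" start method (available on Unix-like systems such as the Ubuntu machine above) are all assumptions made for the example.

```python
import multiprocessing
import threading
import time

import numpy as np


def work(size, iterations):
    """Compute numpy.dot of a square ndarray with itself `iterations` times."""
    a = np.random.rand(size, size)  # size is an assumed value, not from the original
    for _ in range(iterations):
        np.dot(a, a)


def run_workers(worker_cls, n_workers, size, iterations):
    """Start n_workers workers of the given class and return elapsed seconds."""
    workers = [worker_cls(target=work, args=(size, iterations))
               for _ in range(n_workers)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start


def benchmark(n_workers, size=500, iterations=1):
    """Time the same workload once with processes (P) and once with threads (T)."""
    # Assumption: "fork" is requested explicitly; it is Unix-only.
    ctx = multiprocessing.get_context("fork")
    return {
        "P": run_workers(ctx.Process, n_workers, size, iterations),
        "T": run_workers(threading.Thread, n_workers, size, iterations),
    }


if __name__ == "__main__":
    for nw in (4, 8, 16, 32):
        result = benchmark(nw)
        print("NW=%d  P: %.3fs  T: %.3fs" % (nw, result["P"], result["T"]))
```

Like the original experiment, this takes a single measurement per configuration, so the timings are noisy; calling `benchmark` several times and averaging would give more stable numbers.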
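In the table, Iter is the number of iterations, NW is the number of workers, (P) denotes multiprocessing, and (T) denotes multithreading.

| Iter | NW=4 (P) | NW=8 (P) | NW=16 (P) | NW=32 (P) | NW=4 (T) | NW=8 (T) | NW=16 (T) | NW=32 (T) |
|---|---|---|---|---|---|---|---|---|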
Contrary to my initial understanding, when the number of workers is 4 and the number of iterations is 1, multithreading is faster than multiprocessing, and when the number of workers is 8 and the number of iterations is 1, multithreading is almost the same as multiprocessing, which is consistent with the claim that NumPy releases the GIL. Thus, if the logic per thread is not complex and the number of workers is relatively small, multithreading with NumPy performs well. However, if the logic per thread is complex, one should use multiprocessing with NumPy, even when NumPy (or another library that explicitly releases the GIL) is used.