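# How to choose a parallelism approach

Multiple processes, multiple threads, and coroutines all appear to run work in parallel, so how do we choose between them?

+ First, we can choose simply by analyzing the target workload. If the project is mainly CPU-bound, for example checking many numbers for primality in parallel, there is no real choice: only multiple processes can make full use of the CPU. In standard CPython the other two can saturate at most one CPU core.

+ Next come tasks dominated by I/O. Coroutines are the first choice for I/O-bound work, and of the three approaches only coroutines can handle the [C10k](http://www.kegel.com/c10k.html) problem. However, most of Python's default I/O is synchronous, so when the required dependencies have no asynchronous support, multithreading is the fallback.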
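
As a rough illustration of the fallback in the second point, the sketch below overlaps blocking calls with a thread pool. `blocking_fetch` is a hypothetical stand-in for a synchronous library call (it only sleeps), and the pool size of 4 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_fetch(n):
    # Hypothetical stand-in for a synchronous I/O call; it only sleeps.
    time.sleep(0.1)
    return n * 2

def fetch_all(numbers, workers=4):
    # Each call blocks its own thread, but the waiting time overlaps.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(blocking_fetch, numbers))

if __name__ == "__main__":
    start = time.perf_counter()
    print(fetch_all(range(8)))
    print('elapsed: {:.2f}s'.format(time.perf_counter() - start))
```

With 8 calls and 4 workers the elapsed time should be much closer to two sleep intervals than to eight, which is the whole point of using threads for blocking I/O.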



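# Common synchronization primitives in parallel programming

Python provides several synchronization primitives. They all follow the same design, so they can be used with coroutines, threads, and processes alike; only the module differs, and their typical uses differ slightly.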

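## Semaphore

In parallel programming, to prevent different workers (threads, processes, or coroutines) from modifying a shared resource at the same time, we need to limit the number of simultaneous accesses (usually to 1). A semaphore synchronizes through an internal counter: each call to acquire() decrements the counter by 1, and each call to release() increments it by 1. When the counter is 0, acquire() blocks.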
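
The counter behaviour described above can be observed directly with non-blocking acquires. This is a minimal sketch using `threading.Semaphore`; the helper name `counter_demo` and the initial value of 2 are illustrative choices.

```python
from threading import Semaphore

def counter_demo():
    sema = Semaphore(2)                     # internal counter starts at 2
    first = sema.acquire(blocking=False)    # counter 2 -> 1, returns True
    second = sema.acquire(blocking=False)   # counter 1 -> 0, returns True
    third = sema.acquire(blocking=False)    # counter is 0: returns False instead of blocking
    sema.release()                          # counter 0 -> 1
    fourth = sema.acquire(blocking=False)   # counter 1 -> 0, returns True
    return first, second, third, fourth

if __name__ == "__main__":
    print(counter_demo())
```

Passing `blocking=False` makes acquire() report failure instead of waiting, which is only used here to make the counter visible.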

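A Semaphore has the following interface:

+ acquire() acquires the semaphore; in the coroutine version this method is a coroutine
+ release() releases the semaphore
+ `*`locked() coroutine version only; reports whether the semaphore is currently locked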

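A semaphore is usually used as a context manager through with. In practice it serves a different purpose under each parallel approach:

+ with coroutines we use it to limit the number of concurrently running flows
+ with threads we use it to limit the number of simultaneous accesses to a resource
+ with processes we use it as a bounded, lockable variable shared across processes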
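
The first use, limiting how many coroutines run at once, can be checked without any network access. The sketch below is an illustration under assumptions: `worker`, `run_limited`, and the `state` dictionary are hypothetical names, and `asyncio.sleep` stands in for real I/O.

```python
import asyncio

async def worker(sema, state):
    async with sema:
        # Track how many workers are inside the guarded block right now.
        state['running'] += 1
        state['peak'] = max(state['peak'], state['running'])
        await asyncio.sleep(0.01)   # stand-in for real I/O
        state['running'] -= 1

async def run_limited(n_tasks=12, limit=3):
    sema = asyncio.Semaphore(limit)
    state = {'running': 0, 'peak': 0}
    await asyncio.gather(*(worker(sema, state) for _ in range(n_tasks)))
    return state['peak']

if __name__ == "__main__":
    # The peak never exceeds the semaphore's initial value.
    print(asyncio.run(run_limited()))
```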

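The semaphore lives in a different module for threads, processes, and coroutines:

+ Coroutines--`asyncio.Semaphore(value=1)` (older versions also accepted a `loop` argument, which was removed in Python 3.10)

+ Threads--`threading.Semaphore(value=1)`

+ Processes--`multiprocessing.Semaphore([value])`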

Coroutine version

In [17]:
import aiohttp
import asyncio
NUMBERS = range(12)
URL = 'http://httpbin.org/get?a={}'
async def fetch_async(a):
    async with aiohttp.request('GET', URL.format(a)) as r:
        data = await r.json()
    return data['args']['a']

async def print_result(sema, a):
    # At most three coroutines are inside this block at any time
    async with sema:
        r = await fetch_async(a)
        print('fetch({}) = {}'.format(a, r))

async def main():
    # Create the semaphore inside the coroutine so it belongs to the running loop
    sema = asyncio.Semaphore(3)
    # Recent Python versions reject bare coroutines in asyncio.wait(); gather() accepts them
    await asyncio.gather(*(print_result(sema, num) for num in NUMBERS))

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(main())
loop.close()


fetch(6) = 6
fetch(8) = 8
fetch(2) = 2
fetch(3) = 3
fetch(5) = 5
fetch(4) = 4
fetch(11) = 11
fetch(0) = 0
fetch(10) = 10
fetch(9) = 9
fetch(1) = 1
fetch(7) = 7

Thread version

In [2]:
import time
from random import random
from threading import Thread, Semaphore
sema = Semaphore(3)
def foo(tid):
    with sema:
        print('{} acquire sema'.format(tid))
        wt = random() * 2
        time.sleep(wt)
    print('{} release sema'.format(tid))
threads = []
for i in range(5):
    t = Thread(target=foo, args=(i,))
    threads.append(t)
    t.start()
for t in threads:
    t.join()


0 acquire sema
1 acquire sema
2 acquire sema
0 release sema3 acquire sema

3 release sema
4 acquire sema
2 release sema
1 release sema
4 release sema


Process version

In [32]:
%%writefile semaphore.py
from multiprocessing import Process, Semaphore

def foo(tid,sema):
    import time
    from random import random
    with sema:
        print('{} acquire sema'.format(tid))
        wt = random() * 2
        time.sleep(wt)
    print('{} release sema'.format(tid))
    
if __name__ == "__main__":
    sema = Semaphore(3)
    processes = []
    for i in range(5):
        t = Process(target=foo, args=(i,sema))
        processes.append(t)

    for t in processes:     
        t.start()
    for t in processes:
        t.join()


Writing semaphore.py


In [33]:
!python semaphore.py

4 acquire sema
4 release sema
1 acquire sema
1 release sema
0 acquire sema
0 release sema
2 acquire sema
2 release sema
3 acquire sema
3 release sema


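## Lock

A Lock is also called a mutex; it is essentially a semaphore whose value is 1.

With multiple threads, a lock guards reads and writes so that a resource is accessed by only one operation at a time. Let's first look at an example without a lock: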

In [5]:
import time
from threading import Thread
value = 0
def getlock():
    global value
    new = value + 1
    time.sleep(0.001)  # sleep so that the threads get a chance to switch
    value = new
threads = []
for i in range(100):
    t = Thread(target=getlock)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
print(value)


24


Without a lock, the result is far smaller than 100. Now let's add a mutex and see:

In [7]:
import time
from threading import Thread, Lock
value = 0
lock = Lock()
def getlock():
    global value
    with lock:
        new = value + 1
        time.sleep(0.001)
        value = new
threads = []
for i in range(100):
    t = Thread(target=getlock)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
print(value)

100


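As a special kind of semaphore, a lock has the same interface as Semaphore, except that its constructor takes no initial value. The modules for coroutines, threads, and processes are:

+ Coroutines--`asyncio.Lock()` (older versions also accepted a `loop` argument, which was removed in Python 3.10)

+ Threads--`threading.Lock()`

+ Processes--`multiprocessing.Lock()`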
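
Because the interface is just acquire() and release(), a `with lock:` block can also be spelled out by hand. The sketch below shows that explicit form with `threading.Lock`; `SafeCounter` and `count_with_threads` are hypothetical names introduced only for this illustration.

```python
from threading import Lock, Thread

class SafeCounter:
    def __init__(self):
        self.value = 0
        self._lock = Lock()

    def increment(self):
        # Same effect as "with self._lock:", written out explicitly.
        self._lock.acquire()
        try:
            self.value += 1          # read-modify-write guarded by the lock
        finally:
            self._lock.release()     # always release, even if the body raises

def count_with_threads(n_threads=4, n_increments=10000):
    counter = SafeCounter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

if __name__ == "__main__":
    print(count_with_threads())
```

The `with` form is usually preferable because it cannot forget the release; the explicit form is shown only to make the two interface calls visible.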

Coroutines never actually preempt each other for a resource, so with coroutines a lock serves more as a global flag that holds back certain flows.

In [12]:
import asyncio
import functools
def unlock(lock):
    print('callback releasing lock')
    lock.release()
async def test(locker, lock):
    print('{} waiting for the lock'.format(locker))
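    # "async with" replaces the old "with await lock" form, which no longer works
    async with lock:
        print('{} acquired lock'.format(locker))
    print('{} released lock'.format(locker))
async def main(loop):
    lock = asyncio.Lock()
    await lock.acquire()
    loop.call_later(0.1, functools.partial(unlock, lock))
    # Recent Python versions reject bare coroutines in asyncio.wait(); gather() accepts them
    await asyncio.gather(test('l1', lock), test('l2', lock))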

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(main(loop))
loop.close()

l1 waiting for the lock
l2 waiting for the lock
callback releasing lock
l1 acquired lock
l1 released lock
l2 acquired lock
l2 released lock


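For multiple processes, a lock likewise acts as a global signal. For example, when several processes work on the same file, a lock is needed to serialize access: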

In [30]:
%%writefile lock.py
import multiprocessing  
import sys  
  
def worker_with(lock, f):  
    with lock:  
        with open(f,"a+") as fs:
            fs.write('Lock acquired via with\n')  

if __name__ == '__main__':
    f = "source/file.txt"  
    lock = multiprocessing.Lock()  
    w = multiprocessing.Process(target=worker_with, args=(lock, f))  
    w.start()  
    w.join()

Overwriting lock.py


In [31]:
!python lock.py

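## Event

One worker signals an event while other workers wait for it to fire. An event is an object that holds a flag: if the internal flag is True the event has happened, otherwise it has not.

The Event interface consists of:

+ clear()

Sets the internal flag to False

+ is_set()

Returns the internal flag

+ set()

Sets the internal flag to True

+ wait()

Waits until the flag becomes True; in the coroutine version this method is a coroutine

We illustrate it with a producer/consumer example.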



Thread version

In [None]:
import time
import threading
from random import randint

TIMEOUT = 2
def consumer(event, l):
    t = threading.current_thread()
    while True:
        # wait() returns False on timeout instead of raising, so use the
        # return value to stop once the producer has gone quiet
        if not event.wait(TIMEOUT):
            break
        try:
            integer = l.pop()
            print('{} popped from list by {}'.format(integer, t.name))
            event.clear()  # reset the event state
        except IndexError:  # tolerate an empty list, e.g. right after startup
            pass
def producer(event, l):
    t = threading.current_thread()
    for i in range(20):
        integer = randint(10, 100)
        l.append(integer)
        print('{} appended to list by {}'.format(integer, t.name))
        event.set()  # signal the event
        time.sleep(1)
event = threading.Event()
l = []
threads = []
for name in ('consumer1', 'consumer2'):
    t = threading.Thread(name=name, target=consumer, args=(event, l))
    t.start()
    threads.append(t)
p = threading.Thread(name='producer1', target=producer, args=(event, l))
p.start()
threads.append(p)
for t in threads:
    t.join()

16 appended to list by producer1
16 popped from list by consumer2
86 appended to list by producer186 popped from list by consumer2

91 appended to list by producer191 popped from list by consumer2

88 appended to list by producer188 popped from list by consumer2

31 appended to list by producer131 popped from list by consumer2

30 appended to list by producer130 popped from list by consumer2

95 appended to list by producer195 popped from list by consumer1

71 appended to list by producer1
71 popped from list by consumer1
19 appended to list by producer119 popped from list by consumer2

59 appended to list by producer159 popped from list by consumer2

33 appended to list by producer133 popped from list by consumer2

18 appended to list by producer118 popped from list by consumer2

19 appended to list by producer119 popped from list by consumer2

97 appended to list by producer197 popped from list by consumer2

19 appended to list by producer119 popped from list by consumer2

30 appende
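
For comparison, the same set()/wait() handshake with coroutines might look like the sketch below. It is not the producer/consumer example above; `waiter`, `setter`, and the shared `log` list are hypothetical names used only to record the order of events.

```python
import asyncio

async def waiter(event, name, log):
    log.append('{} waiting'.format(name))
    await event.wait()              # a coroutine: suspends until the flag is True
    log.append('{} resumed'.format(name))

async def setter(event, log):
    await asyncio.sleep(0.05)       # give the waiters time to start waiting
    log.append('setting event')
    event.set()                     # flag becomes True and all waiters wake up

async def main():
    event = asyncio.Event()
    log = []
    await asyncio.gather(
        waiter(event, 'w1', log),
        waiter(event, 'w2', log),
        setter(event, log),
    )
    return log, event.is_set()

if __name__ == "__main__":
    log, flag = asyncio.run(main())
    print('\n'.join(log))
    print('flag:', flag)
```

Unlike the thread version, a single set() releases every waiter here, and the flag stays True until clear() is called.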

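## Condition

A condition is used for signaling between workers. Besides all the methods of a lock, it also provides:

+ notify(n=1)

    Sends a notification that wakes up to n of the workers waiting on the same Condition object

+ notify_all()

    Sends a notification that wakes all of the workers waiting on the same Condition object

+ wait()

    Waits for a notification from another worker that uses the same Condition object.

+ wait_for(predicate)

    Equivalent to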
    ```python
    while not predicate():
        cv.wait()
    ```
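
The practical benefit of wait_for() over a bare wait() is that it cannot miss a notification sent before the waiter started. A minimal sketch with `threading.Condition`, where `run_demo`, `items`, and `received` are names chosen only for this illustration:

```python
import threading

def run_demo():
    cond = threading.Condition()
    items = []
    received = []

    def consumer():
        with cond:
            # The predicate is checked before waiting and after every wake-up,
            # so this returns at once if the producer has already run.
            cond.wait_for(lambda: len(items) > 0)
            received.append(items.pop())

    def producer():
        with cond:
            items.append('resource')
            cond.notify()           # wake one waiting consumer, if there is one

    c = threading.Thread(target=consumer)
    p = threading.Thread(target=producer)
    c.start()
    p.start()
    c.join()
    p.join()
    return received

if __name__ == "__main__":
    print(run_demo())
```

The result does not depend on which thread runs first, because the shared state (`items`) is what the consumer actually checks.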

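One worker waits for a particular condition while another signals that the condition has been met. The clearest illustration is the producer/consumer model: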

Coroutine version

In [34]:
import asyncio
import functools
async def consumer(cond, name, second):
    await asyncio.sleep(second)
    async with cond:
        await cond.wait()
        print('{}: Resource is available to consumer'.format(name))
        
async def producer(cond):
    await asyncio.sleep(2)
    async with cond:
        print('Making resource available')
        cond.notify_all()
        
        
async def main(loop):
    condition = asyncio.Condition()
    task = loop.create_task(producer(condition))
    consumers = [consumer(condition, name, index)
                 for index, name in enumerate(('c1', 'c2'))]
    # Recent Python versions reject bare coroutines in asyncio.wait(); gather() accepts them
    await asyncio.gather(*consumers)
    task.cancel()
    
    
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(main(loop))
loop.close()

Making resource available
c1: Resource is available to consumer
c2: Resource is available to consumer


Thread version

In [38]:

import time
import threading
def consumer(cond):
    t = threading.current_thread()
    with cond:
        cond.wait()  # wait() creates a lock named waiter in the locked state; this waiter lock is used for communication between threads
        print('{}: Resource is available to consumer'.format(t.name))
def producer(cond):
    t = threading.current_thread()
    with cond:
        print('{}: Making resource available'.format(t.name))
        cond.notify_all()  # release the waiter locks to wake the consumers
condition = threading.Condition()
c1 = threading.Thread(name='c1', target=consumer, args=(condition,))
c2 = threading.Thread(name='c2', target=consumer, args=(condition,))
p = threading.Thread(name='p', target=producer, args=(condition,))
c1.start()
time.sleep(1)
c2.start()
time.sleep(1)
p.start()

p: Making resource availablec2: Resource is available to consumerc1: Resource is available to consumer




Process version

In [39]:
%%writefile cond.py
import time
import multiprocessing
def consumer(cond):
    t = multiprocessing.current_process()
    with cond:
        cond.wait()  # release the underlying lock and block until another process sends a notification
        print('{}: Resource is available to consumer'.format(t.name))
def producer(cond):
    t = multiprocessing.current_process()
    with cond:
        print('{}: Making resource available'.format(t.name))
        cond.notify_all()  # wake every consumer waiting on this condition
        
if __name__=='__main__':
    condition = multiprocessing.Condition()
    c1 = multiprocessing.Process(name='c1', target=consumer, args=(condition,))
    c2 = multiprocessing.Process(name='c2', target=consumer, args=(condition,))
    p = multiprocessing.Process(name='p', target=producer, args=(condition,))
    c1.start()
    time.sleep(1)
    c2.start()
    time.sleep(1)
    p.start()

Writing cond.py


In [40]:
!python cond.py

p: Making resource available
c1: Resource is available to consumer
c2: Resource is available to consumer


## Queue