Question about the program in Chapter 4, Section 2 of the book #16

Closed
Microndgt opened this issue Oct 10, 2017 · 1 comment

@Microndgt
Contributor

In the "How to do it" part of Section 2, the book gives this program:

import concurrent.futures
import time
number_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def evaluate_item(x):
    # Compute the sum; this is only here to burn time
    result_item = count(x)
    # Print the input and the result
    print ("item " + str(x) + " result " + str(result_item))

def  count(number) :
    for i in range(0, 10000000):
        i=i+1
    return i * number

if __name__ == "__main__":
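    # Sequential execution
    # Note: time.clock() was removed in Python 3.8 (time.process_time() and
    # time.perf_counter() replace it), so this quoted code needs Python 3.7 or older.
    start_time = time.clock()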
    for item in number_list:
        evaluate_item(item)
    print("Sequential execution in " + str(time.clock() - start_time), "seconds")
    # Thread pool execution
    start_time_1 = time.clock()
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        for item in number_list:
            executor.submit(evaluate_item,  item)
    print ("Thread pool execution in " + str(time.clock() - start_time_1), "seconds")
    # Process pool execution
    start_time_2 = time.clock()
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        for item in number_list:
            executor.submit(evaluate_item,  item)
    print ("Process pool execution in " + str(time.clock() - start_time_2), "seconds")

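First, on Unix, time.clock() returns the processor time of the current process as a floating-point number of seconds. For the first (sequential) run and the second (thread pool) run this should be accurate, because all the work happens in the current process and the timer measures that process's time. For the third (process pool) run, however, the current process only dispatches the work; the execution time is spent in other processes, so I think the measured time is wrong. Common sense says the same thing: sequential execution cannot take 6 seconds while multiprocessing takes 0.03 seconds, a speedup of nearly 200x.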
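A minimal sketch of the effect described above, assuming Python 3.3+ where `time.process_time()` is the portable replacement for the Unix behaviour of `time.clock()`. A CPU-time clock only advances while the calling process itself is running, so time the parent spends blocked (simulated here with `time.sleep`, a stand-in for waiting on worker processes) is not counted, whereas `time.perf_counter()` measures elapsed wall-clock time. The `measure` helper is hypothetical and not from the book.

```python
import time


def measure(wait_seconds):
    """Return (wall, cpu) seconds elapsed while this process just waits."""
    wall_start = time.perf_counter()  # monotonic wall-clock timer
    cpu_start = time.process_time()   # CPU time of this process only
    # Stand-in for the parent blocking while worker processes do the work.
    time.sleep(wait_seconds)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure(0.5)
    print(f"wall clock: {wall:.3f} s, process CPU time: {cpu:.3f} s")
```

The wall-clock figure reflects the half-second wait while the CPU-time figure stays near zero, which mirrors the suspiciously small process-pool timing.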

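Second, executor.submit only schedules a task. It returns a Future, but the task does not necessarily run right away (it depends on whether a thread or process in the pool is free), so I think the test program in the book is flawed. Following the example in the module documentation, it should look like this: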

import concurrent.futures
import time
number_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def evaluate_item(x):
    # Compute the sum; this is only here to burn time
    result_item = count(x)
    # Return the result so the caller can print it
    return result_item

def count(number):
    for i in range(0, 10000000):
        i = i + 1
    return i * number

if __name__ == "__main__":
    # Sequential execution
    start_time = time.time()
    for item in number_list:
        print(evaluate_item(item))
    print("Sequential execution in " + str(time.time() - start_time), "seconds")
    # Thread pool execution
    start_time_1 = time.time()
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(evaluate_item, item) for item in number_list]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())
    print("Thread pool execution in " + str(time.time() - start_time_1), "seconds")
    # Process pool execution
    start_time_2 = time.time()
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(evaluate_item, item) for item in number_list]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())
    print("Process pool execution in " + str(time.time() - start_time_2), "seconds")

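Iterating with the as_completed function guarantees that we wait until every Future object has finished, so the measured time should now be accurate. On my machine (CPU: Intel Core i5-5257U), sequential and multithreaded execution both take about 6.3 seconds, and multiprocessing takes 3.7 seconds. The threads give no speedup here because CPython's GIL lets only one thread execute Python bytecode at a time.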
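A small sketch of how `submit()`, `as_completed()` and the `with` block interact, using a `ThreadPoolExecutor` and hypothetical helpers (`slow_square` and the two `run_*` functions) where `time.sleep` stands in for real work so that threads can overlap. `submit()` returns a pending `Future` immediately, and a loop over `as_completed()` ends only once every future is done. One nuance worth checking against the `concurrent.futures` documentation: leaving a `with` block calls `executor.shutdown(wait=True)`, so the book's version also waits for its tasks before the timer is read. Under that reading, the 0.03-second figure comes mainly from `time.clock()`, while `as_completed()` additionally makes the results available.

```python
import concurrent.futures
import time


def slow_square(x):
    # Hypothetical stand-in for evaluate_item: sleeping releases the GIL,
    # so threads in the pool can genuinely overlap.
    time.sleep(0.2)
    return x * x


def run_with_as_completed():
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        first = executor.submit(slow_square, 3)
        # submit() returns at once; the Future is normally still pending here.
        submit_delay = time.perf_counter() - start
        still_pending = not first.done()
        futures = [first] + [executor.submit(slow_square, n) for n in (4, 5)]
        # as_completed() yields each Future as it finishes, so this loop
        # ends only after every Future in the list has completed.
        results = sorted(f.result() for f in concurrent.futures.as_completed(futures))
    return submit_delay, still_pending, results, time.perf_counter() - start


def run_relying_on_with_block():
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(slow_square, n) for n in (3, 4, 5)]
    # Exiting the with block calls executor.shutdown(wait=True),
    # so every Future has finished by the time the clock is read.
    elapsed = time.perf_counter() - start
    return elapsed, all(f.done() for f in futures)


if __name__ == "__main__":
    print(run_with_as_completed())
    print(run_relying_on_with_block())
```

With three 0.2-second tasks and two workers, both functions take at least about 0.4 seconds of wall time, because the third task can only start once a worker is free.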

P.S.

  1. concurrent.futures module documentation
  2. Translation of the concurrent.futures documentation
@laixintao
Owner

@Microndgt You're right. I double-checked: even a single call on its own takes more than 0.2 s.

In [1]: import time
   ...: def  count(number) :
   ...:     time1 = time.time()
   ...:     for i in range(0, 10000000):
   ...:         i=i+1
   ...:     time2 = time.time()
   ...:     print time2 - time1
   ...:     return i * number
   ...:

In [3]: count(1)
0.74836397171
Out[3]: 10000000
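For readers on Python 3, a sketch of the same single-call check: the `print` statement in the transcript above is Python 2 syntax, and here `time.perf_counter()` serves as the wall-clock timer. The `iterations` parameter and the `timed_count` helper are hypothetical additions for illustration; the default keeps the book's 10,000,000 iterations.

```python
import time


def count(number, iterations=10_000_000):
    # Busy loop from the book's example; it exists only to burn CPU time.
    # After the loop, i equals iterations, so the result is iterations * number.
    for i in range(0, iterations):
        i = i + 1
    return i * number


def timed_count(number):
    # Time one call with a wall-clock timer.
    start = time.perf_counter()
    result = count(number)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    result, elapsed = timed_count(1)
    print(f"count(1) = {result} in {elapsed:.3f} s")
```

The elapsed time depends on the machine, but the returned value matches the `Out[3]` line of the transcript.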

PS: some of the output shown in the translation is from runs I did myself.

I've updated the text to use your code.
