Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

next() 函数提高性能的疑惑 #43

Closed
klmjoi opened this issue Mar 27, 2020 · 3 comments
Closed

next() 函数提高性能的疑惑 #43

klmjoi opened this issue Mar 27, 2020 · 3 comments

Comments

@klmjoi
Copy link

klmjoi commented Mar 27, 2020

Hi @piglei , 您的系列文章让人收益匪浅。

读到第四章“容器的门道”,对 next() 可以高效的实现 “从列表中查找第一个满足条件的成员”,我自己测了一下性能,似乎还不如普通方法。

In [2]: numbers = [3, 7, 8, 2, 21]

In [3]: %timeit [i for i in numbers if i % 2 == 0][0]
445 ns ± 5.35 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit [i for i in numbers if i % 2 == 0][0]
449 ns ± 5.75 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit next((i for i in numbers if i % 2 == 0))
501 ns ± 10.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit next((i for i in numbers if i % 2 == 0))
500 ns ± 4.05 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

@piglei
Copy link
Owner

piglei commented Mar 27, 2020

你好 @klmjoi

使用“列表表达式”和 “next + 生成器表达式“这两种方法最大的区别在于 何时中断,对于前者来说,无论找没找到,都会遍历完整个列表。而后者在找到第一个元素就会返回。

如果用时间复杂度 O 来度量二者效率:

  • “列表表达式” / “next() + 生成器”
  • 平均:二者均为 O(n)
  • 最好情况下的时间复杂度:O(n) 对比 O(1),要找的东西刚好出现在第一个元素
  • 最差情况下的时间复杂度:O(n) 对比 O(n),要找的东西刚好出现在最后一个元素

如果要对比二者的效率,可以选一个规模稍微大点的 list:

# 构造一个 1 到 999 的列表,找里面第一个可以被 200 整除的数,也就是 200
In [3]: numbers = list(range(1, 1000))

# 使用列表表达式,要搜完整个列表,大概耗费 40 µs
In [4]: %timeit [i for i in numbers if i % 200 == 0][0]
43 µs ± 2.65 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: %timeit [i for i in numbers if i % 200 == 0][0]
40.8 µs ± 1.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

### 使用 next,找到第一个就返回,大概耗费 9µs
In [10]: %timeit next((i for i in numbers if i % 200 == 0))
8.61 µs ± 83.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [11]: %timeit next((i for i in numbers if i % 200 == 0))
9.24 µs ± 339 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

为什么你的例子里使用 next() 更慢?主要原因是源列表太短,使用 next() 少做的那几次检索还抵不上多调用一次 next 的开销。

感谢你的留言,我在以后修订文章时会补一些这部分的内容。

@klmjoi klmjoi closed this as completed Mar 29, 2020
@klmjoi
Copy link
Author

klmjoi commented Mar 29, 2020

确实如此,感谢~

@chfw
Copy link

chfw commented Mar 29, 2020

在大数据的情况下,这个数列是以百万个为单位,省的时间就更加明显了。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants