Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Use collections.deque instead of list to store MiddlewareManager's methods #3476

Merged
merged 1 commit into from Dec 26, 2018

Conversation

@elacuesta
Copy link
Member

@elacuesta elacuesta commented Oct 29, 2018

According to https://wiki.python.org/moin/TimeComplexity list.insert has an average time complexity of O(n), while deque.appendleft is O(1).

From that page (emphasis mine):

Internally, a list is represented as an array; the largest costs come from growing beyond the current allocation size (because everything must move), or from inserting or deleting somewhere near the beginning (because everything after that must move). If you need to add/remove at both ends, consider using a collections.deque instead.

It seems like for deques "looking at the middle is slow, and adding to or removing from the middle is slower still", which should not be a problem given that these method containers are iterated sequentially and not accessed randomly.

I first saw this while working on #2061, I'll adapt that one if this PR gets merged.

@codecov
Copy link

@codecov codecov bot commented Oct 29, 2018

Codecov Report

Merging #3476 into master will not change coverage.
The diff coverage is 87.5%.

@@           Coverage Diff           @@
##           master    #3476   +/-   ##
=======================================
  Coverage   84.38%   84.38%           
=======================================
  Files         167      167           
  Lines        9376     9376           
  Branches     1392     1392           
=======================================
  Hits         7912     7912           
  Misses       1206     1206           
  Partials      258      258
Impacted Files Coverage Δ
scrapy/middleware.py 90% <100%> (ø) ⬆️
scrapy/core/downloader/middleware.py 100% <100%> (ø) ⬆️
scrapy/core/spidermw.py 88% <66.66%> (ø) ⬆️
@kmike kmike changed the title Use collections.deque instead of list to store MiddlewareManager's methods [MRG+1] Use collections.deque instead of list to store MiddlewareManager's methods Dec 26, 2018
@kmike
Copy link
Member

@kmike kmike commented Dec 26, 2018

I've checked on a small benchmark that iteration is indeed of about the same speed for list and deque:

In [1]: from collections import deque
In [2]: lst = [x for x in range(100000)]
In [3]: dq = deque(lst)
In [8]: %timeit [x for x in dq]
2.51 ms ± 33.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [9]: %timeit [x for x in lst]
2.5 ms ± 42.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
@dangra dangra merged commit f6ce716 into scrapy:master Dec 26, 2018
3 checks passed
3 checks passed
codecov/patch 87.5% of diff hit (target 84.38%)
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@elacuesta elacuesta deleted the elacuesta:deque_appendleft branch Jan 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants