Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Fix engine to support filtered start_requests #707

Merged
merged 2 commits into from Apr 28, 2014

Conversation

@redapple
Copy link
Contributor

@redapple redapple commented Apr 26, 2014

start_requests can contain duplicates, and when DupeFilter is active, in ExecutionEngine _next_request() these duplicate requests are not enqueued by self.crawl(request, spider)

This fix simply adds a test to spider_is_idle() to test is there are still start requests to consume before stopping everything.

@redapple redapple changed the title [WIP] Fix engine to support filtered start_requests [MRG] Fix engine to support filtered start_requests Apr 26, 2014
@nramirezuy
Copy link
Contributor

@nramirezuy nramirezuy commented Apr 28, 2014

@redapple checking if it's None will be probably more accurate than bool
.

log.err(None, 'Obtaining request from start requests', \
spider=spider)
else:
self.crawl(request, spider)

if self.spider_is_idle(spider) and slot.close_if_idle:
if (self.spider_is_idle(spider) and slot.close_if_idle):

This comment has been minimized.

@dangra

dangra Apr 28, 2014
Member

nitpick, no need to change this line :)

This comment has been minimized.

@redapple

redapple Apr 28, 2014
Author Contributor

ah, yeah, sorry. it was from the previous trials adding the test here instead of spider_is_idle...

@@ -112,12 +112,13 @@ def _next_request(self, spider):
except StopIteration:
slot.start_requests = None
except Exception as exc:
slot.start_requests = None

This comment has been minimized.

@dangra

dangra Apr 28, 2014
Member

good one, I hope this was spotted by our current test suite.

This comment has been minimized.

@redapple

redapple Apr 28, 2014
Author Contributor

yes, the tests are blocked without this

This comment has been minimized.

@dangra

dangra Apr 28, 2014
Member

yes, that's perfect.

This comment has been minimized.

@nramirezuy

nramirezuy Apr 29, 2014
Contributor

Can I ask, Why Exception and not the specific one?

This comment has been minimized.

@dangra

dangra Apr 29, 2014
Member

@nramirezuy: because that catch-all prevents unexpected errors in spider code from leaving the engine in hanged state. If an error on .start_requests() happens it will be logged as such and engine gracefully stopped once idle.

This comment has been minimized.

@nramirezuy

nramirezuy Apr 29, 2014
Contributor

@dangra We were missing this cases, #708

@dangra
Copy link
Member

@dangra dangra commented Apr 28, 2014

LGTM!

@redapple
Copy link
Contributor Author

@redapple redapple commented Apr 28, 2014

I'll squash that

@dangra
Copy link
Member

@dangra dangra commented Apr 28, 2014

are you going to squash into a single commit?

@redapple
Copy link
Contributor Author

@redapple redapple commented Apr 28, 2014

last 3: 1 for tests and one for fix

@dangra
Copy link
Member

@dangra dangra commented Apr 28, 2014

last 3: 1 for tests and one for fix

ok, squash them and I merge asap.

dangra added a commit that referenced this pull request Apr 28, 2014
[MRG] Fix engine to support filtered start_requests
@dangra dangra merged commit 197e360 into scrapy:master Apr 28, 2014
1 check passed
1 check passed
continuous-integration/travis-ci The Travis CI build passed
Details
@redapple redapple deleted the redapple:gh-issue-706-start-requests branch Jul 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.