CrawlSpider: support process_links as generator #555
Merged
body=self.test_body)

class _CrawlSpider(self.spider_class):
    import re
redapple
Jan 23, 2014
Author
Contributor
probably not the best location
This also adds some changes to CrawlSpider (perhaps those which motivated the tests?). Can we split them into two PRs, with a description of the intention behind the CrawlSpider changes?
You mean this change?
- links = [l for l in rule.link_extractor.extract_links(response) if l not in seen]
+ links = list(set(rule.link_extractor.extract_links(response)) - seen)
it was just cosmetic since there's a
The main change is this one though,
- seen = seen.union(links)
  for link in links:
+     seen.add(link)
otherwise the
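To illustrate why moving the bookkeeping into the loop matters when process_links is a generator, here is a minimal sketch. It is not the Scrapy code itself: plain strings stand in for link objects, and `process_links`, `follow_with_union` and `follow_with_add` are hypothetical names made up for this example. The point it shows is standard Python behaviour: `set.union()` consumes a generator, so a later `for` loop over the same generator sees nothing.

```python
def process_links(links):
    # Hypothetical generator-style process_links: yields links lazily.
    for link in links:
        if "skip" not in link:
            yield link


def follow_with_union(links, seen):
    # Old pattern: union() iterates over `links` once to build the new set.
    seen = seen.union(links)
    # If `links` is a generator it is now exhausted, so nothing is followed.
    followed = [link for link in links]
    return followed, seen


def follow_with_add(links, seen):
    # New pattern: a single pass both records and follows each link.
    followed = []
    for link in links:
        seen.add(link)
        followed.append(link)
    return followed, seen


# Illustrative input only; these URLs are assumptions, not from the PR.
extracted = [
    "http://example.com/a",
    "http://example.com/skip",
    "http://example.com/b",
]

old_followed, old_seen = follow_with_union(process_links(extracted), set())
new_followed, new_seen = follow_with_add(process_links(extracted), set())

print(old_followed)  # empty: the generator was consumed by union()
print(new_followed)  # both non-skipped links, in extraction order
```

With a list-returning process_links both patterns behave the same, which is presumably why the difference only shows up once a generator is used.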
scrapy/contrib/spiders/crawl.py
Outdated
@@ -49,11 +49,11 @@ def _requests_to_follow(self, response):
             return
         seen = set()
         for n, rule in enumerate(self._rules):
-            links = [l for l in rule.link_extractor.extract_links(response) if l not in seen]
+            links = list(set(rule.link_extractor.extract_links(response)) - seen)
kmike
Jan 28, 2014
Member
This changes the order requests are scheduled - is it intended?
redapple
Jan 28, 2014
Author
Contributor
ah good point! not intended no.
There could be a test for that
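A rough sketch of the ordering concern, and of what such a test could check. The helper names and the sample paths are assumptions made for illustration; plain strings stand in for link objects. The list comprehension keeps extraction order, while converting to a set and back gives no ordering guarantee.

```python
def dedupe_keep_order(links, seen):
    # Original approach: filter in place, so extraction order is preserved.
    return [link for link in links if link not in seen]


def dedupe_with_set(links, seen):
    # Proposed approach: set difference; the resulting order is arbitrary.
    return list(set(links) - seen)


# Illustrative data only.
extracted = ["/page/3", "/page/1", "/page/2"]
seen = {"/page/2"}

ordered = dedupe_keep_order(extracted, seen)
unordered = dedupe_with_set(extracted, seen)

# Same links either way, but only `ordered` is guaranteed to match
# the order in which the links were extracted.
assert ordered == ["/page/3", "/page/1"]
assert sorted(unordered) == sorted(ordered)
```

A regression test along these lines would assert on the exact sequence of scheduled requests rather than on their set, so an accidental reordering fails the test.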
squash and merge?
Adding tests for CrawlSpider's process_links
Done.
anything preventing this from getting merged?
dangra added a commit that referenced this pull request Feb 18, 2014
CrawlSpider: support process_links as generator