Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

public Crawler.create_crawler method #1528

Merged
merged 3 commits into from Oct 30, 2015
Merged

public Crawler.create_crawler method #1528

merged 3 commits into from Oct 30, 2015

Conversation

@kmike
Copy link
Member

@kmike kmike commented Oct 6, 2015

CrawlerRunner.crawl returns a Deferred, but in all cases I used this method in addition to waiting for crawl to finish it was necessary to connect signal handlers, and you need Crawler instance for that. So

d = runner.crawl('spidername')  # or runner.create_crawler(spider_cls)

was not enough, I wanted

crawler = runner.create_crawler('spidername')  # or runner.create_crawler(spider_cls)
crawler.signals.connect(...)
d = runner.crawl(crawler)

This means it was necessary to copy-paste the same code

crawler = crawler_or_spidercls
if not isinstance(crawler_or_spidercls, Crawler):
    crawler = runner._create_crawler(crawler_or_spidercls)

in all CrawlerRunner subclasses / wrappers, and using a private _create_crawler method. What about adding a public method for it? As a bonus point, we won't be using a private method in scrapy.utils.test.get_crawler.

@kmike kmike added this to the Scrapy 1.1 milestone Oct 6, 2015
@kmike kmike force-pushed the create-crawler branch from e95f538 to 67874f7 Oct 8, 2015
@codecov-io
Copy link

@codecov-io codecov-io commented Oct 8, 2015

Current coverage is 82.87%

Merging #1528 into master will increase coverage by +0.04% as of b728a9e

Powered by Codecov. Updated on successful CI builds.

@@ -30,7 +30,7 @@ def get_crawler(spidercls=None, settings_dict=None):
from scrapy.spiders import Spider

runner = CrawlerRunner(Settings(settings_dict))

This comment has been minimized.

@curita

curita Oct 30, 2015
Member

This is unrelated to these changes, but now that I see it we could pass settings_dict as it is to CrawlerRunner, we don't need to wrap it here into a Settings instance.

* use CrawlerRunner.create_crawler instead of get_crawler helper in test_crawl;
* add a test for loading spiders by name;
* add a test for passing Crawler objects instead of Spider objects;
* add a test for CrawlerRunner.join
@curita
Copy link
Member

@curita curita commented Oct 30, 2015

+1 to merge, but it needs to be rebased.

@kmike kmike force-pushed the create-crawler branch from 67874f7 to 0000b6e Oct 30, 2015
curita added a commit that referenced this pull request Oct 30, 2015
public Crawler.create_crawler method
@curita curita merged commit 57f87b9 into master Oct 30, 2015
2 checks passed
2 checks passed
codecov/patch 100.00% of diff hit (target 100.00%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@kmike kmike deleted the create-crawler branch Jan 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.