
Document changed CrawlerProcess.crawl(spider) functionality in Release notes #3872

nyov opened this issue Jul 12, 2019 · 2 comments · Fixed by #3846


nyov commented Jul 12, 2019

Possible regression; see the explanation below the spider code.

MWE Testcode:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
import scrapy

logger = logging.getLogger(__name__)

class Spider(scrapy.Spider):

    name = 'Spidy'

    def start_requests(self):
        yield scrapy.Request('')

    def parse(self, response):
        logger.info('Here I fetched %s for you. [%s]' % (response.url, response.status))
        return {
            'status': response.status,
            'url': response.url,
            'test': 'item',
        }

class LogPipeline(object):

    def process_item(self, item, spider):
        logger.warning('HIT ME PLEASE')
        logger.info('Got hit by:\n %r' % item)
        return item

if __name__ == "__main__":
    from scrapy.settings import Settings
    from scrapy.crawler import CrawlerProcess

    settings = Settings(values={
        'TELNETCONSOLE_ENABLED': False, # necessary evil :(
        'EXTENSIONS': {
            'scrapy.extensions.telnet.TelnetConsole': None,
        },
        'ITEM_PIPELINES': {
            '__main__.LogPipeline': 800,
        },
    })

    spider = Spider()

    process = CrawlerProcess(settings=settings)
    process.crawl(spider)
    process.start()

I just tried this example script (functional with Scrapy 1.5.1) on the current master codebase and got this error:

2019-07-12 13:54:16 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: scrapybot)
2019-07-12 13:54:16 [scrapy.utils.log] INFO: Versions: lxml, libxml2 2.9.4, cssselect 1.0.3, parsel 1.5.0, w3lib 1.20.0, Twisted 18.9.0, Python 3.7.3 (default, Apr  3 2019, 05:39:12) - [GCC 8.3.0], pyOpenSSL 19.0.0 (OpenSSL 1.1.1c  28 May 2019), cryptography 2.6.1, Platform Linux-4.9.0-8-amd64-x86_64-with-debian-10.0
Traceback (most recent call last):
  File "./", line 60, in <module>
  File "[...]/scrapy.git/scrapy/", line 180, in crawl
    'The crawler_or_spidercls argument cannot be a spider object, '
ValueError: The crawler_or_spidercls argument cannot be a spider object, it must be a spider class (or a Crawler object)

Looking at the codebase, blame blames this change: #3610

But that procedure (passing a spider instance as process.crawl(spider)) is taken pretty much verbatim from the (latest) docs, so shouldn't it either continue to work or first be deprecated?

Edit: to clarify, I don't mind the functionality being removed without deprecation if it was never documented, and it seems it wasn't.
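For context, the new behavior can be illustrated with a minimal, self-contained sketch (hypothetical names, no Scrapy required, not Scrapy's actual implementation): crawl() now accepts only a spider class and raises ValueError for an instance, mirroring the error above.

```python
# Hypothetical mimic of the check introduced by #3610: crawl() rejects
# spider *instances* and accepts only spider *classes*.

class Spider:
    name = 'Spidy'

def crawl(crawler_or_spidercls):
    # A class is an instance of `type`; a spider object is not.
    if not isinstance(crawler_or_spidercls, type):
        raise ValueError(
            'The crawler_or_spidercls argument cannot be a spider object, '
            'it must be a spider class (or a Crawler object)')
    return crawler_or_spidercls()

crawl(Spider)        # OK: passing the class is accepted
try:
    crawl(Spider())  # passing an instance now raises ValueError
except ValueError as e:
    print(e)
```

In real Scrapy the check tests against the Spider base class rather than `type`, but the effect on calling code is the same.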

Contributor Author

nyov commented Jul 12, 2019

Actually, the docs don't pass an object. Huh, strange. I distinctly remember the following being acceptable; in fact, it worked. Is this a deprecated style?

spider1 = MySpider(name="spider1", 
process = CrawlerProcess(settings=settings)

@kmike kmike added this to the v1.7 milestone Jul 12, 2019

kmike commented Jul 12, 2019

It was only working by accident previously: Crawler was not using the spider instance you passed; instead, it called spider.from_crawler to create a new instance. This means that if a spider takes e.g. __init__ arguments, they were not preserved; likewise, any attributes you assigned to the spider object were not preserved.

But it's a surprise to me that it worked in simple cases before; I thought the change was only about providing a slightly nicer error message. I think it may be worth adding a note to the release notes.
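The attribute-loss problem described above can be sketched without Scrapy (hypothetical minimal Crawler/Spider classes, assumed names; not Scrapy's real code): the crawler builds a fresh instance via from_crawler on the class, so state on the instance you passed never reaches the spider that actually runs.

```python
# Sketch of why a pre-built spider instance was silently ignored:
# the crawler calls Spider.from_crawler on the *class*, producing a
# fresh instance, so attributes on the object you passed are lost.

class Spider:
    def __init__(self, tag='default'):
        self.tag = tag

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # A new instance is built from the class; the caller's instance
        # (and anything assigned to it) is never consulted.
        return cls(*args, **kwargs)

class Crawler:
    def __init__(self, spidercls):
        self.spidercls = spidercls

    def crawl(self, *args, **kwargs):
        return self.spidercls.from_crawler(self, *args, **kwargs)

mine = Spider(tag='custom')
mine.extra = 'attribute'

crawler = Crawler(type(mine))
used = crawler.crawl()          # fresh instance with default __init__ args
print(used.tag)                 # prints 'default', not 'custom'
print(hasattr(used, 'extra'))   # prints False
```

The supported way to get constructor arguments to the spider is to pass the class plus the arguments (Scrapy forwards extra crawl() arguments to the spider), rather than a pre-built instance.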

@nyov nyov changed the title Broken CrawlerProcess.crawl(spider) functionality in master Document changed CrawlerProcess.crawl(spider) functionality in Release notes Jul 12, 2019