Allow to pass objects in Settings? #3870

Closed
nyov opened this issue Jul 12, 2019 · 5 comments

@nyov
Contributor

nyov commented Jul 12, 2019

See the code example below: why can't I pass plain objects into Settings(), instead of having to let Scrapy handle the import magic?
Would it make sense to support this? It seems "unclean" to do in the usual settings.py environment, but in a single-script setup it looks less convoluted than telling Scrapy to import from the current module.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
import logging
import scrapy

logger = logging.getLogger(__name__)


class Spider(scrapy.Spider):

    name = 'Spidy'

    def start_requests(self):
        yield scrapy.Request('https://scrapy.org/')

    def parse(self, response):
        logger.info('Here I fetched %s for you. [%s]' % (response.url, response.status))
        return {
            'status': response.status,
            'url': response.url,
            'test': 'item',
        }


class LogPipeline(object):

    def process_item(self, item, spider):
        logger.warning('HIT ME PLEASE')
        logger.info('Got hit by:\n %r' % item)
        return item


if __name__ == "__main__":
    from scrapy.settings import Settings
    from scrapy.crawler import CrawlerProcess

    settings = Settings(values={
        'TELNETCONSOLE_ENABLED': False, # necessary evil :(
        'EXTENSIONS': {
            'scrapy.extensions.telnet.TelnetConsole': None,
        },
        'ITEM_PIPELINES': {
            #'myproject.pipelines.LogPipeline': 800, # << as resolved by `scrapy.utils.project.get_project_settings()`
            #'__main__.LogPipeline': 800, # << works, but still resolved and imported by scrapy magic

            LogPipeline: 800, # << WHY CAN'T I DO THIS?
        },
    })

    process = CrawlerProcess(settings=settings)
    # pass the spider class; recent Scrapy versions reject a spider instance here
    process.crawl(Spider)
    process.start()
@kmike
Member

kmike commented Jul 12, 2019

Thanks for bringing it up, I don't think we have a dedicated ticket for this. +1 to add this feature.

It has been discussed in #1215, #1032; I recall https://github.com/scrapy/scrapy/pull/1272/files also has an implementation for this.

@nyov
Contributor Author

nyov commented Jul 12, 2019

Oh, now that you bring up those links, I faintly remember...
In fact, I have been using the same or similar code as in #1032 myself, to work around this issue since back in Scrapy 0.24. haha:

# haxx: sad monkeypatch, might break
from importlib import import_module
def load_object(path):
    try:
        dot = path.rindex('.')
    except ValueError:
        raise ValueError("Error loading object '%s': not a full path" % path)
    except AttributeError:
        # hax: path has no rindex(), so it is not a string (e.g. a class); return it as-is
        return path

    module, name = path[:dot], path[dot+1:]
    mod = import_module(module)

    try:
        obj = getattr(mod, name)
    except AttributeError:
        raise NameError("Module '%s' doesn't define any object named '%s'" % (module, name))

    return obj 

scrapy.utils.misc.load_object = load_object
# end haxx

@kmike, I see there were some reservations about this in #1215. Would you say they still apply?
Or were they only against the use of instantiated objects, not classes per se?
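
For readers who want the idea of the monkeypatch above without patching Scrapy, here is a minimal standalone sketch. It assumes the only goal is "return objects unchanged, import dotted-path strings"; it is not the code from #1032 or #3873, and the explicit isinstance check is a swapped-in alternative to catching AttributeError.

```python
from importlib import import_module


def load_object(path):
    """Return `path` unchanged if it is already an object, else import it.

    Illustrative sketch only: a string is treated as a dotted path such as
    'package.module.Name'; anything else is assumed to be the object itself.
    """
    if not isinstance(path, str):
        # Already a class or other object: nothing to import.
        return path

    module_name, sep, name = path.rpartition(".")
    if not sep:
        raise ValueError(f"Error loading object {path!r}: not a full path")

    module = import_module(module_name)
    try:
        return getattr(module, name)
    except AttributeError:
        raise NameError(
            f"Module {module_name!r} doesn't define any object named {name!r}"
        )
```

With this, both `load_object('collections.OrderedDict')` and `load_object(SomeClass)` return a class, so a settings dict could mix strings and class references as keys.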

@kmike
Member

kmike commented Jul 12, 2019

It was years ago, but as I recall I had reservations about instantiated objects (they look tricky because they often need to be tied to a Crawler, or be created with the from_crawler method); classes are fine.
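
A rough sketch of why a class is easier to handle than an instance: given a class, the framework can decide how to construct it, for example through from_crawler. Everything here is hypothetical and Scrapy-free (`FakeCrawler`, `build_component`, and the `LOG_PREFIX` setting are made up for illustration); it only mirrors the from_crawler convention mentioned above.

```python
class FakeCrawler:
    """Hypothetical stand-in for a crawler; holds only a settings dict."""

    def __init__(self, settings):
        self.settings = settings


class LogPipeline:
    def __init__(self, prefix=""):
        self.prefix = prefix

    @classmethod
    def from_crawler(cls, crawler):
        # The component reads its configuration from the crawler it is tied to.
        return cls(prefix=crawler.settings.get("LOG_PREFIX", ""))


def build_component(cls, crawler):
    """Hypothetical helper: prefer from_crawler when the class defines it."""
    if hasattr(cls, "from_crawler"):
        return cls.from_crawler(crawler)
    return cls()


crawler = FakeCrawler({"LOG_PREFIX": "[spidy] "})
pipeline = build_component(LogPipeline, crawler)
```

An already-built instance would skip this step entirely, so it could not pick up anything from the crawler it ends up running under.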

@nyov
Contributor Author

nyov commented Jul 12, 2019

Okay, in that case, have fun reviewing once more, in #3873 :)

That one doesn't load instantiated objects; if I tried, I'd get
TypeError: 'LogPipeline' object is not callable
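
That error is what Python raises whenever code that expects a class calls an instance instead. A minimal sketch, assuming the loader simply calls whatever it is given (the `instantiate` helper is hypothetical, not Scrapy's code):

```python
class LogPipeline:
    def process_item(self, item, spider):
        return item


def instantiate(component):
    # A framework that expects a class simply calls it to get an instance.
    return component()


ok = instantiate(LogPipeline)      # a class is callable, so this works

try:
    instantiate(LogPipeline())     # an instance without __call__ is not
except TypeError as exc:
    message = str(exc)
```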

nyov added a commit to nyov/scrapy that referenced this issue Mar 2, 2020
nyov added a commit to nyov/scrapy that referenced this issue Mar 3, 2020
@Gallaecio
Member

Gallaecio commented Aug 26, 2020

Done in #3873
