Load settings dynamically on a per-spider basis #2392
Comments
To read settings in a spider, one can use the `Spider.settings` attribute (note that it is not yet populated in the spider's `__init__`). If the goal is to change settings, it becomes more complicated: generally, one can change settings only before other components are configured, so initialization order is important. There is an undocumented `Spider.update_settings` method which receives project-wide settings and updates them; maybe we should document it and make it public, but I'm not sure it should be the final solution. See also the discussion at #1305, where there is a proposal to allow changing settings. There is also a PR for 'addons' — components which can change settings (#1272).
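The `update_settings` hook mentioned above can be illustrated with a minimal, self-contained sketch. The `Settings` and `Spider` classes below are simplified stand-ins, not the real Scrapy API (whose `update_settings` signature has varied across versions); only the overall pattern — a classmethod that mutates the settings object before the crawler configures other components — is what matters here.

```python
# Sketch of the update_settings hook pattern. Settings and Spider are
# simplified stand-ins for scrapy.settings.Settings / scrapy.Spider.

class Settings(dict):
    """Minimal stand-in for scrapy.settings.Settings."""
    def set(self, name, value, priority="project"):
        # The real Settings tracks per-setting priority; this stand-in does not.
        self[name] = value

class Spider:
    custom_settings = None

    @classmethod
    def update_settings(cls, settings):
        # Default behaviour: merge the class-level custom_settings dict.
        for name, value in (cls.custom_settings or {}).items():
            settings.set(name, value, priority="spider")

class MySpider(Spider):
    custom_settings = {"DOWNLOAD_DELAY": 2.0}

    @classmethod
    def update_settings(cls, settings):
        super().update_settings(settings)
        # Per-spider override, applied before other components are configured.
        settings.set("CONCURRENT_REQUESTS", 4, priority="spider")

settings = Settings({"CONCURRENT_REQUESTS": 16})
MySpider.update_settings(settings)
print(settings["DOWNLOAD_DELAY"], settings["CONCURRENT_REQUESTS"])  # 2.0 4
```

Because the hook runs as a classmethod before the spider instance exists, it sidesteps the `__init__`-is-too-late problem discussed below, which is why documenting it is attractive.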
Settings can also be overridden from the command line: https://doc.scrapy.org/en/latest/topics/settings.html#command-line-options. For example:
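The `-s` option documented at that link overrides individual settings for a single run. A sketch (the spider name `myspider` is an assumption):

```shell
# Override settings for one crawl from the command line,
# assuming a spider named "myspider" exists in the current project.
scrapy crawl myspider -s DOWNLOAD_DELAY=2 -s LOG_LEVEL=INFO
```

This works per invocation rather than per spider class, so it doesn't by itself solve the per-spider problem discussed in this issue.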
Hi, I just faced the same problem. I will be happy to hear any ideas, thanks.
Feature request.
It would be good if Scrapy had an easily accessible means of reading settings on a per-spider basis, and then making them accessible to the spider. From my many attempts to do this so far, all of the components for this appear, in theory, to be in place. Populating settings is already done:
https://doc.scrapy.org/en/latest/topics/settings.html#populating-the-settings - but then the problem is accessing them.
Ideally in a fashion that's compatible with scrapyd (so no calling `process.crawl(spider, my_settings)`).

Ideally: a project could have a generic, project-wide settings.py file containing both the standard settings and any custom ones added by the developer. Then, using a command-line argument to indicate which settings to use, the `__init__` method of the spider would override specific settings (much as `custom_settings` does), and these settings would then be accessible throughout the spider via `self.settings` in the usual way.

Current Problems
`custom_settings`

Unfortunately `custom_settings` doesn't seem to be usable for this, because it cannot be declared in `__init__`; it needs to be declared earlier, as a class attribute.

settings.py
Currently, even if a user is willing to just use a different settings.py file entirely for each spider (thereby duplicating most of it), that's not readily possible either.
The above only gets the settings into the variable `these_settings`; they're not used by the spider or accessible via `self.settings`.

Desire for feature
Based on StackOverflow, this is something a lot of people want. The fact that there are so many answers, all so different from each other, shows there isn't a particularly good way of doing it:
http://stackoverflow.com/questions/9814827/creating-a-generic-scrapy-spider
http://stackoverflow.com/questions/12996910/how-to-setup-and-launch-a-scrapy-spider-programmatically-urls-and-settings
http://stackoverflow.com/questions/35662146/dynamic-spider-generation-with-scrapy-subclass-init-error
http://stackoverflow.com/questions/40510526/how-to-load-different-settings-for-different-scrapy-spiders
http://stackoverflow.com/questions/2396529/using-one-scrapy-spider-for-several-websites
Being able to readily get `allowed_domains` and `start_urls` from the per-spider settings would also be good.