Make it possible to update settings in __init__ or from_crawler #3663
Comments
usage of Option is to set list setting like scrapy/scrapy/settings/__init__.py Lines 161 to 178 in b859435
I noticed that scrapy/scrapy/settings/__init__.py Lines 352 to 360 in b859435
scrapy/scrapy/settings/__init__.py Lines 336 to 338 in b859435
But if we change Spider code with updating settings inside
@GeorgeA92 , thanks for your reply. I also agree that your solutions for My main reasoning here is to check if it is the scope of Scrapy to allow this kind of configuration or if this behavior (freeze the settings before the spider is created) is expected for some reason. |
I have a similar issue. I have to change the settings of a specific spider based on information read from a JSON file that serves as a configuration file for the spider's behaviour. The problem is that I also have to change the settings according to the spider's arguments. Is there a better way to address this problem? I do not want to create a separate spider for this case: more code to maintain = more problems. |
Hey, I was looking to work on this issue, but I got in a kind of deadlock 😛 The spider is instantiated here https://github.com/scrapy/scrapy/blob/master/scrapy/crawler.py#L84 I thought about moving the spider instantiation to https://github.com/scrapy/scrapy/blob/master/scrapy/crawler.py#L41 and make |
I’m quite unsure on how to best address this. Maybe we could allow defining a new class method on spiders that has access to both spider arguments and settings? |
@Gallaecio, probably this is the best way to go. |
As we can see in crawler.py after But the rest of the Scrapy modules (downloader, downloadermiddlewares, itempipelines, spidermiddlewares, etc.) are instantiated after the spider in Some project-related modules (SpiderLoader, logging...) are instantiated before |
I have a need to modify settings based off a spider argument and I found this thread. Is there any new information about how this could be accomplished? |
A possible workaround for this use case, depending on your run model, is to override settings using
Hi, see my https://gist.github.com/iamumairayub/452432a2e78255de890e5e3d925efaa4 on how to take settings values from command line or even from DB. |
One possible solution for this could also be creating a few class variables and using them in the Example:
According to the Scrapyd docs, to pass Scrapy settings it is required to set |
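use
    @classmethod
    def update_settings(cls, settings):
        configs = {}
        # Path to the configuration file, passed in as the CONFIGS setting
        configs_file = settings.get("CONFIGS")
        if configs_file:
            # `data.load` is the commenter's own loader; it is not defined in this comment
            configs = data.load(configs_file)
        cls.custom_settings = cls.custom_settings or {}
        # Copy the file's "custom_settings" section into the class-level dict
        for k, v in configs.get("custom_settings", {}).items():
            cls.custom_settings[k] = v
        # Keep the whole loaded config so __init__ can read it back later
        cls.custom_settings["CONFIGS"] = configs
        super().update_settings(settings)

    def __init__(self, name=None, **kwargs):
        super().__init__(name, **kwargs)
        self.configs = self.custom_settings["CONFIGS"]
then, settings can be set/override by |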
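The snippet above relies on a `data.load` helper that the comment does not define. As a rough, Scrapy-free sketch of that step, assuming the configuration file is JSON (as in the earlier comment about a JSON configuration file), the loading and merging could look like the following. The names `load_configs` and `merge_custom_settings` are hypothetical and exist only for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_configs(path):
    """Hypothetical stand-in for `data.load`: parse a JSON file into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def merge_custom_settings(custom_settings, configs):
    """Overlay the file's "custom_settings" section on the existing values
    and store the whole loaded config under the "CONFIGS" key."""
    merged = dict(custom_settings or {})
    merged.update(configs.get("custom_settings", {}))
    merged["CONFIGS"] = configs
    return merged


# Usage: write a throwaway config file, then load and merge it.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "spider.json"
    config_path.write_text(
        json.dumps({"custom_settings": {"FEED_EXPORT_FIELDS": ["title", "url"]}}),
        encoding="utf-8",
    )
    configs = load_configs(config_path)

merged = merge_custom_settings({"DOWNLOAD_DELAY": 1}, configs)
```

Unlike the snippet in the comment, this sketch returns a new dict instead of mutating a class attribute, which avoids sharing one mutable dict between a base spider class and its subclasses.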
This issue might be related to #1305
I noticed that settings are frozen in https://github.com/scrapy/scrapy/blob/master/scrapy/crawler.py#L57
However, in a given project I had a requirement to change some settings based on some spider arguments. An alternative would be to write this spider as a base class and extend it from specific spiders setting the proper settings.
However, I think it would make sense to only freeze settings after the spider and other components were initialized. Or, provide some other entry point to configure settings based on arguments.
The other option is to use -s arguments, but in my case I was changing the FEED_EXPORT_FIELDS setting (https://docs.scrapy.org/en/latest/topics/feed-exports.html#std:setting-FEED_EXPORT_FIELDS).
Any thoughts here?
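The ordering problem described in this report can be illustrated with a toy model. This is a simplified sketch, not Scrapy's actual code: `ToySettings` and `ToySpider` are hypothetical classes that only mimic the idea of a settings object that rejects writes once frozen, and a spider that tries to derive a setting from one of its arguments.

```python
class ToySettings:
    """Toy stand-in for a freezable settings object (not Scrapy's class)."""

    def __init__(self, values=None):
        self._values = dict(values or {})
        self.frozen = False

    def set(self, name, value):
        # Once frozen, every write is rejected.
        if self.frozen:
            raise TypeError("Trying to modify a frozen settings object")
        self._values[name] = value

    def get(self, name, default=None):
        return self._values.get(name, default)

    def freeze(self):
        self.frozen = True


class ToySpider:
    def __init__(self, settings, fields=None):
        # Derive a setting from a spider argument, as the report wants to do.
        if fields:
            settings.set("FEED_EXPORT_FIELDS", fields.split(","))


# Case 1: settings are frozen before the spider is created, so the write fails.
settings = ToySettings({"FEED_EXPORT_FIELDS": None})
settings.freeze()
try:
    ToySpider(settings, fields="title,url")
    error = None
except TypeError as exc:
    error = str(exc)

# Case 2: freezing only after the spider exists lets the change through.
late = ToySettings()
ToySpider(late, fields="title,url")
late.freeze()
```

The second case corresponds to the suggestion of freezing settings only after the spider has been initialized; whether that ordering is safe for the other components is exactly what the thread goes on to discuss.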