
Fix Spider.custom_settings #1276

Merged
merged 2 commits into master from fix-spider-settings on Jun 3, 2015

Conversation

@kmike (Member) commented Jun 2, 2015

I noticed that the CloseSpider extension doesn't get activated when the CLOSESPIDER_PAGECOUNT option is set in a spider's custom_settings. Other extensions (AutoThrottle, LogStats, etc.) are also not activated.

This happens because extensions check settings in their from_crawler methods, and crawler.settings doesn't contain the spider settings yet: the spidercls.update_settings call was moved to a later stage by #1128.
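To illustrate, here is a minimal sketch (the spider and the simplified stand-in for CloseSpider below are made up for illustration, not Scrapy source). The extension reads the setting in from_crawler, so if the spider's custom_settings have not been merged into crawler.settings by then, it sees the default value and bails out with NotConfigured:

```python
import scrapy
from scrapy import signals
from scrapy.exceptions import NotConfigured


class PageCountSpider(scrapy.Spider):
    # Hypothetical spider: CLOSESPIDER_PAGECOUNT only reaches
    # crawler.settings when spidercls.update_settings() runs.
    name = "pagecount"
    start_urls = ["http://example.com"]
    custom_settings = {"CLOSESPIDER_PAGECOUNT": 100}

    def parse(self, response):
        yield {"url": response.url}


class SimplifiedCloseSpider:
    # Simplified stand-in for the CloseSpider extension: the setting is
    # read once, in from_crawler(); if update_settings() runs later, the
    # extension never sees it and raises NotConfigured.
    def __init__(self, crawler, pagecount):
        self.crawler = crawler
        self.pagecount = pagecount
        self.counter = 0
        crawler.signals.connect(self.page_count, signal=signals.response_received)

    @classmethod
    def from_crawler(cls, crawler):
        pagecount = crawler.settings.getint("CLOSESPIDER_PAGECOUNT")
        if not pagecount:
            raise NotConfigured
        return cls(crawler, pagecount)

    def page_count(self, response, request, spider):
        self.counter += 1
        if self.counter >= self.pagecount:
            self.crawler.engine.close_spider(spider, "closespider_pagecount")
```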

kmike added 2 commits on Jun 2, 2015
#1128 moved the spidercls.update_settings call to a later stage; this commit moves it back.
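A rough sketch of the shape of the change (simplified, not the exact Scrapy source): merge the spider's custom_settings into the crawler settings before the extensions are built, so their from_crawler methods can see them, and keep the freeze at the end.

```python
from scrapy.signalmanager import SignalManager
from scrapy.extension import ExtensionManager


class Crawler:
    """Simplified sketch, not the real scrapy.crawler.Crawler."""

    def __init__(self, spidercls, settings):
        self.spidercls = spidercls
        self.settings = settings.copy()
        # The fix: merge the spider's custom_settings here, before the
        # extensions are instantiated.
        self.spidercls.update_settings(self.settings)
        self.signals = SignalManager(self)
        self.extensions = ExtensionManager.from_crawler(self)
        # freeze() stays at the bottom so extensions can still update
        # settings during their own initialization (see the discussion below).
        self.settings.freeze()
```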
@kmike added this to the Scrapy 1.0 milestone on Jun 2, 2015
@curita (Member) commented Jun 3, 2015

Makes sense, +1 to merge. I wrongly placed the settings update and freeze in 6f9265b. It should be safe to move the self.settings.freeze() line too; settings were frozen at that point because the extensions shouldn't update them.

@kmike (Member, Author) commented Jun 3, 2015

@curita I kept settings.freeze() at the bottom because of your comment at #1128:

Settings given to each Crawler will be frozen after the extensions initialization. This allows the extensions to update the settings. That's something we want to implement for the Simplified Addons Idea for this GSoC (issue #591).

Is it no longer needed?

@curita (Member) commented Jun 3, 2015

Oh, I forgot about that comment. I was thinking it's probably better to deal with that once the addons support is implemented. I guess keeping the freeze call where it is is less restrictive now that the change is already implemented. +1 to merge this pull request as it is, then.

kmike added a commit that referenced this pull request Jun 3, 2015
Fix Spider.custom_settings
@kmike merged commit f312ffc into master on Jun 3, 2015
2 checks passed
continuous-integration/travis-ci/pr: The Travis CI build passed
continuous-integration/travis-ci/push: The Travis CI build passed
@chekunkov (Contributor) commented Jun 10, 2015

@kmike @curita I want to ask you a question. Why are extensions, middlewares and pipelines initialized during Crawler.__init__, while the spider and engine objects are initialized only during Crawler.crawl? Does it work like that because we want to preserve the ability to run multiple spiders through the same set of middlewares?

I'm asking because sometimes I want to change some crawl settings after the spider is initialized, and only initialize the middlewares after that. For example, I got a request from a customer to make it possible to set CLOSESPIDER_TIMEOUT based on a passed spider argument. Because of how CloseSpider is implemented, supporting this means overriding it, disabling the default extension and registering a custom one in the settings (see the sketch below). If the initialization order were

'spider init' -> 'update settings' -> 'settings.freeze' -> 'middleware init'

that task would be as easy as setting CLOSESPIDER_TIMEOUT in custom_settings.
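The workaround described above might look roughly like this (a hypothetical sketch: the SpiderArgCloseTimeout extension, its module path and the close_timeout spider argument are made up for illustration):

```python
from twisted.internet import reactor
from scrapy import signals


class SpiderArgCloseTimeout:
    """Hypothetical custom extension: close the spider N seconds after it
    opens, reading N from a spider argument instead of from the settings."""

    def __init__(self, crawler):
        self.crawler = crawler
        self.task = None
        crawler.signals.connect(self.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(self.spider_closed, signal=signals.spider_closed)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def spider_opened(self, spider):
        # e.g. scrapy crawl myspider -a close_timeout=900
        timeout = float(getattr(spider, "close_timeout", 0))
        if timeout:
            self.task = reactor.callLater(
                timeout, self.crawler.engine.close_spider,
                spider, "closespider_timeout")

    def spider_closed(self, spider):
        if self.task and self.task.active():
            self.task.cancel()


# In the project settings: disable the stock extension and enable the
# custom one ("myproject.extensions" is a hypothetical module path).
EXTENSIONS = {
    "scrapy.extensions.closespider.CloseSpider": None,
    "myproject.extensions.SpiderArgCloseTimeout": 0,
}
```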

@chekunkov (Contributor) commented Jun 10, 2015

@dangra please check the comment above; maybe you can explain what design decision is behind the current initialization order? I forgot to mention you in that comment, and you're obviously the person who knows almost everything about Scrapy.

@nramirezuy (Contributor) commented Jun 10, 2015

@chekunkov -s doesn't work for you?
e.g. scrapy crawl <spider> -s CLOSESPIDER_TIMEOUT=900

@chekunkov (Contributor) commented Jun 10, 2015

@nramirezuy -s does work on the command line, but it doesn't work on the dash: the API doesn't provide a way to set custom settings when scheduling a job, and the customer wants to be able to set this setting per crawl.

@chekunkov (Contributor) commented Jun 10, 2015

Also, the spider may have some logic to decide, based on spider arguments, whether or not this setting should be set; -s doesn't help in that case.

@dangra (Member) commented Jun 12, 2015

@chekunkov: I think you have a point. The order of component initialization is not clear, but it's better to propose changing it in a new ticket. Scrapy 1.0 is pretty much ready, so we can work on your problem for 1.1.

@dangra deleted the fix-spider-settings branch on Jun 15, 2015
@chekunkov (Contributor) commented Jun 15, 2015

Opened a new ticket #1305

@kmike (Member, Author) commented Jun 15, 2015

@chekunkov 👍
