Change extensions/spiders/settings initialisation order #1580
1: Moving extension's Two tests fail because they essentially test the implementation that we just changed. The functionality they're testing (spider settings) should still be intact.
2: Moving Three new tests failing because we're trying to close a One of the previously failing tests now fails earlier because the spider settings are not applied in It is no longer possible to change the
3: Initialise spider before calling its I have not yet made No new failing tests.
4: Allow No new failing tests.
The tests that fail with "Tried to stop a LoopingCall that was not running." all have two calls to I think the real problem is that we're trying to write to frozen settings (or that we try to freeze settings twice). The error then manifests because we hit the These three tests pass when I comment out the
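A toy model of the suspected failure mode, assuming the second call on the same crawler tries to write to settings that the first call already froze. `ToySettings` and `crawl_once` are hypothetical names for illustration only, not Scrapy's API:

```python
class ToySettings:
    """Minimal stand-in for a settings object that can be frozen."""

    def __init__(self, values=None):
        self._values = dict(values or {})
        self._frozen = False

    def freeze(self):
        # After this call, any further write is rejected.
        self._frozen = True

    def set(self, name, value):
        if self._frozen:
            raise TypeError("Trying to modify frozen settings")
        self._values[name] = value

    def get(self, name, default=None):
        return self._values.get(name, default)


def crawl_once(settings, spider_overrides):
    """Hypothetical crawl setup: apply spider overrides, then freeze."""
    for name, value in spider_overrides.items():
        settings.set(name, value)
    settings.freeze()


settings = ToySettings({"LOG_LEVEL": "DEBUG"})
crawl_once(settings, {"LOG_LEVEL": "INFO"})  # first call succeeds

try:
    # Second call on the same settings object hits the frozen state.
    crawl_once(settings, {"LOG_LEVEL": "WARNING"})
    second_call_error = None
except TypeError as exc:
    second_call_error = str(exc)
```

In this sketch the second call fails before any teardown logic runs, which would be consistent with the error surfacing later in an unrelated place.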
This initialisation order would be great for #1442 (add-on callbacks for spiders) because the spider arguments could easily be mapped into the spider's add-on config.
as well as LOG_LEVEL and LOG_FORMATTER. I don't think they are often changed from the spider, but this is inconsistent behaviour, and it sucks. Is there any better way?
I don't really see a smart way to keep supporting changing these settings from the spider. It makes sense to set stats and logging up as soon as the crawler is initialized, so I don't think we can move that to For backwards compatibility we could implement a check whether
What happens between
In other words, why does crawler need a logger and stats as soon as it is initialized? Can it do anything useful before
The There are many failing tests when I move the stats & log init into
@kmike One thing which looks important to me - setup logging before
@jdemaeyer Maybe it makes sense to pass spider args to
Yes, but if we keep logging setup before I might regret saying this, but now I'm not sure it's a good idea to do settings configuration during/after
Another option is to do a better job of separating "core" settings, which must not be changed by the spider, from extensions/middlewares/pipelines/addons settings that can be changed by the spider.
That would work for add-ons. I think changing the
An issue with this is that the core settings could then not be changed in stand-alone spiders (except through the command line). This is still the option I like most though.
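A rough sketch of what such a separation could look like, assuming a fixed set of "core" names that a spider may not override. `CORE_SETTINGS` and `split_spider_overrides` are hypothetical; only `LOG_LEVEL`, `LOG_FORMATTER` and `CLOSESPIDER_TIMEOUT` are names taken from this thread:

```python
# Assumed list of core settings; the real list would need to be agreed on.
CORE_SETTINGS = frozenset({"LOG_LEVEL", "LOG_FORMATTER"})


def split_spider_overrides(overrides, core_names=CORE_SETTINGS):
    """Split spider-provided settings into allowed and rejected ones.

    Returns a dict of overrides the spider may apply and a sorted list
    of core setting names it tried to change.
    """
    allowed = {k: v for k, v in overrides.items() if k not in core_names}
    rejected = sorted(k for k in overrides if k in core_names)
    return allowed, rejected


allowed, rejected = split_spider_overrides(
    {"LOG_LEVEL": "INFO", "CLOSESPIDER_TIMEOUT": 30}
)
```

Whether rejected names should raise, warn, or be silently ignored is a design choice this sketch leaves open.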
Hmm, but running the same spider twice (in the same crawler) with different spider args is still supported (at least some tests do it).
Some settings don't even make sense to be changed on the Maintain backwards compatibility for people using Running 2 spiders on the same
I like the idea of separating settings, at least documenting what can be changed per-spider. There are a few other settings which are global, e.g. thread pool size.
+1
If we can settle for:
then initialising the spider and extensions in I've overhauled the tests that had multiple calls to the same crawler's I wonder if we should make
class MySpider(Spider):
    def __init__(self, *args, **kwargs):
        self.custom_settings.update({'XY': 'blah'})
won't work otherwise (since
@jdemaeyer That last change would totally fix using custom_settings when subclassing.
sounds ok if we document this properly
sounds ok as well - don't have an idea where
In addition to your example, it will be possible to update settings this way:
class MySpider(Spider):
    def update_settings(self, settings):
        # for simplicity - imagine that value is validated in __init__
        if self.close_after:
            # intentionally omitted priority='spider'
            settings.set('CLOSESPIDER_TIMEOUT', self.close_after)
so one can ignore
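To illustrate how a per-instance `update_settings` hook might interact with setting priorities, here is a self-contained toy model. `PrioritySettings` and `ToySpider` are hypothetical stand-ins, and the numeric priority levels are assumptions chosen for the example rather than values confirmed in this thread:

```python
# Assumed priority levels; the numbers are illustrative.
PRIORITIES = {"default": 0, "project": 20, "spider": 30, "cmdline": 40}


class PrioritySettings:
    """Toy settings store where a write only wins at equal or higher priority."""

    def __init__(self):
        self._values = {}  # name -> (value, numeric priority)

    def set(self, name, value, priority="project"):
        level = PRIORITIES[priority]
        current = self._values.get(name)
        if current is None or level >= current[1]:
            self._values[name] = (value, level)

    def get(self, name, default=None):
        entry = self._values.get(name)
        return default if entry is None else entry[0]


class ToySpider:
    """Hypothetical spider whose instance adjusts settings from its arguments."""

    def __init__(self, close_after=None):
        self.close_after = close_after

    def update_settings(self, settings):
        if self.close_after:
            # No explicit priority: falls back to the store's default level.
            settings.set("CLOSESPIDER_TIMEOUT", self.close_after)


# Spider argument is applied when nothing stronger is set.
plain = PrioritySettings()
ToySpider(close_after=30).update_settings(plain)

# A command-line value set earlier keeps precedence over the spider's write.
overridden = PrioritySettings()
overridden.set("CLOSESPIDER_TIMEOUT", 60, priority="cmdline")
ToySpider(close_after=30).update_settings(overridden)
```

Under these assumptions the spider argument only takes effect when no higher-priority source has already set the value.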
@kmike this is fairly stale at this point. Given the 7 years of changes on the tool, is this still a relevant PR, or could we close it?
Wow, what a sweet rush of nostalgia washing over me 😅
This is an implementation of @chekunkov's suggested changes from #1305. I will be uploading consecutive commits one by one over the next few minutes so we can see which tests break at what point.