I totally dig this idea, but I also think it needs to be well thought out, or it could cause more trouble than it solves and be hard to deprecate and fix later.
My thoughts on the current proposal:
Also, if two different addons expose or use the same settings, it gets ugly.
This is obviously already a problem, but I think with add-ons, people would expect more of a "plug-and-play" system that "just works", and would be less careful about checking individual add-on components for name clashes or dependency issues.
Unless add-ons are expected to be configured prior to use, or shipped only as bundles of hardcoded configuration settings, I think we need to go one layer deeper and keep add-ons and their settings in the same place.
```ini
[httpcache]
# (looking in python path, no settings)
#enabled = True
# (does ini-style require a parameter to recognize the section?)

[mongodb_pipeline]
# (explicit path)
path = /path/to/mongodb_pipeline.py
host = 'localhost'
port = 27017
```
These settings would then expand to
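As a rough illustration only (the setting names below are my assumptions, not part of the proposal), the ini sections might expand into plain Scrapy settings along these lines:

```python
# Hypothetical expansion of the ini-style addon sections into Scrapy
# settings; every key name here is illustrative, not from the SEP.
expanded_settings = {
    # [httpcache] section: addon found on the Python path, defaults used
    "HTTPCACHE_ENABLED": True,
    # [mongodb_pipeline] section: addon loaded from an explicit path
    "MONGODB_PIPELINE_PATH": "/path/to/mongodb_pipeline.py",
    "MONGODB_HOST": "localhost",
    "MONGODB_PORT": 27017,
}
```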
(Would this copy settings on scrapyd deployment?)
A rather more complex system.
For the uninitiated scrapy-user, per-addon post-initialization checks may not be safe enough,
Since add-ons would encapsulate the whole spectrum of scrapy hooks and settings,
For a truly robust addon system we may need something like a dependency tree in scrapy components?
Mostly this won't be an issue for simple pipeline and middleware addons,
Now the addon could be paranoid, and check that all the settings or extensions are actually what it expects (e.g. "I need httpcache_storage to be FilesystemCacheStorage"),
But I wonder if it makes sense to address this issue.
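A minimal sketch of such a "paranoid" check, assuming a `crawler_ready`-style hook that receives the crawler (the hook name comes from the later discussion; everything else here is an assumption):

```python
# Hypothetical post-initialization check an addon could run to verify
# that another component is configured the way it expects.
def crawler_ready(crawler):
    storage = crawler.settings.get("HTTPCACHE_STORAGE")
    expected = "scrapy.extensions.httpcache.FilesystemCacheStorage"
    if storage != expected:
        # Fail loudly instead of silently misbehaving later.
        raise ValueError(
            "this addon requires HTTPCACHE_STORAGE to be %s, got %s"
            % (expected, storage)
        )
```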
But this could also work instead of fixed priority-based ordering for middlewares/pipelines/extensions, which would seem a more robust base for complex add-on trees:
With these hints, scrapy could then auto-sort middlewares, extensions,... and find unresolvable or circular dependencies in other components.
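As a rough illustration of that auto-sorting, the hints could be resolved with a topological sort; the hint structure below (component name mapped to the components it must run after) is an assumption:

```python
# Sketch: resolve hypothetical dependency hints with a topological sort.
# A CycleError would surface an unresolvable (circular) dependency.
from graphlib import CycleError, TopologicalSorter

hints = {
    "mongodb_pipeline": {"httpcache"},  # illustrative hint only
    "httpcache": set(),
}

try:
    # Dependencies come before their dependents in the resulting order.
    order = list(TopologicalSorter(hints).static_order())
except CycleError:
    # Scrapy could report the circular dependency and abort here.
    raise
```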
oh, and /cc @pablohoffman
What I have understood from the last two posts by Nyov is that if we make it simple enough to add extensions through the configuration file, the following issues may occur.
Nyov has provided possible solutions, but I could not quite picture how they would be implemented.
Another thing that may be done is to assign priorities to the dependencies that will be affected. What I mean is that while the extension is in use, the settings may be changed according to it, but the moment it ceases, the settings are changed back to normal (or default).
I have just expressed my preliminary views on this issue and I am not sure if this will be of much importance; however, please let me know if I have understood something the wrong way.
I have been trying to come up with a feasible solution for this problem. Would appreciate any feedback on the same:
The get_add_ons function will call the
I believe that this mechanism will handle add-ons for extensions, middlewares, pipelines and downloaders but would love to hear from the maintainers and mentors on this topic.
+1 seems reasonable
I understand that adding an
I haven't made an in-depth review yet, but I think this has a few implementation possibilities. For example, we could call in
I'd personally like to keep the MiddlewareManagers as they are, and configure the add-ons before or after running the managers. (Some managers can be configured via settings; that's my main reason for wanting to keep them untouched, so we can maintain backward compatibility with user-defined managers.) But I'm not sure this is possible.
Makes sense, the AddOn class could have those "name", "version" and "requirements" as class attributes, and some method in AddOnManager could take those and construct a dependency tree (I'd use an attribute in the crawler for this instead of the global settings, since those are user-defined).
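A quick sketch of that shape, assuming the attribute names from the discussion ("name", "version", "requirements"); the manager internals are my own assumptions, not a settled design:

```python
# Sketch: add-ons declare metadata as class attributes, and a manager
# collects them and checks declared requirements against what's loaded.
class AddOn:
    name = "base"
    version = "1.0"
    requirements = []  # names of other add-ons this one needs

class AddOnManager:
    def __init__(self):
        self.addons = {}

    def add(self, addon):
        self.addons[addon.name] = addon

    def check_dependencies(self):
        """Return (addon, missing_requirement) pairs, empty if all is well."""
        missing = []
        for addon in self.addons.values():
            for req in addon.requirements:
                if req not in self.addons:
                    missing.append((addon.name, req))
        return missing
```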
As @nyov said, it's kind of hard to get a robust dependency manager since there are a lot of components that could interact in different ways, maybe we could continue with the approach described in the SEP (setting the order of the middlewares in their '<component_MIDDLEWARE>' setting and checking dependencies manually in
On a side note, I hate the name "crawler_ready" :P If it's going to be used just for checking dependencies, let's change it to "check_dependencies" or "check_configuration" or something along those lines. I like changing "addon_configure" to "update_settings" as well; that's how it's called in the Spider class.
/cc @nramirezuy I recall you wanted to let different components update the settings, this idea is going to do that.
Actually my original idea was to place addons in a single folder, further classifying them into sub-folders (eg: extensions, spidermiddlewares etc). This way the
But on second thoughts, your method lets a particular add-on be loaded across multiple
Indeed this approach seems to be the best. Regarding the dependency manager, another issue I can think of is the case of spurious/unneeded settings. Eg:
We are running
I guess we can handle this either by assuming extension developers will implement a
Two more things:
5). If an addon is enabled and the dependencies aren't satisfied, let it explode and close the crawler.
I'm new, so here's a quick intro: Hi! I'm Jakob :) I've also put some thoughts into add-on management for my GSoC proposal.
I don't think Scrapy should continue when there is an error while loading one of the add-ons. Depending on the add-on it probably means a substantial change in functionality (e.g. results are lost when a pipeline add-on fails). The user should be made to explicitly acknowledge that by disabling the add-on herself, not just be informed through a warning message or similar. Instead of marking an add-on as non-critical they might as well start the spider for a second and see if there are any problems.
I'm not sure if there is an advantage in an
For the dependency trees, maybe as a first step towards a full-fledged solution the add-ons could be advised to supply some additional variables (besides their name and version), e.g.
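To make that concrete, here is one possible (entirely hypothetical) set of such variables; none of these attribute names are agreed upon, they only illustrate the kind of metadata an add-on could declare:

```python
# Hypothetical dependency-related attributes an add-on might declare
# besides its name and version; all attribute names are assumptions.
class HttpCacheAddOn:
    name = "httpcache"
    version = "1.0"
    requires = ["scrapy"]              # hard dependencies on other components
    modifies = ["HTTPCACHE_ENABLED"]   # settings this add-on will touch
    provides = ["caching"]             # capabilities other add-ons may rely on
```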
Providing additional variables to manage dependencies seems like a reasonable implementation. Those variables, if implemented, should be carefully considered to avoid backward incompatible changes or too many rewrites in existing addons after we promote their usage.
I agree on an AddonManager class as proposed to handle addons, but I would see an interface implementation (using zope.interface as explained here) as a more versatile alternative to requiring the subclassing of an
On the other hand, if all addons are required to be split into their own
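For illustration, a minimal sketch of what the zope.interface route could look like, assuming an `IAddon` interface with an `update_settings` method (the interface and attribute names are assumptions; only the use of zope.interface itself comes from the discussion):

```python
# Sketch: declare the add-on contract as a zope.interface Interface
# instead of requiring a common base class.
from zope.interface import Attribute, Interface, implementer
from zope.interface.verify import verifyObject

class IAddon(Interface):
    name = Attribute("unique add-on name")
    version = Attribute("add-on version string")

    def update_settings(settings):
        """Apply this add-on's configuration to the settings object."""

@implementer(IAddon)
class HttpCacheAddon:
    name = "httpcache"
    version = "1.0"

    def update_settings(self, settings):
        settings["HTTPCACHE_ENABLED"] = True

# The manager could verify the contract instead of checking a base class:
verifyObject(IAddon, HttpCacheAddon())  # raises if the contract is unmet
```

The advantage over subclassing is that any object satisfying the contract qualifies, regardless of its class hierarchy.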
@curita : So my basic idea goes like this: We just pass a variable
My reasons for proposing the
@nyov : Indeed
I will be working on an implementation of add-on management starting now!
While I finish up per-key settings priorities (#1149) as prerequisite (almost all add-ons will want to touch the dictionary settings), I will draft an updated version of the SEP this week. I think these are some of the major open issues (sorry, lots of text, feel free to skip sections):
1. How is the add-on interface enforced?
I think our options here are:
I tend to
2. Do we want to control how add-ons interact with settings?
@SudShekhar raised the question whether we want to provide some kind of API with which the add-ons interact with Scrapy's settings, rather than directly handing them the
3. Are add-ons configured in