Linkextractors and ItemLoader Unified API #578
I think that extracting links is sufficiently different from populating items, and using ItemLoader with a processor shouldn't be the blessed way to extract links, e.g. for navigation. I can't comment on the other points (link extractors, rules, ...) because they are used mostly with CrawlSpider, which I'm not very familiar with. There seems to be an unnecessary overlap between #331 and link extractors: if link extractors are important, they are best implemented on top of Selectors, and #331 should be implemented as a convenient shortcut for link extractors.
I'm not thinking of using … Did you see #346? Those are our processors. They cover common tasks that we can also do with comprehensions, but then we have to think each one through and will most likely introduce bugs, like in the "simple" task of taking the first element of an …
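To make the "take the first element" pitfall concrete, here is a minimal plain-Python sketch. `take_first` is modeled on the behavior of the `TakeFirst` processor (return the first value that is neither `None` nor an empty string); both helper names are illustrative, not Scrapy API.

```python
def naive_first(values):
    # The "obvious" approach: index into the list.
    # Raises IndexError on an empty list and happily returns "" or None.
    return values[0]


def take_first(values):
    # Skip None and empty strings; return the first real value, else None.
    for value in values:
        if value is not None and value != "":
            return value
    return None


extracted = ["", None, "Widget"]
print(repr(naive_first(extracted)))  # ''
print(repr(take_first(extracted)))   # 'Widget'
print(take_first([]))                # None (naive_first([]) would raise)
```

The one-liner is not wrong so much as underspecified: empty selections and blank text nodes are common in scraped pages, and a shared processor settles that decision once.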
Could you please elaborate on the "Loader" part? What would it load, how does it differ from Selector, and why do we need both Loader and Selector? Yes, I've seen #346. I think these processors are useful, but we shouldn't dump a lot of new classes on users who want to do something simple. These classes help structure the code, but people shouldn't have to learn them to get their tasks done. Also, this is a question of style: some people would prefer … I think this is not our battle. It is good to have such common processors, but IMHO they shouldn't be required to get started, and we should equally support people who use their existing skills, e.g. writing a list comprehension instead of learning our function composition framework. Investing in learning these processors is justified when you're writing a lot of spiders and this is your primary task, but for many people writing a spider is not their primary task: they want to quickly get some data for their work or research, and they have lots of other things to do and learn.
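The two styles being debated can be put side by side. `map_compose` below is a simplified, hypothetical stand-in for a `MapCompose`-style processor (apply each function to every value in turn, drop `None` results, flatten list results); it is not the library implementation. For this input the list comprehension gives the same result.

```python
def map_compose(*functions):
    # Simplified MapCompose-like processor: each function is applied to
    # every value; None results are dropped and list results are flattened.
    def process(values):
        for func in functions:
            next_values = []
            for value in values:
                result = func(value)
                if result is None:
                    continue
                if isinstance(result, list):
                    next_values.extend(result)
                else:
                    next_values.append(result)
            values = next_values
        return values
    return process


def strip_or_none(text):
    # Return None for blank strings so the processor drops them.
    text = text.strip()
    return text or None


raw = ["  red ", "   ", "blue"]

# Processor style: declare the pipeline once, reuse it across fields.
clean = map_compose(strip_or_none, str.upper)
print(clean(raw))  # ['RED', 'BLUE']

# Comprehension style: the same logic inline, no extra classes to learn.
print([t.strip().upper() for t in raw if t.strip()])  # ['RED', 'BLUE']
```

The processor pays off when the same pipeline is reused across many fields and spiders; the comprehension is easier for someone who only needs it once.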
I also haven't thought much about this, but if your goal is to split ItemLoader into two parts: … then +1. The devil is in the details, as usual :)
See also @nyov's comments: #568 (comment) and #568 (comment). @nramirezuy, are you proposing something similar?
@kmike We aren't going to remove … The Selector will not change.
+1 to uplifting the … My idea in the quoted comment, IIRC, was that the 'output processor' could be moved back to the Item with this logic. Then all that people would need to understand is this … From an architecture standpoint, this would look like …
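One way to picture moving the output processor back onto the Item is sketched below with the standard library only. The `Product` dataclass, the `output_processor` metadata key and the `load` helper are hypothetical illustrations (in spirit close to the `output_processor` key a Scrapy `Field` can already carry). The loader only collects raw values; each field declares how its collected values are reduced.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


def first_non_empty(values):
    # Reduce collected values to the first one that is not None or "".
    for value in values:
        if value not in (None, ""):
            return value
    return None


def join_words(values):
    # Reduce collected strings to one space-separated string.
    return " ".join(v for v in values if v)


@dataclass
class Product:
    # Each field declares its own output processor in its metadata.
    name: Optional[str] = field(default=None, metadata={"output_processor": first_non_empty})
    tags: Optional[str] = field(default=None, metadata={"output_processor": join_words})


def load(item_cls, collected):
    # Hypothetical loader: gathers raw values per field name and lets the
    # item class decide how to turn them into final values.
    kwargs = {}
    for f in fields(item_cls):
        values = collected.get(f.name, [])
        processor = f.metadata.get("output_processor", list)
        kwargs[f.name] = processor(values)
    return item_cls(**kwargs)


item = load(Product, {"name": ["", "Widget"], "tags": ["blue", "small"]})
print(item)  # Product(name='Widget', tags='blue small')
```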
I'm unsure whether this makes more sense in Scrapy or in https://github.com/scrapy/itemloaders now. If anyone has a better grasp of this and thinks it makes sense to move the issue to https://github.com/scrapy/itemloaders, let me know.
There are lots of link extractors with different flavors, but we don't need link extractors; we just need the filters (or processors) and a good way to handle them.
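The "filters instead of link extractors" idea can be illustrated as a chain of small functions over raw `href` values. This is a stdlib-only sketch: the hrefs are assumed to have been selected already (e.g. with a CSS or XPath selector), and `absolutize`, `allow`, `unique` and `run_filters` are hypothetical names, not existing Scrapy processors.

```python
import re
from urllib.parse import urljoin


def absolutize(base_url):
    # Filter factory: resolve relative hrefs against the page URL.
    def step(urls):
        return (urljoin(base_url, u) for u in urls)
    return step


def allow(pattern):
    # Filter factory: keep only URLs matching a regular expression.
    regex = re.compile(pattern)

    def step(urls):
        return (u for u in urls if regex.search(u))
    return step


def unique(urls):
    # Filter: drop duplicates while preserving order.
    seen = set()
    for u in urls:
        if u not in seen:
            seen.add(u)
            yield u


def run_filters(urls, *filters):
    # Apply each filter in order; every filter takes and returns an iterable.
    for f in filters:
        urls = f(urls)
    return list(urls)


hrefs = ["/item/1", "/about", "item/2", "/item/1"]
links = run_filters(hrefs, absolutize("https://example.com/shop/"), allow(r"/item/"), unique)
print(links)  # ['https://example.com/item/1', 'https://example.com/shop/item/2']
```

Each flavor of link extractor then becomes a different combination of filters rather than a separate class.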
What is the difference between using extractors and this approach?
My proposal has several points:

- `ItemLoader` to a `class Loader`. (This is not needed, but fancier.)
- `Rule`, instead of using a link extractor, should receive a callable that takes a `Response` as its argument and returns an iterable of URLs.
- `Link`: we should add support for instantiating a `Request` using a `Link` object instead of a URL.
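The last two points of the proposal can be sketched with stand-in classes. `Response`, `Link`, `Request` and `Rule` below are minimal hypothetical stand-ins, not Scrapy's real classes, and `Request.from_link` is an invented name used only for illustration. The point is the shape: a rule accepts any callable mapping a response to URLs or links, and a request can be built from either a URL string or a `Link`.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Union
from urllib.parse import urljoin


@dataclass
class Response:
    # Minimal stand-in for a downloaded page.
    url: str
    text: str


@dataclass
class Link:
    # Minimal stand-in for an extracted link.
    url: str
    text: str = ""


@dataclass
class Request:
    url: str

    @classmethod
    def from_link(cls, link_or_url: Union[Link, str]) -> "Request":
        # Hypothetical constructor accepting either a URL string or a Link.
        url = link_or_url.url if isinstance(link_or_url, Link) else link_or_url
        return cls(url=url)


@dataclass
class Rule:
    # The rule takes any callable Response -> iterable of URLs or Links,
    # instead of requiring a link-extractor object.
    extract: Callable[[Response], Iterable[Union[Link, str]]]

    def requests_for(self, response: Response) -> List[Request]:
        return [Request.from_link(x) for x in self.extract(response)]


def hrefs_by_regex(response: Response) -> List[str]:
    # Deliberately simple extractor: regex over href attributes, made absolute.
    return [urljoin(response.url, h) for h in re.findall(r'href="([^"]+)"', response.text)]


page = Response(url="https://example.com/", text='<a href="/a">A</a> <a href="/b">B</a>')
rule = Rule(extract=hrefs_by_regex)
print([r.url for r in rule.requests_for(page)])  # ['https://example.com/a', 'https://example.com/b']
print(Request.from_link(Link(url="https://example.com/c")))  # Request(url='https://example.com/c')
```

Because `Rule` only depends on the callable's signature, an existing link extractor, a selector one-liner or a filter chain can all be plugged in unchanged.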