Feeds Enhancement: Item Filters #5161
Comments
Note that, the way you describe it here, this is already possible using item pipelines. The point of item filtering in feed exports is to be able to send different items to different output files.
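For reference, a filter that applies to all feeds can already be written as an item pipeline that raises DropItem. Below is a minimal sketch. It assumes dict-like items and a made-up rule: drop items without a "price" field. The pipeline name is hypothetical, and the import fallback only exists so the snippet runs without Scrapy installed.

```python
# Item pipelines are the existing way to discard items. Dropping an item
# here removes it from every feed, not just from one output file.
try:
    from scrapy.exceptions import DropItem
except ImportError:  # lets the sketch run without Scrapy installed
    class DropItem(Exception):
        """Local stand-in for scrapy.exceptions.DropItem."""


class RequirePricePipeline:
    """Hypothetical pipeline: drop any item that has no truthy 'price' field."""

    def process_item(self, item, spider):
        if not item.get("price"):
            raise DropItem(f"Missing price in {item!r}")
        return item
```

It would be enabled through ITEM_PIPELINES, for example ITEM_PIPELINES = {"myproject.pipelines.RequirePricePipeline": 300}, where the module path is hypothetical. A dropped item never reaches any exporter, so a pipeline cannot send different items to different output files. Per-feed filtering is what adds that.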
As long as we are discussing the API, and not the specific implementation of the default item-filtering class, and following the KISS principle, I think a single method that gets the item should be enough. There’s also my last point about item filtering in #4963 (comment): item-filtering classes should have access to feed options, so that users can write item-filtering classes that behave differently depending on custom feed options. So maybe that method should also get a
Edit: I’ve just thought that getting the feed path, i.e. the key of the
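Here is a rough sketch of the single-method API suggested above. It assumes the filter is constructed with the options of the feed it serves. The class name and the min_price option are made up for illustration and are not an existing Scrapy interface.

```python
from typing import Any, Mapping


class MinPriceFilter:
    """Hypothetical item-filtering class with a single entry point.

    It receives the options of the feed it serves, so its behaviour can
    depend on a custom feed option (here, a made-up "min_price" key).
    """

    def __init__(self, feed_options: Mapping[str, Any]) -> None:
        self.feed_options = feed_options
        self.min_price = feed_options.get("min_price", 0)

    def accepts(self, item: Mapping[str, Any]) -> bool:
        # The only method the exporter would need to call for each item.
        return item.get("price", 0) >= self.min_price
```

Keeping the interface to one method means any class with a compatible constructor and accepts() would work, however it decides internally.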
I think the naming in #4576 for this option is more descriptive,
Ok yeah that's a little misleading. I'll correct it.
I thought making the user override
I was planning on passing the whole FEEDS dictionary.
Okay 👍
It can be. So it probably makes sense to have those methods in the default class. That way, if users decide to subclass it and use their subclass for filtering, they can do as you say. However, for the API that is expected of the class, we should keep things simple. When I say API, I mean the methods that Scrapy will call on the component. If I understand your proposal correctly, your plan is for Scrapy to call ItemFilter.accepts, and for ItemFilter.accepts to then call the other 2 methods. So Scrapy would never call those other 2 methods, and hence we don’t need every item-filtering class to implement them.
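The split described above can be illustrated as follows. Scrapy would only call accepts(), and a default class could implement it with helpers that subclasses override. This is a hypothetical sketch. The option names item_classes and required_fields and the class names are assumptions, and items are assumed to be dict-like.

```python
from typing import Any, Mapping


class DefaultItemFilter:
    """Hypothetical default filter.

    accepts() is the only method the exporter would call; accepts_class()
    and accepts_fields() are helpers that subclasses may override.
    """

    def __init__(self, feed_options: Mapping[str, Any]) -> None:
        # Assumed option names, for illustration only.
        self.item_classes = tuple(feed_options.get("item_classes", ()))
        self.required_fields = tuple(feed_options.get("required_fields", ()))

    def accepts(self, item: Any) -> bool:
        return self.accepts_class(item) and self.accepts_fields(item)

    def accepts_class(self, item: Any) -> bool:
        # An empty item_classes option means items of any class are accepted.
        return not self.item_classes or isinstance(item, self.item_classes)

    def accepts_fields(self, item: Any) -> bool:
        # Every required field must be present as a key of the item.
        return all(field in item for field in self.required_fields)


class NonEmptyFieldsFilter(DefaultItemFilter):
    """Customization example: required fields must also be non-empty."""

    def accepts_fields(self, item: Any) -> bool:
        return all(item.get(field) for field in self.required_fields)
```

A custom filter would only have to provide accepts(). Subclassing the default class is just one convenient way to reuse its helpers.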
The point of item filters is to allow filtering items differently for different feeds. So
I think there's some misunderstanding on how
OK, understood. I was a little confused about which APIs we should be discussing, so I wasn't sure what to include here. 😅
OK, so you are saying that there is going to be 1 instance of the indicated item-filtering class per feed. Makes sense. I also see you have updated the API to expect
My only remaining feedback is that
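Here is a sketch of the one-instance-per-feed model. It assumes a FEEDS-like mapping of feed URIs to option dictionaries and an assumed item_classes option. ClassFilter, build_filters and route are hypothetical names used only for illustration.

```python
from typing import Any, Dict, List, Mapping


class ClassFilter:
    """Hypothetical filter built from one feed's options."""

    def __init__(self, feed_options: Mapping[str, Any]) -> None:
        # Assumed option: a list of accepted item classes (empty = accept all).
        self.item_classes = tuple(feed_options.get("item_classes", ()))

    def accepts(self, item: Any) -> bool:
        return not self.item_classes or isinstance(item, self.item_classes)


def build_filters(feeds: Mapping[str, Mapping[str, Any]]) -> Dict[str, ClassFilter]:
    # One filter instance per feed, each seeing only its own feed options.
    return {uri: ClassFilter(options) for uri, options in feeds.items()}


def route(item: Any, filters: Mapping[str, ClassFilter]) -> List[str]:
    # URIs of the feeds whose filter accepts the item, in FEEDS order.
    return [uri for uri, item_filter in filters.items() if item_filter.accepts(item)]


class Book(dict):
    pass


class Author(dict):
    pass


FEEDS = {
    "books.json": {"format": "json", "item_classes": [Book]},
    "everything.csv": {"format": "csv"},
}

if __name__ == "__main__":
    filters = build_filters(FEEDS)
    print(route(Book(title="Dune"), filters))  # ['books.json', 'everything.csv']
    print(route(Author(name="Herbert"), filters))  # ['everything.csv']
```

Because each instance is built from a single feed's options, the same filter class can behave differently per feed without having to look at the whole FEEDS dictionary.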
I think
Summary
Currently there is no convenient way to filter items before they are exported. An ItemChecker class can be used to filter items while also giving the user flexibility.
Motivation/Proposal
Scrapy currently doesn't have any convenient APIs for customizing the conditions under which items are exported. An ItemChecker class can be used to define constraints on which items are acceptable for particular feeds.
The ItemChecker class can have 3 main public methods: accepts, accepts_class and accepts_fields. Scrapy will mainly use the accepts method to decide whether an item is acceptable, while accepts_class and accepts_fields will have default behaviors that users can override should they want to customize them.
Such custom filters can be declared in settings.py. For convenience, the accepted Item classes can also be declared there without needing to create a custom ItemChecker class.
Describe alternatives you've considered
This feature builds upon #4576.
Additional context
This feature proposal is part of a GSoC project (see #4963). This issue has been created to gather input from the Scrapy community to refine the proposed feature.