Suggestion for cleaning Item data (removing or opt-in pipelining) #197

Closed
Hedde opened this issue Nov 23, 2012 · 2 comments

@Hedde

Hedde commented Nov 23, 2012

I don't really like using the item pipeline as the basic cleaning step, because most of the time the cleaning has to differ per Item leaf class.

My suggestion is to create something similar to Django's forms.Form clean loop, which works much like my usual workaround, e.g.:

from scrapy.item import Item, Field

class ExtendedItem(Item):
    def _process(self):
        # Call each '_process_<FIELD_NAME>' method defined for a declared field.
        for name in self.fields:
            getattr(self, '_process_' + name, lambda: None)()

Now in my class definitions I can do something like (normally extended with optional arguments):

class Necklace(ExtendedItem):
    pearls = Field()
    diamonds = Field()

    def _process_pearls(self):
        # clean field
        pass

    def _process(self):
        # override clean all
        super(Necklace, self)._process()

This works nicely without interfering with other classes' data unless you want it to, and that's where the pipeline is a nice opt-in 👍
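
To make the idea concrete, here is a minimal, self-contained sketch of the per-field clean loop. It deliberately does not import Scrapy: `SketchItem` is a hypothetical dict-based stand-in for `Item`, `fields` is a plain tuple rather than real `Field()` declarations, and the cleaning rule for `pearls` is invented purely for illustration.

```python
class SketchItem(dict):
    """Hypothetical stand-in for scrapy's Item: a dict with declared field names."""
    fields = ()

    def _process(self):
        # Call '_process_<field>' for each declared field that defines one.
        for name in self.fields:
            cleaner = getattr(self, '_process_' + name, None)
            if callable(cleaner):
                cleaner()


class Necklace(SketchItem):
    fields = ('pearls', 'diamonds')

    def _process_pearls(self):
        # Illustrative rule (an assumption): strip whitespace and convert to int.
        self['pearls'] = int(str(self['pearls']).strip())


item = Necklace(pearls=' 12 ', diamonds='3')
item._process()
print(item)  # {'pearls': 12, 'diamonds': '3'}
```

Only `pearls` is touched because only `pearls` has a `_process_pearls` method; `diamonds` passes through unchanged, and other item classes are unaffected.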

@pablohoffman
Member

Where would _process() be called in your proposal?

If we're gonna add these methods, we'll probably be adding a new SanitizeItem class that subclasses Item.

There are probably also mechanisms to make the _process() code more efficient using metaclasses.
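
One way the metaclass idea might look, as a rough sketch rather than a proposed implementation: the metaclass resolves which `_process_<field>` methods exist once, when the class is created, so `_process()` only walks a precomputed tuple instead of scanning attributes on every call. `ProcessorMeta`, `SanitizeSketch`, `Ring`, and the `_processors` attribute are all hypothetical names, and the dict-based class is a stand-in that does not use Scrapy.

```python
class ProcessorMeta(type):
    """Collect '_process_<field>' method names once, at class creation."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        cls._processors = tuple(
            '_process_' + field
            for field in cls.fields
            if callable(getattr(cls, '_process_' + field, None))
        )
        return cls


class SanitizeSketch(dict, metaclass=ProcessorMeta):
    """Hypothetical dict-based stand-in for the SanitizeItem idea."""
    fields = ()

    def _process(self):
        # No attribute scan per call: just walk the precomputed tuple.
        for method_name in self._processors:
            getattr(self, method_name)()


class Ring(SanitizeSketch):
    fields = ('stone', 'size')

    def _process_size(self):
        # Illustrative rule (an assumption): store size as a float.
        self['size'] = float(self['size'])


ring = Ring(stone='opal', size='6.5')
ring._process()
```

The trade-off is that methods attached to the class after it is created would not be picked up, since the lookup happens only once.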

@Hedde
Author

Hedde commented Nov 27, 2012

I could take a closer look at how Django implements the clean cycle. I am not proposing code, just an idea. I'll investigate some more.

@dangra dangra closed this as completed Jan 10, 2013
lucywang000 pushed a commit to lucywang000/scrapy that referenced this issue Feb 24, 2019
happybase version update and use scrapinghub docker repo.