
Suggestion: different pipeline processor for each item type #102

Closed
Scorpil opened this Issue Mar 18, 2012 · 1 comment


Scorpil commented Mar 18, 2012

I didn't find a good way to use different 'process_item' methods for different classes of Items.

For example, we could create two item classes, fill them while crawling, and then store them in a DB:

from scrapy.item import Item, Field

class Headers(Item):
    url = Field()
    response_code = Field()
    content_type = Field()

class Body(Item):
    title = Field()
    h1 = Field()

Then, in the item pipeline, we would need to do something like this:

from project.items import Headers, Body

class StoreInDB(object):

    def process_item(self, item, spider):
        if isinstance(item, Headers):
            return self.storeHeaders(item, spider)
        elif isinstance(item, Body):
            return self.storeBody(item, spider)
        # any other item type passes through unchanged
        return item

    def storeHeaders(self, item, spider):
        # make some things with Headers item here
        return item

    def storeBody(self, item, spider):
        # make some things with Body item here
        return item
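The same `isinstance` chain can also be expressed with the standard library's `functools.singledispatchmethod` (Python 3.8+, so a swapped-in technique that postdates this thread). The sketch below assumes `Headers` and `Body` are dict subclasses standing in for the real `scrapy.Item` classes so it runs without Scrapy; the `stored` list is hypothetical and only records which handler ran.

```python
from functools import singledispatchmethod


# Stand-ins for the scrapy.Item subclasses above (assumption: real code
# would subclass scrapy.Item instead of dict).
class Headers(dict):
    pass


class Body(dict):
    pass


class StoreInDB:
    """Dispatches process_item on the class of the item argument."""

    def __init__(self):
        self.stored = []  # hypothetical: records what each handler saw

    @singledispatchmethod
    def process_item(self, item, spider):
        # Fallback for unregistered item types: pass the item through unchanged.
        return item

    @process_item.register
    def _process_headers(self, item: Headers, spider):
        # Chosen because the first annotated argument is a Headers item.
        self.stored.append(("headers", item["url"]))
        return item

    @process_item.register
    def _process_body(self, item: Body, spider):
        self.stored.append(("body", item["title"]))
        return item
```

Each new item type then needs only one more registered method, and unregistered types fall through to the default implementation.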

Wouldn't it be nice to put this type-based dispatch in a base class, with a dict or function that maps each item class to the correct processor? The current behavior would, of course, stay the default. Here is what I'm talking about:

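from project.items import Headers, Body
# Pipeline and assignItemProcessor are what this issue proposes; they are not existing Scrapy API.
from scrapy.contrib.pipeline import Pipeline

class StoreInDB(Pipeline):

    def __init__(self):
        # presumably the base class sets up its item class -> processor mapping here
        super(StoreInDB, self).__init__()
        self.assignItemProcessor(itemclass=Headers, processor=self.storeHeaders)
        self.assignItemProcessor(itemclass=Body, processor=self.storeBody)

    def storeHeaders(self, item, spider):
        # make some things with Headers item here
        return item

    def storeBody(self, item, spider):
        # make some things with Body item here
        return item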

I can write the code if you guys think it's useful. It certainly is for me.
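A minimal sketch of how the proposed base class could work, assuming the plain dict from item class to processor suggested above. `Pipeline` and `assignItemProcessor` are the names from the proposal, not existing Scrapy API; `Headers` and `Body` are dict subclasses standing in for the real `scrapy.Item` classes, and the `seen` list is hypothetical, used only to show which processor ran.

```python
# Stand-ins for the scrapy.Item subclasses (assumption, so this runs without Scrapy).
class Headers(dict):
    pass


class Body(dict):
    pass


class Pipeline:
    """Hypothetical base class: maps item classes to processor callables."""

    def __init__(self):
        self._processors = {}  # item class -> processor callable

    def assignItemProcessor(self, itemclass, processor):
        self._processors[itemclass] = processor

    def process_item(self, item, spider):
        # Walk the MRO so subclasses of a registered item class also match.
        for cls in type(item).__mro__:
            processor = self._processors.get(cls)
            if processor is not None:
                return processor(item, spider)
        # Default behavior: unregistered item types pass through unchanged.
        return item


class StoreInDB(Pipeline):
    def __init__(self):
        super().__init__()
        self.seen = []  # hypothetical: records which processor handled each item
        self.assignItemProcessor(itemclass=Headers, processor=self.storeHeaders)
        self.assignItemProcessor(itemclass=Body, processor=self.storeBody)

    def storeHeaders(self, item, spider):
        self.seen.append("headers")
        return item

    def storeBody(self, item, spider):
        self.seen.append("body")
        return item
```

Walking the MRO lets a subclass of a registered item class reuse its parent's processor; a direct `type(item)` lookup would be simpler but would miss subclasses.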

Scorpil closed this Mar 18, 2012
Scorpil reopened this Mar 18, 2012
dangra (Scrapy project member) commented Jan 29, 2013

This looks like a contrib pipeline that implements the basis for item-type delegation; users would still need to extend it to add their project's functionality.

I don't think this is worth the pain of maintaining another contrib as part of the Scrapy project: the functionality described is easy to implement, and there is no consensus about the approach to handling multiple item types. Others have proposed building one item pipeline per type instead.
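The one-pipeline-per-type alternative mentioned above might look like this minimal sketch: each pipeline handles only its own item type and passes everything else on unchanged. The class names are hypothetical, `Headers` and `Body` are dict stand-ins for the real items, the `stored` lists stand in for DB writes, and `run_chain` merely imitates how Scrapy feeds each pipeline's return value into the next one.

```python
# Stand-ins for the scrapy.Item subclasses (assumption, so this runs without Scrapy).
class Headers(dict):
    pass


class Body(dict):
    pass


class StoreHeadersPipeline:
    """Handles only Headers items; everything else passes through."""

    def __init__(self):
        self.stored = []  # stand-in for a DB table

    def process_item(self, item, spider):
        if not isinstance(item, Headers):
            return item
        self.stored.append(dict(item))  # stand-in for a DB write
        return item


class StoreBodyPipeline:
    """Handles only Body items; everything else passes through."""

    def __init__(self):
        self.stored = []

    def process_item(self, item, spider):
        if not isinstance(item, Body):
            return item
        self.stored.append(dict(item))
        return item


def run_chain(pipelines, item, spider=None):
    # Illustration only: imitates Scrapy calling each enabled pipeline in order.
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item
```

In a real project both classes would be enabled through the `ITEM_PIPELINES` setting (a dict mapping class paths to order values in current Scrapy releases; older releases accepted a list).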

IMHO this base pipeline is better suited to a blog post, a recipe, or the external scrapy cookrecipes project.

thanks

dangra closed this Jan 29, 2013