Small improvement to the docs for setting ITEM_PIPELINES #3775

Open

intotecho opened this issue May 14, 2019 · 4 comments

intotecho commented May 14, 2019

In the docs

https://github.com/scrapy/scrapy/blob/65d631329a1434ec013f24341e4b8520241aec70/scrapy/templates/project/module/pipelines.py.tmpl

It says, in the comments:

Define your item pipelines here

Don't forget to add your pipeline to the ITEM_PIPELINES setting
See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

Please change the instruction to:

Don't forget to add your pipeline to the ITEM_PIPELINES setting in settings.py

I added the setting in my spider's __init__, and it was hard to figure out what was going wrong.
Mentioning settings.py would help others who make the same mistake.
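For reference, something along these lines in settings.py is what the comment is pointing at (a sketch only; "myproject.pipelines.MyProjectPipeline" is a placeholder for whatever class the template generates):

```python
# settings.py -- sketch; the pipeline path below is a placeholder
ITEM_PIPELINES = {
    "myproject.pipelines.MyProjectPipeline": 300,  # the integer sets the order; lower runs first
}
```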


Gallaecio commented May 14, 2019

The thing is, settings.py is just one of the places where it can be defined, and I think that spelling out every place a setting can be defined, everywhere the documentation mentions a setting, would make things too verbose.

I understand your frustration, but I’m not sure how we can improve things. Users are expected to have read https://docs.scrapy.org/en/latest/topics/settings.html#populating-the-settings by the time they look up specific settings in the documentation.
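For example (a sketch with placeholder names), the same setting can also be defined per spider through the custom_settings class attribute instead of settings.py. Note that it has to be a class attribute: settings are applied before the spider is instantiated, which is also why assigning it in __init__ has no effect:

```python
import scrapy

class MySpider(scrapy.Spider):
    name = "my_spider"

    # Overrides here apply only to this spider; they must be defined as a
    # class attribute, since settings are read before __init__ runs.
    custom_settings = {
        "ITEM_PIPELINES": {"myproject.pipelines.MyProjectPipeline": 300},
    }
```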


abe-winter commented Jan 26, 2023

Out of curiosity, why does an Item need to be declared in ITEM_PIPELINES in order to be processed?

I'm learning scrapy and this just bit me -- I was yielding an Item subclass with a process_item method, but process_item wasn't called until I added my class to ITEM_PIPELINES.

This was counterintuitive to me as a learner -- is there a reason someone would yield an Item without wanting its process method to be called?

Related issue for the pipeline docs: #2350

I kind of agree with 2350 -- I'm an experienced python programmer, but it took me a while to figure out the item pipeline from docs. I couldn't find a complete example -- the entire 'item pipelines' docs page, for example, doesn't have the yield keyword anywhere. A small self-contained example (which includes the ITEM_PIPELINES reminder) would have helped a lot.

Happy to submit a (small) docs PR if helpful, but fair warning I'm not a scrapy expert.

Gallaecio commented

> I'm learning scrapy and this just bit me -- I was yielding an Item subclass with a process_item method, but process_item wasn't called until I added my class to ITEM_PIPELINES.
>
> This was counterintuitive to me as a learner -- is there a reason someone would yield an Item without wanting its process method to be called?

This is the first time I've heard of someone defining a process_item method on an item class itself. Item pipeline classes are intended to be separate from item classes; it is not customary to use an item class also as an item pipeline.
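Roughly, the intended split looks like this (a sketch with placeholder names): the item is a plain data container, and the pipeline is a separate class, enabled via ITEM_PIPELINES, whose process_item receives each yielded item:

```python
import scrapy

class Product(scrapy.Item):
    # the item: just a data container, no processing logic
    name = scrapy.Field()
    price = scrapy.Field()

class PricePipeline:
    # the pipeline: a separate class, registered in ITEM_PIPELINES,
    # whose process_item is called for every item the spider yields
    def process_item(self, item, spider):
        item["price"] = round(item["price"], 2)
        return item
```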

abe-winter commented

Ahhh, that makes sense -- I misunderstood the API here.

FWIW, it would really help to add an end-to-end example to the 'item pipelines' docs page -- one that includes yielding from a spider.
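Something like this is what I had in mind (a rough sketch, reusing the placeholder Product item and PricePipeline from the snippet above; the URL, selectors, and import path are made up):

```python
import scrapy
from myproject.items import Product  # placeholder import path

class ProductsSpider(scrapy.Spider):
    name = "products"
    start_urls = ["https://example.com/products"]

    def parse(self, response):
        # every item yielded here is passed through each pipeline
        # listed in ITEM_PIPELINES, in order
        for row in response.css("div.product"):
            yield Product(
                name=row.css("h2::text").get(),
                price=float(row.css(".price::text").get() or 0),
            )
```

Together with the ITEM_PIPELINES entry in settings.py, that would show the whole loop in one place.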
