Allow spiders to return dicts. #1081

Merged
merged 5 commits into from Mar 27, 2015

Conversation

@kmike
Member

@kmike kmike commented Mar 18, 2015

A PR to fix GH-1064.

Docs are missing.

@pablohoffman
Member

@pablohoffman pablohoffman commented Mar 18, 2015

good job @kmike !

@dangra
Member

@dangra dangra commented Mar 18, 2015

nice!

kmike added 2 commits Mar 19, 2015
* start with scrapy.Spider, then mention spider arguments,
  then describe generic spiders;
* change wording regarding start_urls/start_requests;
* show an example of start_requests vs start_urls;
* show an example of dicts as items;
* as defining Item is an optional step now, docs for Items are
  moved below Spider docs.
@kmike
Member Author

@kmike kmike commented Mar 19, 2015

Please check - all docs except for overview & tutorial should be updated.

What do you think about adding a FEED_EXPORT_FIELDS option to allow defining a list of fields to export? Without Item classes, the CSV exporter can't figure out the header robustly (currently the fields of the first item are used). @nramirezuy also mentioned this feature here.
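
To illustrate the header problem, here is a minimal sketch that uses only the standard library `csv` module, not Scrapy's exporter. The item dicts and the `export_csv` helper are hypothetical. When the header is taken from the first item, a key that only appears in later items gets no column; an explicit field list, which is what the proposed option would supply, avoids that.

```python
import csv
import io

# Hypothetical scraped items; the second has a key the first one lacks.
items = [
    {"name": "Widget", "price": "10"},
    {"name": "Gadget", "price": "25", "stock": "3"},
]


def export_csv(items, fields=None):
    """Write items as CSV; without `fields`, use the first item's keys as the header."""
    if fields is None:
        fields = list(items[0].keys())
    buf = io.StringIO()
    # Keys outside `fields` are dropped; missing keys are written as empty cells.
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


header_from_first = export_csv(items).splitlines()[0]
header_explicit = export_csv(items, fields=["name", "price", "stock"]).splitlines()[0]
```

In the first call the `stock` value of the second item is silently lost, which is the robustness issue described above.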

@nramirezuy
Contributor

@nramirezuy nramirezuy commented Mar 19, 2015

With FEED_EXPORT_FIELDS you can also export fewer fields than you have in your item.
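
A rough sketch of that idea in plain Python, independent of Scrapy: an explicit field list both selects and orders what gets exported. The `select_fields` helper, the item and the field names are made up for illustration.

```python
# Hypothetical item and field list; the names are illustrative only.
item = {"name": "Widget", "price": "10", "internal_id": "x-42"}
export_fields = ["name", "price"]


def select_fields(item, fields):
    """Return only the listed fields, in the listed order, skipping missing keys."""
    return {key: item[key] for key in fields if key in item}


exported = select_fields(item, export_fields)
```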

@pablohoffman
Member

@pablohoffman pablohoffman commented Mar 19, 2015

FEED_EXPORT_FIELDS 👍


::
user input or other changing conditions you can return regular Python
dicts from spiders.


@eliasdorneles

eliasdorneles Mar 21, 2015
Member

What do you think about changing it to something like "You can return regular Python dicts from spiders since Scrapy 1.0. For older versions, you can dynamically create Item classes::"?

The dynamic creation just doesn't seem to make much sense when you have the ability to return arbitrary dicts.


@kmike

kmike Mar 23, 2015
Author Member

A good catch. I just removed the whole section. I don't think we should document workarounds for limitations of older Scrapy versions.

It was a hack, and dicts-as-items cover most use cases.

Dicts don't allow attaching metadata to fields,
but e.g. adding a "_meta" key and removing it in a custom serializer
is no worse than creating classes dynamically.
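
The "_meta" idea could look roughly like the sketch below. This is an assumption about how such a workaround might be written, not code from this PR: `serialize_item` and the layout of the "_meta" dict are hypothetical, and it does not use Scrapy's serializer machinery.

```python
def serialize_item(item, meta_key="_meta"):
    """Apply per-field serializers found under `meta_key`, then drop that key."""
    meta = item.get(meta_key, {})
    result = {}
    for key, value in item.items():
        if key == meta_key:
            continue  # metadata must not end up in the exported output
        serializer = meta.get(key, {}).get("serializer")
        result[key] = serializer(value) if serializer else value
    return result


# Hypothetical item: field metadata travels inside the dict itself.
item = {
    "title": "Example",
    "price": "10",
    "_meta": {"price": {"serializer": int}},
}
serialized = serialize_item(item)
```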
@kmike
Member Author

@kmike kmike commented Mar 23, 2015

This PR doesn't allow items to be arbitrary dict-like objects, like @shaneaevans proposes in #1064 (comment) - item must be either a subclass of BaseItem or a dict / subclass of a dict.

Maybe, instead of checking for dict/BaseItem explicitly, we could start checking for MutableMapping, but that is riskier. I think starting with stricter requirements on spider output is better - we can make them less strict in the future.
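
The difference between the two checks can be sketched as follows. This is not the code from this PR: `BaseItem` here is a local stand-in for Scrapy's class so that the snippet runs on its own, and `is_item_strict`, `is_item_loose` and `DictLike` are hypothetical names.

```python
from collections.abc import MutableMapping


class BaseItem:
    """Stand-in for Scrapy's BaseItem, defined here so the sketch is self-contained."""


class DictLike(MutableMapping):
    """A dict-like object that is neither a dict subclass nor a BaseItem."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def is_item_strict(obj):
    """Strict check: BaseItem subclasses and dicts (or dict subclasses) only."""
    return isinstance(obj, (BaseItem, dict))


def is_item_loose(obj):
    """Looser alternative: any MutableMapping counts as an item."""
    return isinstance(obj, (BaseItem, MutableMapping))
```

With the strict check, `DictLike()` is rejected while `{}` is accepted; the loose check accepts both, which is the kind of extra surface the comment above calls risky.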

@kmike
Member Author

@kmike kmike commented Mar 23, 2015

@kmike kmike mentioned this pull request Mar 27, 2015
pablohoffman added a commit that referenced this pull request Mar 27, 2015
Allow spiders to return dicts.
@pablohoffman pablohoffman merged commit bb4c922 into master Mar 27, 2015
1 of 2 checks passed
continuous-integration/travis-ci/pr The Travis CI build is in progress
continuous-integration/travis-ci/push The Travis CI build passed
@dangra dangra deleted the dict-items branch Mar 27, 2015
5 participants