Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scrapy 1.1.0 ImagesPipeline backward incompatibility #1985

Closed
redapple opened this issue May 11, 2016 · 3 comments
Closed

Scrapy 1.1.0 ImagesPipeline backward incompatibility #1985

redapple opened this issue May 11, 2016 · 3 comments
Labels
Milestone

Comments

@redapple
Copy link
Contributor

redapple commented May 11, 2016

#1891 is not backward compatible.

Users of Scrapy<1.1 ImagesPipeline could access upper-case attributes,
e.g.

class CustomImagesPipeline(ImagesPipeline):
    (...)
    def item_completed(self, results, item, info):
        item = super(CustomImagesPipeline, self).item_completed(
            results, item, info)
        # Note: not all items do have an images field.
        if self.IMAGES_RESULT_FIELD not in item.fields:
            return item

Leading to the following exception in 1.1.0(rc4):

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/tmp/unpacked-eggs/__main__.egg/*****/pipelines/screenshots.py", line 539, in item_completed
    results, item, info)
  File "/tmp/unpacked-eggs/__main__.egg/*****/pipelines/screenshots.py", line 242, in item_completed
    images = item.pop(self.IMAGES_RESULT_FIELD, [])
AttributeError: 'CustomImagesPipeline' object has no attribute 'IMAGES_RESULT_FIELD'
@redapple redapple added the bug label May 11, 2016
@redapple redapple added this to the 1.1.1 milestone May 11, 2016
@djunzu
Copy link
Contributor

djunzu commented May 12, 2016

I see.

But I must ask how is the philosophy to write Scrapy code?
And use it!?

Because I would expect that self.IMAGES_RESULT_FIELD belongs only to the pipeline's internal code and does not belong to the public API. As such, it could be changed freely without prejudice to the public API (as long as the public API remains untouched).

Also, I would expect that the item_completed method should use only results, item and info. If it can access any attributes from self, there is absolutely no encapsulation.

Plus, I would expect that any attribute intended to be exposed and accessibly should have its own get method.

@kmike
Copy link
Member

kmike commented May 12, 2016

@djunzu I think your approach is right, but if a change breaks users code and there is an easy workaround then we usually prefer to add a backwards compatibility shim with deprecation warnings. It doesn't mean users should write code like that (relying on undocumented attributes), but the userbase is large, and we don't want to break user code without a good reason.

Also, we should have named attribute self._IMAGES_RESULT_FIELD if we wanted to communicate it is private.

@djunzu
Copy link
Contributor

djunzu commented May 12, 2016

I agree it's not good to break user code. Especially when there is an easy workaround.

I was asking just to know what to expect and what to do as a final user (and eventually as a contributor).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants