Exception in image/files pipeline are quietly suppressed #496

Open
max-arnold opened this issue Dec 19, 2013 · 3 comments

@max-arnold
Contributor

Exceptions (TypeError, Exception and others) raised in media_to_download(), media_failed(), media_downloaded(), file_key() and other pipeline methods are quietly suppressed.

An example can be found in #490.
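Until the root cause is found, one way to make such exceptions visible is to log them inside the hook itself before they reach the deferred chain. The sketch below is only an illustration: `log_exceptions`, `DemoPipeline` and the `pipeline_debug` logger name are hypothetical and not part of Scrapy, and the hook signature is assumed from the `media_to_download(request, info)` call in the pipeline code.

```python
import functools
import logging

logger = logging.getLogger("pipeline_debug")  # hypothetical logger name


def log_exceptions(method):
    """Log any exception raised by a pipeline hook, then re-raise it."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except Exception:
            # logger.exception records the full traceback at ERROR level
            logger.exception("Unhandled exception in %s", method.__name__)
            raise
    return wrapper


class DemoPipeline:
    """Hypothetical stand-in for an images/files pipeline subclass."""

    @log_exceptions
    def media_to_download(self, request, info):
        raise TypeError("bug in hook")
```

Because the wrapper re-raises, the pipeline's behavior is unchanged; the traceback is simply logged before anything further up the stack can drop it.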

@rmax
Contributor

rmax commented Dec 19, 2013

@max-arnold
Contributor Author

Exceptions are silenced somewhere up the stack, probably in scrapy/utils/defer.py(39)mustbe_deferred():

     34 def mustbe_deferred(f, *args, **kw):
     35     """Same as twisted.internet.defer.maybeDeferred, but delay calling
     36     callback/errback to next reactor loop
     37     """
     38     try:
---> 39         result = f(*args, **kw)
     40     # FIXME: Hack to avoid introspecting tracebacks. This to speed up
     41     # processing of IgnoreRequest errors which are, by far, the most common
     42     # exception in Scrapy - see #125
     43     except IgnoreRequest as e:
     44         return defer_fail(failure.Failure(e))

The caller that should actually handle exceptions from the deferred is _process_request() in scrapy/contrib/pipeline/media.py:

        # Download request checking media_to_download hook output first
        info.downloading.add(fp)
        dfd = mustbe_deferred(self.media_to_download, request, info)
        dfd.addCallback(self._check_media_to_download, request, info)
        dfd.addBoth(self._cache_result_and_execute_waiters, fp, info)
        dfd.addErrback(log.err, spider=info.spider)
        return dfd.addBoth(lambda _: wad)  # it must return wad at last
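One way a chain like this could lose an exception: in Twisted, when a handler added with addBoth receives a failure and returns a non-failure value, the chain switches back to the callback path, so a later addErrback never fires. Whether _cache_result_and_execute_waiters behaves this way is an assumption here and is not confirmed in this thread. The sketch below uses a toy, synchronous model instead of Twisted itself; `MiniDeferred`, `call_hook` and `cache_result` are hypothetical names that model only the callback/errback switching rule.

```python
class MiniDeferred:
    """Toy, synchronous stand-in for twisted.internet.defer.Deferred.

    An Exception instance as the current result means "failure".
    """

    def __init__(self, result):
        self.result = result

    def _run(self, on_ok, on_err):
        failed = isinstance(self.result, Exception)
        handler = on_err if failed else on_ok
        if handler is not None:
            try:
                # The handler's return value becomes the new result
                self.result = handler(self.result)
            except Exception as exc:
                self.result = exc
        return self

    def add_callback(self, fn):
        return self._run(fn, None)

    def add_errback(self, fn):
        return self._run(None, fn)

    def add_both(self, fn):
        return self._run(fn, fn)


def call_hook(hook):
    """Call the hook and wrap its result or exception, like mustbe_deferred."""
    try:
        return MiniDeferred(hook())
    except Exception as exc:
        return MiniDeferred(exc)


def media_to_download():
    raise TypeError("bug in a pipeline hook")


cached = []
logged = []


def cache_result(result):
    # Stores the result (here: the exception) but returns None,
    # which turns the failure back into a "success".
    cached.append(result)


dfd = call_hook(media_to_download)
dfd.add_callback(lambda result: result)  # skipped: chain is in failure state
dfd.add_both(cache_result)               # runs, swallows the failure
dfd.add_errback(logged.append)           # never runs: nothing left to log

print(cached)  # holds the TypeError instance
print(logged)  # stays empty
```

If this is what happens in the real pipeline, moving the logging errback before the addBoth call, or returning the failure from the caching step, would keep the error visible.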

@pablohoffman
Member

I updated this issue title to reflect that it's something that affects the images/files pipeline, not every scrapy pipeline.
