Cannot retry requests in scrapy parse #3095

Open
jdemaeyer opened this issue Jan 26, 2018 · 4 comments

jdemaeyer commented Jan 26, 2018

This spider:

import scrapy


class HttpBinSpider(scrapy.Spider):

    name = "httpbin.org"

    start_urls = ['https://httpbin.org/'] 

    def parse(self, response):
        if response.meta.get('retried'):
            self.logger.info("done")
            return
        response.meta['retried'] = True
        yield response.request.replace(dont_filter=True)

works fine with scrapy crawl httpbin.org, but scrapy parse "https://httpbin.org/" -d 2 fails with the following traceback:

Traceback (most recent call last):
  File "/home/jakob/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/jakob/.local/lib/python3.6/site-packages/scrapy/commands/parse.py", line 195, in callback
    items, requests = self.run_callback(response, cb)
  File "/home/jakob/.local/lib/python3.6/site-packages/scrapy/commands/parse.py", line 117, in run_callback
    for x in iterate_spider_output(cb(response)):
  File "/home/jakob/.local/lib/python3.6/site-packages/scrapy/commands/parse.py", line 195, in callback
    items, requests = self.run_callback(response, cb)
  File "/home/jakob/.local/lib/python3.6/site-packages/scrapy/commands/parse.py", line 117, in run_callback
    for x in iterate_spider_output(cb(response)):

  [... the four lines above over and over again ...]

  File "/home/jakob/.local/lib/python3.6/site-packages/scrapy/commands/parse.py", line 195, in callback
    items, requests = self.run_callback(response, cb)
  File "/home/jakob/.local/lib/python3.6/site-packages/scrapy/commands/parse.py", line 117, in run_callback
    for x in iterate_spider_output(cb(response)):
  File "/home/jakob/.local/lib/python3.6/site-packages/scrapy/commands/parse.py", line 169, in callback
    cb = response.meta['_callback']
  File "/home/jakob/.local/lib/python3.6/site-packages/scrapy/http/response/__init__.py", line 30, in meta
    return self.request.meta
RecursionError: maximum recursion depth exceeded while calling a Python object

Also replacing the request callback explicitly (yield response.request.replace(dont_filter=True, callback=self.parse)) works around the issue.

This is possibly caused by not properly cleaning up this and this line.
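
To make the suspected mechanism concrete, here is a minimal stdlib-only sketch. It assumes, going by the traceback, that scrapy parse swaps each request's callback for its own wrapper and keeps the original under meta['_callback']. Request, make_wrapper, run and the two parse functions are simplified, hypothetical stand-ins rather than Scrapy's actual code, and the request object doubles as the response.

```python
class Request:
    """Toy stand-in for scrapy.Request (hypothetical, heavily simplified)."""

    def __init__(self, url, callback=None, meta=None, dont_filter=False):
        self.url = url
        self.callback = callback
        self.meta = dict(meta or {})
        self.dont_filter = dont_filter

    def replace(self, **kwargs):
        # Copy every attribute that is not explicitly overridden,
        # including whatever callback is currently set.
        for key in ("url", "callback", "meta", "dont_filter"):
            kwargs.setdefault(key, getattr(self, key))
        return Request(**kwargs)


def parse_buggy(request):
    """Retry once without naming a callback, as in the reported spider."""
    if request.meta.get("retried"):
        return []
    request.meta["retried"] = True
    return [request.replace(dont_filter=True)]


def parse_workaround(request):
    """Retry once, explicitly resetting the callback."""
    if request.meta.get("retried"):
        return []
    request.meta["retried"] = True
    return [request.replace(dont_filter=True, callback=parse_workaround)]


def make_wrapper(default_callback):
    """Assumed shape of the wrapper that the parse command installs."""

    def wrapper(request):
        real_callback = request.meta["_callback"] or default_callback
        new_requests = real_callback(request)
        for req in new_requests:
            # A replace()d request still carries `wrapper` as its callback,
            # so the wrapper gets stored as the "original" callback here.
            req.meta["_callback"] = req.callback
            req.callback = wrapper
        return new_requests

    return wrapper


def run(parse):
    """Process requests until none are left; return how many were handled."""
    wrapper = make_wrapper(parse)
    first = Request("https://httpbin.org/")
    first.meta["_callback"] = first.callback  # None, meaning "use the default"
    first.callback = wrapper
    pending, handled = [first], 0
    while pending:
        request = pending.pop()
        pending.extend(request.callback(request))
        handled += 1
    return handled
```

In this model, run(parse_workaround) handles both requests and returns, while run(parse_buggy) raises RecursionError on the retried request, because the wrapper looks up its "original" callback and finds itself.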

Shout-out to @VMRuiz for finding this. ;)

kmike added the bug label Jan 26, 2018

malloxpb commented Feb 9, 2018

Hi,
I would like to start contributing to the project with this bug fix. This is my first time doing this, so could anybody help me? :) Thank you!

jdemaeyer commented Feb 12, 2018

Hi @nctl144,

the easiest way for us to help is for you to fork the repository, create a branch in which you try to fix the bug, and then open a PR so we can give you feedback and guidance while looking at actual code.

To fix the bug, first try to reproduce the issue on your machine (there is a minimal example above). Once you have that, you can start hacking on Scrapy's code. You will probably need to find a way to restore a Request's callback attribute after the custom callback method is called, but before the original callback method is called.
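
One way the suggested approach could look, as a rough stdlib-only sketch: the wrapper puts the original callback back on the request before handing control to the spider, so a later replace() has nothing wrong to copy. Request, wrap, wrapper, parse and crawl are hypothetical, simplified stand-ins (the request doubles as the response here), and this is not necessarily how the actual fix in Scrapy is written.

```python
class Request:
    """Toy stand-in for scrapy.Request (hypothetical, simplified)."""

    def __init__(self, url, callback=None, meta=None):
        self.url = url
        self.callback = callback
        self.meta = dict(meta or {})

    def replace(self, **kwargs):
        # Copy every attribute that is not explicitly overridden.
        for key in ("url", "callback", "meta"):
            kwargs.setdefault(key, getattr(self, key))
        return Request(**kwargs)


def parse(request):
    """Spider callback modelled on the report: retry once, keep the callback."""
    if request.meta.get("retried"):
        return []
    request.meta["retried"] = True
    return [request.replace()]


def wrap(request):
    """Stash the request's own callback in meta and install the wrapper."""
    request.meta["_callback"] = request.callback
    request.callback = wrapper
    return request


def wrapper(request):
    original = request.meta["_callback"]
    # The step being sketched: restore the original callback before the
    # spider code runs, so replace() cannot copy the wrapper itself.
    request.callback = original
    new_requests = (original or parse)(request)
    return [wrap(req) for req in new_requests]


def crawl(url):
    """Process requests until none are left; return how many were handled."""
    pending, handled = [wrap(Request(url))], 0
    while pending:
        request = pending.pop()
        pending.extend(request.callback(request))
        handled += 1
    return handled
```

With the restore step in place, crawl("https://httpbin.org/") handles the original request and its retry and then stops, even though the spider never names a callback.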

malloxpb commented

Thank you so much for your help, @jdemaeyer. I'll work on that as soon as possible!

malloxpb commented

I just made a first PR! Can you please let me know if this solves the problem correctly, @jdemaeyer? Thank you! PR #3129
