Ability to retry a request from inside a spider callback #3590

@kasun

There are situations where websites return 200 responses but the content is unavailable due to bans or temporary issues, which can be fixed by retrying the request.

There should be an easier way to retry requests from inside spider callbacks, and it should ideally reuse the code in the Retry downloader middleware.

I see two approaches for this.

  1. Introduce a new exception called RetryRequest that can be raised inside a spider callback to indicate a retry. I personally prefer this approach, but its implementation is a little untidy due to this bug: process_spider_exception() not invoked for generators #220

    from scrapy.exceptions import RetryRequest
    
    def parse(self, response):
        if response.xpath('//title[text()="Content not found"]'):
            # Signal that this request should be scheduled again
            raise RetryRequest('Missing content')
    
  2. Introduce a new class RetryRequest that wraps a request to be retried. A RetryRequest can be yielded from a spider callback to indicate a retry.

    from scrapy.http import RetryRequest
    
    def parse(self, response):
        if response.xpath('//title[text()="Content not found"]'):
            # Wrap the original request so the framework retries it
            yield RetryRequest(response.request, reason='Missing content')
    

I will send two PRs, one for each approach. Happy to hear about any other alternatives too.
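For context, here is a minimal, Scrapy-free sketch of the bookkeeping that both proposals would encapsulate, and that spiders currently have to do by hand: count attempts in the request's meta dict and give up after a limit. The names `retry_times` and `MAX_RETRIES` mirror the retry middleware's conventions but are assumptions here, not part of either proposal.

```python
MAX_RETRIES = 3  # analogous to the RETRY_TIMES setting

def should_retry(meta, max_retries=MAX_RETRIES):
    """Decide whether a request may be retried.

    `meta` mimics request.meta. Returns (retry_allowed, updated_meta),
    where updated_meta carries the incremented attempt counter.
    """
    retries = meta.get('retry_times', 0)
    if retries < max_retries:
        return True, dict(meta, retry_times=retries + 1)
    return False, meta
```

In a real spider the callback would copy `response.request`, store the updated meta on the copy, set `dont_filter=True` to bypass the duplicate filter, and yield it; both proposed APIs would hide exactly this boilerplate.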
