A decorator to write coroutine-like spider callbacks.
Python Makefile
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
docs
example
src/inline_requests
tests
.bumpversion.cfg
.cookiecutterrc
.coveragerc
.editorconfig
.gitignore
.travis.yml
AUTHORS.rst
CONTRIBUTING.rst
HISTORY.rst
LICENSE
MANIFEST.in
Makefile
README.rst
TODO.rst
VERSION
dev-requirements.in
dev-requirements.txt
pytest.ini
requirements-dev.txt
requirements-install.txt
requirements-setup.txt
requirements-tests.txt
requirements.in
requirements.txt
setup.cfg
setup.py
tox.ini

README.rst

Scrapy Inline Requests

Documentation Status Coverage Status Code Quality Status Requirements Status

A decorator for writing coroutine-like spider callbacks.

Quickstart

The spider below shows a simple use case of scraping a page and following a few links:

from inline_requests import inline_requests
from scrapy import Spider, Request

class MySpider(Spider):
    name = 'myspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        urls = [response.url]
        for i in range(10):
            next_url = response.urljoin('?page=%d' % i)
            try:
                next_resp = yield Request(next_url)
                urls.append(next_resp.url)
            except Exception:
                self.logger.info("Failed request %s", i, exc_info=True)

        yield {'urls': urls}

See the examples/ directory for a more complex spider.

Warning

The generator resumes its execution when a request's response is processed, this means the generator won't be resume after yielding an item or a request with it's own callback.

Known Issues

  • Middlewares can drop or ignore non-200 status responses causing the callback to not continue its execution. This can be overcome by using the flag handle_httpstatus_all. See the httperror middleware documentation.
  • High concurrency and large responses can cause higher memory usage.
  • This decorator assumes your method have the following signature (self, response).
  • Wrapped requests may not be able to be serialized by persistent backends.
  • Unless you know what you are doing, the decorated method must be a spider method and return a generator instance.