Requests scheduled when idle never go through the spider middlewares #542
Comments
The spider middlewares are meant to process the spider output/input; by using the engine directly you bypass them. I suppose you have a spider middleware — can you make it an extension or a downloader middleware?
@darkrho I disagree; requests scheduled with schedule() should get the same treatment.

@nside: it's more convenient to call crawl().
@dangra crawl() fixed it for me. Still, I'd expect any requests scheduled to get the same treatment, whatever their "entry point" into the pipeline is. Feel free to close if you disagree.
@nside: what Scrapy version are you using? Historically, there were three engine entry points for requests, and there used to be a big difference between them.
That said, the engine API is not documented and can't be considered stable.
I'm on 0.21 (dev). These are good subtleties to know! |
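To make the distinction in this exchange concrete, here is a toy model (all names here are invented; this is not Scrapy's actual code): spider middlewares wrap the spider's *output*, so anything pushed straight into the scheduler never passes through them.

```python
class ToyEngine:
    """Toy model: spider middlewares wrap spider output, not the scheduler."""

    def __init__(self, middlewares):
        self.middlewares = middlewares
        self.queue = []  # stands in for the scheduler queue

    def process_spider_output(self, requests):
        # Requests returned by start_requests() or a callback pass
        # through every spider middleware before being scheduled.
        for mw in self.middlewares:
            requests = mw(requests)
        self.queue.extend(requests)

    def schedule(self, request):
        # Direct engine scheduling: straight into the queue,
        # no spider middleware ever sees the request.
        self.queue.append(request)


def tagging_middleware(requests):
    # A trivial "middleware" that marks every request it sees.
    for req in requests:
        yield "seen:" + req


engine = ToyEngine([tagging_middleware])
engine.process_spider_output(["http://example.com/a"])  # callback path
engine.schedule("http://example.com/b")                 # direct engine path

print(engine.queue)  # → ['seen:http://example.com/a', 'http://example.com/b']
```

Only the request that flowed through process_spider_output carries the middleware's mark, which mirrors the behaviour reported in this thread.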
TL;DR:
Nice TL;DR. I think it should be somewhere in the docs, because there are a lot of subtleties here.

On Fri, Jan 17, 2014 at 12:35 PM, Daniel Graña notifications@github.com wrote:
When my spider enters the idle state I schedule a request through the engine,
and I expect that request to go through the spider middlewares. But after some investigation it looks like only requests returned by Spider.start_requests and those returned from callbacks are processed by these middlewares.
Maybe I'm scheduling the request in a bad way, but if so the method shouldn't be public (i.e. I'd prefix schedule with an underscore).
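The idle-time pattern described in the report can be sketched with stand-ins for Scrapy's objects. The names Request, engine.crawl(), and schedule() follow this thread's discussion of the 0.21-era API; everything else below is invented so the sketch runs on its own.

```python
# Stand-ins for Scrapy objects so the call shape can run standalone;
# in a real spider these would be scrapy.http.Request and
# self.crawler.engine (0.21-era API, per this thread).
class Request:
    def __init__(self, url):
        self.url = url


class Engine:
    def __init__(self):
        self.scheduled = []

    def crawl(self, request, spider):
        # The entry point that "fixed it" for the reporter; the original
        # report used schedule(), which bypassed the spider middlewares.
        self.scheduled.append(request)


class MySpider:
    def __init__(self, engine):
        self.engine = engine

    def spider_idle(self):
        # Handler hooked to the spider_idle signal: feed one more
        # request back into the engine when the spider runs dry.
        self.engine.crawl(Request("http://example.com/more"), self)


engine = Engine()
MySpider(engine).spider_idle()
print([r.url for r in engine.scheduled])  # → ['http://example.com/more']
```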