Bind request to response only if not bound already #4529
Comments
Binding at core/engine.py predates our public release (83dcf8a).
Binding at core/scraper.py came 4 years later (0256a34#diff-efe9358e1656d355258b3d73c1a4d189R139).
Pablo's merge commit explains why we need the scraper's binding: 1e1cc76
I'm not sure I understand the implications of the change. But scrapy-splash had a similar challenge (if I understand the problem properly), and in the response it provides the URL plus a few other attributes of both requests, via separate attributes. I recall that implementing it was not nice, and there are caveats, but it was doable. I don't recall the details, but https://github.com/scrapy-plugins/scrapy-splash/blob/e40ca4f3b367ab463273bee1357d3edfe0601f0d/scrapy_splash/response.py#L22 can be a good starting point.
Right, I should have checked the blame 😅 For the record, all tests pass with the above changes.
Summary
Currently, requests are bound to responses in the engine and/or in the scraper. I would like to start a conversation to figure out the implications of binding them only if the `response.request` attribute is not already set.
Motivation
Recently I've been working on a downloader middleware that transparently redirects requests to a service which takes a URL and returns its response body along with some other metadata. The `process_request` method replaces the URL, and the `process_response` method decodes the response body and creates a new response, making it look like it came directly from the target website, not from the download service. At this point, I'd like to be able to restore the original request as well, but any request I set in the `Response.request` attribute gets overridden.
Describe alternatives you've considered
ATM I don't see a way to modify this behaviour without altering the Scrapy core.
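The conditional binding from the Summary, combined with the middleware from the Motivation, might look roughly like the sketch below. It is only an illustration: `Request` and `Response` are stdlib stand-ins rather than Scrapy's own classes, and `bind_request`, `DownloadServiceMiddleware`, the `original_request` meta key and the service URL are hypothetical names that do not exist in Scrapy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    """Stand-in for a Scrapy request (hypothetical, stdlib only)."""
    url: str
    meta: dict = field(default_factory=dict)


@dataclass
class Response:
    """Stand-in for a Scrapy response (hypothetical, stdlib only)."""
    url: str
    body: bytes = b""
    request: Optional[Request] = None


def bind_request(response: Response, request: Request) -> Response:
    """Bind the request only if the response does not carry one already."""
    if response.request is None:
        response.request = request
    return response


class DownloadServiceMiddleware:
    """Hypothetical middleware that routes requests through a download service."""

    service_url = "https://download-service.example/fetch?url="  # assumed URL

    def process_request(self, request: Request) -> Request:
        # Point the request at the service and remember the original request.
        return Request(
            url=self.service_url + request.url,
            meta={"original_request": request},
        )

    def process_response(self, request: Request, response: Response) -> Response:
        original = request.meta["original_request"]
        # Build a response that looks like it came from the target website.
        # Decoding of the service payload is omitted in this sketch.
        return Response(url=original.url, body=response.body, request=original)


# Usage: the original request survives the later binding step.
mw = DownloadServiceMiddleware()
original = Request("https://example.com/page")
proxied = mw.process_request(original)
service_response = Response(url=proxied.url, body=b"<html>ok</html>")
restored = mw.process_response(proxied, service_response)
final = bind_request(restored, proxied)
```

With an unconditional assignment in place of `bind_request`, `final.request` would end up pointing at the proxied request, which is the override described above.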