Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a NO_CALLBACK value for Request.callback #5798
Implement a NO_CALLBACK value for Request.callback #5798
Changes from 11 commits
50500a6
a493464
5c1559f
4242ae4
818d69f
9272c4a
c883a13
9d07be6
1f3e428
f03b47d
ccd1385
e169947
389fd99
4239f7e
78eaf06
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update the
FilesPipeline
as well: https://github.com/scrapy/scrapy/blob/master/scrapy/pipelines/files.py#L520There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since tests were actually passing for files, I had a deeper look at it, and I think tests pass because those
Request(u)
objects are later parsed with the very method here, so the callback gets set before the request object leaves the middleware. So I think no further changes may be necessary specific to files or images.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it might still be cleaner to add the no callback marker to these requests, as they're not supposed to use "parse" callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we set callback to
NO_CALLBACK
twice, inget_media_requests
and in_process_request
(both called fromprocess_item
, one after the other)?I am not against it, I just want to be certain that I made it clear enough that the reason the callback is not set here is because these request objects are processed further before they leave the pipeline, so with the current code there is no risk of anything outside the pipeline itself to receive a request with
callback=None
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think if the reader sees
FilesPipeline.get_media_requests()
withRequest(u, callback=NO_CALLBACK)
, it helps re-assure the idea that theparse()
method isn't supposed to be involved here.Although they could also further inspect
MediaPipeline._process_request()
and see thatNO_CALLBACK
is assigned, they won't have to ifFilesPipeline.get_media_requests()
already shows it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m making the change, then.
I wonder if we should go further, though, by changing
_process_request
to:callback
isNone
.callback
is anything other thanNone
orNO_CALLBACK
. Or the same behavior as above, to avoid a backward-incompatible change. But I think it may be wise to actually break such code, to force users to not set a callback that is being reset in_process_request
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I'm not quite sure about this, since there might be some Scrapy project out there that does things differently with their MediaPipeline/FilePipeline. For example, they've overridden
_process_request
to not directly use the downloader.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The callback is actually not just reset, but stored and used. So maybe my point is void, we should continue to support callbacks on requests from
get_media_requests()
as usual._process_request
will make sure that the request leaves the pipeline withcallback=NO_CALLBACK
, but the original callback will be called nonetheless by the pipeline.