Attribute to control offsite filtering #3691
base: master
Codecov Report
@@ Coverage Diff @@
## master #3691 +/- ##
==========================================
+ Coverage 84.54% 84.71% +0.17%
==========================================
Files 167 168 +1
Lines 9420 9460 +40
Branches 1402 1407 +5
==========================================
+ Hits 7964 8014 +50
+ Misses 1199 1188 -11
- Partials 257 258 +1
LGTM
scrapy/spidermiddlewares/offsite.py
@@ -28,7 +28,7 @@ def from_crawler(cls, crawler):
     def process_spider_output(self, response, result, spider):
         for x in result:
             if isinstance(x, Request):
-                if x.dont_filter or self.should_follow(x, spider):
+                if x.meta.get('allow_offsite_requests', False) or self.should_follow(x, spider):
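To make the behavioural change in this hunk concrete, here is a minimal sketch that compares the check on master with the check in this PR. It does not import Scrapy: `FakeRequest` is a hypothetical stand-in for `scrapy.Request` holding only the attributes the check reads, and `on_site` is a hypothetical replacement for `should_follow()` that treats only `example.com` URLs as on-site.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request (only what the check reads)."""
    url: str
    dont_filter: bool = False
    meta: dict = field(default_factory=dict)


def passes_before(req: FakeRequest, should_follow: Callable[[FakeRequest], bool]) -> bool:
    # master: dont_filter=True also bypasses the offsite check.
    return req.dont_filter or should_follow(req)


def passes_after(req: FakeRequest, should_follow: Callable[[FakeRequest], bool]) -> bool:
    # This PR: only the 'allow_offsite_requests' meta key bypasses the offsite check.
    return req.meta.get('allow_offsite_requests', False) or should_follow(req)


def on_site(req: FakeRequest) -> bool:
    # Hypothetical should_follow(): only example.com URLs count as on-site.
    return req.url.startswith("https://example.com/")


retry = FakeRequest("https://other.org/page", dont_filter=True)
allowed = FakeRequest("https://other.org/page", meta={"allow_offsite_requests": True})

print(passes_before(retry, on_site), passes_after(retry, on_site))      # True False
print(passes_before(allowed, on_site), passes_after(allowed, on_site))  # False True
```

In this sketch, a request that only sets `dont_filter=True` is no longer let through to an offsite URL once the PR is applied; the meta key becomes the sole opt-in.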
I think this would keep the change backward-compatible:
- if x.meta.get('allow_offsite_requests', False) or self.should_follow(x, spider):
+ if x.meta.get('allow_offsite_requests', x.dont_filter) or self.should_follow(x, spider):
If you agree, I suggest you also revise your test changes so that the existing tests stay unchanged and you only add new test scenarios.
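A minimal sketch of the suggested backward-compatible default, again without importing Scrapy: `FakeRequest` is a hypothetical stand-in for `scrapy.Request`, and `bypasses_offsite` is a hypothetical helper that isolates the left-hand side of the suggested condition. It enumerates the cases a test suite would likely need to cover, including the one the PR author objects to.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request (only what the check reads)."""
    url: str
    dont_filter: bool = False
    meta: dict = field(default_factory=dict)


def bypasses_offsite(req: FakeRequest) -> bool:
    # Suggested variant: the meta key wins when present; otherwise fall back
    # to dont_filter, so existing spiders keep their current behaviour.
    return bool(req.meta.get('allow_offsite_requests', req.dont_filter))


cases = {
    # dont_filter=True alone still skips the offsite filter (backward compatible).
    "dont_filter only": FakeRequest("https://other.org/", dont_filter=True),
    # New opt-in that leaves duplicate filtering untouched.
    "meta opt-in": FakeRequest("https://other.org/", meta={"allow_offsite_requests": True}),
    # Skip duplicate filtering but keep offsite filtering: this needs an explicit
    # False, which is the extra step the PR author objects to.
    "dont_filter, offsite kept": FakeRequest(
        "https://other.org/", dont_filter=True, meta={"allow_offsite_requests": False}
    ),
    # Neither flag set: the offsite filter applies as usual.
    "no flags": FakeRequest("https://other.org/"),
}

for name, req in cases.items():
    print(f"{name}: {bypasses_offsite(req)}")
```

The trade-off is visible in the third case: existing spiders need no changes, but a request that only wants to bypass duplicate filtering must now say so twice.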
I'm not sure about this, even with the advantage of backward compatibility. When we want to send a duplicate request without skipping the offsite filter (which I assume is the case for the majority of requests that set dont_filter=True), we would have to add meta={'allow_offsite_requests': False}, which is extra work. There would also be two different ways to skip offsite filtering.
Hence I'm proposing to make this change in a backward-incompatible way, if possible.
I'm not opposed to a backward-incompatible change here, but let's get more feedback.
The author might have a point that deliberately making this backward-incompatible is the best choice, but I am not sure.
This is a fix for #3690