Don't check robots.txt for local files #5807
Conversation
Could you add a test for it?
I've no experience with tox, so it may be easier for you to do it.
Tox is trivial to use. I will look into it when I can find some time, but for the record, it might take weeks.
If that test is good enough, great 🚀, but if you want something more advanced we'll have to wait until you're available. I guess if we are missing 2.8 it doesn't matter whether it's merged tomorrow or in a few weeks.
Codecov Report

@@            Coverage Diff             @@
##           master    #5807      +/-   ##
==========================================
+ Coverage   88.91%   88.93%   +0.02%
==========================================
  Files         162      162
  Lines       10988    10990       +2
  Branches     1797     1798       +1
==========================================
+ Hits         9770     9774       +4
+ Misses        938      937       -1
+ Partials      280      279       -1
Awesome!!!
Thanks!
Currently, Scrapy attempts to check robots.txt even for local files (and data URLs). You can reproduce this with:
$ pipenv run scrapy runspider test.py -s ROBOTSTXT_OBEY=True
$ pipenv run scrapy runspider test.py -s ROBOTSTXT_OBEY=False
This patch silently skips the robots.txt check for those URLs, so the rest of the project can continue to honour robots.txt for web requests.
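The idea behind the fix can be sketched as a check on the request's URL scheme before any robots.txt lookup is attempted: only web schemes have a host that could serve a robots.txt. This is a minimal, self-contained illustration of that logic, not the actual Scrapy middleware code; the function name and scheme set here are illustrative assumptions.

```python
from urllib.parse import urlparse

# Assumption for this sketch: only http/https URLs can meaningfully
# serve a robots.txt; file:// and data: URLs have no host to query.
WEB_SCHEMES = {"http", "https"}

def should_check_robots(url: str) -> bool:
    """Return True only for URLs whose scheme can serve a robots.txt."""
    return urlparse(url).scheme in WEB_SCHEMES

# Web URLs are checked; local and data URLs are skipped silently.
print(should_check_robots("https://example.com/page"))  # True
print(should_check_robots("file:///tmp/test.html"))     # False
print(should_check_robots("data:text/html,hello"))      # False
```

With a guard like this in place, ROBOTSTXT_OBEY=True no longer triggers a doomed robots.txt fetch for `file://` or `data:` requests, which matches the behaviour the reproduction commands above demonstrate.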