[GSoC 2019] Adds integration with Protego robots.txt parser. #3935
Codecov Report
@@ Coverage Diff @@
## master #3935 +/- ##
==========================================
- Coverage 85.54% 85.39% -0.16%
==========================================
Files 166 167 +1
Lines 9681 9720 +39
Branches 1445 1455 +10
==========================================
+ Hits 8282 8300 +18
- Misses 1146 1162 +16
- Partials 253 258 +5
GitHub references can now point to https://github.com/anubhavp28/protego.

Note: I’m delaying a proper review until scrapy/protego#1 is merged and a release including it is published on PyPI.
scrapy/robotstxt.py
exc_info=sys.exc_info(),
extra={'spider': self.spider})
robotstxt_body = ''
robotstxt_body = decode_robotstxt(robotstxt_body, spider, True)
Could you please use a keyword argument to pass the boolean value (True) to make the code more readable? This applies here and in other places.
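A minimal sketch of the suggested call style. It assumes the third parameter of `decode_robotstxt` is named `to_native_str_type`; that name is an assumption here, so check the actual signature in scrapy/robotstxt.py. The function below is a simplified, hypothetical stand-in, not Scrapy's implementation. It also makes the flag keyword-only with a bare `*`, so an unlabeled positional `True` is rejected instead of silently accepted.

```python
import logging

logger = logging.getLogger(__name__)


def decode_robotstxt(robotstxt_body, spider, *, to_native_str_type=False):
    """Hypothetical, simplified stand-in for Scrapy's decode_robotstxt.

    The bare `*` makes `to_native_str_type` keyword-only, so a call like
    decode_robotstxt(body, spider, True) raises TypeError.
    """
    try:
        decoded = robotstxt_body.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to an empty body, mirroring the `robotstxt_body = ''`
        # fallback visible in the diff under review.
        logger.warning("robots.txt is not valid UTF-8, treating it as empty",
                       extra={"spider": spider})
        decoded = ""
    # On Python 3 the native str type is already text, so this stand-in
    # returns str either way; the flag only illustrates the call style.
    return str(decoded) if to_native_str_type else decoded


# Readable call: the boolean's meaning is visible at the call site.
body = decode_robotstxt(b"User-agent: *\nDisallow: /private\n", None,
                        to_native_str_type=True)
```

Whether to go as far as keyword-only parameters is a design choice. Simply passing `to_native_str_type=True` at each call site already addresses the readability concern.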
tests/test_robotstxt_interface.py
class ProtegoRobotParserTest(BaseRobotParserTest, unittest.TestCase):
    if not rerp_available():
I think it should check for Protego, not for rerp.
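One way the guard could check for Protego instead of rerp is sketched below. `protego_available()` is a hypothetical helper named by analogy with `rerp_available()`. The sketch uses the standard library's `unittest.skipUnless` rather than the class-level `skip` attribute (which is honoured by twisted.trial, Scrapy's test runner), and the test body assumes Protego's `Protego.parse(...)` / `can_fetch(url, user_agent)` interface.

```python
import importlib.util
import unittest


def protego_available():
    """Hypothetical helper: True if the `protego` package can be imported."""
    return importlib.util.find_spec("protego") is not None


@unittest.skipUnless(protego_available(), "Protego module is not available")
class ProtegoRobotParserTest(unittest.TestCase):
    def test_disallow_rule(self):
        # Imported inside the test so the module loads even without Protego.
        from protego import Protego

        rp = Protego.parse("User-agent: *\nDisallow: /private\n")
        self.assertFalse(rp.can_fetch("https://example.com/private", "mybot"))
        self.assertTrue(rp.can_fetch("https://example.com/public", "mybot"))
```

When Protego is not installed, the whole class is reported as skipped instead of failing on import.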
I haven't tried running the code, but the code itself looks good to me, thanks @anubhavp28! +1 to merge.
[For Google Summer of Code 2019] Make Scrapy support Protego robots.txt parser.
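Here is a sketch of how a project might opt in to the Protego parser. It assumes the parser is exposed as `scrapy.robotstxt.ProtegoRobotParser` and selected through the `ROBOTSTXT_PARSER` setting, as in later Scrapy releases; the names in this PR may differ.

```python
# settings.py of a Scrapy project (sketch; assumes a Scrapy release that
# ships Protego support and the ROBOTSTXT_PARSER setting)
ROBOTSTXT_OBEY = True  # robots.txt rules are only enforced when this is on
ROBOTSTXT_PARSER = "scrapy.robotstxt.ProtegoRobotParser"
```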