Use protego as a default robots.txt parser #4006
Codecov Report
@@ Coverage Diff @@
## master #4006 +/- ##
==========================================
+ Coverage 85.43% 85.52% +0.08%
==========================================
Files 167 167
Lines 9732 9741 +9
Branches 1456 1461 +5
==========================================
+ Hits 8315 8331 +16
+ Misses 1159 1153 -6
+ Partials 258 257 -1
Thanks! Looks good to me, I just left a couple of minor comments.
Co-Authored-By: elacuesta <elacuesta@users.noreply.github.com>
Co-Authored-By: Mikhail Korobov <kmike84@gmail.com>
I see a few minor style issues with the documentation changes. Mainly, inconsistent use of capitalization (paragraphs that start with lowercase, lists where some entries are capitalized while others are not), inconsistent ending of list entries (lists where some entries end in a comma), etc.
I’m thinking it may be better to keep documentation changes here to a minimum (e.g. just change which parser is the default), and propose a separate pull request that works on improving the documentation of the different parsers.
Co-Authored-By: Adrián Chaves <adrian@chaves.io>
Hey! The PR looks fine to me. I'm on the fence about it, but it might be worth reviewing the protego API a bit before making it the default, to avoid churn in the future. For example, Python's default parser (in the upcoming 3.8) has a .site_maps() method, while protego has a .sitemaps property. On the other hand, this doesn't affect Scrapy directly, so maybe it isn't needed.
I see no problem here. We don't use the parsers' APIs directly in Scrapy. Every parser should provide a proper adapter conforming to a common interface. We are tied to the interface's API, not the parser's API.
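To illustrate the adapter idea from the comment above, here is a minimal sketch that wraps Python's standard-library `urllib.robotparser.RobotFileParser` behind a small common interface. The names `RobotsAdapter`, `StdlibRobotsAdapter`, `allowed` and `sitemaps` are hypothetical and chosen for this example only; they are not taken from this PR and may not match Scrapy's actual interface. It assumes Python 3.8+, where `site_maps()` is available.

```python
from abc import ABC, abstractmethod
from urllib.robotparser import RobotFileParser


class RobotsAdapter(ABC):
    """Hypothetical common interface; names are illustrative only."""

    @abstractmethod
    def allowed(self, url: str, user_agent: str) -> bool:
        """Return True if user_agent may fetch url."""

    @abstractmethod
    def sitemaps(self) -> list:
        """Return the sitemap URLs listed in robots.txt."""


class StdlibRobotsAdapter(RobotsAdapter):
    """Adapter around the standard-library parser (assumes Python 3.8+)."""

    def __init__(self, robotstxt_body: str):
        self._parser = RobotFileParser()
        self._parser.parse(robotstxt_body.splitlines())

    def allowed(self, url: str, user_agent: str) -> bool:
        # The stdlib parser takes (useragent, url); the adapter hides that order.
        return self._parser.can_fetch(user_agent, url)

    def sitemaps(self) -> list:
        # site_maps() returns None when no Sitemap line exists; normalize to a list.
        return list(self._parser.site_maps() or [])


ROBOTS_TXT = (
    "User-agent: *\n"
    "Disallow: /private\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)

adapter = StdlibRobotsAdapter(ROBOTS_TXT)
```

Calling code would only use `adapter.allowed(...)` and `adapter.sitemaps()`, so a difference such as a `.site_maps()` method versus a `.sitemaps` property stays inside each adapter and does not leak into the caller.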
Ok, let's do it!
Implementation of proposal #3969