Adds ROBOTSTXT_USER_AGENT setting #3966
cffb38a to 00fe05e
Codecov Report
@@            Coverage Diff             @@
##           master    #3966      +/-   ##
==========================================
+ Coverage   85.35%   85.39%   +0.03%
==========================================
  Files         167      167
  Lines        9699     9724      +25
  Branches     1453     1456       +3
==========================================
+ Hits         8279     8304      +25
  Misses       1162     1162
  Partials      258      258
Looks great.
I’ve left just a couple of comments regarding documentation.
@@ -1074,6 +1074,21 @@ implementing the methods described below.
 .. autoclass:: RobotParser
    :members:

+RobotsTxtMiddleware Settings
The setting should be described on one page only, probably the settings page. On this page you can instead mention that the user agent the middleware uses may be overridden with this setting, and link to the settings page entry. See how `ROBOTSTXT_OBEY` is referenced on this page while its documentation lives only on the settings page.
docs/topics/settings.rst
Outdated
@@ -1409,7 +1421,9 @@ USER_AGENT

 Default: ``"Scrapy/VERSION (+https://scrapy.org)"``

-The default User-Agent to use when crawling, unless overridden.
+The default User-Agent to use when crawling, unless overridden. This user agent is
+also used in robots.txt if :setting:`ROBOTSTXT_USER_AGENT` setting is ``None`` and
Instead of `in robots.txt`, I would say `by RobotsTxtMiddleware`, with a link to the middleware documentation.
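For readers following the thread, the fallback that the proposed wording describes could be sketched roughly as below. This is only an illustration of the documented behaviour, not the code from this PR: `robotstxt_user_agent` is a hypothetical helper, the settings are modelled as a plain dict rather than Scrapy's settings object, and the extra condition after "and" in the diff (cut off in the hunk above) is not modelled.

```python
from typing import Mapping, Optional


def robotstxt_user_agent(settings: Mapping[str, Optional[str]]) -> Optional[str]:
    """Hypothetical helper: choose the user agent to match against robots.txt.

    Mirrors only the rule quoted in the diff: when ROBOTSTXT_USER_AGENT is
    None, fall back to USER_AGENT.
    """
    override = settings.get("ROBOTSTXT_USER_AGENT")
    if override is not None:
        # An explicit robots.txt user agent takes precedence.
        return override
    # No override configured, so use the crawl-wide default.
    return settings.get("USER_AGENT")


# Assumed example values; the USER_AGENT string is the documented default.
project_settings = {
    "USER_AGENT": "Scrapy/VERSION (+https://scrapy.org)",
    "ROBOTSTXT_USER_AGENT": None,
}

chosen = robotstxt_user_agent(project_settings)
```

With `ROBOTSTXT_USER_AGENT` left as `None`, `chosen` is the `USER_AGENT` value; setting it to any string makes that string the one used for robots.txt matching.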
Nice feature!
Thanks @anubhavp28!
Fixes #3931