robotparser crawl_delay and request_rate do not work with no matching entry #80103
Comments
RobotFileParser.crawl_delay and RobotFileParser.request_rate raise AttributeError for a robots.txt with no matching entry for the given user agent, including no default entry, rather than returning None, which would be correct according to the documentation. For example:

```python
>>> from urllib.robotparser import RobotFileParser
>>> parser = RobotFileParser()
>>> parser.parse([])
>>> parser.crawl_delay('example')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/urllib/robotparser.py", line 182, in crawl_delay
    return self.default_entry.delay
AttributeError: 'NoneType' object has no attribute 'delay'
```
Thanks for your report Joseph, I opened a new PR to fix this.
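For readers hitting this on unpatched versions, here is a minimal sketch of a subclass-level workaround that guards the `default_entry` dereference and falls back to `None`. The class name `PatchedRobotFileParser` is my own, and this is an illustration of the kind of guard needed, not necessarily the code in the merged PR:

```python
from urllib.robotparser import RobotFileParser

class PatchedRobotFileParser(RobotFileParser):
    """Sketch of a workaround: return None instead of raising
    AttributeError when robots.txt has neither a matching entry
    nor a default ("*") entry. (Illustration only, not the PR.)"""

    def crawl_delay(self, useragent):
        if not self.mtime():
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.delay
        if self.default_entry:
            return self.default_entry.delay
        return None  # no matching entry and no default entry

    def request_rate(self, useragent):
        if not self.mtime():
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.req_rate
        if self.default_entry:
            return self.default_entry.req_rate
        return None  # no matching entry and no default entry

parser = PatchedRobotFileParser()
parser.parse([])  # empty robots.txt: no entries at all
print(parser.crawl_delay('example'))   # None, instead of AttributeError
print(parser.request_rate('example'))  # None, as documented
```

With this guard, an empty robots.txt yields `None` for both methods, matching the documented behavior, while files that do define `Crawl-delay` or `Request-rate` for a matching (or default) entry are unaffected.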
The PR is looking good, I'll likely merge it soon. I'm quite sure this should go into 3.8, but should it be backported to 3.7? This is certainly a bugfix, but still a slight change of behavior, so perhaps we should avoid changing this in 3.7?
Yes, this looks like a bugfix. Who wants an AttributeError? :-)
Rémi, thanks for the great work writing the PR and quickly going through several iterations of reviews and revisions!