False positive hazards in robotparser #65668
>>> from urllib.robotparser import RobotFileParser
>>> rp = RobotFileParser('http://en.wikipedia.org/robots.txt')
>>> rp.can_fetch('UbiCrawler', 'http://en.wikipedia.org/index.html')
True
>>> rp.read()
>>> rp.can_fetch('UbiCrawler', 'http://en.wikipedia.org/index.html')
False
>>> rp.mtime()
0
>>> rp.modified()
>>> rp.mtime()
1399740268.628497

Suggested improvements:
Attaching a draft patch:
Updated the patch to move the modified() call into parse(). That way the mtime is updated whenever rules are loaded, either by read() or by parsing text directly.
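One way to see the effect of moving modified() into parse() is to feed rules straight to parse() without ever calling read(). The sketch below assumes a current Python 3 whose urllib.robotparser includes this fix. The host name and rule lines are made up for illustration, and no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules supplied as text instead of being fetched with read().
rules = """\
User-agent: UbiCrawler
Disallow: /
""".splitlines()

rp = RobotFileParser()
print(rp.mtime())      # 0: no rules have been loaded yet

rp.parse(rules)        # parse() now records the modification time itself
print(rp.mtime() > 0)  # True

# The parsed rules block UbiCrawler; agents with no matching entry are allowed.
print(rp.can_fetch('UbiCrawler', 'http://example.com/index.html'))  # False
print(rp.can_fetch('OtherBot', 'http://example.com/index.html'))    # True
```

Because the timestamp is set inside parse(), read() gets the update automatically (it calls parse()), and callers who load rules from their own source get it as well.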
Changes LGTM. This module could certainly use some cleanup and updates. For example, last_checked should be a property that is always accessed one way (instead of through either .mtime() or .last_checked) and should be initialized to None instead of zero to avoid ambiguity. The and/or trick should also be replaced with if/else. Would anyone review such a patch if I created one?
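The cleanup proposed above could look roughly like the following. This is a hypothetical sketch, not the stdlib API: the class name RuleClock and its describe() method are invented for illustration. It keeps only the timestamp logic of the module's last_checked attribute. The timestamp starts as None, so "never loaded" can no longer be confused with a real value. It is exposed through a single read-only property, and the conditional uses a plain if/else.

```python
import time
from typing import Optional


class RuleClock:
    """Hypothetical illustration of the proposed cleanup (not the stdlib API)."""

    def __init__(self) -> None:
        # None means "rules never loaded"; 0 would be ambiguous and falsy.
        self._last_checked: Optional[float] = None

    @property
    def last_checked(self) -> Optional[float]:
        """Time the rules were last loaded, or None if they never were."""
        return self._last_checked

    def modified(self) -> None:
        """Record that the rules were (re)loaded just now."""
        self._last_checked = time.time()

    def describe(self) -> str:
        # Explicit if/else rather than `cond and a or b`, which silently
        # yields b whenever a happens to be falsy.
        if self._last_checked is None:
            return "rules never loaded"
        else:
            return f"rules loaded at {self._last_checked:.0f}"


clock = RuleClock()
print(clock.describe())  # rules never loaded
clock.modified()
print(clock.last_checked is not None)  # True
```

A read-only property gives callers one way to read the value. Only modified() changes it.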
Can this change be (easily) tested? If so, a test case akin to your original example would be nice.
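A regression test in the spirit of the original session might look like this sketch. It assumes a Python 3 urllib.robotparser that includes the fix. It uses parse() with inline rules so that no network access is needed. The class name, test names and URLs are hypothetical. Saved to a module, it can be run with `python -m unittest`.

```python
import unittest
from urllib.robotparser import RobotFileParser


class FalsePositiveTest(unittest.TestCase):
    """Sketch of a regression test mirroring the session in the report."""

    URL = 'http://example.com/index.html'  # placeholder; never fetched

    def test_can_fetch_is_false_before_rules_are_loaded(self):
        rp = RobotFileParser()
        # Nothing has been read or parsed yet, so nothing may be fetched.
        self.assertFalse(rp.can_fetch('UbiCrawler', self.URL))
        self.assertEqual(rp.mtime(), 0)

    def test_parse_updates_mtime_and_applies_rules(self):
        rp = RobotFileParser()
        rp.parse(['User-agent: *', 'Disallow: /private/'])
        # parse() alone (no read()) should stamp the modification time.
        self.assertGreater(rp.mtime(), 0)
        self.assertTrue(rp.can_fetch('UbiCrawler', self.URL))
        self.assertFalse(
            rp.can_fetch('UbiCrawler', 'http://example.com/private/x'))
```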
Thanks for the review :-)
Yes, the API is a mess, but I would like to be very conservative with API modifications (preferably none at all) so we don't break code for the very few people who ever cared enough to use this module. My goal here was just to remove the risk of false positives.
It's too late for fixing the published API. The time for that was when the module was introduced.
Yes, that would be a reasonable minor clean-up that wouldn't affect the API.
Yes. Just add the one-line patch to this tracker item and I'll incorporate it with the rest. FWIW, it is perfectly reasonable to add new well-designed API extensions. You can post patches to the open tracker items for Bug 16099 and Bug 21475.
New changeset 4ea86cd87f95 by Raymond Hettinger in branch '3.4':
New changeset f67cf5747a26 by Raymond Hettinger in branch '3.4':
New changeset d4fd55278cec by Raymond Hettinger in branch '2.7':
New changeset 560320c10564 by Raymond Hettinger in branch 'default':