Documentation
In the Python documentation for urllib.robotparser, the example references a website (musi-cal.com) that is no longer active:
>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()
>>> rp.set_url("http://www.musi-cal.com/robots.txt")
>>> rp.read()
>>> rrate = rp.request_rate("*")
>>> rrate.requests
3
>>> rrate.seconds
20
>>> rp.crawl_delay("*")
6
>>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
False
>>> rp.can_fetch("*", "http://www.musi-cal.com/")
True

Additionally, the current robots.txt file at http://www.musi-cal.com/robots.txt contains:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Because of this, both can_fetch() calls now return True, which doesn't match the expected output shown in the example.
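The mismatch can be reproduced without network access. The following is a minimal sketch that feeds the rules quoted above to `RobotFileParser.parse()` instead of calling `read()`; the variable names (`CURRENT_RULES`, `search_ok`, `root_ok`) are only illustrative and are not part of the documentation.

```python
import urllib.robotparser

# Rules copied from the robots.txt quoted in this issue.
CURRENT_RULES = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

rp = urllib.robotparser.RobotFileParser()
# parse() accepts an iterable of lines, so no HTTP request is made.
rp.parse(CURRENT_RULES.splitlines())

search_ok = rp.can_fetch(
    "*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco"
)
root_ok = rp.can_fetch("*", "http://www.musi-cal.com/")

print(search_ok, root_ok)                         # True True
print(rp.request_rate("*"), rp.crawl_delay("*"))  # None None
```

Since these rules contain no Request-rate or Crawl-delay lines, `request_rate("*")` returns `None`, so the `rrate.requests` line in the documented session would raise `AttributeError` rather than print 3.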
Proposed fix:
Update the example in urllib.robotparser.rst to replace the outdated musi-cal.com URL with a valid URL (e.g. https://www.python.org).
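As a possible complement to swapping the URL, the example could avoid depending on any live site by parsing inline rules. The sketch below is only an illustration, not the proposed change itself: the rule set `INLINE_RULES` and the `example.com` host are assumptions chosen so that the output matches what the documentation currently shows.

```python
import urllib.robotparser

# Hypothetical rules, written to reproduce the documented output.
INLINE_RULES = """\
User-agent: *
Crawl-delay: 6
Request-rate: 3/20
Disallow: /cgi-bin/
"""

rp = urllib.robotparser.RobotFileParser()
# parse() works on lines directly, so the result cannot drift
# when a remote robots.txt changes.
rp.parse(INLINE_RULES.splitlines())

rrate = rp.request_rate("*")
print(rrate.requests, rrate.seconds)  # 3 20
print(rp.crawl_delay("*"))            # 6

blocked = rp.can_fetch(
    "*", "http://www.example.com/cgi-bin/search?city=San+Francisco"
)
allowed = rp.can_fetch("*", "http://www.example.com/")
print(blocked, allowed)               # False True
```

The trade-off is that such an example no longer demonstrates `set_url()` and `read()`, which a live URL such as the one suggested above would still cover.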
I would be happy to work on this issue and put together a PR for the update.