Dead example URL in urllib.robotparser documentation #141444

@lexzlei

Description

Documentation

In the Python documentation for urllib.robotparser, the example references a page that is no longer available (musi-cal.com). The example code points to this inactive website:

>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()
>>> rp.set_url("http://www.musi-cal.com/robots.txt")
>>> rp.read()
>>> rrate = rp.request_rate("*")
>>> rrate.requests
3
>>> rrate.seconds
20
>>> rp.crawl_delay("*")
6
>>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
False
>>> rp.can_fetch("*", "http://www.musi-cal.com/")
True

Additionally, the current robots.txt file at http://www.musi-cal.com/robots.txt contains:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

As a result, both can_fetch() calls now return True, which contradicts the expected output shown in the example.
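
A quick check against the WordPress-style robots.txt quoted above confirms this; parse() lets us evaluate the rules without a network fetch:

```python
import urllib.robotparser

# The robots.txt currently served at musi-cal.com, as quoted above.
current_robots = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(current_robots.splitlines())

# Neither URL falls under /wp-admin/, so both are allowed.
print(rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco"))  # True
print(rp.can_fetch("*", "http://www.musi-cal.com/"))  # True
```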

Proposed fix:
Update the example in urllib.robotparser.rst to replace the outdated musi-cal.com URL with a valid URL (e.g. https://www.python.org).

I would be happy to work on this issue and put together a PR for the update.

    Labels

    docs (Documentation in the Doc dir), easy
