robotparser only applies first applicable rule #38016

Closed
f8dy mannequin opened this issue Feb 20, 2003 · 3 comments
Assignees
Labels
stdlib Python modules in the Lib dir

Comments

@f8dy
Mannequin

f8dy mannequin commented Feb 20, 2003

BPO 690214
Nosy @smontanaro
Files
  • 690214.patch: Patch for robotparser.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/smontanaro'
    closed_at = <Date 2003-03-06.08:27:12.000>
    created_at = <Date 2003-02-20.18:55:13.000>
    labels = ['invalid', 'library']
    title = 'robotparser only applies first applicable rule'
    updated_at = <Date 2003-03-06.08:27:12.000>
    user = 'https://bugs.python.org/f8dy'

    bugs.python.org fields:

    activity = <Date 2003-03-06.08:27:12.000>
    actor = 'skip.montanaro'
    assignee = 'skip.montanaro'
    closed = True
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2003-02-20.18:55:13.000>
    creator = 'f8dy'
    dependencies = []
    files = ['796']
    hgrepos = []
    issue_num = 690214
    keywords = []
    message_count = 3.0
    messages = ['14708', '14709', '14710']
    nosy_count = 3.0
    nosy_names = ['skip.montanaro', 'calvin', 'f8dy']
    pr_nums = []
    priority = 'normal'
    resolution = 'not a bug'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue690214'
    versions = []

    @f8dy
    Mannequin Author

    f8dy mannequin commented Feb 20, 2003

    RobotFileParser.can_fetch in robotparser.py currently returns
    the result of the first applicable rule. It should loop
    through all rules looking for anything that disallows
    access. For example, if your first rule applies to 'wget'
    and 'python' and disallows access to /dir1/, and your second
    rule is a 'python' rule that disallows access to /dir2/,
    robotparser will falsely claim that python is allowed to
    access /dir2/.

    Patch against current source attached.
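
The scenario described above can be reproduced with a short script. This is a minimal sketch, assuming Python 3, where the module lives at `urllib.robotparser` (the report was filed against the Python 2 `robotparser` module), and assuming an interpreter that keeps the first-match behavior discussed in this thread. The robots.txt content and the paths are illustrative, reconstructed from the description rather than taken from the attached patch.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt reconstructed from the report: the first record
# names both 'wget' and 'python', the second record names only 'python'.
ROBOTS_TXT = """\
User-agent: wget
User-agent: python
Disallow: /dir1/

User-agent: python
Disallow: /dir2/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The first record that names 'python' decides the outcome; the second
# 'python' record is never consulted by a first-match implementation.
dir1_allowed = parser.can_fetch("python", "/dir1/page.html")
dir2_allowed = parser.can_fetch("python", "/dir2/page.html")

print("python may fetch /dir1/page.html:", dir1_allowed)
print("python may fetch /dir2/page.html:", dir2_allowed)
```

With first-match semantics, `/dir1/` is refused and `/dir2/` is permitted, which is the behavior the report objects to.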

    @f8dy f8dy mannequin closed this as completed Feb 20, 2003
    @f8dy f8dy mannequin added the invalid label Feb 20, 2003
    @f8dy f8dy mannequin assigned smontanaro Feb 20, 2003
    @f8dy f8dy mannequin added the stdlib Python modules in the Lib dir label Feb 20, 2003
    @calvin
    Mannequin

    calvin mannequin commented Mar 3, 2003

    Mark, if you dive into
    http://www.robotstxt.org/wc/norobots-rfc.txt you'll note
    that the robot must obey the first matching User-agent
    record as well as the first matching Allow or Disallow line
    (see 3.2.1 and 3.2.2).

    Now, I am not opposed to disobeying the above RFC, but there
    are other arguments against your patch:
    a) it breaks existing robots.txt files (potentially
    disallowing access to sites)
    b) your problem is easily solved by moving Disallow and/or
    User-Agent lines to the top

    Therefore my vote is -1 for this patch.

    Cheers, Bastian
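
Both arguments can be illustrated with a simplified model. This is a sketch under stated assumptions, not the robotparser implementation and not the attached patch: records are modeled as hypothetical `(user_agents, disallowed_prefixes)` tuples, Allow lines are ignored, and the function names are invented for illustration. `first_match_allows` follows the first-match reading of the draft; `any_disallow_allows` follows the semantics the report asks for.

```python
def applies(agents, robot):
    """True if a record's User-agent list covers the given robot name."""
    robot = robot.lower()
    return any(agent == "*" or agent.lower() in robot for agent in agents)


def first_match_allows(records, robot, path):
    """First-match reading: only the first applicable record is consulted."""
    for agents, disallowed in records:
        if applies(agents, robot):
            return not any(path.startswith(prefix) for prefix in disallowed)
    return True  # no record applies, so access is allowed


def any_disallow_allows(records, robot, path):
    """Proposed reading (hypothetical): any applicable Disallow refuses access."""
    for agents, disallowed in records:
        if applies(agents, robot) and any(path.startswith(p) for p in disallowed):
            return False
    return True


# Argument (a): a file written for first-match semantics. 'python' gets a
# narrow restriction; every other robot is shut out by the catch-all record.
existing_site = [
    (["python"], ["/private/"]),
    (["*"], ["/"]),
]
old_result = first_match_allows(existing_site, "python", "/public/index.html")
new_result = any_disallow_allows(existing_site, "python", "/public/index.html")

# Argument (b): the reporter's case, with the 'python' record moved to the
# top and carrying every rule that should apply to it.
reordered = [
    (["python"], ["/dir1/", "/dir2/"]),
    (["wget"], ["/dir1/"]),
]
reordered_result = first_match_allows(reordered, "python", "/dir2/page.html")

print(old_result, new_result, reordered_result)
```

In this model the existing site lets `python` reach `/public/index.html` under first-match rules but locks it out entirely under the proposed rules, while the reordered file blocks `/dir2/` without any change to the parser.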

    @smontanaro
    Contributor

    Closing, as it appears robotparser's behavior matches the RFC, as
    Bastian indicated.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022