Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee: 'https://github.com/smontanaro'
closed_at: 2003-03-06 08:27:12
created_at: 2003-02-20 18:55:13
labels: ['invalid', 'library']
title: 'robotparser only applies first applicable rule'
updated_at: 2003-03-06 08:27:12
user: 'https://bugs.python.org/f8dy'
robotparser.py::RobotFileParser::can_fetch
currently returns the result of the first applicable rule. It
should loop through all rules looking for anything that
disallows access. For example, if your first rule applies
to 'wget' and 'python' and disallows access to /dir1/, and
your second rule is a 'python' rule that disallows access
to /dir2/, robotparser will falsely claim that python is
allowed to access /dir2/.
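For illustration, the reported scenario can be sketched with the modern module (urllib.robotparser in Python 3; the original report predates that name). The robots.txt content and the paths here are made up for the example:

```python
import urllib.robotparser

# Hypothetical robots.txt matching the report: the first record covers
# both 'wget' and 'python' and disallows /dir1/; a second record for
# 'python' alone disallows /dir2/.
ROBOTS_TXT = """\
User-agent: wget
User-agent: python
Disallow: /dir1/

User-agent: python
Disallow: /dir2/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Only the first record whose User-agent line matches 'python' is
# consulted, so 'python' is bound by the /dir1/ rule alone.
print(rp.can_fetch("python", "/dir1/index.html"))  # False (disallowed)
print(rp.can_fetch("python", "/dir2/index.html"))  # True (second record never reached)
```

With this input, can_fetch consults only the first record that matches 'python', so /dir2/ comes back as allowed, which is the behavior the report objects to (and which the reply below defends as RFC-conformant).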
Mark, if you dive into http://www.robotstxt.org/wc/norobots-rfc.txt you'll note
that the first matching user-agent line as well as the first
matching allow or disallow line must be obeyed by the robot
(see 3.2.1 and 3.2.2).
Now, I am not opposed to disobeying the above RFC, but
there are other arguments against your patch:
a) it breaks current implementations of robots.txt
(potentially disallowing access to sites)
b) your problem is easily solved by moving Disallow and/or
User-Agent lines to the top
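Workaround (b) can be sketched with the same hypothetical robots.txt and paths: once every Disallow line intended for 'python' lives in the first record that matches that agent, the single record consulted for it blocks both directories:

```python
import urllib.robotparser

# Workaround sketch: move the 'python'-specific rules so that the first
# record matching 'python' carries every Disallow line meant for it.
# The paths are illustrative.
ROBOTS_TXT = """\
User-agent: python
Disallow: /dir1/
Disallow: /dir2/

User-agent: wget
Disallow: /dir1/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("python", "/dir1/index.html"))  # False
print(rp.can_fetch("python", "/dir2/index.html"))  # False: now blocked too
print(rp.can_fetch("wget", "/dir2/index.html"))    # True: wget's record has no /dir2/ rule
```

No change to robotparser is needed; reordering the records gives the access pattern the report wanted while staying within the one-record-per-robot model of the RFC.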