Robotparser does not handle empty paths #36161

Closed
cmalamas mannequin opened this issue Feb 26, 2002 · 2 comments
Labels
stdlib Python modules in the Lib dir

Comments

@cmalamas
Mannequin

cmalamas mannequin commented Feb 26, 2002

BPO 522898
Nosy @loewis

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2002-02-28.15:32:04.000>
created_at = <Date 2002-02-26.10:40:51.000>
labels = ['library']
title = 'Robotparser does not handle empty paths'
updated_at = <Date 2002-02-28.15:32:04.000>
user = 'https://bugs.python.org/cmalamas'

bugs.python.org fields:

activity = <Date 2002-02-28.15:32:04.000>
actor = 'loewis'
assignee = 'none'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2002-02-26.10:40:51.000>
creator = 'cmalamas'
dependencies = []
files = []
hgrepos = []
issue_num = 522898
keywords = []
message_count = 2.0
messages = ['9423', '9424']
nosy_count = 2.0
nosy_names = ['loewis', 'cmalamas']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue522898'
versions = []

@cmalamas
Mannequin Author

cmalamas mannequin commented Feb 26, 2002

The robotparser module incorrectly handles empty paths
in the Allow/Disallow directives.

According to http://www.robotstxt.org/wc/norobots-rfc.html,
the following rule should be a global *allow*:

    User-agent: *
    Disallow:

My reading of the RFC is that an empty path is always
a global allow (for both Allow and Disallow
directives), which keeps the syntax backwards
compatible: there was no Allow directive in the
original syntax.

Suggested fix:
robotparser.RuleLine.applies_to() becomes:

    def applies_to(self, filename):
        if not self.path:
            self.allowance = 1
        return self.path == "*" or re.match(self.path, filename)
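
A minimal sketch of how the expected behaviour could be checked. It assumes Python 3, where the module lives at `urllib.robotparser` (the report concerns the older top-level `robotparser` module). The `allowed` helper and the example.com URL are illustrative names, not part of the report.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="*"):
    """Parse robots.txt lines and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Empty Disallow value: per the report, this should act as a global allow.
empty_disallow = ["User-agent: *", "Disallow:"]
# Disallow with "/" as its value: this should block every path.
block_all = ["User-agent: *", "Disallow: /"]

print(allowed(empty_disallow, "http://example.com/any/path"))
print(allowed(block_all, "http://example.com/any/path"))
```

On a parser that follows the reading above, the first call should report the URL as fetchable and the second should not.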

@cmalamas cmalamas mannequin closed this as completed Feb 26, 2002
@cmalamas cmalamas mannequin added the stdlib Python modules in the Lib dir label Feb 26, 2002
@loewis
Mannequin

loewis mannequin commented Feb 28, 2002

Logged In: YES
user_id=21627

This is fixed in robotparser.py 1.11.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022