urllib.splithost parses incorrectly #43078

Closed
onlynone mannequin opened this issue Mar 23, 2006 · 3 comments
Labels
stdlib Python modules in the Lib dir

Comments

@onlynone
Mannequin

onlynone mannequin commented Mar 23, 2006

BPO 1457264
Nosy @birkenfeld

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2006-03-26.21:00:33.000>
created_at = <Date 2006-03-23.20:49:08.000>
labels = ['library']
title = 'urllib.splithost parses incorrectly'
updated_at = <Date 2006-03-26.21:00:33.000>
user = 'https://bugs.python.org/onlynone'

bugs.python.org fields:

activity = <Date 2006-03-26.21:00:33.000>
actor = 'georg.brandl'
assignee = 'none'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2006-03-23.20:49:08.000>
creator = 'onlynone'
dependencies = []
files = []
hgrepos = []
issue_num = 1457264
keywords = []
message_count = 3.0
messages = ['27856', '27857', '27858']
nosy_count = 2.0
nosy_names = ['georg.brandl', 'onlynone']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue1457264'
versions = ['Python 2.3']

@onlynone
Mannequin Author

onlynone mannequin commented Mar 23, 2006

urllib.splithost(url) requires that the url passed in
be of the form '//host[:port]/path'. Yet I've run
across some urls that are of the form
'//host[:port]?querystring'. This causes splithost to
return everything as the host and nothing as the path.

Section 3.2 of rfc2396 (Uniform Resource Identifiers:
Generic Syntax) states that 'The authority component is
preceded by a double slash "//" and is terminated by
the next slash "/", question-mark "?", or by the end of
the URI.'

Also, this is how it defines a URI:

absoluteURI   = scheme ":" ( hier_part | opaque_part )
hier_part     = ( net_path | abs_path ) [ "?" query ]
net_path      = "//" authority [ abs_path ]
abs_path      = "/"  path_segments

Based on the above, you could certainly have:
'http://authority?query' as a valid url.

In python2.3 you would just need to change line 939 in
urllib.py from:

        _hostprog = re.compile('^//([^/]*)(.*)$')

to:

        _hostprog = re.compile('^//([^/?]*)(.*)$')

This appears to affect all Python versions; I just
happened to be using 2.3.
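
To make the difference concrete, here is a minimal sketch comparing the two patterns quoted above. The `split_host` helper is a hypothetical stand-in written for illustration, not the actual `urllib.splithost` code, and the sample URL is made up.

```python
import re

# The two patterns quoted in the report.
OLD_HOSTPROG = re.compile('^//([^/]*)(.*)$')
NEW_HOSTPROG = re.compile('^//([^/?]*)(.*)$')

def split_host(pattern, url):
    """Hypothetical stand-in for splithost: return (host, rest) or (None, url)."""
    match = pattern.match(url)
    if match:
        return match.group(1, 2)
    return None, url

url = '//host.com:8080?a=b'
print(split_host(OLD_HOSTPROG, url))  # ('host.com:8080?a=b', '')
print(split_host(NEW_HOSTPROG, url))  # ('host.com:8080', '?a=b')
```

With the original pattern the query string is swallowed into the host; with the proposed pattern the host stops at the first "/" or "?".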

@onlynone onlynone mannequin closed this as completed Mar 23, 2006
@onlynone onlynone mannequin added the stdlib Python modules in the Lib dir label Mar 23, 2006
@onlynone
Mannequin Author

onlynone mannequin commented Mar 24, 2006

Logged In: YES
user_id=1299996

The problem I was having specifically was that the url had a
colon in the query string. Since the query string was being
parsed as part of the host, the text after the colon was
being treated as the port when urllib.splitport was called
later. The following is a simple testcase:

import urllib2
webpage = urllib2.urlopen("http://host.com?a=b:3b")

You will then get an "httplib.InvalidURL: nonnumeric port: '3b'" error.
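
The failure can be reproduced without any network access. The following is a rough sketch under assumptions: `host_and_port` is a hypothetical helper that mimics "split off the host, then treat whatever follows the last colon as the port". It is not the real `urllib.splitport` or `httplib` code, and it raises `ValueError` rather than `httplib.InvalidURL`.

```python
import re

# The two patterns quoted in the original report.
OLD_HOSTPROG = re.compile('^//([^/]*)(.*)$')
NEW_HOSTPROG = re.compile('^//([^/?]*)(.*)$')

def host_and_port(pattern, rest_of_url):
    """Hypothetical helper: extract the host, then split a trailing ':port'."""
    host = pattern.match(rest_of_url).group(1)
    name, sep, port = host.rpartition(':')
    if not sep:
        # No colon in the host part, so no explicit port.
        return host, None
    if not port.isdigit():
        raise ValueError("nonnumeric port: %r" % port)
    return name, int(port)

rest = '//host.com?a=b:3b'  # "http://host.com?a=b:3b" with the scheme removed

try:
    host_and_port(OLD_HOSTPROG, rest)
except ValueError as exc:
    print(exc)  # nonnumeric port: '3b'

print(host_and_port(NEW_HOSTPROG, rest))  # ('host.com', None)
```

With the original pattern the colon inside the query string is mistaken for a port separator; with the proposed pattern the host is just `host.com` and no port is parsed.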

@birkenfeld
Member

Logged In: YES
user_id=849994

Fixed in rev. 43330.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022