urllib.py https redirect-302 bug #34141

Closed
boswell mannequin opened this issue Mar 13, 2001 · 6 comments

@boswell
Mannequin

boswell mannequin commented Mar 13, 2001

BPO 408085

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2001-04-09.14:38:17.000>
created_at = <Date 2001-03-13.01:05:37.000>
labels = ['library']
title = 'urllib.py https redirect-302 bug'
updated_at = <Date 2001-04-09.14:38:17.000>
user = 'https://bugs.python.org/boswell'

bugs.python.org fields:

activity = <Date 2001-04-09.14:38:17.000>
actor = 'moshez'
assignee = 'moshez'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2001-03-13.01:05:37.000>
creator = 'boswell'
dependencies = []
files = []
hgrepos = []
issue_num = 408085
keywords = []
message_count = 6.0
messages = ['3843', '3844', '3845', '3846', '3847', '3848']
nosy_count = 3.0
nosy_names = ['nobody', 'moshez', 'boswell']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue408085'
versions = []

@boswell
Mannequin Author

boswell mannequin commented Mar 13, 2001

Using urllib.urlopen("https://...") seems
to hang because of a redirect problem. It looks
like it's trying to follow the redirect with
http, not https.

>>> import urllib 
>>> params = ... 
>>> f = urllib.urlopen("https://...", params) 
connect: (securesite.com, 80) 
#a printout from httplib, line 354 

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.0/urllib.py", line 63, in urlopen
    return _urlopener.open(url, data)
  File "/usr/local/lib/python2.0/urllib.py", line 168, in open
    return getattr(self, name)(url, data)
  File "/usr/local/lib/python2.0/urllib.py", line 367, in open_https
    data)
  File "/usr/local/lib/python2.0/urllib.py", line 301, in http_error
    result = method(url, fp, errcode, errmsg, headers, data)
  File "/usr/local/lib/python2.0/urllib.py", line 537, in http_error_302
    return self.open(newurl, data)
  File "/usr/local/lib/python2.0/urllib.py", line 168, in open
    return getattr(self, name)(url, data)
  File "/usr/local/lib/python2.0/urllib.py", line 269, in open_http
    h.putrequest('POST', selector)
  File "/usr/local/lib/python2.0/httplib.py", line 428, in putrequest
    self.send(str)
  File "/usr/local/lib/python2.0/httplib.py", line 370, in send
    self.connect()
  File "/usr/local/lib/python2.0/httplib.py", line 354, in connect
    self.sock.connect((self.host, self.port))
KeyboardInterrupt
>>>

@boswell boswell mannequin closed this as completed Mar 13, 2001
@boswell boswell mannequin assigned moshez Mar 13, 2001
@boswell boswell mannequin added the stdlib Python modules in the Lib dir label Mar 13, 2001
@moshez
Mannequin

moshez mannequin commented Mar 18, 2001

Logged In: YES
user_id=11645

Errr....I'm not sure I see the bug. Perhaps the "Location"
header actually contained an "http://" URL? If you can give
me the site or more information (like a printout of newurl),
I can probably be of more help.

In testing (sadly, against a server inside a firewall, so I
cannot give the URL) I have found that it seems to work.

One thing, that may or may not have to do with your problem:
when POSTing, a 302 means "POST to that other URL", not
"GET that other URL". Many webserver writers seem to ignore
this, and many browsers compensate for that server bug.
urllib2 does *not* compensate for that bug -- I haven't
thought through whether *that* may be the explanation.
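
To make the POST-vs-GET point concrete, here is a minimal sketch of the two behaviours described above. The function name `redirect_method` and its `browser_compatible` flag are hypothetical and are not part of urllib or urllib2; the sketch only models the decision, it does not perform any request.

```python
def redirect_method(original_method, status, browser_compatible=False):
    """Pick the HTTP method for the request that follows a redirect.

    Hypothetical helper for illustration only.
    Strict reading: a 302 answer to a POST means "POST to the new URL".
    Browser-style compensation: re-issue the request as a GET instead.
    """
    if status == 302 and original_method == "POST" and browser_compatible:
        return "GET"
    # Default: keep whatever method the original request used.
    return original_method


strict = redirect_method("POST", 302)
lenient = redirect_method("POST", 302, browser_compatible=True)
```

A client that follows the strict reading re-sends the POST body to the new URL; a server that expects the lenient behaviour may respond unexpectedly to that second POST.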

@boswell
Mannequin Author

boswell mannequin commented Mar 19, 2001

Logged In: YES
user_id=153527

The server is https://trading.etrade.com

Unless you have an account there to try it yourself,
there's not much else specific information I can give you.

I know for sure that the redirection is to another
https URL. The "Location" header is actually a relative
one, and that is where the bug in urllib.py is. When
open_https encounters an error, it calls http_error,
which assumes the URL was http, so when it gets a
relative URL it just prepends http:// to the front.
I can't think of an elegant fix for this. Maybe when
http_error realizes it's a relative location, it should
take the protocol as a "proto" argument (which doesn't
exist yet) and prepend THAT instead...

def open_https(self, url, data=None):
    # ... (request setup omitted in this excerpt)
    if errcode == 200:
        return addinfourl(fp, headers, url)
    else:
        if data is None:
            return self.http_error(url, fp, errcode, errmsg, headers)
        else:
            return self.http_error(url, fp, errcode, errmsg, headers, data)

... and here's the function called after the error is
realized...

def http_error_302(self, url, fp, errcode, errmsg, headers, data=None):
    """Error 302 -- relocated (temporarily)."""
    # ... (newurl is taken from the response headers; omitted in this excerpt)
    ######Here's the problem#############
    # In case the server sent a relative URL, join with original:
    newurl = basejoin("http:" + url, newurl)
    # uh, what if it isn't http? we seem to have lost that information...
    if data is None:
        return self.open(newurl)
    else:
        return self.open(newurl, data)
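
The effect of the hard-coded scheme can be reproduced in isolation. The sketch below uses `urllib.parse.urljoin` from current Python as a stand-in for the old `basejoin`; the helper name `resolve_redirect`, the host `secure.example.com`, and the paths are made up for illustration and do not come from the report.

```python
from urllib.parse import urljoin


def resolve_redirect(scheme, rest, location):
    """Join a Location value against the original URL.

    `rest` is the URL with the leading "scheme:" stripped, e.g.
    "//host/path", which is the form the handler receives.
    Hypothetical helper, not part of urllib.
    """
    return urljoin(scheme + ":" + rest, location)


rest = "//secure.example.com/login"   # assumed original https URL, minus "https:"
location = "/account/home"            # assumed relative Location header

buggy = resolve_redirect("http", rest, location)    # scheme hard-coded, as in the excerpt
fixed = resolve_redirect("https", rest, location)   # scheme the request actually used
```

With the hard-coded scheme the follow-up request goes to `http://secure.example.com/account/home`, that is, port 80, which matches the `connect: (securesite.com, 80)` line in the original report.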

I was originally developing my project in Java and
had it working, but realized that I was re-inventing
the wheel (i.e. redirection handling). So I switched to
Python (for other reasons too). But I went back and
used a POST instead of a GET in the redirection handling
and everything still worked, so the possible GET vs.
POST redirect server bug wasn't the cause (although that's
very interesting to know...).

Am I making any sense?

@nobody
Mannequin

nobody mannequin commented Mar 26, 2001

Logged In: NO

The Location header must be an absolute URI
(RFC 2616 section 14.30 and RFC 1945 section 10.11).
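
A client can tell the two cases apart before deciding how to resolve the header. This is a minimal sketch using `urllib.parse.urlsplit` from current Python; the helper name `is_absolute_uri` is hypothetical, and treating "has a scheme and a host" as "absolute" is a simplifying assumption that fits http and https URLs.

```python
from urllib.parse import urlsplit


def is_absolute_uri(location):
    """Return True if the value carries both a scheme and a host.

    Hypothetical helper; a simplification that is adequate for
    http/https Location values.
    """
    parts = urlsplit(location)
    return bool(parts.scheme and parts.netloc)


absolute = is_absolute_uri("https://sourceforge.net/account/")
relative = is_absolute_uri("/account/")
```

Servers that send a relative value, as described earlier in this thread, are the case where the client has to supply the scheme and host itself.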

@moshez
Mannequin

moshez mannequin commented Apr 9, 2001

Logged In: YES
user_id=11645

Fixed in urllib.py v 1.125.
urllib.py prepended "http:" to the URL instead of self.type.
I haven't checked against the original server or with POSTs,
since I couldn't find such a server, but I verified it by
opening https://sourceforge.net/account, which redirects
to https://sourceforge.net/account/. Unfortunately, that
server redirects properly, but I did check that I'm adding
the correct thing.
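
The idea behind the fix can be sketched with a toy class that remembers the scheme of the URL it was asked to open and reuses it when resolving a redirect. `TinyOpener` and its methods are hypothetical stand-ins written for current Python; this is not the code from urllib.py v 1.125, only an illustration of using a stored `self.type` in place of a hard-coded "http:".

```python
from urllib.parse import urljoin, urlsplit


class TinyOpener:
    """Toy stand-in for an opener; performs no network access."""

    def __init__(self):
        self.type = None  # scheme of the URL currently being opened

    def open(self, fullurl):
        # Remember the scheme, then hand on the remainder ("//host/path"),
        # mirroring how the handlers receive the URL without its scheme.
        self.type = urlsplit(fullurl).scheme
        return fullurl[len(self.type) + 1:]

    def redirect_target(self, url, location):
        # Rebuild the base URL from the remembered scheme, not from "http:".
        return urljoin(self.type + ":" + url, location)


opener = TinyOpener()
rest = opener.open("https://sourceforge.net/account")
target = opener.redirect_target(rest, "/account/")
```

Here `target` keeps the https scheme even though the Location value passed in is relative.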

@moshez
Mannequin

moshez mannequin commented Apr 9, 2001

Logged In: YES
user_id=11645

Forgot to actually close the bug report.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022