urllib.FancyURLopener.redirect_internal looses data on POST! #42869

kxroberto · 2006-02-04T17:35:20Z

BPO	1424148
Nosy	@akuchling, @orsenthil, @devdanzin
Dependencies	bpo-549151: urllib2 POSTs on redirect

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/orsenthil'
closed_at = <Date 2009-08-20.14:50:49.637>
created_at = <Date 2006-02-04.17:35:20.000>
labels = ['easy', 'invalid', 'type-bug', 'library']
title = 'urllib.FancyURLopener.redirect_internal looses data on POST!'
updated_at = <Date 2009-08-20.14:50:49.635>
user = 'https://bugs.python.org/kxroberto'

bugs.python.org fields:

activity = <Date 2009-08-20.14:50:49.635>
actor = 'orsenthil'
assignee = 'orsenthil'
closed = True
closed_date = <Date 2009-08-20.14:50:49.637>
closer = 'orsenthil'
components = ['Library (Lib)']
creation = <Date 2006-02-04.17:35:20.000>
creator = 'kxroberto'
dependencies = ['549151']
files = []
hgrepos = []
issue_num = 1424148
keywords = ['easy']
message_count = 16.0
messages = ['27426', '27427', '27428', '27429', '27430', '27431', '27432', '27433', '62579', '86314', '91560', '91570', '91571', '91605', '91655', '91777']
nosy_count = 7.0
nosy_names = ['akuchling', 'jjlee', 'jimjjewett', 'orsenthil', 'kxroberto', 'ajaksu2', 'crocowhile']
pr_nums = []
priority = 'low'
resolution = 'not a bug'
stage = 'test needed'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue1424148'
versions = ['Python 2.6', 'Python 2.5']

kxroberto · 2006-02-04T17:35:20Z

    def redirect_internal(self, url, fp, errcode,
errmsg, headers, data):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            return
        void = fp.read()
        fp.close()
        # In case the server sent a relative URL, join
with original:
        newurl = basejoin(self.type + ":" + url, newurl)
        return self.open(newurl)

... has to become ...

    def redirect_internal(self, url, fp, errcode,
errmsg, headers, data):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            return
        void = fp.read()
        fp.close()
        # In case the server sent a relative URL, join
with original:
        newurl = basejoin(self.type + ":" + url, newurl)
        return self.open(newurl,data)

... i guess? ( ",data" added )

Robert

kxroberto · 2006-02-04T20:10:21Z

Logged In: YES
user_id=972995

Found http://www.faqs.org/rfcs/rfc2616.html (below).
But the behaviour is still strange, and the bug even more
serious: a silent redirection of a POST as GET without data
is obscure for a Python language. Leads to unpredictable
results. The cut half execution is not stopable and all is
left to a good reaction of the server, and complex
reinterpreation of the client. Python urllibX should by
default yield the 30X code for a POST redirection and
provide the first HTML: usually a redirection HTML stub with
< a href=...
That would be consistent with the RFC: the User
(=Application! not Python!) can redirect under full control
without generating a wrong call! In my application, a bug
was long unseen because of this wrong behaviour. with
30X-stub it would have been easy to discover and understand ...

urllib2 has the same bug with POST redirection.

=======
10.3.2 301 Moved Permanently

The requested resource has been assigned a new permanent
URI and any
future references to this resource SHOULD use one of the
returned
URIs. Clients with link editing capabilities ought to
automatically
re-link references to the Request-URI to one or more of
the new
references returned by the server, where possible. This
response is
cacheable unless indicated otherwise.

The new permanent URI SHOULD be given by the Location
field in the
response. Unless the request method was HEAD, the entity
of the
response SHOULD contain a short hypertext note with a
hyperlink to
the new URI(s).

If the 301 status code is received in response to a
request other
than GET or HEAD, the user agent MUST NOT automatically
redirect the
request unless it can be confirmed by the user, since
this might
change the conditions under which the request was issued.

  Note: When automatically redirecting a POST request after
  receiving a 301 status code, some existing HTTP/1.0

user agents
will erroneously change it into a GET request.

jjlee · 2006-02-06T00:54:06Z

Logged In: YES
user_id=261020

This is not a bug.
See the long discussion here:
http://python.org/sf/549151

kxroberto · 2006-02-06T10:29:18Z

Logged In: YES
user_id=972995

http://python.org/sf/549151

the analyzation of the browsers is right. lynx is best ok to
ask.
But urllibX is not a browser (application) but a lib: As of
now with standard urllibX error handling you cannot code a lynx.

gvr's initial suggestion to raise a clear error (with
redirection-link as attribute of the exception value) is
best ok. Another option would be to simly yield the
undirected stub HTML and leave the 30X-code (and redirection
LOCATION in header).

To redirect POST as GET _while_ simply loosing (!) the data
(and not appending it to the GET-URL) is most bad for a lib.
Transcribing smart a short formlike POST to a GET w QUERY
would be so la la.
Don't know if the MS & netscape's also transpose to GET with
long data? ...

The current behaviour is most worst of all 4. All other
methods whould at least have raisen an early hint/error in
my case.

jimjjewett · 2006-02-06T17:57:18Z

Logged In: YES
user_id=764593

In theory, a GET may be automatic, but a POST requires user
interaction, so the user can be held accountable for the
results of a POST, but not of a GET.

Often, the page will respond to either; not sending the
queries protects privacy in case of problems, and works more
often than not. (That said, I too would prefer a raised
error or a transparent repost, at least as options.)

jjlee · 2006-02-06T20:24:25Z

Logged In: YES
user_id=261020

First, anyone replying to this, *please* read this page (and
the whole of this tracker note!) first:

http://ppewww.ph.gla.ac.uk/~flavell/www/post-redirect.html

kxroberto: you say that with standard urllibX error handling
you cannot get an exception on redirected 301/302/307 POST.
That's not true of urllib2, since you may override
HTTPRedirectHandler.redirect_request(), which method was
designed and documented for precisely that purpose. It
seems sensible to have a default that does what virtually
all browsers do (speaking as a long-time lynx user!). I
don't know about the urllib case.

It's perfectly reasonable to extend urllib (if necessary) to
allow the option of raising an exception. Note that (IIRC!)
urllib's exceptions do not contain the response body data,
however (urllib2's HTTPErrors do contain the response body
data).

It would of course break backwards compatibility to start
raising exceptions by default here. I don't think it's
reasonable to break old code on the basis of a notional
security issue when the de-facto standard web client
behaviour is to do the redirect. In reality, the the only
"security" value of the original prescriptive rule was as a
convention to be followed by white-hat web programmers and
web client implementors to help users avoid unintentionally
re-submitting non-idempotent requests. Since that
convention is NOT followed in the real world (lynx doesn't
count as the real world ;-), I see no value in sticking
rigidly to the original RFC spec -- especially when 2616
even provides 307 precisely in response to this problem.
Other web client libraries, for example libwww-perl and Java
HTTPClient, do the same as Python here IIRC. RFC 2616
section 10.3.4 even suggests web programmers use 302 to get
the behaviour you complain about!

The only doubtful case here is 301. A decision was made on
the default behaviour in that case back when the tracker
item I pointed you to was resolved. I think it's a mistake
to change our minds again on that default behaviour.

kxroberto.seek(nrBytes)
assert kxroberto.readline() == """\
To redirect POST as GET _while_ simply loosing (!) the data
(and not appending it to the GET-URL) is most bad for a lib."""

No. There is no value in supporting behaviour which is
simply contrary to both de-facto and prescriptive standards
(see final paragraph of RFC 2616 section 10.3.3: if we
accept the "GET on POST redirect" rule, we must accept that
the Location header is exactly the URL that should be
followed). FYI, many servers return a redirect URL
containing the urlencoded POST data from the original request.

kxroberto: """Don't know if the MS & netscape's also
transpose to GET with long data? ..."""

urllib2's behaviour (and urllib's, I believe) on these
issues is identical to that of IE and Firefox.

jimjewett: """In theory, a GET may be automatic, but a POST
requires user interaction, so the user can be held
accountable for the results of a POST, but not of a GET."""

That theory has been experimentally falsified ;-)

jimjjewett · 2006-02-06T20:52:23Z

Logged In: YES
user_id=764593

Sorry, I was trying to provide a quick explanation of why we
couldn't just "do the obvious thing" and repost with data.

Yes, I realize that in practice, GET is used for non-
idempotent actions, and POST is (though less often) done
automatically.

But since that is the official policy, I wouldn't want to
bet too heavily against it in a courtroom -- so python
defaults should be at least as conservative as both the spec
and the common practice.

jjlee · 2006-02-06T21:19:28Z

Logged In: YES
user_id=261020

Conservative or not, I see no utility in changing the
default, and several major harmful effects: old code breaks,
and people have to pore over the specs to figure out why
"urlopen() doesn't work".

akuchling · 2008-02-20T00:13:44Z

Can this item be closed, given jjlee's argument against changing the
behaviour?

devdanzin · 2009-04-22T18:48:23Z

I agree that changing the default isn't an option.

However, IMHO, having to override HTTPRedirectHandler.redirect_request
or FancyURLopener.redirect_internal to get RFC compliant (albeit
non-useful in 99.99% of use cases) is a bit weird.

Maybe the docs should contain an example of how to be compliant?

crocowhile · 2009-08-14T16:41:57Z

I am not sure where we stand with this issue. It seems to be an old one.
urllib2 still claim (as of python 2.6) the following;

# Strictly (according to RFC 2616), 301 or 302 in response
# to a POST MUST NOT cause a redirection without confirmation
# from the user (of urllib2, in this case). In practice,
# essentially all clients do redirect in this case, so we
# do the same.
# be conciliant with URIs containing a space

This is just not true, we don't do the same at all. redirect_request
does not pass data along and it even changes the headers to reflect
content-size, thus behaving perfectly in accordance with RFC.

For those who stumbled upon this page looking for a workaround, this is
how to do: create a new class inheriting from HTTPRedirectHandler and
use this one instead:

class AutomaticHTTPRedirectHandler(urllib2.HTTPRedirectHandler):

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        """Return a Request or None in response to a redirect.
        
        The default response in redirect_request claims not to 
        follow directives in RFC 2616 but in fact it does
        This class does not and makes handling 302 with POST
        possible
        """
        m = req.get_method()
        if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
            or code in (301, 302, 303) and m == "POST"):
            newurl = newurl.replace(' ', '%20')
            return urllib2.Request(newurl,
                           data=req.get_data(),
                           headers=req.headers,
                           origin_req_host=req.get_origin_req_host(),
                           unverifiable=True)
        else:
            raise urllib2.HTTPError(req.get_full_url(), code, msg,
headers, fp)

jjlee · 2009-08-14T20:21:29Z

This issue is not a bug, and should be closed. It was discussed at
length many years ago (different bug tracker ticket), and resolved.
Since then the same issue seems to come up every year or so,
apparently raised by people who haven't checked the issue tracker for
previous discussion. Please, somebody close this issue!

It seems to be an old one.
urllib2 still claim (as of python 2.6) the following;

Strictly (according to RFC 2616), 301 or 302 in response

to a POST MUST NOT cause a redirection without confirmation

from the user (of urllib2, in this case). In practice,

essentially all clients do redirect in this case, so we

do the same.

Note that this is NOT a statement about whether the request sent as a
result of the redirect response contains the original POST data.

This is just not true, we don't do the same at all. redirect_request
does not pass data along and it even changes the headers to reflect
content-size, thus behaving perfectly in accordance with RFC.

This appears to be a statement about (amongst other things) whether
the request sent as a result of the redirect response contains the
original POST data.

So where's the connection between the comment you quote and your
response to it, Giorgio? Actually, I hope you don't mind if I ask you
not to answer that question, but instead to go and read, very
carefully, the tracker discussion for the original fix that introduced
the comment you posted (you should be able to find it by svn
annotating the source, finding the appropriate commit, then looking
for a reference in the commit message to a bug tracker issue ID).
Once you've done that, please stop posting on this issue <0.2 wink>

Sorry, I'm not normally this grumpy, but this issue just seems to keep
coming back forever, because people haven't spent the time to test
browser behaviour, carefully read the RFC, tracker discussion, commit
messages, etc. If you have done all that and thought carefully and
still think there's a bug, by all means come back, but please make
sure you're extremely clear about *exactly* what you think is wrong,
and why. Write a test case, and cite specific RFC wording. If what
you think is wrong is not the same as the original issue described in
the opening comment of this bug tracker ticket, please raise a new
ticket rather than commenting on this one.

For those who stumbled upon this page looking for a workaround, this
is
how to do: create a new class inheriting from HTTPRedirectHandler
and
use this one instead:

I don't know what this is a workaround *for*.

crocowhile · 2009-08-14T20:47:54Z

I don't know what this is a workaround *for*.

As you can see yourself, that code does a complete redirection, taking
along the post_data too which is simply not possible by default (and
that is obviously a pain in the neck).

I never said it was "bug" nor that the code had to be changed. I am just
saying this is "a lack of a feature" that obviously many would like to
see implemented - and this is probably why it "seems to come back forever".

jjlee · 2009-08-15T11:58:14Z

If you have a feature request, please open a separate ticket. This one
is about an alleged bug.

orsenthil · 2009-08-17T02:26:06Z

I am assigning this to myself. I shall do some research on this issue +
plus current standings by other clients/libraries and come out with a
summary.

orsenthil · 2009-08-20T14:50:49Z

I agree with John on this ticket. At the outset, this is Not a bug.
And reading through the referenced ticket indicates the design decision
for the behavior.
In summary:
<quote>
This suggests to me that *no* automatic repeat of POST
requests should ever be done, and that in the case of a 302
or 303 response, a POST should be replaced by a GET; this
may also be done for a 301 response -- even though the
standard calls that an error, it admits that it is done by
old clients.
</quote>
That was Guido's point at that time.

The least that could be done is take a call on 301 response, but this
would break the other clients which rely on 'earlier standard behavior
though not compliant with RFC'.

At the moment, this wont be necessary as it just break clients using
urllib.

Giorgio's point in rekindling this issue, is not related to urllib
module and specifically w.r.t to redirect_request implementation. So, an
alternate behavior is desired on urllib2's redirects (if they are
observed by existing clients), it could be handled by another request.

So, effectively closing this request.

kxroberto mannequin added stdlib Python modules in the Lib dir labels Feb 4, 2006

devdanzin mannequin added type-bug An unexpected behavior, bug, or error labels Feb 12, 2009

devdanzin mannequin added easy labels Apr 22, 2009

orsenthil assigned orsenthil Aug 17, 2009

orsenthil closed this as completed Aug 20, 2009

orsenthil added the invalid label Aug 20, 2009

orsenthil closed this as completed Aug 20, 2009

orsenthil added the invalid label Aug 20, 2009

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

urllib.FancyURLopener.redirect_internal looses data on POST! #42869

urllib.FancyURLopener.redirect_internal looses data on POST! #42869

kxroberto mannequin commented Feb 4, 2006

kxroberto mannequin commented Feb 4, 2006

kxroberto mannequin commented Feb 4, 2006

jjlee mannequin commented Feb 6, 2006

kxroberto mannequin commented Feb 6, 2006

jimjjewett mannequin commented Feb 6, 2006

jjlee mannequin commented Feb 6, 2006

jimjjewett mannequin commented Feb 6, 2006

jjlee mannequin commented Feb 6, 2006

akuchling commented Feb 20, 2008

devdanzin mannequin commented Apr 22, 2009

crocowhile mannequin commented Aug 14, 2009

jjlee mannequin commented Aug 14, 2009

Strictly (according to RFC 2616), 301 or 302 in response

to a POST MUST NOT cause a redirection without confirmation

from the user (of urllib2, in this case). In practice,

essentially all clients do redirect in this case, so we

do the same.

crocowhile mannequin commented Aug 14, 2009

jjlee mannequin commented Aug 15, 2009

orsenthil commented Aug 17, 2009

orsenthil commented Aug 20, 2009

urllib.FancyURLopener.redirect_internal looses data on POST! #42869

urllib.FancyURLopener.redirect_internal looses data on POST! #42869

Comments

kxroberto mannequin commented Feb 4, 2006

kxroberto mannequin commented Feb 4, 2006

kxroberto mannequin commented Feb 4, 2006

jjlee mannequin commented Feb 6, 2006

kxroberto mannequin commented Feb 6, 2006

jimjjewett mannequin commented Feb 6, 2006

jjlee mannequin commented Feb 6, 2006

jimjjewett mannequin commented Feb 6, 2006

jjlee mannequin commented Feb 6, 2006

akuchling commented Feb 20, 2008

devdanzin mannequin commented Apr 22, 2009

crocowhile mannequin commented Aug 14, 2009

jjlee mannequin commented Aug 14, 2009

Strictly (according to RFC 2616), 301 or 302 in response

to a POST MUST NOT cause a redirection without confirmation

from the user (of urllib2, in this case). In practice,

essentially all clients do redirect in this case, so we

do the same.

crocowhile mannequin commented Aug 14, 2009

jjlee mannequin commented Aug 15, 2009

orsenthil commented Aug 17, 2009

orsenthil commented Aug 20, 2009