Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib.FancyURLopener.redirect_internal looses data on POST! #42869

Closed
kxroberto mannequin opened this issue Feb 4, 2006 · 16 comments
Closed

urllib.FancyURLopener.redirect_internal looses data on POST! #42869

kxroberto mannequin opened this issue Feb 4, 2006 · 16 comments
Assignees
Labels
easy stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@kxroberto
Copy link
Mannequin

kxroberto mannequin commented Feb 4, 2006

BPO 1424148
Nosy @akuchling, @orsenthil, @devdanzin
Dependencies
  • bpo-549151: urllib2 POSTs on redirect
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/orsenthil'
    closed_at = <Date 2009-08-20.14:50:49.637>
    created_at = <Date 2006-02-04.17:35:20.000>
    labels = ['easy', 'invalid', 'type-bug', 'library']
    title = 'urllib.FancyURLopener.redirect_internal looses data on POST!'
    updated_at = <Date 2009-08-20.14:50:49.635>
    user = 'https://bugs.python.org/kxroberto'

    bugs.python.org fields:

    activity = <Date 2009-08-20.14:50:49.635>
    actor = 'orsenthil'
    assignee = 'orsenthil'
    closed = True
    closed_date = <Date 2009-08-20.14:50:49.637>
    closer = 'orsenthil'
    components = ['Library (Lib)']
    creation = <Date 2006-02-04.17:35:20.000>
    creator = 'kxroberto'
    dependencies = ['549151']
    files = []
    hgrepos = []
    issue_num = 1424148
    keywords = ['easy']
    message_count = 16.0
    messages = ['27426', '27427', '27428', '27429', '27430', '27431', '27432', '27433', '62579', '86314', '91560', '91570', '91571', '91605', '91655', '91777']
    nosy_count = 7.0
    nosy_names = ['akuchling', 'jjlee', 'jimjjewett', 'orsenthil', 'kxroberto', 'ajaksu2', 'crocowhile']
    pr_nums = []
    priority = 'low'
    resolution = 'not a bug'
    stage = 'test needed'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1424148'
    versions = ['Python 2.6', 'Python 2.5']

    @kxroberto
    Copy link
    Mannequin Author

    kxroberto mannequin commented Feb 4, 2006

        def redirect_internal(self, url, fp, errcode,
    errmsg, headers, data):
            if 'location' in headers:
                newurl = headers['location']
            elif 'uri' in headers:
                newurl = headers['uri']
            else:
                return
            void = fp.read()
            fp.close()
            # In case the server sent a relative URL, join
    with original:
            newurl = basejoin(self.type + ":" + url, newurl)
            return self.open(newurl)

    ... has to become ...

        def redirect_internal(self, url, fp, errcode,
    errmsg, headers, data):
            if 'location' in headers:
                newurl = headers['location']
            elif 'uri' in headers:
                newurl = headers['uri']
            else:
                return
            void = fp.read()
            fp.close()
            # In case the server sent a relative URL, join
    with original:
            newurl = basejoin(self.type + ":" + url, newurl)
            return self.open(newurl,data)

    ... i guess? ( ",data" added )

    Robert

    @kxroberto kxroberto mannequin added stdlib Python modules in the Lib dir labels Feb 4, 2006
    @kxroberto
    Copy link
    Mannequin Author

    kxroberto mannequin commented Feb 4, 2006

    Logged In: YES
    user_id=972995

    Found http://www.faqs.org/rfcs/rfc2616.html (below).
    But the behaviour is still strange, and the bug even more
    serious: a silent redirection of a POST as GET without data
    is obscure for a Python language. Leads to unpredictable
    results. The cut half execution is not stopable and all is
    left to a good reaction of the server, and complex
    reinterpreation of the client. Python urllibX should by
    default yield the 30X code for a POST redirection and
    provide the first HTML: usually a redirection HTML stub with
    < a href=...
    That would be consistent with the RFC: the User
    (=Application! not Python!) can redirect under full control
    without generating a wrong call! In my application, a bug
    was long unseen because of this wrong behaviour. with
    30X-stub it would have been easy to discover and understand ...

    urllib2 has the same bug with POST redirection.

    =======
    10.3.2 301 Moved Permanently

    The requested resource has been assigned a new permanent
    URI and any
    future references to this resource SHOULD use one of the
    returned
    URIs. Clients with link editing capabilities ought to
    automatically
    re-link references to the Request-URI to one or more of
    the new
    references returned by the server, where possible. This
    response is
    cacheable unless indicated otherwise.

    The new permanent URI SHOULD be given by the Location
    field in the
    response. Unless the request method was HEAD, the entity
    of the
    response SHOULD contain a short hypertext note with a
    hyperlink to
    the new URI(s).

    If the 301 status code is received in response to a
    request other
    than GET or HEAD, the user agent MUST NOT automatically
    redirect the
    request unless it can be confirmed by the user, since
    this might
    change the conditions under which the request was issued.

      Note: When automatically redirecting a POST request after
      receiving a 301 status code, some existing HTTP/1.0
    

    user agents
    will erroneously change it into a GET request.

    @jjlee
    Copy link
    Mannequin

    jjlee mannequin commented Feb 6, 2006

    Logged In: YES
    user_id=261020

    This is not a bug.
    See the long discussion here:
    http://python.org/sf/549151

    @kxroberto
    Copy link
    Mannequin Author

    kxroberto mannequin commented Feb 6, 2006

    Logged In: YES
    user_id=972995

    http://python.org/sf/549151

    the analyzation of the browsers is right. lynx is best ok to
    ask.
    But urllibX is not a browser (application) but a lib: As of
    now with standard urllibX error handling you cannot code a lynx.

    gvr's initial suggestion to raise a clear error (with
    redirection-link as attribute of the exception value) is
    best ok. Another option would be to simly yield the
    undirected stub HTML and leave the 30X-code (and redirection
    LOCATION in header).

    To redirect POST as GET _while_ simply loosing (!) the data
    (and not appending it to the GET-URL) is most bad for a lib.
    Transcribing smart a short formlike POST to a GET w QUERY
    would be so la la.
    Don't know if the MS & netscape's also transpose to GET with
    long data? ...

    The current behaviour is most worst of all 4. All other
    methods whould at least have raisen an early hint/error in
    my case.

    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Feb 6, 2006

    Logged In: YES
    user_id=764593

    In theory, a GET may be automatic, but a POST requires user
    interaction, so the user can be held accountable for the
    results of a POST, but not of a GET.

    Often, the page will respond to either; not sending the
    queries protects privacy in case of problems, and works more
    often than not. (That said, I too would prefer a raised
    error or a transparent repost, at least as options.)

    @jjlee
    Copy link
    Mannequin

    jjlee mannequin commented Feb 6, 2006

    Logged In: YES
    user_id=261020

    First, anyone replying to this, *please* read this page (and
    the whole of this tracker note!) first:

    http://ppewww.ph.gla.ac.uk/~flavell/www/post-redirect.html

    kxroberto: you say that with standard urllibX error handling
    you cannot get an exception on redirected 301/302/307 POST.
    That's not true of urllib2, since you may override
    HTTPRedirectHandler.redirect_request(), which method was
    designed and documented for precisely that purpose. It
    seems sensible to have a default that does what virtually
    all browsers do (speaking as a long-time lynx user!). I
    don't know about the urllib case.

    It's perfectly reasonable to extend urllib (if necessary) to
    allow the option of raising an exception. Note that (IIRC!)
    urllib's exceptions do not contain the response body data,
    however (urllib2's HTTPErrors do contain the response body
    data).

    It would of course break backwards compatibility to start
    raising exceptions by default here. I don't think it's
    reasonable to break old code on the basis of a notional
    security issue when the de-facto standard web client
    behaviour is to do the redirect. In reality, the the only
    "security" value of the original prescriptive rule was as a
    convention to be followed by white-hat web programmers and
    web client implementors to help users avoid unintentionally
    re-submitting non-idempotent requests. Since that
    convention is NOT followed in the real world (lynx doesn't
    count as the real world ;-), I see no value in sticking
    rigidly to the original RFC spec -- especially when 2616
    even provides 307 precisely in response to this problem.
    Other web client libraries, for example libwww-perl and Java
    HTTPClient, do the same as Python here IIRC. RFC 2616
    section 10.3.4 even suggests web programmers use 302 to get
    the behaviour you complain about!

    The only doubtful case here is 301. A decision was made on
    the default behaviour in that case back when the tracker
    item I pointed you to was resolved. I think it's a mistake
    to change our minds again on that default behaviour.

    kxroberto.seek(nrBytes)
    assert kxroberto.readline() == """\
    To redirect POST as GET _while_ simply loosing (!) the data
    (and not appending it to the GET-URL) is most bad for a lib."""

    No. There is no value in supporting behaviour which is
    simply contrary to both de-facto and prescriptive standards
    (see final paragraph of RFC 2616 section 10.3.3: if we
    accept the "GET on POST redirect" rule, we must accept that
    the Location header is exactly the URL that should be
    followed). FYI, many servers return a redirect URL
    containing the urlencoded POST data from the original request.

    kxroberto: """Don't know if the MS & netscape's also
    transpose to GET with long data? ..."""

    urllib2's behaviour (and urllib's, I believe) on these
    issues is identical to that of IE and Firefox.

    jimjewett: """In theory, a GET may be automatic, but a POST
    requires user interaction, so the user can be held
    accountable for the results of a POST, but not of a GET."""

    That theory has been experimentally falsified ;-)

    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Feb 6, 2006

    Logged In: YES
    user_id=764593

    Sorry, I was trying to provide a quick explanation of why we
    couldn't just "do the obvious thing" and repost with data.

    Yes, I realize that in practice, GET is used for non-
    idempotent actions, and POST is (though less often) done
    automatically.

    But since that is the official policy, I wouldn't want to
    bet too heavily against it in a courtroom -- so python
    defaults should be at least as conservative as both the spec
    and the common practice.

    @jjlee
    Copy link
    Mannequin

    jjlee mannequin commented Feb 6, 2006

    Logged In: YES
    user_id=261020

    Conservative or not, I see no utility in changing the
    default, and several major harmful effects: old code breaks,
    and people have to pore over the specs to figure out why
    "urlopen() doesn't work".

    @akuchling
    Copy link
    Member

    Can this item be closed, given jjlee's argument against changing the
    behaviour?

    @devdanzin devdanzin mannequin added type-bug An unexpected behavior, bug, or error labels Feb 12, 2009
    @devdanzin
    Copy link
    Mannequin

    devdanzin mannequin commented Apr 22, 2009

    I agree that changing the default isn't an option.

    However, IMHO, having to override HTTPRedirectHandler.redirect_request
    or FancyURLopener.redirect_internal to get RFC compliant (albeit
    non-useful in 99.99% of use cases) is a bit weird.

    Maybe the docs should contain an example of how to be compliant?

    @devdanzin devdanzin mannequin added easy labels Apr 22, 2009
    @crocowhile
    Copy link
    Mannequin

    crocowhile mannequin commented Aug 14, 2009

    I am not sure where we stand with this issue. It seems to be an old one.
    urllib2 still claim (as of python 2.6) the following;

    # Strictly (according to RFC 2616), 301 or 302 in response
    # to a POST MUST NOT cause a redirection without confirmation
    # from the user (of urllib2, in this case). In practice,
    # essentially all clients do redirect in this case, so we
    # do the same.
    # be conciliant with URIs containing a space

    This is just not true, we don't do the same at all. redirect_request
    does not pass data along and it even changes the headers to reflect
    content-size, thus behaving perfectly in accordance with RFC.

    For those who stumbled upon this page looking for a workaround, this is
    how to do: create a new class inheriting from HTTPRedirectHandler and
    use this one instead:

    class AutomaticHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            """Return a Request or None in response to a redirect.
            
            The default response in redirect_request claims not to 
            follow directives in RFC 2616 but in fact it does
            This class does not and makes handling 302 with POST
            possible
            """
            m = req.get_method()
            if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
                or code in (301, 302, 303) and m == "POST"):
                newurl = newurl.replace(' ', '%20')
                return urllib2.Request(newurl,
                               data=req.get_data(),
                               headers=req.headers,
                               origin_req_host=req.get_origin_req_host(),
                               unverifiable=True)
            else:
                raise urllib2.HTTPError(req.get_full_url(), code, msg,
    headers, fp)

    @jjlee
    Copy link
    Mannequin

    jjlee mannequin commented Aug 14, 2009

    This issue is not a bug, and should be closed. It was discussed at
    length many years ago (different bug tracker ticket), and resolved.
    Since then the same issue seems to come up every year or so,
    apparently raised by people who haven't checked the issue tracker for
    previous discussion. Please, somebody close this issue!

    It seems to be an old one.
    urllib2 still claim (as of python 2.6) the following;

    Strictly (according to RFC 2616), 301 or 302 in response

    to a POST MUST NOT cause a redirection without confirmation

    from the user (of urllib2, in this case). In practice,

    essentially all clients do redirect in this case, so we

    do the same.

    Note that this is NOT a statement about whether the request sent as a
    result of the redirect response contains the original POST data.

    This is just not true, we don't do the same at all. redirect_request
    does not pass data along and it even changes the headers to reflect
    content-size, thus behaving perfectly in accordance with RFC.

    This appears to be a statement about (amongst other things) whether
    the request sent as a result of the redirect response contains the
    original POST data.

    So where's the connection between the comment you quote and your
    response to it, Giorgio? Actually, I hope you don't mind if I ask you
    not to answer that question, but instead to go and read, very
    carefully, the tracker discussion for the original fix that introduced
    the comment you posted (you should be able to find it by svn
    annotating the source, finding the appropriate commit, then looking
    for a reference in the commit message to a bug tracker issue ID).
    Once you've done that, please stop posting on this issue <0.2 wink>

    Sorry, I'm not normally this grumpy, but this issue just seems to keep
    coming back forever, because people haven't spent the time to test
    browser behaviour, carefully read the RFC, tracker discussion, commit
    messages, etc. If you have done all that and thought carefully and
    still think there's a bug, by all means come back, but please make
    sure you're extremely clear about *exactly* what you think is wrong,
    and why. Write a test case, and cite specific RFC wording. If what
    you think is wrong is not the same as the original issue described in
    the opening comment of this bug tracker ticket, please raise a new
    ticket rather than commenting on this one.

    For those who stumbled upon this page looking for a workaround, this
    is
    how to do: create a new class inheriting from HTTPRedirectHandler
    and
    use this one instead:

    I don't know what this is a workaround *for*.

    @crocowhile
    Copy link
    Mannequin

    crocowhile mannequin commented Aug 14, 2009

    I don't know what this is a workaround *for*.

    As you can see yourself, that code does a complete redirection, taking
    along the post_data too which is simply not possible by default (and
    that is obviously a pain in the neck).

    I never said it was "bug" nor that the code had to be changed. I am just
    saying this is "a lack of a feature" that obviously many would like to
    see implemented - and this is probably why it "seems to come back forever".

    @jjlee
    Copy link
    Mannequin

    jjlee mannequin commented Aug 15, 2009

    If you have a feature request, please open a separate ticket. This one
    is about an alleged bug.

    @orsenthil
    Copy link
    Member

    I am assigning this to myself. I shall do some research on this issue +
    plus current standings by other clients/libraries and come out with a
    summary.

    @orsenthil
    Copy link
    Member

    I agree with John on this ticket. At the outset, this is Not a bug.
    And reading through the referenced ticket indicates the design decision
    for the behavior.
    In summary:
    <quote>
    This suggests to me that *no* automatic repeat of POST
    requests should ever be done, and that in the case of a 302
    or 303 response, a POST should be replaced by a GET; this
    may also be done for a 301 response -- even though the
    standard calls that an error, it admits that it is done by
    old clients.
    </quote>
    That was Guido's point at that time.

    The least that could be done is take a call on 301 response, but this
    would break the other clients which rely on 'earlier standard behavior
    though not compliant with RFC'.

    At the moment, this wont be necessary as it just break clients using
    urllib.

    Giorgio's point in rekindling this issue, is not related to urllib
    module and specifically w.r.t to redirect_request implementation. So, an
    alternate behavior is desired on urllib2's redirects (if they are
    observed by existing clients), it could be handled by another request.

    So, effectively closing this request.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    easy stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants