Failure to get through digest page which redirects to temporary AWS url #627

Closed
vlcinsky opened this Issue May 22, 2012 · 31 comments

Comments

Projects
None yet
4 participants

-- edit --

This relates to finding of @lukas, who found, that hooks are not passed to a new request on redirect, which is not described elsewhere (yet?).

As soon as this hook related problem is resolved, there seem to be solution using hook, removing problem making header Authorization from final redirecting request (as sigmavirus24 has found).

-- end of edit --

I use web app, which first asks user to authenticate (usign digest auth) and then redirects him to temporarily valid url at AWS s3 storage.

Using requests fails to handle this correctly, it reports 401.

Asking for easy pages does not make these problems.

Using the app over web browser works well.

I created test utility::

import requests
from requests.auth import HTTPDigestAuth

def test_redirect_to_easypage():
    url =  'http://localhost:8080/feed'
    auth = HTTPDigestAuth("user", "password")
    params = {"key": "easypage"}
    r = requests.get(url, auth = auth, params = params)
    assert r.status_code == 200
    assert len(r.history) == 2
    assert r.history[1].status_code == 401
    assert r.history[0].status_code == 302
    return

def test_redirect_to_awspage():
    url =  'http://localhost:8080/feed'
    auth = HTTPDigestAuth("user", "password")
    params = {"key": "awsredir"}
    r = requests.get(url, auth = auth, params = params)
    assert r.status_code == 200
    assert len(r.history) == 2
    assert r.history[1].status_code == 401
    assert r.history[0].status_code == 302
    return 

test_redirect_to_awspage fails by 401.

And here is simple web app, written in cherrypy::

"""Test cherrypy web server for "requests" test redirection to AWS S3 temporary url.

2012-05-22 by jan.vlcinsky@gmail.com
"""
import cherrypy
from cherrypy.lib import auth_digest

class DigestForwarder(object):
    """Server, serving only authorized users ("user", "password") by redirecting to some secret destination url

    Home page (gives you instructions and links to use)
        http://localhost:8080/
    Easy page redirection:
        http://localhost:8080/feed?key=easypage
    Redirection to tmp AWS page (valid till end 2012):
        http://localhost:8080/feed?key=awsredir
    """

    targets=dict(
                easypage= "http://docs.python-requests.org/en/latest/index.html",
                awsredir= "http://janvlcinsky.s3.amazonaws.com/files.basex.org/xml/xmark.xml?AWSAccessKeyId=AKIAJBXTB3W6TPH7OEQA&Expires=1356908390&Signature=t6HE4oEdTVx4Mi9Q7TrDbe6%2FQxs%3D"
            )
    @cherrypy.expose
    def index(self):
        login = cherrypy.request.login
        lines = ""
        for key, value in self.targets.items():
            request_url = "/feed?key=" + key
            lines += """<li>Key: %(key)s: <a href="%(request_url)s">%(request_url)s</a></li>""" % locals()

        return """\
        <html>
            <body>
                <h1>Digest based redirector</h1>
                <p>Your user account <b>%(login)s</b> has access to following feeds:</p>
                <ul>
                    %(lines)s
                </ul>
            </body>
        </html>""" % locals()

    @cherrypy.expose
    def feed(self, key):
        if key in self.targets.keys():
          url = self.targets[key]
          raise cherrypy.HTTPRedirect(url, 302)
        else:
          raise cherrypy.HTTPError(400, "Unexpected value of feed key: %s." % key)

        return "You asked for key %(key)s and are redirected to %(ulr)s" % locals()

conf =  {"/": {
            "log.error_file":            "digestforwarder.error.log",
            "log.access_file":           "digestforwarder.logaccess.log",
            "tools.auth_digest.on":      True,
            "tools.auth_digest.realm":   'localhost',
            "tools.auth_digest.get_ha1": cherrypy.lib.auth_digest.get_ha1_dict_plain({'user': 'password'}),
            "tools.auth_digest.key":     'abcd12345efgh',
            "tools.auth_digest.debug":    True
            }
        }
controller = DigestForwarder()
print controller.__doc__
cherrypy.quickstart(controller, config = conf)

Assuming you have cherrypy installed, just run the script which will provide what is needed for testing locally.

Owner

sigmavirus24 commented Nov 26, 2012

@vlcinsky does this still exist in 0.14.2?

I will review this issue in following days, at last within two weeks (so till 10th of December 2012).

Owner

sigmavirus24 commented Nov 26, 2012

@vlcinsky in my impatience I installed cherrypy in a virtualenv and ran your script. If you instead return the response for the awspage function instead of calling assertions, I get

>>> resp = test_redirect_to_awspage()
>>> resp
<Response [400]>
>>> resp.reason
'Bad Request'
>>> resp.history
[<Response [302]>, <Response [401]>]
>>> resp.history[0]
<Response [302]>
>>> resp.history[1]
<Response [401]>
>>> resp.history[1].reason
'Unauthorized'
>>> resp.history[1].request
<Request [GET]>
>>> resp.history[1].request.auth
<requests.auth.HTTPDigestAuth object at 0x...>
>>> resp.history[0].request     
>>> resp.request.auth
<requests.auth.HTTPDigestAuth object at 0x..>
>>> resp.request.auth is resp.history[1].request.auth
True
>>> resp.content
'<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>InvalidArgument</Code><Message>Either the Signature query string parameter or the Authorization header should be specified, not both</Message><ArgumentValue>Digest username="user", realm="localhost", nonce="1353967118:71f8dba5f51e2a56cee9bbfd7d7b4f2a", uri="/files.basex.org/xml/xmark.xml?AWSAccessKeyId=AKIAJBXTB3W6TPH7OEQA&amp;Expires=1356908390&amp;Signature=t6HE4oEdTVx4Mi9Q7TrDbe6%2FQxs%3D", response="68b0a0f7ce7428b25c850bb11a45f7c4", qop=auth, nc=00000003, cnonce="8612fd9ebd12e180"</ArgumentValue><ArgumentName>Authorization</ArgumentName><RequestId>20D399DFBB080D7B</RequestId><HostId>MyWasdwbty/IODTWvp6UxF7B8uDWd/H4GDlHEXtJyuOVqthmyLdvvhpI58gAz5Ca</HostId></Error>'

What's interesting is that both the Signature query string and Authorization header can not both be specified. There-in lies your problem.

You can probably fix this with a hook that will remove the auth header on redirect. @kennethreitz @Lukasa @piotr-dobrogost would any of you be able to help him write the hook?

Owner

sigmavirus24 commented Nov 26, 2012

Also, this gist has my updated files: https://gist.github.com/a39953b468000fb5515a

Owner

Lukasa commented Nov 26, 2012

Yeah, pulling off the Auth header should be fine. I'll see if I can whip up a quick hook that works.

Owner

sigmavirus24 commented Nov 26, 2012

❤️ @Lukasa

Owner

Lukasa commented Nov 26, 2012

Oh, this has been interesting. Turns out that hooks are not passed to a new request on redirect, as you can see here. In the meantime, I can hack around this. @kennethreitz, is this a bug?

Owner

Lukasa commented Nov 26, 2012

I have to assume it's a bug. If it isn't, there's effectively no way to intercept a redirect and put a hook on it beyond interrupting the normal redirect path entirely, like Digest Auth does with HTTP 401s. I monkeypatched my version of Requests to pass hooks onwards, and was able to resolve this problem.

I'll give Kenneth some time to get back on the bug front. If this is intentional behaviour (it might be), I'll come up with a more complicated hook to do the same job. =)

Owner

sigmavirus24 commented Nov 26, 2012

Yeah, I'm fairly certain this is a bug too. It should be fairly trivial to send on the existing hooks to the next request though.

-- EDIT --

It might be that hooks were added after the _build_response method was written and no one really noticed this before. In that case, I can easily see this being an accidental omission.

Owner

Lukasa commented Nov 26, 2012

Yup, it's a one-line change, just want to make sure there isn't a good reason behind it. =)

Contributor

piotr-dobrogost commented Nov 26, 2012

@Lukasa
Nice find. Generally there's asymmetry between the first request and all subsequent ones. Notice history is not passed, either.

Owner

sigmavirus24 commented Nov 26, 2012

What does your one-line change look like? Adding hook=self.hooks, and using the post_response hook aren't working unless I'm miswriting my callback function.

Owner

sigmavirus24 commented Nov 26, 2012

@piotr-dobrogost, history can't be passed to even the first request, but it isn't exactly important.

Owner

Lukasa commented Nov 26, 2012

The one line change is hook=self.hooks.

In my mental picture the correct type of hook to use here is a pre_send hook, because this is actually enforcing a specific AWS requirement. This means it should take effect only on requests that violate the requirement, namely those that have both an Authorization header and a Signature query string. If you add hook=self.hooks and then register the following hook as a pre_send hook, you should get a 404:

def strip_auth_hook(request):
    url = request.url

    if "signature=" in url.lower():
        # Remove the Authorization header.
        try:
            request.headers.pop('Authorization')
        except KeyError:
            pass

    return

Obviously, this is just a demo hook, a real hook should be more specific about the content of the URL, e.g. it should enforce that it is an AWS URL.

Owner

sigmavirus24 commented Nov 26, 2012

Ah, and this is why I asked you guys for help with the hook writing. I'm not as experienced with writing callbacks and am too distracted with measure theory to have thought this through as you did. Also, I'm sure you have it written correctly locally but you need to return request (more for @vlcinsky's benefit).

@sigmavirus24, @Lukasa , @piotr-dobrogost, you are awesome.
Now I see, it was worth to write the test code.

I found that the file, I was using for testing, got removed from my AWS bucket (I forgot about testing purpose of it).

The file is back, so it shall return some content as soon as this solution comes into effect.

I will have to study your hooks and this stuff, which I am not so familiar with yet, but probably not today, as we have here deep night 00:44 a.m CET.

Thanks for your effort.

Owner

sigmavirus24 commented Nov 26, 2012

You're very welcome.

Owner

Lukasa commented Nov 27, 2012

@sigmavirus24: Actually you don't always have to return the Request, as I didn't do it locally. =) The docs aren't super clear on this, but my reading of them is that if you don't return anything from the hook then the variable passed into the hook is used. Because Python is primarily pass-by-reference, when I pop the item from the dict it affects the version of the dict in the original Request. Admittedly it's not super explicit and so returning the Request is better form, but it does work.

@vlcinsky: Please note that there is probably a bug in Requests that will prevent that hook from functioning as written. When we resolve that, I will post some sample code that functions correctly. =)

-- edit note--
This description was modified to include more general observation about web browser behaviour, so not only Firefox, but also Chromium and IE
-- end of edit note

@ALL
API of request is nice in it's simplicity. Using hook is a solution, but is it really necessary?

Findings from my short investigation

How are browsers behaving - using Authorization header only if redirecting into the same domain

Using Wireshark I looked at how Firefox, Chromium and partially also IE are behaving in regard to use of header Authorization in redirected requests after previous 302 (redirect) using obtained address from Location.

If redirect points into the same domain, it does reuse existing Authorization header in the request.

If redirect points out of current domain, header Authorization is never reused.

Note, that if the same IP has different domain name, then it is considered different domain. I used one domain "localhost" and another "zen" (name of my computer), both having IP 127.0.0.1.

Note about IE test: On MS Windows I had problem to track network traffic with Wireshark on localhost, which is typically invisible to Wireshark. I could only prove, that leaving to external domain from localhost was not including Authorization header. However, it is very likely, it behaves exactly the same way as was observed with Firefox and Chromium.

RFC2617: Authorization header allowed even without preceding 401

http://tools.ietf.org/html/rfc2617#page-4 tells:

A user agent that wishes to authenticate itself with an origin
server--usually, but not necessarily, after receiving a 401
(Unauthorized)--MAY do so by including an Authorization header field
with the request. A client that wishes to authenticate itself with a
proxy--usually, but not necessarily, after receiving a 407 (Proxy
Authentication Required)--MAY do so by including a Proxy-
Authorization header field with the request. Both the Authorization
field value and the Proxy-Authorization field value consist of
credentials containing the authentication information of the client
for the realm of the resource being requested. The user agent MUST
choose to use one of the challenges with the strongest auth-scheme it
understands and request credentials from the user based upon that
challenge.

So it seems, there is no strict rule, defining, if agent (browser) shall first get challenge via 401, and then be allowed to provide Authorization header.

Developers often expect, redirection will be able to reuse Authorization

As redirect goes often within the same domain, it is natural, one expects, it will be able to authorize even on the redirected url.

I found some links about Android and iOS related bugs and implementations, which finally allowed to redirect and authorize using the same credentials. I lost the links to these pages, but it seem to follow the pattern observed in Firefox, Chromium and IE on desktop.

Proposed solutions

NoAutoAuthorization: Do not automatically use Authorization on first redirect, only if challenged

Do not use this

If there is request to be redirected, go there, but do not automatically include Authorization header.

If there comes response, challenging for authentication, do so, but not sooner.

Drawback: this generates one request more in cases, where Authorization is really requested for target url.

HomeAutoAuthorization: Do use Authorization for redirects within the same domain

Use this one, as it follows behaviour seen on desktop browsers

If there is redirection, use Authorization header with first request to new url if Location aims into the same domain as original request, otherwise do not use it.

This is sort of optimized NoAutoAuthorization solution and this one I would prefer.

FailOverAuthorization: Always use Authorization, retry without it if failed

Do not use this, desktop browser never attempted to reuse known credentials (could this be even sort of security risk?)

Always use Authorization header with first request to new location, but if it fails, try once more without header Authorization.

This seems a bit messy approach.

HooksWhereNeeded: Add/Remove Authorization case by case

Do not use this, try to follow behaviour of desktop browsers.

In fact, this is solution, which extends any of previous ones.

The best is to avoid this, as it makes use more complex. But it is fixture which comes handy when needed.

Final thoughts

  • Existing behaviour is not simply wrong, as RFC2617 does allow Authorization header to be placed even without challenge
  • However, behaviour observed with desktop browsers Firefox, Chromium and IE shows use of HomeAutoAuthorization approach. Following the same pattern with Requests would be very intuitive and would also make use quite simple even in the case, reported originally with this bug report.

Question: shall anybody file a bug report for currently discovered problem with hooks being sometime ignored?

Contributor

piotr-dobrogost commented Nov 27, 2012

Very nice summary.

Question: shall anybody file a bug report for currently discovered problem with hooks being sometime ignored?

Yes. It could be the most interested person here - you :)

Owner

sigmavirus24 commented Nov 27, 2012

I love it when people quote RFC's so thank you for doing your research and giving us the necessary information. The problem is your use case seems to be in the 10% and knowing what little I do about Kenneth's design philosophy means that your proposed solutions are likely to not be implemented. As you said, requests is technically correct in how it behaves since the specification is permissive in its language.

Furthermore, I think treating this issue as the hooks issue is the better option to avoid creating more issues than needed. We're just waiting on @kennethreitz to affirm that the exclusion of hooks from the redirected requests is not intentional.

@piotr-dobrogost :-)
My question regarding filing a bug has two sides:

Shall it be filed at all?

In case, the problem is almost fixed and about to be committed and adopted into code, then fileing a new issue is not worth to do.

Who is the best person to write such a bug report

I do my best to describe things, I am at least familiar with or where I can provide good quality description.

But with hook being ignored - I did not even try to use one so far and I would be talking about completely new topic to me.

I agree with @sigmavirus24 we could keep this issue as discussed bug report hooks are not passed to a new request on redirect

I will edit my original description to point to this aspect, if anyone is interested in detailing, do it.

@ALL this is my first experience with Github issue cooperation incorporating more then two persons - I love it. So fast, so effective. It is like chatting while being quite productive.

Owner

Lukasa commented Nov 27, 2012

@vlcinsky: It's probably not worth filing a bug report, I'll just open a pull request with the fix when I get home from work this evening. =)

Owner

sigmavirus24 commented Nov 27, 2012

Also, @Lukasa TIL

Owner

Lukasa commented Nov 27, 2012

Agreed, I think the hooks is the way to go here. =)

Lukasa closed this Nov 27, 2012

Contributor

piotr-dobrogost commented Nov 27, 2012

Maybe I'm missing something here but do we know how major browsers behave in various scenarios described by @vlcinsky in this thread? I think we don't and as long as we don't find out and make Requests follow their behavior the issue of sending authorization data at right moment still exists.

@sigmavirus24

history can't be passed to even the first request, but it isn't exactly important.

It should be passed to the first request (being empty at this moment of course) and all subsequent requests. Every request should have knowledge about history as this could be needed to make decisions in hooks for example.
I did not imply it's important in this particular issue but it's logical to note that here, as this is yet another missing piece of data not propagated to subsequent requests similarly to hooks.
If you agree I'd like to open issue for this.

@piotr-dobrogost I fully agree, that perfect solution should follow, how browsers are doing redirection.

To move this on, I have spent couple of hours, sniffing Firefox, Chromium and IE and have found, they do follow Home AutoAuthorization pattern as described in my longest text above.

Following the same pattern by Requests would follow current "beauty of simple and intuitive API by design" approach, which I like a lot.

It is also possible, current behaviour is sort of security risk - the Authorization header is "reporting" sensitive information to another server. However, I cannot claim, it could be really misused, I am not expert on this topic and there could be some sort of protection against record - reply attack.

Could someone reopen this issue?

For history not being passed, I would ask someone like @piotr-dobrogost to file separate issue, as it is not really affecting resolution of this one, but deserves some attention.

Owner

Lukasa commented Nov 28, 2012

I agree, the Auth handler is doing the wrong thing here. However, for the moment, I'm leaving this issue closed because of the refactor (see #970). The whole mechanism of how we do Auth is going to change (probably), so we'll have to rewrite the Auth handlers. That will be the right time to fix this issue. =)

@Lukasa I see, you prefer issues being closed :-).

But: you agree, the Auth handler is doing things wrong - and you prefer to keep the issue closed.

Isn't this sort of issue maintainance anti-pattern?

Is anyone supposed to review closed issues after 1.0.0 is completed? Are we hoping, someone will just remember it?

To me, milestone "later" being assigned to this issue seems natural. And as 1.0.0 is on the table, it is quite likely, it would be closed even before 1.0.0 is completed.

You and others put some effort on analysing and resolving current problem (so there is some temporary solution in devel branch now), I put some effort on providing bug report and proposal for handling authorization. I think, having this issue open and living under "later" milestone would ensure, this effort will not be forgotten and is more likely to contribute to higher usability and quality of Requests in future.

Other option would be to extract remaining issues from this bug (handling Authorization header with redirect and History not being kept over border of redirect), assign them to "later" milestone and close this bug, as there is temporary solution available.

Owner

Lukasa commented Nov 28, 2012

Oh, sorry, I just reread my message and I didn't adequately explain what I wanted to do.

This issue, the one originally raised, is fixed. The Auth handler is neither broken nor bugged, but in an unusual subset of cases it can leak information. That is a new issue.

We need a new issue that clearly indicates what our actual problem is.

(NB: For the record, I'm way more likely to remember closing the issue than I am to ever look at the milestones. I don't think anyone on this project does that. =D )

Owner

sigmavirus24 commented Nov 28, 2012

@vlcinsky see #975 if you haven't already

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment