Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Digest auth middleware #305

Closed

Conversation

yeraydiazdiaz
Copy link
Contributor

This PR implements Digest authentication as a middleware.

Digest is mentioned as pending work in the README.

Tested against httpbin.org's digest auth endpoint and mimicking request's implementation.

Note this PR does not include handling the auth-int Quality of Protection (qop) option in the RFC mainly because requests does not do it either. I'm happy to add it in a separate PR if it is deemed necessary though.

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whole bunch of little comments, looks pretty good so far! :)

httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/utils.py Outdated Show resolved Hide resolved
Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great piece of work here — I didn't realize Digest Auth was this complex to put together.

Obviously then, we need to make sure the implementation is nailed down and as easy to understand as possible. I've left a couple of comments with a focus on readability, as I reviewed this following along the Digest access authentication page on Wikipedia. 😄

As a general comment on ._build_auth_header(), perhaps we can clearly separate the various parts of the function with some comments that give a hint on the general algorithm:

  1. Extract all useful variables.
  2. Compute the client nonce.
  3. Compute the digest response. (This is where we compute HA1, HA2, and the final call to digest().)
  4. Assemble parts of the authorization header. (Construction of format_args.)
  5. Build the final authorization header.

I haven't looked at the test code yet.

httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
@yeraydiazdiaz
Copy link
Contributor Author

Thanks for the reviews @sethmlarson and @florimondmanca, brace for incoming commits 😄

@yeraydiazdiaz
Copy link
Contributor Author

This is now ready for re-review after implementing most of the comments.

@florimondmanca I couldn't find a clean way of applying your suggestion of laying the code out as a hint for the general algorithm since values produced in the calculation of the response are also used in the final digest value. I felt like after it was already cleaner after the minor refactorings but if you have suggestions let me know.

Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body of the header building logic looks good to me, a couple more comments. I’ll take a look at tests soon. :)

httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
@yeraydiazdiaz
Copy link
Contributor Author

@encode/httpx-maintainers this is ready for re-review. Thanks for all the helpful comments and links. 🌟

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny comment from me then I'm happy with this implementation. 🎉

tests/client/test_auth.py Show resolved Hide resolved
@tomchristie
Copy link
Member

This is fabulous stuff!

I've got one suggestion here, to consider...

  • Currently auth= accepts either a tuple, or a request->request callable.
  • I think we should change that so that auth= accepts either a tuple, or a request->request callable, or an instance of BaseMiddleware.

Then we can:

  • Drop the DigestAuth model.
  • Just name the basic auth middleware BasicAuth.
  • Just name the digest auth middleware DigestAuth.
  • Import both BasicAuth and DigestAuth from the top level of the package.

That way we're not introducing an extra model, and we have a nice consistency where auth=httpx.BasicAuth(username=..., password=...) is also a valid expression.

Make sense?

httpx/client.py Outdated Show resolved Hide resolved
@tomchristie
Copy link
Member

tomchristie commented Sep 4, 2019

Plus, if a user is implementing a custom auth style that needs to be introduced as middleware, then they can still install it using auth=..., rather than installing it as middleware.

That'll alsp give us better ability to control any "only run auth if we're still on the initial request origin". (Because if a user installed auth as middleware instead, we don't exactly know that it's an authenticating thing.)

@yeraydiazdiaz
Copy link
Contributor Author

Sounds good to me! The only thing that comes to mind is the check for BaseMiddleware may be a bit broad, but given that we're only allowing users to pass them as auth I think it should be fine. 👍

@yeraydiazdiaz
Copy link
Contributor Author

@tomchristie this is ready for review with your suggested changes

Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few final comments from me as well.

About accepting any BaseMiddleware as auth — should we introduce a BaseAuthMiddleware? This base middleware wouldn’t have anything special but at least we’d have some semantic enforcement, and we wouldn’t be able to pass eg a RedirectMiddleware in there.

httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/client.py Outdated Show resolved Hide resolved
@florimondmanca
Copy link
Member

florimondmanca commented Sep 5, 2019

One question: have we validated that this works when making a real request to a server exposing digest auth, eg httpbin? :)

httpx/models.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
@yeraydiazdiaz
Copy link
Contributor Author

Glad you asked! I have been testing this against httpbin.org which has been quite useful. So much in fact that I thought we should consider having a "integration" test suite using its Docker image.

Not sure how we feel about that, but just now I noticed that my test script fails for some of the algorithms 😓

@florimondmanca
Copy link
Member

florimondmanca commented Sep 5, 2019

@yeraydiazdiaz Did you know about pytest-httpbin? It spins up a local httpbin instance (Flask server) for requesting during tests. I provide an example in #298 since the Requests test suite use it and I didn’t want to change the test setup too much.

We’re already live-testing most of the time via the uvicorn server, but I agree providing integration tests for cases when we use a mock dispatcher (here and in HTTP/2) would be nice. Not sure if httpbin supports HTTP/2 though.

Anyway, I think if we agree that we’ll end up using pytest-httpbin via #298, you could setup an integration test with it here?

@yeraydiazdiaz
Copy link
Contributor Author

Oh cool! I don't know how I missed that. I'll definitely add it to the PR.

Any thoughts on whether we should keep the mock dispatch tests?

@florimondmanca
Copy link
Member

Any thoughts on whether we should keep the mock dispatch tests?

I think we should keep those that can’t be addressed via httpbin (eg special error cases or situations).

@yeraydiazdiaz
Copy link
Contributor Author

So turns out it wasn't a bug in the implementation but rather a fairly subtle usability bug.

Since the DigestAuth holds state you cannot pass an instance to the client and reuse it for subsequent requests; you must pass a new instance to each request for it to work correctly. I've pushed a check to disallow passing DigestAuth instances on client initialization.

Side note about tests with pytest-httpbin, I think I'll leave it out of this PR for clarity. If this PR is merged before #298 I'm happy to add the tests there.

@sethmlarson
Copy link
Contributor

I'd think we want people passing in auth as a part of client init. This will be a problem when people implement custom middleware as well.

@florimondmanca
Copy link
Member

florimondmanca commented Sep 5, 2019

Since the DigestAuth holds state you cannot pass an instance to the client and reuse it for subsequent requests; you must pass a new instance to each request for it to work correctly.

Ah, good point. Since DigestAuth() only accepts static arguments (username and password), I think we should be able to rework the implementation to hide this detail from the user.

For example, the whole content of DigestAuth.call() could be moved into a separate (stateful) helper class which DigestAuth would instanciate and defer request handling to — if that makes sense?

(One learning point of this is that middleware shouldn’t hold per-request state themselves.)

Side note about tests with pytest-httpbin, I think I'll leave it out of this PR for clarity. If this PR is merged before #298 I'm happy to add the tests there.

Sounds good to me. :)

httpx/middleware/auth.py Outdated Show resolved Hide resolved
httpx/middleware/auth.py Outdated Show resolved Hide resolved
self.per_nonce_count = per_nonce_count
self.num_401_responses = 0

async def __call__(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still worried about this middleware not being able to handle multiple requests concurrently. Unless we mitigate somehow we're going to have to warn users implementing middleware that hold onto state about how to properly implement them and the pitfalls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still worried about this middleware not being able to handle multiple requests concurrently

Not sure what you mean? It is capable of handling concurrent requests, it's just that there's global state to keep track of. Or am I missing something?



class DigestAuth(BaseMiddleware):
per_nonce_count: typing.Dict[bytes, int] = defaultdict(lambda: 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note on this global variable. Requests has a much pragmatic approach to nonce counting, it simply keeps track of the last nonce and increases the nonce count if it is the same. However I chose to adhere a bit more closely to the RFC at the potential risk of this structure growing uncontrollably.

I'm not sure what the best solution for this would be. Maybe keep the last N nonce values? Reset on successful authentication?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Just make it a global. It's actually a benefit for it to be shared across clients, across threads, across tasks.
  2. Make it essentially be an LRU dict, storing N items. (1000, 10000, or whatever). Whenever it's more than N items, delete the first entry. (Easy now that python dicts are ordered)
  3. We don't need the _RequestDigestAuth class. Nonce state stuff can just be a dumb global, and everything can just be implemented on the middleware class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make it a global

It is a mutable class variable so it should qualify as global, do you mean moving it to the module?

We don't need the _RequestDigestAuth class. Nonce state stuff can just be a dumb global, and everything can just be implemented on the middleware class.

I had this intuition as well, but the num_401_requests can't really be global since it would prevent concurrent non-authenticated requests from going through the flow correctly. For instance if I fire 3 concurrent requests the only the first one would go through the flow and the rest would assume the legitimate 401 responses are due to credentials being invalid and return the 401 as the final response.

Following closely the RFC the ideal solution would be to prevent the additional requests to be sent until the first authentication succeeds and then reuse the auth header (which the RFC suggests we should).

However this logic can become even more complex as the server might decide to reject one of the requests and ask for reauthentication so we'd have to block concurrent requests again.

It seemed like quite a complex solution, though there are some benefits, like potentially saving a round trip for authentication but the RFC says the server may or may not take advantage of it. In the end I decided to take the pragmatic approach requests seems to take.

Of course if we decide otherwise I'm happy to implement the full fledged solution.

Make it essentially be an LRU dict, storing N items. (1000, 10000, or whatever). Whenever it's more than N items, delete the first entry. (Easy now that python dicts are ordered)

Yep, that's what I was thinking as well 👍

@yeraydiazdiaz
Copy link
Contributor Author

@sethmlarson @florimondmanca any opinions on #305 (comment)?

Personally I'm torn between having a more complete implementation and introducing much more complexity in this middleware.

For reference request's implementation takes a much more pragmatic approach that has been valuable for many years (the fact that auth-int has not been implemented in that time is probably a testament on how many servers actually require it).

@sethmlarson
Copy link
Contributor

@yeraydiazdiaz Hmm requests uses thread locals to hold onto state... I'm a lot more comfortable with how they're implementing it than with putting so much state into globals / class variables. Could we do something with contextvars potentially?

@florimondmanca
Copy link
Member

florimondmanca commented Sep 7, 2019

@yeraydiazdiaz In retrospect, I agree with @tomchristie that the request middleware is not strictly necessary. I think we should be able to convert all utility methods into regular functions, and end up with the typical __init__()/__call__() pair of methods, with the logic in __call__ delegating as much as possible to the helper functions.

I also think @sethmlarson's idea about context variables is interesting. I have the intuition we can use them to ditch num_401_responses, and only track whether the server has already rejected our credentials. It's hard to explain in words without implementing the whole thing, so I might draft a PR to illustrate?

Lastly, to help reduce the scope of this PR I'm going to submit one for refactoring the middleware into separate modules. This way we'll be able to solely focus on middleware/digest_auth.py, if that sounds alright?

@yeraydiazdiaz
Copy link
Contributor Author

Thanks all.

Hmm requests uses thread locals to hold onto state... I'm a lot more comfortable with how they're implementing it than with putting so much state into globals / class variables. Could we do something with contextvars potentially?

This is something I suggested in a TODO comment on my first attempt on introducing Digest auth, @tomchristie was not too keen on them and suggested using class variables. Personally I agree with him but I'm not particularly familiar with contextvars and their benefits, maybe someone can break them down for me?

I have the intuition we can use them to ditch num_401_responses, and only track whether the server has already rejected our credentials. It's hard to explain in words without implementing the whole thing, so I might draft a PR to illustrate?

Sure, actually I'd be happy to close this PR in favor of yours if that's preferable. Frankly this one has gotten unwieldy and there's just too much information for me to figure out what the correct approach would be. Maybe it's best if someone with a fresh pair of eyes approaches the problem.

I would suggest scoping out how much we want to stick to the RFC, there's quite a bit of SHOULDs and MAYs in there that requests does not implement on top of them not having to support concurrency.

Digest auth: fix merge conflicts with master
@florimondmanca
Copy link
Member

florimondmanca commented Sep 8, 2019

Personally I agree with him but I'm not particularly familiar with contextvars and their benefits, maybe someone can break them down for me?

Not a contextvars expert either, but the way I see them context variables are attached to a particular async context, i.e. a stack of coroutines that await one another. (Put differently, coroutines started concurrently via loop.create_task() or a trio nursery won't share the same async context.)

One typical use case I can think of is setting a context-local REQUEST_ID of some kind, e.g. for application monitoring. Gist

That said, when I wrote…

…and only track whether the server has already rejected our credentials

we don't have to use context variables to track that kind of information. A dirty trick would be to add _credentials_rejected=False to the middleware signature, and set the flag to True when we make the recursive call. In fact, seeing the difficulties faced in #326 I also think using contextvars when we don't have to is probably not the best way to go. :-)

@tomchristie
Copy link
Member

Lastly, to help reduce the scope of this PR I'm going to submit one for refactoring the middleware into separate modules. This way we'll be able to solely focus on middleware/digest_auth.py, if that sounds alright?

Fantastic, yes!

@florimondmanca
Copy link
Member

@yeraydiazdiaz

Sure, actually I'd be happy to close this PR in favor of yours if that's preferable.

I think we're nearly there, though! If this PR has too much old/now irrelevant content (which I'd definitely agree with), we can perfectly close it and re-open one to start from a clean slate. I can also open a PR to your fork with a refactoring proposal, if you'd like? :-)

@tomchristie
Copy link
Member

I'd suggest that the best tack onto this would be to start with a PR for Digest Auth that simply doesn't include any nonce checking at all.

We can then issue another PR on top of that that adds the nonce checking and nothing else. I think I have a clear handle onto that, but it will make it far easier to discuss and tackle if we're treating it in isolation. 👍

@yeraydiazdiaz
Copy link
Contributor Author

Thanks all, I'll close this one and pick off bits on a new one as Tom suggests 🌟 🌟 🌟

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants