New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for HTTP basic auth. #6495

Merged
merged 13 commits into from Sep 17, 2018

Conversation

Projects
None yet
4 participants
@benjyw
Copy link
Contributor

benjyw commented Sep 12, 2018

Currently pants has no concept of authentication against a remote service (e.g., a remote artifact cache or a stats upload endpoint). This change introduces HTTP basic auth functionality.

This includes:

  1. A subsystem that configures basic auth providers, mapping a name to a url that accepts basic auth creds.
  2. Code that sends basic auth creds to the url, and captures the persistent cookies it returns. Said creds may be provided explicitly, or via netrc.
  3. A login task that triggers 2. It currently accepts no creds, so it can only be used with netrc.
  4. Application of those cookies to stats upload requests (but not, yet, cache requests).

@benjyw benjyw requested review from stuhood and jsirois Sep 12, 2018

benjyw added some commits Sep 12, 2018

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Sep 12, 2018

See also fast-follow: #6496

@jsirois
Copy link
Member

jsirois left a comment

LGTM mod the concurrent cookiejar mutation issue and - I think - a missed invalidation issue on cookier jar move.

Show resolved Hide resolved src/python/pants/auth/__init__.py Outdated
Show resolved Hide resolved src/python/pants/auth/basic_auth.py Outdated
Show resolved Hide resolved src/python/pants/auth/basic_auth.py Outdated
Show resolved Hide resolved src/python/pants/auth/cookies.py Outdated
Show resolved Hide resolved src/python/pants/auth/cookies.py Outdated
Show resolved Hide resolved tests/python/pants_test/auth/test_basic_auth.py
import os
import threading

from six.moves import BaseHTTPServer

This comment has been minimized.

@jsirois

jsirois Sep 12, 2018

Member

from future.moves.http.server import BaseHTTPServer

Show resolved Hide resolved tests/python/pants_test/auth/test_basic_auth.py Outdated
cookie_jar = self.get_cookie_jar()
for cookie in cookies:
cookie_jar.set_cookie(cookie)
cookie_jar.save()

This comment has been minimized.

@jsirois

jsirois Sep 12, 2018

Member

I think the saves will need OwnerPrintingInterProcessFileLock since the the default (and sensible) path is global. The LWPCookieJar docs don't say anything about multi-process safety and a quick look at the 2.7 source says its not. Towards this end, maybe it makes more sense to default the path to ~/.cache/pants/auth/... or ~/.config/pants/auth/... to give a more scoped location for the lock file. Not fully convinced of this latter bit.

This comment has been minimized.

@benjyw

benjyw Sep 13, 2018

Contributor

Hmm, you are correct.

It's slightly weird, now that I think of it, that logins will be shared across all repos. However putting the cookies under .pants.d seems wrong, because we probably don't want a clean to log you out. Or do we? Maybe we do. I can't make up my mind on this. Thoughts?

If we put the cookies under .pants.d then the regular pants lock will suffice.

This comment has been minimized.

@jsirois

jsirois Sep 13, 2018

Member

No - I think sharing across repos makes loads of sense. It might not for the services you have in mind ;), but - in general, I should only have to log into a given provider url once. If two repos both use that same provider url, don't make me log in again.

@classmethod
def register_options(cls, register):
super(Cookies, cls).register_options(register)
register('--path', advanced=True, default='~/.pants.cookies',

This comment has been minimized.

@jsirois

jsirois Sep 12, 2018

Member

A change in this option should invalidate the Login task.

This comment has been minimized.

@benjyw

benjyw Sep 13, 2018

Contributor

I guess you're right conceptually, although that task doesn't currently consult the invalidation cache at all. It could if we put the cookies under .pants.d, as discussed earlier. I'm now leaning more towards that, I guess. Will think on it for a few minutes.

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Sep 12, 2018

A larger PR description would be helpful in reviewing this... there are a lot of moving parts, and I've not seen basic auth used with session cookies before. In particular, it's not obvious how someone would use this to auth with a ~/.netrc, for example (which we have support for elsewhere in pants).

@stuhood
Copy link
Member

stuhood left a comment

Would it be possible to include a doc change with this one? That would likely be more useful than the PR description edit.

from pants.task.task import Task


class Login(Task):

This comment has been minimized.

@stuhood

stuhood Sep 12, 2018

Member

Having a task for this seems like overkill. Could this just be a memoized method instead?

This comment has been minimized.

@benjyw

benjyw Sep 13, 2018

Contributor

How would a user trigger this method?

This comment has been minimized.

@stuhood

stuhood Sep 13, 2018

Member

I realized why this is a synchronous task in my second comment, but forgot to remove this. This task is intended to be used directly... but clearly the .netrc usecase doesn't justify a manual login step, so I'm guessing you guys have some more advanced usage in mind.

requested_providers = list(filter(None, [self.get_options().to] + self.get_passthru_args()))
if len(requested_providers) != 1:
raise TaskError('Must specify exactly one provider.')
# TODO: An interactive mode where we prompt for creds. Currently we assume they are in netrc.

This comment has been minimized.

@stuhood

stuhood Sep 12, 2018

Member

Gotcha.

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Sep 13, 2018

@stuhood will add docs. When you say pants supports .netrc elsewhere, what's an example? I assume this support just falls out of the fact that requests consults .netrc?

@jsirois

This comment has been minimized.

Copy link
Member

jsirois commented Sep 13, 2018

When you say pants supports .netrc elsewhere, what's an example?

There is only one and its not a good example. We have to physically plumb netrc to ivy because it isn't unix-savvy like requests. We use this to log in to artifactory and sonatype.

@stuhood
Copy link
Member

stuhood left a comment

When you say pants supports .netrc elsewhere, what's an example?

There is only one and its not a good example. We have to physically plumb netrc to ivy because it isn't unix-savvy like requests. We use this to log in to artifactory and sonatype.

Right. That's what I was thinking of. But come to think of it, if requests is already going to consult the .netrc, then enabling basic auth for buildcache probably doesn't need any additional infrastructure.

Fine with adding this, but giving it at least one "real" usecase in the pants codebase would be good.

from pants.task.task import Task


class Login(Task):

This comment has been minimized.

@stuhood

stuhood Sep 13, 2018

Member

I realized why this is a synchronous task in my second comment, but forgot to remove this. This task is intended to be used directly... but clearly the .netrc usecase doesn't justify a manual login step, so I'm guessing you guys have some more advanced usage in mind.

@jsirois

This comment has been minimized.

Copy link
Member

jsirois commented Sep 13, 2018

so I'm guessing you guys have some more advanced usage in mind.

Not super-advanced, just any login process that needs to be interactive. IE:

  • includes "do you grant XYZ privielge?" prompts
  • includes "do you want to let 3rdparty OAuth provider X let us know your email?

Etc. The standard stuff you see when you delegate authority these days with Google, Github, etc.

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Sep 13, 2018

Gotcha. Works for me although a concrete usecase in the codebase would be awesome.

@illicitonion : What are the chances that gRPC auth to the RBE environment would be able to use this mechanism?

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Sep 13, 2018

Even if not interactive (e.g., relying on netrc), you still need this login, so you can exchange creds for a server session cookie. Otherwise you have to auth on every request, which is not typically what you want.

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Sep 13, 2018

I mean, the creds will be sent on every request to the host in that case, but the server doesn't want to have to check them every time. But true, in the non-interactive case the login can be implicit. So the explicit login task is more for interactive logins, or for servers that don't want to deal with this.

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Sep 13, 2018

The more I think about this, the more convinced I am that auth should be per-repo. One reason is that you may want to be using different creds for the same service on different repos, and with a global cookie file we can't support that (granted, we can't support that with netrc anyway, but once we allow interactive creds entry, this becomes feasible).

The price of requiring you to log in once per repo doesn't seem too crazy. In fact, it seems quite intuitive. Like using different browsers. And this would allow us to put the cookie file under the protection of the regular pants lock.

@stuhood , any thoughts on this?

@jsirois

This comment has been minimized.

Copy link
Member

jsirois commented Sep 14, 2018

The more I think about this, the more convinced I am that auth should be per-repo....

I would think if a service wanted the same user to log in with different creds, they'd hit different endpoints (urls), ie: https://accelerator/pantsbuild/pants (we bought the fancy plan) vs https://accelerator/pantsbuild/pex (we only paid for the standard plan).

As to the analogy, I think its more like opening several tabs to the same site in the same browser and having to log in in each tab.

My one tool is pants, I clone my pants-using repo 3 times - I shouldn't have to log in in each clone (I actually have 3 pants clones I actively use).

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Sep 14, 2018

@stuhood , any thoughts on this?

Earlier today we were discussing what should be in .pants.d vs in ~/.cache, and the other consideration beyond "which scope is it accurate in" is: "what do I have to remove to clear it". In particular, bootstrapping the zinc bridge jar takes on the order of 20 seconds, and in order to avoid drastically increasing integration test times by rebuilding it, it needs to go out in ~/.cache.

In the context of logins: it feels like it would be super annoying to need to log back in after a ./pants clean-all. So even if you were to make this per-repo config, I think you might want to store it outside of .pants.d... possibly by giving each repo a human configurable key but then storing information outside in ~/.cache.

But I don't know the usecase well enough to push in one direction or the other.

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Sep 14, 2018

And yea, John's point makes sense. If this is a thing you can put in the URL, that would be preferable I expect.

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Sep 14, 2018

OK, can leave it global for now and revisit later. I will need to add locking around the file then.

But note that hitting different paths will require all services under one auth to share a path prefix. Not all servers may be set up that way.

@illicitonion

This comment has been minimized.

Copy link
Contributor

illicitonion commented Sep 14, 2018

Gotcha. Works for me although a concrete usecase in the codebase would be awesome.

@illicitonion : What are the chances that gRPC auth to the RBE environment would be able to use this mechanism?

From skimming the docs, nope :(

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Sep 16, 2018

Just realized that if you want different logins for different repos you can point them to different cookie files via config, so we have that base covered. Will add locking around the file.

benjyw added some commits Sep 16, 2018

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Sep 17, 2018

OK added locking and a couple of other fixes. PTAQL.

help='Path to file that stores persistent cookies.')
register('--path', advanced=True, fingerprint=True, default=None,
help='Path to file that stores persistent cookies. '
'Defaults to <pants bootstrap dir>/auth/cookies.')

This comment has been minimized.

@jsirois

jsirois Sep 17, 2018

Member

Any reason not to just populate default above? default=os.path.join(register.bootstrap.pants_bootstrapdir, 'auth', 'cookies') should do it.

This comment has been minimized.

@benjyw

benjyw Sep 17, 2018

Contributor

Only that I didn't remember how to access pants_bootstrapdir... Done.

@stuhood
Copy link
Member

stuhood left a comment

The docs are great. Thanks!

@benjyw benjyw merged commit 0ee8419 into pantsbuild:master Sep 17, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@benjyw benjyw deleted the benjyw:basic_auth2 branch Sep 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment