Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auth refactor #58

Merged
merged 7 commits into from
Jan 5, 2018
Merged

auth refactor #58

merged 7 commits into from
Jan 5, 2018

Conversation

martindurant
Copy link
Member

Fixes #54
Fixes #56

Trying to turn this around quickly, reviews welcome. Some of the auth modes will not be testable in typical CI or locally.

@martindurant
Copy link
Member Author

So far, I have confirmed manually that using token=None, you get the default token or anon if there are no credentials available, and you can select the various connection methods either by setting token= as required or by directly calling the various methods. Is it useful to have token=False be a proxy to not calling connect() at all?

The attribute method is stored so that in testing (or introspection) we can tell which auth method finally succeeded.

Note that for now, token='cloud' will fail silently (since it would fall back to anon when token=None); perhaps we want to warn and/or raise for some combinations.

@martindurant
Copy link
Member Author

(ps: tests pass locally in direct mode, since in the common case, the connection is not affected at all by these changes)

gcsfs/core.py Outdated
following order: gcloud CLI default, gcsfs cached token, google compute
metadata service, anonymous.
- ``token='gtoken'``, your default gcloud credentials will be used, which
are typically established by doing ``gcloud login`` in a terminal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is gtoken a standard term?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but I wanted something different from the previously provided "cloud" term. "gcloud" might otherwise have been more reasonable.

gcsfs/core.py Outdated
- ``token='gtoken'``, your default gcloud credentials will be used, which
are typically established by doing ``gcloud login`` in a terminal.
- ``token=='cache'``, credentials from previoudly successful gcsfs
authentication will be used (use this after "browser" auth succeeded)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spellcheck previoudly

This may be at
``~/.config/gcloud/application_default_credentials.json``,
`` ~/.config/gcloud/credentials``, or
``~\AppData\Roaming\gcloud\credentials``, etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might we also provide the token explicitly as a dictionary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can provide a dictionary - it does say that, but could provide an example.

gcsfs/core.py Outdated
method = 'token'
if method is None:
for meth in ['gtoken', 'cache', 'cloud', 'anon']:
self.connect(method=meth)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should return success or failure? Perhaps it should raise and we should try-except around it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, if self.token remains unset (None), auth failed for that method. I had tried return True/False, but it felt pretty ugly. The exception is 'anon', so that could instead give self.token=False or other special value.

@mrocklin
Copy link
Contributor

mrocklin commented Jan 4, 2018

cc @asford

@mrocklin
Copy link
Contributor

mrocklin commented Jan 4, 2018

I'm trying this out locally and running into a couple of small problems:

  1. Tests fail
gcsfs/tests/test_core.py::test_simple FAILED

==================================================== FAILURES =====================================================
___________________________________________________ test_simple ___________________________________________________

token_restore = None

    @my_vcr.use_cassette(match=['all'])
    def test_simple(token_restore):
        assert not GCSFileSystem.tokens
>       gcs = GCSFileSystem(TEST_PROJECT, token=GOOGLE_TOKEN)

gcsfs/tests/test_core.py:18: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
gcsfs/core.py:188: in __init__
    self.connect(method=token)
gcsfs/core.py:316: in connect
    self.__getattribute__('_connect_' + method)()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <gcsfs.core.GCSFileSystem object at 0x7fc407596978>

    def _connect_token(self):
>       if 'type' in self.token or isinstance(self.token, str):
E       TypeError: argument of type 'NoneType' is not iterable

2.  There are merge conflicts with the fuse branch, though I can guess at how to merge short-term

@asford
Copy link
Collaborator

asford commented Jan 4, 2018

I'm testing a bit now on a GCE instance I use as a dev box. On this machine I've the compute sdk installed, but I'm using the service account for credentials:

fordas@salish:~/distributed_dev/gcsfs$ gcloud auth list
                  Credentialed Accounts
ACTIVE  ACCOUNT
*       353972334481-compute@developer.gserviceaccount.com

In this context auth2client.client.GoogleCredentials.get_application_default returns an token component that manages access to the service credentials and does not include a refresh_token, client_secret. It instead provides access tokens via the get_access_token interface, which handles token expiration and refresh.

This currently breaks the current master implementation, which expects a refresh token to be available. This will also break this implementation.

My recommendation, though it's a bit of a refactor, would be to store the credential object returned from get_application_default and then access the access token via credential.get_access_token().access_token. I can follow up later today to determine if this will also work for situations in which the user has obtained credentials via gcloud auth login.

@asford
Copy link
Collaborator

asford commented Jan 4, 2018

I've tested again after setting the application default credentials to my user account via gcloud auth application-default login. In this context auth2client.client.GoogleCredentials.get_application_default returns a GoogleCredentials object that can also handle access token generation and refresh.

Given that behavior, my current recommendation would be to hold the credential object returned by the get_application_default and use the interface credential.get_access_token().access_token to manage access. gcsfs would then not be responsible for managing auth token management and refresh.

In a broader sense, I would suggest emphasizing use of the default app credentials instead of obtaining and directly caching a credential object for gcsfs. In "off cloud" contexts, the user can then manage authorization via the gcloud auth application-default command, which provides login, revoke. In "on cloud" this then provides easy access to the default service account, with the option for a user to override this setting by gcloud auth login.

@asford
Copy link
Collaborator

asford commented Jan 4, 2018

@martindurant How quickly are you trying to turn this around? If you can wait until tomorrow I can open a PR-onto-this-PR that implements the 'application-default' behavior I've described above. As it stands now the strategy of unpacking the credential object returned by GoogleCredentials will fail in contexts where a service account is available.

@mrocklin
Copy link
Contributor

mrocklin commented Jan 4, 2018

(for context @asford I've asked @martindurant to accelerate some work here for an upcoming conference talk this coming Monday)

(not rerecorded/debugged vcr yet, tests only pass in direct mode)
@martindurant
Copy link
Member Author

@asford , @mrocklin , would it be a good idea to merge this, to allow for the service account PR to follow and fix merge conflicts in the fuse branch (I can handle that).
Again, vcr tests can wait, but should be re-recorded and working before release.

@mrocklin
Copy link
Contributor

mrocklin commented Jan 4, 2018 via email

@asford
Copy link
Collaborator

asford commented Jan 4, 2018

No problem with that plan on my end, we've all been there. Let's plan you merging what's needed for your talk on Monday. I can setup a follow-up PR that handles application default credentials afterward.

In light of that, can we agree that the internal API implemented here may need to change to reflect the final authorization story> I think we'll be able to continue supporting all the named token modes (cloud, anon, browser, etc...) but I'd like to have a chance to think through the client library integration and make sure it's being used properly.

@martindurant As a side note, it may be better if gtoken was named application-default, as that's the name of the related gcloud auth subcommand.

@martindurant
Copy link
Member Author

martindurant commented Jan 5, 2018

@asford , I specifically decided against 'application_default', because it sounds like we think of it as the default, which it isn't. I've decided that 'google_default' is sufficiently unambiguous.

Note as another point for follow up, _call() should maybe check for the continued validity of the token as given by its expiry and call connect again self.connect(method=self.method) if necessary.

@martindurant martindurant changed the title WIP: auth refactor auth refactor Jan 5, 2018
@martindurant martindurant merged commit 899a451 into fsspec:master Jan 5, 2018
@martindurant martindurant mentioned this pull request Jan 8, 2018
hanseaston pushed a commit to hanseaston/gcsfs that referenced this pull request Aug 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants