Can't create GCSFileSystem with instance of google.oauth2.service_account.Credentials #151

Closed
jiajie-chen opened this issue Jun 10, 2019 · 5 comments


@jiajie-chen

jiajie-chen commented Jun 10, 2019

When trying to create an instance of GCSFileSystem by passing in a service account Credentials object, an error is thrown in the _connect_token method.

System Details:
Python version 3.6.7 64-bit, on Ubuntu 18.04.1 (WSL)
gcsfs==0.2.2
google-auth==1.6.3
google-auth-oauthlib==0.3.0

Expected Behavior:
For GCSFileSystem to connect successfully with the provided service account credentials.

Actual Behavior:
ValueError: Token format no understood

Steps to Recreate:
Run this code (fill in the relevant details):

import google.oauth2.service_account
import gcsfs

# Make google.oauth2.service_account.Credentials object
key_path = "/path/to/key_file.json"
scopes = ["https://www.googleapis.com/auth/devstorage.full_control"]
credentials = (
    google.oauth2.service_account.Credentials.from_service_account_file(
        key_path, scopes=scopes)
)
# Try to make gcsfs (will raise "ValueError: Token format no understood")
project = "PROJECT-NAME"
gcsfs.GCSFileSystem(project=project, token=credentials)
@leehart

leehart commented Jun 11, 2019

I'm hitting similar confusion with a service account, where the class of object seems ambiguous.
Maybe tripping on elif isinstance(token, Credentials): in core.py

I can't pass the credentials of a service account as a token (maybe I shouldn't be doing this), e.g.

gcs = gcsfs.GCSFileSystem(project='myproject-jupyterhub', token=gcs_service_account.session.credentials)

The type of the credentials object seems to vary:

gcs_service_account.session.credentials
<google.oauth2.service_account.Credentials at 0x7fa6d25aa6d8>
gcs_service_account.session.credentials
<google.auth.compute_engine.credentials.Credentials at 0x7fd2697608d0>

So I guess the check for a valid Credentials object is being confused by the namespace clash? e.g.

from google.oauth2.credentials import Credentials as oauth2Credentials
isinstance(gcs_service_account.session.credentials, oauth2Credentials)
from google.auth.compute_engine import Credentials as authCredentials
isinstance(gcs_service_account.session.credentials, authCredentials)

Or maybe this is expected behaviour and the wrong way to roll?

Ref: #26

@jiajie-chen
Author

jiajie-chen commented Jun 11, 2019

I think the issue is that the Google auth libraries implement different concrete types of Credentials, for App Engine, OAuth2, service accounts, etc.

Looking at the auth library's source code, it seems the correct class to check is google.auth.credentials.Credentials, which is the ABC that all the other classes inherit from.
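To make the class relationship concrete, here is a minimal sketch that compares a few of the concrete Credentials classes against both candidate base classes. It assumes google-auth is installed; the variable names (`base`, `user`, `concrete`, `results`) are just for illustration and are not part of gcsfs.

```python
import google.auth.compute_engine
import google.auth.credentials
import google.oauth2.credentials
import google.oauth2.service_account

# The abstract base class discussed above.
base = google.auth.credentials.Credentials
# The user-account (OAuth2) class that the isinstance check currently uses.
user = google.oauth2.credentials.Credentials

# A few concrete credential types mentioned in this thread.
concrete = [
    google.oauth2.service_account.Credentials,
    google.auth.compute_engine.Credentials,
    google.oauth2.credentials.Credentials,
]

# For each class: (is a subclass of the ABC, is a subclass of the OAuth2 user class)
results = {cls: (issubclass(cls, base), issubclass(cls, user)) for cls in concrete}

for cls, (is_base, is_user) in results.items():
    print(f"{cls.__module__}.{cls.__name__}: base={is_base}, oauth2_user={is_user}")
```

If the service account and Compute Engine classes only pass the first check, that would explain why an isinstance test against the OAuth2 user class rejects them.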

As far as use-cases go, I'm currently using gcsfs with Dask to run automated reporting on GCE. We're using Airflow to automate the reporting tasks, so we're getting a service account Credential from Airflow's hook system.

Since it's automated, it doesn't make sense to auth with a user token. In particular, GCE and GKE make use of service accounts to manage authentication from VM instances, so it makes sense to use the built-in service account for this.

@martindurant
Member

Google have had a history of moving around these definitions, so happy to change the instance check to whatever seems to be the most appropriate right now. Would someone like to put this in a PR?

@leehart

leehart commented Jun 11, 2019

I haven't tried changing just the instance check yet, but I can at least say that changing

from google.oauth2.credentials import Credentials

To

from google.auth.credentials import Credentials

in gcsfs/core.py
allows me to pass a service account's session.credentials as a token. That is, it satisfies isinstance(token, Credentials) in core.py, with its type reported as google.oauth2.service_account.Credentials.

It looks like Credentials is used in one other place in core.py, in _dict_to_credentials:

            token = Credentials(
                None, refresh_token=token['refresh_token'],
                client_secret=token['client_secret'],
                client_id=token['client_id'],
                token_uri='https://www.googleapis.com/oauth2/v4/token',
                scopes=[self.scope]
            )

So I gather that would need testing, if the PR went the route of changing the import.
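One way this could be handled, sketched under assumptions: import both classes under separate names, use the abstract base only for the isinstance check, and keep the concrete OAuth2 class for building credentials from a dict. As far as I can tell, google.auth.credentials.Credentials is abstract, so it could not be constructed directly in _dict_to_credentials. The function names `dict_to_credentials` and `resolve_token`, the aliases, and the error message below are hypothetical stand-ins, not the actual gcsfs code.

```python
from google.auth.credentials import Credentials as BaseCredentials
from google.oauth2.credentials import Credentials as UserCredentials


def dict_to_credentials(token, scope):
    """Build user credentials from a token dict (hypothetical standalone helper)."""
    # The concrete OAuth2 class is still needed here, because the base class
    # is only meant for type checks.
    return UserCredentials(
        None,
        refresh_token=token["refresh_token"],
        client_secret=token["client_secret"],
        client_id=token["client_id"],
        token_uri="https://www.googleapis.com/oauth2/v4/token",
        scopes=[scope],
    )


def resolve_token(token, scope):
    """Return a credentials object for either a Credentials instance or a dict."""
    # Checking against the base class accepts service account, Compute Engine,
    # and user credentials alike.
    if isinstance(token, BaseCredentials):
        return token
    if isinstance(token, dict):
        return dict_to_credentials(token, scope)
    raise ValueError("Unsupported token type: %r" % type(token))
```

With this split, the isinstance check and the constructor no longer depend on the same import, so changing one should not affect the other.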

I run into downstream errors with Failed to Serialize, but I expect that's a different issue.

I have a similar use-case to @jiajie-chen, using gcsfs with Dask, where I ideally want to use a service account rather than authenticate via a user account or a browser token (or cached .gcs_tokens).

@jiajie-chen
Author

jiajie-chen commented Jun 11, 2019

@martindurant I can put together a quick PR for this, but it might need some further testing to ensure all classes of Credentials work correctly.

@leehart the Failed to Serialize error might have something to do with deepcopying Credential objects? I think something in them isn't serializable/deepcopyable, but it needs further investigation.
EDIT: Never mind, I'm getting a similar error, and it doesn't seem to be related to gcsfs or credentials:
TypeError: ('Could not serialize object of type tuple.', "<snip>")

martindurant added a commit that referenced this issue Jun 12, 2019
Fix for Issue #151 - Google Credentials isinstance check