_try_credentials requires read permissions #202

Closed
telenieko opened this issue Aug 28, 2018 · 6 comments

@telenieko

Hi,

I am trying out pandas-gbq as part of a data loading process (we load lots of stuff into DataFrames for cleanup and processing, and the final result is loaded into BigQuery).

In that use case, the account needs only write access to BigQuery, but _try_credentials issues a SELECT query, which fails for a write-only account.

Moreover, this runs on GCP, so google.auth.default() succeeds. But because the SELECT fails, the library falls back to interactive authentication, which also fails.

Either _try_credentials should not require read permissions (unless on a read operation) or the method should be optional.

Related to #198

@max-sixty
Contributor

Thanks for the issue @telenieko

I think supporting write-only creds is probably further off than some of the more immediate auth changes; though we may get it for free after some of the clean-up.

If you need this sooner, you could try:

  • Uploading to a dataset or project where the account has read+write access, and then using a separate script to copy the tables to the dataset where you don't want to grant read access
  • As a last resort, you could try monkey patching _try_credentials to avoid running the SELECT 1 query
  • You could also try using google.cloud.bigquery, though you lose some of the wrappings that pandas_gbq provides

@telenieko
Author

Thanks for the quick reply.
I am currently monkey patching _try_credentials :)

@max-sixty
Contributor

If this works well for you, I'd be very open to a PR that offered an option to skip that step.

(though @tswast let us know if you think that's a poor middle ground prior to the auth overhaul)

@tswast
Collaborator

tswast commented Aug 28, 2018

I'm still thinking about how we want to handle try_credentials in the future. The main reason it's used right now is that GCE credentials might not work even when you can obtain them, because they have the wrong scopes.

I have a try_credentials argument over in https://github.com/pydata/pydata-google-auth where I'm doing some of the auth refactoring, but could be convinced to drop it. For example, I don't know how we'd safely check for write permissions. It'd probably be fine for the API to fail in this case, and expect the user to manually specify user credentials if they don't like the default credentials.

@tswast
Collaborator

tswast commented Aug 28, 2018

A short-term solution could be to only run _try_credentials for read_gbq and not to_gbq. I'd welcome a PR that does that.
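A minimal sketch of that short-term idea, assuming the check can be switched on per operation. All names here (`try_credentials`, `get_credentials`, `credentials_for_read`, `credentials_for_write`, `fake_run_query`) are hypothetical and do not reflect the actual pandas-gbq internals.

```python
def try_credentials(credentials, run_query):
    """Return the credentials if a trivial query succeeds, else None."""
    try:
        run_query(credentials, "SELECT 1")
    except PermissionError:
        return None
    return credentials


def get_credentials(credentials, run_query, validate):
    # Only the read path needs to prove that queries are allowed.
    if validate:
        return try_credentials(credentials, run_query)
    return credentials


def credentials_for_read(credentials, run_query):
    # What a read_gbq-style call would use: keep the check.
    return get_credentials(credentials, run_query, validate=True)


def credentials_for_write(credentials, run_query):
    # What a to_gbq-style call would use: skip the check.
    return get_credentials(credentials, run_query, validate=False)


def fake_run_query(credentials, sql):
    # Simulates a write-only account: every query is rejected.
    raise PermissionError("account is not allowed to run queries")


read_creds = credentials_for_read("write-only-creds", fake_run_query)
write_creds = credentials_for_write("write-only-creds", fake_run_query)
```

With a write-only account the read path still rejects the credentials, while the write path passes them through and lets the upload itself fail if they are insufficient.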

@max-sixty
Contributor

max-sixty commented Aug 28, 2018

> It'd probably be fine for the API to fail in this case, and expect the user to manually specify user credentials if they don't like the default credentials.

I imagine, with fairly low confidence, that users with such narrowly defined permissions are a) not that common and b) sophisticated enough to supply the correct creds.

Edit: ...and so 👍 to your suggestion

tswast added a commit to pydata/pydata-google-auth that referenced this issue Sep 7, 2018
Trim pydata-google-auth package and add tests

This is the initial version of the proposed pydata-google-auth package (to be used by pandas-gbq and ibis). It includes two methods:

* `pydata_google_auth.default()`
  * A function that authenticates the same way pandas-gbq currently does: it tries `google.auth.default()` and then falls back to user credentials.
* `pydata_google_auth.get_user_credentials()`
  * A public `get_user_credentials()` function, as proposed in googleapis/python-bigquery-pandas#161. Missing in this implementation is a more configurable way to adjust credentials caching. I currently use the `reauth` logic from pandas-gbq.

I drop `try_credentials()`, as it makes less sense when this module might be used for other APIs besides BigQuery. Plus there were problems with `try_credentials()` even for pandas-gbq (googleapis/python-bigquery-pandas#202, googleapis/python-bigquery-pandas#198).
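
The fallback flow described above, without a `try_credentials()` step, can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the callables passed in and the `NoDefaultCredentials` exception are hypothetical stand-ins for the real credential lookups.

```python
class NoDefaultCredentials(Exception):
    """Hypothetical stand-in for the error raised when no default credentials exist."""


def default(get_default_credentials, get_user_credentials):
    """Try default credentials first, then fall back to user credentials.

    No test query is run, so credentials with limited permissions
    (for example write-only access) are returned as-is.
    """
    try:
        return get_default_credentials()
    except NoDefaultCredentials:
        return get_user_credentials()


def no_defaults():
    # Simulates an environment without application default credentials.
    raise NoDefaultCredentials("no application default credentials found")


# Default credentials are available: they are used without validation.
on_gcp = default(lambda: "service-account-creds", lambda: "user-creds")

# No default credentials: the user flow is the fallback.
on_laptop = default(no_defaults, lambda: "user-creds")
```

Because nothing is validated up front, a permissions problem surfaces when the first real API call is made rather than during authentication.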