Access token expires when accessing Google Cloud Storage (GS) objects #803

Open
slagelwa opened this issue Dec 5, 2018 · 3 comments
@slagelwa

slagelwa commented Dec 5, 2018

I’m currently using the GCS_OAUTH_TOKEN environment variable to provide an OAuth access token to samtools so that it can access objects stored in GCS (see #390). Obtaining an access token is easy on a Google Compute Engine VM with the command “export GCS_OAUTH_TOKEN=$(gcloud auth application-default print-access-token)”. However, the application-default access token expires 3600 seconds after it is issued, so any long-running program or script that invokes samtools after that point is, understandably, denied access. As a workaround, it’s possible to keep track of the expiration time and request a new token before each samtools invocation, but this is a serious inconvenience and not always possible.

I’m left wondering why htslib doesn’t just request the access token itself when no other means of authentication is provided. As I understand it, the metadata URL for the token is http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token, and the request must include the header "Metadata-Flavor: Google".

Example:

joe@test:/tmp$ export GCS_OAUTH_TOKEN=$(gcloud auth application-default print-access-token); echo $GCS_OAUTH_TOKEN
ya29.c.XXXXXXXXXXXXXXXXXXXXX
joe@test:/tmp$ curl -s -S https://www.googleapis.com/oauth2/v1/tokeninfo?access_token="$GCS_OAUTH_TOKEN"  | jq '.expires_in'
2442
joe@test:/tmp$ curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
{"access_token":"ya29.c.XXXXXXXXXXXXXXXXXXXXX","expires_in":2437,"token_type":"Bearer"}
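Building on the metadata request above, the track-and-refresh workaround can use the reply’s expires_in field instead of guessing how old the token is. This is a minimal bash sketch, assuming curl, sed and date are available on a Compute Engine VM and that the reply is single-line JSON like the one shown. fetch_metadata_token, json_field, ensure_gcs_token and the 300-second REFRESH_MARGIN are hypothetical names and values, not part of htslib or gcloud.

```shell
#!/usr/bin/env bash
# Sketch only: keep GCS_OAUTH_TOKEN fresh using the GCE metadata server.
# fetch_metadata_token, json_field, ensure_gcs_token and REFRESH_MARGIN are
# hypothetical names for this example, not part of htslib or gcloud.

METADATA_TOKEN_URL=http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
REFRESH_MARGIN=300       # re-fetch when fewer than this many seconds remain
GCS_TOKEN_EXPIRES_AT=0   # epoch seconds at which the cached token expires

# Same request as the curl example; the Metadata-Flavor header is required.
fetch_metadata_token() {
    curl -s -S -H "Metadata-Flavor: Google" "$METADATA_TOKEN_URL"
}

# Print the value of key $1 (a quoted string or an unsigned integer) from the
# JSON on stdin. Assumes the compact reply format shown above; jq -r would be
# sturdier if it is installed.
json_field() {
    sed -n -e "s/.*\"$1\" *: *\"\([^\"]*\)\".*/\1/p" \
           -e "s/.*\"$1\" *: *\([0-9][0-9]*\).*/\1/p"
}

# Export a valid GCS_OAUTH_TOKEN, re-fetching only when the cached one is
# missing or within REFRESH_MARGIN seconds of its expiry time.
ensure_gcs_token() {
    local now reply token expires_in
    now=$(date +%s)
    if [ -n "${GCS_OAUTH_TOKEN:-}" ] &&
       [ "$now" -lt $((GCS_TOKEN_EXPIRES_AT - REFRESH_MARGIN)) ]; then
        return 0   # cached token is still comfortably valid
    fi
    reply=$(fetch_metadata_token) || return 1
    token=$(printf '%s\n' "$reply" | json_field access_token)
    expires_in=$(printf '%s\n' "$reply" | json_field expires_in)
    if [ -z "$token" ] || [ -z "$expires_in" ]; then
        return 1   # unexpected reply format
    fi
    export GCS_OAUTH_TOKEN="$token"
    GCS_TOKEN_EXPIRES_AT=$((now + expires_in))
}
```

Calling ensure_gcs_token before each samtools invocation (for example `ensure_gcs_token && samtools view -H gs://my_bucket/my_file`) only contacts the metadata server when the cached token is missing or close to expiry. A single samtools run that outlives its token can still fail, which is the case HTS_AUTH_LOCATION is meant to cover.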

There already seems to be code in hfile_libcurl.c that handles expiring tokens, which could be reused or at least serve as a model. Then I stumbled across pull request #600, “Add Bearer token support to hfile_libcurl (for htsget)”, which looks to be where that code came from and does exactly what I was suggesting, by reading the token from the location given in HTS_AUTH_LOCATION. Unfortunately, I wasn’t able to get it to work for GCS. Is this functionality only for htsget?

@daviesrob
Member

HTS_AUTH_LOCATION was created for use with htsget, but it may work with GCS as long as you don't set GCS_OAUTH_TOKEN. The best way to test this would be with htsfile, which lets you raise the verbosity enough to see the HTTPS transaction. Based on your curl command line above, a proof of concept might look something like this:

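mkfifo /tmp/token_fifo
# Background writer: each time a reader opens the FIFO it gets a freshly fetched token
( while true ; do curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token > /tmp/token_fifo ; done ) &
# Leave GCS_OAUTH_TOKEN unset so that HTS_AUTH_LOCATION is used instead
unset GCS_OAUTH_TOKEN
HTS_AUTH_LOCATION=/tmp/token_fifo ./htsfile -vvvvvvvv -c gs://my_bucket/my_file | head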

If that works, it should be possible to try something similar with a longer-running process.
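For a longer-running process, the proof of concept can be wrapped so that each run gets its own private FIFO and the background writer is cleaned up afterwards. This is a minimal bash sketch, assuming curl, mkfifo and mktemp are available and that, as reported in this thread for htsfile and samtools, htslib accepts the metadata server's JSON reply through HTS_AUTH_LOCATION. fetch_metadata_token and run_with_token_fifo are hypothetical names, not part of htslib.

```shell
#!/usr/bin/env bash
# Sketch: run one command with HTS_AUTH_LOCATION pointing at a private FIFO,
# generalising the proof of concept above. fetch_metadata_token and
# run_with_token_fifo are hypothetical names, not part of htslib.

fetch_metadata_token() {
    curl -s -S -H "Metadata-Flavor: Google" \
        http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
}

run_with_token_fifo() {
    local dir fifo writer status
    dir=$(mktemp -d) || return 1
    fifo="$dir/token_fifo"
    if ! mkfifo "$fifo"; then
        rm -rf "$dir"
        return 1
    fi

    # Writer loop: each time a reader opens the FIFO it receives one freshly
    # fetched reply. Redirecting a brace group keeps the blocking open() in
    # this subshell, so killing $writer below also stops a blocked open.
    ( while :; do { fetch_metadata_token; } > "$fifo"; done ) > /dev/null &
    writer=$!

    # Run the command without GCS_OAUTH_TOKEN so that HTS_AUTH_LOCATION is used.
    ( unset GCS_OAUTH_TOKEN; export HTS_AUTH_LOCATION="$fifo"; "$@" )
    status=$?

    # Stop the writer (it may already have exited) and remove the FIFO.
    kill "$writer" 2>/dev/null || true
    wait "$writer" 2>/dev/null || true
    rm -rf "$dir"
    return "$status"
}
```

Usage would be along the lines of `run_with_token_fifo samtools view -H gs://my_bucket/my_file`. Using a FIFO rather than a regular file means every open triggers a fresh metadata request, so the token is current at the moment htslib reads it. Like the original loop, a reader that keeps the FIFO open while the writer reopens it may see a second reply appended.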

@slagelwa
Author

slagelwa commented Dec 7, 2018

Confirmed that this does indeed work with htsfile and samtools.

@daviesrob
Member

Good to hear this. I'll leave a note here that we need to document this as a better way of supplying the token when using GCS.
