Support Requester Pays S3 buckets #346

Open · jmarshall opened this issue Mar 2, 2016 · 9 comments

@jmarshall (Member) commented Mar 2, 2016

Requester Pays buckets need an extra X-Amz-Request-Payer: requester header.

Clearly it would not be appropriate for htslib/samtools to set it all the time (as it represents explicit acknowledgement from the user that they will be charged). So we could set it if some flag was present in the URL or perhaps via an extra config file key on the profile used. @DonFreed or anyone else: are you aware of any existing practice in this area?

[As reported at biostars.org.]
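
For comparison, boto3 exposes this header through a RequestPayer parameter on its S3 calls; a minimal sketch, with placeholder bucket and key names:

import boto3

# Plain GET against a Requester Pays bucket: boto3 adds the
# x-amz-request-payer: requester header when RequestPayer is set.
client = boto3.client("s3")
obj = client.get_object(
    Bucket="example-reqpay-bucket",   # placeholder bucket name
    Key="foo/bar.bam",                # placeholder key
    RequestPayer="requester",         # explicit acknowledgement that the requester pays
)
print(obj["ContentLength"])

htslib would need to add the equivalent header itself when it signs its requests.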

@DonFreed commented Mar 2, 2016

Sorry, I have no experience with requester pays buckets. It seems like a flag would work. A config key might be a little more dangerous as users could potentially make large numbers of requests while forgetting about the profile setting.

@jmarshall (Member, Author) commented

Some more details about my ideas for how to do this, and the sort of existing practice I'm looking for…

Looking at their source code, it seems that e.g. Tim Kay's aws has a --requester command-line option, and s3cmd similarly has --requester-pays.

However because samtools, htslib, and htslib's libcurl plugin are all separate, it would be inconvenient to have this as a samtools/bcftools/other-client-software command-line option that would need to be implemented separately for each client program and communicated to htslib and thence to the plugin. What would be more workable for us would be to encode this flag into the URL (which is of course already the main parameter in the interface to the plugin) and/or into the S3 configuration files that the plugin consults.

There are a few places in an S3 pseudo-URL where such a flag could be bolted on:

s3+requesterpays://bucket/foo/bar.bam
s3://bucket/foo/bar.bam?request_payer=requester
s3://requesterpays@bucket/foo/bar.bam

We already have schemes like s3+http://bucket… but using this for non-transport flags too seems unfortunate. Using the ?query or #fragment parts seems like usurping them from their real uses. We're already using the authority authentication part for [ID:SECRET:TOKEN@]BUCKET, but extending this to encode this flag too seems the most workable. At present we parse this as either [PROFILE@]BUCKET or [ID:SECRET[:TOKEN]@]BUCKET; this could be extended with :requesterpays or so, or even better it could piggy-back off our profile handling.
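
For illustration only (Python rather than the plugin's C, and the :requesterpays suffix is just the hypothetical syntax mentioned above), the parsing might look roughly like:

from urllib.parse import urlparse

def parse_s3_url(url):
    """Split an s3:// pseudo-URL into (auth, bucket, path, requester_pays).

    auth is PROFILE or ID:SECRET[:TOKEN]; a trailing ':requesterpays'
    (hypothetical syntax) would mark the access as Requester Pays.
    """
    u = urlparse(url)
    auth, _, bucket = u.netloc.rpartition("@")
    requester_pays = False
    if auth.endswith(":requesterpays"):
        auth = auth[:-len(":requesterpays")]
        requester_pays = True
    return auth, bucket, u.path, requester_pays

print(parse_s3_url("s3://mykeyid:sekrit:requesterpays@bucket/foo/bar.bam"))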

It appears that aws and s3cmd also add the requester-pays header based on their configuration files: if ~/.awsrc contains --requester, and possibly if ~/.s3cfg contains requester_pays. We read the legacy ~/.s3cfg configuration file, so we could trigger the header based on a requester_pays setting there too.
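
Something like the following would be enough to pick that up (assuming ~/.s3cfg keeps its usual INI layout with a [default] section; whether requester_pays is actually an s3cmd key is exactly the open question here):

import configparser, os

# Sketch: look for a requester_pays flag in the legacy s3cmd config file.
cfg = configparser.ConfigParser()
cfg.read(os.path.expanduser("~/.s3cfg"))
requester_pays = cfg.getboolean("default", "requester_pays", fallback=False)
print("add X-Amz-Request-Payer header:", requester_pays)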

We also read ~/.aws/credentials and (when I get around to writing the documentation!) will recommend this as the best config file for storing this stuff. So it would be good if there were a key for setting requester-pays in this standard config file. However I don't know if there is such a key, and I don't know where ~/.aws/credentials is documented other than this AWS security blog posting.

As @DonFreed noted, putting this in your configuration file is a little dangerous, but we could recommend a setup like the following:

[default]
aws_access_key_id = mykeyid
aws_secret_access_key = mysecret

[mykeyid_pays]
aws_access_key_id = mykeyid
aws_secret_access_key = mysecret
request_payer_header_value = requester

With that, s3://bucket/foo.bam would not risk accidental paying; to access a requester-pays bucket, you would need to explicitly ask for s3://mykeyid_pays@bucket/foo.bam. (Or set $AWS_PROFILE to mykeyid_pays, but we'd recommend against doing that too.)
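
A sketch of how the plugin might act on that layout (request_payer_header_value is the made-up key from the example above, not a standard AWS setting):

import configparser, os

def profile_headers(profile="default"):
    """Return extra request headers implied by an ~/.aws/credentials profile."""
    cfg = configparser.ConfigParser()
    cfg.read(os.path.expanduser("~/.aws/credentials"))
    headers = {}
    payer = cfg.get(profile, "request_payer_header_value", fallback=None)
    if payer:                                  # only the *_pays profile sets this
        headers["X-Amz-Request-Payer"] = payer
    return headers

print(profile_headers("mykeyid_pays"))   # {'X-Amz-Request-Payer': 'requester'}
print(profile_headers("default"))        # {}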

So what we're looking for is:

  1. Any existing practice along these lines that we can aim to be compatible with;
  2. Documentation for the settings in the ~/.aws/credentials file, and in particular whether anyone else has invented a key/value for Requester-Pays that we can reuse.

@delagoya commented (edited)

This seems pretty out of scope for htslib. You should be able to pre-sign a URL for any S3 request, including objects in Requester Pays buckets, and just pass that to samtools as a regular HTTPS URL. I'll create an example and test, then put the results here.

OK, I still think this sort of thing is out of scope for htslib, but I am having trouble (surprise) compiling in the libcurl bindings to get HTTPS support. Anyone have good and complete instructions for OS X with Homebrew?

@delagoya commented

OK, I was able to compile htslib with working libcurl support. Confirmed that it can take a presigned URL to view files:

import boto3

# Generate a presigned GET URL for an object in a Requester Pays bucket.
client = boto3.client("s3")
url = client.generate_presigned_url(
    "get_object",
    Params={"Bucket": "angel-reqpay", "Key": "test.cram", "RequestPayer": "requester"},
)
print("./htsfile -h '{0}'".format(url))

Assuming that a locally built htsfile is compiled properly, the command above should print the file's headers via the presigned GET request against a Requester Pays bucket.

$ python ~/Desktop/s3-presigned-samtools.py
./htsfile -h 'https://angel-reqpay.s3.amazonaws.com/test.cram?AWSAccessKeyId=XXXXXXXXXXXXXXXXX&x-amz-request-payer=requester&Expires=1458309035&Signature=XXXXXXXXXXXXX'
$ ./htsfile -h 'https://angel-reqpay.s3.amazonaws.com/test.cram?AWSAccessKeyId=XXXXXXXXXXXXXXXXX&x-amz-request-payer=requester&Expires=1458309035&Signature=XXXXXXXXXXXXX' | head -4
@HD VN:1.4  GO:none SO:coordinate
@SQ SN:chr1 LN:248956422    AS:GRCh38   M5:6aef897c3d6ff0c78aff06ac189178dd UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa  SP:Human
@SQ SN:chr2 LN:242193529    AS:GRCh38   M5:f98db672eb0993dcfdabafe2a882905c UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa  SP:Human
@SQ SN:chr3 LN:198295559    AS:GRCh38   M5:76635a41ea913a405ded820447d067b0 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa  SP:Human

Since this is my bucket, I can't be 100% sure that my signature is disabling the requester pays option, so I would appreciate it if someone else confirms that the above works.

@delagoya commented

Also, the ~/.aws/config file is documented here. A requester_pays setting is not supported in that file, at least not by the AWS CLI.

@mp15 (Member) commented Mar 18, 2016

I gave the command generated by that script a try and it seemed to work. I'm not 100% sure where I would check that I got billed rather than you, though.

@delagoya commented

You would get billed, but it would be something like $0.002 at most.

@mp15 (Member) commented Mar 18, 2016

Okay, the htsfile version of the command works. However I've realised there's a problem with using this in practice. When you do for example:

$ ./samtools view 'https://angel-reqpay.s3.amazonaws.com/test.cram?AWSAccessKeyId=XXXXXXXXXXXX&x-amz-request-payer=requester&Expires=1458315965&Signature=XXXXXXXXXXXXX' chr6:10000-11000
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files.

The .crai is going to be a separate file and thus another URL. Am I correct in assuming that there is either no test.crai in your bucket, or that the .crai being a different URL would require its own generated pre-signed URL?

@delagoya commented

So samtools will look for a local index file. You can use boto to download the .crai then issue the above command. Not perfect, but workable.
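
A minimal sketch of that workaround (download_file and its ExtraArgs RequestPayer option are standard boto3; the index key name is assumed):

import boto3

# Fetch the index to the current directory; samtools can then use the local
# .crai alongside the presigned .cram URL.  (Assumes the index is stored as
# test.cram.crai -- adjust if the bucket actually holds test.crai.)
client = boto3.client("s3")
client.download_file(
    "angel-reqpay", "test.cram.crai", "test.cram.crai",
    ExtraArgs={"RequestPayer": "requester"},   # still a Requester Pays bucket
)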

The point is that any system is going to have weird protocols. Perhaps that is a case for the htslib plugins, but my head is screaming "feature creep" once you get into authentication and authorization.
