Support S3 urls for public and/or private data. (Does IGV Windows app support S3 urls NOT https) #1093

Closed
markotitel opened this issue Jan 24, 2022 · 19 comments

Comments

@markotitel

Hi,

It is unclear whether IGV supports S3 URLs in addition to HTTP file location URLs.

I am using the latest (December) IGV version and testing with a PUBLIC S3 BAM file, using its S3 location URL.
Example BAM s3 URL: s3://1000genomes-dragen/data/dragen-3.5.7b/hg38_altaware_nohla-cnv-anchored/HG00096/HG00096.bam

@jrobinso
Contributor

No. As noted in the README, IGV does not resolve s3 URLs to https for public BAMs. I don't know what would be involved in supporting that, but I will rename this question and treat it as a request.

The https URL for that particular BAM is below. Other files in that dataset can probably be accessed in the same way, that is, by replacing s3://1000genomes-dragen with https://1000genomes-dragen.s3.us-west-2.amazonaws.com.

https://1000genomes-dragen.s3.us-west-2.amazonaws.com/data/dragen-3.5.7b/hg38_altaware_nohla-cnv-anchored/HG00096/HG00096.bam
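
The substitution described above can be sketched in Python. This is only an illustration, assuming the object is publicly readable and the bucket's region is already known (the s3:// URL does not carry it). `s3_to_https` is a hypothetical helper name, not something IGV provides.

```python
from urllib.parse import quote


def s3_to_https(s3_url: str, region: str) -> str:
    """Rewrite s3://bucket/key as a virtual-hosted-style HTTPS URL.

    The region must be supplied by the caller; it cannot be derived
    from the s3:// URL itself.
    """
    prefix = "s3://"
    if not s3_url.startswith(prefix):
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    # Everything up to the first "/" is the bucket, the rest is the key.
    bucket, _, key = s3_url[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"expected s3://bucket/key, got {s3_url!r}")
    # Percent-encode the key but keep "/" as the path separator.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


print(
    s3_to_https(
        "s3://1000genomes-dragen/data/dragen-3.5.7b/"
        "hg38_altaware_nohla-cnv-anchored/HG00096/HG00096.bam",
        "us-west-2",
    )
)
```

For the example BAM this prints the same https URL given above. The region still has to come from somewhere, which is the part that has no documented mapping.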

@jrobinso changed the title from "Does IGV Windows app support S3 urls NOT https" to "Support S3 urls for public data. (Was Does IGV Windows app support S3 urls NOT https)" on Jan 25, 2022
@markotitel changed the title from "Support S3 urls for public data. (Was Does IGV Windows app support S3 urls NOT https)" to "Support S3 urls for public data. (Does IGV Windows app support S3 urls NOT https)" on Jan 26, 2022
@markotitel
Author

markotitel commented Jan 26, 2022

Thank you for taking the time to answer so quickly.

The thing is that IGV is used by many companies and individuals worldwide.
Most of the time IGV will be used with PRIVATE buckets, and in addition those buckets will most likely be KMS encrypted.

It would be most excellent if IGV could have a "field" in the settings for an access key and secret access key, which could then be used to open private S3 objects via S3-type URLs.

Unfortunately I am not a Java dev, but in Python this would not take much effort to do.

The reason I raised this issue is to see whether this makes sense at all. = )

I will gladly find a Java dev and pay them to issue a PR for you guys to check.

@markotitel changed the title from "Support S3 urls for public data. (Does IGV Windows app support S3 urls NOT https)" to "Support S3 urls for public and/or private data. (Does IGV Windows app support S3 urls NOT https)" on Jan 26, 2022
@markotitel
Author

I have edited the title to make the intention clearer.

@jrobinso
Contributor

@markotitel So far I can't find any way to support automated conversion of public S3 -> https URLs without using the AWS libraries (the SDK), which in turn would require the IGV user to have an AWS account, keys, etc. The mapping of Google buckets to https is straightforward and documented, but I can't find the equivalent for S3.

You can set up IGV for access to private URLs; follow the links at the bottom of the README to the UMCCR sites that describe the process.

@markotitel
Author

Of course the user would have to have a bucket.

But if the owner of the bucket provides the keys to the user, then the user will be able to run IGV and use that private bucket.

Instead of doing an HTTP range request, do an S3 GetObject(Key='', Range='') call if the URL starts with s3://.

But yeah, this is an open-source project, and I will try to find someone to do this for me as a freelance job. Then we'll post a PR for this small feature.
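
A rough Python sketch of the ranged GetObject idea, assuming boto3 is installed and that credentials are available wherever boto3 normally looks for them. The helper names `range_header` and `read_s3_range` are made up for illustration; IGV itself is Java and would need the equivalent SDK call.

```python
def range_header(start: int, length: int) -> str:
    """Build an HTTP Range header value for `length` bytes starting at `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={start}-{start + length - 1}"


def read_s3_range(bucket: str, key: str, start: int, length: int) -> bytes:
    """Fetch part of an S3 object instead of issuing an HTTP range request."""
    # Third-party dependency (assumed installed); imported lazily so that
    # range_header() can be used without it.
    import boto3

    s3 = boto3.client("s3")
    response = s3.get_object(
        Bucket=bucket, Key=key, Range=range_header(start, length)
    )
    return response["Body"].read()


print(range_header(0, 65536))  # bytes=0-65535
```

Calling `read_s3_range("my-private-bucket", "sample.bam", 0, 65536)` (placeholder bucket and key) would return the first 64 KiB of the object, provided the credentials in use are allowed to read it.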

@jrobinso
Contributor

@markotitel Sorry, I misread your suggestion. We could explore a solution that requires use of an access key and secret access key as defined here: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html#get-started-setup-credentials. The first time I read your suggestion, my brain saw "OAuth access key", which we would not store persistently.

I was searching for a solution that did not require the SDK and was thus open to everyone. Let me look at this again using the SDK. If you know how you would do this in Python, post the code here; the Java is probably very similar.

@brainstorm
Contributor

brainstorm commented Feb 16, 2022

If a presigned S3 URL is public, you can access it via https:// as-is, @markotitel, and then open the resource via Open->URL. Right, @jrobinso? If that's not possible today, then it just needs to be fixed. But please do not introduce breakage in IGV's S3 access model, such as allowing users to enter private IAM keys. Use S3 presigning instead:

https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html

> It would be most excellent if IGV could have a "field" in the settings for an access key and secret access key, which could then be used to open private S3 objects via S3-type URLs.

Please don't implement this. There's no need to manually put sensitive private keys into IGV. This insecure practice is strongly discouraged, as I have mentioned repeatedly, including on #1084 (comment). It would also introduce a "permissions priority" in which the manual IAM keys take precedence over the centrally managed Cognito mechanism, potentially causing unintended, unauthorized access to private data.

> But if the owner of the bucket provides the keys to the user, then the user will be able to run IGV and use that private bucket.

Again, there's no need for this mechanism. As soon as the person who would like to share data with you generates an S3 presigned URL, there's no need to exchange keys. The recipient can read the data via the resulting presigned URL, and access is time-gated by the expiry already built into presigned URLs.

For example, with the AWS CLI you can easily presign URLs like so (I'm presigning a 'public object' that doesn't exist, just to demo and to avoid incurring egress charges for our organization):

% aws s3 presign s3://bucket/public_object
https://bucket.s3.ap-southeast-2.amazonaws.com/public_object?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA4IXYHPYNFIJX5D55%2F20220216%2Fap-southeast-2%2Fs3%2Faws4_request&X-Amz-Date=20220216T013834Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEGoaDmFwLXNvdXRoZWFzdC0yIkcwRQIhAIDLJ7ywG7xvvWQ3VmTgTjIQrM6UUJvCO3hyb4Oxo5uFAiAudo3nUVInWQexYHfMcxIUSqS%2B2mUI9V65xd3JBmQvByqdAwiz%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAIaDDg0MzQwNzkxNjU3MCIMFPZ6HFK02ZewPb4kKvECAJxm%2FFbYXdr3d93lLudHqySDrV19QQp1Cp9IBx9uBMwSQkcI7KVCDe0%2FloVUwsp6KKOul2bXJX6GhlnpE8xMvDn0NB1J9C87r4w6BvZfVlj6%2Fqf39avu%2Bn9QvbrQWwuHGHtiL4m402Md7X52yhPBhtaAIUwzaYW0ZEUt%2Bx69C5CdOAk%2BrI0m2r0vaMQlC0y0cX4iM%2FUnvJdPM8TK9uW8H1DEWDdjTQnmjhz77m%2FCW8025fLBdM6ysQXiL1FuEXXKRsIqUfyuesCQiyG2ar%2FLurrbV8gdv2PjKFGzmp0NLyO2tEmIPPnvU6UIhinKIz4D%2BfbNffANGg%2BRry%2BOedlhyRvoJ616EzeTa5izFWFFL6mvIp0HSlnmQVFy8NV4rY1oMbJ9g3dkVXQndOWZq6%2BDug8GfmHq57b0EaUXVbvHCrKdMT7gZLKp8MSzpr6MDJr5OWl2IPc7YGlkrx6xsZmA5AZ3KUy3dyGCfsOva0sq2ghlMJqrsZAGOqYB9EiWMpP5etMgHkN%2BnmkB59C7J41WkqOIGUhAU4Jr75QvX0XDVUQDSKGfDWkiAXD8F3O0kdjFTO0cWaCldYKXUIYckz1MeIEJsL8N7AjffLfzKiuEDZ19I5VjQLiQKzoT7zqO6mCTXKiLD6TG1Z5jCzsCWOnCwte6a7CDwGLFJjhsSXIOFOJg2yl%2FTrYCGuOo7Zw5dawdsihwRO2lidYufyAcLberTw%3D%3D&X-Amz-Signature=0e6de0affe57ec130432e5d331290500b79a10e14ecf5a777b3d943b881f418a

The AWS SDK follows a similar logic.
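
To make the time-gating concrete, here is a small standard-library sketch that reads the expiry out of a presigned URL's query string. It assumes a SigV4 presigned URL carrying `X-Amz-Date` and `X-Amz-Expires`, as in the CLI output above; the function name is hypothetical and the example URL is a trimmed copy with the credential, token, and signature parameters left out.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time after which a SigV4 presigned URL is rejected."""
    params = parse_qs(urlparse(url).query)
    # X-Amz-Date is the signing time, formatted as YYYYMMDDTHHMMSSZ.
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the validity window in seconds.
    lifetime = timedelta(seconds=int(params["X-Amz-Expires"][0]))
    return signed_at + lifetime


example = (
    "https://bucket.s3.ap-southeast-2.amazonaws.com/public_object"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20220216T013834Z"
    "&X-Amz-Expires=3600"
    "&X-Amz-SignedHeaders=host"
)
print(presigned_url_expiry(example).isoformat())  # 2022-02-16T02:38:34+00:00
```

When the URL was signed with temporary credentials (note the `X-Amz-Security-Token` parameter in the CLI output), it also stops working once those credentials expire, which can be earlier than the time computed here.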

/cc @ohofmann @reisingerf

@jrobinso
Contributor

@brainstorm The subject of this issue is access to public objects. I hoped to find a pattern for translating s3:// -> https:// URLs for public files. However, replacing s3:// with https:// as-is did not work for me. The https:// URLs generally have the region in them and the s3:// URLs do not; in any event, just replacing "s3://" with "https://" did not appear to work. If you think it should, I would appreciate some tips.

I think it's possible to get the https:// URL for a public s3:// object with the AWS SDK. I haven't had time to investigate that yet. If it is possible, we could use that mechanism if the user has an AWS account and has set up their keys for AWS command line access. You would not enter those keys into IGV; we would just use them if they are there. @brainstorm If it's your contention that this is not secure in some way, then the AWS command line tools are not secure either.

@brainstorm
Contributor

brainstorm commented Feb 16, 2022

> However, replacing s3:// with https:// as-is did not work for me. The https:// URLs generally have the region in them and the s3:// URLs do not; in any event, just replacing "s3://" with "https://" did not appear to work. If you think it should, I would appreciate some tips.

It doesn't work that way. The user has to be authenticated via their AWS account in the CLI tool and presign the URL there, even if it's a public object. The fact that an object is public doesn't remove region and egress billing, for instance.

> I think it's possible to get the https:// URL for a public s3:// object with the AWS SDK.

Yes, use this from the AWS Java2 SDK or this SO post.

> @brainstorm If it's your contention that this is not secure in some way, then the AWS command line tools are not secure either.

False. See the earlier log4j issue vs. the CLI: if IGV is vulnerable, the CLI doesn't necessarily have to be. Now, if you are referring to the whole computer being wormed/compromised, then you are right. Also, IAM keys do not provide time-gated access to resources; Cognito-provided access credentials and presigned URLs do.

A fair bit of endpoint (laptop) security is about threat models and their mitigation, so let's not mix things and we'll be safer.

@jrobinso
Contributor

@brainstorm I agree, let's not mix things. No one is talking about IAM keys, unless my nomenclature is wrong as it well might be. I'm simply talking about using the AWS SDK, which we already do, if the user has enabled that as described here: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html#get-started-setup-credentials

@brainstorm
Contributor

brainstorm commented Feb 16, 2022

> @brainstorm I agree, let's not mix things. No one is talking about IAM keys, unless my nomenclature is wrong as it well might be. I'm simply talking about using the AWS SDK, which we already do, if the user has enabled that as described here: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html#get-started-setup-credentials

Those are IAM keys; avoid them in user-facing applications as much as possible. I don't even use them on the CLI (we use SSO), and the AWS mobile app, for instance, also deprecated them years ago for the aforementioned reasons. They are a security hazard.

@jrobinso
Contributor

@brainstorm OK, so my nomenclature is wrong. I'm going to close this because sharing https:// URLs to public objects is not difficult (we do it, for example), and there are a lot of higher-priority issues open now. This is trivial with Google Cloud Storage, which I am more familiar with.

@markotitel
Author

markotitel commented Apr 25, 2022

> > @brainstorm I agree, let's not mix things. No one is talking about IAM keys, unless my nomenclature is wrong as it well might be. I'm simply talking about using the AWS SDK, which we already do, if the user has enabled that as described here: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html#get-started-setup-credentials
>
> Those are IAM keys; avoid them in user-facing applications as much as possible. I don't even use them on the CLI (we use SSO), and the AWS mobile app, for instance, also deprecated them years ago for the aforementioned reasons. They are a security hazard.

Sorry for reviving this, but the topic went in a completely wrong direction.
Using IAM keys is completely fine and secure, as long as permissions are properly defined.
I am curious why you say this is insecure.
When a user downloads the IGV application, there are no keys anywhere, and one can use the IGV app fine. Then, if I want to share a key with my users, I can do so.

Imagine IGV is used in a corporate environment.

  • Each user has a Windows laptop.
  • The user logs in and automatically gets IAM credentials exported to Windows system environment variables.
  • The AWS SDK for Java used by IGV automatically detects and uses the keys.

What is wrong with this?

Additionally, the Cognito approach is another option for someone who wants to set up something totally complicated and unnecessary. Under the hood, the principle is the same.

Why not have something like this? By the way, I found a freelancer and paid him to add this small feature to the existing codebase. I am not a Java developer, but common sense pushes me to discuss this further.
In addition to what is shown below, if system environment variables are available, the AWS SDK for Java in IGV would automagically resolve and use them. This is perfect if you ask me. The AWS SDK is meant to be used like this.
What is not clear from the docs?
https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html

image

@jrobinso
Contributor

jrobinso commented Apr 25, 2022

It's an excellent question. @brainstorm, as succinctly as you possibly can, what is your objection, if any, to supporting the credentials chain documented here?

https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html

In particular

The default credential provider chain looks for credentials in this order:

1. Environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The AWS SDK for Java uses the [EnvironmentVariableCredentialsProvider](https://docs.aws.amazon.com/sdk-for-java/v1/reference/com/amazonaws/auth/EnvironmentVariableCredentialsProvider.html) class to load these credentials.

2. Java system properties: aws.accessKeyId and aws.secretKey. The AWS SDK for Java uses the [SystemPropertiesCredentialsProvider](https://docs.aws.amazon.com/sdk-for-java/v1/reference/com/amazonaws/auth/SystemPropertiesCredentialsProvider.html) to load these credentials.

3. Web Identity Token credentials from the environment or container.

4. The default credential profiles file: typically located at ~/.aws/credentials (location can vary per platform), and shared by many of the AWS SDKs and by the AWS CLI. The AWS SDK for Java uses the [ProfileCredentialsProvider](https://docs.aws.amazon.com/sdk-for-java/v1/reference/com/amazonaws/auth/profile/ProfileCredentialsProvider.html) to load these credentials.

You can create a credentials file by using the aws configure command provided by the AWS CLI, or you can create it by editing the file with a text editor. For information about the credentials file format, see [AWS Credentials File Format](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html#credentials-file-format).

If I interpret this correctly, if those variables are defined by any of the means listed above, the Java SDK will use them, so there is no need to expose them in the IGV preference UI. IGV itself doesn't need to know anything about them.
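
The lookup order can be illustrated with a simplified Python sketch. This is not the SDK's implementation: it only mimics steps 1 and 4 of the list above (environment variables, then the shared credentials file) and skips Java system properties and web identity tokens. `resolve_credentials` and its parameters are hypothetical names.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    env: Optional[Mapping[str, str]] = None,
    credentials_file: Optional[Path] = None,
    profile: str = "default",
) -> Optional[Tuple[str, str, str]]:
    """Return (source, access_key_id, secret_access_key), or None if nothing is found."""
    env = os.environ if env is None else env
    if credentials_file is None:
        credentials_file = Path.home() / ".aws" / "credentials"

    # Step 1: environment variables take precedence.
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key_id and secret:
        return ("environment", key_id, secret)

    # Step 4: fall back to the shared credentials file (`aws configure`).
    parser = configparser.ConfigParser()
    parser.read(credentials_file)  # a missing file is silently ignored
    if parser.has_section(profile):
        key_id = parser.get(profile, "aws_access_key_id", fallback=None)
        secret = parser.get(profile, "aws_secret_access_key", fallback=None)
        if key_id and secret:
            return ("profile", key_id, secret)

    return None


found = resolve_credentials()
print("credentials source:", found[0] if found else "none found")
```

The point of the sketch is that the application never asks the user for keys; it only consumes whatever the environment already provides, which is the behaviour proposed here for IGV.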

@brainstorm
Contributor

brainstorm commented Apr 26, 2022

@jrobinso your approach of NOT exposing those keys in the preferences UI (and using JDK env) is less wrong, and preferable to what Marko suggests in his screenshot.

Succinctly, one of my main objections is the lack of time-gated and automated (managed) key rotation.

The way Marko seems to suggest solving his problem is to generate IAM user keys per user and disregard their lifecycle (which Cognito guarantees and handles for free, since it's an AWS managed service).

@markotitel Cognito does this for you in several corporate environments that embrace SSO instead of issuing static IAM keys (considered a bad practice in 2022). But if you want to implement your own "totally complicated and unnecessary" IAM key management yourself, go nuts. Hint: AWS STS does this for you too, which is also included transparently in AWS Cognito so you don't have to care about implementing it yourself.

Since this feature seems to be on its way to inclusion by insistence, Marko, please at least check this three-part guide on mitigating IAM security issues: https://www.fugue.co/blog/locking-down-the-security-of-aws-iam

Also, @markotitel, as I don't know the security posture or maturity stage of your "corporate environment", please check out this guide and address any shortcomings you find in your org: https://summitroute.com/downloads/aws_security_maturity_roadmap-Summit_Route.pdf

Good luck!

@brainstorm
Contributor

brainstorm commented Apr 26, 2022

Spoiler from the security maturity roadmap document I shared:

[Screenshot from the security maturity roadmap document]

@jrobinso
Contributor

So what I am proposing is that we just support the credential chain documented by AWS. That is, if the AWS SDK picks up credentials from magic places or environment variables, so be it; IGV is not really involved and knows nothing about these keys. I'm not sure why this isn't working automatically, actually.

@brainstorm
Contributor

> So what I am proposing is that we just support the credential chain documented by AWS. That is, if the AWS SDK picks up credentials from magic places or environment variables, so be it; IGV is not really involved and knows nothing about these keys. I'm not sure why this isn't working automatically, actually.

Sure. Most probably it isn't working automatically because the AWS DefaultProviderChain is not configured as such in IGV, so Marko would have to, in broad steps:

  1. Remove the UI bits/code.
  2. Tweak the creds provider chain accordingly in the code.
  3. Make sure that the existing Cognito functionality is still operational.

@jrobinso
Contributor

It's a little strange discussing the PR over here on this closed issue. I need to get my head around this myself. I am hoping it will take just a few lines of code, or maybe just configuration, to enable the AWS DefaultProviderChain in the SDK. If it can be done by configuration alone, that would be great. In any event, I need to get my head around it first.

I will add a few notes to the PR but let's move the discussion, if any is needed, to there.
