Hosted file moved? #19

Closed
yonghaoy opened this issue Jan 31, 2023 · 11 comments

@yonghaoy

Hello, igv.js recently broke for us because https://igv.genepattern.org/genomes/seq/hg38/hg38.fa.fai is blocked by our Content Security Policy (CSP).
This is expected, because our CSP allowlist does not contain that URL.
The IGV URLs currently in our allowlist are:

    "https://s3.amazonaws.com/igv.broadinstitute.org/",
    "https://s3.amazonaws.com/igv.org.genomes/",
    "https://portals.broadinstitute.org/webservices/igv/",
    "https://igv.org/genomes/",

Did igv.js recently move its hosted files to igv.genepattern.org?
I saw @jrobinso mention a move here: igvteam/igv.js#1570 (comment),
but I am not sure whether that is the same thing.

Thanks!
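
To make the failure mode concrete, here is a minimal Python sketch of how a path-based allowlist like the one above rejects the new host. It only approximates CSP source matching (same scheme and host, then a path-prefix check for entries ending in `/`); real CSP matching also handles wildcards, ports, and redirects. The function name `is_allowed` is an illustration, not part of any IGV or browser API.

```python
from urllib.parse import urlsplit

# Allowlist entries quoted in the comment above.
ALLOWLIST = [
    "https://s3.amazonaws.com/igv.broadinstitute.org/",
    "https://s3.amazonaws.com/igv.org.genomes/",
    "https://portals.broadinstitute.org/webservices/igv/",
    "https://igv.org/genomes/",
]


def is_allowed(url, allowlist=ALLOWLIST):
    """Simplified CSP-style check: scheme and host must match an entry,
    and the path must fall under the entry's path."""
    target = urlsplit(url)
    for entry in allowlist:
        source = urlsplit(entry)
        if (target.scheme, target.netloc) != (source.scheme, source.netloc):
            continue
        if not source.path:
            # No path in the entry: any path on that host matches.
            return True
        if source.path.endswith("/") and target.path.startswith(source.path):
            # Entry path ends in "/": treat it as a prefix.
            return True
        if target.path == source.path:
            # Otherwise only an exact path matches.
            return True
    return False
```

Under this approximation the blocked `hg38.fa.fai` URL fails the check until an entry for `https://igv.genepattern.org/` is appended to the list.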

@jrobinso
Contributor

Yes, we will be slowly moving all our data to that host. If you want to absolutely protect yourself against data moves you could host genomes on your own server, but we don't move data often.

@yonghaoy
Author

Thanks @jrobinso
To confirm, has hg38.fa.fai already moved to igv.genepattern.org?
I am investigating why IGV recently broke, and want to confirm that adding https://igv.genepattern.org to our CSP allowlist will fix the issue.
Thanks
Yonghao

@jrobinso
Contributor

Yes. But again, if this is an issue for your organization, you might consider hosting the files you need locally within your organization. We host data for IGV as a convenience, but you are not required to use our hosted data. Costs are becoming an issue, and files could move again in the future. Some instructions are here: https://github.com/igvteam/igv/wiki/Hosting-Genomes

Another important host, and I'm surprised IGV works at all without this one whitelisted, is

https://data.broadinstitute.org/igvdata

Also, the following might well be used

https://igv-genepattern-org.s3.amazonaws.com

@yonghaoy
Author

yonghaoy commented Jan 31, 2023

Thanks for getting back to me.
I am not familiar with IGV or how to use it. We are developing tools for scientists who use IGV heavily.
The Python code snippet that stopped working (and I don't know how or where to set the hosted data) is:

import igv

b = igv.Browser({"genome": "hg38"})
b.load_track(
    {
        "name": "wgs_1000004",
        "url": "wgs_1000004.cram",
        "format": "cram",
        "type": "alignment",
        "indexURL": "wgs_1000004.cram.crai",
        "indexed": True
    })

b.show()

RE: https://data.broadinstitute.org/igvdata

The server (https://app.terra.bio/) we are developing is hosted by Broad, and I think all Broad hostnames are allowed.

RE: https://igv-genepattern-org.s3.amazonaws.com/

Thanks! I will also add that into our allowlist...

@jrobinso
Contributor

Ahh, you are actually using https://github.com/igvteam/igv-notebook then. OK, sorry, my suggestion to consider hosting data on your own servers still applies; however, the instructions I pointed to were for IGV desktop.

What version of igv-notebook are you using? The most recent is 0.4.4, although for all practical purposes this is ready to release as 1.0.0. https://pypi.org/project/igv-notebook/

The configuration looks suspect; in particular, the url and indexURL are not qualified. I'm not sure how this is working, but if it was working before, then the host name change might be the problem.
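
As a rough illustration of what "not qualified" means here, the following Python sketch flags `url` and `indexURL` values that lack a scheme and host. The helper name `unqualified_urls` is hypothetical, and the check says nothing about whether a particular notebook wrapper can resolve relative paths on its own.

```python
from urllib.parse import urlsplit


def unqualified_urls(track):
    """Return the URL keys of a track config whose values lack a scheme and host."""
    flagged = []
    for key in ("url", "indexURL"):
        value = track.get(key)
        if isinstance(value, str):
            parts = urlsplit(value)
            # A qualified URL such as https://host/file has both parts set.
            if not (parts.scheme and parts.netloc):
                flagged.append(key)
    return flagged


# Track config copied from the snippet earlier in this thread.
track = {
    "name": "wgs_1000004",
    "url": "wgs_1000004.cram",
    "format": "cram",
    "type": "alignment",
    "indexURL": "wgs_1000004.cram.crai",
    "indexed": True,
}
print(unqualified_urls(track))
```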

I'm going to transfer this issue to igv-notebook.

@jrobinso jrobinso transferred this issue from igvteam/igv.js Jan 31, 2023
@yonghaoy
Author

Actually we are using igv-jupyter https://github.com/g2nb/igv-jupyter which wraps igv.js by way of igv-notebook.

@jrobinso
Contributor

jrobinso commented Jan 31, 2023

OK. igv-jupyter is focused on the needs of the g2nb project; it's not something I personally have any involvement in. I don't understand how those url property values are working, but maybe it uses magic of some kind. Anyway, the igv-jupyter project would be the place to discuss that.

@yonghaoy
Author

yonghaoy commented Apr 17, 2023

Hey @jrobinso, I need to reopen this issue because we are broken by a host move again (and I suspect it was caused by this genomes.json update).
Our Content Security Policy now complains that:
1: https://igv-genepattern-org.s3.amazonaws.com/ is not in our CSP allowlist.
2: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ncbiRefSeq.txt.gz is not in our CSP allowlist.

I checked this reference, https://s3.amazonaws.com/igv.org.genomes/genomes.json, and saw that those two URLs are now used by hg38.
You also mentioned in this issue that you changed it to refer to the UCSC URL directly.
I want to confirm: (1) Is the breakage expected? (2) Do we just need to add the two URLs to our CSP allowlist? (3) How can we check the changelog for genome references?

Thanks
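
One way to answer question (2) without guessing is to list every origin a genome entry refers to. The sketch below assumes that genomes.json parses to a list of objects, each with an `id` key; the `SAMPLE` data is a hypothetical, heavily trimmed entry built only from URLs quoted in this thread, not the real file contents, and the function names are illustrative.

```python
from urllib.parse import urlsplit


def collect_origins(node):
    """Recursively gather scheme://host for every http(s) string in parsed JSON."""
    origins = set()
    if isinstance(node, dict):
        for value in node.values():
            origins |= collect_origins(value)
    elif isinstance(node, list):
        for item in node:
            origins |= collect_origins(item)
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        parts = urlsplit(node)
        origins.add(f"{parts.scheme}://{parts.netloc}")
    return origins


def origins_for_genome(genomes, genome_id):
    """Return the sorted origins used by the entry whose "id" equals genome_id."""
    for entry in genomes:
        if entry.get("id") == genome_id:
            return sorted(collect_origins(entry))
    return []


# Hypothetical, trimmed stand-in for a parsed genomes.json.
SAMPLE = [
    {
        "id": "hg38",
        "indexURL": "https://igv.genepattern.org/genomes/seq/hg38/hg38.fa.fai",
        "tracks": [
            {"url": "https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ncbiRefSeq.txt.gz"}
        ],
    }
]
print(origins_for_genome(SAMPLE, "hg38"))
```

In practice the input would be the parsed contents of the genomes.json file linked above, loaded with `json.load`. Note that the walk also picks up link-only fields such as `infoURL`, which may not need to be in a CSP allowlist.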

@jrobinso
Contributor

I don't understand question (1), but URLs to data will change from time to time. These changes are not noted in change logs, as the URLs are not part of the application. As I suggested above, you should consider hosting these files yourselves if you want absolute control. We provide them as a service, but it is not mandatory to use our (or UCSC's) hosted files. That said, these do not change often.

In the future we will be doing more direct references to UCSC hosted files, which will include the host you reference and possibly others in the UCSC domains. I had already listed https://igv-genepattern-org.s3.amazonaws.com/ earlier.
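
Since URL changes are not recorded in a change log, one possible workaround is to keep periodic snapshots of genomes.json and diff them. The Python sketch below is only an illustration under that assumption: `flatten_urls` and `changed_urls` are hypothetical helper names, and the two snapshot dictionaries are made-up stand-ins built from URLs mentioned in this thread.

```python
def flatten_urls(node, path=""):
    """Map each JSON path to the http(s) URL string stored at that path."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            found.update(flatten_urls(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.update(flatten_urls(item, f"{path}/{index}"))
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        found[path] = node
    return found


def changed_urls(old, new):
    """Return {json_path: (old_url, new_url)} for URLs added, removed, or changed."""
    before, after = flatten_urls(old), flatten_urls(new)
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(set(before) | set(after))
        if before.get(key) != after.get(key)
    }


# Hypothetical snapshots of a single genome entry.
old_snapshot = {
    "id": "hg38",
    "indexURL": "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa.fai",
}
new_snapshot = {
    "id": "hg38",
    "indexURL": "https://igv.genepattern.org/genomes/seq/hg38/hg38.fa.fai",
}
print(changed_urls(old_snapshot, new_snapshot))
```

Paths are positional, so reordering entries in a list would show up as changes; keying genome entries by `id` before diffing would be a reasonable refinement.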

@yonghaoy
Author

yonghaoy commented Apr 17, 2023

Thanks @jrobinso.
For 1, we were just trying to understand why IGV broke this time, and I want to confirm that the breakage was caused by IGV moving its hosted files (as specified in that genomes.json).

As for hosting our own reference, we have been thinking about that.
For now, I got it working by using the old reference URLs (the next step seems to be hosting those references ourselves, then replacing these URLs):

import igv_notebook

igv_notebook.init()
igv_browser = igv_notebook.Browser(
    {
        "reference": {
            "id": "custom_hg38",
            "name": "Custom HG38 reference that works in Terra",
            "fastaURL": "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa",
            "indexURL": "https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa.fai",
            "aliasURL": "https://s3.amazonaws.com/igv.org.genomes/hg38/hg38_alias.tab",
            "tracks": [
                {
                    "name": "Refseq Genes",
                    "format": "refgene",
                    "url": "https://s3.amazonaws.com/igv.org.genomes/hg38/ncbiRefSeq.sorted.txt.gz",
                    "indexed": False,
                    "removable": False,
                    "order": 1000000,
                    "infoURL": "https://www.ncbi.nlm.nih.gov/gene/?term=$$"
                }
            ]
        },
        "locus": "chr22:24,376,277-24,376,350"
    })

igv_browser.load_track(
    {
        "name": "1000004 CRAM",
        "url": "wgs_1000004.cram",
        "format": "cram",
        "type": "alignment",
        "indexURL": "wgs_1000004.cram.crai",
        "indexed": True
    })
igv_browser.show()

@jrobinso
Contributor

Yes, that is what I recommend if you want full control over the URLs. Specifically, do not use the "genome: id" shortcut, but fully specify everything.
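
For readers who go on to host the files themselves, a fully specified reference could be generated from a single base URL, roughly as sketched below. The base URL `https://genomes.example.org/hg38`, the file layout under it, and the helper name `build_reference` are all assumptions for illustration; the dictionary keys mirror the working configuration posted above.

```python
def build_reference(base_url, genome_id="custom_hg38", name="Custom hg38 reference"):
    """Build a fully specified reference config whose URLs all share one base.

    Assumes the FASTA, its index, and the annotation file sit directly
    under base_url with the file names used below.
    """
    base = base_url.rstrip("/")
    return {
        "id": genome_id,
        "name": name,
        "fastaURL": f"{base}/hg38.fa",
        "indexURL": f"{base}/hg38.fa.fai",
        "tracks": [
            {
                "name": "Refseq Genes",
                "format": "refgene",
                "url": f"{base}/ncbiRefSeq.sorted.txt.gz",
                "indexed": False,
                "order": 1000000,
            }
        ],
    }


# Hypothetical self-hosted location; replace with your own server.
reference = build_reference("https://genomes.example.org/hg38/")
print(reference["fastaURL"])
```

The result would be passed as the `"reference"` value of the browser configuration in place of the `"genome"` shortcut, so that only one host needs to be added to the CSP allowlist.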
