Tracks in Custom genome #1306

Closed
YeHW opened this issue Mar 28, 2023 · 9 comments

Comments

@YeHW

YeHW commented Mar 28, 2023

Hi, igv team.

I have a human genome FASTA file (human 1kg_v37) and want to set up a custom genome.json for it to use with igv. I've checked the wiki and the b37_1kg.json shipped with igv, and have some questions about the fields in the JSON file.

  1. In b37_1kg.json, "cytobandURL": "https://s3.amazonaws.com/igv.org.genomes/1kg_v37/b37_cytoband.txt", what is the origin of this cytoband file? I found that the cytoband for GRCh38 is from UCSC. Is it the same case for 1kg_v37?
  2. In b37_1kg.json, how is "url": "https://s3.amazonaws.com/igv.org.genomes/hg19/ncbiRefSeq.sorted.txt.gz" in the Refseq Genes track sorted and tabixed? I found there's only an unsorted ncbiRefSeq.txt.gz on UCSC's ftp site.
  3. In the above Refseq Genes track, there is only 1 transcript (NM_002944.2) for ROS1 gene, but in the ncbi GDV there are 3 transcripts (NM_002944.3, NM_001378902.1, NM_001378891.1) for ROS1 gene (because it's using the latest Annotation Release 105.20220307). How can I build a new Refseq Genes track to be used with igv using the latest Annotation Release from Refseq (105.20220307 as of writing)?

Thank you!

@maximilianh

maximilianh commented Mar 28, 2023 via email

@jrobinso
Contributor

Thanks @maximilianh, you are correct, the cytoband files all come from UCSC.

Tabix indexing is optional, and not really necessary for tracks < 20MB in total size. I would suggest skipping this, but if you have a need for it, documentation is here: https://www.htslib.org/doc/tabix.html.
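
For readers who do want an indexed file, tabix requires the rows to be sorted by sequence name and start position first. The following is a minimal Python sketch of that sorting step only, not a description of how the hosted ncbiRefSeq.sorted.txt.gz was actually produced. It assumes the UCSC table layout in which column 3 is the chromosome and column 5 is txStart; the function names are hypothetical.

```python
import gzip


def sort_genepred_lines(lines):
    """Sort tab-separated rows by chromosome (column 3) and txStart (column 5).

    Assumption: rows follow the UCSC ncbiRefSeq table layout
    (bin, name, chrom, strand, txStart, txEnd, ...).
    """
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    rows.sort(key=lambda row: (row[2], int(row[4])))
    return ["\t".join(row) for row in rows]


def sort_genepred_file(src_gz, dst_txt):
    """Read a gzipped table and write a sorted plain-text copy."""
    with gzip.open(src_gz, "rt") as src:
        sorted_lines = sort_genepred_lines(src)
    with open(dst_txt, "w") as dst:
        dst.write("\n".join(sorted_lines) + "\n")
```

The plain-text output would then be compressed with bgzip and indexed with tabix, telling tabix which columns hold the sequence name, start, and end (for example `tabix -s 3 -b 5 -e 6 -0`, assuming the column layout above).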

I assume you have reviewed the documentation (https://github.com/igvteam/igv.js/wiki) that describes genomes.json and other files.

@YeHW
Author

YeHW commented Mar 30, 2023

  1. In b37_1kg.json https://s3.amazonaws.com/igv.org.genomes/1kg_v37/b37_1kg.json, "cytobandURL": "https://s3.amazonaws.com/igv.org.genomes/1kg_v37/b37_cytoband.txt", what is the origin of this cytoband file? I found that the cytoband for GRCh38 is from UCSC. Is it the same case for 1kg_v37?

     I am not with IGV, but I bet the cytoband file also comes from UCSC.

  2. In b37_1kg.json https://s3.amazonaws.com/igv.org.genomes/1kg_v37/b37_1kg.json, how is "url": "https://s3.amazonaws.com/igv.org.genomes/hg19/ncbiRefSeq.sorted.txt.gz" in the Refseq Genes track sorted and tabixed? I found there's only a unsorted ncbiRefSeq.txt.gz in ucsc's ftp https://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/ site.

  3. In the above Refseq Genes track, there is only 1 transcript (NM_002944.2) for ROS1 gene, but in the ncbi GDV https://www.ncbi.nlm.nih.gov/genome/gdv/browser/genome/?id=GCF_000001405.25 there are 3 transcripts for ROS1 gene (because it's using the latest Annotation Release 105.20220307). How can I build a new Refseq Genes track to be used with igv using the latest Annotation Release from Refseq (105.20220307 https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/annotation_releases/105.20220307/GCF_000001405.25_GRCh37.p13/ as of writing)?

     UCSC also uses 105.20220307 on hg19 and there is only one location for NM_002944.3 chr6 - 117608515 - 117747105. The GDV also has only one location for NM_002944.3. On hg38, and T2T, only one location in GDV. On the UCSC search, for hg38, there are three locations, but on three different tracks (UCSC's mapping, NCBI's mapping and Gencode).

    Thank you!

Thanks @maximilianh and @jrobinso.

Just want to make sure about one thing:
I spot three NM_ transcripts for ROS1 gene in GDV:
[Screenshot: ROS1 transcripts in GDV]

I guess it's because there are three NM_ transcripts in the RefSeq release 105.20220307. I downloaded GCF_000001405.25_GRCh37.p13_genomic.gtf.gz from the above ftp site, and checked:

curl -s 'https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/annotation_releases/105.20220307/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gtf.gz' \
  | zcat \
  | rg '(gene_id "ROS1"; transcript_id "NM_.*?")' -or '$1' \
  | sort | uniq

# output
## gene_id "ROS1"; transcript_id "NM_001378891.1"
## gene_id "ROS1"; transcript_id "NM_001378902.1"
## gene_id "ROS1"; transcript_id "NM_002944.3"
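
The same check can be sketched in Python for anyone without ripgrep installed. This is only an illustration: the function name and the regular expression are assumptions, and it relies on `gene_id` being immediately followed by `transcript_id` in the attribute column, as in the output above.

```python
import re

# Matches the attribute pair shown in the output above.
TRANSCRIPT_RE = re.compile(r'gene_id "([^"]+)"; transcript_id "([^"]+)"')


def transcripts_for_gene(gtf_lines, gene, prefix="NM_"):
    """Return the sorted, unique transcript IDs for one gene.

    gtf_lines: any iterable of GTF text lines (for example an open file).
    prefix:    keep only transcript IDs starting with this string.
    """
    found = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        match = TRANSCRIPT_RE.search(line)
        if match and match.group(1) == gene and match.group(2).startswith(prefix):
            found.add(match.group(2))
    return sorted(found)
```

With the downloaded file, this could be called as `transcripts_for_gene(gzip.open(path, "rt"), "ROS1")` after `import gzip`, where `path` points at the local copy of the GTF.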

If the above speculation is correct, I want to build a new igv track based on RefSeq release 105.20220307. To achieve that, I need to build a file like ncbiRefSeq.sorted.txt.gz. Could you please help me with this?

Thank you!

@maximilianh

maximilianh commented Mar 30, 2023 via email

@jrobinso
Contributor

@maximilianh Thanks for the info. Actually I was considering just referencing those URLs directly (e.g. https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ncbiRefSeq.txt.gz). The only thing I do additionally is tabix index them, but really they aren't that large and the benefit is marginal.

@jrobinso
Contributor

@YeHW You do not need to build a file like ncbiRefSeq.sorted.txt.gz. See the user documentation, all common annotation formats are supported, including gff3.
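
As a rough illustration of what that could look like, here is a Python sketch that writes a genome JSON with a GFF3 annotation track. The field names follow those discussed in this thread and in the wiki ("cytobandURL", "url", tracks) plus "id", "name", "fastaURL", "indexURL", "format" and "indexed", which should be checked against the current documentation. The function name and all example.org URLs are hypothetical placeholders. Note also that NCBI annotation files name sequences by RefSeq accession, so chromosome names may need to be reconciled with the FASTA.

```python
import json


def build_genome_config(genome_id, name, fasta_url, index_url,
                        cytoband_url, annotation_url):
    """Assemble a genome definition with one non-indexed GFF3 track."""
    return {
        "id": genome_id,
        "name": name,
        "fastaURL": fasta_url,
        "indexURL": index_url,
        "cytobandURL": cytoband_url,
        "tracks": [
            {
                "name": "RefSeq Genes",
                "format": "gff3",
                "url": annotation_url,
                "indexed": False,  # tabix index skipped, as suggested above
            }
        ],
    }


config = build_genome_config(
    genome_id="my_1kg_v37",                                   # hypothetical
    name="Human (1kg_v37, custom)",                           # hypothetical
    fasta_url="https://example.org/human_g1k_v37.fasta",      # placeholder
    index_url="https://example.org/human_g1k_v37.fasta.fai",  # placeholder
    cytoband_url="https://s3.amazonaws.com/igv.org.genomes/1kg_v37/b37_cytoband.txt",
    annotation_url="https://example.org/refseq_105.gff3.gz",  # placeholder
)

with open("custom_genome.json", "w") as handle:
    json.dump(config, handle, indent=2)
```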

@maximilianh

maximilianh commented Mar 30, 2023 via email

@jrobinso
Contributor

This is done. @YeHW if you update your genome by selecting "Genomes > Select Hosted Genome" from the menu, you will get the updated annotation. The assembly you are asking about is in the updated menu as follows:

[Screenshot: hosted genome menu, 2023-03-30]

@YeHW
Author

YeHW commented Apr 1, 2023

@maximilianh @jrobinso Thank you! I can see the updated annotation.
