Tracks in Custom genome #1306
Comments
1. In b37_1kg.json
<https://s3.amazonaws.com/igv.org.genomes/1kg_v37/b37_1kg.json>, "cytobandURL":
"https://s3.amazonaws.com/igv.org.genomes/1kg_v37/b37_cytoband.txt",
what is the origin of this cytoband file? I found that the cytoband for
GRCh38 is from UCSC. Is it the same case for 1kg_v37?

I am not with IGV, but I bet the cytoband file also comes from UCSC.

2. In b37_1kg.json
<https://s3.amazonaws.com/igv.org.genomes/1kg_v37/b37_1kg.json>, how
is "url": "https://s3.amazonaws.com/igv.org.genomes/hg19/ncbiRefSeq.sorted.txt.gz"
in the Refseq Genes track sorted and tabixed? I found there's only an
unsorted ncbiRefSeq.txt.gz in ucsc's ftp
<https://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/> site.
3. In the above Refseq Genes track, there is only 1 transcript
(NM_002944.2) for ROS1 gene, but in the ncbi GDV
<https://www.ncbi.nlm.nih.gov/genome/gdv/browser/genome/?id=GCF_000001405.25>
there are 3 transcripts for ROS1 gene (because it's using the latest
Annotation Release 105.20220307). How can I build a new *Refseq Genes*
track to be used with igv using the latest Annotation Release from Refseq
(105.20220307
<https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/annotation_releases/105.20220307/GCF_000001405.25_GRCh37.p13/>
as of writing)?

UCSC also uses 105.20220307 on hg19, and there is only one location
for NM_002944.3 (chr6:117608515-117747105). The GDV also has only
one location for NM_002944.3. On hg38 and T2T, there is only one location in GDV.
On the UCSC search, for hg38, there are three locations, but on three
different tracks (UCSC's mapping, NCBI's mapping and Gencode).
Thank you!
|
Thanks @maximilianh, you are correct, the cytoband files all come from UCSC. Tabix indexing is optional, and not really necessary for tracks < 20MB in total size. I would suggest skipping this, but if you have a need for it, the documentation is here: https://www.htslib.org/doc/tabix.html. I assume you have reviewed the documentation (https://github.com/igvteam/igv.js/wiki) that describes genomes.json and other files. |
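For readers who do want a sorted, tabix-indexed file anyway, here is a minimal sketch. It assumes the UCSC ncbiRefSeq table layout (column 3 = chrom, column 5 = txStart, column 6 = txEnd, with zero-based starts) and that htslib's bgzip and tabix are installed. The function names and file names are placeholders, and this is not necessarily how the hosted ncbiRefSeq.sorted.txt.gz was produced.

```shell
# Sort a UCSC genePred-style table (e.g. ncbiRefSeq.txt) read from stdin.
# Assumed columns: 1=bin 2=name 3=chrom 4=strand 5=txStart 6=txEnd ...
sort_genepred() {
    # Sort by chromosome name (bytewise), then numerically by transcript start.
    LC_ALL=C sort -t "$(printf '\t')" -k3,3 -k5,5n
}

# Recompress with bgzip and build a tabix index.
# $1 = input .txt.gz (plain gzip), $2 = output .txt.gz (bgzip); both are placeholders.
index_genepred() {
    # tabix needs bgzip (BGZF) compression, so the gzip file is recompressed.
    gzip -dc "$1" | sort_genepred | bgzip > "$2"
    # -s/-b/-e name the chrom/start/end columns; -0 marks starts as zero-based.
    tabix -0 -s 3 -b 5 -e 6 "$2"
}
```

Usage would look like `index_genepred ncbiRefSeq.txt.gz ncbiRefSeq.sorted.txt.gz`, which should leave a `.tbi` index next to the output file.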
Thanks @maximilianh and @jrobinso. Just want to make sure about one thing: I spot three NM_ transcripts for ROS1 gene in GDV. I guess it's because there are three NM_ transcripts in the RefSeq release 105.20220307. I downloaded GCF_000001405.25_GRCh37.p13_genomic.gtf.gz from the above ftp site, and checked:
curl -s 'https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/annotation_releases/105.20220307/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gtf.gz' \
| zcat \
| rg '(gene_id "ROS1"; transcript_id "NM_.*?")' -or '$1' \
| sort | uniq
# output
## gene_id "ROS1"; transcript_id "NM_001378891.1"
## gene_id "ROS1"; transcript_id "NM_001378902.1"
## gene_id "ROS1"; transcript_id "NM_002944.3"
If the above speculation is correct, I want to build a new igv track based on RefSeq release 105.20220307. To achieve that, I need to build a file like ncbiRefSeq.sorted.txt.gz. Could you please help me with this? Thank you! |
"I spot three NM_ transcripts for ROS1 gene in GDV" - OK, now this is
entirely different. You originally said that there were three locations for
the transcript NM_002944.3. Yes, I think your analysis is correct. The
updated file ncbiRefSeq for 105.20220307 is here
https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ncbiRefSeq.txt.gz
and as Jim mentioned, you can sort it but that's not required. You should
be able to load the file as-is, if I understood him correctly (but he may
be able to confirm that).
Jim: we're updating these files on a regular schedule now, automatically.
Maybe we can figure out a way to let you know when to update these? We can
send email or ping a URL when we update. We also keep the previous
versions, so you could in theory tag them with the release name and offer a
version history.
|
@maximilianh Thanks for the info. Actually I was considering just referencing those URLs directly (e.g. https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ncbiRefSeq.txt.gz). The only thing I do additionally is tabix index them, but really they aren't that large and the benefit is marginal. |
@YeHW You do not need to build a file like ncbiRefSeq.sorted.txt.gz. See the user documentation; all common annotation formats are supported, including gff3. |
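As an illustration of the point above, here is a sketch of a custom genome definition that references the unsorted UCSC file directly, with no index. The "cytobandURL" and track "url" fields appear in b37_1kg.json as quoted in this thread. The other field names and the "refgene" format value follow my reading of the igv.js wiki and should be treated as assumptions. The id, name, and FASTA paths are hypothetical placeholders.

```shell
# Write a minimal custom genome JSON to the path given as $1.
# The id, name, and FASTA/index paths are hypothetical placeholders.
write_genome_json() {
    cat > "$1" <<'EOF'
{
  "id": "b37_custom",
  "name": "Human (1kg, b37) custom",
  "fastaURL": "human_g1k_v37.fasta",
  "indexURL": "human_g1k_v37.fasta.fai",
  "cytobandURL": "https://s3.amazonaws.com/igv.org.genomes/1kg_v37/b37_cytoband.txt",
  "tracks": [
    {
      "name": "Refseq Genes",
      "format": "refgene",
      "url": "https://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/ncbiRefSeq.txt.gz",
      "indexed": false
    }
  ]
}
EOF
}
```

Calling `write_genome_json genome.json` produces a file that can then be adapted. If the NCBI GTF or GFF3 is used instead of the UCSC table, note that NCBI files typically name sequences by accession (for example NC_000006.11), so chromosome aliasing may be needed.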
Yes, that would be perfect. We have never, not once, changed these URLs so
far.
|
This is done. @YeHW if you update your genome by selecting "Genomes > Select Hosted Genome" from the menu you will get updated annotation. The assembly you are asking about is in the updated menu as follows |
@maximilianh @jrobinso Thank you! I can see the updated annotation. |
Hi, igv team.
I've got a human genome fasta file (human 1kg_v37) and want to set up a custom genome.json for it to use with igv. I've checked the wiki and the b37_1kg.json shipped with igv, and have some questions about the fields in the json file.
1. In b37_1kg.json, "cytobandURL": "https://s3.amazonaws.com/igv.org.genomes/1kg_v37/b37_cytoband.txt", what is the origin of this cytoband file? I found that the cytoband for GRCh38 is from UCSC. Is it the same case for 1kg_v37?
2. In b37_1kg.json, how is "url": "https://s3.amazonaws.com/igv.org.genomes/hg19/ncbiRefSeq.sorted.txt.gz" in the Refseq Genes track sorted and tabixed? I found there's only an unsorted ncbiRefSeq.txt.gz in ucsc's ftp site.
3. In the above Refseq Genes track, there is only 1 transcript (NM_002944.2) for ROS1 gene, but in the ncbi GDV there are 3 transcripts (NM_002944.3, NM_001378902.1, NM_001378891.1) for ROS1 gene (because it's using the latest Annotation Release 105.20220307). How can I build a new Refseq Genes track to be used with igv using the latest Annotation Release from Refseq (105.20220307 as of writing)?
Thank you!