IGV should always include gene annotation track #1494

mccalluc · 2016-10-20T14:19:20Z

Shannon or Nils will provide more information.

ngehlenborg · 2016-11-03T13:49:42Z

This should be tested for human (hg19), mouse (mm10), and zebrafish (danRer7).

mccalluc · 2016-11-09T00:06:42Z

Added the relevant bits to https://github.com/parklab/refinery-platform/tree/mccalluc/refgene-bed-igv ... but I think now it would work better as its own repo, since there aren't really any dependencies between it and the main project.

ngehlenborg · 2016-11-09T13:12:56Z

I agree that a separate repo is a good idea. Can you create one in the "refinery-platform" organization?

ngehlenborg · 2016-11-09T13:16:47Z

@sjhosui: For now we will use the RefGene annotations from UCSC but that might not be the preferred annotation for all users.

Also - what should we use for gene identifiers?

sjhosui · 2016-11-09T14:09:43Z

I would go with the default options you get with the standalone IGV - refseq genes with gene symbols displayed.

ngehlenborg · 2016-11-09T14:17:19Z

Unfortunately we were unable to find that information in the context of IGV for the species that we want to support.

Where would you go to get that specific information for the species that we want to support?

sjhosui · 2016-11-09T18:36:21Z

I believe IGV stores that information in a .genome file. Is that how IGV web takes it? You can create that file within the standalone IGV. Is this what you're asking?

mccalluc · 2016-11-09T20:44:07Z

Pete helped me find it: it's in refGene.txt, I had just overlooked it. The script is now at https://github.com/refinery-platform/get-reference-genomes, and it has some tests so we can be sure we're getting things in the right format.

Still need to add configs in the refinery JSON that gets generated.

mccalluc · 2016-11-09T21:04:29Z

Here are the results of processing the data we can get from UCSC:

get-reference-genomes$ head /tmp/genomes/danrer7/refGene.bed
Zv9_NA110   3369    25536   tec
Zv9_NA119   1161    3883    ppp4r1
Zv9_NA122   0   13196   etv6
Zv9_NA123   39150   44920   zgc:165507
Zv9_NA15    18564   21757   plgrkt
Zv9_NA154   29657   32056   trim32
Zv9_NA157   1146    3606    commd10
Zv9_NA165   33  404 zgc:66388
Zv9_NA18    71  3551    pde8a
Zv9_NA192   15876   22659   mrps16
get-reference-genomes$ head /tmp/genomes/mm10/refGene.bed
chr1    3214481 3671498 Xkr4
chr1    4290845 4409241 Rp1
chr1    4343506 4360314 Rp1
chr1    4490927 4497354 Sox17
chr1    4490927 4497354 Sox17
chr1    4490927 4497354 Sox17
chr1    4490927 4497354 Sox17
chr1    4490927 4497354 Sox17
chr1    4773199 4785726 Mrpl15
chr1    4773199 4785726 Mrpl15

Should more columns be included, or should the duplicate rows be removed?
Do the danRer chromosome names make sense?
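
On the duplicate rows: they presumably arise because refGene.txt has one row per transcript, so several transcripts of one gene can collapse to the same four columns. If removing them turns out to be the right call, a minimal Python sketch could look like this. The function name `dedupe_bed_lines` is hypothetical and not part of get-reference-genomes, and the sample rows are copied from the mm10 output above.

```python
def dedupe_bed_lines(lines):
    """Drop exact duplicate BED rows, keeping the first occurrence of each in order."""
    seen = set()
    unique = []
    for line in lines:
        row = tuple(line.split())  # split on any whitespace (tabs or spaces)
        if not row or row in seen:
            continue  # skip blank lines and rows we have already emitted
        seen.add(row)
        unique.append("\t".join(row))  # normalize the separator to a single tab
    return unique


rows = [
    "chr1\t4490927\t4497354\tSox17",
    "chr1\t4490927\t4497354\tSox17",
    "chr1\t4773199\t4785726\tMrpl15",
]
print("\n".join(dedupe_bed_lines(rows)))
```

This only removes rows that are identical in every column. Rows such as the two Rp1 entries, which have different start coordinates, would both be kept.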

ngehlenborg · 2016-11-10T14:07:15Z

We also need the strand information (usually + or -). I think the default columns in a BED file are chr, start, end, but there are often additional columns, in this order: name, score, strand.

For the genome annotation it might also be useful to have not only gene start and gene end but also information about the intron/exon structure, which can be embedded in the last three columns of the BED file (assuming we can readily get that information from UCSC): https://genome.ucsc.edu/FAQ/FAQformat#format1 (see columns 10, 11, 12)
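
To make the column layout concrete, here is a rough Python sketch that turns one refGene.txt row into a 12-column BED row with strand and exon blocks. It assumes the UCSC refGene column order (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, ...). The function name `refgene_to_bed12` is hypothetical, the sample row is made up for illustration, and this is not necessarily how get-reference-genomes does it.

```python
def refgene_to_bed12(line):
    """Convert one tab-separated refGene.txt row to a BED12 row (sketch)."""
    f = line.rstrip("\n").split("\t")
    # Assumed refGene layout: f[2]=chrom, f[3]=strand, f[4]=txStart, f[5]=txEnd,
    # f[6]=cdsStart, f[7]=cdsEnd, f[9]=exonStarts, f[10]=exonEnds, f[12]=name2.
    chrom, strand = f[2], f[3]
    tx_start, tx_end = int(f[4]), int(f[5])
    cds_start, cds_end = f[6], f[7]
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    exon_starts = [int(x) for x in f[9].split(",") if x]
    exon_ends = [int(x) for x in f[10].split(",") if x]
    symbol = f[12]  # gene symbol, used as the BED name column

    block_sizes = [end - start for start, end in zip(exon_starts, exon_ends)]
    block_starts = [start - tx_start for start in exon_starts]  # relative to txStart

    return "\t".join([
        chrom, str(tx_start), str(tx_end),   # columns 1-3
        symbol, "0", strand,                 # columns 4-6: name, score, strand
        cds_start, cds_end, "0",             # columns 7-9: thickStart, thickEnd, itemRgb
        str(len(exon_starts)),               # column 10: blockCount
        ",".join(map(str, block_sizes)) + ",",   # column 11: blockSizes
        ",".join(map(str, block_starts)) + ",",  # column 12: blockStarts
    ])


# Made-up example row, not real annotation data.
example = "0\tNM_000000\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,\t0\tGENE1\tcmpl\tcmpl\t0,1,"
print(refgene_to_bed12(example))
```

The score column is set to 0 here because the sketch has nothing meaningful to put there; whether a different value is preferable is an open question.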

I will put this on the agenda for the meeting today.

mccalluc · 2016-11-10T18:57:19Z

@ngehlenborg : copied your last comment over to refinery-platform/get-reference-genomes#4, since that's what determines what data is on s3.

Ilya will be merging the PR, so I'll close it for now.

mccalluc self-assigned this Oct 20, 2016

mccalluc added enhancement ui visualization labels Oct 20, 2016

mccalluc added this to the Next milestone Oct 20, 2016

ngehlenborg modified the milestones: v1.6.0, Next Nov 3, 2016

mccalluc added a commit that referenced this issue Nov 10, 2016

Add gene annotation track to IGV. Fix #1494.

63ba0dd

mccalluc closed this as completed Nov 10, 2016

hackdna pushed a commit that referenced this issue Nov 10, 2016

Add gene annotation track to IGV. Fix #1494. (#1516)

3d26a94
