IGV should always include gene annotation track #1494

Closed
mccalluc opened this issue Oct 20, 2016 · 11 comments

Comments

@mccalluc
Member

Shannon or Nils will provide more information.

@mccalluc mccalluc self-assigned this Oct 20, 2016
@mccalluc mccalluc added this to the Next milestone Oct 20, 2016
@ngehlenborg
Contributor

This should be tested for human (hg19), mouse (mm10), and zebrafish (danrer7).

@ngehlenborg ngehlenborg modified the milestones: v1.6.0, Next Nov 3, 2016
@mccalluc
Member Author

mccalluc commented Nov 9, 2016

Added the relevant bits to https://github.com/parklab/refinery-platform/tree/mccalluc/refgene-bed-igv ... but I think now it would work better as its own repo, since there aren't really any dependencies between it and the main project.

@ngehlenborg
Contributor

I agree that a separate repo is a good idea. Can you create one in the "refinery-platform" organization?

@ngehlenborg
Contributor

@sjhosui: For now we will use the RefGene annotations from UCSC but that might not be the preferred annotation for all users.

Also - what should we use for gene identifiers?

@sjhosui
Collaborator

sjhosui commented Nov 9, 2016

I would go with the default options you get with the standalone IGV - refseq genes with gene symbols displayed.

@ngehlenborg
Contributor

Unfortunately we were unable to find that information in the context of IGV for the species that we want to support.

Where would you go to get that specific information for the species that we want to support?

@sjhosui
Collaborator

sjhosui commented Nov 9, 2016

I believe IGV stores that information in a .genome file. Is that how IGV web takes it? You can create that file within the standalone IGV. Is this what you're asking?

@mccalluc
Member Author

mccalluc commented Nov 9, 2016

Pete helped me find it: it's in refGene.txt; I had just overlooked it. The script is now at https://github.com/refinery-platform/get-reference-genomes, and it has some tests so we can be sure we're getting things in the right format.

Still need to add configs in the refinery JSON that gets generated.

@mccalluc
Member Author

mccalluc commented Nov 9, 2016

Here are the results of processing the data we can get from UCSC:

get-reference-genomes$ head /tmp/genomes/danrer7/refGene.bed
Zv9_NA110   3369    25536   tec
Zv9_NA119   1161    3883    ppp4r1
Zv9_NA122   0   13196   etv6
Zv9_NA123   39150   44920   zgc:165507
Zv9_NA15    18564   21757   plgrkt
Zv9_NA154   29657   32056   trim32
Zv9_NA157   1146    3606    commd10
Zv9_NA165   33  404 zgc:66388
Zv9_NA18    71  3551    pde8a
Zv9_NA192   15876   22659   mrps16
get-reference-genomes$ head /tmp/genomes/mm10/refGene.bed
chr1    3214481 3671498 Xkr4
chr1    4290845 4409241 Rp1
chr1    4343506 4360314 Rp1
chr1    4490927 4497354 Sox17
chr1    4490927 4497354 Sox17
chr1    4490927 4497354 Sox17
chr1    4490927 4497354 Sox17
chr1    4490927 4497354 Sox17
chr1    4773199 4785726 Mrpl15
chr1    4773199 4785726 Mrpl15
  • Should more columns be included, or should the duplicate rows be removed?
  • Do the danRer chromosome names make sense?
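If the duplicates turn out to be unwanted, dropping them is straightforward. The sketch below is only an illustration, not what the get-reference-genomes script does: `unique_rows` is a hypothetical helper that keeps the first occurrence of each row and preserves order. The repeated rows presumably come from several transcripts of one gene sharing the same coordinates once only four columns are kept, but that is an assumption.

```python
def unique_rows(lines):
    """Return the rows in their original order, keeping only the first copy of each.

    Hypothetical helper for illustration; rows are compared after splitting on
    whitespace, so differences in spacing or tabs do not matter.
    """
    seen = set()
    kept = []
    for line in lines:
        key = tuple(line.split())
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# The ten mm10 rows shown above.
mm10_rows = [
    "chr1    3214481 3671498 Xkr4",
    "chr1    4290845 4409241 Rp1",
    "chr1    4343506 4360314 Rp1",
    "chr1    4490927 4497354 Sox17",
    "chr1    4490927 4497354 Sox17",
    "chr1    4490927 4497354 Sox17",
    "chr1    4490927 4497354 Sox17",
    "chr1    4490927 4497354 Sox17",
    "chr1    4773199 4785726 Mrpl15",
    "chr1    4773199 4785726 Mrpl15",
]

for row in unique_rows(mm10_rows):
    print(row)
```

Note that the two Rp1 rows have different coordinates, so both survive; collapsing them into one row per gene would be a separate decision.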

@ngehlenborg
Contributor

We also need the strand information (usually + or -). I think the required columns in a BED file are chr, start, end, but often there are the following additional columns, in this order: name, score, strand.

For the genome annotation it might also be useful to have not only gene start and gene end but also information about the intron/exon structure, which can be embedded in the last three columns of the BED file (assuming we can readily get that information from UCSC): https://genome.ucsc.edu/FAQ/FAQformat#format1 (see columns 10, 11, 12)

I will put this on the agenda for the meeting today.
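To make the column discussion concrete, here is a rough sketch of how one refGene.txt row could be turned into a 12-column BED row with strand and exon structure. It is not the code in get-reference-genomes: `refgene_row_to_bed12` is a hypothetical name, the column positions are assumed from the UCSC refGene table schema (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, ...), and the sample row is made up.

```python
def refgene_row_to_bed12(line):
    """Convert one tab-separated refGene.txt row into a BED12 row.

    Hypothetical helper; column positions are an assumption based on the
    UCSC refGene table schema.
    """
    f = line.rstrip("\n").split("\t")
    chrom, strand = f[2], f[3]
    tx_start, tx_end = int(f[4]), int(f[5])
    cds_start, cds_end = f[6], f[7]
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    exon_starts = [int(x) for x in f[9].split(",") if x]
    exon_ends = [int(x) for x in f[10].split(",") if x]
    symbol = f[12]  # name2, the gene symbol

    # BED12 stores exons as sizes plus starts relative to chromStart.
    block_sizes = [end - start for start, end in zip(exon_starts, exon_ends)]
    block_starts = [start - tx_start for start in exon_starts]

    return "\t".join([
        chrom, str(tx_start), str(tx_end),
        symbol,                 # column 4: name
        "0",                    # column 5: score (placeholder)
        strand,                 # column 6: strand
        cds_start, cds_end,     # columns 7-8: thickStart, thickEnd
        "0",                    # column 9: itemRgb (placeholder)
        str(len(exon_starts)),  # column 10: blockCount
        ",".join(map(str, block_sizes)),   # column 11: blockSizes
        ",".join(map(str, block_starts)),  # column 12: blockStarts
    ])


# Made-up row for illustration only, not real annotation data.
sample = "0\tNM_000000\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,\t0\tGENE1\tcmpl\tcmpl\t0,1,"
print(refgene_row_to_bed12(sample))
```

Using the gene symbol (name2) as the BED name follows the suggestion earlier in the thread to display gene symbols; the accession in the second column would be the alternative.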

@mccalluc
Member Author

@ngehlenborg : copied your last comment over to refinery-platform/get-reference-genomes#4, since that's what determines what data is on s3.

Ilya will be merging the PR, so I'll close this issue for now.
