IGV should always include gene annotation track #1494
Comments
This should be tested for human (hg19), mouse (mm10), and zebrafish (danRer7).
Added the relevant bits to https://github.com/parklab/refinery-platform/tree/mccalluc/refgene-bed-igv ... but I think now it would work better as its own repo, since there aren't really any dependencies between it and the main project.
I agree that a separate repo is a good idea. Can you create one in the "refinery-platform" organization?
@sjhosui: For now we will use the RefGene annotations from UCSC but that might not be the preferred annotation for all users. Also - what should we use for gene identifiers?
I would go with the default options you get with the standalone IGV - refseq genes with gene symbols displayed.
Unfortunately, we were unable to find that information in the context of IGV for the species that we want to support. Where would you go to get it for those species?
I believe IGV stores that information in a .genome file. Is that how IGV web takes it? You can create that file within the standalone IGV. Is this what you're asking?
Pete helped me find it: it's in refGene.txt; I had just overlooked it. The script is now at https://github.com/refinery-platform/get-reference-genomes, and it has some tests so we can be sure we're getting things in the right format. We still need to add configs in the Refinery JSON that gets generated.
Here are the results of processing the data we can get from UCSC:
We also need the strand information (usually ...). For the genome annotation it might also be useful to have not only gene start and gene end but also information about the intron/exon structure, which can be embedded in the last three columns of the BED file (assuming we can readily get that information from UCSC): https://genome.ucsc.edu/FAQ/FAQformat#format1 (see columns 10, 11, 12). I will put this on the agenda for the meeting today.
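For reference, here is a rough sketch of how strand and exon structure could be carried from a refGene.txt row into a 12-column BED line. It is not the script in get-reference-genomes; the function name `refgene_row_to_bed12` and the sample row are hypothetical, and the column order assumes the standard UCSC refGene table layout (leading `bin` column, gene symbol in `name2`).

```python
def refgene_row_to_bed12(line):
    """Convert one tab-separated refGene.txt row into a BED12 line.

    Assumes the UCSC refGene column order:
    bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd,
    exonCount, exonStarts, exonEnds, score, name2, ...
    """
    fields = line.rstrip("\n").split("\t")
    (_bin, _name, chrom, strand, tx_start, tx_end, cds_start, cds_end,
     exon_count, exon_starts, exon_ends, _score, name2) = fields[:13]

    start = int(tx_start)
    # exonStarts/exonEnds are comma-separated and end with a trailing comma.
    starts = [int(s) for s in exon_starts.split(",") if s]
    ends = [int(e) for e in exon_ends.split(",") if e]

    # BED columns 10-12: blockCount, blockSizes, and blockStarts
    # (block starts are relative to chromStart).
    block_sizes = ",".join(str(e - s) for s, e in zip(starts, ends))
    block_starts = ",".join(str(s - start) for s in starts)

    bed = [chrom, tx_start, tx_end, name2, "0", strand,
           cds_start, cds_end, "0", exon_count, block_sizes, block_starts]
    return "\t".join(bed)


# Made-up row for illustration only (not real annotation data).
example_row = "\t".join([
    "585", "NM_000001", "chr1", "+", "1000", "2000", "1100", "1900",
    "2", "1000,1500,", "1200,2000,", "0", "GENE1", "cmpl", "cmpl", "0,1,",
])
print(refgene_row_to_bed12(example_row))
# chr1  1000  2000  GENE1  0  +  1100  1900  0  2  200,500  0,500  (tab-separated)
```

Using `name2` for the BED name column gives gene symbols rather than RefSeq accessions, which matches the standalone IGV default mentioned above.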
@ngehlenborg: copied your last comment over to refinery-platform/get-reference-genomes#4, since that's what determines what data is on S3. Ilya will be merging the PR, so I'll close it for now.
Shannon or Nils will provide more information.