Parsing Chromosome and Gene Names Given by TOGA DATA #145
-
Is there a standard for mapping the chromosome names output in TOGA geneAnnotation.bed files (e.g. CMO21918) to more standard chromosome names (e.g. chr1...)? More generally, I'm trying to analyze TOGA-annotated orthologous query genes in genome browsers (UCSC and Ensembl), and I am struggling to find a chromosome or gene naming standard within TOGA data that either browser accepts as a query. I may well be overlooking something, but please let me know if there is a way to map TOGA's chromosome names or gene IDs to formats that can be queried in genome browsers. Thanks!
Replies: 2 comments 2 replies
-
Hi, the overview.tsv (e.g. https://genome.senckenberg.de/download/TOGA/human_hg38_reference/overview.table.tsv for human) provides the NCBI assembly accession in column F. We used these assemblies and kept their chromosome names. Column E of the tsv gives the internal or UCSC assembly name. Any assembly whose name starts with HL was obtained from NCBI or DNAzoo. All others (e.g. balAcu1) are assemblies from UCSC. For all assemblies, we provide UCSC genome browsers showing the TOGA tracks. Just go to https://genome.senckenberg.de/ and click "517 mammals" or "501 birds". I don't know which assemblies match those available on Ensembl, but Ensembl should also list NCBI accessions. Please note that Ensembl sometimes converts chr1 to 1. Hope this helps.
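The column E / column F lookup described in this reply can be sketched in Python. This is only a minimal sketch, assuming the overview table is tab-separated with one header row and that columns E and F are the fifth and sixth columns. The sample rows and accession strings are made up for illustration, and `load_assembly_accessions`, `is_ucsc_assembly`, and `to_ensembl_style` are hypothetical helper names, not part of TOGA.

```python
import csv
import io


def load_assembly_accessions(tsv_text, name_col=4, accession_col=5):
    """Map assembly name (column E) to NCBI accession (column F).

    Columns are addressed by 0-based position because the header labels
    of the real table are not assumed here.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    mapping = {}
    for row in reader:
        if len(row) <= max(name_col, accession_col):
            continue  # skip short or empty rows
        name = row[name_col].strip()
        if name:
            mapping[name] = row[accession_col].strip()
    return mapping


def is_ucsc_assembly(assembly_name):
    """Per the reply: names starting with "HL" are from NCBI or DNAzoo."""
    return not assembly_name.startswith("HL")


def to_ensembl_style(chrom):
    """Drop a leading "chr" (chr1 -> 1), as Ensembl sometimes does.

    Other names (mitochondrial, unplaced scaffolds) may differ between
    resources and are not handled by this simple rule.
    """
    if chrom.startswith("chr") and len(chrom) > 3:
        return chrom[3:]
    return chrom


# Made-up rows for illustration only; real values come from the overview table.
SAMPLE = (
    "A\tB\tC\tD\tE\tF\n"
    "x\tx\tx\tx\tbalAcu1\tGCF_EXAMPLE_1\n"
    "x\tx\tx\tx\tHLexample1\tGCA_EXAMPLE_2\n"
)

accessions = load_assembly_accessions(SAMPLE)
```

With the real table, the downloaded file's text would be passed in place of `SAMPLE`; the resulting accession can then be looked up at NCBI to find the sequence names used in that assembly.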
-
We don't have an API, but our browser works just like UCSC's (mostly the same code).
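Since the browser is said to work like UCSC's, links to a region could in principle be built the way UCSC Genome Browser links are, with `db` and `position` query parameters on `hgTracks`. The sketch below assumes, without confirmation from this thread, that the Senckenberg server exposes the same `cgi-bin/hgTracks` path and that the assembly name from column E is the `db` value. `browser_url` is a hypothetical helper and the scaffold name is a placeholder.

```python
from urllib.parse import urlencode

# Assumption: same entry point as the UCSC Genome Browser (not confirmed here).
SENCKENBERG_HGTRACKS = "https://genome.senckenberg.de/cgi-bin/hgTracks"


def browser_url(db, chrom, start, end, base=SENCKENBERG_HGTRACKS):
    """Build a UCSC-style hgTracks link for one region.

    start is 0-based as in BED files; browser positions are 1-based,
    so 1 is added to the start coordinate.
    """
    position = f"{chrom}:{start + 1}-{end}"
    query = urlencode({"db": db, "position": position}, safe=":")
    return f"{base}?{query}"


# Placeholder region on a placeholder scaffold name.
example = browser_url("balAcu1", "scaffold_1", 100, 200)
```

Passing a different `base` (for example the UCSC `hgTracks` address) reuses the same function for assemblies hosted at UCSC.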
Some assemblies are from DNAzoo. For those, we provide the assembly as a 2bit file at https://genome.senckenberg.de/download/TOGA/MammalianDNAZooAssemblies/ and you can use twoBitInfo to extract the list of all chroms/scaffolds.
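As a rough sketch of the twoBitInfo step, the Python below runs `twoBitInfo <file> stdout` (which prints one name and size per line, tab-separated) and then checks which chromosome names in a BED file are missing from the 2bit. It assumes the UCSC `twoBitInfo` binary is on the PATH. The function names and the scaffold and gene names in the sample are hypothetical.

```python
import subprocess


def parse_chrom_sizes(text):
    """Parse twoBitInfo output: one "name<TAB>size" pair per line."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, size = line.split("\t")[:2]
        sizes[name] = int(size)
    return sizes


def chrom_sizes_from_2bit(path):
    """Run twoBitInfo (must be installed) and return {name: size}."""
    result = subprocess.run(
        ["twoBitInfo", path, "stdout"],
        check=True, capture_output=True, text=True,
    )
    return parse_chrom_sizes(result.stdout)


def bed_chroms_missing(bed_text, sizes):
    """Return sorted chrom names used in BED text but absent from sizes."""
    missing = set()
    for line in bed_text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("track", "browser") or line.startswith("#"):
            continue  # skip blank lines, track/browser headers, comments
        if fields[0] not in sizes:
            missing.add(fields[0])
    return sorted(missing)


# Made-up data standing in for twoBitInfo output and a geneAnnotation.bed file.
sizes = parse_chrom_sizes("scaffold_1\t1000\nscaffold_2\t500\n")
missing = bed_chroms_missing(
    "scaffold_1\t0\t10\tgeneA\nscaffold_9\t5\t20\tgeneB\n", sizes
)
```

An empty `missing` list would suggest the BED file and the 2bit use the same naming scheme; any names left over would need a separate mapping.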