List of gene lists for genomic analyses.
Latest commit 0d0c637 May 23, 2016 Daniel Birnbaum New key for accessing OMIM API hard-coded to parse_omim script. Also,…
… re-downloaded OMIM data


List of gene lists

Often in bioinformatics we want a list of genes so that we can ask, "are genes in this list more X than other genes?" or "are genes in this list enriched in this other list?" and so on. There are many useful lists out there, but many of them are buried in an Excel supplement to a paper, come in an XML format with loads of other info you don't need, or use outdated gene symbols. For one reason or another, it often takes a lot of work to wrestle them into a format you can use. This repository is the MacArthur Lab's effort to collect all the lists we find useful into one place, with each formatted as just a single-column text file listing the current gene symbols.
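As an illustration of the "is this list enriched in this other list?" question, here is a minimal sketch of a one-sided hypergeometric overlap test. It uses only the Python standard library. The function names and the file paths in the usage note are hypothetical and are not part of this repo's code.

```python
from math import comb
from pathlib import Path


def read_gene_list(path):
    """Read a single-column text file of gene symbols into a set."""
    text = Path(path).read_text()
    return {line.strip() for line in text.splitlines() if line.strip()}


def enrichment_pvalue(universe, list_a, list_b):
    """Return (overlap, p) where p = P(overlap >= observed).

    The null model draws len(list_b) genes at random from the universe
    and counts how many of them fall in list_a (hypergeometric tail).
    """
    # Restrict both lists to the universe so the counts are consistent.
    a = set(list_a) & set(universe)
    b = set(list_b) & set(universe)
    n_total, n_a, n_b = len(set(universe)), len(a), len(b)
    k = len(a & b)
    tail = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / comb(n_total, n_b)
```

A possible use, assuming three list files exist at paths of your choosing: `enrichment_pvalue(read_gene_list("universe.txt"), read_gene_list("list_a.txt"), read_gene_list("list_b.txt"))`. For real analyses you may prefer an established statistics package; this sketch only shows the shape of the calculation.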

Here is a guide to the lists we currently have in this repo:

| List | Count | Description | Please cite |
| --- | --- | --- | --- |
| Universe | 18,991 | Approved symbols for 18,991 protein-coding genes according to HGNC as of Feb 9, 2015. For details see src/create_universe.bash. This list is the "universe" of which all subsequent lists are subsets. | See genenames.org/about/overview. Users are asked to web reference "HUGO Gene Nomenclature Committee at the European Bioinformatics Institute" (http://www.genenames.org/) if possible. |
| FDA-approved drug targets | 286 | Genes whose protein products are known to be the mechanistic targets of FDA-approved drugs. For details on the exact criteria we used for inclusion in this list, see src/drug_targets.py | See drugbank.ca/about. Please cite [Law 2014, Knox 2011, Wishart 2008 and/or Wishart 2006]. |
| Drug targets by Nelson et al 2012 | 201 | Drug targets according to Nelson et al 2012, with reference to Russ & Lampel 2005. | [Nelson 2012, Russ & Lampel 2005] |
| Autosomal dominant genes by Blekhman et al 2008 | 307 | OMIM disease genes deemed to follow autosomal dominant inheritance according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
| Autosomal dominant genes by Berg et al 2013 | 631 | OMIM disease genes (as of June 2011) deemed to follow autosomal dominant inheritance according to Berg et al, 2013. | [Berg 2013] |
| Autosomal recessive genes by Blekhman et al 2008 | 529 | OMIM disease genes deemed to follow autosomal recessive inheritance according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
| Autosomal recessive genes by Berg et al 2013 | 1,073 | OMIM disease genes (as of June 2011) deemed to follow autosomal recessive inheritance according to Berg et al, 2013. | [Berg 2013] |
| X-linked genes by Blekhman et al 2008 | 66 | OMIM disease genes deemed to follow X-linked inheritance (dominant/recessive not specified) according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
| X-linked recessive genes by Berg et al 2013 | 102 | OMIM disease genes (as of June 2011) deemed to follow X-linked recessive inheritance according to Berg et al, 2013. | [Berg 2013] |
| X-linked dominant genes by Berg et al 2013 | 34 | OMIM disease genes (as of June 2011) deemed to follow X-linked dominant inheritance according to Berg et al, 2013. | [Berg 2013] |
| X-linked ClinVar genes | 61 | X chromosome genes in the August 6, 2015 ClinVar release that have at least 3 reportedly pathogenic, non-conflicted variants in ClinVar with at least one submitter other than OMIM or GeneReviews. Code here. | Cite the ClinVar paper [Landrum 2014] |
| All dominant genes | 709 | Currently the union of the Berg and Blekhman dominant lists; we may add more lists later. | [Blekhman 2008, Berg 2013] |
| All recessive genes | 1,183 | Currently the union of the Berg and Blekhman recessive lists; we may add more lists later. | [Blekhman 2008, Berg 2013] |
| Essential in culture | 285 | Genes deemed essential in multiple cultured cell lines based on shRNA screen data | [Hart 2014] |
| Essential in mice | 2,454 | Genes where homozygous knockout in mice results in pre-, peri- or post-natal lethality. The mouse phenotypes were reported by Jackson Labs [Blake 2011], then the essential gene list was extracted via manual review of phenotypes by [Georgi 2013], and the essential/non-essential flag was put into dbNSFP [Liu 2013]. We extracted the genes from dbNSFP. | [Blake 2011, Georgi 2013, and Liu 2013] |
| Genes nearest to GWAS peaks | 3,762 | Closest gene 3' and 5' of GWAS hits in the NHGRI GWAS catalog as of Feb 9, 2015 | See instructions here. Cite [Welter 2014] and include a web reference to genome.gov/gwastudies/. |
| DNA Repair Genes, WoodRD | 178 | An updated inventory of human DNA repair genes. (Last modified on Tuesday 15th April 2014). For details see src/DRG_WoodRD.R | Cite [Wood 2005] and include a web reference to this URL. |
| DNA Repair Genes, KangJ | 151 | Supplementary Table 1. 151 DNA repair genes. DNA repair genes from DNA repair pathways: ATM, BER, FA/HR, MMR, NHEJ, NER, TLS, XLR, RECQ, and other. | Cite [Kang 2012] |
| ClinGen haploinsufficient genes | 221 | Genes with sufficient evidence for dosage pathogenicity (level 3) as determined by the ClinGen Dosage Sensitivity Map as of Feb 27, 2015 | See ClinGen |
| Olfactory receptors | 371 | Olfactory receptors from the Mainland 2015 data release | Mainland 2015 |
| Genes with any disease association reported in ClinVar | 3,078 | Using this simple script, we downloaded the ClinVar tab-delimited summary as of May 12, 2015, and took all gene symbols for which there is at least one variant with an assertion of pathogenic or likely pathogenic in ClinVar. | Cite the ClinVar paper [Landrum 2014] |
| Kinases | 351 | From UniProt's pkinfam list | According to UniProt this list is based on 3 publications: [Hunter 2000, Manning 2002, Miranda-Saavedra & Barton 2007] |
| GPCRs | 1,705 | GPCR list from guidetopharmacology.org | Please read citing instructions here and at a minimum, cite [Pawson 2014]. |
| Natural product targets | 37 | List of hand-curated targets of natural products from supplement of [Dancik 2010] | [Dancik 2010] |
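Two relationships from the guide above are easy to check in code: every list should be a subset of the universe, and the "All dominant" and "All recessive" lists are unions of the Berg and Blekhman lists. The following is a minimal sketch of both checks on in-memory sets. The helper names are hypothetical and this is not the script used to build the lists.

```python
def union_of_lists(*gene_lists):
    """Combine any number of gene lists into one set of symbols."""
    return set().union(*gene_lists)


def outside_universe(universe, gene_list):
    """Return the sorted symbols in gene_list that are not in the universe.

    An empty result means gene_list is a subset of the universe.
    """
    return sorted(set(gene_list) - set(universe))
```

With the two dominant lists loaded as sets, `union_of_lists(berg_dominant, blekhman_dominant)` would give a combined dominant list, and `outside_universe(universe, some_list)` would flag symbols that need updating or removal (the variable names here are placeholders).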

We welcome pull requests that add new lists, provided they are licensed for redistribution. If possible, please provide the source code used to extract the list from its original source, and an appropriate description for this README.
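If you are preparing a list to contribute, a small helper like the one below can put raw symbols into the single-column layout described above. This is only a sketch under assumptions: the function name is hypothetical, and it does not map outdated symbols to current ones, so that step still has to be done against HGNC separately.

```python
def format_gene_list(symbols):
    """Return single-column text: one symbol per line, deduplicated and sorted.

    Surrounding whitespace is stripped and blank entries are dropped.
    Symbol case is left untouched.
    """
    cleaned = sorted({s.strip() for s in symbols if s.strip()})
    return "".join(f"{symbol}\n" for symbol in cleaned)
```

The returned text can be written out with, for example, `pathlib.Path("my_list.txt").write_text(format_gene_list(raw_symbols))`, where the file name and `raw_symbols` are placeholders.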