geneOncoX

Which human genes are implicated in tumor development?

geneOncoX is an R package that addresses this question by integrating a number of resources on the functional roles of cancer genes and on their representation in commercially available targeted sequencing assays (gene panels). The integrated resources, with their citations and licensing terms, are listed under IMPORTANT NOTE.

The package offers a few pre-processed datasets, along with metadata, that users can retrieve for their own projects or set-ups. It uses the googledrive R package to download the pre-processed and documented datasets to a local cache directory provided by the user.

Installation

remotes::install_github('sigven/geneOncoX')

Usage

The package currently offers five functions, each of which retrieves a specific dataset that can be useful for gene annotation purposes.

  • get_basic() - retrieves basic, non-transcript-specific gene annotations. Includes tumor suppressor gene/oncogene/driver annotations from multiple resources, NCBI gene summary descriptions, and multiple predictions/scores of gene indispensability and loss-of-function tolerance

  • get_gencode() - retrieves two datasets (grch37 and grch38) with human gene transcripts from GENCODE, including cross-references to RefSeq, UniProt, APPRIS, and MANE

  • get_alias() - retrieves a list of gene synonyms, indicating which synonyms are ambiguous or nonambiguous (with respect to primary gene symbols)

  • get_predisposition() - retrieves a list of genes of relevance for cancer predisposition, utilizing multiple resources, including Cancer Gene Census, Genomics England PanelApp, TCGA's PanCancer study, and others.

  • get_panels() - retrieves a collection of > 40 different panels for various cancer conditions, as found in the Genomics England PanelApp.
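As a rough illustration of how one of these functions might be called, the sketch below creates a cache directory and retrieves the basic gene annotations. It assumes that get_basic() accepts an argument named `cache_dir` for the user-provided cache directory; the argument name is not stated in this README, so check `?geneOncoX::get_basic` before relying on it. The cache location itself is an arbitrary choice made for the example.

```r
# Sketch only: the `cache_dir` argument name is an assumption, not taken
# from this README. The cache location below is an arbitrary example.
cache_dir <- file.path(tempdir(), "geneOncoX_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Only attempt the download if the package is installed
if (requireNamespace("geneOncoX", quietly = TRUE)) {
  gene_basic <- geneOncoX::get_basic(cache_dir = cache_dir)

  # Show the top-level elements of the returned object
  str(gene_basic, max.level = 1)
}
```

The other retrieval functions would presumably be called the same way, with the same caveat about argument names.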

Each dataset is returned as an R list object with two elements:

  • a metadata data frame that lists URLs, citations, and versions of underlying resources
  • a records data frame that contains the actual gene/transcript annotations
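The layout described above can be sketched with a small mock object, so the example runs without downloading anything. Every column name and value here is made up for illustration and does not reflect the actual geneOncoX columns; only the two element names, metadata and records, come from the description above.

```r
# Mock object mimicking the described layout. All column names and values
# are hypothetical and are not taken from geneOncoX.
mock_dataset <- list(
  metadata = data.frame(
    source = c("ResourceA", "ResourceB"),
    version = c("v1", "v2"),
    stringsAsFactors = FALSE
  ),
  records = data.frame(
    symbol = c("GENE1", "GENE2", "GENE3"),
    flagged = c(TRUE, FALSE, TRUE),
    stringsAsFactors = FALSE
  )
)

# Check provenance (sources and versions) before using the annotations
names(mock_dataset)
mock_dataset$metadata

# Work with the annotation table like any other data frame
flagged_genes <- subset(mock_dataset$records, flagged)$symbol
flagged_genes
```

With a real dataset, the same pattern applies: inspect the metadata element for the versions and citations of the underlying resources, then filter or join the records element as needed.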

IMPORTANT NOTE

If you use the datasets provided with geneOncoX, make sure you properly cite the original publications of the integrated resources and comply with their licensing terms:

  1. IntOGen - Martínez-Jiménez et al., Nat Rev Cancer, 2020 - CC0 1.0
  2. CancerMine - Lever et al., Nat Methods, 2019 - CC0 1.0
  3. Network of Cancer Genes - Repana et al., Genome Biol, 2019 - Open Access
  4. Cancer Gene Census - Sondka et al., Nat Rev Cancer, 2018 - Free for non-commercial, academic use - for commercial usage see https://cancer.sanger.ac.uk/cosmic/license
  5. DNA repair genes database - Woods et al., Science, 2001 - Open Access
  6. dbNSFP - Liu et al., Genome Med, 2020 - Open Access
  7. Genomics England PanelApp - Martin et al., Nat Genet, 2019 - Commercial use requires separate agreement with GEL, see licensing terms
  8. GENCODE - Frankish et al., Nucleic Acids Res, 2021 - Open Access

Contact

sigven AT ifi.uio.no