Gene networks in cancer are biased by aneuploidies and sample impurities

Gene regulatory network inference is a standard technique for obtaining structured regulatory information from, for instance, gene expression measurements. Methods performing this task have been extensively evaluated on synthetic and, to a lesser extent, real data sets. In contrast to these test evaluations, applications to gene expression data of human cancers are often limited by fewer samples and more potential regulatory links, and are biased by copy number aberrations as well as cell mixtures and sample impurities. Here, we take networks inferred from TCGA cohorts as an example to show that (1) transcription factor annotations are essential to obtain reliable networks, and (2) even for state-of-the-art methods, we expect that between 20 and 80% of edges are caused by copy number changes and cell mixtures rather than transcription factor regulation.

This is the analysis code for our paper at https://doi.org/10.1016/j.bbagrm.2019.194444

It was used to generate all figures in the article (see the report directory). The main conclusions are outlined below.

Findings

DREAM evaluations differ from cancer data

Gene network inference methods have been extensively validated using consortium benchmarks such as the DREAM challenges. However, the cancer data these methods are often used on have many more genes and fewer samples than those in the evaluation.
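The effect of having many genes but few samples can be sketched with a toy simulation. This is not part of the analysis code in this repository (which is written in R); the function names, gene counts, and sample sizes below are assumptions chosen only for illustration. All genes are drawn independently, so any strong correlation found is spurious.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n_genes, n_samples, seed=0):
    """Largest absolute correlation among genes that are truly independent."""
    rng = random.Random(seed)
    # Each gene is pure noise: there is no regulation in this toy data set.
    expr = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_genes)]
    return max(
        abs(pearson(expr[i], expr[j]))
        for i in range(n_genes)
        for j in range(i + 1, n_genes)
    )


# Hypothetical sizes: same number of genes, few vs. many samples.
few_samples = max_spurious_correlation(n_genes=40, n_samples=10)
many_samples = max_spurious_correlation(n_genes=40, n_samples=200)
print(f"strongest spurious correlation, 10 samples:  {few_samples:.2f}")
print(f"strongest spurious correlation, 200 samples: {many_samples:.2f}")
```

With fewer samples per gene pair, the strongest chance correlation grows, which is one reason benchmark performance may not carry over to smaller, higher-dimensional cancer cohorts.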

Cancers are aneuploid and impure, but this has not been modeled

Previous evaluations have also not considered specific biological properties of cancer data: cancer samples show chromosome copy number changes that influence gene expression in a coordinated fashion, and they are mixtures of different cell types. These confounding factors are pervasive and not limited to individual samples.

Copy number alterations influence inferred gene networks

Regulatory links are often inferred between genes in the same amplified region. Such regions, however, comprise only a few genes, so the effect on the total network is small. Aneuploidies (whole chromosome copy number changes) introduce a smaller fraction of false positive links, but they have a major influence on the inferred network as a whole.
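As a minimal sketch of how a shared copy number can create apparent regulatory links, consider genes whose expression depends only on gene dosage and independent noise. This is an illustrative toy model, not the method used in the paper; the copy number states, noise level, and all function names are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_chromosome(rng, n_samples):
    """Draw one whole-chromosome copy number (1, 2 or 3) per sample."""
    return [rng.choice([1, 2, 3]) for _ in range(n_samples)]


def simulate_gene(rng, copy_numbers, noise_sd=0.3):
    """Log2 expression driven only by gene dosage plus independent noise."""
    return [math.log2(cn / 2) + rng.gauss(0, noise_sd) for cn in copy_numbers]


rng = random.Random(1)
n_samples = 300
chrom_a = simulate_chromosome(rng, n_samples)
chrom_b = simulate_chromosome(rng, n_samples)

# gene1 and gene2 share a chromosome; gene3 sits on a different one.
# None of the genes regulates any other.
gene1 = simulate_gene(rng, chrom_a)
gene2 = simulate_gene(rng, chrom_a)
gene3 = simulate_gene(rng, chrom_b)

same_chrom = pearson(gene1, gene2)
diff_chrom = pearson(gene1, gene3)
print(f"same chromosome:      r = {same_chrom:.2f}")
print(f"different chromosome: r = {diff_chrom:.2f}")
```

In this toy setting, a correlation-based inference method would tend to link gene1 and gene2 even though their only connection is the shared chromosome copy number.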

Sample mixtures influence inferred gene networks

Sample mixtures also have a strong effect on false positive regulatory links. Considering only cancer vs. non-cancer cells (the purity of the sample), we find that false positive (FP) links are often inferred between genes whose expression correlates with sample purity.
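The purity effect can be illustrated with a small simulation in which bulk expression is a purity-weighted mixture of cancer and non-cancer expression. The expression levels, purity range, and helper names are hypothetical, and regressing out purity is shown here only as one simple way to expose the confounder, not as the procedure used in the paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of a least-squares fit y ~ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def mixed_expression(rng, purity, cancer_level, other_level, noise_sd=0.3):
    """Bulk expression as a purity-weighted mix of two cell populations."""
    return [
        p * cancer_level + (1 - p) * other_level + rng.gauss(0, noise_sd)
        for p in purity
    ]


rng = random.Random(2)
purity = [rng.uniform(0.3, 1.0) for _ in range(300)]

# Two genes that differ between cancer and non-cancer cells,
# but do not regulate each other.
gene_a = mixed_expression(rng, purity, cancer_level=8.0, other_level=4.0)
gene_b = mixed_expression(rng, purity, cancer_level=9.0, other_level=6.0)

raw = pearson(gene_a, gene_b)
adjusted = pearson(residuals(gene_a, purity), residuals(gene_b, purity))
print(f"raw correlation:             r = {raw:.2f}")
print(f"after regressing out purity: r = {adjusted:.2f}")
```

In this sketch the two genes appear strongly linked across samples only because both track tumor purity; once purity is accounted for, the association largely disappears.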

Running the code

If you want to run this code, you will require:

All R packages loaded in the scripts
Snakemake for workflow execution
ebits and data repositories with set up module paths

Given that you have set up these packages and data, you can run the analyses:

cd set_enrichment
snakemake

And generate the figures:

cd report
make

mschubert/GRN-aneup-purity
