abxka/tissueSpecificExpression


This directory contains scripts and results from analyses on tissue specific gene expression in GTEx.

Introduction and Methods

The GTEx dataset with FPKM-normalized expression values from Tophat2 alignments, and the associated metadata listing the tissue name of each GTEx sample, were downloaded from Synapse (syn8105922 and syn7596611, respectively). Prior to the analysis, testis samples were removed due to their overrepresentation in the analysis. Tissues belonging to the digestive tract (Liver, Colon, Pancreas, Esophagus, Stomach, Biliary) and the female reproductive system (Breast, Cervix Uteri, Ovary, Uterus, Fallopian Tube, Vagina) were merged into meta-cohorts following the guidelines in the PCAWG meta tumor group documentation. Of the 57,820 genes in total, 11,148 had FPKM values < 1 in all GTEx samples and were excluded from further analysis. The FPKM threshold was chosen on the basis of the expression of olfactory receptors, which are likely to have no function in almost all tissues (Ezkurdia et al.) and 99% of which had an expression below 1 FPKM.
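The filtering rule described above can be illustrated with a minimal Python sketch. It is not the repository's `scripts/filterGeneExpression.pl`; the function name, the dictionary layout (gene ID mapped to a list of per-sample FPKM values) and the example gene IDs are assumptions made purely for illustration.

```python
def filter_low_expression(expression, min_fpkm=1.0):
    """Drop genes whose FPKM is below min_fpkm in every sample.

    expression: dict mapping a gene ID to a list of per-sample FPKM values
    (a hypothetical in-memory layout, not the format of the GTEx file).
    """
    return {
        gene: values
        for gene, values in expression.items()
        # A gene is kept as soon as one sample reaches the threshold.
        if any(value >= min_fpkm for value in values)
    }


# Hypothetical toy data: gene IDs and values are made up.
toy_expression = {
    "ENSG_A": [0.0, 0.2, 0.9],   # below 1 FPKM everywhere, removed
    "ENSG_B": [0.1, 3.5, 0.0],   # reaches 1 FPKM in one sample, kept
    "ENSG_C": [12.0, 8.4, 20.1], # clearly expressed, kept
}
filtered = filter_low_expression(toy_expression)
```

A gene is retained if a single sample reaches the threshold, which mirrors the statement that only genes below 1 FPKM in all samples were removed.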

To determine which of the remaining genes are expressed in a tissue specific manner, we followed the ideas from the TiGER database, where tissue specificity is defined by a minimum enrichment score between two tissues and an associated maximum p-value between their samples. For each gene, the two tissues with the highest (A) and 2nd highest (B) mean expression were selected, and an enrichment score equal to the quotient A/B was calculated. The enrichment score was supplemented with a p-value from a Wilcoxon rank-sum test comparing the expression values of the gene in the samples of the highest expressed tissue (Ai) with those in the samples of the 2nd highest expressed tissue (Bi). Genes with an enrichment score > 5 and a p-value < 0.01 were regarded as tissue specific, while the rest were considered non-tissue specific.
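The procedure above can be sketched in a few lines of Python for a single gene. This is an illustration under stated assumptions, not the code in `scripts/tissueSpecificGeneExpression.pl`, which presumably delegates the test to R through Statistics::R. Here the rank-sum p-value comes from a normal approximation with tie correction and no continuity correction, so it can differ slightly from the value R reports. All function names, the input layout and the toy values are hypothetical.

```python
import math
from statistics import mean


def rank_sum_p_value(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation,
    average ranks for ties, tie-corrected variance)."""
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Sorted positions i..j-1 share the average of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mu = n_a * n_b / 2.0
    var = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 1.0  # all values identical, no evidence of a shift
    z = (u_a - mu) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def tissue_specificity(samples_by_tissue, min_enrichment=5.0, max_p=0.01):
    """Return (top tissue, enrichment A/B, p-value, is_specific) for one gene."""
    means = {tissue: mean(v) for tissue, v in samples_by_tissue.items()}
    first, second = sorted(means, key=means.get, reverse=True)[:2]
    # Guard against a zero mean in the 2nd highest tissue.
    enrichment = means[first] / means[second] if means[second] > 0 else math.inf
    p = rank_sum_p_value(samples_by_tissue[first], samples_by_tissue[second])
    return first, enrichment, p, (enrichment > min_enrichment and p < max_p)


# Hypothetical FPKM values of one gene in three tissues.
toy_gene = {
    "Brain": [50, 60, 55, 70, 65, 58, 62, 66],
    "Lung": [5, 6, 4, 7, 5.5, 6.5, 4.5, 5],
    "Skin": [1, 2, 1.5, 0.5, 1, 2, 1, 1.5],
}
top_tissue, score, p_value, specific = tissue_specificity(toy_gene)
```

Both conditions have to hold at once: a large ratio of means alone is not enough if the two sample sets overlap too much for the rank-sum test to reach the p-value cutoff.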

This directory has the following structure:

  • scripts/
    • A set of Perl scripts for determining tissue specific gene expression.
  • input/
    • Input data files.
  • output/
    • Result data files.
  • README.md
    • This Readme file.

Requirements

scripts/tissueSpecificGeneExpression.pl requires the Statistics::R Perl module. It can be installed from CPAN with

perl -MCPAN -e 'install Statistics::R'

Running the analysis

First, remove genes with negligible expression:

scripts/filterGeneExpression.pl -in GTEx.tophat2.gene.fpkm.tsv.gz -prefix ENSG -min 1 | gzip > input/GTEx.tophat2.gene.fpkm_filt.tsv.gz

GTEx.tophat2.gene.fpkm.tsv.gz can be downloaded from Synapse.

Then, determine tissue specific gene expression, using the filtered file written to input/ in the previous step:

scripts/tissueSpecificGeneExpression.pl -meta GTEX_v4.metadata.tsv.gz -cohort input/metaCohorts.tsv -express input/GTEx.tophat2.gene.fpkm_filt.tsv.gz -mean -exclude testis | gzip > output/tissueSpecificGeneExpression_meta_mean.tsv.gz

GTEX_v4.metadata.tsv.gz can be downloaded from Synapse.

Lastly, reformat the output and put the genes in the order of a predefined gene list:

scripts/assignTissueSpecificity2genes.pl -gene input/genes.txt -tissue output/tissueSpecificGeneExpression_meta_mean.tsv.gz -map input/ensg_ensp_enst_ense_geneName_v75.tsv.gz -min 5 -max 0.01 > output/genes_tissueEnrichmentPvalue.tsv
