How Pvalue is calculated

Chris Churas edited this page Apr 12, 2022 · 14 revisions

This page describes how the Pvalues are calculated for results returned from this service. The Pvalues are used in sorting the results.

WARNING: THE IMPLEMENTATION IN ENRICHMENT MAY BE INCORRECTLY USING HYPERGEOMETRIC DISTRIBUTION FUNCTION. PLEASE LET ME KNOW IF THIS IS THE CASE AND I CAN FIX IT.

Overview

This service uses HypergeometricDistribution to calculate a Pvalue for each enrichment result, using its cumulativeProbability function. (If this is wrong, please let me know and I can change it.)

The population size is set to the number of unique genes in all networks visible to enrichment.

The number of successes is set to the number of unique genes in the given network being examined.

Note: the number of successes value might be wrong. Should the code be changed to use the number of networks in which the matched genes occur instead?

The sample size is set to the number of unique genes in the query.

To get the pvalue, the number of unique matching genes is passed to the cumulativeProbability function and the result is subtracted from 1.
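
In formula form, the calculation above can be written as follows. The symbols N, K, n, and k are shorthand introduced here (they do not appear in the code) for the population size, the number of successes, the sample size, and the number of matching genes. X is the number of query genes that would land in the network by chance. This assumes cumulativeProbability(k) returns P(X ≤ k), as it does in Apache Commons Math.

```latex
\text{pvalue} \;=\; 1 - P(X \le k)
\;=\; 1 - \sum_{i=0}^{k} \frac{\binom{K}{i}\,\binom{N-K}{n-i}}{\binom{N}{n}}
\;=\; P(X > k)
```

Read this way, the value is the probability of seeing strictly more than k matching genes by chance.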

Actual code

Specifically, the HypergeometricDistribution class from the Apache Commons Math library is used:

HypergeometricDistribution hd = new HypergeometricDistribution(populationSize, 
                                                               numberOfSuccesses,
                                                               sampleSize);

// cumulativeProbability(x) returns P(X <= x)
double pvalue = ((double)1.0 - hd.cumulativeProbability(numGenesMatch));

Definition of values used above

  • populationSize is set to the number of unique genes in all networks examined by enrichment

  • numberOfSuccesses is set to the number of unique genes for the given network being examined

  • sampleSize is set to the number of unique genes in the query

  • numGenesMatch is set to the number of unique genes in the query that match the given network
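
To make the roles of these four values concrete, here is a minimal sketch that reproduces the same arithmetic in plain Java, without the Apache library. It is not the service's code: the class HypergeometricSketch and its helper methods are hypothetical names made up for illustration, and the sketch assumes sampleSize is no larger than populationSize.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

// Hypothetical illustration only; the service itself uses the Apache class.
class HypergeometricSketch {

    // Exact binomial coefficient C(n, k); 0 when k is out of range.
    static BigInteger choose(int n, int k) {
        if (k < 0 || k > n) {
            return BigInteger.ZERO;
        }
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After step i, result equals C(n - k + i, i), so the division is exact.
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    // P(X = x): chance that exactly x of the sampled genes are in the network.
    static double probability(int populationSize, int numberOfSuccesses,
                              int sampleSize, int x) {
        BigInteger numerator = choose(numberOfSuccesses, x)
                .multiply(choose(populationSize - numberOfSuccesses, sampleSize - x));
        BigInteger denominator = choose(populationSize, sampleSize);
        return new BigDecimal(numerator)
                .divide(new BigDecimal(denominator), MathContext.DECIMAL64)
                .doubleValue();
    }

    // P(X <= x), the quantity that a cumulative probability function returns.
    static double cumulativeProbability(int populationSize, int numberOfSuccesses,
                                        int sampleSize, int x) {
        double sum = 0.0;
        for (int i = 0; i <= x; i++) {
            sum += probability(populationSize, numberOfSuccesses, sampleSize, i);
        }
        return sum;
    }

    // Same expression as in the code above: 1 - P(X <= numGenesMatch).
    static double pvalue(int populationSize, int numberOfSuccesses,
                         int sampleSize, int numGenesMatch) {
        return 1.0 - cumulativeProbability(populationSize, numberOfSuccesses,
                                           sampleSize, numGenesMatch);
    }

    public static void main(String[] args) {
        // 10 genes overall, 4 in the network, 3 in the query, 1 match.
        System.out.println(pvalue(10, 4, 3, 1)); // about 0.333 (1/3)
    }
}
```

In this toy case, the chance of 0 matches is 20/120 and the chance of 1 match is 60/120, so the cumulative probability is 2/3 and the pvalue comes out to about 1/3.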