How Pvalue is calculated

This page describes how the Pvalues are calculated for results returned from this service. The Pvalues are used in sorting the results.

WARNING: The implementation in Enrichment may be using the hypergeometric distribution function incorrectly. Please let me know if this is the case so I can fix it.

Overview

This service uses the HypergeometricDistribution class to calculate a Pvalue for each enrichment result, using its cumulativeProbability function. (If this is wrong, please let me know and I can change it.)

The population size is set to the number of unique genes in all networks visible to enrichment.

The number of successes is set to the number of unique genes in the network being examined.

The number-of-successes value might be wrong. Should the code instead use the count of networks in which the matched genes occur?

The sample size is set to the number of unique genes in the query.

To get the Pvalue, the number of unique matching genes is passed to the cumulativeProbability function, and the result is subtracted from 1.
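Written out as a formula, the calculation described above looks roughly like this. The symbols are my own shorthand and are not used in the code: N is the population size, K the number of successes, n the sample size, k the number of unique matching genes, and X the hypergeometric random variable.

```latex
P(X = i) = \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}},
\qquad
\text{Pvalue} = 1 - P(X \le k) = P(X > k) = \sum_{i=k+1}^{\min(K,\,n)} P(X = i)
```

Note that this sum starts at k + 1, so it excludes the case of exactly k matches. Enrichment tests are often stated as P(X >= k), which would include that term. Whether the strict form is what is wanted here is part of the open question raised in the warning at the top of this page.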

Actual code

Specifically, the HypergeometricDistribution class from the Apache Commons Math library is used:

HypergeometricDistribution hd = new HypergeometricDistribution(populationSize, 
                                                               numberOfSuccesses,
                                                               sampleSize);

// cumulativeProbability(x) returns P(X <= x), so this evaluates to P(X > numGenesMatch)
double pvalue = 1.0 - hd.cumulativeProbability(numGenesMatch);

Definition of values used above

populationSize is set to the number of unique genes in all networks examined by enrichment
numberOfSuccesses is set to the number of unique genes for the given network being examined
sampleSize is set to the number of unique genes in the query
numGenesMatch is set to the number of unique query genes that match the given network
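To make the open question about the calculation concrete, here is a minimal standalone sketch that compares the two candidate tail probabilities. It is not the service's code and deliberately avoids the Apache library so it runs with no dependencies. The class name HypergeometricSketch, its method names, and the example numbers are all hypothetical and chosen only for illustration.

```java
public class HypergeometricSketch {

    // Natural log of the binomial coefficient C(n, k), summed term by term.
    public static double logChoose(int n, int k) {
        if (k < 0 || k > n) {
            return Double.NEGATIVE_INFINITY;
        }
        k = Math.min(k, n - k); // C(n, k) == C(n, n - k)
        double sum = 0.0;
        for (int i = 1; i <= k; i++) {
            sum += Math.log(n - k + i) - Math.log(i);
        }
        return sum;
    }

    // P(X = i): probability of exactly i successes in the sample.
    public static double pmf(int populationSize, int numberOfSuccesses,
                             int sampleSize, int i) {
        int lowest = Math.max(0, sampleSize - (populationSize - numberOfSuccesses));
        int highest = Math.min(numberOfSuccesses, sampleSize);
        if (i < lowest || i > highest) {
            return 0.0;
        }
        return Math.exp(logChoose(numberOfSuccesses, i)
                + logChoose(populationSize - numberOfSuccesses, sampleSize - i)
                - logChoose(populationSize, sampleSize));
    }

    // P(X > k): what 1 - cumulativeProbability(k) evaluates to.
    public static double strictUpperTail(int populationSize, int numberOfSuccesses,
                                         int sampleSize, int k) {
        int highest = Math.min(numberOfSuccesses, sampleSize);
        double sum = 0.0;
        for (int i = Math.max(k + 1, 0); i <= highest; i++) {
            sum += pmf(populationSize, numberOfSuccesses, sampleSize, i);
        }
        return sum;
    }

    // P(X >= k): the tail that also counts exactly k matches.
    public static double inclusiveUpperTail(int populationSize, int numberOfSuccesses,
                                            int sampleSize, int k) {
        return strictUpperTail(populationSize, numberOfSuccesses, sampleSize, k - 1);
    }

    public static void main(String[] args) {
        // Hypothetical values: 10 genes overall, 4 in the network,
        // 3 in the query, 2 of which match the network.
        int populationSize = 10;
        int numberOfSuccesses = 4;
        int sampleSize = 3;
        int numGenesMatch = 2;

        System.out.printf("P(X >  %d) = %f%n", numGenesMatch,
                strictUpperTail(populationSize, numberOfSuccesses, sampleSize, numGenesMatch));
        System.out.printf("P(X >= %d) = %f%n", numGenesMatch,
                inclusiveUpperTail(populationSize, numberOfSuccesses, sampleSize, numGenesMatch));
    }
}
```

With these made-up numbers the strict tail is 4/120 (about 0.033) and the inclusive tail is 40/120 (about 0.333), so the choice between them can matter a lot for small inputs. If the inclusive tail turns out to be the right one, my understanding is that the Apache HypergeometricDistribution class also offers upperCumulativeProbability(x), documented as returning P(X >= x), which could be called with numGenesMatch directly.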
