How Pvalue is calculated
This page describes how Pvalues are calculated for the results returned by this service. The Pvalues are used to sort the results.
WARNING: THE IMPLEMENTATION IN ENRICHMENT MAY BE INCORRECTLY USING HYPERGEOMETRIC DISTRIBUTION FUNCTION. PLEASE LET ME KNOW IF THIS IS THE CASE AND I CAN FIX IT.
This service uses HypergeometricDistribution to calculate a Pvalue for each enrichment result, using its cumulativeProbability function. (If this is wrong, please let me know and I can change it.)
The population size is set to the number of unique genes in all networks visible to enrichment. The number of successes is set to the number of unique genes in a given network. The sample size is set to the number of unique genes in the query. To get the Pvalue, the number of unique matching genes is passed to the cumulativeProbability function and the result is subtracted from 1.
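For reference, the quantity described above can be written out explicitly. The symbols N, K, n and k are shorthand introduced here (they do not appear in the service code) for the population size, the number of successes, the sample size and the number of matching genes. X is the number of query genes that fall in the network under the hypergeometric model:

```latex
P(X = i) = \frac{\binom{K}{i}\,\binom{N-K}{n-i}}{\binom{N}{n}},
\qquad
\text{Pvalue} = 1 - \sum_{i=0}^{k} P(X = i) = P(X > k)
```

Written this way, the value is the probability of seeing strictly more than k matches. Over-representation tests are often defined with P(X ≥ k) instead, which would correspond to 1 - cumulativeProbability(k - 1). Whether that is what is intended here is the open question raised in the warning above.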
Specifically, the HypergeometricDistribution class in the Apache Commons Math library is used:
    HypergeometricDistribution hd = new HypergeometricDistribution(populationSize,
                                                                   numberOfSuccesses,
                                                                   sampleSize);
    // cumulativeProbability(x) returns P(X <= x), so this evaluates to P(X > numGenesMatch)
    double pvalue = 1.0 - hd.cumulativeProbability(numGenesMatch);

- populationSize is set to the number of unique genes in all networks examined by enrichment
- numberOfSuccesses is set to the number of unique genes for the given network being examined
- sampleSize is set to the number of unique genes in the query
- numGenesMatch is set to the number of genes that match the given network
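
To see what 1.0 - cumulativeProbability(numGenesMatch) yields on a small case, the calculation can be reproduced without the library. The sketch below is not the service's code: HypergeometricSketch and its method names are hypothetical, and it recomputes the distribution exactly with BigInteger rather than calling Apache Commons Math, so that it runs with the JDK alone. probabilityGreaterThan mirrors the expression in the snippet above; probabilityAtLeast is the P(X >= k) variant, included only for comparison.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

/** Hypothetical helper (not part of the service): exact hypergeometric tails. */
class HypergeometricSketch {

    /** Binomial coefficient C(n, r); returns 0 when r is outside [0, n]. */
    static BigInteger choose(int n, int r) {
        if (r < 0 || r > n) {
            return BigInteger.ZERO;
        }
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= r; i++) {
            // After step i the value is C(n - r + i, i), so each division is exact.
            result = result.multiply(BigInteger.valueOf(n - r + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    /** P(X <= k), the quantity a cumulativeProbability(k) call represents. */
    static double cumulativeProbability(int populationSize, int numberOfSuccesses,
                                        int sampleSize, int k) {
        BigInteger numerator = BigInteger.ZERO;
        for (int i = 0; i <= k; i++) {
            numerator = numerator.add(
                choose(numberOfSuccesses, i)
                    .multiply(choose(populationSize - numberOfSuccesses, sampleSize - i)));
        }
        BigInteger denominator = choose(populationSize, sampleSize);
        return new BigDecimal(numerator)
            .divide(new BigDecimal(denominator), MathContext.DECIMAL64)
            .doubleValue();
    }

    /** P(X > k): same form as 1.0 - hd.cumulativeProbability(numGenesMatch). */
    static double probabilityGreaterThan(int populationSize, int numberOfSuccesses,
                                         int sampleSize, int k) {
        return 1.0 - cumulativeProbability(populationSize, numberOfSuccesses, sampleSize, k);
    }

    /** P(X >= k): comparison variant, equal to 1 - P(X <= k - 1). */
    static double probabilityAtLeast(int populationSize, int numberOfSuccesses,
                                     int sampleSize, int k) {
        return 1.0 - cumulativeProbability(populationSize, numberOfSuccesses, sampleSize, k - 1);
    }

    public static void main(String[] args) {
        // Toy numbers: 10 genes overall, 4 in the network, 3 in the query.
        for (int matches = 2; matches <= 3; matches++) {
            System.out.printf("matches=%d  P(X > k)=%.4f  P(X >= k)=%.4f%n",
                matches,
                probabilityGreaterThan(10, 4, 3, matches),
                probabilityAtLeast(10, 4, 3, matches));
        }
    }
}
```

With these toy numbers and 2 matches, the first method gives 4/120 and the second 40/120. When every query gene matches (3 matches here), the first method returns 0 because no larger overlap is possible, while the second returns 4/120.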