How Pvalue is calculated
This page describes how the p-values are calculated for results returned from this service. The p-values are used in sorting the results.
This service uses the HypergeometricDistribution class to calculate a p-value for each enrichment result, calling its probability function to obtain the value. The p-values are then adjusted using the method from the Benjamini, Heller, Yekutieli 2009 paper.
The population size is set to the number of unique genes in all networks visible to enrichment.
The number of successes is set to the number of unique genes in the given network being examined.
Open question: the number of successes value might be wrong. Should the code be changed to instead use the count of networks in which the matched genes occur?
The sample size is set to the number of unique genes in the query.
To get the p-value, the number of unique matching genes is passed to the probability function.
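For reference, the probability function of a hypergeometric distribution returns a point probability, that is, the chance of seeing exactly the given number of matches. The symbols below are introduced here only for illustration: $N$ is the population size, $K$ the number of successes, $n$ the sample size, and $k$ the number of matching genes.

```latex
P(X = k) = \frac{\binom{K}{k}\,\binom{N-K}{n-k}}{\binom{N}{n}}
```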
The p-values are then adjusted: each p-value is multiplied by the number of networks queried and then divided by its rank relative to the other p-values (low p-values have a low rank and vice versa). Where an entry would otherwise end up with a larger adjusted value than an entry ranked after it, the lower value is propagated up the list, so the adjusted p-values are always in ascending order.
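The adjustment described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual class: the class name `PValueAdjustSketch`, the method name `adjust`, and the cap of adjusted values at 1.0 are assumptions made for the example. The overall shape (scale by count over rank, then enforce ascending order from the bottom of the list) is the same as a Benjamini-Hochberg style step-up adjustment.

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical and not taken from the service code.
class PValueAdjustSketch {

    // Returns adjusted p-values in ascending order of the raw p-values.
    static double[] adjust(double[] pValues, int numNetworksQueried) {
        // Work on a sorted copy so that index i corresponds to rank i + 1.
        double[] sorted = pValues.clone();
        Arrays.sort(sorted);

        double[] adjusted = new double[sorted.length];
        // Assumption: adjusted values are capped at 1.0.
        double runningMin = 1.0;

        // Walk from the highest rank to the lowest so that a smaller
        // adjusted value is propagated to the entries ranked before it.
        for (int i = sorted.length - 1; i >= 0; i--) {
            int rank = i + 1;
            double scaled = sorted[i] * numNetworksQueried / rank;
            runningMin = Math.min(runningMin, scaled);
            adjusted[i] = runningMin;
        }
        return adjusted;
    }

    public static void main(String[] args) {
        double[] raw = {0.04, 0.01, 0.5, 0.045};
        System.out.println(Arrays.toString(adjust(raw, 4)));
    }
}
```

In this example the second-ranked value (0.04) would scale to 0.08, but the third-ranked value scales to only 0.06, so 0.06 is carried up to rank two and the result stays ascending.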
Specifically, the HypergeometricDistribution class from the Apache Commons Math library is used:
HypergeometricDistribution hd = new HypergeometricDistribution(populationSize,
                                                               numberOfSuccesses,
                                                               sampleSize);
double pvalue = hd.probability(numGenesMatch);
For p-value adjustment see this class
- populationSize is set to the number of unique genes in all networks examined by enrichment
- numberOfSuccesses is set to the number of unique genes for the given network being examined
- sampleSize is set to the number of unique genes in the query
- numGenesMatch is set to the number of genes that match the given network
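To make the four parameters concrete, here is a small self-contained sketch that computes the same point probability by hand, without the Apache library. The class name `HypergeometricSketch` and the toy numbers are hypothetical, and the sketch assumes the sample size does not exceed the population size. It is meant only to show what the value represents, not how the service computes it.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

// Illustrative sketch only; class and method names are hypothetical.
class HypergeometricSketch {

    // Exact binomial coefficient C(n, k); returns 0 when k is out of range.
    static BigInteger choose(int n, int k) {
        if (k < 0 || k > n) {
            return BigInteger.ZERO;
        }
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After step i the running value equals C(n - k + i, i),
            // so the integer division is always exact.
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    // Probability of exactly numGenesMatch successes in the sample.
    // Assumes sampleSize <= populationSize, so the denominator is non-zero.
    static double pmf(int populationSize, int numberOfSuccesses,
                      int sampleSize, int numGenesMatch) {
        BigInteger numerator = choose(numberOfSuccesses, numGenesMatch)
                .multiply(choose(populationSize - numberOfSuccesses,
                                 sampleSize - numGenesMatch));
        BigInteger denominator = choose(populationSize, sampleSize);
        return new BigDecimal(numerator)
                .divide(new BigDecimal(denominator), MathContext.DECIMAL64)
                .doubleValue();
    }

    public static void main(String[] args) {
        // Toy numbers: 10 genes across all networks, 4 in this network,
        // 3 genes in the query, 2 of which match the network.
        System.out.println(pmf(10, 4, 3, 2));
    }
}
```

With the toy numbers above the calculation is C(4,2) * C(6,1) / C(10,3) = 36 / 120. For realistic gene counts a library implementation such as the one used by the service is the better choice, since it avoids constructing very large integers.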