bootstrap the empirical FDR #14

noamteyssier · 2023-05-17T22:13:56Z

A single instantiation of the empirical FDR is unreliable because the pseudogene groupings are completely random and nondeterministic. This leads to an FDR threshold that can vary wildly from one instantiation to the next.

Instead, it would be better to run the null distribution test a large number of times, calculate the empirical FDRs in each run, then average over the runs to get an FDR estimate for every gene.

Another method would be to run the null test a large number of times, calculate the empirical FDRs, find the threshold in each run, then average the threshold across the runs.

It would be best to do both and compare the statistics.
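
A minimal sketch of how the two aggregation strategies could be compared. Everything here is an assumption for illustration and not the code in this repository: the function names (`empirical_fdr`, `fdr_threshold`, `bootstrap_fdr`, `draw_null`) are hypothetical, a higher score is assumed to mean a stronger hit, the empirical FDR at a gene is taken to be the share of pseudogenes among all entries scoring at least as high, and the toy pseudogene score is simply the mean of a random group of control guides.

```python
import random
from statistics import mean


def empirical_fdr(gene_scores, null_scores):
    """FDR per gene: share of pseudogenes among all entries scoring >= that gene."""
    fdrs = []
    for s in gene_scores:
        n_null = sum(x >= s for x in null_scores)
        n_gene = sum(x >= s for x in gene_scores)  # always >= 1 (the gene itself)
        fdrs.append(n_null / (n_null + n_gene))
    return fdrs


def fdr_threshold(gene_scores, fdrs, alpha):
    """Lowest gene score whose empirical FDR is <= alpha, or None if no gene passes."""
    passing = [s for s, f in zip(gene_scores, fdrs) if f <= alpha]
    return min(passing) if passing else None


def bootstrap_fdr(gene_scores, draw_null, m=500, alpha=0.05):
    """Run the null test m times; return (mean FDR per gene, mean threshold)."""
    runs = [empirical_fdr(gene_scores, draw_null()) for _ in range(m)]
    # Strategy 1: average each gene's empirical FDR over the m runs
    mean_fdrs = [mean(col) for col in zip(*runs)]
    # Strategy 2: find the threshold in each run, then average the thresholds
    thresholds = [fdr_threshold(gene_scores, r, alpha) for r in runs]
    thresholds = [t for t in thresholds if t is not None]
    mean_threshold = mean(thresholds) if thresholds else None
    return mean_fdrs, mean_threshold


# Toy data (hypothetical): 200 control-guide scores, 95 null-like genes, 5 strong genes
rng = random.Random(0)
controls = [rng.gauss(0, 1) for _ in range(200)]
genes = [rng.gauss(0, 0.5) for _ in range(95)] + [4.0, 4.5, 5.0, 5.5, 6.0]


def draw_null(n_pseudo=100, guides_per_gene=4):
    # Randomly regroup control guides into pseudogenes; score = mean of the group
    return [mean(rng.sample(controls, guides_per_gene)) for _ in range(n_pseudo)]


mean_fdrs, mean_threshold = bootstrap_fdr(genes, draw_null, m=50)
hits_by_fdr = [i for i, f in enumerate(mean_fdrs) if f <= 0.05]
hits_by_threshold = [
    i for i, s in enumerate(genes)
    if mean_threshold is not None and s >= mean_threshold
]
print(len(hits_by_fdr), len(hits_by_threshold))
```

The two hit lists need not agree, which is the comparison proposed above. Runs in which no gene passes are dropped before averaging the threshold; that is one possible choice, and it would bias the averaged threshold if such runs were common.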

noamteyssier · 2023-05-18T15:55:20Z

Distribution of Rankings for pseudogenes in 500 runs

Pretty wide spread here, makes a good case for why there should be some form of aggregation

Distribution of FDR threshold ($\alpha = 0.05$) for 500 runs

In this case all of the genes in the $\sim 2.5$ range would be considered hits, while in any individual INC run they may not be considered significant.

This would consider 60 genes as hits in my test dataset.

Calculating individual gene FDR averages

Another approach would be to calculate an average empirical FDR for each gene given the background distribution and then apply the $\alpha = 0.05$ threshold to that average score. In this case the pseudogenes would be bootstrapped over $m=500$ runs with an FDR calculated in each run; then, for each gene, the mean of its empirical FDRs across all runs would be taken and reported.

This would consider 73 genes as hits in my test dataset.
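
Written out, with notation introduced here only for illustration (it does not appear elsewhere in this thread): let $\mathrm{FDR}_g^{(i)}$ be the empirical FDR of gene $g$ in bootstrap run $i$. The reported value and the hit call would then be

$$
\overline{\mathrm{FDR}}_g = \frac{1}{m} \sum_{i=1}^{m} \mathrm{FDR}_g^{(i)}, \qquad g \text{ is a hit if } \overline{\mathrm{FDR}}_g \le \alpha,
$$

with $m = 500$ and $\alpha = 0.05$ as above.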

noamteyssier added the enhancement (New feature or request) label May 17, 2023

noamteyssier mentioned this issue May 19, 2023

update crispr_screen to use newer INTC algorithm noamteyssier/crispr_screen#117

Closed

noamteyssier linked a pull request May 19, 2023 that will close this issue

14 bootstrap the empirical fdr #15

Merged

noamteyssier closed this as completed in #15 May 19, 2023
