A single instantiation of the FDR is biased because the pseudogene groupings are random and nondeterministic. As a result, the FDR threshold can vary widely from one instantiation to the next.
Instead, it would be better to run the null distribution test a large number of times, calculate the empirical FDRs in each run, then average those values to get an estimate of the FDR for every gene.
Another method would be to run the null test a large number of times, calculate the empirical FDRs, find the threshold in each run, then average the thresholds across the runs.
It would be best to do both and compare the statistics.
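To make the comparison concrete, the two aggregates could be written as follows. This is only a sketch of notation, not something defined elsewhere in this issue: $s_g$ is assumed to be the score of gene $g$ (higher meaning more significant), $\widehat{\mathrm{FDR}}_r(g)$ the empirical FDR of gene $g$ in run $r$, $t_r$ the FDR threshold found in run $r$, and $m$ the number of runs.

$$
\overline{\mathrm{FDR}}(g) = \frac{1}{m} \sum_{r=1}^{m} \widehat{\mathrm{FDR}}_r(g),
\qquad
\bar{t} = \frac{1}{m} \sum_{r=1}^{m} t_r
$$

Under these assumptions, the first method calls the hits $\{ g : \overline{\mathrm{FDR}}(g) \le \alpha \}$ and the second calls the hits $\{ g : s_g \ge \bar{t} \}$. The two sets need not coincide, which is why comparing them is informative.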
Distribution of Rankings for pseudogenes in 500 runs
The spread here is pretty wide, which makes a good case for some form of aggregation.
Distribution of FDR threshold ($\alpha = 0.05$) for 500 runs
In this case all of the genes in the $\sim 2.5$ range would be considered hits, even though they may not be considered significant in each individual INC run.
This would consider 60 genes as hits in my test dataset.
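A minimal sketch of the threshold-averaging idea is below. Everything named here is an assumption made for illustration and is not taken from the INC code: pseudogenes are formed by randomly grouping control-guide scores and taking the group mean (`draw_pseudogene_scores`), higher scores are treated as more significant, and the empirical FDR at a gene's score is the fraction of pseudogenes at or above that score divided by the fraction of real genes at or above it.

```python
import numpy as np

def draw_pseudogene_scores(rng, control_scores, guides_per_gene, n_pseudogenes):
    # Hypothetical null model: randomly group control guides and
    # score each pseudogene by the mean of its group.
    picks = rng.choice(control_scores, size=(n_pseudogenes, guides_per_gene), replace=True)
    return picks.mean(axis=1)

def empirical_fdr(gene_scores, null_scores):
    # Assumed estimator: (fraction of null scores >= s) / (fraction of gene scores >= s),
    # evaluated at every gene score s and capped at 1.
    gene_scores = np.asarray(gene_scores, dtype=float)
    null_sorted = np.sort(np.asarray(null_scores, dtype=float))
    gene_sorted = np.sort(gene_scores)
    n_null = null_sorted.size - np.searchsorted(null_sorted, gene_scores, side="left")
    n_gene = gene_sorted.size - np.searchsorted(gene_sorted, gene_scores, side="left")
    fdr = (n_null / null_sorted.size) / (n_gene / gene_sorted.size)
    return np.minimum(fdr, 1.0)

def fdr_threshold(gene_scores, null_scores, alpha=0.05):
    # Lowest gene score whose empirical FDR is <= alpha; NaN if no gene passes.
    gene_scores = np.asarray(gene_scores, dtype=float)
    passing = gene_scores[empirical_fdr(gene_scores, null_scores) <= alpha]
    return passing.min() if passing.size else np.nan

def averaged_threshold(gene_scores, control_scores, n_runs=500, guides_per_gene=4,
                       n_pseudogenes=1000, alpha=0.05, seed=0):
    # Redraw the pseudogenes in every run, find that run's threshold,
    # then average the thresholds (runs with no passing gene are ignored).
    rng = np.random.default_rng(seed)
    thresholds = np.array([
        fdr_threshold(
            gene_scores,
            draw_pseudogene_scores(rng, control_scores, guides_per_gene, n_pseudogenes),
            alpha,
        )
        for _ in range(n_runs)
    ])
    return np.nanmean(thresholds), thresholds
```

Hits would then be the genes with `gene_scores >= mean_threshold`. Ignoring runs in which no gene passes is itself a design choice; counting those runs differently would shift the averaged threshold.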
Calculating individual gene FDR averages
Another approach would be to calculate an average empirical FDR for each gene given the background distribution, and then apply the $\alpha = 0.05$ threshold to that average. In this case the pseudogenes would be bootstrapped in each of the $m = 500$ runs and an empirical FDR calculated per run; then, for each gene, the mean of its empirical FDRs across all runs would be taken and reported.
This would consider 73 genes as hits in my test dataset.
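The per-gene averaging could be sketched as follows. As before, the details are assumptions for illustration rather than the INC implementation: pseudogene scores are the means of randomly drawn groups of control-guide scores, higher scores are more significant, and the empirical FDR is the null tail fraction divided by the gene tail fraction. The names `empirical_fdr` and `mean_gene_fdr` are hypothetical.

```python
import numpy as np

def empirical_fdr(gene_scores, null_scores):
    # Assumed estimator: (fraction of null scores >= s) / (fraction of gene scores >= s),
    # evaluated at every gene score s and capped at 1.
    gene_scores = np.asarray(gene_scores, dtype=float)
    null_sorted = np.sort(np.asarray(null_scores, dtype=float))
    gene_sorted = np.sort(gene_scores)
    n_null = null_sorted.size - np.searchsorted(null_sorted, gene_scores, side="left")
    n_gene = gene_sorted.size - np.searchsorted(gene_sorted, gene_scores, side="left")
    fdr = (n_null / null_sorted.size) / (n_gene / gene_sorted.size)
    return np.minimum(fdr, 1.0)

def mean_gene_fdr(gene_scores, control_scores, n_runs=500, guides_per_gene=4,
                  n_pseudogenes=1000, seed=0):
    # Bootstrap the pseudogenes n_runs times and average each gene's empirical FDR.
    rng = np.random.default_rng(seed)
    gene_scores = np.asarray(gene_scores, dtype=float)
    total = np.zeros(gene_scores.shape)
    for _ in range(n_runs):
        # Hypothetical null model: random groups of control guides, scored by group mean.
        picks = rng.choice(control_scores, size=(n_pseudogenes, guides_per_gene), replace=True)
        total += empirical_fdr(gene_scores, picks.mean(axis=1))
    return total / n_runs
```

Hits would be the genes with `mean_gene_fdr(...) <= 0.05`. Because the mean is taken over FDR values rather than over thresholds, a gene that is borderline in many runs can pass here while failing the averaged-threshold rule, or the other way around.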