-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
P-value aggregate #148
P-value aggregate #148
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODOs
R/model.R
Outdated
@@ -308,7 +308,8 @@ tests.sleuth <- function(obj, lrt = TRUE, wt = TRUE) { | |||
#' results_table <- sleuth_results(sleuth_obj, 'conditionIP') | |||
#' @export | |||
sleuth_results <- function(obj, test, test_type = 'wt', | |||
which_model = 'full', rename_cols = TRUE, show_all = TRUE) { | |||
which_model = 'full', rename_cols = TRUE, show_all = TRUE, | |||
aggregate_pval = obj$gene_aggregate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change obj$gene_aggregate to obj$aggregate_pval. Job of @pimentel to make sure changing obj$gene_aggregate is global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the question --> yes.
And probably not expose aggregate_pval
as an external argument; just do internal check (obj$pval_aggregate
and obj$gene_mode
should not both be TRUE
, but ok if both are FALSE
). That should just be set with sleuth_prep
and then the user doesn't have to think about it again after that. The internal check will make sure that the users didn't inappropriately mess with those booleans between sleuth_prep
and sleuth_results
.
R/model.R
Outdated
if(any(res$pval < 10^-323, na.rm=TRUE)) { | ||
warning('Extreme p-values around and below 10^-320 will generate 0 pvalues in aggregation') | ||
} | ||
res <- res %>% group_by(ens_gene) %>% summarise(ext_gene = unique(ext_gene), num_aggregated_transcripts = length(!is.na(pval)), sum_mean_obs_counts = sum(mean_obs, na.rm=TRUE), pval = lancaster(pval, mean_obs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to data.table
@@ -238,7 +238,7 @@ sleuth_prep <- function( | |||
obs_raw <- dplyr::bind_rows(lapply(kal_list, function(k) k$abundance)) | |||
|
|||
counts_test <- data.table::as.data.table(obs_raw) | |||
counts_test <- counts_test[, total = .(total = sum(est_counts)), by = "sample"] | |||
counts_test <- counts_test[, .(total = sum(est_counts)), by = "sample"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed this. Thanks!
R/model.R
Outdated
if(any(res$pval < 10^-323, na.rm=TRUE)) { | ||
warning('Extreme p-values around and below 10^-320 will generate 0 pvalues in aggregation') | ||
} | ||
res <- data.table::as.data.table(res)[, .(num_aggregated_transcripts = length(!is.na(pval)), sum_mean_obs_counts = sum(mean_obs, na.rm=TRUE), pval = as.numeric(lancaster(pval, exp(mean_obs)))), by=eval(obj$gene_column)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to data.table implementation. sleuth_results(aggregate_pval = TRUE) now takes ~1.5 seconds instead of 6 seconds with data.frame. Please check that I've optimized data.table, @warrenmcg , @pimentel. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I'm going to ask this question here: I noticed that using a newly generated sleuth object, I get much fewer significant genes compared to my older sleuth object (using sleuth about a year or so ago) on the same data set. I just want to make sure this is correct; i.e. there were changes in sleuth to support this observation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I would only suggest a formatting change to keep the columns to under 100 characters (suggested R style tips; recommends 80, but I usually go with 100 because of some nesting that happens).
line 401 would read like this:
res <- data.table::as.data.table(res)[,
.(num_aggregated_transcripts = length(!is.na(pval)),
sum_mean_obs_counts = sum(mean_obs, na.rm=TRUE),
pval = as.numeric(lancaster(pval, exp(mean_obs)))),
by = eval(obj$gene_column)]
The # of spaces may differ between here and your editor, so just align the period with the 1st 't' in data.table, the other columns with num_aggregated_transcripts
, and the by
column with the period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# of genes called issue resolved: false alarm
No description provided.