
P-value aggregate #148

Merged: 16 commits merged into devel on Mar 12, 2018

Conversation

pimentel (Collaborator)

No description provided.

@lynnyi (Contributor) left a comment

TODOs

R/model.R Outdated
@@ -308,7 +308,8 @@ tests.sleuth <- function(obj, lrt = TRUE, wt = TRUE) {
 #' results_table <- sleuth_results(sleuth_obj, 'conditionIP')
 #' @export
 sleuth_results <- function(obj, test, test_type = 'wt',
-  which_model = 'full', rename_cols = TRUE, show_all = TRUE) {
+  which_model = 'full', rename_cols = TRUE, show_all = TRUE,
+  aggregate_pval = obj$gene_aggregate) {
@lynnyi (Contributor), Nov 14, 2017

Change obj$gene_aggregate to obj$aggregate_pval. It will be @pimentel's job to make sure the change to obj$gene_aggregate is applied globally.

Collaborator

On the question: yes.
And probably don't expose aggregate_pval as an external argument; just do an internal check (obj$pval_aggregate and obj$gene_mode should not both be TRUE, but it is fine if both are FALSE). That should be set once with sleuth_prep, and then the user doesn't have to think about it again. The internal check will make sure the user didn't inappropriately change those booleans between sleuth_prep and sleuth_results.
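For illustration, a minimal sketch of the internal consistency check described above; the helper name and message wording are assumptions, not code from this PR:

    # Hypothetical helper; sleuth's actual check may differ in name and wording.
    check_aggregation_mode <- function(obj) {
      if (isTRUE(obj$pval_aggregate) && isTRUE(obj$gene_mode)) {
        stop("'pval_aggregate' and 'gene_mode' cannot both be TRUE; ",
             "set one of them to FALSE in sleuth_prep().")
      }
      invisible(obj)
    }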

R/model.R Outdated
if(any(res$pval < 10^-323, na.rm=TRUE)) {
warning('Extreme p-values around and below 10^-320 will generate 0 pvalues in aggregation')
}
res <- res %>%
  group_by(ens_gene) %>%
  summarise(ext_gene = unique(ext_gene),
            num_aggregated_transcripts = length(!is.na(pval)),
            sum_mean_obs_counts = sum(mean_obs, na.rm=TRUE),
            pval = lancaster(pval, mean_obs))
Contributor

Change to data.table.
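For readers unfamiliar with the lancaster() call in the snippet above (it comes from the aggregation package), here is a rough sketch of Lancaster's weighted p-value combination under the usual convention; this is an assumption about the method, not the package's implementation:

    # Sketch of Lancaster's method: Fisher's method with per-test degrees of
    # freedom equal to the weights. NOT the aggregation package's code.
    lancaster_sketch <- function(pval, weights) {
      ok <- !is.na(pval) & !is.na(weights) & weights > 0
      p <- pval[ok]
      w <- weights[ok]
      # map each p-value to an upper-tail chi-square quantile with df = weight
      t_stat <- sum(qchisq(p, df = w, lower.tail = FALSE))
      # combined p-value: upper tail of chi-square with df = sum of weights
      pchisq(t_stat, df = sum(w), lower.tail = FALSE)
    }

    # e.g. lancaster_sketch(c(0.01, 0.2, 0.5), weights = c(50, 10, 5))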

@@ -238,7 +238,7 @@ sleuth_prep <- function(
   obs_raw <- dplyr::bind_rows(lapply(kal_list, function(k) k$abundance))

   counts_test <- data.table::as.data.table(obs_raw)
-  counts_test <- counts_test[, total = .(total = sum(est_counts)), by = "sample"]
+  counts_test <- counts_test[, .(total = sum(est_counts)), by = "sample"]
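As context for the fix above: in data.table, grouped summaries are written as a list via .() in the j slot, so the extra "total =" prefix in the removed line is not valid syntax. A self-contained toy example (the data here are made up, not from sleuth):

    library(data.table)

    # Toy table shaped roughly like obs_raw (sample / target_id / est_counts)
    obs_raw <- data.frame(
      sample     = c("A", "A", "B", "B"),
      target_id  = c("tx1", "tx2", "tx1", "tx2"),
      est_counts = c(10, 20, 5, 15)
    )

    counts_test <- data.table::as.data.table(obs_raw)
    # .(total = sum(est_counts)) returns one row per group with a 'total' column
    counts_test <- counts_test[, .(total = sum(est_counts)), by = "sample"]
    counts_test
    #    sample total
    # 1:      A    30
    # 2:      B    20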
Collaborator

Missed this, thanks!

R/model.R Outdated
if(any(res$pval < 10^-323, na.rm=TRUE)) {
warning('Extreme p-values around and below 10^-320 will generate 0 pvalues in aggregation')
}
res <- data.table::as.data.table(res)[, .(num_aggregated_transcripts = length(!is.na(pval)), sum_mean_obs_counts = sum(mean_obs, na.rm=TRUE), pval = as.numeric(lancaster(pval, exp(mean_obs)))), by=eval(obj$gene_column)]
@lynnyi (Contributor), Nov 14, 2017

Changed to the data.table implementation. sleuth_results(aggregate_pval = TRUE) now takes ~1.5 seconds instead of 6 seconds with data.frame. Please check that I've used data.table optimally, @warrenmcg @pimentel. Thanks!
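A side note on the 10^-323 warning in the snippet above: that threshold sits near the smallest positive subnormal double, so any smaller p-value is stored as exactly 0 and would propagate zeros through the aggregation. This rationale is an inference, not something stated in the PR; a quick check in R:

    .Machine$double.xmin  # ~2.225074e-308, smallest normalized double
    1e-320 > 0            # TRUE: still representable as a subnormal double
    1e-324 == 0           # TRUE: underflows to exactly zero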

Contributor

Also, I'm going to ask this question here: I noticed that with a newly generated sleuth object I get far fewer significant genes than with my older sleuth object (built with sleuth about a year ago) on the same data set. I just want to make sure this is correct, i.e. that there were changes in sleuth that would explain this.

Collaborator

Looks good to me! I would only suggest a formatting change to keep lines under 100 characters (R style guides usually recommend 80, but I tend to go with 100 because of some nesting that happens).

line 401 would read like this:

res <- data.table::as.data.table(res)[,
         .(num_aggregated_transcripts = length(!is.na(pval)),
           sum_mean_obs_counts = sum(mean_obs, na.rm=TRUE),
           pval = as.numeric(lancaster(pval, exp(mean_obs)))),
         by = eval(obj$gene_column)]

The number of spaces may differ between this comment and your editor, so just align the period with the first 't' in data.table, the other columns with num_aggregated_transcripts, and the by argument with the period.

@pimentel (Collaborator, Author), Nov 15, 2017

The "number of genes called" issue is resolved: false alarm.

@pimentel merged commit f457a10 into devel on Mar 12, 2018.