-
Notifications
You must be signed in to change notification settings - Fork 34
statistical tests - pairwise - omnibus #757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Daena Rys <rysdaena8@gmail.com>
Signed-off-by: Daena Rys <rysdaena8@gmail.com>
Signed-off-by: Daena Rys <rysdaena8@gmail.com>
antagomir
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the name getPairwiseTest correct with Kruskal test? Kruskal is not a pairwise test.
In examples:
tse <- tse[1:100, ]
tse <- agglomerateByRank(tse, rank = "Genus")
-> could we just agglomerate to higher rank (e.g. Phylum) for a quick example? Then no need to take subset of 100.
Signed-off-by: Daena Rys <rysdaena8@gmail.com>
True. However, kruskal is folloed by dunn's test. For the test to be valid, user has to check that kruskal test is useful. What other name is suitable getSignificance? getTest? |
We might like to distinguish between these two cases:
The (2) could reduce to (1) if there are only 2 groups but it is still conceptually different. These are very basic methods and one could question whether it makes sense to implement new wrappers for them. The motivation would be that this helps to standardize the way this is done with TreeSE objects, and make it faster to deploy standard summary tables and figures with minimal extra effort. Then also new improved methods can be proposed using the same logic (e.g. probabilistic variants). I would start by implementing (1) as Once that works, we could tackle (2). That could go in its own issue? You can try to implement these in parallel. |
|
|
ping #759 |
TuomasBorman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the function structure can still be simplified and made more generic. I come back to this later.
This functionality is related to little bit bigger project. Let's discuss this in the office. This PR is good foundation for it.
| # Create grouping to test these groups separately | ||
| dplyr::group_split(across(all_of(grouping_vars))) |> | ||
| purrr::map_df(function(df_group) { | ||
| tryCatch({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid try-catch
| pw <- df |> | ||
| as.data.frame() |> | ||
| # Create grouping to test these groups separately | ||
| dplyr::group_split(across(all_of(grouping_vars))) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use importFrom instead
| .calc_DA <- function( | ||
| df, y, group, facet.by, pair.by, comp.by, features, | ||
| da.method, p.adjust.method = "fdr", | ||
| paired = !is.null(pair.by), include.effect = TRUE, | ||
| digits = 3, ... | ||
| ) { | ||
| # Basic validation | ||
| if (!.is_a_bool(paired)) { | ||
| stop("'paired' must be TRUE or FALSE.", call. = FALSE) | ||
| } | ||
|
|
||
| if ( !.is_a_string(p.adjust.method) ) { | ||
| stop("'p.adjust.method' must be a character string.", call. = FALSE) | ||
| } | ||
|
|
||
| if ( !.is_an_integer(digits) ) { | ||
| stop("'digits' must be a single integer value.", call. = FALSE) | ||
| } | ||
| # Get variables. group will specify the grouping variables, i.e., | ||
| # we test the significance for these groups separately. | ||
| grouping_vars <- c(facet.by, group) | ||
| # comp.by specify groups for comparison. If they are not | ||
| # specified, we make comparison between group. | ||
| comparison_vars <- c(comp.by) |> unique() | ||
| # grouping_vars: used to split data into groups for repeated tests | ||
| # comparison_vars: the variable(s) we are testing across | ||
| if( is.null(comparison_vars) ){ | ||
| comparison_vars <- group | ||
| } | ||
| grouping_vars <- grouping_vars[ !grouping_vars %in% comparison_vars ] | ||
|
|
||
| formula <- as.formula(paste(y, "~", paste(comparison_vars, collapse = "+"))) | ||
|
|
||
| res <- global <- effect <- NULL | ||
| method_funs <- list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use same code-styling as elsewhere
| #' @rdname getDA | ||
| #' @export | ||
| setMethod("getDA", signature(x = "SummarizedExperiment"), | ||
| function(x, da.type = c("pairwise", "omnibus", "posthoc"), ...) { | ||
| da.type <- match.arg(da.type) | ||
| FUN <- switch( | ||
| da.type, | ||
| pairwise = getPairwiseDA, | ||
| omnibus = getOmnibusDA, | ||
| posthoc = getPosthocDA) | ||
| FUN(x, ...) | ||
| } | ||
| ) | ||
|
|
||
| #' @rdname getDA | ||
| #' @export | ||
| setMethod("addDA", signature(x = "SummarizedExperiment"), | ||
| function(x, da.type = c("pairwise", "omnibus", "posthoc"), | ||
| name = "DAtest", ...) { | ||
| da.type <- match.arg(da.type) | ||
| res <- getDA(x, da.type, ...) | ||
| x <- .add_values_to_metadata(x, name, res) | ||
| return(x) | ||
| } | ||
| ) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove these wrappers. They do not add much value
| #' @export | ||
| setMethod("getPairwiseDA", signature(x = "SummarizedExperiment"), | ||
| function(x, assay.type = NULL, features = NULL, row.var = NULL, | ||
| col.var = NULL, group, facet.by = NULL, comp.by = NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is facet.by?
| if (!is.null(pair.by)) { | ||
| coldata <- as.data.frame(colData(tse)) | ||
|
|
||
| sample_counts <- coldata %>% | ||
| group_by(.data[[pair.by]]) %>% | ||
| summarise(n_samples = n(), .groups = "drop") | ||
|
|
||
| if (length(unique(sample_counts$n_samples)) > 1) { | ||
| stop( | ||
| "When 'pair.by' is specified, all subjects must have the ", | ||
| "same number of samples.\nRemove 'pair.by' or filter your ", | ||
| "data to include only subjects with balanced repeats.", | ||
| call. = FALSE | ||
| ) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required for the function to work? I guess user should already be aware of this, and we should not restrict user to run the method if some samples are missing from certain time points
|
These will be implemented in microbiome/daa#8 |


#733