Give informative error when values are missing in addRDA #626

Merged
merged 8 commits into from
Sep 10, 2024


RiboRings
Member

Fix for issue #432.

Usage examples:

# Import TreeSE
library(mia)
data("enterotype", package = "mia")
tse <- enterotype

# Throw error when na.action is na.fail and some values are missing

tse <- addCCA(tse, formula = assay ~ ClinicalStatus + Gender + Age)
# Error: Variables contain missing values. Set na.action to na.exclude to remove
# samples with missing values.

tse <- addRDA(tse, formula = assay ~ ClinicalStatus + Gender + Age,
              FUN = vegan::vegdist, method = "bray")
# Error: Variables contain missing values. Set na.action to na.exclude to remove
# samples with missing values.

# Work as usual when na.action is na.omit or na.exclude

tse <- addCCA(tse, formula = assay ~ ClinicalStatus + Gender + Age,
              na.action = na.exclude)

tse <- addRDA(tse, formula = assay ~ ClinicalStatus + Gender + Age,
              FUN = vegan::vegdist, method = "bray",
              na.action = na.exclude)
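
For illustration only, the kind of check described above could be sketched in base R as follows. `check_missing` is a hypothetical helper written for this example, not the code added to R/runCCA.R; the error text mirrors the message shown in the usage examples.

```r
# Hypothetical helper (an assumption, not mia's implementation): stop with an
# informative message when the covariates contain NA and na.action is na.fail.
check_missing <- function(variables, na.action = na.fail) {
    if (anyNA(variables) && identical(na.action, na.fail)) {
        stop(
            "Variables contain missing values. Set na.action to na.exclude ",
            "to remove samples with missing values.", call. = FALSE)
    }
    invisible(variables)
}

# Toy covariates with missing entries
covariates <- data.frame(Age = c(31, NA, 45), Gender = c("F", "M", NA))

# Default na.action = na.fail: the informative error is raised and captured
msg <- tryCatch(check_missing(covariates), error = conditionMessage)

# na.exclude: the check passes and the data are returned unchanged
ok <- check_missing(covariates, na.action = na.exclude)
```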

@RiboRings RiboRings linked an issue Aug 10, 2024 that may be closed by this pull request

codecov bot commented Aug 10, 2024

Codecov Report

Attention: Patch coverage is 40.00000% with 9 lines in your changes missing coverage. Please review.

Please upload report for BASE (devel@5282a72). Learn more about missing BASE report.

| Files | Patch % | Lines |
| --- | --- | --- |
| R/runCCA.R | 40.00% | 9 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff            @@
##             devel     #626   +/-   ##
========================================
  Coverage         ?   67.80%           
========================================
  Files            ?       44           
  Lines            ?     5302           
  Branches         ?        0           
========================================
  Hits             ?     3595           
  Misses           ?     1707           
  Partials         ?        0           


@antagomir
Member

Thanks!

Original data dimension

dim(enterotype)
[1] 553 280

Dimension after RDA (or CCA):

dim(reducedDim(tse, "RDA"))
[1] 280 6

OK, the samples match and none of them are dropped. This solution seems OK to me.

We could consider providing an optional (hidden?) argument to drop out samples with missing values but perhaps this is not critical for now.

Confirm that we have sufficient unit tests in place, and documentation is clear about the key issues related to this one.

I am wondering where the 6-component solution is defined: why is the default 6 dimensions, and is it possible to change that somehow? (Not really related to this issue, but it would be useful to find out. If there is no time now, perhaps open another issue on that?)

@RiboRings
Member Author

> We could consider providing an optional (hidden?) argument to drop out samples with missing values but perhaps this is not critical for now.

I think this can be simply done by using na.omit instead of na.exclude, so in a sense the argument is already there.
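
As a minimal base R sketch of the difference between the two options, using `lm` on made-up data rather than `addRDA` itself: both settings fit the model on the complete rows only, but `na.omit` drops the incomplete row from the output, while `na.exclude` pads the output with `NA` so its length still matches the input. How mia and vegan apply the same options to ordination scores is not shown here.

```r
# Toy data with one missing covariate value (row 3)
df <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1),
                 x = c(1, 2, NA, 4, 5))

fit_omit    <- lm(y ~ x, data = df, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = df, na.action = na.exclude)

# na.omit: the incomplete row is absent from the residuals
n_omit <- length(residuals(fit_omit))

# na.exclude: same fit, but the residuals are padded with NA at row 3
n_exclude <- length(residuals(fit_exclude))
row3_is_na <- is.na(residuals(fit_exclude)[3])
```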

@antagomir
Member

> > We could consider providing an optional (hidden?) argument to drop out samples with missing values but perhaps this is not critical for now.
>
> I think this can be simply done by using na.omit instead of na.exclude, so in a sense the argument is already there.

Right, sounds good.

Could/should we mention this explicitly in the documentation (@details section perhaps)?

Good to check that sufficient unit tests are in place.

@antagomir
Member

Resolve conflicts.

Contributor

@TuomasBorman TuomasBorman left a comment

Seems good!

@TuomasBorman
Contributor

> > We could consider providing an optional (hidden?) argument to drop out samples with missing values but perhaps this is not critical for now.
>
> I think this can be simply done by using na.omit instead of na.exclude, so in a sense the argument is already there.

That works only for the get* functions. When an add* function is used, the result is added with .add_object_to_reduceddim. That function checks whether the result is missing samples compared to the TreeSE. If it is and subset_result = FALSE, the function adds the missing samples back to the result so that it matches the TreeSE.

However, that might not be the behavior the user wants. (If the user specified na.omit, a result with samples dropped is exactly what they asked for.)

So I suggest that we

  1. change the default value of subset_result to TRUE
  2. rename subset_result --> subset.result

subset_result is not documented anywhere and it is intended just for us to control the behavior. Otherwise, the function should work (check).
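
The padding behavior described in this comment can be sketched in base R as follows. `pad_to_samples` is a hypothetical stand-in written for this example and is not the code of .add_object_to_reduceddim; in particular, with `subset.result = TRUE` the sketch simply returns the reduced result, whereas the real function presumably also has to subset the TreeSE so that the dimensions agree.

```r
# Hypothetical stand-in for the padding step (an assumption, not mia's code).
pad_to_samples <- function(result, samples, subset.result = TRUE) {
    missing_samples <- setdiff(samples, rownames(result))
    if (subset.result || length(missing_samples) == 0) {
        # Keep only the samples that were actually ordinated
        return(result)
    }
    # Add NA rows for the dropped samples and restore the original order
    padding <- matrix(NA_real_, nrow = length(missing_samples),
                      ncol = ncol(result),
                      dimnames = list(missing_samples, colnames(result)))
    rbind(result, padding)[samples, , drop = FALSE]
}

samples <- c("s1", "s2", "s3", "s4")
# Pretend na.omit dropped sample s2 from the ordination
scores <- matrix(seq_len(6), nrow = 3,
                 dimnames = list(c("s1", "s3", "s4"), c("RDA1", "RDA2")))

padded <- pad_to_samples(scores, samples, subset.result = FALSE)
subsetted <- pad_to_samples(scores, samples, subset.result = TRUE)
```

Here `padded` has one row per sample, with `NA` scores for `s2`, while `subsetted` keeps only the three ordinated samples.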

@antagomir
Member

ok

@RiboRings
Member Author

Hi! I got back to this PR.

I noticed that the example with GlobalPatterns is very slow. Is it ok if I change the dataset to enterotype?

@RiboRings
Member Author

This PR is ready to merge from my side.

@TuomasBorman TuomasBorman merged commit 52d2a23 into devel Sep 10, 2024
3 checks passed
@TuomasBorman TuomasBorman deleted the na_action branch September 10, 2024 05:46
Successfully merging this pull request may close these issues.

Suggestion to remove na.action argument from runRDA
3 participants