
add option for seeding rarefaction? #55

Open
gregcaporaso opened this issue Nov 9, 2016 · 6 comments

Comments

@gregcaporaso
Member

gregcaporaso commented Nov 9, 2016

Improvement Description
A forum user suggested that we add support for seeding rarefaction, which is an interesting idea for supporting reproducibility, though I'm not certain what the specific use cases would be.

Questions
Are there times when we would want to perfectly replicate rarefaction results? If so, we'd need the seed to be logged in the artifact's provenance.
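
To make the question above concrete, here is a minimal sketch of seeded rarefaction that also returns the seed so it could be logged. It uses only the Python standard library and is not QIIME 2's or biom-format's implementation; the `rarefy` function, the shape of the `record` dict, and the sample data are all hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Draw `depth` observations without replacement from one sample.

    `counts` maps feature ID -> observed count. Returns the rarefied
    counts together with a small record holding the seed that was used.
    (Hypothetical helper, for illustration only.)
    """
    if depth > sum(counts.values()):
        raise ValueError("depth exceeds the sample's total count")
    if seed is None:
        # Pick a concrete seed so there is always something to log.
        seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(seed)
    # One pool entry per observation; sorting makes the pool independent
    # of the dict's insertion order.
    pool = [f for f in sorted(counts) for _ in range(counts[f])]
    rarefied = dict(Counter(rng.sample(pool, depth)))
    record = {"action": "rarefy", "depth": depth, "seed": seed}
    return rarefied, record


sample = {"otu1": 50, "otu2": 30, "otu3": 20}
first, record = rarefy(sample, depth=40)
# Replaying with the logged seed reproduces the table exactly.
replay, _ = rarefy(sample, depth=40, seed=record["seed"])
assert first == replay
```

Generating a seed when none is supplied means every run has a value that can be recorded, whether or not the user asked for one. Python only guarantees stable seeded output from `random.random()` across versions, so an exact replay would likely also need the software version recorded alongside the seed.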

References
suggested

@lkursell

lkursell commented Nov 9, 2016

I really like this idea. If you are doing more global analyses like PCoA plots, I don't think the results would be affected all that much. However, in recent work I've been doing on machine learning with feature tables, I've seen that correlations and p-values can be significantly affected just by re-running rarefaction, especially if the community is particularly diverse.

I also think this would allow a more definitive assessment of whether rarefaction was making a difference: seed the tables two different ways, run your analysis, and then compare what differs across the tables.
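
The comparison described above could look roughly like the following sketch, which rarefies one sample under two seeds and lists the features whose counts differ. It assumes a plain dict of counts rather than a real feature table; `rarefy`, `count_differences`, and the sample data are hypothetical names introduced for illustration.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed):
    """Subsample `depth` observations without replacement (illustrative)."""
    pool = [f for f in sorted(counts) for _ in range(counts[f])]
    return Counter(random.Random(seed).sample(pool, depth))


def count_differences(table_a, table_b):
    """Return {feature: (count_a, count_b)} for features whose counts differ."""
    features = set(table_a) | set(table_b)
    # Counter returns 0 for a feature that was dropped by one of the draws.
    return {f: (table_a[f], table_b[f]) for f in sorted(features)
            if table_a[f] != table_b[f]}


sample = {"otu1": 500, "otu2": 30, "otu3": 5, "otu4": 1}
run_a = rarefy(sample, depth=100, seed=1)
run_b = rarefy(sample, depth=100, seed=2)
changed = count_differences(run_a, run_b)
```

Because both runs are seeded, any downstream result that changes between `run_a` and `run_b` can be traced back to the subsampling step rather than to an untracked source of randomness.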

Perhaps more importantly, this means that you could give a user or collaborator the raw table and they could get to exactly the same rarefied table, without you having to send along intermediate files. This seems helpful when a database is being used, for example to make sure that if you pulled studies out of QIITA and rarefied them, you'd always get the same table.
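
One way a collaborator could confirm they arrived at the same rarefied table is to compare a checksum instead of exchanging files. The sketch below assumes a table stored as nested dicts and uses hypothetical helpers `rarefy_table` and `fingerprint`; it is not part of QIIME 2, QIITA, or biom-format.

```python
import hashlib
import json
import random
from collections import Counter


def rarefy_table(table, depth, seed):
    """Rarefy every sample in {sample_id: {feature_id: count}} (illustrative)."""
    rng = random.Random(seed)
    rarefied = {}
    for sample_id in sorted(table):  # fixed order so draws line up across runs
        counts = table[sample_id]
        pool = [f for f in sorted(counts) for _ in range(counts[f])]
        rarefied[sample_id] = dict(Counter(rng.sample(pool, depth)))
    return rarefied


def fingerprint(table):
    """SHA-256 of a canonical JSON dump, for comparing tables across machines."""
    canonical = json.dumps(table, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


raw = {"s1": {"otu1": 40, "otu2": 10}, "s2": {"otu1": 5, "otu2": 45}}
mine = rarefy_table(raw, depth=20, seed=2016)
theirs = rarefy_table(raw, depth=20, seed=2016)
assert fingerprint(mine) == fingerprint(theirs)
```

With this approach only the raw table, the depth, the seed, and a short hash need to be shared. Matching fingerprints assume both sides run the same subsampling code and version.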

@sejsong

sejsong commented Dec 19, 2018

Tacking on to what @lkursell mentioned, I've found that ANCOM results can also differ depending on the rarefaction iteration.
More and more journals are asking for analysis notebooks with manuscript submissions, and I think this is important for exact reproducibility of results by others who may run the code.

@nbokulich
Member

It looks like this is now possible, as setting a random seed has been enabled in biom-format Table.subsample. See: biocore/biom-format#916

@wasade would you by any chance be interested in exposing that option in q2-feature-table? Or could you let us know when the next release of biom-format is planned so that we can coordinate this issue?

@wasade
Member

wasade commented Apr 25, 2023

Hey @nbokulich, the next release will happen as soon as I can get enough time to make it happen. I had actually intended to release a week or two ago, but it keeps getting bumped. It's relatively high on my priorities but just not yet at the top. Is this time sensitive for q2-feature-table?

@nbokulich
Member

Thanks @wasade! The next release of QIIME 2 is in May (PRs must be merged by May 5), so we could add this feature to q2-feature-table in that release if you cut the new release of biom-format before then. So there's opportunity but not urgency, I'd say.

@wasade
Member

wasade commented Apr 26, 2023 via email

5 participants