
Stepwise completeness, module copy number, and a bunch of other useful updates to metabolism #1927

Merged: 166 commits merged on Apr 19, 2022


ivagljiva (Contributor)

This PR will be my last major update to the metabolism code for the next few months. It includes the following updates:

  • a new 'stepwise' strategy for computing module completeness. The old strategy is now referred to as 'pathwise'
  • the 'hits in modules' output mode has been deprecated and replaced with two new modes: 'module paths' mode (for pathwise information) and 'module steps' mode (for stepwise information)
  • the old matrix file outputs have been renamed to specify 'pathwise', and new matrix outputs have been added for 'stepwise' metrics
  • we can now compute the copy number of metabolic modules, in either a pathwise or stepwise fashion. The --add-copy-number flag adds this data to long-format output, or generates additional matrix files with these metrics
  • anvi-compute-metabolic-enrichment uses pathwise completeness by default, but you can request to use stepwise completeness instead with the new flag --use-stepwise-completeness
  • anvi-setup-kegg-kofams has been refactored to better separate downloading the data from setting up the database. There are two new flags, --only-download and --only-database, for finer control over what happens when the -D flag is used. This is mostly useful for quick testing: you can now download the data once with --only-download and then run the database setup multiple times with --only-database
  • there is a new KEGG snapshot, downloaded and set up last week. It is the new default snapshot for anvi-setup-kegg-kofams. It includes BRITE data and modules db v4
  • bonus: the modules output dictionary generation function has been refactored to reduce redundancy in the code
  • bonus: fixed a few bugs related to the inclusion of BRITE in PR #1910 (KEGG BRITE)
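To make the difference between the two completeness strategies concrete, here is a minimal Python sketch. It is not anvi'o's implementation: it assumes a hypothetical, simplified module representation in which a module is a list of top-level steps, each step is a list of alternative enzymes, and each enzyme is a tuple of KOs that must all be present. Under that assumption, pathwise completeness is the best fraction of present KOs along any single path through the module, and stepwise completeness is the fraction of steps that have at least one fully present alternative.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical simplified model (not anvi'o's data structures):
# a step is a list of alternative enzymes; each enzyme is a tuple of KOs
Step = List[Tuple[str, ...]]


def pathwise_completeness(steps: List[Step], hits: Dict[str, int]) -> float:
    """Best fraction of present KOs over all paths (one alternative chosen per step)."""
    best = 0.0
    for choice in product(*steps):
        path = [ko for enzyme in choice for ko in enzyme]
        present = sum(1 for ko in path if hits.get(ko, 0) > 0)
        best = max(best, present / len(path))
    return best


def stepwise_completeness(steps: List[Step], hits: Dict[str, int]) -> float:
    """Fraction of complete steps; a step counts as 1 only if some alternative
    has all of its KOs present, otherwise it counts as 0."""
    complete = sum(
        1 for step in steps
        if any(all(hits.get(ko, 0) > 0 for ko in enzyme) for enzyme in step)
    )
    return complete / len(steps)


# Placeholder KO IDs: step 2 has two alternatives, step 3 is a two-KO complex
module = [[("K_A",)], [("K_B",), ("K_C",)], [("K_D", "K_E")]]
hits = {"K_A": 1, "K_C": 2, "K_D": 1}
print(pathwise_completeness(module, hits))            # 0.75
print(round(stepwise_completeness(module, hits), 3))  # 0.667
```

The half-present complex in step 3 earns partial credit in the pathwise score but none in the stepwise score, which is why the two numbers differ for the same set of hits.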

The metabolism suite of anvi-self-test and the documentation have been updated to reflect these changes. The PR passes the self-test. Feel free to check the help pages for descriptions of the new features :)

…e this data in the hits_in_modules output mode
…rate for paths with duplicate KOs.

The problem was that we put the hit counts into a dictionary and used the number of keys as the path length, thereby ignoring duplicate KOs.
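The duplicate-KO pitfall described in this commit message can be reproduced in a few lines. This is a hypothetical Python sketch, not the code from the PR: the function names are made up, and the copy-number rule used here (each complete copy of a path needs one hit per occurrence of a KO in that path) is a simplifying assumption.

```python
from collections import Counter
from typing import Dict, List


def path_length_from_dict(path: List[str], hits: Dict[str, int]) -> int:
    """Reproduces the pitfall: keying hit counts by KO collapses duplicate
    KOs, so the 'path length' comes out too short."""
    hit_counts = {ko: hits.get(ko, 0) for ko in path}
    return len(hit_counts)


def path_copy_number(path: List[str], hits: Dict[str, int]) -> int:
    """Complete copies of a path, assuming (simplification) that each copy
    needs one hit per occurrence of a KO in the path."""
    required = Counter(path)  # KO -> number of times it appears in the path
    return min(hits.get(ko, 0) // n for ko, n in required.items())


path = ["K_A", "K_B", "K_B", "K_C"]  # placeholder IDs; K_B appears twice
hits = {"K_A": 2, "K_B": 3, "K_C": 1}
print(path_length_from_dict(path, hits), len(path))  # 3 4
print(path_copy_number(path, hits))                  # 1
```

Counting occurrences with `Counter` keeps the duplicate visible, so the path length stays at 4 and the repeated KO is required twice per copy.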
…to be split into only their essential components; otherwise we couldn't look up their hit counts in the module dictionary
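When hit counts are looked up per KO, an enzyme complex from a KEGG module definition first has to be reduced to its essential KOs. In KEGG's definition syntax, '+' joins the components of a complex and '-' marks a non-essential component. The sketch below uses hypothetical helper names and assumes flat complexes of standard KO IDs (no nested parentheses, no module IDs); it keeps only the essential KOs and then looks up their counts.

```python
import re
from typing import Dict, List

# Matches an optional leading operator followed by a KO ID such as K00001
_COMPONENT = re.compile(r"([+-]?)(K\d{5})")


def essential_components(complex_def: str) -> List[str]:
    """Return the KOs of a flat complex, dropping components prefixed by '-'."""
    return [ko for op, ko in _COMPONENT.findall(complex_def) if op != "-"]


def complex_hit_counts(complex_def: str, hits: Dict[str, int]) -> Dict[str, int]:
    """Look up a hit count for each essential KO (0 if the KO had no hits)."""
    return {ko: hits.get(ko, 0) for ko in essential_components(complex_def)}


print(essential_components("K00001+K00002-K00003"))  # ['K00001', 'K00002']
print(complex_hit_counts("K00001+K00002-K00003", {"K00001": 2, "K00003": 5}))
# {'K00001': 2, 'K00002': 0}
```

Because the non-essential K00003 is dropped before the lookup, its hits do not count toward the complex.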
… as completeness

except for modules defined by other modules
ivagljiva self-assigned this Apr 19, 2022
ivagljiva merged commit 4cd981b into master Apr 19, 2022
ivagljiva deleted the module_redundancy branch April 19, 2022 04:52
ivagljiva added a commit to merenlab/anvio.org that referenced this pull request Apr 19, 2022