Consolidating our KEGG data download #2121

Merged 84 commits on Sep 25, 2023
Commits
cd9aa86
Merge remote-tracking branch 'origin/metabolic-network-storage' into …
ivagljiva Aug 29, 2023
1f67986
add mode argument for selecting data to download
ivagljiva Aug 30, 2023
6b09101
the most simple refactor: conditionally run KEGG setup or refdbs setup
ivagljiva Aug 30, 2023
8df990b
a little fix for people whose PROTEIN_DATA dir doesn't exist yet
ivagljiva Aug 30, 2023
2d5181a
update description
ivagljiva Aug 31, 2023
f0b401e
update copyright to 2023
ivagljiva Aug 31, 2023
bd1eef2
rename setup program
ivagljiva Aug 31, 2023
f7d5d56
make kofam setup independent of the rest
ivagljiva Aug 31, 2023
3b3f49a
add kofam mode for downloading
ivagljiva Aug 31, 2023
6b9e1ff
enforce sanity check for directory creation
ivagljiva Aug 31, 2023
a140ba5
make class specific to KOfam download mode
ivagljiva Aug 31, 2023
e48dbd7
make db check function more generic
ivagljiva Aug 31, 2023
8bce614
we will move these sanity checks to individual subclasses
ivagljiva Aug 31, 2023
eebf79f
skip_init param that will be accessible in args
ivagljiva Aug 31, 2023
c0404d6
make subclass specific to modules download
ivagljiva Aug 31, 2023
f52d260
utilize the new subclasses in setup program
ivagljiva Aug 31, 2023
bb32a87
add missing lambda func and remove stray space
ivagljiva Aug 31, 2023
bedcff2
we actually need archive path attribute in parent class
ivagljiva Aug 31, 2023
144ca5c
little init fixies
ivagljiva Aug 31, 2023
0b38947
now this dict stores the modes
ivagljiva Sep 1, 2023
3cacc38
use ozcan's hack for subparser parameters
ivagljiva Sep 1, 2023
8c5963a
section for mode specific params
ivagljiva Sep 1, 2023
4e25621
a bit of curation of parameter help
ivagljiva Sep 1, 2023
858fe37
clarify dir param
ivagljiva Sep 1, 2023
a87a710
copy pasta error OOPS
ivagljiva Sep 1, 2023
e921d3f
turns out we need this too
ivagljiva Sep 1, 2023
6b7baab
a little fixy for kofam --only-database
ivagljiva Sep 1, 2023
498bf49
add debug output for db check
ivagljiva Sep 1, 2023
4e82c5d
rename only-database arg to only-processing
ivagljiva Sep 1, 2023
d5e9e51
switch to sam's newer class for modeling data download
ivagljiva Sep 5, 2023
f498190
a bit cleaner
ivagljiva Sep 5, 2023
954759b
a better way to list mode descriptions
ivagljiva Sep 6, 2023
588e7cd
nicer descriptions
ivagljiva Sep 6, 2023
24d9d18
make module download multithreaded
ivagljiva Sep 6, 2023
ea03b03
verify complete files in a different function
ivagljiva Sep 6, 2023
71dfbd0
KOs -> modules
ivagljiva Sep 6, 2023
41bbd1b
enable skip_init for subclasses of KeggSetup
ivagljiva Sep 6, 2023
cf55991
multithreaded brite download
ivagljiva Sep 6, 2023
2b36a57
confirm json format in brite hierarchies separately from download fun…
ivagljiva Sep 6, 2023
c1e38d1
remove kegg from anvi-setup-protein-ref-db
ivagljiva Sep 6, 2023
eb84e9c
update provides and requires
ivagljiva Sep 6, 2023
7812718
anvi-setup-kegg- kofams to data name change
ivagljiva Sep 6, 2023
ace50a5
reference anvi-setup-kegg-data in reaction-ref-data artifact
ivagljiva Sep 6, 2023
02b662b
some partial changes to the anvi-setup-kegg-data doc
ivagljiva Sep 13, 2023
9a0070f
Revert "can we sneak in metabolism here too?"
meren Sep 19, 2023
130c315
merge reaction data into KEGG data dir (now at data/MISC/KEGG/KO_REAC…
ivagljiva Sep 21, 2023
2457032
Add meme to environment.yaml
FlorianTrigodet Sep 21, 2023
57ec54a
move stuff around so that kegg snapshots are handled by KeggSetup class
ivagljiva Sep 21, 2023
be25cca
remove KOfam download from modules mode
ivagljiva Sep 21, 2023
14e8d56
reorganize params and help output
ivagljiva Sep 21, 2023
6870157
add logic for 'all' mode and make it default
ivagljiva Sep 21, 2023
55c0631
move snapshot arg to the right class
ivagljiva Sep 21, 2023
f2a3a9a
whoops. init needs args
ivagljiva Sep 21, 2023
10fa055
fix arg parsing error message
ivagljiva Sep 21, 2023
eb43d9b
include BRITE in the mode description
ivagljiva Sep 21, 2023
76a8810
fix priorities for dir modeling mode param
ivagljiva Sep 21, 2023
193b866
allow contigs db only annotated with kofams
ivagljiva Sep 21, 2023
1252184
just do it flag for anvi-estimate-metabolism
ivagljiva Sep 21, 2023
7f88108
metabolism self-test uses 'all' mode for anvi-setup-kegg-data
ivagljiva Sep 21, 2023
b67f349
update anvi-run-kegg-kofams doc with conditional modules annotations
ivagljiva Sep 21, 2023
26850e4
now this extra --dir parameter is properly handled in 'all' mode
ivagljiva Sep 21, 2023
b34c632
better debug output if archive is not ok
ivagljiva Sep 21, 2023
815d6ca
fixy
meren Sep 21, 2023
3eb8a5e
whoops. these sanity checks should be in KeggSetup class
ivagljiva Sep 21, 2023
8cbd3a2
this num_threads initialization is better for API users
ivagljiva Sep 21, 2023
6f2f15b
update doc for anvi-setup-kegg-data with new params and features
ivagljiva Sep 21, 2023
3d53ebd
add indication for no modeling data to KEGG snapshots file
ivagljiva Sep 21, 2023
c0722ce
smart check for modeling files in kegg archives
ivagljiva Sep 21, 2023
51694d1
Merge remote-tracking branch 'origin/metabolic-network-storage' into …
ivagljiva Sep 21, 2023
e50000f
rename anvi-setup-protein-reference-database to anvi-setup-modelseed-…
ivagljiva Sep 21, 2023
4b490d3
Merge branch 'master' into kegg_download_consolidation
ivagljiva Sep 21, 2023
5b4dcdc
most recent snapshot has no modeling data associated with it
ivagljiva Sep 21, 2023
0a36d42
don't reset twice in 'all' mode
ivagljiva Sep 22, 2023
9bf9bdc
specify KO database in output
ivagljiva Sep 22, 2023
9c80faa
remove references to mode parameter wherever possible in documentatio…
ivagljiva Sep 22, 2023
7868c64
remove references to REACTION and COMPOUND databases in documentation
ivagljiva Sep 22, 2023
f693593
whoops, forgot this reference to mode param
ivagljiva Sep 22, 2023
ab0b26d
update instructions for adding new KEGG snapshot
ivagljiva Sep 22, 2023
6ab82cc
add new default snapshot for anvi'o v8 (now includes modeling data)
ivagljiva Sep 22, 2023
423fe1f
make --reset work for snapshots and archives
ivagljiva Sep 22, 2023
d9e1565
class variable of expected files
semiller10 Sep 24, 2023
a3bf986
correct docstring
semiller10 Sep 24, 2023
50d543a
aesthetic
semiller10 Sep 24, 2023
fb10435
reference expected files from KODatabase
semiller10 Sep 24, 2023
1 change: 1 addition & 0 deletions .conda/environment.yaml
Original file line number Diff line number Diff line change
@@ -31,3 +31,4 @@ dependencies:
- r-magrittr
- bioconductor-qvalue
- fastani
+ - meme
5 changes: 4 additions & 1 deletion .github/workflows/daily-component-tests-and-migrations.yaml
@@ -37,10 +37,13 @@ jobs:
anvi-self-test --suite metagenomics-full --no-interactive
anvi-self-test --suite pangenomics --no-interactive
anvi-self-test --suite inversions --no-interactive
- anvi-self-test --suite metabolism --no-interactive
# the following steps cause our actions to fail on GitHub runners
# due to space limitations :/ please do not uncomment this until we
# have a solution for this :/
+ #- name: "Run component tests for metabolism framework"
+ # shell: bash -l {0}
+ # run: |
+ # anvi-self-test --suite metabolism --no-interactive
#- name: "Migrate ancient anvi'o databases"
# shell: bash -l {0}
# run: |
2 changes: 1 addition & 1 deletion Dockerfiles/anvio-structure/Dockerfile
@@ -72,7 +72,7 @@ RUN rm anvio-7.1.tar.gz
# Setup anvi'o databases
##############################################################
RUN anvi-setup-interacdome
- RUN anvi-setup-kegg-kofams --kegg-snapshot v2020-12-23
+ RUN anvi-setup-kegg-data --kegg-snapshot v2020-12-23
RUN anvi-setup-pfams --pfam-version 33.1
RUN anvi-setup-ncbi-cogs --cog-version COG20

28 changes: 14 additions & 14 deletions anvio/__init__.py
@@ -1044,37 +1044,37 @@ def TABULATE(table, header, numalign="right", max_width=0):
"you will not have the most up-to-date version of KEGG for your annotations, metabolism "
"estimations, or any other downstream uses of this data. If that is going to be a problem for you, "
"do not fear - you can provide this flag to tell anvi'o to download the latest, freshest data directly "
- "from KEGG's REST API and set it up into an anvi'o-compatible database."}
+ "from KEGG's REST API and set it up into anvi'o-compatible files."}
),
'only-download': (
['--only-download'],
{'default': False,
'action': 'store_true',
'help': "You want this program to only download data from KEGG, and then stop. It will not "
- "make a modules database. (It would be a *very* good idea for you to specify a "
- "data directory using --kegg-data-dir in this case, so that you can find the resulting "
- "data easily and avoid messing up any data in the default KEGG directory. But you are "
- "of course free to do whatever you want.). Note that KOfam profiles will still be "
- "processed with `hmmpress` if you choose this option."}
+ "process the data (ie, into organized HMMs or a modules database). (It would be a "
+ "*very* good idea for you to specify a data directory using --kegg-data-dir in this "
+ "case, so that you can find the resulting data easily and avoid messing up any data "
+ "in the default KEGG directory. But you are of course free to do whatever you want.)"}
),
- 'only-database': (
- ['--only-database'],
+ 'only-processing': (
+ ['--only-processing'],
{'default': False,
'action': 'store_true',
- 'help': "You already have all the KEGG data you need on your computer. Perhaps you even got it from "
+ 'help': "You already have all the KEGG data you need on your computer. Probably you even got it from "
"this program, using the --only-download option. We don't know. What matters is that you don't "
- "need anything downloaded, you just want this program to setup a modules database from that "
- "existing data. Good. We can do that if you provide this flag (and probably also the --kegg-data-dir "
+ "need anything downloaded, you just want this program to process that "
+ "existing data. Good. We can do that if you provide this flag (and hopefully also the --kegg-data-dir "
"in which said data is located)."}
),
'kegg-snapshot': (
['--kegg-snapshot'],
{'default': None,
'type': str,
'metavar': 'RELEASE_NUM',
- 'help': "If you are particularly interested in an earlier snapshot of KEGG that anvi'o knows about, you can set it here. "
- "Otherwise anvi'o will always use the latest snapshot it knows about, which is likely to be the one associated with "
- "the current release of anvi'o."}
+ 'help': "The default behavior of this program is to download a pre-processed snapshot of data "
+ "from KEGG. If you are particularly interested in an earlier snapshot of KEGG that anvi'o "
+ "knows about, you can set it here. Otherwise anvi'o will always use the latest snapshot "
+ "it knows about, which is likely to be the one associated with the current release of anvi'o."}
),
'hide-outlier-SNVs': (
['--hide-outlier-SNVs'],
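Each entry in this dictionary is a tuple of a flag list and a keyword dictionary, which is the shape that `argparse.ArgumentParser.add_argument` accepts once unpacked. The sketch below is only an illustration under assumptions: `ARGS` is a trimmed, hypothetical copy of three entries from the diff (help strings shortened), and `build_parser` is a made-up helper, not anvi'o's own argument wiring.

```python
import argparse

# Hypothetical, trimmed copy of the (flags, kwargs) tuples from the diff;
# help strings are shortened for the example.
ARGS = {
    'only-download': (
        ['--only-download'],
        {'default': False, 'action': 'store_true',
         'help': "Only download data from KEGG, then stop."}
    ),
    'only-processing': (
        ['--only-processing'],
        {'default': False, 'action': 'store_true',
         'help': "Only process KEGG data that is already on disk."}
    ),
    'kegg-snapshot': (
        ['--kegg-snapshot'],
        {'default': None, 'type': str, 'metavar': 'RELEASE_NUM',
         'help': "Name of a KEGG snapshot to download."}
    ),
}


def build_parser(names):
    """Register each named entry by unpacking its flags and keyword arguments."""
    parser = argparse.ArgumentParser(prog='setup-sketch')
    for name in names:
        flags, kwargs = ARGS[name]
        parser.add_argument(*flags, **kwargs)
    return parser


parser = build_parser(ARGS)
args = parser.parse_args(['--only-processing', '--kegg-snapshot', 'v2020-12-23'])
print(args.only_download, args.only_processing, args.kegg_snapshot)
```

Keeping the flag definitions in one dictionary means a rename such as `--only-database` to `--only-processing` touches a single entry, and every program that registers the entry picks up the new name.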
35 changes: 18 additions & 17 deletions anvio/biochemistry/reactionnetwork.py
@@ -1076,7 +1076,8 @@ class KODatabase:
Unless an alternative directory is provided, the database is downloaded and set up in a
default anvi'o data directory, and loaded from this directory in network construction.
"""
- default_dir = os.path.join(os.path.dirname(ANVIO_PATH), 'data/MISC/REACTION_NETWORK/KO')
+ default_dir = os.path.join(os.path.dirname(ANVIO_PATH), 'data/MISC/KEGG/KO_REACTION_NETWORK')
+ expected_files = ['ko_info.txt', 'ko_data.tsv']

def __init__(self, ko_dir: str = None) -> None:
"""
@@ -1093,19 +1094,17 @@ def __init__(self, ko_dir: str = None) -> None:
raise ConfigError(f"There is no such directory, '{ko_dir}'.")
else:
ko_dir = self.default_dir
- info_path = os.path.join(ko_dir, 'ko_info.txt')
- if not os.path.isfile(info_path):
- raise ConfigError(f"No required file named 'ko_info.txt' was found in the KO directory, '{ko_dir}'.")
- table_path = os.path.join(ko_dir, 'ko_data.tsv')
- if not os.path.isfile(table_path):
- raise ConfigError(f"No required file named 'ko_data.tsv' was found in the KO directory, '{ko_dir}'.")

- f = open(info_path)
+ for expected_file in self.expected_files:
+ if not os.path.isfile(os.path.join(ko_dir, expected_file)):
+ raise ConfigError(f"No required file named '{expected_file}' was found in the KO directory, '{ko_dir}'.")
+
+ f = open(os.path.join(ko_dir, 'ko_info.txt'))
f.readline()
self.release = ' '.join(f.readline().strip().split()[1:])
f.close()

- self.ko_table = pd.read_csv(table_path, sep='\t', header=0, index_col=0, low_memory=False)
+ self.ko_table = pd.read_csv(os.path.join(ko_dir, 'ko_data.tsv'), sep='\t', header=0, index_col=0, low_memory=False)

def set_up(
num_threads: int = 1,
@@ -1124,22 +1123,24 @@ def set_up(
Number of threads to use in parallelizing the download of KO files.

dir : str, None
- Directory in which to create a new subdirectory called 'KO', in which files are
- downloaded and set up. This argument overrides the default directory.
+ Directory in which to create a subdirectory called `KO_REACTION_NETWORK`,
+ in which files are downloaded and set up. This argument overrides
+ the default directory.

reset : bool, False
- If True, remove any existing 'KO' database directory and the files therein. If False,
- an exception is raised if there are files in this directory.
+ If True, remove any existing 'KO_REACTION_NETWORK' database directory and the files
+ therein. If False, an exception is raised if there are files in this directory.

run : anvio.terminal.Run, None

progress : anvio.terminal.Progress, None
"""
if dir:
if os.path.isdir(dir):
- ko_dir = os.path.join(dir, 'KO')
+ ko_dir = os.path.join(dir, 'KO_REACTION_NETWORK')
else:
- raise ConfigError(f"There is no such directory, '{dir}'.")
+ raise ConfigError(f"There is no such directory, '{dir}'. You should create it "
+ "first if you want to use it.")
else:
ko_dir = KODatabase.default_dir
parent_dir = os.path.dirname(ko_dir)
@@ -1242,7 +1243,7 @@ def set_up(
"from the KO database. Anvi'o will now attempt to redownload all of the files. "
)
run.info(f"Total number of KOs/entry files", total)
- run.info("KEGG database version", release_after)
+ run.info("KEGG KO database version", release_after)
run.info("KEGG KO list", list_path)
run.info("KEGG KO info", info_path)

@@ -1264,7 +1265,7 @@ def set_up(
section = line.split()[0]
if section == 'NAME':
# The name value follows 'NAME' at the beginning of the line.
- ko_data['name'] = line[4:].lstrip().rstrip()
+ ko_data['name'] = line[4:].strip()
# EC numbers associated with the KO are recorded at the end of the name value.
ec_string = re.search('\[EC:.*\]', line)
if ec_string:
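The `expected_files` class variable added in this file replaces two hand-written existence checks with a single loop over file names. A minimal standalone sketch of that pattern follows. `find_missing_files` is a hypothetical helper that collects every missing name, whereas the code in the diff raises `ConfigError` at the first missing file.

```python
import os
import tempfile

# File names taken from the `expected_files` list in the diff.
EXPECTED_FILES = ['ko_info.txt', 'ko_data.tsv']


def find_missing_files(ko_dir, expected_files=EXPECTED_FILES):
    """Return the expected file names that are not regular files in ko_dir."""
    return [
        name for name in expected_files
        if not os.path.isfile(os.path.join(ko_dir, name))
    ]


# Demonstration in a throwaway directory that holds only one of the two files.
with tempfile.TemporaryDirectory() as ko_dir:
    open(os.path.join(ko_dir, 'ko_info.txt'), 'w').close()
    missing = find_missing_files(ko_dir)

print(missing)
```

Because the list lives in one place, other code (for example, a check of an unpacked archive) can reference the same names instead of repeating them.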
2 changes: 2 additions & 0 deletions anvio/biochemistry/refdbs.py
@@ -91,6 +91,8 @@ def raise_missing_files(self, missing: List[str]) -> None:
)

def _set_up_db_dir(self, reset: bool) -> None:
+ if os.path.split(self.db_dir)[0] == self.default_superdir and not os.path.exists(self.default_superdir):
+ os.mkdir(self.default_superdir)
if os.path.exists(self.db_dir):
if reset:
rmtree(self.db_dir)
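The two added lines make sure the default parent directory exists before the database directory itself is handled. Below is a rough standalone sketch under stated assumptions: `set_up_db_dir` is a hypothetical free function, the directory names are placeholders, and because the diff is cut off after `rmtree`, both the `FileExistsError` raised when `reset` is False and the final `os.mkdir` are guesses rather than the anvi'o implementation.

```python
import os
import shutil
import tempfile


def set_up_db_dir(db_dir, default_superdir, reset=False):
    """Create db_dir, creating the default parent directory first if needed."""
    # Mirrors the two added lines: only the default parent is auto-created.
    if os.path.split(db_dir)[0] == default_superdir and not os.path.exists(default_superdir):
        os.mkdir(default_superdir)
    if os.path.exists(db_dir):
        if reset:
            shutil.rmtree(db_dir)
        else:
            # Assumption: refuse to touch existing data unless reset is requested.
            raise FileExistsError(f"'{db_dir}' already exists; pass reset=True to replace it.")
    os.mkdir(db_dir)


with tempfile.TemporaryDirectory() as base:
    superdir = os.path.join(base, 'PROTEIN_DATA')   # placeholder name
    db_dir = os.path.join(superdir, 'modelseed')    # placeholder name
    set_up_db_dir(db_dir, superdir)
    created = os.path.isdir(db_dir)
    open(os.path.join(db_dir, 'stale.txt'), 'w').close()
    set_up_db_dir(db_dir, superdir, reset=True)
    emptied = os.listdir(db_dir) == []

print(created, emptied)
```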
29 changes: 23 additions & 6 deletions anvio/data/misc/KEGG-SNAPSHOTS.yaml
@@ -6,60 +6,77 @@ v2020-04-27:
archive_name: KEGG_build_2020-04-27_b893b7b915cb.tar.gz
hash: b893b7b915cb
modules_db_version: 1
+ no_modeling_data: True

v2020-06-23:
url: https://ndownloader.figshare.com/files/23701919
archive_name: KEGG_build_2020-06-23_4a75508b48aa.tar.gz
hash: 4a75508b48aa
modules_db_version: 2
+ no_modeling_data: True

v2020-08-06:
url: https://ndownloader.figshare.com/files/25464530
archive_name: KEGG_build_2020-08-06_8f88ef165f4c.tar.gz
hash: 8f88ef165f4c
modules_db_version: 2
+ no_modeling_data: True

v2020-12-23:
url: https://ndownloader.figshare.com/files/25878342
archive_name: KEGG_build_2020-12-23_45b7cc2e4fdc.tar.gz
hash: 45b7cc2e4fdc
modules_db_version: 2
+ no_modeling_data: True

v2021-12-18:
url: https://figshare.com/ndownloader/files/31959416
archive_name: KEGG_build_2021-12-18_58937b64c44c.tar.gz
hash: 58937b64c44c
modules_db_version: 3
+ no_modeling_data: True

v2022-04-14:
url: https://figshare.com/ndownloader/files/34817812
archive_name: KEGG_build_2022-04-14_666feeac5de2.tar.gz
hash: 666feeac5de2
modules_db_version: 4
+ no_modeling_data: True

v2023-01-10:
url: https://figshare.com/ndownloader/files/38799687
archive_name: KEGG_build_2023-01-10_d20a0dcd2128.tar.gz
hash: d20a0dcd2128
modules_db_version: 4
+ no_modeling_data: True

v2023-09-18:
url: https://figshare.com/ndownloader/files/42381873
archive_name: KEGG_build_2023-09-18_a2b5bde358bb.tar.gz
hash: a2b5bde358bb
modules_db_version: 4
+ no_modeling_data: True
+
+ v2023-09-22:
+ url: https://figshare.com/ndownloader/files/42428115
+ archive_name: KEGG_build_2023-09-22_a2b5bde358bb.tar.gz
+ hash: a2b5bde358bb
+ modules_db_version: 4

# How to add a new KEGG snapshot to this file:
# 1. download the latest data directly from KEGG by running
- # `anvi-setup-kegg-kofams -D --kegg-data-dir ./KEGG`
+ # `anvi-setup-kegg-data -D --kegg-data-dir ./KEGG -T 5`
# 2. get the hash value and version info from the MODULES.db:
# `anvi-db-info ./KEGG/MODULES.db`
# 3. archive that directory:
# `tar -czvf KEGG_build_YYYY-MM-DD_HASH.tar.gz ./KEGG`
- # Please remember to replace YYYY-MM-DD with the current date and replace HASH with the MODULES.db hash value obtained in step 2
+ # Please remember to replace YYYY-MM-DD with the current date and replace HASH with the
+ # MODULES.db hash value obtained in step 2
# 4. Test that setup works with this archive by running
- # `anvi-setup-kegg-kofams --kegg-archive KEGG_build_YYYY-MM-DD_HASH.tar.gz --kegg-data-dir TEST_NEW_KEGG_ARCHIVE`
+ # `anvi-setup-kegg-data --kegg-archive KEGG_build_YYYY-MM-DD_HASH.tar.gz --kegg-data-dir TEST_NEW_KEGG_ARCHIVE`
# 5. Upload the .tar.gz archive to figshare and get the download url
- # 6. Finally, add an entry to the bottom of this file with the url, archive name, and MODULES.db hash and version. You should also update the
- # default self.target_snapshot variable in kegg.py to point to this latest version that you have added.
- # 7. Test it by running `anvi-setup-kegg-kofams --kegg-data-dir TEST_NEW_KEGG`, and if it works you are done :)
+ # 6. Finally, add an entry to the bottom of this file with the url, archive name, and MODULES.db hash and version.
+ # You should also update the default self.target_snapshot variable in kegg.py to point to this
+ # latest version that you have added.
+ # 7. Test it by running `anvi-setup-kegg-data --kegg-data-dir TEST_NEW_KEGG` (you don't need to run the full thing,
+ # just long enough to see that the correct snapshot is being downloaded), and if it works you are done :)
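Steps 2, 3 and 6 of these instructions tie each entry's key, archive name and hash together. The sketch below assumes the YAML has already been parsed into a dictionary (two entries are copied here by hand) and shows one way to sanity-check an entry and pick the newest snapshot. All three function names are hypothetical, and treating a missing `no_modeling_data` key as "modeling data is present" is an inference from this diff, not documented behavior.

```python
# Two entries copied by hand from KEGG-SNAPSHOTS.yaml as it appears in the diff.
SNAPSHOTS = {
    'v2023-09-18': {
        'url': 'https://figshare.com/ndownloader/files/42381873',
        'archive_name': 'KEGG_build_2023-09-18_a2b5bde358bb.tar.gz',
        'hash': 'a2b5bde358bb',
        'modules_db_version': 4,
        'no_modeling_data': True,
    },
    'v2023-09-22': {
        'url': 'https://figshare.com/ndownloader/files/42428115',
        'archive_name': 'KEGG_build_2023-09-22_a2b5bde358bb.tar.gz',
        'hash': 'a2b5bde358bb',
        'modules_db_version': 4,
    },
}


def expected_archive_name(snapshot):
    """Rebuild KEGG_build_YYYY-MM-DD_HASH.tar.gz from the key and the hash."""
    date = snapshot[1:]  # drop the leading 'v'
    return f"KEGG_build_{date}_{SNAPSHOTS[snapshot]['hash']}.tar.gz"


def has_modeling_data(snapshot):
    """Assumption: an entry has modeling data unless it is flagged otherwise."""
    return not SNAPSHOTS[snapshot].get('no_modeling_data', False)


def latest_snapshot():
    """Keys look like 'vYYYY-MM-DD', so string order matches date order."""
    return max(SNAPSHOTS)


for name in SNAPSHOTS:
    # Catch a mismatch between the entry key, hash and archive name.
    assert expected_archive_name(name) == SNAPSHOTS[name]['archive_name']

print(latest_snapshot(), has_modeling_data(latest_snapshot()))
```

A check like the loop at the end could be run after step 6 to catch a typo in a newly added entry before anyone tries to download it.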
2 changes: 1 addition & 1 deletion anvio/data/misc/PEOPLE/DEVELOPERS.yaml
@@ -7,7 +7,7 @@
linkedin: meren
orcid: 0000-0001-9013-4827
skype: a.murat.eren
- bio: "Computer scientist and microbial ecologist interested in undersatnding mechanisms by which microbes interact with their surroundings, evolve, disperse, and respond to environmental change."
+ bio: "Computer scientist and microbial ecologist interested in understanding mechanisms by which microbes interact with their surroundings, evolve, disperse, and respond to environmental change."
affiliations:
- title: Professor
inst: Helmholtz Institute for Functional Marine Biodiversity at Oldenburg
2 changes: 1 addition & 1 deletion anvio/docs/artifacts/anvi-reaction-network.md
@@ -1,3 +1,3 @@
This program **generates a metabolic reaction network in a %(contigs-db)s.** Gene %(functions)s that have been annotated in the %(contigs-db)s are compared to reference databases, yielding predictions of the biochemical reactions that may be catalyzed by the gene products. Possible applications of anvi'o metabolic networks include the export of draft metabolic models (see %(anvi-get-metabolic-model-file)s) and the import and integration of metabolomic datasets.

- A network can currently be generated from KEGG Orthology (KO) annotations of genes in conjunction with %(reaction-ref-data)s: KEGG ([KO](https://www.genome.jp/kegg/ko.html), [REACTION](https://www.genome.jp/kegg/reaction/), and [COMPOUND](https://www.genome.jp/kegg/compound/)) databases and the [ModelSEED Biochemistry](https://github.com/ModelSEED/ModelSEEDDatabase) database. The reference databases must have been downloaded and set up by %(anvi-setup-protein-reference-database)s.
+ A network can currently be generated from KEGG Orthology (KO) annotations of genes in conjunction with %(reaction-ref-data)s: KEGG ([KO](https://www.genome.jp/kegg/ko.html), [REACTION](https://www.genome.jp/kegg/reaction/), and [COMPOUND](https://www.genome.jp/kegg/compound/)) databases and the [ModelSEED Biochemistry](https://github.com/ModelSEED/ModelSEEDDatabase) database. The reference databases must have been downloaded and set up by %(anvi-setup-modelseed-database)s.
6 changes: 3 additions & 3 deletions anvio/docs/artifacts/kegg-data.md
@@ -1,16 +1,16 @@
A **directory of data** downloaded from the [KEGG database resource](https://www.kegg.jp/) for use in function annotation and metabolism estimation.

- It is created by running the program %(anvi-setup-kegg-kofams)s. Not everything from KEGG is included in this directory, only the information relevant to downstream programs. The most critical components of this directory are KOfam HMM profiles and the %(modules-db)s which contains information on metabolic pathways as described in the [KEGG MODULES resource](https://www.genome.jp/kegg/module.html), as well as functional classification hierarchies from [KEGG BRITE](https://www.genome.jp/kegg/brite.html).
+ It is created by running the program %(anvi-setup-kegg-data)s. Not everything from KEGG is included in this directory, only the information relevant to downstream programs. The most critical components of this directory are KOfam HMM profiles and the %(modules-db)s which contains information on metabolic pathways as described in the [KEGG MODULES resource](https://www.genome.jp/kegg/module.html), as well as functional classification hierarchies from [KEGG BRITE](https://www.genome.jp/kegg/brite.html).

Programs that rely on this data directory include %(anvi-run-kegg-kofams)s and %(anvi-estimate-metabolism)s.

## Directory Location
The default location of this data is in the anvi'o folder, at `anvio/anvio/data/misc/KEGG/`.

- You can change this location when you run %(anvi-setup-kegg-kofams)s by providing a different path to the `--kegg-data-dir` parameter:
+ You can change this location when you run %(anvi-setup-kegg-data)s by providing a different path to the `--kegg-data-dir` parameter:

{{ codestart }}
- anvi-setup-kegg-kofams --kegg-data-dir /path/to/directory/KEGG
+ anvi-setup-kegg-data --kegg-data-dir /path/to/directory/KEGG
{{ codestop }}

If you do this, you will need to provide this path to downstream programs that require this data as well.