Genome assembly files accepted by the CLI must be in FASTA format, optionally compressed with gzip.
gambit
gambit [OPTIONS] COMMAND [ARGS]...
Some top-level options are set at the root command group, and should be specified before the name of the subcommand to run.
-d, --db DIR
Path to the directory containing reference database files. Required by the query
subcommand. As an alternative, you can specify the database location with the GAMBIT_DB_PATH
environment variable.
GAMBIT_DB_PATH
Alternative to -d for specifying the path to the database.
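As a rough illustration of the option ordering described above, the following Python sketch assembles an argument list for gambit query with -d placed before the subcommand name, and shows the alternative of passing the database location through GAMBIT_DB_PATH. The helper names and example paths are hypothetical; only the -d option, the query subcommand, and the environment variable come from this reference.

```python
import os


def query_argv(genomes, db_dir=None):
    """Build an argument list for `gambit query`.

    Root-level options such as -d must come before the subcommand name.
    """
    argv = ["gambit"]
    if db_dir is not None:
        argv += ["-d", str(db_dir)]  # root option, precedes "query"
    argv.append("query")
    argv += [str(g) for g in genomes]
    return argv


def query_env(db_dir, base_env=None):
    """Return a copy of the environment with GAMBIT_DB_PATH set (alternative to -d)."""
    env = dict(os.environ if base_env is None else base_env)
    env["GAMBIT_DB_PATH"] = str(db_dir)
    return env


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    argv = query_argv(["genome1.fasta", "genome2.fasta.gz"], db_dir="/path/to/db")
    print(" ".join(argv))
```

Either list could be handed to subprocess.run on a machine where gambit is installed; the sketch itself only builds the arguments and does not run anything.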
gambit query
gambit query [OPTIONS] (-s SIGFILE | -l LISTFILE | GENOMES...)
Predict taxonomy of microbial samples from genome sequences.
The reference database must be specified from the root command group.
Query genomes can be specified using one of the following methods:
- Give paths of one or more genome files as positional arguments.
- Use the -l option to specify a text file containing paths of the genome files.
- Use the -s option to use a signatures file created with the signatures create command.
-l LISTFILE
File containing paths to genomes, one per line.
--ldir DIRECTORY
Parent directory of paths in file given by the -l option.
-s, --sigfile FILE
A genome signatures file.
-o, --output FILE
File to write output to. If omitted, output is written to stdout.
-f, --outfmt {csv|json|archive}
Results format (see next section).
--progress / --no-progress
Show/don't show progress meter.
-c, --cores INT
Number of CPU cores to use.
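A list file for -l is plain text with one genome path per line. The sketch below writes such a file from a directory of FASTA files, storing bare file names so that the directory itself could be passed as --ldir. The helper name, the file suffixes it looks for, and the reading of --ldir as the base for relative paths are assumptions made for illustration.

```python
from pathlib import Path

# Assumed suffixes; adjust to match your own files.
FASTA_SUFFIXES = (".fasta", ".fa", ".fna", ".fasta.gz", ".fa.gz", ".fna.gz")


def write_list_file(genome_dir, list_path):
    """Write the names of FASTA files in genome_dir to list_path, one per line.

    Returns the names written, sorted for reproducibility.
    """
    genome_dir = Path(genome_dir)
    names = sorted(
        p.name
        for p in genome_dir.iterdir()
        if p.is_file() and p.name.endswith(FASTA_SUFFIXES)
    )
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names
```

The resulting file could then be used as the LISTFILE argument of -l, with genome_dir given to --ldir.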
csv
A .csv file with one row per query. Contains the following columns:
- query: Query genome file name (minus extension).
- predicted: Predicted taxon.
  - predicted.name: Name of taxon.
  - predicted.rank: Taxonomic rank (genus, species, etc.).
  - predicted.ncbi_id: Numeric ID in NCBI taxonomy database, if any.
  - predicted.threshold: Classification threshold.
- closest: Reference genome closest to query.
  - closest.distance: Distance to closest genome.
  - closest.description: Text description.
- next: Next most specific taxon for which the classification threshold was not met.
  - next.name
  - next.rank
  - next.ncbi_id
  - next.threshold
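Because the CSV output has one row per query and named columns, it can be read with any CSV parser. The sketch below uses Python's standard csv module to pull a few of the columns listed above. The function name and the sample rows are invented for illustration, and treating an empty predicted.name as "no prediction" is an assumption rather than documented behavior.

```python
import csv
import io


def summarize_results(csv_text):
    """Return (query, predicted name, closest distance) for each result row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["predicted.name"] or None  # assumption: empty means no prediction
        distance = float(row["closest.distance"])
        rows.append((row["query"], name, distance))
    return rows


if __name__ == "__main__":
    # Made-up sample data using a subset of the documented columns.
    sample = (
        "query,predicted.name,predicted.rank,closest.distance\n"
        "sample1,Escherichia coli,species,0.0123\n"
        "sample2,,,0.9876\n"
    )
    for query, name, distance in summarize_results(sample):
        print(query, name, distance)
```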
json
A machine-readable format meant to be used in pipelines.
archive
A more verbose JSON-based format used for testing and development.
gambit signatures info
gambit signatures info [OPTIONS] FILE
Print information about a GAMBIT signatures file. Defaults to a basic human-readable format.
-j, --json
Print information in JSON format. Includes more information than standard output.
-p, --pretty
Prettify JSON output to make it more human-readable.
-i, --ids
Print IDs of all signatures in file.
gambit signatures create
gambit signatures create [OPTIONS] -o OUTFILE (-l LISTFILE | GENOMES...)
Calculate GAMBIT signatures of a set of genomes and write to a binary file.
-l LISTFILE
File containing paths to genomes, one per line.
--ldir DIRECTORY
Parent directory of paths in file given by the -l option.
-o, --output FILE
Path to write file to (required).
-k INTEGER
Length of k-mers to find (does not include length of prefix). Default is 11.
-p, --prefix STRING
K-mer prefix to match, a non-empty string of DNA nucleotide codes. Default is ATGAC.
-i, --ids FILE
File containing IDs to assign to signatures in file metadata. Should contain one ID per line. If omitted, file names stripped of their extensions are used.
-m, --meta-json FILE
JSON file containing metadata to attach to file.
--progress / --no-progress
Show/don't show progress meter.
-c, --cores INT
Number of CPU cores to use.
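The k-mer options above can be checked before launching a long job. This is a minimal sketch that validates -k and -p values and assembles a gambit signatures create argument list using only options documented in this section. The helper names and the output file name in the usage note are hypothetical, and restricting the prefix to the letters A, C, G, and T is an assumption about what "DNA nucleotide codes" allows.

```python
def check_kmer_params(k, prefix):
    """Raise ValueError if k or prefix look invalid (assumed rules, see lead-in)."""
    if not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    if not prefix or set(prefix.upper()) - set("ACGT"):
        raise ValueError("prefix must be a non-empty string of A, C, G, T")


def signatures_create_argv(genomes, output, k=11, prefix="ATGAC", cores=None):
    """Build an argument list for `gambit signatures create`.

    Defaults for k and prefix mirror the documented defaults (11 and ATGAC).
    """
    check_kmer_params(k, prefix)
    argv = ["gambit", "signatures", "create", "-o", str(output),
            "-k", str(k), "-p", prefix]
    if cores is not None:
        argv += ["-c", str(cores)]
    argv += [str(g) for g in genomes]
    return argv
```

For example, signatures_create_argv(["a.fasta"], "out.sigs", cores=4) would produce a list ending in "-c", "4", "a.fasta", where out.sigs is just a placeholder file name.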
gambit dist
gambit dist [OPTIONS] -o OUTFILE
(-q GENOME... | --ql LISTFILE | --qs SIGFILE)
(-r GENOME... | --rl LISTFILE | --rs SIGFILE | --square | --use-db)
Calculate pairwise distances between a set of query genomes and a set of reference genomes. Output is a .csv file. If using --qs along with --rs or --use-db, the k-mer parameters of the query signature file must match the reference parameters.
-q GENOME
Path to a single genome file. May be used multiple times.
--ql LISTFILE
File containing paths of genome files, one per line.
--qdir DIRECTORY
Parent directory of paths in file given by the --ql option.
--qs SIGFILE
A genome signatures file.
-r GENOME
Path to a single genome file. May be used multiple times.
--rl LISTFILE
File containing paths of genome files, one per line.
--rdir DIRECTORY
Parent directory of paths in file given by the --rl option.
--rs SIGFILE
A genome signatures file.
-s, --square
Use same genomes as the query.
-d, --use-db
Use all genomes in reference database.
-o FILE
File to write output to. Required.
The k-mer options below are only allowed if query and reference genomes do not come from precomputed signature files.
-k INTEGER
Length of k-mers to find (does not include length of prefix). Default is 11.
-p, --prefix STRING
K-mer prefix to match, a non-empty string of DNA nucleotide codes. Default is ATGAC.
--progress / --no-progress
Show/don't show progress meter.
-c, --cores INT
Number of CPU cores to use.
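The usage line for gambit dist requires exactly one query source and exactly one reference source. The following sketch enforces that rule before building the argument list, repeating -q and -r once per genome as the option descriptions allow. The function names are hypothetical, and the sketch does not check the k-mer parameter compatibility mentioned above.

```python
def exactly_one(**sources):
    """Return the name of the single source that was given, else raise ValueError."""
    given = [name for name, value in sources.items() if value]
    if len(given) != 1:
        raise ValueError(f"expected exactly one of {sorted(sources)}, got {given}")
    return given[0]


def dist_argv(output, q=(), ql=None, qs=None, r=(), rl=None, rs=None,
              square=False, use_db=False):
    """Build an argument list for `gambit dist` from one query and one reference source."""
    exactly_one(q=q, ql=ql, qs=qs)
    exactly_one(r=r, rl=rl, rs=rs, square=square, use_db=use_db)

    argv = ["gambit", "dist", "-o", str(output)]
    for genome in q:
        argv += ["-q", str(genome)]  # -q may be used multiple times
    if ql:
        argv += ["--ql", str(ql)]
    if qs:
        argv += ["--qs", str(qs)]
    for genome in r:
        argv += ["-r", str(genome)]  # -r may be used multiple times
    if rl:
        argv += ["--rl", str(rl)]
    if rs:
        argv += ["--rs", str(rs)]
    if square:
        argv.append("--square")
    if use_db:
        argv.append("--use-db")
    return argv
```

Calling dist_argv("dists.csv", q=["a.fa", "b.fa"], square=True) yields an all-against-all comparison of the two query genomes; the file names here are placeholders.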
gambit tree
gambit tree [OPTIONS] (-l LISTFILE | -s SIGFILE | GENOMES...)
Estimate a relatedness tree for a set of genomes and output in Newick format.
-l LISTFILE
File containing paths of genome files, one per line.
--ldir DIRECTORY
Parent directory of paths in file given by the -l option.
-s, --sigfile SIGFILE
A genome signatures file.
-o FILE
File to write output to. If omitted, output is written to stdout.
Allow using a distance matrix calculated using gambit dist.
The k-mer options below are not allowed if the -s/--sigfile option was used.
-k INTEGER
Length of k-mers to find (does not include length of prefix). Default is 11.
-p, --prefix STRING
K-mer prefix to match, a non-empty string of DNA nucleotide codes. Default is ATGAC.
--progress / --no-progress
Show/don't show progress meter.
-c, --cores INT
Number of CPU cores to use.
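Since gambit tree writes its result in Newick format, a quick sanity check is to list the leaf labels and confirm that every input genome appears. The sketch below does this with a regular expression instead of a full Newick parser. The function name and sample tree are made up, quoted labels are not handled, and how GAMBIT labels the leaves is not specified in this reference.

```python
import re

# A leaf label directly follows "(" or ","; labels after ")" belong to
# internal nodes and are skipped. The label ends at the next structural
# character: "(", ")", ",", ":" or ";".
_LEAF_RE = re.compile(r"[(,]\s*([^\s(),:;][^(),:;]*)")


def leaf_names(newick):
    """Return leaf labels from a Newick string, in order of appearance."""
    return [label.strip() for label in _LEAF_RE.findall(newick)]


if __name__ == "__main__":
    # Made-up tree, for illustration only.
    print(leaf_names("((A:0.1,B:0.2):0.3,C:0.4);"))
```

For anything beyond a quick check, a dedicated Newick parser from a phylogenetics library would be the safer choice.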