Options

Usage:

usage: PopPUNK [-h]
            (--easy-run | --create-db | --fit-model | --refine-model | --assign-query | --use-model | --generate-viz)
            [--ref-db REF_DB] [--r-files R_FILES] [--q-files Q_FILES]
            [--distances DISTANCES]
            [--external-clustering EXTERNAL_CLUSTERING] --output OUTPUT
            [--plot-fit PLOT_FIT] [--full-db] [--update-db] [--overwrite]
            [--min-k MIN_K] [--max-k MAX_K] [--k-step K_STEP]
            [--sketch-size SKETCH_SIZE] [--max-a-dist MAX_A_DIST]
            [--ignore-length] [--K K] [--dbscan] [--D D]
            [--min-cluster-prop MIN_CLUSTER_PROP] [--pos-shift POS_SHIFT]
            [--neg-shift NEG_SHIFT] [--manual-start MANUAL_START]
            [--indiv-refine] [--no-local] [--model-dir MODEL_DIR]
            [--previous-clustering PREVIOUS_CLUSTERING] [--core-only]
            [--accessory-only] [--subset SUBSET] [--microreact]
            [--cytoscape] [--phandango] [--grapetree] [--rapidnj RAPIDNJ]
            [--perplexity PERPLEXITY] [--info-csv INFO_CSV] [--mash MASH]
            [--threads THREADS] [--no-stream] [--version]

Command line options

optional arguments:

-h, --help show this help message and exit

Mode of operation:

--easy-run Create clusters from assemblies with default settings

--create-db Create pairwise distances database between reference sequences

--fit-model Fit a mixture model to a reference database

--refine-model Refine the accuracy of a fitted model

--assign-query Assign the cluster of query sequences without re- running the whole mixture model

--generate-viz Generate files for a visualisation from an existing database

--use-model Apply a fitted model to a reference database to restore database files

Input files:

--ref-db REF_DB

Location of built reference database

--r-files R_FILES

File listing reference input assemblies

--q-files Q_FILES

File listing query input assemblies

--distances DISTANCES

Prefix of input pickle of pre-calculated distances

--external-clustering EXTERNAL_CLUSTERING

File with cluster definitions or other labels generated with any other method.

Output options:

--output OUTPUT

Prefix for output files (required)

--plot-fit PLOT_FIT

Create this many plots of some fits relating k-mer to core/accessory distances [default = 0]

--full-db Keep full reference database, not just representatives

--update-db Update reference database with query sequences

--overwrite Overwrite any existing database files

Kmer comparison options:

--min-k MIN_K Minimum kmer length [default = 9]

--max-k MAX_K Maximum kmer length [default = 29]

--k-step K_STEP

K-mer step size [default = 4]

--sketch-size SKETCH_SIZE

Kmer sketch size [default = 10000]

Quality control options:

--max-a-dist MAX_A_DIST

Maximum accessory distance to permit [default = 0.5]

--ignore-length

Ignore outliers in terms of assembly length [default = False]

Model fit options:

--K K Maximum number of mixture components [default = 2]

--dbscan Use DBSCAN rather than mixture model

--D D Maximum number of clusters in DBSCAN fitting [default = 100]

--min-cluster-prop MIN_CLUSTER_PROP

Minimum proportion of points in a cluster in DBSCAN fitting [default = 0.0001]

Refine model options:

--pos-shift POS_SHIFT

Maximum amount to move the boundary away from origin [default = 0.2]

--neg-shift NEG_SHIFT

Maximum amount to move the boundary towards the origin [default = 0.4]

--manual-start MANUAL_START

A file containing information for a start point. See documentation for help.

--indiv-refine Also run refinement for core and accessory individually

--no-local Do not perform the local optimization step (speed up on very large datasets)

Database querying options:

--model-dir MODEL_DIR

Directory containing model to use for assigning queries to clusters [default = reference database directory]

--previous-clustering PREVIOUS_CLUSTERING

Directory containing previous cluster definitions and network [default = use that in the directory containing the model]

--core-only Use a core-distance only model for assigning queries [default = False]

--accessory-only

Use an accessory-distance only model for assigning queries [default = False]

Further analysis options:

--subset SUBSET

File with list of sequences to include in visualisation (with --generate-viz only)

--microreact Generate output files for microreact visualisation

--cytoscape Generate network output files for Cytoscape

--phandango Generate phylogeny and TSV for Phandango visualisation

--grapetree Generate phylogeny and CSV for grapetree visualisation

--rapidnj RAPIDNJ

Path to rapidNJ binary to build NJ tree for Microreact

--perplexity PERPLEXITY

Perplexity used to calculate t-SNE projection (with --microreact) [default=20.0]

--info-csv INFO_CSV

Epidemiological information CSV formatted for microreact (can be used with other outputs)

Other options:

--mash MASH Location of mash executable

--threads THREADS

Number of threads to use [default = 1]

--no-stream Use temporary files for mash dist interfacing. Reduce memory use/increase disk use for large datasets

--version show program's version number and exit

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

options.rst

options.rst

Options

`--easy-run`	Create clusters from assemblies with default settings
`--create-db`	Create pairwise distances database between reference sequences
`--fit-model`	Fit a mixture model to a reference database
`--refine-model`	Refine the accuracy of a fitted model
`--assign-query`	Assign the cluster of query sequences without re- running the whole mixture model
`--generate-viz`	Generate files for a visualisation from an existing database
`--use-model`	Apply a fitted model to a reference database to restore database files

`--ref-db REF_DB`
	Location of built reference database
`--r-files R_FILES`
	File listing reference input assemblies
`--q-files Q_FILES`
	File listing query input assemblies
`--distances DISTANCES`
	Prefix of input pickle of pre-calculated distances
`--external-clustering EXTERNAL_CLUSTERING`
	File with cluster definitions or other labels generated with any other method.

`--output OUTPUT`
	Prefix for output files (required)
`--plot-fit PLOT_FIT`
	Create this many plots of some fits relating k-mer to core/accessory distances [default = 0]
`--full-db`	Keep full reference database, not just representatives
`--update-db`	Update reference database with query sequences
`--overwrite`	Overwrite any existing database files

`--min-k MIN_K`	Minimum kmer length [default = 9]
`--max-k MAX_K`	Maximum kmer length [default = 29]
`--k-step K_STEP`
	K-mer step size [default = 4]
`--sketch-size SKETCH_SIZE`
	Kmer sketch size [default = 10000]

`--max-a-dist MAX_A_DIST`
	Maximum accessory distance to permit [default = 0.5]
`--ignore-length`
	Ignore outliers in terms of assembly length [default = False]

`--K K`	Maximum number of mixture components [default = 2]
`--dbscan`	Use DBSCAN rather than mixture model
`--D D`	Maximum number of clusters in DBSCAN fitting [default = 100]
`--min-cluster-prop MIN_CLUSTER_PROP`
	Minimum proportion of points in a cluster in DBSCAN fitting [default = 0.0001]

`--pos-shift POS_SHIFT`
	Maximum amount to move the boundary away from origin [default = 0.2]
`--neg-shift NEG_SHIFT`
	Maximum amount to move the boundary towards the origin [default = 0.4]
`--manual-start MANUAL_START`
	A file containing information for a start point. See documentation for help.
`--indiv-refine`	Also run refinement for core and accessory individually
`--no-local`	Do not perform the local optimization step (speed up on very large datasets)

`--model-dir MODEL_DIR`
	Directory containing model to use for assigning queries to clusters [default = reference database directory]
`--previous-clustering PREVIOUS_CLUSTERING`
	Directory containing previous cluster definitions and network [default = use that in the directory containing the model]
`--core-only`	Use a core-distance only model for assigning queries [default = False]
`--accessory-only`
	Use an accessory-distance only model for assigning queries [default = False]

`--subset SUBSET`
	File with list of sequences to include in visualisation (with --generate-viz only)
`--microreact`	Generate output files for microreact visualisation
`--cytoscape`	Generate network output files for Cytoscape
`--phandango`	Generate phylogeny and TSV for Phandango visualisation
`--grapetree`	Generate phylogeny and CSV for grapetree visualisation
`--rapidnj RAPIDNJ`
	Path to rapidNJ binary to build NJ tree for Microreact
`--perplexity PERPLEXITY`
	Perplexity used to calculate t-SNE projection (with --microreact) [default=20.0]
`--info-csv INFO_CSV`
	Epidemiological information CSV formatted for microreact (can be used with other outputs)

`--mash MASH`	Location of mash executable
`--threads THREADS`
	Number of threads to use [default = 1]
`--no-stream`	Use temporary files for mash dist interfacing. Reduce memory use/increase disk use for large datasets
`--version`	show program's version number and exit

Files

options.rst

Latest commit

History

options.rst

File metadata and controls

Options