Verticall matrix

Ryan Wick edited this page May 30, 2023 · 11 revisions

The verticall matrix command is part of the distance tree workflow (see that page for example commands). It takes the TSV file made by verticall pairwise and produces a PHYLIP distance matrix.
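For readers unfamiliar with the output format, here is a minimal Python sketch of a square PHYLIP distance matrix: a first line with the number of samples, then one row per sample containing its name and its distance to every sample. The function name, sample names, distances, tab delimiter and six-decimal precision are all assumptions made for illustration; they are not taken from Verticall's code and its exact formatting may differ.

```python
import io


def write_phylip_matrix(names, distances, out):
    """Write a square PHYLIP distance matrix to a text stream.

    names:     list of sample names (defines row and column order)
    distances: dict mapping (name_a, name_b) -> distance
    out:       any writable text stream
    """
    # First line: the number of samples in the matrix.
    out.write(f"{len(names)}\n")
    # One row per sample: its name, then its distance to every sample.
    for a in names:
        row = [f"{distances[(a, b)]:.6f}" for b in names]
        out.write(a + "\t" + "\t".join(row) + "\n")


# Hypothetical samples and distances, for illustration only.
names = ["sample_1", "sample_2", "sample_3"]
distances = {(a, b): 0.0 for a in names for b in names}
distances[("sample_1", "sample_2")] = distances[("sample_2", "sample_1")] = 0.0012
distances[("sample_1", "sample_3")] = distances[("sample_3", "sample_1")] = 0.0034
distances[("sample_2", "sample_3")] = distances[("sample_3", "sample_2")] = 0.0029

buffer = io.StringIO()
write_phylip_matrix(names, distances, buffer)
phylip_text = buffer.getvalue()
print(phylip_text)
```

A matrix in this shape can be read by most distance-based tree-building tools.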

Key options

One of its most important options is --distance_type, which selects the distance from the TSV file to use in the output matrix. See the Columns in pairwise TSV file page for descriptions of each distance, but here are the ones you're most likely to want:

  • median_vertical_window: this is the median value of the vertically-painted part of the sliding-window distance distribution (see Pairwise assembly comparison for details). This is the default because it helps ignore recombination in two ways. First, it ignores the horizontally-painted part of the distance distribution. Second, the median is a robust statistic, so even if Verticall failed to identify some horizontally transmitted part of the genome, the median distance shouldn't change very much.
  • mean_vertical: this is computed from the vertically-painted parts of the alignments by taking one minus the number of matching bases over the alignment length (i.e. one minus identity). Since it's taken from the alignments (not the sliding-window distance distribution), it's a more literal measure of genomic distance, but since it's a mean (not a median), it's less robust than the default.
  • mean: this is computed from all alignments by taking one minus the number of matching bases over the alignment length (i.e. one minus identity). This distance does not filter out horizontally-transmitted parts of the genome, and so it provides similar information to other genomic distance tools such as FastANI and Mash.
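The two ideas in the list above (one minus identity, and the robustness of a median compared with a mean) can be illustrated with a small Python sketch. All numbers are hypothetical, and the code only demonstrates the statistical point; it is not how Verticall computes its TSV columns.

```python
from statistics import mean, median


def one_minus_identity(matching_bases, alignment_length):
    """Distance as one minus the fraction of matching bases."""
    return 1.0 - matching_bases / alignment_length


# Hypothetical sliding-window distances from vertically inherited regions.
vertical_windows = [0.0010, 0.0011, 0.0009, 0.0010, 0.0012, 0.0010, 0.0011]

# A hypothetical recombinant window that was missed, i.e. wrongly
# left in the vertical set.
missed_horizontal = [0.0300]

clean_median = median(vertical_windows)
contaminated_median = median(vertical_windows + missed_horizontal)

clean_mean = mean(vertical_windows)
contaminated_mean = mean(vertical_windows + missed_horizontal)

print(f"median: {clean_median:.5f} -> {contaminated_median:.5f}")
print(f"mean:   {clean_mean:.5f} -> {contaminated_mean:.5f}")
print(f"one minus identity: {one_minus_identity(9990, 10000):.4f}")
```

In this toy data the single missed window barely moves the median but increases the mean several-fold, which is the reason given above for preferring a median-based default.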

If your dataset contains a lot of recombination, the --multi option may also be important. See the Primary vs secondary results page for more information.

Full help output

usage: verticall matrix -i IN_FILE -o OUT_FILE
                        [--distance_type {mean,mean_window,median_window,peak_window,
                                          mean_vertical_window,median_vertical_window,mean_vertical}]
                        [--asymmetrical] [--no_jukes_cantor] [--multi {first,exclude,low,high}]
                        [--include_names INCLUDE_NAMES] [--exclude_names EXCLUDE_NAMES] [-h]
                        [--version]

produce a PHYLIP distance matrix

Required arguments:
  -i IN_FILE, --in_file IN_FILE    Filename of TSV created by verticall pairwise
  -o OUT_FILE, --out_file OUT_FILE
                                   Filename of PHYLIP matrix output

Settings:
  --distance_type {mean,mean_window,median_window,peak_window,
                   mean_vertical_window,median_vertical_window,mean_vertical}
                                   Which distance to use in matrix (default: median_vertical_window)
  --asymmetrical                   Do not average pairs to make symmetrical matrices (default: make
                                   matrices symmetrical)
  --no_jukes_cantor                Do not apply Jukes-Cantor correction (default: apply Jukes-Cantor
                                   correction)
  --multi {first,exclude,low,high}
                                   Behaviour when there are multiple results for a sample pair
                                   (default: first)
  --include_names INCLUDE_NAMES    Sample names to include in matrix (comma-delimited, default:
                                   include all samples)
  --exclude_names EXCLUDE_NAMES    Sample names to exclude from matrix (comma-delimited, default: do
                                   not exclude any samples)

Other:
  -h, --help                       Show this help message and exit
  --version                        Show program's version number and exit
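
The --no_jukes_cantor and --asymmetrical settings in the help output above refer to two default post-processing steps. The following Python sketch shows the textbook Jukes-Cantor formula, d = -3/4 ln(1 - 4p/3), and a simple average of the two directions of a pair. The function names, the handling of saturated distances and the order in which the two steps are applied are assumptions for illustration, not a description of Verticall's implementation.

```python
import math


def jukes_cantor(p):
    """Textbook Jukes-Cantor correction of a raw distance p
    (the proportion of differing sites)."""
    # Assumption: treat saturated distances (p >= 0.75) as infinite,
    # because the logarithm is undefined there.
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def symmetrise(d_ab, d_ba):
    """Average the A-vs-B and B-vs-A distances into a single value."""
    return (d_ab + d_ba) / 2.0


# Hypothetical distances for one sample pair, one per direction.
d_ab, d_ba = 0.010, 0.012

# Assumption: correct each direction first, then average.
corrected = symmetrise(jukes_cantor(d_ab), jukes_cantor(d_ba))
print(f"raw average:       {symmetrise(d_ab, d_ba):.6f}")
print(f"corrected average: {corrected:.6f}")
```

For closely related genomes the correction is small, since the corrected distance approaches the raw distance as p approaches zero.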