Brice Letcher edited this page Feb 15, 2022 · 8 revisions

This command generates a genome graph (also called a population reference graph, or PRG) and other supporting data structures.

The variation encoded in the PRG must come either from VCF files (genome-wide) or from multiple sequence alignments (MSAs), one per region under study.

gramtools can build a PRG from either source of variation itself. You can also build prg files from MSAs yourself using make_prg; this lets you customise how make_prg is run or use its latest version.

Only prg files built with make_prg support nested variation (e.g. SNPs/indels together with structural variants, or variation on multiple references).

Usage

vcf file and ref

gramtools build --gram_dir ./gram --vcf ./vcf --reference ./ref 

MSAs/prg files and ref

gramtools build --gram_dir ./gram --prgs_bed ./prgs.bed --reference ./ref 

Important notes:

  • The reference genome sequence for each region must be the first entry of the MSA file (this also applies when you build a prg file yourself using make_prg).
  • It is best practice to provide either only MSA files or only prg files (not a mix of both), to guarantee that the same version of make_prg was used for all MSAs.
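The first note can be checked before running gramtools build. The sketch below is a hypothetical helper, not part of gramtools or make_prg; it assumes the MSA is a FASTA file using `-` as the gap character, and that you already have the reference sequence of the region as a string. It compares the first record, with gaps removed, against that sequence, ignoring case.

```shell
# Hypothetical helper (not part of gramtools or make_prg): succeeds if the first
# record of a FASTA-format MSA, with '-' gaps removed, equals the given
# reference sequence (case-insensitive).
msa_first_is_ref() {
  local msa="$1" ref_seq="$2" first ref_upper
  # Concatenate the sequence lines of record 1 only, dropping gaps,
  # spaces and carriage returns, and upper-casing the result.
  first=$(awk '/^>/ { n++; next }
               n == 1 { gsub(/[- \t\r]/, ""); printf "%s", toupper($0) }' "$msa")
  # Upper-case the expected reference sequence for comparison.
  ref_upper=$(printf '%s' "$ref_seq" | awk '{ printf "%s", toupper($0) }')
  # Fail on an empty first record as well as on a mismatch.
  [ -n "$first" ] && [ "$first" = "$ref_upper" ]
}
```

For example, `msa_first_is_ref region1.fa "$region1_seq"` (hypothetical file and variable names) returns a non-zero status when `region1.fa` does not start with that region's reference sequence.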

prg and ref

gramtools build --gram_dir ./gram --prg ./prg --reference ./ref 

This option can be used to re-use a prg file produced by a previous run of gramtools build.

Details on some parameters

| parameter | description |
|---|---|
| --gram_dir | output directory for gramtools build files (created if missing) |
| --ref | reference genome used for prg construction (--vcf/--prgs_bed) and for validating the built prg or input prg file (--prg) |
| --max_threads | max number of threads, currently only used with --prgs_bed |
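The three build modes and these parameters can be tied together in a small wrapper. This is a hypothetical sketch, not part of gramtools: the function name `gram_build`, its mode names (`vcf`, `bed`, `prg`), the `DRY_RUN` variable and the default of 1 thread are all assumptions of this example. It passes `--max_threads` only in the `--prgs_bed` mode, since that is currently the only mode that uses it, and with `DRY_RUN=1` it prints the command instead of running it. It uses `--reference` as in the usage examples.

```shell
# Hypothetical wrapper (not part of gramtools) around the three build modes.
# Usage: gram_build MODE INPUT REF [GRAM_DIR] [THREADS]
gram_build() {
  local mode="$1" input="$2" ref="$3" gram_dir="${4:-./gram}" threads="${5:-1}"
  # Select the source of variation; --max_threads only matters with --prgs_bed.
  case "$mode" in
    vcf) set -- --vcf "$input" ;;
    bed) set -- --prgs_bed "$input" --max_threads "$threads" ;;
    prg) set -- --prg "$input" ;;
    *)   echo "gram_build: unknown mode '$mode'" >&2; return 1 ;;
  esac
  # Prepend the subcommand and output directory, append the reference.
  set -- build --gram_dir "$gram_dir" "$@" --reference "$ref"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo gramtools "$@"   # print the command only
  else
    gramtools "$@"
  fi
}
```

For example, `DRY_RUN=1 gram_build bed ./prgs.bed ./ref ./gram 4` prints `gramtools build --gram_dir ./gram --prgs_bed ./prgs.bed --max_threads 4 --reference ./ref`.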