# Build-a-PHG
This notebook interactively walks through [Steps 1 and 2](https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step1-2_main) of the PHG Wiki's user setup instructions.

The steps are as follows:

1. [Generate GVCFs from Assembly MAFs](#Step-1:-Generate-GVCFs-from-Assembly-MAFs)
2. [Generate Wiggles from MAFs](#Step-2:-Generate-Wiggles-from-MAFs)
3. [Generate and Validate Reference Range BED File](#Step-3:-Generate-and-Validate-Reference-Range-BED-File)
4. [Create Database and Load Haplotypes Into PHG](#Step-4:-Create-Database-and-Load-Haplotypes-Into-PHG)
5. [Create Consensus Haplotypes](#Step-5:-Create-Consensus-Haplotypes)
6. [Impute Variants with the PHG](#Step-6:-Impute-Variants-with-the-PHG)

Review the documentation to ensure the command-line arguments are correct for your data. Some of the code here is specific to the needs of the NextGen Cassava project, and so may not be relevant to your data.

### Assumptions
* You have already selected a reference and its associated FASTA and GFF.
* You'll be loading haplotypes into the database from MAF files generated by [AnchorWave](https://github.com/baoxingsong/AnchorWave).

Additionally, this notebook assumes your "working directory" (where your input files are located and where the PHG's data will reside) is relative to where this notebook is stored.

### Requirements
Ensure the following software is available to be executed (in your `PATH`) before proceeding:

* `docker`: https://docs.docker.com/get-docker/
* `faSize`: https://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
* `kotlinc`: https://github.com/JetBrains/kotlin/releases/download/v1.7.21/kotlin-compiler-1.7.21.zip
    * After unzipping, the binary is in the `bin/` directory (x86-64 only)
* `minimap2`: https://github.com/lh3/minimap2 (used in *Step 6*)
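To confirm these tools are reachable before you start, you can check your `PATH` from Python. This is a minimal sketch using the standard library's `shutil.which`; the `REQUIRED_TOOLS` list and the `missing_tools` helper are illustrative, and the list also includes `minimap2` and `wget`, which later steps call directly.

```python
import shutil

# Command-line tools this notebook shells out to (minimap2 in Step 6, wget in Step 2).
REQUIRED_TOOLS = ["docker", "faSize", "kotlinc", "minimap2", "wget"]


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools found.")
```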

### Configuration
Before we begin, we must import packages and initialize variables which we will be using later.

In [None]:
import os
import glob
from sys import platform

mkdirp = lambda d: os.makedirs(d, exist_ok = True)

mkdirp("workdir")
WORKING_DIR = f"{os.getcwd()}/workdir"

PHG_DIR = f"{WORKING_DIR}/phg"
mkdirp(PHG_DIR)

mkdirp(f"{PHG_DIR}/local_gvcf")
LOCAL_GVCF = "/phg/local_gvcf"

mkdirp(f"{PHG_DIR}/inputDir/reference")

PGDATA_DIR = f"{WORKING_DIR}/pgdata"
mkdirp(PGDATA_DIR)

HOST_IP = ! ifconfig | grep "inet " | grep -Fv 127.0.0.1 | awk '{print $2}'
HOST_IP = HOST_IP[0]

# UID:GID of the current user, passed to `docker run --user` in later cells
USER_ID = f"{os.getuid()}:{os.getgid()}"

Please modify the below variables to refer to the files you will be using, relative to the directory this notebook is being run from. You may also want to change the Docker-related variables if your directory structure is different from the default.

In [None]:
############
# EDIT ME! #
############
REF = "Mesculenta_671_v8.0.fa"
GFF = "Mesculenta_671_v8.1.gene.gff3"

DOCKER_REF = f"/phg/inputDir/reference/{REF}"
DOCKER_GFF = f"/phg/{GFF}"
DOCKER_CONFIG = "/phg/config.txt"

# Change based upon the memory resources you wish to allocate from your machine
# g for gigabytes, m for megabytes, or k for kilobytes
JAVA_XMS = "-Xms32g"
JAVA_XMX = "-Xmx64g"

# External storage of gvcf and reference
SERVER_PATH_ROOT = "example.com;/some/directory"

Create the PHG directory structure and copy our reference's FASTA and GFF to their default locations.

Notice the `--user` argument: it's necessary to ensure files and directories are not owned by root.

In [None]:
! docker run --name create_phg_dirs --rm \
    -v {PHG_DIR}/:/phg/ \
    --user {USER_ID} \
    -t maizegenetics/phg:latest \
    /tassel-5-standalone/run_pipeline.pl -debug \
    -MakeDefaultDirectoryPlugin \
        -workingDir /phg/ \
    -endPlugin

! cp {REF} {WORKING_DIR}{DOCKER_REF}
! cp {GFF} {WORKING_DIR}{DOCKER_GFF}

## Step 1: Generate GVCFs from Assembly MAFs
Here we execute the `MAFToGVCFPlugin` in the PHG via a Docker container to generate [GVCF](https://gatk.broadinstitute.org/hc/en-us/articles/360035531812-GVCF-Genomic-Variant-Call-Format) files from the MAF files produced by AnchorWave. Create the `mafs/` directory in the same location as this notebook and transfer your MAFs to it.

Note that depending on your input MAF, you may need to change the `-twoGvcfs` option to `false`. In our case it is a diploid alignment, so we set the option to `true` because we want two separate GVCF files. See [this section](https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step2_MAFToGVCFPluginDetails#markdown-header-parameter-descriptions) of the wiki for a full explanation of the parameters for this plugin.

In [None]:
if not os.path.exists("mafs"):
    raise FileNotFoundError("Please create and populate the mafs/ directory with your files")
mafs = glob.glob("mafs/**/*.maf", recursive = True)

mkdirp(f"{PHG_DIR}/mafs")

for maf in mafs:
    ! cp {maf} {PHG_DIR}/mafs
    file_name = maf.split("/")[-1]
    name = file_name.split(".")[0]
    gvcf_output = f"/phg/inputDir/loadDB/gvcf/{name}.gvcf.gz"
    
    ! docker run --name maf_to_gvcf_{name} --rm \
        -v {PHG_DIR}/:/phg/ \
        --user {USER_ID} \
        -t maizegenetics/phg:latest \
        /tassel-5-standalone/run_pipeline.pl {JAVA_XMS} {JAVA_XMX} -debug \
        -MAFToGVCFPlugin \
            -referenceFasta {DOCKER_REF} \
            -mafFile /phg/mafs/{file_name} \
            -sampleName {name} \
            -gvcfOutput {gvcf_output} \
            -fillGaps false -twoGvcfs true \
        -endPlugin

After the GVCFs are created, we must produce a [keyfile](https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step2_LoadHaplotypesFromGVCFPluginDetails.md#markdown-header-keyfile) from them for use by the PHG.

In [None]:
HEADER_COL = "type\tsample_name\tsample_description\tmafFile\tfiles\tchrPhased\tgenePhased\tphasingConf\tgvcfServerPath\n"
gvcf_paths = glob.glob(f"{PHG_DIR}/inputDir/loadDB/gvcf/*.gvcf.gz")

cultivars = dict()
for gvcf_path in gvcf_paths:
    gvcf_filename = gvcf_path.split("/")[-1]
    sample_name = gvcf_filename.split(".gvcf.gz")[0]
    if sample_name not in cultivars:
        cultivars[sample_name] = []
    cultivars[sample_name].append(gvcf_filename)

# NOTE:
#  - What values should be used for `chrPhased`, `genePhased`, and `phasingConf`?
with open(f"{PHG_DIR}/gvcfKeyFile.txt", "w") as kf:
    kf.write(HEADER_COL)
    for cultivar in cultivars.keys():
        entry = f"GVCF\t{cultivar}_Assembly\t{cultivar} description\t/phg/outputDir/align/{cultivar}.maf\t{','.join(cultivars[cultivar])}\ttrue\ttrue\t0.9\t{SERVER_PATH_ROOT}\n"
        kf.write(entry)
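Because the keyfile is built by string formatting, it can be worth reading it back before loading. A minimal sketch of such a check (the `check_keyfile` helper is hypothetical, not part of the PHG): it verifies that every row has as many tab-separated fields as the header and flags repeated `sample_name` values, on the assumption that each sample should appear only once.

```python
import csv


def check_keyfile(path, key_column="sample_name"):
    """Return a list of problems in a tab-delimited keyfile; an empty list means none found."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
    if not rows:
        return ["keyfile is empty"]
    header = rows[0]
    if key_column not in header:
        return [f"header has no {key_column!r} column"]
    key_index = header.index(key_column)
    problems, seen = [], set()
    for line_no, row in enumerate(rows[1:], start=2):
        # Every data row should line up with the header columns.
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} columns, found {len(row)}")
            continue
        if row[key_index] in seen:
            problems.append(f"line {line_no}: duplicate {key_column} {row[key_index]!r}")
        seen.add(row[key_index])
    return problems
```

In this notebook you would call `check_keyfile(f"{PHG_DIR}/gvcfKeyFile.txt")`; an empty list means no problems were found.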

## Step 2: Generate Wiggles from MAFs
The MAFs are also used to create [Wiggle](https://genome.ucsc.edu/goldenPath/help/wiggle.html) files. These, alongside the GVCFs, are used in *Step 3* to generate the reference range intervals (or "anchors").

Here we utilize `faSize` to output the name and size of each record within the reference FASTA. Afterwards, we run [this Kotlin script](https://bitbucket.org/bucklerlab/biokotlin/src/master/src/main/kotlin/biokotlin/genome/wiggle_fromMAFmultiThread.kts) to generate the wiggles.

In [None]:
mkdirp(f"{PHG_DIR}/wiggles")

! faSize -detailed {REF} > mesculenta_fasize.txt

! grep "^Chromosome" mesculenta_fasize.txt | sort > mesculenta_fasize_sorted.txt
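The `grep`/`sort` pipeline above can also be expressed in Python, which makes it easier to adapt if your contigs are not named `Chromosome*` (that prefix is specific to the *M. esculenta* v8 reference). A sketch, assuming `faSize -detailed` prints one name, a tab, and a size per record; `chromosome_sizes` is a hypothetical helper, and note that Python sorts by code point, which can differ from `sort` under some locales.

```python
def chromosome_sizes(fasize_text, prefix="Chromosome"):
    """Parse `faSize -detailed` output into sorted (name, size) pairs.

    Only records whose name starts with `prefix` are kept, mirroring the grep above.
    """
    sizes = []
    for line in fasize_text.splitlines():
        if not line.strip():
            continue
        name, size = line.split("\t", 1)
        if name.startswith(prefix):
            sizes.append((name.strip(), int(size)))
    return sorted(sizes)
```

For example, `chromosome_sizes(open("mesculenta_fasize.txt").read())` returns the `(contig_name, end_pos)` pairs that the wiggle loop reads from `mesculenta_fasize_sorted.txt`.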

In [None]:
# Retrieve Wiggle script, add dependency annotation
! wget -O wiggle_fromMAFmultiThread.main.kts https://bitbucket.org/bucklerlab/biokotlin/raw/6b5379534d1e1988039a0decd47fbdef3f878f91/src/main/kotlin/biokotlin/genome/wiggle_fromMAFmultiThread.kts

# Change usage of sed if on MacOS
# Source: https://stackoverflow.com/a/21243111
if platform == "darwin":
    ! sed -i '' '1s/^/@file:DependsOn("org\.biokotlin:biokotlin:0\.05\.01")\n\n/' wiggle_fromMAFmultiThread.main.kts
else:
    ! sed -i '1s/^/@file:DependsOn("org\.biokotlin:biokotlin:0\.05\.01")\n\n/' wiggle_fromMAFmultiThread.main.kts
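The two `sed` branches exist only because BSD and GNU `sed` parse `-i` differently. As an alternative, the annotation can be prepended from Python, which behaves the same on every platform. A sketch with a hypothetical `prepend_dependency` helper; it writes the same annotation and blank line as the `sed` command, and leaves the file untouched if the annotation is already at the top.

```python
from pathlib import Path

DEPENDS_ON = '@file:DependsOn("org.biokotlin:biokotlin:0.05.01")'


def prepend_dependency(script_path, annotation=DEPENDS_ON):
    """Insert `annotation` plus a blank line at the top of a Kotlin script.

    Returns True if the file was modified, False if it was already annotated.
    """
    path = Path(script_path)
    text = path.read_text()
    if text.startswith(annotation):
        return False  # already annotated; do not add a second copy
    path.write_text(f"{annotation}\n\n{text}")
    return True
```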

In [None]:
with open("mesculenta_fasize_sorted.txt", "r") as mfa:
    chroms = mfa.readlines()
    for chrom in chroms:
        contig_name, end_pos = chrom.split("\t")
        contig_name = contig_name.strip()
        end_pos = end_pos.strip()
        print(f"Creating {contig_name} wiggle")
        ! _JAVA_OPTIONS={JAVA_XMX} kotlinc -script wiggle_fromMAFmultiThread.main.kts -- -mafDir {PHG_DIR}/mafs/ -mafContig {contig_name} -wiggleContig {contig_name} -start 1 -end {end_pos} -outputDir {PHG_DIR}/wiggles/

! rm {PHG_DIR}/wiggles/identity_*.wig

## Step 3: Generate and Validate Reference Range BED File
**TODO:** Add explanation about tweaking parameters to `CreateRefRangesPlugin` in order to produce ranges that are biologically sound: `minCover`, `windowSize`, `intergenicStepSize`, `maxSearchWindow`, `maxDiversity`, `minGenicLength`, and `maxClusters`.

Use the artifacts of *Step 1* and *Step 2* to create a [BED](https://genome.ucsc.edu/FAQ/FAQformat.html#format1) file. You can break the reference genome into any set of intervals you want.

Ensure that `minCover` is less than or equal to the number of taxa, otherwise a cut site will not be found.
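A quick pre-flight check along these lines can catch a `minCover` that is too large before the plugin runs. This sketch treats each `*.gvcf.gz` file in the GVCF directory as one taxon, which is an assumption; adjust the count if your taxa map to files differently. Both helpers are hypothetical.

```python
import glob
import os


def count_gvcf_samples(gvcf_dir):
    """Count *.gvcf.gz files in `gvcf_dir`, treating each file as one taxon."""
    return len(glob.glob(os.path.join(gvcf_dir, "*.gvcf.gz")))


def check_min_cover(min_cover, n_taxa):
    """Raise ValueError if minCover exceeds the number of taxa available."""
    if min_cover > n_taxa:
        raise ValueError(f"minCover={min_cover} exceeds the {n_taxa} taxa available")
```

`check_min_cover(2, count_gvcf_samples(f"{PHG_DIR}/inputDir/loadDB/gvcf"))` mirrors the `-minCover 2` used in the next cell.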

In [None]:
! docker run --name create_ref_ranges --rm \
    -v {PHG_DIR}/:/phg \
    --user {USER_ID} \
    -t maizegenetics/phg:latest \
    /tassel-5-standalone/run_pipeline.pl {JAVA_XMS} {JAVA_XMX} -debug \
    -CreateRefRangesPlugin \
        -wiggleDir /phg/wiggles/ \
        -gffFile {DOCKER_GFF} \
        -minCover 2 \
        -outputBedFile /phg/refRanges.bed \
        -refGenome {DOCKER_REF} \
        -vcfdir /phg/inputDir/loadDB/gvcf \
        -outputGeneRanges /phg/geneRanges.bed \
        -nThreads 22 \
    -endPlugin

Ensure your BED file contains no overlapping intervals. The `CreateValidIntervalsFilePlugin` handles this for you; with `-mergeOverlaps true`, overlapping ranges are merged.

In [None]:
! docker run --name validate_ref_ranges --rm \
    -v {PHG_DIR}/:/phg \
    --user {USER_ID} \
    -t maizegenetics/phg:latest \
    /tassel-5-standalone/run_pipeline.pl {JAVA_XMS} {JAVA_XMX} -debug \
    -CreateValidIntervalsFilePlugin \
        -intervalsFile /phg/refRanges.bed \
        -referenceFasta {DOCKER_REF} \
        -mergeOverlaps true \
        -generatedFile /phg/validRefRanges.bed \
    -endPlugin

Remove the header from the `validRefRanges.bed` file: the `LoadHaplotypesFromGVCFPlugin` does not expect the header that the `CreateValidIntervalsFilePlugin` adds. This is expected to be fixed in a future release.

In [None]:
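# Skip the header line with tail, overwriting the unvalidated refRanges.bed;
# config.txt points both `anchors` and `bedFile` at /phg/refRanges.bed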
! tail -n +2 {PHG_DIR}/validRefRanges.bed > {PHG_DIR}/refRanges.bed
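Since `refRanges.bed` now holds the validated intervals, you can independently confirm that no two ranges overlap. A sketch with a hypothetical `find_bed_overlaps` helper, assuming standard tab-delimited BED (chromosome, 0-based start and exclusive end in the first three columns). Tracking the furthest-reaching interval seen so far catches an interval that overlaps a long earlier one, not only its immediate neighbour.

```python
def find_bed_overlaps(lines):
    """Return one (chrom, earlier, later) tuple per interval that overlaps an earlier one.

    BED intervals are half-open, so (0, 100) and (100, 200) do not overlap.
    Lines without integer start/end columns (headers, comments) are skipped.
    """
    by_chrom = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        by_chrom.setdefault(fields[0], []).append((int(fields[1]), int(fields[2])))

    overlaps = []
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        furthest = intervals[0]  # interval with the largest end seen so far
        for current in intervals[1:]:
            if current[0] < furthest[1]:
                overlaps.append((chrom, furthest, current))
            if current[1] > furthest[1]:
                furthest = current
    return overlaps
```

`with open(f"{PHG_DIR}/refRanges.bed") as bed: print(find_bed_overlaps(bed))` should print an empty list.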

## Step 4: Create Database and Load Haplotypes Into PHG


Start the PostgreSQL database container. Be aware that although Jupyter will display the cell as completed, the Postgres container may still be initializing. Watch Docker's log output (via the desktop application, or with `docker logs -f postgres-phg` in a terminal) to determine whether the database is ready. On first start, the server creates the database specified by `POSTGRES_DB` and then restarts, so there will be two `LOG:  database system is ready to accept connections` messages. *Wait until you see the second before continuing!*

In [None]:
# NOTE: POSTGRES_HOST_AUTH_METHOD should NOT be `trust` which is insecure
#       However, I am still encountering the error below when using `md5` or the default `scram-sha-256`
#       `DBLoadingUtils:getPostgresconnection: exception thrown, The authentication type 10 is not supported. Check that you have configured the pg_hba.conf file to include the client's IP address or subnet, and that it is using an authentication scheme supported by the driver.`
# NOTE: User must be `postgres` here and in config.txt below, otherwise we encounter the error:
#       `Could not get create/retrieve database phg_test_db, error: ERROR: role "postgres" does not exist`
! docker run --name postgres-phg \
    --user {USER_ID} \
    -e POSTGRES_USER=postgres \
    -e POSTGRES_PASSWORD=phg_test_password \
    -e POSTGRES_DB=phg_test_db \
    -e POSTGRES_HOST_AUTH_METHOD=trust \
    -v {PGDATA_DIR}/:/var/lib/postgresql/data \
    -p 5432:5432 \
    -d postgres:11-bullseye \
    -c password_encryption=md5
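Rather than watching the logs by hand, you can poll them from the notebook. A sketch, assuming the container name `postgres-phg` used above; `wait_for_postgres` and `count_ready_messages` are hypothetical helpers that call `docker logs` and count the readiness message.

```python
import subprocess
import time

READY_MSG = "database system is ready to accept connections"


def count_ready_messages(log_text):
    """Count how many times Postgres has reported that it is ready."""
    return log_text.count(READY_MSG)


def wait_for_postgres(container="postgres-phg", needed=2, timeout=300, poll=5):
    """Poll `docker logs` until the ready message has appeared `needed` times.

    Returns True on success, or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["docker", "logs", container],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge streams; server logs may arrive on either
            text=True,
        )
        if count_ready_messages(result.stdout) >= needed:
            return True
        time.sleep(poll)
    return False
```

Call `wait_for_postgres()` on the first start. If `PGDATA_DIR` already holds an initialized database, Postgres skips the initialization restart and logs the message only once, so pass `needed=1`.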

Create `load_genome_data.txt`, a tab-delimited file containing information about the reference genome.

In [None]:
lgd_txt = f"""Genotype	Hapnumber	Dataline	Ploidy	GenesPhased	ChromsPhased	Confidence	Method	MethodDetails	gvcfServerPath
{REF}	0	reference	1	false	false	1	ReferenceMethod	ReferenceMethodDetails	{SERVER_PATH_ROOT}"""
! echo '{lgd_txt}' > {PHG_DIR}/inputDir/reference/load_genome_data.txt
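The `echo` approach relies on the literal tab characters in the cell surviving copy-and-paste, and on the contents containing no single quotes. A sketch that writes the same file from Python with explicit tab separators; the column names and values mirror the cell above, and `write_genome_data` is a hypothetical helper.

```python
import csv

GENOME_DATA_COLUMNS = [
    "Genotype", "Hapnumber", "Dataline", "Ploidy", "GenesPhased",
    "ChromsPhased", "Confidence", "Method", "MethodDetails", "gvcfServerPath",
]


def write_genome_data(path, ref_name, server_path):
    """Write a header plus one reference row, separated by real tab characters."""
    row = [ref_name, "0", "reference", "1", "false", "false", "1",
           "ReferenceMethod", "ReferenceMethodDetails", server_path]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(GENOME_DATA_COLUMNS)
        writer.writerow(row)
```

`write_genome_data(f"{PHG_DIR}/inputDir/reference/load_genome_data.txt", REF, SERVER_PATH_ROOT)` produces the same two rows.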

Modify the default `config.txt` in the PHG directory. Some things to keep in mind:
* Since we are utilizing WGS data, `haplotypeMethodName` is user-assigned. Read more [on the wiki](https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/HaplotypeMethod).
* `BestHaplotypePathPlugin.minTaxa` should be less than or equal to the number of taxa.

In [None]:
docker_config_file = f"{PHG_DIR}/config.txt"

config_contents = f"""host={HOST_IP}:5432
user=postgres
password=phg_test_password
DB=phg_test_db
DBtype=postgres

referenceFasta=/phg/inputDir/reference/{REF}
anchors=/phg/refRanges.bed
refServerPath={SERVER_PATH_ROOT}
genomeData=/phg/inputDir/reference/load_genome_data.txt
localGVCFFolder={LOCAL_GVCF}

LoadHaplotypesFromGVCFPlugin.referenceFasta=/phg/inputDir/reference/{REF}
LoadHaplotypesFromGVCFPlugin.gvcfDir=/phg/inputDir/loadDB/gvcf
LoadHaplotypesFromGVCFPlugin.wgsKeyFile=/phg/gvcfKeyFile.txt
LoadHaplotypesFromGVCFPlugin.bedFile=/phg/refRanges.bed
LoadHaplotypesFromGVCFPlugin.haplotypeMethodName=assembly_by_anchorwave
LoadHaplotypesFromGVCFPlugin.haplotypeMethodDescription="files aligned with anchorwave, then turned to gvcf with plugin"

SAMToMappingPlugin.keyFile=/phg/readMapping_key_file.txt
SAMToMappingPlugin.samDir=/phg/inputDir/imputation/sam
SAMToMappingPlugin.methodDescription="Explain how your input SAMs were produced"

BestHaplotypePathPlugin.pathMethod=assembly_by_anchorwave123

# TODO: Temporary fix for ImputePipelinePlugin
keyFile=/phg/readMapping_key_file.txt

pathMethod=assembly_by_anchorwave
outVcfFile=/phg/outputDir/imputation-results.vcf

# Default is 20, but we have fewer taxa than that
BestHaplotypePathPlugin.minTaxa=2

outputDir=/phg/outputDir
liquibaseOutdir=/phg/outputDir/"""

! echo '{config_contents}' > {docker_config_file}
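Because several entries must agree with each other (for example, `keyFile` and `SAMToMappingPlugin.keyFile` should name the same read-mapping keyfile), it can help to parse `config.txt` back and cross-check it. A sketch assuming the simple one-`key=value`-per-line layout written above; `parse_config` and `to_host_path` are hypothetical helpers, the latter translating a `/phg/...` container path to its location under `PHG_DIR` on the host.

```python
def parse_config(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            config[key.strip()] = value.strip()
    return config


def to_host_path(container_path, phg_dir, mount="/phg"):
    """Map a path under the container's `mount` point to the host directory, else None."""
    if container_path == mount or container_path.startswith(mount + "/"):
        return phg_dir.rstrip("/") + container_path[len(mount):]
    return None
```

For instance, `to_host_path(cfg["anchors"], PHG_DIR)` gives the host path of the BED file to test with `os.path.exists`. Some referenced files, such as the read-mapping keyfile, are only created in *Step 6*.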

Execute the `MakeInitialPHGDBPipelinePlugin` to initialize the database with the PHG schema.

In [None]:
! docker run --name create_initial_db --rm \
    -v {PHG_DIR}/:/phg/ \
    --user {USER_ID} \
    -t maizegenetics/phg:latest \
    /tassel-5-standalone/run_pipeline.pl {JAVA_XMS} {JAVA_XMX} -debug \
    -configParameters {DOCKER_CONFIG} \
    -MakeInitialPHGDBPipelinePlugin -endPlugin

Execute the `LoadHaplotypesFromGVCFPlugin` to populate the PHG using the files you produced in *Steps 1 - 3*.

In [None]:
! docker run --name populate_initial_db --rm \
    -v {PHG_DIR}/:/phg/ \
    --user {USER_ID} \
    -t maizegenetics/phg:latest \
    /tassel-5-standalone/run_pipeline.pl {JAVA_XMS} {JAVA_XMX} -debug \
    -configParameters {DOCKER_CONFIG} \
    -LoadHaplotypesFromGVCFPlugin -endPlugin

## Step 5: Create Consensus Haplotypes
Before performing consensus, you must first create a `rankingFile.txt`. This is a tab-delimited file with two columns, the taxon name and its score, where a higher score means the taxon is trusted more.

When clustering assemblies (that is, when we have a cluster of similar haplotypes), we choose the taxon in that cluster with the highest ranking score. To avoid ties, give each taxon a different score. One simple way to score taxa is to count the number of haplotypes each taxon covers in the database and use that count as its score; any other ranking can be used as well.

Be sure to include a ranking for your reference!
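The haplotype-count scoring suggested above can be sketched as follows. It assumes you have already obtained a per-taxon haplotype count (for example, from a query against the PHG database, which is not shown); `ranking_from_counts` and `format_ranking` are hypothetical helpers. Ties in the counts are broken alphabetically so that every taxon still receives a distinct score.

```python
def ranking_from_counts(counts):
    """Turn {taxon: haplotype_count} into distinct integer scores.

    Taxa are ordered by count (descending, ties broken by name) and scored
    len(counts) down to 1, so more coverage means a higher, never-tied score.
    """
    ordered = sorted(counts, key=lambda taxon: (-counts[taxon], taxon))
    n = len(ordered)
    return {taxon: n - i for i, taxon in enumerate(ordered)}


def format_ranking(scores):
    """Render {taxon: score} as tab-delimited lines, highest score first."""
    ordered = sorted(scores.items(), key=lambda item: -item[1])
    return "\n".join(f"{taxon}\t{score}" for taxon, score in ordered)
```

Remember to include the reference taxon in `counts` so that it receives a score too.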

In [None]:
ranking_file_txt =  f"""GCA_003957995_1_Assembly	1
GCA_003957995_2_Assembly	2
GCA_003957885_1_Assembly	3
GCA_003957885_2_Assembly	4
cassava_tme7_phase0_scaffolded_renamed_alignment_1_Assembly	5
cassava_tme7_phase0_scaffolded_renamed_alignment_2_Assembly	6
Mesculenta_671_v8.0.fa	7"""
! echo '{ranking_file_txt}' > {PHG_DIR}/rankingFile.txt

Copy the `.gvcf.gz` and `.gvcf.gz.tbi` files for your taxa, including the reference, into the local GVCF directory.

In [None]:
! cp {PHG_DIR}/inputDir/reference/{REF}.gvcf.gz* {PHG_DIR}/local_gvcf/.
! cp -r {PHG_DIR}/inputDir/loadDB/gvcf/* {PHG_DIR}/local_gvcf/.
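Since the GVCFs and their `.tbi` indexes are expected in pairs, a quick check for any missing index can save a failed consensus run. A sketch with a hypothetical `gvcfs_missing_index` helper:

```python
import glob
import os


def gvcfs_missing_index(gvcf_dir):
    """List *.gvcf.gz files in `gvcf_dir` that have no matching .tbi index."""
    return sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(gvcf_dir, "*.gvcf.gz"))
        if not os.path.exists(path + ".tbi")
    )
```

`gvcfs_missing_index(f"{PHG_DIR}/local_gvcf")` should return an empty list.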

Next, we create a haplotype graph and attempt to create consensus haplotypes for each anchor. After the consensus haplotypes are created, they will be added to the database and a graph can be generated from them.

This step is *optional when using assemblies* and *required with WGS*. In the case of this tutorial, we will be producing consensus haplotypes since we are utilizing WGS data.

We highly recommend tuning the clustering parameters to match the diversity present in the database you are working with.

**NOTE:** If running a full graph build, remove the `-debug` flag; otherwise, the volume of log output can stall your Jupyter browser tab.

In [None]:
! docker run --name create_consensus --rm \
    -v {PHG_DIR}/:/phg/ \
    --user {USER_ID} \
    -t maizegenetics/phg:latest \
    /tassel-5-standalone/run_pipeline.pl {JAVA_XMS} {JAVA_XMX} -debug \
    -configParameters {DOCKER_CONFIG} \
    -HaplotypeGraphBuilderPlugin \
        -configFile {DOCKER_CONFIG} \
        -includeVariantContexts true \
        -localGVCFFolder {LOCAL_GVCF} \
        -methods assembly_by_anchorwave \
    -endPlugin \
    -RunHapConsensusPipelinePlugin \
        -referenceFasta {DOCKER_REF} \
        -dbConfigFile {DOCKER_CONFIG} \
        -collapseMethod My_Informative_Collapse_Method_Name \
        -collapseMethodDetails My_Informative_Collapse_Method_Details \
        -rankingFile /phg/rankingFile.txt \
        -mxDiv 0.00025 \
        -clusteringMode kmer_assembly \
    -endPlugin

## Step 6: Impute Variants with the PHG
You have reached imputation! This step uses the stored haplotype graph data to infer genotypes from skim sequence, GBS data, or other variant information. It uses the input FASTQ or variant files to match new individuals to haplotypes in the database, and generates paths through the haplotype graph. Once found, paths are stored in the database's paths table. The path information can be exported either as haplotype node IDs from the haplotypes table or as a VCF file containing SNPs for the processed taxa.

**NOTE:** `pangenomeHaplotypeMethod` and `pathHaplotypeMethod` must be set to a valid haplotype method. In our case, that is the value of `LoadHaplotypesFromGVCFPlugin.haplotypeMethodName` in the `config.txt` written in *Step 4*.

### A Note on Rerunning the Pipeline with Different Parameters
When reprocessing the same samples with different parameter settings, you *must* change the method names. Otherwise, read mappings or paths will already exist for those sample names and will not be overwritten. If the read mapping parameters, or the parameters used to create the pangenome, change, use new `readMethod` and `pathMethod` names. If only the path-finding parameters change, change only `pathMethod`; the pipeline will then reuse the existing `readMethod` data to compute new paths.

If an existing configuration file is modified with new parameter settings and method names, the new configuration file should be saved under a different name. The configuration files provide a record of how analyses were run.
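One way to make sure method names change whenever parameters do is to derive the name from the parameters themselves. This is a sketch of that idea, not a PHG convention: the hypothetical `method_name` helper appends a short SHA-1 digest of the sorted parameter dictionary to a base name, so identical settings reproduce the same name and any change yields a new one.

```python
import hashlib
import json


def method_name(base, params):
    """Append an 8-hex-digit digest of `params` (JSON with sorted keys) to `base`."""
    canonical = json.dumps(params, sort_keys=True).encode()
    digest = hashlib.sha1(canonical).hexdigest()[:8]
    return f"{base}_{digest}"
```

Saving the parameter dictionary alongside the configuration file keeps a record of how each method name was produced.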

### Perform Imputation
Prior to running the imputation pipeline we must generate a `pangenome.fa` file from the graph. To do so we use the `WriteFastaFromGraphPlugin`. It requires a haplotype graph, so we chain the `HaplotypeGraphBuilderPlugin` using the same parameters as were used in *Step 5*.

In [None]:
! docker run --name write_pangenome_fasta --rm \
    -v {PHG_DIR}/:/phg/ \
    --user {USER_ID} \
    -t maizegenetics/phg:latest \
    /tassel-5-standalone/run_pipeline.pl {JAVA_XMS} {JAVA_XMX} -debug \
    -configParameters {DOCKER_CONFIG} \
    -HaplotypeGraphBuilderPlugin \
        -configFile {DOCKER_CONFIG} \
        -includeVariantContexts true \
        -localGVCFFolder {LOCAL_GVCF} \
        -methods assembly_by_anchorwave \
    -endPlugin \
    -WriteFastaFromGraphPlugin \
        -outputFile /phg/pangenome.fa \
    -endPlugin

Run Minimap2 to index `pangenome.fa`.

In [None]:
! minimap2 -x sr -t 22 -I 16g -d {PHG_DIR}/pangenome.mmi {PHG_DIR}/pangenome.fa

Afterwards, we call Minimap2 using the pangenome FASTA index alongside the chosen taxa's paired-end FASTQs to produce a SAM file to be used as input during imputation.

If your short reads are split across several files, concatenate each end's reads so that you have a single FASTQ file for R1 and a single FASTQ file for R2. Concatenate both ends in the same file order so that mates remain paired.
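A sketch of that concatenation in Python, assuming your files follow an `_R1`/`_R2` naming scheme and share the same compression (concatenated gzip files form a valid multi-member gzip stream). `pair_reads` and `concatenate_files` are hypothetical helpers; the first sorts both ends identically and checks that the names correspond, so mates stay in step.

```python
import shutil


def pair_reads(r1_files, r2_files):
    """Sort both ends the same way and check that each R1 file has its R2 mate."""
    if len(r1_files) != len(r2_files):
        raise ValueError("different numbers of R1 and R2 files")
    r1_sorted, r2_sorted = sorted(r1_files), sorted(r2_files)
    for r1, r2 in zip(r1_sorted, r2_sorted):
        if r1.replace("_R1", "_R2") != r2:
            raise ValueError(f"{r1} and {r2} do not look like mates")
    return r1_sorted, r2_sorted


def concatenate_files(sources, destination):
    """Concatenate `sources` byte for byte, in order, into `destination`."""
    with open(destination, "wb") as out:
        for source in sources:
            with open(source, "rb") as handle:
                shutil.copyfileobj(handle, out)
```

Call `pair_reads` on the globbed R1 and R2 lists, then `concatenate_files` once per end with the returned lists.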

In [None]:
! minimap2 -ax sr -t 22 --secondary=yes -N50 --eqx -I 16g {PHG_DIR}/pangenome.mmi {PHG_DIR}/tempFileDir/BGM-2018_R1.fq {PHG_DIR}/tempFileDir/BGM-2018_R2.fq > {PHG_DIR}/inputDir/imputation/sam/BGM-2018.sam

Create a tab-delimited `readMapping_key_file.txt` containing data about the input SAMs you will be using for imputation - to be processed by the `SAMToMappingPlugin`.

In [None]:
readmapping_kf_txt = f"""cultivar	flowcell_lane	filename	PlateID
BGM-2018	1	BGM-2018.sam	1"""
! echo '{readmapping_kf_txt}' > {PHG_DIR}/readMapping_key_file.txt
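With more than one sample, writing rows by hand becomes error-prone. A sketch that builds one row per SAM file, assuming (as in the single-sample example above) that the file stem is the cultivar name and that every file shares the same `flowcell_lane` and `PlateID` placeholder values; `readmapping_rows` is a hypothetical helper.

```python
import glob
import os


def readmapping_rows(sam_dir, flowcell_lane="1", plate_id="1"):
    """Return the key file header plus one tab-delimited row per *.sam file."""
    rows = ["cultivar\tflowcell_lane\tfilename\tPlateID"]
    for path in sorted(glob.glob(os.path.join(sam_dir, "*.sam"))):
        filename = os.path.basename(path)
        cultivar = filename[: -len(".sam")]
        rows.append(f"{cultivar}\t{flowcell_lane}\t{filename}\t{plate_id}")
    return rows
```

Writing `"\n".join(readmapping_rows(f"{PHG_DIR}/inputDir/imputation/sam"))` to `readMapping_key_file.txt` covers every SAM in that directory.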

Additionally, be aware of the `SAMToMappingPlugin.keyFile` and `SAMToMappingPlugin.samDir` entries added to `config.txt` during *Step 4*. You may need to change these values if you are not following the PHG's default directory schema.

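Next we execute the `ImputePipelinePlugin` to impute variants from our inputs. Note that this command also mounts a `phg.jar` from your working directory over `/tassel-5-standalone/lib/phg.jar` in the container; remove that `-v` line unless you have placed a custom `phg.jar` in `workdir/`.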

In [None]:
! docker run --name imputation --rm \
    -v {PHG_DIR}/:/phg/ \
    -v {WORKING_DIR}/phg.jar:/tassel-5-standalone/lib/phg.jar \
    --user {USER_ID} \
    -t maizegenetics/phg:latest \
    /tassel-5-standalone/run_pipeline.pl {JAVA_XMS} {JAVA_XMX} -debug \
    -configParameters {DOCKER_CONFIG} \
    -ImputePipelinePlugin \
        -readMethod My_Informative_Read_Method_Name \
        -imputeTarget pathToVCF \
        -inputType sam \
        -pangenomeHaplotypeMethod assembly_by_anchorwave \
        -pathHaplotypeMethod assembly_by_anchorwave \
        -localGVCFFolder {LOCAL_GVCF} \
    -endPlugin

Read mapping counts and imputed paths will be stored in the PHG database. Variants for all the paths stored for `pathMethod` will be written to `outputDir/imputation-results.vcf`.
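For a quick look at the result without extra tools, the sample names and record count can be read straight from the VCF. A sketch assuming the output is plain (uncompressed) text, as the `.vcf` extension in `outVcfFile` suggests; `summarize_vcf` is a hypothetical helper.

```python
def summarize_vcf(lines):
    """Return (sample_names, record_count) for an uncompressed VCF."""
    samples, records = [], 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Sample columns follow the 9 fixed columns (CHROM..FORMAT).
            samples = line.rstrip("\n").split("\t")[9:]
        elif line.strip():
            records += 1
    return samples, records
```

`with open(f"{PHG_DIR}/outputDir/imputation-results.vcf") as vcf: print(summarize_vcf(vcf))` prints the imputed taxa and the number of variant records.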