iwc-workflows/Purge-duplicates-one-haplotype-VGP6b

Purge Duplicate Contigs

Purge contigs marked as duplicates by purge_dups in a single haplotype (these could be haplotypic duplications or overlap duplications). This workflow is the 6th workflow of the VGP pipeline. It is meant to be run after one of the contigging steps (Workflow 3, 4, or 5).

Inputs

  1. Genomescope model parameters [txt] (Generated by the k-mer profiling workflow)
  2. Hifi long reads - trimmed [fastq] (Generated by Cutadapt in the contigging workflow)
  3. Assembly to purge (e.g. hap1) [fasta] (Generated by the contigging workflow)
  4. K-mer database [meryldb] (Generated by the k-mer profiling workflow)
  5. Estimated Genome Size [txt]
  6. Assembly to leave alone (used for merqury statistics) (e.g. hap2) [fasta] (Generated by the contigging workflow)
  7. Name of un-altered assembly
  8. Name of purged assembly
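
Before launching the workflow, it can help to confirm that all eight inputs are in hand. The following is a minimal sketch of such a check, not part of the workflow itself. The labels are copied from the list above and may not match the labels in the workflow file exactly; the function name `missing_inputs` and all file names are hypothetical.

```python
# Input labels copied from the Inputs list; whether they match the labels
# used inside the workflow file exactly is an assumption.
# The value is the expected datatype, or None for a plain text parameter.
REQUIRED_INPUTS = {
    "Genomescope model parameters": "txt",
    "Hifi long reads - trimmed": "fastq",
    "Assembly to purge": "fasta",
    "K-mer database": "meryldb",
    "Estimated Genome Size": "txt",
    "Assembly to leave alone": "fasta",
    "Name of un-altered assembly": None,
    "Name of purged assembly": None,
}


def missing_inputs(provided):
    """Return the required labels that are absent or empty in `provided`."""
    return [label for label in REQUIRED_INPUTS if not provided.get(label)]


# Hypothetical example values; replace them with your own datasets and names.
example = {
    "Genomescope model parameters": "model_params.txt",
    "Hifi long reads - trimmed": "hifi_trimmed.fastq",
    "Assembly to purge": "hap1.fasta",
    "K-mer database": "reads.meryldb",
    "Estimated Genome Size": "genome_size.txt",
    "Assembly to leave alone": "hap2.fasta",
    "Name of un-altered assembly": "hap2",
    "Name of purged assembly": "hap1",
}

if __name__ == "__main__":
    missing = missing_inputs(example)
    if missing:
        print("Missing inputs:", ", ".join(missing))
    else:
        print("All required inputs are present.")
```

The check only looks at whether each label has a non-empty value; it does not open the files or verify their datatypes.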

Outputs

  1. Haplotype 1 purged assembly (FASTA and GFA)
  2. Haplotype 2 purged assembly (FASTA and GFA)
  3. QC: BUSCO report for both assemblies
  4. QC: Merqury report for both assemblies
  5. QC: Assembly statistics for both assemblies
  6. QC: Nx plot for both assemblies
  7. QC: Size plot for both assemblies