Influenza A Full-Length Consensus Calling for Nanopore Sequencing Data

jimmyliu1326/InfluenzaNanopore

InfluenzaNanopore

Description

This Snakemake pipeline constructs full-length Influenza A consensus sequences from Nanopore reads. Centrifuge first bins the raw reads by genomic segment (segments 1-8). For each segment, Spoa generates a draft consensus, which medaka then polishes to correct residual errors.

Usage

Required arguments:

-i|--input    Path to the input samples.csv, with one sample name and its .fastq path per line
-o|--output   Path to the output directory; final consensus sequences are written to consensus/
--db          Path to Centrifuge database for taxonomic and segment classification
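The samples.csv layout described above can be sketched as follows. This is a minimal illustration, not part of the pipeline: make_samples_csv is a hypothetical helper, and the comma delimiter, the absence of a header row, and the use of the file name as the sample name are all assumptions that should be checked against the pipeline itself.

```shell
#!/usr/bin/env bash
# Sketch: write one "sample_name,/path/to/reads.fastq" line per FASTQ file.
# Assumptions (not confirmed by the pipeline): comma delimiter, no header row,
# and sample name taken from the file name without its .fastq extension.
make_samples_csv() {
    local fastq_dir="$1"
    local out_csv="$2"
    local fq
    : > "$out_csv"                      # start with an empty file
    for fq in "$fastq_dir"/*.fastq; do
        [ -e "$fq" ] || continue        # skip the literal glob when nothing matches
        printf '%s,%s\n' "$(basename "$fq" .fastq)" "$fq" >> "$out_csv"
    done
}
```

For example, `make_samples_csv /path/to/fastq_dir samples.csv` would produce a file that can be passed to -i.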

Optional arguments:

-t|--threads        Number of threads [Default = 32]
-s|--segment        Restrict consensus calling to specific Influenza A genomic segments, given as a comma-delimited list of segment numbers (Example: -s 1,2,5,6)
--subsample         Specify the target coverage for consensus calling [Default = 1000]
-m|--model          Specify the medaka model matching the flowcell chemistry used for Nanopore sequencing [Default = r941_min_high_g360]
--notrim            Disable adaptor trimming by Porechop
--keep-tmp          Keep all temporary files
-h|--help           Display help message
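Since -s expects comma-delimited segment numbers and Influenza A has segments 1-8, a value can be sanity-checked before launching a run. The sketch below uses a hypothetical helper, validate_segments, which is not part of the pipeline; it only checks the format and range and does not reject repeated segments.

```shell
#!/usr/bin/env bash
# Sketch: succeed only for a comma-delimited list of segment numbers 1-8,
# e.g. "1,2,5,6". validate_segments is a hypothetical helper for illustration.
validate_segments() {
    # The pattern requires a digit 1-8, optionally followed by ",digit" groups,
    # so empty input, stray commas, and out-of-range numbers all fail.
    printf '%s\n' "$1" | grep -Eq '^[1-8](,[1-8])*$'
}
```

A possible use is `validate_segments "1,2,5,6" && influenza_consensus.sh -i samples.csv -o /path/to/output --db /path/to/centrifuge/database -s 1,2,5,6`.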

Example command line for pipeline execution:

influenza_consensus.sh -i samples.csv -o /path/to/output --db /path/to/centrifuge/database

Dependencies

  • R >= 3.6
  • medaka == 1.0.3
  • centrifuge >= 1.0.3
  • seqtk >= 1.3
  • snakemake >= 5.30.1
  • porechop >= 0.2.4
  • spoa >= 4.0.7
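Before running the pipeline, it can help to confirm that the tools listed above are on PATH. The following is a minimal sketch with a hypothetical helper, missing_tools; the executable names passed to it are assumptions based on how each tool is usually installed, and versions are not checked.

```shell
#!/usr/bin/env bash
# Sketch: print the name of every given command that is not found on PATH.
# missing_tools is a hypothetical helper; it does not verify tool versions.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools Rscript medaka centrifuge seqtk snakemake porechop spoa` prints nothing when every executable is available.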
