laulambr/virus_assembly

virus_assembly

A bash pipeline for the de novo assembly of viral genomes from Illumina NGS reads. It currently handles the following viruses: HIV-1, RSV, RRV and HMPV. The pipeline includes quality checks and reports via FastQC and MultiQC, trimming and mapping of reads via BBTools, and de novo assembly using MEGAHIT.

Current version: V1

System Requirements

Hardware requirements

The pipeline can be run on either a standard computer or an HPC server. We tested the pipeline on a standard desktop computer with the following specifications:

RAM: 16+ GB
CPU: 4+ cores, 1.90 GHz
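
To compare a machine against the tested specification, a minimal sketch along the following lines may help. It assumes a Linux system with `nproc` and `/proc/meminfo` available; the helper name `meets_minimum` is hypothetical and not part of the pipeline. Because the kernel reports slightly less memory than is physically installed, a small shortfall in the RAM figure is not necessarily a problem.

```shell
#!/usr/bin/env bash
# Report CPU cores and RAM next to the specification the pipeline was tested on.

# meets_minimum VALUE MINIMUM: succeeds when the integer VALUE is >= MINIMUM.
# (Hypothetical helper, not part of virus_assembly.)
meets_minimum() {
    [ "$1" -ge "$2" ]
}

cores=$(nproc)
# MemTotal is given in kB; convert to GB and round to the nearest integer.
ram_gb=$(awk '/^MemTotal:/ {printf "%.0f", $2 / 1024 / 1024}' /proc/meminfo)

if meets_minimum "$cores" 4; then
    echo "CPU cores: $cores (tested with 4+)"
else
    echo "CPU cores: $cores (below the tested 4+)"
fi

if meets_minimum "$ram_gb" 16; then
    echo "RAM: ${ram_gb} GB (tested with 16+ GB)"
else
    echo "RAM: ${ram_gb} GB (below the tested 16+ GB)"
fi
```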

Software requirements

OS requirements

The pipeline has been tested on the following Linux distributions:

Linux: Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04

Dependencies

A conda package manager such as Miniconda3 is required. To install it:

  1. Download the latest Miniconda installation script
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  2. Make the Miniconda installation script executable
chmod +x Miniconda3-latest-Linux-x86_64.sh
  3. Run the Miniconda installation script
bash ./Miniconda3-latest-Linux-x86_64.sh

Installation

Installing the pipeline took less than 15 minutes on a standard desktop computer.

  1. Download the initial environment installation file
wget https://raw.githubusercontent.com/laulambr/virus_assembly/main/scripts/install_env.sh
  2. Run the script in the terminal
bash ./install_env.sh
  3. Check that the installation worked
conda activate virus_assembly
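
Once the environment is active, a quick way to confirm that the tools named in the introduction are reachable is to look them up on `PATH`. The sketch below is only an illustration: `missing_tools` is a hypothetical helper, and the executable names (`fastqc`, `multiqc`, `megahit`, `bbduk.sh`, `qualimap`) are assumptions about what the environment exposes, so adjust the list if your installation differs.

```shell
#!/usr/bin/env bash
# List the commands from the arguments that cannot be found on PATH.
# (Hypothetical helper, not part of virus_assembly.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed executable names; the environment may expose different ones.
missing=$(missing_tools virus_assembly fastqc multiqc megahit bbduk.sh qualimap)

if [ -z "$missing" ]; then
    echo "All expected tools were found on PATH."
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```

Run it in the same terminal session in which `conda activate virus_assembly` was executed, since activation is what adds the environment's tools to `PATH`.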

Usage

Pipeline: NGS pipeline for viral assembly.
usage: virus_assembly [-h -v -p -q] (-i dir -m value -t value )
(-s string) 
with:
 -h  Show help text
 -v  Version of the pipeline
 -n  Name of RUN.
 -i  Input directory
 -s  Viral species [HIV, RSV, RRV, HMPV]
 -c  Perform clipping of primers
 -q  Perform quality check using fastQC
 -m  Memory
 -t  Number of threads
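
Because `-s` only accepts the four species codes listed above, a wrapper script can validate the value before launching a run. The following is a minimal sketch; `is_supported_species` is a hypothetical helper, and it assumes the codes must be given in upper case exactly as printed in the help text, which the help text itself does not confirm.

```shell
#!/usr/bin/env bash
# Succeed only for the species codes listed in the help text for -s.
# (Hypothetical helper; assumes an exact, upper-case match is required.)
is_supported_species() {
    case "$1" in
        HIV|RSV|RRV|HMPV) return 0 ;;
        *) return 1 ;;
    esac
}

species="RSV"   # example value

if is_supported_species "$species"; then
    echo "Species '$species' is supported."
    # The pipeline call would go here, for example:
    # virus_assembly -i path/to/main/directory -s "$species"
else
    echo "Unsupported species '$species'. Choose one of: HIV, RSV, RRV, HMPV." >&2
fi
```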

Quick start

  1. Activate the environment.
conda activate virus_assembly
  2. Head to the directory where you will perform the analysis.
  3. Place the raw fastq.gz files in a directory called source.
  4. Create a list of the sample names from your sequencing files, called IDs.list, and place it in the main directory.
  5. Start the pipeline using the following command
virus_assembly -i path/to/main/directory -s VIRUS
  6. When the pipeline has finished, 5 additional folders will have been created:
    • 1_reads: Includes the trimmed and normalised reads.
    • 2_ref_map: Includes a bam file of the trimmed reads mapped against the viral reference genome and a pdf with the qualimap results.
    • 3_contigs: Includes the contigs assembled de novo by MEGAHIT for each sample.
    • 4_filter: Includes the high-coverage contigs generated by MEGAHIT (files ending in hicov.fasta), filtered and reoriented against the viral reference genome (files ending in reoriented.fa).
    • 5_remap: Includes both a fasta file holding the viral contigs for that sample and a bam file of the trimmed reads mapped against those contigs.
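
The IDs.list file from the steps above can be generated from the contents of the source directory rather than typed by hand. The sketch below rests on two assumptions that this README does not confirm: that IDs.list holds one sample name per line, and that paired-end files are named like `SAMPLE_R1.fastq.gz` and `SAMPLE_R2.fastq.gz`, so that the sample name is everything before the first `_R1` or `_R2`. The function name `list_sample_ids` is hypothetical. Check the resulting file against your own naming scheme before starting the pipeline.

```shell
#!/usr/bin/env bash
# Print the unique sample names found in a directory of fastq.gz files.
# Assumes names such as SAMPLE_R1.fastq.gz / SAMPLE_R2.fastq.gz (an assumption,
# not something the pipeline documentation specifies).
list_sample_ids() {
    for file in "$1"/*.fastq.gz; do
        [ -e "$file" ] || continue            # the glob matched nothing
        name=$(basename "$file" .fastq.gz)
        # Cut the name at the first _R1 or _R2 and drop everything after it.
        printf '%s\n' "${name%%_R[12]*}"
    done | sort -u
}

# Write IDs.list into the main directory when a source folder is present.
if [ -d source ]; then
    list_sample_ids source > IDs.list
    echo "Wrote $(wc -l < IDs.list) sample name(s) to IDs.list"
fi
```

A sample name that itself contains `_R1` or `_R2` would be cut short by this pattern, so such files need a different rule.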

Example

We provide an example analysis using pre-installed test data, which took less than 1 minute on a standard desktop computer.

  1. Activate the environment.
conda activate virus_assembly
  2. Go to the test data folder inside the GitHub clone of this repository
cd $CONDA_PREFIX/virus_assembly/test_data
  3. Run the following command
virus_assembly -i $CONDA_PREFIX/virus_assembly/test_data -s HIV
  4. The pipeline will run. When it finishes, check the newly created 5_remap directory for a fasta file containing the de novo assembled HIV-1 provirus.
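
After the example run, one way to confirm that it completed is to check that all five output folders described in the Quick start exist. This is a minimal sketch: `missing_outputs` is a hypothetical helper, and it assumes the folders are created inside the directory you point it at (here the current directory), which may differ from where your run actually wrote them.

```shell
#!/usr/bin/env bash
# Print each expected output folder that does not exist under the given directory.
# (Hypothetical helper, not part of virus_assembly.)
missing_outputs() {
    for folder in 1_reads 2_ref_map 3_contigs 4_filter 5_remap; do
        [ -d "$1/$folder" ] || printf '%s\n' "$folder"
    done
}

run_dir="."   # assumed location of the output folders; change as needed
missing=$(missing_outputs "$run_dir")

if [ -z "$missing" ]; then
    echo "All five output folders are present in $run_dir"
else
    echo "Missing output folders in $run_dir:"
    echo "$missing"
fi
```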

Included viral reference genomes

  • HIV: K03455.1
  • RSV: MH760627; MH760652
  • RRV: RRV_ref (Accession pending)
  • HMPV: HMPV205; HMPV218 (Accessions pending)
