virus_assembly

A bash pipeline for the de novo assembly of viral genomes from Illumina NGS data. It currently handles the following viruses: HIV-1, RSV, RRV and HMPV. The pipeline includes quality checks and reports via FastQC and MultiQC, trimming and mapping of reads via BBTools, and de novo assembly using MEGAHIT.

Current version: V1

Table of contents

System Requirements
Installation
Usage
Example

System Requirements

Hardware requirements

The pipeline can be run on either a standard computer or an HPC server. We tested the pipeline on a standard desktop computer with the following specifications:

RAM: 16+ GB
CPU: 4+ cores, @1.90 GHz

Software requirements

OS requirements

The pipeline has been tested on several Linux operating systems, including:

Linux: Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04

Dependencies

A conda package manager such as Miniconda3 is required. To install it:

Download the latest miniconda installation script

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Make the miniconda installation script executable

chmod +x Miniconda3-latest-Linux-x86_64.sh

Run miniconda installation script

bash ./Miniconda3-latest-Linux-x86_64.sh
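After the installer finishes, it can help to confirm that conda is reachable from the shell before moving on. The following is a minimal sketch; the helper name have_cmd is hypothetical and not part of the pipeline.

```shell
# Return 0 if the named command is found on PATH, non-zero otherwise.
# (have_cmd is a hypothetical helper used only for this illustration.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether conda is reachable from the current shell.
if have_cmd conda; then
    conda --version
else
    echo "conda not found on PATH; try opening a new terminal" >&2
fi
```

If conda is not found right after installation, opening a new terminal so that the shell start-up files are re-read is usually enough.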

Installation

Installing the pipeline took less than 15 minutes on a standard desktop computer.

Download the initial environment installation file

wget https://raw.githubusercontent.com/laulambr/virus_assembly/main/scripts/install_env.sh

Run the script in the terminal

 bash ./install_env.sh

Check that the installation worked by activating the environment

 conda activate virus_assembly

Usage

Pipeline: NGS pipeline for viral assembly.
usage: virus_assembly [-h -v -p -q] (-i dir -m value -t value )
(-s string) 
with:
 -h  Show help text
 -v  Version of the pipeline
 -n  Name of RUN.
 -i  Input directory
 -s  Viral species [HIV, RSV, RRV, HMPV]
 -c  Perform clipping of primers
 -q  Perform quality check using fastQC
 -m  Memory
 -t  Number of threads
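The -s option takes one of the four species codes listed above. As a small sketch, a wrapper script could validate the code before launching a run. The function name is_supported_species is hypothetical, and treating the codes as case-sensitive is an assumption. The example only prints the command it would run.

```shell
# Return 0 if the argument is one of the species codes listed for -s.
# Assumption: codes are matched exactly as written (upper case).
is_supported_species() {
    case "$1" in
        HIV|RSV|RRV|HMPV) return 0 ;;
        *) return 1 ;;
    esac
}

species="RSV"
if is_supported_species "$species"; then
    # -q (quality check) and -t (threads) are taken from the option list above.
    echo "virus_assembly -i path/to/main/directory -s $species -q -t 4"
else
    echo "unsupported species code: $species" >&2
fi
```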

Quick start

Activate environment.

conda activate virus_assembly

Head to the directory where you will perform the analysis.
Place the raw fastq.gz files in a directory called source.
Create a list holding the sample names from your sequencing files called IDs.list and place it in the main directory.
Start the pipeline using the following command:

virus_assembly -i path/to/main/directory -s VIRUS
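The IDs.list file mentioned in the steps above can be generated from the file names instead of being typed by hand. The sketch below assumes one sample name per line and read files named <sample>_R1*.fastq.gz, taking everything before the first _R1 as the sample name. Both the naming convention and the function name list_sample_ids are assumptions, so check them against your own files and the pipeline's expectations.

```shell
# Print one sample name per line, derived from the R1 files in a directory.
# Assumes files are named <sample>_R1*.fastq.gz (hypothetical convention).
list_sample_ids() (
    cd "$1" || exit 1
    for f in *_R1*.fastq.gz; do
        [ -e "$f" ] || continue        # skip the literal pattern when nothing matches
        printf '%s\n' "${f%%_R1*}"     # drop everything from the first "_R1" onward
    done | sort -u
)

# Example, run from the main directory:
#   list_sample_ids source > IDs.list
```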

When the pipeline has finished, five additional folders will have been created:
- 1_reads: Includes the trimmed and normalised reads.
- 2_ref_map: Includes a bam file of the trimmed reads mapped against the viral reference genome and a pdf with the qualimap results.
- 3_contigs: Includes the contigs assembled de novo by MEGAHIT for each sample.
- 4_filter: Includes the high coverage contigs generated by MEGAHIT (_ *hicov.fasta), filtered and reoriented against the viral reference genome (_ *reoriented.fa).
- 5_remap: Includes both a fasta file holding the viral contigs for that sample and a bam file of the trimmed reads mapped against those contigs.
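A quick way to confirm that a run completed is to check that all five folders exist. This is a minimal sketch; it assumes the folders are created directly inside the main directory, and missing_outputs is a hypothetical helper name.

```shell
# Print the name of each expected output folder that is missing under "$1".
# Assumption: the folders are created directly inside the main directory.
missing_outputs() {
    for d in 1_reads 2_ref_map 3_contigs 4_filter 5_remap; do
        [ -d "$1/$d" ] || printf '%s\n' "$d"
    done
}

# Example:
#   missing_outputs path/to/main/directory
# No output means all five folders are present.
```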

Example

We provide an example analysis using pre-installed test data, which took less than 1 minute on a standard desktop computer.

Activate environment.

conda activate virus_assembly

Go to the test data folder inside the GitHub clone of this repository

cd $CONDA_PREFIX/virus_assembly/test_data

Run the following command

virus_assembly -i $CONDA_PREFIX/virus_assembly/test_data -s HIV

The pipeline will run. When it has finished, check the newly created 5_remap directory for a fasta file containing the de novo assembled HIV-1 provirus.
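To inspect the result, one option is to count the sequence records in each fasta file in the 5_remap folder described under Quick start. The sketch below assumes the files end in .fa or .fasta, which is not confirmed here; count_fasta_records is a hypothetical helper name.

```shell
# Count sequence records in a FASTA file (header lines start with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# List each FASTA file in 5_remap with its number of records.
# Assumption: output files use a .fa or .fasta extension.
for f in 5_remap/*.fa 5_remap/*.fasta; do
    [ -e "$f" ] || continue            # skip patterns that matched nothing
    printf '%s\t%s\n' "$f" "$(count_fasta_records "$f")"
done
```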

Included viral reference genomes

HIV: K03455.1
RSV: MH760627; MH760652
RRV: RRV_ref (Accession pending)
HMPV: HMPV205; HMPV218 (Accessions pending)
