Benchmarking and Simulation Scripts for Plassembler
This directory contains the pipeline used to benchmark Plassembler.
I used Snaketool to make this, and would highly recommend it!
It should work 'out of the box' for the simulated reads. For the real read sets, you will need to change the csv
input files in this directory with the correct FASTQ paths of the required files on your system. These FASTQ reads can be found in the Zenodo repository here, or by following the instructions outlined here.
# creates conda env
conda create -n plassembler_benchmarking pip
conda activate plassembler_benchmarking
# download the repository
git clone "https://github.com/gbouras13/plassembler_simulation_benchmarking"
cd plassembler_simulation_benchmarking
pip install -e .
plassembler_simulation_benchmarking --help
All outputs will be in a directory called simulated_benchmarking
.
simulated_benchmarking
will contain SR
and LR
directories for short and long simulated reads, and SIMULATED
containing the Plassembler Flye and Raven and Unicycler outputs for each thread count.
It will also contain BENCHMARKS
containing the time/memory etc benchmarks and QUAST
, which will contain the Quast outputs comparing Plassembler and Unicycler to the ground truth plasmid assemblies.
# creates simulated read sets
plassembler_simulation_benchmarking simulate --input benchmarking_simulate.csv --output simulated_benchmarking --threads 16
# assembles the simulated read sets
plassembler_simulation_benchmarking assemble-simulated --input benchmarking_simulate.csv --output simulated_benchmarking --threads 16
plassembler_simulation_benchmarking assess-simulated --input benchmarking_simulate.csv --output simulated_benchmarking --threads 16
All outputs will be in a directory called real_benchmarking
.
real_benchmarking
will contain REAL
containing the Plassembler Flye and Raven and Unicycler outputs for each thread count.
It will also contain BENCHMARKS
containing the time/memory etc benchmarks and QUAST
, which will contain the Quast outputs comparing Plassembler and Unicycler to the ground truth plasmid assemblies.
#### you will need to get the FASTQ files and modify benchmarking_real.csv ####
plassembler_simulation_benchmarking assemble-real --input benchmarking_real.csv --output real_benchmarking --threads 16
plassembler_simulation_benchmarking assess-real --input benchmarking_real_quast.csv --output real_benchmarking --threads 16
All outputs will be in a directories called simulated_benchmarking_de_maio
,
simulated_benchmarking_de_maio
will contain SR
and LR
directories for short and long simulated reads, and SIMULATED
containing the Plassembler Flye and Raven and Unicycler outputs for each thread count.
It will also contain BENCHMARKS
containing the time/memory etc benchmarks and QUAST
, which will contain the Quast outputs comparing Plassembler and Unicycler to the ground truth plasmid assemblies.
plassembler_simulation_benchmarking simulate --input benchmarking_simulate_demaio.csv --output simulated_benchmarking_de_maio --threads 16
plassembler_simulation_benchmarking assemble-simulated --input benchmarking_simulate_demaio.csv --output simulated_benchmarking_de_maio --threads 16
plassembler_simulation_benchmarking assess-simulated --input benchmarking_simulate_demaio.csv --output simulated_benchmarking_de_maio --threads 16