In order to run
Mycovista, you'll need to install:
Mycovista, edit the
- specify the pipeline you want to use: long or hybrid mode
- insert additional information:
- path to input short reads
- path to input long reads
- path to output folder
- names of the bacterial strains
Note: The raw read files must contain the name of the strain in the file name.
python script helps you to generate all folders required for the output. Just run:
Afterwards, all required data should be linked in the
raw_data/ folder. Please check this!
Now you can get started and assemble your bacterias. You can start
Mycovista by using:
snakemake -s <mode> -c <#threads> --use-conda
mode - assembly mode file (long or hybrid)
Tip: You can use
-n for a dry run to check if snakemake will start all required rules.
Mycovista, please cite all incorporated tools as without them this pipeline wouldn't exist.
FastQC and NanoPlot are used for quality check of the reads (raw and preprocessed). Short reads are filtered by fastp for adapter clippling and Trimmomatic for quality trimming. Long reads are filtered by length using Filtlong. Flye assembles the long reads first. The assembly is then polished with long reads by Racon using minimap2 as mapper inbetween. Afterwards, medaka is incorporated as additional polishing step with long reads. In hybrid mode, the assembly postprocessed further with Racon and minimap2 using short reads. The final assembly is annotated by Prokka and general assembly statistics are calculated by QUAST.
Downstream analysis of our M. bovis assembly panel included pangenome analsis followed by a genome-wide association analysis (GWAS). We provided the R script for the GWAS in
Click here for all citations
Chen S, Zhou Y, Chen Y, Gu J (2018) fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34(17):i884–i890
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114–2120
Wick R (2018) Filtlong. Available: https://github.com/rrwick/Filtlong
Kolmogorov M, Yuan J, Lin Y, Pevzner PA (2019) Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology 37(5):540–546
Vaser R, Sovi ́c I, N N, Šiki ́c M (2017) Fast and accurate de novo genome assembly from long uncorrected reads. Genome Research 27:737–746
Li H (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34(18):3094–3100
Ltd. ONT (2018) medaka: Sequence correction provided by ONT Research. Available: https://github.com/nanoporetech/medaka
Andrews S, et al. (2012) FastQC (Babraham Institute. Available: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C (2018) NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34(15):2666–2669
Gurevich A, Saveliev V, Vyahhi N, Tesler G (2013) QUAST: quality assessment tool for genome assemblies. Bioinformatics 29(8):1072–1075
Seemann T (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics 30(14):2068–2069
R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.