Snakemake genome assembly

This is a Snakemake pipeline that downloads reads from SRA, assembles them using Unicycler, and outputs various quality metric files and plots. The steps in the pipeline are:

- Download unassembled reads from SRA using the SRA Toolkit
- Quality control with fastp
- phiX spike-in removal with bbduk
- Assemble genomes with Unicycler
- Get genome assembly metrics with CheckM2 and QUAST

Running the pipeline

Install Snakemake, edit the config files in the indicated places, and run the pipeline:

- Install Snakemake. A bare conda/mamba environment is recommended (i.e., one created with mamba create -c conda-forge -c bioconda -n snakemake snakemake).
- Edit config/config.yml.
  - sra_list should be the path to a newline-separated file of SRA accessions.
  - Enter the path to the CheckM2 database on your system. If you don't have it installed, you can download it directly from here (source) and enter its path into the config file.
  - By default, the pipeline puts everything into the output folder. Change the path if you'd like the results written somewhere else.
- Edit slurm/config.yaml.
  - In particular, set the default-resources entry to the default partition that Slurm jobs should be submitted to.
- Run the pipeline with snakemake --use-conda -c
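
Since sra_list must point to a newline-separated file of accessions, it can help to sanity-check that file before launching the pipeline. The following is a minimal sketch and not part of the pipeline itself: the function name `read_sra_accessions` is hypothetical, and the pattern assumes the file holds run-level accessions (SRR, ERR, or DRR followed by digits).

```python
import re
from pathlib import Path

# Assumption: the list holds run-level accessions (SRR/ERR/DRR + digits).
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def read_sra_accessions(path):
    """Return the accessions in a newline-separated file, skipping blank lines.

    Raises ValueError if any non-blank line is not a run accession.
    """
    accessions = []
    lines = Path(path).read_text().splitlines()
    for line_number, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry:
            continue  # tolerate blank lines, e.g. a trailing newline
        if not RUN_ACCESSION.fullmatch(entry):
            raise ValueError(f"line {line_number}: {entry!r} is not a run accession")
        accessions.append(entry)
    return accessions
```

Calling `read_sra_accessions` on the file named in sra_list returns the accessions as a list, or fails early with the offending line number if an entry is malformed.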

Assumptions

The fastq files dumped from SRA are paired-end (i.e., after dumping, they'll be named something like SRRXXXXX_pass_1.fastq.gz and SRRXXXXX_pass_2.fastq.gz).
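
The paired-end assumption can be checked on disk before quality control starts. This is a minimal sketch under the naming pattern above; the function names `expected_read_pair` and `missing_mates`, and the idea of a single directory holding the dumped files, are hypothetical and not taken from the pipeline.

```python
from pathlib import Path


def expected_read_pair(accession, fastq_dir):
    """Return the two file paths assumed to exist for one accession."""
    fastq_dir = Path(fastq_dir)
    return (
        fastq_dir / f"{accession}_pass_1.fastq.gz",
        fastq_dir / f"{accession}_pass_2.fastq.gz",
    )


def missing_mates(accession, fastq_dir):
    """Return the expected paired-end files that are absent on disk."""
    return [p for p in expected_read_pair(accession, fastq_dir) if not p.is_file()]
```

An empty list means both mates are present; a non-empty list flags an accession that does not meet the paired-end assumption.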

Workflow

To do

- Reports
  - SPAdes runtime reports
  - Metric plots
    - See here for parameterizing R scripts with Snakemake: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#r-and-r-markdown
- Clean up names in the CheckM2 report
- Modify the running time for CheckM2 and the overall workflow based on the number of inputs
  - https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
- Add a rule to download the CheckM2 database if it doesn't exist
- Allow the user to supply paired-end reads

pommevilla/snakemake-genome-assembly-practice
