diploidocus_setonix

Nextflow based workflow to parallelise https://github.com/slimsuite/diploidocus for HPC deployment

Clone repo into your environment
Edit nextflow.config file to change container locations to where you have them stored (not necessary for Oceanomics users). Highly recommend using a Seqera account so that the pipeline execution can be easily monitored (https://training.nextflow.io/basic_training/seqera_platform/), else delete the tower parameters in nextflow.config.
Choose which script to run based on gzipped assembly file size: small (under 600Mb), medium (600Mb-4Gb), large (4-7Gb). If running in diploid mode the assembly file size is based on one file.
Run chosen main.nf using the slurm template provided, the following params can be added to the nextflow run command by adding '--param /path/to/file'. Without these parameters the script will search for the files in the default places and will make them if not found:

Run parameters

Assembly and reads are the only necessary input, all other files will be made by the workflow if needed
--asssembly Path to assembly file (necessary) 

--projectDir Where output directories will be made and the deafult directory where files will be searched for [default '.']

--hifireads Path to all reads files [default "$params.projectDir/reads/*"], you have to put "" around
the directory location if using '*' wildcard e.g. --hifireads "/path/to/*.fastq"

--bam Bam file for assembly. If running in diploid mode (using 2 assembly files) then set as
the directory where both bam files can be found. [default "$params.projectDir/bam"]

--busco Path to full_table.tsv output from busco [default "$params.projectDir/busco/**full_table.tsv"]

--lineage Lineage to use for BUSCO analysis (only necessary if busco file is not given) [default 'actinopterygii_odb10']


Give folder names without a '/' at the end e.g. --depthsizer /my_assembly/depthsizer


--depthsizer Path to folder with depthsizer output files [default search in ("${params.projectDir}/depthsizer/"]

--kat Path to folder with kat and selfkat output files  [default search in ("${params.projectDir}/kat/"]

--purge_hap Path to folder with purge_haplotigs output files  [default search in ("${params.projectDir}/purge_haplotigs/"]

Diploid mode

Using diploid mode, the haplotype input will be pre-processed with its paired haplotype file. To use this mode the --asssembly input stays as one file and the --PairedFactor [default "'hap1', 'hap2'"] parameter can be set as the difference between the files. The other haplotype file will be searched for in the same directory and if found diploid pre-processing will be performed. If no file is found will proceed in haploid mode. For example, to run diploid mode using 2 haplotype files run:

nextflow run main.nf --assembly /path/to/my_assembly.hap1.fa #where my_assembly.hap2.fa file can be found in the same directory

Or to run using a different naming convention:

nextflow run main.nf --assembly /path/to/my_assembly.mat.fa --PairedFactor "'mat', 'pat'" #where my_assembly.pat.fa file can be found in the same directory

Diploid mode

Haploid mode

Name		Name	Last commit message	Last commit date
Latest commit History 83 Commits
README.md		README.md
large_main.nf		large_main.nf
medium_main.nf		medium_main.nf
nextflow.config		nextflow.config
slurm_diploidocus.sh		slurm_diploidocus.sh
small_main.nf		small_main.nf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

large_main.nf

large_main.nf

medium_main.nf

medium_main.nf

nextflow.config

nextflow.config

slurm_diploidocus.sh

slurm_diploidocus.sh

small_main.nf

small_main.nf

Repository files navigation

diploidocus_setonix

Run parameters

Diploid mode

About

Releases

Packages

Languages

jadedavis5/diploidocus_setonix

Folders and files

Latest commit

History

Repository files navigation

diploidocus_setonix

Run parameters

Diploid mode

About

Topics

Resources

Stars

Watchers

Forks

Languages