
General otb directory structure

David Molik edited this page May 27, 2022 · 6 revisions

At its core, otb is a Nextflow pipeline with some extra bells and whistles. This page describes the structure of an example otb project.

Here is the directory structure of a genome project for Xylecopa micans:

0_Xylecopa_micans/
├── config
│   ├── none.cfg
│   ├── sge.cfg
│   ├── slurm_atlas.cfg
│   ├── slurm.cfg
│   └── slurm_usda.cfg
├── execute.slurm
├── LICENSE
├── nextflow-Xylecopa_micans.log.txt
├── otb.sh
├── RawHiC
│   ├── JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz
│   └── JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz
├── RawHiFI
│   ├── m54334U_210423_073502.ccs.bam 
│   ├── m54334U_210423_073502.ccs.bam.bai
│   ├── m54334U_210423_073502.ccs.bam.md5
│   └── m54334U_210423_073502.ccs.bam.pbi
├── README.md
├── reports     
├── results
│   ├── busco_no_polish
│   ├── busco_polish
│   ├── filtering
│   │   └── fastq_check.log.txt
│   ├── genome
│   │   ├── left.fastq.gz.stats
│   │   ├── log
│   │   └── right.fastq.gz.stats
│   ├── genomescope
│   │   ├── genomescope2.log.txt
│   │   ├── jellyfish.log.txt
│   │   ├── kcov.txt -> ../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/kcov.txt
│   │   ├── version.txt -> ../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/version.txt
│   │   └── Xylecopa_micans
│   │       ├── fitted_hist.png -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/fitted_hist.png
│   │       ├── linear_plot.png -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/linear_plot.png
│   │       ├── log_plot.png -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/log_plot.png
│   │       ├── lookup_table.txt -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/lookup_table.txt
│   │       ├── model.txt -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/model.txt
│   │       ├── progress.txt -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/progress.txt
│   │       ├── summary.txt -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/summary.txt
│   │       ├── transformed_linear_plot.png -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/transformed_linear_plot.png
│   │       └── transformed_log_plot.png -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/transformed_log_plot.png
│   └── software_versions
│       ├── any2fasta_version.txt
│       ├── bbtools_version.txt
│       ├── bcftools_version.txt
│       ├── busco_version.txt
│       ├── genomescope_version.txt
│       ├── hicstuff_version.txt
│       ├── hifiasm_version.txt
│       ├── jellyfish_version.txt
│       ├── pbadapterfilt_version.txt
│       ├── ragtag_version.txt
│       ├── samtools_version.txt
│       └── shhquis_version.txt
├── run.nf
├── scr
│   ├── check_env.sh
│   ├── force_prefetch_containers.sh
│   ├── io.sh
│   └── prefetch_containers.sh
├── stderr.6595761.ceres19-compute-98
├── stdout.6595761.ceres19-compute-98
├── work
│   ├── 01
│   │   └── 5fd9920526ce4cd891b6181ddb1a70
│   │       ├── any2fasta_stats.flag.txt
│   │       ├── right.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz
│   │       └── right.fastq.gz.stats
│   ├── 1b
│   │   └── 1d5bf40157d16bca1b352914141131
│   │       ├── jellyfish_version.flag.txt
│   │       └── version.txt -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/work/42/0283004d9768bf75bf2ff72fef1031/version.txt
│   ├── 22
│   │   └── 5d18f4291708f512b7a1be9148dd2a
│   │       └── any2fasta_version.flag.txt
│   ├── 39
│   │   └── e8f238662358a6b400104fd9dd0f0a
│   │       └── hifiadapterfilt_version.flag.txt
│   ├── 3c
│   │   └── 9475be059b3bd92f42172edda5d853
│   │       └── shhquis_version.flag.txt
│   ├── 42
│   │   └── 0283004d9768bf75bf2ff72fef1031
│   │       ├── jellyfish.flag.txt
│   │       ├── left.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz
│   │       ├── reads.jf
│   │       ├── right.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz
│   │       ├── version.txt
│   │       └── Xylecopa_micans.histo
│   ├── 43
│   │   └── 611cc48f608f83c5ddda5d8f8090ac
│   │       └── bbtools_version.flag.txt
│   ├── 51
│   │   └── 318c0111ff007513bca8b4b5df1416
│   │       └── bcftools_version.flag.txt
│   ├── 56
│   │   └── 7e48f7c96d4cefcdec1eda5bf4ac6b
│   │       ├── genomescope_version.flag.txt
│   │       └── version.txt -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/work/e8/22e87b3c09c5cd59a28b99cf21e31a/version.txt
│   ├── 5c
│   │   └── cbff3d6fd1b76022d7f1ec52a00934
│   │       └── busco_version.flag.txt
│   ├── 67
│   │   └── 766c5a6cc10fc153b8f5b37c40a826
│   │       └── hicstuff_version.flag.txt
│   ├── 81
│   │   └── 2b38330f70f55413734e6a140cb3dc
│   │       ├── any2fasta_stats.flag.txt
│   │       ├── left.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz
│   │       └── left.fastq.gz.stats
│   ├── a3
│   │   └── 1a6a8bf4fb272772353cd89c763cb1
│   │       └── ragtag_version.flag.txt
│   ├── b3
│   │   └── a1899efe376be7777d1f6fe8d64baf
│   │       └── samtools_version.flag.txt
│   ├── ba
│   │   └── fe4de41ba6a5e575d6079beef371f5
│   │       └── HiFiASM_version.flag.txt
│   ├── e8
│   │   └── 22e87b3c09c5cd59a28b99cf21e31a
│   │       ├── genomescope.flag.txt
│   │       ├── kcov.txt
│   │       ├── version.txt
│   │       ├── Xylecopa_micans
│   │       │   ├── fitted_hist.png
│   │       │   ├── linear_plot.png
│   │       │   ├── log_plot.png
│   │       │   ├── lookup_table.txt
│   │       │   ├── model.txt
│   │       │   ├── progress.txt
│   │       │   ├── summary.txt
│   │       │   ├── transformed_linear_plot.png
│   │       │   └── transformed_log_plot.png
│   │       └── Xylecopa_micans.histo -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/work/42/0283004d9768bf75bf2ff72fef1031/Xylecopa_micans.histo
│   ├── ec
│   │   └── efe6d84cbf60d95363237250212e5e
│   │       ├── check_fastq.flag.txt
│   │       ├── JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz
│   │       ├── JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz
│   │       └── out
│   │           ├── left.fastq.gz -> ../JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz
│   │           └── right.fastq.gz -> ../JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz

There's a lot going on here, so let's break it down. Let's assume that this user copied down the otb repo before adding the data they wanted processed.

At the top-level directory, we have the following:

0_Xylecopa_micans/
├── config/
├── .nextflow/
├── RawHiC/
├── RawHiFI/
├── reports/
├── results/
├── scr/
└── work/

The RawHiC and RawHiFI directories were added by the user to store the data they are going to use in their otb run. config holds files that tell otb what kind of cluster (or lack thereof) it is being run on. .nextflow is a hidden directory that Nextflow uses for its history and cache. results is where results are stored; otb creates this directory. scr holds helper scripts for otb. work is where each process in otb is actually computed. If we add in the typical files, we get this:
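Every otb step runs in its own work/ subdirectory. Each one is named by a Nextflow task hash, split into a two-character prefix and the remainder (for example work/42/0283004d9768bf75bf2ff72fef1031). That makes it hard to tell which step produced which directory.

In the tree above, every task directory also contains a *.flag.txt file named after its step, such as jellyfish.flag.txt or genomescope.flag.txt. The sketch below assumes that convention holds and uses the flag files to label each task directory. `map_work_tasks` is a hypothetical helper, not part of otb. Nextflow's own `nextflow log` command remains the authoritative record of which task ran where.

```python
from pathlib import Path


def map_work_tasks(work_dir):
    """Map each task directory under work/ to the *.flag.txt files it contains.

    Hypothetical helper. Assumes the layout seen in the example project:
    work/<2-char hash prefix>/<rest of task hash>/, with each otb step
    leaving a <step>.flag.txt file in its task directory.
    """
    tasks = {}
    work = Path(work_dir)
    for prefix in sorted(p for p in work.iterdir() if p.is_dir()):
        for task in sorted(t for t in prefix.iterdir() if t.is_dir()):
            # Non-recursive: only flag files directly inside the task directory
            flags = sorted(f.name for f in task.glob("*.flag.txt"))
            # Key by "prefix/hash", the same form that appears in symlink targets
            tasks[f"{prefix.name}/{task.name}"] = flags
    return tasks


# Example (path assumed): map_work_tasks("0_Xylecopa_micans/work")
```

For the example project, this would pair 42/0283004d9768bf75bf2ff72fef1031 with jellyfish.flag.txt and e8/22e87b3c09c5cd59a28b99cf21e31a with genomescope.flag.txt. The e8 pairing matches the results/genomescope symlinks, which point into that same directory.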

0_Xylecopa_micans/
├── config/
├── execute.slurm
├── LICENSE
├── .nextflow/
├── .nextflow.log
├── nextflow-Xylecopa_micans.log.txt
├── otb.sh*
├── RawHiC/
├── RawHiFI/
├── README.md
├── results/
├── reports/
├── run.nf
├── scr/
├── stderr.6595761.ceres19-compute-98
├── stdout.6595761.ceres19-compute-98
├── work/
└── Xylecopa_micans.nextflow.command.txt

This adds several files:

execute.slurm was added by the user to run otb on a compute node of the cluster instead of the head node, which is fairly typical for Nextflow pipelines. LICENSE is the license for otb. .nextflow.log is Nextflow's log file from running otb, and nextflow-Xylecopa_micans.log.txt is otb's own log. otb.sh is the script for running otb, and README.md holds information on running it. run.nf is called by otb.sh; it is the Nextflow code that otb runs. stderr.6595761.ceres19-compute-98 and stdout.6595761.ceres19-compute-98 are the standard error and standard output of this user's run.

Xylecopa_micans.nextflow.command.txt is the command otb used to call Nextflow. This is especially noteworthy when a Nextflow run fails: the same command can be rerun with the -resume flag appended to restart Nextflow after an error, reusing the results of tasks that already completed.
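The resume step described above can be sketched in Python. This assumes that Xylecopa_micans.nextflow.command.txt holds a single plain command line with no shell-only syntax such as pipes or redirection. It also assumes the command is rerun from the project directory, since Nextflow keeps its resume history in .nextflow/ there. Both helper names are hypothetical and not part of otb.

```python
import shlex
import subprocess
from pathlib import Path


def resume_command(command_text):
    """Split a saved Nextflow command line and append -resume if it is missing."""
    args = shlex.split(command_text.strip())
    if "-resume" not in args:
        args.append("-resume")
    return args


def rerun_with_resume(command_file):
    """Rerun the saved command with -resume and return Nextflow's exit code.

    Runs from the directory holding the command file (assumed to be the otb
    project root), so Nextflow can find its cache in .nextflow/.
    """
    command_file = Path(command_file)
    args = resume_command(command_file.read_text())
    print("Running:", shlex.join(args))
    return subprocess.run(args, cwd=command_file.parent).returncode


# Example (path assumed):
# rerun_with_resume("0_Xylecopa_micans/Xylecopa_micans.nextflow.command.txt")
```

Appending the flag only when it is absent means a command that already carries -resume is not given a duplicate flag.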

Here we see much the same, but with the otb files linked to the corresponding files in the otb repo:

0_Xylecopa_micans
├── config
├── execute.slurm
├── LICENSE
├── nextflow-Xylecopa_micans.log.txt
├── otb.sh
├── RawHiC
├── RawHiFI
├── README.md
├── results
├── run.nf
├── scr
├── stderr.6595761.ceres19-compute-98
├── stdout.6595761.ceres19-compute-98
├── work
└── Xylecopa_micans.nextflow.command.txt
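To check which entries in a project are links, and where they point, a small helper can list every symlink under a directory. The helper name `find_symlinks` is hypothetical and not part of otb. The same check applies to results/: in the first tree, the files in results/genomescope are links back into work/e8/22e87b3c09c5cd59a28b99cf21e31a.

```python
import os


def find_symlinks(root, recursive=False):
    """Return {path relative to root: link target} for symlinks under root.

    recursive=False checks only the top level, e.g. which otb files in a
    project are linked to the otb repo. recursive=True walks the whole tree
    (without following directory symlinks), which also shows results/ entries
    that point into work/.
    """
    links = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # readlink returns the target as stored, without resolving it
                links[os.path.relpath(path, root)] = os.readlink(path)
        if not recursive:
            break  # stop after the top level of root
    return links


# Example (path assumed): find_symlinks("0_Xylecopa_micans")
```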

Relevant pages: