docs: Document params with test data files (#64)
Several documentation fixes:

* In the case 1 to 3 example commands, use the test data input files for step 1 and step 2
* Add instructions to unzip the PacBio read dataset before running the example commands
* Fix misspelled params illiumina, mitocondrial, and falcon-unzip
* arrow01 is now falcon_unzip
* Add a warning note that hyphens or dashes may not copy correctly from the browser
* Fix the markdown table that was missing the final pipe

Update conda install instructions and yaml file

* bbtools breaks Nextflow java and the script we needed from it is contained in other dependencies.
* update conda install documentation to use the other_dependencies yaml for conda installs

---------

Co-authored-by: Andrew Severin <severin@iastate.edu>
Jennifer Chang and isugif committed Feb 2, 2023
1 parent 5a54bf9 commit 88624da
Showing 5 changed files with 69 additions and 43 deletions.
32 changes: 20 additions & 12 deletions docs/Case_01.md
@@ -24,25 +24,33 @@ We have listed some example files to test the pipeline based on Chromosome 30 Hz
 Case 1 will take primary assembly from the `FALCON/2-asm-falcon` folder.

 | Param | Files | Download link|
-|:--|:--|:--
-| --primary_assembly | "p_ctg.fasta" | [p_ctg.fasta](https://data.nal.usda.gov/system/files/p_ctg.fasta)|
-| --mitochondrial_assembly | "GCF_022581195.2_ilHelZeax1.1_mito.fa" | [GenBank download fasta](https://www.ncbi.nlm.nih.gov/nuccore/NC_061507.1?report=fasta)|
-| --illumina_reads |"testpolish_{R1,R2}.fq" | [testpolish_R1.fastq](https://data.nal.usda.gov/system/files/testpolish_R1.fastq), [testpolish_R2.fastq](https://data.nal.usda.gov/system/files/testpolish_R2.fastq) |
-| --pacbio_reads | "test.1.filtered.bam" | [test.1.filtered.bam_.gz](https://data.nal.usda.gov/system/files/test.1.filtered.bam_.gz)|
+|:--|:--|:--|
+| `--primary_assembly` | "p_ctg.fasta" | [p_ctg.fasta](https://data.nal.usda.gov/system/files/p_ctg.fasta)|
+| `--mitochondrial_assembly` | "GCF_022581195.2_ilHelZeax1.1_mito.fa" | [GenBank download fasta](https://www.ncbi.nlm.nih.gov/nuccore/NC_061507.1?report=fasta)|
+| `--illumina_reads` |"testpolish_{R1,R2}.fastq" | [testpolish_R1.fastq](https://data.nal.usda.gov/system/files/testpolish_R1.fastq), [testpolish_R2.fastq](https://data.nal.usda.gov/system/files/testpolish_R2.fastq) |
+| `--pacbio_reads` | "test.1.filtered.bam" | [test.1.filtered.bam_.gz](https://data.nal.usda.gov/system/files/test.1.filtered.bam_.gz)|

+**Note:** The PacBio Reads (`test.1.filtered.bam_.gz`) must be decompressed before running the pipeline.
+
+```
+gunzip -dc test.1.filtered.bam_.gz > test.1.filtered.bam
+```
+
 ### Recommended parameters

 ```
 nextflow run isugifNF/polishCLR -r main \
---primary_assembly "data/primary.fasta" \
---mitocondrial_assembly "data/mitochondrial.fasta" \
---illiumina_reads "data/illumina/*_{R1,R2}.fasta.bz" \
---pacbio_reads "data/pacbio/pacbio.subreads.bam" \
+--primary_assembly "p_ctg.fasta" \
+--mitochondrial_assembly "GCF_022581195.2_ilHelZeax1.1_mito.fa" \
+--illumina_reads "*_{R1,R2}.fastq" \
+--pacbio_reads "test.1.filtered.bam" \
 --step 1 \
---falcon-unzip false \
+--falcon_unzip false \
 -profile slurm
 ```

+**Note:** On some browsers, the dashes (-) and underscores (_) can be copied incorrectly. So if you run into an error that says `not valid in the pipeline` try manually retyping those parameters.
+
 Step 2 runs another round of Arrow polishing with the PacBio reads, then polishes with short-reads with two rounds of FreeBayes. We broke these two steps into seperate phases to allow for manual scaffolding.

 Provide the purged primary `primary_purged.fa` and alternate contigs `haps_purged.fa` from purge_dups, and mitochondrial genome `mitochondrial.fasta` as input to step 2.
@@ -56,8 +64,8 @@ Regardless don't forget to include parameter flags `--step 2` and `resume` to th
 --primary_assembly "primary_purged.fa" \
 --alternate_assembly "haps_purged.fa" \
 --mitochondrial_assembly "data/mitochondrial.fasta" \
---illumina_reads "data/illumina/*_{R1,R2}.fasta.bz" \
---pacbio_reads "../RawSequelData/m*.subreads.bam" \
+--illumina_reads "*_{R1,R2}.fastq" \
+--pacbio_reads "test.1.filtered.bam" \
 --step 2 \
 -profile slurm \
 -resume
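
The decompression note added to this file can be paired with an integrity check. The following is a minimal sketch and not part of the commit: it runs `gzip -t` on the archive before writing the BAM file with the same `gunzip -dc` command the docs use. The function name `decompress_reads` is hypothetical.

```shell
# Hypothetical helper (not part of polishCLR): test a gzip archive, then
# decompress it to a new file while keeping the compressed copy.
decompress_reads() {
  archive="$1"   # e.g. test.1.filtered.bam_.gz
  output="$2"    # e.g. test.1.filtered.bam
  # gzip -t exits non-zero if the archive is truncated or not gzip data.
  if ! gzip -t "$archive"; then
    echo "error: $archive failed the gzip integrity test" >&2
    return 1
  fi
  # Same command as in the docs: stream to stdout so the archive is kept.
  gunzip -dc "$archive" > "$output"
}
```

With the test data in the current directory, this would be called as `decompress_reads test.1.filtered.bam_.gz test.1.filtered.bam`. Testing first avoids leaving a partial BAM file behind when the download was interrupted.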
36 changes: 22 additions & 14 deletions docs/Case_02.md
@@ -20,28 +20,36 @@ We have listed some example files to test the pipeline based on Chromosome 30 Hz
 Case 2 will take primary assembly from the `FALCON/3-unzip` folder.

 | Param | Files | Download link|
-|:--|:--|:--
-| --primary_assembly | "all_p_ctg.fasta" | [all_p_ctg.fasta](https://data.nal.usda.gov/system/files/all_p_ctg.fasta)|
-| --alternate_assembly | "all_h_ctg.fasta" |[all_h_ctg.fasta](https://data.nal.usda.gov/system/files/all_h_ctg.fasta)|
-| --mitochondrial_assembly | "GCF_022581195.2_ilHelZeax1.1_mito.fa" | [GenBank download fasta](https://www.ncbi.nlm.nih.gov/nuccore/NC_061507.1?report=fasta)|
-| --illumina_reads |"testpolish_{R1,R2}.fq" | [testpolish_R1.fastq](https://data.nal.usda.gov/system/files/testpolish_R1.fastq), [testpolish_R2.fastq](https://data.nal.usda.gov/system/files/testpolish_R2.fastq) |
-| --pacbio_reads | "test.1.filtered.bam" | [test.1.filtered.bam_.gz](https://data.nal.usda.gov/system/files/test.1.filtered.bam_.gz)|
+|:--|:--|:--|
+| `--primary_assembly` | "all_p_ctg.fasta" | [all_p_ctg.fasta](https://data.nal.usda.gov/system/files/all_p_ctg.fasta)|
+| `--alternate_assembly` | "all_h_ctg.fasta" |[all_h_ctg.fasta](https://data.nal.usda.gov/system/files/all_h_ctg.fasta)|
+| `--mitochondrial_assembly` | "GCF_022581195.2_ilHelZeax1.1_mito.fa" | [GenBank download fasta](https://www.ncbi.nlm.nih.gov/nuccore/NC_061507.1?report=fasta)|
+| `--illumina_reads` |"testpolish_{R1,R2}.fastq" | [testpolish_R1.fastq](https://data.nal.usda.gov/system/files/testpolish_R1.fastq), [testpolish_R2.fastq](https://data.nal.usda.gov/system/files/testpolish_R2.fastq) |
+| `--pacbio_reads` | "test.1.filtered.bam" | [test.1.filtered.bam_.gz](https://data.nal.usda.gov/system/files/test.1.filtered.bam_.gz)|

+**Note:** The PacBio Reads (`test.1.filtered.bam_.gz`) must be decompressed before running the pipeline.
+
+```
+gunzip -dc test.1.filtered.bam_.gz > test.1.filtered.bam
+```
+
 ### Recommended parameters

 ```
 nextflow run isugifNF/polishCLR -r main \
---primary_assembly "data/primary.fasta" \
---alternate_assembly "data/alternate.fasta" \
---mitocondrial_assembly "data/mitochondrial.fasta" \
---illumina_reads "data/illumina/*_{R1,R2}.fasta.bz" \
---pacbio_reads "data/pacbio/pacbio.subreads.bam" \
+--primary_assembly "all_p_ctg.fasta" \
+--alternate_assembly "all_h_ctg.fasta" \
+--mitochondrial_assembly "GCF_022581195.2_ilHelZeax1.1_mito.fa" \
+--illumina_reads "*_{R1,R2}.fastq" \
+--pacbio_reads "test.1.filtered.bam" \
 --step 1 \
---arrow01 \
+--falcon_unzip true \
 -profile slurm \
 -resume
 ```

+**Note:** On some browsers, the dashes (-) and underscores (_) can be copied incorrectly. So if you run into an error that says `not valid in the pipeline` try manually retyping those parameters.
+
 Step 2 runs another round of Arrow polishing with the PacBio reads, then polishes with short-reads with two rounds of FreeBayes. We broke these two steps into seperate phases to allow for manual scaffolding.

 Provide the purged primary `primary_purged.fa` and alternate contigs `haps_purged.fa` from purge_dups, and mitochondrial genome `mitochondrial.fasta` as input to step 2.
@@ -55,8 +63,8 @@ Regardless don't forget to include parameter flags `--step 2` and `resume` to th
 --primary_assembly "primary_purged.fa" \
 --alternate_assembly "haps_purged.fa" \
 --mitochondrial_assembly "data/mitochondrial.fasta" \
---illumina_reads "data/illumina/*_{R1,R2}.fasta.bz" \
---pacbio_reads "../RawSequelData/m*.subreads.bam" \
+--illumina_reads "*_{R1,R2}.fastq" \
+--pacbio_reads "test.1.filtered.bam" \
 --step 2 \
 -profile slurm \
 -resume
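
The browser copy-and-paste warning added in this file can also be checked mechanically. Below is a minimal sketch, assuming the pasted command has been saved to a text file; `find_non_ascii` is a hypothetical helper and not a polishCLR feature. It prints, with line numbers, every line that contains a byte outside printable ASCII, which is how an en dash or em dash substituted for `--` would show up.

```shell
# Hypothetical helper: list lines of a saved command that contain
# non-ASCII bytes, such as a typographic dash pasted in place of "--".
find_non_ascii() {
  # LC_ALL=C makes grep work on single bytes; any byte that is neither
  # printable ASCII nor whitespace is reported with its line number.
  LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}
```

For example, `find_non_ascii run_step1.sh` (file name chosen for illustration) prints nothing and returns a non-zero status when the file is clean, so the lines it does print are the ones worth retyping by hand.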
36 changes: 23 additions & 13 deletions docs/Case_03.md
@@ -20,27 +20,35 @@ We have listed some example files to test the pipeline based on Chromosome 30 Hz
 Case 3 will take primary assembly from the `FALCON/4-polish` folder.

 | Param | Files | Download link|
-|:--|:--|:--
-| --primary_assembly | "cns_p_ctg.fasta" | [cns_p_ctg.fasta](https://data.nal.usda.gov/system/files/cns_p_ctg.fasta) |
-| --alternate_assembly | "cns_h_ctg.fasta" | [cns_h_ctg.fasta](https://data.nal.usda.gov/system/files/cns_h_ctg.fasta)|
-| --mitochondrial_assembly | "GCF_022581195.2_ilHelZeax1.1_mito.fa" | [GenBank download fasta](https://www.ncbi.nlm.nih.gov/nuccore/NC_061507.1?report=fasta)|
-| --illumina_reads |"testpolish_{R1,R2}.fq" | [testpolish_R1.fastq](https://data.nal.usda.gov/system/files/testpolish_R1.fastq), [testpolish_R2.fastq](https://data.nal.usda.gov/system/files/testpolish_R2.fastq) |
-| --pacbio_reads | "test.1.filtered.bam" | [test.1.filtered.bam_.gz](https://data.nal.usda.gov/system/files/test.1.filtered.bam_.gz)|
+|:--|:--|:--|
+| `--primary_assembly` | "cns_p_ctg.fasta" | [cns_p_ctg.fasta](https://data.nal.usda.gov/system/files/cns_p_ctg.fasta) |
+| `--alternate_assembly` | "cns_h_ctg.fasta" | [cns_h_ctg.fasta](https://data.nal.usda.gov/system/files/cns_h_ctg.fasta)|
+| `--mitochondrial_assembly` | "GCF_022581195.2_ilHelZeax1.1_mito.fa" | [GenBank download fasta](https://www.ncbi.nlm.nih.gov/nuccore/NC_061507.1?report=fasta)|
+| `--illumina_reads` |"testpolish_{R1,R2}.fastq" | [testpolish_R1.fastq](https://data.nal.usda.gov/system/files/testpolish_R1.fastq), [testpolish_R2.fastq](https://data.nal.usda.gov/system/files/testpolish_R2.fastq) |
+| `--pacbio_reads` | "test.1.filtered.bam" | [test.1.filtered.bam_.gz](https://data.nal.usda.gov/system/files/test.1.filtered.bam_.gz)|

+**Note:** The PacBio Reads (`test.1.filtered.bam_.gz`) must be decompressed before running the pipeline.
+
+```
+gunzip -dc test.1.filtered.bam_.gz > test.1.filtered.bam
+```
+
 ### Recommended parameters

 ```
 nextflow run isugifNF/polishCLR -r main \
---primary_assembly "data/primary.fasta" \
---alternate_assembly "data/alternate.fasta" \
---mitocondrial_assembly "data/mitochondrial.fasta" \
---illumina_reads "data/illumina/*_{R1,R2}.fasta.bz" \
---pacbio_reads "data/pacbio/pacbio.subreads.bam" \
+--primary_assembly "cns_p_ctg.fasta" \
+--alternate_assembly "cns_h_ctg.fasta" \
+--mitochondrial_assembly "GCF_022581195.2_ilHelZeax1.1_mito.fa" \
+--illumina_reads "*_{R1,R2}.fastq" \
+--pacbio_reads "test.1.filtered.bam" \
 --step 1 \
 -profile slurm \
 -resume
 ```

+**Note:** On some browsers, the dashes (-) and underscores (_) can be copied incorrectly. So if you run into an error that says `not valid in the pipeline` try manually retyping those parameters.
+
 Step 2 runs another round of Arrow polishing with the PacBio reads, then polishes with short-reads with two rounds of FreeBayes. We broke these two steps into seperate phases to allow for manual scaffolding.

 Provide the purged primary `primary_purged.fa` and alternate contigs `haps_purged.fa` from purge_dups, and mitochondrial genome `mitochondrial.fasta` as input to step 2.
@@ -54,9 +62,11 @@ Regardless don't forget to include parameter flags `--step 2` and `resume` to th
 --primary_assembly "primary_purged.fa" \
 --alternate_assembly "haps_purged.fa" \
 --mitochondrial_assembly "data/mitochondrial.fasta" \
---illumina_reads "data/illumina/*_{R1,R2}.fasta.bz" \
---pacbio_reads "../RawSequelData/m*.subreads.bam" \
+--illumina_reads "*_{R1,R2}.fastq" \
+--pacbio_reads "test.1.filtered.bam" \
 --step 2 \
 -profile slurm \
 -resume
 ```
+
+
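
The `--illumina_reads "*_{R1,R2}.fastq"` pattern in the updated commands matches files by their `_R1`/`_R2` suffix, so each forward-read file presumably needs a mate with the same prefix. The sketch below is not part of the pipeline; `check_read_pairs` is a hypothetical name, and the `_R1.fastq`/`_R2.fastq` suffixes are taken from the test data. It checks a directory for unpaired files before a run is launched.

```shell
# Hypothetical pre-flight check: every *_R1.fastq in a directory should
# have a matching *_R2.fastq with the same prefix.
check_read_pairs() {
  dir="${1:-.}"
  status=0
  found=0
  for r1 in "$dir"/*_R1.fastq; do
    # When the glob matches nothing the literal pattern is returned; skip it.
    [ -e "$r1" ] || continue
    found=1
    r2="${r1%_R1.fastq}_R2.fastq"
    if [ ! -e "$r2" ]; then
      echo "missing mate for $r1 (expected $r2)" >&2
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no *_R1.fastq files found in $dir" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_read_pairs .` in the directory holding `testpolish_R1.fastq` and `testpolish_R2.fastq` should return success; a non-zero status means a mate is missing or no reads were found.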
4 changes: 2 additions & 2 deletions docs/Install.md
@@ -22,10 +22,10 @@ nextflow pull isugifNF/polishCLR -r main
 Install dependencies in a [miniconda](https://docs.conda.io/en/latest/miniconda.html) environment.

 ```
-wget https://raw.githubusercontent.com/isugifNF/polishCLR/main/environment.yml
+wget https://raw.githubusercontent.com/isugifNF/polishCLR/main/other_dependencies.yml
 [[ -d env ]] || mkdir env
-conda env create -f environment.yml -p ${PWD}/env/polishCLR_env
+conda env create -f other_dependencies.yml -p ${PWD}/env/polishCLR_env
 conda activate env/polishCLR_env
4 changes: 2 additions & 2 deletions other_dependencies.yml
@@ -16,11 +16,11 @@ dependencies:
 - samtools
 - bwa-mem2
 - pigz
-- bbtools
+#- bbtools
 - purge_dups
 - bamtools
 - merfin
 #- openjdk=11.0.15
-#- nextflow
+- nextflow
 - busco=5.4.2
