Improve BWA index workflow #596

Closed
jfy133 opened this issue Oct 27, 2020 · 0 comments · Fixed by #594
Labels: enhancement (New feature or request), partially-fixed, pending (Addressed on branch waiting for related PR)
jfy133 commented Oct 27, 2020

Currently, we unzip the FASTA file by default:

eager/main.nf

Lines 280 to 300 in f2a326d

if("${params.fasta}".endsWith(".gz")){
  process unzip_reference{
    tag "${zipped_fasta}"

    input:
    path zipped_fasta from file(params.fasta) // path doesn't like it if a string of an object is not prefaced with a root dir (/), so use file() to resolve string before parsing to `path`

    output:
    path "$unzip" into ch_fasta into ch_fasta_for_bwaindex,ch_fasta_for_bt2index,ch_fasta_for_faidx,ch_fasta_for_seqdict,ch_fasta_for_circulargenerator,ch_fasta_for_circularmapper,ch_fasta_for_damageprofiler,ch_fasta_for_qualimap,ch_fasta_for_pmdtools,ch_fasta_for_genotyping_ug,ch_fasta_for_genotyping_hc,ch_fasta_for_genotyping_freebayes,ch_fasta_for_genotyping_pileupcaller,ch_fasta_for_vcf2genome,ch_fasta_for_multivcfanalyzer,ch_fasta_for_genotyping_angsd

    script:
    unzip = zipped_fasta.toString() - '.gz'
    """
    pigz -f -d -p ${task.cpus} $zipped_fasta
    """
  }
} else {
  fasta_for_indexing = Channel
    .fromPath("${params.fasta}", checkIfExists: true)
    .into{ ch_fasta_for_bwaindex; ch_fasta_for_bt2index; ch_fasta_for_faidx; ch_fasta_for_seqdict; ch_fasta_for_circulargenerator; ch_fasta_for_circularmapper; ch_fasta_for_damageprofiler; ch_fasta_for_qualimap; ch_fasta_for_pmdtools; ch_fasta_for_genotyping_ug; ch_fasta__for_genotyping_hc; ch_fasta_for_genotyping_hc; ch_fasta_for_genotyping_freebayes; ch_fasta_for_genotyping_pileupcaller; ch_fasta_for_vcf2genome; ch_fasta_for_multivcfanalyzer;ch_fasta_for_genotyping_angsd }
}

However, both bwa index and picard CreateSequenceDictionary do allow indexing of gzipped files. This can be confusing for people who have already indexed their genomes with BWA while gzipped: the supplied indices carry the .gz suffix, but the nf-core/eager bwa mapping command uses the unzipped FASTA (which no longer has .gz), so the two get out of sync.
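
To make the mismatch concrete, here is a minimal Python sketch (not pipeline code). It relies on the fact that bwa index writes its files as the FASTA name plus .amb, .ann, .bwt, .pac and .sa. The file name genome.fasta.gz and both helper functions are hypothetical, and strip_gz only approximates the `zipped_fasta.toString() - '.gz'` step above.

```python
from typing import List

# Suffixes that `bwa index` appends to the FASTA file name it was given.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_bwa_index_files(fasta_name: str) -> List[str]:
    """Index file names that belong to a FASTA with this name (hypothetical helper)."""
    return [fasta_name + suffix for suffix in BWA_INDEX_SUFFIXES]


def strip_gz(fasta_name: str) -> str:
    """Approximate the pipeline's renaming of the unzipped FASTA (hypothetical helper)."""
    return fasta_name[:-3] if fasta_name.endswith(".gz") else fasta_name


# Assumption: the user indexed the gzipped file, so the indices carry '.gz'.
supplied = expected_bwa_index_files("genome.fasta.gz")

# After unzipping, mapping uses 'genome.fasta' and looks for these names instead.
wanted = expected_bwa_index_files(strip_gz("genome.fasta.gz"))

missing = [name for name in wanted if name not in supplied]
print("supplied:", supplied)
print("wanted:  ", wanted)
print("missing: ", missing)
```

In this scenario every wanted index name is missing from the supplied set, which is the out-of-sync state described above.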

It would make more intuitive sense to skip unzipping altogether unless we are running samtools faidx. However, we need to check whether this affects downstream analysis.
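
The proposed rule could be sketched as follows. This is an illustration in Python rather than Nextflow; the function name and the run_faidx flag are hypothetical, and it deliberately ignores the open question of whether other downstream tools accept gzipped input.

```python
def needs_unzipping(fasta_name: str, run_faidx: bool) -> bool:
    """Decide whether to decompress the reference (hypothetical helper).

    Proposal from this issue: only unzip a gzipped FASTA when samtools faidx
    has to run on it; otherwise pass the file through unchanged.
    """
    is_gzipped = fasta_name.endswith(".gz")
    return is_gzipped and run_faidx


# Gzipped reference, faidx index already supplied: keep the file as it is.
print(needs_unzipping("genome.fasta.gz", run_faidx=False))
# Gzipped reference, faidx still has to run: unzip first.
print(needs_unzipping("genome.fasta.gz", run_faidx=True))
```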

In the meantime we should document this properly.

Originally reported by @marcel-keller

jfy133 added the enhancement (New feature or request) label Oct 27, 2020
jfy133 added a commit that referenced this issue Oct 27, 2020
jfy133 added the pending (Addressed on branch waiting for related PR) and partially-fixed labels Oct 27, 2020
jfy133 added this to the 2.2.1 milestone Oct 27, 2020
jfy133 mentioned this issue Oct 27, 2020