Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save transcriptome BAM files when using --save_umi_intermeds / --save_align_intermeds #931

Closed
pcantalupo opened this issue Jan 19, 2023 · 3 comments
Labels
bug Something isn't working
Milestone

Comments

@pcantalupo
Copy link
Contributor

Description of the bug

I’m using v3.9 with --save_umi_intermeds and --save_align_intermeds to save all BAM files. After pipeline finishes, I find these BAM files in results

$ find results -type f -name '*bam' | cut -f 3 -d/ | sort
22-0283.Aligned.out.bam
22-0283.Aligned.toTranscriptome.out.bam
22-0283.markdup.sorted.bam
22-0283.sorted.bam
22-0283.umi_dedup.sorted.bam

However, SALMON_QUANT uses the BAM file 22-0283.umi_dedup.transcriptome.filtered.bam (see below). Shouldn’t it be copied to results/ when using those intermeds parameters?

$ cat work/86/65912548cbefe7d020967895bb8837/.command.sh
#!/bin/bash -ue
salmon quant \
    --geneMap Mus_musculus.GRCm39.105.gtf \
    --threads 6 \
    --libType=A \
    -t genome.transcripts.fa \
    -a 22-0283.umi_dedup.transcriptome.filtered.bam \   <========= BAM file not saved to `results/` 
    -o 22-0283

Additionally, I found 4 bam files in work/ that are not copied to results. Is this a design choice even when using the intermeds parameters?

$ find work -type f -name '*bam' | cut -f 4 -d/ | sort
22-0283.Aligned.out.bam
22-0283.Aligned.toTranscriptome.out.bam
22-0283.markdup.sorted.bam
22-0283.sorted.bam
22-0283.transcriptome.sorted.bam                    <=== NEW
22-0283.umi_dedup.sorted.bam
22-0283.umi_dedup.transcriptome.bam              <=== NEW
22-0283.umi_dedup.transcriptome.filtered.bam   <=== NEW
22-0283.umi_dedup.transcriptome.sorted.bam   <=== NEW

Thank you

Command used and terminal output

nextflow run /ihome/crc/install/genomics_nextflow/nf-core-rnaseq-3.9/workflow \
 --fasta /bgfs/projects/refs/ENSEMBL/mouse/Mus_musculus.GRCm39.dna.primary_assembly.fa \
 --gtf /bgfs/projects/refs/ENSEMBL/mouse/Mus_musculus.GRCm39.105.gtf \
 --save_reference \
 --save_trimmed --save_umi_intermeds --save_align_intermeds \
 --pseudo_aligner salmon --salmon_quant_libtype A \
 --with_umi \
 --umitools_extract_method "regex" \
 --umitools_bc_pattern2 "^(?P<umi_1>.{8})(?P<discard_1>.{6}).*" \
 --input samplesheet.csv --outdir results \
 -profile htc \
 -resume

Relevant files

No response

System information

pipeline version: 3.9
nextflow 22.04.3
singularity 3.9.6
slurm on HPC
Linux

@pcantalupo pcantalupo added the bug Something isn't working label Jan 19, 2023
@drpatelh drpatelh added this to the 3.11 milestone Mar 5, 2023
@drpatelh drpatelh changed the title The BAM file used for Salmon quantitation is not saved when using the various intermeds parameters Save transcriptome BAM files when using --save_umi_intermeds / --save_align_intermeds Mar 16, 2023
drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue Mar 16, 2023
@drpatelh
Copy link
Member

Fixed in 07ddd71

Hi @pcantalupo ! This was intentional behaviour as I didn't see the point in publishing "too" many BAM files, especially since this is configurable now via a custom config. However, I have changed the default behaviour now to publish everything just in case these files are actually useful to debug stuff. I think I got all of them so please have a check after the next release and lemme know if I missed anything. (when using --save_umi_intermeds --save_align_intermeds --with_umi)

$ find ./ -type f -name "*.bam" | sort | grep "WT_REP1"
./star_salmon/WT_REP1.Aligned.out.bam
./star_salmon/WT_REP1.Aligned.toTranscriptome.out.bam
./star_salmon/WT_REP1.sorted.bam
./star_salmon/WT_REP1.transcriptome.sorted.bam
./star_salmon/WT_REP1.umi_dedup.sorted.bam
./star_salmon/WT_REP1.umi_dedup.transcriptome.bam
./star_salmon/WT_REP1.umi_dedup.transcriptome.filtered.bam
./star_salmon/WT_REP1.umi_dedup.transcriptome.sorted.bam

$ find ./ -type f -name "*.bai" | sort | grep "WT_REP1"
./star_salmon/WT_REP1.sorted.bam.bai
./star_salmon/WT_REP1.transcriptome.sorted.bam.bai
./star_salmon/WT_REP1.umi_dedup.sorted.bam.bai
./star_salmon/WT_REP1.umi_dedup.transcriptome.sorted.bam.bai

drpatelh added a commit that referenced this issue Mar 16, 2023
@pcantalupo
Copy link
Contributor Author

I checked release 3.11.1 and it looks good. Thank you

@drpatelh
Copy link
Member

Great! Thanks for confirming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants