BAM rename flags #2 #92

Closed
apeltzer opened this issue Nov 20, 2018 · 5 comments


@apeltzer
Member

Hm, ok. If you want to keep it the way it is then we need to consider changing the description slightly

  --bam_discard_unmapped        Discard an unmapped read file, depending on choice in --

Removing references to bam or fastq in the description makes it clearer you are not trying to actually define the file type in this flag.

That said, I still don't think this makes complete sense, and it seems unnecessarily complicated.

In principle I think it makes it simpler to just have a single: --bam_discard_unmapped_bam.

Use cases would be, assuming someone wants the unmapped reads:

1) Does the user want unmapped reads in BAM format only? Use --bam_separate_unmapped.
2) Does the user want unmapped reads in both BAM and FASTQ? Do the above, plus --bam_unmapped_to_fastq.
3) Does the user want unmapped reads in FASTQ format only? Do both 1) and 2), plus --bam_unmapped_discard_bam.
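
To make the three cases concrete, here is a rough sketch of how the flags would combine, assuming each one is a plain on/off switch. The function name `unmapped_outputs` and its 0/1 arguments are made up for illustration and are not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the three proposed switches as 0/1 values,
# print which unmapped-read files would be kept.
#   $1 = --bam_separate_unmapped
#   $2 = --bam_unmapped_to_fastq
#   $3 = --bam_unmapped_discard_bam
unmapped_outputs() {
    local separate=$1 to_fastq=$2 discard_bam=$3
    if [ "$separate" -ne 1 ]; then
        echo "none"          # unmapped reads are not separated at all
    elif [ "$to_fastq" -ne 1 ]; then
        echo "bam"           # use case 1: BAM only
    elif [ "$discard_bam" -ne 1 ]; then
        echo "bam fastq"     # use case 2: BAM and FASTQ
    else
        echo "fastq"         # use case 3: FASTQ only
    fi
}

unmapped_outputs 1 0 0
unmapped_outputs 1 1 0
unmapped_outputs 1 1 1
```

In this sketch each later flag only has an effect when the earlier ones are set, which is why the FASTQ-only case needs all three.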

I think this would also work programmatically. The current system in this commit sends mixed messages, I think: one flag says you want to discard something, but then an entirely separate flag says you also want to discard something and additionally specifies which one. The meanings of the two flags overlap.

Does this make sense? Or do you disagree?

@jfy133
Member

jfy133 commented Nov 20, 2018

Your response on #81:

No I honestly agree, my statement was just short because I'm jumping from meeting to meeting atm :-D

would then require specifying three parameters however, which is a bit much but I see no possibility to add that without these :-)


Indeed. The only other system we could use would be condensing those three use cases described above into a single flag with multiple options.

So

--bam_separate_unmapped <bam/fastq.gz/both> Separate unmapped reads into a separate file. Specify the file format you wish the unmapped reads to be stored in.

Would that be programmatically more complicated, based on the way you're currently structuring the code?

@apeltzer
Member Author

Need to wrap my mind around this first :-)

@jfy133
Member

jfy133 commented Nov 20, 2018

Uhh, my explanation might not have been great, and I also got slightly muddled. What about a really simple case:

--bam_unmapped_reads      What to do with unmapped reads. Options are: "keep", "discard", "bam", "fastq", "both". "keep" retains unmapped reads in the BAM file (default). "discard" removes unmapped reads from the BAM file. "bam" puts unmapped reads in a separate BAM file. "fastq" puts unmapped reads in a fastq.gz file. "both" puts unmapped reads in both a separate BAM file and a fastq.gz file.
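
As a rough illustration of the single-flag idea, the value could be checked against the five options before anything else runs. This is only a sketch: the function name `validate_unmapped_mode` is hypothetical, and it assumes a missing value should fall back to "keep".

```shell
#!/usr/bin/env bash
# Hypothetical validator for the value given to --bam_unmapped_reads.
# Prints the accepted mode, falling back to "keep" when nothing was given,
# and returns non-zero for anything outside the five listed options.
validate_unmapped_mode() {
    local mode=${1:-keep}
    case "$mode" in
        keep|discard|bam|fastq|both)
            echo "$mode"
            ;;
        *)
            echo "Invalid --bam_unmapped_reads value: '$mode'" >&2
            return 1
            ;;
    esac
}

validate_unmapped_mode          # no value given, so the default applies
validate_unmapped_mode fastq
validate_unmapped_mode sam || echo "rejected"
```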

In pseudocode (if it helps):

Assuming we have as start

bwa samse -r "@RG\\tID:ILLUMINA-${prefix}\\tSM:${prefix}\\tPL:illumina" $fasta "${reads.baseName}".sai $reads | samtools sort -@ ${task.cpus} -O bam - > "${prefix}".sorted.bam
    samtools index "${prefix}".sorted.bam

then we do the following:

# unmapped_reads is a placeholder for the value passed to --bam_unmapped_reads
if [ -z "${unmapped_reads}" ] || [ "${unmapped_reads}" = "keep" ]; then
    # do nothing to the BAM and proceed with samtools idxstats, dedup, etc.
    :

elif [ "${unmapped_reads}" = "discard" ]; then
    samtools view -b -F 4 "${prefix}".sorted.bam > "${prefix}".sorted.mapped.bam

elif [ "${unmapped_reads}" = "bam" ]; then
    samtools view -b -F 4 "${prefix}".sorted.bam > "${prefix}".sorted.mapped.bam
    samtools view -b -f 4 "${prefix}".sorted.bam > "${prefix}".sorted.unmapped.bam

elif [ "${unmapped_reads}" = "fastq" ]; then
    samtools view -b -F 4 "${prefix}".sorted.bam > "${prefix}".sorted.mapped.bam
    samtools view -b -f 4 "${prefix}".sorted.bam | samtools fastq - | gzip > "${prefix}".sorted.unmapped.fastq.gz

elif [ "${unmapped_reads}" = "both" ]; then
    samtools view -b -F 4 "${prefix}".sorted.bam > "${prefix}".sorted.mapped.bam
    # write the unmapped BAM first, then convert only those reads to FASTQ
    samtools view -b -f 4 "${prefix}".sorted.bam > "${prefix}".sorted.unmapped.bam
    samtools fastq "${prefix}".sorted.unmapped.bam | gzip > "${prefix}".sorted.unmapped.fastq.gz

else
    echo "uh oh!"
fi
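
To summarise what each option would leave on disk, here is a small sketch that lists the expected files per mode without running samtools. The function name `expected_outputs` is hypothetical; the file names follow the naming used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the BAM/FASTQ files each mode would produce,
# one per line, for a given sample prefix. Nothing is executed.
expected_outputs() {
    local prefix=$1 mode=${2:-keep}
    case "$mode" in
        keep)
            echo "${prefix}.sorted.bam" ;;
        discard)
            echo "${prefix}.sorted.mapped.bam" ;;
        bam)
            printf '%s\n' "${prefix}.sorted.mapped.bam" \
                          "${prefix}.sorted.unmapped.bam" ;;
        fastq)
            printf '%s\n' "${prefix}.sorted.mapped.bam" \
                          "${prefix}.sorted.unmapped.fastq.gz" ;;
        both)
            printf '%s\n' "${prefix}.sorted.mapped.bam" \
                          "${prefix}.sorted.unmapped.bam" \
                          "${prefix}.sorted.unmapped.fastq.gz" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}

expected_outputs sample both
```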

@apeltzer
Member Author

That should work and is at least explanatory to everyone! I might tune it a bit, though I think we can do it in the same way. Love the one option controls the behaviour thing!

@apeltzer
Member Author

apeltzer commented Dec 9, 2018

Ok, implemented this in the latest PR :-)

@apeltzer apeltzer closed this as completed Dec 9, 2018