Hi,
there are many tools (bbmerge.sh from BBMap, NGmerge, VSEARCH, USEARCH, ...) which can look for an overlap between the members of a FASTA/Q read pair, but is there a tool which can take pair members already mapped to their common reference and just merge them, possibly inserting the needed Ns as a linker of the proper length in between? I still have the original .fastq.gz files around, so it could maybe just act on a position-sorted SAM/BAM, walk through the synced read pairs in the R1 and R2 files and merge them accordingly (replacing Ns with a nucleotide from the other mate, and possibly preferring the base with the higher QUAL)?
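To make the request concrete, here is a minimal Python sketch of the merge described above. It is not taken from any existing tool: the function name `merge_mapped_pair` and its parameters are hypothetical, and it assumes ungapped alignments (no indels or soft clipping), 0-based leftmost reference positions, Phred+33 quality strings, and sequences already in reference orientation as stored in SAM.

```python
def merge_mapped_pair(pos1, seq1, qual1, pos2, seq2, qual2,
                      filler="N", filler_qual="!"):
    """Merge two mates mapped to the same reference (hypothetical helper).

    Assumes ungapped alignments; pos1/pos2 are 0-based leftmost positions.
    """
    # Order the mates by their leftmost reference position.
    mates = sorted([(pos1, seq1, qual1), (pos2, seq2, qual2)],
                   key=lambda m: m[0])
    start = mates[0][0]
    end = max(pos + len(seq) for pos, seq, _ in mates)

    out_seq, out_qual = [], []
    for ref_pos in range(start, end):
        # Collect the (base, quality) calls of every mate covering this position.
        covering = [(seq[ref_pos - pos], qual[ref_pos - pos])
                    for pos, seq, qual in mates
                    if 0 <= ref_pos - pos < len(seq)]
        # Prefer real base calls over N, then the higher quality.
        called = [c for c in covering if c[0] != "N"]
        if called:
            base, q = max(called, key=lambda c: c[1])
        elif covering:
            base, q = covering[0]
        else:
            # Gap between the mates: insert the linker character.
            base, q = filler, filler_qual
        out_seq.append(base)
        out_qual.append(q)
    return "".join(out_seq), "".join(out_qual)


# Two mates separated by a 2 bp gap on the reference.
print(merge_mapped_pair(0, "ACGT", "IIII", 6, "GGCC", "IIII"))
```

Real alignments would need the CIGAR string to be honoured before anything like this could be applied.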
The intention of samtools fastq is very much to reverse the alignment process, i.e. to take an aligned file and reproduce something akin to the original instrument output so it can then be realigned with other tools and/or parameters. This would be quite a departure from that goal.
So no, it cannot do this currently. I'll have to take a look at what other tools do, but this does feel a bit out of the normal remit for samtools fastq.
Edit: it's also a complex thing to do with many weird corner cases.
E.g. what to do about alignments that are mapped incorrectly, on the same strand when they should differ? Or where the insert size is actually negative because the reads point away from each other instead of towards one another? What about other sequencing strategies, like 454's approach where they circularised the fragment and sequenced over an adapter, and the read was then split into two in software? Would we need to add the adapter back? Also, what about pairs mapped to different chromosomes, or to the same chromosome but megabases apart? What do we do about singletons where only one read has been found?
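As a rough illustration of how many of these cases a merger would have to detect before it could do anything, here is a Python sketch. Nothing here is samtools code: the `Mate` class, the `classify_pair` function, the category names and the `max_span` threshold of 1000 bp are all hypothetical, and the outward-facing test is a simple heuristic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mate:
    ref: str        # reference (chromosome) name
    start: int      # 0-based leftmost aligned position
    length: int     # aligned length on the reference
    reverse: bool   # True if mapped to the reverse strand


def classify_pair(a: Optional[Mate], b: Optional[Mate],
                  max_span: int = 1000) -> str:
    """Label a read pair with the first corner case it hits (hypothetical)."""
    if a is None or b is None:
        return "singleton"
    if a.ref != b.ref:
        return "different_reference"
    if a.reverse == b.reverse:
        return "same_strand"
    fwd, rev = (a, b) if not a.reverse else (b, a)
    # Heuristic: the reverse mate starting upstream of the forward mate
    # means the reads point away from each other.
    if rev.start < fwd.start:
        return "outward_facing"
    # Span of the implied fragment, from forward start to reverse end.
    if rev.start + rev.length - fwd.start > max_span:
        return "distant"
    return "mergeable"


print(classify_pair(Mate("chr1", 100, 50, False), Mate("chr1", 200, 50, True)))
```

Only the last category could be merged without further policy decisions; every other label needs an answer to one of the questions above.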
Even implementing it efficiently is non-trivial (unless it's name collated) if we want to deal with distant read-pairs.
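To sketch why input order matters: on position-sorted input, each read has to be held in memory until its mate turns up, so the buffer grows with the distance between mates. The Python below is only an illustration of that idea, with a hypothetical `pair_by_name` generator operating on plain `(name, record)` tuples rather than real BAM records.

```python
def pair_by_name(records):
    """Yield (name, first, second) for mates met in arbitrary order.

    `records` is an iterable of (read_name, record) tuples. Reads whose
    mate never appears are yielded at the end with None as the mate.
    """
    pending = {}  # read name -> record still waiting for its mate
    for name, record in records:
        mate = pending.pop(name, None)
        if mate is None:
            # First time we see this name: park it until the mate arrives.
            pending[name] = record
        else:
            yield name, mate, record
    # Whatever is left has no mate in the input (singletons).
    for name, record in pending.items():
        yield name, record, None


stream = [("r1", "a"), ("r2", "b"), ("r1", "c"), ("r3", "d"), ("r2", "e")]
for pair in pair_by_name(stream):
    print(pair)
```

With name-collated input the mates are adjacent, so `pending` never holds more than one record; with distant pairs on sorted input it can hold a large share of the file.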
It sounds like there are so many potential pitfalls and open questions that this would be complex to implement and a substantial piece of work. I question the need for us to do this unless there are multiple groups wanting it and the existing tools out there don't already fulfill the requirements.
https://sourceforge.net/projects/bbmap/
https://drive5.com/usearch/manual/merge_pair.html
https://github.com/jsh58/NGmerge
https://cme.h-its.org/exelixis/web/software/pear/
https://gitlab.com/german.tischler/biobambam2/-/blob/master/src/programs/bamtofastq.1
Thank you,