Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

pcr.seqs removes start position, keeps end position #348

Closed
rrohwer opened this Issue Jun 27, 2017 · 0 comments

Comments

Projects
None yet
2 participants

rrohwer commented Jun 27, 2017

In the pcr.seqs command, if you specify the start and end position the resulting sequences do not include the start position but do include the end position. This inconsistent behavior causes a bug when following the directions for trimming a reference alignment (Pat's blog post: http://blog.mothur.org/2016/07/07/Customization-for-your-region/ ) where the reference alignment does not include the first bp of your region and therefore you will lose the first bp of your sequences because they get trimmed off after aligning. This is confusing, so here's an example:

trim the reference alignment to the primer region as outlined in blog post

# do this for keepprimer=T and keepprimer=F in the first command
mothur "#pcr.seqs(fasta=ecoli_16S.fasta, oligos=$oligosfile, keepprimer=T, keepdots=T)"
mothur "#align.seqs(fasta=ecoli_16S.pcr.fasta, reference=$refalign)"
mothur "#summary.seqs(fasta=ecoli_16S.pcr.align)"
startnum=`awk '{print $2}' ecoli_16S.pcr.summary | sed '2q;d'`
endnum=`awk '{print $3}' ecoli_16S.pcr.summary | sed '2q;d'`
mothur "#pcr.seqs(fasta=$refalign, start=$startnum, end=$endnum, keepdots=F, nomatch=reject, pdiffs=0, rdiffs=0)" 

manually align the beginning of the resulting sequences after removing dashes (I truncated seqs for readability and chose an example silva sequence from the trimmed reference alignment)
the beginning base pair is missing after calling pcr.seqs(start= ,end=)

CCTACGGGNGGCWGCAG                                       --> forward primer
CCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGC --> trimmed e-coli with keepprimer=T
                 TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGC --> trimmed e-coli with keepprimer=F
 CTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGC --> trimmed silva alignment keepprimer=T
                 .GGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGC --> trimmed silva alignment keepprimer=F

however the end of the trimmed sequence is not similarly offset

                                   ATTAGAWACCCBDGTAGTCC <-- reverse compliment of reverse primer
GACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCC <-- trimmed e-coli with keepprimer=T
GACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGG                     <-- trimmed e-coli with keepprimer=F
GACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCC <-- trimmed silva alignment keepprimer=T
GACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGG                     <-- trimmed silva alignment keepprimer=F

So as you can see pcr.seqs with the oligos= option is behaving as expected, but with the start= and end= option is behaving inconsistently where start is not included and end is. You can narrow it down to this use of the command because the e-coli sequence was trimmed correctly but the silva sequence is missing the first bp with or without the primer included.

A temporary work-around is to use keepprimer=T when creating the reference alignment so that the full sequence region is included in it.

Thanks as always!

Robin

@mothur-westcott mothur-westcott added this to the Version 1.40.0 milestone Jun 28, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment