If no arguments for either threshold or number of ambiguous bases is given, varVAMP automatically estimates values so that roughly 50 % of the alignment can be considered for the primer design. If both values are not supllied, varVAMP allows 2 ambiguous nucleotides within the primer and optimizes the threshold. For the qPCR mode, varVAMP chooses 1 ambiguous base less in the probe compared to the primers if no value is given. This ensures a higher probe specificity and lower degeneracy than that of the primers.
The alignment preprocessing contains three steps.
- Convert RNA to DNA
- Force nucleotides to lower charaters
- Clean gaps. This is the most important part. Larger insertions in single viral sequences enlarge the length of the alignment and therefore the length of amplicons will be overestimated. varVAMP masks regions it deletes with an "N" and knows now that it is not allowed to design primers spanning the potential insertion site.
varVAMP creates a consensus sequence based on a given threshold. If a nucleotide occurs with a frequency equal to or higher than the threshold in all aligned sequences, it is considered a consensus nucleotide. If not, the frequencies of the highest scoring nucleotides are added until the threshold is reached and the appropriate ambiguous nucleotide character is used. Importantly, varVAMP is aware of ambiguous nucleotides in the alignment and handels them by de-multiplexing the character into real nucleotides and adding its portion to the nucleotide counts at the alignment position.
varVAMP searches for potential primer regions as defined by a user-defined number of ambiguous bases that is allowed within the minimal length of a primer. The algorithm opens windows over the ambiguous consensus sequence and determines if a window satisfies these constraints.
varVAMP uses primer3-py
to search for potential primers. Some of the evaluation process, determining if primers match certain criteria, was adapted from primalscheme
. The primer search contains multiple steps:
- Digest the primer regions into kmers with the min and max length of primers. This is performed on a consensus sequence that does not contain ambiguous characters but is just the majority consensus of the alignment. Therefore, primer parameters will be later calculated for the best fitting primer.
- Evaluate if these kmers are potential primers independent of their orientation (temperature, GC, size, poly-x repeats and poly dinucleotide repeats) and dependent on their orientation (secondary structure, GC clamp, number of GCs in the last 5 bases of the 3' end and min 3' nucleotides without an ambiguous base). Filter for kmers that satisfy all constraints and calculate their penalties (explained in the last section).
- Single and tiled mode: Find primer with the lowest penalty. varVAMP sorts the primers by their penalty and always takes one with the lowest penalty if middle third of the primer has not been covered by a primer with a lower penalty. This greatly reduces the complexity of the later amplicon search while only retaining the best primer of a set of overlapping primers.
To search for the best amplicon, varVAMP uses a graph based approach.
- Create all possible amplicons with the given length constraints and ensure that primer pairs are not forming dimers.
- Create a graph containing all amplicons and their potential neighboring amplicons. To design a good scheme, the next primer has to lie within the second half of the current primer and satisfy the overlap constraint. The cost to go to a neighboring amplicon is determined by the amplicon penalty.
- Use the Dijkstra algorithm to find the path with the lowest costs from a given start node.
- Determine potential stop nodes as amplicons with the furthest stop in the alignment.
- Determine shortest paths between the start and all potential stop nodes. Get the lowest cost.
- Repeat steps 3-5 for each start node until the best coverage is reached. Voila! We have the best amplicon scheme with the lowest cumulative primer penalties in respect to the amplicon length!
- Lastly, the best scheme is evaluated for primer dimers in their respective pools. If a primer dimer pair is found, varVAMP evaluates for each primer their overlapping and previously not considered primers (primer search step 2) and again minimizes the penalty. The scheme and all primers are updated. If no alternative primers can be found, varVAMP issues a warning and reports the unsolvable primer dimers.
- varVAMP sorts all amplicons by their penalties and takes the non-overlapping amplicon with the lowest penalty!
- As varVAMP gives a size penalty to amplicons, varVAMP automatically finds amplicons with low primer penalties close to your optimal length (if possible).
- varVAMP generates a fasta query and searches for possible hits with the same settings as primer blast.
- For each amplicon varVAMP searches for off-targets, defined as hits in the db that are the maximum amplicons length multiplied by
BLAST_SIZE_MULTI
apart from each other. - SINGLE/qPCR mode: varVAMP first sorts amplicons if they generate a BLAST hit or not and then by penalty to avoid the selection of off-targets. TILED: Graph edges are defined as tuples (off-target: True/False, amplicon_penalty). For finding the path with the lowest costs, varVAMP first considers if an amplicon generates off-targets and then the amplicon penalty, thereby avoiding the selection of amplicons during the graph search that generate off-targets.
- Reports if the scheme has amplicons with off-targets in the log, in the primers.tsv (single/tiled) and in the qpcr_design.tsv/qpcr_primer.tsv files.
- Find probe regions with their own number of ambiguous characters.
- Digest and search for probes (with their own base parameter)
- Find primers flanking a probe
- Evaluate amplicon parameters
- Decide for the best primer/probe combination
- For amplicons with the same primer sets but different probes, decide for the lowest probe
- Test amplicons for their minimal free energy at their lowest primer temperature with
seqfold
and filter to avoid secondary structures. Amplicons with large potential deletions (>QAMPLICON_DEL_CUTOFF) will be ignored. Smaller deletions will be accepted. - Take the best qPCR schemes of overlapping schemes.
PRIMER_MAX_BASE_PENALTY
Each primer is penalized for its deviation from the optimal size, GC content and temperature. Base penalty is the sum of these deviations. Primer base penalties higher than the max base penalty are excluded.
PRIMER_3_PENALTY
Each position in the primer is penalized for mismatches in all sequences. If a 3' penalty is given the first in the tuple is multiplied with the frequency mismatch at the very 3' end. The next is multiplied with the -1 freq and so on. Increase penalty if you want to shift amplicons towards best 3' matching. Set the PRIMER_3_PENALTY
to 0 if you do not care about 3' mismatches.
PRIMER_PERMUTATION_PENALTY
The number of permutations of a primer is multiplied by the penalty. For example 24 permutations and a penalty of 0.1 will yield a penalty of 2.4. Set PRIMER_PERMUTATION_PENALTY
to 0 if you do not care about the number of permutations.
All penalties of a primer are summed up and yield a final penalty. The penalty for each amplicon is then the penalty of its LEFT + RIGHT primers multiplied by the fold increase of the amplicon length compared to the optional length. This insures that in the final scheme not only large amplicons are used.