# Primer evaluation

Example analysis of primers obtained through the pipeline.

## Genus: *Viola*, Gene: matK

> Input configuration (in **config/config.yml**)
>```yml
>    genes: ["matk"]
>    genera: ["viola"]
>```
>   *Executed: 2025/06/17*

NCBI exploration with `Entrez` (rule: **exploration**) returned the following NCBI IDs:

```raw
    2838045729	2838045547	2838045453
    2838045268	2820060830	2820060822
    2820060818	2820060814	2629966613
    2736032405	2736032403	2736032401
    2736032399	2736032397	2718041013
    2687757718	2502706083	2502704783
    2502703935	2618954001	
```
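As a rough illustration of how such an ID list could be retrieved, the sketch below queries the NCBI nucleotide database with Biopython's `Entrez.esearch`. The query format in `build_term`, the helper names, and the `retmax` value are assumptions made for this example; the pipeline's **exploration** rule may build its search differently.

```python
def build_term(genus: str, gene: str) -> str:
    """Build an Entrez query string (hypothetical format, not the pipeline's)."""
    return f"{genus}[Organism] AND {gene}[Gene]"


def search_ids(genus: str, gene: str, email: str, retmax: int = 20) -> list:
    """Return nucleotide IDs matching the genus/gene query (needs network access)."""
    # Imported here so the query builder can be used without Biopython installed
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="nucleotide", term=build_term(genus, gene), retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    return list(result["IdList"])


# Values taken from config/config.yml
term = build_term("viola", "matk")
print(term)
```

Calling `search_ids("viola", "matk", email="you@example.org")` would then return a list of ID strings; the e-mail address here is a placeholder.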

### Determine the amplicon for the selected primers

The **primer pair** with the lowest penalty score (**0.428421**) was selected from the pipeline output file (**data/viola_matk_primers.txt**).

| Primer | Penalty score | Start position | Length (nt) | Sequence             |
| ------ | ------------- | -------------- | ----------- | -------------------- |
| left   | 0.253347      | 613            | 20          | TCCAAGCATTCCCTCTCCCT |
| right  | 0.175073      | 828            | 20          | ATCAGCCCGAGTCGGTTTAC |

```python
# Amplicon for best primer pair
from os import getcwd
from os.path import abspath, exists, join

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

genes = "matk"
genera = "viola"

PATH_CONS = abspath(join(getcwd(), "..", "data", f"{genera}_{genes}_cons.fasta"))
PATH_AMP = abspath(join(getcwd(), "..", "data", f"{genera}_{genes}_ampl.fasta"))

# Read the single consensus sequence
record = SeqIO.read(PATH_CONS, "fasta")
sequence = str(record.seq)

# Selected primer positions (from data/viola_matk_primers.txt)
left_start = 613
left_len = 20
right_start = 828
right_len = 20

# These slices treat both start positions as 0-based leftmost bases on the consensus
left_primer = sequence[left_start:left_start + left_len]
right_primer = sequence[right_start:right_start + right_len]

# Span from the first base of the left primer to the last base of the right primer
amplicon = sequence[left_start:right_start + right_len]

print("\nPrimer forward (5'→3'):", left_primer)
# NOTE: [::-1] reverses the string without complementing it
print("Primer reverse (5'→3'):", right_primer[::-1])

print("\nPositions:")
print(f"Forward primer: {left_start}–{left_start+left_len-1}")
print(f"Reverse primer: {right_start}–{right_start+right_len-1}")

print("\nAmplicon:")
print(amplicon, "\n")

record = SeqRecord(
    Seq(amplicon),
    id="viola_amplicon",
    description="PCR product using best primer pair"
)

# Write the amplicon only if the file does not exist yet
if not exists(PATH_AMP):
    with open(PATH_AMP, "w") as handle:
        SeqIO.write(record, handle, "fasta")
```



```raw
Primer forward (5'→3'): TCTTTGCATTTATTACGACT
Primer reverse (5'→3'): AAAGTATCTTTATATAAGCA

Positions:
Forward primer: 613–632
Reverse primer: 828–847

Amplicon:
TCTTTGCATTTATTACGACTCTTTCTTCATGAGTATTGGAATTTGAnnnACAGTCTTATTATTCCAAAGAAATCTATTTCCATTTTTGCAAAGGATAATCCAAGATTATTCTTGTTCTTATATAATTTTCATGTATATGAATACGAATCTATTCTCTTCTTTCTTCGTAACCAATCCTTTCATTTACAATCAACATTTTTTCGAGTCCTTTTTGAACGAATATATTTCTATGAAA
```
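How the primer coordinates are interpreted changes which bases end up in the amplicon. The sketch below is a minimal alternative that assumes the positions follow Primer3's default convention: 0-based indexing, with the right primer's start marking its 5' end, that is, the rightmost base it covers on the template. Under that assumption the right primer sequence is the reverse complement of its template site. The function names and the toy template are hypothetical and only illustrate the arithmetic.

```python
# Complement table for DNA; lowercase and N are included so masked bases survive
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_amplicon(template, left_start, left_len, right_start, right_len):
    """Extract primers and amplicon, assuming Primer3-style coordinates.

    Assumptions: positions are 0-based, and right_start is the rightmost
    template base covered by the right primer (its 5' end).
    """
    left_primer = template[left_start:left_start + left_len]
    right_site = template[right_start - right_len + 1:right_start + 1]
    right_primer = reverse_complement(right_site)  # read 5'→3' on the opposite strand
    amplicon = template[left_start:right_start + 1]
    return left_primer, right_primer, amplicon


# Toy template (not real data) to show the coordinate handling
template = "GGATCCAAGTTTACGCTAGC"
left, right, amp = extract_amplicon(template, 2, 4, 15, 4)
print(left, right, amp, len(amp))
```

Under these assumptions, start positions 613 and 828 would give a product of 828 − 613 + 1 = 216 bases, whereas slicing up to `right_start + right_len` gives 235 bases.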



### BLAST search

To evaluate primer specificity for *Viola* spp., the sequence amplified by the primers was searched with BLAST (`blastn`) against the NCBI `nt` database.

```python
# Perform a BLAST search against the NCBI nucleotide database
from os import getcwd, getenv
from os.path import abspath, exists, join

import dotenv
from Bio import SeqIO
from Bio.Blast import NCBIWWW, NCBIXML

# Load the contact e-mail for NCBI from a .env file
dotenv.load_dotenv()
NCBIWWW.email = getenv("email")

# `genera` and `genes` are defined in the previous cell
PATH_AMP = abspath(join(getcwd(), "..", "data", f"{genera}_{genes}_ampl.fasta"))
PATH_XML_BLAST = abspath(join(getcwd(), "..", "data", f"{genera}_{genes}_blast.xml"))

amplicon = SeqIO.read(PATH_AMP, "fasta")

# Query NCBI only once and cache the results as XML
if not exists(PATH_XML_BLAST):
    result = NCBIWWW.qblast("blastn", "nt", amplicon.seq)
    with open(PATH_XML_BLAST, "w") as handle:
        handle.write(result.read())
    result.close()

with open(PATH_XML_BLAST) as result:
    blast_record = NCBIXML.read(result)

# Report the first HSP of each hit
for alignment in blast_record.alignments:
    print("Organism:", alignment.hit_def)
    print("Score:", alignment.hsps[0].score)
    print("E-value:", alignment.hsps[0].expect)
    print("---")
```


```raw
Organism: Viola cucullata voucher AP011 maturase K (matK) gene, partial cds; chloroplast
Score: 453.0
E-value: 1.44062e-109
---
Organism: Viola elatior voucher KG23-0398 maturase K (matK) gene, partial cds; chloroplast
Score: 453.0
E-value: 1.44062e-109
---
Organism: Viola odorata genome assembly, organelle: plastid:chloroplast
Score: 453.0
E-value: 1.44062e-109
---
Organism: Viola renifolia voucher 09PROBE-05214 maturase K (matK) gene, partial cds; chloroplast
Score: 453.0
E-value: 1.44062e-109
---
Organism: Viola labradorica isolate AG2KK53 maturase K (matK) gene, partial cds; chloroplast
Score: 453.0
E-value: 1.44062e-109
---
Organism: Viola odorata chloroplast, complete genome
Score: 453.0
E-value: 1.44062e-109
---
Organism: Viola sororia isolate OSBAR 000338 maturase K (matK) gene, partial cds; chloroplast
Score: 453.0
E-value: 1.44062e-109
---
Organism: Viola labradorica voucher AP449 maturase K (matK) gene, partial cds; chloroplast
Score: 453.0
E-value: 1.44062e-109
---
Organism
```
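Reading the hit list by eye works for a handful of results, but a tally by genus makes the check explicit. The sketch below is one way to do it, assuming that the first word of each hit description is the genus name. That heuristic is an assumption of this example and can fail for descriptions that start with a prefix such as `PREDICTED:`. The helper name `genus_counts` is hypothetical, and the sample descriptions are copied from the output above.

```python
from collections import Counter


def genus_counts(hit_defs):
    """Count hits per genus, taking the first word of each description as the genus."""
    return Counter(d.split()[0] for d in hit_defs if d.strip())


# A few hit descriptions copied from the BLAST output
hits = [
    "Viola cucullata voucher AP011 maturase K (matK) gene, partial cds; chloroplast",
    "Viola odorata genome assembly, organelle: plastid:chloroplast",
    "Viola odorata chloroplast, complete genome",
]

counts = genus_counts(hits)
print(counts)

# True when every hit belongs to the expected genus
only_viola = set(counts) == {"Viola"}
print(only_viola)
```

With the parsed BLAST record, the same function could be fed `[a.hit_def for a in blast_record.alignments]` to cover every hit rather than a sample.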

### Conclusion
A BLAST search of the sequence amplified by the best-scoring primer pair from Primer3 returns only hits from the genus *Viola*.

This preliminary evaluation provides a rough validation of the pipeline.