# Getting Started with plexus

**plexus** automates the design of multiplex PCR panels — starting from a list of genomic targets (somatic mutations, SNPs, etc.) and producing a set of optimised primer pairs that can be ordered directly.

This notebook walks through the full design workflow using the Python API.  
If you prefer the command line, see the [README](../README.md) or run `plexus --help`.

## What this notebook covers

1. Prerequisites
2. Preparing your input file
3. Running the full pipeline with one call
4. Exploring the results — panel, junctions, primer pairs
5. Understanding the output files
6. Next steps

## 1. Prerequisites

Before running this notebook, make sure you have:

| Requirement | Notes |
| --- | --- |
| **plexus installed** | `uv pip install -e .` from the repo root, or `pip install plexus` |
| **NCBI BLAST+** | `blastn` must be on `$PATH`; used for specificity checking |
| **Reference FASTA** | hg38 (or the genome matching your targets); must be uncompressed and indexed (`samtools faidx`) |
| **bcftools** *(optional)* | Only needed for SNP checking; skip with `skip_snpcheck=True` in Python or `--skip-snpcheck` on the CLI |

> **Tip**: If you don't have a local FASTA handy you can still run the design and BLAST steps against a network BLAST database, or use `skip_blast=True` to skip the specificity check entirely during exploration.
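A quick preflight check can catch a missing tool before a long run. The sketch below uses only the standard library. The helper name `check_prerequisites` is hypothetical and not part of plexus, and it assumes the FASTA index sits next to the FASTA as `<fasta>.fai`, which is where `samtools faidx` writes it.

```python
import shutil
from pathlib import Path


def check_prerequisites(fasta_path, tools=("blastn", "bcftools")):
    """Map each check to True/False (hypothetical helper, not part of plexus)."""
    fasta = Path(fasta_path)
    # shutil.which returns None when the executable is not on PATH
    report = {tool: shutil.which(tool) is not None for tool in tools}
    report["fasta_exists"] = fasta.is_file()
    # samtools faidx writes the index next to the FASTA as <name>.fai
    report["fasta_indexed"] = Path(str(fasta) + ".fai").is_file()
    return report


for check, ok in check_prerequisites("/path/to/hg38.fa").items():  # update this path
    print(f"{check:15s} {'OK' if ok else 'MISSING'}")
```

A missing `bcftools` is only a problem if you intend to run the SNP check.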

## 2. Preparing your input file

The input is a simple CSV with four required columns:

```csv
Name,Chrom,Five_Prime_Coordinate,Three_Prime_Coordinate
EGFR_T790M,chr7,55181378,55181378
KRAS_G12D,chr12,25245350,25245350
```

* `Name` — a human-readable label for the target (mutation name, gene, etc.)
* `Chrom` — chromosome (UCSC-style, e.g. `chr7`)
* `Five_Prime_Coordinate` / `Three_Prime_Coordinate` — 1-based genomic coordinates of the target position. For SNVs these are the same.
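The column rules above can be checked before starting a design run. This is a minimal sketch using the standard `csv` module. `validate_targets` is a hypothetical helper, not a plexus function, and it only checks what this section states: the four required columns exist and the coordinates are positive integers.

```python
import csv
import io

REQUIRED_COLUMNS = ["Name", "Chrom", "Five_Prime_Coordinate", "Three_Prime_Coordinate"]


def validate_targets(handle):
    """Return a list of problems found in a target CSV (hypothetical helper)."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in REQUIRED_COLUMNS[2:]:
            value = (row[col] or "").strip()
            # 1-based coordinates must be positive integers
            if not value.isdigit() or int(value) < 1:
                problems.append(f"line {line_no}: {col}={value!r} is not a positive integer")
    return problems


example = io.StringIO(
    "Name,Chrom,Five_Prime_Coordinate,Three_Prime_Coordinate\n"
    "EGFR_T790M,chr7,55181378,55181378\n"
    "KRAS_G12D,chr12,25245350,25245350\n"
)
print(validate_targets(example) or "input looks well-formed")
```

To check a file on disk, pass an open handle instead, for example `validate_targets(open("../data/junctions.csv", newline=""))`.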

A sample file is already included in the repository:

In [None]:
import pandas as pd

junctions = pd.read_csv("../data/junctions.csv")
print(f"{len(junctions)} targets loaded")
junctions.head()

## 3. Running the full pipeline

`run_pipeline()` runs every step end-to-end:

1. Load & merge close junctions
2. Extract design regions from the reference FASTA
3. Enumerate candidate primers (`simsen` k-mer algorithm)
4. Filter by thermodynamic quality (Tm, GC%, hairpin ΔG, 3′-stability)
5. Find valid primer pairs within amplicon size constraints
6. BLAST specificity check (optional)
7. SNP overlap check (optional)
8. Multiplex optimisation — picks the combination of primer pairs with the lowest cross-dimer score
9. Save all output files
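Step 8 is the least intuitive of the list, so a toy version may help. The sketch below is not the plexus optimiser. It brute-forces every combination of one candidate pair per target and scores each combination with a made-up proxy: the length of the longest 3′-end complementarity between any two primers in the pool. All function names, the scoring rule, and the sequences are illustrative assumptions.

```python
from itertools import combinations, product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def toy_dimer_score(a, b, max_len=6):
    """Length of the longest 3' tail of `a` that can anneal to the 3' tail of `b`."""
    for k in range(max_len, 0, -1):
        if a[-k:] == revcomp(b[-k:]):
            return k
    return 0


def panel_cost(choice):
    """Sum the toy score over every pair of primers in the pooled panel."""
    primers = [primer for pair in choice for primer in pair]
    return sum(toy_dimer_score(a, b) for a, b in combinations(primers, 2))


def best_panel(candidates):
    """candidates: dict mapping target name -> list of (forward, reverse) tuples."""
    targets = list(candidates)
    # Exhaustive search: one candidate pair per target, keep the cheapest combination
    best = min(product(*(candidates[t] for t in targets)), key=panel_cost)
    return dict(zip(targets, best)), panel_cost(best)


# Made-up sequences, for illustration only
candidates = {
    "target_A": [("ACGTTGCAAGGCTTACC", "GGATCCATTGCAAGCTT"),
                 ("ACGTTGCAAGGCTTAGA", "GGATCCATTGCAAGCAA")],
    "target_B": [("TTGACCGATAAGGCCTA", "CCATTGACGTTACACGG")],
}
chosen, cost = best_panel(candidates)
print(cost, chosen)
```

Exhaustive search grows exponentially with the number of targets, which is why real multiplex designers rely on heuristics rather than enumeration.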

Set `fasta_file` to the path of your local hg38 FASTA.

In [None]:
from plexus.pipeline import run_pipeline

design_input_file = "../data/junctions.csv"
fasta_file = "/path/to/hg38.fa"  # <-- update this path

result = run_pipeline(
    design_input_file,
    fasta_file,
    output_dir="./output",
    panel_name="my_panel",
    # skip_blast=True,    # uncomment to skip BLAST during exploration
    # skip_snpcheck=True, # uncomment if bcftools is not installed
)

The function returns a `PipelineResult` object that holds the full panel, selected primer pairs, and metadata.

In [None]:
print(f"Junctions in panel   : {len(result.panel.junctions)}")
print(f"Selected primer pairs: {len(result.selected_pairs)}")
print(f"Best multiplex cost  : {result.multiplex_solutions[0].cost:.2f}")
print(f"Steps completed      : {result.steps_completed}")

## 4. Exploring the results

### 4.1 The panel and its junctions

In [None]:
panel = result.panel

# Each junction holds the design region sequence and all candidate primer pairs
for j in panel.junctions:
    n_pairs = len(j.primer_pairs)
    print(f"{j.name:40s}  {n_pairs:>5} candidate pairs  region: {j.chrom}:{j.five_prime_coordinate}")

### 4.2 Inspecting a junction's design region

In [None]:
junction = panel.junctions[0]
print(f"Junction : {junction.name}")
print(f"Location : {junction.chrom}:{junction.five_prime_coordinate}")
print(f"Design region ({len(junction.design_region)} bp):")
print(junction.design_region)
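A quick sanity check on a design region is its GC content, since very GC-rich or AT-rich regions tend to yield fewer usable primers. The helper below is a plain-Python sketch. `gc_percent` is a hypothetical name, not a plexus function, and the example sequence is made up.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# In the notebook you could call gc_percent(junction.design_region) instead
print(f"GC content: {gc_percent('ATGCGCGCTTAAGGCC'):.1f}%")
```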

### 4.3 The selected primer pairs

Each selected primer pair covers one junction. Inspect its key properties:

In [None]:
import pandas as pd

rows = []
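for pp in result.selected_pairs:
    rows.append({
        # Drop the last two underscore-separated fields of the primer name,
        # leaving the target label (assumes names end in two suffix fields)
        "Target"       : pp.forward.name.rsplit("_", 2)[0],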
        "Forward_Seq"  : pp.forward.seq,
        "Reverse_Seq"  : pp.reverse.seq,
        "Fwd_Tm"       : round(pp.forward.tm, 1),
        "Rev_Tm"       : round(pp.reverse.tm, 1),
        "Insert_bp"    : pp.insert_size,
        "Amplicon_bp"  : pp.amplicon_length,
        "Pair_penalty" : round(pp.pair_penalty, 1),
        "Off_targets"  : len(pp.off_target_products),
    })

pd.DataFrame(rows)

### 4.4 Detailed view of a single primer pair

In [None]:
pp = result.selected_pairs[0]

print("--- Forward primer ---")
print(f"  Sequence  : {pp.forward.seq}")
print(f"  Length    : {pp.forward.length} bp")
print(f"  Tm        : {pp.forward.tm:.1f} °C")
print(f"  GC%       : {pp.forward.gc:.1f}%")
print(f"  % Bound   : {pp.forward.bound:.1f}")

print("\n--- Reverse primer ---")
print(f"  Sequence  : {pp.reverse.seq}")
print(f"  Length    : {pp.reverse.length} bp")
print(f"  Tm        : {pp.reverse.tm:.1f} °C")
print(f"  GC%       : {pp.reverse.gc:.1f}%")
print(f"  % Bound   : {pp.reverse.bound:.1f}")

print("\n--- Pair ---")
print(f"  Insert size    : {pp.insert_size} bp")
print(f"  Amplicon size  : {pp.amplicon_length} bp")
print(f"  Dimer score    : {pp.dimer_score:.2f}")
print(f"  Pair penalty   : {pp.pair_penalty:.1f}")
print(f"  Off-targets    : {len(pp.off_target_products)}")

print("\n  Amplicon sequence:")
print(f"  {pp.amplicon_sequence}")

### 4.5 Comparing multiplex solutions

The optimiser produces several ranked solutions. Inspect the top ones:

In [None]:
for i, sol in enumerate(result.multiplex_solutions):
    print(f"Solution {i+1}  cost={sol.cost:.2f}  pairs={len(sol.primer_pairs)}")

## 5. Understanding the output files

All files are written to the `output/` directory (configurable with `output_dir`):

| File | Description |
| --- | --- |
| `selected_multiplex.csv` | **The main result.** One row per junction — forward/reverse sequences (bare + full with adapter tail), Tm, amplicon size, penalty, off-target count |
| `candidate_pairs.csv` | All candidate primer pairs before optimisation |
| `top_panels.csv` | All top-ranked multiplex solutions |
| `off_targets.csv` | BLAST-detected off-target amplification products |
| `panel_summary.json` | Machine-readable summary with provenance, config used, and step results |
| `blast/all_primers.fasta` | Unique primer sequences sent to BLAST |
| `plexus_*.log` | Verbose log of every design decision |

In [None]:
# Load and preview the primary result file
import pandas as pd
from pathlib import Path

output_dir = Path(result.output_dir)
selected = pd.read_csv(output_dir / "selected_multiplex.csv")
selected
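The JSON summary can be explored the same way. Its schema is not described here, so the sketch below only lists the top-level keys and the type of each value. It assumes the file holds a JSON object at the top level, and `describe_json` is a hypothetical helper name.

```python
import json
from pathlib import Path


def describe_json(path):
    """Map each top-level key of a JSON object to the type name of its value."""
    with Path(path).open() as fh:
        data = json.load(fh)
    return {key: type(value).__name__ for key, value in data.items()}


summary_path = Path("output") / "panel_summary.json"
if summary_path.exists():  # only present after a pipeline run
    for key, kind in describe_json(summary_path).items():
        print(f"{key:30s} {kind}")
```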

## 6. Next steps

### CLI usage

Everything in this notebook can also be done from the command line:

```bash
plexus run \
  --input data/junctions.csv \
  --fasta /path/to/hg38.fa \
  --output results/ \
  --name my_panel
```

Run `plexus --help` for all options.

### Tuning the design parameters

Pass a `config_file` (JSON) to `run_pipeline()` or `--config` on the CLI to override any parameter.  
A minimal example to widen the Tm window:

```json
{
    "singleplex_design_parameters": {
        "PRIMER_MIN_TM": 55.0,
        "PRIMER_MAX_TM": 66.0
    }
}
```

See `config/designer_default_config.json` for all available parameters.
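When working in a notebook it can be convenient to write the override file from Python. The sketch below builds the same JSON as above with the standard `json` module. The file name `my_config.json` is arbitrary, and the commented-out call assumes `config_file` accepts a path string, which is not confirmed here.

```python
import json
from pathlib import Path

overrides = {
    "singleplex_design_parameters": {
        "PRIMER_MIN_TM": 55.0,
        "PRIMER_MAX_TM": 66.0,
    }
}

config_path = Path("my_config.json")  # arbitrary file name
config_path.write_text(json.dumps(overrides, indent=4))

# Then pass the file to the pipeline (assumes a path string is accepted):
# result = run_pipeline(design_input_file, fasta_file, config_file=str(config_path))
```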

### SNP checking

Filter out primers that overlap common germline variants, which is useful for liquid biopsy panels:

```bash
plexus init          # downloads a bundled gnomAD VCF subset
plexus run -i junctions.csv -f hg38.fa --snp-strict
```

### Multi-patient / multi-panel inputs

Add a `Panel` column to your CSV to design independent panels for multiple patients in one run:

```bash
plexus run -i cohort.csv -f hg38.fa --parallel
```
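A cohort file is the usual four-column input plus the `Panel` column. The sketch below writes one with the standard `csv` module. The panel labels and the position of the `Panel` column are assumptions for illustration, and the two targets are reused from the example in section 2.

```python
import csv

FIELDS = ["Name", "Chrom", "Five_Prime_Coordinate", "Three_Prime_Coordinate", "Panel"]

rows = [
    {"Name": "EGFR_T790M", "Chrom": "chr7", "Five_Prime_Coordinate": 55181378,
     "Three_Prime_Coordinate": 55181378, "Panel": "patient_01"},  # hypothetical label
    {"Name": "KRAS_G12D", "Chrom": "chr12", "Five_Prime_Coordinate": 25245350,
     "Three_Prime_Coordinate": 25245350, "Panel": "patient_02"},  # hypothetical label
]

# newline="" lets the csv module control line endings itself
with open("cohort.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```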

### Docker / clinical deployment

For containerised or regulated environments, plexus ships a compliance mode —  
see the [README](../README.md#compliance-mode-and-container-deployment) for the Docker workflow.