<a href="https://colab.research.google.com/github/sanjaynagi/ax-vampir-hackathon/blob/main/Ax-vampIR-hackathon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center>
<figure>
  <img src="https://raw.githubusercontent.com/sanjaynagi/ax-vampir-hackathon/refs/heads/main/docs/new-vigg-logo1.png" alt="Description" width="500" height="180">
</figure>
</center>


## **Ax-vampIR VIGG Hackathon: Designing amplicon panels in other disease vectors**

Targeted genomic surveillance through amplicon sequencing allows us
to focus on specific genomic regions of interest, enabling routine surveillance at a larger scale than whole-genome sequencing. For vectors,
this enables the tracking of insecticide resistance, species
ID markers, and other important genetic variants.

#### **Workshop Aims**

1.  Find orthologous insecticide resistance loci in your target vector
    species using IRTHO
2.  Design a multiplexed PCR primer panel targeting these loci using
    MULTIPLY
<br></br>



### Ag-vampIR and AmpSeeker recap

Ag-vampIR (Anopheles gambiae Vector Amplicon Marker Panel for
Insecticide Resistance) is a targeted sequencing panel designed for
genomic surveillance of malaria vectors. The panel targets 90 important
genomic loci across 80 amplicons in the Anopheles gambiae genome,
including:

-   55 insecticide resistance-associated SNPs
    -   Target-site mutations (kdr, Ace1, Rdl)
    -   Metabolic resistance markers
    -   SNPs tagging selective sweeps
-   35 ancestry informative markers (AIMs) for species identification
-   Two amplicons targeting the doublesex gene drive locus

Each amplicon is approximately 200 bp long. The protocol's design allows
multiplexing of up to 1,536 samples on a single flow cell using dual
indexing with 96 i7 and 16 i5 adaptors.
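
Combinatorial dual indexing identifies each sample by its (i7, i5) pair rather than by a single index. This is where the 1,536 figure comes from: 96 × 16 = 1,536 unique pairs. The following is a minimal sketch of enumerating and assigning those pairs. The index names (`i7_01`, `i5_01`, ...) and the `assign_indexes` helper are hypothetical. They are not part of the Ag-vampIR protocol or AmpSeeker.

```python
from itertools import product

# Hypothetical index names; substitute the real adaptor IDs from your kit
i7_indexes = [f"i7_{n:02d}" for n in range(1, 97)]  # 96 i7 adaptors
i5_indexes = [f"i5_{n:02d}" for n in range(1, 17)]  # 16 i5 adaptors

# Every (i7, i5) combination is a distinct sample barcode
index_pairs = list(product(i7_indexes, i5_indexes))
print(len(index_pairs))  # 1536


def assign_indexes(sample_ids):
    """Assign a unique (i7, i5) pair to each sample, in order (hypothetical helper)."""
    if len(sample_ids) > len(index_pairs):
        raise ValueError(f"At most {len(index_pairs)} samples can be multiplexed")
    return dict(zip(sample_ids, index_pairs))


print(assign_indexes(["sample_A", "sample_B"]))
```

In practice, the index layout follows the plate design of the laboratory protocol. The point here is only that the number of unique combinations is the product of the two adaptor sets.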

#### **AmpSeeker**

AmpSeeker is an open-source computational pipeline designed for
reproducible analysis of amplicon sequencing data, which can analyze
data from any Illumina amplicon sequencing panel.

<center>
<figure>
  <img src="https://raw.githubusercontent.com/sanjaynagi/ax-vampir-hackathon/refs/heads/main/docs/figure1_new.png" alt="Description" width="600" height="600">
  <figcaption>Figure: A) Ag-vampIR loci B) Overview of the Ag-vampIR laboratory protocol and AmpSeeker bioinformatics pipeline. </figcaption>
</figure>
</center>

<br></br>


## Part 1: Introduction to Irtho

Many known mutations that confer insecticide resistance in insects are examples of parallel evolution: the same mutations have arisen independently in different species. Ideally, you would already have an established set of resistance markers for your organism of choice. Where that is not the case, exploiting parallel evolution can help us design panels for other, lesser-known vectors. The aim of the first part of this workshop is to take known insecticide resistance loci in one vector and find the orthologous regions in another.

Irtho is a Python package that identifies orthologous loci between
reference genomes. It uses the tool [OrthoFinder](https://github.com/davidemms/OrthoFinder) to find orthologous genes between reference genomes, and provides tools to support the analysis.

Key capabilities:
- Identifies orthologous genes between species   
- Determines synteny between orthologous genes   
- Aligns protein sequences to find the orthologous positions of amino acid residues
- Suggests residues to target in case of no prior knowledge
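
The residue-mapping step can be illustrated with a pairwise protein alignment. Walk along both aligned sequences, counting non-gap residues in each, and report the target position aligned to a given reference position. This is a minimal sketch with toy sequences. It assumes the alignment has already been computed, for example with MAFFT or Biopython's `PairwiseAligner`. It is not irtho's actual implementation, and `map_residue` is a hypothetical helper.

```python
def map_residue(aligned_ref: str, aligned_tgt: str, ref_pos: int):
    """Map a 1-based residue position in the reference protein to the
    aligned 1-based position in the target protein.

    Both inputs are rows of the same alignment, with '-' marking gaps.
    Returns (target_pos, target_residue), or None if the reference
    residue is aligned to a gap in the target.
    """
    if len(aligned_ref) != len(aligned_tgt):
        raise ValueError("Aligned sequences must have equal length")
    if ref_pos < 1:
        raise ValueError("Positions are 1-based")
    ref_count = tgt_count = 0
    for ref_aa, tgt_aa in zip(aligned_ref, aligned_tgt):
        # Advance the residue counters only on non-gap characters
        if ref_aa != "-":
            ref_count += 1
        if tgt_aa != "-":
            tgt_count += 1
        if ref_aa != "-" and ref_count == ref_pos:
            return None if tgt_aa == "-" else (tgt_count, tgt_aa)
    raise ValueError(f"Position {ref_pos} is beyond the reference length")


# Toy alignment (hypothetical sequences, not a real protein)
ref = "MKT-LLVAG"
tgt = "MKTQLL-AG"
print(map_residue(ref, tgt, 4))  # (5, 'L'): the target insertion shifts numbering by one
print(map_residue(ref, tgt, 6))  # None: reference residue 6 aligns to a gap
```

This offset is why residue numbers such as kdr positions cannot simply be copied between species. The numbering has to be carried through an alignment.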

![IRTHO Workflow](https://placeholder-for-irtho-workflow.png) *Figure 2:
IRTHO workflow diagram*

### Available Reference Genomes

Currently available in pre-computed OrthoFinder results:
- Anopheles gambiae (PEST)
- Anopheles stephensi (UCISS2018)
- Anopheles dirus (WRAIR2)
- Anopheles minimus (MINIMUS2)
- Anopheles sinensis (CHINA)
- Aedes aegypti (LVP_AGWG)   
- Aedes albopictus (AalbF5)
- Culex quinquefasciatus (JHB2020)
- Lutzomyia longipalpis (ASM)
- Additional vectors can be added using protein FASTA and GFF files from
VectorBase or RefSeq

For the hackathon, I suggest sticking to the pre-computed results, although OrthoFinder is super fast if you do want to run it yourself.



## Part 2: Introduction to Multiply

Multiply is a command-line tool for designing multiplexed PCR primers.
It was recently published in [Nature Communications](https://www.nature.com/articles/s41467-024-45688-z) alongside NOMADS8 and NOMADS16, which are Nanopore amplicon panels for *Plasmodium falciparum*. It supports both Illumina and Nanopore sequencing.

<center>
<figure>
  <img src="https://raw.githubusercontent.com/JasonAHendry/multiply/master/.images/multiply-pipeline.png" alt="Description" width="800" height="600">
  <figcaption>MULTIPLY pipeline overview</figcaption>
</figure>
</center>

### Design Considerations

When designing amplicon panels, it is important to consider these
factors:

**Sequencing Platforms**  

This workshop can be used to design amplicon panels for both Illumina
and Oxford Nanopore Technologies (ONT) sequencing. Illumina enables use
of the AmpSeeker pipeline and is consistent with the ongoing training in
the EAVES and WAVES projects. The only modification needed for Nanopore
would be adjusting amplicon size in MULTIPLY.

**Amplicon Size**  
For Illumina sequencing, aim for ~200bp to enable
complete coverage with paired-end reads and maximize sequencing
efficiency. For Nanopore, amplicons can be 400-3000bp.
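
The ~200 bp target follows from simple arithmetic on read length. Assume 2 × 150 bp paired-end reads; this is a common Illumina configuration, but the read length is not specified here. The two reads of a 200 bp amplicon then overlap by 2 × 150 - 200 = 100 bp, so every base is read twice. The minimal sketch below generalises that check. `paired_end_overlap` is a hypothetical helper.

```python
def paired_end_overlap(amplicon_len: int, read_len: int) -> int:
    """Overlap (bp) between forward and reverse reads that start at opposite
    ends of the amplicon. A negative value is the size of the central gap
    that neither read covers. Reads longer than the amplicon are capped at
    the amplicon length (the excess would be adapter read-through)."""
    return 2 * min(read_len, amplicon_len) - amplicon_len


# Assumed read length of 150 bp (2 x 150 bp paired-end run)
for amplicon_len in (200, 300, 400):
    overlap = paired_end_overlap(amplicon_len, read_len=150)
    status = "fully covered" if overlap >= 0 else f"{-overlap} bp uncovered"
    print(amplicon_len, overlap, status)
```

With 150 bp reads, amplicons longer than 300 bp leave a gap in the middle. Any target SNP falling in that gap would not be genotyped.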

**Primer Specificity**  
Design primers 18-22 bases long with 40-60% GC
content and similar melting temperatures (57-63°C) to ensure specific
amplification of target regions.
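
The length and GC-content rules above are straightforward to check directly. The following is a minimal sketch using those thresholds. `check_primer` and `gc_content` are hypothetical helpers. Melting temperature is deliberately left out, because a meaningful Tm needs a nearest-neighbour thermodynamic model such as the one in Primer3 (available in Python through `primer3-py`).

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_primer(seq: str, min_len=18, max_len=22, min_gc=0.40, max_gc=0.60) -> list[str]:
    """Return a list of rule violations (an empty list means the primer passes)."""
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append(f"length {len(seq)} outside {min_len}-{max_len} nt")
    gc = gc_content(seq)
    if not min_gc <= gc <= max_gc:
        problems.append(f"GC {gc:.0%} outside {min_gc:.0%}-{max_gc:.0%}")
    return problems


print(check_primer("AGCTTGACCGATGCAATGCG"))  # []  (20 nt, 55% GC)
print(check_primer("AT" * 9))               # ['GC 0% outside 40%-60%']
```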

**Genetic Variation**  
Check existing population genomic data to avoid
placing primers over known polymorphic sites, which can cause allele
dropout and false negatives. Multiply can take a BED file of SNP
locations and consider this when selecting optimal primers.
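
Avoiding known polymorphic sites means checking whether a primer's genomic footprint overlaps any SNP in a BED file. The sketch below performs that check. It assumes standard BED coordinates (0-based start, end-exclusive) and single-base SNP records. Each primer is described by its chromosome, start and end in the same coordinate system. The BED lines and the helper names are made-up examples. MULTIPLY applies its own check when given a SNP BED file, so this is only useful for illustration or for screening hand-designed primers.

```python
from bisect import bisect_left
from collections import defaultdict


def load_snp_bed(lines):
    """Parse BED lines into {chrom: sorted list of 0-based SNP starts}.
    Assumes each record is a single-base SNP (end = start + 1)."""
    snps = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, _end = line.split("\t")[:3]
        snps[chrom].append(int(start))
    for positions in snps.values():
        positions.sort()
    return snps


def primer_snps(snps, chrom, start, end):
    """Return SNP positions (0-based) inside the primer interval [start, end)."""
    positions = snps.get(chrom, [])
    i = bisect_left(positions, start)  # first SNP at or after the primer start
    hits = []
    while i < len(positions) and positions[i] < end:
        hits.append(positions[i])
        i += 1
    return hits


# Hypothetical SNP records and primer coordinates
snps = load_snp_bed(["chrA\t1005\t1006", "chrA\t1050\t1051", "chrB\t200\t201"])
print(primer_snps(snps, "chrA", 990, 1010))   # [1005] -> primer overlaps a SNP
print(primer_snps(snps, "chrA", 1010, 1030))  # []     -> primer is SNP-free
```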

**Multiplexing Compatibility**  
Avoid complementary sequences between
primers that could form dimers, and aim for similar GC content and
amplicon sizes. Multiply takes care of this.
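
Primer dimers are most problematic when the 3' end of one primer can anneal to another primer and be extended by the polymerase. A crude screen looks for a run of perfect complementarity between the 3' end of one primer and any part of another, including itself. This is a deliberately simplified sketch with no mismatches and no thermodynamics. MULTIPLY uses its own, more thorough dimer prediction, so treat this only as intuition. The helper names and the `max_run=4` threshold are hypothetical.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_complementarity(primer_a: str, primer_b: str) -> int:
    """Length of the longest 3'-terminal stretch of primer_a whose reverse
    complement occurs in primer_b, i.e. that could anneal to primer_b."""
    primer_a, primer_b = primer_a.upper(), primer_b.upper()
    best = 0
    for k in range(1, len(primer_a) + 1):
        if revcomp(primer_a[-k:]) in primer_b:
            best = k
        else:
            break  # a longer 3' stretch cannot match if this one does not
    return best


def flag_dimers(primers: dict[str, str], max_run: int = 4):
    """Yield (name_a, name_b, run) for primer pairs, including self-pairs,
    whose 3'-end complementarity exceeds max_run bases."""
    for (name_a, seq_a), (name_b, seq_b) in combinations_with_replacement(primers.items(), 2):
        run = max(three_prime_complementarity(seq_a, seq_b),
                  three_prime_complementarity(seq_b, seq_a))
        if run > max_run:
            yield name_a, name_b, run


# Hypothetical primers: the 3' end of amp1_F (ACGGAC) pairs with GTCCGT in amp2_R
panel = {"amp1_F": "TTTTTTTTTTACGGAC", "amp2_R": "CCCCCGTCCGTCCCCC"}
print(list(flag_dimers(panel)))  # [('amp1_F', 'amp2_R', 6)]
```

The number of pairs grows quadratically with panel size, which is one reason dimer avoidance is best left to a dedicated tool for large panels.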


# **Workshop**

```python
# Clone required repositories
!git clone https://github.com/sanjaynagi/ax-vampir-hackathon
!git clone https://github.com/sanjaynagi/irtho
!git clone https://github.com/JasonAHendry/multiply

# get bedtools if performing snp analysis
# !wget https://github.com/arq5x/bedtools2/releases/download/v2.29.1/bedtools-2.29.1.tar.gz
# !tar -zxvf bedtools-2.29.1.tar.gz
# !cd bedtools2 && make && cp bin/bedtools /bin/
```

```python
# Each editable (-e) install needs its own flag; other packages install normally
%pip install -e /content/multiply -e /content/irtho primer3-py pandas numpy numba pysam seaborn biopython
```

```python
import irtho
import multiply
import pandas as pd
```

### Part 1: Finding Orthologous Loci with IRTHO

```python
# Load input targets
targets_df = pd.read_csv("/content/ax-vampir-hackathon/resources/irtho-targets.tsv", sep="\t")

# Species identifiers and the pre-computed OrthoFinder results directory
reference_species = "AgambiaePEST"
target_species = "AstephensiUCISS2018"
results_dir = "/content/ax-vampir-hackathon/results/orthofinder/Results_Feb03_1"
```

```python
ortho = irtho.Orthologs(results_dir)

targets_df = ortho.map_input_genes_to_orthologs(
    targets_df,
    reference_species,
    target_species
)

targets_df = irtho.split_one_to_many_orthologs(targets_df, target_species)

# Find orthologous targets and generate BED file
final_targets_df = ortho.find_orthologous_targets(
    targets_df,
    reference_dir="reference/",
    ref_genome=reference_species,
    target_genome=target_species
)

# Export to BED format (BED files have no header row)
# final_targets_df.to_csv("target_regions.bed", sep="\t", index=False, header=False)
```

KeyError: 'AstephensiUCISS2018'
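
A `KeyError` on the species name may mean that the identifier does not match the species labels in the OrthoFinder results, which OrthoFinder takes from the input proteome file names. One way to check the exact labels is to read the header of the orthogroups table. This sketch assumes the standard OrthoFinder 2 layout, `<results_dir>/Orthogroups/Orthogroups.tsv`, with an `Orthogroup` column followed by one column per species. It reads the file directly rather than through irtho. The helper names are hypothetical.

```python
from pathlib import Path


def orthofinder_species(results_dir) -> list[str]:
    """Return the species labels used as column headers in Orthogroups.tsv.

    Assumes the OrthoFinder 2 layout: <results_dir>/Orthogroups/Orthogroups.tsv,
    whose first column is 'Orthogroup' and remaining columns are species.
    """
    tsv = Path(results_dir) / "Orthogroups" / "Orthogroups.tsv"
    with open(tsv) as fh:
        header = fh.readline().rstrip("\r\n").split("\t")
    return [col for col in header if col != "Orthogroup"]


def missing_species(results_dir, *names) -> list[str]:
    """Return the requested species names that do not appear in the results."""
    available = set(orthofinder_species(results_dir))
    return [name for name in names if name not in available]
```

If the results directory includes that file, `missing_species(results_dir, reference_species, target_species)` returning `['AstephensiUCISS2018']` would point to a naming mismatch. In that case, `orthofinder_species(results_dir)` lists the labels to use instead.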

### Part 2: Designing Primers with MULTIPLY

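```python
%%bash
# Run the whole cell as a bash script so the heredoc below works in Jupyter/Colab

# Download reference genome for your target species
# (replace <YourSpecies> with your target genome before running)
multiply download -g <YourSpecies>

# Create design file
cat > design.ini << EOF
[Design]
name = ax-vampir-test
target_bed = target_regions.bed
primer3_settings = standard
EOF

# Run MULTIPLY pipeline
multiply pipeline -d design.ini
```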

## Resources

-   irtho Documentation: [GitHub](https://github.com/sanjaynagi/irtho)
-   multiply Documentation: [GitHub](https://github.com/JasonAHendry/multiply)
-   multiply Paper: [Nature Communications](https://www.nature.com/articles/s41467-024-45688-z)
-   VectorBase: <https://vectorbase.org>