# Design cocktail SARS-CoV-2 vaccines
This notebook chooses and designs spikes for a SARS-CoV-2 cocktail vaccine.
The design was done by Jesse Bloom for a project led by Drew Weissman.

We design the following vaccines:

 - **parent-vax**: unmutated spike(s) of leading SARS-CoV-2 variants at time of vaccine design.
 
 - **cocktail-vax**: spike(s) in *parent-vax* plus additional designed spikes carrying mutations predicted to be likely to arise in future human SARS-CoV-2 variants.
 
 - **conservative-cocktail-vax**: like *cocktail-vax*, but fewer mutations per designed spike.
 
 - **aggressive-cocktail-vax**: like *cocktail-vax*, but more mutations per designed spike.

## Setup
Import Python modules:

In [6]:
import os
import subprocess
import tempfile
import urllib.request  # urllib.request must be imported explicitly for urlretrieve

import altair as alt

import Bio.Entrez
import Bio.SeqIO

import pandas as pd

import yaml

Read configuration:

In [2]:
with open("config.yaml") as f:
    config = yaml.safe_load(f)
    
print("Here is the configuration:\n")
print(yaml.dump(config))

Here is the configuration:

components_per_cocktail: 5
escapecalculator:
  kwargs:
    antibody_binding: https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/683abdd4c3277a3cbc20cddd7ab98ff844f9ef80/results/antibody_binding.csv
    antibody_ic50s: https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/683abdd4c3277a3cbc20cddd7ab98ff844f9ef80/results/antibody_IC50s.csv
    antibody_reweighting: https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/683abdd4c3277a3cbc20cddd7ab98ff844f9ef80/results/antibody_reweighting.csv
    antibody_sources: https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/683abdd4c3277a3cbc20cddd7ab98ff844f9ef80/results/antibody_sources.csv
    config: https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/683abdd4c3277a3cbc20cddd7ab98ff844f9ef80/config.yaml
  module_url: https://raw.githubusercontent.com/jbloomlab/SARS2-RBD-escape-calc/683abdd4c3277a3cbc20cddd7ab98ff844f9ef80/escapecalculator.py
  parent_
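The printed configuration above is cut off, but the cells below read several keys from `config.yaml`. As a minimal sketch, assuming the required keys are exactly those the code accesses (inferred from the code cells, not from a documented schema), a hypothetical helper `check_config` could report missing keys before any sequences are fetched:

```python
from typing import Any

# Keys read by the notebook code, as paths into the nested config.
# Assumption: this list is inferred from the code cells, not a documented schema.
REQUIRED_KEYS = [
    ("ref_spike",),
    ("parent_spikes",),
    ("components_per_cocktail",),
    ("mutations_per_design",),
    ("escapecalculator", "kwargs"),
    ("escapecalculator", "module_url"),
    ("escapecalculator", "parent_existing_mut_sites"),
]


def check_config(config: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return dotted names of missing keys (empty if none)."""
    missing = []
    for path in REQUIRED_KEYS:
        node: Any = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    # the design loop looks up each parent in parent_existing_mut_sites
    if not missing:
        mut_sites = config["escapecalculator"]["parent_existing_mut_sites"]
        for name in config["parent_spikes"]:
            if name not in mut_sites:
                missing.append(f"escapecalculator.parent_existing_mut_sites.{name}")
    return missing
```

Calling `check_config(config)` right after loading the YAML should return an empty list; any entries name the keys to add.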

## Design *parent-vax* 
Get the reference and parent spike sequences from NCBI:

In [3]:
Bio.Entrez.email = "example@example.com"

print(f"Getting reference spike from accession {config['ref_spike']}")
with Bio.Entrez.efetch(id=config["ref_spike"], rettype="gb", retmode="text", db="protein") as f:
    ref_spike = Bio.SeqIO.read(f, "gb")
print(f"Got spike of length {len(ref_spike)}")

parents = {}
for name, acc in config["parent_spikes"].items():
    print(f"\nGetting parent spike {name} from {acc}")
    with Bio.Entrez.efetch(id=acc, rettype="gb", retmode="text", db="protein") as f:
        parents[name] = Bio.SeqIO.read(f, "gb")
    print(f"Got spike of length {len(parents[name])}")

Getting reference spike from accession YP_009724390
Got spike of length 1273

Getting parent spike XBB.1.16 from WCM02109
Got spike of length 1269


Align each parental spike to the Wuhan-Hu-1 reference with MUSCLE to get its list of mutations relative to the reference, and also a mapping from reference site number to the parent's amino acid (wildtype) at that site:

In [25]:
parent_wts = {}  # parent name -> {reference site: parent residue}

for name, seq in parents.items():
    with tempfile.TemporaryDirectory() as tmpdir:
        f_in = os.path.join(tmpdir, "in.fa")
        f_out = os.path.join(tmpdir, "out.fa")
        Bio.SeqIO.write([ref_spike, seq], f_in, "fasta")
        # check=True raises CalledProcessError if MUSCLE fails
        subprocess.run(
            ["muscle", "-align", f_in, "-output", f_out],
            check=True,
            capture_output=True,
        )
        # look up records by ID rather than relying on MUSCLE's output order
        aligned = {rec.id: rec for rec in Bio.SeqIO.parse(f_out, "fasta")}
        aligned_ref, aligned_seq = aligned[ref_spike.id], aligned[seq.id]
    muts = []
    r = 1  # site number in reference (Wuhan-Hu-1) numbering
    parent_wts[name] = {}
    for a_ref, a_seq in zip(aligned_ref.seq, aligned_seq.seq):
        parent_wts[name][r] = a_seq
        if a_ref != a_seq:
            muts.append(f"{a_ref}{r}{a_seq}")
        if a_ref != '-':  # only advance the site counter at reference residues
            r += 1
    print(f"\n{name} has the following {len(muts)} mutations:\n  " + "\n  ".join(muts))


XBB.1.16 has the following 43 mutations:
  T19I
  L24-
  P25-
  P26-
  A27S
  V83A
  G142D
  Y145-
  H146Q
  E180V
  Q183E
  V213E
  G252V
  G339H
  R346T
  L368I
  S371F
  S373P
  S375F
  T376A
  D405N
  R408S
  K417N
  N440K
  V445P
  G446S
  N460K
  S477N
  T478R
  E484A
  F486P
  F490S
  Q498R
  N501Y
  Y505H
  D614G
  H655Y
  N679K
  P681H
  N764K
  D796Y
  Q954H
  N969K
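The mutation-calling loop above walks the pairwise alignment column by column and advances the reference site counter `r` only at non-gap reference columns, so sites stay in Wuhan-Hu-1 numbering and deletions appear as e.g. `L24-`. The same logic, pulled into a hypothetical function `call_mutations` and run on short toy strings (not real spikes), isolates it from MUSCLE:

```python
def call_mutations(aligned_ref: str, aligned_seq: str) -> tuple[list[str], dict[int, str]]:
    """Return mutations in reference numbering and a site -> residue map.

    Mirrors the notebook loop: the site counter only advances at
    non-gap reference columns, so deletions in the query become e.g. "L5-".
    """
    if len(aligned_ref) != len(aligned_seq):
        raise ValueError("aligned sequences must have equal length")
    muts = []
    site_to_aa = {}
    r = 1  # current reference site
    for a_ref, a_seq in zip(aligned_ref, aligned_seq):
        site_to_aa[r] = a_seq
        if a_ref != a_seq:
            muts.append(f"{a_ref}{r}{a_seq}")
        if a_ref != "-":
            r += 1
    return muts, site_to_aa


muts, wts = call_mutations("MFVFLVLL", "MFIF--LL")
print(muts)  # ['V3I', 'L5-', 'V6-']
```

An insertion in the parent behaves differently: `call_mutations("MF-V", "MFKV")` reports it as `-3K` (attached to the following reference site), and the site map then holds `V` at site 3 because the next reference column overwrites the inserted residue.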


Write the *parent-vax* to a file:

In [5]:
os.makedirs("vax_designs", exist_ok=True)
parent_vax_file = "vax_designs/parent-vax.fa"
print(f"Writing parent-vax of {len(parents)} spike(s) to {parent_vax_file}")
with open(parent_vax_file, "w") as f:
    for name, seq in parents.items():
        f.write(f">{name}\n{str(seq.seq)}\n")

Writing parent-vax of 1 spike(s) to vax_designs/parent-vax.fa


## Now design mutated cocktails
First, set up the escape calculator:

In [14]:
# get and import the module
_ = urllib.request.urlretrieve(
    config["escapecalculator"]["module_url"],
    "escapecalculator.py",
)
import escapecalculator

escape_calc = escapecalculator.EscapeCalculator(**config["escapecalculator"]["kwargs"])

print(f"Using the following virus: {escape_calc.virus=}")

Using the following virus: escape_calc.virus='XBB'


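Now iterate through the cocktails and design the mutants according to the following criteria:

 - We include all the parents.
 - For the remaining components, we:
   1. Pick a parent.
   2. Choose the specified number of mutations that:
     - Cause the most escape according to the [RBD escape calculator](https://jbloomlab.github.io/SARS2-RBD-escape-calc/), subject to the condition that no mutation is repeated within the cocktail.
     - Are chosen one at a time: after each choice, the calculator rescores the escape still retained at every site given all sites mutated so far (the parent's existing mutations plus those already added to the design).
     - Are specified by site only (parent amino acid plus site number); the calculator does not choose the new amino acid.
   3. Pick the next parent (reusing a previously used one if needed) and choose a new set of mutations.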
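The greedy selection in step 2 can be sketched without the escape calculator. The scoring below is a toy stand-in, not the calculator's actual model: each hypothetical antibody has an escape value per site, an antibody's remaining binding after mutating sites `S` is taken as the product of `(1 - escape)` over `S`, and a site's score is its escape summed over antibodies weighted by their remaining binding. The loop picks the top site not yet used in the cocktail, then rescores; it also skips sites already mutated in the design, an extra guard the notebook leaves to the calculator's scores.

```python
from math import prod

# Toy data (hypothetical): antibody -> {site: escape fraction in [0, 1]}
ANTIBODY_ESCAPE = {
    "ab1": {484: 0.9, 486: 0.1},
    "ab2": {484: 0.8, 346: 0.5},
    "ab3": {456: 0.7, 486: 0.6},
}


def site_scores(mutated_sites, antibody_escape=ANTIBODY_ESCAPE):
    """Toy 'retained escape' per site, given the sites already mutated."""
    scores = {}
    for escape in antibody_escape.values():
        # fraction of this antibody still binding after the existing mutations
        remaining = prod(1 - escape.get(s, 0.0) for s in mutated_sites)
        for site, e in escape.items():
            scores[site] = scores.get(site, 0.0) + remaining * e
    return scores


def greedy_design(n_muts, existing_sites, used_in_cocktail):
    """Pick up to n_muts sites one at a time, rescoring after each pick.

    used_in_cocktail is updated in place so later designs avoid these sites.
    """
    mutated = list(existing_sites)
    chosen = []
    for _ in range(n_muts):
        scores = site_scores(mutated)
        candidates = {
            s: v for s, v in scores.items()
            if s not in used_in_cocktail and s not in mutated
        }
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        chosen.append(best)
        used_in_cocktail.add(best)
        mutated.append(best)
    return chosen


used = set()
print(greedy_design(2, [], used))  # [484, 456]
print(greedy_design(2, [], used))  # [486, 346]
```

Rescoring matters here: after 484 is chosen, the antibodies it escapes contribute little, so site 456 (escaping a different antibody) outranks 486 for the second pick.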
   

In [41]:
components_per_cocktail = config["components_per_cocktail"]
mutations_per_design = config["mutations_per_design"]
parent_existing_mut_sites = config["escapecalculator"]["parent_existing_mut_sites"]

if len(parents) >= components_per_cocktail:
    raise ValueError("nothing to design: there are at least as many parents as components")
print(f"All cocktails have {components_per_cocktail} components")

for cocktail, n_muts in mutations_per_design.items():
    print(f"\nDesigning {cocktail} with {n_muts} mutations per design")
    
    # every cocktail starts with the unmutated parents
    cocktail_seqs = [(name, str(seq.seq)) for name, seq in parents.items()]
    cocktail_mutations = set()  # all mutations in the cocktail so far; never repeat one
    
    i = 0
    while len(cocktail_seqs) < components_per_cocktail:
        # cycle through the parents, reusing them if there are more designs than parents
        design_parent_name, design_parent_seq = list(parents.items())[i % len(parents)]
        i += 1
        
        # sites already mutated in this parent, extended as mutations are added
        mut_sites = list(parent_existing_mut_sites[design_parent_name])
        
        design_mutations = []
        for _ in range(n_muts):
            # rescore escape given all sites mutated so far, then take the best
            # site whose mutation is not already used elsewhere in the cocktail
            mut_scores = (
                escape_calc.escape_per_site(mut_sites)
                .assign(
                    parent_aa=lambda x: x["site"].map(parent_wts[design_parent_name]),
                    mutation=lambda x: x["parent_aa"] + x["site"].astype(str),
                    score=lambda x: x["retained_escape"],
                )
                .sort_values("score", ascending=False)
                .query("mutation not in @cocktail_mutations")
            )
            if mut_scores.empty:
                raise ValueError(f"no unused mutations left for {cocktail}")
            mutation = mut_scores["mutation"].iloc[0]
            site = mut_scores["site"].iloc[0]
            design_mutations.append(mutation)
            cocktail_mutations.add(mutation)
            mut_sites.append(site)
            
        cocktail_seqs.append(
            (
                design_parent_name + "_" + "_".join(design_mutations),
                # placeholder: mutations are chosen per site (no new amino acid yet),
                # so the mutated sequence is not constructed here
                "seq",
            )
        )
    print(cocktail_seqs)

All cocktails have 5 components

Designing cocktail with 4 mutations per design
[('XBB.1.16', 'MFVFLVLLPLVSSQCVNLITRTQSYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPALPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLDVYQKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLVGKEGNFKNLREFVFKNIDGYFKIYSKHTPINLERDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPVDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFHEVFNATTFASVYAWNRKRISNCVADYSVIYNFAPFFAFKCYGVSPTKLNDLCFTNVYADSFVIRGNEVSQIAPGQTGNIADYNYKLPDDFTGCVIAWNSNKLDSKPSGNYNYLYRLFRKSKLKPFERDISTEIYQAGNRPCNGVAGPNCYSPLQSYGFRPTYGVGHQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEYVNNSYECDIPIGAGICASYQTQTKSHRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLKRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKYFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGI
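In the design cell, each designed component is stored with the placeholder string `"seq"` rather than a mutated sequence, and the calculator picks sites rather than amino-acid identities. Building a real sequence would therefore need (a) a choice of new amino acid at each site, which this notebook does not make, and (b) a map from Wuhan-Hu-1 site numbers to positions in the parent's own sequence, which differ because of the parent's deletions (XBB.1.16 lacks sites 24 to 26 and 145). A minimal sketch of (b), assuming the aligned reference and parent strings from the alignment step are kept; `ref_site_to_parent_index`, `apply_mutations`, and the example substitutions are hypothetical:

```python
def ref_site_to_parent_index(aligned_ref: str, aligned_parent: str) -> dict[int, int]:
    """Map 1-based reference sites to 0-based indices in the ungapped parent.

    Sites deleted in the parent are absent from the map.
    """
    mapping = {}
    r = 1  # reference site number
    p = 0  # index into the ungapped parent sequence
    for a_ref, a_par in zip(aligned_ref, aligned_parent):
        if a_ref != "-" and a_par != "-":
            mapping[r] = p
        if a_ref != "-":
            r += 1
        if a_par != "-":
            p += 1
    return mapping


def apply_mutations(aligned_ref: str, aligned_parent: str, mutations: dict[int, str]) -> str:
    """Apply {reference site: new amino acid} substitutions to the parent."""
    mapping = ref_site_to_parent_index(aligned_ref, aligned_parent)
    seq = list(aligned_parent.replace("-", ""))
    for site, new_aa in mutations.items():
        if site not in mapping:
            raise ValueError(f"site {site} is deleted in the parent")
        seq[mapping[site]] = new_aa
    return "".join(seq)


# toy alignment: the parent lacks reference sites 5 and 6
print(apply_mutations("MFVFLVLL", "MFIF--LL", {7: "K"}))  # MFIFKL
```

Going through the alignment, rather than subtracting a fixed offset, keeps the numbering correct on both sides of every deletion or insertion.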

In [30]:
escape.query("site in [484, 478, 486]")

| Unnamed: 0 | site | original_escape | retained_escape | wt |
|---|---|---|---|---|
| 137 | 478 | 0.018365 | 0.003594 | R |
| 143 | 484 | 0.044825 | 0.04039 | A |
| 145 | 486 | 0.017007 | 0.013597 | P |
