# Design cocktail SARS-CoV-2 vaccines
This notebook selects and designs the spike proteins for a SARS-CoV-2 cocktail vaccine.
The design is done by Jesse Bloom, for a project led by Drew Weissman.

We design the following vaccines:

 - **parent-vax**: unmutated spike(s) of leading SARS-CoV-2 variants at time of vaccine design.
 
 - **cocktail-vax**: spike(s) in *parent-vax* plus additional designed spikes carrying mutations predicted as likely to arise in future human SARS-CoV-2 variants.
 
 - **conservative-cocktail-vax**: like *cocktail-vax*, but fewer mutations per designed spike.
 
 - **aggressive-cocktail-vax**: like *cocktail-vax*, but more mutations per designed spike.

## Setup
Import Python modules:

In [34]:
import os
import subprocess
import tempfile

import altair as alt

import Bio.Entrez
import Bio.SeqIO

import pandas as pd

import yaml

Read configuration:

In [6]:
with open("config.yaml") as f:
    config = yaml.safe_load(f)
    
print("Here is the configuration:\n")
print(yaml.dump(config))

Here is the configuration:

components_per_cocktail: 5
mutations_per_design:
  aggressive-cocktail: 6
  cocktail: 4
  conservative-cocktail: 2
parent_spikes:
  XBB.1.16: WCM02109
ref_spike: YP_009724390
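
Because the later design steps depend on these settings, a quick sanity check can catch a malformed `config.yaml` early. The following is a minimal sketch rather than part of the original notebook: `check_config` is a hypothetical helper, and the ordering rule it enforces (conservative < cocktail < aggressive) is inferred from the vaccine descriptions at the top of the notebook.

```python
# Hypothetical helper, not part of the original notebook.
def check_config(config):
    """Raise ValueError if the configuration looks inconsistent."""
    required = {
        "components_per_cocktail",
        "mutations_per_design",
        "parent_spikes",
        "ref_spike",
    }
    missing = required - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")

    n_muts = config["mutations_per_design"]
    # Assumed rule: conservative designs carry the fewest mutations and
    # aggressive designs the most, as in the vaccine descriptions.
    if not (
        n_muts["conservative-cocktail"]
        < n_muts["cocktail"]
        < n_muts["aggressive-cocktail"]
    ):
        raise ValueError("mutations_per_design is not ordered as expected")

    if not config["parent_spikes"]:
        raise ValueError("at least one parent spike is required")
    if config["components_per_cocktail"] < 1:
        raise ValueError("components_per_cocktail must be positive")


# Same values as the configuration printed above, written as a dict so the
# example does not depend on config.yaml being present.
example_config = {
    "components_per_cocktail": 5,
    "mutations_per_design": {
        "aggressive-cocktail": 6,
        "cocktail": 4,
        "conservative-cocktail": 2,
    },
    "parent_spikes": {"XBB.1.16": "WCM02109"},
    "ref_spike": "YP_009724390",
}

check_config(example_config)
print("configuration looks consistent")
```

In the notebook itself, the same function could be called on the `config` object right after it is loaded.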



## Design *parent-vax* 
Fetch the reference and parent spike protein sequences from NCBI by accession:

In [19]:
# NCBI asks for a contact address with Entrez requests; replace this placeholder
Bio.Entrez.email = "example@example.com"

print(f"Getting reference spike from accession {config['ref_spike']}")
with Bio.Entrez.efetch(id=config["ref_spike"], rettype="gb", retmode="text", db="protein") as f:
    ref_spike = Bio.SeqIO.read(f, "gb")
print(f"Got spike of length {len(ref_spike)}")

parents = {}
for name, acc in config["parent_spikes"].items():
    print(f"\nGetting parent spike {name} from {acc}")
    with Bio.Entrez.efetch(id=acc, rettype="gb", retmode="text", db="protein") as f:
        parents[name] = Bio.SeqIO.read(f, "gb")
    print(f"Got spike of length {len(parents[name])}")

Getting reference spike from accession YP_009724390
Got spike of length 1273

Getting parent spike XBB.1.16 from WCM02109
Got spike of length 1269


Get the lists of mutations in each parental spike relative to the Wuhan-Hu-1 reference:

In [42]:
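for name, seq in parents.items():
    # Align the parent to the reference with MUSCLE (v5 command-line syntax).
    with tempfile.TemporaryDirectory() as tmpdir:
        f_in = os.path.join(tmpdir, "in.fa")
        f_out = os.path.join(tmpdir, "out.fa")
        Bio.SeqIO.write([ref_spike, seq], f_in, "fasta")
        # check=True raises CalledProcessError if MUSCLE exits with an error
        subprocess.run(
            ["muscle", "-align", f_in, "-output", f_out],
            check=True,
            capture_output=True,
        )
        # assumes MUSCLE writes the sequences in input order (reference first)
        aligned_ref, aligned_seq = list(Bio.SeqIO.parse(f_out, "fasta"))
    # Walk the alignment columns, numbering sites by reference position.
    muts = []
    r = 1  # 1-based position in the reference
    for a_ref, a_seq in zip(aligned_ref.seq, aligned_seq.seq):
        if a_ref != a_seq:
            muts.append(f"{a_ref}{r}{a_seq}")  # "-" as the new residue marks a deletion
        if a_ref != "-":
            r += 1  # gap columns in the reference do not advance the numbering
    print(f"\n{name} has the following {len(muts)} mutations:\n  " + "\n  ".join(muts))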


XBB.1.16 has the following 43 mutations:
  T19I
  L24-
  P25-
  P26-
  A27S
  V83A
  G142D
  Y145-
  H146Q
  E180V
  Q183E
  V213E
  G252V
  G339H
  R346T
  L368I
  S371F
  S373P
  S375F
  T376A
  D405N
  R408S
  K417N
  N440K
  V445P
  G446S
  N460K
  S477N
  T478R
  E484A
  F486P
  F490S
  Q498R
  N501Y
  Y505H
  D614G
  H655Y
  N679K
  P681H
  N764K
  D796Y
  Q954H
  N969K
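
The mutation list can be cross-checked against the sequence lengths reported earlier: every deletion shortens the parent by one residue relative to the reference. The sketch below is not part of the original notebook; `tally_mutations` is a hypothetical helper that parses strings in the format printed above and counts substitutions, deletions, and insertions.

```python
import re

# Mutations reported above for XBB.1.16 relative to the reference spike.
xbb_1_16_muts = """
T19I L24- P25- P26- A27S V83A G142D Y145- H146Q E180V Q183E V213E G252V
G339H R346T L368I S371F S373P S375F T376A D405N R408S K417N N440K V445P
G446S N460K S477N T478R E484A F486P F490S Q498R N501Y Y505H D614G H655Y
N679K P681H N764K D796Y Q954H N969K
""".split()


def tally_mutations(muts):
    """Count mutation types in strings such as 'T19I' or 'L24-'."""
    counts = {"substitution": 0, "deletion": 0, "insertion": 0}
    for mut in muts:
        m = re.fullmatch(r"([A-Z*-])(\d+)([A-Z*-])", mut)
        if m is None:
            raise ValueError(f"cannot parse mutation {mut!r}")
        ref_aa, _, new_aa = m.groups()
        if ref_aa == "-":
            counts["insertion"] += 1  # gap in the reference
        elif new_aa == "-":
            counts["deletion"] += 1  # gap in the parent
        else:
            counts["substitution"] += 1
    return counts


counts = tally_mutations(xbb_1_16_muts)
ref_length = 1273  # reference spike length reported above
expected_length = ref_length - counts["deletion"] + counts["insertion"]
print(counts)
print(f"expected parent length: {expected_length}")
```

For this list the tally is 39 substitutions and 4 deletions, so the expected parent length is 1273 - 4 = 1269, which matches the length reported when the XBB.1.16 spike was fetched.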


Write the *parent-vax* to a file:

In [46]:
os.makedirs("vax_designs", exist_ok=True)
parent_vax_file = "vax_designs/parent-vax.fa"
print(f"Writing parent-vax of {len(parents)} to {parent_vax_file}")
with open(parent_vax_file, "w") as f:
    for name, seq in parents.items():
        f.write(f">{name}\n{str(seq.seq)}\n")

Writing parent-vax of 1 to vax_designs/parent-vax.fa
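
After writing a design file, it can be worth reading it back to confirm that the names and sequences survived the round trip. The following is a minimal sketch using only the standard library. `read_fasta` is a hypothetical helper, and `toy_spikes` holds short made-up stand-ins for the real `parents` records so that the example runs without network access.

```python
import os
import tempfile


def read_fasta(path):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    name = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:]
                if name in records:
                    raise ValueError(f"duplicate header {name!r}")
                records[name] = []
            elif name is None:
                raise ValueError("sequence data before the first header")
            else:
                records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Toy stand-ins for the parent spikes (illustrative, not real designs).
toy_spikes = {"toy-spike-1": "MFVFLVLLPLVSSQ", "toy-spike-2": "MFVFLVLLPLVSIQ"}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy-vax.fa")
    # Write in the same one-header, one-sequence-line layout used above.
    with open(path, "w") as f:
        for name, seq in toy_spikes.items():
            f.write(f">{name}\n{seq}\n")
    round_trip = read_fasta(path)

assert round_trip == toy_spikes
print(f"read back {len(round_trip)} sequences unchanged")
```

In the notebook, the same comparison could be made between `read_fasta(parent_vax_file)` and `{name: str(seq.seq) for name, seq in parents.items()}`.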
