# Analyzing escape from polyclonal antibodies

## Overview and theory
Here we consider the situation where a viral protein is bound by polyclonal antibodies, such as might be found in sera.
We want to determine the contribution of each mutation to escaping these antibodies, being cognizant of the fact that different antibodies target different epitopes.

The experimentally measured quantity is as follows: at each concentration $c$ of the antibody mixture, we measure $p_v\left(c\right)$, which is the fraction of variants $v$ that escape binding (or neutralization, whichever is being experimentally measured) by all antibodies in the mix.
For instance, $p_v\left(c\right)$ might be the fraction of variants $v$ that fall into an antibody-escape FACS bin (in a yeast-display deep mutational scanning experiment as in [Greaney et al (2020)](https://www.sciencedirect.com/science/article/pii/S1931312821000822)) or that are not neutralized (in a virus neutralization deep mutational scanning experiment as in [Lee et al (2019)](https://elifesciences.org/articles/49324)).

We assume that each antibody in the mix binds to one of $E$ epitopes on the protein.
Let $U_e\left(v,c\right)$ be the fraction of the time that epitope $e$ is not bound on variant $v$ when the mix is at concentration $c$.
Then, assuming antibodies bind independently without competition, the overall experimentally measured fraction of variants that escape binding at concentration $c$ is simply:
$$
p_v\left(c\right) = \prod_{e=1}^E U_e\left(v, c\right),
\label{pv} \tag{1}
$$
where $e$ ranges over the $E$ epitopes.

We next want to write $U_e\left(v,c\right)$ (the fraction of time epitope $e$ is unbound for variant $v$ at concentration $c$) in terms that can be related to underlying physical properties like the relative concentrations of antibodies targeting different epitopes, and the affinities of these antibodies.
If we assume that there is no competition among antibodies binding to different epitopes, that all antibodies targeting a given epitope have the same affinity, and that there is no cooperativity in antibody binding (the Hill coefficient of antibody binding is one), then the fraction of all variants $v$ that are not bound by an antibody targeting epitope $e$ at concentration $c$ is given by a Hill equation:
$$
\begin{eqnarray}
U_e\left(v, c\right) &=& \frac{1}{1 + \frac{c f_e}{K_{d,e}\left(v\right)}} \\
&=& \frac{1}{1 + c f_e \exp \left(-\frac{\Delta G_e\left(v\right)}{RT}\right)} \\
&=& \frac{1}{1 + c \exp \left(-\phi_e\left(v\right)\right)}, \\
\label{Ue} \tag{2}
\end{eqnarray}
$$
where $\phi_e\left(v\right)$ represents the total binding activity of antibodies to epitope $e$ against variant $v$. It is related to the free energy of binding $\Delta G_e\left(v\right)$ and the fraction of antibodies $f_e$ targeting epitope $e$ by $\phi_e\left(v\right) = \frac{\Delta G_e\left(v\right)}{RT} - \ln f_e$. Here $RT$ is the product of the molar gas constant and the temperature, and $K_{d,e}\left(v\right) = \exp\left(\frac{\Delta G_e\left(v\right)}{RT}\right)$ is the dissociation constant.
The value of $\phi_e\left(v\right)$ depends both on the affinity of antibodies targeting epitope $e$ (via $\Delta G_e\left(v\right)$) and on the abundance of antibodies with this specificity in the overall mix (via $f_e$), and so is a measure of the overall importance of antibodies with this specificity in the polyclonal mix.
Smaller (more negative) values of $\phi_e\left(v\right)$ correspond to a higher overall contribution of antibodies with specificity for epitope $e$ to the activity against variant $v$.
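
As a quick numerical check of Equation 2, the sketch below evaluates the first form (in terms of $K_{d,e}$ and $f_e$) and the last form (in terms of $\phi_e$) and confirms that they agree. The function names and the numerical values are illustrative assumptions, not measurements or part of any library.

```python
import numpy


def unbound_from_kd(c, f_e, delta_g_over_rt):
    """First form of Eq. 2: U_e = 1 / (1 + c * f_e / K_d), with K_d = exp(dG / RT)."""
    k_d = numpy.exp(delta_g_over_rt)
    return 1 / (1 + c * f_e / k_d)


def unbound_from_phi(c, phi):
    """Last form of Eq. 2: U_e = 1 / (1 + c * exp(-phi))."""
    return 1 / (1 + c * numpy.exp(-phi))


# made-up illustrative values (not from any experiment)
c = 2.0                  # concentration of the antibody mix
f_e = 0.3                # fraction of antibodies targeting epitope e
delta_g_over_rt = -4.0   # free energy of binding in units of RT

# phi_e combines affinity and abundance: phi = dG / RT - ln(f_e)
phi = delta_g_over_rt - numpy.log(f_e)

u_kd = unbound_from_kd(c, f_e, delta_g_over_rt)
u_phi = unbound_from_phi(c, phi)
print(f"U_e from K_d form: {u_kd:.4f}; U_e from phi form: {u_phi:.4f}")
```

Lowering `phi` in this sketch (stronger or more abundant antibodies) lowers the unbound fraction, in line with the interpretation of $\phi_e\left(v\right)$ given above.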

Finally, we want to frame $\phi_e\left(v\right)$ in terms of the actual quantities of biological interest.
There are two quantities of biological interest:
1. The activity of antibodies binding epitope $e$ in the unmutated ("wildtype") protein background, which will be denoted as $a_{\rm{wt}, e}$.
2. The extent of escape mediated by each amino-acid mutation $m$ on binding of antibodies targeting epitope $e$, which will be denoted as $\beta_{m,e}$.

In order to infer these quantities, we make the assumption that mutations have additive effects on the free energy of binding (and so $\phi_e\left(v\right)$) for antibodies targeting any given epitope $e$.
Specifically, let $a_{\rm{wt}, e}$ be the total activity against the "wildtype" protein of antibodies targeting epitope $e$, with larger values of $a_{\rm{wt}, e}$ indicating stronger antibody binding (or neutralization) at this epitope.
Let $\beta_{m,e}$ be the extent to which mutation $m$ (where $1 \le m \le M$) reduces binding by antibodies targeting epitope $e$, with larger values of $\beta_{m,e}$ corresponding to more escape from binding (a value of 0 means the mutation has no effect on antibodies targeting this epitope).
We can then write:
$$
\phi_e\left(v\right) = -a_{\rm{wt}, e} + \sum_{m=1}^M \beta_{m,e} b\left(v\right)_m
\label{phie} \tag{3}
$$
where $b\left(v\right)_m$ is 1 if variant $v$ has mutation $m$ and 0 otherwise.

Together, Equations \eqref{pv}, \eqref{Ue}, and \eqref{phie} relate the quantities of biological interest ($a_{\rm{wt}, e}$ and $\beta_{m,e}$) to the experimental measurables ($p_v\left(c\right)$).
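
The three equations can be chained in a few lines of code. The following is a minimal sketch under the stated model assumptions, using a toy system with two epitopes and three mutations; the function `prob_escape` and all numerical values are hypothetical and are not part of `dms_variants`.

```python
import numpy


def prob_escape(a_wt, beta, b, c):
    """Compute p_v(c) for one variant from Equations 1-3.

    a_wt : array of shape (E,), wildtype activities for each epitope
    beta : array of shape (M, E), escape of each mutation at each epitope
    b    : array of shape (M,), 1 if the variant has the mutation, else 0
    c    : concentration of the antibody mix
    """
    phi = -a_wt + b @ beta                # Equation 3, shape (E,)
    u = 1 / (1 + c * numpy.exp(-phi))     # Equation 2, shape (E,)
    return u.prod()                       # Equation 1


# toy values: E = 2 epitopes, M = 3 mutations
a_wt = numpy.array([1.0, 3.0])
beta = numpy.array([[0.0, 4.0],   # mutation 0 escapes only the second epitope
                    [2.0, 0.0],   # mutation 1 escapes only the first epitope
                    [0.0, 0.0]])  # mutation 2 has no effect

wildtype = numpy.array([0.0, 0.0, 0.0])
single_mutant = numpy.array([1.0, 0.0, 0.0])

p_wt = prob_escape(a_wt, beta, wildtype, c=1.0)
p_mut = prob_escape(a_wt, beta, single_mutant, c=1.0)
print(f"p_v(c=1): wildtype = {p_wt:.4f}, single mutant = {p_mut:.4f}")
```

Because the epitope terms multiply, a mutation that escapes only the dominant epitope raises $p_v\left(c\right)$ but the variant can still be bound through the other epitope.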

## Simulate polyclonal antibody mix targeting SARS-CoV-2 RBD
We will simulate a hypothetical polyclonal antibody mix designed to represent antibodies targeting three major neutralizing "epitopes" on the SARS-CoV-2 receptor-binding domain (RBD) using the classification scheme of [Barnes et al (2020)](https://www.nature.com/articles/s41586-020-2852-1).
In particular, [Barnes et al (2020)](https://www.nature.com/articles/s41586-020-2852-1) divided anti-RBD antibodies that bind to the receptor-binding motif into three classes.
For each of these classes, we will use a single well-studied monoclonal antibody to represent how mutations affect antibodies of that class (of course, the reality might be more complex as there are many somewhat distinct antibodies in each class):

In [1]:
# relate monoclonal antibodies to their epitope class
antibody_to_epitope = {
    'LY-CoV016': 'class 1',
    'LY-CoV555': 'class 2',
    'REGN10987': 'class 3',
    }

For each of these antibodies, we have experimental measurements of how all functionally tolerated single amino-acid mutations to the RBD affect binding by each monoclonal antibody in isolation.
These measurements were made using deep mutational scanning in the following papers:
  - *LY-CoV016* and *REGN10987*: [Starr et al (2021), Science](https://science.sciencemag.org/content/371/6531/850)
  - *LY-CoV555*: [Starr et al (2021), bioRxiv](https://www.biorxiv.org/content/10.1101/2021.02.17.431683v1)

The measurements are in the file [mutation_escape_fractions.csv](mutation_escape_fractions.csv), and consist of estimates of the "escape fraction" $x_{m,e}$ for each mutation $m$ against the monoclonal antibody targeting epitope $e$.
These escape fractions represent the probability that an RBD carrying only that amino-acid mutation is unbound by the antibody at a concentration where only $\sim$0.1% of the unmutated RBD is unbound.
Note that $x_{m,e}$ corresponds to the $U_e$ values defined above. Let $a_{\rm{wt},e}^{\rm{monoclonal}}$ represent the activity of the monoclonal antibody targeting epitope $e$ (these values are distinct from the activities for each epitope in the polyclonal mix that we will simulate below), and set the concentration to $c = 1$ (allowable, since a change in the units of concentration is equivalent to adding a constant to the activity $a$). We can then compute the extent of escape $\beta_{m,e}$ mediated by mutation $m$ against the antibody targeting epitope $e$ from the following two equations:
$$
\begin{eqnarray}
0.001 &=& \frac{1}{1 + \exp\left(a_{\rm{wt},e}^{\rm{monoclonal}}\right)} \\
x_{m,e} &=& \frac{1}{1 + \exp\left(a_{\rm{wt},e}^{\rm{monoclonal}} - \beta_{m,e}\right)}, \\
\end{eqnarray}
$$
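The first equation gives $a_{\rm{wt},e}^{\rm{monoclonal}} = \ln 999 \approx 6.9$; substituting this into the second equation and solving for $\beta_{m,e}$ yields: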
$$
\beta_{m,e} = 6.9 - \ln\left(1 / x_{m,e} - 1\right).
$$
This equation allows us to calculate the extent of escape $\beta_{m,e}$ on the monoclonal antibody targeting epitope $e$ from the prior deep mutational scanning that measured single amino-acid escape fractions for individual antibodies.
We do that below; in order to avoid calculating $\beta_{m,e}$ values that are negative or positive infinity, we first set a floor and ceiling on the measured escape fractions $x_{m,e}$ of 0.0001 and 0.9999.
The resulting $\beta_{m,e}$ values are in the `mut_escape_df` data frame column called *escape*:

In [2]:
import numpy

import pandas as pd

escape_frac_floor = 0.0001
escape_frac_ceil = 0.9999

mut_escape_df = (
    pd.read_csv('polyclonal_data/mutation_escape_fractions.csv')
    .assign(epitope=lambda x: x['antibody'].map(antibody_to_epitope),
            escape_fraction=lambda x: x['escape_fraction'].clip(lower=escape_frac_floor,
                                                                upper=escape_frac_ceil),
            escape=lambda x: 6.9 - numpy.log(1 / x['escape_fraction'] - 1),
            )
    [['epitope', 'mutation', 'escape', 'antibody', 'escape_fraction']]
    )

mut_escape_df

| | epitope | mutation | escape | antibody | escape_fraction |
|---|---|---|---|---|---|
| 0 | class 1 | N331A | 0.082265 | LY-CoV016 | 0.001093 |
| 1 | class 1 | N331D | 0.776965 | LY-CoV016 | 0.002187 |
| 2 | class 1 | N331E | 0.082265 | LY-CoV016 | 0.001093 |
| 3 | class 1 | N331F | 0.082265 | LY-CoV016 | 0.001093 |
| 4 | class 1 | N331G | 1.701227 | LY-CoV016 | 0.005493 |
| ... | ... | ... | ... | ... | ... |
| 5791 | class 3 | T531R | 0.726213 | REGN10987 | 0.002079 |
| 5792 | class 3 | T531S | 0.726213 | REGN10987 | 0.002079 |
| 5793 | class 3 | T531V | 0.724766 | REGN10987 | 0.002076 |
| 5794 | class 3 | T531W | 0.722833 | REGN10987 | 0.002072 |
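
As a sanity check on the conversion from escape fractions to $\beta_{m,e}$, the sketch below applies the same formula to a few escape fractions (two of them taken from the rows shown above) and inverts it again. The helper names `escape_fraction_to_beta` and `beta_to_escape_fraction` and the constant `A_WT_MONOCLONAL` are hypothetical and used only for illustration.

```python
import numpy

# activity of the monoclonal antibody: rounded ln(999), as in the text
A_WT_MONOCLONAL = 6.9


def escape_fraction_to_beta(x, floor=0.0001, ceil=0.9999):
    """Convert escape fractions to beta, clipping to avoid infinite values."""
    x = numpy.clip(x, floor, ceil)
    return A_WT_MONOCLONAL - numpy.log(1 / x - 1)


def beta_to_escape_fraction(beta):
    """Inverse of the conversion (at c = 1)."""
    return 1 / (1 + numpy.exp(A_WT_MONOCLONAL - beta))


# 0.001093 and 0.002187 appear in the table above; 0, 0.5 and 1 are edge cases
x = numpy.array([0.0, 0.001093, 0.002187, 0.5, 1.0])
beta = escape_fraction_to_beta(x)

for x_i, beta_i in zip(x, beta):
    print(f"escape fraction {x_i:.6f} -> beta {beta_i:.4f}")
```

Escape fractions of exactly 0 or 1 map to finite values only because of the floor and ceiling; without the clipping they would give negative or positive infinity.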


The key columns in the `mut_escape_df` data frame are the epitope, mutation, and *escape* ($\beta_{m,e}$) values, although we also retain the antibody and single-antibody deep mutational scanning escape fraction measurements.

Note also that the data frame only includes 1932 of the $201 \times 19 = 3819$ possible amino-acid mutations to the RBD; this is because only about half of the mutations are functionally tolerated.

Now that we have defined the mutation escape values for each epitope, we want to simulate a polyclonal antibody mix targeting these three epitopes.
We will let the activity of the polyclonal antibody mix be highest against the class 2 epitope, then next highest against the class 3 epitope, and lowest against the class 1 epitope.
Specifically, at a concentration of $c = 1$ of the polyclonal antibody mix, let the activities against the different epitopes be such that the probability of the unmutated RBD being unbound by each antibody class is 4% for class 2, 10% for class 3, and 25% for class 1 (so overall, the probability that the unmutated RBD is not bound by any antibody is $0.1\% = 4\% \times 10\% \times 25\%$).
We can therefore calculate the activities $a_{\rm{wt},e}$ against unmutated (wildtype) RBD for each epitope $e$ using the equation for $U_e$, which at $c = 1$ and with no mutations gives $a_{\rm{wt},e} = \ln\left(1 / U_e - 1\right)$:

In [3]:
activity_wt_df = (
    pd.DataFrame({'epitope': ['class 1', 'class 2', 'class 3'],
                  'U_e_wt':  [     0.25,      0.04,       0.1]})
    .assign(activity=lambda x: numpy.log(1 / x['U_e_wt'] - 1))
    )

activity_wt_df

| | epitope | U_e_wt | activity |
|---|---|---|---|
| 0 | class 1 | 0.25 | 1.098612 |
| 1 | class 2 | 0.04 | 3.178054 |
| 2 | class 3 | 0.1 | 2.197225 |
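
The activities in this table can be checked against the simulation design: plugging them back into the equation for $U_e$ should recover the chosen per-epitope unbound fractions, and their product should be the overall 0.1% for unmutated RBD at $c = 1$. The sketch below does this with plain `numpy`; the helper `unbound_wt` is a hypothetical name used only here.

```python
import numpy

# per-epitope unbound fractions for unmutated RBD at c = 1, as chosen above
u_wt = {'class 1': 0.25, 'class 2': 0.04, 'class 3': 0.1}

# activities from the equation for U_e at c = 1 with no mutations
activity = {epitope: numpy.log(1 / u - 1) for epitope, u in u_wt.items()}


def unbound_wt(a, c=1.0):
    """Fraction of unmutated RBD not bound at one epitope with activity a."""
    return 1 / (1 + c * numpy.exp(a))


# overall unbound fraction is the product over epitopes (Equation 1)
p_wt = numpy.prod([unbound_wt(a) for a in activity.values()])
print(f"overall unbound fraction of unmutated RBD at c = 1: {p_wt:.4f}")

# the same activities at other concentrations of the mix
for c in [0.1, 1.0, 10.0]:
    p = numpy.prod([unbound_wt(a, c) for a in activity.values()])
    print(f"c = {c}: unbound fraction = {p:.3g}")
```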


Now initialize the `Polyclonal` object:

In [4]:
import dms_variants.polyclonal

polyclonal = dms_variants.polyclonal.Polyclonal(
                activity_wt_df=activity_wt_df,
                mut_escape_df=mut_escape_df)

print(f"Here are the epitopes: {polyclonal.epitopes}")
print(f"Here is the number of mutations: {len(polyclonal.mutations)}")
print(f"Here is the number of sites: {len(polyclonal.sites)}")

Here are the epitopes: ('class 1', 'class 2', 'class 3')
Here is the number of mutations: 1932
Here is the number of sites: 173


In [5]:
import itertools

import altair as alt

alt.data_transformers.disable_max_rows()

#@staticmethod
def plot_mut_escape_heatmap(self,
                            *,
                            epitopes=None,
                            alphabet=None,
                            all_sites=True,
                            all_alphabet=True,
                            ):
    """Make heatmaps of the mutation escape values for each epitope.

    Parameters
    ----------
    epitopes : array-like or None
        Make plots for these epitopes. If `None`, use all epitopes.
    alphabet : array-like or None
        Order to plot alphabet letters (e.g., amino acids). If `None`, same
        order as `alphabet` used to initialize this `Polyclonal` object.
    all_sites : bool
        Plot all sites in range from first to last site even if some
        have no data.
    all_alphabet : bool
        Plot all letters in the alphabet (e.g., amino acids) even if some
        have no data.
    
    """
    if epitopes is None:
        epitopes = self.epitopes
    elif not set(epitopes).issubset(set(self.epitopes)):
        raise ValueError('invalid entries in `epitopes`')
    df = self.mut_escape_df.query('epitope in @epitopes')
    
    # get alphabet and sites, expanding to all if needed
    if alphabet is None:
        alphabet = self.alphabet
    elif set(alphabet) != set(self.alphabet):
        raise ValueError('`alphabet` and `Polyclonal.alphabet` do not have same characters')
    if not all_alphabet:
        # keep only letters observed as a mutant or a wildtype amino acid
        observed = set(df['mutant']) | set(df['site'].map(self.wts))
        alphabet = [c for c in alphabet if c in observed]
    if all_sites:
        sites = list(range(min(self.sites), max(self.sites) + 1))
    else:
        sites = self.sites
        assert set(sites) == set(df['site'])
    df = (df
          [['epitope', 'site', 'mutant', 'escape']]
          .merge(pd.DataFrame(itertools.product(epitopes, sites, alphabet),
                              columns=['epitope', 'site', 'mutant']),
                 how='right')
          .assign(wildtype=lambda x: x['site'].map(self.wts),
                  mutation=lambda x: x['wildtype'].fillna('') + x['site'].astype(str) + x['mutant'],
                  epitope=lambda x: pd.Categorical(x['epitope'], epitopes, ordered=True),
                  mutant=lambda x: pd.Categorical(x['mutant'], alphabet, ordered=True),
                  # mark wildtype cells with a `x`
                  wildtype_char=lambda x: (x['mutant'] == x['wildtype']).map({True: 'x',
                                                                              False: ''}),
                  # wildtype has escape of 0 by definition
                  escape=lambda x: x['escape'].where(x['mutant'] != x['wildtype'], 0),
                  )
          .sort_values(['epitope', 'site', 'mutant'])
          )
    
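    # the tidy data frame is returned here for inspection, so the
    # chart-building code below is currently not executed
    return df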
    
    # selection cells
    cell_selector = alt.selection_single(on='mouseover',
                                         empty='none')
    
    # make list of heatmaps for each epitope
    charts = []
    for epitope in epitopes:
        # base chart
        base = (alt.Chart(df.query('epitope == @epitope'))
                .encode(x=alt.X('site:O',
                                scale=alt.Scale(domain=sites)),
                        y=alt.Y('mutant:O',
                                sort=list(alphabet)),
                       )
                )
        # heatmap for cells with data
        heatmap = (base
                   .mark_rect()
                   .encode(color='escape:Q',
                           stroke=alt.value('black'),
                           strokeWidth=alt.condition(cell_selector,
                                                     alt.value(2.5),
                                                     alt.value(0.2)),
                           tooltip=[alt.Tooltip('mutation:N'),
                                    alt.Tooltip('escape:Q', format='.3g')],
                           )
                   )
        # nulls for cells with missing data
        nulls = (base
                 .transform_filter('!isValid(datum.escape)')
                 .mark_rect(opacity=0.25)
                 .encode(alt.Color('escape:N',
                                   scale=alt.Scale(scheme='greys'),
                                   legend=None),
                         )
                 )
        # mark wildtype cells
        wildtype = (base
                    .mark_text(color='black')
                    .encode(text=alt.Text('wildtype_char:N'))
                    )
        # combine the elements
        charts.append((heatmap + nulls + wildtype)
                      .interactive()
                      .add_selection(cell_selector)
                      )

    return alt.vconcat(*charts)

df = plot_mut_escape_heatmap(polyclonal)

In [8]:
epitopes = polyclonal.epitopes
alphabet = polyclonal.alphabet

(polyclonal.mut_escape_df
 [['epitope', 'site', 'mutant', 'escape']]
 .assign(
        epitope=lambda x: pd.Categorical(x['epitope'], epitopes, ordered=True),
        mutant=lambda x: pd.Categorical(x['mutant'], alphabet, ordered=True),
        )
 .sort_values(['epitope', 'site', 'mutant'])
 )

| | epitope | site | mutant | escape |
|---|---|---|---|---|
| 0 | class 1 | 331 | A | 0.082265 |
| 1 | class 1 | 331 | D | 0.776965 |
| 2 | class 1 | 331 | E | 0.082265 |
| 3 | class 1 | 331 | F | 0.082265 |
| 4 | class 1 | 331 | G | 1.701227 |
| ... | ... | ... | ... | ... |
| 5791 | class 3 | 531 | R | 0.726213 |
| 5792 | class 3 | 531 | S | 0.726213 |
| 5793 | class 3 | 531 | V | 0.724766 |
| 5794 | class 3 | 531 | W | 0.722833 |
