# Compute probabilities of escape
In some experiments that involve antibody selections, it is possible to spike in a "neutralization standard", which is a set of variants known not to be affected by the antibody.
In such cases, it is then possible to compute the probability of escape of each variant, which is just its change in frequency relative to the standard.
If such experiments are performed at enough antibody concentrations, it is even possible to reconstruct a conventional neutralization curve.
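As a rough illustration of "change in frequency relative to the standard", the sketch below computes a probability of escape from four raw counts. The formula (variant-to-standard ratio with antibody, divided by the same ratio without antibody) is an assumption consistent with the description above, and the function name and counts are hypothetical rather than part of `dms_variants`.

```python
def prob_escape(variant_ab, standard_ab, variant_no_ab, standard_no_ab):
    """Estimate a variant's probability of escape from raw counts.

    Assumed formula: the variant-to-standard count ratio in the antibody
    selection divided by the same ratio in the no-antibody control.
    """
    if min(standard_ab, standard_no_ab, variant_no_ab) <= 0:
        raise ValueError("standard and no-antibody variant counts must be > 0")
    return (variant_ab / standard_ab) / (variant_no_ab / standard_no_ab)


# Hypothetical counts, not data from the experiment.
# A variant that is strongly depleted relative to the standard:
sensitive = prob_escape(
    variant_ab=10, standard_ab=1000, variant_no_ab=500, standard_no_ab=100
)
# A variant whose frequency relative to the standard is unchanged:
escaped = prob_escape(
    variant_ab=4000, standard_ab=1000, variant_no_ab=400, standard_no_ab=100
)
```

Under this definition a fully neutralized variant has a value near 0 and a fully escaping variant has a value near 1; noise in the counts can push estimates slightly above 1.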

This notebook illustrates how to use `dms_variants` to compute these probabilities of escape.

First, import Python modules:

In [None]:
import altair as alt

import dms_variants.codonvarianttable

Read in the `CodonVariantTable`.
These data are a subset of the variant counts from a real experiment on the SARS-CoV-2 spike:

In [None]:
with open("spike.txt") as f:
    spike_seq = f.read().strip()

variants = dms_variants.codonvarianttable.CodonVariantTable.from_variant_count_df(
    variant_count_df_file="prob_escape_codon_variant_table.csv",
    primary_target="spike",
    geneseq=spike_seq,
    allowgaps=True,
)

In [None]:
import pickle

with open("codon_variant_table.pickle", "rb") as f:
    variants = pickle.load(f)

In [None]:
(
    variants.variant_count_df
    .query("sample.str.startswith('2021-12-14')", engine="python")
    .query("not sample.str.contains('267C')", engine="python")
    .assign(
        # shorten the sample and library names
        sample=lambda x: (
            x["sample"]
            .str.replace("2021-12-14_", "")
            .str.replace("_antibody_", "_")
            .astype(str)
        ),
        library=lambda x: x["library"].map(
            {"Virus_Library_A_051": "lib1", "Virus_Library_B_072": "lib2"}
        ),
    )
    .sort_values(["library", "sample", "target", "barcode"])
    .dropna(axis=0)
    .drop(columns=["aa_substitutions", "n_codon_substitutions", "n_aa_substitutions"])
    .groupby(["library", "sample", "target"])
    .head(n=5000)
    .to_csv("prob_escape_codon_variant_table.csv", index=False)
)

In [None]:
# allow more rows for Altair
_ = alt.data_transformers.disable_max_rows()

Change working directory to top directory of repo:

In [None]:
import os
os.chdir("../")

## Read input data
Read configuration:

In [None]:
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

Read the barcode run information:

In [None]:
import pandas as pd

barcode_runs = (
    pd.read_csv(config["barcode_runs"])
    .assign(antibody=lambda x: x["antibody"].fillna("no antibody"))
    .drop(columns=["fastq_R1", "notes", "library_sample"])
)

Read the mapping between sequential and reference site numbering:

In [None]:
site_numbering_map = pd.read_csv(config["site_numbering_map"])

Get the primary target:

In [None]:
primary_target = config["gene"]

Read the codon-variant table:

In [None]:
with open(config["codon_variant_table_pickle"], "rb") as f:
    variants = pickle.load(f)
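The cells in this notebook read five keys from `config` (`barcode_runs`, `site_numbering_map`, `gene`, `codon_variant_table_pickle`, and `antibody_selections`). A minimal sketch of a check that fails early when one is missing might look like the following; the helper name and the example values are hypothetical and not part of the original pipeline.

```python
# Keys that the cells in this notebook look up in `config`
REQUIRED_KEYS = [
    "barcode_runs",
    "site_numbering_map",
    "gene",
    "codon_variant_table_pickle",
    "antibody_selections",
]


def missing_config_keys(config):
    """Return the required keys that are absent from `config`, in order."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Hypothetical, deliberately incomplete configuration with placeholder values
example_config = {"barcode_runs": "barcode_runs.csv", "gene": "spike"}
missing = missing_config_keys(example_config)
```

In practice one could call `missing_config_keys(config)` right after loading `config.yaml` and raise an error if the returned list is not empty.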

## Sample pairings for antibody selections

Get sample pairings that correspond to antibody selections.
We pair each antibody selection with the no-antibody control that shares all of its other properties (e.g., date, virus_batch, library).

In [None]:
ab_selections = (
    barcode_runs
    .query("sample_type == 'antibody'")
    .rename(columns={"sample": "antibody_sample"})
    .drop(columns="sample_type")
)
assert len(ab_selections) == len(ab_selections.drop_duplicates())

control_selections = (
    barcode_runs
    .query("sample_type == 'no-antibody_control'")
    .rename(columns={"sample": "no-antibody_sample"})
    .drop(columns=["antibody", "antibody_concentration", "sample_type"])
)
assert len(control_selections) == len(control_selections.drop_duplicates())

# with no `on` argument, merge on all columns shared by the two data frames
selections = ab_selections.merge(
    control_selections,
    how="left",
    validate="many_to_one",
)
assert selections.notnull().all().all()
assert len(selections) == len(selections.groupby(["library", "antibody_sample"]))
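To make the pairing logic concrete, here is a toy version of the same merge. The data frame and its values are hypothetical. It shows that the merge joins on every shared column when `on` is not given, and that `validate="many_to_one"` raises an error if an antibody sample could match more than one control.

```python
import pandas as pd

# Hypothetical barcode runs (toy values, not from the experiment)
runs = pd.DataFrame(
    {
        "library": ["lib1", "lib1", "lib1"],
        "date": ["2021-12-14"] * 3,
        "sample": ["ab_0.5", "ab_2.0", "no_ab"],
        "sample_type": ["antibody", "antibody", "no-antibody_control"],
        "antibody_concentration": [0.5, 2.0, None],
    }
)

antibody = (
    runs.query("sample_type == 'antibody'")
    .rename(columns={"sample": "antibody_sample"})
    .drop(columns="sample_type")
)
control = (
    runs.query("sample_type == 'no-antibody_control'")
    .rename(columns={"sample": "no-antibody_sample"})
    .drop(columns=["sample_type", "antibody_concentration"])
)

# The shared columns here are `library` and `date`, so those are the join keys
paired = antibody.merge(control, how="left", validate="many_to_one")

# A second control with the same keys makes the pairing ambiguous
duplicated_control = pd.concat(
    [control, control.assign(**{"no-antibody_sample": "no_ab_rep2"})]
)
try:
    antibody.merge(duplicated_control, how="left", validate="many_to_one")
except pd.errors.MergeError:
    ambiguous = True
else:
    ambiguous = False
```

Both antibody samples pair with the single `no_ab` control, while the duplicated control is rejected rather than silently producing extra rows.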

Now compute, for each sample, the fraction of counts that come from the neutralization standard:

In [None]:
neut_standard_fracs = (
    variants.n_variants_df(primary_target_only=False)
    .assign(
        total_counts=lambda x: x.groupby(["library", "sample"])["count"].transform("sum"),
        neut_standard_frac=lambda x: x["count"] / x["total_counts"],
    )
    .query("target == 'neut_standard'")
    .rename(columns={"count": "neut_standard_counts"})
    .drop(columns=["target", "total_counts"])
)
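The same calculation on a toy table may make the `groupby` / `transform` step easier to follow. The counts below are hypothetical and simply mimic a data frame with one row per library, sample, and target.

```python
import pandas as pd

# Hypothetical per-target counts for one library and two samples
counts = pd.DataFrame(
    {
        "library": ["lib1"] * 4,
        "sample": ["ab", "ab", "no_ab", "no_ab"],
        "target": ["spike", "neut_standard", "spike", "neut_standard"],
        "count": [600, 400, 9900, 100],
    }
)

fracs = (
    counts.assign(
        # total counts over all targets in each library / sample
        total_counts=lambda x: (
            x.groupby(["library", "sample"])["count"].transform("sum")
        ),
        neut_standard_frac=lambda x: x["count"] / x["total_counts"],
    )
    .query("target == 'neut_standard'")
    .set_index("sample")["neut_standard_frac"]
)
```

Here the standard makes up 40% of the counts in the `ab` sample but only 1% in the `no_ab` sample, which is the pattern expected when the antibody neutralizes most of the non-standard variants.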

Merge these neutralization-standard fractions and counts into the selections data frame:

In [None]:
selections_w_neut_standards = selections
for stype in ["antibody", "no-antibody"]:
    selections_w_neut_standards = (
        selections_w_neut_standards
        .merge(
            neut_standard_fracs,
            left_on=["library", f"{stype}_sample"],
            right_on=["library", "sample"],
            validate="many_to_one",
            how="left",
        )
        .drop(columns="sample")
        .rename(
            columns={
                col: f"{stype}_{col}"
                for col in ["neut_standard_counts", "neut_standard_frac"]
            }
        )
    )

selections_w_neut_standards
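The paired fractions can also serve as a quick sanity check on each selection. The sketch below estimates the library-wide fraction of non-standard variants that are not neutralized, as the ratio of non-standard-to-standard counts with antibody divided by the same ratio without antibody. This is an assumption that follows from the definition of escape probability given earlier, not a step in the original pipeline, and the function name is hypothetical.

```python
def overall_fraction_not_neutralized(antibody_frac, no_antibody_frac):
    """Library-wide fraction of non-standard variants that escape.

    Both arguments are the fraction of counts from the neutralization
    standard, in the antibody and no-antibody samples respectively.
    """
    for frac in (antibody_frac, no_antibody_frac):
        if not 0 < frac < 1:
            raise ValueError("fractions must be strictly between 0 and 1")
    # ratio of non-standard to standard counts in each sample
    odds_antibody = (1 - antibody_frac) / antibody_frac
    odds_no_antibody = (1 - no_antibody_frac) / no_antibody_frac
    return odds_antibody / odds_no_antibody


# Hypothetical fractions: the standard rises from 1% to 40% of counts
overall = overall_fraction_not_neutralized(antibody_frac=0.4, no_antibody_frac=0.01)
```

With these toy numbers only about 1.5% of the non-standard virus survives the selection. A value near 1 would suggest that the antibody had little effect at that concentration.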

Write sample pairings to CSV file:

In [None]:
print(
    f"Writing {len(selections_w_neut_standards)} antibody-selection pairings "
    f"to {config['antibody_selections']}"
)
selections_w_neut_standards.to_csv(config["antibody_selections"], index=False)