# Theoretical antigen classes from channel capacity results
Based on the channel capacity $C$ found, there are $2^{C}$ peptide quality categories that our cytokine latent space and ballistic parameter space can tell apart "perfectly". We can derive the EC$_{50}$ values and hence the model parameter conditional distributions to which those $2^C$ categories correspond. The idea is to use the optimal antigen probability distribution $p_Q$ found by the Blahut-Arimoto algorithm, and pick EC$_{50}$ values that correspond to evenly spaced values of the cumulative distribution function, $CDF(q) = \sum_{q' \leq q} p_Q(q')$. As we will see, the model parameter distributions thus selected give model trajectories that optimally fill the latent space, hence they optimize antigen encoding. 
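
The notebook loads the optimal distribution $p_Q$ from saved results rather than recomputing it. For readers unfamiliar with the algorithm, the following is a minimal sketch of the Blahut-Arimoto iteration for a generic discrete channel. The function name `blahut_arimoto` and the toy channels are illustrative assumptions; the project's actual capacity calculation (with continuous multivariate normal outputs) is not reproduced here.

```python
import numpy as np

def blahut_arimoto(p_y_given_x, tol=1e-10, max_iter=10000):
    """Capacity (in bits) and optimal input pmf of a discrete memoryless channel.

    p_y_given_x[i, j] = P(Y = j | X = i); each row must sum to 1.
    """
    p_y_given_x = np.asarray(p_y_given_x, dtype=float)
    n_inputs = p_y_given_x.shape[0]
    p_x = np.full(n_inputs, 1.0 / n_inputs)  # start from a uniform input pmf
    for _ in range(max_iter):
        p_y = p_x @ p_y_given_x
        # d[i] = KL divergence between P(Y | X=i) and P(Y), in nats
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(p_y_given_x > 0, np.log(p_y_given_x / p_y), 0.0)
        d = np.sum(p_y_given_x * log_ratio, axis=1)
        lower = np.log(np.sum(p_x * np.exp(d)))  # lower bound on the capacity
        upper = np.max(d)                        # upper bound on the capacity
        # Multiplicative update of the input distribution
        p_x = p_x * np.exp(d)
        p_x /= p_x.sum()
        if upper - lower < tol:
            break
    return lower / np.log(2), p_x

# Toy channel 1 (hypothetical): binary symmetric channel, 10 % flip probability
flip = 0.1
capacity_bits, p_opt = blahut_arimoto([[1 - flip, flip], [flip, 1 - flip]])

# Toy channel 2 (hypothetical): noiseless channel with 4 inputs, so C = 2 bits
capacity_noiseless, _ = blahut_arimoto(np.eye(4))
n_categories_toy = int(round(2**capacity_noiseless))  # number of distinguishable classes
```

In the noiseless toy channel, the capacity of 2 bits corresponds to $2^C = 4$ distinguishable inputs, which is the same conversion used later in the notebook to obtain the number of theoretical antigen classes.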

A difficulty arises: multivariate normal distributions were obtained for the channel capacity calculation on only three parameters, $a_0$, $t_0$ and $\theta$. We are thus missing the parameters $v_1$, $\alpha$, $\beta$, and the $v_2 / v_1$ ratio needed to completely specify a trajectory of the constant force model with matching. We solve this problem by linearly interpolating, for each of $v_1$, $\alpha$ and $\beta$, between the values of the two experimental peptides closest to the desired theoretical peptide. For the $v_2 / v_1$ ratio, we use the value from the HighMI_1 experiment. 
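
As an illustration of the interpolation idea, the sketch below uses `np.interp` on made-up numbers. The peptide labels, values, and the helper name `interp_between_neighbors` are hypothetical and are not taken from the potency or fit tables.

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only
toy_log10ec50s = pd.Series({"pepA": 0.0, "pepB": 1.0, "pepC": 3.0})
toy_param_means = pd.Series({"pepA": 2.0, "pepB": 1.5, "pepC": 0.5})

def interp_between_neighbors(logec50, pep_logec50s, ser_values):
    """Linearly interpolate ser_values at logec50 between the two nearest peptides."""
    order = pep_logec50s.sort_values().index
    xs = pep_logec50s[order].to_numpy()
    ys = ser_values[order].to_numpy()
    # np.interp clamps to the end values outside [xs[0], xs[-1]], so reject those inputs
    if logec50 < xs[0] or logec50 > xs[-1]:
        raise ValueError("log10 EC50 outside the range covered by the peptides")
    return float(np.interp(logec50, xs, ys))

# Halfway between pepB (1.5) and pepC (0.5) on the log10 EC50 axis
toy_value = interp_between_neighbors(2.0, toy_log10ec50s, toy_param_means)
```

The explicit range check is a design choice: without it, `np.interp` would silently return the boundary value for a theoretical peptide lying outside the experimental range.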

In [None]:
import numpy as np
import scipy as sp
from scipy import interpolate
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as clr
import matplotlib as mpl
import seaborn as sns
import json
import os

In [None]:
%matplotlib inline

# Label sizes for Science format (figure width 2.25 inches or 4.75 inches)
# Squeezing three subplots in a row: 4.75/3 = 1.583333
sns.reset_orig()
plt.rcParams["figure.figsize"] = (1.55, 1.65)
plt.rcParams["font.size"] = 8
plt.rcParams["axes.labelsize"] = 7
plt.rcParams["legend.fontsize"] = 7
plt.rcParams["xtick.labelsize"] = 6
plt.rcParams["ytick.labelsize"] = 6
plt.rcParams["xtick.major.pad"] = 2.  # distance to major tick label in points
plt.rcParams["xtick.minor.pad"] = 2.
plt.rcParams["axes.labelpad"] = 1.
plt.rcParams["axes.linewidth"] = 0.8
plt.rcParams["axes.spines.top"] = False
plt.rcParams["axes.spines.right"] = False


plt.rcParams['figure.dpi'] = 250 # default for me was 75

## Import previous results

In [None]:
# Use parameters and multivariate distributions without bootstrap perturbations, because we did not
# save those for all bootstrap replicates and anyways the unperturbed ones should be approx. the average, 
# or at least representative of the parameters and trajectories of theoretical peptide categories. 
df_params = pd.read_hdf(os.path.join("results", "fits", "df_params_Sigmoid_freealpha_HighMI_13.hdf"))
interpolated_means = np.load(os.path.join("results", "highmi13", "meanmats_25inputs_HighMI_13.npy"))
interpolated_covmats = np.load(os.path.join("results", "highmi13", "covmats_25inputs_HighMI_13.npy"))

# Here, for the capacity itself, we can use the bootstrap result
with open(os.path.join("results", "highmi13", "capacity_bootstrap_results_HighMI_13.json"), "r") as h:
    chancap_run_results = json.load(h)
print(chancap_run_results.keys())

# Approximately the ratio used for HighMI_13, based on HighMI_1.
v2_v1_ratio = 7.8

df_potencies = pd.read_json(os.path.join("data", "misc", "potencies_df_2021.json"))
ser_log10ec50s = np.log10(df_potencies).mean(axis=1)
ser_log10ec50s.index.name = "Peptide"
# Only keep peptides for which we have parameter values
ser_log10ec50s = ser_log10ec50s.loc[df_params.index.get_level_values("Peptide").unique()]
print(ser_log10ec50s)

# Other parameters that we will need: take the mean for each peptide. 
ser_means_pars = {
    "v1": df_params["v1"].groupby("Peptide").mean(), 
    "alpha": df_params["alpha"].groupby("Peptide").mean(), 
    "beta": df_params["beta"].groupby("Peptide").mean()
}
ser_variances_pars = {
    "v1": df_params["v1"].groupby("Peptide").var(), 
    "alpha": df_params["alpha"].groupby("Peptide").var(), 
    "beta": df_params["beta"].groupby("Peptide").var()
}

## Generate ballistic parameter samples for each theoretical peptide
1. Identify the theoretical peptides: one at each endpoint of the discretized axis of $\log_{10} \mathrm{EC}_{50}$, and one at each of $n_{\mathrm{cat}} = 2^{C} - 2$ points evenly spaced in the values of the cumulative mass function after removing the edge probabilities (e.g. at 0.25, 0.5, 0.75 of the remaining probability if there are $5-2 = 3$ inner categories). Otherwise the 1st-2nd and penultimate-last pairs are too close to each other; this is an effect of discretization. 

2. Define a function that, for an arbitrary $\log_{10} \mathrm{EC}_{50}$, interpolates between the means and variances of $v_1$, $\alpha$ and $\beta$ of the two nearest peptides on the quality axis. Find the average and variance of these parameters for each theoretical peptide. 

3. Generate the mean trajectory for each theoretical peptide. 

4. Generate randomly sampled trajectories around the mean for each peptide, using the $(a_0, t_0, \theta)$ multivariate normal distribution and the interpolated variances of $v_1$, $\alpha$ and $\beta$. 
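
Step 1 can be checked by hand on a small distribution. The sketch below restates the edge-excluded selection compactly and applies it to a made-up probability mass function with 8 bins and 5 classes. The name `edge_excluded_cdf_indices` and all numbers are hypothetical.

```python
import numpy as np

def edge_excluded_cdf_indices(n_classes, pmf):
    """Indices of the first bin, the last bin, and n_classes - 2 inner bins
    evenly spaced in cumulative probability once the edge bins are set aside."""
    pmf = np.asarray(pmf, dtype=float)
    indices = np.zeros(n_classes, dtype=int)
    indices[-1] = len(pmf) - 1
    if n_classes <= 2:
        return indices
    inner_prob = pmf[1:-1].sum()          # probability not carried by the two edge bins
    step = inner_prob / (n_classes - 1)   # spacing between consecutive targets
    targets = np.linspace(pmf[0] + step, 1.0 - step - pmf[-1], n_classes - 2)
    indices[1:-1] = np.searchsorted(np.cumsum(pmf), targets)
    return indices

# Made-up pmf with heavy edge bins (multiples of 1/32, so the sums are exact)
toy_pmf = np.array([8, 3, 3, 3, 2, 3, 2, 8]) / 32
toy_indices = edge_excluded_cdf_indices(5, toy_pmf)
print(toy_indices)  # [0 2 3 5 7]
```

Here the inner probability is 0.5, so the three inner targets sit at cumulative values 0.375, 0.5 and 0.625, that is, at 25 %, 50 % and 75 % of the inner probability.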

### 1. Theoretical antigen classes' EC$_{50}$s

In [None]:
def find_indices_uniform_cumulative(nsep, pmf):
    """Given a probability mass function in the array pmf, find the indices of the 
    locations where evenly spaced cumulative mass function values, after removing the edges, fall. 
    The first and last indices are 0 and len(pmf)-1. 
    """
    # First separator: at 0, last separator: at the end
    indices = np.zeros(nsep, dtype=int)
    indices[-1] = len(pmf) - 1
    
    if nsep <= 2:
        return indices
    
    # Inner separators: use numpy's searchsorted function
    # Here, we do not consider the probability associated to the first and last bin
    # So we look for bin separators at binwidth past pmf[0] or binwidth before pmf[-1]
    # With the binwidth disregarding the edge probabilities as well. 
    inner_prob = np.sum(pmf[1:-1])
    binwidth = inner_prob / (nsep - 1)
    binseps = np.linspace(pmf[0]+binwidth, 1.0 - binwidth - pmf[-1], nsep - 2)
    indices[1:-1] = np.searchsorted(np.cumsum(pmf), binseps)
    
    # The following should not happen if indeed we have nsep categories. 
    if np.any(indices[1:] == indices[:-1]):
        print(indices)
        raise ValueError("Found two categories in the same discrete input value")
    return indices

In [None]:
# Find the number of categories based on the capacity
n_categories = int(round(2**chancap_run_results["average_capacity_bits"]))
print(n_categories)

# Find the indices of theoretical classes
theo_peptides_indices = find_indices_uniform_cumulative(n_categories, chancap_run_results["optimal_distribution"])
print(theo_peptides_indices)

theo_peptides_log10ec50s = np.asarray(chancap_run_results["input_values"])[theo_peptides_indices]
theo_peptides_log10ec50s[0] = 0.0  # First should be N4, not the midpoint in the 1st category
# And last should be E1: last midpoint plus half width (which is the first midpoint)
theo_peptides_log10ec50s[-1] = chancap_run_results["input_values"][-1] + chancap_run_results["input_values"][0] - 1e-15
print(np.around(theo_peptides_log10ec50s, 2))

### 2. Interpolation between nearest peptides for $v_1$, $\alpha$, $\beta$

In [None]:
# Function to interpolate any parameter between nearest antigens
def interpolate_nearest_peptides(logec50, pep_logec50s, ser_values):
    """ Given an arbitrary log_10 EC_50, a list of peptide log_10 EC_50s, and a 
    value of the quantity to interpolate for each peptide label, find the two peptides
    closest to the desired EC_50 and interpolate linearly between their values. """
    # Find the peptide below and the peptide above
    sorted_ec50s = pep_logec50s.sort_values()
    ec50_index_above = np.searchsorted(sorted_ec50s, logec50, side="left")
    try:
        ec50_above = sorted_ec50s.iloc[ec50_index_above]
    except IndexError:
        raise ValueError("We are above the interpolation range")
    else:
        pep_above = sorted_ec50s.index.to_series().iloc[ec50_index_above]
    
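    # A position of -1 would silently wrap around to the last peptide instead of 
    # raising IndexError, so handle position 0 explicitly. 
    if ec50_index_above == 0:
        if logec50 < ec50_above:
            raise ValueError("We are below the interpolation range")
        # Exactly at the lowest EC50: nothing to interpolate
        return ser_values[pep_above]
    ec50_below = sorted_ec50s.iloc[ec50_index_above-1]
    pep_below = sorted_ec50s.index.to_series().iloc[ec50_index_above-1]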
    
    # Find the parameter values below and above
    try:
        value_below = ser_values[pep_below]
        value_above = ser_values[pep_above]
    except KeyError as e:
        print("Peptide {} not available; check consistency of EC50 and parameter tables.".format(e))
        raise e
    
    # Interpolate linearly
    value_inter = (logec50 - ec50_below) / (ec50_above - ec50_below) * (value_above - value_below) + value_below
    return value_inter

In [None]:
# Interpolate each parameter
theo_peptides_par_means = {
    "v1": np.asarray(list(map(lambda x: interpolate_nearest_peptides(x, ser_log10ec50s, ser_means_pars["v1"]), 
                             theo_peptides_log10ec50s))), 
    "alpha": np.asarray(list(map(lambda x: interpolate_nearest_peptides(x, ser_log10ec50s, ser_means_pars["alpha"]), 
                             theo_peptides_log10ec50s))),
    "beta": np.asarray(list(map(lambda x: interpolate_nearest_peptides(x, ser_log10ec50s, ser_means_pars["beta"]), 
                             theo_peptides_log10ec50s)))
}
theo_peptides_par_varis = {
    "v1": np.asarray(list(map(lambda x: interpolate_nearest_peptides(x, ser_log10ec50s, ser_variances_pars["v1"]), 
                             theo_peptides_log10ec50s))), 
    "alpha": np.asarray(list(map(lambda x: interpolate_nearest_peptides(x, ser_log10ec50s, ser_variances_pars["alpha"]), 
                             theo_peptides_log10ec50s))),
    "beta": np.asarray(list(map(lambda x: interpolate_nearest_peptides(x, ser_log10ec50s, ser_variances_pars["beta"]), 
                             theo_peptides_log10ec50s)))
}

In [None]:
# Plot to check that the linear interpolation is OK. 
fig, ax = plt.subplots(1, 2)
pch = "alpha"  # Parameter choice for this interpolation check plot
sorted_peptides = ser_log10ec50s.sort_values().index
ax[0].plot(ser_log10ec50s[sorted_peptides], ser_means_pars[pch][sorted_peptides], marker="o", label="Peptides", ms=4)
ax[0].plot(theo_peptides_log10ec50s, theo_peptides_par_means[pch], "co", label="Interpolated", ms=4)
ax[0].set(xlabel=r"$\log_{10}$EC$_{50}$", ylabel=r"$\langle {} \rangle$".format(pch))
ax[0].vlines(theo_peptides_log10ec50s, ymin=0, ymax=theo_peptides_par_means[pch], linestyle="--", color="grey", lw=0.8)

ax[1].plot(ser_log10ec50s[sorted_peptides], ser_variances_pars[pch][sorted_peptides], marker="o", label="Peptides", ms=4)
ax[1].plot(theo_peptides_log10ec50s, theo_peptides_par_varis[pch], "co", label="Interpolated", ms=4)
ax[1].set(xlabel=r"$\log_{10}$EC$_{50}$", ylabel=r"Var$[{}]$".format(pch))
ax[1].vlines(theo_peptides_log10ec50s, ymin=0, ymax=theo_peptides_par_varis[pch], linestyle="--", color="grey", lw=0.8)
ax[1].legend(fontsize=6)
fig.set_size_inches(3., 1.5)
fig.tight_layout()
plt.show()
plt.close()

### 3. Compute the mean parameter values of each theoretical peptide

In [None]:
df_theo_meanparams = pd.DataFrame(np.zeros([n_categories, 7]), 
                        index=pd.Index(range(n_categories), name="TheoreticalPeptide"), 
                        columns=pd.Index(["a0", "t0", "theta", "v1", "alpha", "beta", "log10ec50"], 
                                            name="Parameter"))
df_theo_meanparams.iloc[:, :3] = interpolated_means[theo_peptides_indices]

for i, pch in zip((3, 4, 5), ("v1", "alpha", "beta")):
    df_theo_meanparams.iloc[:, i] = theo_peptides_par_means[pch]

df_theo_meanparams.iloc[:, 6] = theo_peptides_log10ec50s
print(df_theo_meanparams)

### 4. Sample a bunch of parameter values around the mean of each theoretical peptide
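
A short reminder on the NumPy sampling calls may be useful here: `normal` takes a standard deviation as its `scale` argument, whereas `multivariate_normal` takes a covariance matrix. The sketch below illustrates both with made-up means and (co)variances; it uses the newer `np.random.default_rng` interface, and none of the numbers come from the fits.

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed, for reproducibility only

# Hypothetical mean and variance of one scalar parameter
toy_mean, toy_variance = 1.0, 0.04

# `scale` is a standard deviation, so take the square root of the variance
samples = rng.normal(loc=toy_mean, scale=np.sqrt(toy_variance), size=200_000)
# Floor the samples at a small positive value to keep the parameter positive
samples = np.clip(samples, 0.02, None)

# Joint sampling of three correlated parameters from a mean vector and covariance matrix
toy_mean_vec = np.array([1.0, 0.5, 0.0])
toy_cov = np.array([[0.04, 0.01, 0.00],
                    [0.01, 0.09, 0.00],
                    [0.00, 0.00, 0.01]])
joint = rng.multivariate_normal(toy_mean_vec, toy_cov, size=32)
# One lower bound per column, broadcast over the 32 rows; no upper bound
joint = np.clip(joint, [0.0, 0.0, -np.pi], None)
```

Passing a variance where a standard deviation is expected changes the spread of the samples whenever the variance differs from 1, so the distinction matters for how widely the sampled trajectories scatter.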

In [None]:
rndgen = np.random.RandomState(seed=53739)
nsamples = 32
df_theo_samples = pd.DataFrame(np.zeros([nsamples*n_categories, 7]), 
                        index=pd.MultiIndex.from_product([range(n_categories), range(nsamples)], 
                            names=["TheoreticalPeptide", "Sample"]), 
                        columns=df_theo_meanparams.columns)

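for i in range(n_categories):
    # Rows of theoretical peptide i (the index is an ordered product, so they are contiguous)
    rows = slice(i*nsamples, (i+1)*nsamples)
    # Generate nsamples parameter samples for each peptide. Assign through a single .iloc call: 
    # chained indexing such as .loc[i].iloc[...] may write to a temporary copy. 
    df_theo_samples.iloc[rows, :3] = np.clip(rndgen.multivariate_normal(
        interpolated_means[theo_peptides_indices[i]], 
        interpolated_covmats[theo_peptides_indices[i]], nsamples), 
        a_min=[0.0, 0.0, -np.pi], a_max=None)
    
    for j, pch in zip((3, 4, 5), ("v1", "alpha", "beta")):
        # rndgen.normal expects a standard deviation, hence the square root of the variance
        df_theo_samples.iloc[rows, j] = np.clip(rndgen.normal(
            theo_peptides_par_means[pch][i], np.sqrt(theo_peptides_par_varis[pch][i]), nsamples), 
            a_min=0.02, a_max=None)
    
    df_theo_samples.iloc[rows, 6] = theo_peptides_log10ec50s[i]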

## Compute trajectories for the sampled parameter values
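
One way to assemble a table of trajectories is to fill a plain NumPy array row by row and wrap it in a DataFrame with (Node, Time) columns afterwards. The sketch below shows the pattern with a stand-in model; `toy_model`, its parameters and the toy values are hypothetical and unrelated to the actual `ltspcyt` model equations.

```python
import numpy as np
import pandas as pd

def toy_model(t, amplitude, t0):
    """Stand-in for the real model: returns two made-up node trajectories."""
    n1 = amplitude * (1.0 - np.exp(-t / t0))
    n2 = amplitude * t * np.exp(-t / t0)
    return n1, n2

toy_times = np.arange(0, 5)
toy_params = pd.DataFrame({"amplitude": [1.0, 2.0], "t0": [1.0, 2.0]},
                          index=pd.Index([0, 1], name="TheoreticalPeptide"))

# Fill a NumPy array first: Node 1 values in the left half, Node 2 in the right half
values = np.zeros((len(toy_params), 2 * len(toy_times)))
for row, (_, p) in enumerate(toy_params.iterrows()):
    n1, n2 = toy_model(toy_times, p["amplitude"], p["t0"])
    values[row, :len(toy_times)] = n1
    values[row, len(toy_times):] = n2

# Wrap the array once, with (Node, Time) column labels
toy_traj = pd.DataFrame(
    values, index=toy_params.index,
    columns=pd.MultiIndex.from_product([["Node 1", "Node 2"], toy_times],
                                       names=["Node", "Time"]))
```

Writing into the array avoids repeated label-based assignments into the DataFrame, and a whole node can then be selected with `toy_traj["Node 1"]`.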


In [None]:
# Import equations of the constant force model with matching
from ltspcyt.scripts.sigmoid_ballistic import ballistic_sigmoid_freealpha

In [None]:
# Compute model trajectories for theoretical antigen classes
times = np.arange(0, 73)
tscale = 20.0

# Trajectories for parameter values sampled in each theoretical antigen class distribution
df_traj = pd.DataFrame(np.zeros([df_theo_samples.shape[0], 2*len(times)]), 
                      index=df_theo_samples.index, 
                      columns=pd.MultiIndex.from_product([["Node 1", "Node 2"], times], names=["Node", "Time"]))

for row, key in enumerate(df_traj.index):
    n1, n2 = ballistic_sigmoid_freealpha(times / tscale, *df_theo_samples.loc[key].iloc[:6], v2v1_ratio=v2_v1_ratio)
    # Assign by position: writing through df_traj.T only reaches df_traj when the transpose 
    # is a view of the same data, which pandas does not guarantee. 
    # Columns are ordered as all "Node 1" times, then all "Node 2" times. 
    df_traj.iloc[row, :len(times)] = n1
    df_traj.iloc[row, len(times):] = n2

    
# Trajectories for average parameter values of each theoretical antigen class
df_traj_mean = pd.DataFrame(np.zeros([df_theo_meanparams.shape[0], 2*len(times)]), 
                      index=df_theo_meanparams.index, 
                      columns=pd.MultiIndex.from_product([["Node 1", "Node 2"], times], names=["Node", "Time"]))

for row, ky in enumerate(df_traj_mean.index):
    n1, n2 = ballistic_sigmoid_freealpha(times / tscale, *df_theo_meanparams.loc[ky].iloc[:6], v2v1_ratio=v2_v1_ratio)
    # Assign by position: writing through df_traj_mean.T only reaches df_traj_mean when the 
    # transpose is a view of the same data, which pandas does not guarantee. 
    # Columns are ordered as all "Node 1" times, then all "Node 2" times. 
    df_traj_mean.iloc[row, :len(times)] = n1
    df_traj_mean.iloc[row, len(times):] = n2

### Compute $N_1$ and $N_2$ at a chosen time
The goal is to show a parameterized line $\left(N_1(\mathrm{EC}_{50}), N_2(\mathrm{EC}_{50})\right)$ at constant $t_{\mathrm{choice}}$ on the latent space trajectories of theoretical antigen classes. Then, on a separate plot, we will show the values of $N_1$ and $N_2$ on that curve as a function of EC$_{50}$ explicitly, to reveal the monotonicity of $N_1$ and the non-monotonicity of $N_2$. 
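
Monotonicity along the EC$_{50}$ axis can also be checked numerically from the sign of successive differences. The sketch below does so on two made-up curves; the helper `is_monotonic` and the toy functional forms are assumptions for illustration and do not represent the model output.

```python
import numpy as np

def is_monotonic(values):
    """True if the sequence never increases or never decreases."""
    diffs = np.diff(np.asarray(values, dtype=float))
    return bool(np.all(diffs >= 0) or np.all(diffs <= 0))

# Made-up curves on a grid of log10 EC50 values
toy_logec50 = np.linspace(0.0, 5.0, 26)
toy_n1 = 4.0 / (1.0 + 10**(toy_logec50 - 2.5))  # decreases steadily with log10 EC50
toy_n2 = toy_logec50 * np.exp(-toy_logec50)      # rises, then falls

n1_monotonic = is_monotonic(toy_n1)
n2_monotonic = is_monotonic(toy_n2)
```

A coordinate that is monotonic in EC$_{50}$ could by itself rank antigens by quality, whereas a non-monotonic one takes the same value at two different EC$_{50}$s and needs the other coordinate to resolve the ambiguity.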

In [None]:
#Compute N1 and N2 at chosen time t for each set of parameters (average)
tchoice = 36

df_n1n2_ec50 = pd.DataFrame(np.zeros([interpolated_means.shape[0], 3]), 
            index=pd.Index(range(chancap_run_results["n_inputs"]), name="Antigen_class"), 
            columns=pd.Index(["Latent Space 1", "Latent Space 2", "log10ec50"], name="Node"))
for k in df_n1n2_ec50.index:
    logec50 = chancap_run_results["input_values"][k]
    v1 = interpolate_nearest_peptides(logec50, ser_log10ec50s, ser_means_pars["v1"])
    alpha = interpolate_nearest_peptides(logec50, ser_log10ec50s, ser_means_pars["alpha"])
    beta = interpolate_nearest_peptides(logec50, ser_log10ec50s, ser_means_pars["beta"])
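    n1, n2 = ballistic_sigmoid_freealpha(np.asarray([tchoice / tscale]), *interpolated_means[k], v1, alpha, beta, v2v1_ratio=v2_v1_ratio)
    # Only one time point was requested: extract scalars (np.ravel accepts scalars and arrays alike)
    df_n1n2_ec50.iloc[k, :2] = [np.ravel(n1)[0], np.ravel(n2)[0]]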
    df_n1n2_ec50.iloc[k, 2] = logec50

## Main figure 3, panels C+D: determine the theoretical antigen classes and plot their trajectories
C: Optimal probability mass function for antigen categories, cumulative mass function, and latent space trajectories sampled from each category. 

D: $N_1$ and $N_2$ as a function of EC$_{50}$

In [None]:
#Tools
# Logarithmic minor ticks (we plotted the real log so need to put log ticks manually)
# Find the linear scale limiting ticks
def compute_log_minor_ticks(loglims, stp=2, base=10.0):
    smallest_major = int(np.floor(loglims[0]))
    largest_major = int(np.ceil(loglims[1]))
    n_decades = largest_major - smallest_major

    # Generate linear ranges with the exponents found
    tiles = []
    for i in range(n_decades):
        tiles.append(np.arange(stp*base**(smallest_major+i), 
                    base**(smallest_major+i+1), stp*base**(smallest_major+i)))
    minorticks = np.concatenate(tiles, axis=0)
    minorticks = np.log(minorticks) / np.log(base)
    minorticks = minorticks[(minorticks > loglims[0]) * (minorticks < loglims[1])]
    return minorticks

In [None]:
# Extract some values from the channel capacity results
sampled_logec50 = chancap_run_results["input_values"]
optim_input_distrib = chancap_run_results["optimal_distribution"]
capacity_bits = chancap_run_results["average_capacity_bits"]
reltol = chancap_run_results["relative_tolerance"]
abserrror_cap = np.sqrt(chancap_run_results["variance_capacity_bits"])

# Cumulate starting at E1, so reverse the ec50 axis. 
pmf = chancap_run_results["optimal_distribution"][::-1]
nsep = int(round(2**chancap_run_results["average_capacity_bits"]))
indices = np.zeros(nsep, dtype=int)
indices[-1] = len(pmf) - 1

cumul_prob = np.cumsum(pmf)
inner_prob = np.sum(pmf[1:-1])
binwidth = inner_prob / (nsep - 1)
binseps = np.linspace(pmf[0]+binwidth, 1.0 - binwidth - pmf[-1], nsep - 2)
indices[1:-1] = np.searchsorted(cumul_prob, binseps)

In [None]:
# Color palettes for theoretical antigen classes and LS nodes
#For theoretical peptides
all_theo_antigen_colors = sns.color_palette("deep", 10)
theoretical_antigen_colors = [all_theo_antigen_colors[0],all_theo_antigen_colors[6]]+all_theo_antigen_colors[1:5]
theoretical_antigen_colors = [sns.set_hls_values(a, s=0.4, l=0.6) for a in theoretical_antigen_colors]
theoretical_antigen_colors[-1] = (0, 0, 0, 1)  # Make the null peptide black. 

colors = sns.color_palette("deep", n_categories)
colors = [sns.set_hls_values(a, s=0.4, l=0.6) for a in colors]
colors[-1] = (0, 0, 0, 1)  # Make the null (last) peptide black. 
#Remove next line if you want to revert to old color scheme
colors = theoretical_antigen_colors.copy()

colors_samples = [sns.set_hls_values(a, l=0.8) for a in theoretical_antigen_colors]
#Remove next line if you want to revert to old color scheme
colors_samples = theoretical_antigen_colors.copy()
colors_samples[-1] = (0.5, 0.5, 0.5, 0.8)


colors_dict = {df_traj_mean.index[i]:colors[i] for i in range(n_categories)}
colors_samples_dict = {df_traj_mean.index[i]:colors_samples[i] for i in range(n_categories)}

# Colors for Nodes 1+2, Node 1, Node 2
latent_colors = [list(clr.to_rgba(a)) for a in ["crimson", "goldenrod", "maroon"]]  # both, node 1, node 2
latent_colors[1] = sns.set_hls_values(color=latent_colors[1], h=None, l=0.6, s=None)  # making goldenrod lighter
latent_colors[0] = sns.set_hls_values(color=latent_colors[0], h=None, l=0.5, s=None)  # make crimson lighter
nodePalette = latent_colors[1:]

# Load uniform tick props across the whole figure 3
with open(os.path.join("data", "misc", "minor_ticks_props.json"), "r") as hd:
    props_minorticks = json.load(hd)
with open(os.path.join("data", "misc", "major_ticks_props.json"), "r") as hd:
    props_majorticks = json.load(hd)

In [None]:
# Log ticks for the EC50 axis
ec50lims = (sampled_logec50[0] - sampled_logec50[1]/2, 
            sampled_logec50[-1] + sampled_logec50[1]/2)
minorticks = compute_log_minor_ticks(ec50lims, stp=1, base=10.0)

## CREATE FIGURE WITH 4 PANELS ON IT
fig, axes = plt.subplots(2, 3)
fig.set_size_inches(4.8, 1.65*2)
# Leave room for panel D below
axes[1, 1].set_axis_off()
axes[1, 2].set_axis_off()

### PROBABILITY MASS FUNCTION
# Make a histogram (bar plot) of the optimal input distribution

ax = axes.flat[0]
# bar_facecolor = "xkcd:royal blue"  # "xkcd:light grey"
bar_facecolor = "white"
bars = ax.bar(np.around(sampled_logec50, 2), optim_input_distrib, width=np.diff(sampled_logec50)[0], 
      color=bar_facecolor, edgecolor="k", linewidth=0.8)

# Axes labeling and ticks
xlabelprops = dict(size=7, labelpad=0.5)
ylabelprops = dict(size=7, labelpad=0.9)
xlabelec50 = r"Antigen $\mathrm{EC}_{50}$ (#)"
ax.set_xlabel(xlabelec50, **xlabelprops)
ax.set_ylabel(r"$P(\mathrm{EC}_{50})$", **ylabelprops)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter(r"${x:.1f}$"))
ax.set_yticks([0.0, 0.1, 0.2])
x_ticker = mpl.ticker.FuncFormatter(lambda x, pos:"$10^{}$".format(int(x)))
ax.xaxis.set_major_formatter(x_ticker)
majorxticks = [0, 1, 2, 3, 4, 5]
ax.set_xticks(majorxticks)
ax.set_xticks(minorticks, minor=True)
ax.tick_params(which="minor", axis="both", **props_minorticks)
ax.tick_params(which="major", axis="both", **props_majorticks)

ax.invert_xaxis()

# Annotate peptides (include G4)
df_potencies = pd.read_json(os.path.join("data", "misc", "potencies_df_2021.json"))
ser_log10ec50s_annot = np.log10(df_potencies).mean(axis=1).loc[["N4", "A2", "Y3", "Q4", "T4", "V4", "G4", "E1"]]

n_inputs = len(sampled_logec50)
maxprob = np.amax(optim_input_distrib)
factor = 1.12
ax.set_ylim(0, maxprob*factor)
previous = ser_log10ec50s_annot.iloc[0] - 1
sortpep = ser_log10ec50s_annot.sort_values(ascending=False).index
for pep in sortpep:
    ha = "center"
    if abs(ser_log10ec50s_annot[pep] - previous) > 0.7:
        lblheight = maxprob*(factor - (factor-1)/3)
        previous = ser_log10ec50s_annot[pep]
    else:
        lblheight = maxprob
    ax.annotate(pep, xy=(ser_log10ec50s_annot[pep], lblheight), fontsize=6, ha=ha, va="top", color="grey")
    ax.axvline(ser_log10ec50s_annot[pep], ls=":", lw=0.8, color="grey", 
               ymax=ax.transLimits.transform((0, lblheight*0.9))[1])

# Annotate capacity
#ax.annotate(r"$C = ({:.2f} \pm {:.2f})$ bits".format(capacity_bits, reltol*capacity_bits), 
#            xy=(0.15, 0.7), xycoords="axes fraction", ha="left", va="center", fontsize=6)


### CUMULATIVE DISTRIBUTION SUBPLOT
# Make a histogram (bar plot) of the optimal input distribution
ax = axes.flat[1]
bars = ax.bar(np.around(sampled_logec50, 2)[::-1], cumul_prob, width=np.diff(sampled_logec50)[0], 
      color=bar_facecolor, edgecolor="k", linewidth=0.8)

for i, ii in enumerate(indices):
    bars[ii].set_facecolor(colors[-i-1])

# Horizontal lines at the bin separators
li = ax.axhline(0.5)
li.set_visible(False)  # Dummy because the first hline is always wrong
hlines_props = dict(ls="--", lw=1.)
# Top one: strongest agonist, first color.
ax.axhline(1.0-binwidth, **hlines_props, color=colors[0], xmax=ax.transLimits.transform((sampled_logec50[-1], 1))[0])

# Intermediate ones, starting from the bottom
for i in range(len(binseps)):
    ec50_i = sampled_logec50[indices[i+1]]
    ax.axhline(binseps[i], **hlines_props, color=colors[-i-2], 
               xmax=ax.transLimits.transform((ec50_i, 1))[0])

# Arrows to show how we are evenly spaced in probability
arrowprops = dict(arrowstyle="<->", shrinkA=0.01, shrinkB=0.01, color="grey")
ec50_i = sampled_logec50[-1]
ax.annotate("", xy=(ec50_i, binseps[0]-binwidth), xytext=(ec50_i, binseps[0]), arrowprops=arrowprops)
ax.annotate("", xy=(ec50_i, binseps[-4]), xytext=(ec50_i, binseps[-3]), arrowprops=arrowprops)
ax.annotate("", xy=(ec50_i, binseps[-2]), xytext=(ec50_i, binseps[-1]), arrowprops=arrowprops)
ax.annotate("", xy=(ec50_i, binseps[-3]), xytext=(ec50_i, binseps[-2]), arrowprops=arrowprops)
ax.annotate("", xy=(ec50_i, binseps[-1]), xytext=(ec50_i, binseps[-1]+binwidth), arrowprops=arrowprops)

# Remove annoying spines
for xi in ["top", "right"]:
    ax.spines[xi].set_visible(False)

# Tick formatter to have two decimals and align with previous plot
def major_formatter(x, pos):
    return "{:.2f}".format(x)

# y axis labeling and ticking
ax.set_ylabel(r"$\mathrm{CDF}(\mathrm{EC}_{50})$", **ylabelprops)
ax.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(major_formatter))
ax.set_yticks([0, 0.2, 0.4, 0.6, 0.8, 1.0])
ax.set_yticklabels(map(str, [0, 0.2, 0.4, 0.6, 0.8, 1.0]))

# x axis labeling and ticking
ax.set_xlabel(xlabelec50 , **xlabelprops)
ax.xaxis.set_major_formatter(x_ticker)
ax.set_xticks(majorxticks)
ax.set_xticks(minorticks, minor=True)
ax.tick_params(which="minor", axis="both", **props_minorticks)
ax.tick_params(which="major", axis="both", **props_majorticks)
ax.invert_xaxis()

### LATENT SPACE OF IDEAL PEPTIDES
ax = axes[0, 2]
# First plot the many samples we generated
for key in df_traj.index:
    pep = key[0]
    ax.plot(df_traj.loc[key, "Node 1"].values, df_traj.loc[key, "Node 2"].values, 
        color=colors_samples_dict[pep], ls="-", lw=0.8)

# Plot the means last
for pep in df_traj_mean.index:
    ecpower = int(np.floor(df_theo_meanparams.loc[pep, "log10ec50"]))
    ecnumber = 10**(df_theo_meanparams.loc[pep, "log10ec50"] - ecpower)
    ecnumber = int(round(ecnumber, 0))
    if ecnumber == 10:
        ecnumber = 1
        ecpower += 1
    # This is a case where we mainly stay at the origin
    if np.amax(np.abs(df_traj_mean.loc[pep, "Node 1"].values)) < 0.1:
        # Register the line in the legend
        ax.plot(df_traj_mean.loc[pep, "Node 1"].values[:2], df_traj_mean.loc[pep, "Node 2"].values[:2], 
            color=colors_dict[pep], ls="-", lw=3., 
            #label=r"EC${}_{50}=" + r"{} \times 10^{}$".format(ecnumber, ecpower))
            label=r"${} \times 10^{}$".format(ecnumber, ecpower))
        # Plot a big dot
        ax.plot(df_traj_mean.loc[pep, "Node 1"].max(), df_traj_mean.loc[pep, "Node 2"].max(), 
               marker="o", ms=7, ls="none", mfc=colors_dict[pep], mec=colors_dict[pep])
    else:
        ax.plot(df_traj_mean.loc[pep, "Node 1"], df_traj_mean.loc[pep, "Node 2"], 
          color=colors_dict[pep], ls="-", lw=3., 
          #label=r"EC${}_{50}=" + r"{} \times 10^{}$".format(ecnumber, ecpower))
          label=r"${} \times 10^{}$".format(ecnumber, ecpower))


# Add the spiral EC50 axis
ax.plot(df_n1n2_ec50.iloc[:, 0], df_n1n2_ec50.iloc[:, 1], color=(0.3,)*3, lw=1.5, zorder=100)
ax.arrow(df_n1n2_ec50.iloc[0, 0], df_n1n2_ec50.iloc[0, 1], 
         1.5*(df_n1n2_ec50.iloc[0, 0] - df_n1n2_ec50.iloc[1, 0]), 
         1.5*(df_n1n2_ec50.iloc[0, 1] - df_n1n2_ec50.iloc[1, 1]), color=(0.3,)*3, 
         shape='full', lw=1.5, length_includes_head=False, head_width=0.4, head_length=0.3, zorder=101)

# Markers on that spiral for the selected peptides TODO
pep = 0
for i in theo_peptides_indices[:-1]:
    ax.plot(df_n1n2_ec50.iloc[i, 0], df_n1n2_ec50.iloc[i, 1], marker="o", color=colors_dict[pep], ls="none", mec=(0.3,)*3, mew=0.8, ms=4, zorder=102+pep)
    pep += 1

# Annotate the time represented by the spiral
ax.annotate(r"$t={}\,$h".format(tchoice), 
            xy=(df_n1n2_ec50.iloc[0, 0]+1.5*(df_n1n2_ec50.iloc[0, 0] - df_n1n2_ec50.iloc[1, 0]),
                df_n1n2_ec50.iloc[0, 1]+2.0*(df_n1n2_ec50.iloc[0, 1] - df_n1n2_ec50.iloc[1, 1])), 
            ha="right", va="bottom", fontsize=7, color=(0.3,)*3
)

# Legend outside of the plot
leg_kwargs = dict(fontsize=6, handlelength=0.8, borderpad=0.3, borderaxespad=0.3, 
                  frameon=False, labelspacing=0.3, handletextpad=0.5)
leg = ax.legend(loc="upper left", bbox_to_anchor=(1.0, 1.0), title="Theoretical\nAntigen\n" + r"EC${}_{50}$ (#)", 
          title_fontsize=6, **leg_kwargs)

# Remove top and right spines
for axis in ["top", "right"]:
    ax.spines[axis].set_visible(False)

# Other labeling
ls1label = r"$LS_1$ (a. u.)"
ls2label = r"$LS_2$ (a. u.)"
ax.set_xlabel(ls1label, **xlabelprops)
ax.set_ylabel(ls2label, **ylabelprops)
ax.set_xticks([0])
ax.set_yticks([0])
ax.set_xticklabels([0])
ax.set_yticklabels([0])

### PANEL D: N1 AND N2 AS A FUNCTION OF EC50
ax = axes[1, 0]
# Vertical lines marking the theoretical antigen classes
for pep, i in enumerate(theo_peptides_indices):
    ax.axvline(x=10**df_n1n2_ec50.iloc[i, 2], color=colors_dict[pep], linestyle='--', lw=2.)

# Plot N1 and N2 at tchoice vs Ec50
for i in range(2):
    lbl = ls1label[:-8] if i == 0 else ls2label[:-8]
    ax.plot(10**df_n1n2_ec50.iloc[:, 2], df_n1n2_ec50.iloc[:, i], label=lbl, 
            color=nodePalette[i], lw=2.5)

# Mark the ideal peptides on both curves (the last peptide is marked on the first curve only)
pep = 0
for i in theo_peptides_indices:
    ax.plot(10**df_n1n2_ec50.iloc[i, 2], df_n1n2_ec50.iloc[i, 0], marker="o", color=colors_dict[pep], ls="none", mec='k', mew=0.8, ms=6)
    if pep < len(theo_peptides_indices)-1:
        ax.plot(10**df_n1n2_ec50.iloc[i, 2], df_n1n2_ec50.iloc[i, 1], marker="o", color=colors_dict[pep], ls="none", mec='k', mew=0.8, ms=6)
    pep += 1

# Labeling, etc.
for axis in ["top", "right"]:
    ax.spines[axis].set_visible(False)
ax.set_xlabel(r"Antigen EC$_{50}$ (#)", fontsize=7, labelpad=1)
ax.set_ylabel("Latent Space (a. u.)", fontsize=7, labelpad=1)
ax.set_yticks([0])
ax.set_yticklabels([0])

ax.set_xscale('log')
locmin = mpl.ticker.LogLocator(base=10.0,subs=np.linspace(0.1,0.9,num=9,endpoint=True).tolist(),numticks=50)
ax.xaxis.set_minor_locator(locmin)
ax.xaxis.set_minor_formatter(mpl.ticker.NullFormatter())
xticks = [10**5,10**4,10**3,10**2,10**1,10**0]
xticklabels = ['10$^{'+str(int(np.log10(x)))+'}$' for x in xticks]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
ax.tick_params(which="minor", axis="both", **props_minorticks)
ax.tick_params(which="major", axis="both", **props_majorticks)
ax.invert_xaxis()

leg_kwargs = dict(fontsize=7 , handlelength=0.8, borderpad=0.5, borderaxespad=0.3, 
                  frameon=True, labelspacing=0.3, handletextpad=0.5, framealpha=0.9)
leg2 = ax.legend(**leg_kwargs)


fig.tight_layout(w_pad=0.4, h_pad=3.)
#fig.savefig(os.path.join("figures", "highmi13", "fig3CD_probability_cumulative_latentspace_N1N2vsEC50.pdf"), 
#           transparent=True, bbox_inches="tight", bbox_extra_artists=(leg, leg2))

# Supplementary figure: parameter space of theoretical antigen classes

In [None]:
#@title Generate parameter space samples
rndgen = np.random.RandomState(seed=53739)
nsamples2 = 72
df_theo_samples2 = pd.DataFrame(np.zeros([nsamples2*n_categories, 7]), 
                        index=pd.MultiIndex.from_product([range(n_categories), range(nsamples2)], 
                            names=["TheoreticalPeptide", "Sample"]), 
                        columns=df_theo_meanparams.columns)
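for i in range(n_categories):
    # Rows of theoretical peptide i (the index is an ordered product, so they are contiguous)
    rows = slice(i*nsamples2, (i+1)*nsamples2)
    # Generate nsamples2 parameter samples for each peptide. Assign through a single .iloc call: 
    # chained indexing such as .loc[i].iloc[...] may write to a temporary copy. 
    df_theo_samples2.iloc[rows, :3] = np.clip(rndgen.multivariate_normal(
        interpolated_means[theo_peptides_indices[i]], 
        interpolated_covmats[theo_peptides_indices[i]], nsamples2), 
        a_min=[0.0, 0.0, -np.pi], a_max=None)
    
    for j, pch in zip((3, 4, 5), ("v1", "alpha", "beta")):
        # rndgen.normal expects a standard deviation, hence the square root of the variance
        df_theo_samples2.iloc[rows, j] = np.clip(rndgen.normal(
            theo_peptides_par_means[pch][i], np.sqrt(theo_peptides_par_varis[pch][i]), nsamples2), 
            a_min=0.02, a_max=None)
    
    df_theo_samples2.iloc[rows, 6] = theo_peptides_log10ec50s[i]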

In [None]:
# Model parameter space distributions of theoretical antigen classes
# Use df_theo_samples? Or generate more than 32 samples
# Highlight df_theo_meanparams with a larger point. 
# Use the same color maps as above. colors_dict, colors_samples_dict
fig, ax = plt.subplots()
fig.set_size_inches(2., 1.75)

# First plot the many samples we generated
# We plot theta vs t0? Or F vs theta, or F vs t0? Try them all
p_sel = ["a0", "theta"]
for pep in df_traj.index.get_level_values("TheoreticalPeptide").unique():
    ax.plot(df_theo_samples2.loc[pep, p_sel[0]].values, df_theo_samples2.loc[pep, p_sel[1]].values, 
         marker="o", mfc=colors_samples_dict[pep], mec=colors_dict[pep], ls="none", ms=3, alpha=0.7)

# Plot the means last
for pep in df_traj_mean.index:
    ecpower = int(np.floor(df_theo_meanparams.loc[pep, "log10ec50"]))
    ecnumber = 10**(df_theo_meanparams.loc[pep, "log10ec50"] - ecpower)
    ecnumber = round(ecnumber, 1)
    ax.plot(df_theo_meanparams.loc[pep, p_sel[0]], df_theo_meanparams.loc[pep, p_sel[1]], 
        mfc=colors_dict[pep], mec="k", mew=1., ls="none", marker="o", ms=6, 
        label=r"EC${}_{50}=" + r"{} \times 10^{}$".format(ecnumber, ecpower))

# Add a legend
split_legend = True
if not split_legend:
    leg_kwargs = dict(fontsize=6, handlelength=1., borderpad=0.3, frameon=False, labelspacing=0.2)
    ax.legend(**leg_kwargs)

else:
    # Create a split legend for the average lines. 
    handles, labels = ax.get_legend_handles_labels()

    handsplit = 2
    leg_kwargs = dict(fontsize=6, handlelength=1., borderpad=0.3, frameon=True, 
                      labelspacing=0.2, markerscale=0.75, handletextpad=0.3)
    first_legend = plt.legend(handles=handles[:handsplit], labels=labels[:handsplit], 
                        loc='upper left', bbox_to_anchor=(0, 1.05), **leg_kwargs)

    # Add the legend manually to the current Axes.
    ax.add_artist(first_legend)

    # Create another legend for the second line.
    second_legend = plt.legend(handles=handles[handsplit:], labels=labels[handsplit:], 
                        loc='lower right', bbox_to_anchor=(1.02, 0), **leg_kwargs)
    ax.add_artist(second_legend)

# Remove top and right spines
for axis in ["top", "right"]:
    ax.spines[axis].set_visible(False)

# Other labeling
ax.set_xlabel(r"${}$".format(p_sel[0]), fontsize=7)
ax.set_ylabel(r"$\{}$".format(p_sel[1]), fontsize=7)
ax.set_xticks([])
ax.set_yticks([])

fig.tight_layout()

# Uncomment to save figure
#fig.savefig(os.path.join("figures", "highmi13", "supp_panel_theo_peptide_ballistic_param_space.pdf"), 
#            transparent=True, bbox_inches="tight")
plt.show()
plt.close()