# Evaluation of the 0D validation plate (exp100/JG405)

We previously made predictions for this plate and have now obtained the analytical data.
We want to evaluate how well our predictions agree with the experiment.

The plate was originally designed using the model trained on the 2023-09-05 data, such that the model predicted every attempted synthesis to work. In the meantime, we had to make minor changes to the data and retrain on the updated 2023-12-20 data. While the predictions of the two models align well (ca. 97% agreement across the VL and across products), we need to account for the differences.

There is no completely unbiased way to do this. The best option is to drop all cases where the new model predicts a negative outcome and to evaluate the precision on the remaining cases. This is slightly biased because the compounds were "pre-selected" by the earlier model. Fortunately, the effect should be small because the two models agree so well.
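The filtering step described above can be sketched as follows. This is a minimal illustration on made-up values, not the plate data. It assumes prediction and outcome columns named `pred_A` and `binary_A`, the names used in the code cells of this notebook.

```python
import pandas as pd

# Toy data with made-up values (NOT the exp100 results)
toy = pd.DataFrame(
    {
        "pred_A": [1, 1, 1, 0, 1],    # prediction of the retrained model
        "binary_A": [1, 0, 1, 1, 1],  # experimental outcome (1 = product detected)
    }
)

# Drop every case the retrained model would not have predicted to work
kept = toy.loc[toy["pred_A"] == 1]
n_dropped = len(toy) - len(kept)

# All remaining rows are predicted positives, so precision is simply
# the fraction of them that worked experimentally: TP / (TP + FP)
precision = kept["binary_A"].mean()
print(f"dropped {n_dropped} row(s), precision = {precision:.2%}")
```

Because only predicted positives remain after filtering, recall and accuracy carry no extra information for product A; precision is the only meaningful metric here.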

In [None]:
import pathlib
import sys

sys.path.append(str(pathlib.Path().resolve().parents[1]))

import numpy as np
import pandas as pd
from sklearn.metrics import precision_score, recall_score, accuracy_score

from src.definitions import DATA_DIR
from src.util.db_utils import SynFermDatabaseConnection

In [None]:
# load plate data used for inference (we only need the vl_id to match with the experimental results)
val_plate = pd.read_csv(DATA_DIR / "curated_data" / "validation_plates.csv")[["vl_id"]]
val_plate.head()

In [None]:
# load the predictions
preds = pd.read_csv(DATA_DIR / "curated_data" / "validation_plates_pred_2024-04-18.csv")
preds.head()

In [None]:
# attach vl_id to the predictions (column-wise concat; relies on both files having the same row order)
preds = pd.concat([val_plate, preds], axis=1)
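A column-wise `pd.concat` aligns rows by index rather than by content, so it silently produces misaligned or `NaN`-padded rows if the two files differ in length or order. The sketch below shows one way to guard against that. It uses made-up values and assumes that the prediction file holds one row per plate row in the same order, with columns `pred_A`, `pred_B`, and `pred_C`.

```python
import pandas as pd

# Toy stand-ins with made-up values for the plate layout and the prediction file
val_plate = pd.DataFrame({"vl_id": [101, 102, 103]})
preds = pd.DataFrame(
    {"pred_A": [1, 1, 0], "pred_B": [0, 1, 0], "pred_C": [0, 0, 1]}
)

# pd.concat(axis=1) aligns on the index, so both frames must share it
assert len(val_plate) == len(preds), "row counts differ"
assert val_plate.index.equals(preds.index), "indices differ"

merged = pd.concat([val_plate, preds], axis=1)

# A misaligned concat would show up as missing values
assert not merged.isna().any().any()
print(merged)
```

If the prediction file carried its own identifier column, a key-based `merge` would be the more robust choice; the positional approach is kept here because it mirrors the cell above.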

In [None]:
con = SynFermDatabaseConnection()

In [None]:
# fetch all reactions of exp100 that are not flagged with an error
res = con.con.execute("SELECT id, vl_id, well, initiator_long, monomer_long, terminator_long, product_A_lcms_ratio, product_B_lcms_ratio, product_C_lcms_ratio FROM experiments WHERE exp_nr = 100 AND (valid NOT LIKE '%ERROR%' OR valid IS NULL);").fetchall()
result = pd.DataFrame(res, columns=["id", "vl_id", "well", "initiator_long", "monomer_long", "terminator_long", "product_A_lcms_ratio", "product_B_lcms_ratio", "product_C_lcms_ratio"])
# binarize the outcome: 1 if the LCMS ratio of the product is above zero, else 0
result["binary_A"] = (result["product_A_lcms_ratio"] > 0).astype(int)
result["binary_B"] = (result["product_B_lcms_ratio"] > 0).astype(int)
result["binary_C"] = (result["product_C_lcms_ratio"] > 0).astype(int)
result.head()

In [None]:
len(result)

In [None]:
# combine predictions and results
comb = result.merge(preds, on="vl_id", how="left")
comb.head()

In [None]:
# are there any compounds that the new model would not have predicted to work?
comb.loc[comb["pred_A"] == 0]

### Side note on bias through model update
It turns out that the two models disagree in only 1 of 150 instances. We drop this data point before continuing the analysis.

In [None]:
comb = comb.loc[comb["pred_A"] == 1]
len(comb)

In [None]:
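# Evaluate for binary_A: what was the model's prospective precision?
# All remaining rows have pred_A == 1, so precision is the fraction of experimentally positive wells
len(comb.loc[comb["binary_A"] == 1]) / len(comb)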

In [None]:
# evaluate for binary_B
print(f'Accuracy: {accuracy_score(comb["binary_B"], comb["pred_B"]):.2%}')
print(f'Precision: {precision_score(comb["binary_B"], comb["pred_B"]):.2%}')
print(f'Recall: {recall_score(comb["binary_B"], comb["pred_B"]):.2%}')

In [None]:
# evaluate for binary_C
print(f'Accuracy: {accuracy_score(comb["binary_C"], comb["pred_C"]):.2%}')
print(f'Precision: {precision_score(comb["binary_C"], comb["pred_C"]):.2%}')
print(f'Recall: {recall_score(comb["binary_C"], comb["pred_C"]):.2%}')
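For reference, the three metrics reported for B and C reduce to simple ratios of confusion-matrix counts. The sketch below spells them out in plain Python on made-up labels. `binary_metrics` is a hypothetical helper written for illustration, not part of scikit-learn, and returning `None` where a denominator is zero is an assumption of this sketch (scikit-learn handles that case through its own `zero_division` option).

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision and recall for non-empty 0/1 label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # undefined (None) if nothing was predicted positive
        "precision": tp / (tp + fp) if (tp + fp) else None,
        # undefined (None) if there are no experimental positives
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Made-up labels (NOT the exp100 results)
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The undefined cases matter for B and C: if a product is rarely predicted or rarely observed on the plate, precision or recall can rest on very few wells, so the raw counts are worth inspecting alongside the percentages.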

In [None]:
# show the wells that had valid reactions in the plate layout
# n.b. we ignore the right half of the plate b/c all of that was invalid (oxalic acid transfer error)
arr = np.zeros((16, 10), dtype=int)

for well in comb["well"]:
    row = ord(well[0]) - 65  # row letter to index: "A" -> 0, ..., "P" -> 15
    col = int(well[1:]) - 3  # plate columns 3-12 map to indices 0-9
    arr[row, col] = 1
arr

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(arr)

In [None]:
palette = sns.color_palette(["#e42536", "#f0f0f0", "#5790fc"])
palette

In [None]:
# where was A predicted correctly?
arr = np.zeros((16, 10), dtype=int)

for i, dfrow in comb.iterrows():
    row = ord(dfrow["well"][0]) - 65
    col = int(dfrow["well"][1:]) - 3
    if dfrow["pred_A"] == dfrow["binary_A"]:
        arr[row, col] = 1
    else:
        arr[row, col] = -1
plt.figure(figsize=(1.2, 1))
ax = sns.heatmap(arr, center=0, cmap=palette, cbar=False, linewidths=0.1)
ax.set_xticks([])
ax.set_yticks([])
plt.tight_layout()
plt.savefig("exp100_precisionA.svg", transparent=True)

In [None]:
# where was B predicted correctly?
arr = np.zeros((16, 10), dtype=int)

for i, dfrow in comb.iterrows():
    row = ord(dfrow["well"][0]) - 65
    col = int(dfrow["well"][1:]) - 3
    if dfrow["pred_B"] == dfrow["binary_B"]:
        arr[row, col] = 1
    else:
        arr[row, col] = -1
sns.heatmap(arr, center=0, cmap=palette)

In [None]:
# where was C predicted correctly?
arr = np.zeros((16, 10), dtype=int)

for i, dfrow in comb.iterrows():
    row = ord(dfrow["well"][0]) - 65
    col = int(dfrow["well"][1:]) - 3
    if dfrow["pred_C"] == dfrow["binary_C"]:
        arr[row, col] = 1
    else:
        arr[row, col] = -1
sns.heatmap(arr, center=0, cmap=palette)
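The three cells above repeat the same loop for A, B, and C. A minimal sketch of a shared helper is given below, assuming the same layout as above (wells labelled as a row letter followed by a column number, 16 rows, and a grid starting at plate column 3). The names `well_to_index` and `correctness_grid` are hypothetical, and the bounds check is an addition that the original loops do not perform.

```python
import numpy as np

N_ROWS, N_COLS = 16, 10  # grid shape used in the cells above
FIRST_COL = 3            # first plate column shown in the grid


def well_to_index(well):
    """Map a well label such as 'A3' to (row, col) indices of the grid."""
    row = ord(well[0].upper()) - ord("A")
    col = int(well[1:]) - FIRST_COL
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        raise ValueError(f"well {well!r} lies outside the displayed grid")
    return row, col


def correctness_grid(wells, predicted, observed):
    """Return a grid with 1 = correct, -1 = incorrect, 0 = no valid reaction."""
    grid = np.zeros((N_ROWS, N_COLS), dtype=int)
    for well, pred, obs in zip(wells, predicted, observed):
        grid[well_to_index(well)] = 1 if pred == obs else -1
    return grid


# Made-up example wells (NOT the exp100 results)
grid = correctness_grid(["A3", "B4", "P12"], [1, 1, 0], [1, 0, 0])
print(grid[:2, :2])
```

With the notebook's data, a call such as `correctness_grid(comb["well"], comb["pred_B"], comb["binary_B"])` would yield an array that can be passed to `sns.heatmap` as in the cells above.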