**Reviewer question:** Additionally, the authors mention that “consensus ranks with most trustworthy explanations” (llm_reranking.py). How was trustworthiness determined? Did human reviewers assess and select the most reliable responses from the three outputs, or was an automated method used? Clarifying this process would strengthen the validity of the feature selection approach.

In [9]:
import os
import pandas as pd
import json
import collections

results_dir = os.path.abspath("../../results")
print(f"Results directory: {results_dir}")

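# Load the ranked gene lists returned by the LLM in each of the three reranking rounds.
data = {}
responses_dir = os.path.join(results_dir, "results_pairs", "reranking", "responses")
for round_idx in [0, 1, 2]:  # named round_idx to avoid shadowing the built-in round()
    json_dir = os.path.join(responses_dir, f"round_{round_idx}", "json")
    for file_name in os.listdir(json_dir):
        if file_name.endswith(".json"):
            drug_name = file_name.split(".json")[0]
            with open(os.path.join(json_dir, file_name), "r") as f:
                ranks = json.load(f)
            # Key each ranking by (drug, round) so rounds can be compared later.
            data[(drug_name, round_idx)] = ranks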


Results directory: /Users/mduranfrigola/Documents/GitHub/pharmacogx-embeddings/results


In [17]:
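# Group explanations by (drug, gene, rank): each triple collects one explanation
# for every round in which the gene received that rank for that drug.
table = collections.defaultdict(list)
for (drug_name, round_idx), ranks in data.items():
    for entry in ranks:
        key = (drug_name, entry["gene"], entry["rank"])
        table[key].append((round_idx, entry["explanation"]))

# One row per triple. Explanations fill the columns left to right, so the
# column suffix is a position, not necessarily the round number. Assuming a
# gene appears once per round, "explanation_2" is filled only when all three
# rounds agree on the rank.
R = []
for (drug_name, gene, rank), v in table.items():
    R.append([drug_name, gene, rank] + [explanation for _, explanation in v])

df = pd.DataFrame(R, columns=["drug", "gene", "rank", "explanation_0", "explanation_1", "explanation_2"])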

In [19]:
df.to_csv("assets/reranking_explanations.csv", index=False)

In [20]:
df

Unnamed: 0,drug,gene,rank,explanation_0,explanation_1,explanation_2
0,Rifapentine,CYP3A4,1,CYP3A4 significantly alters the metabolism and...,CYP3A4 is instrumental in drug metabolism part...,CYP3A4 metabolizes a significant proportion of...
1,Rifapentine,CYP2C9,2,Certain alleles of CYP2C9 may decrease enzyme ...,The gene CYP2C9 plays a crucial metabolic role...,CYP2C9 is another essential enzyme influencing...
2,Rifapentine,SLCO1B1,3,This gene codes for a hepatic uptake transport...,,
3,Rifapentine,NR1I2,4,NR1I2 carries implications for Rifapentine's p...,,
4,Rifapentine,NAT2,5,Variations in NAT2 often impact drug response ...,NAT2 is involved in the process of N-acetylati...,
...,...,...,...,...,...,...
731,Delamanid,CYP3A5,6,CYP3A5 is involved in metabolism of several dr...,,
732,Delamanid,CYP2C19,7,CYP2C19 significantly influences the metabolis...,,
733,Delamanid,CYP2D7,8,Although CYP2D7 is a pseudogene and does not c...,,
734,Delamanid,CYP2B6,9,CYP2B6 is known to metabolize several importan...,,


In [24]:
df[df["explanation_2"].notnull()].to_csv("assets/reranking_explanations_filtered.csv", index=False)
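The filter above keeps only rows with three explanations. One way to read it is as a consensus check: a (drug, gene, rank) triple survives only if every round assigned that gene the same rank. The following is a minimal sketch of that reading on toy data; the drug and gene names and the `round` column are hypothetical placeholders, not values from the results directory.

```python
import pandas as pd

# Toy records standing in for the per-round rankings (hypothetical values).
records = [
    {"drug": "DrugA", "gene": "GENE1", "rank": 1, "round": 0},
    {"drug": "DrugA", "gene": "GENE1", "rank": 1, "round": 1},
    {"drug": "DrugA", "gene": "GENE1", "rank": 1, "round": 2},
    {"drug": "DrugA", "gene": "GENE2", "rank": 2, "round": 0},
    {"drug": "DrugA", "gene": "GENE2", "rank": 3, "round": 1},
    {"drug": "DrugA", "gene": "GENE2", "rank": 2, "round": 2},
]
toy = pd.DataFrame(records)

# Count how many distinct rounds assigned each (drug, gene, rank) triple.
agreement = (
    toy.groupby(["drug", "gene", "rank"])["round"]
    .nunique()
    .reset_index(name="n_rounds")
)

# Keep only triples supported by all three rounds.
consensus = agreement[agreement["n_rounds"] == 3]
print(consensus)
```

In this toy case only GENE1 is kept, because GENE2 received rank 3 in one of the rounds.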

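To evaluate the robustness of the explanations, we embed them with an OpenAI embedding model and compare, by cosine similarity, the explanations given to the same (drug, gene, rank) triple in different rounds.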
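As a reminder of what the metric measures, cosine similarity is the dot product of two vectors divided by the product of their norms, so it depends only on direction and not on length. The sketch below uses small hand-made vectors instead of real embeddings; the helper name `cosine` is introduced here for illustration only.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
c = np.array([0.0, 1.0])

print(cosine(a, a))  # same direction
print(cosine(a, b))  # 45 degrees apart
print(cosine(a, c))  # orthogonal
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.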

In [32]:
from dotenv import load_dotenv
import openai
load_dotenv("../../.env")

openai.api_key = os.getenv("OPENAI_API_KEY")

In [36]:
import openai
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def get_embedding(text: str, model: str = "text-embedding-3-small") -> np.ndarray:
    response = openai.embeddings.create(
        input=[text],
        model=model
    )
    return np.array(response.data[0].embedding)

def paragraph_similarity(p1: str, p2: str, verbose: bool = True) -> float:
    emb1 = get_embedding(p1)
    emb2 = get_embedding(p2)
    similarity = cosine_similarity([emb1], [emb2])[0][0]
    if verbose:
        print(f"Cosine similarity: {similarity:.4f}")
        if similarity > 0.85:
            print("→ Very similar content.")
        elif similarity > 0.6:
            print("→ Related but not identical.")
        else:
            print("→ Possibly unrelated.")
    return float(similarity)
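Because the sampling loops below can draw the same explanation many times, each repeated draw would trigger a new embedding request. A possible way to avoid this is to memoize embeddings by text. The following is a sketch only: `fake_embedding` is a hypothetical stand-in (letter counts) so the example runs without an API key, and in practice it would be replaced by a call such as `get_embedding`.

```python
from functools import lru_cache

import numpy as np


def fake_embedding(text: str) -> np.ndarray:
    """Hypothetical stand-in for an API call: a 26-dim letter-count vector."""
    vec = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


@lru_cache(maxsize=None)
def cached_embedding(text: str) -> tuple:
    # Strings are hashable, so they can serve as cache keys; the vector is
    # stored as a tuple so the cached value cannot be mutated by callers.
    return tuple(fake_embedding(text))


def cached_similarity(p1: str, p2: str) -> float:
    e1 = np.array(cached_embedding(p1))
    e2 = np.array(cached_embedding(p2))
    return float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))


s1 = cached_similarity("CYP2C9 metabolizes drugs", "CYP3A4 metabolizes drugs")
s2 = cached_similarity("CYP2C9 metabolizes drugs", "NAT2 acetylates drugs")

# Four lookups were made, but only three distinct texts were embedded.
info = cached_embedding.cache_info()
print(info.misses, info.hits)
```

The cache lives in memory for the duration of the session; persisting embeddings to disk would be a separate design choice.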


In [None]:
explanations = [df["explanation_0"].tolist(),
                df["explanation_1"].tolist(),
                df["explanation_2"].tolist()]

row_idxs = list(range(df.shape[0]))
col_idxs = list(range(3))

# Same (drug, gene, rank) triple, two different explanation columns:
# measures how consistent the explanations are across rounds.
similarities = []
for _ in range(10000):
    row_idx = np.random.choice(row_idxs)
    col_idx_0 = np.random.choice(col_idxs)
    col_idx_1 = np.random.choice(col_idxs)
    if col_idx_0 == col_idx_1:
        continue
    p1 = explanations[col_idx_0][row_idx]
    p2 = explanations[col_idx_1][row_idx]
    # Missing explanations may be stored as None or NaN; pd.isna covers both.
    if pd.isna(p1) or pd.isna(p2):
        continue
    print(f"Comparing explanations for drug {df['drug'][row_idx]}, gene {df['gene'][row_idx]}:")
    print(f"Explanation {col_idx_0}: {p1}")
    print(f"Explanation {col_idx_1}: {p2}")
    similarity = paragraph_similarity(p1, p2)
    similarities.append(similarity)
    with open("assets/similarities.csv", "a") as f:
        f.write(f"{df['drug'][row_idx]},{df['gene'][row_idx]},{col_idx_0},{col_idx_1},{similarity}\n")
    print(f"Similarity: {similarity:.4f}\n")
    # Stop after 1000 comparisons; checking after the write keeps the CSV and the list in sync.
    if len(similarities) >= 1000:
        break

    

Comparing explanations for drug Terizidone, gene CYP2C9:
Explanation 0: CYP2C9 metabolizes various drugs, where genetic variants significantly affect those drugs' therapeutic effectiveness and adverse effects. Given the direct link between terizidone and the cytochromes P450 family indicated in the auxiliary materials, CYP2C9 stands out as a potential player in terizidone's pharmacogenetic interactions, possibly influencing metabolism, efficacy, and side effects.
Explanation 1: CYP2C9 alterations influence the pharmacokinetics of various drugs by playing a key role in their metabolism. Variations in this gene, hence, dictate the personalized methodologies used for drug dosage adjustments. Given that CYP2C9 likely influences the metabolism of Terizidone, its variants could impact the drug's overall effectiveness and risk of side effects.
Cosine similarity: 0.9160
→ Very similar content.
Similarity: 0.9160

Comparing explanations for drug Piperaquine, gene CYP2C19:
Explanation 0: CYP2C19

In [None]:
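# Baseline: explanations taken from two different rows, i.e. different
# (drug, gene, rank) triples, to show what similarity looks like by chance.
rand_similarities = []
for _ in range(10000):
    row_idx_0 = np.random.choice(row_idxs)
    row_idx_1 = np.random.choice(row_idxs)
    if row_idx_0 == row_idx_1:
        continue
    col_idx_0 = np.random.choice(col_idxs)
    col_idx_1 = np.random.choice(col_idxs)
    if col_idx_0 == col_idx_1:
        continue
    p1 = explanations[col_idx_0][row_idx_0]
    p2 = explanations[col_idx_1][row_idx_1]
    # Missing explanations may be stored as None or NaN; pd.isna covers both.
    if pd.isna(p1) or pd.isna(p2):
        continue
    similarity = paragraph_similarity(p1, p2)
    rand_similarities.append(similarity)
    with open("assets/random_similarities.csv", "a") as f:
        f.write(f"{similarity}\n")
    print(f"Similarity: {similarity:.4f}\n")
    # Stop after 1000 comparisons; checking after the write keeps the CSV and the list in sync.
    if len(rand_similarities) >= 1000:
        break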

Cosine similarity: 0.6656
→ Related but not identical.
Similarity: 0.6656

Cosine similarity: 0.4915
→ Possibly unrelated.
Similarity: 0.4915

Cosine similarity: 0.4205
→ Possibly unrelated.
Similarity: 0.4205

Cosine similarity: 0.5290
→ Possibly unrelated.
Similarity: 0.5290

Cosine similarity: 0.4662
→ Possibly unrelated.
Similarity: 0.4662

Cosine similarity: 0.4122
→ Possibly unrelated.
Similarity: 0.4122

Cosine similarity: 0.3642
→ Possibly unrelated.
Similarity: 0.3642

Cosine similarity: 0.5999
→ Possibly unrelated.
Similarity: 0.5999

Cosine similarity: 0.4511
→ Possibly unrelated.
Similarity: 0.4511

Cosine similarity: 0.4686
→ Possibly unrelated.
Similarity: 0.4686

Cosine similarity: 0.6628
→ Related but not identical.
Similarity: 0.6628

Cosine similarity: 0.6645
→ Related but not identical.
Similarity: 0.6645

Cosine similarity: 0.4118
→ Possibly unrelated.
Similarity: 0.4118

Cosine similarity: 0.4434
→ Possibly unrelated.
Similarity: 0.4434

Cosine similarity: 0.7213
→