In [5]:
import pandas as pd
import numpy as np

In [2]:
### Reasoning Analysis
# Randomly sample up to three rows from each confusion-matrix cell (TP, FP, TN, FN) and check the reasoning.
# 1) Llama2_70b Zeroshot No Finetune *Hard Task
# 2) Llama2_70b *Fewshot No Finetune Simple Task
# 3) Llama2_70b *Zeroshot No Finetune Simple Task
# 4) Mistral Fewshot No Finetune Simple Task /// comparison of (1) (3) (2) (4)
# 5) Mistral *Zeroshot *Finetune Simple Task 
# 6) Mistral *Fewshot *Finetune Simple Task
# 7) Mistral *Fewshot2 *Finetune Simple Task /// comparison of (5) (6) (7)

In [15]:
def sample_rows(df, pos_label='Sensitive', neg_label='Resistant', n_samples=3):
    """Sample up to n_samples rows from each confusion-matrix cell (TP, FP, TN, FN).

    Rows whose prediction is neither pos_label nor neg_label are ignored.
    """
    tp = df[(df['answer'] == pos_label) & (df['prediction'] == pos_label)]
    fp = df[(df['answer'] == neg_label) & (df['prediction'] == pos_label)]
    tn = df[(df['answer'] == neg_label) & (df['prediction'] == neg_label)]
    fn = df[(df['answer'] == pos_label) & (df['prediction'] == neg_label)]

    def safe_sample(data, n):
        # Fixed seed for reproducibility; an empty cell yields an empty frame.
        return data.sample(min(len(data), n), random_state=42) if len(data) > 0 else pd.DataFrame()

    tp_sample = safe_sample(tp, n_samples)
    fp_sample = safe_sample(fp, n_samples)
    tn_sample = safe_sample(tn, n_samples)
    fn_sample = safe_sample(fn, n_samples)

    # Rows are returned in the order TP, FP, TN, FN with a fresh 0..n-1 index.
    combined_samples = pd.concat([tp_sample, fp_sample, tn_sample, fn_sample]).reset_index(drop=True)

    return combined_samples
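
Before sampling, it can help to see how many rows fall into each confusion-matrix cell. The sketch below is not part of the original analysis: `confusion_counts` is a hypothetical helper and the toy frame is made up for illustration. It assumes the same `answer` and `prediction` column names used by `sample_rows`.

```python
import pandas as pd

def confusion_counts(df, answer_col="answer", prediction_col="prediction"):
    """Cross-tabulate true labels (rows) against predicted labels (columns)."""
    return pd.crosstab(df[answer_col], df[prediction_col])

# Toy data for illustration only; not taken from the inference files.
toy = pd.DataFrame({
    "answer":     ["Sensitive", "Sensitive", "Resistant", "Resistant", "Resistant"],
    "prediction": ["Sensitive", "Resistant", "Sensitive", "Resistant", "Resistant"],
})
counts = confusion_counts(toy)
print(counts)
```

A cell with fewer than three rows explains why `sample_rows` sometimes returns fewer than twelve rows in total.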

In [3]:
# 1) Llama2_70b Zeroshot No Finetune *Hard Task
task1_data_01 = pd.read_csv("/data/yingfei/cancer_data/llm_prompt_data/llm_inference_data/sens_diff_70b_inf.csv")
task1_data_01 = task1_data_01.loc[task1_data_01.output_70.notna()].copy()
print(task1_data_01.shape)
task1_data_01.head()

(140, 4)


Unnamed: 0,cell_id,prompt,answer,output_70
0,ACH-000001,Think step by step and decide in a single word...,Sensitive,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...
1,ACH-000001,Think step by step and decide in a single word...,Resistant,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...
2,ACH-000001,Think step by step and decide in a single word...,Sensitive,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...
3,ACH-000001,Think step by step and decide in a single word...,Resistant,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...
4,ACH-000001,Think step by step and decide in a single word...,Resistant,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...


In [4]:
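# Take the text between "Answer:" and the following "Reasoning:", then drop surrounding whitespace and the trailing period.
task1_data_01['prediction'] = task1_data_01.output_70.apply(lambda x: x.split("Answer:")[1].split("Reasoning:")[0].strip().rstrip("."))
task1_data_01.prediction.unique() # All sensitive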

array(['Sensitive'], dtype=object)
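
The split-based parsing above raises an IndexError if an output has no "Answer:" tag. As a hedged alternative, the sketch below uses a regular expression and returns None when no label is found. `ANSWER_RE` and `extract_answer` are hypothetical names introduced here, and the example string is abbreviated from the output format shown above.

```python
import re

# Matches the label that follows an "Answer:" tag, ignoring case.
ANSWER_RE = re.compile(r"Answer:\s*(Sensitive|Resistant)", flags=re.IGNORECASE)

def extract_answer(text):
    """Return 'Sensitive' or 'Resistant' from the first 'Answer:' tag, else None."""
    if not isinstance(text, str):
        return None  # e.g. NaN from a missing model output
    match = ANSWER_RE.search(text)
    return match.group(1).capitalize() if match else None

example = "Reasoning:?\n\nAnswer: Sensitive.\nReasoning: TIPIFARNIB is a farnesyltransferase inhibitor"
print(extract_answer(example))
```

It could be applied with `df['output_70'].apply(extract_answer)`; rows that come back as None would then need manual review.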

In [16]:
task1_samples = sample_rows(task1_data_01)
task1_samples

Unnamed: 0,cell_id,prompt,answer,output_70,prediction
0,ACH-000001,Think step by step and decide in a single word...,Sensitive,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive
1,ACH-000001,Think step by step and decide in a single word...,Sensitive,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive
2,ACH-000001,Think step by step and decide in a single word...,Sensitive,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive
3,ACH-000001,Think step by step and decide in a single word...,Resistant,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive
4,ACH-000001,Think step by step and decide in a single word...,Resistant,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive
5,ACH-000001,Think step by step and decide in a single word...,Resistant,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive


In [22]:
for i in range(len(task1_samples)):
    print(task1_samples.prompt[i])
    print("Answer:", task1_samples.answer[i], "Prediction:", task1_samples.prediction[i])
    print(task1_samples.output_70[i])
    print()

Think step by step and decide in a single word reflecting the drug sensitivity of the drug on the cell line with given mutations: [Sensitive/Resistant], [Reasoning].
Drug and cell line mutations: 
The drug is TIPIFARNIB. The drug SMILES structure is CN1C=NC=C1[C@@](C2=CC=C(C=C2)Cl)(C3=CC4=C(C=C3)N(C(=O)C=C4C5=CC(=CC=C5)Cl)C)N. Drug target is Farnesyl-transferase (FNTA). Drug target pathway is Other.
The mutations of the cell line are NOTCH1, NOTCH3, PIK3R1, PPP2R1A, TP53, TSC2, WHSC1L1.
Drug Sensitivity: ?
Answer: Sensitive Prediction: Sensitive
Reasoning:?

Answer: Sensitive.
Reasoning: TIPIFARNIB is a farnesyltransferase inhibitor that targets the FNTA enzyme. The drug works by inhibiting the farnesylation of proteins, which is a post-translational modification that plays a crucial role in various cellular processes, including cell signaling and protein-protein interactions.

The cell line has mutations in several genes, including NOTCH1, NOTCH3, PIK3R1, PPP2R1A, TP53, TSC2, and WHSC
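
The print loop above is repeated for every model, with only the output column changing. A minimal sketch of a reusable version follows; `format_samples` is a hypothetical helper, not part of the original notebook. It iterates over rows by position, so it does not depend on the frame having a 0..n-1 index.

```python
import pandas as pd

def format_samples(samples, output_col):
    """Return the per-row report as one string, iterating by position rather than index label."""
    blocks = []
    for row in samples.to_dict("records"):
        blocks.append(
            f"{row['prompt']}\n"
            f"Answer: {row['answer']} Prediction: {row['prediction']}\n"
            f"{row[output_col]}\n"
        )
    return "\n".join(blocks)

# Toy frame with a non-contiguous index (illustrative values only).
toy = pd.DataFrame(
    {
        "prompt": ["prompt A", "prompt B"],
        "answer": ["Sensitive", "Resistant"],
        "prediction": ["Sensitive", "Sensitive"],
        "output_70": ["Sensitive. reason A", "Sensitive. reason B"],
    },
    index=[10, 20],
)
report = format_samples(toy, "output_70")
print(report)
```

The same call would work for the other models by passing their output column name, for example `"output70"` or `"output_Mis"`.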

In [23]:
# 2) Llama2_70b *Fewshot No Finetune Simple Task
task1_data_02 = pd.read_csv("/data/yingfei/cancer_data/llm_prompt_data/llm_inference_data/sens_simp_70b_inf_test.csv")
task1_data_02 = task1_data_02.loc[task1_data_02.output70.notna()].copy()
print(task1_data_02.shape)
task1_data_02.head()

(200, 4)


Unnamed: 0,cell_id,prompt,answer,output70
0,ACH-000006,Think step by step and decide in a single word...,Sensitive,"Based on the information provided, I would det..."
1,ACH-000006,Think step by step and decide in a single word...,Resistant,"Based on the information provided, my answer f..."
2,ACH-000006,Think step by step and decide in a single word...,Resistant,"Based on the information provided, I would det..."
3,ACH-000006,Think step by step and decide in a single word...,Sensitive,Sensitive.\n\nThe drug is a tyrosine kinase in...
4,ACH-000006,Think step by step and decide in a single word...,Sensitive,"Based on the information provided, my answer f..."


In [24]:
# Keyword-based labelling: "Check" marks outputs that mention both labels, "None" marks outputs that mention neither.
task1_data_02['prediction'] = task1_data_02['output70'].apply(lambda x: "Check" if "sensitive" in x.lower() and "resistant" in x.lower() else
                                                               "Sensitive" if "sensitive" in x.lower() else "Resistant" if "resistant" in x.lower()
                                                               else "None")
# Ambiguous ("Check") outputs are counted as Resistant; outputs with no label become "Unknown".
task1_data_02.loc[task1_data_02.prediction == "Check", 'prediction'] = "Resistant"
task1_data_02.loc[task1_data_02.prediction == "None", 'prediction'] = "Unknown"
task1_data_02.prediction.value_counts()

prediction
Sensitive    107
Resistant     91
Unknown        2
Name: count, dtype: int64
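
The same keyword rule is reused for several models below. As a sketch, it can be written once as a named function so the precedence is easier to read; `keyword_label` is a hypothetical name, and the rule is intended to mirror the lambda above, not to replace the author's labelling.

```python
def keyword_label(text, ambiguous="Check", missing="None"):
    """Label a model output by which of the two class names it mentions."""
    lowered = text.lower()
    has_sensitive = "sensitive" in lowered
    has_resistant = "resistant" in lowered
    if has_sensitive and has_resistant:
        return ambiguous  # both labels appear, so the output needs a second look
    if has_sensitive:
        return "Sensitive"
    if has_resistant:
        return "Resistant"
    return missing  # neither label appears

# Caveat: substring matching also fires on words such as "insensitive".
print(keyword_label("Sensitive. The drug targets IGF1R."))
print(keyword_label("It may be sensitive or resistant."))
```

One limitation worth keeping in mind is that a word like "insensitive" contains "sensitive", so substring matching can mislabel such outputs.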

In [27]:
task1_samples = sample_rows(task1_data_02)
task1_samples

Unnamed: 0,cell_id,prompt,answer,output70,prediction
0,ACH-000006,Think step by step and decide in a single word...,Sensitive,Sensitive.\n\nThe drug is a tyrosine kinase in...,Sensitive
1,ACH-000006,Think step by step and decide in a single word...,Sensitive,"Based on the information provided, the drug se...",Sensitive
2,ACH-000078,Think step by step and decide in a single word...,Sensitive,"Based on the information provided, I would det...",Sensitive
3,ACH-000105,Think step by step and decide in a single word...,Resistant,"Based on the information provided, my answer f...",Sensitive
4,ACH-000105,Think step by step and decide in a single word...,Resistant,"Based on the information provided, my answer f...",Sensitive
5,ACH-000105,Think step by step and decide in a single word...,Resistant,"Based on the information provided, my answer f...",Sensitive
6,ACH-000105,Think step by step and decide in a single word...,Resistant,"Based on the information provided, the drug se...",Resistant
7,ACH-000067,Think step by step and decide in a single word...,Resistant,"Based on the information provided, I would det...",Resistant
8,ACH-000067,Think step by step and decide in a single word...,Resistant,"Based on the information provided, I would det...",Resistant
9,ACH-000067,Think step by step and decide in a single word...,Sensitive,"Based on the information provided, I would det...",Resistant


In [28]:
for i in range(len(task1_samples)):
    print(task1_samples.prompt[i])
    print("Answer:", task1_samples.answer[i], "Prediction:", task1_samples.prediction[i])
    print(task1_samples.output70[i])
    print()

Think step by step and decide in a single word reflecting the drug sensitivity of the drug on the cell line with given mutations: [Sensitive/Resistant], [Reasoning].
Drug and cell line mutations: 
The drug is LINSITINIB. The drug SMILES structure is CC1(CC(C1)C2=NC(=C3N2C=CN=C3N)C4=CC5=C(C=C4)C=CC(=N5)C6=CC=CC=C6)O. Drug target is IGF1R. Drug target pathway is IGF1R signaling.
The mutations of the cell line are ASXL1, FLT3, MAP3K1, NOTCH1, TP53, U2AF1.
Drug Sensitivity: ?
Answer: Sensitive Prediction: Sensitive
Sensitive.

The drug is a tyrosine kinase inhibitor that targets IGF1R, which is mutated in the cell line. The presence of the mutation suggests that the cell line may be sensitive to the drug.

Think step by step and decide in a single word reflecting the drug sensitivity of the drug on the cell line with given mutations: [Sensitive/Resistant], [Reasoning].
Drug and cell line mutations: 
The drug is PI-103. The drug SMILES structure is C1COCCN1C2=NC(=NC3=C2OC4=C3C=CC=N4)C5=CC(=

In [29]:
# 3) Llama2_70b *Zeroshot No Finetune Simple Task
task1_data_03 = pd.read_csv("/data/yingfei/cancer_data/llm_prompt_data/llm_inference_data/test_prompt_data_simple_llama70b.csv")
task1_data_03 = task1_data_03.loc[task1_data_03.output_70.notna()].copy()
print(task1_data_03.shape)
task1_data_03.head()

(200, 4)


Unnamed: 0,cell_id,prompt,answer,output_70
0,ACH-000006,Think step by step and decide in a single word...,Sensitive,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...
1,ACH-000006,Think step by step and decide in a single word...,Resistant,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...
2,ACH-000006,Think step by step and decide in a single word...,Resistant,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...
3,ACH-000006,Think step by step and decide in a single word...,Sensitive,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...
4,ACH-000006,Think step by step and decide in a single word...,Sensitive,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...


In [30]:
# The outputs here carry an "Answer: <label>." tag, so the label is parsed from it directly.
task1_data_03['prediction'] = task1_data_03['output_70'].apply(lambda x: x.split("Answer: ")[1].split(".\nReasoning:")[0])
task1_data_03['prediction'].value_counts()

prediction
Sensitive    198
Resistant      2
Name: count, dtype: int64

In [32]:
task1_samples = sample_rows(task1_data_03)
task1_samples

Unnamed: 0,cell_id,prompt,answer,output_70,prediction
0,ACH-000021,Think step by step and decide in a single word...,Sensitive,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive
1,ACH-000078,Think step by step and decide in a single word...,Sensitive,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive
2,ACH-000078,Think step by step and decide in a single word...,Sensitive,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive
3,ACH-000045,Think step by step and decide in a single word...,Resistant,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive
4,ACH-000021,Think step by step and decide in a single word...,Resistant,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive
5,ACH-000067,Think step by step and decide in a single word...,Resistant,Reasoning:?\n\nAnswer: Sensitive.\nReasoning: ...,Sensitive
6,ACH-000078,Think step by step and decide in a single word...,Resistant,Answer: Resistant.\nReasoning: The drug SHIKON...,Resistant
7,ACH-000078,Think step by step and decide in a single word...,Sensitive,Answer: Resistant.\nReasoning: The drug ARA-G ...,Resistant


In [34]:
for i in range(len(task1_samples)):
    print(task1_samples.prompt[i])
    print("Answer:", task1_samples.answer[i], "Prediction:", task1_samples.prediction[i])
    print(task1_samples.output_70[i])
    print()

Think step by step and decide in a single word reflecting the drug sensitivity of the drug on the cell line with given mutations: [Sensitive/Resistant], [Reasoning].
Drug and cell line mutations: 
The drug is STF-62247. The drug SMILES structure is CC1=CC(=CC=C1)NC2=NC(=CS2)C3=CC=NC=C3. Drug target is Autophagy inducer. Drug target pathway is Other.
The mutations of the cell line are ALK, AR, ASXL1, ATR, BRCA2, CSF1R, KDM5C, MLL, MUTYH, SMARCA4, TP53.
Drug Sensitivity: ?
Answer: Sensitive Prediction: Sensitive
Reasoning:?

Answer: Sensitive.
Reasoning: The drug STF-62247 is an autophagy inducer, and it has been shown that cells with mutations in genes involved in DNA repair, such as BRCA2, are more sensitive to autophagy-inducing drugs. Additionally, mutations in genes involved in the PI3K/Akt/mTOR pathway, such as PIK3CA, can also increase sensitivity to autophagy-inducing drugs. The cell line has mutations in both BRCA2 and PIK3CA, suggesting that it may be sensitive to STF-62247.

D

In [35]:
# 4) Mistral Fewshot No Finetune Simple Task /// comparison of (1) (3) (2) (4)
task1_data_05 = pd.read_csv("/data/yingfei/cancer_data/llm_prompt_data/llm_inference_data/test_fewshot_simple_mistral.csv")
task1_data_05 = task1_data_05.loc[task1_data_05.output_Mis.notna()].copy()
task1_data_05.drop(columns = ["output_med"], inplace = True)
print(task1_data_05.shape)
task1_data_05.head()

(200, 4)


Unnamed: 0,cell_id,prompt,answer,output_Mis
0,ACH-000006,Think step by step and decide in a single word...,Sensitive,Sensitive. While the mutations in the cell lin...
1,ACH-000006,Think step by step and decide in a single word...,Resistant,"The mutations of the cell line are ASXL1, FLT3..."
2,ACH-000006,Think step by step and decide in a single word...,Resistant,"The mutations of the cell line are ASXL1, FLT3..."
3,ACH-000006,Think step by step and decide in a single word...,Sensitive,Sensitive (LINSITINIB is known to be sensitive...
4,ACH-000006,Think step by step and decide in a single word...,Sensitive,"The mutations of the cell line are ASXL1, FLT3..."


In [36]:
task1_data_05['prediction'] = task1_data_05['output_Mis'].apply(lambda x: "Check" if "sensitive" in x.lower() and "resistant" in x.lower() else \
                                                               "Sensitive" if "sensitive" in x.lower() else "Resistant" if "resistant" in x.lower() \
                                                               else "None")
for i in task1_data_05.index:  # iterate over index labels; safe even if the notna() filter dropped rows
    prediction_val = task1_data_05.loc[i, "prediction"]
    output = task1_data_05.loc[i, "output_Mis"]
    if prediction_val == "Check":
        if "sensitive or resistant" in output.lower():
            task1_data_05.loc[i, "prediction"] = "None"
        elif "Sensitive" in output:
            task1_data_05.loc[i, "prediction"] = "Sensitive"
        elif "Resistant" in output:
            task1_data_05.loc[i, "prediction"] = "Resistant"
task1_data_05.prediction.value_counts()

prediction
None         98
Sensitive    95
Resistant     7
Name: count, dtype: int64
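
Since `sample_rows` ignores rows labelled "None", it is worth tracking how much of the data actually received a label. The sketch below is an assumption-laden illustration, not part of the original analysis: `coverage_and_accuracy` is a hypothetical helper and the toy frame is made up. It reports the fraction of rows with a usable label and the accuracy on those rows only.

```python
import pandas as pd

def coverage_and_accuracy(df, labels=("Sensitive", "Resistant")):
    """Return (fraction of rows with a usable label, accuracy on those rows)."""
    labeled = df[df["prediction"].isin(labels)]
    coverage = len(labeled) / len(df) if len(df) else float("nan")
    if len(labeled):
        accuracy = (labeled["answer"] == labeled["prediction"]).mean()
    else:
        accuracy = float("nan")
    return coverage, accuracy

# Toy data for illustration only; one row has no usable label.
toy = pd.DataFrame({
    "answer":     ["Sensitive", "Resistant", "Sensitive", "Resistant"],
    "prediction": ["Sensitive", "Sensitive", "None", "Resistant"],
})
coverage, accuracy = coverage_and_accuracy(toy)
print(f"coverage={coverage:.2f}, accuracy on labelled rows={accuracy:.2f}")
```

Reporting both numbers together avoids comparing a model that answers every prompt against one that is only scored on the prompts it answered.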

In [37]:
task1_samples = sample_rows(task1_data_05)
task1_samples

Unnamed: 0,cell_id,prompt,answer,output_Mis,prediction
0,ACH-000105,Think step by step and decide in a single word...,Sensitive,"Sensitive. However, it's important to note tha...",Sensitive
1,ACH-000067,Think step by step and decide in a single word...,Sensitive,Sensitive (HDAC inhibitors like PANOBINOSTAT h...,Sensitive
2,ACH-000067,Think step by step and decide in a single word...,Sensitive,Sensitive (The IDH1 R132H mutation is associat...,Sensitive
3,ACH-000021,Think step by step and decide in a single word...,Resistant,Sensitive (HSP90 inhibitors have been shown to...,Sensitive
4,ACH-000105,Think step by step and decide in a single word...,Resistant,Sensitive (OXALIPLATIN is generally considered...,Sensitive
5,ACH-000078,Think step by step and decide in a single word...,Resistant,Sensitive. PI3K inhibitors like BUPARLISIB hav...,Sensitive
6,ACH-000006,Think step by step and decide in a single word...,Resistant,"The mutations of the cell line are ASXL1, FLT3...",Resistant
7,ACH-000045,Think step by step and decide in a single word...,Resistant,Resistant (FLT3 mutations are commonly associa...,Resistant
8,ACH-000067,Think step by step and decide in a single word...,Resistant,Resistant. The presence of mutations in genes ...,Resistant
9,ACH-000067,Think step by step and decide in a single word...,Sensitive,"Resistant (FR-180204 is an ERK1/2 inhibitor, b...",Resistant


In [39]:
for i in range(len(task1_samples)):
    print(task1_samples.prompt[i])
    print("Answer:", task1_samples.answer[i], "Prediction:", task1_samples.prediction[i])
    print(task1_samples.output_Mis[i])
    print()

Think step by step and decide in a single word reflecting the drug sensitivity of the drug on the cell line with given mutations: [Sensitive/Resistant], [Reasoning].
Drug and cell line mutations: 
The drug is YM-201636. The drug SMILES structure is C1COCCN1C2=NC(=NC3=C2OC4=C3C=CC=N4)C5=CC(=CC=C5)NC(=O)C6=CN=C(C=C6)N. Drug target is PIKFYVE. Drug target pathway is PI3K/MTOR signaling.
The mutations of the cell line are ERBB4, GNAS, NOTCH1, SDHA, SPEN, WT1.
Drug Sensitivity: ?
Answer: Sensitive Prediction: Sensitive
Sensitive. However, it's important to note that the presence of certain mutations does not necessarily mean the drug will be resistant, and further analysis such as IC50 values or in vitro assays may be necessary to determine the exact drug sensitivity.

Think step by step and decide in a single word reflecting the drug sensitivity of the drug on the cell line with given mutations: [Sensitive/Resistant], [Reasoning].
Drug and cell line mutations: 
The drug is PANOBINOSTAT. Th

In [40]:
# 5) Mistral *Zeroshot *Finetune Simple Task 
task1_data_08 = pd.read_csv("/data/yingfei/cancer_data/llm_prompt_data/llm_finetune_data/test_zeroshot_simple_mistral_finetune2.csv")
task1_data_08 = task1_data_08.loc[task1_data_08.output.notna()].copy()
print(task1_data_08.shape)
task1_data_08.head()

(200, 4)


Unnamed: 0,cell_id,prompt,answer,output
0,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive.JW-7-24-1 is a MAP3K1 inhibitor. The...
1,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant. The cell line is resistant to NVP-T...
2,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant. The cell line has multiple mutation...
3,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive.\nLINSITINIB is a MEK inhibitor. The...
4,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Resistant. The cell line has mutations in FLT3...


In [41]:
task1_data_08['prediction'] = task1_data_08['output'].apply(lambda x: "Check" if "sensitive" in x.lower() and "resistant" in x.lower() else \
                                                               "Sensitive" if "sensitive" in x.lower() else "Resistant" if "resistant" in x.lower() \
                                                               else "None")
for i in task1_data_08.index:  # iterate over index labels; safe even if the notna() filter dropped rows
    prediction_val = task1_data_08.loc[i, "prediction"]
    output = task1_data_08.loc[i, "output"]
    if prediction_val == "Check":
        if "sensitive or resistant" in output.lower():
            task1_data_08.loc[i, "prediction"] = "None"
        elif "Sensitive" in output:
            task1_data_08.loc[i, "prediction"] = "Sensitive"
        elif "Resistant" in output:
            task1_data_08.loc[i, "prediction"] = "Resistant"
task1_data_08.prediction.value_counts()

prediction
Resistant    114
Sensitive     86
Name: count, dtype: int64

In [43]:
task1_samples = sample_rows(task1_data_08)
task1_samples

Unnamed: 0,cell_id,prompt,answer,output,prediction
0,ACH-000045,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive.\nLINSITINIB is a reversible IGF1R i...,Sensitive
1,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive.JW-7-24-1 is a MAP3K1 inhibitor. The...,Sensitive
2,ACH-000021,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive. YM-201636 is a PI3K inhibitor. The ...,Sensitive
3,ACH-000021,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,"Sensitive.\nReasoning:\nSB-505124 is a MNK1, M...",Sensitive
4,ACH-000045,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Sensitive.\nReasoning: FTY-720 is an immunomod...,Sensitive
5,ACH-000105,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Sensitive.\nReasoning:\nThe drug MK-8776 is an...,Sensitive
6,ACH-000045,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,"Resistant.The mutations in ARFRP1, IRS2, NBN, ...",Resistant
7,ACH-000021,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant. The cell line has multiple mutation...,Resistant
8,ACH-000067,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant.\nReasoning:\nALVOCIDIB is a mitocho...,Resistant
9,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Resistant. The cell line has mutations in FLT3...,Resistant


In [45]:
for i in range(len(task1_samples)):
    print(task1_samples.prompt[i])
    print("Answer:", task1_samples.answer[i], "Prediction:", task1_samples.prediction[i])
    print(task1_samples.output[i])
    print()

<s>[INST] <<SYS>>Think step by step and decide in a single word reflecting the drug sensitivity of the drug on the cell line: [Sensitive/Resistant]<</SYS>>Think step by step and decide in a single word reflecting the drug sensitivity of the drug on the cell line with given mutations: [Sensitive/Resistant], [Reasoning].
Drug and cell line mutations: 
The drug is LINSITINIB. The drug SMILES structure is CC1(CC(C1)C2=NC(=C3N2C=CN=C3N)C4=CC5=C(C=C4)C=CC(=N5)C6=CC=CC=C6)O. Drug target is IGF1R. Drug target pathway is IGF1R signaling.
The mutations of the cell line are ARFRP1, IRS2, NBN, WHSC1L1.
Drug Sensitivity: ?[/INST]
Answer: Sensitive Prediction: Sensitive
Sensitive.
LINSITINIB is a reversible IGF1R inhibitor. IGF1R signaling pathway mutations and amplifications are linked to resistance to IGF1R inhibitors. However, the cell line in question does not have any IGF1R pathway mutations or amplifications. Therefore, the cell line is expected to be sensitive to LINSITINIB.

<s>[INST] <<SYS>

In [46]:
# 6) Mistral *Fewshot *Finetune Simple Task
task1_data_10 = pd.read_csv("/data/yingfei/cancer_data/llm_prompt_data/llm_finetune_data/test_zeroshot_simple_mistral_finetune_fewshot5.csv")
task1_data_10 = task1_data_10.loc[task1_data_10.output.notna()].copy()
print(task1_data_10.shape)
task1_data_10.head()

(200, 5)


Unnamed: 0,cell_id,prompt,answer,output,fewshot_idx
0,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Resistant. JW-7-24-1 is a FLT3 inhibitor. Howe...,"[672, 466, 948, 113, 161]"
1,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant. NVP-TAE684 is an ALK inhibitor. The...,"[1280, 1396, 1703, 1718, 684]"
2,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant. KIN001-244 is a PDK1 inhibitor. The...,"[1078, 1188, 1501, 1548, 1152]"
3,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive. Linsitinib is a tyrosine kinase inh...,"[61, 511, 1993, 1118, 1369]"
4,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Resistant. UNC0642 is a G9A/GLP inhibitor. The...,"[963, 135, 476, 1920, 1584]"


In [47]:
task1_data_10['prediction'] = task1_data_10['output'].apply(lambda x: "Check" if "sensitive" in x.lower() and "resistant" in x.lower() else \
                                                               "Sensitive" if "sensitive" in x.lower() else "Resistant" if "resistant" in x.lower() \
                                                               else "None")
for i in task1_data_10.index:  # iterate over index labels; safe even if the notna() filter dropped rows
    prediction_val = task1_data_10.loc[i, "prediction"]
    output = task1_data_10.loc[i, "output"]
    if prediction_val == "Check":
        if "Sensitive." in output[:20]:
            task1_data_10.loc[i, "prediction"] = "Sensitive"
        elif "Resistant" in output[:20]:
            task1_data_10.loc[i, "prediction"] = "Resistant"
task1_data_10.prediction.value_counts()

prediction
Resistant    138
Sensitive     62
Name: count, dtype: int64

In [48]:
task1_samples = sample_rows(task1_data_10)
task1_samples

Unnamed: 0,cell_id,prompt,answer,output,fewshot_idx,prediction
0,ACH-000045,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive. Inhibition of telomerase can preven...,"[739, 599, 351, 1058, 1094]",Sensitive
1,ACH-000105,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive. Fludarabine is an antimetabolite th...,"[347, 108, 590, 1531, 1727]",Sensitive
2,ACH-000107,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive. IMATINIB is known to inhibit multip...,"[1948, 1688, 1668, 1697, 427]",Sensitive
3,ACH-000105,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Sensitive. MK-8776 has been reported to be eff...,"[571, 96, 1128, 508, 1359]",Sensitive
4,ACH-000045,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Sensitive. FTY-720 is an agonist of sphingosin...,"[976, 260, 1265, 1366, 1544]",Sensitive
5,ACH-000045,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Sensitive. VELIPARIB is a PARP inhibitor. The ...,"[957, 757, 1875, 1416, 1339]",Sensitive
6,ACH-000105,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,"Resistant. NSC-319726 is a PI3K inhibitor, but...","[427, 442, 285, 694, 1018]",Resistant
7,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant. NVP-TAE684 is an ALK inhibitor. The...,"[1280, 1396, 1703, 1718, 684]",Resistant
8,ACH-000105,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant. DOXORUBICIN is an anthracycline dru...,"[1378, 837, 1928, 120, 631]",Resistant
9,ACH-000021,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,"Resistant. CAP-232, TT-232, TLN-232 are small ...","[309, 579, 363, 1542, 157]",Resistant


In [49]:
for i in range(len(task1_samples)):
    print(task1_samples.prompt[i])
    print("Answer:", task1_samples.answer[i], "Prediction:", task1_samples.prediction[i])
    print(task1_samples.output[i])
    print()

<s>[INST] <<SYS>>Think step by step and decide in a single word reflecting the drug sensitivity of the drug on the cell line: [Sensitive/Resistant]
Example 1:

Drug and cell line mutations: 
The drug is IMATINIB. The drug SMILES structure is CC1=C(C=C(C=C1)NC(=O)C2=CC=C(C=C2)CN3CCN(CC3)C)NC4=NC=CC(=N4)C5=CN=CC=C5. Drug target is ABL, KIT, PDGFR. Drug target pathway is Other, kinases.
The mutations of the cell line are CDK8, CREBBP, ERRFI1, KIT, MLL2, NF1, RET.
Drug Sensitivity: Sensitive

Example 2:

Drug and cell line mutations: 
The drug is LENALIDOMIDE. The drug SMILES structure is C1CC(=O)NC(=O)C1N2CC3=C(C2=O)C=CC=C3N. Drug target is CRBN. Drug target pathway is Protein stability and degradation.
The mutations of the cell line are CREBBP, KEL, MLL, NF1, TP53, WHSC1L1.
Drug Sensitivity: Sensitive

Example 3:

Drug and cell line mutations: 
The drug is YM-201636. The drug SMILES structure is C1COCCN1C2=NC(=NC3=C2OC4=C3C=CC=N4)C5=CC(=CC=C5)NC(=O)C6=CN=C(C=C6)N. Drug target is PIKFYVE.

In [50]:
# 7) Mistral *Fewshot2 *Finetune Simple Task /// comparison of (5) (6) (7)
task1_data_11 = pd.read_csv("/data/yingfei/cancer_data/llm_prompt_data/llm_finetune_data/test_zeroshot_simple_mistral_finetune_fewshot5_2.csv")
task1_data_11 = task1_data_11.loc[task1_data_11.output.notna()].copy()
print(task1_data_11.shape)
task1_data_11.head()

(200, 5)


Unnamed: 0,cell_id,prompt,answer,output,fewshot_idx
0,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive.,"[2, 0, 3, 1, 6]"
1,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant.The drug NVP-TAE684 is an ALK inhibi...,"[0, 2, 3, 1, 4]"
2,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant.,"[5, 11, 1, 6, 0]"
3,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive.LINSITINIB has been shown to have an...,"[2, 4, 1, 0, 5]"
4,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Resistant.UNC0642 is a HDAC inhibitor and the ...,"[0, 1, 2, 370, 371]"


In [51]:
task1_data_11['prediction'] = task1_data_11['output'].apply(lambda x: "Check" if "sensitive" in x.lower() and "resistant" in x.lower() else \
                                                               "Sensitive" if "sensitive" in x.lower() else "Resistant" if "resistant" in x.lower() \
                                                               else "None")
for i in task1_data_11.index:  # iterate over index labels; safe even if the notna() filter dropped rows
    prediction_val = task1_data_11.loc[i, "prediction"]
    output = task1_data_11.loc[i, "output"]
    if prediction_val == "Check":
        if "Sensitive." in output[:20]:
            task1_data_11.loc[i, "prediction"] = "Sensitive"
        elif "Resistant" in output[:20]:
            task1_data_11.loc[i, "prediction"] = "Resistant"
task1_data_11.prediction.value_counts()

prediction
Sensitive    101
Resistant     99
Name: count, dtype: int64

In [52]:
task1_samples = sample_rows(task1_data_11)
task1_samples

Unnamed: 0,cell_id,prompt,answer,output,fewshot_idx,prediction
0,ACH-000045,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive.,"[4, 6, 2, 29, 32]",Sensitive
1,ACH-000021,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive.,"[15, 25, 17, 20, 4]",Sensitive
2,ACH-000078,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Sensitive.\n\nExplanation: PAZOPANIB is a mult...,"[0, 25, 16, 2, 21]",Sensitive
3,ACH-000078,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,"Sensitive.FH-535 is an unknown drug, but the c...","[4, 0, 1, 2, 3]",Sensitive
4,ACH-000021,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Sensitive.\n\nExplanation: The cell line does ...,"[0, 811, 812, 813, 814]",Sensitive
5,ACH-000045,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,"Sensitive.Resistant for some cell lines, Sensi...","[0, 1, 240, 241, 242]",Sensitive
6,ACH-000105,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant.The drug MK-8776 is an inhibitor of ...,"[0, 2, 1, 3, 68]",Resistant
7,ACH-000006,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant.The drug NVP-TAE684 is an ALK inhibi...,"[0, 2, 3, 1, 4]",Resistant
8,ACH-000105,<s>[INST] <<SYS>>Think step by step and decide...,Resistant,Resistant.The mutations in ERBB4 and NOTCH1 ma...,"[1, 0, 2, 68, 69]",Resistant
9,ACH-000078,<s>[INST] <<SYS>>Think step by step and decide...,Sensitive,Resistant.The drug WIKI4 is an inhibitor of TN...,"[0, 1, 2, 3, 4]",Resistant


In [54]:
for i in range(len(task1_samples)):
    print(task1_samples.prompt[i])
    print("Answer:", task1_samples.answer[i], "Prediction:", task1_samples.prediction[i])
    print(task1_samples.output[i])
    print()

<s>[INST] <<SYS>>Think step by step and decide in a single word reflecting the drug sensitivity of the drug on the cell line: [Sensitive/Resistant]<</SYS>>
Drug and cell line mutations: 
The drug is WYE-125132. The drug SMILES structure is CNC(=O)NC1=CC=C(C=C1)C2=NC3=C(C=NN3C4CCC5(CC4)OCCO5)C(=N2)N6CC7CCC(C6)O7. Drug target is mTOR. Drug target pathway is PI3K/MTOR signaling.
The mutations of the cell line are ARFRP1, DIS3, MLL2, NTRK2, P2RY8, SPEN, TEK, TP53.
Drug Sensitivity: ?[/INST]Sensitive</s><s>[INST] <<SYS>>Think step by step and decide in a single word reflecting the drug sensitivity of the drug on the cell line: [Sensitive/Resistant]<</SYS>>
Drug and cell line mutations: 
The drug is WYE-125132. The drug SMILES structure is CNC(=O)NC1=CC=C(C=C1)C2=NC3=C(C=NN3C4CCC5(CC4)OCCO5)C(=N2)N6CC7CCC(C6)O7. Drug target is mTOR. Drug target pathway is PI3K/MTOR signaling.
The mutations of the cell line are ARID1A, CARD11, CIC, DDR1, MTOR, NBN, RB1, TP53.
Drug Sensitivity: ?[/INST]Sensiti