## Path Setup
Add the parent directory to the Python path so that the notebook can find the project modules.

In [1]:
import sys
import os

cwd = os.getcwd() # Current working directory
dirname = os.path.dirname(cwd) # Parent directory
print(cwd)
print(dirname)
sys.path.append(dirname)  # Add the parent directory to the Python path
print(sys.path)

/Users/rudi/Documents/GitHub/agent_evaluation/notebooks
/Users/rudi/Documents/GitHub/agent_evaluation
['/Users/rudi/Documents/GitHub/agent_evaluation/notebooks', '/Users/rudi/anaconda3/envs/dengue/lib/python311.zip', '/Users/rudi/anaconda3/envs/dengue/lib/python3.11', '/Users/rudi/anaconda3/envs/dengue/lib/python3.11/lib-dynload', '', '/Users/rudi/anaconda3/envs/dengue/lib/python3.11/site-packages', '/Users/rudi/Documents/GitHub/agent_evaluation']
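Appending unconditionally adds a duplicate entry each time the cell is re-run. The following is a minimal sketch of an idempotent variant, assuming the notebook lives one level below the repository root as in the output above. `add_parent_to_path` is a hypothetical helper name, not part of the project.

```python
import sys
from pathlib import Path


def add_parent_to_path(start=None):
    """Append the parent of `start` (default: current directory) to sys.path once."""
    base = Path(start) if start is not None else Path.cwd()
    parent = str(base.resolve().parent)
    # Only append when the entry is missing, so re-running the cell is harmless.
    if parent not in sys.path:
        sys.path.append(parent)
    return parent


parent_dir = add_parent_to_path()
print(parent_dir)
```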


## Get a Hierarchy


In [2]:
from agent_evaluation.hierarchy import Hierarchy
import json
import ndex2 
from ndex2.cx2 import RawCX2NetworkFactory

# Create NDEx2 python client
client = ndex2.client.Ndex2()

# Create CX2Network factory
factory = RawCX2NetworkFactory()

# Download BioGRID: Protein-Protein Interactions (SARS-CoV) from NDEx
# https://www.ndexbio.org/viewer/networks/669f30a3-cee6-11ea-aaef-0ac135e8bacf
# client_resp = client.get_network_as_cx2_stream('669f30a3-cee6-11ea-aaef-0ac135e8bacf')

# Dengue string interactome network c223d6db-b0e2-11ee-8a13-005056ae23aa
client_resp = client.get_network_as_cx2_stream('c223d6db-b0e2-11ee-8a13-005056ae23aa')

# Convert downloaded interactome network to CX2Network object
interactome = factory.get_cx2network(json.loads(client_resp.content))

# Dengue hierarchy
# https://www.ndexbio.org/viewer/networks/59bbb9f1-e029-11ee-9621-005056ae23aa
client_resp = client.get_network_as_cx2_stream('59bbb9f1-e029-11ee-9621-005056ae23aa')

# Convert downloaded hierarchy network to CX2Network object
hierarchy = factory.get_cx2network(json.loads(client_resp.content))

# Display information about the hierarchy network
print('Name: ' + hierarchy.get_name())
print('Number of nodes: ' + str(len(hierarchy.get_nodes())))
print('Number of edges: ' + str(len(hierarchy.get_edges())))

# Display information about the interactome network
print('Name: ' + interactome.get_name())
print('Number of nodes: ' + str(len(interactome.get_nodes())))
print('Number of edges: ' + str(len(interactome.get_edges())))


Name: Dengue model - hidef string 12.0 0.7 (GPT-4 annotated) - L2R
Number of nodes: 203
Number of edges: 249
Name: dengue string 12.0 0.7
Number of nodes: 1375
Number of edges: 2792


## Get Datasets

In [3]:
dengue_hierarchy = Hierarchy(hierarchy, interactome)
print(dengue_hierarchy.get_experiment_description())
datasets = dengue_hierarchy.get_datasets(member_attributes=["name", "DV3_24h-Mock_24h"],
                                         filter={"max_size": 6})[1:2]  # keep only the second dataset; other slices tried: [32:33], [1:33:31]
for dataset in datasets:
    print(dataset.data)

None
[{'name': 'SLC30A3'}, {'name': 'SLC30A9'}, {'name': 'SLC30A4'}, {'name': 'TMC6'}, {'name': 'SLC39A8'}, {'name': 'SLC30A1', 'DV3_24h-Mock_24h': -2.114594111}]
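Only one of the six proteins in this dataset carries a measurement. The sketch below separates measured from unmeasured proteins and labels the direction of change, using the records printed above. `split_by_measurement` and `direction` are hypothetical helper names.

```python
records = [
    {"name": "SLC30A3"}, {"name": "SLC30A9"}, {"name": "SLC30A4"},
    {"name": "TMC6"}, {"name": "SLC39A8"},
    {"name": "SLC30A1", "DV3_24h-Mock_24h": -2.114594111},
]


def split_by_measurement(rows, key="DV3_24h-Mock_24h"):
    """Separate rows that carry a value for `key` from those that do not."""
    measured = {row["name"]: row[key] for row in rows if key in row}
    unmeasured = [row["name"] for row in rows if key not in row]
    return measured, unmeasured


def direction(value):
    """Map a measurement to the wording used in the analyst prompt."""
    if value < 0:
        return "decreased abundance"
    if value > 0:
        return "increased abundance"
    return "unchanged"


measured, unmeasured = split_by_measurement(records)
for name, value in measured.items():
    print(f"{name}: {value} ({direction(value)})")
print("No measurement:", ", ".join(unmeasured))
```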


## Analyst Agents

In [4]:
from agent_evaluation.analyst import Analyst
from agent_evaluation.llm import OpenAI_LLM

gpt3_5 = OpenAI_LLM("gpt-3.5-turbo-1106")
gpt4 = OpenAI_LLM("gpt-4-0125-preview")

Model: gpt-3.5-turbo-1106, Temperature: 0, Max Tokens: 2048, Seed: 42
Model: gpt-4-0125-preview, Temperature: 0, Max Tokens: 2048, Seed: 42


In [5]:
# Analyst 1 > Jack (GPT-3.5-turbo-1106)

analyst_1_context = """
You are a helpful analyst of genomic, proteomic, and other biological data. 
"""

analyst_1_prompt_template = """ 
The provided proteomics "dataset" includes interacting proteins and the measurements of their differential abundance as a ratio between treated and non-treated samples, where the treatment is the infection of human cells with Dengue virus. 
Not all proteins in the dataset have differential abundance measurements.

The dataset has 2 columns with the following headers: name, DV3_24h-Mock_24h. 
The first column contains the protein names and the second column contains the abundance measurements.
Please note that abundance measurements <0 reflect a "decreased abundance" while measurements >0 indicate an "increased abundance".

Your task is to leverage this dataset to analyze a subset of interacting proteins that are defined as "proteins of interest".

First, determine what proteins of interest show a differential abundance recorded in the dataset. 
Then, based on this information and on the known functions of all other proteins of interest, 
I want you to generate a hypothesis describing the mechanisms that may contribute to the disease state 
and could potentially be targeted by drug therapies.

Your hypothesis should meet the following criteria:
1) Include one or more molecular mechanisms involving one or more proteins of interest
2) Be plausible - grounded in known molecular functions and interactions
3) Be novel - proposing mechanisms either not known or not known to be relevant to the experimental context
4) Be actionable - can be validated with relatively low-cost experimental techniques

When presenting your results, please adhere to the following guidelines:

- Avoid including any code.
- Do not describe the analytical steps you took.
- Do not merely list the proteins of interest, regardless of whether they show a differential abundance recorded in the dataset or not.
- Build your hypotheses taking into consideration the interplay among all proteins of interest, not only those that show a differential abundance in the dataset.

- Your output should consist solely of the identified proteins of interest with changed abundance levels, and the hypothesis you propose.

Here is the set of proteins of interest: 
{data}
"""

analyst_1 = Analyst(gpt3_5, analyst_1_context, analyst_1_prompt_template, "Jack", "The first analyst")
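The template ends with a `{data}` placeholder. How `Analyst` fills it is not shown in this notebook, so the following sketch assumes plain `str.format` with a JSON rendering of the dataset. The shortened template and the two-record dataset are for illustration only.

```python
import json

# Shortened stand-in for the full prompt template defined above.
template = """Here is the set of proteins of interest:
{data}
"""

data = [
    {"name": "SLC30A9"},
    {"name": "SLC30A1", "DV3_24h-Mock_24h": -2.114594111},
]

# Assumption: the placeholder is filled with str.format; Analyst may do this differently.
# Braces inside the substituted value are not re-parsed by format().
prompt = template.format(data=json.dumps(data, indent=2))
print(prompt)
```

One consequence of using `str.format` is that any literal braces added to the template itself would need to be doubled (`{{` and `}}`).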


In [6]:
# Analyst 2 > John (GPT-4-0125-preview)

analyst_2_context = analyst_1_context

analyst_2_prompt_template = analyst_1_prompt_template


analyst_2 = Analyst(gpt4, analyst_2_context, analyst_2_prompt_template, "John", "The second analyst")

## The TestPlan

In [7]:
from agent_evaluation.test import TestPlan

test_plan = TestPlan(analysts=[analyst_1, analyst_2], datasets=datasets)


## Run the Test

The OpenAI Python package must not be newer than 0.28.

- https://github.com/openai/openai-python

- https://github.com/openai/openai-python/discussions/742

If the Genai package is used, the OpenAI package must be 0.27.x.
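The version constraint can be checked before running the test. This is a minimal sketch using `importlib.metadata` from the standard library. `parse_version` and `is_at_most` are hypothetical helpers that compare only the major and minor components, which is an assumption about how strictly the 0.28 limit should be read.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.28.1' into (0, 28, 1); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_most(version, limit):
    """True if `version` is not newer than `limit`, comparing major and minor only."""
    return parse_version(version)[:2] <= parse_version(limit)[:2]


try:
    installed = metadata.version("openai")
    status = "ok" if is_at_most(installed, "0.28") else "too new"
    print(f"openai {installed}: {status}")
except metadata.PackageNotFoundError:
    print("openai is not installed")
```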

In [8]:
from agent_evaluation.test import Test

test = Test(test_plan)
test.run()

Generating hypothesis by Jack on [{'name': 'SLC30A3'}, {'name': 'SLC30A9'}, {'name': 'SLC30A4'}, {'name': 'TMC6'}, {'name': 'SLC39A8'}, {'name': 'SLC30A1', 'DV3_24h-Mock_24h': -2.114594111}]
Generating hypothesis by John on [{'name': 'SLC30A3'}, {'name': 'SLC30A9'}, {'name': 'SLC30A4'}, {'name': 'TMC6'}, {'name': 'SLC39A8'}, {'name': 'SLC30A1', 'DV3_24h-Mock_24h': -2.114594111}]


In [9]:
for hypothesis in test.hypotheses:
    print(f"{hypothesis.analyst.name} ({hypothesis.analyst.llm.model_name}):")
    print(hypothesis.description)
    print("---")

Jack (gpt-3.5-turbo-1106):
('The proteins of interest with differential abundance recorded in the dataset are as follows:\n\n1. SLC30A1 (decreased abundance)\n\nBased on the known functions of the proteins of interest and their interactions, a hypothesis can be generated to describe a potential mechanism contributing to the disease state and potential drug targets. \n\nHypothesis:\nThe differential abundance of SLC30A1, a zinc transporter, suggests a potential dysregulation of zinc homeostasis in response to Dengue virus infection. This dysregulation may lead to altered immune responses and inflammatory processes, contributing to the pathogenesis of Dengue fever. SLC30A1 is known to play a crucial role in regulating intracellular zinc levels, which in turn modulates the activity of zinc-dependent proteins involved in immune signaling and antiviral defense. The decreased abundance of SLC30A1 may lead to intracellular zinc depletion, impacting the function of zinc-dependent proteins invo

In [10]:
# Check the number of hypotheses generated (should be just 2, 1 by Jack and 1 by John)

len(test.hypotheses)

2

## Reviewers

In [11]:
from agent_evaluation.reviewer import Reviewer


# Reviewer 1 > James Watson (GPT-3.5-turbo-1106)

reviewer_1_context = "You are a full professor with extensive knowledge of molecular mechanisms in biology and human diseases"

reviewer_1_prompt_template = """
Starting from an experimental dataset and a list of proteins of interest, our analysts have generated 2 hypotheses 
that might explain the observed data upon infection of a human cell line with the Dengue virus.

Your task is to carefully review the 2 hypotheses provided, and choose the best one based on the following evaluation criteria:

1) Mechanistic - The hypothesis includes one or more molecular mechanisms involving one or more proteins of interest.
2) Plausible - The hypothesis is plausible and grounded in known molecular functions and interactions.
3) Novel - The hypothesis proposes mechanisms either not known or not known to be relevant to the experimental context.
4) Actionable - The hypothesis can be validated with relatively simple, low-cost experimental techniques.

You must execute your evaluation using only the information provided in the 2 hypotheses.
Assign a rating between 1 (poor) and 5 (excellent) to each evaluation criterion according to your judgement,
but don't include these ratings in your response; instead, average the ratings and calculate the "Overall Score".

When presenting your output, only include the following info:
1) The "Overall Score".
2) Which analyst's hypothesis you deem to be the best one ({analyst_a} or {analyst_b}).
3) What are the reasons that dictated your decision.
4) If the 2 hypotheses are of equivalent quality, don't make a choice and provide a brief explanation supporting your decision.

Here are the hypotheses:
{analyst_a}: {hypothesis_a}
{analyst_b}: {hypothesis_b}
"""

reviewer_1 = Reviewer(gpt3_5, reviewer_1_context, reviewer_1_prompt_template, "James Watson", "The first reviewer")
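The prompt asks the reviewer to average four ratings between 1 and 5 into an "Overall Score". The arithmetic can be sketched as follows, for example to cross-check a score a reviewer reports. `overall_score` is a hypothetical helper and the ratings are illustrative values, not results.

```python
from statistics import mean

CRITERIA = ("Mechanistic", "Plausible", "Novel", "Actionable")


def overall_score(ratings):
    """Mean of the four criterion ratings, each expected in the range 1..5."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    for criterion in CRITERIA:
        if not 1 <= ratings[criterion] <= 5:
            raise ValueError(f"{criterion} rating out of range: {ratings[criterion]}")
    return mean(ratings[c] for c in CRITERIA)


example = {"Mechanistic": 4, "Plausible": 4, "Novel": 4, "Actionable": 5}
print(overall_score(example))  # (4 + 4 + 4 + 5) / 4 = 4.25
```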


In [12]:
# Reviewer 2 > Francis Crick (GPT-4-0125-preview)

reviewer_2_context = reviewer_1_context
reviewer_2_prompt_template = reviewer_1_prompt_template

reviewer_2 = Reviewer(gpt4, reviewer_2_context, reviewer_2_prompt_template, "Francis Crick", "The second reviewer")

## The ReviewPlan

In [13]:
from agent_evaluation.review import ReviewPlan

review_plan = ReviewPlan(reviewers=[reviewer_1, reviewer_2], test=test)


## Run the Review

In [14]:
from agent_evaluation.review import Review

review = Review(review_plan)    
review.run()


Generating A-B comparison by James Watson...
Generating A-B comparison by Francis Crick...


In [15]:
for comparison in review.comparisons:
    print(f"{comparison.reviewer.name} ({comparison.reviewer.llm.model_name})")
    print(comparison.comment)
    print("----")

James Watson (gpt-3.5-turbo-1106)
("Based on the evaluation criteria, the overall score for each hypothesis is as follows:\n\nJack's Hypothesis:\n1) Mechanistic: 5\n2) Plausible: 5\n3) Novel: 5\n4) Actionable: 5\n\nJohn's Hypothesis:\n1) Mechanistic: 4\n2) Plausible: 4\n3) Novel: 4\n4) Actionable: 5\n\nBased on the evaluation, Jack's hypothesis scores slightly higher in all criteria compared to John's hypothesis. Both hypotheses are grounded in known molecular functions and interactions, propose novel mechanisms not previously known to be relevant to the experimental context, and are actionable with relatively low-cost experimental techniques. However, Jack's hypothesis provides a more detailed and focused mechanistic explanation, which contributes to its higher score in the evaluation criteria.\n\nTherefore, based on the evaluation, Jack's hypothesis is deemed to be the best one.", 'fp_ad2b9c6e11', 1449)
----
Francis Crick (gpt-4-0125-preview)
("Overall Score: 4.5\n\nBest Hypothesis: 

## Creating a mock analyst, mock hypothesis and comparing to a real one

To do: create a function that wraps up all of the steps below.
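One possible shape for such a wrapper is sketched here. It assumes the constructor signatures used elsewhere in this notebook (`Hypothesis(dataset, analyst, description)`, `TestPlan(analysts=..., datasets=...)`, `Test(plan)`, `ReviewPlan(reviewers=..., test=...)`, `Review(plan).run(...)`). `compare_with_mock` is a hypothetical name, and the classes are passed in as a dictionary so the function has no import-time dependency on `agent_evaluation`.

```python
def compare_with_mock(classes, dataset, real_hypothesis, mock_analyst,
                      mock_description, reviewers, **run_kwargs):
    """Build a mock hypothesis, pair it with a real one, and run the review.

    `classes` maps the names "Hypothesis", "TestPlan", "Test", "ReviewPlan"
    and "Review" to the corresponding classes.
    """
    mock_hypothesis = classes["Hypothesis"](dataset, mock_analyst, mock_description)
    plan = classes["TestPlan"](
        analysts=[mock_analyst, real_hypothesis.analyst], datasets=[dataset]
    )
    test = classes["Test"](plan)
    # Hypotheses are assigned directly, so Test.run() is never called here.
    test.hypotheses = [mock_hypothesis, real_hypothesis]
    review_plan = classes["ReviewPlan"](reviewers=reviewers, test=test)
    review = classes["Review"](review_plan)
    # Extra keyword arguments (e.g. comparison_order) are forwarded unchanged.
    review.run(**run_kwargs)
    return review
```

In this notebook the call would look roughly like `compare_with_mock(classes, datasets[0], test.hypotheses[1], mock_analyst, mock_description_1, [reviewer_1, reviewer_2])`, with `classes` built from the imported `Hypothesis`, `TestPlan`, `Test`, `ReviewPlan` and `Review`.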

In [16]:
# First, create a mock analyst

mock_context = "You are a clueless highschool student with biology 101 level education"
mock_prompt_template = analyst_1_prompt_template
mock_name = 'Beavis'
    
mock_analyst = Analyst(llm=None, context=mock_context, prompt_template=mock_prompt_template, name=mock_name, label="A dumb analyst")

print(f'Created mock analyst named "{mock_analyst.name}"')

Created mock analyst named "Beavis"


In [17]:
# Text of the mock analysis(es) generated by the mock analyst

#zinc
mock_description_1 = '''
The decreased abundance of SLC30A1, a zinc transporter, during Dengue virus infection does not appear to disrupt zinc homeostasis within infected cells. Despite the critical roles of zinc in cellular processes including immune function, oxidative stress response, and enzyme function, the reduction in SLC30A1 levels is unlikely to significantly impact intracellular zinc levels. This is due to the presence of numerous compensatory mechanisms and other zinc transporters (SLC30A3, SLC30A4, SLC30A9, and SLC39A8) that can efficiently maintain zinc homeostasis independent of SLC30A1 activity.
Moreover, recent findings suggest that Dengue virus may encode for alternative zinc-binding proteins that can substitute the function of downregulated human zinc transporters, thereby stabilizing zinc levels within the host cells. Additionally, the involvement of TMC6, while linked to cellular ion homeostasis, may play a negligible role in the context of zinc transport and homeostasis altered by Dengue virus infection. Experimental evidence indicates that TMC6’s activity is largely redundant in the presence of viral-encoded ion regulatory mechanisms.

Therefore, the decreased abundance of SLC30A1 during Dengue virus infection does not significantly affect zinc homeostasis, nor does it impair cellular antiviral responses. Rather, the viral strategies may include the upregulation of alternative pathways that compensate for or bypass the decreased activity of SLC30A1, thus negating any potential disruption in zinc transport and homeostasis mechanisms.

This hypothesis can be explored by:
1. Conducting comparative analyses of intracellular zinc levels in Dengue virus-infected cells versus mock-infected cells to determine the redundancy of SLC30A1 in maintaining zinc homeostasis.
2. Investigating the expression and functionality of viral proteins that might mimic or compensate for the roles of human zinc transporters in the context of Dengue virus infection.
3. Evaluating the role of TMC6 in Dengue virus-infected cells, with a focus on understanding its potential redundancy when viral mechanisms are active.
4. Performing studies to determine if Dengue virus replication and pathogenesis are unaffected by modifications in zinc supplementation or chelation, thereby demonstrating the virus’s ability to maintain ion homeostasis independently of host zinc transport systems.

These experiments could elucidate alternative mechanisms of Dengue virus pathogenesis and reveal viral adaptive strategies that circumvent disruptions in host cellular mechanisms, potentially providing new targets for therapeutic intervention that focus on viral rather than host processes.
'''

#SP110
mock_description_2 = '''
The increased abundance of SP110 in response to Dengue virus (DV) infection indicates a possible involvement of membrane vesicles in contributing to the disease state.\n\nSP110 is a plasma membrane protein involved in protein sorting and sensing of external stimuli. Upon Dengue infection, SP110 is segregated to cholesterol-rich, detergent-resistant microdomains (DRMs), also known as lipid rafts. The Dengue virus can attach itself to these rafts thanks to the interaction between its envelope protein E1 and cholesterol molecules and use the cell membrane as a highway to navigate the cell and spread itself to all subcellular structures, thus hijacking essential processes such as energy production, DNA replication and immune response.
Besides these effects, all these lipid rafts navigating the plasma membrane can also cause macromolecular traffic jams as well as enzymatic accidents that can further exacerbate the disease state.\n\nTo validate this hypothesis, we suggest the following low-cost experimental approaches:\n1. **DRMs Fractionation**: Using a variety of detergents and sucrose gradients, lipid rafts can be purified and subject to electron microscopy to verify whether Dengue virus is attached. \n2. **Cholesterol depletion**: By treating infected cells with methyl-β-cyclodextrin (MBCD) for 48h, we can chemically eliminate all cholesterol from the cell and thus preventing diffusion of the Dengue virus and blocking disease progression. depleting infected cells of all their cholesterol content, 
with and without the knockdown of SP110. \n3. **DNA Damage Assays**: Perform assays to assess DNA damage (e.g., comet assay, γ-H2AX foci formation) in DV-infected cells with altered expression of SP110. This would help to elucidate the role of SP110 in DNA damage response during DV infection.
'''

In [18]:
# Now, we generate a mock hypothesis object

from agent_evaluation.hypothesis import Hypothesis

dataset = datasets[0]  # the single dataset selected in the "Get Datasets" cell
mock_hypothesis_1 = Hypothesis(dataset, mock_analyst, mock_description_1)
# mock_hypothesis_2 = Hypothesis(dataset, mock_analyst, mock_description_2)

In [19]:
# Next, we make a mock test plan where we can specify what analysts and datasets we want to use

mock_test_plan = TestPlan(analysts=[analyst_2, mock_analyst], datasets=datasets)

In [20]:
# Then we make a mock Test instance and specify which hypotheses we want to use.
# To compare mock with John:

mock_test = Test(mock_test_plan)
mock_test.hypotheses=[mock_hypothesis_1, test.hypotheses[1]]

# To compare mock with Jack:
# mock_test.hypotheses=[mock_hypothesis_1, test.hypotheses[0]]

In [21]:
# Create a mock review plan

mock_review_plan = ReviewPlan(reviewers=[reviewer_1, reviewer_2], test=mock_test)

In [22]:
# Create a mock review instance

mock_review = Review(mock_review_plan)

In [23]:
# mock_review.run() accepts optional 'comparison_order' ('AB', 'BA', 'both') and 'reverse' (bool) arguments

mock_review.run()

Generating A-B comparison by James Watson...
Generating A-B comparison by Francis Crick...
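The `comparison_order` values suggest that a pair of hypotheses can be presented as A then B, B then A, or both, presumably to check whether a reviewer's verdict depends on presentation order. A minimal sketch of that idea follows, independent of the `Review` class. `presentation_orders` and `is_position_consistent` are hypothetical names and do not reflect how `Review.run` is actually implemented.

```python
def presentation_orders(first, second, comparison_order="AB"):
    """Return the (a, b) pairs to show a reviewer for the given order setting."""
    orders = {
        "AB": [(first, second)],
        "BA": [(second, first)],
        "both": [(first, second), (second, first)],
    }
    if comparison_order not in orders:
        raise ValueError(f"unknown comparison_order: {comparison_order!r}")
    return orders[comparison_order]


def is_position_consistent(winner_ab, winner_ba):
    """True when the same analyst wins regardless of presentation order."""
    return winner_ab == winner_ba


print(presentation_orders("Beavis", "John", "both"))
print(is_position_consistent("Beavis", "Beavis"))
```

A reviewer that picks whichever hypothesis is shown first would fail the consistency check when both orders are run.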


In [24]:
for comparison in mock_review.comparisons:
    print(f"{comparison.reviewer.name} ({comparison.reviewer.llm.model_name})")
    print(comparison.comment)
    print("----")

James Watson (gpt-3.5-turbo-1106)
("Based on the evaluation criteria provided, the overall score for each hypothesis is as follows:\n\nBeavis:\n1) Mechanistic: 4\n2) Plausible: 5\n3) Novel: 4\n4) Actionable: 5\n\nJohn:\n1) Mechanistic: 3\n2) Plausible: 4\n3) Novel: 3\n4) Actionable: 4\n\nBased on the evaluation, the hypothesis presented by Beavis is deemed to be the best one. This decision is based on the fact that Beavis' hypothesis provides a more comprehensive and plausible explanation for the observed data. It takes into account the compensatory mechanisms and viral strategies that may stabilize zinc levels within host cells, as well as the potential redundancy of TMC6 in the context of zinc transport and homeostasis altered by Dengue virus infection. Additionally, Beavis' hypothesis outlines a set of experiments that can be conducted to validate the proposed mechanisms, which are relatively simple and low-cost.\n\nIf the 2 hypotheses were of equivalent quality, a brief explanation

In [25]:
mock_review.run(reverse=True)

In [26]:
for comparison in mock_review.comparisons:
    print(f"{comparison.reviewer.name} ({comparison.reviewer.llm.model_name})")
    print(comparison.comment)
    print("----")

James Watson (gpt-3.5-turbo-1106)
("Based on the evaluation criteria provided, the overall score for each hypothesis is as follows:\n\nBeavis:\n1) Mechanistic: 4\n2) Plausible: 5\n3) Novel: 4\n4) Actionable: 5\n\nJohn:\n1) Mechanistic: 3\n2) Plausible: 4\n3) Novel: 3\n4) Actionable: 4\n\nBased on the evaluation, the hypothesis presented by Beavis is deemed to be the best one. This decision is based on the fact that Beavis' hypothesis provides a more comprehensive and plausible explanation for the observed data. It takes into account the compensatory mechanisms and viral strategies that may stabilize zinc levels within host cells, as well as the potential redundancy of TMC6 in the context of zinc transport and homeostasis altered by Dengue virus infection. Additionally, Beavis' hypothesis outlines a set of experiments that can be conducted to validate the proposed mechanisms, which are relatively simple and low-cost.\n\nIf the 2 hypotheses were of equivalent quality, a brief explanation

In [27]:
mock_review.run(comparison_order='BA')

Generating B-A comparison by James Watson...
Generating B-A comparison by Francis Crick...


In [28]:
for comparison in mock_review.comparisons:
    print(f"{comparison.reviewer.name} ({comparison.reviewer.llm.model_name})")
    print(comparison.comment)
    print("----")

James Watson (gpt-3.5-turbo-1106)
("Based on the evaluation criteria provided, the overall score for each hypothesis is as follows:\n\nBeavis:\n1) Mechanistic: 4\n2) Plausible: 5\n3) Novel: 4\n4) Actionable: 5\n\nJohn:\n1) Mechanistic: 3\n2) Plausible: 4\n3) Novel: 3\n4) Actionable: 4\n\nBased on the evaluation, the hypothesis presented by Beavis is deemed to be the best one. This decision is based on the fact that Beavis' hypothesis provides a more comprehensive and plausible explanation for the observed data. It takes into account the compensatory mechanisms and viral strategies that may stabilize zinc levels within host cells, as well as the potential redundancy of TMC6 in the context of zinc transport and homeostasis altered by Dengue virus infection. Additionally, Beavis' hypothesis outlines a set of experiments that can be conducted to validate the proposed mechanisms, which are relatively simple and low-cost.\n\nIf the 2 hypotheses were of equivalent quality, a brief explanation
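The comments are printed as Python tuple reprs with literal `\n` sequences, which suggests that `comparison.comment` holds a tuple whose first element is the review text. The other two elements look like a fingerprint string and an integer, but their meaning is not stated in this notebook. The sketch below prints only the text part so that line breaks render normally. `comment_text` is a hypothetical helper and the example tuple is illustrative, not real output.

```python
def comment_text(comment):
    """Return the text part of a comment that may be a plain str or a tuple."""
    if isinstance(comment, tuple):
        return comment[0]
    return comment


# Illustrative tuple with the same shape as the printed comments.
example = ("First line\n\nSecond line", "fp_example", 123)
print(comment_text(example))
```

In the printing loops above, `print(comment_text(comparison.comment))` would replace `print(comparison.comment)`.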