# BEL Graph RAG Example

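This notebook demonstrates retrieval-augmented generation (RAG) with BEL (Biological Expression Language) graphs: existing BEL knowledge is combined with experimental observations in a prompt, and the result is a hypothesis expressed as a BEL graph.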

In [None]:
import sys

# Make the textToKnowledgeGraph package importable. This assumes the notebook
# runs from a directory one level below the repository root.
sys.path.append('..')

from textToKnowledgeGraph.main import process_text_to_bel
from textToKnowledgeGraph.convert_to_cx2 import convert_to_cx2
from textToKnowledgeGraph.transform_bel_statements import merge_cx2_networks

## Setup Prompt Template

We'll create a template that combines experimental observations with BEL knowledge for hypothesis generation.

In [None]:
PROMPT_TEMPLATE = """
You are an expert biologist tasked with analyzing experimental data and proposing mechanistic hypotheses.

EXPERIMENT DESCRIPTION:
{experiment}

MEASUREMENTS:
{measurements}

BEL FORMAT GUIDELINES:
BEL is a language for representing biological knowledge in a computable form. Key aspects:
- Entities are represented with functions like p() for proteins, a() for abundances, bp() for biological processes
- Relationships between entities use operators like increases, decreases, directlyIncreases, association
- Entities must use standard namespaces (HGNC for human genes, CHEBI for chemicals, etc.)

EXISTING KNOWLEDGE (Optional):
{knowledge_graph}

TASK:
Propose a hypothesis explaining the experimental observations as a BEL graph. Your hypothesis should:
1. Be consistent with the experimental data
2. Use proper BEL syntax and namespaces
3. Focus on mechanistic relationships
4. Incorporate existing knowledge when provided

Output your hypothesis as a list of BEL statements that form a connected graph.
"""
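
Because the template is filled with `str.format`, a missing or misspelled placeholder only surfaces as a `KeyError` at call time. The following is a minimal sketch of a pre-flight check using the standard library's `string.Formatter`. `DEMO_TEMPLATE`, `template_fields`, and `missing_fields` are hypothetical names introduced here for illustration, and `DEMO_TEMPLATE` is a shortened stand-in for the full template above.

```python
from string import Formatter

# Shortened stand-in for PROMPT_TEMPLATE (hypothetical, for illustration only)
DEMO_TEMPLATE = """
EXPERIMENT DESCRIPTION:
{experiment}

MEASUREMENTS:
{measurements}

EXISTING KNOWLEDGE (Optional):
{knowledge_graph}
"""


def template_fields(template):
    """Return the set of named placeholders used in a format string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_fields(template, values):
    """List placeholders in the template that have no entry in `values`."""
    return sorted(template_fields(template) - set(values))


values = {
    "experiment": "H2O2 treatment of endothelial cells",
    "measurements": "Caspase-3 activity increased 3-fold",
    "knowledge_graph": "No prior knowledge provided",
}

# Only format the prompt once every placeholder has a value
if not missing_fields(DEMO_TEMPLATE, values):
    prompt = DEMO_TEMPLATE.format(**values)
    print(prompt)
```

The same check could be run against `PROMPT_TEMPLATE` before each call to catch an incomplete experiment dictionary early.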

## Example Experiment

Let's use an example experiment studying the effects of oxidative stress on cell death pathways.

In [None]:
example_experiment = {
    "experiment": """
    Human endothelial cells were treated with hydrogen peroxide (H2O2) at varying concentrations 
    (0, 100, 200, 500 μM) for 24 hours. Cell viability, apoptosis markers, and oxidative stress 
    indicators were measured.
    """,
    
    "measurements": """
    1. Cell viability decreased dose-dependently with H2O2 treatment
    2. Caspase-3 activity increased 3-fold at 200 μM H2O2
    3. Intracellular ROS levels increased 5-fold at 200 μM H2O2
    4. Bcl-2 protein levels decreased by 50% at 200 μM H2O2
    5. Cytochrome c was detected in cytoplasmic fractions at 200 μM H2O2
    """,
    
    "knowledge_graph": """
    p(HGNC:BCL2) decreases bp(GOBP:"apoptotic process")
    p(HGNC:CASP3) increases bp(GOBP:"apoptotic process")
    a(CHEBI:"hydrogen peroxide") increases bp(GOBP:"oxidative stress")
    """
}
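
Each line in `knowledge_graph` follows the pattern `subject relation object`. As a rough illustration of that structure, the sketch below splits such lines into triples with a regular expression. `parse_bel_lines` and `RELATIONS` are hypothetical names, only a handful of BEL relations are recognised, and nested statements or relation keywords inside quoted names are not handled. It is not a full BEL parser and is not how `textToKnowledgeGraph` processes statements.

```python
import re

# Relations recognised by this sketch (a small subset of BEL relations)
RELATIONS = (
    "directlyIncreases",
    "directlyDecreases",
    "increases",
    "decreases",
    "association",
)

# subject <relation> object, where subject and object are BEL terms
_STATEMENT = re.compile(
    r"^(?P<subject>.+?)\s+(?P<relation>" + "|".join(RELATIONS) + r")\s+(?P<object>.+)$"
)


def parse_bel_lines(text):
    """Split simple one-line BEL statements into (subject, relation, object) triples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = _STATEMENT.match(line)
        if match:  # lines that do not fit the simple pattern are skipped
            triples.append((match["subject"], match["relation"], match["object"]))
    return triples


knowledge = """
    p(HGNC:BCL2) decreases bp(GOBP:"apoptotic process")
    p(HGNC:CASP3) increases bp(GOBP:"apoptotic process")
    a(CHEBI:"hydrogen peroxide") increases bp(GOBP:"oxidative stress")
"""

triples = parse_bel_lines(knowledge)
for subject, relation, obj in triples:
    print(f"{subject} --{relation}--> {obj}")
```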

## Query Function

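The function below fills the prompt template, passes the prompt to `process_text_to_bel`, converts the resulting BEL statements with `convert_to_cx2`, and returns the path of the saved CX2 file.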

In [None]:
def generate_hypothesis(experiment_desc, measurements, knowledge_graph=None,
                        output_file="hypothesis_graph.cx2"):
    """Generate a hypothesis as a BEL graph based on experimental observations.

    Args:
        experiment_desc (str): Description of the experimental setup
        measurements (str): Observed experimental measurements
        knowledge_graph (str, optional): Existing knowledge in BEL format
        output_file (str, optional): Path of the CX2 file to write

    Returns:
        str: Path to the saved CX2 file containing the hypothesis graph
    """
    # Fill the prompt; fall back to a placeholder when no prior knowledge is given
    prompt = PROMPT_TEMPLATE.format(
        experiment=experiment_desc,
        measurements=measurements,
        knowledge_graph=knowledge_graph or "No prior knowledge provided"
    )

    # Process text to BEL using the core library function
    bel_statements = process_text_to_bel(prompt)

    # Convert to CX2 format and write it to output_file
    convert_to_cx2(bel_statements, output_file)

    return output_file
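
Since `generate_hypothesis` returns only a file path, a quick look inside the file helps confirm that a graph was actually produced. The sketch below counts nodes and edges, assuming the file follows the CX2 layout of a JSON list of aspect objects with `nodes` and `edges` entries. `summarize_cx2` is a hypothetical helper, and the demo file is hand-written for illustration rather than produced by `convert_to_cx2`.

```python
import json
import os
import tempfile


def summarize_cx2(path):
    """Count nodes and edges in a CX2 file (assumed: a JSON list of aspect objects)."""
    with open(path, "r", encoding="utf-8") as f:
        aspects = json.load(f)
    counts = {"nodes": 0, "edges": 0}
    for aspect in aspects:
        for key in counts:
            if key in aspect:
                counts[key] += len(aspect[key])
    return counts


# Tiny hand-written CX2 document used only for demonstration
demo = [
    {"CXVersion": "2.0", "hasFragments": False},
    {"nodes": [
        {"id": 0, "v": {"name": "p(HGNC:BCL2)"}},
        {"id": 1, "v": {"name": 'bp(GOBP:"apoptotic process")'}},
    ]},
    {"edges": [{"id": 0, "s": 0, "t": 1, "v": {"interaction": "decreases"}}]},
    {"status": [{"success": True}]},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.cx2")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump(demo, f)
    summary = summarize_cx2(demo_path)

print(summary)
```

An empty result (zero nodes and zero edges) would suggest that no BEL statements were extracted from the prompt.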

## Run Example Query

In [None]:
# Generate hypothesis for our example experiment
hypothesis_file = generate_hypothesis(
    example_experiment["experiment"],
    example_experiment["measurements"],
    example_experiment["knowledge_graph"]
)

## Literature Integration

Define some relevant papers to incorporate into our knowledge graph.

In [None]:
relevant_papers = {
    "paper1": {
        "title": "Hydrogen peroxide-induced cell death in cultured human endothelial cells",
        "citation": "Cell Death Differ. 2000 Apr;7(4):456-64",
        "pmcid": "PMC3898398"
    },
    "paper2": {
        "title": "Bcl-2 proteins in the control of apoptosis: from mechanistic insights to therapeutic opportunities",
        "citation": "Biochem Soc Trans. 2018 Oct 19;46(5):1147-1158",
        "pmcid": "PMC6177401"
    },
    "paper3": {
        "title": "The role of cytochrome c in caspase activation in Drosophila melanogaster cells",
        "citation": "J Cell Biol. 2004 Mar 15;164(6):1035-44",
        "pmcid": "PMC7324065"
    }
}

## Merge Knowledge Graphs

Define a helper that merges multiple CX2 knowledge graphs into a single file.

In [None]:
def merge_knowledge_graphs(graph_files):
    """Merge multiple CX2 knowledge graphs into one.
    
    Args:
        graph_files (list): List of paths to CX2 files to merge
        
    Returns:
        str: Path to merged CX2 file
    """
    output_file = "merged_knowledge.cx2"
    merge_cx2_networks(graph_files, output_file)
    return output_file
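
To make the idea of merging concrete, the sketch below unions several graphs represented as lists of `(subject, relation, object)` triples and drops exact duplicates. This is a conceptual illustration only: `merge_triples` is a hypothetical function, and nothing here describes how `merge_cx2_networks` is actually implemented.

```python
def merge_triples(*graphs):
    """Union several lists of (subject, relation, object) triples, keeping first-seen order."""
    seen = set()
    merged = []
    for graph in graphs:
        for triple in graph:
            if triple not in seen:  # skip statements already contributed by an earlier graph
                seen.add(triple)
                merged.append(triple)
    return merged


graph_a = [
    ("p(HGNC:BCL2)", "decreases", 'bp(GOBP:"apoptotic process")'),
    ("p(HGNC:CASP3)", "increases", 'bp(GOBP:"apoptotic process")'),
]
graph_b = [
    ("p(HGNC:CASP3)", "increases", 'bp(GOBP:"apoptotic process")'),
    ('a(CHEBI:"hydrogen peroxide")', "increases", 'bp(GOBP:"oxidative stress")'),
]

merged = merge_triples(graph_a, graph_b)
for triple in merged:
    print(" ".join(triple))
```

Shared terms such as `bp(GOBP:"apoptotic process")` are what connect statements from different sources into one graph.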

## Process Papers and Update Hypothesis

In [None]:
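# Process each paper to generate knowledge graphs.
# Note: here process_text_to_bel receives a PMCID plus an output_file argument,
# whereas generate_hypothesis passes it raw prompt text. Check which call form
# the installed version of textToKnowledgeGraph supports.
paper_graphs = []
for paper_id, paper in relevant_papers.items():
    # Process paper using textToKnowledgeGraph
    output_file = f"{paper_id}_graph.cx2"
    process_text_to_bel(paper["pmcid"], output_file=output_file)
    paper_graphs.append(output_file)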

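# Merge all paper knowledge graphs
merged_knowledge = merge_knowledge_graphs(paper_graphs)

# Generate a new hypothesis incorporating literature knowledge.
# Note: the merged file is in CX2 format (JSON), so the raw file text, not a
# list of BEL statements, is what ends up in the prompt's knowledge section.
with open(merged_knowledge, 'r', encoding='utf-8') as f:
    literature_knowledge = f.read()

# Note: this call writes to the same "hypothesis_graph.cx2" path as the earlier
# example query and overwrites that file.
final_hypothesis = generate_hypothesis(
    example_experiment["experiment"],
    example_experiment["measurements"],
    literature_knowledge
)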
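
The prompt template describes the existing knowledge as BEL, while the cell above passes the merged CX2 file as raw text. One option is to flatten the CX2 content back into one statement per line first. The sketch below assumes that node names are stored under a `name` attribute and relations under an `interaction` attribute, which may not match the files written by `textToKnowledgeGraph`. `cx2_to_bel_lines` and both attribute keys are hypothetical.

```python
import json


def cx2_to_bel_lines(aspects, name_key="name", relation_key="interaction"):
    """Flatten parsed CX2 aspects into 'source relation target' lines."""
    names = {}
    edges = []
    for aspect in aspects:
        for node in aspect.get("nodes", []):
            # Fall back to the numeric id when a node has no name attribute
            names[node["id"]] = node.get("v", {}).get(name_key, str(node["id"]))
        edges.extend(aspect.get("edges", []))

    lines = []
    for edge in edges:
        relation = edge.get("v", {}).get(relation_key)
        if relation is None:  # skip edges without a recorded relation
            continue
        lines.append(f"{names[edge['s']]} {relation} {names[edge['t']]}")
    return lines


# Hand-written CX2 text used only for demonstration
demo_cx2_text = json.dumps([
    {"CXVersion": "2.0", "hasFragments": False},
    {"nodes": [
        {"id": 0, "v": {"name": "p(HGNC:BCL2)"}},
        {"id": 1, "v": {"name": 'bp(GOBP:"apoptotic process")'}},
    ]},
    {"edges": [{"id": 0, "s": 0, "t": 1, "v": {"interaction": "decreases"}}]},
])

bel_lines = cx2_to_bel_lines(json.loads(demo_cx2_text))
literature_bel = "\n".join(bel_lines)
print(literature_bel)
```

The resulting string could then be passed as the `knowledge_graph` argument in place of the raw file contents, which also keeps the prompt considerably shorter.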