# BEL Graph RAG Example for Paper Use Cases



In [2]:
# %pip install ndex2 langchain

In [1]:
%pip install texttoknowledgegraph==0.4.0 -q

Note: you may need to restart the kernel to use updated packages.


In [2]:
import json
import os

from dotenv import load_dotenv
from ndex2 import Ndex2
from ndex2.cx2 import CX2Network, RawCX2NetworkFactory

# Load credentials from a local .env file into the environment
load_dotenv()



# Read the OpenAI API key and the NDEx account and password from environment variables
OPENAI_API_KEY   = os.getenv("OPENAI_API_KEY")
NDEX_ACCOUNT     = os.getenv("NDEX_ACCOUNT")
NDEX_PASSWORD    = os.getenv("NDEX_PASSWORD")
assert all([OPENAI_API_KEY, NDEX_ACCOUNT, NDEX_PASSWORD]), \
    "Missing credentials: set OPENAI_API_KEY, NDEX_ACCOUNT and NDEX_PASSWORD"

# Connect to NDEx using the provided credentials
ndex_client = Ndex2(username=NDEX_ACCOUNT, password=NDEX_PASSWORD)



## Base Functions and Prompt


In [3]:
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)

def query_llm(prompt: str) -> str:
    """
    Send a single-turn prompt to OpenAI's gpt-4o model and return the reply text.
    Returns an empty string if the request fails.
    """
    try:
        completion = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "user", "content": prompt}
            ],
            temperature=0.7,
            max_tokens=1000
        )
        return completion.choices[0].message.content.strip()
    except Exception as e:
        print(f"Error querying OpenAI: {str(e)}")
        return ""


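We try two prompt variants: one that restricts the LLM to the knowledge graph only, and one that lets it combine the knowledge graph with its own knowledge of biology. To switch between them, swap the corresponding note into `PROMPT_TEMPLATE`: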

- IMPORTANT NOTE: ONLY USE THE INFORMATION PROVIDED IN THE KNOWLEDGE GRAPH TO ANSWER THE QUESTION. DO NOT MAKE UP ANY INFORMATION OR USE YOUR OWN KNOWLEDGE.

- IMPORTANT NOTE: YOU CAN MAKE USE OF YOUR KNOWLEDGE OF BIOLOGY AND THE PROVIDED KNOWLEDGE GRAPH TO ANSWER THE QUESTION.
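
One way to switch between the two variants without editing the prompt by hand is to keep both notes as constants and substitute the chosen one into the template. The sketch below is a minimal illustration of that idea. The names `KG_ONLY_NOTE`, `KG_PLUS_LLM_NOTE`, `NOTE_SLOT`, `BASE_TEMPLATE`, and `build_prompt_template` are hypothetical and are not part of the notebook.

```python
# Hypothetical helper (not part of the notebook): build either prompt variant.
KG_ONLY_NOTE = (
    "IMPORTANT NOTE: ONLY USE THE INFORMATION PROVIDED IN THE KNOWLEDGE GRAPH "
    "TO ANSWER THE QUESTION. DO NOT MAKE UP ANY INFORMATION OR USE YOUR OWN KNOWLEDGE."
)
KG_PLUS_LLM_NOTE = (
    "IMPORTANT NOTE: YOU CAN MAKE USE OF YOUR KNOWLEDGE OF BIOLOGY AND THE "
    "PROVIDED KNOWLEDGE GRAPH TO ANSWER THE QUESTION."
)

# Sentinel replaced by one of the notes; chosen so it cannot clash with
# the {geneset} and {knowledge_graph} placeholders used by str.format().
NOTE_SLOT = "<<NOTE>>"

BASE_TEMPLATE = """
You are playing the role of an expert cancer biologist.

QUESTION: How does metabolism affect dna damage response?

<<NOTE>>

TASK:
1. Review the following gene/protein set and the provided knowledge graph.
2. Answer the question under QUESTION based on the provided knowledge graph.
3. Provide your final answer as a paragraph summary

Genes: {geneset}

{knowledge_graph}
"""


def build_prompt_template(use_llm_knowledge: bool = False) -> str:
    """Return the template with the note for the requested variant filled in."""
    note = KG_PLUS_LLM_NOTE if use_llm_knowledge else KG_ONLY_NOTE
    return BASE_TEMPLATE.replace(NOTE_SLOT, note)
```

The result still contains the `{geneset}` and `{knowledge_graph}` placeholders, so either variant can be passed wherever `PROMPT_TEMPLATE` is used.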


In [11]:
# Base prompt for LLM query
PROMPT_TEMPLATE = """
You are playing the role of an expert cancer biologist.

QUESTION: How does metabolism affect dna damage response?

IMPORTANT NOTE: ONLY USE THE INFORMATION PROVIDED IN THE KNOWLEDGE GRAPH TO ANSWER THE QUESTION. DO NOT MAKE UP ANY INFORMATION OR USE YOUR OWN KNOWLEDGE.

TASK:
1. Review the following gene/protein set and the provided knowledge graph.
2. Answer the question under QUESTION based on the provided knowledge graph.
3. Provide your final answer as a paragraph summary

Genes: {geneset}

{knowledge_graph}
"""


In [5]:
# Run a graph-RAG query against an NDEx network and return the LLM response

from ndex2 import create_nice_cx_from_raw_cx
from typing import Callable, Union, List

def graph_rag_query(
    geneset: Union[List[str], str],
    ndex_id: str,
    prompt_template: str,
    ndex_client,
    llm_query_fn: Callable[[str], str],
    search_depth: int = 1
) -> str:
    """
    Perform a graph-RAG query: pull the neighborhood of the gene set from NDEx
    (search_depth hops, 1 by default), extract BEL context, fill the prompt,
    and query the LLM.

    Inputs:
      geneset         – either a list of HGNC symbols or a whitespace-delimited string
      ndex_id         – NDEx network UUID
      prompt_template – string containing placeholders {geneset} and {knowledge_graph}
      ndex_client     – an instantiated ndex2.client.Ndex2 object
      llm_query_fn    – function that takes a single string prompt and returns the LLM’s response
      search_depth    – how many hops out to pull (default=1)

    Returns:
      The raw response string from the LLM.
    """
    # Normalize gene list to Python list
    if isinstance(geneset, str):
        gene_list = geneset.split()
    else:
        gene_list = geneset

    # Build the semicolon-delimited search string for NDEx
    search_string = ";".join(gene_list)

    # 1) Fetch the neighborhood as raw CX JSON (the input create_nice_cx_from_raw_cx expects)
    raw_cx2 = ndex_client.get_neighborhood(
        ndex_id,
        search_string=search_string,
        search_depth=search_depth
    )

    # 2) Wrap in the “nice” CX helper
    nice_net = create_nice_cx_from_raw_cx(raw_cx2)

    # 3) Extract BEL expressions from every edge, skipping edges that have none
    bel_lines = []
    for edge_id, edge_obj in nice_net.get_edges():
        bel_stmt = nice_net.get_edge_attribute_value(edge_obj, "bel_expression")
        if bel_stmt:
            bel_lines.append(str(bel_stmt))
    knowledge_graph = "\n".join(bel_lines)

    # 4) Fill in the prompt template
    formatted_prompt = prompt_template.format(
        geneset=" ".join(gene_list),
        knowledge_graph=knowledge_graph
    )

    # 5) Call the LLM and return its response
    return llm_query_fn(formatted_prompt)


# ─── Example Usage ─────────────────────────────────────────────────────────────

# response = graph_rag_query(
#     geneset=["SIRT1", "PARP1", "TP53"],
#     ndex_id=BASE_KG_UUID,
#     prompt_template=PROMPT_TEMPLATE,
#     ndex_client=ndex_client,
#     llm_query_fn=query_llm
# )
# print(response)
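
A neighborhood may contain edges without a `bel_expression` attribute, or several edges that carry the same statement. Cleaning the list before joining it keeps the prompt compact. The helper below is a sketch of such a cleanup step. `clean_bel_lines` is a hypothetical name, it is not called by `graph_rag_query` as written, and the example statement is made up for illustration rather than taken from either network.

```python
from typing import Iterable, List, Optional


def clean_bel_lines(lines: Iterable[Optional[str]]) -> List[str]:
    """Drop missing or empty statements and exact duplicates, preserving order."""
    seen = set()
    cleaned = []
    for line in lines:
        if line is None:
            continue
        text = str(line).strip()
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


# Illustrative input only: one statement, a missing value, a padded duplicate, an empty string
example = [
    "p(HGNC:PARP1) decreases act(p(HGNC:TP53))",
    None,
    " p(HGNC:PARP1) decreases act(p(HGNC:TP53)) ",
    "",
]
knowledge_graph = "\n".join(clean_bel_lines(example))
print(knowledge_graph)
```

Deduplicating on the exact string is a deliberately conservative choice: statements that differ only in evidence or annotation are attributes of the edge, not part of the BEL text, so they collapse to one line in the prompt.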

### LLM response using only its knowledge

In [6]:
response = query_llm("How does metabolism affect dna damage response?")
print(response)

Metabolism plays a significant role in the DNA damage response (DDR) by influencing various cellular processes that are crucial for maintaining genomic integrity. Here are some ways in which metabolism affects the DNA damage response:

1. **Energy Supply**: The DDR is an energy-intensive process. Metabolic pathways, such as glycolysis and oxidative phosphorylation, provide the necessary ATP to fuel the repair processes. Energy is required for the activation of repair enzymes, chromatin remodeling, and the synthesis of nucleotides for DNA repair.

2. **Redox Balance**: Metabolic activities generate reactive oxygen species (ROS) as byproducts, which can cause oxidative DNA damage. A balanced redox state, maintained by antioxidants produced in metabolic pathways, is crucial to minimizing DNA damage. Furthermore, certain metabolic pathways can modulate the production of ROS and influence the cell’s ability to cope with oxidative stress.

3. **Nucleotide Synthesis**: Metabolism is directly 

## Run GraphRAG query on first paper

In [7]:
# Download the network of the first paper: pmid10436023

BASE_KG_UUID = "03290389-567f-11f0-a218-005056ae3c32"
base_cx2 = ndex_client.get_network_as_cx2_stream(BASE_KG_UUID).json()
with open("base_kg.cx2", "w") as f:
    json.dump(base_cx2, f)
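
Before querying, it can help to check what was downloaded. Assuming the CX2 JSON is a list of single-aspect dictionaries (for example `{"nodes": [...]}` and `{"edges": [...]}`), the sketch below counts nodes and edges. `count_cx2_elements` and the toy document are hypothetical and only illustrate the expected shape; with the notebook's data you would pass `base_cx2` instead.

```python
from typing import Any, Dict, List


def count_cx2_elements(cx2_doc: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count entries in the 'nodes' and 'edges' aspects of a CX2 document."""
    counts = {"nodes": 0, "edges": 0}
    for aspect in cx2_doc:
        for key in counts:
            if key in aspect:
                counts[key] += len(aspect[key])
    return counts


# Toy CX2-like document (assumed shape, not real data)
toy_cx2 = [
    {"CXVersion": "2.0", "hasFragments": False},
    {"nodes": [{"id": 0, "v": {"name": "SIRT1"}}, {"id": 1, "v": {"name": "PARP1"}}]},
    {"edges": [{"id": 0, "s": 0, "t": 1, "v": {"interaction": "decreases"}}]},
]
print(count_cx2_elements(toy_cx2))
```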

In [8]:
graphrag_res = graph_rag_query(
    geneset=["SIRT1", "NAMPT", "PARP1", "TP53", "BRCA1", "CDK2"],
    ndex_id=BASE_KG_UUID,
    prompt_template=PROMPT_TEMPLATE,
    ndex_client=ndex_client,
    llm_query_fn=query_llm
)

print(graphrag_res)

The knowledge graph highlights the complex interplay between metabolism-related proteins and the DNA damage response, particularly focusing on the roles of SIRT1, NAMPT, PARP1, TP53, BRCA1, and CDK2. TP53, commonly known as p53, is a critical tumor suppressor protein involved in DNA damage response, cell cycle arrest, and apoptosis. Its activity is modulated by post-translational modifications such as phosphorylation and acetylation, mediated by various factors including EP300 and KAT2B, which enhance TP53's ability to bind DNA and activate transcription of target genes like CDKN1A and GADD45A. TP53's regulation involves MDM2, which negatively regulates TP53 by promoting its degradation, while CDKN2A can disrupt the MDM2-TP53 interaction, enhancing TP53 stability. PARP1, another key player in DNA repair, negatively affects TP53 activity, suggesting a balance between DNA repair and cell cycle regulation. BRCA1 supports TP53 activation, further linking DNA repair pathways with tumor supp

### LLM response using only the knowledge graph


The metabolism-related gene/protein set provided includes SIRT1, NAMPT, PARP1, TP53, BRCA1, and CDK2. According to the knowledge graph, TP53 plays a central role in the DNA damage response by influencing various cellular processes. TP53 increases the transcription of CDKN1A, PCNA, and GADD45A, which are important for cell cycle regulation and DNA repair. The activity of TP53 is modulated by phosphorylation at different serine residues, which influences its ability to form complexes and bind DNA. BRCA1 enhances the activity of TP53, promoting its role in DNA repair. PARP1, another protein in the set, directly decreases the activity of TP53, indicating a potential regulatory interaction between metabolism and the DNA damage response through the modulation of TP53 activity. CDK2, while not directly linked to TP53 in the knowledge graph, influences the phosphorylation of other proteins, which may indirectly affect DNA repair processes. Overall, the interplay between these proteins suggests that metabolic factors can impact the DNA damage response through modulation of TP53 activity, which is a key regulator of cell cycle arrest and apoptosis in response to DNA damage.

### LLM response using knowledge graph and LLM knowledge

The interplay between metabolism and DNA damage response (DDR) involves a complex network of genes and proteins, where metabolic processes can influence DDR pathways. From the knowledge graph provided, we can infer several interactions involving the genes and proteins relevant to metabolism and DDR. 

SIRT1 and NAMPT, although not directly mentioned in the knowledge graph, are known to be involved in metabolic processes and can influence DDR through their roles in NAD+ metabolism and deacetylation activities. PARP1, a critical protein in DNA repair, directly decreases the activity of TP53, a central player in DDR. TP53, known as a tumor suppressor protein, is pivotal in responding to DNA damage by inducing cell cycle arrest, apoptosis, and DNA repair pathways. The modulation of TP53 activity by PARP1 suggests a link between metabolic sensing and DDR, as PARP1 activity is NAD+-dependent and thus, connected to cellular metabolic status.

BRCA1, another key gene in DDR, is increased by TP53 activity, indicating a pathway where TP53 can stimulate DNA repair mechanisms. CDK2 is involved in cell cycle regulation and influences phosphorylation states of proteins like RB1 and E2F, which are integral to cell cycle progression and can be tied to DDR regulation. The phosphorylation and acetylation of TP53 at various sites further modulate its activity, with interactions involving proteins such as MDM2, EP300, and p14_3_3, highlighting the complex regulation of TP53 in response to DNA damage and potential metabolic cues.

Overall, metabolism affects DNA damage response by influencing key players like TP53 and PARP1, where metabolic state can modulate DDR pathways via NAD+-dependent enzymes and signaling cascades that regulate cell cycle arrest and repair mechanisms.

## Run GraphRAG query on second paper

In [9]:
# Download network of the second paper: pmid24360018
SECOND_KG_UUID = "57becd7b-5680-11f0-a218-005056ae3c32"
new_cx2 = ndex_client.get_network_as_cx2_stream(SECOND_KG_UUID).json()
with open("new_kg.cx2", "w") as f:
    json.dump(new_cx2, f)

In [12]:
graphrag_res = graph_rag_query(
    geneset=["SIRT1", "NAMPT", "PARP1", "TP53", "BRCA1", "CDK2"],
    ndex_id=SECOND_KG_UUID,
    prompt_template=PROMPT_TEMPLATE,
    ndex_client=ndex_client,
    llm_query_fn=query_llm
)

print(graphrag_res)

The metabolism of NAD(+) plays a crucial role in the DNA damage response, as evidenced by its interaction with key proteins like SIRT1 and PARP1. Both SIRT1 and PARP1 require NAD(+) for their activities, which are involved in regulating the DNA damage response. SIRT1 and PARP1 have a complex regulatory relationship, where SIRT1 can both increase and decrease the activity of PARP1, and vice versa. The negative correlation between the activities of SIRT1 and PARP1 with NAD(+) levels suggests a regulatory balance mediated by NAD(+). SIRT1 enhances the DNA damage response, partly by increasing double-strand break repair via nonhomologous end joining, while PARP1 supports DNA repair processes and increases the DNA damage response. PARP1 activity decreases NAD(+) levels, indicating its consumption in the process, while NAD(+) itself can regulate DNA repair. Additionally, SIRT1 is involved in oxidative stress response, which is linked to DNA damage repair mechanisms. Overall, the metabolism o

### LLM response using only the knowledge graph

The metabolism of NAD(+) plays a crucial role in the DNA damage response, as evidenced by its interaction with key proteins like SIRT1 and PARP1. Both SIRT1 and PARP1 require NAD(+) for their activities, which are involved in regulating the DNA damage response. SIRT1 and PARP1 have a complex regulatory relationship, where SIRT1 can both increase and decrease the activity of PARP1, and vice versa. The negative correlation between the activities of SIRT1 and PARP1 with NAD(+) levels suggests a regulatory balance mediated by NAD(+). SIRT1 enhances the DNA damage response, partly by increasing double-strand break repair via nonhomologous end joining, while PARP1 supports DNA repair processes and increases the DNA damage response. PARP1 activity decreases NAD(+) levels, indicating its consumption in the process, while NAD(+) itself can regulate DNA repair. Additionally, SIRT1 is involved in oxidative stress response, which is linked to DNA damage repair mechanisms. Overall, the metabolism of NAD(+) is intricately connected to the activities of SIRT1 and PARP1, influencing the efficiency and regulation of the DNA damage response.

### LLM response using LLM knowledge and the knowledge graph

The metabolism of cells significantly impacts the DNA damage response, primarily through the regulation of NAD\(^+\)-dependent enzymes such as SIRT1 and PARP1. Both SIRT1 and PARP1 utilize NAD\(^+\) as a cofactor, and their activities are interrelated and regulated by the availability of this metabolite. SIRT1, which has protein deacetylase activity, is known to enhance DNA damage response and double-strand break repair, while PARP1 is involved in DNA repair and increases DNA damage response. Their activities are tightly regulated by NAD\(^+\) levels; SIRT1 activity is negatively correlated with NAD\(^+\), whereas PARP1 activity shows a positive correlation. Furthermore, SIRT1 can directly decrease PARP1 activity and expression, indicating a regulatory feedback loop. NAMPT, the rate-limiting enzyme in NAD\(^+\) biosynthesis, plays a critical role in maintaining NAD\(^+\) levels, thus indirectly influencing the activities of SIRT1 and PARP1. These interactions highlight the complex interplay between metabolic pathways and the cellular response to DNA damage, emphasizing the importance of NAD\(^+\) metabolism in modulating the activities of key proteins involved in maintaining genomic stability.

## Demonstrate GraphRAG on the Merged Network of the Two Papers

In [12]:
# Merge base and new networks
from ndex2.cx2 import RawCX2NetworkFactory
from textToKnowledgeGraph.convert_to_cx2 import add_style_to_network

# Creating an instance of RawCX2NetworkFactory
cx2_factory = RawCX2NetworkFactory()
base_net = cx2_factory.get_cx2network(base_cx2)
new_net  = cx2_factory.get_cx2network(new_cx2)

def merge_cx2(cx2_graphs):
    """Merge CX2 networks, treating nodes that share a name as the same node."""
    merged_graph = CX2Network()
    node_map = {}  # Maps node name to node ID in the merged graph

    # First, merge all nodes, keeping the first occurrence of each name
    for graph in cx2_graphs:
        for node_id, node in graph.get_nodes().items():
            node_data = node["v"]
            node_name = node_data.get("name", "")

            if node_name not in node_map:
                new_node_id = merged_graph.add_node(attributes=node_data)
                node_map[node_name] = new_node_id

    # Then, merge all edges, remapping endpoints to the merged node IDs
    for graph in cx2_graphs:
        for edge_id, edge_data in graph.get_edges().items():
            source_id = edge_data.get("s")
            target_id = edge_data.get("t")

            # Look up the endpoint names in the original graph
            source_name = graph.get_node(source_id)["v"].get("name", "")
            target_name = graph.get_node(target_id)["v"].get("name", "")

            merged_graph.add_edge(source=node_map[source_name],
                                  target=node_map[target_name],
                                  attributes=edge_data["v"])

    return merged_graph


cx2_graphs = [base_net, new_net]
merged_network = merge_cx2(cx2_graphs)

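# Apply style to the merged network
# NOTE: style_path is a local path on the author's machine; point it at your own copy of cx_style.json
add_style_to_network(
    cx2_network=merged_network,
    style_path="/Users/favourjames/Downloads/llm-text-to-knowledge-graph/textToKnowledgeGraph/cx_style.json"
)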

merged_network.set_name("Merged Network of Base and New KGs")

# Upload the merged network to NDEx
merged_uuid = ndex_client.save_new_cx2_network(merged_network.to_cx2())

print("Merged network UUID:", merged_uuid)

INFO: [2025-07-01 15:02:14] textToKnowledgeGraph.convert_to_cx2 - Setting visual style properties


Merged network UUID: https://www.ndexbio.org/v3/networks/ff2deee1-5683-11f0-a218-005056ae3c32
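
The merge above keys nodes by name, so a gene that appears in both papers becomes a single node, and edges from both networks are re-pointed at it. The sketch below reproduces that logic on plain dictionaries so it can be run without NDEx. `merge_by_name` and the two toy graphs are hypothetical, and the edge labels are made up for illustration.

```python
from typing import Dict, List, Tuple


def merge_by_name(graphs: List[dict]) -> Tuple[Dict[str, int], List[Tuple[int, int, str]]]:
    """Merge toy graphs: one merged node per unique name, edges remapped to merged IDs."""
    node_ids: Dict[str, int] = {}
    edges: List[Tuple[int, int, str]] = []
    for graph in graphs:
        # Map this graph's local node IDs to node names
        local_names = {node["id"]: node["name"] for node in graph["nodes"]}
        for name in local_names.values():
            # Assign the next free ID the first time a name is seen
            node_ids.setdefault(name, len(node_ids))
        for edge in graph["edges"]:
            source = node_ids[local_names[edge["s"]]]
            target = node_ids[local_names[edge["t"]]]
            edges.append((source, target, edge["label"]))
    return node_ids, edges


# Two toy graphs that share the node PARP1 under different local IDs
paper_a = {
    "nodes": [{"id": 0, "name": "PARP1"}, {"id": 1, "name": "TP53"}],
    "edges": [{"s": 0, "t": 1, "label": "decreases"}],
}
paper_b = {
    "nodes": [{"id": 0, "name": "SIRT1"}, {"id": 1, "name": "PARP1"}],
    "edges": [{"s": 0, "t": 1, "label": "decreases"}],
}

merged_nodes, merged_edges = merge_by_name([paper_a, paper_b])
print(merged_nodes)
print(merged_edges)
```

Note that, like `merge_cx2`, this sketch does not deduplicate edges: if both papers assert the same relationship, the merged graph contains it twice.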


In [13]:
# GraphRAG query with the merged graph as context
MERGED_KG_UUID = "ff2deee1-5683-11f0-a218-005056ae3c32"

merged_graphrag_res = graph_rag_query(
    geneset=["SIRT1", "NAMPT", "PARP1", "TP53", "BRCA1", "CDK2"],
    ndex_id=MERGED_KG_UUID,
    prompt_template=PROMPT_TEMPLATE,
    ndex_client=ndex_client,
    llm_query_fn=query_llm
)

print(merged_graphrag_res)

The interplay between metabolism and the DNA damage response is significantly influenced by the regulation of key proteins, including SIRT1, PARP1, and TP53, which are interconnected through their dependence on NAD+ levels. SIRT1, a protein deacetylase, and PARP1, a poly(ADP-ribose) polymerase, both utilize NAD+ as a substrate, which positions NAD+ as a crucial intermediary in modulating their activities. SIRT1 is known to regulate the DNA damage response by enhancing double-strand break repair via nonhomologous end joining and by modulating the activities of various transcription factors and DNA repair proteins. It also directly decreases the activity of TP53, a pivotal tumor suppressor and regulator of DNA damage response, influencing processes like cell cycle arrest and apoptosis. In contrast, PARP1, which facilitates DNA repair and increases poly(ADP-ribosyl)ation, can directly decrease NAD+ levels, thereby influencing SIRT1 activity due to the competition for NAD+. This dynamic re

### LLM response using only the knowledge graph for merged network

The metabolism of NAD(+) plays a crucial role in the regulation of the DNA damage response, with SIRT1 and PARP1 being central players. SIRT1 and PARP1 both require NAD(+) for their activity and regulate each other's functions, which influences the DNA damage response. SIRT1, through its deacetylase activity, increases the DNA damage response and double-strand break repair via nonhomologous end joining. It also directly decreases the activity of PARP1, which itself is involved in enhancing DNA repair and ADP-ribosylation processes. PARP1 reduces NAD(+) levels by converting it into ADP-ribose, while SIRT1 activity is negatively correlated with NAD(+) levels and PARP1 activity. TP53, another key gene in the DNA damage response, is modulated by both SIRT1 and PARP1, with SIRT1 directly decreasing TP53 activity, while PARP1 increases TP53 modification through ADP-ribosylation. Metabolically, NAMPT is a rate-limiting enzyme for NAD(+) production, linking energy status with DNA repair processes. Thus, NAD(+) metabolism and the interplay between SIRT1 and PARP1 are integral to managing DNA damage response, with ramifications on cellular processes like apoptosis, cell cycle arrest, and DNA repair mechanisms.

### LLM response using knowledge graph and LLM knowledge for merged network

The interplay between metabolism and the DNA damage response is significantly influenced by the regulation of key proteins, including SIRT1, PARP1, and TP53, which are interconnected through their dependence on NAD+ levels. SIRT1, a protein deacetylase, and PARP1, a poly(ADP-ribose) polymerase, both utilize NAD+ as a substrate, which positions NAD+ as a crucial intermediary in modulating their activities. SIRT1 is known to regulate the DNA damage response by enhancing double-strand break repair via nonhomologous end joining and by modulating the activities of various transcription factors and DNA repair proteins. It also directly decreases the activity of TP53, a pivotal tumor suppressor and regulator of DNA damage response, influencing processes like cell cycle arrest and apoptosis. In contrast, PARP1, which facilitates DNA repair and increases poly(ADP-ribosyl)ation, can directly decrease NAD+ levels, thereby influencing SIRT1 activity due to the competition for NAD+. This dynamic relationship is further complicated by the regulatory feedback loops involving NAMPT, the rate-limiting enzyme in NAD+ biosynthesis, which is influenced by metabolic signals and oncogenic factors such as MYC. Additionally, TP53 can indirectly affect NAD+ metabolism by regulating microRNAs that decrease SIRT1 activity, illustrating a complex network where metabolic states can alter the cellular response to DNA damage. This intricate balance ensures that metabolic cues are integrated into the cellular DNA damage response, influencing cell fate decisions in response to genotoxic stress.