This notebook demonstrates a two-round edge validation workflow using LLMs: gpt-oss:20b in Round 1 and gpt-oss:120b in Round 2.

### Step 1: Prepare edges for validation

As a quick example, this notebook uses a small subset (the first 10 rows) of 'treat' edges in RTX-KG2 that were sourced from SemMedDB.

In [49]:
import pandas as pd

semmed_treat_edges_df = pd.read_parquet("data/arax_rtx_273_semmed_treat_edges.parquet")
semmed_treat_edges_small_df = semmed_treat_edges_df.head(10)

#### Step 1.1: Load additional node and predicate information to provide more context for edge validation

In [6]:
import json

with open("dict/rtx-kg2_id_info_dictionary.json", 'r') as file:
    note_dict = json.load(file)

with open("dict/biolink_pred_info_dictionary.json", 'r') as file:
    predicate_dict = json.load(file)

### Step 2: Prompt generation

Generate prompts that include node details for the subject and object, as well as predicate details from the Biolink Model.
Each prompt asks the LLM to classify the relationship between an edge and a PubMed abstract listed as evidence (source) for that edge:

- 'yes': The abstract supports and provides evidence for the edge
- 'no': The abstract is irrelevant or contradicts the edge
- 'maybe': The relationship between the edge and abstract is unclear (allows the LLM to express uncertainty)

For 'yes' responses, the LLM must provide exact sentences from the abstract as evidence. This requirement encourages deeper reasoning, which may reduce hallucination and agreement bias.
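
Because the quoted sentences are supposed to be exact, they can also be checked mechanically against the abstract. The following is a minimal sketch of such a check, not part of the original workflow: the helper names `_normalize` and `unverified_sentences` are hypothetical, the toy abstract is made up, and matching is done after collapsing whitespace and lowercasing, which is an assumption about how strict "exact" should be.

```python
import re


def _normalize(text):
    """Collapse whitespace runs and lowercase, so line wraps do not cause false mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_sentences(sentences, abstract):
    """Return the quoted sentences that do NOT occur in the abstract (empty quotes count as unverified)."""
    haystack = _normalize(abstract)
    return [s for s in sentences if not _normalize(s) or _normalize(s) not in haystack]


# Toy data for illustration only (not taken from the notebook's results).
toy_abstract = "Drug A reduced symptoms of disease B.\nNo serious adverse events were seen."
toy_quoted = ["Drug A reduced symptoms of disease B.", "Drug A cures disease B."]

print(unverified_sentences(toy_quoted, toy_abstract))
```

A 'yes' response whose quoted sentences all fail this check is a reasonable candidate for manual review.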

In [8]:
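def generate_prompt(subj_info, obj_info, pred_info, abstract):
    # NOTE: `pred` in the "Edge:" line below is not a parameter. It is read from the
    # notebook's global scope, where the prompt-building loop assigns it before each call.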
    return f"""Please analyze whether the provided abstract supports the following edge. 
Carefully consider the subject, object, and predicate details. 

Edge: {subj_info['name']} --{pred}-> {obj_info['name']}
Subject: {subj_info}
Object: {obj_info}
Predicate: {pred_info}

Abstract:
{abstract}

Instructions:
- Determine if the abstract provides evidence for this edge.
- Use "yes" if the relation is explicitly supported.
- Use "no" if the relation is not mentioned or contradicted.
- Use "maybe" if the evidence is indirect, ambiguous, or suggestive.
- If "Support?" is "yes", return one or more exact supporting sentences from the abstract.
- If "Support?" is "no" or "maybe", return an empty list for "Sentences".

Output Format: Return only a JSON object in the following structure:
{{
  "Support?": "yes" | "no" | "maybe",
  "Sentences": ["..."]  // one or more if yes, [] if no/maybe
}}
"""

`get_publication_info` is a custom function from https://github.com/np0625/trapi-summarizer/blob/main/pubmed_client.py that supports frequent requests to the PubMed database without requiring an email address or prior authorization.

The following procedure generates validation prompts. Note that edges are skipped if they cannot be enriched with additional details (node and predicate information) or if the abstract is unavailable, since edges with insufficient information cannot be validated fairly or precisely.

In [11]:
from pubmed_client import get_publication_info

prompts = []
for i, row in semmed_treat_edges_small_df.iterrows():
    # Look up context for the subject, object, and predicate; skip the edge if any is missing.
    subj_info = note_dict.get(row['subject'])
    obj_info = note_dict.get(row['object'])
    pred = row['predicate']  # also read by generate_prompt via the global scope
    pred_info = predicate_dict.get(pred)
    if not (subj_info and obj_info and pred_info):
        continue

    # Keep only PubMed identifiers; other publication ID types are ignored.
    pmids = [pub_id for pub_id in row['publications'] if pub_id.startswith('PMID:')]
    if not pmids:
        continue

    # Top-level `await` works inside a Jupyter cell.
    abstracts_info = await get_publication_info(pmids, 'placeholder')
    if abstracts_info['_meta']['n_results'] <= 0:
        continue

    # One prompt per (edge, abstract) pair; PMIDs without an abstract are skipped.
    abstracts = abstracts_info['results']
    for pmid in pmids:
        abstract = abstracts.get(pmid, {}).get('abstract')
        if abstract:
            prompt = generate_prompt(subj_info, obj_info, pred_info, abstract)
            prompts.append({'index': i, 'pmid': pmid, 'prompt': prompt})
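
To see how many edges are dropped and why, the skip conditions described above can be tallied separately. The sketch below mirrors the dictionary and PMID checks on toy data. The helper name `skip_reason`, the toy identifiers, and the toy dictionaries are all hypothetical, and the abstract-availability check is left out because it needs a live PubMed request.

```python
from collections import Counter


def skip_reason(row, node_info, predicate_info):
    """Return None if the edge can be enriched, else a label for the first missing piece."""
    if not node_info.get(row["subject"]):
        return "missing subject info"
    if not node_info.get(row["object"]):
        return "missing object info"
    if not predicate_info.get(row["predicate"]):
        return "missing predicate info"
    if not any(pub_id.startswith("PMID:") for pub_id in row["publications"]):
        return "no PMID"
    return None


# Toy stand-ins for the notebook's dictionaries and DataFrame rows (made-up identifiers).
toy_nodes = {"CHEBI:1": {"name": "drug A"}, "MONDO:1": {"name": "disease B"}}
toy_predicates = {"biolink:treats": {"description": "toy description"}}
toy_rows = [
    {"subject": "CHEBI:1", "object": "MONDO:1", "predicate": "biolink:treats", "publications": ["PMID:1"]},
    {"subject": "CHEBI:2", "object": "MONDO:1", "predicate": "biolink:treats", "publications": ["PMID:2"]},
    {"subject": "CHEBI:1", "object": "MONDO:1", "predicate": "biolink:treats", "publications": ["DOI:x"]},
]

reasons = Counter(skip_reason(r, toy_nodes, toy_predicates) or "kept" for r in toy_rows)
for reason, n in sorted(reasons.items()):
    print(f"{reason}: {n}")
```

Reporting these counts next to the number of generated prompts makes it easier to judge how much of the edge set the validation actually covers.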

In [13]:
# Save the prompts for future use, review, etc.

# with open("intermedia_data/10_edge_validate_prompts.json", "w") as file:
#     json.dump(prompts, file, indent=4)

### Step 3: Begin LLM validation (Round 1) and save responses 

In [23]:
from ollama import Client

responses_round1 = []
client = Client()
count = 0
for prompt_info in prompts:
    index = prompt_info['index']
    pmid = prompt_info['pmid']
    prompt = prompt_info['prompt']
    messages = [
        {
            'role': 'user',
            'content': prompt,
        },
    ]

    response = client.chat(
        model='gpt-oss:20b',
        messages=messages,
        options={'num_ctx': 8192},  # 8192 is the recommended lower limit for the context window
    )

    responses_round1.append({'index': index, 'pmid': pmid, 'response': response['message']['content']})
    count += 1

    # Checkpoint every 10 responses.
    if count % 10 == 0:
        with open("intermedia_data/10_edge_LLM_validate_responses_round1.json", "w") as file:
            json.dump(responses_round1, file, indent=4)

        print(f"progress: {count}/{len(prompts)}")

# Final save, covering any responses received after the last checkpoint.
with open("intermedia_data/10_edge_LLM_validate_responses_round1.json", "w") as file:
    json.dump(responses_round1, file, indent=4)

progress: 10/116
progress: 20/116
progress: 30/116
progress: 40/116
progress: 50/116
progress: 60/116
progress: 70/116
progress: 80/116
progress: 90/116
progress: 100/116
progress: 110/116
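
Since responses are written to disk every 10 requests, an interrupted run could in principle be resumed from the saved file rather than restarted. The sketch below shows one way to do that, under the assumption that each saved record keeps the `index` and `pmid` fields used above. The helper names `load_saved` and `pending_prompts` and the toy records are hypothetical, and the notebook itself does not include a resume step.

```python
import json
import os
import tempfile


def load_saved(path):
    """Load previously saved responses, or return an empty list if no checkpoint exists."""
    if not os.path.exists(path):
        return []
    with open(path) as fh:
        return json.load(fh)


def pending_prompts(prompts, saved):
    """Keep only prompts whose (index, pmid) pair has no saved response yet."""
    done = {(r["index"], r["pmid"]) for r in saved}
    return [p for p in prompts if (p["index"], p["pmid"]) not in done]


# Toy demonstration using a temporary checkpoint file.
toy_prompts = [
    {"index": 0, "pmid": "PMID:1", "prompt": "a"},
    {"index": 0, "pmid": "PMID:2", "prompt": "b"},
]
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = os.path.join(tmp, "responses.json")
    with open(checkpoint, "w") as fh:
        json.dump([{"index": 0, "pmid": "PMID:1", "response": "{}"}], fh)
    saved = load_saved(checkpoint)
    todo = pending_prompts(toy_prompts, saved)

print([p["pmid"] for p in todo])
```

On resume, the request loop would start from `saved` instead of an empty list and iterate over `todo` instead of the full prompt list.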


#### Step 3.1: Parse LLM responses from Round 1, then save

In [25]:
from response_parser import *

patterns, samples, errors = analyze_json_responses(responses_round1)

Total records: 116
RESPONSE PATTERNS:
  plain_json: 116 (100.0%)

SAMPLE RESPONSES BY PATTERN:

PLAIN_JSON:
  Sample 1 (record 0):
    '{\n  "Support?": "yes",\n  "Sentences": [\n    "Norfloxacin is as effective as spectinomycin in gonorrhoea due to penicillin-resistant N. gonorrhoeae, and cures bacterial gastroenteritis caused by several gastrointestinal pathogens."\n  ]\n}'
  Sample 2 (record 1):
    '{\n  "Support?": "no",\n  "Sentences": []\n}'
  Sample 3 (record 2):
    '{"Support?":"no","Sentences":[] }'


In [29]:
parser = LLMResponseParser()
results_round1 = parser.parse_file(responses_round1)
parser.save_results(results_round1, "intermedia_data/parsed_10_edge_LLM_validate_responses_round1.json")
parser.print_summary()

Parsing Summary:
  Total records: 116
  Successful: 116
  Failed: 0
  Success rate: 100.0%
error index: []


#### Step 3.2: Count categorical responses from Round 1 and extract 'yes' and 'maybe' responses for Round 2 validation

In [31]:
from collections import Counter

support_values = [item['extracted_data']['Support?'] for item in results_round1]
support_counter = Counter(support_values)
support_counter

Counter({'no': 72, 'yes': 29, 'maybe': 15})

In [35]:
not_no_results = [x for x in results_round1 if x['extracted_data']['Support?'] != 'no']
len(not_no_results)

44

In [37]:
not_no_results_index_pmid = [(x['index'], x['pmid']) for x in not_no_results]
len(not_no_results_index_pmid)

44

### Step 4: Begin LLM validation (Round 2) and save responses 

In [39]:
responses_round2 = []
count = 0
client = Client()
set_not_no_results_index_pmid = set(not_no_results_index_pmid)

for prompt_info in prompts:
    index = prompt_info['index']
    pmid = prompt_info['pmid']
    # Re-validate only the (edge, abstract) pairs that Round 1 labeled 'yes' or 'maybe'.
    if (index, pmid) in set_not_no_results_index_pmid:
        prompt = prompt_info['prompt']
        messages = [
            {
                'role': 'user',
                'content': prompt,
            },
        ]

        response = client.chat(
            model='gpt-oss:120b',
            messages=messages,
            options={'num_ctx': 8192},  # 8192 is the recommended lower limit for the context window
        )

        responses_round2.append({'index': index, 'pmid': pmid, 'response': response['message']['content']})
        count += 1

        # Checkpoint every 10 responses.
        if count % 10 == 0:
            with open("intermedia_data/10_edge_LLM_validate_responses_round2.json", "w") as file:
                json.dump(responses_round2, file, indent=4)

            print(f"progress: {count}/{len(set_not_no_results_index_pmid)}")

# Final save, covering any responses received after the last checkpoint.
with open("intermedia_data/10_edge_LLM_validate_responses_round2.json", "w") as file:
    json.dump(responses_round2, file, indent=4)

#### Step 4.1: Parse LLM responses from Round 2, then save

In [41]:
patterns, samples, errors = analyze_json_responses(responses_round2)

Total records: 44
RESPONSE PATTERNS:
  plain_json: 44 (100.0%)

SAMPLE RESPONSES BY PATTERN:

PLAIN_JSON:
  Sample 1 (record 0):
    '{\n  "Support?": "yes",\n  "Sentences": [\n    "Nor... is as effective as spectinomycin in gonorrhoea due to penicillin-resistant N. gonorrhoeae, and cures bacterial gastroenteritis caused by several gastrointestinal pathogens."\n  ]\n}'
  Sample 2 (record 1):
    '{\n  "Support?": "maybe",\n  "Sentences": []\n}'
  Sample 3 (record 2):
    '{\n  "Support?": "maybe",\n  "Sentences": []\n}'


In [43]:
parser = LLMResponseParser()
results_round2 = parser.parse_file(responses_round2)
parser.save_results(results_round2, "intermedia_data/parsed_10_edge_LLM_validate_responses_round2.json")
parser.print_summary()

Parsing Summary:
  Total records: 44
  Successful: 44
  Failed: 0
  Success rate: 100.0%
error index: []
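
With both rounds parsed, one way to inspect how the larger model revised the Round 1 labels is to tabulate label pairs per (index, pmid). The sketch below is illustrative and not part of the original workflow. It relies only on the record fields already used above (`index`, `pmid`, `extracted_data`); the helper names `round_transitions` and `_toy_record` and the toy records are hypothetical.

```python
from collections import Counter


def round_transitions(round1, round2):
    """Count (Round 1 label, Round 2 label) pairs for records present in both rounds."""
    first = {(r["index"], r["pmid"]): r["extracted_data"]["Support?"] for r in round1}
    transitions = Counter()
    for r in round2:
        key = (r["index"], r["pmid"])
        if key in first:
            transitions[(first[key], r["extracted_data"]["Support?"])] += 1
    return transitions


def _toy_record(index, pmid, label):
    """Build a record shaped like the parsed results (toy values only)."""
    return {"index": index, "pmid": pmid, "extracted_data": {"Support?": label, "Sentences": []}}


toy_round1 = [_toy_record(0, "PMID:1", "yes"), _toy_record(0, "PMID:2", "maybe"), _toy_record(1, "PMID:3", "no")]
toy_round2 = [_toy_record(0, "PMID:1", "yes"), _toy_record(0, "PMID:2", "no")]

toy_transitions = round_transitions(toy_round1, toy_round2)
for (label1, label2), n in sorted(toy_transitions.items()):
    print(f"{label1} -> {label2}: {n}")
```

In the notebook this would be called as `round_transitions(results_round1, results_round2)`; how disagreements between rounds should be resolved is a separate decision that this sketch does not make.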


Note: LLMs occasionally produce improperly formatted JSON, especially when processing large batches of requests. The following helper functions can be used to correct such responses manually.

In [None]:
# # Save failed records for manual fixing
# parser.save_failed_records('failed_records_to_fix.json')

# # After manually fixing failed_records_to_fix.json, load and reprocess:
# with open('failed_records_to_fix.json', 'r') as f:
#     fixed_data = json.load(f)

# # Parse the fixed records
# parser2 = LLMResponseParser()
# fixed_results = parser2.parse_file(fixed_data)
# parser2.print_summary()

# # Combine with the parsed results of the relevant round if needed (Round 2 shown here)
# all_results = results_round2 + fixed_results
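
Before falling back to manual fixes, some malformed responses can often be recovered automatically, for instance when the model wraps the JSON object in extra commentary. The following is a minimal sketch of such a tolerant extraction step. It is not the implementation of `response_parser`, which is not shown here; the function name `extract_verdict` and the validation rules are assumptions based on the output format requested in the prompt.

```python
import json
import re

VALID_LABELS = {"yes", "no", "maybe"}


def extract_verdict(raw):
    """Best-effort parse of one raw response; return the parsed dict, or None if unusable."""
    # Take the span from the first "{" to the last "}", which drops any
    # commentary or markup placed around the JSON object.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Validate against the structure requested in the prompt.
    if data.get("Support?") not in VALID_LABELS:
        return None
    if not isinstance(data.get("Sentences"), list):
        return None
    return data


wrapped = 'Here is my answer: {"Support?": "yes", "Sentences": ["x"]} Hope this helps.'
print(extract_verdict(wrapped))
print(extract_verdict("I could not decide."))
print(extract_verdict('{"Support?": "perhaps", "Sentences": []}'))
```

Responses for which this returns `None` would still need the manual correction path described above.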