# FDA Query Notebook

## Environment Setup

This notebook uses the `.venvChallengeProd` virtual environment located in the `prod` directory.

### To activate the environment in terminal:
```bash
cd /labs/jupyter/ai_challenge/prod
source .venvChallengeProd/bin/activate
```

### To run Python directly with this environment:
```bash
/labs/jupyter/ai_challenge/prod/.venvChallengeProd/bin/python your_script.py
```

### To select the correct kernel in VS Code:
1. Click on the kernel selector in the top right of the notebook
2. Select "Python Environments..."
3. Choose the Python interpreter from: `/labs/jupyter/ai_challenge/prod/.venvChallengeProd/bin/python`

### Installed packages:
- langchain ✅
- langgraph ✅
- requests ✅
- All required dependencies ✅

**Environment is ready to use!**
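
Before running the cells below, it can help to confirm that the notebook kernel really is the interpreter from `.venvChallengeProd`. The following is a minimal sketch rather than part of the project: the helper names `missing_packages` and `in_virtualenv` are hypothetical, and the package list only mirrors the bullet list above.

```python
import importlib.util
import sys

# Top-level packages from the "Installed packages" list above.
REQUIRED = ["langchain", "langgraph", "requests"]


def missing_packages(names):
    """Return the names that cannot be located by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
print("Missing packages:", missing_packages(REQUIRED) or "none")
```

If the interpreter path printed here does not end in `.venvChallengeProd/bin/python`, the kernel selection steps above were probably not applied.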

In [1]:
VERBOSE_MODE = True

import sys
from pathlib import Path
sys.path.append(str(Path('../modules')))

from langchain.agents import Tool
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms.base import LLM

from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph, END

import requests
import base64
import os, json, inspect
import importlib

from typing import Literal, Optional, List, Mapping, Any, Dict, Tuple
from northwellllm import NorthwellLLM
import FDAQuery
importlib.reload(FDAQuery)
from FDAQuery import FDAQuery

# Path to the directory containing folders of FDA summary JSON files
fda_summary_dir = Path('../LabDocs/es1/fda_summary')
fda_summary_docs = []
i = 0

# Recursively search through all subdirectories for JSON files
for f in fda_summary_dir.rglob('*.json'):
    try:
        # Parse the full JSON file
        full_json = json.loads(f.read_text(encoding='utf-8'))
        
        # Extract only the Document Summary
        document_summary = full_json.get('Document Summary', 'No document summary available')
        
        # Create minimal JSON object with just the summary
        minimal_json = {
            "Document Summary": document_summary
        }
        
        # Convert back to JSON string
        json_content = json.dumps(minimal_json, indent=2)
        fda_summary_docs.append((i, f.name, json_content))
        i += 1
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON in {f}: {e}")
    except Exception as e:
        print(f"Error reading {f}: {e}")

print(f"Loaded {len(fda_summary_docs)} FDA Summary documents (Document Summary only)")
print(fda_summary_docs)



Loaded 4732 FDA Summary documents (Document Summary only)
[(0, 'K103358_es1.json', '{\n  "Document Summary": "FDA 510(k) summary for ROMA serum cancer risk algorithm using HE4 EIA and ARCHITECT CA 125 II to stratify ovarian adnexal mass surgical candidates by malignancy likelihood"\n}'), (1, 'K153607_es1.json', '{\n  "Document Summary": "FDA 510(k) summary for ROMA Calculation Tool Using Elecsys Assays, a qualitative test combining Elecsys HE4, CA 125 II assays, and menopausal status to assess malignancy risk in women with ovarian adnexal mass."\n}'), (2, 'K160090_es1.json', '{\n  "Document Summary": "FDA 510(k) summary for Lumipulse G ROMA qualitative serum/plasma test combining HE4, CA125, and menopausal status to assess ovarian malignancy risk using CLEIA on LUMIPULSE G1200"\n}'), (3, 'K150588_es1.json', '{\n  "Document Summary": "FDA 510(k) summary for Vermillion OVA1 Next Generation test combining five immunoassays with risk score algorithm for ovarian adnexal mass malignancy asse
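
Each entry in `fda_summary_docs` is a tuple of `(index, file name, JSON string)`, and the JSON string holds a single `"Document Summary"` key. As a quick sanity check that needs no LLM call, the list can be scanned locally for a keyword. This is only an illustrative sketch: `find_by_keyword` is a hypothetical helper, and `sample_docs` reproduces two entries from the output above.

```python
import json

# Two entries copied from the printed output; same tuple layout as fda_summary_docs.
sample_docs = [
    (0, "K103358_es1.json", json.dumps({
        "Document Summary": "FDA 510(k) summary for ROMA serum cancer risk algorithm "
                            "using HE4 EIA and ARCHITECT CA 125 II to stratify ovarian "
                            "adnexal mass surgical candidates by malignancy likelihood"
    }, indent=2)),
    (2, "K160090_es1.json", json.dumps({
        "Document Summary": "FDA 510(k) summary for Lumipulse G ROMA qualitative "
                            "serum/plasma test combining HE4, CA125, and menopausal "
                            "status to assess ovarian malignancy risk using CLEIA on "
                            "LUMIPULSE G1200"
    }, indent=2)),
]


def find_by_keyword(docs, keyword):
    """Return (index, file name) pairs whose summary contains the keyword."""
    needle = keyword.lower()
    hits = []
    for doc_id, name, content in docs:
        # Decode the stored JSON string and read its only field.
        summary = json.loads(content).get("Document Summary", "")
        if needle in str(summary).lower():
            hits.append((doc_id, name))
    return hits


print(find_by_keyword(sample_docs, "lumipulse"))
```

A plain substring match like this is far cruder than the relevance scoring used below, but it is useful for checking that a document was loaded at all.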

In [2]:
fda = FDAQuery(debug=1)

fda.identify_relevant_documents("Which FDA docs are there related to TEG? ", fda_summary_docs)

Processing 4732 documents in batches of 1000
Processing batch 1/5 (1000 documents)
"doc_id":"992", "doc_name":"K160502_es1.json", "relevance_level":"1.0"
"doc_id":"834", "doc_name":"K183160_es1.json", "relevance_level":"0.95"
"doc_id":"641", "doc_name":"K243858_es1.json", "relevance_level":"0.92"
Batch 1: Found 3 relevant documents
Processing batch 2/5 (1000 documents)
"doc_id":"1036", "doc_name":"K251024_es1.json", "relevance_level":"0.95"
"doc_id":"1083", "doc_name":"K232018_es1.json", "relevance_level":"0.90"
Batch 2: Found 2 relevant documents
Processing batch 3/5 (1000 documents)
"doc_id":"1036", "doc_name":"K251024_es1.json", "relevance_level":"0.95"
"doc_id":"1083", "doc_name":"K232018_es1.json"

{'function': 'identify_relevant_documents',
 'primary_question': 'Which FDA docs are there related to TEG? ',
 'documents': [{'doc_id': '992',
   'doc_name': 'K160502_es1.json',
   'relevance_level': '1.0'},
  {'doc_id': '834', 'doc_name': 'K183160_es1.json', 'relevance_level': '0.95'},
  {'doc_id': '1036',
   'doc_name': 'K251024_es1.json',
   'relevance_level': '0.95'},
  {'doc_id': '641', 'doc_name': 'K243858_es1.json', 'relevance_level': '0.92'},
  {'doc_id': '1083',
   'doc_name': 'K232018_es1.json',
   'relevance_level': '0.90'}],
 'total_batches': 5,
 'total_documents_processed': 4732,
 'total_relevant_found': 5}
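
The log and the returned dictionary suggest a simple pattern: the 4732 documents are split into batches of 1000 (five batches), each batch yields its own hits, and the hits are merged into one list ordered by descending relevance. The sketch below reproduces that pattern under stated assumptions. It is not the actual `FDAQuery` implementation, and `split_into_batches` and `merge_batch_hits` are hypothetical names. Note that `relevance_level` is a string in the output, so it is converted to `float` before sorting.

```python
def split_into_batches(items, batch_size=1000):
    """Split a list into consecutive slices of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def merge_batch_hits(batch_hits):
    """Combine per-batch hits, drop repeated doc_ids, sort by relevance."""
    merged = {}
    for hits in batch_hits:
        for hit in hits:
            # Keep the first occurrence of each document id.
            merged.setdefault(hit["doc_id"], hit)
    # sorted() is stable, so equal scores keep their original order.
    return sorted(merged.values(),
                  key=lambda hit: float(hit["relevance_level"]),
                  reverse=True)


batches = split_into_batches(list(range(4732)))
print(len(batches), [len(batch) for batch in batches])

# Hits for batches 1 and 2, taken from the log above.
batch_hits = [
    [{"doc_id": "992", "doc_name": "K160502_es1.json", "relevance_level": "1.0"},
     {"doc_id": "834", "doc_name": "K183160_es1.json", "relevance_level": "0.95"},
     {"doc_id": "641", "doc_name": "K243858_es1.json", "relevance_level": "0.92"}],
    [{"doc_id": "1036", "doc_name": "K251024_es1.json", "relevance_level": "0.95"},
     {"doc_id": "1083", "doc_name": "K232018_es1.json", "relevance_level": "0.90"}],
]
merged = merge_batch_hits(batch_hits)
print([hit["doc_id"] for hit in merged])
```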

In [4]:
fda = FDAQuery(debug=1)

fda.identify_relevant_documents("Which FDA docs are there related to hearts? ", fda_summary_docs)

Processing 1 documents in batches of 1000
Processing batch 1/1 (1 documents)
"doc_id":"0", "doc_name":"K221640_es1.json", "relevance_level":"0.9"
Batch 1: Found 1 relevant documents
Total relevant documents found: 1


{'function': 'identify_relevant_documents',
 'primary_question': 'Which FDA docs are there related to hearts? ',
 'documents': [{'doc_id': '0',
   'doc_name': 'K221640_es1.json',
   'relevance_level': '0.9'}],
 'total_batches': 1,
 'total_documents_processed': 1,
 'total_relevant_found': 1}

In [15]:
question = "what is the reference range for a1c?"
try:
    fda_query = FDAQuery(debug=0)
    res = fda_query.enhance_query_with_llm(question)  
    print(f"{res}")
except Exception as e:
    print(f"❌ ERROR: {e}")

print("=" * 80)

What reference (normal) range values for Hemoglobin A1C (HbA1c) are reported in FDA 510(k) clearance documents for HbA1c testing devices? LIKELY DEVICE: Hemoglobin A1C (HbA1c) Analyzers
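
The enhanced questions in this notebook end with a `LIKELY DEVICE:` suffix, and some model responses arrive wrapped in double quotes. When comparing models it can be convenient to separate the question from the device hint. The helper below is a hypothetical sketch and assumes the suffix always uses the exact marker text `LIKELY DEVICE:`.

```python
MARKER = "LIKELY DEVICE:"


def split_enhanced_query(text):
    """Return (question, device); device is None when the marker is absent."""
    # Remove surrounding whitespace and any wrapping double quotes.
    cleaned = text.strip().strip('"').strip()
    question, found, device = cleaned.partition(MARKER)
    return question.strip(), (device.strip() if found else None)


sample = (
    "What reference (normal) range values for Hemoglobin A1C (HbA1c) are "
    "reported in FDA 510(k) clearance documents for HbA1c testing devices? "
    "LIKELY DEVICE: Hemoglobin A1C (HbA1c) Analyzers"
)
question_text, device = split_enhanced_query(sample)
print(device)
```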


In [10]:
question = "what is the reference range for a1c?"
try:
    fda_query = FDAQuery(debug=0)
    res = fda_query.enhance_query_with_llm(question, model='gpt-4o')  
    print(f"{res}")
except Exception as e:
    print(f"❌ ERROR: {e}")

print("=" * 80)

"What is the FDA-approved reference range for Hemoglobin A1C levels as specified in 510(k) submissions for A1C testing devices? LIKELY DEVICE: Hemoglobin A1C Analyzers"


In [2]:
question = "what fda docs are there related to allomap?"
try:
    fda_query = FDAQuery(model='o3', debug=1)
    res, history = fda_query.ask_fda(question, fda_summary_docs)
    print(f"{res}")
except Exception as e:
    print(f"❌ ERROR: {e}")

print("=" * 80)

Starting time: 1763058108.8327224
What FDA 510(k) submissions, clearances, and related regulatory documentation exist for the AlloMap Heart Molecular Expression Test (gene expression profiling assay for heart transplant rejection), including device indications, predicate devices, and performance data? LIKELY DEVICE: AlloMap Heart Molecular Expression Test
Enhanced question: What FDA 510(k) submissions, clearances, and related regulatory documentation exist for the AlloMap Heart Molecular Expression Test (gene expression profiling assay for heart transplant rejection), including device indications, predicate devices, and performance data? LIKELY DEVICE: AlloMap Heart Molecular Expression Test
Time after enhancing question: 1763058111.3179657 (Elapsed: 2.49s)
Processing 4732 documents in batches of 1000
Processing batch 1/5 (1000 documents)
What FDA 510(k) submissions, clearances, and related regulatory documentation exist for the AlloMap Heart Molecular Expression Test (gene expression 

In [23]:
question = "what fda docs are there related to allomap?"
try:
    fda_query = FDAQuery(model='gpt-4.1', debug=1)
    res, history = fda_query.ask_fda(question, fda_summary_docs)
    print(f"{res}")
except Exception as e:
    print(f"❌ ERROR: {e}")

print("=" * 80)

Starting time: 1760377714.584466
What FDA 510(k) submissions, clearance decisions, and related regulatory documents exist for the AlloMap Heart Molecular Expression Testing system, including details on intended use, performance data, and predicate devices? LIKELY DEVICE: AlloMap Heart Molecular Expression Testing (CareDx, Inc.)
Enhanced question: What FDA 510(k) submissions, clearance decisions, and related regulatory documents exist for the AlloMap Heart Molecular Expression Testing system, including details on intended use, performance data, and predicate devices? LIKELY DEVICE: AlloMap Heart Molecular Expression Testing (CareDx, Inc.)
Time after enhancing question: 1760377715.5917077 (Elapsed: 1.01s)
Processing 1 documents in batches of 1000
Processing batch 1/1 (1 documents)
"doc_id":"0", "doc_name":"K221640_es1.json", "fda_folder":"Cardiovascular", "relevance_level":"1.0"
Batch 1: Found 1 relevant documents
Total relevant documents found: 1
Time after identifying relevant document
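
The debug log prints raw timestamps that look like Unix epoch seconds, presumably from `time.time()`; that is an assumption, since the `FDAQuery` source is not shown here. A small sketch of how such values could be made readable, and how a single step could be timed with a monotonic clock, follows. `format_epoch` and `timed` are hypothetical helpers.

```python
import time
from datetime import datetime, timezone


def format_epoch(seconds):
    """Render epoch seconds as an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat(timespec="seconds")


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds) using a monotonic clock."""
    begin = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - begin


# Values taken from the log above.
start = 1760377714.584466
after_enhance = 1760377715.5917077
print("Started at:", format_epoch(start))
print(f"Elapsed: {after_enhance - start:.2f}s")
```

`time.perf_counter()` is generally preferable for measuring durations because it is not affected by system clock adjustments, while epoch timestamps remain useful for recording when a run happened.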

In [1]:
question = "what is the reference range for a1c?"
try:
    fda_query = FDAQuery(debug=0)
    res, history = fda_query.ask_fda(question, fda_summary_docs)  
    print(f"{res}")
except Exception as e:
    print(f"❌ ERROR: {e}")

print("=" * 80)

❌ ERROR: name 'FDAQuery' is not defined
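
This `NameError` is what a fresh kernel reports when a query cell runs before the setup cell that imports `FDAQuery` and builds `fda_summary_docs`. One way to get a clearer message is a small guard at the top of the cell. The sketch below is illustrative only; `require_names` is a hypothetical helper, and in a real cell the namespace argument would be `globals()`.

```python
def require_names(namespace, names):
    """Raise a readable error if any expected name is missing from the namespace."""
    missing = [name for name in names if name not in namespace]
    if missing:
        raise RuntimeError("Run the setup cell first; missing: " + ", ".join(missing))


# Demonstration with an empty stand-in namespace instead of globals().
fresh_kernel = {}
try:
    require_names(fresh_kernel, ["FDAQuery", "fda_summary_docs"])
except RuntimeError as err:
    print(err)
```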


In [None]:
question = "I have just received a green-top tube for a1c"
try:
    fda_query = FDAQuery(debug=0)
    # res, history = fda_query.ask_fda(question, fda_summary_docs)  
    res = fda_query.enhance_query_with_llm(question, model='gpt-4o')  
    print(f"{res}")
except Exception as e:
    print(f"❌ ERROR: {e}")

print("=" * 80)

What are the FDA-approved specifications and requirements for Hemoglobin A1C testing devices that utilize green-top tubes for sample collection? Are there any discrepancies or concerns regarding sample collection using green-top tubes in FDA 510(k) documentation? LIKELY DEVICE: Hemoglobin A1C Analyzers
