# FHIR Requirements Refinement Tool

This tool processes a raw list of FHIR Implementation Guide requirements and uses an LLM to produce a refined, concise list of only the testable requirements.

#### What It Does

- Takes a markdown file containing FHIR requirements (generated from an IG)
- Applies filtering to identify only testable requirements
- Consolidates duplicate requirements and merges related ones
- Formats each requirement with consistent structure
- Outputs a clean, testable requirements list

#### How to Use

1. Run the interactive workflow from the notebook: `run_refinement()`, or `result = run_refinement()` to keep the returned summary
2. At the prompts, choose the requirements file to refine (from the listed files or by full path), the output directory, and the LLM API
3. By default, the refined requirements are saved as `revised_reqs_output/{api}_reqs_list_v2_{timestamp}.md`

Notes:
- Supports Claude, Gemini, or GPT-4o
- API keys should be stored in a `.env` file
- API configurations are set in `llm_utils.py`; any configuration changes should be made there
- Individual certificate setup may need to be modified in the `setup_clients()` function in `llm_utils.py` before running this notebook
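
Before starting a run, it can help to confirm that the keys are visible to the process. The following is a minimal sketch, not part of the tool: the environment variable names are assumptions (the names actually read are defined in `llm_utils.py`), and `find_missing_keys` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed variable names for illustration only; check llm_utils.py for the
# names it actually reads from the environment.
ASSUMED_KEY_NAMES = {
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "gpt": "OPENAI_API_KEY",
}


def find_missing_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return {api_type: variable_name} for every key that is unset or blank."""
    env = os.environ if env is None else env
    return {
        api: name
        for api, name in ASSUMED_KEY_NAMES.items()
        if not env.get(name, "").strip()
    }
```

Calling `find_missing_keys()` after `load_dotenv()` would report which of the three APIs cannot be used in the current session; passing a plain dictionary instead makes the check easy to try without touching the real environment.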

### Inputs and Setup

In [12]:
import logging
import os
import re
import sys
import time
from datetime import datetime, timedelta
from pathlib import Path
from typing import Any, Dict, Optional

from dotenv import load_dotenv

# Set up logging
logging.basicConfig(level=logging.INFO, 
                   format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


In [13]:
# Define paths
PROJECT_ROOT = Path.cwd().parent  # Parent directory (one level above cwd)
CURRENT_DIR = Path.cwd()  # Current working directory
DEFAULT_INPUT_DIR = CURRENT_DIR / "initial_reqs_output"  # Default input directory
DEFAULT_OUTPUT_DIR = CURRENT_DIR / "revised_reqs_output"  # Default output directory

# Create output directory
DEFAULT_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Load environment variables
load_dotenv()

# Log the directories
logging.info(f"Current working directory: {CURRENT_DIR}")
logging.info(f"Project root: {PROJECT_ROOT}")
logging.info(f"Default input directory: {DEFAULT_INPUT_DIR}")
logging.info(f"Default output directory: {DEFAULT_OUTPUT_DIR}")


2025-05-23 15:47:09,994 - root - INFO - Current working directory: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction
2025-05-23 15:47:09,995 - root - INFO - Project root: /Users/ceadams/Documents/onclaive/onclaive
2025-05-23 15:47:09,995 - root - INFO - Default input directory: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction/initial_reqs_output
2025-05-23 15:47:09,996 - root - INFO - Default output directory: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction/revised_reqs_output


In [14]:
import importlib.util
module_path = os.path.join(PROJECT_ROOT, 'llm_utils.py')

spec = importlib.util.spec_from_file_location("llm_utils", module_path)
llm_utils = importlib.util.module_from_spec(spec)
spec.loader.exec_module(llm_utils)
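
The same three `importlib` calls are repeated for `prompt_utils.py` in the next cell. One possible way to factor them out is sketched below; `load_module_from_path` is a hypothetical helper name, not something the project defines.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_module_from_path(name: str, path: Path) -> ModuleType:
    """Load a Python source file as a module without modifying sys.path."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    # spec_from_file_location returns None when no loader fits the file type.
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # Executes the file's top-level code
    return module
```

With such a helper, the two loading cells would reduce to `llm_utils = load_module_from_path("llm_utils", PROJECT_ROOT / "llm_utils.py")` and the equivalent line for `prompt_utils`.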

In [15]:
# Import prompt utilities
prompt_utils_path = os.path.join(PROJECT_ROOT, 'prompt_utils.py')
spec = importlib.util.spec_from_file_location("prompt_utils", prompt_utils_path)
prompt_utils = importlib.util.module_from_spec(spec)
spec.loader.exec_module(prompt_utils)

# Setup the prompt environment
prompt_env = prompt_utils.setup_prompt_environment(PROJECT_ROOT)
PROMPT_DIR = prompt_env["prompt_dir"]
REQUIREMENTS_REFINEMENT_PATH = prompt_env["requirements_refinement_path"]

logging.info(f"Using prompts directory: {PROMPT_DIR}")
logging.info(f"Requirements refinement prompt: {REQUIREMENTS_REFINEMENT_PATH}")

2025-05-23 15:47:10,010 - root - INFO - Prompt environment set up at: /Users/ceadams/Documents/onclaive/onclaive/prompts
2025-05-23 15:47:10,011 - root - INFO - Using prompts directory: /Users/ceadams/Documents/onclaive/onclaive/prompts
2025-05-23 15:47:10,011 - root - INFO - Requirements refinement prompt: /Users/ceadams/Documents/onclaive/onclaive/prompts/requirements_refinement.md


### API Configuration

In [16]:
# System prompts
SYSTEM_PROMPTS = {
    "claude": "You are a Healthcare Standards Expert tasked with analyzing and refining FHIR Implementation Guide requirements.",
    "gemini": "Your role is to analyze and refine FHIR Implementation Guide requirements, focusing on making them concise, testable, and conformance-oriented.",
    "gpt": "As a Healthcare Standards Expert, analyze and refine FHIR Implementation Guide requirements to produce a concise, testable requirements list."
}

### Prompt Development

In [17]:
def get_requirements_refinement_prompt(requirements_list: str) -> str:
    """
    Create the prompt for refining requirements list using external prompt file
    
    Args:
        requirements_list: The original list of requirements
        
    Returns:
        str: The prompt for the LLM loaded from external file
    """
    return prompt_utils.load_prompt(
        REQUIREMENTS_REFINEMENT_PATH,
        requirements_list=requirements_list
    )
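
The implementation of `prompt_utils.load_prompt` is not shown in this notebook. As a rough illustration of what a loader with this call shape might do, the sketch below reads a template file and fills named placeholders. The `{name}` placeholder syntax and the function name `load_prompt_sketch` are assumptions, not the project's actual behavior.

```python
from pathlib import Path


def load_prompt_sketch(prompt_path: Path, **variables: str) -> str:
    """Read a prompt template and fill `{name}` placeholders (assumed syntax)."""
    template = Path(prompt_path).read_text(encoding="utf-8")
    for name, value in variables.items():
        # Plain replacement leaves any other braces in the template untouched,
        # which matters when the prompt itself contains JSON or code samples.
        template = template.replace("{" + name + "}", value)
    return template
```

Plain string replacement is used here rather than `str.format` so that literal braces elsewhere in a prompt file do not raise errors.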

### API Call

In [18]:
def make_api_request(client, api_type: str, content: str) -> str:
    """Make API request with retries"""

    prompt = get_requirements_refinement_prompt(content)
    
    # Create a rate limiter for this request
    rate_limiter = llm_utils.create_rate_limiter()
    rate_limit_func = llm_utils.create_rate_limit_function(rate_limiter, api_type)
    
    return llm_utils.make_llm_request(
        client=client,
        api_type=api_type,
        prompt=prompt,
        system_prompt=SYSTEM_PROMPTS[api_type],
        rate_limit_func=rate_limit_func
    )
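
`make_api_request` is described as retrying, but the retry logic itself is not shown in this notebook; it presumably lives in `llm_utils.make_llm_request`. For readers unfamiliar with the pattern, here is a generic sketch of retrying with exponential backoff. `call_with_retries` and its parameters are hypothetical and are not taken from `llm_utils.py`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists only so the delay can be replaced in tests; a production version would likely also restrict which exception types are retried.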

### Main Processing Function

In [19]:
def refine_requirements(input_file: str, api_type: str = "claude",
                       output_dir: Optional[str] = None) -> Dict[str, Any]:
    """
    Refine requirements using the specified API
    
    Args:
        input_file: Path to the input requirements list markdown file
        api_type: The API to use ("claude", "gemini", or "gpt")
        output_dir: Directory to save the output (optional)
        
    Returns:
        Dict containing processing results and path to refined requirements
    """
    logger.info(f"Starting requirements refinement with {api_type}")
    
    # Use default output directory if none provided
    if output_dir is None:
        output_dir = DEFAULT_OUTPUT_DIR
    else:
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
    
    # Validate input file
    input_path = Path(input_file)
    if not input_path.exists():
        raise FileNotFoundError(f"Input file not found: {input_file}")
    
    # Read input requirements (explicit encoding avoids platform-dependent defaults)
    with open(input_path, 'r', encoding='utf-8') as f:
        requirements_content = f.read()
    
    # Initialize API clients
    clients = llm_utils.setup_clients()
    if api_type not in clients or clients[api_type] is None:
        raise ValueError(f"API client for {api_type} is not available")
    
    client = clients[api_type]
    
    try:
        # Process the requirements
        logger.info(f"Sending requirements to {api_type} for refinement...")
        refined_requirements = make_api_request(client, api_type, requirements_content)
        
        # Generate output filename
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        output_filename = f"{api_type}_reqs_list_v2_{timestamp}.md"
        output_file_path = output_dir / output_filename
        
        # Save refined requirements (explicit encoding so non-ASCII output is written reliably)
        with open(output_file_path, 'w', encoding='utf-8') as f:
            f.write(refined_requirements)
        
        # Count refined requirements
        refined_req_count = count_requirements_in_markdown(refined_requirements)
        
        logger.info(f"Requirements refinement complete. Output saved to: {output_file_path}")
        logger.info(f"Identified {refined_req_count} requirements")
        
        return {
            "input_file": str(input_path),
            "output_file": str(output_file_path),
            "api_used": api_type,
            "timestamp": timestamp,
            "requirements_count": refined_req_count
        }
        
    except Exception as e:
        logger.error(f"Error refining requirements: {str(e)}")
        raise
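
To exercise the read, refine, and write steps without calling a real LLM, the model call can be swapped for a plain function. The sketch below is a simplified stand-in for `refine_requirements`, not a replacement: `refine_with_stub` and its `llm_fn` parameter are hypothetical, and it reproduces only the file handling and the output naming pattern.

```python
from datetime import datetime
from pathlib import Path
from typing import Callable


def refine_with_stub(input_path: Path, output_dir: Path, api_type: str,
                     llm_fn: Callable[[str], str]) -> Path:
    """Read the input, pass it through llm_fn, and write a timestamped file."""
    text = Path(input_path).read_text(encoding="utf-8")
    refined = llm_fn(text)  # Stand-in for the real API request

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Same naming pattern as the tool: {api}_reqs_list_v2_{timestamp}.md
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_path = output_dir / f"{api_type}_reqs_list_v2_{timestamp}.md"
    output_path.write_text(refined, encoding="utf-8")
    return output_path
```

Passing something like `lambda text: text` as `llm_fn` gives a quick check that paths and permissions are set up correctly before any API usage is incurred.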

### Main Execution

In [20]:
def count_requirements_in_markdown(markdown_text):
    """
    Count the number of requirements in a markdown file that follow the REQ-XX format.
    
    Handles both formats:
    # REQ-01
    or
    ## REQ-01
    
    Example of the expected formats:
    # REQ-01
    **Summary**: Some requirement summary
    
    ## REQ-02
    **Summary**: Another requirement summary
    """
    # Pattern for both formats: either # REQ-XX or ## REQ-XX
    req_pattern = r"^\s*(#|##)\s+REQ-\d+"
    
    # Count the occurrences
    lines = markdown_text.split('\n')
    count = 0
    
    for line in lines:
        if re.match(req_pattern, line):
            count += 1
    
    return count
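
As a quick illustration of which lines the counter recognizes, the sketch below uses a compact variant of the same pattern with `re.MULTILINE` and runs it on a small sample. `count_req_headings` is a hypothetical name; it is meant to behave like the function above for ordinary input, but it is a separate sketch rather than the tool's code.

```python
import re

# One or two '#' characters, whitespace, then REQ-<digits>, at the start of a line
REQ_HEADING = re.compile(r"^[ \t]*#{1,2}\s+REQ-\d+", re.MULTILINE)


def count_req_headings(markdown_text: str) -> int:
    """Count level-1 and level-2 headings of the form REQ-<number>."""
    return len(REQ_HEADING.findall(markdown_text))


sample = """# REQ-01
**Summary**: Some requirement summary

## REQ-02
**Summary**: Another requirement summary

### REQ-03
This level-3 heading and the inline mention of REQ-04 are not counted.
"""

print(count_req_headings(sample))  # 2
```

Only level-1 and level-2 headings count, so if the LLM returns requirements under `###` headings or as bold text, the reported total will be lower than the number of requirements actually present.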

In [21]:
def run_refinement():
    """Run the refinement process with user input"""
    print("\n" + "="*80)
    print("FHIR Requirements Refinement Tool")
    print("="*80)
    
    # Start timing the entire function execution
    start_time = time.time()
    
    # Get input directory or use default
    input_dir = input(f"Enter input directory path or accept default (default '{DEFAULT_INPUT_DIR}'): ") or str(DEFAULT_INPUT_DIR)
    input_dir_path = Path(input_dir)
    
    if not input_dir_path.exists():
        print(f"Warning: Input directory {input_dir} does not exist.")
        input_file = input("Enter full path to requirements markdown file: ")
    else:
        # List all markdown files in the input directory
        md_files = list(input_dir_path.glob("*.md"))
        
        if md_files:
            # Sort files by modification time (newest first)
            md_files.sort(key=lambda x: x.stat().st_mtime, reverse=True)
            
            # Show only the 10 most recent files
            recent_files = md_files[:10]
            
            print("\nMost recent files:")
            for idx, file in enumerate(recent_files, 1):
                # Format the modification time as part of the display
                mod_time = datetime.fromtimestamp(file.stat().st_mtime).strftime("%Y-%m-%d %H:%M")
                print(f"{idx}. {file.name} ({mod_time})")
            
            # Let user select from the list, see more files, or enter a custom path
            print("\nOptions:")
            print("- Select a number (1-10) to choose one of the following most recently generated files")
            print("- Enter 'all' to see all files")
            print("- Enter a full path to use a specific file")
            
            selection = input("\nReview the printed options for choosing a requirements file and enter applicable selection: ")
            
            if selection.lower() == 'all':
                # Show all files with pagination
                all_files = md_files
                page_size = 20
                total_pages = (len(all_files) + page_size - 1) // page_size
                input_file = None  # Set once the user picks a file
                
                current_page = 1
                while current_page <= total_pages:
                    start_idx = (current_page - 1) * page_size
                    end_idx = min(start_idx + page_size, len(all_files))
                    
                    print(f"\nAll files (page {current_page}/{total_pages}):")
                    for idx, file in enumerate(all_files[start_idx:end_idx], start_idx + 1):
                        mod_time = datetime.fromtimestamp(file.stat().st_mtime).strftime("%Y-%m-%d %H:%M")
                        print(f"{idx}. {file.name} ({mod_time})")
                    
                    if current_page < total_pages:
                        next_action = input("\nPress Enter for next page, 'q' to select, or enter a number to choose a file: ")
                        if next_action.lower() == 'q':
                            break
                        elif next_action.isdigit() and 1 <= int(next_action) <= len(all_files):
                            input_file = str(all_files[int(next_action) - 1])
                            break
                        else:
                            current_page += 1
                    else:
                        break
                
                if input_file is None:
                    # If we went through all pages without selection
                    file_number = input("\nEnter the file number to process: ")
                    if file_number.isdigit() and 1 <= int(file_number) <= len(all_files):
                        input_file = str(all_files[int(file_number) - 1])
                    else:
                        input_file = file_number  # Treat as a custom path
            
            elif selection.isdigit() and 1 <= int(selection) <= len(recent_files):
                input_file = str(recent_files[int(selection) - 1])
            else:
                input_file = selection  # Treat as a custom path
        else:
            print(f"No markdown files found in {input_dir}")
            input_file = input("Enter full path to requirements markdown file: ")
    
    # Get output directory or use default
    output_dir = input(f"Enter output directory path or accept default (default '{DEFAULT_OUTPUT_DIR}'): ") or str(DEFAULT_OUTPUT_DIR)
    output_dir_path = Path(output_dir)
    
    # Create output directory if it doesn't exist
    output_dir_path.mkdir(parents=True, exist_ok=True)
    
    # Select the API to use
    print("\nSelect the API to use:")
    print("1. Claude")
    print("2. Gemini")
    print("3. GPT-4")
    api_choice = input("Enter your choice of API to use, based on the printed listing (1-3, default 1): ") or "1"
    
    api_mapping = {
        "1": "claude",
        "2": "gemini",
        "3": "gpt"
    }
    
    api_type = api_mapping.get(api_choice, "claude")
    
    try:
        # Run the refinement
        print(f"\nProcessing requirements with {api_type.capitalize()}...")
        result = refine_requirements(input_file, api_type, output_dir_path)
        
        # Calculate total execution time
        total_elapsed_time = time.time() - start_time
        total_elapsed_formatted = str(timedelta(seconds=int(total_elapsed_time)))
        
        print("\n" + "="*80)
        print("Requirements Refinement Complete!")
        print(f"Input file: {result['input_file']}")
        print(f"Refined requirements saved to: {result['output_file']}")
        print(f"API used: {result['api_used']}")
        print(f"Number of requirements identified: {result['requirements_count']}")
        print(f"Total execution time: {total_elapsed_formatted}")
        print("="*80)
        
        return result
    
    except Exception as e:
        logger.error(f"Error: {str(e)}")
        print(f"\nError occurred during refinement: {str(e)}")
        print("Check the log for more details.")
        return None
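
The rule `run_refinement` applies to menu answers (a valid list number selects a file, anything else is treated as a path) can be isolated into a small function, which makes it testable without `input()`. This is a sketch only: `resolve_selection` is a hypothetical helper, and unlike the code above it strips surrounding whitespace first.

```python
from pathlib import Path
from typing import List


def resolve_selection(selection: str, listed_files: List[Path]) -> str:
    """Map a menu answer to a path.

    A valid 1-based index picks from listed_files; any other input is
    returned unchanged and treated as a user-supplied path.
    """
    choice = selection.strip()
    if choice.isdigit() and 1 <= int(choice) <= len(listed_files):
        return str(listed_files[int(choice) - 1])
    return choice
```

One consequence of this rule is that an out-of-range number such as `3` with only two files listed is passed on as a path named `3`, which then fails the existence check in `refine_requirements` with a `FileNotFoundError`.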

In [22]:
# Run the interactive version
run_refinement()


FHIR Requirements Refinement Tool

Most recent files:
1. plan-net-requirements.md (2025-04-29 13:55)
2. example_claude_reqs_list_v1_20250416_141301.md (2025-04-23 10:46)

Options:
- Select a number (1-10) to choose one of the following most recently generated files
- Enter 'all' to see all files
- Enter a full path to use a specific file

Select the API to use:
1. Claude
2. Gemini
3. GPT-4


2025-05-23 15:47:18,318 - __main__ - INFO - Starting requirements refinement with claude
2025-05-23 15:47:18,347 - __main__ - INFO - Sending requirements to claude for refinement...



Processing requirements with Claude...


KeyboardInterrupt: 