# FHIR Requirements Refinement Tool

This tool processes a raw list of FHIR Implementation Guide requirements and uses an LLM to produce a refined, concise list of only the testable requirements.

#### What It Does

- Takes a markdown file containing FHIR requirements (generated from an IG)
- Applies filtering to identify only testable requirements
- Consolidates duplicate requirements and merges related ones
- Formats each requirement with consistent structure
- Outputs a clean, testable requirements list

#### How to Use

1. If your environment needs a custom certificate bundle, adjust the certificate setup in the `setup_clients()` function
2. Run interactive mode in the notebook: `result = run_refinement()`
   - Or process a file directly: `result = refine_requirements("path/to/requirements.md", "claude")`
3. When prompted, point the notebook to the requirements list file you want to refine
4. The refined requirements are saved as `revised_reqs_output/{api}_reqs_list_v2_{timestamp}.md`

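Notes:
- Supports Claude, Gemini, or GPT-4o (selected with the `api_type` values `"claude"`, `"gemini"`, or `"gpt"`)
- API keys are read from a `.env` file: `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, and `OPENAI_API_KEY`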

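Before running the notebook, it can help to check which API keys are actually visible to the process. The sketch below is illustrative only: `REQUIRED_KEYS` and `available_apis` are hypothetical names, not part of the notebook, and the variable names mirror the ones read in `setup_clients()`. It assumes the `.env` file has already been loaded (for example by `load_dotenv()`), so the keys appear in `os.environ`.

```python
import os

# Environment variable names read by setup_clients() in this notebook.
REQUIRED_KEYS = {
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "gpt": "OPENAI_API_KEY",
}


def available_apis(env=None):
    """Return the API names whose key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [api for api, var in REQUIRED_KEYS.items() if env.get(var)]


print("APIs with a key configured:", available_apis())
```

Passing a plain dict as `env` makes the check easy to try without touching the real environment.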

In [59]:
import os
import logging
import time
from pathlib import Path
from typing import Dict, Any, Optional
from datetime import datetime

# Import required libraries (ensure these are installed)
from dotenv import load_dotenv
import httpx
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type
from anthropic import Anthropic, RateLimitError
import google.generativeai as gemini
from openai import OpenAI

# Set up logging
logging.basicConfig(level=logging.INFO, 
                   format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


In [None]:
# Define paths
PROJECT_ROOT = Path.cwd().parent  # Parent directory (one level above cwd)
CURRENT_DIR = Path.cwd()  # Current working directory
DEFAULT_INPUT_DIR = CURRENT_DIR / "initial_reqs_output"  # Default input directory
DEFAULT_OUTPUT_DIR = CURRENT_DIR / "revised_reqs_output"  # Default output directory

# Create output directory
DEFAULT_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Load environment variables
load_dotenv()

# Log the directories
logging.info(f"Current working directory: {CURRENT_DIR}")
logging.info(f"Project root: {PROJECT_ROOT}")
logging.info(f"Default input directory: {DEFAULT_INPUT_DIR}")
logging.info(f"Default output directory: {DEFAULT_OUTPUT_DIR}")


2025-04-16 11:45:48,302 - root - INFO - Current working directory: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction
2025-04-16 11:45:48,303 - root - INFO - Project root: /Users/ceadams/Documents/onclaive/onclaive
2025-04-16 11:45:48,303 - root - INFO - Default input directory: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction/processed_output
2025-04-16 11:45:48,304 - root - INFO - Default output directory: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction/revised_reqs


In [61]:

# API Configuration
API_CONFIGS = {
    "claude": {
        "model_name": "claude-3-5-sonnet-20241022",
        "max_tokens": 8192,
        "temperature": 0.7,
        "delay_between_requests": 1
    },
    "gemini": {
        "model": "models/gemini-1.5-pro-001",
        "max_tokens": 8192,
        "temperature": 0.7,
        "delay_between_requests": 2,
        "timeout": 60
    },
    "gpt": {
        "model": "gpt-4o",
        "max_tokens": 8192,
        "temperature": 0.7,
        "delay_between_requests": 2
    }
}

# System prompts
SYSTEM_PROMPTS = {
    "claude": "You are a Healthcare Standards Expert tasked with analyzing and refining FHIR Implementation Guide requirements.",
    "gemini": "Your role is to analyze and refine FHIR Implementation Guide requirements, focusing on making them concise, testable, and conformance-oriented.",
    "gpt": "As a Healthcare Standards Expert, analyze and refine FHIR Implementation Guide requirements to produce a concise, testable requirements list."
}

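Note that the Claude entry stores its model under `model_name`, while the Gemini and GPT entries use `model`. A small lookup helper, sketched here under the assumption that only those two key names occur, keeps callers from having to remember which is which. `model_for` is a hypothetical helper, and `EXAMPLE_CONFIGS` is a trimmed copy of `API_CONFIGS` so the snippet runs on its own.

```python
# Trimmed copy of API_CONFIGS, kept here so the example is self-contained.
EXAMPLE_CONFIGS = {
    "claude": {"model_name": "claude-3-5-sonnet-20241022", "max_tokens": 8192},
    "gemini": {"model": "models/gemini-1.5-pro-001", "max_tokens": 8192},
    "gpt": {"model": "gpt-4o", "max_tokens": 8192},
}


def model_for(api_type: str, configs: dict) -> str:
    """Look up the model identifier regardless of which key name the entry uses."""
    config = configs[api_type]
    for key in ("model", "model_name"):
        if key in config:
            return config[key]
    raise KeyError(f"No model configured for {api_type!r}")


for api in EXAMPLE_CONFIGS:
    print(api, "->", model_for(api, EXAMPLE_CONFIGS))
```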

In [62]:

def setup_clients():
    """Initialize clients for each LLM service"""
    try:
        # Claude setup
        verify_path = '/opt/homebrew/etc/openssl@3/cert.pem'
        http_client = httpx.Client(
            verify=verify_path if os.path.exists(verify_path) else True,
            timeout=60.0
        )
        claude_client = Anthropic(
            api_key=os.getenv('ANTHROPIC_API_KEY'),
            http_client=http_client
        )
        
        # Gemini setup
        gemini_api_key = os.getenv('GEMINI_API_KEY')
        if not gemini_api_key:
            logger.warning("GEMINI_API_KEY not found")
            gemini_client = None
        else:
            gemini.configure(api_key=gemini_api_key)
            gemini_client = gemini.GenerativeModel(
                model_name=API_CONFIGS["gemini"]["model"],
                generation_config={
                    "max_output_tokens": API_CONFIGS["gemini"]["max_tokens"],
                    "temperature": API_CONFIGS["gemini"]["temperature"]
                }
            )
        
        # OpenAI setup
        openai_api_key = os.getenv('OPENAI_API_KEY')
        if not openai_api_key:
            logger.warning("OPENAI_API_KEY not found")
            openai_client = None
        else:
            openai_client = OpenAI(
                api_key=openai_api_key,
                timeout=60.0
            )
        
        return {
            "claude": claude_client,
            "gpt": openai_client,
            "gemini": gemini_client
        }
        
    except Exception as e:
        logger.error(f"Error setting up clients: {str(e)}")
        raise

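The certificate path in `setup_clients()` is specific to a Homebrew OpenSSL install, which is why that function may need editing on other machines. One possible way to make the choice configurable is to try a list of candidate bundle paths and fall back to default verification. This is a sketch, not the notebook's method: `resolve_verify` is a hypothetical helper, and consulting the `SSL_CERT_FILE` environment variable first is an assumption about how you might want to override the path.

```python
import os
from typing import Iterable, Union


def resolve_verify(candidates: Iterable[str]) -> Union[str, bool]:
    """Return the first existing CA bundle path, or True for default verification."""
    for path in candidates:
        if path and os.path.exists(path):
            return path
    return True


# Assumption: prefer an explicit SSL_CERT_FILE override, then the Homebrew path
# used in setup_clients(); otherwise fall back to True.
candidates = [os.getenv("SSL_CERT_FILE", ""), "/opt/homebrew/etc/openssl@3/cert.pem"]
verify = resolve_verify(candidates)
print(f"verify={verify!r}")
```

The returned value has the same shape as the `verify=` argument already built in `setup_clients()`: either a path string or `True`.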

In [63]:

def get_requirements_refinement_prompt(requirements_list: str) -> str:
    """
    Create the prompt for refining requirements list
    
    Args:
        requirements_list: The original list of requirements
        
    Returns:
        str: The prompt for the LLM
    """
    return f"""Your task is to review this list of FHIR Implementation Guide requirements and create a refined, concise list of only the testable requirements. Follow these guidelines carefully:

1. Produce a list of at most 50 clear, testable requirements that a conformance testing tool could verify.

2. Include ONLY requirements that:
   - Have explicit conformance language (SHALL, SHOULD, MAY, MUST, REQUIRED, etc.)
   - Describe specific, verifiable behavior or capability
   - Could be objectively tested through software testing or attestation

3. EXCLUDE the following types of content:
   - General introductory or conclusive/summarization comments
   - Implementation guidance or explanatory text
   - Examples or sample queries
   - Duplicate requirements (consolidate similar requirements)
   - Information about resource relationships without conformance statements
   - General structural information about profiles or resources
   - Requirements fragments that should be part of a single testable requirement

4. For each requirement, include:
   - A clear, concise statement of what MUST, SHOULD, MAY, SHALL, etc. be implemented
   - The actor responsible (Server, Client, Application, etc.)
   - The conformance level (SHALL, SHOULD, MAY, MUST, REQUIRED, etc.)

5. Format each requirement consistently:
   - Use active voice
   - Begin with the actor (e.g., "Server SHALL...")
   - Make each requirement atomic and independently testable
   - Ensure requirements are implementation-neutral

After filtering, verify that each requirement in your final list represents a discrete, testable capability or constraint that would be appropriate for conformance testing.

Format each requirement as follows, renumbering the requirement IDs sequentially starting with 01:
    
    ---
    # REQ-XX
    **Summary**: [summary text]
    **Description**: "[description text]"
    **Verification**: [method]
    **Actor**: [actor]
    **Conformance**: [SHALL/SHOULD/MAY/etc.]
    **Conditional**: [True/False]
    **Source**: [reference]
    ---

Do not include any text in the response other than the requirements list.

Here is the list of requirements to refine:

{requirements_list}
"""

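Because the prompt asks for a fixed `# REQ-XX` block layout and at most 50 requirements, the model's response can be checked mechanically after it comes back. The parser below is a minimal sketch under the assumption that the model follows the template closely. `parse_requirements` is a hypothetical helper, and the sample text is made up purely for illustration.

```python
import re

# Header and field patterns derived from the template in the prompt.
REQ_HEADER = re.compile(r"^[ \t]*# (REQ-\d+)[ \t]*$", re.MULTILINE)
FIELD = re.compile(r"^[ \t]*\*\*(\w+)\*\*:[ \t]*(.*)$", re.MULTILINE)


def parse_requirements(markdown: str) -> list:
    """Split refined output into one dict per '# REQ-XX' block."""
    requirements = []
    headers = list(REQ_HEADER.finditer(markdown))
    for i, match in enumerate(headers):
        # A block runs from its header to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(markdown)
        body = markdown[match.end():end]
        fields = {name: value.strip() for name, value in FIELD.findall(body)}
        requirements.append({"id": match.group(1), **fields})
    return requirements


# Made-up sample in the requested format (not real IG content).
sample = """---
# REQ-01
**Summary**: Support JSON
**Description**: "Server SHALL support JSON."
**Verification**: Test
**Actor**: Server
**Conformance**: SHALL
**Conditional**: False
**Source**: Example section
---
# REQ-02
**Summary**: Support XML
**Description**: "Server SHOULD support XML."
**Verification**: Test
**Actor**: Server
**Conformance**: SHOULD
**Conditional**: False
**Source**: Example section
---
"""

parsed = parse_requirements(sample)
print(len(parsed), "requirements;", "within limit:", len(parsed) <= 50)
```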

In [64]:

@retry(
    wait=wait_exponential(multiplier=2, min=4, max=360),
    stop=stop_after_attempt(8),
    retry=retry_if_exception_type((RateLimitError, TimeoutError))
)
def make_api_request(client, api_type: str, content: str) -> str:
    """Make API request with retries"""
    
    config = API_CONFIGS[api_type]
    prompt = get_requirements_refinement_prompt(content)
    
    try:
        if api_type == "claude":
            response = client.messages.create(
                model=config["model_name"],
                max_tokens=config["max_tokens"],
                temperature=config["temperature"],  # apply the configured temperature, as the other APIs do
                messages=[{
                    "role": "user", 
                    "content": prompt
                }],
                system=SYSTEM_PROMPTS[api_type]
            )
            return response.content[0].text
            
        elif api_type == "gemini":
            response = client.generate_content(
                prompt,
                generation_config={
                    "max_output_tokens": config["max_tokens"],
                    "temperature": config["temperature"]
                }
            )
            if hasattr(response, 'text'):
                return response.text
            elif response.candidates:
                return response.candidates[0].content.parts[0].text
            else:
                raise ValueError("No response generated from Gemini API")
                    
        elif api_type == "gpt":
            response = client.chat.completions.create(
                model=config["model"],
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPTS[api_type]},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=config["max_tokens"],
                temperature=config["temperature"]
            )
            return response.choices[0].message.content
            
    except Exception as e:
        logger.error(f"Error in {api_type} API request: {str(e)}")
        raise

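To get a feel for how long the retry decorator above may wait, the schedule can be approximated in plain Python. This sketch assumes the wait before each retry is `multiplier * 2**(attempt - 1)` clamped to the `[min, max]` range, which is my understanding of `wait_exponential`; check the tenacity documentation before relying on the exact numbers. `backoff_schedule` is a hypothetical helper and does not call tenacity.

```python
def backoff_schedule(multiplier=2, minimum=4, maximum=360, attempts=8):
    """Approximate waits (seconds) between attempts under the assumed formula."""
    waits = []
    for attempt in range(1, attempts):  # no wait follows the final attempt
        raw = multiplier * 2 ** (attempt - 1)
        waits.append(max(minimum, min(raw, maximum)))
    return waits


waits = backoff_schedule()
print(waits)       # [4, 4, 8, 16, 32, 64, 128]
print(sum(waits))  # 256
```

Under that assumption, eight attempts spend roughly four minutes waiting in the worst case, and the 360-second cap is never reached.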

In [65]:
def refine_requirements(input_file: str, api_type: str = "claude",
                       output_dir: Optional[str] = None) -> Dict[str, Any]:
    """
    Refine requirements using the specified API
    
    Args:
        input_file: Path to the input requirements list markdown file
        api_type: The API to use ("claude", "gemini", or "gpt")
        output_dir: Directory to save the output (optional)
        
    Returns:
        Dict containing processing results and path to refined requirements
    """
    logger.info(f"Starting requirements refinement with {api_type}")
    
    # Use default output directory if none provided
    if output_dir is None:
        output_dir = DEFAULT_OUTPUT_DIR
    else:
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
    
    # Validate input file
    input_path = Path(input_file)
    if not input_path.exists():
        raise FileNotFoundError(f"Input file not found: {input_file}")
    
    # Read input requirements
    with open(input_path, 'r', encoding='utf-8') as f:
        requirements_content = f.read()
    
    # Initialize API clients
    clients = setup_clients()
    if api_type not in clients or clients[api_type] is None:
        raise ValueError(f"API client for {api_type} is not available")
    
    client = clients[api_type]
    
    try:
        # Process the requirements
        logger.info(f"Sending requirements to {api_type} for refinement...")
        refined_requirements = make_api_request(client, api_type, requirements_content)
        
        # Generate output filename
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        output_filename = f"{api_type}_reqs_list_v2_{timestamp}.md"
        output_file_path = output_dir / output_filename
        
        # Save refined requirements
        with open(output_file_path, 'w', encoding='utf-8') as f:
            f.write(refined_requirements)
        
        logger.info(f"Requirements refinement complete. Output saved to: {output_file_path}")
        
        return {
            "input_file": str(input_path),
            "output_file": str(output_file_path),
            "api_used": api_type,
            "timestamp": timestamp
        }
        
    except Exception as e:
        logger.error(f"Error refining requirements: {str(e)}")
        raise

In [66]:
def run_refinement():
    """Run the refinement process with user input"""
    print("\n" + "="*80)
    print("FHIR Requirements Refinement Tool")
    print("="*80)
    
    # Get input directory or use default
    input_dir = input(f"Enter input directory path (default '{DEFAULT_INPUT_DIR}'): ") or str(DEFAULT_INPUT_DIR)
    input_dir_path = Path(input_dir)
    
    if not input_dir_path.exists():
        print(f"Warning: Input directory {input_dir} does not exist.")
        input_file = input("Enter full path to requirements markdown file: ")
    else:
        # List all markdown files in the input directory
        md_files = list(input_dir_path.glob("*.md"))
        
        if md_files:
            # Sort files by modification time (newest first)
            md_files.sort(key=lambda x: x.stat().st_mtime, reverse=True)
            
            # Show only the 10 most recent files
            recent_files = md_files[:10]
            
            print("\nMost recent files:")
            for idx, file in enumerate(recent_files, 1):
                # Format the modification time as part of the display
                mod_time = datetime.fromtimestamp(file.stat().st_mtime).strftime("%Y-%m-%d %H:%M")
                print(f"{idx}. {file.name} ({mod_time})")
            
            # Let user select from the list, see more files, or enter a custom path
            print("\nOptions:")
            print(f"- Select a number (1-{len(recent_files)}) to choose a file")
            print("- Enter 'all' to see all files")
            print("- Enter a full path to use a specific file")
            
            selection = input("\nYour selection: ")
            
            if selection.lower() == 'all':
                # Show all files with pagination
                all_files = md_files
                page_size = 20
                total_pages = (len(all_files) + page_size - 1) // page_size
                
                current_page = 1
                while current_page <= total_pages:
                    start_idx = (current_page - 1) * page_size
                    end_idx = min(start_idx + page_size, len(all_files))
                    
                    print(f"\nAll files (page {current_page}/{total_pages}):")
                    for idx, file in enumerate(all_files[start_idx:end_idx], start_idx + 1):
                        mod_time = datetime.fromtimestamp(file.stat().st_mtime).strftime("%Y-%m-%d %H:%M")
                        print(f"{idx}. {file.name} ({mod_time})")
                    
                    if current_page < total_pages:
                        next_action = input("\nPress Enter for next page, 'q' to select, or enter a number to choose a file: ")
                        if next_action.lower() == 'q':
                            break
                        elif next_action.isdigit() and 1 <= int(next_action) <= len(all_files):
                            input_file = str(all_files[int(next_action) - 1])
                            break
                        else:
                            current_page += 1
                    else:
                        break
                
                if 'input_file' not in locals():
                    # If we went through all pages without selection
                    file_number = input("\nEnter the file number to process: ")
                    if file_number.isdigit() and 1 <= int(file_number) <= len(all_files):
                        input_file = str(all_files[int(file_number) - 1])
                    else:
                        input_file = file_number  # Treat as a custom path
            
            elif selection.isdigit() and 1 <= int(selection) <= len(recent_files):
                input_file = str(recent_files[int(selection) - 1])
            else:
                input_file = selection  # Treat as a custom path
        else:
            print(f"No markdown files found in {input_dir}")
            input_file = input("Enter full path to requirements markdown file: ")
    
    # Get output directory or use default
    output_dir = input(f"Enter output directory path (default '{DEFAULT_OUTPUT_DIR}'): ") or str(DEFAULT_OUTPUT_DIR)
    output_dir_path = Path(output_dir)
    
    # Create output directory if it doesn't exist
    output_dir_path.mkdir(parents=True, exist_ok=True)
    
    # Select the API to use
    print("\nSelect the API to use:")
    print("1. Claude")
    print("2. Gemini")
    print("3. GPT-4o")
    api_choice = input("Enter your choice (1-3, default 1): ") or "1"
    
    api_mapping = {
        "1": "claude",
        "2": "gemini",
        "3": "gpt"
    }
    
    api_type = api_mapping.get(api_choice, "claude")
    
    try:
        # Run the refinement
        print(f"\nProcessing requirements with {api_type.capitalize()}...")
        result = refine_requirements(input_file, api_type, output_dir_path)
        
        print("\n" + "="*80)
        print("Requirements Refinement Complete!")
        print(f"Input file: {result['input_file']}")
        print(f"Refined requirements saved to: {result['output_file']}")
        print(f"API used: {result['api_used']}")
        print("="*80)
        
        return result
    
    except Exception as e:
        logger.error(f"Error: {str(e)}")
        print(f"\nError occurred during refinement: {str(e)}")
        print("Check the log for more details.")
        return None

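The "all files" branch of `run_refinement()` pages through the list using ceiling division and index slicing. The same arithmetic is sketched here in isolation. `paginate` is a hypothetical helper, not a function in the notebook, and the page size of 20 matches the value used above.

```python
def paginate(items, page_size=20):
    """Yield (page_number, total_pages, first_display_index, page_items)."""
    total_pages = (len(items) + page_size - 1) // page_size  # ceiling division
    for page in range(1, total_pages + 1):
        start = (page - 1) * page_size
        yield page, total_pages, start + 1, items[start:start + page_size]


files = [f"file_{n:02d}.md" for n in range(1, 46)]  # 45 placeholder names
for page, total, first_idx, chunk in paginate(files):
    print(f"page {page}/{total}: items {first_idx}-{first_idx + len(chunk) - 1}")
```

With 45 items this gives three pages, the last of which holds five entries, and the display index keeps counting across pages so a number typed by the user maps directly to a position in the full list.
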
In [67]:
# Run the interactive version
result = run_refinement()


FHIR Requirements Refinement Tool

Most recent files:
1. claude_reqs_list_v1_20250416_113606.md (2025-04-16 11:36)
2. claude_reqs_list_v1_20250416_112958.md (2025-04-16 11:29)
3. claude_reqs_list_v1_20250416_112422.md (2025-04-16 11:24)
4. claude_reqs_list_v1_20250416_111547.md (2025-04-16 11:15)
5. claude_reqs_list_v1_20250416_103702.md (2025-04-16 10:37)
6. plan_net_gemini_requirements_list_20250402_145733.md (2025-04-02 14:57)
7. plan_net_gemini_20250402_145733.md (2025-04-02 14:57)
8. plan_net_claude_requirements_list_20250402_144346.md (2025-04-02 14:43)
9. plan_net_claude_20250402_144346.md (2025-04-02 14:43)
10. plan_net_gpt_requirements_list_20250402_135527.md (2025-04-02 13:55)

Options:
- Select a number (1-10) to choose a file
- Enter 'all' to see all files
- Enter a full path to use a specific file

Select the API to use:
1. Claude
2. Gemini
3. GPT-4


2025-04-16 11:45:59,918 - __main__ - INFO - Starting requirements refinement with claude
2025-04-16 11:45:59,944 - __main__ - INFO - Sending requirements to claude for refinement...



Processing requirements with Claude...


2025-04-16 11:46:10,242 - httpx - INFO - HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
2025-04-16 11:46:10,245 - __main__ - INFO - Requirements refinement complete. Output saved to: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction/revised_reqs/claude_reqs_list_v2_20250416_114610.md



Requirements Refinement Complete!
Input file: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction/processed_output/claude_reqs_list_v1_20250416_113606.md
Refined requirements saved to: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction/revised_reqs/claude_reqs_list_v2_20250416_114610.md
API used: claude
