# FHIR Requirements Refinement Tool

This tool processes a raw list of FHIR Implementation Guide requirements and uses an LLM to produce a refined, concise list of only the testable requirements.

#### What It Does

- Takes a markdown file containing FHIR requirements (generated from an IG)
- Applies filtering to identify only testable requirements
- Consolidates duplicate requirements and merges related ones
- Formats each requirement with consistent structure
- Outputs a clean, testable requirements list (15-50 requirements)

#### How to Use

1. If your environment needs a custom certificate bundle, adjust `verify_path` in the `setup_clients()` function
2. Run the interactive mode in the notebook: `result = run_refinement()`
   - Or process a file directly: `result = refine_requirements("path/to/requirements.md", "claude")`
3. To use the tool from another notebook, import it: `from refine_requirements import refine_requirements, run_refinement`
4. When prompted, select one of the listed requirements files or enter the full path to the file of interest
5. The refined requirements are saved to `revised_reqs/refined_requirements_{api}_{timestamp}.md`

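Notes:
- Supports Claude, Gemini, or GPT (the configuration below uses `gpt-4o`)
- API keys (`ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `OPENAI_API_KEY`) should be in a `.env` file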
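
Before running the tool, it can help to confirm which keys are actually set. The following is a minimal sketch, not part of the tool itself: `REQUIRED_KEYS` and `available_apis` are hypothetical names, and the variable names are the ones `setup_clients()` reads in this notebook.

```python
import os

# Environment variable names read by setup_clients() in this notebook.
REQUIRED_KEYS = {
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "gpt": "OPENAI_API_KEY",
}


def available_apis(env=None):
    """Return the API names whose key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [api for api, key in REQUIRED_KEYS.items() if env.get(key)]
```

Call `available_apis()` after `load_dotenv()` so that values from the `.env` file are visible in `os.environ`.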


In [51]:
import os
import logging
import time
from pathlib import Path
from typing import Dict, Any, Optional
from datetime import datetime

# Import required libraries (ensure these are installed)
from dotenv import load_dotenv
import httpx
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type
from anthropic import Anthropic, RateLimitError
import google.generativeai as gemini
from openai import OpenAI

# Set up logging
logging.basicConfig(level=logging.INFO, 
                   format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


In [52]:

# Define paths
PROJECT_ROOT = Path.cwd().parent  # Parent directory (one level above cwd)
CURRENT_DIR = Path.cwd()  # Current working directory
OUTPUT_DIR = CURRENT_DIR / "revised_reqs"

# Create output directory
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Load environment variables
load_dotenv()


True

In [53]:

# API Configuration
API_CONFIGS = {
    "claude": {
        "model_name": "claude-3-5-sonnet-20241022",
        "max_tokens": 8192,
        "temperature": 0.7,
        "delay_between_requests": 1
    },
    "gemini": {
        "model": "models/gemini-1.5-pro-001",
        "max_tokens": 8192,
        "temperature": 0.7,
        "delay_between_requests": 2,
        "timeout": 60
    },
    "gpt": {
        "model": "gpt-4o",
        "max_tokens": 8192,
        "temperature": 0.7,
        "delay_between_requests": 2
    }
}

# System prompts
SYSTEM_PROMPTS = {
    "claude": "You are a Healthcare Standards Expert tasked with analyzing and refining FHIR Implementation Guide requirements.",
    "gemini": "Your role is to analyze and refine FHIR Implementation Guide requirements, focusing on making them concise, testable, and conformance-oriented.",
    "gpt": "As a Healthcare Standards Expert, analyze and refine FHIR Implementation Guide requirements to produce a concise, testable requirements list."
}
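
The `delay_between_requests` values above are not referenced by the code shown in this notebook. If several files are refined in a row, one way to apply them is a small throttling helper. This is a sketch under that assumption; `process_with_delay` and its `sleep` parameter are hypothetical names introduced for illustration.

```python
import time


def process_with_delay(items, handler, delay_seconds, sleep=time.sleep):
    """Call handler on each item, pausing delay_seconds between calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            # Pause only between requests, not before the first one.
            sleep(delay_seconds)
        results.append(handler(item))
    return results
```

For example, `process_with_delay(files, lambda f: refine_requirements(f, "gemini"), API_CONFIGS["gemini"]["delay_between_requests"])` would space out Gemini requests by the configured two seconds.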


In [54]:

def setup_clients():
    """Initialize clients for each LLM service"""
    try:
        # Claude setup
        verify_path = '/opt/homebrew/etc/openssl@3/cert.pem'
        http_client = httpx.Client(
            verify=verify_path if os.path.exists(verify_path) else True,
            timeout=60.0
        )
        claude_client = Anthropic(
            api_key=os.getenv('ANTHROPIC_API_KEY'),
            http_client=http_client
        )
        
        # Gemini setup
        gemini_api_key = os.getenv('GEMINI_API_KEY')
        if not gemini_api_key:
            logger.warning("GEMINI_API_KEY not found")
            gemini_client = None
        else:
            gemini.configure(api_key=gemini_api_key)
            gemini_client = gemini.GenerativeModel(
                model_name=API_CONFIGS["gemini"]["model"],
                generation_config={
                    "max_output_tokens": API_CONFIGS["gemini"]["max_tokens"],
                    "temperature": API_CONFIGS["gemini"]["temperature"]
                }
            )
        
        # OpenAI setup
        openai_api_key = os.getenv('OPENAI_API_KEY')
        if not openai_api_key:
            logger.warning("OPENAI_API_KEY not found")
            openai_client = None
        else:
            openai_client = OpenAI(
                api_key=openai_api_key,
                timeout=60.0
            )
        
        return {
            "claude": claude_client,
            "gpt": openai_client,
            "gemini": gemini_client
        }
        
    except Exception as e:
        logger.error(f"Error setting up clients: {str(e)}")
        raise


In [55]:

def get_requirements_refinement_prompt(requirements_list: str) -> str:
    """
    Create the prompt for refining requirements list
    
    Args:
        requirements_list: The original list of requirements
        
    Returns:
        str: The prompt for the LLM
    """
    return f"""Your task is to review this list of FHIR Implementation Guide requirements and create a refined, concise list of only the testable requirements. Follow these guidelines carefully:

1. Produce a list of 15-50 clear, testable requirements that a conformance testing tool could verify.

2. Include ONLY requirements that:
   - Have explicit conformance language (SHALL, SHOULD, MAY, MUST, REQUIRED, etc.)
   - Describe specific, verifiable behavior or capability
   - Could be objectively tested through software testing or attestation

3. EXCLUDE the following types of content:
   - General introductory or conclusive/summarization comments
   - Implementation guidance or explanatory text
   - Examples or sample queries
   - Duplicate requirements (consolidate similar requirements)
   - Information about resource relationships without conformance statements
   - General structural information about profiles or resources
   - Requirements fragments that should be part of a single testable requirement

4. For each requirement, include:
   - A clear, concise statement of what MUST, SHOULD, MAY, SHALL, etc. be implemented
   - The actor responsible (Server, Client, Application, etc.)
   - The conformance level (SHALL, SHOULD, MAY, MUST, REQUIRED, etc.)

5. Format each requirement consistently:
   - Use active voice
   - Begin with the actor (e.g., "Server SHALL...")
   - Make each requirement atomic and independently testable
   - Ensure requirements are implementation-neutral

After filtering, verify that each requirement in your final list represents a discrete, testable capability or constraint that would be appropriate for conformance testing.

Format each requirement as shown below, renumbering the requirement IDs sequentially, starting with 01:
    
    ---
    # REQ-XX
    **Summary**: [summary text]
    **Description**: "[description text]"
    **Verification**: [method]
    **Actor**: [actor]
    **Conformance**: [SHALL/SHOULD/MAY/etc.]
    **Conditional**: [True/False]
    **Source**: [reference]
    ---

Do not include any other text in the response output, besides the requirements list. 

Here is the list of requirements to refine:

{requirements_list}
"""


In [56]:

@retry(
    wait=wait_exponential(multiplier=2, min=4, max=360),
    stop=stop_after_attempt(8),
    retry=retry_if_exception_type((RateLimitError, TimeoutError))
)
def make_api_request(client, api_type: str, content: str) -> str:
    """Make API request with retries"""
    
    config = API_CONFIGS[api_type]
    prompt = get_requirements_refinement_prompt(content)
    
    try:
        if api_type == "claude":
            response = client.messages.create(
                model=config["model_name"],
                max_tokens=config["max_tokens"],
                temperature=config["temperature"],  # apply the configured temperature, as for Gemini and GPT
                messages=[{
                    "role": "user", 
                    "content": prompt
                }],
                system=SYSTEM_PROMPTS[api_type]
            )
            return response.content[0].text
            
        elif api_type == "gemini":
            response = client.generate_content(
                prompt,
                generation_config={
                    "max_output_tokens": config["max_tokens"],
                    "temperature": config["temperature"]
                }
            )
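            try:
                # response.text raises ValueError when the response has no valid text part
                return response.text
            except ValueError:
                if response.candidates and response.candidates[0].content.parts:
                    return response.candidates[0].content.parts[0].text
                raise ValueError("No response generated from Gemini API")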
                    
        elif api_type == "gpt":
            response = client.chat.completions.create(
                model=config["model"],
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPTS[api_type]},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=config["max_tokens"],
                temperature=config["temperature"]
            )
            return response.choices[0].message.content
            
    except Exception as e:
        logger.error(f"Error in {api_type} API request: {str(e)}")
        raise
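
To get a feel for how long the retry policy above can block, the waits can be estimated by hand. The sketch below assumes the wait before each retry grows as `multiplier * 2 ** (attempt - 1)`, clamped to the `min` and `max` bounds; that formula is an assumption about how `wait_exponential` behaves and may differ between tenacity versions. `backoff_schedule` is a hypothetical helper, not part of tenacity.

```python
def backoff_schedule(attempts, multiplier=2, minimum=4, maximum=360):
    """Return the assumed wait (seconds) before each retry.

    With `attempts` total tries there are `attempts - 1` waits,
    because no wait follows the final failed attempt.
    """
    waits = []
    for attempt in range(1, attempts):
        raw = multiplier * 2 ** (attempt - 1)
        # Clamp the raw value to the configured bounds.
        waits.append(max(minimum, min(raw, maximum)))
    return waits
```

Under these assumptions, eight attempts give waits of 4, 4, 8, 16, 32, 64 and 128 seconds, so a request that keeps hitting rate limits could take a little over four minutes before the error is raised.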


In [57]:

def refine_requirements(input_file: str, api_type: str = "claude") -> Dict[str, Any]:
    """
    Refine requirements using the specified API
    
    Args:
        input_file: Path to the input requirements list markdown file
        api_type: The API to use ("claude", "gemini", or "gpt")
        
    Returns:
        Dict containing processing results and path to refined requirements
    """
    logger.info(f"Starting requirements refinement with {api_type}")
    
    # Validate input file
    input_path = Path(input_file)
    if not input_path.exists():
        raise FileNotFoundError(f"Input file not found: {input_file}")
    
    # Read input requirements
    with open(input_path, 'r', encoding='utf-8') as f:
        requirements_content = f.read()
    
    # Initialize API clients
    clients = setup_clients()
    if api_type not in clients or clients[api_type] is None:
        raise ValueError(f"API client for {api_type} is not available")
    
    client = clients[api_type]
    
    try:
        # Process the requirements
        logger.info(f"Sending requirements to {api_type} for refinement...")
        refined_requirements = make_api_request(client, api_type, requirements_content)
        
        # Generate output filename
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        output_filename = f"refined_requirements_{api_type}_{timestamp}.md"
        output_file_path = OUTPUT_DIR / output_filename
        
        # Save refined requirements
        with open(output_file_path, 'w', encoding='utf-8') as f:
            f.write(refined_requirements)
        
        logger.info(f"Requirements refinement complete. Output saved to: {output_file_path}")
        
        return {
            "input_file": str(input_path),
            "output_file": str(output_file_path),
            "api_used": api_type,
            "timestamp": timestamp
        }
        
    except Exception as e:
        logger.error(f"Error refining requirements: {str(e)}")
        raise
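
The file written by `refine_requirements` is expected to follow the `REQ-XX` template requested in the prompt. A quick way to check the result, for example that the count falls in the 15-50 range, is to parse it back into dictionaries. This is a minimal sketch that assumes the model followed the template exactly; `parse_requirements` and the two patterns are hypothetical names.

```python
import re

# Matches lines such as "**Summary**: some text".
FIELD_PATTERN = re.compile(r"^\s*\*\*(\w+)\*\*:\s*(.*)$")
# Matches lines such as "# REQ-01".
ID_PATTERN = re.compile(r"^\s*#\s*(REQ-\d+)\s*$")


def parse_requirements(text):
    """Parse refined-requirements markdown into a list of dicts."""
    requirements = []
    current = None
    for line in text.splitlines():
        id_match = ID_PATTERN.match(line)
        if id_match:
            # A new "# REQ-XX" heading starts a new requirement.
            current = {"id": id_match.group(1)}
            requirements.append(current)
            continue
        field_match = FIELD_PATTERN.match(line)
        if field_match and current is not None:
            current[field_match.group(1).lower()] = field_match.group(2).strip()
    return requirements
```

Reading the saved file with `Path(result["output_file"]).read_text(encoding="utf-8")` and passing the text to `parse_requirements` gives a list whose length can be compared against the expected range.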


In [58]:

def run_refinement():
    """Run the refinement process with user input"""
    print("\n" + "="*80)
    print("FHIR Requirements Refinement Tool")
    print("="*80)
    
    # Get input file path
    default_input_dir = CURRENT_DIR / "processed_output"
    if default_input_dir.exists():
        # List markdown files in the default directory
        md_files = list(default_input_dir.glob("*requirements_list*.md"))
        if md_files:
            print("\nAvailable requirements files:")
            for idx, file in enumerate(md_files, 1):
                print(f"{idx}. {file.name}")
            
            # Let user select from the list or enter a custom path
            selection = input("\nSelect a file number or enter a full path to a requirements file: ")
            
            if selection.isdigit() and 1 <= int(selection) <= len(md_files):
                input_file = str(md_files[int(selection)-1])
            else:
                input_file = selection
        else:
            input_file = input("\nEnter path to requirements markdown file: ")
    else:
        input_file = input("\nEnter path to requirements markdown file: ")
    
    # Select the API to use
    print("\nSelect the API to use:")
    print("1. Claude")
    print("2. Gemini")
    print("3. GPT-4")
    api_choice = input("Enter your choice (1-3, default 1): ") or "1"
    
    api_mapping = {
        "1": "claude",
        "2": "gemini",
        "3": "gpt"
    }
    
    api_type = api_mapping.get(api_choice, "claude")
    
    try:
        # Run the refinement
        print(f"\nProcessing requirements with {api_type.capitalize()}...")
        result = refine_requirements(input_file, api_type)
        
        print("\n" + "="*80)
        print("Requirements Refinement Complete!")
        print(f"Input file: {result['input_file']}")
        print(f"Refined requirements saved to: {result['output_file']}")
        print(f"API used: {result['api_used']}")
        print("="*80)
        
        return result
    
    except Exception as e:
        logger.error(f"Error: {str(e)}")
        print(f"\nError occurred during refinement: {str(e)}")
        print("Check the log for more details.")
        return None

if __name__ == "__main__":
    # The script can be imported and used without automatically executing
    print("Import this module and call run_refinement() to start the process")

Import this module and call run_refinement() to start the process


In [60]:
# Run the interactive version
result = run_refinement()


FHIR Requirements Refinement Tool

Available requirements files:
1. plan_net_gemini_requirements_list_20250402_132718.md
2. plan_net_gemini_requirements_list_20250312_141125.md
3. plan_net_gemini_requirements_list_20250402_124101.md
4. plan_net_claude_requirements_list_20250402_113844.md
5. plan_net_gpt_requirements_list_20250402_135527.md
6. plan_net_gemini_requirements_list_20250402_115052.md
7. plan_net_claude_requirements_list_20250402_144346.md
8. plan_net_gpt_requirements_list_20250402_130154.md
9. plan_net_claude_requirements_list_20250312_142943.md
10. da_vinci_pdex_plan_net_gemini_requirements_list_20250312_135622.md
11. plan_net_claude_requirements_list_20250319_115141.md
12. plan_net_gpt_requirements_list_20250402_123150.md
13. plan_net_gemini_requirements_list_20250402_145733.md
14. plan_net_gpt_requirements_list_20250319_113850.md
15. plan_net_gemini_requirements_list_20250402_133857.md
16. plan_net_claude_requirements_list_20250319_111650.md

Select the API to use:
1. Cl

2025-04-02 14:58:00,755 - __main__ - INFO - Starting requirements refinement with gemini
2025-04-02 14:58:00,784 - __main__ - INFO - Sending requirements to gemini for refinement...



Processing requirements with Gemini...


2025-04-02 14:58:27,474 - __main__ - INFO - Requirements refinement complete. Output saved to: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction/revised_reqs/refined_requirements_gemini_20250402_145827.md



Requirements Refinement Complete!
Input file: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction/processed_output/plan_net_gemini_20250402_145733.md
Refined requirements saved to: /Users/ceadams/Documents/onclaive/onclaive/reqs_extraction/revised_reqs/refined_requirements_gemini_20250402_145827.md
API used: gemini
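
After several runs, the `revised_reqs` directory accumulates timestamped files. The sketch below shows one way to pick the most recent one. It assumes the `refined_requirements_{api}_{timestamp}.md` naming used above, with a 15-character `YYYYMMDD_HHMMSS` timestamp; `latest_refined_file` is a hypothetical helper.

```python
from pathlib import Path


def latest_refined_file(output_dir, api_type=None):
    """Return the newest refined-requirements file, or None if there is none.

    The timestamp is the last 15 characters of the file stem, so a
    plain string sort orders the files chronologically.
    """
    pattern = f"refined_requirements_{api_type or '*'}_*.md"
    candidates = sorted(Path(output_dir).glob(pattern), key=lambda p: p.stem[-15:])
    return candidates[-1] if candidates else None
```

Passing `api_type="gemini"` restricts the search to Gemini outputs; leaving it out considers files from every API.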
