## FHIR Implementation Guide Testing Pipeline
This notebook runs a pipeline that extracts requirements from FHIR Implementation Guides (IGs) and generates executable test suites from them. The pipeline transforms IG documentation into structured test code that can validate FHIR server implementations.

#### Overview
The pipeline takes FHIR IG documentation and produces test suites in six stages:

- Implementation Guide Preparation: Convert and clean IG HTML documentation to markdown format
- Requirements Extraction: Use AI to identify and extract testable requirements from the IG
- Requirements Refinement: Consolidate and refine the extracted requirements
- Requirements Downselection: Combine multiple requirement sets and remove duplicates
- Test Plan Generation: Convert requirements into detailed test specifications
- Test Kit Generation: Generate executable Inferno test code

#### Running this Notebook
The notebook is structured to run each stage sequentially. You can either:

- Run the complete pipeline: Execute all cells to process a complete IG
- Run individual stages: Execute specific sections as needed

Input and output directories can be customized for each step. The pipeline saves intermediate outputs in checkpoint directories for review and iteration.

#### Output Structure
The pipeline generates organized outputs in checkpoint directories:

checkpoints/

├── markdown1/          # Converted markdown files

├── markdown2/          # Cleaned markdown files  

├── requirements_extraction/   # Initial AI-extracted requirements

├── requirements_revision/    # Refined requirements lists

├── requirements_downselect/  # Final consolidated requirements

├── testplan_generation/     # Detailed test specifications

└── testkit_generation/      # Executable Inferno test suites

Each stage preserves its outputs, allowing for iteration, review, and alternative processing paths.

## Setup

#### Importing required packages

In [None]:
import inspect
import json
import llm_utils
import importlib
import tiktoken
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
from glob import glob

#### Initializing LLM clients

In [None]:
importlib.reload(llm_utils)
llm_clients = llm_utils.LLMApiClient()

In [3]:
llm_clients.clients

{'claude': <anthropic.Anthropic at 0x105de6900>,
 'gemini': genai.GenerativeModel(
     model_name='models/gemini-1.5-pro',
     generation_config={'max_output_tokens': 8192, 'temperature': 0.3},
     safety_settings={<HarmCategory.HARM_CATEGORY_HARASSMENT: 7>: <HarmBlockThreshold.BLOCK_NONE: 4>, <HarmCategory.HARM_CATEGORY_HATE_SPEECH: 8>: <HarmBlockThreshold.BLOCK_NONE: 4>, <HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: 9>: <HarmBlockThreshold.BLOCK_NONE: 4>, <HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: 10>: <HarmBlockThreshold.BLOCK_NONE: 4>},
     tools=None,
     system_instruction=None,
     cached_content=None
 ),
 'gpt': <openai.OpenAI at 0x110923d40>}

## Implementation Guide Preparation

### Stage 1: Text Extraction and Cleaning
- Converts HTML IG files to markdown format
- Cleans unnecessary content (navigation, headers, formatting artifacts)
- Prepares clean, structured text for AI processing

Inputs: HTML files from FHIR IG downloads

Outputs: Clean markdown files
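
As a rough illustration of what this stage does, the sketch below converts a small HTML fragment to markdown while dropping navigation and footer content. It is not the code in `html_narrative_extractor_01` or `markdown_cleaner_02`; the class `NarrativeExtractor`, the function `html_to_markdown`, and the list of skipped tags are assumptions made for this example, and it only handles flat headings, paragraphs, and list items.

```python
from html.parser import HTMLParser

# Tags treated as page chrome and dropped (an assumption for this sketch).
SKIPPED_TAGS = {"nav", "header", "footer", "script", "style"}
# Markdown prefix emitted for each supported block-level tag.
PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "h4": "#### ", "li": "- ", "p": ""}


class NarrativeExtractor(HTMLParser):
    """Collect headings, paragraphs, and list items as markdown lines."""

    def __init__(self):
        super().__init__()
        self.lines = []       # finished markdown lines
        self._buffer = []     # text of the block currently being read
        self._prefix = ""     # markdown prefix for the current block
        self._skip_depth = 0  # greater than 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in PREFIX:
            self._flush()  # emit any stray text seen before this block
            self._prefix = PREFIX[tag]

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in PREFIX:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buffer.append(data)

    def _flush(self):
        # Collapse runs of whitespace and emit the block if it has any text.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.lines.append(self._prefix + text)
        self._buffer, self._prefix = [], ""


def html_to_markdown(html: str) -> str:
    """Return a markdown rendering of the narrative content in html."""
    parser = NarrativeExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit trailing text that was not closed by a block tag
    return "\n\n".join(parser.lines)


sample = (
    "<body><nav><a>Home</a></nav><h2>Conformance</h2>"
    "<p>Servers <b>SHALL</b> support   read.</p>"
    "<ul><li>vread</li></ul><footer>Page footer</footer></body>"
)
print(html_to_markdown(sample))
```

Nested structures, tables, and links are deliberately ignored here; a real converter would need to handle them.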

#### 1a) HTML to Markdown Conversion

In [4]:
import html_narrative_extractor_01 #import html extractor module

# Process directory with default settings
result = html_narrative_extractor_01.convert_local_html_to_markdown(
    input_dir="../us-core/test_set", #input directory of downloaded IG HTML files
    output_dir="checkpoints/demo/markdown1/" #output directory
)

Found 1 HTML files to process
Processed 1/1 files
Conversion complete. Successfully processed 1 files. Encountered 0 errors.


#### 1b) Markdown Post-processing

In [5]:
import markdown_cleaner_02 #import markdown cleaner module
markdown_cleaner_02.process_directory(
    input_dir="checkpoints/demo/markdown1", #input directory of IG markdown files
    output_dir="checkpoints/demo/markdown2/") #output directory

Found 1 markdown files in checkpoints/demo/markdown1
Cleaned and saved: checkpoints/demo/markdown2/CapabilityStatement-us-core-server.md

Processing complete: 1 files successfully cleaned, 0 failed


{'total_files': 1, 'successful': 1, 'failed': 0, 'failed_files': []}

## Stage 2: Requirements Extraction

### 2a) Prompt-based Requirement Extraction
LLM Requirements Identification
- Processes markdown files using LLM to extract clear, testable requirements
- Formats requirements according to set standards, following INCOSE guidance
- Generates structured requirements with IDs, descriptions, actors, and conformance levels
- Handles large documents through chunking
- Provides source tracking

Inputs: Cleaned IG markdown files

Outputs: Structured requirements list as markdown file
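
The chunking and source tracking described above can be pictured with a minimal sketch. It assumes chunks are split on blank lines and that tokens are estimated at roughly four characters each, which is consistent with the sizes the refinement stage logs (10,648 characters reported as about 2,662 tokens). The function names and the `max_tokens` default are hypothetical and are not taken from `reqs_extraction_03`.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token."""
    return len(text) // 4


def chunk_markdown(text: str, source: str, max_tokens: int = 2000):
    """Split markdown on blank lines into chunks under a token budget.

    Each chunk records its source file and position so that extracted
    requirements can be traced back to where they came from.
    """
    chunks, current = [], []
    for paragraph in text.split("\n\n"):
        candidate = "\n\n".join(current + [paragraph])
        if current and estimate_tokens(candidate) > max_tokens:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append("\n\n".join(current))
            current = [paragraph]
        else:
            current.append(paragraph)
    if current:
        chunks.append("\n\n".join(current))
    return [
        {"source": source, "chunk": i + 1, "of": len(chunks), "text": chunk}
        for i, chunk in enumerate(chunks)
    ]


document = "\n\n".join(["a" * 40] * 5)
for piece in chunk_markdown(document, "example.md", max_tokens=25):
    print(f"{piece['source']}: chunk {piece['chunk']}/{piece['of']}")
```

A single paragraph larger than the budget becomes its own oversized chunk in this sketch; splitting inside paragraphs is left out for brevity.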

In [6]:
import reqs_extraction_03 #import LLM requirements extraction module
importlib.reload(reqs_extraction_03)

<module 'reqs_extraction_03' from '/Users/ceadams/Documents/onclaive/onclaive/pipeline/reqs_extraction_03.py'>

In [7]:
reqs_extraction_03.run_requirements_extractor(
    markdown_dir='checkpoints/demo/condition_profile', #input directory of markdown files
    output_dir='checkpoints/demo/requirements_extraction/', #output directory
    api_type='claude', #set API type
    client_instance=llm_clients) #pass the initialized LLM clients


Processing Implementation Guide with Claude...
This may take several minutes depending on the size of the Implementation Guide.

[1/1] Processing single file: ConditionProfile-CapabilityStatement-us-core-server.md
    Processing chunk 1/1

KeyboardInterrupt: 

### 2b) Requirements Refinement
LLM-Based Requirements Review & Consolidation
- Filters and identifies only testable requirements from raw extractions
- Consolidates duplicate requirements and merges related ones
- Applies consistent formatting and structure
- Removes non-testable assertions and narrative content

Inputs: Raw requirements from extraction stage in markdown format

Outputs: Refined requirements list in markdown format
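
The refinement step processes requirements in batches and renumbers them after merging. The sketch below shows one way that batching and renumbering could work; `split_into_batches` and `merge_and_renumber` are hypothetical helpers for illustration, not functions from `reqs_reviewer_04`, and the dictionary layout of a requirement is assumed.

```python
def split_into_batches(requirements, batch_size=25):
    """Return consecutive slices of at most batch_size requirements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        requirements[start:start + batch_size]
        for start in range(0, len(requirements), batch_size)
    ]


def merge_and_renumber(batch_results):
    """Flatten per-batch results and assign sequential REQ-NNN identifiers."""
    merged = [req for batch in batch_results for req in batch]
    return [
        {**req, "id": f"REQ-{number:03d}"}
        for number, req in enumerate(merged, start=1)
    ]


# Thirty placeholder requirements, all with a stale identifier.
requirements = [{"id": "REQ-000", "text": f"requirement {n}"} for n in range(30)]
batches = split_into_batches(requirements, batch_size=25)
for number, batch in enumerate(batches, start=1):
    print(f"BATCH {number}/{len(batches)}: {len(batch)} requirements")

final = merge_and_renumber(batches)
print(final[0]["id"], final[-1]["id"])
```

With 30 requirements and a batch size of 25, this yields two batches of 25 and 5.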

In [8]:
# import requirements refinement script as module
import reqs_reviewer_04
importlib.reload(reqs_reviewer_04)

<module 'reqs_reviewer_04' from '/Users/ceadams/Documents/onclaive/onclaive/pipeline/reqs_reviewer_04.py'>

In [9]:
result = reqs_reviewer_04.run_batch_requirements_refinement(
    input_file="checkpoints/demo/requirements_extraction/reqs_list_v1.md", #input requirements list markdown file
    output_dir="checkpoints/demo/requirements_revision/", #output directory
    client_instance=llm_clients,  #pass the initialized LLM clients
    batch_size=25,  #set batch size
    api_type="claude"  #set API type
)

STARTING BATCH PROCESSING
Input: checkpoints/demo/requirements_extraction/reqs_list_v1.md
Output: checkpoints/demo/requirements_revision/
Batch size: 25 requirements
API: claude

File size: 12,817 characters
Splitting requirements...
Found 30 total requirements
Will process in 2 batches

BATCH 1/2
   Requirements: 25 (#1-#25)
   Size: 10,648 chars (~2,662 tokens)


ERROR:root:Error in claude API request: Error code: 500 - {'type': 'error', 'error': {'type': 'api_error', 'message': 'Internal server error'}, 'request_id': None}


   Completed in 208.8s
   Pausing 2s...
   Progress: 1/2 (50.0%)
   ETA: 3.5 minutes remaining

BATCH 2/2
   Requirements: 5 (#26-#30)
   Size: 2,167 chars (~541 tokens)
   Completed in 7.0s
   Progress: 2/2 (100.0%)
   ETA: 0.0 minutes remaining

COMBINING RESULTS
--------------------
Merging batch results and renumbering...
   Processing batch 1 results...
   Processing batch 2 results...
   Renumbered 30 requirements
BATCH PROCESSING COMPLETE!
Output saved: checkpoints/demo/requirements_revision/claude_refined_requirements_20250901_212912.md
Original requirements: 30
Final requirements: 30
Successful batches: 2/2
Failed batches: 0/2
Total time: 3.6 minutes
Average per batch: 108.9 seconds


### 2c) Requirements Downselection
- Combines multiple requirements lists from different extraction runs
- Uses semantic similarity analysis to identify and remove duplicates
- Creates a deduplicated final requirements set

Inputs: Multiple refined requirements files in markdown or JSON format

Outputs: Final consolidated requirements in markdown or JSON format
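
To make the similarity-based deduplication concrete, here is a minimal sketch of threshold filtering on cosine similarity. It uses bag-of-words counts as a stand-in embedding so that it runs with the standard library alone; the pipeline itself loads a sentence transformer model, and its grouping logic may differ from the greedy pass shown here. The names `embed`, `cosine`, and `deduplicate` are assumptions for this example.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: lowercase bag-of-words counts.

    This is NOT a sentence transformer; it only keeps the sketch
    self-contained.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(requirements, threshold=0.98):
    """Keep a requirement only if it scores below threshold against
    every requirement already kept (greedy, first occurrence wins)."""
    kept, vectors = [], []
    for text in requirements:
        vector = embed(text)
        if all(cosine(vector, other) < threshold for other in vectors):
            kept.append(text)
            vectors.append(vector)
    return kept


candidates = [
    "Server SHALL support read",
    "server shall support read",
    "Server SHOULD support vread",
]
print(deduplicate(candidates))
```

A high threshold such as 0.98 removes only near-identical statements, which suits merging repeated extraction runs over the same IG.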

In [10]:
import reqs_downselect_05
importlib.reload(reqs_downselect_05)

<module 'reqs_downselect_05' from '/Users/ceadams/Documents/onclaive/onclaive/pipeline/reqs_downselect_05.py'>

In [11]:
md_files_list=reqs_downselect_05.get_md_files_from_directory("checkpoints/demo/requirements_revision/")

reqs_downselect_05.full_pass(
    md_files=md_files_list,
    output_dir="checkpoints/demo/requirements_downselect"
)

Found 4 files matching '*.md' in checkpoints/demo/requirements_revision/
REQUIREMENT DEDUPLICATION PIPELINE
Markdown files: 4
RAG files: 0
Similarity threshold: 0.98
Output format: markdown

Loading requirements from files...
  Processing: checkpoints/demo/requirements_revision/10claude_refined_requirements_20250829_102851.md
    Loaded 10 requirements
  Processing: checkpoints/demo/requirements_revision/claude_refined_requirements_20250829_102328.md
    Loaded 35 requirements
  Processing: checkpoints/demo/requirements_revision/claude_refined_requirements_20250829_102851.md
    Loaded 45 requirements
  Processing: checkpoints/demo/requirements_revision/claude_refined_requirements_20250901_212912.md
    Loaded 30 requirements

Total requirements loaded: 120
Loading sentence transformer model...
Generating embeddings for 120 requirements...
  Completed embeddings for 120 requirements
Calculating similarity scores...
  Completed 14400 similarity calculations
Finding duplicate groups with

{'input_files': {'markdown': ['checkpoints/demo/requirements_revision/10claude_refined_requirements_20250829_102851.md',
   'checkpoints/demo/requirements_revision/claude_refined_requirements_20250829_102328.md',
   'checkpoints/demo/requirements_revision/claude_refined_requirements_20250829_102851.md',
   'checkpoints/demo/requirements_revision/claude_refined_requirements_20250901_212912.md']},
 'original_count': 120,
 'duplicates_removed': 50,
 'final_count': 70,
 'threshold': 0.98,
 'output_format': 'markdown',
 'output_files': ['checkpoints/demo/requirements_downselect/consolidated_reqs.md'],
 'output_dir': 'checkpoints/demo/requirements_downselect'}

## Stage 3: Test Plan Generation
- Transforms requirements into detailed test specifications
- Analyzes IG capability statements for context
- Generates implementation strategies with specific FHIR operations
- Creates structured test plans with validation criteria

Inputs: Refined requirements and IG capability statements in markdown format

Outputs: Detailed test plan in markdown format
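
During test plan generation, each requirement is used as a query to retrieve the most relevant capability statement chunks. The sketch below illustrates top-k retrieval with Jaccard token overlap as the relevance score. The pipeline uses a vector database for this step, so the scoring function here is a simplification, and `tokenize` and `retrieve_top_k` are hypothetical names.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word-like tokens."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def retrieve_top_k(query, chunks, k=2):
    """Rank chunks by Jaccard overlap with the query and return the best k."""
    query_tokens = tokenize(query)

    def score(chunk):
        chunk_tokens = tokenize(chunk)
        union = query_tokens | chunk_tokens
        return len(query_tokens & chunk_tokens) / len(union) if union else 0.0

    return sorted(chunks, key=score, reverse=True)[:k]


capability_chunks = [
    "Patient: SHALL support read",
    "Condition: SHOULD support vread, history-instance",
    "Observation: SHALL support search-type",
]
query = "US Core server should support vread and history-instance interactions for Condition"
for chunk in retrieve_top_k(query, capability_chunks, k=2):
    print(chunk)
```

The retrieved chunks would then be included in the prompt as context for the requirement being planned.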

In [13]:
import logging
llm_clients.logger.setLevel(logging.INFO)

In [None]:
import warnings
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Set logging level to reduce noise
logging.getLogger("urllib3.connectionpool").setLevel(logging.ERROR)
logging.getLogger("backoff").setLevel(logging.ERROR)

import test_plan_06 #import test plan generation script as module
importlib.reload(test_plan_06)

#clearing any existing capability statements from vector database
test_plan_06.clear_capability_collection("capability_statements")

No existing collection found: capability_statements


INFO:backoff:Backing off send_request(...) for 0.2s (requests.exceptions.SSLError: HTTPSConnectionPool(host='us.i.posthog.com', port=443): Max retries exceeded with url: /batch/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)'))))
INFO:backoff:Backing off send_request(...) for 1.7s (requests.exceptions.SSLError: HTTPSConnectionPool(host='us.i.posthog.com', port=443): Max retries exceeded with url: /batch/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)'))))
INFO:backoff:Backing off send_request(...) for 0.6s (requests.exceptions.SSLError: HTTPSConnectionPool(host='us.i.posthog.com', port=443): Max retries exceeded with url: /batch/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local iss

In [18]:
test_plan_06.generate_consolidated_test_plan(
    client_instance=llm_clients, 
    api_type='claude',
    requirements_file="checkpoints/demo/requirements_downselect/consolidated_reqs.md", #input requirements list markdown file
    capability_statement_file="checkpoints/markdown2/CapabilityStatement-us-core-server.md", 
    ig_name="US Core IG", 
    output_dir='checkpoints/demo/testplan_generation/', 
    verbose=True)


FHIR TEST PLAN GENERATION
Implementation Guide: US Core IG
Requirements file: checkpoints/demo/requirements_downselect/consolidated_reqs.md
Capability file: checkpoints/markdown2/CapabilityStatement-us-core-server.md
API: claude
Output directory: checkpoints/demo/testplan_generation

Loading requirements...
Loaded 11 requirements
Setting up capability knowledge base...
Capability knowledge base ready

Grouping requirements...
  Analyzing requirement 1/11: C

  Completed grouping 11 requirements                    

Requirements organized into 2 groups:
  • I cannot analyze this requirement because all the fields (Summary, Text, Context, Verification, Actor, Conformance, Conditional, and Source) are empty. Without any content describing what the requirement actually specifies, it's impossible to determine which resource profile or category it belongs to.: 1 requirements
  • Condition: 10 requirements

Generating test specifications...

[Condition] Processing 10 requirements...
  [1/11] REQ-001
Processing REQ-001: US Core server should support vread and history-instance interactions for Condition

RAG RETRIEVAL FOR REQ-001
Query: US Core server should support vread and history-instance interactions for Condition "SHOULD support `vread`, `history-instance`." Defining recommended interaction capabilities for Condition resources in US Core Server CapabilityStatement US Core Server FHIR SHOULD
Searching for 2 most relevant capability chunks...



{'requirements_count': 11,
 'group_count': 2,
 'test_plan_path': 'checkpoints/demo/testplan_generation/test_plan.md'}

## Stage 4: Test Kit Generation
- Converts test specifications into executable Inferno Ruby tests
- Generates complete test suites with proper file organization
- Creates modular test structures following Inferno framework patterns
- Includes validation and alignment checking

Inputs: Test plan specification in markdown format

Outputs: Complete Inferno test kit
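
Each run writes its test kit to a timestamped directory. The sketch below reproduces the naming pattern visible in the outputs of this notebook (for example `claude_testkit_20250901_215109/claude_us_core_20250901_215109.rb`). The helpers `slugify` and `build_test_kit_paths` are assumptions made for illustration; `test_kit_07` may derive these paths differently.

```python
import re
from datetime import datetime
from pathlib import Path


def slugify(name):
    """Lowercase the name and replace non-alphanumeric runs with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def build_test_kit_paths(output_dir, api_type, ig_name, when):
    """Return the run directory, module directory, and module file paths."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    slug = slugify(ig_name)
    run_dir = Path(output_dir) / f"{api_type}_testkit_{stamp}"
    return {
        "output_dir": run_dir,
        "module_dir": run_dir / slug,
        "module_file": run_dir / f"{api_type}_{slug}_{stamp}.rb",
    }


paths = build_test_kit_paths(
    "checkpoints/demo/testkit_generation/",
    api_type="claude",
    ig_name="US Core",
    when=datetime(2025, 9, 1, 21, 51, 9),
)
for label, path in paths.items():
    print(f"{label}: {path.as_posix()}")
```

Because the timestamp is part of every path, repeated runs do not overwrite earlier test kits.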

In [24]:
import test_kit_07
importlib.reload(test_kit_07)

<module 'test_kit_07' from '/Users/ceadams/Documents/onclaive/onclaive/pipeline/test_kit_07.py'>

Without LLM Self-Evaluation

In [25]:
# Faster generation: no LLM self-evaluation
test_kit_07.generate_inferno_test_kit(
    client_instance=llm_clients, #pass the initialized LLM clients
    api_type='claude',  #set API type
    test_plan_file='checkpoints/demo/testplan_generation/test_plan.md',  #input test plan file
    ig_name='US Core',
    output_dir='checkpoints/demo/testkit_generation/',
    enable_validation=False  #disable LLM self-evaluation
)


INFERNO TEST KIT GENERATION
Module: US Core
Test plan: checkpoints/demo/testplan_generation/test_plan.md
API: claude
LLM Validation: Disabled

Parsing test plan...
Found 10 requirements in test plan
Processing requirement: REQ-001
Added requirement REQ-001 to section Condition
Processing requirement: REQ-002
Added requirement REQ-002 to section Condition
Processing requirement: REQ-003
Added requirement REQ-003 to section Condition
Processing requirement: REQ-004
Added requirement REQ-004 to section Condition
Processing requirement: REQ-005
Added requirement REQ-005 to section Condition
Processing requirement: REQ-006
Added requirement REQ-006 to section Condition
Processing requirement: REQ-007
Added requirement REQ-007 to section Condition
Processing requirement: REQ-008
Added requirement REQ-008 to section Condition
Processing requirement: REQ-009
Added requirement REQ-009 to section Condition
Processing requirement: REQ-010
Added requirement REQ-010 to section Condition
Final sect



  Generated 10 tests

Generated 10 total tests
Writing test files...
  Created section: condition with 10 tests
Analyzing generated test files...
Generating main module file...
Skipping alignment validation (disabled)

TEST KIT GENERATION COMPLETE!
Output directory: checkpoints/demo/testkit_generation/claude_testkit_20250901_215109
Module file: claude_us_core_20250901_215109.rb
Total sections: 1
Total requirements: 10
Generated tests: 10
LLM validation: Disabled


{'total_sections': 1,
 'total_requirements': 10,
 'generated_tests': 10,
 'module_dir': 'checkpoints/demo/testkit_generation/claude_testkit_20250901_215109/us_core',
 'module_file': 'checkpoints/demo/testkit_generation/claude_testkit_20250901_215109/claude_us_core_20250901_215109.rb',
 'output_dir': 'checkpoints/demo/testkit_generation/claude_testkit_20250901_215109',
 'timestamp': '20250901_215109',
 'validation_enabled': False}

With LLM Self-Evaluation

In [26]:
# Thorough generation: with LLM self-evaluation
test_kit_07.generate_inferno_test_kit(
    client_instance=llm_clients, #pass the initialized LLM clients
    api_type='claude',  #set API type
    test_plan_file='checkpoints/demo/testplan_generation/test_plan.md',  #input test plan file
    ig_name='US Core',
    output_dir='checkpoints/demo/testkit_generation/',
    enable_validation=True  #enable LLM self-evaluation
)


INFERNO TEST KIT GENERATION
Module: US Core
Test plan: checkpoints/demo/testplan_generation/test_plan.md
API: claude
LLM Validation: Enabled

Parsing test plan...
Found 10 requirements in test plan
Processing requirement: REQ-001
Added requirement REQ-001 to section Condition
Processing requirement: REQ-002
Added requirement REQ-002 to section Condition
Processing requirement: REQ-003
Added requirement REQ-003 to section Condition
Processing requirement: REQ-004
Added requirement REQ-004 to section Condition
Processing requirement: REQ-005
Added requirement REQ-005 to section Condition
Processing requirement: REQ-006
Added requirement REQ-006 to section Condition
Processing requirement: REQ-007
Added requirement REQ-007 to section Condition
Processing requirement: REQ-008
Added requirement REQ-008 to section Condition
Processing requirement: REQ-009
Added requirement REQ-009 to section Condition
Processing requirement: REQ-010
Added requirement REQ-010 to section Condition
Final secti



  Generated 10 tests

Generated 10 total tests
Writing test files...
  Created section: condition with 10 tests
Analyzing generated test files...
Generating main module file...
Performing alignment validation...
  Alignment validation completed successfully
  Applied fixes to module file

TEST KIT GENERATION COMPLETE!
Output directory: checkpoints/demo/testkit_generation/claude_testkit_20250901_215618
Module file: claude_us_core_20250901_215618.rb
Total sections: 1
Total requirements: 10
Generated tests: 10
LLM validation: Enabled


{'total_sections': 1,
 'total_requirements': 10,
 'generated_tests': 10,
 'module_dir': 'checkpoints/demo/testkit_generation/claude_testkit_20250901_215618/us_core',
 'module_file': 'checkpoints/demo/testkit_generation/claude_testkit_20250901_215618/claude_us_core_20250901_215618.rb',
 'output_dir': 'checkpoints/demo/testkit_generation/claude_testkit_20250901_215618',
 'timestamp': '20250901_215618',
 'validation_enabled': True}