## Meta-Summarization of IG Documents
This script aims to develop a prompt chain structure to send large amounts of text/content to LLM APIs through multiple calls. 

The current approach takes in all JSON files from the Plan Net IG, all figure diagrams, and key narrative information in markdown form (previously extracted from HTML files). The script then summarizes each type of information in batches and creates a meta-summarization of all documents that outlines the technical information it can glean from the submitted documentation. The goal is to determine whether this approach can produce all the technical information, at an appropriate level of detail, that an LLM would need to help design a test kit for a given IG. Best practices for the Claude API were followed.

Current status: We were able to run through the script fully one time using the Claude API with all JSONs, markdown content, and images. The process took over 93 minutes. The summarization is saved in the file final_technical_analysis.md and is pasted at the end of this script. We can see that the first iteration did not produce detailed enough information about requirements, etc. 

Need to do: The prompts need further editing so that this level of detail is preserved in the output of each summarization step. In addition, this process will have to be configured for the Gemini and GPT APIs for comparison. We will also need to revise this work to allow additional prompting to confirm understanding of an IG and to support test kit development, and to allow additional documents to be included, such as a 'golden rules' document.


### Functions of Notebook:
JSON file organization:
- Copies relevant JSON files from source to working directory
- Groups related files based on their names
- Creates organized folder structure
- Excludes certain file types (like .ttl.json, .jsonld.json)

JSON Consolidation:
- Combines related JSON files into single consolidated files
- Creates files like Organization_combined.json, StructureDefinition_combined.json, etc.

JSON Processing:
- Splits large JSON files into manageable chunks
- Processes each chunk through Claude
- Combines chunk analyses into coherent summaries

Markdown processing:
- Processes documentation files
- Extracts key technical information
- Creates summaries of documentation content

Image processing:
- Processes technical diagrams and figures
- Creates descriptions of visual technical content

Meta-analysis creation:
- Combines all processed information
- Creates comprehensive technical analysis covering:
    - Technical requirements and architecture
    - Implementation details
    - Visual documentation analysis

Output generation:
- Creates output directory
- Saves final analysis as markdown file
- Includes comprehensive technical documentation

### Setup

In [31]:
# import packages
# Standard library
import base64
import io
import json
import logging  # used by the helper functions defined below
import math
import os
import re
import shutil
import threading
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Dict, Tuple, Union, Optional

# Third-party
import httpx
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from anthropic import Anthropic, RateLimitError
from dotenv import load_dotenv
from IPython.display import Image
from json_repair import repair_json
from langchain_community.document_loaders import BSHTMLLoader
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type
#import google.generativeai as gemini
#from openai import OpenAI


Read in API keys for Claude, Gemini, and GPT from .env file

In [32]:
load_dotenv()

claude_api_key = os.getenv('ANTHROPIC_API_KEY')
#gemini_api_key = os.getenv('GEMINI_API_KEY')
#OpenAI.api_key = os.getenv('OPENAI_API_KEY')

Set up Claude API

In [33]:
claude = Anthropic(api_key = claude_api_key)
claude_version = "claude-3-5-sonnet-20240620"  # "claude-3-opus-20240229"   "claude-3-5-sonnet-20240620" "claude-3-sonnet-20240229" "claude-3-haiku-20240307"
claude_max_output_tokens = 8192  # claude 3 opus is only 4096 tokens, sonnet is 8192

In [34]:
#CERT_PATH = '/Users/amathur/ca-certificates.crt'
CERT_PATH = '/opt/homebrew/etc/openssl@3/cert.pem'

In [35]:
def create_anthropic_client():
    """Create Anthropic client with proper certificate verification"""
    verify_path = CERT_PATH if os.path.exists(CERT_PATH) else True
    http_client = httpx.Client(
        verify=verify_path,
        timeout=30.0
    )
    return Anthropic(
        api_key=claude_api_key,
        http_client=http_client
    )

### Pulling in files of interest

In [36]:
source_folder = 'full-ig/site'
destination_folder = 'full-ig/json_only'

In [42]:
def copy_json_files():
    """
    Copy JSON files from full-ig/site to full-ig/json_only directory,
    excluding compound extensions and creating the directory if needed
    """
    source_folder = 'full-ig/site'
    destination_folder = 'full-ig/json_only'

    # Create the destination folder if it doesn't exist
    if not os.path.exists(destination_folder):
        os.makedirs(destination_folder)

    json_files = []
    for file_name in os.listdir(source_folder):
        # Check if the file ends with .json but not with compound extensions
        if file_name.endswith('.json') and not (file_name.endswith('.ttl.json') or 
                                             file_name.endswith('.jsonld.json') or 
                                             file_name.endswith('.xml.json') or 
                                             file_name.endswith('.change.history.json')):
            json_files.append(file_name)
            # Copy the file to the destination folder
            shutil.copy(os.path.join(source_folder, file_name), destination_folder)
            
    logging.info(f"Copied {len(json_files)} JSON files to {destination_folder}")
    return json_files

def group_files_by_base_name(directory_path, delimiter='-'):
    """
    Group files in the directory by their base name (portion before a delimiter).
    
    Args:
    directory_path (str): Path to the directory containing files.
    delimiter (str): The delimiter to split the file name on (default is '-').

    Returns:
    dict: A dictionary where keys are base names and values are lists of files that share the same base name.
    """
    grouped_files = defaultdict(list)
    
    # Iterate through the files in the directory
    for filename in os.listdir(directory_path):
        if filename.endswith('.json'):  # Only process .json files
            if delimiter in filename:  # Only consider files with the delimiter
                # Get the base name (before the first delimiter)
                base_name = filename.split(delimiter)[0]
                
                # Append the file to the group corresponding to its base name
                grouped_files[base_name].append(filename)
    
    return grouped_files

def copy_files_to_folders(directory_path, grouped_files):
    """
    Copy each group of files into a folder named after its base name.
    The original files are left in place in directory_path.
    
    Args:
    directory_path (str): Path to the directory containing files.
    grouped_files (dict): Dictionary of grouped files by base name.
    """
    for base_name, files in grouped_files.items():
        if len(files) >= 1:  # Process groups with one or more files
            # Create a folder for the base name in the same directory
            base_folder = os.path.join(directory_path, base_name)
            if not os.path.exists(base_folder):
                os.makedirs(base_folder)  # Create the folder if it doesn't exist
                logging.info(f"Created folder: {base_folder}")
            
            # Copy each file in the group to the new folder
            for file in files:
                source_file = os.path.join(directory_path, file)
                destination_file = os.path.join(base_folder, file)
                shutil.copy(source_file, destination_file)  # Copy the file
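
To see how the grouping step behaves, the sketch below applies the same split-on-first-delimiter rule to a handful of file names. The file names are illustrative assumptions, not files confirmed to exist in the IG, and `group_names` is a hypothetical in-memory helper that mirrors `group_files_by_base_name` without touching the file system.

```python
from collections import defaultdict

def group_names(filenames, delimiter='-'):
    """Group file names by the text before the first delimiter (in-memory sketch)."""
    grouped = defaultdict(list)
    for name in filenames:
        # Skip non-JSON files and names without the delimiter, as the notebook function does
        if name.endswith('.json') and delimiter in name:
            grouped[name.split(delimiter)[0]].append(name)
    return dict(grouped)

# Illustrative file names (assumed for this example)
sample = [
    'Organization-Acme.json',
    'Organization-BigBox.json',
    'StructureDefinition-plannet-Endpoint.json',
    'plan-net.json',
    'package.json',
    'notes.txt',
]
groups = group_names(sample)
print(groups)
```

Because the split happens on the first hyphen, a file whose name merely contains a hyphen (such as the assumed `plan-net.json`) forms its own group. This may explain folders like `plan` and `usage` in the run log further down, which are not FHIR resource types.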

### Preparing JSON Files for LLM

In [44]:
def split_json(json_data, max_size=2000):
    """
    Split JSON array into chunks while maintaining complete JSON objects
    Returns list of chunks, where each chunk contains complete JSON objects
    """
    if isinstance(json_data, dict):
        json_data = [json_data]
    
    chunks = []
    current_chunk = []
    current_size = 0
    
    for item in json_data:
        item_size = len(json.dumps(item))
        
        # Handle large individual items
        if item_size > max_size:
            if current_chunk:
                chunks.append(current_chunk)
                current_chunk = []
                current_size = 0
            chunks.append([item])
            continue
        
        # Start new chunk if current would exceed max_size
        if current_size + item_size > max_size and current_chunk:
            chunks.append(current_chunk)
            current_chunk = []
            current_size = 0
        
        current_chunk.append(item)
        current_size += item_size
    
    if current_chunk:
        chunks.append(current_chunk)
        
    return chunks

def prepare_json_for_processing(json_file_path):
    """Read and prepare JSON file for processing"""
    with open(json_file_path, 'r') as f:
        data = json.load(f)
        
    if isinstance(data, dict) and 'entry' in data:
        return data['entry']
    return data

def create_json_summary_prompt(chunk, chunk_num, total_chunks):
    """Create prompt for summarizing JSON chunk"""
    return f"""Analyze this portion ({chunk_num} of {total_chunks}) of a FHIR Implementation Guide JSON resource bundle.
    Focus on key technical details, requirements, and relationships.
    
    JSON Content:
    {json.dumps(chunk, indent=2)}
    
    Please provide:
    1. Resource Types and Profiles present
    2. Key technical requirements and constraints
    3. Dependencies and relationships between resources
    4. Notable patterns or unique configurations
    
    Focus on new information not covered in previous chunks."""



def consolidate_jsons(base_directory='full-ig/json_only'):
    """Consolidate related JSON files while maintaining object integrity"""
    subdirs = [d for d in os.listdir(base_directory) 
              if os.path.isdir(os.path.join(base_directory, d))]
    
    for subdir in subdirs:
        folder_path = os.path.join(base_directory, subdir)
        combined_data = []
        
        for filename in os.listdir(folder_path):
            if filename.endswith('.json'):
                file_path = os.path.join(folder_path, filename)
                try:
                    with open(file_path, 'r') as f:
                        json_content = json.load(f)
                        if isinstance(json_content, dict) and 'entry' in json_content:
                            combined_data.extend(json_content['entry'])
                        else:
                            combined_data.append(json_content)
                except json.JSONDecodeError as e:
                    logging.error(f"Error decoding JSON from {filename}: {e}")
                    continue
        
        if combined_data:
            output_filename = f"{subdir}_combined.json"
            output_path = os.path.join(base_directory, output_filename)
            
            try:
                with open(output_path, 'w') as outfile:
                    json.dump({
                        "resourceType": subdir,
                        "total": len(combined_data),
                        "entry": combined_data
                    }, outfile, indent=2)
                logging.info(f"Created {output_filename} with {len(combined_data)} entries")
            except Exception as e:
                logging.error(f"Error writing {output_filename}: {e}")

def encode_image(image_path):
    """Convert image to base64 encoding for API consumption"""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
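
The chunking rule used by `split_json` can be illustrated on a small in-memory list. The sketch below is a compact, hypothetical variant (`chunk_items`) that should produce the same grouping. Sizes are measured in characters of the serialized JSON, not in tokens, and the sample data is invented for the example.

```python
import json

def chunk_items(items, max_size=2000):
    """Greedy chunking: pack whole JSON objects until the serialized size would exceed max_size."""
    chunks, current, current_size = [], [], 0
    for item in items:
        size = len(json.dumps(item))  # size in characters, not tokens
        # Close the current chunk if adding this item would overflow it
        if current and current_size + size > max_size:
            chunks.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        chunks.append(current)
    return chunks

# Illustrative data: three small objects followed by one oversized object
items = [{"id": i, "text": "x" * 20} for i in range(3)] + [{"id": 99, "text": "y" * 200}]
chunks = chunk_items(items, max_size=100)
print([len(c) for c in chunks])
```

An object larger than `max_size` is never split. It is emitted as a chunk of its own, so `max_size` acts as a soft limit, and a single large StructureDefinition can still produce a prompt well above the intended size.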



#### Defining Rate Limiting & Safe Call Functions

In [None]:
def create_rate_limiter(max_requests_per_minute=50):
    """Create a simple rate limiter"""
    class RateLimiter:
        def __init__(self):
            self.requests = []
            self.max_requests = max_requests_per_minute
            self.time_window = 60  # seconds

        def wait_if_needed(self):
            now = time.time()
            # Remove requests older than our time window
            self.requests = [req_time for req_time in self.requests 
                           if now - req_time < self.time_window]
            
            if len(self.requests) >= self.max_requests:
                # Wait until oldest request expires
                sleep_time = self.time_window - (now - self.requests[0])
                if sleep_time > 0:
                    time.sleep(sleep_time)
                self.requests = self.requests[1:]
                # Re-read the clock so the recorded timestamp reflects the time after waiting
                now = time.time()
            
            self.requests.append(now)
            
    return RateLimiter()

# Create a global rate limiter
rate_limiter = create_rate_limiter(max_requests_per_minute=25)  # More conservative limit

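from anthropic import APITimeoutError  # raised by the SDK when a request times out

@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception_type((RateLimitError, APITimeoutError, TimeoutError))
)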
def safe_claude_request(client, model_name, messages, max_tokens=8192, stop_sequences=None):
    """Make a rate-limited request to Claude with retries"""
    rate_limiter.wait_if_needed()
    try:
        return client.messages.create(
            model=model_name,
            messages=messages,
            max_tokens=max_tokens,
            stop_sequences=stop_sequences
        )
    except Exception as e:
        logging.error(f"Error in Claude request: {str(e)}")
        raise
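
The sliding-window idea behind the rate limiter can be checked without waiting a real minute by injecting the clock and the sleep function. The sketch below is a stand-alone illustration, not the notebook's `RateLimiter`: the class name `SlidingWindowLimiter`, its `clock` and `sleep` parameters, and the fake clock harness are all assumptions introduced for the example.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window of `window` seconds."""

    def __init__(self, max_requests, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock   # injectable so the limiter can be exercised without real waiting
        self.sleep = sleep
        self.stamps = deque()

    def wait_if_needed(self):
        now = self.clock()
        # Drop timestamps that have left the window
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_requests:
            # Sleep until the oldest request leaves the window, then re-read the clock
            self.sleep(self.window - (now - self.stamps[0]))
            self.stamps.popleft()
            now = self.clock()
        self.stamps.append(now)

# Simulated clock: "sleeping" only advances a counter
fake_time = [0.0]
slept = []

def fake_sleep(seconds):
    slept.append(seconds)
    fake_time[0] += seconds

limiter = SlidingWindowLimiter(2, window=60.0, clock=lambda: fake_time[0], sleep=fake_sleep)
for _ in range(3):
    limiter.wait_if_needed()
print(slept, list(limiter.stamps))
```

With a limit of two requests per window, the third call is the first one that has to wait. The timestamp recorded for it is taken after the wait, which keeps the window accounting accurate for later calls.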

#### Setting up Processing Functions

In [None]:

def summarize_markdown(client, content, model_name="claude-3-5-sonnet-20240620"):
    """Process markdown content and generate a technical summary"""
    try:
        # Create a prompt for markdown analysis
        prompt = f"""Analyze this technical documentation markdown content:

        {content}

        Please provide:
        1. Key technical concepts and definitions
        2. Important requirements and specifications
        3. Technical workflows or processes described
        4. Any dependencies or prerequisites mentioned
        5. Notable implementation details or guidelines

        Focus on extracting the most important technical information."""

        response = safe_claude_request(
            client,
            model_name,
            messages=[
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": "Here is the technical summary: <summary>"}
            ],
            max_tokens=8192,
            stop_sequences=["</summary>"]
        )
        
        return response.content[0].text
        
    except Exception as e:
        logging.error(f"Error processing markdown content: {str(e)}")
        return "Error processing markdown content: " + str(e)
    
def clean_markdown(text):
    """Clean markdown content by removing unnecessary whitespace and formatting"""
    # Remove multiple newlines
    text = re.sub(r'\n\s*\n', '\n\n', text)
    
    # Remove HTML comments
    text = re.sub(r'<!--.*?-->', '', text, flags=re.DOTALL)
    
    # Remove excessive punctuation 
    text = re.sub(r'\.{2,}', '.', text)
    
    # Remove escaped characters
    text = re.sub(r'\\(.)', r'\1', text)
    
    # Remove table formatting but keep content
    text = re.sub(r'\|', ' ', text)
    text = re.sub(r'[-\s]*\n[-\s]*', '\n', text)
    
    return text.strip()


def combine_summaries(client, summaries, model_name="claude-3-5-sonnet-20240620"):
    """Combine chunk summaries into a cohesive analysis"""
    try:
        prompt = f"""Synthesize these related summaries into a unified technical analysis:

        {json.dumps(summaries, indent=2)}
        
        Create a comprehensive analysis that:
        1. Eliminates redundant information
        2. Maintains technical accuracy
        3. Includes specific technical information about search parameters for resource types, resource profiles, Must Supports and other requirements
        4. Preserves important relationships, such as specific process flows outlined in image diagrams"""
        
        response = safe_claude_request(
            client,
            model_name,
            messages=[
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": "Here is the combined analysis: <summary>"}
            ],
            max_tokens=8192,
            stop_sequences=["</summary>"]
        )
        
        return response.content[0].text
    except Exception as e:
        logging.error(f"Error combining summaries: {str(e)}")
        return "Unable to combine summaries due to error"
    
def process_json_file(client, json_file_path, model_name="claude-3-5-sonnet-20240620"):
    """Process a JSON file while maintaining object integrity"""
    try:
        json_data = prepare_json_for_processing(json_file_path)
        chunks = split_json(json_data)
        chunk_summaries = []
        
        for i, chunk in enumerate(chunks):
            prompt = create_json_summary_prompt(chunk, i+1, len(chunks))
            try:
                response = safe_claude_request(
                    client,
                    model_name,
                    messages=[
                        {"role": "user", "content": prompt},
                        {"role": "assistant", "content": "Here is the technical summary: <summary>"}
                    ],
                    stop_sequences=["</summary>"]
                )
                chunk_summaries.append(response.content[0].text)
                # Small delay between chunks to stay under the rate limit
                time.sleep(2)
            except Exception as e:
                logging.error(f"Error processing chunk {i+1} of {len(chunks)} for {json_file_path}: {str(e)}")
                continue
        
        if not chunk_summaries:
            return "Unable to process file due to errors"
        
        return combine_summaries(client, chunk_summaries, model_name)
    except Exception as e:
        logging.error(f"Error processing file {json_file_path}: {str(e)}")
        return f"Error processing file: {str(e)}"


def process_image(client, image_path, model_name="claude-3-5-sonnet-20240620"):
    """Process a single image and generate a technical description"""
    try:
        base64_image = encode_image(image_path)
        
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
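                            # Normalize the extension: the API expects "image/jpeg", not "image/jpg"
                            "media_type": "image/" + image_path.rsplit('.', 1)[-1].lower().replace('jpg', 'jpeg'),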
                            "data": base64_image
                        }
                    },
                    {
                        "type": "text",
                        "text": "Analyze this technical diagram/figure. Focus on:\n1. Key components and their relationships\n2. Technical workflows or processes shown\n3. Architecture or design patterns illustrated\n4. Important technical details or annotations\nProvide a detailed technical description."
                    }
                ]
            },
            {
                "role": "assistant",
                "content": "Here is the technical analysis of the image: <summary>"
            }
        ]

        response = safe_claude_request(
            client,
            model_name,
            messages=messages,
            max_tokens=8192,
            stop_sequences=["</summary>"]
        )
        
        return response.content[0].text
        
    except Exception as e:
        logging.error(f"Error processing image {image_path}: {str(e)}")
        raise

def process_content(client, file_path, content_type, model_name="claude-3-5-sonnet-20240620"):
    """Generic content processor that handles any content type"""
    try:
        if content_type == "json":
            json_data = prepare_json_for_processing(file_path)
            chunks = split_json(json_data)
            summaries = []
            
            for i, chunk in enumerate(chunks):
                prompt = create_json_summary_prompt(chunk, i+1, len(chunks))
                response = safe_claude_request(
                    client,
                    model_name,
                    messages=[
                        {"role": "user", "content": prompt},
                        {"role": "assistant", "content": "Here is the technical summary: <summary>"}
                    ],
                    stop_sequences=["</summary>"]
                )
                summaries.append(response.content[0].text)
                time.sleep(2)
            
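            # Pass model_name through so a non-default model is used for every step
            return combine_summaries(client, summaries, model_name) if summaries else "No content processed"
            
        elif content_type == "markdown":
            with open(file_path) as f:
                content = clean_markdown(f.read())
            return summarize_markdown(client, content, model_name)
            
        elif content_type == "image":
            return process_image(client, file_path, model_name)
            
        else:
            raise ValueError(f"Unsupported content type: {content_type}")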
            
    except Exception as e:
        logging.error(f"Error processing {content_type} file {file_path}: {str(e)}")
        return f"Error processing file: {str(e)}"
    

def process_batch(client, files, content_type, batch_size=3):
    """Process any type of files in batches"""
    results = {}
    for i in range(0, len(files), batch_size):
        batch = files[i:i + batch_size]
        for file in batch:
            try:
                results[file] = process_content(client, file, content_type)
            except Exception as e:
                logging.error(f"Error processing {file}: {str(e)}")
                results[file] = f"Error: {str(e)}"
        time.sleep(10)  # Delay between batches
    return results



def process_all_content(client, base_directory='full-ig'):
    """Process all content types using unified batch processing"""
    try:
        # Process JSONs
        json_files = [
            os.path.join(base_directory, 'json_only', f) 
            for f in os.listdir(os.path.join(base_directory, 'json_only')) 
            if f.endswith('_combined.json')
        ]
        json_summaries = process_batch(client, json_files, "json", batch_size=3)
        
        # Process markdown files
        markdown_summaries = {}
        markdown_dir = os.path.join(base_directory, 'markdown')
        if os.path.exists(markdown_dir):
            md_files = [
                os.path.join(markdown_dir, f) 
                for f in os.listdir(markdown_dir) 
                if f.endswith('.md')
            ]
            markdown_summaries = process_batch(client, md_files, "markdown", batch_size=3)
        
        # Process images
        image_summaries = {}
        image_dir = os.path.join(base_directory, 'site/Figures')
        if os.path.exists(image_dir):
            img_files = [
                os.path.join(image_dir, f) 
                for f in os.listdir(image_dir) 
                if f.lower().endswith(('.png', '.jpg', '.jpeg'))
            ]
            image_summaries = process_batch(client, img_files, "image", batch_size=2)
        
        time.sleep(5)  # Delay before meta-summary
        
        return create_meta_summary(
            client, 
            json_summaries, 
            markdown_summaries, 
            image_summaries
        )
        
    except Exception as e:
        logging.error(f"Error processing content: {str(e)}")
        raise


def create_meta_summary(client, json_summaries, markdown_summaries, image_summaries, model_name="claude-3-5-sonnet-20240620"):
    """Create a comprehensive meta-summary incorporating all content types"""
    try:
        prompt = f"""Synthesize information from multiple content types into a comprehensive technical analysis, while maintaining a high level of detail:

        JSON Configuration Summaries:
        {json.dumps(json_summaries, indent=2)}

        Documentation Summaries:
        {json.dumps(markdown_summaries, indent=2)}

        Diagram/Figure Analyses:
        {json.dumps(image_summaries, indent=2)}

        Create a comprehensive technical analysis that outlines:
        1. Technical Requirements and Architecture
           - Core technical requirements, including search parameters for resource types
           - System architecture and patterns
           - Integration points and interfaces
           
        2. Implementation Details
           - Key configurations and settings
           - Resource profiles and extensions
           - Validation rules and constraints (e.g., Must Support elements, conformance verbs)
           
        3. Visual Documentation Analysis
           - Technical workflows and processes
           - Component relationships
           
        4. Culminating Analysis
           - Connection of information contained between documentation, config, and diagrams
           - Dependencies and prerequisites
           
        Highlight any overarching implementation considerations."""

        response = safe_claude_request(
            client,
            model_name,
            messages=[
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": "Here is the comprehensive technical analysis: <summary>"}
            ],
            max_tokens=8192,
            stop_sequences=["</summary>"]
        )
        
        return response.content[0].text
    except Exception as e:
        logging.error(f"Error creating meta-summary: {str(e)}")
        return "Unable to create meta-summary due to error"

def save_processed_content(base_directory='full-ig', output_directory='processed_output'):
    """Save all processed content with progress tracking"""
    os.makedirs(output_directory, exist_ok=True)
    
    client = create_anthropic_client()
    
    try:
        print("Starting content processing...")
        final_summary = process_all_content(client, base_directory)
        
        with open(os.path.join(output_directory, 'final_technical_analysis.md'), 'w') as f:
            f.write(final_summary)
        
        print(f"Processing complete. Results saved to {output_directory}")
        return final_summary
    
    except Exception as e:
        print(f"Error during processing: {str(e)}")
        logging.error(f"Processing failed: {str(e)}")
        raise
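
Taken together, the functions above follow a map-reduce pattern: each file is summarized on its own, and the per-file summaries are then combined into one meta-summary. The sketch below shows that flow with stub functions standing in for the Claude calls. `map_reduce_summarize`, `fake_summarize`, `fake_combine`, and the sample documents are hypothetical names and data introduced for illustration. Unlike the notebook code, this sketch records a failure as `None` instead of an error string, on the assumption that error text should not be fed into the meta-summary prompt.

```python
from typing import Callable, Dict, List, Optional, Tuple

def map_reduce_summarize(
    documents: Dict[str, str],
    summarize: Callable[[str], str],
    combine: Callable[[List[str]], str],
) -> Tuple[Dict[str, Optional[str]], Optional[str]]:
    """Summarize each document (map), then combine the successful summaries (reduce)."""
    per_doc: Dict[str, Optional[str]] = {}
    for name, text in documents.items():
        try:
            per_doc[name] = summarize(text)
        except Exception:
            # Record the failure as None so it is never mistaken for a summary
            per_doc[name] = None
    good = [s for s in per_doc.values() if s is not None]
    meta = combine(good) if good else None
    return per_doc, meta

# Stubs standing in for the LLM calls (assumptions for this example)
def fake_summarize(text: str) -> str:
    if not text:
        raise ValueError("empty document")
    return text.split('.')[0]  # first sentence as a stand-in summary

def fake_combine(summaries: List[str]) -> str:
    return " | ".join(summaries)

# Invented sample documents
docs = {
    "a.md": "Servers SHALL support JSON. More text.",
    "b.md": "",
    "c.md": "Search uses _lastUpdated. More text.",
}
per_doc, meta = map_reduce_summarize(docs, fake_summarize, fake_combine)
print(per_doc)
print(meta)
```

Keeping the per-document results alongside the meta-summary makes it possible to see which inputs failed and to re-run only those, rather than repeating the full run.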

### Running the Processor Steps

In [None]:
import logging
logging.basicConfig(level=logging.INFO)

load_dotenv()  # Load environment variables from .env file
client = create_anthropic_client()

# Create rate limiter
rate_limiter = create_rate_limiter(max_requests_per_minute=25)

# Copy and organize files
copied_files = copy_json_files()
print(f"Copied {len(copied_files)} JSON files")


INFO:root:Copied 166 JSON files to full-ig/json_only


Copied 166 JSON files


In [None]:
grouped_files = group_files_by_base_name('full-ig/json_only')
print("Files grouped by base name")

copy_files_to_folders('full-ig/json_only', grouped_files)
print("Files organized into folders")

consolidate_jsons('full-ig/json_only')
print("JSONs consolidated")


INFO:root:Created folder: full-ig/json_only/Location
INFO:root:Created folder: full-ig/json_only/StructureDefinition
INFO:root:Created folder: full-ig/json_only/ValueSet
INFO:root:Created folder: full-ig/json_only/CodeSystem
INFO:root:Created folder: full-ig/json_only/OrganizationAffiliation
INFO:root:Created folder: full-ig/json_only/SearchParameter
INFO:root:Created folder: full-ig/json_only/HealthcareService
INFO:root:Created folder: full-ig/json_only/usage
INFO:root:Created folder: full-ig/json_only/Organization
INFO:root:Created folder: full-ig/json_only/CapabilityStatement
INFO:root:Created folder: full-ig/json_only/PractitionerRole
INFO:root:Created folder: full-ig/json_only/ImplementationGuide
INFO:root:Created folder: full-ig/json_only/InsurancePlan
INFO:root:Created folder: full-ig/json_only/Practitioner
INFO:root:Created folder: full-ig/json_only/Endpoint
INFO:root:Created folder: full-ig/json_only/plan
INFO:root:Created Organization_combined.json with 11 entries
INFO:root:C

Files grouped by base name
Files organized into folders
JSONs consolidated


In [None]:
# Process with rate limiting
final_analysis = process_all_content(client, base_directory='full-ig')
print("All content processed")

INFO:httpx:HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"

All content processed


### Saving Output to Markdown File

In [None]:
# Save results
output_dir = 'processed_output'
os.makedirs(output_dir, exist_ok=True)

with open(os.path.join(output_dir, 'final_technical_analysis.md'), 'w') as f:
    f.write(final_analysis)

print(f"Processing complete. Results saved to {output_dir}")


Processing complete. Results saved to processed_output


### First Successful Run - Output from Claude LLM: 



1. Technical Requirements and Architecture

Core technical requirements:
- FHIR R4 (v4.0.1) implementation for provider directories and insurance networks
- RESTful API supporting GET requests only (query-only)
- JSON support required, XML recommended
- No authentication required, no consumer identifying information stored
- Conformance to US Core profiles where applicable
- Support for _include, _revinclude, and chained searches

System architecture and patterns:
- Centralized Validated Healthcare Directory (VHDir) with surrounding processes
- Local workflow environments interfacing with central directory 
- Modular design separating validation, core storage, and exchange functions
- Use of FHIR as standardized communication protocol

Integration points and interfaces:
- FHIR-based API for third-party applications to query provider network data
- Primary data sources input to VHDir via FHIR
- Exchange processes between VHDir and local workflow environments
- Attestation process involving external attesters

2. Implementation Details

Key configurations and settings:
- Specific search parameters defined for each resource type
- _lastUpdated parameter used for change detection
- Capability statements define expected server behaviors

Resource profiles and extensions:
- Custom profiles for Endpoint, HealthcareService, InsurancePlan, Location, Network, Organization, OrganizationAffiliation, Practitioner, and PractitionerRole
- Extensions for additional data elements (e.g. accessibility, qualifications)
- Use of US Core profiles as foundation where applicable

Validation rules and constraints:
- Must Support flags on profile elements
- Specific conformance verbs (SHALL, SHOULD, MAY) define requirements
- Value sets and code systems for controlled terminologies

3. Visual Documentation Analysis

Technical workflows and processes:
- Data flow from primary sources to VHDir
- Initial and recurring validation processes within VHDir
- Exchange processes facilitating data sharing
- Attestation workflow involving external attesters

Component relationships:
- VHDir as central component containing core data and processes
- Local workflow environments interfacing with VHDir
- Primary sources providing input data
- Attesters providing external validation

Architecture diagrams insights:
- Centralized directory model with distributed local environments
- Clear separation of core directory functions (validation, storage, exchange)
- Multiple data flow paths using FHIR standard
- Scope delineation between VHDir and Plan-Net implementation guides

4. Culminating Analysis

Relationships between documentation, config, and diagrams:
- Consistent emphasis on FHIR R4 and US Core profiles across all sources
- Configuration JSON aligns with profile definitions in documentation
- Diagrams reinforce centralized directory concept described in text

Dependencies and prerequisites:
- FHIR R4 (v4.0.1) as foundational standard
- US Core profiles as starting point for custom profiles
- Sushi 1.0.0 for IG directory structure
- Part of larger Da Vinci project initiative

Important considerations:
- Extensive use of profiling and extensions to tailor FHIR for provider directories
- Strong focus on search capabilities and resource relationships
- Centralized directory model with local interfaces may require careful performance optimization
- Lack of authentication could limit use cases involving sensitive data
- Query-only API may restrict some advanced directory management scenarios

