# Exploring FHIR Implementation Guides (IGs) + LLMs

This notebook explores how much LLMs understand about FHIR Implementation Guides (IGs) and investigates ways to upload IG content for deeper analysis.

### Import relevant libraries

Make sure you have langchain-community and beautifulsoup4 installed.

In [1]:
# %pip install -U langchain-community beautifulsoup4

In [2]:
import os
import google.generativeai as gemini
from anthropic import Anthropic
from openai import OpenAI
import io, threading, time, re, json
import pandas as pd
from json_repair import repair_json
from langchain_community.document_loaders import BSHTMLLoader
import shutil
from dotenv import load_dotenv

### Read in US Core IG HTML files

NOTE: Be sure that you have downloaded the US Core IG HTML files and placed them in full-ig/site, relative to your current directory.

In [3]:
source_folder = 'full-ig/site'
destination_folder = 'full-ig/html_only'

In [4]:
# Create the destination folder if it doesn't exist
os.makedirs(destination_folder, exist_ok=True)

In [5]:
# List to store only .html files
html_files = []

In [6]:
# Compound extensions to skip
excluded_suffixes = ('.ttl.html', '.json.html', '.xml.html', '.change.history.html')

for file_name in os.listdir(source_folder):
    # Keep files that end with .html but not with one of the compound extensions
    if file_name.endswith('.html') and not file_name.endswith(excluded_suffixes):
        html_files.append(file_name)
        # Copy the file to the destination folder (the original stays in place)
        shutil.copy(os.path.join(source_folder, file_name), destination_folder)

### Loading HTML with BeautifulSoup4

In [7]:
html_only_folder = 'full-ig/html_only'

In [8]:
# Output folder named "plain_txt" inside the HTML-only folder
processed_files_path = os.path.join(html_only_folder, 'plain_txt')

# Create the output folder if it doesn't exist
os.makedirs(processed_files_path, exist_ok=True)

In [9]:
# List to store the files processed
processed_files = []

In [10]:
# Loop through the files in the HTML folder
for file_name in os.listdir(html_only_folder):
    # Full path to the .html file
    html_file_path = os.path.join(html_only_folder, file_name)
    
    # Check that it's an .html file (this also skips the plain_txt subdirectory)
    if os.path.isfile(html_file_path) and file_name.endswith('.html'):
        # Use BSHTMLLoader to load the HTML content
        loader = BSHTMLLoader(html_file_path)
        data = loader.load()
        # Extract the plain text from the loaded data
        plain_text = '\n'.join([doc.page_content for doc in data])
        
        # Build the output file name by swapping the .html extension for .txt
        txt_file_name = os.path.splitext(file_name)[0] + '.txt'
        txt_file_path = os.path.join(processed_files_path, txt_file_name)
        
        # Write the extracted plain text to the new .txt file
        with open(txt_file_path, 'w', encoding='utf-8') as txt_file:
            txt_file.write(plain_text)
        
        # Append to processed files list
        processed_files.append(txt_file_name)
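
As a quick sanity check on the conversion, it can help to see how large each extracted text file is before sending any of it to an LLM. The sketch below is not part of the original notebook: `summarize_text_files` is a hypothetical helper that lists the `.txt` files in a folder with their character counts, largest first.

```python
import os

def summarize_text_files(folder):
    """Return (file_name, character_count) pairs for the .txt files in folder, largest first."""
    summary = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Only regular .txt files are counted; subdirectories and other files are ignored
        if os.path.isfile(path) and name.endswith('.txt'):
            with open(path, encoding='utf-8') as f:
                summary.append((name, len(f.read())))
    # Sort by character count, descending (ties keep alphabetical order)
    return sorted(summary, key=lambda item: item[1], reverse=True)
```

In this notebook, calling `summarize_text_files(processed_files_path)[:10]` would show the ten largest extracted pages. Character counts are only a rough proxy for token counts.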



Read in the API keys for Claude, Gemini, and GPT from the .env file.

In [26]:
load_dotenv()

claude_api_key = os.getenv('ANTHROPIC_API_KEY')
gemini_api_key = os.getenv('GEMINI_API_KEY')
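# Store the key in a variable so it can be passed as OpenAI(api_key=...) when the client is created
openai_api_key = os.getenv('OPENAI_API_KEY')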
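
`os.getenv` returns `None` when a variable is not set, so a missing entry in the .env file may only surface later as an authentication error. The sketch below is an optional check, not part of the original notebook: `find_missing_keys` is a hypothetical helper that reports which of the expected variables are absent or empty.

```python
import os

def find_missing_keys(names, env=None):
    """Return the names from `names` that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Variable names taken from the cell above
expected_keys = ['ANTHROPIC_API_KEY', 'GEMINI_API_KEY', 'OPENAI_API_KEY']
missing = find_missing_keys(expected_keys)
if missing:
    print('Missing API keys:', ', '.join(missing))
```

Run it after `load_dotenv()`; an empty result means all three keys were found.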

TODO: Read in relevant context files 
- IG_golden_rules
- IG_example
- IG_profile
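
One possible way to approach this TODO is sketched below. It is an assumption, not the notebook's implementation: `load_context_files` is a hypothetical helper, and the folder and file extensions of the context files are not specified here, so the function simply matches any file whose name before the final extension equals the given name.

```python
from pathlib import Path

def load_context_files(folder, names):
    """Map each name to the text of the first file in folder named '<name>.<any extension>'."""
    contexts = {}
    for name in names:
        # Sort the matches so the choice is deterministic if several extensions exist
        matches = sorted(p for p in Path(folder).glob(name + '.*') if p.is_file())
        if matches:
            contexts[name] = matches[0].read_text(encoding='utf-8')
    return contexts
```

A call such as `load_context_files(context_folder, ['IG_golden_rules', 'IG_example', 'IG_profile'])` would return a dictionary keyed by name, where `context_folder` is a placeholder for wherever those files are kept. Names with no matching file are left out of the result.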

TODO: Prompts