# AI-Powered Clinical Documentation Assistant

## Background

Healthcare professionals face a significant burden from medical documentation. This project uses generative AI to reduce that burden: it automatically extracts structured information from physician-patient audio conversations, then uses it to pre-fill administrative forms and generate FHIR resources.

Converting form data to FHIR resources ensures seamless integration with existing healthcare systems through a standardized, interoperable format. This structured approach unlocks the data's potential for reusability in various clinical workflows, analytics, and future healthcare applications beyond just form filling.
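
To make the target format concrete, here is a minimal sketch of what a filled-in form might look like as a FHIR `QuestionnaireResponse`. The `linkId` values, question texts, and answers are hypothetical and used only for illustration. The helper `answers_by_link_id` is likewise an assumption, not part of this notebook's pipeline.

```python
# Hypothetical linkIds and answers, for illustration only; real ones come
# from the Questionnaire that the response answers.
example_response = {
    "resourceType": "QuestionnaireResponse",
    "questionnaire": "Questionnaire/health-history-questionnaire-2021-06",
    "status": "completed",
    "item": [
        {
            "linkId": "allergies",
            "text": "Do you have any allergies?",
            "answer": [{"valueString": "Penicillin"}],
        },
        {
            "linkId": "smoker",
            "text": "Do you smoke?",
            "answer": [{"valueBoolean": False}],
        },
    ],
}


def answers_by_link_id(response: dict) -> dict:
    """Map each top-level linkId to its list of answer values."""
    flat = {}
    for item in response.get("item", []):
        values = []
        for answer in item.get("answer", []):
            # Each answer carries one value[x] key, e.g. valueString.
            values.extend(v for k, v in answer.items() if k.startswith("value"))
        flat[item["linkId"]] = values
    return flat


print(answers_by_link_id(example_response))
```

Because every answer is keyed by `linkId`, downstream consumers can read the same data without knowing how the form was rendered.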


## Setup


**Install library dependencies.**

In [2]:
!pip uninstall -qqy jupyterlab kfp  # Remove unused conflicting packages
!pip install -qU "google-genai==1.7.0" "chromadb==0.6.3" "requests==2.32.3" "langchain==0.3.23" "langgraph==0.3.29" "json-repair==0.41.1" "google-api-core==2.24.2" "langchain-google-genai==2.1.2"




**Set up your API key**

To run the following cell, your API key must be stored in a [Kaggle secret](https://www.kaggle.com/discussions/product-feedback/114053) named `GOOGLE_API_KEY`.

If you don't already have an API key, you can grab one from [AI Studio](https://aistudio.google.com/app/apikey). You can find [detailed instructions in the docs](https://ai.google.dev/gemini-api/docs/api-key).

To make the key available through Kaggle secrets, choose `Secrets` from the `Add-ons` menu and follow the instructions to add your key or enable it for this notebook.
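
When the notebook runs outside Kaggle, the same key can come from an environment variable instead. The helper below is a sketch under that assumption: `load_api_key` is a hypothetical name, and it falls back to `kaggle_secrets` only when the variable is not set.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, else from Kaggle secrets."""
    key = os.environ.get(name)
    if key:
        return key
    try:
        # This module is only importable inside a Kaggle notebook.
        from kaggle_secrets import UserSecretsClient
    except ImportError as exc:
        raise RuntimeError(f"{name} is not set in the environment") from exc
    return UserSecretsClient().get_secret(name)
```

Either way, the key never has to appear in the notebook source.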

In [3]:
# from kaggle_secrets import UserSecretsClient

# GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")

In [4]:
import os

# Never hardcode the key in the notebook. Outside Kaggle, export
# GOOGLE_API_KEY in the environment before starting the kernel.
GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]

**Prepare the data store and embeddings**

Fetch the questionnaire metadata that will populate the embedding database.

In [5]:
# Define some constants
# Public HAPI test server: "https://hapi.fhir.org/baseR4"
# This run targets a local HAPI FHIR instance instead.
HAPI_FHIR_BASE_URL = "http://localhost:8081/fhir"
QUESTIONNAIRE_ENDPOINT = f"{HAPI_FHIR_BASE_URL}/Questionnaire"


In [6]:
import json
import requests

def push_sample_questionnaire():
    """Upload the sample Questionnaire so the server has something to index."""
    with open("./quest.json", "r") as file:
        raw_quest = json.load(file)
    raw_quest_id = raw_quest.get("id")
    # PUT to /Questionnaire/<id> creates or updates the resource under that id.
    put_uri = f"{QUESTIONNAIRE_ENDPOINT}/{raw_quest_id}"
    print(put_uri)
    headers = {
        "Content-Type": "application/fhir+json"
    }
    response = requests.put(put_uri, json=raw_quest, headers=headers)
    response.raise_for_status()

push_sample_questionnaire()

http://localhost:8081/fhir/Questionnaire/health-history-questionnaire-2021-06


In [7]:
from typing import List, Dict

def fetch_questionnaires_from_hapi() -> List[Dict]:
    """
    Fetches summary metadata for the 50 most recently updated
    Questionnaire resources on the HAPI FHIR server.

    Returns:
        List[Dict]: A list of Questionnaire resources in JSON format.
                     Returns an empty list if there's an error.
    """
    questionnaires = []
    try:
        # fetch only the last 50 updated records
        questionnaire_count = 50
        all_params = {"_count": questionnaire_count, "_sort": "-_lastUpdated", "_elements":"description,id,identifier,name,title"}
        response = requests.get(QUESTIONNAIRE_ENDPOINT,params=all_params, headers={"Accept": "application/fhir+json"})
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
        bundle = response.json()

        if bundle.get('resourceType') == 'Bundle' and bundle.get('type') == 'searchset':
            for entry in bundle.get('entry', []):
                if entry.get('resource') and entry['resource'].get('resourceType') == 'Questionnaire':
                    questionnaires.append(entry['resource'])
        else:
            print(f"Unexpected response format from FHIR server: {bundle.get('resourceType')}")

    except requests.exceptions.RequestException as e:
        print(f"Error fetching Questionnaires from HAPI FHIR: {e}")
        return []  # Return empty list in case of error

    return questionnaires
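
The function above reads a single page of results. A FHIR search `Bundle` advertises further pages through a `link` entry whose `relation` is `next`, so a caller that needs more than one page could follow it. The helper below is a sketch of that lookup only: `next_page_url` is a hypothetical name and the sample URL is made up.

```python
from typing import Optional


def next_page_url(bundle: dict) -> Optional[str]:
    """Return the URL of the next result page, or None on the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


# Hypothetical searchset bundle with one more page available.
sample_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "link": [
        {"relation": "self", "url": "http://localhost:8081/fhir/Questionnaire?_count=50"},
        {"relation": "next", "url": "http://localhost:8081/fhir?_getpages=example"},
    ],
}
print(next_page_url(sample_bundle))
```

A paging loop would keep calling `requests.get` on the returned URL until the helper returns `None`.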


## Creating the embedding database with ChromaDB

We create a [custom function](https://docs.trychroma.com/guides/embeddings#custom-embedding-functions) to generate embeddings with the Gemini API. 

The questionnaire descriptions are the documents stored in the database: they are inserted first and retrieved later. Each query is a description of the form to fill, derived from the prompt instructions.
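
The insert-then-query pattern can be illustrated without any API call. The sketch below swaps the Gemini embeddings for a toy bag-of-words vector and ranks documents by cosine similarity. `toy_embed`, `retrieve`, and the second document are assumptions made for this example and say nothing about how the real embedding model behaves.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: lowercase bag-of-words counts (not a real model)."""
    return Counter(text.lower().replace(".", "").split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Documents are embedded once, at insert time.
store = {
    "health-history": "A questionnaire to collect basic health history information.",
    # Hypothetical second document, added only to give the query a choice.
    "order-form": "A form to collect shipping and billing details for an order.",
}
index = {doc_id: toy_embed(text) for doc_id, text in store.items()}


def retrieve(query: str) -> str:
    """Embed the query and return the id of the most similar document."""
    query_vector = toy_embed(query)
    return max(index, key=lambda doc_id: cosine_similarity(query_vector, index[doc_id]))


print(retrieve("Fill out a health history questionnaire"))
```

The real pipeline follows the same shape, with Chroma holding the index and the Gemini model producing the vectors.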

In [8]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from google import genai
from google.api_core import retry
from google.genai import types

gda_client = genai.Client(api_key=GOOGLE_API_KEY)

# Retry when the per-minute quota is reached (429) or the service is unavailable (503).
is_retriable = lambda e: isinstance(e, genai.errors.APIError) and e.code in {429, 503}


class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = gda_client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]

Now create a [Chroma database client](https://docs.trychroma.com/getting-started) that uses the `GeminiEmbeddingFunction`, and populate the database with the questionnaire metadata fetched above.

In [22]:
import chromadb

DB_NAME = "fhir-questionnaire"

embed_fn = GeminiEmbeddingFunction()
chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

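def populate_vector_db():
    embed_fn.document_mode = True
    # Keep only questionnaires that have a description to embed.
    questionnaires = [q for q in fetch_questionnaires_from_hapi() if q.get("description")]
    if not questionnaires:
        print("No questionnaire descriptions to index.")
        return
    quest_ids = [q["id"] for q in questionnaires]
    quest_docs = [q["description"] for q in questionnaires]

    print(quest_docs, quest_ids)
    print(type(quest_docs[0]))

    db.add(documents=quest_docs, ids=quest_ids)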

populate_vector_db()

['A questionnaire to collect basic health history information.'] ['health-history-questionnaire-2021-06']
<class 'str'>


Confirm that the data was inserted by looking at the database.

In [23]:
db.count()
# You can peek at the data too.
# db.peek(1)

1

In [24]:
from langgraph.prebuilt import create_react_agent
from langchain_core.tools import tool
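from typing import Any, Dict
from typing_extensions import TypedDict

# Define the state of our graph
class AgentState(TypedDict):
    audio_file_path: Any          # uploaded audio file handle
    instructions: str             # free-text prompt instructions
    transcription: str            # diarized transcript of the audio
    quest: Dict[str, Any]         # FHIR Questionnaire resource
    medical_records: str
    quest_resp: Dict[str, Any]    # generated FHIR QuestionnaireResponse
    quest_found: bool
    quest_resp_valid: bool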

In [25]:
def init_workflow(state: AgentState):
    """ 
    Start: preserves input prompt, which includes audio file and prompt instructions
    """
    return {"audio_file_path": state["audio_file_path"], "instructions": state["instructions"]}

## Retrieval: Finding relevant questionnaires

We then use the prompt instructions to query the vector database for the best-matching questionnaire.

In [26]:
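def discover_questionnaire(query):
    """Return the id of the questionnaire closest to `query`, or None."""
    try:
        embed_fn.document_mode = False
        result = db.query(query_texts=[query], n_results=1)
        print(result)
        # Chroma returns one list of ids per query text; take the top hit
        # for the single query we sent.
        ids = result["ids"][0]
        return ids[0] if ids else None
    except Exception as e:
        print(f"Questionnaire lookup failed: {e}")
        return None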

def fetch_questionnaire(state: AgentState):
    query = state.get("instructions")
    quest_id = discover_questionnaire(query)
    print("questionnaire_id", quest_id)
    if quest_id is None:
        return {"quest_found": False}
    response = requests.get(f"{QUESTIONNAIRE_ENDPOINT}/{quest_id}", headers={"Accept": "application/fhir+json"})
    response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
    quest = response.json()
    return {"quest_found": True, "quest": quest}


form_instructions = [
    "Enter your full name in the registration form.",
    "Provide your medical history in the health screening form.",
    "Fill in your contact number in the application form.",
    "Sign the patient consent form before the procedure.",
    "Rate our service in the feedback form.",
    "List any allergies in the wellness intake form.",
    "Add your shipping address in the order form.",
    "Update your vaccination dates in the immunization record form.",
    "Select your preferred contact method in the survey form.",
    "Submit your emergency contact details in the health registration form.",
    "Fill out a health history questionnaire",
    "Fill out a medical history form",
]

for instruction in form_instructions:
    discover_questionnaire(instruction)


# response = discover_questionnaire("A questionnaire to collect basic health history information.")
# # print(response)

{'ids': [['health-history-questionnaire-2021-06']], 'embeddings': None, 'documents': [['A questionnaire to collect basic health history information.']], 'uris': None, 'data': None, 'metadatas': [[None]], 'distances': [[1.1210062503814697]], 'included': [<IncludeEnum.distances: 'distances'>, <IncludeEnum.documents: 'documents'>, <IncludeEnum.metadatas: 'metadatas'>]}
{'ids': [['health-history-questionnaire-2021-06']], 'embeddings': None, 'documents': [['A questionnaire to collect basic health history information.']], 'uris': None, 'data': None, 'metadatas': [[None]], 'distances': [[0.7042571306228638]], 'included': [<IncludeEnum.distances: 'distances'>, <IncludeEnum.documents: 'documents'>, <IncludeEnum.metadatas: 'metadatas'>]}
{'ids': [['health-history-questionnaire-2021-06']], 'embeddings': None, 'documents': [['A questionnaire to collect basic health history information.']], 'uris': None, 'data': None, 'metadatas': [[None]], 'distances': [[1.1676912307739258]], 'included': [<Include
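
With a single document in the collection, every query returns the same questionnaire, so only the distance tells a good match from a poor one. In the first two results above, the unrelated registration-form instruction scores about 1.12 and the medical-history instruction about 0.70. The sketch below rejects weak matches with a cutoff. Both `best_match` and the 0.9 threshold are assumptions picked to separate those two samples, and a real cutoff would need tuning on more data.

```python
from typing import Optional

# Assumed cutoff: it only reflects the two sample distances printed above.
MAX_DISTANCE = 0.9


def best_match(result: dict, max_distance: float = MAX_DISTANCE) -> Optional[str]:
    """Return the top hit's id if it is close enough, else None."""
    ids = result.get("ids") or [[]]
    distances = result.get("distances") or [[]]
    if not ids[0] or not distances[0]:
        return None
    return ids[0][0] if distances[0][0] <= max_distance else None


relevant = {
    "ids": [["health-history-questionnaire-2021-06"]],
    "distances": [[0.7042571306228638]],
}
unrelated = {
    "ids": [["health-history-questionnaire-2021-06"]],
    "distances": [[1.1210062503814697]],
}
print(best_match(relevant), best_match(unrelated))
```

Returning `None` for a weak match lets the workflow stop early instead of filling in the wrong form.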

## Generation: Producing the QuestionnaireResponse

Now that we have the questionnaire and the transcribed audio, we can generate the QuestionnaireResponse.

In [14]:
import json_repair
google_model_id = "gemini-2.0-flash"

def generate_questresp(state: AgentState) -> dict:
    """
    Extract relevant information,
    and return a FHIR QuestionnaireResponse resource as a dict.
    """
    transcribed = state.get("transcription")
    questionnaire = state.get("quest")
    # Build the prompt; it already embeds the transcript and the Questionnaire.
    prompt = create_prompt_for_questionnaire_response(
        transcribed, questionnaire
    )

    response = gda_client.models.generate_content(
        model=google_model_id,
        contents=prompt,
        config={"response_mime_type": "application/json"},
    )

    qr_string = response.text.strip()
    qr = json_repair.loads(qr_string)
    # json_repair tolerates minor syntax slips (such as trailing commas or
    # unclosed brackets) that json.loads would reject.
    # Validation against the FHIR schema happens in the validate_qr node.
    return {"quest_resp": qr}


def create_prompt_for_questionnaire_response(
    cleaned_text: str, questionnaire_template: dict
) -> str:
    # Construct a system/user prompt with instructions,
    # referencing relevant sections of the conversation
    prompt = f"""
    You are a medical documentation assistant.
    Below is a transcribed patient-physician conversation:
    ---
    {cleaned_text}
    ---

    You have a FHIR Questionnaire defined as follows:
    {json.dumps(questionnaire_template, indent=2)}

    Extract the relevant data from the conversation to populate a FHIR QuestionnaireResponse
    based on the provided Questionnaire. Return ONLY valid JSON representing this 
    QuestionnaireResponse with fields "resourceType": "QuestionnaireResponse", 
    "questionnaire": "<Questionnaire-identifier>",
    "status", "subject", "authored", "item", etc.

    If a field is unknown, leave it blank or null. 
    Do not add additional commentary.

    Use this JSON schema:

    QuestionnaireResponse = <generated questionnaireResponse>
    return: QuestionnaireResponse
    """
    return prompt
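
Before sending a generated response to the server, a cheap local check can catch one common failure: answers attached to a `linkId` that the Questionnaire never defined. The sketch below is an illustration under stated assumptions. `collect_link_ids` and `unknown_link_ids` are hypothetical helpers, and this is not a substitute for full FHIR validation.

```python
def collect_link_ids(items) -> set:
    """Recursively gather linkIds from nested Questionnaire or response items."""
    found = set()
    for item in items or []:
        if "linkId" in item:
            found.add(item["linkId"])
        # Groups nest child items under "item".
        found |= collect_link_ids(item.get("item"))
        # Responses may also nest items under each answer.
        for answer in item.get("answer") or []:
            found |= collect_link_ids(answer.get("item"))
    return found


def unknown_link_ids(questionnaire: dict, response: dict) -> set:
    """Return linkIds used in the response but absent from the Questionnaire."""
    return collect_link_ids(response.get("item")) - collect_link_ids(questionnaire.get("item"))


# Hypothetical resources for illustration.
sample_questionnaire = {
    "resourceType": "Questionnaire",
    "item": [{"linkId": "history", "type": "group", "item": [{"linkId": "allergies"}]}],
}
sample_response = {
    "resourceType": "QuestionnaireResponse",
    "item": [
        {"linkId": "history", "item": [{"linkId": "allergies", "answer": [{"valueString": "None"}]}]},
        {"linkId": "invented-by-model", "answer": [{"valueString": "?"}]},
    ],
}
print(unknown_link_ids(sample_questionnaire, sample_response))
```

An empty set means every answer maps to a defined question. A non-empty set suggests the model invented fields.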


In [15]:
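def validate_qr(state: AgentState):
    """Ask the FHIR server to validate the generated QuestionnaireResponse."""
    qr = state.get("quest_resp")
    # $validate must be invoked on the resource type being validated.
    url = f"{HAPI_FHIR_BASE_URL}/QuestionnaireResponse/$validate"
    headers = {"Content-Type": "application/fhir+json"}

    response = requests.post(url, json=qr, headers=headers)
    # An OK status is a coarse signal: some servers return 200 and report
    # errors inside the OperationOutcome body.
    return {"quest_resp_valid": response.ok}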
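
The HTTP status alone may not say whether validation passed, because `$validate` replies with an `OperationOutcome` whose `issue` entries carry a `severity`. A stricter check could inspect that body. The sketch below assumes the standard `OperationOutcome` layout. `validation_errors` is a hypothetical helper and the sample diagnostics text is made up.

```python
def validation_errors(outcome: dict) -> list:
    """Return diagnostics for issues with severity 'error' or 'fatal'."""
    if outcome.get("resourceType") != "OperationOutcome":
        return []
    return [
        issue.get("diagnostics", "")
        for issue in outcome.get("issue", [])
        if issue.get("severity") in {"error", "fatal"}
    ]


# Hypothetical server reply with one warning and one error.
sample_outcome = {
    "resourceType": "OperationOutcome",
    "issue": [
        {"severity": "warning", "code": "informational", "diagnostics": "No narrative present"},
        {"severity": "error", "code": "required", "diagnostics": "Missing required element: status"},
    ],
}
print(validation_errors(sample_outcome))
```

Inside `validate_qr`, the result could be treated as valid only when `validation_errors(response.json())` is empty.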
    

In [16]:
def save_questionnaire_response(state: AgentState) -> dict:
    """
    Saves the generated QuestionnaireResponse to the HAPI FHIR server.

    Args:
        state (AgentState): Workflow state; `quest_resp` holds the FHIR
            QuestionnaireResponse as a dict.

    Returns:
        dict: An empty state update. Raises `requests.HTTPError` if saving fails.
    """
    quest_resp = state.get("quest_resp")
    print("\n\n\n\n", quest_resp, type(quest_resp))
    questionnaire_response_endpoint = f"{HAPI_FHIR_BASE_URL}/QuestionnaireResponse"
    headers = {"Accept": "application/fhir+json", "Content-Type": "application/fhir+json"}

    response = requests.post(questionnaire_response_endpoint, headers=headers, json=quest_resp)
    response.raise_for_status()  # Raise exception for HTTP errors

    created_resource = response.json()
    print("Saved QuestionnaireResponse with id", created_resource.get("id"))
    return {}


In [17]:
# --- Transcription Function ---
def diarize_audio(state: AgentState):
    """
    Transcribes the uploaded audio file using the Gemini model, aiming for a
    conversation-style output with speaker labels.

    Args:
        state (AgentState): Workflow state; `audio_file_path` holds the file
            handle returned by `gda_client.files.upload`.
    """
    # TODO - check that audio format is supported.
    uploaded_file = state["audio_file_path"]

    # Construct the prompt. Key considerations for conversation style:
    #    - Explicitly ask for transcription.
    #    - Request speaker diarization (identifying and labeling speakers).
    #    - Suggest common labels (like 'Doctor:', 'Patient:', or 'Speaker 1:', 'Speaker 2:').
    #    - Ask for natural punctuation and formatting.
    #    - Specify how to handle non-speech sounds (ignore, note in brackets, etc.).

    prompt = """
    Diarize and transcribe this health-related interview, maintaining chronological order with timestamps if possible. Add labels for speaker (like 'Doctor:', 'Patient:', or 'Speaker 1:', 'Speaker 2:') at the beginning of each turn.
    Accurately capture medical terms, mark unclear words as “[INAUDIBLE],” avoid adding extra commentary or guesses, and keep overlapping speech on separate lines. 
    Return only the final transcript.
    """

    response = gda_client.models.generate_content(
        model=google_model_id,
        contents = [prompt, uploaded_file]
    )

    transcription = response.text.strip()
    return {"transcription": transcription}
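
The TODO in `diarize_audio` about checking the audio format could be handled before upload with a simple extension check. The sketch below is only an illustration: `is_supported_audio` is a hypothetical helper, and the extension list is an assumption that should be confirmed against the current Gemini audio documentation.

```python
from pathlib import Path

# Assumed set of accepted audio extensions; confirm against the Gemini docs.
SUPPORTED_AUDIO_EXTENSIONS = {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is in the assumed supported set."""
    return Path(path).suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS


print(is_supported_audio("./Data/Audio Recordings/CAR0002.mp3"))
```

An extension check does not inspect the file contents, so it only guards against obvious mistakes such as passing a text file.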



In [18]:
def terminate_workflow(state: AgentState):
    # End node: logs completion and leaves the state unchanged.
    print("Workflow end")
    return {}

In [19]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.graph import StateGraph, END, START

model_id = "gemini-2.0-flash"
model = ChatGoogleGenerativeAI(model=model_id, google_api_key=GOOGLE_API_KEY)

# Define the graph
wk_graph = StateGraph(AgentState)

# Nodes
wk_graph.add_node("discover_and_fetch_questionnaire", fetch_questionnaire)
wk_graph.add_node("transcribe_audio", diarize_audio)
wk_graph.add_node("generate_questionnaire_response", generate_questresp)
wk_graph.add_node("repair_and_validate_questionnaire_response", validate_qr)
wk_graph.add_node("save_response", save_questionnaire_response)
wk_graph.add_node("terminate_workflow", terminate_workflow)

# Edges
wk_graph.add_edge(START, "discover_and_fetch_questionnaire")
# Transcribe only when a questionnaire was found; otherwise go straight to the end.
# This conditional edge is the only route from discovery to termination.
wk_graph.add_conditional_edges(
    "discover_and_fetch_questionnaire",
    lambda state: "transcribe_audio" if state["quest_found"] else "terminate_workflow",
    {"transcribe_audio": "transcribe_audio", "terminate_workflow": "terminate_workflow"},
)
wk_graph.add_edge("transcribe_audio", "generate_questionnaire_response")
wk_graph.add_edge("generate_questionnaire_response", "repair_and_validate_questionnaire_response")
wk_graph.add_edge("repair_and_validate_questionnaire_response", "save_response")
wk_graph.add_edge("save_response", "terminate_workflow")
wk_graph.add_edge("terminate_workflow", END)



graph = wk_graph.compile()
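
The graph as wired saves the response whether or not validation succeeded, because `quest_resp_valid` is never consulted. One option is to route on that flag with a named function, which is also easier to test than an inline lambda. The sketch below is an assumption about the intended behavior: `route_after_discovery` and `route_after_validation` are hypothetical names, and skipping the save on failure is a design choice rather than something the notebook specifies.

```python
def route_after_discovery(state: dict) -> str:
    """Continue to transcription only when a questionnaire was found."""
    return "transcribe_audio" if state.get("quest_found") else "terminate_workflow"


def route_after_validation(state: dict) -> str:
    """Save only responses that passed validation; otherwise stop."""
    return "save_response" if state.get("quest_resp_valid") else "terminate_workflow"


# Wiring sketch (must be registered before wk_graph.compile(), replacing the
# unconditional edge from the validation node to "save_response"):
# wk_graph.add_conditional_edges(
#     "repair_and_validate_questionnaire_response", route_after_validation
# )
print(route_after_validation({"quest_resp_valid": False}))
```

Both functions return node names that already exist in the graph, so no new nodes are needed.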


In [20]:
# from IPython.display import Image, display

# display(Image(graph.get_graph().draw_mermaid_png()))


In [21]:

local_input_file_url = "./Data/Audio Recordings/CAR0002.mp3"
# Upload the recording through the Gemini Files API; the returned file
# handle is what the transcription node passes to the model.
uploaded_file_uri = gda_client.files.upload(file=local_input_file_url)

inputs = {"audio_file_path": uploaded_file_uri, "instructions": "Process audio and fill out a medical history report"}
result = graph.invoke(inputs)