# AI-Powered Clinical Documentation Assistant

## Problem Overview

Medical documentation consumes a significant amount of healthcare professionals' time, contributing to burnout and reducing time available for direct patient care. This notebook demonstrates an AI-powered workflow designed to alleviate this burden.

**Goal:** Automatically process audio recordings of physician-patient encounters to:
1.  **Extract structured medical information.**
2.  **Pre-fill relevant clinical forms** (e.g., medical history questionnaires).
3.  **Generate standardized clinical notes** (e.g., SOAP notes).
4.  **Output data in FHIR format** for seamless integration with Electronic Health Records (EHRs) and other healthcare systems.

By automating these tasks, we aim to improve efficiency, reduce administrative overhead, and enhance data quality and reusability for clinical workflows and analytics.

## Solution Architecture

This solution employs a multi-step process orchestrated using LangGraph, a framework for building stateful, multi-actor applications with LLMs.

**Key Components:**

1.  **Questionnaire Discovery (RAG):**
    *   Uses Retrieval-Augmented Generation (RAG) to find the most relevant FHIR [Questionnaire](https://www.hl7.org/fhir/R4/questionnaire.html) based on user instructions (e.g., "Fill out a medical history report").
    *   Leverages ChromaDB (a vector database) and Gemini embeddings to store and search questionnaire descriptions.
    *   Includes a validation step using Gemini to confirm the relevance of the retrieved questionnaire.
2.  **Audio Processing & Form Filling:**
    *   Uploads the encounter audio file to the Gemini API.
    *   Uses a multimodal Gemini model to analyze the audio content and the selected FHIR Questionnaire schema.
    *   Generates a FHIR [QuestionnaireResponse](https://www.hl7.org/fhir/R4/questionnaireresponse.html), representing the completed form based on information extracted from the audio.
3.  **SOAP Note Generation:**
    *   Uses the Gemini API to generate a SOAP (Subjective, Objective, Assessment, Plan) note directly from the audio recording.
4.  **FHIR Resource Creation:**
    *   Creates FHIR [Binary](https://www.hl7.org/fhir/R4/binary.html) and [DocumentReference](https://www.hl7.org/fhir/R4/documentreference.html) resources to represent the generated SOAP note in a standard, interoperable format suitable for EHR integration.

**Note:** For demonstration purposes, a local JSON file (`quest.db.json`) acts as a placeholder repository for FHIR Questionnaires instead of a live FHIR server.


## 1. Setup

### 1.1 Install Required Packages

Install the necessary Python libraries for interacting with the Gemini API, ChromaDB, LangChain, LangGraph, and handling JSON.

In [None]:
%pip uninstall -qqy jupyterlab kfp  # Remove unused conflicting packages
%pip install -q "google-genai==1.7.0" "chromadb==0.6.3" "langchain==0.3.23" "langgraph==0.3.29" "json-repair==0.41.1" "google-api-core==2.24.2" "langchain-google-genai==2.1.2"


### 1.2 Setting Up `GOOGLE_API_KEY` for Execution

To successfully run the next cell, you must provide your `GOOGLE_API_KEY`. This is required to authenticate with Google APIs and is handled differently depending on the environment:

- **Google Colab**
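  1. Open the **Secrets** panel (key icon in the left sidebar).
  2. Add a secret named `GOOGLE_API_KEY` and paste your API key as the value.
  3. Enable notebook access for the secret so that `userdata.get("GOOGLE_API_KEY")` can read it.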

- **Kaggle**
  1. Click on **"Add-ons" > "Secrets"** in the notebook editor.
  2. Create a new secret with the name `GOOGLE_API_KEY`.
  3. Paste your API key as the value.
  4. The notebook will automatically retrieve the secret.

- **Local Environment**
  1. Set the `GOOGLE_API_KEY` as an environment variable:
     - Temporarily (for a session):
       ```bash
       export GOOGLE_API_KEY=<your-api-key>
       ```
  2. Restart your terminal or IDE session if needed before running the notebook.

In [None]:
import os

# Case 1: Google Colab
try:
    import google.colab
    from google.colab import userdata
    os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")
except ImportError:
    # Not in Colab
    pass

# Case 2: Kaggle (use kaggle secrets)
if os.environ.get("KAGGLE_KERNEL_RUN_TYPE"):
    try:
        from kaggle_secrets import UserSecretsClient
        secret = UserSecretsClient().get_secret("GOOGLE_API_KEY")
        os.environ["GOOGLE_API_KEY"] = secret
    except Exception as e:
        print("Kaggle secret 'GOOGLE_API_KEY' not found or could not be retrieved:", e)

# Case 3: Local dev - assume manually set in environment
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise EnvironmentError(
        "GOOGLE_API_KEY not found. Please set it in your environment variables "
        "or in Kaggle/Colab secrets."
    )




In [None]:
from google import genai
from google.genai import types
from google.api_core import retry

client = genai.Client(api_key=GOOGLE_API_KEY)
model_id = "gemini-2.0-flash"

is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

# setup a retry scheme in case some requests fail (mostly due to rate limiting)
genai.models.Models.generate_content = retry.Retry(
    predicate=is_retriable)(genai.models.Models.generate_content)
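
To make the retry wrapper above less of a black box, here is a minimal, library-free sketch of predicate-based retrying. It is only an illustration: `retry_on`, `FakeAPIError`, and `flaky_request` are hypothetical names, and the real `google.api_core.retry.Retry` applies its own defaults for delays and timeouts, which this sketch does not try to reproduce.

```python
import time


def retry_on(predicate, max_attempts=3, base_delay=0.0):
    """Retry the wrapped function while `predicate(exc)` returns True."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    # Give up on non-retriable errors or once attempts run out.
                    if not predicate(exc) or attempt == max_attempts:
                        raise
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error carrying an HTTP status code."""
    def __init__(self, code):
        super().__init__(f"HTTP {code}")
        self.code = code


calls = {"n": 0}


@retry_on(lambda e: isinstance(e, FakeAPIError) and e.code in {429, 503})
def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeAPIError(429)  # simulated rate limit on the first two calls
    return "ok"


result = flaky_request()
print(result, calls["n"])  # ok 3
```

The predicate plays the same role as `is_retriable`: only rate-limit (429) and service-unavailable (503) errors are retried, and anything else is raised immediately.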

## 2. Prepare Questionnaire Data & Vector Store

This section focuses on setting up the data source for FHIR Questionnaires and creating a vector database (ChromaDB) to enable semantic search for relevant forms.

### 2.1 Define Gemini Embedding Function for ChromaDB

ChromaDB needs a way to convert text (questionnaire descriptions) into numerical vectors (embeddings) for similarity searching. We define a custom embedding function using the Gemini `text-embedding-004` model.

*   The function handles embedding generation for both `retrieval_document` (when adding questionnaires to the DB) and `retrieval_query` (when searching the DB).
*   It includes retry logic (`@retry.Retry`) to handle potential transient API errors (like rate limits).

In [None]:
from chromadb import Documents, EmbeddingFunction, Embeddings

class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]
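
As an illustration of what "similarity searching" over embeddings means, the sketch below ranks two toy vectors against a query vector using cosine similarity. The vectors and IDs are made up, and cosine similarity is used only for demonstration; the distance metric ChromaDB actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real embeddings have hundreds of dimensions).
doc_vectors = {
    "medical-history": [0.9, 0.1, 0.0],
    "dietary-survey": [0.1, 0.9, 0.2],
}
query_vector = [0.8, 0.2, 0.1]

# Pick the document whose vector points in the most similar direction.
best_id = max(
    doc_vectors,
    key=lambda doc_id: cosine_similarity(query_vector, doc_vectors[doc_id]),
)
print(best_id)  # medical-history
```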

### 2.2 Load Questionnaire Data

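Define utility functions to read the FHIR Questionnaire data from the local JSON file `quest.db.json`. The cell below reads it from the Kaggle dataset path `/kaggle/input/quest-sample-db/quest.db.json`; adjust the path when running elsewhere.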

*   `read_questionnaires_from_fs()`: Reads the JSON file (cached for efficiency).
*   `get_quest_docs_meta()`: Extracts the `description` (or uses "No description" if missing) and relevant `metadata` (id, title, name) for each questionnaire. This prepares the data for insertion into ChromaDB.

In [None]:
import json

# Utility functions to read FHIR Questionnaire data from a local JSON file.
_quest_docs = None  # module-level cache so the file is read only once
def read_questionnaires_from_fs():
    global _quest_docs
    if _quest_docs is None:
        with open("/kaggle/input/quest-sample-db/quest.db.json", "r") as file:
            _quest_docs = json.loads(file.read())
    return _quest_docs

def get_quest_docs_meta():
    quest_docs = read_questionnaires_from_fs()
    doc_with_metad = []
    doc_ids = []
    for doc in quest_docs:
        doc_id = doc.get("id")
        doc_meta = {
            k: v
            for k, v in {
                "id": doc_id,
                "title": doc.get("title"),
                "name": doc.get("name"),
            }.items()
            if v is not None
        }
        doc_desc = (
            doc.get("description") if doc.get("description") else "No description"
        )
        doc_with_metad.append((doc_desc, doc_meta))
        doc_ids.append(doc_id)
    return doc_with_metad, doc_ids
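
To show the shape of the data handed to ChromaDB, here is a small sketch of the same extraction logic applied to two made-up questionnaires. `extract_docs_meta` and the `sample` records are hypothetical; the function takes the list as a parameter instead of reading the file so that it can run on its own.

```python
def extract_docs_meta(quest_docs):
    """Return ([(description, metadata), ...], [id, ...]) for a list of questionnaires."""
    docs_with_meta, doc_ids = [], []
    for doc in quest_docs:
        # Keep only the metadata keys that are actually present.
        meta = {k: doc[k] for k in ("id", "title", "name") if doc.get(k) is not None}
        description = doc.get("description") or "No description"
        docs_with_meta.append((description, meta))
        doc_ids.append(doc.get("id"))
    return docs_with_meta, doc_ids


# Hypothetical sample data, not taken from quest.db.json.
sample = [
    {
        "resourceType": "Questionnaire",
        "id": "q-1",
        "title": "Medical History",
        "description": "Past conditions and medications.",
    },
    {"resourceType": "Questionnaire", "id": "q-2", "name": "Intake"},
]

docs_with_meta, doc_ids = extract_docs_meta(sample)
for description, meta in docs_with_meta:
    print(description, meta)
```

The second record has no `description`, so it falls back to the `"No description"` placeholder, and its metadata omits the missing `title`.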

### 2.3 Populate ChromaDB Vector Store

Now, we initialize the ChromaDB client and create/get a collection named `fhir-quest-semantic` using our custom Gemini embedding function.

The `populate_vector_db` function:
1.  Retrieves the questionnaire descriptions and metadata using `get_quest_docs_meta()`.
2.  Sets the embedding function mode to `retrieval_document`.
3.  Adds the descriptions as documents to the ChromaDB collection, using the questionnaire `id` as the document ID and storing the extracted `title` and `name` as metadata associated with each vector embedding.

In [None]:
import chromadb

DB_NAME = "fhir-quest-semantic"

embed_fn = GeminiEmbeddingFunction()
chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)


def populate_vector_db():
    """
    Populates the ChromaDB vector store with FHIR questionnaire data.
    """
    embed_fn.document_mode = True
    (desc_with_metad, doc_ids) = get_quest_docs_meta()
    descriptions, meta = zip(*desc_with_metad)

    db.add(documents=list(descriptions), ids=doc_ids, metadatas=list(meta))


populate_vector_db()

#### 2.3.1 Verify Database Population

Confirm that the questionnaires have been successfully added to the ChromaDB collection by checking the document count.

In [None]:
db.count()

### 2.4 Retrieval: Finding Relevant Questionnaires

We use the user's prompt to find a relevant questionnaire to fill in, in two steps:

1. Query the vector store for the questionnaire that is semantically closest to the user's request.
2. Use the Gemini model to validate that the retrieved questionnaire actually matches the user's prompt.

In [None]:
def generate_form_validation_prompt(user_prompt, quest_desc, quest_metadata):
    """
    Generates a prompt for validating the relevance of a questionnaire.
    This function constructs a prompt for the Gemini model to evaluate the relevance of a FHIR Questionnaire
    The model is expected to determine if the questionnaire is likely to be relevant and useful for the user's instruction.
    """
    return f"""
# Instruction
You are an evaluator. Your task is to evaluate the relevance of a form description and metadata to a user instruction.
We will provide you with the user instruction, and the form description and metadata.
Read the user instruction carefully to understand the user's need, and then evaluate if the provided form description and metadata are relevant to fulfilling that need based on the Criteria provided in the Evaluation section below.
You will assign the form description a rating following the Rating Rubric

# Evaluation
## Metric Definition
You will be assessing form relevance, which measures whether the provided form description and metadata are suitable for fulfilling the user's instruction. Relevance implies that a user could likely find the form useful and pertinent to their stated need.

## Criteria
Relevance to User Instruction: The form description and metadata align with the user's instruction and suggest the form could potentially address the user's need.
Usefulness for User Instruction: The form, as described, appears practically useful for a user attempting to follow the given instruction.
Clarity of Description: The form description and metadata are clear and understandable enough to assess relevance. (If description is unclear, down-rate even if potentially relevant).

## Rating Rubric
(YES). The form is very likely to be relevant and useful for the user instruction. The description is clear and strongly suggests a good match.
(NO). The form is not relevant to the user instruction. The description clearly indicates the form is unrelated to the user's need.

# User Inputs and Model Rating
## User Instruction

### Prompt
{user_prompt}

## Form Description and Metadata

### Form Instruction Description
{quest_desc}

### Form Metadata (JSON)
{quest_metadata}
"""

In [None]:
import enum


class RelevantRating(enum.Enum):
    """Enumeration to represent the relevance rating of a questionnaire."""

    YES = "Yes"
    NO = "No"


def discover_questionnaire(query):
    """
    Discovers a relevant questionnaire by querying the ChromaDB vector store and validating the result with the Gemini model.
    """
    embed_fn.document_mode = False
    result = db.query(query_texts=[query], n_results=1)
    queried_doc_ids = result.get("ids")
    try:
        interest_doc_id = queried_doc_ids[0][0]
    except IndexError:
        return None
    queried_doc_desc = result.get("documents")[0][0]
    queried_doc_meta = result.get("metadatas")[0][0]
    prompt = generate_form_validation_prompt(query, queried_doc_desc, queried_doc_meta)

    structured_output_config = types.GenerateContentConfig(
        response_mime_type="text/x.enum",
        response_schema=RelevantRating,
    )
    response = client.models.generate_content(
        model=model_id, contents=[prompt], config=structured_output_config
    )
    parsed_resp = response.parsed

    # Only return the ID when the model rated the questionnaire as relevant.
    if parsed_resp is RelevantRating.YES:
        return interest_doc_id
    return None
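
The indexing in `discover_questionnaire` (`[0][0]`) relies on the query result being a dictionary of lists of lists: one inner list per query text, one entry per hit. The sketch below unpacks such a structure defensively. `first_hit` and `hypothetical_result` are illustrative names and data, not output from a real query.

```python
def first_hit(result):
    """Return the top hit of the first query as a flat dict, or None if there is none."""
    ids = result.get("ids") or [[]]
    if not ids[0]:
        return None
    return {
        "id": ids[0][0],
        "document": result["documents"][0][0],
        "metadata": result["metadatas"][0][0],
    }


# Hypothetical result for a single query text with n_results=1.
hypothetical_result = {
    "ids": [["q-1"]],
    "documents": [["Past conditions and medications."]],
    "metadatas": [[{"id": "q-1", "title": "Medical History"}]],
    "distances": [[0.21]],
}

hit = first_hit(hypothetical_result)
empty = first_hit({"ids": [[]], "documents": [[]], "metadatas": [[]]})
print(hit)
print(empty)  # None
```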

### 2.5 Define the LangGraph Workflow

We define the type of our graph state before defining the node functions.

In [None]:
from typing import Any, Dict, List

from typing_extensions import TypedDict


class AgentState(TypedDict):
    """Structure of the agent's internal state tracked during graph execution."""

    audio_file_path: str
    uploaded_audio_file: Any
    instructions: str
    quest: Dict[str, Any]
    patient_res: Dict[str, Any]
    practitioner_res: Dict[str, Any]
    quest_resp: Dict[str, Any]  # parsed FHIR QuestionnaireResponse
    quest_found: bool
    soap_note: str
    soap_fhir_resources: List[Dict[str, Any]]
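
Each node in the graph returns only the keys it wants to change, and those partial updates are merged into the shared state. The sketch below mimics that idea with plain dictionaries. It is a conceptual simplification (keys are simply overwritten, with no custom reducers), and `apply_update` is a hypothetical helper, not a LangGraph function.

```python
def apply_update(state, update):
    """Return a new state dict with a node's partial update merged in."""
    merged = dict(state)
    merged.update(update)
    return merged


# Initial inputs, as they might be passed when the graph is invoked.
state = {
    "instructions": "Fill out a medical history report",
    "audio_file_path": "encounter.mp3",  # hypothetical path
}

# A questionnaire-lookup node reports what it found.
state = apply_update(state, {"quest_found": True, "quest": {"id": "q-1"}})

# A later node adds the generated note without touching earlier keys.
state = apply_update(state, {"soap_note": "S - Subjective: ..."})

print(sorted(state))
```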

#### 2.5.1 Define the Nodes Used in the Graph Workflow

In [None]:
def fetch_questionnaire(state: AgentState):
    """
    Fetches a relevant questionnaire based on a given query, using the `discover_questionnaire` function and reading from the local file system.
    """
    query = state.get("instructions")
    quest_id = discover_questionnaire(query)

    full_quest_docs = read_questionnaires_from_fs()
    of_interest_quest = None
    for quest in full_quest_docs:
        if quest["id"] == quest_id:
            of_interest_quest = quest
            break
    if of_interest_quest is None:
        return {"quest_found": False}
    else:
        return {"quest_found": True, "quest": of_interest_quest}

In [None]:
_upload_file_cache = {}  # maps local file path -> uploaded file object

def upload_to_gemini(state: AgentState):
    """
    Uploads the local audio file to Gemini if not already uploaded.
    Returns a dictionary with the uploaded file object.
    """
    local_file_path = state.get("audio_file_path")

    try:
        # Cache per path so that a different recording triggers a new upload.
        if local_file_path not in _upload_file_cache:
            _upload_file_cache[local_file_path] = client.files.upload(file=local_file_path)
        return {"uploaded_audio_file": _upload_file_cache[local_file_path]}

    except Exception as e:
        print(f"Error uploading to Gemini: {e}")
        # Returning an empty update leaves `uploaded_audio_file` unset,
        # which the routing step treats as a failed upload.
        return {}

In [None]:
import json_repair

def get_questionnaire_response(state: AgentState):
    """
    Use gen ai to generate a questionnaireResponse(form submission instance) from the audio file and using a
    questionnaire discovered in a previous step.
    """
    audio_file = state.get("uploaded_audio_file")
    questionnaire = state.get("quest")
    
    prompt = f"""
        You are an audio processing expert with extensive experience in converting audio files into structured data formats, specifically JSON. Your specialty lies in accurately extracting meaningful information from audio recordings and populating questionnaire-style data structures based on that information.
        
        Your task is to analyze the provided audio file and fill out the questionnaire with the relevant responses. The output must be a FHIR QuestionnaireResponse whose items reference the `linkId` values of the questionnaire.

        
        Here is the questionnaire:
        {json.dumps(questionnaire, indent=2)}

        
        Please analyze the audio and generate the appropriate questionnaire response.

        Use this JSON schema:

        QuestionnaireResponse = <generated questionnaireResponse>
        return: QuestionnaireResponse
    """

    response = client.models.generate_content(
        model=model_id,
        contents=[audio_file, prompt],
        config=types.GenerateContentConfig(
            temperature=0,
            response_mime_type='application/json',
        )
    )
    qr = json_repair.loads(response.text)
    return {"quest_resp": qr}
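
Because the QuestionnaireResponse is produced by a model, it can be worth checking it against the questionnaire before using it. One simple check, sketched below under the assumption that items are matched by `linkId`, is to flag response items whose `linkId` does not exist in the questionnaire. `collect_link_ids` and `unknown_link_ids` are hypothetical helpers, and the sample resources are made up.

```python
def collect_link_ids(items):
    """Recursively collect linkId values from nested FHIR items."""
    ids = set()
    for item in items or []:
        if "linkId" in item:
            ids.add(item["linkId"])
        # Groups nest child items directly under `item`.
        ids |= collect_link_ids(item.get("item"))
        # Responses may also nest items under individual answers.
        for answer in item.get("answer", []):
            ids |= collect_link_ids(answer.get("item"))
    return ids


def unknown_link_ids(questionnaire, response):
    """Return linkIds used in the response that the questionnaire does not define."""
    return collect_link_ids(response.get("item")) - collect_link_ids(questionnaire.get("item"))


questionnaire = {
    "resourceType": "Questionnaire",
    "item": [
        {"linkId": "1", "text": "Allergies?", "type": "string"},
        {"linkId": "2", "type": "group", "item": [{"linkId": "2.1", "type": "boolean"}]},
    ],
}
response = {
    "resourceType": "QuestionnaireResponse",
    "status": "completed",
    "item": [
        {"linkId": "1", "answer": [{"valueString": "Penicillin"}]},
        {"linkId": "9", "answer": [{"valueBoolean": True}]},  # not in the questionnaire
    ],
}

unknown = unknown_link_ids(questionnaire, response)
print(unknown)  # {'9'}
```

An empty result does not prove the response is valid FHIR; it only rules out one common failure mode of generated output.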

In [None]:
soap_note_generation_sys_prompt = """You are an expert medical scribe tasked with generating a concise and accurate SOAP (Subjective, Objective, Assessment, Plan) note from a health care provider - patient conversation.
**Input:** You will be provided with an audio recording of the conversation.
**Task:**  Analyze the transcription and extract relevant information to populate each section of a SOAP note.
**Output:**  Generate a SOAP note in the following structured format:
S - Subjective:
    Chief Complaint (CC): [Concise statement of the patient's primary reason for visit]
    History of Present Illness (HPI): [Detailed narrative of the patient's current problem, using OLDCARTS or similar mnemonic if applicable. Include onset, location, duration, character, aggravating/alleviating factors, radiation, timing, severity.]
    Past Medical History (PMH): [Summarize relevant past medical conditions mentioned by the patient or provider.]
    Medications: [List current medications mentioned by the patient.]
    Allergies: [List known allergies mentioned by the patient.]
    Social History (SH): [Extract pertinent social history details like smoking, alcohol use, occupation, living situation if discussed and relevant to the encounter.]
    Family History (FH): [Summarize relevant family history if discussed.]
    Review of Systems (ROS): [Briefly list any systems reviewed and any symptoms reported by the patient related to those systems. Focus on relevant systems based on the chief complaint.]

O - Objective:
    Vital Signs: [List any vital signs mentioned in the transcription (BP, HR, RR, Temp, SpO2, Pain Scale) and their values if provided. If not explicitly stated in the transcription, state "Not documented in transcription."]
    Physical Exam Findings: [Summarize any physical exam findings described by the provider. Focus on findings related to the chief complaint and ROS. If no physical exam findings are explicitly mentioned in the transcription, state "Not documented in transcription, infer from provider statements if possible (e.g., 'lungs sound clear' implies auscultation)."]
    Lab Results: [List any lab results mentioned by the provider or patient, including test name and result. If no lab results are mentioned, state "Not documented in transcription."]
    Imaging Results: [List any imaging results mentioned, including type and findings. If none mentioned, state "Not documented in transcription."]
    Other Diagnostic Tests: [List any other diagnostic test results mentioned (e.g., EKG, PFTs). If none mentioned, state "Not documented in transcription."]

A - Assessment:
    Differential Diagnoses: [List any differential diagnoses discussed by the provider. Include potential diagnoses considered.]
    Working Diagnosis (or Most Likely Diagnosis): [Identify the most likely diagnosis or working diagnosis stated or strongly implied by the provider. If no clear diagnosis is stated, summarize the provider's assessment of the patient's condition.]
    Problem List: [List any active or chronic problems identified or confirmed by the provider during the encounter. Focus on problems relevant to this visit.]

P - Plan:
    Diagnostic Plan: [List any further diagnostic tests, labs, or imaging ordered or planned by the provider.]
    Therapeutic Plan: [Summarize the treatment plan, including medications prescribed, procedures planned, therapies recommended, lifestyle modifications advised, and referrals made.]
    Patient Education: [Summarize any patient education provided by the provider, including instructions, self-care advice, and information about medications or conditions.]
    Follow-up Plan: [Describe the follow-up plan, including when the patient should return, specific instructions for follow-up, and any "return precautions" mentioned (e.g., "return if symptoms worsen").]
    Consults/Referrals: [List any consultations or referrals to specialists or other providers planned by the provider.]   
"""

In [None]:
def generate_soap_note_from_audio(state: AgentState):
    """
    Uses generative AI to summarize the audio file into a SOAP note.
    """
    audio_file = state.get("uploaded_audio_file")

    response = client.models.generate_content(
        model=model_id,
        contents=[audio_file],
        config=types.GenerateContentConfig(
            temperature=0.1,
            system_instruction=soap_note_generation_sys_prompt
        )
    )

    return {"soap_note": response.text}
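
If downstream code needs the individual SOAP sections, the note text can be split on the section headers requested in the system prompt. The sketch below assumes the model reproduces the `S - Subjective:` style headers exactly, which is not guaranteed; `split_soap_sections` and `sample_note` are hypothetical.

```python
import re

# Matches header lines such as "S - Subjective:" at the start of a line.
SECTION_PATTERN = re.compile(
    r"^[SOAP] - (Subjective|Objective|Assessment|Plan):[ \t]*$", re.MULTILINE
)


def split_soap_sections(note):
    """Split a SOAP note into a dict keyed by section name."""
    matches = list(SECTION_PATTERN.finditer(note))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next header or the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[match.group(1)] = note[match.end():end].strip()
    return sections


sample_note = """S - Subjective:
    Chief Complaint (CC): Chest pain
O - Objective:
    Vital Signs: Not documented in transcription.
A - Assessment:
    Working Diagnosis: Possible angina
P - Plan:
    Follow-up Plan: Return in one week
"""

sections = split_soap_sections(sample_note)
print(list(sections))  # ['Subjective', 'Objective', 'Assessment', 'Plan']
```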

In [None]:
def truncate_text(text: str, n: int = 200) -> str:
    """Truncates text to the first n characters, adding ellipsis if truncated."""    
    if len(text) > n:
        return text[:n] + "..."
    return text

def write_response(agent_state: AgentState):
    """
    Pretty prints specific fields from the AgentState dictionary:
    `quest`, `quest_resp`, `soap_note`, `soap_fhir_resources`.

    Prints an error message if any of quest, quest_resp, or soap_note is missing.
    Uses section headers, delimiters, line spacing, and pretty printing for dictionaries and lists.
    Prints each SOAP FHIR resource explicitly with an index.
    """

    if not agent_state.get('quest') or not agent_state.get('quest_resp') or not agent_state.get('soap_note'):
        print("=" * 20)
        print("Error: Cannot pretty print AgentState fields.")
        if not agent_state.get('quest'):
            print("- 'quest' field is missing.")
        if not agent_state.get('quest_resp'):
            print("- 'quest_resp' field is missing.")
        if not agent_state.get('soap_note'):
            print("- 'soap_note' field is missing.")
        print("=" * 20)
        return

    delimiter = "=" * 20 + "\n"

    print(delimiter)
    print("======== Quest ========")
    print(delimiter)
    print(truncate_text(json.dumps(agent_state['quest'], indent=2)))
    print("\n" + delimiter)

    print("===== Quest Response =====")
    print(delimiter)
    print(truncate_text(json.dumps(agent_state['quest_resp'], indent=2)))
    print("\n" + delimiter)

    print("======== SOAP Note ========")
    print(delimiter)
    print(truncate_text(agent_state['soap_note']))
    print("\n" + delimiter)

    print("==== SOAP FHIR Resources ====")
    print(delimiter)
    if agent_state.get('soap_fhir_resources'):
        for index, fhir_resource in enumerate(agent_state['soap_fhir_resources']):
            print(f"""--- SOAP FHIR Resource {index}: {fhir_resource.get("resourceType")} ---""")
            print(truncate_text(json.dumps(fhir_resource, indent=2)))
            print("-" * 20 + "\n") # Sub-delimiter for each resource
    else:
        print("No SOAP FHIR Resources found.")
    print(delimiter)

In [None]:
import uuid
import base64

def generate_binary_for_soap(soap_note):
    """
    Generates the FHIR Binary resource that stores the SOAP note on the FHIR server.
    """
    soap_note_bytes = soap_note.encode('utf-8')
    base64_encoded_content = base64.b64encode(soap_note_bytes).decode('utf-8')
    binary = {
        "id": str(uuid.uuid4()),
        "resourceType": "Binary",
        "contentType": "text/plain",
        # FHIR R4 stores the base64-encoded payload in `data`
        "data": base64_encoded_content
    }

    return binary
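
The Binary payload is plain base64 over UTF-8 bytes, so a consumer can recover the note by reversing the two steps. The sketch below round-trips a short made-up note; `encode_note` and `decode_note` are hypothetical helpers that mirror the encoding used above.

```python
import base64


def encode_note(text):
    """UTF-8 encode the note, then base64 encode it into an ASCII string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_note(encoded):
    """Reverse of encode_note: base64 decode, then UTF-8 decode."""
    return base64.b64decode(encoded).decode("utf-8")


# Includes a non-ASCII character to show that UTF-8 survives the round trip.
note = "S - Subjective:\n    Chief Complaint (CC): Chest pain, 38.5 °C fever"
encoded = encode_note(note)
restored = decode_note(encoded)
print(restored == note)  # True
```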

In [None]:
import json_repair

def generate_doc_ref_for_soap(soap_note):
    prompt = f"""You are an expert in FHIR (Fast Healthcare Interoperability Resources) and are tasked with creating a DocumentReference resource. This resource will serve as an index entry for a SOAP note that will be stored separately as a FHIR Binary resource.  You are given the content of the SOAP note as text.

    Generate a well-formed JSON representation of a FHIR DocumentReference resource. This resource should contain metadata extracted from the SOAP note and information needed to locate the SOAP note once it is stored as a Binary resource.

    **Thought Process:**

    1. **Understand the Request:** I need to create a DocumentReference. This DocumentReference is *about* a SOAP note, not the SOAP note itself. The SOAP note will become a Binary resource later.

    2. **Identify Key FHIR DocumentReference Fields:**  To properly index a SOAP note, I need to consider the essential fields in a DocumentReference.  Let's think about what information is crucial for indexing and retrieval:
        * `resourceType`: Always "DocumentReference".
        * `status`:  Likely "current".
        * `subject`: Who is the SOAP note about? (Patient) - Extract from SOAP note.
        * `date`: When was the SOAP note created? - Extract from SOAP note (or use current date if not found).
        * `author`: Who wrote the SOAP note? (Practitioner/Provider) - Extract from SOAP note.
        * `type`: What kind of document is this? (SOAP Note, Progress Note, etc.) - Needs to be coded.
        * `category`:  Broader categories for the document (Clinical Note, etc.) - Needs to be coded.
        * `content`:  Crucially, this describes the *content* being referenced. Since the SOAP note will be a Binary, we need:
            * `content.attachment.contentType`:  What kind of content is the SOAP note? (e.g., text/plain, application/pdf if we assume it *could* be converted later). Let's assume text/plain for now since the input is text.
            * `content.attachment.url`:  This is where we would point to the Binary resource *if it existed*. Since we aren't creating the Binary, we'll use a placeholder URI format that indicates it's a reference to a Binary. Something like `urn:uuid:{{binary-resource-id}}` would work, where `{{binary-resource-id}}` is a placeholder for a future Binary resource ID.
            * `content.format`:  Describe the format of the SOAP note content itself (e.g., text/plain). We could also use a more specific coding if we know the format in more detail.
        * `context`:  Contextual information, like the encounter or practice setting.  Less critical for basic indexing, but good to consider if information is available.  Could be Encounter reference.

    3. **Information Extraction Strategy from SOAP Note:**  I need to look at the SOAP note text and identify the pieces of information needed for the DocumentReference fields (Patient, Author, Date, Document Type).  The format of the SOAP note might be somewhat structured even if it's just text. I'll need to make reasonable assumptions about common SOAP note structures. If information is missing or unclear, I should leave out the corresponding FHIR field.

    4. **FHIR Coding for `type` and `category`:** I need to use CodeableConcepts for `type` and `category`. For SOAP notes, we can use LOINC codes for `type` (e.g., "Progress note" `72517-5`) and potentially SNOMED CT for `category` (e.g., "Clinical note" `114884008`). I will use LOINC for type and a general category.

    Use this json schema for output:

    return {{DocumentReference}}
    """

    response = client.models.generate_content(
        model=model_id,
        contents=[prompt, soap_note],
        config=types.GenerateContentConfig(
            temperature=0,
            response_mime_type="application/json",
        ),
    )

    document_ref = json_repair.loads(response.text)
    return document_ref

In [None]:
def generate_soap_resources(state: AgentState):
    """
    Creates a DocumentReference resource that indexes the Binary resource holding the SOAP note.
    """
    soap_note = state.get("soap_note")

    binary = generate_binary_for_soap(soap_note)
    document_ref = generate_doc_ref_for_soap(soap_note)
    patient_id = state.get("patient_res").get("id")
    practitioner_id = state.get("practitioner_res").get("id")
    patient_ref = f"Patient/{patient_id}"
    practitioner_ref = f"Practitioner/{practitioner_id}"
    binary_id = binary.get("id")

    document_ref["content"] = [
        {
            "attachment": {
                "contentType": "text/plain",
                "url": f"Binary/{binary_id}",  # Placeholder URL
            },
            "format": {
                "system": "http://terminology.hl7.org/CodeSystem/fhir-format-codes",
                "code": "text/plain",
                "display": "Plain text",
            },
        }
    ]
    document_ref["subject"] = {
        "reference": patient_ref
    }
    # `author` has cardinality 0..* in FHIR R4, so it must be a list of references.
    document_ref["author"] = [
        {"reference": practitioner_ref}
    ]
    return {"soap_fhir_resources": [binary, document_ref]}
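
One possible way to hand the two resources to a FHIR server in a single request is a transaction Bundle. This notebook does not do that; the sketch below is an assumption about how the integration could look, and `to_transaction_bundle` is a hypothetical helper. Resources with an `id` are sent as `PUT` to `ResourceType/id`; those without are sent as `POST` so the server assigns one. Whether a server accepts client-assigned IDs depends on its configuration.

```python
def to_transaction_bundle(resources):
    """Wrap FHIR resources in a transaction Bundle (one entry per resource)."""
    entries = []
    for resource in resources:
        resource_type = resource["resourceType"]
        if resource.get("id"):
            # Client-assigned ID: update-or-create at a known location.
            request = {"method": "PUT", "url": f"{resource_type}/{resource['id']}"}
        else:
            # No ID: let the server create the resource and assign one.
            request = {"method": "POST", "url": resource_type}
        entries.append({"resource": resource, "request": request})
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}


# Hypothetical, heavily abbreviated resources.
resources = [
    {"resourceType": "Binary", "id": "b-1", "contentType": "text/plain", "data": "YWJj"},
    {"resourceType": "DocumentReference", "status": "current"},
]

bundle = to_transaction_bundle(resources)
for entry in bundle["entry"]:
    print(entry["request"])
```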

#### 2.5.2 Defining the Graph Workflow

Bring everything together: register the nodes and create the edges that define the workflow.

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.graph import StateGraph, END, START

model = ChatGoogleGenerativeAI(model=model_id, google_api_key=GOOGLE_API_KEY)

# Define the graph
wk_graph = StateGraph(AgentState)

def aggregate_state(state: AgentState):
    return state

def aggregate_state2(state: AgentState):
    return state

# node magic strings
audio_file_upload_key = "upload_file_to_gemini"
fetch_questionnaire_key = "discover_and_fetch_questionnaire"
gen_quest_resp_key = "generate_questionnaire_response"
gen_soap_note_key = "generate_soap_note"
write_resp_key = "write_resp"
quest_audio_state_aggregator_key = "inputs_aggregator"
soap_quest_state_aggregator_key = "output_aggregator"
generate_soap_resources_key = "generate_soap_resources"

# Nodes
wk_graph.add_node(audio_file_upload_key, upload_to_gemini)
wk_graph.add_node(fetch_questionnaire_key, fetch_questionnaire)
wk_graph.add_node(gen_quest_resp_key, get_questionnaire_response)
wk_graph.add_node(gen_soap_note_key, generate_soap_note_from_audio)
wk_graph.add_node(write_resp_key, write_response)
wk_graph.add_node(quest_audio_state_aggregator_key, aggregate_state)
wk_graph.add_node(soap_quest_state_aggregator_key, aggregate_state2)
wk_graph.add_node(generate_soap_resources_key, generate_soap_resources)

def check_file_upload(state: AgentState):
    next_nodes = []
    if state.get("uploaded_audio_file") is None:
        return write_resp_key
    else:
        next_nodes.append(gen_soap_note_key)
    if state.get("quest_found"):
        next_nodes.append(gen_quest_resp_key)
    return next_nodes


# Edges
wk_graph.add_edge(START, audio_file_upload_key)
wk_graph.add_edge(START, fetch_questionnaire_key)
wk_graph.add_edge(
    [audio_file_upload_key, fetch_questionnaire_key],
    quest_audio_state_aggregator_key)
   
wk_graph.add_conditional_edges(quest_audio_state_aggregator_key, check_file_upload)
wk_graph.add_edge(gen_soap_note_key, generate_soap_resources_key)

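# Join edge: when a questionnaire was found, the aggregator waits for both branches.
wk_graph.add_edge(
    [gen_quest_resp_key, generate_soap_resources_key],
    soap_quest_state_aggregator_key
)
# Direct edge: keeps the workflow moving when the questionnaire branch is skipped.
wk_graph.add_edge(generate_soap_resources_key, soap_quest_state_aggregator_key)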

wk_graph.add_edge(soap_quest_state_aggregator_key, write_resp_key)

wk_graph.add_edge(write_resp_key, END)

graph = wk_graph.compile()

In [None]:
from IPython.display import Image

Image(graph.get_graph().draw_mermaid_png())
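
The conditional edge is the only place where the workflow branches, so it helps to see its decisions in isolation. The sketch below restates the routing logic as a standalone function using the node names defined earlier and runs it on three made-up states. `route_after_inputs` is a hypothetical name, and unlike the original it always returns a list, purely for uniformity.

```python
def route_after_inputs(state):
    """Decide which nodes run after the upload and questionnaire lookup finish."""
    if state.get("uploaded_audio_file") is None:
        # Without audio nothing can be generated; go straight to reporting.
        return ["write_resp"]
    next_nodes = ["generate_soap_note"]
    if state.get("quest_found"):
        # Form filling runs in parallel with SOAP note generation.
        next_nodes.append("generate_questionnaire_response")
    return next_nodes


upload_failed = route_after_inputs({"quest_found": True})
no_questionnaire = route_after_inputs({"uploaded_audio_file": object(), "quest_found": False})
both_branches = route_after_inputs({"uploaded_audio_file": object(), "quest_found": True})

print(upload_failed)     # ['write_resp']
print(no_questionnaire)  # ['generate_soap_note']
print(both_branches)     # ['generate_soap_note', 'generate_questionnaire_response']
```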

#### 2.5.3 Run the Workflow

The patient and practitioner resources below represent the patient and the provider, respectively, as they would appear in the EMR.

In [None]:
# the sample audio file that will be used
audio_file_path = "../Data/CAR0001.mp3"

practitioner_resource = {
    "resourceType": "Practitioner",
    "id": "practitioner-adam-careful",
    "identifier": [{"system": "http://hl7.org/fhir/sid/us-npi", "value": "9999999999"}],
    "active": True,
    "name": [{"family": "Careful", "given": ["Adam"]}],
}

patient_resource = {
    "resourceType": "Patient",
    "id": "patient-jane-doe",
    "identifier": [
        {"use": "usual", "system": "urn:oid:1.2.3.4.5.6.7", "value": "MRN12345"}
    ],
    "active": True,
    "name": [{"use": "official", "family": "Doe", "given": ["Jane"]}],
    "gender": "female",
    "birthDate": "1985-08-15",
}

inputs = {
    "audio_file_path": audio_file_path,
    "instructions": "Fill out a medical history report",
    "practitioner_res": practitioner_resource,
    "patient_res": patient_resource
}
state = graph.invoke(inputs)