<a href="https://colab.research.google.com/github/ekrombouts/GenCareAI/blob/main/notebooks/100_note_generation/140_GenerateClientRecords.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GenCare AI: Generating client records

## Info


**Author:** Eva Rombouts  
**Date:** 2024-06-15  
**Updated:**   
**Version:** 2.0

### Description
This script generates synthetic care notes and careplans for clients in a psychogeriatric ward using the OpenAI GPT-3.5-turbo model.  
The care notes are based on client profiles and scenarios generated in earlier scripts in this repo. The goal is to add as much story and variation to the notes as possible, without letting the model produce outputs that are overly creative.  
To achieve this, I use structured prompts (to help the model understand the expected content and its creative liberties), example libraries and memory integration. As memory, I use summary memory, structured as careplans, a method commonly known and used in nursing home environments. 
The output parser uses Pydantic models to structure and validate the care notes, ensuring proper format and content.  
Chroma is used to retrieve example care notes that are representative of the client profile and scenario. 

The script processes client profiles and scenarios, generates care notes, and updates careplans accordingly, creating comprehensive client records.  
The goal is to create a diverse and realistic dataset for NLP experiments in nursing homes.

**Please note** that generating data with OpenAI is not free. Generating records for 24 clients, with a mean of 8 months per client and 5 iterations per month, takes about 3 hours and costs approximately $2 with gpt-3.5.
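
As a rough back-of-the-envelope check on these figures, the sketch below derives the number of note-generation calls and the average cost and duration per call. It uses only the numbers quoted above and assumes one note-generation call per iteration; careplan updates and embedding calls are not counted separately.

```python
# Approximate figures quoted in the note above.
n_clients = 24
mean_months = 8
iterations_per_month = 5
total_cost_usd = 2.0
total_hours = 3.0

# Assumption of this sketch: one note-generation call per iteration.
n_calls = n_clients * mean_months * iterations_per_month

cost_per_call = total_cost_usd / n_calls
seconds_per_call = total_hours * 3600 / n_calls

print(f"{n_calls} calls, ~${cost_per_call:.4f} and ~{seconds_per_call:.2f} s per call")
```

Scaling the number of clients or iterations in this sketch gives a first estimate of cost and run time before starting a longer run.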

## Setup

In [1]:
!pip install GenCareAI
from GenCareAI.GenCareAIUtils import GenCareAISetup

setup = GenCareAISetup()

if setup.environment == 'Colab':
    !pip install -q langchain langchain-openai langchain-community langchain-chroma



In [2]:
# Imports
import random
import pandas as pd
from pprint import pprint
from typing import List

from langchain.output_parsers import PydanticOutputParser, CommaSeparatedListOutputParser
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field
from langchain_community.callbacks import get_openai_callback

In [3]:
# Paths to various data files and constants for the model and temperature settings.
ward_name = 'Athena' # Use the same ward name for which the client profiles and scenarios were generated
path_profiles = setup.get_file_path(f'data/gcai_client_profiles_{ward_name}.csv')
path_scenarios = setup.get_file_path(f'data/gcai_client_subscenarios_{ward_name}.csv')
path_notes = setup.get_file_path(f'data/gcai_client_notes_{ward_name}.csv')
path_careplans = setup.get_file_path(f'data/gcai_client_careplans_{ward_name}.csv')

path_db_gcai = setup.get_file_path('data/chroma_db_gcai_notes')
collection_name = 'anonymous_notes'

model_notes = 'gpt-3.5-turbo-0125'
model_careplans = 'gpt-3.5-turbo-0125'
model_keywords = 'gpt-3.5-turbo-0125'
model_embeddings = 'text-embedding-ada-002'

temp_notes = 1.1
temp_careplans = 1.0
temp_keywords = 1.0

verbose = True # Set to True for debugging / printing
sample_client_id = 1
sample_scenario_id = 3
sep_line = 100*'-'

### Load data

The notebooks [110_GenerateClientProfiles.ipynb](), [120_GenerateClientScenarios.ipynb]() and [130_GenerateClientSubScenarios.ipynb]() generate the client profiles and scenarios that are loaded here.

- ***df_profiles***: Contains one row per client. The row describes the type of dementia the client is diagnosed with, the ADL care needs, medical symptoms and diseases, mobility, and behavior.
  
- ***df_scenarios***: Each client has zero or more scenario lines. Each line is identified by a `period` and `sub_period` and holds a free-text `events_description`.

In [4]:
# Load scenarios and profiles from CSV files
df_scenarios = pd.read_csv(path_scenarios)
df_profiles = pd.read_csv(path_profiles)

df_scenarios['scenario_id'] = df_scenarios.groupby('client_id').cumcount() + 1

if verbose:
    print(df_profiles.info())
    print(df_scenarios.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24 entries, 0 to 23
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   client_id      24 non-null     int64 
 1   naam           24 non-null     object
 2   type_dementie  24 non-null     object
 3   somatiek       24 non-null     object
 4   adl            24 non-null     object
 5   mobiliteit     24 non-null     object
 6   gedrag         24 non-null     object
dtypes: int64(1), object(6)
memory usage: 1.4+ KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 573 entries, 0 to 572
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   client_id           573 non-null    int64 
 1   period              573 non-null    int64 
 2   sub_period          573 non-null    int64 
 3   events_description  573 non-null    object
 4   scenario_id         573 non-null    int64 
dtypes: int64(4)
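
The `scenario_id` assigned in the cell above is a running count of scenario rows per client, starting at 1. A plain-Python sketch of the same numbering, on made-up client ids, is:

```python
from collections import defaultdict

# Toy client_id column; the values are invented for illustration.
client_ids = [1, 1, 1, 2, 2, 3]

counters = defaultdict(int)
scenario_ids = []
for client_id in client_ids:
    counters[client_id] += 1                  # next sequence number for this client
    scenario_ids.append(counters[client_id])

print(scenario_ids)  # [1, 2, 3, 1, 2, 1]
```

The numbering follows row order, so it only reflects chronology if the scenario rows of each client are already sorted by period.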

In [5]:
if verbose:
    # Select the profile row for the sample client
    sample_profile_row = df_profiles.loc[df_profiles['client_id'] == sample_client_id].iloc[0]

    # Select the scenario row for the sample client and scenario
    sample_scenario_row = df_scenarios.loc[
        (df_scenarios['client_id'] == sample_client_id) & 
        (df_scenarios['scenario_id'] == sample_scenario_id)
    ].iloc[0]

### Functions to display the data

In [6]:
def format_client_profile(profile_row, display_name=True, display_diagnosis=True):
    """
    Format profile information from a given profile_row.
    
    Args:
    - profile_row: A Pandas Series object containing the client's profile information.
    - display_name: Boolean that determines whether the client's name should be displayed.
    - display_diagnosis: Boolean that determines whether the type of dementia should be displayed.
    
    Returns:
    - A formatted profile as a string.
    """
    profile = ""
    if display_name:
        profile += f"Naam: {profile_row['naam']}\n"
    if display_diagnosis:
        profile += f"Type Dementie: {profile_row['type_dementie']}\n"
    profile += (f"Lichamelijke klachten: {profile_row['somatiek']}\n"
                f"ADL: {profile_row['adl']}\n"
                f"Mobiliteit: {profile_row['mobiliteit']}\n"
                f"Cognitie/gedrag: {profile_row['gedrag']}")
    
    return profile

if verbose:
    sample_profile = format_client_profile(sample_profile_row, display_name=True)
    sample_profile_no_name = format_client_profile(sample_profile_row, display_name=False, display_diagnosis=False)
    print(sample_profile)
    print(sep_line)
    print(sample_profile_no_name)

Naam: Meneer Jan de Vries
Type Dementie: Alzheimer
Lichamelijke klachten: Diabetes, gehoorproblemen
ADL: Afhankelijk van hulp bij aankleden, wassen en toiletgang
Mobiliteit: Gebruik van een rollator, valgevaar
Cognitie/gedrag: Rustig, soms verward, moeite met oriëntatie in tijd en plaats
----------------------------------------------------------------------------------------------------
Lichamelijke klachten: Diabetes, gehoorproblemen
ADL: Afhankelijk van hulp bij aankleden, wassen en toiletgang
Mobiliteit: Gebruik van een rollator, valgevaar
Cognitie/gedrag: Rustig, soms verward, moeite met oriëntatie in tijd en plaats


In [7]:
# Function to return the scenario description for a given scenario row
def format_scenario(scenario_row):
    """
    Formats and returns the events description for a given scenario.

    Args:
    - scenario_row: A Pandas Series object representing a row from the scenarios DataFrame. 
      This row contains information about the scenario of a specific client.

    Returns:
    - A string with the 'events_description' of the provided scenario_row, which describes 
      the key events and details of the scenario.
    """
    
    # Return the events description from the scenario_row
    return scenario_row['events_description']

if verbose:
    sample_scenario = format_scenario(sample_scenario_row)
    print(sample_scenario)    

De professionele begeleiding en goede zorg dragen bij aan de verbetering van meneer Jan's gemoedstoestand en fysieke gezondheid. Hij voelt zich steeds meer op zijn gemak in het verpleeghuis.


## Setting up the example library

To improve the contextual relevance and diversity of the generated care notes, we will dynamically inject example notes into the prompts. 

Chroma is used in this context to select example care notes that are representative of the client profile and scenario. By using a vector database, the system can efficiently find example notes that match the specific characteristics of each client. The Chroma vector database, created [here](), contains a variety of (synthetic) notes for anonymous clients. The retriever queries this database, allowing the processing functions to access and integrate the stored example notes.  
By adding these dynamically selected examples to the prompt, this approach aims to make the generated care notes more contextually appropriate and diverse.

Our approach is:
- Generate Keywords: Using the LLM, keywords are generated from a client’s profile and scenario. These keywords are used to query the Chroma vector database for relevant example notes.
- Initialize the Chroma Vector Database.
- Retrieve Example Notes: Utilizing a retriever to search the vector database with the generated keywords, supplemented by additional neutral terms to ensure a broad coverage of relevant care notes.
- Filter by Gender: Filter the retrieved example notes to exclude those with gender-specific pronouns that do not match the client’s gender, ensuring the relevance of the examples.
- Sample Examples: Randomly sample a subset of the filtered example notes to inject into the prompts, enhancing the diversity of the generated care notes.
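
The steps above can be sketched end to end with toy stand-ins. In this sketch, `fake_generate_keywords`, `fake_retrieve` and `build_examples` are hypothetical helpers that only mimic the roles of the LLM keyword step and the Chroma retriever; the notebook's own functions are defined further down.

```python
import random

def fake_generate_keywords(profile, scenario):
    # Hypothetical stand-in for the LLM keyword step.
    return ["valgevaar", "verward"]

def fake_retrieve(queries):
    # Hypothetical stand-in for the vector database lookup.
    library = {
        "valgevaar": ["Dhr was wankel bij het lopen.", "Mw liep onzeker met de rollator."],
        "verward": ["Client was vanochtend verward."],
        "ADL": ["Geholpen met wassen en aankleden."],
    }
    notes = []
    for query in queries:
        notes.extend(library.get(query, []))
    return notes

def build_examples(profile, scenario, excluded_titles, num_items, seed=None):
    keywords = fake_generate_keywords(profile, scenario) + ["ADL"]  # add a neutral term
    candidates = fake_retrieve(keywords)
    # Crude check on the first word only; a simplification for this sketch.
    filtered = [note for note in candidates
                if note.split()[0].lower() not in excluded_titles]
    rng = random.Random(seed)
    return rng.sample(filtered, min(num_items, len(filtered)))

examples = build_examples("profile text", "scenario text",
                          excluded_titles={"mw", "mevrouw"}, num_items=2, seed=0)
print(examples)
```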

### Retrieving notes for the example library

The next step is to invoke the retriever with the generated keywords to obtain example notes. To add more ‘neutral’ notes, we also add the keywords ‘ADL’, ‘mobility’, and ‘food and drinks’ to the list.

Please note: The retriever utilizes an OpenAI embedding model (which is not free), which transforms the keywords into embeddings. These embeddings are then used to find similar notes in the initialized vector database, ensuring the retrieval of contextually relevant examples.
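
To illustrate what "finding similar notes" means, the sketch below ranks a few notes by cosine similarity to a query vector. The three-dimensional vectors are made up, real embeddings have far more dimensions, and the distance metric actually used depends on how the Chroma collection was configured; `top_k` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed_notes, k):
    """Return the k notes whose vectors are most similar to the query vector."""
    ranked = sorted(indexed_notes,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [note for note, _ in ranked[:k]]

# Made-up "embeddings" for three notes.
toy_index = [
    ("note about mobility", [0.9, 0.1, 0.0]),
    ("note about eating", [0.0, 0.2, 0.9]),
    ("note about falls", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]

print(top_k(query, toy_index, k=2))
```

In the notebook, the retriever's `k` is set to 40, so each query returns a pool of 40 candidate notes that is filtered and sampled afterwards.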

In [15]:
# Initialize Chroma vector database
vectordb = Chroma(
    persist_directory=path_db_gcai,
    embedding_function=OpenAIEmbeddings(api_key=setup.get_openai_key(), model=model_embeddings),
    collection_name = collection_name
    )

retriever = vectordb.as_retriever(search_kwargs={"k": 40})

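def retrieve_examples(profile, scenario, retriever):
    """
    Retrieves example notes from the vector database that are similar to the client profile.

    Args:
    - profile: Formatted client profile (string), used as the query text.
    - scenario: Scenario description (string). Currently not included in the query.
    - retriever: Retriever on the Chroma vector database.

    Returns:
    - list: Page contents of the retrieved documents, stripped of surrounding double quotes.
    """
    # Only the profile is used as query text; the scenario is currently left out
    documents = retriever.invoke(profile)
    return [document.page_content.strip('"') for document in documents]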

example_library = retrieve_examples(
    profile=format_client_profile(
        profile_row=sample_profile_row, 
        display_name=False,
        display_diagnosis=False
        ),
        scenario=sample_scenario, 
        retriever=retriever)
print(example_library)

['Begint langzaam te verzwakken en heeft veel moeite met lopen en opstaan. Mogelijk fysiotherapie intensiveren en mantelzorgers informeren over veranderingen.', 'Signaleer toegenomen vermoeidheid en kortademigheid tijdens het uitvoeren van ADL-taken. Mogelijk verband met afgenomen conditie.', 'Mw was vandaag minder mobiel en had voortdurend hulp nodig bij transfers. Veranderingen in haar gezondheidstoestand worden gemonitord en geëvalueerd.', 'Dhr had vanmorgen veel moeite met opstaan en voelde zich zwak. Geduldig geholpen met ADL-taken en extra rustmomenten ingelast.', 'Mw heeft moeite met opstaan uit de stoel en vraagt regelmatig om assistentie. Zorgmomenten zorgvuldig plannen.', 'Mw lijkt steeds meer achteruit te gaan in mobiliteit. Extra aandacht en ondersteuning zijn nodig om de kwaliteit van leven en zelfstandigheid te behouden.', 'Mw vertoont steeds meer symptomen van beginnende dementie. Familie ingelicht en zorgplan geëvalueerd.', 'Mw klaagde over duizeligheid bij 

The example notes often contain gender-specific pronouns or titles. When these are included in the prompt, the model tends to generate responses with incorrect pronouns for our client. Therefore, we filter the example library to exclude notes that use a title of the other gender: 'mw' or 'mevrouw' for a male client, 'dhr' or 'meneer' for a female client.

In [16]:
def determine_client_gender(profile_row):
    """
    Determines the gender of the client based on their name.
    
    Args:
    - profile_row (pd.Series): A Pandas Series object containing the client's profile information.
    
    Returns:
    - str: 'female' if the name contains 'Mevrouw', 'male' if it contains 'Meneer', 
           and 'unknown' if neither is found.
    """
    name = profile_row['naam']
    if "Mevrouw" in name:
        return 'female'
    elif "Meneer" in name:
        return 'male'
    else:
        return 'unknown'

if verbose:
    print(f"The gender of {sample_profile_row['naam']} is: {determine_client_gender(sample_profile_row)}")
    

The gender of Meneer Jan de Vries is: male


In [17]:
def filter_notes_by_gender(notes, gender):
    """
    Filters example notes based on the specified gender to ensure relevance.
    
    Args:
    notes (list): List of example notes to filter.
    gender (str): Gender to filter for ('male', 'female', or 'unknown').
    
    Returns:
    list: Filtered list of example notes.
    """
    if gender == 'male':
        gender_words = ['mw', 'mevr', 'mvr', 'mevrouw']
    elif gender == 'female':
        gender_words = ['dhr', 'meneer']
    else:
        return notes  # No filtering for unknown gender

    def contains_gender_words(note):
        return any(gender_word in note.lower() for gender_word in gender_words)

    return [note for note in notes if not contains_gender_words(note)]

if verbose:
    ex_lib_gender_filtered = filter_notes_by_gender(
        notes=example_library, 
        gender=determine_client_gender(
            profile_row=sample_profile_row))
    
    for item in random.sample(ex_lib_gender_filtered, 5):
        print('- ' + item)
    
    print(f"\nOriginal number of examples: {len(example_library)}")
    print(f"Filtered number of examples: {len(ex_lib_gender_filtered)}")

- Vanochtend vertoonde bewoner tekenen van verwardheid. Kon niet goed meer aangeven wie ze was of waar ze was. Veel herhaling in gesprek en moeite met concentreren.
- Dhr had vanmorgen veel moeite met opstaan en voelde zich zwak. Geduldig geholpen met ADL-taken en extra rustmomenten ingelast.
- Dhr had vandaag moeite met lopen en was erg wankel. Extra aandacht voor valrisico en mobiliteit.
- Dhr had moeite met opstaan en lopen, mobilisatie-oefeningen gedaan en hulpmiddelen ingezet voor ondersteuning.
- Client had vandaag moeite met concentreren en reageerde trager, alert zijn op mogelijk delier

Original number of examples: 40
Filtered number of examples: 18
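
The check in `contains_gender_words` is a substring test, so a short title such as 'mw' would also match inside longer words (for instance 'omweg'). A possible refinement, sketched here as an assumption rather than as the notebook's method, matches titles on word boundaries with a regular expression. The helper name `contains_title` is hypothetical.

```python
import re

def contains_title(note, titles):
    """True if any title occurs as a whole word (case-insensitive)."""
    pattern = r"\b(?:" + "|".join(re.escape(title) for title in titles) + r")\b"
    return re.search(pattern, note, flags=re.IGNORECASE) is not None

female_titles = ["mw", "mevr", "mvr", "mevrouw"]

# Whole-word match: the title is found.
print(contains_title("Mw. had moeite met opstaan.", female_titles))
# 'omweg' contains the letters 'mw' but is not a title.
print(contains_title("Dhr nam een omweg naar de huiskamer.", female_titles))
# The plain substring test flags the same sentence.
print("mw" in "Dhr nam een omweg naar de huiskamer.".lower())
```

Neither variant looks at pronouns such as 'ze' or 'haar', so notes written for the other gender without a title still pass the filter.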


Next, we sample a random subset of this library to inject into the prompt.

In [18]:
def sample_and_format_example_library(example_library, num_items=3):
    """
    Selects a random set of examples from the example library and returns them as a bulleted string.
    
    Args:
    example_library (list): List of example notes.
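    num_items (int): Maximum number of random items to select.
    
    Returns:
    str: A formatted string of the randomly selected notes.
    """
    # Cap the sample size so random.sample does not raise ValueError on a small library
    random_items = random.sample(example_library, min(num_items, len(example_library)))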
    return '\n'.join(['- ' + item for item in random_items])

if verbose:
    ex_lib_sample = sample_and_format_example_library(ex_lib_gender_filtered, 5)
    print(ex_lib_sample)

- Client had vandaag moeite met concentreren en reageerde trager, alert zijn op mogelijk delier
- Dhr had moeite met opstaan en lopen, mobilisatie-oefeningen gedaan en hulpmiddelen ingezet voor ondersteuning.
- Tijdens de ochtendzorg viel op dat cliënt moeite had met opstaan en wat wankel ter been was.
- Dhr. voelde zich vanochtend erg zwak en had moeite met opstaan uit bed. Stapsgewijs geholpen bij het uit bed komen en mobiliteit geoefend tijdens de ochtendzorg.
- Dhr had vanmorgen veel moeite met opstaan en voelde zich zwak. Geduldig geholpen met ADL-taken en extra rustmomenten ingelast.


## Setting up the summary memory

### Memory prompt & chain

Adding memory is important because it allows the generated care notes to reflect ongoing developments in a client’s condition and care. This ensures that each new set of notes builds on previous information, maintaining continuity in the narrative of the client’s health and daily experiences.

I chose to implement memory by using a summary-based approach. This turned out to be more effective than passing the last notes directly, because it provides context through the profile and the summary, maintaining the storyline. Meanwhile, example notes are refreshed each time, preventing the model from becoming repetitive and producing the same structure repeatedly.

Initially, a summary is created that includes the client’s profile and the scenario of the first month. As new care notes are generated, this summary is updated to reflect the latest information. This updated summary is then used as the context for generating subsequent notes, ensuring that the model retains and incorporates past details. 

In the prompt, the model is asked to update the summary based on the previous summary and the newly generated client notes. 
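
The loop described above can be sketched with toy stand-ins for the two LLM calls. `fake_generate_notes` and `fake_update_careplan` are hypothetical placeholders, and the dictionary used as careplan is only there to make the hand-over between iterations visible.

```python
def fake_generate_notes(profile, careplan, scenario):
    # Hypothetical stand-in for the LLM call that writes care notes.
    return [f"note written with plan v{careplan['version']}"]

def fake_update_careplan(profile, careplan, new_notes):
    # Hypothetical stand-in for the LLM call that rewrites the summary.
    return {"version": careplan["version"] + 1, "based_on": list(new_notes)}

profile = "toy profile"
scenario = "toy scenario"
careplan = {"version": 0, "based_on": []}   # initial context, before any notes exist
record = []

for iteration in range(3):
    notes = fake_generate_notes(profile, careplan, scenario)
    record.extend(notes)
    # The updated summary, not the raw notes, is carried into the next iteration.
    careplan = fake_update_careplan(profile, careplan, notes)

print(record)
print(careplan["version"])
```

Because only the summary is passed on, the prompt size stays roughly constant no matter how many notes have already been generated.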

For now, the temperature of the careplan model is kept relatively high (`temp_careplans = 1.0`).

In [19]:
# Structure for a single careplan item
class CarePlanItem(BaseModel):
    probleem: str = Field(description="beschrijving van het zorgprobleem of van de situatie")
    doel: str = Field(description="doel")
    acties: List[str] = Field(description="twee acties")

# Structure for a careplan: a list of careplan items
class CarePlan(BaseModel):
    zorgplan: List[CarePlanItem]

In [20]:
careplan_model = ChatOpenAI(api_key=setup.get_openai_key(), temperature=temp_careplans, model=model_careplans)
careplan_parser = PydanticOutputParser(pydantic_object=CarePlan)
careplan_format_instructions = careplan_parser.get_format_instructions()
if verbose:
    print(careplan_format_instructions)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"CarePlanItem": {"properties": {"probleem": {"description": "beschrijving van het zorgprobleem of van de situatie", "title": "Probleem", "type": "string"}, "doel": {"description": "doel", "title": "Doel", "type": "string"}, "acties": {"description": "twee acties", "items": {"type": "string"}, "title": "Acties", "type": "array"}}, "required": ["probleem", "doel", "acties"], "title": "CarePlanItem", "type": "object"}}, "properties": {"zorgplan": {"items": {"$ref": "#/$defs/CarePlanItem"}, "title": "Zorgplan", "type": "array"}}, "requi

In [21]:
# Template for updating the client summary based on new care notes
careplan_template = """Hieronder staat:
1. Het profiel van een client(e) die verblijft op een psychogeriatrische afdeling van het verpleeghuis. 
2. Het zorgplan op de indexdatum
3. Nieuwe zorgrapportages vanaf de indexdatum

PROFIEL:
{profile}

ZORGPLAN:
{careplan}

NIEUWE RAPPORTAGES:
{new_notes}

Schrijf een nieuw zorgplan op basis van drie zorgproblemen die de meeste invloed hebben op het welzijn van client. Neem belangrijke aspecten van het eerdere zorgplan over en pas aan op basis van gebeurtenissen en zorgvraag uit de rapportages. 

Zorg dat de ambities realistisch zijn. Beschrijf alleen zorgaspecten, doe geen uitspraken over behandeling, betrekken van artsen of therapeuten.
Beschrijf bijvoorbeeld welke hulp bij de ADL wordt geboden, mobiliteit en transfers, omgang met probleemgedrag. 
Maak uitsluitend gebruik van problemen en acties die worden genoemd in de rapportages. 

{format_instructions}
"""

careplan_prompttemplate = PromptTemplate(
    template = careplan_template,
    input_variables=["profile", "careplan", "new_notes"],
    partial_variables={"format_instructions": careplan_format_instructions},
)

if verbose:
    # Since we don't have any notes yet, we use example notes as new_notes
    profile = format_client_profile(
        profile_row=sample_profile_row,
        display_name=False
    )
    prompt_careplan = careplan_prompttemplate.format(
        profile=profile,
        careplan="Er is nog geen zorgplan, client is pas opgenomen",
        new_notes=sample_and_format_example_library(ex_lib_gender_filtered, 10)
    )
    print(prompt_careplan)

In [22]:
careplan_chain = careplan_prompttemplate | careplan_model | careplan_parser

def generate_careplan(profile, careplan_old, notes_new, careplan_chain):
    """
    Generates a care plan by invoking the careplan_chain with the provided profile, care plan, and new notes.
    
    Args:
    - profile: Profile of the client (string).
    - careplan_old: The current care plan (string). If none exists, provide a note such as "Er is nog geen zorgplan, client is pas opgenomen".
    - notes_new: The new notes (string) that should be considered when generating the care plan.
    - careplan_chain: The chain object that processes the profile, care plan, and new notes to generate the updated care plan.

    Returns:
    - list: The CarePlanItem objects of the newly generated care plan.
    """
    
    careplan_new = careplan_chain.invoke({
        "profile": profile,
        "careplan": careplan_old,
        "new_notes": notes_new
    })
    
    return careplan_new.zorgplan

if verbose:
    careplan_new = generate_careplan(
        profile=format_client_profile(sample_profile_row, display_name=False),
        careplan_old="Er is nog geen zorgplan, client is pas opgenomen",
        notes_new=sample_and_format_example_library(ex_lib_gender_filtered, 10),
        careplan_chain=careplan_chain
    )
    for i in careplan_new:
        print(i)
        print(sep_line)

probleem='Toegenomen vermoeidheid en kortademigheid tijdens ADL-taken' doel='Verbeteren van conditie en toename van energie' acties=['Dagelijkse rustmomenten inplannen tussen de activiteiten door', 'Bespreekbaar maken van vermoeidheid en kortademigheid met het multidisciplinaire team']
----------------------------------------------------------------------------------------------------
probleem='Moeite met lopen en opstaan vanwege rugklachten' doel='Verbetering van mobiliteit en verminderen van pijnklachten' acties=['Fysiotherapie intensiveren en bekijken of er andere methoden zijn om pijn te verlichten', 'Extra ondersteuning bieden bij mobiliteit en transfers om rugklachten te ontlasten']
----------------------------------------------------------------------------------------------------
probleem='Toenemende verwardheid en concentratieproblemen' doel='Stabiliseren van cognitieve functies en verminderen van angst bij verwardheid' acties=['Dagelijkse geheugenoefeningen en stimuleren van 

In [23]:
def format_careplan_items(careplan_items):
    """
    Formats a list of CarePlanItem objects into a readable care plan text.
    
    Args:
    - careplan_items: A list of CarePlanItem objects, each containing a problem, goal, and actions.
    
    Returns:
    - str: A formatted string representing the care plan.
    """
    formatted_plan = ""
    
    for item in careplan_items:
        formatted_plan += f"Probleem: {item.probleem}\n"
        formatted_plan += f"Doel: {item.doel}\n"
        formatted_plan += "Acties:\n"
        for action in item.acties:
            formatted_plan += f"- {action}\n"
        formatted_plan += "\n"
    
    return formatted_plan.strip()

if verbose:
    print(format_careplan_items(careplan_new))

Probleem: Toegenomen vermoeidheid en kortademigheid tijdens ADL-taken
Doel: Verbeteren van conditie en toename van energie
Acties:
- Dagelijkse rustmomenten inplannen tussen de activiteiten door
- Bespreekbaar maken van vermoeidheid en kortademigheid met het multidisciplinaire team

Probleem: Moeite met lopen en opstaan vanwege rugklachten
Doel: Verbetering van mobiliteit en verminderen van pijnklachten
Acties:
- Fysiotherapie intensiveren en bekijken of er andere methoden zijn om pijn te verlichten
- Extra ondersteuning bieden bij mobiliteit en transfers om rugklachten te ontlasten

Probleem: Toenemende verwardheid en concentratieproblemen
Doel: Stabiliseren van cognitieve functies en verminderen van angst bij verwardheid
Acties:
- Dagelijkse geheugenoefeningen en stimuleren van oriëntatie in tijd en plaats
- Creëren van een rustige en gestructureerde omgeving om verwardheid te verminderen


## Generating client notes

Our goal is to populate the client record by generating care notes that reflect the client profile and describe the scenario. 
In practice, the number and length of notes per day vary based on clinical circumstances. Stable clients typically have fewer and shorter notes than ill or agitated clients. (I might add this functionality in the future.) For now, I generate three notes per day. Through trial and error, I found that the model can reliably generate nine notes per prompt. 
Having a scenario twist every 3 days is not very realistic. I have scenario descriptions per 'month'. To populate a client record for an entire month, I could break down the monthly scenario into smaller segments that fit the three-day prompt structure. I chose, however, to give the model the scenario in the first iteration and allow it some creative freedom to build upon it, relying on the summaries to maintain continuity with previous notes.

For parsing, we use Pydantic models. These are remarkably well understood by the LLM and offer several benefits:
- They enforce a consistent output structure.
- They automate parsing of the output.
- The field descriptions further guide the LLM in generating appropriate responses.
- They catch invalid data early, preventing downstream issues.
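
To make "catching invalid data early" concrete, here is a standard-library sketch of the kind of check the parser performs on a careplan reply. It is a stand-in for illustration, not Pydantic itself: `parse_careplan` and `REQUIRED_FIELDS` are hypothetical names, and the two JSON replies are invented.

```python
import json

# Expected fields of one careplan item and their JSON types.
REQUIRED_FIELDS = {"probleem": str, "doel": str, "acties": list}

def parse_careplan(raw_reply):
    """Parse a JSON reply; raise ValueError on missing or mistyped fields."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    items = data.get("zorgplan") if isinstance(data, dict) else None
    if not isinstance(items, list):
        raise ValueError("reply has no 'zorgplan' list")
    for i, item in enumerate(items):
        for field, expected in REQUIRED_FIELDS.items():
            if not isinstance(item, dict) or not isinstance(item.get(field), expected):
                raise ValueError(f"item {i}: '{field}' is missing or not a {expected.__name__}")
    return items

good = ('{"zorgplan": [{"probleem": "Valgevaar", "doel": "Veilig lopen", '
        '"acties": ["Rollator binnen bereik", "Begeleiden bij transfers"]}]}')
bad = '{"zorgplan": [{"probleem": "Valgevaar", "acties": "Rollator binnen bereik"}]}'

print(parse_careplan(good))
try:
    parse_careplan(bad)
except ValueError as err:
    print(f"rejected: {err}")
```

With the Pydantic models, the schema, the field descriptions sent to the model and the validation all come from a single class definition, which a hand-written check like this cannot offer.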

In our main prompt defined below, we ask the LLM to generate multiple care notes. Our structure consists of two classes: one to structure a single note (CareNote) and a second class defined as a list of care notes (CareNotes).

CareNote represents a single care note with the fields dag, tijd, and rapportage:
- **'dag'**: Even though the sequence number of the day ("dag") isn't meaningful, it forces the model to respond with the number of days requested.  
- **'tijd'**: The field 'time' ("tijd") was chosen over 'daypart' ("dagdeel") because the model tended to link 'daypart' to breakfast, lunch, and dinner, which led to many notes describing lunch details.  
- **'rapportage'**: This is the main challenge, what it's all about... 

CareNotes is a container for multiple care notes, consisting of a single field **'notes'**, which is a list of CareNote instances.
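
Since 'dag' and 'tijd' mainly serve to steer the model, a quick sanity check on a generated batch can be useful. The sketch below is an assumption, not part of the notebook: `check_notes` is a hypothetical helper that works on plain dictionaries, assumes days are numbered from 1, and flags times that are not in hh:mm format.

```python
import re

TIME_PATTERN = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")

def check_notes(notes, n_days, notes_per_day):
    """Return a list of problems found; empty if the batch looks complete."""
    problems = []
    for i, note in enumerate(notes):
        if not TIME_PATTERN.match(note["tijd"]):
            problems.append(f"note {i}: tijd '{note['tijd']}' is not hh:mm")
    days_seen = {note["dag"] for note in notes}
    missing = sorted(set(range(1, n_days + 1)) - days_seen)
    if missing:
        problems.append(f"missing days: {missing}")
    if len(notes) != n_days * notes_per_day:
        problems.append(f"expected {n_days * notes_per_day} notes, got {len(notes)}")
    return problems

# Invented batch: the third note has a malformed time and one note is missing.
toy_notes = [
    {"dag": 1, "tijd": "07:45", "rapportage": "..."},
    {"dag": 1, "tijd": "14:10", "rapportage": "..."},
    {"dag": 2, "tijd": "8 uur", "rapportage": "..."},
]
problems = check_notes(toy_notes, n_days=2, notes_per_day=2)
print(problems)
```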

In [24]:
# Structure for a single care note
class CareNote(BaseModel):
    dag: int = Field(description="volgnummer dag")
    tijd: str = Field(description="tijd van de rapportage (hh:mm)")
    rapportage: str = Field(description="Inhoud van de rapportage. Een rapportage beschrijft over het algemeen één zorgaspect, soms meer")

# Structure for multiple notes
class CareNotes(BaseModel):
    notes: List[CareNote]

In [25]:
notes_model = ChatOpenAI(
    api_key=setup.get_openai_key(), 
    temperature=temp_notes, 
    model=model_notes, 
    max_tokens=2048)
notes_parser = PydanticOutputParser(pydantic_object=CareNotes)
notes_format_instructions = notes_parser.get_format_instructions()
if verbose:
    print(notes_format_instructions)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"CareNote": {"properties": {"dag": {"description": "volgnummer dag", "title": "Dag", "type": "integer"}, "tijd": {"description": "tijd van de rapportage (hh:mm)", "title": "Tijd", "type": "string"}, "rapportage": {"description": "Inhoud van de rapportage. Een rapportage beschrijft over het algemeen één zorgaspect, soms meer", "title": "Rapportage", "type": "string"}}, "required": ["dag", "tijd", "rapportage"], "title": "CareNote", "type": "object"}}, "properties": {"notes": {"items": {"$ref": "#/$defs/CareNote"}, "title": "Notes", "

### Note generation prompts & chain

In [36]:
notes_template = """Lees dit scenario:
SCENARIO:
{scenario}

Jouw taak is om rapportages te schrijven op basis van dit scenario van een fictieve client die verblijft op een psychogeriatrische afdeling van het verpleeghuis. Zorg voor een duidelijke opbouw in de rapportages en volg het scenario. 

Schrijf rapportages voor een week (7 dagen). Per dag worden drie rapportages geschreven, dus er zijn 21 rapportages totaal.

Hou rekening met de beperkingen van de client en het eerder opgestelde zorgplan.  

PROFIEL:
{profile}

ZORGPLAN:
{careplan}

INSTRUCTIES:
- Vermijd het noemen van de naam.
- Je spreekt de taal van een niveau 3 zorgmedewerker (Verzorgende IG).  Varieer in zinsopbouw en stijl, soms zijn de rapportages langer en meer gedetailleerd.
- Zorg dat de beschreven zorg realistisch is. Beschrijf alleen zorgaspecten, doe geen uitspraken over behandeling, betrekken van artsen of therapeuten. Beschrijf bijvoorbeeld welke hulp bij de ADL wordt geboden, mobiliteit en transfers, omgang met probleemgedrag. 

VOORBEELD RAPPORTAGES:
{examples}

{format_instructions}
"""

notes_prompt_template = PromptTemplate(
    template=notes_template,
    input_variables=["profile", "careplan", "scenario", "examples"],
    partial_variables={"format_instructions": notes_format_instructions},
)

if verbose:
    notes_prompt = notes_prompt_template.format(
        profile = sample_profile,
        careplan = format_careplan_items(careplan_new),
        # careplan = "Er is nog geen zorgplan, client is net opgenomen", 
        scenario = sample_scenario,
        examples = sample_and_format_example_library(ex_lib_gender_filtered,5), 
        )
    print(notes_prompt)

Lees dit scenario:
SCENARIO:
De professionele begeleiding en goede zorg dragen bij aan de verbetering van meneer Jan's gemoedstoestand en fysieke gezondheid. Hij voelt zich steeds meer op zijn gemak in het verpleeghuis.

Jouw taak is om rapportages te schrijven op basis van dit scenario van een fictieve client die verblijft op een psychogeriatrische afdeling van het verpleeghuis. Zorg voor een duidelijke opbouw in de rapportages en volg het scenario. 

Schrijf rapportages voor een week (7 dagen). Per dag worden drie rapportages geschreven, dus er zijn 21 rapportages totaal.

Hou rekening met de beperkingen van de client en het eerder opgestelde zorgplan.  

PROFIEL:
Naam: Meneer Jan de Vries
Type Dementie: Alzheimer
Lichamelijke klachten: Diabetes, gehoorproblemen
ADL: Afhankelijk van hulp bij aankleden, wassen en toiletgang
Mobiliteit: Gebruik van een rollator, valgevaar
Cognitie/gedrag: Rustig, soms verward, moeite met oriëntatie in tijd en plaats

ZORGPLAN:
Probleem: Toegenomen verm

Now we create a chain. Its output is a structured list of care notes for a client, wrapped in a `CareNotes` object. This object has an attribute `notes`, which is a list of individual `CareNote` entries. Each `CareNote` holds three fields: `dag` (day), `tijd` (time) and `rapportage` (the note text).
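As a rough illustration of that structure, the sketch below mirrors the container with plain dataclasses. The field types and the two sample notes are assumptions made up for this example; the actual `CareNote` and `CareNotes` classes are the Pydantic models defined earlier in this notebook.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CareNote:
    dag: int          # day number within the generated week (type assumed)
    tijd: str         # time of the note (type assumed)
    rapportage: str   # the note text itself


@dataclass
class CareNotes:
    notes: List[CareNote] = field(default_factory=list)


# Made-up sample content, only to show the shape of the object
example = CareNotes(notes=[
    CareNote(dag=1, tijd="08:30", rapportage="Dhr geholpen met wassen en aankleden."),
    CareNote(dag=1, tijd="13:00", rapportage="Dhr at goed tijdens de lunch."),
])

# The chain returns the container; the individual entries live in `.notes`
for note in example.notes:
    print(note.dag, note.tijd, note.rapportage)
```

This is why `generate_notes` returns `notes.notes` rather than the parsed object itself.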

In [32]:
# Create a chain of operations: prompt template -> model -> output parser
notes_chain = notes_prompt_template | notes_model | notes_parser

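def generate_notes(profile, careplan, scenario, examples):
    """
    Generates care notes for one scenario by invoking notes_chain.
    Retries once on failure (e.g. when the model returns invalid JSON).

    Returns the list of CareNote objects.
    """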
    
    try:
        notes = notes_chain.invoke({
            "profile": profile,
            "careplan": careplan,
            "scenario": scenario,
            "examples": examples, 
        })
    except Exception as e:
        # Try once more in case of failure
        print(f"Error in generating notes, retrying: {e}")
        notes = notes_chain.invoke({
            "profile": profile,
            "careplan": careplan,
            "scenario": scenario,
            "examples": examples, 
        })
        print("Retry successful")

    return notes.notes


if verbose:
    print(f"PROFILE:\n{sample_profile}")
    print(sep_line)
    print(f"CAREPLAN:\n{careplan_new}")
    print(sep_line)
    print(f"SCENARIO:\n{sample_scenario}")
    print(sep_line)
    test_notes = generate_notes(
        profile=sample_profile,
        careplan=careplan_new,
        scenario=sample_scenario, 
        examples=sample_and_format_example_library(ex_lib_gender_filtered, 1)
    )

    print('***The result of model:')
    print(test_notes)

    print('\n***And these are the individual notes')
    for note in test_notes:
        print(note)

PROFILE:
Naam: Meneer Jan de Vries
Type Dementie: Alzheimer
Lichamelijke klachten: Diabetes, gehoorproblemen
ADL: Afhankelijk van hulp bij aankleden, wassen en toiletgang
Mobiliteit: Gebruik van een rollator, valgevaar
Cognitie/gedrag: Rustig, soms verward, moeite met oriëntatie in tijd en plaats
----------------------------------------------------------------------------------------------------
CAREPLAN:
[CarePlanItem(probleem='Toegenomen vermoeidheid en kortademigheid tijdens ADL-taken', doel='Verbeteren van conditie en toename van energie', acties=['Dagelijkse rustmomenten inplannen tussen de activiteiten door', 'Bespreekbaar maken van vermoeidheid en kortademigheid met het multidisciplinaire team']), CarePlanItem(probleem='Moeite met lopen en opstaan vanwege rugklachten', doel='Verbetering van mobiliteit en verminderen van pijnklachten', acties=['Fysiotherapie intensiveren en bekijken of er andere methoden zijn om pijn te verlichten', 'Extra ondersteuning bieden bij mobiliteit en t
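The retry-once pattern in `generate_notes` could also be factored into a small helper. The following is a minimal sketch, not the notebook's implementation: `invoke_with_retry` and `flaky_invoke` are hypothetical names, and `flaky_invoke` merely simulates a chain call that fails on the first attempt.

```python
def invoke_with_retry(invoke, payload, max_attempts=2):
    """Call invoke(payload), retrying on any exception up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke(payload)
        except Exception as e:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            print(f"Attempt {attempt} failed, retrying: {e}")


# Hypothetical stand-in for notes_chain.invoke: fails once, then succeeds
calls = {"count": 0}


def flaky_invoke(payload):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ValueError("Invalid json output")
    return {"notes": [f"note for {payload['scenario']}"]}


result = invoke_with_retry(flaky_invoke, {"scenario": "week 1"})
print(result)
```

With a helper like this, the payload dictionary is written only once, and the number of attempts becomes a parameter instead of duplicated code.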

In [33]:
# Function formats the notes and returns them as a string for display
def format_notes(notes, simple_bulleted=True):
    note_strings = []
    for note in notes:
        if simple_bulleted:
            note_strings.append(f"- {note.rapportage}")
        else:
            note_strings.append(f"Dag {note.dag} ({note.tijd}): {note.rapportage}")
    return "\n".join(note_strings)

if verbose:
    print(format_notes(test_notes))

- Meneer Jan werd vanochtend geholpen met aankleden en wassen. Hij was wat onrustig en had moeite met het aantrekken van zijn sokken vanwege rugklachten. Extra ondersteuning geboden bij transfers.
- Tijdens de lunch had Meneer Jan moeite met het vasthouden van zijn bestek vanwege gehoorproblemen. Rustig en geduldig ondersteund bij de maaltijd en aangespoord om voldoende te drinken vanwege diabetes.
- Meneer Jan klaagde vanavond over vermoeidheid en kortademigheid. Geadviseerd om dagelijkse rustmomenten in te lassen tussen activiteiten door. Vermoeidheid besproken met het team.
- Vandaag had Meneer Jan vroeg in de ochtend moeite met opstaan vanwege rugklachten. Intensivering van fysiotherapie overwogen om mobiliteit te verbeteren. Extra voorzichtigheid bij transfers.
- Tijdens de lunch had Meneer Jan moeite met concentreren en was verward over de dag van de week. Geheugenoefeningen gedaan en stimulatie van oriëntatie in tijd en plaats aangeboden.
- Meneer Jan had vanavond last van vermo
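The output above uses the default bulleted style. To show what `simple_bulleted=False` produces, here is a small self-contained sketch that repeats the function and feeds it two made-up notes; `SimpleNamespace` objects stand in for the `CareNote` entries.

```python
from types import SimpleNamespace


def format_notes(notes, simple_bulleted=True):
    # Same logic as the notebook function, repeated so this example runs on its own
    note_strings = []
    for note in notes:
        if simple_bulleted:
            note_strings.append(f"- {note.rapportage}")
        else:
            note_strings.append(f"Dag {note.dag} ({note.tijd}): {note.rapportage}")
    return "\n".join(note_strings)


# Made-up notes for illustration only
demo_notes = [
    SimpleNamespace(dag=1, tijd="08:30", rapportage="Dhr geholpen met wassen."),
    SimpleNamespace(dag=1, tijd="13:00", rapportage="Dhr at goed tijdens de lunch."),
]

bulleted = format_notes(demo_notes)
detailed = format_notes(demo_notes, simple_bulleted=False)
print(detailed)
```

The bulleted form omits day and time, which keeps the text that is passed on to the careplan prompt compact.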

In [34]:
def process_client(df_profile_row, df_scenarios, careplan_chain, notes_chain, careplan_initial=None, verbose=True):
    """
    Loops through all scenarios for a single client, generates notes and careplans, 
    and returns the results along with their metadata.
    
    Args:
    - df_profile_row: A Pandas Series containing the client's profile information.
    - df_scenarios: A Pandas DataFrame containing the scenarios for the client.
    - careplan_chain: The chain responsible for generating and updating careplans.
    - notes_chain: The chain responsible for generating care notes.
    - careplan_initial: Optionally, the initial careplan for the client. Defaults to 'None' if there's no initial careplan.
    - verbose: Displays debug information if set to True.
    
    Returns:
    - A list of careplans including metadata.
    - A list of care notes including metadata.
    """
    client_id = df_profile_row['client_id']
    client_name = df_profile_row['naam']
    # Format the client profile
    profile = format_client_profile(df_profile_row, display_name=False)
    client_gender = determine_client_gender(df_profile_row)

    # Filter the scenarios that belong to this client
    client_scenarios = df_scenarios[df_scenarios['client_id'] == client_id]
    
    # Initialize variables to store notes and careplans with metadata
    all_careplans = []
    all_notes = []
    
    # Start with an initial careplan, or use a default string if none exists
    current_careplan = careplan_initial if careplan_initial else "Er is nog geen zorgplan, client is net opgenomen"

    retriever = vectordb.as_retriever(search_kwargs={"k": 20})

    # Loop over the scenarios
    with get_openai_callback() as cb:
        for idx, scenario_row in client_scenarios.iterrows():
            
            # Retrieve the current scenario description
            scenario = scenario_row['events_description']
            scenario_id = scenario_row['scenario_id']
            if verbose:
                print(f"Processing scenario {idx}:\n{scenario}")
                print(sep_line)
            
            # Generate example notes based on the client profile and scenario
            example_library = retrieve_examples(
                profile=profile,
                scenario=scenario,
                retriever=retriever
            )
            ex_lib_gender_filtered = filter_notes_by_gender(
                notes=example_library, 
                gender=client_gender
            )
            ex_lib_sample = sample_and_format_example_library(
                example_library=ex_lib_gender_filtered,
                num_items=5
            )
            
            # Generate care notes for the current scenario
            if verbose:
                print("Generating NOTES")
            result_notes = generate_notes(
                profile=profile, 
                careplan=current_careplan, 
                scenario=scenario, 
                examples=ex_lib_sample
            )
            
            # Add generated notes to the notes list along with metadata
            for note in result_notes:
                all_notes.append({
                    "client_id": client_id,
                    "scenario_id": scenario_id,
                    "dag": note.dag,
                    "tijd": note.tijd,
                    "rapportage": note.rapportage,
                })

            formatted_notes = format_notes(result_notes)
            if verbose:
                print(formatted_notes)
                print(sep_line)

            # Generate a new careplan based on the new notes
            if verbose:
                print("Generating CAREPLAN")
            careplan_new = generate_careplan(
                profile=profile, 
                careplan_old=current_careplan, 
                notes_new=formatted_notes,
                careplan_chain=careplan_chain
            )
            
            # Add the generated careplan to the careplans list along with metadata
            all_careplans.append({
                "client_id": client_id,
                "scenario_id": scenario_id,
                "careplan": careplan_new,
            })
            
            # Update the current careplan for the next iteration
            current_careplan = careplan_new
            
            if verbose:
                print(format_careplan_items(current_careplan))
                print(sep_line)
    print(cb)
    
    return all_careplans, all_notes


In [35]:
cpn, nts = process_client(
    df_profile_row=sample_profile_row, 
    df_scenarios=df_scenarios,
    careplan_chain=careplan_chain,
    notes_chain=notes_chain,
    careplan_initial=None,
    verbose=True
)

Processing scenario 0:
Meneer Jan verhuist naar het verpleeghuis en begint langzaam te wennen aan de nieuwe omgeving. Hij ontvangt zorg bij aankleden, wassen en toiletgang, wat zijn stemming en lichamelijke klachten verbetert.
----------------------------------------------------------------------------------------------------
Generating NOTES
- Client had vanochtend moeite met herkennen van bekende gezichten. Symptoom van gevorderde dementie.
- Meneer Jan leek vandaag verward en wist niet goed waar hij was. Mogelijk verergering van dementie.
- Dhr had vanmiddag extra hulp nodig bij het aankleden vanwege zijn mobiliteitsproblemen.
- Meneer Jan had veel moeite met lopen en opstaan vanmorgen. Geholpen met transfers en mobiliteit geoefend.
- Client was vanmiddag rustig en genoot van een wandeling met de rollator in de tuin.
- Dhr leek vanavond verward en onrustig, extra aandacht geboden om te kalmeren.
- Meneer Jan voelde zich vanochtend zwak en moe. Geholpen met ADL-taken en rustmomenten 
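The core idea of `process_client` is that the careplan acts as rolling memory: each scenario's notes are generated from the current careplan, and the careplan is then updated from those notes. The sketch below isolates that loop. The two `fake_*` functions are hypothetical stand-ins for the LLM chains, so the example runs without API calls.

```python
def run_scenarios(scenarios, generate_notes, generate_careplan, initial_careplan):
    """Carry the careplan from one scenario to the next as memory."""
    careplan = initial_careplan
    history = []
    for scenario in scenarios:
        notes = generate_notes(careplan, scenario)      # notes see the current careplan
        careplan = generate_careplan(careplan, notes)   # careplan is updated from the notes
        history.append({"scenario": scenario, "notes": notes, "careplan": careplan})
    return history


# Hypothetical stand-ins for the two chains
def fake_generate_notes(careplan, scenario):
    return [f"note about {scenario} (plan has {len(careplan)} items)"]


def fake_generate_careplan(careplan, notes):
    return careplan + [f"item derived from: {notes[0]}"]


history = run_scenarios(
    ["scenario A", "scenario B"],
    fake_generate_notes,
    fake_generate_careplan,
    initial_careplan=[],
)
for step in history:
    print(step["scenario"], "->", len(step["careplan"]), "careplan item(s)")
```

The second scenario already sees the careplan item created after the first one, which is how continuity between periods is maintained.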

In [None]:
#-----------------
# Get the row of the sample client from the df_profiles DataFrame
df_profile_row = df_profiles[df_profiles['client_id'] == sample_client].iloc[0]

# Call the function for the sample client
all_careplans, all_notes = process_client(
    df_profile_row=df_profile_row, 
    df_scenarios=df_scenarios, 
    careplan_chain=careplan_chain, 
    notes_chain=notes_chain, 
    verbose=True
)

# Print the results
print("\n*** All generated careplans ***")
for careplan in all_careplans:
    # Each entry is a dict with metadata; the careplan items are stored under 'careplan'
    for item in careplan["careplan"]:
        print(f"Probleem: {item.probleem}\nDoel: {item.doel}\nActies: {', '.join(item.acties)}\n")

print("\n*** All generated care notes ***")
for note in all_notes:
    # Notes are stored as dicts, so use key access
    print(f"Dag {note['dag']} ({note['tijd']}): {note['rapportage']}\n")

In [30]:
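#WIP
def generate_notes_and_careplan(client, scenario_id, careplan_old, notes_chain, careplan_chain):
    # Format the profile and the scenario text for this client
    profile = format_client_profile(client)
    scenario = format_scenario(client, scenario_id=scenario_id)

    # Create the example library based on the current scenario
    example_library = create_example_library(row=client, scenario=scenario)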

    # Generate care notes using the example library, client profile, and current careplan
    notes = notes_chain.invoke({
        "profile": profile,
        "careplan": careplan_old,
        "scenario": scenario,
        "examples": format_sample_example_notes(example_library, 3),
    })

    # Update the careplan based on the newly generated notes
    careplan_new = careplan_chain.invoke({
        "profile": profile,
        "careplan": careplan_old,
        "new_notes": format_notes(notes.notes)
    })

    # Return the updated careplan and generated care notes
    return careplan_new, notes


In [None]:
cpn, nts = generate_notes_and_careplan(client=sample_client, scenario_id=2,careplan_old=sample_careplan, notes_chain=notes_chain, careplan_chain=careplan_chain)

In [None]:
for zp in cpn.zorgplan:
    print(zp)

In [None]:
for n in nts.notes:
    print(n)

In [52]:
cpn2, nts2 = generate_notes_and_careplan(client=sample_client, scenario_id=3, careplan_old=cpn.zorgplan, notes_chain=notes_chain, careplan_chain=careplan_chain)

In [None]:
for zp in cpn2.zorgplan:
    print(zp)

for n in nts2.notes:
    print(n)

In [None]:
# Example for two periods (range(1, 3) yields 1 and 2):
summary = format_scenario(client, scenario_id=1)
for period in range(1, 3):
    summary, notes = generate_notes_and_summary(client, scenario, summary, period, notes_chain, careplan_chain, create_example_library, format_sample_example_notes, format_client_profile)
    print(f"NOTES {period}")
    print(format_notes(notes.notes))
    print(f"SUMMARY {period}")
    print(summary)

In [None]:
## WIP
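def generate_care_notes(summary_list, notes_list, profile_row, example_library,
                        month_no, num_iterations, num_examples=3):
    """
    Generates care notes for one month and updates the running summary after each iteration.

    Returns:
    -  list of summaries (used as memory).
    -  list of care notes.
    """
    profile = format_client_profile(profile_row)
    scenario = format_scenario(profile_row, scenario_id=1)
    client_id = profile_row['client_id']
    # The text of the most recent summary serves as memory for the first iteration
    summary = summary_list[-1]["summary"]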

    for i in range(num_iterations):
        iteration = i + 1
        try:
            print(f'Scenario: {scenario}')
            if iteration > 1:
                # Update the scenario to let the model build upon the scenario
                scenario = "Bouw voort op de gegevens uit het Profiel en het Beloop."

            # Sample examples from the example library
            examples = format_sample_example_notes(example_library, num_examples)

            # Generate care notes using the model and the example library
            # There are frequent 'Invalid json output' errors. In that case, try again
            try:
                result_notes = notes_chain.invoke({
                    "examples": examples,
                    "profile": profile,
                    "summary": summary,
                    "scenario": scenario
                })
            except Exception as e:
                # Try once more in case of failure
                print(f"Error in iteration {iteration}, retrying: {e}")
                result_notes = notes_chain.invoke({
                    "examples": examples,
                    "profile": profile,
                    "summary": summary,
                    "scenario": scenario
                })
                print("Retry successful")

            # Update the summary based on the new care notes
            result_memory = careplan_chain.invoke({
                 "profile": profile,
                 "summary": summary,
                 "new_notes": format_notes(result_notes.notes)
            })

            # Add generated notes to the notes list
            for note in result_notes.notes:
                notes_list.append({
                    "client_id": client_id,
                    "month": month_no,
                    "iteration": iteration,
                    "dag": note.dag,
                    "tijd": note.tijd,
                    "rapportage": note.rapportage,
                })

            # add_notes_to_list(result_notes.notes, notes_list, client_id, month_no, iteration)

            # Update the memory with new notes and generate a new summary
            summary = result_memory.content

            # Add the updated summary to the summary list 
            summary_list.append({
                "client_id": client_id,
                "month": month_no,
                "iteration": iteration,
                "summary": summary,
            })

        except Exception as e:
            print(f"Error in iteration {iteration}: {e}")
            continue

    return summary_list, notes_list

if verbose:
    notes_list = []
    summary_list = []
    summary_list.append({
        "client_id": sample_client['client_id'],
        "month": 0,
        "iteration": 0,
        "summary": format_scenario(sample_client, month_no=1),
        })

    summary_list, notes_list = generate_care_notes(
        summary_list=summary_list, 
        notes_list=notes_list, 
        profile_row=sample_client, 
        example_library=example_library,
        month_no=1,
        num_iterations=1,
        num_examples=3)
    
    for cs in summary_list:
        print(cs)
    print(100*'-')
    for n in notes_list:
        print(n)
    

Next, let's have a look at the generation of care notes and summaries for a single client. We need to process each client’s data individually. This involves iterating through their associated scenarios and generating relevant notes. 

In [None]:
# Process a single client to generate care notes and careplans.
def process_client(profile_row, df_scenarios, num_iterations=5, num_examples=3):
    all_notes_list = []
    all_careplans_list = []

    try:
        with get_openai_callback() as cb:
            print(f"Processing client: {profile_row['naam']}")

            careplan_list = []
            notes_list = []

            client_id = profile_row['client_id']
            # # As initial summary, we take the scenario of the first month
            # careplan_list.append({
            #     "client_id": profile_row['client_id'],
            #     "month": 0,
            #     "iteration": 0,
            #     "summary": format_scenario(profile_row=profile_row, month_no=1),
            #     })

            # Select the scenario rows for the client
            df_client_scenarios = df_scenarios.loc[df_scenarios['client_id'] == client_id, ['scenario_id', 'events_description']]  

            num_scenarios = len(df_client_scenarios)

            scenario_no = 1    
            for i, month_scenario in df_client_scenarios.iterrows():  
                scenario = format_scenario(profile_row, scenario_no)
                example_library = create_example_library(profile_row, scenario)

                print(f'Generating notes for month: {scenario_no} of {num_scenarios} for client {client_id}')
                careplan_list, notes_list = generate_care_notes(
                    careplan_list=careplan_list,
                    notes_list=notes_list, 
                    profile_row=profile_row,
                    scenario_no=scenario_no,
                    example_library=example_library,
                    num_iterations=num_iterations,
                    num_examples=num_examples,
                    )
                scenario_no += 1

            # Add client_id to notes and careplans
            for note in notes_list:
                note['client_id'] = client_id
            for careplan in careplan_list:
                all_careplans_list.append({'client_id': client_id, 'careplan': careplan})

            all_notes_list.extend(notes_list)
            print(cb)

    except Exception as e:
        print(f"Error processing client {profile_row['naam']}: {e}")

    return all_notes_list, all_careplans_list

if verbose:
    notes_list, summaries_list = process_client(profile_row=sample_client, df_scenarios=df_scenarios, num_iterations=2, num_examples=3)
    for cs in summaries_list:
        print(cs)

    for n in notes_list:
        print(n)
        

In [None]:
def process_clients(df_profiles, df_scenarios, path_notes='notes.csv', path_careplans='careplans.csv'):
    """
    Main function to generate care notes and care plans for each client based on scenarios.
    Iterates through all clients, processes each one to generate care notes and care plans,
    and saves the generated data to CSV files after each client to prevent data loss in case of an error.
    
    Parameters:
    df_profiles: DataFrame containing client profiles.
    df_scenarios: DataFrame containing scenarios for clients.
    path_notes: Path to save the care notes CSV file (default is 'notes.csv').
    path_careplans: Path to save the care plans CSV file (default is 'careplans.csv').
    
    Returns:
    df_notes, df_careplans
    """
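    all_notes_list = []
    all_careplans_list = []
    # Initialize the results so the return statement also works if no client succeeds
    df_notes = pd.DataFrame()
    df_careplans = pd.DataFrame()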

    for idx, row in df_profiles.iterrows():
        try:
            notes, careplans = process_client(row, df_scenarios)
            all_notes_list.extend(notes)
            all_careplans_list.extend(careplans)

            df_notes = pd.DataFrame(all_notes_list)
            df_careplans = pd.DataFrame(all_careplans_list)

            df_notes.to_csv(path_notes, index=False)
            df_careplans.to_csv(path_careplans, index=False)

        except Exception as e:
            print(f"Error processing client {row['client_id']}: {e}")

    return df_notes, df_careplans
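The save-after-each-client approach can be illustrated without pandas or API calls. In the sketch below, `process_all` and `fake_process_one` are hypothetical names; the file is rewritten after every successful item, so a failure on one client does not lose the results collected so far.

```python
import csv
import os
import tempfile


def process_all(items, process_one, path):
    """Process items one by one and rewrite the CSV after each success."""
    all_rows = []
    for item in items:
        try:
            all_rows.extend(process_one(item))
        except Exception as e:
            print(f"Error processing {item}: {e}")
            continue
        # Checkpoint: write everything collected so far
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["client_id", "rapportage"])
            writer.writeheader()
            writer.writerows(all_rows)
    return all_rows


# Hypothetical worker that fails for client 2 to simulate an API error
def fake_process_one(client_id):
    if client_id == 2:
        raise RuntimeError("simulated API failure")
    return [{"client_id": client_id, "rapportage": f"note for client {client_id}"}]


path = os.path.join(tempfile.mkdtemp(), "notes.csv")
rows = process_all([1, 2, 3], fake_process_one, path)

with open(path, newline="", encoding="utf-8") as f:
    saved = list(csv.DictReader(f))
print(len(saved), "rows saved")
```

Rewriting the whole file each time is simple and safe for a few dozen clients; for much larger runs, appending only the new rows would avoid repeated full writes.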


In [None]:
def update_dag_counter(df):
    """
    This function updates the 'dag' column in df such that it maintains a running counter per client and per month.
    The counter starts at 1 and increments by 1 each time the 'dag' value changes. The counter resets to 1 for each new client and month.
    """
    # Compare each 'dag' value with the previous row within the same client_id and month.
    # The first row of each group has no predecessor (NaN), so it counts as a change
    # and the counter starts at 1.
    dag_previous = df.groupby(['client_id', 'month'])['dag'].shift(1)
    dag_changed = (df['dag'] != dag_previous).astype(int)

    # The running counter is the cumulative number of changes within each group
    df['dag'] = dag_changed.groupby([df['client_id'], df['month']]).cumsum()
    
    return df
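To make the intended result concrete, the following sketch computes the same kind of running counter in plain Python on a tiny made-up input. It is an illustration of the logic, not the pandas implementation above, and it assumes the rows of each client and month are contiguous and in chronological order; `running_day_counter` is a hypothetical name.

```python
def running_day_counter(rows):
    """Renumber 'dag' so it increments whenever the original value changes
    and restarts at 1 for every new (client_id, month) group."""
    result = []
    prev_group = None
    prev_dag = None
    counter = 0
    for row in rows:
        group = (row["client_id"], row["month"])
        if group != prev_group:
            counter = 1          # new client or month: restart
        elif row["dag"] != prev_dag:
            counter += 1         # same group, new day value
        result.append({**row, "dag": counter})
        prev_group, prev_dag = group, row["dag"]
    return result


# Two generated batches for one client and month both start at day 1 again
rows = [
    {"client_id": 1, "month": 1, "dag": 1},
    {"client_id": 1, "month": 1, "dag": 1},
    {"client_id": 1, "month": 1, "dag": 2},
    {"client_id": 1, "month": 1, "dag": 1},
    {"client_id": 1, "month": 1, "dag": 1},
    {"client_id": 2, "month": 1, "dag": 5},
]
renumbered = [r["dag"] for r in running_day_counter(rows)]
print(renumbered)
```

Because every generation round numbers its days from 1, renumbering like this yields one continuous day sequence per client and month.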


In [None]:
dfn = pd.read_csv(path_notes)
dfn = update_dag_counter(dfn)
dfn.to_csv(path_notes, index=False)