<a href="https://colab.research.google.com/github/ekrombouts/GenCareAI/blob/main/notebooks/100_note_generation/140_GenerateClientRecords.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GenCare AI: Generating client records

**Author:** Eva Rombouts  
**Date:** 2024-06-15  
**Updated:** 2024-09-01  
**Version:** 1.2

### Description
This script generates synthetic care notes and summaries for clients in a psychogeriatric ward using the OpenAI GPT-3.5-turbo model.  
The care notes are based on client profiles and scenarios generated in earlier scripts in this repo. The goal is to add as much story and variation to the notes as possible, without letting the model produce outputs that are overly creative.  
To achieve this, I use structured prompts (to help the model understand the expected content and its creative liberties), example libraries and memory integration.  
The output parser uses Pydantic models to structure and validate the care notes, ensuring proper format and content.  
Chroma is used to retrieve example care notes that are representative of the client profile and scenario. 

The script processes client profiles and scenarios, generates care notes, and updates summaries accordingly, creating comprehensive client records.  
The goal is to create a diverse and realistic dataset for NLP experiments in nursing homes.

**Please note** that generating data with OpenAI is not free. Generating records for 24 clients, with a mean of 8 months per client and 5 iterations per month, takes about 3 hours and costs approximately $2 with gpt-3.5.
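
As a rough back-of-the-envelope check on those figures, the sketch below derives the number of generation rounds from the numbers quoted above. It assumes nine notes per iteration (as described in the note generation section), two chat calls per iteration (one for the notes, one for the summary) and one keyword call per month. Retries and embedding calls are ignored, so treat the result as an order-of-magnitude estimate only.

```python
# Rough workload estimate based on the figures quoted in this notebook.
# Assumptions (not measured): 2 chat calls per iteration (notes + summary),
# 1 keyword call per month, no retries, embedding calls ignored.
n_clients = 24
mean_months = 8
iterations_per_month = 5
notes_per_iteration = 9

iterations = n_clients * mean_months * iterations_per_month
chat_calls = 2 * iterations + n_clients * mean_months
total_notes = iterations * notes_per_iteration
seconds_per_iteration = 3 * 60 * 60 / iterations  # 3 hours spread over all iterations

print(f"{iterations} iterations, about {chat_calls} chat calls, {total_notes} notes")
print(f"about {seconds_per_iteration:.1f} s per iteration")
```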

In [None]:
!pip install GenCareAI
from GenCareAI.GenCareAIUtils import GenCareAISetup

setup = GenCareAISetup()

if setup.environment == 'Colab':
    !pip install -q langchain langchain-openai langchain-community langchain-chroma

In [None]:
# Imports
import random
import pandas as pd
from pprint import pprint
from typing import List

from langchain.output_parsers import PydanticOutputParser, CommaSeparatedListOutputParser
# from langchain.prompts import ChatPromptTemplate 
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_community.callbacks import get_openai_callback

In [None]:
# Paths to various data files and constants for the model and temperature settings.
PATH_DB_GCAI = setup.get_file_path('data/chroma_db_gcai_notes')
WARD_NAME = 'Athena' # Use the same ward name for which client profiles and scenarios have been generated
PATH_PROFILES = setup.get_file_path(f'data/gcai_client_profiles_{WARD_NAME}.csv')
PATH_SCENARIOS = setup.get_file_path(f'data/gcai_client_scenarios_{WARD_NAME}.csv')
PATH_NOTES = setup.get_file_path(f'data/gcai_client_notes_{WARD_NAME}.csv')
PATH_SUMMARIES = setup.get_file_path(f'data/gcai_client_summaries_{WARD_NAME}.csv')

COLLECTION_NAME = 'anonymous_notes'
MODEL = 'gpt-3.5-turbo-0125'
MODEL_EMBEDDINGS = 'text-embedding-ada-002'
TEMP = 1.1

VERBOSE = True # Set to True for debugging / printing
TEST_CLIENT_ROW_NUMBER = 2
SEP_LINE = 100*'-'

### Load data

In the notebooks [110_GenerateClientProfiles.ipynb]() and [120_GenerateClientScenarios.ipynb](), datasets are generated with client profiles and scenarios, respectively.

- ***df_profiles***: Contains one row per client. The row describes the type of dementia the client is diagnosed with, the ADL care needs, medical symptoms and diseases, mobility, and behavior.
  
- ***df_scenarios***: Each client has zero or more scenario lines. These are referred to as month numbers, but they do not necessarily correspond to actual months.

In [None]:
# Load scenarios and profiles from CSV files
df_scenarios = pd.read_csv(PATH_SCENARIOS)
df_profiles = pd.read_csv(PATH_PROFILES)

if VERBOSE:
    print(df_profiles.info())
    print(df_scenarios.info())
    test_client = df_profiles.iloc[TEST_CLIENT_ROW_NUMBER] # We will be using this test client throughout the script

### Functions to display the data

In [None]:
# Function to format the client’s profile information as a single string
def profile_as_string(profile_row, display_name=True):
    profile = ""
    if display_name:
        profile += f"Naam: {profile_row['naam']}\n"
    profile += (f"Type Dementie: {profile_row['type_dementie']}\n"
                f"Lichamelijke klachten: {profile_row['somatiek']}\n"
                f"ADL: {profile_row['adl']}\n"
                f"Mobiliteit: {profile_row['mobiliteit']}\n"
                f"Cognitie / gedrag: {profile_row['gedrag']}")
    return profile

if VERBOSE:
    print(profile_as_string(profile_row=test_client, display_name=True))

In [None]:
# Function to display the scenario information for a given client and month
def scenario_as_string(profile_row, month_no=1):
    client_id = profile_row['client_id']
    return df_scenarios.loc[df_scenarios['client_id'] == client_id, 'journey'].iloc[month_no - 1]
    
if VERBOSE:
    print(scenario_as_string(profile_row=test_client))
    print(scenario_as_string(profile_row=test_client, month_no=2))

### Define Pydantic models

For parsing, we use Pydantic models. The LLM follows them remarkably well, and they offer several benefits:
- They enforce a consistent output structure.
- They automate parsing of the output.
- The field descriptions further guide the LLM in generating appropriate responses.
- They catch invalid data early, preventing downstream issues.

In our main prompt defined below, we ask the LLM to generate multiple care notes. Our structure consists of two classes: one to structure a single note (CareNote) and a second class defined as a list of care notes (CareNotes).

CareNote represents a single care note with the fields dag, tijd, and rapportage:
- **'dag'**: Even though the sequence number of the day ("dag") isn't meaningful, it forces the model to respond with the number of days requested.  
- **'tijd'**: The field 'time' ("tijd") was chosen over 'daypart' ("dagdeel") because the model tended to link 'daypart' to breakfast, lunch, and dinner, which led to many notes describing lunch details.  
- **'rapportage'**: This is the main challenge, what it's all about... 

CareNotes is a container for multiple care notes, consisting of a single field **'notes'**, which is a list of CareNote instances.

In [None]:
# Structure for a single care note
class CareNote(BaseModel):
    dag: int = Field(description="volgnummer dag")
    tijd: str = Field(description="tijd van de rapportage (hh:mm)")
    rapportage: str = Field(description="Inhoud van de rapportage. Een rapportage beschrijft over het algemeen één zorgaspect, soms meer")

# Structure for multiple notes
class CareNotes(BaseModel):
    notes: List[CareNote]
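
To make the role of these models concrete, here is a minimal stdlib-only sketch of what "parse and validate" amounts to for a response shaped like CareNotes. It is not how Pydantic or the LangChain parser is implemented. `NoteSketch` and `parse_notes` are hypothetical names, and the sample response is made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class NoteSketch:
    """Hypothetical stand-in for CareNote, for illustration only."""
    dag: int
    tijd: str
    rapportage: str


def parse_notes(raw: str) -> List[NoteSketch]:
    """Parse a JSON string of the form {"notes": [...]} and check the field types."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    parsed = []
    for item in data["notes"]:  # raises KeyError if 'notes' is missing
        note = NoteSketch(**item)  # raises TypeError on missing or unexpected fields
        if not (isinstance(note.dag, int)
                and isinstance(note.tijd, str)
                and isinstance(note.rapportage, str)):
            raise ValueError(f"Unexpected field type in {item}")
        parsed.append(note)
    return parsed


# Made-up model response in the expected structure
raw_response = '{"notes": [{"dag": 1, "tijd": "08:15", "rapportage": "Mw. heeft goed geslapen."}]}'
notes = parse_notes(raw_response)
print(notes[0].dag, notes[0].tijd, notes[0].rapportage)
```

A response with, say, `"dag": "one"` would be rejected here, which is the kind of early failure the real parser provides as well.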

### Model initialization

The temperature and model settings can be configured in the constants section. I selected the OpenAI GPT-3.5-turbo model for cost-efficiency. The temperature setting of 1.1 was determined through trial and error.

In [None]:
# Initialize OpenAI Chat model
model = ChatOpenAI(api_key=setup.get_openai_key(), temperature=TEMP, model=MODEL)

In [None]:
# Initialize CSV and pydantic parsers
csv_parser = CommaSeparatedListOutputParser()
csv_format_instructions = csv_parser.get_format_instructions()
pyd_parser = PydanticOutputParser(pydantic_object=CareNotes)

## Setting up the example library

### Setup Chroma

To improve the contextual relevance and diversity of the generated care notes, we will dynamically inject example notes into the prompts. 

Chroma is used in this context to select example care notes that are representative of the client profile and scenario. By using a vector database, the system can efficiently find example notes that match the specific characteristics of each client. The Chroma vector database, created [here](), contains a variety of (synthetic) notes for anonymous clients. The retriever queries this database, allowing the processing functions to access and integrate the stored example notes.  
By adding these dynamically selected examples to the prompt, this approach aims to make the generated care notes more contextually appropriate and diverse.

Our approach is:
- Initialize the Chroma Vector Database.
- Generate Keywords: Using the LLM, keywords are generated from a client’s profile and scenario. These keywords are used to query the Chroma vector database for relevant example notes.
- Retrieve Example Notes: Utilizing a retriever to search the vector database with the generated keywords, supplemented by additional neutral terms to ensure a broad coverage of relevant care notes.
- Filter by Gender: Filter the retrieved example notes to exclude those with gender-specific pronouns that do not match the client’s gender, ensuring the relevance of the examples.
- Sample Examples: Randomly sample a subset of the filtered example notes to inject into the prompts, enhancing the diversity of the generated care notes.

In [None]:
# Initialize Chroma vector database
vectordb = Chroma(persist_directory=PATH_DB_GCAI,
                  embedding_function=OpenAIEmbeddings(api_key=setup.get_openai_key(), model=MODEL_EMBEDDINGS),
                  collection_name = COLLECTION_NAME
                  )

In [None]:
# Set up a retriever for document querying
retriever = vectordb.as_retriever(search_kwargs={"k": 20})

### Keywords prompt & chain

In [None]:
# Template to generate key words from a client’s profile and scenario for retrieving example notes
PT_keywords = PromptTemplate(
    template = """
Geef vijf woorden die de kern weergeven van onderstaand profiel en scenario.
Geef geen namen terug.

Profiel:
{profile}

Scenario:
{scenario}

{format_instructions}
""",
    input_variables=["profile", "scenario"],
    partial_variables={"format_instructions": csv_format_instructions},)

# Format the prompt for the example library
if VERBOSE:
    P_keywords = PT_keywords.format(
        profile=profile_as_string(test_client, display_name=False), 
        scenario=scenario_as_string(test_client, month_no=1))
    print(P_keywords)

The code below shows the result of passing this prompt to the model. The result is a langchain_core.messages.ai.AIMessage object, whose 'content' attribute holds the response as a string. As requested by the format instructions in the prompt, this string is a comma-separated list of values.
Passing this string to the CommaSeparatedListOutputParser results in an actual Python list.

The model often returns more than the five requested keywords. Since this has minor consequences, I accept this behavior.
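
If a hard limit were ever needed, the parsed list could simply be truncated. The sketch below does this on a plain string. `cap_keywords` is a hypothetical helper and the sample response is made up; the notebook itself does not apply such a cap.

```python
def cap_keywords(raw: str, max_keywords: int = 5) -> list:
    """Split a comma-separated response, drop empty items, keep the first few."""
    keywords = [part.strip() for part in raw.split(",") if part.strip()]
    return keywords[:max_keywords]


# Made-up response containing seven keywords instead of five
raw = "dementie, onrust, valrisico, incontinentie, eetlust, nachtrust, pijn"
capped = cap_keywords(raw)
print(capped)
```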

In [None]:
if VERBOSE:
    response_keywords = model.invoke(P_keywords)
    parsed_response_keywords = csv_parser.parse(response_keywords.content)

    print(response_keywords)
    print(100 * '-')
    print(parsed_response_keywords)
    print(100 * '-')

Chaining it all together

In [None]:
# Chain the prompt template with the model and the parser
chain_keywords = PT_keywords | model | csv_parser

if VERBOSE:
    test_keywords = chain_keywords.invoke(
        {"profile": profile_as_string(test_client, display_name=False), 
         "scenario": scenario_as_string(test_client, month_no=1)})
    print(test_keywords)

### Retrieving notes for the example library

The next step is to invoke the retriever with the generated keywords to obtain example notes. To add more ‘neutral’ notes, we also add the keywords ‘ADL’, ‘mobility’, and ‘food and drinks’ to the list.

Please note: The retriever utilizes an OpenAI embedding model (which is not free), which transforms the keywords into embeddings. These embeddings are then used to find similar notes in the initialized vector database, ensuring the retrieval of contextually relevant examples.
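
To illustrate the idea of similarity search, here is a toy sketch with made-up three-dimensional vectors. Cosine similarity is used purely for illustration: the metric Chroma actually applies depends on how the collection is configured, and real embeddings have far more dimensions. `cosine_similarity`, `top_k` and the sample notes are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, library, k=2):
    """Return the texts of the k library items most similar to the query vector."""
    ranked = sorted(library, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up "embeddings" for three example notes
toy_library = [
    ("Mw. at goed bij het ontbijt.", [0.9, 0.1, 0.0]),
    ("Dhr. liep zelfstandig met rollator.", [0.1, 0.9, 0.1]),
    ("Mw. dronk weinig vandaag.", [0.8, 0.2, 0.1]),
]
query = [1.0, 0.0, 0.0]  # pretend embedding of the keyword 'eten en drinken'
closest = top_k(query, toy_library, k=2)
print(closest)
```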

In [None]:
# Retrieves example notes from a retriever based on keywords
def get_example_notes(keywords, retriever):
    example_library = []
    # Copy the keywords so the caller's list is not modified, then add neutral topics
    example_library_topics = list(keywords) + ['adl', 'mobiliteit', 'eten en drinken']
    for i in example_library_topics:
        docs = retriever.invoke(i)
        for d in docs:
            example_library.append(d.page_content.strip('"'))
    return example_library

if VERBOSE:
    example_notes = get_example_notes(test_keywords, retriever)
    print(random.sample(example_notes,5))
    print(len(example_notes))

The example notes often contain gender-specific pronouns or titles. When these are included in the prompt, the model tends to generate responses with incorrect pronouns for our client. Therefore, we filter the example library to exclude notes that use a title for the other gender (for example ‘mw’ or ‘mevrouw’ for a male client, ‘dhr’ or ‘meneer’ for a female client).

In [None]:
# Determines the gender of a client based on their name
def determine_gender(name):
    if "Mevrouw" in name:
        return 'female'
    elif "Meneer" in name:
        return 'male'
    else:
        return 'unknown'
    
if VERBOSE:
    print(f"The gender of {test_client['naam']} is: {determine_gender(test_client['naam'])}")

In [None]:
# Filters example notes by the specified gender to ensure relevance
def filter_examples_by_gender(texts, gender_to_keep):
    if gender_to_keep == 'male':
        keywords = ['mw', 'mevr', 'mvr', 'mevrouw']
    elif gender_to_keep == 'female':
        keywords = ['dhr', 'meneer']
    else:
        return texts

    def contains_keywords(text):
        text_lower = text.lower()
        return any(keyword in text_lower for keyword in keywords)

    return [text for text in texts if not contains_keywords(text)]

if VERBOSE:
    test_example_library = filter_examples_by_gender(texts=example_notes, gender_to_keep=determine_gender(test_client['naam']))
    for item in random.sample(test_example_library, 5):
        print('- ' + item)
    print(f"\nOriginal number of examples: {len(example_notes)}")
    print(f"Filtered number of examples: {len(test_example_library)}")
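
The filter above matches substrings, so a short title such as 'mw' also matches inside longer words and may discard more notes than necessary. A possible refinement is to match whole words only. The sketch below shows the difference; `contains_title` is a hypothetical helper and the example sentences are made up.

```python
import re


def contains_title(text, titles):
    """True if any of the titles occurs as a whole word (case-insensitive)."""
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in titles) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


female_titles = ["mw", "mevr", "mvr", "mevrouw"]

# Whole-word match: the title is found
print(contains_title("Mw. heeft goed gegeten.", female_titles))
# 'mw' only occurs inside 'Warmwaterkruik': no whole-word match
print(contains_title("Warmwaterkruik gegeven tegen de pijn.", female_titles))
# Plain substring test, as in the filter above, does match
print("mw" in "Warmwaterkruik gegeven tegen de pijn.".lower())
```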

We'll be sampling a subset of this library

In [None]:
# Selects a random set of examples from the example library and returns as bulleted string
def sample_examples_as_string(example_library, num_items=3):
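    # Never ask for more items than the library holds; random.sample would raise a ValueError
    num_items = min(num_items, len(example_library))
    random_items = random.sample(example_library, num_items)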
    return '\n'.join(['- ' + item for item in random_items])

if VERBOSE:
    sampled_examples_as_string = sample_examples_as_string(test_example_library, 5)
    print(sampled_examples_as_string)

Putting it all together in a function:

In [None]:
# Generates an example library for a client based on their profile and scenario
def create_example_library(row, scenario):
    profile_no_name = profile_as_string(row, display_name=False)
    client_gender = determine_gender(row['naam'])

    # Invoke the example library chain to get the keywords to search for example notes relevant to this client
    keywords = chain_keywords.invoke({"profile": profile_no_name, "scenario": scenario})
    example_notes = get_example_notes(keywords=keywords,retriever=retriever)
    example_library = filter_examples_by_gender(texts=example_notes, gender_to_keep=client_gender)

    return example_library

if VERBOSE:
    example_library = create_example_library(test_client, scenario=scenario_as_string(test_client,month_no=1))
    sampled_examples_as_string = sample_examples_as_string(example_library=example_library, num_items=3)
    print(SEP_LINE)
    print(sampled_examples_as_string)

## Setting up the summary memory

### Memory prompt & chain

Adding memory is important because it allows the generated care notes to reflect ongoing developments in a client’s condition and care. This ensures that each new set of notes builds on previous information, maintaining continuity in the narrative of the client’s health and daily experiences.

I chose to implement memory by using a summary-based approach. This turned out to be more effective than passing the last notes directly, because it provides context through the profile and the summary, maintaining the storyline. Meanwhile, example notes are refreshed each time, preventing the model from becoming repetitive and producing the same structure repeatedly.

Initially, a summary is created that includes the client’s profile and the scenario of the first month. As new care notes are generated, this summary is updated to reflect the latest information. This updated summary is then used as the context for generating subsequent notes, ensuring that the model retains and incorporates past details. 

In the prompt, the model is asked to update the summary based on the previous summary and the newly generated client notes. 
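
The rolling-summary idea can be sketched without a model. In the snippet below, `stub_update_summary` is a hypothetical stand-in for the LLM call that simply appends the first new note to the previous summary; the notes are made up. It only illustrates how each round's output becomes the next round's context.

```python
def stub_update_summary(summary, new_notes):
    """Hypothetical stand-in for the LLM call: appends the first new note to the summary."""
    return f"{summary} | {new_notes[0]}"


summary = "Scenario maand 1"  # initial summary: the scenario of the first month
history = [summary]

# Two made-up batches of newly generated notes
batches = [
    ["Mw. was onrustig in de nacht.", "Mw. at weinig."],
    ["Mw. sliep beter.", "Mw. at goed."],
]
for new_notes in batches:
    # The previous summary plus the new notes produce the next summary
    summary = stub_update_summary(summary, new_notes)
    history.append(summary)

print(history[-1])
```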

[todo: a note about the TEMP, for now, I chose to keep it relatively high]

In [None]:
# Template for updating the client summary based on new care notes

PT_memory = PromptTemplate(
    template = """
Hieronder staat:
1. Het profiel van een client(e) die verblijft op een psychogeriatrische afdeling van het verpleeghuis. 
2. Een samenvatting van het beloop tot de indexdatum
3. Nieuwe zorgrapportages vanaf de indexdatum

PROFIEL:
{profile}

SAMENVATTING BELOOP TOT INDEXDATUM:
{summary}

NIEUWE RAPPORTAGES:
{new_notes}

Schrijf in één alinea een nieuwe samenvatting van het beloop. Neem belangrijke gebeurtenissen en zorgvraag uit de eerdere samenvatting over en vul aan met belangrijke gebeurtenissen en zorgvraag uit de rapportages. Neem de gegevens uit het profiel niet over in de samenvatting.

In het antwoord dient uitsluitend de samenvatting van het beloop te staan, zonder aanvullende tekst.
""",
    input_variables=["profile", "summary", "new_notes"],
)

if VERBOSE:
    # Since we don't have any notes yet we'll be using example notes as new_notes
    test_new_notes = sample_examples_as_string(example_library,9)
    P_memory = PT_memory.format(profile=profile_as_string(test_client),
                                summary=scenario_as_string(test_client, month_no=1), 
                                new_notes=test_new_notes)
    print(P_memory)

In [None]:
chain_memory = PT_memory | model

if VERBOSE:
    updated_summary = chain_memory.invoke({"profile": profile_as_string(test_client),
                                           "summary": scenario_as_string(test_client, month_no=1), 
                                           "new_notes": test_new_notes})
    test_summary_m1 = updated_summary.content
    pprint(test_summary_m1)

## Generating client notes

### Note generation prompts & chain

Our goal is to populate the client record by generating care notes that reflect the client profile and describe the scenario. 
In practice, the number and length of notes per day vary with clinical circumstances: stable clients typically have fewer and shorter notes than ill or agitated clients. (I might add this functionality in the future.) For now, I generate three notes per day. Through trial and error, I found that the model can reliably generate nine notes per prompt, so each prompt covers three days. 
Having a scenario twist every three days is not very realistic, and the scenario descriptions are written per 'month'. To populate a client record for an entire month, I could break the monthly scenario down into smaller segments that fit the three-day prompt structure. Instead, I chose to give the model the scenario in the first iteration and allow it some creative freedom in later iterations, relying on the summaries to maintain continuity.
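
As a minimal sketch of this schedule: the first iteration of a month receives the month's scenario, and later iterations receive a generic instruction to build on the profile and summary. `scenario_for_iteration` is a hypothetical helper, the example scenario is made up, and five iterations per month is the default used later in this notebook.

```python
FOLLOW_UP = "Bouw voort op de gegevens uit het Profiel en het Beloop."


def scenario_for_iteration(month_scenario, iteration):
    """Return the month's scenario in iteration 1 and a generic follow-up afterwards."""
    return month_scenario if iteration == 1 else FOLLOW_UP


num_iterations = 5
days_per_prompt = 3
notes_per_day = 3

# Made-up scenario text for illustration
schedule = [scenario_for_iteration("Mw. valt en breekt haar heup.", i)
            for i in range(1, num_iterations + 1)]
days_per_month = num_iterations * days_per_prompt
notes_per_month = days_per_month * notes_per_day

print(schedule[0])
print(schedule[1])
print(days_per_month, "days,", notes_per_month, "notes per month")
```

With these settings a 'month' covers 15 generated days, which is one reason the month numbers do not correspond to calendar months.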

In [None]:
if VERBOSE:
    # Let's study the scenario
    test_client_id = test_client['client_id']
    df_client_scenarios = df_scenarios.loc[df_scenarios['client_id'] == test_client_id, ['journey', 'month']]  
    counter = 1
    for i, r in df_client_scenarios.iterrows():
        print(str(counter) + ' ' + r['journey'])
        counter = counter + 1

In [None]:
# Template for generating care notes 
PT_get_notes = PromptTemplate(
    template="""Jouw taak als AI is om zorgrapportages te schrijven van een fictieve client die verblijft op een psychogeriatrische afdeling van een verpleeghuis.
Hieronder staat:
- Het profiel van de client. 
- Een samenvatting van het beloop
- Het scenario van de rapportages die je moet schrijven

Voorbeeld rapportages:
- Dhr. zijn haar gewassen en zijn baard geschoren.
- Inco van mw, was verzadigd vanmorgen en bed was nat.
{examples}

PROFIEL:
{profile}

SAMENVATTING BELOOP TOT HEDEN:
{summary}

Gebruik een informele, menselijke stijl. Gebruik relatief eenvoudige taal en vermijd termen als 'cruciaal'.
Varieer met de zinsopbouw en stijl. Omschrijf de zorg, zonder het profiel letterlijk te herhalen. Vermijd het noemen van de naam.

Schrijf rapportages voor drie dagen. Per dag worden drie rapportages geschreven, dus er zijn 9 rapportages totaal.

SCENARIO voor de rapportages die je moet schrijven:
{scenario}

{format_instructions}
""",
    input_variables=["examples", "profile", "summary", "scenario"],
    partial_variables={"format_instructions": pyd_parser.get_format_instructions()},
)

if VERBOSE:
    P_get_notes = PT_get_notes.format(
        examples = sample_examples_as_string(example_library, num_items=3), 
        profile = profile_as_string(test_client),
        # Since we don't have a summary for the first round, we use the scenario for both summary and scenario
        summary = scenario_as_string(test_client, 1), 
        scenario = scenario_as_string(test_client, 1)
        )
    print(P_get_notes)

Now we create a chain. The output of the chain is a structured list of (nine) care notes for a client, wrapped in a CareNotes object. This object has an attribute named notes, which is a list of individual CareNote entries. Each CareNote entry holds three fields (dag, tijd, rapportage).

In [None]:
# Create a chain of operations: prompt template -> model -> output parser
chain_get_notes = PT_get_notes | model | pyd_parser

if VERBOSE:
    test_notes = chain_get_notes.invoke({"examples": sampled_examples_as_string, 
                                         "profile": profile_as_string(test_client),
                                         "summary": scenario_as_string(test_client, month_no=1), 
                                         "scenario": scenario_as_string(test_client, month_no=1)})

    print('***The parsed result of model:')
    print(test_notes)

    print('\n***And these are the individual notes')
    for note in test_notes.notes:
        print(note)

In [None]:
# Function formats the notes and returns them as a string for display
def notes_as_string(notes, simple_bulleted=True):
    note_strings = []
    for note in notes:
        if simple_bulleted:
            note_strings.append(f"- {note.rapportage}")
        else:
            note_strings.append(f"Dag {note.dag} ({note.tijd}): {note.rapportage}")
    return "\n".join(note_strings)

if VERBOSE:
    print(notes_as_string(test_notes.notes, simple_bulleted=False))
    test_notes_m1 = notes_as_string(test_notes.notes)
    print(SEP_LINE)
    print(test_notes_m1)

In [None]:
# Seeing some iterations explicitly written out can make the flow of the process clearer, especially when tracking the sequence of actions and understanding the logic at each step. 
if VERBOSE:
    print('PROFILE')
    print(profile_as_string(test_client))
    print('\nSCENARIO: '+ scenario_as_string(test_client,month_no=1))
    print('\nNOTES')
    print(test_notes_m1)

    # Now have the model create a new summary 
    updated_summary = chain_memory.invoke({"profile": profile_as_string(test_client),
                                           # Initially, there is no summary
                                           "summary": scenario_as_string(test_client, month_no=1), 
                                           "new_notes": test_notes_m1})
    test_summary_m1 = updated_summary.content
    print('\nSUMMARY')
    pprint(test_summary_m1)

    # And again some notes
    test_notes = chain_get_notes.invoke({"examples": sample_examples_as_string(example_library, 3), 
                                         "profile": profile_as_string(test_client),
                                         "summary": test_summary_m1, 
                                         "scenario": "Bouw voort op de gegevens uit het Profiel en het Beloop"})

    test_notes_m2 = notes_as_string(test_notes.notes)
    print('\nSCENARIO: '+ "Bouw voort op de gegevens uit het Profiel en het Beloop")
    print('\nNOTES')
    print(test_notes_m2)

    # a new summary 
    updated_summary = chain_memory.invoke({"profile": profile_as_string(test_client),
                                           "summary": test_summary_m1, 
                                           "new_notes": test_notes_m2})
    test_summary_m2 = updated_summary.content
    print('\nSUMMARY')
    pprint(test_summary_m2)

    # And again some notes
    test_notes = chain_get_notes.invoke({"examples": sample_examples_as_string(example_library, 3), 
                                         "profile": profile_as_string(test_client),
                                         "summary": test_summary_m2, 
                                         "scenario": scenario_as_string(test_client, month_no=2)})

    test_notes_m3 = notes_as_string(test_notes.notes)
    print('\nSCENARIO: '+ scenario_as_string(test_client,month_no=2))
    print('\nNOTES')
    print(test_notes_m3)


### Functions to populate the client records

In [None]:
# Generate care notes for a client over a specified number of iterations and update the client summary
def generate_care_notes(summary_list, notes_list, profile_row, month_no, example_library, num_iterations=5, num_examples=3):  
    """
    Returns:
    - Updated list of summaries.
    - Updated list of care notes.
    """
    profile = profile_as_string(profile_row)
    scenario = scenario_as_string(profile_row, month_no=month_no)
    client_id = profile_row['client_id']
    # summary_list holds dicts; use the text of the most recent summary
    summary = summary_list[-1]["summary"]

    for i in range(num_iterations):
        iteration = i + 1
        try:
            print(f'Iteration {iteration}')
            if iteration > 1:
                # Update the scenario to let the model build upon the scenario
                scenario = "Bouw voort op de gegevens uit het Profiel en het Beloop."

            # Sample examples from the example library
            examples = sample_examples_as_string(example_library, num_examples)

            # Generate care notes using the model and the example library
            # There are frequent 'Invalid json output' errors. In that case, try again
            try:
                result_notes = chain_get_notes.invoke({
                    "examples": examples,
                    "profile": profile,
                    "summary": summary,
                    "scenario": scenario
                })
            except Exception as e:
                # Try once more in case of failure
                print(f"Error in iteration {iteration}, retrying: {e}")
                result_notes = chain_get_notes.invoke({
                    "examples": examples,
                    "profile": profile,
                    "summary": summary,
                    "scenario": scenario
                })
                print("Retry successful")

            # Update the summary based on the new care notes
            result_memory = chain_memory.invoke({
                 "profile": profile,
                 "summary": summary,
                 "new_notes": notes_as_string(result_notes.notes)
            })

            # Add generated notes to the notes list
            for note in result_notes.notes:
                notes_list.append({
                    "client_id": client_id,
                    "month": month_no,
                    "iteration": iteration,
                    "dag": note.dag,
                    "tijd": note.tijd,
                    "rapportage": note.rapportage,
                })

            # add_notes_to_list(result_notes.notes, notes_list, client_id, month_no, iteration)

            # Update the memory with new notes and generate a new summary
            summary = result_memory.content

            # Add the updated summary to the summary list 
            summary_list.append({
                "client_id": client_id,
                "month": month_no,
                "iteration": iteration,
                "summary": summary,
            })

        except Exception as e:
            print(f"Error in iteration {iteration}: {e}")
            continue

    return summary_list, notes_list

if VERBOSE:
    notes_list = []
    summary_list = []
    summary_list.append({
        "client_id": test_client['client_id'],
        "month": 0,
        "iteration": 0,
        "summary": scenario_as_string(test_client, month_no=1),
        })

    summary_list, notes_list = generate_care_notes(
        summary_list=summary_list, 
        notes_list=notes_list, 
        profile_row=test_client, 
        example_library=example_library,
        month_no=1,
        num_iterations=1,
        num_examples=3)
    
    for cs in summary_list:
        print(cs)
    print(100*'-')
    for n in notes_list:
        print(n)
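
The try-once-more pattern in generate_care_notes repeats the chain call in the except branch. A small helper could express the same idea. This is a minimal sketch: `invoke_with_retry` and `flaky_call` are hypothetical names, and the failure is simulated rather than coming from a real model call.

```python
def invoke_with_retry(func, max_attempts=2):
    """Call func() and retry on any exception, up to max_attempts calls in total."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as e:
            last_error = e
            print(f"Attempt {attempt} failed: {e}")
    # All attempts failed: re-raise the last error
    raise last_error


# Hypothetical flaky call: fails once with a parse error, then succeeds
calls = {"count": 0}


def flaky_call():
    calls["count"] += 1
    if calls["count"] == 1:
        raise ValueError("Invalid json output")
    return "ok"


result = invoke_with_retry(flaky_call)
print(result, calls["count"])
```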
    

Next, let's have a look at the generation of care notes and summaries for a single client. We need to process each client’s data individually. This involves iterating through their associated scenarios and generating relevant notes. 

In [None]:
# Process a single client to generate care notes and summaries.
def process_client(profile_row, df_scenarios, num_iterations=5, num_examples=3):
    all_notes_list = []
    all_summaries_list = []

    try:
        with get_openai_callback() as cb:
            print(f"Processing client: {profile_row['naam']}")

            summary_list = []
            notes_list = []

            client_id = profile_row['client_id']
            # As initial summary, we take the scenario of the first month
            summary_list.append({
                "client_id": profile_row['client_id'],
                "month": 0,
                "iteration": 0,
                "summary": scenario_as_string(profile_row=profile_row, month_no=1),
                })

            # Select the scenario rows for the client
            df_client_scenarios = df_scenarios.loc[df_scenarios['client_id'] == client_id, ['journey', 'month']]  

            num_months = len(df_client_scenarios)

            month_no = 1    
            for i, month_scenario in df_client_scenarios.iterrows():  
                scenario = scenario_as_string(profile_row, month_no)
                example_library = create_example_library(profile_row, scenario)

                print(f'Generating notes for month: {month_no} of {num_months} for client {client_id}')
                summary_list, notes_list = generate_care_notes(
                    summary_list=summary_list,
                    notes_list=notes_list, 
                    profile_row=profile_row,
                    month_no=month_no,
                    example_library=example_library,
                    num_iterations=num_iterations,
                    num_examples=num_examples,
                    )
                month_no += 1

            # Each note and summary dict already carries client_id, month and iteration,
            # so the lists can be collected as they are
            all_summaries_list.extend(summary_list)

            all_notes_list.extend(notes_list)
            print(cb)

    except Exception as e:
        print(f"Error processing client {profile_row['naam']}: {e}")

    return all_notes_list, all_summaries_list

if VERBOSE:
    notes_list, summaries_list = process_client(profile_row=test_client, df_scenarios=df_scenarios, num_iterations=2, num_examples=3)
    for cs in summaries_list:
        print(cs)

    for n in notes_list:
        print(n)

In [None]:
# Iterate through all clients, process each one, and save the generated notes and summaries to CSV files.
def process_clients(df_profiles, df_scenarios):
    all_notes_list = []
    all_summaries_list = []

    for idx, row in df_profiles.iterrows():
        try:
            notes, summaries = process_client(row, df_scenarios)
            all_notes_list.extend(notes)
            all_summaries_list.extend(summaries)

            df_notes = pd.DataFrame(all_notes_list)
            df_summaries = pd.DataFrame(all_summaries_list)

            # save after each client to prevent having to start over in case of an error
            df_notes.to_csv(PATH_NOTES, index=False)
            df_summaries.to_csv(PATH_SUMMARIES, index=False)
        except Exception as e:
            print(f"Error processing client: {e}")

process_clients(df_profiles=df_profiles, df_scenarios=df_scenarios)


In [None]:
def update_dag_counter(df):
    """
    This function updates the 'dag' column in df such that it maintains a running counter per client and per month.
    The counter starts at 1 and increments by 1 each time the 'dag' value changes. The counter resets to 1 for each new client and month.
    """
    # Add a column to shift 'dag' values by one row within each group of client_id and month
    df['dag_shift'] = df.groupby(['client_id', 'month'])['dag'].shift(1)
    # Create a column indicating if 'dag' has changed compared to the previous row
    df['dag_changed'] = (df['dag'] != df['dag_shift']).astype(int)
    # Create a column indicating the start of a new group (client_id and month)
    df['group_changed'] = df.groupby(['client_id', 'month']).cumcount() == 0
    
    # Update 'group_changed' to be False if 'dag_shift' is NaN
    df['group_changed'] = df['group_changed'] & df['dag_shift'].notna()
    # Create the counter ('teller') by cumulatively summing 'dag_changed' within each group and adding 'group_changed'
    df['dag'] = df.groupby(['client_id', 'month'])['dag_changed'].cumsum() + df['group_changed']
    
    # Remove the temporary columns used for calculations
    df.drop(columns=['dag_shift', 'dag_changed', 'group_changed'], inplace=True)
    
    return df
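
To see what this renumbering does, here is a pure-Python sketch of the same logic on a small made-up list of rows. `renumber_days` is a hypothetical helper, not a replacement for the pandas function, and it assumes that rows of the same client and month are contiguous and in generation order.

```python
def renumber_days(rows):
    """rows: list of (client_id, month, dag) tuples in generation order.
    Returns a running day counter per row, restarting at 1 for each (client_id, month)."""
    counters = []
    prev_group, prev_dag, counter = None, None, 0
    for client_id, month, dag in rows:
        group = (client_id, month)
        if group != prev_group:
            counter = 1   # new client or month: restart the counter
        elif dag != prev_dag:
            counter += 1  # same group and the 'dag' value changed: next day
        counters.append(counter)
        prev_group, prev_dag = group, dag
    return counters


rows = [
    ("c1", 1, 1), ("c1", 1, 1), ("c1", 1, 2), ("c1", 1, 3),  # first iteration
    ("c1", 1, 1), ("c1", 1, 2),                              # next iteration restarts at 1
    ("c1", 2, 1), ("c1", 2, 2),                              # new month
]
new_days = renumber_days(rows)
print(new_days)
```

The restart of 'dag' at 1 in each new iteration is turned into a continuing day count within the month.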


In [None]:
dfn = pd.read_csv(PATH_NOTES)
dfn = update_dag_counter(dfn)
dfn.to_csv(PATH_NOTES, index=False)