<a href="https://colab.research.google.com/github/ekrombouts/GenCareAI/blob/main/notebooks/120_GenerateClientScenarios.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GenCare AI: Generating client scenarios

**Author:** Eva Rombouts  
**Date:** 2024-06-13  
**Updated:** 2024-10-14  
**Version:** Olympia

### Description
In previous notebooks, we created a [dataset with client profiles for nursing home residents](https://colab.research.google.com/github/ekrombouts/GenCareAI/blob/main/notebooks/110_GenerateClientProfiles.ipynb). In this notebook, we will generate client scenarios that describe the course of events during their stay in a psychogeriatric ward. These scenarios aim to provide a timeline of care over several weeks, including complications that may arise during the client’s time in the nursing home.

Our goal is to simulate the subtle changes that occur over time in a resident’s health and care needs. Each scenario is generated based on a client profile and includes complications such as weight loss, infections, or other health-related issues. The number of weeks and complications vary to reflect the unpredictability of real-life care trajectories.

In this notebook, we use the gpt-4o-mini model to generate these scenarios. The temperature is set to 1.1 to promote variation in the generated content. The ward name is configurable so that the output of different experiments is stored in separate files, and the number of weeks is drawn from a normal distribution so that the duration of each client’s scenario varies.

This script generates client scenarios based on the profiles generated [here](todo).

For 24 client profiles and an average of 20 weeks per scenario, the cost is approximately $?? per run.

In [None]:
# Setup dependencies based on environment (e.g., Colab)
!pip install GenCareAIUtils
from GenCareAIUtils import GenCareAISetup

setup = GenCareAISetup()

if setup.environment == 'Colab':
    !pip install -q langchain langchain_core langchain_openai langchain_community

In [2]:
import os
import pandas as pd
import random
import numpy as np
from tqdm import tqdm
from typing import List
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from langchain_community.callbacks import get_openai_callback

# Import custom utilities
from GenCareAIUtils import ClientProfileFormatter

In [3]:
# Configuration parameters

# General settings
nursing_care_home_name = "Olympia"
ward_name = 'Apollo'
verbose = True

# Model settings
model_name = 'gpt-4o-mini-2024-07-18'
temp = 1.1

# Simulation settings
duration = 20  # Mean number of weeks to simulate
duration_sd = 6  # Standard deviation of the number of weeks
num_complications_min = 1
num_complications_max = 3

# List of complications to be randomly assigned to clients
complications_library = [
    "gewichtsverlies",
    "algehele achteruitgang",
    "decubitus",
    "urineweginfectie",
    "pneumonie",
    "delier",
    "verergering van onderliggende lichamelijke klachten",
    "verbetering van de klachten",
    "overlijden",
    "valpartij"
]

# File paths (after 'ward_name' is defined)
fn_profiles = setup.get_file_path(f"data/{nursing_care_home_name}/gcai_client_profiles_{ward_name}.csv")
fn_scenarios = setup.get_file_path(f"data/{nursing_care_home_name}/gcai_client_scenarios_{ward_name}.csv")


In [None]:
# Load the client profiles
df = pd.read_csv(fn_profiles)

# Create an instance of ClientProfileFormatter
cpf = ClientProfileFormatter()

if verbose:
    sample_client_id = 1

    # Print the formatted client profile for the selected client
    formatted_profile = cpf.format_client_profile(
        profile_row=df[df['client_id'] == sample_client_id].iloc[0]
    )
    print("Client Profile:")
    print(formatted_profile)
    print(100 * '-')


In [5]:
# Define the Pydantic models for handling scenario outputs
class ClientScenario(BaseModel):
    week: int = Field(description="Weeknummer")
    events_description: str = Field(description="Beschrijving van de gebeurtenissen en zorg")

class ClientScenarios(BaseModel):
    scenario: List[ClientScenario]
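
As a rough illustration of what the parser has to accept, the sketch below validates a hand-written JSON string against the same two models. The JSON content is made up for the example, and building the model from `json.loads` is only one way to do it; in the actual pipeline the `PydanticOutputParser` takes care of this step.

```python
import json
from typing import List
from pydantic import BaseModel, Field, ValidationError

# Same models as in the cell above, repeated so the sketch is self-contained
class ClientScenario(BaseModel):
    week: int = Field(description="Weeknummer")
    events_description: str = Field(description="Beschrijving van de gebeurtenissen en zorg")

class ClientScenarios(BaseModel):
    scenario: List[ClientScenario]

# Made-up example of a well-formed response
raw = (
    '{"scenario": ['
    '{"week": 1, "events_description": "Rustige week."}, '
    '{"week": 2, "events_description": "Verminderde eetlust."}]}'
)
parsed = ClientScenarios(**json.loads(raw))
print([s.week for s in parsed.scenario])

# A response with a missing field fails validation instead of passing silently
bad = '{"scenario": [{"week": 1}]}'
try:
    ClientScenarios(**json.loads(bad))
    validation_failed = False
except ValidationError:
    validation_failed = True
print(validation_failed)
```

Malformed responses of this kind are the likely source of the parsing errors that the generation loop retries on.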

In [6]:
# Initialize model and parser
model = ChatOpenAI(api_key=setup.get_openai_key(), temperature=temp, model=model_name)
pyd_parser = PydanticOutputParser(pydantic_object=ClientScenarios)
format_instructions = pyd_parser.get_format_instructions()

In [7]:
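# Function to determine the scenario duration (weeks) based on a normal distribution.
# The result is clamped to at least 1 week, because a normal draw can be zero or negative.
def determine_duration(mean=12, std_dev=4):
    return max(1, int(np.round(np.random.normal(mean, std_dev))))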

# Function to randomly select the complications to include, returned as a comma-separated string
def sample_complications(complications_library, min_n=1, max_n=3):
    num_complications = random.randint(min_n, max_n)
    chosen_complications = random.sample(complications_library, num_complications)
    complications = ", ".join(chosen_complications)
    return complications
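
To get a feel for the two helpers above, the sketch below draws durations and complications for a handful of hypothetical clients. It is an illustration under assumptions, not part of the pipeline: it uses the standard-library `random.gauss` as a stand-in for `np.random.normal`, clamps each draw to at least one week, seeds a dedicated generator so the output is repeatable, and works on a shortened complications list. The names `sample_duration` and `preview_draws` are hypothetical.

```python
import random

# Shortened, illustrative copy of the complications library defined earlier
complications_demo = ["gewichtsverlies", "decubitus", "delier", "valpartij"]

def sample_duration(rng, mean=20, std_dev=6):
    # Stand-in for np.random.normal: draw, round, and clamp to at least 1 week
    return max(1, round(rng.gauss(mean, std_dev)))

def preview_draws(n_clients, seed):
    # A dedicated, seeded generator makes the preview repeatable
    rng = random.Random(seed)
    draws = []
    for _ in range(n_clients):
        num_weeks = sample_duration(rng)
        n = rng.randint(1, 3)
        chosen = rng.sample(complications_demo, n)
        draws.append((num_weeks, ", ".join(chosen)))
    return draws

for num_weeks, complications in preview_draws(n_clients=5, seed=42):
    print(f"{num_weeks:>2} weeks | {complications}")
```

Seeding is optional here; the notebook itself leaves the generators unseeded, so each run produces a different set of scenarios.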


In [None]:
# Define the prompt template
template="""
Dit is het profiel van een fictieve client in het verpleeghuis:
---
{client_profile}
---

Maak een week-tot-week tijdlijn voor een periode van {num_weeks} weken, en beschrijf hierin beloop van zijn/haar verblijf in het verpleeghuis.
Verwerk de volgende complicatie(s) hierin: {complications}.

Instructies:
- Formuleer elke scenarioregel zodanig dat deze duidelijk en begrijpelijk is voor een taalmodel. Dit scenario zal als basis dienen voor het genereren van fictieve zorgrapportages.
- Beperk dramatische veranderingen en focus op subtiele ontwikkelingen in de toestand van de cliënt. Zorg voor realistische en geleidelijke progressie die typisch is voor verpleeghuisclienten.
- Vermijd het noemen van de naam van de client.

{format_instructions}
"""

prompt_template = PromptTemplate(
    template=template,
    input_variables=["client_profile", "num_weeks", "complications"],
    partial_variables={"format_instructions": format_instructions},
)

if verbose:
    # Preview the prompt using the configured duration settings
    print(prompt_template.format(
        client_profile=formatted_profile,
        num_weeks=determine_duration(mean=duration, std_dev=duration_sd),
        complications=sample_complications(complications_library),
    ))


In [9]:
# Create a chain of operations: prompt template -> model -> output parser
chain_scenario = prompt_template | model | pyd_parser

In [None]:
# Generate and save scenarios
if not os.path.exists(fn_scenarios):
    print("Data file not found. Generating new data...")

    scenario_list = []
    with get_openai_callback() as cb:
        for _, row in tqdm(df.iterrows(), total=df.shape[0], desc="Generating scenarios"):
            # Format the client profile
            client_profile = cpf.format_client_profile(
                profile_row=row,
            )

            # Determine the number of weeks and complications for the scenario
            num_weeks = determine_duration(mean=duration, std_dev=duration_sd)
            complications = sample_complications(complications_library, min_n=num_complications_min, max_n=num_complications_max)

            # Invoke the model.
            # Errors are usually caused by incorrectly formatted responses that fail to parse. A simple retry often does the trick.
            inputs = {"client_profile": client_profile, "num_weeks": str(num_weeks), "complications": complications}
            try:
                result = chain_scenario.invoke(inputs)
            except Exception as e:
                print(f"Error encountered: {e}. Retrying...")
                result = chain_scenario.invoke(inputs)
                print("Retry successful")

            # Store the results in the scenario_list
            for scenario in result.scenario:
                scenario_list.append((row['client_id'], scenario.week, scenario.events_description, complications, num_weeks))
        print(f"Total cost: {cb.total_cost}")

        df_scenarios = pd.DataFrame(scenario_list, columns=['client_id', 'week', 'events_description', 'complications', 'num_weeks'])
        df_scenarios.to_csv(fn_scenarios, index=False)
        print(f"Data saved successfully to {fn_scenarios}.")
else:
    print("Data file found. Loading data...")
    df_scenarios = pd.read_csv(fn_scenarios)
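
The single retry in the loop above could be generalized into a small helper with a bounded number of attempts. The following is a minimal sketch, not the notebook's method: `invoke_with_retry` and `flaky_call` are hypothetical names, and `flaky_call` merely simulates a chain that fails to parse on its first two calls.

```python
def invoke_with_retry(fn, max_attempts=3):
    """Call fn() until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as e:  # broad on purpose: parsing and API errors alike
            last_error = e
            print(f"Attempt {attempt} failed: {e}")
    raise RuntimeError(f"All {max_attempts} attempts failed") from last_error

# Simulated chain call: fails twice, then succeeds
calls = {"count": 0}

def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("could not parse model output")
    return {"scenario": []}

result = invoke_with_retry(flaky_call, max_attempts=3)
print(result, calls["count"])
```

In the generation loop, `fn` would be a lambda wrapping the `chain_scenario.invoke(...)` call. Note that every extra attempt is a billed model call, so a small `max_attempts` seems prudent.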


In [None]:
if verbose:
    print("Client Profile:")
    print(formatted_profile)
    print(100 * '-')  # Divider for better readability

    # Filter and display the scenarios for the selected client
    client_scenarios = df_scenarios[df_scenarios['client_id'] == sample_client_id][['week', 'events_description']]
    print("Client Scenarios:")

    # Loop through the scenarios and print them as a numbered list
    for index, scenario in enumerate(client_scenarios.itertuples(), 1):
        print(f"{index}. Week {scenario.week}: {scenario.events_description}")
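
The prompt asks for `num_weeks` weeks, but nothing guarantees that the model returns exactly that many entries. A quick consistency check along the following lines could flag mismatches. This is a sketch on made-up rows shaped like the tuples in `scenario_list`; the function name `find_week_mismatches` is hypothetical.

```python
from collections import Counter

def find_week_mismatches(rows):
    """Return {client_id: (requested, returned)} where the counts differ.

    Each row is (client_id, week, events_description, complications, num_weeks).
    """
    returned = Counter(row[0] for row in rows)
    requested = {row[0]: row[4] for row in rows}
    return {
        cid: (requested[cid], returned[cid])
        for cid in returned
        if returned[cid] != requested[cid]
    }

# Made-up rows: client 1 got both requested weeks, client 2 got 1 of 3
rows = [
    (1, 1, "Rustige week.", "delier", 2),
    (1, 2, "Onrust in de avond.", "delier", 2),
    (2, 1, "Verminderde eetlust.", "gewichtsverlies", 3),
]
print(find_week_mismatches(rows))  # {2: (3, 1)}
```

On the real data, the same rows can be obtained from `df_scenarios` with `itertuples(index=False)`, since its columns are in the same order.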