<a href="https://colab.research.google.com/github/ekrombouts/GenCareAI/blob/main/notebooks/100_note_generation/120_GenerateClientScenarios.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GenCare AI: Generating client scenarios

**Author:** Eva Rombouts  
**Date:** 2024-06-13  
**Updated:** 2024-09-01  
**Version:** 1.2

### Description
This script generates client scenarios based on the profiles generated [here](https://colab.research.google.com/github/ekrombouts/GenCareAI/blob/main/notebooks/100_note_generation/110_GenerateClientProfiles.ipynb).

Generating scenarios for 24 client profiles, with a mean duration of 8 months each, costs approximately $0.03 per run.

In [1]:
!pip install GenCareAI
from GenCareAI.GenCareAIUtils import GenCareAISetup

setup = GenCareAISetup()

if setup.environment == 'Colab':
    !pip install -q langchain langchain_core langchain_openai langchain_community



In [10]:
from typing import List

from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_community.callbacks import get_openai_callback
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

import os
import pandas as pd
import random
import numpy as np

In [12]:
# Constants and Configurations
WARD_NAME = 'Athena'
FN_PROFILES = setup.get_file_path(f'data/gcai_client_profiles_{WARD_NAME}.csv')
FN_SCENARIOS = setup.get_file_path(f'data/gcai_client_scenarios_{WARD_NAME}.csv')

MODEL_SCENARIOS = 'gpt-3.5-turbo-0125'
TEMP = 1.1

DURATION = 8 # Mean number of 'months' (periods) per scenario
DURATION_SD = 3 # Standard deviation of the number of months
NUM_COMPLICATIONS_MIN = 1
NUM_COMPLICATIONS_MAX = 3

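# Dutch complication labels passed into the prompt; English glosses in the comments
complications_library = [
    "gewichtsverlies",  # weight loss
    "algehele achteruitgang",  # general decline
    "decubitus",  # pressure ulcer
    "urineweginfectie",  # urinary tract infection
    "pneumonie",  # pneumonia
    "delier",  # delirium
    "verergering van onderliggende lichamelijke klachten",  # worsening of underlying physical complaints
    "verbetering van de klachten",  # improvement of the complaints
    "overlijden",  # death
    "valpartij",  # fall
]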
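
To make the role of these constants concrete, here is a minimal standard-library sketch of the two random draws made per client further down in the notebook: a normally distributed stay length and a random subset of complications. The helper name `sample_scenario_inputs`, the fixed seed, and the lower bound of one month are illustrative assumptions, not part of the notebook, which uses NumPy for the normal draw.

```python
import random

DURATION = 8
DURATION_SD = 3
NUM_COMPLICATIONS_MIN = 1
NUM_COMPLICATIONS_MAX = 3
complications_library = [
    "gewichtsverlies", "algehele achteruitgang", "decubitus",
    "urineweginfectie", "pneumonie", "delier",
    "verergering van onderliggende lichamelijke klachten",
    "verbetering van de klachten", "overlijden", "valpartij",
]


def sample_scenario_inputs(rng):
    """Draw a stay length in months and a list of complications for one client."""
    # Normal draw rounded to whole months; clamping to >= 1 is an assumption
    # added here so a scenario never has zero or negative length.
    num_months = max(1, round(rng.gauss(DURATION, DURATION_SD)))
    num_complications = rng.randint(NUM_COMPLICATIONS_MIN, NUM_COMPLICATIONS_MAX)
    # sample() draws without replacement, so no complication appears twice.
    chosen = rng.sample(complications_library, num_complications)
    return num_months, chosen


rng = random.Random(42)  # fixed seed (assumption) so repeated runs match
for _ in range(3):
    months, chosen = sample_scenario_inputs(rng)
    print(months, ", ".join(chosen))
```

Passing a dedicated `random.Random` instance keeps the draws reproducible without touching the global random state.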

In [13]:
# Load the client profiles
df = pd.read_csv(FN_PROFILES)

In [14]:
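# Pydantic models describing the structure the model must return.
# The Dutch field descriptions end up in the prompt's format instructions.
class ClientScenario(BaseModel):
    # "Sequence number of the month"
    month: str = Field(description="Volgnummer van de maand")
    # "Description of the events and care"
    journey: str = Field(description="Beschrijving van de gebeurtenissen en zorg")

class ClientScenarios(BaseModel):
    # One entry per month of the client's stay
    scenario: List[ClientScenario]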
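
The two models describe the JSON shape the language model is asked to return: an object with a `scenario` list whose items each carry a `month` and a `journey`. The sketch below mirrors that shape with standard-library dataclasses so it can be run without LangChain. The sample JSON text and the `parse_scenarios` helper are made up for illustration; they are not the notebook's parser or real model output.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ClientScenario:
    month: str
    journey: str


@dataclass
class ClientScenarios:
    scenario: List[ClientScenario]


def parse_scenarios(raw: str) -> ClientScenarios:
    """Parse a JSON reply into the nested structure (hypothetical helper)."""
    data = json.loads(raw)
    # A missing "scenario", "month" or "journey" key raises KeyError here.
    items = [
        ClientScenario(month=str(item["month"]), journey=str(item["journey"]))
        for item in data["scenario"]
    ]
    return ClientScenarios(scenario=items)


# Invented example reply, only to show the expected shape.
raw_reply = (
    '{"scenario": ['
    '{"month": "1", "journey": "Settles in on the ward."}, '
    '{"month": "2", "journey": "Mild weight loss is noted."}]}'
)
parsed = parse_scenarios(raw_reply)
for step in parsed.scenario:
    print(step.month, step.journey)
```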

In [15]:
# Initialize model and parser
model = ChatOpenAI(api_key=setup.get_openai_key(), temperature=TEMP, model=MODEL_SCENARIOS)
pyd_parser = PydanticOutputParser(pydantic_object=ClientScenarios)

In [16]:
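# Define the prompt template (in Dutch). English gist: given the profile of a
# fictional nursing home client, write a timeline of the stay over {num_months}
# months, work in the listed complication(s), keep changes subtle, avoid
# excessive drama, and do not mention the client's name.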
PT_scenario = PromptTemplate(
    template="""
Dit is het profiel van een fictieve client in het verpleeghuis:
---
{client_profile}
---

Schrijf in een tijdlijn het beloop van zijn/haar verblijf in het verpleeghuis gedurende {num_months} maanden.
Verwerk de volgende complicatie(s) hierin: {complications}.
Hou wijzigingen subtiel. Vermijd al te grote dramatiek.
Vermijd het noemen van de naam.

{format_instructions}
""",
    input_variables=["client_profile", "num_months", "complications"],
    partial_variables={"format_instructions": pyd_parser.get_format_instructions()},
)

# Render the template with placeholder values to inspect the full prompt
P_scenario = PT_scenario.format(
    client_profile="client profiel",
    num_months=6,
    complications="complicatie(s)",
)
print(P_scenario)


Dit is het profiel van een fictieve client in het verpleeghuis:
---
client profiel
---

Schrijf in een tijdlijn het beloop van zijn/haar verblijf in het verpleeghuis gedurende 6 maanden.
Verwerk de volgende complicatie(s) hierin: complicatie(s).
Hou wijzigingen subtiel. Vermijd al te grote dramatiek.
Vermijd het noemen van de naam.

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"scenario": {"title": "Scenario", "type": "array", "items": {"$ref": "#/definitions/ClientScenario"}}}, "required": ["scenario"], "definitions": {"ClientScenario": {"title": "ClientScenario", "type
```

In [17]:
# Create a chain of operations: prompt template -> model -> output parser
chain_scenario = PT_scenario | model | pyd_parser
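
The `|` operator chains the three steps so that each step's output becomes the next step's input. The following is a rough plain-Python analogy of that data flow, not LangChain's implementation: `pipe`, `format_prompt`, `fake_model`, and `parse_output` are hypothetical stand-ins, and `fake_model` returns a fixed JSON string instead of calling an API.

```python
import json
from functools import reduce


def pipe(*steps):
    """Compose steps left to right: each output feeds the next step."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def format_prompt(inputs):
    # Stand-in for the prompt template: fill the placeholders from a dict.
    template = (
        "Profile:\n{client_profile}\n"
        "Timeline for {num_months} months, including: {complications}."
    )
    return template.format(**inputs)


def fake_model(prompt):
    # Stand-in for the chat model: returns a fixed JSON reply.
    reply = {"scenario": [{"month": "1",
                           "journey": f"Prompt had {len(prompt)} characters."}]}
    return json.dumps(reply)


def parse_output(text):
    # Stand-in for the output parser: JSON text to Python objects.
    return json.loads(text)["scenario"]


chain = pipe(format_prompt, fake_model, parse_output)
result = chain({
    "client_profile": "example profile",
    "num_months": "6",
    "complications": "valpartij",
})
print(result[0]["month"], result[0]["journey"])
```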

In [18]:
# Generate and save scenarios
if not os.path.exists(FN_SCENARIOS):
    print("Data file not found. Generating new data...")

    def generate_scenarios(df, chain):
        def display_profile(row):
            return (
                f"Naam: {row['naam']}\n"
                f"Type Dementie: {row['type_dementie']}\n"
                f"Lichamelijke klachten: {row['somatiek']}\n"
                f"ADL: {row['adl']}\n"
                f"Mobiliteit: {row['mobiliteit']}\n"
                f"Cognitie / gedrag: {row['gedrag']}"
            )

        def determine_duration(mean=6, std_dev=2):
            # Clamp to at least 1 month: a normal draw can round to zero or below
            return max(1, int(np.round(np.random.normal(mean, std_dev))))

        def determine_num_complications(min=1, max=3):
            return random.randint(min, max)

        scenario_list = []
        for _, row in df.iterrows():
            client_profile = display_profile(row)
            print(f"Generating scenario for client: {row['naam']}")
            num_months = determine_duration(mean=DURATION, std_dev=DURATION_SD)
            num_complications = determine_num_complications(min=NUM_COMPLICATIONS_MIN, max=NUM_COMPLICATIONS_MAX)
            chosen_complications = random.sample(complications_library, num_complications)
            complications = ", ".join(chosen_complications)
            result = chain.invoke({"client_profile": client_profile, "num_months": str(num_months), "complications": complications})
            for scenario in result.scenario:
                scenario_list.append((row['client_id'], scenario.month, scenario.journey, complications, num_months))
        return scenario_list

    with get_openai_callback() as cb:
        scenario_data = generate_scenarios(df, chain_scenario)
        print(cb)

    df_scenarios = pd.DataFrame(scenario_data, columns=['client_id', 'month', 'journey', 'complications', 'num_months'])
    df_scenarios.to_csv(FN_SCENARIOS, index=False)
    print(f"Data saved successfully to {FN_SCENARIOS}.")
else:
    print("Data file found. Loading data...")
    df_scenarios = pd.read_csv(FN_SCENARIOS)

Data file not found. Generating new data...
Generating scenario for client: Meneer Johan van der Vliet
Generating scenario for client: Mevrouw Erica Groenhof
Generating scenario for client: Meneer Arnold Bergkamp
Generating scenario for client: Mevrouw Vera Verhagen
Generating scenario for client: Meneer Kees de Punder
Generating scenario for client: Mevrouw Hannelore Klaassen
Generating scenario for client: Meneer Roel Hulshof
Generating scenario for client: Mevrouw Bianca te Boekhorst
Tokens Used: 7258
	Prompt Tokens: 3892
	Completion Tokens: 3366
Successful Requests: 8
Total Cost (USD): $0.006995000000000001
Data saved successfully to /Users/eva/Library/CloudStorage/GoogleDrive-e.k.rombouts@gmail.com/My Drive/Colab Notebooks/GenCareAI/data/gcai_client_scenarios_Athena.csv.
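
The callback totals above can be reproduced by hand. The sketch assumes prices of $0.50 per million prompt tokens and $1.50 per million completion tokens for `gpt-3.5-turbo-0125`; these rates are an assumption (they reproduce the total reported for this run, but pricing changes over time). The linear extrapolation to 24 clients is likewise only a rough estimate, since scenario length varies per client.

```python
# Assumed USD prices per 1M tokens for gpt-3.5-turbo-0125 at the time of the run
PRICE_PROMPT_PER_M = 0.50
PRICE_COMPLETION_PER_M = 1.50

# Figures taken from the callback output of the run above
prompt_tokens = 3892
completion_tokens = 3366
requests = 8


def run_cost(n_prompt, n_completion):
    """Cost in USD for the given prompt and completion token counts."""
    return (n_prompt * PRICE_PROMPT_PER_M
            + n_completion * PRICE_COMPLETION_PER_M) / 1_000_000


total = run_cost(prompt_tokens, completion_tokens)
per_client = total / requests
estimate_24 = per_client * 24  # naive linear scaling to 24 profiles

print(f"Total for this run:      ${total:.6f}")
print(f"Average per client:      ${per_client:.6f}")
print(f"Estimate for 24 clients: ${estimate_24:.4f}")
```

Under these assumptions the estimate for 24 clients comes to roughly $0.02, the same order of magnitude as the approximate figure quoted in the description.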
