<a href="https://colab.research.google.com/github/ekrombouts/GenCareAI/blob/main/notebooks/100_note_generation/110_GenerateClientProfiles.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GenCare AI: Generating client profiles

**Author:** Eva Rombouts  
**Date:** 2024-06-01  
**Updated:** 2024-10-10  
**Version:** 2.0

### Description
In [a previous notebook](https://colab.research.google.com/github/ekrombouts/GenCareAI/blob/main/notebooks/100_note_generation/100_GenerateAnonymousCareNotes.ipynb) we created a dataset of synthetic progress notes for nursing home clients. In this and the following notebooks, we populate a fictional nursing home with clients by generating client profiles and scenarios. From these we ultimately build client records: series of progress notes that document the day-to-day care of each client.

Our goal is to mimic real-world client records, where profiles and scenarios aren’t always clearly stated but must be read between the lines, inferred from the progress notes, which are often vague yet detailed.

This notebook automates the generation of fictional client profiles for a psychogeriatric ward in a nursing home using GPT-4o. The profiles describe aspects like dementia type, physical complaints, ADL (activities of daily living) support, mobility, and behavior.

- We use GPT-4o, as it gives better and more diverse results than GPT-3.5.
- The temperature is set to 1.1 to encourage variation in the output.
- The ward name and the number of wings are configurable to make it easy to run multiple experiments. For instance, each experiment can be given its own ward name to tell the results apart. Each prompt generates eight profiles, which we take to be about the most a single prompt can handle well, so the number of wings sets how many times the prompt is run.
- Two Pydantic models are defined to enforce the structure of the generated client profiles:
  - The ClientProfile model includes fields like name, dementia type, physical complaints, ADL support, mobility, and behavior.
  - The ClientProfiles model holds multiple client profiles.
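
As a minimal sketch of what these two models do, the example below validates a hand-written reply instead of a real model response. The field names match the notebook, but the field descriptions are left out for brevity, and the sample profile and the `raw_reply` variable are invented for illustration only.

```python
import json
from typing import List

from pydantic import BaseModel, ValidationError


class ClientProfile(BaseModel):
    # Same field names as in the notebook; descriptions omitted for brevity
    naam: str
    type_dementie: str
    somatiek: str
    adl: str
    mobiliteit: str
    gedrag: str


class ClientProfiles(BaseModel):
    clients: List[ClientProfile]


# Hypothetical reply, written by hand to stand in for the model output
raw_reply = json.dumps({
    "clients": [{
        "naam": "Mevrouw Voorbeeld",
        "type_dementie": "Alzheimer",
        "somatiek": "artrose",
        "adl": "volledige hulp bij wassen en kleden",
        "mobiliteit": "loopt met rollator",
        "gedrag": "rustig",
    }]
})

# A complete reply is parsed into typed objects
profiles = ClientProfiles(**json.loads(raw_reply))
print(len(profiles.clients), profiles.clients[0].naam)

# A reply with missing fields is rejected instead of silently accepted
incomplete = {"clients": [{"naam": "Meneer Onvolledig"}]}
try:
    ClientProfiles(**incomplete)
    validation_failed = False
except ValidationError:
    validation_failed = True
print("Incomplete reply rejected:", validation_failed)
```

Wrapping the list in `ClientProfiles` means a single parse step checks every profile in a reply, so one malformed profile makes the whole reply fail validation.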

With the current settings of generating eight profiles per query and running the query three times, the cost is approximately $0.05 per run.

In [None]:
!pip install GenCareAI
from GenCareAI.GenCareAIUtils import GenCareAISetup

setup = GenCareAISetup()

if setup.environment == 'Colab':
  !pip install -q -U langchain langchain_core langchain_openai langchain_community

In [None]:
import os
import pandas as pd
from typing import List
from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
from langchain_community.callbacks import get_openai_callback

In [None]:
# Constants and configuration
# The ward name is used in the output filename, which is practical when running
# multiple experiments
ward_name = 'Dahlia'
fn_profiles = setup.get_file_path(f'data/gcai_client_profiles_{ward_name}.csv')
# Each query generates eight profiles. The query is run num_wings times, so
# with num_wings = 3 a total of 24 client profiles is generated.
num_wings = 3
# GPT-4o yields better, more diverse results than GPT-3.5
model_name = 'gpt-4o-2024-05-13'
temp = 1.1
verbose = True

In [None]:
# Pydantic model that structures the data of a single client profile.
# The field descriptions are in Dutch because they are passed to the model as
# instructions; the comments give an English gloss.
class ClientProfile(BaseModel):
    # Name of the client (Mr/Mrs first name surname; an uncommon name)
    naam: str = Field(description="naam van de client (Meneer/Mevrouw Voornaam Achternaam, gebruik een naam die je normaal niet zou kiezen)")
    # Type of dementia; vary, with Alzheimer's, mixed and vascular dementia most likely
    type_dementie: str = Field(description="type dementie (Alzheimer, gemengde dementie, vasculaire dementie, lewy body dementie, parkinsondementie, FTD: varieer, de kans op Alzheimer, gemengde en vasculaire dementie is het grootst)")
    # Physical complaints
    somatiek: str = Field(description="lichamelijke klachten")
    # biografie: str = Field(description="een korte beschrijving van karakter en relevante biografische gegevens (vermijd stereotypen in beroep en achtergrond)")
    # ADL support the client needs
    adl: str = Field(description="beschrijf welke ADL hulp de cliënt nodig heeft")
    # Mobility (e.g. wheelchair-dependent, uses a rollator, fall risk)
    mobiliteit: str = Field(description="beschrijf de mobiliteit (bv rolstoelafhankelijk, gebruik rollator, valgevaar)")
    # Care-relevant cognition and problem behavior, varying from calm to severely agitated or aggressive
    gedrag: str = Field(description="beschrijf voor de zorg relevante aspecten van cognitie en probleemgedrag. Varieer met de ernst van het probleemgedrag van rustige cliënten, gemiddeld onrustige cliënten tot cliënten die fors apathisch, onrustig, angstig, geagiteerd of zelfs agressief kunnen zijn")

# Pydantic model to hold multiple client profiles
class ClientProfiles(BaseModel):
    clients: List[ClientProfile]

In [None]:
# Initialize OpenAI model and parser
model = ChatOpenAI(api_key=setup.get_openai_key(), temperature=temp, model=model_name)
pyd_parser = PydanticOutputParser(pydantic_object=ClientProfiles)
format_instructions = pyd_parser.get_format_instructions()

In [None]:
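# Dutch prompt. In English: "Write eight profiles of clients admitted to a
# psychogeriatric ward of the nursing home. People with advanced dementia and
# high care needs live here. Make sure the profiles differ strongly from each other."
template = """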
Schrijf acht profielen van cliënten die zijn opgenomen op een psychogeriatrische afdeling van het verpleeghuis. Hier wonen mensen met een gevorderde dementie met een hoge zorgzwaarte.
Zorg dat de profielen erg van elkaar verschillen.

{format_instructions}
"""

prompt_template = PromptTemplate(
    template=template,
    input_variables=[],
    partial_variables={"format_instructions": format_instructions},
)

if verbose:
    print(prompt_template.format())

In [None]:
# Combine the prompt, model, and parser into a single chain
chain = prompt_template | model | pyd_parser

In [None]:
# Check if the file with client profiles exists
if not os.path.exists(fn_profiles):
    print("Data file not found. Generating new data...")

    # Create directories if they do not exist
    os.makedirs(os.path.dirname(fn_profiles), exist_ok=True)

    # Function to generate client profile data by querying the model
    def generate_data():
        all_data = []
        for i in range(num_wings):
            print(f'Generating data for wing {i+1}')
            # Generate client profiles for each wing
            result = chain.invoke({})
            # Check if the result contains valid data
            if result is None or not hasattr(result, 'clients'):
                raise ValueError("No valid response received from the model.")
            # Convert profiles to a list of dictionaries
            data = [client.dict() for client in result.clients]
            # Append data for all wings
            all_data.extend(data)
        # Convert to pandas df
        return pd.DataFrame(all_data)

    # Function to add a unique client ID to each profile and reorder columns
    def add_client_id(df):
        df['client_id'] = range(1, len(df) + 1)
        return df[['client_id', 'naam', 'type_dementie', 'somatiek', 'adl', 'mobiliteit', 'gedrag']]

    # Use OpenAI callback to monitor API usage
    with get_openai_callback() as cb:
        df = generate_data()
        print("Data generated successfully.\n")
        print(cb)

    # Add client ID and save the data to a CSV file
    df_with_id = add_client_id(df)
    df_with_id.to_csv(fn_profiles, index=False)
    print(f"Data saved successfully to {fn_profiles}.")
else:
    # If the file exists, load the data from the CSV file
    print("Data file found. Loading data...")
    df_with_id = pd.read_csv(fn_profiles)
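
The generate-once, load-afterwards logic of the cell above can be tried without any API calls. The sketch below is an illustration under stated assumptions, not the notebook's own code: `load_or_generate` and `fake_generate` are hypothetical helpers, the two sample rows are invented, and the CSV is written to a temporary directory rather than to the GenCareAI data folder.

```python
import os
import tempfile

import pandas as pd


def load_or_generate(path, generate):
    """Load the CSV at `path` if it exists; otherwise generate, number and save it."""
    if os.path.exists(path):
        return pd.read_csv(path), "loaded"
    os.makedirs(os.path.dirname(path), exist_ok=True)
    df = generate()
    # Add a sequential client ID as the first column
    df.insert(0, "client_id", range(1, len(df) + 1))
    df.to_csv(path, index=False)
    return df, "generated"


def fake_generate():
    # Hypothetical stand-in for the model call, so no API key is needed
    return pd.DataFrame([
        {"naam": "Mevrouw Voorbeeld", "gedrag": "rustig"},
        {"naam": "Meneer Test", "gedrag": "onrustig"},
    ])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data", "profiles_demo.csv")
    # First call: the file is missing, so the data is generated and saved
    first, first_source = load_or_generate(path, fake_generate)
    # Second call: the file now exists, so it is read back from disk
    second, second_source = load_or_generate(path, fake_generate)

print(first_source, second_source)
print(second)
```

Because an existing file always takes precedence, changing the prompt or the settings has no effect until the CSV is deleted or a different ward name is chosen.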