# Part 1 - Data Simulation & Feature Engineering

## Assignment Details
- Author: Khor Kean Teng
- Date: May 20, 2025
- Model: Gemini 2.5 Flash Preview 0417

## Deliverables
- Utilize GenAI to generate realistic patient data, including vital signs and detailed textual questionnaire responses. 
- Apply LLMs to extract meaningful features from simulated textual data (clinical notes, health records). 

## API & Model Configuration

In [6]:
# configure api
from dotenv import load_dotenv
import os

load_dotenv()
gemini_api_key = os.getenv("GEMINI_API_KEY")
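
If `GEMINI_API_KEY` is missing from the environment and the `.env` file, `os.getenv` silently returns `None`, so the problem may only surface later. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

It could be called right after `load_dotenv()`, for example `gemini_api_key = require_env("GEMINI_API_KEY")`.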

## Prompt Engineering

In [7]:
from google import genai
from google.genai import types

client = genai.Client(api_key=gemini_api_key)

model = [
    "gemini-2.5-flash-preview-04-17"
]

generate_content_config = types.GenerateContentConfig(
    response_mime_type="application/json",
)
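
Because the config requests `application/json`, the text of each response can be parsed with `json.loads` before it is used. The sketch below assumes the model returns a JSON array of patient objects (the notebook does not confirm the top-level shape) and checks only some of the field names from the data schema in the prompt. `parse_patients` and `REQUIRED_FIELDS` are hypothetical names.

```python
import json

# Field names taken from the data schema in the prompt. The schema has more
# fields than these, so this set is deliberately partial.
REQUIRED_FIELDS = {
    "patient_id",
    "age",
    "gender",
    "medical_history",
    "deterioration_label",
    "timestamp",
}


def parse_patients(text: str) -> list[dict]:
    """Parse a JSON response and keep the records that have all required fields.

    Assumes the top-level value is a JSON array of objects (an assumption,
    not confirmed by the notebook). Raises ValueError otherwise.
    """
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of patient records")
    return [
        rec for rec in data
        if isinstance(rec, dict) and REQUIRED_FIELDS.issubset(rec)
    ]
```

With a response in hand, a call such as `patients = parse_patients(response.text)` would drop incomplete records before they reach later steps.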

In [8]:
# read the prompt from prompt/data.txt
with open("prompt/data.txt", "r") as file:
    original_prompt = file.read()

print(original_prompt)

You should generate dataset of 30 random patients based on the provided instruction to analysis their health deterioration.

Below shows the data schema description and you should try to have a balance of deteriorated and non-deteriorated patients.

Data schema description:
{
    "patient_id": {
        "type": "Integer",
        "description": "Unique identifier for each patient"
    },
    "age": {
        "type": "Integer",
        "description": "Age of the patient"
    },
    "gender": {
        "type": "String",
        "description": "Gender of the patient"
    },
    "medical_history": {
        "type": "Text",
        "description": "Short medical history of the patient"
    },
    "deterioration_label": {
        "type": "Boolean",
        "description": "Indicates if the patient has deteriorated (true) or not (false)"
    },
    "timestamp": {
        "type": "DateTime",
        "description": "Timestamp of the data entry"
    },
    "hear_rate": {
        "type": "Float",
 

In [9]:
# call the LLM repeatedly with the same prompt and save each raw
# response as data/raw/run_<iteration>.json
import time  # for the delay between requests
import uuid  # for a unique session ID per request

history = []

# Repeat the process 40 times
for iteration in range(40):
    print(f"Iteration {iteration + 1} of 40")

    # Always start from the original prompt, prefixed with a fresh session ID,
    # so the prompt does not grow between iterations
    prompt = f"Unique session ID: {uuid.uuid4()} \n\n{original_prompt}"

    # Get the response from the LLM
    response = client.models.generate_content(model=model[0], contents=prompt, config=generate_content_config)

    # Update the history with the current response
    history.append(response)

    # Save the response as a JSON file
    with open(f"data/raw/run_{iteration}.json", "w") as file:
        file.write(response.text)

    # Pause before the next iteration, except after the last one
    if iteration < 39:
        print("Waiting for 1 minute before the next iteration...")
        # The sleep is skipped on every fifth iteration (0, 5, 10, ...)
        if iteration % 5 != 0:
            time.sleep(60)

print("All iterations completed. Check the data/raw directory for the output files.")

Iteration 1 of 40
Waiting for 1 minute before the next iteration...
Iteration 2 of 40
Waiting for 1 minute before the next iteration...
Iteration 3 of 40
Waiting for 1 minute before the next iteration...
Iteration 4 of 40
Waiting for 1 minute before the next iteration...
Iteration 5 of 40
Waiting for 1 minute before the next iteration...
Iteration 6 of 40
Waiting for 1 minute before the next iteration...
Iteration 7 of 40
Waiting for 1 minute before the next iteration...
Iteration 8 of 40
Waiting for 1 minute before the next iteration...
Iteration 9 of 40
Waiting for 1 minute before the next iteration...
Iteration 10 of 40
Waiting for 1 minute before the next iteration...
Iteration 11 of 40
Waiting for 1 minute before the next iteration...
Iteration 12 of 40
Waiting for 1 minute before the next iteration...
Iteration 13 of 40
Waiting for 1 minute before the next iteration...
Iteration 14 of 40
Waiting for 1 minute before the next iteration...
Iteration 15 of 40
Waiting for 1 minute bef