# Simulate Patient Data Columns
This notebook demonstrates how to generate the `PatientNotes`, `PatientSentiment`, and `NoShowReason` columns using the `DataSimulator` class. You can specify custom column names for each output.
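
The internals of `DataSimulator` are not shown in this notebook, so the snippet below is only a hypothetical stand-in that illustrates the idea of caller-chosen output column names. The function `simulate_columns`, the placeholder text lists, and the toy `patients` frame are all assumptions made for this sketch and are not part of `src.data_simulator`.

```python
import random
import pandas as pd

# Placeholder texts, invented for this sketch only.
NOTES = ["placeholder note A", "placeholder note B"]
SENTIMENTS = ["placeholder sentiment A", "placeholder sentiment B"]
REASONS = ["placeholder reason A", "placeholder reason B"]


def simulate_columns(df, notes_col="PatientNotes",
                     sentiment_col="PatientSentiment",
                     reason_col="NoShowReason", seed=0):
    """Hypothetical stand-in: append three text columns under caller-chosen names."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    out = df.copy()            # leave the caller's frame untouched
    n = len(out)
    out[notes_col] = [rng.choice(NOTES) for _ in range(n)]
    out[sentiment_col] = [rng.choice(SENTIMENTS) for _ in range(n)]
    out[reason_col] = [rng.choice(REASONS) for _ in range(n)]
    return out


# Toy input; the real notebook reads its data from INPUT_PATH instead.
patients = pd.DataFrame({"PatientId": [1, 2, 3], "Age": [62, 56, 8]})
result = simulate_columns(patients, notes_col="Notes",
                          sentiment_col="Sentiment", reason_col="Reason")
print(result.columns.tolist())
```

Passing the names as keyword arguments keeps the generated columns from colliding with columns that already exist in the input data.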

In [1]:
%pip install -r ../requirements.txt
%load_ext autoreload
%autoreload 2

Note: you may need to restart the kernel to use updated packages.





In [2]:
import sys
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Add the parent directory to sys.path so the 'src' package can be imported
sys.path.append(os.path.abspath('..'))

from src.data_simulator import DataSimulator
from src.preprocessor import DataPreprocessor
from src.plots import PlotGenerator
from src import config
from src.config import INPUT_PATH, OUTPUT_PATH

# Initialize the plotting system
plotter = PlotGenerator(style='whitegrid', palette='viridis')

# Load the data using the preprocessing module
preprocessor = DataPreprocessor(config)
df = preprocessor.load_data(INPUT_PATH)

# Initialize the simulator
simulator = DataSimulator()

print("Starting data simulation...")

# Simulate the notes, sentiment, and reason columns and save the result to OUTPUT_PATH.
# The column names below are customizable; change them as needed.
simulated_df = simulator.simulate(
    input_csv=INPUT_PATH,
    output_csv=OUTPUT_PATH,
    notes_col='PatientNotes',
    sentiment_col='PatientSentiment',
    reason_col='NoShowReason'
)

print("Data simulation completed!")

# Display a sample of the generated columns
simulated_df[[
    'PatientId', 'Age', 'Gender', 'PatientNotes', 'PatientSentiment', 'NoShowReason'
]].head(10)

Starting data simulation...
Data simulation completed!


,PatientId,Age,Gender,PatientNotes,PatientSentiment,NoShowReason
0,29872500000000.0,62,F,This patient has a longstanding history of ele...,Patient is generally positive and engaged in c...,"Satisfaction with previous care, including suc..."
1,558997800000000.0,56,M,No chronic conditions or significant health co...,Patient is generally positive and engaged in c...,"Encouragement from family members, especially ..."
2,4262962000000.0,62,F,No chronic conditions or significant health co...,Patient feels hopeful and confident about mana...,"A recent health scare, such as a hospitalizati..."
3,867951200000.0,8,F,Child accompanied by parent/guardian. Reviewed...,Patient feels hopeful and confident about mana...,The patient is focused on improving overall qu...
4,8841186000000.0,56,F,Patient with hypertension is tracking salt int...,Patient feels hopeful and confident about mana...,"Encouragement from family members, especially ..."
5,95985130000000.0,76,F,Patient with hypertension is receiving sleep h...,Patient is generally positive and engaged in c...,The patient is focused on improving overall qu...
6,733688200000000.0,23,F,No chronic conditions or significant health co...,Patient feels hopeful and confident about mana...,A power outage caused by severe weather made i...
7,3449833000000.0,39,F,No chronic conditions or significant health co...,Patient is optimistic and shows no significant...,Recent bereavement or loss in the family affec...
8,56394730000000.0,21,F,No chronic conditions or significant health co...,Patient is optimistic and shows no significant...,The patient prioritizes following medical advi...
9,78124560000000.0,19,F,No chronic conditions or significant health co...,Patient is optimistic and shows no significant...,Trust and positive relationships with the heal...
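
The `PatientId` values in the sample above are displayed as floats (for example `29872500000000.0`), which suggests the column is being handled as a floating-point type. If the identifiers need to stay exact, one option is to read that column as text. The sketch below assumes nothing about `DataPreprocessor.load_data`; it uses plain `pandas.read_csv` on an inline CSV whose IDs are made up for illustration.

```python
import io
import pandas as pd

# Inline CSV with made-up identifiers; not taken from the real input file.
csv_text = "PatientId,Age\n29872499824296,62\n558997776694438,56\n"

# Parsed as float: long identifiers become floating-point numbers.
as_float = pd.read_csv(io.StringIO(csv_text), dtype={"PatientId": "float64"})

# Parsed as text: every digit is kept exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"PatientId": str})

print(as_float["PatientId"].iloc[0])
print(as_text["PatientId"].iloc[0])
```

Treating identifiers as strings also avoids accidental arithmetic on a column that is only ever used as a key.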


In [3]:
print("No-show value counts:")
print(simulated_df['No-show'].value_counts())
print("\nUnique no-show reasons per no-show status:")
print(simulated_df.groupby('No-show')['NoShowReason'].nunique())

No-show value counts:
No-show
No     88208
Yes    22319
Name: count, dtype: int64

Unique no-show reasons per no-show status:
No-show
No      506
Yes    2550
Name: NoShowReason, dtype: int64
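
Note that `nunique()` reports how many distinct reason strings occur in each group, not how many rows carry a reason. The sketch below contrasts the two on a toy frame (the values are invented and merely stand in for `simulated_df`) and also shows one way to list the most frequent reasons within a group.

```python
import pandas as pd

# Toy data standing in for simulated_df; values are illustrative only.
toy = pd.DataFrame({
    "No-show": ["No", "No", "No", "Yes", "Yes"],
    "NoShowReason": ["A", "A", "B", "C", "D"],
})

# Number of distinct reason strings per group (what nunique() reports).
distinct_reasons = toy.groupby("No-show")["NoShowReason"].nunique()

# Number of rows per group that have a non-null reason.
rows_with_reason = toy.groupby("No-show")["NoShowReason"].count()

# Most frequent reasons among the rows where No-show is "No".
top_no = toy.loc[toy["No-show"] == "No", "NoShowReason"].value_counts().head(3)

print(distinct_reasons)
print(rows_with_reason)
print(top_no)
```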
