# Simulate Patient Data Columns
This notebook demonstrates how to generate the `PatientNotes`, `PatientSentiment`, and `NoShowReason` columns using the `DataSimulator` class. You can give each output column a custom name through the `notes_col`, `sentiment_col`, and `reason_col` arguments of `simulate`.
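To make the idea of configurable output columns concrete, here is a minimal, standard-library sketch of a rule-based row simulator. It is not the `DataSimulator` implementation, which this notebook does not show: the function `simulate_row`, the `NoShow` input field, and the note and sentiment texts are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical reason templates; the real DataSimulator's rules are not shown here.
REASONS = [
    "Patient was anxious about the visit.",
    "Couldn't leave home due to household duties.",
]


def simulate_row(row, notes_col="PatientNotes", sentiment_col="PatientSentiment",
                 reason_col="NoShowReason", rng=None):
    """Return a copy of `row` with three simulated text columns added.

    `row` is assumed to carry an integer "Age" and a boolean "NoShow" field
    (both names are assumptions made for this sketch).
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    out = dict(row)
    out[notes_col] = "Pediatric patient." if row["Age"] < 18 else "Adult patient."
    out[sentiment_col] = "anxious" if row["NoShow"] else "hopeful"
    # A reason is only produced for rows flagged as no-shows.
    out[reason_col] = rng.choice(REASONS) if row["NoShow"] else ""
    return out


rows = [
    {"Age": 8, "Gender": "F", "NoShow": False},
    {"Age": 23, "Gender": "F", "NoShow": True},
]

# Override two of the three column names; the third keeps its default.
simulated = [simulate_row(r, notes_col="Notes", reason_col="Reason") for r in rows]
print(simulated[1]["Reason"])
```

Passing the column names as keyword arguments with defaults means callers that are happy with the standard names do not have to mention them at all.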

In [1]:
%pip install -r ../requirements.txt

Collecting numpy>=1.20.0
  Using cached numpy-2.2.6-cp310-cp310-win_amd64.whl (12.9 MB)
Collecting pandas>=1.3.0
  Using cached pandas-2.3.0-cp310-cp310-win_amd64.whl (11.1 MB)
Collecting scikit-learn>=1.0.0
  Using cached scikit_learn-1.7.0-cp310-cp310-win_amd64.whl (10.7 MB)
Collecting xgboost>=1.5.0
  Using cached xgboost-3.0.2-py3-none-win_amd64.whl (150.0 MB)
Collecting matplotlib>=3.4.0
  Using cached matplotlib-3.10.3-cp310-cp310-win_amd64.whl (8.1 MB)
Collecting seaborn>=0.11.0
  Using cached seaborn-0.13.2-py3-none-any.whl (294 kB)
Collecting plotly>=5.0.0
  Using cached plotly-6.1.2-py3-none-any.whl (16.3 MB)
Collecting nltk>=3.6.0
  Using cached nltk-3.9.1-py3-none-any.whl (1.5 MB)
Collecting transformers>=4.20.0
  Using cached transformers-4.52.4-py3-none-any.whl (10.5 MB)
Collecting torch>=1.10.0
  Using cached torch-2.7.1-cp310-cp310-win_amd64.whl (216.1 MB)
Collecting streamlit>=1.10.0
  Using cached streamlit-1.45.1-py3-none-any.whl (9.9 MB)
Collecting pytest>=6.0.0
  U




In [3]:
import sys
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Fix sys.path to include the parent directory so 'src' can be imported
sys.path.append(os.path.abspath('..'))

from src.data_simulator import DataSimulator
from src.preprocessor import DataPreprocessor
from src.plots import PlotGenerator
from src import config
from src.config import INPUT_PATH, OUTPUT_PATH

# Initialize the plotting system
plotter = PlotGenerator(style='whitegrid', palette='viridis')

# Load the data using the preprocessing module
preprocessor = DataPreprocessor(config)
df = preprocessor.load_data(INPUT_PATH)

# Initialize the simulator
simulator = DataSimulator()

print("Starting data simulation...")

# Simulate the three text columns and save the result to OUTPUT_PATH.
# The column names below are the defaults used in this notebook; change them as needed.
simulated_df = simulator.simulate(
    input_csv=INPUT_PATH,
    output_csv=OUTPUT_PATH,
    notes_col='PatientNotes',
    sentiment_col='PatientSentiment',
    reason_col='NoShowReason'
)

print("Data simulation completed!")

# Display a sample of the generated columns
simulated_df[[
    'PatientId', 'Age', 'Gender', 'PatientNotes', 'PatientSentiment', 'NoShowReason'
]].head(10)

Starting data simulation...
Data simulation completed!
Data simulation completed!


| | PatientId | Age | Gender | PatientNotes | PatientSentiment | NoShowReason |
|---|---|---|---|---|---|---|
| 0 | 29872500000000.0 | 62 | F | Patient has a known history of hypertension. P... | Patient expresses fear and anxiety about high ... | |
| 1 | 558997800000000.0 | 56 | M | Discussed men's health and cardiovascular risk... | Patient is hopeful and shows no significant an... | |
| 2 | 4262962000000.0 | 62 | F | Discussed women's health screening and prevent... | Elderly patient expresses fear of declining he... | |
| 3 | 867951200000.0 | 8 | F | Pediatric patient. Parent/guardian present dur... | Patient (minor) is anxious and fearful about m... | |
| 4 | 8841186000000.0 | 56 | F | Patient has a known history of hypertension. P... | Patient experiences stress and anxiety managin... | |
| 5 | 95985130000000.0 | 76 | F | Patient has a known history of hypertension. P... | Patient expresses fear and anxiety about high ... | |
| 6 | 733688200000000.0 | 23 | F | Patient previously missed appointments. Discus... | Patient is hopeful and shows no significant an... | Patient was anxious about the visit. |
| 7 | 3449833000000.0 | 39 | F | Patient previously missed appointments. Discus... | Patient is hopeful and shows no significant an... | Couldn't leave home due to household duties. |
| 8 | 56394730000000.0 | 21 | F | Discussed women's health screening and prevent... | Patient is hopeful and shows no significant an... | |
| 9 | 78124560000000.0 | 19 | F | Discussed women's health screening and prevent... | Patient is hopeful and shows no significant an... | |
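
In the preview above, `NoShowReason` is empty for most rows and `PatientId` is displayed as a float. The sketch below shows one way to sanity-check both points using only the standard library. It runs on four rows copied from the preview, with the long text columns left out, rather than on the real output file. Converting the displayed values to integers only tidies the formatting; it cannot restore any digits the preview may already have rounded away.

```python
import csv
import io

# Rows copied from the preview above; PatientNotes and PatientSentiment are omitted.
SAMPLE = """PatientId,Age,Gender,NoShowReason
29872500000000.0,62,F,
733688200000000.0,23,F,Patient was anxious about the visit.
3449833000000.0,39,F,Couldn't leave home due to household duties.
56394730000000.0,21,F,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Render the float-formatted IDs as plain integers.
for row in rows:
    row["PatientId"] = int(float(row["PatientId"]))

# Keep only the rows where a no-show reason was actually generated.
with_reason = [r for r in rows if r["NoShowReason"].strip()]
print(f"{len(with_reason)} of {len(rows)} rows have a no-show reason")
# prints "2 of 4 rows have a no-show reason"
```

If the IDs need to stay exact from the start, reading the column as text when loading the CSV (for example with the `dtype` argument of `pandas.read_csv`) avoids the float conversion altogether.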
