<a href="https://colab.research.google.com/github/nattaran/HealthTequity-LLM/blob/main/generateBloodPressureData.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

🩺 Generate Synthetic Blood Pressure Dataset
This notebook generates a synthetic 30-day blood pressure dataset
for one individual (date, age, sex, systolic, diastolic, and a category label per day)
and saves it as a CSV whose filename includes the age and sex.

Example output file:
synthetic_bp_45_female.csv
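
The naming convention can be sketched as a small helper. `build_bp_filename` is a hypothetical name used only for illustration; the notebook itself builds the same string inline with an f-string.

```python
def build_bp_filename(age: int, sex: str) -> str:
    """Return the CSV filename for one individual.

    Hypothetical helper: the notebook formats this string inline.
    """
    # Lower-case the sex so "Female" and "female" map to the same file.
    return f"synthetic_bp_{age}_{sex.lower()}.csv"


print(build_bp_filename(45, "Female"))  # synthetic_bp_45_female.csv
```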

Author: Nasrin Attaran
Created: 2025-10-19
Project: HealthTequity Voice Pipeline


In [1]:
# --- Step 1: Import required libraries ---
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from pathlib import Path
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [6]:
from pathlib import Path

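# NOTE: this cell uses `age`, `sex`, and `df`, which are defined in the
# generation cell below. Run that cell first; on a fresh kernel this cell
# raises NameError.

# Save inside your Drive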
DATA_DIR = Path("/content/drive/MyDrive/HealthTequity-LLM/data/synthetic_csv")
DATA_DIR.mkdir(parents=True, exist_ok=True)

# Save CSV with dynamic name (including age/sex)
filename = f"synthetic_bp_{age}_{sex.lower()}.csv"
output_csv = DATA_DIR / filename

df.to_csv(output_csv, index=False)
print(f"✅ File saved to Google Drive: {output_csv.resolve()}")


✅ File saved to Google Drive: /content/drive/MyDrive/HealthTequity-LLM/data/synthetic_csv/synthetic_bp_70_female.csv


In [14]:
# ==========================================================
# 🩺 Synthetic Blood Pressure Dataset (Normal ↔ Hypertensive)
# ==========================================================
# Each day is independently assigned a normal or hypertensive state
# (60% / 40%), and that day's reading is drawn around the state's mean.
# Covers one older adult over 30 days; age and sex are stored as columns.
# ==========================================================

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from pathlib import Path
from google.colab import drive

# --- Mount Drive to save output ---
drive.mount('/content/drive')

# --- Output folder ---
DATA_DIR = Path("/content/drive/MyDrive/HealthTequity-LLM/data/synthetic_csv")
DATA_DIR.mkdir(parents=True, exist_ok=True)

# --- Parameters ---
age = 95
sex = "female"
num_days = 30
np.random.seed(42)

# --- Dates: the 30 days ending today, so they change from run to run ---
start_date = datetime.today() - timedelta(days=num_days - 1)
dates = pd.date_range(start=start_date, periods=num_days)

# --- Randomly choose each day's BP state ---
states = np.random.choice(["normal", "hypertensive"], size=num_days, p=[0.6, 0.4])

systolic = []
diastolic = []

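for state in states:
    # Arguments to np.random.normal are (mean, standard deviation) in mmHg.
    # Names avoid `sys`, which would shadow the standard-library module name.
    if state == "normal":
        sys_bp = np.random.normal(118, 6)
        dia_bp = np.random.normal(76, 4)
    else:  # hypertensive
        sys_bp = np.random.normal(148, 10)
        dia_bp = np.random.normal(96, 6)
    # Round to whole mmHg, as a home monitor would report.
    systolic.append(round(sys_bp))
    diastolic.append(round(dia_bp))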

# --- DataFrame ---
df = pd.DataFrame({
    "date": [d.strftime("%Y-%m-%d") for d in dates],
    "age": age,
    "sex": sex,
    "systolic_mmHg": systolic,
    "diastolic_mmHg": diastolic,
    "bp_category": states,  # the sampled daily state, not derived from the readings
})

# --- Save to Drive ---
filename = f"synthetic_bp_{age}_{sex.lower()}.csv"
output_csv = DATA_DIR / filename
df.to_csv(output_csv, index=False, encoding="utf-8-sig")

print(f"✅ File saved to Google Drive: {output_csv.resolve()}")
display(df.head(10))

# --- Optional quick summary ---
print("\nCategory distribution:")
print(df["bp_category"].value_counts())


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✅ File saved to Google Drive: /content/drive/MyDrive/HealthTequity-LLM/data/synthetic_csv/synthetic_bp_95_female.csv


         date  age     sex  systolic_mmHg  diastolic_mmHg   bp_category
0  2025-09-21   95  female            111              78        normal
1  2025-09-22   95  female            142              94  hypertensive
2  2025-09-23   95  female            142             107  hypertensive
3  2025-09-24   95  female            118              72        normal
4  2025-09-25   95  female            123              71        normal
5  2025-09-26   95  female            119              68        normal
6  2025-09-27   95  female            110              77        normal
7  2025-09-28   95  female            155              97  hypertensive
8  2025-09-29   95  female            147              94  hypertensive
9  2025-09-30   95  female            133              92  hypertensive



Category distribution:
bp_category
normal          21
hypertensive     9
Name: count, dtype: int64
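
The `bp_category` column records the state that was sampled for each day, so in principle a label can disagree with the reading it accompanies. As a rough cross-check, the sketch below labels a reading from its values alone. The function name `classify_reading` and the 140/90 mmHg cut-off are assumptions made for illustration; they are not part of this notebook and are not a clinical recommendation.

```python
def classify_reading(systolic: int, diastolic: int,
                     sys_cutoff: int = 140, dia_cutoff: int = 90) -> str:
    """Label one reading as "normal" or "hypertensive".

    Assumption: a reading counts as hypertensive when either value
    reaches its cut-off (140/90 mmHg by default, chosen for illustration).
    """
    if systolic >= sys_cutoff or diastolic >= dia_cutoff:
        return "hypertensive"
    return "normal"


# Readings taken from the first rows of the output above.
print(classify_reading(111, 78))   # normal
print(classify_reading(133, 92))   # hypertensive (diastolic >= 90)
```

With the DataFrame from the cell above, `df.apply(lambda r: classify_reading(r["systolic_mmHg"], r["diastolic_mmHg"]), axis=1)` would give a derived label to compare against `bp_category`.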
