# Synthetic Wellness Dataset Generator

This notebook generates a **synthetic wellness dataset** that includes both **normal daily patterns** and **edge cases** for testing the robustness of wellness-related models.

## Purpose:
- Simulate realistic health metrics: `Sleep`, `Mood`, and `Steps`
- Create extreme and contradictory scenarios (edge cases) to help test ML models, scoring systems, or rule-based wellness assessments
- Save the final dataset as a CSV for further use in model training or evaluation

The final dataset includes:
- 1800 normal entries (balanced lifestyle)
- 350 edge cases (low sleep, zero steps, contradictory combinations, etc.), for 2150 rows in total


### Step 1: Import Required Libraries

We'll start by importing the necessary libraries: `pandas` and `numpy`.


In [2]:
import pandas as pd
import numpy as np


### Step 2: Set Random Seed

We set NumPy's global random seed so that every run of the notebook produces the same dataset.


In [3]:
# Set seed for reproducibility
np.random.seed(42)
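
To illustrate what the seed guarantees, the sketch below reseeds the global generator and redraws the same values. It also shows NumPy's newer `np.random.default_rng` API as an alternative; that part is an assumption for comparison only and is not what this notebook uses.

```python
import numpy as np

# Reseeding the legacy global generator replays the same sequence of draws.
np.random.seed(42)
first = np.random.normal(loc=7.5, scale=1.0, size=5)
np.random.seed(42)
second = np.random.normal(loc=7.5, scale=1.0, size=5)
same_after_reseed = np.array_equal(first, second)

# Alternative (not used in this notebook): independent Generator objects.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
same_with_generators = np.array_equal(
    rng_a.integers(4, 9, size=5),  # upper bound is exclusive, as with randint
    rng_b.integers(4, 9, size=5),
)

print(same_after_reseed, same_with_generators)
```

Run this in a scratch session rather than in the notebook itself, since the extra draws advance the global random state and would change the rows generated in later cells.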


### Step 3: Define Function to Generate Normal Wellness Data

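This function simulates normal cases with:
- Sleep: normally distributed around 7.5 hours (standard deviation 1.0), clipped to the range 3 to 10 hours
- Mood: random integers from 4 to 8 inclusive
- Steps: normally distributed around 8000 (standard deviation 2000), clipped to the range 1000 to 15000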


In [4]:
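def generate_normal_data(n):
    """Return n rows of typical daily values for Sleep, Mood, and Steps."""
    # Sleep in hours: mean 7.5, std 1.0, clipped to [3, 10]
    sleep = np.random.normal(loc=7.5, scale=1.0, size=n).clip(3, 10)
    # Mood score: integers 4..8 (randint's upper bound of 9 is exclusive)
    mood = np.random.randint(4, 9, size=n)
    # Daily steps: mean 8000, std 2000, clipped to [1000, 15000]; values stay floats
    steps = np.random.normal(loc=8000, scale=2000, size=n).clip(1000, 15000)
    df = pd.DataFrame({'Sleep': sleep, 'Mood': mood, 'Steps': steps})
    return df.round(3)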
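
As a quick sanity check, the following sketch inlines the same recipe so it runs on its own and confirms that every column stays inside its intended range. The seed `0` and the sample size of 1000 are arbitrary choices for this illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, for this illustration only
n = 1000

# Same recipe as generate_normal_data, inlined so the snippet is self-contained.
sample = pd.DataFrame({
    'Sleep': np.random.normal(loc=7.5, scale=1.0, size=n).clip(3, 10),
    'Mood': np.random.randint(4, 9, size=n),  # upper bound 9 is exclusive
    'Steps': np.random.normal(loc=8000, scale=2000, size=n).clip(1000, 15000),
}).round(3)

# Series.between is inclusive on both ends by default.
sleep_ok = sample['Sleep'].between(3, 10).all()
mood_ok = sample['Mood'].between(4, 8).all()
steps_ok = sample['Steps'].between(1000, 15000).all()

print(sleep_ok, mood_ok, steps_ok)
print(sample.dtypes)
```

Note that `Steps` comes out as a float column with fractional values. If whole step counts are needed downstream, one option is to round and cast the column to an integer type before saving.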


### Step 4: Define Function to Generate Edge Case Data

This function creates 350 rows of edge scenarios to test the robustness of models:
- Extremely low or high sleep
- Zero sleep or steps
- Contradictory or unusual combinations
- Ideal and worst-case scenarios


In [5]:
def generate_edge_cases():
    edge_data = []

    for _ in range(50):  # Very low sleep
        edge_data.append([np.random.uniform(0.5, 2.5), np.random.randint(1, 6), np.random.randint(1000, 8000)])

    for _ in range(50):  # Oversleeping
        edge_data.append([np.random.uniform(10.5, 12), np.random.randint(3, 7), np.random.randint(1000, 10000)])

    for _ in range(25):  # Zero sleep
        edge_data.append([0, np.random.randint(1, 5), np.random.randint(100, 5000)])

    for _ in range(25):  # High mood, poor sleep/steps
        edge_data.append([np.random.uniform(2, 4), 9, np.random.randint(200, 2000)])

    for _ in range(50):  # Sedentary lifestyle
        edge_data.append([np.random.uniform(6, 8), np.random.randint(2, 6), 0])

    for _ in range(50):  # Very active users
        edge_data.append([np.random.uniform(6, 9), np.random.randint(6, 10), np.random.randint(16000, 20000)])

    for _ in range(25):  # Perfect balance
        edge_data.append([8, 9, 10000])

    for _ in range(25):  # Worst values
        edge_data.append([0, 1, 0])

    for _ in range(25):  # Contradictory: high sleep + mood, low steps
        edge_data.append([np.random.uniform(9, 10), 9, np.random.randint(100, 1000)])

    for _ in range(25):  # Contradictory: low sleep, high steps, low mood
        edge_data.append([np.random.uniform(2, 4), 2, np.random.randint(12000, 18000)])

    edge_df = pd.DataFrame(edge_data, columns=["Sleep", "Mood", "Steps"])
    return edge_df.round(3)
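
The number of rows this function returns is the sum of its loop counts. The tally below copies those counts from the `range(...)` calls above; the scenario labels are paraphrased from the code comments.

```python
# Row counts per scenario, copied from the range(...) calls in generate_edge_cases.
scenario_counts = {
    "very low sleep": 50,
    "oversleeping": 50,
    "zero sleep": 25,
    "high mood, poor sleep/steps": 25,
    "sedentary lifestyle": 50,
    "very active users": 50,
    "perfect balance": 25,
    "worst values": 25,
    "high sleep + mood, low steps": 25,
    "low sleep, high steps, low mood": 25,
}

total_edge_rows = sum(scenario_counts.values())
print(total_edge_rows)         # 350
print(1800 + total_edge_rows)  # 2150 rows once combined with the normal data
```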


### 🧪 Step 5: Generate Normal and Edge Case Datasets

We now generate:
- 1800 rows of normal data
- 350 rows of edge case data


In [6]:
normal_df = generate_normal_data(1800)
edge_df = generate_edge_cases()


### 🔀 Step 6: Combine and Shuffle Data

We combine the normal and edge case data into one dataset and shuffle it.


In [7]:
full_df = pd.concat([normal_df, edge_df], ignore_index=True)
full_df = full_df.sample(frac=1, random_state=42).reset_index(drop=True)
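
Once the rows are shuffled, nothing in the dataset records which ones were edge cases. If that information is useful for evaluation, one option is to tag each frame before concatenating. The sketch below uses tiny made-up frames, and the `is_edge` column is a hypothetical addition that is not part of the original dataset.

```python
import pandas as pd

# Tiny stand-ins for normal_df and edge_df (values are made up for illustration).
normal_toy = pd.DataFrame({
    'Sleep': [7.2, 7.9, 6.8],
    'Mood': [5, 7, 6],
    'Steps': [8100.0, 7400.0, 9050.0],
})
edge_toy = pd.DataFrame({
    'Sleep': [0.0, 11.4],
    'Mood': [1, 4],
    'Steps': [0, 3200],
})

# Hypothetical provenance flag, added before the frames are merged.
normal_toy['is_edge'] = False
edge_toy['is_edge'] = True

# Same combine-and-shuffle pattern as the cell above.
combined = pd.concat([normal_toy, edge_toy], ignore_index=True)
shuffled = combined.sample(frac=1, random_state=42).reset_index(drop=True)

print(shuffled)
print(int(shuffled['is_edge'].sum()), "edge rows out of", len(shuffled))
```

Because `sample(frac=1)` returns every row exactly once, the shuffle changes only the order, and `reset_index(drop=True)` discards the old positions so the index runs from 0 again.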


### 💾 Step 7: Save the Final Dataset

Save the combined dataset to a CSV file for use in model training or analysis.


In [10]:
full_df.to_csv("synthetic_wellness_dataset.csv", index=False)
print("Dataset generated and saved as 'synthetic_wellness_dataset.csv'")

Dataset generated and saved as 'synthetic_wellness_dataset.csv'
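
A round-trip check can confirm that the file was written as expected. The sketch below demonstrates the idea on a small made-up frame written to a temporary directory; the file name `toy_wellness.csv` is hypothetical, and the same checks could be pointed at `synthetic_wellness_dataset.csv`.

```python
import os
import tempfile

import pandas as pd

# Small made-up frame standing in for full_df.
toy = pd.DataFrame({
    'Sleep': [7.512, 0.0],
    'Mood': [6, 1],
    'Steps': [8123.456, 0.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_wellness.csv")  # hypothetical file name
    toy.to_csv(path, index=False)  # index=False keeps the row index out of the file
    reloaded = pd.read_csv(path)

columns_match = list(reloaded.columns) == ['Sleep', 'Mood', 'Steps']
rows_match = len(reloaded) == len(toy)
no_missing = not reloaded.isna().any().any()

print(columns_match, rows_match, no_missing)
```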
