### Remote work impacts on health

#### Description 
The Post-Pandemic Remote Work Health Impact 2025 dataset presents a comprehensive, global snapshot of how remote, hybrid, and onsite work arrangements are influencing the mental and physical health of employees in the post-pandemic era. Collected in June 2025, this dataset aggregates responses from a diverse workforce spanning continents, industries, age groups, and job roles. It is designed to support research, data analysis, and policy-making around the evolving landscape of work and well-being.

This dataset enables in-depth exploration of:

- The prevalence of mental health conditions (e.g., anxiety, burnout, PTSD, depression) across different work setups.
- The relationship between work arrangements and physical health complaints (e.g., back pain, eye strain, neck pain).
- Variations in work-life balance, social isolation, and burnout levels segmented by demographic and occupational factors.
- Salary distributions and their correlation with health outcomes and job roles.

By providing granular, anonymized data on both subjective (self-reported) and objective (hours worked, salary range) factors, this resource empowers data scientists, health researchers, HR professionals, and business leaders to:

- Identify risk factors and protective factors for employee well-being.
- Benchmark health impacts across industries and regions.
- Inform organizational policy and future-of-work strategies.

In [63]:
import pandas as pd
import numpy as np 
from kaggle.api.kaggle_api_extended import KaggleApi

from sklearn.preprocessing import MultiLabelBinarizer

pd.set_option("display.max_columns", None)

api = KaggleApi()
api.authenticate()

api.dataset_download_files(
    'kshitijsaini121/remote-work-of-health-impact-survey-june-2025',
    path='data/',
    unzip=True 
)

df = pd.read_csv("data/post_pandemic_remote_work_health_impact_2025.csv", parse_dates=['Survey_Date'])

df.head()

Dataset URL: https://www.kaggle.com/datasets/kshitijsaini121/remote-work-of-health-impact-survey-june-2025


Unnamed: 0,Survey_Date,Age,Gender,Region,Industry,Job_Role,Work_Arrangement,Hours_Per_Week,Mental_Health_Status,Burnout_Level,Work_Life_Balance_Score,Physical_Health_Issues,Social_Isolation_Score,Salary_Range
0,2025-06-01,27,Female,Asia,Professional Services,Data Analyst,Onsite,64,Stress Disorder,High,3,Shoulder Pain; Neck Pain,2,$40K-60K
1,2025-06-01,37,Female,Asia,Professional Services,Data Analyst,Onsite,37,Stress Disorder,High,4,Back Pain,2,$80K-100K
2,2025-06-01,32,Female,Africa,Education,Business Analyst,Onsite,36,ADHD,High,3,Shoulder Pain; Eye Strain,2,$80K-100K
3,2025-06-01,40,Female,Europe,Education,Data Analyst,Onsite,63,ADHD,Medium,1,Shoulder Pain; Eye Strain,2,$60K-80K
4,2025-06-01,30,Male,South America,Manufacturing,DevOps Engineer,Hybrid,65,,Medium,5,,4,$60K-80K
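
Row 4 of the preview shows empty `Mental_Health_Status` and `Physical_Health_Issues` fields, which suggests pandas parsed them as missing values. The sketch below counts such gaps per column. The `sample` frame is a hypothetical stand-in that copies three columns from the five preview rows; it is not the full dataset. Whether an empty cell means "no condition reported" or "question not answered" is not stated in the source, so treat the counts as a starting point rather than a verdict.

```python
import pandas as pd

# Hypothetical stand-in: three columns copied from the five preview rows above.
sample = pd.DataFrame({
    "Work_Arrangement": ["Onsite", "Onsite", "Onsite", "Onsite", "Hybrid"],
    "Mental_Health_Status": ["Stress Disorder", "Stress Disorder", "ADHD", "ADHD", None],
    "Physical_Health_Issues": [
        "Shoulder Pain; Neck Pain",
        "Back Pain",
        "Shoulder Pain; Eye Strain",
        "Shoulder Pain; Eye Strain",
        None,
    ],
})

# Count missing values per column, and the share of rows with any gap.
missing = sample.isna().sum()
rows_with_gap = sample.isna().any(axis=1).mean()

print(missing)
print(f"Share of rows with at least one missing value: {rows_with_gap:.0%}")
```

On the real data the same two lines would run against `df` in place of `sample`.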


In [60]:
# One-hot encode the multi-valued column (issues are separated by ';')
lists = (
    df['Physical_Health_Issues']
      .fillna('')
      .str.split(';')
      .apply(lambda L: [item.strip() for item in L if item.strip()])
)

mlb = MultiLabelBinarizer()

onehot = pd.DataFrame(
    mlb.fit_transform(lists),
    columns=mlb.classes_,
    index=df.index
)

# Swap the raw text column for its one-hot indicator columns
df = pd.concat([df.drop(columns="Physical_Health_Issues"), onehot], axis=1)
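
As a cross-check on the encoding above, the same split-and-binarize step can be sketched in pandas alone. This is an alternative for verification, not the notebook's method: the `sample` series is a hypothetical stand-in built from three preview rows plus one missing value, and `onehot_check` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in: values copied from the preview, plus one missing entry.
sample = pd.Series(
    ["Shoulder Pain; Neck Pain", "Back Pain", "Shoulder Pain; Eye Strain", None],
    name="Physical_Health_Issues",
)

# Split on ';', give each issue its own row, and trim surrounding whitespace.
exploded = sample.fillna("").str.split(";").explode().str.strip()

# Discard the empty strings left behind by missing values.
exploded = exploded[exploded != ""]

# One indicator column per issue, collapsed back to one row per respondent.
# Rows with no issues drop out after filtering, so reindex restores them as zeros.
onehot_check = (
    pd.get_dummies(exploded)
      .groupby(level=0).max()
      .reindex(sample.index, fill_value=False)
      .astype(int)
)

print(onehot_check)
```

If both approaches are run on the same column, the resulting indicator columns should agree; a mismatch would usually point to inconsistent separators or stray whitespace in the raw text.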


In [61]:
print(df.describe())
print("\n -----")
df.info()  # info() prints its report directly and returns None, so no print() is needed


                         Survey_Date          Age  Hours_Per_Week  \
count                           3157  3157.000000     3157.000000   
mean   2025-06-13 13:29:37.763699712    43.732024       49.904973   
min              2025-06-01 00:00:00    22.000000       35.000000   
25%              2025-06-07 00:00:00    33.000000       42.000000   
50%              2025-06-14 00:00:00    44.000000       50.000000   
75%              2025-06-20 00:00:00    55.000000       57.000000   
max              2025-06-26 00:00:00    65.000000       65.000000   
std                              NaN    12.661095        8.897699   

       Work_Life_Balance_Score  Social_Isolation_Score    Back Pain  \
count              3157.000000             3157.000000  3157.000000   
mean                  2.996516                2.704783     0.495090   
min                   1.000000                1.000000     0.000000   
25%                   2.000000                2.000000     0.000000   
50%                   3

In [65]:
print(df['Work_Life_Balance_Score'].value_counts())

# Create a binary indicator: 1 = satisfied (score >= 4), 0 = not satisfied (classes may be imbalanced)
df['Work_Satisfied_Indicator'] = np.where(df['Work_Life_Balance_Score'] >= 4, 1, 0)

print(df['Work_Satisfied_Indicator'].value_counts())

Work_Life_Balance_Score
3    1169
4     655
2     572
1     404
5     357
Name: count, dtype: int64
Work_Satisfied_Indicator
0    2145
1    1012
Name: count, dtype: int64
