# Remote Work & Health Impact Analysis
## Notebook 03: Data Analysis

**Author:** Mengie Jean-Baptiste
**Date:** September 2025
**Purpose:** This notebook evaluates how job role, industry, salary, and work arrangement relate to social isolation, burnout, and work-life balance.


## Project Objectives

1. Create job health profiles. For each job role:
    - Most common burnout level
    - Median work-life balance score
    - Median social isolation score
2. Analyze burnout levels for specific roles across industries
3. Compare burnout levels for tech vs. non-tech roles
4. Compare salary ranges with burnout level
5. Compare salary ranges with isolation score
6. Compare isolation score across work arrangements

This notebook addresses the following questions:

- Does a higher salary eliminate burnout risk?
- Which job roles experience the highest and lowest burnout levels?
- Does remote work increase the chance of social isolation?
- Is a higher salary associated with a better work-life balance score?



## Dataset Overview

- Regions: North America, South America, Africa, Oceania, Europe, Asia
- Region of focus: North America
- Unit of Observation: Individual employees
- Variables:
    - Job Role
    - Industry
    - Work Arrangement
    - Salary Range
    - Burnout Level
    - Social Isolation Score
    - Work-Life Balance Score
    - Mental Health Status


## Import setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

pd.set_option("display.max_columns", None)
np.random.seed(42)

In [None]:
DATA_DIR="/Users/mengiejean-baptiste/remote_work_health_analysis/data/processed"

# One processed file per region
df_na=pd.read_csv(f"{DATA_DIR}/processed_na.csv")  # North America
df_eu=pd.read_csv(f"{DATA_DIR}/processed_eu.csv")  # Europe
df_af=pd.read_csv(f"{DATA_DIR}/processed_af.csv")  # Africa
df_as=pd.read_csv(f"{DATA_DIR}/processed_as.csv")  # Asia
df_sa=pd.read_csv(f"{DATA_DIR}/processed_sa.csv")  # South America
df_oc=pd.read_csv(f"{DATA_DIR}/processed_oc.csv")  # Oceania
df_na.shape

(497, 16)

## North America Dataset

## Job Role Distribution
In this section, job roles are grouped by industry and counted.

In [80]:
#Sort roles by industry
df_na_sorted=df_na.sort_values(by="industry")
df_na_sorted.head()
df_na["industry"].value_counts()

job_industry_counts=(
    df_na
    .groupby(["industry", "job_role"])
    .size()
    .reset_index(name="count")
)

job_industry_counts

Unnamed: 0,industry,job_role,count
0,Customer Service,Account Manager,2
1,Customer Service,Business Analyst,1
2,Customer Service,Consultant,1
3,Customer Service,Content Writer,2
4,Customer Service,Customer Service Manager,1
...,...,...,...
177,Technology,Research Scientist,6
178,Technology,Sales Representative,4
179,Technology,Software Engineer,3
180,Technology,Technical Writer,5


## Salary Range by Job Role
This section examines the most common salary range for each job role.

In [20]:
salary_by_role=(
    df_na
    .groupby("job_role")["salary_range"]
    .agg(lambda x: x.mode().iloc[0] if not x.mode().empty else None)
)

salary_by_role

job_role
Account Manager                   $60K-80K
Business Analyst                  $60K-80K
Consultant                       $80K-100K
Content Writer                    $40K-60K
Customer Service Manager          $60K-80K
Data Analyst                     $80K-100K
Data Scientist                   $80K-100K
DevOps Engineer                 $100K-120K
Digital Marketing Specialist     $80K-100K
Executive Assistant              $80K-100K
Financial Analyst                 $60K-80K
HR Manager                        $60K-80K
IT Support                        $60K-80K
Marketing Specialist              $60K-80K
Operations Manager                $60K-80K
Product Manager                   $60K-80K
Project Manager                   $60K-80K
Quality Assurance                 $40K-60K
Research Scientist               $80K-100K
Sales Representative             $80K-100K
Social Media Manager              $40K-60K
Software Engineer                $80K-100K
Technical Writer                  $60K-80K
UX

## Mental Health Status by Job Role

In [None]:
# Copy the North America rows that have a reported mental health status
df_na_reported=df_na[
    df_na["mental_health_status"].notna() &
    (df_na["mental_health_status"] != "not_reported")
].copy()

# Most common mental health status by role
mental_status_by_role=(
    df_na_reported
    .groupby("job_role")["mental_health_status"]
    .agg(lambda x: x.mode().iloc[0])
)

mental_status_by_role

mental_status_counts=(
    df_na_reported
    .groupby(["job_role", "mental_health_status"])
    .size()
    .reset_index(name="count")
    .sort_values(["job_role", "count"], ascending=[True, False])
)
mental_status_counts

# Percentage of each mental health status within a role
mental_status_pct=(
    mental_status_counts.assign(
        pct=lambda x: x["count"] /
        x.groupby("job_role")["count"].transform("sum")*100
    )
    .round(1)
)
mental_status_pct

Unnamed: 0,job_role,mental_health_status,count,pct
4,Account Manager,ptsd,5,33.3
0,Account Manager,adhd,3,20.0
2,Account Manager,burnout,3,20.0
3,Account Manager,depression,2,13.3
1,Account Manager,anxiety,1,6.7
...,...,...,...,...
130,UX Designer,adhd,3,21.4
131,UX Designer,anxiety,3,21.4
133,UX Designer,depression,2,14.3
132,UX Designer,burnout,1,7.1


## Mental Health Status by Industry

In [None]:
# Most common mental health status by industry
mental_status_by_industry=(
    df_na_reported
    .groupby("industry")["mental_health_status"]
    .agg(lambda x: x.mode().iloc[0])
)
print(mental_status_by_industry)

#Add counts
mental_status_counts=(
    df_na_reported
    .groupby(["industry", "mental_health_status"])
    .size()
    .reset_index(name="count")
    .sort_values(["industry", "count"], ascending=[True, False])
)

print(mental_status_counts)

#Add percentages of counts
mental_status_pct=(
    mental_status_counts.assign(
        pct=lambda x: x["count"] /
        x.groupby("industry")["count"].transform("sum") * 100
    )
    .round(1)
)
mental_status_pct


industry
Customer Service              depression
Education                        burnout
Finance                       depression
Healthcare               stress disorder
Manufacturing                    burnout
Marketing                           adhd
Professional Services    stress disorder
Retail                        depression
Technology                          adhd
Name: mental_health_status, dtype: object
                 industry mental_health_status  count
3        Customer Service           depression      5
1        Customer Service              anxiety      4
2        Customer Service              burnout      4
4        Customer Service                 ptsd      4
5        Customer Service      stress disorder      3
0        Customer Service                 adhd      2
8               Education              burnout      9
9               Education           depression      7
10              Education      stress disorder      7
7               Education              a

Unnamed: 0,industry,mental_health_status,count,pct
3,Customer Service,depression,5,22.7
1,Customer Service,anxiety,4,18.2
2,Customer Service,burnout,4,18.2
4,Customer Service,ptsd,4,18.2
5,Customer Service,stress disorder,3,13.6
0,Customer Service,adhd,2,9.1
8,Education,burnout,9,30.0
9,Education,depression,7,23.3
10,Education,stress disorder,7,23.3
7,Education,anxiety,4,13.3


For the analysis of mental health status by job role and industry, the data was restricted to respondents who reported a specific condition. Non-reported responses were excluded to avoid bias and misclassification. Among the reported cases, depression was the most common status in three of the nine industries (Customer Service, Finance, and Retail), with higher prevalence observed in Finance, while ADHD, burnout, and stress disorder were each the most common in two industries.
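
Because `x.mode().iloc[0]` keeps only the first value when several statuses tie for most common, it may be worth checking for ties before reading a modal status as a clear winner. The sketch below uses a small made-up frame (the rows and the `summary` table are illustrative assumptions, not part of the dataset) to list every tied status per group.

```python
import pandas as pd

# Toy data (hypothetical rows, not taken from the survey).
toy = pd.DataFrame({
    "industry": ["Retail"] * 4 + ["Finance"] * 3,
    "mental_health_status": [
        "ptsd", "adhd", "ptsd", "adhd",       # two statuses tie in Retail
        "depression", "depression", "anxiety",
    ],
})

grouped = toy.groupby("industry")["mental_health_status"]

# Series.mode() returns every tied value in sorted order,
# so joining them shows what .iloc[0] would hide.
summary = pd.DataFrame({
    "modal_status": grouped.agg(lambda s: ", ".join(s.mode())),
    "n_modes": grouped.agg(lambda s: s.mode().size),
})
summary["tie"] = summary["n_modes"] > 1

print(summary)
```

Groups flagged with `tie` are better described as having several equally common statuses than a single dominant one.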

## Burnout by job role within different industries

This section explores the burnout distribution for job roles across different industries.

In [146]:
burnout_by_role=df_na.groupby("job_role")["burnout_level"].agg(
    lambda x: x.mode().iloc[0]
)

pd.crosstab(df_na["job_role"], df_na["burnout_level"], normalize="index")

#Burnout level distribution by job role across industries
burnout_dist_industry=(
    pd.crosstab(
        [df_na["industry"], df_na["job_role"]],
        df_na["burnout_level"],
        normalize="index"
    )
)
print(burnout_dist_industry)
burnout_by_role

burnout_level                                  High       Low    Medium
industry         job_role                                              
Customer Service Account Manager           0.500000  0.000000  0.500000
                 Business Analyst          0.000000  0.000000  1.000000
                 Consultant                0.000000  0.000000  1.000000
                 Content Writer            0.000000  0.500000  0.500000
                 Customer Service Manager  0.000000  1.000000  0.000000
...                                             ...       ...       ...
Technology       Research Scientist        0.500000  0.166667  0.333333
                 Sales Representative      0.500000  0.250000  0.250000
                 Software Engineer         0.666667  0.333333  0.000000
                 Technical Writer          0.400000  0.400000  0.200000
                 UX Designer               0.000000  0.333333  0.666667

[182 rows x 3 columns]


job_role
Account Manager                   High
Business Analyst                Medium
Consultant                      Medium
Content Writer                  Medium
Customer Service Manager        Medium
Data Analyst                    Medium
Data Scientist                  Medium
DevOps Engineer                 Medium
Digital Marketing Specialist       Low
Executive Assistant             Medium
Financial Analyst               Medium
HR Manager                        High
IT Support                        High
Marketing Specialist            Medium
Operations Manager                High
Product Manager                 Medium
Project Manager                 Medium
Quality Assurance                 High
Research Scientist                High
Sales Representative            Medium
Social Media Manager            Medium
Software Engineer                 High
Technical Writer                   Low
UX Designer                     Medium
Name: burnout_level, dtype: object

## Job Health profiles

In [None]:
#Median work-life-balance and isolation score by role
wlb_isolation_by_role=(
    df_na
    .groupby("job_role")
    .agg(
        med_work_life_balance=("work_life_balance_score", "median"),
        med_social_isolation=("social_isolation_score", "median"),
        count=("job_role", "size")
    )
)
print(wlb_isolation_by_role)




                              med_work_life_balance  med_social_isolation  \
job_role                                                                    
Account Manager                                 3.0                   3.0   
Business Analyst                                3.0                   3.0   
Consultant                                      4.0                   3.0   
Content Writer                                  2.0                   2.0   
Customer Service Manager                        3.0                   3.0   
Data Analyst                                    3.0                   3.0   
Data Scientist                                  3.0                   3.0   
DevOps Engineer                                 3.0                   2.0   
Digital Marketing Specialist                    3.0                   2.0   
Executive Assistant                             3.0                   3.0   
Financial Analyst                               3.0                   2.5   

**Methodological notes**

- Burnout level is categorical (High, Medium, Low), so it is analyzed using distributions and most common values rather than averages.
- Social isolation and work-life balance scores are ordinal (1-5), so they are analyzed using medians instead of means.
    - Social isolation score: 1 (none) to 5 (severe)
    - Work-life balance score: 1 (poor) to 5 (excellent)
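
To illustrate these choices, here is a minimal sketch on made-up responses (all values are illustrative assumptions): the median and mean of an ordinal score can differ, and an ordered categorical keeps burnout levels in Low < Medium < High order when their shares are tabulated.

```python
import pandas as pd

# Hypothetical ordinal responses on the 1-5 scale.
scores = pd.Series([2, 2, 3, 3, 5, 5])
median_score = scores.median()  # middle of the ordered responses
mean_score = scores.mean()      # assumes equal spacing between scale steps

# Hypothetical burnout labels stored as an ordered categorical.
burnout = pd.Series(
    pd.Categorical(
        ["High", "Low", "Medium", "High", "Medium", "Medium"],
        categories=["Low", "Medium", "High"],
        ordered=True,
    )
)

# Share of each level, sorted by category order rather than alphabetically.
burnout_share = burnout.value_counts(normalize=True).sort_index()

print(median_score, round(mean_score, 2))
print(burnout_share)
```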
    

In [170]:
#Job health profile
job_health_profiles=(
    burnout_by_role.to_frame()
    .join(wlb_isolation_by_role)
    .sort_values("med_work_life_balance")
)
job_health_profiles

#Identify high, medium, and low burnout roles

high_burnout=job_health_profiles[job_health_profiles["burnout_level"]=="High"].sort_values("count", ascending=False)
high_burnout

med_burnout=job_health_profiles[job_health_profiles["burnout_level"]=="Medium"].sort_values("count", ascending=False)
med_burnout

low_burnout=job_health_profiles[job_health_profiles["burnout_level"]=="Low"].sort_values("count", ascending=False)
low_burnout

# Tech and creative roles
# Word boundaries (\b) keep short tokens such as "IT" from matching inside
# other words (for example "Writer" or "Digital").
job_health_profiles=job_health_profiles.reset_index()
tech_roles=job_health_profiles[
    job_health_profiles["job_role"].str.contains(
        r"\b(?:Engineer|Scientist|IT|DevOps|Quality Assurance|Data)\b", case=False
    )
]
creative_roles=job_health_profiles[
    job_health_profiles["job_role"].str.contains(
        r"\b(?:Writer|Designer|Marketing|Social|UX)\b", case=False
    )
]
tech_roles["burnout_level"].value_counts(), creative_roles["burnout_level"].value_counts()

# Healthiest job roles
job_health_profiles["health_score"]=(
    job_health_profiles["med_work_life_balance"]-
    job_health_profiles["med_social_isolation"]
)
job_health_profiles.sort_values("health_score", ascending=False).head(10)

Unnamed: 0,job_role,burnout_level,med_work_life_balance,med_social_isolation,count,health_score
23,UX Designer,Medium,4.0,2.0,17,2.0
14,DevOps Engineer,Medium,3.0,2.0,20,1.0
22,Consultant,Medium,4.0,3.0,23,1.0
4,Social Media Manager,Medium,3.0,2.0,18,1.0
6,Quality Assurance,High,3.0,2.0,14,1.0
7,Product Manager,Medium,3.0,2.0,27,1.0
13,Digital Marketing Specialist,Low,3.0,2.0,22,1.0
21,Sales Representative,Medium,3.5,3.0,16,0.5
20,Project Manager,Medium,3.5,3.0,18,0.5
19,Financial Analyst,Medium,3.0,2.5,16,0.5
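
The `health_score` used for this ranking is median work-life balance minus median social isolation, so higher values mean better balance relative to isolation. As a quick check, the sketch below recomputes it for three rows copied from the output above. Treating the two 1-5 scales as directly subtractable is an assumption of this composite, not a validated index.

```python
import pandas as pd

# Medians copied from the job_health_profiles output above.
profiles = pd.DataFrame({
    "job_role": ["UX Designer", "Consultant", "Sales Representative"],
    "med_work_life_balance": [4.0, 4.0, 3.5],
    "med_social_isolation": [2.0, 3.0, 3.0],
})

# Higher balance and lower isolation both raise the score.
profiles["health_score"] = (
    profiles["med_work_life_balance"] - profiles["med_social_isolation"]
)
ranked = profiles.sort_values("health_score", ascending=False)

print(ranked)
```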


## Key Findings
- The most common burnout level is Medium or High in 22 of the 24 job roles.
    - Almost all roles whose most common burnout level is High have a median of 3.0 for both work-life balance and social isolation.
- Employees in tech, operations, and corporate roles appear to be both overworked and isolated.
- Creative and communication roles have lower burnout levels, but their work-life balance and social isolation scores are still moderate.
- The job roles with the healthiest profiles are UX Designer, Consultant, Sales Representative, and Project Manager. They have the highest median work-life balance scores (3.5 to 4.0), and UX Designer also has a low median isolation score (2.0).
  

## Compare salary ranges with burnout level

This section explores how burnout level, work-life balance, and social isolation vary across salary ranges.

In [172]:
#Salary vs. burnout level
burnout_by_salary=(
    df_na.groupby(["salary_range", "burnout_level"])
    .size()
    .reset_index(name="count")
)
burnout_by_salary

#Salary vs. work life balance
salary_wlb=(
    df_na.groupby("salary_range")["work_life_balance_score"]
    .median()
    .reset_index(name="median_wlb_score")
)

# Salary vs. isolation score (most common score per salary range)
salary_isolation=(
    df_na.groupby("salary_range")["social_isolation_score"]
    .agg(lambda x: x.mode().iloc[0])
    .reset_index(name="modal_isolation_score")
    .sort_values("modal_isolation_score", ascending=False)
)

salary_wlb




Unnamed: 0,salary_range,median_wlb_score
0,$100K-120K,3.0
1,$120K+,3.5
2,$40K-60K,3.0
3,$60K-80K,3.0
4,$80K-100K,3.0
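
The salary bands above appear in string order (`$100K-120K` before `$40K-60K`), which makes a trend across income harder to read. One possible fix, sketched here with the medians from the table above, is to convert `salary_range` to an ordered categorical. The `SALARY_ORDER` list is an assumption based on the five bands shown.

```python
import pandas as pd

# Assumed low-to-high ordering of the five bands in the output above.
SALARY_ORDER = ["$40K-60K", "$60K-80K", "$80K-100K", "$100K-120K", "$120K+"]

# Values copied from the salary_wlb output above.
salary_wlb = pd.DataFrame({
    "salary_range": ["$100K-120K", "$120K+", "$40K-60K", "$60K-80K", "$80K-100K"],
    "median_wlb_score": [3.0, 3.5, 3.0, 3.0, 3.0],
})

# An ordered categorical sorts by band position instead of alphabetically.
salary_wlb["salary_range"] = pd.Categorical(
    salary_wlb["salary_range"], categories=SALARY_ORDER, ordered=True
)
salary_wlb_sorted = salary_wlb.sort_values("salary_range").reset_index(drop=True)

print(salary_wlb_sorted)
```

With the bands in income order, it is easier to see that the median stays at 3.0 until the top band.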


## Social Isolation by Work Arrangement

This section explores how social isolation scores vary across work arrangements (remote, hybrid, and onsite).

In [171]:
#Isolation score across work types
isolation_by_work=(
    df_na
    .groupby("work_arrangement")
    .agg(
        med_social_isolation=("social_isolation_score", "median"),
        count=("social_isolation_score", "count")
    )
    .reset_index()
)

isolation_by_work

#Distribution of isolation scores
isolation_dist=(
    df_na
    .groupby(["work_arrangement", "social_isolation_score"])
    .size()
    .reset_index(name="count")
)
isolation_by_work

Unnamed: 0,work_arrangement,med_social_isolation,count
0,Hybrid,3.0,142
1,Onsite,2.0,256
2,Remote,4.0,99
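
Medians hide how the scores are spread, so a row-normalized crosstab can complement the table above by showing the share of each isolation score within each work arrangement. The rows below are made up for illustration. In the notebook, the same call would presumably take `df_na["work_arrangement"]` and `df_na["social_isolation_score"]`.

```python
import pandas as pd

# Toy data (hypothetical rows, not taken from the survey).
toy = pd.DataFrame({
    "work_arrangement": ["Remote"] * 4 + ["Onsite"] * 4,
    "social_isolation_score": [4, 5, 4, 3, 2, 2, 1, 3],
})

# normalize="index" turns counts into within-arrangement shares.
isolation_share = pd.crosstab(
    toy["work_arrangement"],
    toy["social_isolation_score"],
    normalize="index",
)

# Share of respondents per arrangement with a score of 4 or 5.
high_isolation_share = isolation_share.loc[
    :, isolation_share.columns >= 4
].sum(axis=1)

print(isolation_share)
print(high_isolation_share)
```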


## Final Observations

- Burnout risk peaks in middle-income job roles.
- Tech, operations, and corporate roles show higher burnout risk.
- UX, consulting, and creative roles show the healthiest profiles.
- Remote workers have the highest median social isolation score (4.0, compared with 3.0 for hybrid and 2.0 for onsite).
- Median work-life balance is 3.0 in every salary range below $120K and rises to 3.5 only in the $120K+ range, so a higher salary is associated with a modest improvement in work-life balance.

## Limitations
- Self-reported survey data may include reporting bias.
- Some roles and industries have small sample sizes.
- Results are specific to North America.
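
One way to act on the sample-size limitation is to flag groups that fall below a minimum count before interpreting their results. The sketch uses four counts copied from the job role distribution output earlier in the notebook. The threshold `MIN_N = 5` is an arbitrary assumption, not a value used in the analysis.

```python
import pandas as pd

MIN_N = 5  # hypothetical cutoff for a usable group size

# Counts copied from the job_industry_counts output earlier in the notebook.
counts = pd.DataFrame({
    "industry": ["Customer Service", "Customer Service", "Technology", "Technology"],
    "job_role": ["Account Manager", "Business Analyst", "Research Scientist", "Technical Writer"],
    "count": [2, 1, 6, 5],
})

# Flag groups too small to support a stable mode or median.
counts["small_sample"] = counts["count"] < MIN_N
reliable = counts[~counts["small_sample"]]

print(counts)
```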

## Conclusion

This analysis suggests that employee well-being is partly related to job role, industry, and work arrangement. Burnout tends to coincide with social isolation and weaker work-life balance. Organizations should make an effort to prioritize social connection and role clarity to improve health outcomes.

## Next Steps
- Create a dashboard to present the findings.