# MSc. Dissertation - DataScience By Robert Solomon
### Analysis of Remote Work Impact on Employee Well-Being (Cleaned Primary Dataset)

In [37]:
# Importing necessary libraries here below:
import numpy as np
import pandas as pd

### 1. Loading in the primary survey data

In [39]:
wfh_mentalHealth_surveyData = pd.read_csv('../Primary_Research/PR_Dataset/WFH-Mental_Health_(Survey).csv')

In [40]:
# Displaying the first few rows to inspect the data
wfh_mentalHealth_surveyData.head()

Unnamed: 0,Timestamp,Q1: What is your age group?,Q2: What is your gender?,Q3: What is your current work arrangement?,Q4: What is your job role/industry?,"Q5: On a scale of 1 (very poor) to 5 (excellent), how would you rate your work-life balance as a remote worker?",Q6: How many hours per week do you typically work? (Numeric and Decimal input accepted which will be rounded to the nearest whole number for e.g. 6.7hrs = 7hrs),"Q7: On a scale of 1 (no stress) to 5 (extremely stressed), how would you rate your stress levels since starting remote work?",Q8: What factors contribute most to your stress levels? (Select all that apply),Q9: How often do you feel socially isolated while working remotely?,Q10: Do you feel a lack of connection with your team while working remotely?,"Q11: Does your employer provide resources to support mental health (e.g., counseling, wellness programs)?",Q12: What changes would you recommend to improve mental health support for remote workers? (Open-ended)
0,2025/01/27 12:02:17 PM GMT,25–34,Male,Hybrid,IT/Technology,4,35,2,Isolation from colleagues,Always,Yes,No,Stop remote work and have people come into the...
1,2025/01/27 12:02:39 PM GMT,25–34,Male,Remote,IT/Technology,5,40,2,Lack of clarity about whether I will be forced...,Never,No,Yes,
2,2025/01/27 12:11:36 PM GMT,35–44,Male,Remote,IT/Technology,5,39,1,Other (please specify);Less commute / Less tim...,Sometimes,No,Unsure,Regional hot desk options at some of the many ...
3,2025/01/27 12:14:54 PM GMT,45–54,Male,Remote,IT/Technology,5,39,3,On call over weekends,Sometimes,No,Unsure,Occasional lunchtime meetups if enough people ...
4,2025/01/27 12:25:24 PM GMT,35–44,Female,Remote,IT/Technology,2,9,5,Increased workload,Often,No,Unsure,Add remote workers team lead to identify possi...


In [41]:
# Checking number of rows, columns in the data.

print(wfh_mentalHealth_surveyData.shape)

(45, 13)


In [42]:
# Printing the column names and their data types
print(wfh_mentalHealth_surveyData.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45 entries, 0 to 44
Data columns (total 13 columns):
 #   Column                                                                                                                                                             Non-Null Count  Dtype 
---  ------                                                                                                                                                             --------------  ----- 
 0   Timestamp                                                                                                                                                          45 non-null     object
 1   Q1: What is your age group?                                                                                                                                        45 non-null     object
 2   Q2: What is your gender?                                                                                                        

In [43]:
# Renaming column names using a list
wfh_mentalHealth_surveyData.columns = [
    "Timestamp",
    "Age_Group",
    "Gender",
    "Work_Location",
    "Industry",
    "Work_Life_Balance",
    "Weekly_Hours_Worked",
    "Stress_Level",
    "Stress_Factors",
    "Social_Isolation_Frequency",
    "Lack_Of_Team_Connection",
    "Employer_Mental_Health_Support",
    "Mental_Health_Recommendations"
]

In [44]:
# Displaying updated column names
wfh_mentalHealth_surveyData.head()

Unnamed: 0,Timestamp,Age_Group,Gender,Work_Location,Industry,Work_Life_Balance,Weekly_Hours_Worked,Stress_Level,Stress_Factors,Social_Isolation_Frequency,Lack_Of_Team_Connection,Employer_Mental_Health_Support,Mental_Health_Recommendations
0,2025/01/27 12:02:17 PM GMT,25–34,Male,Hybrid,IT/Technology,4,35,2,Isolation from colleagues,Always,Yes,No,Stop remote work and have people come into the...
1,2025/01/27 12:02:39 PM GMT,25–34,Male,Remote,IT/Technology,5,40,2,Lack of clarity about whether I will be forced...,Never,No,Yes,
2,2025/01/27 12:11:36 PM GMT,35–44,Male,Remote,IT/Technology,5,39,1,Other (please specify);Less commute / Less tim...,Sometimes,No,Unsure,Regional hot desk options at some of the many ...
3,2025/01/27 12:14:54 PM GMT,45–54,Male,Remote,IT/Technology,5,39,3,On call over weekends,Sometimes,No,Unsure,Occasional lunchtime meetups if enough people ...
4,2025/01/27 12:25:24 PM GMT,35–44,Female,Remote,IT/Technology,2,9,5,Increased workload,Often,No,Unsure,Add remote workers team lead to identify possi...


In [45]:
# Removing Timestamps columns and only showing ones I deem applicable for analysis:

wfh_mentalHealth_surveyData[
[
    "Age_Group",
    "Gender",
    "Work_Location",
    "Industry",
    "Work_Life_Balance",
    "Weekly_Hours_Worked",
    "Stress_Level",
    "Stress_Factors",
    "Social_Isolation_Frequency",
    "Lack_Of_Team_Connection",
    "Employer_Mental_Health_Support",
    "Mental_Health_Recommendations"
]
]

Unnamed: 0,Age_Group,Gender,Work_Location,Industry,Work_Life_Balance,Weekly_Hours_Worked,Stress_Level,Stress_Factors,Social_Isolation_Frequency,Lack_Of_Team_Connection,Employer_Mental_Health_Support,Mental_Health_Recommendations
0,25–34,Male,Hybrid,IT/Technology,4,35,2,Isolation from colleagues,Always,Yes,No,Stop remote work and have people come into the...
1,25–34,Male,Remote,IT/Technology,5,40,2,Lack of clarity about whether I will be forced...,Never,No,Yes,
2,35–44,Male,Remote,IT/Technology,5,39,1,Other (please specify);Less commute / Less tim...,Sometimes,No,Unsure,Regional hot desk options at some of the many ...
3,45–54,Male,Remote,IT/Technology,5,39,3,On call over weekends,Sometimes,No,Unsure,Occasional lunchtime meetups if enough people ...
4,35–44,Female,Remote,IT/Technology,2,9,5,Increased workload,Often,No,Unsure,Add remote workers team lead to identify possi...
5,35–44,Male,Remote,IT/Technology,3,40,4,Difficulty managing time,Often,No,Yes,More clarity on availability/real world exampl...
6,45–54,Female,Hybrid,IT/Technology,4,40,3,Other (please specify);I feel my stress levels...,Never,No,Yes,I use my camera every day for all meetings - i...
7,45–54,Female,Remote,IT/Technology,4,38,3,Often work through lunch because of meetings,Rarely,No,Yes,It's good to have one or two colleagues you ge...
8,25–34,Male,Remote,IT/Technology,5,40,1,,Never,No,No,
9,45–54,Female,Hybrid,IT/Technology,3,39 hours,3,Isolation from colleagues;Other (please specif...,Sometimes,Yes,Yes,Regular huddles/meetings. If possible onsite m...


In [46]:
# Printing the updated column names and their data types
print(wfh_mentalHealth_surveyData.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45 entries, 0 to 44
Data columns (total 13 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Timestamp                       45 non-null     object
 1   Age_Group                       45 non-null     object
 2   Gender                          45 non-null     object
 3   Work_Location                   45 non-null     object
 4   Industry                        45 non-null     object
 5   Work_Life_Balance               45 non-null     int64 
 6   Weekly_Hours_Worked             45 non-null     object
 7   Stress_Level                    45 non-null     int64 
 8   Stress_Factors                  44 non-null     object
 9   Social_Isolation_Frequency      45 non-null     object
 10  Lack_Of_Team_Connection         45 non-null     object
 11  Employer_Mental_Health_Support  45 non-null     object
 12  Mental_Health_Recommendations   32 non-null     obje

## 2. Handling Missing Data

### 2.1 Checking for missing values in each column

In [49]:
# Checking for missing values in each column
wfh_mentalHealth_surveyData.isnull().sum()

Timestamp                          0
Age_Group                          0
Gender                             0
Work_Location                      0
Industry                           0
Work_Life_Balance                  0
Weekly_Hours_Worked                0
Stress_Level                       0
Stress_Factors                     1
Social_Isolation_Frequency         0
Lack_Of_Team_Connection            0
Employer_Mental_Health_Support     0
Mental_Health_Recommendations     13
dtype: int64

In [50]:
# Displaying updated column names
wfh_mentalHealth_surveyData.head()

Unnamed: 0,Timestamp,Age_Group,Gender,Work_Location,Industry,Work_Life_Balance,Weekly_Hours_Worked,Stress_Level,Stress_Factors,Social_Isolation_Frequency,Lack_Of_Team_Connection,Employer_Mental_Health_Support,Mental_Health_Recommendations
0,2025/01/27 12:02:17 PM GMT,25–34,Male,Hybrid,IT/Technology,4,35,2,Isolation from colleagues,Always,Yes,No,Stop remote work and have people come into the...
1,2025/01/27 12:02:39 PM GMT,25–34,Male,Remote,IT/Technology,5,40,2,Lack of clarity about whether I will be forced...,Never,No,Yes,
2,2025/01/27 12:11:36 PM GMT,35–44,Male,Remote,IT/Technology,5,39,1,Other (please specify);Less commute / Less tim...,Sometimes,No,Unsure,Regional hot desk options at some of the many ...
3,2025/01/27 12:14:54 PM GMT,45–54,Male,Remote,IT/Technology,5,39,3,On call over weekends,Sometimes,No,Unsure,Occasional lunchtime meetups if enough people ...
4,2025/01/27 12:25:24 PM GMT,35–44,Female,Remote,IT/Technology,2,9,5,Increased workload,Often,No,Unsure,Add remote workers team lead to identify possi...


In [51]:
# Writing to excel file for re-checks and reassurances

# wfh_mentalHealth_surveyData .to_csv("../Primary_Research/PR_Dataset/mid-way_WFH-Mental_Health_(Survey).csv", index=False)

### 2.2 Applying A Strategy To Handle The Missing Data

In [53]:
# Filling in "Stress_Factors" column for the worker with no stress with "None really" placeholder to match the other one in dataset so we can conduct analysis in the future if required:
wfh_mentalHealth_surveyData["Stress_Factors"].fillna("None really", inplace=True)

In [54]:
# Filling in "Mental_Health_Recommendations" column for the workers who didn't provide recommendation with "No Response":
wfh_mentalHealth_surveyData["Mental_Health_Recommendations"].fillna("No Response", inplace=True)

In [55]:
# Writing to excel file for re-checks and reassurances

# wfh_mentalHealth_surveyData .to_csv("../Primary_Research/PR_Dataset/mid-way_WFH-Mental_Health_(Survey).csv", index=False)

In [56]:
# Re-checking for missing values in each column
wfh_mentalHealth_surveyData .isnull().sum()

Timestamp                         0
Age_Group                         0
Gender                            0
Work_Location                     0
Industry                          0
Work_Life_Balance                 0
Weekly_Hours_Worked               0
Stress_Level                      0
Stress_Factors                    0
Social_Isolation_Frequency        0
Lack_Of_Team_Connection           0
Employer_Mental_Health_Support    0
Mental_Health_Recommendations     0
dtype: int64

In [57]:
# Displaying updated column names
wfh_mentalHealth_surveyData 

Unnamed: 0,Timestamp,Age_Group,Gender,Work_Location,Industry,Work_Life_Balance,Weekly_Hours_Worked,Stress_Level,Stress_Factors,Social_Isolation_Frequency,Lack_Of_Team_Connection,Employer_Mental_Health_Support,Mental_Health_Recommendations
0,2025/01/27 12:02:17 PM GMT,25–34,Male,Hybrid,IT/Technology,4,35,2,Isolation from colleagues,Always,Yes,No,Stop remote work and have people come into the...
1,2025/01/27 12:02:39 PM GMT,25–34,Male,Remote,IT/Technology,5,40,2,Lack of clarity about whether I will be forced...,Never,No,Yes,No Response
2,2025/01/27 12:11:36 PM GMT,35–44,Male,Remote,IT/Technology,5,39,1,Other (please specify);Less commute / Less tim...,Sometimes,No,Unsure,Regional hot desk options at some of the many ...
3,2025/01/27 12:14:54 PM GMT,45–54,Male,Remote,IT/Technology,5,39,3,On call over weekends,Sometimes,No,Unsure,Occasional lunchtime meetups if enough people ...
4,2025/01/27 12:25:24 PM GMT,35–44,Female,Remote,IT/Technology,2,9,5,Increased workload,Often,No,Unsure,Add remote workers team lead to identify possi...
5,2025/01/27 12:45:05 PM GMT,35–44,Male,Remote,IT/Technology,3,40,4,Difficulty managing time,Often,No,Yes,More clarity on availability/real world exampl...
6,2025/01/27 12:46:45 PM GMT,45–54,Female,Hybrid,IT/Technology,4,40,3,Other (please specify);I feel my stress levels...,Never,No,Yes,I use my camera every day for all meetings - i...
7,2025/01/27 12:52:35 PM GMT,45–54,Female,Remote,IT/Technology,4,38,3,Often work through lunch because of meetings,Rarely,No,Yes,It's good to have one or two colleagues you ge...
8,2025/01/27 12:52:39 PM GMT,25–34,Male,Remote,IT/Technology,5,40,1,None really,Never,No,No,No Response
9,2025/01/27 12:53:21 PM GMT,45–54,Female,Hybrid,IT/Technology,3,39 hours,3,Isolation from colleagues;Other (please specif...,Sometimes,Yes,Yes,Regular huddles/meetings. If possible onsite m...


# 3. Standardizing Data Formats
To ensure statistical compatibility, we must encode to categorical and ordinal data. I'm also ensuring that all of the data is in a consistent format and numeric scales.

### 3.1 Gender (Categorical)
- This column contains "Male", "Female".
- Since gender is categorical but not ordinal, I will use Label Encoding for simplicity and focus on gender that makes sense, 'Male' and 'Female'.

In [60]:
wfh_mentalHealth_surveyData['Gender'] = wfh_mentalHealth_surveyData['Gender'].map({'Male': 0, 'Female': 1})

### 3.2 Standardize Likert Scale Responses (Ordinal)
Since my survey uses a Likert scale (e.g., 1-5 for work-life balance as well as for stress-level), I'm ensuring that all the responses are all numeric and follow a consistent format.

In [62]:
# Standardizing Likert scale responses for work-life balance
wfh_mentalHealth_surveyData['Work_Life_Balance'] = pd.to_numeric(wfh_mentalHealth_surveyData['Work_Life_Balance'], errors='coerce')

In [63]:
wfh_mentalHealth_surveyData['Stress_Level'] = pd.to_numeric(wfh_mentalHealth_surveyData['Work_Life_Balance'], errors='coerce')

In [64]:
# Displaying Dataframe
wfh_mentalHealth_surveyData.head()

Unnamed: 0,Timestamp,Age_Group,Gender,Work_Location,Industry,Work_Life_Balance,Weekly_Hours_Worked,Stress_Level,Stress_Factors,Social_Isolation_Frequency,Lack_Of_Team_Connection,Employer_Mental_Health_Support,Mental_Health_Recommendations
0,2025/01/27 12:02:17 PM GMT,25–34,0,Hybrid,IT/Technology,4,35,4,Isolation from colleagues,Always,Yes,No,Stop remote work and have people come into the...
1,2025/01/27 12:02:39 PM GMT,25–34,0,Remote,IT/Technology,5,40,5,Lack of clarity about whether I will be forced...,Never,No,Yes,No Response
2,2025/01/27 12:11:36 PM GMT,35–44,0,Remote,IT/Technology,5,39,5,Other (please specify);Less commute / Less tim...,Sometimes,No,Unsure,Regional hot desk options at some of the many ...
3,2025/01/27 12:14:54 PM GMT,45–54,0,Remote,IT/Technology,5,39,5,On call over weekends,Sometimes,No,Unsure,Occasional lunchtime meetups if enough people ...
4,2025/01/27 12:25:24 PM GMT,35–44,1,Remote,IT/Technology,2,9,2,Increased workload,Often,No,Unsure,Add remote workers team lead to identify possi...


### 3.3 Encoding Multi-Class Categories

For categorical columns like Industry, and Region, we use One-Hot Encoding.

In [66]:
# wfh_mentalHealth_surveyData = pd.get_dummies(wfh_mentalHealth_surveyData , columns=['Industry'], drop_first=True)

In [67]:
# Displaying Dataframe
wfh_mentalHealth_surveyData.head(10)

Unnamed: 0,Timestamp,Age_Group,Gender,Work_Location,Industry,Work_Life_Balance,Weekly_Hours_Worked,Stress_Level,Stress_Factors,Social_Isolation_Frequency,Lack_Of_Team_Connection,Employer_Mental_Health_Support,Mental_Health_Recommendations
0,2025/01/27 12:02:17 PM GMT,25–34,0,Hybrid,IT/Technology,4,35,4,Isolation from colleagues,Always,Yes,No,Stop remote work and have people come into the...
1,2025/01/27 12:02:39 PM GMT,25–34,0,Remote,IT/Technology,5,40,5,Lack of clarity about whether I will be forced...,Never,No,Yes,No Response
2,2025/01/27 12:11:36 PM GMT,35–44,0,Remote,IT/Technology,5,39,5,Other (please specify);Less commute / Less tim...,Sometimes,No,Unsure,Regional hot desk options at some of the many ...
3,2025/01/27 12:14:54 PM GMT,45–54,0,Remote,IT/Technology,5,39,5,On call over weekends,Sometimes,No,Unsure,Occasional lunchtime meetups if enough people ...
4,2025/01/27 12:25:24 PM GMT,35–44,1,Remote,IT/Technology,2,9,2,Increased workload,Often,No,Unsure,Add remote workers team lead to identify possi...
5,2025/01/27 12:45:05 PM GMT,35–44,0,Remote,IT/Technology,3,40,3,Difficulty managing time,Often,No,Yes,More clarity on availability/real world exampl...
6,2025/01/27 12:46:45 PM GMT,45–54,1,Hybrid,IT/Technology,4,40,4,Other (please specify);I feel my stress levels...,Never,No,Yes,I use my camera every day for all meetings - i...
7,2025/01/27 12:52:35 PM GMT,45–54,1,Remote,IT/Technology,4,38,4,Often work through lunch because of meetings,Rarely,No,Yes,It's good to have one or two colleagues you ge...
8,2025/01/27 12:52:39 PM GMT,25–34,0,Remote,IT/Technology,5,40,5,None really,Never,No,No,No Response
9,2025/01/27 12:53:21 PM GMT,45–54,1,Hybrid,IT/Technology,3,39 hours,3,Isolation from colleagues;Other (please specif...,Sometimes,Yes,Yes,Regular huddles/meetings. If possible onsite m...


- For ordinal categories like Work_Location, Social_Isolation_Frequency and Employer_Mental_Health_Support columns, we map them to numerical values:

In [69]:
wfh_mentalHealth_surveyData['Work_Location'] = wfh_mentalHealth_surveyData['Work_Location'].map({'Remote': 1, 'Hybrid': 2, 'On-site': 3})

In [70]:
wfh_mentalHealth_surveyData['Social_Isolation_Frequency'] = wfh_mentalHealth_surveyData['Social_Isolation_Frequency'].map({'Always': 1, 'Never': 2, 'Sometimes': 3, 'Often': 4})

In [71]:
wfh_mentalHealth_surveyData['Employer_Mental_Health_Support'] = wfh_mentalHealth_surveyData['Employer_Mental_Health_Support'].map({'Yes': 1, 'No': 2, 'Unsure': 3})

In [72]:
# Displaying Dataframe
wfh_mentalHealth_surveyData.head(10)

Unnamed: 0,Timestamp,Age_Group,Gender,Work_Location,Industry,Work_Life_Balance,Weekly_Hours_Worked,Stress_Level,Stress_Factors,Social_Isolation_Frequency,Lack_Of_Team_Connection,Employer_Mental_Health_Support,Mental_Health_Recommendations
0,2025/01/27 12:02:17 PM GMT,25–34,0,,IT/Technology,4,35,4,Isolation from colleagues,1.0,Yes,2,Stop remote work and have people come into the...
1,2025/01/27 12:02:39 PM GMT,25–34,0,1.0,IT/Technology,5,40,5,Lack of clarity about whether I will be forced...,2.0,No,1,No Response
2,2025/01/27 12:11:36 PM GMT,35–44,0,1.0,IT/Technology,5,39,5,Other (please specify);Less commute / Less tim...,3.0,No,3,Regional hot desk options at some of the many ...
3,2025/01/27 12:14:54 PM GMT,45–54,0,1.0,IT/Technology,5,39,5,On call over weekends,3.0,No,3,Occasional lunchtime meetups if enough people ...
4,2025/01/27 12:25:24 PM GMT,35–44,1,1.0,IT/Technology,2,9,2,Increased workload,4.0,No,3,Add remote workers team lead to identify possi...
5,2025/01/27 12:45:05 PM GMT,35–44,0,1.0,IT/Technology,3,40,3,Difficulty managing time,4.0,No,1,More clarity on availability/real world exampl...
6,2025/01/27 12:46:45 PM GMT,45–54,1,2.0,IT/Technology,4,40,4,Other (please specify);I feel my stress levels...,2.0,No,1,I use my camera every day for all meetings - i...
7,2025/01/27 12:52:35 PM GMT,45–54,1,1.0,IT/Technology,4,38,4,Often work through lunch because of meetings,,No,1,It's good to have one or two colleagues you ge...
8,2025/01/27 12:52:39 PM GMT,25–34,0,1.0,IT/Technology,5,40,5,None really,2.0,No,2,No Response
9,2025/01/27 12:53:21 PM GMT,45–54,1,2.0,IT/Technology,3,39 hours,3,Isolation from colleagues;Other (please specif...,3.0,Yes,1,Regular huddles/meetings. If possible onsite m...


In [73]:
# Validating number of columns ,post encoding above:
print(wfh_mentalHealth_surveyData.shape)

(45, 13)


### 3.3 Converting Categorical Responses to Numeric
For categorical responses (e.g., "Yes"/"No"), you should convert them into numeric values for easier analysis.

In [75]:
# Converts "Yes"/"No" answers to 1/0 for "Lack_Of_Team_Connection"column:
wfh_mentalHealth_surveyData['Lack_Of_Team_Connection'] = wfh_mentalHealth_surveyData['Lack_Of_Team_Connection'].map({'Yes': 1, 'No': 0})

In [76]:
wfh_mentalHealth_surveyData

Unnamed: 0,Timestamp,Age_Group,Gender,Work_Location,Industry,Work_Life_Balance,Weekly_Hours_Worked,Stress_Level,Stress_Factors,Social_Isolation_Frequency,Lack_Of_Team_Connection,Employer_Mental_Health_Support,Mental_Health_Recommendations
0,2025/01/27 12:02:17 PM GMT,25–34,0,,IT/Technology,4,35,4,Isolation from colleagues,1.0,1.0,2,Stop remote work and have people come into the...
1,2025/01/27 12:02:39 PM GMT,25–34,0,1.0,IT/Technology,5,40,5,Lack of clarity about whether I will be forced...,2.0,0.0,1,No Response
2,2025/01/27 12:11:36 PM GMT,35–44,0,1.0,IT/Technology,5,39,5,Other (please specify);Less commute / Less tim...,3.0,0.0,3,Regional hot desk options at some of the many ...
3,2025/01/27 12:14:54 PM GMT,45–54,0,1.0,IT/Technology,5,39,5,On call over weekends,3.0,0.0,3,Occasional lunchtime meetups if enough people ...
4,2025/01/27 12:25:24 PM GMT,35–44,1,1.0,IT/Technology,2,9,2,Increased workload,4.0,0.0,3,Add remote workers team lead to identify possi...
5,2025/01/27 12:45:05 PM GMT,35–44,0,1.0,IT/Technology,3,40,3,Difficulty managing time,4.0,0.0,1,More clarity on availability/real world exampl...
6,2025/01/27 12:46:45 PM GMT,45–54,1,2.0,IT/Technology,4,40,4,Other (please specify);I feel my stress levels...,2.0,0.0,1,I use my camera every day for all meetings - i...
7,2025/01/27 12:52:35 PM GMT,45–54,1,1.0,IT/Technology,4,38,4,Often work through lunch because of meetings,,0.0,1,It's good to have one or two colleagues you ge...
8,2025/01/27 12:52:39 PM GMT,25–34,0,1.0,IT/Technology,5,40,5,None really,2.0,0.0,2,No Response
9,2025/01/27 12:53:21 PM GMT,45–54,1,2.0,IT/Technology,3,39 hours,3,Isolation from colleagues;Other (please specif...,3.0,1.0,1,Regular huddles/meetings. If possible onsite m...


In [77]:
# Checking again for missing values after dropping rows
wfh_mentalHealth_surveyData.isnull().sum()

Timestamp                          0
Age_Group                          0
Gender                             0
Work_Location                      2
Industry                           0
Work_Life_Balance                  0
Weekly_Hours_Worked                0
Stress_Level                       0
Stress_Factors                     0
Social_Isolation_Frequency        11
Lack_Of_Team_Connection            1
Employer_Mental_Health_Support     0
Mental_Health_Recommendations      0
dtype: int64

### 4.2 Mental_Health_Condition (Categorical)
- This column contains "Anxiety", "Depression", "None", etc.
- Since this is also nominal (not ordered), we can use One-Hot Encoding.

One-Hot Encoding Mental Health Condition:

In [79]:
# wfh_mentalHealth_surveyData = pd.get_dummies(wfh_mentalHealth_surveyData, columns=['Mental_Health_Condition'], prefix='MHR', drop_first=True)

In [80]:
# Creating a new "stress score" column based on existing data
wfh_mentalHealth_surveyData['stress_score'] = wfh_mentalHealth_surveyData['Work_Life_Balance'] - wfh_mentalHealth_surveyData['Social_Isolation_Frequency']

In [81]:
wfh_cleanedMentalHealth_surveyData = wfh_mentalHealth_surveyData

In [82]:
wfh_cleanedMentalHealth_surveyData

Unnamed: 0,Timestamp,Age_Group,Gender,Work_Location,Industry,Work_Life_Balance,Weekly_Hours_Worked,Stress_Level,Stress_Factors,Social_Isolation_Frequency,Lack_Of_Team_Connection,Employer_Mental_Health_Support,Mental_Health_Recommendations,stress_score
0,2025/01/27 12:02:17 PM GMT,25–34,0,,IT/Technology,4,35,4,Isolation from colleagues,1.0,1.0,2,Stop remote work and have people come into the...,3.0
1,2025/01/27 12:02:39 PM GMT,25–34,0,1.0,IT/Technology,5,40,5,Lack of clarity about whether I will be forced...,2.0,0.0,1,No Response,3.0
2,2025/01/27 12:11:36 PM GMT,35–44,0,1.0,IT/Technology,5,39,5,Other (please specify);Less commute / Less tim...,3.0,0.0,3,Regional hot desk options at some of the many ...,2.0
3,2025/01/27 12:14:54 PM GMT,45–54,0,1.0,IT/Technology,5,39,5,On call over weekends,3.0,0.0,3,Occasional lunchtime meetups if enough people ...,2.0
4,2025/01/27 12:25:24 PM GMT,35–44,1,1.0,IT/Technology,2,9,2,Increased workload,4.0,0.0,3,Add remote workers team lead to identify possi...,-2.0
5,2025/01/27 12:45:05 PM GMT,35–44,0,1.0,IT/Technology,3,40,3,Difficulty managing time,4.0,0.0,1,More clarity on availability/real world exampl...,-1.0
6,2025/01/27 12:46:45 PM GMT,45–54,1,2.0,IT/Technology,4,40,4,Other (please specify);I feel my stress levels...,2.0,0.0,1,I use my camera every day for all meetings - i...,2.0
7,2025/01/27 12:52:35 PM GMT,45–54,1,1.0,IT/Technology,4,38,4,Often work through lunch because of meetings,,0.0,1,It's good to have one or two colleagues you ge...,
8,2025/01/27 12:52:39 PM GMT,25–34,0,1.0,IT/Technology,5,40,5,None really,2.0,0.0,2,No Response,3.0
9,2025/01/27 12:53:21 PM GMT,45–54,1,2.0,IT/Technology,3,39 hours,3,Isolation from colleagues;Other (please specif...,3.0,1.0,1,Regular huddles/meetings. If possible onsite m...,0.0


# 7.  Saving Cleaned Dataset
Saving pre-processed dataset, which is ready to be used for analysis:

In [84]:
wfh_cleanedMentalHealth_surveyData.to_csv("../Primary_Research/PR_Dataset/cleaned_WFH-Mental_Health_(Survey).csv", index=False)