# Final Project 

**General criteria:** 

- Ensure that everything you commit in this notebook is your own work (meaning you **understand it clearly**).
- Before committing: `Restart Kernel & Run all cells`.


## 1. Collect data

### 1.1 Overview
**Dataset Source:**
[Remote Work & Mental Health Dataset on Kaggle](https://www.kaggle.com/datasets/waqi786/remote-work-and-mental-health?fbclid=IwY2xjawFrSKZleHRuA2FlbQIxMAABHeONLRrPuU1AbC_pHea8QyWNYYMqW-t0Tw_xZtxvDGbldU1ypWS0-AzKKg_aem_jAr-PidmQqlOTGowulNZBA)

🌿 This dataset examines the impact of remote work on employees' mental well-being. It focuses on how different work arrangements may influence stress levels, work-life balance, and mental health conditions. 

📈 Our analysis aims to:
 - Address key questions regarding the increasing prevalence of remote work and its effects across industries and regions.
 - Offer actionable insights for researchers, HR professionals, and businesses to evaluate its impact on productivity and employee well-being.
 - Highlight the importance of prioritizing mental health in organizational policies and practices.

### 1.2 License
This dataset is released under the `Apache 2.0` license, which permits use, modification, and redistribution as long as the license text and attribution notices are retained.

### 1.3 How the data are collected

#### Sources

The dataset draws from a variety of sources to ensure a comprehensive understanding of the relationship between remote work and mental health. It incorporates responses from surveys distributed across social media platforms, professional networks, and online forums. These diverse channels helped capture a wide range of experiences and perspectives, enriching the dataset with valuable insights.

#### Collection Methodology

Data collection involved designing a structured questionnaire that included both quantitative and qualitative questions. The survey was distributed digitally, allowing respondents to share their experiences regarding remote work's impact on their mental well-being. Responses were anonymized to maintain privacy, ensuring participants felt comfortable providing honest feedback. This methodology enables a thorough analysis of the mental health challenges and benefits associated with remote work environments.

## 2. Data exploration

## 3. Ask questions and answer them

### 3.1 Question 1
@haichaukhuu

**Question:** \
Do employees with access to mental health resources report lower stress levels and higher job satisfaction compared to those without access?

**What are the benefits of finding the answer?**  
Understanding the link between access to mental health resources, stress levels, and job satisfaction demonstrates the importance of such resources in improving employee well-being. It may provide motivation for companies and organizations to invest in mental health support programs to boost productivity and employee satisfaction.
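One possible starting point for this question is a row-normalised cross-tabulation of an outcome against resource access. The sketch below is illustrative only: the helper name `share_by_access` and the `Yes`/`No` and `Low`/`Medium`/`High` labels in the sample are assumptions, and `Satisfaction_with_Remote_Work` would only be a proxy for job satisfaction.

```python
import pandas as pd


def share_by_access(df: pd.DataFrame, outcome: str) -> pd.DataFrame:
    """Share of each outcome label within each resource-access group."""
    return pd.crosstab(
        df["Access_to_Mental_Health_Resources"],
        df[outcome],
        normalize="index",  # each row sums to 1
    )


# Tiny made-up sample, used only to show the shape of the result.
sample = pd.DataFrame({
    "Access_to_Mental_Health_Resources": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "Stress_Level": ["Low", "High", "High", "Medium", "High", "Low"],
})
shares = share_by_access(sample, "Stress_Level")
print(shares)
```

On the real data the same call could be repeated with `outcome="Satisfaction_with_Remote_Work"` to compare both outcomes side by side.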

### 3.2 Question 2
@pigpig1524
- **Question**: Does the frequency of virtual meetings correlate with increased stress levels, and what is the optimal meeting frequency for minimizing stress while maintaining productivity?
- **The benefits of finding the answer**: Provides actionable recommendations to balance virtual meeting schedules and employee well-being.
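
To look for a meeting frequency associated with lower stress, one option is to bin the meeting counts and compare the average stress per bin. This is a minimal sketch under stated assumptions: the `Low=1, Medium=2, High=3` encoding, the bin edges, and the helper name `mean_stress_by_meeting_bin` are all illustrative choices, not part of the dataset.

```python
import pandas as pd

# Assumed ordinal encoding for the stress labels.
STRESS_SCORES = {"Low": 1, "Medium": 2, "High": 3}


def mean_stress_by_meeting_bin(df: pd.DataFrame, bins: list) -> pd.Series:
    """Average encoded stress within each meeting-count bin."""
    stress = df["Stress_Level"].map(STRESS_SCORES)
    meeting_bin = pd.cut(df["Number_of_Virtual_Meetings"], bins=bins, include_lowest=True)
    return stress.groupby(meeting_bin, observed=True).mean()


# Made-up sample, used only to demonstrate the call.
sample = pd.DataFrame({
    "Number_of_Virtual_Meetings": [0, 2, 4, 6, 9, 12, 14],
    "Stress_Level": ["Low", "Low", "Medium", "Medium", "High", "High", "Medium"],
})
by_bin = mean_stress_by_meeting_bin(sample, bins=[0, 5, 10, 15])
print(by_bin)
```

A bin with a noticeably lower mean would be a candidate for the "optimal" range, although a mean over an ordinal scale should be read with caution.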

#### Data preprocessing

In [3]:
import pandas as pd
df = pd.read_csv('data/Impact_of_Remote_Work_on_Mental_Health.csv')
df.columns

Index(['Employee_ID', 'Age', 'Gender', 'Job_Role', 'Industry',
       'Years_of_Experience', 'Work_Location', 'Hours_Worked_Per_Week',
       'Number_of_Virtual_Meetings', 'Work_Life_Balance_Rating',
       'Stress_Level', 'Mental_Health_Condition',
       'Access_to_Mental_Health_Resources', 'Productivity_Change',
       'Social_Isolation_Rating', 'Satisfaction_with_Remote_Work',
       'Company_Support_for_Remote_Work', 'Physical_Activity', 'Sleep_Quality',
       'Region'],
      dtype='object')

In [11]:
# Keep remote workers only and compare company support with satisfaction.
temp = df.loc[df.Work_Location == 'Remote', ['Company_Support_for_Remote_Work', 'Satisfaction_with_Remote_Work']].copy()

# Encode the ordinal satisfaction labels as -1 / 0 / 1 so they can be correlated.
satisfaction_scores = {'Unsatisfied': -1, 'Neutral': 0, 'Satisfied': 1}
temp['Satisfaction_with_Remote_Work'] = temp['Satisfaction_with_Remote_Work'].map(satisfaction_scores)
temp = temp.convert_dtypes()

temp.corr()

| | Company_Support_for_Remote_Work | Satisfaction_with_Remote_Work |
|---|---|---|
| Company_Support_for_Remote_Work | 1.0 | 0.033752 |
| Satisfaction_with_Remote_Work | 0.033752 | 1.0 |
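
Because the satisfaction labels are ordinal, a rank-based (Spearman) correlation may be a useful cross-check on the Pearson value. The sketch below computes it as the Pearson correlation of average ranks, which is the definition of Spearman's coefficient. The helper name `rank_correlation` and the sample values are made up for illustration.

```python
import pandas as pd


def rank_correlation(a: pd.Series, b: pd.Series) -> float:
    """Spearman correlation, computed as the Pearson correlation of average ranks."""
    return a.rank().corr(b.rank())


# Made-up values standing in for support ratings and encoded satisfaction.
support = pd.Series([1, 2, 3, 4, 5])
satisfaction = pd.Series([-1, -1, 0, 1, 1])
rho = rank_correlation(support, satisfaction)
print(rho)
```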


In [59]:
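# Manual Pearson correlation between meeting count and stress level.
# `temp` from the previous cell lacks these columns, so rebuild it here.
# Assumption: remote workers only, with Stress_Level encoded Low=1, Medium=2, High=3.
temp = df.loc[df.Work_Location == 'Remote', ['Number_of_Virtual_Meetings', 'Stress_Level']].copy()
temp['Stress_Level'] = temp['Stress_Level'].map({'Low': 1, 'Medium': 2, 'High': 3})

x = temp['Number_of_Virtual_Meetings'] - temp['Number_of_Virtual_Meetings'].mean()
x_var = temp['Number_of_Virtual_Meetings'].var()
y = temp['Stress_Level'] - temp['Stress_Level'].mean()
y_var = temp['Stress_Level'].var()

# Sample covariance (n - 1 denominator, matching the default of var()).
cov = (x * y).sum() / (len(temp) - 1)

# Pearson r = cov(x, y) / (std_x * std_y)
cov / (x_var * y_var)**0.5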


-0.0016872815192738116
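
The hand-written formula above can be sanity-checked against pandas' built-in `Series.corr`, which defaults to Pearson. The sketch below wraps the same steps in a hypothetical helper, `manual_pearson`, and compares the two on made-up numbers (not taken from the dataset).

```python
import pandas as pd


def manual_pearson(x: pd.Series, y: pd.Series) -> float:
    """Pearson r from the sample covariance and sample variances."""
    dx = x - x.mean()
    dy = y - y.mean()
    cov = (dx * dy).sum() / (len(x) - 1)
    return cov / (x.var() * y.var()) ** 0.5


# Made-up values standing in for meeting counts and encoded stress levels.
meetings = pd.Series([1, 3, 5, 7, 9, 11])
stress = pd.Series([1, 1, 2, 2, 3, 3])

r_manual = manual_pearson(meetings, stress)
r_builtin = meetings.corr(stress)  # method="pearson" by default
print(r_manual, r_builtin)
```

If the two values agree, `temp['Number_of_Virtual_Meetings'].corr(temp['Stress_Level'])` could replace the manual computation in the notebook.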

### 3.3 Question 3
@monster1909
- **Question**: Is there a difference in stress levels and sleep quality across different industries?
- **The benefits of finding the answer**: Shows how stress levels and sleep quality vary across industries, providing insight into which sectors may need targeted interventions to improve employee well-being and productivity.
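
A first pass at this question could encode both ordinal columns and average them per industry. This is only a sketch: the `Poor`/`Average`/`Good` sleep labels, both numeric encodings, and the helper name `industry_summary` are assumptions to be checked against the actual column values.

```python
import pandas as pd

# Assumed labels and ordinal encodings (verify with df[col].unique() first).
STRESS_SCORES = {"Low": 1, "Medium": 2, "High": 3}
SLEEP_SCORES = {"Poor": 1, "Average": 2, "Good": 3}


def industry_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean encoded stress and sleep quality for each industry."""
    scored = pd.DataFrame({
        "Industry": df["Industry"],
        "stress_score": df["Stress_Level"].map(STRESS_SCORES),
        "sleep_score": df["Sleep_Quality"].map(SLEEP_SCORES),
    })
    return scored.groupby("Industry")[["stress_score", "sleep_score"]].mean()


# Made-up sample, used only to demonstrate the call.
sample = pd.DataFrame({
    "Industry": ["IT", "IT", "Healthcare", "Healthcare"],
    "Stress_Level": ["High", "Medium", "Low", "Medium"],
    "Sleep_Quality": ["Poor", "Average", "Good", "Good"],
})
summary = industry_summary(sample)
print(summary)
```

Differences in these means are descriptive only; a test of independence on the raw label counts would be needed before claiming that industries genuinely differ.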