<a href="https://colab.research.google.com/github/olumideadekunle/Data-Sharing-among-Business/blob/main/Capstone_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Proactive Insights for Employee Well-being: Predicting Burnout with Machine Learning

In today's fast-paced world, most of us have more to keep up with than ever, and constant multitasking and task-switching take a toll on mental health. World Mental Health Day, celebrated annually on October 10th, was established to raise awareness about mental health issues and mobilize support worldwide.

After several months of moving around the data space, you now work as a data scientist at NeuroWell Analytics. At last Friday's weekly stand-up, the Human Resources Manager raised **employee burnout** as one of the challenges most often highlighted by NeuroWell Analytics' client companies. In response, your boss assigned you the task of addressing this issue. He decided it would be best to build a solution that predicts employee burnout rates from historical data, so that mental health concerns can be addressed proactively.


## Overview - a bit about the company

NeuroWell Analytics is a global leader in workplace well-being and productivity solutions. Since 2015, the organization has specialized in leveraging data science and machine learning to enhance employee engagement and mental health. With a team of psychologists, data scientists, and HR specialists, NeuroWell Analytics partners with companies across various industries to create a more resilient and thriving workforce.

![NeuroWell](https://drive.google.com/uc?export=view&id=1LbtLJQjK-UKdbqHaUF4lNbUmqFsjdSWH)


The organization’s mission is to empower companies to take proactive steps in addressing mental health challenges by providing actionable insights derived from data. NeuroWell Analytics combines cutting-edge technology with evidence-based research to deliver comprehensive solutions, including predictive analytics, well-being assessments, and customized intervention strategies.

Employee burnout is a growing concern globally, impacting productivity, morale, and overall organizational health. Using data provided by your company, you will analyze employee profiles and develop a machine learning model that predicts burnout rates from factors such as work environment, resource allocation, and mental fatigue scores. You will also turn the results into actionable insights that help your company mitigate burnout and foster a healthier work environment.

## Objective

The primary objectives of this project are:

- Data Understanding and Exploration: Analyze the dataset to identify patterns, trends, and correlations related to employee burnout.
- Data Preprocessing: Handle missing values, encode categorical data, and normalize numerical features.
- Feature Engineering: Derive meaningful features that improve the performance of your machine learning model.
- Model Development: Build and evaluate predictive models to estimate employee burnout rates.
- Insights and Recommendations: Provide actionable recommendations to reduce burnout based on model outcomes.

## About the data

| Column Name           | Description                                             |
|-----------------------|---------------------------------------------------------|
| Employee ID           | Unique ID of the employee                               |
| Date of Joining       | Date on which the employee joined the company           |
| Gender                | Gender of the employee                                  |
| Company Type          | Type of company e.g., Service-based, Product-based      |
| WFH Setup Available   | Whether proper work-from-home setup is available or not |
| Designation           | Seniority level of the employee, given as a code        |
| Resource Allocation   | Hours allocated per day                                 |
| Mental Fatigue Score  | Stress rating provided by employees                     |
| Burn Rate             | Rate of saturation or burnout rate [Target]             |


**You will find the dataset in the project folder named "[NeuroWell](https://drive.google.com/drive/folders/1OVKUNsOYFbczju7z836PrDnS5kL0WeAd?usp=drive_link)". The folder contains the train.csv and test.csv datasets for this problem. Use them accordingly.**

## Tasks

**Phase 1: Understanding the Problem**

- Research and write a brief summary about employee burnout and its organizational impacts.
- Discuss the importance of using data to predict and prevent burnout.

**Phase 2: Exploratory Data Analysis (EDA)**

- Load the dataset and inspect its structure.
- Generate descriptive statistics for all variables.
- Visualize relationships between variables (e.g., scatter plots, histograms, box plots).
- Identify missing values and propose strategies to handle them.
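
The Phase 2 steps above could start along the following lines. This is only a minimal sketch: the column names come from the data table, but the small frame below is synthetic (every value is made up for illustration), and in the real notebook it would be replaced by something like `pd.read_csv("train.csv")`, where the file path is an assumption about where the data was saved.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for train.csv; all values are invented for illustration.
# In the notebook this would be: df = pd.read_csv("train.csv")  # assumed path
df = pd.DataFrame({
    "Employee ID": ["e1", "e2", "e3", "e4", "e5", "e6"],
    "Date of Joining": ["2008-01-15", "2008-03-02", "2008-06-20",
                        "2008-09-11", "2008-11-30", "2008-12-05"],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Company Type": ["Service", "Product", "Service",
                     "Service", "Product", "Service"],
    "WFH Setup Available": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "Designation": [1, 2, 2, 3, 1, 4],
    "Resource Allocation": [2.0, 4.0, np.nan, 7.0, 3.0, 9.0],
    "Mental Fatigue Score": [3.1, 5.0, 5.8, np.nan, 4.2, 9.0],
    "Burn Rate": [0.20, 0.45, 0.50, 0.70, 0.30, 0.95],
})

# Inspect the structure: number of rows/columns and the type of each column.
print(df.shape)
print(df.dtypes)

# Descriptive statistics for numeric and categorical columns alike.
summary = df.describe(include="all")
print(summary)

# Count missing values per column to decide on a handling strategy.
missing = df.isna().sum()
print(missing)

# Pairwise correlations between the numeric columns (NaNs are skipped pairwise).
corr = df.select_dtypes("number").corr()
print(corr["Burn Rate"].sort_values(ascending=False))
```

From here, plots such as `df.plot.scatter(x="Mental Fatigue Score", y="Burn Rate")` or `df["Burn Rate"].plot.hist()` are one way to cover the visualization step.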

**Phase 3: Data Preprocessing**

- Encode categorical variables like Gender and Company Type.
- Handle missing or inconsistent data in Mental Fatigue Score and Resource Allocation.
- Normalize numerical variables like Resource Allocation and Designation.
- Create new features (e.g., tenure derived from Date of Joining).
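
One possible shape for the Phase 3 steps is sketched below. It is an illustration under several assumptions rather than a prescribed pipeline: the data is synthetic, median imputation and min-max scaling are just one reasonable choice each, and both the `preprocess` helper and the `REFERENCE_DATE` used to compute tenure are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

# Synthetic rows using the column names from the data table (values invented).
df = pd.DataFrame({
    "Date of Joining": ["2008-01-15", "2008-06-20", "2008-09-11", "2008-12-05"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Company Type": ["Service", "Product", "Service", "Service"],
    "WFH Setup Available": ["Yes", "No", "Yes", "No"],
    "Designation": [1, 2, 3, 4],
    "Resource Allocation": [2.0, np.nan, 7.0, 9.0],
    "Mental Fatigue Score": [3.0, 5.0, np.nan, 9.0],
})

# Assumed cutoff for measuring tenure; pick a date that suits the real data.
REFERENCE_DATE = pd.Timestamp("2009-01-01")


def preprocess(frame, reference_date=REFERENCE_DATE):
    """Hypothetical helper: impute, derive tenure, scale, and encode."""
    out = frame.copy()

    # Fill missing numeric values with the column median.
    for col in ["Resource Allocation", "Mental Fatigue Score"]:
        out[col] = out[col].fillna(out[col].median())

    # Derive tenure in days from the joining date, then drop the raw date.
    joined = pd.to_datetime(out.pop("Date of Joining"))
    out["Tenure Days"] = (reference_date - joined).dt.days

    # Min-max scale selected numeric columns into the range [0, 1].
    for col in ["Designation", "Resource Allocation", "Tenure Days"]:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0

    # One-hot encode the categorical columns, dropping one level per column.
    categorical = ["Gender", "Company Type", "WFH Setup Available"]
    return pd.get_dummies(out, columns=categorical, drop_first=True, dtype=int)


clean = preprocess(df)
print(clean)
```

When working with both train.csv and test.csv, the medians and min/max values should be computed on the training set only and then reused for the test set, so that no information leaks from the test data.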

**Phase 4: Model Development**

- Use the training and testing sets as required.
- Experiment with multiple algorithms, such as Linear Regression, Random Forest, and Gradient Boosting.
- Evaluate model performance using metrics like RMSE, MAE, and R-squared.
- Optimize the best-performing model using hyperparameter tuning.
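
A rough sketch of the Phase 4 loop, using scikit-learn, is shown below. The features and target are randomly generated stand-ins (not the NeuroWell data), the parameter grid is deliberately tiny, and tuning the Random Forest is only an example: in practice the model to tune would be whichever one performs best on the real validation set.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic features standing in for the preprocessed training data.
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.integers(0, 6, n),   # stand-in for Designation
    rng.uniform(1, 10, n),   # stand-in for Resource Allocation
    rng.uniform(0, 10, n),   # stand-in for Mental Fatigue Score
])
# Invented target: a noisy linear mix of the features, clipped to [0, 1].
y = np.clip(
    0.02 * X[:, 0] + 0.03 * X[:, 1] + 0.06 * X[:, 2] + rng.normal(0, 0.03, n),
    0, 1,
)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each candidate and record RMSE, MAE, and R-squared on the validation split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    scores[name] = {
        "RMSE": float(np.sqrt(mean_squared_error(y_val, pred))),
        "MAE": float(mean_absolute_error(y_val, pred)),
        "R2": float(r2_score(y_val, pred)),
    }
    print(name, scores[name])

# Example hyperparameter search with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_, -search.best_score_)
```

Because the scoring string is the negated RMSE, `-search.best_score_` is the cross-validated RMSE of the best parameter combination.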

**Phase 5: Insights, Recommendations, and Solution Deployment**

- Analyze the importance of features using tools like feature importance scores or SHAP values.
- Write a report summarizing key findings and predictions.
- Suggest actionable recommendations for the organization based on your insights.
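
For the feature-importance step, one option that needs nothing beyond scikit-learn is to compare a tree model's built-in importances with permutation importance, as sketched below. The data is synthetic, and the `Noise` column is a made-up feature included only to show what an uninformative variable looks like. SHAP values, which the task also mentions, would require the separate `shap` package and are not shown here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic data: the target depends mostly on the fatigue score by design.
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "Designation": rng.integers(0, 6, n),
    "Resource Allocation": rng.uniform(1, 10, n),
    "Mental Fatigue Score": rng.uniform(0, 10, n),
    "Noise": rng.normal(0, 1, n),  # hypothetical uninformative feature
})
y = (
    0.08 * X["Mental Fatigue Score"]
    + 0.02 * X["Resource Allocation"]
    + rng.normal(0, 0.02, n)
)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances that come with the fitted forest (sum to 1).
impurity = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

# Permutation importance: drop in score when one column is shuffled.
perm = permutation_importance(model, X, y, n_repeats=5, random_state=0)
permutation = pd.Series(
    perm.importances_mean, index=X.columns
).sort_values(ascending=False)

print(impurity)
print(permutation)
```

On real data, permutation importance is more informative when computed on a held-out set rather than on the rows the model was trained on.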

**Reflection Questions:**

- What challenges did you face while handling missing data? How did you resolve them?
- Which machine learning algorithm performed the best, and why do you think it outperformed others?
- If you had access to additional data, what would you like to include, and how might it improve your model?

## Deliverables

- An Exploratory Data Analysis (EDA) notebook with visualizations and data cleaning steps (2 weeks)
- An organized Jupyter Notebook detailing the necessary project phases (2 weeks)
- An interactive Streamlit app, hosted on Streamlit Cloud, containing insights, visualizations, and an interactive prediction tool (2 weeks)
- A final report summarizing findings, model performance, and recommendations (2 weeks)

**Timeline - 8 weeks**