# **Project Name**    - Mental Health in Tech Workplace – Exploratory Data Analysis



##### **Project Type**    - Exploratory Data Analysis (EDA) Project
##### **Contribution**    - Individual
##### **Name** - Himank Bhardwaj


# **Project Summary -**

This project focuses on performing an in-depth Exploratory Data Analysis (EDA) on the Mental Health in Tech Workplace survey dataset. The dataset originates from a survey conducted among professionals working in the technology sector and aims to understand attitudes, awareness, and organizational support related to mental health. Mental health has become a critical factor influencing employee productivity, job satisfaction, and retention, making this analysis highly relevant for modern organizations.

The project begins with understanding the structure and nature of the dataset, including its columns, data types, and overall size. Initial exploration involved examining the first few records, identifying numerical and categorical variables, and reviewing statistical summaries for numerical features such as age. This step helped build familiarity with the dataset and guided further analytical decisions.

Data wrangling and preprocessing were performed to ensure data quality and reliability. Missing values in categorical variables were handled using meaningful labels instead of removing records, preserving the integrity of the dataset. Duplicate records were removed to avoid biased insights, unrealistic age values were filtered out, and inconsistent gender entries were standardized. These steps left the dataset clean, structured, and ready for analysis, and the notebook is written to execute end to end in a single run, in line with production-grade practice.

The core of the project involved data visualization and storytelling using the UBM framework—Univariate, Bivariate, and Multivariate analysis. Univariate analysis focused on understanding individual variables such as age distribution, gender composition, mental health treatment status, family history, work interference, remote work adoption, availability of benefits, and wellness program awareness. These visualizations provided foundational insights into workforce demographics and organizational mental health practices.

Bivariate analysis explored relationships between pairs of variables, such as family history versus treatment, gender versus treatment, age versus treatment, work interference versus treatment, benefits versus treatment, and willingness to discuss mental health with supervisors and coworkers. These analyses revealed clear associations between organizational support, openness, and treatment-seeking behavior.

Multivariate analysis extended this exploration by examining interactions among three or more variables simultaneously. Relationships between gender, family history, and treatment; age, work interference, and treatment; benefits, care options awareness, and treatment; and remote work, work interference, and treatment were analyzed. These insights highlighted how combined demographic, workplace, and policy-related factors influence mental health outcomes.

Overall, the project demonstrates how data-driven insights can help organizations identify gaps in mental health support, improve employee well-being, and design effective workplace policies. The findings emphasize that organizations prioritizing mental health through benefits, awareness, supportive leadership, and flexible work arrangements are more likely to foster a productive, engaged, and resilient workforce.

# **GitHub Link -**

https://github.com/himankbhardwaj21/mental-health-in-tech-survey-eda/blob/main/Himank_EDA_Submission_Template.ipynb

# **Problem Statement -**


Mental health issues among employees in the technology sector are often underreported and insufficiently addressed due to stigma, lack of awareness, and inadequate organizational support. These challenges can negatively impact employee productivity, engagement, and retention. There is a need to analyze real-world data to understand employee perceptions, workplace policies, and factors influencing mental health treatment and work performance in the tech industry.

#### **Define Your Business Objective?**

The primary business objective of this project is to analyze mental health trends in the tech workplace and identify key factors that influence employee well-being, treatment-seeking behavior, and work performance. Using exploratory data analysis, the objective is to provide actionable insights that help organizations design effective mental health policies, improve awareness of support systems, reduce work interference, and ultimately enhance productivity and employee retention.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     These additional credits will give an advantage over other students during Star Student selection.

     [ Note: - Deployment Ready Code is defined as: the whole .ipynb notebook should be executable in one go without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart follows the format below and answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In this step, essential Python libraries are imported for data analysis and visualization.
NumPy is used for numerical operations, Pandas for data manipulation, Matplotlib and Seaborn for creating meaningful visualizations.
The %matplotlib inline command ensures that all plots are displayed within the notebook.

### Dataset Loading

In [None]:
df = pd.read_csv('survey.csv')

The dataset is loaded using Pandas’ read_csv() function.
This step reads the survey data into a DataFrame, making it ready for exploration, cleaning, and visualization.
Successful loading confirms that the dataset path is correct and accessible.
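The general guidelines ask for exception handling and a notebook that runs in one go. A minimal sketch of a more defensive loading step is shown below; the helper name `load_survey` is hypothetical, and it assumes the file is still called `survey.csv` and sits next to the notebook.

```python
import pandas as pd

def load_survey(path="survey.csv"):
    """Load the survey CSV, failing with a clear message if the file is missing or empty.

    Hypothetical helper for illustration; the original notebook calls pd.read_csv directly.
    """
    try:
        data = pd.read_csv(path)
    except FileNotFoundError as exc:
        # Re-raise with a more actionable message, keeping the original traceback
        raise FileNotFoundError(
            f"Could not find '{path}'. Check that the CSV sits next to the notebook."
        ) from exc
    except pd.errors.EmptyDataError as exc:
        # The file exists but has no header or rows to parse
        raise ValueError(f"'{path}' exists but contains no data.") from exc
    print(f"Loaded {data.shape[0]} rows and {data.shape[1]} columns from '{path}'.")
    return data
```

In the notebook, `df = load_survey('survey.csv')` could replace the plain `read_csv` call; chaining with `from exc` keeps the underlying pandas error visible in the traceback.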

### Dataset First View

In [None]:
df.head()

Displaying the first five rows of the dataset helps in understanding the overall structure, column names, and sample responses provided by survey participants.
This step gives an initial overview of how the data is organized.

### Dataset Rows & Columns count

In [None]:
df.shape

This step shows the total number of rows and columns in the dataset.
Each row represents an individual survey response, while each column corresponds to a specific survey question related to mental health and workplace environment.

### Dataset Information

In [None]:
df.info()

The info() function provides a concise summary of the dataset, including column names, data types, and the presence of missing values.
This information is crucial for identifying data quality issues and deciding appropriate data cleaning strategies.

In [None]:
df.describe()

This section provides statistical insights for numerical columns such as age.
It helps identify the central tendency, spread, and any potential outliers present in the data.

#### Duplicate Values

In [None]:
# Checking number of duplicate rows in the dataset
duplicate_count = df.duplicated().sum()
print(f"Number of duplicate rows in the dataset: {duplicate_count}")

Duplicate values refer to identical rows present more than once in the dataset.
Checking for duplicates ensures data integrity and prevents biased analysis.
In this dataset, the number of duplicate rows is displayed above. If duplicates exist, they can be safely removed to avoid repeated survey responses affecting insights.

#### Missing Values/Null Values

In [None]:
# Checking missing/null values in each column
missing_values = df.isnull().sum()
missing_values

Missing values indicate unanswered or unavailable survey responses.
Identifying missing values is important because they can affect statistical calculations and visualizations.
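Columns such as work_interfere and comments contain missing values; these two are filled with explicit labels in the Data Wrangling section. The state column is also expected to be empty for respondents outside the United States, because that question applies only to US residents.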
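Raw counts are easier to interpret next to the percentage of rows they represent. The sketch below uses only standard pandas methods (`isnull`, `sum`, `mean`); the helper name `missing_summary` is illustrative and not part of the original notebook.

```python
import pandas as pd

def missing_summary(data):
    """Count and percentage of missing values per column, keeping only columns with gaps."""
    summary = pd.DataFrame({
        "missing_count": data.isnull().sum(),
        # mean() of a boolean mask is the fraction of missing rows
        "missing_pct": data.isnull().mean().mul(100).round(2),
    })
    # Drop fully populated columns and list the most affected first
    return summary[summary["missing_count"] > 0].sort_values("missing_count", ascending=False)
```

Calling `missing_summary(df)` before wrangling lists only the columns that still have gaps, largest first.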

In [None]:
# Visualizing missing values using heatmap
plt.figure(figsize=(12,6))
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values Heatmap')
plt.show()

Why did you pick the specific chart?

A heatmap is an effective visualization technique to identify the presence and pattern of missing values across the entire dataset.
It allows quick detection of columns with high missing data and helps determine whether missing values are randomly distributed or concentrated in specific variables.

What is/are the insight(s) found from the chart?

The heatmap shows that missing values are concentrated in a few columns, most visibly comments and work_interfere (and state, which only US respondents answer), while most other variables have little to no missing data.
This indicates that the dataset is largely complete, with missing values occurring mainly in optional, conditional, or sensitive survey questions.

Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Identifying columns with missing values helps organizations understand areas where employees may hesitate to respond, allowing improvements in survey design and data collection strategies.

**Negative Impact (if ignored)**:
Ignoring missing values in key variables like work_interfere could lead to biased analysis and inaccurate conclusions regarding employee productivity and mental health impact.

### What did you know about your dataset?

This dataset is based on a mental health survey conducted among professionals working in the tech industry.
It captures demographic details, workplace policies, mental health awareness, and attitudes toward seeking treatment.
The dataset helps analyze how organizational culture and support systems impact employee mental wellbeing.

## ***2. Understanding Your Variables***

In [None]:
# Displaying all column names in the dataset
df.columns

This step lists all the variables (columns) present in the dataset.
Each column represents a specific question from the mental health survey conducted among tech professionals.

In [None]:
# Statistical summary of numerical columns
df.describe()

The describe() function provides a statistical summary of numerical variables such as age.
It includes metrics like mean, minimum, maximum, and quartiles, which help identify data spread and potential outliers.
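To make the outlier check concrete, Tukey's interquartile-range (IQR) rule is one common option. This is a swapped-in technique for illustration, not the method the notebook uses to clean Age (that uses fixed bounds of 0 and 100); the helper `iqr_outliers` and its default multiplier `k=1.5` are assumptions.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] and the (lower, upper) bounds."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Boolean mask selects values beyond either fence
    return series[(series < lower) | (series > upper)], (lower, upper)
```

For example, `outliers, bounds = iqr_outliers(df['Age'])` lists candidate outliers. The IQR rule flags statistically unusual values, which can include plausible older ages, so it is best used to inspect candidates rather than to drop rows automatically.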

### Variables Description

The dataset consists of both numerical and categorical variables:

Numerical Variable:

**Age**: Represents the age of survey respondents.

Categorical Variables:

**Gender, Country, State**: Demographic information.

**self_employed, remote_work, tech_company**: Employment-related attributes.

**family_history, treatment**: Mental health background.

**work_interfere**: Impact of mental health on work performance.

**benefits, care_options, wellness_program**: Employer mental health support.

**coworkers, supervisor**: Willingness to discuss mental health.

Understanding variable types helps in selecting appropriate visualization techniques and analysis methods.
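The numerical/categorical split described above can also be derived from the column dtypes instead of being listed by hand. A minimal sketch using `DataFrame.select_dtypes`; the helper name `split_by_type` is hypothetical. A text column such as Timestamp will land in the categorical list unless it is parsed as a date.

```python
import pandas as pd

def split_by_type(data):
    """Return (numerical_columns, categorical_columns) based on pandas dtypes."""
    numerical = data.select_dtypes(include="number").columns.tolist()
    # Everything that is not numeric (text, booleans, etc.) is treated as categorical here
    categorical = data.select_dtypes(exclude="number").columns.tolist()
    return numerical, categorical
```

With the survey data, `split_by_type(df)` should place Age in the numerical list and the survey answers in the categorical list.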

### Check Unique Values for each variable.

In [None]:
# Checking unique values for each column
for column in df.columns:
    print(f"\n{column} ({df[column].nunique()} unique values):")
    print(df[column].unique())

This step checks the unique values present in each column of the dataset.
It helps identify inconsistent labels, unexpected categories, and columns that may require data cleaning or standardization before analysis.
From the output, it is observed that some categorical variables, such as Gender, contain multiple inconsistent textual representations, highlighting the need for data preprocessing.
Understanding unique values ensures accurate visualizations and meaningful insights during exploratory data analysis.

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Handling missing values
df['work_interfere'] = df['work_interfere'].fillna('Not Sure')
df['comments'] = df['comments'].fillna('No Comments')

# Removing duplicate rows if any
df = df.drop_duplicates()

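# Cleaning Age column by removing unrealistic values
# .copy() makes the filtered result an independent DataFrame, which avoids
# pandas' SettingWithCopyWarning when the Gender column is modified below
df = df[(df['Age'] > 0) & (df['Age'] < 100)].copy()

# Standardizing Gender values: trim stray spaces and lower-case before mapping
df['Gender'] = df['Gender'].str.strip().str.lower()

df['Gender'] = df['Gender'].replace({
    'male': 'Male',
    'm': 'Male',
    'female': 'Female',
    'f': 'Female'
})

print(f"Data wrangling completed successfully. Rows remaining: {len(df)}")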

### What all manipulations have you done and insights you found?

The following data wrangling steps were performed to prepare the dataset for analysis:
*   Missing values in categorical columns were handled using meaningful labels instead of dropping rows.
*   Duplicate records were removed to maintain data integrity.
*   Unrealistic age values were filtered out to improve data accuracy.
*   Gender values were standardized to reduce category inconsistency.

Insights Found:
*   Some mental health-related responses (notably work_interfere) contain missing values, which may reflect hesitation to answer, lack of awareness, or a question that did not apply to the respondent.
*   Age outliers existed and required filtering.
*   Gender entries were inconsistent, highlighting the importance of standardization.
*   These steps ensure that the dataset is clean, reliable, and suitable for meaningful exploratory data analysis.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Age Distribution of Respondents (Univariate Analysis)

In [None]:
# Chart 1: Age Distribution of Respondents (Univariate Analysis)

plt.figure(figsize=(8,5))
sns.histplot(df['Age'], bins=20, kde=True)
plt.title('Age Distribution of Respondents')
plt.xlabel('Age')
plt.ylabel('Number of Respondents')
plt.show()

##### 1. Why did you pick the specific chart?

A histogram is the most suitable visualization for understanding the distribution of a numerical variable like age.
It helps identify the concentration of respondents across different age groups and reveals patterns such as skewness and outliers.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that most respondents are between 25 and 40 years old, indicating that the dataset mainly represents early- to mid-career professionals in the tech industry.
Very few respondents are below 20 or above 50 years of age.
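The 25 to 40 reading of the histogram can be checked numerically. A minimal sketch using `Series.between`, which is inclusive on both ends by default; `share_in_range` is a hypothetical helper and the bounds are taken from the sentence above.

```python
import pandas as pd

def share_in_range(ages, low=25, high=40):
    """Percentage of respondents whose age lies in [low, high], inclusive on both ends."""
    # between() returns a boolean mask; its mean is the fraction inside the range
    return round(ages.between(low, high).mean() * 100, 1)
```

For example, `print(f"{share_in_range(df['Age'])}% of respondents are aged 25 to 40")` turns the visual impression into a figure that can be quoted.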

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Understanding the dominant age group helps organizations design age-specific mental health initiatives, wellness programs, and support policies that align with the needs of the majority workforce.

**Negative Impact (if ignored)**:
Ignoring mental health needs of the primary working age group may lead to reduced productivity, increased burnout, and higher employee attrition.

#### Chart - 2 Gender Distribution of Respondents (Univariate Analysis)

In [None]:
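# Standardizing Gender column for clear analysis
# Common misspellings (e.g. 'make', 'femail') are mapped as well,
# so they are not counted as 'Other'

MALE_LABELS = ['male', 'm', 'man', 'male-ish', 'cis male', 'cis man',
               'make', 'mal', 'mail', 'malr', 'maile', 'msle', 'male (cis)']
FEMALE_LABELS = ['female', 'f', 'woman', 'cis female', 'cis woman',
                 'femake', 'femail', 'female (cis)']

def clean_gender(g):
    # Normalize case and whitespace, then map to Male / Female / Other
    if isinstance(g, str):
        g = g.lower().strip()
        if g in MALE_LABELS:
            return 'Male'
        elif g in FEMALE_LABELS:
            return 'Female'
        else:
            return 'Other'
    # Missing or non-text entries are grouped under 'Other'
    return 'Other'

df['Gender'] = df['Gender'].apply(clean_gender)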


In [None]:
# Chart 2: Gender Distribution of Respondents (Univariate Analysis)

plt.figure(figsize=(7,5))
sns.countplot(x='Gender', data=df)
plt.title('Gender Distribution of Respondents')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.show()

The Gender column contained multiple inconsistent textual entries.
These values were standardized into three categories: Male, Female, and Other,
to improve visualization clarity and analytical accuracy.


##### 1. Why did you pick the specific chart?

A count plot is the most appropriate chart for visualizing categorical variables like gender.
It clearly shows the frequency of each category, making it easy to compare the number of respondents across different genders.

##### 2. What is/are the insight(s) found from the chart?

The chart indicates that a majority of respondents are male, while female and other gender categories are comparatively underrepresented.
This suggests a gender imbalance within the surveyed tech workforce.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Identifying gender imbalance helps organizations design inclusive mental health policies and targeted awareness programs that encourage participation and support for underrepresented groups.

**Negative Impact (if not addressed)**:
Lack of gender inclusivity may lead to unequal access to mental health resources, lower engagement, and dissatisfaction among minority groups, which can negatively affect organizational culture and retention.

#### Chart - 3 Mental Health Treatment Status of Respondents (Univariate Analysis)

In [None]:
# Chart 3: Mental Health Treatment Status (Univariate Analysis)

plt.figure(figsize=(7,5))
sns.countplot(x='treatment', data=df)
plt.title('Mental Health Treatment Status of Respondents')
plt.xlabel('Treatment Taken')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A count plot is suitable for analyzing binary categorical variables such as whether respondents have sought mental health treatment or not.
It allows easy comparison between the two response categories.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that a significant portion of respondents have sought treatment for mental health conditions, while a slightly smaller group has not.
This indicates that mental health concerns are prevalent among professionals in the tech industry.
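Count plots show absolute numbers; the relative split is easier to quote as percentages. A minimal sketch using `value_counts(normalize=True)`; `category_shares` is a hypothetical helper that can be reused for any of the univariate charts.

```python
import pandas as pd

def category_shares(series):
    """Percentage of responses per category, with missing values counted as their own row."""
    return series.value_counts(normalize=True, dropna=False).mul(100).round(1)
```

For example, `category_shares(df['treatment'])` gives the exact Yes/No split behind this chart.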

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
The insight highlights the importance of accessible mental health support systems in organizations.
Companies that invest in mental health resources are more likely to see improved employee well-being and productivity.

**Negative Impact (if ignored)**:
Failure to address mental health needs may lead to increased absenteeism, reduced performance, and higher employee turnover.

#### Chart - 4 Family History of Mental Illness (Univariate Analysis)

In [None]:
# Chart 4: Family History of Mental Illness (Univariate Analysis)

plt.figure(figsize=(7,5))
sns.countplot(x='family_history', data=df)
plt.title('Family History of Mental Illness')
plt.xlabel('Family History')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A count plot is appropriate for visualizing categorical variables with limited response options such as family history of mental illness.
It clearly shows the number of respondents in each category, making comparison simple and intuitive.

##### 2. What is/are the insight(s) found from the chart?

The chart indicates that a large proportion of respondents report having a family history of mental illness, while the remaining respondents do not.
This suggests that family history is common among tech professionals in this survey and may be a useful risk indicator, although survey responses alone cannot establish whether the link is genetic or environmental.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Awareness of family history prevalence helps organizations emphasize preventive mental health programs, early screenings, and counseling support.

**Negative Impact (if ignored)**:
Ignoring hereditary risk factors may lead to delayed intervention, increased stress levels, and long-term productivity loss among employees.

#### Chart - 5 Impact of Mental Health on Work Performance (Univariate Analysis)

In [None]:
# Chart 5: Impact of Mental Health on Work Performance (Univariate Analysis)

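# Order levels from no interference to frequent interference; any unexpected
# labels are appended at the end so no category is silently dropped
interfere_order = ['Never', 'Rarely', 'Sometimes', 'Often', 'Not Sure']
interfere_order += [v for v in df['work_interfere'].unique() if v not in interfere_order]

plt.figure(figsize=(8,5))
sns.countplot(x='work_interfere', data=df, order=interfere_order)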
plt.title('Impact of Mental Health on Work Performance')
plt.xlabel('Work Interference Level')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

A count plot is ideal for analyzing categorical variables like work interference levels.
It helps in understanding how frequently mental health conditions affect employees’ work performance.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that a considerable number of respondents experience mental health interference at work either sometimes or often.
This indicates that mental health issues have a direct impact on daily job performance and productivity.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Recognizing work interference due to mental health enables organizations to introduce stress management programs, flexible work policies, and employee assistance initiatives.

**Negative Impact (if not addressed)**:
Ongoing mental health interference can result in reduced efficiency, increased burnout, and long-term performance decline, negatively affecting business outcomes.

#### Chart - 6 Remote Work Status of Respondents (Univariate Analysis)

In [None]:
# Chart 6: Remote Work Status of Respondents (Univariate Analysis)

plt.figure(figsize=(7,5))
sns.countplot(x='remote_work', data=df)
plt.title('Remote Work Status of Respondents')
plt.xlabel('Remote Work')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A count plot is appropriate for visualizing categorical variables such as remote work status.
It provides a clear comparison of the number of respondents working remotely versus those working on-site.

##### 2. What is/are the insight(s) found from the chart?

The chart indicates that a substantial share of respondents work remotely, while others work primarily from an office.
This shows that flexible work arrangements are well established in the tech industry, although a single survey snapshot cannot show whether adoption is growing over time.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Understanding remote work adoption helps organizations design mental health policies, collaboration tools, and wellness programs suited for remote and hybrid teams.

**Negative Impact (if mismanaged)**:
Poor remote work support may lead to isolation, communication gaps, and reduced employee engagement, negatively affecting productivity.

#### Chart - 7 Availability of Mental Health Benefits in Organizations (Univariate Analysis)

In [None]:
# Chart 7: Availability of Mental Health Benefits (Univariate Analysis)

plt.figure(figsize=(8,5))
sns.countplot(x='benefits', data=df)
plt.title('Availability of Mental Health Benefits in Organizations')
plt.xlabel('Mental Health Benefits')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A count plot is suitable for analyzing categorical variables such as the availability of mental health benefits.
It allows easy comparison between respondents whose employers provide mental health benefits, those whose employers do not, and those who are unsure.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that a considerable number of respondents report that their employer does not provide mental health benefits, while many others are unsure whether such benefits exist.
This indicates a lack of clarity and inconsistency in mental health support across workplaces.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Organizations offering mental health benefits can improve employee well-being, satisfaction, and retention.

**Negative Impact (if ignored)**:
Lack of mental health benefits may result in higher stress levels, absenteeism, and attrition, negatively affecting long-term organizational growth.

#### Chart - 8 Awareness of Mental Health Wellness Programs (Univariate Analysis)

In [None]:
# Chart 8: Awareness of Mental Health Wellness Programs (Univariate Analysis)

plt.figure(figsize=(8,5))
sns.countplot(x='wellness_program', data=df)
plt.title('Awareness of Mental Health Wellness Programs')
plt.xlabel('Wellness Program Awareness')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A count plot is appropriate for analyzing categorical variables such as awareness of wellness programs.
It clearly represents how many employees are aware, unaware, or unsure about the existence of mental health wellness initiatives in their organizations.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that a large portion of respondents are either unaware of wellness programs or report that such programs do not exist in their organizations.
This indicates a significant communication gap between employers and employees regarding mental health initiatives.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Improving awareness of wellness programs can increase employee participation, enhance mental well-being, and foster a supportive workplace culture.

**Negative Impact (if not addressed)**:
Poor awareness may lead to underutilization of available resources, continued employee stress, and reduced engagement, negatively impacting organizational productivity.

#### Chart - 9 Relationship Between Family History and Mental Health Treatment (Bivariate Analysis)

In [None]:
# Chart 9: Family History vs Mental Health Treatment (Bivariate Analysis)

plt.figure(figsize=(8,5))
sns.countplot(x='family_history', hue='treatment', data=df)
plt.title('Family History vs Mental Health Treatment')
plt.xlabel('Family History of Mental Illness')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A grouped count plot is suitable for comparing two categorical variables simultaneously.
It helps analyze how having a family history of mental illness influences the likelihood of seeking mental health treatment.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that respondents with a family history of mental illness are more likely to seek treatment compared to those without a family history.
This indicates a strong association between family history of mental illness and treatment-seeking behavior.
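Because the Yes and No family-history groups differ in size, raw bar heights can exaggerate or hide differences. Normalizing the crosstab within each row compares treatment rates directly. A sketch using `pd.crosstab(..., normalize='index')`; `treatment_rate_table` is a hypothetical helper name.

```python
import pandas as pd

def treatment_rate_table(data, group_col, target_col="treatment"):
    """Row-normalized crosstab: within each group, the % of respondents per target answer."""
    # normalize="index" makes each row sum to 1 before converting to percentages
    return (pd.crosstab(data[group_col], data[target_col], normalize="index") * 100).round(1)
```

`treatment_rate_table(df, 'family_history')` returns, for each family-history answer, the percentage of respondents who did and did not seek treatment.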

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Organizations can design targeted mental health awareness and early intervention programs for employees who may be at higher risk.

**Negative Impact (if ignored)**:
Failure to acknowledge hereditary mental health risks may result in delayed support, increased workplace stress, and reduced employee productivity.

#### Chart - 10 Gender vs Mental Health Treatment (Bivariate Analysis)

In [None]:
# Cleaning and standardizing Gender column (Production-grade)
# Re-applied here as a safeguard; the mapping is idempotent, so values already
# standardized to Male/Female/Other before Chart 2 remain unchanged

def clean_gender(gender):
    if isinstance(gender, str):
        gender = gender.lower().strip()
        if gender in ['male', 'm', 'man', 'male-ish', 'cis male', 'cis man']:
            return 'Male'
        elif gender in ['female', 'f', 'woman', 'cis female', 'cis woman']:
            return 'Female'
        else:
            return 'Other'
    return 'Other'

df['Gender'] = df['Gender'].apply(clean_gender)


In [None]:
# Chart 10: Gender vs Mental Health Treatment (Bivariate Analysis)

plt.figure(figsize=(8,5))
sns.countplot(x='Gender', hue='treatment', data=df)
plt.title('Gender vs Mental Health Treatment')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A grouped count plot is ideal for comparing two categorical variables—gender and treatment status.
It clearly shows differences in treatment-seeking behavior across genders.

##### 2. What is/are the insight(s) found from the chart?

The chart indicates that female respondents are more likely to seek mental health treatment compared to male respondents.
This suggests differences in awareness, openness, or social acceptance of mental health support among genders.
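The Male group is much larger than the others, so comparing bar heights in a count plot is not the same as comparing treatment rates. The sketch below computes the share answering Yes within each gender alongside the group size; `yes_rate_by_group` is a hypothetical helper, and it assumes the treatment column uses the labels Yes/No.

```python
import pandas as pd

def yes_rate_by_group(data, group_col, target_col="treatment", positive="Yes"):
    """Group size and % answering `positive` in target_col, for each value of group_col."""
    is_positive = data[target_col].eq(positive)
    # Mean of a boolean mask per group = share of positive answers in that group
    rate = is_positive.groupby(data[group_col]).mean().mul(100).round(1)
    size = data[group_col].value_counts()
    summary = pd.DataFrame({"respondents": size, "treatment_rate_pct": rate})
    return summary.sort_values("treatment_rate_pct", ascending=False)
```

`yes_rate_by_group(df, 'Gender')` shows the rate and the sample size side by side; small groups (such as Other) give unstable rates, which is why the counts are kept.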

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Understanding gender-based differences helps organizations tailor mental health awareness campaigns and encourage equal access to support resources.

**Negative Impact (if not addressed)**:
Lower treatment-seeking behavior among males may result in untreated stress and burnout, negatively affecting productivity and team dynamics.

#### Chart - 11 Age Distribution Based on Mental Health Treatment Status (Bivariate Analysis)

In [None]:
# Chart 11: Age vs Mental Health Treatment (Bivariate Analysis)

plt.figure(figsize=(8,5))
sns.boxplot(x='treatment', y='Age', data=df)
plt.title('Age Distribution Based on Mental Health Treatment Status')
plt.xlabel('Mental Health Treatment')
plt.ylabel('Age')
plt.show()


##### 1. Why did you pick the specific chart?

A box plot is ideal for comparing the distribution of a numerical variable (age) across different categories of a categorical variable (treatment).
It helps identify differences in median age, spread, and potential outliers between the two groups.

##### 2. What is/are the insight(s) found from the chart?

The chart suggests that respondents who have sought mental health treatment tend to be slightly younger than those who have not. Because a box plot compares age distributions rather than treatment rates, this difference should be read as small.
This suggests that younger professionals may be more aware of mental health issues or more open to seeking help.
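To talk about how likely each age group is to seek treatment, it helps to compute treatment rates per age band rather than reading them off a box plot. The band edges below are an arbitrary, hypothetical choice, as is the helper name `treatment_rate_by_age_band`.

```python
import pandas as pd

def treatment_rate_by_age_band(data, bins=(0, 25, 30, 35, 40, 100)):
    """Percentage of respondents answering 'Yes' to treatment within each age band."""
    # right=False gives left-closed bands such as [25, 30)
    bands = pd.cut(data["Age"], bins=list(bins), right=False)
    is_yes = data["treatment"].eq("Yes")
    # observed=False keeps empty bands (shown as NaN) so every band appears in order
    return is_yes.groupby(bands, observed=False).mean().mul(100).round(1)
```

`treatment_rate_by_age_band(df)` returns one rate per band; checking the band sizes with `pd.cut(df['Age'], bins=[0, 25, 30, 35, 40, 100], right=False).value_counts()` helps judge how reliable each rate is.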

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Understanding age-related differences in treatment-seeking behavior allows organizations to design age-appropriate mental health awareness programs and support systems.

**Negative Impact (if ignored)**:
Older employees may remain underserved in mental health initiatives, leading to unmanaged stress, reduced engagement, and lower productivity.

#### Chart - 12 Work Interference vs Mental Health Treatment (Bivariate Analysis)

In [None]:
# Chart 12: Work Interference vs Mental Health Treatment (Bivariate Analysis)

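# Order levels from no interference to frequent interference; any unexpected
# labels are appended at the end so no category is silently dropped
interfere_order = ['Never', 'Rarely', 'Sometimes', 'Often', 'Not Sure']
interfere_order += [v for v in df['work_interfere'].unique() if v not in interfere_order]

plt.figure(figsize=(8,5))
sns.countplot(x='work_interfere', hue='treatment', data=df, order=interfere_order)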
plt.title('Work Interference vs Mental Health Treatment')
plt.xlabel('Level of Work Interference')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

A grouped count plot is suitable for analyzing the relationship between two categorical variables—work interference level and treatment status.
It helps compare how the likelihood of seeking mental health treatment varies across levels of work interference.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that respondents who experience mental health interference at work frequently or sometimes are more likely to seek treatment.
This indicates a strong association between work performance impact and treatment-seeking behavior.
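Raw counts in a grouped count plot depend on how many respondents fall into each interference level, so a large group can look dominant even if its treatment rate is modest. Normalizing within each level compares rates directly. A minimal sketch, assuming `treatment` is coded `'Yes'`/`'No'`; the helper name and toy rows are hypothetical.

```python
import pandas as pd

def treatment_rate_by(data: pd.DataFrame, column: str) -> pd.Series:
    """Share of respondents answering 'Yes' to treatment within each level of `column`."""
    # normalize='index' turns each row of the contingency table into proportions
    table = pd.crosstab(data[column], data['treatment'], normalize='index')
    return table['Yes'].sort_values(ascending=False)

# Hypothetical toy data for illustration only (not survey results)
toy = pd.DataFrame({
    'work_interfere': ['Often', 'Often', 'Sometimes', 'Sometimes', 'Sometimes',
                       'Rarely', 'Rarely', 'Never', 'Never', 'Never'],
    'treatment':      ['Yes', 'Yes', 'Yes', 'Yes', 'No',
                       'Yes', 'No', 'No', 'No', 'Yes'],
})
rates = treatment_rate_by(toy, 'work_interfere')
print(rates)
```

`treatment_rate_by(df, 'work_interfere')` applies the same idea to the survey data.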

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Identifying work interference levels helps organizations implement stress reduction strategies, flexible work arrangements, and employee support programs to improve productivity.

**Negative Impact (if ignored)**:
Persistent work interference due to mental health issues can lead to burnout, reduced efficiency, and increased employee turnover.

#### Chart - 13 Availability of Mental Health Benefits vs Treatment Seeking Behavior (Bivariate Analysis)

In [None]:
# Chart 13: Mental Health Benefits vs Treatment (Bivariate Analysis)

plt.figure(figsize=(8,5))
sns.countplot(x='benefits', hue='treatment', data=df)
plt.title('Mental Health Benefits vs Treatment Seeking Behavior')
plt.xlabel('Availability of Mental Health Benefits')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

A grouped count plot is suitable for comparing two categorical variables—availability of mental health benefits and treatment status.
It clearly shows whether organizational support influences employees’ willingness to seek mental health treatment.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that respondents working in organizations that provide mental health benefits are more likely to seek treatment compared to those in organizations without such benefits or those who are unsure.
This highlights the importance of employer-provided support in encouraging mental health care.
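Counts alone do not say how strong the link between benefits and treatment is. One common summary, swapped in here and not part of the original notebook, is Cramér's V computed from the chi-square statistic of the contingency table (0 means no association, 1 means perfect association). This minimal sketch reports effect size only, not a p-value; `scipy.stats.chi2_contingency` can supply one if SciPy is available. The helper name and toy Series are hypothetical.

```python
import numpy as np
import pandas as pd

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical Series: 0 = no association, 1 = perfect."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts if x and y were independent: row total * column total / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    if k == 0:
        return float('nan')  # one variable has a single category; V is undefined
    return float(np.sqrt(chi2 / (n * k)))

# Hypothetical toy data for illustration only (not survey results)
benefits = pd.Series(['Yes', 'Yes', 'No', 'No'])
treatment = pd.Series(['Yes', 'Yes', 'No', 'No'])
print(cramers_v(benefits, treatment))  # → 1.0 (answers perfectly aligned)
```

On the survey the call would be `cramers_v(df['benefits'], df['treatment'])`.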

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Providing mental health benefits encourages early treatment, improves employee well-being, and leads to higher productivity and retention.

**Negative Impact (if ignored)**:
Organizations lacking mental health benefits may experience higher stress levels, absenteeism, and long-term performance decline among employees.

#### Chart - 14 Willingness to Discuss Mental Health with Supervisor vs Treatment (Bivariate Analysis)

In [None]:
# Chart 14: Supervisor Discussion vs Mental Health Treatment (Bivariate Analysis)

plt.figure(figsize=(8,5))
sns.countplot(x='supervisor', hue='treatment', data=df)
plt.title('Supervisor Discussion vs Mental Health Treatment')
plt.xlabel('Willingness to Discuss with Supervisor')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A grouped count plot is suitable for analyzing the relationship between two categorical variables—willingness to discuss mental health with a supervisor and treatment status.
It shows how managerial openness relates to employees’ mental health decisions.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that respondents who are unwilling or unsure about discussing mental health issues with their supervisor are less likely to seek treatment.
This indicates that fear of judgment or negative consequences may discourage employees from addressing mental health concerns.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Encouraging open communication between employees and supervisors can reduce stigma, promote early treatment, and foster a healthier workplace culture.

**Negative Impact (if ignored)**:
Lack of managerial support may lead to suppressed mental health issues, increased stress, and reduced employee trust, negatively impacting team performance.

#### Chart - 15 Willingness to Discuss Mental Health with Coworkers vs Treatment (Bivariate Analysis)

In [None]:
# Chart 15: Coworker Discussion vs Mental Health Treatment (Bivariate Analysis)

plt.figure(figsize=(8,5))
sns.countplot(x='coworkers', hue='treatment', data=df)
plt.title('Coworker Discussion vs Mental Health Treatment')
plt.xlabel('Willingness to Discuss with Coworkers')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

A grouped count plot is ideal for comparing two categorical variables—willingness to discuss mental health with coworkers and treatment status.
It helps analyze how peer support influences treatment-seeking behavior.

##### 2. What is/are the insight(s) found from the chart?

The chart indicates that respondents who are more comfortable discussing mental health issues with coworkers are also more likely to seek treatment.
This highlights the role of peer support in promoting mental health awareness.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Encouraging a supportive team culture can increase openness, reduce stigma, and promote early mental health intervention.

**Negative Impact (if ignored)**:
Lack of peer support may isolate employees, leading to untreated mental health issues and reduced collaboration and productivity.

#### Chart - 16 Perception of Mental Health vs Physical Health in the Workplace (Bivariate Analysis)

In [None]:
# Chart 16: Mental Health vs Physical Health Perception (Bivariate Analysis)
# hue='treatment' supplies the second variable, so the chart is bivariate as labelled

plt.figure(figsize=(8,5))
sns.countplot(x='mental_vs_physical', hue='treatment', data=df)
plt.title('Perception of Mental Health vs Physical Health in the Workplace')
plt.xlabel('Mental vs Physical Health Perception')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

A count plot is suitable for visualizing categorical responses related to employee perceptions.
It allows a clear comparison of how employees feel about the importance given to mental health versus physical health in their workplace.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that many respondents believe mental health is not treated as seriously as physical health by their employers.
This highlights an imbalance in workplace health priorities.
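"Many respondents" is easier to weigh as a percentage of the sample. A minimal sketch, assuming `df['mental_vs_physical']` holds the categorical answers plotted above; the helper name and toy answers are hypothetical.

```python
import pandas as pd

def response_shares(data: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of respondents giving each answer in `column`, largest first."""
    # dropna=False keeps missing answers as their own row instead of silently dropping them
    return (data[column].value_counts(normalize=True, dropna=False) * 100).round(1)

# Hypothetical toy data for illustration only (not survey results)
toy = pd.DataFrame({'mental_vs_physical': ['No', 'No', 'No', 'No', 'Yes', 'Yes',
                                           "Don't know", "Don't know"]})
shares = response_shares(toy, 'mental_vs_physical')
print(shares)
```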

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Recognizing this perception gap enables organizations to improve mental health policies, normalize discussions, and build trust among employees.

**Negative Impact (if ignored)**:
Continued neglect of mental health concerns may lead to disengagement, low morale, and long-term productivity loss.

#### Chart - 17 Gender and Family History vs Mental Health Treatment (Multivariate Analysis)

In [None]:
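# Standardize the Gender column into Male / Female / Other.
# Run this before any gender-based chart. Re-running is safe: labels that are
# already 'Male', 'Female' or 'Other' map back to themselves.

# Includes common spelling variants / typos of the same answers
MALE_TERMS = {'male', 'm', 'man', 'male-ish', 'cis male', 'cis man',
              'male (cis)', 'mal', 'make', 'malr', 'maile', 'mail', 'msle'}
FEMALE_TERMS = {'female', 'f', 'woman', 'cis female', 'cis woman',
                'female (cis)', 'femake', 'femail', 'cis-female/femme'}

def clean_gender(g):
    """Map a free-text gender entry to 'Male', 'Female' or 'Other'."""
    if isinstance(g, str):
        g = g.lower().strip()
        if g in MALE_TERMS:
            return 'Male'
        if g in FEMALE_TERMS:
            return 'Female'
    # Any other entry, including missing (non-string) values, is grouped as 'Other'
    return 'Other'

df['Gender'] = df['Gender'].apply(clean_gender)

# Verify cleaning
df['Gender'].value_counts()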


In [None]:
# Chart 17: Gender & Family History vs Mental Health Treatment (Multivariate Analysis)

sns.catplot(
    x='Gender',
    hue='treatment',
    col='family_history',
    data=df,
    kind='count',
    order=['Male', 'Female', 'Other'],
    height=5,
    aspect=1
)

plt.subplots_adjust(top=0.85)
plt.suptitle('Gender and Family History vs Mental Health Treatment')
plt.show()


The Gender variable originally contained multiple inconsistent textual values.
These were standardized into three categories (Male, Female, Other) to improve
visual clarity and ensure meaningful multivariate comparisons.


##### 1. Why did you pick the specific chart?

A categorical facet count plot is suitable for multivariate analysis involving more than two categorical variables.
It allows comparison of treatment-seeking behavior across gender while simultaneously considering family history of mental illness.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that individuals with a family history of mental illness are more likely to seek treatment across all genders.
It also highlights gender-based differences in treatment-seeking behavior when family history is considered.
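The combined effect of gender and family history can be read as a treatment rate per combination. Rates from small groups are unstable, so this sketch reports the group size `n` next to each rate. It assumes `treatment` is coded `'Yes'`/`'No'` and `Gender` has already been cleaned; the helper name and toy rows are hypothetical.

```python
import pandas as pd

def treatment_by_gender_and_history(data: pd.DataFrame) -> pd.DataFrame:
    """Group size and treatment rate for every family_history x Gender combination."""
    counts = pd.crosstab([data['family_history'], data['Gender']], data['treatment'])
    n = counts.sum(axis=1)
    # Report n next to the rate: rates from very small groups are unstable
    return pd.DataFrame({'n': n, 'treatment_rate': counts['Yes'] / n})

# Hypothetical toy data for illustration only (not survey results)
toy = pd.DataFrame({
    'family_history': ['Yes'] * 5 + ['No'] * 5,
    'Gender':    ['Male', 'Male', 'Male', 'Female', 'Female',
                  'Male', 'Male', 'Male', 'Female', 'Female'],
    'treatment': ['Yes', 'Yes', 'No', 'Yes', 'Yes',
                  'No', 'No', 'Yes', 'No', 'Yes'],
})
summary = treatment_by_gender_and_history(toy)
print(summary)
```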

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
These insights help organizations design personalized mental health initiatives considering both demographic and hereditary factors.

**Negative Impact (if ignored)**:
Ignoring combined risk factors may lead to ineffective mental health strategies, increased stress, and lower workforce resilience.

#### Chart - 18 Age, Work Interference, and Mental Health Treatment Relationship (Multivariate Analysis)

In [None]:
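# Chart 18: Age, Work Interference & Mental Health Treatment (Multivariate Analysis)
# Categorical scatter (strip) plot: each dot is one respondent, placed by
# work-interference level (x) and age (y), coloured by treatment status.
# (Plotting against the row index would give the y-axis no meaning.)

plt.figure(figsize=(10,5))
sns.stripplot(
    x='work_interfere',
    y='Age',
    hue='treatment',
    data=df,
    dodge=True,   # separate treated / untreated dots within each level
    alpha=0.5     # reduce overplotting
)
plt.title('Age, Work Interference, and Mental Health Treatment')
plt.xlabel('Level of Work Interference')
plt.ylabel('Age')
plt.show()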


##### 1. Why did you pick the specific chart?

A scatter plot is suitable for visualizing relationships between numerical and categorical variables in a multivariate context.
It helps analyze how age and work interference jointly influence mental health treatment behavior.

##### 2. What is/are the insight(s) found from the chart?

The chart indicates that employees in the mid-age range experience higher work interference and are more likely to seek treatment.
It also shows variation in treatment behavior across different age groups.
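The "mid-age range" observation can be made concrete by binning `Age` into bands and computing the treatment rate for each band and interference level. A minimal sketch: the band edges, labels and helper name are hypothetical choices, not taken from the original analysis, and the toy rows are illustrative only.

```python
import pandas as pd

# Hypothetical band edges and labels; adjust to the age range kept after cleaning
AGE_BINS = [17, 25, 35, 45, 100]
AGE_LABELS = ['18-25', '26-35', '36-45', '46+']

def treatment_rate_by_age_band(data: pd.DataFrame) -> pd.DataFrame:
    """Treatment rate with age bands as rows and work_interfere levels as columns."""
    # Right-closed bins: (17, 25] -> '18-25', (25, 35] -> '26-35', ...
    bands = pd.cut(data['Age'], bins=AGE_BINS, labels=AGE_LABELS)
    treated = data['treatment'].eq('Yes')
    # Mean of a boolean Series = share of True values; ages outside the bins become NaN and are dropped
    rates = treated.groupby([bands, data['work_interfere']], observed=True).mean()
    return rates.unstack()

# Hypothetical toy data for illustration only (not survey results)
toy = pd.DataFrame({
    'Age':            [22, 24, 28, 30, 33, 40, 42],
    'work_interfere': ['Often', 'Never', 'Often', 'Often', 'Never', 'Sometimes', 'Sometimes'],
    'treatment':      ['Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes'],
})
table = treatment_rate_by_age_band(toy)
print(table)
```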

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Identifying age groups with higher work interference enables targeted stress management and support initiatives, improving employee well-being and retention.

**Negative Impact (if ignored)**:
Failure to address work interference among high-risk age groups may lead to burnout, disengagement, and increased attrition.

#### Chart - 19 Mental Health Benefits, Care Options Awareness, and Treatment Status (Multivariate Analysis)

In [None]:
# Chart 19: Benefits, Care Options & Mental Health Treatment (Multivariate Analysis)

sns.catplot(
    x='benefits',
    hue='treatment',
    col='care_options',
    data=df,
    kind='count',
    height=5,
    aspect=1
)

plt.subplots_adjust(top=0.85)
plt.suptitle('Mental Health Benefits, Care Options Awareness, and Treatment Status')
plt.show()


##### 1. Why did you pick the specific chart?

A faceted count plot is ideal for multivariate analysis involving multiple categorical variables.
It allows comparison of treatment-seeking behavior across different levels of mental health benefits and care options awareness.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that respondents who have access to mental health benefits and are aware of available care options are more likely to seek treatment.
Lack of awareness of care options is associated with noticeably lower treatment-seeking, even when benefits are available.
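The point about awareness "even when benefits are available" is a comparison within the `benefits == 'Yes'` row across `care_options` levels, which a pivot table lays out side by side. A minimal sketch, assuming `treatment` is coded `'Yes'`/`'No'`; the helper name and toy rows are hypothetical.

```python
import pandas as pd

def treatment_rate_by_benefits_and_awareness(data: pd.DataFrame) -> pd.DataFrame:
    """Treatment rate with benefits as rows and care_options awareness as columns."""
    # Encode treatment as 1.0 / 0.0 so that the mean of each cell is a rate
    flagged = data.assign(treated=data['treatment'].eq('Yes').astype(float))
    return flagged.pivot_table(index='benefits', columns='care_options',
                               values='treated', aggfunc='mean')

# Hypothetical toy data for illustration only (not survey results)
toy = pd.DataFrame({
    'benefits':     ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'No'],
    'care_options': ['Yes', 'Yes', 'Not sure', 'Not sure', 'No', 'No'],
    'treatment':    ['Yes', 'Yes', 'Yes', 'No', 'No', 'Yes'],
})
table = treatment_rate_by_benefits_and_awareness(toy)
print(table)
```

Empty cells (combinations with no respondents) appear as `NaN` rather than 0, so they are not mistaken for a zero treatment rate.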

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
Organizations that not only provide mental health benefits but also actively communicate care options can significantly improve employee well-being and productivity.

**Negative Impact (if ignored)**:
Poor communication regarding care options can lead to underutilization of benefits, wasted resources, and persistent mental health challenges among employees.

#### Chart - 20 Remote Work, Work Interference, and Mental Health Treatment (Multivariate Analysis)

In [None]:
# Chart 20: Remote Work, Work Interference & Mental Health Treatment (Multivariate Analysis)

sns.catplot(
    x='remote_work',
    hue='treatment',
    col='work_interfere',
    data=df,
    kind='count',
    height=5,
    aspect=1
)

plt.subplots_adjust(top=0.85)
plt.suptitle('Remote Work, Work Interference, and Mental Health Treatment')
plt.show()


##### 1. Why did you pick the specific chart?

A faceted count plot is suitable for multivariate analysis involving multiple categorical variables.
It allows simultaneous comparison of treatment-seeking behavior based on remote work status and levels of work interference.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that employees experiencing higher work interference are more likely to seek treatment, regardless of remote work status.
It also suggests that remote work may reduce the frequency of high work interference in some cases.
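The remark about remote work and high interference concerns how interference levels are distributed, not treatment, so it can be checked by normalizing `work_interfere` within each `remote_work` group and comparing, for example, the share answering `'Often'`. A minimal sketch; the helper name and toy rows are hypothetical, and a difference in these shares indicates association only.

```python
import pandas as pd

def interference_mix_by_remote(data: pd.DataFrame) -> pd.DataFrame:
    """Share of each work_interfere level within each remote_work group (rows sum to 1)."""
    return pd.crosstab(data['remote_work'], data['work_interfere'], normalize='index')

# Hypothetical toy data for illustration only (not survey results)
toy = pd.DataFrame({
    'remote_work':    ['Yes'] * 4 + ['No'] * 4,
    'work_interfere': ['Often', 'Sometimes', 'Rarely', 'Never',
                       'Often', 'Often', 'Sometimes', 'Never'],
})
mix = interference_mix_by_remote(toy)
print(mix.round(2))
```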

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**:
These insights support the adoption of flexible or hybrid work models to reduce mental stress and improve employee well-being and productivity.

**Negative Impact (if ignored)**:
Ignoring work interference factors may lead to increased burnout, lower engagement, and reduced organizational performance.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Based on the exploratory data analysis, it is recommended that organizations prioritize mental health as a core component of employee well-being.
Companies should introduce comprehensive mental health benefits, increase awareness of available care options, and encourage open communication between employees, coworkers, and supervisors.
Implementing wellness programs, promoting flexible or remote work policies, and training managers to handle mental health conversations sensitively can significantly reduce work interference, improve productivity, and enhance employee retention.
Data-driven mental health initiatives will help organizations create a healthier, more supportive, and high-performing work environment.

# **Conclusion**

This Exploratory Data Analysis provided valuable insights into mental health awareness, workplace support, and employee perceptions within the tech industry.
The analysis revealed that mental health significantly affects work performance and treatment-seeking behavior, with factors such as family history, workplace benefits, management support, and work flexibility playing a crucial role.
Through univariate, bivariate, and multivariate analysis, key patterns and relationships were identified that can guide organizations in making informed decisions.
Overall, investing in mental health support systems, improving communication, and fostering an inclusive workplace culture can lead to improved employee well-being, higher productivity, and sustainable business growth.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***