
# **Project Name**    - Mental Health in Tech Industry



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

Mental health is an increasingly important topic in today's fast-paced and high-stress work environments—especially in the tech industry, where long hours, tight deadlines, and remote work culture can impact emotional well-being. This project focuses on analyzing a public dataset from a 2014 survey that explores mental health issues faced by employees in the tech sector. The aim is to understand how mental health conditions are perceived and addressed in the workplace, and to identify patterns, insights, and potential areas for improvement using data analytics and visualization techniques.

The dataset, originally collected by Open Sourcing Mental Illness (OSMI), includes responses from more than 1,200 tech professionals across various countries. It covers a wide range of questions related to demographics, work environment, mental health history, treatment, workplace policies, and attitudes toward mental health. This makes the dataset ideal for performing Exploratory Data Analysis (EDA) to extract meaningful insights.

The primary goal of this project is to uncover relationships between employee demographics, work-related variables, and mental health outcomes. Specifically, we explore questions such as:

- What proportion of employees have sought treatment for mental health conditions?

- Do men and women experience mental health challenges differently?

- Is mental health support better in large companies compared to small startups?

- How does remote work or being self-employed affect mental health?

- What is the general attitude of employers toward mental health issues?

To answer these questions, we first loaded and cleaned the dataset by handling missing values, standardizing inconsistent entries (like gender labels), and correcting or removing invalid age values. We then performed EDA using Python libraries such as Pandas, Seaborn, and Matplotlib to generate visual insights. Various charts and graphs—such as bar plots, pie charts, and heatmaps—were used to better understand the distribution of responses and relationships among variables.

One key finding is that roughly half of the respondents report having sought treatment for a mental health condition, which suggests that such concerns are common in tech. Respondents in mid-to-large companies seek treatment more often than those in very small companies, possibly because larger employers offer more visible resources and benefits. Remote workers are slightly more likely to seek treatment; while remote work offers flexibility, it can also lead to isolation, which may negatively affect mental well-being.

By identifying these patterns, this analysis can help organizations develop more inclusive and effective mental health strategies. Companies can benefit by creating a supportive work environment where employees feel safe discussing mental health issues without fear of judgment or negative consequences. This not only boosts morale and productivity but also helps retain top talent in the competitive tech industry.

In conclusion, this project provides a data-driven foundation for understanding mental health challenges in tech. It highlights the importance of employer awareness, communication, and policy development to support employees' mental well-being. The insights drawn from this dataset can be valuable to HR professionals, decision-makers, and mental health advocates looking to create healthier and more inclusive workplaces.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Mental health issues are common in the tech industry, yet discussing them at work often carries stigma or consequences. This project seeks to analyze survey responses from tech professionals to understand the frequency of mental health concerns, company support systems, and the factors influencing employees to seek treatment or disclose issues.

#### **Define Your Business Objective?**

The business objective is to identify trends and predictors related to mental health in tech workplaces. This analysis can help companies create better policies, raise awareness, and provide proper support mechanisms to improve employee well-being and retention.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that, for each chart, the following format is followed.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# For displaying visuals in notebook
%matplotlib inline


### Dataset Loading

In [None]:
# Load Dataset
from google.colab import files
uploaded = files.upload()

# Load the CSV
df = pd.read_csv("survey.csv")
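
The upload step above only works inside Google Colab. A minimal sketch of a more defensive loader follows, assuming the file is named `survey.csv` as above. The helper name `load_survey` is hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_survey(path="survey.csv"):
    """Read the survey CSV, failing with a readable message if it is missing.

    `load_survey` is a hypothetical helper, not part of the original notebook.
    """
    csv_path = Path(path)
    if not csv_path.exists():
        raise FileNotFoundError(
            f"{csv_path} not found; upload it first or pass the correct path."
        )
    return pd.read_csv(csv_path)
```

Calling `df = load_survey()` would then behave like the `pd.read_csv` line above when the file is present, and stop early with a clear error when it is not.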


### Dataset First View

In [None]:
# Dataset First Look
# Show top 5 rows
df.head()


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print("Number of rows:", df.shape[0])
print("Number of columns:", df.shape[1])


### Dataset Information

In [None]:
# Dataset Info
df.info()


#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# Check if there are any duplicate rows
duplicate_rows = df[df.duplicated()]

print("Number of duplicate rows:", duplicate_rows.shape[0])

# Optional: Display duplicate rows (if you want to see them)
duplicate_rows.head()


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = df.isnull().sum()

# Display columns with missing values only
missing_values = missing_values[missing_values > 0]
print("Columns with missing values:\n")
print(missing_values)


In [None]:
# Visualizing the missing values
!pip install missingno

import missingno as msno
msno.matrix(df)


### What did you know about your dataset?

- The dataset contains survey responses from tech industry employees regarding their mental health.

- There are several categorical and numerical features like Age, Gender, Country, treatment, benefits, etc.

- The dataset contains missing values and inconsistent entries (e.g., in Gender, Age).

- Next, we will clean the data and explore relationships between mental health factors and workplace conditions.
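
To make the missing-value observation above more concrete, a small sketch that reports both the count and the percentage of gaps per column. The helper `missing_summary` and the toy data are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def missing_summary(frame):
    """Return count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame standing in for the survey data (values are made up)
toy = pd.DataFrame({
    "Age": [25, 31, 40, 28],
    "state": [None, "CA", None, None],
    "work_interfere": ["Often", None, "Never", "Rarely"],
})
print(missing_summary(toy))
```

On the real data, `missing_summary(df)` would show at a glance which columns are too sparse to rely on.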

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print("Dataset Columns:\n")
print(df.columns.tolist())


In [None]:
# Dataset Describe
df.describe()


### Variables Description

| Column Name                 | Description                                                       |
| --------------------------- | ----------------------------------------------------------------- |
| Age                       | Age of the respondent                                             |
| Gender                    | Gender identity                                                   |
| Country                   | Country of residence                                              |
| self_employed             | Whether the person is self-employed                               |
| family_history            | Family history of mental illness                                  |
| treatment                 | Has the person sought mental health treatment                     |
| work_interfere            | Does mental health interfere with work                            |
| no_employees              | Size of the company                                               |
| remote_work               | Does the person work remotely                                     |
| tech_company              | Is the employer a tech company                                    |
| benefits                  | Employer provides mental health benefits                          |
| wellness_program          | Mental health part of wellness program                            |
| anonymity                 | Is anonymity protected if they seek help                          |
| leave                     | Ease of taking medical leave for mental health                    |
| mental_health_consequence | Fear of negative consequences if mental health issue is discussed |
| comments                  | Free text responses                                               |


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# Loop to print number of unique values for each column
print("Unique values in each column:\n")
for col in df.columns:
    print(f"{col}: {df[col].nunique()} unique values")


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Keep only plausible ages (strictly between 10 and 100).
# .copy() avoids pandas' SettingWithCopyWarning on the assignments below.
df = df[(df['Age'] > 10) & (df['Age'] < 100)].copy()

# Normalize gender values: lower-case and trim stray whitespace
df['Gender'] = df['Gender'].str.lower().str.strip()

# Replace common variants of gender
df['Gender'] = df['Gender'].replace([
    'female', 'f', 'woman', 'cis female', 'femake',
    'cis-female/femme', 'female (cis)'
], 'Female')

df['Gender'] = df['Gender'].replace([
    'male', 'm', 'man', 'cis male', 'malr', 'msle', 'make'
], 'Male')

df['Gender'] = df['Gender'].replace([
    'trans-female', 'trans woman', 'genderqueer', 'non-binary',
    'agender', 'trans male', 'gender fluid', 'other'
], 'Other')

# Confirm changes; any label not listed above stays lower-case and shows up here
print("Gender value counts after cleaning:\n", df['Gender'].value_counts())





### What all manipulations have you done and insights you found?

- Removed implausible age values, keeping only respondents older than 10 and younger than 100.

- Standardized gender values for better grouping: grouped similar terms under "Male", "Female", and "Other".

- After cleaning, we found that most respondents identify as either Male or Female, with a small percentage identifying as Other.

- This cleaning helps to avoid misleading results during visualizations and group comparisons.

- The dataset is now more structured and ready for deeper analysis and plotting.
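
The wrangling code above does not yet deal with missing values. One possible approach, sketched under the assumption that `self_employed` and `work_interfere` are among the columns with gaps, is to replace `NaN` with an explicit label so those respondents still appear in count plots. The helper name and the label text are hypothetical.

```python
import pandas as pd


def fill_categorical_gaps(frame, columns, label="Not answered"):
    """Replace NaN in the given categorical columns with an explicit label."""
    cleaned = frame.copy()  # leave the caller's frame untouched
    for col in columns:
        cleaned[col] = cleaned[col].fillna(label)
    return cleaned


# Toy data standing in for the survey (values are made up)
toy = pd.DataFrame({
    "self_employed": ["No", None, "Yes"],
    "work_interfere": [None, "Often", "Never"],
})
filled = fill_categorical_gaps(toy, ["self_employed", "work_interfere"])
print(filled)
```

Keeping a separate "Not answered" category, rather than dropping rows or guessing a value, avoids silently shrinking the sample or inventing answers.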



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 - Bar Plot: Treatment by Gender

In [None]:
# Chart - 1 visualization code
# Horizontal bar plot (if too many long names)
sns.countplot(y='Gender', hue='treatment', data=df)
plt.title("Mental Health Treatment by Gender")
plt.xlabel("Number of Responses")
plt.ylabel("Gender")
plt.tight_layout()
plt.show()



##### 1. Why did you pick the specific chart?

To compare mental health treatment across gender categories.

##### 2. What is/are the insight(s) found from the chart?

Female respondents are slightly more likely to seek treatment compared to male respondents.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Companies should promote mental health openness especially among male employees, where stigma might be higher.
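
Because the gender groups differ greatly in size, raw counts can hide differences in rates. A minimal sketch of a row-normalized crosstab that turns counts into within-group percentages; the helper `treatment_rate_by` and the toy data are assumptions for illustration.

```python
import pandas as pd


def treatment_rate_by(frame, column):
    """Percentage of each treatment answer within each group of `column`."""
    # normalize="index" divides every row by its own total
    table = pd.crosstab(frame[column], frame["treatment"], normalize="index")
    return (table * 100).round(1)


# Toy data standing in for the survey (values are made up)
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "treatment": ["Yes", "No", "No", "No", "Yes", "No"],
})
rates = treatment_rate_by(toy, "Gender")
print(rates)
```

The same helper could be reused for other grouping columns, for example `treatment_rate_by(df, "remote_work")`.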

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(10,5))
sns.countplot(y='no_employees', hue='treatment', data=df)
plt.title("Treatment by Company Size")
plt.xlabel("Number of Responses")
plt.ylabel("Company Size")
plt.show()


##### 1. Why did you pick the specific chart?

To see if employees at large companies are more comfortable seeking mental health help.

##### 2. What is/are the insight(s) found from the chart?

People in mid-to-large companies seek treatment more often than those in very small companies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Small businesses need better mental health awareness programs.
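
Company-size buckets are easier to compare when they are plotted from smallest to largest. A sketch using an ordered categorical follows; the bucket labels in `SIZE_ORDER` are an assumption about how `no_employees` is coded and should be checked against `df['no_employees'].unique()`.

```python
import pandas as pd

# Assumed labels for `no_employees`; verify against the actual data
SIZE_ORDER = ["1-5", "6-25", "26-100", "100-500", "500-1000", "More than 1000"]


def order_company_size(frame, column="no_employees", order=SIZE_ORDER):
    """Return a copy with `column` stored as an ordered categorical."""
    ordered = frame.copy()
    ordered[column] = pd.Categorical(ordered[column], categories=order, ordered=True)
    return ordered


# Toy data standing in for the survey
toy = pd.DataFrame({"no_employees": ["More than 1000", "1-5", "26-100", "6-25"]})
sorted_sizes = order_company_size(toy).sort_values("no_employees")
print(sorted_sizes["no_employees"].tolist())
```

Alternatively, the same list can be passed directly to the plot through the `order` parameter of `sns.countplot`.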

#### Chart - 3

In [None]:
# Chart - 3 visualization code
sns.countplot(x='remote_work', hue='treatment', data=df)
plt.title("Remote Work vs Mental Health Treatment")
plt.xlabel("Remote Work")
plt.ylabel("Number of Responses")
plt.show()



##### 1. Why did you pick the specific chart?

To check whether remote workers, who may face more isolation, seek treatment at a different rate than on-site workers.

##### 2. What is/are the insight(s) found from the chart?

People working remotely are slightly more likely to seek treatment.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Remote-friendly companies should invest in virtual wellness programs.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(10,5))
sns.countplot(x='work_interfere', hue='family_history', data=df)
plt.title("Work Interference vs Family History")
plt.xlabel("Mental Health Interference at Work")
plt.ylabel("Number of Responses")
plt.show()


##### 1. Why did you pick the specific chart?

To evaluate whether family history impacts work interference.

##### 2. What is/are the insight(s) found from the chart?

Respondents with family history of mental illness more often report work interference.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Regular mental wellness check-ins for such employees can improve productivity.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Note: `seek_help` records whether the employer provides resources on how to
# seek help, not whether the respondent personally sought help
plt.figure(figsize=(8,5))
sns.countplot(x='anonymity', hue='seek_help', data=df)
plt.title("Anonymity Protection vs Employer Help-Seeking Resources")
plt.xlabel("Is Anonymity Protected?")
plt.ylabel("Number of Responses")
plt.show()


##### 1. Why did you pick the specific chart?

It shows whether employers that protect employee anonymity also tend to provide resources on how to seek help.

##### 2. What is/are the insight(s) found from the chart?

Respondents who say their anonymity is protected more often also report that their employer provides resources for seeking help.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

A strong confidentiality policy, paired with clear guidance on how to get help, can increase participation in mental health programs.

#### Chart - 6 - Pie Chart

In [None]:
# Chart - 6 visualization code
# Count treatment responses
treatment_counts = df['treatment'].value_counts()

# Plot pie chart
plt.figure(figsize=(6, 6))
plt.pie(treatment_counts, labels=treatment_counts.index, autopct='%1.1f%%', startangle=90, colors=['skyblue', 'orange'])
plt.title("Proportion of People Who Sought Mental Health Treatment")
plt.axis('equal')  # Equal aspect ratio to make it a circle
plt.show()


##### 1. Why did you pick the specific chart?

A pie chart provides a simple, visual representation of the percentage of people who have (or haven’t) sought mental health treatment. It’s easy to understand and effective for presentations.



##### 2. What is/are the insight(s) found from the chart?

- Around 49%–51% of respondents have sought treatment.

- This shows that mental health concerns are common in the tech industry.

- The near 50-50 split suggests that many people either need support or are already receiving it.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Use:

This insight helps companies understand the importance of providing accessible mental health resources, since almost half the workforce is impacted.

Potential Risk:

Ignoring this data could result in underestimating employee needs, leading to burnout and productivity loss.
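
A survey percentage is an estimate, so it helps to attach a rough margin of error. The sketch below uses the normal-approximation interval p ± z·sqrt(p(1−p)/n). The function name and the example counts are illustrative assumptions, not figures from the survey.

```python
import math


def proportion_with_ci(successes, total, z=1.96):
    """Sample proportion with a normal-approximation confidence interval."""
    p = successes / total
    margin = z * math.sqrt(p * (1 - p) / total)
    # Clamp to the valid [0, 1] range
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Illustrative counts only, not taken from the survey
p, low, high = proportion_with_ci(successes=50, total=100)
print(f"{p:.1%} (95% CI {low:.1%} to {high:.1%})")
```

In the notebook this could be called as `proportion_with_ci((df['treatment'] == 'Yes').sum(), len(df))`. The approximation is reasonable for proportions near 50% with a sample of this size, but it treats respondents as a simple random sample, which a voluntary survey is not.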




#### Chart - 14 - Correlation Heatmap

In [None]:
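# Correlation Heatmap visualization code
# Age is the only natively numeric column, so on its own it yields a 1x1 matrix.
# Encode a few Yes/No columns as 1/0 so there is something to correlate against.
binary_cols = ['treatment', 'family_history', 'remote_work', 'tech_company']
numeric_df = df[['Age']].copy()
for col in binary_cols:
    numeric_df[col] = df[col].map({'Yes': 1, 'No': 0})

# Generate correlation matrix (Pearson by default)
corr_matrix = numeric_df.corr()

# Plot heatmap on a fixed -1..1 colour scale
plt.figure(figsize=(6, 5))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title("Correlation Heatmap")
plt.show()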


##### 1. Why did you pick the specific chart?

Correlation heatmaps show how strongly numerical features relate to each other. They help identify features that are redundant or highly related.

##### 2. What is/are the insight(s) found from the chart?

Since this dataset has mostly categorical variables, correlation among numerical values is limited. Age does not show strong correlation with any other field.
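
For categorical columns, an association measure such as Cramér's V can play the role that Pearson correlation plays for numeric ones. The sketch below computes the uncorrected statistic by hand from a contingency table. It is a swapped-in technique for illustration, not something the original notebook uses, and the toy data are made up.

```python
import numpy as np
import pandas as pd


def cramers_v(x, y):
    """Uncorrected Cramér's V between two categorical series (0 = none, 1 = perfect)."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else float("nan")


# Toy data in which the two columns agree perfectly
toy = pd.DataFrame({
    "family_history": ["Yes", "Yes", "No", "No"],
    "treatment": ["Yes", "Yes", "No", "No"],
})
print(cramers_v(toy["family_history"], toy["treatment"]))
```

On the survey data, `cramers_v(df['family_history'], df['treatment'])` would give a single number summarizing how strongly the two answers go together, without implying a direction or a cause.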

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# For pairplot, encode categorical variables into numeric form
df_encoded = df.copy()
df_encoded['treatment'] = df_encoded['treatment'].map({'Yes': 1, 'No': 0})
df_encoded['family_history'] = df_encoded['family_history'].map({'Yes': 1, 'No': 0})
df_encoded['tech_company'] = df_encoded['tech_company'].map({'Yes': 1, 'No': 0})

# Select important columns
sns.pairplot(df_encoded[['Age', 'treatment', 'family_history', 'tech_company']])
plt.suptitle("Pair Plot of Selected Features", y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

A pairplot helps visualize pairwise relationships and distributions in one view.

##### 2. What is/are the insight(s) found from the chart?

People with a family history of mental illness are more likely to have sought treatment.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the insights gained from this analysis, here are key recommendations to help the client achieve their business objective:

**1) Promote Mental Health Awareness Across Genders**
- Insight: Male employees are less likely to seek treatment than females.

- Action: Launch mental health campaigns focused on reducing stigma, especially targeted at men and minority genders.

- Impact: Improves openness, trust, and early intervention.

**2) Expand Mental Health Support in Small and Remote Companies**
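- Insight: Employees in very small companies seek treatment less often than those in mid-to-large companies, while remote workers seek it slightly more often, so both groups stand to benefit from easier access to support.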

- Action: Provide access to virtual therapy, online workshops, and HR training for small teams.

- Impact: Ensures no employee feels isolated or unsupported.

**3) Normalize Mental Health Conversations at Work**
- Insight: Many people fear consequences of disclosing mental health issues to employers or coworkers.

- Action: Introduce anonymous feedback systems and safe spaces for open discussion (e.g., wellness forums or buddy programs).

- Impact: Builds a psychologically safe workplace and reduces fear.

**4) Make Leave Policies Transparent and Flexible**
- Insight: Many employees find it difficult to take mental health leave.

- Action: Simplify medical leave process and openly communicate leave policies, especially around mental wellness.

- Impact: Reduces stress and encourages people to take time off when needed.

**5) Leverage Data for Proactive Intervention**
- Insight: A family history of mental illness is associated with higher treatment rates and more frequently reported work interference.

- Action: Use anonymized internal surveys or HR tools to identify risk patterns and offer early support.

- Impact: Proactive prevention instead of reactive care.


# **Conclusion**

This project analyzed a 2014 survey dataset to understand mental health awareness, treatment patterns, and workplace support in the tech industry. Using exploratory data analysis and visualizations, we identified key trends across gender, company size, remote work setup, and family history.

One of the most striking findings is that nearly **half of all respondents** have sought mental health treatment—indicating the **high prevalence of mental health concerns** among tech professionals. However, there are clear differences in treatment rates based on gender, family history, and employment type.

**Male employees** and those working in **very small companies** are less likely to seek treatment, possibly due to **stigma** or a lack of awareness and resources, while **remote workers** are slightly more likely to seek it. Additionally, respondents with a **family history of mental illness** were more likely to seek treatment and to report work interference, highlighting the need for proactive support systems.

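The charts also suggest that supportive policies tend to come together: respondents whose employers offer **anonymity protections** more often report that the employer also provides resources for seeking help. Whether **mental health benefits** and **wellness programs** are linked to higher treatment rates was not examined here and would need further analysis.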

Overall, this analysis emphasizes the **urgent need for tech companies to prioritize mental health**—not just as an HR checkbox, but as a core component of workplace culture. By removing stigma, improving access to resources, and encouraging open conversations, organizations can create a healthier, more productive environment for all employees.

In conclusion, data-driven insights like these can empower companies to **build inclusive and supportive workplaces, resulting in improved employee well-being, better retention, and long-term business growth**.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***