<a href="https://colab.research.google.com/github/shobika113/LABMENTIX/blob/main/shobika_mental_health_labmentix_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual
Shobika K


# **Project Summary -**


In this project, I perform an exploratory data analysis (EDA) on the **Mental Health in Tech Survey** dataset. The aim is to uncover patterns, relationships, and insights related to mental health issues and workplace attitudes in the tech industry.

####  **Key Steps Undertaken**

1. **Understanding the Data**

   * Reviewed the structure, features, and types of responses in the dataset.
   * Identified important columns such as age, gender, country, family history, treatment, work interference, and employer support.

2. **Data Cleaning & Wrangling**

   * Handled missing values and inconsistent entries (e.g., country names, gender categories).
   * Removed outliers in the age column to maintain data quality.
   * Standardized categorical variables for easier analysis.

3. **Data Visualization**

   * Visualized distributions of age, gender, and country-wise participation.
   * Explored relationships between mental health treatment and factors such as family history, work interference, and age.
   * Created heatmaps and bar charts to compare geographical trends and workplace support across countries.

4. **Business Insights**

   * Countries with more supportive workplace policies tend to have employees more open about seeking treatment.
   * Family history and work stress are major predictors of mental health issues.
   * Companies with better mental health awareness policies may reduce stigma and improve employee well-being.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Mental health is a growing concern in the tech industry, where high stress levels, long working hours, and limited support can negatively impact employee well-being. This project explores the factors influencing mental health issues among tech workers using survey data. Through data wrangling, visualization, and pattern analysis, we seek to understand how mental health conditions and attitudes vary by demographics, geography, and workplace environment. The ultimate goal is to uncover actionable insights that can help organizations create more supportive mental health policies and reduce stigma in the tech workplace.

#### **Define Your Business Objective?**

The business objective of this project is to analyze mental health trends in the tech industry to help organizations make data-driven decisions that promote better mental health support in the workplace. By identifying key factors such as geographical patterns, workplace attitudes, and personal predictors related to mental health, companies can:

* Recognize early warning signs among employees.
* Develop targeted mental health programs.
* Foster an open and supportive work culture.
* Reduce absenteeism and improve employee productivity.

The ultimate goal is to use insights from the data to guide effective mental health policies and enhance employee well-being in the tech sector.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
# For data manipulation and handling
import pandas as pd
import numpy as np

# For visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from sklearn.preprocessing import LabelEncoder
print("All libraries imported successfully.")

### Dataset Loading

In [None]:
# Load Dataset
file_path = '/content/survey.csv'
# Load the dataset
data = pd.read_csv(file_path)

# Display basic info
print("✅ Dataset loaded successfully.")
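
The guidelines above ask for exception handling, and the load step is the most likely place for a run to fail. The following is a minimal sketch, assuming a hypothetical helper named `load_survey` that is not part of the original notebook; it only wraps `pd.read_csv` with two early checks.

```python
from pathlib import Path

import pandas as pd


def load_survey(path):
    """Read the survey CSV, failing early with a readable message.

    Hypothetical helper for illustration; the notebook itself calls
    pd.read_csv directly.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Survey file not found: {csv_path}")
    frame = pd.read_csv(csv_path)
    if frame.empty:
        raise ValueError(f"Survey file has no rows: {csv_path}")
    return frame
```

In the cell above it could be used as `data = load_survey(file_path)`, so a missing upload in Colab stops the run at the first cell rather than at a later chart.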

### Dataset First View

In [None]:
# Dataset First Look
data.head()

In [None]:
data.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, columns = data.shape
print(f"The dataset contains {rows} rows and {columns} columns.")

### Dataset Information

In [None]:
# Dataset Info
data.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_count = data.duplicated().sum()
print(f"Number of duplicate rows in the dataset: {duplicate_count}")

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = data.isnull().sum()
print("Missing values in each column:\n")
print(missing_values)

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 6))
sns.heatmap(data.isnull(), cmap='viridis', cbar=False)
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?



The dataset contains **1,259 rows** from a survey focused on mental health in the tech industry. It includes **27 columns** covering demographics (age, gender, country), work environment (remote work, company size, tech company), and mental health indicators (treatment, family history, work interference). Most fields are complete, though some have missing values, such as state (515 missing), work_interfere (264 missing), and self_employed (18 missing). The comments field has significant missing data and may be excluded from core analysis. The dataset has **no duplicate entries** and is ready for cleaning and exploration.
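
The missing-value counts quoted above are easier to compare as percentages of all rows. A minimal sketch, assuming a hypothetical helper `missing_summary` and a tiny made-up frame (not the survey data) to show the shape of the result:

```python
import pandas as pd


def missing_summary(frame):
    """Return count and percentage of missing values for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns with at least one missing value, largest first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Made-up example rows, used only to illustrate the output
demo = pd.DataFrame({
    "Age": [25, 31, 40, 28],
    "state": ["CA", None, None, "TX"],
    "comments": [None, None, None, "ok"],
})
print(missing_summary(demo))
```

Calling `missing_summary(data)` on the loaded survey would list `comments`, `state`, `work_interfere`, and `self_employed` with their shares of missing rows.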


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
column_names = data.columns.tolist()
print("Column Names:", column_names)

In [None]:
# Dataset Describe
data.describe(include='all').transpose()

### Variables Description

| **Variable Name**           | **Description**                                                                                |
| --------------------------- | ---------------------------------------------------------------------------------------------- |
| `Timestamp`                 | Date and time when the survey was submitted.                                                   |
| `Age`                       | Age of the respondent (in years).                                                              |
| `Gender`                    | Gender identity of the respondent.                                                             |
| `Country`                   | Country of residence.                                                                          |
| `state`                     | State of residence (mainly applicable to the U.S.).                                            |
| `self_employed`             | Whether the respondent is self-employed.                                                       |
| `family_history`            | Whether the respondent has a family history of mental illness.                                 |
| `treatment`                 | Whether the respondent has sought treatment for mental health.                                 |
| `work_interfere`            | Frequency with which mental health interferes with work.                                       |
| `no_employees`              | Number of employees at the respondent’s workplace.                                             |
| `remote_work`               | Whether the respondent works remotely.                                                         |
| `tech_company`              | Whether the respondent works in a tech company.                                                |
| `benefits`                  | Whether the employer provides mental health benefits.                                          |
| `care_options`              | Availability of mental health care options through the employer.                               |
| `wellness_program`          | Whether the employer has a wellness program.                                                   |
| `seek_help`                 | Whether the employer offers resources to seek mental health help.                              |
| `anonymity`                 | Whether anonymity is protected when seeking help for mental health.                            |
| `leave`                     | Ease of taking medical leave for mental health issues.                                         |
| `mental_health_consequence` | Perceived consequences of discussing mental health at work.                                    |
| `phys_health_consequence`   | Perceived consequences of discussing physical health at work.                                  |
| `coworkers`                 | Comfort level in discussing mental health with coworkers.                                      |
| `supervisor`                | Comfort level in discussing mental health with supervisors.                                    |
| `mental_health_interview`   | Whether mental health is a factor in job interviews.                                           |
| `phys_health_interview`     | Whether physical health is a factor in job interviews.                                         |
| `mental_vs_physical`        | Importance of mental vs. physical health in the workplace.                                     |
| `obs_consequence`           | Whether the respondent has observed negative consequences of mental health disclosure at work. |
| `comments`                  | Open-ended comments (free text, mostly missing).                                               |


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_values = data.nunique().sort_values(ascending=False)

# Display the result
print("Unique values in each column:\n")
print(unique_values)

In [None]:
unique_counts = data.nunique().sort_values()

# Set plot style
plt.figure(figsize=(10, 6))
sns.barplot(x=unique_counts.values, y=unique_counts.index, palette='viridis')

# Add labels and title
plt.title("Unique Values per Column in Survey Dataset", fontsize=14)
plt.xlabel("Number of Unique Values")
plt.ylabel("Columns")
plt.tight_layout()
plt.show()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
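# Write your code to make your dataset analysis ready.

# ----------------------
# Step 1: Drop columns not used in the analysis
# ----------------------
# errors='ignore' lets the cell be re-run after the columns are already gone
data = data.drop(columns=['Timestamp', 'state', 'comments'], errors='ignore')
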
# ----------------------
# Step 2: Clean 'Age' column
# ----------------------
# Remove unrealistic age entries (e.g., <16 or >100)
# .copy() makes the filtered frame independent, so later column assignments do not warn
data = data[(data['Age'] >= 16) & (data['Age'] <= 100)].copy()

# ----------------------
# Step 3: Clean 'Gender' column
# ----------------------
# Convert all gender values to lowercase and strip whitespace
data['Gender'] = data['Gender'].str.lower().str.strip()

# Replace similar gender entries
male_terms = ['male', 'm', 'man', 'male-ish', 'maile', 'mal', 'male (cis)', 'cis male']
female_terms = ['female', 'f', 'woman', 'femail', 'cis female', 'femake']
trans_terms = ['trans', 'trans-female', 'trans male', 'trans man', 'trans woman', 'genderqueer']

data['Gender'] = data['Gender'].replace(male_terms, 'male')
data['Gender'] = data['Gender'].replace(female_terms, 'female')
data['Gender'] = data['Gender'].replace(trans_terms, 'trans')

# Keep only common values: male, female, trans (rows with any other entry are dropped)
data = data[data['Gender'].isin(['male', 'female', 'trans'])].copy()

# ----------------------
# Step 4: Handle missing values
# ----------------------

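# Fill or impute selected columns
# Assign the result back instead of using inplace=True on a column,
# which is deprecated and may not modify the DataFrame in newer pandas versions
data['self_employed'] = data['self_employed'].fillna('No')
data['work_interfere'] = data['work_interfere'].fillna('Don’t know')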

# ----------------------
# Step 5: Convert to categorical types (optional but helpful)
# ----------------------
categorical_cols = [
    'Gender', 'Country', 'self_employed', 'family_history', 'treatment',
    'work_interfere', 'no_employees', 'remote_work', 'tech_company',
    'benefits', 'care_options', 'wellness_program', 'seek_help',
    'anonymity', 'leave', 'mental_health_consequence',
    'phys_health_consequence', 'coworkers', 'supervisor',
    'mental_health_interview', 'phys_health_interview',
    'mental_vs_physical', 'obs_consequence'
]

data[categorical_cols] = data[categorical_cols].astype('category')

# ----------------------
# Final check
# ----------------------
data.info()  # info() prints its report directly and returns None
print(data.isnull().sum())


### What all manipulations have you done and insights you found?

We performed several data wrangling steps, including removing unrealistic age entries, standardizing gender categories (male, female, trans), and handling missing values by dropping or imputing where appropriate. Unnecessary columns like Timestamp, state, and comments were removed to focus on relevant features. After cleaning, we discovered that a significant portion of respondents reported having a family history of mental illness and receiving treatment. Most participants worked in tech companies, yet many reported limited access to mental health resources. Comfort discussing mental health with employers varied widely, indicating a need for more open and supportive workplace environments in the tech industry.
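
The gender standardization above silently drops any response that is not in one of the three term lists. The sketch below is an illustrative variant, not the notebook's own code: it assumes a hypothetical `normalize_gender` helper that reuses the same term lists as a lookup table and leaves unmatched entries as missing, so they can be reviewed before filtering.

```python
import pandas as pd

# Same term lists as in the wrangling cell, arranged as label -> raw spellings
GENDER_TERMS = {
    "male": ["male", "m", "man", "male-ish", "maile", "mal", "male (cis)", "cis male"],
    "female": ["female", "f", "woman", "femail", "cis female", "femake"],
    "trans": ["trans", "trans-female", "trans male", "trans man", "trans woman", "genderqueer"],
}
# Invert to raw spelling -> label for a single dictionary lookup
LOOKUP = {term: label for label, terms in GENDER_TERMS.items() for term in terms}


def normalize_gender(series):
    """Map raw gender strings to male/female/trans; unmatched values become NaN."""
    cleaned = series.str.lower().str.strip()
    return cleaned.map(LOOKUP)


# Made-up responses, used only to illustrate the behaviour
demo = pd.Series(["Male ", "F", "Trans woman", "Nah"])
normalized = normalize_gender(demo)
unmatched = demo[normalized.isna()]
print(normalized.tolist())
print("Unmatched raw values:", unmatched.tolist())
```

Printing the unmatched values before dropping them makes it visible how many respondents the filter removes and why.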

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
fig = px.histogram(data, x='Age', nbins=10, color='Gender',
                   title='Age Distribution by Gender',
                   labels={'Age': 'Age', 'count': 'Number of Respondents'})
fig.update_layout(barmode='overlay')  # you can use 'group' for side-by-side
fig.update_traces(opacity=0.6)
fig.show()

##### 1. Why did you pick the specific chart?

Age is a continuous variable, and histograms are ideal for showing how frequently different age ranges occur. Overlaying the distributions (with transparency) helps highlight where the age distributions for different genders overlap or differ.
This is crucial in the mental health in tech context, where age and gender may influence both awareness of and access to mental health resources.

##### 2. What is/are the insight(s) found from the chart?

The majority of respondents fall in the 25–35 age range.

Male participants dominate the survey across most age groups, especially between 25–40.

Female participation is comparatively lower, and older age groups (above 45) are underrepresented for both genders.
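
The age-range observations above are read off the histogram; they can be checked numerically by counting respondents per age band. A minimal sketch, assuming a hypothetical helper `age_band_counts` with band edges I chose for illustration, and made-up ages rather than the survey values:

```python
import pandas as pd


def age_band_counts(ages):
    """Count values per age band; bands are left-closed, e.g. 25-34 covers 25 up to 34."""
    edges = [16, 25, 35, 45, 101]  # assumed edges; 101 keeps age 100 in the last band
    labels = ["16-24", "25-34", "35-44", "45+"]
    bands = pd.cut(ages, bins=edges, labels=labels, right=False)
    return bands.value_counts(sort=False)


# Made-up ages, used only to illustrate the output
demo_ages = pd.Series([22, 27, 29, 33, 38, 51])
band_counts = age_band_counts(demo_ages)
print(band_counts)
```

Running `age_band_counts(data['Age'])` on the cleaned survey gives the exact counts behind the histogram bars, independent of the `nbins` setting.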

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Targeted Mental Health Programs: Knowing that most respondents are in the 25–35 age group, companies can design programs (e.g., workshops, flexible therapy sessions) tailored to the stressors of early/mid-career professionals.

Negative Impact:

Lack of Representation: Low participation from older employees and women could indicate:

* Cultural stigma around mental health.
* Lack of trust or comfort with company support systems.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
px.box(data, x='treatment', y='Age', title='Age by Treatment').show()

##### 1. Why did you pick the specific chart?

A box plot is ideal for visualizing age distribution across different treatment groups.

It shows key statistical properties such as median, quartiles (Q1 & Q3), and outliers.



##### 2. What is/are the insight(s) found from the chart?

The median age of respondents who received treatment is slightly higher than those who did not.

There’s a wider age spread (more variability) among the treated group, possibly including older individuals.

Younger respondents (early 20s) appear in both groups, but some outliers in the treatment group are older adults.
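
The median and spread comparisons above can be backed by the same quartiles the box plot draws. A minimal sketch, assuming a hypothetical helper `age_summary_by` and made-up rows rather than the survey data:

```python
import pandas as pd


def age_summary_by(frame, group_col, value_col="Age"):
    """Return Q1, median, Q3 and IQR of value_col for each group."""
    grouped = frame.groupby(group_col, observed=True)[value_col]
    # quantile() with a list yields one row per (group, quantile); unstack widens it
    summary = grouped.quantile([0.25, 0.5, 0.75]).unstack()
    summary.columns = ["q1", "median", "q3"]
    summary["iqr"] = summary["q3"] - summary["q1"]
    return summary


# Made-up rows, used only to illustrate the output
demo = pd.DataFrame({
    "treatment": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "Age": [30, 34, 40, 50, 22, 26, 28, 30],
})
summary = age_summary_by(demo, "treatment")
print(summary)
```

Applied as `age_summary_by(data, 'treatment')`, this shows whether the difference in medians is a year or two or something larger, which is hard to judge by eye from the plot.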

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive  Impact:


This chart helps companies understand that:

Age may influence treatment-seeking behavior.

Mental health services might need to be tailored by age group (e.g., younger employees may benefit from peer support or anonymous counseling).

Potential Impact:


If younger employees are less likely to seek treatment, this could indicate:

Cultural stigma or fear of career impact.

A lack of awareness or trust in company mental health support.

#### Chart - 3

In [None]:
# Chart - 3 visualization code: violin plot
px.violin(data, x='family_history', y='Age', color='family_history', box=True, title='Age vs Family History').show()

##### 1. Why did you pick the specific chart?

A violin plot is ideal for visualizing both the distribution and density of data across categories.

It combines the features of a box plot (median, IQR) and a KDE (Kernel Density Estimate) curve.

##### 2. What is/are the insight(s) found from the chart?

Respondents with a family history of mental illness are evenly spread across a broad age range.

The median age appears similar for both groups, but:

The distribution shape for those without family history might be narrower, indicating less age diversity.

Those with family history show a wider spread, possibly indicating higher representation across all age groups.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive  Impact:


Understanding the age distribution of individuals with family history allows organizations to:

Design awareness campaigns that focus on hereditary factors in mental health.

Promote preventive care and early intervention, especially in younger employees with known family history.

#### Chart - 4

In [None]:
# Chart - 4 visualization code: bar plot
# Count respondents for each combination of work interference and treatment
bar_data = data.groupby(['work_interfere', 'treatment']).size().reset_index(name='count')
px.bar(bar_data, x='work_interfere', y='count', color='treatment', title='Work Interference vs Treatment').show()

##### 1. Why did you pick the specific chart?

A bar plot is perfect for categorical comparisons, especially for visualizing frequencies.

This chart compares how many people with different levels of work interference due to mental health have either received treatment or not.



##### 2. What is/are the insight(s) found from the chart?

Respondents who report frequent interference with work (Often, Sometimes) are more likely to have received treatment.

Those reporting Never or Rarely experiencing interference are less likely to seek treatment.

The highest count of treatment is often observed among those who say work is “Sometimes” affected—showing a potential threshold where people decide to seek help.
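
Raw counts can mislead here because the interference categories differ in size, so "more likely" is better judged from the share treated within each category. A minimal sketch, assuming a hypothetical helper `treatment_rate` and made-up rows rather than the survey data:

```python
import pandas as pd


def treatment_rate(frame, factor, outcome="treatment", positive="Yes"):
    """Percentage of rows with outcome == positive within each level of factor."""
    # normalize='index' turns each row of the crosstab into shares that sum to 1
    shares = pd.crosstab(frame[factor], frame[outcome], normalize="index")
    return (shares[positive] * 100).round(1)


# Made-up rows, used only to illustrate the output
demo = pd.DataFrame({
    "work_interfere": ["Often", "Often", "Often", "Never", "Never", "Sometimes", "Sometimes"],
    "treatment": ["Yes", "Yes", "No", "No", "No", "Yes", "No"],
})
rates = treatment_rate(demo, "work_interfere")
print(rates)
```

The same helper could be reused for other categorical columns such as `care_options`, `anonymity`, or `remote_work`, for example `treatment_rate(data, 'care_options')`.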

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:


This insight validates the importance of workplace impact in driving mental health treatment.

Companies can:

Monitor employees’ self-reported productivity or interference levels as early indicators.

Develop well-being check-ins or assessments to identify those who may be struggling but not seeking help.


Negative Insight:


If many individuals whose work is “Often” or “Sometimes” affected do not seek treatment, it could mean:

Lack of awareness, trust, or access to support services.

Fear of being judged or penalized for seeking help.

#### Chart - 5

In [None]:
# Chart - 5 visualization code: donut chart
tech_counts = data['tech_company'].value_counts()
px.pie(values=tech_counts.values, names=tech_counts.index, hole=0.5, title='Tech Company Employment').show()

##### 1. Why did you pick the specific chart?

It gives a quick snapshot of how many respondents are working in the tech industry versus those who are not.

This chart is especially relevant to the dataset “Mental Health in Tech”, as it helps understand the target audience of the survey.

##### 2. What is/are the insight(s) found from the chart?

Only a small proportion of respondents work for employers that are not primarily tech companies.

This confirms that the dataset is heavily tech-industry focused, validating its relevance to the mental health in tech theme.

The dominance of tech employees suggests that mental health challenges and experiences captured are primarily from a fast-paced, high-pressure environment typical of tech workplaces.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive  Impact:


The fact that most respondents are from tech companies confirms that:

Mental health strategies can be tailored specifically for tech work environments (e.g., burnout prevention, remote work support).

Companies in tech can benchmark their employee wellness practices against this dataset.

Negative Insight:


The underrepresentation of non-tech companies may indicate:

A lack of awareness or concern about mental health outside the tech industry.

Potential industry bias, meaning some insights may not generalize to broader populations.

#### Chart - 6

In [None]:
# Chart - 6 visualization code: swarm plot
sns.swarmplot(x='Gender', y='Age', data=data)
plt.title("Gender vs Age (Swarm)")
plt.show()

##### 1. Why did you pick the specific chart?

A swarm plot is ideal for showing individual data points while avoiding overlap.

In this chart:

We're exploring how Age varies across Gender.

It helps uncover patterns, outliers, or skewness in participation by gender identity.

##### 2. What is/are the insight(s) found from the chart?

Male respondents dominate the dataset, with ages concentrated between 25 and 40.

Female respondents are present in similar age ranges but are fewer in number, suggesting underrepresentation.

Non-binary/Other gender identities appear with very few data points, often scattered and not clustered.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:


Recognize which gender groups are actively participating in mental health surveys.

Identify gaps in outreach or inclusion, especially among non-binary and female employees.

Negative Insight:


The low participation of female and non-binary individuals may signal:

Lack of psychological safety.

Distrust in survey confidentiality or company support systems.

#### Chart - 7

In [None]:
# Chart - 7 visualization code: strip plot
# jitter=True spreads points horizontally so overlapping ages stay visible
sns.stripplot(x='leave', y='Age', data=data, jitter=True)
plt.title("Leave vs Age")
plt.show()

##### 1. Why did you pick the specific chart?

A strip plot is useful for displaying individual data points across categorical variables like leave.

Adding jitter prevents overlapping of points and gives a clear view of density and spread.

This plot helps explore how age relates to attitudes toward taking mental health leave, a key concern in workplace policy.

##### 2. What is/are the insight(s) found from the chart?

Respondents who say it's “Somewhat easy” or “Very easy” to take leave are spread across various ages, mostly from 25 to 40.

The group that finds it “Very difficult” or “Don’t know” tends to be more age-clustered, often skewing slightly older.

Younger employees (under 30) appear in all categories, showing diverse experiences with mental health leave.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:


Identify age groups hesitant to take mental health leave.

Improve leave policies by making them more transparent and accessible.

Create age-inclusive mental health communication strategies—especially for older employees who may be reluctant or feel stigma.

Negative Insight:


If many employees, especially older ones, find it “Very difficult” or “Don’t know” how to take leave, this reveals:

Poor communication of mental health policies.

Fear of judgment or career impact, which can prevent people from getting the help they need.

#### Chart - 8

In [None]:
# Chart - 8 visualization code: count plot
sns.countplot(x='remote_work', hue='treatment', data=data)
plt.title("Remote Work vs Treatment")
plt.show()

##### 1. Why did you pick the specific chart?

A count plot is ideal for comparing the frequency of categorical variables.

Here it places the treated and untreated counts side by side for remote and non-remote respondents, so the two work settings can be compared directly.

##### 2. What is/are the insight(s) found from the chart?

Among those who work remotely, there are likely more respondents who received treatment than those who didn’t.

Among those who do not work remotely, there is also treatment uptake, but the difference between treated and untreated may vary.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:


Understand how work settings (remote vs in-office) affect mental health treatment behaviors.

Build support structures for both remote and non-remote employees (e.g., teletherapy access, virtual wellness programs).

Negative Insight:


If non-remote workers are less likely to seek treatment, it may point to:

Rigid schedules, lack of privacy, or fear of judgment in physical offices.

A culture where mental health is not openly discussed or supported on-site.

#### Chart - 9

In [None]:
# Chart - 9 visualization code: horizontal stacked bar chart
care = pd.crosstab(data['care_options'], data['treatment'])
care.plot(kind='barh', stacked=True, colormap='viridis', title='Care Options vs Treatment')
plt.show()

##### 1. Why did you pick the specific chart?

A horizontal stacked bar chart is ideal for visualizing categorical relationships with multiple sub-categories.

It clearly shows the distribution of treatment status across the different levels of care options availability ("Yes", "No", and "Not sure").

##### 2. What is/are the insight(s) found from the chart?

Respondents who have care options available at work (“Yes”) are more likely to have sought treatment.

When no care options are available, a large portion of respondents did not receive treatment.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive  Impact:


Access to care options increases the likelihood of employees seeking mental health treatment.

Clear communication about available care options is crucial—many employees who are "Not sure" might miss out on help.

Negative Insight:


A high number of “Not sure” responses suggests:

Poor internal visibility of care programs.

Missed opportunities to support employees who might be struggling silently.

#### Chart - 10

In [None]:
# Chart - 10 visualization code: stacked bar chart
anon = pd.crosstab(data['anonymity'], data['treatment'])
anon.plot(kind='bar', stacked=True, colormap='plasma', title='Anonymity vs Treatment')
plt.show()

##### 1. Why did you pick the specific chart?

A stacked bar chart is ideal for comparing multiple subcategories within a categorical variable.

It shows both individual category counts and overall trends.

This is important in the Mental Health in Tech dataset, where perceived confidentiality can strongly influence whether someone chooses to seek help.

##### 2. What is/are the insight(s) found from the chart?

People who believe mental health support is anonymous (“Yes”) are more likely to seek treatment.

Those who say anonymity is not provided (“No”) tend to be less likely to pursue treatment.

A significant number of respondents selected “Don’t know”, and many in this group did not receive treatment, likely due to uncertainty or mistrust.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:


Organizations that guarantee anonymity can increase employee trust and boost treatment rates.

Mental health resources should come with strong privacy assurances.

Negative Insight:


A high number of “Don’t know” responses or people avoiding treatment where anonymity is unclear suggests:

Poor communication about how mental health issues are handled.

Fear of exposure or judgment, which may suppress help-seeking.

#### Chart - 11

In [None]:
# Chart - 11 visualization code: pie chart (supervisor)
sup_counts = data['supervisor'].value_counts()
px.pie(names=sup_counts.index, values=sup_counts.values, title='Supervisor Support').show()

##### 1. Why did you pick the specific chart?

A pie chart is well suited to visualizing the proportional distribution of categorical responses.
It provides a quick snapshot of how comfortable employees feel discussing mental health issues with their supervisors.

In the Mental Health in Tech dataset, this is critical because managerial support plays a key role in employee well-being, openness, and help-seeking behavior.

##### 2. What is/are the insight(s) found from the chart?

A significant proportion of employees report that they can talk to some supervisors about mental health, but not all.

A smaller portion may say Yes—they feel comfortable discussing it with their supervisor.

A notable portion likely reports No—they do not feel comfortable at all talking to their supervisor about mental health issues.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive  Impact:


The chart helps organizations understand the current state of supervisory trust in mental health discussions.

With this insight, companies can:

Provide mental health sensitivity training to managers.

Build a culture of openness and support from the top down.

Negative Insight:


If a large number of employees respond “No” or “Some of them”, it implies:

Inconsistent or poor supervisor training on handling mental health topics.

Employees may avoid seeking help, worsening their condition over time.

#### Chart - 12

In [None]:
# Chart - 12 visualization code: pie chart
px.pie(data, names='Gender', title='Gender Distribution').show()

##### 1. Why did you pick the specific chart?

A pie chart is ideal for visualizing the relative proportions of a categorical variable.

In the Mental Health in Tech context, gender plays a crucial role in how mental health is perceived, experienced, and addressed in the workplace.

##### 2. What is/are the insight(s) found from the chart?

The majority of respondents are Male, likely over 70% depending on the dataset version.

Female respondents form a smaller proportion, while non-binary/other gender identities represent a minority.
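
The exact shares behind the pie slices can be printed rather than estimated. A minimal sketch, assuming a hypothetical helper `category_shares` and made-up values rather than the survey data:

```python
import pandas as pd


def category_shares(series):
    """Percentage of rows in each category, largest first."""
    return (series.value_counts(normalize=True) * 100).round(1)


# Made-up values, used only to illustrate the output
demo_gender = pd.Series(["male"] * 7 + ["female"] * 2 + ["trans"])
shares = category_shares(demo_gender)
print(shares)
```

`category_shares(data['Gender'])` reports the percentage for each of the three cleaned categories in the survey.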

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:

With this insight, companies can acknowledge gender disparities and design inclusive wellness programs.

They can also create gender-sensitive support systems that address the unique challenges faced by women and non-binary individuals.

Negative Insight:


If the data shows low female or non-binary participation, it may reflect:

Stigma, fear of judgment, or lack of psychological safety in speaking up.

Unbalanced outreach efforts, where the mental health resources are not perceived as inclusive or safe.

#### Chart - 13

In [None]:
# Chart - 13 visualization code: KDE of Age for each family_history group
for label in data['family_history'].dropna().unique():
    # Overlay one density curve per group
    sns.kdeplot(data[data['family_history'] == label]['Age'], label=label)
plt.title("KDE: Age by Family History")
plt.legend()
plt.show()

##### 1. Why did you pick the specific chart?

A KDE (Kernel Density Estimate) plot is perfect for showing the distribution of continuous variables like Age while preserving group comparisons.

In this case, it reveals how age distributions differ for those with or without a family history of mental illness.


##### 2. What is/are the insight(s) found from the chart?

Both groups (with and without family history) have age distributions that peak in the 25–35 age range.

The “Yes” (family history) group may have a broader spread, suggesting people of various ages report a family history.

The “No” group might show a sharper peak, indicating that its respondents are concentrated in a narrower age band, perhaps among younger professionals.
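
Whether one group really has a broader spread can be checked numerically. The sketch below compares the median, standard deviation, and interquartile range of `Age` per `family_history` group. The `iqr` helper is a name introduced here, and the `toy` frame is made-up illustration data, not survey results.

```python
import pandas as pd

# Made-up illustration data; swap in the cleaned survey DataFrame in the notebook.
toy = pd.DataFrame({
    "Age": [24, 29, 31, 35, 42, 51, 26, 27, 28, 30, 31, 33],
    "family_history": ["Yes"] * 6 + ["No"] * 6,
})

def iqr(series):
    """Interquartile range: distance between the 75th and 25th percentiles."""
    return series.quantile(0.75) - series.quantile(0.25)

# One row per group; a larger std or iqr means a broader age spread.
age_spread = toy.groupby("family_history")["Age"].agg(median="median", std="std", iqr=iqr)
print(age_spread.round(2))
```

A clearly larger `std` or `iqr` for one group would support the reading of a broader curve in the KDE plot.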

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:

Recognize that a family history of mental illness is reported across a broad range of ages.

Target awareness programs not just by age but by risk awareness (like genetic or family background).

Create inclusive initiatives that educate all employees—not just those with known risks.

Negative Insight:


If younger employees without a family history are less aware or less likely to engage, it could indicate:

False sense of immunity, lack of education, or minimal exposure to mental health risks.

A culture where early signs of mental distress are overlooked, leading to late-stage intervention.

#### Chart - 14 - Correlation Heatmap

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder

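# Reload the raw survey file (adjust the path for your environment).
# Note: this replaces any cleaned `data` DataFrame created in earlier cells.
data = pd.read_csv("/content/survey.csv")

# Label-encode every text column so corr() can process it.
# astype(str) below turns missing values into the literal category "nan".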
df_encoded = data.copy()
for col in df_encoded.columns:
    if df_encoded[col].dtype == 'object':
        df_encoded[col] = LabelEncoder().fit_transform(df_encoded[col].astype(str))

# Compute the correlation matrix
correlation_matrix = df_encoded.corr()

# Plot the heatmap
plt.figure(figsize=(16, 12))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", square=True)
plt.title("Correlation Heatmap")
plt.show()


##### 1. Why did you pick the specific chart?

A correlation heatmap is ideal for visualizing relationships between numeric or encoded features in a dataset.


In the Mental Health in Tech dataset, many variables are categorical. By encoding them and visualizing their correlations, we can understand how different features interact, such as:

* Family history with treatment.
* Work interference with leave difficulty.
* Anonymity with treatment-seeking.

##### 2. What is/are the insight(s) found from the chart?

A moderate positive correlation between:

* `family_history` and `treatment`: suggests that people with a family history are more likely to seek treatment.
* `work_interfere` and `treatment`: work interference might drive individuals to get professional help.
* `leave` and `mental_health_consequence`: difficulty in taking leave may relate to worsened mental health consequences.

A negative or weak correlation between:

* `anonymity` and `treatment`: perception of anonymity might slightly influence treatment, but not strongly.
* `phys_health_consequence` and `treatment`: may indicate a disconnect between physical and mental health support or perceptions.
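
Label encoding assigns arbitrary integer codes, so for columns with more than two categories the correlation values above depend on the alphabetical order of the labels. As a cross-check that works directly on categories, the sketch below builds a contingency table and computes Cramér's V (uncorrected, without a bias or continuity adjustment). This is an alternative technique, not the method used for the heatmap. The `cramers_v` helper is a name introduced here, and the `toy` frame is made-up illustration data.

```python
import numpy as np
import pandas as pd

# Made-up illustration data; swap in the survey DataFrame in the notebook.
toy = pd.DataFrame({
    "family_history": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"],
    "treatment":      ["Yes", "Yes", "Yes", "No",  "No", "No", "No", "No", "Yes", "Yes"],
})

def cramers_v(contingency):
    """Uncorrected Cramér's V for a contingency table (0 = no association, 1 = perfect)."""
    observed = contingency.to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: (row total * column total) / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    min_dim = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim)))

# Counts of treatment answers within each family_history group
table = pd.crosstab(toy["family_history"], toy["treatment"])
# Row-normalized version: share of each treatment answer per group
treatment_rate = pd.crosstab(toy["family_history"], toy["treatment"], normalize="index")

print(treatment_rate.round(2))
print("Cramér's V:", round(cramers_v(table), 3))
```

The row-normalized table also states the finding in plain terms: the share of respondents in each `family_history` group who sought treatment.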

#### Chart - 15 - Pair Plot

In [None]:
# Select the three columns of interest from the DataFrame
df_pair = data[['Age', 'treatment', 'family_history']].copy()

# Encode treatment and family_history as integers (keep Age as is).
# LabelEncoder assigns codes alphabetically, so "No" -> 0 and "Yes" -> 1.
le = LabelEncoder()
df_pair['treatment'] = le.fit_transform(df_pair['treatment'].astype(str))
df_pair['family_history'] = le.fit_transform(df_pair['family_history'].astype(str))

# Create pairplot, colored by treatment
sns.pairplot(df_pair, hue='treatment', palette='husl')
plt.suptitle("Pairplot - Age, Treatment, Family History", y=1.02)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A pairplot is ideal when you want to explore relationships between multiple numerical or encoded variables.

It’s perfect for a quick multivariate analysis of how age, family history, and treatment interact.

Color-coding by treatment helps us identify which factors contribute to treatment-seeking behavior.

##### 2. What is/are the insight(s) found from the chart?

Individuals with a family history of mental illness (coded 1) are more likely to have received treatment (visible from clustering of certain colors).

Age does not show a strong correlation with treatment—treatment is scattered across ages.

Because treatment is used as the hue, it appears as color rather than as its own axis; its relationship with family history shows up as a different color mix at family_history = 0 and 1, consistent with the correlation heatmap.

The scatterplots allow for a visual comparison of density and trends across the three features.
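
The observation that treatment is scattered across ages can be checked by comparing treatment rates across age bands. The sketch below is one way to do that. The bin edges and labels are hypothetical choices, and the `toy` frame is made-up illustration data, not survey results.

```python
import pandas as pd

# Made-up illustration data; swap in the cleaned survey DataFrame in the notebook.
toy = pd.DataFrame({
    "Age": [22, 27, 29, 33, 36, 41, 45, 52],
    "treatment": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
})

# Hypothetical age bands: (17, 30], (30, 40], (40, 100]
age_band = pd.cut(toy["Age"], bins=[17, 30, 40, 100], labels=["18-30", "31-40", "41+"])

# Share of respondents in each band who report having sought treatment
treated = toy["treatment"].eq("Yes")
rate_by_band = treated.groupby(age_band, observed=True).mean()
print(rate_by_band.round(2))
```

Similar rates across bands would support the reading that age is not strongly related to treatment; large differences would call for a closer look.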

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Based on the insights from the *Mental Health in Tech Survey*, the client should aim to develop a **data-driven mental health support strategy** that fosters a safe, inclusive, and supportive workplace culture. The analysis reveals gaps in awareness, supervisor support, and accessibility of mental health resources. By addressing these issues, the client can improve employee well-being, boost productivity, and reduce turnover, leading to long-term positive business outcomes.

### Suggested Actions:

*  Improve awareness of mental health policies, leave options, and care resources.
*  Provide supervisor training to encourage open and supportive communication.
*  Focus on high-risk groups (e.g., those with family history, work interference).
*  Promote inclusivity and psychological safety for all gender identities.
*  Regularly track mental health trends and adapt programs based on feedback.


# **Conclusion**

In conclusion, the analysis of the Mental Health in Tech Survey highlights the need for organizations to prioritize mental health through targeted, inclusive, and well-communicated support strategies. Family history and work interference show the clearest association with treatment-seeking behavior, while anonymity shows only a weak relationship. Supervisor support appears uneven, with many employees comfortable talking to only some of their supervisors or to none. By addressing these areas through training, awareness, and accessible resources, companies can foster a mentally healthier workplace, which may improve employee engagement, reduce burnout, and strengthen organizational culture.

