![image.png](attachment:76bf6d78-de27-4006-88ed-f14d0bd2714b.png)

# Introduction to data

The following is an analysis of survey data collected to measure attitudes towards mental health and the frequency of mental health disorders in the tech workplace. The dataset contains the following columns:

* Timestamp
* Age
* Gender
* Country
* state: If you live in the United States, which state or territory do you live in?
* self_employed: Are you self-employed?
* family_history: Do you have a family history of mental illness?
* treatment: Have you sought treatment for a mental health condition?
* work_interfere: If you have a mental health condition, do you feel that it interferes with your work?
* no_employees: How many employees does your company or organization have?
* remote_work: Do you work remotely (outside of an office) at least 50% of the time?
* tech_company: Is your employer primarily a tech company/organization?
* benefits: Does your employer provide mental health benefits?
* care_options: Do you know the options for mental health care your employer provides?
* wellness_program: Has your employer ever discussed mental health as part of an employee wellness program?
* seek_help: Does your employer provide resources to learn more about mental health issues and how to seek help?
* anonymity: Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment  
  resources?
* leave: How easy is it for you to take medical leave for a mental health condition?
* mental_health_consequence: Do you think that discussing a mental health issue with your employer would have negative
  consequences?
* phys_health_consequence: Do you think that discussing a physical health issue with your employer would have negative
  consequences?
* coworkers: Would you be willing to discuss a mental health issue with your coworkers?
* phys_health_interview: Would you bring up a physical health issue with a potential employer in an interview?
* mental_vs_physical: Do you feel that your employer takes mental health as seriously as physical health?
* obs_consequence: Have you heard of or observed negative consequences for coworkers with mental health conditions in your 
  workplace?
* comments: Any additional notes or comments

# Initial approach

When selecting this data set, I gave my approach a great deal of thought.  After reading a recent article about the mental health of technical professionals, I decided the topic needed further exploration.  Here are the questions that I felt would be appropriate based on the data collected.  
1. Are there differences in mental health concerns based on gender?
2. Are there differences in mental health concerns based on the nature of the employment position (e.g. remote, tech company, etc.)?
3. Are there differences in mental health concerns in other countries based on age and gender?

# Importing packages and dataset

Before starting any project, it is important to import the Python packages it needs.  The next few lines of code import the packages used to clean this data.

In [None]:
import numpy as np

In [None]:
import seaborn as sns

In [None]:
import pandas as pd

In [None]:
import matplotlib.pyplot as plt

In [None]:
from tabulate import tabulate

In [None]:
df=pd.read_csv('../input/mental-health-in-tech-survey/survey.csv')

# Data Cleaning

1. We begin the data cleaning process by looking at which columns contain null values.

In [None]:
df.isnull().sum()

2. Now that I see that the state, self_employed, work_interfere and comments columns contain null values, I can write code to clean those columns up.
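
As a complement to the raw null counts, the share of missing values per column can make it easier to decide whether a column should be dropped or filled. A minimal sketch on a small made-up frame (`toy` and its values are illustrative and not taken from the survey):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for the survey data.
toy = pd.DataFrame({
    "state": ["CA", np.nan, "NY", np.nan],
    "self_employed": ["No", "Yes", np.nan, "No"],
    "comments": [np.nan, np.nan, np.nan, "Great survey"],
})

# Fraction of missing values per column, largest first.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

A column that is mostly empty, like `comments` in this toy frame, is a natural candidate for dropping, while a column with only a few gaps is usually easier to fill.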

3. Since we do not need the comments column to be considered in the analysis, we can drop that column by using this code.

In [None]:
df=df.drop("comments",axis=1)

4. Next I used some code that will show me if the comments column has been removed.

In [None]:
df.info()

5. Next, since we know that the work_interfere, self_employed and state columns have null values, we need to eliminate those.

In [None]:
df["work_interfere"].unique()

In [None]:
df["work_interfere"]=df["work_interfere"].fillna("Sometimes")

In [None]:
df["self_employed"].unique()

In [None]:
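# self_employed is Yes/No, so fill with "No" (assumed most common; check value_counts()), not "Sometimes".
df["self_employed"]=df["self_employed"].fillna("No")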

In [None]:
df["state"].unique()

In [None]:
df=df.dropna(subset=['state'])
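
A fill value can also be derived from the data rather than hard-coded. The sketch below, on a made-up frame, fills each column with its own most frequent answer. The frame `toy` and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with gaps in two categorical columns.
toy = pd.DataFrame({
    "work_interfere": ["Sometimes", np.nan, "Never", "Sometimes", "Often"],
    "self_employed": ["No", "No", np.nan, "Yes", "No"],
})

# Fill each column with its own most frequent (modal) answer.
for col in ["work_interfere", "self_employed"]:
    most_common = toy[col].mode()[0]
    toy[col] = toy[col].fillna(most_common)

print(toy)
```

Filling with the mode keeps every row but pulls the distribution towards the majority answer, so it is worth comparing `value_counts()` before and after.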

6. I can now drop the Timestamp column, as it is also not needed in my analysis.

In [None]:
df = df.drop(['Timestamp'], axis = 1)

7. Next we want to clean up the gender column, as many of the respondents to the survey did not have consistent answers.

In [None]:
df['Gender'].value_counts()

8.  Now that I know exactly what responses were specifically received, I put in the following code to make all the responses consistent.

In [None]:
# Assign the result back to the column; calling replace(..., inplace=True) on a
# column selection triggers warnings in newer pandas and may not update df.
df['Gender'] = df['Gender'].replace(['Male ', 'male', 'M', 'm', 'Male', 'Cis Male',
                     'Man', 'cis male', 'Mail', 'Male-ish', 'Male (CIS)',
                      'Cis Man', 'msle', 'Malr', 'Mal', 'maile', 'Make',], 'Male')

df['Gender'] = df['Gender'].replace(['Female ', 'female', 'F', 'f', 'Woman', 'Female',
                     'femail', 'Cis Female', 'cis-female/femme', 'Femake', 'Female (cis)',
                     'woman',], 'Female')

df['Gender'] = df['Gender'].replace(['Female (trans)', 'queer/she/they', 'non-binary',
                     'fluid', 'queer', 'Androgyne', 'Trans-female', 'male leaning androgynous',
                      'Agender', 'A little about you', 'Nah', 'All',
                      'ostensibly male, unsure what that really means',
                      'Genderqueer', 'Enby', 'p', 'Neuter', 'something kinda male?',
                      'Guy (-ish) ^_^', 'Trans woman',], 'Other')
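
Many of the variants above differ only in case or trailing whitespace, so stripping and lowercasing the text first can shorten the lists considerably. The sketch below is one possible way to do that. The helper `normalize_gender` and its keyword sets are hypothetical and deliberately coarse; they are not the replacement lists used above.

```python
import pandas as pd

# Hypothetical helper: normalize case and whitespace before matching.
def normalize_gender(raw):
    text = raw.strip().lower()
    if text in {"male", "m", "man", "cis male", "cis man"}:
        return "Male"
    if text in {"female", "f", "woman", "cis female"}:
        return "Female"
    # Everything that is not matched falls into the catch-all bucket.
    return "Other"

toy = pd.Series(["Male ", "m", "FEMALE", "Woman", "non-binary", "Cis Male"])
normalized = toy.map(normalize_gender)
print(normalized.value_counts())
```

Misspellings such as 'Malr' or 'femail' would still land in 'Other' with this sketch, so they need explicit entries in the sets if they are to be counted as 'Male' or 'Female'.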

9. After running this code, I can view what the column looks like now by running the following code.

In [None]:
df['Gender'].value_counts()

In [None]:
df.info()

10.  The next step was to check that all of the values in the Age column are plausible and, where they are not, to filter the data so that every age falls within a normal range for the people being surveyed.

In [None]:
df['Age'].value_counts()

In [None]:
df["Age"].unique()

In [None]:
df = df[df['Age'] > 18]
df= df[df['Age'] < 80]
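
Before filtering, it can help to count how many rows the age bounds will remove. A minimal sketch on a made-up frame, using the same bounds as the filter above (the ages in `toy` are illustrative assumptions):

```python
import pandas as pd

# Hypothetical ages, including a few implausible entries.
toy = pd.DataFrame({"Age": [25, -29, 329, 18, 45, 99999999999, 80]})

# Boolean mask equivalent to applying Age > 18 and then Age < 80.
in_range = (toy["Age"] > 18) & (toy["Age"] < 80)
print(f"Rows outside the range: {(~in_range).sum()}")

filtered = toy[in_range]
print(filtered["Age"].tolist())
```

Note that both comparisons are strict, so respondents aged exactly 18 or 80 are excluded along with the implausible values.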

11. Now that the age filter has been applied, I want to check that no out-of-range ages remain and that everything is ready for visualization.

In [None]:
# Check both bounds of the filter; this should return no rows.
df[(df['Age'] <= 18) | (df['Age'] >= 80)].head()

In [None]:
df['Age'].value_counts()

12. I also checked the remote_work, work_interfere and mental_health_consequence columns to confirm they are clean.

In [None]:
df['remote_work'].value_counts()

In [None]:
df['work_interfere'].value_counts()

In [None]:
df['mental_health_consequence'].value_counts()

In [None]:
df.isna().sum()
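
The manual checks above can also be bundled into one reusable function. The sketch below is an assumption about how that could look: `check_clean` is a hypothetical helper, and it assumes integer ages and the three gender labels produced earlier.

```python
import pandas as pd

def check_clean(frame):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if frame.isna().any().any():
        problems.append("missing values remain")
    if not frame["Gender"].isin(["Male", "Female", "Other"]).all():
        problems.append("unexpected Gender label")
    # Integer ages strictly between 18 and 80 are 19 through 79.
    if not frame["Age"].between(19, 79).all():
        problems.append("Age outside 19-79")
    return problems

# Hypothetical frames: one clean, one with two deliberate problems.
good = pd.DataFrame({"Gender": ["Male", "Other"], "Age": [25, 40]})
bad = pd.DataFrame({"Gender": ["Male", "male"], "Age": [25, 329]})

print(check_clean(good))
print(check_clean(bad))
```

Returning a list instead of raising on the first failure shows every remaining problem in one pass.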

13.  Now that all of the columns have been confirmed to be clean, I exported a clean copy of the data as a csv file and uploaded it to Tableau.

In [None]:
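# index=False keeps the pandas row index from being written as an extra unnamed column.
df.to_csv('cleaned.csv', index=False)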

14.  The following images were produced in Tableau by further filtering the data to answer the initial questions.

A. The first analysis used Tableau to find the average age of the respondents and the percentage of each gender that participated in this survey.

![image.png](attachment:7492c293-6cf0-410f-8c7a-416da52f8ff7.png)

As we can see, a higher percentage of males participated in the survey.  The average age of the respondents was 32 for females, 34 for males and 30 for other.
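
The same kind of summary can be cross-checked in pandas. A minimal sketch on a made-up frame (the rows in `toy` are illustrative assumptions and will not reproduce the survey figures):

```python
import pandas as pd

# Hypothetical respondents; not taken from the survey.
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Other", "Male", "Female"],
    "Age": [34, 30, 32, 30, 38, 28],
})

# Respondent count and mean age per gender.
summary = toy.groupby("Gender").agg(
    respondents=("Age", "count"),
    mean_age=("Age", "mean"),
)
# Share of all respondents, as a percentage.
summary["share_pct"] = (100 * summary["respondents"] / len(toy)).round(1)
print(summary)
```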

B.  This image shows how many of the respondents who were remote workers had sought treatment for a mental health condition.  Of the respondents who were remote workers, 272 had not sought treatment for mental health concerns and 130 had.

![image.png](attachment:29751322-0bd9-4ebb-aca1-0b366efdf30b.png)
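
Counts like these can be produced in pandas with a cross-tabulation. A minimal sketch, assuming the `remote_work` and `treatment` columns hold Yes/No answers (the rows in `toy` are made up and will not reproduce the figures above):

```python
import pandas as pd

# Hypothetical answers; not taken from the survey.
toy = pd.DataFrame({
    "remote_work": ["Yes", "Yes", "No", "No", "Yes", "No"],
    "treatment": ["No", "Yes", "Yes", "No", "No", "Yes"],
})

# Rows: remote_work answer; columns: treatment answer; cells: respondent counts.
counts = pd.crosstab(toy["remote_work"], toy["treatment"])
print(counts)
```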

C.  This image shows that, among the respondents working for tech companies, the number who had sought treatment for mental health concerns was much higher.  
![image.png](attachment:58b11aa6-c381-43c9-a1a1-db0cc5ad678a.png)
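
Because the tech and non-tech groups may differ in size, comparing shares rather than raw counts can give a fairer picture. A minimal sketch using a row-normalized cross-tabulation on made-up data (the rows in `toy` are illustrative assumptions):

```python
import pandas as pd

# Hypothetical answers; not taken from the survey.
toy = pd.DataFrame({
    "tech_company": ["Yes", "Yes", "Yes", "Yes", "No", "No"],
    "treatment": ["Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# normalize="index" turns each row into proportions that sum to 1.
shares = pd.crosstab(toy["tech_company"], toy["treatment"], normalize="index")
print(shares)
```

Each row then reads as the fraction of that group that did or did not seek treatment, which stays comparable even when one group is much larger than the other.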