# Feature Pre-Processing

## Setup

In [465]:
import warnings
warnings.filterwarnings('ignore')

In [466]:
import pandas as pd
import numpy as np
import sklearn
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer

## Dataframe Read

In [467]:
# We read the file mental.csv into a dataframe
df = pd.read_csv('mental.csv')

In [468]:
df.head(10)

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",...,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,What is your gender?,What country do you live in?,What US state or territory do you live in?,What country do you work in?,What US state or territory do you work in?,Which of the following best describes your work position?,Do you work remotely?
0,0,26-100,1.0,,Not eligible for coverage / N/A,,No,No,I don't know,Very easy,...,Not applicable to me,Not applicable to me,39,Male,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes
1,0,6-25,1.0,,No,Yes,Yes,Yes,Yes,Somewhat easy,...,Rarely,Sometimes,29,male,United States of America,Illinois,United States of America,Illinois,Back-end Developer|Front-end Developer,Never
2,0,6-25,1.0,,No,,No,No,I don't know,Neither easy nor difficult,...,Not applicable to me,Not applicable to me,38,Male,United Kingdom,,United Kingdom,,Back-end Developer,Always
3,1,,,,,,,,,,...,Sometimes,Sometimes,43,male,United Kingdom,,United Kingdom,,Supervisor/Team Lead,Sometimes
4,0,6-25,0.0,1.0,Yes,Yes,No,No,No,Neither easy nor difficult,...,Sometimes,Sometimes,43,Female,United States of America,Illinois,United States of America,Illinois,Executive Leadership|Supervisor/Team Lead|Dev ...,Sometimes
5,0,More than 1000,1.0,,Yes,I am not sure,No,Yes,Yes,Somewhat easy,...,Not applicable to me,Often,42,Male,United Kingdom,,United Kingdom,,DevOps/SysAdmin|Support|Back-end Developer|Fro...,Sometimes
6,0,26-100,1.0,,I don't know,No,No,No,I don't know,Somewhat easy,...,Not applicable to me,Not applicable to me,30,M,United States of America,Tennessee,United States of America,Tennessee,Back-end Developer,Sometimes
7,0,More than 1000,1.0,,Yes,Yes,No,Yes,Yes,Very easy,...,Sometimes,Often,37,female,United States of America,Virginia,United States of America,Virginia,Dev Evangelist/Advocate|Back-end Developer,Always
8,0,26-100,0.0,1.0,I don't know,No,No,No,I don't know,Very difficult,...,Rarely,Often,44,Female,United States of America,California,United States of America,California,Support|Back-end Developer|One-person shop,Sometimes
9,1,,,,,,,,,,...,Rarely,Often,30,Male,United States of America,Kentucky,United States of America,Kentucky,One-person shop|Front-end Developer|Back-end D...,Always


In [469]:
with pd.option_context("display.max_columns", None):
    display(df.head(41))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Would you feel comfortable discussing a mental health disorder with your coworkers?,Would you feel comfortable discussing a mental health disorder with your direct supervisor(s)?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you have medical coverage (private insurance or state-provided) which includes treatment of mental health issues?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Have your previous employers provided mental health benefits?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Was your anonymity protected if you chose to take advantage of mental health or substance abuse treatment resources with previous employers?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s)?,Did you feel that your previous employers took mental health as seriously as physical health?,Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you be willing to bring up a physical health issue with a potential employer in an interview?,Why or why not?,Would you bring up a mental health issue with a potential employer in an interview?,Why or why not?.1,Do you feel that being identified as a person with a mental health issue would hurt your career?,Do you think that team members/co-workers would view you more negatively if they knew you suffered from a mental health issue?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,"If yes, what condition(s) have you been diagnosed with?","If maybe, what condition(s) do you believe you have?",Have you been diagnosed with a mental health condition by a medical professional?,"If so, what condition(s) were you diagnosed with?",Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,What is your gender?,What country do you live in?,What US state or territory do you live in?,What country do you work in?,What US state or territory do you work in?,Which of the following best describes your work position?,Do you work remotely?
0,0,26-100,1.0,,Not eligible for coverage / N/A,,No,No,I don't know,Very easy,No,No,Maybe,Yes,I don't know,No,,,,,,,,,1,"No, none did",N/A (not currently aware),I don't know,None did,I don't know,Some of them,None of them,Some of my previous employers,Some of my previous employers,I don't know,None of them,Maybe,,Maybe,,Maybe,"No, I don't think they would",Somewhat open,No,,No,Yes,No,,,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",0,Not applicable to me,Not applicable to me,39,Male,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes
1,0,6-25,1.0,,No,Yes,Yes,Yes,Yes,Somewhat easy,No,No,Maybe,Yes,Yes,No,,,,,,,,,1,"Yes, they all did",I was aware of some,None did,Some did,"Yes, always",None of them,None of them,"No, at none of my previous employers",Some of my previous employers,Some did,None of them,Maybe,It would depend on the health issue. If there ...,No,While mental health has become a more prominen...,"No, I don't think it would","No, I don't think they would",Somewhat open,No,,Yes,Yes,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",1,Rarely,Sometimes,29,male,United States of America,Illinois,United States of America,Illinois,Back-end Developer|Front-end Developer,Never
2,0,6-25,1.0,,No,,No,No,I don't know,Neither easy nor difficult,Maybe,No,Maybe,Maybe,I don't know,No,,,,,,,,,1,"No, none did",N/A (not currently aware),None did,Some did,I don't know,I don't know,Some of them,Some of my previous employers,I don't know,I don't know,Some of them,Yes,"They would provable need to know, to Judge if ...",Yes,"Stigma, mainly.",Maybe,Maybe,Somewhat open,Maybe/Not sure,Yes,No,Maybe,No,,,No,,1,Not applicable to me,Not applicable to me,38,Male,United Kingdom,,United Kingdom,,Back-end Developer,Always
3,1,,,,,,,,,,,,,,,,1.0,"Yes, I know several","Sometimes, if it comes up",I'm not sure,"Sometimes, if it comes up",I'm not sure,Yes,1-25%,1,Some did,N/A (not currently aware),None did,None did,I don't know,Some of them,Some of them,Some of my previous employers,Some of my previous employers,I don't know,Some of them,Yes,"old back injury, doesn't cause me many issues ...",Maybe,would not if I was not 100% sure that the disc...,"Yes, I think it would",Maybe,Neutral,No,,No,Yes,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",1,Sometimes,Sometimes,43,male,United Kingdom,,United Kingdom,,Supervisor/Team Lead,Sometimes
4,0,6-25,0.0,1.0,Yes,Yes,No,No,No,Neither easy nor difficult,Yes,Maybe,Maybe,No,No,No,,,,,,,,,1,I don't know,N/A (not currently aware),Some did,None did,I don't know,Some of them,Some of them,"No, at none of my previous employers",Some of my previous employers,Some did,Some of them,Maybe,Depending on the interview stage and whether I...,No,I don't know,"Yes, I think it would",Maybe,Somewhat open,"Yes, I experienced",Yes,Yes,Yes,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",1,Sometimes,Sometimes,43,Female,United States of America,Illinois,United States of America,Illinois,Executive Leadership|Supervisor/Team Lead|Dev ...,Sometimes
5,0,More than 1000,1.0,,Yes,I am not sure,No,Yes,Yes,Somewhat easy,Yes,Yes,Maybe,Yes,No,Yes,,,,,,,,,1,"No, none did","Yes, I was aware of all of them",None did,None did,I don't know,"Yes, all of them",Some of them,"No, at none of my previous employers","No, at none of my previous employers",None did,Some of them,Yes,If it would potentially affect my ability to d...,Maybe,It would depend on the field & what I knew of ...,"Yes, I think it would",Maybe,Somewhat open,"Yes, I experienced",No,No,No,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",,No,,1,Not applicable to me,Often,42,Male,United Kingdom,,United Kingdom,,DevOps/SysAdmin|Support|Back-end Developer|Fro...,Sometimes
6,0,26-100,1.0,,I don't know,No,No,No,I don't know,Somewhat easy,No,No,Maybe,Yes,Yes,No,,,,,,,,,1,Some did,I was aware of some,None did,Some did,I don't know,None of them,None of them,Some of my previous employers,"Yes, at all of my previous employers",Some did,None of them,Yes,I want to gauge their ability to support this ...,Yes,"I want to gauge their ability to support, unde...","Yes, I think it would","No, I don't think they would",Not applicable to me (I do not have a mental i...,No,,No,No,No,,,No,,0,Not applicable to me,Not applicable to me,30,M,United States of America,Tennessee,United States of America,Tennessee,Back-end Developer,Sometimes
7,0,More than 1000,1.0,,Yes,Yes,No,Yes,Yes,Very easy,No,No,Maybe,Yes,I don't know,No,,,,,,,,,1,Some did,I was aware of some,Some did,Some did,Sometimes,Some of them,Some of them,Some of my previous employers,Some of my previous employers,Some did,Some of them,No,I feel it's irrelevant.,No,Same reason.,Maybe,Maybe,Somewhat open,"Yes, I observed",Maybe,Yes,Yes,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",1,Sometimes,Often,37,female,United States of America,Virginia,United States of America,Virginia,Dev Evangelist/Advocate|Back-end Developer,Always
8,0,26-100,0.0,1.0,I don't know,No,No,No,I don't know,Very difficult,Yes,Yes,Yes,Maybe,No,No,,,,,,,,,1,I don't know,N/A (not currently aware),Some did,None did,I don't know,"Yes, all of them","Yes, all of them","No, at none of my previous employers","No, at none of my previous employers",None did,None of them,Maybe,Makes me a less attractive candidate.,Maybe,Only if I felt I required accommodation. Even ...,Maybe,"Yes, they do",Somewhat open,"Yes, I observed",No,Yes,Yes,Yes,"Mood Disorder (Depression, Bipolar Disorder, etc)",,Yes,"Mood Disorder (Depression, Bipolar Disorder, etc)",1,Rarely,Often,44,Female,United States of America,California,United States of America,California,Support|Back-end Developer|One-person shop,Sometimes
9,1,,,,,,,,,,,,,,,,1.0,I know some,"No, because it doesn't matter",,"Sometimes, if it comes up",No,Yes,1-25%,1,Some did,I was aware of some,None did,None did,I don't know,Some of them,Some of them,Some of my previous employers,Some of my previous employers,I don't know,None of them,Yes,"Generally speaking, and this isn't always the ...",Maybe,"It really depends on the person, the employer,...",Maybe,"No, I don't think they would",Very open,No,,Yes,Yes,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",,Yes,"Anxiety Disorder (Generalized, Social, Phobia,...",1,Rarely,Often,30,Male,United States of America,Kentucky,United States of America,Kentucky,One-person shop|Front-end Developer|Back-end D...,Always


## Dataframe Check

In [470]:
df.shape

(1433, 63)

In [471]:
# We get all column names
df.columns

Index(['Are you self-employed?',
       'How many employees does your company or organization have?',
       'Is your employer primarily a tech company/organization?',
       'Is your primary role within your company related to tech/IT?',
       'Does your employer provide mental health benefits as part of healthcare coverage?',
       'Do you know the options for mental health care available under your employer-provided coverage?',
       'Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?',
       'Does your employer offer resources to learn more about mental health concerns and options for seeking help?',
       'Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?',
       'If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:',
       'Do you think that dis

# General Cleanup

## Column Cleanup - Too many Unique Values

In [472]:
# We store all columns with many unique values inside an array
unclear_data_columns = []
for column in df.columns:
    if len(df[column].value_counts(dropna=False)) > 10: # If the column has more than 10 unique entries, the column might be unqualified for being used in our analysis
        unclear_data_columns.append(column)

        
for column in unclear_data_columns:
    print(column)

Why or why not?
Why or why not?.1
If yes, what condition(s) have you been diagnosed with?
If maybe, what condition(s) do you believe you have?
If so, what condition(s) were you diagnosed with?
What is your age?
What is your gender?
What country do you live in?
What US state or territory do you live in?
What country do you work in?
What US state or territory do you work in?
Which of the following best describes your work position?


In [473]:
# The columns with "Why or Why not?" or other
df = df.drop(["Why or why not?", "Why or why not?.1", "What country do you live in?", "What country do you work in?",
             "If yes, what condition(s) have you been diagnosed with?","If so, what condition(s) were you diagnosed with?" ], axis = 1)

In [474]:
df.shape

(1433, 57)

## Column Cleanup - Too many Missing Values

In [475]:
# We replace all values where there is no definitve answer
df = df.replace("I don't know", np.nan)
df = df.replace("Maybe", np.nan)
df = df.replace("Not applicable to me", np.nan) 

In [476]:
df.isnull().sum()

Are you self-employed?                                                                                                                                                                 0
How many employees does your company or organization have?                                                                                                                           287
Is your employer primarily a tech company/organization?                                                                                                                              287
Is your primary role within your company related to tech/IT?                                                                                                                        1170
Does your employer provide mental health benefits as part of healthcare coverage?                                                                                                    606
Do you know the options for mental health care available under your employe

In [477]:
# We split the Dataframe in 2 different groups those who are self-emplyoed and those who are not
df_self_employed = df[df['Are you self-employed?'] == 1]
df_employed = df[df['Are you self-employed?'] == 0]
# This ensures that checking empty columns does not include those, which are empty for a reason

In [478]:
# We get all columns that exceed a threshold of 30% missing values
threshold=0.3
print(df_employed[df_employed.columns[df_employed.isnull().mean() > threshold]].columns)

Index(['Is your primary role within your company related to tech/IT?',
       'Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?',
       'Do you think that discussing a mental health disorder with your employer would have negative consequences?',
       'Would you feel comfortable discussing a mental health disorder with your coworkers?',
       'Would you feel comfortable discussing a mental health disorder with your direct supervisor(s)?',
       'Do you feel that your employer takes mental health as seriously as physical health?',
       'Do you have medical coverage (private insurance or state-provided) which includes treatment of  mental health issues?',
       'Do you know local or online resources to seek help for a mental health disorder?',
       'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?',
       'If y

In [479]:
df['Do you believe your productivity is ever affected by a mental health issue?'].isnull().sum()

1177

In [480]:
df.shape

(1433, 57)

In [481]:
# We rename the column due to some error
df.rename(columns = {df.columns[16]:'Do you have medical coverage (private insurance or state-provided) which includes treatment of  mental health issues?'}, inplace = True)

In [482]:
# We remove columns that have missing values, which are not linked to other columns and where values cannot be otherwise imputed
df = df.drop(["Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?"], axis = 1)
df = df.drop(["Do you have medical coverage (private insurance or state-provided) which includes treatment of  mental health issues?"], axis = 1)
df = df.drop(["Would you feel comfortable discussing a mental health disorder with your coworkers?"], axis = 1)
df = df.drop(["Would you feel comfortable discussing a mental health disorder with your direct supervisor(s)?"], axis = 1)
df = df.drop(["Have your previous employers provided mental health benefits?"], axis = 1)
df = df.drop(["Was your anonymity protected if you chose to take advantage of mental health or substance abuse treatment resources with previous employers?"], axis = 1)
df = df.drop(["Did you feel that your previous employers took mental health as seriously as physical health?"], axis = 1)
df = df.drop(["Would you be willing to bring up a physical health issue with a potential employer in an interview?"], axis = 1)
df = df.drop(["Do you feel that being identified as a person with a mental health issue would hurt your career?"], axis = 1)
df = df.drop(["Do you think that team members/co-workers would view you more negatively if they knew you suffered from a mental health issue?"], axis = 1)

In [483]:
df.shape

(1433, 47)

## Dataframe Cleanup - Remove Rows unimportant to the Research Question

## Column - "Which of the following best describes your work position?"

### Check for row and column information

In [484]:
df['Which of the following best describes your work position?'].unique()

array(['Back-end Developer', 'Back-end Developer|Front-end Developer',
       'Supervisor/Team Lead',
       'Executive Leadership|Supervisor/Team Lead|Dev Evangelist/Advocate|DevOps/SysAdmin|Support|Back-end Developer|Front-end Developer',
       'DevOps/SysAdmin|Support|Back-end Developer|Front-end Developer|Designer',
       'Dev Evangelist/Advocate|Back-end Developer',
       'Support|Back-end Developer|One-person shop',
       'One-person shop|Front-end Developer|Back-end Developer',
       'Front-end Developer', 'Executive Leadership',
       'Supervisor/Team Lead|Dev Evangelist/Advocate|Back-end Developer|Front-end Developer',
       'DevOps/SysAdmin|Back-end Developer|Front-end Developer',
       'Designer', 'Other|Executive Leadership', 'One-person shop',
       'Other', 'Supervisor/Team Lead|Support|Back-end Developer',
       'Supervisor/Team Lead|DevOps/SysAdmin|Back-end Developer',
       'Other|Supervisor/Team Lead|Support|Back-end Developer|Designer',
       'Supervisor/

### Change row values - onehot encoding

In [485]:
# We split the categories of the 62nd column into a 'Back-end Developer' column where all split strings contain 'Back-end Developer'
df['Back-end Developer'] = df['Which of the following best describes your work position?'].str.contains('Back-end Developer')

# We then convert the 'True' categorie to 1 and 'False' to 0
df['Back-end Developer'] = df['Back-end Developer'].map({True: 1, False: 0})

In [486]:
# We split the categories of the 62nd column into a 'Front-end Developer' column where all split strings contain 'Front-end Developer'
df['Front-end Developer'] = df['Which of the following best describes your work position?'].str.contains('Front-end Developer')

# We then convert the 'True' categorie to 1 and 'False' to 0
df['Front-end Developer'] = df['Front-end Developer'].map({True: 1, False: 0})

In [487]:
# We split the categories of the 62nd column into a 'DevOps/SysAdmin' column where all split strings contain 'DevOps/SysAdmin'
df['DevOps/SysAdmin'] = df['Which of the following best describes your work position?'].str.contains('DevOps/SysAdmin')

# We then convert the 'True' categorie to 1 and 'False' to 0
df['DevOps/SysAdmin'] = df['DevOps/SysAdmin'].map({True: 1, False: 0})

In [488]:
# We split the categories of the 62nd column into a 'Dev Evangelist/Advocate' column where all split strings contain 'Dev Evangelist/Advocate'
df['Dev Evangelist/Advocate'] = df['Which of the following best describes your work position?'].str.contains('Dev Evangelist/Advocate')

# We then convert the 'True' categorie to 1 and 'False' to 0
df['Dev Evangelist/Advocate'] = df['Dev Evangelist/Advocate'].map({True: 1, False: 0})

In [489]:
# We creat a new column 'non-technical work postions' where the value of the 62nd column is not 'Back-end Developer', 'Front-end Developer', 'DevOps/SysAdmin' or 'Dev Evangelist/Advocate'
df['non-technical work postions'] = ~df['Which of the following best describes your work position?'].str.contains('Back-end Developer|Front-end Developer|DevOps/SysAdmin|Dev Evangelist/Advocate')

# We then convert the 'True' categorie to 1 and 'False' to 0
df['non-technical work postions'] = df['non-technical work postions'].map({True: 1, False: 0})

In [490]:
# We fill the empty rows in the fourth column with 0 if there is a 1 in the 68th column
df.loc[(df['Is your primary role within your company related to tech/IT?'].isnull()) & (df['non-technical work postions'] == 1), 'Is your primary role within your company related to tech/IT?'] = 0

In [491]:
# We fill the empty rows in the fourth column with 1 if there is a 0 in the 68th column
df.loc[(df['Is your primary role within your company related to tech/IT?'].isnull()) & (df['non-technical work postions'] == 0), 'Is your primary role within your company related to tech/IT?'] = 1

In [492]:
# We turn the float64 data type from the fourth column into an int data type
df['Is your primary role within your company related to tech/IT?'] = df['Is your primary role within your company related to tech/IT?'].astype(int)

### Removal of non Technology-related Jobs and Unimportant Columns

In [493]:
# We remove all rows that do not have technology related jobs
df.drop(df[df['Is your primary role within your company related to tech/IT?'] == 0].index, inplace = True)

In [494]:
# We remove the column 'non-technical work postions', and 'Which of the following best describes your work position?' as they are not relevant anymore
df = df.drop('non-technical work postions', axis=1)
df = df.drop('Which of the following best describes your work position?', axis=1)

### Check results

In [495]:
df.head(1)

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,...,"If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,What is your gender?,What US state or territory do you live in?,What US state or territory do you work in?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate
0,0,26-100,1.0,1,Not eligible for coverage / N/A,,No,No,Very easy,No,...,,39,Male,,,Sometimes,1,0,0,0


In [496]:
# We compute the number of empty rows in the fourth column
df['Is your primary role within your company related to tech/IT?'].isnull().sum()

0

## Column - "What US state or territory do you work in?"

### Check for row and column information

In [497]:
# We get all categories of the 61st column
df['What US state or territory do you work in?'].unique()

array([nan, 'Illinois', 'Tennessee', 'Virginia', 'California', 'Kentucky',
       'Oregon', 'Pennsylvania', 'New Jersey', 'New York', 'Indiana',
       'Minnesota', 'Washington', 'Georgia', 'Florida', 'North Dakota',
       'Texas', 'District of Columbia', 'Michigan', 'Vermont',
       'North Carolina', 'Kansas', 'Nevada', 'Utah', 'Connecticut',
       'Maryland', 'Colorado', 'Ohio', 'Iowa', 'Nebraska', 'Arizona',
       'Oklahoma', 'Wisconsin', 'Alabama', 'West Virginia',
       'Massachusetts', 'Louisiana', 'South Carolina', 'South Dakota',
       'Missouri', 'Maine', 'New Hampshire', 'New Mexico', 'Montana',
       'Idaho', 'Alaska'], dtype=object)

### Change row values - onehot encoding

In [498]:
# We create a new column for the individuals working in America
df['Working in America'] = " "

In [499]:
# We fill the values in the 'Working in America' column with 0 and 1 depending on the answer in "What US state or territory do you work in?"
df.loc[(df['What US state or territory do you work in?'].isnull()), 'Working in America'] = 0
df.loc[(df['Working in America'] != 0), 'Working in America'] = 1

### Removal of Unimportant Columns

In [500]:
# We remove the column "What US state or territory do you work in?"
df = df.drop('What US state or territory do you work in?', axis=1)

### Check results

In [501]:
with pd.option_context("display.max_columns", None):
    display(df.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s)?,Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,"If maybe, what condition(s) do you believe you have?",Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,What is your gender?,What US state or territory do you live in?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America
0,0,26-100,1.0,1,Not eligible for coverage / N/A,,No,No,Very easy,No,No,,No,,,,,,,,1,N/A (not currently aware),,None did,Some of them,None of them,Some of my previous employers,Some of my previous employers,None of them,,Somewhat open,No,,No,Yes,No,,Yes,0,,,39,Male,,Sometimes,1,0,0,0,0


## Column - "What US state or territory do you live in?"

### Check for row and column information

In [502]:
# We get all categories of the 61st column
df['What US state or territory do you live in?'].unique()

array([nan, 'Illinois', 'Tennessee', 'Virginia', 'California', 'Kentucky',
       'Oregon', 'Pennsylvania', 'New Jersey', 'New York', 'Indiana',
       'Minnesota', 'Washington', 'Georgia', 'Florida', 'North Dakota',
       'Texas', 'Maryland', 'Wisconsin', 'Michigan', 'Vermont',
       'North Carolina', 'Kansas', 'District of Columbia', 'Nevada',
       'Utah', 'Connecticut', 'Colorado', 'Ohio', 'Iowa', 'Nebraska',
       'Arizona', 'Oklahoma', 'Idaho', 'Missouri', 'Alabama',
       'West Virginia', 'Massachusetts', 'Louisiana', 'South Carolina',
       'South Dakota', 'Maine', 'New Hampshire', 'New Mexico', 'Montana',
       'Rhode Island', 'Alaska'], dtype=object)

### Change row values - onehot encoding

In [503]:
# We create a new column for the individuals living in America
df['Living in America'] = " "

In [504]:
# We fill the values in the 'Working in America' column with 0 and 1 depending on the answer in "What US state or territory do you work in?"
df.loc[(df['What US state or territory do you live in?'].isnull()), 'Living in America'] = 0
df.loc[(df['Living in America'] != 0), 'Living in America'] = 1

### Removal of Unimportant Columns

In [505]:
# We remove the column "What US state or territory do you work in?"
df = df.drop('What US state or territory do you live in?', axis=1)

### Check results

In [506]:
with pd.option_context("display.max_columns", None):
    display(df.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s)?,Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,"If maybe, what condition(s) do you believe you have?",Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,What is your gender?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America
0,0,26-100,1.0,1,Not eligible for coverage / N/A,,No,No,Very easy,No,No,,No,,,,,,,,1,N/A (not currently aware),,None did,Some of them,None of them,Some of my previous employers,Some of my previous employers,None of them,,Somewhat open,No,,No,Yes,No,,Yes,0,,,39,Male,Sometimes,1,0,0,0,0,0


## Column - "If maybe, what condition(s) do you believe you have?"

### Check for row and column information

In [507]:
# We get all categories of column
df['If maybe, what condition(s) do you believe you have?'].unique()

array([nan, 'Substance Use Disorder|Addictive Disorder',
       'Anxiety Disorder (Generalized, Social, Phobia, etc)|Mood Disorder (Depression, Bipolar Disorder, etc)',
       'Anxiety Disorder (Generalized, Social, Phobia, etc)',
       'Mood Disorder (Depression, Bipolar Disorder, etc)|Attention Deficit Hyperactivity Disorder',
       'Mood Disorder (Depression, Bipolar Disorder, etc)|Anxiety Disorder (Generalized, Social, Phobia, etc)',
       'Mood Disorder (Depression, Bipolar Disorder, etc)',
       'Anxiety Disorder (Generalized, Social, Phobia, etc)|Mood Disorder (Depression, Bipolar Disorder, etc)|Psychotic Disorder (Schizophrenia, Schizoaffective, etc)',
       'Anxiety Disorder (Generalized, Social, Phobia, etc)|Mood Disorder (Depression, Bipolar Disorder, etc)|Personality Disorder (Borderline, Antisocial, Paranoid, etc)',
       'Anxiety Disorder (Generalized, Social, Phobia, etc)|Mood Disorder (Depression, Bipolar Disorder, etc)|Substance Use Disorder|Addictive Disorder',


### Change row values - onehot encoding

In [508]:
# We create a new 'Believed Mental Conditions' column
df['Believed Mental Conditions'] = " "

In [509]:
# We fill the values in the 'Believed Mental Conditions' column with 0 and 1 depending on the answer in "If maybe, what condition(s) do you believe you have?"
df.loc[(df['If maybe, what condition(s) do you believe you have?'].isnull()), 'Believed Mental Conditions'] = 0
df.loc[(df['Believed Mental Conditions'] != 0), 'Believed Mental Conditions'] = 1

### Removal of Unimportant Columns

In [510]:
# We remove the column "What US state or territory do you work in?"
df = df.drop('If maybe, what condition(s) do you believe you have?', axis=1)

### Check results

In [511]:
with pd.option_context("display.max_columns", None):
    display(df.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s)?,Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,What is your gender?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions
0,0,26-100,1.0,1,Not eligible for coverage / N/A,,No,No,Very easy,No,No,,No,,,,,,,,1,N/A (not currently aware),,None did,Some of them,None of them,Some of my previous employers,Some of my previous employers,None of them,,Somewhat open,No,,No,Yes,No,Yes,0,,,39,Male,Sometimes,1,0,0,0,0,0,0


In [512]:
df.shape

(1106, 50)

## Column - "What is your age?"

### Check for row and column information

In [513]:
# We get all categories of the column
df['What is your age?'].unique()

array([ 39,  29,  38,  43,  42,  30,  37,  44,  28,  34,  32,  31,  26,
        35,  25,  33,  27,  36,  41,  45,  40,  46,  19,  21,  24,  17,
        23,  22,  51,  48,  55,  49,  20,  54,  56,  57,  63,  50,  47,
        61, 323,  62,  58,   3,  66,  59,  52,  65,  53,  70], dtype=int64)

In [514]:
# We get mean age from a selected age range
mean_age = df[(df['What is your age?'] >= 18) | (df['What is your age?'] <= 75)]['What is your age?'].mean()

### Change row values - Impute irregular values

In [515]:
df['What is your age?'].replace(to_replace = df[(df['What is your age?'] < 18) | (df['What is your age?'] > 75)]['What is your age?'].tolist(), value = mean_age, inplace = True)

### Check results

In [516]:
# We get all categories of the column
df['What is your age?'].unique()

array([39.       , 29.       , 38.       , 43.       , 42.       ,
       30.       , 37.       , 44.       , 28.       , 34.       ,
       32.       , 31.       , 26.       , 35.       , 25.       ,
       33.       , 27.       , 36.       , 41.       , 45.       ,
       40.       , 46.       , 19.       , 21.       , 24.       ,
       33.6880651, 23.       , 22.       , 51.       , 48.       ,
       55.       , 49.       , 20.       , 54.       , 56.       ,
       57.       , 63.       , 50.       , 47.       , 61.       ,
       62.       , 58.       , 66.       , 59.       , 52.       ,
       65.       , 53.       , 70.       ])

## Column - "What is your gender?"

### Check for row and column information

In [517]:
# We get all categories from the column
df['What is your gender?'].unique()

array(['Male', 'male', 'Male ', 'Female', 'M', 'female',
       'I identify as female.', 'Bigender', 'non-binary', 'man', 'F', 'm',
       'f', 'Cis female ', 'Transitioned, M2F',
       'Genderfluid (born female)', 'Other/Transfeminine', 'Female ',
       'woman', 'female/woman', 'Cis male', 'Male.', 'Androgynous',
       'male 9:1 female, roughly', nan, 'Male (cis)', 'nb masculine',
       'Cisgender Female', 'Man', 'Sex is male', 'none of your business',
       'genderqueer', 'Human', 'Enby', 'Malr', 'genderqueer woman',
       'female ', 'Woman', 'Queer', 'Agender', 'Dude', 'Fluid',
       "I'm a man why didn't you make this a drop down question. You should of asked sex? And I would of answered yes please. Seriously how much text can this take? ",
       'mail', 'Male/genderqueer', 'fem', 'Nonbinary', 'male ', 'human',
       'Unicorn', 'Cis Male', 'Male (trans, FtM)', 'Cis-woman',
       'Genderqueer', 'cisdude', 'Genderflux demi-girl',
       'female-bodied; no feelings about gen

### Change row values - onehot encoding

In [518]:
# We create a new 'Male', 'Female', 'Other' column
df['Male'] = " "
df['Female'] = " "
df['Other'] = " "

In [519]:
# We set all values in the 'Male' column to 1 where the categories in the gender column are 'Male', 'male' or 'm' or any other variation
df.loc[(df['What is your gender?'] == 'Male') | (df['What is your gender?'] == 'male') | (df['What is your gender?'] == 'm') | (df['What is your gender?'] == 'Male ') | (df['What is your gender?'] == 'M') | (df['What is your gender?'] == 'man') | (df['What is your gender?'] == 'Male.') | (df['What is your gender?'] == 'Man') | (df['What is your gender?'] == 'Malr') | (df['What is your gender?'] == 'Dude') | (df['What is your gender?'] == 'mail') | (df['What is your gender?'] == 'M|') | (df['What is your gender?'] == 'male ') | (df['What is your gender?'] == 'MALE') | (df['What is your gender?'] == "I'm a man why didn't you make this a drop down question. You should of asked sex? And I would of answered yes please. Seriously how much text can this take? "), 'Male'] = 1

# We set all values in the 'Biological Male' column to 0 where the value is not 1
df.loc[df['Male'] != 1, 'Male'] = 0

In [520]:
# We set all values in the 'Female' column to 1 where the categories in the gender column are 'Female', 'female'
df.loc[(df['What is your gender?'] == 'Female') | (df['What is your gender?'] == 'female') | (df['What is your gender?'] == 'female ') | (df['What is your gender?'] == 'F') | (df['What is your gender?'] == 'Woman') | (df['What is your gender?'] == 'f') | (df['What is your gender?'] == 'Female ') | (df['What is your gender?'] == 'woman') | (df['What is your gender?'] == 'fem') | (df['What is your gender?'] == ' Female'), 'Female'] = 1 

# We set all values in the 'Biological Female' column to 0 where the value is not 1
df.loc[df['Female'] != 1, 'Female'] = 0

In [521]:
# We check what the most occuring gender is
df['Male'].value_counts(ascending = True)

0    258
1    848
Name: Male, dtype: int64

In [522]:
# We check what the most occuring gender is
df['Female'].value_counts(ascending = True)

1    215
0    891
Name: Female, dtype: int64

In [523]:
# We set all values in the 'Male' column to 1 where the categories in the gender column are 'Unicorn' etc
df.loc[(df['What is your gender?'] == 'none of your business') | (df['What is your gender?'] == 'human') | (df['What is your gender?'] == 'Human') | (df['What is your gender?'] == 'Unicorn'), 'Male'] = 1

In [524]:
# We set all values in the 'Other' column to 1 where we do not have a 1 in either of the categories 'Male', Female'
df.loc[(df['Male'] == 0) & (df['Female'] == 0), 'Other'] = 1

# We set all values in the 'Other' column to 0 where the value is not 1
df.loc[df['Other'] != 1, 'Other'] = 0

### Removal of Unimportant Columns

In [525]:
# We remove the column "What is your gender?"
df = df.drop('What is your gender?', axis=1)

### Check results

In [526]:
with pd.option_context("display.max_columns", None):
    display(df.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s)?,Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other
0,0,26-100,1.0,1,Not eligible for coverage / N/A,,No,No,Very easy,No,No,,No,,,,,,,,1,N/A (not currently aware),,None did,Some of them,None of them,Some of my previous employers,Some of my previous employers,None of them,,Somewhat open,No,,No,Yes,No,Yes,0,,,39.0,Sometimes,1,0,0,0,0,0,0,1,0,0


## Column - "How many employees does your company or organization have?"

### Check for row and column information

In [527]:
# We get all categories of the column
df['How many employees does your company or organization have?'].unique()

array(['26-100', '6-25', 'More than 1000', nan, '100-500', '500-1000',
       '1-5'], dtype=object)

### Change row values - map values

In [528]:
# We fill the missing values in the column with '1-5' for all rows where the individual is self employed
df.loc[(df['Are you self-employed?'] == 1) & (df['How many employees does your company or organization have?'].isnull()), 'How many employees does your company or organization have?'] = '1-5'

In [529]:
# We change the categories from the column to numbers from 1 to 6
df['How many employees does your company or organization have?'] = df['How many employees does your company or organization have?'].map({'1-5': 1, '6-25': 2, '26-100': 3, '100-500': 4, '500-1000': 5, 'More than 1000': 6})

### Check results

In [530]:
# We get all categories of the column
df['How many employees does your company or organization have?'].unique()

array([3, 2, 6, 1, 4, 5], dtype=int64)

## Column - "Do you work remotely?"

### Check for row and column information

In [531]:
# We get all categories of the column
df['Do you work remotely?'].unique()

array(['Sometimes', 'Never', 'Always'], dtype=object)

### Change row values - map values

In [532]:
# We change the categories from the column to numbers from 0 to 2
df['Do you work remotely?'] = df['Do you work remotely?'].map({'Never': 0, 'Sometimes': 1, 'Always': 2})

### Check results

In [533]:
# We get all categories of the column
df['Do you work remotely?'].unique()

array([1, 0, 2], dtype=int64)

## Column - "If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?"

### Check for row and column information

In [534]:
# We get all categories of the column
df['If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?'].unique()

array([nan, 'Sometimes', 'Often', 'Rarely', 'Never'], dtype=object)

### Change row values - map values

In [535]:
# We change the categories of the column to numerical values
df['If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?'] = df['If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?'].map({'Never': 0, 'Rarely': 1, 'Sometimes': 2, 'Often': 3})

### Check results

In [536]:
# We get all categories of the column
df['If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?'].unique()

array([nan,  2.,  3.,  1.,  0.])

## Column - "If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?"

### Check for row and column information

In [537]:
# We get all categories of the column
df['If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?'].unique()

array([nan, 'Rarely', 'Sometimes', 'Never', 'Often'], dtype=object)

### Change row values - map values

In [538]:
# We change the categories of the column to numerical values
df['If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?'] = df['If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?'].map({'Never': 0, 'Rarely': 1, 'Sometimes': 2, 'Often': 3})

### Check results

In [539]:
# We get all categories of the column
df['If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?'].unique()

array([nan,  1.,  2.,  0.,  3.])

## Column - "Have you been diagnosed with a mental health condition by a medical professional?"

### Check for row and column information

In [540]:
# We get all categories of the column
df['Have you been diagnosed with a mental health condition by a medical professional?'].unique()

array(['Yes', 'No'], dtype=object)

### Change row values - map values

In [541]:
# We change the values of the column from yes to 1 and no to 0
df['Have you been diagnosed with a mental health condition by a medical professional?'] = df['Have you been diagnosed with a mental health condition by a medical professional?'].map({'No': 0, 'Yes': 1})

### Check results

In [542]:
# We get all categories of the column
df['Have you been diagnosed with a mental health condition by a medical professional?'].unique()

array([1, 0], dtype=int64)

## Column - "Do you currently have a mental health disorder?"

### Check for row and column information

In [543]:
# We get all categories of the column
df['Do you currently have a mental health disorder?'].unique()

array(['No', 'Yes', nan], dtype=object)

In [544]:
print("Current mental health disorder |","No :",df[df['Do you currently have a mental health disorder?'] == 'No'].shape[0],"Yes :", df[df['Do you currently have a mental health disorder?'] == 'Yes'].shape[0], "Is empty :", df[df['Do you currently have a mental health disorder?'].isnull()].shape[0])

Current mental health disorder | No : 399 Yes : 430 Is empty : 277


### Change row values - map values

In [545]:
# We check if there was a mental health issue in the past or if there is a familiy history of it, if yes there is a higher probability that the subject currently has a mental health condition
df.loc[(df['Do you currently have a mental health disorder?'].isnull()) & (df['Have you had a mental health disorder in the past?'] == 'Yes') | (df['Do you have a family history of mental illness?'] == 'Yes'), 'Do you currently have a mental health disorder?'] = "Yes"
df.loc[(df['Do you currently have a mental health disorder?'].isnull()) & (df['Have you had a mental health disorder in the past?'] == 'No') | (df['Do you have a family history of mental illness?'] == 'No'), 'Do you currently have a mental health disorder?'] = "No"

In [546]:
print("Current mental health disorder |","No :",df[df['Do you currently have a mental health disorder?'] == 'No'].shape[0],"Yes :", df[df['Do you currently have a mental health disorder?'] == 'Yes'].shape[0], "Is empty :", df[df['Do you currently have a mental health disorder?'].isnull()].shape[0])

Current mental health disorder | No : 456 Yes : 601 Is empty : 49


In [547]:
# We impute the remaining missing values with the most occuring value in the column
df.loc[(df['Do you currently have a mental health disorder?'].isnull()), 'Do you currently have a mental health disorder?'] = "Yes"

In [548]:
# We change the values of the column from yes to 1, maybe to 0 and no to 0
df['Do you currently have a mental health disorder?'] = df['Do you currently have a mental health disorder?'].map({'No': 0,'Yes': 1})

### Check results

In [549]:
# We get all categories of the column
df['Do you currently have a mental health disorder?'].unique()

array([0, 1], dtype=int64)

## Column - "Have you had a mental health disorder in the past?"

### Check for row and column information

In [550]:
# We get all categories of the column
df['Have you had a mental health disorder in the past?'].unique()

array(['Yes', nan, 'No'], dtype=object)

In [551]:
print("Past Mental Health Disorder |","No :",df[df['Have you had a mental health disorder in the past?'] == 'No'].shape[0],"Yes :", df[df['Have you had a mental health disorder in the past?'] == 'Yes'].shape[0], "Is empty :", df[df['Have you had a mental health disorder in the past?'].isnull()].shape[0])

Past Mental Health Disorder | No : 341 Yes : 565 Is empty : 200


### Change row values - map values

In [552]:
# We check if there is a mental health issue now or if there is a familiy history of it, if yes there is a higher probability that the subject currently has a mental health condition
df.loc[(df['Have you had a mental health disorder in the past?'].isnull()) & (df['Do you currently have a mental health disorder?'] == 'Yes') | (df['Do you have a family history of mental illness?'] == 'Yes'), 'Have you had a mental health disorder in the past?'] = "Yes"
df.loc[(df['Have you had a mental health disorder in the past?'].isnull()) & (df['Do you currently have a mental health disorder?'] == 'No') | (df['Do you have a family history of mental illness?'] == 'No'), 'Have you had a mental health disorder in the past?'] = "No"

In [553]:
print("Past Mental Health Disorder |","No :",df[df['Have you had a mental health disorder in the past?'] == 'No'].shape[0],"Yes :", df[df['Have you had a mental health disorder in the past?'] == 'Yes'].shape[0], "Is empty :", df[df['Have you had a mental health disorder in the past?'].isnull()].shape[0])

Past Mental Health Disorder | No : 438 Yes : 603 Is empty : 65


In [554]:
# We impute the remaining missing values with the most occuring value in the column
df.loc[(df['Have you had a mental health disorder in the past?'].isnull()), 'Have you had a mental health disorder in the past?'] = "Yes"

In [555]:
# We change the values of the column from yes to 1, maybe to 0 and no to 0
df['Have you had a mental health disorder in the past?'] = df['Have you had a mental health disorder in the past?'].map({'No': 0, 'Yes': 1})

### Check results

In [556]:
# We get all categories of the column
df['Have you had a mental health disorder in the past?'].unique()

array([0, 1], dtype=int64)

## Column - "Do you have a family history of mental illness?"

### Check for row and column information

In [557]:
# We get all categories of the column
df['Do you have a family history of mental illness?'].unique()

array(['No', 'Yes', nan], dtype=object)

In [558]:
print("Familiy History |","No :",df[df['Do you have a family history of mental illness?'] == 'No'].shape[0],"Yes :", df[df['Do you have a family history of mental illness?'] == 'Yes'].shape[0], "Is empty :", df[df['Do you have a family history of mental illness?'].isnull()].shape[0])

Familiy History | No : 381 Yes : 510 Is empty : 215


### Change row values - map values

In [559]:
# We check if there is a mental health issue now or in the past, if yes there is a higher probability that the subject has a familiy history of mental conditions
df.loc[(df['Do you have a family history of mental illness?'].isnull()) & (df['Do you currently have a mental health disorder?'] == 'Yes') | (df['Have you had a mental health disorder in the past?'] == 'Yes'), 'Do you have a family history of mental illness?'] = "Yes"
df.loc[(df['Do you have a family history of mental illness?'].isnull()) & (df['Do you currently have a mental health disorder?'] == 'No') | (df['Have you had a mental health disorder in the past?'] == 'No'), 'Do you have a family history of mental illness?'] = "No"

In [560]:
# We impute the remaining missing values with the most occuring value in the column
df.loc[(df['Do you have a family history of mental illness?'].isnull()), 'Do you have a family history of mental illness?'] = "Yes"

In [561]:
print("Familiy History |","No :",df[df['Do you have a family history of mental illness?'] == 'No'].shape[0],"Yes :", df[df['Do you have a family history of mental illness?'] == 'Yes'].shape[0], "Is empty :", df[df['Do you have a family history of mental illness?'].isnull()].shape[0])

Familiy History | No : 381 Yes : 725 Is empty : 0


In [562]:
# We change the values of the column from yes to 1, I don't know to 0 and no to 0
df['Do you have a family history of mental illness?'] = df['Do you have a family history of mental illness?'].map({'No': 0, 'Yes': 1})

### Check results

In [563]:
# We get all categories of the column
df['Do you have a family history of mental illness?'].unique()

array([0, 1], dtype=int64)

## Column - "Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?" 
## & 
## "Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?"

### Check for row and column information

In [564]:
# We rename the column to perform further methods
df.rename(columns = {'Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?':'Observation',}, inplace = True)
df.rename(columns = {'Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?':'Influence',}, inplace = True)

In [565]:
# We get all categories of the column
df['Observation'].unique()

array(['No', 'Maybe/Not sure', 'Yes, I experienced', 'Yes, I observed',
       nan], dtype=object)

In [566]:
# We get all categories of the column
df['Influence'].unique()

array([nan, 'Yes', 'No'], dtype=object)

### Change row values - map values - Observation

In [567]:
# We change the empty columns to "No"
df.loc[df['Observation'].isnull(), 'Observation'] = 'No'

In [568]:
# We change the maybe columns to "Yes" if the experience influenced the outcome in 'Influence' column negatively
df.loc[(df['Observation'] == 'Maybe/Not sure') & (df['Influence'] == "Yes"), 'Observation'] = 'Yes'

In [569]:
# We change the row values of Yes, I experienced and Yes, I observed to yes
df.loc[(df['Observation'] == "Yes, I experienced") | (df['Observation'] == "Yes, I observed"), 'Observation'] = 'Yes'

In [570]:
# We check for empty rows
df['Influence'].isnull().sum()

764

In [571]:
# We check for empty rows
df['Observation'].isnull().sum()

0

### Change row values - Imputation of Missing Values

In [572]:
# We check the number of people that answered no observation and empty influence
df[(df['Observation'] == "No") & (df['Influence'].isnull())].shape[0] 

530

In [573]:
# We check the number of people that answered no observation and empty influence
df[(df['Observation'] == "Maybe/Not sure") & (df['Influence'].isnull())].shape[0] 

124

In [574]:
# We check the number of people that answered yes observation and empty influence
df[(df['Observation'] == "Yes") & (df['Influence'].isnull())].shape[0] 

110

In [575]:
# We change the category empty to "No" if there has been no observed experience in the Observation column
df.loc[(df['Observation'] == 'No') & (df['Influence'].isnull()), 'Influence'] = 'No' 

# If someone had not observed anything, there can not be a negative experience

In [576]:
# We change the category empty to "No" if there has maybe been an observed experience in the Observation column
df.loc[(df['Observation'] == 'Maybe/Not sure') & (df['Influence'].isnull()), 'Influence'] = 'No' 

# If someone is unsure to have observed anything, and there are not negative results one can assume there has not been a negative experience

In [577]:
# We change the row values of Yes, I experienced and Yes, I observed to yes
df.loc[(df['Observation'] == "Yes, I experienced") | (df['Observation'] == "Yes, I observed"), 'Observation'] = 'Yes'

In [578]:
# We change the category empty to "Yes" if there has been an observed experience in the Observation column
df.loc[(df['Observation'] == 'Yes') & (df['Influence'].isnull()), 'Influence'] = 'No' 

# If someone has observed something negative, one can assume it had a negaitve impact  

### Change row values - map values - Influence

In [579]:
# We change the maybe columns to "No" if someone is unsure of an observation and it had no negative impact
df.loc[(df['Observation'] == "Maybe/Not sure") & (df['Influence'] == "No"), 'Observation'] = 'No'

# We change the maybe columns to "No" if the experience did not influence the outcome in column negatively
df.loc[(df['Observation'] == "No") & (df['Influence'] == "Maybe"), 'Influence'] = 'No'

In [580]:
# We change the category "Maybe" to "Yes" if there has been an observed experience in the Observation column
df.loc[(df['Observation'] == 'Yes') & (df['Influence'] == "Maybe"), 'Influence'] = 'Yes' 

# We change the category "Maybe" to "Yes" if it had a negative influence
df.loc[(df['Observation'] == 'Maybe') & (df['Influence'] == "Yes"), 'Observation'] = 'Yes' 

# If someone had a negative expereince and says it maybe had an impact, one can assume it did have an impact

In [581]:
# We change the category "Maybe" to "Yes" if there has been an observed experience in the Observation column
df.loc[(df['Observation'] == 'Maybe/Not sure') & (df['Influence'] == 'Maybe'), ['Influence', 'Observation']] = ['Yes', 'Yes'] 

# One can assume something negative occured and it had a negative effect on the individual

In [582]:
# We map the values
df['Observation'] = df['Observation'].map({'No': 0, 'Yes': 1})

In [583]:
# We map the values
df['Influence'] = df['Influence'].map({'No': 0, 'Yes': 1})

### Check results

In [584]:
# We get all categories of the column
df['Observation'].unique()

array([0, 1], dtype=int64)

In [585]:
# We get all categories of the column
df['Influence'].unique()

array([0, 1], dtype=int64)

In [586]:
# We rename the column to perform further methods
df.rename(columns = {'Observation':'Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?',}, inplace = True)
df.rename(columns = {'Influence' :'Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?',}, inplace = True)

## Column - "How willing would you be to share with friends and family that you have a mental illness?"

### Check for row and column information

In [587]:
# We get all categories of the column
df['How willing would you be to share with friends and family that you have a mental illness?'].unique()

array(['Somewhat open',
       'Not applicable to me (I do not have a mental illness)',
       'Very open', 'Not open at all', 'Somewhat not open', 'Neutral'],
      dtype=object)

### Change row values - map values

In [588]:
# We change the values of the column
df['How willing would you be to share with friends and family that you have a mental illness?'] = df['How willing would you be to share with friends and family that you have a mental illness?'].map({ "Not open at all" : 0, 'Somewhat not open': 1, 'Neutral': 2, 'Somewhat open': 3, 'Very open': 4})

In [589]:
df['How willing would you be to share with friends and family that you have a mental illness?'].isnull().sum()

83

In [590]:
# We change empty categories to 2
df.loc[(df['How willing would you be to share with friends and family that you have a mental illness?'].isnull() ),'How willing would you be to share with friends and family that you have a mental illness?'] = 2

In [591]:
# We turn the float64 data type from the third column into an int data type
df['How willing would you be to share with friends and family that you have a mental illness?'] = df['How willing would you be to share with friends and family that you have a mental illness?'].astype(int)

### Check results

In [592]:
# We confirm all the unique categories inside the column
df['How willing would you be to share with friends and family that you have a mental illness?'].unique()

array([3, 2, 4, 0, 1])

## Column - "Would you bring up a mental health issue with a potential employer in an interview?"

### Check for row and column information

In [593]:
df.rename(columns = {'Would you bring up a mental health issue with a potential employer in an interview?':'Temp',}, inplace = True)

In [594]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan, 'No', 'Yes'], dtype=object)

In [595]:
print("Interview |","No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

Interview | No : 678 Yes : 81 Is empty : 347


### Change row values - map values & Imputation

In [596]:
# We impute the remaining missing values with the most occuring value in the column
df.loc[(df['Temp'].isnull()), 'Temp'] = "No"

In [597]:
# We change the values of the column from yes to 1 and no to 0
df['Temp'] = df['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [598]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([0, 1], dtype=int64)

In [599]:
df.rename(columns = {'Temp':'Would you bring up a mental health issue with a potential employer in an interview?',}, inplace = True)

## Column - "Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?"

### Check for row and column information

In [600]:
df.rename(columns = {'Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?':'Temp',}, inplace = True)

In [601]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array(['None of them', 'Some of them', nan, 'Yes, all of them'],
      dtype=object)

In [602]:
df[(df['Are you self-employed?'] == 1) & (df['Temp'].isnull())].shape[0]

25

In [603]:
df[(df['Are you self-employed?'] == 0) & (df['Temp'].isnull())].shape

(110, 52)

### Change row values - map values & Imputation

In [604]:
# We change empty categories to 'No oberservation'
df.loc[(df['Temp'].isnull() ),'Temp'] = "No observation"

In [605]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'No observation': 0, 'None of them' : 0, 'Some of them': 1, 'Yes, all of them': 1})

### Check results

In [606]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([0, 1], dtype=int64)

In [607]:
df.rename(columns = {'Temp':'Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?',}, inplace = True)

## Column - "Would you have been willing to discuss a mental health issue with your direct supervisor(s)?"

### Check for row and column information

In [608]:
df.rename(columns = {'Would you have been willing to discuss a mental health issue with your direct supervisor(s)?':'Temp',}, inplace = True)

In [609]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array(['Some of my previous employers', nan,
       'No, at none of my previous employers',
       'Yes, at all of my previous employers'], dtype=object)

In [610]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'No, at none of my previous employers' : 'No', 'Some of my previous employers': "Yes", 'Yes, at all of my previous employers': "Yes"})

In [611]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 321 Yes : 580 Is empty : 205


### Change row values - map values & Imputation

In [612]:
# We impute the remaining missing values with the most occuring value in the column
df.loc[(df['Temp'].isnull()), 'Temp'] = "Yes"

In [613]:
# We change the values of the column from yes to 1 and no to 0
df['Temp'] = df['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [614]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([1, 0], dtype=int64)

In [615]:
df.rename(columns = {'Temp':'Would you have been willing to discuss a mental health issue with your direct supervisor(s)',}, inplace = True)

## Column - "Would you have been willing to discuss a mental health issue with your previous co-workers?"

### Check for row and column information

In [616]:
df.rename(columns = {'Would you have been willing to discuss a mental health issue with your previous co-workers?':'Temp',}, inplace = True)

In [617]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array(['Some of my previous employers',
       'No, at none of my previous employers',
       'Yes, at all of my previous employers', nan], dtype=object)

In [618]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'No, at none of my previous employers' : 'No', 'Some of my previous employers': "Yes", 'Yes, at all of my previous employers': "Yes"})

In [619]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 318 Yes : 653 Is empty : 135


### Change row values - map values & Imputation

In [620]:
# We impute the remaining missing values with the most occuring value in the column
df.loc[(df['Temp'].isnull()), 'Temp'] = "Yes"

In [621]:
# We change the values of the column from yes to 1 and no to 0
df['Temp'] = df['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [622]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([1, 0], dtype=int64)

In [623]:
df.rename(columns = {'Temp':'Would you have been willing to discuss a mental health issue with your previous co-workers?',}, inplace = True)

## Column - "Do you think that discussing a physical health issue with previous employers would have negative consequences?"

### Check for row and column information

In [624]:
df.rename(columns = {'Do you think that discussing a physical health issue with previous employers would have negative consequences?':'Temp',}, inplace = True)

In [625]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array(['None of them', 'Some of them', 'Yes, all of them', nan],
      dtype=object)

In [626]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'None of them' : 'No', 'Some of them': "Yes", 'Yes, all of them': "Yes"})

In [627]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 439 Yes : 532 Is empty : 135


### Change row values - map values & Imputation

In [628]:
# We impute the remaining missing values with the most occuring value in the column
df.loc[(df['Temp'].isnull()), 'Temp'] = "Yes"

In [629]:
# We change the values of the column from yes to 1 and no to 0
df['Temp'] = df['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [630]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([0, 1], dtype=int64)

In [631]:
df.rename(columns = {'Temp':'Do you think that discussing a physical health issue with previous employers would have negative consequences?',}, inplace = True)

## Column - "Do you think that discussing a mental health disorder with previous employers would have negative consequences?"

### Check for row and column information

In [632]:
df.rename(columns = {'Do you think that discussing a mental health disorder with previous employers would have negative consequences?':'Temp',}, inplace = True)

In [633]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array(['Some of them', 'None of them', nan, 'Yes, all of them'],
      dtype=object)

In [634]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'None of them' : 'No', 'Some of them': "Yes", 'Yes, all of them': "Yes"})

In [635]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 90 Yes : 629 Is empty : 387


### Change row values - map values & Imputation

In [636]:
# We impute the remaining missing values with the most occuring value in the column
df.loc[(df['Temp'].isnull()), 'Temp'] = "Yes"

In [637]:
# We change the values of the column from yes to 1 and no to 0
df['Temp'] = df['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [638]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([1, 0], dtype=int64)

In [639]:
df.rename(columns = {'Temp':'Do you think that discussing a mental health disorder with previous employers would have negative consequences?',}, inplace = True)

## Column - "Did your previous employers provide resources to learn more about mental health issues and how to seek help?"

### Check for row and column information

In [640]:
df.rename(columns = {'Did your previous employers provide resources to learn more about mental health issues and how to seek help?':'Temp',}, inplace = True)

In [641]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array(['None did', 'Some did', nan, 'Yes, they all did'], dtype=object)

In [642]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'None did' : 'No', 'Some did': "Yes", 'Yes, they all did': "Yes"})

In [643]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 660 Yes : 311 Is empty : 135


### Change row values - map values & Imputation

In [644]:
# We impute the remaining missing values with the most occuring value in the column
df.loc[(df['Temp'].isnull()), 'Temp'] = "No"

In [645]:
# We change the values of the column from yes to 1 and no to 0
df['Temp'] = df['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [646]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([0, 1], dtype=int64)

In [647]:
df.rename(columns = {'Temp':'Did your previous employers provide resources to learn more about mental health issues and how to seek help?',}, inplace = True)

## Column - "Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?"

### Check for row and column information

In [648]:
df.rename(columns = {'Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?':'Temp',}, inplace = True)

In [649]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan, 'None did', 'Some did', 'Yes, they all did'], dtype=object)

In [650]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'None did' : 'No', 'Some did': "Yes", 'Yes, they all did': "Yes"})

In [651]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 676 Yes : 222 Is empty : 208


### Change row values - map values & Imputation

In [652]:
# We impute the remaining missing values with the most occuring value in the column
df.loc[(df['Temp'].isnull()), 'Temp'] = "No"

In [653]:
# We change the values of the column from yes to 1 and no to 0
df['Temp'] = df['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [654]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([0, 1], dtype=int64)

In [655]:
df.rename(columns = {'Temp':'Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?',}, inplace = True)

## Column - "Were you aware of the options for mental health care provided by your previous employers?"

### Check for row and column information

In [656]:
df.rename(columns = {'Were you aware of the options for mental health care provided by your previous employers?':'Temp',}, inplace = True)

In [657]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array(['N/A (not currently aware)', 'I was aware of some',
       'Yes, I was aware of all of them', 'No, I only became aware later',
       nan], dtype=object)

In [658]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'N/A (not currently aware)' : 'No','No, I only became aware later' : 'No', 'I was aware of some': "Yes", 'Yes, I was aware of all of them': "Yes"})

In [659]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 546 Yes : 425 Is empty : 135


### Change row values - map values & Imputation

In [660]:
# We impute the remaining missing values with the most occuring value in the column
df.loc[(df['Temp'].isnull()), 'Temp'] = "No"

In [661]:
# We change the values of the column from yes to 1 and no to 0
df['Temp'] = df['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [662]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([0, 1], dtype=int64)

In [663]:
df.rename(columns = {'Temp':'Were you aware of the options for mental health care provided by your previous employers?',}, inplace = True)

## Column - "Do you know local or online resources to seek help for a mental health disorder?"

### Check for row and column information

In [664]:
df.rename(columns = {'Do you know local or online resources to seek help for a mental health disorder?':'Temp',}, inplace = True)

In [665]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan, 'I know some', "No, I don't know any", 'Yes, I know several'],
      dtype=object)

In [666]:
# We change the values of the column
df['Temp'] = df['Temp'].map({"No, I don't know any" : 'No', 'I know some': "Yes", 'Yes, I know several': "Yes"})

In [667]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 40 Yes : 140 Is empty : 926


### Change row values - map values & Imputation

In [668]:
# We impute the remaining missing values with no
df.loc[(df['Temp'].isnull()), 'Temp'] = "No"

In [669]:
# We change the values of the column from yes to 1 and no to 0
df['Temp'] = df['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [670]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([0, 1], dtype=int64)

In [671]:
df.rename(columns = {'Temp':'Do you know local or online resources to seek help for a mental health disorder?',}, inplace = True)

## Column - "Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?"

### Check for row and column information

In [672]:
df.rename(columns = {'Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?':'Temp',}, inplace = True)

In [673]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array(['No', 'Yes', nan], dtype=object)

In [674]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 858 Yes : 68 Is empty : 180


### Change row values - map values & Imputation

In [675]:
# We impute the remaining missing values with no
df.loc[(df['Temp'].isnull()), 'Temp'] = "No"

In [676]:
# We change the values of the column from yes to 1 and no to 0
df['Temp'] = df['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [677]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([0, 1], dtype=int64)

In [678]:
df.rename(columns = {'Temp':'Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?',}, inplace = True)

## Column - "If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?"

### Check for row and column information

In [679]:
df.rename(columns = {'If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?':'Temp',}, inplace = True)

In [680]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan, '1-25%', '26-50%', '51-75%', '76-100%'], dtype=object)

### Change row values - map values & Imputation

In [681]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'1-25%' : 1, '26-50%': 2, '51-75%': 3, '76-100%': 4})

### Check results

In [682]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan,  1.,  2.,  3.,  4.])

In [683]:
df.rename(columns = {'Temp':'If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?',}, inplace = True)

## Column - "Do you believe your productivity is ever affected by a mental health issue?"

### Check for row and column information

In [684]:
df.rename(columns = {'Do you believe your productivity is ever affected by a mental health issue?':'Temp',}, inplace = True)

In [685]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan, 'Yes', 'No', 'Unsure'], dtype=object)

In [686]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0],"Unsure :",  df[df['Temp'] == 'Unsure'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 8 Yes : 130 Unsure : 23 Is empty : 945


### Change row values - map values & Imputation

In [687]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'Yes' : 1, 'No': 0, 'Unsure' : 1})

### Check results

In [688]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan,  1.,  0.])

In [689]:
df.rename(columns = {'Temp':'Do you believe your productivity is ever affected by a mental health issue?',}, inplace = True)

## Column - "If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?"

### Check for row and column information

In [690]:
df.rename(columns = {'If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?':'Temp',}, inplace = True)

In [691]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan, 'No', "I'm not sure", 'Yes'], dtype=object)

In [692]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0],"Unsure :",  df[df['Temp'] == "I'm not sure"].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 41 Yes : 20 Unsure : 40 Is empty : 1005


### Change row values - map values & Imputation

In [693]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'Yes' : 1, 'No': 0, "I'm not sure": 0}) 
# If someone does not know if it affected negatively one can assume it didn't

### Check results

In [694]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan,  0.,  1.])

In [695]:
df.rename(columns = {'Temp':'If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?',}, inplace = True)

## Column - "If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?"

### Check for row and column information

In [696]:
df.rename(columns = {'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?':'Temp',}, inplace = True)

In [697]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan, 'Sometimes, if it comes up',
       'No, because it would impact me negatively', 'Yes, always',
       "No, because it doesn't matter"], dtype=object)

In [698]:
# We change the values of the column
df['Temp'] = df['Temp'].map({"No, because it doesn't matter" : 'No', "No, because it would impact me negatively" : 'No', 'Sometimes, if it comes up': "Yes", 'Yes, always': "Yes"})

In [699]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 39 Yes : 73 Is empty : 994


### Change row values - map values & Imputation

In [700]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'Yes' : 1, 'No': 0})

### Check results

In [701]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan,  1.,  0.])

In [702]:
df.rename(columns = {'Temp':'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?',}, inplace = True)

## Column - "If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?"

### Check for row and column information

In [703]:
df.rename(columns = {'If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?':'Temp',}, inplace = True)

In [704]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan, 'No', 'Yes', "I'm not sure"], dtype=object)

In [705]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0],"Unsure :",  df[df['Temp'] == "I'm not sure"].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 27 Yes : 20 Unsure : 42 Is empty : 1017


### Change row values - map values & Imputation

In [706]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'Yes' : 1, 'No': 0, "I'm not sure": 0}) 
# If someone does not know if it affected negatively one can assume it didn't

### Check results

In [707]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan,  0.,  1.])

In [708]:
df.rename(columns = {'Temp':'If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?',}, inplace = True)

## Column - "If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?"

In [709]:
df.rename(columns = {'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?':'Temp',}, inplace = True)

In [710]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan, "No, because it doesn't matter",
       'No, because it would impact me negatively',
       'Sometimes, if it comes up'], dtype=object)

In [711]:
# We change the values of the column
df['Temp'] = df['Temp'].map({"No, because it doesn't matter" : 'No', "No, because it would impact me negatively" : 'No', 'Sometimes, if it comes up': "Yes"})

In [712]:
print("No :",df[df['Temp'] == 'No'].shape[0],"Yes :", df[df['Temp'] == 'Yes'].shape[0], "Is empty :", df[df['Temp'].isnull()].shape[0])

No : 79 Yes : 37 Is empty : 990


### Change row values - map values & Imputation

In [713]:
# We change the values of the column
df['Temp'] = df['Temp'].map({'Yes' : 1, 'No': 0}) 

### Check results

In [714]:
# We confirm all the unique categories inside the column
df['Temp'].unique()

array([nan,  0.,  1.])

In [715]:
df.rename(columns = {'Temp':'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?',}, inplace = True)

In [716]:
# We create a new column for individuals which either have a diagnosed or believed mental condition
df['Mental Condition'] = " "

In [717]:
# We fill the column mental condition, with either 1 or 0 depending on the answers in other columns
df.loc[(df['Do you currently have a mental health disorder?'] == 1) | (df['Have you been diagnosed with a mental health condition by a medical professional?'] == 1) | (df['Believed Mental Conditions'] == 1), 'Mental Condition'] = 1
df.loc[(df['Do you currently have a mental health disorder?'] == 0) & (df['Have you been diagnosed with a mental health condition by a medical professional?'] == 0) & (df['Believed Mental Conditions'] == 0), 'Mental Condition'] = 0

In [718]:
with pd.option_context("display.max_columns", None):
    display(df.head(5))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
0,0,3,1.0,1,Not eligible for coverage / N/A,,No,No,Very easy,No,No,,0,0,,,,,,,1,0,0,0,1,0,1,1,0,0,3,0,0,0,0,0,1,0,,,39.0,1,1,0,0,0,0,0,0,1,0,0,1
1,0,2,1.0,1,No,Yes,Yes,Yes,Somewhat easy,No,No,Yes,0,0,,,,,,,1,1,0,1,0,0,0,1,0,0,3,0,0,1,1,1,1,1,1.0,2.0,29.0,0,1,1,0,0,1,1,0,1,0,0,1
2,0,2,1.0,1,No,,No,No,Neither easy nor difficult,,No,,0,0,,,,,,,1,0,0,1,1,1,1,1,1,1,3,1,1,0,0,0,0,1,,,38.0,2,1,0,0,0,0,0,0,1,0,0,0
4,0,2,0.0,1,Yes,Yes,No,No,Neither easy nor difficult,Yes,,No,0,0,,,,,,,1,0,1,0,1,1,0,1,1,0,3,1,1,1,1,1,1,1,2.0,2.0,43.0,1,1,1,1,1,1,1,0,0,1,0,1
5,0,6,1.0,1,Yes,I am not sure,No,Yes,Somewhat easy,Yes,Yes,No,1,0,,,,,,,1,1,0,0,1,1,0,0,1,0,3,1,0,0,0,0,0,1,,3.0,42.0,1,1,1,1,0,0,0,0,1,0,0,0


# Dataframe Cleanup - Splitting the Database

In [719]:
# We split the Dataframe in 2 different groups those who are self-emplyoed and those who are not
df_self_employed = df[df['Are you self-employed?'] == 1]
df_employed = df[df['Are you self-employed?'] == 0]

## Dataframe - Self-employed

In [720]:
# We show all the column with missing values
with pd.option_context("display.max_columns", None):
    display(df_self_employed.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
9,1,1,,1,,,,,,,,,0,1,0.0,,1.0,0.0,1.0,1.0,1,1,0,0,1,1,1,1,0,0,4,0,0,1,1,1,1,1,1.0,3.0,30.0,2,1,1,0,0,1,1,0,1,0,0,1


In [721]:
# We drop all columns that are employer and coworker related questions
df_self_employed = df_self_employed.drop(['Is your employer primarily a tech company/organization?', ], axis=1)
df_self_employed = df_self_employed.drop(['Does your employer provide mental health benefits as part of healthcare coverage?', ], axis=1)
df_self_employed = df_self_employed.drop(['Do you know the options for mental health care available under your employer-provided coverage?', ], axis=1)
df_self_employed = df_self_employed.drop(['Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?', ], axis=1)
df_self_employed = df_self_employed.drop(['Does your employer offer resources to learn more about mental health concerns and options for seeking help?', ], axis=1)
df_self_employed = df_self_employed.drop(['If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:', ], axis=1)
df_self_employed = df_self_employed.drop(['Do you think that discussing a mental health disorder with your employer would have negative consequences?', ], axis=1)
df_self_employed = df_self_employed.drop(['Do you think that discussing a physical health issue with your employer would have negative consequences?', ], axis=1)
df_self_employed = df_self_employed.drop(['Do you feel that your employer takes mental health as seriously as physical health?', ], axis=1)

In [722]:
# We show all the column
with pd.option_context("display.max_columns", None):
    display(df_self_employed.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your primary role within your company related to tech/IT?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
9,1,1,1,0,1,0.0,,1.0,0.0,1.0,1.0,1,1,0,0,1,1,1,1,0,0,4,0,0,1,1,1,1,1,1.0,3.0,30.0,2,1,1,0,0,1,1,0,1,0,0,1


In [723]:
# We split the Dataframe then further in 2 different groups those who are have diagnosed mental conditions and those who do not
df_self_employed_condition = df_self_employed[(df_self_employed['Mental Condition'] == 1)]
df_self_employed_healthy = df_self_employed[(df_self_employed['Mental Condition'] == 0)]

In [724]:
# We drop all columns that are mental health related questions
df_self_employed = df_self_employed.drop(['If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?', ], axis=1)
df_self_employed = df_self_employed.drop(['If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?', ], axis=1)
df_self_employed = df_self_employed.drop(['If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?', ], axis=1)
df_self_employed = df_self_employed.drop(['If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?', ], axis=1)
df_self_employed = df_self_employed.drop(['Do you believe your productivity is ever affected by a mental health issue?', ], axis=1)
df_self_employed = df_self_employed.drop(['If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?', ], axis=1)
df_self_employed = df_self_employed.drop(['If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?', ], axis=1)
df_self_employed = df_self_employed.drop(['If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?', ], axis=1)

In [725]:
df_self_employed.isnull().sum()

Are you self-employed?                                                                                                                                                              0
How many employees does your company or organization have?                                                                                                                          0
Is your primary role within your company related to tech/IT?                                                                                                                        0
Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?                                                 0
Do you know local or online resources to seek help for a mental health disorder?                                                                                                    0
Do you have previous employers?                                                           

## Dataframe - Self-employed - Healthy

In [726]:
# We show all the columns
with pd.option_context("display.max_columns", None):
    display(df_self_employed_healthy.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your primary role within your company related to tech/IT?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
33,1,1,1,0,0,0.0,0.0,1.0,,1.0,1.0,1,0,0,0,1,1,1,1,0,0,3,0,0,0,0,0,0,0,,,37.0,1,1,1,0,0,0,0,0,1,0,0,0


In [727]:
# We drop all columns that are mental health related questions
df_self_employed_healthy = df_self_employed_healthy.drop(['If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?', ], axis=1)
df_self_employed_healthy = df_self_employed_healthy.drop(['If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?', ], axis=1)
df_self_employed_healthy = df_self_employed_healthy.drop(['If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?', ], axis=1)
df_self_employed_healthy = df_self_employed_healthy.drop(['If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?', ], axis=1)
df_self_employed_healthy = df_self_employed_healthy.drop(['Do you believe your productivity is ever affected by a mental health issue?', ], axis=1)
df_self_employed_healthy = df_self_employed_healthy.drop(['If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?', ], axis=1)
df_self_employed_healthy = df_self_employed_healthy.drop(['If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?', ], axis=1)
df_self_employed_healthy = df_self_employed_healthy.drop(['If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?', ], axis=1)

In [728]:
# We show all the columns
with pd.option_context("display.max_columns", None):
    display(df_self_employed_healthy.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your primary role within your company related to tech/IT?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
33,1,1,1,0,0,1,0,0,0,1,1,1,1,0,0,3,0,0,0,0,0,0,0,37.0,1,1,1,0,0,0,0,0,1,0,0,0


In [729]:
df_self_employed_healthy.isnull().sum()

Are you self-employed?                                                                                                                                                              0
How many employees does your company or organization have?                                                                                                                          0
Is your primary role within your company related to tech/IT?                                                                                                                        0
Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?                                                 0
Do you know local or online resources to seek help for a mental health disorder?                                                                                                    0
Do you have previous employers?                                                           

## Dataframe - Self-employed - Condition

In [730]:
# We show all the columns
with pd.option_context("display.max_columns", None):
    display(df_self_employed_condition.head())

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your primary role within your company related to tech/IT?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
9,1,1,1,0,1,0.0,,1.0,0.0,1.0,1.0,1,1,0,0,1,1,1,1,0,0,4,0,0,1,1,1,1,1,1.0,3.0,30.0,2,1,1,0,0,1,1,0,1,0,0,1
40,1,1,1,0,1,0.0,1.0,1.0,0.0,1.0,2.0,1,0,0,1,1,1,1,1,1,0,3,1,0,1,1,1,0,0,1.0,2.0,34.0,1,1,0,1,0,1,1,0,1,0,0,1
43,1,1,1,0,1,0.0,0.0,1.0,0.0,1.0,3.0,1,0,0,0,1,0,1,1,1,0,3,1,0,1,1,1,0,1,2.0,3.0,28.0,1,1,0,1,0,0,0,0,1,0,0,1
63,1,1,1,0,1,,,,,,,1,0,0,0,1,1,1,0,0,0,2,1,1,1,1,1,0,0,,,29.0,2,1,1,0,0,0,0,1,1,0,0,1
76,1,1,1,0,1,0.0,1.0,0.0,0.0,1.0,,0,0,0,0,1,1,1,1,0,0,3,1,1,1,1,1,0,0,,,19.0,0,1,0,0,0,0,0,0,1,0,0,1


## Column - "If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?"

### Check for row and column information

In [731]:
df_self_employed_condition.rename(columns = {'If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?':'Temp',}, inplace = True)

In [732]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([ 3.,  2., nan,  1.,  0.])

In [733]:
print("0 :",df_self_employed_condition[df_self_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 1].shape[0],"| 2 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 2].shape[0],"| 3 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 3].shape[0], "| NaN :", df_self_employed_condition[df_self_employed_condition['Temp'].isnull()].shape[0])

0 : 1 | 1 : 5 | 2 : 55 | 3 : 65 | NaN : 18


### Change row values - Imputation

In [734]:
# We impute the remaining missing values with the most occuring value in the column
df_self_employed_condition.loc[(df_self_employed_condition['Temp'].isnull()), 'Temp'] = 3

In [735]:
# We turn the float64 data type from the fourth column into an int data type
df_self_employed_condition['Temp'] = df_self_employed_condition['Temp'].astype(int)

### Check results

In [736]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([3, 2, 1, 0])

In [737]:
df_self_employed_condition.rename(columns = {'Temp':'If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?',}, inplace = True)

## Column - "If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?"

### Check for row and column information

In [738]:
df_self_employed_condition.rename(columns = {'If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?':'Temp',}, inplace = True)

In [739]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([ 1.,  2., nan,  0.,  3.])

In [740]:
print("0 :",df_self_employed_condition[df_self_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 1].shape[0],"| 2 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 2].shape[0],"| 3 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 3].shape[0], "| NaN :", df_self_employed_condition[df_self_employed_condition['Temp'].isnull()].shape[0])

0 : 10 | 1 : 33 | 2 : 55 | 3 : 15 | NaN : 31


### Change row values - Imputation

In [741]:
# We impute the remaining missing values with the most occuring value in the column
df_self_employed_condition.loc[(df_self_employed_condition['Temp'].isnull()), 'Temp'] = 2

In [742]:
# We turn the float64 data type from the fourth column into an int data type
df_self_employed_condition['Temp'] = df_self_employed_condition['Temp'].astype(int)

### Check results

In [743]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([1, 2, 0, 3])

In [744]:
df_self_employed_condition.rename(columns = {'Temp':'If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?',}, inplace = True)

## Column - "Do you believe your productivity is ever affected by a mental health issue?"

### Check for row and column information

In [745]:
df_self_employed_condition.rename(columns = {'Do you believe your productivity is ever affected by a mental health issue?':'Temp',}, inplace = True)

In [746]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([ 1., nan,  0.])

In [747]:
print("0 :",df_self_employed_condition[df_self_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 1].shape[0], "| NaN :", df_self_employed_condition[df_self_employed_condition['Temp'].isnull()].shape[0])

0 : 6 | 1 : 133 | NaN : 5


### Change row values - map values & Imputation

In [748]:
# We impute the remaining missing values with the most occuring value in the column
df_self_employed_condition.loc[(df_self_employed_condition['Temp'].isnull()), 'Temp'] = 0

In [749]:
# We turn the float64 data type from the fourth column into an int data type
df_self_employed_condition['Temp'] = df_self_employed_condition['Temp'].astype(int)

### Check for row and column information

In [750]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([1, 0])

In [751]:
df_self_employed_condition.rename(columns = {'Temp':'Do you believe your productivity is ever affected by a mental health issue?',}, inplace = True)

## Column - "If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?"

### Check for row and column information

In [752]:
df_self_employed_condition.rename(columns = {'If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?':'Temp',}, inplace = True)

In [753]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([ 1.,  2.,  3., nan,  4.])

In [754]:
print("4 :",df_self_employed_condition[df_self_employed_condition['Temp'] == 4].shape[0],"| 1 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 1].shape[0],"| 2 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 2].shape[0],"| 3 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 3].shape[0], "| NaN :", df_self_employed_condition[df_self_employed_condition['Temp'].isnull()].shape[0])

4 : 10 | 1 : 46 | 2 : 50 | 3 : 14 | NaN : 24


### Change row values - map values & Imputation

In [755]:
df_self_employed_condition.loc[(df_self_employed_condition['Do you believe your productivity is ever affected by a mental health issue?'] == 0), 'Temp'] = 0

In [756]:
print("0 :",df_self_employed_condition[df_self_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 1].shape[0],"| 2 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 2].shape[0],"| 3 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 3].shape[0],"| 4 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 4].shape[0], "| NaN :", df_self_employed_condition[df_self_employed_condition['Temp'].isnull()].shape[0])

0 : 11 | 1 : 46 | 2 : 50 | 3 : 14 | 4 : 10 | NaN : 13


In [757]:
# We impute the remaining missing values with the most occuring value in the column
df_self_employed_condition.loc[(df_self_employed_condition['Temp'].isnull()), 'Temp'] = 2

In [758]:
# We turn the float64 data type from the fourth column into an int data type
df_self_employed_condition['Temp'] = df_self_employed_condition['Temp'].astype(int)

### Check results

In [759]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([1, 2, 3, 0, 4])

In [760]:
df_self_employed_condition.rename(columns = {'Temp':'If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?',}, inplace = True)

## Column - "If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?"

### Check for row and column information

In [761]:
df_self_employed_condition.rename(columns = {'If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?':'Temp',}, inplace = True)

In [762]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([ 0., nan,  1.])

In [763]:
print("0 :",df_self_employed_condition[df_self_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 1].shape[0], "| NaN :", df_self_employed_condition[df_self_employed_condition['Temp'].isnull()].shape[0])

0 : 76 | 1 : 18 | NaN : 50


### Change row values - Imputation

In [764]:
# We impute the remaining missing values with the most occuring value in the column
df_self_employed_condition.loc[(df_self_employed_condition['Temp'].isnull()), 'Temp'] = 0

In [765]:
# We turn the float64 data type from the fourth column into an int data type
df_self_employed_condition['Temp'] = df_self_employed_condition['Temp'].astype(int)

### Check results

In [766]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([0, 1])

In [767]:
df_self_employed_condition.rename(columns = {'Temp':'If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?',}, inplace = True)

## Column - "If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?"

### Check for row and column information

In [768]:
df_self_employed_condition.rename(columns = {'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?':'Temp',}, inplace = True)

In [769]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([ 1., nan,  0.])

In [770]:
print("0 :",df_self_employed_condition[df_self_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 1].shape[0], "| NaN :", df_self_employed_condition[df_self_employed_condition['Temp'].isnull()].shape[0])

0 : 38 | 1 : 68 | NaN : 38


### Change row values - Imputation

In [771]:
# We impute the remaining missing values with 0
df_self_employed_condition.loc[(df_self_employed_condition['Temp'].isnull()), 'Temp'] = 0
# If someone left this field open we can assume it was not revealed to a coworker

In [772]:
# We turn the float64 data type from the fourth column into an int data type
df_self_employed_condition['Temp'] = df_self_employed_condition['Temp'].astype(int)

### Check results

In [773]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([1, 0])

In [774]:
df_self_employed_condition.rename(columns = {'Temp':'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?',}, inplace = True)

## Column - "If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?"

### Check for row and column information

In [775]:
df_self_employed_condition.rename(columns = {'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?':'Temp',}, inplace = True)

In [776]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([ 0., nan,  1.])

In [777]:
print("0 :",df_self_employed_condition[df_self_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 1].shape[0], "| NaN :", df_self_employed_condition[df_self_employed_condition['Temp'].isnull()].shape[0])

0 : 74 | 1 : 36 | NaN : 34


### Change row values - Imputation

In [778]:
# We impute the remaining missing values with 0
df_self_employed_condition.loc[(df_self_employed_condition['Temp'].isnull()), 'Temp'] = 0
# If someone left this field open we can assume it was not revealed to a coworker

In [779]:
# We turn the float64 data type from the fourth column into an int data type
df_self_employed_condition['Temp'] = df_self_employed_condition['Temp'].astype(int)

### Check results

In [780]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([0, 1])

In [781]:
df_self_employed_condition.rename(columns = {'Temp':'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?',}, inplace = True)

## Column - "If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?"

### Check for row and column information

In [782]:
df_self_employed_condition.rename(columns = {'If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?':'Temp',}, inplace = True)

In [783]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([nan,  1.,  0.])

In [784]:
print("0 :",df_self_employed_condition[df_self_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_self_employed_condition[df_self_employed_condition['Temp'] == 1].shape[0], "| NaN :", df_self_employed_condition[df_self_employed_condition['Temp'].isnull()].shape[0])

0 : 66 | 1 : 20 | NaN : 58


### Change row values - Imputation

In [785]:
# We impute the remaining missing values with 0
df_self_employed_condition.loc[(df_self_employed_condition['Temp'].isnull()), 'Temp'] = 0
# If someone left this field open we can assume it did not have a negative impact

In [786]:
# We turn the float64 data type from the fourth column into an int data type
df_self_employed_condition['Temp'] = df_self_employed_condition['Temp'].astype(int)

### Check results

In [787]:
# We get all categories of the column
df_self_employed_condition['Temp'].unique()

array([0, 1])

In [788]:
df_self_employed_condition.rename(columns = {'Temp':'If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?',}, inplace = True)

In [789]:
# We show all the columns
with pd.option_context("display.max_columns", None):
    display(df_self_employed_condition.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your primary role within your company related to tech/IT?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
9,1,1,1,0,1,0,0,1,0,1,1,1,1,0,0,1,1,1,1,0,0,4,0,0,1,1,1,1,1,1,3,30.0,2,1,1,0,0,1,1,0,1,0,0,1


In [790]:
df_self_employed_healthy.isnull().sum()

Are you self-employed?                                                                                                                                                              0
How many employees does your company or organization have?                                                                                                                          0
Is your primary role within your company related to tech/IT?                                                                                                                        0
Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?                                                 0
Do you know local or online resources to seek help for a mental health disorder?                                                                                                    0
Do you have previous employers?                                                           

## Dataframe - Employed

## Column - "Is your employer primarily a tech company/organization?"

### Check for row and column information

In [791]:
# We compute the number of empty rows in the third column
df_employed['Is your employer primarily a tech company/organization?'].unique()

array([1., 0.])

### Change row values - Convert

In [792]:
df_employed['Is your employer primarily a tech company/organization?'] = df_employed['Is your employer primarily a tech company/organization?'].astype(int)

### Check results

In [793]:
# We compute the number of empty rows in the third column
df_employed['Is your employer primarily a tech company/organization?'].unique()

array([1, 0])

## Column - "Does your employer provide mental health benefits as part of healthcare coverage?"

### Check for row and column information

In [794]:
df_employed.rename(columns = {'Does your employer provide mental health benefits as part of healthcare coverage?':'Temp',}, inplace = True)

In [795]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array(['Not eligible for coverage / N/A', 'No', 'Yes', nan], dtype=object)

In [796]:
print("No :",df_employed[df_employed['Temp'] == 'No'].shape[0],"| Yes :", df_employed[df_employed['Temp'] == 'Yes'].shape[0],"| Not eligible for coverage :", df_employed[df_employed['Temp'] == 'Not eligible for coverage / N/A'].shape[0], "| NaN :", df_employed[df_employed['Temp'].isnull()].shape[0])

No : 174 | Yes : 410 | Not eligible for coverage : 68 | NaN : 274


### Change row values - Map and Impute

In [797]:
# We change the category to Yes
df_employed.loc[(df_employed['Temp'] == 'Not eligible for coverage / N/A'),'Temp'] = "Yes"
# Since the individual is not eligible for coverage we can assume the employer would provide mental health benefits

In [798]:
# We change empty categories to the most common value
df_employed.loc[(df_employed['Temp'].isnull()),'Temp'] = "Yes"

In [799]:
# We change the values of the column
df_employed['Temp'] = df_employed['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [800]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array([1, 0], dtype=int64)

In [801]:
df_employed.rename(columns = {'Temp':'Does your employer provide mental health benefits as part of healthcare coverage?',}, inplace = True)

## Column - "Do you know the options for mental health care available under your employer-provided coverage?"

### Check for row and column information

In [802]:
df_employed.rename(columns = {'Do you know the options for mental health care available under your employer-provided coverage?':'Temp',}, inplace = True)

In [803]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array([nan, 'Yes', 'I am not sure', 'No'], dtype=object)

In [804]:
print("No :",df_employed[df_employed['Temp'] == 'No'].shape[0],"| Yes :", df_employed[df_employed['Temp'] == 'Yes'].shape[0],"| I am not sure :", df_employed[df_employed['Temp'] == 'I am not sure'].shape[0], "| NaN :", df_employed[df_employed['Temp'].isnull()].shape[0])

No : 291 | Yes : 234 | I am not sure : 292 | NaN : 109


### Change row values - Map and Impute

In [805]:
# We change the category to No
df_employed.loc[(df_employed['Temp'] == 'I am not sure'),'Temp'] = "No"
# Since the individual is does not really know the options we can assume a no as answer

In [806]:
# We change the category to No
df_employed.loc[(df_employed['Temp'].isnull()),'Temp'] = "No"
# Since the individual does not give an answer we can assume a no as answer

In [807]:
# We change the values of the column
df_employed['Temp'] = df_employed['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [808]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array([0, 1], dtype=int64)

In [809]:
df_employed.rename(columns = {'Temp':'Do you know the options for mental health care available under your employer-provided coverage?',}, inplace = True)

## Column - "Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?"

### Check for row and column information

In [810]:
df_employed.rename(columns = {'Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?':'Temp',}, inplace = True)

In [811]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array(['No', 'Yes', nan], dtype=object)

In [812]:
print("No :",df_employed[df_employed['Temp'] == 'No'].shape[0],"| Yes :", df_employed[df_employed['Temp'] == 'Yes'].shape[0], "| NaN :", df_employed[df_employed['Temp'].isnull()].shape[0])

No : 656 | Yes : 188 | NaN : 82


### Change row values - Map and Impute

In [813]:
# We change the category to No
df_employed.loc[(df_employed['Temp'].isnull()),'Temp'] = "No"
# Since the individual does not give an answer we can assume a no as answer

In [814]:
# We change the values of the column
df_employed['Temp'] = df_employed['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [815]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array([0, 1], dtype=int64)

In [816]:
df_employed.rename(columns = {'Temp':'Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?',}, inplace = True)

## Column - "Does your employer offer resources to learn more about mental health concerns and options for seeking help?"

### Check for row and column information

In [817]:
df_employed.rename(columns = {'Does your employer offer resources to learn more about mental health concerns and options for seeking help?':'Temp',}, inplace = True)

In [818]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array(['No', 'Yes', nan], dtype=object)

In [819]:
print("No :",df_employed[df_employed['Temp'] == 'No'].shape[0],"| Yes :", df_employed[df_employed['Temp'] == 'Yes'].shape[0], "| NaN :", df_employed[df_employed['Temp'].isnull()].shape[0])

No : 433 | Yes : 241 | NaN : 252


### Change row values - Map and Impute

In [820]:
# We change the category to No
df_employed.loc[(df_employed['Temp'].isnull()),'Temp'] = "No"
# Since the individual does not give an answer we can assume a no as answer

In [821]:
# We change the values of the column
df_employed['Temp'] = df_employed['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [822]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array([0, 1], dtype=int64)

In [823]:
df_employed.rename(columns = {'Temp':'Does your employer offer resources to learn more about mental health concerns and options for seeking help?',}, inplace = True)

## Column - "If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:"

### Check for row and column information

In [824]:
df_employed.rename(columns = {'If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:':'Temp',}, inplace = True)

In [825]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array(['Very easy', 'Somewhat easy', 'Neither easy nor difficult',
       'Very difficult', 'Somewhat difficult', nan], dtype=object)

In [826]:
print("Very difficult :",df_employed[df_employed['Temp'] == 'Very difficult'].shape[0],"| Somewhat difficult :", df_employed[df_employed['Temp'] == 'Somewhat difficult'].shape[0], "| Neither easy nor difficult :", df_employed[df_employed['Temp'] == 'Neither easy nor difficult'].shape[0], "| Somewhat easy :", df_employed[df_employed['Temp'] == 'Somewhat easy'].shape[0], "| Very easy :", df_employed[df_employed['Temp'] == 'Very easy'].shape[0], "| NaN :", df_employed[df_employed['Temp'].isnull()].shape[0])

Very difficult : 89 | Somewhat difficult : 162 | Neither easy nor difficult : 149 | Somewhat easy : 219 | Very easy : 175 | NaN : 132


### Change row values - Map and Impute

In [827]:
# We impute the missing values with the median of the column
df_employed.loc[(df_employed['Temp'].isnull()),'Temp'] = "Neither easy nor difficult"

In [828]:
# We change the values of the column
df_employed['Temp'] = df_employed['Temp'].map({'Very difficult': 0, 'Somewhat difficult': 1, 'Neither easy nor difficult': 2, 'Somewhat easy': 3, 'Very easy': 4})

### Check results

In [829]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array([4, 3, 2, 0, 1], dtype=int64)

In [830]:
df_employed.rename(columns = {'Temp':'If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:',}, inplace = True)

## Column - "Do you think that discussing a physical health issue with your employer would have negative consequences?"

### Check for row and column information

In [831]:
df_employed.rename(columns = {'Do you think that discussing a physical health issue with your employer would have negative consequences?':'Temp',}, inplace = True)

In [832]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array(['No', nan, 'Yes'], dtype=object)

In [833]:
print("No :",df_employed[df_employed['Temp'] == 'No'].shape[0],"Yes :", df_employed[df_employed['Temp'] == 'Yes'].shape[0], "Is empty :", df_employed[df_employed['Temp'].isnull()].shape[0])

No : 682 Yes : 31 Is empty : 213


### Change row values - map values & Imputation

In [834]:
# We impute the missing values with the median of the column
df_employed.loc[(df_employed['Temp'].isnull()),'Temp'] = "No"

In [835]:
# We change the values of the column from yes to 1 and no to 0
df_employed['Temp'] = df_employed['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [836]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array([0, 1], dtype=int64)

In [837]:
df_employed.rename(columns = {'Temp':'Do you think that discussing a physical health issue with your employer would have negative consequences?',}, inplace = True)

## Column - "Do you feel that your employer takes mental health as seriously as physical health?"

### Check for row and column information

In [838]:
df_employed.rename(columns = {'Do you feel that your employer takes mental health as seriously as physical health?':'Temp',}, inplace = True)

In [839]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array([nan, 'Yes', 'No'], dtype=object)

In [840]:
print("No :",df_employed[df_employed['Temp'] == 'No'].shape[0],"| Yes :", df_employed[df_employed['Temp'] == 'Yes'].shape[0], "| NaN :", df_employed[df_employed['Temp'].isnull()].shape[0])

No : 242 | Yes : 268 | NaN : 416


### Change row values - Map and Impute

In [841]:
# We change the values to No if there are no negative consequences with physical health problems and the employer takes it as seriously as mental health
df_employed.loc[(df_employed['Temp'].isnull()) & (df_employed['Do you think that discussing a physical health issue with your employer would have negative consequences?'] == 0) & (df_employed['Do you think that discussing a mental health disorder with your employer would have negative consequences?'] == "Yes"),'Temp'] = "No"

In [842]:
# We change the values to No if there are no negative consequences with physical health problems and the employer takes it as seriously as mental health
df_employed.loc[(df_employed['Temp'].isnull()) & (df_employed['Do you think that discussing a physical health issue with your employer would have negative consequences?'] == 0) & (df_employed['Do you think that discussing a mental health disorder with your employer would have negative consequences?'] == "No"),'Temp'] = "Yes"

In [843]:
# If an employer does not even take physical conditions seriously there is a high chance that they wont see mental health as equal
df_employed.loc[(df_employed['Temp'].isnull()) & (df_employed['Do you think that discussing a physical health issue with your employer would have negative consequences?'] == 1),'Temp'] = "No"

In [844]:
# We change the values to Yes if the employer reacts positevly to physical health problems but unsure about status and mental health problems
df_employed.loc[(df_employed['Temp'].isnull()) & (df_employed['Do you think that discussing a physical health issue with your employer would have negative consequences?'] == 0) & (df_employed['Do you think that discussing a mental health disorder with your employer would have negative consequences?'].isnull()) ,'Temp'] = "Yes"

In [845]:
# We change the values of the column from yes to 1 and no to 0
df_employed['Temp'] = df_employed['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [846]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array([1, 0], dtype=int64)

In [847]:
df_employed.rename(columns = {'Temp':'Do you feel that your employer takes mental health as seriously as physical health?',}, inplace = True)

## Column - "Do you think that discussing a mental health disorder with your employer would have negative consequences?"

### Check for row and column information

In [848]:
df_employed.rename(columns = {'Do you think that discussing a mental health disorder with your employer would have negative consequences?':'Temp',}, inplace = True)

In [849]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array(['No', nan, 'Yes'], dtype=object)

In [850]:
print("No :",df_employed[df_employed['Temp'] == 'No'].shape[0],"Yes :", df_employed[df_employed['Temp'] == 'Yes'].shape[0], "Is empty :", df_employed[df_employed['Temp'].isnull()].shape[0])

No : 352 Yes : 174 Is empty : 400


### Change row values - map values & Imputation

In [851]:
# We change the values to No if there are no negative consequences with physical health problems and the employer takes it as seriously as mental health
df_employed.loc[(df_employed['Temp'].isnull()) & (df_employed['Do you think that discussing a physical health issue with your employer would have negative consequences?'] == 0) & (df_employed['Do you feel that your employer takes mental health as seriously as physical health?'] == 0),'Temp'] = "No"

In [852]:
# We change the values to Yes if there are negative conseuqences with physical health problems
df_employed.loc[(df_employed['Temp'].isnull()) & (df_employed['Do you think that discussing a physical health issue with your employer would have negative consequences?'] == 1 ),'Temp'] = "Yes"

In [853]:
# We change the values to No if there are no negative consequences with physical health problems and the employer takes it as seriously as mental health
df_employed.loc[(df_employed['Temp'].isnull()) & (df_employed['Do you think that discussing a physical health issue with your employer would have negative consequences?'] == 0) & (df_employed['Do you feel that your employer takes mental health as seriously as physical health?'] == 1),'Temp'] = "No"

In [854]:
# We change the values of the column from yes to 1 and no to 0
df_employed['Temp'] = df_employed['Temp'].map({'No': 0, 'Yes': 1})

### Check results

In [855]:
# We confirm all the unique categories inside the column
df_employed['Temp'].unique()

array([0, 1], dtype=int64)

In [856]:
df_employed.rename(columns = {'Temp':'Do you think that discussing a mental health disorder with your employer would have negative consequences?',}, inplace = True)

In [857]:
# We show all the column with missing values
with pd.option_context("display.max_columns", None):
    display(df_employed.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
0,0,3,1,1,1,0,0,0,4,0,0,1,0,0,,,,,,,1,0,0,0,1,0,1,1,0,0,3,0,0,0,0,0,1,0,,,39.0,1,1,0,0,0,0,0,0,1,0,0,1


In [858]:
# We split the Dataframe then further in 2 different groups those who are have diagnosed mental conditions and those who do not
df_employed_condition = df_employed[(df_employed['Mental Condition'] == 1)]
df_employed_healthy = df_employed[(df_employed['Mental Condition'] == 0)]

In [859]:
# We drop all columns that are mental health related questions
df_employed = df_employed.drop(['If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?'], axis=1)
df_employed = df_employed.drop(['If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?'], axis=1)
df_employed = df_employed.drop(['If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?'], axis=1)
df_employed = df_employed.drop(['If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?'], axis=1)
df_employed = df_employed.drop(['Do you believe your productivity is ever affected by a mental health issue?'], axis=1)
df_employed = df_employed.drop(['If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?'], axis=1)
df_employed = df_employed.drop(['If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?'], axis=1)
df_employed = df_employed.drop(['If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?'], axis=1)

## Dataframe - Employed - Healthy

In [860]:
# We show all the columns
with pd.option_context("display.max_columns", None):
    display(df_employed_healthy.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
2,0,2,1,1,0,0,0,0,2,0,0,1,0,0,,,,,,,1,0,0,1,1,1,1,1,1,1,3,1,1,0,0,0,0,1,,,38.0,2,1,0,0,0,0,0,0,1,0,0,0


In [861]:
# We drop all columns that are mental health related questions
df_employed_healthy = df_employed_healthy.drop(['If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?', ], axis=1)
df_employed_healthy = df_employed_healthy.drop(['If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?', ], axis=1)
df_employed_healthy = df_employed_healthy.drop(['If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?', ], axis=1)
df_employed_healthy = df_employed_healthy.drop(['If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?', ], axis=1)
df_employed_healthy = df_employed_healthy.drop(['Do you believe your productivity is ever affected by a mental health issue?', ], axis=1)
df_employed_healthy = df_employed_healthy.drop(['If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?', ], axis=1)
df_employed_healthy = df_employed_healthy.drop(['If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?', ], axis=1)
df_employed_healthy = df_employed_healthy.drop(['If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?', ], axis=1)

In [862]:
# We show all the columns
with pd.option_context("display.max_columns", None):
    display(df_employed_healthy.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
2,0,2,1,1,0,0,0,0,2,0,0,1,0,0,1,0,0,1,1,1,1,1,1,1,3,1,1,0,0,0,0,1,38.0,2,1,0,0,0,0,0,0,1,0,0,0


In [863]:
df_self_employed_condition.isnull().sum()

Are you self-employed?                                                                                                                                                              0
How many employees does your company or organization have?                                                                                                                          0
Is your primary role within your company related to tech/IT?                                                                                                                        0
Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?                                                 0
Do you know local or online resources to seek help for a mental health disorder?                                                                                                    0
If you have been diagnosed or treated for a mental health disorder, do you ever reveal thi

## Dataframe - Employed - Condition

In [864]:
# We show all the columns
with pd.option_context("display.max_columns", None):
    display(df_employed_condition.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
0,0,3,1,1,1,0,0,0,4,0,0,1,0,0,,,,,,,1,0,0,0,1,0,1,1,0,0,3,0,0,0,0,0,1,0,,,39.0,1,1,0,0,0,0,0,0,1,0,0,1


## Column - "If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?"

### Check for row and column information

In [865]:
df_employed_condition.rename(columns = {'If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?':'Temp',}, inplace = True)

In [866]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([nan,  2.,  3.,  1.,  0.])

In [867]:
print("0 :",df_employed_condition[df_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_employed_condition[df_employed_condition['Temp'] == 1].shape[0],"| 2 :", df_employed_condition[df_employed_condition['Temp'] == 2].shape[0],"| 3 :", df_employed_condition[df_employed_condition['Temp'] == 3].shape[0], "| NaN :", df_employed_condition[df_employed_condition['Temp'].isnull()].shape[0])

0 : 6 | 1 : 37 | 2 : 223 | 3 : 323 | NaN : 97


### Change row values - Imputation

In [868]:
# We impute the remaining missing values with the most occuring value in the column
df_employed_condition.loc[(df_employed_condition['Temp'].isnull()), 'Temp'] = 3

In [869]:
# We turn the float64 data type from the fourth column into an int data type
df_employed_condition['Temp'] = df_employed_condition['Temp'].astype(int)

### Check results

In [870]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([3, 2, 1, 0])

In [871]:
df_employed_condition.rename(columns = {'Temp':'If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?',}, inplace = True)

## Column - "If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?"

### Check for row and column information

In [872]:
df_employed_condition.rename(columns = {'If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?':'Temp',}, inplace = True)

In [873]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([nan,  1.,  2.,  0.,  3.])

In [874]:
print("0 :",df_employed_condition[df_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_employed_condition[df_employed_condition['Temp'] == 1].shape[0],"| 2 :", df_employed_condition[df_employed_condition['Temp'] == 2].shape[0],"| 3 :", df_employed_condition[df_employed_condition['Temp'] == 3].shape[0], "| NaN :", df_employed_condition[df_employed_condition['Temp'].isnull()].shape[0])

0 : 80 | 1 : 207 | 2 : 206 | 3 : 32 | NaN : 161


### Change row values - Imputation

In [875]:
# We impute the remaining missing values with the most occuring value in the column
df_employed_condition.loc[(df_employed_condition['Temp'].isnull()), 'Temp'] = 2

In [876]:
# We turn the float64 data type from the fourth column into an int data type
df_employed_condition['Temp'] = df_employed_condition['Temp'].astype(int)

### Check results

In [877]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([2, 1, 0, 3])

In [878]:
df_employed_condition.rename(columns = {'Temp':'If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?',}, inplace = True)

## Column - "Do you believe your productivity is ever affected by a mental health issue?"

### Check for row and column information

In [879]:
df_employed_condition.rename(columns = {'Do you believe your productivity is ever affected by a mental health issue?':'Temp',}, inplace = True)

In [880]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([nan])

In [881]:
print("0 :",df_employed_condition[df_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_employed_condition[df_employed_condition['Temp'] == 1].shape[0], "| NaN :", df_employed_condition[df_employed_condition['Temp'].isnull()].shape[0])

0 : 0 | 1 : 0 | NaN : 686


### Change row values - map values & Imputation

In [882]:
# We impute the remaining missing values with the most occuring value in the column
df_employed_condition.loc[(df_employed_condition['Temp'].isnull()), 'Temp'] = 0

In [883]:
# We turn the float64 data type from the fourth column into an int data type
df_employed_condition['Temp'] = df_employed_condition['Temp'].astype(int)

### Check for row and column information

In [884]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([0])

In [885]:
df_employed_condition.rename(columns = {'Temp':'Do you believe your productivity is ever affected by a mental health issue?',}, inplace = True)

## Column - "If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?"

### Check for row and column information

In [886]:
df_employed_condition.rename(columns = {'If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?':'Temp',}, inplace = True)

In [887]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([nan])

In [888]:
print("4 :",df_employed_condition[df_employed_condition['Temp'] == 4].shape[0],"| 1 :", df_employed_condition[df_employed_condition['Temp'] == 1].shape[0],"| 2 :", df_employed_condition[df_employed_condition['Temp'] == 2].shape[0],"| 3 :", df_employed_condition[df_employed_condition['Temp'] == 3].shape[0], "| NaN :", df_employed_condition[df_employed_condition['Temp'].isnull()].shape[0])

4 : 0 | 1 : 0 | 2 : 0 | 3 : 0 | NaN : 686


### Change row values - map values & Imputation

In [889]:
df_employed_condition.loc[(df_employed_condition['Do you believe your productivity is ever affected by a mental health issue?'] == 0), 'Temp'] = 0

In [890]:
print("0 :",df_employed_condition[df_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_employed_condition[df_employed_condition['Temp'] == 1].shape[0],"| 2 :", df_employed_condition[df_employed_condition['Temp'] == 2].shape[0],"| 3 :", df_employed_condition[df_employed_condition['Temp'] == 3].shape[0],"| 4 :", df_employed_condition[df_employed_condition['Temp'] == 4].shape[0], "| NaN :", df_employed_condition[df_employed_condition['Temp'].isnull()].shape[0])

0 : 686 | 1 : 0 | 2 : 0 | 3 : 0 | 4 : 0 | NaN : 0


In [891]:
# We impute the remaining missing values with the most occuring value in the column
df_employed_condition.loc[(df_employed_condition['Temp'].isnull()), 'Temp'] = 2

In [892]:
# We turn the float64 data type from the fourth column into an int data type
df_employed_condition['Temp'] = df_employed_condition['Temp'].astype(int)

### Check results

In [893]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([0])

In [894]:
df_employed_condition.rename(columns = {'Temp':'If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?',}, inplace = True)

## Column - "If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?"

### Check for row and column information

In [895]:
df_employed_condition.rename(columns = {'If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?':'Temp',}, inplace = True)

In [896]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([nan])

In [897]:
print("0 :",df_employed_condition[df_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_employed_condition[df_employed_condition['Temp'] == 1].shape[0], "| NaN :", df_employed_condition[df_employed_condition['Temp'].isnull()].shape[0])

0 : 0 | 1 : 0 | NaN : 686


### Change row values - Imputation

In [898]:
# We impute the remaining missing values with the most occuring value in the column
df_employed_condition.loc[(df_employed_condition['Temp'].isnull()), 'Temp'] = 0

In [899]:
# We turn the float64 data type from the fourth column into an int data type
df_employed_condition['Temp'] = df_employed_condition['Temp'].astype(int)

### Check results

In [900]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([0])

In [901]:
df_employed_condition.rename(columns = {'Temp':'If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?',}, inplace = True)

## Column - "If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?"

### Check for row and column information

In [902]:
df_employed_condition.rename(columns = {'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?':'Temp',}, inplace = True)

In [903]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([nan])

In [904]:
print("0 :",df_employed_condition[df_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_employed_condition[df_employed_condition['Temp'] == 1].shape[0], "| NaN :", df_employed_condition[df_employed_condition['Temp'].isnull()].shape[0])

0 : 0 | 1 : 0 | NaN : 686


### Change row values - Imputation

In [905]:
# We impute the remaining missing values with 0
df_employed_condition.loc[(df_employed_condition['Temp'].isnull()), 'Temp'] = 0
# If someone left this field open we can assume it was not revealed to a coworker

In [906]:
# We turn the float64 data type from the fourth column into an int data type
df_employed_condition['Temp'] = df_employed_condition['Temp'].astype(int)

### Check results

In [907]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([0])

In [908]:
df_employed_condition.rename(columns = {'Temp':'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?',}, inplace = True)

## Column - "If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?"

### Check for row and column information

In [909]:
df_employed_condition.rename(columns = {'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?':'Temp',}, inplace = True)

In [910]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([nan])

In [911]:
print("0 :",df_employed_condition[df_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_employed_condition[df_employed_condition['Temp'] == 1].shape[0], "| NaN :", df_employed_condition[df_employed_condition['Temp'].isnull()].shape[0])

0 : 0 | 1 : 0 | NaN : 686


### Change row values - Imputation

In [912]:
# We impute the remaining missing values with 0
df_employed_condition.loc[(df_employed_condition['Temp'].isnull()), 'Temp'] = 0
# If someone left this field open we can assume it was not revealed to a coworker

In [913]:
# We turn the float64 data type from the fourth column into an int data type
df_employed_condition['Temp'] = df_employed_condition['Temp'].astype(int)

### Check results

In [914]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([0])

In [915]:
df_employed_condition.rename(columns = {'Temp':'If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?',}, inplace = True)

## Column - "If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?"

### Check for row and column information

In [916]:
df_employed_condition.rename(columns = {'If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?':'Temp',}, inplace = True)

In [917]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([nan])

In [918]:
print("0 :",df_employed_condition[df_employed_condition['Temp'] == 0].shape[0],"| 1 :", df_employed_condition[df_employed_condition['Temp'] == 1].shape[0], "| NaN :", df_employed_condition[df_employed_condition['Temp'].isnull()].shape[0])

0 : 0 | 1 : 0 | NaN : 686


### Change row values - Imputation

In [919]:
# We impute the remaining missing values with 0
df_employed_condition.loc[(df_employed_condition['Temp'].isnull()), 'Temp'] = 0
# If someone left this field open we can assume it did not have a negative impact

In [920]:
# We turn the float64 data type from the fourth column into an int data type
df_employed_condition['Temp'] = df_employed_condition['Temp'].astype(int)

### Check results

In [921]:
# We get all categories of the column
df_employed_condition['Temp'].unique()

array([0])

In [922]:
df_employed_condition.rename(columns = {'Temp':'If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?',}, inplace = True)

In [923]:
# We show all the columns
with pd.option_context("display.max_columns", None):
    display(df_employed_condition.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health concerns and options for seeking help?,"If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:",Do you think that discussing a mental health disorder with your employer would have negative consequences?,Do you think that discussing a physical health issue with your employer would have negative consequences?,Do you feel that your employer takes mental health as seriously as physical health?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,"If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?","If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?","If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?","If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?",Do you believe your productivity is ever affected by a mental health issue?,"If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?",Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,"If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?","If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?",What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
0,0,3,1,1,1,0,0,0,4,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,1,0,0,3,0,0,0,0,0,1,0,2,3,39.0,1,1,0,0,0,0,0,0,1,0,0,1


In [924]:
df_employed_condition.isnull().sum()

Are you self-employed?                                                                                                                                                              0
How many employees does your company or organization have?                                                                                                                          0
Is your employer primarily a tech company/organization?                                                                                                                             0
Is your primary role within your company related to tech/IT?                                                                                                                        0
Does your employer provide mental health benefits as part of healthcare coverage?                                                                                                   0
Do you know the options for mental health care available under your employer-provided cove

## Dataframe - Column Cleanup - Delete Columns unusable for Analysis

In [925]:
# We drop all columns that are mental health related questions
df = df.drop(['If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to clients or business contacts?', ], axis=1)
df = df.drop(['If you have revealed a mental health issue to a client or business contact, do you believe this has impacted you negatively?', ], axis=1)
df = df.drop(['If you have been diagnosed or treated for a mental health disorder, do you ever reveal this to coworkers or employees?', ], axis=1)
df = df.drop(['If you have revealed a mental health issue to a coworker or employee, do you believe this has impacted you negatively?', ], axis=1)
df = df.drop(['Do you believe your productivity is ever affected by a mental health issue?', ], axis=1)
df = df.drop(['If yes, what percentage of your work time (time performing primary or secondary job functions) is affected by a mental health issue?', ], axis=1)
df = df.drop(['If you have a mental health issue, do you feel that it interferes with your work when being treated effectively?', ], axis=1)
df = df.drop(['If you have a mental health issue, do you feel that it interferes with your work when NOT being treated effectively?', ], axis=1)

In [926]:
# We drop all columns that are employer and coworker related questions
df = df.drop(['Is your employer primarily a tech company/organization?', ], axis=1)
df = df.drop(['Does your employer provide mental health benefits as part of healthcare coverage?', ], axis=1)
df = df.drop(['Do you know the options for mental health care available under your employer-provided coverage?', ], axis=1)
df = df.drop(['Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?', ], axis=1)
df = df.drop(['Does your employer offer resources to learn more about mental health concerns and options for seeking help?', ], axis=1)
df = df.drop(['If a mental health issue prompted you to request a medical leave from work, asking for that leave would be:', ], axis=1)
df = df.drop(['Do you think that discussing a mental health disorder with your employer would have negative consequences?', ], axis=1)
df = df.drop(['Do you think that discussing a physical health issue with your employer would have negative consequences?', ], axis=1)
df = df.drop(['Do you feel that your employer takes mental health as seriously as physical health?', ], axis=1)

In [927]:
# We show all the column with missing values
with pd.option_context("display.max_columns", None):
    display(df.head(1))

Unnamed: 0,Are you self-employed?,How many employees does your company or organization have?,Is your primary role within your company related to tech/IT?,Have you heard of or observed negative consequences for co-workers who have been open about mental health issues in your workplace?,Do you know local or online resources to seek help for a mental health disorder?,Do you have previous employers?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health issues and how to seek help?,Do you think that discussing a mental health disorder with previous employers would have negative consequences?,Do you think that discussing a physical health issue with previous employers would have negative consequences?,Would you have been willing to discuss a mental health issue with your previous co-workers?,Would you have been willing to discuss a mental health issue with your direct supervisor(s),Did you hear of or observe negative consequences for co-workers with mental health issues in your previous workplaces?,Would you bring up a mental health issue with a potential employer in an interview?,How willing would you be to share with friends and family that you have a mental illness?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have your observations of how another individual who discussed a mental health disorder made you less likely to reveal a mental health issue yourself in your current workplace?,Do you have a family history of mental illness?,Have you had a mental health disorder in the past?,Do you currently have a mental health disorder?,Have you been diagnosed with a mental health condition by a medical professional?,Have you ever sought treatment for a mental health issue from a mental health professional?,What is your age?,Do you work remotely?,Back-end Developer,Front-end Developer,DevOps/SysAdmin,Dev Evangelist/Advocate,Working in America,Living in America,Believed Mental Conditions,Male,Female,Other,Mental Condition
0,0,3,1,0,0,1,0,0,0,1,0,1,1,0,0,3,0,0,0,0,0,1,0,39.0,1,1,0,0,0,0,0,0,1,0,0,1


## Dataframe - Writing to CSV

In [928]:
# We export all created dataframes as CSV files
df.to_csv('df.csv')

df_self_employed.to_csv('df_self_employed.csv')
df_employed.to_csv('df_employed.csv')

df_employed_healthy.to_csv('df_employed_healthy.csv')
df_employed_condition.to_csv('df_employed_condition.csv')

df_self_employed_healthy.to_csv('df_self_employed_healthy.csv')
df_self_employed_condition.to_csv('df_self_employed_condition.csv')