#### Job Satisfaction

In this notebook, you will explore job satisfaction as reported in the survey results. Use the cells at the top of the notebook to explore as necessary, and use your findings to answer the questions at the bottom of the notebook.

In [1]:
import pandas as pd
import numpy as np
import JobSatisfaction as t
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('./survey_results_public.csv')
schema = pd.read_csv('./survey_results_schema.csv')
df.head()

Unnamed: 0,Respondent,Professional,ProgramHobby,Country,University,EmploymentStatus,FormalEducation,MajorUndergrad,HomeRemote,CompanySize,...,StackOverflowMakeMoney,Gender,HighestEducationParents,Race,SurveyLong,QuestionsInteresting,QuestionsConfusing,InterestedAnswers,Salary,ExpectedSalary
0,1,Student,"Yes, both",United States,No,"Not employed, and not looking for work",Secondary school,,,,...,Strongly disagree,Male,High school,White or of European descent,Strongly disagree,Strongly agree,Disagree,Strongly agree,,
1,2,Student,"Yes, both",United Kingdom,"Yes, full-time",Employed part-time,Some college/university study without earning ...,Computer science or software engineering,"More than half, but not all, the time",20 to 99 employees,...,Strongly disagree,Male,A master's degree,White or of European descent,Somewhat agree,Somewhat agree,Disagree,Strongly agree,,37500.0
2,3,Professional developer,"Yes, both",United Kingdom,No,Employed full-time,Bachelor's degree,Computer science or software engineering,"Less than half the time, but at least one day ...","10,000 or more employees",...,Disagree,Male,A professional degree,White or of European descent,Somewhat agree,Agree,Disagree,Agree,113750.0,
3,4,Professional non-developer who sometimes write...,"Yes, both",United States,No,Employed full-time,Doctoral degree,A non-computer-focused engineering discipline,"Less than half the time, but at least one day ...","10,000 or more employees",...,Disagree,Male,A doctoral degree,White or of European descent,Agree,Agree,Somewhat agree,Strongly agree,,
4,5,Professional developer,"Yes, I program as a hobby",Switzerland,No,Employed full-time,Master's degree,Computer science or software engineering,Never,10 to 19 employees,...,,,,,,,,,,


In [10]:
#Space for your code
print("The proportion of missing values in the"
      " Job Satisfaction column: {}".format(df['JobSatisfaction'].isnull().sum() / len(df)))

The proportion of missing values in the Job Satisfaction column: 0.20149722542142184
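
As a small aside, the same proportion can be written as the mean of the boolean mask, since the mean of `True`/`False` values is the fraction that are `True`. The sketch below uses a made-up five-row frame (`toy` is a hypothetical stand-in, not the survey data) so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey data; the values are made up for illustration.
toy = pd.DataFrame({"JobSatisfaction": [7.0, np.nan, 9.0, 5.0, np.nan]})

# isnull() yields booleans; their mean is the fraction of missing entries.
prop_missing = toy["JobSatisfaction"].isnull().mean()

# The same quantity, computed the way the cell above does it.
prop_missing_alt = toy["JobSatisfaction"].isnull().sum() / len(toy)

print(prop_missing, prop_missing_alt)
```

On the real data, `df['JobSatisfaction'].isnull().mean()` should give the same figure as the cell above.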


In [29]:
#More space for code
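job_sat_dict = {}
# dropna() skips the missing EmploymentStatus value, which would select no rows
for x in df['EmploymentStatus'].dropna().unique():
    subset_df = df[df['EmploymentStatus']==x]
    score = subset_df['JobSatisfaction'].mean()
    # pd.notna detects NaN reliably; an identity check against np.nan does not
    if pd.notna(score):
        job_sat_dict[x] = score
best_group = max(job_sat_dict, key=job_sat_dict.get)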
print("Group [{}] has the highest averaged job satisfaction at {:.4f}".format(
        best_group, job_sat_dict[best_group]))

Group [Independent contractor, freelancer, or self-employed] has the highest averaged job satisfaction at 7.2320
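
The loop above can also be expressed with `groupby`, which skips missing group labels by default and ignores missing scores when averaging. This is only a sketch on a made-up frame (`toy` and its values are hypothetical), not the notebook's official solution.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey data; the values are made up for illustration.
toy = pd.DataFrame({
    "EmploymentStatus": [
        "Employed full-time", "Employed full-time", "Retired",
        "Independent contractor", "Independent contractor", np.nan,
    ],
    "JobSatisfaction": [6.0, 8.0, np.nan, 9.0, 7.0, 5.0],
})

# Mean satisfaction per employment status; groups with no scores come out
# as NaN and are dropped before picking the maximum.
group_means = (
    toy.groupby("EmploymentStatus")["JobSatisfaction"]
       .mean()
       .dropna()
)

best_group = group_means.idxmax()
print("Group [{}] has the highest mean at {:.4f}".format(
    best_group, group_means[best_group]))
```

Sorting `group_means` with `sort_values(ascending=False)` would show how close the other groups are, which helps when judging the result.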


In [31]:
set(df['CompanySize'])

{'1,000 to 4,999 employees',
 '10 to 19 employees',
 '10,000 or more employees',
 '100 to 499 employees',
 '20 to 99 employees',
 '5,000 to 9,999 employees',
 '500 to 999 employees',
 'Fewer than 10 employees',
 "I don't know",
 'I prefer not to answer',
 nan}

In [38]:
#Additional space for your additional code
small_company_sizes = ['Fewer than 10 employees', '10 to 19 employees', '20 to 99 employees', '100 to 499 employees']
big_company_sizes = ['500 to 999 employees', '1,000 to 4,999 employees', '5,000 to 9,999 employees', '10,000 or more employees']

subset_df1 = df[df['CompanySize'].isin(small_company_sizes)]
score1 = subset_df1['JobSatisfaction'].mean()
subset_df2 = df[df['CompanySize'].isin(big_company_sizes)]
score2 = subset_df2['JobSatisfaction'].mean()

print("Average job satisfaction for small companies with size < 500: {}".format(score1))
print("Average job satisfaction for big companies with size >= 500: {}".format(score2))

Average job satisfaction for small companies with size < 500: 7.018271837793233
Average job satisfaction for big companies with size >= 500: 6.876949009692373
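
One way to keep the two size buckets in a single table is to map each `CompanySize` label to a bucket and group on it. The sketch below assumes the same small/big split as the cell above; the bucket names, the `toy` frame, and its scores are made up for illustration. Answers such as "I don't know" are not in the mapping, so they become missing and drop out of the grouping.

```python
import pandas as pd

small_company_sizes = ['Fewer than 10 employees', '10 to 19 employees',
                       '20 to 99 employees', '100 to 499 employees']
big_company_sizes = ['500 to 999 employees', '1,000 to 4,999 employees',
                     '5,000 to 9,999 employees', '10,000 or more employees']

# Hypothetical bucket labels; any two distinct names would do.
size_bucket = {s: "small (< 500)" for s in small_company_sizes}
size_bucket.update({s: "big (>= 500)" for s in big_company_sizes})

# Toy stand-in for the survey data; the values are made up for illustration.
toy = pd.DataFrame({
    "CompanySize": ["Fewer than 10 employees", "20 to 99 employees",
                    "10,000 or more employees", "500 to 999 employees",
                    "I don't know"],
    "JobSatisfaction": [8.0, 6.0, 7.0, 5.0, 10.0],
})

# Mean score and number of scored respondents per bucket.
summary = (
    toy.assign(Bucket=toy["CompanySize"].map(size_bucket))
       .groupby("Bucket")["JobSatisfaction"]
       .agg(["mean", "count"])
)
print(summary)
```

Reporting the count next to the mean makes it easier to see how many respondents each average rests on.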


In [39]:
#Feel free to create new cells as you need them

#### Question 1

**1.** Use the space above to help match each description to its answer: every key in the **job_sol_1** dictionary is a description, and its value should be the variable (**a**, **b**, **c**, **d**, **e**, **f**, **g**, or **h**) that best answers it.

In [40]:
a = 0.734
b = 0.2014
c = 'full-time'
d = 'contractors'
e = 'retired'
f = 'yes'
g = 'no'
h = 'hard to tell'

job_sol_1 = {'The proportion of missing values in the Job Satisfaction column': b,
             'According to EmploymentStatus, which group has the highest average job satisfaction?': d, 
             'In general, do smaller companies appear to have employees with higher job satisfaction?': f}
             
t.jobsat_check1(job_sol_1)

Nice job! That's what we found as well!


In [41]:
program_outside_of_work = ['Yes, I contribute to open source projects', 'Yes, I program as a hobby', 'Yes, both']
program_only_at_work = ['No']

subset_df1 = df[df['ProgramHobby'].isin(program_outside_of_work)]
score1 = subset_df1['JobSatisfaction'].mean()
subset_df2 = df[df['ProgramHobby'].isin(program_only_at_work)]
score2 = subset_df2['JobSatisfaction'].mean()

print("Average job satisfaction of people who program outside of work: {}".format(score1))
print("Average job satisfaction of people who program only for work: {}".format(score2))

Average job satisfaction of people who program outside of work: 7.034508564776318
Average job satisfaction of people who program only for work: 6.874806321660985


In [44]:
remote_work = ['A few days each month', 'About half the time', "All or almost all the time (I'm full-time remote)",
               'Less than half the time, but at least one day each week', 'More than half, but not all, the time']
work_at_office = ['Never']

subset_df1 = df[df['HomeRemote'].isin(remote_work)]
score1 = subset_df1['JobSatisfaction'].mean()
subset_df2 = df[df['HomeRemote'].isin(work_at_office)]
score2 = subset_df2['JobSatisfaction'].mean()

print("Average job satisfaction of people with flexibility of remote working: {}".format(score1))
print("Average job satisfaction of people with NO flexibility of remote working: {}".format(score2))

Average job satisfaction of people with flexibility of remote working: 7.152193830805253
Average job satisfaction of people with NO flexibility of remote working: 6.697127393838468
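
The last few cells repeat the same filter-then-average pattern, so it could be wrapped in a small helper. The function below is a hypothetical convenience (`mean_by_membership` is not part of the notebook or of pandas), shown on made-up data; it also returns how many scored respondents are in the subset.

```python
import numpy as np
import pandas as pd

def mean_by_membership(data, column, values, target="JobSatisfaction"):
    """Return (mean, n) of `target` for rows whose `column` is in `values`.

    Rows with a missing `target` are excluded from both the mean and n.
    """
    subset = data.loc[data[column].isin(values), target].dropna()
    return float(subset.mean()), len(subset)

# Toy stand-in for the survey data; the values are made up for illustration.
toy = pd.DataFrame({
    "HomeRemote": ["Never", "Never", "About half the time",
                   "A few days each month", np.nan],
    "JobSatisfaction": [6.0, 7.0, 8.0, np.nan, 9.0],
})

remote_mean, remote_n = mean_by_membership(
    toy, "HomeRemote", ["About half the time", "A few days each month"])
office_mean, office_n = mean_by_membership(toy, "HomeRemote", ["Never"])

print("Remote: mean={}, n={}".format(remote_mean, remote_n))
print("Office: mean={}, n={}".format(office_mean, office_n))
```

With the real data, the same call would take `df`, `'HomeRemote'`, and the `remote_work` or `work_at_office` lists from the cell above.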


In [47]:
# HighestEducationParents records the education of the respondent's parents,
# not of the respondent (the respondent's own degree is in FormalEducation).
# Note: the != filter below also keeps rows where the value is missing.
subset_df1 = df[df['HighestEducationParents']=='A doctoral degree']
score1 = subset_df1['JobSatisfaction'].mean()
subset_df2 = df[df['HighestEducationParents']!='A doctoral degree']
score2 = subset_df2['JobSatisfaction'].mean()

print("Average job satisfaction of people whose parents have a doctoral degree: {}".format(score1))
print("Average job satisfaction of people whose parents do NOT have a doctoral degree: {}".format(score2))

Average job satisfaction of people whose parents have a doctoral degree: 7.1561969439728355
Average job satisfaction of people whose parents do NOT have a doctoral degree: 6.994476268412439
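
One subtlety in the `!=` filter of the cell above: a missing value compares as "not equal" to any string, so rows with no answer end up in the second group. The sketch below illustrates this on made-up data (`toy` is hypothetical) and shows one way to exclude them, assuming that is the comparison you want.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey data; the values are made up for illustration.
toy = pd.DataFrame({
    "HighestEducationParents": ["A doctoral degree", "High school",
                                np.nan, "A master's degree"],
    "JobSatisfaction": [8.0, 6.0, 2.0, 7.0],
})
col = toy["HighestEducationParents"]

# Plain != keeps the row with the missing answer.
not_doctoral = toy[col != "A doctoral degree"]

# Requiring a non-missing answer leaves only respondents who reported
# some other education level.
not_doctoral_known = toy[col.notna() & (col != "A doctoral degree")]

print(len(not_doctoral), not_doctoral["JobSatisfaction"].mean())
print(len(not_doctoral_known), not_doctoral_known["JobSatisfaction"].mean())
```

Whether missing answers belong in the comparison group is a judgment call; the point is only that the two filters can give different averages.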


#### Question 2

**2.** Use the space above to help match each description to its answer: every key in the **job_sol_2** dictionary is a description, and its value should be the variable (**a**, **b**, or **c**) that best answers it. Note that the same letter can appear more than once.

In [48]:
a = 'yes'
b = 'no'
c = 'hard to tell'

job_sol_2 = {'Do individuals who program outside of work appear to have higher JobSatisfaction?': a,
             'Does flexibility to work outside of the office appear to have an influence on JobSatisfaction?': a, 
             'A friend says a Doctoral degree increases the chance of having job you like, does this seem true?': a}
             
t.jobsat_check2(job_sol_2)

Nice job! That's what we found as well!
