# Clean and Analyze Employee Exit Surveys

## Introduction

dataset
- exit surveys of employees from Queensland, Australia
    - Department of Education, Training and Employment (DETE)
    - Technical and Further Education (TAFE)
    - encoded to UTF-8

project goal
- Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction?
- What about the employees who have been there longer?
- Are younger employees resigning due to some kind of dissatisfaction?
- What about older employees?

- combine results for both surveys to answer the questions
- both surveys used the same template, but one customized some of the answers
- no data dictionary available

skills:
- apply(), map()
- fillna(), dropna(), drop()
- melt()
- concat(), merge()

In [3]:
import numpy as np
import pandas as pd

## 1. The DETE and TAFE Survey Datasets

`dete_survey.csv`
* `ID` participant ID
* `SeparationType` reason why employment ended
* `Cease Date` year or month employment ended
* `DETE Start Date` year employment started

`tafe_survey.csv`
* `Record ID` participant ID
* `Reason for ceasing employment` reason why employment ended
* `LengthofServiceOverall. Overall Length of Service at Institute (in years)` length of employment in years

In [53]:
# read in and preview datasets
dete_raw = pd.read_csv('/Users/slp22/code/dataquest projects/dete_survey.csv')
tafe_raw = pd.read_csv('/Users/slp22/code/dataquest projects/tafe_survey.csv')

### DETE

In [121]:
# dete_raw.head()

In [None]:
# dete_raw.info()

In [None]:
# dete_raw.columns

In [None]:
# dete_raw.isnull()

In [None]:
# dete_raw['SeparationType'].value_counts()

| SeparationType | count |
|---|---|
| Age Retirement | 285 |
| Resignation-Other reasons | 150 |
| Resignation-Other employer | 91 |
| Resignation-Move overseas/interstate | 70 |
| Voluntary Early Retirement (VER) | 67 |
| Ill Health Retirement | 61 |
| Other | 49 |
| Contract Expired | 34 |
| Termination | 15 |


In [None]:
# dete_raw['Position'].value_counts()

| Position | count |
|---|---|
| Teacher | 324 |
| Teacher Aide | 137 |
| Public Servant | 126 |
| Cleaner | 97 |
| Head of Curriculum/Head of Special Education | 38 |
| Schools Officer | 24 |
| School Administrative Staff | 16 |
| Guidance Officer | 12 |
| Technical Officer | 11 |
| Other | 7 |


In [None]:
# dete_raw['Classification'].value_counts()

**`dete_raw`**
- RangeIndex: 822 entries, 0 to 821
- Data columns (total 56 columns)
- Dtype: ID=int, others=object, bool
- Few non-null values: Business Unit, Aboriginal, Torres Strait, South Sea, Disability, NESB
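A per-column summary like the one above can be obtained by counting nulls directly rather than reading the full `isnull()` frame. The following is a minimal sketch on a small hypothetical frame (the `sample` data is invented for illustration and is not taken from the survey):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dete_raw; the real survey has 822 rows and 56 columns.
sample = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Business Unit": [np.nan, "Education Queensland", np.nan, np.nan],
    "Aboriginal": [np.nan, np.nan, np.nan, "Yes"],
})

# isnull() gives a boolean frame; summing it counts missing values per column.
missing_counts = sample.isnull().sum().sort_values(ascending=False)
print(missing_counts)
```

Running the same two calls on `dete_raw` should list the sparsest columns first.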

### TAFE

In [99]:
# tafe_raw.head()

In [None]:
# tafe_raw.info()

In [None]:
# tafe_raw.columns

In [None]:
# tafe_raw.isnull()

In [None]:
# tafe_raw['Reason for ceasing employment'].value_counts()

| Reason for ceasing employment | count |
|---|---|
| Resignation | 340 |
| Contract Expired | 127 |
| Retrenchment/ Redundancy | 104 |
| Retirement | 82 |
| Transfer | 25 |
| Termination | 23 |


In [None]:
# tafe_raw['Employment Type. Employment Type'].value_counts()

| Employment Type. Employment Type | count |
|---|---|
| Permanent Full-time | 237 |
| Temporary Full-time | 177 |
| Contract/casual | 71 |
| Permanent Part-time | 59 |
| Temporary Part-time | 52 |


In [None]:
# tafe_raw['Classification. Classification'].value_counts()

**`tafe_raw`**
- `Record ID` is displayed in scientific notation
- Column names are long, descriptive, and repetitive
- RangeIndex: 702 entries, 0 to 701
- Data columns (total 72 columns)
- Dtype: Record ID=float, CESSATION YEAR=float, others=object
- Non-null counts: roughly 400-500 of about 700 rows

## 2. Identify Missing Values and Drop Unnecessary Columns

In [56]:
# dete_raw = pd.read_csv('/dete_survey.csv', na_values="Not Stated")
dete_raw = pd.read_csv('/Users/slp22/code/dataquest projects/dete_survey.csv')
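The commented-out call above passes `na_values="Not Stated"`, while the active call does not. As a rough illustration of the difference, the sketch below reads a tiny hypothetical CSV (the `csv_text` string is made up for the example) both ways:

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for dete_survey.csv.
csv_text = "ID,Region\n1,Central Queensland\n2,Not Stated\n3,Central Office\n"

# Default read: "Not Stated" is kept as an ordinary string.
as_text = pd.read_csv(io.StringIO(csv_text))

# With na_values: "Not Stated" is parsed as NaN.
as_missing = pd.read_csv(io.StringIO(csv_text), na_values="Not Stated")

print(as_text["Region"].isnull().sum())
print(as_missing["Region"].isnull().sum())
```

Without `na_values`, `Not Stated` entries are not counted by `isnull()` and would need to be replaced in a later step.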

In [100]:
#dete_raw drop columns [28:49] axis=1
dete = dete_raw.drop(dete_raw.columns[28:49],axis=1)
# dete.head()

In [101]:
#tafe drop columns [17:66] axis=1
tafe = tafe_raw.drop(tafe_raw.columns[17:66], axis=1)
# tafe.head()

### Dropped columns from `dete` [28:49] and `tafe` [17:66] that are not relevant to this analysis, which makes the data easier to work with.
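The slices passed to `drop()` in the two cells above are positional and exclude the end position. A minimal sketch on a hypothetical six-column frame (column names `a` to `f` are invented for illustration):

```python
import pandas as pd

# Hypothetical frame with six columns, named only for illustration.
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=list("abcdef"))

# columns[2:5] selects positions 2, 3 and 4; position 5 is excluded.
trimmed = df.drop(df.columns[2:5], axis=1)
print(list(trimmed.columns))  # ['a', 'b', 'f']
```

By the same rule, `dete_raw.columns[28:49]` covers 21 columns and `tafe_raw.columns[17:66]` covers 49.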

## 3. Clean Column Names

In [102]:
dete_col = dete.columns
# dete_col

In [103]:
tafe_col = tafe.columns
# tafe_col

### 🧹 functions to clean up text

In [122]:
# function to make each column name lowercase
def lower(cols):
    lower_cols = []
    for c in cols:
        lower_cols.append(c.lower())
    return lower_cols

In [123]:
# lower(dete_col)

In [124]:
# function to remove trailing whitespace from end of strings
def spaceless(cols):
    spaceless_cols = []
    for c in cols:
        spaceless_cols.append(c.rstrip())
    return spaceless_cols

In [125]:
# spaceless(dete_col)

In [126]:
# function to replace spaces with underscores and remove periods and hyphens
def replace_punctuation(cols):
    underscore_cols = []
    for c in cols:
        new_c = c.replace(" ", "_").replace(".", "").replace("-", "")
        underscore_cols.append(new_c)
    return underscore_cols

In [127]:
# replace_punctuation(dete_col)

### 🚫 apply clean up functions #1 (nested functions)

In [110]:
# from types import new_class
def clean_up(col):
    new_cols = []

# function to make each column name lowercase

#   # function to remove trailing whitespace from end of strings
#     def spaceless(new_cols):
#         for c in new_cols:
#             new_cols.append(c.rstrip())
#         return new_cols

#   # function to replace space with underscore
#     def replace_punctuation(new_cols):
#         for c in new_cols:
#             new_c = c.replace(" ", "_").replace(".", "").replace("-", "")
#             new_cols.append(new_c)
#         return new_cols
    # return lower(new_cols


In [111]:
# # higher-order function lesson

# def generate_age_checker(min_age):
#     def check_age(age):
#         return age > min_age
#     return check_age

# check_min_18 = generate_age_checker(18)
# check_min_21 = generate_age_checker(21)

# print(check_min_18(20))
# print(check_min_21(20))
    

### ✅ apply clean up functions #2 (sequential)



In [112]:
# # apply lower, spaceless, and replace_punctuation functions for tafe_col
# lower_tafe = lower(tafe_col)
# spaceless_tafe = spaceless(lower_tafe)
# clean_tafe_cols = replace_punctuation(spaceless_tafe)

In [113]:
# # apply lower, spaceless, and replace_punctuation functions for dete_col
# lower_dete = lower(dete_col)
# spaceless_dete = spaceless(lower_dete)
# clean_dete_cols = replace_punctuation(spaceless_dete)

### ✅ apply clean up functions #3 (nest func)

In [114]:
# replace_punctuation(spaceless(lower(dete_col)))

In [115]:
# replace_punctuation(spaceless(lower(tafe_col)))

### ✅ apply clean up functions #4 (call func)

# best practice
def clean_up(col):
    lowercased = lower(col)
    without_spaces = spaceless(lowercased)
    without_punctuation = replace_punctuation(without_spaces)
    return without_punctuation

In [129]:
new_dete_col = clean_up(dete_col)
dete.columns = new_dete_col
dete.columns

Index(['id', 'separationtype', 'cease_date', 'dete_start_date',
       'role_start_date', 'position', 'classification', 'region',
       'business_unit', 'employment_status', 'career_move_to_public_sector',
       'career_move_to_private_sector', 'interpersonal_conflicts',
       'job_dissatisfaction', 'dissatisfaction_with_the_department',
       'physical_work_environment', 'lack_of_recognition',
       'lack_of_job_security', 'work_location', 'employment_conditions',
       'maternity/family', 'relocation', 'study/travel', 'ill_health',
       'traumatic_incident', 'work_life_balance', 'workload',
       'none_of_the_above', 'gender', 'age', 'aboriginal', 'torres_strait',
       'south_sea', 'disability', 'nesb'],
      dtype='object')
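As an alternative to the helper functions (not the approach used in this notebook), the same three steps could be chained with the `.str` accessor on a pandas `Index`. This is a sketch on three sample names; the trailing space on the last one is added only to exercise `rstrip`:

```python
import pandas as pd

# Three DETE-style column names, used here only as sample input.
cols = pd.Index(["SeparationType", "Cease Date", "Work life balance "])

cleaned = (
    cols.str.lower()                          # lowercase
        .str.rstrip()                         # drop trailing whitespace
        .str.replace(" ", "_", regex=False)   # spaces to underscores
        .str.replace(".", "", regex=False)    # remove periods
        .str.replace("-", "", regex=False)    # remove hyphens
)
print(list(cleaned))  # ['separationtype', 'cease_date', 'work_life_balance']
```

Passing `regex=False` matters for the period, which would otherwise match any character when treated as a regular expression.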

In [118]:
# tafe.columns

In [119]:
tafe.rename({'Record ID': 'id',
             'CESSATION YEAR': 'cease_date',
             'Reason for ceasing employment': 'separationtype',
             'Gender. What is your Gender?': 'gender',
             'CurrentAge. Current Age': 'age',
             'Employment Type. Employment Type': 'employment_status',
             'Classification. Classification': 'position',
             'LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service',
             'LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service'}, 
            axis='columns',
           inplace=True)

In [120]:
# tafe.head()

### Renamed columns to make them easier to reference.
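One thing worth knowing about `rename()` with a mapping: keys that do not match any column are ignored by default, so a misspelled original name fails silently. A minimal sketch, assuming a hypothetical one-row frame and a deliberately wrong key (`"Not A Column"`):

```python
import pandas as pd

# Hypothetical one-row frame with two of the original TAFE column names.
df = pd.DataFrame({"Record ID": [1], "CESSATION YEAR": [2010.0]})

# "Not A Column" is a made-up key to show that unmatched keys are skipped.
mapping = {"Record ID": "id", "CESSATION YEAR": "cease_date", "Not A Column": "x"}

# Without inplace=True, rename returns a new frame and leaves df unchanged.
renamed = df.rename(columns=mapping)
print(list(renamed.columns))  # ['id', 'cease_date']
print(list(df.columns))       # ['Record ID', 'CESSATION YEAR']
```

Checking `tafe.columns` after the rename, as the next cell does with `head()`, is a simple way to confirm that every intended column was matched.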

## 4. Filter the Data

*Filter data to answer*
- Are employees who have only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? 
- What about employees who have been at the job longer?

In [131]:
dete['separationtype'].unique() #'Resignation-Other reasons'
                                #'Resignation-Other employer'
                                #'Resignation-Move overseas/interstate'

array(['Ill Health Retirement', 'Voluntary Early Retirement (VER)',
       'Resignation-Other reasons', 'Age Retirement',
       'Resignation-Other employer',
       'Resignation-Move overseas/interstate', 'Other',
       'Contract Expired', 'Termination'], dtype=object)

In [132]:
tafe['separationtype'].unique() #'Resignation'

array(['Contract Expired', 'Retirement', 'Resignation',
       'Retrenchment/ Redundancy', 'Termination', 'Transfer', nan],
      dtype=object)

### `separationtype`

- `dete`
    - `Resignation-Other reasons`
    - `Resignation-Other employer`
    - `Resignation-Move overseas/interstate`
- `tafe`
    - `Resignation`
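The three `dete` categories listed above could also be selected with `isin()` instead of chained `==` comparisons. The sketch below uses a small hypothetical sample (including one missing value) and takes a `.copy()` of the result so that later changes do not touch the original frame:

```python
import numpy as np
import pandas as pd

# Hypothetical sample built from separationtype values in the dete survey.
dete_sample = pd.DataFrame({
    "separationtype": [
        "Resignation-Other reasons",
        "Age Retirement",
        "Resignation-Other employer",
        "Resignation-Move overseas/interstate",
        np.nan,
    ]
})

resignation_types = [
    "Resignation-Other reasons",
    "Resignation-Other employer",
    "Resignation-Move overseas/interstate",
]

# isin returns False for NaN, so rows with a missing value are excluded.
# .copy() gives an independent frame that is safe to modify later.
mask = dete_sample["separationtype"].isin(resignation_types)
resignations = dete_sample[mask].copy()
print(len(resignations))  # 3
```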

In [166]:
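# copy datasets
# note: the return values are not assigned, so these calls only display a copy;
# dete and tafe themselves are unchanged
dete.copy()
tafe.copy()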

Unnamed: 0,id,Institute,WorkArea,cease_date,separationtype,Contributing Factors. Career Move - Public Sector,Contributing Factors. Career Move - Private Sector,Contributing Factors. Career Move - Self-employment,Contributing Factors. Ill Health,Contributing Factors. Maternity/Family,...,Contributing Factors. Study,Contributing Factors. Travel,Contributing Factors. Other,Contributing Factors. NONE,gender,age,employment_status,position,institute_service,role_service
0,6.341330e+17,Southern Queensland Institute of TAFE,Non-Delivery (corporate),2010.0,Contract Expired,,,,,,...,,,,,Female,26 30,Temporary Full-time,Administration (AO),1-2,1-2
1,6.341337e+17,Mount Isa Institute of TAFE,Non-Delivery (corporate),2010.0,Retirement,-,-,-,-,-,...,-,Travel,-,-,,,,,,
2,6.341388e+17,Mount Isa Institute of TAFE,Delivery (teaching),2010.0,Retirement,-,-,-,-,-,...,-,-,-,NONE,,,,,,
3,6.341399e+17,Mount Isa Institute of TAFE,Non-Delivery (corporate),2010.0,Resignation,-,-,-,-,-,...,-,Travel,-,-,,,,,,
4,6.341466e+17,Southern Queensland Institute of TAFE,Delivery (teaching),2010.0,Resignation,-,Career Move - Private Sector,-,-,-,...,-,-,-,-,Male,41 45,Permanent Full-time,Teacher (including LVT),3-4,3-4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
697,6.350668e+17,Barrier Reef Institute of TAFE,Delivery (teaching),2013.0,Resignation,Career Move - Public Sector,-,-,-,-,...,-,-,-,-,Male,51-55,Temporary Full-time,Teacher (including LVT),1-2,1-2
698,6.350677e+17,Southern Queensland Institute of TAFE,Non-Delivery (corporate),2013.0,Resignation,Career Move - Public Sector,-,-,-,-,...,-,-,-,-,,,,,,
699,6.350704e+17,Tropical North Institute of TAFE,Delivery (teaching),2013.0,Resignation,-,-,-,-,-,...,-,-,Other,-,Female,51-55,Permanent Full-time,Teacher (including LVT),5-6,1-2
700,6.350712e+17,Southbank Institute of Technology,Non-Delivery (corporate),2013.0,Contract Expired,,,,,,...,,,,,Female,41 45,Temporary Full-time,Professional Officer (PO),1-2,1-2


In [167]:
# Filter for resignation types
dete_resignation = dete.loc[(dete['separationtype'] == 'Resignation-Other reasons') |
                            (dete['separationtype'] == 'Resignation-Other employer') |
                            (dete['separationtype'] == 'Resignation-Move overseas/interstate')]
dete_resignation.head()

Unnamed: 0,id,separationtype,cease_date,dete_start_date,role_start_date,position,classification,region,business_unit,employment_status,...,work_life_balance,workload,none_of_the_above,gender,age,aboriginal,torres_strait,south_sea,disability,nesb
3,4,Resignation-Other reasons,05/2012,2005,2006,Teacher,Primary,Central Queensland,,Permanent Full-time,...,False,False,False,Female,36-40,,,,,
5,6,Resignation-Other reasons,05/2012,1994,1997,Guidance Officer,,Central Office,Education Queensland,Permanent Full-time,...,False,False,False,Female,41-45,,,,,
8,9,Resignation-Other reasons,07/2012,2009,2009,Teacher,Secondary,North Queensland,,Permanent Full-time,...,False,False,False,Female,31-35,,,,,
9,10,Resignation-Other employer,2012,1997,2008,Teacher Aide,,Not Stated,,Permanent Part-time,...,False,False,False,Female,46-50,,,,,
11,12,Resignation-Move overseas/interstate,2012,2009,2009,Teacher,Secondary,Far North Queensland,,Permanent Full-time,...,False,False,False,Male,31-35,,,,,


In [168]:
# Filter for resignation
tafe_resignation = tafe[tafe['separationtype'] == 'Resignation']
tafe_resignation.head()

Unnamed: 0,id,Institute,WorkArea,cease_date,separationtype,Contributing Factors. Career Move - Public Sector,Contributing Factors. Career Move - Private Sector,Contributing Factors. Career Move - Self-employment,Contributing Factors. Ill Health,Contributing Factors. Maternity/Family,...,Contributing Factors. Study,Contributing Factors. Travel,Contributing Factors. Other,Contributing Factors. NONE,gender,age,employment_status,position,institute_service,role_service
3,6.341399e+17,Mount Isa Institute of TAFE,Non-Delivery (corporate),2010.0,Resignation,-,-,-,-,-,...,-,Travel,-,-,,,,,,
4,6.341466e+17,Southern Queensland Institute of TAFE,Delivery (teaching),2010.0,Resignation,-,Career Move - Private Sector,-,-,-,...,-,-,-,-,Male,41 45,Permanent Full-time,Teacher (including LVT),3-4,3-4
5,6.341475e+17,Southern Queensland Institute of TAFE,Delivery (teaching),2010.0,Resignation,-,-,-,-,-,...,-,-,Other,-,Female,56 or older,Contract/casual,Teacher (including LVT),7-10,7-10
6,6.34152e+17,Barrier Reef Institute of TAFE,Non-Delivery (corporate),2010.0,Resignation,-,Career Move - Private Sector,-,-,Maternity/Family,...,-,-,Other,-,Male,20 or younger,Temporary Full-time,Administration (AO),3-4,3-4
7,6.341537e+17,Southern Queensland Institute of TAFE,Delivery (teaching),2010.0,Resignation,-,-,-,-,-,...,-,-,Other,-,Male,46 50,Permanent Full-time,Teacher (including LVT),3-4,3-4


### Filtered for resignations only, since the project questions concern employees who resigned.

## 5. Verify the Data