# Project Introduction

In this project, we'll analyze exit surveys from employees of two educational institutions in Queensland, Australia: the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute.

Our goal is to find out:

- Are employees who worked for a short time leaving because they are unhappy? What about those who worked longer?
- Are younger employees leaving because they are unhappy? What about older employees?

In [None]:
# Import libraries
import pandas as pd
import numpy as np

In [None]:
# Read data
dete_survey = pd.read_csv('dete_survey.csv')
tafe_survey = pd.read_csv('tafe_survey.csv')

In [None]:
dete_survey.head()

In [None]:
dete_survey.info()

In [None]:
tafe_survey.head()

In [None]:
tafe_survey.info()

In [None]:
dete_survey.isnull().sum()

In [None]:
tafe_survey.isnull().sum()

## Observations:
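- The two datasets contain many of the same columns, but under different names.
- Both datasets contain null values.
- Both datasets have a large number of columns, some of which may be redundant.
- Some missing values are recorded as the string `Not Stated`, so they are not read in as NaN by default.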
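The last observation can be checked directly. The following is a minimal sketch on a small made-up frame, not the real survey data (the column names and values are illustrative assumptions). It compares what `isnull()` reports with a count of cells that hold the placeholder string.

```python
import pandas as pd

# Toy stand-in for the survey data; columns and values are invented for illustration.
toy = pd.DataFrame({
    "Cease Date": ["2012", "Not Stated", "2013"],
    "Age": ["21-25", "Not Stated", None],
})

# isnull() only counts real missing values, not placeholder strings.
null_counts = toy.isnull().sum()

# Count the cells equal to the placeholder, column by column.
placeholder_counts = (toy == "Not Stated").sum()

print(null_counts)
print(placeholder_counts)
```

Any column with a non-zero placeholder count has missing values that `isnull().sum()` alone would understate.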

# Null Value Import and Dropping Redundant Columns

In [None]:
# Re-read the DETE survey, treating 'Not Stated' and 'Unknown' as NaN
dete_survey = pd.read_csv('dete_survey.csv', na_values=['Not Stated', 'Unknown'])

In [None]:
# Drop DETE columns at positions 28-48 (the slice end, 49, is exclusive)
dete_survey_updated = dete_survey.drop(dete_survey.columns[28:49], axis=1)

In [None]:
# Drop TAFE columns at positions 17-65 (the slice end, 66, is exclusive)
tafe_survey_updated = tafe_survey.drop(tafe_survey.columns[17:66], axis=1)

## Changes Made:
- Read in `Not Stated` and `Unknown` as NaN values
- Dropped unnecessary columns from both surveys, storing the results in `dete_survey_updated` and `tafe_survey_updated`

## Why:
- Improve the usability of the dataframes for further processing and analysis
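Both changes can be tried on a small example. The sketch below reads an in-memory CSV instead of the survey files, and its contents and column names are invented for illustration. It shows `na_values` converting the placeholder strings to NaN, and a positional slice dropping columns up to, but not including, the end index.

```python
import io

import pandas as pd

# In-memory CSV standing in for a survey file; contents are invented for illustration.
csv_text = (
    "id,reason,age,extra_1,extra_2\n"
    "1,Not Stated,30,a,b\n"
    "2,Resignation,Unknown,c,d\n"
)

# na_values turns the listed placeholder strings into NaN while parsing.
df = pd.read_csv(io.StringIO(csv_text), na_values=["Not Stated", "Unknown"])

# Positional slices exclude the end index: columns[3:5] selects positions 3 and 4.
trimmed = df.drop(df.columns[3:5], axis=1)

print(df.isnull().sum())
print(list(trimmed.columns))
```

Because the slice end is exclusive, it is worth printing the column list after a positional drop to confirm that exactly the intended columns were removed.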