# Case Study: Do Employees with More Experience Make More Money?

### Objectives

+ Change the data type of a column
+ Create a column of categorical data with the **`cut`** function

### Resources
+ Read the first section on [Categorical Data](http://pandas.pydata.org/pandas-docs/stable/categorical.html)

## Introduction
In this lesson we use datetime functionality to calculate each employee's years of experience, then check whether employees with more experience tend to earn higher salaries.

In [None]:
import pandas as pd
import numpy as np

In [None]:
employee = pd.read_csv('../../data/employee.csv', parse_dates=['hire_date'])
employee.head()

# Calculating Years of Experience
To compare salaries across experience levels, we first need to calculate each employee's years of experience from the **`hire_date`** column. Because this column was parsed as **datetime64**, we can subtract it from a single date (or from another datetime column) to get a duration. As the reference date we use December 1, 2016, which is roughly when the data was generated.
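
To see what this subtraction produces without loading `employee.csv`, here is a minimal sketch on three made-up hire dates (the values, and the name `sample_hire_dates`, are hypothetical). Subtracting a datetime64 Series from a single `Timestamp` gives a timedelta64 Series, and the `.dt.days` accessor extracts the whole number of days.

```python
import pandas as pd

# hypothetical hire dates standing in for employee['hire_date']
sample_hire_dates = pd.Series(pd.to_datetime(['2016-11-01', '2012-12-01', '2000-01-01']))
pull_date = pd.Timestamp('2016-12-01')

# a Timestamp minus a datetime64 Series gives a timedelta64 Series
sample_experience = pull_date - sample_hire_dates

print(sample_experience.dtype)               # a timedelta64 dtype
print(sample_experience.dt.days.tolist())    # whole days of experience per row
```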

In [None]:
pull_date = pd.Timestamp('2016-12-1')
pull_date

In [None]:
# subtract each hire date from the pull date to get the length of employment
experience = pull_date - employee['hire_date']

# display the first few rows
experience.head()

### Converting to years
Notice that the data type is now **timedelta64**, which represents a duration and is displayed in days. To convert it to years we could divide by 365 days, but that ignores leap years. Instead, we divide by the length of an average year, about 365.2425 days (365 days, 5 hours, 49 minutes, and 12 seconds). [See here for more detail](http://pandas.pydata.org/pandas-docs/stable/timedeltas.html#frequency-conversion)
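
A quick sketch of why the divisor matters, using two hypothetical dates exactly four calendar years apart. The span contains one leap day (February 29, 2016), so it is 1,461 days long. Dividing by 365 overstates it slightly, while dividing by the average year gives almost exactly 4.

```python
import pandas as pd

# exactly four calendar years that include one leap day
span = pd.Timestamp('2016-12-01') - pd.Timestamp('2012-12-01')

years_365 = span / pd.Timedelta(days=365)        # ignores leap days
years_avg = span / pd.Timedelta(days=365.2425)   # average Gregorian year

print(span.days)   # 1461
print(years_365)   # a little more than 4
print(years_avg)   # very close to 4
```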

In [None]:
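# convert to years by dividing by the length of an average (Gregorian) year
# (older pandas accepted pd.Timedelta(1, 'Y') here, but recent versions no longer support the 'Y' unit)
years_experience = experience / pd.Timedelta(days=365.2425)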

# inspect and check that it makes sense
years_experience.head()

In [None]:
# Make a new column
employee['experience'] = years_experience

### Displaying the length of one year
Let's display the duration we use to represent a single year.

In [None]:
pd.Timedelta(days=365.2425)
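
The fractional 0.2425 of a day is 5 hours, 49 minutes, and 12 seconds. The sketch below checks that `pd.Timedelta(days=365.2425)` matches that duration written out in whole units. It compares with a small tolerance because a fractional `days` value may be off by a few nanoseconds after floating-point conversion. As far as we know, this is also the value older pandas versions used for `pd.Timedelta(1, 'Y')`.

```python
import pandas as pd

# average Gregorian year as a fractional number of days
one_year = pd.Timedelta(days=365.2425)

# the same duration spelled out in whole units
spelled_out = pd.Timedelta(days=365, hours=5, minutes=49, seconds=12)

# allow for floating-point error in the fractional input
close_enough = abs(one_year - spelled_out) < pd.Timedelta(microseconds=1)
print(close_enough)
print(one_year.total_seconds() / 86400)  # length in days
```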

### Creating categories for years of experience
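It's possible to divide a numeric column into categories based on its values. The pandas **`cut`** function takes a Series or array and a list of **bin** edges, and assigns each value to the bin it falls in. Each bin can also be given a **label**. The result is a Series of the **categorical** data type, which is specific to pandas. By default each bin includes its right edge but not its left, so the edges `[0, 5, 15, 100]` produce the bins (0, 5], (5, 15], and (15, 100]; values outside every bin become missing (`NaN`). [More on categorical data](http://pandas.pydata.org/pandas-docs/stable/categorical.html)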
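
The bin rules are easiest to see on a handful of hypothetical values placed on and around the edges. With the default `right=True`, a value equal to an edge falls into the bin on its left, and 0 and 120 lie outside every bin, so they become `NaN`. The result is an ordered categorical, so the labels also carry their order.

```python
import pandas as pd

# hypothetical years of experience, chosen to sit on and near the bin edges
years = pd.Series([0.0, 0.5, 5.0, 5.1, 15.0, 30.0, 120.0])

bins = [0, 5, 15, 100]
labels = ['Novice', 'Experienced', 'Senior']
levels = pd.cut(years, bins=bins, labels=labels)

print(levels.tolist())                  # 0.0 and 120.0 map to NaN
print(levels.cat.categories.tolist())   # the labels, in bin order
print(levels.cat.ordered)               # cut returns ordered categories by default
```

In the real data, an employee hired exactly on the pull date would have 0 years of experience and would become `NaN` with these bins; passing `include_lowest=True` to `pd.cut` makes the first bin include its left edge.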

In [None]:
# create Series of categorical data
bins = [0, 5, 15, 100]
labels = ['Novice', 'Experienced', 'Senior']
exp_categories = pd.cut(years_experience, bins=bins, labels=labels)

In [None]:
# inspect the first 10 values of the Series
exp_categories.head(10)

In [None]:
# count the employees in each experience level
exp_categories.value_counts()
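
`value_counts` on a categorical Series differs a little from the plain version. Every category appears, even one with no rows, and `sort=False` keeps the categories in their defined order instead of sorting by count. A small sketch on hypothetical levels in which no one is 'Senior':

```python
import pandas as pd

# hypothetical experience levels; no employee is 'Senior'
levels = pd.Series(pd.Categorical(
    ['Novice', 'Experienced', 'Novice'],
    categories=['Novice', 'Experienced', 'Senior'],
    ordered=True,
))

counts = levels.value_counts(sort=False)                   # keep category order
shares = levels.value_counts(normalize=True, sort=False)   # fractions instead of counts

print(counts)
print(shares)
```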

In [None]:
# Create new column
employee['experience_level'] = exp_categories

# Exercises

In [None]:
# run this cell to apply all of this notebook's transformations at once
import pandas as pd

employee = pd.read_csv('../../data/employee.csv', parse_dates=['hire_date'])
employee['years_experience'] = (pd.Timestamp('2016-12-1') - employee['hire_date']) / pd.Timedelta(days=365.2425)

bins = [0, 5, 15, 100]
labels = ['Novice', 'Experienced', 'Senior']
employee['experience_level'] = pd.cut(employee['years_experience'],
                                      bins=bins,
                                      labels=labels)

In [None]:
employee.head()

### Problem 1
<span  style="color:green; font-size:16px">Create new columns **`bonus_amount`** and **`total_comp`**. Let **`bonus_amount`** be equal to 500 for every year of experience, and let **`total_comp`** be **`salary`** plus **`bonus_amount`**. Round the values to the nearest 100.</span>

### Problem 2
<span  style="color:green; font-size:16px">Use the **`experience_level`** column to determine if more experienced employees make more money.</span>

# Solutions

### Problem 1
<span  style="color:green; font-size:16px">Create new columns **`bonus_amount`** and **`total_comp`**. Let **`bonus_amount`** be equal to 500 for every year of experience, and let **`total_comp`** be **`salary`** plus **`bonus_amount`**. Round the values to the nearest 100.</span>

In [None]:
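# 500 for every year of experience
employee['bonus_amount'] = employee['years_experience'] * 500
# total compensation is salary plus bonus
employee['total_comp'] = employee['salary'] + employee['bonus_amount']

# a negative number of decimals rounds left of the decimal point: -2 means the nearest 100
employee['bonus_amount'] = employee['bonus_amount'].round(-2)
employee['total_comp'] = employee['total_comp'].round(-2)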

employee.head()
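
`round(-2)` rounds to the nearest hundred. One detail worth knowing, shown on hypothetical bonus amounts: pandas uses NumPy's rounding here, which sends exact ties to the even hundred, so 1250 rounds down to 1200 while 1350 rounds up to 1400.

```python
import pandas as pd

# hypothetical bonus amounts around multiples of 100
bonus = pd.Series([1249.0, 1250.0, 1350.0, 1351.0])

# decimals=-2 rounds to the nearest hundred; exact ties go to the even hundred
rounded = bonus.round(-2)
print(rounded.tolist())
```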

### Problem 2
<span  style="color:green; font-size:16px">Use the **`experience_level`** column to determine if more experienced employees make more money.</span>

In [None]:
# select the salaries for each experience level, then compare their means
filt = employee['experience_level'] == 'Novice'
novice = employee.loc[filt, 'salary']

filt = employee['experience_level'] == 'Experienced'
exper = employee.loc[filt, 'salary']

filt = employee['experience_level'] == 'Senior'
senior = employee.loc[filt, 'salary']

novice.mean(), exper.mean(), senior.mean()
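
The three filters above can also be written as a single `groupby` on the categorical column, which returns one row per experience level in category order. Below is a sketch on a small hypothetical table; the salaries and the name `staff` are made up, not taken from `employee.csv`. `observed=False` keeps every category even if it has no rows. Adding the median and count helps spot a mean pulled up by a few outliers or based on very few employees.

```python
import pandas as pd

# hypothetical employees standing in for the real employee table
staff = pd.DataFrame({
    'salary': [40000, 50000, 60000, 70000, 90000],
    'experience_level': pd.Categorical(
        ['Novice', 'Novice', 'Experienced', 'Experienced', 'Senior'],
        categories=['Novice', 'Experienced', 'Senior'],
        ordered=True,
    ),
})

# mean, median, and count of salary within each experience level
summary = (staff
           .groupby('experience_level', observed=False)['salary']
           .agg(['mean', 'median', 'count']))
print(summary)

# True when mean salary never decreases from one level to the next
print(summary['mean'].is_monotonic_increasing)
```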