# Advanced Pandas Assignment 4

In this assignment, you will practice date formatting, joins, and exporting data from a Pandas dataframe.

### Note about assignments
You can add lines of code according to your preferences. As long as the code required by the assignment is found in this notebook under the corresponding question header (i.e., the answer to question 1 is underneath the title "Question 1"), you will receive credit for it.

## About the data
The data used in this assignment is a table built from the Human Resources schema of the Adventure Works 2019 database. Each row records one change to an employee's pay rate, taken from the Employee Pay History. Each row also contains information about the employee and the department they were working in when they received the listed pay rate.

The actual data is stored in a CSV file located inside the `data` folder. The file is called `pay_history.csv`. Another supplemental file called `shifts.csv` has also been provided in the `data` folder that will be used in some of the exercises below.

## Instructions
### Set up
##### Import Pandas
Import the Pandas library into Jupyter Lab.

In [None]:
import pandas as pd

##### Disable column display limit
Use the following code to remove the default limit on how many columns are displayed. Without it, a dataframe with more than 20 columns is truncated when displayed so that it takes up less space.

```python
pd.options.display.max_columns = None
```

In [None]:
pd.options.display.max_columns = None
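If you would rather not change the display setting for the whole notebook, Pandas also provides `pd.option_context()`, which applies an option only inside a `with` block and restores the previous value afterwards. The sketch below uses a made-up 30-column dataframe (`wide` is a hypothetical name, not part of the assignment data):

```python
import pandas as pd

# Hypothetical wide dataframe with 30 columns, more than the usual default limit of 20
wide = pd.DataFrame([list(range(30))], columns=[f"col_{i}" for i in range(30)])

before = pd.get_option("display.max_columns")

# The option is changed only inside the with block and restored when the block exits
with pd.option_context("display.max_columns", None):
    inside = pd.get_option("display.max_columns")
    print(wide)

after = pd.get_option("display.max_columns")
```

Either approach works for this assignment. The global setting above is simpler when every cell should show all columns.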

##### Create the dataframe
Use the `read_csv()` function from Pandas to read the data from the `pay_history.csv` file into a dataframe called `df`.

In [None]:
df = pd.read_csv("./data/pay_history.csv")

##### Preview dataframe
Use the `.head()` method to print out the first 5 rows of the dataframe.

In [None]:
df.head()

### Questions
#### Age when hired
Your organization, Adventure Works, is planning a promotion to find new talent to hire. They would like to target a specific age group with the promotion, and they need to know how old each current employee was when they were hired. In this problem, you will convert the `BirthDate` and `HireDate` fields to datetime fields and use subtraction to determine how old, in years, the average employee was when they were hired.

##### Question 1: Print the data types
Use the `.info()` method to print out the data types for each column in the dataframe.

In [None]:
df.info()

##### Question 2: Data types for `BirthDate` and `HireDate`
Looking at the results of the `.info()` method above, what are the default data types for the `BirthDate` and `HireDate` columns? What data type should these columns be converted to that make the most sense?

```
The `BirthDate` and `HireDate` columns have a default data type of `object`, which here means they hold strings. They should be converted to a `datetime` data type (`datetime64[ns]` in Pandas).
```
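As an alternative to converting the columns after loading, `read_csv()` can parse date columns while the file is read by using its `parse_dates` parameter. The sketch below substitutes a small in-memory CSV for `pay_history.csv`. It assumes the dates are stored in a `YYYY-MM-DD` layout, and the sample values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for pay_history.csv (the real file has many more columns)
csv_text = """EmployeeID,BirthDate,HireDate
1,1969-01-29,2009-01-14
2,1971-08-01,2008-01-31
"""

# parse_dates converts the listed columns to datetimes while the file is being read
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["BirthDate", "HireDate"])
print(sample.dtypes)
```

This assignment asks for `to_datetime()` explicitly, so treat this only as a shortcut for your own future work.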

##### Question 3: Cast the `BirthDate` column to `datetime`
Using the Pandas function `to_datetime()`, cast the `BirthDate` column into data type `datetime` and print it out. Make sure that at the bottom of the Series that gets printed out you see `dtype: datetime64[ns]`.

You shouldn't need to pass in a Python format string to convert this column to a datetime.
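When Pandas cannot infer the layout, or when some values are malformed, you can pass an explicit `format` string and `errors="coerce"` to `to_datetime()`. The following is a minimal sketch on hypothetical values. The `%Y-%m-%d` format is an assumption about how dates are written, not something confirmed for `pay_history.csv`.

```python
import pandas as pd

# Hypothetical birth dates; the second value is deliberately malformed
raw = pd.Series(["1969-01-29", "not a date", "1971-08-01"])

# An explicit format documents the expected layout, and errors="coerce"
# turns values that do not match into NaT instead of raising an error
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
print(parsed)
print(parsed.isna().sum(), "value(s) could not be parsed")
```

Checking `isna()` after coercing helps confirm that no dates were silently lost.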

In [None]:
pd.to_datetime(df['BirthDate'])

##### Question 4: Save the `BirthDate` column
Using the code from above, save the newly converted `BirthDate` column back to the original dataframe as the column `BirthDate`.

In [None]:
df['BirthDate'] = pd.to_datetime(df['BirthDate'])

##### Question 5: Print out the dataframe info
Using the `.info()` method, print out the dataframe information again. Make sure that the data type for the `BirthDate` column is now `datetime` (`datetime64[ns]` is the same thing).

In [None]:
df.info()

##### Question 6: Save the `HireDate` column
Using code similar to the code that you wrote above, cast the `HireDate` column to data type `datetime` and save it back to the original dataframe as the column `HireDate`.

In [None]:
df['HireDate'] = pd.to_datetime(df['HireDate'])
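Because the same conversion is applied to two columns, you can also convert both at once with `.apply(pd.to_datetime)`. This sketch uses a small hypothetical dataframe (`toy`) in place of `df`:

```python
import pandas as pd

# Hypothetical stand-in for df, with both date columns still stored as text
toy = pd.DataFrame({
    "BirthDate": ["1969-01-29", "1971-08-01"],
    "HireDate": ["2009-01-14", "2008-01-31"],
})

date_cols = ["BirthDate", "HireDate"]
# apply() calls pd.to_datetime on each selected column, and the assignment replaces both columns
toy[date_cols] = toy[date_cols].apply(pd.to_datetime)
print(toy.dtypes)
```

Assigning with `toy[date_cols] = ...` replaces the columns outright, so their data type changes to datetime.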

##### Question 7: Print out the dataframe info again
Using the `.info()` method, print out the dataframe information again. Make sure that the data type for both `BirthDate` and `HireDate` is now `datetime`.

In [None]:
df.info()

##### Question 8: Subtract dates
Using the subtraction operator, create a Series of `timedelta` objects that shows the number of days between each employee's birth date and hire date. Subtract the birth date from the hire date.

In [None]:
df['HireDate'] - df['BirthDate']

##### Question 9: What data type is returned?
After subtracting the `BirthDate` column from the `HireDate` column, what is the data type of the Series that is returned?

```
The data type of the Series is `timedelta64[ns]`, Pandas' timedelta type.
```
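You can confirm the result's data type in code instead of reading it off the printed Series. This is a minimal sketch with hypothetical dates. `pd.api.types.is_timedelta64_dtype()` returns `True` for timedelta Series.

```python
import pandas as pd

# Hypothetical hire and birth dates
hire = pd.to_datetime(pd.Series(["2009-01-14", "2008-01-31"]))
birth = pd.to_datetime(pd.Series(["1969-01-29", "1971-08-01"]))

# Subtracting two datetime Series gives a Series of timedeltas
age_at_hire = hire - birth
print(age_at_hire.dtype)
print(pd.api.types.is_timedelta64_dtype(age_at_hire))
```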

##### Question 10: Save calculation as `DaysOldAtHireDate`
Using the calculation from above, create a new column called `DaysOldAtHireDate` that contains `timedelta` objects in each row.

In [None]:
df['DaysOldAtHireDate'] = df['HireDate'] - df['BirthDate']

##### Question 11: Extract the days
Using the `.dt.days` property, extract the number of days between each employee's hire date and birth date using the new `DaysOldAtHireDate` column. Print out this Series of integers.

In [None]:
df['DaysOldAtHireDate'].dt.days
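Note that `.dt.days` returns only the whole-day component of each timedelta and drops any leftover hours. The dates in this dataset have no time of day, so nothing is lost here. If a duration did include hours, `.dt.total_seconds()` divided by 86,400 would keep the fraction. A sketch on hypothetical durations:

```python
import pandas as pd

# Two hypothetical durations; the second includes an extra 18 hours
deltas = pd.Series(pd.to_timedelta(["10000 days", "10000 days 18:00:00"]))

whole_days = deltas.dt.days                            # integer days, leftover hours dropped
fractional_days = deltas.dt.total_seconds() / 86_400   # days as floats, hours kept
print(whole_days.tolist())       # [10000, 10000]
print(fractional_days.tolist())  # [10000.0, 10000.75]
```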

##### Question 12: Convert days to years
Using the code from above, convert the number of days old each employee was when hired into years. You can do this by dividing the number of days by 365.25 (the average length of a year once leap years are included), which approximates each employee's age in years on their hire date.

In [None]:
df['DaysOldAtHireDate'].dt.days / 365.25
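Another option is to divide the timedelta Series directly by a `pd.Timedelta` of 365.25 days. Dividing a timedelta by a timedelta produces plain floats, so `.dt.days` is not needed. A minimal sketch with hypothetical durations:

```python
import pandas as pd

# Hypothetical age-at-hire durations
age_at_hire = pd.Series(pd.to_timedelta(["14610 days", "13330 days"]))

# timedelta / Timedelta returns a float Series measured in "average years"
years = age_at_hire / pd.Timedelta(days=365.25)
print(years.tolist())

# For whole-day durations this matches the days-based calculation
difference = (years - age_at_hire.dt.days / 365.25).abs().max()
```

Both approaches give the same values for this data, so use whichever reads more clearly to you.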

##### Question 13: Save new column `YearsOldAtHireDate`
Using the calculation from above, create a new column in the dataframe called `YearsOldAtHireDate` that contains float values reflecting how many years old each employee was when hired.

In [None]:
df['YearsOldAtHireDate'] = df['DaysOldAtHireDate'].dt.days / 365.25

##### Question 14: Find the average age when hired
Use the `.mean()` method on the `YearsOldAtHireDate` column to determine how old the average employee was when they were hired.

In [None]:
df['YearsOldAtHireDate'].mean()
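Each row of this data is a pay rate change, so an employee with several pay changes is counted several times by the row-wise mean. To weight every employee equally, you could drop duplicate `EmployeeID` values before averaging. The sketch below uses hypothetical numbers to show how the two averages can differ. Whether they differ for `pay_history.csv` depends on the actual data.

```python
import pandas as pd

# Hypothetical pay history: employee 1 has three pay changes, employee 2 has one
pay = pd.DataFrame({
    "EmployeeID": [1, 1, 1, 2],
    "YearsOldAtHireDate": [20.0, 20.0, 20.0, 40.0],
})

row_mean = pay["YearsOldAtHireDate"].mean()  # every pay-change row counts once
employee_mean = pay.drop_duplicates(subset="EmployeeID")["YearsOldAtHireDate"].mean()  # every employee counts once
print(row_mean, employee_mean)  # 25.0 30.0
```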

##### Question 15: How old was the average employee when they were hired?
Answer the question, "How old was the average employee when they were hired?" below.

```
The average employee was about 30.93 years old when hired.
```

#### Find shift leaders
There has recently been some disorganization among the employees working in the Production department. Employees whose `ShiftID` is 2 have been complaining for a while about problems with production equipment during their shift, and nobody seems to know who manages that shift.

The database administrator provided you with a CSV file called `shifts.csv` that contains information about each shift.

##### Question 16: Import the shifts CSV
Using the `read_csv()` function from Pandas, import the file `shifts.csv` from the `data` directory into a Pandas dataframe called `shifts_df`.

In [None]:
shifts_df = pd.read_csv("./data/shifts.csv")

##### Question 17: Print out the top 5 rows
Using the `.head()` method, print out the top 5 rows of the `shifts_df` dataframe.

In [None]:
shifts_df.head()

##### Question 18: Join the `shifts_df` dataframe to the `df` dataframe
Using the `.merge()` method, join the `shifts_df` dataframe to the `df` dataframe using the `ShiftID` column on both dataframes. Save this new dataframe with the joined data as `df`. Print out the first five rows of `df` using the `.head()` method when finished.

In [None]:
df = df.merge(shifts_df, on="ShiftID")
df.head()
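By default, `.merge()` performs an inner join, which silently drops rows whose `ShiftID` has no match in the other dataframe. If you want to check for that, `merge()` accepts `how`, `validate`, and `indicator` parameters. The dataframes below are hypothetical stand-ins for `df` and `shifts_df`:

```python
import pandas as pd

# Hypothetical stand-ins for df and shifts_df; ShiftID 4 has no matching shift
pay = pd.DataFrame({"EmployeeID": [1, 2, 3], "ShiftID": [1, 2, 4]})
shifts = pd.DataFrame({"ShiftID": [1, 2, 3], "EmployeeShiftLeader": [10, 2, 30]})

inner = pay.merge(shifts, on="ShiftID")  # default how="inner" drops the unmatched row

merged = pay.merge(
    shifts,
    on="ShiftID",
    how="left",              # keep every pay-history row, even without a matching shift
    validate="many_to_one",  # raise a MergeError if a ShiftID appears twice in shifts
    indicator=True,          # add a _merge column showing where each row found a match
)
print(merged)
print(merged["_merge"].value_counts())
```

If every row in `_merge` is `both`, the inner join used in this question lost nothing.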

##### Question 19: Get shift leaders
Knowing that the `EmployeeID` column of `df` matches up with the `EmployeeShiftLeader` column that was just joined onto `df`, create a filter that returns the rows of the dataframe where `EmployeeID` is equal to `EmployeeShiftLeader`.

In [None]:
df.loc[df['EmployeeID'] == df['EmployeeShiftLeader']]
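The same filter can be written with `.query()`, and a second condition narrows the result to shift 2. The following is a sketch on hypothetical joined data (`joined` stands in for `df`):

```python
import pandas as pd

# Hypothetical joined data: each row pairs an employee with their shift's leader
joined = pd.DataFrame({
    "EmployeeID": [2, 5, 7, 9],
    "ShiftID": [1, 2, 2, 3],
    "EmployeeShiftLeader": [2, 7, 7, 9],
})

# Rows where the employee is the leader of their own shift
leaders = joined.query("EmployeeID == EmployeeShiftLeader")

# Narrow down to the leader of shift 2
shift_2_leader = leaders.loc[leaders["ShiftID"] == 2]
print(shift_2_leader)
```

In the real data a leader may appear once per pay change, so adding `.drop_duplicates(subset="EmployeeID")` gives one row per leader.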

##### Question 20: Info about the shift leaders
Answer the following question below:

What is the name of the shift leader for shift number 2? What department do they work in? You can look in the `LoginID` field for the first name.

```
David is the leader of shift 2 and works in the Production department.
```

#### Export the data
##### Question 21: Export to CSV
Using the `.to_csv()` method, export the dataframe `df` as a CSV file called `pay_history_modified.csv`. Do not include the dataframe index in the CSV file (you can do this by passing in a parameter `index=False` to the method).

In [None]:
df.to_csv('pay_history_modified.csv', index=False)
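A CSV file stores plain text, so data types such as datetimes are not saved with it. If you later read the exported file back, the date columns need to be parsed again. This sketch writes to an in-memory buffer instead of `pay_history_modified.csv` and uses a hypothetical two-column dataframe:

```python
import io
import pandas as pd

# Hypothetical stand-in for df after the earlier steps
out = pd.DataFrame({
    "EmployeeID": [1, 2],
    "HireDate": pd.to_datetime(["2009-01-14", "2008-01-31"]),
})

buffer = io.StringIO()  # an in-memory "file" used instead of a real CSV on disk
out.to_csv(buffer, index=False)
buffer.seek(0)

# Dates come back as text unless they are parsed again when reading
back = pd.read_csv(buffer, parse_dates=["HireDate"])
print(back.dtypes)
```

Because `index=False` was used, no extra `Unnamed: 0` column appears when the file is read back. Timedelta columns such as `DaysOldAtHireDate` are written as text like `14610 days` and would need `pd.to_timedelta()` to be restored.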