# Advanced Pandas Assignment 4

In this assignment, you will practice date formatting, joins, and exporting data from a Pandas dataframe.

### Note about assignments
You can add lines of code according to your preferences. As long as the code required by the assignment appears in this notebook under the corresponding task header (i.e., the code and answers for Task 1 are underneath the title "Task 1"), you will receive credit for it.

## About the data
The data used in this assignment is a table built from the Human Resources schema of the Adventure Works 2019 database. Each row records one change to an employee's pay rate, along with information about the employee and the department they were working in when they received that pay rate.

The data is stored in a CSV file called `pay_history.csv`, located inside the `data` folder. A supplemental file called `shifts.csv`, also in the `data` folder, is used in some of the exercises below.

## Instructions
### Set up
##### Import Pandas
Import the Pandas library into Jupyter Lab.

<p style="font-size:.75rem">Expected output: None</p>

##### Disable column display limit
Use the following code to disable the default limit for displaying columns. If you don't use this code, a data set with more than 20 columns will be truncated when displayed to take up less space.

```python
pd.options.display.max_columns = None
```

<p style="font-size:.75rem">Expected output: None</p>

##### Create the dataframe
Read the data from the `pay_history.csv` file into a dataframe called `df`.

<p style="font-size:.75rem">Expected output: None</p>
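As a minimal sketch of how `pd.read_csv` behaves, the example below reads a small made-up CSV from an in-memory string rather than from disk. The column names and the variable `example_df` are hypothetical and are not part of the assignment data; for a real file you would pass a path string instead of the `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """PersonID,Rate
1,12.45
2,25.00
"""

# read_csv accepts a file path or any file-like object.
example_df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default (here there are only 2).
print(example_df.head())
```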

##### Preview dataframe
Print out the first 5 rows of the dataframe.

<p style="font-size:.75rem">Expected output: 5 rows with indexes 0-4.</p>

### Questions
---
#### Task 1: Age when hired (part 1)
Your organization, Adventure Works, wants to create a marketing campaign to recruit new talent. The campaign will target a specific age group, so the company needs to know how old each current employee was when they were hired. In this task, you will subtract the `BirthDate` field from the `HireDate` field to determine how many years old the average employee was when they were hired.

First, print out information about each of the columns in the dataframe.

<p style="font-size:.75rem">Expected output: Informational dataframe</p>
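For reference, here is a small sketch of two common ways to inspect column data types, using a made-up dataframe called `toy` (its columns are hypothetical and unrelated to the assignment data). `DataFrame.info()` prints a summary directly, while the `dtypes` attribute returns a Series with one entry per column.

```python
import pandas as pd

# Hypothetical toy data; dates are stored as plain strings here.
toy = pd.DataFrame({
    "Name": ["Ana", "Ben"],
    "StartDate": ["2010-01-15", "2012-06-01"],
})

# info() prints column names, non-null counts, and dtypes.
toy.info()

# dtypes returns the same dtype information as a Series.
print(toy.dtypes)
```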

Looking at the information printed above, what are the current data types of the `BirthDate` and `HireDate` columns? Which data type would be more appropriate for these columns?

```
Your answer here.
```

Using the corresponding Pandas function, cast the `BirthDate` column to data type `datetime` and save it back to the original dataframe as the column `BirthDate`. Then print out the column, or something else that shows its data type was changed. The printed data type should read `dtype: datetime64[ns]`.

You shouldn't need to pass in a Python format string to convert this column to a datetime.

<p style="font-size:.75rem">Expected output: Something that shows that the `BirthDate` column was converted to data type `datetime`</p>
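The sketch below illustrates the general pattern on a made-up column, assuming the dates are stored as ISO-formatted strings (the dataframe `toy` and the column `StartDate` are hypothetical). `pd.to_datetime` can usually infer this kind of format without an explicit format string.

```python
import pandas as pd

# Hypothetical toy data with ISO-formatted date strings.
toy = pd.DataFrame({"StartDate": ["2010-01-15", "2012-06-01"]})

# Convert the strings and assign the result back to the same column.
toy["StartDate"] = pd.to_datetime(toy["StartDate"])

# The dtype is now a datetime64 type instead of a string/object type.
print(toy["StartDate"].dtype)
```

Assigning the result back to the column matters: `pd.to_datetime` returns a new Series and does not modify the dataframe in place.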

Using code similar to what you wrote above, cast the `HireDate` column to data type `datetime` and save it back to the original dataframe as the column `HireDate`. Then, print out something that shows that the data types of both the `HireDate` and `BirthDate` columns were changed to `datetime64[ns]`.

<p style="font-size:.75rem">Expected output: Something that shows that the `HireDate` and `BirthDate` columns were converted to data type `datetime`</p>

---
#### Task 2: Age when hired (part 2)
Using the subtraction operator, create a Series of `timedelta` objects that shows how many days elapsed between each employee's birth date and hire date. Subtract the birth date from the hire date.

<p style="font-size:.75rem">Expected output: Series of dtype timedelta64[ns] whose first value is 18972 and last value is 12079 days.</p>
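To illustrate how datetime subtraction works in general, here is a small sketch with two made-up datetime columns, `Start` and `End` (hypothetical names, not from the assignment data). Subtracting one datetime Series from another is done element by element and produces a Series of time differences.

```python
import pandas as pd

# Hypothetical toy data with two datetime columns.
toy = pd.DataFrame({
    "Start": pd.to_datetime(["2020-01-01", "2021-03-10"]),
    "End": pd.to_datetime(["2020-01-31", "2021-03-15"]),
})

# Later date minus earlier date gives a positive time difference per row.
elapsed = toy["End"] - toy["Start"]
print(elapsed)
```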

After subtracting the `BirthDate` column from the `HireDate` column, what is the data type of the Series that is returned?

```
Your answer here.
```

Using the Series created above, create a new column called `YearsOldAtHireDate` that contains information about how old each employee was when hired. You can do this by extracting the number of days from the `timedelta` objects and then converting them to years by dividing by 365.25. Print out the dataframe to show that the new column `YearsOldAtHireDate` was added correctly.

<p style="font-size:.75rem">Expected output: Dataframe rows showing that the column `YearsOldAtHireDate` was added correctly</p>
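The day-to-year conversion described above can be sketched on made-up values as follows. The Series `elapsed` and the name `years` are hypothetical; the `.dt.days` accessor returns the whole-day component of each `timedelta` as an integer.

```python
import pandas as pd

# Hypothetical time differences, expressed in days.
elapsed = pd.Series(pd.to_timedelta([7305, 14610], unit="D"))

# .dt.days extracts the number of days as integers.
days = elapsed.dt.days

# Dividing by 365.25 approximates years, accounting for leap years.
years = days / 365.25
print(years)
```

Dividing by 365.25 is an approximation, so the result can differ slightly from a calendar-exact age.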

---
#### Task 3: Age when hired (part 3)
Determine, in years, how old the average employee was when they were hired.

Answer the question, "How old was the average employee when they were hired?" below.

```
Your answer here.
```

---
#### Task 4: Find shift leaders
There has recently been some disorganization among the employees working in the Production department. Employees whose `ShiftID` is 2 have been complaining for a while about problems going on during their shift with production equipment, and nobody seems to know who the manager of that shift is.

The database administrator provided you with a CSV file called "shifts.csv" that contains information about each shift.

Import the file `shifts.csv` from the `data` directory into a Pandas dataframe called `shifts_df`. Then, merge the dataframe `shifts_df` with the dataframe `df` on the `ShiftID` column, which both dataframes share. Save the merged dataframe as `df`. Show the rows of the shift leaders by filtering this new dataframe to show only rows where `EmployeeID` is equal to `EmployeeShiftLeader`.

<p style="font-size:.75rem">Expected output: 3 rows for EmployeeID 46, 70, and 39</p>
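As a rough sketch of the merge-then-filter pattern, the example below uses two made-up dataframes, `people` and `teams`, joined on a shared `TeamID` column. All names here are hypothetical and chosen only for illustration. By default, `DataFrame.merge` performs an inner join on the column given in `on`.

```python
import pandas as pd

# Hypothetical toy data: each person belongs to a team.
people = pd.DataFrame({
    "PersonID": [1, 2, 3, 4],
    "TeamID": [10, 10, 20, 20],
})

# Hypothetical lookup table: each team has one lead.
teams = pd.DataFrame({
    "TeamID": [10, 20],
    "TeamLeadID": [2, 3],
})

# Join the lookup columns onto every matching row of people.
merged = people.merge(teams, on="TeamID")

# Keep only the rows where the person is their own team's lead.
leads = merged[merged["PersonID"] == merged["TeamLeadID"]]
print(leads)
```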

What is the name of the shift leader for shift number 2? What department do they work in? You can look in the `LoginID` field for the employee's first name.

```
Your answer here.
```

---
#### Task 5: Export the data
Export the dataframe `df` as a CSV file called `pay_history_modified.csv`. Do not include the dataframe index in the CSV file.

Make sure that this exported CSV file is included in your final submission.

<p style="font-size:.75rem">Expected output: None</p>
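To show what the `index` argument of `to_csv` does, this sketch writes a made-up dataframe to an in-memory buffer instead of a file (the dataframe `toy` and its columns are hypothetical). When writing to disk, you would pass a file path in place of the buffer.

```python
import io

import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({"PersonID": [1, 2], "Rate": [12.5, 30.0]})

# index=False leaves the row labels (0, 1, ...) out of the output.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

csv_text = buffer.getvalue()
print(csv_text)
```

Without `index=False`, the output would gain an extra unnamed first column holding the row labels.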

---
### Submission

Submit this assignment as you normally would, by zipping this `.ipynb` file and the `data` folder into a zip file. This time, also add your `pay_history_modified.csv` file to the zip file. Submit the zip file to Canvas.