# Advanced Pandas Assignment 1

In this assignment, you will practice adding and removing rows and columns from Pandas dataframes. In addition, you will practice sorting dataframes.

Specifically, you will perform the following exercises:
1. Add and remove columns
2. Add and remove rows
3. Sort dataframes

### Note about assignments
You can add lines of code according to your preferences. As long as the code required by the assignment is found in this notebook under the corresponding task header (i.e., the code and answers for task 1 are underneath the title "Task 1"), you will receive credit for it.

## About the data
The data used in this assignment is a table built from the Human Resources schema of the Adventure Works 2019 database. Each row records one pay rate change from the Employee Pay History, along with information about the employee and the department they were working in when they received the pay rate listed.

The actual data is stored in a CSV file located inside the `data` folder. The file is called `pay_history.csv`.

## Instructions

Write code to complete the tasks below.

### Set up
##### Import Pandas
Import the Pandas library into Jupyter Lab using the alias `pd`, which the code in the rest of this notebook assumes.

<p style="font-size:.75rem">Expected output: None</p>

##### Disable column display limit
Use the following code to disable the default limit for displaying columns. If you don't use this code, a data set with more than 20 columns will be truncated when displayed to take up less space.

```python
pd.options.display.max_columns = None
```

<p style="font-size:.75rem">Expected output: None</p>

##### Create the dataframe
Read the data from the `pay_history.csv` file into a dataframe called `df`.

<p style="font-size:.75rem">Expected output: None</p>

##### Preview dataframe
Print out the first 5 rows of the dataframe.

<p style="font-size:.75rem">Expected output: 5 rows, indexes 0-4</p>
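
The set-up steps above can be sketched on a tiny stand-in data set. This is only an illustration: the CSV text, the `toy` variable, and its two columns are made up for the example, and `io.StringIO` stands in for a real file path so the snippet runs anywhere.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk; the assignment's real file
# is pay_history.csv inside the data folder and has many more columns.
csv_text = """EmployeeID,Rate
1,12.45
2,25.0
3,9.5
4,40.0
5,13.25
6,50.0
"""

# read_csv accepts a file path or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default, so the index runs from 0 to 4.
preview = toy.head()
print(preview)
```

With a real file, the first argument to `read_csv` would be the path to the file instead of the `StringIO` object.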

### Questions
#### Task 1: Rate

The CEO of Adventure Works wants to simplify how employees are paid by rounding their rate of pay to two decimal places.

Create a new column called `RoundedRate` that contains the values of the `Rate` column rounded to two decimal places. Replace the `Rate` column with the values in the new `RoundedRate` column. Then, remove the `RoundedRate` column from the dataframe. Finally, print out the dataframe.

<p style="font-size:.75rem">Expected output: Dataframe with `Rate` column rounded to two decimal places and without `RoundedRate` column</p>
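
The add, overwrite, and remove pattern described in this task can be sketched on a small made-up dataframe. The `toy` dataframe and its `Price` and `RoundedPrice` columns are hypothetical and chosen so that the example does not touch the assignment data.

```python
import pandas as pd

# Hypothetical data, unrelated to pay_history.csv.
toy = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "Price": [12.4567, 9.5, 3.14159],
})

# 1. Add a new column holding the rounded values.
toy["RoundedPrice"] = toy["Price"].round(2)

# 2. Overwrite the original column with the rounded values.
toy["Price"] = toy["RoundedPrice"]

# 3. Remove the helper column; drop() returns a new dataframe.
toy = toy.drop(columns=["RoundedPrice"])

print(toy)
```

Assigning the result of `drop` back to the variable is one option; passing `inplace=True` is another, and which one to use is a matter of preference.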

#### Task 2: Sick days (part 1)
The boss wants to know how many days of sick leave each employee gets. Currently, the data shows the number of *hours* each employee gets.

Create a new column called `SickLeaveDays` by dividing the `SickLeaveHours` column by **24**. Then, print out the resulting dataframe.

<p style="font-size:.75rem">Expected output: Dataframe with column `SickLeaveDays`, which is `SickLeaveHours` divided by 24</p>

#### Task 3: Sick days (part 2)

Something doesn't look quite right... Some employees only get `0.83333` days of sick leave. That's not even an entire day! What do you think happened?

<p style="font-size:.75rem">Expected output: None</p>

```
Your answer here.
```

#### Task 4: Sick days (part 3)

Re-create the column `SickLeaveDays` by dividing the `SickLeaveHours` column by 8 instead of 24. This should override the existing column `SickLeaveDays`. Then, remove the column `SickLeaveHours` and print out the resulting dataframe.

<p style="font-size:.75rem">Expected output: Dataframe with column `SickLeaveDays`, which is `SickLeaveHours` divided by 8. The dataframe does not have the column `SickLeaveHours`</p>
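
As a rough illustration of deriving a column, overwriting it, and then dropping its source, here is a sketch on made-up data. The `toy` dataframe, its `LeaveHours` and `LeaveDays` columns, and the `HOURS_PER_WORKDAY` constant are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical data, unrelated to pay_history.csv.
toy = pd.DataFrame({
    "Worker": ["x", "y"],
    "LeaveHours": [20, 64],
})

HOURS_PER_WORKDAY = 8  # assumed length of a working day

# First attempt: divide by the number of hours in a calendar day.
toy["LeaveDays"] = toy["LeaveHours"] / 24

# Assigning to an existing column name overwrites the earlier values.
toy["LeaveDays"] = toy["LeaveHours"] / HOURS_PER_WORKDAY

# The source column is no longer needed once the derived one exists.
toy = toy.drop(columns=["LeaveHours"])

print(toy)
```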

#### Task 5: Vacation Days
Do the same thing with the `VacationHours` column that you did with the `SickLeaveHours` column: create a new column called `VacationDays` by dividing `VacationHours` by 8. Then, remove the column `VacationHours` and print out the resulting dataframe.

<p style="font-size:.75rem">Expected output: Dataframe with column `VacationDays` and without column `VacationHours`</p>

#### Task 6: Adding the new hire
Congratulations! The CEO noticed how well you were able to add and remove columns using Pandas and decided to hire an actual panda as a new employee. Now it's your job to get its information added to the data set as a new row.

The new employee gave you some of its information that will be added to the data set, but it's up to you to fill in the rest. Go ahead and fill in the rest of this dictionary with made-up information about the employee.

This dictionary will then be converted to a dataframe and concatenated to the original dataframe, so make sure to create it in a format that can easily be converted to a dataframe! (In other words, make sure that the values for each key are inside a list!)

After creating the dictionary, turn the employee into a dataframe. Then, use appropriate methods to combine the original dataframe and the new dataframe into a single dataframe called `df`. Then, print out the dataframe in such a way that the new employee (the panda) can be seen.

<p style="font-size:.75rem">Expected output: Dataframe which shows a new employee with `LoginID` "adventure-works\panda"</p>

```python
# Replace each ? with a one-item list containing your made-up value.
panda = {
    'EmployeeID': [291],
    'DepartmentID': [4],
    'RateChangeDate': ['00:00.0'],
    'Rate': ?,
    'PayFrequency': [2],
    # Raw string so the backslash is not treated as an escape character.
    'LoginID': [r'adventure-works\panda'],
    'OrganizationLevel': [3],
    'JobTitle': ?,
    'BirthDate': ['1/1/2021'],
    'MaritalStatus': ['S'],
    'Gender': ['F'],
    'HireDate': ['6/12/2022'],
    'SalariedFlag': [0],
    'CurrentFlag': [1],
    'ShiftID': [1],
    'StartDate': ['7/4/2022'],
    'EndDate': [None],
    'ModifiedDate': ['00:00.0'],
    'DepartmentName': ['Engineering'],
    'Sub-Department': ['Research and Development'],
    'SickLeaveDays': [0],
    'VacationDays': ?
}
```
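
The dictionary-to-dataframe-to-concatenation flow described above can be sketched with a two-column example. The `team` dataframe, the `new_member` dictionary, and their columns are hypothetical. `pd.concat` is used here as one reasonable way to combine the two dataframes.

```python
import pandas as pd

# Hypothetical existing data, unrelated to pay_history.csv.
team = pd.DataFrame({
    "Name": ["Ana", "Ben"],
    "Level": [1, 2],
})

# Each value is wrapped in a list so the dictionary converts to one row.
new_member = {
    "Name": ["Cy"],
    "Level": [3],
}
new_row = pd.DataFrame(new_member)

# ignore_index=True renumbers the rows 0..n-1 instead of keeping
# the new row's original index label of 0.
team = pd.concat([team, new_row], ignore_index=True)

# tail() shows the end of the dataframe, where the new row lands.
print(team.tail(1))
```

Without `ignore_index=True`, the combined dataframe would contain two rows labelled `0`, which can make later label-based lookups ambiguous.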

#### Task 7: Removing Sharon
It is common knowledge at Adventure Works that Sharon and pandas **do not get along**. For this reason, Sharon decided that she does not want her name associated with analyses that include the new employee (who is a panda). Because of this, you will need to remove Sharon from the data set.

Find the row in the dataframe where `LoginID` is "adventure-works\sharon0" and remove it. Then, print out the dataframe in such a way that shows that Sharon was successfully deleted from the data set.

<p style="font-size:.75rem">Expected output: Dataframe that shows that Sharon is no longer in the data set</p>
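
One way to remove a single row by value is to keep every row that does not match, using a boolean mask. The sketch below uses a made-up dataframe and made-up login names. The raw strings (`r"..."`) are there because a backslash inside an ordinary Python string can be read as an escape sequence.

```python
import pandas as pd

# Hypothetical data; the login names are invented for the example.
toy = pd.DataFrame({
    "LoginID": [r"example\alice0", r"example\bob0", r"example\carol0"],
    "Level": [1, 2, 3],
})

target = r"example\bob0"

# Keep only the rows whose LoginID differs from the target.
toy = toy[toy["LoginID"] != target]

# Filtering for the target afterwards should return an empty dataframe.
remaining = toy[toy["LoginID"] == target]
print(remaining)
```

Printing the filtered result, as in the last two lines, is one way to show that the row is gone without scrolling through the whole dataframe.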

#### Task 8: Removing organization level 4
The dataframe you are working with is going to be used for analysis of employees working in levels other than organization level 4. Because of this, you will need to remove all records in the dataframe where employees worked in organization level 4.

Remove all rows with `OrganizationLevel` 4. Then, print out the dataframe in such a way that shows that all employees in `OrganizationLevel` 4 were successfully removed.

<p style="font-size:.75rem">Expected output: Dataframe which shows that `OrganizationLevel` 4 was successfully removed</p>
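
When several rows share the value to be removed, an alternative to a boolean mask is to collect their index labels and pass them to `drop`. The sketch below assumes a made-up dataframe with a hypothetical `Level` column.

```python
import pandas as pd

# Hypothetical data, unrelated to pay_history.csv.
toy = pd.DataFrame({
    "Name": ["a", "b", "c", "d", "e"],
    "Level": [1, 4, 2, 4, 3],
})

# Index labels of every row whose Level equals 4.
rows_to_drop = toy[toy["Level"] == 4].index

# drop() removes the rows with those labels and returns a new dataframe.
toy = toy.drop(index=rows_to_drop)

# unique() lists the distinct values left in the column.
print(toy["Level"].unique())
```

The remaining rows keep their original index labels, so gaps in the index are expected after rows are dropped.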

#### Task 9: Sorting the dataframe
The CEO wants to make sure that none of the employees have a higher rate of pay than he does. He wants to compare the `Rate` of employees in several ways. Print out the dataframe multiple times in the following ways:

- Sorted by `Rate` descending
- Sorted by `Rate` ascending and `LoginID` descending
- Sorted by index

<p style="font-size:.75rem">Expected output: Dataframe printed out three times in the ways listed above</p>
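
The three kinds of sorting listed above can be sketched on made-up data. The `toy` dataframe and its `Name` and `Score` columns are hypothetical. `sort_values` and `sort_index` both return new dataframes and leave the original unchanged.

```python
import pandas as pd

# Hypothetical data, unrelated to pay_history.csv.
toy = pd.DataFrame({
    "Name": ["b", "a", "c", "d"],
    "Score": [10, 10, 30, 20],
})

# One column, largest values first.
by_score_desc = toy.sort_values("Score", ascending=False)

# Two columns with different directions: Score ascending, and Name
# descending to break ties. The ascending list lines up with the column list.
mixed = toy.sort_values(["Score", "Name"], ascending=[True, False])

# sort_index() puts the rows back in order of their index labels.
restored = mixed.sort_index()

print(by_score_desc)
print(mixed)
print(restored)
```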