# Advanced Pandas Assignment 1

In this assignment, you will practice adding and removing rows and columns from Pandas dataframes. In addition, you will practice sorting dataframes.

Specifically, you will perform the following exercises:
1. Add and remove columns
2. Add and remove rows
3. Sort dataframes

### Note about assignments
You can add lines of code according to your preferences. As long as the code required by the assignment appears in this notebook under the corresponding question header (i.e., the answer to question 1 is underneath the title "Question 1"), you will receive credit for it.

## About the data
The data used in this assignment is a table built from the Human Resources schema of the Adventure Works 2019 database. This data contains information about each time that Employee Pay History was changed (each line is a pay rate change). It also contains information about the employee and the department they were working in when they received the pay rate listed.

The actual data is stored in a CSV file located inside the `data` folder. The file is called `pay_history.csv`.

## Instructions
### Set up
##### Import Pandas
Import the Pandas library into Jupyter Lab.

##### Disable column display limit
Use the following code to disable the default limit for displaying columns. If you don't use this code, a data set with more than 20 columns will be truncated when displayed to take up less space.

```
pd.options.display.max_columns = None
```

##### Create the dataframe
Use the `read_csv()` function from Pandas to read the data from the `pay_history.csv` file into a dataframe called `df`.

##### Preview dataframe
Use the `.head()` method to print out the first 5 rows of the dataframe.
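
The set-up steps above follow a common pattern, sketched here on made-up data. The CSV text, the variable name `toy`, and the column names are assumptions for illustration only; they are not the contents of `pay_history.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "name,price\napple,1.255\npear,0.5\nplum,2.0\n"

# read_csv accepts a file path or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default; pass a number to change that.
print(toy.head(2))
```

With a real file, the path string (for example a file inside a `data` folder) takes the place of the `io.StringIO` object.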

### Questions
#### Rate
##### Question 1: Print out the `Rate` column of the dataframe.

The CEO of Adventure Works wants to simplify how employees are paid by rounding their rate of pay to two decimal places. Print out the `Rate` column of the dataframe to show how the rates are currently listed.

##### Question 2: Use the `round()` function 

Next, use the `round()` function to round the values in the column `Rate` to 2 decimal places. Print out the Series that this creates.

##### Question 3: Save this Series of rounded rates 
Save this Series of rounded rates as a new column in the dataframe called `RoundedRate`.
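
As a rough illustration of rounding a column and storing the result, here is a sketch on a made-up dataframe. The names `toy`, `price`, and `price_rounded` are hypothetical and unrelated to the assignment data; the sketch uses the Series `.round()` method.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({"item": ["a", "b", "c"], "price": [1.23456, 2.5, 3.14159]})

# .round(2) returns a new Series; the original column is left unchanged.
rounded = toy["price"].round(2)

# Assigning to a column name that does not exist yet adds a new column.
toy["price_rounded"] = rounded
print(toy)
```

Assigning to an existing column name instead would replace that column's values.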

##### Question 4: Print out the dataframe. 
Print out the new dataframe. Make sure that you can see the new column you created, `RoundedRate`. 

##### Question 5: Remove the column `RoundedRate`
After seeing two different columns with the employees' rates, the CEO has decided to only have one rate column. Use the `.drop()` method and the `columns` argument to remove the `RoundedRate` column from the dataframe.
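
A minimal sketch of removing a column, using a made-up dataframe (`toy` and its column names are assumptions, not the assignment data). Note that `.drop()` returns a new dataframe by default, so the result has to be saved for the change to stick.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "item": ["a", "b"],
    "price": [1.23456, 2.5],
    "price_rounded": [1.23, 2.5],
})

# drop() leaves the original untouched, so reassign the result.
toy = toy.drop(columns="price_rounded")
print(toy.columns.tolist())
```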

##### Question 6: Replace the `Rate` column
The CEO of Adventure Works changed his mind again and wants to see the column `Rate` rounded to two decimal places, but he still wants the column to be called `Rate`. Use the `round()` function to replace the column `Rate` with new rates that are rounded to two decimal places.

##### Question 7: Print out the dataframe
Print out the new dataframe. Make sure that the column `Rate` has numbers rounded to two decimal places.

#### Sick days
The boss wants to know how many days of sick leave each employee gets. Currently, the data shows the number of *hours* each employee gets.

##### Question 8: Print out the `SickLeaveHours` column.
Print out the `SickLeaveHours` column of the dataframe.

##### Question 9: Convert `SickLeaveHours` to days
Use division to convert the values in the `SickLeaveHours` column to days. You can convert hours to days by dividing by 24 (since there are 24 hours in a day). Show the resulting Series.

##### Question 10: Save `SickLeaveDays` to dataframe.
Using the Series created above, make a new column in the dataframe called `SickLeaveDays` that contains information about the number of sick days each employee has.
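
Arithmetic on a column applies to every row at once. The sketch below converts a hypothetical `minutes` column to hours on made-up data; the same pattern works for any unit conversion by division.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({"task": ["x", "y", "z"], "minutes": [30, 90, 120]})

# Dividing a Series by a number divides each value and returns a new Series.
hours = toy["minutes"] / 60

# Store the result as a new column.
toy["hours"] = hours
print(toy)
```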

##### Question 11: Print out the dataframe.
Print out the new dataframe. Make sure that there is a column `SickLeaveDays` that contains the number of sick days each employee has.

##### Question 12: Something fishy with the `SickLeaveDays` column.
Something doesn't look quite right... Some employees only get `0.83333` days of sick leave. That's not even an entire day! What do you think happened?

```
    Your answer here.
```

##### Question 13: Fix the `SickLeaveDays` column.
Re-create the column `SickLeaveDays` by dividing the `SickLeaveHours` column by 8 instead of 24. This should override the existing column `SickLeaveDays`.

##### Question 14: Remove the column `SickLeaveHours`
Now that we have the column `SickLeaveDays`, the column `SickLeaveHours` isn't useful to our analysis. Use the `.drop()` method and the `columns` argument to drop it.

##### Question 15: Print out the resulting dataframe.
Use the `.head()` method to print out the resulting dataframe. Make sure that the column `SickLeaveHours` doesn't exist anymore.

#### Vacation Days
Do the same thing with the `VacationHours` column that you did with `SickLeaveHours`.

##### Question 16: Create a column called `VacationDays`
Create a column called `VacationDays` that contains information about the number of vacation days that each employee has. You can find this number by taking the `VacationHours` column and dividing it by 8 (8 hours in a work day).

##### Question 17: Remove the column `VacationHours`
Use the `.drop()` method and the `columns` argument to remove the column `VacationHours` from the dataframe.

##### Question 18: Print out the dataframe.
Print out the dataframe to make sure that the `VacationHours` column no longer exists and that the `VacationDays` column does exist and has the right data.

#### Adding the new hire
Congratulations! The CEO noticed how well you were able to add and remove columns using Pandas and decided to hire an actual panda as a new employee. Now it's your job to get its information added to the data set as a new row.

##### Question 19: Create a dictionary that represents a new employee
The new employee gave you some of its information that will be added to the data set, but it's up to you to fill in the rest. Go ahead and fill in the rest of this dictionary with made-up information about the employee.

This dictionary will eventually be converted to a dataframe and concatenated to the original dataframe, so make sure to create it in a format that can easily be converted to a dataframe! (In other words, make sure that the values for each key are inside a list!)

```
# Replace each ? with a made-up value wrapped in a list, e.g. [value].
panda = {
    'EmployeeID': [291],
    'DepartmentID': [4],
    'RateChangeDate': ['00:00.0'],
    'Rate': ?,
    'PayFrequency': [2],
    'LoginID': ['adventure-works\\panda'],  # backslash escaped so Python reads it literally
    'OrganizationLevel': [3],
    'JobTitle': ?,
    'BirthDate': ['1/1/2021'],
    'MaritalStatus': ['S'],
    'Gender': ['F'],
    'HireDate': ['6/12/2022'],
    'SalariedFlag': [0],
    'CurrentFlag': [1],
    'ShiftID': [1],
    'StartDate': ['7/4/2022'],
    'EndDate': [None],
    'ModifiedDate': ['00:00.0'],
    'DepartmentName': ['Engineering'],
    'Sub-Department': ['Research and Development'],
    'SickLeaveDays': [0],
    'VacationDays': ?
}
```

##### Question 20: Turn the new employee into a dataframe
The dictionary that you just created needs to be turned into a dataframe before you can add it to the original dataframe. Turn it into a dataframe by using the Pandas `DataFrame()` function. Make sure to save it to a variable so that you can reference it later.
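
For reference, a dictionary whose values are lists converts directly into a dataframe: each key becomes a column and each list position becomes a row. The dictionary below is made up and much smaller than the one in this assignment.

```python
import pandas as pd

# Made-up single-row record; every value sits inside a one-item list.
new_item = {"item": ["d"], "price": [4.0]}

# Each key becomes a column name, each list element a row value.
new_item_df = pd.DataFrame(new_item)
print(new_item_df)
```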

##### Question 21: Add the new employee to the original dataframe
Use the `concat()` function to concatenate (combine) the new employee to the original dataframe. Remember that the first argument of the `concat()` function is a **list** of dataframes to combine together, and a new dataframe is returned.

Use `ignore_index=True` and save the resulting dataframe to the variable `df`, overriding the original dataframe with updated data.
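
A short sketch of the concatenation pattern on made-up data (the names `toy` and `extra` are hypothetical). It shows what `ignore_index=True` does to the row labels.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({"item": ["a", "b"], "price": [1.0, 2.0]})
extra = pd.DataFrame({"item": ["c"], "price": [3.0]})

# The first argument is a list of dataframes; a new dataframe is returned.
# ignore_index=True renumbers the rows 0..n-1 instead of keeping both
# dataframes' original labels (which would give 0, 1, 0 here).
toy = pd.concat([toy, extra], ignore_index=True)
print(toy)
```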

##### Question 22: Print out the dataframe
Use the `.tail()` or `.head()` method to make sure that the panda was correctly added to the table of employees.

#### Removing Sharon
It is common knowledge at Adventure Works that Sharon's mortal enemy is the panda. For this reason, Sharon decided that she does not want her name associated with analyses that include employee data. Because of this, you will need to remove Sharon from the data set.

##### Question 23: Find the row with Sharon's information
Sharon works in the Engineering department. Use filtering methods to filter the dataframe and get back employees in the Engineering department.
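
Filtering by a column value usually means building a boolean mask and using it to select rows. A sketch on made-up data, where `toy`, `name`, and `team` are hypothetical names:

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "team": ["red", "blue", "red"],
})

# Comparing a column to a value yields a Series of True/False values.
mask = toy["team"] == "red"

# Indexing with the mask keeps only the rows where it is True.
red_team = toy[mask]
print(red_team)
```

The filtered rows keep their original index labels, which is what makes it possible to look up a specific row's index number afterwards.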

##### Question 24: Locate the row index number for Sharon
Sharon is an employee in the Engineering department. Take note of the row her information is on. What is the row index number containing Sharon's information?

```
Your answer here
```

##### Question 25: Drop Sharon
Use the `.drop()` method and Sharon's row index number to drop Sharon from the dataframe. Remember to pass in the row index number into the argument `index=`. Make sure that this code updates the original dataframe.
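
Dropping a single row by its index label looks like this on a made-up dataframe (the label `1` is specific to this toy example, not the answer for the assignment data):

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "team": ["red", "blue", "red"]})

# index= takes the row label(s) to remove; reassign to keep the change.
toy = toy.drop(index=1)
print(toy)
```

The remaining rows keep their labels (0 and 2 here); `.drop()` does not renumber them.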

##### Question 26: Print out the dataframe
Print out the Engineering department again to make sure that Sharon's information was correctly removed.

#### Removing organization level 4
The dataframe you are working with is going to be used for analysis of employees working in levels other than organization level 4. Because of this, you will need to remove all records in the dataframe where employees worked in organization level 4.

##### Question 27: Find rows where `OrganizationLevel` is 4
Use a filter or the `.loc` property to get a filtered dataframe where `OrganizationLevel` **is** 4.

##### Question 28: Get the indexes where `OrganizationLevel` is 4
Get a list of indexes where `OrganizationLevel` is 4. To do this, simply take the code you used above and use the property `.index` on it.

##### Question 29: Save the indexes to a variable
Take the code from above and save the list of indexes to a variable called `level_4_indexes`.

##### Question 30: Drop the records with organization level 4
Use the `.drop()` method to drop the records with organization level 4. Remember that the indexes of people who have organization level 4 are already stored in a variable `level_4_indexes`. You can pass this variable to the `.drop()` method using `index=level_4_indexes`. Make sure to override the original dataframe.
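
The filter, `.index`, and `.drop()` steps can be chained as in the sketch below. It uses made-up data with a hypothetical `level` column and removes level 3 rather than the value asked for in this assignment.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({"name": ["Ann", "Bob", "Cy", "Di"], "level": [1, 3, 2, 3]})

# Filter the rows, then take .index to get their row labels.
level_3_labels = toy[toy["level"] == 3].index

# Pass all the labels at once to drop every matching row.
toy = toy.drop(index=level_3_labels)
print(toy)
```

Checking that the filter now returns an empty dataframe is a more thorough test than looking at the first few rows.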

##### Question 31: Print out the dataframe
Print out the resulting dataframe using the `.head()` method to make sure that no employees with an `OrganizationLevel` of 4 still exist in the dataframe. If you don't see any in the top five rows, you did this correctly.

#### Sorting the dataframe
The CEO wants to make sure that none of the employees have a higher rate of pay than he does. Sort the dataframe by `Rate` using the `.sort_values()` method.

##### Question 32: Sort the dataframe by `Rate`
Sort the dataframe by the `Rate` column. You can do this by using the `.sort_values()` method. Do not pass in any other arguments to the `.sort_values()` method.

##### Question 33: Sort by `Rate` descending
Use the same code as above but pass in the correct arguments to sort the dataframe by `Rate` descending.

##### Question 34: Sort by `Rate` ascending and `LoginID` descending
Use appropriate arguments to sort the dataframe again by the `Rate` column in ascending order and the `LoginID` column in descending order.
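
The three kinds of sort asked for above can be sketched on made-up data as follows. The column names `name` and `score` are hypothetical; the arguments `by` and `ascending` belong to `.sort_values()`.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({"name": ["b", "a", "c", "d"], "score": [2, 1, 2, 3]})

# Default: ascending order on one column.
ascending_sort = toy.sort_values("score")

# ascending=False reverses the direction.
descending_sort = toy.sort_values("score", ascending=False)

# Several columns: pass lists of equal length, one direction per column.
mixed_sort = toy.sort_values(["score", "name"], ascending=[True, False])
print(mixed_sort)
```

In the mixed sort, the second column only decides the order among rows that tie on the first column.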

##### Question 35: Save the sorted dataframe
Use the code written above to save the newly sorted dataframe to the variable `df`. This new dataframe will override the old dataframe.

##### Question 36: Restore the index order
Use the `.sort_index()` method to return the rows to their previous order, sorted by index. Save the result back to the variable `df`. This should overwrite the existing dataframe.
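
Because sorting by values keeps each row's original index label, sorting by index afterwards undoes the reordering. A sketch on made-up data:

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({"name": ["b", "a", "c", "d"], "score": [2, 1, 2, 3]})

# After sorting by values, the index labels are out of order.
shuffled = toy.sort_values("score", ascending=False)

# sort_index() orders the rows by their index labels again.
restored = shuffled.sort_index()
print(restored)
```

This differs from `.reset_index(drop=True)`, which would renumber the rows in their current order instead of putting them back.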

##### Question 37: Print out the final dataframe
Use the `.head()` method to print out the first five rows of the dataframe. Make sure that the rows are ordered by their index.