# Advanced Pandas Assignment 1

In this assignment, you will practice adding and removing rows and columns from Pandas dataframes. In addition, you will practice sorting dataframes.

Specifically, you will perform the following exercises:
1. Add and remove columns
2. Add and remove rows
3. Sort dataframes

### Note about assignments
You can add lines of code as you see fit. As long as the code required by the assignment appears in this notebook under the corresponding question header (i.e., the answer to question 1 is underneath the title "Question 1"), you will receive credit for it.

## About the data
The data used in this assignment is a table built from the Human Resources schema of the Adventure Works 2019 database. It contains one row for each change to the Employee Pay History (each row is a pay rate change). Each row also contains information about the employee and the department they were working in when they received the pay rate listed.

The actual data is stored in a CSV file located inside the `data` folder. The file is called `pay_history.csv`.

## Instructions
### Set up
##### Import Pandas
Import the Pandas library into Jupyter Lab.

<p style="font-size:.75rem">Expected output: None</p>

In [72]:
import pandas as pd

##### Disable column display limit
Use the following code to disable the default limit for displaying columns. If you don't use this code, a data set with more than 20 columns will be truncated when displayed to take up less space.

```python
pd.options.display.max_columns = None
```

<p style="font-size:.75rem">Expected output: None</p>

In [73]:
pd.options.display.max_columns = None
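
If you only want to lift the column limit temporarily, one option is `pd.option_context()`. The sketch below is an illustration rather than part of the assignment; it reads the setting before, inside, and after a `with` block to show that the change is undone afterwards.

```python
import pandas as pd

# Remember the current setting so it can be compared afterwards.
before = pd.get_option("display.max_columns")

# Inside the with-block, the column limit is disabled.
with pd.option_context("display.max_columns", None):
    inside = pd.get_option("display.max_columns")

# After the block, pandas restores the previous value.
after = pd.get_option("display.max_columns")
print(inside, after == before)
```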

##### Create the dataframe
Use the `read_csv()` function from Pandas to read the data from the `pay_history.csv` file into a dataframe called `df`.

<p style="font-size:.75rem">Expected output: None</p>

In [74]:
df = pd.read_csv("./data/pay_history.csv")

##### Preview dataframe
Use the `.head()` method to print out the first 5 rows of the dataframe.

<p style="font-size:.75rem">Expected output: 5 rows, indexes 0-4</p>

In [75]:
df.head()

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department
0,1,16,00:00.0,125.5,2,adventure-works\ken0,,Chief Executive Officer,1/29/1969,S,M,1/14/2009,1,99,69,1,1,1/14/2009,,00:00.0,Executive,Executive General and Administration
1,2,1,00:00.0,63.4615,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,20,1,1,1/31/2008,,00:00.0,Engineering,Research and Development
2,3,1,00:00.0,43.2692,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,2,21,1,1,11/11/2007,,00:00.0,Engineering,Research and Development
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development
4,4,2,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,5/31/2010,,00:00.0,Tool Design,Research and Development


### Questions
#### Rate
##### Question 1: Print out the `Rate` column of the dataframe.

The CEO of Adventure Works wants to simplify how employees are paid by rounding their rate of pay to two decimal places. Print out the `Rate` column of the dataframe to show how the rates are currently listed.

<p style="font-size:.75rem">Expected output: Series of dtype float64 from Rate column, length of 304</p>

In [76]:
df['Rate']

0      125.5000
1       63.4615
2       43.2692
3        8.6200
4        8.6200
         ...   
299     23.0769
300     48.1010
301     23.0769
302     23.0769
303     23.0769
Name: Rate, Length: 304, dtype: float64

##### Question 2: Use the `round()` function 

Next, use the `round()` function to round the values in the column `Rate` to 2 decimal places. Print out the Series that this creates.

<p style="font-size:.75rem">Expected output: Series of decimal numbers rounded to 2 decimal places, length 304</p>

In [77]:
round(df['Rate'], 2)

0      125.50
1       63.46
2       43.27
3        8.62
4        8.62
        ...  
299     23.08
300     48.10
301     23.08
302     23.08
303     23.08
Name: Rate, Length: 304, dtype: float64
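
Python's built-in `round()` works here because a Series supports it and, as far as pandas' behaviour goes, hands the work to its own `.round()` method. The sketch below uses a few made-up sample rates (not read from `pay_history.csv`) to compare the two spellings and to show that the original Series is left unchanged.

```python
import pandas as pd

# Hypothetical sample rates, not read from pay_history.csv.
rates = pd.Series([63.4615, 43.2692, 8.62], name="Rate")

via_builtin = round(rates, 2)   # Python's built-in round()
via_method = rates.round(2)     # the Series method

# Both spellings give the same rounded Series.
print(via_method.tolist())
print(via_builtin.equals(via_method))

# Rounding returns a new Series; the original values are untouched.
print(rates.tolist())
```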

##### Question 3: Save this Series of rounded rates 
Save this Series of rounded rates as a new column in the dataframe called `RoundedRate`.

<p style="font-size:.75rem">Expected output: None</p>

In [78]:
df['RoundedRate'] = round(df['Rate'], 2)

##### Question 4: Print out the dataframe. 
Print out the new dataframe. Make sure that you can see the new column you created, `RoundedRate`. 

<p style="font-size:.75rem">Expected output: Dataframe, now including column RoundedRate</p>

In [79]:
df.head()

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,RoundedRate
0,1,16,00:00.0,125.5,2,adventure-works\ken0,,Chief Executive Officer,1/29/1969,S,M,1/14/2009,1,99,69,1,1,1/14/2009,,00:00.0,Executive,Executive General and Administration,125.5
1,2,1,00:00.0,63.4615,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,20,1,1,1/31/2008,,00:00.0,Engineering,Research and Development,63.46
2,3,1,00:00.0,43.2692,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,2,21,1,1,11/11/2007,,00:00.0,Engineering,Research and Development,43.27
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development,8.62
4,4,2,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,5/31/2010,,00:00.0,Tool Design,Research and Development,8.62


##### Question 5: Remove the column `RoundedRate`
After seeing two different columns with the employees' rates, the CEO has decided to only have one rate column. Use the `.drop()` method and the `columns` argument to remove the `RoundedRate` column from the dataframe.

<p style="font-size:.75rem">Expected output: None</p>

In [80]:
df.drop(columns='RoundedRate', inplace=True)
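
By default, `.drop()` returns a new dataframe and leaves the original alone, which is why the answer above passes `inplace=True`. The sketch below shows the difference on a small made-up dataframe (`demo` and its values are hypothetical).

```python
import pandas as pd

# Hypothetical two-column frame for illustration only.
demo = pd.DataFrame({"Rate": [63.4615, 8.62], "RoundedRate": [63.46, 8.62]})

# Without inplace=True, drop() returns a new dataframe and leaves demo alone.
trimmed = demo.drop(columns="RoundedRate")
columns_before = list(demo.columns)   # demo still has both columns here
print(columns_before)
print(list(trimmed.columns))

# Reassigning the result has the same effect as inplace=True.
demo = demo.drop(columns="RoundedRate")
print(list(demo.columns))
```

Either style is fine for this assignment, as long as `df` ends up without the column.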

##### Question 6: Replace the `Rate` column
The CEO of Adventure Works changed his mind again and wants to see the column `Rate` rounded to two decimal places, but he still wants the column to be called `Rate`. Use the `round()` function to replace the column `Rate` with new rates that are rounded to two decimal places.

<p style="font-size:.75rem">Expected output: None</p>

In [81]:
df['Rate'] = round(df['Rate'], 2)

##### Question 7: Print out the dataframe
Print out the new dataframe. Make sure that the column `Rate` has numbers rounded to two decimal places.

<p style="font-size:.75rem">Expected output: Dataframe, now with the Rate column showing decimal numbers rounded to two decimal places</p>

In [82]:
df.head()

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department
0,1,16,00:00.0,125.5,2,adventure-works\ken0,,Chief Executive Officer,1/29/1969,S,M,1/14/2009,1,99,69,1,1,1/14/2009,,00:00.0,Executive,Executive General and Administration
1,2,1,00:00.0,63.46,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,20,1,1,1/31/2008,,00:00.0,Engineering,Research and Development
2,3,1,00:00.0,43.27,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,2,21,1,1,11/11/2007,,00:00.0,Engineering,Research and Development
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development
4,4,2,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,5/31/2010,,00:00.0,Tool Design,Research and Development


#### Sick days
The boss wants to know how many days of sick leave each employee gets. Currently, the data shows the number of *hours* each employee gets.

##### Question 8: Print out the `SickLeaveHours` column.
Print out the `SickLeaveHours` column of the dataframe.

<p style="font-size:.75rem">Expected output: Series of length 304 and dtype int64</p>

In [83]:
df['SickLeaveHours']

0      69
1      20
2      21
3      80
4      80
       ..
299    38
300    30
301    37
302    38
303    37
Name: SickLeaveHours, Length: 304, dtype: int64

##### Question 9: Convert `SickLeaveHours` to days
Use division to convert the values in the `SickLeaveHours` column to days. You can convert hours to days by dividing by 24 (since there are 24 hours in a day). Show the resulting Series.

<p style="font-size:.75rem">Expected output: Series of length 304 and dtype float64</p>

In [84]:
df['SickLeaveHours'] / 24

0      2.875000
1      0.833333
2      0.875000
3      3.333333
4      3.333333
         ...   
299    1.583333
300    1.250000
301    1.541667
302    1.583333
303    1.541667
Name: SickLeaveHours, Length: 304, dtype: float64

##### Question 10: Save `SickLeaveDays` to dataframe.
Using the Series created above, make a new column in the dataframe called `SickLeaveDays` that contains information about the number of sick days each employee has.

<p style="font-size:.75rem">Expected output: None</p>

In [85]:
df['SickLeaveDays'] = df['SickLeaveHours'] / 24

##### Question 11: Print out the dataframe.
Print out the new dataframe. Make sure that there is a column `SickLeaveDays` that contains the number of sick days each employee has.

<p style="font-size:.75rem">Expected output: Dataframe, now with column SickLeaveDays</p>

In [86]:
df.head()

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,SickLeaveDays
0,1,16,00:00.0,125.5,2,adventure-works\ken0,,Chief Executive Officer,1/29/1969,S,M,1/14/2009,1,99,69,1,1,1/14/2009,,00:00.0,Executive,Executive General and Administration,2.875
1,2,1,00:00.0,63.46,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,20,1,1,1/31/2008,,00:00.0,Engineering,Research and Development,0.833333
2,3,1,00:00.0,43.27,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,2,21,1,1,11/11/2007,,00:00.0,Engineering,Research and Development,0.875
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development,3.333333
4,4,2,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,5/31/2010,,00:00.0,Tool Design,Research and Development,3.333333


##### Question 12: Something fishy with the `SickLeaveDays` column.
Something doesn't look quite right... Some employees only get `0.83333` days of sick leave. That's not even an entire day! What do you think happened?

<p style="font-size:.75rem">Expected output: None</p>

```
    The number of Sick Leave Days should be calculated by dividing by 8 instead of 24, since there are 8 hours in a work day.
```

##### Question 13: Fix the `SickLeaveDays` column.
Re-create the column `SickLeaveDays` by dividing the `SickLeaveHours` column by 8 instead of 24. This should overwrite the existing column `SickLeaveDays`.

<p style="font-size:.75rem">Expected output: None</p>

In [87]:
df['SickLeaveDays'] = df['SickLeaveHours'] / 8
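
To check the arithmetic, the sketch below repeats the division on a small made-up dataframe that holds the first few `SickLeaveHours` values printed in Question 8. The names `demo` and `HOURS_PER_WORK_DAY` are only for illustration.

```python
import pandas as pd

HOURS_PER_WORK_DAY = 8  # assumption: an 8-hour work day, as in the question

# Hypothetical frame holding a few SickLeaveHours values from the output above.
demo = pd.DataFrame({"SickLeaveHours": [69, 20, 21, 80]})

# Dividing a column by a number applies the division to every row.
demo["SickLeaveDays"] = demo["SickLeaveHours"] / HOURS_PER_WORK_DAY
print(demo)
```

Assigning to a column name that already exists replaces that column, so rerunning the assignment with a different divisor overwrites the old values.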

##### Question 14: Remove the column `SickLeaveHours`
Now that we have the column `SickLeaveDays`, the column `SickLeaveHours` isn't useful to our analysis. Use the `.drop()` method and the `columns` argument to drop it.

<p style="font-size:.75rem">Expected output: None</p>

In [88]:
df.drop(columns="SickLeaveHours", inplace=True)

##### Question 15: Print out the resulting dataframe.
Use the `.head()` method to print out the resulting dataframe. Make sure that the column `SickLeaveHours` doesn't exist anymore.

<p style="font-size:.75rem">Expected output: Dataframe, now including column SickLeaveDays and without SickLeaveHours column</p>

In [89]:
df.head()

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,SickLeaveDays
0,1,16,00:00.0,125.5,2,adventure-works\ken0,,Chief Executive Officer,1/29/1969,S,M,1/14/2009,1,99,1,1,1/14/2009,,00:00.0,Executive,Executive General and Administration,8.625
1,2,1,00:00.0,63.46,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,1,1,1/31/2008,,00:00.0,Engineering,Research and Development,2.5
2,3,1,00:00.0,43.27,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,2,1,1,11/11/2007,,00:00.0,Engineering,Research and Development,2.625
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development,10.0
4,4,2,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,1,1,5/31/2010,,00:00.0,Tool Design,Research and Development,10.0


#### Vacation Days
Do the same thing with the `VacationHours` column that you did with `SickLeaveHours`.

##### Question 16: Create a column called `VacationDays`
Create a column called `VacationDays` that contains information about the number of vacation days that each employee has. You can find this number by taking the `VacationHours` column and dividing it by 8 (8 hours in a work day).

<p style="font-size:.75rem">Expected output: None</p>

In [90]:
df['VacationDays'] = df['VacationHours'] / 8

##### Question 17: Remove the column `VacationHours`
Use the `.drop()` method and the `columns` argument to remove the column `VacationHours` from the dataframe.

<p style="font-size:.75rem">Expected output: None</p>

In [91]:
df.drop(columns="VacationHours", inplace=True)

##### Question 18: Print out the dataframe.
Print out the dataframe to make sure that the `VacationHours` column no longer exists and that the `VacationDays` column does exist and has the right data.

<p style="font-size:.75rem">Expected output: Dataframe, now with column VacationDays and missing column VacationHours</p>

In [92]:
df.head()

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,SickLeaveDays,VacationDays
0,1,16,00:00.0,125.5,2,adventure-works\ken0,,Chief Executive Officer,1/29/1969,S,M,1/14/2009,1,1,1,1/14/2009,,00:00.0,Executive,Executive General and Administration,8.625,12.375
1,2,1,00:00.0,63.46,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,1,1/31/2008,,00:00.0,Engineering,Research and Development,2.5,0.125
2,3,1,00:00.0,43.27,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,1,1,11/11/2007,,00:00.0,Engineering,Research and Development,2.625,0.25
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development,10.0,6.0
4,4,2,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,1,1,5/31/2010,,00:00.0,Tool Design,Research and Development,10.0,6.0


#### Adding the new hire
Congratulations! The CEO noticed how well you were able to add and remove columns using Pandas and decided to hire an actual panda as a new employee. Now it's your job to get its information added to the data set as a new row.

##### Question 19: Create a dictionary that represents a new employee
The new employee gave you some of its information that will be added to the data set, but it's up to you to fill in the rest. Go ahead and fill in the rest of this dictionary with made-up information about the employee.

This dictionary will eventually be converted to a dataframe and concatenated to the original dataframe, so make sure to create it in a format that can easily be converted to a dataframe! (In other words, make sure that the values for each key are inside a list!)

<p style="font-size:.75rem">Expected output: None</p>

In [93]:
panda = {
    'EmployeeID': [291],
    'DepartmentID': [4],
    'RateChangeDate': ['00:00.0'],
    'Rate': [20],
    'PayFrequency': [2],
    'LoginID': [r'adventure-works\panda'],  # raw string so the backslash is kept literally
    'OrganizationLevel': [3],
    'JobTitle': ['Data Bamboo-zler'],
    'BirthDate': ['1/1/2021'],
    'MaritalStatus': ['S'],
    'Gender': ['F'],
    'HireDate': ['6/12/2022'],
    'SalariedFlag': [0],
    'CurrentFlag': [1],
    'ShiftID': [1],
    'StartDate': ['7/4/2022'],
    'EndDate': [None],
    'ModifiedDate': ['00:00.0'],
    'DepartmentName': ['Engineering'],
    'Sub-Department': ['Research and Development'],
    'SickLeaveDays': [0],
    'VacationDays': [100]
}

panda = {
    'EmployeeID': [291],
    'DepartmentID': [4],
    'RateChangeDate': ['00:00.0'],
    'Rate': ?,
    'PayFrequency': [2],
    'LoginID': [r'adventure-works\panda'],
    'OrganizationLevel': [3],
    'JobTitle': ?,
    'BirthDate': ['1/1/2021'],
    'MaritalStatus': ['S'],
    'Gender': ['F'],
    'HireDate': ['6/12/2022'],
    'SalariedFlag': [0],
    'CurrentFlag': [1],
    'ShiftID': [1],
    'StartDate': ['7/4/2022'],
    'EndDate': [None],
    'ModifiedDate': ['00:00.0'],
    'DepartmentName': ['Engineering'],
    'Sub-Department': ['Research and Development'],
    'SickLeaveDays': [0],
    'VacationDays': ?
}

##### Question 20: Turn the new employee into a dataframe
The dictionary that you just created needs to be turned into a dataframe before you can add it to the original dataframe. Turn it into a dataframe by using the Pandas `DataFrame()` function. Make sure to save it to a variable so that you can reference it later.

<p style="font-size:.75rem">Expected output: None</p>

In [94]:
new_employee = pd.DataFrame(panda)

##### Question 21: Add the new employee to the original dataframe
Use the `concat()` function to concatenate (combine) the new employee to the original dataframe. Remember that the first argument of the `concat()` function is a **list** of dataframes to combine together, and a new dataframe is returned.

Use `ignore_index=True` and save the resulting dataframe to the variable `df`, overwriting the original dataframe with the updated data.

<p style="font-size:.75rem">Expected output: None</p>

In [95]:
df = pd.concat([df, new_employee], ignore_index=True)
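
To see what `ignore_index=True` changes, the sketch below concatenates two tiny made-up dataframes (`staff` and `new_hire` are hypothetical stand-ins for `df` and `new_employee`) with and without it.

```python
import pandas as pd

# Hypothetical miniature versions of the two dataframes.
staff = pd.DataFrame({"EmployeeID": [289, 290], "Rate": [23.08, 23.08]})
new_hire = pd.DataFrame({"EmployeeID": [291], "Rate": [20]})

# Without ignore_index, each dataframe keeps its own row labels,
# so the label 0 appears twice in the result.
kept = pd.concat([staff, new_hire])
print(kept.index.tolist())

# With ignore_index=True, the rows are renumbered from 0.
renumbered = pd.concat([staff, new_hire], ignore_index=True)
print(renumbered.index.tolist())
```

Duplicate row labels make it harder to drop a single row by label later on, which is one reason to renumber here.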

##### Question 22: Print out the dataframe
Use the `.tail()` or `.head()` method to make sure that the panda was correctly added to the table of employees.

<p style="font-size:.75rem">Expected output: 5 rows of the new dataframe that shows the panda's data was added correctly</p>

In [96]:
df.tail()

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,SickLeaveDays,VacationDays
300,287,3,00:00.0,48.1,2,adventure-works\amy0,2.0,European Sales Manager,9/20/1957,M,F,4/16/2012,1,1,1,4/16/2012,,00:00.0,Sales,Sales and Marketing,3.75,2.625
301,288,3,00:00.0,23.08,2,adventure-works\rachel0,3.0,Sales Representative,7/9/1975,S,F,5/30/2013,1,1,1,5/30/2013,,00:00.0,Sales,Sales and Marketing,4.625,4.375
302,289,3,00:00.0,23.08,2,adventure-works\jae0,3.0,Sales Representative,3/17/1968,M,F,5/30/2012,1,1,1,5/30/2012,,00:00.0,Sales,Sales and Marketing,4.75,4.625
303,290,3,00:00.0,23.08,2,adventure-works\ranjit0,3.0,Sales Representative,9/30/1975,S,M,5/30/2012,1,1,1,5/30/2012,,00:00.0,Sales,Sales and Marketing,4.625,4.25
304,291,4,00:00.0,20.0,2,adventure-works\panda,3.0,Data Bamboo-zler,1/1/2021,S,F,6/12/2022,0,1,1,7/4/2022,,00:00.0,Engineering,Research and Development,0.0,100.0


#### Removing Sharon
It is common knowledge at Adventure Works that Sharon and pandas are mortal enemies. For this reason, Sharon decided that she does not want her name associated with analyses that include the new employee, the panda. Because of this, you will need to remove Sharon from the data set.

##### Question 23: Find the row with Sharon's information
Sharon works in the Engineering department. Use filtering methods to filter the dataframe and get back employees in the Engineering department.

<p style="font-size:.75rem">Expected output: 8 rows, indexes 1-3, 6-7, 15-16, and 304</p>

In [97]:
df.loc[ df['DepartmentName'] == 'Engineering' ]

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,SickLeaveDays,VacationDays
1,2,1,00:00.0,63.46,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,1,1/31/2008,,00:00.0,Engineering,Research and Development,2.5,0.125
2,3,1,00:00.0,43.27,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,1,1,11/11/2007,,00:00.0,Engineering,Research and Development,2.625,0.25
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development,10.0,6.0
6,5,1,00:00.0,32.69,2,adventure-works\gail0,3.0,Design Engineer,9/27/1952,M,F,1/6/2008,1,1,1,1/6/2008,,00:00.0,Engineering,Research and Development,2.75,0.625
7,6,1,00:00.0,32.69,2,adventure-works\jossef0,3.0,Design Engineer,3/11/1959,M,M,1/24/2008,1,1,1,1/24/2008,,00:00.0,Engineering,Research and Development,2.875,0.75
15,14,1,00:00.0,36.06,2,adventure-works\michael8,3.0,Senior Design Engineer,6/16/1979,S,M,12/30/2010,1,1,1,12/30/2010,,00:00.0,Engineering,Research and Development,2.625,0.375
16,15,1,00:00.0,32.69,2,adventure-works\sharon0,3.0,Design Engineer,5/2/1961,M,F,1/18/2011,1,1,1,1/18/2011,,00:00.0,Engineering,Research and Development,2.75,0.5
304,291,4,00:00.0,20.0,2,adventure-works\panda,3.0,Data Bamboo-zler,1/1/2021,S,F,6/12/2022,0,1,1,7/4/2022,,00:00.0,Engineering,Research and Development,0.0,100.0


##### Question 24: Locate the row index number for Sharon
Sharon is an employee in the Engineering department. Take note of the row her information is on. What is the row index number containing Sharon's information?

```
The row index number is 16.
```
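
Reading the label off the printed table works for a handful of rows. As an optional alternative, the sketch below looks the label up with a filter instead. It uses a small made-up dataframe whose row labels mimic the Engineering output above, and it assumes that matching on the end of `LoginID` is enough to identify Sharon.

```python
import pandas as pd

# Hypothetical excerpt; the labels 15, 16 and 304 mimic the output above.
demo = pd.DataFrame(
    {"LoginID": [r"adventure-works\michael8",
                 r"adventure-works\sharon0",
                 r"adventure-works\panda"]},
    index=[15, 16, 304],
)

# Boolean mask that is True only on Sharon's row.
is_sharon = demo["LoginID"].str.endswith("sharon0")

# .index on the filtered frame gives the matching row labels.
sharon_labels = demo.loc[is_sharon].index
print(sharon_labels.tolist())
```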

<p style="font-size:.75rem">Expected output: None</p>

##### Question 25: Drop Sharon
Use the `.drop()` method and Sharon's row index number to drop Sharon from the dataframe. Remember to pass the row index number to the `index=` argument. Make sure that this code updates the original dataframe.

<p style="font-size:.75rem">Expected output: None</p>

In [98]:
df.drop(index=16, inplace=True)
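
Note that `index=16` refers to the row *label* 16, not to the 17th row by position. The sketch below illustrates this on a made-up dataframe (`demo` and the names in it are hypothetical) whose labels do not start at 0.

```python
import pandas as pd

# Hypothetical frame whose row labels are not 0, 1, 2.
demo = pd.DataFrame({"Name": ["Michael", "Sharon", "Panda"]}, index=[15, 16, 304])

# index=16 refers to the row labelled 16, not to a position.
demo.drop(index=16, inplace=True)

# The remaining labels are left as they were; nothing is renumbered.
print(demo.index.tolist())
print(demo["Name"].tolist())
```

Because the label is gone afterwards, running the same `.drop()` cell a second time raises a `KeyError`.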

##### Question 26: Print out the dataframe
Print out the Engineering department again to make sure that Sharon's information was correctly removed.

<p style="font-size:.75rem">Expected output: 7 rows, indexes 1-3, 6-7, 15, and 304</p>

In [99]:
df.loc[ df['DepartmentName'] == 'Engineering' ]

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,SickLeaveDays,VacationDays
1,2,1,00:00.0,63.46,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,1,1/31/2008,,00:00.0,Engineering,Research and Development,2.5,0.125
2,3,1,00:00.0,43.27,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,1,1,11/11/2007,,00:00.0,Engineering,Research and Development,2.625,0.25
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development,10.0,6.0
6,5,1,00:00.0,32.69,2,adventure-works\gail0,3.0,Design Engineer,9/27/1952,M,F,1/6/2008,1,1,1,1/6/2008,,00:00.0,Engineering,Research and Development,2.75,0.625
7,6,1,00:00.0,32.69,2,adventure-works\jossef0,3.0,Design Engineer,3/11/1959,M,M,1/24/2008,1,1,1,1/24/2008,,00:00.0,Engineering,Research and Development,2.875,0.75
15,14,1,00:00.0,36.06,2,adventure-works\michael8,3.0,Senior Design Engineer,6/16/1979,S,M,12/30/2010,1,1,1,12/30/2010,,00:00.0,Engineering,Research and Development,2.625,0.375
304,291,4,00:00.0,20.0,2,adventure-works\panda,3.0,Data Bamboo-zler,1/1/2021,S,F,6/12/2022,0,1,1,7/4/2022,,00:00.0,Engineering,Research and Development,0.0,100.0


#### Removing organization level 4
The dataframe you are working with is going to be used for analysis of employees working in levels other than organization level 4. Because of this, you will need to remove all records in the dataframe where employees worked in organization level 4.

##### Question 27: Find rows where `OrganizationLevel` is 4
Use a filter or the `.loc` property to get a filtered dataframe where `OrganizationLevel` **is** 4.

<p style="font-size:.75rem">Expected output: 190 rows, starting with index 9 (LoginID is diane1) and ending with index 274 (LoginID is reinout0)</p>

In [100]:
df.loc[ df['OrganizationLevel'] == 4 ]

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,SickLeaveDays,VacationDays
9,8,6,00:00.0,40.87,2,adventure-works\diane1,4.0,Research and Development Engineer,6/5/1986,S,F,12/29/2008,1,1,1,12/29/2008,,00:00.0,Research and Development,Research and Development,6.375,7.750
10,9,6,00:00.0,40.87,2,adventure-works\gigi0,4.0,Research and Development Engineer,1/21/1979,M,F,1/16/2009,1,1,1,1/16/2009,,00:00.0,Research and Development,Research and Development,6.375,7.875
11,10,6,00:00.0,42.48,2,adventure-works\michael6,4.0,Research and Development Manager,11/30/1984,M,M,5/3/2009,1,1,1,5/3/2009,,00:00.0,Research and Development,Research and Development,8.000,2.000
13,12,2,00:00.0,25.00,2,adventure-works\thierry0,4.0,Tool Designer,7/29/1959,M,M,12/11/2007,0,1,1,12/11/2007,,00:00.0,Tool Design,Research and Development,3.000,1.125
14,13,2,00:00.0,25.00,2,adventure-works\janice0,4.0,Tool Designer,5/28/1989,M,F,12/23/2010,0,1,1,12/23/2010,,00:00.0,Tool Design,Research and Development,3.000,1.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
270,257,5,00:00.0,18.27,2,adventure-works\eric2,4.0,Buyer,9/17/1972,S,M,1/27/2010,0,1,1,1/27/2010,,00:00.0,Purchasing,Inventory Management,5.875,6.750
271,258,5,00:00.0,18.27,2,adventure-works\erin0,4.0,Buyer,1/4/1971,S,F,1/31/2010,0,1,1,1/31/2010,,00:00.0,Purchasing,Inventory Management,5.750,6.625
272,259,5,00:00.0,18.27,2,adventure-works\ben0,4.0,Buyer,6/3/1973,M,M,3/9/2010,0,1,1,3/9/2010,,00:00.0,Purchasing,Inventory Management,5.875,6.875
273,260,5,00:00.0,12.75,2,adventure-works\annette0,4.0,Purchasing Assistant,1/29/1978,M,F,12/6/2010,0,1,1,12/6/2010,,00:00.0,Purchasing,Inventory Management,5.625,6.250


##### Question 28: Get the indexes where `OrganizationLevel` is 4
Get a list of indexes where `OrganizationLevel` is 4. To do this, simply take the code you used above and use the property `.index` on it.

<p style="font-size:.75rem">Expected output: Int64Index of length 190 and dtype int64, starting with index 9 and ending with index 274</p>

In [101]:
df.loc[ df['OrganizationLevel'] == 4 ].index

Int64Index([  9,  10,  11,  13,  14,  31,  32,  33,  34,  35,
            ...
            265, 266, 267, 268, 269, 270, 271, 272, 273, 274],
           dtype='int64', length=190)

##### Question 29: Save the indexes to a variable
Take the code from above and save the list of indexes to a variable called `level_4_indexes`.

<p style="font-size:.75rem">Expected output: None</p>

In [102]:
level_4_indexes = df.loc[ df['OrganizationLevel'] == 4 ].index

##### Question 30: Drop the records with organization level 4
Use the `.drop()` method to drop the records with organization level 4. Remember that the indexes of people who have organization level 4 are already stored in the variable `level_4_indexes`. You can pass this variable to the `.drop()` method using `index=level_4_indexes`. Make sure to overwrite the original dataframe.

<p style="font-size:.75rem">Expected output: None</p>

In [103]:
df.drop(index=level_4_indexes, inplace=True)
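
Collecting labels and dropping them is the approach this assignment practices. For comparison, the sketch below shows that keeping only the rows where the level is *not* 4 gives the same result on a small made-up dataframe. `demo` is hypothetical, and the `NaN` stands in for a missing `OrganizationLevel`.

```python
import pandas as pd

# Hypothetical levels; NaN stands in for a missing OrganizationLevel.
demo = pd.DataFrame({"OrganizationLevel": [float("nan"), 1.0, 4.0, 3.0, 4.0]})

# Approach used in this assignment: collect the labels, then drop them.
level_4_labels = demo.loc[demo["OrganizationLevel"] == 4].index
dropped = demo.drop(index=level_4_labels)

# Alternative: keep only the rows where the level is not 4.
# A missing value is "not equal to 4", so that row is kept as well.
filtered = demo.loc[demo["OrganizationLevel"] != 4]

print(dropped.index.tolist())
print(dropped.equals(filtered))
```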

##### Question 31: Print out the value counts
Print out the value counts of the `OrganizationLevel` column by selecting the column and then using the `.value_counts()` method on it. Make sure that no employees with an `OrganizationLevel` of 4 still exist in the dataframe.

<p style="font-size:.75rem">Expected output: Value counts Series that shows 75 occurrences of 3.0, 27 occurrences of 2.0, and 11 occurrences of 1.0. 4.0 is not listed at all </p>

In [111]:
df['OrganizationLevel'].value_counts()

3.0    75
2.0    27
1.0    11
Name: OrganizationLevel, dtype: int64
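
Keep in mind that `.value_counts()` leaves out missing values by default, so a row with an empty `OrganizationLevel` (such as the CEO's row in the earlier output) is not counted. The sketch below shows the difference on a made-up Series; `levels` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical levels; the missing value mimics an empty OrganizationLevel.
levels = pd.Series([float("nan"), 1.0, 3.0, 3.0, 2.0], name="OrganizationLevel")

# By default, missing values are left out of the counts.
default_counts = levels.value_counts()
print(default_counts.sum())

# dropna=False adds an entry for the missing values.
all_counts = levels.value_counts(dropna=False)
print(all_counts.sum())
```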

#### Sorting the dataframe
The CEO wants to make sure that none of the employees have a higher rate of pay than he does. Sort the dataframe by `Rate` using the `.sort_values()` method.

##### Question 32: Sort the dataframe by `Rate`
Sort the dataframe by the `Rate` column. You can do this by using the `.sort_values()` method. Do not pass in any other arguments to the `.sort_values()` method.

<p style="font-size:.75rem">Expected output: Dataframe sorted in ascending order based on Rate</p>

In [105]:
df.sort_values("Rate")

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,SickLeaveDays,VacationDays
228,224,8,00:00.0,8.62,2,adventure-works\william0,3.0,Scheduling Assistant,11/6/1981,M,M,1/7/2009,0,1,1,9/1/2011,,00:00.0,Production Control,Manufacturing,5.250,5.625
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development,10.000,6.000
4,4,2,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,1,1,5/31/2010,,00:00.0,Tool Design,Research and Development,10.000,6.000
227,224,7,00:00.0,8.62,2,adventure-works\william0,3.0,Scheduling Assistant,11/6/1981,M,M,1/7/2009,0,1,1,1/7/2009,8/31/2011,00:00.0,Production,Manufacturing,5.250,5.625
238,233,14,00:00.0,9.75,2,adventure-works\magnus0,3.0,Facilities Administrative Assistant,8/27/1971,M,M,12/21/2009,0,1,1,12/21/2009,,00:00.0,Facilities and Maintenance,Executive General and Administration,7.875,10.875
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
242,234,16,00:00.0,60.10,2,adventure-works\laura1,1.0,Chief Financial Officer,1/6/1976,M,F,1/31/2009,1,1,1,11/14/2013,,00:00.0,Executive,Executive General and Administration,2.500,0.000
1,2,1,00:00.0,63.46,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,1,1/31/2008,,00:00.0,Engineering,Research and Development,2.500,0.125
286,273,3,00:00.0,72.12,2,adventure-works\brian3,1.0,Vice President of Sales,6/6/1977,S,M,2/15/2011,1,1,1,2/15/2011,,00:00.0,Sales,Sales and Marketing,3.125,1.250
28,25,7,00:00.0,84.13,2,adventure-works\james1,1.0,Vice President of Production,1/7/1983,S,M,2/3/2009,1,1,1,2/3/2009,,00:00.0,Production,Manufacturing,6.500,8.000


##### Question 33: Sort by `Rate` descending
Use the same code as above, but pass the appropriate argument to sort the dataframe by `Rate` in descending order.

<p style="font-size:.75rem">Expected output: Dataframe sorted in descending order based on the Rate column</p>

In [106]:
df.sort_values("Rate", ascending=False)

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,SickLeaveDays,VacationDays
0,1,16,00:00.0,125.50,2,adventure-works\ken0,,Chief Executive Officer,1/29/1969,S,M,1/14/2009,1,1,1,1/14/2009,,00:00.0,Executive,Executive General and Administration,8.625,12.375
28,25,7,00:00.0,84.13,2,adventure-works\james1,1.0,Vice President of Production,1/7/1983,S,M,2/3/2009,1,1,1,2/3/2009,,00:00.0,Production,Manufacturing,6.500,8.000
286,273,3,00:00.0,72.12,2,adventure-works\brian3,1.0,Vice President of Sales,6/6/1977,S,M,2/15/2011,1,1,1,2/15/2011,,00:00.0,Sales,Sales and Marketing,3.125,1.250
1,2,1,00:00.0,63.46,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,1,1/31/2008,,00:00.0,Engineering,Research and Development,2.500,0.125
242,234,16,00:00.0,60.10,2,adventure-works\laura1,1.0,Chief Financial Officer,1/6/1976,M,F,1/31/2009,1,1,1,11/14/2013,,00:00.0,Executive,Executive General and Administration,2.500,0.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238,233,14,00:00.0,9.75,2,adventure-works\magnus0,3.0,Facilities Administrative Assistant,8/27/1971,M,M,12/21/2009,0,1,1,12/21/2009,,00:00.0,Facilities and Maintenance,Executive General and Administration,7.875,10.875
227,224,7,00:00.0,8.62,2,adventure-works\william0,3.0,Scheduling Assistant,11/6/1981,M,M,1/7/2009,0,1,1,1/7/2009,8/31/2011,00:00.0,Production,Manufacturing,5.250,5.625
228,224,8,00:00.0,8.62,2,adventure-works\william0,3.0,Scheduling Assistant,11/6/1981,M,M,1/7/2009,0,1,1,9/1/2011,,00:00.0,Production Control,Manufacturing,5.250,5.625
4,4,2,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,1,1,5/31/2010,,00:00.0,Tool Design,Research and Development,10.000,6.000
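
One way to confirm that a descending sort worked is to check the sorted column with `is_monotonic_decreasing`. The sketch below uses invented values in a hypothetical `toy` dataframe rather than the assignment data:

```python
import pandas as pd

# Hypothetical toy data, not taken from pay_history.csv.
toy = pd.DataFrame({"Rate": [63.46, 125.50, 8.62, 84.13]})

# ascending=False puts the largest Rate first.
desc = toy.sort_values("Rate", ascending=False)

print(desc["Rate"].tolist())                 # [125.5, 84.13, 63.46, 8.62]
print(desc["Rate"].is_monotonic_decreasing)  # True
```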


##### Question 34: Sort by `Rate` ascending and `LoginID` descending
Use appropriate arguments to sort the dataframe again by the `Rate` column in ascending order and the `LoginID` column in descending order.

<p style="font-size:.75rem">Expected output: Dataframe sorted in ascending order for the Rate column and descending order for the LoginID column</p>

In [107]:
df.sort_values(['Rate', 'LoginID'], ascending=[True, False])

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,SickLeaveDays,VacationDays
227,224,7,00:00.0,8.62,2,adventure-works\william0,3.0,Scheduling Assistant,11/6/1981,M,M,1/7/2009,0,1,1,1/7/2009,8/31/2011,00:00.0,Production,Manufacturing,5.250,5.625
228,224,8,00:00.0,8.62,2,adventure-works\william0,3.0,Scheduling Assistant,11/6/1981,M,M,1/7/2009,0,1,1,9/1/2011,,00:00.0,Production Control,Manufacturing,5.250,5.625
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development,10.000,6.000
4,4,2,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,1,1,5/31/2010,,00:00.0,Tool Design,Research and Development,10.000,6.000
238,233,14,00:00.0,9.75,2,adventure-works\magnus0,3.0,Facilities Administrative Assistant,8/27/1971,M,M,12/21/2009,0,1,1,12/21/2009,,00:00.0,Facilities and Maintenance,Executive General and Administration,7.875,10.875
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
242,234,16,00:00.0,60.10,2,adventure-works\laura1,1.0,Chief Financial Officer,1/6/1976,M,F,1/31/2009,1,1,1,11/14/2013,,00:00.0,Executive,Executive General and Administration,2.500,0.000
1,2,1,00:00.0,63.46,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,1,1/31/2008,,00:00.0,Engineering,Research and Development,2.500,0.125
286,273,3,00:00.0,72.12,2,adventure-works\brian3,1.0,Vice President of Sales,6/6/1977,S,M,2/15/2011,1,1,1,2/15/2011,,00:00.0,Sales,Sales and Marketing,3.125,1.250
28,25,7,00:00.0,84.13,2,adventure-works\james1,1.0,Vice President of Production,1/7/1983,S,M,2/3/2009,1,1,1,2/3/2009,,00:00.0,Production,Manufacturing,6.500,8.000
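
When `ascending` is a list, each entry applies to the column at the same position in the list of sort keys. The following sketch shows this on a small, made-up dataframe (the `toy` name and its `LoginID` values are illustrative only):

```python
import pandas as pd

# Hypothetical toy data: three rows tie on Rate and differ on LoginID.
toy = pd.DataFrame({
    "Rate": [9.75, 8.62, 8.62, 8.62],
    "LoginID": ["magnus0", "rob0", "william0", "terri0"],
})

# Rate ascending first; ties on Rate are broken by LoginID descending.
result = toy.sort_values(["Rate", "LoginID"], ascending=[True, False])

print(result["LoginID"].tolist())  # ['william0', 'terri0', 'rob0', 'magnus0']
```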


##### Question 35: Save the sorted dataframe
Reuse the code from Question 34 and assign the sorted dataframe to the variable `df`. This overwrites the old dataframe.

<p style="font-size:.75rem">Expected output: None</p>

In [108]:
df = df.sort_values(['Rate', 'LoginID'], ascending=[True, False])
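
The reassignment matters because `sort_values()` returns a new dataframe by default and leaves the original untouched. A minimal sketch with a hypothetical `toy` dataframe (values invented for illustration):

```python
import pandas as pd

# Hypothetical toy data, not taken from pay_history.csv.
toy = pd.DataFrame({"Rate": [63.46, 8.62, 43.27]})

toy.sort_values("Rate")        # result is discarded; toy is unchanged
print(toy.index.tolist())      # [0, 1, 2]

toy = toy.sort_values("Rate")  # reassigning keeps the sorted result
print(toy.index.tolist())      # [1, 2, 0]
```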

##### Question 36: Reset the indexes
Use the `.sort_index()` method to return the rows to their previous order, sorted by index label. Assign the result back to the variable `df` so that it overwrites the existing dataframe.

<p style="font-size:.75rem">Expected output: None</p>

In [109]:
df = df.sort_index()
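
Note that `.sort_index()` reorders rows by their existing index labels, whereas `.reset_index(drop=True)` keeps the current row order and assigns fresh labels. The sketch below contrasts the two on a hypothetical `toy` dataframe (values invented for illustration):

```python
import pandas as pd

# Hypothetical toy data, sorted by Rate so the index labels are out of order.
toy = pd.DataFrame({"Rate": [63.46, 8.62, 43.27]}).sort_values("Rate")

# sort_index(): rows go back to label order 0, 1, 2.
restored = toy.sort_index()

# reset_index(drop=True): rows stay sorted by Rate; labels are renumbered.
renumbered = toy.reset_index(drop=True)

print(restored["Rate"].tolist())    # [63.46, 8.62, 43.27]
print(renumbered["Rate"].tolist())  # [8.62, 43.27, 63.46]
```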

##### Question 37: Print out the final dataframe
Use the `.head()` method to print out the first five rows of the dataframe. Make sure that the rows are ordered by their index.

<p style="font-size:.75rem">Expected output: 5 rows of the dataframe, indexes 0-4. The dataframe is ordered by the index in ascending order</p>

In [110]:
df.head()

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department,SickLeaveDays,VacationDays
0,1,16,00:00.0,125.5,2,adventure-works\ken0,,Chief Executive Officer,1/29/1969,S,M,1/14/2009,1,1,1,1/14/2009,,00:00.0,Executive,Executive General and Administration,8.625,12.375
1,2,1,00:00.0,63.46,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,1,1/31/2008,,00:00.0,Engineering,Research and Development,2.5,0.125
2,3,1,00:00.0,43.27,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,1,1,11/11/2007,,00:00.0,Engineering,Research and Development,2.625,0.25
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development,10.0,6.0
4,4,2,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,1,1,5/31/2010,,00:00.0,Tool Design,Research and Development,10.0,6.0
