# Advanced Pandas Assignment 2

In this assignment, you will practice grouping and aggregating data in Pandas dataframes. You will also practice using string operations on text columns.

### Note about assignments
You can add lines of code according to your preferences. As long as the code required by the assignment is found in this notebook under the corresponding question header (i.e., the answer to question 1 is underneath the title "Question 1"), you will receive credit for it.

## About the data
The data used in this assignment is a table built from the Human Resources schema of the Adventure Works 2019 database. Each row records one change to an employee's pay rate, taken from the Employee Pay History. Each row also contains information about the employee and the department they were working in when they received the pay rate listed.

The actual data is stored in a CSV file located inside the `data` folder. The file is called `pay_history.csv`.

## Instructions
### Set up
##### Import Pandas
Import the Pandas library into Jupyter Lab using the conventional alias `pd`, which the code below relies on.

##### Disable column display limit
Use the following code to disable the default limit for displaying columns. If you don't use this code, a data set with more than 20 columns will be truncated when displayed to take up less space.

```
pd.options.display.max_columns = None
```

##### Create the dataframe
Use the `read_csv()` function from Pandas to read the data from the `pay_history.csv` file into a dataframe called `df`.
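
As a rough sketch of how `read_csv()` behaves, the example below reads a small in-memory CSV instead of the assignment file. The column names and values are invented for illustration, and the name `toy` is hypothetical. With the real file you would pass a file path as the first argument rather than a `StringIO` object.

```python
import io

import pandas as pd

# Invented CSV text standing in for a file on disk (illustration only).
csv_text = "EmployeeID,Rate\n1,25.0\n2,40.5\n"

# read_csv() accepts a path or any file-like object and returns a DataFrame.
toy = pd.read_csv(io.StringIO(csv_text))

# The first line of the CSV becomes the column names.
print(toy.head())
```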

##### Preview dataframe
Use the `.head()` method to print out the first 5 rows of the dataframe.

### Questions
#### Top Paid Employee
##### Question 1: Get the `Rate` column
Print out the `Rate` column from the dataframe.

##### Question 2: Get the top `Rate` among all employees
Use the `.max()` method on the `Rate` column to find the top pay rate among all employees. Print it out.

##### Question 3: Save max rate as a variable
The previous code returns a single number. Copy the code you wrote above and save the result to a variable called `max_rate`.

##### Question 4: Get record for employee(s) who have the top rate
Create a filter (a boolean Series) that finds the records in the dataframe whose `Rate` is equal to `max_rate`. Save the filter to a variable called `filt`.

##### Question 5: Get the top paid employees' records
Use the `filt` variable to get back the rows in the dataframe with information about the top-earning employee(s).
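
The filtering pattern described in the last two questions can be sketched on a toy dataframe. The names `toy`, `top_score`, and `mask` and all of the values are made up for illustration; they are not part of the assignment data.

```python
import pandas as pd

# Invented data: two people share the top score.
toy = pd.DataFrame({"Name": ["Ana", "Ben", "Cy"], "Score": [7, 9, 9]})

# A single number: the largest value in the column.
top_score = toy["Score"].max()

# Comparing a column to a value gives a boolean Series, one True/False per row.
mask = toy["Score"] == top_score

# Passing the boolean Series in square brackets keeps only the True rows.
print(toy[mask])
```

Because the comparison is done row by row, ties are kept: every row that matches the maximum is returned, not just the first one.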

##### Question 6: What is the Job Title and Organization Level of the top paid employee?
```
Job Title: Chief Executive Officer
Organization Level: NaN
```

#### Top Paid Employee by Organization Level
##### Question 7: Group by `OrganizationLevel`
Use the `.groupby()` method to create a `GroupBy` object. Group by the column `OrganizationLevel`.

##### Question 8: Use the `.agg()` method
Chain the `.agg()` method onto the code you wrote above to find the maximum rate in each organization level. Print out the resulting dataframe.

Remember that the `.agg()` method can take a dictionary whose keys are the columns to aggregate and whose values specify *how* to aggregate them.
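
To make the dictionary form concrete, here is a minimal sketch on invented data (the `Team` and `Points` columns and the variable names are hypothetical). One row deliberately has a missing group key to show what happens to it.

```python
import pandas as pd

# Invented data; the last row has no team.
toy = pd.DataFrame(
    {"Team": ["A", "A", "B", "B", None], "Points": [3, 8, 5, 2, 10]}
)

# Grouping alone does no computation yet; it only records how to split the rows.
grouped = toy.groupby("Team")

# Key = column to aggregate, value = name of the aggregation to apply.
max_points = grouped.agg({"Points": "max"})
print(max_points)
```

Note that rows whose group key is missing are left out of the result by default, so the row with 10 points does not appear. In recent pandas versions, `groupby("Team", dropna=False)` keeps such rows as their own group.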

##### Question 9: Rank the maximum rates
Looking at the aggregation performed above, rank the maximum pay rates across organization levels.

```
Highest paid: ?
2nd highest paid: ?
3rd highest paid: ?
Lowest paid: ?
```

##### Question 10: Aggregate by max and average `Rate`
Add to the code above and find both the maximum and average pay rates across each organization level. Print out the resulting dataframe.
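
One way to request several aggregations for the same column is to give a list as the dictionary value. The sketch below uses invented data and hypothetical names (`toy`, `summary`).

```python
import pandas as pd

# Invented data for illustration.
toy = pd.DataFrame({"Team": ["A", "A", "B", "B"], "Points": [3, 8, 5, 2]})

# A list of aggregation names produces one output column per aggregation.
summary = toy.groupby("Team").agg({"Points": ["max", "mean"]})
print(summary)

# The result has two-level column labels: (original column, aggregation).
print(summary[("Points", "mean")])
```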

##### Question 11: Rank the average pay rates
Looking at the aggregation performed above, rank the average pay rates across organization levels.

```
Highest paid: ?
2nd highest paid: ?
3rd highest paid: ?
Lowest paid: ?
```

##### Question 12: What do you notice?
Observe the rankings of maximum and average rates of pay. What do you notice about the differences between these rankings? Why might the rankings be different? There is no correct answer, but you should try to think of something that could be further explored.

```
Your answer here.
```

#### Hire Date
##### Question 13: Print out the dataframe
Use the `.head()` method to print out the first five rows of the dataframe.

##### Question 14: Get data types
Use the `.info()` method to get the data types for each column in the dataframe.

##### Question 15: Data type of `HireDate`
Look at the data types reported by the `.info()` method. What is the data type of `HireDate`? What does that mean?

```
Your answer here.
```

##### Question 16: Split the `HireDate` column
Using the `.str` accessor object and the `.split()` method, get a Series of values in `HireDate` that are split by the forward slash `/` symbol.
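
As an illustration of `.str.split()`, the sketch below splits an invented Series of slash-separated codes. The values are made up and are not meant to match the format of `HireDate`.

```python
import pandas as pd

# Invented slash-separated strings.
codes = pd.Series(["A/12/2001", "B/7/1999"])

# .str applies the string method to every element; each result is a list.
parts = codes.str.split("/")
print(parts)
```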

##### Question 17: Observe split results
Observe the results of splitting the `HireDate` column by the forward slash `/` symbol. How many items does each list have? What might they represent?

```
Your answer here.
```

##### Question 18: Get the last value from each list
Use the `.str` accessor object on the code you wrote previously to get the last value out of each split list. Print out the resulting Series.

##### Question 19: Create column `HireYear`
Using the code you wrote above, create a new column in the dataframe called `HireYear`.
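
The two steps above (indexing into the split lists, then storing the result as a new column) can be sketched on invented data. The column names `Code` and `Suffix` are hypothetical.

```python
import pandas as pd

# Invented data for illustration.
toy = pd.DataFrame({"Code": ["A/12/2001", "B/7/1999"]})

# After a split, .str[...] indexes into each list; -1 means the last item.
last_part = toy["Code"].str.split("/").str[-1]

# Assigning a Series to a new column name adds that column to the dataframe.
toy["Suffix"] = last_part
print(toy)
```

The extracted values are still strings, not numbers, because they come from splitting text.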

##### Question 20: Print out the resulting dataframe
Print out the resulting dataframe. Make sure it has a column called `HireYear` that contains four-digit years.

#### Hiring rates
Using the `HireYear` column you just created, you will count how many employees were hired in each year.

##### Question 21: Group by `HireYear`
Create a `GroupBy` object where rows are grouped by the year they were hired. Print it out, but don't save it to a variable.

##### Question 22: Aggregate by counting
Use the `.agg()` method to figure out how many employees were hired in each year. Count each `EmployeeID` using `count`.

##### Question 23: Aggregating by counting unique
Wait! You just realized that this dataset contains data for each pay rate change for all employees. Because employees can appear several times in the data set, you might actually be counting each employee several times.

Change the aggregation code above to use `nunique` instead of `count`.
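
The difference between the two aggregations is easiest to see side by side. In this sketch on invented data (hypothetical `Year` and `PersonID` columns), person 1 appears twice in the same year.

```python
import pandas as pd

# Invented data: person 1 has two rows in 2001.
toy = pd.DataFrame(
    {"Year": ["2001", "2001", "2001", "2002"], "PersonID": [1, 1, 2, 3]}
)

# count tallies non-missing rows; nunique tallies distinct values.
counts = toy.groupby("Year").agg({"PersonID": ["count", "nunique"]})
print(counts)
```

For 2001, `count` reports three rows while `nunique` reports two distinct people.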

##### Question 24: Best growth year
Which year saw the most growth in terms of number of employees? What does this make you think about the organization?

```
Your answer here.
```

#### Assistant vacation days expectation
Anna knows you from your past life and is seeking employment at Adventure Works as an assistant. She doesn't know what or who she would be an assistant to, but she wants you to help her get the inside scoop on how many vacation days she might get if hired.

##### Question 25: Get rows with 'Assistant' in the `JobTitle`
Using the `.str` accessor object and the `.contains()` method, get a Series of boolean (True/False) values that indicate which rows contain the word "Assistant" in the column `JobTitle`. Note that `.contains()` is case sensitive by default. (Because the printed output is truncated, all of the rows you can see will likely say `False`.)
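
A small sketch of how case sensitivity affects `.str.contains()`, using invented job titles. The `case=False` argument is an optional extra shown for comparison, not something the question asks for.

```python
import pandas as pd

# Invented titles with different capitalization.
titles = pd.Series(["Sales Assistant", "assistant editor", "Manager"])

# Default behavior: the match is case sensitive.
exact = titles.str.contains("Assistant")

# case=False ignores capitalization.
loose = titles.str.contains("assistant", case=False)

# Summing a boolean Series counts the True values, which is a quick way
# to check for matches when the printed output is truncated.
print(exact.sum(), loose.sum())
```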

##### Question 26: Save to variable
Save the code you wrote above to a variable called `assistant_filter`.

##### Question 27: Get rows with assistants
Pass the variable `assistant_filter` into the dataframe in the same way that you would a normal filter. Make sure that the rows returned have a `JobTitle` with the word "Assistant" in it.

##### Question 28: Save the filtered dataframe
Using the code above, save the filtered dataframe to a new variable called `assistant_df`.

##### Question 29: Find the average vacation hours
Use the `.mean()` method on the `VacationHours` column of the `assistant_df` dataframe to find the average number of vacation hours for assistants.

##### Question 30: Group by `Sub-Department`
Create a `GroupBy` object by grouping the rows in `assistant_df` by the column `Sub-Department`.

##### Question 31: Aggregate across Sub-Departments
Use the `.agg()` method to aggregate across `Sub-Department` using `VacationHours`. Get the count, mean, and median.

##### Question 32: Add aggregation for `SickLeaveHours`
Add to the code above to include aggregations across `Sub-Department` using `SickLeaveHours`. Get the mean and median of `SickLeaveHours`.

##### Question 33: Add aggregation for `Rate`
Add to the code above to include aggregations across `Sub-Department` using `Rate`. Get the mean and median of `Rate`.
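
Building up an aggregation one column at a time, as in the last three questions, amounts to adding entries to the dictionary passed to `.agg()`. The sketch below shows the general shape on invented data with hypothetical column names (`Team`, `Hours`, `Pay`).

```python
import pandas as pd

# Invented data for illustration.
toy = pd.DataFrame(
    {
        "Team": ["A", "A", "B"],
        "Hours": [10, 20, 30],
        "Pay": [15.0, 25.0, 40.0],
    }
)

# Each key is a column; each value lists the aggregations for that column.
summary = toy.groupby("Team").agg(
    {
        "Hours": ["count", "mean", "median"],
        "Pay": ["mean", "median"],
    }
)
print(summary)
```

Each column can have its own list of aggregations, so the result here has five columns: three for `Hours` and two for `Pay`.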

##### Question 34: Best sub-department
Looking at the aggregated table above, which of the sub-departments gives assistants the most vacation hours? Which one gives the most sick leave hours? Which one pays the most (use the median)?

```
Best for vacation hours: ?
Best for sick leave hours: ?
Best for pay: ?
```