# Pandas Assignment 2
In this assignment you will practice working with Pandas dataframes.

Specifically, you will carry out the following exercises:

1. Work with Pandas indexes.
2. Use filtering to get subsets of dataframes.
3. Update rows and columns.

### Note about assignments
You can add lines of code according to your preferences. As long as the code required by the assignment is in this notebook file, you will receive credit for it, regardless of any additional code. You will lose points for removing starting code if doing so prevents the notebook from running correctly.

## About the data
The data used in this assignment is a table built from the Human Resources schema of the Adventure Works 2019 database. Each row records one change to an employee's pay rate, taken from the Employee Pay History. Each row also contains information about the employee and the department they were working in when they received the pay rate listed.

The actual data is stored in a `.csv` file located inside the `data` directory. The file is called `pay_history.csv`.

##### Imports, dataframe, etc.

Use the code below to show all columns in the dataframe:

```python
pd.set_option('display.max_columns', None)
```

In [2]:
import pandas as pd

df = pd.read_csv("./data/pay_history.csv")
pd.set_option('display.max_columns', None)

## Instructions

Answer the questions below. Make sure to show your work and your answers as the output to the cells.

### Indexes
##### Question 1: Get the column `DepartmentName` as a Series object. Save it into a variable called `departmentSeries`.

<p style="font-size:.75rem">Expected output: None</p>

In [3]:
departmentSeries = df['DepartmentName']

##### Question 2: Get the value counts of each unique value in the `DepartmentName` series.

<p style="font-size:.75rem">Expected output: Series of value counts for 16 unique department names.</p>

In [4]:
departmentSeries.value_counts()

Production                    180
Sales                          18
Purchasing                     15
Marketing                      11
Finance                        11
Information Services           10
Quality Assurance               8
Engineering                     7
Production Control              7
Facilities and Maintenance      7
Shipping and Receiving          6
Human Resources                 6
Tool Design                     5
Document Control                5
Executive                       4
Research and Development        4
Name: DepartmentName, dtype: int64
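
As a small illustration of what `value_counts()` does, the sketch below uses a made-up Series (the values are invented for the example and are not taken from `pay_history.csv`). It also shows the `normalize=True` option, which returns proportions instead of counts.

```python
import pandas as pd

# Toy data, invented for illustration only
departments = pd.Series(
    ["Production", "Sales", "Production", "Finance", "Production", "Sales"],
    name="DepartmentName",
)

counts = departments.value_counts()                 # sorted from most to least frequent
shares = departments.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares)
```

The result is itself a Series whose index holds the unique values, so a single count can be looked up with `counts["Production"]`.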

##### Question 3: Use the `.loc` property to get rows 25, 80, and 122 and the `JobTitle` column.

<p style="font-size:.75rem">Expected output: Series or dataframe from JobTitle column with three rows, indexes 25, 80, and 122. The job titles are Marketing Specialist, Production Technician - WC30, and Production Technician - WC50.</p>

In [5]:
df.loc[[25, 80, 122], "JobTitle"]

25             Marketing Specialist
80     Production Technician - WC30
122    Production Technician - WC50
Name: JobTitle, dtype: object

##### Question 4: Did the previous question return a dataframe or a Series object? Why might this have happened?

Remember that dataframes are formatted so that when you move your mouse over the rows, they change color. Series objects are displayed as plain text and do not.

<p style="font-size:.75rem">Expected output: None</p>

```
The previous query returned a Series because the column was selected with a single label ("JobTitle") rather than a list of labels. Selecting a single row label together with multiple columns would also have returned a Series.
```
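
The rule described in the answer above can be checked on a toy dataframe. This is a minimal sketch with invented data: a scalar column label gives a Series, a list of column labels gives a dataframe, and a scalar row label with several columns gives a Series again.

```python
import pandas as pd

# Toy frame, invented for illustration only
toy = pd.DataFrame(
    {"JobTitle": ["Analyst", "Engineer", "Manager"], "Rate": [20.0, 35.5, 50.0]}
)

as_series = toy.loc[[0, 2], "JobTitle"]      # scalar column label -> Series
as_frame = toy.loc[[0, 2], ["JobTitle"]]     # list of column labels -> DataFrame
one_row = toy.loc[1, ["JobTitle", "Rate"]]   # scalar row label -> Series

print(type(as_series).__name__, type(as_frame).__name__, type(one_row).__name__)
```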

##### Question 5: Get the second through sixth columns (positions 1 through 5) and the last 5 rows using `.iloc`.

<p style="font-size:.75rem">Expected output: Rows with indexes 298-302 and columns DepartmentID, RateChangeDate, Rate, PayFrequency, and LoginID</p>

In [6]:
df_length = df.shape[0]
df.iloc[df_length-6:df_length-1, 1:6]

Unnamed: 0,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID
298,3,00:00.0,48.101,2,adventure-works\syed0
299,3,00:00.0,23.0769,2,adventure-works\lynn0
300,3,00:00.0,48.101,2,adventure-works\amy0
301,3,00:00.0,23.0769,2,adventure-works\rachel0
302,3,00:00.0,23.0769,2,adventure-works\jae0
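
Positional slicing with `.iloc` follows Python's slice rules: the stop position is excluded, and negative positions count from the end. The sketch below uses an invented ten-row frame to show that `-5:` selects the final five rows without needing the length of the dataframe, whereas a slice that stops at `n-1` leaves out the final row.

```python
import pandas as pd

# Toy frame with 10 rows, invented for illustration only
toy = pd.DataFrame({"a": range(10), "b": range(10, 20), "c": range(20, 30)})

last_five = toy.iloc[-5:, 1:3]        # last five rows, columns at positions 1 and 2
n = toy.shape[0]
stops_early = toy.iloc[n - 6:n - 1]   # rows 4 through 8; row 9 is excluded

print(last_five.index.tolist())
print(stops_early.index.tolist())
```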


### Filtering

##### Question 6: Filter the dataframe to only include rows where the employee is in the `DepartmentName` Engineering.

<p style="font-size:.75rem">Expected output: Rows with indexes 1, 2, 3, 6, 7, 15, and 16</p>

In [7]:
df.loc[ df['DepartmentName'] == 'Engineering' ]

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department
1,2,1,00:00.0,63.4615,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,20,1,1,1/31/2008,,00:00.0,Engineering,Research and Development
2,3,1,00:00.0,43.2692,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,2,21,1,1,11/11/2007,,00:00.0,Engineering,Research and Development
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development
6,5,1,00:00.0,32.6923,2,adventure-works\gail0,3.0,Design Engineer,9/27/1952,M,F,1/6/2008,1,5,22,1,1,1/6/2008,,00:00.0,Engineering,Research and Development
7,6,1,00:00.0,32.6923,2,adventure-works\jossef0,3.0,Design Engineer,3/11/1959,M,M,1/24/2008,1,6,23,1,1,1/24/2008,,00:00.0,Engineering,Research and Development
15,14,1,00:00.0,36.0577,2,adventure-works\michael8,3.0,Senior Design Engineer,6/16/1979,S,M,12/30/2010,1,3,21,1,1,12/30/2010,,00:00.0,Engineering,Research and Development
16,15,1,00:00.0,32.6923,2,adventure-works\sharon0,3.0,Design Engineer,5/2/1961,M,F,1/18/2011,1,4,22,1,1,1/18/2011,,00:00.0,Engineering,Research and Development


##### Question 7: Filter the dataframe to include rows where the employee is either in the `DepartmentName` of Engineering or the `Sub-Department` of Quality Assurance.

<p style="font-size:.75rem">Expected output: Rows with indexes 1-3, 6-7, 15-16, 214-224, 259, and 261, or 20 rows total</p>

In [8]:
df[ (df['DepartmentName'] == 'Engineering') | (df['Sub-Department'] == 'Quality Assurance') ]

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department
1,2,1,00:00.0,63.4615,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,20,1,1,1/31/2008,,00:00.0,Engineering,Research and Development
2,3,1,00:00.0,43.2692,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,2,21,1,1,11/11/2007,,00:00.0,Engineering,Research and Development
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development
6,5,1,00:00.0,32.6923,2,adventure-works\gail0,3.0,Design Engineer,9/27/1952,M,F,1/6/2008,1,5,22,1,1,1/6/2008,,00:00.0,Engineering,Research and Development
7,6,1,00:00.0,32.6923,2,adventure-works\jossef0,3.0,Design Engineer,3/11/1959,M,M,1/24/2008,1,6,23,1,1,1/24/2008,,00:00.0,Engineering,Research and Development
15,14,1,00:00.0,36.0577,2,adventure-works\michael8,3.0,Senior Design Engineer,6/16/1979,S,M,12/30/2010,1,3,21,1,1,12/30/2010,,00:00.0,Engineering,Research and Development
16,15,1,00:00.0,32.6923,2,adventure-works\sharon0,3.0,Design Engineer,5/2/1961,M,F,1/18/2011,1,4,22,1,1,1/18/2011,,00:00.0,Engineering,Research and Development
214,211,13,00:00.0,28.8462,2,adventure-works\hazem0,2.0,Quality Assurance Manager,10/26/1977,S,M,2/28/2009,1,80,60,1,1,2/28/2009,,00:00.0,Quality Assurance,Quality Assurance
215,212,13,00:00.0,21.6346,2,adventure-works\peng0,3.0,Quality Assurance Supervisor,3/18/1976,M,M,12/9/2008,1,81,60,1,1,12/9/2008,,00:00.0,Quality Assurance,Quality Assurance
216,213,13,00:00.0,10.5769,2,adventure-works\sootha0,4.0,Quality Assurance Technician,12/5/1966,M,M,2/23/2010,0,85,62,1,3,2/23/2010,,00:00.0,Quality Assurance,Quality Assurance
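
The parentheses around each comparison matter because `|` binds more tightly than `==` in Python. One way to sidestep that, sketched here on invented data, is to give each condition its own named mask and then combine the masks.

```python
import pandas as pd

# Toy frame, invented for illustration only
toy = pd.DataFrame(
    {
        "DepartmentName": ["Engineering", "Sales", "Quality Assurance", "Document Control"],
        "Sub-Department": [
            "Research and Development",
            "Sales and Marketing",
            "Quality Assurance",
            "Quality Assurance",
        ],
    }
)

is_engineering = toy["DepartmentName"] == "Engineering"
is_qa_group = toy["Sub-Department"] == "Quality Assurance"

either = toy[is_engineering | is_qa_group]   # rows where at least one mask is True
print(either.index.tolist())
```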


##### Question 8: Filter the dataframe to only include rows with the word "Manager" in the `JobTitle`.

<p style="font-size:.75rem">Expected output: 24 rows returned starting with row of index 2 and ending with row of index 300</p>

In [9]:
df.loc[ df['JobTitle'].str.contains("Manager") ]

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department
2,3,1,00:00.0,43.2692,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,2,21,1,1,11/11/2007,,00:00.0,Engineering,Research and Development
8,7,6,00:00.0,50.4808,2,adventure-works\dylan0,3.0,Research and Development Manager,2/24/1987,M,M,2/8/2009,1,61,50,1,1,2/8/2009,,00:00.0,Research and Development,Research and Development
11,10,6,00:00.0,42.4808,2,adventure-works\michael6,4.0,Research and Development Manager,11/30/1984,M,M,5/3/2009,1,16,64,1,1,5/3/2009,,00:00.0,Research and Development,Research and Development
17,16,5,00:00.0,24.0,2,adventure-works\david0,1.0,Marketing Manager,3/19/1975,S,M,12/20/2007,1,40,40,1,1,12/20/2007,7/14/2009,00:00.0,Purchasing,Inventory Management
18,16,4,00:00.0,24.0,2,adventure-works\david0,1.0,Marketing Manager,3/19/1975,S,M,12/20/2007,1,40,40,1,1,7/15/2009,,00:00.0,Marketing,Sales and Marketing
19,16,4,00:00.0,28.75,2,adventure-works\david0,1.0,Marketing Manager,3/19/1975,S,M,12/20/2007,1,40,40,1,1,7/15/2009,,00:00.0,Marketing,Sales and Marketing
29,26,8,00:00.0,24.5192,2,adventure-works\peter0,2.0,Production Control Manager,11/3/1982,M,M,12/1/2008,1,43,41,1,1,12/1/2008,,00:00.0,Production Control,Manufacturing
214,211,13,00:00.0,28.8462,2,adventure-works\hazem0,2.0,Quality Assurance Manager,10/26/1977,S,M,2/28/2009,1,80,60,1,1,2/28/2009,,00:00.0,Quality Assurance,Quality Assurance
220,217,12,00:00.0,17.7885,2,adventure-works\zainal0,3.0,Document Control Manager,1/30/1976,M,M,1/4/2009,0,77,58,1,1,1/4/2009,,00:00.0,Document Control,Quality Assurance
232,227,14,00:00.0,24.0385,2,adventure-works\gary1,2.0,Facilities Manager,2/18/1971,M,M,12/2/2009,1,86,63,1,1,12/2/2009,,00:00.0,Facilities and Maintenance,Executive General and Administration
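
`str.contains()` is case sensitive by default and returns a missing value for missing input, which cannot be used directly as a filter. The sketch below, on an invented Series, shows the `case` and `na` parameters that control this behavior.

```python
import pandas as pd

# Toy data, invented for illustration only; the last entry is missing
titles = pd.Series(
    ["Engineering Manager", "marketing manager", "Buyer", None], name="JobTitle"
)

exact_case = titles.str.contains("Manager", na=False)             # case sensitive
any_case = titles.str.contains("manager", case=False, na=False)   # ignores case

print(exact_case.tolist())
print(any_case.tolist())
```

Passing `na=False` treats missing titles as non-matches, so the result is a clean boolean mask.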


##### Question 9: Get a dataframe of all employees who are not salaried, are not married, and work in the `DepartmentName` of Finance.

<p style="font-size:.75rem">Expected output: 3 rows, indexes 251, 252, 275</p>

In [10]:
df.loc[ ~(df['SalariedFlag'] == 1) & (df['MaritalStatus'] == 'S') & (df['DepartmentName'] == 'Finance')]

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department
251,243,10,00:00.0,19.0,2,adventure-works\candy0,3.0,Accounts Receivable Specialist,2/23/1976,S,F,1/6/2009,0,61,50,1,1,1/6/2009,,00:00.0,Finance,Executive General and Administration
252,244,10,00:00.0,19.0,2,adventure-works\bryan1,3.0,Accounts Receivable Specialist,9/20/1984,S,M,1/24/2009,0,62,51,1,1,1/24/2009,,00:00.0,Finance,Executive General and Administration
275,262,10,00:00.0,13.4615,2,adventure-works\david5,2.0,Assistant to the Chief Financial Officer,6/21/1964,S,M,1/12/2009,0,56,48,1,1,1/12/2009,,00:00.0,Finance,Executive General and Administration
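
When a flag column only holds 0 and 1, `~(flag == 1)` and `flag == 0` select the same rows. The sketch below checks this on invented data and also shows `DataFrame.query()` as an alternative way to write the same three-part filter.

```python
import pandas as pd

# Toy frame, invented for illustration only
toy = pd.DataFrame(
    {
        "SalariedFlag": [1, 0, 0, 0],
        "MaritalStatus": ["S", "S", "M", "S"],
        "DepartmentName": ["Finance", "Finance", "Finance", "Sales"],
    }
)

negated = ~(toy["SalariedFlag"] == 1)
direct = toy["SalariedFlag"] == 0

mask = direct & (toy["MaritalStatus"] == "S") & (toy["DepartmentName"] == "Finance")
with_query = toy.query(
    "SalariedFlag == 0 and MaritalStatus == 'S' and DepartmentName == 'Finance'"
)

print(toy[mask].index.tolist())
print(with_query.index.tolist())
```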


##### Question 10: Get a dataframe of all employees hired in 2009 who are paid more than $40 per hour. Remember that Pandas currently treats the `HireDate` field as a string (use this to your advantage).

<p style="font-size:.75rem">Expected output: 7 rows starting with index 0 and ending with index 242</p>

In [11]:
df.loc[ (df['HireDate'].str.contains('2009')) & (df['Rate'] > 40) ]

Unnamed: 0,EmployeeID,DepartmentID,RateChangeDate,Rate,PayFrequency,LoginID,OrganizationLevel,JobTitle,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours,CurrentFlag,ShiftID,StartDate,EndDate,ModifiedDate,DepartmentName,Sub-Department
0,1,16,00:00.0,125.5,2,adventure-works\ken0,,Chief Executive Officer,1/29/1969,S,M,1/14/2009,1,99,69,1,1,1/14/2009,,00:00.0,Executive,Executive General and Administration
8,7,6,00:00.0,50.4808,2,adventure-works\dylan0,3.0,Research and Development Manager,2/24/1987,M,M,2/8/2009,1,61,50,1,1,2/8/2009,,00:00.0,Research and Development,Research and Development
10,9,6,00:00.0,40.8654,2,adventure-works\gigi0,4.0,Research and Development Engineer,1/21/1979,M,F,1/16/2009,1,63,51,1,1,1/16/2009,,00:00.0,Research and Development,Research and Development
11,10,6,00:00.0,42.4808,2,adventure-works\michael6,4.0,Research and Development Manager,11/30/1984,M,M,5/3/2009,1,16,64,1,1,5/3/2009,,00:00.0,Research and Development,Research and Development
28,25,7,00:00.0,84.1346,2,adventure-works\james1,1.0,Vice President of Production,1/7/1983,S,M,2/3/2009,1,64,52,1,1,2/3/2009,,00:00.0,Production,Manufacturing
241,234,16,00:00.0,48.5577,2,adventure-works\laura1,1.0,Chief Financial Officer,1/6/1976,M,F,1/31/2009,1,0,20,1,1,11/14/2013,,00:00.0,Executive,Executive General and Administration
242,234,16,00:00.0,60.0962,2,adventure-works\laura1,1.0,Chief Financial Officer,1/6/1976,M,F,1/31/2009,1,0,20,1,1,11/14/2013,,00:00.0,Executive,Executive General and Administration
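
Because the dates are strings in `month/day/year` form, the year is always at the end. The sketch below compares two options on a few sample date strings: matching the string suffix with `str.endswith()`, and parsing the strings with `pd.to_datetime()` (the explicit `format` is an assumption about how the column is laid out).

```python
import pandas as pd

# A few sample date strings in the same month/day/year layout
hire = pd.Series(["1/14/2009", "12/20/2007", "2/8/2009"], name="HireDate")

by_suffix = hire.str.endswith("2009")              # pure string comparison

parsed = pd.to_datetime(hire, format="%m/%d/%Y")   # convert to real datetimes
by_year = parsed.dt.year == 2009                   # compare the year component

print(by_suffix.tolist())
print(by_year.tolist())
```

Both masks agree here. Matching the suffix is slightly stricter than `str.contains('2009')`, which would also match "2009" anywhere else in the string.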


### Updating Rows and Columns

##### Question 11: Change the column names to be all lowercase. Make sure that this change is reflected in the original dataframe (i.e., print it out).

<p style="font-size:.75rem">Expected output: DataFrame with lowercase column names</p>

In [12]:
df.columns = df.columns.str.lower()

##### Question 12: Use the `.map()` method to change the values in the `maritalstatus` column to say either Married or Not Married.

<p style="font-size:.75rem">Expected output: A Series whose values are either "Not Married" or "Married"</p>

In [13]:
df['maritalstatus'].map({"M" : "Married", "S": "Not Married"})

0      Not Married
1      Not Married
2          Married
3      Not Married
4      Not Married
          ...     
299    Not Married
300        Married
301    Not Married
302        Married
303    Not Married
Name: maritalstatus, Length: 304, dtype: object
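
Two details of `.map()` are easy to miss: it returns a new Series rather than changing the original, and any value that is not a key in the dictionary becomes `NaN`. The sketch below uses invented data, including a made-up code `"D"` that is not in the mapping.

```python
import pandas as pd

# Toy data; "D" is a made-up code that the mapping does not cover
status = pd.Series(["M", "S", "M", "D"], name="maritalstatus")
labels = {"M": "Married", "S": "Not Married"}

mapped = status.map(labels)

print(mapped.tolist())   # the unmapped "D" becomes NaN
print(status.tolist())   # the original Series is unchanged
```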

##### Question 13: Use the `.replace()` method to replace "Chief Executive Officer" with "President and CEO" in the column `jobtitle`. Show the first five rows of the dataframe.

<p style="font-size:.75rem">Expected output: 5 rows, indexes 0-4</p>

In [14]:
df['jobtitle'] = df['jobtitle'].replace({'Chief Executive Officer': 'President and CEO'})
df.head()

Unnamed: 0,employeeid,departmentid,ratechangedate,rate,payfrequency,loginid,organizationlevel,jobtitle,birthdate,maritalstatus,gender,hiredate,salariedflag,vacationhours,sickleavehours,currentflag,shiftid,startdate,enddate,modifieddate,departmentname,sub-department
0,1,16,00:00.0,125.5,2,adventure-works\ken0,,President and CEO,1/29/1969,S,M,1/14/2009,1,99,69,1,1,1/14/2009,,00:00.0,Executive,Executive General and Administration
1,2,1,00:00.0,63.4615,2,adventure-works\terri0,1.0,Vice President of Engineering,8/1/1971,S,F,1/31/2008,1,1,20,1,1,1/31/2008,,00:00.0,Engineering,Research and Development
2,3,1,00:00.0,43.2692,2,adventure-works\roberto0,2.0,Engineering Manager,11/12/1974,M,M,11/11/2007,1,2,21,1,1,11/11/2007,,00:00.0,Engineering,Research and Development
3,4,1,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,12/5/2007,5/30/2010,00:00.0,Engineering,Research and Development
4,4,2,00:00.0,8.62,2,adventure-works\rob0,3.0,Senior Tool Designer,12/23/1974,S,M,12/5/2007,0,48,80,1,1,5/31/2010,,00:00.0,Tool Design,Research and Development
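
`Series.replace()` with a dictionary matches whole cell values, not substrings. The sketch below contrasts it with `Series.str.replace()`, which works inside each string. The third title is invented to make the difference visible.

```python
import pandas as pd

# Toy data; the third title is invented for illustration
titles = pd.Series(
    [
        "Chief Executive Officer",
        "Chief Financial Officer",
        "Assistant to the Chief Executive Officer",
    ],
    name="jobtitle",
)

# Whole-value replacement: only cells equal to the key are changed
whole = titles.replace({"Chief Executive Officer": "President and CEO"})

# Substring replacement: every occurrence inside each string is changed
partial = titles.str.replace("Chief Executive Officer", "President and CEO", regex=False)

print(whole.tolist())
print(partial.tolist())
```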


##### EXTRA CREDIT Question 14: Create a function that can grab the employee's username out of the `loginid`. Apply the function to the `loginid` Series.

Use the function `get_username()` provided to you below and apply it to the column `loginid` using the `.apply()` method. Note that you do not need to save this new Series as a column since that has not been covered yet.

<p style="font-size:.75rem">Expected output: A Series of usernames, starting with "ken0", "terri0", "roberto0", etc.</p>

In [16]:
def get_username(login):
    return login.split('\\')[-1]

df['loginid'].apply(get_username)

0          ken0
1        terri0
2      roberto0
3          rob0
4          rob0
         ...   
299       lynn0
300        amy0
301     rachel0
302        jae0
303     ranjit0
Name: loginid, Length: 304, dtype: object
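
For comparison, the same result can usually be reached without `.apply()` by using the vectorized `.str` accessor. This is a sketch on two sample login strings, not a required part of the assignment.

```python
import pandas as pd

# Two sample login strings in the domain\username layout
logins = pd.Series([r"adventure-works\ken0", r"adventure-works\terri0"], name="loginid")

def get_username(login):
    # Keep the part after the last backslash
    return login.split("\\")[-1]

applied = logins.apply(get_username)

# Vectorized equivalent: split on the backslash, then take the last piece
vectorized = logins.str.split("\\").str[-1]

print(applied.tolist())
print(vectorized.tolist())
```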

##### EXTRA CREDIT Question 15: Use the `.apply()` method to change the `rate` column to be rounded to two decimal places.

Use the function `round_to_two_decimals()` provided to you below and apply it to the column `rate` using the `.apply()` method. Note that you do not need to save this new Series as a column since that has not been covered yet.

<p style="font-size:.75rem">Expected output: A Series of decimal numbers, rounded to two decimal places and starting with 125.50, 63.46, and 43.27</p>

In [18]:
def round_to_two_decimals(rate):
    return round(rate,2)

df['rate'].apply(round_to_two_decimals)

0      125.50
1       63.46
2       43.27
3        8.62
4        8.62
        ...  
299     23.08
300     48.10
301     23.08
302     23.08
303     23.08
Name: rate, Length: 304, dtype: float64
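
As with the username example, rounding does not strictly need `.apply()`: `Series.round()` performs the same operation on the whole column at once. The sketch below compares the two on three sample rates.

```python
import pandas as pd

# Three sample rates
rates = pd.Series([125.5, 63.4615, 43.2692], name="rate")

def round_to_two_decimals(rate):
    return round(rate, 2)

applied = rates.apply(round_to_two_decimals)   # calls the function once per value
vectorized = rates.round(2)                    # rounds the whole Series at once

print(applied.tolist())
print(vectorized.tolist())
```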