# 🎯 Pandas Practice 2: Targeting Weak Points

## Focus Areas
Based on our previous session, this notebook focuses specifically on the syntax areas where you had doubts:
1. **Brackets vs Parentheses** (The `.loc` and `.iloc` rules)
2. **Boolean Logic** (`&`, `|`, and the importance of `()`)
3. **Column Mechanics** (Creating vs Selecting, and the "Reassignment Rule")
4. **Advanced Grouping** (Syntax for `.agg()` and multi-column stats)
5. **Merging** (Chaining multiple merges)

**Instructions:**
Same as before: Try to solve without looking up the syntax!


## Setup
Run this cell to load the new dataset:

In [26]:
import pandas as pd
import numpy as np

# Sample Data: Employee Database
employees = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106],
    'Name': ['John Doe', 'Jane Smith', 'Mike Ross', 'Rachel Green', 'Ross Geller', 'Monica Bing'],
    'Dept': ['Sales', 'HR', 'Legal', 'Sales', 'Science', 'Chef'],
    'Salary': [60000, 65000, 80000, 62000, 70000, 55000],
    'Years': [2, 5, 3, 1, 10, 4]
})

departments = pd.DataFrame({
    'Dept': ['Sales', 'HR', 'Legal', 'Science', 'Chef'],
    'Location': ['Floor 1', 'Floor 2', 'Floor 3', 'Lab', 'Kitchen']
})

reviews = pd.DataFrame({
    'EmpID': [101, 102, 103, 105],
    'Rating': [4.5, 4.8, 3.9, 4.2]
})

print("Employee data loaded!")
employees.head()

Employee data loaded!


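|   | EmpID | Name | Dept | Salary | Years |
|---|-------|------|------|--------|-------|
| 0 | 101 | John Doe | Sales | 60000 | 2 |
| 1 | 102 | Jane Smith | HR | 65000 | 5 |
| 2 | 103 | Mike Ross | Legal | 80000 | 3 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 |
| 4 | 105 | Ross Geller | Science | 70000 | 10 |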


---
## Topic 1: Selection Syntax (.loc vs .iloc)

### ⚠️ Refined Notes
*   **Brackets `[]` are for Indexing**: Always use `[]` for `.loc` and `.iloc`.
    *   ✅ `df.loc[...]`
    *   ❌ `df.loc(...)`
*   **Comma Placement**: The comma separates Row from Column **inside** the brackets.
    *   ✅ `df.loc[rows, cols]`
    *   ❌ `df.loc[rows], [cols]`
*   **Labels vs Positions**:
    *   `.loc`: Uses **Names/Labels** (e.g., "Salary", "Name").
    *   `.iloc`: Uses **Integers/Positions** ONLY (0, 1, 2). It raises an error if you pass a string label!
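
To make the label/position distinction concrete, here is a minimal sketch. The small frame and its string row labels (`"a"`, `"b"`, `"c"`) are made up for illustration, so that labels and positions cannot be confused with each other.

```python
import pandas as pd

# Hypothetical mini-frame with string row labels so labels and positions differ
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cy"], "Salary": [50, 60, 70]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "Salary"]     # row label "b", column label "Salary"
by_position = df.iloc[1, 1]          # second row, second column

# .loc slices include the end label; .iloc slices exclude the end position
loc_slice = df.loc["a":"b", "Name"]  # rows "a" and "b"
iloc_slice = df.iloc[0:2, 0]         # positions 0 and 1

# A string label passed to .iloc is rejected with an exception
try:
    df.iloc[0, "Salary"]
    iloc_error = None
except (ValueError, IndexError, TypeError) as exc:
    iloc_error = type(exc).__name__
```

Both lookups reach the same cell, and both slices return the same two names, even though the slice endpoints are written differently.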

### Practice Questions

**Q1.1:** Use `.loc` to select the row where the index is 3, and show only the 'Name' column. (Pay attention to brackets!)

In [4]:
# Your answer here
employees.loc[3, "Name"]

'Rachel Green'

**Q1.2:** Use `.iloc` to select the first 3 rows and the first 2 columns. (Remember slicing syntax)

In [None]:
# Your answer here
employees.iloc[0:3, 0:2]

|   | EmpID | Name |
|---|-------|------|
| 0 | 101 | John Doe |
| 1 | 102 | Jane Smith |
| 2 | 103 | Mike Ross |


**Q1.3:** Use `.loc` to select all employees with Salary > 65000, showing only their 'Name' and 'Dept'. (Watch where you put the comma!)

In [11]:
# Your answer here
employees.loc[employees["Salary"] > 65000, ["Name", "Dept"]]

|   | Name | Dept |
|---|------|------|
| 2 | Mike Ross | Legal |
| 4 | Ross Geller | Science |


**Q1.4:** Find the **Name** of the employee with the lowest Salary. Use `.idxmin()` and `.loc` together.

In [18]:
# Your answer here
employees.loc[employees["Salary"].idxmin(), "Name"]

'Monica Bing'

**Q1.5 (Tricky):** Try to select the 'Dept' column using `.iloc`. (Hint: You need to know the integer position of 'Dept', not its name!)

In [23]:
# Your answer here
employees.iloc[:,2]

0      Sales
1         HR
2      Legal
3      Sales
4    Science
5       Chef
Name: Dept, dtype: object
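
Rather than counting columns by eye, the integer position can be looked up from the label. The sketch below uses a cut-down, hypothetical version of the employee table; `Index.get_loc` returns an integer when the column name is unique.

```python
import pandas as pd

# Hypothetical two-row slice of the employee table
df = pd.DataFrame({
    "EmpID": [101, 102],
    "Name": ["John Doe", "Jane Smith"],
    "Dept": ["Sales", "HR"],
})

dept_pos = df.columns.get_loc("Dept")   # integer position of the 'Dept' column
dept_by_position = df.iloc[:, dept_pos]  # all rows, that one column
```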

---
## Topic 2: Boolean Logic & Filtering

### ⚠️ Refined Notes
*   **The Parentheses Rule**: When using `&` (AND) or `|` (OR), you **MUST** wrap each condition in `()`, because `&` and `|` bind more tightly than comparisons like `>` and `<`.
    *   ✅ `(df['A'] > 1) & (df['B'] < 5)`
    *   ❌ `df['A'] > 1 & df['B'] < 5`
*   **Operators**:
    *   Use `&` (not `and`)
    *   Use `|` (not `or`)
    *   Use `~` (not `not`)
*   **Column Access**: Remember `df["Col"]`, not `df("Col")`.
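
The rules above can be sketched on a tiny, made-up frame (columns `A` and `B` are placeholders, as in the notes). The last step shows what happens when the parentheses are left out.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 2, 3], "B": [1, 4, 9]})

both = df[(df["A"] > 1) & (df["B"] < 5)]    # AND: both conditions hold
either = df[(df["A"] > 2) | (df["B"] < 2)]  # OR: at least one holds
negated = df[~(df["A"] > 1)]                # NOT: flips the mask

# Without parentheses, Python evaluates `1 & df["B"]` first and then chains
# the comparisons, which forces a Series into a single True/False and fails.
try:
    df[df["A"] > 1 & df["B"] < 5]
    failed = False
except ValueError:
    failed = True
```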

### Practice Questions

**Q2.1:** Filter for employees who are in 'Sales' OR 'HR'.

In [30]:
# Your answer here
employees[(employees["Dept"] == "Sales") | (employees["Dept"] == "HR")]

|   | EmpID | Name | Dept | Salary | Years |
|---|-------|------|------|--------|-------|
| 0 | 101 | John Doe | Sales | 60000 | 2 |
| 1 | 102 | Jane Smith | HR | 65000 | 5 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 |


**Q2.2:** Filter for employees with Salary > 60000 AND Years < 4. (Watch your parentheses!)

In [31]:
# Your answer here
employees[(employees["Salary"] > 60000) & (employees["Years"] < 4)]

|   | EmpID | Name | Dept | Salary | Years |
|---|-------|------|------|--------|-------|
| 2 | 103 | Mike Ross | Legal | 80000 | 3 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 |


**Q2.3:** Filter for employees who are NOT in 'Legal'. (Hint: Use `!=` or `~`)

In [33]:
# Your answer here
employees[employees["Dept"] != "Legal"]

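|   | EmpID | Name | Dept | Salary | Years |
|---|-------|------|------|--------|-------|
| 0 | 101 | John Doe | Sales | 60000 | 2 |
| 1 | 102 | Jane Smith | HR | 65000 | 5 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 |
| 4 | 105 | Ross Geller | Science | 70000 | 10 |
| 5 | 106 | Monica Bing | Chef | 55000 | 4 |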


**Q2.4:** Filter for employees where Salary is between 60000 and 70000 (inclusive). Try using `between()` or two conditions.

In [35]:
# Your answer here
employees[(employees["Salary"]>= 60000) & (employees["Salary"] <= 70000)]

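|   | EmpID | Name | Dept | Salary | Years |
|---|-------|------|------|--------|-------|
| 0 | 101 | John Doe | Sales | 60000 | 2 |
| 1 | 102 | Jane Smith | HR | 65000 | 5 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 |
| 4 | 105 | Ross Geller | Science | 70000 | 10 |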
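
The two-condition filter and `between()` should give the same mask. A quick sketch on a hypothetical salary Series, chosen so that both boundary values appear:

```python
import pandas as pd

# Hypothetical salaries that include both boundary values
salary = pd.Series([55000, 60000, 65000, 70000, 80000])

two_conditions = (salary >= 60000) & (salary <= 70000)
with_between = salary.between(60000, 70000)  # inclusive on both ends by default
```

`between()` is shorter and avoids the parentheses rule entirely, which makes it a reasonable default for range checks.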


**Q2.5:** Select employees in 'Sales' or 'Legal' who ALSO have a Salary > 60000. (Careful with order of operations!)

In [41]:
# Your answer here
# Parenthesize the OR group so it is evaluated before the AND
employees[((employees["Dept"] == "Sales") | (employees["Dept"] == "Legal")) & (employees["Salary"] > 60000)]

|   | EmpID | Name | Dept | Salary | Years |
|---|-------|------|------|--------|-------|
| 2 | 103 | Mike Ross | Legal | 80000 | 3 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 |


---
## Topic 3: Column Mechanics

### ⚠️ Refined Notes
*   **Read vs Write**:
    *   `df["Col"]` = **READ** (Selects existing). Raises `KeyError` if missing.
    *   `df["Col"] = val` = **WRITE** (Creates/Updates).
*   **Dropping is Temporary**:
    *   `df.drop(...)` returns a **new DataFrame**. The original `df` stays the same.
    *   To save it, you must reassign: `df = df.drop(...)`.
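
A short sketch of both rules on a throwaway frame (the names and values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Bob"], "Years": [2, 5]})

# READ of a column that does not exist raises KeyError
try:
    df["Bonus"]
    read_failed = False
except KeyError:
    read_failed = True

# WRITE with the same bracket syntax creates the column
df["Bonus"] = 0

# drop() without reassignment leaves df untouched
df.drop(columns="Years")
still_there = "Years" in df.columns

# Reassign to keep the result
df = df.drop(columns="Years")
```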

### Practice Questions

**Q3.1:** Create a NEW column 'Bonus' and set it to 0 for everyone.

In [45]:
# Your answer here
employees["Bonus"] = 0
employees

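|   | EmpID | Name | Dept | Salary | Years | Bonus |
|---|-------|------|------|--------|-------|-------|
| 0 | 101 | John Doe | Sales | 60000 | 2 | 0 |
| 1 | 102 | Jane Smith | HR | 65000 | 5 | 0 |
| 2 | 103 | Mike Ross | Legal | 80000 | 3 | 0 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 | 0 |
| 4 | 105 | Ross Geller | Science | 70000 | 10 | 0 |
| 5 | 106 | Monica Bing | Chef | 55000 | 4 | 0 |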


**Q3.2:** Create a new dataframe `emp_slim` that drops the 'Years' column from `employees`. (Ensure `emp_slim` actually has the column removed!)

In [46]:
# Your answer here
emp_slim = employees.drop(columns = "Years")
emp_slim

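|   | EmpID | Name | Dept | Salary | Bonus |
|---|-------|------|------|--------|-------|
| 0 | 101 | John Doe | Sales | 60000 | 0 |
| 1 | 102 | Jane Smith | HR | 65000 | 0 |
| 2 | 103 | Mike Ross | Legal | 80000 | 0 |
| 3 | 104 | Rachel Green | Sales | 62000 | 0 |
| 4 | 105 | Ross Geller | Science | 70000 | 0 |
| 5 | 106 | Monica Bing | Chef | 55000 | 0 |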


**Q3.3:** Create a column 'TotalComp' which is Salary + Bonus. (Make sure you created Bonus first!)

In [49]:
# Your answer here
emp_slim["TotalComp"] = emp_slim["Salary"] + emp_slim["Bonus"]
emp_slim

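|   | EmpID | Name | Dept | Salary | Bonus | TotalComp |
|---|-------|------|------|--------|-------|-----------|
| 0 | 101 | John Doe | Sales | 60000 | 0 | 60000 |
| 1 | 102 | Jane Smith | HR | 65000 | 0 | 65000 |
| 2 | 103 | Mike Ross | Legal | 80000 | 0 | 80000 |
| 3 | 104 | Rachel Green | Sales | 62000 | 0 | 62000 |
| 4 | 105 | Ross Geller | Science | 70000 | 0 | 70000 |
| 5 | 106 | Monica Bing | Chef | 55000 | 0 | 55000 |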


**Q3.4:** Rename the 'Salary' column to 'BaseSalary' in the `employees` dataframe. (Remember to save the change!)

In [51]:
# Your answer here
emp_slim["BaseSalary"] = emp_slim["Salary"]
emp_slim = emp_slim.drop(columns = "Salary")
emp_slim

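|   | EmpID | Name | Dept | Bonus | TotalComp | BaseSalary |
|---|-------|------|------|-------|-----------|------------|
| 0 | 101 | John Doe | Sales | 0 | 60000 | 60000 |
| 1 | 102 | Jane Smith | HR | 0 | 65000 | 65000 |
| 2 | 103 | Mike Ross | Legal | 0 | 80000 | 80000 |
| 3 | 104 | Rachel Green | Sales | 0 | 62000 | 62000 |
| 4 | 105 | Ross Geller | Science | 0 | 70000 | 70000 |
| 5 | 106 | Monica Bing | Chef | 0 | 55000 | 55000 |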
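
Copying a column and then dropping the old one works, but it moves the column to the end. `rename()` does the same job in one step and keeps the column where it was. A minimal sketch on an invented frame:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Bob"], "Salary": [50, 60], "Bonus": [0, 0]})

# rename() returns a new DataFrame; the mapping is old name -> new name
renamed = df.rename(columns={"Salary": "BaseSalary"})
```

As with `drop()`, the original `df` is unchanged until the result is assigned back.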


**Q3.5:** Try to select a column called 'VacationDays'. It doesn't exist. What happens? Now create it and set it to 15.

In [55]:
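# Your answer here
# emp_slim["VacationDays"] raises KeyError until the column is created
emp_slim["VacationDays"] = 15
emp_slim

|   | EmpID | Name | Dept | Bonus | TotalComp | BaseSalary | VacationDays |
|---|-------|------|------|-------|-----------|------------|--------------|
| 0 | 101 | John Doe | Sales | 0 | 60000 | 60000 | 15 |
| 1 | 102 | Jane Smith | HR | 0 | 65000 | 65000 | 15 |
| 2 | 103 | Mike Ross | Legal | 0 | 80000 | 80000 | 15 |
| 3 | 104 | Rachel Green | Sales | 0 | 62000 | 62000 | 15 |
| 4 | 105 | Ross Geller | Science | 0 | 70000 | 70000 | 15 |
| 5 | 106 | Monica Bing | Chef | 0 | 55000 | 55000 | 15 |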


---
## Topic 4: Grouping & Aggregation

### ⚠️ Refined Notes
*   **Math Functions**: You can use `.mean()`, `.sum()`, `.min()`, `.max()` directly on a groupby object.
*   **The `.agg()` Syntax**:
    *   It uses **Parentheses** `()` because it's a function.
    *   For different stats per column, pass it a **Dictionary** `{}` mapping column name to aggregation.
    *   ✅ `df.groupby('G').agg({'Col': 'mean'})`
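
Here is a small sketch of the dictionary form, next to the named-aggregation form that pandas also accepts. The frame is a made-up three-row sample, and the output names `avg_salary` and `max_years` are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Dept": ["Sales", "Sales", "HR"],
    "Salary": [60000, 62000, 65000],
    "Years": [2, 1, 5],
})

# Dictionary form: column name -> aggregation name
by_dict = df.groupby("Dept").agg({"Salary": "mean", "Years": "max"})

# Named aggregation: output column = (input column, aggregation)
by_name = df.groupby("Dept").agg(
    avg_salary=("Salary", "mean"),
    max_years=("Years", "max"),
)
```

The named form is handy when the result columns should say what was computed, since the dictionary form keeps the original column names.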

### Practice Questions

**Q4.1:** Group by 'Dept' and find the average 'Salary'.

In [60]:
# Your answer here
employees.groupby("Dept")["Salary"].mean()

Dept
Chef       55000.0
HR         65000.0
Legal      80000.0
Sales      61000.0
Science    70000.0
Name: Salary, dtype: float64

**Q4.2:** Group by 'Dept' and find the maximum 'Years' of experience.

In [61]:
# Your answer here
employees.groupby("Dept")["Years"].max()

Dept
Chef        4
HR          5
Legal       3
Sales       2
Science    10
Name: Years, dtype: int64

**Q4.3:** Group by 'Dept' and calculate TWO things: average Salary and max Years. (Use `.agg()` with a dictionary).

In [64]:
# Your answer here
employees.groupby("Dept").agg({"Salary" : "mean", "Years": "max"})

| Dept | Salary | Years |
|------|--------|-------|
| Chef | 55000.0 | 4 |
| HR | 65000.0 | 5 |
| Legal | 80000.0 | 3 |
| Sales | 61000.0 | 2 |
| Science | 70000.0 | 10 |


**Q4.4:** Count how many employees are in each Department. (Use `.size()` or `.count()`).

In [65]:
# Your answer here
employees.groupby("Dept").size()

Dept
Chef       1
HR         1
Legal      1
Sales      2
Science    1
dtype: int64
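
`.size()` and `.count()` only agree when there are no missing values. The sketch below uses an invented frame with one missing `Rating` to show the difference: `.size()` counts rows per group, while `.count()` counts non-null values in the selected column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing rating in the Sales group
df = pd.DataFrame({
    "Dept": ["Sales", "Sales", "HR"],
    "Rating": [4.5, np.nan, 4.8],
})

rows_per_dept = df.groupby("Dept").size()                # counts every row
ratings_per_dept = df.groupby("Dept")["Rating"].count()  # skips NaN
```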

**Q4.5:** Group by 'Dept', find the mean Salary, and then **sort** the results from highest to lowest.

In [72]:
# Your answer here
employees.groupby("Dept")["Salary"].mean().sort_values(ascending=False)


Dept
Legal      80000.0
Science    70000.0
HR         65000.0
Sales      61000.0
Chef       55000.0
Name: Salary, dtype: float64

---
## Topic 5: Merging

### ⚠️ Refined Notes
*   **One at a time**: `pd.merge()` only takes 2 dataframes.
*   **Chaining**: To merge 3, you merge the first two, then merge the result with the third.
    *   `df1.merge(df2, ...).merge(df3, ...)`
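
The chaining pattern can be sketched with three small, invented frames (`emp`, `dept`, `rev` are stand-ins, not the notebook's tables). Each `.merge()` returns a DataFrame, so the next `.merge()` can be called on it directly.

```python
import pandas as pd

emp = pd.DataFrame({"EmpID": [1, 2, 3], "Dept": ["Sales", "HR", "Sales"]})
dept = pd.DataFrame({"Dept": ["Sales", "HR"], "Location": ["Floor 1", "Floor 2"]})
rev = pd.DataFrame({"EmpID": [1, 3], "Rating": [4.5, 3.9]})

# Merge the first two, then merge that result with the third
combined = (
    emp.merge(dept, on="Dept", how="left")
       .merge(rev, on="EmpID", how="left")
)
```

With `how="left"` on both steps, every row of `emp` survives, and employees without a review get `NaN` in `Rating`.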

### Practice Questions

**Q5.1:** Merge `employees` and `departments` on 'Dept' (inner join).

In [74]:
# Your answer here
pd.merge(employees, departments, on="Dept")

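|   | EmpID | Name | Dept | Salary | Years | Bonus | Location |
|---|-------|------|------|--------|-------|-------|----------|
| 0 | 101 | John Doe | Sales | 60000 | 2 | 0 | Floor 1 |
| 1 | 102 | Jane Smith | HR | 65000 | 5 | 0 | Floor 2 |
| 2 | 103 | Mike Ross | Legal | 80000 | 3 | 0 | Floor 3 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 | 0 | Floor 1 |
| 4 | 105 | Ross Geller | Science | 70000 | 10 | 0 | Lab |
| 5 | 106 | Monica Bing | Chef | 55000 | 4 | 0 | Kitchen |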


**Q5.2:** Merge `employees` and `reviews` on 'EmpID' (left join - keep all employees even if no review).

In [75]:
# Your answer here
pd.merge(employees, reviews, on="EmpID", how="left")

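|   | EmpID | Name | Dept | Salary | Years | Bonus | Rating |
|---|-------|------|------|--------|-------|-------|--------|
| 0 | 101 | John Doe | Sales | 60000 | 2 | 0 | 4.5 |
| 1 | 102 | Jane Smith | HR | 65000 | 5 | 0 | 4.8 |
| 2 | 103 | Mike Ross | Legal | 80000 | 3 | 0 | 3.9 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 | 0 | NaN |
| 4 | 105 | Ross Geller | Science | 70000 | 10 | 0 | 4.2 |
| 5 | 106 | Monica Bing | Chef | 55000 | 4 | 0 | NaN |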


**Q5.3:** Merge `employees`, `departments`, AND `reviews` in one chain. Keep all employees (left joins).

In [76]:
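# Your answer here
# Chain the merges so the second one runs on the result of the first; left joins keep every employee
employees.merge(departments, on="Dept", how="left").merge(reviews, on="EmpID", how="left")

|   | EmpID | Name | Dept | Salary | Years | Bonus | Location | Rating |
|---|-------|------|------|--------|-------|-------|----------|--------|
| 0 | 101 | John Doe | Sales | 60000 | 2 | 0 | Floor 1 | 4.5 |
| 1 | 102 | Jane Smith | HR | 65000 | 5 | 0 | Floor 2 | 4.8 |
| 2 | 103 | Mike Ross | Legal | 80000 | 3 | 0 | Floor 3 | 3.9 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 | 0 | Floor 1 | NaN |
| 4 | 105 | Ross Geller | Science | 70000 | 10 | 0 | Lab | 4.2 |
| 5 | 106 | Monica Bing | Chef | 55000 | 4 | 0 | Kitchen | NaN |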


**Q5.4:** Perform an 'outer' join between `employees` and `reviews`. How many rows do you get?

In [77]:
# Your answer here
pd.merge(employees, reviews, on="EmpID", how="outer")


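|   | EmpID | Name | Dept | Salary | Years | Bonus | Rating |
|---|-------|------|------|--------|-------|-------|--------|
| 0 | 101 | John Doe | Sales | 60000 | 2 | 0 | 4.5 |
| 1 | 102 | Jane Smith | HR | 65000 | 5 | 0 | 4.8 |
| 2 | 103 | Mike Ross | Legal | 80000 | 3 | 0 | 3.9 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 | 0 | NaN |
| 4 | 105 | Ross Geller | Science | 70000 | 10 | 0 | 4.2 |
| 5 | 106 | Monica Bing | Chef | 55000 | 4 | 0 | NaN |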
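
In this dataset every review belongs to an existing employee, so the outer join returns the same 6 rows as the left join. To see where an outer join differs, the sketch below adds a hypothetical review for an `EmpID` (9) with no matching employee, and uses `indicator=True` to label where each row came from.

```python
import pandas as pd

emp = pd.DataFrame({"EmpID": [1, 2, 3], "Name": ["Ann", "Bob", "Cy"]})
# EmpID 9 is an invented review with no matching employee
rev = pd.DataFrame({"EmpID": [1, 3, 9], "Rating": [4.5, 3.9, 4.0]})

outer = pd.merge(emp, rev, on="EmpID", how="outer", indicator=True)

n_rows = len(outer)                           # answers "how many rows?"
origin_counts = outer["_merge"].value_counts()  # both / left_only / right_only
```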


**Q5.5:** Merge `employees` and `departments`, but only keep rows where the Department exists in BOTH tables.

In [78]:
# Your answer here
pd.merge(employees, departments, on = "Dept", how= "inner")

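|   | EmpID | Name | Dept | Salary | Years | Bonus | Location |
|---|-------|------|------|--------|-------|-------|----------|
| 0 | 101 | John Doe | Sales | 60000 | 2 | 0 | Floor 1 |
| 1 | 102 | Jane Smith | HR | 65000 | 5 | 0 | Floor 2 |
| 2 | 103 | Mike Ross | Legal | 80000 | 3 | 0 | Floor 3 |
| 3 | 104 | Rachel Green | Sales | 62000 | 1 | 0 | Floor 1 |
| 4 | 105 | Ross Geller | Science | 70000 | 10 | 0 | Lab |
| 5 | 106 | Monica Bing | Chef | 55000 | 4 | 0 | Kitchen |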


---
# 📝 ANSWER KEY


## Topic 1 Answers
**Q1.1:** `employees.loc[3, 'Name']`
**Q1.2:** `employees.iloc[0:3, 0:2]`
**Q1.3:** `employees.loc[employees['Salary'] > 65000, ['Name', 'Dept']]`
**Q1.4:** `employees.loc[employees['Salary'].idxmin(), 'Name']`
**Q1.5:** `employees.iloc[:, 2]` (Assuming 'Dept' is the 3rd column, index 2)


## Topic 2 Answers
**Q2.1:** `employees[(employees['Dept'] == 'Sales') | (employees['Dept'] == 'HR')]`
**Q2.2:** `employees[(employees['Salary'] > 60000) & (employees['Years'] < 4)]`
**Q2.3:** `employees[employees['Dept'] != 'Legal']` OR `employees[~employees['Dept'].isin(['Legal'])]`
**Q2.4:** `employees[employees['Salary'].between(60000, 70000)]`
**Q2.5:** `employees[((employees['Dept'] == 'Sales') | (employees['Dept'] == 'Legal')) & (employees['Salary'] > 60000)]`


## Topic 3 Answers
**Q3.1:** `employees['Bonus'] = 0`
**Q3.2:** `emp_slim = employees.drop(columns=['Years'])`
**Q3.3:** `employees['TotalComp'] = employees['Salary'] + employees['Bonus']`
**Q3.4:** `employees = employees.rename(columns={'Salary': 'BaseSalary'})` (after this rename, later cells must refer to `'BaseSalary'` instead of `'Salary'`)
**Q3.5:** `employees['VacationDays']` (KeyError) -> `employees['VacationDays'] = 15`


## Topic 4 Answers
**Q4.1:** `employees.groupby('Dept')['Salary'].mean()`
**Q4.2:** `employees.groupby('Dept')['Years'].max()`
**Q4.3:** `employees.groupby('Dept').agg({'Salary': 'mean', 'Years': 'max'})`
**Q4.4:** `employees.groupby('Dept').size()`
**Q4.5:** `employees.groupby('Dept')['Salary'].mean().sort_values(ascending=False)`


## Topic 5 Answers
**Q5.1:** `pd.merge(employees, departments, on='Dept', how='inner')`
**Q5.2:** `pd.merge(employees, reviews, on='EmpID', how='left')`
**Q5.3:** `employees.merge(departments, on='Dept', how='left').merge(reviews, on='EmpID', how='left')`
**Q5.4:** `pd.merge(employees, reviews, on='EmpID', how='outer')`
**Q5.5:** `pd.merge(employees, departments, on='Dept', how='inner')`
