# 🎓 Final Pandas Test (Test 1)

**Instructions:**
*   This is a "closed book" style test. Try to solve these 10 questions without looking at previous notes.
*   The questions mix multiple concepts (filtering + grouping, merging + sorting, etc.).
*   Good luck!


## Setup
Run this cell to load the test data:

In [29]:
import pandas as pd
import numpy as np

# DATASET: Retail Store Sales
# ---------------------------
# Products: ProductID, Name, Category, Price
products = pd.DataFrame({
    'ProductID': [1, 2, 3, 4, 5],
    'Name': ['Laptop', 'Mouse', 'T-Shirt', 'Jeans', 'Headphones'],
    'Category': ['Electronics', 'Electronics', 'Clothing', 'Clothing', 'Electronics'],
    'Price': [1000, 25, 20, 50, 100]
})

# Sales: TransID, ProductID, Quantity, StoreID
sales = pd.DataFrame({
    'TransID': [101, 102, 103, 104, 105, 106],
    'ProductID': [1, 3, 2, 1, 5, 4],
    'Quantity': [1, 10, 5, 2, 4, 20],
    'StoreID': ['NY_01', 'NY_01', 'LA_02', 'LA_02', 'NY_01', 'CHI_03']
})

# Stores: StoreID, City, Manager
stores = pd.DataFrame({
    'StoreID': ['NY_01', 'LA_02', 'CHI_03'],
    'City': ['New York', 'Los Angeles', 'Chicago'],
    'Manager': ['Sarah', 'Mike', 'John']
})

print("Test Data Loaded!")


Test Data Loaded!


### Q1: Selection & Indexing
Find the **Price** of the product with `ProductID` 4.
*   Use `.loc` or `.iloc` correctly.
*   Do NOT just look at the table and type "50". Write code that finds it.


In [11]:
# Your answer here
# Select the matching row AND the Price column, not the whole row
products.loc[products["ProductID"] == 4, "Price"]

3    50
Name: Price, dtype: int64


### Q2: Boolean Filtering
Filter the `products` dataframe to show only items that are:
1.  In the 'Electronics' category
2.  **AND** have a Price less than 500.


In [12]:
# Your answer here
products[(products["Category"]== "Electronics") & (products["Price"] < 500)]


Unnamed: 0,ProductID,Name,Category,Price
1,2,Mouse,Electronics,25
4,5,Headphones,Electronics,100


### Q3: String Methods
Filter the `stores` dataframe to find stores where the Manager's name starts with 'S' **OR** the City contains "New".


In [18]:
# Your answer here
# str.startswith is case-sensitive, so match the capital "S"
stores[stores["Manager"].str.startswith("S") | stores["City"].str.contains("New")]

Unnamed: 0,StoreID,City,Manager
0,NY_01,New York,Sarah


### Q4: Column Creation
Create a new column in the `sales` dataframe called `TotalRevenue`.
*   Note: `sales` doesn't have Price yet!
*   First, you must **MERGE** `sales` and `products` to get the Price.
*   Then calculate `TotalRevenue = Price * Quantity`.
*   Show the first 5 rows of the result.


In [35]:
# Your answer here
# Merge into a new frame so re-running the cell does not stack duplicate _x/_y columns onto sales
merged = pd.merge(sales, products, on="ProductID")
merged["TotalRevenue"] = merged["Price"] * merged["Quantity"]
merged.head()

Unnamed: 0,TransID,ProductID,Quantity,StoreID,Name,Category,Price,TotalRevenue
0,101,1,1,NY_01,Laptop,Electronics,1000,1000
1,102,3,10,NY_01,T-Shirt,Clothing,20,200
2,103,2,5,LA_02,Mouse,Electronics,25,125
3,104,1,2,LA_02,Laptop,Electronics,1000,2000
4,105,5,4,NY_01,Headphones,Electronics,100,400


### Q5: Grouping & Aggregation
Using the merged dataframe from Q4 (or merge again if you didn't save it):
*   Group by **Category**.
*   Calculate the **Total Quantity** sold for each category.


In [None]:
# Your answer here
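# Merge again so the Category column is available, then total Quantity per category
pd.merge(sales, products, on="ProductID").groupby("Category")["Quantity"].sum()

Category
Clothing       30
Electronics    12
Name: Quantity, dtype: int64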

### Q6: Advanced Grouping
Group the `sales` data by **StoreID**.
*   Calculate TWO things:
    1.  The total `Quantity` sold.
    2.  The number of unique `ProductID`s sold (Hint: `.nunique()`).
*   Use `.agg()` syntax.


In [52]:
# Your answer here
# The question asks for the TOTAL quantity, so aggregate with "sum", not "mean"
sales.groupby("StoreID").agg({"Quantity": "sum", "ProductID": "nunique"})

StoreID,Quantity,ProductID
CHI_03,20,1
LA_02,7,2
NY_01,15,3


### Q7: Filtering with Lists
Filter the `sales` dataframe to show transactions only for stores 'NY_01' and 'CHI_03'.
*   Use `.isin()` for this.


In [None]:
# Your answer here
sales[sales["StoreID"].isin(["NY_01", "CHI_03"])]


### Q8: Sorting
Find the **Top 3** most expensive products.
*   Sort the `products` dataframe by Price (Highest to Lowest).
*   Display the top 3 rows.


In [62]:
# Your answer here
# Sort the products table (not sales) so each product appears only once
products.sort_values(by="Price", ascending=False).head(3)

Unnamed: 0,ProductID,Name,Category,Price
0,1,Laptop,Electronics,1000
4,5,Headphones,Electronics,100
3,4,Jeans,Clothing,50


### Q9: Complex Logic (np.where)
Create a new column in `products` called `LuxuryStatus`.
*   If Price > 500, set it to "High End".
*   Otherwise, set it to "Standard".


In [64]:
# Your answer here
products["LuxuryStatus"] = np.where(products["Price"] > 500, "High End", "Standard")
products

Unnamed: 0,ProductID,Name,Category,Price,LuxuryStatus
0,1,Laptop,Electronics,1000,High End
1,2,Mouse,Electronics,25,Standard
2,3,T-Shirt,Clothing,20,Standard
3,4,Jeans,Clothing,50,Standard
4,5,Headphones,Electronics,100,Standard


### Q10: The "Boss Level" Chain
Write a single chain of commands to:
1.  Merge `sales` and `stores` (on StoreID).
2.  Filter for only 'New York' stores.
3.  Group by 'Manager'.
4.  Calculate the total 'Quantity' sold.


In [66]:
# Your answer here
sales.merge(stores, on="StoreID") \
     .query("City == 'New York'") \
     .groupby("Manager")["Quantity"].sum()

Manager
Sarah    15
Name: Quantity, dtype: int64

---
# 📝 Answer Key
(Don't peek until you finish!)



**Q1:** `products.loc[products['ProductID'] == 4, 'Price']` (Or set index to ProductID first then `.loc[4]`)
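The boolean-mask form returns a one-element Series rather than a bare number. Here is a minimal sketch of both routes mentioned above, using a cut-down copy of the products table. The `.item()` call and the `by_id` name are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Cut-down copy of the test's products table (Category omitted for brevity).
products = pd.DataFrame({
    "ProductID": [1, 2, 3, 4, 5],
    "Name": ["Laptop", "Mouse", "T-Shirt", "Jeans", "Headphones"],
    "Price": [1000, 25, 20, 50, 100],
})

# Boolean mask + column label: a Series with one element (index label 3).
price_series = products.loc[products["ProductID"] == 4, "Price"]

# .item() unwraps a one-element Series into a plain scalar.
price_scalar = price_series.item()

# Alternative: make ProductID the index, then look the row up by label.
by_id = products.set_index("ProductID")
price_by_label = by_id.loc[4, "Price"]

print(price_scalar, price_by_label)
```

If the mask could match zero or several rows, `.item()` raises an error, which is usually preferable to silently picking one.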

**Q2:** `products[(products['Category'] == 'Electronics') & (products['Price'] < 500)]`

**Q3:** `stores[stores['Manager'].str.startswith('S') | stores['City'].str.contains('New')]`
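String matching in pandas is case-sensitive by default, so `startswith('s')` and `startswith('S')` are different filters. The sketch below, on a copy of the stores table, shows the difference. The `case=False` variant is only an illustration of how to relax the match, not something the question requires.

```python
import pandas as pd

stores = pd.DataFrame({
    "StoreID": ["NY_01", "LA_02", "CHI_03"],
    "City": ["New York", "Los Angeles", "Chicago"],
    "Manager": ["Sarah", "Mike", "John"],
})

# Lowercase "s" matches no manager name; uppercase "S" matches Sarah.
lower_mask = stores["Manager"].str.startswith("s")
upper_mask = stores["Manager"].str.startswith("S")

# str.contains accepts case=False for a case-insensitive match.
city_mask = stores["City"].str.contains("new", case=False)

# Combine the two conditions with | (OR).
result = stores[upper_mask | city_mask]
print(result)
```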

**Q4:**
```python
merged = pd.merge(sales, products, on='ProductID')
merged['TotalRevenue'] = merged['Price'] * merged['Quantity']
merged.head()
```

**Q5:** `merged.groupby('Category')['Quantity'].sum()`

**Q6:** `sales.groupby('StoreID').agg({'Quantity': 'sum', 'ProductID': 'nunique'})`
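With the dictionary form, the result columns keep the source names (`Quantity`, `ProductID`), which can be misleading once they hold a sum and a count. One possible alternative is named aggregation, sketched below. The output names `TotalQuantity` and `UniqueProducts` are made up for this example.

```python
import pandas as pd

sales = pd.DataFrame({
    "TransID": [101, 102, 103, 104, 105, 106],
    "ProductID": [1, 3, 2, 1, 5, 4],
    "Quantity": [1, 10, 5, 2, 4, 20],
    "StoreID": ["NY_01", "NY_01", "LA_02", "LA_02", "NY_01", "CHI_03"],
})

# Named aggregation: new_column=(source_column, aggregation_function).
summary = sales.groupby("StoreID").agg(
    TotalQuantity=("Quantity", "sum"),
    UniqueProducts=("ProductID", "nunique"),
)
print(summary)
```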

**Q7:** `sales[sales['StoreID'].isin(['NY_01', 'CHI_03'])]`

**Q8:** `products.sort_values('Price', ascending=False).head(3)`
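For a "top N" question, `DataFrame.nlargest` is a shorter equivalent of sorting and taking the head. The sketch below compares the two on a cut-down copy of the products table. It is an alternative, not the form the question asks for.

```python
import pandas as pd

products = pd.DataFrame({
    "ProductID": [1, 2, 3, 4, 5],
    "Name": ["Laptop", "Mouse", "T-Shirt", "Jeans", "Headphones"],
    "Price": [1000, 25, 20, 50, 100],
})

# Sort descending and keep the first three rows (the answer-key form).
by_sort = products.sort_values("Price", ascending=False).head(3)

# nlargest does both steps in one call.
top3 = products.nlargest(3, "Price")
print(top3)
```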

**Q9:** `products['LuxuryStatus'] = np.where(products['Price'] > 500, 'High End', 'Standard')`

**Q10:**
```python
sales.merge(stores, on='StoreID') \
     .loc[lambda x: x['City'] == 'New York'] \
     .groupby('Manager')['Quantity'].sum()
# Note: .loc[lambda x: ...] filters the intermediate merged frame, which has no variable name inside a chain.
# Alternatively: Filter first, then merge, then group.
```
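The "filter first, then merge, then group" alternative mentioned in the comment could look like the sketch below. Because the default merge is an inner join, sales from other stores drop out once `stores` has been narrowed to New York. The `ny_stores` name is hypothetical.

```python
import pandas as pd

sales = pd.DataFrame({
    "TransID": [101, 102, 103, 104, 105, 106],
    "ProductID": [1, 3, 2, 1, 5, 4],
    "Quantity": [1, 10, 5, 2, 4, 20],
    "StoreID": ["NY_01", "NY_01", "LA_02", "LA_02", "NY_01", "CHI_03"],
})
stores = pd.DataFrame({
    "StoreID": ["NY_01", "LA_02", "CHI_03"],
    "City": ["New York", "Los Angeles", "Chicago"],
    "Manager": ["Sarah", "Mike", "John"],
})

# Step 1: narrow the small lookup table to New York stores.
ny_stores = stores.loc[stores["City"] == "New York", ["StoreID", "Manager"]]

# Step 2: the inner merge keeps only sales whose StoreID is in ny_stores.
# Step 3: group by manager and total the quantity.
result = sales.merge(ny_stores, on="StoreID").groupby("Manager")["Quantity"].sum()
print(result)
```

Filtering the smaller table before merging also means fewer rows go through the join, which can matter on larger data.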
