# Pandas Mastery Challenge - Ultimate Tutorial

Welcome to the final challenge in our Pandas series! This notebook tests your data manipulation skills with problems that each target a different part of Pandas, from merging and grouping to cleaning and imputation.

## Table of Contents
1. [Q1 - The Great Data Merge Maze](#q1)
2. [Q2 - Time Travel with Multi-Index](#q2)
3. [Q3 - The GroupBy Gauntlet](#q3)
4. [Q4 - Regex Riddles in Data Cleaning](#q4)
5. [Q5 - Speed Demon Data Manipulation](#q5)
6. [Q6 - Recursive Riddle](#q6)
7. [Q7 - Statistician's Nightmare](#q7)
8. [Q8 - Pivot Table Puzzles](#q8)
9. [Q9 - Aggregation Aggravation](#q9)
10. [Q10 - Visual Magic with Pandas](#q10)
11. [Q11 - Nested Data Labyrinth](#q11)
12. [Q12 - Async Adventures in Data](#q12)
13. [Q13 - Geo Pandas Adventure](#q13)
14. [Q14 - Network Nexus Analysis](#q14)
15. [Q15 - High Dimensional Hide and Seek](#q15)
16. [Q16 - Real-time Data Rush](#q16)
17. [Q17 - Machine Learning Preprocess Pandemonium](#q17)
18. [Q18 - Text Tango with Pandas](#q18)
19. [Q19 - Anomaly Detection Drama](#q19)
20. [Q20 - Imputation Imbroglio](#q20)


# Q1 - The Great Data Merge Maze
<a id="q1"></a>


**Question:**
Welcome to the Great Data Merge Maze! You are given three datasets: `customers`, `orders`, and `products`. Your task is to merge these datasets to answer the following questions:
1. Which customers ordered which products and at what price?
2. Calculate the total amount spent by each customer.
3. Identify the top 2 customers who spent the most.
4. Determine which products were never ordered.
5. Find the customer who ordered the highest quantity of a single product.

**Datasets:**
- `customers`: Contains customer IDs and names.
- `orders`: Contains order IDs, customer IDs, product IDs, and order quantities.
- `products`: Contains product IDs, names, and prices.

Generate synthetic data for the datasets and merge them to find the answers.


In [1]:
import pandas as pd
import numpy as np

# Seed for reproducibility
np.random.seed(0)

# Customers DataFrame
customers = pd.DataFrame({
    'customer_id': range(1, 11),
    'customer_name': ['Alice Apples', 'Bob Bananas', 'Charlie Cherries', 'David Dates', 'Eve Elderberries', 'Frank Figs', 'Grace Grapes', 'Hannah Honeydew', 'Ivy Iceberg', 'Jack Jicama']
})

# Products DataFrame
products = pd.DataFrame({
    'product_id': range(1, 11),
    'product_name': ['Widget Wonder', 'Gizmo Glitz', 'Doodad Delight', 'Thingamajig Thrill', 'Contraption Charm', 'Gadget Glow', 'Whatchamacallit Whimsy', 'Doohickey Dazzle', 'Whatsit Whiz', 'Gubbins Galore'],
    'product_price': np.random.uniform(10, 100, size=10).round(2)
})

# Orders DataFrame
orders = pd.DataFrame({
    'order_id': range(1, 21),
    'customer_id': np.random.choice(customers['customer_id'], size=20),
    'product_id': np.random.choice(products['product_id'], size=20),
    'order_quantity': np.random.randint(1, 10, size=20)
})

# Display the datasets
print("Customers DataFrame:")
print(customers, "\n")
print("Products DataFrame:")
print(products, "\n")
print("Orders DataFrame:")
print(orders)


Customers DataFrame:
   customer_id     customer_name
0            1      Alice Apples
1            2       Bob Bananas
2            3  Charlie Cherries
3            4       David Dates
4            5  Eve Elderberries
5            6        Frank Figs
6            7      Grace Grapes
7            8   Hannah Honeydew
8            9       Ivy Iceberg
9           10       Jack Jicama 

Products DataFrame:
   product_id            product_name  product_price
0           1           Widget Wonder          59.39
1           2             Gizmo Glitz          74.37
2           3          Doodad Delight          64.25
3           4      Thingamajig Thrill          59.04
4           5       Contraption Charm          48.13
5           6             Gadget Glow          68.13
6           7  Whatchamacallit Whimsy          49.38
7           8        Doohickey Dazzle          90.26
8           9            Whatsit Whiz          96.73
9          10          Gubbins Galore          44.51 

Orders Da

## Solution Explanation

The steps to solve this problem are:
1. Merge the `orders` and `customers` DataFrames on `customer_id` to attach customer information to each order.
2. Merge the resulting DataFrame with the `products` DataFrame on `product_id` to add product names and prices to each order.
3. Select the relevant columns to show which customers ordered which products and at what price.
4. Calculate the total amount spent by each customer.
5. Identify the top 2 customers who spent the most.
6. Determine which products were never ordered.
7. Find the customer who ordered the highest quantity of a single product.

Let's implement these steps in the following code.


In [2]:
# Step 1: Merge orders with customers to get customer information in orders
# This merge operation will give us a DataFrame that includes each order with the corresponding customer information.
orders_customers = pd.merge(orders, customers, on='customer_id', how='inner')
print("Step 1: Orders merged with Customers")
print(orders_customers, "\n")

Step 1: Orders merged with Customers
    order_id  customer_id  product_id  order_quantity     customer_name
0          1            7           4               9      Grace Grapes
1          2            8           4               5   Hannah Honeydew
2          3            8           8               2   Hannah Honeydew
3          4            9           1               5       Ivy Iceberg
4          5            2           2               9       Bob Bananas
5          6            6          10               2        Frank Figs
6          7           10          10               2       Jack Jicama
7          8            9           1               8       Ivy Iceberg
8          9           10           5               4       Jack Jicama
9         10            5           8               7  Eve Elderberries
10        11            4           4               8       David Dates
11        12            1           3               3      Alice Apples
12        13            4  
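
The merge in Step 1 assumes that every `customer_id` appears only once in `customers`. As a minimal sketch of how to check that assumption, the example below uses the `validate` argument of `pd.merge` on small toy tables. The names `customers_demo`, `orders_demo`, and `customers_dup` and their values are made up for illustration and are not the notebook's data.

```python
import pandas as pd

# Toy data (hypothetical values, not the notebook's synthetic data)
customers_demo = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "customer_name": ["Alice Apples", "Bob Bananas", "Charlie Cherries"],
})
orders_demo = pd.DataFrame({
    "order_id": [101, 102, 103],
    "customer_id": [1, 1, 3],
    "order_quantity": [2, 5, 1],
})

# validate="many_to_one" raises pandas.errors.MergeError if customer_id
# is duplicated on the right-hand (customers) side.
merged_demo = pd.merge(
    orders_demo, customers_demo,
    on="customer_id", how="inner", validate="many_to_one",
)
print(merged_demo)

# Duplicate one customer row on purpose: the same merge now fails loudly
# instead of silently duplicating order rows.
customers_dup = pd.concat(
    [customers_demo, customers_demo.iloc[[0]]], ignore_index=True
)
try:
    pd.merge(orders_demo, customers_dup, on="customer_id", validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print("Duplicate key detected:", duplicate_detected)
```

Without `validate`, a duplicated customer row would quietly double that customer's orders and inflate every total computed later.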

In [3]:
# Step 2: Merge the resulting DataFrame with products to get product prices
# This merge operation will add the product details (name and price) to each order.
orders_customers_products = pd.merge(orders_customers, products, on='product_id', how='inner')
print("Step 2: Orders merged with Customers and Products")
print(orders_customers_products, "\n")

Step 2: Orders merged with Customers and Products
    order_id  customer_id  product_id  order_quantity     customer_name  \
0          1            7           4               9      Grace Grapes   
1          2            8           4               5   Hannah Honeydew   
2          3            8           8               2   Hannah Honeydew   
3          4            9           1               5       Ivy Iceberg   
4          5            2           2               9       Bob Bananas   
5          6            6          10               2        Frank Figs   
6          7           10          10               2       Jack Jicama   
7          8            9           1               8       Ivy Iceberg   
8          9           10           5               4       Jack Jicama   
9         10            5           8               7  Eve Elderberries   
10        11            4           4               8       David Dates   
11        12            1           3             
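
Both merges use `how='inner'`, which drops any order whose key has no match on the other side. That is harmless here because every order was sampled from existing IDs, but it can hide problems in real data. The sketch below, on hypothetical toy tables (`orders_demo`, `products_demo`), compares an inner merge with a left merge and uses the missing price to spot the unmatched order.

```python
import pandas as pd

# Toy data (hypothetical values); product_id 99 has no row in products_demo
orders_demo = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product_id": [10, 20, 99],
    "order_quantity": [1, 4, 2],
})
products_demo = pd.DataFrame({
    "product_id": [10, 20, 30],
    "product_name": ["Widget Wonder", "Gizmo Glitz", "Doodad Delight"],
    "product_price": [5.0, 7.5, 3.0],
})

# Inner merge: the order for product 99 disappears without warning.
inner = orders_demo.merge(products_demo, on="product_id", how="inner")

# Left merge: every order is kept; unmatched orders get NaN product columns.
left = orders_demo.merge(products_demo, on="product_id", how="left")
unmatched_orders = left[left["product_price"].isna()]

print("Rows after inner merge:", len(inner))
print("Rows after left merge:", len(left))
print(unmatched_orders[["order_id", "product_id"]])
```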

In [4]:
# Step 3: Select relevant columns and display the final merged DataFrame
# We are interested in customer names, product names, product prices, and order quantities.
final_result = orders_customers_products[['customer_name', 'product_name', 'product_price', 'order_quantity']]
print("Step 3: Final Merged DataFrame")
print(final_result, "\n")

Step 3: Final Merged DataFrame
       customer_name            product_name  product_price  order_quantity
0       Grace Grapes      Thingamajig Thrill          59.04               9
1    Hannah Honeydew      Thingamajig Thrill          59.04               5
2    Hannah Honeydew        Doohickey Dazzle          90.26               2
3        Ivy Iceberg           Widget Wonder          59.39               5
4        Bob Bananas             Gizmo Glitz          74.37               9
5         Frank Figs          Gubbins Galore          44.51               2
6        Jack Jicama          Gubbins Galore          44.51               2
7        Ivy Iceberg           Widget Wonder          59.39               8
8        Jack Jicama       Contraption Charm          48.13               4
9   Eve Elderberries        Doohickey Dazzle          90.26               7
10       David Dates      Thingamajig Thrill          59.04               8
11      Alice Apples          Doodad Delight          64.

In [5]:
# Step 4: Calculate the total amount spent by each customer
# We need to multiply the product price by the order quantity for each row and then sum it up per customer.
orders_customers_products['total_price'] = orders_customers_products['product_price'] * orders_customers_products['order_quantity']
customer_spending = orders_customers_products.groupby('customer_name')['total_price'].sum().reset_index()
customer_spending = customer_spending.rename(columns={'total_price': 'total_spent'})
print("Step 4: Total Amount Spent by Each Customer")
print(customer_spending, "\n")

Step 4: Total Amount Spent by Each Customer
      customer_name  total_spent
0      Alice Apples       549.09
1       Bob Bananas      1009.98
2  Charlie Cherries       296.95
3       David Dates      1050.13
4  Eve Elderberries       631.82
5        Frank Figs       346.02
6      Grace Grapes       531.36
7   Hannah Honeydew       475.72
8       Ivy Iceberg      1248.98
9       Jack Jicama       281.54 
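
Grouping by `customer_name` works for this dataset because every name is unique and every customer placed at least one order. As a hedged alternative, the sketch below groups by `customer_id` and then left-merges the totals back onto the customer list, so that two customers sharing a name stay separate and customers without orders show up with a total of 0. The tables `customers_demo` and `line_items` are hypothetical toy data.

```python
import pandas as pd

# Toy data (hypothetical values): two customers share a name, one never ordered
customers_demo = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "customer_name": ["Sam Smith", "Sam Smith", "Pat Jones"],
})
line_items = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "product_price": [10.0, 2.5, 4.0],
    "order_quantity": [2, 4, 5],
})

# Price times quantity per order line
line_items = line_items.assign(
    total_price=line_items["product_price"] * line_items["order_quantity"]
)

# Sum per customer_id using named aggregation
spent_by_id = (
    line_items.groupby("customer_id", as_index=False)
    .agg(total_spent=("total_price", "sum"))
)

# Start from the full customer list so customers with no orders are kept
spending = (
    customers_demo.merge(spent_by_id, on="customer_id", how="left")
    .fillna({"total_spent": 0.0})
)
print(spending)
```

Grouping by name alone would merge the two "Sam Smith" rows into a single total of 50.0.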



In [6]:
# Step 5: Identify the top 2 customers who spent the most
# Sort the customers by the total amount spent in descending order and take the top 2.
top_customers = customer_spending.sort_values(by='total_spent', ascending=False).head(2)
print("Step 5: Top 2 Customers Who Spent the Most")
print(top_customers, "\n")

Step 5: Top 2 Customers Who Spent the Most
  customer_name  total_spent
8   Ivy Iceberg      1248.98
3   David Dates      1050.13 
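
`sort_values(...).head(2)` is perfectly fine here. As an alternative sketch, `DataFrame.nlargest` does the same job in one call, and its `keep="all"` option also returns customers tied at the cutoff. The frame `spending_demo` below is hypothetical toy data with a deliberate tie for second place.

```python
import pandas as pd

# Toy data (hypothetical values): A and C tie for second place
spending_demo = pd.DataFrame({
    "customer_name": ["A", "B", "C", "D"],
    "total_spent": [120.0, 300.0, 120.0, 50.0],
})

# Exactly two rows; on a tie the first occurrence wins
top2 = spending_demo.nlargest(2, "total_spent")

# keep="all" returns every row tied with the last selected value
top2_with_ties = spending_demo.nlargest(2, "total_spent", keep="all")

print(top2)
print(top2_with_ties)
```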



In [7]:
# Step 6: Determine which products were never ordered
# Keep the products whose IDs do not appear in the orders DataFrame.
ordered_products = orders['product_id'].unique()
never_ordered_products = products[~products['product_id'].isin(ordered_products)]
print("Step 6: Products Never Ordered")
print(never_ordered_products, "\n")

Step 6: Products Never Ordered
   product_id  product_name  product_price
8           9  Whatsit Whiz          96.73 
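
The `isin` filter above is one way to find unordered products. Since this question is about merging, here is a sketch of the same result written as an anti-join: a left merge with `indicator=True`, keeping only the rows marked `left_only`. The tables `products_demo` and `orders_demo` are hypothetical toy data.

```python
import pandas as pd

# Toy data (hypothetical values): product 2 is never ordered
products_demo = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["Widget Wonder", "Gizmo Glitz", "Doodad Delight"],
})
orders_demo = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product_id": [1, 1, 3],
})

# Left merge against the distinct ordered IDs; indicator=True adds a
# "_merge" column that says where each row found a match.
flagged = products_demo.merge(
    orders_demo[["product_id"]].drop_duplicates(),
    on="product_id", how="left", indicator=True,
)

# "left_only" means the product exists but has no matching order
never_ordered = flagged.loc[
    flagged["_merge"] == "left_only", ["product_id", "product_name"]
]
print(never_ordered)
```

Dropping duplicate IDs before the merge keeps one row per product, so the result has the same shape as the `isin` version.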



In [8]:
# Step 7: Find the customer who ordered the highest quantity of a single product
# idxmax() returns the label of the first row holding the maximum order quantity, so tied orders are not reported.
max_order = orders_customers_products.loc[orders_customers_products['order_quantity'].idxmax()]
print("Step 7: Customer Who Ordered the Highest Quantity of a Single Product")
print(max_order[['customer_name', 'product_name', 'order_quantity']])

Step 7: Customer Who Ordered the Highest Quantity of a Single Product
customer_name           Grace Grapes
product_name      Thingamajig Thrill
order_quantity                     9
Name: 0, dtype: object
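
`idxmax()` returns only the first row with the largest value, and the Step 3 output shows at least one other order of 9 units (Bob Bananas, Gizmo Glitz). The sketch below shows one way to list every tied row. It also covers a second possible reading of the question, where quantities are summed per customer and product across orders. The frame `line_items` is toy data whose values mirror a few rows of the Step 3 output.

```python
import pandas as pd

# Toy data mirroring a few rows of the Step 3 output
line_items = pd.DataFrame({
    "customer_name": ["Grace Grapes", "Ivy Iceberg", "Bob Bananas", "Ivy Iceberg"],
    "product_name": ["Thingamajig Thrill", "Widget Wonder", "Gizmo Glitz", "Widget Wonder"],
    "order_quantity": [9, 5, 9, 8],
})

# Reading 1: largest single order. idxmax() reports only the first match.
first_max = line_items.loc[line_items["order_quantity"].idxmax()]

# Comparing against the maximum keeps every tied row.
max_qty = line_items["order_quantity"].max()
all_max = line_items[line_items["order_quantity"] == max_qty]

# Reading 2 (an alternative interpretation): total quantity per
# customer and product, summed across all of that customer's orders.
pair_totals = line_items.groupby(
    ["customer_name", "product_name"], as_index=False
)["order_quantity"].sum()
top_pair = pair_totals.loc[pair_totals["order_quantity"].idxmax()]

print(all_max)
print(top_pair)
```

Under the second reading the answer changes: two orders of 5 and 8 units add up to 13 for the same product.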
