# GRIT: Aggregation & Grouping - Day 3

**Learning Objectives**
- Master aggregation functions (COUNT, SUM, AVG, MIN, MAX)
- Group data with GROUP BY
- Filter groups with HAVING
- Combine aggregation with filtering
- Create summary reports

**Why this matters**  
Raw data is like individual puzzle pieces. Aggregation lets you step back and see the big picture: total sales, average prices, customer counts, and business patterns. This is where data becomes insight!

Today you'll learn to summarize mountains of data into meaningful business intelligence.

## Setup: Connect to Our Database

Let's connect to our e-commerce database:

In [None]:
# Load the SQL extension
%load_ext sql

# Connect to our sample database
%sql sqlite:///ecommerce.db

print("✅ Connected to database!")

## Theory: Understanding Aggregation

### What is Aggregation?
Aggregation is like summarizing a shopping list:
- **Individual items**: Apple, Banana, Apple, Orange, Banana
- **Summary**: 2 Apples, 2 Bananas, 1 Orange
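
The shopping-list analogy can be tried directly. The following is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database; the `shopping_list` table and its rows are hypothetical and are not part of `ecommerce.db`.

```python
import sqlite3

# Hypothetical in-memory table used only for this illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shopping_list (item TEXT)")
conn.executemany(
    "INSERT INTO shopping_list (item) VALUES (?)",
    [("Apple",), ("Banana",), ("Apple",), ("Orange",), ("Banana",)],
)

# One output row per distinct item, with the number of rows in that group
summary = conn.execute(
    """
    SELECT item, COUNT(*) AS quantity
    FROM shopping_list
    GROUP BY item
    ORDER BY quantity DESC, item
    """
).fetchall()
conn.close()

for item, quantity in summary:
    print(item, quantity)
```

Five individual rows collapse into three summary rows, one per distinct item.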

### Common Aggregation Functions:
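- **COUNT()**: How many rows? `COUNT(*)` counts every row; `COUNT(column)` counts only rows where that column is not NULL
- **SUM()**: Total of numeric values
- **AVG()**: Average value (NULLs are ignored)
- **MIN()**: Smallest value
- **MAX()**: Largest value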
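
To see how these functions treat NULL, here is a small sketch with Python's `sqlite3` module. The `demo_products` table and its four rows are assumptions made up for this example; one price is deliberately left as NULL.

```python
import sqlite3

# Hypothetical data: four products, one with an unknown (NULL) price
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO demo_products (name, price) VALUES (?, ?)",
    [("pen", 10.0), ("book", 20.0), ("mystery box", None), ("lamp", 30.0)],
)

# COUNT(*) counts all rows; the other functions skip the NULL price
stats = conn.execute(
    """
    SELECT COUNT(*)     AS all_rows,
           COUNT(price) AS priced_rows,
           SUM(price)   AS total,
           AVG(price)   AS average,
           MIN(price)   AS lowest,
           MAX(price)   AS highest
    FROM demo_products
    """
).fetchone()
conn.close()

print(stats)
```

Note that `AVG(price)` divides by the three non-NULL prices, not by the four rows in the table.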

### When to Use GROUP BY:
GROUP BY splits rows into groups that share a value and returns one result row per group. Typical requests:
- "Show me sales by category"
- "Count customers by state"
- "Average price by brand"

### WHERE vs HAVING:
- **WHERE**: Filters rows before grouping
- **HAVING**: Filters groups after aggregation
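
The difference is easiest to see side by side. This is a minimal sketch, assuming a hypothetical `demo_orders` table with made-up rows; it is not the `orders` table from `ecommerce.db`.

```python
import sqlite3

# Hypothetical orders: (status, amount)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_orders (status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO demo_orders (status, amount) VALUES (?, ?)",
    [
        ("shipped", 120), ("shipped", 30),
        ("pending", 80), ("pending", 20),
        ("cancelled", 10),
    ],
)

# WHERE removes individual rows first, then the survivors are grouped
where_rows = conn.execute(
    """
    SELECT status, COUNT(*) AS order_count
    FROM demo_orders
    WHERE amount > 25
    GROUP BY status
    ORDER BY status
    """
).fetchall()

# HAVING keeps all rows, groups them, then removes whole groups
having_rows = conn.execute(
    """
    SELECT status, COUNT(*) AS order_count
    FROM demo_orders
    GROUP BY status
    HAVING COUNT(*) >= 2
    ORDER BY status
    """
).fetchall()
conn.close()

print("WHERE :", where_rows)
print("HAVING:", having_rows)
```

With WHERE, the pending group shrinks to one order because its 20-unit order is dropped before counting. With HAVING, both pending orders are counted and only the single-order cancelled group is removed.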

## Examples: Basic Aggregation Functions

Let's start with counting and basic summaries:

In [None]:
-- Example 1: Count total customers
SELECT COUNT(*) as total_customers
FROM customers;

In [None]:
-- Example 2: Count customers with phone numbers (COUNT(column) skips NULLs)
SELECT COUNT(phone) as customers_with_phone
FROM customers;

In [None]:
-- Example 3: Count products by category
SELECT category, COUNT(*) as product_count
FROM products
GROUP BY category
ORDER BY product_count DESC;

In [None]:
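-- Example 4: Total value of all products in stock
-- Multiply each price by its quantity; SUM(price) alone would only add up list prices
SELECT SUM(price * stock_quantity) as total_inventory_value
FROM products;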
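
Summing `price` on its own adds one list price per product, which is not the same as the value of the stock on hand. The sketch below contrasts the two totals on a hypothetical two-row `products` table that borrows the `price` and `stock_quantity` column names used in this notebook; the rows are made up.

```python
import sqlite3

# Hypothetical stand-in for the products table: (name, price, stock_quantity)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (name TEXT, price REAL, stock_quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO products (name, price, stock_quantity) VALUES (?, ?, ?)",
    [("mug", 10.0, 3), ("lamp", 20.0, 1)],
)

# First column: sum of list prices. Second column: price weighted by quantity.
list_price_total, stock_value = conn.execute(
    """
    SELECT SUM(price), SUM(price * stock_quantity)
    FROM products
    """
).fetchone()
conn.close()

print("Sum of list prices:", list_price_total)
print("Value of stock on hand:", stock_value)
```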

In [None]:
-- Example 5: Average product price
SELECT AVG(price) as average_price
FROM products;

In [None]:
-- Example 6: Most expensive and cheapest products
SELECT MAX(price) as highest_price, MIN(price) as lowest_price
FROM products;

## Examples: GROUP BY with Single Aggregation

GROUP BY puts rows with the same value into one group and computes the aggregate once per group:

In [None]:
-- Example 7: Average price by category
SELECT category, AVG(price) as avg_price
FROM products
GROUP BY category
ORDER BY avg_price DESC;

In [None]:
-- Example 8: Total stock by category
SELECT category, SUM(stock_quantity) as total_stock
FROM products
GROUP BY category
ORDER BY total_stock DESC;

In [None]:
-- Example 9: Customer count by state
SELECT state, COUNT(*) as customer_count
FROM customers
GROUP BY state
ORDER BY customer_count DESC;

## Examples: Multiple Aggregations per Group

You can calculate multiple summaries for each group:

In [None]:
-- Example 10: Product statistics by category
SELECT category,
       COUNT(*) as product_count,
       AVG(price) as avg_price,
       MIN(price) as min_price,
       MAX(price) as max_price
FROM products
GROUP BY category
ORDER BY product_count DESC;
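
To check by hand what several aggregates per group return, it helps to run the same query shape on data small enough to verify mentally. This sketch assumes a hypothetical three-row `products` table, not the real one in `ecommerce.db`.

```python
import sqlite3

# Hypothetical rows: (category, price)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (category, price) VALUES (?, ?)",
    [("Books", 10.0), ("Books", 30.0), ("Toys", 5.0)],
)

# Every aggregate in the SELECT list is computed separately for each category
category_stats = conn.execute(
    """
    SELECT category,
           COUNT(*)   AS product_count,
           AVG(price) AS avg_price,
           MIN(price) AS min_price,
           MAX(price) AS max_price
    FROM products
    GROUP BY category
    ORDER BY product_count DESC
    """
).fetchall()
conn.close()

for row in category_stats:
    print(row)
```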

In [None]:
-- Example 11: Order statistics by status
SELECT order_status,
       COUNT(*) as order_count,
       AVG(total_amount) as avg_order_value,
       SUM(total_amount) as total_value
FROM orders
GROUP BY order_status
ORDER BY total_value DESC;

## Examples: HAVING Clause

HAVING filters groups after aggregation (WHERE filters rows before grouping):

In [None]:
-- Example 12: Categories with high average prices
SELECT category, AVG(price) as avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 100
ORDER BY avg_price DESC;

In [None]:
-- Example 13: States with at least 2 customers
SELECT state, COUNT(*) as customer_count
FROM customers
GROUP BY state
HAVING COUNT(*) >= 2
ORDER BY customer_count DESC;

In [None]:
-- Example 14: Categories with average price over 50 and total stock over 20
SELECT category,
       COUNT(*) as product_count,
       AVG(price) as avg_price,
       SUM(stock_quantity) as total_stock
FROM products
GROUP BY category
HAVING AVG(price) > 50 AND SUM(stock_quantity) > 20
ORDER BY avg_price DESC;

## Examples: Combining WHERE and HAVING

Use WHERE for row filtering, HAVING for group filtering:

In [None]:
-- Example 15: States with at least 2 active customers
SELECT state, COUNT(*) as active_customers
FROM customers
WHERE customer_status = 'active'
GROUP BY state
HAVING COUNT(*) >= 2
ORDER BY active_customers DESC;

In [None]:
-- Example 16: Order statistics by status, counting only orders over 50
-- (keeps statuses that have more than 2 such orders)
SELECT order_status,
       COUNT(*) as order_count,
       AVG(total_amount) as avg_value
FROM orders
WHERE total_amount > 50
GROUP BY order_status
HAVING COUNT(*) > 2
ORDER BY avg_value DESC;
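
Because WHERE runs before grouping, it changes the counts that HAVING later tests. The sketch below illustrates this with a hypothetical `customers` table that reuses the `state` and `customer_status` column names from this notebook; the seven rows are invented.

```python
import sqlite3

# Hypothetical rows: (state, customer_status)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (state TEXT, customer_status TEXT)")
conn.executemany(
    "INSERT INTO customers (state, customer_status) VALUES (?, ?)",
    [
        ("CA", "active"), ("CA", "active"), ("CA", "inactive"),
        ("NY", "active"), ("NY", "inactive"),
        ("TX", "inactive"), ("TX", "inactive"),
    ],
)

# HAVING only: every customer is counted, so all three states pass
having_only = conn.execute(
    """
    SELECT state, COUNT(*) AS customer_count
    FROM customers
    GROUP BY state
    HAVING COUNT(*) >= 2
    ORDER BY customer_count DESC, state
    """
).fetchall()

# WHERE + HAVING: inactive rows are dropped first, so only CA still has 2
where_and_having = conn.execute(
    """
    SELECT state, COUNT(*) AS active_customers
    FROM customers
    WHERE customer_status = 'active'
    GROUP BY state
    HAVING COUNT(*) >= 2
    ORDER BY active_customers DESC, state
    """
).fetchall()
conn.close()

print("HAVING only     :", having_only)
print("WHERE and HAVING:", where_and_having)
```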

## Examples: Business Intelligence Reports

Let's create some real business reports:

In [None]:
-- Example 17: Product Category Performance Report
SELECT category,
       COUNT(*) as products_offered,
       SUM(stock_quantity) as total_stock,
       AVG(price) as avg_selling_price,
       MIN(price) as lowest_price,
       MAX(price) as highest_price
FROM products
GROUP BY category
ORDER BY total_stock DESC;

In [None]:
-- Example 18: Customer Demographics by State
SELECT state,
       COUNT(*) as total_customers,
       COUNT(phone) as with_phone,
       ROUND(AVG(CASE WHEN phone IS NOT NULL THEN 1.0 ELSE 0 END) * 100, 1) as phone_pct
FROM customers
GROUP BY state
ORDER BY total_customers DESC;
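
The percentage column relies on conditional aggregation: a `CASE` expression turns each row into 1 or 0, and `AVG` of those values is the share of rows that matched. The sketch below checks this on hypothetical data and compares it with the equivalent `100.0 * COUNT(phone) / COUNT(*)` form; the table contents and phone numbers are made up.

```python
import sqlite3

# Hypothetical rows: (state, phone); None means no phone on file
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (state TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO customers (state, phone) VALUES (?, ?)",
    [("CA", "555-0100"), ("CA", "555-0101"), ("CA", None), ("NY", None)],
)

phone_report = conn.execute(
    """
    SELECT state,
           COUNT(*)     AS total_customers,
           COUNT(phone) AS with_phone,
           -- Share of rows with a phone, via AVG over 1/0 flags
           ROUND(AVG(CASE WHEN phone IS NOT NULL THEN 1.0 ELSE 0 END) * 100, 1)
               AS phone_pct,
           -- Same share, computed from the two counts
           ROUND(100.0 * COUNT(phone) / COUNT(*), 1) AS phone_pct_alt
    FROM customers
    GROUP BY state
    ORDER BY total_customers DESC
    """
).fetchall()
conn.close()

for row in phone_report:
    print(row)
```

Multiplying by `100.0` rather than `100` keeps the division in floating point, which matters in databases that would otherwise do integer division.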

In [None]:
-- Example 19: Order Status Summary
SELECT order_status,
       COUNT(*) as order_count,
       SUM(total_amount) as total_revenue,
       AVG(total_amount) as avg_order_value,
       MIN(total_amount) as smallest_order,
       MAX(total_amount) as largest_order
FROM orders
GROUP BY order_status
ORDER BY total_revenue DESC;

## Exercises

### Exercise 1: Basic Counting
Count the total number of orders in our database

In [None]:
-- Your code here
SELECT COUNT(*) as total_orders
FROM orders;

### Exercise 2: Group By Category
Show the count of products in each category, ordered by count descending

In [None]:
-- Your code here
SELECT category, COUNT(*) as product_count
FROM products
GROUP BY category
ORDER BY product_count DESC;

### Exercise 3: Price Statistics
Calculate the average, minimum, and maximum price for products

In [None]:
-- Your code here
SELECT AVG(price) as avg_price,
       MIN(price) as min_price,
       MAX(price) as max_price
FROM products;

### Exercise 4: Using HAVING
Find categories that have more than 2 products

In [None]:
-- Your code here
SELECT category, COUNT(*) as product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 2
ORDER BY product_count DESC;

### Exercise 5: Combined WHERE and HAVING
Count active customers per state, showing only states with at least 2 active customers

In [None]:
-- Your code here
SELECT state, COUNT(*) as active_count
FROM customers
WHERE customer_status = 'active'
GROUP BY state
HAVING COUNT(*) >= 2
ORDER BY active_count DESC;

### Exercise 6: Business Report
Create a report showing order status performance with revenue and average order value

In [None]:
-- Your code here
SELECT order_status,
       COUNT(*) as order_count,
       SUM(total_amount) as total_revenue,
       AVG(total_amount) as avg_order_value
FROM orders
GROUP BY order_status
ORDER BY total_revenue DESC;

## Debug-Me Cell

This query has an error. It should show categories with an average price over $100, but it fails to run. Can you fix it?

In [None]:
-- Debug this query
SELECT category, AVG(price) as avg_price
FROM products
WHERE AVG(price) > 100  -- This won't work!
GROUP BY category;

-- Hint: WHERE cannot use aggregate functions. Use HAVING instead!
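
To see the failure and the repair outside the notebook, here is a sketch using Python's `sqlite3` module. The four product rows are hypothetical. SQLite rejects the aggregate in WHERE when it prepares the statement, so the error is caught and printed instead of stopping the script.

```python
import sqlite3

# Hypothetical rows: (category, price)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (category, price) VALUES (?, ?)",
    [("Electronics", 150.0), ("Electronics", 250.0),
     ("Books", 20.0), ("Books", 30.0)],
)

broken_query = """
    SELECT category, AVG(price) AS avg_price
    FROM products
    WHERE AVG(price) > 100
    GROUP BY category
"""

# The aggregate inside WHERE makes SQLite raise an error
error_message = None
try:
    conn.execute(broken_query)
except sqlite3.Error as exc:
    error_message = str(exc)

# Moving the condition to HAVING applies it after the groups are formed
fixed_rows = conn.execute(
    """
    SELECT category, AVG(price) AS avg_price
    FROM products
    GROUP BY category
    HAVING AVG(price) > 100
    """
).fetchall()
conn.close()

print("Broken query error:", error_message)
print("Fixed query result:", fixed_rows)
```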

## Takeaways & Further Reading

### Aggregation Functions Mastered:
✅ **COUNT()**: Count rows or non-NULL values  
✅ **SUM()**: Add up numeric values  
✅ **AVG()**: Calculate average values  
✅ **MIN()**: Find smallest values  
✅ **MAX()**: Find largest values  

### GROUP BY & HAVING:
✅ **GROUP BY**: Create subgroups for analysis  
✅ **HAVING**: Filter groups after aggregation  
✅ **WHERE vs HAVING**: WHERE filters rows, HAVING filters groups  

### Key Concepts:
- **Aggregation** turns detailed data into summaries
- **GROUP BY** creates categories for analysis
- **HAVING**, not WHERE, filters on aggregated results
- **Multiple aggregations** per group provide rich insights
- **Business reports** combine multiple aggregations

### SQL Best Practices:
- Use aliases for complex aggregations
- Combine WHERE and HAVING for powerful filtering
- Use ROUND() to limit decimal places in monetary results (it rounds the number; it does not add currency symbols)
- Order results meaningfully
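
Several of these practices can be combined in one query. The sketch below, on a hypothetical `products` table with invented rows, gives the aggregate a readable alias, rounds it to two decimals, and orders by it.

```python
import sqlite3

# Hypothetical rows: (category, price)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (category, price) VALUES (?, ?)",
    [("Books", 10.0), ("Books", 10.0), ("Books", 10.5), ("Toys", 4.0)],
)

# raw_avg shows the unrounded value; avg_price is the rounded, aliased column
price_report = conn.execute(
    """
    SELECT category,
           AVG(price)           AS raw_avg,
           ROUND(AVG(price), 2) AS avg_price
    FROM products
    GROUP BY category
    ORDER BY avg_price DESC
    """
).fetchall()
conn.close()

for category, raw_avg, avg_price in price_report:
    print(category, raw_avg, avg_price)
```

`ROUND` still returns a number, so any currency symbol or thousands separator would have to be added when the result is displayed.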

### Tomorrow Preview:
Day 4: **JOINs & Relationships** - Learn to combine data from multiple tables. We'll connect customers to their orders and products to sales data!

### Practice Resources:
- [SQL GROUP BY](https://www.w3schools.com/sql/sql_groupby.asp)
- [SQL Aggregate Functions](https://www.w3schools.com/sql/sql_count_avg_sum.asp)
- [HAVING vs WHERE](https://www.sqlshack.com/sql-having-vs-where/)

**Congratulations! You can now summarize data like a pro! 📊**