# Notebook 03: Aggregations & GROUP BY

## Learning Objectives

- Use aggregate functions: COUNT, SUM, AVG, MIN, MAX
- Group rows with GROUP BY
- Filter groups with HAVING
- Combine aggregations with other clauses
- Understand COUNT(*) vs COUNT(column)

In [None]:
import os
import sys
from pathlib import Path

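# Resolve the project root whether the notebook is run from notebooks/ or from the root.
project_root = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()
# Make the modules under src/ (including sql_exercises) importable.
sys.path.insert(0, str(project_root / "src"))
import duckdb
from sql_exercises import check

os.environ["SQL_NOTEBOOK_NAME"] = "03_aggregations"
# Open the practice database read-only so exercise queries cannot modify it.
conn = duckdb.connect(
    str(project_root / "data" / "databases" / "practice.duckdb"), read_only=True
)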
print("Setup complete!")

## Quick Reference

```sql
-- Aggregate functions (table_name and column_name are placeholders)
SELECT COUNT(*) FROM table_name;                -- Count all rows
SELECT COUNT(column_name) FROM table_name;      -- Count non-NULL values
SELECT SUM(column_name) FROM table_name;        -- Sum of values
SELECT AVG(column_name) FROM table_name;        -- Average
SELECT MIN(column_name), MAX(column_name) FROM table_name;

-- Grouping
SELECT col, COUNT(*) FROM table_name GROUP BY col;

-- Filter groups (not rows)
SELECT col, COUNT(*) FROM table_name GROUP BY col HAVING COUNT(*) > 5;
```
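
To see the grouping patterns above in action before starting the exercises, here is a minimal sketch on a toy table. It is not part of the practice database: the `sales` table, its columns, and the `demo` connection are hypothetical, and it uses Python's built-in `sqlite3` instead of DuckDB so that it runs without any setup. The SQL shown should behave the same way in DuckDB.

```python
import sqlite3

# Hypothetical toy data, unrelated to practice.duckdb.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
demo.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("north", 10), ("north", 30),
        ("south", 5), ("south", 15), ("south", 40),
        ("west", 20),
    ],
)

# GROUP BY produces one output row per distinct region.
grouped = demo.execute(
    "SELECT region, COUNT(*) AS n, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()

# HAVING is applied after aggregation, so it can test COUNT(*).
busy = demo.execute(
    "SELECT region, COUNT(*) AS n FROM sales "
    "GROUP BY region HAVING COUNT(*) > 1 ORDER BY region"
).fetchall()

print(grouped)  # [('north', 2, 40), ('south', 3, 60), ('west', 1, 20)]
print(busy)     # [('north', 2), ('south', 3)]
demo.close()
```

The `west` group has a single row, so the `HAVING COUNT(*) > 1` condition removes it from the second result.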

---
## Exercise 1: COUNT All Rows (Easy)

**Problem:** Count the total number of employees in the company.

Return: A single column named `total_employees`

In [None]:
ex_01 = """

"""
conn.execute(ex_01).fetchdf()

In [None]:
check("ex_01", ex_01)

---
## Exercise 2: SUM, AVG, MIN, MAX (Easy)

**Problem:** Calculate salary statistics for all employees.

Return columns: total_salary, avg_salary, min_salary, max_salary

In [None]:
ex_02 = """

"""
conn.execute(ex_02).fetchdf()

In [None]:
check("ex_02", ex_02)

---
## Exercise 3: COUNT with Condition (Easy)

**Problem:** Count how many employees have a salary greater than $100,000.

Return: A single column named `high_earners`

In [None]:
ex_03 = """

"""
conn.execute(ex_03).fetchdf()

In [None]:
check("ex_03", ex_03)

---
## Exercise 4: Basic GROUP BY (Easy)

**Problem:** Count the number of employees in each department.

Return columns: department_id, employee_count

In [None]:
ex_04 = """

"""
conn.execute(ex_04).fetchdf()

In [None]:
check("ex_04", ex_04)

---
## Exercise 5: GROUP BY with Multiple Aggregates (Medium)

**Problem:** For each department, show the count, average salary, and total salary.

Return columns: department_id, emp_count, avg_salary, total_salary

In [None]:
ex_05 = """

"""
conn.execute(ex_05).fetchdf()

In [None]:
check("ex_05", ex_05)

---
## Exercise 6: HAVING Clause (Medium)

**Problem:** Find departments that have more than 18 employees.

Return columns: department_id, employee_count

In [None]:
ex_06 = """

"""
conn.execute(ex_06).fetchdf()

In [None]:
check("ex_06", ex_06)

---
## Exercise 7: COUNT(*) vs COUNT(column) (Medium)

**Problem:** For each department, show the total employees and how many have a commission.

Return columns: department_id, total_employees, employees_with_commission

**Hint:** COUNT(column) only counts non-NULL values
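
The hint can be checked on a small, unrelated example. This is a sketch only: the `pets` table and the `demo` connection are hypothetical, and Python's built-in `sqlite3` stands in for DuckDB. NULL handling in `COUNT` follows the same rule in both.

```python
import sqlite3

# Hypothetical toy data: two of the four pets have no owner (NULL).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE pets (name TEXT, owner TEXT)")
demo.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [("Rex", "Ana"), ("Tom", None), ("Kit", "Ana"), ("Bo", None)],
)

# COUNT(*) counts rows; COUNT(owner) skips rows where owner is NULL.
all_rows, with_owner = demo.execute(
    "SELECT COUNT(*), COUNT(owner) FROM pets"
).fetchone()

print(all_rows, with_owner)  # 4 2
demo.close()
```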

In [None]:
ex_07 = """

"""
conn.execute(ex_07).fetchdf()

In [None]:
check("ex_07", ex_07)

---
## Exercise 8: COUNT DISTINCT (Medium)

**Problem:** Count how many unique job titles exist in each department.

Return columns: department_id, unique_job_titles

In [None]:
ex_08 = """

"""
conn.execute(ex_08).fetchdf()

In [None]:
check("ex_08", ex_08)

---
## Exercise 9: WHERE + GROUP BY + HAVING (Hard)

**Problem:** For active employees only (is_active = true), find job titles that have an average salary above $90,000.

Return columns: job_title, avg_salary, employee_count

In [None]:
ex_09 = """

"""
conn.execute(ex_09).fetchdf()

In [None]:
check("ex_09", ex_09)

---
## Exercise 10: Aggregate on Ecommerce Data (Hard)

**Problem:** Find the total revenue (sum of total_amount) for each order_status. Only include statuses with total revenue over $10,000. Order by revenue descending.

Return columns: order_status, total_revenue, order_count

**Tables:** orders

In [None]:
ex_10 = """

"""
conn.execute(ex_10).fetchdf()

In [None]:
check("ex_10", ex_10)

---
## Exercise 11: Multiple GROUP BY Columns (Hard)

**Problem:** For each combination of department_id and is_active, show the employee count and average salary.

Return columns: department_id, is_active, emp_count, avg_salary

In [None]:
ex_11 = """

"""
conn.execute(ex_11).fetchdf()

In [None]:
check("ex_11", ex_11)

---
## Exercise 12: Product Review Statistics (Hard)

**Problem:** For each product, calculate the average rating and count of reviews. Only include products with at least 5 reviews. Order by average rating descending.

Return columns: product_id, avg_rating, review_count

**Tables:** reviews

In [None]:
ex_12 = """

"""
conn.execute(ex_12).fetchdf()

In [None]:
check("ex_12", ex_12)

---
## Summary

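- **COUNT, SUM, AVG, MIN, MAX** - Aggregate functions that reduce many rows to a single value
- **GROUP BY** - Group rows by column values; aggregates are computed once per group
- **HAVING** - Filter groups after aggregation (WHERE filters rows before it)
- **COUNT(*)** counts all rows, **COUNT(col)** counts only non-NULL values
- **COUNT(DISTINCT col)** counts unique non-NULL values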
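
As a recap, the sketch below combines a row filter, grouping, a group filter, and `COUNT(DISTINCT ...)` in one query. The `visits` table and the `demo` connection are hypothetical, and Python's built-in `sqlite3` is used in place of DuckDB so the snippet is self-contained.

```python
import sqlite3

# Hypothetical toy data, unrelated to practice.duckdb.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE visits (page TEXT, visitor TEXT, secs INTEGER)")
demo.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("home", "a", 5), ("home", "b", 50), ("home", "a", 70),
        ("docs", "a", 90), ("docs", "c", 8),
        ("blog", "b", 100),
    ],
)

# WHERE drops short visits first; the remaining rows are grouped by page;
# HAVING then keeps only pages with at least two remaining visits.
result = demo.execute(
    "SELECT page, COUNT(*) AS n, COUNT(DISTINCT visitor) AS visitors "
    "FROM visits "
    "WHERE secs >= 10 "
    "GROUP BY page "
    "HAVING COUNT(*) >= 2 "
    "ORDER BY page"
).fetchall()

print(result)  # [('home', 2, 2)]
demo.close()
```

Only `home` survives: `docs` and `blog` each have a single visit of 10 seconds or more, so they fail the `HAVING` condition.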

### Next: Notebook 04 - Joins

In [None]:
conn.close()