# Notebook 06: Subqueries

## Learning Objectives

- Use scalar subqueries in SELECT and WHERE
- Use subqueries with IN, ANY, and ALL
- Use EXISTS for existence checks
- Use correlated subqueries
- Use subqueries in the FROM clause (derived tables)

In [None]:
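import os, sys
from pathlib import Path

# Resolve the project root whether the notebook runs from notebooks/ or from the root
project_root = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()
# Make src/ importable so the sql_exercises helper can be found
sys.path.insert(0, str(project_root / 'src'))

import duckdb
from sql_exercises import check

# Name of this notebook, set for the sql_exercises helper
os.environ['SQL_NOTEBOOK_NAME'] = '06_subqueries'
# Open the practice database read-only so exercise queries cannot modify it
conn = duckdb.connect(str(project_root / 'data' / 'databases' / 'practice.duckdb'), read_only=True)
print("Setup complete!")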

## Quick Reference

```sql
-- Scalar subquery (returns a single value)
SELECT * FROM t WHERE col > (SELECT AVG(col) FROM t);

-- IN subquery
SELECT * FROM t WHERE id IN (SELECT id FROM other);

-- EXISTS subquery
SELECT * FROM t WHERE EXISTS (SELECT 1 FROM other WHERE other.t_id = t.id);

-- Derived table (subquery in FROM)
SELECT * FROM (SELECT id, COUNT(*) AS cnt FROM t GROUP BY id) sub WHERE cnt > 5;
```
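
The four patterns above can be tried on a tiny dataset before starting the exercises. The sketch below is only an illustration: it uses Python's built-in `sqlite3` with an in-memory database instead of the notebook's DuckDB connection, and the tables `t` and `other` and their rows are invented for the example. The queries are plain standard SQL, so they should behave the same way in DuckDB, but that is an assumption rather than something checked here.

```python
import sqlite3

# Toy data invented for illustration; this is not the practice database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t (id INTEGER, col INTEGER);
    CREATE TABLE other (t_id INTEGER);
    INSERT INTO t VALUES (1, 10), (2, 20), (3, 60);
    INSERT INTO other VALUES (2), (3), (3);
""")

# Scalar subquery: AVG(col) is 30, so only the row with col = 60 qualifies.
scalar_rows = db.execute("""
    SELECT id FROM t
    WHERE col > (SELECT AVG(col) FROM t)
    ORDER BY id
""").fetchall()

# IN subquery: the duplicate 3 in `other` does not duplicate the outer row.
in_rows = db.execute("""
    SELECT id FROM t
    WHERE id IN (SELECT t_id FROM other)
    ORDER BY id
""").fetchall()

# EXISTS subquery: the inner query refers to the outer row through t.id.
exists_rows = db.execute("""
    SELECT id FROM t
    WHERE EXISTS (SELECT 1 FROM other WHERE other.t_id = t.id)
    ORDER BY id
""").fetchall()

# Derived table: aggregate first, then filter on the aggregate.
derived_rows = db.execute("""
    SELECT t_id, cnt
    FROM (SELECT t_id, COUNT(*) AS cnt FROM other GROUP BY t_id) sub
    WHERE cnt > 1
""").fetchall()

print(scalar_rows, in_rows, exists_rows, derived_rows)
```

On this data the IN and EXISTS queries return the same rows, which is the usual case when the compared column contains no NULLs.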

---
## Exercise 1: Scalar Subquery in WHERE (Easy)

**Problem:** Find employees who earn more than the company average salary.

Return columns: employee_id, first_name, last_name, salary

In [None]:
ex_01 = '''

'''
conn.execute(ex_01).fetchdf()

In [None]:
check("ex_01", ex_01)

---
## Exercise 2: IN with Subquery (Easy)

**Problem:** Find all employees who work in departments located in 'San Francisco'.

Return columns: employee_id, first_name, last_name, department_id

In [None]:
ex_02 = '''

'''
conn.execute(ex_02).fetchdf()

In [None]:
check("ex_02", ex_02)

---
## Exercise 3: NOT IN Subquery (Easy)

**Problem:** Find products that have never been ordered.

Return columns: product_id, product_name

**Hint:** Use NOT IN (SELECT product_id FROM order_items)
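
One thing to keep in mind with NOT IN: if the subquery returns even one NULL, the comparison evaluates to unknown for every non-matching row and the query returns nothing. Whether `order_items.product_id` contains NULLs in the practice database is not stated here, so the sketch below shows the effect on invented tables (`items`, `refs`) using Python's built-in `sqlite3` rather than the notebook's DuckDB connection. The behavior comes from standard SQL three-valued logic.

```python
import sqlite3

# Toy tables invented for illustration; not the practice database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE items (item_id INTEGER);
    CREATE TABLE refs (item_id INTEGER);
    INSERT INTO items VALUES (1), (2), (3);
    INSERT INTO refs VALUES (1), (NULL);
""")

# The NULL in refs makes "item_id NOT IN (...)" unknown for items 2 and 3,
# and false for item 1, so no row passes the filter.
not_in_rows = db.execute("""
    SELECT item_id FROM items
    WHERE item_id NOT IN (SELECT refs.item_id FROM refs)
    ORDER BY item_id
""").fetchall()

# Filtering NULLs out of the subquery restores the intended result.
filtered_rows = db.execute("""
    SELECT item_id FROM items
    WHERE item_id NOT IN (
        SELECT refs.item_id FROM refs WHERE refs.item_id IS NOT NULL
    )
    ORDER BY item_id
""").fetchall()

print(not_in_rows)    # []
print(filtered_rows)  # [(2,), (3,)]
```

If the subquery column is declared NOT NULL, the extra filter is unnecessary.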

In [None]:
ex_03 = '''

'''
conn.execute(ex_03).fetchdf()

In [None]:
check("ex_03", ex_03)

---
## Exercise 4: EXISTS Subquery (Medium)

**Problem:** Find customers who have placed at least one order.

Return columns: customer_id, first_name, last_name

**Hint:** Use EXISTS with a correlated subquery

In [None]:
ex_04 = '''

'''
conn.execute(ex_04).fetchdf()

In [None]:
check("ex_04", ex_04)

---
## Exercise 5: NOT EXISTS (Medium)

**Problem:** Find customers who have never placed an order (using NOT EXISTS).

Return columns: customer_id, first_name, last_name

In [None]:
ex_05 = '''

'''
conn.execute(ex_05).fetchdf()

In [None]:
check("ex_05", ex_05)

---
## Exercise 6: Subquery in SELECT (Medium)

**Problem:** For each department, show the department name and the company's total employee count.

Return columns: department_name, total_company_employees

**Hint:** Put a scalar subquery in the SELECT list that counts all employees; it returns the same value on every row

In [None]:
ex_06 = '''

'''
conn.execute(ex_06).fetchdf()

In [None]:
check("ex_06", ex_06)

---
## Exercise 7: Derived Table (Subquery in FROM) (Medium)

**Problem:** Find departments where the average salary is above $80,000. Use a subquery in FROM.

Return columns: department_id, avg_salary

In [None]:
ex_07 = '''

'''
conn.execute(ex_07).fetchdf()

In [None]:
check("ex_07", ex_07)

---
## Exercise 8: Correlated Subquery (Hard)

**Problem:** Find employees who earn above the average salary of their own department.

Return columns: employee_id, first_name, last_name, salary, department_id

In [None]:
ex_08 = '''

'''
conn.execute(ex_08).fetchdf()

In [None]:
check("ex_08", ex_08)

---
## Exercise 9: Multiple Subqueries (Hard)

**Problem:** Find the employee(s) with the highest salary in the entire company.

Return columns: employee_id, first_name, last_name, salary

In [None]:
ex_09 = '''

'''
conn.execute(ex_09).fetchdf()

In [None]:
check("ex_09", ex_09)

---
## Exercise 10: Complex Subquery (Hard)

**Problem:** Find products that have an average rating higher than the overall average rating across all products.

Return columns: product_id, product_name, avg_rating

**Tables:** products, reviews

In [None]:
ex_10 = '''

'''
conn.execute(ex_10).fetchdf()

In [None]:
check("ex_10", ex_10)

---
## Summary

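- **Scalar subquery** - Returns a single value; used in SELECT or WHERE
- **IN subquery** - Checks membership in the set of values the subquery returns
- **EXISTS** - Checks whether the subquery returns any rows (usually correlated)
- **Derived table** - Subquery in the FROM clause, treated like a table
- **Correlated subquery** - References columns of the outer query, so it is evaluated per outer row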
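
The last two items are often interchangeable: a correlated subquery that compares each row with a per-group aggregate can usually be rewritten as a join against a derived table. The sketch below shows both forms on an invented `scores` table, using Python's built-in `sqlite3` instead of the notebook's DuckDB connection; the table, its columns, and its rows are assumptions made for the example.

```python
import sqlite3

# Toy data invented for illustration; not the practice database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE scores (player TEXT, game INTEGER, points INTEGER);
    INSERT INTO scores VALUES
        ('ann', 1, 10), ('ann', 2, 30),
        ('bob', 1, 5),  ('bob', 2, 5), ('bob', 3, 20);
""")

# Correlated form: the inner query is tied to the outer row via s1.player.
correlated_rows = db.execute("""
    SELECT s1.player, s1.game, s1.points
    FROM scores s1
    WHERE s1.points > (
        SELECT AVG(s2.points) FROM scores s2 WHERE s2.player = s1.player
    )
    ORDER BY s1.player, s1.game
""").fetchall()

# Derived-table form: compute each player's average once, then join to it.
derived_rows = db.execute("""
    SELECT s.player, s.game, s.points
    FROM scores s
    JOIN (
        SELECT player, AVG(points) AS avg_points
        FROM scores
        GROUP BY player
    ) a ON a.player = s.player
    WHERE s.points > a.avg_points
    ORDER BY s.player, s.game
""").fetchall()

print(correlated_rows)
print(derived_rows)
```

Both queries return the games where a player scored above their own average. Which form reads better, or runs faster, depends on the query and the engine.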

### Next: Notebook 07 - Set Operations

In [None]:
conn.close()