# Advanced Queries and Joins in SQLite

## Learning Objectives

By the end of this notebook, you will be able to:

1. Filter data with WHERE clauses using various operators
2. Sort and limit query results with ORDER BY and LIMIT
3. Use aggregate functions (COUNT, SUM, AVG, MIN, MAX)
4. Group data with GROUP BY and filter groups with HAVING
5. Combine data from multiple tables using INNER JOIN and LEFT JOIN
6. Write subqueries for complex data retrieval

---

## Setup: Create and Populate the Company Database

First, let's create our three-table company database with sample data.

In [None]:
import sqlite3
import os

# Remove existing database for a fresh start
db_path = 'company.db'
if os.path.exists(db_path):
    os.remove(db_path)

# Create database and connection
conn = sqlite3.connect(db_path)
cursor = conn.cursor()

# Create tables
cursor.executescript('''
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,
        budget REAL DEFAULT 0
    );
    
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER,
        salary REAL,
        hire_date TEXT,
        FOREIGN KEY (department_id) REFERENCES departments(id)
    );
    
    CREATE TABLE projects (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER,
        start_date TEXT,
        end_date TEXT,
        FOREIGN KEY (department_id) REFERENCES departments(id)
    );
''')

print("Tables created successfully!")

In [None]:
# Insert departments
departments = [
    (1, 'Engineering', 500000),
    (2, 'Marketing', 300000),
    (3, 'Sales', 400000),
    (4, 'HR', 200000),
    (5, 'Finance', 350000)
]

cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', departments)
print(f"Inserted {len(departments)} departments")

In [None]:
# Insert employees
employees = [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Carol Williams', 1, 92000, '2019-08-20'),
    (4, 'David Brown', 2, 78000, '2022-01-10'),
    (5, 'Eva Martinez', 2, 82000, '2021-03-25'),
    (6, 'Frank Wilson', 3, 88000, '2020-11-05'),
    (7, 'Grace Lee', 3, 91000, '2019-05-12'),
    (8, 'Henry Taylor', 4, 65000, '2022-07-18'),
    (9, 'Ivy Chen', 1, 105000, '2018-02-28'),
    (10, 'Jack Anderson', 3, 95000, '2020-09-14'),
    (11, 'Karen White', 2, 75000, '2023-02-01'),
    (12, 'Leo Garcia', 5, 89000, '2021-11-20'),
    (13, 'Mia Davis', 4, 62000, '2023-04-15'),
    (14, 'Nathan Moore', 5, 94000, '2019-07-01'),
    (15, 'Olivia Clark', 1, 110000, '2017-09-10')
]

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', employees)
print(f"Inserted {len(employees)} employees")

In [None]:
# Insert projects
projects = [
    (1, 'Cloud Migration', 1, '2024-01-15', '2024-06-30'),
    (2, 'Brand Refresh', 2, '2024-02-01', '2024-04-30'),
    (3, 'Q2 Sales Campaign', 3, '2024-04-01', '2024-06-30'),
    (4, 'Mobile App v2.0', 1, '2024-03-01', '2024-09-30'),
    (5, 'Employee Portal', 4, '2024-02-15', '2024-05-31'),
    (6, 'Data Analytics Platform', 1, '2024-05-01', '2024-12-31'),
    (7, 'Customer Retention', 3, '2024-03-15', '2024-08-15'),
    (8, 'Budget Planning System', 5, '2024-01-01', '2024-03-31')
]

cursor.executemany('INSERT INTO projects VALUES (?, ?, ?, ?, ?)', projects)
conn.commit()
print(f"Inserted {len(projects)} projects")
print("\nDatabase setup complete!")

---

## 1. WHERE Clauses and Operators

The `WHERE` clause filters rows based on conditions.

### Comparison Operators

| Operator | Description |
|----------|-------------|
| `=` | Equal to |
| `<>` or `!=` | Not equal to |
| `<` | Less than |
| `>` | Greater than |
| `<=` | Less than or equal |
| `>=` | Greater than or equal |

In [None]:
# Equality: Find employees in Engineering (department_id = 1)
cursor.execute('''
    SELECT name, salary FROM employees WHERE department_id = 1
''')

print("Engineering Employees:")
for name, salary in cursor.fetchall():
    print(f"  {name}: ${salary:,.0f}")

In [None]:
# Comparison: Find employees earning more than $90,000
cursor.execute('''
    SELECT name, salary FROM employees WHERE salary > 90000
''')

print("High Earners (>$90k):")
for name, salary in cursor.fetchall():
    print(f"  {name}: ${salary:,.0f}")

In [None]:
# Not equal: Find employees NOT in HR (department_id <> 4)
cursor.execute('''
    SELECT name, department_id FROM employees WHERE department_id <> 4
''')

print("Non-HR Employees (first 5):")
for name, dept_id in cursor.fetchall()[:5]:
    print(f"  {name} (Dept {dept_id})")

### LIKE Operator (Pattern Matching)

The `LIKE` operator matches patterns (in SQLite, case-insensitively for ASCII letters by default):
- `%` matches any sequence of characters
- `_` matches any single character

In [None]:
# Names starting with 'A'
cursor.execute("SELECT name FROM employees WHERE name LIKE 'A%'")
print("Names starting with 'A':")
for (name,) in cursor.fetchall():
    print(f"  {name}")

# Names ending with 'son'
cursor.execute("SELECT name FROM employees WHERE name LIKE '%son'")
print("\nNames ending with 'son':")
for (name,) in cursor.fetchall():
    print(f"  {name}")

# Names containing 'ar'
cursor.execute("SELECT name FROM employees WHERE name LIKE '%ar%'")
print("\nNames containing 'ar':")
for (name,) in cursor.fetchall():
    print(f"  {name}")
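
The cell above only exercises `%`. As a minimal sketch of the `_` wildcard, the block below uses a throwaway in-memory table (the `people` table and its rows are illustrative assumptions, not part of the company database). It also shows that SQLite's default `LIKE` ignores case for ASCII letters.

```python
import sqlite3

# Throwaway in-memory database; table name and rows are illustrative only
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE people (name TEXT)')
demo.executemany('INSERT INTO people VALUES (?)',
                 [('Jan',), ('Jon',), ('Joan',), ('jen',)])

# '_' matches exactly one character, so only three-letter names qualify
one_char = [row[0] for row in demo.execute(
    "SELECT name FROM people WHERE name LIKE 'J_n' ORDER BY name")]

# '%' matches zero or more characters, so 'Joan' qualifies as well
any_len = [row[0] for row in demo.execute(
    "SELECT name FROM people WHERE name LIKE 'J%n' ORDER BY name")]

print("J_n:", one_char)
print("J%n:", any_len)
demo.close()
```

Note that `jen` matches both patterns even though the pattern starts with an uppercase `J`.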

### IN Operator

The `IN` operator checks if a value matches any value in a list.

In [None]:
# Employees in Engineering, Sales, or Finance (departments 1, 3, 5)
cursor.execute('''
    SELECT name, department_id FROM employees 
    WHERE department_id IN (1, 3, 5)
''')

print("Engineering, Sales, or Finance employees:")
for name, dept_id in cursor.fetchall():
    print(f"  {name} (Dept {dept_id})")
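
When the list of values comes from Python rather than being typed into the SQL, one common approach is to generate one `?` placeholder per value. The sketch below assumes a small in-memory `staff` table and a `wanted` list, both hypothetical and separate from the company database.

```python
import sqlite3

# Throwaway in-memory database; table name and rows are illustrative only
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE staff (name TEXT, dept INTEGER)')
demo.executemany('INSERT INTO staff VALUES (?, ?)',
                 [('Ann', 1), ('Ben', 2), ('Cy', 3), ('Di', 5)])

wanted = [1, 3, 5]  # hypothetical list supplied by the caller

# Build '?, ?, ?' so each value is bound as a parameter, not pasted into the SQL
placeholders = ', '.join('?' for _ in wanted)
query = f'SELECT name FROM staff WHERE dept IN ({placeholders}) ORDER BY name'

names = [row[0] for row in demo.execute(query, wanted)]
print(names)
demo.close()
```

Only the placeholder string is interpolated into the query; the values themselves are still passed separately to `execute`.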

### BETWEEN Operator

The `BETWEEN` operator selects values within a range (inclusive).

In [None]:
# Employees earning between $80,000 and $95,000
cursor.execute('''
    SELECT name, salary FROM employees 
    WHERE salary BETWEEN 80000 AND 95000
    ORDER BY salary
''')

print("Employees earning $80k-$95k:")
for name, salary in cursor.fetchall():
    print(f"  {name}: ${salary:,.0f}")

In [None]:
# Employees hired between specific dates
# (ISO 8601 text dates, YYYY-MM-DD, compare correctly as strings)
cursor.execute('''
    SELECT name, hire_date FROM employees 
    WHERE hire_date BETWEEN '2020-01-01' AND '2021-12-31'
    ORDER BY hire_date
''')

print("Employees hired in 2020-2021:")
for name, hire_date in cursor.fetchall():
    print(f"  {name}: {hire_date}")
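
The date range works because `hire_date` holds ISO 8601 text (`YYYY-MM-DD`), which sorts in the same order as the dates it represents. A minimal sketch on a throwaway in-memory table (the `hires` table and its rows are assumptions for illustration) checks that `BETWEEN` includes both endpoints and matches the explicit `>=` / `<=` form:

```python
import sqlite3

# Throwaway in-memory database; table name and rows are illustrative only
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE hires (name TEXT, hire_date TEXT)')
demo.executemany('INSERT INTO hires VALUES (?, ?)', [
    ('Ann', '2019-12-31'),  # one day before the range
    ('Ben', '2020-01-01'),  # exactly on the lower bound
    ('Cy',  '2021-12-31'),  # exactly on the upper bound
    ('Di',  '2022-01-01'),  # one day after the range
])

between = [row[0] for row in demo.execute(
    "SELECT name FROM hires "
    "WHERE hire_date BETWEEN '2020-01-01' AND '2021-12-31' ORDER BY name")]

explicit = [row[0] for row in demo.execute(
    "SELECT name FROM hires "
    "WHERE hire_date >= '2020-01-01' AND hire_date <= '2021-12-31' ORDER BY name")]

print(between, explicit)
demo.close()
```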

### Combining Conditions with AND/OR

In [None]:
# AND: Engineering employees earning over $90k
cursor.execute('''
    SELECT name, salary FROM employees 
    WHERE department_id = 1 AND salary > 90000
''')

print("Engineering employees earning >$90k:")
for name, salary in cursor.fetchall():
    print(f"  {name}: ${salary:,.0f}")

In [None]:
# OR: Employees in HR OR earning over $100k
cursor.execute('''
    SELECT name, department_id, salary FROM employees 
    WHERE department_id = 4 OR salary > 100000
''')

print("HR employees OR high earners:")
for name, dept_id, salary in cursor.fetchall():
    print(f"  {name} (Dept {dept_id}): ${salary:,.0f}")

In [None]:
# Complex condition with parentheses
cursor.execute('''
    SELECT name, department_id, salary FROM employees 
    WHERE (department_id = 1 OR department_id = 3) AND salary > 90000
''')

print("Engineering or Sales employees earning >$90k:")
for name, dept_id, salary in cursor.fetchall():
    print(f"  {name} (Dept {dept_id}): ${salary:,.0f}")
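
The parentheses in the last query matter because `AND` binds more tightly than `OR`. The sketch below, on a throwaway in-memory `staff` table (names and rows are illustrative assumptions), runs the same condition with and without parentheses to show how the results differ:

```python
import sqlite3

# Throwaway in-memory database; table name and rows are illustrative only
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE staff (name TEXT, dept INTEGER, salary REAL)')
demo.executemany('INSERT INTO staff VALUES (?, ?, ?)', [
    ('Ann', 1, 80000),  # dept 1, below the threshold
    ('Ben', 1, 95000),  # dept 1, above the threshold
    ('Cy',  3, 96000),  # dept 3, above the threshold
    ('Di',  3, 70000),  # dept 3, below the threshold
])

# Parsed as: dept = 1 OR (dept = 3 AND salary > 90000)
no_parens = [row[0] for row in demo.execute(
    'SELECT name FROM staff '
    'WHERE dept = 1 OR dept = 3 AND salary > 90000 ORDER BY name')]

# Parentheses force the OR to be evaluated first
with_parens = [row[0] for row in demo.execute(
    'SELECT name FROM staff '
    'WHERE (dept = 1 OR dept = 3) AND salary > 90000 ORDER BY name')]

print("Without parentheses:", no_parens)
print("With parentheses:   ", with_parens)
demo.close()
```

Without parentheses, every dept 1 row passes regardless of salary, which is why `Ann` appears only in the first result.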

---

## 2. ORDER BY and LIMIT

### ORDER BY

Sort results by one or more columns:
- `ASC` (ascending) - default
- `DESC` (descending)

In [None]:
# Sort by salary (ascending - lowest first)
cursor.execute('''
    SELECT name, salary FROM employees ORDER BY salary ASC
''')

print("Employees by salary (ascending):")
for name, salary in cursor.fetchall()[:5]:
    print(f"  {name}: ${salary:,.0f}")

In [None]:
# Sort by salary (descending - highest first)
cursor.execute('''
    SELECT name, salary FROM employees ORDER BY salary DESC
''')

print("Employees by salary (descending):")
for name, salary in cursor.fetchall()[:5]:
    print(f"  {name}: ${salary:,.0f}")

In [None]:
# Sort by multiple columns
cursor.execute('''
    SELECT name, department_id, salary FROM employees 
    ORDER BY department_id ASC, salary DESC
''')

print("Employees by department, then salary:")
current_dept = None
for name, dept_id, salary in cursor.fetchall():
    if dept_id != current_dept:
        print(f"\nDepartment {dept_id}:")
        current_dept = dept_id
    print(f"  {name}: ${salary:,.0f}")

### LIMIT and OFFSET

- `LIMIT n` - Return only the first n rows
- `OFFSET m` - Skip the first m rows

In [None]:
# Top 3 highest paid employees
cursor.execute('''
    SELECT name, salary FROM employees 
    ORDER BY salary DESC, name  -- name breaks the $95,000 tie for third place
    LIMIT 3
''')

print("Top 3 Highest Paid:")
for i, (name, salary) in enumerate(cursor.fetchall(), 1):
    print(f"  {i}. {name}: ${salary:,.0f}")

In [None]:
# Pagination: Get employees 6-10 (skip first 5)
cursor.execute('''
    SELECT name, salary FROM employees 
    ORDER BY salary DESC, name  -- tie-breaker keeps page boundaries stable
    LIMIT 5 OFFSET 5
''')

print("Employees ranked 6-10 by salary:")
for i, (name, salary) in enumerate(cursor.fetchall(), 6):
    print(f"  {i}. {name}: ${salary:,.0f}")
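
The `LIMIT` / `OFFSET` pattern generalizes to any page number. Here is a minimal sketch of a pagination helper; `fetch_page`, the `staff` table, and its rows are hypothetical names introduced for illustration. It assumes 1-based page numbers and adds `name` as a tie-breaker so rows with equal salaries always land on the same page.

```python
import sqlite3

def fetch_page(conn, page, page_size):
    """Return one page of (name, salary) rows, highest salary first."""
    offset = (page - 1) * page_size  # page 1 skips 0 rows, page 2 skips page_size rows
    return conn.execute(
        'SELECT name, salary FROM staff '
        'ORDER BY salary DESC, name ASC '  # name breaks salary ties
        'LIMIT ? OFFSET ?',
        (page_size, offset)).fetchall()

# Throwaway in-memory database; table name and rows are illustrative only
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE staff (name TEXT, salary REAL)')
demo.executemany('INSERT INTO staff VALUES (?, ?)', [
    ('Ann', 90000), ('Ben', 95000), ('Cy', 95000), ('Di', 70000), ('Ed', 60000)])

page1 = fetch_page(demo, 1, 2)
page2 = fetch_page(demo, 2, 2)
page3 = fetch_page(demo, 3, 2)  # last page may hold fewer rows than page_size

for number, page in enumerate([page1, page2, page3], 1):
    print(f"Page {number}: {[name for name, _ in page]}")
demo.close()
```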

---

## 3. Aggregate Functions

Aggregate functions compute a single result from multiple rows.

| Function | Description |
|----------|-------------|
| `COUNT()` | Number of rows (`COUNT(*)`) or of non-NULL values (`COUNT(column)`) |
| `SUM()` | Sum of values |
| `AVG()` | Average value |
| `MIN()` | Minimum value |
| `MAX()` | Maximum value |

In [None]:
# COUNT: Total number of employees
cursor.execute('SELECT COUNT(*) FROM employees')
count = cursor.fetchone()[0]
print(f"Total employees: {count}")

# COUNT with condition
cursor.execute('SELECT COUNT(*) FROM employees WHERE salary > 90000')
high_earners = cursor.fetchone()[0]
print(f"Employees earning >$90k: {high_earners}")

In [None]:
# SUM: Total salary expense
cursor.execute('SELECT SUM(salary) FROM employees')
total_salary = cursor.fetchone()[0]
print(f"Total salary expense: ${total_salary:,.0f}")

# SUM for specific department
cursor.execute('SELECT SUM(salary) FROM employees WHERE department_id = 1')
eng_salary = cursor.fetchone()[0]
print(f"Engineering salary expense: ${eng_salary:,.0f}")

In [None]:
# AVG: Average salary
cursor.execute('SELECT AVG(salary) FROM employees')
avg_salary = cursor.fetchone()[0]
print(f"Average salary: ${avg_salary:,.2f}")

In [None]:
# MIN and MAX
cursor.execute('SELECT MIN(salary), MAX(salary) FROM employees')
min_sal, max_sal = cursor.fetchone()
print(f"Salary range: ${min_sal:,.0f} - ${max_sal:,.0f}")

# MIN/MAX with dates
cursor.execute('SELECT MIN(hire_date), MAX(hire_date) FROM employees')
oldest, newest = cursor.fetchone()
print(f"Hire date range: {oldest} to {newest}")

In [None]:
# Multiple aggregates in one query
cursor.execute('''
    SELECT 
        COUNT(*) as count,
        SUM(salary) as total,
        AVG(salary) as average,
        MIN(salary) as minimum,
        MAX(salary) as maximum
    FROM employees
''')

count, total, avg, min_s, max_s = cursor.fetchone()
print("Employee Salary Statistics:")
print(f"  Count: {count}")
print(f"  Total: ${total:,.0f}")
print(f"  Average: ${avg:,.2f}")
print(f"  Minimum: ${min_s:,.0f}")
print(f"  Maximum: ${max_s:,.0f}")
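
Every employee in the company database has a salary, so the cells above never meet a `NULL`. As a sketch of how the aggregates treat missing values, the block below uses a throwaway in-memory `staff` table (an illustrative assumption) in which one salary is `NULL`:

```python
import sqlite3

# Throwaway in-memory database; table name and rows are illustrative only
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE staff (name TEXT, salary REAL)')
demo.executemany('INSERT INTO staff VALUES (?, ?)', [
    ('Ann', 80000),
    ('Ben', None),     # stored as NULL
    ('Cy', 100000),
])

# COUNT(*) counts rows; COUNT(salary), SUM and AVG skip the NULL salary
all_rows, with_salary, total, average = demo.execute(
    'SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary) FROM staff'
).fetchone()

print(f"Rows: {all_rows}, rows with a salary: {with_salary}")
print(f"Total: {total:,.0f}, average: {average:,.0f}")
demo.close()
```

The average is taken over the two known salaries, not over all three rows.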

---

## 4. GROUP BY and HAVING

### GROUP BY

`GROUP BY` collects rows that share the same values in the listed columns into groups, and aggregate functions are then computed once per group.

In [None]:
# Count employees per department
cursor.execute('''
    SELECT department_id, COUNT(*) as employee_count
    FROM employees
    GROUP BY department_id
''')

print("Employees per Department:")
for dept_id, count in cursor.fetchall():
    print(f"  Department {dept_id}: {count} employees")

In [None]:
# Average salary per department
cursor.execute('''
    SELECT department_id, 
           COUNT(*) as count,
           AVG(salary) as avg_salary,
           SUM(salary) as total_salary
    FROM employees
    GROUP BY department_id
    ORDER BY avg_salary DESC
''')

print("Department Salary Statistics:")
print(f"{'Dept':<6} {'Count':<7} {'Avg Salary':<12} {'Total'}")
print("-" * 40)
for dept_id, count, avg_sal, total in cursor.fetchall():
    print(f"{dept_id:<6} {count:<7} ${avg_sal:<11,.0f} ${total:,.0f}")

In [None]:
# Group by year hired (extract year from hire_date)
cursor.execute('''
    SELECT strftime('%Y', hire_date) as year,
           COUNT(*) as hires,
           AVG(salary) as avg_salary
    FROM employees
    GROUP BY strftime('%Y', hire_date)
    ORDER BY year
''')

print("Hiring by Year:")
print(f"{'Year':<6} {'Hires':<7} {'Avg Salary'}")
print("-" * 28)
for year, hires, avg_sal in cursor.fetchall():
    print(f"{year:<6} {hires:<7} ${avg_sal:,.0f}")

### HAVING

`HAVING` filters groups (like WHERE, but for aggregated results).

**Key difference:**
- `WHERE` filters rows before grouping
- `HAVING` filters groups after aggregation
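
The difference can be sketched on a throwaway in-memory `staff` table (names and rows are illustrative assumptions). The first query filters rows before counting, the second filters whole groups after averaging, and the third shows that SQLite rejects an aggregate inside `WHERE`:

```python
import sqlite3

# Throwaway in-memory database; table name and rows are illustrative only
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE staff (name TEXT, dept INTEGER, salary REAL)')
demo.executemany('INSERT INTO staff VALUES (?, ?, ?)', [
    ('Ann', 1, 60), ('Ben', 1, 100), ('Cy', 2, 90), ('Di', 2, 95)])

# WHERE runs first: rows with salary <= 80 are dropped before grouping
where_first = demo.execute(
    'SELECT dept, COUNT(*) FROM staff WHERE salary > 80 '
    'GROUP BY dept ORDER BY dept').fetchall()

# HAVING runs after aggregation: all rows are grouped, then groups are filtered
having_after = demo.execute(
    'SELECT dept, COUNT(*) FROM staff GROUP BY dept '
    'HAVING AVG(salary) > 85 ORDER BY dept').fetchall()

# An aggregate in WHERE is an error, because no groups exist at that point
try:
    demo.execute('SELECT dept FROM staff WHERE COUNT(*) > 1 GROUP BY dept')
    aggregate_in_where_ok = True
except sqlite3.Error:
    aggregate_in_where_ok = False

print("WHERE salary > 80:     ", where_first)
print("HAVING AVG(salary) > 85:", having_after)
print("Aggregate allowed in WHERE:", aggregate_in_where_ok)
demo.close()
```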

In [None]:
# Departments with more than 2 employees
cursor.execute('''
    SELECT department_id, COUNT(*) as count
    FROM employees
    GROUP BY department_id
    HAVING COUNT(*) > 2
''')

print("Departments with >2 employees:")
for dept_id, count in cursor.fetchall():
    print(f"  Department {dept_id}: {count} employees")

In [None]:
# Departments with average salary over $85,000
cursor.execute('''
    SELECT department_id, AVG(salary) as avg_salary
    FROM employees
    GROUP BY department_id
    HAVING AVG(salary) > 85000
    ORDER BY avg_salary DESC
''')

print("Departments with avg salary >$85k:")
for dept_id, avg_sal in cursor.fetchall():
    print(f"  Department {dept_id}: ${avg_sal:,.0f}")

In [None]:
# Combining WHERE and HAVING
# Find departments where recent hires (2020 or later) have avg salary > $75k
cursor.execute('''
    SELECT department_id, 
           COUNT(*) as recent_hires,
           AVG(salary) as avg_salary
    FROM employees
    WHERE hire_date >= '2020-01-01'
    GROUP BY department_id
    HAVING AVG(salary) > 75000
    ORDER BY avg_salary DESC
''')

print("Departments with recent hires (2020+) earning avg >$75k:")
for dept_id, count, avg_sal in cursor.fetchall():
    print(f"  Dept {dept_id}: {count} hires, avg ${avg_sal:,.0f}")

---

## 5. JOIN Operations

JOINs combine rows from two or more tables based on related columns.

### Types of JOINs

| JOIN Type | Description |
|-----------|-------------|
| `INNER JOIN` | Returns only matching rows from both tables |
| `LEFT JOIN` | Returns all rows from left table, matched rows from right |
| `RIGHT JOIN` | Returns all rows from right table, matched rows from left (SQLite 3.39.0 and later) |
| `CROSS JOIN` | Returns cartesian product of both tables |

### INNER JOIN

Returns only rows where there's a match in both tables.

In [None]:
# Join employees with departments
cursor.execute('''
    SELECT e.name, d.name as department, e.salary
    FROM employees e
    INNER JOIN departments d ON e.department_id = d.id
    ORDER BY d.name, e.salary DESC
''')

print("Employees with Department Names:")
current_dept = None
for name, dept, salary in cursor.fetchall():
    if dept != current_dept:
        print(f"\n{dept}:")
        current_dept = dept
    print(f"  {name}: ${salary:,.0f}")

In [None]:
# Join with aggregation: Count employees per department with names
cursor.execute('''
    SELECT d.name, COUNT(e.id) as employee_count, SUM(e.salary) as total_salary
    FROM departments d
    INNER JOIN employees e ON d.id = e.department_id
    GROUP BY d.id, d.name
    ORDER BY employee_count DESC
''')

print("Department Statistics:")
print(f"{'Department':<15} {'Employees':<12} {'Total Salary'}")
print("-" * 42)
for dept_name, count, total in cursor.fetchall():
    print(f"{dept_name:<15} {count:<12} ${total:,.0f}")

In [None]:
# Join projects with departments
cursor.execute('''
    SELECT p.name as project, d.name as department, 
           p.start_date, p.end_date
    FROM projects p
    INNER JOIN departments d ON p.department_id = d.id
    ORDER BY p.start_date
''')

print("Projects by Department:")
print(f"{'Project':<25} {'Department':<12} {'Start':<12} {'End'}")
print("-" * 60)
for project, dept, start, end in cursor.fetchall():
    print(f"{project:<25} {dept:<12} {start:<12} {end}")

### LEFT JOIN

Returns all rows from the left table, plus the matching rows from the right table. Where a left row has no match, the right table's columns are NULL.

In [None]:
# Left join: All departments, even those without employees
cursor.execute('''
    SELECT d.name, COUNT(e.id) as employee_count
    FROM departments d
    LEFT JOIN employees e ON d.id = e.department_id
    GROUP BY d.id, d.name
    ORDER BY employee_count DESC
''')

print("All Departments (including empty ones):")
for dept_name, count in cursor.fetchall():
    print(f"  {dept_name}: {count} employees")

In [None]:
# Find departments without projects
cursor.execute('''
    SELECT d.name
    FROM departments d
    LEFT JOIN projects p ON d.id = p.department_id
    WHERE p.id IS NULL
''')

print("Departments without projects:")
results = cursor.fetchall()
if results:
    for (name,) in results:
        print(f"  {name}")
else:
    print("  (All departments have projects)")
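
In the sample data every department has both employees and projects, so the two `LEFT JOIN` cells above return the same departments an `INNER JOIN` would. To make the difference visible, here is a minimal sketch on a throwaway in-memory database with one deliberately empty department (the `Legal` row and the two employees are illustrative assumptions):

```python
import sqlite3

# Throwaway in-memory database; rows are illustrative only
demo = sqlite3.connect(':memory:')
demo.executescript('''
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Legal');
    INSERT INTO employees VALUES (1, 'Ann', 1), (2, 'Ben', 1);
''')

query = '''
    SELECT d.name, COUNT(e.id)
    FROM departments d
    {join} employees e ON d.id = e.department_id
    GROUP BY d.id, d.name
    ORDER BY d.name
'''

# INNER JOIN drops Legal because no employee row matches it
inner = demo.execute(query.format(join='INNER JOIN')).fetchall()

# LEFT JOIN keeps Legal; COUNT(e.id) skips the NULL id and reports 0
left = demo.execute(query.format(join='LEFT JOIN')).fetchall()

print("INNER JOIN:", inner)
print("LEFT JOIN: ", left)
demo.close()
```

Counting `e.id` rather than `*` is what yields 0 for the empty department, since the unmatched row still exists but its `e.id` is NULL.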

### Multiple JOINs

You can join more than two tables in a single query.

In [None]:
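# Join all three tables: employees, departments, and projects
# COUNT(DISTINCT ...) is needed because the two joins pair every employee
# of a department with every project of that department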
cursor.execute('''
    SELECT d.name as department,
           COUNT(DISTINCT e.id) as employees,
           COUNT(DISTINCT p.id) as projects,
           d.budget
    FROM departments d
    LEFT JOIN employees e ON d.id = e.department_id
    LEFT JOIN projects p ON d.id = p.department_id
    GROUP BY d.id, d.name, d.budget
    ORDER BY d.name
''')

print("Department Overview:")
print(f"{'Department':<15} {'Employees':<12} {'Projects':<10} {'Budget'}")
print("-" * 52)
for dept, emp_count, proj_count, budget in cursor.fetchall():
    print(f"{dept:<15} {emp_count:<12} {proj_count:<10} ${budget:,.0f}")
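
Joining two independent child tables to the same parent multiplies rows: each employee is repeated once per project. The sketch below, on a throwaway in-memory database (the rows are illustrative assumptions), shows why `COUNT(DISTINCT ...)` is used above and why a plain `SUM` over such a join is inflated:

```python
import sqlite3

# Throwaway in-memory database; rows are illustrative only
demo = sqlite3.connect(':memory:')
demo.executescript('''
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, department_id INTEGER, salary REAL);
    CREATE TABLE projects (id INTEGER PRIMARY KEY, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering');
    INSERT INTO employees VALUES (1, 1, 100), (2, 1, 200);
    INSERT INTO projects VALUES (1, 1), (2, 1), (3, 1);
''')

# 2 employees x 3 projects = 6 joined rows for the single department
joined_rows, employees, projects, inflated_sum = demo.execute('''
    SELECT COUNT(e.id), COUNT(DISTINCT e.id), COUNT(DISTINCT p.id), SUM(e.salary)
    FROM departments d
    LEFT JOIN employees e ON d.id = e.department_id
    LEFT JOIN projects p ON d.id = p.department_id
    GROUP BY d.id
''').fetchone()

# A scalar subquery sums each salary once, independent of the project join
true_sum = demo.execute('''
    SELECT (SELECT SUM(salary) FROM employees WHERE department_id = d.id)
    FROM departments d
''').fetchone()[0]

print(f"Joined rows: {joined_rows}, employees: {employees}, projects: {projects}")
print(f"SUM over the join: {inflated_sum:.0f}, true total: {true_sum:.0f}")
demo.close()
```

`AVG` is unaffected here because every employee is repeated the same number of times, but `SUM` and a non-distinct `COUNT` are not safe across this kind of join.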

---

## 6. Subqueries

A subquery is a query nested inside another query. Subqueries can appear in:
- `WHERE` clauses
- `FROM` clauses
- `SELECT` clauses

In [None]:
# Subquery in WHERE: Employees earning above average
cursor.execute('''
    SELECT name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY salary DESC
''')

# Get the average for context
cursor2 = conn.cursor()
cursor2.execute('SELECT AVG(salary) FROM employees')
avg_sal = cursor2.fetchone()[0]

print(f"Employees earning above average (${avg_sal:,.0f}):")
for name, salary in cursor.fetchall():
    print(f"  {name}: ${salary:,.0f}")

In [None]:
# Subquery with IN: Employees in departments with budget > $300k
cursor.execute('''
    SELECT e.name, d.name as department
    FROM employees e
    JOIN departments d ON e.department_id = d.id
    WHERE e.department_id IN (
        SELECT id FROM departments WHERE budget > 300000
    )
    ORDER BY d.name, e.name
''')

print("Employees in high-budget departments (>$300k):")
current_dept = None
for name, dept in cursor.fetchall():
    if dept != current_dept:
        print(f"\n{dept}:")
        current_dept = dept
    print(f"  - {name}")

In [None]:
# Correlated subquery: Employees earning above their department's average
cursor.execute('''
    SELECT e.name, e.salary, d.name as department
    FROM employees e
    JOIN departments d ON e.department_id = d.id
    WHERE e.salary > (
        SELECT AVG(e2.salary)
        FROM employees e2
        WHERE e2.department_id = e.department_id
    )
    ORDER BY d.name, e.salary DESC
''')

print("Employees earning above their department's average:")
current_dept = None
for name, salary, dept in cursor.fetchall():
    if dept != current_dept:
        print(f"\n{dept}:")
        current_dept = dept
    print(f"  {name}: ${salary:,.0f}")
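
A correlated subquery is conceptually re-evaluated for each outer row. The same question can often be answered by joining to a derived table of per-group averages instead. The sketch below compares the two forms on a throwaway in-memory `staff` table (names and rows are illustrative assumptions); it is one possible rewrite, not the only one.

```python
import sqlite3

# Throwaway in-memory database; table name and rows are illustrative only
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE staff (name TEXT, dept INTEGER, salary REAL)')
demo.executemany('INSERT INTO staff VALUES (?, ?, ?)', [
    ('Ann', 1, 80), ('Ben', 1, 100),             # dept 1 average: 90
    ('Cy', 2, 50), ('Di', 2, 70), ('Ed', 2, 90)  # dept 2 average: 70
])

# Correlated form: the inner query refers to the outer row's dept
correlated = [row[0] for row in demo.execute('''
    SELECT s.name FROM staff s
    WHERE s.salary > (SELECT AVG(s2.salary) FROM staff s2 WHERE s2.dept = s.dept)
    ORDER BY s.name
''')]

# Derived-table form: compute each dept's average once, then join to it
joined = [row[0] for row in demo.execute('''
    SELECT s.name
    FROM staff s
    JOIN (SELECT dept, AVG(salary) AS avg_salary FROM staff GROUP BY dept) a
      ON a.dept = s.dept
    WHERE s.salary > a.avg_salary
    ORDER BY s.name
''')]

print(correlated, joined)
demo.close()
```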

In [None]:
# Subquery in FROM clause (derived table)
cursor.execute('''
    SELECT dept_stats.department, dept_stats.avg_salary, dept_stats.employee_count
    FROM (
        SELECT d.name as department,
               AVG(e.salary) as avg_salary,
               COUNT(e.id) as employee_count
        FROM departments d
        JOIN employees e ON d.id = e.department_id
        GROUP BY d.id, d.name
    ) as dept_stats
    WHERE dept_stats.employee_count >= 2
    ORDER BY dept_stats.avg_salary DESC
''')

print("Department stats (2+ employees):")
print(f"{'Department':<15} {'Avg Salary':<14} {'Employees'}")
print("-" * 38)
for dept, avg_sal, count in cursor.fetchall():
    print(f"{dept:<15} ${avg_sal:<13,.0f} {count}")

In [None]:
# Subquery in SELECT (scalar subquery)
cursor.execute('''
    SELECT 
        name,
        salary,
        salary - (SELECT AVG(salary) FROM employees) as diff_from_avg
    FROM employees
    ORDER BY diff_from_avg DESC
    LIMIT 5
''')

print("Top 5 employees by salary above average:")
print(f"{'Name':<20} {'Salary':<12} {'Above Avg'}")
print("-" * 42)
for name, salary, diff in cursor.fetchall():
    sign = '+' if diff > 0 else ''
    print(f"{name:<20} ${salary:<11,.0f} {sign}${diff:,.0f}")

---

## Exercises

### Exercise 1: WHERE Clause Practice

Find all employees who:
- Work in Engineering (department_id = 1) OR Sales (department_id = 3)
- AND earn between $85,000 and $100,000
- Order by salary descending

In [None]:
# Your code here


<details>
<summary>Click to see solution</summary>

```python
cursor.execute('''
    SELECT name, department_id, salary
    FROM employees
    WHERE (department_id = 1 OR department_id = 3)
      AND salary BETWEEN 85000 AND 100000
    ORDER BY salary DESC
''')

print("Engineering/Sales employees ($85k-$100k):")
for name, dept_id, salary in cursor.fetchall():
    dept_name = "Engineering" if dept_id == 1 else "Sales"
    print(f"  {name} ({dept_name}): ${salary:,.0f}")
```

</details>

### Exercise 2: Aggregate with GROUP BY

Find the department with the highest average salary. Display the department name and average salary.

In [None]:
# Your code here


<details>
<summary>Click to see solution</summary>

```python
cursor.execute('''
    SELECT d.name, AVG(e.salary) as avg_salary
    FROM departments d
    JOIN employees e ON d.id = e.department_id
    GROUP BY d.id, d.name
    ORDER BY avg_salary DESC
    LIMIT 1
''')

dept_name, avg_sal = cursor.fetchone()
print(f"Highest average salary: {dept_name} (${avg_sal:,.0f})")
```

</details>

### Exercise 3: JOIN Query

Create a report showing each project with:
- Project name
- Department name
- Number of employees in that department
- Department budget

Order by number of employees descending.

In [None]:
# Your code here


<details>
<summary>Click to see solution</summary>

```python
cursor.execute('''
    SELECT p.name as project,
           d.name as department,
           COUNT(e.id) as employee_count,
           d.budget
    FROM projects p
    JOIN departments d ON p.department_id = d.id
    LEFT JOIN employees e ON d.id = e.department_id
    GROUP BY p.id, p.name, d.name, d.budget
    ORDER BY employee_count DESC
''')

print("Project Report:")
print(f"{'Project':<25} {'Department':<12} {'Employees':<10} {'Budget'}")
print("-" * 60)
for project, dept, count, budget in cursor.fetchall():
    print(f"{project:<25} {dept:<12} {count:<10} ${budget:,.0f}")
```

</details>

### Exercise 4: Subquery Challenge

Find all employees who earn more than the highest-paid person in the HR department.

In [None]:
# Your code here


<details>
<summary>Click to see solution</summary>

```python
cursor.execute('''
    SELECT e.name, e.salary, d.name as department
    FROM employees e
    JOIN departments d ON e.department_id = d.id
    WHERE e.salary > (
        SELECT MAX(salary)
        FROM employees
        WHERE department_id = 4  -- HR department
    )
    ORDER BY e.salary DESC
''')

# Get max HR salary for context
cursor2 = conn.cursor()
cursor2.execute('SELECT MAX(salary) FROM employees WHERE department_id = 4')
max_hr = cursor2.fetchone()[0]
print(f"Max HR salary: ${max_hr:,.0f}\n")

print("Employees earning more than highest HR salary:")
for name, salary, dept in cursor.fetchall():
    print(f"  {name} ({dept}): ${salary:,.0f}")
```

</details>

### Exercise 5: Complex Query

Create a comprehensive department report showing:
- Department name
- Total employees
- Average salary
- Number of projects
- Budget utilization (total salaries / budget * 100 as percentage)

Only show departments with at least 2 employees. Order by budget utilization descending.

In [None]:
# Your code here


<details>
<summary>Click to see solution</summary>

```python
cursor.execute('''
    SELECT d.name,
           COUNT(DISTINCT e.id) as employees,
           AVG(e.salary) as avg_salary,
           COUNT(DISTINCT p.id) as projects,
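           -- Scalar subquery: SUM(e.salary) over the two joins would count
           -- each salary once per project and inflate the total
           (SELECT SUM(e2.salary) FROM employees e2
            WHERE e2.department_id = d.id) / d.budget * 100 as budget_util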
    FROM departments d
    LEFT JOIN employees e ON d.id = e.department_id
    LEFT JOIN projects p ON d.id = p.department_id
    GROUP BY d.id, d.name, d.budget
    HAVING COUNT(DISTINCT e.id) >= 2
    ORDER BY budget_util DESC
''')

print("Department Report:")
print(f"{'Department':<12} {'Employees':<10} {'Avg Salary':<12} {'Projects':<10} {'Budget Util'}")
print("-" * 60)
for dept, emp, avg_sal, proj, util in cursor.fetchall():
    print(f"{dept:<12} {emp:<10} ${avg_sal:<11,.0f} {proj:<10} {util:.1f}%")
```

</details>

---

## Summary

In this notebook, you learned:

### WHERE Clause Operators
- Comparison: `=`, `<>`, `<`, `>`, `<=`, `>=`
- Pattern matching: `LIKE` with `%` and `_`
- Set membership: `IN`
- Range: `BETWEEN`
- Combining: `AND`, `OR`, `NOT`

### Sorting and Limiting
- `ORDER BY column ASC/DESC`
- `LIMIT n` and `OFFSET m` for pagination

### Aggregate Functions
- `COUNT()`, `SUM()`, `AVG()`, `MIN()`, `MAX()`

### Grouping
- `GROUP BY` to aggregate by categories
- `HAVING` to filter groups

### JOINs
- `INNER JOIN` - matching rows only
- `LEFT JOIN` - all left rows, matched right rows

### Subqueries
- In WHERE, FROM, and SELECT clauses
- Correlated subqueries reference the outer query

## Next Steps

In the next notebook, **04_pandas_sql_integration.ipynb**, you'll learn how to:
- Load SQL query results directly into Pandas DataFrames
- Write DataFrames to SQL tables
- Combine SQL and Pandas for powerful data analysis workflows

---

## Cleanup

In [None]:
# Close connection and remove database file
conn.close()

if os.path.exists(db_path):
    os.remove(db_path)
    print("Database file removed.")