# Notebook 01: SELECT Basics & Filtering

## Learning Objectives

By the end of this notebook, you will be able to:
- Write basic SELECT statements to retrieve data
- Select specific columns instead of all columns
- Use column aliases with AS
- Filter rows with a WHERE clause
- Use comparison operators (=, <, >, <=, >=, <>)
- Filter with LIKE for pattern matching
- Use IN for multiple value matching
- Use BETWEEN for range queries
- Combine conditions with AND, OR, NOT
- Handle NULL values with IS NULL / IS NOT NULL

## Setup

Run this cell first to set up the connection and checker:

In [None]:
import os
import sys
from pathlib import Path

# Add src to path for imports
project_root = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()
sys.path.insert(0, str(project_root / 'src'))

import duckdb
from sql_exercises import check

# Set notebook name for checker
os.environ['SQL_NOTEBOOK_NAME'] = '01_select_basics'

# Connect to database
db_path = project_root / 'data' / 'databases' / 'practice.duckdb'
conn = duckdb.connect(str(db_path), read_only=True)

print("Setup complete! Connected to practice database.")

## Quick Reference

### SELECT Syntax

```sql
SELECT column1, column2, ...    -- Columns to retrieve
FROM table_name                  -- Table to query
WHERE condition                  -- Optional: filter rows
```
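As a minimal sketch of the syntax above, the cell below runs one complete SELECT with a column list, two aliases, and a WHERE filter. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without the practice database. The `products` table, its rows, and the `demo` connection are hypothetical and exist only for illustration; the same SQL should behave the same way in DuckDB.

```python
import sqlite3

# Hypothetical toy table, not part of the practice database
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
demo.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Pen", "office", 1.5), ("Desk", "furniture", 120.0), ("Lamp", "furniture", 35.0)],
)

cursor = demo.execute(
    """
    SELECT name AS product, price AS unit_price   -- two columns, renamed with AS
    FROM products                                 -- table to query
    WHERE price > 10                              -- keep only rows that satisfy the condition
    """
)

# Column names in the result reflect the aliases, not the original names
columns = [col[0] for col in cursor.description]
# Sort in Python so the printed order does not depend on the engine
rows = sorted(cursor.fetchall())

print(columns)  # ['product', 'unit_price']
print(rows)     # [('Desk', 120.0), ('Lamp', 35.0)]
demo.close()
```

The `category` column is never listed after SELECT, so it does not appear in the result even though it exists in the table.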

### Comparison Operators

| Operator | Description |
|----------|-------------|
| `=` | Equal to |
| `<>` or `!=` | Not equal to |
| `<` | Less than |
| `>` | Greater than |
| `<=` | Less than or equal |
| `>=` | Greater than or equal |

### Pattern Matching with LIKE

| Pattern | Matches |
|---------|--------|
| `'A%'` | Starts with A |
| `'%a'` | Ends with a |
| `'%the%'` | Contains 'the' |
| `'_ohn'` | Any single character followed by 'ohn' (e.g. 'John') |
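The four patterns in the table can be tried against a small list of names, as sketched below. This is an illustration under assumptions: the `people` table and the `matching` helper are hypothetical, and Python's built-in `sqlite3` stands in for the practice database. Case sensitivity of LIKE differs between engines (SQLite ignores ASCII case by default, while DuckDB's LIKE is case-sensitive), so the sample names were chosen to give the same result either way.

```python
import sqlite3

# Hypothetical in-memory table; sqlite3 is a stand-in for the practice database
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (first_name TEXT)")
demo.executemany(
    "INSERT INTO people VALUES (?)",
    [("John",), ("Anna",), ("Matthew",), ("Ann",)],
)

def matching(pattern):
    """Return the first names that match a LIKE pattern, sorted for stable output."""
    rows = demo.execute(
        "SELECT first_name FROM people WHERE first_name LIKE ?", (pattern,)
    ).fetchall()
    return sorted(name for (name,) in rows)

starts_with_a = matching("A%")        # % matches any run of characters, including none
ends_with_a = matching("%a")
contains_the = matching("%the%")
one_char_then_ohn = matching("_ohn")  # _ matches exactly one character

print(starts_with_a)      # ['Ann', 'Anna']
print(ends_with_a)        # ['Anna']
print(contains_the)       # ['Matthew']
print(one_char_then_ohn)  # ['John']
demo.close()
```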

### NULL Handling

```sql
-- Check for NULL with IS NULL; "= NULL" never matches any row
WHERE column IS NULL
WHERE column IS NOT NULL
```
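The difference between `= NULL` and `IS NULL` is easiest to see side by side. The sketch below assumes a hypothetical `staff` table and a hypothetical `names_where` helper, and uses Python's built-in `sqlite3` as a stand-in for the practice database. Any comparison with NULL evaluates to NULL rather than true, so WHERE discards the row.

```python
import sqlite3

# Hypothetical toy table: two of the three rows have no bonus recorded
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE staff (name TEXT, bonus REAL)")
demo.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("Ada", 0.1), ("Ben", None), ("Cy", None)],  # Python None is stored as SQL NULL
)

def names_where(condition):
    """Return the names of rows satisfying a WHERE condition, sorted for stable output."""
    rows = demo.execute(f"SELECT name FROM staff WHERE {condition}").fetchall()
    return sorted(name for (name,) in rows)

equals_null = names_where("bonus = NULL")       # comparison yields NULL, so no row passes
is_null = names_where("bonus IS NULL")
is_not_null = names_where("bonus IS NOT NULL")
not_equal = names_where("bonus <> 0.1")         # NULL rows are excluded here as well

print(equals_null)  # []
print(is_null)      # ['Ben', 'Cy']
print(is_not_null)  # ['Ada']
print(not_equal)    # []
demo.close()
```

The last query is the one that tends to surprise: rows with a NULL bonus are not "different from 0.1", they are simply unknown, so they drop out of both `=` and `<>` filters.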

## Schema Reference

This notebook uses the **employees** dataset. Key tables:

### employees
| Column | Type | Description |
|--------|------|-------------|
| employee_id | INTEGER | Primary key |
| first_name | VARCHAR | Employee first name |
| last_name | VARCHAR | Employee last name |
| email | VARCHAR | Email address |
| phone | VARCHAR | Phone number |
| hire_date | DATE | Date hired |
| job_title | VARCHAR | Job title |
| salary | DECIMAL | Annual salary |
| commission_pct | DECIMAL | Commission percentage (can be NULL) |
| manager_id | INTEGER | Manager's employee_id (NULL for top level) |
| department_id | INTEGER | Foreign key to departments |
| is_active | BOOLEAN | Currently employed |

### departments
| Column | Type | Description |
|--------|------|-------------|
| department_id | INTEGER | Primary key |
| department_name | VARCHAR | Department name |
| location | VARCHAR | Office location |
| budget | DECIMAL | Annual budget |

In [None]:
# Preview the employees table
conn.execute("SELECT * FROM employees LIMIT 5").fetchdf()

In [None]:
# Preview the departments table
conn.execute("SELECT * FROM departments").fetchdf()

---

## Exercise 1: Select All Employees (Easy)

**Problem:** Retrieve all columns for all employees.

**Tables:** employees

**Expected:** All columns, all rows from the employees table

**Hint:** Use `SELECT *` to get all columns

In [None]:
ex_01 = '''
-- Write your SQL query here

'''

# Preview your results
conn.execute(ex_01).fetchdf()

In [None]:
check("ex_01", ex_01)

---

## Exercise 2: Select Specific Columns (Easy)

**Problem:** Retrieve only the first name, last name, and email for all employees.

**Tables:** employees

**Expected:** 3 columns (first_name, last_name, email), all employee rows

**Hint:** List the column names separated by commas

In [None]:
ex_02 = '''
-- Write your SQL query here

'''

# Preview your results
conn.execute(ex_02).fetchdf()

In [None]:
check("ex_02", ex_02)

---

## Exercise 3: Column Aliases (Easy)

**Problem:** Retrieve each employee's first name and salary, renaming the columns to `name` and `annual_pay`, respectively.

**Tables:** employees

**Expected:** 2 columns named `name` and `annual_pay`

**Hint:** Use `AS` to create aliases: `column_name AS alias`

In [None]:
ex_03 = '''
-- Write your SQL query here

'''

# Preview your results
conn.execute(ex_03).fetchdf()

In [None]:
check("ex_03", ex_03)

---

## Exercise 4: Filter with WHERE - Equality (Easy)

**Problem:** Find all employees who work in department 1 (Engineering).

**Tables:** employees

**Expected:** All columns for employees where department_id = 1

**Hint:** Use WHERE with the = operator

In [None]:
ex_04 = '''
-- Write your SQL query here

'''

# Preview your results
conn.execute(ex_04).fetchdf()

In [None]:
check("ex_04", ex_04)

---

## Exercise 5: Filter with Comparison Operators (Medium)

**Problem:** Find all employees with a salary greater than $100,000.

Return columns: employee_id, first_name, last_name, salary

**Tables:** employees

**Expected:** 4 columns, only employees earning > 100000

**Hint:** Salaries are stored as numbers, so write the threshold as `100000` with no $ or commas

In [None]:
ex_05 = '''
-- Write your SQL query here

'''

# Preview your results
conn.execute(ex_05).fetchdf()

In [None]:
check("ex_05", ex_05)

---

## Exercise 6: LIKE Pattern Matching (Medium)

**Problem:** Find all employees whose job title contains the word 'Manager'.

Return columns: employee_id, first_name, last_name, job_title

**Tables:** employees

**Expected:** 4 columns, only employees with 'Manager' in their job title

**Hint:** Use LIKE with % wildcards to match anywhere in the string. LIKE is case-sensitive in DuckDB, so match the capitalization of 'Manager'

In [None]:
ex_06 = '''
-- Write your SQL query here

'''

# Preview your results
conn.execute(ex_06).fetchdf()

In [None]:
check("ex_06", ex_06)

---

## Exercise 7: IN Operator (Medium)

**Problem:** Find all employees who work in the Engineering (1), Sales (2), or Marketing (3) departments.

Return columns: employee_id, first_name, last_name, department_id

**Tables:** employees

**Expected:** 4 columns, only employees in departments 1, 2, or 3

**Hint:** Use IN (value1, value2, value3)
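To illustrate how IN behaves without touching the exercise data, here is a small sketch on a hypothetical `orders` table. It uses Python's built-in `sqlite3` as a stand-in for the practice database, and the `ids_where` helper is likewise made up for this example. It shows that IN is shorthand for a chain of equality checks joined with OR.

```python
import sqlite3

# Hypothetical toy table, unrelated to the employees dataset
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
demo.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "new"), (2, "shipped"), (3, "cancelled"), (4, "new"), (5, "returned")],
)

def ids_where(condition):
    """Return the order_ids of rows satisfying a WHERE condition, sorted for stable output."""
    rows = demo.execute(f"SELECT order_id FROM orders WHERE {condition}").fetchall()
    return sorted(order_id for (order_id,) in rows)

with_in = ids_where("status IN ('new', 'shipped')")
with_or = ids_where("status = 'new' OR status = 'shipped'")   # same filter, written out
with_not_in = ids_where("status NOT IN ('new', 'shipped')")   # the remaining rows

print(with_in)      # [1, 2, 4]
print(with_or)      # [1, 2, 4]
print(with_not_in)  # [3, 5]
demo.close()
```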

In [None]:
ex_07 = '''
-- Write your SQL query here

'''

# Preview your results
conn.execute(ex_07).fetchdf()

In [None]:
check("ex_07", ex_07)

---

## Exercise 8: BETWEEN for Ranges (Medium)

**Problem:** Find all employees with a salary between $60,000 and $90,000 (inclusive).

Return columns: employee_id, first_name, last_name, salary

**Tables:** employees

**Expected:** 4 columns, only employees with salary in the specified range

**Hint:** BETWEEN is inclusive on both ends
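The inclusive behavior can be checked on a handful of numbers. The sketch below is an illustration only: the `readings` table and the `values_where` helper are hypothetical, and Python's built-in `sqlite3` stands in for the practice database. The values 9 and 21 sit just outside the range, while 10 and 20 sit exactly on its ends.

```python
import sqlite3

# Hypothetical toy table with values on and around the range boundaries
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE readings (value INTEGER)")
demo.executemany("INSERT INTO readings VALUES (?)", [(9,), (10,), (15,), (20,), (21,)])

def values_where(condition):
    """Return the values satisfying a WHERE condition, sorted for stable output."""
    rows = demo.execute(f"SELECT value FROM readings WHERE {condition}").fetchall()
    return sorted(value for (value,) in rows)

with_between = values_where("value BETWEEN 10 AND 20")
with_comparisons = values_where("value >= 10 AND value <= 20")  # equivalent spelling

print(with_between)      # [10, 15, 20]
print(with_comparisons)  # [10, 15, 20]
demo.close()
```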

In [None]:
ex_08 = '''
-- Write your SQL query here

'''

# Preview your results
conn.execute(ex_08).fetchdf()

In [None]:
check("ex_08", ex_08)

---

## Exercise 9: Combining Conditions with AND/OR (Hard)

**Problem:** Find all employees who meet either of these conditions:
- They work in the Sales department (department_id = 2) and have a salary > $80,000
- They work in the Engineering department (department_id = 1) and have a salary > $120,000

Return columns: employee_id, first_name, last_name, department_id, salary

**Tables:** employees

**Expected:** 5 columns, employees matching either condition

**Hint:** AND is evaluated before OR, so wrap each AND pair in parentheses to make the grouping explicit
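Why grouping matters can be seen on a toy example where moving the parentheses changes the result. This sketch assumes a hypothetical `items` table and `names_where` helper, and uses Python's built-in `sqlite3` as a stand-in for the practice database. Without parentheses, AND is applied before OR.

```python
import sqlite3

# Hypothetical toy table, unrelated to the employees dataset
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE items (name TEXT, color TEXT, size INTEGER)")
demo.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("A", "red", 1), ("B", "red", 5), ("C", "blue", 5), ("D", "blue", 1)],
)

def names_where(condition):
    """Return the names of rows satisfying a WHERE condition, sorted for stable output."""
    rows = demo.execute(f"SELECT name FROM items WHERE {condition}").fetchall()
    return sorted(name for (name,) in rows)

# Parsed as: color = 'red' OR (color = 'blue' AND size > 3)
ungrouped = names_where("color = 'red' OR color = 'blue' AND size > 3")

# Parentheses force the OR to be evaluated first
grouped = names_where("(color = 'red' OR color = 'blue') AND size > 3")

print(ungrouped)  # ['A', 'B', 'C']
print(grouped)    # ['B', 'C']
demo.close()
```

Item A is small and red: it passes the ungrouped filter because the size check only applies to the blue branch, but it fails once the size check covers both colors.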

In [None]:
ex_09 = '''
-- Write your SQL query here

'''

# Preview your results
conn.execute(ex_09).fetchdf()

In [None]:
check("ex_09", ex_09)

---

## Exercise 10: Handling NULL Values (Hard)

**Problem:** Find all employees who are in the Sales department (department_id = 2) and have a commission percentage (commission_pct is not NULL).

Return columns: employee_id, first_name, last_name, commission_pct, salary

**Tables:** employees

**Expected:** 5 columns, only Sales employees with a commission

**Hint:** Use IS NOT NULL to check for non-NULL values

In [None]:
ex_10 = '''
-- Write your SQL query here

'''

# Preview your results
conn.execute(ex_10).fetchdf()

In [None]:
check("ex_10", ex_10)

---

## Summary

In this notebook, you learned:

1. **SELECT** - How to retrieve data from tables
2. **Column selection** - Choosing specific columns vs SELECT *
3. **Aliases** - Renaming columns with AS
4. **WHERE** - Filtering rows based on conditions
5. **Comparison operators** - =, <>, <, >, <=, >=
6. **LIKE** - Pattern matching with wildcards
7. **IN** - Matching against a list of values
8. **BETWEEN** - Range queries
9. **AND/OR** - Combining conditions
10. **IS NULL / IS NOT NULL** - Handling NULL values

### Next Steps

Continue to **02_sorting_limiting.ipynb** to learn about:
- ORDER BY for sorting results
- LIMIT and OFFSET for pagination
- DISTINCT for unique values

In [None]:
# Cleanup
conn.close()
print("Notebook complete! Connection closed.")