# SQL Exercises: Setup and Introduction

Welcome to SQL Query Mastery! This notebook will help you set up your environment and introduce you to the practice databases.

## What You'll Learn

This series of notebooks covers SQL from the basics to advanced topics:

1. **SELECT Basics** - Retrieving and filtering data
2. **Sorting & Limiting** - ORDER BY, LIMIT, OFFSET
3. **Aggregations** - COUNT, SUM, AVG, GROUP BY, HAVING
4. **Joins** - INNER, LEFT, RIGHT, FULL, SELF, CROSS
5. **Subqueries** - Scalar, column, row, and table subqueries
6. **Set Operations** - UNION, INTERSECT, EXCEPT
7. **String & Date Functions** - Text and temporal manipulation
8. **CASE Expressions** - Conditional logic in SQL
9. **CTEs** - Common Table Expressions
10. **Window Functions** - ROW_NUMBER, RANK, DENSE_RANK, NTILE
11. **Advanced Windows** - LAG, LEAD, running totals
12. **Recursive CTEs** - Hierarchical queries
13. **Query Optimization** - Execution plans and performance
14. **Complex Reporting** - Putting it all together

## Step 1: Install Dependencies

Run this cell to install required packages:

In [None]:
# Install required packages
%pip install duckdb pandas faker --quiet

## Step 2: Initialize the Database

The practice database needs to be created before you start. Run this cell once:

In [None]:
import subprocess
import sys
from pathlib import Path

# Find the init script
project_root = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()
init_script = project_root / 'data' / 'scripts' / 'init_database.py'

if init_script.exists():
    print("Initializing database... (this may take a minute)")
    result = subprocess.run([sys.executable, str(init_script)], capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:
        print("Error:", result.stderr)
else:
    print(f"Init script not found at: {init_script}")
    print("Make sure you're running from the project directory.")

## Step 3: Connect to the Database

Now let's connect and explore the database structure:

In [None]:
import duckdb
import pandas as pd
from pathlib import Path

# Connect to the practice database
project_root = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()
db_path = project_root / 'data' / 'databases' / 'practice.duckdb'

# read_only=True guards the practice data against accidental writes
conn = duckdb.connect(str(db_path), read_only=True)
print(f"Connected to: {db_path}")
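
If Step 2 was skipped, the database file will not exist yet, and a read-only connection cannot create it. The sketch below resolves the path the same way as the cell above and raises a clearer error when the file is missing. The helper names `locate_database` and `require_database` are hypothetical and not part of the project, and the directory in the demo is made up for illustration.

```python
from pathlib import Path


def locate_database(cwd: Path) -> Path:
    """Resolve the practice database path using the same rule as the cell above."""
    project_root = cwd.parent if cwd.name == "notebooks" else cwd
    return project_root / "data" / "databases" / "practice.duckdb"


def require_database(cwd: Path) -> Path:
    """Return the database path, or raise a descriptive error if the file is missing."""
    db_path = locate_database(cwd)
    if not db_path.is_file():
        raise FileNotFoundError(
            f"Practice database not found at {db_path}. Run Step 2 first."
        )
    return db_path


# Demo with a made-up directory that does not exist
try:
    require_database(Path("/nonexistent-project-dir"))
except FileNotFoundError as exc:
    message = str(exc)
    print(message)
```

In the notebook, `require_database(Path.cwd())` could be called just before `duckdb.connect(...)`.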

## Explore the Database

### List All Tables

In [None]:
# Show all tables in the database
conn.execute("SHOW TABLES").fetchdf()

## Database Schema Overview

The practice database contains three datasets:

### 1. Employees (HR Data)
- `departments` - Company departments
- `employees` - Employee records with salaries, managers
- `salary_history` - Historical salary changes
- `projects` - Company projects
- `project_assignments` - Employee-project mappings
- `performance_reviews` - Employee performance reviews

### 2. Ecommerce (Transactional Data)
- `customers` - Customer accounts
- `addresses` - Customer addresses
- `categories` - Product categories (hierarchical)
- `products` - Product catalog
- `orders` - Customer orders
- `order_items` - Line items in orders
- `reviews` - Product reviews
- `promotions` - Discount codes

### 3. Analytics (Event/Session Data)
- `users` - User accounts
- `sessions` - User sessions
- `page_views` - Page view events
- `events` - User actions
- `conversions` - Conversion events
- `ab_tests` - A/B test definitions
- `ab_test_assignments` - User test assignments
- `daily_metrics` - Aggregated metrics

### Describe Table Structure

Use `DESCRIBE` to see column definitions:

In [None]:
# Describe the employees table
conn.execute("DESCRIBE employees").fetchdf()

In [None]:
# Describe the orders table
conn.execute("DESCRIBE orders").fetchdf()

In [None]:
# Describe the sessions table
conn.execute("DESCRIBE sessions").fetchdf()

### Preview Table Data

Let's look at some sample data from each dataset:

In [None]:
# Sample employees
conn.execute("SELECT * FROM employees LIMIT 5").fetchdf()

In [None]:
# Sample products
conn.execute("SELECT * FROM products LIMIT 5").fetchdf()

In [None]:
# Sample sessions
conn.execute("SELECT * FROM sessions LIMIT 5").fetchdf()

## How Exercises Work

Each exercise follows this pattern:

1. Read the problem description
2. Write your SQL in the provided variable
3. Run `check()` to validate your answer

The `check()` function tells you whether your query is correct without revealing the answer.

In [None]:
# Example of how exercises work

# Your query goes in a variable like this:
ex_demo = '''
SELECT first_name, last_name
FROM employees
LIMIT 5
'''

# Preview your results
conn.execute(ex_demo).fetchdf()
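
To give a feel for what validating a query can involve, here is a minimal sketch of one common approach: run the submitted query and a reference query, then compare column names and rows. This is not the actual `check()` implementation, which this notebook does not show. The function `results_match` is hypothetical, and the demo uses Python's built-in `sqlite3` module with a two-row stand-in table so that it runs without the practice database.

```python
import sqlite3


def results_match(conn, student_sql, reference_sql, ordered=False):
    """Return True when both queries yield the same columns and the same rows."""
    student = conn.execute(student_sql)
    student_cols = [col[0] for col in student.description]
    student_rows = student.fetchall()

    reference = conn.execute(reference_sql)
    reference_cols = [col[0] for col in reference.description]
    reference_rows = reference.fetchall()

    if student_cols != reference_cols:
        return False
    if ordered:
        # Row order matters, e.g. for ORDER BY exercises
        return student_rows == reference_rows
    # Row order is ignored; sorting by repr avoids comparing NULLs with values
    return sorted(student_rows, key=repr) == sorted(reference_rows, key=repr)


# Stand-in data (assumption: not the real employees table)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT)")
demo.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)

student_query = "SELECT first_name, last_name FROM employees ORDER BY last_name DESC"
reference_query = "SELECT first_name, last_name FROM employees ORDER BY last_name"

same_rows = results_match(demo, student_query, reference_query)
same_order = results_match(demo, student_query, reference_query, ordered=True)
wrong_columns = results_match(demo, "SELECT first_name FROM employees", reference_query)
print(same_rows, same_order, wrong_columns)
```

The `ordered` flag reflects a design choice any checker has to make: exercises about sorting need an exact row-order match, while most others should accept rows in any order.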

## Quick Reference

### Running Queries

```python
# Execute a query and get results as DataFrame
result = conn.execute("SELECT * FROM employees").fetchdf()

# Execute and display directly
conn.execute("SELECT * FROM employees LIMIT 10").fetchdf()
```

### Checking Answers

```python
# At the top of each content notebook:
import os
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd().parent / 'src'))
from sql_exercises import check
os.environ['SQL_NOTEBOOK_NAME'] = '01_select_basics'

# Then for each exercise:
ex_01 = '''YOUR SQL HERE'''
check("ex_01", ex_01)
```

### Getting Hints

```python
from sql_exercises.checker import hint
hint("ex_01")  # Shows a hint without the answer
```

## Data Summary

Let's see how much data we have to work with:

In [None]:
# Count rows in each table
tables = conn.execute("SHOW TABLES").fetchall()

print("Table Row Counts:")
print("-" * 35)
for (table_name,) in tables:
    # Quote the table name so reserved words or unusual characters do not break the query
    count = conn.execute(f'SELECT COUNT(*) FROM "{table_name}"').fetchone()[0]
    print(f"{table_name:25} {count:>8,}")
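
The loop above builds SQL by inserting table names into a string. That is fine for names returned by `SHOW TABLES`, but a name that collides with a reserved word or contains unusual characters would break the query. A minimal sketch of standard SQL identifier quoting follows. The helpers `quote_ident` and `count_query` are hypothetical names introduced here for illustration.

```python
def quote_ident(name: str) -> str:
    """Wrap a SQL identifier in double quotes, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def count_query(table_name: str) -> str:
    """Build a row-count query with the table name safely quoted."""
    return f"SELECT COUNT(*) FROM {quote_ident(table_name)}"


print(count_query("order_items"))
print(count_query('odd"name'))
```

Quoting applies only to identifiers such as table and column names. Values should still be passed as query parameters rather than formatted into the SQL string.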

## Ready to Start!

You're all set! Open **01_select_basics.ipynb** to begin learning SQL.

---

### Tips for Success

1. **Read carefully** - Each exercise builds on concepts from previous ones
2. **Try first** - Attempt the solution before looking at hints
3. **Preview results** - Run your query to see output before checking
4. **Learn from failures** - The check function gives helpful feedback
5. **Take notes** - Keep track of new syntax and patterns

In [None]:
# Cleanup: Close connection when done exploring
conn.close()
print("Connection closed. Happy learning!")