# SQL Learning Notes

This notebook contains SQL examples and practice exercises using SQLite. We'll cover basic to intermediate SQL concepts with hands-on examples.

## Topics Covered:
1. Database Setup and Connection
2. Creating Tables
3. Inserting Data
4. Basic SELECT Queries
5. Filtering with WHERE
6. Sorting and Grouping
7. Joins
8. Aggregate Functions
9. Practice Exercises (including subqueries)

## 1. Database Setup and Connection

First, let's import the necessary libraries and connect to our SQLite database.

In [1]:
import sqlite3
import pandas as pd
from IPython.display import display

# Connect to the SQLite database (the file is created if it does not exist yet)
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()

print("Connected to SQLite database successfully!")

Connected to SQLite database successfully!


## 2. Creating Tables

Let's create some sample tables to work with. We'll create tables for a simple company database with employees, departments, and projects.

In [2]:
# Create departments table
cursor.execute('''
CREATE TABLE IF NOT EXISTS departments (
    dept_id INTEGER PRIMARY KEY,
    dept_name VARCHAR(100) NOT NULL,
    location VARCHAR(100)
)
''')

# Create employees table
cursor.execute('''
CREATE TABLE IF NOT EXISTS employees (
    emp_id INTEGER PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE,
    hire_date DATE,
    salary DECIMAL(10, 2),
    dept_id INTEGER,
    FOREIGN KEY (dept_id) REFERENCES departments (dept_id)
)
''')

# Create projects table
cursor.execute('''
CREATE TABLE IF NOT EXISTS projects (
    project_id INTEGER PRIMARY KEY,
    project_name VARCHAR(100) NOT NULL,
    start_date DATE,
    end_date DATE,
    budget DECIMAL(12, 2),
    dept_id INTEGER,
    FOREIGN KEY (dept_id) REFERENCES departments (dept_id)
)
''')

conn.commit()
print("Tables created successfully!")

Tables created successfully!
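
One detail worth knowing about the `FOREIGN KEY` clauses above: SQLite accepts them but only enforces them when foreign key support is switched on for the connection with `PRAGMA foreign_keys = ON`. The sketch below illustrates this on a throwaway in-memory database with trimmed-down versions of the tables (the `demo` connection and the reduced column lists are assumptions made for the illustration, not part of the notebook's database).

```python
import sqlite3

# In-memory database so this sketch does not touch my_database.db
demo = sqlite3.connect(":memory:")
demo.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite

demo.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")
demo.execute("""
CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    dept_id INTEGER,
    FOREIGN KEY (dept_id) REFERENCES departments (dept_id)
)
""")
demo.execute("INSERT INTO departments VALUES (1, 'Engineering')")

try:
    # dept_id 99 does not exist in departments, so the insert should be rejected
    demo.execute("INSERT INTO employees VALUES (1, 'Test', 99)")
    fk_rejected = False
except sqlite3.IntegrityError as exc:
    fk_rejected = True
    print("Rejected:", exc)

demo.close()
```

Without the `PRAGMA` line, the same insert would be accepted and the orphaned `dept_id` would go unnoticed.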


## 3. Inserting Sample Data

Now let's populate our tables with some sample data to work with.

In [3]:
# Insert departments
departments_data = [
    (1, 'Human Resources', 'New York'),
    (2, 'Engineering', 'San Francisco'),
    (3, 'Marketing', 'Chicago'),
    (4, 'Sales', 'Los Angeles'),
    (5, 'Finance', 'Boston')
]

cursor.executemany('INSERT OR REPLACE INTO departments VALUES (?, ?, ?)', departments_data)

# Insert employees
employees_data = [
    (1, 'John', 'Doe', 'john.doe@company.com', '2020-01-15', 75000, 2),
    (2, 'Jane', 'Smith', 'jane.smith@company.com', '2019-03-22', 82000, 2),
    (3, 'Mike', 'Johnson', 'mike.johnson@company.com', '2021-06-10', 65000, 3),
    (4, 'Sarah', 'Williams', 'sarah.williams@company.com', '2020-09-05', 90000, 1),
    (5, 'David', 'Brown', 'david.brown@company.com', '2018-11-12', 95000, 4),
    (6, 'Emily', 'Davis', 'emily.davis@company.com', '2022-02-28', 70000, 5),
    (7, 'Robert', 'Wilson', 'robert.wilson@company.com', '2019-08-17', 88000, 2),
    (8, 'Lisa', 'Anderson', 'lisa.anderson@company.com', '2021-04-03', 72000, 3)
]

cursor.executemany('INSERT OR REPLACE INTO employees VALUES (?, ?, ?, ?, ?, ?, ?)', employees_data)

# Insert projects
projects_data = [
    (1, 'Website Redesign', '2023-01-01', '2023-06-30', 150000, 2),
    (2, 'Mobile App Development', '2023-03-15', '2023-12-31', 300000, 2),
    (3, 'Marketing Campaign Q2', '2023-04-01', '2023-06-30', 75000, 3),
    (4, 'Sales Training Program', '2023-02-01', '2023-05-31', 50000, 4),
    (5, 'Financial System Upgrade', '2023-01-15', '2023-08-31', 200000, 5)
]

cursor.executemany('INSERT OR REPLACE INTO projects VALUES (?, ?, ?, ?, ?, ?)', projects_data)

conn.commit()
print("Sample data inserted successfully!")

Sample data inserted successfully!
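
The cell above uses `INSERT OR REPLACE` rather than plain `INSERT`, which is what makes it safe to re-run: a row whose primary key already exists is replaced instead of raising an error. A minimal sketch of the difference, using a throwaway in-memory database (the `demo` connection and the two-column table are assumptions for illustration):

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")

row = (1, "Human Resources")

# Running the same INSERT OR REPLACE twice leaves exactly one row
demo.execute("INSERT OR REPLACE INTO departments VALUES (?, ?)", row)
demo.execute("INSERT OR REPLACE INTO departments VALUES (?, ?)", row)
count = demo.execute("SELECT COUNT(*) FROM departments").fetchone()[0]
print("Rows after two INSERT OR REPLACE calls:", count)

# A plain INSERT with the same primary key violates the constraint
try:
    demo.execute("INSERT INTO departments VALUES (?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("Plain INSERT rejected:", duplicate_rejected)

demo.close()
```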


## 4. Basic SELECT Queries

Let's start with simple SELECT statements to retrieve data from our tables.

In [4]:
# Example 1: Select all columns from employees table
print("All employees:")
df = pd.read_sql_query("SELECT * FROM employees", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 2: Select specific columns
print("Employee names and emails:")
df = pd.read_sql_query("SELECT first_name, last_name, email FROM employees", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 3: Select all departments
print("All departments:")
df = pd.read_sql_query("SELECT * FROM departments", conn)
display(df)

All employees:


|   | emp_id | first_name | last_name | email | hire_date | salary | dept_id |
|---|---|---|---|---|---|---|---|
| 0 | 1 | John | Doe | john.doe@company.com | 2020-01-15 | 75000 | 2 |
| 1 | 2 | Jane | Smith | jane.smith@company.com | 2019-03-22 | 82000 | 2 |
| 2 | 3 | Mike | Johnson | mike.johnson@company.com | 2021-06-10 | 65000 | 3 |
| 3 | 4 | Sarah | Williams | sarah.williams@company.com | 2020-09-05 | 90000 | 1 |
| 4 | 5 | David | Brown | david.brown@company.com | 2018-11-12 | 95000 | 4 |
| 5 | 6 | Emily | Davis | emily.davis@company.com | 2022-02-28 | 70000 | 5 |
| 6 | 7 | Robert | Wilson | robert.wilson@company.com | 2019-08-17 | 88000 | 2 |
| 7 | 8 | Lisa | Anderson | lisa.anderson@company.com | 2021-04-03 | 72000 | 3 |




Employee names and emails:


|   | first_name | last_name | email |
|---|---|---|---|
| 0 | John | Doe | john.doe@company.com |
| 1 | Jane | Smith | jane.smith@company.com |
| 2 | Mike | Johnson | mike.johnson@company.com |
| 3 | Sarah | Williams | sarah.williams@company.com |
| 4 | David | Brown | david.brown@company.com |
| 5 | Emily | Davis | emily.davis@company.com |
| 6 | Robert | Wilson | robert.wilson@company.com |
| 7 | Lisa | Anderson | lisa.anderson@company.com |




All departments:


|   | dept_id | dept_name | location |
|---|---|---|---|
| 0 | 1 | Human Resources | New York |
| 1 | 2 | Engineering | San Francisco |
| 2 | 3 | Marketing | Chicago |
| 3 | 4 | Sales | Los Angeles |
| 4 | 5 | Finance | Boston |


## 5. Filtering with WHERE Clause

The WHERE clause allows us to filter data based on specific conditions.

In [5]:
# Example 1: Filter by salary
print("Employees with salary > 80000:")
df = pd.read_sql_query("SELECT first_name, last_name, salary FROM employees WHERE salary > 80000", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 2: Filter by department
print("Employees in Engineering department (dept_id = 2):")
df = pd.read_sql_query("SELECT first_name, last_name, salary FROM employees WHERE dept_id = 2", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 3: Multiple conditions with AND
print("Engineering employees with salary > 80000:")
df = pd.read_sql_query("SELECT first_name, last_name, salary FROM employees WHERE dept_id = 2 AND salary > 80000", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 4: Using LIKE for pattern matching
print("Employees with first name starting with 'J':")
df = pd.read_sql_query("SELECT first_name, last_name, email FROM employees WHERE first_name LIKE 'J%'", conn)
display(df)

Employees with salary > 80000:


|   | first_name | last_name | salary |
|---|---|---|---|
| 0 | Jane | Smith | 82000 |
| 1 | Sarah | Williams | 90000 |
| 2 | David | Brown | 95000 |
| 3 | Robert | Wilson | 88000 |




Employees in Engineering department (dept_id = 2):


|   | first_name | last_name | salary |
|---|---|---|---|
| 0 | John | Doe | 75000 |
| 1 | Jane | Smith | 82000 |
| 2 | Robert | Wilson | 88000 |




Engineering employees with salary > 80000:


|   | first_name | last_name | salary |
|---|---|---|---|
| 0 | Jane | Smith | 82000 |
| 1 | Robert | Wilson | 88000 |




Employees with first name starting with 'J':


|   | first_name | last_name | email |
|---|---|---|---|
| 0 | John | Doe | john.doe@company.com |
| 1 | Jane | Smith | jane.smith@company.com |
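
The filters above hard-code their values in the SQL string. When the values come from variables, a safer pattern is to pass them as parameters with `?` placeholders, the same mechanism the insert cell uses with `executemany`. The following is a minimal sketch on an in-memory table (the `demo` connection, the reduced table, and the `min_salary` / `dept` variables are assumptions for illustration); `pd.read_sql_query` also accepts a `params` argument for the same purpose.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER, dept_id INTEGER)")
demo.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("John", 75000, 2), ("Jane", 82000, 2), ("Robert", 88000, 2), ("Sarah", 90000, 1)],
)

# Values supplied at run time; never spliced into the SQL text
min_salary = 80000
dept = 2

rows = demo.execute(
    "SELECT first_name, salary FROM employees "
    "WHERE dept_id = ? AND salary > ? ORDER BY salary",
    (dept, min_salary),
).fetchall()
print(rows)

demo.close()
```

Letting the driver bind the values avoids quoting mistakes and SQL injection when the inputs are not trusted.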


## 6. Sorting and Grouping Data

Learn how to sort results with ORDER BY and group data with GROUP BY.

In [6]:
# Example 1: ORDER BY - Sort by salary (highest first)
print("Employees sorted by salary (highest first):")
df = pd.read_sql_query("SELECT first_name, last_name, salary FROM employees ORDER BY salary DESC", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 2: ORDER BY multiple columns
print("Employees sorted by department, then by salary:")
df = pd.read_sql_query("SELECT first_name, last_name, salary, dept_id FROM employees ORDER BY dept_id, salary DESC", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 3: GROUP BY with COUNT
print("Number of employees per department:")
df = pd.read_sql_query("SELECT dept_id, COUNT(*) as employee_count FROM employees GROUP BY dept_id", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 4: GROUP BY with AVG
print("Average salary per department:")
df = pd.read_sql_query("SELECT dept_id, AVG(salary) as avg_salary FROM employees GROUP BY dept_id ORDER BY avg_salary DESC", conn)
display(df)

Employees sorted by salary (highest first):


|   | first_name | last_name | salary |
|---|---|---|---|
| 0 | David | Brown | 95000 |
| 1 | Sarah | Williams | 90000 |
| 2 | Robert | Wilson | 88000 |
| 3 | Jane | Smith | 82000 |
| 4 | John | Doe | 75000 |
| 5 | Lisa | Anderson | 72000 |
| 6 | Emily | Davis | 70000 |
| 7 | Mike | Johnson | 65000 |




Employees sorted by department, then by salary:


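|   | first_name | last_name | salary | dept_id |
|---|---|---|---|---|
| 0 | Sarah | Williams | 90000 | 1 |
| 1 | Robert | Wilson | 88000 | 2 |
| 2 | Jane | Smith | 82000 | 2 |
| 3 | John | Doe | 75000 | 2 |
| 4 | Lisa | Anderson | 72000 | 3 |
| 5 | Mike | Johnson | 65000 | 3 |
| 6 | David | Brown | 95000 | 4 |
| 7 | Emily | Davis | 70000 | 5 |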




Number of employees per department:


|   | dept_id | employee_count |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 3 |
| 2 | 3 | 2 |
| 3 | 4 | 1 |
| 4 | 5 | 1 |




Average salary per department:


|   | dept_id | avg_salary |
|---|---|---|
| 0 | 4 | 95000.0 |
| 1 | 1 | 90000.0 |
| 2 | 2 | 81666.666667 |
| 3 | 5 | 70000.0 |
| 4 | 3 | 68500.0 |


## 7. JOINs - Combining Data from Multiple Tables

JOINs allow us to combine data from multiple tables based on related columns.

In [7]:
# Example 1: INNER JOIN - Employees with their department names
print("Employees with department names:")
df = pd.read_sql_query("""
SELECT e.first_name, e.last_name, e.salary, d.dept_name, d.location
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY d.dept_name, e.last_name
""", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 2: JOIN with projects
print("Projects with their department information:")
df = pd.read_sql_query("""
SELECT p.project_name, p.budget, d.dept_name, d.location
FROM projects p
INNER JOIN departments d ON p.dept_id = d.dept_id
ORDER BY p.budget DESC
""", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 3: Multiple JOINs - Show employees working in departments that have projects
print("Employees in departments with active projects:")
df = pd.read_sql_query("""
SELECT DISTINCT e.first_name, e.last_name, d.dept_name, p.project_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
INNER JOIN projects p ON d.dept_id = p.dept_id
ORDER BY d.dept_name, e.last_name
""", conn)
display(df)

Employees with department names:


|   | first_name | last_name | salary | dept_name | location |
|---|---|---|---|---|---|
| 0 | John | Doe | 75000 | Engineering | San Francisco |
| 1 | Jane | Smith | 82000 | Engineering | San Francisco |
| 2 | Robert | Wilson | 88000 | Engineering | San Francisco |
| 3 | Emily | Davis | 70000 | Finance | Boston |
| 4 | Sarah | Williams | 90000 | Human Resources | New York |
| 5 | Lisa | Anderson | 72000 | Marketing | Chicago |
| 6 | Mike | Johnson | 65000 | Marketing | Chicago |
| 7 | David | Brown | 95000 | Sales | Los Angeles |




Projects with their department information:


|   | project_name | budget | dept_name | location |
|---|---|---|---|---|
| 0 | Mobile App Development | 300000 | Engineering | San Francisco |
| 1 | Financial System Upgrade | 200000 | Finance | Boston |
| 2 | Website Redesign | 150000 | Engineering | San Francisco |
| 3 | Marketing Campaign Q2 | 75000 | Marketing | Chicago |
| 4 | Sales Training Program | 50000 | Sales | Los Angeles |




Employees in departments with active projects:


|   | first_name | last_name | dept_name | project_name |
|---|---|---|---|---|
| 0 | John | Doe | Engineering | Mobile App Development |
| 1 | John | Doe | Engineering | Website Redesign |
| 2 | Jane | Smith | Engineering | Mobile App Development |
| 3 | Jane | Smith | Engineering | Website Redesign |
| 4 | Robert | Wilson | Engineering | Mobile App Development |
| 5 | Robert | Wilson | Engineering | Website Redesign |
| 6 | Emily | Davis | Finance | Financial System Upgrade |
| 7 | Lisa | Anderson | Marketing | Marketing Campaign Q2 |
| 8 | Mike | Johnson | Marketing | Marketing Campaign Q2 |
| 9 | David | Brown | Sales | Sales Training Program |
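
Notice that Human Resources never appears in the last result: an `INNER JOIN` keeps only rows with a match on both sides, and that department has no projects. A `LEFT JOIN` keeps every row from the left table and fills the right side with `NULL` when there is no match, which makes it a handy way to find the unmatched rows. A minimal sketch on an in-memory copy of part of the sample data (the `demo` connection and the trimmed tables are assumptions for illustration):

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
demo.execute(
    "CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT, dept_id INTEGER)"
)
demo.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Human Resources"), (2, "Engineering"), (3, "Marketing")],
)
demo.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [(1, "Website Redesign", 2), (2, "Mobile App Development", 2), (3, "Marketing Campaign Q2", 3)],
)

# LEFT JOIN keeps all departments; unmatched ones have NULL project columns
unmatched = demo.execute("""
SELECT d.dept_name
FROM departments d
LEFT JOIN projects p ON d.dept_id = p.dept_id
WHERE p.project_id IS NULL
""").fetchall()
print(unmatched)

demo.close()
```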


## 8. Aggregate Functions

Aggregate functions perform calculations on multiple rows and return a single result.

In [8]:
# Example 1: Basic aggregate functions
print("Salary statistics:")
df = pd.read_sql_query("""
SELECT 
    COUNT(*) as total_employees,
    AVG(salary) as average_salary,
    MIN(salary) as minimum_salary,
    MAX(salary) as maximum_salary,
    SUM(salary) as total_payroll
FROM employees
""", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 2: Aggregate with GROUP BY and department names
print("Department salary statistics:")
df = pd.read_sql_query("""
SELECT 
    d.dept_name,
    COUNT(e.emp_id) as employee_count,
    AVG(e.salary) as avg_salary,
    MAX(e.salary) as max_salary,
    MIN(e.salary) as min_salary
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name
ORDER BY avg_salary DESC
""", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 3: HAVING clause (filtering groups)
print("Departments with average salary > 75000:")
df = pd.read_sql_query("""
SELECT 
    d.dept_name,
    COUNT(e.emp_id) as employee_count,
    AVG(e.salary) as avg_salary
FROM departments d
INNER JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name
HAVING AVG(e.salary) > 75000
ORDER BY avg_salary DESC
""", conn)
display(df)

Salary statistics:


|   | total_employees | average_salary | minimum_salary | maximum_salary | total_payroll |
|---|---|---|---|---|---|
| 0 | 8 | 79625.0 | 65000 | 95000 | 637000 |




Department salary statistics:


|   | dept_name | employee_count | avg_salary | max_salary | min_salary |
|---|---|---|---|---|---|
| 0 | Sales | 1 | 95000.0 | 95000 | 95000 |
| 1 | Human Resources | 1 | 90000.0 | 90000 | 90000 |
| 2 | Engineering | 3 | 81666.666667 | 88000 | 75000 |
| 3 | Finance | 1 | 70000.0 | 70000 | 70000 |
| 4 | Marketing | 2 | 68500.0 | 72000 | 65000 |




Departments with average salary > 75000:


|   | dept_name | employee_count | avg_salary |
|---|---|---|---|
| 0 | Sales | 1 | 95000.0 |
| 1 | Human Resources | 1 | 90000.0 |
| 2 | Engineering | 3 | 81666.666667 |
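
Example 2 counts with `COUNT(e.emp_id)` rather than `COUNT(*)`, and the choice matters once a `LEFT JOIN` is involved: `COUNT(*)` counts rows, including the NULL-padded row produced for a department with no employees, while `COUNT(column)` skips NULLs. Every department in the sample data has at least one employee, so the sketch below adds a hypothetical empty "Legal" department to an in-memory table to make the difference visible (the `demo` connection, the trimmed tables, and the extra department are all assumptions for illustration).

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
demo.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, first_name TEXT, dept_id INTEGER)")
demo.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(2, "Engineering"), (6, "Legal")],  # "Legal" is hypothetical and has no employees
)
demo.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "John", 2), (2, "Jane", 2)],
)

rows = demo.execute("""
SELECT d.dept_name,
       COUNT(*)        AS row_count,       -- counts the NULL-padded row too
       COUNT(e.emp_id) AS employee_count   -- ignores NULLs
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name
ORDER BY d.dept_id
""").fetchall()
print(rows)

demo.close()
```

For the empty department, `row_count` is 1 while `employee_count` is 0, so only the second expression reports the headcount correctly.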


## 9. Practice Exercises

Try these exercises to test your SQL knowledge. Write your queries in the cell below!

### Exercise Questions:

1. **Find all employees hired after 2020-01-01**
2. **List the top 3 highest paid employees with their department names**
3. **Find departments with more than 2 employees**
4. **Calculate the total budget for all projects by department**
5. **Find employees whose salary is above the company average**
6. **List all projects that end in 2023 with their department information**

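Use the cell below to write and test your SQL queries. It already contains one sample solution per exercise, so try each question yourself before reading the answers: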

In [9]:
# Practice Area - Write your SQL queries here!

# Example: Exercise 1 - Find all employees hired after 2020-01-01
print("Exercise 1: Employees hired after 2020-01-01")
query1 = """
SELECT first_name, last_name, hire_date, salary
FROM employees 
WHERE hire_date > '2020-01-01'
ORDER BY hire_date
"""
df = pd.read_sql_query(query1, conn)
display(df)

print("\n" + "="*50 + "\n")

# Exercise 2: Top 3 highest paid employees with department names
print("Exercise 2: Top 3 highest paid employees with department names")
query2 = """
SELECT e.first_name, e.last_name, e.salary, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.salary DESC
LIMIT 3
"""
df = pd.read_sql_query(query2, conn)
display(df)

print("\n" + "="*50 + "\n")

# Exercise 3: Departments with more than 2 employees
print("Exercise 3: Departments with more than 2 employees")
query3 = """
SELECT d.dept_name, COUNT(e.emp_id) as employee_count
FROM departments d
INNER JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name
HAVING COUNT(e.emp_id) > 2
"""
df = pd.read_sql_query(query3, conn)
display(df)

print("\n" + "="*50 + "\n")

# Exercise 4: Total budget for all projects by department
print("Exercise 4: Total budget for all projects by department")
query4 = """
SELECT d.dept_name, SUM(p.budget) as total_budget
FROM departments d
INNER JOIN projects p ON d.dept_id = p.dept_id
GROUP BY d.dept_id, d.dept_name
ORDER BY total_budget DESC
"""
df = pd.read_sql_query(query4, conn)
display(df)

print("\n" + "="*50 + "\n")

# Exercise 5: Employees with salary above company average
print("Exercise 5: Employees with salary above company average")
query5 = """
SELECT first_name, last_name, salary,
       (SELECT AVG(salary) FROM employees) as company_avg
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC
"""
df = pd.read_sql_query(query5, conn)
display(df)

print("\n" + "="*50 + "\n")

# Exercise 6: Projects ending in 2023 with department info
# (dates are stored as ISO 'YYYY-MM-DD' text, so a prefix match on the year works)
print("Exercise 6: Projects ending in 2023 with department info")
query6 = """
SELECT p.project_name, p.end_date, p.budget, d.dept_name, d.location
FROM projects p
INNER JOIN departments d ON p.dept_id = d.dept_id
WHERE p.end_date LIKE '2023%'
ORDER BY p.end_date
"""
df = pd.read_sql_query(query6, conn)
display(df)

Exercise 1: Employees hired after 2020-01-01


|   | first_name | last_name | hire_date | salary |
|---|---|---|---|---|
| 0 | John | Doe | 2020-01-15 | 75000 |
| 1 | Sarah | Williams | 2020-09-05 | 90000 |
| 2 | Lisa | Anderson | 2021-04-03 | 72000 |
| 3 | Mike | Johnson | 2021-06-10 | 65000 |
| 4 | Emily | Davis | 2022-02-28 | 70000 |




Exercise 2: Top 3 highest paid employees with department names


|   | first_name | last_name | salary | dept_name |
|---|---|---|---|---|
| 0 | David | Brown | 95000 | Sales |
| 1 | Sarah | Williams | 90000 | Human Resources |
| 2 | Robert | Wilson | 88000 | Engineering |




Exercise 3: Departments with more than 2 employees


|   | dept_name | employee_count |
|---|---|---|
| 0 | Engineering | 3 |




Exercise 4: Total budget for all projects by department


|   | dept_name | total_budget |
|---|---|---|
| 0 | Engineering | 450000 |
| 1 | Finance | 200000 |
| 2 | Marketing | 75000 |
| 3 | Sales | 50000 |




Exercise 5: Employees with salary above company average


|   | first_name | last_name | salary | company_avg |
|---|---|---|---|---|
| 0 | David | Brown | 95000 | 79625.0 |
| 1 | Sarah | Williams | 90000 | 79625.0 |
| 2 | Robert | Wilson | 88000 | 79625.0 |
| 3 | Jane | Smith | 82000 | 79625.0 |



Exercise 6: Projects ending in 2023 with department info


|   | project_name | end_date | budget | dept_name | location |
|---|---|---|---|---|---|
| 0 | Sales Training Program | 2023-05-31 | 50000 | Sales | Los Angeles |
| 1 | Website Redesign | 2023-06-30 | 150000 | Engineering | San Francisco |
| 2 | Marketing Campaign Q2 | 2023-06-30 | 75000 | Marketing | Chicago |
| 3 | Financial System Upgrade | 2023-08-31 | 200000 | Finance | Boston |
| 4 | Mobile App Development | 2023-12-31 | 300000 | Engineering | San Francisco |
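
Exercise 5 uses an uncorrelated subquery: the inner `SELECT AVG(salary)` is the same for every row. A natural follow-up is a correlated subquery, where the inner query refers to the current row of the outer query. The sketch below finds employees who earn more than the average of their own department, using an in-memory copy of part of the sample data (the `demo` connection and the trimmed table are assumptions for illustration).

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER, dept_id INTEGER)")
demo.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("John", 75000, 2), ("Jane", 82000, 2), ("Robert", 88000, 2),
        ("Mike", 65000, 3), ("Lisa", 72000, 3), ("David", 95000, 4),
    ],
)

# The inner query is re-evaluated per outer row via e2.dept_id = e.dept_id
above_dept_avg = demo.execute("""
SELECT e.first_name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.dept_id = e.dept_id
)
ORDER BY e.first_name
""").fetchall()
print(above_dept_avg)

demo.close()
```

An employee who is alone in a department never qualifies, because their salary equals the department average.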


In [10]:
# Clean up: Close the database connection when done
# Uncomment the line below when you're finished with your work
# conn.close()
# print("Database connection closed.")

print("Setup complete! You now have a fully functional SQL learning environment.")
print("The database contains sample data for employees, departments, and projects.")
print("Work through the examples above and try the practice exercises!")

Setup complete! You now have a fully functional SQL learning environment.
The database contains sample data for employees, departments, and projects.
Work through the examples above and try the practice exercises!


## Next Steps

Now that you've completed the basic SQL exercises, here are some ways to continue learning:

### Advanced Topics to Explore:
- **Window Functions** - ROW_NUMBER(), RANK(), LAG(), LEAD()
- **Common Table Expressions (CTEs)** - WITH clauses for complex queries
- **Indexes** - Optimizing query performance
- **Views** - Creating reusable query definitions
- **Stored Procedures** - (Database-specific implementations)
- **Data Modeling** - Normalization and relationship design
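
As a taste of the first topic, here is a minimal sketch of a window function: `RANK()` numbers employees by salary within each department without collapsing the rows the way `GROUP BY` would. It assumes the SQLite library bundled with your Python is version 3.25 or newer (window functions are not available before that), and it runs on an in-memory copy of part of the sample data (the `demo` connection and trimmed table are assumptions for illustration).

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER, dept_id INTEGER)")
demo.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("John", 75000, 2), ("Jane", 82000, 2), ("Robert", 88000, 2),
        ("Mike", 65000, 3), ("Lisa", 72000, 3),
    ],
)

# PARTITION BY restarts the ranking for each department
ranked = demo.execute("""
SELECT first_name, dept_id, salary,
       RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank
FROM employees
ORDER BY dept_id, dept_rank
""").fetchall()

for row in ranked:
    print(row)

demo.close()
```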

### Practice Resources:
- [SQLBolt](https://sqlbolt.com/) - Interactive SQL lessons
- [HackerRank SQL](https://www.hackerrank.com/domains/sql) - Coding challenges
- [LeetCode Database](https://leetcode.com/problemset/database/) - Algorithm-style SQL problems
- [Kaggle Learn SQL](https://www.kaggle.com/learn/intro-to-sql) - Hands-on micro-course

### Building Your Own Projects:
1. **Import real datasets** (CSV files) into SQLite
2. **Create data analysis projects** combining SQL with pandas
3. **Build a web dashboard** using your SQL queries
4. **Practice with different databases** (PostgreSQL, MySQL, etc.)
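
For the first idea, one possible approach is to read the CSV with Python's standard `csv` module and load it with `executemany`. The sketch below uses an in-memory CSV string with made-up values in place of a real file, and a hypothetical `cities` table; with pandas, `DataFrame.to_sql` is an alternative route to the same result.

```python
import csv
import io
import sqlite3

# Stand-in for a real CSV file; the rows are made-up placeholder values
csv_text = "city,population\nAlpha,100\nBeta,250\n"

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE cities (city TEXT, population INTEGER)")

# DictReader yields one dict per row, which matches the named placeholders
reader = csv.DictReader(io.StringIO(csv_text))
demo.executemany("INSERT INTO cities VALUES (:city, :population)", reader)
demo.commit()

# CSV values arrive as strings; the INTEGER column affinity converts them on insert
total = demo.execute("SELECT SUM(population) FROM cities").fetchone()[0]
stored_type = demo.execute("SELECT typeof(population) FROM cities LIMIT 1").fetchone()[0]
print(total, stored_type)

demo.close()
```

For a file on disk, replace `io.StringIO(csv_text)` with an open file handle, for example `open(path, newline="")`.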

Happy learning! 🚀

## 🚀 GitHub Codespace Setup

### Automated Setup (Recommended)

For new or existing codespaces, run the automated setup script:

```bash
# Make scripts executable
chmod +x setup.sh serve.sh rebuild.sh

# Run the complete setup
./setup.sh
```

This script will:
- ✅ Install all required Python packages
- ✅ Build the Jupyter Book
- ✅ Display access URLs for your codespace
- ✅ Set up development scripts

### Development Commands

After setup, use these quick commands:

```bash
# Rebuild after making changes
./rebuild.sh

# Start local server (alternative to Live Server)
./serve.sh

# Full setup (only needed once per codespace)
./setup.sh
```

### Manual Setup (If Needed)

```bash
# Install dependencies
pip install -r requirements.txt

# Build the book
jupyter-book build /workspaces/sql-notes

# Serve locally
cd _build/html && python -m http.server 8000
```

### Access Your Book

**GitHub Codespace URLs:**
- Via Live Server: `https://your-codespace-name-5500.app.github.dev/_build/html/README.html`
- Via Python server: `https://your-codespace-name-8000.app.github.dev/`

**Auto-rebuild on changes:**
```bash
# Install watchdog for auto-rebuild (skip if it was already installed from requirements.txt)
pip install watchdog

# Watch for changes and auto-rebuild
watchmedo shell-command --patterns="*.ipynb;*.md" --recursive --command='./rebuild.sh' .
```

### Pro Tips:

1. **Bookmark your codespace URL** - it stays active while your codespace runs
2. **Use `./rebuild.sh`** after making changes to see updates
3. **Port forwarding** happens automatically in GitHub Codespaces
4. **Share your book** - forwarded ports are private by default, so set the port's visibility to public in the Ports panel before sharing the URL

🎯 **Quick Start:** Run `./setup.sh` and you're ready to go!