# Chapter 2: Intermediate SQL

This chapter builds upon the basics and introduces more powerful SQL concepts for data analysis and reporting.

## Topics Covered:
1. GROUP BY and Aggregate Functions
2. INNER JOINs - Combining Tables
3. LEFT JOINs - Including All Records
4. HAVING Clause - Filtering Groups
5. Complex Queries - Combining Multiple Concepts
6. Practice Exercises

## Prerequisites

Make sure you've completed Chapter 1 and have the database set up with sample data. If you need to reconnect:

In [None]:
import sqlite3
import pandas as pd
from IPython.display import display

# Connect to the existing database
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()

# Verify our tables exist
tables = pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table'", conn)
print("Available tables:")
display(tables)

## 1. GROUP BY and Aggregate Functions

Aggregate functions such as COUNT, AVG, MIN, MAX, and SUM collapse many rows into a single value. Combined with GROUP BY, they return one summary row per group.
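
Before running the examples on the chapter database, it may help to see how aggregates treat NULL values. The sketch below is self-contained and uses a throwaway in-memory database; the `staff` table and its values are hypothetical and are not part of the chapter's sample data. It uses plain `sqlite3` rather than pandas so that it runs without extra dependencies.

```python
import sqlite3

# Throwaway in-memory database; the table and values are illustrative only
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE staff (name TEXT, dept_id INTEGER, salary REAL)")
demo.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("Ann", 1, 60000.0), ("Bob", 1, 80000.0), ("Cy", 2, 50000.0), ("Di", 2, None)],
)

# COUNT(*) counts rows; COUNT(salary) and AVG(salary) skip NULL salaries
agg_rows = demo.execute("""
    SELECT dept_id, COUNT(*), COUNT(salary), AVG(salary)
    FROM staff
    GROUP BY dept_id
    ORDER BY dept_id
""").fetchall()

for row in agg_rows:
    print(row)
demo.close()
```

Department 2 has two rows but only one non-NULL salary, so `COUNT(*)` and `COUNT(salary)` differ for it, and `AVG(salary)` is computed over the single known value.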

In [None]:
# Example 1: Basic aggregate functions
print("Overall salary statistics:")
df = pd.read_sql_query("""
SELECT 
    COUNT(*) as total_employees,
    AVG(salary) as average_salary,
    MIN(salary) as minimum_salary,
    MAX(salary) as maximum_salary,
    SUM(salary) as total_payroll
FROM employees
""", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 2: GROUP BY with COUNT
print("Employee count per department:")
df = pd.read_sql_query("""
SELECT dept_id, COUNT(*) as employee_count
FROM employees 
GROUP BY dept_id
ORDER BY employee_count DESC
""", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 3: GROUP BY with multiple aggregates
print("Salary statistics by department:")
df = pd.read_sql_query("""
SELECT 
    dept_id,
    COUNT(*) as employee_count,
    AVG(salary) as avg_salary,
    MAX(salary) as max_salary,
    MIN(salary) as min_salary
FROM employees
GROUP BY dept_id
ORDER BY avg_salary DESC
""", conn)
display(df)

## 2. INNER JOINs - Combining Tables

JOINs combine data from multiple tables based on related columns. An INNER JOIN returns only the rows whose join condition matches in both tables.
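
As a minimal sketch of what an INNER JOIN keeps and drops, the following uses hypothetical `depts` and `staff` tables in an in-memory database (the names and rows are assumptions for illustration, not the chapter's schema).

```python
import sqlite3

# Throwaway in-memory database with illustrative tables
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE depts (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE staff (name TEXT, dept_id INTEGER);
    INSERT INTO depts VALUES (1, 'Engineering'), (2, 'Sales'), (3, 'Legal');
    INSERT INTO staff VALUES ('Ann', 1), ('Bob', 2), ('Cy', NULL);
""")

# Only rows with a matching dept_id on both sides survive the join
inner_rows = demo.execute("""
    SELECT s.name, d.dept_name
    FROM staff s
    INNER JOIN depts d ON s.dept_id = d.dept_id
    ORDER BY s.name
""").fetchall()

print(inner_rows)
demo.close()
```

Cy has no department and Legal has no staff, so neither appears in the result.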

In [None]:
# Example 1: Basic INNER JOIN - Employees with department names
print("Employees with their department names:")
df = pd.read_sql_query("""
SELECT 
    e.first_name, 
    e.last_name, 
    e.salary, 
    d.dept_name, 
    d.location
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY d.dept_name, e.last_name
""", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 2: JOIN with aggregation
print("Department statistics with names:")
df = pd.read_sql_query("""
SELECT 
    d.dept_name,
    d.location,
    COUNT(e.emp_id) as employee_count,
    AVG(e.salary) as avg_salary,
    MAX(e.salary) as max_salary
FROM departments d
INNER JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name, d.location
ORDER BY avg_salary DESC
""", conn)
display(df)

print("\n" + "="*50 + "\n")

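# Example 3: Three-table JOIN
# Employees are linked to projects only through dept_id here, so team_size
# is the headcount of the project's department, not a per-project assignment.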
print("Projects with department and employee info:")
df = pd.read_sql_query("""
SELECT 
    p.project_name,
    p.budget,
    d.dept_name,
    COUNT(e.emp_id) as team_size,
    AVG(e.salary) as avg_team_salary
FROM projects p
INNER JOIN departments d ON p.dept_id = d.dept_id
INNER JOIN employees e ON d.dept_id = e.dept_id
GROUP BY p.project_id, p.project_name, p.budget, d.dept_name
ORDER BY p.budget DESC
""", conn)
display(df)

## 3. LEFT JOINs - Including All Records

LEFT JOINs include all records from the left table, even if there's no match in the right table. For unmatched rows, the right table's columns are NULL.
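
The sketch below shows why counting a column from the right table, rather than `COUNT(*)`, matters after a LEFT JOIN. The `depts` and `staff` tables are hypothetical and live in an in-memory database.

```python
import sqlite3

# Throwaway in-memory database with illustrative tables
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE depts (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE staff (emp_id INTEGER PRIMARY KEY, dept_id INTEGER);
    INSERT INTO depts VALUES (1, 'Engineering'), (2, 'Legal');
    INSERT INTO staff VALUES (10, 1), (11, 1);
""")

# Legal has no staff: the LEFT JOIN still emits one row for it, with NULL emp_id.
# COUNT(*) counts that placeholder row, COUNT(s.emp_id) does not.
left_rows = demo.execute("""
    SELECT d.dept_name, COUNT(*) AS row_count, COUNT(s.emp_id) AS staff_count
    FROM depts d
    LEFT JOIN staff s ON d.dept_id = s.dept_id
    GROUP BY d.dept_id, d.dept_name
    ORDER BY d.dept_id
""").fetchall()

print(left_rows)
demo.close()
```

For the empty department, `row_count` is 1 while `staff_count` is 0, which is why the examples in this section count `e.emp_id` and `p.project_id`.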

In [None]:
# Example 1: All departments, even those without employees
print("All departments with employee counts (including empty departments):")
df = pd.read_sql_query("""
SELECT 
    d.dept_name,
    d.location,
    COUNT(e.emp_id) as employee_count,
    COALESCE(AVG(e.salary), 0) as avg_salary
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name, d.location
ORDER BY employee_count DESC
""", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 2: Departments without projects
print("Departments and their project counts:")
df = pd.read_sql_query("""
SELECT 
    d.dept_name,
    COUNT(p.project_id) as project_count,
    COALESCE(SUM(p.budget), 0) as total_budget
FROM departments d
LEFT JOIN projects p ON d.dept_id = p.dept_id
GROUP BY d.dept_id, d.dept_name
ORDER BY project_count DESC, total_budget DESC
""", conn)
display(df)

## 4. HAVING Clause - Filtering Groups

The HAVING clause filters groups after GROUP BY has formed them, whereas WHERE filters individual rows before grouping.
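
To make the difference concrete, here is a small sketch that applies the same threshold once with WHERE and once with HAVING. The `staff` table and the salary figures are hypothetical.

```python
import sqlite3

# Throwaway in-memory database; values are illustrative only
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE staff (dept_id INTEGER, salary REAL)")
demo.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [(1, 30000.0), (1, 90000.0), (2, 40000.0), (2, 52000.0)],
)

# WHERE removes low-salary rows first, then averages what is left
where_rows = demo.execute("""
    SELECT dept_id, AVG(salary)
    FROM staff
    WHERE salary > 50000
    GROUP BY dept_id
    ORDER BY dept_id
""").fetchall()

# HAVING averages every row per department, then drops low-average groups
having_rows = demo.execute("""
    SELECT dept_id, AVG(salary)
    FROM staff
    GROUP BY dept_id
    HAVING AVG(salary) > 50000
    ORDER BY dept_id
""").fetchall()

print("WHERE: ", where_rows)
print("HAVING:", having_rows)
demo.close()
```

With WHERE, both departments remain but their averages are computed over fewer rows. With HAVING, the averages use all rows and department 2 is dropped as a whole.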

In [None]:
# Example 1: Departments with more than 1 employee
print("Departments with more than 1 employee:")
df = pd.read_sql_query("""
SELECT 
    d.dept_name,
    COUNT(e.emp_id) as employee_count,
    AVG(e.salary) as avg_salary
FROM departments d
INNER JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name
HAVING COUNT(e.emp_id) > 1
ORDER BY employee_count DESC
""", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 2: Departments with a high total project budget
print("Departments with total project budget > 100,000:")
df = pd.read_sql_query("""
SELECT 
    d.dept_name,
    COUNT(p.project_id) as project_count,
    SUM(p.budget) as total_budget,
    AVG(p.budget) as avg_project_budget
FROM departments d
INNER JOIN projects p ON d.dept_id = p.dept_id
GROUP BY d.dept_id, d.dept_name
HAVING SUM(p.budget) > 100000
ORDER BY total_budget DESC
""", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 3: Departments with average salary > 75000
print("High-paying departments (avg salary > 75000):")
df = pd.read_sql_query("""
SELECT 
    d.dept_name,
    COUNT(e.emp_id) as employee_count,
    AVG(e.salary) as avg_salary,
    MAX(e.salary) as max_salary
FROM departments d
INNER JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name
HAVING AVG(e.salary) > 75000
ORDER BY avg_salary DESC
""", conn)
display(df)

## 5. Complex Queries - Combining Multiple Concepts

Let's build more sophisticated queries that combine JOINs, aggregations, and filtering.
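
One thing to watch when a query joins a parent table to two different child tables and then aggregates: the join produces every employee/project combination, so sums over one child table can be multiplied by the row count of the other. The sketch below illustrates this with hypothetical `depts`, `staff`, and `projs` tables in an in-memory database, and shows one possible way around it (aggregating in a subquery before joining).

```python
import sqlite3

# Throwaway in-memory database with illustrative tables
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE depts (dept_id INTEGER PRIMARY KEY);
    CREATE TABLE staff (emp_id INTEGER PRIMARY KEY, dept_id INTEGER);
    CREATE TABLE projs (proj_id INTEGER PRIMARY KEY, dept_id INTEGER, budget INTEGER);
    INSERT INTO depts VALUES (1);
    INSERT INTO staff VALUES (10, 1), (11, 1), (12, 1);
    INSERT INTO projs VALUES (100, 1, 50000), (101, 1, 20000);
""")

# Joining both child tables yields 3 x 2 = 6 rows, so each budget is summed 3 times
naive_total = demo.execute("""
    SELECT SUM(p.budget)
    FROM depts d
    LEFT JOIN staff s ON d.dept_id = s.dept_id
    LEFT JOIN projs p ON d.dept_id = p.dept_id
    GROUP BY d.dept_id
""").fetchone()[0]

# Aggregating projects in a subquery first keeps one row per department
safe_total = demo.execute("""
    SELECT pb.total_budget
    FROM depts d
    LEFT JOIN (
        SELECT dept_id, SUM(budget) AS total_budget
        FROM projs
        GROUP BY dept_id
    ) pb ON d.dept_id = pb.dept_id
""").fetchone()[0]

print("naive:", naive_total, "pre-aggregated:", safe_total)
demo.close()
```

`COUNT(DISTINCT ...)` protects counts from this duplication, but `SUM` has no equivalent safeguard, so pre-aggregating is usually the safer pattern.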

In [None]:
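# Example 1: Department performance analysis
# Employees and projects are aggregated per department in subqueries before
# joining; joining both tables directly would repeat each project row once
# per employee and inflate SUM(p.budget).
print("Department Performance Analysis:")
df = pd.read_sql_query("""
SELECT 
    d.dept_name,
    d.location,
    COALESCE(e.emp_count, 0) as employees,
    COALESCE(p.proj_count, 0) as projects,
    COALESCE(p.budget_sum, 0) as total_budget,
    e.salary_avg as avg_salary,
    CASE 
        WHEN e.salary_avg > 80000 THEN 'High Pay'
        WHEN e.salary_avg > 70000 THEN 'Medium Pay'
        ELSE 'Standard Pay'
    END as pay_grade
FROM departments d
LEFT JOIN (
    SELECT dept_id, COUNT(*) as emp_count, AVG(salary) as salary_avg
    FROM employees
    GROUP BY dept_id
) e ON d.dept_id = e.dept_id
LEFT JOIN (
    SELECT dept_id, COUNT(*) as proj_count, SUM(budget) as budget_sum
    FROM projects
    GROUP BY dept_id
) p ON d.dept_id = p.dept_id
ORDER BY total_budget DESC, avg_salary DESC
""", conn)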
display(df)

print("\n" + "="*50 + "\n")

# Example 2: Employee ranking within departments
# (window functions such as RANK() OVER require SQLite 3.25 or later)
print("Employee rankings by salary within each department:")
df = pd.read_sql_query("""
SELECT 
    d.dept_name,
    e.first_name || ' ' || e.last_name as full_name,
    e.salary,
    RANK() OVER (PARTITION BY d.dept_name ORDER BY e.salary DESC) as salary_rank,
    AVG(e.salary) OVER (PARTITION BY d.dept_name) as dept_avg_salary
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY d.dept_name, salary_rank
""", conn)
display(df)

print("\n" + "="*50 + "\n")

# Example 3: Project efficiency analysis (hypothetical)
print("Project efficiency analysis:")
df = pd.read_sql_query("""
SELECT 
    p.project_name,
    d.dept_name,
    p.budget,
    COUNT(e.emp_id) as team_size,
    -- multiply by 1.0 so the division is not truncated if budget is an INTEGER
    p.budget * 1.0 / COUNT(e.emp_id) as budget_per_employee,
    AVG(e.salary) as avg_team_salary,
    ROUND(p.budget / (COUNT(e.emp_id) * AVG(e.salary)), 2) as budget_to_salary_ratio
FROM projects p
INNER JOIN departments d ON p.dept_id = d.dept_id
INNER JOIN employees e ON d.dept_id = e.dept_id
GROUP BY p.project_id, p.project_name, d.dept_name, p.budget
ORDER BY budget_to_salary_ratio DESC
""", conn)
display(df)

## Practice Exercises - Chapter 2

Test your intermediate SQL skills with these exercises:

### Exercise Questions:
1. **Find departments with the highest average salary**
2. **List all projects with their team details (department name, employee count, total salary cost)**
3. **Identify employees earning above their department's average salary**
4. **Find departments that have both employees and projects**
5. **Calculate the total company investment (sum of all salaries + project budgets) by department**

In [None]:
# Practice Area - Intermediate SQL Exercises

# Exercise 1: Departments with highest average salary
print("Exercise 1: Departments ranked by average salary")
query1 = """
SELECT 
    d.dept_name,
    COUNT(e.emp_id) as employee_count,
    AVG(e.salary) as avg_salary,
    MAX(e.salary) as max_salary
FROM departments d
INNER JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name
ORDER BY avg_salary DESC
"""
df = pd.read_sql_query(query1, conn)
display(df)

print("\n" + "="*50 + "\n")

# Exercise 2: Projects with team details (sample solution)
print("Exercise 2: Projects with team details")
query2 = """
SELECT 
    p.project_name,
    d.dept_name,
    p.budget,
    COUNT(e.emp_id) as team_size,
    SUM(e.salary) as total_salary_cost,
    AVG(e.salary) as avg_team_salary
FROM projects p
INNER JOIN departments d ON p.dept_id = d.dept_id
INNER JOIN employees e ON d.dept_id = e.dept_id
GROUP BY p.project_id, p.project_name, d.dept_name, p.budget
ORDER BY p.budget DESC
"""
df = pd.read_sql_query(query2, conn)
display(df)

print("\n" + "="*50 + "\n")

# Exercise 3: Employees above department average
print("Exercise 3: Employees earning above department average")
query3 = """
SELECT 
    e.first_name,
    e.last_name,
    e.salary,
    d.dept_name,
    dept_avg.avg_salary as dept_avg_salary,
    e.salary - dept_avg.avg_salary as above_average_by
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
INNER JOIN (
    SELECT dept_id, AVG(salary) as avg_salary
    FROM employees
    GROUP BY dept_id
) dept_avg ON e.dept_id = dept_avg.dept_id
WHERE e.salary > dept_avg.avg_salary
ORDER BY above_average_by DESC
"""
df = pd.read_sql_query(query3, conn)
display(df)

# Exercises 4 and 5 are left for you to solve

## Chapter Summary

In Chapter 2, you mastered:

✅ **GROUP BY & Aggregates** - Summarizing data with COUNT, AVG, SUM, MAX, MIN  
✅ **INNER JOINs** - Combining related data from multiple tables  
✅ **LEFT JOINs** - Including all records even without matches  
✅ **HAVING Clause** - Filtering grouped results  
✅ **Complex Queries** - Combining multiple concepts for advanced analysis  
✅ **Window Functions** - Ranking and analytical functions  

### Key Concepts Learned:
- The difference between WHERE and HAVING
- When to use INNER vs LEFT JOINs
- How to structure multi-table queries
- Performance considerations for complex queries

### Next Steps
In Chapter 3, we'll explore:
- Subqueries and CTEs (Common Table Expressions)
- Window functions in detail
- Query optimization techniques
- Advanced data manipulation

Ready for advanced concepts? Head to **Chapter 3: Advanced SQL**!