# SQL Practice Notebook

This notebook provides a series of SQL exercises based on a simulated database. Use the `run_sql()` function to execute queries and explore the data.

In [1]:

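import sqlite3
from contextlib import closing

import pandas as pd

DB_PATH = "../data/sql_practice.db"

def run_sql(query, params=None):
    """
    Execute a SQL query against the SQLite database and return a DataFrame.

    `params` may be a sequence (for `?` placeholders) or a dict (for `:name` placeholders).
    """
    # sqlite3's own context manager only commits or rolls back; closing() also closes the connection.
    with closing(sqlite3.connect(DB_PATH)) as conn:
        return pd.read_sql_query(query, conn, params=params)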

# Example usage:
# df = run_sql("SELECT * FROM contractors LIMIT 5")
# df


## Level 1: Fundamentals

### Question 1: Get all contractors located in Miami.

In [19]:
df = run_sql("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
df

|   | name |
|---|------|
| 0 | contractors |
| 1 | events |
| 2 | jobs |
| 3 | photos |


In [18]:
df = run_sql("SELECT * FROM contractors WHERE city = :city", params={"city":"Miami"})
df

|   | id | name | city | state |
|---|----|------|------|-------|
| 0 | 9 | Dynamic Contractors | Miami | FL |


### Question 2: Find the 5 most recently uploaded photos.

In [22]:
df = run_sql("SELECT * FROM photos ORDER BY upload_time DESC LIMIT 5")
df

|   | id | job_id | contractor_id | upload_time | size | description |
|---|----|--------|---------------|-------------|------|-------------|
| 0 | 467 | 36 | 9 | 2025-01-25 14:00:00 | 4774 | Photo 14 for job 36 |
| 1 | 468 | 36 | 9 | 2025-01-25 14:00:00 | 2017 | Photo 15 for job 36 |
| 2 | 194 | 13 | 4 | 2025-01-22 10:00:00 | 600 | Photo 6 for job 13 |
| 3 | 471 | 36 | 9 | 2025-01-22 09:00:00 | 3606 | Photo 18 for job 36 |
| 4 | 465 | 36 | 9 | 2025-01-22 05:00:00 | 4007 | Photo 12 for job 36 |


### Question 3: Count how many jobs each contractor has.

In [6]:
df = run_sql("SELECT COUNT(contractor_id) as num_of_jobs FROM jobs GROUP BY contractor_id")
df

|   | num_of_jobs |
|---|-------------|
| 0 | 4 |
| 1 | 4 |
| 2 | 4 |
| 3 | 4 |
| 4 | 4 |
| 5 | 4 |
| 6 | 4 |
| 7 | 4 |
| 8 | 4 |
| 9 | 4 |

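The counts above are not labeled with the contractor they belong to. Below is a minimal sketch of a labeled version. It assumes only the `contractors.id`/`name` and `jobs.contractor_id` columns already used in this notebook. It runs against a tiny in-memory database with hypothetical rows so that it is self-contained; in the notebook you would pass `QUERY` to `run_sql()` instead.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: made-up rows using the notebook's table and column names.
SETUP = """
CREATE TABLE contractors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE jobs (id INTEGER PRIMARY KEY, contractor_id INTEGER);
INSERT INTO contractors VALUES (1, 'Ace Builders'), (2, 'Quality Co'), (3, 'Cam Experts');
INSERT INTO jobs VALUES (10, 1), (11, 1), (12, 2);
"""

# Include the grouping key (and the name) so each count is identifiable.
QUERY = """
SELECT c.id, c.name, COUNT(j.id) AS num_of_jobs
FROM contractors AS c
LEFT JOIN jobs AS j ON j.contractor_id = c.id
GROUP BY c.id, c.name
ORDER BY num_of_jobs DESC, c.id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

for row in rows:
    print(row)
```

Starting from `contractors` with a `LEFT JOIN` keeps contractors who have no jobs (count 0). Grouping `jobs` alone would drop them.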

### Question 4: Get average photo size per job.

In [8]:
df = run_sql("SELECT AVG(size) as average_size_photo FROM photos GROUP BY job_id")
df

|   | average_size_photo |
|---|--------------------|
| 0 | 2686.882353 |
| 1 | 2497.785714 |
| 2 | 2273.333333 |
| 3 | 2200.4 |
| 4 | 2692.947368 |
| 5 | 2474.25 |
| 6 | 2431.05 |
| 7 | 2746.5 |
| 8 | 2344.846154 |
| 9 | 1592.454545 |


### Question 5: Find all contractors whose name contains 'Builders'.

In [11]:
df = run_sql("SELECT name FROM contractors WHERE name LIKE '%Builders%'")
df

|   | name |
|---|------|
| 0 | Ace Builders |
| 1 | Innovative Builders |


### Question 6: List the names of contractors in alphabetical order.

In [15]:
df = run_sql("SELECT name FROM contractors ORDER BY name ASC")
df

|   | name |
|---|------|
| 0 | Ace Builders |
| 1 | Cam Experts |
| 2 | ConstructIT |
| 3 | Dynamic Contractors |
| 4 | Innovative Builders |
| 5 | Photo Pro Contractors |
| 6 | Premier Builds |
| 7 | Quality Co |
| 8 | Smith Construction |
| 9 | Vision Contracting |


## Level 2: Joins

### Question 7: List all job sites and their assigned contractors.

In [19]:
df = run_sql("""
    SELECT jobs.contractor_id, jobs.status,
           contractors.name, contractors.city, contractors.state
    FROM jobs
    INNER JOIN contractors ON jobs.contractor_id = contractors.id
""")
df

|   | contractor_id | status | name | city | state |
|---|---------------|--------|------|------|-------|
| 0 | 1 | open | Smith Construction | Omaha | NE |
| 1 | 1 | closed | Smith Construction | Omaha | NE |
| 2 | 1 | open | Smith Construction | Omaha | NE |
| 3 | 1 | closed | Smith Construction | Omaha | NE |
| 4 | 2 | open | Ace Builders | Chicago | IL |
| 5 | 2 | open | Ace Builders | Chicago | IL |
| 6 | 2 | closed | Ace Builders | Chicago | IL |
| 7 | 2 | open | Ace Builders | Chicago | IL |
| 8 | 3 | open | Cam Experts | Denver | CO |
| 9 | 3 | open | Cam Experts | Denver | CO |


### Question 8: Find contractors who haven't uploaded any photos.

In [25]:
df = run_sql("SELECT contractor_id FROM photos LEFT JOIN contractors ON contractors.id = photos.contractor_id")
df

|   | contractor_id |
|---|---------------|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| ... | ... |
| 524 | 10 |
| 525 | 10 |
| 526 | 10 |
| 527 | 10 |

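The query above starts from `photos`, so it returns one row per photo (528 rows) rather than the contractors without photos. Below is a sketch of the usual anti-join: it starts from `contractors` and keeps only the rows where no photo matched. The in-memory tables and rows are hypothetical.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: contractor 3 has no photos.
SETUP = """
CREATE TABLE contractors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE photos (id INTEGER PRIMARY KEY, contractor_id INTEGER);
INSERT INTO contractors VALUES (1, 'Ace Builders'), (2, 'Quality Co'), (3, 'Cam Experts');
INSERT INTO photos VALUES (100, 1), (101, 1), (102, 2);
"""

# Anti-join: unmatched contractors get NULL photo columns, and only those rows are kept.
QUERY = """
SELECT c.id, c.name
FROM contractors AS c
LEFT JOIN photos AS p ON p.contractor_id = c.id
WHERE p.id IS NULL
ORDER BY c.id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

print(rows)
```

`WHERE NOT EXISTS (SELECT 1 FROM photos AS p WHERE p.contractor_id = c.id)` expresses the same filter.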

### Question 9: Get the contractor with the highest number of jobs.

In [None]:
# Write your SQL query here


*Hint:* Group jobs by contractor, count them, order by count descending and limit 1.

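Below is one possible answer that follows the hint, sketched against hypothetical in-memory rows. In the notebook, `QUERY` would go to `run_sql()`.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: contractor 1 has 3 jobs, contractor 2 has 1.
SETUP = """
CREATE TABLE contractors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE jobs (id INTEGER PRIMARY KEY, contractor_id INTEGER);
INSERT INTO contractors VALUES (1, 'Ace Builders'), (2, 'Quality Co');
INSERT INTO jobs VALUES (10, 1), (11, 1), (12, 2), (13, 1);
"""

# Count jobs per contractor, highest first; c.id breaks ties deterministically.
QUERY = """
SELECT c.id, c.name, COUNT(*) AS num_of_jobs
FROM jobs AS j
JOIN contractors AS c ON c.id = j.contractor_id
GROUP BY c.id, c.name
ORDER BY num_of_jobs DESC, c.id
LIMIT 1
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

print(rows)
```

The Question 3 output shows 4 jobs for every contractor. On the notebook's data, `LIMIT 1` would therefore pick one of ten tied rows, and the `c.id` tie-breaker only makes that choice deterministic. Ranking with RANK() and keeping rank 1 would return every tied contractor.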
### Question 10: List jobs and the number of photos each has.

In [None]:
# Write your SQL query here


*Hint:* Left join jobs with photos (so jobs with no photos still appear, with a count of 0) and group by job id.

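Here is a sketch against hypothetical in-memory rows. With a LEFT JOIN, a job with no photos still produces one row whose photo columns are NULL. Counting `p.id` returns 0 for it, whereas `COUNT(*)` would return 1.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: job 12 has no photos.
SETUP = """
CREATE TABLE jobs (id INTEGER PRIMARY KEY);
CREATE TABLE photos (id INTEGER PRIMARY KEY, job_id INTEGER);
INSERT INTO jobs VALUES (10), (11), (12);
INSERT INTO photos VALUES (100, 10), (101, 10), (102, 11);
"""

QUERY = """
SELECT j.id AS job_id, COUNT(p.id) AS num_photos
FROM jobs AS j
LEFT JOIN photos AS p ON p.job_id = j.id
GROUP BY j.id
ORDER BY j.id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

print(rows)
```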
### Question 11: Find jobs that have no photos uploaded.

In [None]:
# Write your SQL query here


*Hint:* Left join jobs with photos and filter for NULL photo ids.

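The hint can be sketched with the same kind of hypothetical in-memory tables. The LEFT JOIN leaves NULL photo columns for unmatched jobs, and the WHERE clause keeps only those rows.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: job 12 has no photos.
SETUP = """
CREATE TABLE jobs (id INTEGER PRIMARY KEY);
CREATE TABLE photos (id INTEGER PRIMARY KEY, job_id INTEGER);
INSERT INTO jobs VALUES (10), (11), (12);
INSERT INTO photos VALUES (100, 10), (101, 10), (102, 11);
"""

QUERY = """
SELECT j.id AS job_id
FROM jobs AS j
LEFT JOIN photos AS p ON p.job_id = j.id
WHERE p.id IS NULL
ORDER BY j.id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

print(rows)
```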
### Question 12: For each contractor, list the first photo upload time across all jobs.

In [None]:
# Write your SQL query here


*Hint:* Group photos by contractor and use MIN(upload_time).

## Level 3: Subqueries & CTEs

### Question 13: Find contractors with more jobs than the average contractor.

In [None]:
# Write your SQL query here


*Hint:* Compute average job count using a subquery or CTE.

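Below is a CTE-based sketch against hypothetical in-memory rows. The CTE computes one count per contractor, and the scalar subquery averages those counts. Contractors with zero jobs do not appear in `job_counts`, so they are left out of the average. If every contractor has 4 jobs, as the Question 3 output suggests, the strict `>` comparison returns no rows on the notebook's data.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: contractors 1, 2, 3 have 3, 1 and 2 jobs (average 2).
SETUP = """
CREATE TABLE jobs (id INTEGER PRIMARY KEY, contractor_id INTEGER);
INSERT INTO jobs VALUES (10, 1), (11, 1), (12, 1), (13, 2), (14, 3), (15, 3);
"""

QUERY = """
WITH job_counts AS (
  SELECT contractor_id, COUNT(*) AS num_jobs
  FROM jobs
  GROUP BY contractor_id
)
SELECT contractor_id, num_jobs
FROM job_counts
WHERE num_jobs > (SELECT AVG(num_jobs) FROM job_counts)
ORDER BY contractor_id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

print(rows)
```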
### Question 14: For each contractor, find their most recent job start date.

In [None]:
# Write your SQL query here


*Hint:* Group by contractor and take MAX(start_date).

### Question 15: Get the top 3 job sites with the most photos.

In [None]:
# Write your SQL query here


*Hint:* Group photos by job_id, count them, order by count descending, limit 3.

### Question 16: List jobs that have an above-average number of photos.

In [None]:
# Write your SQL query here


*Hint:* Calculate average photo count per job in a subquery.

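Here is a sketch with hypothetical in-memory rows. Building the per-job counts from `jobs` with a LEFT JOIN means jobs with zero photos pull the average down. Counting from `photos` alone would skip them and raise the threshold.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: jobs 10, 11, 12 have 3, 1 and 0 photos (average 4/3).
SETUP = """
CREATE TABLE jobs (id INTEGER PRIMARY KEY);
CREATE TABLE photos (id INTEGER PRIMARY KEY, job_id INTEGER);
INSERT INTO jobs VALUES (10), (11), (12);
INSERT INTO photos VALUES (100, 10), (101, 10), (102, 10), (103, 11);
"""

QUERY = """
WITH photo_counts AS (
  SELECT j.id AS job_id, COUNT(p.id) AS num_photos
  FROM jobs AS j
  LEFT JOIN photos AS p ON p.job_id = j.id
  GROUP BY j.id
)
SELECT job_id, num_photos
FROM photo_counts
WHERE num_photos > (SELECT AVG(num_photos) FROM photo_counts)
ORDER BY job_id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

print(rows)
```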
### Question 17: Find contractors who have at least 2 open jobs.

In [None]:
# Write your SQL query here


*Hint:* Filter jobs to status = 'open', count per contractor, and keep only groups with HAVING COUNT(*) >= 2.

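Below is a sketch with hypothetical in-memory rows, using the 'open'/'closed' status values seen in the Question 7 output. WHERE filters rows before grouping, and HAVING filters the groups after counting.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: contractor 1 has 2 open jobs, contractor 2 has 1, contractor 3 has 0.
SETUP = """
CREATE TABLE jobs (id INTEGER PRIMARY KEY, contractor_id INTEGER, status TEXT);
INSERT INTO jobs VALUES
  (10, 1, 'open'), (11, 1, 'closed'), (12, 1, 'open'),
  (13, 2, 'open'), (14, 2, 'closed'), (15, 3, 'closed');
"""

QUERY = """
SELECT contractor_id, COUNT(*) AS open_jobs
FROM jobs
WHERE status = 'open'
GROUP BY contractor_id
HAVING COUNT(*) >= 2
ORDER BY contractor_id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

print(rows)
```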
## Level 4: Window Functions

### Question 18: For each job, rank photos by upload time.

In [None]:
# Write your SQL query here


*Hint:* Use ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY upload_time).

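Below is a sketch of the hint against hypothetical in-memory rows. Window functions need SQLite 3.25 or newer; you can check `sqlite3.sqlite_version`. The Question 2 output shows photos sharing an upload time (467 and 468), so this sketch adds `id` as a tie-breaker; without it, ROW_NUMBER() orders tied rows arbitrarily.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: photos 102 and 104 share an upload time.
SETUP = """
CREATE TABLE photos (id INTEGER PRIMARY KEY, job_id INTEGER, upload_time TEXT);
INSERT INTO photos VALUES
  (100, 10, '2025-01-03 09:00:00'),
  (101, 10, '2025-01-01 09:00:00'),
  (102, 11, '2025-01-02 09:00:00'),
  (103, 10, '2025-01-02 12:00:00'),
  (104, 11, '2025-01-02 09:00:00');
"""

QUERY = """
SELECT job_id,
       id AS photo_id,
       upload_time,
       ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY upload_time, id) AS upload_rank
FROM photos
ORDER BY job_id, upload_rank
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

for row in rows:
    print(row)
```

Swapping ROW_NUMBER() for RANK() would give tied photos the same rank instead.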
### Question 19: Find each contractor's most recent job using window functions.

In [None]:
# Write your SQL query here


*Hint:* Use ROW_NUMBER() OVER (PARTITION BY contractor_id ORDER BY start_date DESC) and keep the rows where it equals 1.

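Below is a sketch against hypothetical in-memory rows. It assumes `start_date` is stored as ISO `YYYY-MM-DD` text, so that sorting the strings matches date order. The CTE numbers each contractor's jobs from newest to oldest, and the outer query keeps row 1.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: start_date is assumed to be ISO 'YYYY-MM-DD' text.
SETUP = """
CREATE TABLE jobs (id INTEGER PRIMARY KEY, contractor_id INTEGER, start_date TEXT);
INSERT INTO jobs VALUES
  (10, 1, '2024-11-01'), (11, 1, '2025-01-05'),
  (12, 2, '2024-12-15'), (13, 2, '2024-10-01');
"""

QUERY = """
WITH ranked AS (
  SELECT id, contractor_id, start_date,
         ROW_NUMBER() OVER (
           PARTITION BY contractor_id ORDER BY start_date DESC, id DESC
         ) AS rn
  FROM jobs
)
SELECT contractor_id, id AS job_id, start_date
FROM ranked
WHERE rn = 1
ORDER BY contractor_id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

print(rows)
```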
### Question 20: For each contractor, compute the running total number of photos uploaded by date.

In [None]:
# Write your SQL query here


*Hint:* Count photos per contractor per day, then apply SUM() OVER (PARTITION BY contractor_id ORDER BY day).

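Below is a sketch with hypothetical in-memory rows. The CTE first collapses photos to one count per contractor per day. The window SUM() then accumulates those daily counts in date order, restarting for each contractor.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: contractor 1 uploads twice on Jan 1 and once on Jan 3.
SETUP = """
CREATE TABLE photos (id INTEGER PRIMARY KEY, contractor_id INTEGER, upload_time TEXT);
INSERT INTO photos VALUES
  (1, 1, '2025-01-01 09:00:00'),
  (2, 1, '2025-01-01 15:00:00'),
  (3, 1, '2025-01-03 10:00:00'),
  (4, 2, '2025-01-02 08:00:00');
"""

QUERY = """
WITH daily AS (
  SELECT contractor_id, DATE(upload_time) AS day, COUNT(*) AS photos_that_day
  FROM photos
  GROUP BY contractor_id, DATE(upload_time)
)
SELECT contractor_id, day, photos_that_day,
       SUM(photos_that_day) OVER (
         PARTITION BY contractor_id ORDER BY day
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM daily
ORDER BY contractor_id, day
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

for row in rows:
    print(row)
```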
### Question 21: Calculate the 7-day rolling count of photos for each contractor.

In [None]:
# Write your SQL query here


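*Hint:* SQLite has no date-interval frame offsets, so a correlated subquery that counts photos in the preceding 7 days is the most portable approach. SQLite 3.28+ does accept numeric RANGE offsets, so ordering by JULIANDAY() with RANGE BETWEEN 6 PRECEDING AND CURRENT ROW also works.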

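Below is a sketch of the correlated-subquery approach against hypothetical in-memory rows. It treats the window as the current day plus the 6 days before it (7 calendar days), which is one reasonable reading of "7-day rolling". Comparing ISO date strings with BETWEEN works because they sort lexicographically in date order.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: contractor 1 uploads on Jan 1 (x2), Jan 5 and Jan 9.
SETUP = """
CREATE TABLE photos (id INTEGER PRIMARY KEY, contractor_id INTEGER, upload_time TEXT);
INSERT INTO photos VALUES
  (1, 1, '2025-01-01 08:00:00'),
  (2, 1, '2025-01-01 17:00:00'),
  (3, 1, '2025-01-05 10:00:00'),
  (4, 1, '2025-01-09 11:00:00'),
  (5, 2, '2025-01-02 09:00:00');
"""

QUERY = """
WITH daily AS (
  SELECT contractor_id, DATE(upload_time) AS day, COUNT(*) AS n
  FROM photos
  GROUP BY contractor_id, DATE(upload_time)
)
SELECT d.contractor_id,
       d.day,
       (SELECT SUM(d2.n)
          FROM daily AS d2
         WHERE d2.contractor_id = d.contractor_id
           AND d2.day BETWEEN DATE(d.day, '-6 days') AND d.day) AS rolling_7d
FROM daily AS d
ORDER BY d.contractor_id, d.day
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

for row in rows:
    print(row)
```

For Jan 9, the window runs from Jan 3 to Jan 9, so the two Jan 1 photos drop out.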
### Question 22: For each job, compute the difference in days between consecutive photo uploads.

In [None]:
# Write your SQL query here


*Hint:* Use LAG() to get the previous upload_time within each job, then subtract with JULIANDAY() to get the gap in days.

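Below is a sketch against hypothetical in-memory rows. LAG() returns NULL for the first photo of each job, so its gap is NULL, and ROUND() passes that NULL through. The gap is fractional when uploads happen at different times of day.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: three photos for job 10, one for job 11.
SETUP = """
CREATE TABLE photos (id INTEGER PRIMARY KEY, job_id INTEGER, upload_time TEXT);
INSERT INTO photos VALUES
  (1, 10, '2025-01-01 00:00:00'),
  (2, 10, '2025-01-03 12:00:00'),
  (3, 10, '2025-01-04 12:00:00'),
  (4, 11, '2025-01-02 00:00:00');
"""

QUERY = """
SELECT job_id,
       id AS photo_id,
       upload_time,
       ROUND(JULIANDAY(upload_time)
             - JULIANDAY(LAG(upload_time) OVER (PARTITION BY job_id ORDER BY upload_time, id)),
             2) AS days_since_prev
FROM photos
ORDER BY job_id, upload_time, id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

for row in rows:
    print(row)
```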
### Question 23: Rank contractors by total photo count across all jobs.

In [None]:
# Write your SQL query here


*Hint:* Aggregate photos per contractor, then apply the RANK() window function to the totals.

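Below is a sketch with hypothetical in-memory rows that include a tie. RANK() gives tied contractors the same rank and skips the next one (1, 1, 3); DENSE_RANK() would produce 1, 1, 2 instead.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: contractors 1 and 2 tie with 3 photos; contractor 3 has 1.
SETUP = """
CREATE TABLE photos (id INTEGER PRIMARY KEY, contractor_id INTEGER);
INSERT INTO photos VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2), (7, 3);
"""

QUERY = """
WITH totals AS (
  SELECT contractor_id, COUNT(*) AS total_photos
  FROM photos
  GROUP BY contractor_id
)
SELECT contractor_id, total_photos,
       RANK() OVER (ORDER BY total_photos DESC) AS photo_rank
FROM totals
ORDER BY photo_rank, contractor_id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

print(rows)
```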
## Level 5: Data Cleaning & Transformation

### Question 24: Standardize contractor names to uppercase.

In [None]:
# Write your SQL query here


*Hint:* Apply the UPPER() function to name.

### Question 25: Extract year and month from job start date.

In [None]:
# Write your SQL query here


*Hint:* Use SUBSTR() or STRFTIME() to get year and month.

### Question 26: Flag jobs with missing contractor info.

In [None]:
# Write your SQL query here


*Hint:* Left join jobs with contractors and keep the rows where contractors.id IS NULL, meaning the job's contractor_id is missing or matches no contractor.

### Question 27: Categorize jobs by duration: Short (<15 days), Medium (15-30 days), Long (>30 days).

In [None]:
# Write your SQL query here


*Hint:* Take the JULIANDAY() difference between a job's end and start dates and bucket it with a CASE expression.

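Below is a sketch against hypothetical in-memory rows. It assumes `jobs` has an `end_date` column stored as ISO text. That column name is an assumption; this notebook does not show it. The boundaries follow the question: under 15 days is Short, 15 to 30 days inclusive is Medium, and over 30 days is Long. Jobs without an end date are labeled 'Unknown' rather than being silently bucketed.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: `end_date` is an assumed column; job 5 has no end date yet.
SETUP = """
CREATE TABLE jobs (id INTEGER PRIMARY KEY, start_date TEXT, end_date TEXT);
INSERT INTO jobs VALUES
  (1, '2025-01-01', '2025-01-10'),
  (2, '2025-01-01', '2025-01-16'),
  (3, '2025-01-01', '2025-01-31'),
  (4, '2025-01-01', '2025-02-15'),
  (5, '2025-01-01', NULL);
"""

QUERY = """
WITH durations AS (
  SELECT id, JULIANDAY(end_date) - JULIANDAY(start_date) AS duration_days
  FROM jobs
)
SELECT id,
       duration_days,
       CASE
         WHEN duration_days IS NULL THEN 'Unknown'  -- no end date recorded
         WHEN duration_days < 15 THEN 'Short'
         WHEN duration_days <= 30 THEN 'Medium'     -- 15 to 30 days inclusive
         ELSE 'Long'
       END AS duration_category
FROM durations
ORDER BY id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

for row in rows:
    print(row)
```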
### Question 28: Trim leading and trailing spaces from contractor names and combine city and state into a single location field.

In [None]:
# Write your SQL query here


*Hint:* Use TRIM() and the || concatenation operator.

## Level 6: Performance & Scale

### Question 29: Compare query times before and after indexing job_id in the photos table.

In [None]:
# Write your SQL query here


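*Hint:* Time the query, create an index on photos(job_id), and time it again. Running EXPLAIN QUERY PLAN before and after shows whether the plan changes from a full table scan to an index search.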

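Below is a self-contained sketch of the experiment on a synthetic `photos` table (50,000 random rows in memory, not the notebook's data). The index name `idx_photos_job_id` and the helper functions `plan()` and `time_query()` are hypothetical. The timings depend on the machine; the plan text is the more reliable signal. Before indexing it reports a SCAN of `photos`, and afterwards a SEARCH using the new index.

```python
import random
import sqlite3
import time
from contextlib import closing

QUERY = "SELECT COUNT(*) FROM photos WHERE job_id = ?"

def plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

def time_query(conn, sql, params=(), repeats=50):
    """Run the query `repeats` times and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(sql, params).fetchone()
    return time.perf_counter() - start

with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, job_id INTEGER, size INTEGER)")
    rng = random.Random(0)  # synthetic, reproducible data
    conn.executemany(
        "INSERT INTO photos (job_id, size) VALUES (?, ?)",
        [(rng.randint(1, 1000), rng.randint(100, 5000)) for _ in range(50_000)],
    )
    before_plan = plan(conn, QUERY, (36,))
    before_secs = time_query(conn, QUERY, (36,))

    conn.execute("CREATE INDEX idx_photos_job_id ON photos(job_id)")
    after_plan = plan(conn, QUERY, (36,))
    after_secs = time_query(conn, QUERY, (36,))

print("before:", before_plan, f"({before_secs:.4f}s)")
print("after: ", after_plan, f"({after_secs:.4f}s)")
```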
### Question 30: Rewrite a slow query using CTEs or a different join order.

In [None]:
# Write your SQL query here


*Hint:* Identify expensive joins and re-order them, or pre-aggregate with CTEs.

### Question 31: Identify the bottleneck of a query using EXPLAIN.

In [None]:
# Write your SQL query here


*Hint:* Use EXPLAIN QUERY PLAN to see the query's steps; SCAN steps are full table scans and are the usual candidates for an index.

## Level 7: ML Feature Engineering

### Question 32: Average time between job start and first photo upload.

In [None]:
# Write your SQL query here


*Hint:* Calculate first photo upload per job and subtract start_date, then average across jobs.

### Question 33: Label contractors as 'Active' if they uploaded a photo in the last 30 days; otherwise 'Dormant'.

In [None]:
# Write your SQL query here


*Hint:* Compare MAX(upload_time) per contractor to DATE('now','-30 day').

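The latest upload in the Question 2 output is 2025-01-25, so comparing against DATE('now', ...) would likely label every contractor Dormant today. This sketch therefore takes the reference date as a named parameter (`:as_of`, a hypothetical name) and runs against hypothetical in-memory rows. Starting from `contractors` means a contractor with no photos has a NULL MAX(). The comparison is then NULL, so the CASE falls through to 'Dormant'. Comparing the datetime text to a date string works because both are ISO-formatted.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: contractor 3 has no photos.
SETUP = """
CREATE TABLE contractors (id INTEGER PRIMARY KEY);
CREATE TABLE photos (id INTEGER PRIMARY KEY, contractor_id INTEGER, upload_time TEXT);
INSERT INTO contractors VALUES (1), (2), (3);
INSERT INTO photos VALUES
  (1, 1, '2025-01-25 14:00:00'),
  (2, 1, '2024-11-01 09:00:00'),
  (3, 2, '2024-12-20 10:00:00');
"""

QUERY = """
SELECT c.id AS contractor_id,
       MAX(p.upload_time) AS last_upload,
       CASE WHEN MAX(p.upload_time) >= DATE(:as_of, '-30 days')
            THEN 'Active' ELSE 'Dormant' END AS activity
FROM contractors AS c
LEFT JOIN photos AS p ON p.contractor_id = c.id
GROUP BY c.id
ORDER BY c.id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY, {"as_of": "2025-01-31"}).fetchall()

for row in rows:
    print(row)
```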
### Question 34: Create job-level features: number_of_photos, days_to_first_photo, most_recent_upload_time.

In [None]:
# Write your SQL query here


*Hint:* Aggregate per job to compute photo count, first upload, last upload, and difference in days between start and first upload.

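Below is a sketch of the per-job feature query against hypothetical in-memory rows. It assumes `start_date` is ISO date text. Jobs without photos keep a row with a count of 0 and NULL timing features, which a model pipeline can then impute explicitly.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: job 11 has no photos yet.
SETUP = """
CREATE TABLE jobs (id INTEGER PRIMARY KEY, start_date TEXT);
CREATE TABLE photos (id INTEGER PRIMARY KEY, job_id INTEGER, upload_time TEXT);
INSERT INTO jobs VALUES (10, '2025-01-01'), (11, '2025-01-05');
INSERT INTO photos VALUES
  (1, 10, '2025-01-03 12:00:00'),
  (2, 10, '2025-01-06 09:00:00');
"""

QUERY = """
SELECT j.id AS job_id,
       COUNT(p.id) AS number_of_photos,
       ROUND(JULIANDAY(MIN(p.upload_time)) - JULIANDAY(j.start_date), 2) AS days_to_first_photo,
       MAX(p.upload_time) AS most_recent_upload_time
FROM jobs AS j
LEFT JOIN photos AS p ON p.job_id = j.id
GROUP BY j.id, j.start_date
ORDER BY j.id
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

for row in rows:
    print(row)
```

For Question 32, wrapping this query as `SELECT AVG(days_to_first_photo) FROM (...)` averages over jobs that have at least one photo, since AVG() skips NULLs.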
### Question 35: Create weekly cohorts of contractors based on the week of their first photo upload and track retention.

In [None]:
# Write your SQL query here


*Hint:* Use STRFTIME() to get week numbers and join across weeks to see repeat uploads.
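Below is a sketch of one cohort layout against hypothetical in-memory rows. It labels each contractor with the first week they uploaded (`cohort_week`), then counts how many contractors from each cohort were active in each later week. `STRFTIME('%Y-%W', ...)` gives Monday-based week keys that sort correctly within a year. Weeks are numbered per year and week 00 can be partial, so cohorts that span New Year need extra care.

```python
import sqlite3
from contextlib import closing

# Hypothetical fixture: 2025-01-06 is a Monday, so Jan 6-7 fall in week 01 and Jan 13-15 in week 02.
SETUP = """
CREATE TABLE photos (id INTEGER PRIMARY KEY, contractor_id INTEGER, upload_time TEXT);
INSERT INTO photos VALUES
  (1, 1, '2025-01-06 10:00:00'),
  (2, 1, '2025-01-14 10:00:00'),
  (3, 2, '2025-01-07 09:00:00'),
  (4, 3, '2025-01-15 09:00:00');
"""

QUERY = """
WITH weekly AS (
  SELECT DISTINCT contractor_id, STRFTIME('%Y-%W', upload_time) AS week
  FROM photos
),
cohorts AS (
  SELECT contractor_id, MIN(week) AS cohort_week
  FROM weekly
  GROUP BY contractor_id
)
SELECT c.cohort_week,
       w.week AS active_week,
       COUNT(DISTINCT w.contractor_id) AS active_contractors
FROM cohorts AS c
JOIN weekly AS w ON w.contractor_id = c.contractor_id
GROUP BY c.cohort_week, w.week
ORDER BY c.cohort_week, w.week
"""

with closing(sqlite3.connect(":memory:")) as conn:
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()

for row in rows:
    print(row)
```

In this made-up data, the 2025-01 cohort has 2 contractors, and 1 of them uploads again in week 2025-02. Dividing each row by its cohort's first-week count gives a retention rate.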