
# DS2002 — 2026-01-28  
## SQLite + Joins + Grouping — Studio (Lab)

This lab is designed to be completed independently. You will build comfort with SQL by answering questions against a small, realistic database. The point is not to “memorize SQL.” The point is to learn how to think in tables, connect them with joins, and summarize results with grouping.

You will work with a university-style dataset built from four tables: students, classes, enrollments, and grades. The data is already provided. Your job is to write the SQL that answers the questions.

### Submission (Canvas)
Put this notebook in your Git repository and share the repository URL with me. Make sure you have invited me and Daniel to your GitHub repository so we can see the code.

In Kaggle notebooks, Python code cells must contain valid Python. In this lab, you will run SQL inside Python by calling `q(''' ... ''')`.

If you paste raw `SELECT ...` into a code cell, Python will raise a `SyntaxError`.  
If you paste a Markdown fence such as `` ```sql `` into a code cell, Python will raise a `SyntaxError` as well.  
Wrap your SQL in `q(''' ... ''')`, just as we did before.
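
The difference between raw SQL and SQL inside a string can be checked directly. This is a minimal sketch using only the standard library; `demo_conn` is a throwaway connection introduced here for illustration (it is not the lab's `conn`), and the lab's `q(...)` helper is not needed to run it.

```python
import sqlite3

# Raw SQL is not Python: compiling it as Python source fails before anything runs.
try:
    compile("SELECT 1 AS one", "<cell>", "exec")
    raw_sql_is_valid_python = True
except SyntaxError:
    raw_sql_is_valid_python = False

# The same text inside a triple-quoted string is just data handed to SQLite.
demo_conn = sqlite3.connect(":memory:")  # hypothetical scratch connection
rows = demo_conn.execute('''
SELECT 1 AS one
''').fetchall()
demo_conn.close()

print(raw_sql_is_valid_python)  # False
print(rows)                     # [(1,)]
```

The triple quotes are what make multi-line SQL convenient: the query can span several lines without any escaping.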



## Part 0 — Setup (run this cell)

This creates an in-memory SQLite database and defines two helpers:

- `exec_sql(...)` runs SQL that changes the database (DDL/DML).
- `q(...)` runs SQL queries (SELECT) and returns a dataframe.

There is also a helper called `run_or_todo(...)`. It allows the lab to ship with placeholders. If your query still contains the word `TODO`, the cell will not run and will instead remind you what to do.


In [13]:

import sqlite3
import pandas as pd
from IPython.display import display, Markdown

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

def exec_sql(sql: str):
    """Run one or more SQL statements that change the database (DDL/DML)."""
    cur.executescript(sql)
    conn.commit()

def q(sql: str) -> pd.DataFrame:
    """Run a SELECT query and return the result as a DataFrame."""
    return pd.read_sql_query(sql, conn)

def run_or_todo(sql: str, label: str = ""):
    """Run and display a query, unless it still contains the TODO placeholder."""
    if "TODO" in sql:
        print(f"{label} — TODO: write the SQL, then re-run this cell.")
        return None
    df = q(sql)
    display(df)
    return df

print("SQLite ready.")


SQLite ready.



## Part 1 — Create Tables (DDL)

Run the next cell. Read the schema. Ask yourself what each row means.

A healthy mental model here is: every table is a statement about the world. A good table is a set of facts where every row means the same kind of thing. If you cannot describe the “one thing” a row represents, the table design is probably wrong.


In [14]:

exec_sql('''
DROP TABLE IF EXISTS grades;
DROP TABLE IF EXISTS enrollments;
DROP TABLE IF EXISTS students;
DROP TABLE IF EXISTS classes;

CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    major      TEXT NOT NULL,
    class_year INTEGER NOT NULL
);

CREATE TABLE classes (
    class_id     INTEGER PRIMARY KEY,
    course_code  TEXT NOT NULL UNIQUE,
    title        TEXT NOT NULL,
    instructor   TEXT NOT NULL,
    credits      INTEGER NOT NULL
);

CREATE TABLE enrollments (
    enrollment_id INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL,
    class_id   INTEGER NOT NULL,
    term       TEXT NOT NULL,
    UNIQUE(student_id, class_id, term),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (class_id)   REFERENCES classes(class_id)
);

CREATE TABLE grades (
    enrollment_id INTEGER PRIMARY KEY,
    numeric_grade REAL NOT NULL,
    FOREIGN KEY (enrollment_id) REFERENCES enrollments(enrollment_id)
);
''')

print("Tables created.")


Tables created.
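
One detail worth knowing about the `FOREIGN KEY` clauses above: SQLite only enforces them on connections where `PRAGMA foreign_keys = ON` has been set, and the setup cell as written does not set it, so in this lab the clauses most likely document intent rather than being checked. The sketch below shows the difference on two hypothetical toy tables (`parent` and `child`, not part of the lab schema), using a scratch connection so the lab's `conn` is untouched.

```python
import sqlite3

def try_orphan_insert(enforce: bool) -> str:
    """Insert a child row whose parent does not exist and report what happened."""
    demo = sqlite3.connect(":memory:")  # scratch connection, separate from the lab's
    # The pragma is per-connection; set it before any transaction is open.
    demo.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'};")
    demo.executescript('''
        CREATE TABLE parent (parent_id INTEGER PRIMARY KEY);
        CREATE TABLE child (
            child_id  INTEGER PRIMARY KEY,
            parent_id INTEGER NOT NULL,
            FOREIGN KEY (parent_id) REFERENCES parent(parent_id)
        );
    ''')
    try:
        # parent_id 999 does not exist in parent.
        demo.execute("INSERT INTO child (child_id, parent_id) VALUES (1, 999);")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"
    finally:
        demo.close()

result_off = try_orphan_insert(enforce=False)
result_on = try_orphan_insert(enforce=True)
print(result_off, result_on)  # accepted rejected
```

The provided data is consistent with the declared keys, so nothing in the lab depends on enforcement being on.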


## Entity Relationship Diagram (ERD)

```mermaid
erDiagram
    STUDENTS {
        int student_id PK
        string first_name
        string last_name
        string major
        int class_year
    }

    CLASSES {
        int class_id PK
        string course_code
        string title
        string instructor
        int credits
    }

    ENROLLMENTS {
        int enrollment_id PK
        int student_id FK
        int class_id FK
        string term
    }

    GRADES {
        int enrollment_id PK
        float numeric_grade
    }

    STUDENTS ||--o{ ENROLLMENTS : enrolls_in
    CLASSES  ||--o{ ENROLLMENTS : contains
    ENROLLMENTS ||--|| GRADES : receives
```



## Part 2 — Load Data (Provided)

This lab loads:
- 50 students
- 10 classes
- enrollments for most students (a few have zero enrollments on purpose)
- one numeric grade per enrollment (0–100 scale)

Run the next cell.


In [None]:

exec_sql('''

INSERT INTO students (student_id, first_name, last_name, major, class_year) VALUES
(1, 'Casey', 'Brown', 'Econ', 2027),
(2, 'Riley', 'Robinson', 'Business', 2027),
(3, 'Avery', 'Brown', 'Business', 2027),
(4, 'Riley', 'Wright', 'Biology', 2027),
(5, 'Alex', 'Lewis', 'Biology', 2026),
(6, 'Casey', 'Lewis', 'History', 2027),
(7, 'Morgan', 'Young', 'Psych', 2027),
(8, 'Logan', 'Brown', 'Biology', 2026),
(9, 'Parker', 'Thomas', 'DS', 2028),
(10, 'Casey', 'King', 'Business', 2027),
(11, 'Avery', 'Harris', 'DS', 2026),
(12, 'Rowan', 'Walker', 'History', 2027),
(13, 'Reese', 'Thomas', 'Business', 2027),
(14, 'Quinn', 'Wilson', 'Psych', 2027),
(15, 'Emerson', 'Lewis', 'CS', 2027),
(16, 'Avery', 'Smith', 'History', 2028),
(17, 'Reese', 'Lee', 'Math', 2027),
(18, 'Cameron', 'Young', 'Biology', 2029),
(19, 'Finley', 'Lewis', 'Econ', 2029),
(20, 'Quinn', 'Robinson', 'Psych', 2029),
(21, 'Quinn', 'Anderson', 'CS', 2026),
(22, 'Rowan', 'Lewis', 'Psych', 2029),
(23, 'Riley', 'Young', 'CS', 2029),
(24, 'Jamie', 'Harris', 'DS', 2027),
(25, 'Reese', 'Martinez', 'DS', 2027),
(26, 'Casey', 'Jackson', 'Math', 2026),
(27, 'Riley', 'Martinez', 'Econ', 2026),
(28, 'Logan', 'Young', 'History', 2029),
(29, 'Reese', 'Clark', 'Business', 2026),
(30, 'Skyler', 'Anderson', 'History', 2028),
(31, 'Drew', 'Davis', 'History', 2026),
(32, 'Reese', 'Jackson', 'Business', 2029),
(33, 'Riley', 'Davis', 'Business', 2027),
(34, 'Taylor', 'Brown', 'Psych', 2029),
(35, 'Avery', 'Johnson', 'Econ', 2029),
(36, 'Alex', 'Robinson', 'CS', 2027),
(37, 'Jordan', 'Lee', 'Business', 2026),
(38, 'Taylor', 'Thomas', 'Math', 2028),
(39, 'Skyler', 'Clark', 'History', 2027),
(40, 'Taylor', 'Harris', 'Biology', 2026),
(41, 'Skyler', 'Young', 'DS', 2026),
(42, 'Jamie', 'Davis', 'DS', 2027),
(43, 'Finley', 'Wilson', 'Psych', 2029),
(44, 'Reese', 'Walker', 'Econ', 2027),
(45, 'Jamie', 'Clark', 'Psych', 2027),
(46, 'Reese', 'Smith', 'Psych', 2029),
(47, 'Drew', 'Anderson', 'History', 2028),
(48, 'Finley', 'Harris', 'Econ', 2029),
(49, 'Skyler', 'Smith', 'History', 2027),
(50, 'Skyler', 'King', 'Business', 2027);

INSERT INTO classes (class_id, course_code, title, instructor, credits) VALUES
(1, 'DS2002', 'Data Science Systems', 'Williamson', 3),
(2, 'CS2100', 'Intro to Programming', 'Nguyen', 4),
(3, 'STAT2120', 'Statistics I', 'Patel', 3),
(4, 'ECON2010', 'Microeconomics', 'Garcia', 3),
(5, 'MATH2310', 'Discrete Mathematics', 'Chen', 3),
(6, 'BIO1100', 'Foundations of Biology', 'Kim', 4),
(7, 'PSYC1010', 'Intro Psychology', 'Rivera', 3),
(8, 'HIST1020', 'World History', 'Bennett', 3),
(9, 'BUSI1800', 'Business Foundations', 'Singh', 3),
(10, 'DS3100', 'Data Wrangling', 'Olsen', 3);

INSERT INTO enrollments (enrollment_id, student_id, class_id, term) VALUES
(1, 1, 1, 'SP26'),
(2, 1, 7, 'SP26'),
(3, 1, 10, 'SP26'),
(4, 1, 8, 'SP26'),
(5, 2, 10, 'SP26'),
(6, 2, 5, 'SP26'),
(7, 2, 7, 'SP26'),
(8, 2, 3, 'SP26'),
(9, 2, 1, 'SP26'),
(10, 3, 4, 'SP26'),
(11, 3, 8, 'SP26'),
(12, 3, 3, 'SP26'),
(13, 3, 10, 'SP26'),
(14, 3, 9, 'SP26'),
(15, 4, 9, 'SP26'),
(16, 4, 5, 'SP26'),
(17, 4, 4, 'SP26'),
(18, 4, 1, 'SP26'),
(19, 5, 5, 'SP26'),
(20, 5, 10, 'SP26'),
(21, 5, 3, 'SP26'),
(22, 6, 3, 'SP26'),
(23, 6, 4, 'SP26'),
(24, 6, 8, 'SP26'),
(25, 6, 7, 'SP26'),
(26, 6, 9, 'SP26'),
(27, 7, 4, 'SP26'),
(28, 7, 2, 'SP26'),
(29, 7, 5, 'SP26'),
(30, 7, 3, 'SP26'),
(31, 7, 10, 'SP26'),
(32, 8, 4, 'SP26'),
(33, 8, 5, 'SP26'),
(34, 8, 8, 'SP26'),
(35, 9, 1, 'SP26'),
(36, 9, 3, 'SP26'),
(37, 9, 9, 'SP26'),
(38, 10, 9, 'SP26'),
(39, 10, 4, 'SP26'),
(40, 10, 2, 'SP26'),
(41, 12, 6, 'SP26'),
(42, 12, 7, 'SP26'),
(43, 12, 1, 'SP26'),
(44, 13, 5, 'SP26'),
(45, 13, 8, 'SP26'),
(46, 13, 7, 'SP26'),
(47, 13, 3, 'SP26'),
(48, 13, 6, 'SP26'),
(49, 14, 2, 'SP26'),
(50, 14, 6, 'SP26'),
(51, 14, 5, 'SP26'),
(52, 14, 1, 'SP26'),
(53, 14, 7, 'SP26'),
(54, 15, 8, 'SP26'),
(55, 15, 5, 'SP26'),
(56, 15, 4, 'SP26'),
(57, 16, 6, 'SP26'),
(58, 16, 7, 'SP26'),
(59, 16, 5, 'SP26'),
(60, 17, 1, 'SP26'),
(61, 17, 9, 'SP26'),
(62, 17, 10, 'SP26'),
(63, 17, 7, 'SP26'),
(64, 17, 8, 'SP26'),
(65, 18, 9, 'SP26'),
(66, 18, 3, 'SP26'),
(67, 18, 7, 'SP26'),
(68, 18, 5, 'SP26'),
(69, 18, 6, 'SP26'),
(70, 19, 9, 'SP26'),
(71, 19, 4, 'SP26'),
(72, 19, 2, 'SP26'),
(73, 19, 1, 'SP26'),
(74, 19, 10, 'SP26'),
(75, 20, 1, 'SP26'),
(76, 20, 2, 'SP26'),
(77, 20, 9, 'SP26'),
(78, 20, 7, 'SP26'),
(79, 21, 9, 'SP26'),
(80, 21, 4, 'SP26'),
(81, 21, 7, 'SP26'),
(82, 23, 6, 'SP26'),
(83, 23, 3, 'SP26'),
(84, 23, 5, 'SP26'),
(85, 25, 5, 'SP26'),
(86, 25, 7, 'SP26'),
(87, 25, 3, 'SP26'),
(88, 26, 5, 'SP26'),
(89, 26, 10, 'SP26'),
(90, 26, 2, 'SP26'),
(91, 27, 9, 'SP26'),
(92, 27, 3, 'SP26'),
(93, 27, 1, 'SP26'),
(94, 28, 10, 'SP26'),
(95, 28, 9, 'SP26'),
(96, 28, 4, 'SP26'),
(97, 29, 8, 'SP26'),
(98, 29, 6, 'SP26'),
(99, 29, 9, 'SP26'),
(100, 30, 1, 'SP26'),
(101, 30, 2, 'SP26'),
(102, 30, 6, 'SP26'),
(103, 31, 5, 'SP26'),
(104, 31, 2, 'SP26'),
(105, 31, 9, 'SP26'),
(106, 31, 6, 'SP26'),
(107, 32, 10, 'SP26'),
(108, 32, 7, 'SP26'),
(109, 32, 3, 'SP26'),
(110, 33, 4, 'SP26'),
(111, 33, 9, 'SP26'),
(112, 33, 3, 'SP26'),
(113, 33, 1, 'SP26'),
(114, 34, 5, 'SP26'),
(115, 34, 4, 'SP26'),
(116, 34, 9, 'SP26'),
(117, 34, 1, 'SP26'),
(118, 34, 3, 'SP26'),
(119, 36, 3, 'SP26'),
(120, 36, 10, 'SP26'),
(121, 36, 1, 'SP26'),
(122, 36, 5, 'SP26'),
(123, 37, 4, 'SP26'),
(124, 37, 10, 'SP26'),
(125, 37, 3, 'SP26'),
(126, 38, 1, 'SP26'),
(127, 38, 3, 'SP26'),
(128, 38, 5, 'SP26'),
(129, 38, 9, 'SP26'),
(130, 40, 7, 'SP26'),
(131, 40, 9, 'SP26'),
(132, 40, 1, 'SP26'),
(133, 40, 3, 'SP26'),
(134, 41, 3, 'SP26'),
(135, 41, 2, 'SP26'),
(136, 41, 6, 'SP26'),
(137, 41, 7, 'SP26'),
(138, 41, 10, 'SP26'),
(139, 42, 1, 'SP26'),
(140, 42, 6, 'SP26'),
(141, 42, 9, 'SP26'),
(142, 43, 2, 'SP26'),
(143, 43, 3, 'SP26'),
(144, 43, 8, 'SP26'),
(145, 43, 1, 'SP26'),
(146, 44, 8, 'SP26'),
(147, 44, 10, 'SP26'),
(148, 44, 9, 'SP26'),
(149, 45, 10, 'SP26'),
(150, 45, 3, 'SP26'),
(151, 45, 2, 'SP26'),
(152, 46, 10, 'SP26'),
(153, 46, 7, 'SP26'),
(154, 46, 9, 'SP26'),
(155, 46, 8, 'SP26'),
(156, 47, 3, 'SP26'),
(157, 47, 10, 'SP26'),
(158, 47, 6, 'SP26'),
(159, 47, 2, 'SP26'),
(160, 47, 5, 'SP26'),
(161, 48, 4, 'SP26'),
(162, 48, 5, 'SP26'),
(163, 48, 8, 'SP26'),
(164, 49, 7, 'SP26'),
(165, 49, 6, 'SP26'),
(166, 49, 5, 'SP26'),
(167, 50, 9, 'SP26'),
(168, 50, 6, 'SP26'),
(169, 50, 3, 'SP26'),
(170, 50, 10, 'SP26');

INSERT INTO grades (enrollment_id, numeric_grade) VALUES
(1, 69.9),
(2, 79.6),
(3, 63.5),
(4, 82.9),
(5, 73.3),
(6, 89.4),
(7, 92.1),
(8, 77.8),
(9, 81.3),
(10, 73.1),
(11, 79.1),
(12, 79.2),
(13, 100),
(14, 58.0),
(15, 90.5),
(16, 88.6),
(17, 86.5),
(18, 95.1),
(19, 86.2),
(20, 100),
(21, 74.1),
(22, 71.0),
(23, 85.1),
(24, 61.2),
(25, 75.6),
(26, 94.9),
(27, 86.8),
(28, 76.1),
(29, 89.8),
(30, 75.9),
(31, 84.9),
(32, 67.6),
(33, 69.1),
(34, 68.8),
(35, 90.1),
(36, 78.4),
(37, 83.4),
(38, 74.5),
(39, 89.3),
(40, 99.8),
(41, 83.4),
(42, 78.8),
(43, 80.7),
(44, 97.4),
(45, 74.1),
(46, 92.0),
(47, 84.0),
(48, 77.2),
(49, 79.6),
(50, 79.3),
(51, 82.3),
(52, 76.4),
(53, 76.8),
(54, 85.1),
(55, 94.8),
(56, 83.2),
(57, 80.7),
(58, 75.1),
(59, 78.5),
(60, 90.1),
(61, 75.6),
(62, 67.9),
(63, 60.1),
(64, 74.7),
(65, 85.2),
(66, 79.3),
(67, 95.9),
(68, 79.1),
(69, 72.9),
(70, 85.6),
(71, 85.5),
(72, 73.5),
(73, 78.9),
(74, 77.8),
(75, 86.4),
(76, 83.8),
(77, 90.6),
(78, 94.3),
(79, 71.5),
(80, 82.2),
(81, 94.8),
(82, 78.1),
(83, 96.2),
(84, 76.1),
(85, 88.4),
(86, 80.0),
(87, 91.1),
(88, 90.3),
(89, 98.5),
(90, 74.2),
(91, 64.5),
(92, 85.1),
(93, 76.7),
(94, 74.9),
(95, 92.3),
(96, 81.9),
(97, 90.0),
(98, 89.9),
(99, 78.0),
(100, 80.4),
(101, 92.9),
(102, 76.5),
(103, 84.9),
(104, 78.4),
(105, 72.2),
(106, 83.3),
(107, 80.9),
(108, 85.7),
(109, 89.7),
(110, 66.5),
(111, 82.6),
(112, 62.7),
(113, 73.1),
(114, 87.2),
(115, 95.5),
(116, 59.8),
(117, 95.9),
(118, 56.0),
(119, 100),
(120, 78.3),
(121, 77.0),
(122, 99.3),
(123, 98.4),
(124, 67.8),
(125, 100),
(126, 75.7),
(127, 80.1),
(128, 88.2),
(129, 81.0),
(130, 83.0),
(131, 76.8),
(132, 94.8),
(133, 84.7),
(134, 79.5),
(135, 94.1),
(136, 86.6),
(137, 77.7),
(138, 82.2),
(139, 87.7),
(140, 92.0),
(141, 82.7),
(142, 84.6),
(143, 72.9),
(144, 71.9),
(145, 95.2),
(146, 77.3),
(147, 73.9),
(148, 76.6),
(149, 88.8),
(150, 62.1),
(151, 93.6),
(152, 78.3),
(153, 88.8),
(154, 68.2),
(155, 90.8),
(156, 74.8),
(157, 79.0),
(158, 87.2),
(159, 89.1),
(160, 75.1),
(161, 88.6),
(162, 79.4),
(163, 93.7),
(164, 77.3),
(165, 90.2),
(166, 90.6),
(167, 88.5),
(168, 68.3),
(169, 81.7),
(170, 77.4);

''')

print("Data inserted.")
print("Students:", q("SELECT COUNT(*) AS n FROM students;").iloc[0,0])
print("Classes:", q("SELECT COUNT(*) AS n FROM classes;").iloc[0,0])
print("Enrollments:", q("SELECT COUNT(*) AS n FROM enrollments;").iloc[0,0])
print("Grades:", q("SELECT COUNT(*) AS n FROM grades;").iloc[0,0])



## Part 3 — Warm-Up (examples are done for you)

Before you answer questions, run a couple of quick queries to see what you’re working with. This is what professionals do. You should never write a pile of queries against a dataset you haven’t looked at.


In [None]:

display(Markdown("### Preview: students (first 10 rows)"))
q('''
SELECT student_id, first_name, last_name, major, class_year
FROM students
ORDER BY student_id
LIMIT 10;
''')


In [None]:

display(Markdown("### Preview: classes (all rows)"))
q('''
SELECT class_id, course_code, title, instructor, credits
FROM classes
ORDER BY class_id;
''')
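
Looking at the data can also include looking at the schema itself. As a sketch, SQLite exposes its table list through `sqlite_master` and per-table column details through `PRAGMA table_info(...)`. The example uses a hypothetical `pets` table on a scratch connection; the same two statements should also work against the lab tables when passed to `q(...)`, though that is an assumption about how pandas handles `PRAGMA` results.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # scratch connection, separate from the lab's
demo.execute(
    "CREATE TABLE pets (pet_id INTEGER PRIMARY KEY, name TEXT NOT NULL, species TEXT);"
)

# sqlite_master lists every object in the database; keep only tables.
table_names = [
    row[0]
    for row in demo.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    )
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (row[1], row[2], row[3], row[5])  # (name, type, notnull flag, pk flag)
    for row in demo.execute("PRAGMA table_info(pets);")
]
demo.close()

print(table_names)
for column in columns:
    print(column)
```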



## Part 4 — Studio Questions (you write the SQL)

For each question:
1. Read the prompt.
2. Write the SQL inside the triple quotes.
3. Run the cell to verify the output.

A good habit is to say out loud what each row in your result represents. If you can’t do that, your query is probably not structured correctly.
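
To make "what each row represents" concrete, here is a small sketch on hypothetical `owners` and `pets` tables (deliberately not the lab data). The same join produces one row per pet, and adding `GROUP BY` changes the result to one row per owner.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # scratch connection, separate from the lab's
demo.executescript('''
    CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE pets (
        pet_id   INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES owners(owner_id),
        name     TEXT NOT NULL
    );
    INSERT INTO owners VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO pets VALUES (1, 1, 'Rex'), (2, 1, 'Tom'), (3, 2, 'Zed');
''')

# One row per pet: the join repeats the owner once for each of their pets.
per_pet = demo.execute('''
    SELECT o.name, p.name
    FROM owners AS o
    JOIN pets AS p ON p.owner_id = o.owner_id
    ORDER BY p.pet_id;
''').fetchall()

# One row per owner: GROUP BY collapses each owner's pets into a count.
per_owner = demo.execute('''
    SELECT o.name, COUNT(*) AS pet_count
    FROM owners AS o
    JOIN pets AS p ON p.owner_id = o.owner_id
    GROUP BY o.owner_id, o.name
    ORDER BY o.owner_id;
''').fetchall()
demo.close()

print(per_pet)    # [('Ana', 'Rex'), ('Ana', 'Tom'), ('Ben', 'Zed')]
print(per_owner)  # [('Ana', 2), ('Ben', 1)]
```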


### Q1. List all students (first name, last name, major, class year) ordered by last name then first name.

In [None]:
display(Markdown("#### Q1"))
run_or_todo('''
SELECT TODO
''', label="Q1")


### Q2. How many students are in each major? Show major and the count. Order by count descending.

In [None]:
display(Markdown("#### Q2"))
run_or_todo('''
SELECT TODO
''', label="Q2")


### Q3. List the 10 classes (course_code, title, instructor). Order by course_code.

In [None]:
display(Markdown("#### Q3"))
run_or_todo('''
SELECT TODO
''', label="Q3")


### Q4. Show every enrollment with student full name, course_code, and term. Limit to 20 rows so it’s readable.

In [None]:
display(Markdown("#### Q4"))
run_or_todo('''
SELECT TODO
''', label="Q4")


### Q5. For each class, how many students are enrolled? Show course_code and enrolled_count. Order by enrolled_count descending.

In [None]:
display(Markdown("#### Q5"))
run_or_todo('''
SELECT TODO
''', label="Q5")


### Q6. Which students have NO enrollments? Show student_id and full name. This requires a LEFT JOIN.
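
As a hint for the pattern (not the answer), the sketch below uses hypothetical `owners` and `pets` tables. A `LEFT JOIN` keeps every row from the left table and fills the right side with `NULL` where nothing matches; filtering on that `NULL` leaves only the unmatched rows.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # scratch connection, separate from the lab's
demo.executescript('''
    CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE pets (
        pet_id   INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES owners(owner_id),
        name     TEXT NOT NULL
    );
    INSERT INTO owners VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO pets VALUES (1, 1, 'Rex'), (2, 1, 'Tom'), (3, 2, 'Zed');
''')

# LEFT JOIN keeps Cy even though Cy has no pets; the pet column is NULL (None).
all_owners = demo.execute('''
    SELECT o.name, p.name
    FROM owners AS o
    LEFT JOIN pets AS p ON p.owner_id = o.owner_id
    ORDER BY o.owner_id, p.pet_id;
''').fetchall()

# Keeping only the rows where the right side is NULL finds owners with no pets.
owners_without_pets = demo.execute('''
    SELECT o.name
    FROM owners AS o
    LEFT JOIN pets AS p ON p.owner_id = o.owner_id
    WHERE p.pet_id IS NULL;
''').fetchall()
demo.close()

print(all_owners)
# [('Ana', 'Rex'), ('Ana', 'Tom'), ('Ben', 'Zed'), ('Cy', None)]
print(owners_without_pets)  # [('Cy',)]
```

An inner `JOIN` would drop Cy entirely, which is why it cannot answer a "who has none" question.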

In [None]:
display(Markdown("#### Q6"))
run_or_todo('''
SELECT TODO
''', label="Q6")


### Q7. Show each student's average grade across all their enrollments. Include student name, major, and avg_grade. Order by avg_grade descending.

In [None]:
display(Markdown("#### Q7"))
run_or_todo('''
SELECT TODO
''', label="Q7")


### Q8. For each class (course_code), compute the class average grade. Order by average grade descending.

In [None]:
display(Markdown("#### Q8"))
run_or_todo('''
SELECT TODO
''', label="Q8")


### Q9. Which classes have an average grade below 75? Show course_code and avg_grade. This requires HAVING.
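
The reason `HAVING` is needed here is that `WHERE` filters individual rows before grouping, while `HAVING` filters whole groups after the aggregate is computed. A minimal sketch on a hypothetical `sales` table (not the lab data) shows that the two give different answers:

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # scratch connection, separate from the lab's
demo.executescript('''
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        region  TEXT NOT NULL,
        amount  REAL NOT NULL
    );
    INSERT INTO sales (region, amount) VALUES
        ('east', 10.0), ('east', 60.0),
        ('west', 30.0), ('west', 90.0);
''')

# WHERE drops rows first, so each average is taken over the surviving rows only.
where_result = demo.execute('''
    SELECT region, AVG(amount) AS avg_amount
    FROM sales
    WHERE amount < 40
    GROUP BY region
    ORDER BY region;
''').fetchall()

# HAVING averages all rows per region, then keeps regions whose average is low.
having_result = demo.execute('''
    SELECT region, AVG(amount) AS avg_amount
    FROM sales
    GROUP BY region
    HAVING AVG(amount) < 40
    ORDER BY region;
''').fetchall()
demo.close()

print(where_result)   # [('east', 10.0), ('west', 30.0)]
print(having_result)  # [('east', 35.0)]
```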

In [None]:
display(Markdown("#### Q9"))
run_or_todo('''
SELECT TODO
''', label="Q9")


### Q10. Find the top 10 highest-scoring enrollment records. Show student name, course_code, numeric_grade.

In [None]:
display(Markdown("#### Q10"))
run_or_todo('''
SELECT TODO
''', label="Q10")


### Q11. For each instructor, how many enrollments are they responsible for (across all their classes)? Show instructor and total_enrollments.

In [None]:
display(Markdown("#### Q11"))
run_or_todo('''
SELECT TODO
''', label="Q11")


### Q12. Show grade distribution counts using CASE: count how many grades are A (>= 90), B (>= 80 and < 90), C (>= 70 and < 80), D (>= 60 and < 70), F (< 60).
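
As a hint for the bucketing pattern (not the answer), this sketch labels rows of a hypothetical `readings` table with `CASE` and then counts each label. The `WHEN` branches are tested in order and the first match wins, so each branch only needs a lower bound.

```python
import sqlite3

demo = sqlite3.connect(":memory:")  # scratch connection, separate from the lab's
demo.executescript('''
    CREATE TABLE readings (
        reading_id INTEGER PRIMARY KEY,
        temp_c     REAL NOT NULL
    );
    INSERT INTO readings (temp_c) VALUES
        (-5.0), (3.5), (9.9), (10.0), (18.0), (25.0);
''')

# CASE assigns a label per row; GROUP BY on that label counts rows per bucket.
bucket_counts = demo.execute('''
    SELECT
        CASE
            WHEN temp_c >= 25 THEN 'hot'
            WHEN temp_c >= 10 THEN 'mild'
            ELSE 'cold'
        END AS band,
        COUNT(*) AS n
    FROM readings
    GROUP BY band
    ORDER BY MIN(temp_c) DESC;
''').fetchall()
demo.close()

print(bucket_counts)  # [('hot', 1), ('mild', 2), ('cold', 3)]
```

Note how the boundary values land: 9.9 is cold, 10.0 is mild, and 25.0 is hot.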

In [None]:
display(Markdown("#### Q12"))
run_or_todo('''
SELECT TODO
''', label="Q12")


### Q13. For each major, what is the average grade across all enrollments by students in that major? Order by avg_grade descending.

In [None]:
display(Markdown("#### Q13"))
run_or_todo('''
SELECT TODO
''', label="Q13")


### Q14. Which students are taking 5 classes in SP26? Show student name and num_classes. Use GROUP BY and HAVING.

In [None]:
display(Markdown("#### Q14"))
run_or_todo('''
SELECT TODO
''', label="Q14")


### Q15. For the class with course_code DS2002, list all enrolled students along with their numeric grade, sorted from highest to lowest grade.

In [None]:
display(Markdown("#### Q15"))
run_or_todo('''
SELECT TODO
''', label="Q15")



## Part 5 — Reflection (short)

In a sentence or two, answer: what part of SQL feels most natural so far, and what feels the most confusing? Put your answer in the next cell as a Python string.


In [None]:
reflection = """TODO: Write your reflection here."""
print(reflection)


## You are done

Make sure you ran every cell and replaced every TODO. Then submit your Kaggle Notebook URL in Canvas.
