# Formative Worksheet 02: SQL (MySQL) - Normalising `major`

> Goal: repeat the full workflow (**CREATE TABLE**, **INSERT**, **SELECT**, **UPDATE**, **DELETE**) but now with **normalisation**:
- Create a secondary table `majors`
- Replace `students.major` with `students.major_id`

⚠️ Start from **scratch** (assume the tables from the previous worksheet do not exist).


## 📓 Environment setup (Jupyter Notebook)
If you're running this in a fresh environment, install the required packages.


In [None]:
%pip install --upgrade --no-cache-dir ipykernel jupyterlab jupysql pymysql cryptography


## Connect JupySQL to your MySQL server
1) Load the `sql` extension.
2) Connect to your server (replace user, password, host, port, and database).


In [2]:
%load_ext sql


In [3]:
%sql mysql+pymysql://mysql_user:mysql_password@localhost:3306/mydatabase

%config SqlMagic.displaylimit = 0


---
## Exercise 1 - Create the tables (`majors` and `students`)

Create two tables:

### Table A: `majors`
- `id` (integer, primary key, auto-increment)
- `name` (text, not null, **unique**)

### Table B: `students`
- `id` (integer, primary key, auto-increment)
- `name` (text, not null)
- `gpa` (decimal/numeric, 2 decimal places, not null)
- `birthdate` (date, not null)
- `major_id` (integer, not null)

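💡 Tip: drop both tables first so that you start from scratch. Drop `students` before `majors`, because `students` may hold a foreign key that references `majors`:
`DROP TABLE IF EXISTS students;` then `DROP TABLE IF EXISTS majors;`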

💡 Tip (CREATE syntax example with generic placeholders):
```sql
CREATE TABLE <table_name> (
  <id_column> INT AUTO_INCREMENT PRIMARY KEY,
  <text_column> VARCHAR(<max_length>) NOT NULL,
  <decimal_column> DECIMAL(<precision>,<scale>) NOT NULL,
  <date_column> DATE NOT NULL,
  <datetime_column> DATETIME,
  <foreign_key_column> INT NOT NULL,
  FOREIGN KEY (<foreign_key_column>) REFERENCES <other_table>(<other_table_id_column>)
);
```

Optional extension (only if your class is ready): add a `FOREIGN KEY (major_id)` referencing `majors(id)`.
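
To see what the optional foreign key does in practice, here is a minimal sketch. It assumes both tables have already been created with the constraint and that `majors` has no row with id `999` (an arbitrary value picked for illustration); `Test Student` is a made-up record.

```sql
-- Assumption: no major with id 999 exists, so this row points at nothing.
INSERT INTO students (name, gpa, birthdate, major_id)
VALUES ('Test Student', 12.00, '2007-01-01', 999);
-- With the foreign key in place, MySQL should reject the row with
-- error 1452 (a foreign key constraint fails) and insert nothing.
```

Without the constraint, the same statement would succeed and leave a student pointing at a major that does not exist.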


In [44]:
%%sql

-- EXERCISE 1:
-- 1) DROP TABLE IF EXISTS ... (students first)
DROP TABLE IF EXISTS students;
DROP TABLE IF EXISTS majors;

-- 2) CREATE TABLE majors
CREATE TABLE majors (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL UNIQUE -- UNIQUE prevents duplicate major names
);

-- 3) CREATE TABLE students
CREATE TABLE students (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    
    gpa DECIMAL(4, 2) NOT NULL,
    CHECK (gpa >= 0.00 AND gpa <= 20.00),

    birthdate DATE NOT NULL,
    
    -- Add a major_id column to students as an INT (NOT NULL, as the specification requires)
    major_id INT NOT NULL,
    
    -- Add a FOREIGN KEY constraint to major_id referencing majors(id)
    FOREIGN KEY (major_id) REFERENCES majors(id)
);


## Exercise 2 - Confirm the structure (SELECT)
Check that both tables exist and that the column types are correct.

💡 Tip: In MySQL you can use `DESCRIBE <table_name>;` or `SHOW COLUMNS FROM <table_name>;`.


In [45]:
%%sql

-- EXERCISE 2:
-- Check majors structure
DESCRIBE majors;


Field,Type,Null,Key,Default,Extra
id,int,NO,PRI,,auto_increment
name,varchar(255),NO,UNI,,


In [46]:
%%sql

-- EXERCISE 2 (continued):
-- Check students structure
DESCRIBE students;


Field,Type,Null,Key,Default,Extra
id,int,NO,PRI,,auto_increment
name,varchar(255),NO,,,
gpa,"decimal(4,2)",NO,,,
birthdate,date,NO,,,
major_id,int,NO,MUL,,


---
## Exercise 3 - Insert data into `majors`
Insert the following majors into the `majors` table:
- Computer Science
- Economics
- Biology
- Engineering
- Mathematics
- Physics
- Chemistry

💡 Tip (INSERT syntax):
```sql
INSERT INTO <table_name> (<column_name>) VALUES
  ('<value1>'),
  ('<value2>');
```


In [47]:
%%sql

-- EXERCISE 3:
-- Insert the 7 majors here
INSERT INTO majors (name) VALUES 
    ('Computer Science'),
    ('Economics'),
    ('Biology'),
    ('Engineering'),
    ('Mathematics'),
    ('Physics'),
    ('Chemistry')
;


## Exercise 4 - Verify `majors` (SELECT)
Show all rows in `majors`.


In [48]:
%%sql

-- EXERCISE 4:
SELECT * FROM majors;


id,name
3,Biology
7,Chemistry
1,Computer Science
2,Economics
4,Engineering
5,Mathematics
6,Physics
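
The rows above come back sorted by name rather than by `id`. Without an `ORDER BY`, MySQL makes no promise about row order, so a sketch like the following is one way to get the majors in id order:

```sql
-- Explicit ordering; without ORDER BY the row order is not guaranteed.
SELECT id, name
FROM majors
ORDER BY id;
```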


---
## Exercise 5 - Insert data into `students` (10 records)
Insert 10 students using the dataset below. Since `students` now uses `major_id`, you should **look up** each id in `majors`.

Dataset (major names are provided):

| id | name | major | gpa | birthdate |
|---:|---|---|---:|---|
| 1 | Ana Silva | Computer Science | 17.50 | 2007-03-14 |
| 2 | Bruno Costa | Economics | 14.20 | 2006-11-02 |
| 3 | Carla Mendes | Biology | 16.10 | 2007-07-29 |
| 4 | Daniel Rocha | Engineering | 13.80 | 2006-01-18 |
| 5 | Eva Santos | Mathematics | 18.30 | 2007-09-05 |
| 6 | Filipe Almeida | Mathematics | 12.60 | 2006-05-21 |
| 7 | Guilherme Ferreira | Mathematics | 15.70 | 2007-12-10 |
| 8 | Helena Sousa | Physics | 16.90 | 2006-08-03 |
| 9 | Inês Pereira | Biology | 13.10 | 2007-02-27 |
| 10 | João Martins | Chemistry | 14.90 | 2006-04-16 |

💡 Tip: you can insert using a subquery to fetch the `major_id`:
```sql
INSERT INTO students (name, major_id, gpa, birthdate)
VALUES (
  '<student_name>',
  (SELECT id FROM majors WHERE name = '<major_name>'),
  <gpa_value>,
  '<YYYY-MM-DD>'
);
```
- Make sure dates use the `YYYY-MM-DD` format (ISO 8601; see the MySQL reference manual on date literals).


In [43]:
%%sql

-- EXERCISE 5:
-- Insert the 10 students (major_id must come from majors).
-- The hard-coded ids below assume majors received ids 1-7 in the insertion order of Exercise 3.
INSERT INTO students (name, gpa, birthdate, major_id) VALUES
    ('Ana Silva',          17.50, '2007-03-14', 1),
    ('Bruno Costa',        14.20, '2006-11-02', 2),
    ('Carla Mendes',       16.10, '2007-07-29', 3),
    ('Daniel Rocha',       13.80, '2006-01-18', 4),
    ('Eva Santos',         18.30, '2007-09-05', 5),
    ('Filipe Almeida',     12.60, '2006-05-21', 5),
    ('Guilherme Ferreira', 15.70, '2007-12-10', 5),
    ('Helena Sousa',       16.90, '2006-08-03', 6),
    ('Inês Pereira',       13.10, '2007-02-27', 3),
    ('João Martins',       14.90, '2006-04-16', 7)
;

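Alternatively, look up each `major_id` with a subquery. Run only one of the two insert cells; running both inserts every student twice.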

In [49]:
%%sql

INSERT INTO students (name, major_id, gpa, birthdate) VALUES
    ('Ana Silva',          (SELECT id FROM majors WHERE name = 'Computer Science'), 17.50, '2007-03-14'),
    ('Bruno Costa',        (SELECT id FROM majors WHERE name = 'Economics'), 14.20, '2006-11-02'),
    ('Carla Mendes',       (SELECT id FROM majors WHERE name = 'Biology'), 16.10, '2007-07-29'),
    ('Daniel Rocha',       (SELECT id FROM majors WHERE name = 'Engineering'), 13.80, '2006-01-18'),
    ('Eva Santos',         (SELECT id FROM majors WHERE name = 'Mathematics'), 18.30, '2007-09-05'),
    ('Filipe Almeida',     (SELECT id FROM majors WHERE name = 'Mathematics'), 12.60, '2006-05-21'),
    ('Guilherme Ferreira', (SELECT id FROM majors WHERE name = 'Mathematics'), 15.70, '2007-12-10'),
    ('Helena Sousa',       (SELECT id FROM majors WHERE name = 'Physics'), 16.90, '2006-08-03'),
    ('Inês Pereira',       (SELECT id FROM majors WHERE name = 'Biology'), 13.10, '2007-02-27'),
    ('João Martins',       (SELECT id FROM majors WHERE name = 'Chemistry'), 14.90, '2006-04-16')
;
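
Because both cells above insert the same ten students, a quick sanity check after this exercise can catch an accidental double insert. This is a sketch, assuming exactly one of the two cells was run; the aliases `num_students` and `missing_major` are names chosen here for illustration.

```sql
-- COUNT(*) counts all rows; SUM(major_id IS NULL) counts rows whose
-- lookup produced no major (the comparison evaluates to 1 or 0 in MySQL).
SELECT
    COUNT(*) AS num_students,
    SUM(major_id IS NULL) AS missing_major
FROM students;
```

With the dataset above, `num_students` should be 10 and `missing_major` should be 0; a count of 20 suggests that both insert cells ran.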


## Exercise 6 - Verify students with a JOIN (SELECT)
Write a query that shows students with their major name (not the id). Include:
- student id, student name, major name, gpa, birthdate

💡 Tip: use `JOIN` between `students` and `majors`.

```sql
SELECT <columns>
FROM <table_A> AS a
JOIN <table_B> AS b
  ON a.<foreign_key_column> = b.<primary_key_column>;
```


In [53]:
%%sql

-- EXERCISE 6:
SELECT
    s.id,
    s.name,
    m.name AS major,
    s.gpa,
    s.birthdate
FROM students AS s

JOIN majors AS m
    -- Join students and majors on the major_id foreign key
    ON s.major_id = m.id
    
ORDER BY s.gpa DESC
;


id,name,major,gpa,birthdate
5,Eva Santos,Mathematics,18.3,2007-09-05
1,Ana Silva,Computer Science,17.5,2007-03-14
8,Helena Sousa,Physics,16.9,2006-08-03
3,Carla Mendes,Biology,16.1,2007-07-29
7,Guilherme Ferreira,Mathematics,15.7,2007-12-10
10,João Martins,Chemistry,14.9,2006-04-16
2,Bruno Costa,Economics,14.2,2006-11-02
4,Daniel Rocha,Engineering,13.8,2006-01-18
9,Inês Pereira,Biology,13.1,2007-02-27
6,Filipe Almeida,Mathematics,12.6,2006-05-21


---
## Exercise 7 - Update data (UPDATE)
Update **Bruno Costa**'s `gpa` to **15.00**.

💡 Tip (UPDATE syntax):
```sql
UPDATE <table_name>
SET <column_name> = <new_value>
WHERE <condition>;
```
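
Names are not guaranteed to be unique, so filtering on the primary key is often the safer way to hit exactly one row. A minimal sketch, assuming Bruno Costa has `id = 2` as in the dataset table of Exercise 5:

```sql
-- Assumption: Bruno Costa is the row with id 2 (check with a SELECT first).
UPDATE students
SET gpa = 15.00
WHERE id = 2;
```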


In [55]:
%%sql

-- EXERCISE 7:
-- UPDATE Bruno Costa's GPA to 15.00
UPDATE students
SET gpa = 15.00
WHERE name = 'Bruno Costa';


## Exercise 8 - Confirm the update (SELECT)
Show Bruno Costa (with major name) to confirm the change.


In [56]:
%%sql

-- EXERCISE 8:
-- SELECT (JOIN) to confirm the update
SELECT
    s.id,
    s.name,
    m.name AS major,
    s.gpa,
    s.birthdate
FROM students AS s
JOIN majors AS m
    ON s.major_id = m.id
WHERE s.name = 'Bruno Costa';


id,name,major,gpa,birthdate
2,Bruno Costa,Economics,15.0,2006-11-02


---
## Exercise 9 - Delete records (DELETE)
Delete **one or more** students with `gpa` **below 13.00**.

💡 Tip (DELETE syntax):
```sql
DELETE FROM <table_name>
WHERE <condition>;
```
⚠️ If you omit the `WHERE` clause, you will delete **all** rows.
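
One way to avoid deleting more than intended is to run a `SELECT` with the same `WHERE` clause first and inspect the rows it returns. A sketch for this exercise:

```sql
-- Preview: these are exactly the rows the DELETE would remove.
SELECT id, name, gpa
FROM students
WHERE gpa < 13.00;
```

With the dataset from Exercise 5, only Filipe Almeida (gpa 12.60) should appear.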


In [57]:
%%sql

-- EXERCISE 9:
-- DELETE (gpa < 13.00)

DELETE FROM students
WHERE gpa < 13.00;



## Exercise 10 - Verification query (SELECT)
Show all remaining students (with major name) after the DELETE.


In [58]:
%%sql

-- EXERCISE 10:
-- SELECT (JOIN) to verify current rows

SELECT
    s.id,
    s.name,
    m.name AS major,
    s.gpa,
    s.birthdate
FROM students AS s
JOIN majors AS m
    ON s.major_id = m.id
ORDER BY s.id;


id,name,major,gpa,birthdate
1,Ana Silva,Computer Science,17.5,2007-03-14
2,Bruno Costa,Economics,15.0,2006-11-02
3,Carla Mendes,Biology,16.1,2007-07-29
4,Daniel Rocha,Engineering,13.8,2006-01-18
5,Eva Santos,Mathematics,18.3,2007-09-05
7,Guilherme Ferreira,Mathematics,15.7,2007-12-10
8,Helena Sousa,Physics,16.9,2006-08-03
9,Inês Pereira,Biology,13.1,2007-02-27
10,João Martins,Chemistry,14.9,2006-04-16


---
## Exercise 11 - Global verification query (SELECT + ORDER BY)
Write a query that shows `id`, `name`, `major`, `gpa`, ordered by `gpa` (highest to lowest).

💡 Tip:
```sql
SELECT <columns>
FROM <table_name>
ORDER BY <sort_column> DESC;
```


In [59]:
%%sql

-- EXERCISE 11:
-- SELECT with JOIN + ORDER BY gpa DESC
SELECT
    s.id,
    s.name,
    m.name AS major,
    s.gpa
FROM students AS s
JOIN majors AS m
    ON s.major_id = m.id
ORDER BY s.gpa DESC;


id,name,major,gpa
5,Eva Santos,Mathematics,18.3
1,Ana Silva,Computer Science,17.5
8,Helena Sousa,Physics,16.9
3,Carla Mendes,Biology,16.1
7,Guilherme Ferreira,Mathematics,15.7
2,Bruno Costa,Economics,15.0
10,João Martins,Chemistry,14.9
4,Daniel Rocha,Engineering,13.8
9,Inês Pereira,Biology,13.1


---
## Exercise 12 - SELECT with filters
Write **two** queries:
1) Show only students whose major is `Computer Science`.
2) Show students with `gpa` between **15.00** and **18.00** (inclusive).


In [60]:
%%sql

-- EXERCISE 12 (1):
-- Filter by major name (requires JOIN or subquery)

SELECT
    s.id,
    s.name,
    m.name AS major,
    s.gpa,
    s.birthdate
FROM students AS s
JOIN majors AS m
    ON s.major_id = m.id
WHERE m.name = 'Computer Science';


id,name,major,gpa,birthdate
1,Ana Silva,Computer Science,17.5,2007-03-14
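
The comment in the cell above mentions that a subquery also works. As a sketch of that alternative, the filter can look up the major's id first and compare it with `students.major_id`; the major name is then no longer part of the result, so it is left out of the column list.

```sql
-- Subquery variant: majors.name is UNIQUE, so the subquery returns at most one id.
SELECT id, name, gpa, birthdate
FROM students
WHERE major_id = (SELECT id FROM majors WHERE name = 'Computer Science');
```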


In [61]:
%%sql

-- EXERCISE 12 (2):
-- Filter by GPA range
SELECT
    s.id,
    s.name,
    m.name AS major,
    s.gpa,
    s.birthdate
FROM students AS s
JOIN majors AS m
    ON s.major_id = m.id
WHERE s.gpa BETWEEN 15.00 AND 18.00;


id,name,major,gpa,birthdate
1,Ana Silva,Computer Science,17.5,2007-03-14
2,Bruno Costa,Economics,15.0,2006-11-02
3,Carla Mendes,Biology,16.1,2007-07-29
7,Guilherme Ferreira,Mathematics,15.7,2007-12-10
8,Helena Sousa,Physics,16.9,2006-08-03
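
`BETWEEN` includes both endpoints, so the filter above is shorthand for a pair of comparisons. The equivalent explicit form, shown as a sketch:

```sql
-- Same rows as BETWEEN 15.00 AND 18.00: both bounds are inclusive.
SELECT s.id, s.name, s.gpa
FROM students AS s
WHERE s.gpa >= 15.00
  AND s.gpa <= 18.00;
```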


---
## Challenge (optional) - GROUP BY (by major)
Write a query that shows, **for each major**:
- the major name
- the **number of students** in that major
- the **average GPA** for that major

💡 Tip: Use `GROUP BY <column>` with aggregate functions like `COUNT()` and `AVG()`.


In [65]:
%%sql

-- CHALLENGE:
-- Show, for each major:
-- 1) number of students
-- 2) average GPA
-- (Optional) order by average GPA (highest to lowest)




In [None]:
%%sql

SELECT
    m.name AS major,
    COUNT(*) AS num_students,
    ROUND(AVG(s.gpa), 2) AS avg_gpa
FROM students AS s
JOIN majors AS m
    ON s.major_id = m.id
GROUP BY m.name
ORDER BY avg_gpa DESC;


major,num_students,avg_gpa
Computer Science,1,17.5
Mathematics,2,17.0
Physics,1,16.9
Economics,1,15.0
Chemistry,1,14.9
Biology,2,14.6
Engineering,1,13.8
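
The inner `JOIN` above only lists majors that have at least one student. If a major should still appear when it has no students, one option is to start from `majors` and use a `LEFT JOIN`. This is a sketch of that variant, not part of the original exercise:

```sql
-- COUNT(s.id) ignores the NULLs produced for majors without students,
-- so such majors get num_students = 0 and avg_gpa = NULL.
SELECT
    m.name AS major,
    COUNT(s.id) AS num_students,
    ROUND(AVG(s.gpa), 2) AS avg_gpa
FROM majors AS m
LEFT JOIN students AS s
    ON s.major_id = m.id
GROUP BY m.id, m.name
ORDER BY avg_gpa DESC;
```

With the current data every major has at least one student, so the result should match the output above.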


---
**end of doc**