# Formative Worksheet 02: SQL (MySQL) - Normalising `major`

> Goal: repeat the full workflow (**CREATE TABLE**, **INSERT**, **SELECT**, **UPDATE**, **DELETE**) but now with **normalisation**:
- Create a secondary table `majors`
- Replace `students.major` with `students.major_id`

⚠️ Start from **zero** (assume the tables from the previous worksheet do not exist).


## 📓 Environment setup (Jupyter Notebook)
If you're running this in a fresh environment, install the required packages.


In [None]:
pip install ipykernel jupyterlab jupysql pymysql cryptography --upgrade --no-cache-dir


## Connect JupySQL to your MySQL server
1) Load the `sql` extension.
2) Connect to your server (replace user, password, host, port, and database).


In [None]:
%load_ext sql


In [None]:
%sql mysql+pymysql://mysql_user:mysql_password@localhost:3306/mydatabase

%config SqlMagic.displaylimit = 0
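
Before creating any tables, it can help to confirm that the connection actually works. The following is a minimal sketch, assuming the connection above succeeded; run it in its own `%%sql` cell. `VERSION()` and `DATABASE()` are built-in MySQL functions, and the column aliases are only illustrative.

```sql
-- Quick connection check: which server answered, and which database is selected?
SELECT VERSION()  AS server_version,
       DATABASE() AS current_database;
```

If `current_database` is not the database you intended, fix the connection string before continuing.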


---
## Exercise 1: Create the tables (`majors` and `students`)

Create two tables:

### Table A: `majors`
- `id` (integer, primary key, auto-increment)
- `name` (text, not null, **unique**)

### Table B: `students`
- `id` (integer, primary key, auto-increment)
- `name` (text, not null)
- `gpa` (decimal/numeric, 2 decimal places, not null)
- `birthdate` (date, not null)
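- `major_id` (integer, not null): stores the `id` of the student's row in `majors`, linking the two tables.

💡 Tip: drop the tables first to ensure you start from scratch:
`DROP TABLE IF EXISTS students;` then `DROP TABLE IF EXISTS majors;` (`students` goes first because, once it has a foreign key to `majors`, MySQL refuses to drop `majors` while `students` still references it).

💡 Tip (CREATE syntax example with generic placeholders):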
```sql
CREATE TABLE <table_name> (
    <id_column> INT AUTO_INCREMENT PRIMARY KEY,
    <text_column> VARCHAR(<max_length>) NOT NULL,
    <decimal_column> DECIMAL(<precision>,<scale>) NOT NULL,
    <date_column> DATE NOT NULL,
    <datetime_column> DATETIME,

    <other_table_id> INT,
    FOREIGN KEY (<other_table_id>) REFERENCES <other_table>(<id>)
);
```

Optional extension (only if your class is ready): add a `FOREIGN KEY (major_id)` referencing `majors(id)`.
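
To make the generic template concrete, here is a minimal sketch on a different, hypothetical pair of tables. The names `demo_departments` and `demo_employees`, their columns, and the chosen sizes are invented for illustration and are not part of this worksheet's schema. It shows the parent table being created first, so that the child table's foreign key has something to reference.

```sql
-- Hypothetical demo tables, NOT the worksheet's schema.
-- Drop the child first: it holds the foreign key that points at the parent.
DROP TABLE IF EXISTS demo_employees;
DROP TABLE IF EXISTS demo_departments;

-- Parent table: one row per department; names cannot repeat.
CREATE TABLE demo_departments (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE
);

-- Child table: each employee points at exactly one department.
CREATE TABLE demo_employees (
    id            INT AUTO_INCREMENT PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,
    salary        DECIMAL(8,2) NOT NULL,
    hired_on      DATE NOT NULL,
    department_id INT NOT NULL,
    FOREIGN KEY (department_id) REFERENCES demo_departments(id)
);
```

With the foreign key in place (and the default InnoDB storage engine), MySQL rejects an employee row whose `department_id` has no matching `demo_departments.id`.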


In [None]:
%%sql

-- EXERCISE 1:
-- 1) DROP TABLE IF EXISTS ... (students first)
-- 2) CREATE TABLE majors
-- 3) CREATE TABLE students



## Exercise 2: Confirm the structure (SELECT)
Check that both tables exist and that the column types are correct.

💡 Tip: In MySQL you can use `DESCRIBE <table_name>;` or `SHOW COLUMNS FROM <table_name>;`.
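
`DESCRIBE` lists the columns but not the full foreign key definition. As a small sketch, assuming a hypothetical table named `demo_employees` exists (the name is illustrative, not part of the worksheet), `SHOW CREATE TABLE` prints the complete statement MySQL stored, including keys and constraints.

```sql
-- Column names, types, nullability, key flags and defaults
DESCRIBE demo_employees;

-- Full CREATE TABLE statement as stored by MySQL, including FOREIGN KEY constraints
SHOW CREATE TABLE demo_employees;
```

A `%%sql` cell typically displays only the result of its last statement, so run each statement in its own cell if you want to see both outputs.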


In [None]:
%%sql

-- EXERCISE 2:
-- Check majors structure



In [None]:
%%sql

-- EXERCISE 2 (continued):
-- Check students structure



---
## Exercise 3: Insert data into `majors`
Insert the following majors into the `majors` table:
- Computer Science
- Economics
- Biology
- Engineering
- Mathematics
- Physics
- Chemistry

💡 Tip (INSERT syntax):
```sql
INSERT INTO <table_name> (<column_name>) VALUES
  ('<value1>'),
  ('<value2>');
```
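
The sketch below illustrates how a multi-row insert interacts with `AUTO_INCREMENT` and `UNIQUE`. It assumes a hypothetical table `demo_departments (id, name)` where `id` is auto-increment and `name` is `NOT NULL UNIQUE`; the table and its values are invented for illustration.

```sql
-- Multi-row insert: the id column is omitted, so AUTO_INCREMENT assigns it.
INSERT INTO demo_departments (name) VALUES
  ('Sales'),
  ('Research');

-- Expected to fail with a duplicate-entry error, because name is UNIQUE
-- and 'Sales' already exists (left commented out on purpose):
-- INSERT INTO demo_departments (name) VALUES ('Sales');
```

Listing only the columns you provide values for is what lets the database fill in the rest.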


In [None]:
%%sql

-- EXERCISE 3:
-- Insert the 7 majors here



## Exercise 4: Verify `majors` (SELECT)
Show all rows in `majors`.


In [None]:
%%sql

-- EXERCISE 4:
-- SELECT * FROM majors



---
## Exercise 5: Insert data into `students` (10 records)
Insert 10 students using the dataset below. Since `students` now stores `major_id` instead of the major name, you need to **look up** each id in `majors`.

Dataset (major names are provided):

| id | name | major | gpa | birthdate |
|---:|---|---|---:|---|
| 1 | Ana Silva | Computer Science | 17.50 | 2007-03-14 |
| 2 | Bruno Costa | Economics | 14.20 | 2006-11-02 |
| 3 | Carla Mendes | Biology | 16.10 | 2007-07-29 |
| 4 | Daniel Rocha | Engineering | 13.80 | 2006-01-18 |
| 5 | Eva Santos | Mathematics | 18.30 | 2007-09-05 |
| 6 | Filipe Almeida | Mathematics | 12.60 | 2006-05-21 |
| 7 | Guilherme Ferreira | Mathematics | 15.70 | 2007-12-10 |
| 8 | Helena Sousa | Physics | 16.90 | 2006-08-03 |
| 9 | Inês Pereira | Biology | 13.10 | 2007-02-27 |
| 10 | João Martins | Chemistry | 14.90 | 2006-04-16 |

💡 Tip: you can insert using a subquery to fetch the `major_id`:
```sql
INSERT INTO students (name, major_id, gpa, birthdate)
VALUES (
  '<student_name>',
  (SELECT id FROM majors WHERE name = '<major_name>'),
  <gpa_value>,
  '<YYYY-MM-DD>'
);
```
- Make sure dates use the `YYYY-MM-DD` format (the ISO 8601 calendar-date format, which MySQL expects for `DATE` values).
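
Two ways of resolving a name to an id at insert time can be sketched as follows. The example assumes hypothetical tables `demo_departments (id, name)` and `demo_employees (id, name, salary, hired_on, department_id)`, with `department_id` declared `NOT NULL`; all names and values are invented for illustration.

```sql
-- Form 1: a scalar subquery resolves the department name to its id.
INSERT INTO demo_employees (name, salary, hired_on, department_id)
VALUES (
  'Rita Lopes',
  2100.00,
  '2024-02-01',
  (SELECT id FROM demo_departments WHERE name = 'Sales')
);

-- Form 2: INSERT ... SELECT takes the id straight from the lookup row.
INSERT INTO demo_employees (name, salary, hired_on, department_id)
SELECT 'Tiago Nunes', 1950.50, '2023-09-15', d.id
FROM demo_departments AS d
WHERE d.name = 'Research';
```

The two forms behave differently when the name is misspelled. In form 1 the subquery yields `NULL`, which a `NOT NULL` column should reject with an error. In form 2 the `SELECT` returns no rows, so nothing is inserted and no error is raised, which makes it worth checking the row count afterwards.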


In [None]:
%%sql

-- EXERCISE 5:
-- Insert the 10 students (major_id must come from majors)



## Exercise 6: Verify students with a JOIN (SELECT)
Write a query that shows students with their major name (not the id). Include:
- student id, student name, major name, gpa, birthdate

💡 Tip: use `JOIN` between `students` and `majors`.
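
As a sketch of the join pattern, the query below uses hypothetical tables `demo_departments (id, name)` and `demo_employees (id, name, salary, department_id)`; the names are invented for illustration. Table aliases (`e`, `d`) are needed here because both tables have columns called `id` and `name`.

```sql
SELECT
    e.id,
    e.name,
    d.name AS department,   -- readable name instead of the numeric department_id
    e.salary
FROM demo_employees AS e
JOIN demo_departments AS d
    ON d.id = e.department_id;   -- match each employee to its department row
```

The `AS department` alias also keeps the result from showing two columns that are both called `name`.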


In [None]:
%%sql

-- EXERCISE 6:
-- SELECT with JOIN to display major name



---
## Exercise 7: Update data (UPDATE)
Update **Bruno Costa**'s `gpa` to **15.00**.

💡 Tip (UPDATE syntax):
```sql
UPDATE <table_name>
SET <column_name> = <new_value>
WHERE <condition>;
```
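
A minimal sketch of an update followed by a read-back, assuming a hypothetical table `demo_employees (id, name, salary)` that contains a row named 'Rita Lopes' (table and values are invented for illustration):

```sql
-- Change one employee's salary; the WHERE clause limits the update to that row.
UPDATE demo_employees
SET salary = 2250.00
WHERE name = 'Rita Lopes';

-- Read the row back to confirm the new value.
SELECT id, name, salary
FROM demo_employees
WHERE name = 'Rita Lopes';
```

Filtering by name updates every row with that name. When names may repeat, filtering by the primary key (`WHERE id = ...`) is the safer choice.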


In [None]:
%%sql

-- EXERCISE 7:
-- UPDATE Bruno Costa's GPA to 15.00



## Exercise 8: Confirm the update (SELECT)
Show Bruno Costa (with major name) to confirm the change.


In [None]:
%%sql

-- EXERCISE 8:
-- SELECT (JOIN) to confirm the update



---
## Exercise 9: Delete records (DELETE)
Delete **one or more** students with `gpa` **below 13.00**.

💡 Tip (DELETE syntax):
```sql
DELETE FROM <table_name>
WHERE <condition>;
```
⚠️ If you omit the `WHERE` clause, you will delete **all** rows.
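
One way to reduce the risk of deleting too much is to test the condition with a `SELECT` first. The sketch below assumes a hypothetical table `demo_employees (id, name, salary)`; the table and the threshold are invented for illustration.

```sql
-- 1) Count the rows the condition would remove.
SELECT COUNT(*) AS rows_to_delete
FROM demo_employees
WHERE salary < 2000.00;

-- 2) Delete using exactly the same condition.
DELETE FROM demo_employees
WHERE salary < 2000.00;
```

Run step 1 in its own cell so its result is displayed, then run step 2 only if the count is what you expected.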


In [None]:
%%sql

-- EXERCISE 9:
-- DELETE (gpa < 13.00)



## Exercise 10: Verification query (SELECT)
Show all remaining students (with major name) after the DELETE.


In [None]:
%%sql

-- EXERCISE 10:
-- SELECT (JOIN) to verify current rows



---
## Exercise 11: Global verification query (SELECT + ORDER BY)
Write a query that shows each student's `id`, `name`, major name, and `gpa`, ordered by `gpa` (highest to lowest).

💡 Tip:
```sql
SELECT <columns>
FROM <table_name>
ORDER BY <sort_column> DESC;
```
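
Sorting combines with a join in the usual way: `ORDER BY` comes last. A sketch on hypothetical tables `demo_departments (id, name)` and `demo_employees (id, name, salary, department_id)`, with names invented for illustration:

```sql
SELECT e.id, e.name, d.name AS department, e.salary
FROM demo_employees AS e
JOIN demo_departments AS d ON d.id = e.department_id
ORDER BY e.salary DESC,   -- highest salary first
         e.name ASC;      -- ties on salary are broken alphabetically
```

The second sort key is optional; without it, rows with equal salaries may appear in any order.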


In [None]:
%%sql

-- EXERCISE 11:
-- SELECT with JOIN + ORDER BY gpa DESC



---
## Exercise 12: SELECT with filters
Write **two** queries:
1) Show only students whose major is `Computer Science`.
2) Show students with `gpa` between **15.00** and **18.00** (inclusive).
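
Both kinds of filter can be sketched on hypothetical tables `demo_departments (id, name)` and `demo_employees (id, name, salary, department_id)`; the names and values are invented for illustration. The first query filters on a column of the joined table, and the second uses `BETWEEN`, which includes both endpoints.

```sql
-- Filter on a column that lives in the joined table.
SELECT e.id, e.name, d.name AS department
FROM demo_employees AS e
JOIN demo_departments AS d ON d.id = e.department_id
WHERE d.name = 'Sales';

-- BETWEEN is inclusive at both ends:
-- equivalent to salary >= 1800.00 AND salary <= 2200.00.
SELECT id, name, salary
FROM demo_employees
WHERE salary BETWEEN 1800.00 AND 2200.00;
```

Run the two queries in separate cells if you want to see both result sets.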


In [None]:
%%sql

-- EXERCISE 12 (1):
-- Filter by major name (requires JOIN or subquery)



In [None]:
%%sql

-- EXERCISE 12 (2):
-- Filter by GPA range



---
## Challenge (optional): GROUP BY (by major)
Write a query that shows, **for each major**:
- the major name
- the **number of students** in that major
- the **average GPA** for that major

💡 Tip: Use `GROUP BY <column>` with aggregate functions like `COUNT()` and `AVG()`.
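
A sketch of grouping across a join, on hypothetical tables `demo_departments (id, name)` and `demo_employees (id, name, salary, department_id)` (names invented for illustration). It uses a `LEFT JOIN` as one possible design choice, so that departments without any employees still appear in the result.

```sql
SELECT
    d.name                  AS department,
    COUNT(e.id)             AS employee_count,   -- counts only matched employees
    ROUND(AVG(e.salary), 2) AS average_salary    -- NULL when there are no employees
FROM demo_departments AS d
LEFT JOIN demo_employees AS e ON e.department_id = d.id
GROUP BY d.id, d.name
ORDER BY average_salary DESC;
```

`COUNT(e.id)` rather than `COUNT(*)` matters with a `LEFT JOIN`: an empty department still produces one row with `NULL` employee columns, which `COUNT(*)` would count as 1. With a plain `JOIN`, empty departments are simply left out.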


In [None]:
%%sql

-- CHALLENGE:
-- Show, for each major:
-- 1) number of students
-- 2) average GPA
-- (Optional) order by average GPA (highest to lowest)

