# Introduction to SQL

As a Data Engineer, you will have to manipulate data to process it and help the organization find valuable insights. To do so, you should learn the basics of SQL to work with data in different ways.

This time, you will be working with a DVD rental sample database where you will find information about stores, customers, and rented films. Below you will find the diagram with the relationships between the tables, and a brief description of each one.

The purpose of this lab is to answer business questions by querying the database with SQL. The goal is for each of your queries to return the same result as the expected output shown in its exercise.

# Outline
- [ 1 - Database](#1)
- [ 2 - Running SQL Commands in a Notebook](#2)
- [ 3 - Create, Read, Update, and Delete (CRUD) Operations](#3)
  - [ 3.1 - CREATE TABLE](#3.1)
    - [ Exercise 1](#ex01)
  - [ 3.2 - SELECT](#3.2)
    - [ Exercise 2](#ex02)
    - [ Exercise 3](#ex03)
  - [ 3.3 - WHERE](#3.3)
    - [ Exercise 4](#ex04)
  - [ 3.4 - INSERT INTO](#3.4)
    - [ Exercise 5](#ex05)
  - [ 3.5 - UPDATE](#3.5)
    - [ Exercise 6](#ex06)
  - [ 3.6 - DELETE](#3.6)
    - [ Exercise 7](#ex07)
- [ 4 - SQL Clauses](#4)
  - [ 4.1 - ALIASES](#4.1)
    - [ Exercise 8](#ex08)
  - [ 4.2 - JOIN](#4.2)
    - [ Exercise 9](#ex09)
    - [ Exercise 10](#ex10)
  - [ 4.3 - GROUP BY](#4.3)
    - [ Exercise 11](#ex11)
  - [ 4.4 - ORDER BY](#4.4)
    - [ Exercise 12](#ex12)
  - [ 4.5 - LIMIT](#4.5)
    - [ Exercise 13](#ex13)
- [ 5 - Conclusion](#5)

<a name='1'></a>
## 1 - Database

You will be working with a modified version of the [Sakila Sample Database](https://dev.mysql.com/doc/sakila/en/), which is licensed under the [New BSD license](https://dev.mysql.com/doc/sakila/en/sakila-license.html).

For learning purposes, let's assume that the data belongs to _Rentio_, which is a fictitious company dedicated to renting movies to clients from all around the world.

In the database, you will find data about the stores, the staff who work in them, and their addresses. Each store manages its own inventory, so when a store receives a new DVD, information about the film, its category, language, and actors is inserted into the database. Also, every time a new customer rents a film, the customer's basic information is inserted into the database along with their address. Additionally, a rental record is added as soon as a transaction occurs, with information about the inventory item, the film, and the paying customer.

Your task is to process this data to answer questions that provide general information about the business and help you understand which films are rented the most, broken down by different attributes.

Rentio's transactional database includes the following tables.

- `actor`: Contains the actor's data such as first and last name.
- `address`: Contains address data of staff and customers.
- `category`: Contains category data of the film.
- `city`: Has city names.
- `country`: Has country names.
- `customer`: Contains customer data such as first name, last name, the store they are associated with, and whether they are active.
- `film`: Contains film data such as title, description, language, and ratings.
- `film_actor`: Stores the relationship between film and actor.
- `film_category`: Stores the relationship between film and category.
- `inventory`: Contains inventory data, relating each film copy to the store where it is stocked.
- `language`: Has language names.
- `payment`: Contains customer payment data, including the staff member who processed each payment, the amount, and the date.
- `rental`: Contains rental data related to the customer, staff, rental dates, and return date.
- `staff`: Contains staff data such as first name, last name, the store where they work, and whether they are active.
- `store`: Contains store data such as the manager and store address.

Here you can find the entity-relationship model (ERM) of the transactional database showing all the tables and how they are related:

![rentio-database-erd](images/rentio_database_erd.jpg)

<a name='2'></a>
## 2 - Running SQL Commands in a Notebook

To interact with SQL databases within a JupyterLab notebook, you will use the SQL "magic" provided by the `ipython-sql` extension. In IPython, the kernel behind Jupyter notebooks, "magics" are special commands prefixed with `%` (line magics, which apply to a single line) or `%%` (cell magics, which apply to the whole cell). Here, you'll use the `%load_ext` magic to load the `ipython-sql` extension:

In [None]:
%load_ext sql

This magic command loads the `ipython-sql` extension, which lets you connect to databases supported by [SQLAlchemy](https://www.sqlalchemy.org/features.html). In this lab, you'll connect to an existing MySQL database, so you first need your credentials to establish the connection.

The cell below uses `dotenv` (from the `python-dotenv` package) to load environment variables, so that credentials such as the username and password are kept outside the Jupyter notebook. Before running it, set the variables in the `env` file to the values required for the connection.

In [None]:
import os 

from dotenv import load_dotenv

load_dotenv('./env', override=True)

DBHOST = os.getenv('DBHOST')
DBPORT = os.getenv('DBPORT')
DBNAME = os.getenv('DBNAME')
DBUSER = os.getenv('DBUSER')
DBPASSWORD = os.getenv('DBPASSWORD')

connection_url = f'mysql+pymysql://{DBUSER}:{DBPASSWORD}@{DBHOST}:{DBPORT}/{DBNAME}'

%sql {connection_url}
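
The cell above inserts the password into the URL as-is. If a credential contains characters such as `@`, `:` or `/`, the URL can no longer be parsed correctly; SQLAlchemy's documentation recommends percent-encoding such values with `urllib.parse.quote_plus`. A minimal sketch of that idea follows; the helper name `build_mysql_url` and the sample credentials are hypothetical:

```python
from urllib.parse import quote_plus


def build_mysql_url(user: str, password: str, host: str, port: str, dbname: str) -> str:
    """Build a mysql+pymysql connection URL with percent-encoded credentials.

    quote_plus() escapes characters such as '@', ':' and '/', which would
    otherwise be read as URL delimiters.
    """
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{dbname}"
    )


# Hypothetical credentials, for illustration only.
url = build_mysql_url("lab_user", "p@ss:word/1", "localhost", "3306", "rentio")
print(url)  # mysql+pymysql://lab_user:p%40ss%3Aword%2F1@localhost:3306/rentio
```

In the notebook, the resulting string would be passed to `%sql` exactly like `connection_url`.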

<a name='3'></a>
## 3 - Create, Read, Update, and Delete (CRUD) Operations

CRUD stands for Create, Read, Update, and Delete, the basic operations for manipulating data. In SQL, they correspond to the `INSERT INTO`, `SELECT`, `UPDATE`, and `DELETE` statements, respectively.

<a name='3.1'></a>
### 3.1 - CREATE TABLE

Before looking at the statements for CRUD operations, you will see the `CREATE TABLE` statement, which creates a new table in a database. You must specify the name of each column and its data type. You can check the full list of MySQL data types [here](https://dev.mysql.com/doc/refman/8.0/en/data-types.html).

```sql
CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype,
    ...
);
```
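
To see the template in action without a MySQL server, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in. The `product` table is hypothetical. Note that SQLite is more permissive than MySQL: it records the declared type `VARCHAR(50)` but does not enforce the length limit.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")

# Hypothetical table: every column gets a name and a data type.
conn.execute(
    """
    CREATE TABLE product (
        product_id  INTEGER,
        name        VARCHAR(50),
        last_update TIMESTAMP
    )
    """
)

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(product)")]
print(columns)
```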

<a name='ex01'></a>
### Exercise 1

Write a SQL query to create a replica of the `category` table called `category_copy`. Use these columns:

| column name | data type   |
| ----------- | ----------- |
| category_id | INTEGER     |
| name        | VARCHAR(25) |
| last_update | TIMESTAMP   |

**NOTE:** The `%%sql` cell magic runs the whole cell as SQL, which allows queries that span multiple lines.

In [None]:
%%sql 
### START CODE HERE ### (~ 3 lines of code)
CREATE TABLE None (
    None None,
    None None,
    None None
);
### END CODE HERE ###


<a name='3.2'></a>
### 3.2 - SELECT

The `SELECT` statement retrieves data from a database. It is used together with the `FROM` clause, which indicates the table you want to query.

You can specify the columns you want to retrieve by listing each one as follows:

```sql
SELECT
    column1,
    column2,
    column3,
    ...
FROM table_name;
```

Alternatively, you can use `*` to get all the columns of the table:

```sql
SELECT
    *
FROM table_name;
```
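
A minimal sketch with Python's `sqlite3` module standing in for MySQL, using a hypothetical `product` table, shows the difference: listing columns returns only those columns, while `*` returns every column in the order the table defines them. `cursor.description` holds the names of the returned columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory SQLite database, stand-in for MySQL
conn.execute("CREATE TABLE product (product_id INTEGER, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Lamp", 19.5), (2, "Desk", 120.0)],
)

# Retrieve only the listed columns.
cur = conn.execute("SELECT name, price FROM product")
listed_cols = [d[0] for d in cur.description]
listed_rows = cur.fetchall()

# Retrieve every column with *.
cur = conn.execute("SELECT * FROM product")
all_cols = [d[0] for d in cur.description]
all_rows = cur.fetchall()

print(listed_cols, listed_rows)
print(all_cols, all_rows)
```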


<a name='ex02'></a>
### Exercise 2

Write a SQL query to retrieve the title, length, and release year of the films.


In [None]:
%%sql
### START CODE HERE ### (~ 2 lines of code)
SELECT None, None, None
FROM None;
### END CODE HERE ###

<details>
<summary>Expected Output</summary>

| **title**         | **length** | **release_year** |
| ----------------- | ---------- | ---------------- |
| ACADEMY DINOSAUR  | 86         | 2006             |
| ACE GOLDFINGER    | 48         | 2006             |
| ADAPTATION HOLES  | 50         | 2006             |
| AFFAIR PREJUDICE  | 117        | 2006             |
| ...               | ...        | ...              |

**Note:** Not all of the records are shown here. The order of the records may change.

</details>

<a name='ex03'></a>
### Exercise 3

Write an SQL query to get all the columns of the `store` table.

<details>
<summary>Expected Output</summary>

| **store_id** | **manager_staff_id** | **address_id** | **last_update**         |
| ------------ | -------------------- | -------------- | ----------------------- |
| 1            | 1                    | 1              | 2006-02-15 09:57:12.000 |
| 2            | 2                    | 2              | 2006-02-15 09:57:12.000 |

**Note:** The order of the records may change.

</details>

In [None]:
%%sql
/*YOUR CODE HERE*/

<a name='3.3'></a>
### 3.3 - WHERE

The `WHERE` clause filters data based on a condition: the query returns only the rows that satisfy it.

```sql
SELECT
    *
FROM table_name
WHERE column1 = 'value1';
```
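
Conditions can be combined with `AND` and `OR`. The sketch below uses Python's `sqlite3` module as a stand-in for MySQL and a hypothetical `product` table; the second query also shows the `?` placeholder style that `sqlite3` uses to pass values safely (MySQL drivers such as PyMySQL use `%s` instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE product (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("Lamp", "home", 19.5), ("Desk", "office", 120.0), ("Chair", "office", 45.0)],
)

# Single condition: keep rows whose category equals 'office'.
office = conn.execute(
    "SELECT name FROM product WHERE category = 'office'"
).fetchall()

# Combined conditions with AND: both must be true for a row to be returned.
# The value is passed as a ? parameter instead of being pasted into the SQL string.
pricey_office = conn.execute(
    "SELECT name FROM product WHERE category = ? AND price > 50", ("office",)
).fetchall()

print(sorted(office), pricey_office)
```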

<a name='ex04'></a>
### Exercise 4

Write an SQL query to retrieve the first name, last name, and email of each `active` manager.

<details>
<summary>Expected Output</summary>

| **First_Name** | **Last_Name** | **Email**                    |
| -------------- | ------------- | ---------------------------- |
| Mike           | Hillyer       | Mike.Hillyer@sakilastaff.com |
| Jon            | Stephens      | Jon.Stephens@sakilastaff.com |

**Note:** The order of the records may change.

</details>

In [None]:
%%sql
/*YOUR CODE HERE*/

<a name='3.4'></a>
### 3.4 - INSERT INTO

The `INSERT INTO` statement is used to insert new rows into a table.

You can omit some columns when inserting rows, as long as you list the names of the columns you are filling along with their values. This is useful when some columns are filled automatically, either by the column's default value or because the column auto-increments, as with the `SERIAL` data type (in MySQL, an alias for `BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE`).

```sql
INSERT INTO table_name (
  column1,
  column2,
  column3,
  ...
)
VALUES (
  'value1',
  'value2',
  'value3',
  ...
);
```

If you provide values for all the columns of the table, in the order in which the columns were defined, you can omit the column list and specify only the values:

```sql
INSERT INTO table_name
VALUES (
  'value1',
  'value2',
  'value3',
  ...
);
```
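
The sketch below, again using Python's `sqlite3` module as a stand-in for MySQL, contrasts both forms on a hypothetical `ticket` table. In SQLite, an `INTEGER PRIMARY KEY` column is filled in automatically, which plays a role similar to `AUTO_INCREMENT` (or `SERIAL`) in MySQL, and `status` falls back to its `DEFAULT` value when omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute(
    """
    CREATE TABLE ticket (
        ticket_id INTEGER PRIMARY KEY,  -- assigned automatically when omitted
        title     TEXT NOT NULL,
        status    TEXT DEFAULT 'open'   -- used when no value is given
    )
    """
)

# Column list given: omitted columns receive generated or default values.
conn.execute("INSERT INTO ticket (title) VALUES ('Broken lamp')")

# No column list: a value must be supplied for every column, in table order.
conn.execute("INSERT INTO ticket VALUES (10, 'Late delivery', 'closed')")

rows = conn.execute("SELECT * FROM ticket ORDER BY ticket_id").fetchall()
print(rows)
```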

<a name='ex05'></a>
### Exercise 5

Write an SQL query to insert the following rows into the `category_copy` table:

| **category_id** | **name**  | **last_update**         |
| --------------- | --------- | ----------------------- |
| 1               | Horror    | 2006-02-15 09:46:27.000 |
| 10              | Animation | 2006-02-15 09:46:27.000 |
| 20              | Pop       | 2006-02-15 09:46:27.000 |

If you then run a `SELECT` statement on the table, you should get:

| **category_id** | **name**  | **last_update**         |
| --------------- | --------- | ----------------------- |
| 1               | Horror    | 2006-02-15 09:46:27.000 |
| 10              | Animation | 2006-02-15 09:46:27.000 |
| 20              | Pop       | 2006-02-15 09:46:27.000 |

**Note:** The order of the records may change.

In [None]:
%%sql
/*YOUR CODE HERE*/

<a name='3.5'></a>
### 3.5 - UPDATE

The `UPDATE` statement changes the values of some columns in existing rows of a table. You can use the `WHERE` clause to filter the rows you want to change; without it, every row in the table is updated.

```sql
UPDATE table_name
SET
  column2 = 'value2',
  column3 = 'value3',
...
WHERE column1 = 'value1';
```
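
A minimal sketch, using Python's `sqlite3` module as a stand-in for MySQL and a hypothetical `product` table, compares an `UPDATE` with and without `WHERE` by checking `rowcount`, the number of rows each statement changed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE product (product_id INTEGER, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Lamp", 19.5), (2, "Desk", 120.0), (3, "Chair", 45.0)],
)

# With WHERE: only the matching row is changed.
one_row = conn.execute("UPDATE product SET price = 99.0 WHERE name = 'Desk'").rowcount

# Without WHERE: every row is changed (here, all prices are doubled).
all_rows = conn.execute("UPDATE product SET price = price * 2").rowcount

final = conn.execute("SELECT name, price FROM product ORDER BY product_id").fetchall()
print(one_row, all_rows, final)
```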

<a name='ex06'></a>
### Exercise 6

Write an SQL query to perform the following changes:

- Change the `last_update` value to `2020-09-12 08:00:00.000` for all the rows.
- Change the `category_id` value to `2` for the row with the `name` of `Animation`.
- Change the `name` value to `Action` for the row with the `category_id` of `1`.

You can add more cells for each query if you want.

If you then run a `SELECT` statement on the table, you should get:

| **category_id** | **name**  | **last_update**         |
| --------------- | --------- | ----------------------- |
| 1               | Action    | 2020-09-12 08:00:00.000 |
| 2               | Animation | 2020-09-12 08:00:00.000 |
| 20              | Pop       | 2020-09-12 08:00:00.000 |

**Note:** The order of the records may change.

In [None]:
%%sql
/*YOUR CODE HERE*/

<a name='3.6'></a>
### 3.6 - DELETE

The `DELETE` statement removes existing rows from a table. It is used together with the `FROM` clause, which indicates the table whose rows you want to delete. You can use the `WHERE` clause to filter the rows you want to delete.

**Be careful: if you don't specify a condition, all the rows of the table will be deleted:**

```sql
DELETE FROM table_name;
```

That's why you should always add a condition unless you really want to delete every row:

```sql
DELETE FROM table_name
WHERE column1 = 'value1';
```
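
A common safety habit is to run the same `WHERE` condition in a `SELECT` first, check which rows it matches, and only then run the `DELETE`. A minimal sketch of that workflow, using Python's `sqlite3` module as a stand-in for MySQL and a hypothetical `product` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE product (product_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)", [(1, "Lamp"), (2, "Desk"), (3, "Chair")]
)

# Preview: run the same WHERE condition in a SELECT before deleting.
to_delete = conn.execute("SELECT name FROM product WHERE product_id = 2").fetchall()

# Delete only the rows that match; rowcount reports how many were removed.
deleted = conn.execute("DELETE FROM product WHERE product_id = 2").rowcount

remaining = conn.execute("SELECT product_id FROM product ORDER BY product_id").fetchall()
print(to_delete, deleted, remaining)
```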

<a name='ex07'></a>
### Exercise 7

Write an SQL query to delete the row where the `category_id` is `20`.

If you then run a `SELECT` statement on the table, you should get:

| **category_id** | **name**  | **last_update**         |
| --------------- | --------- | ----------------------- |
| 1               | Action    | 2020-09-12 08:00:00.000 |
| 2               | Animation | 2020-09-12 08:00:00.000 |

**Note:** The order of the records may change.

In [None]:
%%sql
/*YOUR CODE HERE*/

<a name='4'></a>
## 4 - SQL Clauses

In the next sections, you will see how to create more complex SQL queries to read data from a database using the most common clauses.

<a name='4.1'></a>
### 4.1 - ALIASES

Aliases temporarily rename a column for the duration of a query, which lets you display more readable column names. To create an alias, use the `AS` keyword after a column:

```sql
SELECT
    column1 AS Alias_Column_1,
    column2 AS Alias_Column_2,
    column3 AS Alias_Column_3,
    ...
FROM table_name;
```
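
A minimal sketch using Python's `sqlite3` module as a stand-in for MySQL, with a hypothetical `product` table, shows that the aliases become the column names of the result. It also shows a table alias (`product AS p`), which works the same way and is mostly used to shorten qualified column names in joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE product (product_id INTEGER, name TEXT, price REAL)")
conn.execute("INSERT INTO product VALUES (1, 'Lamp', 19.5)")

# Column aliases: the result set uses the alias names.
cur = conn.execute("SELECT name AS product_name, price AS unit_price FROM product")
column_headers = [d[0] for d in cur.description]
rows = cur.fetchall()

# A table alias (p) shortens references to the table's columns.
cur = conn.execute("SELECT p.name AS product_name FROM product AS p")
aliased_headers = [d[0] for d in cur.description]

print(column_headers, rows, aliased_headers)
```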

<a name='ex08'></a>
### Exercise 8

Write an SQL query to obtain the title, length, and release year of the films. Rename the columns so that they have the `film_` prefix.

<details>
<summary>Expected Output</summary>

| **film_title**    | **film_length** | **film_release_year** |
| ----------------- | --------------- | --------------------- |
| Chamber Italian   | 117             | 2006                  |
| Grosse Wonderful  | 49              | 2006                  |
| Airport Pollock   | 54              | 2006                  |
| Bright Encounters | 73              | 2006                  |
| Academy Dinosaur  | 86              | 2006                  |
| Ace Goldfinger    | 48              | 2006                  |
| Adaptation Holes  | 50              | 2006                  |
| Affair Prejudice  | 117             | 2006                  |
| African Egg       | 130             | 2006                  |
| Agent Truman      | 169             | 2006                  |
| Airplane Sierra   | 62              | 2006                  |
| Alabama Devil     | 114             | 2006                  |
| Aladdin Calendar  | 63              | 2006                  |
| Alamo Videotape   | 126             | 2006                  |
| Alaska Phantom    | 136             | 2006                  |
| ...               | ...             | ...                   |

**Note:** The order of the records may change.

</details>

In [None]:
%%sql
/*YOUR CODE HERE*/

<a name='4.2'></a>
### 4.2 - JOIN

The `JOIN` clause combines data from multiple tables based on a related column shared between them.

```sql
SELECT
    *
FROM table1
JOIN table2 ON table1.column1 = table2.column2;
```

By default, `JOIN` is equivalent to `INNER JOIN`: it returns only the rows that have matching values in both tables. There are also other types of joins:

- `LEFT JOIN`: Returns all the rows from the left table and the matched rows from the right table; the right table's columns are `NULL` where there is no match.
- `RIGHT JOIN`: Returns all the rows from the right table and the matched rows from the left table; the left table's columns are `NULL` where there is no match.
- `FULL JOIN`: Returns all the rows from both tables, matching them where possible. MySQL does not support `FULL JOIN` directly.
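
The sketch below uses Python's `sqlite3` module as a stand-in for MySQL and two hypothetical tables, `author` and `book`, to compare `INNER JOIN` and `LEFT JOIN`. Because MySQL has no `FULL JOIN`, it also shows a common emulation: a `LEFT JOIN` in each direction combined with `UNION` (which also removes duplicate rows). The `book` row with `author_id = 9` has no matching author, so it appears only in the full-join result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.executescript(
    """
    CREATE TABLE author (author_id INTEGER, name TEXT);
    CREATE TABLE book (book_id INTEGER, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Bo');
    -- 'Orphan' points to an author_id that does not exist.
    INSERT INTO book VALUES (10, 'Engines', 1), (11, 'Loops', 1), (12, 'Orphan', 9);
    """
)

# INNER JOIN: only books whose author_id matches an author.
inner = conn.execute(
    """
    SELECT a.name, b.title
    FROM author AS a
    JOIN book AS b ON a.author_id = b.author_id
    ORDER BY b.book_id
    """
).fetchall()

# LEFT JOIN: every author, with NULL (None in Python) where no book matches.
left = conn.execute(
    """
    SELECT a.name, b.title
    FROM author AS a
    LEFT JOIN book AS b ON a.author_id = b.author_id
    ORDER BY a.author_id, b.book_id
    """
).fetchall()

# FULL JOIN emulation: LEFT JOIN in both directions, merged with UNION.
full = set(conn.execute(
    """
    SELECT a.name, b.title FROM author AS a LEFT JOIN book AS b ON a.author_id = b.author_id
    UNION
    SELECT a.name, b.title FROM book AS b LEFT JOIN author AS a ON a.author_id = b.author_id
    """
).fetchall())

print(inner, left, full, sep="\n")
```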

<a name='ex09'></a>
### Exercise 9

Write an SQL query to get the city, address, district, and phone number of each store.

<details>
<summary>Expected Output</summary>

| **city**   | **address**        | **district** | **phone** |
| ---------- | ------------------ | ------------ | --------- |
| Lethbridge | 47 MySakila Drive  | Alberta      |           |
| Woodridge  | 28 MySQL Boulevard | QLD          |           |

**Note:** The order of the records may change.

</details>

In [None]:
%%sql
/*YOUR CODE HERE*/

<a name='ex10'></a>
### Exercise 10

Write an SQL query to obtain the country, city, and address of the stores where active managers work.

<details>
<summary>Expected Output</summary>

| **country** | **city**   | **address**        |
| ----------- | ---------- | ------------------ |
| Canada      | Lethbridge | 47 MySakila Drive  |
| Australia   | Woodridge  | 28 MySQL Boulevard |

**Note:** The order of the records may change.

</details>

In [None]:
%%sql
/*YOUR CODE HERE*/

<a name='4.3'></a>
### 4.3 - GROUP BY

The `GROUP BY` clause groups rows that have the same values in the specified columns, producing one row per group. For the other columns, you can use aggregate functions such as `COUNT`, `SUM`, `AVG`, `MIN`, and `MAX` to compute a value for each group.

```sql
SELECT
    column1,
    COUNT(column2),
    SUM(column3)
FROM table1
GROUP BY column1;
```
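
A minimal sketch, using Python's `sqlite3` module as a stand-in for MySQL and a hypothetical `sale` table, counts the sales and adds up the amounts for each region. Each distinct `region` becomes one output row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE sale (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.5),
     ("east", 2.0), ("east", 3.0), ("east", 1.0)],
)

# One output row per distinct region; COUNT and SUM are computed per group.
per_region = sorted(conn.execute(
    """
    SELECT region, COUNT(*) AS sales, SUM(amount) AS total
    FROM sale
    GROUP BY region
    """
).fetchall())

print(per_region)
```

When writing such queries for MySQL, keep every selected column either inside an aggregate function or in the `GROUP BY` list; MySQL's default `ONLY_FULL_GROUP_BY` SQL mode typically rejects other non-aggregated columns.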

<a name='ex11'></a>
### Exercise 11

Write an SQL query to retrieve the number of films by rating.

<details>
<summary>Expected Output</summary>

| **rating** | **films** |
| ---------- | --------- |
| G          | 178       |
| R          | 195       |
| PG         | 194       |
| PG-13      | 223       |
| NC-17      | 210       |

**Note:** The order of the records may change.

</details>

In [None]:
%%sql
/*YOUR CODE HERE*/

<a name='4.4'></a>
### 4.4 - ORDER BY

The `ORDER BY` clause sorts the rows in ascending or descending order based on one or more columns. By default, it sorts in ascending order (`ASC`); use the `DESC` keyword to sort in descending order.

```sql
SELECT
    *
FROM table1
ORDER BY column1 DESC;
```
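
`ORDER BY` also accepts several columns: rows are sorted by the first one, and ties are broken by the next. A minimal sketch using Python's `sqlite3` module as a stand-in for MySQL, with a hypothetical `product` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE product (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("Lamp", "home", 19.5), ("Desk", "office", 120.0),
     ("Chair", "office", 45.0), ("Rug", "home", 60.0)],
)

# DESC reverses the default ascending order.
by_price_desc = conn.execute("SELECT name FROM product ORDER BY price DESC").fetchall()

# Two sort keys: category ascending, then price descending within each category.
by_category = conn.execute(
    "SELECT category, name FROM product ORDER BY category, price DESC"
).fetchall()

print(by_price_desc, by_category)
```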

<a name='ex12'></a>
### Exercise 12

Write an SQL query to get the number of films by category. Sort the results by the number of films in ascending order.

<details>
<summary>Expected Output</summary>

| **category** | **films** |
| ------------ | --------- |
| Music        | 51        |
| Horror       | 56        |
| Travel       | 57        |
| Classics     | 57        |
| Comedy       | 58        |
| Children     | 60        |
| Sci-Fi       | 61        |
| Games        | 61        |
| Drama        | 62        |
| New          | 63        |
| Action       | 64        |
| Animation    | 66        |
| Documentary  | 68        |
| Family       | 69        |
| Foreign      | 73        |
| Sports       | 74        |

</details>

In [None]:
%%sql
/*YOUR CODE HERE*/

<a name='4.5'></a>
### 4.5 - LIMIT

The `LIMIT` clause restricts the number of rows that the query returns.

```sql
SELECT
    *
FROM table1
LIMIT 1;
```
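
`LIMIT` is most useful together with `ORDER BY`: without an explicit order, which rows are returned is not guaranteed. A minimal sketch, using Python's `sqlite3` module as a stand-in for MySQL and a hypothetical `product` table, returns the two most expensive products and then the next page of two with `OFFSET` (MySQL accepts the same `LIMIT ... OFFSET ...` syntax).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("Lamp", 19.5), ("Desk", 120.0), ("Chair", 45.0), ("Rug", 60.0)],
)

# Top-N query: sort first, then keep only the first two rows.
top_two = conn.execute(
    "SELECT name, price FROM product ORDER BY price DESC LIMIT 2"
).fetchall()

# OFFSET skips rows before LIMIT is applied, which is handy for paging.
second_page = conn.execute(
    "SELECT name, price FROM product ORDER BY price DESC LIMIT 2 OFFSET 2"
).fetchall()

print(top_two, second_page)
```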

<a name='ex13'></a>
### Exercise 13

Write an SQL query to obtain the category with the highest number of films.

<details>
<summary>Expected Output</summary>

| **category** | **films** |
| ------------ | --------- |
| Sports       | 74        |

</details>

In [None]:
%%sql
/*YOUR CODE HERE*/

<a name='5'></a>
## 5 - Conclusion

During this lab, you've written several SQL queries to manipulate data in the DVD rental sample database. First, you created a table and performed the basic CRUD operations, including queries that gather general information about the business. Then, you built queries with more complex clauses to combine multiple tables and to aggregate, sort, and limit the results. Overall, you now have the basic knowledge to process data using SQL.
