<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/Logo blue_dark.png"  style="width:25px" align="right";/>
</div>

# Database views
© ExploreAI Academy

A view is a virtual table that represents data derived from one or more base tables. It does not store data itself; it is a saved query that can be used like a table. Views let you simplify complex queries, abstract the underlying data structure, and provide some security by exposing only specific columns or rows of data.
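
To make the "saved query" idea concrete, here is a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the `demo_countries` table and `demo_african_countries` view are hypothetical names used only for illustration.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_countries (Country_name TEXT, Region TEXT)")
conn.executemany(
    "INSERT INTO demo_countries VALUES (?, ?)",
    [("Angola", "Sub-Saharan Africa"), ("Kazakhstan", "Central and Southern Asia")],
)

# The view stores only the query definition, not the rows the query returns.
conn.execute(
    "CREATE VIEW demo_african_countries AS "
    "SELECT Country_name FROM demo_countries WHERE Region = 'Sub-Saharan Africa'"
)
rows_before = conn.execute(
    "SELECT * FROM demo_african_countries ORDER BY Country_name"
).fetchall()

# A row added to the base table afterwards shows up in the view automatically.
conn.execute("INSERT INTO demo_countries VALUES ('Benin', 'Sub-Saharan Africa')")
rows_after = conn.execute(
    "SELECT * FROM demo_african_countries ORDER BY Country_name"
).fetchall()

print(rows_before)
print(rows_after)
```

Because the view is re-evaluated each time it is queried, the second result includes the newly inserted country even though the view itself was never modified.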

In this lesson, we will construct a complex query and then create a view of that query.

> ⚠️ This notebook will not run on Google Colab because it cannot connect to a local database. Please make sure that this notebook is running on the same local machine as your MySQL Workbench installation and MySQL `united_nations` database.

## Learning objectives

By the end of this lesson, you should know:
- What a view is and why it's needed.
- How to create a view.
- How to join tables inside a view.
- How to impute Null values using views.

## Connecting to our MySQL database

Using our `Access_to_Basic_Services` table in the `united_nations` database that we created in MySQL Workbench, we want to answer some questions about our dataset. We can apply the same queries we used in MySQL Workbench in this notebook if we connect to our MySQL server by running the cells below.


In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that mysql and pymysql are installed correctly. 

%load_ext sql

In [2]:
# Establish a connection to the local database using the '%sql' magic command.
# The connection string has the form mysql+pymysql://user:password@host:port/db_name.
# Replace `123` with your own connection password and `united_nations` with your database name if it differs.
# If you get an error here, please make sure the database name and password are correct.

%sql mysql+pymysql://root:123@localhost:3306/united_nations


'Connected: root@united_nations'
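
If your password contains characters such as `@` or `/`, the connection string above can break, because those characters have a special meaning in a URL. The following is a small sketch of one way to build the string safely; `build_mysql_url` is a hypothetical helper written for this illustration, not part of any library, and the example password is made up.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host="localhost", port=3306, db_name="united_nations"):
    """Return a connection URL of the same shape as the one used above.

    quote_plus escapes characters such as '@' or '/' in the user name and password.
    """
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db_name}"


# Hypothetical password chosen only to show the escaping.
connection_url = build_mysql_url("root", "p@ss/word")
print(connection_url)
```

The resulting string has the same structure as the one passed to `%sql` in the cell above, with the special characters in the password percent-encoded.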


To run a query, put the `%%sql` magic command on the first line of a cell, leave one blank line, add the query as shown below, and run the cell.

In [3]:
%%sql

SELECT 
    *
FROM
    Access_to_Basic_Services
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Central and Southern Asia,Central Asia,Kazakhstan,2015,94.67,98.0,17.542806,184.39,2699700.0,4.93
Central and Southern Asia,Central Asia,Kazakhstan,2016,94.67,98.0,17.794055,137.28,2699700.0,4.96
Central and Southern Asia,Central Asia,Kazakhstan,2017,95.0,98.0,18.037776,166.81,2699700.0,4.9
Central and Southern Asia,Central Asia,Kazakhstan,2018,95.0,98.0,18.276452,179.34,2699700.0,4.85
Central and Southern Asia,Central Asia,Kazakhstan,2019,95.0,98.0,18.513673,181.67,2699700.0,4.8


## Exercise


We have decomposed the `united_nations.Access_to_Basic_Services` table into three tables: `united_nations.Basic_Services`, `united_nations.Economic_Indicators`, and `united_nations.Geographic_Location`. In this tutorial, we will construct a complex query to view the unemployment rate in Sub-Saharan Africa. We will then save that query as a VIEW.

### Task 1 
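To view the unemployment rate per region, we first have to join the `Economic_Indicators` table to the `Geographic_Location` table, because the region is stored in `Geographic_Location` and the unemployment rate in `Economic_Indicators`.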

In [3]:
%%sql

-- Select the necessary columns from the tables
SELECT
	loc.Country_name, -- Country name from the Geographic_Location table
	eco.Time_period, -- Time period from the Economic_Indicators table
	eco.Pct_unemployment -- Unemployment rate from the Economic_Indicators table
FROM
	united_nations.Geographic_Location AS loc -- Alias the Geographic_Location table as 'loc'
LEFT JOIN united_nations.Economic_Indicators AS eco -- Keep every row from 'loc' and attach matching rows from Economic_Indicators, aliased as 'eco'
ON eco.Country_name = loc.Country_name -- Join the tables on the Country_name column
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name,Time_period,Pct_unemployment
Afghanistan,2015,
Afghanistan,2016,
Afghanistan,2017,11.18
Afghanistan,2018,
Afghanistan,2019,
Afghanistan,2020,11.71
Algeria,2015,11.21
Algeria,2016,10.2
Algeria,2017,12.0
Algeria,2018,
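

The blank cells in the output above are Null values. A `LEFT JOIN` can also produce Nulls of its own: a country in the left table with no matching row in the right table is still returned, with Null in every right-hand column. The sketch below illustrates this, assuming Python's built-in `sqlite3` module as a stand-in for MySQL; the `demo_locations` and `demo_indicators` tables and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE demo_locations (Country_name TEXT, Region TEXT);
    CREATE TABLE demo_indicators (Country_name TEXT, Time_period INTEGER, Pct_unemployment REAL);
    INSERT INTO demo_locations VALUES
        ('Angola', 'Sub-Saharan Africa'),
        ('Benin', 'Sub-Saharan Africa');
    -- Benin deliberately has no row in demo_indicators.
    INSERT INTO demo_indicators VALUES ('Angola', 2019, 16.5);
""")

query = """
    SELECT loc.Country_name, eco.Time_period, eco.Pct_unemployment
    FROM demo_locations AS loc
    {join_type} JOIN demo_indicators AS eco
        ON eco.Country_name = loc.Country_name
    ORDER BY loc.Country_name
"""

# LEFT JOIN keeps Benin and fills the missing columns with NULL (None in Python).
left_rows = conn.execute(query.format(join_type="LEFT")).fetchall()
# INNER JOIN drops Benin because it has no match in demo_indicators.
inner_rows = conn.execute(query.format(join_type="INNER")).fetchall()

print(left_rows)
print(inner_rows)
```

This is why the exercise uses a `LEFT JOIN` starting from the locations table: no country is silently dropped from the result.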


### Task 2

After joining our tables, we can combine the unemployment rate and country from the query with the region stored in `Geographic_Location`. This allows us to filter for countries in Sub-Saharan Africa. Use the `WHERE` clause to restrict the results to the `Sub-Saharan Africa` region only. The table should include the following columns: `Country_name`, `Time_period`, and `Pct_unemployment`.

In [10]:
%%sql

-- Select the necessary columns from the tables
SELECT
	loc.Country_name, -- Country name from the Geographic_Location table
	eco.Time_period, -- Time period from the Economic_Indicators table
	eco.Pct_unemployment -- Unemployment rate from the Economic_Indicators table
FROM
	united_nations.Geographic_Location AS loc -- Alias the Geographic_Location table as 'loc'
LEFT JOIN united_nations.Economic_Indicators AS eco -- Keep every row from 'loc' and attach matching rows from Economic_Indicators, aliased as 'eco'
ON eco.Country_name = loc.Country_name -- Join the tables on the Country_name column
WHERE loc.Region = 'Sub-Saharan Africa' -- Filter the results for 'Sub-Saharan Africa' region
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name,Time_period,Pct_unemployment
Angola,2015,
Angola,2016,
Angola,2017,
Angola,2018,
Angola,2019,16.5
Angola,2020,
Benin,2015,
Benin,2016,
Benin,2017,
Benin,2018,1.47


### Task 3

After filtering, we notice that there are Null values in the `Pct_unemployment` column. Fill in the Null values with 33.65 and name the imputed column `Pct_unemployment_imputed`.

In [11]:
%%sql

-- Select the necessary columns from the tables
SELECT
	loc.Country_name, -- Country name from the Geographic_Location table
	eco.Time_period, -- Time period from the Economic_Indicators table
	IFNULL(eco.Pct_unemployment, 33.65) AS Pct_unemployment_imputed -- Replace Null values with 33.65 using IFNULL and rename the column
FROM
	united_nations.Geographic_Location AS loc -- Alias the Geographic_Location table as 'loc'
LEFT JOIN united_nations.Economic_Indicators AS eco -- Keep every row from 'loc' and attach matching rows from Economic_Indicators, aliased as 'eco'
ON eco.Country_name = loc.Country_name -- Join the tables on the Country_name column
WHERE loc.Region = 'Sub-Saharan Africa' -- Filter the results for 'Sub-Saharan Africa' region
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name,Time_period,Pct_unemployment_imputed
Angola,2015,33.65
Angola,2016,33.65
Angola,2017,33.65
Angola,2018,33.65
Angola,2019,16.5
Angola,2020,33.65
Benin,2015,33.65
Benin,2016,33.65
Benin,2017,33.65
Benin,2018,1.47
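

`IFNULL(value, fallback)` returns `value` unless it is Null, in which case it returns `fallback`. The standard SQL function `COALESCE` does the same job and accepts more than two arguments. The sketch below compares the two, assuming Python's built-in `sqlite3` module as a stand-in for MySQL (both functions exist in each); the `demo_indicators` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE demo_indicators "
    "(Country_name TEXT, Time_period INTEGER, Pct_unemployment REAL)"
)
conn.executemany(
    "INSERT INTO demo_indicators VALUES (?, ?, ?)",
    [("Angola", 2018, None), ("Angola", 2019, 16.5)],  # None is stored as NULL
)

# IFNULL and COALESCE both substitute 33.65 wherever the value is NULL.
imputed_rows = conn.execute("""
    SELECT
        Time_period,
        IFNULL(Pct_unemployment, 33.65)   AS imputed_with_ifnull,
        COALESCE(Pct_unemployment, 33.65) AS imputed_with_coalesce
    FROM demo_indicators
    ORDER BY Time_period
""").fetchall()

# Imputing in a SELECT does not change the stored data: the NULL is still there.
null_count = conn.execute(
    "SELECT COUNT(*) FROM demo_indicators WHERE Pct_unemployment IS NULL"
).fetchone()[0]

print(imputed_rows)
print(null_count)
```

Note that the imputation happens only in the query result; the base table keeps its Null values, which is what makes it safe to put inside a view.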


### Task 4

Now that we have constructed the query, let's create a VIEW of it. Name the VIEW `united_nations.Country_Unemployment_Rate`.

In [5]:
%%sql
-- Create a VIEW named 'united_nations.Country_Unemployment_Rate'
CREATE VIEW
	united_nations.Country_Unemployment_Rate
AS
-- Select the necessary columns from the tables
SELECT
	loc.Country_name, -- Country name from the Geographic_Location table
	eco.Time_period, -- Time period from the Economic_Indicators table
	IFNULL(eco.Pct_unemployment, 33.65) AS Pct_unemployment_imputed -- Replace Null values with 33.65 using IFNULL and rename the column
FROM
	united_nations.Geographic_Location AS loc -- Alias the Geographic_Location table as 'loc'
LEFT JOIN united_nations.Economic_Indicators AS eco -- Keep every row from 'loc' and attach matching rows from Economic_Indicators, aliased as 'eco'
ON eco.Country_name = loc.Country_name -- Join the tables on the Country_name column
WHERE loc.Region = 'Sub-Saharan Africa'; -- Filter for the 'Sub-Saharan Africa' region; no LIMIT here, so the view is not capped at 10 rows

 * mysql+pymysql://root:***@localhost:3306/united_nations
(pymysql.err.OperationalError) (1050, "Table 'Country_Unemployment_Rate' already exists")
[SQL: -- Create a VIEW named 'united_nations.Country_Unemployment_Rate'
CREATE VIEW
	united_nations.Country_Unemployment_Rate
AS
-- Select the necessary columns from the tables
SELECT
	loc.Country_name, -- Select the country name from the Geographic_Locations table
	eco.Time_period, -- Select the time period from the Economic_Indicators table
	IFNULL(eco.Pct_unemployment, 33.65) as Pct_unemployment_imputed -- Fill in the Null values with 33.65 using IFNULL function and rename the column
FROM
	united_nations.Geographic_Location as loc -- Alias the Geographic table as 'loc'
LEFT JOIN united_nations.Economic_Indicators as eco -- Perform a LEFT JOIN between the Economic_Indicators table and the Geographic_Location table, aliasing the Economic_Indicators table as 'eco'
ON eco.Country_name = loc.Country_name -- Join the tables based on the Countr
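

The error above (code 1050) appears when the cell is run a second time: the view was already created by the first run. One common way around this is to drop the view before re-creating it with `DROP VIEW IF EXISTS`; MySQL also offers `CREATE OR REPLACE VIEW` for the same purpose. The sketch below shows the drop-and-recreate pattern, assuming Python's built-in `sqlite3` module as a stand-in for MySQL (SQLite does not support `CREATE OR REPLACE VIEW`); the `demo_` names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE demo_indicators (Country_name TEXT, Pct_unemployment REAL)")
conn.execute("INSERT INTO demo_indicators VALUES ('Angola', NULL)")

create_view = (
    "CREATE VIEW demo_rate AS "
    "SELECT Country_name, IFNULL(Pct_unemployment, 33.65) AS Pct_unemployment_imputed "
    "FROM demo_indicators"
)

conn.execute(create_view)  # First run: the view is created.

# Second run of the same statement fails because the view already exists.
try:
    conn.execute(create_view)
    second_create_failed = False
except sqlite3.OperationalError:
    second_create_failed = True

# Dropping the view first makes the cell safe to re-run.
conn.execute("DROP VIEW IF EXISTS demo_rate")
conn.execute(create_view)

rows = conn.execute("SELECT * FROM demo_rate").fetchall()
print(second_create_failed)
print(rows)
```

Dropping a view never touches the base tables, since the view holds no data of its own.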

### Task 5

Finally, let's have a look at the VIEW.


In [15]:
%%sql

SELECT *
FROM
    united_nations.Country_Unemployment_Rate
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name,Time_period,Pct_unemployment_imputed
Angola,2015,33.65
Angola,2016,33.65
Angola,2017,33.65
Angola,2018,33.65
Angola,2019,16.5
Angola,2020,33.65
Benin,2015,33.65
Benin,2016,33.65
Benin,2017,33.65
Benin,2018,1.47
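

Once created, a view can be filtered, grouped, and sorted like any other table. The sketch below illustrates this, assuming Python's built-in `sqlite3` module as a stand-in for MySQL; the `demo_` tables, the `demo_country_unemployment_rate` view, and the rows are made up to mirror the shape of the exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE demo_locations (Country_name TEXT, Region TEXT);
    CREATE TABLE demo_indicators (Country_name TEXT, Time_period INTEGER, Pct_unemployment REAL);
    INSERT INTO demo_locations VALUES
        ('Angola', 'Sub-Saharan Africa'),
        ('Benin', 'Sub-Saharan Africa');
    INSERT INTO demo_indicators VALUES
        ('Angola', 2018, NULL), ('Angola', 2019, 16.5),
        ('Benin', 2018, 1.47),  ('Benin', 2019, NULL);

    CREATE VIEW demo_country_unemployment_rate AS
    SELECT
        loc.Country_name,
        eco.Time_period,
        IFNULL(eco.Pct_unemployment, 33.65) AS Pct_unemployment_imputed
    FROM demo_locations AS loc
    LEFT JOIN demo_indicators AS eco
        ON eco.Country_name = loc.Country_name
    WHERE loc.Region = 'Sub-Saharan Africa';
""")

# Filter the view with WHERE, exactly as you would a base table.
low_unemployment = conn.execute("""
    SELECT Country_name, Time_period
    FROM demo_country_unemployment_rate
    WHERE Pct_unemployment_imputed < 20
    ORDER BY Country_name
""").fetchall()

# Aggregate over the view with GROUP BY.
rows_per_country = conn.execute("""
    SELECT Country_name, COUNT(*)
    FROM demo_country_unemployment_rate
    GROUP BY Country_name
    ORDER BY Country_name
""").fetchall()

print(low_unemployment)
print(rows_per_country)
```

The join, filter, and imputation logic lives in one place (the view), so the follow-up queries stay short.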


## Solutions

### Task 1

In [16]:
%%sql

-- Add a LIMIT clause if you think the result set will be too large
SELECT 
    loc.Country_name,
    eco.Time_period,
    eco.Pct_unemployment
FROM 
    united_nations.Geographic_Location AS loc
LEFT JOIN united_nations.Economic_Indicators AS eco
ON eco.Country_name = loc.Country_name;


 * mysql+pymysql://root:***@localhost:3306/united_nations
1048 rows affected.

### Task 2

In [17]:
%%sql

SELECT 
    loc.Country_name,
    eco.Time_period,
    eco.Pct_unemployment
FROM 
    united_nations.Geographic_Location as loc
LEFT JOIN united_nations.Economic_Indicators as eco
ON eco.Country_name = loc.Country_name
WHERE loc.Region = 'Sub-Saharan Africa';



 * mysql+pymysql://root:***@localhost:3306/united_nations
297 rows affected.


Country_name,Time_period,Pct_unemployment
Angola,2015,
Angola,2016,
Angola,2017,
Angola,2018,
Angola,2019,16.5
Angola,2020,
Benin,2015,
Benin,2016,
Benin,2017,
Benin,2018,1.47


### Task 3

In [18]:
%%sql

SELECT 
    loc.Country_name,
    eco.Time_period,
    IFNULL(eco.Pct_unemployment, 33.65) AS Pct_unemployment_imputed
FROM 
    united_nations.Geographic_Location AS loc
LEFT JOIN united_nations.Economic_Indicators AS eco
ON eco.Country_name = loc.Country_name
WHERE loc.Region = 'Sub-Saharan Africa';


 * mysql+pymysql://root:***@localhost:3306/united_nations
297 rows affected.


Country_name,Time_period,Pct_unemployment_imputed
Angola,2015,33.65
Angola,2016,33.65
Angola,2017,33.65
Angola,2018,33.65
Angola,2019,16.5
Angola,2020,33.65
Benin,2015,33.65
Benin,2016,33.65
Benin,2017,33.65
Benin,2018,1.47


### Task 4



In [20]:
%%sql

CREATE VIEW united_nations.Country_Unemployment_Rate
AS

SELECT 
    loc.Country_name,
    eco.Time_period,
    IFNULL(eco.Pct_unemployment, 33.65) AS Pct_unemployment_imputed
FROM 
    united_nations.Geographic_Location AS loc
LEFT JOIN united_nations.Economic_Indicators AS eco
ON eco.Country_name = loc.Country_name
WHERE loc.Region = 'Sub-Saharan Africa';




 * mysql+pymysql://root:***@localhost:3306/united_nations
(pymysql.err.OperationalError) (1050, "Table 'Country_Unemployment_Rate' already exists")
[SQL: CREATE VIEW united_nations.Country_Unemployment_Rate
AS

SELECT 
    loc.Country_name,
    eco.Time_period,
    IFNULL (eco.Pct_unemployment,33.65) as PCT_unemployment_imputed
FROM 
    united_nations.Geographic_Location as loc
LEFT JOIN united_nations.Economic_Indicators as eco
ON eco.Country_name = loc.Country_name
WHERE REGION = 'Sub-Saharan Africa';]
(Background on this error at: https://sqlalche.me/e/20/e3q8)


### Task 5


In [21]:
%%sql

SELECT *
FROM
    united_nations.Country_Unemployment_Rate
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name,Time_period,Pct_unemployment_imputed
Angola,2015,33.65
Angola,2016,33.65
Angola,2017,33.65
Angola,2018,33.65
Angola,2019,16.5
Angola,2020,33.65
Benin,2015,33.65
Benin,2016,33.65
Benin,2017,33.65
Benin,2018,1.47


## Summary

 


In this notebook, we joined the `Economic_Indicators` table to the `Geographic_Location` table. This allowed us to filter our data by `Region`, specifically the `Sub-Saharan Africa` region. We then imputed the Null values in the `Pct_unemployment` column. Finally, we saved this complex query as a VIEW named `united_nations.Country_Unemployment_Rate`.




<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>