# 10 - Using Views to Simplify Queries

One of the beautiful aspects of the relational data model and SQL is that the output of a query is also a table (a relation, to be precise). It may consist of a single column or a single row, but it is a table nonetheless. A **view** is a stored query that can be used like a table. It can be thought of as a virtual table that holds no data of its own, only the query. Every time a view is accessed, the underlying query is run and the returned results can be used as though they made up an actual table.

There are several reasons for using views. The most obvious one is that we are lazy and do not want to write the same long, complicated queries every time. I am kidding. However, keep in mind the **DRY programming principle**: Don't Repeat Yourself. Avoiding repetition saves time and prevents unnecessary mistakes. This is one of the right reasons to save queries as reusable database views.

In MySQL, which this notebook connects to, views are created using the **CREATE VIEW** statement. Views can be created from a single table, multiple tables, or another view. The optional OR REPLACE clause overwrites an existing view of the same name. Following is the basic CREATE VIEW syntax:
```
CREATE [OR REPLACE] VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE [condition];
```
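To see the statement in action without a database server, here is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database. It is only a stand-in for the MySQL connection used in this notebook; the basic `CREATE VIEW ... AS SELECT` form is the same in both. The view name `large_countries` and the population threshold are made up for illustration, and the three rows are taken from the country table.

```python
import sqlite3

# In-memory SQLite database: a stand-in so the example runs without a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Name TEXT, Continent TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [
        ("Aruba", "North America", 103000),
        ("Afghanistan", "Asia", 22720000),
        ("Angola", "Africa", 12878000),
    ],
)

# The view stores only the SELECT statement, not a copy of the rows.
# 'large_countries' is a hypothetical name used for this sketch.
conn.execute(
    """
    CREATE VIEW large_countries AS
    SELECT Name, Population
    FROM country
    WHERE Population > 1000000
    """
)

# A view is queried exactly like a table.
rows = conn.execute("SELECT Name FROM large_countries ORDER BY Name").fetchall()
print(rows)

# The SQLite catalog records the object as a view, not as a table.
kind = conn.execute(
    "SELECT type FROM sqlite_master WHERE name = 'large_countries'"
).fetchone()[0]
print(kind)
```

Only the two countries above the threshold come back, and the catalog lists `large_countries` with type `view`.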

In [1]:
import pandas as pd
import mysql.connector as sql
import os

In [2]:
connection = sql.connect(
    host = os.environ.get('mysql_host'),
    user = os.environ.get('mysql_user'),
    password = os.environ.get('mysql_password')
)

cursor = connection.cursor()

If you do not remember the tables in the world database, you can always list them with the following command.

In [3]:
pd.read_sql_query("""
    SHOW TABLES
    FROM world
    """,
    connection)

Unnamed: 0,Tables_in_world
0,city
1,country
2,country_decades
3,countrylanguage


## 1. Simplifying queries with views
In the previous notebook, we used CASE and a subquery to calculate the average population from the country table. Here we use a view to simplify the calculation.

### 1.1 Recap: calculating avg(population), ordered by population

In [6]:
pd.read_sql_query("""
    SELECT c.Name, c.Continent, c.Region, c.Decade, AVG(c.Population) AS avg_pop
    FROM (
        SELECT w.Name, w.Continent, w.Region,
        CASE
            WHEN (IndepYear) between 1900 AND 1909 THEN '00s'
            WHEN (IndepYear) between 1910 AND 1919 THEN '10s'
            WHEN (IndepYear) between 1920 AND 1929 THEN '20s'
            WHEN (IndepYear) between 1930 AND 1939 THEN '30s'
            WHEN (IndepYear) between 1940 AND 1949 THEN '40s'
            WHEN (IndepYear) between 1950 AND 1959 THEN '50s'
            WHEN (IndepYear) between 1960 AND 1969 THEN '60s'
            WHEN (IndepYear) between 1970 AND 1979 THEN '70s'
            WHEN (IndepYear) between 1980 AND 1989 THEN '80s'
            WHEN (IndepYear) between 1990 AND 1999 THEN '90s'
            ELSE 'Other'
        END Decade,
        Population
        FROM world.country w) c
    GROUP BY c.Name, c.Continent, c.Region, c.Decade
    ORDER BY avg_pop DESC
    """,
    connection)

Unnamed: 0,Name,Continent,Region,Decade,avg_pop
0,China,Asia,Eastern Asia,Other,1.277558e+09
1,India,Asia,Southern and Central Asia,40s,1.013662e+09
2,United States,North America,North America,Other,2.783570e+08
3,Indonesia,Asia,Southeast Asia,40s,2.121070e+08
4,Brazil,South America,South America,Other,1.701150e+08
...,...,...,...,...,...
234,Bouvet Island,Antarctica,Antarctica,Other,0.000000e+00
235,Heard Island and McDonald Islands,Antarctica,Antarctica,Other,0.000000e+00
236,British Indian Ocean Territory,Africa,Eastern Africa,Other,0.000000e+00
237,South Georgia and the South Sandwich Islands,Antarctica,Antarctica,Other,0.000000e+00


### 1.2 Creating a view

In [7]:
# CREATE VIEW returns no result set, so run it through the cursor.
# pd.read_sql_query expects rows back and raises
# "TypeError: 'NoneType' object is not iterable" on a statement like this.
cursor.execute("""
    CREATE
        OR REPLACE
        VIEW world.country_decades AS
        SELECT w.Name, w.Continent, w.Region,
            CASE
                WHEN (w.IndepYear) between 1900 AND 1909 THEN '00s'
                WHEN (w.IndepYear) between 1910 AND 1919 THEN '10s'
                WHEN (w.IndepYear) between 1920 AND 1929 THEN '20s'
                WHEN (w.IndepYear) between 1930 AND 1939 THEN '30s'
                WHEN (w.IndepYear) between 1940 AND 1949 THEN '40s'
                WHEN (w.IndepYear) between 1950 AND 1959 THEN '50s'
                WHEN (w.IndepYear) between 1960 AND 1969 THEN '60s'
                WHEN (w.IndepYear) between 1970 AND 1979 THEN '70s'
                WHEN (w.IndepYear) between 1980 AND 1989 THEN '80s'
                WHEN (w.IndepYear) between 1990 AND 1999 THEN '90s'
                ELSE 'Other'
            END Decade,
            w.Population
            FROM world.country w
    """)

Let's query the country_decades view.

In [10]:
pd.read_sql_query("""
    SELECT *
    FROM world.country_decades
    LIMIT 5
    """,
    connection)

Unnamed: 0,Name,Continent,Region,Decade,Population
0,Aruba,North America,Caribbean,Other,103000
1,Afghanistan,Asia,Southern and Central Asia,10s,22720000
2,Angola,Africa,Central Africa,70s,12878000
3,Anguilla,North America,Caribbean,Other,8000
4,Albania,Europe,Southern Europe,10s,3401200


### 1.3 Recalculate avg_pop with views
The code really gets shorter.

In [12]:
pd.read_sql_query("""
    SELECT Name, Continent, Region, Decade, AVG(Population) AS avg_pop
    FROM world.country_decades
    GROUP BY Name, Continent, Region, Decade
    ORDER BY avg_pop DESC
    """,
    connection)

Unnamed: 0,Name,Continent,Region,Decade,avg_pop
0,China,Asia,Eastern Asia,Other,1.277558e+09
1,India,Asia,Southern and Central Asia,40s,1.013662e+09
2,United States,North America,North America,Other,2.783570e+08
3,Indonesia,Asia,Southeast Asia,40s,2.121070e+08
4,Brazil,South America,South America,Other,1.701150e+08
...,...,...,...,...,...
234,Bouvet Island,Antarctica,Antarctica,Other,0.000000e+00
235,Heard Island and McDonald Islands,Antarctica,Antarctica,Other,0.000000e+00
236,British Indian Ocean Territory,Africa,Eastern Africa,Other,0.000000e+00
237,South Georgia and the South Sandwich Islands,Antarctica,Antarctica,Other,0.000000e+00
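Because the decade logic now lives in the view, other questions can reuse it without repeating the CASE expression. The sketch below is a minimal illustration rather than part of the original workflow: it uses Python's built-in `sqlite3` module with an in-memory database instead of the MySQL connection, loads only the five countries shown in the view preview above, assumes IndepYear values consistent with the decades in that preview, and abbreviates the CASE expression to the two decades those rows need.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL 'world' database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Name TEXT, IndepYear INTEGER, Population INTEGER)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [
        # IndepYear values are assumptions chosen to match the decades shown above.
        ("Aruba", None, 103000),
        ("Afghanistan", 1919, 22720000),
        ("Angola", 1975, 12878000),
        ("Anguilla", None, 8000),
        ("Albania", 1912, 3401200),
    ],
)

# Abbreviated version of the country_decades view: only two decades plus 'Other'.
conn.execute(
    """
    CREATE VIEW country_decades AS
    SELECT Name,
        CASE
            WHEN IndepYear BETWEEN 1910 AND 1919 THEN '10s'
            WHEN IndepYear BETWEEN 1970 AND 1979 THEN '70s'
            ELSE 'Other'
        END AS Decade,
        Population
    FROM country
    """
)

# A different aggregation reuses the view; the CASE logic is not repeated.
per_decade = conn.execute(
    """
    SELECT Decade, COUNT(*) AS n_countries, AVG(Population) AS avg_pop
    FROM country_decades
    GROUP BY Decade
    ORDER BY Decade
    """
).fetchall()

for decade, n_countries, avg_pop in per_decade:
    print(decade, n_countries, avg_pop)
```

Grouping by Decade alone averages over several countries per group, which the per-country query above does not do.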


### 1.4 Deleting views
It is quite easy to delete a view. Just drop it as follows.

In [18]:
# DROP VIEW returns no result set, so run it through the cursor
# (pd.read_sql_query raises "TypeError: 'NoneType' object is not iterable" here).
cursor.execute("""
    DROP VIEW IF EXISTS
        world.country_decades""")

## Summary
Views are virtual tables that do not hold data, only SQL statements. Those statements are executed each time the view is accessed. Because views are evaluated dynamically as they are accessed, the data in them is always fresh and up to date, which gives them an advantage over creating subtables from a table. The data in a subtable is static and can become out of date.
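The difference between a view and a copied subtable can be checked directly. The following sketch again assumes Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL; the names `country_view` and `country_copy` and the updated population figure are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Name TEXT, Population INTEGER)")
conn.execute("INSERT INTO country VALUES ('Aruba', 103000)")

# A view re-runs its query on every access ...
conn.execute("CREATE VIEW country_view AS SELECT Name, Population FROM country")
# ... whereas a table built from a query copies the rows once.
conn.execute("CREATE TABLE country_copy AS SELECT Name, Population FROM country")

# Change the base table after both objects exist (the new value is made up).
conn.execute("UPDATE country SET Population = 110000 WHERE Name = 'Aruba'")

from_view = conn.execute("SELECT Population FROM country_view").fetchone()[0]
from_copy = conn.execute("SELECT Population FROM country_copy").fetchone()[0]

print(from_view)  # reflects the update to the base table
print(from_copy)  # still holds the value from when the copy was made
```

The view follows the base table, while the copy has to be rebuilt by hand to catch up.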

Views are useful in several cases:

- First, views provide an abstraction layer over tables. You can add and remove columns in the view without touching the schema of the underlying tables.
- Second, you can use views to encapsulate complex queries with joins to simplify data access.
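As a small illustration of the abstraction-layer point, the sketch below redefines a view to expose an extra column while the base table stays unchanged. It assumes Python's built-in `sqlite3` module with an in-memory database rather than MySQL, and the view name `country_summary` and the helper `columns` are hypothetical. SQLite has no `CREATE OR REPLACE VIEW`, so the view is dropped and recreated; in MySQL the same step could be a single `CREATE OR REPLACE VIEW`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Name TEXT, Continent TEXT, Population INTEGER)")
conn.execute("INSERT INTO country VALUES ('Albania', 'Europe', 3401200)")

# First version of the view exposes two columns.
conn.execute("CREATE VIEW country_summary AS SELECT Name, Population FROM country")


def columns(relation):
    """Return the column names of a table or view (hypothetical helper)."""
    cur = conn.execute(f"SELECT * FROM {relation}")
    return [d[0] for d in cur.description]


table_before = columns("country")
view_before = columns("country_summary")

# Redefine the view with an extra column; the base table is not altered.
conn.execute("DROP VIEW IF EXISTS country_summary")
conn.execute(
    "CREATE VIEW country_summary AS SELECT Name, Continent, Population FROM country"
)

view_after = columns("country_summary")
table_after = columns("country")

print(view_before, "->", view_after)
print(table_before == table_after)
```

Queries written against `country_summary` see the new column, and nothing about the `country` table itself had to change.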

# References
- [Chonghua Yin notebook](https://github.com/royalosyin/Practice-SQL-with-SQLite-and-Jupyter-Notebook/blob/master/ex10-Using%20Views%20to%20Simplify%20Queries.ipynb)