
# Lecture 4 - Viewing - CS50 SQL Harvard


## Views

A view is a virtual table defined by a query.

Views are useful for:


* simplifying: putting together data from different tables to be queried more simply

* aggregating: running aggregate functions, like finding the sum, and exposing the results as a table that can be queried

* partitioning: dividing data into logical pieces

* securing: hiding columns that should be kept secure



In [1]:
# Let's import the required libraries
import sqlite3
import pandas as pd

# Let's connect to the SQLite database used in CS50
conn = sqlite3.connect("longlist.db")

In [5]:
df=pd.read_sql_query(
    """
    SELECT *
    FROM sqlite_master;
    """,
    conn
)
df

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,authors,authors,2,"CREATE TABLE ""authors"" (\n ""id"" INTEGER,\n ..."
1,table,authored,authored,3,"CREATE TABLE ""authored"" (\n ""author_id"" INT..."
2,table,books,books,4,"CREATE TABLE ""books"" (\n ""id"" INTEGER,\n ..."
3,table,publishers,publishers,5,"CREATE TABLE ""publishers"" (\n ""id"" INTEGER,..."
4,table,ratings,ratings,6,"CREATE TABLE ""ratings"" (\n ""book_id"" INTEGE..."
5,table,translators,translators,7,"CREATE TABLE ""translators"" (\n ""id"" INTEGER..."
6,table,translated,translated,8,"CREATE TABLE ""translated"" (\n ""translator_i..."


In [10]:
# Print the full CREATE statement of every object in the schema
for statement in df["sql"]:
    print(statement)

CREATE TABLE "authors" (
    "id" INTEGER,
    "name" TEXT,
    "country" TEXT,
    "birth" INTEGER,
    PRIMARY KEY("id")
)
CREATE TABLE "authored" (
    "author_id" INTEGER,
    "book_id" INTEGER,
    FOREIGN KEY("author_id") REFERENCES "authors"("id"),
    FOREIGN KEY("book_id") REFERENCES "books"("id")
)
CREATE TABLE "books" (
    "id" INTEGER,
    "isbn" TEXT,
    "title" TEXT,
    "publisher_id" INTEGER,
    "format" TEXT,
    "pages" INTEGER,
    "published" TEXT,
    "year" INTEGER,
    PRIMARY KEY("id"),
    FOREIGN KEY("publisher_id") REFERENCES "publishers"("id")
)
CREATE TABLE "publishers" (
    "id" INTEGER,
    "publisher" TEXT,
    PRIMARY KEY("id")
)
CREATE TABLE "ratings" (
    "book_id" INTEGER,
    "rating" INTEGER,
    FOREIGN KEY("book_id") REFERENCES "books"("id")
)
CREATE TABLE "translators" (
    "id" INTEGER,
    "name" TEXT,
    PRIMARY KEY("id")
)
CREATE TABLE "translated" (
    "translator_id" INTEGER,
    "book_id" INTEGER,
    FOREIGN KEY("translator_id") 

## Simplifying

To select the books written by Fernanda Melchor, we would write this nested query.

In [None]:
#books written by Fernanda Melchor
pd.read_sql_query(
"""
SELECT "title"
FROM "books"
WHERE "id" IN (
    SELECT "book_id"
    FROM "authored"
    WHERE "author_id" = (
        SELECT "id"
        FROM "authors"
        WHERE "name" = 'Fernanda Melchor'
    )
);
""",
conn
)

Unnamed: 0,title
0,Paradais
1,Hurricane Season


To simplify this, let us first use JOIN to build a result set that pairs every author with their books. This is the query that will later define the view.

In [16]:
joined_tables=pd.read_sql_query(
"""
SELECT "name", "title" FROM "authors"
JOIN "authored" ON "authors"."id" = "authored"."author_id"
JOIN "books" ON "books"."id" = "authored"."book_id";
""",
conn
)
joined_tables

Unnamed: 0,name,title
0,Eva Baltasar,Boulder
1,Cheon Myeong-Kwan,Whale
2,Maryse Condé,The Gospel According to the New World
3,Gauz,Standing Heavy
4,Georgi Gospodinov,Time Shelter
...,...,...
73,Han Kang,The White Book
74,László Krasznahorkai,The World Goes On
75,Virginie Despentes,Vernon Subutex 1
76,Ariana Harwicz,"Die, My Love"


## CREATE VIEW ___ AS + DROP VIEW IF EXISTS


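To save the virtual table created in the previous step as a view, we prefix the query with `CREATE VIEW ... AS`. Running `DROP VIEW IF EXISTS` first lets the cell be re-run without failing because the view already exists.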

In [23]:
conn.execute('''
DROP VIEW IF EXISTS "view_author_and_title";
''')

conn.execute('''
CREATE VIEW "view_author_and_title" AS
SELECT "authors"."name", "books"."title"
FROM "authors"
JOIN "authored" ON "authors"."id" = "authored"."author_id"
JOIN "books" ON "books"."id" = "authored"."book_id";
''')

conn.commit()
pd.read_sql_query(
    """
    SELECT *
    FROM sqlite_master;
    """,
    conn
)


Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,authors,authors,2,"CREATE TABLE ""authors"" (\n ""id"" INTEGER,\n ..."
1,table,authored,authored,3,"CREATE TABLE ""authored"" (\n ""author_id"" INT..."
2,table,books,books,4,"CREATE TABLE ""books"" (\n ""id"" INTEGER,\n ..."
3,table,publishers,publishers,5,"CREATE TABLE ""publishers"" (\n ""id"" INTEGER,..."
4,table,ratings,ratings,6,"CREATE TABLE ""ratings"" (\n ""book_id"" INTEGE..."
5,table,translators,translators,7,"CREATE TABLE ""translators"" (\n ""id"" INTEGER..."
6,table,translated,translated,8,"CREATE TABLE ""translated"" (\n ""translator_i..."
7,view,view_author_and_title,view_author_and_title,0,"CREATE VIEW ""view_author_and_title"" AS\nSELECT..."


Using this view, we can considerably simplify the query needed to find the books written by Fernanda Melchor.

In [25]:
simplified_query_with_view = pd.read_sql_query(
'''
SELECT "title"
FROM "view_author_and_title"
WHERE "name" = 'Fernanda Melchor';
''',
conn
)
simplified_query_with_view


Unnamed: 0,title
0,Paradais
1,Hurricane Season


A view, being a virtual table, takes up very little additional disk space: the database stores only the query that defines it. The data is still stored in the underlying tables and is simply made accessible through this simplified view.
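As a minimal sketch of this behaviour, the snippet below uses an in-memory database with a toy `books` table (not `longlist.db`) and a hypothetical view named `recent_titles`. It suggests that a view re-runs its query on every read, so rows added to the underlying table afterwards show up without recreating the view.

```python
import sqlite3

# In-memory database with a toy table; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT, "year" INTEGER)'
)
conn.executemany(
    'INSERT INTO "books" ("title", "year") VALUES (?, ?)',
    [("Boulder", 2023), ("Whale", 2023)],
)

# "recent_titles" is a hypothetical view name; only its query is stored.
conn.execute(
    'CREATE VIEW "recent_titles" AS SELECT "title" FROM "books" WHERE "year" = 2023'
)


def read_view(connection):
    """Return the titles currently visible through the view, sorted by title."""
    rows = connection.execute('SELECT "title" FROM "recent_titles" ORDER BY "title"')
    return [row[0] for row in rows]


before = read_view(conn)

# Add a row to the underlying table after the view was created.
conn.execute(
    'INSERT INTO "books" ("title", "year") VALUES (?, ?)', ("Time Shelter", 2023)
)

after = read_view(conn)

print(before)
print(after)
```

The second read includes the new title even though the view itself was never touched.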


## Aggregating

In lecture 1, we saw how to find the average rating of every book, rounded to 2 decimal places.

In [26]:
pd.read_sql_query(
"""
SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating" 
FROM "ratings"
GROUP BY "book_id";
""",
conn
)

Unnamed: 0,book_id,average rating
0,1,3.77
1,2,3.97
2,3,3.04
3,4,3.57
4,5,4.06
...,...,...
73,74,3.83
74,75,3.77
75,76,3.89
76,77,3.54


We could also display the title and year of every book by joining "ratings" with "books".

> **Note:** SQL clauses must be written in a specific syntactic order. The correct order is:
>
> `SELECT → FROM → JOIN → WHERE → GROUP BY → HAVING → ORDER BY → LIMIT`

In [30]:
pd.read_sql_query(
"""
SELECT 
    "book_id", 
    "title", 
    "year", 
    ROUND(AVG("rating"), 2) AS "average rating" 
FROM "ratings"
JOIN "books" ON "ratings"."book_id" = "books"."id"
GROUP BY "book_id";
""",
conn
)

Unnamed: 0,book_id,title,year,average rating
0,1,Boulder,2023,3.77
1,2,Whale,2023,3.97
2,3,The Gospel According to the New World,2023,3.04
3,4,Standing Heavy,2023,3.57
4,5,Time Shelter,2023,4.06
...,...,...,...,...
73,74,The White Book,2018,3.83
74,75,The World Goes On,2018,3.77
75,76,Vernon Subutex 1,2018,3.89
76,77,"Die, My Love",2018,3.54


This aggregating query can be saved as a view:

In [57]:
conn.execute('''
DROP VIEW IF EXISTS "view_book_ratings";
''')

conn.execute('''
CREATE VIEW "view_book_ratings" AS
SELECT 
    "book_id" AS "id", 
    "title", 
    "year", 
    ROUND(AVG("rating"), 2) AS "average rating" 
FROM "ratings"
JOIN "books" ON "ratings"."book_id" = "books"."id"
GROUP BY "book_id";
''')

conn.commit()
df=pd.read_sql_query(
    """
    SELECT *
    FROM sqlite_master;
    """,
    conn
)
df

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,authors,authors,2,"CREATE TABLE ""authors"" (\n ""id"" INTEGER,\n ..."
1,table,authored,authored,3,"CREATE TABLE ""authored"" (\n ""author_id"" INT..."
2,table,books,books,4,"CREATE TABLE ""books"" (\n ""id"" INTEGER,\n ..."
3,table,publishers,publishers,5,"CREATE TABLE ""publishers"" (\n ""id"" INTEGER,..."
4,table,ratings,ratings,6,"CREATE TABLE ""ratings"" (\n ""book_id"" INTEGE..."
5,table,translators,translators,7,"CREATE TABLE ""translators"" (\n ""id"" INTEGER..."
6,table,translated,translated,8,"CREATE TABLE ""translated"" (\n ""translator_i..."
7,view,view_author_and_title,view_author_and_title,0,"CREATE VIEW ""view_author_and_title"" AS\nSELECT..."
8,view,view_book_ratings,view_book_ratings,0,"CREATE VIEW ""view_book_ratings"" AS\nSELECT \n ..."


Each time a view is created, it gets added to the schema. In the `sqlite3` shell we could verify this by running `.schema`; in this notebook, the `sqlite_master` query above shows that `view_author_and_title` and `view_book_ratings` are now part of this database's schema.
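To list only the views rather than the whole schema, one option is to filter `sqlite_master` on its `type` column. The sketch below assumes an in-memory database with a toy `ratings` table instead of `longlist.db`.

```python
import sqlite3

# In-memory stand-in for the real database; the table is left empty on purpose.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "ratings" ("book_id" INTEGER, "rating" INTEGER)')
conn.execute(
    'CREATE VIEW "view_book_ratings" AS '
    'SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating" '
    'FROM "ratings" GROUP BY "book_id"'
)

# sqlite_master holds one row per schema object; keep only the views.
views = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'view' ORDER BY name"
).fetchall()

for name, sql in views:
    print(name)
    print(sql)
```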

In [59]:
aggregating_query_with_view = pd.read_sql_query(
'''
SELECT * FROM "view_book_ratings";
''',
conn
)
aggregating_query_with_view

Unnamed: 0,id,title,year,average rating
0,1,Boulder,2023,3.77
1,2,Whale,2023,3.97
2,3,The Gospel According to the New World,2023,3.04
3,4,Standing Heavy,2023,3.57
4,5,Time Shelter,2023,4.06
...,...,...,...,...
73,74,The White Book,2018,3.83
74,75,The World Goes On,2018,3.77
75,76,Vernon Subutex 1,2018,3.89
76,77,"Die, My Love",2018,3.54


In [60]:
pd.read_sql_query(
"""
PRAGMA table_info("view_book_ratings");
""",
conn
)

Unnamed: 0,cid,name,type,notnull,dflt_value,pk
0,0,id,INTEGER,0,,0
1,1,title,TEXT,0,,0
2,2,year,INTEGER,0,,0
3,3,average rating,,0,,0


## CREATE TEMPORARY VIEW ___ AS and DROP VIEW IF EXISTS

To create temporary views that are not stored in the persistent database schema, we can use `CREATE TEMPORARY VIEW`.

This command creates a view that exists only for the duration of our connection with the database. SQLite records it in `sqlite_temp_master` rather than in `sqlite_master`.
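The connection-scoped lifetime can be checked with a small experiment. The sketch below assumes a throwaway database file in a temporary directory, a toy `ratings` table, a hypothetical view named `average_by_book`, and a helper `view_names` written for this example. It opens two connections in turn and looks for the view in both schema catalogs.

```python
import os
import sqlite3
import tempfile

# Throwaway database file, so that two separate connections can be opened.
db_path = os.path.join(tempfile.mkdtemp(), "scratch.db")


def view_names(connection, catalog):
    """Return the names of all views listed in the given schema catalog."""
    rows = connection.execute(f"SELECT name FROM {catalog} WHERE type = 'view'")
    return [row[0] for row in rows]


first = sqlite3.connect(db_path)
first.execute('CREATE TABLE "ratings" ("book_id" INTEGER, "rating" INTEGER)')
first.executemany('INSERT INTO "ratings" VALUES (?, ?)', [(1, 4), (1, 2), (2, 5)])
first.execute(
    'CREATE TEMPORARY VIEW "average_by_book" AS '
    'SELECT "book_id", AVG("rating") AS "rating" FROM "ratings" GROUP BY "book_id"'
)
first.commit()

# The view is listed in the temporary catalog, not in the persistent one.
in_temp_catalog = view_names(first, "sqlite_temp_master")
in_main_catalog = view_names(first, "sqlite_master")
first.close()

# A new connection to the same file no longer sees the temporary view.
second = sqlite3.connect(db_path)
after_reconnect = view_names(second, "sqlite_temp_master")
second.close()

print(in_temp_catalog, in_main_catalog, after_reconnect)
```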



To find the average rating of books per year, we can use the view we already created and store the results in a temporary view:

In [65]:
conn.execute("""
DROP VIEW IF EXISTS "average_ratings_by_year";
"""
)

conn.execute("""
CREATE TEMPORARY VIEW "average_ratings_by_year" AS
SELECT 
    "year", 
    AVG("average rating") AS "rating" 
FROM "view_book_ratings" 
GROUP BY "year";
""")

conn.commit()
pd.read_sql_query(
    """
    SELECT *
    FROM sqlite_master;
    """,
    conn
)


Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,authors,authors,2,"CREATE TABLE ""authors"" (\n ""id"" INTEGER,\n ..."
1,table,authored,authored,3,"CREATE TABLE ""authored"" (\n ""author_id"" INT..."
2,table,books,books,4,"CREATE TABLE ""books"" (\n ""id"" INTEGER,\n ..."
3,table,publishers,publishers,5,"CREATE TABLE ""publishers"" (\n ""id"" INTEGER,..."
4,table,ratings,ratings,6,"CREATE TABLE ""ratings"" (\n ""book_id"" INTEGE..."
5,table,translators,translators,7,"CREATE TABLE ""translators"" (\n ""id"" INTEGER..."
6,table,translated,translated,8,"CREATE TABLE ""translated"" (\n ""translator_i..."
7,view,view_author_and_title,view_author_and_title,0,"CREATE VIEW ""view_author_and_title"" AS\nSELECT..."
8,view,view_book_ratings,view_book_ratings,0,"CREATE VIEW ""view_book_ratings"" AS\nSELECT \n ..."


In [66]:
pd.read_sql_query(
"""
SELECT *
FROM "sqlite_temp_master";
""",
conn
)

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,view,average_ratings_by_year,average_ratings_by_year,0,"CREATE VIEW ""average_ratings_by_year"" AS\nSELE..."


In [67]:
pd.read_sql_query(
"""
SELECT *
FROM "temp"."average_ratings_by_year";
""",
conn
)


Unnamed: 0,year,rating
0,2018,3.746154
1,2019,3.640769
2,2020,3.788462
3,2021,3.692308
4,2022,3.868462
5,2023,3.783846


## Common Table Expression (CTE)

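- A regular view persists in our database schema until it is dropped.
- A temporary view exists for the duration of our connection with the database.
- A CTE is a view that exists for a single query alone. It is written with a `WITH ... AS (...)` clause placed in front of the query that uses it.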
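As a rough illustration of a CTE, the sketch below computes the average rating per year on toy data held in an in-memory database. The CTE name `average_book_ratings` and the sample rows are assumptions made for this example.

```python
import sqlite3

# Toy data in an in-memory database (not longlist.db).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT, "year" INTEGER)'
)
conn.execute('CREATE TABLE "ratings" ("book_id" INTEGER, "rating" INTEGER)')
conn.executemany(
    'INSERT INTO "books" VALUES (?, ?, ?)',
    [(1, "Boulder", 2023), (2, "Whale", 2023), (3, "The White Book", 2018)],
)
conn.executemany(
    'INSERT INTO "ratings" VALUES (?, ?)',
    [(1, 4), (1, 2), (2, 5), (3, 3), (3, 4)],
)

# The WITH clause defines a named result set that only this query can see.
rows = conn.execute(
    """
    WITH "average_book_ratings" AS (
        SELECT "book_id", "year", AVG("rating") AS "rating"
        FROM "ratings"
        JOIN "books" ON "ratings"."book_id" = "books"."id"
        GROUP BY "book_id"
    )
    SELECT "year", ROUND(AVG("rating"), 2) AS "rating"
    FROM "average_book_ratings"
    GROUP BY "year"
    ORDER BY "year";
    """
).fetchall()
print(rows)

# Once the query has finished, the CTE's name is unknown to the database.
try:
    conn.execute('SELECT * FROM "average_book_ratings"')
    cte_still_exists = True
except sqlite3.OperationalError:
    cte_still_exists = False
print(cte_still_exists)
```

Nothing is added to `sqlite_master` or `sqlite_temp_master`, so there is nothing to drop afterwards.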

## Partitioning

Views can be used to partition data: each view exposes one logical slice of a table, for example the books from a single year.
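One way this might look is a view per year over a `books` table. The sketch below uses toy rows in an in-memory database, and the view names `books_2018` and `books_2023` are hypothetical.

```python
import sqlite3

# Toy table in an in-memory database (not longlist.db).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT, "year" INTEGER)'
)
conn.executemany(
    'INSERT INTO "books" ("title", "year") VALUES (?, ?)',
    [("Boulder", 2023), ("Whale", 2023), ("The White Book", 2018)],
)

# One view per year. A view definition cannot contain bound parameters,
# so the year is coerced to int before being formatted into the statement.
for year in (2018, 2023):
    conn.execute(
        f'CREATE VIEW "books_{int(year)}" AS '
        f'SELECT "id", "title" FROM "books" WHERE "year" = {int(year)}'
    )

titles_2018 = [
    row[0] for row in conn.execute('SELECT "title" FROM "books_2018" ORDER BY "id"')
]
titles_2023 = [
    row[0] for row in conn.execute('SELECT "title" FROM "books_2023" ORDER BY "id"')
]

print(titles_2018)
print(titles_2023)
```

Each partition can then be queried by name without repeating the `WHERE` condition.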

## Securing

Views can be used to enhance database security by limiting access to certain data.
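A minimal sketch of the idea, assuming an `authors` table shaped like the one in this database but filled with placeholder rows: the hypothetical view `public_authors` exposes only the `id` and `name` columns and leaves `country` and `birth` out.

```python
import sqlite3

# Placeholder rows in an in-memory database (not longlist.db).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "authors" ('
    '"id" INTEGER PRIMARY KEY, "name" TEXT, "country" TEXT, "birth" INTEGER)'
)
conn.executemany(
    'INSERT INTO "authors" ("name", "country", "birth") VALUES (?, ?, ?)',
    [("Author One", "Country A", 1970), ("Author Two", "Country B", 1980)],
)

# "public_authors" is a hypothetical view that omits the sensitive columns.
conn.execute(
    'CREATE VIEW "public_authors" AS SELECT "id", "name" FROM "authors"'
)

cursor = conn.execute('SELECT * FROM "public_authors" ORDER BY "id"')
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
print(rows)
```

SQLite has no per-user permissions, so anyone who can open the database file can still read the underlying table. The view only limits what a query against it returns. In a database system with access control, users could be granted access to the view alone.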

## Soft Deletions