# Lab | PostgreSQL Select

## Introduction
In this lab you will practice the `SELECT` statement, which will be extremely useful in your future work as a data analyst, scientist, or engineer.  
- Use the code in `publications_database.sql` to create your database and tables and to insert the data. 
  
# **1. Who has published what, and where?**

In this challenge you will write a PostgreSQL `SELECT` query that `JOIN`s several tables to find out which titles each author has published with which publisher.   
Your output should have at least the following columns:  
From `authors` table:  
- `au_id` - the ID of the author.
- `au_lname` - author last name. 
- `au_fname` - author first name.   

From `titles` table:  
- `title` - name of the published title.  

From `publishers` table:  
- `pub_name` - name of the publisher where the title was published.

Your output will look something like below:

![Challenge 1 output](./images/challenge-1.png)

*Note: the screenshot above is not the complete output.*

If your query is correct, the total rows in your output should be the same as the total number of records in table `titleauthor`.
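
The row-count check above works because `titleauthor` holds one row per (author, title) pair, so a join that starts from it and matches exactly one author, one title, and one publisher per row returns the same number of rows. Below is a minimal sketch of that idea. It assumes a toy in-memory SQLite database with made-up rows (not the lab's PostgreSQL data); the column names follow the lab, and everything else is hypothetical.

```python
import sqlite3

# Toy stand-in for the lab schema: in-memory SQLite with made-up rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE authors     (au_id TEXT PRIMARY KEY, au_lname TEXT, au_fname TEXT);
CREATE TABLE publishers  (pub_id TEXT PRIMARY KEY, pub_name TEXT);
CREATE TABLE titles      (title_id TEXT PRIMARY KEY, title TEXT, pub_id TEXT);
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);

INSERT INTO authors VALUES ('A1', 'Doe', 'Jane'), ('A2', 'Roe', 'Rick'), ('A3', 'Poe', 'Pat');
INSERT INTO publishers VALUES ('P1', 'Example Press'), ('P2', 'Sample House');
INSERT INTO titles VALUES ('T1', 'First Book', 'P1'), ('T2', 'Second Book', 'P2');
-- A1 wrote T1 and T2, A2 co-wrote T1, A3 has no titles.
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A1', 'T2'), ('A2', 'T1');
""")

# Start from the junction table and attach one author, title, and publisher per row.
joined = con.execute("""
SELECT a.au_id, a.au_lname, a.au_fname, t.title, p.pub_name
FROM titleauthor AS ta
JOIN authors    AS a ON a.au_id    = ta.au_id
JOIN titles     AS t ON t.title_id = ta.title_id
JOIN publishers AS p ON p.pub_id   = t.pub_id
ORDER BY a.au_id, t.title_id
""").fetchall()

link_rows = con.execute("SELECT COUNT(*) FROM titleauthor").fetchone()[0]

for row in joined:
    print(row)
print("row counts match:", len(joined) == link_rows)
```

Author `A3` has no entry in the junction table, so it does not appear in the joined result, which is what keeps the two counts equal.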

In [1]:
import pandas as pd

In [2]:
import sqlalchemy as db

In [3]:
# for python-dotenv method

from dotenv import load_dotenv
load_dotenv()
import os

In [4]:
# PostgreSQL

db_server = "postgresql"
db_user = "postgres"
db_password = os.environ.get('PASSWORD')
db_host = "localhost"
db_database = "w09_06_lab"
db_port = 5432

# create the engine
engine = db.create_engine(
    f"{db_server}://{db_user}:{db_password}@{db_host}:{db_port}/{db_database}"
)

# open the connection
conn = engine.connect()

# Close the connection
# conn.close()
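
One caveat with building the connection URL from an f-string: if the password contains characters such as `@`, `/`, or `:`, the URL can be parsed incorrectly. A minimal sketch of one way to guard against that, using only the standard library, is shown below. The password value is hypothetical; the other URL parts mirror the cell above.

```python
from urllib.parse import quote_plus

# Hypothetical password containing characters that are special inside a URL.
raw_password = "p@ss/word:1"

# Percent-encode the password before placing it in the connection URL.
safe_password = quote_plus(raw_password)

url = f"postgresql://postgres:{safe_password}@localhost:5432/w09_06_lab"
print(url)
```

The resulting string can be passed to `create_engine` in place of the unescaped f-string.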

In [5]:
conn

<sqlalchemy.engine.base.Connection at 0x7fbff9123340>

In [6]:
print('authors')
print(pd.read_sql_table("authors", conn).head(2))
print('')
print('titles')
print(pd.read_sql_table("titles", conn).head(2))
print('')
print('publishers')
print(pd.read_sql_table("publishers", conn).head(2))
print('')
print('sales')
print(pd.read_sql_table("sales", conn).head(2))
print('')
print('titleauthor')
print(pd.read_sql_table("titleauthor", conn).head(2))
print('')

authors
         au_id au_lname  au_fname         phone            address  \
0  172-32-1176    White   Johnson  408 496-7223    10932 Bigge Rd.   
1  213-46-8915    Green  Marjorie  415 986-7020  309 63rd St. #411   

         city state    zip  contract  
0  Menlo Park    CA  94025         1  
1     Oakland    CA  94618         1  

titles
  title_id                                              title      type  \
0   BU1032                The Busy Executive's Database Guide  business   
1   BU1111  Cooking with Computers: Surreptitious Balance ...  business   

   pub_id    price    advance royalty ytd_sales  \
0    1389  19.9900  5000.0000      10      4095   
1    1389  11.9500  5000.0000      10      3876   

                                               notes              pubdate  
0  An overview of available database systems with...  1991-06-12 00:00:00  
1  Helpful hints on how to use your electronic re...  1991-06-09 00:00:00  

publishers
   pub_id          pub_name        c

In [7]:
query = '''
SELECT
    *
FROM
    authors
LIMIT 2;
'''
pd.read_sql(query, conn)

|  | au_id | au_lname | au_fname | phone | address | city | state | zip | contract |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 172-32-1176 | White | Johnson | 408 496-7223 | 10932 Bigge Rd. | Menlo Park | CA | 94025 | 1 |
| 1 | 213-46-8915 | Green | Marjorie | 415 986-7020 | 309 63rd St. #411 | Oakland | CA | 94618 | 1 |


In [8]:
pd.read_sql_table("authors", conn).head(2)

|  | au_id | au_lname | au_fname | phone | address | city | state | zip | contract |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 172-32-1176 | White | Johnson | 408 496-7223 | 10932 Bigge Rd. | Menlo Park | CA | 94025 | 1 |
| 1 | 213-46-8915 | Green | Marjorie | 415 986-7020 | 309 63rd St. #411 | Oakland | CA | 94618 | 1 |


In [9]:
pd.read_sql_table("titles", conn).head(2)

|  | title_id | title | type | pub_id | price | advance | royalty | ytd_sales | notes | pubdate |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BU1032 | The Busy Executive's Database Guide | business | 1389 | 19.99 | 5000.0 | 10 | 4095 | An overview of available database systems with... | 1991-06-12 00:00:00 |
| 1 | BU1111 | Cooking with Computers: Surreptitious Balance ... | business | 1389 | 11.95 | 5000.0 | 10 | 3876 | Helpful hints on how to use your electronic re... | 1991-06-09 00:00:00 |


In [10]:
pd.read_sql_table("publishers", conn).head(2)

|  | pub_id | pub_name | city | state | country |
|---|---|---|---|---|---|
| 0 | 736 | New Moon Books | Boston | MA | USA |
| 1 | 877 | Binnet & Hardley | Washington | DC | USA |


In [11]:
pd.read_sql_table("sales", conn).head(2)

|  | stor_id | ord_num | ord_date | qty | payterms | title_id |
|---|---|---|---|---|---|---|
| 0 | 6380 | 6871 | 1994-09-14 00:00:00 | 5 | Net 60 | BU1032 |
| 1 | 6380 | 722a | 1994-09-13 00:00:00 | 3 | Net 60 | PS2091 |


In [12]:
pd.read_sql_table("titleauthor", conn).head(2)

|  | au_id | title_id | au_ord | royaltyper |
|---|---|---|---|---|
| 0 | 172-32-1176 | PS3333 | 1 | 100 |
| 1 | 213-46-8915 | BU1032 | 2 | 40 |


In [20]:
query = '''
SELECT
    authors.au_id AS "AUTHOR_ID",
    authors.au_lname AS "LAST_NAME",
    authors.au_fname AS "FIRST_NAME",
    titles.title AS "TITLE",
    publishers.pub_name AS "PUBLISHER"
FROM
    authors
INNER JOIN
    titleauthor
    ON authors.au_id=titleauthor.au_id
LEFT JOIN
    titles
    ON titleauthor.title_id=titles.title_id
LEFT JOIN
    publishers
    ON titles.pub_id=publishers.pub_id;
'''
pd.read_sql(query, conn)

|  | AUTHOR_ID | LAST_NAME | FIRST_NAME | TITLE | PUBLISHER |
|---|---|---|---|---|---|
| 0 | 172-32-1176 | White | Johnson | Prolonged Data Deprivation: Four Case Studies | New Moon Books |
| 1 | 213-46-8915 | Green | Marjorie | The Busy Executive's Database Guide | Algodata Infosystems |
| 2 | 213-46-8915 | Green | Marjorie | You Can Combat Computer Stress! | New Moon Books |
| 3 | 238-95-7766 | Carson | Cheryl | But Is It User Friendly? | Algodata Infosystems |
| 4 | 267-41-2394 | O'Leary | Michael | Cooking with Computers: Surreptitious Balance ... | Algodata Infosystems |
| 5 | 267-41-2394 | O'Leary | Michael | Sushi, Anyone? | Binnet & Hardley |
| 6 | 274-80-9391 | Straight | Dean | Straight Talk About Computers | Algodata Infosystems |
| 7 | 409-56-7008 | Bennet | Abraham | The Busy Executive's Database Guide | Algodata Infosystems |
| 8 | 427-17-2319 | Dull | Ann | Secrets of Silicon Valley | Algodata Infosystems |
| 9 | 472-27-2349 | Gringlesby | Burt | Sushi, Anyone? | Binnet & Hardley |


If your query is correct, the total rows in your output should be the same as the total number of records in table `titleauthor`.

In [14]:
query = '''
SELECT
    COUNT(*)
FROM
    titleauthor;
'''
pd.read_sql(query, conn)

|  | count |
|---|---|
| 0 | 25 |


# **2. Who has published how many titles, and where?**

Building on your solution to challenge 1, query how many titles each author has published at each publisher. Order your output by the title count in descending order.  

Your output should look something like below:

![Challenge 2 output](./images/challenge-2.png)

*Note: the screenshot above is not the complete output.*  

To check whether your output is correct, sum the `TITLE COUNT` column. The sum should equal the total number of records in the `titleauthor` table.  

*Hint: In order to count the number of titles published by an author, you need to use [COUNT](https://www.w3resource.com/PostgreSQL/postgresql-count-function.php).  Also check out [GROUP BY](https://www.w3resource.com/PostgreSQL/postgresql-group-by.php) because you will count the rows of different groups of data.*
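
To illustrate the hint, here is a minimal sketch of `COUNT` with `GROUP BY`, again assuming a toy in-memory SQLite database with made-up rows instead of the lab's PostgreSQL data. The table and column names follow the lab; the IDs are hypothetical.

```python
import sqlite3

# Toy data: author A1 has two titles at publisher P1 and one at P2; A2 has one at P2.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE titles      (title_id TEXT PRIMARY KEY, pub_id TEXT);
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
INSERT INTO titles VALUES ('T1', 'P1'), ('T2', 'P1'), ('T3', 'P2');
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A1', 'T2'), ('A1', 'T3'), ('A2', 'T3');
""")

# One output row per (author, publisher) group; COUNT(*) counts the rows in each group.
counts = con.execute("""
SELECT ta.au_id, t.pub_id, COUNT(*) AS title_count
FROM titleauthor AS ta
JOIN titles AS t ON t.title_id = ta.title_id
GROUP BY ta.au_id, t.pub_id
ORDER BY title_count DESC, ta.au_id, t.pub_id
""").fetchall()

for row in counts:
    print(row)

# Sanity check from the lab text: the counts add up to the rows in titleauthor.
total = sum(title_count for _, _, title_count in counts)
link_rows = con.execute("SELECT COUNT(*) FROM titleauthor").fetchone()[0]
print("sum of counts equals titleauthor rows:", total == link_rows)
```

Every column in the `SELECT` list that is not inside an aggregate also appears in `GROUP BY`, which is the same pattern the lab query needs.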

In [19]:
query = '''
SELECT
    authors.au_id AS "AUTHOR_ID",
    authors.au_lname AS "LAST_NAME",
    authors.au_fname AS "FIRST_NAME",
    publishers.pub_name AS "PUBLISHER",
    COUNT(*) AS "TITLE_COUNT"
FROM
    authors
INNER JOIN
    titleauthor
    ON authors.au_id=titleauthor.au_id
LEFT JOIN
    titles
    ON titleauthor.title_id=titles.title_id
LEFT JOIN
    publishers
    ON titles.pub_id=publishers.pub_id
GROUP BY
    authors.au_id,
    authors.au_lname,
    authors.au_fname,
    publishers.pub_name
ORDER BY
    "TITLE_COUNT" DESC;
'''
pd.read_sql(query, conn)

|  | AUTHOR_ID | LAST_NAME | FIRST_NAME | PUBLISHER | TITLE_COUNT |
|---|---|---|---|---|---|
| 0 | 998-72-3567 | Ringer | Albert | New Moon Books | 2 |
| 1 | 267-41-2394 | O'Leary | Michael | Binnet & Hardley | 1 |
| 2 | 899-46-2035 | Ringer | Anne | Binnet & Hardley | 1 |
| 3 | 274-80-9391 | Straight | Dean | Algodata Infosystems | 1 |
| 4 | 213-46-8915 | Green | Marjorie | New Moon Books | 1 |
| 5 | 172-32-1176 | White | Johnson | New Moon Books | 1 |
| 6 | 756-30-7391 | Karsen | Livia | Binnet & Hardley | 1 |
| 7 | 267-41-2394 | O'Leary | Michael | Algodata Infosystems | 1 |
| 8 | 213-46-8915 | Green | Marjorie | Algodata Infosystems | 1 |
| 9 | 724-80-9391 | MacFeather | Stearns | Algodata Infosystems | 1 |


# **3. Best Selling Authors**

Who are the top 3 authors who have sold the highest number of titles?   
Your output should have at least the following columns:  

From `authors` table:  
- `au_id` - the ID of the author.
- `au_lname` - author last name. 
- `au_fname` - author first name. 
  
From `sales` table:
- `qty` - quantity sold, summed per author as `TOTAL`.
  
Your output should be ordered based on `TOTAL` from high to low.  
Only output the top 3 best selling authors.

Your output should look something like below:

![Challenge 3 output](./images/challenge-3.png)
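
The pattern behind this challenge is aggregate, sort, then cut off: `SUM` per author, `ORDER BY` the sum descending, and `LIMIT 3`. A minimal sketch follows, assuming a toy in-memory SQLite database with made-up rows (the IDs and quantities are hypothetical, not the lab data).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
CREATE TABLE sales       (title_id TEXT, qty INTEGER);
INSERT INTO titleauthor VALUES
    ('A1', 'T1'), ('A2', 'T1'), ('A2', 'T2'), ('A3', 'T3'), ('A4', 'T4');
INSERT INTO sales VALUES ('T1', 10), ('T1', 5), ('T2', 20), ('T3', 8), ('T4', 1);
""")

# Sum the quantities of every sale attached to each author's titles,
# then keep only the three largest totals.
top3 = con.execute("""
SELECT ta.au_id, SUM(s.qty) AS total
FROM titleauthor AS ta
JOIN sales AS s ON s.title_id = ta.title_id
GROUP BY ta.au_id
ORDER BY total DESC
LIMIT 3
""").fetchall()

for row in top3:
    print(row)
```

Note that a co-authored title counts in full toward each of its authors here (`T1` contributes 15 to both `A1` and `A2`), which matches how the lab's join treats shared titles.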

In [21]:
query = '''
SELECT
    authors.au_id AS "AUTHOR ID",
    authors.au_lname AS "LAST NAME",
    authors.au_fname AS "FIRST NAME",
    SUM(sales.qty) AS "TOTAL"
FROM
    authors
INNER JOIN
    titleauthor
    ON authors.au_id=titleauthor.au_id
LEFT JOIN
    sales
    ON titleauthor.title_id=sales.title_id
GROUP BY
    authors.au_id,
    authors.au_lname,
    authors.au_fname
ORDER BY
    "TOTAL" DESC
LIMIT 3;
'''
pd.read_sql(query, conn)

|  | AUTHOR ID | LAST NAME | FIRST NAME | TOTAL |
|---|---|---|---|---|
| 0 | 899-46-2035 | Ringer | Anne | 148 |
| 1 | 998-72-3567 | Ringer | Albert | 133 |
| 2 | 213-46-8915 | Green | Marjorie | 50 |


# **4. Best selling authors ranking**

Now modify your solution to challenge 3 so that the output displays all 23 authors instead of only the top 3. Authors who have sold 0 titles should also appear in your output, ideally showing `0` instead of `NULL`.   
Also order your results by `TOTAL` from high to low. 

Your output should look something like below:

![Challenge 4 output](./images/challenge-4.png)
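
Two ingredients usually cover this requirement: a `LEFT JOIN` that starts from `authors`, so authors without titles or sales are kept, and `COALESCE`, which replaces the resulting `NULL` sum with `0`. Below is a minimal sketch, assuming a toy in-memory SQLite database with made-up rows; the IDs and quantities are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE authors     (au_id TEXT PRIMARY KEY);
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
CREATE TABLE sales       (title_id TEXT, qty INTEGER);
INSERT INTO authors VALUES ('A1'), ('A2'), ('A3');
-- A2 has a title with no sales; A3 has no titles at all.
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A2', 'T2');
INSERT INTO sales VALUES ('T1', 10), ('T1', 5);
""")

# LEFT JOINs keep every author; COALESCE turns a NULL sum into 0.
ranking = con.execute("""
SELECT a.au_id, COALESCE(SUM(s.qty), 0) AS total
FROM authors AS a
LEFT JOIN titleauthor AS ta ON ta.au_id = a.au_id
LEFT JOIN sales       AS s  ON s.title_id = ta.title_id
GROUP BY a.au_id
ORDER BY total DESC, a.au_id
""").fetchall()

for row in ranking:
    print(row)
```

With an `INNER JOIN` on `titleauthor` instead, author `A3` would drop out of the result entirely, and without `COALESCE` author `A2` would show `NULL`.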

In [22]:
query = '''
SELECT
    authors.au_id AS "AUTHOR ID",
    authors.au_lname AS "LAST NAME",
    authors.au_fname AS "FIRST NAME",
    COALESCE(SUM(sales.qty), 0) AS "TOTAL"
FROM
    authors
LEFT JOIN
    titleauthor
    ON authors.au_id=titleauthor.au_id
LEFT JOIN
    sales
    ON titleauthor.title_id=sales.title_id
GROUP BY
    authors.au_id,
    authors.au_lname,
    authors.au_fname
ORDER BY
    "TOTAL" DESC;
'''
pd.read_sql(query, conn)

|  | AUTHOR ID | LAST NAME | FIRST NAME | TOTAL |
|---|---|---|---|---|
| 0 | 899-46-2035 | Ringer | Anne | 148 |
| 1 | 998-72-3567 | Ringer | Albert | 133 |
| 2 | 213-46-8915 | Green | Marjorie | 50 |
| 3 | 427-17-2319 | Dull | Ann | 50 |
| 4 | 846-92-7186 | Hunter | Sheryl | 50 |
| 5 | 724-80-9391 | MacFeather | Stearns | 45 |
| 6 | 267-41-2394 | O'Leary | Michael | 45 |
| 7 | 807-91-6654 | Panteley | Sylvia | 40 |
| 8 | 722-51-5454 | DeFrance | Michel | 40 |
| 9 | 238-95-7766 | Carson | Cheryl | 30 |
