# Lab | SQL Select - SOLUTIONS

## Introduction

In this lab you will practice using the `SELECT` statement, which will be extremely useful in your future work as a data analyst, scientist, or engineer. **You will use the `publications` database** (`publications.db` file).


![Publications DB Schema](authors.png)

You will create a `solutions.ipynb` file in the `your-code` directory to record your solutions to all challenges.

## Challenge 1 - Who Has Published What and Where?

In this challenge you will write a `SELECT` query that joins various tables to figure out what titles each author has published at which publishers. Your output should have at least the following columns:

* `AUTHOR_ID` - the ID of the author
* `LAST_NAME` - author last name
* `FIRST_NAME` - author first name
* `TITLE` - name of the published title
* `PUBLISHER` - name of the publisher where the title was published

Your output should look something like this:

![Challenge 1 output](challenge-1.png)

*Note: the screenshot above is not the complete output.*

If your query is correct, the number of rows in your output should equal the number of records in the `titleauthor` table.



#### Load the extension

In [1]:
%load_ext sql

#### Connect to the database

In [2]:
%sql sqlite:////Users/annavilardell/Desktop/BootcampData/20-lab-sql-select/publications.db

#### Get list of tables

In [19]:
%config SqlMagic.style="ALL"

In [3]:
# List all table names below to see what we have,
# although the database schema image above also shows them.

In [4]:
%%sql
SELECT name 
FROM sqlite_master
WHERE type='table'
ORDER BY name;

 * sqlite:////Users/annavilardell/Desktop/BootcampData/20-lab-sql-select/publications.db
Done.


name
authors
discounts
employee
jobs
pub_info
publishers
roysched
sales
stores
titleauthor
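
The same catalog lookup also works outside the `%%sql` magic. Below is a minimal sketch using Python's standard `sqlite3` module. It assumes a throwaway in-memory database with two toy tables in place of `publications.db`, so the table list it prints is illustrative only.

```python
import sqlite3

# In-memory stand-in for publications.db; both tables are toy examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (au_id TEXT PRIMARY KEY, au_lname TEXT)")
conn.execute("CREATE TABLE titles (title_id TEXT PRIMARY KEY, title TEXT)")

# Same catalog query as in the cell above, run through the standard library.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```

To run it against the real file, pass the path to `publications.db` to `sqlite3.connect` instead of `":memory:"`.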


In [None]:
# Now we look up each of the requested columns:
# AUTHOR_ID - the ID of the author
# LAST_NAME - author last name
# FIRST_NAME - author first name
# TITLE - name of the published title
# PUBLISHER - name of the publisher where the title was published

In [13]:
#View authors table

In [18]:
%%sql
SELECT *
FROM authors
limit 3

 * sqlite:////Users/annavilardell/Desktop/BootcampData/20-lab-sql-select/publications.db
Done.


au_id,au_lname,au_fname,phone,address,city,state,zip,contract
172-32-1176,White,Johnson,408 496-7223,10932 Bigge Rd.,Menlo Park,CA,94025,1
213-46-8915,Green,Marjorie,415 986-7020,309 63rd St. #411,Oakland,CA,94618,1
238-95-7766,Carson,Cheryl,415 548-7723,589 Darwin Ln.,Berkeley,CA,94705,1


In [15]:
#View titles table

In [17]:
%%sql
SELECT *
FROM titles
limit 3

 * sqlite:////Users/annavilardell/Desktop/BootcampData/20-lab-sql-select/publications.db
Done.


title_id,title,type,pub_id,price,advance,royalty,ytd_sales,notes,pubdate
BU1032,The Busy Executive's Database Guide,business,1389,19.99,5000,10,4095,An overview of available database systems with emphasis on common business applications. Illustrated.,1991-06-12 00:00:00
BU1111,Cooking with Computers: Surreptitious Balance Sheets,business,1389,11.95,5000,10,3876,Helpful hints on how to use your electronic resources to the best advantage.,1991-06-09 00:00:00
BU2075,You Can Combat Computer Stress!,business,736,2.99,10125,24,18722,The latest medical and psychological techniques for living with the electronic office. Easy-to-understand explanations.,1991-06-30 00:00:00


In [32]:
#View publishers table

In [31]:
%%sql
SELECT *
FROM publishers
limit 2

 * sqlite:////Users/annavilardell/Desktop/BootcampData/20-lab-sql-select/publications.db
Done.


pub_id,pub_name,city,state,country
736,New Moon Books,Boston,MA,USA
877,Binnet & Hardley,Washington,DC,USA


In [33]:
# From titleauthor: the books of each author. Ordering by author shows whether anyone has written more than one book.

In [50]:
%%sql
SELECT *
FROM titleauthor
order by au_id
limit 3

 * sqlite:////Users/annavilardell/Desktop/BootcampData/20-lab-sql-select/publications.db
Done.


au_id,title_id,au_ord,royaltyper
172-32-1176,PS3333,1,100
213-46-8915,BU1032,2,40
213-46-8915,BU2075,1,100


In [42]:
# Join titleauthor with the three other tables (titles, authors, publishers): FINAL TABLE

In [99]:
%%sql
select titleauthor.au_id as "AUTHOR ID",
        authors.au_lname as "LAST NAME",
        authors.au_fname as "FIRST NAME", 
        titles.title as TITLE, 
        publishers.pub_name as PUBLISHER
from titleauthor join titles on titles.title_id = titleauthor.title_id
                join authors on authors.au_id = titleauthor.au_id
                join publishers on publishers.pub_id=titles.pub_id
    

 * sqlite:////Users/annavilardell/Desktop/BootcampData/20-lab-sql-select/publications.db
Done.


AUTHOR ID,LAST NAME,FIRST NAME,TITLE,PUBLISHER
172-32-1176,White,Johnson,Prolonged Data Deprivation: Four Case Studies,New Moon Books
213-46-8915,Green,Marjorie,The Busy Executive's Database Guide,Algodata Infosystems
213-46-8915,Green,Marjorie,You Can Combat Computer Stress!,New Moon Books
238-95-7766,Carson,Cheryl,But Is It User Friendly?,Algodata Infosystems
267-41-2394,O'Leary,Michael,Cooking with Computers: Surreptitious Balance Sheets,Algodata Infosystems
267-41-2394,O'Leary,Michael,"Sushi, Anyone?",Binnet & Hardley
274-80-9391,Straight,Dean,Straight Talk About Computers,Algodata Infosystems
409-56-7008,Bennet,Abraham,The Busy Executive's Database Guide,Algodata Infosystems
427-17-2319,Dull,Ann,Secrets of Silicon Valley,Algodata Infosystems
472-27-2349,Gringlesby,Burt,"Sushi, Anyone?",Binnet & Hardley


## Challenge 2 - Who Has Published How Many and Where?

Building on your solution to Challenge 1, query how many titles each author has published at each publisher. Your output should look something like this:

![Challenge 2 output](challenge-2.png)

*Note: the screenshot above is not the complete output.*

To check whether your output is correct, sum up the `TITLE COUNT` column. The sum should equal the total number of records in the `titleauthor` table.

*Hint: To count the number of titles published by an author, you need to use [COUNT](https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#count). Also check out [Group By](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#group-by-clause), because you will count the rows of different groups of data. Refer to the references and learn by yourself. These features will be formally discussed in the Temp Tables and Subqueries lesson.*
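
As a rough illustration of `COUNT` with `GROUP BY`, the sketch below groups by author and publisher, so each output row is one (author, publisher) pair. It assumes a toy in-memory database with made-up rows and hypothetical IDs. The final check mirrors the one described above: the counts add up to the number of `titleauthor` rows.

```python
import sqlite3

# Toy in-memory data; IDs and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publishers (pub_id TEXT, pub_name TEXT);
CREATE TABLE titles (title_id TEXT, title TEXT, pub_id TEXT);
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
INSERT INTO publishers VALUES ('P1', 'Publisher One'), ('P2', 'Publisher Two');
INSERT INTO titles VALUES ('T1', 'Title One', 'P1'), ('T2', 'Title Two', 'P2'),
                          ('T3', 'Title Three', 'P2');
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A2', 'T1'), ('A2', 'T2'), ('A2', 'T3');
""")

# One group per (author, publisher) pair; COUNT counts the titles in each group.
rows = conn.execute("""
SELECT ta.au_id, p.pub_name, COUNT(t.title_id) AS title_count
FROM titleauthor AS ta
JOIN titles AS t ON t.title_id = ta.title_id
JOIN publishers AS p ON p.pub_id = t.pub_id
GROUP BY ta.au_id, p.pub_id
ORDER BY ta.au_id, p.pub_name
""").fetchall()

# Sanity check from the challenge text: the counts sum to the titleauthor row count.
count_sum = sum(title_count for _, _, title_count in rows)
link_count = conn.execute("SELECT COUNT(*) FROM titleauthor").fetchone()[0]
print(rows, count_sum, link_count)
```

Grouping by title instead would count authors per title, which is a different question.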



In [101]:
%%sql
select titleauthor.au_id as "AUTHOR ID",
        authors.au_lname as "LAST NAME",
        authors.au_fname as "FIRST NAME", 
        publishers.pub_name as PUBLISHER,
        count(titles.title_id) as "TITLE COUNT"
from titleauthor join titles on titles.title_id = titleauthor.title_id
                join authors on authors.au_id = titleauthor.au_id
                join publishers on publishers.pub_id=titles.pub_id
group by titleauthor.au_id, publishers.pub_id

 * sqlite:////Users/annavilardell/Desktop/BootcampData/20-lab-sql-select/publications.db
Done.



## Challenge 3 - Best Selling Authors

Who are the top 3 authors who have sold the highest number of titles? Write a query to find out.

Requirements:

* Your output should have the following columns:
	* `AUTHOR_ID` - the ID of the author
	* `LAST_NAME` - author last name
	* `FIRST_NAME` - author first name
	* `TOTAL` - total number of titles sold by this author
* Your output should be ordered based on `TOTAL` from high to low.
* Only output the top 3 best selling authors.

*Hint: To calculate the total number of titles sold by an author, you need to use the [SUM function](https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#sum). Refer to the reference and learn how to use it.*
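
A minimal sketch of `SUM` combined with `ORDER BY ... DESC` and `LIMIT` is shown below. It assumes a toy in-memory database with made-up authors and sales quantities (all IDs are hypothetical). Note that in this sketch a co-authored title counts in full towards each of its authors.

```python
import sqlite3

# Toy in-memory data; IDs and quantities are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
CREATE TABLE sales (title_id TEXT, qty INTEGER);
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A2', 'T1'), ('A2', 'T2'),
                               ('A3', 'T3'), ('A4', 'T4');
INSERT INTO sales VALUES ('T1', 5), ('T1', 10), ('T2', 20), ('T3', 3), ('T4', 1);
""")

# Sum the quantities per author, sort from high to low, keep the top three.
top_three = conn.execute("""
SELECT ta.au_id, SUM(s.qty) AS total
FROM titleauthor AS ta
JOIN sales AS s ON s.title_id = ta.title_id
GROUP BY ta.au_id
ORDER BY total DESC
LIMIT 3
""").fetchall()
print(top_three)
```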




In [92]:
%%sql
select *
from sales

limit 5

 * sqlite:////Users/annavilardell/Desktop/BootcampData/20-lab-sql-select/publications.db
Done.


stor_id,ord_num,ord_date,qty,payterms,title_id
6380,6871,1994-09-14 00:00:00,5,Net 60,BU1032
6380,722a,1994-09-13 00:00:00,3,Net 60,PS2091
7066,A2976,1993-05-24 00:00:00,50,Net 30,PC8888
7066,QA7442.3,1994-09-13 00:00:00,75,ON invoice,PS2091
7067,D4482,1994-09-14 00:00:00,10,Net 60,PS2091


In [107]:
%%sql
select titleauthor.au_id as "AUTHOR ID",
        authors.au_lname as "LAST NAME",
        authors.au_fname as "FIRST NAME", 
        sum (sales.qty) as TOTAL
from titleauthor join titles on titles.title_id = titleauthor.title_id
                join authors on authors.au_id = titleauthor.au_id
                join sales on sales.title_id = titles.title_id
group by titleauthor.au_id
order by total desc

limit 3

 * sqlite:////Users/annavilardell/Desktop/BootcampData/20-lab-sql-select/publications.db
Done.


AUTHOR ID,LAST NAME,FIRST NAME,TOTAL
899-46-2035,Ringer,Anne,148
998-72-3567,Ringer,Albert,133
213-46-8915,Green,Marjorie,50


## Challenge 4 - Best Selling Authors Ranking

Now modify your solution in Challenge 3 so that the output will display all 23 authors instead of the top 3. Note that the authors who have sold 0 titles should also appear in your output (ideally display `0` instead of `NULL` as the `TOTAL`). Also order your results based on `TOTAL` from high to low.
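
The difference between an inner join and a `LEFT JOIN` is what decides whether authors with no sales appear. The sketch below runs both on a toy in-memory database with made-up rows and hypothetical IDs: author `A3` has a title that never sold, and author `A4` has no title at all. `COALESCE` (or SQLite's `ifnull`) turns the resulting `NULL` sum into `0`.

```python
import sqlite3

# Toy in-memory data; IDs and quantities are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (au_id TEXT);
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
CREATE TABLE sales (title_id TEXT, qty INTEGER);
INSERT INTO authors VALUES ('A1'), ('A2'), ('A3'), ('A4');
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A2', 'T2'), ('A3', 'T3');
INSERT INTO sales VALUES ('T1', 10), ('T1', 5), ('T2', 7);
""")

# Inner joins keep only authors that have at least one matching sales row.
inner_rows = conn.execute("""
SELECT a.au_id, SUM(s.qty) AS total
FROM authors AS a
JOIN titleauthor AS ta ON ta.au_id = a.au_id
JOIN sales AS s ON s.title_id = ta.title_id
GROUP BY a.au_id
ORDER BY total DESC, a.au_id
""").fetchall()

# LEFT JOINs starting from authors keep every author; COALESCE maps NULL to 0.
all_rows = conn.execute("""
SELECT a.au_id, COALESCE(SUM(s.qty), 0) AS total
FROM authors AS a
LEFT JOIN titleauthor AS ta ON ta.au_id = a.au_id
LEFT JOIN sales AS s ON s.title_id = ta.title_id
GROUP BY a.au_id
ORDER BY total DESC, a.au_id
""").fetchall()
print(inner_rows)
print(all_rows)
```

The table that must be kept in full (`authors` here) goes on the left side of every `LEFT JOIN` in the chain.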



In [113]:
%%sql
select authors.au_id as "AUTHOR ID",
        authors.au_lname as "LAST NAME",
        authors.au_fname as "FIRST NAME", 
        ifnull(sum(sales.qty), 0) as TOTAL
from authors left join titleauthor on titleauthor.au_id = authors.au_id
             left join sales on sales.title_id = titleauthor.title_id
group by authors.au_id
order by total desc, authors.au_id

 * sqlite:////Users/annavilardell/Desktop/BootcampData/20-lab-sql-select/publications.db
Done.


AUTHOR ID,LAST NAME,FIRST NAME,TOTAL
899-46-2035,Ringer,Anne,148
998-72-3567,Ringer,Albert,133
213-46-8915,Green,Marjorie,50
427-17-2319,Dull,Ann,50
846-92-7186,Hunter,Sheryl,50
267-41-2394,O'Leary,Michael,45
724-80-9391,MacFeather,Stearns,45
722-51-5454,DeFrance,Michel,40
807-91-6654,Panteley,Sylvia,40
238-95-7766,Carson,Cheryl,30


In [114]:
# Starting from authors and using LEFT JOIN keeps the authors who have no titles or no sales;
# an inner join would drop them. ifnull turns their NULL total into 0.