# Lab | SQL Select 

## <font color=red>In this lab I was Student of the Week, so I had the solutions 😉.</font>

## Introduction

In this lab you will practice using the `SELECT` statement, which will be extremely useful in your future work as a data analyst, scientist, or engineer. **You will use the `publications` database.**

You will create a `solutions.ipynb` file in the `your-code` directory to record your solutions to all challenges.

In [1]:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/Users/paula/Ironhack/Ironhack_Data_Analytics.json" 

In [2]:
from google.cloud import bigquery

In [3]:
client = bigquery.Client()

## Challenge 1 - Who Has Published What and Where?

In this challenge you will write a `SELECT` query that joins several tables to find out which titles each author has published with which publishers. Your output should have at least the following columns:

* `AUTHOR_ID` - the ID of the author
* `LAST_NAME` - author last name
* `FIRST_NAME` - author first name
* `TITLE` - name of the published title
* `PUBLISHER` - name of the publisher where the title was published


In [4]:
QUERY1 = """ 
SELECT 
    auth.au_id AS author_id, 
    auth.au_lname AS last_name, 
    auth.au_fname AS first_name,
    titles.title AS title, 
    pubs.pub_name AS publisher

FROM 
    `ironhack-data-analytics.publications.authors` auth
INNER JOIN 
    `ironhack-data-analytics.publications.titleauthor` titau 
ON 
    auth.au_id = titau.au_id
INNER JOIN 
    `ironhack-data-analytics.publications.titles` titles 
ON 
    titau.title_id  = titles.title_id
INNER JOIN 
    `ironhack-data-analytics.publications.publishers` pubs 
ON 
    titles.pub_id  = pubs.pub_id
    
ORDER BY 
    auth.au_id ASC
"""

In [5]:
query_job = client.query(QUERY1)

In [6]:
df=query_job.to_dataframe()

In [7]:
df.head(10)

| | author_id | last_name | first_name | title | publisher |
|---|---|---|---|---|---|
| 0 | 172-32-1176 | White | Johnson | Prolonged Data Deprivation: Four Case Studies | New Moon Books |
| 1 | 213-46-8915 | Green | Marjorie | The Busy Executive's Database Guide | Algodata Infosystems |
| 2 | 213-46-8915 | Green | Marjorie | You Can Combat Computer Stress! | New Moon Books |
| 3 | 238-95-7766 | Carson | Cheryl | But Is It User Friendly? | Algodata Infosystems |
| 4 | 267-41-2394 | O'Leary | Michael | Cooking with Computers: Surreptitious Balance ... | Algodata Infosystems |
| 5 | 267-41-2394 | O'Leary | Michael | Sushi, Anyone? | Binnet & Hardley |
| 6 | 274-80-9391 | Straight | Dean | Straight Talk About Computers | Algodata Infosystems |
| 7 | 409-56-7008 | Bennet | Abraham | The Busy Executive's Database Guide | Algodata Infosystems |
| 8 | 427-17-2319 | Dull | Ann | Secrets of Silicon Valley | Algodata Infosystems |
| 9 | 472-27-2349 | Gringlesby | Burt | Sushi, Anyone? | Binnet & Hardley |


In [8]:
QUERY2 = """ 
SELECT 
    COUNT(*) AS total

FROM 
    `ironhack-data-analytics.publications.titleauthor`
"""

In [9]:
query_job = client.query(QUERY2)

In [10]:
df2=query_job.to_dataframe()

In [11]:
df2

| | total |
|---|---|
| 0 | 25 |


In [12]:
df.count()

author_id     25
last_name     25
first_name    25
title         25
publisher     25
dtype: int64

## Challenge 2 - Who Has Published How Many and Where?

Building on your solution to Challenge 1, query how many titles each author has published at each publisher. Your output should look something like the following:

![Challenge 2 output](challenge-2.png)

*Note: the screenshot above is not the complete output.*

To check whether your output is correct, sum the `TITLE COUNT` column. The sum should equal the total number of records in the `titleauthor` table.

*Hint: To count the number of titles published by an author, you need to use [COUNT](https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#count). Also check out [Group By](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#group-by-clause) because you will count the rows of different groups of data. Refer to the references and learn by yourself. These features will be formally discussed in the Temp Tables and Subqueries lesson.*
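As a minimal sketch of the `COUNT` plus `GROUP BY` pattern mentioned in the hint, the example below uses Python's built-in `sqlite3` module instead of BigQuery. The table and column names follow the lab's schema, but the rows are hypothetical toy data, not the `publications` dataset. Grouping by author and publisher, rather than by title, is what allows a count to exceed 1; the final comparison is the same check described above.

```python
import sqlite3

# Hypothetical toy data; only the table and column names follow the lab's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (au_id TEXT, au_lname TEXT, au_fname TEXT);
CREATE TABLE publishers (pub_id TEXT, pub_name TEXT);
CREATE TABLE titles (title_id TEXT, title TEXT, pub_id TEXT);
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);

INSERT INTO authors VALUES ('A1', 'Doe', 'Jane'), ('A2', 'Roe', 'Rick');
INSERT INTO publishers VALUES ('P1', 'Pub One'), ('P2', 'Pub Two');
INSERT INTO titles VALUES
    ('T1', 'First', 'P1'), ('T2', 'Second', 'P1'), ('T3', 'Third', 'P2');
INSERT INTO titleauthor VALUES
    ('A1', 'T1'), ('A1', 'T2'), ('A1', 'T3'), ('A2', 'T3');
""")

# One output row per (author, publisher) pair, with the number of titles in it.
rows = conn.execute("""
SELECT a.au_id, a.au_lname, a.au_fname, p.pub_name,
       COUNT(t.title_id) AS title_count
FROM authors a
INNER JOIN titleauthor ta ON a.au_id = ta.au_id
INNER JOIN titles t ON ta.title_id = t.title_id
INNER JOIN publishers p ON t.pub_id = p.pub_id
GROUP BY a.au_id, a.au_lname, a.au_fname, p.pub_name
ORDER BY title_count DESC, a.au_id DESC, p.pub_name ASC
""").fetchall()

for row in rows:
    print(row)

# Sanity check: the counts should add up to the number of titleauthor rows.
total_count = sum(row[-1] for row in rows)
titleauthor_rows = conn.execute("SELECT COUNT(*) FROM titleauthor").fetchone()[0]
print(total_count == titleauthor_rows)  # True
```

In this toy data, author `A1` has two titles at `Pub One`, so that pair is reported with a count of 2.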

In [13]:
QUERY3 = """ 
SELECT 
    auth.au_id AS author_id, 
    auth.au_lname AS last_name, 
    auth.au_fname AS first_name,
    titles.title AS title, 
    COUNT(titles.title_id) AS title_count

FROM 
    `ironhack-data-analytics.publications.authors` auth
INNER JOIN 
    `ironhack-data-analytics.publications.titleauthor` titau 
ON 
    auth.au_id = titau.au_id

INNER JOIN 
    `ironhack-data-analytics.publications.titles` titles 
ON 
    titau.title_id  = titles.title_id
INNER JOIN 
    `ironhack-data-analytics.publications.publishers` pubs 
ON 
    titles.pub_id  = pubs.pub_id
GROUP BY
    1,2,3,4
ORDER BY
    title_count DESC, author_id DESC
"""

In [14]:
query_job = client.query(QUERY3)

In [15]:
df3=query_job.to_dataframe()

In [16]:
df3.head(10)

| | author_id | last_name | first_name | title | title_count |
|---|---|---|---|---|---|
| 0 | 998-72-3567 | Ringer | Albert | Life Without Fear | 1 |
| 1 | 998-72-3567 | Ringer | Albert | Is Anger the Enemy? | 1 |
| 2 | 899-46-2035 | Ringer | Anne | Is Anger the Enemy? | 1 |
| 3 | 899-46-2035 | Ringer | Anne | The Gourmet Microwave | 1 |
| 4 | 846-92-7186 | Hunter | Sheryl | Secrets of Silicon Valley | 1 |
| 5 | 807-91-6654 | Panteley | Sylvia | Onions, Leeks, and Garlic: Cooking Secrets of ... | 1 |
| 6 | 756-30-7391 | Karsen | Livia | Computer Phobic AND Non-Phobic Individuals: Be... | 1 |
| 7 | 724-80-9391 | MacFeather | Stearns | Computer Phobic AND Non-Phobic Individuals: Be... | 1 |
| 8 | 724-80-9391 | MacFeather | Stearns | Cooking with Computers: Surreptitious Balance ... | 1 |
| 9 | 722-51-5454 | DeFrance | Michel | The Gourmet Microwave | 1 |


## Challenge 3 - Best Selling Authors

Who are the top 3 authors who have sold the highest number of titles? Write a query to find out.

Requirements:

* Your output should have the following columns:
	* `AUTHOR_ID` - the ID of the author
	* `LAST_NAME` - author last name
	* `FIRST_NAME` - author first name
	* `TOTAL` - total number of titles sold by this author
* Your output should be ordered based on `TOTAL` from high to low.
* Only output the top 3 best selling authors.

*Hint: To calculate the total number of titles sold by an author, you need to use the [SUM function](https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#sum). Refer to the reference and learn how to use it.*

In [17]:
QUERY4 = """ 
SELECT 
    auth.au_id AS author_id, 
    auth.au_lname AS last_name, 
    auth.au_fname AS first_name,
    SUM(sales.qty) AS total

FROM 
    `ironhack-data-analytics.publications.authors` auth
INNER JOIN 
    `ironhack-data-analytics.publications.titleauthor` titau 
ON 
    auth.au_id = titau.au_id
INNER JOIN 
    `ironhack-data-analytics.publications.titles` titles 
ON 
    titau.title_id  = titles.title_id
INNER JOIN 
    `ironhack-data-analytics.publications.sales` sales 
ON 
    titles.title_id  = sales.title_id
GROUP BY
    1,2,3
ORDER BY
    total DESC, author_id ASC
LIMIT 3
"""

In [18]:
query_job = client.query(QUERY4)

In [19]:
df4=query_job.to_dataframe()

In [20]:
df4.head(3)

| | author_id | last_name | first_name | total |
|---|---|---|---|---|
| 0 | 899-46-2035 | Ringer | Anne | 148 |
| 1 | 998-72-3567 | Ringer | Albert | 133 |
| 2 | 213-46-8915 | Green | Marjorie | 50 |


## Challenge 4 - Best Selling Authors Ranking

Now modify your solution in Challenge 3 so that the output will display all 23 authors instead of the top 3. 
Note that the authors who have sold 0 titles should also appear in your output 
(ideally display `0` instead of `NULL` as the `TOTAL`). 

Also order your results based on `TOTAL` from high to low.
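A minimal sketch of why the join type matters here, using Python's built-in `sqlite3` module and hypothetical toy rows (not the `publications` data). The same aggregation is run once with `INNER JOIN` and once with `LEFT JOIN`: only the latter keeps authors with no sales, and `COALESCE` turns their `NULL` sum into `0`.

```python
import sqlite3

# Hypothetical toy data: A1 has sales, A2 has a title with no sales,
# and A3 has no titles at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (au_id TEXT, au_lname TEXT);
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
CREATE TABLE sales (title_id TEXT, qty INTEGER);

INSERT INTO authors VALUES ('A1', 'Doe'), ('A2', 'Roe'), ('A3', 'Poe');
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A2', 'T2');
INSERT INTO sales VALUES ('T1', 10), ('T1', 5);
""")

# The {join} placeholder is filled with a fixed keyword, never with user input.
QUERY = """
SELECT a.au_id, a.au_lname, COALESCE(SUM(s.qty), 0) AS total
FROM authors a
{join} JOIN titleauthor ta ON a.au_id = ta.au_id
{join} JOIN sales s ON ta.title_id = s.title_id
GROUP BY a.au_id, a.au_lname
ORDER BY total DESC, a.au_id ASC
"""

inner_rows = conn.execute(QUERY.format(join="INNER")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT")).fetchall()

print(inner_rows)  # [('A1', 'Doe', 15)]
print(left_rows)   # [('A1', 'Doe', 15), ('A2', 'Roe', 0), ('A3', 'Poe', 0)]
```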

In [33]:
QUERY5 = """ 
SELECT 
    auth.au_id AS author_id, 
    auth.au_lname AS last_name, 
    auth.au_fname AS first_name,
    COALESCE(SUM(sales.qty), 0) AS total

FROM 
    `ironhack-data-analytics.publications.authors` auth
LEFT JOIN 
    `ironhack-data-analytics.publications.titleauthor` titau 
ON 
    auth.au_id = titau.au_id
LEFT JOIN 
    `ironhack-data-analytics.publications.titles` titles 
ON 
    titau.title_id  = titles.title_id
LEFT JOIN 
    `ironhack-data-analytics.publications.sales` sales 
ON 
    titles.title_id  = sales.title_id
GROUP BY
    1,2,3
ORDER BY
    total DESC
"""

In [34]:
query_job = client.query(QUERY5)

In [35]:
df5=query_job.to_dataframe()

In [36]:
df5.head(23)

| | author_id | last_name | first_name | total |
|---|---|---|---|---|
| 0 | 899-46-2035 | Ringer | Anne | 148 |
| 1 | 998-72-3567 | Ringer | Albert | 133 |
| 2 | 213-46-8915 | Green | Marjorie | 50 |
| 3 | 427-17-2319 | Dull | Ann | 50 |
| 4 | 846-92-7186 | Hunter | Sheryl | 50 |
| 5 | 724-80-9391 | MacFeather | Stearns | 45 |
| 6 | 267-41-2394 | O'Leary | Michael | 45 |
| 7 | 807-91-6654 | Panteley | Sylvia | 40 |
| 8 | 722-51-5454 | DeFrance | Michel | 40 |
| 9 | 238-95-7766 | Carson | Cheryl | 30 |


## Bonus Challenge - Most Profiting Authors

Authors earn money from their book sales in two ways: advance and royalties. An advance is the money that the publisher pays the author before the book comes out. The royalties the author receives are typically a percentage of the total book sales. The total profit an author receives by publishing a book is the sum of the advance and the royalties.

Given the information above, who are the 3 most profiting authors, and how much in royalties has each of them received? Write a query to find out.

Requirements:

* Your output should have the following columns:
	* `AUTHOR_ID` - the ID of the author
	* `LAST_NAME` - author last name
	* `FIRST_NAME` - author first name
	* `PROFIT` - total profit the author has received combining the advance and royalties
* Your output should be ordered from higher `PROFIT` values to lower values.
* Only output the top 3 most profiting authors.

*Hints:* 

* If a title has multiple authors, how they split the royalties can be found in the `royaltyper` column
of the `titleauthor` table.
* We assume the coauthors will split the advance in the same way as the royalties.
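The arithmetic implied by these hints can be sketched in plain Python. The function name and the sample numbers below are hypothetical; the sketch assumes that `royalty` and `royaltyper` are stored as percentages (for example `10` for 10%), which matches dividing by 100 for each of them.

```python
def author_profit(price, qty, royalty, royaltyper, advance):
    """Profit of one author for one title (hypothetical helper).

    royalty    -- percentage of sales revenue paid out as royalties
    royaltyper -- percentage of royalties and advance going to this author
    """
    # Royalties for the title, then this author's share of them.
    royalties = price * qty * royalty / 100 * royaltyper / 100
    # The advance is assumed to be split the same way as the royalties.
    advance_share = advance * royaltyper / 100
    return advance_share + royalties


# Made-up numbers: 100 copies at 20.0 each, 10% royalty,
# a 60% author share, and a 5000 advance.
profit = author_profit(price=20.0, qty=100, royalty=10, royaltyper=60, advance=5000)
print(profit)  # 3120.0
```

Here the author's royalties are 120 and the advance share is 3000, which add up to 3120.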

In [47]:
QUERY6 = """  
WITH royalties_table AS(
SELECT 
    title_id,
    au_id, 
    au_lname, 
    au_fname,
    SUM(advance_au) AS advances,
    SUM(royalties) AS royalties
FROM 
    (SELECT
        titles.title_id,
        titles.price,
        (titles.advance * titau.royaltyper / 100) AS advance_au,
        titles.royalty,
        sales.qty,
        authors.au_id,
        au_lname,
        au_fname,
        titau.royaltyper,
        (titles.price * sales.qty * titles.royalty * titau.royaltyper / 10000) AS royalties
    FROM
        `ironhack-data-analytics.publications.titles` titles         
    INNER JOIN 
        `ironhack-data-analytics.publications.sales` sales 
    ON 
        titles.title_id  = sales.title_id
    INNER JOIN 
        `ironhack-data-analytics.publications.titleauthor` titau 
    ON 
        sales.title_id = titau.title_id    
    INNER JOIN 
        `ironhack-data-analytics.publications.authors` authors 
    ON     
        authors.au_id = titau.au_id)
GROUP BY
    1,2,3,4)
SELECT
    au_id AS author_id,
    au_lname AS last_name,
    au_fname AS first_name,
    SUM(advances + royalties) AS profits
FROM
    royalties_table
GROUP BY
    1,2,3
ORDER BY
    profits DESC
LIMIT 3
"""

In [48]:
query_job = client.query(QUERY6)

In [49]:
df6=query_job.to_dataframe()

In [50]:
df6

| | author_id | last_name | first_name | profits |
|---|---|---|---|---|
| 0 | 722-51-5454 | DeFrance | Michel | 22521.528 |
| 1 | 213-46-8915 | Green | Marjorie | 14162.11 |
| 2 | 899-46-2035 | Ringer | Anne | 12128.132 |
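One point that may be worth checking in `QUERY6`: the inner subquery joins `titles` to `sales` before `advance_au` is summed, so a title with several sales rows appears to contribute its advance once per row. The sketch below reproduces that pattern on hypothetical toy data with Python's built-in `sqlite3` module and compares it with aggregating sales per title first. Whether the advance should be counted only once per title is an assumption about the intended meaning of `PROFIT`, and the `title_sales` name is made up for this example.

```python
import sqlite3

# Hypothetical toy data: one title, one author, two separate sales rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles (title_id TEXT, price REAL, advance REAL, royalty INTEGER);
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT, royaltyper INTEGER);
CREATE TABLE sales (title_id TEXT, qty INTEGER);

INSERT INTO titles VALUES ('T1', 10.0, 1000.0, 10);
INSERT INTO titleauthor VALUES ('A1', 'T1', 100);
INSERT INTO sales VALUES ('T1', 5), ('T1', 5);
""")

# Pattern similar to QUERY6: the advance is summed once per sales row.
per_sale_row = conn.execute("""
SELECT ta.au_id,
       SUM(t.advance * ta.royaltyper / 100.0)
       + SUM(t.price * s.qty * t.royalty * ta.royaltyper / 10000.0) AS profit
FROM titles t
INNER JOIN sales s ON t.title_id = s.title_id
INNER JOIN titleauthor ta ON s.title_id = ta.title_id
GROUP BY ta.au_id
""").fetchone()

# Alternative: total the sales per title first, so each title's advance
# enters the sum exactly once per author.
per_title = conn.execute("""
WITH title_sales AS (
    SELECT title_id, SUM(qty) AS qty
    FROM sales
    GROUP BY title_id
)
SELECT ta.au_id,
       SUM(t.advance * ta.royaltyper / 100.0
           + t.price * ts.qty * t.royalty * ta.royaltyper / 10000.0) AS profit
FROM titles t
INNER JOIN title_sales ts ON t.title_id = ts.title_id
INNER JOIN titleauthor ta ON t.title_id = ta.title_id
GROUP BY ta.au_id
""").fetchone()

print(per_sale_row)  # ('A1', 2010.0)
print(per_title)     # ('A1', 1010.0)
```

With two sales rows, the first query counts the 1000 advance twice, while the royalties (10 in total) are the same in both.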
