# Lab | Advanced SQL

## Introduction

In this lab you will practice SQL subqueries and action queries. We will again use the publications database from the previous lab.

Create a `solutions.ipynb` file in the `your-code` directory to record your solutions to all challenges.

## Challenge 1 - Most Profiting Authors

In this challenge, let's take a close look at the bonus challenge from the previous *SQL SELECT* lab: **who are the top 3 most profiting authors**? Even if you have solved (or think you have solved) that problem in the previous lab, please still complete this challenge, because the step-by-step guidance will help you train your problem-solving skills.

In order to solve this problem, it is important for you to keep the following points in mind:

* In table `sales`, a title can appear several times. The royalties need to be calculated for each sale.

* Although a title can have multiple `sales` records, the advance must be counted only once for each title.

* In your eventual solution, you need to sum up the following profits for each individual author:
    * All advances, each counted exactly once per title.
    * All royalties in each sale.

Therefore, you will not be able to reach the goal with a single flat SELECT query. Instead, you will build the solution in several steps. Below is an overview:

1. Calculate the royalty of each sale for each author.

1. Using the output from Step 1 as a sub-table, aggregate the total royalties for each title for each author.

1. Using the output from Step 2 as a sub-table, calculate the total profits of each author by aggregating the advances and total royalties of each title.
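The points above about advances and royalties can be checked on a tiny example. The sketch below is an illustration under stated assumptions: it uses Python's built-in `sqlite3` module with an in-memory database and hypothetical, heavily simplified `titles` and `sales` tables (made-up values, no authors), not the real `publications.db` schema. Joining `sales` repeats the title row once per sale, so summing the advance over that join counts it once per sale; aggregating the royalties per title in a subquery first lets the advance be added exactly once.

```python
import sqlite3

# Hypothetical, heavily simplified versions of the pubs tables (toy data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles (title_id TEXT, price REAL, advance REAL, royalty INTEGER);
    CREATE TABLE sales  (title_id TEXT, qty INTEGER);
    INSERT INTO titles VALUES ('T1', 10.0, 5000.0, 10);
    -- T1 has three separate sales records
    INSERT INTO sales VALUES ('T1', 5), ('T1', 10), ('T1', 20);
""")

# Naive: the join repeats the title row once per sale,
# so the advance is summed three times.
naive = conn.execute("""
    SELECT SUM(t.advance)
    FROM titles t JOIN sales s ON t.title_id = s.title_id
""").fetchone()[0]

# Fixed: aggregate royalties per title in a subquery, then add the advance once.
fixed = conn.execute("""
    SELECT t.title_id, t.advance + r.total_royalty AS profit
    FROM titles t
    JOIN (SELECT s.title_id,
                 SUM(t2.price * s.qty * t2.royalty / 100) AS total_royalty
          FROM sales s JOIN titles t2 ON t2.title_id = s.title_id
          GROUP BY s.title_id) r
      ON r.title_id = t.title_id
""").fetchall()

print(naive)  # 15000.0
print(fixed)  # [('T1', 5035.0)]
```

Here the royalties are 5.0 + 10.0 + 20.0 = 35.0, so the correct per-title figure is 5000.0 + 35.0, while the naive join reports three advances.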

Below we'll guide you through each step. In your `solutions.ipynb`, please include the SELECT queries of each step so that your TA can review your problem-solving process.

### Step 1: Calculate the royalty of each sale for each author

Write a SELECT query to obtain the following output:

* Title ID
* Author ID
* Royalty of each sale for each author
    * The formula is:
        ```
        sales_royalty = titles.price * sales.qty * titles.royalty / 100 * titleauthor.royaltyper / 100
        ```
    * Note that `titles.royalty` and `titleauthor.royaltyper` are each divided by 100 because they are stored as percentages (e.g. `10` for 10%) rather than as fractions.
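As a quick arithmetic check of the formula, here is a small Python sketch with illustrative values (not read from the database). It also shows a SQLite pitfall, assuming `royalty` and `royaltyper` are stored as integers: dividing one integer by another truncates, so `10 / 100` evaluates to `0`. Starting the expression from the `price` column, assumed here to hold real numbers, keeps the whole left-to-right computation in floating point.

```python
import sqlite3

def sales_royalty(price, qty, royalty, royaltyper):
    """Royalty one author earns from one sale (the Step 1 formula).

    royalty and royaltyper are percentages (10 means 10%), hence the / 100.
    """
    return price * qty * royalty / 100 * royaltyper / 100

# Illustrative values: price 19.99, 5 copies, 10% title royalty, 60% author share.
r = sales_royalty(19.99, 5, 10, 60)
print(round(r, 3))  # 5.997

# Pitfall: in SQLite, INTEGER / INTEGER is integer division.
conn = sqlite3.connect(":memory:")
pitfall = conn.execute("SELECT 10 / 100, 10 / 100.0").fetchone()
print(pitfall)  # (0, 0.1)
```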

In the output of this step, each title may appear more than once for each author, because a title can have more than one sale.

```python
# Load the SQL extension and connect to our database.
%load_ext sql
%sql sqlite:///publications.db
```

'Connected: @publications.db'

In this query we simply join the three tables we need (`titles`, `titleauthor` and `sales`) and select the royalties using the formula given in the instructions.

```sql
%%sql
select t.title_id as Title_id, ta.au_id as Author_id, (t.price * s.qty * t.royalty / 100 * ta.royaltyper / 100) as Royalty
from titles t inner join titleauthor ta on t.title_id = ta.title_id
inner join sales s on t.title_id = s.title_id
```

| Title_id | Author_id | Royalty |
|---|---|---|
| PS3333 | 172-32-1176 | 29.985 |
| BU1032 | 213-46-8915 | 3.9979999999999993 |
| BU1032 | 213-46-8915 | 7.995999999999999 |
| BU2075 | 213-46-8915 | 25.116000000000003 |
| PC1035 | 238-95-7766 | 110.16 |
| BU1111 | 267-41-2394 | 11.95 |
| TC7777 | 267-41-2394 | 8.994 |
| BU7832 | 274-80-9391 | 29.985 |
| BU1032 | 409-56-7008 | 5.996999999999999 |
| BU1032 | 409-56-7008 | 11.993999999999998 |


### Step 2: Aggregate the total royalties for each title for each author

Using the output from Step 1, write a query to obtain the following output:

* Title ID
* Author ID
* Aggregated royalties of each title for each author
* Hint: use the `SUM` aggregate function and group by both `au_id` and `title_id`

In the output of this step, each title should appear only once for each author.
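Step 2 is meant to use the Step 1 output as a sub-table. A minimal sketch of that shape, again with Python's `sqlite3` on an in-memory database and hypothetical toy tables (only the columns used, made-up values): the Step 1 query sits in the `FROM` clause and the outer query sums its rows per author and title. The toy title is co-written, which is why one title can legitimately yield one row per author.

```python
import sqlite3

# Hypothetical toy versions of the three pubs tables (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles      (title_id TEXT, price REAL, royalty INTEGER);
    CREATE TABLE titleauthor (au_id TEXT, title_id TEXT, royaltyper INTEGER);
    CREATE TABLE sales       (title_id TEXT, qty INTEGER);
    INSERT INTO titles VALUES ('T1', 10.0, 10);
    -- T1 is co-written: A1 gets 60% of the royalty, A2 gets 40%
    INSERT INTO titleauthor VALUES ('A1', 'T1', 60), ('A2', 'T1', 40);
    INSERT INTO sales VALUES ('T1', 5), ('T1', 10);
""")

step2 = conn.execute("""
    SELECT title_id, au_id, SUM(sales_royalty) AS agg_royalty
    FROM (
        -- Step 1: one row per (sale, author)
        SELECT t.title_id, ta.au_id,
               t.price * s.qty * t.royalty / 100 * ta.royaltyper / 100 AS sales_royalty
        FROM titles t
        JOIN titleauthor ta ON t.title_id = ta.title_id
        JOIN sales s ON t.title_id = s.title_id
    ) AS step1
    GROUP BY au_id, title_id
    ORDER BY au_id
""").fetchall()
print(step2)  # [('T1', 'A1', 9.0), ('T1', 'A2', 6.0)]
```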

We use the same query as before, but sum the royalties with `sum`, rename the column to something more fitting, and finally group by author and title. At first I thought I had made a mistake, until I realized that some titles have more than one author, so more than one person earns royalties for the same item.

```sql
%%sql
select t.title_id as Title_id, ta.au_id as Author_id, sum(t.price * s.qty * t.royalty / 100 * ta.royaltyper / 100) as Aggregated_royalty
from titles t inner join titleauthor ta on t.title_id = ta.title_id
inner join sales s on t.title_id = s.title_id
group by ta.au_id, t.title_id
```

| Title_id | Author_id | Aggregated_royalty |
|---|---|---|
| PS3333 | 172-32-1176 | 29.985 |
| BU1032 | 213-46-8915 | 11.993999999999998 |
| BU2075 | 213-46-8915 | 25.116000000000003 |
| PC1035 | 238-95-7766 | 110.16 |
| BU1111 | 267-41-2394 | 11.95 |
| TC7777 | 267-41-2394 | 8.994 |
| BU7832 | 274-80-9391 | 29.985 |
| BU1032 | 409-56-7008 | 17.990999999999996 |
| PC8888 | 427-17-2319 | 50.0 |
| TC7777 | 472-27-2349 | 8.994 |


### Step 3: Calculate the total profits of each author

Now that each title has exactly one row for each author, with the advance and royalties available, we are ready to obtain the final output. Using the output from Step 2, write a query to obtain the following output:

* Author ID
* Profits of each author by aggregating the advance and total royalties of each title

Sort the output by total profits from high to low, and limit the number of rows to 3.

We use a subquery to extract the author and total profits from the previous query. Then we sort in descending order and show only the top 3 authors.

```sql
%%sql
select Author_id, (Aggregated_royalty + advance) as Total_profits
from
(select t.title_id as Title_id, ta.au_id as Author_id, sum(t.price * s.qty * t.royalty / 100 * ta.royaltyper / 100) as Aggregated_royalty, t.advance as advance
from titles t inner join titleauthor ta on t.title_id = ta.title_id
inner join sales s on t.title_id = s.title_id
group by ta.au_id, t.title_id)
order by Total_profits desc
limit 3
```

| Author_id | Total_profits |
|---|---|
| 722-51-5454 | 15021.528 |
| 899-46-2035 | 15007.176 |
| 213-46-8915 | 10150.116 |
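The outer query above has no `GROUP BY`, so it returns one row per (author, title) pair rather than one row per author: an author credited on several titles would appear in several rows instead of as a single total. A minimal sketch of the per-author version follows, using Python's built-in `sqlite3` module on hypothetical toy tables (only the columns needed, made-up values) instead of `publications.db`. Like the query above, it adds each title's full advance to every (author, title) row.

```python
import sqlite3

# Hypothetical toy data: author A1 is credited on two titles (T1, T2).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles      (title_id TEXT, price REAL, advance REAL, royalty INTEGER);
    CREATE TABLE titleauthor (au_id TEXT, title_id TEXT, royaltyper INTEGER);
    CREATE TABLE sales       (title_id TEXT, qty INTEGER);
    INSERT INTO titles VALUES
        ('T1', 10.0, 1000.0, 10), ('T2', 20.0, 2000.0, 10), ('T3', 10.0, 2500.0, 10),
        ('T4', 10.0,  500.0, 10), ('T5', 10.0,  100.0, 10);
    INSERT INTO titleauthor VALUES
        ('A1', 'T1', 100), ('A1', 'T2', 100), ('A2', 'T3', 100),
        ('A3', 'T4', 100), ('A4', 'T5', 100);
    INSERT INTO sales VALUES ('T1', 10), ('T2', 10), ('T3', 10), ('T4', 10), ('T5', 10);
""")

# Step 2 (one row per author and title) wrapped as a subquery; the outer
# GROUP BY au_id sums every title's advance + royalties into one row per author.
top3 = conn.execute("""
    SELECT au_id, SUM(advance + agg_royalty) AS profits
    FROM (
        SELECT ta.au_id, t.title_id, t.advance,
               SUM(t.price * s.qty * t.royalty / 100 * ta.royaltyper / 100) AS agg_royalty
        FROM titles t
        JOIN titleauthor ta ON t.title_id = ta.title_id
        JOIN sales s ON t.title_id = s.title_id
        GROUP BY ta.au_id, t.title_id, t.advance
    ) AS step2
    GROUP BY au_id
    ORDER BY profits DESC
    LIMIT 3
""").fetchall()
print(top3)  # [('A1', 3030.0), ('A2', 2510.0), ('A3', 510.0)]
```

With this toy data, ranking the Step 2 rows directly would put A1 in two of the top three rows (2020.0 and 1010.0) and push A3 out; grouping by author merges them into A1's single total of 3030.0.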


## Challenge 2

Building on your solution in Challenge 1, create a table named `most_profiting_authors` to hold the data about the most profiting authors. The table should have 2 columns:

* `au_id` - Author ID
* `profits` - The profits of the author aggregating the advances and royalties
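A minimal sketch of creating such a table with `CREATE TABLE ... AS SELECT`, assuming a hypothetical stand-in table `author_profits` (made-up values) in place of the Step 3 query. Running `DROP TABLE IF EXISTS` first makes the cell safe to re-run, since a second `CREATE TABLE` with the same name fails with a "table ... already exists" error; `PRAGMA table_info` then confirms the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the Step 3 result (one row per author).
conn.executescript("""
    CREATE TABLE author_profits (Author_id TEXT, Total_profits REAL);
    INSERT INTO author_profits VALUES ('A1', 3030.0), ('A2', 2510.0);
""")

# Drop first so the statement can be re-run without an "already exists" error.
conn.executescript("""
    DROP TABLE IF EXISTS most_profiting_authors;
    CREATE TABLE most_profiting_authors AS
        SELECT Author_id AS au_id, Total_profits AS profits
        FROM author_profits
        ORDER BY profits DESC;
""")

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
cols = [row[1] for row in conn.execute("PRAGMA table_info(most_profiting_authors)")]
rows = conn.execute("SELECT au_id, profits FROM most_profiting_authors").fetchall()
print(cols)  # ['au_id', 'profits']
print(rows)
```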

We create a table containing the result of the previous query, using the column names given in the instructions.

```sql
%%sql
create table most_profiting_authors as
select Author_id as au_id, (Aggregated_royalty + advance) as profits
from
(select t.title_id as Title_id, ta.au_id as Author_id, sum(t.price * s.qty * t.royalty / 100 * ta.royaltyper / 100) as Aggregated_royalty, t.advance as advance
from titles t inner join titleauthor ta on t.title_id = ta.title_id
inner join sales s on t.title_id = s.title_id
group by ta.au_id, t.title_id)
order by profits desc
```

```
(sqlite3.OperationalError) database is locked
[SQL: create table most_profit_authors as
select Author_id as au_id, (Aggregated_royalty + advance) as profits
from
(select t.title_id as Title_id, ta.au_id as Author_id, sum(t.price * s.qty * t.royalty / 100 * ta.royaltyper / 100) as Aggregated_royalty, t.advance as advance
from titles t inner join titleauthor ta on t.title_id = ta.title_id
inner join sales s on t.title_id = s.title_id
group by ta.au_id, t.title_id)
order by profits desc]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
```


For some reason, the notebook does not create the table properly even when I specify a different name (to avoid duplicates), whereas in DB Browser it works without problems.
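`database is locked` is the message SQLite gives when another connection holds a write lock on the database file. One plausible cause here (an assumption, not confirmed by the output above) is that DB Browser had the same `publications.db` open with unsaved changes, keeping a write transaction open. The sketch below reproduces the error with two `sqlite3` connections to a temporary file: `writer` stands in for DB Browser and keeps a transaction open, `notebook` stands in for the notebook. Both names are hypothetical.

```python
import os
import sqlite3
import tempfile

# A file-backed database is needed: two ":memory:" connections would be separate DBs.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None: transactions are controlled explicitly with BEGIN/COMMIT.
writer = sqlite3.connect(path, isolation_level=None)               # stand-in for DB Browser
notebook = sqlite3.connect(path, isolation_level=None, timeout=0)  # fail immediately, don't wait

writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("BEGIN IMMEDIATE")           # take the write lock and keep the transaction open
writer.execute("INSERT INTO t VALUES (1)")  # uncommitted change, like unsaved edits in a GUI

try:
    notebook.execute("CREATE TABLE most_profiting_authors (au_id TEXT, profits REAL)")
    locked_error = None
except sqlite3.OperationalError as exc:
    locked_error = str(exc)
print(locked_error)  # database is locked

writer.execute("COMMIT")  # releasing the lock lets the other connection write
notebook.execute("CREATE TABLE most_profiting_authors (au_id TEXT, profits REAL)")
created = notebook.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 'most_profiting_authors'"
).fetchone()[0]
print(created)  # 1

writer.close()
notebook.close()
```

If this is the cause, writing or reverting the pending changes in DB Browser (or closing it) before re-running the cell should let the `CREATE TABLE` go through.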

<div align="right">Ironhack DA PT 2021</div>
    
<div align="right">Xavier Esteban</div>