# Practice on JOINs
We will use a few small datasets to get more hands-on experience with joins.

In [1]:
# This cell enables the "hint" functionality. Each question is followed by a cell containing either a hint or the solution.
from IPython.display import Pretty as disp
hint = 'https://raw.githubusercontent.com/soltaniehha/Business-Analytics-Toolbox/master/docs/hints/'  # path to hints on GitHub

First, let's have a look at these datasets:

<img src="https://github.com/soltaniehha/Business-Analytics-Toolbox/blob/master/docs/images/sample-rows.png?raw=true" width="1200" align="center"/>

And their schemas:

<img src="https://github.com/soltaniehha/Business-Analytics-Toolbox/blob/master/docs/images/sample-schema.png?raw=true" width="400" align="center"/>

We can also query them directly:

In [2]:
%%bigquery
SELECT * FROM `ba-770.public.transactions`

TransactionID,CustomerID,ProductID,Quantity,Date,Time
4203847,8103954,444,1,2019-08-09,11:20
4786540,8530495,444,1,2019-08-11,13:11
4392037,8574930,222,1,2019-08-08,9:00
4328790,8574930,111,3,2019-08-10,8:34
4029348,8430294,444,2,2019-08-10,12:30
4392564,8430294,222,2,2019-08-10,10:34


In [3]:
%%bigquery
SELECT * FROM `ba-770.public.product_info`

ID,Name,Price
444,Water,0.99
111,Orange Juice,3.49
222,Milk 3%,2.95
333,Rice,4.99


In [4]:
%%bigquery
SELECT * FROM `ba-770.public.customer_info`

CustomerID,CustomerName,Email,City,Age
8530495,John,john_smith@yahoo.com,Boston,32
8574930,Mary,mk_888@gmail.com,Boston,22
8103954,Sue,ss_2000@gmail.com,Allston,36
8430294,Alex,ag_23@hotmail.com,Brookline,21


In [5]:
%%bigquery
SELECT * FROM `ba-770.public.customer_info_past`

CustomerName,Email,City,Age,CustomerID
Alberto,al99@gmail.com,Cambridge,25,8574839
Maria,maria_lopez@yahoo.com,Brookline,43,8920395


### Question 1
Which product is the most popular, based on quantity sold?

In [6]:
%%bigquery
SELECT ProductID, SUM(Quantity) total FROM `ba-770.public.transactions` 
GROUP BY ProductID
ORDER BY total DESC

ProductID,total
444,4
222,3
111,3


Product 444 is the most popular, with a total quantity of 4, and the `product_info` table tells us that it is water.

Can you write a JOIN statement that performs this lookup automatically, so that we don't need to consult the other table by hand?

In [7]:
# Your answer goes here

ProductID,total,Name
444,4,Water
222,3,Milk 3%
111,3,Orange Juice


In [8]:
# HINT: Uncomment and execute the cell below to get help
#disp(hint + '06-03-prod-join-hint')

In [9]:
# SOLUTION: Uncomment and execute the cell below to get help
#disp(hint + '06-03-prod-join')
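
If you want to experiment without BigQuery access, the lookup can be tried locally. The following is only a minimal sketch: it assumes an in-memory SQLite database as a stand-in for BigQuery, re-creates the two tables from the rows shown above, and uses the aliases `t` and `p`, which are our own choice. The extra tie-breaker in `ORDER BY` is also an assumption, added so that the two products with a total of 3 come out in a fixed order.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; rows are copied from the outputs above
# (the Date and Time columns are left out because this query does not need them).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (TransactionID INTEGER, CustomerID INTEGER,
                           ProductID INTEGER, Quantity INTEGER);
CREATE TABLE product_info (ID INTEGER, Name TEXT, Price REAL);
INSERT INTO transactions VALUES
  (4203847, 8103954, 444, 1),
  (4786540, 8530495, 444, 1),
  (4392037, 8574930, 222, 1),
  (4328790, 8574930, 111, 3),
  (4029348, 8430294, 444, 2),
  (4392564, 8430294, 222, 2);
INSERT INTO product_info VALUES
  (444, 'Water', 0.99),
  (111, 'Orange Juice', 3.49),
  (222, 'Milk 3%', 2.95),
  (333, 'Rice', 4.99);
""")

# Join each transaction to its product, then total the quantity per product.
query = """
SELECT t.ProductID, SUM(t.Quantity) AS total, p.Name
FROM transactions AS t
INNER JOIN product_info AS p ON t.ProductID = p.ID
GROUP BY t.ProductID, p.Name
ORDER BY total DESC, t.ProductID DESC
"""
popular = conn.execute(query).fetchall()
for row in popular:
    print(row)
```

Because this is an inner join, Rice (which has no transactions) does not appear in the result.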

### Question 2
What revenue did each product generate? List the products by name.

Try to reuse the query from the previous question with modifications.

In [10]:
# Your answer goes here

Name,Revenue
Orange Juice,10.47
Milk 3%,8.850000000000001
Water,3.96
Rice,


In [11]:
# SOLUTION: Uncomment and execute the cell below to get help
#disp(hint + '06-03-prod-rev')
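
The expected output lists Rice with an empty revenue, which suggests that every product is kept even when it has no transactions. One way to get that behavior is a LEFT JOIN that starts from `product_info`. The sketch below is an illustration under assumptions, not necessarily the notebook's own solution: SQLite stands in for BigQuery, the rows are copied from the outputs above, and revenue is taken to be `Price * Quantity`.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; rows are copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (TransactionID INTEGER, CustomerID INTEGER,
                           ProductID INTEGER, Quantity INTEGER);
CREATE TABLE product_info (ID INTEGER, Name TEXT, Price REAL);
INSERT INTO transactions VALUES
  (4203847, 8103954, 444, 1),
  (4786540, 8530495, 444, 1),
  (4392037, 8574930, 222, 1),
  (4328790, 8574930, 111, 3),
  (4029348, 8430294, 444, 2),
  (4392564, 8430294, 222, 2);
INSERT INTO product_info VALUES
  (444, 'Water', 0.99),
  (111, 'Orange Juice', 3.49),
  (222, 'Milk 3%', 2.95),
  (333, 'Rice', 4.99);
""")

# LEFT JOIN keeps every product; products without transactions get NULL revenue.
query = """
SELECT p.Name, SUM(p.Price * t.Quantity) AS Revenue
FROM product_info AS p
LEFT JOIN transactions AS t ON p.ID = t.ProductID
GROUP BY p.Name
ORDER BY Revenue DESC
"""
revenue = conn.execute(query).fetchall()
for name, amount in revenue:
    print(name, amount)
```

Swapping `LEFT JOIN` for `INNER JOIN` in this sketch would drop the Rice row entirely.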

### Question 3
List all of the customers (current and past) and display all of the products they have purchased.

#### Step 1
As a first step, let's combine our two customer datasets:

In [12]:
# Your answer goes here

CustomerID,CustomerName,Email,City,Age
8574839,Alberto,al99@gmail.com,Cambridge,25
8920395,Maria,maria_lopez@yahoo.com,Brookline,43
8530495,John,john_smith@yahoo.com,Boston,32
8574930,Mary,mk_888@gmail.com,Boston,22
8103954,Sue,ss_2000@gmail.com,Allston,36
8430294,Alex,ag_23@hotmail.com,Brookline,21


In [13]:
# HINT: Uncomment and execute the cell below to get help
#disp(hint + '06-03-cust-union-hint')

In [14]:
# SOLUTION: Uncomment and execute the cell below to get help
#disp(hint + '06-03-cust-union')
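
One detail worth noticing is that the two customer tables store their columns in a different order (`CustomerID` comes last in `customer_info_past`). Set operators such as `UNION ALL` pair columns by position, so listing the columns explicitly in both halves keeps them aligned. Here is a small sketch of that idea, assuming SQLite as a local stand-in for BigQuery and rows copied from the outputs above:

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; rows are copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_info (CustomerID INTEGER, CustomerName TEXT, Email TEXT,
                            City TEXT, Age INTEGER);
CREATE TABLE customer_info_past (CustomerName TEXT, Email TEXT, City TEXT,
                                 Age INTEGER, CustomerID INTEGER);
INSERT INTO customer_info VALUES
  (8530495, 'John', 'john_smith@yahoo.com', 'Boston', 32),
  (8574930, 'Mary', 'mk_888@gmail.com', 'Boston', 22),
  (8103954, 'Sue', 'ss_2000@gmail.com', 'Allston', 36),
  (8430294, 'Alex', 'ag_23@hotmail.com', 'Brookline', 21);
INSERT INTO customer_info_past VALUES
  ('Alberto', 'al99@gmail.com', 'Cambridge', 25, 8574839),
  ('Maria', 'maria_lopez@yahoo.com', 'Brookline', 43, 8920395);
""")

# Name the columns in the same order on both sides instead of using SELECT *.
query = """
SELECT CustomerID, CustomerName, Email, City, Age FROM customer_info_past
UNION ALL
SELECT CustomerID, CustomerName, Email, City, Age FROM customer_info
"""
all_customers = conn.execute(query).fetchall()
for row in all_customers:
    print(row)
```

`UNION ALL` keeps duplicate rows, while `UNION DISTINCT` (plain `UNION` in SQLite) removes them; with these two tables there is no overlap, so both give the same six customers.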

#### Step 2
Now we can use this combined query to find out what these customers have purchased. Join it with the `transactions` table to get the products each customer bought:

In [15]:
# Your answer goes here

CustomerName,ProductID
John,444.0
Mary,222.0
Mary,111.0
Sue,444.0
Alex,444.0
Alex,222.0
Alberto,
Maria,


In [16]:
# SOLUTION: Uncomment and execute the cell below to get help
#disp(hint + '06-03-cust-union-trans')

#### Step 3
We can "chain" another join to the one above and bring in the product names:

In [17]:
# Your answer goes here

CustomerName,ProductID,ProdcutName
John,444.0,Water
Mary,222.0,Milk 3%
Mary,111.0,Orange Juice
Sue,444.0,Water
Alex,444.0,Water
Alex,222.0,Milk 3%
Alberto,,
Maria,,


In [18]:
# SOLUTION: Uncomment and execute the cell below to get help
#disp(hint + '06-03-cust-union-trans-prod')
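
To see what "chaining" looks like in practice, here is a hedged local sketch. It assumes SQLite instead of BigQuery, keeps only the columns this step needs, and uses a common table expression named `all_customers` plus the aliases `c`, `t`, and `p`, all of which are our own naming. Each LEFT JOIN builds on the result of the previous one, so customers without transactions survive both joins with empty product columns.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery. Only the columns needed for this
# step are re-created; the values are copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_info (CustomerID INTEGER, CustomerName TEXT);
CREATE TABLE customer_info_past (CustomerID INTEGER, CustomerName TEXT);
CREATE TABLE transactions (CustomerID INTEGER, ProductID INTEGER, Quantity INTEGER);
CREATE TABLE product_info (ID INTEGER, Name TEXT, Price REAL);
INSERT INTO customer_info VALUES
  (8530495, 'John'), (8574930, 'Mary'), (8103954, 'Sue'), (8430294, 'Alex');
INSERT INTO customer_info_past VALUES
  (8574839, 'Alberto'), (8920395, 'Maria');
INSERT INTO transactions VALUES
  (8103954, 444, 1), (8530495, 444, 1), (8574930, 222, 1),
  (8574930, 111, 3), (8430294, 444, 2), (8430294, 222, 2);
INSERT INTO product_info VALUES
  (444, 'Water', 0.99), (111, 'Orange Juice', 3.49),
  (222, 'Milk 3%', 2.95), (333, 'Rice', 4.99);
""")

# First join: customers -> transactions. Second join: transactions -> products.
query = """
WITH all_customers AS (
  SELECT CustomerID, CustomerName FROM customer_info_past
  UNION ALL
  SELECT CustomerID, CustomerName FROM customer_info
)
SELECT c.CustomerName, t.ProductID, p.Name AS ProductName
FROM all_customers AS c
LEFT JOIN transactions AS t ON c.CustomerID = t.CustomerID
LEFT JOIN product_info AS p ON t.ProductID = p.ID
"""
purchases = conn.execute(query).fetchall()
for row in purchases:
    print(row)
```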

#### Step 4
Finally, we can group this table by customer and use `STRING_AGG()` to arrive at our answer.

Hint: Use a subquery to make things easier to write and read.

In [19]:
# Your answer goes here

CustomerName,Products
Alberto,
Alex,"Water,Milk 3%"
John,Water
Maria,
Mary,"Milk 3%,Orange Juice"
Sue,Water


In [20]:
# SOLUTION: Uncomment and execute the cell below to get help
#disp(hint + '06-03-cust-union-trans-prod-final')
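
The subquery pattern can also be tried locally. In this sketch SQLite stands in for BigQuery, and SQLite's `GROUP_CONCAT()` plays the role of `STRING_AGG()`; that substitution, the reduced set of columns, and the subquery alias `purchases` are all assumptions made for illustration. The order of the names inside each aggregated string is not guaranteed here.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery. Only the columns needed for this
# step are re-created; the values are copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_info (CustomerID INTEGER, CustomerName TEXT);
CREATE TABLE customer_info_past (CustomerID INTEGER, CustomerName TEXT);
CREATE TABLE transactions (CustomerID INTEGER, ProductID INTEGER, Quantity INTEGER);
CREATE TABLE product_info (ID INTEGER, Name TEXT, Price REAL);
INSERT INTO customer_info VALUES
  (8530495, 'John'), (8574930, 'Mary'), (8103954, 'Sue'), (8430294, 'Alex');
INSERT INTO customer_info_past VALUES
  (8574839, 'Alberto'), (8920395, 'Maria');
INSERT INTO transactions VALUES
  (8103954, 444, 1), (8530495, 444, 1), (8574930, 222, 1),
  (8574930, 111, 3), (8430294, 444, 2), (8430294, 222, 2);
INSERT INTO product_info VALUES
  (444, 'Water', 0.99), (111, 'Orange Juice', 3.49),
  (222, 'Milk 3%', 2.95), (333, 'Rice', 4.99);
""")

# The inner query produces one row per purchase; the outer query collapses
# those rows into one comma-separated string per customer.
query = """
SELECT CustomerName, GROUP_CONCAT(ProductName, ',') AS Products
FROM (
  SELECT c.CustomerName, p.Name AS ProductName
  FROM (
    SELECT CustomerID, CustomerName FROM customer_info_past
    UNION ALL
    SELECT CustomerID, CustomerName FROM customer_info
  ) AS c
  LEFT JOIN transactions AS t ON c.CustomerID = t.CustomerID
  LEFT JOIN product_info AS p ON t.ProductID = p.ID
) AS purchases
GROUP BY CustomerName
ORDER BY CustomerName
"""
products = conn.execute(query).fetchall()
for name, items in products:
    print(name, items)
```

Customers with no purchases end up with an empty (NULL) product list, because the aggregate has no non-NULL values to concatenate.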

### Question 4
Customers from which city have spent the most money? What is their average age?

1. Union the two customer tables
2. Join it with transactions
3. Join the result with the product table
4. Aggregate over City

In [21]:
# Your answer goes here

City,AverageAge,Revenue
Boston,25.33333333333333,7.430000000000001
Brookline,28.33333333333333,3.94
Allston,36.0,0.99
Cambridge,25.0,


In [22]:
# SOLUTION: Uncomment and execute the cell below to get help
#disp(hint + '06-03-city-rev')
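
The four steps above can be sketched locally as follows, again assuming SQLite as a stand-in for BigQuery and our own aliases. The figures displayed above are consistent with summing `Price` once per transaction row and averaging `Age` over the joined rows, so a customer with two transactions counts twice in the average. The sketch reproduces that behavior and, purely for comparison, adds a hypothetical `QuantityWeightedRevenue` column that multiplies by `Quantity`; which of the two the exercise intends is not stated here.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery. Only the columns needed for this
# question are re-created; the values are copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_info (CustomerID INTEGER, City TEXT, Age INTEGER);
CREATE TABLE customer_info_past (CustomerID INTEGER, City TEXT, Age INTEGER);
CREATE TABLE transactions (CustomerID INTEGER, ProductID INTEGER, Quantity INTEGER);
CREATE TABLE product_info (ID INTEGER, Name TEXT, Price REAL);
INSERT INTO customer_info VALUES
  (8530495, 'Boston', 32), (8574930, 'Boston', 22),
  (8103954, 'Allston', 36), (8430294, 'Brookline', 21);
INSERT INTO customer_info_past VALUES
  (8574839, 'Cambridge', 25), (8920395, 'Brookline', 43);
INSERT INTO transactions VALUES
  (8103954, 444, 1), (8530495, 444, 1), (8574930, 222, 1),
  (8574930, 111, 3), (8430294, 444, 2), (8430294, 222, 2);
INSERT INTO product_info VALUES
  (444, 'Water', 0.99), (111, 'Orange Juice', 3.49),
  (222, 'Milk 3%', 2.95), (333, 'Rice', 4.99);
""")

# 1. union the customers, 2. join transactions, 3. join products, 4. aggregate by city.
query = """
WITH all_customers AS (
  SELECT CustomerID, City, Age FROM customer_info_past
  UNION ALL
  SELECT CustomerID, City, Age FROM customer_info
)
SELECT c.City,
       AVG(c.Age) AS AverageAge,
       SUM(p.Price) AS Revenue,                            -- matches the figures shown above
       SUM(p.Price * t.Quantity) AS QuantityWeightedRevenue -- hypothetical comparison column
FROM all_customers AS c
LEFT JOIN transactions AS t ON c.CustomerID = t.CustomerID
LEFT JOIN product_info AS p ON t.ProductID = p.ID
GROUP BY c.City
ORDER BY Revenue DESC
"""
by_city = conn.execute(query).fetchall()
for row in by_city:
    print(row)
```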

### Question 5
Take the last query and:

1. Print a rounded version of `AverageAge`
2. Print only the cities that have generated some revenue

In [23]:
# Your answer goes here

City,AverageAge,Revenue
Boston,25.0,7.430000000000001
Brookline,28.0,3.94
Allston,36.0,0.99


In [24]:
# SOLUTION: Uncomment and execute the cell below to get help
#disp(hint + '06-03-city-rev-clean')
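
Both refinements can be expressed directly in SQL: `ROUND()` for the average, and a `HAVING` clause to filter groups after aggregation. The sketch below is one possible approach under assumptions, not necessarily the notebook's solution: SQLite stands in for BigQuery, the aliases are ours, and revenue is computed as `SUM(p.Price)` to mirror the figures displayed above.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery. Only the columns needed for this
# question are re-created; the values are copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_info (CustomerID INTEGER, City TEXT, Age INTEGER);
CREATE TABLE customer_info_past (CustomerID INTEGER, City TEXT, Age INTEGER);
CREATE TABLE transactions (CustomerID INTEGER, ProductID INTEGER, Quantity INTEGER);
CREATE TABLE product_info (ID INTEGER, Name TEXT, Price REAL);
INSERT INTO customer_info VALUES
  (8530495, 'Boston', 32), (8574930, 'Boston', 22),
  (8103954, 'Allston', 36), (8430294, 'Brookline', 21);
INSERT INTO customer_info_past VALUES
  (8574839, 'Cambridge', 25), (8920395, 'Brookline', 43);
INSERT INTO transactions VALUES
  (8103954, 444, 1), (8530495, 444, 1), (8574930, 222, 1),
  (8574930, 111, 3), (8430294, 444, 2), (8430294, 222, 2);
INSERT INTO product_info VALUES
  (444, 'Water', 0.99), (111, 'Orange Juice', 3.49),
  (222, 'Milk 3%', 2.95), (333, 'Rice', 4.99);
""")

# ROUND() tidies the average; HAVING drops groups whose revenue is NULL.
query = """
WITH all_customers AS (
  SELECT CustomerID, City, Age FROM customer_info_past
  UNION ALL
  SELECT CustomerID, City, Age FROM customer_info
)
SELECT c.City, ROUND(AVG(c.Age)) AS AverageAge, SUM(p.Price) AS Revenue
FROM all_customers AS c
LEFT JOIN transactions AS t ON c.CustomerID = t.CustomerID
LEFT JOIN product_info AS p ON t.ProductID = p.ID
GROUP BY c.City
HAVING SUM(p.Price) IS NOT NULL
ORDER BY Revenue DESC
"""
clean = conn.execute(query).fetchall()
for row in clean:
    print(row)
```

`HAVING` is used rather than `WHERE` because the filter applies to an aggregated value. Filtering individual rows before grouping would also remove Maria from Brookline and change that city's average age.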