# Using Joins to Analyze Book Sales in SQL

We will look at the database of Gravity Books, a fictional bookstore.

First we will use some simple queries to get a feel for the structure and meaning of the different tables and columns. Then we will join tables to produce insights we could not get from a single table: first with a basic join, then building on it with `GROUP BY` and `COUNT`.

In [None]:
-- Select the first few customer rows to check that the integration is set up correctly

SELECT *
FROM customer
LIMIT 10


Unnamed: 0,customer_id,first_name,last_name,email
0,1,Ursola,Purdy,upurdy0@cdbaby.com
1,2,Ruthanne,Vatini,rvatini1@fema.gov
2,3,Reidar,Turbitt,rturbitt2@geocities.jp
3,4,Rich,Kirsz,rkirsz3@jalbum.net
4,5,Carline,Kupis,ckupis4@tamu.edu
5,6,Kandy,Adamec,kadamec5@weather.com
6,7,Jermain,Giraudeau,jgiraudeau6@elpais.com
7,8,Nolly,Bonicelli,nbonicelli7@examiner.com
8,9,Phebe,Curdell,pcurdell8@usa.gov
9,10,Euell,Guilder,eguilder9@themeforest.net


## 1. How are the tables linked in the Gravity Books database?

- We'll be focusing on the customer- and order-related tables.

### What do the tables tell us about the database structure?

In this case we can derive from the tables and columns that:

- Each `customer` has orders in `cust_order` (linked by `customer_id`).
- The status changes of the orders in `cust_order` are recorded in `order_history` (linked by `order_id`).
- To understand what the `status_id` in `order_history` means we need to look at `order_status`.
- More information about what is in an order from `cust_order` is in `order_line`.
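
The chain of links described above can be sketched on a toy database. This is a minimal illustration using Python's built-in `sqlite3` module, not the Gravity Books data: the table and column names follow the text, but the rows are invented, and only the columns mentioned in this notebook are modeled (the real tables may well have more).

```python
import sqlite3

# Toy in-memory database. Names follow the text; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE cust_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_status (status_id INTEGER PRIMARY KEY, status_value TEXT);
CREATE TABLE order_history (history_id INTEGER PRIMARY KEY, order_id INTEGER,
                            status_id INTEGER, status_date TEXT);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Ben');
INSERT INTO cust_order VALUES (10, 1), (11, 2);
INSERT INTO order_status VALUES (1, 'Order Received'), (4, 'Delivered');
INSERT INTO order_history VALUES
    (100, 10, 1, '2021-01-01'),
    (101, 10, 4, '2021-01-03'),
    (102, 11, 1, '2021-01-02');
""")

# Follow the links: customer -> cust_order -> order_history -> order_status.
chain = conn.execute("""
    SELECT c.first_name, co.order_id, os.status_value
    FROM customer AS c
    INNER JOIN cust_order AS co ON co.customer_id = c.customer_id
    INNER JOIN order_history AS oh ON oh.order_id = co.order_id
    INNER JOIN order_status AS os ON os.status_id = oh.status_id
    ORDER BY oh.history_id
""").fetchall()

for row in chain:
    print(row)
```

Each join follows one of the bullet points: `customer_id` connects customers to orders, `order_id` connects orders to their history, and `status_id` turns a numeric status into a readable value.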

## 2. What do the ids in the status tables mean?

- Selecting the full `address_status` table to understand which id corresponds to which status, then repeating this for the `order_status` table.

In [3]:
-- Look up the meaning of address_status ids
SELECT * FROM address_status;

-- Look up the meaning of order_status ids
SELECT * FROM order_status;

Unnamed: 0,status_id,status_value
0,1,Order Received
1,2,Pending Delivery
2,3,Delivery In Progress
3,4,Delivered
4,5,Cancelled
5,6,Returned


## 3. How does the order_history table work?

The table name suggests that it holds historical information. Looking at the columns, each entry gets a unique id, `history_id`, so there are most likely multiple entries per `order_id`.
The only other columns are `status_id` and `status_date`, which indicates that the table stores the date on which an order moved to a specific status.

In [7]:
-- Select an order_id from order_history with a status_id corresponding to `Returned`
--SELECT * FROM order_history WHERE status_id = 6 LIMIT 10;

-- Select all data in order_history with the order_id you found with the query above.
SELECT * FROM order_history WHERE order_id = 4412;

Unnamed: 0,history_id,order_id,status_id,status_date
0,4411,4412,1,2021-01-13 10:54:14.267000+00:00
1,10842,4412,2,2021-01-14 07:27:36.267000+00:00
2,14396,4412,3,2021-01-14 07:49:58.267000+00:00
3,19564,4412,4,2021-01-16 04:00:18.267000+00:00
4,22149,4412,6,2021-01-13 10:56:33.267000+00:00
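
To read such a history without looking up each `status_id` by hand, the rows can be joined to `order_status`. The sketch below is an illustration only: it loads the five rows shown above and the `order_status` values from section 2 into an in-memory SQLite database (an assumption made so the example runs on its own) and stores the timestamps as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_status (status_id INTEGER PRIMARY KEY, status_value TEXT);
CREATE TABLE order_history (history_id INTEGER PRIMARY KEY, order_id INTEGER,
                            status_id INTEGER, status_date TEXT);
INSERT INTO order_status VALUES
    (1, 'Order Received'), (2, 'Pending Delivery'), (3, 'Delivery In Progress'),
    (4, 'Delivered'), (5, 'Cancelled'), (6, 'Returned');
-- Rows copied from the query output above; timestamps kept as text.
INSERT INTO order_history VALUES
    (4411,  4412, 1, '2021-01-13 10:54:14.267000+00:00'),
    (10842, 4412, 2, '2021-01-14 07:27:36.267000+00:00'),
    (14396, 4412, 3, '2021-01-14 07:49:58.267000+00:00'),
    (19564, 4412, 4, '2021-01-16 04:00:18.267000+00:00'),
    (22149, 4412, 6, '2021-01-13 10:56:33.267000+00:00');
""")

# Replace each status_id with its readable value and sort by date.
timeline = conn.execute("""
    SELECT oh.status_date, os.status_value
    FROM order_history AS oh
    INNER JOIN order_status AS os ON os.status_id = oh.status_id
    WHERE oh.order_id = 4412
    ORDER BY oh.status_date
""").fetchall()

for status_date, status_value in timeline:
    print(status_date, status_value)
```

Sorted by `status_date`, the `Returned` entry comes second, because its timestamp is earlier than those of the delivery steps; sorted by `history_id` it would come last. It is worth keeping this in mind before relying on `status_date` to reconstruct the order of events.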


## 4. How many orders are returned by users?

A colleague working in the bookstore had the same customer come in twice in a single week to return an order. They thought this was unusual and asked you to investigate. Let us look into the data to see whether a significant number of users return multiple orders.

- Writing a query to find the number of returned orders in `order_history`.

In [9]:
SELECT
	COUNT(*)
FROM order_history
WHERE status_id = 6

Unnamed: 0,count
0,200


## 5. Join the order_history and cust_order tables

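To analyze the number of users returning multiple orders we have to link returned orders to customers. `order_history` records the status of an order but not who placed it; the `customer_id` is stored in `cust_order`, so we join the two tables on `order_id`.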


In [18]:
--SELECT *
--FROM order_history AS oh
--INNER JOIN cust_order AS co
--	ON oh.order_id = co.order_id
--WHERE oh.status_id = 6;

SELECT customer_id, COUNT(*) AS num_of_returns
FROM (SELECT co.customer_id  -- only the column the outer query needs
		FROM order_history AS oh
		INNER JOIN cust_order AS co
			ON oh.order_id = co.order_id
		WHERE oh.status_id = 6) AS sample
GROUP BY customer_id
HAVING COUNT(*) > 1
ORDER BY num_of_returns DESC;


Unnamed: 0,customer_id,num_of_returns
0,107,3
1,60,3
2,52,2
3,97,2
4,103,2
5,42,2
6,43,2
7,57,2
8,3,2
9,546,2


## 6. Do a significant number of users return multiple orders?

Now that we have successfully joined the `order_history` and `cust_order` tables, we can add the other parts needed to get a clear view of outliers in the data.

### Instructions

- Looking only at returned orders.
- Ordering the data by the number of returned orders, from most to least.

In [3]:
SELECT co.customer_id,
		COUNT(oh.order_id) AS returned_orders
FROM order_history AS oh
INNER JOIN cust_order AS co
	ON oh.order_id = co.order_id
WHERE oh.status_id = 6
GROUP BY co.customer_id
ORDER BY returned_orders DESC

Unnamed: 0,customer_id,returned_orders
0,60,3
1,107,3
2,83,2
3,42,2
4,94,2
...,...,...
171,92,1
172,1598,1
173,679,1
174,337,1
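
One way to judge whether the number of repeat returners is significant is to aggregate a second time and count how many customers fall into each `returned_orders` bucket. The sketch below shows the idea on invented rows in an in-memory SQLite database; the customer ids and counts are hypothetical and do not reproduce the Gravity Books result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cust_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_history (history_id INTEGER PRIMARY KEY, order_id INTEGER,
                            status_id INTEGER);
-- Invented data: customer 60 returns 3 orders, 83 returns 2,
-- 92 returns 1, and 337 returns 1 of their 2 orders.
INSERT INTO cust_order VALUES
    (1, 60), (2, 60), (3, 60), (4, 83), (5, 83), (6, 92), (7, 337), (8, 337);
INSERT INTO order_history VALUES
    (1, 1, 6), (2, 2, 6), (3, 3, 6), (4, 4, 6),
    (5, 5, 6), (6, 6, 6), (7, 7, 6), (8, 8, 4);
""")

# Inner query: returns per customer (same logic as the query above).
# Outer query: how many customers have 1, 2, 3, ... returns.
distribution = conn.execute("""
    SELECT returned_orders, COUNT(*) AS num_customers
    FROM (SELECT co.customer_id,
                 COUNT(oh.order_id) AS returned_orders
          FROM order_history AS oh
          INNER JOIN cust_order AS co
              ON oh.order_id = co.order_id
          WHERE oh.status_id = 6
          GROUP BY co.customer_id) AS per_customer
    GROUP BY returned_orders
    ORDER BY returned_orders DESC
""").fetchall()

for returned_orders, num_customers in distribution:
    print(returned_orders, num_customers)
```

On the real data, the same outer query would condense the 175-row result into a handful of rows, which makes it easier to see how rare multiple returns are.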


## 7. Who are the customers returning more orders than usual?

Having shown that some users return more orders than others, we would like a list of users to investigate or contact, so that we can find out why they return more orders than usual.

- Filtering for users that have returned 2 or more orders using a `having` clause.
- Adding additional data for each customer: `first_name`, `last_name` and `email`.

In [4]:
SELECT co.customer_id, c.first_name, c.last_name, c.email,
		COUNT(oh.order_id) AS returned_orders
FROM cust_order AS co
INNER JOIN order_history AS oh
	ON co.order_id = oh.order_id
LEFT JOIN customer AS c
	ON co.customer_id = c.customer_id
WHERE oh.status_id = 6
GROUP BY co.customer_id, c.first_name, c.last_name, c.email
HAVING COUNT(oh.order_id) >= 2
ORDER BY returned_orders DESC

Unnamed: 0,customer_id,first_name,last_name,email,returned_orders
0,107,Estelle,Alton,ealton2y@vimeo.com,3
1,60,Kincaid,De Avenell,kdeavenell1n@wikia.com,3
2,38,,,,2
3,2,Ruthanne,Vatini,rvatini1@fema.gov,2
4,57,Doyle,Shimwell,dshimwell1k@ox.ac.uk,2
5,52,Levy,Thacker,lthacker1f@hc360.com,2
6,344,Mariette,Tulley,mtulley9j@yahoo.co.jp,2
7,248,Edyth,Revie,erevie6v@liveinternet.ru,2
8,97,Noellyn,Sanderson,nsanderson2o@webnode.com,2
9,14,Gusella,Quogan,gquogand@whitehouse.gov,2
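
The row for `customer_id` 38 has empty name and email fields. This is how a `LEFT JOIN` behaves when no matching row exists on the right side: the left row is kept and the right-hand columns are filled with `NULL`. The sketch below illustrates the difference from an `INNER JOIN` on invented rows in an in-memory SQLite database. Whether a missing `customer` row is the actual cause for customer 38 in the Gravity Books data is an assumption, not something the output above proves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE cust_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_history (history_id INTEGER PRIMARY KEY, order_id INTEGER,
                            status_id INTEGER);
-- Invented data: customer 38 has orders but no row in customer.
INSERT INTO customer VALUES (2, 'Ruthanne');
INSERT INTO cust_order VALUES (1, 38), (2, 38), (3, 2), (4, 2);
INSERT INTO order_history VALUES (1, 1, 6), (2, 2, 6), (3, 3, 6), (4, 4, 6);
""")

# The same query is run twice; only the join type to customer changes.
query = """
    SELECT co.customer_id, c.first_name,
           COUNT(oh.order_id) AS returned_orders
    FROM cust_order AS co
    INNER JOIN order_history AS oh
        ON co.order_id = oh.order_id
    {join_type} JOIN customer AS c
        ON co.customer_id = c.customer_id
    WHERE oh.status_id = 6
    GROUP BY co.customer_id, c.first_name
    HAVING COUNT(oh.order_id) >= 2
    ORDER BY co.customer_id
"""

left_rows = conn.execute(query.format(join_type="LEFT")).fetchall()
inner_rows = conn.execute(query.format(join_type="INNER")).fetchall()

print("LEFT JOIN: ", left_rows)   # customer 38 kept, first_name is None
print("INNER JOIN:", inner_rows)  # customer 38 dropped
```

For a contact list, the `LEFT JOIN` is the safer choice, since it surfaces customers whose details are missing instead of silently dropping them.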
