# Practical Uses of SQL For Making Business Decisions

The goal of this project is to make business decisions based on SQL queries. It uses the Chinook database, which has the following schema.

<img src = 'chinook-schema.svg' width=500>

In [1]:
%%capture
%load_ext sql
%sql sqlite:///chinook.db

In [2]:
%%sql
SELECT name, type FROM sqlite_master
WHERE type IN ('table', 'view');

 * sqlite:///chinook.db
Done.


name,type
album,table
artist,table
customer,table
employee,table
genre,table
invoice,table
invoice_line,table
media_type,table
playlist,table
playlist_track,table


## Top-Selling Genres (To Determine New Record Purchases)

The first thing I want to do is look at which tables and views the database contains, which is what the query above does. In addition to the tables in the schema, there are two views:

1. customer_usa: Customers in the USA
2. customer_gt_90_dollars: Customers who have purchased more than 90 dollars in music

A new record label has signed on to distribute through our store. We have agreed that I will sell three of the four new albums they are offering:

|Artist Name|Genre|
|---|---|
|Regal|Hip-Hop|
|Red Tone|Punk|
|Meteor and the Girls|Pop|
|Slim Jim Bites|Blues|

Since I don't know anything about the individual artists, the only information I have to guide the purchase is each artist's name and genre.

In general, unless it's a big-name artist, the name has little impact on sales. These names are fairly neutral in sentiment, so I will let the genre be the deciding factor.

In [3]:
%%sql
DROP VIEW IF EXISTS genre_count;
CREATE VIEW genre_count AS
SELECT g.name genre_name,
       SUM(il.quantity) number_of_tracks
FROM invoice_line il
INNER JOIN track t ON t.track_id = il.track_id
INNER JOIN genre g ON t.genre_id = g.genre_id
GROUP BY 1
ORDER BY 2 DESC;

SELECT genre_name,
       number_of_tracks,
       ROUND(CAST(number_of_tracks AS FLOAT) * 100 /
             (SELECT SUM(number_of_tracks) FROM genre_count), 2) genre_percent
FROM genre_count;

 * sqlite:///chinook.db
Done.
Done.
Done.


genre_name,number_of_tracks,genre_percent
Rock,2635,55.39
Metal,619,13.01
Alternative & Punk,492,10.34
Latin,167,3.51
R&B/Soul,159,3.34
Blues,124,2.61
Jazz,121,2.54
Alternative,117,2.46
Easy Listening,74,1.56
Pop,63,1.32


The four options are Punk, Pop, Hip-Hop, and Blues, and the sales quantities are shown above. Based on those numbers, I would order the Punk, Blues, and Pop albums and leave out the Hip-Hop album. That said, Pop and Hip-Hop are fairly close to each other at around 1% of sales, so the third pick is a marginal call; as it stands, I would probably be happiest taking just the Punk and Blues albums.
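
As a quick check on that comparison, here is a minimal sketch in Python that pulls out only the candidate genres. It does not touch chinook.db: it rebuilds a stand-in `genre_count` table in memory from the ten rows displayed above, so Hip-Hop, which is not among those rows, comes back as missing rather than as a number. The `LIKE` patterns are my own assumption about how the candidate genres map onto the genre names in the database.

```python
import sqlite3

# Rows copied from the ten lines of genre_count output displayed above.
# Genres that are not displayed there (including Hip-Hop) are deliberately absent.
SHOWN_ROWS = [
    ("Rock", 2635), ("Metal", 619), ("Alternative & Punk", 492),
    ("Latin", 167), ("R&B/Soul", 159), ("Blues", 124), ("Jazz", 121),
    ("Alternative", 117), ("Easy Listening", 74), ("Pop", 63),
]

# Hypothetical search patterns, one per candidate album genre.
CANDIDATE_PATTERNS = {
    "Hip-Hop": "%Hip%Hop%",
    "Punk": "%Punk%",
    "Pop": "Pop",
    "Blues": "Blues",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genre_count (genre_name TEXT, number_of_tracks INTEGER)")
conn.executemany("INSERT INTO genre_count VALUES (?, ?)", SHOWN_ROWS)

candidate_sales = {}
for label, pattern in CANDIDATE_PATTERNS.items():
    # SUM over zero matching rows yields NULL, which sqlite3 returns as None.
    (tracks_sold,) = conn.execute(
        "SELECT SUM(number_of_tracks) FROM genre_count WHERE genre_name LIKE ?",
        (pattern,),
    ).fetchone()
    candidate_sales[label] = tracks_sold

for label, tracks_sold in candidate_sales.items():
    shown = tracks_sold if tracks_sold is not None else "not in the displayed rows"
    print(f"{label}: {shown}")
```

Against the real database, the same `WHERE genre_name LIKE ?` filter could be run on the `genre_count` view directly, which would also surface the Hip-Hop figure.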

One thing I notice from my own experience with music: Alternative and Punk are so different that they should not be combined into one genre. Punk by itself is already a diverse genre, and Alternative is rather all-encompassing. Grouping the two together may be badly distorting how the sales are represented.

## Top Sales Agents

I want to determine the top sales agents for my music store. If I can determine who has the best sales velocity (sales value per unit of time), I can try to work out what makes them sell more than the other sales agents. That information will be valuable for training new sales agents.

I am going to build a table with the agent name, total sales, and hire date, and then compute the sales velocity from it.

- First, I will compile a table that holds each customer's ID, support rep ID, and total sales.
- I will then join this table to the employee table to get total sales for each employee, along with their hire date.

In [4]:
%%sql
DROP VIEW IF EXISTS customer_total;
CREATE VIEW customer_total AS
    SELECT c.customer_id customer_id,
           c.support_rep_id support_rep_id,
           PRINTF('%.2f',SUM(i.total)) total
    FROM customer c
    INNER JOIN invoice i ON c.customer_id = i.customer_id
    GROUP BY 1;

SELECT * FROM customer_total
LIMIT 5;

 * sqlite:///chinook.db
Done.
Done.
Done.


customer_id,support_rep_id,total
1,3,108.9
2,5,82.17
3,3,99.99
4,4,72.27
5,4,144.54


In [5]:
%%sql
SELECT e.first_name || ' ' || e.last_name sales_agent,
       e.hire_date,
       JULIANDAY((SELECT MAX(invoice_date) FROM invoice)) - JULIANDAY(hire_date) work_length_days,
       PRINTF('%.2f',SUM(ct.total)) sales_total,
       SUM(ct.total) / (JULIANDAY((SELECT MAX(invoice_date) FROM invoice)) - JULIANDAY(hire_date)) AS sales_velocity
FROM employee e
INNER JOIN customer_total ct
ON e.employee_id = ct.support_rep_id
GROUP BY sales_agent
ORDER BY sales_velocity DESC;

 * sqlite:///chinook.db
Done.


sales_agent,hire_date,work_length_days,sales_total,sales_velocity
Jane Peacock,2017-04-01 00:00:00,1369.0,1731.51,1.264799123447772
Steve Johnson,2017-10-17 00:00:00,1170.0,1393.92,1.1913846153846157
Margaret Park,2017-05-03 00:00:00,1337.0,1584.0,1.1847419596110698


### Conclusions

- After looking through the invoice data, I noticed that the last invoice was dated 2020-12-30 and that no employees were listed with a date of leaving employment.

- Using the difference between the last invoice date and the date of hire, I was able to create a length of time, in days, that each employee had worked for the company.

- Dividing the total sales value by the number of days worked gives the metric that I am calling *sales velocity*:

$$\text{Sales Velocity} = \frac{\text{Total Sales}}{\text{Days Worked}}$$

- Initially, looking only at the hire dates, I assumed that Jane Peacock was not the top performer, because Steve Johnson had accumulated only a few hundred dollars less in sales while having worked about six months less than Jane. That was before I took the final invoice date into account.

- Our clear winner is Jane Peacock, whose sales velocity is roughly 6 to 7 percent higher than that of the other two. Steve Johnson comes in second, only slightly ahead of Margaret Park.
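
The figures behind these conclusions can be reproduced by hand. The sketch below is plain Python rather than SQL and uses only the hire dates, sales totals, and last invoice date reported above; the variable names are mine, and the date subtraction stands in for the `JULIANDAY` difference used in the query.

```python
from datetime import date

# Last invoice date noted in the conclusions.
LAST_INVOICE = date(2020, 12, 30)

# (hire date, total sales) per agent, copied from the query output above.
AGENTS = {
    "Jane Peacock": (date(2017, 4, 1), 1731.51),
    "Steve Johnson": (date(2017, 10, 17), 1393.92),
    "Margaret Park": (date(2017, 5, 3), 1584.00),
}

days_worked = {}
velocity = {}
for name, (hired, total_sales) in AGENTS.items():
    days_worked[name] = (LAST_INVOICE - hired).days
    # Sales velocity = total sales / days worked.
    velocity[name] = total_sales / days_worked[name]

best = max(velocity, key=velocity.get)

# How far ahead the leader is of each agent, in percent.
lead_pct = {name: (velocity[best] / v - 1) * 100 for name, v in velocity.items()}

for name in sorted(velocity, key=velocity.get, reverse=True):
    print(f"{name}: {days_worked[name]} days, "
          f"{velocity[name]:.4f} per day, leader ahead by {lead_pct[name]:.1f}%")
```

A per-day rate is sensitive to the chosen end date, so the ranking should be read as a snapshot as of the last invoice.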

## Sales Variation by Country

It is important while looking through sales to determine which countries are our highest sellers, and which countries have opportunities for us to improve sales. If we know who is purchasing the most we can use that information to determine marketing strategies that will help going forward.

I am going to create a table with the following information grouped by country:

- total number of customers
- total value of sales
- average value of sales per customer
- average order value

These values will be coming from the invoice and customer tables.

Total number of customers per country:  
COUNT(customer_id) GROUP BY country, from the customer table

Total value of sales:  
SUM(total) GROUP BY billing_country, from the invoice table

Average value of sales per customer:  
Total sales value divided by number of customers

Number of orders by country:  
COUNT(invoice_id) GROUP BY billing_country, from the invoice table

Average order value:  
Total sales value divided by number of orders
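
To make these definitions concrete, here is a minimal sketch that applies them to a tiny in-memory database small enough to check by hand. The table and column names follow the Chinook schema, but the rows are made up for illustration and the CTE names are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy data (invented): two customers in the USA, one in Chile.
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER,
                      billing_country TEXT, total REAL);
INSERT INTO customer VALUES (1, 'USA'), (2, 'USA'), (3, 'Chile');
INSERT INTO invoice VALUES
    (1, 1, 'USA', 10.0), (2, 1, 'USA', 20.0), (3, 2, 'USA', 30.0),
    (4, 3, 'Chile', 8.0);
""")

rows = conn.execute("""
WITH customers AS (
    -- total number of customers per country
    SELECT country, COUNT(customer_id) AS num_customers
    FROM customer
    GROUP BY country),
sales AS (
    -- total value of sales and number of orders per country
    SELECT billing_country AS country,
           SUM(total) AS total_sales,
           COUNT(invoice_id) AS num_orders
    FROM invoice
    GROUP BY billing_country)
SELECT c.country,
       c.num_customers,
       s.total_sales,
       s.total_sales / c.num_customers AS avg_per_customer,
       s.total_sales / s.num_orders AS avg_order_value
FROM customers c
INNER JOIN sales s ON s.country = c.country
ORDER BY s.total_sales DESC
""").fetchall()

for row in rows:
    print(row)
```

With this toy data the USA has 2 customers, 3 orders, and 60.0 in sales, so the average per customer is 30.0 and the average order value is 20.0.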

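I will compute the customer counts and the sales totals per country as common table expressions, then join them on country and collapse every country with a single customer into an "Other" group that is sorted last.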


In [58]:
%%sql
WITH customer_country AS (
SELECT COUNT(customer_id) num_customers,
       country
FROM customer
GROUP BY 2),

country_total AS (
SELECT billing_country country,
       SUM(total) total_sales,
       COUNT(invoice_id) number_of_sales
FROM invoice
GROUP BY 1),

country_values AS (
SELECT CASE
          WHEN cc.num_customers = 1 THEN 'Other'
          ELSE cc.country
       END country_choice,
       SUM(cc.num_customers) num_customers,
       SUM(ct.total_sales) total_sales,
       SUM(ct.number_of_sales) number_of_sales,
       CASE
         WHEN cc.num_customers = 1 THEN 1
         ELSE 0
       END is_other
FROM customer_country cc
INNER JOIN country_total ct
ON cc.country = ct.country
GROUP BY country_choice)

SELECT country_choice country,
       num_customers,
       ROUND(total_sales,2) total_sales,
       PRINTF('%.2f',(total_sales / num_customers)) avg_customer_sale,
       PRINTF('%.2f', (total_sales / number_of_sales)) average_sale
FROM country_values
ORDER BY is_other ASC,
         total_sales DESC;

 * sqlite:///chinook.db
Done.


country,num_customers,total_sales,avg_customer_sale,average_sale
USA,13,1040.49,80.04,7.94
Canada,8,535.59,66.95,7.05
Brazil,5,427.68,85.54,7.01
France,5,389.07,77.81,7.78
Germany,4,334.62,83.66,8.16
Czech Republic,2,273.24,136.62,9.11
United Kingdom,3,245.52,81.84,8.77
Portugal,2,185.13,92.57,6.38
India,2,183.15,91.58,8.72
Other,15,1094.94,73.0,7.45
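
As a sanity check, the derived column can be recomputed from the figures in the table above. This is a sketch in plain Python with the rows typed in by hand; the 0.01 tolerance is my own assumption to allow for two-decimal rounding.

```python
# (country, num_customers, total_sales, avg_customer_sale) copied from the table above.
REPORTED = [
    ("USA", 13, 1040.49, 80.04),
    ("Canada", 8, 535.59, 66.95),
    ("Brazil", 5, 427.68, 85.54),
    ("France", 5, 389.07, 77.81),
    ("Germany", 4, 334.62, 83.66),
    ("Czech Republic", 2, 273.24, 136.62),
    ("United Kingdom", 3, 245.52, 81.84),
    ("Portugal", 2, 185.13, 92.57),
    ("India", 2, 183.15, 91.58),
    ("Other", 15, 1094.94, 73.00),
]

# Countries whose recomputed average differs from the reported one by 0.01 or more.
mismatches = [
    country
    for country, customers, total, reported_avg in REPORTED
    if abs(total / customers - reported_avg) >= 0.01
]

total_customers = sum(customers for _, customers, _, _ in REPORTED)
grand_total = sum(total for _, _, total, _ in REPORTED)

print("mismatches:", mismatches)
print("customers:", total_customers)
print("total sales:", round(grand_total, 2))
```

The grand total works out to 4709.43, which matches the sum of the three sales agents' totals from the previous section, so the two breakdowns agree with each other.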


In [7]:
%%sql
SELECT customer_id,
       SUM(total) customer_total,
       billing_country country
FROM invoice
GROUP BY 1;

 * sqlite:///chinook.db
Done.


customer_id,customer_total,country
1,108.89999999999998,Brazil
2,82.17,Germany
3,99.99,Canada
4,72.27000000000001,Norway
5,144.54000000000002,Czech Republic
6,128.7,Czech Republic
7,69.3,Austria
8,60.38999999999999,Belgium
9,37.61999999999999,Denmark
10,60.39,Brazil


In [8]:
%%sql
SELECT billing_country country,
       SUM(total) country_total,
       COUNT(invoice_id)
FROM invoice
GROUP BY 1;

 * sqlite:///chinook.db
Done.


country,country_total,COUNT(invoice_id)
Argentina,39.6,5
Australia,81.18,10
Austria,69.3,9
Belgium,60.38999999999999,7
Brazil,427.68000000000006,61
Canada,535.5900000000001,76
Chile,97.02,13
Czech Republic,273.24000000000007,30
Denmark,37.61999999999999,10
Finland,79.2,11


In [9]:
%%sql
SELECT COUNT(customer_id) num_customers,
       country country
FROM customer
GROUP BY 2;

 * sqlite:///chinook.db
Done.


num_customers,country
1,Argentina
1,Australia
1,Austria
1,Belgium
5,Brazil
8,Canada
1,Chile
2,Czech Republic
1,Denmark
1,Finland
