# Answering Business Questions using SQL

Skills: SQL Intermediate (subqueries, multiple joins, set operations, aggregate functions etc.), basic Pandas

The Chinook database is provided as a SQLite database file called chinook.db

In [1]:
# First, we need to create several helper functions

import sqlite3
import pandas as pd

# Takes a SQL query as an argument and returns a pd df of that query
def run_query(q):
    with sqlite3.connect('chinook.db') as conn:
        return pd.read_sql(q, conn)
    
# Takes a SQL command as an argument and executes it using the sqlite module    
def run_command(c):
    with sqlite3.connect('chinook.db') as conn:
        conn.isolation_level = None
        conn.execute(c)
        
# Calls the run_query() function to return a list of all tables and views in the database.
def show_tables():
    q = "SELECT name, type FROM sqlite_master WHERE type IN (\"table\",\"view\");"
    return run_query(q)

# Show all tables existed in chinook database
show_tables()

Unnamed: 0,name,type
0,album,table
1,artist,table
2,customer,table
3,employee,table
4,genre,table
5,invoice,table
6,invoice_line,table
7,media_type,table
8,playlist,table
9,playlist_track,table


The Chinook record store has just signed a deal with a new record label, and we should select the first three albums that will be added to the store, from a list of four.

| Artist Name | Genre   |
| ----------- | -----   |
| Regal       | Hip-Hop |
| Red Tone    | Punk    |
| Meteor and the Girls | Pop |
| Slim Jim Bites | Blues |

We're interested in finding out which genres sell the best in the USA.

In [2]:
q1 ='''
SELECT
    sq.genre genre,
    sq.track_sold num_of_track_sold
FROM (SELECT
          g.name genre,
          il.invoice_id invoice_id,
          COUNT(il.track_id) track_sold
      FROM genre g
      LEFT JOIN track t ON t.genre_id = g.genre_id
      INNER JOIN invoice_line il ON il.track_id = t.track_id
      GROUP BY genre) sq
LEFT JOIN invoice i ON i.invoice_id = sq.invoice_id
WHERE i.billing_country = \"USA\";
'''

genres_usa = run_query(q1)
run_query(q1)

Unnamed: 0,genre,num_of_track_sold
0,Alternative,117
1,Classical,47
2,Heavy Metal,8
3,Metal,619
4,R&B/Soul,159
5,Rock,2635


From data above, it can be seen that Alternative & Punk genres dominated USA market. Comparing to the potential artist, we should purchase Red Tone, Meteor and the Girls, and Slim Jim Bites' album.

Each customer for the Chinook store gets assigned to a sales support agent within the company when they first make a purchase. We have to analyze the purchases of customers belonging to each employee to see if any sales support agent is performing either better or worse than the others.

In [3]:
q2 ='''
SELECT
    e.first_name || \" \" || e.last_name employee_name,
    sq.total_sales total_sales
FROM (SELECT
          c.support_rep_id employee_id,
          SUM(i.total) total_sales
      FROM invoice i
      INNER JOIN customer c ON c.customer_id = i.customer_id
      GROUP BY employee_id) sq
INNER JOIN employee e ON e.employee_id = sq.employee_id;
'''

sales_per_employee = run_query(q2)
run_query(q2)

Unnamed: 0,employee_name,total_sales
0,Jane Peacock,1731.51
1,Margaret Park,1584.0
2,Steve Johnson,1393.92


It can be seen that Jane Peacock is the best sales rep with total sales 1731.51 USD, with Margaret Park coming in second position with 1584 USD total sales.

Next, we will analyze sales by country.

In [4]:
q3 = '''
SELECT
    country_summary,
    SUM(customers_count) customer_count,
    SUM(order_count) order_count,
    SUM(total_sales) total_sales,
    SUM(total_sales)/SUM(customers_count) avg_sales_percustomer,
    SUM(total_sales)/SUM(order_count) avg_order_value
FROM (SELECT
          CASE
              WHEN final.customers_count = 1 THEN \"Others\"
              ELSE final.country
          END AS country_summary,
          final.*,
          CASE
              WHEN final.customers_count = 1 THEN 1
              ELSE 0
          END AS sort
      FROM final)
GROUP BY country_summary
ORDER BY sort ASC
          ;
'''
sales_by_country = run_query(q3)
run_query(q3)

Unnamed: 0,country_summary,customer_count,order_count,total_sales,avg_sales_percustomer,avg_order_value
0,Brazil,5,61,427.68,85.536,7.011148
1,Canada,8,76,535.59,66.94875,7.047237
2,Czech Republic,2,30,273.24,136.62,9.108
3,France,5,50,389.07,77.814,7.7814
4,Germany,4,41,334.62,83.655,8.161463
5,India,2,21,183.15,91.575,8.721429
6,Portugal,2,29,185.13,92.565,6.383793
7,USA,13,131,1040.49,80.037692,7.942672
8,United Kingdom,3,28,245.52,81.84,8.768571
9,Others,15,147,1094.94,72.996,7.448571


From the result above, there are several findings:
1. USA has the most customers, most orders, and the biggest sales revenue for chinook compared to other countries.
2. Czech Republic has the highest average sales per customer and average order value compared to other countries.

Management are currently considering changing their purchasing strategy to save money. The strategy they are considering is to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album.

We should find out what percentage of purchases are individual tracks vs whole albums, so that management can use this data to understand the effect this decision might have on overall revenue.

In [5]:
q4 = '''
WITH invoice_first_track AS
    (
     SELECT
         il.invoice_id invoice_id,
         MIN(il.track_id) first_track_id
     FROM invoice_line il
     GROUP BY 1
    )

SELECT
    album_purchase,
    COUNT(invoice_id) number_of_invoices,
    CAST(count(invoice_id) AS FLOAT) / (
                                         SELECT COUNT(*) FROM invoice
                                      ) percent
FROM
    (
    SELECT
        ifs.*,
        CASE
            WHEN
                 (
                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 

                  EXCEPT 

                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id
                 ) IS NULL
             AND
                 (
                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id

                  EXCEPT 

                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 
                 ) IS NULL
             THEN "yes"
             ELSE "no"
         END AS "album_purchase"
     FROM invoice_first_track ifs
    )
GROUP BY album_purchase;
'''

run_query(q4)

Unnamed: 0,album_purchase,number_of_invoices,percent
0,no,500,0.814332
1,yes,114,0.185668


Album purchases account for 18.6% of purchases. Based on this data, we would recommend against purchasing only select tracks from albums from record companies, since there is potential to lose one fifth of revenue.