# Introduction and Schema Diagram

We'll be working with the [Chinook](https://github.com/lerocha/chinook-database) database, which represents a digital media store and includes tables for artists, albums, media tracks, invoices, and customers. The data was built from a real iTunes library and obtained from the linked GitHub repository. We'll use it to answer business questions such as:

- Which albums should Chinook purchase?
- How are employee sales doing?
- Which countries are purchasing from Chinook?
- Are album purchases or individual track purchases better?

# Overview of the data

Below we'll load the Chinook database and list its tables. A database schema is also provided to show how the tables relate to one another.

In [1]:
# Load the Chinook database

In [2]:
%%capture
%load_ext sql
%sql sqlite:///chinook.db

'Connected: None@chinook.db'

In [3]:
# List all tables and views in the database

In [4]:
%%sql
SELECT name, type
FROM sqlite_master
WHERE type in ('table', 'view');

Done.


name,type
album,table
artist,table
customer,table
employee,table
genre,table
invoice,table
invoice_line,table
media_type,table
playlist,table
playlist_track,table


The database schema corresponding to the tables above is: ![Chinook Schema](https://s3.amazonaws.com/dq-content/191/chinook-schema.svg)
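
For readers who do not have the `%sql` magic available, the same table listing can be sketched with Python's standard `sqlite3` module. This is only an illustration: it uses an in-memory stand-in with two simplified tables and a hypothetical view (`album_titles`), none of which reflect the full Chinook schema. Against the real data, one would presumably connect to `chinook.db` instead.

```python
import sqlite3

# In-memory stand-in so the sketch runs without chinook.db.
# The tables are simplified and the view name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (album_id INTEGER PRIMARY KEY, title TEXT, artist_id INTEGER);
    CREATE VIEW album_titles AS SELECT title FROM album;
""")

# Same query as the notebook cell, with an explicit ORDER BY for stable output.
objects = conn.execute(
    "SELECT name, type FROM sqlite_master "
    "WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()

for name, obj_type in objects:
    print(name, obj_type)
```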

# Selecting Albums to Purchase

In [5]:
# Find the number and percentage of tracks sold per genre in the USA

In [6]:
%%sql
WITH usa_total_sales AS 
(
    SELECT SUM(il.quantity) total_sales
    FROM invoice_line il
    INNER JOIN invoice i ON i.invoice_id = il.invoice_id
    WHERE i.billing_country = 'USA'
) 

SELECT 
    g.name genre, 
    SUM(il.quantity) num_sold,
    ROUND((SUM(il.quantity) * 100.0 /
    (
        SELECT total_sales
        FROM usa_total_sales
    )
    ),2) percent_sold
    
FROM invoice i
INNER JOIN invoice_line il ON il.invoice_id = i.invoice_id
INNER JOIN track t ON t.track_id = il.track_id
INNER JOIN genre g ON g.genre_id = t.genre_id
WHERE i.billing_country = 'USA'
GROUP BY 1
ORDER BY 2 DESC


Done.


genre,num_sold,percent_sold
Rock,561,53.38
Alternative & Punk,130,12.37
Metal,124,11.8
R&B/Soul,53,5.04
Blues,36,3.43
Alternative,35,3.33
Latin,22,2.09
Pop,22,2.09
Hip Hop/Rap,20,1.9
Jazz,14,1.33


From the results above, we can see that Rock (53.38%), Alternative & Punk (12.37%), and Metal (11.8%) are the three best-selling genres in the USA, together accounting for about 78% of tracks sold. The bottom three are TV Shows, Soundtrack, and Heavy Metal.
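
As a quick check on how concentrated US sales are, the percentages from the ten rows shown above can be added up in plain Python. This is a small sketch that only uses the displayed values. The variable names are ours, and the "remaining" figure is simply whatever is left after the ten listed genres.

```python
# Percent of US tracks sold per genre, copied from the ten rows shown above.
percent_sold = {
    "Rock": 53.38,
    "Alternative & Punk": 12.37,
    "Metal": 11.8,
    "R&B/Soul": 5.04,
    "Blues": 3.43,
    "Alternative": 3.33,
    "Latin": 2.09,
    "Pop": 2.09,
    "Hip Hop/Rap": 1.9,
    "Jazz": 1.33,
}

# Combined share of the three best-selling genres.
top_three = sorted(percent_sold.values(), reverse=True)[:3]
top_three_share = round(sum(top_three), 2)

# Share covered by the ten listed genres, and what is left for all others.
listed_share = round(sum(percent_sold.values()), 2)
remaining_share = round(100 - listed_share, 2)

print(top_three_share, listed_share, remaining_share)
```

The top three genres cover a little over three quarters of US sales, while every genre outside the ten listed rows shares roughly 3% between them.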

# Analyzing Employee Sales Performance

In [7]:
#Finding the total dollar amount of sales assigned to each sales support agent

In [8]:
%%sql
SELECT
    e.first_name || ' ' || e.last_name employee_name,
    e.employee_id,
    e.country,
    e.state,
    e.city,
    ROUND(TOTAL(i.total),2) total_sales
FROM employee e 
INNER JOIN customer c ON c.support_rep_id = e.employee_id
INNER JOIN invoice i ON i.customer_id = c.customer_id
WHERE e.title = 'Sales Support Agent'
GROUP BY 1
ORDER BY 6 DESC

Done.


employee_name,employee_id,country,state,city,total_sales
Jane Peacock,3,Canada,AB,Calgary,1731.51
Margaret Park,4,Canada,AB,Calgary,1584.0
Steve Johnson,5,Canada,AB,Calgary,1393.92


From the results above, only three employees hold the title 'Sales Support Agent'. Their totals are fairly close: the lowest is \$1393.92 and the highest is \$1731.51, a gap of about 24% relative to the lowest. All three agents are based in Calgary, AB, Canada.
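
To put the spread between agents in perspective, the gap between the highest and lowest totals can be worked out directly from the figures in the table. This is a minimal arithmetic sketch. The variable names are ours, and the values are copied from the output above.

```python
# Highest and lowest total_sales from the table above.
highest = 1731.51
lowest = 1393.92

# Absolute gap, and the gap relative to the lowest total.
gap = highest - lowest
gap_pct = gap / lowest * 100

print(f"gap: {gap:.2f}, relative to lowest: {gap_pct:.1f}%")
```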

# Analyzing Sales by Country

In [9]:
# Find the number of customers and sales for each country, grouping countries with only one customer into 'Other'

In [10]:
%%sql
WITH country_filter AS (
    SELECT
        country,
        CASE
            WHEN
                COUNT(DISTINCT customer_id) = 1 THEN 'Other'
            ELSE
                country
        END AS other_country
    FROM customer 
    GROUP BY country
)

SELECT 
    cf.other_country Country,
    COUNT(DISTINCT c.customer_id) customers,
    ROUND(SUM(i.total), 2) total_sales_value,
    ROUND(SUM(i.total)/COUNT(DISTINCT c.customer_id),2) avg_per_customer,
    ROUND(SUM(i.total)/COUNT(DISTINCT i.invoice_id),2) avg_order
FROM country_filter cf
INNER JOIN customer c ON c.country = cf.country
INNER JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY cf.other_country = 'Other'

Done.


Country,customers,total_sales_value,avg_per_customer,avg_order
Brazil,5,427.68,85.54,7.01
Canada,8,535.59,66.95,7.05
Czech Republic,2,273.24,136.62,9.11
France,5,389.07,77.81,7.78
Germany,4,334.62,83.66,8.16
India,2,183.15,91.57,8.72
Portugal,2,185.13,92.57,6.38
USA,13,1040.49,80.04,7.94
United Kingdom,3,245.52,81.84,8.77
Other,15,1094.94,73.0,7.45


From these values, we can see that although the 'Other' category has the highest total sales (\$1094.94), the USA is the single country with the most sales (\$1040.49). It's interesting that customers in the Czech Republic spend the most per customer (\$136.62), although there are only two customers from that country. Canada has the lowest average sales per customer (\$66.95), possibly because of import/export fees. Portugal has the lowest average order (\$6.38) but the second-highest average sales per customer (\$92.57).
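
The 'Other' grouping in the query above relies on a `CASE` expression inside a grouped CTE, plus an `ORDER BY` on a boolean expression to push 'Other' to the end. Here is a stripped-down sketch of that pattern using Python's `sqlite3` module. The data is made up for illustration, and the column alias `country_group` is ours.

```python
import sqlite3

# Toy customer table (hypothetical rows, not the Chinook data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
    INSERT INTO customer VALUES
        (1, 'USA'), (2, 'USA'),
        (3, 'Canada'), (4, 'Canada'), (5, 'Canada'),
        (6, 'Chile'), (7, 'Norway');
""")

grouped = conn.execute("""
    WITH country_filter AS (
        -- Countries with exactly one customer are relabelled 'Other'.
        SELECT country,
               CASE WHEN COUNT(*) = 1 THEN 'Other' ELSE country END AS country_group
        FROM customer
        GROUP BY country
    )
    SELECT cf.country_group, COUNT(*) AS customers
    FROM country_filter cf
    INNER JOIN customer c ON c.country = cf.country
    GROUP BY cf.country_group
    -- The comparison evaluates to 0 or 1, so 'Other' sorts last.
    ORDER BY cf.country_group = 'Other', customers DESC
""").fetchall()

print(grouped)
```

Sorting on the comparison first and on a second key afterwards keeps 'Other' at the bottom while still ordering the named countries.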

# Albums vs Individual Tracks

In [11]:
#Categorizing invoices as either track or album purchase

In [12]:
%%sql

WITH album_invoice AS   (
    SELECT 
        il.invoice_id,                                     
        t.album_id
    FROM invoice_line il
    LEFT JOIN track t ON t.track_id = il.track_id
    GROUP BY 1
),

     album_or_tracks AS (
         SELECT ai.*,
             CASE
                 WHEN (
                     SELECT il.track_id 
                     FROM invoice_line il
                     WHERE il.invoice_id = ai.invoice_id
                     EXCEPT                                          
                     SELECT t.track_id 
                     FROM track t
                     WHERE t.album_id = ai.album_id) IS NULL
                             
                 AND (
                     SELECT t.track_id 
                     FROM track t
                     WHERE t.album_id = ai.album_id                                         
                     EXCEPT                                          
                     SELECT il.track_id 
                     FROM invoice_line il
                     WHERE il.invoice_id = ai.invoice_id) IS NULL
                                   
              THEN 'Albums'
              ELSE 'Tracks'
              END AS invoice_type                             
              FROM album_invoice ai
     )
                           
SELECT invoice_type,
       COUNT(*) num_invoices,
       ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM invoice), 2) percent_of_invoice
FROM album_or_tracks
GROUP BY 1

Done.


invoice_type,num_invoices,percent_of_invoice
Albums,114,18.57
Tracks,500,81.43
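
The two `EXCEPT` subqueries in the query above amount to a set-equality test: an invoice counts as an album purchase only if it has no tracks outside the album and the album has no tracks missing from the invoice. The same idea can be sketched in plain Python with sets. The function name and the track IDs below are hypothetical and only illustrate the logic.

```python
def classify_invoice(invoice_tracks, album_tracks):
    """Return 'Albums' if the invoice holds exactly the album's tracks, else 'Tracks'."""
    invoice_set = set(invoice_tracks)
    album_set = set(album_tracks)
    # Mirrors the first EXCEPT: invoice tracks that are not on the album.
    extra = invoice_set - album_set
    # Mirrors the second EXCEPT: album tracks that are not on the invoice.
    missing = album_set - invoice_set
    return "Albums" if not extra and not missing else "Tracks"


# Hypothetical album with three tracks.
album = [1, 2, 3]

whole_album = classify_invoice([3, 1, 2], album)         # same tracks, any order
partial_album = classify_invoice([1, 2], album)          # one track missing
album_plus_extra = classify_invoice([1, 2, 3, 9], album) # one extra track

print(whole_album, partial_album, album_plus_extra)
```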


From the results, we see that customers purchase individual tracks far more often than full albums: 500 invoices were track purchases and 114 were album purchases. An album usually contains around 10 tracks, so a single album purchase should bring in more revenue than a typical track purchase. There is also a certain joy in listening to a whole album in the intended order, which conveys an artistic feel that individual tracks do not. For these reasons, Chinook should continue to purchase full albums from record companies.