# Project: Answering Business Questions with SQL

In this project, I use SQL to answer business questions using the Chinook database. The Chinook database holds the data of the Chinook store, a digital media company that sells records. Below is a copy of the database schema that I will be querying.

<img src="https://raw.githubusercontent.com/sunnyyan97/Analyzing-Hacker-News-Posts/main/Screen%20Shot%202021-04-01%20at%201.19.20%20PM.png">

I will take the role of a store employee and answer hypothetical business questions that could inform Chinook's future strategy, querying the SQLite database from a Python notebook.

### Reading and Connecting to the Database

In [1]:
%%capture
%load_ext sql
%sql sqlite:///chinook.db

'Connected: None@chinook.db'

### Getting familiar with the Chinook Database

To get better acquainted with the database before the analysis, I wrote a query that returns all of its tables and views.

In [2]:
%%sql
SELECT
    name,
    type
FROM sqlite_master
WHERE type IN ('table', 'view');

Done.


name,type
album,table
artist,table
customer,table
employee,table
genre,table
invoice,table
invoice_line,table
media_type,table
playlist,table
playlist_track,table


To check that the database is working properly, I wrote a query that returns all of the albums by Queen in the database.

In [3]:
%%sql
SELECT * FROM artist ar
INNER JOIN album al ON al.artist_id = ar.artist_id
WHERE ar.name = 'Queen'
ORDER BY 1;

Done.


artist_id,name,album_id,title,artist_id_1
51,Queen,36,Greatest Hits II,51
51,Queen,185,Greatest Hits I,51
51,Queen,186,News Of The World,51


### Analyzing the Different Genres

After this check, the first prompt I will respond to is the following:

The Chinook record store has just signed a deal with a new record label, and I have been tasked with selecting the first three albums to add to the store from a list of four. The list appears as follows:

<img src="Albums.png">

To answer the question, I wanted to see which genres sell the most in the store currently.

This query returns the ten best-selling genres among tracks sold to customers in the USA, with the number of tracks sold for each genre and that genre's share of all tracks sold in the USA (as a fraction between 0 and 1).

In [4]:
%%sql

WITH usa_tracks_sold AS
    (
     SELECT il.* from invoice_line il
     INNER JOIN invoice i on il.invoice_id = i.invoice_id
     INNER JOIN customer c on i.customer_id = c.customer_id
     WHERE country = 'USA'
    )
SELECT
    g.name genre,
    count(uts.invoice_line_id) tracks_sold,
    cast(count(uts.invoice_line_id) AS FLOAT) / (SELECT COUNT(*) from usa_tracks_sold) percentage_sold
FROM usa_tracks_sold uts
INNER JOIN track t on t.track_id = uts.track_id
INNER JOIN genre g on g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;

Done.


genre,tracks_sold,percentage_sold
Rock,561,0.5337773549000951
Alternative & Punk,130,0.1236917221693625
Metal,124,0.1179828734538534
R&B/Soul,53,0.0504281636536631
Blues,36,0.0342530922930542
Alternative,35,0.033301617507136
Latin,22,0.0209324452901998
Pop,22,0.0209324452901998
Hip Hop/Rap,20,0.0190294957183634
Jazz,14,0.0133206470028544


The table shows that Rock and its related genres, including Alternative & Punk and Metal, sell best among our USA customers. Given these results, it would make sense to invest further in Rock albums. Of the four albums we've been given, I would choose the Pop, Punk and Blues albums. Pop and Hip-Hop are both unpopular in the Chinook store, together accounting for just under 4% of tracks sold in the USA, but Pop has a slight lead over Hip-Hop (22 tracks sold against 20).
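As a rough check on these shares, the counts from the output table can be combined by hand. The snippet below is a small worked example, not part of the original analysis: the counts are copied from the table, the total is recovered from the Rock row, and the helper name `combined_share` is my own.

```python
# Tracks sold per genre, copied from the query output above (USA, top 10).
tracks_sold = {
    "Rock": 561,
    "Alternative & Punk": 130,
    "Metal": 124,
    "R&B/Soul": 53,
    "Blues": 36,
    "Alternative": 35,
    "Latin": 22,
    "Pop": 22,
    "Hip Hop/Rap": 20,
    "Jazz": 14,
}

# Total USA tracks sold, recovered from the Rock row:
# tracks_sold / percentage_sold = 561 / 0.5337773549000951.
total_usa_tracks = round(561 / 0.5337773549000951)


def combined_share(genres):
    """Share of all USA tracks sold that belongs to the given genres."""
    return sum(tracks_sold[g] for g in genres) / total_usa_tracks


rock_family = combined_share(["Rock", "Alternative & Punk", "Metal"])
pop_and_hip_hop = combined_share(["Pop", "Hip Hop/Rap"])

print(f"Rock, Alternative & Punk, Metal: {rock_family:.3%}")
print(f"Pop and Hip Hop/Rap:            {pop_and_hip_hop:.3%}")
```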

### Analyzing Employee Sales Performance

The next question I've been asked to answer is: is any sales support agent performing far better or worse than the others?

For background, every customer is assigned to a sales support agent when they make their first purchase. To answer this question, I will analyze the purchases of the customers assigned to each employee.

The query below pulls each sales support agent's name, hire date and total sales from the Chinook database.

In [5]:
%%sql

WITH customer_support_rep_sales AS
    (
     SELECT
        i.customer_id,
        c.support_rep_id,
        SUM(i.total) total
     FROM invoice i
     INNER JOIN customer c ON c.customer_id = i.customer_id
     GROUP BY 1, 2
    )

SELECT
    e.first_name || ' ' || e.last_name employee,
    e.hire_date,
    SUM(csrs.total) total_sales
FROM employee e
INNER JOIN customer_support_rep_sales csrs ON e.employee_id = csrs.support_rep_id
GROUP BY 1, 2;

Done.


employee,hire_date,total_sales
Jane Peacock,2017-04-01 00:00:00,1731.5099999999998
Margaret Park,2017-05-03 00:00:00,1584.0000000000002
Steve Johnson,2017-10-17 00:00:00,1393.92


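It appears that the sales support agents have performed similarly once tenure is taken into account. Steve Johnson has the lowest total sales, about 19% below Jane Peacock's, but he was hired roughly six and a half months after Jane and five and a half months after Margaret.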
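To put numbers on that comparison, the gap in sales and the gap in hire dates can be worked out from the output table. This is a minimal sketch using only the three rows above (totals rounded to cents); the variable names are my own.

```python
from datetime import date

# Hire dates and total sales copied from the query output above.
reps = {
    "Jane Peacock": (date(2017, 4, 1), 1731.51),
    "Margaret Park": (date(2017, 5, 3), 1584.00),
    "Steve Johnson": (date(2017, 10, 17), 1393.92),
}

# Agents with the highest and lowest total sales.
top_name = max(reps, key=lambda name: reps[name][1])
low_name = min(reps, key=lambda name: reps[name][1])
top_hired, top_sales = reps[top_name]
low_hired, low_sales = reps[low_name]

# How far the lowest total sits below the highest, as a percentage.
gap_pct = (top_sales - low_sales) / top_sales * 100
# How many days later the lowest-selling agent was hired.
hire_gap_days = (low_hired - top_hired).days

print(f"{low_name} is {gap_pct:.1f}% below {top_name}")
print(f"{low_name} was hired {hire_gap_days} days after {top_name}")
```

A fuller comparison would divide each total by the agent's time in the role, which needs a reference date (for example the date of the last invoice) that is not shown in this output.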

### Analyzing Sales by Country

My next task is to analyze the sales data for customers from each country that buys from Chinook.

For this analysis, I wanted to include the following four metrics in my query:

- number of customers
- total value of sales
- average order size by dollar amount
- average customer lifetime value

Together, these four metrics give a holistic view of how the store's sales are performing in each country. Countries with only one customer are grouped together as "Other". The query below calculates the metrics.

In [7]:
%%sql
WITH country_or_other AS
    (
     SELECT
        CASE
            WHEN (
                  SELECT count(*)
                  FROM customer
                  where country = c.country
                ) = 1 THEN 'Other'
            ELSE c.country
        END AS country,
        c.customer_id,
        il.*
     FROM invoice_line il
     INNER JOIN invoice i on i.invoice_id = il.invoice_id
     INNER JOIN customer c on c.customer_id = i.customer_id
    )
    
SELECT
    country,
    customers,
    total_sales,
    average_order,
    customer_lifetime_value
FROM
    (
     SELECT
        country,
        count(distinct customer_id) customers,
        SUM(unit_price) total_sales,
        SUM(unit_price) / count(distinct customer_id) customer_lifetime_value,
        SUM(unit_price) / count(distinct invoice_id) average_order,
        CASE
            WHEN country = 'Other' THEN 1
            ELSE 0
        END AS sort
    FROM country_or_other
    GROUP BY country
    )
ORDER BY sort ASC, total_sales DESC;

Done.


country,customers,total_sales,average_order,customer_lifetime_value
USA,13,1040.490000000008,7.942671755725252,80.03769230769292
Canada,8,535.5900000000034,7.047236842105309,66.94875000000043
Brazil,5,427.6800000000025,7.011147540983647,85.53600000000048
France,5,389.0700000000021,7.781400000000042,77.81400000000042
Germany,4,334.6200000000016,8.161463414634186,83.6550000000004
Czech Republic,2,273.24000000000103,9.108000000000034,136.62000000000052
United Kingdom,3,245.5200000000008,8.768571428571457,81.84000000000026
Portugal,2,185.13000000000025,6.383793103448284,92.56500000000013
India,2,183.1500000000002,8.72142857142858,91.5750000000001
Other,15,1094.9400000000085,7.448571428571486,72.99600000000056


The USA and Canada are the two largest single markets, together accounting for about a third of total sales. However, our digital media company has a strong worldwide presence: combined sales from the "Other" group (countries with a single customer each) are higher than total sales in the USA. Perhaps this is a market that could be explored further.
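As a rough check on how sales are distributed, the `total_sales` column can be turned into shares of the overall total. The snippet below is a small worked example with the values copied from the table and rounded to cents; it is not part of the original query.

```python
# Total sales per country, copied from the query output above.
total_sales = {
    "USA": 1040.49,
    "Canada": 535.59,
    "Brazil": 427.68,
    "France": 389.07,
    "Germany": 334.62,
    "Czech Republic": 273.24,
    "United Kingdom": 245.52,
    "Portugal": 185.13,
    "India": 183.15,
    "Other": 1094.94,
}

grand_total = sum(total_sales.values())

# Each country's share of all sales, as a fraction between 0 and 1.
shares = {country: value / grand_total for country, value in total_sales.items()}
north_america = shares["USA"] + shares["Canada"]

for country, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{country:15} {share:.1%}")
print(f"USA + Canada    {north_america:.1%}")
```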

### Albums vs Individual Tracks

With the way our digital media store is currently set up, customers can buy records as whole albums or as individual tracks. Management wants to save money by offering only albums and not individual tracks. The query below measures the percentage of purchases that are whole albums versus individual tracks.

In [11]:
%%sql
WITH invoice_first_track AS
    (
     SELECT
        il.invoice_id invoice_id,
        MIN(il.track_id) first_track_id
     FROM invoice_line il
     GROUP BY 1
    )

SELECT
    album_purchase,
    COUNT(invoice_id) number_of_invoices,
    CAST(count(invoice_id) AS FLOAT) / (SELECT count(*) FROM invoice) percent
    
FROM
    (
    SELECT
        ifs.*,
        CASE
            WHEN
                (
                 SELECT t.track_id FROM track t
                 WHERE t.album_id = (
                                     SELECT t2.album_id FROM track t2
                                     WHERE t2.track_id = ifs.first_track_id
                                     )
                EXCEPT
            
                SELECT il2.track_id FROM invoice_line il2
                WHERE il2.invoice_id = ifs.invoice_id
                ) IS NULL
            AND
                (
                 SELECT il2.track_id FROM invoice_line il2
                 WHERE il2.invoice_id = ifs.invoice_id
                
                 EXCEPT
    
                 SELECT t.track_id FROM track t
                 WHERE t.album_id = (SELECT t2.album_id FROM track t2
                                     WHERE t2.track_id = ifs.first_track_id
                                    )
                ) IS NULL
            THEN 'yes'
            ELSE 'no'
        END AS "album_purchase"
    FROM invoice_first_track ifs
    )
GROUP BY album_purchase;

Done.


album_purchase,number_of_invoices,percent
no,500,0.8143322475570033
yes,114,0.1856677524429967


This table shows that customers buy individual tracks far more often than whole albums: about 81% of invoices are track purchases, against about 19% that are album purchases.
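The two `EXCEPT` comparisons in the query amount to a set-equality test: an invoice counts as an album purchase only when no album track is missing from the invoice and no invoice track falls outside the album. The sketch below illustrates that logic in Python with hypothetical track IDs; the function name `is_album_purchase` is my own and the real check runs in SQL as shown above.

```python
def is_album_purchase(invoice_tracks, album_tracks):
    """Return True when the invoice holds exactly the tracks of one album."""
    invoice_tracks = set(invoice_tracks)
    album_tracks = set(album_tracks)
    # First EXCEPT: album tracks that the invoice does not contain.
    missing_from_invoice = album_tracks - invoice_tracks
    # Second EXCEPT: invoice tracks that do not belong to the album.
    extra_on_invoice = invoice_tracks - album_tracks
    return not missing_from_invoice and not extra_on_invoice


# Hypothetical track IDs for one album.
album = [101, 102, 103]

whole_album = is_album_purchase([103, 101, 102], album)
partial_album = is_album_purchase([101, 102], album)
album_plus_extra = is_album_purchase([101, 102, 103, 250], album)

print(whole_album, partial_album, album_plus_extra)
```

Both directions are needed: checking only the first difference would also count an invoice that contains the album plus extra tracks.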