# Guided Project: Answering Business Questions Using SQL

In this project, we practice using SQLite to answer business questions for a company that sells music.

In [2]:
import pandas as pd
import sqlite3

## Creating Some Helpful Functions

In [3]:
def run_query(q):
    """Run a query against chinook.db and return the result as a DataFrame."""
    with sqlite3.connect('chinook.db') as conn:
        return pd.read_sql(q, conn)

In [4]:
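def run_command(q):
    """Execute a statement that does not need to return a table (e.g. CREATE VIEW)."""
    with sqlite3.connect('chinook.db') as conn:
        conn.isolation_level = None  # autocommit mode: no implicit transaction is opened
        return conn.execute(q)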

In [5]:
def show_tables():
    """List the name and type of every table and view in the database."""
    show = "SELECT name, type FROM sqlite_master WHERE type IN ('table','view');"
    return run_query(show)

## Viewing the Tables in the Data

In [6]:
show_tables()

,name,type
0,album,table
1,artist,table
2,customer,table
3,employee,table
4,genre,table
5,invoice,table
6,invoice_line,table
7,media_type,table
8,playlist,table
9,playlist_track,table


## Finding the Top Music Genres for Customers from the USA

In [7]:
run_query('''SELECT g.name genre, COUNT(t.name) frequency, CAST(COUNT(t.name) AS FLOAT)/  
                        (SELECT COUNT(*) FROM genre g 
                        INNER JOIN track t ON t.genre_id = g.genre_id 
                        INNER JOIN invoice_line il ON t.track_id = il.track_id 
                        INNER JOIN invoice i ON il.invoice_id = i.invoice_id 
                        WHERE i.billing_country = 'USA' 
                        ) percent_sold
           FROM genre g 
           INNER JOIN track t ON t.genre_id = g.genre_id 
           INNER JOIN invoice_line il ON t.track_id = il.track_id 
           INNER JOIN invoice i ON il.invoice_id = i.invoice_id 
           WHERE i.billing_country = 'USA' 
           GROUP BY g.name 
           ORDER BY frequency DESC 
           ''')

,genre,frequency,percent_sold
0,Rock,561,0.533777
1,Alternative & Punk,130,0.123692
2,Metal,124,0.117983
3,R&B/Soul,53,0.050428
4,Blues,36,0.034253
5,Alternative,35,0.033302
6,Pop,22,0.020932
7,Latin,22,0.020932
8,Hip Hop/Rap,20,0.019029
9,Jazz,14,0.013321


Above, we see that Rock is the top genre for American customers, accounting for about 53% of tracks sold. The next two most popular genres are Alternative & Punk and Metal, at about 12% each. Consequently, if the music store is looking to expand its US market, it should consider stocking more music in these genres.
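The query above repeats the same four-table join twice, once for the counts and once for the denominator. One way to avoid that repetition is to put the counts in a common table expression and divide by their sum. The following is a minimal sketch of that idea, not the query used above: it runs against an in-memory database, and the `sold_track` table and its rows are hypothetical stand-ins for the joined Chinook tables.

```python
import sqlite3

# In-memory toy database; `sold_track` is a hypothetical stand-in for the
# result of joining genre, track, invoice_line and invoice for one country.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sold_track (genre TEXT);
INSERT INTO sold_track VALUES ('Rock'), ('Rock'), ('Rock'), ('Metal');
""")

query = """
WITH genre_sales AS (
    -- Count sold tracks per genre once.
    SELECT genre, COUNT(*) AS frequency
    FROM sold_track
    GROUP BY genre
)
SELECT genre,
       frequency,
       -- Divide each count by the total over all genres.
       CAST(frequency AS FLOAT) / (SELECT SUM(frequency) FROM genre_sales) AS percent_sold
FROM genre_sales
ORDER BY frequency DESC;
"""

rows = conn.execute(query).fetchall()
for genre, frequency, percent_sold in rows:
    print(genre, frequency, percent_sold)
conn.close()
```

With the real data, the body of `genre_sales` would hold the joins and the `WHERE i.billing_country = 'USA'` filter, so the filter is written only once.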

## Sales by Sales Support Agent

In [8]:
run_query('''SELECT e.first_name || ' ' || e.last_name employee,
                    STRFTIME('%m/%d/%Y', hire_date) hire_date,
                    STRFTIME('%m/%d/%Y', birthdate) birthdate,
                    SUM(i.total) sales
             FROM employee e
             INNER JOIN customer c ON e.employee_id = c.support_rep_id
             INNER JOIN invoice i ON i.customer_id = c.customer_id
             GROUP BY employee
             ORDER BY sales DESC
''')

,employee,hire_date,birthdate,sales
0,Jane Peacock,04/01/2017,08/29/1973,1731.51
1,Margaret Park,05/03/2017,09/19/1947,1584.0
2,Steve Johnson,10/17/2017,03/03/1965,1393.92


Above, we see that Jane Peacock has the highest total sales of the three sales support agents. This could be because she was hired first and has had the most time to accumulate sales. It could also be that, as the youngest sales agent, she is more aware of current trends in the music industry.
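One rough way to probe the tenure explanation is to divide each agent's total by the number of days employed. The sketch below uses the hire dates and totals from the output above. The `cutoff` date is a hypothetical placeholder, not a value taken from the database, and should be replaced with the date of the latest invoice before drawing any conclusion.

```python
from datetime import date

# Hire dates and sales totals copied from the query output above.
agents = {
    "Jane Peacock": (date(2017, 4, 1), 1731.51),
    "Margaret Park": (date(2017, 5, 3), 1584.00),
    "Steve Johnson": (date(2017, 10, 17), 1393.92),
}

# ASSUMPTION: hypothetical end of the sales period; replace with the
# latest invoice date found in the invoice table.
cutoff = date(2020, 12, 31)

# Sales per day employed, so agents hired later are not penalized.
sales_per_day = {
    name: total / (cutoff - hired).days
    for name, (hired, total) in agents.items()
}

for name, rate in sorted(sales_per_day.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rate:.2f} per day employed")
```

Because the ranking can change with the cutoff, this is only a starting point for comparing agents on a per-day basis.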

## Sales by Country

In [9]:
run_query('''WITH customer_groups AS (
SELECT CASE
WHEN COUNT(DISTINCT(c.customer_id)) = 1 THEN 'Other'
ELSE c.country
END AS other_country,
country
FROM customer c 
INNER JOIN invoice i ON i.customer_id = c.customer_id
GROUP BY c.country
)
             SELECT other_country country,
                    COUNT(DISTINCT(c.customer_id)) number_of_customers,
                    SUM(total) total_value_of_sales,
                    ROUND(SUM(total)/COUNT(DISTINCT(c.customer_id)), 2) sales_per_customer,
                    ROUND(SUM(total)/COUNT(*), 2) sales_per_order
             FROM customer c
             INNER JOIN invoice i ON i.customer_id = c.customer_id
             INNER JOIN customer_groups cg ON cg.country = c.country
             GROUP BY other_country
             ORDER BY CASE WHEN other_country = 'Other' THEN 1 ELSE 0 END, total_value_of_sales DESC
''')

,country,number_of_customers,total_value_of_sales,sales_per_customer,sales_per_order
0,USA,13,1040.49,80.04,7.94
1,Canada,8,535.59,66.95,7.05
2,Brazil,5,427.68,85.54,7.01
3,France,5,389.07,77.81,7.78
4,Germany,4,334.62,83.66,8.16
5,Czech Republic,2,273.24,136.62,9.11
6,United Kingdom,3,245.52,81.84,8.77
7,Portugal,2,185.13,92.57,6.38
8,India,2,183.15,91.58,8.72
9,Other,15,1094.94,73.0,7.45


Here, we see that the United States and Canada have the most customers of any individual country. Additionally, customers from Canada have the lowest sales per customer (66.95) of any group in the table.
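The grouping step in the query above, which folds every country with a single customer into "Other" and sorts that group last, can be looked at in isolation. The following is a minimal sketch on an in-memory database. The four `customer` rows are hypothetical, and the CTE names `country_size` and `grouped` are illustrative rather than taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
-- Hypothetical rows: two customers in one country, one each in two others.
INSERT INTO customer VALUES (1, 'USA'), (2, 'USA'), (3, 'Chile'), (4, 'Norway');
""")

query = """
WITH country_size AS (
    -- Number of customers per country.
    SELECT country, COUNT(*) AS n
    FROM customer
    GROUP BY country
),
grouped AS (
    -- Relabel single-customer countries as 'Other'.
    SELECT CASE WHEN cs.n = 1 THEN 'Other' ELSE c.country END AS country_group
    FROM customer c
    INNER JOIN country_size cs ON cs.country = c.country
)
SELECT country_group, COUNT(*) AS number_of_customers
FROM grouped
GROUP BY country_group
-- Sort key 1 pushes 'Other' to the bottom regardless of its size.
ORDER BY CASE WHEN country_group = 'Other' THEN 1 ELSE 0 END,
         number_of_customers DESC;
"""

groups = conn.execute(query).fetchall()
print(groups)
conn.close()
```

The `CASE` expression in `ORDER BY` is the same device the query above uses to keep "Other" in the last row even when its total exceeds that of every named country.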

## Album and Non-Album Sales

In [16]:
run_query('''
WITH album_count AS (
SELECT t.album_id, COUNT(DISTINCT(t.track_id)) num_tracks
FROM track t
INNER JOIN album a ON a.album_id = t.album_id
INNER JOIN invoice_line i ON i.track_id = t.track_id
GROUP BY t.album_id
),

album_or_not AS (
SELECT COUNT(*), a.title, i.invoice_id, 
CASE
WHEN COUNT(DISTINCT(a.album_id)) = 1 AND COUNT(a.album_id) = ac.num_tracks THEN 'yes'
ELSE 'no'
END AS album_purchase
FROM track t
INNER JOIN album a ON a.album_id = t.album_id
INNER JOIN invoice_line i ON i.track_id = t.track_id
INNER JOIN album_count ac ON ac.album_id = t.album_id
GROUP BY i.invoice_id
)

SELECT album_purchase, COUNT(*) number_of_sales, CAST(COUNT(*) AS FLOAT)/(SELECT COUNT(DISTINCT(invoice_id)) FROM invoice_line) percent
FROM album_or_not
GROUP BY album_purchase;
''')

,album_purchase,number_of_sales,percent
0,no,500,0.814332
1,yes,114,0.185668


Above, we see that about 19% of purchases are of complete albums. Most customers seem to prefer buying individual tracks.
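The classification can also be expressed as a set comparison: an invoice counts as an album purchase when its tracks are exactly the tracks of one album. The sketch below illustrates that definition in plain Python on hypothetical data. It is not a translation of the query above, which counts tracks per invoice rather than comparing sets, so the two approaches may label some invoices differently.

```python
# Hypothetical data: album_id -> track ids on that album,
# and invoice_id -> track ids bought on that invoice.
album_tracks = {
    "album_a": {1, 2, 3},
    "album_b": {4, 5},
}
invoice_tracks = {
    101: {1, 2, 3},  # all of album_a
    102: {1, 4},     # tracks from two albums
    103: {4, 5},     # all of album_b
    104: {2},        # a single track
}


def is_album_purchase(track_ids, albums):
    """Return True when the purchased tracks match one album exactly."""
    return any(track_ids == tracks for tracks in albums.values())


labels = {
    invoice_id: is_album_purchase(tracks, album_tracks)
    for invoice_id, tracks in invoice_tracks.items()
}
album_share = sum(labels.values()) / len(labels)

print(labels)
print(album_share)
```

In this toy data, invoices 101 and 103 match an album exactly, so half of the invoices are labeled as album purchases.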