## Create Helper Function

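```python
import sqlite3

import pandas as pd

def run_query(q):
    """Run a SELECT query against chinook.db and return the result as a DataFrame."""
    with sqlite3.connect('chinook.db') as conn:
        return pd.read_sql(q, conn)


def run_command(q):
    """Execute a statement (e.g. CREATE VIEW) against chinook.db in autocommit mode."""
    with sqlite3.connect('chinook.db') as conn:
        conn.isolation_level = None  # autocommit: changes are applied immediately
        conn.execute(q)


def show_tables():
    """List all tables and views in the database."""
    show_q = '''SELECT name, type
                FROM sqlite_master
                WHERE type IN ('table', 'view')
             '''
    return run_query(show_q)

show_tables()
```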

| name | type |
|---|---|
| album | table |
| artist | table |
| customer | table |
| employee | table |
| genre | table |
| invoice | table |
| invoice_line | table |
| media_type | table |
| playlist | table |
| playlist_track | table |
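One detail worth knowing about the helpers: a `sqlite3` connection used in a `with` block commits or rolls back the transaction on exit, but it does not close the connection. Below is a minimal sketch of a variant, assuming we want the connection closed explicitly (via `contextlib.closing`) and want to pass values as query parameters instead of formatting them into the SQL string. The names `run_query_params` and `run_command_params` and the `params`/`db_path` arguments are hypothetical, and the demo runs against a throwaway database file rather than `chinook.db`.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

import pandas as pd

def run_query_params(q, params=None, db_path="chinook.db"):
    # closing() guarantees the connection is closed; sqlite3's own
    # context manager only commits or rolls back the transaction.
    with closing(sqlite3.connect(db_path)) as conn:
        return pd.read_sql(q, conn, params=params)

def run_command_params(q, params=(), db_path="chinook.db"):
    with closing(sqlite3.connect(db_path)) as conn:
        conn.isolation_level = None  # autocommit: each statement is applied immediately
        conn.execute(q, params)      # '?' placeholders are filled from params

# Demo against a temporary database file (hypothetical data, not Chinook).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_db = os.path.join(tmp_dir, "demo.db")
    run_command_params("CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT)",
                       db_path=demo_db)
    for name in ("Rock", "Blues"):
        run_command_params("INSERT INTO genre (name) VALUES (?)", (name,), db_path=demo_db)
    blues = run_query_params("SELECT genre_id, name FROM genre WHERE name = ?",
                             params=("Blues",), db_path=demo_db)

print(blues)
```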


## Database Schema Diagram
![Image](https://s3.amazonaws.com/dq-content/280/chinook-schema.svg)

## Popular Track Genres (USA)

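```python
usa_top_genres = '''
SELECT g.name AS Genre,
       COUNT(g.name) AS Count,
       -- share of all tracks sold to US customers
       CAST(COUNT(g.name) AS FLOAT) / (
           SELECT COUNT(*)
           FROM invoice_line il2
           INNER JOIN invoice i2 ON i2.invoice_id = il2.invoice_id
           INNER JOIN customer c2 ON c2.customer_id = i2.customer_id
           WHERE c2.country = 'USA'
       ) AS Percentage
FROM invoice i
INNER JOIN customer c ON c.customer_id = i.customer_id
INNER JOIN invoice_line il ON il.invoice_id = i.invoice_id
INNER JOIN track t ON t.track_id = il.track_id
INNER JOIN genre g ON g.genre_id = t.genre_id
WHERE c.country = 'USA'
GROUP BY Genre
ORDER BY Count DESC
'''

run_query(usa_top_genres)
```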


| Genre | Count | Percentage |
|---|---|---|
| Rock | 561 | 0.533777 |
| Alternative & Punk | 130 | 0.123692 |
| Metal | 124 | 0.117983 |
| R&B/Soul | 53 | 0.050428 |
| Blues | 36 | 0.034253 |
| Alternative | 35 | 0.033302 |
| Latin | 22 | 0.020932 |
| Pop | 22 | 0.020932 |
| Hip Hop/Rap | 20 | 0.019029 |
| Jazz | 14 | 0.013321 |


Based on the sales of tracks across different genres in the USA, we should purchase the new albums by the following artists:

- Red Tone (Punk)
- Slim Jim Bites (Blues)
- Meteor and the Girls (Pop)

Keep in mind that, combined, these three genres make up only about 18% of US sales, so we should also be on the lookout for artists and albums in the Rock genre, which accounts for 53% of sales.
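The combined share can be checked directly against the percentages in the genre table. This sketch assumes Red Tone's Punk maps onto the `Alternative & Punk` genre, and the numbers are copied from the query output rather than re-queried.

```python
# Shares of US tracks sold, copied from the genre table above.
usa_genre_share = {
    "Rock": 0.533777,
    "Alternative & Punk": 0.123692,
    "Blues": 0.034253,
    "Pop": 0.020932,
}

# Genres of the three recommended artists (assumption: Punk -> "Alternative & Punk").
recommended = ["Alternative & Punk", "Blues", "Pop"]
combined = sum(usa_genre_share[g] for g in recommended)

print(f"Recommended genres: {combined:.1%}")           # 17.9%
print(f"Rock alone: {usa_genre_share['Rock']:.1%}")    # 53.4%
```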

## Employee Sales Performance

```python
employee_sales_q = '''
SELECT e.first_name || ' ' || e.last_name AS employee,
       e.hire_date,
       SUM(i.total) AS sales
FROM employee e
-- each customer is assigned to one support rep
INNER JOIN customer c ON c.support_rep_id = e.employee_id
INNER JOIN invoice i ON i.customer_id = c.customer_id
GROUP BY e.employee_id
ORDER BY sales DESC
'''
e_sale = run_query(employee_sales_q)
e_sale
```

| employee | hire_date | sales |
|---|---|---|
| Jane Peacock | 2017-04-01 00:00:00 | 1731.51 |
| Margaret Park | 2017-05-03 00:00:00 | 1584.0 |
| Steve Johnson | 2017-10-17 00:00:00 | 1393.92 |


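Jane Peacock recorded the highest sales, followed by Margaret Park and Steve Johnson. The differences roughly correspond to their hire dates: Jane was hired first, Margaret about a month later, and Steve about six and a half months after Jane.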
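A quick way to sanity-check that claim with the three rows above: sort by hire date, confirm that sales decrease in the same order, and put each rep's hire-date gap next to their sales gap. The DataFrame is rebuilt from the printed output (named `sales_df` here so it does not overwrite `e_sale`); it is an illustration, not a tenure-adjusted comparison.

```python
import pandas as pd

# Values copied from the employee sales output above.
sales_df = pd.DataFrame({
    "employee": ["Jane Peacock", "Margaret Park", "Steve Johnson"],
    "hire_date": pd.to_datetime(["2017-04-01", "2017-05-03", "2017-10-17"]),
    "sales": [1731.51, 1584.0, 1393.92],
})

by_hire = sales_df.sort_values("hire_date").reset_index(drop=True)

# True if each later hire has lower (or equal) total sales than the one before.
order_matches = by_hire["sales"].is_monotonic_decreasing

# Days after the first hire, and how far each rep's sales trail the top seller.
by_hire["days_after_first_hire"] = (by_hire["hire_date"] - by_hire["hire_date"].min()).dt.days
by_hire["sales_gap"] = by_hire["sales"].iloc[0] - by_hire["sales"]

print(order_matches)
print(by_hire[["employee", "days_after_first_hire", "sales_gap"]])
```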

## Analyzing Sales by Country

```python
c_customer_q = '''
WITH country_customer AS (
    -- one row per country; countries with a single customer are relabelled 'Other'
    SELECT
        CASE WHEN COUNT(DISTINCT c.customer_id) = 1
            THEN 'Other' ELSE c.country
        END AS Country,
        COUNT(DISTINCT c.customer_id) AS Customers,
        SUM(i.total) AS Sales,
        COUNT(i.invoice_id) AS Orders
    FROM customer c
    INNER JOIN invoice i ON i.customer_id = c.customer_id
    GROUP BY c.country
)

SELECT
    Country,
    SUM(Customers) AS Customers,
    SUM(Sales) AS Sales,
    SUM(Sales) / SUM(Orders) AS "Average Orders"  -- average order value
FROM (
    -- helper column so that 'Other' sorts after the named countries
    SELECT *,
        CASE WHEN Country = 'Other' THEN 1 ELSE 0 END AS sort
    FROM country_customer
)
GROUP BY Country
ORDER BY MAX(sort), Customers DESC
'''

run_query(c_customer_q)
```


| Country | Customers | Sales | Average Orders |
|---|---|---|---|
| USA | 13 | 1040.49 | 7.942672 |
| Canada | 8 | 535.59 | 7.047237 |
| Brazil | 5 | 427.68 | 7.011148 |
| France | 5 | 389.07 | 7.7814 |
| Germany | 4 | 334.62 | 8.161463 |
| United Kingdom | 3 | 245.52 | 8.768571 |
| Czech Republic | 2 | 273.24 | 9.108 |
| India | 2 | 183.15 | 8.721429 |
| Portugal | 2 | 185.13 | 6.383793 |
| Other | 15 | 1094.94 | 7.448571 |


Based on average order value, there may be opportunities in the following countries:

- United Kingdom
- Czech Republic
- India

Keep in mind that the amount of data from each of these countries is relatively low (two or three customers each). Because of this, we should be cautious about spending too much money on new marketing campaigns, as the sample size is not large enough to give us high confidence. A better approach would be to run small campaigns in these countries, then collect and analyze data on the new customers to make sure these trends hold.
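The three countries above are the named countries with the highest average order value. A small pandas sketch, using values copied from the output table, recovers each row's order count (sales divided by average order value), computes the overall average order value, and picks the top three while excluding the `Other` bucket. The column names here are illustrative.

```python
import pandas as pd

# Rows copied from the sales-by-country output ("Other" = single-customer countries).
countries = pd.DataFrame({
    "country": ["USA", "Canada", "Brazil", "France", "Germany",
                "United Kingdom", "Czech Republic", "India", "Portugal", "Other"],
    "customers": [13, 8, 5, 5, 4, 3, 2, 2, 2, 15],
    "sales": [1040.49, 535.59, 427.68, 389.07, 334.62,
              245.52, 273.24, 183.15, 185.13, 1094.94],
    "avg_order": [7.942672, 7.047237, 7.011148, 7.7814, 8.161463,
                  8.768571, 9.108, 8.721429, 6.383793, 7.448571],
})

# Recover the number of orders per row, then the overall average order value.
countries["orders"] = (countries["sales"] / countries["avg_order"]).round().astype(int)
overall_avg = countries["sales"].sum() / countries["orders"].sum()

# Percentage by which each country's average order is above (or below) the overall average.
countries["pct_vs_overall"] = (countries["avg_order"] / overall_avg - 1) * 100

named = countries[countries["country"] != "Other"]
top3 = named.nlargest(3, "avg_order")

print(round(overall_avg, 2))
print(top3[["country", "customers", "pct_vs_overall"]])
```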

## Albums vs Individual Tracks

```python
view_album_q = '''
WITH
-- number of tracks on each album
album_track AS (
    SELECT album_id, COUNT(track_id) AS total_tracks
    FROM track
    GROUP BY album_id
),

-- one row per invoice, flagging whether it is exactly one complete album
purchase_album AS (
    SELECT
        il.invoice_id,
        COUNT(t.track_id) AS purchased_track,
        COUNT(DISTINCT a.album_id) AS from_album,
        at.total_tracks AS track_from_album,
        CASE
            WHEN at.total_tracks > 0  -- minimum album size; raise it to ignore very short albums
                 AND COUNT(DISTINCT a.album_id) = 1
                 AND COUNT(t.track_id) = at.total_tracks
            THEN 1
            ELSE 0
        END AS full_album
    FROM invoice_line il
    INNER JOIN track t ON t.track_id = il.track_id
    INNER JOIN album a ON a.album_id = t.album_id
    INNER JOIN album_track at ON at.album_id = t.album_id
    GROUP BY il.invoice_id
)

SELECT
    CASE WHEN full_album = 0 THEN 'no' ELSE 'yes' END AS album_purchase,
    COUNT(full_album) AS num_invoice,
    -- share of all invoices (614 in this dataset)
    ROUND(CAST(COUNT(full_album) AS FLOAT) / (SELECT COUNT(*) FROM purchase_album), 4) AS percentage
FROM purchase_album
GROUP BY full_album
'''

run_query(view_album_q)
```

| album_purchase | num_invoice | percentage |
|---|---|---|
| no | 500 | 0.8143 |
| yes | 114 | 0.1857 |


Whole-album purchases account for 18.6% of invoices. Based on this data, I would recommend against buying only selected tracks from albums from record companies, since that puts at risk the whole-album sales behind nearly one fifth of purchases.
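To make the full-album rule in the query easier to follow, here is a sketch of the same logic in pandas on hypothetical toy data (not the Chinook tables): an invoice counts as a full album when all of its tracks come from one album and the number of tracks bought equals that album's track count. As in the SQL, the album size is looked up from the invoice's first album, which is only meaningful when the invoice contains a single album.

```python
import pandas as pd

# Hypothetical toy data: album 1 has 3 tracks, album 2 has 2 tracks.
tracks = pd.DataFrame({"track_id": [1, 2, 3, 4, 5], "album_id": [1, 1, 1, 2, 2]})
invoice_lines = pd.DataFrame({
    "invoice_id": [10, 10, 10, 11, 11, 12],
    "track_id":   [1, 2, 3, 1, 4, 5],
})

# Equivalent of the album_track CTE: number of tracks per album.
album_size = tracks.groupby("album_id")["track_id"].count().rename("total_tracks")

# Equivalent of purchase_album: aggregate each invoice's lines.
lines = invoice_lines.merge(tracks, on="track_id")
per_invoice = lines.groupby("invoice_id").agg(
    purchased=("track_id", "count"),
    albums=("album_id", "nunique"),
    album_id=("album_id", "first"),
)
per_invoice = per_invoice.join(album_size, on="album_id")

# Full album: one album only, and every one of its tracks was bought.
per_invoice["full_album"] = (per_invoice["albums"] == 1) & (
    per_invoice["purchased"] == per_invoice["total_tracks"]
)

print(per_invoice)
print(per_invoice["full_album"].mean())  # share of invoices that are full albums
```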

