# Project: Answering Business Questions using SQL

## Creating Helper Functions

First, we import sqlite3 and pandas.

In [2]:
import sqlite3
import pandas as pd

Next, we define a function that takes a SQL query as an argument and returns the result as a pandas DataFrame.

In [3]:
def run_query(q):
    with sqlite3.connect('chinook.db') as conn:
        return pd.read_sql(q, conn)

We also define a function that takes a SQL command as an argument and executes it using the sqlite3 module.

In [4]:
def run_command(c):
    with sqlite3.connect('chinook.db') as conn:
        conn.isolation_level = None
        conn.execute(c)
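
One detail worth knowing about the helpers above: in the standard library, `with sqlite3.connect(...) as conn` commits or rolls back the transaction, but it does not close the connection. The following is a minimal sketch of a variant that also closes it, using `contextlib.closing`. The names `run_command_closing`, `list_objects`, and the `db_path` parameter are hypothetical and not part of the original notebook, and the demo runs against a scratch database file rather than `chinook.db`.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def run_command_closing(c, db_path):
    """Hypothetical variant of run_command() that also closes the connection."""
    # closing() guarantees conn.close(); the inner "with conn" commits on
    # success and rolls back if the command raises.
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:
            conn.execute(c)


def list_objects(db_path):
    """Hypothetical helper: return (name, type) for every table and view."""
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(
            "SELECT name, type FROM sqlite_master "
            "WHERE type IN ('table', 'view') ORDER BY name"
        ).fetchall()


# Scratch database so the real chinook.db is left untouched (assumption).
scratch_db = os.path.join(tempfile.mkdtemp(), "scratch.db")
run_command_closing("CREATE TABLE demo (id INTEGER PRIMARY KEY, label TEXT)", scratch_db)
run_command_closing("CREATE VIEW demo_view AS SELECT label FROM demo", scratch_db)

objects = list_objects(scratch_db)
print(objects)
```

For a short notebook session the original helpers work fine; the explicit close mainly matters when many connections are opened in a loop or when the database file needs to be moved or deleted afterwards.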

Finally, we create a function that calls run_query() to return a list of all tables and views in the database.

In [5]:
def show_tables():
    q = '''
    SELECT
        name,
        type
    FROM sqlite_master
    WHERE type IN ('table', 'view');
    '''
    return run_query(q)

Listing the tables and views:

In [6]:
show_tables()

Unnamed: 0,name,type
0,album,table
1,artist,table
2,customer,table
3,employee,table
4,genre,table
5,invoice,table
6,invoice_line,table
7,media_type,table
8,playlist,table
9,playlist_track,table


## Selecting Albums to Purchase

The query below returns each genre with the number of tracks sold in the USA, both in absolute numbers and as a percentage of all US tracks sold.

In [18]:
genres_num_track = '''
WITH track_sold_us AS(
    SELECT il.track_id AS track_id, SUM(il.quantity) AS tracks_sold
    FROM invoice_line AS il
    INNER JOIN invoice AS i ON i.invoice_id = il.invoice_id
    WHERE i.billing_country = 'USA'
    GROUP BY 1
    ORDER BY 2 DESC
    ),
    track_genre AS(
    SELECT t.track_id AS track_id, g.name AS genre_name
    FROM track AS t 
    INNER JOIN genre AS g ON g.genre_id = t.genre_id
    )

SELECT tg.genre_name AS genre_name, 
       SUM(tsu.tracks_sold) AS tracks_sold,
       ROUND(CAST(SUM(tsu.tracks_sold) AS FLOAT) / (SELECT SUM(tracks_sold) FROM track_sold_us) * 100, 2) AS percentage_sold
FROM track_genre AS tg 
INNER JOIN track_sold_us AS tsu ON tsu.track_id = tg.track_id
GROUP BY 1
ORDER BY 2 DESC;
'''
run_query(genres_num_track)

Unnamed: 0,genre_name,tracks_sold,percentage_sold
0,Rock,561,53.38
1,Alternative & Punk,130,12.37
2,Metal,124,11.8
3,R&B/Soul,53,5.04
4,Blues,36,3.43
5,Alternative,35,3.33
6,Pop,22,2.09
7,Latin,22,2.09
8,Hip Hop/Rap,20,1.9
9,Jazz,14,1.33


Among the proposed albums, we can choose based on which genres sell the most tracks in the USA. The most promising album is Red Tone, whose genre is Punk, the second best-selling genre (Alternative & Punk, 12.37% of tracks sold). Next is Slim Jim Bites, whose genre is Blues, the fifth best-selling genre (3.43%). For the third album, the data points to the Pop album: Pop sits in seventh position (2.09%, tied with Latin).
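
The ranking behind this choice can be sketched in a few lines of plain Python. The shares are copied from the query output above, and the candidate genres are the three named in the text. Mapping "Punk" onto the "Alternative & Punk" row is an assumption, and the variable names are illustrative only.

```python
# Percentage of US tracks sold per genre, copied from the query output.
genre_share = {
    "Rock": 53.38,
    "Alternative & Punk": 12.37,
    "Metal": 11.80,
    "R&B/Soul": 5.04,
    "Blues": 3.43,
    "Alternative": 3.33,
    "Pop": 2.09,
    "Latin": 2.09,
    "Hip Hop/Rap": 1.90,
    "Jazz": 1.33,
}

# Candidate genres named in the text -> row in the output table.
# Treating "Punk" as "Alternative & Punk" is an assumption.
candidate_genres = {"Punk": "Alternative & Punk", "Blues": "Blues", "Pop": "Pop"}

# Rank all genres by share; ties keep their original order (stable sort).
ranked_genres = sorted(genre_share, key=genre_share.get, reverse=True)

picks = sorted(
    (
        (label, ranked_genres.index(row) + 1, genre_share[row])
        for label, row in candidate_genres.items()
    ),
    key=lambda item: item[1],
)

for label, rank, share in picks:
    print(f"{label}: rank {rank}, {share:.2f}% of US tracks sold")
```

Note that Pop and Latin are tied at 2.09%, so Pop's seventh place depends on how the tie is broken.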

## Analyzing Employee Sales Performance

The query below finds the total dollar amount of sales assigned to each sales support agent in the company.

In [20]:
sales_agent = '''
WITH sales_employee_list AS(
    SELECT * FROM employee WHERE title = 'Sales Support Agent'
    ),
    sales_rep_sales AS(
    SELECT c.support_rep_id AS sales_id, SUM(i.total) AS total_sales
    FROM customer AS c 
    INNER JOIN invoice AS i ON i.customer_id = c.customer_id
    GROUP BY 1
    )

SELECT sel.first_name || ' ' || sel.last_name AS sales_employee,
       sel.hire_date,
       srs.total_sales
FROM sales_employee_list AS sel 
INNER JOIN sales_rep_sales AS srs ON srs.sales_id = sel.employee_id
ORDER BY 3 DESC
'''
run_query(sales_agent)

Unnamed: 0,sales_employee,hire_date,total_sales
0,Jane Peacock,2017-04-01 00:00:00,1731.51
1,Margaret Park,2017-05-03 00:00:00,1584.0
2,Steve Johnson,2017-10-17 00:00:00,1393.92


The differences in total sales track the employees' hire dates: the earliest hire, Jane Peacock, has the highest total sales, and the most recent hire, Steve Johnson, has the lowest.
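
To put numbers on that relationship, the sketch below compares each agent with the top seller, using only the values from the output table above. It shows how many days after the first hire each agent started and how far below the top total their sales are. It is an illustration of the comparison, not a full tenure-adjusted analysis.

```python
from datetime import date

# (name, hire date, total sales) copied from the query output.
agents = [
    ("Jane Peacock", date(2017, 4, 1), 1731.51),
    ("Margaret Park", date(2017, 5, 3), 1584.00),
    ("Steve Johnson", date(2017, 10, 17), 1393.92),
]

first_hire = min(hired for _, hired, _ in agents)
top_sales = max(total for _, _, total in agents)

# For each agent: days hired after the first hire, and the percentage
# by which their total sales fall short of the top seller.
comparison = [
    (name, (hired - first_hire).days, round((top_sales - total) / top_sales * 100, 1))
    for name, hired, total in agents
]

for name, days_later, pct_below in comparison:
    print(f"{name}: hired {days_later} days after first hire, {pct_below}% below top sales")
```

Steve Johnson started 199 days after Jane Peacock and sits about 19.5% below her total, so the gap in sales is consistent with his shorter time on the job.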

## Analyzing Sales by Country

We analyze the sales data for customers from each country: the total number of customers, the total value of sales, the average value of sales per customer, and the average order value. Countries with only one customer are grouped together as "Other".

In [22]:
sales_by_country = '''
WITH country_data AS(
        SELECT c.country AS country, 
               SUM(i.total) AS total_value, 
               COUNT(DISTINCT c.customer_id) AS total_customer
        FROM customer AS c 
        INNER JOIN invoice AS i ON i.customer_id = c.customer_id
        GROUP BY 1),

      billing_countries AS(
          SELECT billing_country, COUNT(invoice_id) AS total_invoice
          FROM invoice
          GROUP BY billing_country
      ),
      raw_data AS(
        SELECT 
            cd.country AS country,
            cd.total_customer AS customer,
            cd.total_value AS total,
            cd.total_value / cd.total_customer AS avg_cus,
            cd.total_value / bc.total_invoice AS avg_sales
        FROM country_data as cd 
        INNER JOIN billing_countries AS bc ON cd.country = bc.billing_country
        ORDER BY 3 DESC
      )

SELECT CASE
       WHEN rd.customer = 1 THEN 'Other'
       ELSE rd.country
       END AS country,
       SUM(rd.customer),
       SUM(rd.total),
       SUM(rd.avg_cus),
       SUM(rd.avg_sales)
FROM raw_data as rd
GROUP BY 1
'''

run_query(sales_by_country)

Unnamed: 0,country,SUM(rd.customer),SUM(rd.total),SUM(rd.avg_cus),SUM(rd.avg_sales)
0,Brazil,5,427.68,85.536,7.011148
1,Canada,8,535.59,66.94875,7.047237
2,Czech Republic,2,273.24,136.62,9.108
3,France,5,389.07,77.814,7.7814
4,Germany,4,334.62,83.655,8.161463
5,India,2,183.15,91.575,8.721429
6,Other,15,1094.94,1094.94,111.676066
7,Portugal,2,185.13,92.565,6.383793
8,USA,13,1040.49,80.037692,7.942672
9,United Kingdom,3,245.52,81.84,8.768571
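
One caveat about the "Other" row: the final SELECT sums the per-country averages, so for "Other" the `SUM(rd.avg_cus)` value equals the total sales (1094.94) rather than an average. Dividing that total by the 15 customers would give roughly 73 per customer. A minimal sketch of one way to avoid this is to carry the raw counts through and divide only after grouping. The table `country_stats`, its columns, and the toy rows below are hypothetical and are not taken from chinook.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical per-country totals: customers, sales value, invoice count.
conn.execute(
    "CREATE TABLE country_stats "
    "(country TEXT, customers INTEGER, total REAL, invoices INTEGER)"
)
conn.executemany(
    "INSERT INTO country_stats VALUES (?, ?, ?, ?)",
    [
        ("A", 2, 100.0, 10),
        ("B", 1, 30.0, 3),   # single-customer country -> "Other"
        ("C", 1, 50.0, 5),   # single-customer country -> "Other"
    ],
)

query = """
SELECT CASE WHEN customers = 1 THEN 'Other' ELSE country END AS country_group,
       SUM(customers)                AS n_customers,
       SUM(total)                    AS total_sales,
       SUM(total) / SUM(customers)   AS avg_sales_per_customer,
       SUM(total) / SUM(invoices)    AS avg_order_value,
       SUM(total / customers)        AS summed_averages  -- the problematic form
FROM country_stats
GROUP BY 1
ORDER BY total_sales DESC
"""

rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

In the toy data, the "Other" group has an average of 40.0 per customer when the division happens after grouping, while summing the per-country averages yields 80.0. For countries that are not merged, both forms agree.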
