# Introduction

This notebook contains my test answers to SQL problems from TestDome.

```python
import sqlite3
import pandas as pd
```

## Regional Sales Comparison

An insurance company maintains records of sales made by its employees. Each employee is assigned to a state. States are grouped under regions.

Write a query that returns:

1. The region name.
2. Average sales per employee for the region (Average sales = Total sales made for the region / Number of employees in the region).
3. The difference between the average sales of the region with the highest average sales, and the average sales per employee for the region (average sales to be calculated as explained above).

Employees can have multiple sales. A region with no sales should also be returned. Use 0 as the average sales per employee for such a region when calculating the 2nd and the 3rd columns.
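Any SQL answer has to handle two edge cases implied by these rules: a region with no sales, where `SUM` runs over no rows, and a region with no employees, where the divisor is zero. Below is a minimal check of how SQLite evaluates them, run against a throwaway in-memory database. The names `demo`, `demo_cur`, and table `t` are purely illustrative. Other engines differ here: PostgreSQL, for instance, raises an error on division by zero, so a portable query would guard the divisor, for example with `NULLIF(..., 0)`.

```python
import sqlite3

# Throwaway in-memory database, separate from the notebook's own connection
demo = sqlite3.connect(":memory:")
demo_cur = demo.cursor()

# 5001 / 2               INTEGER / INTEGER truncates -> 2500
# 5001 * 1.0 / 2         a REAL operand keeps the fraction -> 2500.5
# 1 / 0                  division by zero yields NULL (None), not an error
# COALESCE(NULL / 1, 0)  NULL propagates through arithmetic; COALESCE maps it to 0
demo_cur.execute("SELECT 5001 / 2, 5001 * 1.0 / 2, 1 / 0, COALESCE(NULL / 1, 0)")
arithmetic = demo_cur.fetchone()
print(arithmetic)  # (2500, 2500.5, None, 0)

# SUM over zero rows is NULL (None in Python), not 0
demo_cur.execute("CREATE TABLE t (amount INTEGER)")
demo_cur.execute("SELECT SUM(amount) FROM t")
empty_sum = demo_cur.fetchone()
print(empty_sum)  # (None,)

demo.close()
```

Together these explain why wrapping the division in `COALESCE(..., 0)` is enough in SQLite. They also show that integer totals are truncated unless one operand is made `REAL`.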

```python
# sqlite3 creates the database file if needed, but the dbs/ folder must already exist
conn = sqlite3.connect('dbs/regional_sales.db')
cur = conn.cursor()
```

```python
# example data
# (the INSERTs use fixed primary keys, so re-running this cell against an
# existing database file raises sqlite3.IntegrityError)

setup_qs = ["""
CREATE TABLE IF NOT EXISTS regions( id INTEGER PRIMARY KEY, name VARCHAR(50) NOT NULL );
""","""
CREATE TABLE IF NOT EXISTS states( id INTEGER PRIMARY KEY, name VARCHAR(50) NOT NULL, regionId INTEGER NOT NULL REFERENCES regions(id) );
""","""
CREATE TABLE IF NOT EXISTS employees ( id INTEGER PRIMARY KEY, name VARCHAR(50) NOT NULL, stateId INTEGER NOT NULL REFERENCES states(id) );
""","""
CREATE TABLE IF NOT EXISTS sales ( id INTEGER PRIMARY KEY, amount INTEGER NOT NULL, employeeId INTEGER NOT NULL REFERENCES employees(id) );
""","""
INSERT INTO regions(id, name) VALUES(1, 'North'); ""","""
INSERT INTO regions(id, name) VALUES(2, 'South'); ""","""
INSERT INTO regions(id, name) VALUES(3, 'East'); ""","""
INSERT INTO regions(id, name) VALUES(4, 'West'); ""","""
INSERT INTO regions(id, name) VALUES(5, 'Midwest');
""","""
INSERT INTO states(id, name, regionId) VALUES(1, 'Minnesota', 1); ""","""
INSERT INTO states(id, name, regionId) VALUES(2, 'Texas', 2); ""","""
INSERT INTO states(id, name, regionId) VALUES(3, 'California', 3); ""","""
INSERT INTO states(id, name, regionId) VALUES(4, 'Columbia', 4); ""","""
INSERT INTO states(id, name, regionId) VALUES(5, 'Indiana', 5);
""","""
INSERT INTO employees(id, name, stateId) VALUES(1, 'Jaden', 1); ""","""
INSERT INTO employees(id, name, stateId) VALUES(2, 'Abby', 1); ""","""
INSERT INTO employees(id, name, stateId) VALUES(3, 'Amaya', 2); ""","""
INSERT INTO employees(id, name, stateId) VALUES(4, 'Robert', 3); ""","""
INSERT INTO employees(id, name, stateId) VALUES(5, 'Tom', 4); ""","""
INSERT INTO employees(id, name, stateId) VALUES(6, 'William', 5);
""","""
INSERT INTO sales(id, amount, employeeId) VALUES(1, 2000, 1); ""","""
INSERT INTO sales(id, amount, employeeId) VALUES(2, 3000, 2); ""","""
INSERT INTO sales(id, amount, employeeId) VALUES(3, 4000, 3); ""","""
INSERT INTO sales(id, amount, employeeId) VALUES(4, 1200, 4); ""","""
INSERT INTO sales(id, amount, employeeId) VALUES(5, 2400, 5);
"""]

# cursor.execute() runs one statement at a time, so loop over the list,
# then commit so the rows are written to the database file
for q in setup_qs:
    cur.execute(q)
conn.commit()
```

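```python
# my answer

cur.execute("""
WITH 
    region_sales (region, avg_sales_per_employee)
    AS
    (
        -- LEFT JOINs keep regions that have no states, employees or sales.
        -- COUNT(DISTINCT e.id) counts every employee in the region, including
        -- those without sales, and ignores duplicates from multiple sales.
        -- SUM() is NULL when there are no sales, and dividing by a zero count
        -- yields NULL in SQLite; COALESCE turns both cases into 0.
        -- Note: INTEGER / INTEGER truncates in SQLite.
        SELECT r.name
            , COALESCE(SUM(sa.amount) / COUNT(DISTINCT e.id), 0)
        FROM regions r
        LEFT JOIN states st ON st.regionId = r.id
        LEFT JOIN employees e ON e.stateId = st.id
        LEFT JOIN sales sa ON sa.employeeId = e.id
        GROUP BY r.id, r.name
    )
SELECT region
    , COALESCE(avg_sales_per_employee, 0) AS regional_avg
    , COALESCE((SELECT MAX(avg_sales_per_employee) FROM region_sales) - avg_sales_per_employee, 0) AS diff_from_max
FROM region_sales
""")

# build a DataFrame, taking the column names from the cursor description
df = pd.DataFrame(cur.fetchall(), columns=[x[0] for x in cur.description])
df
```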

|   | region | regional_avg | diff_from_max |
|---|--------|--------------|---------------|
| 0 | North | 2500 | 1500 |
| 1 | South | 4000 | 0 |
| 2 | East | 1200 | 2800 |
| 3 | West | 2400 | 1600 |
| 4 | Midwest | 0 | 4000 |
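As an independent cross-check of the result above, the same figures can be recomputed in plain Python from the example rows. This is a sketch, not part of the submitted answer. The helper name `regional_comparison` and the tuple layout are my own. It also uses true division, so it would show fractions where SQLite's integer division truncates. The sample totals divide evenly, so the numbers agree here.

```python
from collections import defaultdict

# Example rows copied from the setup cell:
# regions(id, name), states(id, name, regionId),
# employees(id, name, stateId), sales(id, amount, employeeId)
regions = [(1, "North"), (2, "South"), (3, "East"), (4, "West"), (5, "Midwest")]
states = [(1, "Minnesota", 1), (2, "Texas", 2), (3, "California", 3),
          (4, "Columbia", 4), (5, "Indiana", 5)]
employees = [(1, "Jaden", 1), (2, "Abby", 1), (3, "Amaya", 2),
             (4, "Robert", 3), (5, "Tom", 4), (6, "William", 5)]
sales = [(1, 2000, 1), (2, 3000, 2), (3, 4000, 3), (4, 1200, 4), (5, 2400, 5)]


def regional_comparison(regions, states, employees, sales):
    """Hypothetical helper returning (region, avg_per_employee, diff_from_max) rows."""
    # Follow the foreign keys: state -> region, employee -> region
    region_of_state = {state_id: region_id for state_id, _, region_id in states}
    region_of_employee = {emp_id: region_of_state[state_id]
                          for emp_id, _, state_id in employees}

    # Every employee counts towards the headcount, with or without sales
    headcount = defaultdict(int)
    for region_id in region_of_employee.values():
        headcount[region_id] += 1

    # An employee may have several sales; all of them add to the region total
    totals = defaultdict(int)
    for _, amount, emp_id in sales:
        totals[region_of_employee[emp_id]] += amount

    # Average = total sales / number of employees.
    # A region with employees but no sales gets 0.0 (0 / n);
    # a region with no employees at all also gets 0.0.
    averages = {
        region_id: (totals[region_id] / headcount[region_id]
                    if headcount[region_id] else 0.0)
        for region_id, _ in regions
    }

    best = max(averages.values(), default=0.0)
    return [(name, averages[region_id], best - averages[region_id])
            for region_id, name in regions]


rows = regional_comparison(regions, states, employees, sales)
for row in rows:
    print(row)
```

Each printed tuple carries the same region, average, and difference as the corresponding row of the query result, only as floats.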
