# Summarizing Data with SQL

## Summary Statistics
32) How many rows are in the `pets` table?
33) How many female pets are in the `pets` table?
34) How many female cats are in the `pets` table?
35) What's the mean age of pets in the `pets` table?
36) What's the mean age of dogs in the `pets` table?
37) What's the mean age of male dogs in the `pets` table?
38) What's the count, mean, minimum, and maximum of pet ages in the `pets` table?
    * _NOTE:_ SQLite doesn't have built-in formulas for standard deviation or median!
39) Repeat the previous problem with the following stipulations:
    * Round the average to one decimal place.
    * Give each column a human-readable column name (for example, "Average Age")
40) How many rows in `employees_null` have missing salaries?
41) How many salespeople in `employees_null` have _nonmissing_ salaries?
42) What's the mean salary of employees who joined the company after 2010? Go back to the usual `employees` table for this one.
    * _Hint:_ You may need to use the `CAST()` function for this. To cast a string as a float, you can do `CAST(x AS REAL)`
43) What's the mean salary of employees in Swiss Francs?
    * _Hint:_ Swiss Francs are abbreviated "CHF" and 1 USD = 0.97 CHF.
44) Create a query that computes the mean salary in USD as well as CHF. Give the columns human-readable names (for example "Mean Salary in USD"). Also, format them with comma delimiters and currency symbols.
    * _NOTE:_ Comma-delimiting numbers is only available for integers in SQLite, so rounding (down) to the nearest dollar or franc will be done for us.
    * _NOTE2:_ The symbol for francs is simply `Fr.` or `fr.`, so an example output will look like `100,000 Fr.`.

## Aggregating Statistics with GROUP BY
45) What is the average age of `pets` by species?
46) Repeat the previous problem but make sure the species label is also displayed! Assume this behavior is always being asked of you any time you use `GROUP BY`.
47) What is the count, mean, minimum, and maximum age by species in `pets`?
48) Show the mean salaries of each job title in `employees`.
49) Show the mean salaries in New Zealand dollars of each job title in `employees`.
    * _NOTE:_ 1 USD = 1.65 NZD.
50) Show the mean, min, and max salaries of each job title in `employees`, as well as the numbers of employees in each category.
51) Show the mean salaries of each job title in `employees` sorted descending by salary.
52) What are the top 5 most common first names among `employees`?
53) Show all first names which have exactly 2 occurrences in `employees`.
54) Take a look at the `transactions` table to get an idea of what it contains. Note that a transaction may span multiple rows if different items are purchased as part of the same order. The employee who made the order is also given by their ID.
55) Show the top 5 largest orders (and their respective customer) in terms of the numbers of items purchased in that order.
56) Show the total cost of each transaction.
    * _Hint:_ The `unit_price` column is the price of _one_ item. The customer may have purchased multiple.
    * _Hint2:_ Note that transactions here span multiple rows if different items are purchased.
57) Show the top 5 transactions in terms of total cost.
58) Show the top 5 customers in terms of total revenue (i.e., which customers have we done the most business with in terms of money?)
59) Show the top 5 employees in terms of revenue generated (i.e., which employees made the most in sales?)
60) Which customer worked with the largest number of employees?
    * _Hint:_ This is a tough one! Check out the `DISTINCT` keyword.
61) Show all customers who've done more than $80,000 worth of business with us.


In [1]:
# Import pandas and sqlite3, then open a connection to the database

import pandas as pd
import sqlite3

con = sqlite3.connect("ladder.db")

32) How many rows are in the `pets` table?

In [2]:
pd.read_sql("SELECT COUNT(*) AS total_rows FROM pets;", con)

Unnamed: 0,total_rows
0,13


33) How many female pets are in the pets table?

In [3]:
sql = '''
SELECT COUNT(*) AS "Female_Pets"
FROM pets
WHERE sex = 'F';
'''
pd.read_sql_query(sql, con)

Unnamed: 0,Female_Pets
0,7


34) How many female cats are in the pets table? 

In [4]:
sql = '''
SELECT COUNT(*) AS "Female_Cats"
FROM pets
WHERE sex = 'F' AND LOWER(species) = 'cat';
'''
pd.read_sql_query(sql, con)

Unnamed: 0,Female_Cats
0,4


35) What's the mean age of pets in the pets table?

In [5]:
sql = '''
SELECT ROUND(AVG(age),2) AS "avg_age"
FROM pets;
'''
pd.read_sql_query(sql, con)

Unnamed: 0,avg_age
0,5.23


36) What's the mean age of dogs in the pets table?

In [6]:
sql = '''
SELECT ROUND(AVG(age),2) AS "avg_dog_age"
FROM pets
WHERE species = 'dog';
'''
pd.read_sql_query(sql, con)

Unnamed: 0,avg_dog_age
0,6.5


37) What's the mean age of male dogs in the pets table?

In [7]:
sql = '''
SELECT ROUND(AVG(age),2) AS "avg_maledog_age"
FROM pets
WHERE species = 'dog' AND sex = 'M';
'''
pd.read_sql_query(sql, con)

Unnamed: 0,avg_maledog_age
0,8.33


38) What's the count, mean, minimum, and maximum of pet ages in the `pets` table?
    * _NOTE:_ SQLite doesn't have built-in formulas for standard deviation or median!

In [8]:
sql = '''
SELECT COUNT(age) AS "age_count", ROUND(AVG(age),2) AS "avg_age",
MIN(age) AS "min_age", MAX(age) AS "max_age"
FROM pets;
'''
pd.read_sql_query(sql, con)

Unnamed: 0,age_count,avg_age,min_age,max_age
0,13,5.23,1,10
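Because SQLite has no built-in median or standard deviation, one workaround is to pull the column into Python and summarize it there. The sketch below assumes a tiny in-memory table with made-up ages (the name `demo` and the rows are hypothetical, not the contents of `ladder.db`) and uses the standard-library `statistics` module.

```python
import sqlite3
import statistics

# Hypothetical stand-in for the pets table; these ages are invented for illustration.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE pets (name TEXT, age INTEGER)")
demo.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [("a", 1), ("b", 3), ("c", 4), ("d", 7), ("e", 10)],
)

# Fetch the raw ages (skipping NULLs), then compute the statistics in Python.
ages = [row[0] for row in demo.execute("SELECT age FROM pets WHERE age IS NOT NULL")]
median_age = statistics.median(ages)
stdev_age = statistics.stdev(ages)  # sample standard deviation (n - 1 denominator)

print(median_age, round(stdev_age, 2))
```

The same idea should carry over to a DataFrame returned by `pd.read_sql_query`, where `.median()` and `.std()` are available on the age column.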


39) Repeat the previous problem with the following stipulations:
    * Round the average to one decimal place.
    * Give each column a human-readable column name (for example, "Average Age")

In [9]:
sql = '''
SELECT COUNT(age) AS "Count", ROUND(AVG(age),1) AS "Average Age",
MIN(age) AS "Minimum Age", MAX(age) AS "Maximum Age"
FROM pets;
'''
pd.read_sql_query(sql, con)

Unnamed: 0,Count,Average Age,Minimum Age,Maximum Age
0,13,5.2,1,10


40) How many rows in employees_null have missing salaries?

In [10]:
sql = '''

SELECT COUNT(*) AS missing_salaries
FROM employees_null
WHERE salary IS NULL;

'''
pd.read_sql_query(sql, con)

Unnamed: 0,missing_salaries
0,10


41) How many salespeople in `employees_null` have nonmissing salaries?

In [11]:
sql = '''

SELECT COUNT(*) AS non_missing_salaries
FROM employees_null
WHERE salary IS NOT NULL AND job = 'Sales';

'''
pd.read_sql_query(sql, con)

Unnamed: 0,non_missing_salaries
0,60
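The two questions above hinge on how SQLite treats `NULL` in counts and comparisons. As a minimal sketch on a hypothetical four-row table (the rows and the name `demo` are assumptions for illustration), `COUNT(*)` counts every row, `COUNT(salary)` and `AVG(salary)` skip missing values, and `salary = NULL` never matches anything, which is why `IS NULL` is needed.

```python
import sqlite3

# Hypothetical miniature of employees_null; values are invented.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees_null (job TEXT, salary INTEGER)")
demo.executemany(
    "INSERT INTO employees_null VALUES (?, ?)",
    [("Sales", 50000), ("Sales", None), ("IT", 60000), ("IT", None)],
)

# COUNT(*) counts rows; COUNT(salary) and AVG(salary) ignore NULL salaries.
total_rows, salaried_rows, mean_salary = demo.execute(
    "SELECT COUNT(*), COUNT(salary), AVG(salary) FROM employees_null"
).fetchone()

# Comparing with '=' against NULL yields NULL (not true), so no row passes the filter.
equals_null = demo.execute(
    "SELECT COUNT(*) FROM employees_null WHERE salary = NULL"
).fetchone()[0]
is_null = demo.execute(
    "SELECT COUNT(*) FROM employees_null WHERE salary IS NULL"
).fetchone()[0]

print(total_rows, salaried_rows, mean_salary, equals_null, is_null)
```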


42) What's the mean salary of employees who joined the company after 2010? Go back to the usual `employees` table for this one.
    * _Hint:_ You may need to use the `CAST()` function for this. To cast a string as a float, you can do `CAST(x AS REAL)`

In [12]:
sql = """

SELECT AVG(salary) AS "avg_salary"
FROM employees
WHERE CAST(startdate AS REAL) > 2010;

"""

pd.read_sql_query(sql, con)
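The `CAST(x AS REAL)` hint works on date strings because SQLite converts the longest numeric prefix of the text, so a value shaped like `2014-04-17` becomes `2014.0`. The sketch below assumes `startdate` is stored as ISO-style text (as the sample rows elsewhere in this notebook suggest) and uses a hypothetical three-row table; the names and dates are invented.

```python
import sqlite3

# Hypothetical employees with text start dates; not the real ladder.db data.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees (firstname TEXT, startdate TEXT)")
demo.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", "2009-12-31"), ("Bo", "2010-06-01"), ("Cy", "2011-01-15")],
)

# CAST keeps only the leading number of the string, i.e. the year.
parsed = demo.execute(
    "SELECT startdate, CAST(startdate AS REAL) FROM employees ORDER BY startdate"
).fetchall()

# 'After 2010' read as 'year strictly greater than 2010'.
joined_after_2010 = [
    row[0]
    for row in demo.execute(
        "SELECT firstname FROM employees WHERE CAST(startdate AS REAL) > 2010"
    )
]

print(parsed, joined_after_2010)
```

Under this reading, someone who joined in mid-2010 is not counted; if "after 2010" should include part of that year, a text comparison such as `startdate > '2010-12-31'` would be one alternative.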


43) What's the mean salary of employees in Swiss Francs?
    * _Hint:_ Swiss Francs are abbreviated "CHF" and 1 USD = 0.97 CHF.

In [13]:
sql = """

SELECT ROUND(AVG(salary * 0.97),2) AS "mean_salary_chf"
FROM employees;

"""

pd.read_sql_query(sql, con)

Unnamed: 0,mean_salary_chf
0,75727.56


44) Create a query that computes the mean salary in USD as well as CHF. Give the columns human-readable names (for example "Mean Salary in USD"). Also, format them with comma delimiters and currency symbols.
    * _NOTE:_ Comma-delimiting numbers is only available for integers in SQLite, so rounding (down) to the nearest dollar or franc will be done for us.
    * _NOTE2:_ The symbol for francs is simply `Fr.` or `fr.`, so an example output will look like `100,000 Fr.`.

In [14]:
sql = """

SELECT FORMAT('%,d', ROUND(AVG(salary * 0.97))) || ' Fr.' AS "Mean Salary in CHF",
FORMAT('$%,d', ROUND(AVG(salary))) AS "Mean Salary in USD"
FROM employees;

"""

pd.read_sql_query(sql, con)

Unnamed: 0,Mean Salary in CHF,Mean Salary in USD
0,"75,728 Fr.","$78,070"
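The comma-delimited formatting relies on the `,` flag of SQLite's printf-style formatter, which applies to integer substitutions such as `%d`. The sketch below uses `printf()`, which recent SQLite versions also expose under the name `FORMAT()`; the input value `100000.4` is a made-up salary, and the explicit `CAST(... AS INTEGER)` is an assumption added here to make the integer conversion visible.

```python
import sqlite3

demo = sqlite3.connect(":memory:")

# Round a hypothetical mean salary, convert to an integer, then format with commas.
usd, chf = demo.execute(
    """
    SELECT printf('$%,d', CAST(ROUND(100000.4) AS INTEGER)),
           printf('%,d', CAST(ROUND(100000.4 * 0.97) AS INTEGER)) || ' Fr.'
    """
).fetchone()

print(usd, chf)
```

Note that the currency conversion multiplies by the exchange rate (1 USD = 0.97 CHF), and string literals use single quotes, which is the standard SQL form.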


45) What is the average age of pets by species? 

In [15]:
sql = '''
SELECT ROUND(AVG(age),2) as "avg_age"
FROM pets
GROUP BY species;

'''
pd.read_sql_query(sql, con)

Unnamed: 0,avg_age
0,4.33
1,6.5
2,3.0


46) Repeat the previous problem but make sure the species label is also displayed! Assume this behavior is always being asked of you any time you use GROUP BY.

In [16]:
sql = '''
SELECT species, ROUND(AVG(age),2) as "avg_age"
FROM pets
GROUP BY species;

'''
pd.read_sql_query(sql, con)

Unnamed: 0,species,avg_age
0,cat,4.33
1,dog,6.5
2,lobster,3.0


47) What is the count, mean, minimum, and maximum age by species in `pets`?

In [17]:
sql = '''
SELECT species, ROUND(AVG(age),2) as "avg_age", 
COUNT(age) AS "age_count",
MIN(age) AS "min_age",
MAX(age) AS "max_age"
FROM pets
GROUP BY species;

'''
pd.read_sql_query(sql, con)

Unnamed: 0,species,avg_age,age_count,min_age,max_age
0,cat,4.33,6,2,7
1,dog,6.5,6,1,10
2,lobster,3.0,1,3,3


48) Show the mean salaries of each job title in `employees`.

In [18]:
sql = '''
SELECT job, ROUND(AVG(salary),2) as "avg_salary"
FROM employees
GROUP BY job;

'''
pd.read_sql_query(sql, con)

Unnamed: 0,job,avg_salary
0,Administrator,71986.14
1,IT,71381.0
2,Operations,74055.25
3,Sales,80778.04


49) Show the mean salaries in New Zealand dollars of each job title in `employees`.
    * _NOTE:_ 1 USD = 1.65 NZD.

In [19]:
sql = '''
SELECT job, ROUND(AVG(salary *1.65),2) as "avg_salary_NZD"
FROM employees
GROUP BY job;

'''
pd.read_sql_query(sql, con)

Unnamed: 0,job,avg_salary_NZD
0,Administrator,118777.14
1,IT,117778.65
2,Operations,122191.16
3,Sales,133283.77


50) Show the mean, min, and max salaries of each job title in employees, as well as the numbers of employees in each category. 

In [20]:
sql = '''
SELECT job, COUNT(*) AS "Num_of_Emps",
ROUND(AVG(salary),2) AS "Avg_Salary",
MIN(salary) AS "Min_Salary",
MAX(salary) AS "Max_Salary"
FROM employees
GROUP BY job;

'''
pd.read_sql_query(sql, con)

Unnamed: 0,job,Num_of_Emps,Avg_Salary,Min_Salary,Max_Salary
0,Administrator,14,71986.14,41151,120492
1,IT,10,71381.0,37397,115729
2,Operations,8,74055.25,41797,108989
3,Sales,68,80778.04,31333,124474


51) Show the mean salaries of each job title in employees sorted descending by salary.

In [21]:
sql = '''
SELECT job, ROUND(AVG(salary),2) AS "avg_salary"
FROM employees
GROUP BY job
ORDER BY avg_salary DESC;

'''
pd.read_sql_query(sql, con)

Unnamed: 0,job,avg_salary
0,Sales,80778.04
1,Operations,74055.25
2,Administrator,71986.14
3,IT,71381.0
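When sorting a grouped result by its mean, the `ORDER BY` should name the aggregate (or its alias) rather than the raw column, since each group contains many raw salaries. A minimal sketch on hypothetical data (the rows and the name `demo` are invented) where the job with the single highest salary has the lowest average:

```python
import sqlite3

# Hypothetical salaries: IT has the largest individual salary but the lowest mean.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees (job TEXT, salary INTEGER)")
demo.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("IT", 90000), ("IT", 30000), ("Sales", 70000), ("Sales", 80000), ("Ops", 65000)],
)

# Sort on the aggregate's alias so the ordering reflects the group means.
ranked = demo.execute(
    """
    SELECT job, ROUND(AVG(salary), 2) AS avg_salary
    FROM employees
    GROUP BY job
    ORDER BY avg_salary DESC
    """
).fetchall()

print(ranked)
```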


52) What are the top 5 most common first names among `employees`?

In [22]:
sql = '''
SELECT COUNT(firstname) AS "name_count", firstname
FROM employees
GROUP BY firstname
ORDER BY COUNT(firstname) DESC
LIMIT 5;

'''
pd.read_sql_query(sql, con)

Unnamed: 0,name_count,firstname
0,3,Thomas
1,3,Robert
2,3,Michael
3,3,Lisa
4,2,William


53) Show all first names which have exactly 2 occurrences in `employees`.

In [23]:
sql = '''
SELECT COUNT(firstname) AS "name_count", firstname
FROM employees
GROUP BY firstname
HAVING name_count = 2;

'''
pd.read_sql_query(sql, con)

Unnamed: 0,name_count,firstname
0,2,Christopher
1,2,Donald
2,2,Elizabeth
3,2,Jacob
4,2,Joseph
5,2,Leslie
6,2,Mark
7,2,Shannon
8,2,William
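Filtering on a group-level count is the job of `HAVING`, whereas `WHERE` filters individual rows before they are grouped. The sketch below combines both on a hypothetical table (names, jobs, and the connection name `demo` are invented) to find first names that occur exactly twice among salespeople.

```python
import sqlite3

# Hypothetical employees; not the real ladder.db rows.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees (firstname TEXT, job TEXT)")
demo.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Lisa", "Sales"), ("Lisa", "IT"),
        ("Mark", "Sales"), ("Mark", "Sales"), ("Mark", "IT"),
        ("Ryan", "Sales"),
    ],
)

# WHERE drops non-sales rows first; HAVING then keeps only groups of size 2.
names_twice_in_sales = demo.execute(
    """
    SELECT firstname, COUNT(*) AS name_count
    FROM employees
    WHERE job = 'Sales'
    GROUP BY firstname
    HAVING COUNT(*) = 2
    ORDER BY firstname
    """
).fetchall()

print(names_twice_in_sales)
```

Mark appears three times overall, but only two of those rows survive the `WHERE` clause, so the group passes the `HAVING` test.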


54) Take a look at the `transactions` table to get an idea of what it contains. Note that a transaction may span multiple rows if different items are purchased as part of the same order. The employee who made the order is also given by their ID.

In [24]:
sql = '''
SELECT *
FROM transactions
LIMIT 5;
'''
pd.read_sql_query(sql, con)

Unnamed: 0,order_id,customer,unit_price,quantity,orderdate,employee_id
0,0,Bautista Group,20.5,12,2018-10-27,81
1,0,Bautista Group,24.0,11,2018-10-27,81
2,0,Bautista Group,22.25,14,2018-10-27,81
3,0,Bautista Group,10.5,11,2018-10-27,81
4,0,Bautista Group,13.75,9,2018-10-27,81


55) Show the top 5 largest orders (and their respective customer) in terms of the numbers of items purchased in that order.

In [25]:
sql = '''
SELECT order_id, customer, SUM(quantity) AS "total_items"
FROM transactions
GROUP BY order_id, customer
ORDER BY total_items DESC
LIMIT 5;
'''
pd.read_sql_query(sql, con)


56) Show the total cost of each transaction.
    * _Hint:_ The `unit_price` column is the price of _one_ item. The customer may have purchased multiple.
    * _Hint2:_ Note that transactions here span multiple rows if different items are purchased.

In [26]:
sql = '''
SELECT order_id, customer, SUM(unit_price * quantity) AS "total_cost"
FROM transactions
GROUP BY order_id, customer
LIMIT 5;
'''
pd.read_sql_query(sql, con)


57) Show the top 5 transactions in terms of total cost.

In [27]:
sql = '''
SELECT order_id, customer, SUM(unit_price * quantity) AS "total_cost"
FROM transactions
GROUP BY order_id, customer
ORDER BY total_cost DESC
LIMIT 5;
'''
pd.read_sql_query(sql, con)
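Since an order spans several rows and `unit_price` is the price of one item, an order's size and cost both need a per-row calculation followed by a sum over the order. A minimal sketch on a hypothetical three-row table (customers, prices, and the name `demo` are invented):

```python
import sqlite3

# Hypothetical transactions: order 0 has two line items, order 1 has one.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE transactions "
    "(order_id INTEGER, customer TEXT, unit_price REAL, quantity INTEGER)"
)
demo.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(0, "Acme", 20.5, 2), (0, "Acme", 10.0, 3), (1, "Birch", 5.0, 10)],
)

# Multiply price by quantity on each row, then add the rows up per order.
order_totals = demo.execute(
    """
    SELECT order_id, customer,
           SUM(quantity) AS total_items,
           SUM(unit_price * quantity) AS total_cost
    FROM transactions
    GROUP BY order_id, customer
    ORDER BY total_cost DESC
    """
).fetchall()

print(order_totals)
```

Summing `unit_price` alone would give 30.5 for order 0 here, which understates what the customer actually paid.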


58) Show the top 5 customers in terms of total revenue (i.e., which customers have we done the most business with in terms of money?)

In [28]:
sql = '''
SELECT customer, SUM(quantity * unit_price) AS "total_spent"
FROM transactions 
GROUP BY customer
ORDER BY total_spent DESC
LIMIT 5;
'''
pd.read_sql_query(sql, con)

Unnamed: 0,customer,total_spent
0,Kelly-Wright,89645.25
1,Tucker Ltd,85485.0
2,Sanders PLC,84383.0
3,Ewing-Black,83294.25
4,"Taylor, Patel and Harvey",81818.25


59) Show the top 5 employees in terms of revenue generated (i.e., which employees made the most in sales?)

In [29]:
sql = '''
SELECT employee_id, SUM(quantity * unit_price) AS "revenue_generated"
FROM transactions
GROUP BY employee_id
ORDER BY revenue_generated DESC
LIMIT 5;
'''
pd.read_sql_query(sql, con)


60) Which customer worked with the largest number of employees?
    * _Hint:_ This is a tough one! Check out the `DISTINCT` keyword.

In [30]:
sql = '''
SELECT customer, COUNT(DISTINCT employee_id) AS "num_worked_with"
FROM transactions
GROUP BY customer
ORDER BY num_worked_with DESC
LIMIT 1;
'''
pd.read_sql_query(sql, con)
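The `DISTINCT` keyword matters most when placed inside the aggregate: `COUNT(DISTINCT employee_id)` counts unique employees per customer, while `COUNT(*)` counts transaction rows. The sketch below uses hypothetical data (customers, IDs, and the name `demo` are invented) in which the customer with the most rows is not the one who worked with the most employees.

```python
import sqlite3

# Hypothetical transactions: Acme has more rows, Birch has more distinct employees.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE transactions (customer TEXT, employee_id INTEGER)")
demo.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("Acme", 1), ("Acme", 1), ("Acme", 1), ("Acme", 2),
        ("Birch", 3), ("Birch", 4), ("Birch", 5),
    ],
)

# Compare the row count with the count of unique employee IDs for each customer.
per_customer = demo.execute(
    """
    SELECT customer,
           COUNT(*) AS num_rows,
           COUNT(DISTINCT employee_id) AS num_employees
    FROM transactions
    GROUP BY customer
    ORDER BY num_employees DESC
    """
).fetchall()

print(per_customer)
```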


61) Show all customers who've done more than $80,000 worth of business with us.

In [31]:
sql = '''
SELECT customer, SUM(quantity * unit_price) AS "total_spent"
FROM transactions
GROUP BY customer
HAVING SUM(quantity * unit_price) > 80000
ORDER BY total_spent DESC;
'''
pd.read_sql_query(sql, con)

Unnamed: 0,customer,total_spent
0,Kelly-Wright,89645.25
1,Tucker Ltd,85485.0
2,Sanders PLC,84383.0
3,Ewing-Black,83294.25
4,"Taylor, Patel and Harvey",81818.25
5,"Vega, Rivera and Elliott",81595.0
6,Norman-Briggs,80331.5
7,Thompson-Fowler,80152.25
