
# SQL Wrap Up

To be honest, I am not sure what else to show you. Not that there isn't more to cover, but I have exhausted my knowledge.

What I think we should do today is challenge one another. We are familiar with some of the datasets, so can we challenge one another with questions about the data?



In [1]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


I'll start with an example.

How many bottles in each category cost more than two standard deviations above the mean?

First I'll gather the mean and standard deviation of the bottle price for each category.

In [None]:
%%bigquery --project pic-math

SELECT category_name, AVG(state_bottle_retail) as average, STDDEV(state_bottle_retail) as standarddeviation 
FROM `bigquery-public-data.iowa_liquor_sales.sales`
GROUP BY category_name

Unnamed: 0,category_name,average,standarddeviation
0,DECANTERS & SPECIALTY PACKAGES,22.710670,40.113689
1,APPLE SCHNAPPS,10.087106,2.645043
2,SINGLE BARREL BOURBON WHISKIES,34.218243,238.381471
3,Coffee Liqueurs,16.515128,7.339069
4,Triple Sec,4.073485,0.836835
...,...,...,...
131,IMPORTED VODKA - CHERRY,10.580000,0.000000
132,HIGH PROOF BEER - AMERICAN,122.138125,10.833780
133,Imported Whiskies,91.980000,22.650188
134,Delisted / Special Order Items,27.750000,
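The blank standard deviation in the last row, and the zeros a few rows above it, are what a sample standard deviation produces: BigQuery's `STDDEV` is an alias for `STDDEV_SAMP`, which returns NULL for a group with a single row and 0 when every price in the group is identical. A minimal Python sketch with made-up prices (the category names and values below are invented, not taken from the Iowa data) shows the same pattern:

```python
import statistics
from collections import defaultdict

# Hypothetical (category, price) rows; the values are invented for illustration.
rows = [
    ("Triple Sec", 3.0),
    ("Triple Sec", 4.0),
    ("Triple Sec", 5.0),
    ("Cherry Vodka", 10.58),
    ("Cherry Vodka", 10.58),
    ("Special Order", 27.75),
]

def category_stats(rows):
    """Return {category: (mean, sample standard deviation or None)}."""
    prices = defaultdict(list)
    for category, price in rows:
        prices[category].append(price)
    stats = {}
    for category, values in prices.items():
        mean = statistics.mean(values)
        # A sample standard deviation needs at least two values,
        # so a one-row group gets None (the analogue of SQL NULL).
        sd = statistics.stdev(values) if len(values) > 1 else None
        stats[category] = (mean, sd)
    return stats

stats = category_stats(rows)
for category, (mean, sd) in stats.items():
    print(category, round(mean, 2), sd)
```

Any category with a NULL standard deviation can never satisfy `price > average + 2 * standarddeviation`, so it silently drops out of the later comparison.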


I am going to join the above table with the full table and make the comparison.

In [None]:
%%bigquery --project pic-math

-- Per-category mean and standard deviation of the bottle price
WITH statsTable AS (
SELECT category_name, AVG(state_bottle_retail) AS average, STDDEV(state_bottle_retail) AS standarddeviation
FROM `bigquery-public-data.iowa_liquor_sales.sales`
GROUP BY category_name
)

-- Count the rows priced more than two standard deviations above their category mean
SELECT t.category_name, COUNT(*) AS bottles_over_two_sd
FROM `bigquery-public-data.iowa_liquor_sales.sales` t JOIN statsTable
      ON t.category_name = statsTable.category_name
WHERE t.state_bottle_retail > statsTable.average + 2*statsTable.standarddeviation
GROUP BY t.category_name

Unnamed: 0,category_name,bottles_over_two_sd
0,COFFEE LIQUEURS,2771
1,IMPORTED DRY GINS,10258
2,Triple Sec,5063
3,BLACKBERRY BRANDIES,6615
4,DISTILLED SPIRITS SPECIALTY,2073
...,...,...
109,WHITE CREME DE CACAO,1
110,WHITE CREME DE MENTHE,2
111,GREEN CREME DE MENTHE,1
112,CHERRY BRANDIES,1


I have a result, but I see several issues here. For one, I am counting the same bottles over and over again. I think I need to group by *item_description* before I take an average over the category.
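To make the double counting concrete: each row of the sales table is a sales record rather than a distinct product, so a bottle that sells often contributes many rows. The sketch below uses invented rows and a stand-in threshold (all names and numbers are hypothetical) to contrast a row count, which is what `COUNT(*)` reports, with a distinct-item count, which is what `COUNT(DISTINCT item_description)` would report:

```python
# Hypothetical sales rows: (category, item_description, price).
sales = [
    ("Whiskey Liqueur", "Item A", 30.0),
    ("Whiskey Liqueur", "Item A", 30.0),
    ("Whiskey Liqueur", "Item A", 30.0),
    ("Whiskey Liqueur", "Item B", 12.0),
    ("Whiskey Liqueur", "Item C", 31.0),
]

THRESHOLD = 25.0  # stand-in for mean + 2 * standard deviation

# Keep the (category, item) pair for every row above the threshold.
over = [(category, item) for category, item, price in sales if price > THRESHOLD]

row_count = len(over)        # counts every sale row
item_count = len(set(over))  # counts each item once

print(row_count, item_count)  # 4 2
```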

In [None]:
%%bigquery --project pic-math

SELECT category_name, item_description, AVG(state_bottle_retail) as average, STDDEV(state_bottle_retail) as standarddeviation 
FROM `bigquery-public-data.iowa_liquor_sales.sales`
GROUP BY category_name, item_description

Unnamed: 0,category_name,item_description,average,standarddeviation
0,Triple Sec,Montezuma Triple Sec,3.200000,0.000000
1,IMPORTED DRY GINS,Tanqueray Gin,21.887818,7.089758
2,IMPORTED DRY GINS,Bombay Sapphire Gin,24.606813,6.547968
3,STRAIGHT RYE WHISKIES,Templeton Rye,26.584527,2.441966
4,TROPICAL FRUIT SCHNAPPS,Maui Blue Hawaiian Schnapps,6.214909,0.650606
...,...,...,...,...
13825,Imported Distilled Spirit Specialty,Stoli Crushed Pineapple,13.490000,0.000000
13826,Imported Distilled Spirit Specialty,Tequila Anejo 1 liter,85.230000,0.000000
13827,Imported Distilled Spirit Specialty,The London No. 1 Gin,25.500000,
13828,Imported Distilled Spirit Specialty,Kujira 20 Year Single Grain Ryukyu Whisky,342.990000,


That did not work as I'd hoped (and took a long time to run!). Instead, I am going to gather the `MAX` price of each bottle by *item_description* and use that. It won't give perfect statistics, but it will be better than what I have done.

In [None]:
%%bigquery --project pic-math

SELECT category_name, item_description, MAX(state_bottle_retail) as retail_max
FROM `bigquery-public-data.iowa_liquor_sales.sales`
GROUP BY category_name, item_description

Unnamed: 0,category_name,item_description,retail_max
0,MISCELLANEOUS SCHNAPPS,99 Cinnamon Mini,8.91
1,COFFEE LIQUEURS,Kahlua Coffee Liqueur,37.49
2,Triple Sec,Juarez Triple Sec,4.01
3,CINNAMON SCHNAPPS,Dekuyper Hot Damn! Pet,17.31
4,Coffee Liqueurs,Kahlua Coffee,39.72
...,...,...,...
13825,Imported Distilled Spirit Specialty,Tequila Anejo 375ml pilar,36.77
13826,Imported Distilled Spirit Specialty,Bobbys Schiedam Jenever,217.50
13827,Imported Distilled Spirit Specialty,Amor Mio Blanco,45.00
13828,Imported Distilled Spirit Specialty,Amrita Indian Whiskey 750ml Fenice,108.60
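Collapsing each item to a single price before computing category statistics means a frequently sold bottle no longer dominates the mean. Here is a sketch of that two-step aggregation in Python, using invented rows; the helper names `max_price_per_item` and `stats_per_category` are hypothetical and only mirror the idea of a subquery feeding an outer `GROUP BY`:

```python
import statistics
from collections import defaultdict

# Hypothetical sales rows: (category, item_description, price).
sales = [
    ("Gin", "Item A", 20.0),
    ("Gin", "Item A", 22.0),
    ("Gin", "Item A", 22.0),
    ("Gin", "Item B", 10.0),
    ("Gin", "Item C", 40.0),
]

def max_price_per_item(sales):
    """Step 1: one row per (category, item), keeping the highest price seen."""
    best = {}
    for category, item, price in sales:
        key = (category, item)
        best[key] = max(price, best.get(key, price))
    return best

def stats_per_category(item_prices):
    """Step 2: mean and sample standard deviation over the per-item prices."""
    grouped = defaultdict(list)
    for (category, _item), price in item_prices.items():
        grouped[category].append(price)
    return {
        category: (
            statistics.mean(values),
            statistics.stdev(values) if len(values) > 1 else None,
        )
        for category, values in grouped.items()
    }

item_prices = max_price_per_item(sales)
category_stats = stats_per_category(item_prices)

# Compare with the mean taken directly over every sale row.
row_mean = statistics.mean(price for _c, _i, price in sales)
print(category_stats["Gin"], row_mean)
```

With these made-up rows the per-item mean and the per-row mean differ, because "Item A" appears three times in the raw rows but only once after step 1.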


In [None]:
%%bigquery --project pic-math

-- Per-category statistics computed over one price (the maximum) per item
WITH statsTable AS (
SELECT category_name, AVG(retail_max) AS average, STDDEV(retail_max) AS standarddeviation
FROM (SELECT category_name, item_description, MAX(state_bottle_retail) AS retail_max
      FROM `bigquery-public-data.iowa_liquor_sales.sales`
      GROUP BY category_name, item_description)
GROUP BY category_name
)

-- Count the rows priced more than two standard deviations above their category mean
SELECT t.category_name, COUNT(*) AS bottles_over_two_sd
FROM `bigquery-public-data.iowa_liquor_sales.sales` t JOIN statsTable
      ON t.category_name = statsTable.category_name
WHERE t.state_bottle_retail > statsTable.average + 2*statsTable.standarddeviation
GROUP BY t.category_name
ORDER BY bottles_over_two_sd

Unnamed: 0,category_name,bottles_over_two_sd
0,BARBADOS RUM,1
1,Mezcal,1
2,JAPANESE WHISKY,1
3,CREME DE ALMOND,1
4,FLAVORED RUM,1
...,...,...
94,PEACH SCHNAPPS,4659
95,Flavored Rum,4939
96,American Flavored Vodka,5746
97,Whiskey Liqueur,7330


Can you improve the way I have done this?  Does it make sense that there are that many bottles of Whiskey?
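One possible improvement, offered as a suggestion rather than the intended answer: the outputs above list both `FLAVORED RUM` and `Flavored Rum` (and `COFFEE LIQUEURS` alongside `Coffee Liqueurs`), so categories that differ only in letter case are being grouped separately. In SQL one could try grouping on `UPPER(category_name)`. The same idea in Python, using three counts copied from the final result (the helper `merge_by_upper` is a hypothetical name):

```python
from collections import Counter

# Counts copied from the final result above; two spellings differ only in case.
counts = {"FLAVORED RUM": 1, "Flavored Rum": 4939, "Mezcal": 1}

def merge_by_upper(counts):
    """Combine entries whose names match after trimming and upper-casing."""
    merged = Counter()
    for name, n in counts.items():
        merged[name.strip().upper()] += n
    return dict(merged)

merged = merge_by_upper(counts)
print(merged)
```

This only illustrates the key normalization. In a real fix the mean and standard deviation would also need to be recomputed on the normalized category, so the merged count here should not be read as a corrected result.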

## Your Turn

Challenge your mates with a difficult question.  Try to stump them and don't forget to try it yourself!

### What zip code consumes the most liquor per capita in Iowa?


In [3]:
%%bigquery --project white-device-278509

SELECT 
  zip_code
  ,ROUND(SUM(volume_sold_gallons),2) AS liquor_consumed_gallons
FROM `bigquery-public-data.iowa_liquor_sales.sales` 
GROUP BY 1
ORDER BY 2 DESC limit 1

Unnamed: 0,zip_code,liquor_consumed_gallons
0,50320,1124398.88
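The query above ranks zip codes by total gallons sold, which is not yet a per-capita figure. That needs a population for each zip code, which would have to come from another source (this assumes the sales table has no population column, which was not checked here). A sketch of the division step in Python, with entirely made-up volumes and populations:

```python
# Invented figures for illustration; real populations would have to come
# from a separate source such as census data.
gallons_by_zip = {"Zip A": 900.0, "Zip B": 300.0, "Zip C": 500.0}
population_by_zip = {"Zip A": 9000, "Zip B": 1000, "Zip C": 0}

def gallons_per_capita(gallons, population):
    """Divide gallons by population, skipping zips with missing or zero population."""
    result = {}
    for zip_code, total in gallons.items():
        people = population.get(zip_code)
        if not people:
            continue
        result[zip_code] = total / people
    return result

per_capita = gallons_per_capita(gallons_by_zip, population_by_zip)
top_zip = max(per_capita, key=per_capita.get)
print(top_zip, per_capita[top_zip])
```

In this toy data the zip code with the largest total ("Zip A") is not the one with the largest per-capita value ("Zip B"), which is why the two questions can have different answers.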
