NOTE:
-----

Please run the cells below before proceeding. *You'll* **need** them soon!

In [1]:
%load_ext sql
%sql sqlite:///dataset_1.db

u'Connected: None@dataset_1.db'

Activity 2-6
------------
Quantifiers, NULLs, and Outer Joins

Exercise #1
-----------

Recall the tables we just looked at:

`bagel`, which describes types of bagels made by the different bagel companies:
> * name STRING
> * price FLOAT
> * made_by STRING

And `purchase`:
> * bagel_name STRING
> * franchise STRING
> * date INT
> * quantity INT
> * purchaser_age INT

Where `purchase.bagel_name` references `bagel.name` and `purchase.franchise` references `bagel.made_by`.

**Can you find out if there were any purchases of products not on one of the company's official lists (i.e. the `bagel` table), using a single SQL query?**

Write your query here:

In [None]:
%%sql
SELECT b.name AS "Official List Name", p.bagel_name AS "Purchase Item", p.franchise AS "Bought From", p.date, p.quantity
FROM purchase p
LEFT OUTER JOIN bagel b
     ON b.name = p.bagel_name AND b.made_by = p.franchise
WHERE b.name IS NULL;

**^Oh my!^**
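
The pattern used above (a `LEFT OUTER JOIN` followed by `WHERE b.name IS NULL`) keeps only the purchase rows that found no partner in `bagel`. A minimal sketch of the same idea with Python's built-in `sqlite3` module follows. The in-memory database and its rows (including the bagel called `Mystery`) are made up for illustration and are not the contents of `dataset_1.db`.

```python
import sqlite3

# Hypothetical sample rows, NOT the contents of dataset_1.db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bagel (name TEXT, price FLOAT, made_by TEXT);
    CREATE TABLE purchase (bagel_name TEXT, franchise TEXT, date INT,
                           quantity INT, purchaser_age INT);
    INSERT INTO bagel VALUES ('Plain', 1.00, 'Bobs Bagels');
    INSERT INTO purchase VALUES ('Plain', 'Bobs Bagels', 1, 2, 30);
    INSERT INTO purchase VALUES ('Mystery', 'Bobs Bagels', 2, 1, 41);
""")

# Unmatched purchase rows get NULLs for every bagel column,
# so filtering on b.name IS NULL keeps exactly those rows.
unlisted = conn.execute("""
    SELECT p.bagel_name, p.franchise
    FROM purchase p
    LEFT OUTER JOIN bagel b
         ON b.name = p.bagel_name AND b.made_by = p.franchise
    WHERE b.name IS NULL
""").fetchall()

# The same question phrased with NOT EXISTS, for comparison.
unlisted_not_exists = conn.execute("""
    SELECT p.bagel_name, p.franchise
    FROM purchase p
    WHERE NOT EXISTS (SELECT 1 FROM bagel b
                      WHERE b.name = p.bagel_name
                        AND b.made_by = p.franchise)
""").fetchall()

print(unlisted)             # [('Mystery', 'Bobs Bagels')]
print(unlisted_not_exists)  # [('Mystery', 'Bobs Bagels')]
```

Both forms return the same rows here; the outer-join form also lets you display columns from both tables side by side, as the exercise query does.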

Practice Review
------------
Putting it all together

In addition to the `purchase` and `bagel` tables defined above, we will also make use of the `store` and `franchise` tables. Instead of me giving you the schemas for those tables, why don't you look them up yourself? Remember that the `sqlite_master` table has the following schema:

`sqlite_master`:
> * type STRING
> * name STRING
> * tbl_name STRING
> * rootpage INT
> * sql STRING

Keep in mind that `purchase.franchise` and `store.franchise` both reference `franchise.name`.

**Write a query to find the SQL schemas of the `store` and `franchise` tables below. Keep in mind that `sqlite_master` also lists other database objects (e.g., views and indexes), so filter on `type`. You may also want to display the table name, to easily tell the SQL statements apart.**

In [2]:
%sql SELECT name,sql FROM sqlite_master WHERE type='table' AND name IN ('store', 'franchise', 'states');

Done.


name,sql
states,"CREATE TABLE states(code INT, name VARCHAR(30), abbrev VARCHAR(2))"
franchise,"CREATE TABLE franchise (name TEXT, db_type TEXT)"
store,"CREATE TABLE store (franchise TEXT, location TEXT)"


Did you use `OR` in the previous query? Try it again with `IN`. If you used `IN`, try it again with `OR`.

In [3]:
%sql SELECT name,sql FROM sqlite_master WHERE type='table' AND (name = 'store' OR name = 'franchise' OR name = 'states');

Done.


name,sql
states,"CREATE TABLE states(code INT, name VARCHAR(30), abbrev VARCHAR(2))"
franchise,"CREATE TABLE franchise (name TEXT, db_type TEXT)"
store,"CREATE TABLE store (franchise TEXT, location TEXT)"
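
To see why the `type = 'table'` filter matters, here is a small sketch using Python's built-in `sqlite3` module on a throwaway in-memory database. The index name `store_by_location` is hypothetical and only exists to show that `sqlite_master` lists more than tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (franchise TEXT, location TEXT);
    CREATE TABLE franchise (name TEXT, db_type TEXT);
    -- Hypothetical index, added only for illustration.
    CREATE INDEX store_by_location ON store (location);
""")

# Without a type filter, the index attached to `store` shows up too.
all_objects = conn.execute("""
    SELECT type, name, tbl_name
    FROM sqlite_master
    WHERE tbl_name IN ('store', 'franchise')
    ORDER BY type, name
""").fetchall()

# With the filter, only the two tables remain.
tables_only = conn.execute("""
    SELECT name, sql
    FROM sqlite_master
    WHERE type = 'table' AND name IN ('store', 'franchise')
    ORDER BY name
""").fetchall()

for row in all_objects:
    print(row)
for name, sql in tables_only:
    print(name, "->", sql)
```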


Let's now create another table that will map franchise locations to states via their 2-letter abbreviations (e.g., CA for California). The schema should be:

`location`:
> * name STRING
> * state STRING

Limit `state` to two characters and set `name` as the primary key. After you create the table, insert the following tuples:

> * (NYC, NY)
> * (PA, PA)
> * (Chicago, IL)

**Write your queries below. Then, query all rows from the `location` table.**

In [4]:
%sql DROP table IF EXISTS `location`;


Done.


[]

In [5]:
"""
Expected output below

Don't re-execute this cell!
"""
%sql CREATE table `location` (`name` VARCHAR(100) PRIMARY KEY, `state` VARCHAR(2));
%sql INSERT INTO `location` VALUES ('NYC', 'NY');
%sql INSERT INTO `location` VALUES ('PA', 'PA');
%sql INSERT INTO `location` VALUES ('Chicago', 'IL');
%sql SELECT * FROM `location`;

Done.
1 rows affected.
1 rows affected.
1 rows affected.
Done.


name,state
NYC,NY
PA,PA
Chicago,IL
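
One caveat worth knowing: SQLite enforces the `PRIMARY KEY` on `name`, but it does not enforce the length in `VARCHAR(2)`. The sketch below, using Python's built-in `sqlite3` module on an in-memory database, shows both behaviors. The `location_strict` table and the `Boston` row are hypothetical additions for illustration; a `CHECK` constraint is one possible way to really limit `state` to two characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location (name VARCHAR(100) PRIMARY KEY, state VARCHAR(2))"
)
conn.executemany(
    "INSERT INTO location VALUES (?, ?)",
    [("NYC", "NY"), ("PA", "PA"), ("Chicago", "IL")],
)

# A second row with an existing key is rejected.
try:
    conn.execute("INSERT INTO location VALUES ('NYC', 'NJ')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# VARCHAR(2) is not enforced by SQLite: this longer value is accepted.
conn.execute("INSERT INTO location VALUES ('Boston', 'Massachusetts')")
stored = conn.execute(
    "SELECT state FROM location WHERE name = 'Boston'"
).fetchone()[0]

# Hypothetical stricter table: a CHECK constraint enforces the length.
conn.execute("""
    CREATE TABLE location_strict (
        name  VARCHAR(100) PRIMARY KEY,
        state VARCHAR(2) CHECK (length(state) = 2)
    )
""")
try:
    conn.execute("INSERT INTO location_strict VALUES ('Boston', 'Massachusetts')")
    long_state_rejected = False
except sqlite3.IntegrityError:
    long_state_rejected = True

print(duplicate_rejected, stored, long_state_rejected)
```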


Find all the states that have `a` as the second letter in their names. Print out the name and abbreviation for those states.

In [6]:
"""
Expected output below

Don't re-execute this cell!
"""
%sql select name,abbrev from states where name LIKE '_a%'

Done.


name,abbrev
California,CA
Kansas,KS
Maine,ME
Maryland,MD
Massachusetts,MA
Washington,WA
Hawaii,HI
Pacific Islands,PI
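
In a `LIKE` pattern, `_` matches exactly one character and `%` matches any run of characters (including none), so `'_a%'` reads as "any first character, then `a`, then anything". SQLite's `LIKE` is also case-insensitive for ASCII letters by default. A minimal sketch with Python's built-in `sqlite3` module, using a handful of made-up rows (the all-caps `KANSAS` is deliberate, to show the case-insensitivity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT)")
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO states VALUES (?)",
    [("California",), ("Alabama",), ("Idaho",), ("Maine",), ("KANSAS",)],
)

second_letter_a = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM states WHERE name LIKE '_a%' ORDER BY name"
    )
]
print(second_letter_a)  # ['California', 'KANSAS', 'Maine']
```

`Alabama` is excluded because its `a` is in the first position, not the second.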


Count the number of states and territories in the United States (not including the capital, Washington, D.C.). *Hint: you may want to check the quality of the data.*

Adjust your query to get the right number of states + territories.

In [7]:
"""
Expected output below

Don't re-execute this cell!
"""
%sql select COUNT(*) as 'count' from states where length(abbrev) = 2 AND abbrev <> 'DC';

Done.


count
53
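
One way to act on the data-quality hint is to list the suspicious rows before counting. The sketch below uses Python's built-in `sqlite3` module with invented rows (`Nowhere` and `Typo` are hypothetical bad records). It also shows that `length(abbrev) = 2` silently drops rows whose `abbrev` is `NULL`, because `length(NULL)` is `NULL` and a `NULL` comparison is never true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (code INT, name VARCHAR(30), abbrev VARCHAR(2))")
# Hypothetical rows; the last two simulate dirty data.
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        (1, "California", "CA"),
        (2, "District of Columbia", "DC"),
        (3, "Guam", "GU"),
        (4, "Nowhere", None),
        (5, "Typo", "X"),
    ],
)

# Step 1: inspect rows whose abbreviation looks wrong.
suspect = conn.execute("""
    SELECT name, abbrev
    FROM states
    WHERE abbrev IS NULL OR length(abbrev) <> 2
    ORDER BY code
""").fetchall()

# Step 2: count only well-formed rows, excluding the capital.
clean_count = conn.execute("""
    SELECT COUNT(*)
    FROM states
    WHERE length(abbrev) = 2 AND abbrev <> 'DC'
""").fetchone()[0]

print(suspect)      # [('Nowhere', None), ('Typo', 'X')]
print(clean_count)  # 2
```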


Write an `INTERSECT` query to find the set of states that have stores in them. Output the full state names.

In [8]:
"""
Expected output below

Don't re-execute this cell!
"""
%sql select distinct states.name from states,location,store WHERE location.state = states.abbrev AND store.location = location.name;

Done.


name
New York
Pennsylvania
Illinois
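
The expected-output cell above gets the answer with a join and `DISTINCT`. One possible `INTERSECT` formulation is sketched below with Python's built-in `sqlite3` module: intersect the set of all state abbreviations with the set of abbreviations of locations that have a store, then look up the full names. The rows are invented for illustration, so the result differs from the notebook's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (code INT, name VARCHAR(30), abbrev VARCHAR(2));
    CREATE TABLE location (name VARCHAR(100) PRIMARY KEY, state VARCHAR(2));
    CREATE TABLE store (franchise TEXT, location TEXT);
    -- Hypothetical rows for illustration only.
    INSERT INTO states VALUES (36, 'New York', 'NY'), (42, 'Pennsylvania', 'PA'),
                              (17, 'Illinois', 'IL'), (6, 'California', 'CA');
    INSERT INTO location VALUES ('NYC', 'NY'), ('PA', 'PA'), ('Chicago', 'IL');
    INSERT INTO store VALUES ('Bobs Bagels', 'NYC'), ('eBagel', 'PA');
""")

with_intersect = conn.execute("""
    SELECT name
    FROM states
    WHERE abbrev IN (
        SELECT abbrev FROM states
        INTERSECT
        SELECT l.state
        FROM location l
        JOIN store s ON s.location = l.name
    )
    ORDER BY name
""").fetchall()

# The join + DISTINCT form from the cell above, for comparison.
with_join = conn.execute("""
    SELECT DISTINCT states.name
    FROM states, location, store
    WHERE location.state = states.abbrev
      AND store.location = location.name
    ORDER BY states.name
""").fetchall()

print(with_intersect)  # [('New York',), ('Pennsylvania',)]
print(with_join)       # [('New York',), ('Pennsylvania',)]
```

`INTERSECT` removes duplicates by itself, so no `DISTINCT` is needed in that version.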


Write a query that will output the `name` of the state and the number of stores in that state (`num_stores`) for all states that have at least two stores.

In [9]:
"""
Expected output below

Don't re-execute this cell!
"""
%sql select name, count(name) as `num_stores` from ( select states.name from states,location,store WHERE location.state = states.abbrev AND store.location = location.name ) group by name having count(name) >= 2;

Done.


name,num_stores
New York,2
Pennsylvania,2


What are the names of the franchises that have fewer than 4 stores? Use JOIN...ON in the query.

In [10]:
"""
Expected output below

Don't re-execute this cell!
"""
%sql select f.name from franchise f join store s on f.name = s.franchise group by f.name having count(s.location) < 4;

Done.


name
BAGEL CORP
Bobs Bagels
eBagel
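
Note that an inner `JOIN` can only report franchises that have at least one row in `store`; a franchise with zero stores also has fewer than 4, but it disappears from the join. If such franchises should be included, a `LEFT JOIN` is one option. The sketch below uses Python's built-in `sqlite3` module with invented rows (`NoStores Co` and `Big Chain` are hypothetical franchises).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE franchise (name TEXT, db_type TEXT);
    CREATE TABLE store (franchise TEXT, location TEXT);
    -- Hypothetical rows for illustration only.
    INSERT INTO franchise VALUES ('NoStores Co', 'x'), ('Bobs Bagels', 'x'),
                                 ('Big Chain', 'x');
    INSERT INTO store VALUES ('Bobs Bagels', 'NYC'), ('Bobs Bagels', 'PA'),
                             ('Big Chain', 'NYC'), ('Big Chain', 'PA'),
                             ('Big Chain', 'Chicago'), ('Big Chain', 'Boston');
""")

query = """
    SELECT f.name
    FROM franchise f
    {join} store s ON f.name = s.franchise
    GROUP BY f.name
    HAVING COUNT(s.location) < 4
    ORDER BY f.name
"""

inner = [n for (n,) in conn.execute(query.format(join="JOIN"))]
# COUNT(s.location) ignores the NULL produced for unmatched franchises,
# so a store-less franchise is counted as 0 rather than 1.
outer = [n for (n,) in conn.execute(query.format(join="LEFT JOIN"))]

print(inner)  # ['Bobs Bagels']
print(outer)  # ['Bobs Bagels', 'NoStores Co']
```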


What is the total revenue (`quantity` * `price`) derived from selling bagels to young adults (ages 18-35 years) in the states of `New York` or `Pennsylvania`?

In [11]:
"""
Expected output below

Don't re-execute this cell!
"""
%sql select SUM(revenue) as revenue from (select p.quantity * b.price as revenue from purchase p, bagel b, store s, location l, states t where p.franchise = s.franchise and p.bagel_name = b.name and p.franchise = b.made_by and s.location = l.name and l.state = t.abbrev and p.purchaser_age >= 18 and p.purchaser_age <= 35 and t.name IN('New York', 'Pennsylvania') ) ;

Done.


revenue
47.64
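
The query above can also be written with explicit `JOIN ... ON` clauses and `BETWEEN`, which keeps each join condition next to the table it introduces. The sketch below uses Python's built-in `sqlite3` module with invented rows, so its total is unrelated to the notebook's result. It also illustrates a property of this join: `purchase` has no store location, so each purchase is matched to every store of its franchise in the selected states. In the sample data a single qualifying purchase worth 3.0 is counted once per store. Whether that is the intended reading depends on how the data should be interpreted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bagel (name TEXT, price FLOAT, made_by TEXT);
    CREATE TABLE purchase (bagel_name TEXT, franchise TEXT, date INT,
                           quantity INT, purchaser_age INT);
    CREATE TABLE store (franchise TEXT, location TEXT);
    CREATE TABLE location (name VARCHAR(100) PRIMARY KEY, state VARCHAR(2));
    CREATE TABLE states (code INT, name VARCHAR(30), abbrev VARCHAR(2));
    -- Hypothetical rows for illustration only.
    INSERT INTO bagel VALUES ('Plain', 1.5, 'Bobs Bagels');
    INSERT INTO purchase VALUES ('Plain', 'Bobs Bagels', 1, 2, 25),
                                ('Plain', 'Bobs Bagels', 2, 4, 50);
    INSERT INTO store VALUES ('Bobs Bagels', 'NYC'), ('Bobs Bagels', 'PA');
    INSERT INTO location VALUES ('NYC', 'NY'), ('PA', 'PA');
    INSERT INTO states VALUES (36, 'New York', 'NY'), (42, 'Pennsylvania', 'PA');
""")

explicit_join = conn.execute("""
    SELECT SUM(p.quantity * b.price) AS revenue
    FROM purchase p
    JOIN bagel b    ON p.bagel_name = b.name AND p.franchise = b.made_by
    JOIN store s    ON p.franchise = s.franchise
    JOIN location l ON s.location = l.name
    JOIN states t   ON l.state = t.abbrev
    WHERE p.purchaser_age BETWEEN 18 AND 35
      AND t.name IN ('New York', 'Pennsylvania')
""").fetchone()[0]

# Revenue of the qualifying purchases alone, without joining to stores.
per_purchase = conn.execute("""
    SELECT SUM(p.quantity * b.price)
    FROM purchase p
    JOIN bagel b ON p.bagel_name = b.name AND p.franchise = b.made_by
    WHERE p.purchaser_age BETWEEN 18 AND 35
""").fetchone()[0]

print(explicit_join)  # 6.0 (the purchase is matched to two stores)
print(per_purchase)   # 3.0
```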
