# DS-SF-36 | 04 | Databases and Scraping | Assignment | Starter Code

## `SQLite` and Bistro

In this assignment, we will explore the `bistro` dataset.  The previous assignment used `pandas`; today, we'll answer the same questions using `SQLite`.  In some situations `pandas` will be the better tool, and in others `SQL` will make more sense.  As you gain experience, you'll learn which one to reach for.

> ### Question 1.  Import the `sqlite3` package.

In [1]:
import os

import pandas as pd
pd.set_option('display.max_rows', 10)
pd.set_option('display.notebook_repr_html', True)
pd.set_option('display.max_columns', 10)

import sqlite3

> ### Question 2.  Connect to the `dataset-04-bistro.db` database.  The rest of this assignment focuses on the `bistro` table.

In [2]:
db = sqlite3.connect(os.path.join('..', 'datasets', 'dataset-04-bistro.db'))
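When no file exists at the given path (but the directory does), `sqlite3.connect` creates a new, empty database instead of raising an error, so a quick look at the table list is a cheap sanity check. The sketch below uses an in-memory stand-in; its table layout is an assumption, not the real `bistro` schema.

```python
import sqlite3

# In-memory stand-in for the real file; the columns here are made up.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE bistro (day TEXT, name TEXT, tip REAL)')

# sqlite_master is SQLite's built-in catalog of schema objects.
tables = [row[0] for row in demo.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
demo.close()
```

Running the same `SELECT` against `db` should list `bistro`; an empty list would suggest the path points at a freshly created file.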

> ### Question 3.  How many samples (i.e., rows) are in this dataset?

In [15]:
pd.io.sql.read_sql(
'''
SELECT COUNT()
    FROM bistro
;
''', con = db)

Unnamed: 0,COUNT()
0,244


In [14]:
# Answer: 244 rows

> ### Question 4.  Print the first two rows of the table to the console.

In [203]:
pd.io.sql.read_sql(
'''
SELECT *
    FROM bistro
    WHERE name = 'Daniel'
    LIMIT 2
;
''', con = db)

Unnamed: 0,index,day,time,name,gender,is_smoker,party,check,tip
0,64,Saturday,Dinner,Daniel,Male,0,3,17.59,2.64
1,172,Sunday,Dinner,Daniel,Male,0,2,7.25,5.15
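The cell above filters on `name`, so it shows Daniel's two visits rather than the first two rows of the table. A minimal sketch of the unfiltered query follows, run against a made-up in-memory table (the rows and columns are illustrative assumptions). Ordering by `rowid` makes "first" well defined; on the real table, ordering by the `index` column would presumably serve the same purpose.

```python
import sqlite3

# Toy stand-in for the bistro table; these rows are invented.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE bistro (name TEXT, day TEXT, tip REAL)')
demo.executemany('INSERT INTO bistro VALUES (?, ?, ?)', [
    ('Ann', 'Sunday', 2.0),
    ('Bob', 'Friday', 3.5),
    ('Cid', 'Sunday', 1.0),
])

# No WHERE clause: take the first two rows in insertion order.
first_two = demo.execute(
    'SELECT * FROM bistro ORDER BY rowid LIMIT 2').fetchall()
print(first_two)
demo.close()
```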


> ### Question 5.  Which week days does the dataset have data for?

In [18]:
pd.io.sql.read_sql(
'''
SELECT DISTINCT day
    FROM bistro
;
''', con = db)

Unnamed: 0,day
0,Sunday
1,Saturday
2,Thursday
3,Friday


In [19]:
# Sunday, Saturday, Thursday, Friday

> ### Question 6.  How often was the bistro patronized for each week day?

In [31]:
pd.io.sql.read_sql(
'''
SELECT COUNT(time)
    FROM bistro
    WHERE day = 'Sunday'
UNION ALL
SELECT COUNT(time)
    FROM bistro
    WHERE day = 'Saturday'
UNION ALL
SELECT COUNT(time)
    FROM bistro
    WHERE day = 'Thursday'
UNION ALL
SELECT COUNT(time)
    FROM bistro
    WHERE day = 'Friday'
;
''', con = db)

Unnamed: 0,COUNT(time)
0,76
1,87
2,62
3,19


Answer: Sunday = 76; Saturday = 87; Thursday = 62; Friday = 19
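The `UNION ALL` approach needs one `SELECT` per day and drops the day labels from the result. A `GROUP BY` gives one labelled row per day without listing the days by hand. The sketch below runs on a made-up in-memory table (rows and columns are assumptions); the same pattern extends to `SUM(tip)` and `AVG(tip)`.

```python
import sqlite3

# Toy stand-in for the bistro table; these rows are invented.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE bistro (day TEXT, tip REAL)')
demo.executemany('INSERT INTO bistro VALUES (?, ?)', [
    ('Sunday', 2.0), ('Sunday', 3.0), ('Friday', 1.5),
])

# One row per day: number of visits, total tip, and average tip.
per_day = demo.execute('''
SELECT day, COUNT(*) AS visits, SUM(tip) AS total_tip, AVG(tip) AS avg_tip
    FROM bistro
    GROUP BY day
    ORDER BY day
;
''').fetchall()
print(per_day)
demo.close()
```

The same SQL string can be passed to `pd.io.sql.read_sql` with `con = db` to get a `DataFrame` back.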

> ### Question 7.  How much tip did waiters collect for each week day?

In [33]:
pd.io.sql.read_sql(
'''
SELECT SUM(tip)
    FROM bistro
    WHERE day = 'Sunday'
UNION ALL
SELECT SUM(tip)
    FROM bistro
    WHERE day = 'Saturday'
UNION ALL
SELECT SUM(tip)
    FROM bistro
    WHERE day = 'Thursday'
UNION ALL
SELECT SUM(tip)
    FROM bistro
    WHERE day = 'Friday'
;
''', con = db)

Unnamed: 0,SUM(tip)
0,247.39
1,260.4
2,171.83
3,51.96


Answer: Sunday: 247.39; Saturday: 260.40; Thursday: 171.83; Friday: 51.96

> ### Question 8.  What is the average tip per check (in absolute \$) for each week day?

In [47]:
pd.io.sql.read_sql(
'''
SELECT AVG(tip)
    FROM bistro
    WHERE day = 'Sunday'
UNION ALL
SELECT AVG(tip)
    FROM bistro
    WHERE day = 'Saturday'
UNION ALL
SELECT AVG(tip)
    FROM bistro
    WHERE day = 'Thursday'
UNION ALL
SELECT AVG(tip)
    FROM bistro
    WHERE day = 'Friday'
;
''', con = db)

Unnamed: 0,AVG(tip)
0,3.255132
1,2.993103
2,2.771452
3,2.734737


Answer: Sunday: 3.26; Saturday: 2.99; Thursday: 2.77; Friday: 2.73

> ### Question 9.  What is the average tip per check (as a percentage of the check) for each week day?

(`CHECK` is a reserved keyword; put the name between backticks, as in `` `check` ``, to reference the `check` column.)

In [74]:
pd.io.sql.read_sql(
'''
SELECT AVG(tip/`check`*100) AS avg_tip
    FROM bistro
    WHERE day = 'Sunday'
UNION ALL
SELECT AVG(tip/`check`*100)
    FROM bistro
    WHERE day = 'Saturday'
UNION ALL
SELECT AVG(tip/`check`*100)
    FROM bistro
    WHERE day = 'Thursday'
UNION ALL
SELECT AVG(tip/`check`*100)
    FROM bistro
    WHERE day = 'Friday'
;
''', con = db)

Unnamed: 0,avg_tip
0,16.689729
1,15.315172
2,16.127563
3,16.991303


Answer: Sunday: 16.69%; Saturday: 15.32%; Thursday: 16.13%; Friday: 16.99%
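As a sketch of the same calculation in a single pass, the query below averages the per-check tip percentage with `GROUP BY day`, quoting the `check` column with backticks as described above. The table is a made-up in-memory stand-in, and the rows are illustrative assumptions.

```python
import sqlite3

# Toy stand-in; `check` must be quoted because CHECK is a reserved keyword.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE bistro (day TEXT, `check` REAL, tip REAL)')
demo.executemany('INSERT INTO bistro VALUES (?, ?, ?)', [
    ('Sunday', 10.0, 2.0),   # 20% tip
    ('Sunday', 20.0, 2.0),   # 10% tip
    ('Friday', 50.0, 5.0),   # 10% tip
])

# Average of the per-check percentages, one row per day.
pct_per_day = demo.execute('''
SELECT day, AVG(100.0 * tip / `check`) AS avg_tip_pct
    FROM bistro
    GROUP BY day
    ORDER BY day
;
''').fetchall()
print(pct_per_day)
demo.close()
```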

> ### Question 10.  Are there any names in common between male and female patrons?  (E.g., `Chris` can refer to either a man or a woman)

In [129]:
pd.io.sql.read_sql(
'''
WITH Names AS
    (SELECT x.name AS name_x, x.gender AS gender_x, y.name AS name_y, y.gender AS gender_y
        FROM bistro AS x
        JOIN bistro AS y ON x.name = y.name
        )
    
SELECT DISTINCT(name_x) FROM Names
    WHERE gender_x != gender_y
;
''', con = db)

Unnamed: 0,(name_x)
0,Casey


Answer: Casey
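The self-join works, but it pairs every visit with every other visit under the same name. A lighter alternative is to group by name and keep the groups that contain more than one distinct gender. The sketch below uses a made-up in-memory table; the names and rows are illustrative assumptions, not rows from the real dataset.

```python
import sqlite3

# Toy stand-in for the bistro table; these rows are invented.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE bistro (name TEXT, gender TEXT)')
demo.executemany('INSERT INTO bistro VALUES (?, ?)', [
    ('Alex', 'Male'), ('Alex', 'Female'),
    ('Mary', 'Female'), ('Mary', 'Female'),
    ('David', 'Male'),
])

# Names that appear with more than one gender.
shared = demo.execute('''
SELECT name
    FROM bistro
    GROUP BY name
    HAVING COUNT(DISTINCT gender) > 1
    ORDER BY name
;
''').fetchall()
print(shared)
demo.close()
```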

> ### Question 11.  If no patrons share the same name, how many unique patrons are in the dataset?

In [136]:
pd.io.sql.read_sql(
'''
WITH Names AS
    (SELECT x.name AS name_x, x.gender AS gender_x, y.name AS name_y, y.gender AS gender_y
        FROM bistro AS x
        JOIN bistro AS y ON x.name = y.name
        )
    
SELECT DISTINCT(name_x) FROM Names
;
''', con = db)

Unnamed: 0,(name_x)
0,Kimberly
1,Nicholas
2,Larry
3,Joseph
4,Janice
...,...
176,Darwin
177,Henry
178,Jeremy
179,Dorothy


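Answer: 181 (the query returns 180 distinct names, and Casey appears as both a male and a female patron, so counts twice)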
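Listing the distinct names and reading the count off the index works, but SQL can return the count directly. The sketch below shows two counts on a made-up in-memory table (rows are illustrative assumptions): distinct names, and distinct (name, gender) pairs, which treats a name shared across genders as two patrons.

```python
import sqlite3

# Toy stand-in for the bistro table; these rows are invented.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE bistro (name TEXT, gender TEXT)')
demo.executemany('INSERT INTO bistro VALUES (?, ?)', [
    ('Alex', 'Male'), ('Alex', 'Female'),
    ('Mary', 'Female'), ('Mary', 'Female'),
    ('David', 'Male'),
])

# Number of distinct names.
n_names = demo.execute(
    'SELECT COUNT(DISTINCT name) FROM bistro').fetchone()[0]

# Number of distinct (name, gender) pairs.
n_patrons = demo.execute('''
SELECT COUNT(*)
    FROM (SELECT DISTINCT name, gender FROM bistro)
;
''').fetchone()[0]
print(n_names, n_patrons)
demo.close()
```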

> ### Question 12.  How many times did `Kevin` patronize the bistro?  How about `Alice`?

In [138]:
pd.io.sql.read_sql(
'''
SELECT *
    FROM bistro
    WHERE name = 'Kevin'
UNION ALL
SELECT *
    FROM bistro
    WHERE name = 'Alice'
;
''', con = db)

Unnamed: 0,index,day,time,name,gender,is_smoker,party,check,tip
0,96,Friday,Dinner,Kevin,Male,0,2,27.28,4.0
1,187,Sunday,Dinner,Kevin,Male,0,5,30.46,2.0
2,218,Saturday,Dinner,Kevin,Male,0,2,7.74,1.44
3,239,Saturday,Dinner,Kevin,Male,0,3,29.03,5.92
4,16,Sunday,Dinner,Alice,Female,0,3,10.33,1.67
5,214,Saturday,Dinner,Alice,Female,0,3,28.17,6.5


Answer: Kevin: 4 times; Alice: 2 times
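Rather than selecting every matching row and counting by eye, the counts can come straight from the query. The sketch below is one way to do it, using `?` placeholders so the names are passed as parameters; the in-memory table and its rows are made up for illustration.

```python
import sqlite3

# Toy stand-in for the bistro table; these rows are invented.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE bistro (name TEXT, day TEXT)')
demo.executemany('INSERT INTO bistro VALUES (?, ?)', [
    ('Kevin', 'Friday'), ('Kevin', 'Sunday'),
    ('Alice', 'Sunday'), ('Zoe', 'Saturday'),
])

# One row per requested name with its number of visits.
visits = demo.execute('''
SELECT name, COUNT(*) AS visits
    FROM bistro
    WHERE name IN (?, ?)
    GROUP BY name
    ORDER BY name
;
''', ('Kevin', 'Alice')).fetchall()
print(visits)
demo.close()
```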

> ### Question 13.  Who are the top 3 female and male patrons?

In [179]:
pd.io.sql.read_sql(
'''
WITH Males AS
    (SELECT * 
        FROM bistro
        WHERE gender = "Male"),

Females AS 
    (SELECT * 
        FROM bistro
        WHERE gender = "Female")

SELECT COUNT() as count, name
    FROM Females
    GROUP BY name
    ORDER BY count DESC
    LIMIT 3
;
''', con = db)

Unnamed: 0,count,name
0,4,Mary
1,3,Casey
2,3,Laura


Answer: Female: Mary (4), Casey (3), Laura (3); Male: David (8), Casey (5), James (5)
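The cell above runs only the `Females` query; the `Males` CTE is defined but never selected from. One way to cover both genders without duplicating the SQL is to bind the gender as a parameter and run the same query twice. This is a sketch on a made-up in-memory table (rows are illustrative assumptions); ties are broken alphabetically here, which is a choice of this example rather than something the assignment specifies.

```python
import sqlite3

# Toy stand-in for the bistro table; these rows are invented.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE bistro (name TEXT, gender TEXT)')
demo.executemany('INSERT INTO bistro VALUES (?, ?)', [
    ('Mary', 'Female'), ('Mary', 'Female'), ('Mary', 'Female'),
    ('Laura', 'Female'), ('Laura', 'Female'), ('Ann', 'Female'),
    ('David', 'Male'), ('David', 'Male'), ('James', 'Male'),
])

TOP_3 = '''
SELECT name, COUNT(*) AS visits
    FROM bistro
    WHERE gender = ?
    GROUP BY name
    ORDER BY visits DESC, name
    LIMIT 3
;
'''

# Run the same query once per gender.
top = {gender: demo.execute(TOP_3, (gender,)).fetchall()
       for gender in ('Female', 'Male')}
print(top)
demo.close()
```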

> ### Question 14.  Who's the best tipper (as a fraction of all tips over all check totals)?  Who's the worst?  How many times did they patronize the bistro?

In [202]:
pd.io.sql.read_sql(
'''
SELECT COUNT() as count, name, AVG(tip/`check`*100) AS avg_tip
    FROM bistro
    GROUP BY name
    ORDER BY avg_tip DESC
;
''', con = db)

Unnamed: 0,count,name,avg_tip
0,2,Daniel,43.021505
1,1,Maryann,41.666667
2,1,Bailey,32.573290
3,1,Dennis,28.053517
4,1,Zackary,26.631158
...,...,...,...
176,1,Willie,6.653360
177,1,Kimberly,5.944673
178,1,Destiny,5.679667
179,1,Mildred,5.643341


Answer: TODO
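The query above averages each visit's tip percentage, whereas the question asks for all tips over all check totals, i.e. `SUM(tip) / SUM(check)` per patron. The two can rank patrons differently, as the made-up rows below illustrate (the table and values are assumptions for this sketch, not rows from the dataset).

```python
import sqlite3

# Toy stand-in for the bistro table; these rows are invented.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE bistro (name TEXT, `check` REAL, tip REAL)')
demo.executemany('INSERT INTO bistro VALUES (?, ?, ?)', [
    ('Ann', 10.0, 5.0),    # 50% on a small check
    ('Ann', 90.0, 5.0),    # about 5.6% on a large check
    ('Bob', 50.0, 10.0),   # 20%
])

# Ratio of sums: total tips over total checks, per patron.
by_total = demo.execute('''
SELECT name, COUNT(*) AS visits, 100.0 * SUM(tip) / SUM(`check`) AS tip_pct
    FROM bistro
    GROUP BY name
    ORDER BY tip_pct DESC
;
''').fetchall()
print(by_total)
demo.close()
```

With these rows, Bob ranks first by total tips over total checks, while averaging the per-visit percentages would put Ann first. Reversing the sort order (`ORDER BY tip_pct ASC`) would surface the worst tipper.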