# Zach Gulde's sqlite3/pandas Exercises

## Purpose

This notebook works through the sqlite3/pandas tutorial exercises prepared by Zach Gulde.

## Environment Setup

In [1]:
import pandas as pd
import sqlite3


In [2]:
# show_df prints df.info() and then displays the first `heads` rows
def show_df(df, heads=5):
    if heads > len(df):
        heads = len(df)
    print(df.info())
    display(df.head(heads))

## Getting the Data Into Pandas

The `sqlite3` module is part of the Python standard library, so you shouldn't
need to install anything.

```python
import pandas as pd
import sqlite3

connection = sqlite3.connect('pizza.sqlite') # or specify the path to the db file

pizzas = pd.read_sql('SELECT * FROM pizzas', connection)
```
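To try the same pattern without `pizza.sqlite` on disk, here is a minimal sketch that builds a throwaway in-memory database. The two rows mirror the `crust_types` output shown in the Exploration section; the in-memory connection and the `demo_` names are assumptions for illustration only.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory database stands in for pizza.sqlite.
demo_connection = sqlite3.connect(":memory:")
demo_connection.execute(
    "CREATE TABLE crust_types ("
    "crust_type_id INTEGER PRIMARY KEY, crust_type_name TEXT)"
)
demo_connection.executemany(
    "INSERT INTO crust_types VALUES (?, ?)",
    [(1, "hand-tossed"), (2, "thin and crispy")],
)
demo_connection.commit()

# Whole table, same call shape as in the tutorial snippet.
demo_crusts = pd.read_sql("SELECT * FROM crust_types", demo_connection)

# Filtered query; the value is passed through `params` instead of
# being formatted into the SQL string.
demo_thin = pd.read_sql(
    "SELECT * FROM crust_types WHERE crust_type_id = ?",
    demo_connection,
    params=(2,),
)

print(demo_crusts)
print(demo_thin)
demo_connection.close()
```

Passing values through `params` keeps the query text fixed, which matters once the filter value comes from somewhere other than the notebook itself.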

In [3]:
connection = sqlite3.connect('pizza.sqlite') # or specify the path to the db file

## Exploration

Get the structure of the database by querying the `sqlite_master` table.

In [4]:
pizza_master = pd.read_sql('SELECT * FROM sqlite_master', connection)
display(pizza_master)

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,crust_types,crust_types,2,CREATE TABLE crust_types(\n crust_type_id I...
1,table,sizes,sizes,3,CREATE TABLE sizes(\n size_id INTEGER PRIMA...
2,table,toppings,toppings,4,CREATE TABLE toppings(\n topping_id INTEGER...
3,table,modifiers,modifiers,5,CREATE TABLE modifiers(\n modifier_id INTEG...
4,table,pizzas,pizzas,6,CREATE TABLE pizzas(\n pizza_id INTEGER PRI...
5,table,pizza_topppings,pizza_topppings,7,CREATE TABLE pizza_topppings(\n pizza_id IN...
6,index,sqlite_autoindex_pizza_topppings_1,pizza_topppings,8,
7,table,pizza_modifiers,pizza_modifiers,9,CREATE TABLE pizza_modifiers(\n pizza_id IN...
8,index,sqlite_autoindex_pizza_modifiers_1,pizza_modifiers,10,
9,table,pizza_toppings,pizza_toppings,77,"CREATE TABLE ""pizza_toppings"" (\n""pizza_id"" IN..."


Get the list of tables, then use loops to print the code that creates and displays a DataFrame for each table.

In [5]:
pizza_tables = pizza_master[pizza_master.type == 'table'].name.to_list()

In [6]:
# loop through to get create commands
for t in pizza_tables:
    print(t, '= pd.read_sql(\'SELECT * FROM', t, '\', connection)')

crust_types = pd.read_sql('SELECT * FROM crust_types ', connection)
sizes = pd.read_sql('SELECT * FROM sizes ', connection)
toppings = pd.read_sql('SELECT * FROM toppings ', connection)
modifiers = pd.read_sql('SELECT * FROM modifiers ', connection)
pizzas = pd.read_sql('SELECT * FROM pizzas ', connection)
pizza_topppings = pd.read_sql('SELECT * FROM pizza_topppings ', connection)
pizza_modifiers = pd.read_sql('SELECT * FROM pizza_modifiers ', connection)
pizza_toppings = pd.read_sql('SELECT * FROM pizza_toppings ', connection)


In [7]:
crust_types = pd.read_sql('SELECT * FROM crust_types ', connection)
sizes = pd.read_sql('SELECT * FROM sizes ', connection)
toppings = pd.read_sql('SELECT * FROM toppings ', connection)
modifiers = pd.read_sql('SELECT * FROM modifiers ', connection)
pizzas = pd.read_sql('SELECT * FROM pizzas ', connection)
pizza_toppings = pd.read_sql('SELECT * FROM pizza_toppings ', connection)
pizza_modifiers = pd.read_sql('SELECT * FROM pizza_modifiers ', connection)
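An alternative to printing assignment statements and pasting them back in is to keep the DataFrames in a dictionary keyed by table name. The notebook does not do this; the sketch below assumes an in-memory stand-in database with two small tables, and `demo_conn` and `tables` are hypothetical names.

```python
import sqlite3

import pandas as pd

# Assumption: a tiny in-memory database stands in for pizza.sqlite.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript(
    """
    CREATE TABLE sizes (size_id INTEGER PRIMARY KEY, size_name TEXT, size_price REAL);
    INSERT INTO sizes VALUES (1, 'small', 8.99);
    INSERT INTO sizes VALUES (2, 'medium', 10.99);
    CREATE TABLE crust_types (crust_type_id INTEGER PRIMARY KEY, crust_type_name TEXT);
    INSERT INTO crust_types VALUES (1, 'hand-tossed');
    INSERT INTO crust_types VALUES (2, 'thin and crispy');
    """
)

# Table names come from the database's own catalog, not from user input.
table_names = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type = 'table'", demo_conn
)["name"].to_list()

# One DataFrame per table, keyed by table name.
tables = {
    name: pd.read_sql(f'SELECT * FROM "{name}"', demo_conn)
    for name in table_names
}

for name, df in tables.items():
    print(name, df.shape)
demo_conn.close()
```

With this layout a misspelled table such as `pizza_topppings` simply becomes another dictionary key, and it can be dropped with `tables.pop(...)` instead of by editing generated code.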

In [8]:
# loop through to get show_df commands
for t in pizza_tables:
    print('show_df('+t+')')

show_df(crust_types)
show_df(sizes)
show_df(toppings)
show_df(modifiers)
show_df(pizzas)
show_df(pizza_topppings)
show_df(pizza_modifiers)
show_df(pizza_toppings)


### crust_types

In [9]:
show_df(crust_types)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   crust_type_id    2 non-null      int64 
 1   crust_type_name  2 non-null      object
dtypes: int64(1), object(1)
memory usage: 160.0+ bytes
None


Unnamed: 0,crust_type_id,crust_type_name
0,1,hand-tossed
1,2,thin and crispy


***strictly a lookup table***

### sizes

In [10]:
show_df(sizes)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   size_id     4 non-null      int64  
 1   size_name   4 non-null      object 
 2   size_price  4 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 224.0+ bytes
None


Unnamed: 0,size_id,size_name,size_price
0,1,small,8.99
1,2,medium,10.99
2,3,large,12.99
3,4,x-large,14.99


***strictly a lookup table***

### toppings

In [11]:
show_df(toppings)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   topping_id     9 non-null      int64  
 1   topping_name   9 non-null      object 
 2   topping_price  9 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 344.0+ bytes
None


Unnamed: 0,topping_id,topping_name,topping_price
0,1,pepperoni,0.99
1,2,sausage,0.99
2,3,bacon,0.99
3,4,canadian bacon,0.99
4,5,onion,0.49


***strictly a lookup table***

### modifiers

In [13]:
show_df(modifiers)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   modifier_id     3 non-null      int64  
 1   modifier_name   3 non-null      object 
 2   modifier_price  3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes
None


Unnamed: 0,modifier_id,modifier_name,modifier_price
0,1,extra cheese,1.99
1,2,well done,0.0
2,3,no cheese,0.0


***strictly a lookup table***

### pizzas

In [15]:
show_df(pizzas)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19934 entries, 0 to 19933
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   pizza_id       19934 non-null  int64
 1   order_id       19934 non-null  int64
 2   crust_type_id  19934 non-null  int64
 3   size_id        19934 non-null  int64
dtypes: int64(4)
memory usage: 623.1 KB
None


Unnamed: 0,pizza_id,order_id,crust_type_id,size_id
0,1001,1001,2,1
1,1002,1001,2,2
2,1003,1002,1,4
3,1004,1002,1,2
4,1005,1002,2,4


In [16]:
# Key Statistics

# count of unique orders
orders_count = pizzas.order_id.nunique()
order_min = pizzas.order_id.min()
order_max = pizzas.order_id.max()
print(f'{orders_count} orders in the range {order_min} - {order_max}')

# count of unique pizzas
pizzas_count = pizzas.pizza_id.nunique()
pizza_min = pizzas.pizza_id.min()
pizza_max = pizzas.pizza_id.max()
print(f'{pizzas_count} pizzas in the range {pizza_min} - {pizza_max}')

10000 orders in the range 1001 - 11000
19934 pizzas in the range 1001 - 20934


In [17]:
# most pizzas on one order
order_pizzas = pizzas.order_id.value_counts().max()

In [18]:
# check that each pizza_id appears on only one order (every count should be 1)
pizzas.pizza_id.value_counts().head()

2047     1
4791     1
17069    1
10928    1
8881     1
Name: pizza_id, dtype: int64

### pizza_toppings

In [19]:
show_df(pizza_toppings)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47062 entries, 0 to 47061
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   pizza_id        47062 non-null  int64 
 1   topping_id      47062 non-null  int64 
 2   topping_amount  47062 non-null  object
dtypes: int64(2), object(1)
memory usage: 1.1+ MB
None


Unnamed: 0,pizza_id,topping_id,topping_amount
0,1001,5,extra
1,1001,3,regular
2,1002,8,regular
3,1002,9,extra
4,1002,8,regular


In [20]:
# Key Statistics
pizzas_topped_count = pizza_toppings.pizza_id.nunique()
pizza_topped_min = pizza_toppings.pizza_id.min()
pizza_topped_max = pizza_toppings.pizza_id.max()
pizza_toppings_count = pizza_toppings.pizza_id.value_counts().value_counts()
pizza_toppings_max = pizza_toppings.pizza_id.value_counts().max()
pizza_topping_freq = pizza_toppings.topping_id.value_counts()
pizza_topping_amount_freq = pizza_toppings.topping_amount.value_counts()

print(f'{pizzas_topped_count} pizzas topped')
print(f'Is pizza_id unique? {pizzas_topped_count == len(pizza_toppings)}')
print(f'Toppings available: {len(pizza_topping_freq)}, Max toppings on one pizza: {pizza_toppings_max}')
# unused = toppings in the lookup table that never appear in pizza_toppings
print(f'Toppings unused: {len(toppings) - len(pizza_topping_freq)}')


17225 pizzas topped
Is pizza_id unique? False
Toppings available: 9, Max toppings on one pizza: 12
Toppings unused: 0


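Next, get an idea of scope. One would expect each topping to appear at most once per pizza, but a maximum of 12 topping rows with only 9 toppings available means some are repeated. Pizza 20929 is one example.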

In [23]:
pizza_toppings[pizza_toppings.pizza_id == 20929]

Unnamed: 0,pizza_id,topping_id,topping_amount
47045,20929,8,regular
47046,20929,9,extra
47047,20929,9,regular
47048,20929,7,regular
47049,20929,7,regular
47050,20929,9,extra
47051,20929,2,extra
47052,20929,7,regular
47053,20929,7,regular
47054,20929,7,regular


Pizza 20929 has five rows for topping 7 (all regular) and three rows for topping 9 (one regular and two extra).
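The same tally can be produced with a `groupby` instead of counting by eye. The sketch below re-enters the ten rows displayed for pizza 20929 by hand; `pizza_20929` and `pair_counts` are names introduced only for this example.

```python
import pandas as pd

# The ten rows displayed above for pizza 20929, typed in by hand.
pizza_20929 = pd.DataFrame(
    {
        "pizza_id": [20929] * 10,
        "topping_id": [8, 9, 9, 7, 7, 9, 2, 7, 7, 7],
        "topping_amount": [
            "regular", "extra", "regular", "regular", "regular",
            "extra", "extra", "regular", "regular", "regular",
        ],
    }
)

# Number of rows for each (pizza, topping, amount) combination.
pair_counts = (
    pizza_20929.groupby(["pizza_id", "topping_id", "topping_amount"])
    .size()
    .rename("n_rows")
    .reset_index()
)
print(pair_counts)
```

Running the same `groupby` on the full `pizza_toppings` DataFrame and filtering on `n_rows > 1` would list every repeated combination.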

In [24]:
# topping count frequency
pizza_toppings_count

1     4988
2     4143
3     3320
4     2154
5     1338
6      734
7      351
8      137
9       41
10      15
12       2
11       2
Name: pizza_id, dtype: int64

In [25]:
# identify topping amounts
pizza_toppings.topping_amount.value_counts().head()

regular    23300
extra      14257
double      7124
light       2381
Name: topping_amount, dtype: int64

In [26]:
# frequency by topping
pizza_toppings.topping_id.value_counts().sort_index()

1    5335
2    5186
3    5164
4    5284
5    5105
6    5279
7    5316
8    5244
9    5149
Name: topping_id, dtype: int64

### pizza_modifiers

In [27]:
show_df(pizza_modifiers)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6645 entries, 0 to 6644
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   pizza_id     6645 non-null   int64
 1   modifier_id  6645 non-null   int64
dtypes: int64(2)
memory usage: 104.0 KB
None


Unnamed: 0,pizza_id,modifier_id
0,1002,3
1,1008,2
2,1011,1
3,1013,1
4,1020,1


In [28]:
# Key Statistics
pizzas_modified_count = pizza_modifiers.pizza_id.nunique()
pizza_modified_min = pizza_modifiers.pizza_id.min()
pizza_modified_max = pizza_modifiers.pizza_id.max()

print(f'{pizzas_modified_count} pizzas modified')
print(f'Is pizza_id unique? {pizzas_modified_count == len(pizza_modifiers)}')

6645 pizzas modified
Is pizza_id unique? True


In [29]:
# modifier frequency
print(pizza_modifiers.modifier_id.value_counts().head())

1    2251
3    2210
2    2184
Name: modifier_id, dtype: int64


In [30]:
pizza_modifiers.modifier_id.nunique()

3

## Questions

### Q.1
**What information is stored in the `toppings` table? How does this table relate to the `pizzas` table?**

In [31]:
print(f'{pizzas_count} pizzas numbered {pizza_min} - {pizza_max}')

19934 pizzas numbered 1001 - 20934


In [32]:
print(len(pizza_toppings.pizza_id.value_counts()))
# print(f'{pizzas_topped_count} pizzas topped, numbered {pizza_topped_min} - {pizza_topped_max}')

17225


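The `toppings` table stores each topping's id, name, and price. It relates to `pizzas` through the `pizza_toppings` table: a pizza has zero or more topping rows, and each row points to exactly one topping.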

`pizzas.pizza_id` **1:0-M** `pizza_toppings.pizza_id`

`pizza_toppings.topping_id` **M:1** `toppings.topping_id`
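These cardinalities can be checked rather than assumed. The sketch below uses toy rows (hypothetical `toy_` DataFrames, with topping names and ids taken from the `toppings` output above) and the `validate` and `indicator` arguments of `DataFrame.merge`; it is an illustration, not the tutorial's method.

```python
import pandas as pd

# Toy stand-ins for the real tables.
toy_pizzas = pd.DataFrame({"pizza_id": [1001, 1002, 1003]})
toy_pizza_toppings = pd.DataFrame(
    {"pizza_id": [1001, 1001, 1002], "topping_id": [5, 3, 1]}
)
toy_toppings = pd.DataFrame(
    {"topping_id": [1, 3, 5], "topping_name": ["pepperoni", "bacon", "onion"]}
)

# 1:0-M check. validate="one_to_many" raises MergeError if pizza_id
# repeats in toy_pizzas; the indicator column flags pizzas with no toppings.
pizza_side = toy_pizzas.merge(
    toy_pizza_toppings,
    on="pizza_id",
    how="left",
    validate="one_to_many",
    indicator=True,
)
untopped = pizza_side.loc[pizza_side["_merge"] == "left_only", "pizza_id"].tolist()

# M:1 check. validate="many_to_one" raises MergeError if topping_id
# repeats in the lookup table.
named = toy_pizza_toppings.merge(
    toy_toppings, on="topping_id", how="left", validate="many_to_one"
)

print(untopped)
print(named)
```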

### Q.2
**What information is stored in the `modifiers` table? How does this table relate to the `pizzas` table?**

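The `modifiers` table stores each modifier's id, name, and price. It relates to `pizzas` through the `pizza_modifiers` table: a pizza has at most one modifier row (`pizza_id` is unique there), and each row points to exactly one modifier.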

`pizzas.pizza_id` **1:0-1** `pizza_modifiers.pizza_id`

`pizza_modifiers.modifier_id` **M:1** `modifiers.modifier_id`

### Q.3
**How are the `pizzas` and `sizes` tables related?**

`pizzas.size_id` **M:1** `sizes.size_id`

### Q.4
**What other tables are in the database?**

`pizzas.crust_type_id` **M:1** `crust_types.crust_type_id`

`pizza_topppings` (spelled with three p's) also exists, but it is not used

### Q.5
**How many unique toppings are there?**

In [33]:
print(toppings.topping_id.nunique())

9


### Q.6
**How many unique orders are in this dataset?**

In [34]:
print(len(pizzas.order_id.value_counts()))

10000


### Q.7
**Which size of pizza is sold the most?**

### Q.8
**How many pizzas have been sold in total?**

### Q.9
**What is the most common size of pizza ordered?**

### Q.10
**What is the average number of pizzas per order?**
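One way to approach this is to count pizzas per `order_id` and take the mean of those counts. The sketch below uses only the five `pizzas` rows displayed earlier, so its result describes those rows and not the full dataset; `pizzas_head` is a name introduced for the example.

```python
import pandas as pd

# The five rows of `pizzas` shown by show_df(pizzas), typed in by hand.
pizzas_head = pd.DataFrame(
    {
        "pizza_id": [1001, 1002, 1003, 1004, 1005],
        "order_id": [1001, 1001, 1002, 1002, 1002],
    }
)

# Pizzas on each order, then the mean across orders.
pizzas_per_order = pizzas_head.groupby("order_id")["pizza_id"].count()
avg_pizzas_per_order = pizzas_per_order.mean()
print(avg_pizzas_per_order)  # 2.5 for these five rows
```

Because every pizza row carries an `order_id`, the mean of the per-order counts equals total pizzas divided by unique orders; with the counts printed earlier that is 19934 / 10000.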

### Q.11
**Find the total price for each order. The total price is the sum of:**

- **The price based on pizza size**
- **Any modifiers that need to be charged for**
- **The sum of the topping prices**

**Topping price is affected by the amount of the topping specified. A light amount is half of the regular price. An extra amount is 1.5 times the regular price, and double of the topping is double the price.**
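The pricing rule above can be sketched as three per-pizza components that are summed and then rolled up by order. Everything prefixed `toy_` is hypothetical data made up for this example (prices are copied from the lookup tables shown earlier), and the `amount_multiplier` mapping simply encodes the rule as stated in the question.

```python
import pandas as pd

# light = half, regular = full, extra = 1.5x, double = 2x
amount_multiplier = {"light": 0.5, "regular": 1.0, "extra": 1.5, "double": 2.0}

# Hypothetical toy tables.
toy_sizes = pd.DataFrame({"size_id": [1, 2, 3], "size_price": [8.99, 10.99, 12.99]})
toy_toppings = pd.DataFrame({"topping_id": [1, 5], "topping_price": [0.99, 0.49]})
toy_modifiers = pd.DataFrame({"modifier_id": [1, 2], "modifier_price": [1.99, 0.0]})
toy_pizzas = pd.DataFrame(
    {"pizza_id": [1, 2, 3], "order_id": [1, 1, 2], "size_id": [1, 2, 3]}
)
toy_pizza_toppings = pd.DataFrame(
    {
        "pizza_id": [1, 1, 2, 3],
        "topping_id": [1, 5, 5, 1],
        "topping_amount": ["regular", "extra", "light", "double"],
    }
)
toy_pizza_modifiers = pd.DataFrame({"pizza_id": [2, 3], "modifier_id": [1, 2]})

# Topping cost per pizza: price scaled by the amount, summed over rows.
topping_rows = toy_pizza_toppings.merge(toy_toppings, on="topping_id")
topping_rows["cost"] = topping_rows["topping_price"] * topping_rows[
    "topping_amount"
].map(amount_multiplier)
topping_cost = topping_rows.groupby("pizza_id")["cost"].sum()

# Modifier cost per pizza.
modifier_cost = (
    toy_pizza_modifiers.merge(toy_modifiers, on="modifier_id")
    .groupby("pizza_id")["modifier_price"]
    .sum()
)

# Size price plus the two components; pizzas without toppings or
# modifiers get 0.0 for that component.
priced = toy_pizzas.merge(toy_sizes, on="size_id").set_index("pizza_id")
priced["pizza_price"] = (
    priced["size_price"]
    + topping_cost.reindex(priced.index, fill_value=0.0)
    + modifier_cost.reindex(priced.index, fill_value=0.0)
)

order_totals = priced.groupby("order_id")["pizza_price"].sum().round(2)
print(order_totals)
```

The `reindex(..., fill_value=0.0)` step matters: without it, a pizza missing from `topping_cost` or `modifier_cost` would produce `NaN` when the components are added.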

### Q.12
**What is the average price of pizzas that have no cheese?**

### Q.13
**What is the most common size for pizzas that have extra cheese?**

### Q.14
**What is the most common topping for pizzas that are well done?**

### Q.15
**How many pizzas are only cheese (i.e. have no toppings)?**
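A pizza with no toppings is one whose `pizza_id` never appears in `pizza_toppings`, which is an anti-join. A minimal sketch on hypothetical toy rows (the `toy_` DataFrames are not the real tables):

```python
import pandas as pd

# Hypothetical toy rows: pizzas 1003 and 1004 have no topping rows.
toy_pizzas = pd.DataFrame({"pizza_id": [1001, 1002, 1003, 1004]})
toy_pizza_toppings = pd.DataFrame(
    {"pizza_id": [1001, 1001, 1002], "topping_id": [5, 3, 1]}
)

# Keep pizzas whose id is absent from the topping rows.
has_topping = toy_pizzas["pizza_id"].isin(toy_pizza_toppings["pizza_id"])
cheese_only = toy_pizzas[~has_topping]
print(len(cheese_only))
```

If every `pizza_id` in `pizza_toppings` also appears in `pizzas`, the counts printed earlier (19934 pizzas, 17225 of them topped) imply 2709 cheese-only pizzas; that assumption is worth checking before relying on the subtraction.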

### Q.16
**How many orders consist of pizza(s) that are only cheese? What is the average price of these orders? The most common pizza size?**

### Q.17
**How many large pizzas have olives on them?**

### Q.18
**What is the average number of toppings per pizza?**

### Q.19
**What is the average number of pizzas per order?**

### Q.20
**What is the average pizza price?**

### Q.21
**What is the average order total?**

### Q.22
**What is the average number of items per order?**

### Q.23
**What is the average number of toppings per pizza for each size of pizza?**

### Q.24
**What is the average order total for orders that contain more than 1 pizza?**

### Q.25
**What is the most common pizza size for orders that contain only a single pizza?**

### Q.26
**How many orders consist of 3+ pizzas? What is the average number of toppings for these orders?**

### Q.27
**What is the most common topping on large and extra large pizzas?**

### Q.28
**What is the most common topping for orders that consist of 2 pizzas?**

### Q.29
**Which size of pizza most frequently has modifiers?**

### Q.30
**What percentage of pizzas with hot sauce have extra cheese?**

### Q.31
**What is the average order price for orders that have at least 1 pizza with pineapple?**