# Zach Gulde's sqlite3/pandas Exercises

# Purpose

These notes review the tutorial prepared by Zach Gulde.

## Getting the Data Into Pandas

The `sqlite3` module is part of the Python standard library, so you shouldn't
need to install anything.

```python
import pandas as pd
import sqlite3

connection = sqlite3.connect('pizza.sqlite') # or specify the path to the db file

pizzas = pd.read_sql('SELECT * FROM pizzas', connection)
```

In [1]:
import pandas as pd
import sqlite3

connection = sqlite3.connect('pizza.sqlite') # or specify the path to the db file

## Exploration

In [2]:
pizza_tables = [
    'crust_types', 
    'modifiers', 
    'pizza_modifiers', 
    'pizza_topppings', 
    'pizza_toppings', 
    'pizzas', 
    'sizes', 
    'toppings'
]
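
The table names above were typed by hand. As a minimal sketch, the same list could be read from SQLite's built-in `sqlite_master` catalog. The in-memory database and the `demo_` names below are stand-ins invented for illustration, not part of the tutorial's `pizza.sqlite`.

```python
import sqlite3

# Hypothetical in-memory stand-in for pizza.sqlite so the sketch runs anywhere.
demo_connection = sqlite3.connect(':memory:')
demo_connection.execute('CREATE TABLE sizes (size_id INTEGER, size_name TEXT)')
demo_connection.execute('CREATE TABLE toppings (topping_id INTEGER, topping_name TEXT)')

# sqlite_master lists every schema object; type = 'table' skips indexes and views.
rows = demo_connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()

# Each row is a 1-tuple, so unpack it into a plain list of names.
demo_tables = [name for (name,) in rows]
print(demo_tables)
```

Against the real file, the same query on `connection` should also surface unexpected tables such as `pizza_topppings`.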

In [3]:
# show_df prints df.info() and displays the first `heads` rows
# (df.info() prints directly and returns None, hence the "None" line in each output)
def show_df(df, heads=5):
    if heads > len(df):
        heads = len(df)
    print(df.info())
    display(df.head(heads))

In [4]:
# loop through to get create commands
for t in pizza_tables:
    print(t, '= pd.read_sql(\'SELECT * FROM', t, '\', connection)')

crust_types = pd.read_sql('SELECT * FROM crust_types ', connection)
modifiers = pd.read_sql('SELECT * FROM modifiers ', connection)
pizza_modifiers = pd.read_sql('SELECT * FROM pizza_modifiers ', connection)
pizza_topppings = pd.read_sql('SELECT * FROM pizza_topppings ', connection)
pizza_toppings = pd.read_sql('SELECT * FROM pizza_toppings ', connection)
pizzas = pd.read_sql('SELECT * FROM pizzas ', connection)
sizes = pd.read_sql('SELECT * FROM sizes ', connection)
toppings = pd.read_sql('SELECT * FROM toppings ', connection)


In [5]:
crust_types = pd.read_sql('SELECT * FROM crust_types ', connection)
modifiers = pd.read_sql('SELECT * FROM modifiers ', connection)
pizza_modifiers = pd.read_sql('SELECT * FROM pizza_modifiers ', connection)
pizza_topppings = pd.read_sql('SELECT * FROM pizza_topppings ', connection)
pizza_toppings = pd.read_sql('SELECT * FROM pizza_toppings ', connection)
pizzas = pd.read_sql('SELECT * FROM pizzas ', connection)
sizes = pd.read_sql('SELECT * FROM sizes ', connection)
toppings = pd.read_sql('SELECT * FROM toppings ', connection)
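
Instead of printing the `read_sql` statements and pasting them into a new cell, the tables could be loaded in one pass into a dictionary keyed by table name. This is only a sketch: the in-memory database, `demo_tables`, and `frames` are hypothetical names, and the f-string is safe only because the table names come from a trusted, hard-coded list (SQL parameters cannot stand in for identifiers).

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory stand-in for pizza.sqlite.
demo_connection = sqlite3.connect(':memory:')
demo_connection.execute('CREATE TABLE sizes (size_id INTEGER, size_name TEXT)')
demo_connection.executemany(
    'INSERT INTO sizes VALUES (?, ?)', [(1, 'small'), (2, 'medium')]
)
demo_connection.execute(
    'CREATE TABLE crust_types (crust_type_id INTEGER, crust_type_name TEXT)'
)
demo_connection.execute("INSERT INTO crust_types VALUES (1, 'hand-tossed')")

demo_tables = ['crust_types', 'sizes']

# One DataFrame per table, keyed by table name.
frames = {
    name: pd.read_sql(f'SELECT * FROM "{name}"', demo_connection)
    for name in demo_tables
}

for name, df in frames.items():
    print(name, df.shape)
```

The trade-off is that tables are then accessed as `frames['sizes']` rather than as top-level variables like `sizes`.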

In [6]:
# loop through to get show_df commands
for t in pizza_tables:
    print('show_df('+t+')')

show_df(crust_types)
show_df(modifiers)
show_df(pizza_modifiers)
show_df(pizza_topppings)
show_df(pizza_toppings)
show_df(pizzas)
show_df(sizes)
show_df(toppings)


### crust_types

In [7]:
show_df(crust_types)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   crust_type_id    2 non-null      int64 
 1   crust_type_name  2 non-null      object
dtypes: int64(1), object(1)
memory usage: 160.0+ bytes
None


|  | crust_type_id | crust_type_name |
|---|---|---|
| 0 | 1 | hand-tossed |
| 1 | 2 | thin and crispy |


In [8]:
## strictly a lookup table

### modifiers

In [9]:
show_df(modifiers)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   modifier_id     3 non-null      int64  
 1   modifier_name   3 non-null      object 
 2   modifier_price  3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes
None


|  | modifier_id | modifier_name | modifier_price |
|---|---|---|---|
| 0 | 1 | extra cheese | 1.99 |
| 1 | 2 | well done | 0.0 |
| 2 | 3 | no cheese | 0.0 |


In [10]:
## strictly a lookup table

### pizza_modifiers

In [11]:
show_df(pizza_modifiers)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6645 entries, 0 to 6644
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   pizza_id     6645 non-null   int64
 1   modifier_id  6645 non-null   int64
dtypes: int64(2)
memory usage: 104.0 KB
None


|  | pizza_id | modifier_id |
|---|---|---|
| 0 | 1002 | 3 |
| 1 | 1008 | 2 |
| 2 | 1011 | 1 |
| 3 | 1013 | 1 |
| 4 | 1020 | 1 |


In [12]:
# Key Statistics
num_pizzas_modified = len(pizza_modifiers.pizza_id.value_counts())
pizza_modified_min = pizza_modifiers.pizza_id.min()
pizza_modified_max = pizza_modifiers.pizza_id.max()



In [13]:
# check if modifier is 1:M with pizzas
print(pizza_modifiers.modifier_id.value_counts().head())

1    2251
3    2210
2    2184
Name: modifier_id, dtype: int64


In [14]:
# count of unique pizzas
print(num_pizzas_modified)

6645


### pizza_toppings

In [15]:
show_df(pizza_toppings)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47062 entries, 0 to 47061
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   pizza_id        47062 non-null  int64 
 1   topping_id      47062 non-null  int64 
 2   topping_amount  47062 non-null  object
dtypes: int64(2), object(1)
memory usage: 1.1+ MB
None


|  | pizza_id | topping_id | topping_amount |
|---|---|---|---|
| 0 | 1001 | 5 | extra |
| 1 | 1001 | 3 | regular |
| 2 | 1002 | 8 | regular |
| 3 | 1002 | 9 | extra |
| 4 | 1002 | 8 | regular |


In [16]:
# Key Statistics
num_pizzas_topped = len(pizza_toppings.pizza_id.value_counts())
pizza_topped_min = pizza_toppings.pizza_id.min()
pizza_topped_max = pizza_toppings.pizza_id.max()

In [17]:
# count of unique pizzas
print(num_pizzas_topped)

17225


In [18]:
# pizzas with most toppings
pizza_toppings.pizza_id.value_counts().head()

17076    12
12052    12
6427     11
17744    11
20317    10
Name: pizza_id, dtype: int64

In [19]:
# identify topping amounts
pizza_toppings.topping_amount.value_counts().head()

regular    23300
extra      14257
double      7124
light       2381
Name: topping_amount, dtype: int64

In [20]:
# frequency by topping
pizza_toppings.topping_id.value_counts()

1    5335
7    5316
4    5284
6    5279
8    5244
2    5186
3    5164
9    5149
5    5105
Name: topping_id, dtype: int64

### pizza_topppings

In [21]:
show_df(pizza_topppings)

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   pizza_id    0 non-null      object
 1   topping_id  0 non-null      object
 2   amount      0 non-null      object
dtypes: object(3)
memory usage: 0.0+ bytes
None


|  | pizza_id | topping_id | amount |
|---|---|---|---|


In [22]:
## this table appears to have been included by mistake

### pizzas

In [23]:
show_df(pizzas)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19934 entries, 0 to 19933
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   pizza_id       19934 non-null  int64
 1   order_id       19934 non-null  int64
 2   crust_type_id  19934 non-null  int64
 3   size_id        19934 non-null  int64
dtypes: int64(4)
memory usage: 623.1 KB
None


|  | pizza_id | order_id | crust_type_id | size_id |
|---|---|---|---|---|
| 0 | 1001 | 1001 | 2 | 1 |
| 1 | 1002 | 1001 | 2 | 2 |
| 2 | 1003 | 1002 | 1 | 4 |
| 3 | 1004 | 1002 | 1 | 2 |
| 4 | 1005 | 1002 | 2 | 4 |


In [24]:
# Key Statistics
num_orders = len(pizzas.order_id.value_counts())
order_min = pizzas.order_id.min()
order_max = pizzas.order_id.max()
num_pizzas = len(pizzas.pizza_id.value_counts())
pizza_min = pizzas.pizza_id.min()
pizza_max = pizzas.pizza_id.max()

In [25]:
# count of unique orders
print(num_orders)
print(f'{order_min} - {order_max}')

10000
1001 - 11000


In [26]:
# count of unique pizzas
print(num_pizzas)
print(f'{pizza_min} - {pizza_max}')

19934
1001 - 20934


In [27]:
# most pizzas on one order
pizzas.order_id.value_counts().head()

6427    8
6149    7
1781    7
6788    7
2106    7
Name: order_id, dtype: int64

In [28]:
# most orders on one pizza (a maximum of 1 means each pizza_id appears exactly once)
pizzas.pizza_id.value_counts().head()

2047     1
4791     1
17069    1
10928    1
8881     1
Name: pizza_id, dtype: int64

### sizes

In [29]:
show_df(sizes)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   size_id     4 non-null      int64  
 1   size_name   4 non-null      object 
 2   size_price  4 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 224.0+ bytes
None


|  | size_id | size_name | size_price |
|---|---|---|---|
| 0 | 1 | small | 8.99 |
| 1 | 2 | medium | 10.99 |
| 2 | 3 | large | 12.99 |
| 3 | 4 | x-large | 14.99 |


In [30]:
## strictly a lookup table

### toppings

In [31]:
show_df(toppings, 10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   topping_id     9 non-null      int64  
 1   topping_name   9 non-null      object 
 2   topping_price  9 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 344.0+ bytes
None


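|  | topping_id | topping_name | topping_price |
|---|---|---|---|
| 0 | 1 | pepperoni | 0.99 |
| 1 | 2 | sausage | 0.99 |
| 2 | 3 | bacon | 0.99 |
| 3 | 4 | canadian bacon | 0.99 |
| 4 | 5 | onion | 0.49 |
| 5 | 6 | peppers | 0.49 |
| 6 | 7 | olives | 0.49 |
| 7 | 8 | pineapple | 0.79 |
| 8 | 9 | hot sauce | 0.19 |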


In [32]:
## strictly a lookup table

## Questions

### Q.1
**What information is stored in the `toppings` table? How does this table relate to the `pizzas` table?**

In [33]:
print(f'{pizzas.pizza_id.min()} - {pizzas.pizza_id.max()}')

1001 - 20934


In [34]:
print(len(pizza_toppings.pizza_id.value_counts()))
print(f'{pizza_toppings.pizza_id.min()} - {pizza_toppings.pizza_id.max()}')

17225
1001 - 20934


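The `toppings` table stores each topping's id, name, and price. It relates to `pizzas` through the `pizza_toppings` junction table: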

`pizzas.pizza_id` **1:0-M** `pizza_toppings.pizza_id`

`pizza_toppings.topping_id` **M:1** `toppings.topping_id`
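
The **1:0-M** relationship above was inferred from counts and id ranges. One possible way to check it directly is a left merge with `validate` and `indicator`, sketched here on small made-up frames (the `demo_` data is invented for illustration and is not taken from the database).

```python
import pandas as pd

# Toy stand-ins for the pizzas and pizza_toppings tables.
demo_pizzas = pd.DataFrame({'pizza_id': [1, 2, 3]})
demo_pizza_toppings = pd.DataFrame(
    {'pizza_id': [1, 1, 3], 'topping_id': [5, 3, 8]}
)

# validate='one_to_many' raises MergeError if pizza_id repeats on the left side;
# indicator=True adds a _merge column showing where each row came from.
joined = demo_pizzas.merge(
    demo_pizza_toppings,
    on='pizza_id',
    how='left',
    validate='one_to_many',
    indicator=True,
)

# 'left_only' rows are pizzas with no topping rows (the "0" in 1:0-M).
untopped = joined.loc[joined['_merge'] == 'left_only', 'pizza_id'].tolist()

# count() ignores the NaN topping_id on untopped pizzas, so they get 0.
toppings_per_pizza = joined.groupby('pizza_id')['topping_id'].count()

print(untopped)
print(toppings_per_pizza)
```

Because the left merge keeps pizzas that have no toppings, counts derived from `joined` include the zeros that would be lost by looking at `pizza_toppings` alone.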

### Q.2
**What information is stored in the `modifiers` table? How does this table relate to the `pizzas` table?**

In [35]:
print(f'{pizzas.pizza_id.min()} - {pizzas.pizza_id.max()}')

1001 - 20934


In [36]:
print(len(pizza_modifiers.pizza_id.value_counts()))
print(f'{pizza_modifiers.pizza_id.min()} - {pizza_modifiers.pizza_id.max()}')

6645
1002 - 20932


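The `modifiers` table stores each modifier's id, name, and price. It relates to `pizzas` through the `pizza_modifiers` junction table: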

`pizzas.pizza_id` **1:0-1** `pizza_modifiers.pizza_id`

`pizza_modifiers.modifier_id` **M:1** `modifiers.modifier_id`
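
The **1:0-1** claim rests on the row count of `pizza_modifiers` matching its number of unique `pizza_id` values. A small sketch of testing that property explicitly, using invented toy data:

```python
import pandas as pd

# Toy stand-in for pizza_modifiers; values are made up for illustration.
demo_pizza_modifiers = pd.DataFrame(
    {'pizza_id': [2, 5, 9], 'modifier_id': [3, 2, 1]}
)

# True when no pizza_id repeats, i.e. each pizza has at most one modifier.
at_most_one_modifier = demo_pizza_modifiers['pizza_id'].is_unique

# Any rows that would violate the claim (empty when the claim holds).
repeated = demo_pizza_modifiers[
    demo_pizza_modifiers['pizza_id'].duplicated(keep=False)
]

print(at_most_one_modifier)
print(len(repeated))
```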

### Q.3
**How are the `pizzas` and `sizes` tables related?**

`pizzas.size_id` **M:1** `sizes.size_id`

### Q.4
**What other tables are in the database?**

`pizzas.crust_type_id` **M:1** `crust_types.crust_type_id`

`pizza_topppings` exists but is empty and appears to be unused.

### Q.5
**How many unique toppings are there?**

In [37]:
print(toppings.topping_id.nunique())

9


### Q.6
**How many unique orders are in this dataset?**

In [38]:
print(len(pizzas.order_id.value_counts()))

10000


### Q.7
**Which size of pizza is sold the most?**
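
One possible approach, sketched on invented toy frames rather than the real tables: merge `pizzas` with `sizes` on `size_id` (the M:1 relationship from Q.3) and count size names. No result for the actual dataset is claimed here.

```python
import pandas as pd

# Toy stand-ins for pizzas and sizes; the pizza rows are made up.
demo_pizzas = pd.DataFrame(
    {'pizza_id': [1, 2, 3, 4], 'size_id': [1, 2, 2, 4]}
)
demo_sizes = pd.DataFrame(
    {
        'size_id': [1, 2, 3, 4],
        'size_name': ['small', 'medium', 'large', 'x-large'],
    }
)

# Attach the readable size name to every pizza, then tally by name.
size_counts = (
    demo_pizzas.merge(demo_sizes, on='size_id', how='left')['size_name']
    .value_counts()
)

# idxmax returns the label with the highest count.
most_sold_size = size_counts.idxmax()
print(most_sold_size)
```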

### Q.8
**How many pizzas have been sold in total?**

### Q.9
**What is the most common size of pizza ordered?**

### Q.10
**What is the average number of pizzas per order?**
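
A minimal sketch of one way to compute this, using invented toy data: count pizzas within each order, then average those counts. Since every row of `pizzas` is one pizza, this equals the number of rows divided by the number of unique orders.

```python
import pandas as pd

# Toy stand-in for the pizzas table; values are made up.
demo_pizzas = pd.DataFrame(
    {'pizza_id': [1, 2, 3, 4, 5], 'order_id': [10, 10, 11, 12, 12]}
)

# Number of pizzas on each order.
pizzas_per_order = demo_pizzas.groupby('order_id')['pizza_id'].count()

# Mean of the per-order counts.
avg_pizzas_per_order = pizzas_per_order.mean()

# Equivalent shortcut: total pizzas divided by unique orders.
shortcut = len(demo_pizzas) / demo_pizzas['order_id'].nunique()

print(avg_pizzas_per_order, shortcut)
```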

### Q.11
**Find the total price for each order. The total price is the sum of:**

- **The price based on pizza size**
- **Any modifiers that need to be charged for**
- **The sum of the topping prices**

**Topping price is affected by the amount of the topping specified. A light amount is half of the regular price. An extra amount is 1.5 times the regular price, and double of the topping is double the price.**
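
The pricing rules above can be sketched as follows. This is one possible approach under stated assumptions, not the tutorial's solution: the `demo_` frames are small invented samples that reuse the column names and a few prices from the tables explored earlier, and `amount_multiplier` is a hypothetical helper mapping encoding the light/regular/extra/double rule.

```python
import pandas as pd

# Toy stand-ins; column names follow the real tables, rows are made up.
demo_sizes = pd.DataFrame({'size_id': [1, 2], 'size_price': [8.99, 10.99]})
demo_toppings = pd.DataFrame({'topping_id': [1, 5], 'topping_price': [0.99, 0.49]})
demo_modifiers = pd.DataFrame({'modifier_id': [1, 2], 'modifier_price': [1.99, 0.0]})
demo_pizzas = pd.DataFrame(
    {'pizza_id': [1, 2, 3], 'order_id': [100, 100, 101], 'size_id': [1, 2, 1]}
)
demo_pizza_toppings = pd.DataFrame(
    {
        'pizza_id': [1, 1, 2],
        'topping_id': [1, 5, 1],
        'topping_amount': ['regular', 'light', 'double'],
    }
)
demo_pizza_modifiers = pd.DataFrame({'pizza_id': [2, 3], 'modifier_id': [1, 2]})

# Hypothetical mapping from topping_amount to its price multiplier.
amount_multiplier = {'light': 0.5, 'regular': 1.0, 'extra': 1.5, 'double': 2.0}

# Topping cost per pizza: price times multiplier, summed over topping rows.
topping_cost = (
    demo_pizza_toppings.merge(demo_toppings, on='topping_id')
    .assign(cost=lambda d: d['topping_price'] * d['topping_amount'].map(amount_multiplier))
    .groupby('pizza_id')['cost'].sum()
    .rename('topping_cost')
)

# Modifier cost per pizza.
modifier_cost = (
    demo_pizza_modifiers.merge(demo_modifiers, on='modifier_id')
    .groupby('pizza_id')['modifier_price'].sum()
    .rename('modifier_cost')
)

# Join per-pizza costs onto pizzas; pizzas without toppings or modifiers get 0.
priced = (
    demo_pizzas.merge(demo_sizes, on='size_id')
    .join(topping_cost, on='pizza_id')
    .join(modifier_cost, on='pizza_id')
    .fillna({'topping_cost': 0.0, 'modifier_cost': 0.0})
)
priced['pizza_price'] = (
    priced['size_price'] + priced['topping_cost'] + priced['modifier_cost']
)

# Order total is the sum of its pizzas' prices.
order_totals = priced.groupby('order_id')['pizza_price'].sum()
print(order_totals.round(2))
```

Summing per topping row means a topping listed twice on the same pizza is charged twice, which may or may not be the intended reading of the data. The per-pizza `pizza_price` column could also serve as a starting point for the later questions about average pizza price.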

### Q.12
**What is the average price of pizzas that have no cheese?**

### Q.13
**What is the most common size for pizzas that have extra cheese?**

### Q.14
**What is the most common topping for pizzas that are well done?**

### Q.15
**How many pizzas are only cheese (i.e. have no toppings)?**

### Q.16
**How many orders consist of pizza(s) that are only cheese? What is the average price of these orders? The most common pizza size?**

### Q.17
**How many large pizzas have olives on them?**

### Q.18
**What is the average number of toppings per pizza?**

### Q.19
**What is the average number of pizzas per order?**

### Q.20
**What is the average pizza price?**

### Q.21
**What is the average order total?**

### Q.22
**What is the average number of items per order?**

### Q.23
**What is the average number of toppings per pizza for each size of pizza?**

### Q.24
**What is the average order total for orders that contain more than 1 pizza?**

### Q.25
**What is the most common pizza size for orders that contain only a single pizza?**

### Q.26
**How many orders consist of 3+ pizzas? What is the average number of toppings for these orders?**

### Q.27
**What is the most common topping on large and extra large pizzas?**

### Q.28
**What is the most common topping for orders that consist of 2 pizzas?**

### Q.29
**Which size of pizza most frequently has modifiers?**

### Q.30
**What percentage of pizzas with hot sauce have extra cheese?**

### Q.31
**What is the average order price for orders that have at least 1 pizza with pineapple?**