# Project 1, Part 3, Meal Related Queries



# Included Modules and Packages

Code cell containing your includes for modules and packages

In [1]:
import math
import numpy as np
import pandas as pd

import psycopg2

# Supporting code

Code cells containing any supporting code, such as connecting to the database, any functions, etc.  Remember you can use any code from the labs.

In [2]:
#
# function to run a select query and return rows in a pandas dataframe
# pandas can load numeric values from postgres as floats
# if every value in a float column is a whole number, convert the column to a nullable integer
#

def my_select_query_pandas(query, rollback_before_flag, rollback_after_flag):
    "function to run a select query and return rows in a pandas dataframe"
    
    if rollback_before_flag:
        connection.rollback()
    
    df = pd.read_sql_query(query, connection)
    
    if rollback_after_flag:
        connection.rollback()
    
    # fix the float columns that really should be integers
    
    for column in df:
    
        if df[column].dtype == "float64":

            fraction_flag = False

            for value in df[column].values:
                
                if not np.isnan(value):
                    if value - math.floor(value) != 0:
                        fraction_flag = True

            if not fraction_flag:
                df[column] = df[column].astype('Int64')
    
    return(df)
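
The float-to-integer step above can be tried without a database. The following is a minimal sketch, assuming only pandas and NumPy; the helper name `floats_to_nullable_ints` and the sample dataframe are hypothetical and not part of the notebook. It applies the same whole-number test with vectorized operations instead of a value-by-value loop, and it does not special-case infinite values.

```python
import numpy as np
import pandas as pd


def floats_to_nullable_ints(df):
    """Return a copy of df where whole-number float64 columns become Int64."""
    result = df.copy()
    for column in result.columns:
        if result[column].dtype == "float64":
            values = result[column].dropna()
            # A column qualifies when every non-missing value has no fractional part.
            if (values == np.floor(values)).all():
                result[column] = result[column].astype("Int64")
    return result


# Hypothetical sample data standing in for a query result.
example = pd.DataFrame({
    "whole": [1.0, 2.0, np.nan],     # whole numbers plus a missing value
    "fractional": [1.5, 2.0, 3.0],   # contains a genuine fraction
})

converted = floats_to_nullable_ints(example)
print(converted.dtypes)
```

The nullable `Int64` dtype is what lets the `whole` column keep its missing value while still displaying integers.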
    

In [3]:
connection = psycopg2.connect(
    user = "postgres",
    password = "ucb",
    host = "postgres",
    port = "5432",
    database = "postgres"
)

# 1.3.1 How many meals were purchased for all of AGM?

Each record in the line_items table has a quantity which is the number of meals purchased for that line item. 

The sum of quantity in the line_items table will tell you the total meals purchased.

Write 1 and only 1 query.  Note that the query may have as many subqueries, including "with" clauses, as you wish.  

Name column headers exactly as shown in the example below. 

Format data exactly as shown in the example below.

Ensure that when you check this Jupyter Notebook into GitHub, the query results in the Pandas dataframe are clearly visible in GitHub.


The query should return only 1 row into a Pandas dataframe and should look similar to this: 

||total_meals_purchased|
|---|---|
|0|8228284|

In [4]:
rollback_before_flag = True
rollback_after_flag = True

query = """

select 

    /*
    We are calculating the sum of quantity for all line items.
    */
    sum(quantity) as total_meals_purchased

from 
    line_items;

"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)
df

||total_meals_purchased|
|---|---|
|0|8228284|


# 1.3.2 How many meals were purchased for all of AGM by meal?

Each record in the line_items table has a quantity which is the number of meals purchased for that line item. 

For meal_name, use the description column in the products table.

Sort by meal_name in alphabetical order.

Write 1 and only 1 query.  Note that the query may have as many subqueries, including "with" clauses, as you wish.  

Name column headers exactly as shown in the example below. 

Format data exactly as shown in the example below.

Ensure that when you check this Jupyter Notebook into GitHub, the query results in the Pandas dataframe are clearly visible in GitHub.


The query should return 8 rows into a Pandas dataframe. The first and last rows should look similar to this: 

||meal_name|total_meals_purchased|
|---|---|---|
|0|Brocolli Stir Fry|913984|
|...|...|...|
|7|Tilapia Piccata|687237|

In [5]:
rollback_before_flag = True
rollback_after_flag = True

query = """

SELECT 
    /*
    The product description serves as the meal name.
    */
    products.description AS meal_name,
    
    /*
    Calculate the total number of meals purchased for each meal.
    */
    SUM(line_items.quantity) AS total_meals_purchased

FROM 
    /*
    Start from the products table.
    */
    products

/*
Join the products table with the line_items table based on the 
product_id, which is common in both tables.
*/
JOIN 
    line_items ON products.product_id = line_items.product_id

/*
Group the results by the product description.
*/
GROUP BY 
    products.description

/*
Sort the results by the meal name in alphabetical order.
*/
ORDER BY 
    meal_name;
"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)
df

||meal_name|total_meals_purchased|
|---|---|---|
|0|Brocolli Stir Fry|913984|
|1|Chicken Salad|228561|
|2|Curry Chicken|1368884|
|3|Eggplant Lasagna|1599058|
|4|Pistachio Salmon|1828778|
|5|Spinach Orzo|456769|
|6|Teriyaki Chicken|1145013|
|7|Tilapia Piccata|687237|


# 1.3.3 How many meals were purchased by store and by meal?

For store_name use the store's city.

Each record in the line_items table has a quantity which is the number of meals purchased for that line item.

For meal_name, use the description column in the products table.

Sort by store_name in alphabetical order, then by meal_name in alphabetical order.

Write 1 and only 1 query.  Note that the query may have as many subqueries, including "with" clauses, as you wish.  

Name column headers exactly as shown in the example below. 

Format data exactly as shown in the example below.

Ensure that when you check this Jupyter Notebook into GitHub, the query results in the Pandas dataframe are clearly visible in GitHub.



The query should return 40 rows into a Pandas dataframe. The first and last rows should look similar to this: 

||store_name|meal_name|total_meals_purchased|
|---|---|---|---|
|0|Berkeley|Brocolli Stir Fry|232038|
|...|...|...|...|
|39|Seattle|Tilapia Piccata|153448|

In [6]:
rollback_before_flag = True
rollback_after_flag = True

query = """

/*
start from the stores table and join it with the sales table based on the store_id.
then join with the line_items table based on store_id and sale_id.
finally join with the products table based on product_id.
*/

select 
    /*
    Get the city of the store as the store_name.
    */
    stores.city as store_name,
    
    /*
    Get the description from the products table as the meal_name.
    */
    products.description as meal_name,
    
    /*
    Calculate the total number of meals purchased for each store and meal combination.
    */
    sum(line_items.quantity) as total_meals_purchased

from
    /*
    Start from the stores table.
    */
    stores

/*
Join the stores table with the sales table based on the store_id, which is common in both tables.
*/
join
    sales on
    stores.store_id = sales.store_id

/*
Join the sales table with the line_items table based on store_id and sale_id, which together
form the composite key.
*/
join
    line_items on
    sales.store_id = line_items.store_id and
    sales.sale_id = line_items.sale_id

/*
Join the line_items table with the products table based on the product_id, which is common in both tables.
*/
join
    products on
    line_items.product_id = products.product_id

/*
Group the results by the store_name and the meal_name.
*/
group by 
    stores.city, products.description

/*
Sort the results by the store_name in alphabetical order, then by meal_name in alphabetical order.
*/
order by 
    store_name, meal_name;
"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)
df

||store_name|meal_name|total_meals_purchased|
|---|---|---|---|
|0|Berkeley|Brocolli Stir Fry|232038|
|1|Berkeley|Chicken Salad|57719|
|2|Berkeley|Curry Chicken|346508|
|3|Berkeley|Eggplant Lasagna|405637|
|4|Berkeley|Pistachio Salmon|464274|
|5|Berkeley|Spinach Orzo|115469|
|6|Berkeley|Teriyaki Chicken|290858|
|7|Berkeley|Tilapia Piccata|174252|
|8|Dallas|Brocolli Stir Fry|179885|
|9|Dallas|Chicken Salad|44756|


# 1.3.4 How many meals were purchased by month?

Each record in the line_items table has a quantity which is the number of meals purchased for that line item.

Derive the month_number (1 = January) and the month from the sale_date.

Sort by month_number.


Write 1 and only 1 query.  Note that the query may have as many subqueries, including "with" clauses, as you wish.  

Name column headers exactly as shown in the example below. 

Format data exactly as shown in the example below.

Ensure that when you check this Jupyter Notebook into GitHub, the query results in the Pandas dataframe are clearly visible in GitHub.



The query should return 12 rows into a Pandas dataframe. The first and last rows should look similar to this: 

||month_number|month|total_meals_purchased|
|---|---|---|---|
|0|1|January  |650319|
|...|...|...|...|
|11|12|December |695035|
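
One possible shape for such a query is sketched below. It is an illustration under stated assumptions, not a confirmed solution: it assumes `sale_date` is a column of the `sales` table and that `sales` joins to `line_items` on `store_id` and `sale_id`, as in query 1.3.3. The variable name `month_query_sketch` is hypothetical. In PostgreSQL, `to_char(..., 'Month')` blank-pads the month name to 9 characters, which is consistent with the trailing spaces in the example above.

```python
# Hypothetical sketch: assumes sale_date lives in the sales table and that
# sales joins to line_items on (store_id, sale_id), as in query 1.3.3.
month_query_sketch = """

select
    extract(month from sales.sale_date)::int as month_number,
    to_char(sales.sale_date, 'Month') as month,
    sum(line_items.quantity) as total_meals_purchased

from
    sales

join
    line_items on
    sales.store_id = line_items.store_id and
    sales.sale_id = line_items.sale_id

group by
    extract(month from sales.sale_date)::int,
    to_char(sales.sale_date, 'Month')

order by
    month_number;

"""

print(month_query_sketch)
```

Grouping by the same two expressions that appear in the select list keeps one row per month, and ordering by `month_number` rather than `month` avoids sorting the names alphabetically.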

In [None]:
rollback_before_flag = True
rollback_after_flag = True

query = """

Replace with your SQL query

"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)
df

# 1.3.5 How many meals were purchased by month and meal?

Each record in the line_items table has a quantity which is the number of meals purchased for that line item.

Derive the month_number (1 = January) and the month from the sale_date.

For meal_name, use the description column in the products table.

Sort by month_number, then by meal_name in alphabetical order.


Write 1 and only 1 query.  Note that the query may have as many subqueries, including "with" clauses, as you wish.  

Name column headers exactly as shown in the example below. 

Format data exactly as shown in the example below.

Ensure that when you check this Jupyter Notebook into GitHub, the query results in the Pandas dataframe are clearly visible in GitHub.


**Note: When a query result has a large number of rows, Pandas will only display the first 5 rows, a row with ellipses, and the last 5 rows. This is ok.**
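
The truncated display described in the note can be reproduced without a database. This is a minimal sketch using a hypothetical 96-row dataframe (`df_demo`) as a stand-in for the query result. It sets the pandas options `display.max_rows` and `display.min_rows` explicitly to 60 and 10, which are assumed here to match the defaults of a standard pandas installation.

```python
import pandas as pd

# Hypothetical stand-in for a 96-row query result.
df_demo = pd.DataFrame({"value": range(96)})

# With max_rows at 60, a 96-row dataframe is shortened to its first and last rows.
with pd.option_context("display.max_rows", 60, "display.min_rows", 10):
    truncated = repr(df_demo)

# Setting max_rows to None lifts the limit so every row is rendered.
with pd.option_context("display.max_rows", None):
    full = repr(df_demo)

print(truncated)
print(len(full.splitlines()), "lines in the full rendering")
```

Because the limit applies only to display, the dataframe itself still holds all 96 rows in both cases.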


The query should return 96 rows into a Pandas dataframe. The first and last rows should look similar to this: 

||month_number|month|meal_name|total_meals_purchased|
|---|---|---|---|---|
|0|1|January  |Brocolli Stir Fry|72161|
|...|...|...|...|...|
|95|12|December |Tilapia Piccata|58260|

In [None]:
rollback_before_flag = True
rollback_after_flag = True

query = """

Replace with your SQL query

"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)
df

# 1.3.6 How many meals were purchased by day of week and meal?

Each record in the line_items table has a quantity which is the number of meals purchased for that line item.

Derive the dow (0 = Sunday) and the day_of_week from the sale_date.

For meal_name, use the description column in the products table.

Sort by dow, then by meal_name in alphabetical order.

Write 1 and only 1 query.  Note that the query may have as many subqueries, including "with" clauses, as you wish.  

Name column headers exactly as shown in the example below. 

Format data exactly as shown in the example below.

Ensure that when you check this Jupyter Notebook into GitHub, the query results in the Pandas dataframe are clearly visible in GitHub.


The query should return 56 rows into a Pandas dataframe. The first and last rows should look similar to this: 

||dow|day_of_week|meal_name|total_meals_purchased|
|---|---|---|---|---|
|0|0|Sunday   |Brocolli Stir Fry|172250|
|...|...|...|...|...|
|55|6|Saturday |Tilapia Piccata|135327|
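
The numbering and padding in the example above can be illustrated in plain Python. This is a sketch of the expected values only, not the SQL query itself; the names `DAY_NAMES` and `dow_parts` are hypothetical. In PostgreSQL, `extract(dow from ...)` also numbers Sunday as 0, and `to_char(..., 'Day')` blank-pads the day name to 9 characters, so a query built on those functions should produce values of this shape.

```python
from datetime import date

# Day names indexed so that 0 = Sunday, matching the dow column in the example.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def dow_parts(sale_date):
    """Return (dow, day_of_week) with 0 = Sunday and the name padded to 9 characters."""
    # date.isoweekday() numbers Monday as 1 and Sunday as 7, so modulo 7 maps Sunday to 0.
    dow = sale_date.isoweekday() % 7
    return dow, DAY_NAMES[dow].ljust(9)


print(dow_parts(date(2023, 1, 1)))  # 2023-01-01 was a Sunday
print(dow_parts(date(2023, 1, 7)))  # 2023-01-07 was a Saturday
```

Nine characters is the length of "Wednesday", the longest day name, which is why shorter names such as "Sunday" carry trailing blanks.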

In [None]:
rollback_before_flag = True
rollback_after_flag = True

query = """

Replace with your SQL query

"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)
df