# E - extract

The first step in the extract phase of this ETL is to import the libraries needed to run this Jupyter notebook: pandas, SQLAlchemy, and NumPy, plus the database credentials from a config file. The cell below contains all of these imports.

In [3]:
import pandas as pd
from sqlalchemy import create_engine
import numpy as np
# requires the config.py file described in the README
from config import username, password

### Extract CSV Files

Here we will load the 5 CSV files located in the "Resources" folder of this repo. To do this, we assign each file's path to its own variable.

In [4]:
mcd_file = "Resources/mcd_menu.csv"
bk_mcd_file = "Resources/bk_mcd_menu.csv"
starbucks_food_file = "Resources/starbucks_food.csv"
starbucks_drink_file = "Resources/starbucks_drink_menu.csv"
subway_file = "Resources/subway_menu.csv"

Next we use the pandas ".read_csv" function to read each of our CSVs into a data frame. Note that "bk_mcd_menu.csv" is semicolon-delimited, so it needs the delimiter argument. This prepares us for the transform step, since the Jupyter notebook can now display the data of each CSV cleanly.

In [5]:
mcd_df = pd.read_csv(mcd_file)
bk_mcd_df = pd.read_csv(bk_mcd_file, delimiter=';')
starbucks_food_df = pd.read_csv(starbucks_food_file)
starbucks_drink_df = pd.read_csv(starbucks_drink_file)
subway_df = pd.read_csv(subway_file)
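As a small aside on why one of the files needs a delimiter argument, the sketch below parses a made-up sample from memory instead of from disk. The sample text and the names `sample`, `wrong_df`, and `right_df` are illustrative assumptions, not taken from the real CSV.

```python
import io
import pandas as pd

# Made-up sample that mimics a semicolon-delimited file (not the real data)
sample = "Chain;Item;Calories\nBurger King;Whopper Sandwich;660\nMc Donalds;Hamburger;250\n"

# With the default comma separator every line is read as a single column
wrong_df = pd.read_csv(io.StringIO(sample))

# Passing the delimiter splits the fields as intended
right_df = pd.read_csv(io.StringIO(sample), delimiter=";")

print(wrong_df.shape)
print(right_df.shape)
```

Checking `.shape` or `.columns` right after loading is a quick way to catch a wrong delimiter before the transform step.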

# T - transform

For each of the 5 data frames we first take a quick look using ".head()". This lets us assess the current state of the data frame and see what needs to be transformed so that all of the data frames match the tables in the ERD we had envisioned.

This most often included: dropping unneeded columns, renaming columns to be lowercase and fit our naming conventions, and adding our "food_class" column with the correct number designator for dessert (1), drink (2), or food (3).

Finally we display the fully transformed data frame and move on to the next one, repeating the process as needed.

### First Data Frame Transformation: Subway_df

In [6]:
subway_df.head()

Unnamed: 0.1,Unnamed: 0,Category,Serving Size (g),Calories,Total Fat (g),Saturated Fat (g),Trans Fat (g),Cholesterol (mg),Sodium (mg),Carbohydrates (g),Dietary Fiber (g),Sugars (g),Protein (g),Vitamin A % DV,Vitamin C % DV,Calcium % DV,Iron % DV
0,BBQ Rib,Sandwich,208,580,31.0,10.0,0.0,60,1260,54,3,18,21,8,4,4,20
1,Black Forest Ham,Sandwich,219,260,4.0,1.5,0.0,30,720,42,5,8,18,30,15,4,15
2,Chicken & Bacon Ranch Melt,Sandwich,284,530,26.0,10.0,0.5,100,1100,41,3,6,36,40,25,20,20
3,Chicken Mango Curry,Sandwich,234,330,7.0,1.5,0.0,50,840,43,3,9,24,15,20,2,25
4,Chicken Tikka,Sandwich,205,290,5.0,1.0,0.0,50,720,39,2,6,23,10,10,0,25


In [8]:
subway_df["Category"].unique()

array(['Sandwich', 'Salad', 'Breakfast', 'Extra', 'Wrap', 'Bread',
       'Cheese', 'Extras', 'Sauces', 'Veggies', 'Protein', 'Seasonings'],
      dtype=object)

As the ".head()" shows, there are a number of columns in this data frame that we do not need. There are also columns that we do need but that do not fit our naming conventions. The item names, for instance, sit in a column labelled "Unnamed: 0".

The ".unique()" on the column "Category" also reveals a category in Subway's menu called "Extra". By viewing the whole dataset we determined that everything classified under "Extra" fell most in line with a dessert classification. Note that the output also lists a separate "Extras" category; the condition below matches only "Extra", so "Extras" items stay classed as food.

We grab the columns that we want and copy them to a "transformed" data frame. Then we rename the columns of the new data frame to match our naming conventions.

Next we create a condition under which the category "Extra" receives the value 1, marking it as a dessert. This is done by setting the "conditions" and "values" variables accordingly and using the NumPy function "select".

All other rows in the new "food_class" column are given the designation 3 for a food item.

Finally, the index is set to "id" and the transformed data frame is displayed with one more ".head()".

In [9]:
# extract columns desired for database
subway_transformed = subway_df[["Category", "Unnamed: 0", "Saturated Fat (g)", "Calories"]].copy()

# rename columns
subway_transformed.rename(columns={"Category": "category", 
                                   "Unnamed: 0": "item",
                                   "Saturated Fat (g)": "saturated_fat",
                                   "Calories": "calories"}, inplace=True)

# add "food_class" column
# recognizing that category == Extra are desserts in the dataset
conditions = [(subway_transformed["category"] == "Extra")]

values = [1]

subway_transformed["food_class"] = np.select(conditions, values)

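# rows that matched no condition were filled with 0; relabel them as 3 (food)
subway_transformed["food_class"] = subway_transformed["food_class"].replace(0, 3)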

# create "id" column
subway_transformed["id"] = subway_transformed.index

subway_transformed.set_index("id", inplace=True)

# display dataframe
subway_transformed.head()

id,category,item,saturated_fat,calories,food_class
0,Sandwich,BBQ Rib,10.0,580,3
1,Sandwich,Black Forest Ham,1.5,260,3
2,Sandwich,Chicken & Bacon Ranch Melt,10.0,530,3
3,Sandwich,Chicken Mango Curry,1.5,330,3
4,Sandwich,Chicken Tikka,1.0,290,3
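`np.select` assigns its `default` (0 unless told otherwise) to every row that matches none of the conditions. The toy example below shows both behaviours side by side; the frame `toy` and its rows are made up for illustration and are not the real Subway data.

```python
import numpy as np
import pandas as pd

# Toy frame with illustrative categories (not the real Subway data)
toy = pd.DataFrame({"category": ["Sandwich", "Extra", "Salad", "Extra"]})

conditions = [toy["category"] == "Extra"]
values = [1]

# Without a default, rows that match no condition get 0
toy["without_default"] = np.select(conditions, values)

# With default=3, unmatched rows are labelled as food directly
toy["food_class"] = np.select(conditions, values, default=3)

print(toy)
```

Passing `default=3` is one possible way to label the remaining rows in a single step; whether it reads better than a separate replacement is a matter of taste.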


### Second Data Frame Transformation: mcd_df

The second data frame transformation was very similar to the first. We viewed the df with ".head()" to see which columns and other data would need to be transformed.

In [10]:
mcd_df.head()

Unnamed: 0,Category,Item,Serving Size,Calories,Calories from Fat,Total Fat,Total Fat (% Daily Value),Saturated Fat,Saturated Fat (% Daily Value),Trans Fat,...,Carbohydrates,Carbohydrates (% Daily Value),Dietary Fiber,Dietary Fiber (% Daily Value),Sugars,Protein,Vitamin A (% Daily Value),Vitamin C (% Daily Value),Calcium (% Daily Value),Iron (% Daily Value)
0,Breakfast,Egg McMuffin,4.8 oz (136 g),300,120,13.0,20,5.0,25,0.0,...,31,10,4,17,3,17,10,0,25,15
1,Breakfast,Egg White Delight,4.8 oz (135 g),250,70,8.0,12,3.0,15,0.0,...,30,10,4,17,3,18,6,0,25,8
2,Breakfast,Sausage McMuffin,3.9 oz (111 g),370,200,23.0,35,8.0,42,0.0,...,29,10,4,17,2,14,8,0,25,10
3,Breakfast,Sausage McMuffin with Egg,5.7 oz (161 g),450,250,28.0,43,10.0,52,0.0,...,30,10,4,17,2,21,15,0,30,15
4,Breakfast,Sausage McMuffin with Egg Whites,5.7 oz (161 g),400,210,23.0,35,8.0,42,0.0,...,30,10,4,17,2,21,6,0,25,10


We again pick the names of the columns that fit our needs and save them as a list in a variable called "mcd_cols".

This is copied into a new data frame and the column names are updated.

Things are looking good but we are not there just yet...

In [11]:
# Create a filtered dataframe from specific columns
mcd_cols = ["Category", "Item", "Saturated Fat", "Calories"]
mcd_transformed= mcd_df[mcd_cols].copy()

# Rename the column headers for consistency
mcd_transformed = mcd_transformed.rename(columns={"Category": "category",
                                                    "Item": "item",
                                                    "Saturated Fat": "saturated_fat",
                                                    "Calories": "calories"})

mcd_transformed.head()

Unnamed: 0,category,item,saturated_fat,calories
0,Breakfast,Egg McMuffin,5.0,300
1,Breakfast,Egg White Delight,3.0,250
2,Breakfast,Sausage McMuffin,8.0,370
3,Breakfast,Sausage McMuffin with Egg,10.0,450
4,Breakfast,Sausage McMuffin with Egg Whites,8.0,400


Taking another look at the "category" column we can see all of the different classes and make decisions on which of our three "food_class" numbers should go to each.

In [12]:
# Find full list of categories
mcd_transformed['category'].unique()

array(['Breakfast', 'Beef & Pork', 'Chicken & Fish', 'Salads',
       'Snacks & Sides', 'Desserts', 'Beverages', 'Coffee & Tea',
       'Smoothies & Shakes'], dtype=object)

Just as in the Subway example, we build a "conditions" list, this time with one combined condition per food class. NumPy's "select" function then assigns the matching value from "values" to the "food_class" column.

Now the McDonald's df is looking pretty good!

In [13]:
# manually assign each category to a food_class 
conditions = [(mcd_transformed['category'] == 'Breakfast') | (mcd_transformed['category'] == 'Beef & Pork') \
                  | (mcd_transformed['category'] == 'Chicken & Fish') | (mcd_transformed['category'] == 'Salads') \
                  | (mcd_transformed['category'] == 'Snacks & Sides'),
              (mcd_transformed['category'] == 'Beverages') | (mcd_transformed['category'] == 'Smoothies & Shakes') \
                  | (mcd_transformed['category'] == 'Coffee & Tea'),
              (mcd_transformed['category'] == 'Desserts'), 
             ]

values = [3, 2, 1]

mcd_transformed['food_class'] = np.select(conditions, values)

mcd_transformed.head()

Unnamed: 0,category,item,saturated_fat,calories,food_class
0,Breakfast,Egg McMuffin,5.0,300,3
1,Breakfast,Egg White Delight,3.0,250,3
2,Breakfast,Sausage McMuffin,8.0,370,3
3,Breakfast,Sausage McMuffin with Egg,10.0,450,3
4,Breakfast,Sausage McMuffin with Egg Whites,8.0,400,3
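The category-to-class assignment above can also be written as a lookup table. The sketch below is one possible alternative, not the method used in this notebook: the dictionary name `category_to_class` and the toy frame are assumptions, and the "Unknown" row is invented to show what happens to a category that is missing from the lookup.

```python
import pandas as pd

# Hypothetical lookup table: category name -> food_class number
# (1 = dessert, 2 = drink, 3 = food, as used throughout this notebook)
category_to_class = {
    "Breakfast": 3, "Beef & Pork": 3, "Chicken & Fish": 3,
    "Salads": 3, "Snacks & Sides": 3,
    "Beverages": 2, "Coffee & Tea": 2, "Smoothies & Shakes": 2,
    "Desserts": 1,
}

# Toy frame standing in for mcd_transformed; "Unknown" is a made-up category
toy = pd.DataFrame({"category": ["Breakfast", "Desserts", "Coffee & Tea", "Unknown"]})

# map() leaves NaN for categories missing from the lookup, so gaps are easy to spot
toy["food_class"] = toy["category"].map(category_to_class)

unmapped = toy.loc[toy["food_class"].isna(), "category"].unique()
print(unmapped)
```

A lookup like this keeps the classification decisions in one place and makes unclassified categories visible instead of silently turning them into 0.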


### Third Data Frame Transformation: bk_mcd_df

In [14]:
bk_mcd_df.head()

Unnamed: 0,Chain,Item,Type,Serving Size (g),Calories,Calories from fat,Total Fat (g),Saturated Fat (g),Trans Fat (g),Chol (mg),Sodium (mg),Total Carb (g),Dietary Fiber (g),Total Sugar (g),Protein (g)
0,Burger King,Whopper Sandwich,Whopper Sandwiches,270,660,360,40,12,15,90,980,49,2,11,28
1,Burger King,Whopper Sandwich with Cheese,Whopper Sandwiches,292,740,420,46,16,2,115,1340,50,2,11,32
2,Burger King,Bacon & Cheese Whopper Sandwich,Whopper Sandwiches,303,790,460,51,17,2,125,1560,50,2,11,35
3,Burger King,Double Whopper Sandwich,Whopper Sandwiches,354,900,520,58,20,3,175,1050,49,2,11,48
4,Burger King,Double Whopper Sandwich with Cheese,Whopper Sandwiches,377,980,580,64,24,3,195,1410,50,2,11,52


When we look at this data frame in the Jupyter notebook, we see something interesting compared to the first two examples. This data frame is not for a single restaurant, but for both McDonald's and Burger King combined!

We won't deal with that yet, but it will require some new code soon.

For now we grab our columns and rename them just as in the previous two examples.

In [15]:
# Create a filtered dataframe from specific columns
bk_mcd_cols = ["Chain", "Type", "Item", "Saturated Fat (g)", "Calories"]
bk_mcd_transformed= bk_mcd_df[bk_mcd_cols].copy()

# Rename the column headers
bk_mcd_transformed = bk_mcd_transformed.rename(columns={"Type": "category",
                                                    "Item": "item",
                                                    "Saturated Fat (g)": "saturated_fat",
                                                    "Calories": "calories"})

bk_mcd_transformed.head()

Unnamed: 0,Chain,category,item,saturated_fat,calories
0,Burger King,Whopper Sandwiches,Whopper Sandwich,12,660
1,Burger King,Whopper Sandwiches,Whopper Sandwich with Cheese,16,740
2,Burger King,Whopper Sandwiches,Bacon & Cheese Whopper Sandwich,17,790
3,Burger King,Whopper Sandwiches,Double Whopper Sandwich,20,900
4,Burger King,Whopper Sandwiches,Double Whopper Sandwich with Cheese,24,980


Again, we take a look at the "category" column so that we can properly classify the data into one of our three "food_class" numbers.

In [16]:
# Find full list of categories
bk_mcd_transformed['category'].unique()

array(['Whopper Sandwiches', 'Flame Broiled Burgers', 'Chicken & More',
       'Salads & Sides', 'King Jr Meals - Entrees',
       'King Jr Meals - Sides', 'King Jr Meals - Beverages',
       'King Jr Meals - Desserts', 'Desserts', 'Breakfast',
       'Additional Options', 'Shakes/Smoothies', 'Soft Drinks',
       'Hot Coffees', 'Iced Coffees', 'Frappes', 'Sandwiches',
       'French Fries', 'Chicken & Sauce', 'Salads', 'Salad Dressings',
       'Desserts/Shakes', 'Beverages', 'McCafe Coffees - Nonfat Milk',
       'McCafe Coffees - Whole Milk', 'McCafe Frappes',
       'McCafe Smoothies'], dtype=object)

A "conditions" variable is once again created, values are again assigned using the NumPy "select" function, and our combo-restaurant data frame looks just like the two before it...

but that is not what we want...

In [17]:
# manually assign each category to a food_class 
conditions = [(bk_mcd_transformed['category'] == 'Whopper Sandwiches') | (bk_mcd_transformed['category'] == 'Flame Broiled Burgers') \
                  | (bk_mcd_transformed['category'] == 'Chicken & More') | (bk_mcd_transformed['category'] == 'Salads & Sides') \
                  | (bk_mcd_transformed['category'] == 'King Jr Meals - Entrees') | (bk_mcd_transformed['category'] == 'King Jr Meals - Sides') \
                  | (bk_mcd_transformed['category'] == 'Breakfast') | (bk_mcd_transformed['category'] == 'Additional Options') \
                  | (bk_mcd_transformed['category'] == 'Sandwiches') | (bk_mcd_transformed['category'] == 'French Fries') \
                  | (bk_mcd_transformed['category'] == 'Chicken & Sauce') | (bk_mcd_transformed['category'] == 'Salads') \
                  | (bk_mcd_transformed['category'] == 'Salad Dressings'),
              (bk_mcd_transformed['category'] == 'Beverages') | (bk_mcd_transformed['category'] == 'McCafe Coffees') \
                  | (bk_mcd_transformed['category'] == 'King Jr Meals - Beverages') | (bk_mcd_transformed['category'] == 'Shakes/Smoothies') \
                  | (bk_mcd_transformed['category'] == 'Soft Drinks') | (bk_mcd_transformed['category'] == 'Hot Coffees') \
                  | (bk_mcd_transformed['category'] == 'Iced Coffees') | (bk_mcd_transformed['category'] == 'Frappes') \
                  | (bk_mcd_transformed['category'] == 'McCafe Coffees - Nonfat Milk') | (bk_mcd_transformed['category'] == 'McCafe Coffees - Whole Milk') \
                  | (bk_mcd_transformed['category'] == 'McCafe Frappes') | (bk_mcd_transformed['category'] == 'McCafe Smoothies'),
              (bk_mcd_transformed['category'] == 'Desserts') | (bk_mcd_transformed['category'] == 'King Jr Meals - Desserts') \
                  | (bk_mcd_transformed['category'] == 'Desserts/Shakes'), 
             ]

values = [3, 2, 1]

bk_mcd_transformed['food_class'] = np.select(conditions, values)

bk_mcd_transformed.head()

Unnamed: 0,Chain,category,item,saturated_fat,calories,food_class
0,Burger King,Whopper Sandwiches,Whopper Sandwich,12,660,3
1,Burger King,Whopper Sandwiches,Whopper Sandwich with Cheese,16,740,3
2,Burger King,Whopper Sandwiches,Bacon & Cheese Whopper Sandwich,17,790,3
3,Burger King,Whopper Sandwiches,Double Whopper Sandwich,20,900,3
4,Burger King,Whopper Sandwiches,Double Whopper Sandwich with Cheese,24,980,3


Since we want each restaurant to eventually be loaded into its own table in pgAdmin, we need to separate the Burger King and McDonald's data into two separate data frames here.

First we need to remove the rows whose "saturated_fat" entry is the placeholder " -   " rather than a number, and then convert the column from comma-decimal strings to floats.

In [18]:
# remove bad data (namely the  ' -   ' values found in the original csv)
# .copy() gives an independent frame so the column assignments below do not warn
bk_mcd_transformed = bk_mcd_transformed[bk_mcd_transformed['saturated_fat'] != ' -   '].copy()

In [19]:
# convert , decimal place to . and set to float64 datatype
bk_mcd_transformed['saturated_fat'] = bk_mcd_transformed['saturated_fat'].str.replace(',', '.')
bk_mcd_transformed['saturated_fat'] = bk_mcd_transformed['saturated_fat'].astype('float64')
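The two cleaning steps above (dropping the placeholder rows and converting decimal commas) can be illustrated on a toy column. The sketch below swaps in `pd.to_numeric` with `errors="coerce"`, which is an alternative technique and not what the notebook does; the series `raw` and its values are made up.

```python
import pandas as pd

# Toy column mimicking the raw strings (values are illustrative)
raw = pd.Series(["3,5", "12", " -   ", "0,5"], name="saturated_fat")

# Swap the decimal comma for a point, then turn anything non-numeric into NaN
as_number = pd.to_numeric(raw.str.replace(",", ".", regex=False), errors="coerce")

# Drop the entries that could not be parsed
cleaned = as_number.dropna()

print(cleaned.tolist())
```

Coercing has the side effect of catching any other non-numeric placeholder as well, so it is worth checking how many values were dropped before relying on it.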

Now the data frame is ready to be split based on what information is present in the "Chain" column.

In [20]:
# Split bk and mcd into separate dataframes (.copy() so each can be modified independently)
bk_transformed =  bk_mcd_transformed.loc[bk_mcd_transformed['Chain'] == 'Burger King'].copy()
mcd_2_join_transformed =  bk_mcd_transformed.loc[bk_mcd_transformed['Chain'] == 'Mc Donalds'].copy()
mcd_2_join_transformed.head()

Unnamed: 0,Chain,category,item,saturated_fat,calories,food_class
174,Mc Donalds,Sandwiches,Hamburger,3.5,250,3
175,Mc Donalds,Sandwiches,Cheeseburger,6.0,300,3
176,Mc Donalds,Sandwiches,Double Cheeseburger,11.0,440,3
177,Mc Donalds,Sandwiches,McDouble,8.0,390,3
178,Mc Donalds,Sandwiches,Quarter Pounder with Cheese,12.0,510,3


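Unfortunately this now leaves us with two separate McDonald's data frames, one from each CSV. We combine them with a right join. Because no "on" argument is given, pandas joins on every column the two frames share, so the result keeps every row of our newly created McDonald's df and a row that appears identically in both frames is not repeated. Keep in mind that a right join does not carry over rows found only in "mcd_transformed"; an outer join would be needed to keep those as well.

This leaves us with just one McDonald's df and one Burger King df.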

In [21]:
mcd_transformed_combined = mcd_transformed.merge(mcd_2_join_transformed, how = "right")
mcd_transformed_combined

Unnamed: 0,category,item,saturated_fat,calories,food_class,Chain
0,Breakfast,Egg McMuffin,5.0,300,3,Mc Donalds
1,Breakfast,Sausage McMuffin,8.0,370,3,Mc Donalds
2,Breakfast,Sausage McMuffin with Egg,10.0,450,3,Mc Donalds
3,Breakfast,Sausage McGriddles,8.0,420,3,Mc Donalds
4,Breakfast,Hotcakes,2.0,350,3,Mc Donalds
...,...,...,...,...,...,...
292,McCafe Smoothies,Strawberry Banana Smoothie (Medium),0.0,260,2,Mc Donalds
293,McCafe Smoothies,Strawberry Banana Smoothie (Small),0.0,210,2,Mc Donalds
294,McCafe Smoothies,Wild Berry Smoothie (Large),0.5,320,2,Mc Donalds
295,McCafe Smoothies,Wild Berry Smoothie (Medium),0.0,260,2,Mc Donalds
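To make the join behaviour concrete, here is a toy comparison of a right join and an outer join when no `on` argument is given. The frames `left` and `right` and their rows are made up; `indicator=True` adds a `_merge` column showing where each row came from.

```python
import pandas as pd

# Toy frames with made-up rows; both share the columns "item" and "calories"
left = pd.DataFrame({"item": ["Egg McMuffin", "Egg White Delight"], "calories": [300, 250]})
right = pd.DataFrame({"item": ["Egg McMuffin", "Hamburger"], "calories": [300, 250]})

# No `on` argument: pandas joins on every column the two frames share
right_join = left.merge(right, how="right", indicator=True)
outer_join = left.merge(right, how="outer", indicator=True)

print(right_join)
print(outer_join)
```

The right join returns only the two rows of `right`, while the outer join also keeps the row that exists only in `left`.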


All that is left to do is remove the now unneeded "Chain" column from each of these data frames and set their index to "id".

In [22]:
# Remove the 'Chain' column from the dataframes
mcd_transformed_combined = mcd_transformed_combined.drop(columns=['Chain'])
bk_transformed = bk_transformed.drop(columns=['Chain'])

In [23]:
mcd_transformed_combined["id"] = mcd_transformed_combined.index
mcd_transformed_combined.set_index("id", inplace=True)
mcd_transformed_combined.head()

id,category,item,saturated_fat,calories,food_class
0,Breakfast,Egg McMuffin,5.0,300,3
1,Breakfast,Sausage McMuffin,8.0,370,3
2,Breakfast,Sausage McMuffin with Egg,10.0,450,3
3,Breakfast,Sausage McGriddles,8.0,420,3
4,Breakfast,Hotcakes,2.0,350,3


In [24]:
bk_transformed["id"] = bk_transformed.index
bk_transformed.set_index("id", inplace=True)
bk_transformed.head()

id,category,item,saturated_fat,calories,food_class
0,Whopper Sandwiches,Whopper Sandwich,12.0,660,3
1,Whopper Sandwiches,Whopper Sandwich with Cheese,16.0,740,3
2,Whopper Sandwiches,Bacon & Cheese Whopper Sandwich,17.0,790,3
3,Whopper Sandwiches,Double Whopper Sandwich,20.0,900,3
4,Whopper Sandwiches,Double Whopper Sandwich with Cheese,24.0,980,3
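Naming the index "id" can also be done without a temporary column. The sketch below additionally renumbers the rows from 0, which the notebook does not do explicitly, so treat that part as an assumption about what is wanted; the toy frame is made up.

```python
import pandas as pd

# Toy frame whose index starts at 174, like a slice of a larger frame
toy = pd.DataFrame({"item": ["Hamburger", "Cheeseburger"]}, index=[174, 175])

# Renumber from 0 and name the index "id" in one chain
toy = toy.reset_index(drop=True).rename_axis("id")

print(toy.index.name)
print(toy.index.tolist())
```

Leaving out `reset_index(drop=True)` keeps the original labels and only names the index.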


### Fourth Transform: starbucks_food_df and starbucks_drink_df

Neither of the Starbucks dfs needs anything we haven't already seen.

The food data frame is handled first. After viewing its contents, the columns are selected, copied, and renamed using the same methods as before.

In [25]:
starbucks_food_df.head()

Unnamed: 0,Category,Name,Calories,Calories from Fat,Total Fat(g),Saturated Fat(g),Trans Fat(g),Cholesterol(mg),Sodium(mg),Total Carbohydrate(g),Dietary Fiber(g),Sugars(g),Protein(g),Portion
0,Bakery,Chonga Bagel,300.0,45.0,5.0,2.0,0.0,10.0,530.0,50.0,3.0,5.0,12.0,113 g
1,Bakery,8-Grain Roll,340.0,40.0,5.0,0.5,0.0,0.0,430.0,68.0,4.0,15.0,9.0,127 g
2,Bakery,Almond Croissant,420.0,190.0,22.0,9.0,0.5,75.0,390.0,45.0,3.0,13.0,10.0,99 g
3,Bakery,Banana Nut Bread,420.0,190.0,22.0,3.0,0.0,65.0,320.0,52.0,2.0,30.0,6.0,125 g
4,Bakery,Birthday Cake Pop,170.0,80.0,9.0,5.0,0.0,10.0,110.0,23.0,0.0,18.0,1.0,43 g


In [26]:
#starbucks_food_df.dtypes

In [27]:
# Create a filtered dataframe from specific columns
starbs_food_cols = ["Category", "Name", "Calories", "Saturated Fat(g)"]
starbs_food_transformed= starbucks_food_df[starbs_food_cols].copy()

# Rename the column headers
starbs_food_transformed = starbs_food_transformed.rename(columns={"Category": "category",
                                                                "Name": "item",
                                                                "Saturated Fat(g)": "saturated_fat",
                                                                "Calories": "calories"
                                                                })

# Show transformed db
starbs_food_transformed.head()

Unnamed: 0,category,item,calories,saturated_fat
0,Bakery,Chonga Bagel,300.0,2.0
1,Bakery,8-Grain Roll,340.0,0.5
2,Bakery,Almond Croissant,420.0,9.0
3,Bakery,Banana Nut Bread,420.0,3.0
4,Bakery,Birthday Cake Pop,170.0,5.0


This was our favorite data frame because all of the items appeared to belong to the 3 (food) "food_class", so all we had to do was create the new column and set it equal to 3.

In [28]:
# Add new column for class designator
# Set column value equal to "food_class number 3" designating food for all
starbs_food_transformed["food_class"] = 3

starbs_food_transformed.head()

Unnamed: 0,category,item,calories,saturated_fat,food_class
0,Bakery,Chonga Bagel,300.0,2.0,3
1,Bakery,8-Grain Roll,340.0,0.5,3
2,Bakery,Almond Croissant,420.0,9.0,3
3,Bakery,Banana Nut Bread,420.0,3.0,3
4,Bakery,Birthday Cake Pop,170.0,5.0,3


We were wrong...

The "category" column here wasn't as revealing as in our previous data frames, so instead we looked at the "item" column and all of the unique items available. This revealed which items should actually be moved to the 1 (dessert) class.

The update was done using the conditions list and numpy functionality again.

In [30]:
#starbs_food_transformed["item"].unique()

In [31]:
# Update individual item to dessert class if needed
# Assign each category to a food_class manually
conditions = [(starbs_food_transformed["item"] == "Birthday Cake Pop") |
              (starbs_food_transformed["item"] == "Blueberry Oat Cake") |
              (starbs_food_transformed["item"] == "Chocolate Cake Pop") |
              (starbs_food_transformed["item"] == "Chocolate Chip Cookie") |
              (starbs_food_transformed["item"] == "Chocolate Chip Cookie Dough Cake Pop") |
              (starbs_food_transformed["item"] == "Classic Coffee Cake") |
              (starbs_food_transformed["item"] == "Confetti Sugar Cookie") |
              (starbs_food_transformed["item"] == "Double Chocolate Chunk Brownie") |
              (starbs_food_transformed["item"] == "Frosted Doughnut Cake Pop")|
              (starbs_food_transformed["item"] == "Gluten-Free Marshmallow Dream Bar") |
              (starbs_food_transformed["item"] == "Iced Lemon Loaf Cake") |
              (starbs_food_transformed["item"] == "Old-Fashioned Glazed Doughnut") |
              (starbs_food_transformed["item"] == "Strawberry Cake Pop")
             ]

# This is the value for a dessert
values = [1]

starbs_food_transformed['food_class'] = np.select(conditions, values)
        
# Show transformed db    
starbs_food_transformed.head()

Unnamed: 0,category,item,calories,saturated_fat,food_class
0,Bakery,Chonga Bagel,300.0,2.0,0
1,Bakery,8-Grain Roll,340.0,0.5,0
2,Bakery,Almond Croissant,420.0,9.0,0
3,Bakery,Banana Nut Bread,420.0,3.0,0
4,Bakery,Birthday Cake Pop,170.0,5.0,1


Because "select" overwrote the whole column, every non-dessert row is now 0. Here we set those zeros back to 3.

In [32]:
# Change all 0 values back to 3 for food_class
starbs_food_transformed["food_class"] = starbs_food_transformed["food_class"].replace(0, 3)
starbs_food_transformed.head()

Unnamed: 0,category,item,calories,saturated_fat,food_class
0,Bakery,Chonga Bagel,300.0,2.0,3
1,Bakery,8-Grain Roll,340.0,0.5,3
2,Bakery,Almond Croissant,420.0,9.0,3
3,Bakery,Banana Nut Bread,420.0,3.0,3
4,Bakery,Birthday Cake Pop,170.0,5.0,1


The same steps are followed again for the Starbucks "drinks" data frame

In [33]:
starbucks_drink_df.head()

Unnamed: 0,Category,Name,Portion(fl oz),Calories,Calories from fat,Total Fat(g),Saturated fat(g),Trans fat(g),Cholesterol(mg),Sodium(mg),Total Carbohydrate(g),Dietary Fiber(g),Sugars(g),Protein(g),Caffeine(mg),Size,Milk,Whipped Cream
0,iced-coffee,Cold Brew with Cascara Cold Foam,12.0,50,0,0.0,0.0,0.0,0,25,11,0,11,1,145,Tall,,
1,iced-coffee,Cold Brew with Cascara Cold Foam,16.0,80,0,0.0,0.0,0.0,0,30,17,0,17,2,190,Grande,,
2,iced-coffee,Cold Brew with Cascara Cold Foam,24.0,100,0,0.0,0.0,0.0,0,40,22,0,22,2,280,Venti Iced,,
3,iced-coffee,Cold Brew with Cascara Cold Foam,30.0,130,0,0.0,0.0,0.0,0,45,28,0,28,2,320,Trenta Iced,,
4,iced-coffee,Iced Coffee,30.0,160,0,0.0,0.0,0.0,0,15,40,0,39,1,280,Trenta Iced,,Sweetened


In [34]:
# Create a filtered dataframe from specific columns
starbs_drink_cols = ["Category", "Name", "Calories", "Saturated fat(g)"]
starbs_drink_transformed= starbucks_drink_df[starbs_drink_cols].copy()

# Rename the column headers
starbs_drink_transformed = starbs_drink_transformed.rename(columns={"Category": "category",
                                                                    "Name": "item",
                                                                    "Saturated fat(g)": "saturated_fat",
                                                                    "Calories": "calories"
                                                                    })

# Show transformed db
starbs_drink_transformed.head()

Unnamed: 0,category,item,calories,saturated_fat
0,iced-coffee,Cold Brew with Cascara Cold Foam,50,0.0
1,iced-coffee,Cold Brew with Cascara Cold Foam,80,0.0
2,iced-coffee,Cold Brew with Cascara Cold Foam,100,0.0
3,iced-coffee,Cold Brew with Cascara Cold Foam,130,0.0
4,iced-coffee,Iced Coffee,160,0.0


This data frame really, really was our favorite, because this time every item actually was a drink and could simply receive a blanket "2" in the "food_class" column.

In [36]:
# Add new column for class designator
# Set column value equal to "food_class number 2" designating drink for all
starbs_drink_transformed["food_class"] = 2

starbs_drink_transformed.head()

Unnamed: 0,category,item,calories,saturated_fat,food_class
0,iced-coffee,Cold Brew with Cascara Cold Foam,50,0.0,2
1,iced-coffee,Cold Brew with Cascara Cold Foam,80,0.0,2
2,iced-coffee,Cold Brew with Cascara Cold Foam,100,0.0,2
3,iced-coffee,Cold Brew with Cascara Cold Foam,130,0.0,2
4,iced-coffee,Iced Coffee,160,0.0,2


In [37]:
# See all of the different category listings
#starbs_drink_transformed["category"].unique()

So that we could have one table for Starbucks, we combined these two data frames using an outer join. Because the two frames have exactly the same columns, the join uses all of them as keys and keeps the rows from both frames, giving us our Starbucks menu data frame.

In [38]:
# Join the dataframes to get a starbucks food and drink df
starbs_menu_df = starbs_food_transformed.merge(starbs_drink_transformed, how = "outer")
starbs_menu_df

Unnamed: 0,category,item,calories,saturated_fat,food_class
0,Bakery,Chonga Bagel,300.0,2.0,3
1,Bakery,8-Grain Roll,340.0,0.5,3
2,Bakery,Almond Croissant,420.0,9.0,3
3,Bakery,Banana Nut Bread,420.0,3.0,3
4,Bakery,Birthday Cake Pop,170.0,5.0,1
...,...,...,...,...,...
2149,tea,Iced Teavana® London Fog Tea Latte,180.0,3.5,2
2150,tea,Iced Teavana® London Fog Tea Latte,180.0,0.0,2
2151,tea,Iced Teavana® London Fog Tea Latte,230.0,3.5,2
2152,tea,Iced Teavana® London Fog Tea Latte,210.0,2.5,2
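When two frames share exactly the same columns, `pd.concat` is another way to stack them. The sketch below is an alternative to the merge, shown on made-up rows; unlike a merge, `concat` never tries to match rows against each other.

```python
import pandas as pd

# Toy frames with identical columns (rows are illustrative)
food = pd.DataFrame({"item": ["Chonga Bagel"], "calories": [300.0], "food_class": [3]})
drink = pd.DataFrame({"item": ["Iced Coffee"], "calories": [160], "food_class": [2]})

# Stack the rows; ignore_index=True renumbers the result from 0
menu = pd.concat([food, drink], ignore_index=True)

print(menu)
print(menu["calories"].dtype)
```

As in the merged output above, the integer calories from the drink frame become floats because the food frame stores them as floats.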


The column "saturated_fat" was moved in front of the column "calories" here so that the layout matches our other data frames. The index is also set to "id".

In [39]:
# Adjust so saturated_fat column is before calories
mid = starbs_menu_df["saturated_fat"]
starbs_menu_df.drop(labels=["saturated_fat"], axis = 1, inplace = True)
starbs_menu_df.insert(2, "saturated_fat", mid)
starbs_menu_df["id"] = starbs_menu_df.index
starbs_menu_df.set_index("id", inplace=True)
starbs_menu_df

id,category,item,saturated_fat,calories,food_class
0,Bakery,Chonga Bagel,2.0,300.0,3
1,Bakery,8-Grain Roll,0.5,340.0,3
2,Bakery,Almond Croissant,9.0,420.0,3
3,Bakery,Banana Nut Bread,3.0,420.0,3
4,Bakery,Birthday Cake Pop,5.0,170.0,1
...,...,...,...,...,...
2149,tea,Iced Teavana® London Fog Tea Latte,3.5,180.0,2
2150,tea,Iced Teavana® London Fog Tea Latte,0.0,180.0,2
2151,tea,Iced Teavana® London Fog Tea Latte,3.5,230.0,2
2152,tea,Iced Teavana® London Fog Tea Latte,2.5,210.0,2
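Reordering columns can also be expressed by selecting them in the desired order. This is a minimal sketch on a one-row toy frame; the list name `column_order` is an assumption.

```python
import pandas as pd

# Toy frame with calories before saturated_fat (values are illustrative)
toy = pd.DataFrame({
    "category": ["Bakery"], "item": ["Chonga Bagel"],
    "calories": [300.0], "saturated_fat": [2.0], "food_class": [3],
})

# Desired order, matching the other transformed frames
column_order = ["category", "item", "saturated_fat", "calories", "food_class"]
toy = toy[column_order]

print(list(toy.columns))
```

Listing the order explicitly also documents the layout the database table expects.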


# L - Load

## Create database connection

In this step we connect to the database that was created in pgAdmin before this Jupyter notebook was run.

Remember: before running the following cells, you need to have run the "ERD.sql" file in pgAdmin and created a "config.py" file with your username and password. Please be sure to follow our provided readme closely before proceeding.

In [None]:
# Update Username and Password for pgAdmin
# Also update Database Name to match what you created at the start
connection_string = f"{username}:{password}@localhost:5432/FastFood_db"

# Create the engine
engine = create_engine(f'postgresql://{connection_string}')
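One thing to watch for in a hand-built connection string: a password containing characters such as "@" or "/" can be misread as part of the URL. The sketch below shows one common remedy, percent-encoding with the standard library's `quote_plus`. The credentials are placeholders invented for the example.

```python
from urllib.parse import quote_plus

# Placeholder credentials for illustration only
username = "postgres"
password = "p@ss/word"

# Percent-encode the password so "@" and "/" cannot be mistaken for URL syntax
connection_string = f"{username}:{quote_plus(password)}@localhost:5432/FastFood_db"
database_url = f"postgresql://{connection_string}"

print(database_url)
```

The resulting string can be passed to `create_engine` in the same way as the one above.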

To check that everything is set up and the config.py file is being read correctly, see whether the engine connected to your pgAdmin database returns the expected table names!

In [None]:
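# Confirm tables
# You should see ["McDonalds", "Burger_King", "Starbucks", "Subway", "Food_Classes"]
# (engine.table_names() was removed in SQLAlchemy 2.0; the inspector works in 1.4 and 2.x)
from sqlalchemy import inspect
inspect(engine).get_table_names()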

## Load DataFrames into database

Using the pandas function ".to_sql" we can load the data frames we transformed in this Jupyter notebook through our connected engine. Each call uses if_exists='append', so rows are added to the tables that "ERD.sql" created. If all of the steps have been followed up to this point, after running the next four cells you can switch over to pgAdmin to query your new, fully populated tables!

In [None]:
# Use "to_sql" function to load all transformed dfs' data into postgres

# Starbucks
starbs_menu_df.to_sql(name='Starbucks', con=engine, if_exists='append', index=True)

In [None]:
# Subway
subway_transformed.to_sql(name='Subway', con=engine, if_exists='append', index=True)

In [None]:
# McDonalds
mcd_transformed_combined.to_sql(name='McDonalds', con=engine, if_exists='append', index=True)

In [None]:
# Burger King
bk_transformed.to_sql(name='Burger_King', con=engine, if_exists='append', index=True)
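A note on `if_exists='append'`: running a load cell twice adds the rows twice. The sketch below demonstrates this with an in-memory SQLite database, used purely as a stand-in for PostgreSQL so the example runs without a server. The table name `demo_menu` and the toy rows are made up.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used only as a stand-in for PostgreSQL
conn = sqlite3.connect(":memory:")

# Toy frame with its index named "id", like the transformed frames above
toy = pd.DataFrame({"item": ["Hamburger", "Cheeseburger"]}).rename_axis("id")

# First load creates the table and writes two rows
toy.to_sql(name="demo_menu", con=conn, if_exists="append", index=True)
# Second load appends the same two rows again
toy.to_sql(name="demo_menu", con=conn, if_exists="append", index=True)

row_count = pd.read_sql("SELECT COUNT(*) AS n FROM demo_menu", conn)["n"].iloc[0]
print(row_count)
conn.close()
```

If the target table declares "id" as a primary key (an assumption about the schema, not something shown here), a repeated load would be expected to fail with a duplicate-key error rather than silently doubling the data.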