# Entity Decomposition

The `recipes_at` entity in the raw dataset contains two types of properties: recipes and ingredients.

Create a staging table, `ingredients`, that holds every occurrence of an ingredient in a recipe. We'll use it to build the quantity table.

`ingredients` is the table of recipe and ingredient combinations. It is the intermediary table for `quantity_at`.

# Creating ingredients_at and quantity_at
We create an intermediary table with one row for every ingredient in each recipe, with the recipe id and the ingredient name as the only fields.

In [None]:
%%bigquery
CREATE TABLE shidcs329e.magazine_recipes_stg.ingredients AS
SELECT
  recipe_id,
  ingredient_1 AS ingredient_name
FROM
  shidcs329e.magazine_recipes_stg.recipe_ingredient_at
WHERE
  ingredient_1 IS NOT NULL

UNION ALL

SELECT
  recipe_id,
  ingredient_2 AS ingredient_name
FROM
  shidcs329e.magazine_recipes_stg.recipe_ingredient_at
WHERE
  ingredient_2 IS NOT NULL

UNION ALL

SELECT
  recipe_id,
  ingredient_3 AS ingredient_name
FROM
  shidcs329e.magazine_recipes_stg.recipe_ingredient_at
WHERE
  ingredient_3 IS NOT NULL

UNION ALL

SELECT
  recipe_id,
  ingredient_4 AS ingredient_name
FROM
  shidcs329e.magazine_recipes_stg.recipe_ingredient_at
WHERE
  ingredient_4 IS NOT NULL

UNION ALL

SELECT
  recipe_id,
  ingredient_5 AS ingredient_name
FROM
  shidcs329e.magazine_recipes_stg.recipe_ingredient_at
WHERE
  ingredient_5 IS NOT NULL

UNION ALL

SELECT
  recipe_id,
  ingredient_6 AS ingredient_name
FROM
  shidcs329e.magazine_recipes_stg.recipe_ingredient_at
WHERE
  ingredient_6 IS NOT NULL;
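
The wide-to-long reshaping above can be tried out locally before running it in BigQuery. The following is a minimal sketch, assuming Python's standard-library `sqlite3` module as a stand-in for BigQuery and a toy `recipe_ingredient_at` table with three ingredient slots instead of six. The sample rows are made up for illustration.

```python
import sqlite3

# Toy stand-in for recipe_ingredient_at (hypothetical data, three slots only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recipe_ingredient_at ("
    "recipe_id INTEGER, ingredient_1 TEXT, ingredient_2 TEXT, ingredient_3 TEXT)"
)
conn.executemany(
    "INSERT INTO recipe_ingredient_at VALUES (?, ?, ?, ?)",
    [
        (1, "Flour", "Sugar", None),
        (2, "Eggs", None, None),
    ],
)

# One SELECT per ingredient slot, stacked with UNION ALL; NULL slots are skipped.
slots = ["ingredient_1", "ingredient_2", "ingredient_3"]
unpivot_sql = "\nUNION ALL\n".join(
    f"SELECT recipe_id, {slot} AS ingredient_name "
    f"FROM recipe_ingredient_at WHERE {slot} IS NOT NULL"
    for slot in slots
)

# Sort so the result does not depend on the order the branches are evaluated in.
ingredient_rows = sorted(conn.execute(unpivot_sql).fetchall())
print(ingredient_rows)
```

Each recipe contributes one row per non-null slot, so recipe 1 yields two rows and recipe 2 yields one.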



# Primary Keys

Let's check the highest existing ingredient_id in the bird ingredients table.


In [None]:
%%bigquery
SELECT MAX(ingredient_id) FROM magazine_recipes_raw.ingredients



Unnamed: 0,f0_
0,4644


Any id above 4644 can be used for our new ingredients. `unique_ingredient_with_id` holds all the unique ingredients from the Airtable recipes, joined with their existing ids for the ingredients that already appear in the bird ingredients table (so the same ingredient never gets two different ids). Ingredients with no match have a null id at this stage.

In [None]:
%%bigquery
DROP TABLE IF EXISTS shidcs329e.magazine_recipes_stg.unique_ingredient_with_id;
CREATE TABLE shidcs329e.magazine_recipes_stg.unique_ingredient_with_id AS
SELECT DISTINCT
  LOWER(si.ingredient_name) AS ingredient_name,
  ri.ingredient_id
FROM
  magazine_recipes_stg.ingredients AS si
LEFT JOIN
  magazine_recipes_raw.ingredients AS ri
ON
  LOWER(si.ingredient_name) = ri.name
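
To see what this lookup produces, here is a small sketch using Python's standard-library `sqlite3` module in place of BigQuery. The table names `stg_ingredients` and `raw_ingredients` and their rows are hypothetical stand-ins, and the sketch assumes, as the join condition does, that names in the raw table are already lowercase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_ingredients (recipe_id INTEGER, ingredient_name TEXT);
    CREATE TABLE raw_ingredients (ingredient_id INTEGER, name TEXT);
    -- Made-up rows: 'Flour' and 'flour' should collapse to one ingredient.
    INSERT INTO stg_ingredients VALUES (1, 'Flour'), (2, 'flour'), (2, 'Saffron');
    INSERT INTO raw_ingredients VALUES (10, 'flour'), (11, 'sugar');
    """
)

# Same shape as the notebook query: lowercase the staging name, then look up
# an existing id; unmatched ingredients come back with a NULL id.
lookup = conn.execute(
    """
    SELECT DISTINCT LOWER(si.ingredient_name) AS ingredient_name, ri.ingredient_id
    FROM stg_ingredients AS si
    LEFT JOIN raw_ingredients AS ri
      ON LOWER(si.ingredient_name) = ri.name
    """
).fetchall()
lookup = sorted(lookup, key=lambda row: row[0])
print(lookup)
```

Here `flour` picks up its existing id and `saffron` has none yet. If the raw names were stored in mixed case, this condition would miss them; wrapping `ri.name` in `LOWER()` as well would be one way to guard against that.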


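Use COALESCE() to replace the null ingredient_ids with new ids of the form 4645 + ROW_NUMBER(). These are all above the current maximum of 4644, so there is no overlap with existing ids. Because ROW_NUMBER() counts every row, not only the rows with a null id, the new ids are unique but not necessarily consecutive. `ingredient_id_no_nulls` is our complete ingredient table for the Airtable recipes.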

In [None]:
%%bigquery
CREATE TABLE shidcs329e.magazine_recipes_stg.ingredient_id_no_nulls AS
SELECT
  ingredient_name,
  COALESCE(ingredient_id, 4645 + ROW_NUMBER() OVER()) AS ingredient_id
FROM
  magazine_recipes_stg.unique_ingredient_with_id
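
The id assignment can be checked on a toy table. This is a minimal sketch, assuming Python's standard-library `sqlite3` module (built against SQLite 3.25 or later, which is needed for window functions) as a stand-in for BigQuery. The sample rows are invented; only the offset 4645 and the maximum 4644 come from the notebook.

```python
import sqlite3

MAX_EXISTING_ID = 4644  # result of the MAX(ingredient_id) query above

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE unique_ingredient_with_id (ingredient_name TEXT, ingredient_id INTEGER);
    -- Made-up rows: two ingredients already have ids, two do not.
    INSERT INTO unique_ingredient_with_id VALUES
      ('flour', 10), ('saffron', NULL), ('sugar', 11), ('tahini', NULL);
    """
)

# Keep an existing id where there is one; otherwise use 4645 + row number.
rows = conn.execute(
    """
    SELECT ingredient_name,
           COALESCE(ingredient_id, 4645 + ROW_NUMBER() OVER ()) AS ingredient_id
    FROM unique_ingredient_with_id
    """
).fetchall()

ids = {name: ingredient_id for name, ingredient_id in rows}
new_ids = [i for i in ids.values() if i > MAX_EXISTING_ID]
print(ids)
```

Existing ids are left untouched, and every generated id lands above 4644. The exact values of the generated ids depend on the row order the engine happens to use, so they should not be relied on across runs.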



Join `ingredient_id_no_nulls` with `ingredients` to build `quantity_at`, the junction table between the Airtable recipes and ingredients. Each row has a quantity id, the recipe id, the ingredient id, the data source and the load time.

In [10]:
%%bigquery
DROP TABLE IF EXISTS shidcs329e.magazine_recipes_stg.quantity_at;
CREATE TABLE shidcs329e.magazine_recipes_stg.quantity_at AS
SELECT
  (6357 + ROW_NUMBER() OVER()) AS quantity_id,
  si.recipe_id,
  iinn.ingredient_id,
  'Airtable' AS data_source,
  CURRENT_DATETIME() AS load_time
FROM
  magazine_recipes_stg.ingredient_id_no_nulls AS iinn
RIGHT JOIN
  magazine_recipes_stg.ingredients AS si
ON
  LOWER(iinn.ingredient_name) = LOWER(si.ingredient_name)
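
A quick sanity check for a junction table like this is that it has exactly one row per ingredient occurrence and that no row is left without an ingredient id. The sketch below illustrates the idea with Python's standard-library `sqlite3` module and made-up rows. It uses a LEFT JOIN from the occurrence table, which keeps the same rows as the RIGHT JOIN above with the table order swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ingredients (recipe_id INTEGER, ingredient_name TEXT);
    CREATE TABLE ingredient_id_no_nulls (ingredient_name TEXT, ingredient_id INTEGER);
    -- Hypothetical sample data for illustration only.
    INSERT INTO ingredients VALUES (1, 'Flour'), (1, 'Sugar'), (2, 'flour');
    INSERT INTO ingredient_id_no_nulls VALUES ('flour', 10), ('sugar', 4646);
    """
)

# Every (recipe, ingredient) occurrence is kept and matched case-insensitively.
quantity_rows = conn.execute(
    """
    SELECT si.recipe_id, iinn.ingredient_id
    FROM ingredients AS si
    LEFT JOIN ingredient_id_no_nulls AS iinn
      ON LOWER(iinn.ingredient_name) = LOWER(si.ingredient_name)
    """
).fetchall()

occurrence_count = conn.execute("SELECT COUNT(*) FROM ingredients").fetchone()[0]
missing_ids = [row for row in quantity_rows if row[1] is None]

print(len(quantity_rows) == occurrence_count, missing_ids)
```

If the row counts differ, some ingredient name matched more than one id; if `missing_ids` is non-empty, some occurrence found no match in the id table.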



Create the final intermediary table for the Airtable ingredients, with the unique ingredient id, the ingredient name, the data source and the load time.

In [12]:
%%bigquery
CREATE TABLE magazine_recipes_stg.ingredients_at AS
SELECT
  ingredient_id,
  ingredient_name,
  "Airtable" AS data_source,
  CURRENT_DATETIME() AS load_time
FROM
  magazine_recipes_stg.ingredient_id_no_nulls



# Creating recipes_at
Finally, create the `recipes_at` intermediary table, which leaves out all ingredient columns and adds the data source and the load time.

In [14]:
%%bigquery
CREATE TABLE magazine_recipes_stg.recipes_at AS
SELECT
  recipe_id,
  name,
  rating,
  ease_of_prep,
  note,
  type,
  prep_time,
  cookbook,
  page,
  slowcooker,
  link,
  last_made,
  "Airtable" AS data_source,
  CURRENT_DATETIME() AS load_time
FROM
  magazine_recipes_stg.recipe_ingredient_at


# Cleanup
Delete all of the intermediary tables that we will not be using in future projects!

In [20]:
%%bigquery
drop table magazine_recipes_stg.recipe_ingredient_at;
drop table magazine_recipes_stg.ingredients;
drop table magazine_recipes_stg.unique_ingredient_with_id;
drop table magazine_recipes_stg.ingredient_id_no_nulls
