# Final Project
Scenarios implemented:
1. Have AI write directions for any recipe that does not already have them.
2. Have AI identify the ingredients mentioned in those directions, then compare them with the ingredients already stored in the database to determine what needs to be added to the Quantity table.
3. For any ingredient not already in our database, have AI assign it a category before we add it to the Ingredients table.
4. Have AI determine the ease of prep of a recipe based on that recipe's directions.
5. Create AI-generated comments for the recipes in our database (this logic is implemented in a separate notebook).


## Enrich missing fields from Recipes
Several fields in the Recipes table are empty because we had to fill them with nulls when merging our two recipe sources.

## Directions

In [None]:
# some recipes from bird have titles in a different format (wrapped in hyphens); explore why
%%bigquery
select r.recipe_id, r.title, i.name, n.*, q.quantity_id
  from magazine_recipes_stg.Recipes as r
  FULL JOIN magazine_recipes_stg.Quantity as q
    ON q.recipe_id = r.recipe_id
  FULL JOIN magazine_recipes_stg.Ingredients as i
    ON i.ingredient_id = q.ingredient_id
  FULL JOIN magazine_recipes_stg.Nutrition as n
    ON n.recipe_id = r.recipe_id
  WHERE r.title LIKE '-%-'


Unnamed: 0,recipe_id,title,name,recipe_id_1,protien,carbo,alcohol,total_fat,sat_fat,cholestrl,...,vitamin_a,fiber,pcnt_cal_carb,pcnt_cal_fat,pcnt_cal_prot,calories,health_rating,data_source,load_time,quantity_id
0,901,-Maple Flavored Syrup-,maple flavored syrup,901,0.00,26.00,0.0,0.00,0.00,0.00,...,0.00,0.00,100.00,0.00,0.00,104.00,3,bird-ai,2024-01-27 00:11:11.060078+00:00,6325
1,714,-Sauteed Mushrooms-,lowfat margarine,714,2.41,5.31,0.0,3.29,0.62,0.00,...,239.76,0.85,35.12,48.95,15.94,60.50,3,bird-ai,2024-01-27 00:11:11.060078+00:00,4649
2,714,-Sauteed Mushrooms-,mushroom,714,2.41,5.31,0.0,3.29,0.62,0.00,...,239.76,0.85,35.12,48.95,15.94,60.50,3,bird-ai,2024-01-27 00:11:11.060078+00:00,4648
3,895,-Soy Sauce-,soy sauce,895,0.93,1.53,0.0,0.01,0.00,0.00,...,0.00,0.00,61.40,1.30,37.30,9.98,1,bird-ai,2024-01-27 00:11:11.060078+00:00,4858
4,879,-Chocolate Milk Shake-,chocolate ice cream,879,9.88,33.04,0.0,12.14,7.37,34.92,...,729.07,0.05,47.04,38.89,14.07,280.92,3,bird-ai,2024-01-27 00:11:11.060078+00:00,4917
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
288,918,-Dry Curd Cottage Cheese-,dry curd cottage cheese,918,25.04,2.68,0.0,0.61,0.40,9.72,...,43.50,0.00,9.22,4.71,86.07,116.38,3,bird-ai,2024-01-27 00:11:11.060078+00:00,4880
289,916,-Steak Sauce-,steak sauce,916,0.00,2.07,0.0,0.00,0.00,0.00,...,0.00,0.00,100.00,0.00,0.00,8.28,1,bird-ai,2024-01-27 00:11:11.060078+00:00,4878
290,860,-Herb Tea-,herb tea,860,0.00,0.45,0.0,0.05,0.00,0.00,...,0.00,0.00,81.63,18.37,0.00,2.23,3,bird-ai,2024-01-27 00:11:11.060078+00:00,4824
291,791,-Unsweetened Canned Pineapple-,canned pineapple chunks in juice,791,0.60,22.29,0.0,0.11,0.01,0.00,...,53.96,0.50,96.32,1.10,2.58,92.58,3,bird-ai,2024-01-27 00:11:11.060078+00:00,4730


Some recipes from bird appear to be stored as both recipes and ingredients, with a one-to-one overlap. This is likely because they carry nutritional information, which is stored only for recipes, so we must keep them in this form.
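
The overlap can be checked with a small sketch. The helper names and sample rows are illustrative assumptions, not part of the pipeline; the check mirrors the `LIKE '-%-'` filter and compares the unwrapped title with the joined ingredient name.

```python
def is_single_item_title(title):
    """Mirror the SQL filter `title LIKE '-%-'`: starts and ends with a hyphen."""
    return len(title) >= 2 and title.startswith("-") and title.endswith("-")


def title_as_ingredient(title):
    """Strip the wrapping hyphens and lowercase, e.g. '-Soy Sauce-' -> 'soy sauce'."""
    return title.strip("-").strip().lower()


# (title, ingredient name) pairs shaped like the rows in the output above
rows = [
    ("-Soy Sauce-", "soy sauce"),
    ("-Sauteed Mushrooms-", "mushroom"),
    ("Lentil Soup", "lentils"),
]

# Titles whose unwrapped form is exactly the ingredient they are joined to
one_to_one = [
    title
    for title, ingredient in rows
    if is_single_item_title(title) and title_as_ingredient(title) == ingredient
]
```

Titles such as `-Sauteed Mushrooms-` fail the check because none of their ingredients repeats the title.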

In [None]:
# some of these single-item recipes are linked to one ingredient in the database and others to none
%%bigquery
select r.recipe_id, r.title, count(i.name)
  from magazine_recipes_stg.Recipes as r
  FULL JOIN magazine_recipes_stg.Quantity as q
    ON q.recipe_id = r.recipe_id
  FULL JOIN magazine_recipes_stg.Ingredients as i
    ON i.ingredient_id = q.ingredient_id
  WHERE r.title LIKE '-%-'
  GROUP BY r.recipe_id, r.title
  HAVING count(i.name) < 2


Unnamed: 0,recipe_id,title,f0_
0,766,-Kumquats-,0
1,727,-Beets-,1
2,797,-Canned Grapefruit Slices-,1
3,883,-Cola Drink-,1
4,920,-Whole Unsalted Cashew Nuts-,1
...,...,...,...
190,767,-Cantaloupe-,1
191,898,-Jam-,1
192,932,-Cheerios Cereal-,1
193,840,-Bagels-,0


In [None]:
# this query lists all the ingredients the database has for each recipe; we will pass this list to the AI in the prompt
%%bigquery
SELECT
  r.recipe_id,
  r.title,
  STRING_AGG(i.name, ', ') AS all_ingredients
FROM
  magazine_recipes_stg.Recipes AS r
LEFT JOIN magazine_recipes_stg.Quantity AS q
ON q.recipe_id = r.recipe_id
LEFT JOIN magazine_recipes_stg.Ingredients AS i
ON i.ingredient_id = q.ingredient_id
GROUP BY
  r.recipe_id, r.title



Unnamed: 0,recipe_id,title,all_ingredients
0,39,Lentil Soup,"lentils, vegetarian, onion"
1,35,Balsamic Potatoes and Asparagus,asparagus
2,28,Kung Pao Chicken,"water chestnuts, boneless chicken, peanuts"
3,63,Roasted Sweet Potato Lentil Salad,"lentils, spinach, sweet potato, celery"
4,70,Lentil Curry,"lentils, onion"
...,...,...,...
964,59,Balsamic Dijon Root Vegetables,onion
965,739,-Boiled Onions-,onion
966,55,Sugar Cookies,flour
967,44,Chocolate Chip Irish Soda Bread,flour


In [5]:
# Ask AI to write directions for recipes that do not already have directions
%%bigquery
declare prompt_query STRING default " Create directions for the recipe based on its title and the ingredients in it. If there are ingredients mentioned in the directions that were not given to you, indicate what they are. Do not mention specific quantities of ingredients in the directions. Return output with recipe_id, title, directions, ingredients_list (of ingredients in directions that were not given to you)";
select *
from ML.generate_text(
  model remote_models.gemini_pro,
  (
    select concat(prompt_query, to_json_string(json_object("recipe_id", r.recipe_id, "title", title))) as prompt
    from magazine_recipes_stg.Recipes as r
    INNER JOIN magazine_recipes_stg.Quantity as q
      ON q.recipe_id = r.recipe_id
    INNER JOIN magazine_recipes_stg.Ingredients as i
      ON i.ingredient_id = q.ingredient_id
    where directions is null
    order by r.recipe_id
    limit 5
  ),
  struct(TRUE as flatten_json_output)
);


Unnamed: 0,ml_generate_text_llm_result,ml_generate_text_rai_result,ml_generate_text_status,prompt
0,,"[{""category"":1,""probability"":1,""probability_sc...",,Create directions for the recipe based on its...
1,,"[{""category"":1,""probability"":1,""probability_sc...",,Create directions for the recipe based on its...
2,## Dutch Oven Bread\n\n**Ingredients:**\n\n* 3...,"[{""category"":1,""probability"":1,""probability_sc...",,Create directions for the recipe based on its...
3,"```json\n{""recipe_id"":71,""title"":""Dutch Oven B...","[{""category"":1,""probability"":1,""probability_sc...",,Create directions for the recipe based on its...
4,## Rye Bread\n\n**Ingredients:**\n\n* Rye flou...,"[{""category"":1,""probability"":1,""probability_sc...",,Create directions for the recipe based on its...
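
The raw results above arrive in three shapes: empty, JSON wrapped in a Markdown code fence, and free-form Markdown. A minimal sketch of how such responses could be normalized in Python before loading them into a table is shown here; `parse_llm_json` is a hypothetical helper, and the sample strings are shortened stand-ins rather than actual model output.

```python
import json
import re

# Matches a response that is entirely one fenced block, with an optional "json" tag.
FENCE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)
TICKS = "`" * 3  # built this way only to keep literal fences out of the example


def parse_llm_json(raw):
    """Return the parsed JSON object, or None if the response is empty or not JSON."""
    if raw is None or not raw.strip():
        return None
    text = raw.strip()
    match = FENCE.match(text)
    if match:
        text = match.group(1)
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


samples = [
    None,
    TICKS + 'json\n{"recipe_id": 71, "title": "Dutch Oven Bread"}\n' + TICKS,
    "## Rye Bread\n\n**Ingredients:**\n\n* Rye flour",
]
parsed = [parse_llm_json(sample) for sample in samples]
```

Responses that come back as `None` are the ones that would need a stricter prompt or a retry.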


In [None]:
# look at examples of directions already in the database for comparison with future AI output
%%bigquery
SELECT title, directions FROM magazine_recipes_stg.Recipes WHERE directions IS NOT NULL LIMIT 10


Unnamed: 0,title,directions
0,Orange Date Shake,Combine dates and orange juice in blender and ...
1,After School Fruit Cup,"Combine grapes, apple, orange, cantaloupe and ..."
2,Hazelnut Bleu Cheese Dressing,"Combine buttermilk, eggs, vinegar, garlic, sal..."
3,B.L.T. Salad,Cut 18 center slices from tomatoes. Lightly sa...
4,Sweet and Sour Cherry Ham Salad,Fill salad bowl half full with torn salad gree...
5,Applesauce,"In medium saucepan, combine all ingredients ex..."
6,Kiwifruit Popsicles,Peel kiwifruit. Process in blender or food pro...
7,Quick Kiwifruit Refresher,Combine all ingredients in a blender and blend...
8,T.B.P.B. Smoothie,Blend in blender until smooth.
9,Cherry Ambrosia Salad,"Layer fruit in salad bowl, sprinkle each layer..."


In [None]:
# Begin prompt engineering: specify the format to match that of the existing directions
%%bigquery
declare prompt_query STRING default " Create directions based on the ingredients and title of the recipe. All in one line, no numbering of steps. Return output with recipe_id, title, ingredients, and directions as json. Give best attempt at creating directions, no null directions.";
create or replace table magazine_recipes_stg_ai.recipes_directions as
select *
from ML.generate_text(
  model remote_models.gemini_pro,
  (
    SELECT concat(prompt_query, to_json_string(json_object("recipe_id", recipe_id, "title", title, "ingredients", all_ingredients))) as prompt
     FROM (
     SELECT
      r.recipe_id,
      r.title,
      STRING_AGG(i.name, ', ') AS all_ingredients
    FROM
      magazine_recipes_stg.Recipes AS r
    LEFT JOIN magazine_recipes_stg.Quantity AS q
    ON q.recipe_id = r.recipe_id
    LEFT JOIN magazine_recipes_stg.Ingredients AS i
    ON i.ingredient_id = q.ingredient_id
    WHERE r.directions IS NULL
    AND r.title NOT LIKE '-%-'
    GROUP BY
      r.recipe_id, r.title)
  ),
  struct(TRUE as flatten_json_output)
);


In [7]:
%%bigquery
SELECT * FROM magazine_recipes_stg_ai.recipes_directions
limit 5


Unnamed: 0,ml_generate_text_llm_result,ml_generate_text_rai_result,ml_generate_text_status,prompt
0,"{""ingredients"":""milk"",""recipe_id"":20,""title"":""...",,,Create directions based on the ingredients an...
1,"{""ingredients"":""noodles, sausage"",""recipe_id"":...",,,Create directions based on the ingredients an...
2,"{""ingredients"":""bacon"",""recipe_id"":127,""title""...",,,Create directions based on the ingredients an...
3,"{""ingredients"":""spinach, lentils, sweet potato...",,,Create directions based on the ingredients an...
4,"{""ingredients"":""asparagus"",""recipe_id"":35,""tit...",,,Create directions based on the ingredients an...


In [None]:
# We will now store the AI output in a table with directions
%%bigquery
CREATE OR REPLACE TABLE magazine_recipes_stg_ai.directions AS
select json_value(ml_generate_text_llm_result, '$.recipe_id') as recipe_id,
  json_value(ml_generate_text_llm_result, '$.title') as title,
  json_value(ml_generate_text_llm_result, '$.directions') as recipe_directions,
  json_value(ml_generate_text_llm_result, '$.ingredients') as database_ingredients
from magazine_recipes_stg_ai.recipes_directions


In [None]:
# The AI failed to write directions for some of the recipes; we now ask again for only those recipes, with clearer instructions
%%bigquery
declare prompt_query STRING default " You must write directions for the following recipe WITH LESS THAN 30 WORDS. All in one line, no numbering of steps. Return output with recipe_id, title, and directions as a VALID CLOSED JSON. ";
create or replace table magazine_recipes_stg_ai.recipes_directions_for_nulls as
select *
from ML.generate_text(
  model remote_models.gemini_pro,
  (
    SELECT concat(prompt_query, to_json_string(json_object("recipe_id", recipe_id, "title", title))) as prompt
     FROM (
     SELECT
      recipe_id,
      title,
    FROM
    magazine_recipes_stg_ai.directions
    WHERE recipe_directions IS NULL)
  ),
  struct(TRUE as flatten_json_output)
);


In [8]:
%%bigquery
SELECT * FROM magazine_recipes_stg_ai.recipes_directions_for_nulls


Unnamed: 0,ml_generate_text_llm_result,ml_generate_text_rai_result,ml_generate_text_status,prompt
0,,,,You must write directions for the following r...
1,"{""recipe_id"":""105"",""title"":""Shepherd's Pie"",""d...",,,You must write directions for the following r...
2,"{""recipe_id"":""58"",""title"":""Naan"",""directions"":...",,,You must write directions for the following r...
3,"{""recipe_id"":""55"",""title"":""Sugar Cookies"",""dir...",,,You must write directions for the following r...
4,"{""recipe_id"":null,""title"":null}",,,You must write directions for the following r...
5,"{""recipe_id"":""56"",""title"":""Potato Curry"",""dire...",,,You must write directions for the following r...
6,"{""recipe_id"":""117"",""title"":""Cloverleaf Rolls"",...",,,You must write directions for the following r...
7,"{""recipe_id"":""45"",""title"":""Malteese Gilatti"",""...",,,You must write directions for the following r...


In [None]:
# Store the output in a table for just the ones that were originally null
%%bigquery
CREATE OR REPLACE TABLE magazine_recipes_stg_ai.directions_for_nulls AS
select json_value(ml_generate_text_llm_result, '$.recipe_id') as recipe_id,
  json_value(ml_generate_text_llm_result, '$.title') as title,
  json_value(ml_generate_text_llm_result, '$.directions') as recipe_directions
from magazine_recipes_stg_ai.recipes_directions_for_nulls



In [None]:
# Examine the table
%%bigquery
SELECT * FROM magazine_recipes_stg_ai.directions_for_nulls


Unnamed: 0,recipe_id,title,recipe_directions
0,,,
1,,,
2,45.0,Malteese Gilatti,Mix all ingredients and bake at 350 degrees fo...
3,55.0,Sugar Cookies,Preheat oven to 375 degrees F (190 degrees C)....
4,56.0,Potato Curry,"Sauté potatoes, onions, and garlic in oil. Add..."
5,58.0,Naan,"Combine flour, salt, and yeast in a large bowl..."
6,105.0,Shepherd's Pie,"Brown the ground beef and onion. Add the peas,..."
7,117.0,Cloverleaf Rolls,"Combine warm water, sugar, and yeast in a larg..."


In [None]:
# update the directions table with these new values
%%bigquery
UPDATE magazine_recipes_stg_ai.directions d
SET d.recipe_directions = dn.recipe_directions
FROM magazine_recipes_stg_ai.directions_for_nulls AS dn
WHERE d.recipe_id = dn.recipe_id
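
The merge this update performs can be pictured on plain dictionaries: only recipes whose directions are still null are filled, and null retry results are skipped. `fill_missing_directions` and the placeholder strings are assumptions made for illustration.

```python
def fill_missing_directions(directions, retries):
    """Return a copy of `directions` with null entries filled from `retries`."""
    merged = dict(directions)
    for recipe_id, text in retries.items():
        # Skip retry rows the model left empty and recipes that already have text.
        if recipe_id in merged and merged[recipe_id] is None and text:
            merged[recipe_id] = text
    return merged


# Placeholder values, not real directions from the tables
directions = {"55": None, "64": None, "20": "first-pass text"}
retries = {"55": "retry text", None: None}
merged = fill_missing_directions(directions, retries)
```

Recipe `64` stays null in this sketch because the retry produced nothing for it, which matches the one recipe the model kept failing on.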


In [None]:
# it appears the model still had trouble with the chocolate chip cookies recipe despite several attempts at writing prompts directed at that recipe
%%bigquery
SELECT * FROM magazine_recipes_stg_ai.directions  WHERE recipe_directions IS NULL


Unnamed: 0,recipe_id,title,recipe_directions,database_ingredients
0,,,,
1,64.0,Chewy Chocolate Chip COokies,,"butter, flour"


In [None]:
# Use AI to determine if any additional ingredients were mentioned in the directions that are not in the database for that recipe
%%bigquery
declare prompt_query STRING default "Identify all ingredients mentioned in the directions of the recipe. Return output with recipe_id, title, directions, and mentioned_ingredients";
create or replace table magazine_recipes_stg_ai.direction_ingredients as
select *
from ML.generate_text(
  model remote_models.gemini_pro,
  (
    SELECT concat(prompt_query, to_json_string(json_object("recipe_id", recipe_id, "title", title, "recipe_directions", recipe_directions))) as prompt
     FROM magazine_recipes_stg_ai.directions
  ),
  struct(TRUE as flatten_json_output)
);


In [12]:
# explore what mentioned ingredients the AI identified
%%bigquery
SELECT * FROM magazine_recipes_stg_ai.direction_ingredients


Unnamed: 0,ml_generate_text_llm_result,ml_generate_text_rai_result,ml_generate_text_status,prompt
0,The provided context does not include any reci...,,,Identify all ingredients mentioned in the dire...
1,The provided context does not include any reci...,,,Identify all ingredients mentioned in the dire...
2,The provided context does not include any reci...,,,Identify all ingredients mentioned in the dire...
3,"{""recipe_directions"":""Blend figs and milk unti...",,,Identify all ingredients mentioned in the dire...
4,"{""recipe_directions"":""Brown the beef and add t...",,,Identify all ingredients mentioned in the dire...
...,...,...,...,...
126,"{""recipe_id"":""93"",""title"":""Barley Beef Skillet...",,,Identify all ingredients mentioned in the dire...
127,"{""recipe_id"":""94"",""title"":""Southwest Beef & Ri...",,,Identify all ingredients mentioned in the dire...
128,"{""recipe_id"":""95"",""title"":""Fried Rice"",""direct...",,,Identify all ingredients mentioned in the dire...
129,"{""recipe_id"":""97"",""title"":""Baked Mostaccioli"",...",,,Identify all ingredients mentioned in the dire...


In [None]:
# create a table with the mentioned ingredients
%%bigquery
CREATE OR REPLACE TABLE magazine_recipes_stg_ai.direction_mentioned_ingredients AS
select json_value(ml_generate_text_llm_result, '$.recipe_id') as recipe_id,
  json_query(ml_generate_text_llm_result, '$.mentioned_ingredients') as mentioned_ingredients
from magazine_recipes_stg_ai.direction_ingredients


In [None]:
# create a table that has a field for the ingredients mentioned in the AI output and a field for the ingredients found in the database
%%bigquery
create or replace table magazine_recipes_stg_ai.compare_ingredients as
select d.recipe_id, replace(d.database_ingredients, '\'', '') as db_ingredients, REPLACE(REPLACE(REPLACE(di.mentioned_ingredients, '"', ''), ']', ''), '[', '') AS ai_ingredients
from magazine_recipes_stg_ai.directions as d
JOIN magazine_recipes_stg_ai.direction_mentioned_ingredients as di
  ON d.recipe_id = di.recipe_id


In [None]:
# examine the ingredients in this table
%%bigquery
select * from magazine_recipes_stg_ai.compare_ingredients


Unnamed: 0,recipe_id,db_ingredients,ai_ingredients
0,93,beef,"beef,barley,water,salt,pepper"
1,105,beef,"ground beef,onion,peas,carrots,corn,tomato sou..."
2,132,milk,
3,89,milk,
4,131,milk,"milk,waffle batter"
...,...,...,...
123,142,"cucumber, fish, dill, red onion, broccoli, yogurt","cucumber,fish,dill,red onion,broccoli,yogurt"
124,16,"fresh kale, coconut milk, peanuts, cilantro, q...","quinoa,coconut milk,onion,kale,peanuts,cilantro"
125,17,"red onion, coconut milk, cilantro, ginger, len...","red onion,ginger,lentils,sweet potato,coconut ..."
126,12,"asparagus, orzo, olive oil, lemon juice, salt,...","orzo,olive oil,asparagus,garlic,lemon zest,lem..."
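
A deterministic baseline for this comparison is a set difference over the two comma-separated strings. The sketch below is an assumption about how one might cross-check the model's answer, not the method used in this notebook; the function names are hypothetical.

```python
def split_ingredients(text):
    """Split a comma-separated string into a set of trimmed, lowercased names."""
    if not text:
        return set()
    return {part.strip().lower() for part in text.split(",") if part.strip()}


def ingredients_difference(db_ingredients, ai_ingredients):
    """Return the names in ai_ingredients that are absent from db_ingredients."""
    return sorted(split_ingredients(ai_ingredients) - split_ingredients(db_ingredients))


diff_93 = ingredients_difference("beef", "beef,barley,water,salt,pepper")
diff_131 = ingredients_difference("milk", "milk,waffle batter")
diff_empty = ingredients_difference("milk", None)
# Exact matching treats near-synonyms as different ingredients
diff_kale = ingredients_difference("fresh kale, peanuts", "kale,peanuts")
```

Exact matching reports `kale` as missing even though `fresh kale` is stored, which is one reason to let a model judge near-matches.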


In [None]:
# ask AI to identify any differences in the ingredients in the two columns
%%bigquery
declare prompt_query STRING default " Identify ingredients in ai_ingredients that are not mentioned in db_ingredients. Return recipe_id, db_ingredients, ai_ingredients and ingredients_difference in output";
CREATE OR REPLACE TABLE magazine_recipes_stg_ai.ingredients_difference AS
select *
from ML.generate_text(
  model remote_models.gemini_pro,
  (
    select concat(prompt_query, to_json_string(json_object("recipe_id",recipe_id, "db_ingredients", db_ingredients, "ai_ingredients", ai_ingredients ))) as prompt
    from  magazine_recipes_stg_ai.compare_ingredients
  ),
  struct(TRUE as flatten_json_output)
);


In [17]:
# explore the AI output
%%bigquery
SELECT * FROM magazine_recipes_stg_ai.ingredients_difference
limit 5


Unnamed: 0,ml_generate_text_llm_result,ml_generate_text_rai_result,ml_generate_text_status,prompt
0,"{""ai_ingredients"":""chicken,honey,butter,herbs,...",,,Identify ingredients in ai_ingredients that a...
1,"{""ai_ingredients"":""mixed nuts"",""db_ingredients...",,,Identify ingredients in ai_ingredients that a...
2,"{""ai_ingredients"":""noodles,marinara sauce"",""db...",,,Identify ingredients in ai_ingredients that a...
3,"{""ai_ingredients"":""quinoa,coconut milk,onion,k...",,,Identify ingredients in ai_ingredients that a...
4,"{""ai_ingredients"":""lentils,water"",""db_ingredie...",,,Identify ingredients in ai_ingredients that a...


In [None]:
# create a new table that has the ingredients identified by AI to not be in the database
%%bigquery
CREATE OR REPLACE TABLE magazine_recipes_stg_ai.missing_ingredients_list AS
select json_value(ml_generate_text_llm_result, '$.recipe_id') as recipe_id,
  SPLIT(REPLACE(json_QUERY(ml_generate_text_llm_result, '$.ingredients_difference'),"\"", ""), ',') as missing_ingredients,
from magazine_recipes_stg_ai.ingredients_difference


In [None]:
# store each combination of the recipe and missing ingredient as a row (to add to quantity table)
%%bigquery
CREATE OR REPLACE TABLE magazine_recipes_stg_ai.missing_ingredients_unnest AS
SELECT
  recipe_id,
  ingredient
FROM
magazine_recipes_stg_ai.missing_ingredients_list
CROSS JOIN
  UNNEST(missing_ingredients) AS ingredient;



In [None]:
%%bigquery
SELECT * FROM magazine_recipes_stg_ai.missing_ingredients_unnest


Unnamed: 0,recipe_id,ingredient
0,124,dill cream
1,56,oil
2,56,turmeric
3,116,spinach
4,98,turmeric
...,...,...
218,117,sugar
219,123,broth
220,92,vegetables
221,126,vegetables


In [None]:
# join any ingredients identified that are stored under the exact same name or a plural name with their known ingredient id
%%bigquery
CREATE OR REPLACE TABLE magazine_recipes_stg_ai.missing_ingredients_with_ids AS
SELECT miu.*, i.ingredient_id FROM magazine_recipes_stg_ai.missing_ingredients_unnest AS miu
LEFT JOIN (select ingredient_id, name, CONCAT(name, plural) as pluralname from magazine_recipes_stg.Ingredients) as i
  ON miu.ingredient = i.name
  or miu.ingredient = i.pluralname
WHERE LENGTH(miu.ingredient) > 0
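
The join above matches on either the stored name or the name with its plural suffix appended. A rough Python equivalent is sketched here; the sample rows and ids are invented for illustration, and `build_name_index` is a hypothetical helper.

```python
def build_name_index(ingredients):
    """Map each name, and each name plus plural suffix, to its ingredient_id."""
    index = {}
    for ingredient_id, name, plural in ingredients:
        index.setdefault(name, ingredient_id)
        # A null suffix adds no plural form, just as CONCAT(name, NULL) is NULL in SQL.
        if plural:
            index.setdefault(name + plural, ingredient_id)
    return index


# Invented (ingredient_id, name, plural) rows, not taken from the Ingredients table
sample_rows = [
    (1, "sweet potato", "es"),
    (2, "bok choy", None),
    (3, "peanut", "s"),
]
index = build_name_index(sample_rows)
lookups = {name: index.get(name) for name in ["sweet potatoes", "bok choy", "dill cream"]}
```

Names that resolve to `None` are the ones left for the partial-match step or for insertion as new ingredients.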


In [None]:
%%bigquery
select * from
magazine_recipes_stg_ai.missing_ingredients_with_ids


Unnamed: 0,recipe_id,ingredient,ingredient_id
0,86,[],
1,109,sauce,
2,66,broth,
3,113,dressing,
4,81,cornbread dressing ingredients,
...,...,...,...
192,120,sweet potatoes,2298
193,104,bok choy,251
194,922,peanuts,2559
195,92,peanuts,2559


In [None]:
# for any ingredient whose name only partially matches, find an existing database ingredient whose name contains it
%%bigquery
create or replace table magazine_recipes_stg_ai.similar_ingreidents as
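WITH row_number as (
  select mi.recipe_id, mi.ingredient, i.ingredient_id, i.name,
    -- partition by recipe as well, so every (recipe, ingredient) pair keeps its own match
    ROW_NUMBER() OVER (PARTITION BY mi.recipe_id, mi.ingredient ORDER BY i.name ASC) AS rn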
from magazine_recipes_stg.Ingredients i
join magazine_recipes_stg_ai.missing_ingredients_with_ids mi
 on i.name like CONCAT('%', mi.ingredient ,'%')
where mi.ingredient_id is null)
SELECT * FROM row_number
WHERE rn = 1
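
The matching rule in this query, "take the alphabetically first stored name that contains the ingredient", can be sketched in Python as follows. The candidate names are partly assumed for illustration, and Python's default string ordering (uppercase before lowercase) is assumed to be close to the ordering the query uses.

```python
def first_substring_match(ingredient, names):
    """Return the alphabetically first name containing `ingredient`, or None."""
    candidates = sorted(name for name in names if ingredient in name)
    return candidates[0] if candidates else None


# Illustrative candidate names; only some appear in the output of this notebook
names = ["olive oil", "Chinese hot oil", "vegetable oil", "beef broth", "chicken broth"]

match_oil = first_substring_match("oil", names)
match_broth = first_substring_match("broth", names)
match_nachos = first_substring_match("nachos", names)
```

The rule is cheap but can pick a surprising match: a generic `oil` resolves to `Chinese hot oil` simply because that name sorts first.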




In [18]:
%%bigquery
select * from magazine_recipes_stg_ai.similar_ingreidents
limit 5


Unnamed: 0,recipe_id,ingredient,ingredient_id,name,rn
0,20,figs,1063,dried figs,1
1,39,oil,694,Chinese hot oil,1
2,66,water,4384,boiling salted water,1
3,66,broth,166,beef broth,1
4,69,chicken,265,boneless chicken,1


In [None]:
# update the table of missing ingredients to add the newly found ingredient ids
%%bigquery
update magazine_recipes_stg_ai.missing_ingredients_with_ids miid
set ingredient_id = (select ingredient_id from magazine_recipes_stg_ai.similar_ingreidents si where si.recipe_id = miid.recipe_id and si.ingredient = miid.ingredient),
ingredient = (select name from magazine_recipes_stg_ai.similar_ingreidents si where si.recipe_id = miid.recipe_id and si.ingredient = miid.ingredient)
where ingredient_id is null and ingredient IN (SELECT ingredient FROM magazine_recipes_stg_ai.similar_ingreidents)


In [None]:
# examine the table for any missing ingredients without ids
%%bigquery
select * from magazine_recipes_stg_ai.missing_ingredients_with_ids
where ingredient_id is null AND ingredient IS NOT NULL


Unnamed: 0,recipe_id,ingredient,ingredient_id
0,86,[],
1,130,[],
2,16,[],
3,3,nachos,
4,51,BBQ sauce,
5,31,Thai sauce,
6,124,dill cream,
7,107,jerk sauce,
8,119,orange sauce,
9,115,lasagna sauce,


In [21]:
# keep only the ingredients that still have no id, and remove any rows with non-ingredient values
%%bigquery
CREATE OR REPLACE TABLE magazine_recipes_stg_ai.missing_ingredients_with_ids as
select * from magazine_recipes_stg_ai.missing_ingredients_with_ids
where ingredient_id is null AND ingredient IS NOT NULL AND ingredient NOT IN ('no difference','[]')


In [22]:
%%bigquery
select * from magazine_recipes_stg_ai.missing_ingredients_with_ids


Unnamed: 0,recipe_id,ingredient,ingredient_id
0,107,jerk sauce,
1,113,chicken cutlets,
2,31,Thai sauce,
3,115,lasagna sauce,
4,3,nachos,
5,81,cornbread dressing ingredients,
6,59,root vegetables,
7,51,BBQ sauce,
8,56,vegetable broth,
9,124,dill cream,


In [None]:
# have AI assign a category to these ingredients (like we did in the previous project)
%%bigquery
declare prompt_query STRING default " Identify the best category that matches the ingredient from the list. Return recipe_id, ingredient, and category in output";
CREATE OR REPLACE TABLE magazine_recipes_stg_ai.new_ingredients_with_category AS
select *
from ML.generate_text(
  model remote_models.gemini_pro,
  (
    select concat(prompt_query, to_json_string(json_object("recipe_id",recipe_id, "ingredients", ingredient, "categories", (select string_agg(distinct(category), ', ') from magazine_recipes_stg.Ingredients)))) as prompt
    from magazine_recipes_stg_ai.missing_ingredients_with_ids
    where ingredient_id is null and ingredient NOT IN ('no difference','[]')
  ),
  struct(TRUE as flatten_json_output)
)


In [None]:
%%bigquery
select * from magazine_recipes_stg_ai.new_ingredients_with_category


Unnamed: 0,ml_generate_text_llm_result,ml_generate_text_rai_result,ml_generate_text_status,prompt
0,"{""categories"":""soups"",""ingredients"":""vegetable...",,,Identify the best category that matches the i...
1,"{""categories"":""meat/poultry"",""ingredients"":""ch...",,,Identify the best category that matches the i...
2,"{""categories"":""condiments/sauces"",""ingredients...",,,Identify the best category that matches the i...
3,"{""categories"":""sauces and gravies"",""ingredient...",,,Identify the best category that matches the i...
4,"{""categories"":""fresh vegetables"",""ingredients""...",,,Identify the best category that matches the i...
5,"{""categories"":""dairy"",""ingredients"":""dill crea...",,,Identify the best category that matches the i...
6,"{""categories"":""condiments/sauces"",""ingredients...",,,Identify the best category that matches the i...
7,"{""categories"":""breads, bread products"",""ingred...",,,Identify the best category that matches the i...
8,"{""categories"":""breads, bread products"",""ingred...",,,Identify the best category that matches the i...
9,"{""categories"":""condiments/sauces"",""ingredients...",,,Identify the best category that matches the i...


In [None]:
# create a table with all ingredients that will need to be added to the ingredients table with a new unique primary key
%%bigquery
CREATE OR REPLACE TABLE magazine_recipes_stg_ai.new_ingredients AS
select (select max(ingredient_id) from magazine_recipes_stg.Ingredients) + ROW_NUMBER() OVER() as ingredient_id,
json_value(ml_generate_text_llm_result, '$.ingredients') as ingredient,
  json_value(ml_generate_text_llm_result, '$.categories') as category
from magazine_recipes_stg_ai.new_ingredients_with_category
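
The key assignment here is "current maximum id plus a running row number". A small sketch of the same idea in Python, with invented ids and a hypothetical `assign_new_ids` helper:

```python
def assign_new_ids(existing_ids, new_names):
    """Give each new name the next unused id after the current maximum."""
    start = max(existing_ids, default=0)
    return {name: start + offset for offset, name in enumerate(new_names, start=1)}


# Invented ids for illustration; the real maximum comes from the Ingredients table
new_ids = assign_new_ids([10, 11, 12], ["jerk sauce", "nachos"])
first_ids = assign_new_ids([], ["nachos"])
```

Because `ROW_NUMBER() OVER()` has no `ORDER BY`, which ingredient receives which id is not guaranteed to be the same from run to run, although the ids stay unique within a single run.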


In [None]:
%%bigquery
SELECT * FROM magazine_recipes_stg_ai.new_ingredients


Unnamed: 0,ingredient_id,ingredient,category
0,4673,dill cream,dairy
1,4668,vegetable broth,soups
2,4680,nachos,snacks
3,4669,chicken cutlets,meat/poultry
4,4672,root vegetables,fresh vegetables
5,4670,jerk sauce,condiments/sauces
6,4674,BBQ sauce,condiments/sauces
7,4677,Thai sauce,condiments/sauces
8,4679,pasta salad dressing,condiments/sauces
9,4671,orange sauce,sauces and gravies


In [None]:
# insert these records into the Ingredients table
%%bigquery
INSERT INTO magazine_recipes_stg.Ingredients (ingredient_id, category, name, plural, data_source, load_time)
SELECT
  ingredient_id,
  category,
  ingredient,
  NULL AS plural,
  'ai' AS data_source,
  current_timestamp() AS load_time
FROM magazine_recipes_stg_ai.new_ingredients



In [None]:
# update the quantities table to include the ingredients that were mentioned in the directions but are not currently present in the database for a given recipe
%%bigquery
INSERT INTO magazine_recipes_stg.Quantity (quantity_id, recipe_id, ingredient_id, max_qty, min_qty, unit, preparation, optional, data_source, load_time)
SELECT
  (SELECT MAX(quantity_id) FROM magazine_recipes_stg.Quantity) + ROW_NUMBER() OVER() as quantity_id,
  cast(recipe_id as int64) as recipe_id,
  ingredient_id,
  NULL AS max_qty,
  NULL AS min_qty,
  cast(NULL as string) AS unit,
  cast(NULL as string) AS preparation,
  cast(NULL as bool) AS optional,
  'ai' AS data_source,
  current_timestamp() AS load_time
FROM magazine_recipes_stg_ai.missing_ingredients_with_ids
WHERE ingredient_id IS NOT NULL




In [None]:
# Add directions from AI to Recipes table
%%bigquery
UPDATE magazine_recipes_stg.Recipes as r
SET directions = (SELECT recipe_directions FROM magazine_recipes_stg_ai.directions d WHERE CAST(d.recipe_id as INT64) = r.recipe_id)
, data_source = CONCAT(data_source,'-ai')
WHERE recipe_id IN (SELECT CAST(recipe_id AS INT64) FROM magazine_recipes_stg_ai.directions WHERE recipe_directions IS NOT NULL) AND directions IS NULL


## Ease of Prep
We will add information about the ease of prep of each recipe.


In [None]:
# determine what the possible options for ease of prep are
%%bigquery
SELECT DISTINCT ease_of_prep from magazine_recipes_stg.Recipes


Unnamed: 0,ease_of_prep
0,Super Simple
1,Fairly Easy
2,Average
3,
4,Hard
5,Very Difficult


In [24]:
# Ask the AI to determine an ease of prep
%%bigquery
declare prompt_query STRING default " Identify the Ease of Prep of the recipe, ranging from 'Super Simple', 'Fairly Easy', 'Average', 'Hard' and 'Very Difficult' based on the recipe directions. Return output as json with recipe_id, title, and ease_of_prep";
select *
from ML.generate_text(
  model remote_models.gemini_pro,
  (
    select concat(prompt_query, to_json_string(json_object("recipe_id", recipe_id, "title", title, "directions", directions ))) as prompt
    from magazine_recipes_stg.Recipes
    WHERE directions is NOT NULL and ease_of_prep is NULL
      limit 10),
  struct(TRUE as flatten_json_output)
);


Unnamed: 0,ml_generate_text_llm_result,ml_generate_text_rai_result,ml_generate_text_status,prompt
0,"```json\n{""recipe_id"":758,""title"":""-Strawberri...","[{""category"":1,""probability"":1,""probability_sc...",,"Identify the Ease of Prep of the recipe, rang..."
1,"```json\n{""recipe_id"": 785, ""title"": ""-Mangoes...","[{""category"":1,""probability"":1,""probability_sc...",,"Identify the Ease of Prep of the recipe, rang..."
2,"```json\n{""recipe_id"": 866, ""title"": ""-Cranber...","[{""category"":1,""probability"":1,""probability_sc...",,"Identify the Ease of Prep of the recipe, rang..."
3,"```json\n{""recipe_id"":794,""title"":""-Canned Apr...","[{""category"":1,""probability"":1,""probability_sc...",,"Identify the Ease of Prep of the recipe, rang..."
4,"```json\n{""recipe_id"": 865, ""title"": ""-Limeade...","[{""category"":1,""probability"":1,""probability_sc...",,"Identify the Ease of Prep of the recipe, rang..."
5,"```json\n{""recipe_id"":824,""title"":""-Wild Rice-...","[{""category"":1,""probability"":1,""probability_sc...",,"Identify the Ease of Prep of the recipe, rang..."
6,"```json\n{""recipe_id"": 715, ""title"": ""-Steamed...","[{""category"":1,""probability"":1,""probability_sc...",,"Identify the Ease of Prep of the recipe, rang..."
7,"```json\n{""recipe_id"": 892, ""title"": ""-Dill Pi...","[{""category"":1,""probability"":1,""probability_sc...",,"Identify the Ease of Prep of the recipe, rang..."
8,"```json\n{""recipe_id"": 838, ""title"": ""-Flour T...","[{""category"":1,""probability"":1,""probability_sc...",,"Identify the Ease of Prep of the recipe, rang..."
9,"```json\n{""recipe_id"":807,""title"":""-Canned Asp...","[{""category"":1,""probability"":1,""probability_sc...",,"Identify the Ease of Prep of the recipe, rang..."
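
Each prompt is the instruction text followed by the recipe serialized as JSON, which is why the model can echo `recipe_id` and `title` back in its answer. The Python below is a rough sketch of that `concat(prompt_query, to_json_string(json_object(...)))` step. The function name `build_prompt` and the sample values are hypothetical, and the compact separators only approximate the output of `to_json_string`.

```python
import json

def build_prompt(instruction, recipe_id, title, directions):
    """Append the recipe, serialized as compact JSON, to the instruction text."""
    payload = {"recipe_id": recipe_id, "title": title, "directions": directions}
    return instruction + json.dumps(payload, separators=(",", ":"))

# Placeholder values for illustration only; not real rows from the Recipes table.
prompt = build_prompt(
    "Identify the Ease of Prep of the recipe. ",
    1,
    "Example Recipe",
    "Mix and serve.",
)
print(prompt)
```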


In [None]:
# The AI assigned 'Super Simple' too often, so we tell it to base the difficulty on the number of steps
%%bigquery
declare prompt_query STRING default "Assign an Ease of Prep of the recipe, ranging from 'Super Simple', 'Fairly Easy', 'Average', 'Hard' and 'Very Difficult' based on the recipe directions. Directions with one step are fairly easy and directions with more than three steps are very difficult. Return output as json with only recipe_id, title, and ease_of_prep";
create or replace table magazine_recipes_stg_ai.ease_of_prep AS
select *
from ML.generate_text(
  model remote_models.gemini_pro,
  (
    select concat(prompt_query, to_json_string(json_object("recipe_id", recipe_id, "title", title, "directions", directions ))) as prompt
    from magazine_recipes_stg.Recipes
    WHERE directions is NOT NULL
      AND ease_of_prep IS NULL
      LIMIT 200),
  struct(TRUE as flatten_json_output)
);
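
The revised prompt states two explicit rules: one step means 'Fairly Easy' and more than three steps means 'Very Difficult'. A small rule-based baseline, sketched below in Python, could be used to spot-check the model's answers against those two rules. It assumes each sentence in the directions is one step, which will miscount text such as "1.5 cups", and it returns `None` for every other step count because the prompt leaves those cases to the model. The function names are hypothetical.

```python
import re

def count_steps(directions):
    """Count steps, assuming each sentence in the directions is one step."""
    sentences = re.split(r"[.!?]+", directions)
    return sum(1 for s in sentences if s.strip())

def rule_based_label(directions):
    """Apply only the two rules stated in the prompt; return None otherwise."""
    steps = count_steps(directions)
    if steps == 1:
        return "Fairly Easy"
    if steps > 3:
        return "Very Difficult"
    return None  # zero, two, or three steps: the prompt leaves this to the model

print(rule_based_label("Mix ingredients."))
print(rule_based_label("Chop. Saute. Simmer. Season. Serve."))
```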


In [25]:
%%bigquery
select *
from magazine_recipes_stg_ai.ease_of_prep
limit 10


Unnamed: 0,ml_generate_text_llm_result,ml_generate_text_rai_result,ml_generate_text_status,prompt
0,,,,"Assign an Ease of Prep of the recipe, ranging..."
1,,,,"Assign an Ease of Prep of the recipe, ranging..."
2,"```json\n[\n {\n ""directions"": ""Heat and s...",,,"Assign an Ease of Prep of the recipe, ranging..."
3,"```json\n[\n {\n ""directions"": ""Mix ingred...",,,"Assign an Ease of Prep of the recipe, ranging..."
4,"```json\n[\n {\n ""directions"": ""Use on sal...",,,"Assign an Ease of Prep of the recipe, ranging..."
5,"```json\n[\n {\n ""recipe_id"": 1025,\n ""...",,,"Assign an Ease of Prep of the recipe, ranging..."
6,"```json\n[\n {\n ""recipe_id"": 1068,\n ""...",,,"Assign an Ease of Prep of the recipe, ranging..."
7,"```json\n[\n {\n ""recipe_id"": 1137,\n ""...",,,"Assign an Ease of Prep of the recipe, ranging..."
8,"```json\n[\n {\n ""recipe_id"": 1153,\n ""...",,,"Assign an Ease of Prep of the recipe, ranging..."
9,"```json\n[\n {\n ""recipe_id"": 1201,\n ""...",,,"Assign an Ease of Prep of the recipe, ranging..."


In [None]:
# These ratings are more varied, so we keep them and extract the first JSON object from each response
%%bigquery
create or replace table magazine_recipes_stg_ai.ease_of_prep_formatted as
select
  regexp_extract(ml_generate_text_llm_result, r'{[^}]*}') AS ml_generate_text_llm_result
FROM magazine_recipes_stg_ai.ease_of_prep


In [None]:
# create table with the ease of prep field
%%bigquery
CREATE OR REPLACE TABLE magazine_recipes_stg_ai.eop AS
select
cast(json_value(ml_generate_text_llm_result, '$.recipe_id') as int64) as recipe_id,
json_value(ml_generate_text_llm_result, '$.title') as title,
json_value(ml_generate_text_llm_result, '$.ease_of_prep') as ease_of_prep
from magazine_recipes_stg_ai.ease_of_prep_formatted


In [28]:
# examine this table
%%bigquery
SELECT * FROM magazine_recipes_stg_ai.eop
order by recipe_id
limit 10


Unnamed: 0,recipe_id,title,ease_of_prep
0,,,
1,,,
2,2.0,Sweet Potato Breakfast Burritos,Super Simple
3,3.0,Spicy Black Bean Nachos,Super Simple
4,75.0,Vegetarian Chili,Fairly Easy
5,138.0,Vegetable Couscous,Super Simple
6,214.0,Raspberry Chiffon Pie,Very Difficult
7,215.0,Apricot Yogurt Parfaits,Fairly Easy
8,216.0,Fresh Apricot Bavarian,Fairly Easy
9,217.0,Fresh Peaches,Fairly Easy


In [None]:
# explore the distribution of ease of prep assigned by the AI; this looks reasonable
%%bigquery
SELECT ease_of_prep, count(*) from magazine_recipes_stg_ai.eop group by ease_of_prep


Unnamed: 0,ease_of_prep,f0_
0,,2
1,Hard,146
2,Average,44
3,Fairly Easy,396
4,Super Simple,173
5,Very Difficult,62
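
Raw counts are easier to judge as shares of the total. The short Python sketch below converts the counts from the output above into percentages; the variable names are hypothetical, and `None` stands for the NULL group.

```python
# Counts copied from the query output above; None stands for the NULL group.
counts = {
    None: 2,
    "Hard": 146,
    "Average": 44,
    "Fairly Easy": 396,
    "Super Simple": 173,
    "Very Difficult": 62,
}
total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Print the labels from most to least common.
for label, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {pct}%")
```

'Fairly Easy' accounts for 396 of the 823 rows, a little under half.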


In [None]:
# update the recipes table to reflect the new ease of prep identified by the AI
%%bigquery
UPDATE magazine_recipes_stg.Recipes as r
SET ease_of_prep = (SELECT ease_of_prep FROM magazine_recipes_stg_ai.eop e WHERE e.recipe_id = r.recipe_id)
, data_source = CONCAT(data_source,'-ai')
WHERE recipe_id IN (SELECT recipe_id FROM magazine_recipes_stg_ai.eop) AND ease_of_prep IS NULL
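
The same fill-in logic can be sketched in plain Python to make its edge cases easier to reason about. The sketch below skips rows with a missing id or label and only touches recipes whose `ease_of_prep` is still empty, so a second run does not append '-ai' again. These guards, the function name `apply_ease_of_prep`, and the sample rows are assumptions made for illustration.

```python
def apply_ease_of_prep(recipes, eop_rows):
    """Fill missing ease_of_prep from eop_rows and tag data_source with '-ai'."""
    # Ignore AI rows that lack an id or a label.
    by_id = {
        row["recipe_id"]: row["ease_of_prep"]
        for row in eop_rows
        if row["recipe_id"] is not None and row["ease_of_prep"] is not None
    }
    updated = 0
    for recipe in recipes:
        label = by_id.get(recipe["recipe_id"])
        # Only fill recipes that do not have a rating yet.
        if label is not None and recipe["ease_of_prep"] is None:
            recipe["ease_of_prep"] = label
            recipe["data_source"] = recipe["data_source"] + "-ai"
            updated += 1
    return updated

# Placeholder rows for illustration; not real table contents.
recipes = [
    {"recipe_id": 1, "ease_of_prep": None, "data_source": "bird"},
    {"recipe_id": 2, "ease_of_prep": "Hard", "data_source": "bird"},
]
eop_rows = [
    {"recipe_id": 1, "ease_of_prep": "Average"},
    {"recipe_id": 2, "ease_of_prep": "Super Simple"},
    {"recipe_id": None, "ease_of_prep": None},
]
first_run = apply_ease_of_prep(recipes, eop_rows)
second_run = apply_ease_of_prep(recipes, eop_rows)
print(first_run, second_run)
```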
