Please download the Recipe NLG dataset and rules_recipe_scale.csv, and place both in the directory that the DATA_DIR variable points to.

In [2]:
# %pip install pandas gensim openai

In [1]:
import pandas as pd
import helper 
from gensim.parsing.preprocessing import preprocess_string

DATA_DIR = '../dataset'

rules, to_be_joined, extracted_rules = helper.load_required_data(DATA_DIR)

Starting to load rule data
Rule data loaded...

Starting rule extraction...
	 -> Starting to sort rules by lift
	 -> Done sorting rules...
______________________________
	 -> Starting RegEx pattern creation
	 -> Done creating RegEx patterns...


In [2]:
import ast

# Sample 100 random recipes from the dataset at dataset/full_dataset.csv
seed = 1010
sample_recipes = pd.read_csv(f'{DATA_DIR}/full_dataset.csv').sample(100, random_state=seed)
# 'directions' is stored as the string form of a list; literal_eval parses it without executing arbitrary code
sample_recipes['directions'] = sample_recipes['directions'].apply(ast.literal_eval)
# Keep only recipes whose directions total more than 125 characters
sample_recipes['directions_length'] = sample_recipes['directions'].apply(lambda x: len(' '.join(x)))
sample_recipes = sample_recipes[sample_recipes['directions_length'] > 125]

sample_recipes['preprocessed'] = sample_recipes['directions'].apply(lambda x: preprocess_string(' '.join(x)))

In [None]:
recipe = sample_recipes.iloc[0]
print(recipe['directions'])
fulfilled_rules, suggestions = helper.extract_rules(recipe['preprocessed'], extracted_rules)
print(fulfilled_rules)

['Combine sugar and orange rind.', 'Separate biscuits.', 'Dip each in butter and coat with sugar mixture.', 'Stand biscuits on side, overlapping edges in a 9-inch tube pan.', 'Bake at 350° for 30 minutes.']
{frozenset({'stand', 'bake', 'inch'}), frozenset({'bake', 'stand'}), frozenset({'minut', 'bake', 'stand'})}
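
Each rule in the output above is a frozenset of stemmed tokens. One plausible reading, and it is only an assumption because the source of `helper.extract_rules` is not shown here, is that a rule counts as fulfilled when all of its tokens occur in the preprocessed recipe. A minimal sketch of that idea, using a hypothetical `fulfilled` function and made-up token lists:

```python
# Hypothetical sketch: a rule is treated as fulfilled when it is a subset of the recipe's tokens.
# This is an assumption about helper.extract_rules, not its actual implementation.

def fulfilled(recipe_tokens, rules):
    """Return the set of rules whose tokens all occur in recipe_tokens."""
    tokens = set(recipe_tokens)
    return {rule for rule in rules if rule <= tokens}

recipe_tokens = ['stand', 'biscuit', 'inch', 'tube', 'pan', 'bake', 'minut']
rules = [
    frozenset({'bake', 'stand'}),
    frozenset({'stand', 'bake', 'inch'}),
    frozenset({'preheat', 'let'}),  # 'preheat' and 'let' are missing, so this one is not fulfilled
]
matched = fulfilled(recipe_tokens, rules)
print(matched)
```

Under this reading, rules that are not subsets of the token set would be candidates for the suggestions list, although how the helper actually selects suggestions is not visible in this notebook.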


In [None]:
# Now we'll ask ChatGPT to improve the directions with the suggestions we have extracted above

# Create the text to send to GPT
prompt = helper.create_prompt(
    recipe['title'],
    recipe['directions'],
    fulfilled_rules,
    suggestions
)

print(prompt)


    The below recipe is for Orange Breakfast Ring. 
    The original directions are as follows:
    (Combine sugar and orange rind. Separate biscuits. Dip each in butter and coat with sugar mixture. Stand biscuits on side, overlapping edges in a 9-inch tube pan. Bake at 350° for 30 minutes.)
    Some of the rules that are fulfilled by this recipe are:
    {frozenset({'stand', 'bake', 'inch'}), frozenset({'bake', 'stand'}), frozenset({'minut', 'bake', 'stand'})}
    The new rules to be fulfilled are:
    ["frozenset({'let', 'egg', 'minut'})", "frozenset({'oven', 'let'})", "frozenset({'preheat', 'let'})"]
    


In [5]:
resp = helper.prompt_gpt(prompt=prompt, print_response=True)

New recipe:

Orange Breakfast Ring

Ingredients:
- Sugar
- Orange rind
- Biscuits
- Butter

Directions:
1. Preheat the oven to the desired temperature.
2. Combine sugar and orange rind in a bowl.
3. Separate the biscuits.
4. Dip each biscuit in butter and coat with the sugar mixture.
5. Stand the biscuits on their side, overlapping the edges, in a 9-inch tube pan.
6. Bake in the preheated oven for the specified time.
7. Let the breakfast ring cool for a few minutes.
8. Serve and enjoy!



________
Explanation:

To fulfill the new rules, I made the following changes to the original recipe:
1. Added a step to preheat the oven, fulfilling the rule "frozenset({'preheat', 'let'})".
2. Added a step to let the breakfast ring cool, fulfilling the rule "frozenset({'let', 'egg', 'minut'})".
3. No changes were made to fulfill the rule "frozenset({'oven', 'let'})" as it was already fulfilled in the original recipe.

These changes ensure that all of the new rules are fulfilled without removing any 

In [6]:
helper.calculate_similarity(original_recipe=recipe['preprocessed'], gpt_response=resp)

(1.0, 0.5454545454545454)
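
The pair printed above appears to correspond to the `original_in_new` and `new_in_original` columns used later in this notebook. One way to read it, which is an assumption since `helper.calculate_similarity` is not shown, is as a two-way overlap between sets of unique tokens. The hypothetical `token_overlap` function below sketches that reading on toy data:

```python
# Hypothetical sketch of a set-overlap score; helper.calculate_similarity may work differently.

def token_overlap(original_tokens, new_tokens):
    """Return (share of original tokens found in the new text,
               share of new tokens that were already in the original)."""
    original, new = set(original_tokens), set(new_tokens)
    if not original or not new:
        return 0.0, 0.0
    shared = original & new
    return len(shared) / len(original), len(shared) / len(new)

original = ['bake', 'stand', 'inch', 'minut']
rewritten = ['preheat', 'oven', 'bake', 'stand', 'inch', 'minut', 'let', 'cool']
original_in_new, new_in_original = token_overlap(original, rewritten)
print(original_in_new, new_in_original)  # 1.0 0.5
```

With this definition, a first value of 1.0 would mean that no token of the original directions was dropped, and a lower second value would mean that the rewrite introduced new vocabulary.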

In [7]:
original_recipe = recipe['preprocessed']
# Grab the message content from the first choice
new_recipe = resp.choices[0].message.content
# The recipe written by ChatGPT should sit between <RECIPE> and </RECIPE> tags
# Split on the tags and keep the text between them
new_recipe = new_recipe.split('<RECIPE>')[1].split('</RECIPE>')[0]
# Split the recipe into tokens
new_recipe = preprocess_string(new_recipe)
# print the tokens in the new recipe that are not in the original recipe
print(set(new_recipe) - set(original_recipe))
# the other way around
print(set(original_recipe) - set(new_recipe))

{'direct', 'desir', 'preheat', 'let', 'ingredi', 'specifi', 'enjoi', 'cool', 'ring', 'serv', 'temperatur', 'breakfast', 'oven', 'time', 'bowl'}
set()
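
The chained `split` calls in the cell above raise an `IndexError` when the reply contains no `<RECIPE>` tag. A more defensive variant could use a regular expression and return `None` when the tags are missing. The `extract_recipe` helper below is hypothetical and only illustrates the idea:

```python
import re

# Hypothetical helper; the <RECIPE> tag convention is taken from the cell above.
RECIPE_PATTERN = re.compile(r'<RECIPE>(.*?)</RECIPE>', re.DOTALL)

def extract_recipe(text):
    """Return the text between the first <RECIPE>...</RECIPE> pair, or None if a tag is missing."""
    match = RECIPE_PATTERN.search(text)
    return match.group(1).strip() if match else None

print(extract_recipe('<RECIPE>\nBake at 350.\n</RECIPE>\nExplanation: ...'))  # Bake at 350.
print(extract_recipe('No tags in this reply'))  # None
```

Unlike the `split` version, this sketch also returns `None` when only the closing tag is missing, so incomplete replies can be filtered out before tokenising.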


In [11]:
# Let's repeat this experiment for every recipe in the sampled dataset,
# save the results in a DataFrame, and plot the results.
# The DataFrame will contain the following columns:
# - original_recipe: the original recipe
# - new_recipe: the recipe generated by ChatGPT
# - original_in_new
# - new_in_original
# The values of the last two columns are returned by helper.calculate_similarity
from tqdm.notebook import tqdm
results = []

for _, recipe in tqdm(sample_recipes.iterrows(), total=sample_recipes.shape[0]):
    # Generate the prompt
    fulfilled_rules, suggestions = helper.extract_rules(recipe['preprocessed'], extracted_rules)
    prompt = helper.create_prompt(
        recipe['title'],
        recipe['directions'],
        fulfilled_rules,
        suggestions
    )
    # Send the prompt to GPT
    resp = helper.prompt_gpt(prompt=prompt, print_response=False)
    # Calculate the similarity
    orig_in_new, new_in_orig = helper.calculate_similarity(original_recipe=recipe['preprocessed'], gpt_response=resp)
    # Save the results
    results.append({
        'original_recipe': recipe['directions'],
        'new_recipe': resp.choices[0].message.content,
        'original_in_new': orig_in_new,
        'new_in_original': new_in_orig
    })

  0%|          | 0/89 [00:00<?, ?it/s]

In [18]:
# save results to a csv file
results_df = pd.DataFrame(results)
results_df.to_csv('rule_based_results.csv', index=False)

In [19]:
results_df

Unnamed: 0,original_recipe,new_recipe,original_in_new,new_in_original
0,"[combin, sugar, orang, rind, separ, biscuit, d...",<RECIPE>\nOrange Breakfast Ring\n\nIngredients...,1.000000,0.418605
1,"[preheat, oven, degre, prepar, pan, line, parc...",<RECIPE>\n(Preheat oven to 350 degrees. Prepar...,1.000000,1.000000
2,"[brown, pepper, onion, garlic, add, chicken, b...",<RECIPE>\nLayered Chicken Enchiladas\n\nIngred...,1.000000,0.837209
3,"[preheat, oven, degre, heat, tablespoon, oil, ...",<RECIPE>\nPreheat oven to 350 degrees F. Heat ...,1.000000,0.983871
4,"[cake, posit, rack, center, oven, preheat, deg...",<RECIPE>\nCake: Position rack in center of you...,1.000000,1.000000
...,...,...,...,...
84,"[combin, pineappl, ic, milk, orang, juic, suga...",<RECIPE>\nCuban Batido Recipe:\n\nIngredients:...,0.882353,0.517241
85,"[preheat, oven, degre, degre, greas, flour, in...",<RECIPE>\nPreheat an oven to 325 degrees F (16...,1.000000,0.854839
86,"[cucumb, boil, water, vinegar, mixtur, pour, c...",<RECIPE>\nPastor Eddie'S Granny Grace'S Dill P...,1.000000,0.314286
87,"[potato, pot, gener, salt, water, bring, boil,...",<RECIPE>\nGrilled Potatoes with Mustard-Garlic...,1.000000,0.911111
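
To get a quick overview of the two similarity columns, they can be summarised with the standard library. The sketch below is illustrative only: `summarise` is a hypothetical helper, and the three rows reuse the values of the first three rows shown above rather than the full results.

```python
from statistics import mean, median

def summarise(rows, columns=('original_in_new', 'new_in_original')):
    """Return the mean, median and minimum of each similarity column."""
    summary = {}
    for column in columns:
        values = [row[column] for row in rows]
        summary[column] = {
            'mean': mean(values),
            'median': median(values),
            'min': min(values),
        }
    return summary

# Values copied from the first three rows of the table above.
rows = [
    {'original_in_new': 1.0, 'new_in_original': 0.418605},
    {'original_in_new': 1.0, 'new_in_original': 1.0},
    {'original_in_new': 1.0, 'new_in_original': 0.837209},
]
summary = summarise(rows)
print(summary)
```

The same function accepts the `results` list built in the loop above, because each entry is a dict with these two keys.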


<h3><font color="#f0c6c6">Below we try some different prompts</font></h3>

<font color="#a5adcb"> <i> inspired by https://www.promptingguide.ai </i> </font>

<p color="#cad3f5">
Some notes taken from the website: 
-   The general recommendation is to alter temperature or top_p, not both.
-   <font color="#a5adcb"><b>zero-shot prompting</b></font>, i.e., you are directly prompting the model for a response without any examples or demonstrations about the task you want it to achieve. Some large language models do have the ability to perform zero-shot prompting but it depends on the complexity and knowledge of the task at hand.
</p>
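
The recommendation about temperature and top_p can be enforced in code. How `helper.prompt_gpt` passes sampling parameters is not shown in this notebook, so the `sampling_kwargs` function below is a hypothetical sketch that simply refuses to set both:

```python
# Hypothetical helper: build sampling keyword arguments, allowing at most one of the two knobs.

def sampling_kwargs(temperature=None, top_p=None):
    """Return a dict with either 'temperature' or 'top_p', never both."""
    if temperature is not None and top_p is not None:
        raise ValueError('Set temperature or top_p, not both')
    kwargs = {}
    if temperature is not None:
        kwargs['temperature'] = temperature
    if top_p is not None:
        kwargs['top_p'] = top_p
    return kwargs

print(sampling_kwargs(temperature=0.2))  # {'temperature': 0.2}
print(sampling_kwargs(top_p=0.9))        # {'top_p': 0.9}
```

The returned dict could then be unpacked into whatever call sends the prompt, assuming that call accepts these two parameters.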

<h3> <font color="#f0c6c6">How to construct a prompt?</font></h3>
<font color="#cad3f5"> A <font color="#a5adcb"><i>prompt</i></font> contains any of the following elements: </font>

- <font color="#b7bdf8">Instruction</font> -> <font color="#cad3f5"> a specific task or instruction you want the model to perform</font>

- <font color="#b7bdf8">Context</font> -> <font color="#cad3f5"> external information or additional context that can steer the model to better responses</font>

- <font color="#b7bdf8">Input Data</font> -> <font color="#cad3f5"> the input or question that we are interested to find a response for</font>

- <font color="#b7bdf8">Output Indicator</font> -> <font color="#cad3f5"> the type or format of the output.</font>
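
The four elements above can be assembled into a single prompt string. The sketch below is illustrative: `build_prompt` is a hypothetical function, the example wording is made up for this notebook's task, and it is not how `helper.create_prompt` is known to work.

```python
# Hypothetical sketch: join the four prompt elements, skipping any that are empty.

def build_prompt(instruction, context='', input_data='', output_indicator=''):
    """Return the non-empty elements joined by blank lines, in a fixed order."""
    parts = [instruction, context, input_data, output_indicator]
    return '\n\n'.join(part.strip() for part in parts if part and part.strip())

prompt = build_prompt(
    instruction='Rewrite the recipe directions so that they also satisfy the new rules.',
    context='Rules are sets of stemmed words taken from recipe directions.',
    input_data='Directions: Combine sugar and orange rind. Separate biscuits.',
    output_indicator='Return the rewritten recipe between <RECIPE> and </RECIPE> tags.',
)
print(prompt)
```

Keeping the elements as separate arguments makes it easy to vary one of them, for example the output indicator, while holding the rest of the prompt fixed.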

<h3> <font color="#f0c6c6"> What to watch out for? </font></h3>

- <font color="#b7bdf8"> Specificity: </font> <font color="#cad3f5"> Be very specific about the instruction and task you want the model to perform. </font>

- <font color="#b7bdf8"> Avoid Impreciseness: </font> <font color="#cad3f5"> The analogy here is very similar to effective communication -- the more direct, the more effective the message gets across.</font>

- <font color="#b7bdf8"> Avoid saying what not to do: </font> <font color="#cad3f5"> Another common tip when designing prompts is to avoid saying what not to do but say what to do instead. </font>

In [4]:
# Try it for all the recipes in the dataset
from functools import partial
# This takes a long time to execute, so we use multiprocessing to speed things up
import multiprocessing
import numpy as np
# Use roughly half of the available CPU cores
num_cores = multiprocessing.cpu_count()//2 + 1
# Split the recipes into near-equal chunks, one per worker
chunks = np.array_split(sample_recipes, num_cores)
fn = partial(helper.pipeline_chunk, extracted_rules=extracted_rules)
# Create a pool of workers; the context manager shuts the pool down once map returns
with multiprocessing.Pool(num_cores) as pool:
    results = pool.map(fn, chunks)



In [16]:
# results is currently a list of lists (one list per chunk), so we flatten it
flattened_results = [item for sublist in results for item in sublist]
# Check that every recipe produced exactly one result
assert len(flattened_results) == sample_recipes.shape[0]
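
If most of the running time is spent waiting for API responses rather than computing, which is an assumption about `helper.pipeline_chunk`, a thread pool is a possible alternative to a process pool. The sketch below shows the same chunk, map and flatten pattern with `concurrent.futures`. Both `split_into_chunks` and `process_chunk` are hypothetical stand-ins written for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(items, n_chunks):
    """Split items into n_chunks consecutive chunks of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def process_chunk(chunk):
    # Placeholder for helper.pipeline_chunk: returns one result per item.
    return [item.upper() for item in chunk]

recipes = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
chunks = split_into_chunks(recipes, 3)
# executor.map returns results in the same order as the input chunks
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(process_chunk, chunks))
flattened = [item for sublist in results for item in sublist]
assert len(flattened) == len(recipes)
print(flattened)
```

Threads share memory with the notebook process, so arguments such as the extracted rules would not need to be pickled and copied to each worker.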

In [19]:
# Write the results into a file
results_df = pd.DataFrame(flattened_results)
results_df.to_csv('prompt2_results.csv', index=False)

Below we try our <font color="#b7bdf8">3rd</font> method of prompting

In [15]:
# Try it for all the recipes in the dataset
from functools import partial
# This takes a long time to execute, so we use multiprocessing to speed things up
import multiprocessing
import numpy as np
# Use roughly half of the available CPU cores
num_cores = multiprocessing.cpu_count()//2 + 1
# Split the recipes into near-equal chunks, one per worker
chunks = np.array_split(sample_recipes, num_cores)
fn = partial(helper.pipeline_chunk, extracted_rules=extracted_rules, prompt_function=helper.prompt_gpt_3)
# Create a pool of workers; the context manager shuts the pool down once map returns
with multiprocessing.Pool(num_cores) as pool:
    results3 = pool.map(fn, chunks)



In [16]:
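# results3 is a list of lists (one list per chunk), so flatten it before writing to a file
flattened_results3 = [item for sublist in results3 for item in sublist]
results3_df = pd.DataFrame(flattened_results3)
results3_df.to_csv('gpt3_results.csv', index=False)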

<h3><font color="#f0c6c6">Below we try some few-shot prompting</font></h3>

<font color="#a5adcb"> <i> inspired by https://www.promptingguide.ai </i> </font>

<p color="#cad3f5">
The few-shot examples are outputs generated by GPT-4 that we liked.
</p>
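
In a chat-style API, few-shot examples are commonly supplied as alternating user and assistant messages placed before the real request. The sketch below illustrates that layout. `build_fewshot_messages` is a hypothetical function, the example texts are made up, and `helper.create_fewshot_prompt` may instead embed its examples in a single prompt string.

```python
# Hypothetical sketch: lay out few-shot examples as alternating chat messages.

def build_fewshot_messages(system_prompt, examples, query):
    """Return a message list: system prompt, example pairs, then the real query."""
    messages = [{'role': 'system', 'content': system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({'role': 'user', 'content': user_text})
        messages.append({'role': 'assistant', 'content': assistant_text})
    messages.append({'role': 'user', 'content': query})
    return messages

examples = [
    ('Directions: Bake at 350 for 30 minutes.',
     '<RECIPE>Preheat the oven to 350. Bake for 30 minutes. Let cool.</RECIPE>'),
]
messages = build_fewshot_messages(
    system_prompt='You improve recipe directions.',
    examples=examples,
    query='Directions: Combine sugar and orange rind. Separate biscuits.',
)
for message in messages:
    print(message['role'], '->', message['content'])
```

Placing each liked output in an assistant turn shows the model the expected format, including the `<RECIPE>` tags, without describing it in words.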

In [38]:
recipe_to_try = sample_recipes.iloc[4]

In [39]:
fulfilled_rules, suggestions = helper.extract_rules(recipe_to_try['preprocessed'], extracted_rules)

In [40]:
prompt = helper.create_fewshot_prompt(
    recipe_to_try['title'],
    recipe_to_try['directions'],
    fulfilled_rules,
    suggestions
)

In [41]:
resp = helper.prompt_few_shot(
    prompt=prompt,
    print_response=False,
)

In [42]:
helper._print_response(resp.choices[0].message.content)

New recipe:

Ingredients:
- Cake:
  - Cocoa powder
  - Semisweet chocolate
  - Boiling water
  - Buttermilk
  - Flour
  - Baking soda
  - Salt
  - Brown sugar
  - Butter
  - Eggs
  - Vanilla extract
- Icing:
  - Cream cheese
  - Butter
  - Powdered sugar
  - Vanilla extract
  - Salt
- Strawberries
- Granulated sugar

Instructions:
1. Cake:
   - Position rack in center of your oven and preheat to 350 degrees F. Grease and flour two 8" cake pans, lining the bottom of the pans with parchment paper and greasing that as well.
   - Combine cocoa powder and semisweet chocolate in a medium bowl. Pour 1/2 cup boiling water over cocoa powder and chocolate and whisk until smooth. Whisk in buttermilk. Set mixture aside.
   - Whisk flour, baking soda, and salt in another medium bowl.
   - Using an electric mixer, beat brown sugar and butter in a large bowl until well blended. Add eggs and vanilla and beat until light and creamy. Beat in dry ingredients and chocolate mixture.
   - Transfer to prepar