# Dietary Restriction AI

### Import Packages

In [1]:
pip install langchain_community

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install pandas

Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install pyspark

Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install sentence_transformers

Note: you may need to restart the kernel to use updated packages.


In [5]:
pip install tokenizers

Note: you may need to restart the kernel to use updated packages.


In [6]:
pip install faiss-cpu

Note: you may need to restart the kernel to use updated packages.


In [7]:
from langchain_community.llms import Ollama
import os
import json
import pandas as pd 
import numpy as np
import tempfile 
from pyspark.sql import SparkSession

from pyspark.sql import functions as f
from pyspark.sql import Row
from sentence_transformers import SentenceTransformer
import faiss
import ollama

import re

### Initialize a Spark Session

In [8]:
spark = SparkSession.builder \
    .appName("Local Spark") \
    .config("spark.driver.memory", "4g") \
    .config("spark.executor.memory", "4g") \
    .getOrCreate()

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/04/07 19:31:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


### Data Ingestion
To work with the LLM pipeline we will be creating, the data needs to be processed into a format suited to efficient retrieval and stored in a searchable index.

##### Recipe Data
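Our recipe data is sourced from web-scraped JSON files (the `recipes_raw_nosource` files loaded below). Of the fields in each record, we keep the title, the ingredients, and the instructions.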

In [9]:
all_files = os.listdir("./")
recipe_files = [file for file in all_files if "recipes_raw_nosource" in file]

df_list = []

for file_name in recipe_files:
    # Each file holds a JSON object whose values are individual recipe records
    with open(file_name, "r", encoding="utf-8") as file:
        data = json.load(file)

    file_df = pd.DataFrame.from_dict(data, orient="index")
    df_list.append(file_df)  # Collect DataFrame
    
# Concatenate all dataframes
recipes_df = pd.concat(df_list)

# select only title, ingredient, instructions columns
recipes_df = recipes_df[['title', 'ingredients', 'instructions']]

# convert to a Spark DataFrame and repartition
recipes_df = spark.createDataFrame(recipes_df)
recipes_df = recipes_df.repartition(100)

recipes_df.show()

25/04/07 19:31:34 WARN TaskSetManager: Stage 0 contains a task of very large size (16301 KiB). The maximum recommended task size is 1000 KiB.
                                                                                

+--------------------+--------------------+--------------------+
|               title|         ingredients|        instructions|
+--------------------+--------------------+--------------------+
|  Baked Greens Chips|[6 to 8 ounces he...|Watch how to make...|
|Sweet Potato-Chic...|[2 large sweet po...|To prepare the ha...|
|         Cali Burger|[1/4 cup mayonnai...|For the chipotle ...|
|Oatmeal Cream Che...|[2 sticks unsalte...|Preheat the oven ...|
|    Campari Spritzer|[1 (12-ounce) can...|Stir the orange j...|
|Seared Rack of La...|[1/2 cup pistachi...|Watch how to make...|
|         Cream Puffs|[6 tablespoons un...|Special equipment...|
|Italian Style Hot...|[Cooking oil, sui...|In a saucepan ove...|
|    Crab Cakes Salad|[2 tablespoons fi...|For the salad: Co...|
|      Hot Cross Buns|[2 ounces fresh y...|Crumble the yeast...|
|Strawberries with...|[4 pints (8 cups)...|Thirty minutes to...|
|       Curry Chicken|[2 pounds chicken...|Put the sliced ch...|
|Golden Squash Blo...|[1 

##### Cooking Literature Data

The cooking literature data was pre-processed from PDF text files into a usable format in another notebook.
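
That preprocessing notebook is not reproduced here. As a rough sketch of what such a step might look like, the code below splits extracted text into overlapping word windows and arranges them in the shape the embedding cell expects: a list of objects with a `body` field. The helper name `chunk_text`, the window sizes, and the placeholder text are assumptions for illustration, not the preprocessing actually used.

```python
import json


def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into overlapping word windows (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if window:
            # "body" is the key read later when embeddings are generated
            chunks.append({"body": " ".join(window)})
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Placeholder text standing in for text extracted from a PDF
sample_text = "word " * 500
demo_chunks = chunk_text(sample_text, chunk_size=200, overlap=40)

# The same structure could be written out with json.dump(...)
serialized = json.dumps(demo_chunks)
```

The overlap keeps sentences that straddle a boundary visible in both neighbouring chunks, at the cost of some duplicated text in the index.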

### Data Chunking
##### Recipes Data
The data was chunked at the recipe level, so that each recipe can be referenced individually when needed. Since this use case is about modifying whole recipes, we want the model to be able to retrieve each recipe in its entirety.
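
For a single recipe, the chunk built in the next cell is roughly equivalent to the plain-Python sketch below. Spark's `concat_ws` joins the elements of an array column with the same separator and skips nulls, so each ingredient ends up on its own line. The helper name `build_recipe_chunk` and the example recipe are hypothetical and are not taken from the dataset.

```python
def build_recipe_chunk(title, ingredients, instructions):
    """Join one recipe's fields into a single newline-separated chunk."""
    # Array elements are flattened into the same sequence as the scalar fields
    parts = [title, *(ingredients or []), instructions]
    # Mirror concat_ws, which skips null values instead of failing
    return "\n".join(p for p in parts if p is not None)


# Hypothetical recipe used only to show the resulting layout
example_chunk = build_recipe_chunk(
    "Example Omelet",
    ["2 eggs", "1 tablespoon butter"],
    "Whisk the eggs, then cook in butter.",
)
```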

In [10]:
recipes_df_chunk = recipes_df.withColumn("chunk_text", 
                                         f.concat_ws("\n", f.col("title"), f.col("ingredients"), f.col("instructions")))
recipes_df_chunk.show()

25/04/07 19:31:36 WARN TaskSetManager: Stage 3 contains a task of very large size (16301 KiB). The maximum recommended task size is 1000 KiB.


+--------------------+--------------------+--------------------+--------------------+
|               title|         ingredients|        instructions|          chunk_text|
+--------------------+--------------------+--------------------+--------------------+
|  Baked Greens Chips|[6 to 8 ounces he...|Watch how to make...|Baked Greens Chip...|
|Sweet Potato-Chic...|[2 large sweet po...|To prepare the ha...|Sweet Potato-Chic...|
|         Cali Burger|[1/4 cup mayonnai...|For the chipotle ...|Cali Burger\n1/4 ...|
|Oatmeal Cream Che...|[2 sticks unsalte...|Preheat the oven ...|Oatmeal Cream Che...|
|    Campari Spritzer|[1 (12-ounce) can...|Stir the orange j...|Campari Spritzer\...|
|Seared Rack of La...|[1/2 cup pistachi...|Watch how to make...|Seared Rack of La...|
|         Cream Puffs|[6 tablespoons un...|Special equipment...|Cream Puffs\n6 ta...|
|Italian Style Hot...|[Cooking oil, sui...|In a saucepan ove...|Italian Style Hot...|
|    Crab Cakes Salad|[2 tablespoons fi...|For the sal

                                                                                

In [11]:
recipes_df = recipes_df_chunk.select("chunk_text").withColumnRenamed("chunk_text", "recipe_text")
recipes_df.show()

25/04/07 19:31:36 WARN TaskSetManager: Stage 6 contains a task of very large size (16301 KiB). The maximum recommended task size is 1000 KiB.


+--------------------+
|         recipe_text|
+--------------------+
|Baked Greens Chip...|
|Sweet Potato-Chic...|
|Cali Burger\n1/4 ...|
|Oatmeal Cream Che...|
|Campari Spritzer\...|
|Seared Rack of La...|
|Cream Puffs\n6 ta...|
|Italian Style Hot...|
|Crab Cakes Salad\...|
|Hot Cross Buns\n2...|
|Strawberries with...|
|Curry Chicken\n2 ...|
|Golden Squash Blo...|
|Mexican Hot Choco...|
|Georgian Short Ri...|
|Chicken Milanese\...|
|Citrus Marinated ...|
|Beet-and-Potato L...|
|Green Tea and Gin...|
|Jalapeno Cheese B...|
+--------------------+
only showing top 20 rows



### Generate Embeddings

In [12]:
# Load chunked JSON data (avoid the name `f`, which is the pyspark functions alias)
with open("chunked_data.json", "r", encoding="utf-8") as json_file:
    chunked_data = json.load(json_file)

# Load the model and embed all chunk bodies in one batched call
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = np.asarray(
    embedding_model.encode([chunk["body"] for chunk in chunked_data]),
    dtype=np.float32,
)

# Store embeddings in chunked JSON
for i, chunk in enumerate(chunked_data):
    chunk["embedding"] = embeddings[i].tolist()
    
# Create the FAISS index (kept in memory; it is not written to disk here)
embedding_dim = embeddings.shape[1]
index = faiss.IndexFlatL2(embedding_dim)
index.add(embeddings)
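
`faiss.IndexFlatL2` performs an exact, brute-force search and reports squared Euclidean distances. The pure-Python sketch below shows the same computation on toy vectors, as a way to reason about what `index.search` returns. The function `l2_search` and the toy data are illustrative and are not part of FAISS.

```python
def l2_search(vectors, query, k=5):
    """Exact k-nearest-neighbour search by squared L2 distance."""
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance: no square root is taken
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, i))
    scored.sort()  # smallest distance first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy 2-dimensional "embeddings" (made up for the example)
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = l2_search(toy_vectors, [0.9, 0.1], k=2)
```

A flat index compares the query against every stored vector, which is simple and exact but scales linearly with the number of chunks.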

### Model Ingestion

##### Prompt Engineering
The user's prompt should only need to contain the recipe to be modified. The prompt engineering code adds consistent language that does the following:
- Specifies that the user wants to modify the recipe while retaining its original intention
- Provides the dietary framework to follow, in this case a high-protein, low-carb diet. In another phase of development, this could be changed to let the user specify a diet of choice
- Requests a list of macronutrients based on the data
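
One way to keep this added language consistent is a small helper that wraps any raw recipe in the same instructions. The sketch below is a minimal version of that idea; the name `build_prompt`, its `diet` parameter, and the example recipe are assumptions, and the notebook itself builds the prompt inline in a later cell.

```python
def build_prompt(recipe_text, diet="high-protein, low-carb"):
    """Wrap a raw recipe in fixed modification instructions."""
    instructions = [
        f"Modify this recipe so that it is more suited for a {diet} diet, "
        "keeping the original intention of the dish.",
        "Provide a list of macronutrients as a part of the analysis.",
    ]
    return recipe_text.strip() + "\n\n" + "\n".join(instructions)


# Hypothetical recipe text used only for illustration
prompt = build_prompt("Example Omelet\n2 eggs\n1 tablespoon butter")
```

Making the diet a parameter is what would allow a later version to accept a diet of the user's choice without touching the rest of the pipeline.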

##### RAG Component

In [13]:
sampled_recipes = (
    recipes_df.sample(withReplacement=False, fraction=0.1, seed=42)
    .limit(100)  # cap at 100 rows; a fraction-based sample has only an approximate size
    .select("recipe_text")
    .toLocalIterator()
)

25/04/07 19:34:36 WARN TaskSetManager: Stage 9 contains a task of very large size (16301 KiB). The maximum recommended task size is 1000 KiB.
[Stage 9:>                                                        (0 + 10) / 10]

In [14]:
def retrieve_relevant_chunks(query, k=5):
    """Retrieve top-k most relevant chunks using FAISS."""
    query_embedding = embedding_model.encode(query).reshape(1, -1)  # Convert query to embedding
    distances, indices = index.search(query_embedding, k)  # Retrieve top-k chunks

    # FAISS pads with -1 when fewer than k neighbours exist, so skip those slots
    return [chunked_data[i] for i in indices[0] if i != -1]

def query_ollama_with_context(query):
    """Retrieve relevant context and query Llama 3.2 through Ollama."""
    retrieved_chunks = retrieve_relevant_chunks(query)
    context = "\n".join([chunk["body"] for chunk in retrieved_chunks])  # Combine relevant chunks

    # Formulate prompt for LLaMA
    prompt = f"Context:\n{context}\n\nQuery: {query}\nAnswer:"

    # Query Ollama
    response = ollama.chat(model="llama3.2", messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]

In [15]:
# if __name__ == "__main__":
#     query = input("Enter your recipe: ")
#     query += " Modify this recipe so that it is more suited for a high-protein, low carb diet. Provide a list of macronutrients as a part of the analysis."
#     answer = query_ollama_with_context(query)
#     print("\nOllama's Answer:", answer)

In [16]:
results = []

for line_num, row in enumerate(sampled_recipes):
    raw_query = row.recipe_text
    query = f""" {raw_query}
     Modify this recipe so that it is more suited for a high-protein, low carb diet.
     Provide a list of macronutrients as a part of the analysis.
     Make sure the macronutrients are listed as single numbers, not a range or individual value.
     They should be able to be extracted using these regex patterns:
    
        protein_pattern = r"Protein:\\s*(\\d+\\.?\\d*)g"
        fat_pattern = r"Fat:\\s*(\\d+\\.?\\d*)g"
        carbs_pattern = r"Carbohydrates:\\s*(\\d+\\.?\\d*)g"
    """
    
    try:
        answer = query_ollama_with_context(query)
        results.append({
            "original_recipe": raw_query,
            "modified_response": answer
        })
        print(f"Completed line {line_num}")
    except Exception as e:
        results.append({
            "original_recipe": raw_query,
            "modified_response": f"Error: {e}"
        })

results_df = spark.createDataFrame([Row(**r) for r in results])
results_df.show(truncate=False)

Completed line 0
Completed line 1
Completed line 2
Completed line 3
Completed line 4
Completed line 5
Completed line 6
Completed line 7
Completed line 8
Completed line 9
Completed line 10
Completed line 11
Completed line 12
Completed line 13
Completed line 14
Completed line 15
Completed line 16
Completed line 17
Completed line 18
Completed line 19
Completed line 20
Completed line 21
Completed line 22
Completed line 23
Completed line 24
Completed line 25
Completed line 26
Completed line 27
Completed line 28
Completed line 29
Completed line 30
Completed line 31
Completed line 32
Completed line 33
Completed line 34
Completed line 35
Completed line 36
Completed line 37
Completed line 38
Completed line 39
Completed line 40
Completed line 41
Completed line 42
Completed line 43
Completed line 44
Completed line 45
Completed line 46
Completed line 47
Completed line 48
Completed line 49
Completed line 50
Completed line 51
Completed line 52
Completed line 53
Completed line 54
Completed line 55
Co

                                                                                

In [17]:
all_results = results_df

In [None]:
all_results = all_results.union(results_df)

In [22]:
results_df.show()

+--------------------+--------------------+
|     original_recipe|   modified_response|
+--------------------+--------------------+
|Italian Style Hot...|To modify the rec...|
|Citrus Marinated ...|To modify the Cit...|
|Green Tea and Gin...|To modify the Gre...|
|Spicy Fried Oyste...|To modify the Spi...|
|Tres Leches\n2 cu...|To modify the Tre...|
|Table of Polenta\...|To modify the rec...|
|Hot Fudge Sauce\n...|To modify the Hot...|
|Tomato Salsa\n2 r...|To modify the Tom...|
|Raspberry Lemonad...|To modify the Ras...|
|Peppermint Red Ve...|To modify the Pep...|
|Chocolate Creme F...|To modify the rec...|
|Apricot Clafouti\...|To modify the Apr...|
|Crispy Szechuan-S...|To modify the Cri...|
|Dewey's Albuquerq...|To modify the Dew...|
|Sweet-and-Sour Ca...|To modify the Swe...|
|Chocolate Chip Ca...|To modify the Cho...|
|Fruity Gum Flower...|To modify the rec...|
|Gingerbread House...|To modify the gin...|
|Surf and Turf Rib...|To modify this re...|
|Corn Doggies\n2 1...|To modify 

In [32]:
# 1. Save as a single Parquet file
results_df.coalesce(1).write.parquet("temp_parquet_output", mode="overwrite")

# 2. Move the part file to a final .parquet file
import os
import shutil

temp_folder = "temp_parquet_output"
final_file = "results_output.parquet"

# Find the part file and move it
for fname in os.listdir(temp_folder):
    if fname.startswith("part-") and fname.endswith(".parquet"):
        shutil.move(os.path.join(temp_folder, fname), final_file)
        break

# Optional: clean up
shutil.rmtree(temp_folder)

print(f"✅ DataFrame saved as '{final_file}' in: {os.getcwd()}")


✅ DataFrame saved as 'results_output.parquet' in: /Users/jenny/ollama-env


### Recipe Evaluator

In [24]:
def extract_macronutrients(recipe_text):
    # Define a regular expression pattern to search for protein, fat, and carbs
    protein_pattern = r"Protein:\s*(\d+\.?\d*)g"
    fat_pattern = r"Fat:\s*(\d+\.?\d*)g"
    carbs_pattern = r"Carbohydrates:\s*(\d+\.?\d*)g"
    
    # Search the text for the respective macronutrients
    protein = re.search(protein_pattern, recipe_text)
    fat = re.search(fat_pattern, recipe_text)
    carbs = re.search(carbs_pattern, recipe_text)
    
    # Extract the values if found
    protein_value = float(protein.group(1)) if protein else None
    fat_value = float(fat.group(1)) if fat else None
    carbs_value = float(carbs.group(1)) if carbs else None
    
    # Return the extracted values
    return {
        "P": protein_value,
        "F": fat_value,
        "C": carbs_value
    }

In [25]:
macros = extract_macronutrients(answer) 
print(macros)

{'P': 22.0, 'F': 24.0, 'C': 10.0}
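
These patterns only match a single number immediately followed by `g`, which is why the prompt asks the model to avoid ranges. The check below applies the same protein pattern to a few hypothetical response snippets (not actual model output) to show which formats would be extracted and which would come back as `None`. The helper name `parse_protein` is an assumption.

```python
import re

# Same pattern as in extract_macronutrients
PROTEIN_PATTERN = r"Protein:\s*(\d+\.?\d*)g"


def parse_protein(text):
    """Return the protein value in grams, or None if the format does not match."""
    match = re.search(PROTEIN_PATTERN, text)
    return float(match.group(1)) if match else None


single = parse_protein("Protein: 22g")        # plain integer
decimal = parse_protein("Protein: 22.5g")     # decimal value
ranged = parse_protein("Protein: 20-25g")     # range: no match
spaced = parse_protein("Protein: 22 g")       # space before the unit: no match
bold = parse_protein("**Protein:** 22g")      # markdown bold around the label: no match
```

Because unmatched formats silently yield `None`, it is worth counting how many responses fail to parse before interpreting any evaluation results.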


In [26]:
def evaluate_recipe(protein_g, fat_g, carb_g):
    # Caloric values per gram
    PROTEIN_CAL = 4
    CARB_CAL = 4
    FAT_CAL = 9
    
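    # Guard against macronutrients that could not be extracted (None)
    if protein_g is None or fat_g is None or carb_g is None:
        return "Invalid recipe: Missing macronutrient value."

    # Calculate total calories
    total_calories = (protein_g * PROTEIN_CAL) + (fat_g * FAT_CAL) + (carb_g * CARB_CAL)
    
    if total_calories == 0:
        return "Invalid recipe: Total calories cannot be zero."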
    
    # Calculate macronutrient percentage
    protein_pct = (protein_g * PROTEIN_CAL / total_calories) * 100
    fat_pct = (fat_g * FAT_CAL / total_calories) * 100
    carb_pct = (carb_g * CARB_CAL / total_calories) * 100
    
    # Define healthy ranges
    protein_range = (10, 30)
    fat_range = (20, 35)
    carb_range = (45, 65)
    
    # Check if recipe meets healthy criteria
    if (protein_range[0] <= protein_pct <= protein_range[1] and
        fat_range[0] <= fat_pct <= fat_range[1] and
        carb_range[0] <= carb_pct <= carb_range[1]):
        return "Meets Criteria"
    else:
        return "Does Not Meet Criteria"

# Example usage
recipe_result = evaluate_recipe(protein_g=35, fat_g=20, carb_g=10)
print(recipe_result)

Does Not Meet Criteria
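
To see why the example fails, it helps to look at the calorie shares themselves. The sketch below repeats the same 4/4/9 kcal-per-gram arithmetic and returns the percentages instead of a verdict. The helper `macro_percentages` is hypothetical and is not part of the evaluator above.

```python
def macro_percentages(protein_g, fat_g, carb_g):
    """Return each macronutrient's share of total calories, in percent."""
    calories = {
        "protein": protein_g * 4,  # 4 kcal per gram
        "fat": fat_g * 9,          # 9 kcal per gram
        "carbs": carb_g * 4,       # 4 kcal per gram
    }
    total = sum(calories.values())
    if total == 0:
        raise ValueError("total calories cannot be zero")
    return {name: round(100 * kcal / total, 1) for name, kcal in calories.items()}


# Same inputs as the example usage: 35*4 + 20*9 + 10*4 = 360 kcal
shares = macro_percentages(protein_g=35, fat_g=20, carb_g=10)
```

For these inputs, protein is about 38.9% of calories (above the 10 to 30% band), fat is 50% (above 20 to 35%), and carbohydrates are about 11.1% (below 45 to 65%), so all three checks fail. Since the carbohydrate band starts at 45%, a recipe that is genuinely low in carbohydrates will likely fail this evaluator as written.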


In [27]:
print(evaluate_recipe(macros.get("P"), macros.get("F"), macros.get("C")))

Does Not Meet Criteria


In [30]:
for recipe in results: 
    print(recipe["modified_response"])

To modify the Italian Style Hot Dogs recipe to be more suitable for a high-protein, low-carb diet, we can make the following changes:

1. Reduce the amount of bread used:
	* Instead of using a full medium round loaf of bread, use a smaller portion, such as 2-3 slices or 4-6 slices of bread.
	* Consider using low-carb alternatives like lettuce wraps, portobello mushroom caps, or low-carb buns made from almond flour or coconut flour.
2. Increase the protein content:
	* Add more hot dogs to the recipe, such as 4-5 hot dogs instead of 4.
	* Consider adding other high-protein toppings, such as grilled chicken breast, steak, or bacon bits.
3. Reduce the carb content:
	* Use less potato rounds in the recipe. Instead of using about half a potato worth of rounds, use only about 1/2 cup to 1 cup of sliced potatoes.
	* Consider adding more vegetables like onions, peppers, and mushrooms, which are lower in carbs than potatoes.

Here is the modified recipe:

Ingredients:

* 4-5 all-beef hot dogs
* 