# Notebook 06 - Ontology Alignment and Normalization

This notebook introduces an ontology-based normalization layer for ingredient and product names.

Goals:
- Map noisy product and ingredient terms to standardized food categories
- Facilitate fallback matching and substitutions by aligning concepts (e.g., “volle yoghurt” -> “yogurt”)
- Prepare enriched data for concept-aware semantic similarity, filtering, and box planning

**Input:**
- `products_semantic_ready.csv`
- `recipes_semantic_ready.csv`

**Output:**
- `products_with_ontology.csv`
- `recipes_with_ontology.csv`


In [5]:
import pandas as pd
import os

input_folder = "cleaned_data"

# Load preprocessed datasets
df_products = pd.read_csv(os.path.join(input_folder, "products_semantic_ready.csv"))
df_recipes = pd.read_csv(os.path.join(input_folder, "recipes_semantic_ready.csv"))

print("Products:", df_products.shape)
print("Recipes:", df_recipes.shape)


Products: (126919, 31)
Recipes: (6, 5)


## Ontology Mapping

We define a lightweight mapping of noisy food descriptions to canonical food categories. This helps reduce vocabulary mismatch across datasets.
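Because the lookup in the next cell lowercases and strips its input before an exact-match `dict` lookup, two properties of the dictionary are worth checking: every key should already be lowercase and stripped, and every canonical concept should map to itself. The sketch below is one possible sanity check; `check_ontology` and `toy_ontology` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
def check_ontology(mapping):
    """Return a list of human-readable problems found in an ontology dict."""
    problems = []
    for key in mapping:
        # Lookups lowercase and strip the input, so keys in any other form can never match
        if key != key.lower().strip():
            problems.append(f"key not normalized: {key!r}")
    for concept in sorted(set(mapping.values())):
        # Each canonical concept should map to itself so already-clean terms resolve
        if mapping.get(concept) != concept:
            problems.append(f"concept is not its own key: {concept!r}")
    return problems

# Toy mapping with two deliberate mistakes (illustrative only)
toy_ontology = {
    "volle yoghurt": "yogurt",
    "yogurt": "yogurt",
    "Tonijn ": "tuna",  # uppercase and trailing space; "tuna" is also missing as a key
}
for problem in check_ontology(toy_ontology):
    print(problem)
```

An empty list means the dictionary satisfies both properties.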


In [9]:
# Improved manual ontology dictionary (expandable)
ontology = {
    # Fruit
    "aardbeien": "strawberries",
    "strawberries": "strawberries",
    "bananen": "banana",
    "banana": "banana",

    # Dairy
    "volle yoghurt": "yogurt",
    "magere yoghurt": "yogurt",
    "roeryoghurt": "yogurt",
    "kwark aardbei": "yogurt",
    "yogurt": "yogurt",

    # Sweeteners
    "bloemenhoning": "honey",
    "mosterd honing": "honey",
    "honey": "honey",

    # Vegetables
    "tomaten": "tomato",
    "tomatengroentesoep": "tomato",
    "tomato": "tomato",

    # Fish
    "tonijn": "tuna",
    "tuna": "tuna",

    # Noisy fallbacks
    "wolkentoetje banaan": "banana",
}

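def map_to_ontology(text, mapping):
    """Return the canonical concept for `text`, or None if there is no exact match."""
    if isinstance(text, str):
        return mapping.get(text.lower().strip())
    return None

# Map the English field first, then fall back to the Dutch field where no concept was found.
# The result of fillna is assigned back: calling fillna(..., inplace=True) on a selected
# column is chained assignment, which is deprecated and may leave the DataFrame unchanged.
df_products["product_concept"] = df_products["product_en"].apply(lambda x: map_to_ontology(x, ontology))
df_products["product_concept"] = df_products["product_concept"].fillna(
    df_products["product_name_clean"].apply(lambda x: map_to_ontology(x, ontology))
)

df_recipes["ingredient_concept"] = df_recipes["ingredient_en"].apply(lambda x: map_to_ontology(x, ontology))
df_recipes["ingredient_concept"] = df_recipes["ingredient_concept"].fillna(
    df_recipes["ingredient"].apply(lambda x: map_to_ontology(x, ontology))
)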

# Show sample results
display(df_products[["product_en", "product_name_clean", "product_concept"]].dropna().drop_duplicates().head(10))
display(df_recipes[["ingredient", "ingredient_en", "ingredient_concept"]])


| | product_en | product_name_clean | product_concept |
|---|---|---|---|
| 8058 | roeryoghurt | roeryoghurt | yogurt |
| 22210 | volle yoghurt | volle yoghurt | yogurt |
| 66841 | kwark aardbei | kwark aardbei | yogurt |
| 119042 | mosterd honing | mosterd honing | honey |
| 123879 | aardbeien | aardbeien | strawberries |
| 124840 | magere yoghurt | magere yoghurt | yogurt |
| 126126 | wolkentoetje banaan | wolkentoetje banaan | banana |
| 126530 | tomatengroentesoep | tomaten-groentesoep | tomato |


| | ingredient | ingredient_en | ingredient_concept |
|---|---|---|---|
| 0 | strawberries | strawberries | strawberries |
| 1 | banana | banana | banana |
| 2 | yogurt | yogurt | yogurt |
| 3 | honey | honey | honey |
| 4 | tomato | tomato | tomato |
| 5 | tuna | tuna | tuna |
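The lookup is exact-match, so any name that is not a dictionary key receives no concept. A quick coverage check makes that visible and points at the names most worth adding to the dictionary. The sketch below runs on a toy frame; the `toy` data and the `concept_coverage` helper are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for df_products; values are illustrative, not taken from the real data
toy = pd.DataFrame({
    "product_name_clean": ["volle yoghurt", "aardbeien", "pindakaas", "tonijn", "brood"],
    "product_concept": ["yogurt", "strawberries", None, "tuna", None],
})

def concept_coverage(df, concept_col):
    """Return (mapped_rows, total_rows, fraction_mapped) for a concept column."""
    mapped = int(df[concept_col].notna().sum())
    total = len(df)
    return mapped, total, (mapped / total if total else 0.0)

mapped, total, frac = concept_coverage(toy, "product_concept")
print(f"Mapped {mapped} of {total} rows ({frac:.0%})")

# Most frequent names that received no concept: candidates for new dictionary keys
unmapped = toy.loc[toy["product_concept"].isna(), "product_name_clean"]
print(unmapped.value_counts().head(10))
```

The same helper could be pointed at `df_products` and `"product_concept"` to see how much of the real catalogue the dictionary covers.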


In [10]:
# Save enriched outputs
products_out = os.path.join(input_folder, "products_with_ontology.csv")
recipes_out = os.path.join(input_folder, "recipes_with_ontology.csv")

df_products.to_csv(products_out, index=False)
df_recipes.to_csv(recipes_out, index=False)

print("-> Saved products to:", products_out)
print("-> Saved recipes to:", recipes_out)


-> Saved products to: cleaned_data\products_with_ontology.csv
-> Saved recipes to: cleaned_data\recipes_with_ontology.csv


## Summary

This notebook added a normalized `product_concept` field to the products dataset and an `ingredient_concept` field to the recipes dataset, using a simple dictionary-based ontology.

This layer:
- Standardizes naming across noisy real-world product data
- Enables concept-level matching and substitutions
- Prepares the foundation for more advanced fallback logic and waste-aware matching
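
As an illustration of what concept-level matching could look like, the sketch below joins recipe ingredients to candidate products on their shared concept. It uses toy frames with the same column names as the enriched outputs; the rows and the join strategy are assumptions, not the method used in later notebooks.

```python
import pandas as pd

# Toy stand-ins for the enriched frames; rows are illustrative only
products = pd.DataFrame({
    "product_name_clean": ["volle yoghurt", "magere yoghurt", "tonijn", "pindakaas"],
    "product_concept": ["yogurt", "yogurt", "tuna", None],
})
recipes = pd.DataFrame({
    "ingredient": ["yogurt", "tuna", "honey"],
    "ingredient_concept": ["yogurt", "tuna", "honey"],
})

# Drop unmapped products so rows with a missing concept never take part in the join
candidates = products.dropna(subset=["product_concept"])

# Left join keeps every ingredient, including those with no matching product
matches = recipes.merge(
    candidates,
    how="left",
    left_on="ingredient_concept",
    right_on="product_concept",
)
print(matches[["ingredient", "product_name_clean"]])
```

An ingredient with several products under the same concept yields one row per candidate, which is what makes substitution possible; an ingredient with no candidate keeps a single row with a missing product name.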

Next: we’ll use these concepts for filtering and prioritization of meal kits per store.
