# Notebook 09 - Prioritize Waste-Aware Matching

This notebook joins waste and markdown data, both already enriched with product concepts, to the product dataset. Each product receives a `waste_flag` and a `markdown_flag`, set to 1 when its product concept appears in the recent waste or markdown (discount) files and 0 otherwise. Concepts come from a shared food ontology.

Input files:
- `products_semantic_ready.csv`
- `waste_with_concept.csv`
- `markdown_with_concept.csv`

Output:
- `products_with_priority.csv`


In [1]:
import pandas as pd
import os

# Paths
input_folder = "cleaned_data"
output_folder = "cleaned_data"

products_file = os.path.join(input_folder, "products_semantic_ready.csv")
waste_file = os.path.join(input_folder, "waste_with_concept.csv")
markdown_file = os.path.join(input_folder, "markdown_with_concept.csv")


In [8]:
# NOTE: products are read from products_with_ontology.csv, not from the products_file path defined in the first cell
df_products = pd.read_csv(os.path.join(input_folder, "products_with_ontology.csv"))
# Reuse the paths defined in the first cell
df_waste = pd.read_csv(waste_file)
df_markdown = pd.read_csv(markdown_file)

print("Products:", df_products.shape)
print("Waste:", df_waste.shape)
print("Markdown:", df_markdown.shape)


Products: (126919, 32)
Waste: (18382, 16)
Markdown: (5605, 9)



In [9]:
# The product name sits in a column with no header, which pandas names 'Unnamed: 4'; normalize it to lowercase, stripped text
df_markdown["product_name_clean"] = df_markdown["Unnamed: 4"].astype(str).str.strip().str.lower()


In [10]:
# Canonical mapping shared with other notebooks
ontology = {
    "aardbeien": "strawberries", "strawberries": "strawberries",
    "bananen": "banana", "banana": "banana",
    "volle yoghurt": "yogurt", "magere yoghurt": "yogurt",
    "roeryoghurt": "yogurt", "kwark aardbei": "yogurt", "yogurt": "yogurt",
    "bloemenhoning": "honey", "mosterd honing": "honey", "honey": "honey",
    "tomaten": "tomato", "tomatengroentesoep": "tomato", "tomato": "tomato",
    "tonijn": "tuna", "tuna": "tuna",
    "wolkentoetje banaan": "banana"
}

def map_to_ontology(text, mapping):
    """Return the canonical concept for an exact (lowercased, stripped) name match, else None."""
    if isinstance(text, str):
        return mapping.get(text.lower().strip())
    return None
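
Because the lookup is an exact match on the lowercased, stripped string, a name carrying extra tokens (a brand, a pack size) returns `None`. The sketch below illustrates this with a small hypothetical subset of the mapping and made-up product names; neither is taken from the real files.

```python
# Hypothetical subset of the ontology, for illustration only
sample_ontology = {"bananen": "banana", "volle yoghurt": "yogurt", "tonijn": "tuna"}

def map_to_ontology(text, mapping):
    """Exact-match lookup on the lowercased, stripped string."""
    if isinstance(text, str):
        return mapping.get(text.lower().strip())
    return None

# Made-up input names: two exact matches, one name with extra tokens, one missing value
samples = ["  Bananen ", "Volle Yoghurt", "tonijn in olie 160g", None]
results = [map_to_ontology(s, sample_ontology) for s in samples]
print(results)
```

If many real names look like the third sample, a substring or token-based match might be a better fit, at the cost of more false positives.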


In [11]:
df_markdown["product_concept"] = df_markdown["product_name_clean"].apply(lambda x: map_to_ontology(x, ontology))


In [12]:
# Collect the unique, non-null concepts seen in each file
waste_concepts = df_waste["product_concept"].dropna().unique()
markdown_concepts = df_markdown["product_concept"].dropna().unique()
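
Before using these concept lists, it can help to check how many rows actually received a concept. The following is a minimal sketch on a toy frame that stands in for `df_markdown`; the rows are invented and the variable names `toy_markdown`, `coverage` and `unmapped_names` are hypothetical.

```python
import pandas as pd

# Toy stand-in for df_markdown; values are invented for illustration
toy_markdown = pd.DataFrame({
    "product_name_clean": ["bananen", "tonijn in olie", "tomaten", "brood"],
    "product_concept": ["banana", None, "tomato", None],
})

# Boolean mask of rows that received a concept
mapped = toy_markdown["product_concept"].notna()
coverage = mapped.mean()  # share of rows with a concept
unmapped_names = toy_markdown.loc[~mapped, "product_name_clean"].tolist()

print(f"Mapped {mapped.sum()} of {len(toy_markdown)} rows ({coverage:.0%})")
print("Unmapped:", unmapped_names)
```

A low coverage value would suggest extending the ontology or loosening the match before trusting the resulting flags.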


In [13]:
df_products["waste_flag"] = df_products["product_concept"].isin(waste_concepts).astype(int)
df_products["markdown_flag"] = df_products["product_concept"].isin(markdown_concepts).astype(int)

# Preview result
display(df_products[["product_concept", "waste_flag", "markdown_flag"]].drop_duplicates().head(10))


| index | product_concept | waste_flag | markdown_flag |
|---|---|---|---|
| 0 | NaN | 0 | 0 |
| 8058 | yogurt | 1 | 0 |
| 119042 | honey | 1 | 0 |
| 123879 | strawberries | 1 | 0 |
| 126126 | banana | 1 | 0 |
| 126530 | tomato | 0 | 0 |
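
The flagging step is a membership test: `isin` returns `True` where a product's concept is in the given collection, and missing concepts come out as `False`. Here is a minimal sketch on invented data (the `toy_*` names are hypothetical). Note that the preview shows `markdown_flag` as 0 in every row, so it may be worth confirming that `markdown_concepts` is non-empty and overlaps the product concepts.

```python
import pandas as pd

# Invented product concepts, including one missing value
toy_products = pd.DataFrame({"product_concept": ["yogurt", "tomato", None, "banana"]})
toy_waste_concepts = ["yogurt", "banana"]
toy_markdown_concepts = ["tomato"]

# isin() yields booleans; astype(int) turns them into 0/1 flags
toy_products["waste_flag"] = toy_products["product_concept"].isin(toy_waste_concepts).astype(int)
toy_products["markdown_flag"] = toy_products["product_concept"].isin(toy_markdown_concepts).astype(int)

print(toy_products)
```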


In [14]:
# Add priority score: sum of waste and markdown indicators
df_products["priority_score"] = df_products["waste_flag"] + df_products["markdown_flag"]

# Export enriched product file
priority_output_path = os.path.join(output_folder, "products_with_priority.csv")
df_products.to_csv(priority_output_path, index=False)

print("-> Saved priority-enriched products to:", priority_output_path)


-> Saved priority-enriched products to: cleaned_data\products_with_priority.csv
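
Since `priority_score` is the sum of two 0/1 flags, it can only take the values 0, 1 or 2. The sketch below shows one way a downstream step might rank products by it. The data is invented, and sorting in descending order is an assumption about how the score would be used.

```python
import pandas as pd

# Invented flags for four concepts
toy = pd.DataFrame({
    "product_concept": ["yogurt", "tomato", "tuna", "banana"],
    "waste_flag": [1, 0, 0, 1],
    "markdown_flag": [1, 1, 0, 0],
})
toy["priority_score"] = toy["waste_flag"] + toy["markdown_flag"]

# Highest score first; a stable sort is requested so tied rows are ordered deterministically
ranked = toy.sort_values("priority_score", ascending=False, kind="stable")
ranked_scores = ranked["priority_score"].tolist()
top_concept = ranked.iloc[0]["product_concept"]

print(ranked)
```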


## Summary

This notebook adds three columns to the product dataset so that inventory can be prioritized:

- `waste_flag`: 1 if the product concept appears in the recent waste data, otherwise 0
- `markdown_flag`: 1 if the product concept appears in the recent markdown (discount) data, otherwise 0
- `priority_score`: Sum of both flags (0 to 2); a higher score means the product is more urgent to use

This allows downstream notebooks to:
- Favor overstocked or expiring inventory
- Build waste-aware meal boxes aligned with store conditions
- Reduce waste at both the retail and consumer level

Next: In **Notebook 10**, we will use these priority scores and product availability to generate store-specific meal kits.
