# Notebook 23 - Simulate Waste Impact with Fuzzy Matching

## Purpose
This notebook simulates how much product waste could have been avoided if a fuzzy-only recipe-product matching strategy had been deployed in stores. It mirrors the semantic-based simulation in Notebook 21 so that Notebook 24 can compare the two strategies on equal terms.

## Objectives
- Load fuzzy-only matches and join with ranked recipe-store deployment plans
- Align matched products with waste logs using concept-based linking
- Aggregate estimated waste reductions (items + value) per store
- Export results for benchmarking against other strategies

## Inputs
- matching_matrix_fuzzy.csv - Fuzzy concept-level matches
- recipe_store_ranked.csv - Ranked store-recipe plans
- recipes_with_variants.csv - Recipe ingredient metadata
- waste_with_concept.csv - Waste logs with product concept tags

## Output
- waste_impact_fuzzy.csv - Store-level avoided waste using fuzzy-only logic


In [26]:
import os
import pandas as pd

# Paths
matching_folder = "matching_scored"
ranking_folder = "recipe_ranking"
variant_folder = "variant_exports"
waste_folder = "cleaned_data"
output_folder = "waste_simulation"
os.makedirs(output_folder, exist_ok=True)

# Files
match_file = os.path.join(matching_folder, "matching_matrix_fuzzy.csv")
ranked_file = os.path.join(ranking_folder, "recipe_store_ranked.csv")
recipes_file = os.path.join(variant_folder, "recipes_with_variants.csv")
waste_file = os.path.join(waste_folder, "waste_with_concept.csv")

# Load data
df_matches = pd.read_csv(match_file)
df_ranked = pd.read_csv(ranked_file)
df_recipes = pd.read_csv(recipes_file)
df_waste = pd.read_csv(waste_file)

print("Loaded:")
print(f"- Fuzzy Matches: {df_matches.shape}")
print(f"- Ranked Recipes: {df_ranked.shape}")
print(f"- Recipes: {df_recipes.shape}")
print(f"- Waste: {df_waste.shape}")


Loaded:
- Fuzzy Matches: (21, 8)
- Ranked Recipes: (4, 6)
- Recipes: (6, 8)
- Waste: (18382, 16)
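
Before joining, it can help to confirm that each loaded frame carries the columns the later cells rely on. The sketch below is a minimal example on toy data: `missing_columns` is a hypothetical helper that is not part of this notebook, and the required column list is taken from the names used in the join and aggregation cells.

```python
import pandas as pd

# Hypothetical helper, not part of the original notebook.
def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Toy stand-in for waste_with_concept.csv with one column deliberately left out.
toy_waste = pd.DataFrame({
    "Store": [4255],
    "product_concept": ["yogurt"],
    "Items wasted": [1],
})

# Columns the later join and aggregation cells read from the waste logs.
required_waste_cols = ["Store", "product_concept", "Items wasted", "Value wasted"]

missing = missing_columns(toy_waste, required_waste_cols)
print("Missing waste columns:", missing)
```

The same helper could be pointed at `df_matches`, `df_ranked` and `df_recipes` with their own column lists.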


In [28]:
# Ensure row_id is integer-compatible in both dataframes
df_matches["row_id"] = pd.to_numeric(df_matches["row_id"], errors="coerce").astype("Int64")
df_recipes["row_id"] = pd.to_numeric(df_recipes["row_id"], errors="coerce").astype("Int64")

# Sanity check before merging
print("Sample df_matches row_ids:", df_matches["row_id"].dropna().unique()[:5])
print("Sample df_recipes row_ids:", df_recipes["row_id"].dropna().unique()[:5])


Sample df_matches row_ids: <IntegerArray>
[0, 1, 2, 3, 4]
Length: 5, dtype: Int64
Sample df_recipes row_ids: <IntegerArray>
[0, 1, 2, 3, 4]
Length: 5, dtype: Int64
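
Because `errors="coerce"` silently turns unparseable values into missing ones, it may be worth counting how many ids were lost in the conversion. The following is a small sketch on made-up ids (the `"n/a"` entry is an assumption for illustration, not a value observed in the data).

```python
import pandas as pd

# Toy ids: three valid, one unparseable string, one already missing.
raw_ids = pd.Series(["0", "1", "2", "n/a", None])

# Same conversion as the cell above: invalid entries become <NA>.
clean_ids = pd.to_numeric(raw_ids, errors="coerce").astype("Int64")

# Rows that had a value before coercion but are missing afterwards.
coerced_away = clean_ids.isna() & raw_ids.notna()
n_coerced = int(coerced_away.sum())

print("Values lost to coercion:", n_coerced)
print("Resulting dtype:", clean_ids.dtype)
```

A non-zero count would mean some rows can no longer match on `row_id` in the merge that follows.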


In [29]:
# Merge recipe context using row_id
df_matches = df_matches.merge(
    df_recipes[["row_id", "recipe"]],
    on="row_id",
    how="left"
)

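# Confirm the recipe field exists after the merge
if "recipe" not in df_matches.columns:
    raise ValueError(
        "Merge failed: 'recipe' column missing after merge. "
        "Check row_id types and values, and whether df_matches already had a "
        "'recipe' column (pandas would suffix the overlap as recipe_x / recipe_y)."
    )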
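
The `recipe_x` and `recipe_y` columns visible in the `df_deployed` preview further down suggest that `df_matches` already carried a `recipe` column, in which case the result of this cell depends on how many times it is run. One possible way to make the step repeatable is sketched below on toy data; `attach_recipe` is a hypothetical helper, and dropping the stale columns first is an assumption about the intended behaviour.

```python
import pandas as pd

# Toy frames using the column names from this notebook.
toy_matches = pd.DataFrame({
    "row_id": pd.array([2, 2], dtype="Int64"),
    "recipe": ["Greek Yogurt & Honey", "Greek Yogurt & Honey"],
    "store": [1024, 4255],
})
toy_recipes = pd.DataFrame({
    "row_id": pd.array([2], dtype="Int64"),
    "recipe": ["Greek Yogurt & Honey"],
})

# Hypothetical helper, not part of the original notebook.
def attach_recipe(matches: pd.DataFrame, recipes: pd.DataFrame) -> pd.DataFrame:
    """Attach recipe names by row_id, discarding any earlier recipe columns."""
    stale = [c for c in ("recipe", "recipe_x", "recipe_y") if c in matches.columns]
    base = matches.drop(columns=stale)
    # validate raises if a row_id appears more than once on the recipe side.
    return base.merge(
        recipes[["row_id", "recipe"]],
        on="row_id",
        how="left",
        validate="many_to_one",
    )

once = attach_recipe(toy_matches, toy_recipes)
twice = attach_recipe(once, toy_recipes)
print(list(once.columns))
print("Same result on re-run:", once.equals(twice))
```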


In [30]:
# Filter to deployed recipe-store combinations
deployable_pairs = df_ranked[["store", "recipe"]].drop_duplicates()
df_deployed = df_matches.merge(deployable_pairs, on=["store", "recipe"], how="inner")

print("Deployable fuzzy matches:", df_deployed.shape)
display(df_deployed.head())


Deployable fuzzy matches: (5, 10)


| | row_id | recipe_x | ingredient | store | product_article | product_name | product_concept | fuzzy_score | recipe_y | recipe |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Greek Yogurt & Honey | yogurt | 1024 | 438226 | Roeryoghurt | yogurt | 100.0 | Greek Yogurt & Honey | Greek Yogurt & Honey |
| 1 | 2 | Greek Yogurt & Honey | yogurt | 1090 | 438226 | Roeryoghurt | yogurt | 100.0 | Greek Yogurt & Honey | Greek Yogurt & Honey |
| 2 | 2 | Greek Yogurt & Honey | yogurt | 4255 | 105755 | Kwark aardbei | yogurt | 100.0 | Greek Yogurt & Honey | Greek Yogurt & Honey |
| 3 | 2 | Greek Yogurt & Honey | yogurt | 4255 | 438226 | Roeryoghurt | yogurt | 100.0 | Greek Yogurt & Honey | Greek Yogurt & Honey |
| 4 | 2 | Greek Yogurt & Honey | yogurt | 3340 | 438226 | Roeryoghurt | yogurt | 100.0 | Greek Yogurt & Honey | Greek Yogurt & Honey |
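
The inner merge keeps only matches whose (store, recipe) pair appears in the ranked plan, so the rest disappear without a trace. To inspect what was filtered out, one option is a left merge with `indicator=True`, sketched here on toy data. Store `9999` is a made-up example of a store without a deployment plan.

```python
import pandas as pd

# Toy matches: two stores with a plan, one hypothetical store (9999) without.
toy_matches = pd.DataFrame({
    "store": [1024, 4255, 9999],
    "recipe": ["Greek Yogurt & Honey"] * 3,
    "product_name": ["Roeryoghurt", "Kwark aardbei", "Roeryoghurt"],
})
toy_pairs = pd.DataFrame({
    "store": [1024, 4255],
    "recipe": ["Greek Yogurt & Honey"] * 2,
})

# The _merge column records whether each match found a deployment plan.
audit = toy_matches.merge(
    toy_pairs, on=["store", "recipe"], how="left", indicator=True
)

deployed = audit[audit["_merge"] == "both"].drop(columns="_merge")
dropped = audit[audit["_merge"] == "left_only"].drop(columns="_merge")

print("Deployed rows:", len(deployed))
print("Stores without a plan:", dropped["store"].tolist())
```

The `deployed` frame is equivalent to what the inner merge returns; `dropped` shows the matches that the filter removed.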


In [31]:
# Normalize concept fields: cast to string, trim whitespace, lowercase
df_deployed["product_concept"] = df_deployed["product_concept"].astype(str).str.strip().str.lower()
df_waste["product_concept"] = df_waste["product_concept"].astype(str).str.strip().str.lower()
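
One caveat with `astype(str)`: depending on the pandas version, missing concepts can be converted into literal text, which could then match each other in the join. A possible alternative, assuming missing concepts should stay missing, is the nullable `"string"` dtype. The sketch below uses made-up values.

```python
import pandas as pd

# Toy concept tags with inconsistent casing, stray whitespace and a gap.
raw = pd.Series([" Yogurt", "YOGURT ", None], dtype=object)

# Nullable string dtype: the missing entry stays <NA> through the
# strip/lower chain instead of being turned into text.
normalized = raw.astype("string").str.strip().str.lower()

n_missing = int(normalized.isna().sum())
print(normalized.dropna().tolist())
print("Missing concepts preserved:", n_missing)
```

Rows with a missing concept could then be dropped explicitly before the join, which makes the exclusion visible.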


In [32]:
# Join waste logs with deployed matches on concept + store
df_simulated = df_waste.merge(
    df_deployed,
    left_on=["Store", "product_concept"],
    right_on=["store", "product_concept"],
    how="inner"
)

print("Simulated fuzzy impact matches:", df_simulated.shape)
display(df_simulated[["Store", "product_concept", "Items wasted", "Value wasted", "recipe"]].head())


Simulated fuzzy impact matches: (7, 25)


| | Store | product_concept | Items wasted | Value wasted | recipe |
|---|---|---|---|---|---|
| 0 | 4255 | yogurt | 1 | 1.33 | Greek Yogurt & Honey |
| 1 | 4255 | yogurt | 1 | 1.33 | Greek Yogurt & Honey |
| 2 | 4255 | yogurt | 1 | 0.79 | Greek Yogurt & Honey |
| 3 | 4255 | yogurt | 1 | 0.79 | Greek Yogurt & Honey |
| 4 | 1024 | yogurt | 1 | 0.79 | Greek Yogurt & Honey |
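
The join is many-to-many on (store, concept): store 4255 has two deployed products tagged `yogurt`, and the preview above shows its waste values in repeated pairs. If each waste log row is meant to count once, which is an assumption about the intended semantics, the deployed side could be reduced to one row per (store, concept) before joining. A minimal sketch on toy data:

```python
import pandas as pd

# Toy waste logs and deployed matches modelled on the previews above.
toy_waste = pd.DataFrame({
    "Store": [4255, 4255, 1024],
    "product_concept": ["yogurt"] * 3,
    "Items wasted": [1, 1, 1],
    "Value wasted": [1.33, 0.79, 0.79],
})
toy_deployed = pd.DataFrame({
    "store": [4255, 4255, 1024],
    "product_concept": ["yogurt"] * 3,
    "product_name": ["Kwark aardbei", "Roeryoghurt", "Roeryoghurt"],
    "recipe": ["Greek Yogurt & Honey"] * 3,
})

join_keys = dict(
    left_on=["Store", "product_concept"],
    right_on=["store", "product_concept"],
    how="inner",
)

# Join as in the cell above: each 4255 waste row matches two products.
naive = toy_waste.merge(toy_deployed, **join_keys)

# One row per (store, concept); keeps the first recipe if several share a concept.
concept_keys = toy_deployed[["store", "product_concept", "recipe"]].drop_duplicates(
    subset=["store", "product_concept"]
)
deduped = toy_waste.merge(concept_keys, validate="many_to_one", **join_keys)

print("Rows with product-level join:", len(naive))
print("Rows with one key per concept:", len(deduped))
```

If counting a waste row once per matching product is actually intended, the original join is the right one and this step should be skipped.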


In [33]:
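# Consolidate the store column. The previous merge keeps both "Store"
# (waste logs) and "store" (deployed matches); the store_x/store_y branch
# is a defensive fallback for suffixed columns.
if "store_x" in df_simulated.columns and "store_y" in df_simulated.columns:
    df_simulated["store"] = df_simulated["store_x"]
elif "Store" in df_simulated.columns:
    # This can leave two columns labelled "store"; the next step keeps the first
    df_simulated = df_simulated.rename(columns={"Store": "store"})

# Drop repeated column labels (first occurrence wins), then rename the waste measures
df_simulated = df_simulated.loc[:, ~df_simulated.columns.duplicated()]
df_simulated = df_simulated.rename(columns={
    "Items wasted": "items_wasted",
    "Value wasted": "value_wasted"
})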
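
The cleanup above relies on a rename that temporarily produces two columns with the same label. A more explicit variant, sketched here on a toy frame, drops the right-hand `store` column first and then renames everything in one step. It assumes both store columns hold the same values, which follows from the inner join on them.

```python
import pandas as pd

# Toy frame with the column layout left by the concept + store join.
toy = pd.DataFrame({
    "Store": [4255],
    "store": [4255],
    "Items wasted": [1],
    "Value wasted": [1.33],
})

# Drop the redundant right-hand key, then rename in a single pass.
tidy = toy.drop(columns="store").rename(columns={
    "Store": "store",
    "Items wasted": "items_wasted",
    "Value wasted": "value_wasted",
})

print(list(tidy.columns))
```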


In [34]:
# Aggregate total avoided waste per store
df_impact = df_simulated.groupby("store").agg({
    "items_wasted": "sum",
    "value_wasted": "sum"
}).reset_index()

# Save results
output_file = os.path.join(output_folder, "waste_impact_fuzzy.csv")
df_impact.to_csv(output_file, index=False)

print(f"Saved fuzzy impact results to: {output_file}")
display(df_impact.head())


Saved fuzzy impact results to: waste_simulation\waste_impact_fuzzy.csv


| | store | items_wasted | value_wasted |
|---|---|---|---|
| 0 | 1024 | 1 | 0.79 |
| 1 | 1090 | 1 | 0.79 |
| 2 | 3340 | 1 | 0.79 |
| 3 | 4255 | 4 | 4.24 |
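
A quick reconciliation check can confirm that the per-store aggregation neither loses nor adds waste relative to the joined rows. The sketch below rebuilds a toy version of the joined rows (values modelled on the previews in this notebook) and uses named aggregation, an equivalent alternative to the dict form used above.

```python
import math
import pandas as pd

# Toy joined rows modelled on the previews: four for store 4255, one each elsewhere.
toy_simulated = pd.DataFrame({
    "store": [4255, 4255, 4255, 4255, 1024, 1090, 3340],
    "items_wasted": [1] * 7,
    "value_wasted": [1.33, 1.33, 0.79, 0.79, 0.79, 0.79, 0.79],
})

# Named aggregation: output column = (input column, function).
toy_impact = toy_simulated.groupby("store", as_index=False).agg(
    items_wasted=("items_wasted", "sum"),
    value_wasted=("value_wasted", "sum"),
)

# Store-level totals should add back up to the row-level totals.
items_ok = toy_impact["items_wasted"].sum() == toy_simulated["items_wasted"].sum()
value_ok = math.isclose(
    toy_impact["value_wasted"].sum(), toy_simulated["value_wasted"].sum()
)

print(toy_impact)
print("Totals reconcile:", items_ok and value_ok)
```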
