# Notebook 26 - Simulate Waste Impact with Random Matching

This notebook evaluates a random product-matching strategy by simulating how much food waste could have been avoided if each store-ingredient pair had used a randomly selected match.

## Objectives
- Load random matches generated in Notebook 25
- Merge recipe/store context for alignment
- Join with waste logs using concept-level linking
- Aggregate avoided waste (items + value) per store
- Save results for benchmarking in Notebook 24

## Inputs
- matching_scored/matching_matrix_random.csv
- recipe_ranking/recipe_store_ranked.csv
- variant_exports/recipes_with_variants.csv
- cleaned_data/waste_with_concept.csv

## Output
- waste_simulation/waste_impact_random.csv
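
Because the notebook depends on four input files produced by earlier steps, a quick existence check before loading can give a clearer error than a failed `read_csv`. The sketch below is not part of the original notebook; `find_missing_inputs` is a hypothetical helper, and the paths are the ones listed under Inputs.

```python
from pathlib import Path

# Input paths as listed in the Inputs section above.
INPUT_FILES = [
    "matching_scored/matching_matrix_random.csv",
    "recipe_ranking/recipe_store_ranked.csv",
    "variant_exports/recipes_with_variants.csv",
    "cleaned_data/waste_with_concept.csv",
]

def find_missing_inputs(paths, base_dir="."):
    """Return the paths that do not exist under base_dir (hypothetical helper)."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]

missing = find_missing_inputs(INPUT_FILES)
if missing:
    print("Missing inputs:", missing)
```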


In [1]:
import os
import pandas as pd

# Define folders
matching_folder = "matching_scored"
ranking_folder = "recipe_ranking"
variant_folder = "variant_exports"
waste_folder = "cleaned_data"
output_folder = "waste_simulation"
os.makedirs(output_folder, exist_ok=True)

# Define files
match_file = os.path.join(matching_folder, "matching_matrix_random.csv")
ranked_file = os.path.join(ranking_folder, "recipe_store_ranked.csv")
recipes_file = os.path.join(variant_folder, "recipes_with_variants.csv")
waste_file = os.path.join(waste_folder, "waste_with_concept.csv")

# Load datasets
df_matches = pd.read_csv(match_file)
df_ranked = pd.read_csv(ranked_file)
df_recipes = pd.read_csv(recipes_file)
df_waste = pd.read_csv(waste_file)

print("Loaded:")
print(f"- Random Matches: {df_matches.shape}")
print(f"- Ranked Recipes: {df_ranked.shape}")
print(f"- Recipes: {df_recipes.shape}")
print(f"- Waste: {df_waste.shape}")


Loaded:
- Random Matches: (4, 8)
- Ranked Recipes: (4, 6)
- Recipes: (6, 8)
- Waste: (18382, 16)


In [5]:
# Coerce row_id to a nullable integer so both frames share the same key dtype
df_matches["row_id"] = pd.to_numeric(df_matches["row_id"], errors="coerce").astype("Int64")
df_recipes["row_id"] = pd.to_numeric(df_recipes["row_id"], errors="coerce").astype("Int64")

# Normalize ingredient names
df_matches["ingredient"] = df_matches["ingredient"].astype(str).str.strip().str.lower()
df_recipes["ingredient"] = df_recipes["ingredient"].astype(str).str.strip().str.lower()

# Merge recipe names back into df_matches using row_id and ingredient
df_matches = df_matches.merge(
    df_recipes[["row_id", "ingredient", "recipe"]],
    on=["row_id", "ingredient"],
    how="left"
)

# Confirm the column is present (this does not verify that every row found a recipe)
assert "recipe" in df_matches.columns, "Merge failed: 'recipe' column missing"
print("Merge successful. 'recipe' column restored to df_matches.")


Merge successful. 'recipe' column restored to df_matches.
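
The assertion above only confirms that the `recipe` column exists; a left merge always adds it, even when no row matches. A stricter check, sketched here on made-up toy data, could use the `validate` and `indicator` parameters of `DataFrame.merge` to catch duplicate keys in the recipe table and to count match rows that found no recipe. `attach_recipe` is a hypothetical helper, not something the notebook defines.

```python
import pandas as pd

def attach_recipe(matches, recipes):
    """Left-join recipe names and report how many match rows found no recipe."""
    merged = matches.merge(
        recipes[["row_id", "ingredient", "recipe"]],
        on=["row_id", "ingredient"],
        how="left",
        validate="many_to_one",  # raises MergeError if recipes has duplicate keys
        indicator=True,          # adds a _merge column: "both" or "left_only"
    )
    n_unmatched = int((merged["_merge"] == "left_only").sum())
    return merged.drop(columns="_merge"), n_unmatched

# Toy data, made up for illustration only
toy_matches = pd.DataFrame({"row_id": [1, 2], "ingredient": ["honey", "oats"]})
toy_recipes = pd.DataFrame(
    {"row_id": [1], "ingredient": ["honey"], "recipe": ["Toy Recipe"]}
)

merged, n_unmatched = attach_recipe(toy_matches, toy_recipes)
print("Rows without a recipe:", n_unmatched)
```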


In [7]:
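# Normalize concept field for join
# Note: .astype(str) turns missing values into the literal string "nan",
# which then behaves like any other key value in the join below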
df_matches["product_concept"] = df_matches["product_concept"].astype(str).str.strip().str.lower()
df_waste["product_concept"] = df_waste["product_concept"].astype(str).str.strip().str.lower()

# Filter to deployed recipe-store pairs only
deployable_pairs = df_ranked[["store", "recipe"]].drop_duplicates()
df_deployed = df_matches.merge(deployable_pairs, on=["store", "recipe"], how="inner")
print("Deployed random matches:", df_deployed.shape)


Deployed random matches: (4, 10)


In [24]:
# Standardize the store column name in df_waste to "Store" before the merge
df_waste = df_waste.rename(columns={col: "Store" for col in df_waste.columns if col.lower() == "store"})


In [25]:
# Perform concept-level join to match deployed products to waste records
df_simulated = df_waste.merge(
    df_deployed,
    left_on=["Store", "product_concept"],
    right_on=["store", "product_concept"],
    how="inner"
)

print("Simulated random impact matches:", df_simulated.shape)
display(df_simulated[["Store", "product_concept", "Items wasted", "Value wasted", "recipe"]].head())


Simulated random impact matches: (610, 25)


|   | Store | product_concept | Items wasted | Value wasted | recipe |
|---|---|---|---|---|---|
| 0 | 1024 |  | 6 | 5.94 | Greek Yogurt & Honey |
| 1 | 1024 |  | 4 | 3.16 | Greek Yogurt & Honey |
| 2 | 1024 |  | 8 | 15.92 | Greek Yogurt & Honey |
| 3 | 1024 |  | 2 | 2.3 | Greek Yogurt & Honey |
| 4 | 1024 |  | 2 | 4.18 | Greek Yogurt & Honey |
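
The preview shows no `product_concept` value for these rows. One possible cause, offered as an assumption rather than a diagnosis, is that missing concepts on both sides were converted to the same placeholder text by `.astype(str)` and then joined to each other. A minimal sketch of a normalization that keeps missing values missing, followed by a join that excludes them, might look like this. `normalize_key` is a hypothetical helper and the data is made up.

```python
import numpy as np
import pandas as pd

def normalize_key(series):
    """Strip and lowercase a text key while keeping missing values missing."""
    # The nullable "string" dtype keeps NaN as <NA> instead of the text "nan".
    return series.astype("string").str.strip().str.lower()

# Toy data, made up for illustration only
toy_waste = pd.DataFrame(
    {"store": [1, 1], "product_concept": [" Yogurt ", np.nan], "items": [6, 4]}
)
toy_deployed = pd.DataFrame(
    {"store": [1, 1], "product_concept": ["yogurt", np.nan]}
)

for frame in (toy_waste, toy_deployed):
    frame["product_concept"] = normalize_key(frame["product_concept"])

# Drop rows with no concept on both sides first, since pandas would
# otherwise pair missing keys with each other during the merge.
joined = toy_waste.dropna(subset=["product_concept"]).merge(
    toy_deployed.dropna(subset=["product_concept"]),
    on=["store", "product_concept"],
    how="inner",
)
print("Rows joined on a real concept:", len(joined))
```

Comparing the row count with and without the `dropna` step on the real data would show how many of the 610 joined rows, if any, rely on a missing concept.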


In [26]:
# Find all columns named 'store' (case-insensitive)
store_col_idx = [i for i, c in enumerate(df_simulated.columns) if c.lower() == "store"]

# Check we have at least two (waste + deployed)
if len(store_col_idx) < 2:
    raise ValueError("Expected at least two 'store' columns in df_simulated.")

# Extract second one (from df_deployed) as a temp variable
store_series = df_simulated.iloc[:, store_col_idx[1]]

# Drop both store columns, using the labels looked up from their positions
df_simulated.drop(df_simulated.columns[store_col_idx], axis=1, inplace=True)

# Assign clean version back to correct name
df_simulated["store"] = pd.to_numeric(store_series, errors="coerce").astype("Int64")
assert df_simulated["store"].ndim == 1

print("Cleaned and reassigned 'store' column safely.")


Cleaned and reassigned 'store' column safely.
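
The cleanup above is needed because the join keeps both `Store` (from the waste log) and `store` (from the deployed matches). An alternative, sketched here on made-up data and not taken from the original notebook, is to give both frames the same key name before merging so that only one store column is produced. `join_waste_to_deployed` is a hypothetical helper.

```python
import pandas as pd

def join_waste_to_deployed(waste, deployed):
    """Inner-join on a single lowercase 'store' key so only one store column results."""
    waste = waste.rename(columns={"Store": "store"})
    return waste.merge(deployed, on=["store", "product_concept"], how="inner")

# Toy data, made up for illustration only
toy_waste = pd.DataFrame({
    "Store": [1024, 1090],
    "product_concept": ["yogurt", "honey"],
    "Items wasted": [6, 2],
})
toy_deployed = pd.DataFrame({
    "store": [1024],
    "product_concept": ["yogurt"],
    "recipe": ["Toy Recipe"],
})

joined = join_waste_to_deployed(toy_waste, toy_deployed)
print(list(joined.columns))
```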


In [27]:
# Rename for consistency
df_simulated = df_simulated.rename(columns={
    "Items wasted": "items_wasted",
    "Value wasted": "value_wasted"
})

# Aggregate total waste potentially avoided per store
df_impact = df_simulated.groupby("store").agg({
    "items_wasted": "sum",
    "value_wasted": "sum"
}).reset_index()

# Save results
output_file = os.path.join(output_folder, "waste_impact_random.csv")
df_impact.to_csv(output_file, index=False)

print("Saved simulated waste impact (random matching) to:", output_file)
display(df_impact)


Saved simulated waste impact (random matching) to: waste_simulation\waste_impact_random.csv


|   | store | items_wasted | value_wasted |
|---|---|---|---|
| 0 | 1024 | 327 | 528.54 |
| 1 | 1090 | 166 | 371.52 |
| 2 | 3340 | 275 | 421.81 |
| 3 | 4255 | 429 | 677.31 |
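
One caveat when reading these totals: if several deployed rows at a store share the same `product_concept`, the inner join repeats each matching waste record once per deployed row, and the sums count that waste more than once. Whether this happens in the real data is not shown here. The sketch below uses made-up data to illustrate the effect and one possible guard, deduplicating the deployed keys before the join; `total_avoided` is a hypothetical helper.

```python
import pandas as pd

# Toy data, made up for illustration: two recipes at one store share a concept
toy_deployed = pd.DataFrame({
    "store": [1, 1],
    "product_concept": ["yogurt", "yogurt"],
    "recipe": ["Recipe A", "Recipe B"],
})
toy_waste = pd.DataFrame({
    "store": [1],
    "product_concept": ["yogurt"],
    "items_wasted": [6],
    "value_wasted": [5.94],
})

def total_avoided(waste, deployed, dedupe):
    """Join waste to deployed matches and sum avoided waste per store."""
    keys = ["store", "product_concept"]
    if dedupe:
        # Keep one deployed row per (store, concept) so each waste record joins once
        deployed = deployed.drop_duplicates(subset=keys)
    joined = waste.merge(deployed, on=keys, how="inner")
    return joined.groupby("store", as_index=False).agg(
        items_wasted=("items_wasted", "sum"),
        value_wasted=("value_wasted", "sum"),
    )

naive = total_avoided(toy_waste, toy_deployed, dedupe=False)
deduped = total_avoided(toy_waste, toy_deployed, dedupe=True)
print(naive["items_wasted"].iloc[0], deduped["items_wasted"].iloc[0])
```

In this toy case the single waste record of 6 items is counted twice without deduplication and once with it.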
