# Efficient Grocery Optimization (Offline Shopping)

## Objective
Maximize total protein score within a fixed budget, using offline prices from ALDI and Meijer.

## Dataset
The dataset includes nutritional and pricing information for various food items from offline sources (ALDI, Meijer).  
Each row represents a purchasable unit, with attributes such as:
- `item_name`
- `category`
- `price_total`, `purchase_count`, `unit_price`
- `protein_score`, `protein_per_100g`, etc.

Only the offline items are kept and passed to the optimizer.
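
A minimal sketch of what such a filter could look like, assuming `source` values follow an `offline_<STORE>` / `online_<STORE>` naming scheme (an assumption; the actual values are not shown here). The toy rows and the `offline_items` helper are hypothetical, and plain dictionaries stand in for the DataFrame to keep the example dependency-free.

```python
import re

# Hypothetical rows: the "offline_<STORE>" / "online_<STORE>" format of
# `source` is an assumption made for this illustration.
rows = [
    {"item_name": "Oats", "source": "offline_ALDI", "price_total": 3.00, "purchase_count": 1},
    {"item_name": "Lentils", "source": "offline_Meijer", "price_total": 4.00, "purchase_count": 2},
    {"item_name": "Rice", "source": "online_Amazon", "price_total": 12.00, "purchase_count": 2},
]

STORE_PATTERN = re.compile(r"_(\w+)")  # captures the text after the first underscore


def offline_items(rows):
    """Keep offline rows and attach `store` and `unit_price` to each one."""
    kept = []
    for row in rows:
        match = STORE_PATTERN.search(row["source"])
        if match is None or not row["source"].startswith("offline"):
            continue  # skip online rows and rows without a store name
        kept.append({
            **row,
            "store": match.group(1),
            "unit_price": row["price_total"] / row["purchase_count"],
        })
    return kept


filtered = offline_items(rows)
for item in filtered:
    print(item["store"], item["item_name"], f"${item['unit_price']:.2f}")
```

With real data, the prefix test would need to match whatever convention the `source` column actually uses.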

## Optimization Strategy
We implement a dynamic programming (0/1 knapsack) algorithm to maximize total protein score within a given budget.  
Key aspects:
- Objective: Maximize sum of `protein_score`
- Constraint: Budget limit in dollars
- Prices and budget are multiplied by 100 (dollars to integer cents) so they can index the DP table
- Output is grouped by store and category
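
The strategy above can be sketched as a small standalone function run on toy data. This is an illustrative sketch, not the notebook's exact cell: the name `knapsack_select` and the toy prices and scores are hypothetical.

```python
def knapsack_select(prices, scores, budget):
    """0/1 knapsack: maximize total score with total price <= budget.

    Prices and budget are given in dollars and converted to integer
    cents so they can be used as DP table indices.
    """
    cents = [int(round(p * 100)) for p in prices]
    cap = int(round(budget * 100))
    n = len(prices)

    # best[i][w] = best score using the first i items with at most w cents
    best = [[0.0] * (cap + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, gain = cents[i - 1], scores[i - 1]
        for w in range(cap + 1):
            best[i][w] = best[i - 1][w]  # option 1: skip item i-1
            if cost <= w:                # option 2: take it, if affordable
                best[i][w] = max(best[i][w], best[i - 1][w - cost] + gain)

    # Walk back through the table to recover which items were taken
    chosen, w = [], cap
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(i - 1)
            w -= cents[i - 1]
    return sorted(chosen), best[n][cap]


# Toy data (hypothetical prices and scores, not taken from the dataset)
prices = [2.50, 4.00, 1.50, 3.00]
scores = [10.0, 18.0, 5.0, 12.0]
chosen, total = knapsack_select(prices, scores, budget=7.00)
print(chosen, total)  # → [1, 3] 30.0
```

The table has `(n + 1) * (budget_in_cents + 1)` entries, so memory grows with the budget. That is acceptable for a $50 budget but worth keeping in mind for larger ones.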

```python
import pandas as pd
from collections import defaultdict

# Load data
df = pd.read_csv("../data/efficient_shopping_optimizer_data.csv")

# Select the needed columns and drop rows with missing values
items = df[['item_name', 'price_total', 'protein_score', 'source', 'purchase_count', 'category']].copy()
items = items.dropna(subset=['item_name', 'price_total', 'protein_score', 'source', 'purchase_count', 'category'])

# Extract the store name (the text after the underscore in `source`)
items['store'] = items['source'].str.extract(r'_(\w+)', expand=False)
items = items.dropna(subset=['store'])  # drop rows where no store name was matched

# Calculate unit price
items['unit_price'] = items['price_total'] / items['purchase_count']
items = items.reset_index(drop=True)

# Prepare plain lists for the DP
protein_scores = items['protein_score'].tolist()
unit_prices = items['unit_price'].tolist()
item_names = items['item_name'].tolist()
stores = items['store'].tolist()
categories = items['category'].tolist()
budget = 50  # dollars
n = len(protein_scores)

# DP scaling: convert dollars to integer cents so prices can index the table
scaled_weights = [int(round(w * 100)) for w in unit_prices]
scaled_budget = int(round(budget * 100))

# dp[i][w] = best total protein score using the first i items with at most w cents
dp = [[0.0] * (scaled_budget + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
    for w in range(scaled_budget + 1):
        if scaled_weights[i - 1] > w:
            # Item i-1 does not fit in the remaining budget
            dp[i][w] = dp[i - 1][w]
        else:
            # Keep the better of skipping or taking item i-1
            dp[i][w] = max(dp[i - 1][w], dp[i - 1][w - scaled_weights[i - 1]] + protein_scores[i - 1])

# Backtrack from dp[n][scaled_budget] to recover the chosen items
selected_indices = []
w = scaled_budget
for i in range(n, 0, -1):
    if dp[i][w] != dp[i - 1][w]:  # the value changed, so item i-1 was taken
        selected_indices.append(i - 1)
        w -= scaled_weights[i - 1]
```

## Results
The final selection is printed in a Store → Category → Item hierarchy. Each item's unit price and protein score are displayed, followed by two totals:
- **Total Expenditure**
- **Total Protein Score**

```python
# Output
grouped_selection = defaultdict(lambda: defaultdict(list))
for i in selected_indices:
    grouped_selection[stores[i]][categories[i]].append(i)

print("Selected Items:")
total_cost = total_score = 0.0
for store in sorted(grouped_selection):
    print(f"{store}:")
    for category in sorted(grouped_selection[store]):
        print(f"  {category}:")
        for i in grouped_selection[store][category]:
            print(f"    - {item_names[i]} | ${unit_prices[i]:.2f} | Protein Score: {protein_scores[i]:.2f}")
            total_cost += unit_prices[i]
            total_score += protein_scores[i]
print(f"Total Expenditure: ${total_cost:.2f}")
print(f"Total Protein Score: {total_score:.2f}")
```

```
Selected Items:
ALDI:
  protein:
    - Plain Greek Yogurt | $3.79 | Protein Score: 22.50
    - Egg | $3.19 | Protein Score: 22.57
    - Chicken Breast | $9.99 | Protein Score: 25.88
  vegetables:
    - Potato | $4.69 | Protein Score: 20.35
    - Yellow Onion | $2.39 | Protein Score: 6.26
    - Carrot | $1.39 | Protein Score: 5.22
    - Cabbage | $2.25 | Protein Score: 14.52
Meijer:
  fruit:
    - Nectarine | $0.63 | Protein Score: 2.50
  protein:
    - Pork | $4.42 | Protein Score: 22.41
    - Plain Greek Yogurt | $0.75 | Protein Score: 5.36
    - Chicken Breast | $10.28 | Protein Score: 35.72
  vegetables:
    - Potato | $3.59 | Protein Score: 13.27
    - Carrot | $0.22 | Protein Score: 2.47
    - Cabbage | $2.37 | Protein Score: 13.78
Total Expenditure: $49.95
Total Protein Score: 212.82
```
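
As a quick sanity check, the printed selection can be re-summed in integer cents to confirm that it stays within the $50 budget. The sketch below uses the rounded values exactly as printed above, so its totals may differ by about a cent (or 0.01 in score) from totals computed on unrounded data. The `selection` list is transcribed by hand for illustration.

```python
# (unit price, protein score) pairs copied from the printed selection;
# these are already rounded to two decimals.
selection = [
    (3.79, 22.50), (3.19, 22.57), (9.99, 25.88),               # ALDI protein
    (4.69, 20.35), (2.39, 6.26), (1.39, 5.22), (2.25, 14.52),  # ALDI vegetables
    (0.63, 2.50),                                              # Meijer fruit
    (4.42, 22.41), (0.75, 5.36), (10.28, 35.72),               # Meijer protein
    (3.59, 13.27), (0.22, 2.47), (2.37, 13.78),                # Meijer vegetables
]
budget = 50  # dollars

# Summing in integer cents avoids floating-point drift in the cost total
total_cents = sum(int(round(price * 100)) for price, _ in selection)
total_score = sum(score for _, score in selection)

print(f"Total expenditure: ${total_cents / 100:.2f} of ${budget:.2f}")
print(f"Total protein score (from rounded values): {total_score:.2f}")
print("Within budget:", total_cents <= budget * 100)
```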
