# Food Detection POC / Spike

**Goal:** Validate whether traditional ML (YOLO + CNN classifiers) can produce accurate-enough ingredient detection and weight estimation for a macro-tracking app, *before* committing to a full backend implementation.

## Pipeline
1. Load image + extract EXIF metadata (camera, focal length)
2. YOLO detection → bounding boxes + labels + confidence
3. Dish classification → infer hidden ingredients
4. Distance / dimension estimation from EXIF + bounding box geometry
5. Weight estimation per ingredient
6. Nutritional lookup (USDA FoodData Central)
7. Summary: total calories and macros (plus fiber) per photo

## How to use
1. Drop your food photos into `./test_images/`
2. Run cells top-to-bottom
3. Review detection results visually
4. Annotate accuracy in the review section

---

## Cell 0: Setup & Installation

In [None]:
# Run once to install dependencies
# !pip install -r requirements.txt

import os
import json
from pathlib import Path
from dataclasses import dataclass, field, asdict
from typing import Optional

import cv2
import exifread
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
import pandas as pd
import requests
from PIL import Image
from ultralytics import YOLO

%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['figure.dpi'] = 100

# Create test_images directory if it doesn't exist
Path('./test_images').mkdir(exist_ok=True)

print('All imports loaded. Drop food photos into ./test_images/ and continue.')

## Cell 1: Image Loading + EXIF Metadata Extraction

We extract camera metadata to later estimate distance-to-subject and real-world dimensions.

Key EXIF fields:
- **FocalLength**: lens focal length in mm, used for distance estimation
- **FocalLengthIn35mmFilm**: 35mm-equivalent focal length, for cross-device comparison
- **Make / Model**: camera/phone identifier, used to look up the sensor size
- **ImageWidth / ImageLength**: pixel dimensions
- **ExifImageWidth / ExifImageLength**: actual capture resolution

Pixel dimensions are read with PIL rather than from EXIF, so they are available even when a photo has no EXIF data.
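When a phone is not in `KNOWN_SENSORS`, its sensor size could in principle be recovered from the two focal-length fields. The sketch below assumes the 35mm-equivalent value was derived by matching the diagonal field of view of a 36 x 24 mm frame, and that the sensor is 4:3. Some manufacturers compute the equivalent differently, so treat the result as an estimate. `sensor_size_from_35mm` is a hypothetical helper, not part of the notebook.

```python
import math
from typing import Optional

# Diagonal of a 36 x 24 mm full-frame sensor, about 43.27 mm
FULL_FRAME_DIAGONAL_MM = math.hypot(36.0, 24.0)


def sensor_size_from_35mm(
    focal_length_mm: Optional[float],
    focal_length_35mm: Optional[float],
    aspect: tuple[int, int] = (4, 3),
) -> Optional[tuple[float, float]]:
    """Estimate (width_mm, height_mm) of the sensor from EXIF focal lengths.

    Assumes the 35mm-equivalent focal length matches the diagonal field of
    view and that the sensor has the given (width, height) aspect ratio.
    """
    if not focal_length_mm or not focal_length_35mm:
        return None
    crop_factor = focal_length_35mm / focal_length_mm
    diagonal_mm = FULL_FRAME_DIAGONAL_MM / crop_factor
    # Split the diagonal into width and height for the given aspect ratio
    aw, ah = aspect
    unit = diagonal_mm / math.hypot(aw, ah)
    return aw * unit, ah * unit


# Hypothetical EXIF values: 6.0 mm actual focal length, 24 mm equivalent
print(sensor_size_from_35mm(6.0, 24.0))  # roughly (8.65, 6.49)
```

If this were used, it would only make sense as a fallback after the `KNOWN_SENSORS` lookup fails, since a measured sensor size is more trustworthy than one derived from a rounded EXIF value.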

In [None]:
@dataclass
class ImageMetadata:
    """EXIF and derived metadata for a food photo."""
    file_path: str
    width_px: int = 0
    height_px: int = 0
    camera_make: str = 'unknown'
    camera_model: str = 'unknown'
    focal_length_mm: Optional[float] = None
    focal_length_35mm: Optional[float] = None
    # Sensor size defaults — common smartphone sensor (1/1.7")
    sensor_width_mm: float = 7.6
    sensor_height_mm: float = 5.7


# Known sensor sizes for common phones (width_mm, height_mm)
# Extend this as you test with different devices
KNOWN_SENSORS: dict[str, tuple[float, float]] = {
    'iPhone 15 Pro': (9.8, 7.3),       # 1/1.28"
    'iPhone 15 Pro Max': (9.8, 7.3),
    'iPhone 14 Pro': (9.8, 7.3),
    'iPhone 14 Pro Max': (9.8, 7.3),
    'iPhone 13 Pro': (7.6, 5.7),       # 1/1.7"
    'iPhone 13': (7.6, 5.7),
    'Pixel 8 Pro': (9.8, 7.3),         # 1/1.31"
    'Pixel 7 Pro': (9.8, 7.3),
    'Samsung SM-S928B': (9.8, 7.3),    # S24 Ultra
    'Samsung SM-S918B': (9.8, 7.3),    # S23 Ultra
}


def _parse_focal_length(tag_value) -> Optional[float]:
    """Parse EXIF focal length tag to float mm value."""
    try:
        val = tag_value.values[0]
        if hasattr(val, 'num') and hasattr(val, 'den'):
            return float(val.num) / float(val.den)
        return float(val)
    except (IndexError, TypeError, ZeroDivisionError):
        return None


def extract_metadata(image_path: str) -> ImageMetadata:
    """Extract EXIF metadata from image file."""
    meta = ImageMetadata(file_path=image_path)

    # Get pixel dimensions from PIL (reliable even without EXIF)
    with Image.open(image_path) as img:
        meta.width_px, meta.height_px = img.size

    # Extract EXIF tags
    with open(image_path, 'rb') as f:
        tags = exifread.process_file(f, details=False)

    if not tags:
        print(f'  Warning: No EXIF data found in {image_path}')
        return meta

    meta.camera_make = str(tags.get('Image Make', 'unknown')).strip()
    meta.camera_model = str(tags.get('Image Model', 'unknown')).strip()

    if 'EXIF FocalLength' in tags:
        meta.focal_length_mm = _parse_focal_length(tags['EXIF FocalLength'])

    if 'EXIF FocalLengthIn35mmFilm' in tags:
        try:
            meta.focal_length_35mm = float(tags['EXIF FocalLengthIn35mmFilm'].values[0])
        except (IndexError, TypeError):
            pass

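    # Look up sensor size from known devices. Match against "make model",
    # since some vendors (e.g. Samsung) put only a model code such as
    # 'SM-S928B' in the Model tag.
    device = f'{meta.camera_make} {meta.camera_model}'.lower()
    for model_name, (sw, sh) in KNOWN_SENSORS.items():
        if model_name.lower() in device:
            meta.sensor_width_mm = sw
            meta.sensor_height_mm = sh
            break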

    return meta


# --- Load all test images ---
IMAGE_DIR = Path('./test_images')
# .heic is left out: PIL cannot open HEIC without the pillow-heif plugin.
# To use HEIC photos, install pillow-heif, call
# pillow_heif.register_heif_opener(), then add '.heic' here.
image_extensions = {'.jpg', '.jpeg', '.png', '.webp'}
image_paths = sorted(
    p for p in IMAGE_DIR.iterdir()
    if p.suffix.lower() in image_extensions
)

# Defined before the check so later cells see an empty dict, not a NameError
all_metadata: dict[str, ImageMetadata] = {}

if not image_paths:
    print('No images found in ./test_images/')
    print('Please add some food photos and re-run this cell.')
else:
    print(f'Found {len(image_paths)} image(s):')
    for p in image_paths:
        meta = extract_metadata(str(p))
        all_metadata[p.name] = meta
        print(f'  {p.name}: {meta.width_px}x{meta.height_px}, '
              f'camera={meta.camera_model}, focal={meta.focal_length_mm}mm')

## Cell 2: YOLO Food Detection

We start with the YOLOv8n (nano) model pretrained on COCO, which covers 80 object classes, including some foods (banana, apple, pizza, cake, sandwich, etc.).

**Limitation:** COCO has only 10 food classes. For a production tracker we'd fine-tune on a food-specific dataset; note that Food-101 and ISIA Food-500 provide image-level labels only, so training a detector on them would first require bounding-box annotations. This cell validates the *pipeline*; accuracy should improve with a food-specific model.

### What to look for during review
- Does it draw boxes around food items (vs non-food)?
- Are the bounding boxes tight or sloppy?
- What confidence scores do correct detections get?
- What foods does it miss entirely?
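One way to make "tight or sloppy" measurable is to draw a box by hand around a few items and compare it with YOLO's box using intersection-over-union (IoU). A minimal sketch, assuming both boxes use the same `[x1, y1, x2, y2]` pixel format as `Detection.bbox_xyxy`; `iou_xyxy` is a hypothetical helper.

```python
def iou_xyxy(a: list[float], b: list[float]) -> float:
    """Intersection-over-union of two [x1, y1, x2, y2] boxes in pixels."""
    # Overlap rectangle (zero area if the boxes don't intersect)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hand-drawn box vs a hypothetical YOLO box shifted by half its size
print(iou_xyxy([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175, about 0.143
```

An IoU near 1.0 means a tight box; for reference, the PASCAL VOC benchmark counts a detection as correct at IoU of 0.5 or more.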

In [None]:
# Load YOLO model — downloads weights on first run (~6MB for nano)
yolo_model = YOLO('yolov8n.pt')

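# COCO classes that are food (used to set Detection.is_food)
COCO_FOOD_CLASSES = {
    46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange',
    50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza',
    54: 'donut', 55: 'cake',
}

# Non-food COCO classes that are useful context (e.g. scale references in
# Cell 4). Kept separate so bowls and cutlery are not treated as food and
# sent through ingredient inference, weight estimation, and USDA lookup.
COCO_CONTEXT_CLASSES = {
    39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork',
    43: 'knife', 44: 'spoon', 45: 'bowl',
}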


@dataclass
class Detection:
    """A single detected object in the image."""
    label: str
    confidence: float
    bbox_xyxy: list[float]   # [x1, y1, x2, y2] in pixels
    class_id: int
    is_food: bool = False

    @property
    def bbox_width_px(self) -> float:
        return self.bbox_xyxy[2] - self.bbox_xyxy[0]

    @property
    def bbox_height_px(self) -> float:
        return self.bbox_xyxy[3] - self.bbox_xyxy[1]

    @property
    def bbox_area_px(self) -> float:
        return self.bbox_width_px * self.bbox_height_px


def detect_food(image_path: str, conf_threshold: float = 0.25) -> list[Detection]:
    """Run YOLO detection and return structured results."""
    results = yolo_model.predict(source=image_path, conf=conf_threshold, verbose=False)
    detections = []

    for result in results:
        for box in result.boxes:
            class_id = int(box.cls[0])
            label = yolo_model.names[class_id]
            det = Detection(
                label=label,
                confidence=float(box.conf[0]),
                bbox_xyxy=box.xyxy[0].tolist(),
                class_id=class_id,
                is_food=class_id in COCO_FOOD_CLASSES,
            )
            detections.append(det)

    return detections


def visualise_detections(image_path: str, detections: list[Detection]):
    """Display image with bounding boxes overlaid."""
    img = Image.open(image_path)
    fig, ax = plt.subplots(1, 1)
    ax.imshow(img)

    for det in detections:
        x1, y1, x2, y2 = det.bbox_xyxy
        color = 'lime' if det.is_food else 'cyan'
        rect = patches.Rectangle(
            (x1, y1), x2 - x1, y2 - y1,
            linewidth=2, edgecolor=color, facecolor='none'
        )
        ax.add_patch(rect)
        ax.text(
            x1, y1 - 5,
            f'{det.label} ({det.confidence:.0%})',
            color='white', fontsize=9,
            bbox=dict(boxstyle='round,pad=0.2', facecolor=color, alpha=0.7)
        )

    food_count = sum(1 for d in detections if d.is_food)
    ax.set_title(f'{Path(image_path).name} — {food_count} food items, '
                 f'{len(detections)} total detections')
    ax.axis('off')
    plt.tight_layout()
    plt.show()


# --- Run detection on all test images ---
all_detections: dict[str, list[Detection]] = {}

for img_path in image_paths:
    dets = detect_food(str(img_path))
    all_detections[img_path.name] = dets
    visualise_detections(str(img_path), dets)

    food_dets = [d for d in dets if d.is_food]
    print(f'  Food items: {[f"{d.label} ({d.confidence:.0%})" for d in food_dets]}')
    print()

## Cell 3: Dish Classification (Infer Hidden Ingredients)

YOLO tells us *what objects are visible*. But a photo of "fried rice" contains soy sauce, oil, egg, etc. that aren't individually detectable.

We use a separate classifier to identify the *dish* and then look up its typical ingredient composition.

**Approach for this spike:** We use a simple lookup table mapping dish names → expected ingredients. In production this could be:
- A Food-101 fine-tuned classifier (EfficientNet or MobileViT)
- An LLM fallback for unusual dishes
- A user-editable recipe database

In [None]:
# Dish → typical ingredients mapping
# This is a stand-in for a trained classifier. In production you'd use
# a Food-101 model to classify the dish, then look up ingredients.
DISH_INGREDIENTS: dict[str, list[dict]] = {
    'pizza': [
        {'name': 'pizza dough', 'weight_pct': 0.35},
        {'name': 'mozzarella cheese', 'weight_pct': 0.25},
        {'name': 'tomato sauce', 'weight_pct': 0.15},
        {'name': 'olive oil', 'weight_pct': 0.05},
    ],
    'sandwich': [
        {'name': 'bread', 'weight_pct': 0.30},
        {'name': 'deli meat', 'weight_pct': 0.25},
        {'name': 'cheese', 'weight_pct': 0.15},
        {'name': 'lettuce', 'weight_pct': 0.10},
        {'name': 'tomato', 'weight_pct': 0.05},
        {'name': 'mayonnaise', 'weight_pct': 0.05},
    ],
    'cake': [
        {'name': 'flour', 'weight_pct': 0.25},
        {'name': 'sugar', 'weight_pct': 0.20},
        {'name': 'butter', 'weight_pct': 0.15},
        {'name': 'eggs', 'weight_pct': 0.15},
        {'name': 'frosting', 'weight_pct': 0.20},
    ],
    'hot dog': [
        {'name': 'hot dog sausage', 'weight_pct': 0.45},
        {'name': 'hot dog bun', 'weight_pct': 0.35},
        {'name': 'ketchup', 'weight_pct': 0.10},
        {'name': 'mustard', 'weight_pct': 0.05},
    ],
    # Fruits and vegetables are typically "what you see is what you get"
    'banana': [{'name': 'banana', 'weight_pct': 1.0}],
    'apple': [{'name': 'apple', 'weight_pct': 1.0}],
    'orange': [{'name': 'orange', 'weight_pct': 1.0}],
    'broccoli': [{'name': 'broccoli', 'weight_pct': 1.0}],
    'carrot': [{'name': 'carrot', 'weight_pct': 1.0}],
    'donut': [
        {'name': 'flour', 'weight_pct': 0.30},
        {'name': 'sugar', 'weight_pct': 0.20},
        {'name': 'vegetable oil', 'weight_pct': 0.20},
        {'name': 'eggs', 'weight_pct': 0.10},
        {'name': 'glaze', 'weight_pct': 0.15},
    ],
}


@dataclass
class IngredientEstimate:
    """An estimated ingredient with source attribution."""
    name: str
    source: str           # 'detected' or 'inferred_from_dish'
    weight_pct: float     # proportion of total dish weight
    weight_g: float = 0.0 # filled in by weight estimation step


def infer_ingredients(detections: list[Detection]) -> list[IngredientEstimate]:
    """Combine YOLO detections with dish-level ingredient inference."""
    ingredients = []
    seen_labels = set()

    for det in detections:
        if not det.is_food:
            continue

        label = det.label
        if label in seen_labels:
            continue
        seen_labels.add(label)

        # Look up dish ingredients
        if label in DISH_INGREDIENTS:
            for ing in DISH_INGREDIENTS[label]:
                source = 'detected' if ing['name'] == label else 'inferred_from_dish'
                ingredients.append(IngredientEstimate(
                    name=ing['name'],
                    source=source,
                    weight_pct=ing['weight_pct'],
                ))
        else:
            # Unknown dish — treat the detection label as a single ingredient
            ingredients.append(IngredientEstimate(
                name=label,
                source='detected',
                weight_pct=1.0,
            ))

    return ingredients


# --- Run ingredient inference ---
all_ingredients: dict[str, list[IngredientEstimate]] = {}

for img_name, dets in all_detections.items():
    ings = infer_ingredients(dets)
    all_ingredients[img_name] = ings

    print(f'\n{img_name}:')
    if ings:
        for ing in ings:
            print(f'  [{ing.source:>20}] {ing.name} ({ing.weight_pct:.0%})')
    else:
        print('  No food items detected — try a different photo or lower conf_threshold')
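Several `DISH_INGREDIENTS` entries don't sum to 1.0 (pizza's proportions sum to 0.80, sandwich to 0.90, and cake, hot dog, and donut to 0.95), so Cell 5 leaves 5-20% of each dish's estimated weight unassigned. If the proportions are meant to cover the whole dish, one option is to rescale them. A minimal sketch; `normalize_proportions` is a hypothetical helper that works on any table with the same shape.

```python
def normalize_proportions(table: dict[str, list[dict]]) -> dict[str, list[dict]]:
    """Return a copy of a dish table whose weight_pct values sum to 1.0 per dish."""
    normalized = {}
    for dish, ingredients in table.items():
        total = sum(ing['weight_pct'] for ing in ingredients)
        if total <= 0:
            # Nothing to rescale; copy entries unchanged
            normalized[dish] = [dict(ing) for ing in ingredients]
            continue
        normalized[dish] = [
            {**ing, 'weight_pct': ing['weight_pct'] / total} for ing in ingredients
        ]
    return normalized


# Same pizza proportions as in DISH_INGREDIENTS (they sum to 0.80)
sample = {
    'pizza': [
        {'name': 'pizza dough', 'weight_pct': 0.35},
        {'name': 'mozzarella cheese', 'weight_pct': 0.25},
        {'name': 'tomato sauce', 'weight_pct': 0.15},
        {'name': 'olive oil', 'weight_pct': 0.05},
    ],
}
for ing in normalize_proportions(sample)['pizza']:
    print(f"{ing['name']}: {ing['weight_pct']:.1%}")
```

Leaving the table as-is is also defensible if the remainder stands for unlisted ingredients, but then the gap should be deliberate rather than accidental.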

## Cell 4: Distance & Dimension Estimation

Using EXIF focal length + known sensor size + assumptions about plate/bowl diameter, we estimate how far the camera was from the food, and convert bounding box pixels into real-world centimetres.

### The geometry
```
real_width = (object_px / image_px) * (sensor_width_mm / focal_length_mm) * distance_mm
```

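Since we don't know `distance_mm` directly, we use a **reference object** of known size detected by YOLO: a bowl, cup, wine glass, bottle, or piece of cutlery (see `REFERENCE_SIZES_CM`). COCO has no plate class, so the standard dinner plate (≈ 26 cm diameter) is only used as an assumption in the fallback.

If no reference object is found, we assume the longer side of the largest food item spans ~70% of a 26 cm plate; if no food is detected either, we assume the image is ~30 cm wide. Both fallbacks report a "typical overhead phone photo" distance of ~30 cm.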
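A worked example of the formula with hypothetical numbers: a 6.0 mm focal length, a 9.8 mm-wide sensor, a 4032 px-wide image, and an 18 cm bowl spanning 1500 px. Solving the formula for `distance_mm` gives the back-calculation used for reference objects; the helper names are illustrative.

```python
def size_on_sensor_mm(object_px: float, image_px: float, sensor_width_mm: float) -> float:
    """How much of the sensor width the object covers, in mm."""
    return object_px / image_px * sensor_width_mm


def real_width_mm(object_px: float, image_px: float, sensor_width_mm: float,
                  focal_length_mm: float, distance_mm: float) -> float:
    """Forward formula: (object_px / image_px) * (sensor / focal) * distance."""
    return size_on_sensor_mm(object_px, image_px, sensor_width_mm) / focal_length_mm * distance_mm


def distance_from_reference_mm(real_size_mm: float, object_px: float, image_px: float,
                               sensor_width_mm: float, focal_length_mm: float) -> float:
    """Same formula solved for distance, given an object of known real size."""
    return real_size_mm * focal_length_mm / size_on_sensor_mm(object_px, image_px, sensor_width_mm)


# Hypothetical scene: 18 cm bowl spanning 1500 px of a 4032 px-wide image
d = distance_from_reference_mm(180.0, 1500, 4032, 9.8, 6.0)
print(f'distance ≈ {d / 10:.1f} cm')                              # distance ≈ 29.6 cm
print(f'check: {real_width_mm(1500, 4032, 9.8, 6.0, d):.1f} mm')  # check: 180.0 mm
```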

In [None]:
# Reference object sizes (real-world diameter in cm)
REFERENCE_SIZES_CM: dict[str, float] = {
    'bowl': 18.0,        # standard cereal/soup bowl
    'cup': 8.0,          # standard mug diameter
    'wine glass': 8.0,
    'fork': 19.0,        # length
    'knife': 22.0,       # length
    'spoon': 17.0,       # length
    'bottle': 7.0,       # diameter
}

DEFAULT_PLATE_DIAMETER_CM = 26.0
DEFAULT_DISTANCE_CM = 30.0  # typical overhead phone shot distance


@dataclass
class SceneGeometry:
    """Estimated real-world geometry of the scene."""
    px_per_cm: float         # pixels per centimetre
    distance_cm: float       # estimated camera-to-food distance
    reference_used: str      # what object was used for calibration
    confidence: str          # 'high', 'medium', 'low'


def estimate_scene_geometry(
    metadata: ImageMetadata,
    detections: list[Detection],
) -> SceneGeometry:
    """Estimate px/cm ratio using reference objects or fallback."""

    # Strategy 1: Use a detected reference object
    for det in detections:
        if det.label in REFERENCE_SIZES_CM:
            real_size_cm = REFERENCE_SIZES_CM[det.label]
            # Use the larger bbox dimension as the reference measurement
            ref_px = max(det.bbox_width_px, det.bbox_height_px)
            px_per_cm = ref_px / real_size_cm

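            # Back-calculate distance if we have focal length (pinhole model):
            # distance = real_size * focal_length / size_on_sensor
            distance_cm = DEFAULT_DISTANCE_CM
            if metadata.focal_length_mm and metadata.sensor_width_mm:
                # Pixel pitch in mm/px. The sensor's long side maps to the
                # image's long side, so this holds in portrait and landscape.
                mm_per_px = metadata.sensor_width_mm / max(metadata.width_px, metadata.height_px)
                ref_on_sensor_mm = ref_px * mm_per_px
                # Units: cm * mm / mm = cm
                distance_cm = real_size_cm * metadata.focal_length_mm / ref_on_sensor_mm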

            return SceneGeometry(
                px_per_cm=px_per_cm,
                distance_cm=distance_cm,
                reference_used=det.label,
                confidence='medium',
            )

    # Strategy 2: Assume largest food bounding box is on a standard plate
    food_dets = [d for d in detections if d.is_food]
    if food_dets:
        largest = max(food_dets, key=lambda d: d.bbox_area_px)
        # Assume the food fills ~70% of a standard plate
        estimated_plate_px = max(largest.bbox_width_px, largest.bbox_height_px) / 0.7
        px_per_cm = estimated_plate_px / DEFAULT_PLATE_DIAMETER_CM

        return SceneGeometry(
            px_per_cm=px_per_cm,
            distance_cm=DEFAULT_DISTANCE_CM,
            reference_used='assumed_plate',
            confidence='low',
        )

    # Strategy 3: Pure fallback — use image width as ~30cm field of view
    return SceneGeometry(
        px_per_cm=metadata.width_px / 30.0,
        distance_cm=DEFAULT_DISTANCE_CM,
        reference_used='fallback',
        confidence='low',
    )


# --- Estimate geometry for each image ---
all_geometry: dict[str, SceneGeometry] = {}

for img_name in all_detections:
    meta = all_metadata[img_name]
    dets = all_detections[img_name]
    geo = estimate_scene_geometry(meta, dets)
    all_geometry[img_name] = geo

    print(f'{img_name}:')
    print(f'  px/cm = {geo.px_per_cm:.1f}, distance = {geo.distance_cm:.0f}cm, '
          f'ref = {geo.reference_used}, confidence = {geo.confidence}')
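Strategy 3 assumes the image spans ~30 cm, which is a separate assumption from the 30 cm camera distance it reports. When the EXIF focal length is available, the pinhole model ties the two together: the field of view at distance D is `sensor_width_mm / focal_length_mm * D`. A minimal sketch, assuming the sensor's long side maps to the image's long side; `fallback_px_per_cm` is hypothetical and not wired into `estimate_scene_geometry`.

```python
from typing import Optional


def fallback_px_per_cm(
    image_long_side_px: int,
    sensor_width_mm: float,
    focal_length_mm: Optional[float],
    distance_cm: float = 30.0,
) -> float:
    """px/cm for a flat scene at distance_cm, using the pinhole field of view."""
    if not focal_length_mm:
        # No EXIF focal length: keep the notebook's "image spans ~30 cm" guess
        return image_long_side_px / 30.0
    field_width_cm = sensor_width_mm / focal_length_mm * distance_cm
    return image_long_side_px / field_width_cm


# Hypothetical phone: 9.8 mm sensor, 6.0 mm lens, 4032 px image, 30 cm away
print(f'{fallback_px_per_cm(4032, 9.8, 6.0):.1f} px/cm')  # 82.3 px/cm (field ≈ 49 cm)
```

With these hypothetical numbers the frame is about 49 cm wide rather than 30 cm, so the current fallback (4032 / 30 ≈ 134 px/cm) would understate real-world food sizes.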

## Cell 5: Weight Estimation

Now we combine:
- Bounding box size (from YOLO)
- Scene geometry (px → cm conversion)
- Food density lookup table
- Dish ingredient proportions

to estimate the weight in grams for each ingredient.

### Simplifying assumptions for this spike
- Food item depth ≈ 50% of the smaller bbox dimension (overhead view)
- Volume modelled as an ellipsoid: `V = (4/3) * π * (w/2) * (h/2) * (d/2)`
- Default density = 0.8 g/cm³ (roughly cooked food average)
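A worked example of these assumptions for a hypothetical apple whose bounding box measures 8 x 8 cm, using the 0.64 g/cm³ apple density from the table in the next cell. Note that `(4/3) * π * (w/2) * (h/2) * (d/2)` simplifies to `π * w * h * d / 6`; `ellipsoid_weight_g` is an illustrative helper.

```python
import math


def ellipsoid_weight_g(width_cm: float, height_cm: float, density_g_cm3: float,
                       depth_ratio: float = 0.5) -> tuple[float, float]:
    """Return (volume_cm3, weight_g) under the spike's ellipsoid assumptions."""
    # Depth is assumed to be a fraction of the smaller visible dimension
    depth_cm = min(width_cm, height_cm) * depth_ratio
    volume_cm3 = (4 / 3) * math.pi * (width_cm / 2) * (height_cm / 2) * (depth_cm / 2)
    return volume_cm3, volume_cm3 * density_g_cm3


volume, weight = ellipsoid_weight_g(8.0, 8.0, 0.64)
print(f'{volume:.0f} cm³ → {weight:.0f} g')  # 134 cm³ → 86 g
```

Because depth is fixed at half the smaller side, a round item like an apple is modelled as a flattened 8 x 8 x 4 cm ellipsoid rather than a sphere; comparing such numbers with kitchen-scale weights in Cell 8 is a direct way to calibrate `DEPTH_RATIO`.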

In [None]:
import math

# Food density table — replicating the one from the backend config
FOOD_DENSITY: dict[str, float] = {
    # Proteins
    'chicken breast': 1.04, 'deli meat': 1.04, 'beef': 1.05,
    'salmon': 1.02, 'egg': 1.03, 'eggs': 1.03, 'tofu': 0.95,
    'hot dog sausage': 1.02,
    # Carbs
    'rice': 0.81, 'pasta': 1.00, 'bread': 0.27,
    'pizza dough': 0.70, 'hot dog bun': 0.30,
    'flour': 0.59, 'potato': 1.08, 'oats': 0.84,
    # Vegetables
    'broccoli': 0.35, 'carrot': 0.64, 'tomato': 0.62,
    'lettuce': 0.36, 'cucumber': 0.96,
    # Fruits
    'apple': 0.64, 'banana': 0.94, 'orange': 0.87,
    'avocado': 0.92, 'berries': 0.65,
    # Dairy / Fats
    'mozzarella cheese': 1.00, 'cheese': 1.00,
    'butter': 0.91, 'olive oil': 0.91, 'vegetable oil': 0.91,
    'mayonnaise': 0.91,
    # Sauces
    'tomato sauce': 1.03, 'ketchup': 1.10, 'mustard': 1.05,
    # Sweets
    'sugar': 0.85, 'frosting': 1.10, 'glaze': 1.15,
}

DEFAULT_DENSITY = 0.80  # g/cm³
DEPTH_RATIO = 0.5       # assumed depth = 50% of smaller bbox dimension


def estimate_weight(
    detection: Detection,
    ingredients: list[IngredientEstimate],
    geometry: SceneGeometry,
) -> list[IngredientEstimate]:
    """Estimate weight in grams for each ingredient of a detected food item."""
    # Convert bbox to real-world cm
    width_cm = detection.bbox_width_px / geometry.px_per_cm
    height_cm = detection.bbox_height_px / geometry.px_per_cm
    depth_cm = min(width_cm, height_cm) * DEPTH_RATIO

    # Ellipsoid volume
    volume_cm3 = (4 / 3) * math.pi * (width_cm / 2) * (height_cm / 2) * (depth_cm / 2)

    # Use the detection label's density, or default
    overall_density = FOOD_DENSITY.get(detection.label, DEFAULT_DENSITY)
    total_weight_g = volume_cm3 * overall_density

    # Distribute weight across ingredients by their proportions
    for ing in ingredients:
        ing.weight_g = total_weight_g * ing.weight_pct

    return ingredients


# --- Estimate weights for all images ---
for img_name in all_detections:
    geo = all_geometry[img_name]
    food_dets = [d for d in all_detections[img_name] if d.is_food]

    print(f'\n{img_name} (scene: {geo.px_per_cm:.1f} px/cm, ref={geo.reference_used}):')

    if not food_dets:
        print('  No food detections.')
        continue

    weighted_ings: list[IngredientEstimate] = []
    for det in food_dets:
        # Build a fresh ingredient list for this detection only. Reusing the
        # image-wide list would let each detection overwrite the weights of
        # every other detection's ingredients.
        det_ings = estimate_weight(det, infer_ingredients([det]), geo)
        weighted_ings.extend(det_ings)

        width_cm = det.bbox_width_px / geo.px_per_cm
        height_cm = det.bbox_height_px / geo.px_per_cm
        total_g = sum(i.weight_g for i in det_ings)
        # estimate_weight uses the dish-level density, not per-ingredient ones
        density = FOOD_DENSITY.get(det.label, DEFAULT_DENSITY)

        print(f'  {det.label} ({det.confidence:.0%}): '
              f'bbox={width_cm:.1f}x{height_cm:.1f}cm, '
              f'density={density}, estimated total={total_g:.0f}g')
        for ing in det_ings:
            print(f'    {ing.name}: {ing.weight_g:.0f}g (pct={ing.weight_pct:.0%})')

    # Cells 6 and 7 read the weighted, per-detection ingredients from here
    all_ingredients[img_name] = weighted_ings

## Cell 6: Nutritional Lookup (USDA FoodData Central)

We query the USDA FoodData Central API to get per-100g nutritional data, then scale by our estimated weights.

**API:** https://api.nal.usda.gov/fdc/v1  
**Rate limit:** 1000 requests/hour with a free API key; the `DEMO_KEY` fallback used below has much lower limits  
**Key nutrients:** calories (kcal), protein (g), total fat (g), carbohydrates (g), fiber (g)

Get a free API key at: https://fdc.nal.usda.gov/api-key-signup
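The search response lists each food's nutrients as `{nutrientId, value}` entries, which is what `lookup_nutrition` reads. The sketch below shows that parsing step on a hand-written sample shaped like one search hit (all values are made up). As an assumption beyond the original code, it also falls back to the Atwater energy IDs 2047/2048, which Foundation foods may report instead of 1008; `parse_search_hit` is hypothetical.

```python
ENERGY_ID = 1008                    # Energy (kcal)
ENERGY_FALLBACK_IDS = (2047, 2048)  # Energy, Atwater general / specific factors (kcal)
MACRO_IDS = {1003: 'protein_g', 1004: 'fat_g', 1005: 'carbs_g', 1079: 'fiber_g'}


def parse_search_hit(food: dict) -> dict:
    """Extract per-100g calories and macros from one /foods/search result."""
    values = {n.get('nutrientId'): n.get('value') or 0.0
              for n in food.get('foodNutrients', [])}
    out = {name: values.get(nid, 0.0) for nid, name in MACRO_IDS.items()}
    # Prefer 1008; otherwise use the first Atwater energy value that is present
    calories = values.get(ENERGY_ID)
    if calories is None:
        calories = next((values[i] for i in ENERGY_FALLBACK_IDS if i in values), 0.0)
    out['calories_kcal'] = calories
    out['description'] = food.get('description', '')
    return out


# Made-up sample in the shape of a Foundation-food search hit (no 1008 entry)
sample_hit = {
    'fdcId': 0,
    'description': 'Example food',
    'foodNutrients': [
        {'nutrientId': 1003, 'value': 2.0},
        {'nutrientId': 1004, 'value': 1.0},
        {'nutrientId': 1005, 'value': 20.0},
        {'nutrientId': 2047, 'value': 97.0},
    ],
}
print(parse_search_hit(sample_hit))
```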

In [None]:
# --- Configuration ---
# Get a free key from https://fdc.nal.usda.gov/api-key-signup
USDA_API_KEY = os.environ.get('USDA_API_KEY', 'DEMO_KEY')
USDA_BASE_URL = 'https://api.nal.usda.gov/fdc/v1'


@dataclass
class NutritionPer100g:
    """Nutritional values per 100g of a food item."""
    food_name: str
    fdc_id: int = 0
    calories_kcal: float = 0.0
    protein_g: float = 0.0
    fat_g: float = 0.0
    carbs_g: float = 0.0
    fiber_g: float = 0.0
    source: str = 'usda'


# Local cache to avoid repeat API calls within the notebook session
_nutrition_cache: dict[str, NutritionPer100g] = {}

# Nutrient ID mapping for USDA FoodData Central
NUTRIENT_IDS = {
    1008: 'calories_kcal',  # Energy
    1003: 'protein_g',      # Protein
    1004: 'fat_g',          # Total lipid (fat)
    1005: 'carbs_g',        # Carbohydrate, by difference
    1079: 'fiber_g',        # Fiber, total dietary
}


def lookup_nutrition(food_name: str) -> NutritionPer100g:
    """Search USDA FoodData Central for nutritional info."""
    if food_name in _nutrition_cache:
        return _nutrition_cache[food_name]

    result = NutritionPer100g(food_name=food_name)

    try:
        # Search for the food
        resp = requests.get(
            f'{USDA_BASE_URL}/foods/search',
            params={
                'api_key': USDA_API_KEY,
                'query': food_name,
                'pageSize': 1,
                'dataType': 'Foundation,SR Legacy',
            },
            timeout=10,
        )
        resp.raise_for_status()
        data = resp.json()

        if not data.get('foods'):
            print(f'  USDA: No results for "{food_name}"')
            _nutrition_cache[food_name] = result
            return result

        food = data['foods'][0]
        result.fdc_id = food.get('fdcId', 0)
        result.food_name = food.get('description', food_name)

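        for nutrient in food.get('foodNutrients', []):
            nid = nutrient.get('nutrientId')
            if nid in NUTRIENT_IDS:
                setattr(result, NUTRIENT_IDS[nid], nutrient.get('value', 0.0))
            elif nid in (2047, 2048) and not result.calories_kcal:
                # Foundation foods may report energy only as "Energy (Atwater
                # General/Specific Factors)" (IDs 2047/2048) instead of 1008;
                # a later 1008 entry still overwrites this
                result.calories_kcal = nutrient.get('value', 0.0)

    except requests.RequestException as e:
        print(f'  USDA API error for "{food_name}": {e}')
        return result  # don't cache failures, so a re-run can retry

    _nutrition_cache[food_name] = result
    return result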


# --- Look up nutrition for all detected ingredients ---
all_nutrition: dict[str, NutritionPer100g] = {}

for img_name, ings in all_ingredients.items():
    print(f'\nLooking up nutrition for {img_name}...')
    for ing in ings:
        if ing.name not in all_nutrition:
            nutr = lookup_nutrition(ing.name)
            all_nutrition[ing.name] = nutr
            print(f'  {ing.name} → {nutr.food_name} '
                  f'({nutr.calories_kcal:.0f} kcal/100g, '
                  f'P:{nutr.protein_g:.1f}g, F:{nutr.fat_g:.1f}g, '
                  f'C:{nutr.carbs_g:.1f}g)')

## Cell 7: Summary — Total Macros Per Photo

Combine estimated weights + nutritional data → final calorie/macro breakdown per image.
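The per-ingredient arithmetic is a single scale factor, `weight_g / 100`, applied to every per-100g value. A minimal standalone sketch with made-up numbers; `scale_per_100g` is hypothetical, and `compute_meal_summary` below does the same thing inline.

```python
def scale_per_100g(per_100g: dict[str, float], weight_g: float) -> dict[str, float]:
    """Scale per-100g nutrient values to an estimated portion weight."""
    factor = weight_g / 100.0
    return {nutrient: value * factor for nutrient, value in per_100g.items()}


# Made-up values: 52 kcal and 14 g carbs per 100 g, estimated portion of 150 g
portion = scale_per_100g({'calories_kcal': 52.0, 'carbs_g': 14.0}, 150.0)
print(portion)  # {'calories_kcal': 78.0, 'carbs_g': 21.0}
```

Since the scaling is linear, a given percentage error in the estimated weight carries through unchanged to calories and every macro, even when the food itself was identified correctly.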

In [None]:
from IPython.display import display


@dataclass
class MealSummary:
    """Total nutritional summary for one image."""
    image_name: str
    total_calories: float = 0.0
    total_protein_g: float = 0.0
    total_fat_g: float = 0.0
    total_carbs_g: float = 0.0
    total_fiber_g: float = 0.0
    total_weight_g: float = 0.0
    ingredient_details: list = field(default_factory=list)


def compute_meal_summary(
    img_name: str,
    ingredients: list[IngredientEstimate],
    nutrition_db: dict[str, NutritionPer100g],
) -> MealSummary:
    """Calculate total nutrition for a meal image."""
    summary = MealSummary(image_name=img_name)

    for ing in ingredients:
        nutr = nutrition_db.get(ing.name)
        if not nutr:
            continue

        # Scale from per-100g to actual estimated weight
        scale = ing.weight_g / 100.0

        cal = nutr.calories_kcal * scale
        pro = nutr.protein_g * scale
        fat = nutr.fat_g * scale
        carb = nutr.carbs_g * scale
        fib = nutr.fiber_g * scale

        summary.total_calories += cal
        summary.total_protein_g += pro
        summary.total_fat_g += fat
        summary.total_carbs_g += carb
        summary.total_fiber_g += fib
        summary.total_weight_g += ing.weight_g

        summary.ingredient_details.append({
            'ingredient': ing.name,
            'source': ing.source,
            'weight_g': round(ing.weight_g, 1),
            'calories': round(cal, 1),
            'protein_g': round(pro, 1),
            'fat_g': round(fat, 1),
            'carbs_g': round(carb, 1),
        })

    return summary


# --- Compute summaries ---
meal_summaries: dict[str, MealSummary] = {}

for img_name, ings in all_ingredients.items():
    summary = compute_meal_summary(img_name, ings, all_nutrition)
    meal_summaries[img_name] = summary

    print(f'\n{"=" * 60}')
    print(f'{img_name}')
    print(f'{"=" * 60}')

    # Ingredient breakdown table
    if summary.ingredient_details:
        df = pd.DataFrame(summary.ingredient_details)
        display(df)

    print(f'\n  TOTALS:')
    print(f'    Weight:   {summary.total_weight_g:.0f}g')
    print(f'    Calories: {summary.total_calories:.0f} kcal')
    print(f'    Protein:  {summary.total_protein_g:.1f}g')
    print(f'    Fat:      {summary.total_fat_g:.1f}g')
    print(f'    Carbs:    {summary.total_carbs_g:.1f}g')
    print(f'    Fiber:    {summary.total_fiber_g:.1f}g')

    # Geometry context
    geo = all_geometry.get(img_name)
    if geo:
        print(f'\n  Scene calibration: {geo.reference_used} '
              f'(confidence: {geo.confidence})')

## Cell 8: Human Review & Accuracy Annotation

This is the most important cell for the spike. Review the results above and annotate:
- Was each food item correctly detected?
- Were the inferred ingredients reasonable?
- How close are the weight estimates to reality? (weigh your food if possible!)
- How close are the calorie estimates?

This builds a ground truth dataset that will guide decisions on:
1. Whether YOLO alone is sufficient or we need an LLM fallback
2. Whether the volumetric weight estimation is usable
3. Which food categories need the most improvement
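The accuracy summary in the next cell reports the mean *absolute* percentage error, which hides whether the pipeline tends to over- or under-estimate. A small sketch that reports both, assuming paired lists of estimated and weighed values; the function name and numbers are illustrative.

```python
def percentage_errors(estimated: list[float], actual: list[float]) -> tuple[float, float]:
    """Return (mean absolute % error, mean signed % error) as fractions."""
    # Skip pairs with missing or zero ground truth
    pairs = [(e, a) for e, a in zip(estimated, actual) if a]
    if not pairs:
        raise ValueError('no usable ground-truth values')
    signed = [(e - a) / a for e, a in pairs]
    mean_abs = sum(abs(s) for s in signed) / len(signed)
    mean_signed = sum(signed) / len(signed)
    return mean_abs, mean_signed


# Illustrative: one meal underestimated by 10%, one overestimated by 20%
mae, bias = percentage_errors([450.0, 300.0], [500.0, 250.0])
print(f'mean abs error {mae:.0%}, bias {bias:+.0%}')  # mean abs error 15%, bias +5%
```

A positive bias means the pipeline overestimates on average; a bias close to the absolute error would suggest a systematic scale problem (e.g. `DEPTH_RATIO`) rather than random noise.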

In [None]:
@dataclass
class ReviewEntry:
    """Human review of one image's detection results."""
    image_name: str
    # What foods were actually in the photo?
    actual_foods: list[str] = field(default_factory=list)
    # Did the model correctly identify the foods?
    detection_correct: bool = False
    # Were there foods the model missed?
    missed_foods: list[str] = field(default_factory=list)
    # Were there false positives (things detected that aren't food)?
    false_positives: list[str] = field(default_factory=list)
    # Actual weight if known (e.g. you weighed the food)
    actual_weight_g: Optional[float] = None
    # Actual calories if known
    actual_calories: Optional[float] = None
    # Free-form notes
    notes: str = ''


# --- Fill in your reviews here ---
# Uncomment and modify for each test image:

reviews: list[ReviewEntry] = [
    # ReviewEntry(
    #     image_name='lunch.jpg',
    #     actual_foods=['grilled chicken', 'white rice', 'steamed broccoli'],
    #     detection_correct=False,
    #     missed_foods=['rice', 'chicken'],
    #     false_positives=[],
    #     actual_weight_g=450,
    #     actual_calories=520,
    #     notes='YOLO only detected broccoli. Need food-specific model.'
    # ),
]


# --- Save reviews to JSON for later analysis ---
def save_reviews(reviews: list[ReviewEntry], path: str = './review_results.json'):
    data = [asdict(r) for r in reviews]
    with open(path, 'w') as f:
        json.dump(data, f, indent=2)
    print(f'Saved {len(reviews)} review(s) to {path}')


if reviews:
    save_reviews(reviews)

    # Quick accuracy summary
    correct = sum(1 for r in reviews if r.detection_correct)
    print(f'\nDetection accuracy: {correct}/{len(reviews)} '
          f'({correct/len(reviews):.0%})')

    weight_errors = [
        abs(meal_summaries[r.image_name].total_weight_g - r.actual_weight_g)
        / r.actual_weight_g
        for r in reviews
        if r.actual_weight_g and r.image_name in meal_summaries
    ]
    if weight_errors:
        print(f'Weight estimation mean absolute % error: {np.mean(weight_errors):.0%}')

    cal_errors = [
        abs(meal_summaries[r.image_name].total_calories - r.actual_calories)
        / r.actual_calories
        for r in reviews
        if r.actual_calories and r.image_name in meal_summaries
    ]
    if cal_errors:
        print(f'Calorie estimation mean absolute % error: {np.mean(cal_errors):.0%}')
else:
    print('No reviews yet. Fill in the reviews list above after running the pipeline.')
    print('Tip: Weigh your food on a kitchen scale for ground truth comparison.')

## Next Steps

Based on your review results, decide:

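1. **If YOLO COCO detection is too limited** → Fine-tune YOLOv8 on a food detection dataset; Food-101 or ISIA Food-500 would first need bounding-box annotations, since they only provide image-level labels (see ADR-003)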
2. **If dish classification is needed** → Train an EfficientNet classifier on Food-101
3. **If weight estimation is too inaccurate** → Consider requiring a reference object (coin, credit card) in photos
4. **If the pipeline works well enough** → Port inference to Go via ONNX runtime (see ADR-002)
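For option 3, a credit card is a convenient reference because ID-1 cards are standardised at 85.60 x 53.98 mm (ISO/IEC 7810). A minimal sketch of turning a card's bounding box into px/cm, assuming the card lies flat, roughly parallel to the image edges, and has been located by some detector that this notebook does not include; `px_per_cm_from_card` is hypothetical.

```python
CARD_LONG_CM = 8.560   # ISO/IEC 7810 ID-1 width
CARD_SHORT_CM = 5.398  # ISO/IEC 7810 ID-1 height


def px_per_cm_from_card(bbox_xyxy: list[float]) -> float:
    """Estimate px/cm from an axis-aligned bounding box around an ID-1 card."""
    w = bbox_xyxy[2] - bbox_xyxy[0]
    h = bbox_xyxy[3] - bbox_xyxy[1]
    long_px, short_px = max(w, h), min(w, h)
    # Average the two axes; a large disagreement suggests the card is tilted or rotated
    return (long_px / CARD_LONG_CM + short_px / CARD_SHORT_CM) / 2


print(f'{px_per_cm_from_card([100, 200, 956, 740]):.1f} px/cm')  # 100.0 px/cm
```

Unlike the bowl or cutlery references, both dimensions of the card are known, so the ratio between the two per-axis estimates doubles as a cheap sanity check on the calibration.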

See `docs/adr/` for architectural decision records.