# 04 - Order Selector (Dispatch Generator)

## Overview

This notebook demonstrates the dispatch candidate generation system. We solve a knapsack-style optimization problem where we:

- **Maximize** total priority of selected orders
- **Respect** truck capacity constraints (7.0-8.5 pallets preferred, 9.0 hard max)
- **Include** all mandatory orders
- **Consider** zone coherence for routing efficiency
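
As a rough sketch of the problem behind these bullets (the symbols are introduced here for illustration and do not come from `src.order_selector`): let $p_i$ be the priority score of order $i$, $w_i$ its pallet count, $M$ the set of mandatory orders, and $x_i \in \{0, 1\}$ indicate whether order $i$ is selected. The hard constraints then read:

$$
\max_{x} \; \sum_i p_i x_i
\quad \text{subject to} \quad
\sum_i w_i x_i \le 9.0, \qquad
x_i = 1 \;\; \forall i \in M, \qquad
x_i \in \{0, 1\}.
$$

The preferred 7.0-8.5 pallet band and zone coherence sit on top of this as soft preferences rather than hard constraints.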

## Selection Strategies

| Strategy | Description |
|----------|-------------|
| Greedy Efficiency | Sort by priority/pallets ratio |
| Greedy Priority | Sort by priority only |
| Greedy Zone | Single zone selection (CABA, North, South, West) |
| Zone Spillover | Start with dominant zone, controlled expansion |
| Best Fit | Optimize for 8.0 pallet utilization |
| DP Optimal | Dynamic programming exact solution |
| Mandatory First | Start with mandatory, fill same zone |
| Mandatory Nearest | Geographic clustering around mandatory |
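
To make the difference between the greedy and exact strategies concrete, here is a minimal sketch on toy data. `ToyOrder`, `greedy_efficiency`, and `dp_optimal` are hypothetical names written for this illustration; they are not the implementations in `src.order_selector`, and they ignore mandatory orders and zones. The DP assumes pallet counts have at most two decimals so they can be scaled to integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyOrder:
    """Hypothetical stand-in for the project's order objects."""
    order_id: str
    pallets: float
    priority: float


def greedy_efficiency(orders, capacity):
    """Pick orders by descending priority-per-pallet while they still fit."""
    chosen, used = [], 0.0
    for o in sorted(orders, key=lambda o: o.priority / o.pallets, reverse=True):
        if used + o.pallets <= capacity + 1e-9:
            chosen.append(o)
            used += o.pallets
    return chosen


def dp_optimal(orders, capacity, scale=100):
    """Exact 0/1 knapsack over pallet counts scaled to integer hundredths."""
    cap = round(capacity * scale)
    weights = [round(o.pallets * scale) for o in orders]
    # best[c] = (total priority, chosen indices) using at most c capacity units
    best = [(0.0, ())] * (cap + 1)
    for i, o in enumerate(orders):
        w = weights[i]
        # Iterate capacity downward so each order is used at most once.
        for c in range(cap, w - 1, -1):
            cand = best[c - w][0] + o.priority
            if cand > best[c][0]:
                best[c] = (cand, best[c - w][1] + (i,))
    return [orders[i] for i in best[cap][1]]


toy = [
    ToyOrder("A", 5.0, 60.0),  # best ratio, but blocks the truck
    ToyOrder("B", 4.0, 44.0),
    ToyOrder("C", 4.0, 44.0),
]
greedy_pick = greedy_efficiency(toy, capacity=8.0)
dp_pick = dp_optimal(toy, capacity=8.0)
print([o.order_id for o in greedy_pick], sum(o.priority for o in greedy_pick))
print([o.order_id for o in dp_pick], sum(o.priority for o in dp_pick))
```

On this toy set the greedy pass commits to order A and then cannot fit anything else, while the DP finds that B and C together carry more priority. That gap is the reason for generating candidates from several strategies rather than one.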

In [1]:
# Setup and Imports
import sys
from pathlib import Path

# Add project root to path
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# Standard imports
import json

# Data handling
import pandas as pd
from sqlalchemy import text

# Visualization
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Project modules
from src.database import get_database_manager

from src.order_selector import (
    SelectionStrategy,
    load_selector_config,
    load_pending_orders,
    get_mandatory_orders,
    calculate_mandatory_pallets,
    select_mandatory_subset,
    get_zone_breakdown,
    get_dominant_zone,
    build_dispatch_candidate,
    deduplicate_candidates,
    rank_candidates,
    get_best_single_zone_candidate,
    get_exceptional_multizone_candidates,
    get_top_n_candidates,
    export_candidates_to_json,
    get_candidates_summary_df,
    generate_candidates_by_subsets,
)

print("Imports loaded successfully!")

Imports loaded successfully!


In [2]:
# Path configuration
DATA_DIR = project_root / "data"
CONFIG_DIR = DATA_DIR / "config"
DB_PATH = DATA_DIR / "processed" / "delivery.db"
OUTPUT_DIR = project_root / "output"
DISPATCH_DIR = OUTPUT_DIR / "dispatches"

# Load configuration
CONFIG_PATH = CONFIG_DIR / "order_selector_config.json"
config = load_selector_config(CONFIG_PATH)

# Connect to database
db = get_database_manager(DB_PATH)

print(f"Database: {DB_PATH}")
print(f"Config: {CONFIG_PATH}")
print(f"\nCapacity Configuration:")
print(f"  Nominal: {config.nominal_capacity} pallets")
print(f"  Acceptable Range: {config.min_acceptable} - {config.max_acceptable} pallets")
print(f"  Hard Maximum: {config.hard_max} pallets")
print(f"  Min for Zone Candidate: {config.min_for_zone_candidate} pallets")

Database: c:\Users\Santi\Desktop\CV\portafolio\Eco-Bags-Delivery-Optimizer\data\processed\delivery.db
Config: c:\Users\Santi\Desktop\CV\portafolio\Eco-Bags-Delivery-Optimizer\data\config\order_selector_config.json

Capacity Configuration:
  Nominal: 8.0 pallets
  Acceptable Range: 7.0 - 8.5 pallets
  Hard Maximum: 9.0 pallets
  Min for Zone Candidate: 4.0 pallets


## 1. Load and Analyze Pending Orders

In [3]:
# Load pending orders
orders = load_pending_orders(db)

# Create DataFrame for analysis
orders_df = pd.DataFrame([
    {
        "order_id": o.order_id,
        "client_name": o.client_name,
        "total_pallets": o.total_pallets,
        "priority_score": o.priority_score,
        "zone_id": o.zone_id,
        "is_mandatory": o.is_mandatory,
        "efficiency": o.priority_score / max(o.total_pallets, 0.1),
    }
    for o in orders
])

print(f"Total Pending Orders: {len(orders)}")
print(f"Total Pallets Available: {orders_df['total_pallets'].sum():.2f}")
print(f"Total Priority Available: {orders_df['priority_score'].sum():.2f}")
print(f"\nPallet Statistics:")
print(f"  Min: {orders_df['total_pallets'].min():.2f}")
print(f"  Max: {orders_df['total_pallets'].max():.2f}")
print(f"  Mean: {orders_df['total_pallets'].mean():.2f}")
print(f"  Median: {orders_df['total_pallets'].median():.2f}")

Total Pending Orders: 26
Total Pallets Available: 85.51
Total Priority Available: 2001343.01

Pallet Statistics:
  Min: 0.71
  Max: 8.00
  Mean: 3.29
  Median: 2.82


In [4]:
# Display orders table
display_df = orders_df.sort_values("priority_score", ascending=False).head(15)
display_df

| | order_id | client_name | total_pallets | priority_score | zone_id | is_mandatory | efficiency |
|---|----------|-------------|---------------|----------------|---------|--------------|------------|
| 18 | ORD-85CA3985 | Comercial San Martin | 3.71 | 999999.0 | CABA | True | 269541.509434 |
| 20 | ORD-19DC5AB0 | Mayorista Don Juan | 3.79 | 999999.0 | NORTH_ZONE | True | 263851.978892 |
| 25 | ORD-C4A1485D | Mayorista El Gaucho | 1.26 | 102.26 | NORTH_ZONE | False | 81.15873 |
| 13 | ORD-2E43FF51 | Fiambreria La Esquina | 4.89 | 100.65 | WEST_ZONE | False | 20.582822 |
| 9 | ORD-BAA10376 | Comercial Rivadavia | 1.75 | 99.52 | SOUTH_ZONE | False | 56.868571 |
| 5 | ORD-1F187AFB | Autoservicio El Trebol | 4.81 | 92.32 | WEST_ZONE | False | 19.193347 |
| 8 | ORD-85EB94F2 | Comercial El Puente | 0.91 | 85.92 | CABA | False | 94.417582 |
| 0 | ORD-F19ECF5B | Distribuidora Pampa | 1.69 | 81.71 | NORTH_ZONE | False | 48.349112 |
| 16 | ORD-ACD8A197 | Distribuidora del Sur | 4.44 | 75.45 | WEST_ZONE | False | 16.993243 |
| 2 | ORD-C8429AF5 | Supermercado Norte | 0.81 | 67.27 | WEST_ZONE | False | 83.049383 |


## 2. Mandatory Orders Analysis

In [5]:
# Identify mandatory orders
all_mandatory = get_mandatory_orders(orders)
total_mandatory_pallets = calculate_mandatory_pallets(all_mandatory)

print(f"Total Mandatory Orders: {len(all_mandatory)}")
print(f"Total Mandatory Pallets: {total_mandatory_pallets:.2f}")
print(f"Truck Hard Max Capacity: {config.hard_max} pallets")

# Handle mandatory overflow - select subset that fits
if total_mandatory_pallets > config.hard_max:
    print(f"\n⚠️ MANDATORY OVERFLOW: {total_mandatory_pallets:.2f} pallets > {config.hard_max} capacity!")
    print("   Will select subset of mandatory orders that fit in one dispatch.")

    mandatory, deferred_mandatory = select_mandatory_subset(
        all_mandatory, config.hard_max, strategy="priority"
    )
    mandatory_pallets = calculate_mandatory_pallets(mandatory)

    print(f"\n   ✓ Selected for this dispatch: {len(mandatory)} orders ({mandatory_pallets:.2f} pallets)")
    print(f"   ⏳ Deferred to next dispatch: {len(deferred_mandatory)} orders ({total_mandatory_pallets - mandatory_pallets:.2f} pallets)")
    
    if deferred_mandatory:
        print("\n   Deferred Mandatory Orders:")
        deferred_df = pd.DataFrame([
            {"order_id": o.order_id, "client": o.client_name[:25], "pallets": o.total_pallets, "zone": o.zone_id}
            for o in deferred_mandatory
        ])
        display(deferred_df)
else:
    mandatory = all_mandatory
    deferred_mandatory = []
    mandatory_pallets = total_mandatory_pallets
    print(f"\n✓ All mandatory orders fit within capacity ({mandatory_pallets:.2f} / {config.hard_max} pallets)")

print(f"\nRemaining Capacity after mandatory: {config.max_acceptable - mandatory_pallets:.2f} pallets")

# Mandatory orders breakdown (selected for this dispatch)
if mandatory:
    mandatory_df = pd.DataFrame([
        {
            "order_id": o.order_id,
            "client_name": o.client_name,
            "pallets": o.total_pallets,
            "priority_score": o.priority_score,
            "zone_id": o.zone_id,
        }
        for o in mandatory
    ])
    print("\nMandatory Orders for This Dispatch:")
    display(mandatory_df)
    
    # Zone breakdown
    mandatory_zones = get_zone_breakdown(mandatory)
    print(f"\nMandatory Zone Breakdown: {mandatory_zones}")
else:
    print("\nNo mandatory orders in pending queue.")

Total Mandatory Orders: 2
Total Mandatory Pallets: 7.50
Truck Hard Max Capacity: 9.0 pallets

✓ All mandatory orders fit within capacity (7.50 / 9.0 pallets)

Remaining Capacity after mandatory: 1.00 pallets

Mandatory Orders for This Dispatch:


| | order_id | client_name | pallets | priority_score | zone_id |
|---|----------|-------------|---------|----------------|---------|
| 0 | ORD-85CA3985 | Comercial San Martin | 3.71 | 999999.0 | CABA |
| 1 | ORD-19DC5AB0 | Mayorista Don Juan | 3.79 | 999999.0 | NORTH_ZONE |



Mandatory Zone Breakdown: {'CABA': 1, 'NORTH_ZONE': 1}


## 3. Zone Distribution Overview

In [6]:
# Aggregate by zone
zone_summary = orders_df.groupby("zone_id").agg(
    order_count=("order_id", "count"),
    total_pallets=("total_pallets", "sum"),
    total_priority=("priority_score", "sum"),
    avg_efficiency=("efficiency", "mean"),
).reset_index()

zone_summary["priority_per_pallet"] = zone_summary["total_priority"] / zone_summary["total_pallets"]
zone_summary = zone_summary.round(2)

print("Zone Summary:")
display(zone_summary)

Zone Summary:


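| | zone_id | order_count | total_pallets | total_priority | avg_efficiency | priority_per_pallet |
|---|---------|-------------|---------------|----------------|----------------|---------------------|
| 0 | CABA | 8 | 31.86 | 1000364.64 | 33718.12 | 31398.76 |
| 1 | NORTH_ZONE | 6 | 16.0 | 1000286.43 | 44002.48 | 62517.9 |
| 2 | SOUTH_ZONE | 5 | 10.25 | 245.95 | 25.37 | 24.0 |
| 3 | WEST_ZONE | 7 | 27.4 | 445.99 | 25.29 | 16.28 |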


In [7]:
# Zone color mapping
zone_colors = {
    "CABA": "#FF6B6B",
    "NORTH_ZONE": "#4ECDC4",
    "SOUTH_ZONE": "#45B7D1",
    "WEST_ZONE": "#96CEB4",
}

# Create subplot with 3 bar charts
fig = make_subplots(
    rows=1, cols=3,
    subplot_titles=("Orders per Zone", "Pallets per Zone", "Priority per Zone"),
    horizontal_spacing=0.08,
)

# Orders count
fig.add_trace(
    go.Bar(
        x=zone_summary["zone_id"],
        y=zone_summary["order_count"],
        marker_color=[zone_colors.get(z, "#888") for z in zone_summary["zone_id"]],
        text=zone_summary["order_count"],
        textposition="outside",
        name="Orders",
    ),
    row=1, col=1,
)

# Pallets
fig.add_trace(
    go.Bar(
        x=zone_summary["zone_id"],
        y=zone_summary["total_pallets"],
        marker_color=[zone_colors.get(z, "#888") for z in zone_summary["zone_id"]],
        text=zone_summary["total_pallets"].round(1),
        textposition="outside",
        name="Pallets",
    ),
    row=1, col=2,
)

# Priority
fig.add_trace(
    go.Bar(
        x=zone_summary["zone_id"],
        y=zone_summary["total_priority"],
        marker_color=[zone_colors.get(z, "#888") for z in zone_summary["zone_id"]],
        text=zone_summary["total_priority"].round(0),
        textposition="outside",
        name="Priority",
    ),
    row=1, col=3,
)

fig.update_layout(
    title="Zone Distribution Analysis",
    showlegend=False,
    height=400,
)

fig.show()

In [8]:
# Identify zone with most potential
dominant_zone = get_dominant_zone(orders)
print(f"Dominant Zone (highest total priority): {dominant_zone}")

# Zone efficiency comparison
fig = px.bar(
    zone_summary,
    x="zone_id",
    y="priority_per_pallet",
    color="zone_id",
    color_discrete_map=zone_colors,
    title="Priority Efficiency by Zone (Priority per Pallet)",
    text="priority_per_pallet",
)
fig.update_traces(texttemplate="%{text:.1f}", textposition="outside")
fig.update_layout(showlegend=False, height=400)
fig.show()

Dominant Zone (highest total priority): CABA


## 4. Single Strategy Demo: Zone Spillover

Let's walk through the **Greedy Zone with Spillover** strategy step-by-step to understand how it makes decisions.

In [9]:
# Step 1: Identify dominant zone
print("=" * 60)
print("GREEDY ZONE SPILLOVER - Step by Step")
print("=" * 60)

# Calculate zone priorities
zone_priorities = {}
for order in orders:
    zone_priorities[order.zone_id] = zone_priorities.get(order.zone_id, 0) + order.priority_score

print("\n📊 Step 1: Zone Priority Analysis")
for zone, priority in sorted(zone_priorities.items(), key=lambda x: -x[1]):
    print(f"   {zone}: {priority:.2f} total priority")

dominant = max(zone_priorities, key=zone_priorities.get)
print(f"\n✓ Dominant Zone: {dominant}")

GREEDY ZONE SPILLOVER - Step by Step

📊 Step 1: Zone Priority Analysis
   CABA: 1000364.64 total priority
   NORTH_ZONE: 1000286.43 total priority
   WEST_ZONE: 445.99 total priority
   SOUTH_ZONE: 245.95 total priority

✓ Dominant Zone: CABA


In [10]:
# Step 2: Fill from dominant zone
print("\n📦 Step 2: Fill from Dominant Zone")

mandatory_ids = {o.order_id for o in mandatory}
dominant_orders = [
    o for o in orders 
    if o.zone_id == dominant and o.order_id not in mandatory_ids
]
dominant_orders.sort(key=lambda o: o.priority_score, reverse=True)

selected = list(mandatory)
current_pallets = mandatory_pallets

print(f"   Starting with {len(mandatory)} mandatory orders ({mandatory_pallets:.2f} pallets)")
print(f"   Available in {dominant}: {len(dominant_orders)} orders")

for order in dominant_orders:
    if current_pallets + order.total_pallets <= config.max_acceptable:
        selected.append(order)
        current_pallets += order.total_pallets
        print(f"   ✓ Added: {order.order_id} ({order.total_pallets:.2f}p, {order.priority_score:.1f} pri)")
    elif current_pallets >= config.min_acceptable:
        print(f"   ⊘ Capacity reached ({current_pallets:.2f} pallets)")
        break

print(f"\n   After dominant zone: {len(selected)} orders, {current_pallets:.2f} pallets")


📦 Step 2: Fill from Dominant Zone
   Starting with 2 mandatory orders (7.50 pallets)
   Available in CABA: 7 orders
   ✓ Added: ORD-85EB94F2 (0.91p, 85.9 pri)
   ⊘ Capacity reached (8.41 pallets)

   After dominant zone: 3 orders, 8.41 pallets


In [11]:
# Step 3: Check spillover conditions
print("\n🔄 Step 3: Spillover Decision")

remaining_capacity = config.max_acceptable - current_pallets
print(f"   Remaining capacity: {remaining_capacity:.2f} pallets")
print(f"   Spillover threshold: {config.spillover_capacity_threshold} pallets")

if remaining_capacity > config.spillover_capacity_threshold:
    print(f"   ✓ Spillover TRIGGERED (remaining > threshold)")
    
    # Get adjacent zones
    adjacent = config.zone_adjacency.get(dominant, [])
    print(f"   Adjacent zones: {adjacent}")
    
    current_priority = sum(o.priority_score for o in selected)
    selected_ids = {o.order_id for o in selected}
    
    adjacent_orders = [
        o for o in orders
        if o.zone_id in adjacent and o.order_id not in selected_ids
    ]
    adjacent_orders.sort(key=lambda o: o.priority_score, reverse=True)
    
    print(f"\n   Evaluating {len(adjacent_orders)} adjacent zone orders:")
    
    for order in adjacent_orders[:5]:  # Show first 5
        if current_pallets + order.total_pallets <= config.max_acceptable:
            marginal_increase = order.priority_score / max(current_priority, 1)
            if marginal_increase >= config.spillover_priority_threshold:
                print(f"   ✓ {order.order_id} [{order.zone_id}]: +{marginal_increase:.1%} priority (meets {config.spillover_priority_threshold:.0%} threshold)")
                selected.append(order)
                current_pallets += order.total_pallets
                current_priority += order.priority_score
            else:
                print(f"   ✗ {order.order_id} [{order.zone_id}]: +{marginal_increase:.1%} priority (below threshold)")
else:
    print(f"   ✗ Spillover NOT triggered (remaining < threshold)")


🔄 Step 3: Spillover Decision
   Remaining capacity: 0.09 pallets
   Spillover threshold: 2.0 pallets
   ✗ Spillover NOT triggered (remaining < threshold)


In [12]:
# Build candidate from step-by-step result
demo_candidate = build_dispatch_candidate(selected, SelectionStrategy.GREEDY_ZONE_SPILLOVER, config)

print("\n" + "=" * 60)
print("FINAL RESULT")
print("=" * 60)
if demo_candidate:
    print(f"Candidate ID: {demo_candidate.candidate_id}")
    print(f"Orders: {len(demo_candidate.order_ids)}")
    print(f"Pallets: {demo_candidate.total_pallets} ({demo_candidate.utilization_pct}% utilization)")
    print(f"Priority: {demo_candidate.total_priority} (adjusted: {demo_candidate.adjusted_priority})")
    print(f"Zones: {demo_candidate.zones}")
    print(f"Zone Breakdown: {demo_candidate.zone_breakdown}")
    print(f"Single Zone: {demo_candidate.is_single_zone}")
    print(f"Zone Penalty: {demo_candidate.zone_dispersion_penalty}")


FINAL RESULT
Candidate ID: DISP-20260120-GREEDY-13E3
Orders: 3
Pallets: 8.41 (105.1% utilization)
Priority: 2000083.92 (adjusted: 1900079.72)
Zones: ['NORTH_ZONE', 'CABA']
Zone Breakdown: {'CABA': 2, 'NORTH_ZONE': 1}
Single Zone: False
Zone Penalty: 0.95
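
The reported figures can be cross-checked by hand. The sketch below makes two assumptions that the output suggests but does not state: utilization is measured against the nominal capacity (8.0 pallets), and the adjusted priority is the total priority multiplied by the zone penalty.

```python
# Figures copied from the FINAL RESULT output above.
nominal_capacity = 8.0      # "Nominal" in the capacity configuration
total_pallets = 8.41
total_priority = 2000083.92
zone_penalty = 0.95         # reported for this two-zone candidate

# Assumption: utilization is relative to nominal capacity, not the hard max.
utilization_pct = total_pallets / nominal_capacity * 100

# Assumption: adjusted priority = total priority x zone penalty.
adjusted_priority = total_priority * zone_penalty

print(f"utilization: {utilization_pct:.3f}%")
print(f"adjusted priority: {adjusted_priority:.2f}")
```

Both values agree with the candidate's printed 105.1% utilization and 1900079.72 adjusted priority to within rounding, which is consistent with the two assumptions.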


## 5. Generate All Candidates

In [13]:
# Generate candidates using all strategies on multiple subsets
# This runs all strategies on:
# 1. Full order set (is_subset=False)
# 2. Non-mandatory orders only (is_subset=True)
# 3. Random subsets from ALL orders (including mandatory)
# 4. Random subsets from ALL non-mandatory orders
# 5. Random subsets from TOP priority orders (with mandatory)
# 6. Random subsets from TOP priority non-mandatory orders
#
# Minimum subset size: 60% of total orders
# Mandatory orders are filtered to fit capacity using zone-based heuristics

all_candidates, subset_generation_info = generate_candidates_by_subsets(
    orders=orders,
    config=config,
    n_random_subsets=12,      # 3 per category (4 categories)
    top_n_for_random=14,      # Consider top 14 orders for "top priority" subsets
    random_seed=42,           # For reproducibility
)

print("=" * 60)
print("CANDIDATE GENERATION SUMMARY")
print("=" * 60)
print(f"\nTotal candidates generated: {len(all_candidates)}")
print(f"\nBreakdown by subset type:")
print(f"  • Full order set: {subset_generation_info['full_set_count']} candidates")
print(f"  • Non-mandatory only: {subset_generation_info['non_mandatory_set_count']} candidates")
print(f"  • Random from ALL orders: {subset_generation_info['random_all_orders_count']} candidates")
print(f"  • Random from non-mandatory: {subset_generation_info['random_non_mandatory_count']} candidates")
print(f"  • Random from TOP priority (with mand.): {subset_generation_info['random_top_with_mandatory_count']} candidates")
print(f"  • Random from TOP priority (no mand.): {subset_generation_info['random_top_without_mandatory_count']} candidates")

print(f"\nSubsets generated ({len(subset_generation_info['subsets_generated'])} total):")
for subset_info in subset_generation_info["subsets_generated"]:
    mand_label = "✓" if subset_info["includes_mandatory"] else "✗"
    print(f"  {mand_label} {subset_info['name']}: {subset_info['order_count']} orders → {subset_info['candidates_generated']} candidates")

print(f"\n" + "-" * 60)
print("Sample of generated candidates:")
for c in all_candidates[:10]:
    subset_label = " (subset)" if c.is_subset else ""
    print(f"  • {c.strategy.value}{subset_label}: {len(c.order_ids)} orders, {c.total_pallets}p, {c.total_priority:.0f} priority")

CANDIDATE GENERATION SUMMARY

Total candidates generated: 81

Breakdown by subset type:
  • Full order set: 10 candidates
  • Non-mandatory only: 10 candidates
  • Random from ALL orders: 30 candidates
  • Random from non-mandatory: 31 candidates
  • Random from TOP priority (with mand.): 0 candidates
  • Random from TOP priority (no mand.): 0 candidates

Subsets generated (8 total):
  ✓ full_set: 26 orders → 10 candidates
  ✗ non_mandatory_only: 24 orders → 10 candidates
  ✓ random_all_orders_1: 25 orders → 10 candidates
  ✓ random_all_orders_2: 24 orders → 10 candidates
  ✓ random_all_orders_3: 16 orders → 10 candidates
  ✗ random_non_mandatory_1: 19 orders → 11 candidates
  ✗ random_non_mandatory_2: 23 orders → 10 candidates
  ✗ random_non_mandatory_3: 23 orders → 10 candidates

------------------------------------------------------------
Sample of generated candidates:
  • greedy_efficiency: 3 orders, 8.41p, 2000084 priority
  • greed

In [14]:
# Summary DataFrame before deduplication
raw_summary_df = get_candidates_summary_df(all_candidates)
print("\nAll Candidates (before deduplication):")
display(raw_summary_df)


All Candidates (before deduplication):


| | candidate_id | strategy | total_pallets | adjusted_priority | total_priority | utilization_pct | zones | zone_count | is_single_zone | order_count | mandatory_count | is_subset |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DISP-20260120-GREEDY-B86C | greedy_efficiency | 8.41 | 1900079.72 | 2000083.92 | 105.1 | NORTH_ZONE, CABA | 2 | False | 3 | 2 | False |
| 7 | DISP-20260120-GREEDY-A488 | greedy_best_fit | 8.41 | 1900079.72 | 2000083.92 | 105.1 | NORTH_ZONE, CABA | 2 | False | 3 | 2 | False |
| 8 | DISP-20260120-MANDAT-103F | mandatory_first | 8.41 | 1900079.72 | 2000083.92 | 105.1 | NORTH_ZONE, CABA | 2 | False | 3 | 2 | False |
| 6 | DISP-20260120-GREEDY-E2EF | greedy_zone_spillover | 8.41 | 1900079.72 | 2000083.92 | 105.1 | NORTH_ZONE, CABA | 2 | False | 3 | 2 | False |
| 30 | DISP-20260120-GREEDY-SUB-47A1 | greedy_efficiency (subset) | 8.41 | 1900079.72 | 2000083.92 | 105.1 | NORTH_ZONE, CABA | 2 | False | 3 | 2 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 63 | DISP-20260120-GREEDY-SUB-B65C | greedy_zone_caba (subset) | 8.91 | 149.48 | 149.48 | 111.4 | CABA | 1 | True | 2 | 0 | True |
| 17 | DISP-20260120-GREEDY-SUB-B422 | greedy_best_fit (subset) | 8.00 | 63.56 | 63.56 | 100.0 | CABA | 1 | True | 1 | 0 | True |
| 57 | DISP-20260120-GREEDY-SUB-4017 | greedy_best_fit (subset) | 8.00 | 63.56 | 63.56 | 100.0 | CABA | 1 | True | 1 | 0 | True |
| 68 | DISP-20260120-GREEDY-SUB-B3DB | greedy_best_fit (subset) | 8.00 | 63.56 | 63.56 | 100.0 | CABA | 1 | True | 1 | 0 | True |


## 6. Deduplication Analysis

In [15]:
# Deduplicate candidates
unique_candidates = deduplicate_candidates(all_candidates)

print(f"Before deduplication: {len(all_candidates)} candidates")
print(f"After deduplication: {len(unique_candidates)} unique candidates")
print(f"Duplicates removed: {len(all_candidates) - len(unique_candidates)}")

# Count by subset status
full_set_unique = sum(1 for c in unique_candidates if not c.is_subset)
subset_unique = sum(1 for c in unique_candidates if c.is_subset)
print(f"\nUnique candidates breakdown:")
print(f"  • From full order set: {full_set_unique}")
print(f"  • From subsets: {subset_unique}")

# Identify which strategies produced duplicates
order_sets = {}
for c in all_candidates:
    key = frozenset(c.order_ids)
    if key not in order_sets:
        order_sets[key] = []
    subset_label = " (subset)" if c.is_subset else ""
    order_sets[key].append(f"{c.strategy.value}{subset_label}")

print("\nStrategies producing same order sets:")
for order_set, strategies in order_sets.items():
    if len(strategies) > 1:
        print(f"  - {strategies}")

Before deduplication: 81 candidates
After deduplication: 28 unique candidates
Duplicates removed: 53

Unique candidates breakdown:
  • From full order set: 7
  • From subsets: 21

Strategies producing same order sets:
  - ['greedy_efficiency', 'greedy_zone_spillover', 'greedy_best_fit', 'mandatory_first', 'greedy_efficiency (subset)', 'greedy_zone_spillover (subset)', 'greedy_best_fit (subset)', 'mandatory_first (subset)', 'greedy_efficiency (subset)', 'greedy_zone_spillover (subset)', 'greedy_best_fit (subset)']
  - ['greedy_priority', 'greedy_priority (subset)', 'greedy_priority (subset)', 'mandatory_first (subset)', 'greedy_priority (subset)', 'greedy_zone_spillover (subset)', 'greedy_best_fit (subset)', 'mandatory_first (subset)']
  - ['greedy_zone_caba', 'greedy_zone_caba (subset)']
  - ['greedy_zone_north', 'greedy_zone_north (subset)', 'greedy_zone_north (subset)']
  - ['greedy_zone_south', 'greedy_zone_south (subset)', 'greedy_zone_south (subset)', 'greedy_zone_south (subse

In [16]:
# Filter candidates WITHOUT mandatory orders for comparison charts
# These are already included in our subset generation (non-mandatory only + random without mandatory)
unique_non_mandatory = [c for c in unique_candidates if c.mandatory_count == 0]
non_mandatory_summary_df = get_candidates_summary_df(unique_non_mandatory, include_order_ids=True)

print(f"Non-mandatory candidates (from unique set): {len(unique_non_mandatory)}")
print(f"These candidates show real priority scores without the 999999 mandatory boost")

Non-mandatory candidates (from unique set): 19
These candidates show real priority scores without the 999999 mandatory boost
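
The duplicate analysis in this section groups candidates by `frozenset(c.order_ids)`, so two candidates count as the same dispatch when they contain the same orders, regardless of which strategy produced them. A minimal sketch of deduplication on that key follows. `ToyCandidate` and `dedupe_by_order_set` are hypothetical, and keeping the first occurrence is an assumption about how `deduplicate_candidates` breaks ties, not something this notebook confirms.

```python
from collections import namedtuple

# Hypothetical minimal candidate; the real objects carry many more fields.
ToyCandidate = namedtuple("ToyCandidate", ["strategy", "order_ids"])


def dedupe_by_order_set(candidates):
    """Keep the first candidate seen for each distinct set of order IDs."""
    seen = set()
    unique = []
    for c in candidates:
        key = frozenset(c.order_ids)  # identity ignores the order of the IDs
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique


# Order sets taken from the notebook output; the ID ordering is illustrative.
candidates = [
    ToyCandidate("greedy_efficiency", ["ORD-85CA3985", "ORD-19DC5AB0", "ORD-85EB94F2"]),
    ToyCandidate("mandatory_first", ["ORD-19DC5AB0", "ORD-85CA3985", "ORD-85EB94F2"]),
    ToyCandidate("greedy_priority", ["ORD-85CA3985", "ORD-19DC5AB0"]),
]
unique = dedupe_by_order_set(candidates)
print([c.strategy for c in unique])
```

Using a `frozenset` as the key makes the comparison insensitive to the order in which a strategy happened to append its picks.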


## 7. Candidates Comparison

In [17]:
# Summary DataFrame with order IDs
summary_df = get_candidates_summary_df(unique_candidates, include_order_ids=True)
print("Unique Candidates Summary:")
display(summary_df)

Unique Candidates Summary:


| | candidate_id | strategy | total_pallets | adjusted_priority | total_priority | utilization_pct | zones | zone_count | is_single_zone | order_count | mandatory_count | is_subset | order_ids |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DISP-20260120-GREEDY-B86C | greedy_efficiency | 8.41 | 1900079.72 | 2000083.92 | 105.1 | NORTH_ZONE, CABA | 2 | False | 3 | 2 | False | ORD-85CA3985, ORD-19DC5AB0, ORD-85EB94F2 |
| 6 | DISP-20260120-GREEDY-FD99 | greedy_mandatory_nearest | 8.21 | 1900037.08 | 2000039.03 | 102.6 | NORTH_ZONE, CABA | 2 | False | 3 | 2 | False | ORD-85CA3985, ORD-19DC5AB0, ORD-071F2BA3 |
| 1 | DISP-20260120-GREEDY-9D42 | greedy_priority | 7.5 | 1899998.1 | 1999998.0 | 93.8 | NORTH_ZONE, CABA | 2 | False | 2 | 2 | False | ORD-85CA3985, ORD-19DC5AB0 |
| 14 | DISP-20260120-GREEDY-SUB-DC32 | greedy_efficiency (subset) | 8.31 | 1700055.48 | 2000065.27 | 103.9 | NORTH_ZONE, WEST_ZONE, CABA | 3 | False | 3 | 2 | True | ORD-19DC5AB0, ORD-85CA3985, ORD-C8429AF5 |
| 3 | DISP-20260120-GREEDY-04D1 | greedy_zone_north | 6.74 | 1000182.97 | 1000182.97 | 84.2 | NORTH_ZONE | 1 | True | 3 | 1 | False | ORD-19DC5AB0, ORD-C4A1485D, ORD-F19ECF5B |
| 12 | DISP-20260120-GREEDY-SUB-93BD | greedy_zone_caba (subset) | 8.0 | 1000167.7 | 1000167.7 | 100.0 | CABA | 1 | True | 4 | 1 | True | ORD-85CA3985, ORD-85EB94F2, ORD-E189D08A, ORD-... |
| 2 | DISP-20260120-GREEDY-B27D | greedy_zone_caba | 8.93 | 1000140.51 | 1000140.51 | 111.6 | CABA | 1 | True | 3 | 1 | False | ORD-85CA3985, ORD-85EB94F2, ORD-BAA8DE46 |
| 16 | DISP-20260120-GREEDY-SUB-0946 | greedy_zone_north (subset) | 7.36 | 1000118.55 | 1000118.55 | 92.0 | NORTH_ZONE | 1 | True | 3 | 1 | True | ORD-19DC5AB0, ORD-C4A1485D, ORD-EB0E2EBD |
| 15 | DISP-20260120-GREEDY-SUB-35E3 | greedy_zone_caba (subset) | 7.09 | 1000081.78 | 1000081.78 | 88.6 | CABA | 1 | True | 3 | 1 | True | ORD-85CA3985, ORD-E189D08A, ORD-071F2BA3 |
| 7 | DISP-20260120-GREEDY-SUB-E96E | greedy_efficiency (subset) | 7.13 | 406.05 | 477.71 | 89.1 | NORTH_ZONE, SOUTH_ZONE, WEST_ZONE, CABA | 4 | False | 6 | 0 | True | ORD-85EB94F2, ORD-C8429AF5, ORD-C4A1485D, ORD-... |


In [18]:
# Scatter: Priority vs Utilization
fig = px.scatter(
    summary_df,
    x="utilization_pct",
    y="adjusted_priority",
    color="zone_count",
    size="order_count",
    hover_data=["strategy", "zones", "total_pallets"],
    title="Candidate Comparison: Priority vs Utilization",
    labels={
        "utilization_pct": "Utilization (%)",
        "adjusted_priority": "Adjusted Priority",
        "zone_count": "Zone Count",
    },
    color_continuous_scale="RdYlGn_r",
)

# Add reference lines for capacity range
fig.add_vline(x=87.5, line_dash="dash", line_color="gray", annotation_text="Min (7.0p)")
fig.add_vline(x=106.25, line_dash="dash", line_color="gray", annotation_text="Max (8.5p)")

fig.update_layout(height=500)
fig.show()
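
The two reference lines in the scatter plot are the acceptable pallet range expressed as a percentage of nominal capacity. Instead of hardcoding 87.5 and 106.25, the positions could be derived from the configuration values. The helper name below is hypothetical, and the numbers are the ones printed by the capacity configuration earlier in the notebook.

```python
def capacity_to_utilization_pct(pallets, nominal_capacity):
    """Convert a pallet count into percent utilization of nominal capacity."""
    return pallets / nominal_capacity * 100


# Values as printed in the "Capacity Configuration" output.
nominal, min_acceptable, max_acceptable = 8.0, 7.0, 8.5

min_line = capacity_to_utilization_pct(min_acceptable, nominal)
max_line = capacity_to_utilization_pct(max_acceptable, nominal)
print(min_line, max_line)
```

In the notebook itself, `config.nominal_capacity`, `config.min_acceptable`, and `config.max_acceptable` could be passed in and the results handed to `fig.add_vline(x=...)`, so the lines follow any change to the config file.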

In [19]:
# Bar chart: Adjusted priority per candidate
fig = px.bar(
    summary_df.sort_values("adjusted_priority", ascending=False),
    x="strategy",
    y="adjusted_priority",
    color="is_single_zone",
    title="Adjusted Priority by Strategy",
    color_discrete_map={True: "#4ECDC4", False: "#FF6B6B"},
    labels={"is_single_zone": "Single Zone"},
)
fig.update_layout(height=450, xaxis_tickangle=45)
fig.show()

In [20]:
# Zone composition per candidate (stacked bar)
zone_data = []
for c in unique_candidates:
    for zone, count in c.zone_breakdown.items():
        zone_data.append({
            "strategy": c.strategy.value,
            "zone": zone,
            "order_count": count,
        })

zone_df = pd.DataFrame(zone_data)

fig = px.bar(
    zone_df,
    x="strategy",
    y="order_count",
    color="zone",
    title="Zone Composition by Strategy",
    color_discrete_map=zone_colors,
)
fig.update_layout(height=450, xaxis_tickangle=45, barmode="stack")
fig.show()

### Comparison Charts (Without Mandatory Orders)

The following charts show the same visualizations but using candidates generated **without mandatory orders**. This allows us to see the real priority distribution without the 999999 priority boost from mandatory flags.

In [21]:
# Scatter: Priority vs Utilization (Non-Mandatory)
fig = px.scatter(
    non_mandatory_summary_df,
    x="utilization_pct",
    y="adjusted_priority",
    color="zone_count",
    size="order_count",
    hover_data=["strategy", "zones", "total_pallets"],
    title="Candidate Comparison: Priority vs Utilization (WITHOUT Mandatory)",
    labels={
        "utilization_pct": "Utilization (%)",
        "adjusted_priority": "Adjusted Priority (Real Scores)",
        "zone_count": "Zone Count",
    },
    color_continuous_scale="RdYlGn_r",
)

# Add reference lines for capacity range
fig.add_vline(x=87.5, line_dash="dash", line_color="gray", annotation_text="Min (7.0p)")
fig.add_vline(x=106.25, line_dash="dash", line_color="gray", annotation_text="Max (8.5p)")

fig.update_layout(height=500)
fig.show()

In [22]:
# Bar chart: Adjusted priority per candidate (Non-Mandatory)
fig = px.bar(
    non_mandatory_summary_df.sort_values("adjusted_priority", ascending=False),
    x="strategy",
    y="adjusted_priority",
    color="is_single_zone",
    title="Adjusted Priority by Strategy (WITHOUT Mandatory)",
    color_discrete_map={True: "#4ECDC4", False: "#FF6B6B"},
    labels={"is_single_zone": "Single Zone"},
)
fig.update_layout(height=450, xaxis_tickangle=45)
fig.show()

In [23]:
# Zone composition per candidate (Non-Mandatory)
zone_data_nm = []
for c in unique_non_mandatory:
    for zone, count in c.zone_breakdown.items():
        zone_data_nm.append({
            "strategy": c.strategy.value,
            "zone": zone,
            "order_count": count,
        })

zone_df_nm = pd.DataFrame(zone_data_nm)

fig = px.bar(
    zone_df_nm,
    x="strategy",
    y="order_count",
    color="zone",
    title="Zone Composition by Strategy (WITHOUT Mandatory)",
    color_discrete_map=zone_colors,
)
fig.update_layout(height=450, xaxis_tickangle=45, barmode="stack")
fig.show()

## 8. Zone Coherence Analysis

In [24]:
# Best single-zone candidate
best_single = get_best_single_zone_candidate(unique_candidates)

if best_single:
    print("🏆 Best Single-Zone Candidate:")
    print(f"   Strategy: {best_single.strategy.value}")
    print(f"   Zone: {best_single.zones[0]}")
    print(f"   Orders: {len(best_single.order_ids)}")
    print(f"   Pallets: {best_single.total_pallets} ({best_single.utilization_pct}%)")
    print(f"   Priority: {best_single.total_priority} (adjusted: {best_single.adjusted_priority})")
else:
    print("No single-zone candidates available.")

🏆 Best Single-Zone Candidate:
   Strategy: greedy_zone_north
   Zone: NORTH_ZONE
   Orders: 3
   Pallets: 6.74 (84.2%)
   Priority: 1000182.97 (adjusted: 1000182.97)


In [25]:
# Exceptional multi-zone candidates
if best_single:
    exceptional = get_exceptional_multizone_candidates(
        unique_candidates,
        best_single.adjusted_priority,
        config.multizone_exception_threshold,
    )
    
    print(f"\n🌐 Exceptional Multi-Zone Candidates (>{config.multizone_exception_threshold:.0%} above best single):")
    if exceptional:
        for c in exceptional:
            # Note: this compares the candidate's unadjusted total_priority
            # against the single-zone adjusted_priority baseline.
            improvement = (c.total_priority / best_single.adjusted_priority - 1) * 100
            print(f"   - {c.strategy.value}: {c.total_priority:.0f} priority (+{improvement:.1f}%)")
            print(f"     Zones: {c.zones}")
    else:
        print("   None - single-zone solutions are optimal for this order set.")


🌐 Exceptional Multi-Zone Candidates (>30% above best single):
   - greedy_efficiency: 2000084 priority (+100.0%)
     Zones: ['NORTH_ZONE', 'CABA']
   - greedy_priority: 1999998 priority (+100.0%)
     Zones: ['NORTH_ZONE', 'CABA']
   - greedy_mandatory_nearest: 2000039 priority (+100.0%)
     Zones: ['NORTH_ZONE', 'CABA']
   - greedy_efficiency: 2000065 priority (+100.0%)
     Zones: ['NORTH_ZONE', 'WEST_ZONE', 'CABA']
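
The exception rule can be sketched as a simple threshold test. The comparison below on adjusted priority is an assumption about what `get_exceptional_multizone_candidates` does internally; `is_exceptional` and `improvement_pct` are hypothetical helpers, and the numbers are copied from the unique candidates summary.

```python
def is_exceptional(multi_adjusted, best_single_adjusted, threshold):
    """True when a multi-zone score beats the best single-zone score by more than threshold."""
    return multi_adjusted > best_single_adjusted * (1 + threshold)


def improvement_pct(score, baseline):
    """Relative gain of score over baseline, in percent."""
    return (score / baseline - 1) * 100


best_single_adjusted = 1000182.97  # greedy_zone_north
threshold = 0.30                   # printed as 30% in the output

# (adjusted_priority, total_priority) pairs from the summary table
two_zone = (1900079.72, 2000083.92)   # greedy_efficiency
four_zone = (406.05, 477.71)          # greedy_efficiency (subset), no mandatory orders

print(is_exceptional(two_zone[0], best_single_adjusted, threshold))
print(is_exceptional(four_zone[0], best_single_adjusted, threshold))
print(round(improvement_pct(two_zone[1], best_single_adjusted), 1))
print(round(improvement_pct(two_zone[0], best_single_adjusted), 1))
```

The printed improvement in the cell output is computed from the unadjusted `total_priority`, which gives roughly +100%. Measured on adjusted priority, which already includes the 0.95 zone penalty, the same candidate improves on the single-zone baseline by roughly 90%.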


In [26]:
# Compare best single vs best multi-zone
multi_zone_candidates = [c for c in unique_candidates if not c.is_single_zone]
best_multi = max(multi_zone_candidates, key=lambda c: c.adjusted_priority) if multi_zone_candidates else None

if best_single and best_multi:
    comparison_data = {
        "Metric": ["Strategy", "Orders", "Pallets", "Utilization", "Priority", "Adjusted Priority", "Zones"],
        "Best Single-Zone": [
            best_single.strategy.value,
            len(best_single.order_ids),
            best_single.total_pallets,
            f"{best_single.utilization_pct}%",
            best_single.total_priority,
            best_single.adjusted_priority,
            best_single.zones[0],
        ],
        "Best Multi-Zone": [
            best_multi.strategy.value,
            len(best_multi.order_ids),
            best_multi.total_pallets,
            f"{best_multi.utilization_pct}%",
            best_multi.total_priority,
            best_multi.adjusted_priority,
            ", ".join(best_multi.zones),
        ],
    }
    comparison_df = pd.DataFrame(comparison_data)
    print("\n📊 Single vs Multi-Zone Comparison:")
    display(comparison_df)


📊 Single vs Multi-Zone Comparison:


| | Metric | Best Single-Zone | Best Multi-Zone |
|---|--------|------------------|-----------------|
| 0 | Strategy | greedy_zone_north | greedy_efficiency |
| 1 | Orders | 3 | 3 |
| 2 | Pallets | 6.74 | 8.41 |
| 3 | Utilization | 84.2% | 105.1% |
| 4 | Priority | 1000182.97 | 2000083.92 |
| 5 | Adjusted Priority | 1000182.97 | 1900079.72 |
| 6 | Zones | NORTH_ZONE | NORTH_ZONE, CABA |
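
The gap between "Priority" and "Adjusted Priority" in the comparison is consistent with a multiplicative zone penalty. The sketch below reproduces the two adjusted values. The `ZONE_PENALTY` lookup and the `adjusted_priority` helper are assumptions for illustration: the 0.95 factor for two zones is read off this notebook's outputs, and the project's real penalty logic may be configured differently.

```python
# Penalty factors read off this notebook's outputs (assumed lookup, not the project's code)
ZONE_PENALTY = {1: 1.00, 2: 0.95}


def adjusted_priority(total_priority: float, zone_count: int) -> float:
    """Scale the raw priority by the penalty for the number of zones visited."""
    return round(total_priority * ZONE_PENALTY[zone_count], 2)


best_single = adjusted_priority(1000182.97, 1)  # single zone: no penalty
best_multi = adjusted_priority(2000083.92, 2)   # two zones: 5% penalty

print(best_single, best_multi)
```

Even after the penalty, the two-zone candidate keeps a much higher adjusted priority because it carries two mandatory orders instead of one.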


## 9. Ranking and Final Selection

In [27]:
# Rank candidates
ranked_candidates = rank_candidates(unique_candidates, config)

print("Candidates Ranked by Combined Score:")
print(f"  Weights: Priority={config.ranking_weight_priority}, "
      f"Utilization={config.ranking_weight_utilization}, "
      f"Zone Coherence={config.ranking_weight_zone_coherence}")
print()

for i, c in enumerate(ranked_candidates, 1):
    zone_label = "✓ Single" if c.is_single_zone else f"✗ Multi ({len(c.zones)})"
    print(f"  #{i}: {c.strategy.value}")
    print(f"      {c.total_pallets}p | {c.adjusted_priority:.0f} adj. priority | {zone_label}")

Candidates Ranked by Combined Score:
  Weights: Priority=0.5, Utilization=0.3, Zone Coherence=0.2

  #1: greedy_efficiency
      8.41p | 1900080 adj. priority | ✗ Multi (2)
  #2: greedy_mandatory_nearest
      8.21p | 1900037 adj. priority | ✗ Multi (2)
  #3: greedy_priority
      7.5p | 1899998 adj. priority | ✗ Multi (2)
  #4: greedy_efficiency
      8.31p | 1700055 adj. priority | ✗ Multi (3)
  #5: greedy_zone_caba
      8.93p | 1000141 adj. priority | ✓ Single
  #6: greedy_zone_caba
      8.0p | 1000168 adj. priority | ✓ Single
  #7: greedy_zone_north
      7.36p | 1000119 adj. priority | ✓ Single
  #8: greedy_zone_caba
      7.09p | 1000082 adj. priority | ✓ Single
  #9: greedy_zone_north
      6.74p | 1000183 adj. priority | ✓ Single
  #10: greedy_zone_caba
      8.91p | 149 adj. priority | ✓ Single
  #11: greedy_zone_north
      8.21p | 188 adj. priority | ✓ Single
  #12: greedy_zone_south
      8.06p | 212 adj. priority | ✓ Single
  #13: greedy_best_fit


In [28]:
# Top 5 candidates with full breakdown
top_5 = get_top_n_candidates(ranked_candidates, 5)

print("\n" + "=" * 70)
print("TOP 5 DISPATCH CANDIDATES")
print("=" * 70)

for i, c in enumerate(top_5, 1):
    print(f"\n{'─' * 70}")
    print(f"Rank #{i}: {c.candidate_id}")
    print(f"{'─' * 70}")
    print(f"Strategy:      {c.strategy.value}")
    print(f"Orders:        {len(c.order_ids)}")
    print(f"Pallets:       {c.total_pallets} ({c.utilization_pct}% utilization)")
    print(f"Priority:      {c.total_priority:.2f} (raw)")
    print(f"Adjusted:      {c.adjusted_priority:.2f} (after {c.zone_dispersion_penalty}x zone penalty)")
    print(f"Zones:         {', '.join(c.zones)}")
    print(f"Zone Breakdown: {c.zone_breakdown}")
    print(f"Mandatory:     {c.mandatory_count} orders included")
    print("\nOrders in this dispatch:")
    for order in c.orders:
        mand = "★" if order["is_mandatory"] else " "
        print(f"  {mand} {order['order_id']}: {order['client_name'][:25]:<25} | "
              f"{order['pallets']:.2f}p | {order['priority_score']:.1f}pri | {order['zone_id']}")


TOP 5 DISPATCH CANDIDATES

──────────────────────────────────────────────────────────────────────
Rank #1: DISP-20260120-GREEDY-B86C
──────────────────────────────────────────────────────────────────────
Strategy:      greedy_efficiency
Orders:        3
Pallets:       8.41 (105.1% utilization)
Priority:      2000083.92 (raw)
Adjusted:      1900079.72 (after 0.95x zone penalty)
Zones:         NORTH_ZONE, CABA
Zone Breakdown: {'CABA': 2, 'NORTH_ZONE': 1}
Mandatory:     2 orders included

Orders in this dispatch:
  ★ ORD-85CA3985: Comercial San Martin      | 3.71p | 999999.0pri | CABA
  ★ ORD-19DC5AB0: Mayorista Don Juan        | 3.79p | 999999.0pri | NORTH_ZONE
    ORD-85EB94F2: Comercial El Puente       | 0

### Non-Mandatory Candidates Comparison

The following shows dispatch candidates generated **without mandatory orders**, allowing us to see the true priority scores without the 999999 boost from mandatory flags. This is useful for understanding which orders have the highest actual priority.
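
A small illustration of why the mandatory boost hides the underlying scores: with mandatory orders in the mix, the total is dominated by the 999999 values. The order records below reuse IDs, pallets, and scores printed in this notebook, but grouping them together is only for illustration, and the dictionary layout is an assumption based on the fields the cells print.

```python
# Order records with fields as printed by the notebook (grouping is illustrative)
orders = [
    {"order_id": "ORD-85CA3985", "pallets": 3.71, "priority_score": 999999.0, "is_mandatory": True},
    {"order_id": "ORD-19DC5AB0", "pallets": 3.79, "priority_score": 999999.0, "is_mandatory": True},
    {"order_id": "ORD-CE98BB79", "pallets": 2.19, "priority_score": 51.5, "is_mandatory": False},
]

# Drop mandatory orders so only scored priorities remain
non_mandatory = [o for o in orders if not o["is_mandatory"]]

total_with_mandatory = sum(o["priority_score"] for o in orders)
total_without_mandatory = sum(o["priority_score"] for o in non_mandatory)

print(total_with_mandatory, total_without_mandatory)
```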

In [29]:
# Non-mandatory candidates summary (already generated earlier for charts)
print(f"Non-mandatory candidates: {len(unique_non_mandatory)} unique")
print("\nNon-Mandatory Candidates Summary:")
display(non_mandatory_summary_df)

Non-mandatory candidates: 19 unique

Non-Mandatory Candidates Summary:


| | candidate_id | strategy | total_pallets | adjusted_priority | total_priority | utilization_pct | zones | zone_count | is_single_zone | order_count | mandatory_count | is_subset | order_ids |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | DISP-20260120-GREEDY-SUB-E96E | greedy_efficiency (subset) | 7.13 | 406.05 | 477.71 | 89.1 | NORTH_ZONE, SOUTH_ZONE, WEST_ZONE, CABA | 4 | False | 6 | 0 | True | ORD-85EB94F2, ORD-C8429AF5, ORD-C4A1485D, ORD-... |
| 16 | DISP-20260120-DP_OPT-SUB-6138 | dp_optimal (subset) | 8.78 | 397.04 | 467.1 | 109.7 | NORTH_ZONE, SOUTH_ZONE, WEST_ZONE, CABA | 4 | False | 7 | 0 | True | ORD-CE98BB79, ORD-C4A1485D, ORD-67B478B7, ORD-... |
| 10 | DISP-20260120-GREEDY-SUB-2B52 | greedy_efficiency (subset) | 7.63 | 380.39 | 447.52 | 95.4 | NORTH_ZONE, SOUTH_ZONE, WEST_ZONE, CABA | 4 | False | 6 | 0 | True | ORD-85EB94F2, ORD-C8429AF5, ORD-C4A1485D, ORD-... |
| 17 | DISP-20260120-GREEDY-SUB-4E75 | greedy_zone_spillover (subset) | 7.87 | 302.69 | 356.1 | 98.4 | NORTH_ZONE, WEST_ZONE, CABA | 3 | False | 4 | 0 | True | ORD-2E43FF51, ORD-C8429AF5, ORD-C4A1485D, ORD-... |
| 15 | DISP-20260120-GREEDY-SUB-5109 | greedy_zone_spillover (subset) | 7.19 | 270.56 | 284.8 | 89.9 | NORTH_ZONE, CABA | 2 | False | 4 | 0 | True | ORD-85EB94F2, ORD-BAA8DE46, ORD-071F2BA3, ORD-... |
| 3 | DISP-20260120-GREEDY-SUB-9B53 | greedy_priority (subset) | 7.9 | 257.07 | 302.43 | 98.8 | NORTH_ZONE, SOUTH_ZONE, WEST_ZONE | 3 | False | 3 | 0 | True | ORD-C4A1485D, ORD-2E43FF51, ORD-BAA10376 |
| 11 | DISP-20260120-GREEDY-SUB-5E4E | greedy_priority (subset) | 7.82 | 249.99 | 294.1 | 97.7 | NORTH_ZONE, SOUTH_ZONE, WEST_ZONE | 3 | False | 3 | 0 | True | ORD-C4A1485D, ORD-BAA10376, ORD-1F187AFB |
| 5 | DISP-20260120-GREEDY-SUB-45B0 | greedy_zone_north (subset) | 7.4 | 232.64 | 232.64 | 92.5 | NORTH_ZONE | 1 | True | 3 | 0 | True | ORD-C4A1485D, ORD-F19ECF5B, ORD-47C6DDC7 |
| 1 | DISP-20260120-GREEDY-56E6 | greedy_zone_west | 7.94 | 225.94 | 225.94 | 99.2 | WEST_ZONE | 1 | True | 3 | 0 | False | ORD-2E43FF51, ORD-C8429AF5, ORD-D991C05F |
| 0 | DISP-20260120-GREEDY-2DF4 | greedy_zone_south | 8.06 | 212.07 | 212.07 | 100.8 | SOUTH_ZONE | 1 | True | 4 | 0 | False | ORD-BAA10376, ORD-CE98BB79, ORD-71A0B5C1, ORD-... |


In [30]:
# Rank and show Top 5 Non-Mandatory Candidates
ranked_non_mandatory = rank_candidates(unique_non_mandatory, config)
top_5_non_mandatory = get_top_n_candidates(ranked_non_mandatory, 5)

print("\n" + "=" * 70)
print("TOP 5 DISPATCH CANDIDATES (WITHOUT MANDATORY)")
print("=" * 70)
print("These candidates show real priority scores without the 999999 mandatory boost")

for i, c in enumerate(top_5_non_mandatory, 1):
    print(f"\n{'─' * 70}")
    print(f"Rank #{i}: {c.candidate_id}")
    print(f"{'─' * 70}")
    print(f"Strategy:      {c.strategy.value}")
    print(f"Orders:        {len(c.order_ids)}")
    print(f"Pallets:       {c.total_pallets} ({c.utilization_pct}% utilization)")
    print(f"Priority:      {c.total_priority:.2f} (real priority, no mandatory boost)")
    print(f"Adjusted:      {c.adjusted_priority:.2f} (after {c.zone_dispersion_penalty}x zone penalty)")
    print(f"Zones:         {', '.join(c.zones)}")
    print(f"Zone Breakdown: {c.zone_breakdown}")
    print("\nOrders in this dispatch:")
    for order in c.orders:
        print(f"    {order['order_id']}: {order['client_name'][:25]:<25} | "
              f"{order['pallets']:.2f}p | {order['priority_score']:.1f}pri | {order['zone_id']}")


TOP 5 DISPATCH CANDIDATES (WITHOUT MANDATORY)
These candidates show real priority scores without the 999999 mandatory boost

──────────────────────────────────────────────────────────────────────
Rank #1: DISP-20260120-DP_OPT-SUB-6138
──────────────────────────────────────────────────────────────────────
Strategy:      dp_optimal
Orders:        7
Pallets:       8.78 (109.7% utilization)
Priority:      467.10 (real priority, no mandatory boost)
Adjusted:      397.04 (after 0.85x zone penalty)
Zones:         NORTH_ZONE, SOUTH_ZONE, WEST_ZONE, CABA
Zone Breakdown: {'SOUTH_ZONE': 3, 'NORTH_ZONE': 1, 'WEST_ZONE': 1, 'CABA': 2}

Orders in this dispatch:
    ORD-CE98BB79: Juguetería Fran Cardozo   | 2.19p | 51.5pri 

## 10. Export Candidates

In [31]:
# Create output directory
DISPATCH_DIR.mkdir(parents=True, exist_ok=True)

# Export all unique candidates
output_file = DISPATCH_DIR / "dispatch_candidates.json"

export_candidates_to_json(ranked_candidates, output_file)

print(f"Exported {len(ranked_candidates)} candidates to:")
print(f"  {output_file}")

Exported 28 candidates to:
  c:\Users\Santi\Desktop\CV\portafolio\Eco-Bags-Delivery-Optimizer\output\dispatches\dispatch_candidates.json


In [32]:
# Export top candidate separately
if top_5:
    top_file = DISPATCH_DIR / "top_dispatch.json"
    export_candidates_to_json([top_5[0]], top_file)
    print(f"\nTop candidate exported to:")
    print(f"  {top_file}")
    
# Show sample of exported JSON
print("\n📄 Sample Export Format:")
with open(output_file, "r") as f:
    data = json.load(f)
    # Show just first candidate summary
    if data["candidates"]:
        sample = data["candidates"][0]
        print(json.dumps({
            "candidate_id": sample["candidate_id"],
            "strategy": sample["strategy"],
            "summary": sample["summary"],
        }, indent=2))


Top candidate exported to:
  c:\Users\Santi\Desktop\CV\portafolio\Eco-Bags-Delivery-Optimizer\output\dispatches\top_dispatch.json

📄 Sample Export Format:
{
  "candidate_id": "DISP-20260120-GREEDY-B86C",
  "strategy": "greedy_efficiency",
  "summary": {
    "total_pallets": 8.41,
    "total_priority": 2000083.92,
    "utilization_pct": 105.1,
    "order_count": 3,
    "zones": [
      "NORTH_ZONE",
      "CABA"
    ],
    "zone_breakdown": {
      "CABA": 2,
      "NORTH_ZONE": 1
    },
    "is_single_zone": false,
    "zone_dispersion_penalty": 0.95,
    "adjusted_priority": 1900079.72,
    "mandatory_included": true,
    "mandatory_count": 2
  }
}
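
A downstream phase could read this export back and flatten each candidate's `summary` into one row. The sketch below is self-contained and assumes only what is shown above: a top-level `"candidates"` list whose entries carry `candidate_id`, `strategy`, and `summary`. It writes a one-candidate sample to a temporary file rather than touching the real export.

```python
import json
import tempfile
from pathlib import Path

# One-candidate payload mirroring the sample export printed above
payload = {
    "candidates": [
        {
            "candidate_id": "DISP-20260120-GREEDY-B86C",
            "strategy": "greedy_efficiency",
            "summary": {
                "total_pallets": 8.41,
                "total_priority": 2000083.92,
                "utilization_pct": 105.1,
                "order_count": 3,
                "zones": ["NORTH_ZONE", "CABA"],
                "zone_breakdown": {"CABA": 2, "NORTH_ZONE": 1},
                "is_single_zone": False,
                "zone_dispersion_penalty": 0.95,
                "adjusted_priority": 1900079.72,
                "mandatory_included": True,
                "mandatory_count": 2,
            },
        }
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dispatch_candidates.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    data = json.loads(path.read_text(encoding="utf-8"))

# Flatten: identifiers plus every summary field in a single dict per candidate
rows = [
    {"candidate_id": c["candidate_id"], "strategy": c["strategy"], **c["summary"]}
    for c in data["candidates"]
]

multi_zone_ids = [r["candidate_id"] for r in rows if not r["is_single_zone"]]
print(multi_zone_ids)
```

The flattened `rows` list can be passed directly to `pd.DataFrame(rows)` if a tabular view is needed.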


In [33]:
# Save candidates to database for next phases

# Recreate db connection with new methods
db = get_database_manager(DB_PATH)

# Drop old dispatch tables to clear old schema with generation_batch_id
with db.engine.begin() as conn:
    conn.execute(text("DROP TABLE IF EXISTS dispatch_candidate_orders"))
    conn.execute(text("DROP TABLE IF EXISTS dispatch_candidates"))

# Create fresh tables
db.create_tables()

# Save ranked candidates to database (will overwrite previous run)
saved_count = db.save_dispatch_candidates(ranked_candidates, ranked=True)
print(f"\n💾 Saved {saved_count} candidates to database")
print("   (Previous candidates were overwritten)")

# Verify data was saved
print("\n📊 Verification - All candidates in database:")
latest = db.get_all_dispatch_candidates()
for c in latest[:3]:
    print(f"   #{c.rank}: {c.strategy} - {c.total_pallets}p, {c.adjusted_priority:.0f} adj.pri")


💾 Saved 28 candidates to database
   (Previous candidates were overwritten)

📊 Verification - All candidates in database:
   #1: greedy_efficiency - 8.41p, 1900080 adj.pri
   #2: greedy_mandatory_nearest - 8.21p, 1900037 adj.pri
   #3: greedy_priority - 7.5p, 1899998 adj.pri


## Summary

This notebook demonstrated:

1. **Order Analysis**: Loaded pending orders with priority scores and zone assignments
2. **Mandatory Handling**: Identified must-include orders and handled overflow by selecting a subset
3. **Zone Distribution**: Analyzed order distribution across zones
4. **Strategy Demo**: Walked through the zone spillover strategy step by step
5. **Candidate Generation**: Generated candidates using 11 different strategies
6. **Deduplication**: Removed duplicate solutions from different strategies
7. **Comparison**: Visualized candidates by priority, utilization, and zones (with and without mandatory)
8. **Zone Coherence**: Compared single-zone vs multi-zone solutions
9. **Ranking**: Applied weighted ranking to select top candidates
10. **Export**: Saved candidates in JSON format and to database

### Database Output

The following tables were populated for use in next phases:

| Table | Description |
|-------|-------------|
| `dispatch_candidates` | Generated dispatch candidates with strategy, pallets, priority, zones |
| `dispatch_candidate_orders` | Relationship between candidates and orders |

### Pipeline Status

| Phase | Notebook | Status |
|-------|----------|--------|
| **Phase 1** | 01_base_data_setup | ✅ Complete |
| **Phase 2** | 02_receipt_extraction | ✅ Complete |
| **Phase 3** | 03_priority_score | ✅ Complete |
| **Phase 4** | 04_order_selector | ✅ Complete |
| **Phase 5** | 05_route_optimizer | ✅ Complete |