# Phase 3: Priority Scoring System

This notebook demonstrates the **configurable priority scoring system** for ranking delivery orders.

## Overview

The system calculates priority scores based on four weighted factors:

**PRIORITY_SCORE = (w1 × URGENCY) + (w2 × PAYMENT) + (w3 × CLIENT) + (w4 × AGE)**

| Factor | Weight | Description |
|--------|--------|-------------|
| **Urgency** | 40% | Days until delivery deadline |
| **Payment** | 25% | Total amount and payment status (paid/partial/pending) |
| **Client** | 20% | Client type (star/new/frequent/regular/occasional) |
| **Age** | 15% | Days since order was placed |

**Exception:** Orders marked as `is_mandatory = True` receive maximum priority (999999).
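
As a minimal sketch of the formula above (not the project's `src.scoring` implementation), the weighted sum and the mandatory override could look like this. The function name `priority_score` and the example raw scores are made up for illustration; the weights and the mandatory value come from the table and the exception above.

```python
# Illustrative sketch only; these names are hypothetical, not part of src.scoring.
WEIGHTS = {"urgency": 0.40, "payment": 0.25, "client": 0.20, "age": 0.15}
MANDATORY_SCORE = 999_999


def priority_score(raw_scores, is_mandatory=False, weights=WEIGHTS):
    """Return the weighted sum of raw component scores, or the mandatory override."""
    if is_mandatory:
        return float(MANDATORY_SCORE)
    return sum(weights[name] * raw_scores[name] for name in weights)


# Made-up raw component scores for a single order
example = {"urgency": 80.0, "payment": 50.0, "client": 40.0, "age": 20.0}

print(round(priority_score(example), 2))            # 55.5
print(priority_score(example, is_mandatory=True))   # 999999.0
```

The override is checked first, so a mandatory order never depends on its component scores.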

In [1]:
# Standard library imports 
import sys
from datetime import date
from pathlib import Path

# Add src to path for imports
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# External imports
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Reload the scoring module to get updates
import importlib
if 'src.scoring' in sys.modules:
    importlib.reload(sys.modules['src.scoring'])

# Local imports
from src.database import (
    DatabaseManager,
    OrderModel,
)
from src.scoring import (
    load_scoring_config,
    calculate_urgency_score,
    calculate_payment_score,
    calculate_client_score,
    calculate_age_score,
    calculate_data_ranges,
    score_all_pending_orders,
    update_all_priority_scores,
)

print("Imports successful!")

Imports successful!


In [2]:
# Define paths
DATA_DIR = project_root / "data"
CONFIG_DIR = DATA_DIR / "config"
DB_PATH = DATA_DIR / "processed" / "delivery.db"
CONFIG_PATH = CONFIG_DIR / "scoring_weights.json"

# Initialize database manager
db = DatabaseManager(DB_PATH)

# Load scoring configuration
config = load_scoring_config(CONFIG_PATH)

print(f"📁 Database: {DB_PATH}")
print(f"📁 Config: {CONFIG_PATH}")
print(f"\n📊 Loaded Scoring Weights:")
print(f"   Urgency:  {config.weight_urgency:.0%}")
print(f"   Payment:  {config.weight_payment:.0%}")
print(f"   Client:   {config.weight_client:.0%}")
print(f"   Age:      {config.weight_age:.0%}")

📁 Database: c:\Users\Santi\Desktop\CV\portafolio\Eco-Bags-Delivery-Optimizer\data\processed\delivery.db
📁 Config: c:\Users\Santi\Desktop\CV\portafolio\Eco-Bags-Delivery-Optimizer\data\config\scoring_weights.json

📊 Loaded Scoring Weights:
   Urgency:  40%
   Payment:  25%
   Client:   20%
   Age:      15%


## 1. Current Scoring Configuration

Let's examine the full configuration loaded from `scoring_weights.json`:

In [3]:
# Display full configuration
print("=" * 60)
print("PRIORITY SCORING CONFIGURATION")
print("=" * 60)

print("\n📏 WEIGHTS (must sum to 1.0):")
print(f"   urgency:  {config.weight_urgency}")
print(f"   payment:  {config.weight_payment}")
print(f"   client:   {config.weight_client}")
print(f"   age:      {config.weight_age}")
total_weight = (config.weight_urgency + config.weight_payment + config.weight_client + config.weight_age)
print(f"   TOTAL:    {total_weight}")

print("\n💳 PAYMENT STATUS MULTIPLIERS:")
print(f"   paid:     {config.payment_multiplier_paid}x")
print(f"   partial:  {config.payment_multiplier_partial}x")
print(f"   pending:  {config.payment_multiplier_pending}x")

print("\n👤 CLIENT SCORES:")
print(f"   star_client:  {config.client_star}")
print(f"   new_client:   {config.client_new}")
print(f"   frequent (>{config.frequent_threshold} orders):  {config.client_frequent}")
print(f"   regular (≥{config.regular_threshold} orders):   {config.client_regular}")
print(f"   occasional:   {config.client_occasional}")

print("\n📊 DYNAMIC SCORING:")
print("   All scores use actual data ranges (no hardcoded thresholds)")
print("   - Urgency: Based on actual days_to_deadline range")
print("   - Payment: Amount normalized by actual min/max, then × status multiplier")
print("   - Age: Based on actual order age range")

print("\n🚨 MANDATORY SCORE:", config.mandatory_score)

PRIORITY SCORING CONFIGURATION

📏 WEIGHTS (must sum to 1.0):
   urgency:  0.4
   payment:  0.25
   client:   0.2
   age:      0.15
   TOTAL:    1.0

💳 PAYMENT STATUS MULTIPLIERS:
   paid:     1.0x
   partial:  0.6x
   pending:  0.3x

👤 CLIENT SCORES:
   star_client:  100
   new_client:   80
   frequent (>5 orders):  60
   regular (≥2 orders):   40
   occasional:   20

📊 DYNAMIC SCORING:
   All scores use actual data ranges (no hardcoded thresholds)
   - Urgency: Based on actual days_to_deadline range
   - Payment: Amount normalized by actual min/max, then × status multiplier
   - Age: Based on actual order age range

🚨 MANDATORY SCORE: 999999
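
The configuration states that the weights must sum to 1.0. Whether `load_scoring_config` enforces this is not shown here, so the following is only a sketch of how such a check could be written. The helper name `validate_weights` and the tolerance are assumptions; comparing with a tolerance avoids false failures from floating-point rounding.

```python
import math


def validate_weights(weights, tol=1e-9):
    """Raise ValueError if any weight is negative or the weights do not sum to 1.0.

    Hypothetical helper for illustration; not part of src.scoring.
    """
    negative = [name for name, value in weights.items() if value < 0]
    if negative:
        raise ValueError(f"Negative weights: {negative}")
    total = math.fsum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"Weights sum to {total}, expected 1.0")
    return total


# Weights as printed in the configuration output above
weights = {"urgency": 0.4, "payment": 0.25, "client": 0.2, "age": 0.15}
print(validate_weights(weights))
```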


## 2. Load Orders and Clients Data

In [4]:
# Load data from database
with db.get_session() as session:
    orders_df = pd.read_sql("SELECT * FROM orders", session.bind)
    clients_df = pd.read_sql("SELECT * FROM clients", session.bind)
    orders_clients_df = pd.read_sql(
        """
        SELECT o.*, c.business_name, c.is_star_client, c.is_new_client, c.zone_id as client_zone
        FROM orders o
        LEFT JOIN clients c ON o.client_id = c.client_id
        """,
        session.bind
    )

# Convert date columns
orders_df['issue_date'] = pd.to_datetime(orders_df['issue_date']).dt.date
orders_df['delivery_deadline'] = pd.to_datetime(orders_df['delivery_deadline']).dt.date
orders_clients_df['issue_date'] = pd.to_datetime(orders_clients_df['issue_date']).dt.date
orders_clients_df['delivery_deadline'] = pd.to_datetime(orders_clients_df['delivery_deadline']).dt.date

# Get order counts per client
client_order_counts = db.get_client_order_counts()

print(f"📦 Total Orders: {len(orders_df)}")
print(f"👤 Total Clients: {len(clients_df)}")
print(f"🔄 Pending Orders: {len(orders_df[orders_df['status'] == 'pending'])}")
print(f"\n📊 Payment Status Distribution:")
print(orders_df['payment_status'].value_counts().to_string())

📦 Total Orders: 41
👤 Total Clients: 34
🔄 Pending Orders: 26

📊 Payment Status Distribution:
payment_status
pending    20
paid       16
partial     5


## 3. Single Order Demo - Step by Step Scoring

Let's pick one order and calculate each scoring component step-by-step to keep things clear.

In [5]:
# Pick a sample order to demonstrate scoring
np.random.seed(19)  # For reproducibility

random_number = np.random.randint(0, len(orders_clients_df[orders_clients_df['status'] == 'pending']))

sample_order = orders_clients_df[orders_clients_df['status'] == 'pending'].iloc[random_number]

# Use January 15, 2026 as reference date
reference_date = date(2026, 1, 15)

print("=" * 70)
print("SAMPLE ORDER FOR SCORING DEMONSTRATION")
print("=" * 70)
print(f"\n📦 Order ID:         {sample_order['order_id']}")
print(f"👤 Client:           {sample_order['business_name']}")
print(f"📅 Issue Date:       {sample_order['issue_date']}")
print(f"📅 Deadline:         {sample_order['delivery_deadline']}")
print(f"💳 Payment Status:   {sample_order['payment_status']}")
print(f"⭐ Star Client:      {sample_order['is_star_client']}")
print(f"🆕 New Client:       {sample_order['is_new_client']}")
print(f"🚨 Is Mandatory:     {sample_order['is_mandatory']}")
print(f"\n📆 Reference Date: {reference_date}")

SAMPLE ORDER FOR SCORING DEMONSTRATION

📦 Order ID:         ORD-071F2BA3
👤 Client:           Mayorista Santa Fe
📅 Issue Date:       2026-01-12
📅 Deadline:         2026-01-19
💳 Payment Status:   paid
⭐ Star Client:      0
🆕 New Client:       0
🚨 Is Mandatory:     0

📆 Reference Date: 2026-01-15


In [6]:
# Helper class for client scoring (mimics ClientModel attributes)
class SimpleClient:
    def __init__(self, is_star: bool, is_new: bool):
        self.is_star_client = is_star
        self.is_new_client = is_new

# Calculate actual data ranges for dynamic scoring
data_ranges = calculate_data_ranges(db, reference_date)

print("=" * 70)
print("STEP-BY-STEP SCORING CALCULATION")
print("=" * 70)
print(f"📊 Data ranges: {data_ranges}")

# 1. Urgency Score
days_to_deadline = (sample_order['delivery_deadline'] - reference_date).days
urgency_score = calculate_urgency_score(
    sample_order['delivery_deadline'],
    reference_date=reference_date,
    min_days=data_ranges['min_days_to_deadline'],
    max_days=data_ranges['max_days_to_deadline']
)

print(f"\n1️⃣ URGENCY SCORE:")
print(f"   Days to deadline: {days_to_deadline}")
if days_to_deadline < 0:
    print(f"   🚨 OVERDUE: {abs(days_to_deadline)} days past deadline!")
    print(f"   Penalty: 100 + ({abs(days_to_deadline)} × 10) = {min(150, 100 + abs(days_to_deadline) * 10)}")
print(f"   Raw Score: {urgency_score:.1f}")
print(f"   Weighted: {urgency_score * config.weight_urgency:.2f}")

# 2. Payment Score
payment_score = calculate_payment_score(
    sample_order['total_amount'],
    sample_order['payment_status'],
    config,
    p15_amount=data_ranges['p15_amount'],
    p85_amount=data_ranges['p85_amount']
)

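# Look up the multiplier from the loaded config instead of hardcoding it,
# so this display stays in sync with scoring_weights.json
status = sample_order['payment_status'].lower().strip()
multiplier = {
    'paid': config.payment_multiplier_paid,
    'partial': config.payment_multiplier_partial,
}.get(status, config.payment_multiplier_pending)

print(f"\n2️⃣ PAYMENT SCORE:")
print(f"   Amount: ${sample_order['total_amount']:,.2f}")
print(f"   Range: P15=${data_ranges['p15_amount']:,.0f} to P85=${data_ranges['p85_amount']:,.0f}")
print(f"   Status: {sample_order['payment_status']} (×{multiplier})")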
print(f"   Raw Score: {payment_score:.1f}")
print(f"   Weighted: {payment_score * config.weight_payment:.2f}")

# 3. Client Score
historical_count = client_order_counts.get(sample_order['client_id'], 1)
client_obj = SimpleClient(sample_order['is_star_client'], sample_order['is_new_client'])
client_score = calculate_client_score(client_obj, historical_count, config)

# Determine client type label
if sample_order['is_star_client']:
    client_type = "Star Client"
elif sample_order['is_new_client']:
    client_type = "New Client"
elif historical_count > config.frequent_threshold:
    client_type = f"Frequent ({historical_count} orders)"
elif historical_count >= config.regular_threshold:
    client_type = f"Regular ({historical_count} orders)"
else:
    client_type = f"Occasional ({historical_count} order)"

print(f"\n3️⃣ CLIENT SCORE:")
print(f"   Type: {client_type}")
print(f"   Raw Score: {client_score:.0f}")
print(f"   Weighted: {client_score * config.weight_client:.2f}")

# 4. Age Score
days_since_issue = (reference_date - sample_order['issue_date']).days
age_score = calculate_age_score(
    sample_order['issue_date'],
    reference_date=reference_date,
    max_days=data_ranges['max_age_days']
)

print(f"\n4️⃣ AGE SCORE:")
print(f"   Days since issue: {days_since_issue}")
print(f"   Raw Score: {age_score:.1f}")
print(f"   Weighted: {age_score * config.weight_age:.2f}")

# Final Score
final_score = (
    urgency_score * config.weight_urgency +
    payment_score * config.weight_payment +
    client_score * config.weight_client +
    age_score * config.weight_age
)

print("\n" + "=" * 70)
print(f"🎯 FINAL PRIORITY SCORE: {final_score:.2f}")
print("=" * 70)

STEP-BY-STEP SCORING CALCULATION
📊 Data ranges: {'min_days_to_deadline': -10, 'max_days_to_deadline': 14, 'p15_amount': 1460.44, 'p85_amount': 6063.900000000001, 'max_age_days': 13}

1️⃣ URGENCY SCORE:
   Days to deadline: 4
   Raw Score: 71.4
   Weighted: 28.57

2️⃣ PAYMENT SCORE:
   Amount: $756.97
   Range: P15=$1,460 to P85=$6,064
   Status: paid (×1.0)
   Raw Score: 20.0
   Weighted: 5.00

3️⃣ CLIENT SCORE:
   Type: Occasional (1 order)
   Raw Score: 20
   Weighted: 4.00

4️⃣ AGE SCORE:
   Days since issue: 3
   Raw Score: 23.1
   Weighted: 3.46

🎯 FINAL PRIORITY SCORE: 41.03
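
The component functions live in `src.scoring` and are not shown in this notebook. The sketch below is a reconstruction inferred only from the printed numbers (for example, 4 days to deadline with `max_days_to_deadline = 14` gives 71.4, and an amount below P15 with status `paid` gives 20.0). The function names, the 20-point floor for the payment score, and the edge-case handling are assumptions; the real implementation may differ.

```python
# Reconstruction from the notebook's printed outputs; NOT the src.scoring code.

def urgency_score(days_to_deadline, max_days):
    """Overdue: 100 + 10 per day late, capped at 150. Otherwise linear in days left."""
    if days_to_deadline < 0:
        return min(150.0, 100.0 + 10.0 * abs(days_to_deadline))
    if max_days <= 0:  # assumption: no future deadlines in the data
        return 100.0
    return max(0.0, 100.0 * (max_days - days_to_deadline) / max_days)


def payment_score(amount, multiplier, p15, p85):
    """Map the amount onto 20..100 across the P15-P85 range, then apply the status multiplier."""
    if p85 <= p15:  # assumption: degenerate range falls back to the floor
        position = 0.0
    else:
        position = min(1.0, max(0.0, (amount - p15) / (p85 - p15)))
    return (20.0 + 80.0 * position) * multiplier


def age_score(days_since_issue, max_age_days):
    """Linear in order age, reaching 100 at the oldest order in the data."""
    if max_age_days <= 0:  # assumption: all orders issued on the reference date
        return 0.0
    return 100.0 * min(days_since_issue, max_age_days) / max_age_days


# Sample order from the output above; client score 20 is taken as printed
urgency = urgency_score(4, 14)
payment = payment_score(756.97, 1.0, 1460.44, 6063.9)
age = age_score(3, 13)
final = 0.40 * urgency + 0.25 * payment + 0.20 * 20 + 0.15 * age
print(round(urgency, 1), round(payment, 1), round(age, 1), round(final, 2))
```

Under these assumptions the sketch reproduces the sample order's 71.4, 20.0, 23.1 and final 41.03. It does not use `min_days_to_deadline`, which the real `calculate_urgency_score` receives.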


In [7]:
# Visualize the score breakdown with Plotly
components = ['Urgency', 'Payment', 'Client', 'Age']
raw_scores = [urgency_score, payment_score, client_score, age_score]
weights = [config.weight_urgency, config.weight_payment, config.weight_client, config.weight_age]
weighted_scores = [r * w for r, w in zip(raw_scores, weights)]

# Create subplot with two charts
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=['Raw Component Scores (0-100)', 'Weighted Contribution to Final Score'],
    specs=[[{"type": "bar"}, {"type": "pie"}]]
)

# Raw scores bar chart
fig.add_trace(
    go.Bar(
        x=components,
        y=raw_scores,
        marker_color=['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4'],
        text=[f'{s:.1f}' for s in raw_scores],
        textposition='outside',
        name='Raw Score'
    ),
    row=1, col=1
)

# Weighted contribution pie chart
fig.add_trace(
    go.Pie(
        labels=components,
        values=weighted_scores,
        marker_colors=['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4'],
        textinfo='label+value',
        texttemplate='%{label}<br>%{value:.1f}',
        hole=0.4
    ),
    row=1, col=2
)

fig.update_layout(
    title=f"Score Breakdown for Order {sample_order['order_id']}<br><sup>Final Score: {final_score:.2f} (Dynamic Scoring)</sup>",
    showlegend=False,
    height=400,
)

fig.update_yaxes(range=[0, 120], row=1, col=1)
fig.show()

## 4. Score All Pending Orders

Now let's calculate priority scores for all pending orders and update the database.

In [8]:
# Calculate scores for all pending orders
all_breakdowns = score_all_pending_orders(db, CONFIG_PATH, reference_date)

print(f"📊 Calculated priority scores for {len(all_breakdowns)} pending orders")

# Convert to DataFrame for analysis
scores_data = []
for b in all_breakdowns:
    scores_data.append({
        'order_id': b['order_id'],
        'final_score': b['final_score'],
        'is_mandatory': b['is_mandatory'],
        'urgency_raw': b['components']['urgency']['raw'],
        'urgency_weighted': b['components']['urgency']['weighted'],
        'payment_raw': b['components']['payment']['raw'],
        'payment_weighted': b['components']['payment']['weighted'],
        'client_raw': b['components']['client']['raw'],
        'client_weighted': b['components']['client']['weighted'],
        'age_raw': b['components']['age']['raw'],
        'age_weighted': b['components']['age']['weighted'],
        'days_to_deadline': b['factors']['days_to_deadline'],
        'payment_status': b['factors']['payment_status'],
        'total_amount': b['factors']['total_amount'],
        'client_type': b['factors']['client_type'],
        'days_since_issue': b['factors']['days_since_issue'],
    })

# Keep full DataFrame with all orders (including mandatory)
scores_df_all = pd.DataFrame(scores_data)

# Merge with order details - keep all orders
scores_full_df_all = scores_df_all.merge(
    orders_clients_df[['order_id', 'business_name', 'delivery_zone_id', 'total_pallets', 'total_amount']],
    on='order_id'
)

# Create non-mandatory subsets for analysis (stats, rankings, etc.)
scores_df = scores_df_all[scores_df_all['is_mandatory'] == False].copy()
scores_full_df = scores_full_df_all[scores_full_df_all['is_mandatory'] == False].copy()
scores_full_df = scores_full_df.sort_values('final_score', ascending=False).reset_index(drop=True)

print(f"\n📈 Score Statistics (non-mandatory orders):")
print(f"   Min Score:    {scores_df['final_score'].min():.2f}")
print(f"   Max Score:    {scores_df['final_score'].max():.2f}")
print(f"   Mean Score:   {scores_df['final_score'].mean():.2f}")
print(f"   Median Score: {scores_df['final_score'].median():.2f}")

# Count mandatory orders
mandatory_count = scores_df_all['is_mandatory'].sum()
print(f"\n🚨 Mandatory Orders: {mandatory_count}")

# Show top 5
print("\n🏆 Top 5 (non-mandatory) Priority Orders:")
scores_full_df[['order_id', 'business_name', 'final_score', 'days_to_deadline', 'payment_status', 'client_type']].head()

📊 Calculated priority scores for 26 pending orders

📈 Score Statistics (non-mandatory orders):
   Min Score:    17.29
   Max Score:    102.26
   Mean Score:   56.04
   Median Score: 50.09

🚨 Mandatory Orders: 2

🏆 Top 5 (non-mandatory) Priority Orders:


| | order_id | business_name | final_score | days_to_deadline | payment_status | client_type |
|---|----------|---------------|-------------|------------------|----------------|-------------|
| 0 | ORD-C4A1485D | Mayorista El Gaucho | 102.26 | -6 | pending | star_client |
| 1 | ORD-2E43FF51 | Fiambreria La Esquina | 100.65 | -6 | pending | star_client |
| 2 | ORD-BAA10376 | Comercial Rivadavia | 99.52 | -10 | paid | star_client |
| 3 | ORD-1F187AFB | Autoservicio El Trebol | 92.32 | -4 | pending | new_client |
| 4 | ORD-85EB94F2 | Comercial El Puente | 85.92 | -7 | partial | new_client |
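
The cell above flattens each nested breakdown by hand. As an alternative sketch, `pandas.json_normalize` can do the same flattening in one call. The record below mimics the nested `components` / `factors` layout that the loop reads; its values are rounded, illustrative figures, and stripping the prefixes afterwards is a convenience choice rather than something the notebook does.

```python
import pandas as pd

# One illustrative breakdown with the same nesting the loop above reads
breakdowns = [
    {
        "order_id": "ORD-071F2BA3",
        "final_score": 41.03,
        "is_mandatory": False,
        "components": {
            "urgency": {"raw": 71.4, "weighted": 28.57},
            "payment": {"raw": 20.0, "weighted": 5.0},
        },
        "factors": {"days_to_deadline": 4, "payment_status": "paid"},
    }
]

# Nested keys become column names joined with "_", e.g. components_urgency_raw
flat = pd.json_normalize(breakdowns, sep="_")

# Drop the top-level prefixes so the columns match the hand-built names
flat.columns = [
    col.removeprefix("components_").removeprefix("factors_") for col in flat.columns
]
print(sorted(flat.columns))
```

`str.removeprefix` requires Python 3.9 or later.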


In [9]:
# Update database with calculated scores
updated_count = update_all_priority_scores(db, CONFIG_PATH, reference_date)
print(f"✅ Updated {updated_count} orders in the database with their priority scores")

✅ Updated 26 orders in the database with their priority scores


## 5. Understanding Priority Scores - Order Comparison

Let's compare orders with different priority scores to understand **why** some orders are more urgent than others.

In [10]:
# Select three orders with different priority levels (high, medium, low)
non_mandatory = scores_full_df[~scores_full_df['is_mandatory']]

if len(non_mandatory) >= 3:
    high_priority = non_mandatory.iloc[0]  # Highest score
    low_priority = non_mandatory.iloc[-1]  # Lowest score
    mid_idx = len(non_mandatory) // 2
    mid_priority = non_mandatory.iloc[mid_idx]  # Middle score
    
    comparison_orders = [high_priority, mid_priority, low_priority]
    labels = ['🔴 HIGH PRIORITY', '🟡 MEDIUM PRIORITY', '🟢 LOW PRIORITY']
    
    print("=" * 90)
    print("ORDER COMPARISON: HIGH vs MEDIUM vs LOW PRIORITY")
    print("=" * 90)
    
    for order, label in zip(comparison_orders, labels):
        # Calculate weighted scores
        urg_w = order['urgency_raw'] * config.weight_urgency
        pay_w = order['payment_raw'] * config.weight_payment
        cli_w = order['client_raw'] * config.weight_client
        age_w = order['age_raw'] * config.weight_age
        
        # Get amount (handle both column names from merge)
        amount = order.get('total_amount_x', order.get('total_amount', 0))
        
        print(f"\n{label}")
        print("-" * 55)
        print(f"Order:  {order['order_id']}  |  Client: {order['business_name']}")
        print(f"")
        print(f"                                   Raw Score  × Weight  = Weighted")
        print(f"  📆 Urgency (deadline: {order['days_to_deadline']:>3} days)  →  {order['urgency_raw']:>6.1f}  × {config.weight_urgency:.2f}  = {urg_w:>6.2f}")
        print(f"  💳 Payment (${amount:>7,.0f}, {order['payment_status']:<7})  →  {order['payment_raw']:>6.1f}  × {config.weight_payment:.2f}  = {pay_w:>6.2f}")
        print(f"  👤 Client  ({order['client_type']:<12})      →  {order['client_raw']:>6.0f}  × {config.weight_client:.2f}  = {cli_w:>6.2f}")
        print(f"  ⏰ Age     (issued: {order['days_since_issue']:>3} days ago)  →  {order['age_raw']:>6.1f}  × {config.weight_age:.2f}  = {age_w:>6.2f}")
        print(f"  " + "-" * 53)
        print(f"  🎯 FINAL SCORE:                                    = {order['final_score']:>6.2f}")
    
    print("\n" + "=" * 90)
    print("📖 EXPLANATION OF TERMS:")
    print("-" * 90)
    print("  • Urgency: Days UNTIL delivery is due (negative = OVERDUE, higher score)")
    print("  • Payment: Order amount + payment status (paid > partial > pending)")
    print("  • Client:  Client type (star_client > new_client > frequent > regular > occasional)")
    print("  • Age:     Days SINCE the order was placed (older orders get higher scores)")
    print("  • Raw scores are scaled 0-100 (urgency can go up to 150 for overdue penalty)")
else:
    print("Not enough orders for comparison")

ORDER COMPARISON: HIGH vs MEDIUM vs LOW PRIORITY

🔴 HIGH PRIORITY
-------------------------------------------------------
Order:  ORD-C4A1485D  |  Client: Mayorista El Gaucho

                                   Raw Score  × Weight  = Weighted
  📆 Urgency (deadline:  -6 days)  →   150.0  × 0.40  =  60.00
  💳 Payment ($  5,881, pending)  →    29.0  × 0.25  =   7.26
  👤 Client  (star_client )      →     100  × 0.20  =  20.00
  ⏰ Age     (issued:  13 days ago)  →   100.0  × 0.15  =  15.00
  -----------------------------------------------------
  🎯 FINAL SCORE:                                    = 102.26

🟡 MEDIUM PRIORITY
-------------------------------------------------------
Order:  ORD-47C6DDC7  |  Client: Fiambreria Del Centro

                                   Raw Score  × Weight  = Weighted
  📆 Urgency (deadline:   5 days)  →    64.3  × 0.40  =  25.71
  💳 Payment ($  4,410, paid   )  →    71.3  × 0.25  =  17.81
  👤 Client  (occasional

In [11]:
# Visual comparison of the three orders
if len(non_mandatory) >= 3:
    comparison_data = []
    for order, label in zip(comparison_orders, ['High', 'Medium', 'Low']):
        comparison_data.append({
            'Order': f"{label} Priority",
            'Priority': label,
            'Urgency': order['urgency_weighted'],
            'Payment': order['payment_weighted'],
            'Client': order['client_weighted'],
            'Age': order['age_weighted'],
            'Total': order['final_score']
        })
    
    comp_df = pd.DataFrame(comparison_data)
    
    # Create stacked bar chart
    fig = go.Figure()
    
    colors = {'Urgency': '#FF6B6B', 'Payment': '#4ECDC4', 'Client': '#45B7D1', 'Age': '#96CEB4'}
    
    for component in ['Urgency', 'Payment', 'Client', 'Age']:
        fig.add_trace(go.Bar(
            name=component,
            x=comp_df['Order'],
            y=comp_df[component],
            marker_color=colors[component],
            text=[f'{v:.1f}' for v in comp_df[component]],
            textposition='inside'
        ))
    
    fig.update_layout(
        barmode='stack',
        title='Score Component Breakdown: High vs Medium vs Low Priority Orders',
        xaxis_title='Priority Level',  # x values are "High/Medium/Low Priority" labels, not order IDs
        yaxis_title='Weighted Score Contribution',
        legend_title='Component',
        height=500
    )
    
    # Add annotations for total scores
    for i, row in comp_df.iterrows():
        fig.add_annotation(
            x=row['Order'],
            y=row['Total'] + 2,
            text=f"Total: {row['Total']:.1f}",
            showarrow=False,
            font=dict(size=12, color='black')
        )
    
    fig.show()

### Interpretation of Score Differences

The chart above shows **why** orders have different priorities:

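- **🔴 High Priority Order**: typically overdue or close to its deadline (high urgency), and often from a star or new client. In the run above, the top order is 6 days overdue (urgency capped at 150), comes from a star client, and has the maximum age score; its pending payment contributes only 7.26 points.
- **🟡 Medium Priority Order**: a balanced mix of factors. In the run above, the deadline is still 5 days away (urgency 64.3) and the order is fully paid (payment 71.3).
- **🟢 Low Priority Order**: likely has a distant deadline, a pending payment, and an occasional client.

The **Urgency** component (red) typically has the biggest impact: it carries 40% of the weight, and overdue orders can reach a raw urgency score of 150 instead of 100.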
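
To make the point about urgency concrete, the short calculation below works out the largest contribution each component can make, using the weights from the configuration and the raw-score ceilings stated in the explanation of terms (0-100, with urgency up to 150 when overdue). The variable names are illustrative.

```python
# Weights and raw-score ceilings as stated earlier in this notebook
WEIGHTS = {"urgency": 0.40, "payment": 0.25, "client": 0.20, "age": 0.15}
RAW_MAX = {"urgency": 150, "payment": 100, "client": 100, "age": 100}

# Largest weighted contribution each component can make
max_contribution = {name: WEIGHTS[name] * RAW_MAX[name] for name in WEIGHTS}
ceiling = sum(max_contribution.values())

for name, value in max_contribution.items():
    print(f"{name:<8} up to {value:5.1f} points ({value / ceiling:.0%} of the ceiling)")
print(f"Ceiling for a non-mandatory order: {ceiling:.1f}")
```

Urgency alone can supply up to 60 of a possible 120 points, which is why a non-mandatory order can score above 100 (the top order above reaches 102.26).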

## 6. Results Visualization

Let's visualize the distribution and patterns in our priority scores.

In [12]:
# 1. Priority Score Distribution (Histogram)
fig = px.histogram(
    scores_full_df[~scores_full_df['is_mandatory']],  # Exclude mandatory for better visualization
    x='final_score',
    nbins=20,
    title='Priority Score Distribution (Non-Mandatory Orders)',
    labels={'final_score': 'Priority Score', 'count': 'Number of Orders'},
    color_discrete_sequence=['#45B7D1']
)

fig.update_layout(
    xaxis_title='Priority Score',
    yaxis_title='Number of Orders',
    height=400
)

fig.add_vline(
    x=scores_full_df[~scores_full_df['is_mandatory']]['final_score'].median(),
    line_dash="dash",
    line_color="red",
    annotation_text="Median",
    annotation_position="top"
)

fig.show()

In [13]:
# 2. Score vs Days to Deadline (Scatter)
fig = px.scatter(
    scores_full_df[~scores_full_df['is_mandatory']],
    x='days_to_deadline',
    y='final_score',
    color='payment_status',
    size='total_pallets',
    hover_data=['order_id', 'business_name', 'client_type'],
    title='Priority Score vs Days to Deadline',
    labels={
        'days_to_deadline': 'Days to Deadline',
        'final_score': 'Priority Score',
        'payment_status': 'Payment Status'
    },
    color_discrete_map={
        'paid': '#4ECDC4',
        'partial': '#FFE66D',
        'pending': '#FF6B6B'
    }
)

fig.update_layout(height=500)
fig.show()

In [14]:
# 3. Score Components Breakdown - Top 10 Orders
top_10 = scores_full_df[~scores_full_df['is_mandatory']].head(10)

fig = go.Figure()

components = [
    ('urgency_weighted', 'Urgency', '#FF6B6B'),
    ('payment_weighted', 'Payment', '#4ECDC4'),
    ('client_weighted', 'Client', '#45B7D1'),
    ('age_weighted', 'Age', '#96CEB4')
]

for col, name, color in components:
    fig.add_trace(go.Bar(
        name=name,
        x=top_10['order_id'],
        y=top_10[col],
        marker_color=color
    ))

fig.update_layout(
    barmode='stack',
    title='Score Components Breakdown - Top 10 Priority Orders (Dynamic Scoring)',
    xaxis_title='Order ID',
    yaxis_title='Weighted Score',
    legend_title='Component',
    height=500,
    xaxis_tickangle=-45
)

fig.show()

In [15]:
# 4. Scores by Zone (Box Plot)
fig = px.box(
    scores_full_df[~scores_full_df['is_mandatory']],
    x='delivery_zone_id',
    y='final_score',
    color='delivery_zone_id',
    title='Priority Scores by Delivery Zone',
    labels={
        'delivery_zone_id': 'Delivery Zone',
        'final_score': 'Priority Score'
    },
    color_discrete_sequence=px.colors.qualitative.Set2
)

fig.update_layout(height=450, showlegend=False)
fig.show()

In [16]:
# 5. Scores by Payment Status (Box Plot)
fig = px.box(
    scores_full_df[~scores_full_df['is_mandatory']],
    x='payment_status',
    y='final_score',
    color='payment_status',
    title='Priority Scores by Payment Status',
    labels={
        'payment_status': 'Payment Status',
        'final_score': 'Priority Score'
    },
    color_discrete_map={
        'paid': '#4ECDC4',
        'partial': '#FFE66D',
        'pending': '#FF6B6B'
    },
    category_orders={'payment_status': ['paid', 'partial', 'pending']}
)

fig.update_layout(height=450, showlegend=False)
fig.show()

In [17]:
# 6. Scores by Client Type (Box Plot) - Only show types with enough data
client_type_counts = scores_full_df[~scores_full_df['is_mandatory']]['client_type'].value_counts()
print("📊 Client Type Distribution:")
print(client_type_counts.to_string())
print()

# Filter to client types with at least 2 orders for meaningful boxplot
valid_client_types = client_type_counts[client_type_counts >= 2].index.tolist()
filtered_df = scores_full_df[
    (~scores_full_df['is_mandatory']) & 
    (scores_full_df['client_type'].isin(valid_client_types))
]

if len(valid_client_types) < len(client_type_counts):
    excluded = client_type_counts[client_type_counts < 2].index.tolist()
    print(f"⚠️  Note: Excluded client types with <2 orders: {excluded}")
    print(f"   (Not enough data for meaningful boxplot visualization)")
    print()

# Create boxplot with only valid types
fig = px.box(
    filtered_df,
    x='client_type',
    y='final_score',
    color='client_type',
    title='Priority Scores by Client Type<br><sup>Only showing client types with ≥2 orders</sup>',
    labels={
        'client_type': 'Client Type',
        'final_score': 'Priority Score'
    },
    # Order by client score (highest to lowest); types absent from the data are simply not drawn
    category_orders={'client_type': ['star_client', 'new_client', 'frequent', 'regular', 'occasional']},
    color_discrete_sequence=px.colors.qualitative.Bold
)

# Add count annotations
for ctype in valid_client_types:
    count = client_type_counts[ctype]
    avg_score = filtered_df[filtered_df['client_type'] == ctype]['final_score'].mean()
    fig.add_annotation(
        x=ctype,
        y=filtered_df[filtered_df['client_type'] == ctype]['final_score'].max() + 5,
        text=f"n={count}",
        showarrow=False,
        font=dict(size=10)
    )

fig.update_layout(height=450, showlegend=False)
fig.show()

# Show summary table for ALL client types (including those with few orders)
print("\n📋 FULL CLIENT TYPE SUMMARY (including all types):")
print("-" * 60)
for ctype in ['star_client', 'new_client', 'frequent', 'regular', 'occasional']:
    subset = scores_full_df[scores_full_df['client_type'] == ctype]
    if len(subset) > 0:
        avg = subset['final_score'].mean()
        print(f"   {ctype:<12}: {len(subset):>2} orders | Avg Score: {avg:>6.2f}")
    else:
        print(f"   {ctype:<12}:  0 orders | (no data)")

📊 Client Type Distribution:
client_type
occasional     9
new_client     7
star_client    4
regular        4




📋 FULL CLIENT TYPE SUMMARY (including all types):
------------------------------------------------------------
   star_client :  4 orders | Avg Score:  94.47
   new_client  :  7 orders | Avg Score:  64.64
   frequent    :  0 orders | (no data)
   regular     :  4 orders | Avg Score:  44.50
   occasional  :  9 orders | Avg Score:  37.41
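
The summary loop above can also be expressed as a single `groupby`. The sketch below is an alternative formulation, not what the notebook runs: `summarize_by_client_type` is a hypothetical helper, and the small DataFrame reuses a few scores from the top-orders table purely as sample input. Reindexing against the full list of client types keeps empty categories such as `frequent` visible.

```python
import pandas as pd

CLIENT_TYPES = ["star_client", "new_client", "frequent", "regular", "occasional"]


def summarize_by_client_type(df):
    """Count orders and average the final score per client type, keeping empty types."""
    summary = df.groupby("client_type")["final_score"].agg(
        orders="count", avg_score="mean"
    )
    summary = summary.reindex(CLIENT_TYPES)  # adds rows for types with no orders
    summary["orders"] = summary["orders"].fillna(0).astype(int)
    return summary


# Small sample input for illustration
sample = pd.DataFrame(
    {
        "client_type": ["star_client", "star_client", "new_client", "regular", "occasional"],
        "final_score": [102.26, 100.65, 92.32, 67.27, 58.02],
    }
)
print(summarize_by_client_type(sample))
```

Types without orders show a count of 0 and a missing (`NaN`) average.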


## 7. Top 20 Orders with Full Breakdown

In [18]:
# Display top 20 orders with full breakdown (Raw + Weighted scores)
top_20 = scores_full_df.head(20).copy()

# Table 1: Order Info + Raw Scores
print("📊 TOP 20 PRIORITY ORDERS")
print("=" * 120)

raw_display = top_20[[
    'order_id', 'business_name', 'final_score', 'days_to_deadline', 'urgency_raw',
    'payment_status', 'total_amount_x', 'payment_raw',
    'client_type', 'client_raw', 'days_since_issue', 'age_raw'
]].copy()

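# Use unique display names; a repeated 'Client' label makes column selection ambiguous
raw_display.columns = [
    'Order ID', 'Client', 'Final Score', 'Days Left', 'Urgency Score',
    'Payment Status', 'Amount ($)', 'Payment Score',
    'Client Type', 'Client Score', 'Age (days)', 'Age Score'
]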
raw_display['Amount ($)'] = raw_display['Amount ($)'].apply(lambda x: f"${x:,.0f}")
display(raw_display)

# Create visualization for Top 10
top_10 = scores_full_df.head(10).copy()

fig = go.Figure()

# Stacked bar chart showing weighted contributions
components = [
    ('urgency_weighted', 'Urgency (40%)', '#FF6B6B'),
    ('payment_weighted', 'Payment (25%)', '#4ECDC4'),
    ('client_weighted', 'Client (20%)', '#45B7D1'),
    ('age_weighted', 'Age (15%)', '#96CEB4')
]

for col, name, color in components:
    fig.add_trace(go.Bar(
        name=name,
        x=top_10['business_name'],
        y=top_10[col],
        marker_color=color,
        text=[f'{v:.1f}' for v in top_10[col]],
        textposition='inside'
    ))

# Add final score annotations
for i, row in top_10.iterrows():
    fig.add_annotation(
        x=row['business_name'],
        y=row['final_score'] + 3,
        text=f"Total: {row['final_score']:.1f}",
        showarrow=False,
        font=dict(size=10, color='black', weight='bold')
    )

fig.update_layout(
    barmode='stack',
    title='Top 10 Priority Orders - Score Component Breakdown<br><sup>Weighted scores sum to final priority score</sup>',
    xaxis_title='Client',
    yaxis_title='Priority Score',
    legend_title='Component',
    height=550,
    xaxis_tickangle=-30
)

fig.show()

📊 TOP 20 PRIORITY ORDERS


| | Order ID | Client | Final Score | Days Left | Urgency Score | Payment Status | Amount ($) | Payment Score | Client Type | Client Score | Age (days) | Age Score |
|---|----------|--------|-------------|-----------|---------------|----------------|------------|---------------|-------------|--------------|------------|-----------|
| 0 | ORD-C4A1485D | Mayorista El Gaucho | 102.26 | -6 | 150.0 | pending | $5,881 | 29.044371 | star_client | 100.0 | 13 | 100.0 |
| 1 | ORD-2E43FF51 | Fiambreria La Esquina | 100.65 | -6 | 150.0 | pending | $4,644 | 22.599532 | star_client | 100.0 | 13 | 100.0 |
| 2 | ORD-BAA10376 | Comercial Rivadavia | 99.52 | -10 | 150.0 | paid | $1,614 | 22.664604 | star_client | 100.0 | 12 | 92.307692 |
| 3 | ORD-1F187AFB | Autoservicio El Trebol | 92.32 | -4 | 140.0 | pending | $5,274 | 25.883131 | new_client | 80.0 | 12 | 92.307692 |
| 4 | ORD-85EB94F2 | Comercial El Puente | 85.92 | -7 | 150.0 | partial | $452 | 12.0 | new_client | 80.0 | 6 | 46.153846 |
| 5 | ORD-F19ECF5B | Distribuidora Pampa | 81.71 | -3 | 130.0 | pending | $1,971 | 8.662832 | new_client | 80.0 | 10 | 76.923077 |
| 6 | ORD-ACD8A197 | Distribuidora del Sur | 75.45 | 3 | 78.571429 | paid | $3,714 | 59.1615 | star_client | 100.0 | 8 | 61.538462 |
| 7 | ORD-C8429AF5 | Supermercado Norte | 67.27 | -3 | 130.0 | pending | $500 | 6.0 | regular | 40.0 | 5 | 38.461538 |
| 8 | ORD-ED01AC3D | O'connor Coffee Shop | 63.56 | 2 | 85.714286 | pending | $29,040 | 30.0 | new_client | 80.0 | 5 | 38.461538 |
| 9 | ORD-D991C05F | Distribuidora Los Andes | 58.02 | -1 | 110.0 | paid | $2,084 | 30.842627 | occasional | 20.0 | 2 | 15.384615 |
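
As a quick consistency check, the final scores in the table above can be recomputed from the four raw component scores and the weights listed in the Overview (40/25/20/15). The sketch below hard-codes three rows copied from the table; `WEIGHTS`, `ROWS`, and `weighted_total` are illustrative names and are not part of `src.scoring`. Small differences in the second decimal can appear because the reported values are rounded.

```python
# Weights from the Overview table (assumed to match scoring_weights.json).
WEIGHTS = {"urgency": 0.40, "payment": 0.25, "client": 0.20, "age": 0.15}

# (order_id, reported final score, raw component scores) copied from the table.
ROWS = [
    ("ORD-C4A1485D", 102.26,
     {"urgency": 150.0, "payment": 29.044371, "client": 100.0, "age": 100.0}),
    ("ORD-ACD8A197", 75.45,
     {"urgency": 78.571429, "payment": 59.1615, "client": 100.0, "age": 61.538462}),
    ("ORD-D991C05F", 58.02,
     {"urgency": 110.0, "payment": 30.842627, "client": 20.0, "age": 15.384615}),
]


def weighted_total(raw, weights=WEIGHTS):
    """Sum each raw component score multiplied by its weight."""
    return sum(weights[name] * raw[name] for name in weights)


for order_id, reported, raw in ROWS:
    recomputed = weighted_total(raw)
    print(f"{order_id}: reported={reported:.2f} recomputed={recomputed:.2f}")
```

For these rows the weighted sum of the four components alone matches the reported final score, so no additional adjustment appears to have been applied to them.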


## 8. Mandatory Orders

Orders marked as `is_mandatory = True` always receive maximum priority (score = 999999) and must be included in the next dispatch.

In [19]:
# Check for mandatory orders (use the full DataFrame that includes mandatory)
mandatory_orders = scores_full_df_all[scores_full_df_all['is_mandatory'] == True]

print(f"🚨 MANDATORY ORDERS: {len(mandatory_orders)}")
print("=" * 80)

if len(mandatory_orders) > 0:
    print("\nThese orders MUST be included in the next dispatch:\n")
    mandatory_display = mandatory_orders[['order_id', 'business_name', 'delivery_zone_id', 'total_pallets', 'days_to_deadline', 'payment_status']].copy()
    mandatory_display.columns = ['Order ID', 'Client', 'Zone', 'Pallets', 'Days Left', 'Payment']
    print(mandatory_display.to_string(index=False))
    
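    # Compute the total once and name the capacity instead of repeating a magic number
    total_mandatory_pallets = mandatory_orders['total_pallets'].sum()
    truck_capacity_pallets = 8
    print(f"\n⚠️  Total pallets from mandatory orders: {total_mandatory_pallets:.1f}")
    if total_mandatory_pallets > truck_capacity_pallets:
        print(f"⚠️  WARNING: Mandatory orders exceed truck capacity ({truck_capacity_pallets} pallets)!")
else:
    print("\n✅ No mandatory orders at this time.")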

🚨 MANDATORY ORDERS: 2

These orders MUST be included in the next dispatch:

    Order ID               Client       Zone  Pallets  Days Left Payment
ORD-85CA3985 Comercial San Martin       CABA     3.71          3    paid
ORD-19DC5AB0   Mayorista Don Juan NORTH_ZONE     3.79         -1 partial

⚠️  Total pallets from mandatory orders: 7.5
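
The mandatory rule can be pictured as an override applied on top of the weighted score, followed by a capacity check. The following is a minimal sketch under stated assumptions, not the `src.scoring` implementation: `MANDATORY_SCORE`, `TRUCK_CAPACITY_PALLETS`, and `effective_score` are hypothetical names, the 8-pallet capacity is taken from the warning message in the cell above, and the weighted scores of the two mandatory orders are not shown in this notebook, so `0.0` is used as a placeholder.

```python
MANDATORY_SCORE = 999999       # maximum priority stated in the text
TRUCK_CAPACITY_PALLETS = 8     # capacity used by the warning in the notebook cell


def effective_score(weighted_score, is_mandatory):
    """Return the override score for mandatory orders, else the weighted score."""
    return MANDATORY_SCORE if is_mandatory else weighted_score


# Two mandatory orders from the output above plus the top non-mandatory order.
# The 0.0 weighted scores are placeholders; the override makes them irrelevant.
orders = [
    {"order_id": "ORD-C4A1485D", "weighted": 102.26, "is_mandatory": False},
    {"order_id": "ORD-85CA3985", "weighted": 0.0, "is_mandatory": True, "pallets": 3.71},
    {"order_id": "ORD-19DC5AB0", "weighted": 0.0, "is_mandatory": True, "pallets": 3.79},
]

# Rank by effective score, highest first; mandatory orders always come out on top.
ranked = sorted(
    orders,
    key=lambda o: effective_score(o["weighted"], o["is_mandatory"]),
    reverse=True,
)

mandatory_pallets = sum(o["pallets"] for o in orders if o["is_mandatory"])
over_capacity = mandatory_pallets > TRUCK_CAPACITY_PALLETS

print([o["order_id"] for o in ranked])
print(f"Mandatory pallets: {mandatory_pallets:.1f}, over capacity: {over_capacity}")
```

Keeping the override separate from the weighted score means the component breakdown of a mandatory order stays inspectable even though its rank is fixed.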


## 9. Export Results

Save the scoring results to CSV for review and further analysis.

In [20]:
# Export results to CSV and update database
output_dir = project_root / "output"
output_dir.mkdir(exist_ok=True)

# Full scores export (all orders including mandatory)
export_df = scores_full_df_all.copy()
export_df['scoring_date'] = reference_date
export_path = output_dir / "priority_scores.csv"
export_df.to_csv(export_path, index=False)

print(f"📁 Exported priority scores to CSV: {export_path}")
print(f"   Total orders: {len(export_df)}")
print(f"   Scoring date: {reference_date}")

# ============================================================================
# UPDATE DATABASE: Save priority scores to orders table
# ============================================================================
print("\n" + "=" * 60)
print("💾 UPDATING DATABASE WITH PRIORITY SCORES")
print("=" * 60)

with db.get_session() as session:
    updated_orders = 0
    
    for _, row in scores_df_all.iterrows():
        order = session.query(OrderModel).filter_by(order_id=row['order_id']).first()
        if order:
            order.priority_score = row['final_score']
            updated_orders += 1
    
    session.commit()
    print(f"\n✅ Updated priority_score for {updated_orders} orders in the database")

# Verify the update
with db.get_session() as session:
    # Check orders with priority scores
    orders_with_scores = session.query(OrderModel).filter(OrderModel.priority_score.isnot(None)).count()
    orders_without_scores = session.query(OrderModel).filter(OrderModel.priority_score.is_(None)).count()
    
    print(f"\n📊 DATABASE STATUS:")
    print(f"   Orders with priority_score: {orders_with_scores}")
    print(f"   Orders without priority_score: {orders_without_scores} (completed/cancelled)")

# ============================================================================
# SUMMARY STATISTICS
# ============================================================================
print("\n" + "=" * 60)
print("📊 SUMMARY STATISTICS")
print("=" * 60)
print(f"\nTotal pending orders scored: {len(scores_df_all)}")
print(f"Mandatory orders: {scores_df_all['is_mandatory'].sum()}")
print(f"\nScore distribution (non-mandatory):")
non_mandatory_scores = scores_df['final_score']
print(f"   Mean:   {non_mandatory_scores.mean():.2f}")
print(f"   Std:    {non_mandatory_scores.std():.2f}")
print(f"   Min:    {non_mandatory_scores.min():.2f}")
print(f"   25%:    {non_mandatory_scores.quantile(0.25):.2f}")
print(f"   50%:    {non_mandatory_scores.median():.2f}")
print(f"   75%:    {non_mandatory_scores.quantile(0.75):.2f}")
print(f"   Max:    {non_mandatory_scores.max():.2f}")

print(f"\nBy payment status:")
for status in ['paid', 'partial', 'pending']:
    subset = scores_df[scores_df['payment_status'] == status]
    if len(subset) > 0:
        print(f"   {status}: {len(subset)} orders, avg score: {subset['final_score'].mean():.2f}")

print(f"\nBy client type:")
for ctype in scores_df['client_type'].unique():
    subset = scores_df[scores_df['client_type'] == ctype]
    if len(subset) > 0:
        print(f"   {ctype}: {len(subset)} orders, avg score: {subset['final_score'].mean():.2f}")

📁 Exported priority scores to CSV: c:\Users\Santi\Desktop\CV\portafolio\Eco-Bags-Delivery-Optimizer\output\priority_scores.csv
   Total orders: 26
   Scoring date: 2026-01-15

💾 UPDATING DATABASE WITH PRIORITY SCORES

✅ Updated priority_score for 26 orders in the database

📊 DATABASE STATUS:
   Orders with priority_score: 26
   Orders without priority_score: 15 (completed/cancelled)

📊 SUMMARY STATISTICS

Total pending orders scored: 26
Mandatory orders: 2

Score distribution (non-mandatory):
   Mean:   56.04
   Std:    26.61
   Min:    17.29
   25%:    37.78
   50%:    50.09
   75%:    77.02
   Max:    102.26

By payment status:
   paid: 8 orders, avg score: 53.97
   partial: 3 orders, avg score: 48.22
   pending: 13 orders, avg score: 59.12

By client type:
   new_client: 7 orders, avg score: 64.64
   occasional: 9 orders, avg score: 37.41
   regular: 4 orders, avg score: 44.50
   star_client: 4 orders, avg score: 94.47
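
The update cell above issues one ORM query per order, which is fine for 26 rows. For larger batches, the same write can be expressed as a single batched statement. The sketch below is an alternative illustration using the standard-library `sqlite3` module against an in-memory database, not the notebook's `DatabaseManager`; the two-column `orders` table and the `PLACEHOLDER-COMPLETED` row are assumptions made for the example, and the two scores are copied from the ranking table.

```python
import sqlite3

# In-memory stand-in for delivery.db; the real schema is defined by OrderModel.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, priority_score REAL)")
conn.executemany(
    "INSERT INTO orders (order_id) VALUES (?)",
    [("ORD-C4A1485D",), ("ORD-2E43FF51",), ("PLACEHOLDER-COMPLETED",)],
)

# Scores for the two pending orders; the placeholder order is left unscored.
scores = [("ORD-C4A1485D", 102.26), ("ORD-2E43FF51", 100.65)]

# One batched UPDATE instead of one query per order.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "UPDATE orders SET priority_score = ? WHERE order_id = ?",
        [(score, order_id) for order_id, score in scores],
    )

# Same verification idea as the notebook: count scored vs. unscored orders.
with_scores = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE priority_score IS NOT NULL"
).fetchone()[0]
without_scores = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE priority_score IS NULL"
).fetchone()[0]
print(f"with score: {with_scores}, without score: {without_scores}")
```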


In [21]:
# Quick verification: score distribution by overdue status, payment status, and client type
print("🎯 FINAL VERIFICATION: Score Distribution Analysis")
print("=" * 60)

# Analyze score distribution (scores_df is already non-mandatory)
scores_analysis = scores_df.copy()
scores_analysis['is_overdue'] = scores_analysis['days_to_deadline'] < 0

print(f"📊 Score Statistics by Overdue Status:")
for overdue_status in [True, False]:
    subset = scores_analysis[scores_analysis['is_overdue'] == overdue_status]
    status_label = "OVERDUE" if overdue_status else "NOT OVERDUE"
    if len(subset) > 0:
        print(f"\n   {status_label} Orders ({len(subset)}):")
        print(f"      Urgency: {subset['urgency_raw'].mean():.1f} avg (range: {subset['urgency_raw'].min():.1f}-{subset['urgency_raw'].max():.1f})")
        print(f"      Payment: {subset['payment_raw'].mean():.1f} avg (range: {subset['payment_raw'].min():.1f}-{subset['payment_raw'].max():.1f})")
        print(f"      Final:   {subset['final_score'].mean():.1f} avg (range: {subset['final_score'].min():.1f}-{subset['final_score'].max():.1f})")

print(f"\n💳 Score Statistics by Payment Status:")
for payment_status in ['paid', 'partial', 'pending']:
    subset = scores_analysis[scores_analysis['payment_status'] == payment_status]
    if len(subset) > 0:
        print(f"\n   {payment_status.upper()} Orders ({len(subset)}):")
        print(f"      Payment Score: {subset['payment_raw'].mean():.1f} avg")
        print(f"      Final Score:   {subset['final_score'].mean():.1f} avg")

print(f"\n👤 Score Statistics by Client Type:")
for client_type in ['star_client', 'new_client', 'regular']:
    subset = scores_analysis[scores_analysis['client_type'] == client_type]
    if len(subset) > 0:
        print(f"\n   {client_type.upper()} Clients ({len(subset)}):")
        print(f"      Client Score: {subset['client_raw'].mean():.1f} avg")
        print(f"      Final Score:   {subset['final_score'].mean():.1f} avg")

🎯 FINAL VERIFICATION: Score Distribution Analysis
📊 Score Statistics by Overdue Status:

   OVERDUE Orders (8):
      Urgency: 138.8 avg (range: 110.0-150.0)
      Payment: 19.7 avg (range: 6.0-30.8)
      Final:   86.0 avg (range: 58.0-102.3)

   NOT OVERDUE Orders (16):
      Urgency: 48.7 avg (range: 0.0-85.7)
      Payment: 31.3 avg (range: 6.0-71.5)
      Final:   41.1 avg (range: 17.3-75.5)

💳 Score Statistics by Payment Status:

   PAID Orders (8):
      Payment Score: 42.3 avg
      Final Score:   54.0 avg

   PARTIAL Orders (3):
      Payment Score: 18.8 avg
      Final Score:   48.2 avg

   PENDING Orders (13):
      Payment Score: 20.3 avg
      Final Score:   59.1 avg

👤 Score Statistics by Client Type:

   STAR_CLIENT Clients (4):
      Client Score: 100.0 avg
      Final Score:   94.5 avg

   NEW_CLIENT Clients (7):
      Client Score: 80.0 avg
      Final Score:   64.6 avg

   REGULAR Clients (4):
      Client Score: 40.0 avg
      Final Score:   44.5 avg
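
The urgency values reported in this notebook follow a recognizable pattern: overdue orders fall between 110.0 and 150.0, and orders not yet due fall between 0.0 and 85.7. The sketch below reproduces the values visible in the ranking table (for example 110.0 at 1 day overdue, 140.0 at 4 days overdue, 78.57 with 3 days left). It is inferred from those outputs only and is not taken from `calculate_urgency_score`; the function name, the 14-day horizon, the 10-point step per overdue day, and the cap of 150 are all assumptions.

```python
def urgency_sketch(days_to_deadline, horizon_days=14, overdue_step=10.0, cap=150.0):
    """Illustrative urgency curve inferred from the notebook's outputs.

    Overdue orders: 100 plus a fixed step per day late, capped.
    Other orders: linear decay from 100 (due today) to 0 (at the horizon or later).
    """
    if days_to_deadline < 0:
        days_late = -days_to_deadline
        return min(cap, 100.0 + overdue_step * days_late)
    remaining_fraction = max(0.0, 1.0 - days_to_deadline / horizon_days)
    return 100.0 * remaining_fraction


# Days-left values that appear in the ranking table.
for days in (-10, -6, -4, -3, -1, 2, 3):
    print(days, round(urgency_sketch(days), 6))
```

If the horizon is in fact derived from the data, as the `calculate_data_ranges` import suggests it could be, the 14 used here would change whenever the set of pending orders changes.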


## Summary

Phase 3 complete! The priority scoring system provides:

### Key Features
- **Multi-factor scoring**: Combines urgency, payment, client type, and order age
- **Configurable weights**: Easy adjustment via JSON config file
- **Zone penalties**: Discourages cross-zone deliveries for efficiency
- **Transparent breakdown**: Each score component visible for debugging

### Scoring Formula
```
Priority Score = (urgency × w1) + (payment × w2) + (client × w3) + (age × w4) - zone_penalty
```
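
The formula and the configurable weights can be sketched together. This is an illustration only: the JSON layout, `load_weights`, and `priority_score` are hypothetical and may not match `scoring_weights.json` or `load_scoring_config`. The weight values are the ones from the Overview table, and the raw scores are those reported for `ORD-C8429AF5`. `zone_penalty` defaults to 0, which is the value that reproduces the top-ranked scores shown in this notebook; the penalty of 5.0 in the second call is an arbitrary number chosen to show the effect.

```python
import json

# Hypothetical config layout; the real scoring_weights.json may differ.
CONFIG_TEXT = '{"weights": {"urgency": 0.40, "payment": 0.25, "client": 0.20, "age": 0.15}}'


def load_weights(config_text):
    """Parse the weights and check that they sum to 1."""
    weights = json.loads(config_text)["weights"]
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return weights


def priority_score(raw_scores, weights, zone_penalty=0.0):
    """Weighted sum of raw component scores minus an optional zone penalty."""
    weighted = sum(weights[name] * raw_scores[name] for name in weights)
    return weighted - zone_penalty


weights = load_weights(CONFIG_TEXT)
raw = {"urgency": 130.0, "payment": 6.0, "client": 40.0, "age": 38.461538}

print(round(priority_score(raw, weights), 2))                    # no penalty
print(round(priority_score(raw, weights, zone_penalty=5.0), 2))  # arbitrary penalty
```

Validating that the weights sum to 1 at load time keeps final scores comparable after someone edits the config.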

### Pipeline Status

| Phase | Notebook | Status |
|-------|----------|--------|
| **Phase 1** | 01_base_data_setup | ✅ Complete |
| **Phase 2** | 02_receipt_extraction | ✅ Complete |
| **Phase 3** | 03_priority_score | ✅ Complete |
| **Phase 4** | 04_order_selector | ✅ Complete |
| **Phase 5** | 05_route_optimizer | ✅ Complete |