# Customer Clustering Using K-Means for Route Optimization

This section performs customer clustering based on geographical coordinates using the K-Means algorithm.
The goal is to group nearby customers together to support optimized delivery routes in Estonia.

---

## Cell 1: Imports and Initial Setup

In [3]:
import os
import sys
import pandas as pd
import numpy as np
import chardet
from pathlib import Path
import json
import time
import re
import math
import gc
import warnings
import logging
import random
import requests
from datetime import datetime
from tqdm import tqdm
import threading

warnings.filterwarnings('ignore')  # Suppress non-critical warnings

# Set up logging with formatting
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger('route_optimization')

print("✅ Imports complete")

✅ Imports complete


## Cell 2: Project Paths and API Key

In [5]:
def setup_project():
    """Set up project paths and folders"""
    project_root = Path.cwd()  # Current working directory
    input_path = project_root.parent / '02 Data' / 'Processed_data'
    output_path = project_root.parent / '02 Data' / 'Processed_data'
    
    # Check if input directory exists
    if not input_path.exists():
        print(f"Error: Input directory '{input_path}' does not exist.")
        print("Please create this directory or modify the path.")
        sys.exit(1)
    
    # Create output directory if it doesn't exist
    os.makedirs(output_path, exist_ok=True)
    print(f"Project setup complete. \n Input path: {input_path} \n Output path: {output_path}")
    
    return input_path, output_path

def load_api_key(file_path="api_keys.json"):
    """Load the HERE API key from a JSON file."""
    try:
        with open(file_path, 'r') as f:
            api_keys = json.load(f)
        api_key = api_keys.get("HERE_API_KEY")
        if not api_key:
            print("⚠️ No HERE API key found in the JSON file")
            return None
        return api_key
    except Exception as e:
        print(f"⚠️ Error loading API key: {e}")
        return None

# Test these functions
if __name__ == "__main__":
    print("Testing project setup functions...")
    # Comment out if you just want to define the functions without running
    api_key = load_api_key()
    if api_key:
        print(f"✅ API key loaded successfully")
    else:
        print("⚠️ No API key found. Will use fallback methods.")

Testing project setup functions...
✅ API key loaded successfully

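For reference, `load_api_key` expects a JSON object with a `HERE_API_KEY` entry. The following is a minimal sketch of that round trip, not the notebook's own code: the helper name `read_here_key`, the temporary directory, and the placeholder key value are all illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path

def read_here_key(file_path):
    """Return the HERE_API_KEY entry from a JSON file, or None if unavailable."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return json.load(f).get("HERE_API_KEY") or None
    except (OSError, json.JSONDecodeError, AttributeError):
        # Missing file, malformed JSON, or a top-level value that is not an object
        return None

# Write a throwaway key file (placeholder value, not a real key)
with tempfile.TemporaryDirectory() as tmp:
    key_file = Path(tmp) / "api_keys.json"
    key_file.write_text(json.dumps({"HERE_API_KEY": "placeholder-key"}), encoding="utf-8")
    found = read_here_key(key_file)
    missing = read_here_key(Path(tmp) / "does_not_exist.json")

print(found)
print(missing)
```

Catching only the expected error types, rather than a bare `Exception`, keeps unrelated bugs visible while still letting the notebook fall back when no key is available.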

## Cell 3: Data Loading Functions

In [7]:
def load_data(input_path):
    """Load and parse customer data file"""
    # List available CSV files in the input directory
    available_files = list(input_path.glob("*.csv"))
    if not available_files:
        print(f"No CSV files found in {input_path}")
        sys.exit(1)
    
    print("Available files:")
    for i, f in enumerate(available_files, start=1):
        print(f"{i}: {f.name}")
    
    # Prompt user to choose a file by number
    while True:
        try:
            choice = int(input(f"Choose file number (1-{len(available_files)}): ").strip()) - 1
            if 0 <= choice < len(available_files):
                break
            print(f"Please enter a number between 1 and {len(available_files)}")
        except ValueError:
            print("Please enter a valid number.")
    
    file_path = available_files[choice]
    
    # Detect file encoding
    print(f"Detecting encoding for {file_path.name}...")
    with open(file_path, 'rb') as file:
        result = chardet.detect(file.read())
    encoding = result['encoding'] or 'utf-8'  # chardet reports None when it cannot decide; fall back to UTF-8
    confidence = result['confidence']
    print(f"Detected encoding: {encoding} (confidence: {confidence:.1%})")
    
    # Analyze delimiter options
    print("\nAnalyzing potential delimiters:\n")
    delimiters = [',', ';', r'\t', '|']  # r'\t' is a regex for a tab; separators longer than one character are parsed as regexes
    delimiter_options = {}
    for i, delim in enumerate(delimiters, start=1):
        try:
            preview_df = pd.read_csv(file_path, engine='python', encoding=encoding, sep=delim, nrows=3)
            col_count = len(preview_df.columns)
            delimiter_options[i] = (delim, col_count)
            print(f"{i}: Delimiter '{delim}'\n   Found {col_count} columns")
            print(f"   Preview with option {i}:")
            print(preview_df.head(3))
            print("-" * 80 + "\n")
        except Exception as e:
            print(f"{i}: Error with delimiter '{delim}': {e}")
    
    # Suggest the delimiter with the most columns
    if delimiter_options:
        suggested = max(delimiter_options, key=lambda k: delimiter_options[k][1])
        print(f"Suggested option: {suggested} ('{delimiter_options[suggested][0]}') with {delimiter_options[suggested][1]} columns")
    else:
        print("No valid delimiters found. Please check the file format.")
        sys.exit(1)
    
    # Prompt user to choose delimiter option
    while True:
        try:
            delim_choice = input(f"\nChoose delimiter option (1-{len(delimiter_options)}) [default: {suggested}]: ").strip()
            if not delim_choice:
                delim_choice = suggested
            else:
                delim_choice = int(delim_choice)
            if delim_choice in delimiter_options:
                break
            print(f"Please enter a number between 1 and {len(delimiter_options)} or press Enter for default.")
        except ValueError:
            print("Please enter a valid number or press Enter for default.")
    
    chosen_delim, _ = delimiter_options[delim_choice]
    print(f"Using delimiter: '{chosen_delim}'")
    
    # Load the full CSV with chosen delimiter and encoding
    try:
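        # Use the same engine as the preview so regex separators such as r'\t' parse identically
        df = pd.read_csv(file_path, engine='python', encoding=encoding, sep=chosen_delim)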
        print(f"\n✅ Loaded {df.shape[0]} rows × {df.shape[1]} columns from {file_path.name}")
    except Exception as e:
        print(f"❌ Failed to load CSV: {e}")
        sys.exit(1)
    
    # Display data overview
    print("\nData Overview:")
    print(f"Column names: {', '.join(df.columns[:5])}, ... (and {len(df.columns)-5} more columns)" if len(df.columns) > 5 else f"Column names: {', '.join(df.columns)}")
    print(f"\nData types (first 5 columns):\n{df.dtypes[:5]}")
    print(f"... (and {len(df.columns)-5} more columns)" if len(df.columns) > 5 else "")
    print("\nSample data:")
    print(df.head())
    
    return df, file_path

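The delimiter suggestion in `load_data` picks the candidate that yields the most columns. The sketch below reproduces that idea without pandas or user prompts, using only the standard `csv` module. The function name `suggest_delimiter`, the sample text, and the extra requirement that all previewed rows have the same width are assumptions added for illustration.

```python
import csv
import io

def suggest_delimiter(sample_text, candidates=(",", ";", "\t", "|"), max_rows=3):
    """Return (delimiter, column_count) for the candidate that yields the most columns."""
    best = (None, 0)
    for delim in candidates:
        # Header plus up to max_rows data rows
        rows = list(csv.reader(io.StringIO(sample_text), delimiter=delim))[: max_rows + 1]
        if not rows:
            continue
        col_count = len(rows[0])
        # Only accept a candidate whose previewed rows all have the same width
        consistent = all(len(r) == col_count for r in rows)
        if consistent and col_count > best[1]:
            best = (delim, col_count)
    return best

sample = "customer_id;latitude;longitude\n101;59.4370;24.7536\n102;58.3780;26.7290\n"
delimiter, columns = suggest_delimiter(sample)
print(repr(delimiter), columns)
```

A helper like this could supply the default for the delimiter prompt, leaving the interactive confirmation in place for files where the guess is wrong.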
## Cell 4: Data Validation

In [9]:
def validate_data(df):
    """Validate the data before processing"""
    print("\n=== VALIDATING DATA QUALITY ===")
    
    # Essential columns for route optimization
    required_columns = ['latitude', 'longitude']
    id_columns = ['ABS Custumer no', 'ABS Customer no', 'customer_id']  # Accept the misspelled and corrected headers, plus a generic ID
    
    # Check for presence of required columns
    missing_required = [col for col in required_columns if col not in df.columns]
    if missing_required:
        print(f"❌ ERROR: Missing required columns: {', '.join(missing_required)}")
        print("Route optimization requires latitude and longitude coordinates.")
        raise ValueError(f"Missing required columns: {missing_required}")
    else:
        print("✅ Required location columns present")
    
    # Check if any ID column exists
    available_id_columns = [col for col in id_columns if col in df.columns]
    if available_id_columns:
        print(f"✅ ID column(s) found: {', '.join(available_id_columns)}")
    else:
        print("⚠️ Warning: No standard ID column found. Will use row indices for identification.")
    
    # Validate coordinate data
    invalid_lat = df[(df['latitude'] < -90) | (df['latitude'] > 90) | df['latitude'].isna()]
    invalid_lng = df[(df['longitude'] < -180) | (df['longitude'] > 180) | df['longitude'].isna()]
    
    # Report invalid coordinates with more detail
    if len(invalid_lat) > 0:
        print(f"⚠️ Found {len(invalid_lat)} rows with invalid latitude values")
        print(f"Sample of invalid latitudes: {df.loc[invalid_lat.index[:3], 'latitude'].tolist()}")
        print(f"Row indices with bad latitudes: {invalid_lat.index[:5].tolist()}...")
    
    if len(invalid_lng) > 0:
        print(f"⚠️ Found {len(invalid_lng)} rows with invalid longitude values")
        print(f"Sample of invalid longitudes: {df.loc[invalid_lng.index[:3], 'longitude'].tolist()}")
        print(f"Row indices with bad longitudes: {invalid_lng.index[:5].tolist()}...")
    
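    # Count invalid rows by index; comparing row contents would merge distinct rows with identical values
    invalid_rows = df.index.isin(invalid_lat.index) | df.index.isin(invalid_lng.index)
    valid_coordinates = len(df) - int(invalid_rows.sum())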
    print(f"✅ {valid_coordinates} out of {len(df)} rows ({valid_coordinates/len(df)*100:.1f}%) have valid coordinates")
    
    # Check for duplicate locations (might be intentional but worth noting)
    duplicate_coords = df.duplicated(subset=['latitude', 'longitude'], keep=False)
    if duplicate_coords.any():
        print(f"ℹ️ Found {duplicate_coords.sum()} rows with duplicate coordinates")
        print("   This might be expected if multiple deliveries go to the same location")
        
        # Show one example of duplicated coordinates
        first_dup_idx = duplicate_coords[duplicate_coords].index[0]
        dup_lat = df.loc[first_dup_idx, 'latitude']
        dup_lng = df.loc[first_dup_idx, 'longitude']
        dups = df[(df['latitude'] == dup_lat) & (df['longitude'] == dup_lng)]
        print(f"   Example: These {len(dups)} rows share coordinates ({dup_lat}, {dup_lng}):")
        print(dups.head(3))
    
    # Check if cluster_name column exists (will be used as depot identifier)
    if 'cluster_name' in df.columns:
        print(f"\nValidating depot assignment column: 'cluster_name'")
        unique_values = df['cluster_name'].unique()
        null_count = df['cluster_name'].isna().sum()
        
        print(f"- Column data type: {df['cluster_name'].dtype}")
        print(f"- Unique values: {len(unique_values)} (showing first 5): {unique_values[:5]}")
        print(f"- Missing values: {null_count} ({null_count/len(df)*100:.1f}%)")
    else:
        print("\n❌ ERROR: 'cluster_name' column not found in the data.")
        print("This script requires the 'cluster_name' column for depot identification.")
        raise ValueError("Missing 'cluster_name' column required for depot identification")
    
    print("\n✅ Data validation complete")
    
    # Create a clean copy without invalid coordinates
    df_clean = df[
        (df['latitude'] >= -90) & (df['latitude'] <= 90) & 
        (df['longitude'] >= -180) & (df['longitude'] <= 180) &
        ~df['latitude'].isna() & ~df['longitude'].isna()
    ].copy()
    
    return df_clean

# Test this function if needed:
# if __name__ == "__main__":
#     # First load data from cell 3
#     # input_path, output_path = setup_project()
#     # df, file_path = load_data(input_path)
#     
#     # Then validate
#     # df_clean = validate_data(df)

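The range checks in `validate_data` assume the coordinate columns are already numeric. As a hedged illustration of the same rule applied to raw values, the sketch below uses a plain-Python predicate that also rejects text that does not parse as a number. The name `is_valid_coordinate` and the sample rows are assumptions, not part of the notebook.

```python
import math

def is_valid_coordinate(lat, lon):
    """True when lat/lon parse as finite numbers inside the valid degree ranges."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False
    if not (math.isfinite(lat) and math.isfinite(lon)):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180

rows = [
    (59.4370, 24.7536),      # inside both ranges
    (158.3780, 26.7290),     # latitude out of range
    (float("nan"), 24.0),    # missing latitude
    ("58,38", "26.72"),      # decimal comma, does not parse as a float
]
flags = [is_valid_coordinate(lat, lon) for lat, lon in rows]
print(flags)
```

Rows such as the last one would raise a `TypeError` in a pandas comparison against an object-typed column, so converting or rejecting them before the range check is worth considering.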
## Cell 5: Depot Identification

In [11]:
def create_depot_dataframe(df_clean):
    """Create a dataframe of depots based on the cluster_name column"""
    print("\n=== DEPOT IDENTIFICATION ===")
    
    # Get unique depot values from cluster_name
    depot_names = sorted(df_clean['cluster_name'].dropna().unique())  # Skip missing names; NaN mixed with strings cannot be sorted
    print(f"Found {len(depot_names)} unique depots: {', '.join([str(d) for d in depot_names])}")
    
    # Create depot dataframe from coordinates
    depot_info = []
    
    for i, depot_name in enumerate(depot_names):
        # Get all rows for this depot
        depot_rows = df_clean[df_clean['cluster_name'] == depot_name]
        
        # Use depot_latitude and depot_longitude if available
        if 'depot_latitude' in df_clean.columns and 'depot_longitude' in df_clean.columns:
            # Use the first row's depot coordinates
            depot_lat = depot_rows['depot_latitude'].iloc[0]
            depot_lng = depot_rows['depot_longitude'].iloc[0]
            
            # Check if all coords are the same for this depot
            if len(depot_rows) > 1:
                lat_same = (depot_rows['depot_latitude'] == depot_lat).all()
                lng_same = (depot_rows['depot_longitude'] == depot_lng).all()
                if not (lat_same and lng_same):
                    print(f"⚠️ Warning: Found different coordinates for depot '{depot_name}'")
                    print(f"Using coordinates from first occurrence: {depot_lat}, {depot_lng}")
        else:
            # If no depot coordinates, use centroid of locations in the cluster
            depot_lat = depot_rows['latitude'].mean()
            depot_lng = depot_rows['longitude'].mean()
            print(f"No depot coordinates found for '{depot_name}'. Using centroid: ({depot_lat:.6f}, {depot_lng:.6f})")
        
        # Count locations assigned to this depot
        location_count = len(depot_rows)
        
        # Store depot info
        depot_info.append({
            'cluster_id': i,
            'depot_name': depot_name,
            'latitude': depot_lat,
            'longitude': depot_lng,
            'location_count': location_count
        })
    
    # Create the depot dataframe
    depot_df = pd.DataFrame(depot_info)
    print(f"\n✅ Created information for {len(depot_df)} depots")
    print("\nDepot information:")
    print(depot_df)
    
    return depot_df

# Test this function if needed:
# if __name__ == "__main__":
#     # Load and validate data first
#     # input_path, output_path = setup_project()
#     # df, file_path = load_data(input_path)
#     # df_clean = validate_data(df)
#     
#     # Then create depot dataframe
#     # depot_df = create_depot_dataframe(df_clean)

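When no `depot_latitude`/`depot_longitude` columns exist, `create_depot_dataframe` falls back to the mean position of each cluster's customers. The following sketch shows that fallback in plain Python. The function name `depot_centroids` and the three sample rows are illustrative assumptions.

```python
from collections import defaultdict

def depot_centroids(rows):
    """Average latitude/longitude per cluster_name; returns {name: (lat, lon, count)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for row in rows:
        acc = sums[row["cluster_name"]]
        acc[0] += row["latitude"]
        acc[1] += row["longitude"]
        acc[2] += 1
    return {name: (lat / n, lon / n, n) for name, (lat, lon, n) in sums.items()}

# Hypothetical customers, two in one cluster and one in another
customers = [
    {"cluster_name": "Tallinn", "latitude": 59.40, "longitude": 24.70},
    {"cluster_name": "Tallinn", "latitude": 59.50, "longitude": 24.80},
    {"cluster_name": "Tartu", "latitude": 58.38, "longitude": 26.72},
]
centroids = depot_centroids(customers)
print(centroids)
```

With pandas, the same aggregation is `df_clean.groupby('cluster_name')[['latitude', 'longitude']].mean()`. Averaging raw degrees is an approximation that is adequate for a region as small as Estonia, but the resulting point is not guaranteed to lie on a road or at a real depot.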
## Cell 6: Operational Constraints

In [13]:
def get_operational_constraints():
    """Get operational constraints from user input"""
    print("\n=== OPERATIONAL CONSTRAINTS ===")
    print("Now we'll set some parameters for the route optimization.")
    print("These define the vehicle and time constraints for each route.")
    
    # Validate and get max weight with proper error handling
    while True:
        try:
            weight_input = input("Vehicle weight/package capacity (default 800): ").strip()
            max_weight = float(weight_input) if weight_input else 800
            if max_weight <= 0:
                print("Vehicle capacity must be positive. Please try again.")
                continue
            break
        except ValueError:
            print("Please enter a valid number.")
    
    # Validate and get max time
    while True:
        try:
            time_input = input("Maximum route time in hours (default 8): ").strip()
            hours = float(time_input) if time_input else 8
            if hours <= 0:
                print("Route time must be positive. Please try again.")
                continue
            max_time = hours * 60  # Convert to minutes
            break
        except ValueError:
            print("Please enter a valid number.")
            
    # Validate and get depot service time (new parameter)
    while True:
        try:
            depot_service_input = input("Service time at depot (loading/unloading) in minutes (default 45): ").strip()
            depot_service_time = float(depot_service_input) if depot_service_input else 45
            if depot_service_time < 0:
                print("Depot service time cannot be negative. Please try again.")
                continue
            break
        except ValueError:
            print("Please enter a valid number.")
    
    # Validate and get service time
    while True:
        try:
            service_input = input("Service time per unit in minutes (default 0.5): ").strip()
            service_time_per_unit = float(service_input) if service_input else 0.5
            if service_time_per_unit < 0:
                print("Service time cannot be negative. Please try again.")
                continue
            break
        except ValueError:
            print("Please enter a valid number.")
    
    print(f"\nOptimization parameters:")
    print(f"- Vehicle capacity: {max_weight} units")
    print(f"- Maximum route time: {max_time/60:.1f} hours ({max_time:.0f} minutes)")
    print(f"- Depot service time: {depot_service_time} minutes")
    print(f"- Service time per unit: {service_time_per_unit} minutes")
    
    # Save parameters for reference
    optimization_params = {
        'vehicle_capacity': max_weight,
        'max_route_time_min': max_time,
        'depot_service_time_min': depot_service_time,
        'service_time_per_unit_min': service_time_per_unit
    }
    
    return optimization_params

# Test this function if needed:
# if __name__ == "__main__":
#     optimization_params = get_operational_constraints()

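The four prompt loops in `get_operational_constraints` share one pattern: read text, fall back to a default on empty input, reject invalid numbers, and ask again. A possible refactoring is sketched below. The helper names `parse_amount` and `ask_number`, and the injectable `read` argument used to simulate keyboard input, are assumptions for illustration.

```python
import math

def parse_amount(raw, default, allow_zero=False):
    """Parse user text into a float, falling back to default on empty input.

    Returns None when the text is not a finite number or violates the sign
    rule, so the caller can prompt again.
    """
    text = raw.strip()
    if not text:
        return float(default)
    try:
        value = float(text)
    except ValueError:
        return None
    if not math.isfinite(value) or value < 0 or (value == 0 and not allow_zero):
        return None
    return value

def ask_number(prompt, default, allow_zero=False, read=input):
    """Prompt repeatedly until parse_amount accepts the answer."""
    while True:
        value = parse_amount(read(prompt), default, allow_zero)
        if value is not None:
            return value
        print("Please enter a valid number.")

# Simulated answers stand in for keyboard input
answers = iter(["abc", "-5", "10"])
max_hours = ask_number("Maximum route time in hours (default 8): ", 8,
                       read=lambda _prompt: next(answers))
route_minutes = max_hours * 60  # same hours-to-minutes conversion as the notebook
print(route_minutes)
```

Separating parsing from prompting makes the validation rules testable without a terminal, and `allow_zero=True` covers the two service-time parameters that accept zero.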
## Cell 7: HERE API Testing, Distance Cache, and Status Display

In [15]:
def test_here_api_connection(api_key, depot_lat, depot_lng):
    """Test the HERE API connectivity with a simple routing request"""
    print("\nTesting HERE API connectivity...")
    
    # Create a simple request to test connectivity
    url = "https://router.hereapi.com/v8/routes"
    params = {
        'apiKey': api_key,
        'transportMode': 'car',
        'origin': f"{depot_lat},{depot_lng}",
        'destination': f"{depot_lat+0.01},{depot_lng+0.01}",  # Slightly offset destination
        'return': 'summary'
    }
    
    try:
        # Make the API request
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()  # Raise exception for HTTP errors
        
        # Parse response
        route_data = response.json()
        
        if 'routes' in route_data and len(route_data['routes']) > 0:
            route_sections = route_data['routes'][0].get('sections', [])
            if route_sections:
                print(f"✅ HERE API connection successful!")
                print(f"Received {len(route_sections)} route sections in response")
                return True
            else:
                print("⚠️ HERE API response does not contain any route sections")
                return False
        else:
            print("⚠️ HERE API response does not contain any routes")
            return False
            
    except requests.exceptions.RequestException as e:
        print(f"❌ HERE API connection failed: {e}")
        return False
    except Exception as e:
        print(f"❌ Error testing HERE API: {e}")
        return False

def setup_distance_cache():
    """Setup a persistent disk-based cache for distance calculations"""
    import sqlite3  # Not imported in Cell 1; os and Path already are
    
    # Create cache directory if it doesn't exist
    cache_dir = Path.cwd() / 'cache'
    os.makedirs(cache_dir, exist_ok=True)
    
    # Create database connection
    db_path = cache_dir / 'distance_cache.db'
    conn = sqlite3.connect(str(db_path))
    cursor = conn.cursor()
    
    # Create tables if they don't exist
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS distance_cache (
        origin_lat REAL,
        origin_lng REAL,
        dest_lat REAL,
        dest_lng REAL,
        distance_km REAL,
        time_min REAL,
        timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (origin_lat, origin_lng, dest_lat, dest_lng)
    )
    ''')
    
    cursor.execute('''
    CREATE INDEX IF NOT EXISTS idx_locations ON distance_cache (origin_lat, origin_lng, dest_lat, dest_lng)
    ''')
    
    conn.commit()
    conn.close()
    
    print(f"✅ Distance cache setup complete at {db_path}")
    return str(db_path)

def display_overall_status(status_dict, refresh_interval=1):
    """Display a real-time status message that updates in place"""
    last_message = ""
    
    while not status_dict.get('done', False):
        # Get current status information
        stage = status_dict.get('stage', 'Processing')
        detail = status_dict.get('detail', '')
        progress = status_dict.get('progress', '')
        
        # Format message
        if progress:
            status_msg = f"STATUS: {stage} - {detail} [{progress}]"
        else:
            status_msg = f"STATUS: {stage} - {detail}"
            
        # Only update if message changed
        if status_msg != last_message:
            # Clear line and move cursor to beginning
            sys.stdout.write('\r' + ' ' * 100 + '\r')
            # Print current status
            sys.stdout.write(status_msg)
            sys.stdout.flush()
            last_message = status_msg
            
        time.sleep(refresh_interval)
    
    # Final newline
    sys.stdout.write('\n')
    sys.stdout.flush()

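The `distance_cache` table uses the four raw coordinate values as its primary key, so a lookup only hits when every float matches exactly. The sketch below shows one way to make keys stable, assuming coordinates are rounded to six decimals before both insert and lookup. The helper `cache_key`, the rounding precision, the in-memory database, and the stored metrics are all illustrative assumptions rather than the notebook's behavior.

```python
import sqlite3

def cache_key(origin, destination, ndigits=6):
    """Round coordinates so near-identical points map to the same row (assumed precision)."""
    return tuple(round(float(v), ndigits) for v in (
        origin["latitude"], origin["longitude"],
        destination["latitude"], destination["longitude"],
    ))

LOOKUP_SQL = (
    "SELECT distance_km, time_min FROM distance_cache "
    "WHERE origin_lat = ? AND origin_lng = ? AND dest_lat = ? AND dest_lng = ?"
)

conn = sqlite3.connect(":memory:")  # in-memory stand-in for distance_cache.db
conn.execute("""
    CREATE TABLE IF NOT EXISTS distance_cache (
        origin_lat REAL, origin_lng REAL, dest_lat REAL, dest_lng REAL,
        distance_km REAL, time_min REAL,
        PRIMARY KEY (origin_lat, origin_lng, dest_lat, dest_lng)
    )
""")

depot = {"latitude": 59.437000, "longitude": 24.753600}
stop = {"latitude": 58.3780001, "longitude": 26.7290004}  # noisy trailing digits

conn.execute(
    "INSERT OR REPLACE INTO distance_cache VALUES (?, ?, ?, ?, ?, ?)",
    (*cache_key(depot, stop), 186.0, 150.0),  # made-up metrics for illustration
)
conn.commit()

# A lookup with slightly different float noise still finds the same row
stop_again = {"latitude": 58.378, "longitude": 26.729}
hit = conn.execute(LOOKUP_SQL, cache_key(depot, stop_again)).fetchone()

# The key is directional: the reverse trip is a separate entry
miss = conn.execute(LOOKUP_SQL, cache_key(stop_again, depot)).fetchone()
conn.close()
print(hit, miss)
```

Keeping the key directional matches road routing, where one-way streets can make the outbound and return legs differ.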
## Cell 8: Routing Functions

In [17]:
# Import additional libraries needed for concurrent processing and advanced algorithms
import concurrent.futures
import sqlite3
import itertools
import heapq
import random
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import pairwise_distances
import time as timing_module
import pickle
import os
from math import radians, sin, cos, sqrt, atan2
from pathlib import Path

# Utility to calculate aerial distance - for filtering, not fallback
def haversine_distance(lat1, lon1, lat2, lon2):
    """Calculate the great circle distance between two points on Earth."""
    # Convert decimal degrees to radians
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    
    # Haversine formula
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    distance = 6371 * c  # Radius of Earth in kilometers
    
    return distance

def get_from_cache(db_path, origin, destination):
    """Get distance and time from persistent cache"""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.cursor()
        
        # Query the cache
        cursor.execute(
            "SELECT distance_km, time_min FROM distance_cache WHERE "
            "origin_lat = ? AND origin_lng = ? AND dest_lat = ? AND dest_lng = ?",
            (origin['latitude'], origin['longitude'], destination['latitude'], destination['longitude'])
        )
        result = cursor.fetchone()
    finally:
        # Close the connection even if the query raises
        conn.close()
    
    if result:
        return result[0], result[1]
    return None

def add_to_cache(db_path, origin, destination, distance_km, time_min):
    """Add distance and time to persistent cache"""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    
    # Insert into cache
    try:
        cursor.execute(
            "INSERT OR REPLACE INTO distance_cache (origin_lat, origin_lng, dest_lat, dest_lng, distance_km, time_min) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (origin['latitude'], origin['longitude'], destination['latitude'], destination['longitude'], 
             distance_km, time_min)
        )
        conn.commit()
    except Exception as e:
        print(f"Warning: Failed to cache results: {e}")
    finally:
        conn.close()

def make_api_request_with_retry(url, params, max_retries=3):
    """Make API request with retry logic and exponential backoff"""
    for attempt in range(max_retries):
        try:
            # Add exponential backoff between retries
            if attempt > 0:
                time.sleep(2 ** (attempt - 1))  # Waits 1 s, then 2 s, with the default max_retries=3
                
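            response = requests.get(url, params=params, timeout=10)
            # Treat rate limiting and transient server errors as retryable
            if response.status_code in (429, 500, 502, 503, 504) and attempt < max_retries - 1:
                print(f"Attempt {attempt+1} returned HTTP {response.status_code}; retrying")
                continue
            return response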
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt+1} failed: {e}")
            if attempt == max_retries - 1:
                raise
    return None

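def batch_route_request(location_pairs, api_key, batch_size=25):
    """Fallback that sends one standard routing request per pair instead of using the matrix API.

    batch_size is accepted for call compatibility but is not used by this function.
    """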
    if not location_pairs:
        return {}
    
    # Prepare results dictionary
    results = {}
    
    # Process each pair individually
    print(f"Using individual routing requests instead of batch matrix API (slower)")
    
    # Add progress bar
    with tqdm(total=len(location_pairs), desc="Calculating routes", unit="pair") as pbar:
        for origin, destination in location_pairs:
            # Validate coordinates before making API requests
            try:
                orig_lat = float(origin['latitude'])
                orig_lng = float(origin['longitude'])
                dest_lat = float(destination['latitude'])
                dest_lng = float(destination['longitude'])
                
                # Check for valid coordinate ranges
                if not (-90 <= orig_lat <= 90) or not (-180 <= orig_lng <= 180) or \
                   not (-90 <= dest_lat <= 90) or not (-180 <= dest_lng <= 180):
                    print(f"⚠️ Skipping invalid coordinates: Origin({orig_lat}, {orig_lng}) → Destination({dest_lat}, {dest_lng})")
                    pbar.update(1)
                    continue
                    
            except (ValueError, TypeError, KeyError) as e:
                print(f"⚠️ Error validating coordinates: {e}")
                print(f"Origin: {origin}, Destination: {destination}")
                pbar.update(1)
                continue
            
            # Use the standard routing API
            url = "https://router.hereapi.com/v8/routes"
            
            # Prepare parameters with explicit formatting
            params = {
                'apiKey': api_key,
                'transportMode': 'car',
                'origin': f"{orig_lat:.6f},{orig_lng:.6f}",  # Format with precision
                'destination': f"{dest_lat:.6f},{dest_lng:.6f}",  # Format with precision
                'return': 'summary'
            }
            
            try:
                # Add small delay to avoid rate limiting
                time.sleep(0.3)
                
                # Make the request with retry
                response = make_api_request_with_retry(url, params)
                
                if response and response.status_code == 200:
                    route_data = response.json()
                    
                    # Extract distance and time
                    if 'routes' in route_data and len(route_data['routes']) > 0:
                        route_obj = route_data['routes'][0]
                        sections = route_obj.get('sections', [])
                        
                        if sections:
                            # Sum over every section so multi-section routes are not truncated
                            summaries = [s.get('summary', {}) for s in sections]
                            distance_meters = sum(s.get('length', 0) for s in summaries)
                            time_seconds = sum(s.get('duration', 0) for s in summaries)
                            
                            # Convert to km and minutes
                            distance_km = distance_meters / 1000.0
                            time_min = time_seconds / 60.0
                            
                            # Store in results
                            pair_key = (origin['latitude'], origin['longitude'], 
                                        destination['latitude'], destination['longitude'])
                            results[pair_key] = (distance_km, time_min)
                elif response:
                    print(f"⚠️ Warning: API request failed with status code {response.status_code}")
                    print(f"Request parameters: {params}")
                    
                # Update progress bar
                pbar.update(1)
                    
            except Exception as e:
                print(f"❌ Error in route request: {e}")
                pbar.update(1)
    
    print(f"Processed {len(results)} out of {len(location_pairs)} route segments")
    return results

def get_route_segment_metrics(origin, destination, api_key, cache_db_path):
    """Calculate the distance and time between two points using HERE API with caching."""
    
    # Skip if coordinates are the same
    if (origin['latitude'] == destination['latitude'] and 
        origin['longitude'] == destination['longitude']):
        return (0, 0)
    
    # Validate coordinates first
    try:
        orig_lat = float(origin['latitude'])
        orig_lng = float(origin['longitude'])
        dest_lat = float(destination['latitude'])
        dest_lng = float(destination['longitude'])
        
        # Check for valid coordinate ranges
        if not (-90 <= orig_lat <= 90) or not (-180 <= orig_lng <= 180) or \
           not (-90 <= dest_lat <= 90) or not (-180 <= dest_lng <= 180):
            print(f"⚠️ Invalid coordinate values: Origin({orig_lat}, {orig_lng}) → Destination({dest_lat}, {dest_lng})")
            return None
            
    except (ValueError, TypeError, KeyError) as e:
        print(f"⚠️ Error validating coordinates: {e}")
        print(f"Origin: {origin}, Destination: {destination}")
        return None
    
    # Check cache first
    cached_result = get_from_cache(cache_db_path, origin, destination)
    if cached_result:
        return cached_result
    
    # Add rate limiting protection - sleep between requests
    time.sleep(0.3)  # Add slight delay to avoid rate limiting
    
    # Make a direct API call for this segment using validated coordinates
    url = "https://router.hereapi.com/v8/routes"
    
    # Prepare parameters with explicit formatting of coordinates
    params = {
        'apiKey': api_key,
        'transportMode': 'car',
        'origin': f"{orig_lat:.6f},{orig_lng:.6f}",  # Format with precision
        'destination': f"{dest_lat:.6f},{dest_lng:.6f}",  # Format with precision
        'return': 'summary'
    }
    
    try:
        # Simple GET request for this segment with retry
        response = make_api_request_with_retry(url, params)
        
        if not response or response.status_code != 200:
            if response:
                print(f"❌ HERE API error: Status code {response.status_code}")
                if response.status_code == 400:
                    print(f"Request parameters: {params}")
            return None
        
        # Parse response
        route_data = response.json()
        
        # Extract distance and time
        if 'routes' in route_data and len(route_data['routes']) > 0:
            route_obj = route_data['routes'][0]
            sections = route_obj.get('sections', [])
            
            if sections:
                # Sum over every section so multi-section routes are not truncated
                summaries = [s.get('summary', {}) for s in sections]
                distance_meters = sum(s.get('length', 0) for s in summaries)
                time_seconds = sum(s.get('duration', 0) for s in summaries)
                
                # Convert to km and minutes
                distance_km = distance_meters / 1000.0
                time_min = time_seconds / 60.0
                
                # Cache the result
                add_to_cache(cache_db_path, origin, destination, distance_km, time_min)
                
                return distance_km, time_min
        
        print("⚠️ Warning: Unable to extract route metrics from API response")
        return None
            
    except Exception as e:
        print(f"❌ Error calculating route segment: {e}")
        return None

# ALNS implementation for Vehicle Routing Problem
class ALNS:
    """Adaptive Large Neighborhood Search for Vehicle Routing Problem"""
    
    def __init__(self, depot, locations, max_weight, max_time, service_time_per_unit, api_key, cache_db_path):
        """Initialize ALNS algorithm with problem data"""
        self.depot = depot
        self.locations = locations
        self.max_weight = max_weight
        self.max_time = max_time
        self.service_time_per_unit = service_time_per_unit
        self.api_key = api_key
        self.cache_db_path = cache_db_path
        
        # Pre-calculate weights and service times
        self.location_weights = {}
        self.service_times = {}
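        for idx, location in locations.iterrows():
            # Key by the same ID the routes use (customer number if present, else row index),
            # so later lookups by loc_id do not silently fall back to the default
            if 'ABS Custumer no' in location:
                loc_id = location['ABS Custumer no']
            elif 'customer_id' in location:
                loc_id = location['customer_id']
            else:
                loc_id = idx
            
            # Extract weight with fallback options
            weight = 1.0
            if 'Net Weight' in location and pd.notna(location['Net Weight']):
                weight = float(location['Net Weight'])
            elif 'weight' in location and pd.notna(location['weight']):
                weight = float(location['weight'])
            
            self.location_weights[loc_id] = weight
            self.service_times[loc_id] = weight * service_time_per_unit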
        
        # Pre-calculate depot distance for each location
        self.depot_distances = {}
        self.depot_times = {}
        
        # Build aerial distance matrix for quick filtering
        self.aerial_distances = {}
        coords = [(idx, loc['latitude'], loc['longitude']) for idx, loc in locations.iterrows()]
        
        # Calculate aerial distances between all pairs
        for i, (idx1, lat1, lng1) in enumerate(coords):
            # Calculate depot distance
            depot_dist = haversine_distance(depot['latitude'], depot['longitude'], lat1, lng1)
            self.aerial_distances[(depot['latitude'], depot['longitude'], lat1, lng1)] = depot_dist
            
            for j, (idx2, lat2, lng2) in enumerate(coords):
                if i != j:
                    dist = haversine_distance(lat1, lng1, lat2, lng2)
                    self.aerial_distances[(lat1, lng1, lat2, lng2)] = dist
    
    def destroy_random(self, solution, destroy_count):
        """Remove random locations from routes"""
        removed_locations = []
        all_locations = []
        
        # Collect all locations from all routes
        for route in solution:
            # Skip depot markers
            for loc_id in route['location_ids'][1:-1]:
                if loc_id != 'DEPOT_START' and loc_id != 'DEPOT_END':
                    all_locations.append(loc_id)
        
        # Randomly select locations to remove
        if all_locations:
            destroy_count = min(destroy_count, len(all_locations))
            to_remove = random.sample(all_locations, destroy_count)
            
            # Remove selected locations from routes
            new_solution = []
            for route in solution:
                new_route = route.copy()
                new_locs = []
                for loc_id in route['location_ids']:
                    if loc_id in to_remove:
                        if loc_id != 'DEPOT_START' and loc_id != 'DEPOT_END':
                            removed_locations.append(loc_id)
                    else:
                        new_locs.append(loc_id)
                
                # Update route
                new_route['location_ids'] = new_locs
                
                # Only keep routes with customers (apart from depot start/end)
                if len(new_locs) > 2:  # DEPOT_START, customers, DEPOT_END
                    new_solution.append(new_route)
                    
            return new_solution, removed_locations
        
        return solution, []
    
    def destroy_related(self, solution, destroy_count):
        """Remove related (geographically close) locations from routes"""
        if not solution:
            return solution, []
        
        removed_locations = []
        
        # Randomly select a seed location
        all_locations = []
        for route in solution:
            for loc_id in route['location_ids'][1:-1]:
                if loc_id != 'DEPOT_START' and loc_id != 'DEPOT_END':
                    all_locations.append(loc_id)
        
        if not all_locations:
            return solution, []
        
        # Select a random seed
        seed_id = random.choice(all_locations)
        seed_location = None
        
        # Find the seed location data
        for idx, location in self.locations.iterrows():
            if ('ABS Custumer no' in location and location['ABS Custumer no'] == seed_id) or \
               ('customer_id' in location and location['customer_id'] == seed_id) or \
               idx == seed_id:
                seed_location = location
                break
        
        if seed_location is None:
            return self.destroy_random(solution, destroy_count)
        
        # Calculate distances from seed to all other locations
        distances_to_seed = []
        for idx, location in self.locations.iterrows():
            loc_id = None
            if 'ABS Custumer no' in location:
                loc_id = location['ABS Custumer no']
            elif 'customer_id' in location:
                loc_id = location['customer_id']
            else:
                loc_id = idx
                
            if loc_id in all_locations and loc_id != seed_id:
                # Use precomputed aerial distance
                aerial_key = (seed_location['latitude'], seed_location['longitude'], 
                             location['latitude'], location['longitude'])
                
                # Handle reverse key if needed
                if aerial_key not in self.aerial_distances:
                    aerial_key = (location['latitude'], location['longitude'],
                                 seed_location['latitude'], seed_location['longitude'])
                
                distance = self.aerial_distances.get(aerial_key, float('inf'))
                distances_to_seed.append((loc_id, distance))
        
        # Sort by distance
        distances_to_seed.sort(key=lambda x: x[1])
        
        # Select closest locations
        destroy_count = min(max(destroy_count - 1, 0), len(distances_to_seed))
        to_remove = [seed_id] + [loc_id for loc_id, _ in distances_to_seed[:destroy_count]]
        
        # Remove selected locations from routes
        new_solution = []
        for route in solution:
            new_route = route.copy()
            new_locs = []
            for loc_id in route['location_ids']:
                if loc_id in to_remove:
                    if loc_id != 'DEPOT_START' and loc_id != 'DEPOT_END':
                        removed_locations.append(loc_id)
                else:
                    new_locs.append(loc_id)
            
            # Update route
            new_route['location_ids'] = new_locs
            
            # Only keep routes with customers (apart from depot start/end)
            if len(new_locs) > 2:  # DEPOT_START, customers, DEPOT_END
                new_solution.append(new_route)
        
        return new_solution, removed_locations
    
    def destroy_worst(self, solution, destroy_count):
        """Remove locations that contribute most to total route cost"""
        if not solution:
            return solution, []
        
        removed_locations = []
        location_costs = []
        
        # Calculate cost contribution for each location
        for route_idx, route in enumerate(solution):
            # Skip depot markers
            customer_locs = [loc for loc in route['location_ids'][1:-1] 
                            if loc != 'DEPOT_START' and loc != 'DEPOT_END']
            
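            # Resolve coordinates for each customer on this route
            coords = {}
            for loc_id in customer_locs:
                for idx, location in self.locations.iterrows():
                    if ('ABS Custumer no' in location and location['ABS Custumer no'] == loc_id) or \
                       ('customer_id' in location and location['customer_id'] == loc_id) or \
                       idx == loc_id:
                        coords[loc_id] = (location['latitude'], location['longitude'])
                        break
            
            depot_xy = (self.depot['latitude'], self.depot['longitude'])
            known_locs = [loc for loc in customer_locs if loc in coords]
            
            for i, loc_id in enumerate(known_locs):
                # Cost is aerial distance from previous stop to this one plus this one to the next
                # (the depot stands in for the missing neighbour at either end of the route)
                prev_xy = coords[known_locs[i - 1]] if i > 0 else depot_xy
                next_xy = coords[known_locs[i + 1]] if i < len(known_locs) - 1 else depot_xy
                cost = (haversine_distance(*prev_xy, *coords[loc_id]) +
                        haversine_distance(*coords[loc_id], *next_xy))
                
                # Add to cost list
                location_costs.append((loc_id, cost))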
        
        # Sort by cost (descending)
        location_costs.sort(key=lambda x: x[1], reverse=True)
        
        # Select highest cost locations
        destroy_count = min(destroy_count, len(location_costs))
        to_remove = [loc_id for loc_id, _ in location_costs[:destroy_count]]
        
        # Remove selected locations from routes
        new_solution = []
        for route in solution:
            new_route = route.copy()
            new_locs = []
            for loc_id in route['location_ids']:
                if loc_id in to_remove:
                    if loc_id != 'DEPOT_START' and loc_id != 'DEPOT_END':
                        removed_locations.append(loc_id)
                else:
                    new_locs.append(loc_id)
            
            # Update route
            new_route['location_ids'] = new_locs
            
            # Only keep routes with customers (apart from depot start/end)
            if len(new_locs) > 2:  # DEPOT_START, customers, DEPOT_END
                new_solution.append(new_route)
        
        return new_solution, removed_locations
    
    def repair_greedy(self, solution, removed_locations):
        """Insert removed locations back into routes greedily"""
        if not removed_locations:
            return solution
        
        # Create copy of solution for modification
        new_solution = [route.copy() for route in solution]
        
        # Create a new route if no routes exist
        if not new_solution:
            new_route = {
                'route_id': '1',
                'vehicle_number': 1,
                'depot_name': self.depot['depot_name'],
                'location_ids': ['DEPOT_START', 'DEPOT_END'],
                'weight': 0,
                'distance_km': 0,
                'time_min': 45,  # Depot service time
                'stops': 0
            }
            new_solution = [new_route]
        
        # For each removed location, find best insertion position
        for loc_id in removed_locations:
            best_cost_increase = float('inf')
            best_route_idx = -1
            best_position = -1
            
            # Get location data
            loc_data = None
            for idx, location in self.locations.iterrows():
                if ('ABS Custumer no' in location and location['ABS Custumer no'] == loc_id) or \
                   ('customer_id' in location and location['customer_id'] == loc_id) or \
                   idx == loc_id:
                    loc_data = location
                    break
            
            if loc_data is None:
                continue
            
            # Get weight of location
            loc_weight = self.location_weights.get(loc_id, 1.0)
            
            # Try inserting in each route at each position
            for route_idx, route in enumerate(new_solution):
                # Check if adding this location would exceed capacity
                if route['weight'] + loc_weight > self.max_weight:
                    continue
                
                # Get customer locations in route
                route_locs = route['location_ids']
                
                # Try inserting at each position
                for pos in range(1, len(route_locs)):  # Skip inserting at DEPOT_START (0)
                    # Calculate cost increase of inserting here
                    cost_increase = 0  # Placeholder: all positions tie, so the first feasible one is chosen
                    
                    # If this is better than current best, update
                    if cost_increase < best_cost_increase:
                        best_cost_increase = cost_increase
                        best_route_idx = route_idx
                        best_position = pos
            
            # If no feasible insertion found, create a new route
            if best_route_idx == -1:
                new_route = {
                    'route_id': str(len(new_solution) + 1),
                    'vehicle_number': len(new_solution) + 1,
                    'depot_name': self.depot['depot_name'],
                    'location_ids': ['DEPOT_START', loc_id, 'DEPOT_END'],
                    'weight': loc_weight,
                    'distance_km': 0,  # Placeholder, recalculated later
                    'time_min': 45,    # Initial depot service time
                    'stops': 1
                }
                new_solution.append(new_route)
            else:
                # Insert at best position
                route = new_solution[best_route_idx]
                route['location_ids'].insert(best_position, loc_id)
                route['weight'] += loc_weight
                route['stops'] += 1
        
        # Recalculate route metrics (distance, time) for all modified routes
        for route in new_solution:
            self.recalculate_route_metrics(route)
        
        return new_solution
    
    def repair_regret(self, solution, removed_locations):
        """Insert removed locations using regret heuristic"""
        if not removed_locations:
            return solution
        
        # Create copy of solution for modification
        new_solution = [route.copy() for route in solution]
        
        # Create a new route if no routes exist
        if not new_solution:
            new_route = {
                'route_id': '1',
                'vehicle_number': 1,
                'depot_name': self.depot['depot_name'],
                'location_ids': ['DEPOT_START', 'DEPOT_END'],
                'weight': 0,
                'distance_km': 0,
                'time_min': 45,  # Depot service time
                'stops': 0
            }
            new_solution = [new_route]
        
        # While there are locations to insert
        remaining_locations = removed_locations.copy()
        
        while remaining_locations:
            best_regret = -float('inf')
            best_location = None
            best_route_idx = -1
            best_position = -1
            
            # For each remaining location
            for loc_id in remaining_locations:
                # Get location data
                loc_data = None
                for idx, location in self.locations.iterrows():
                    if ('ABS Custumer no' in location and location['ABS Custumer no'] == loc_id) or \
                       ('customer_id' in location and location['customer_id'] == loc_id) or \
                       idx == loc_id:
                        loc_data = location
                        break
                
                if loc_data is None:
                    continue
                
                # Get weight of location
                loc_weight = self.location_weights.get(loc_id, 1.0)
                
                # Calculate insertion costs for all positions in all routes
                insertion_costs = []
                route_positions = []
                
                for route_idx, route in enumerate(new_solution):
                    # Check if adding this location would exceed capacity
                    if route['weight'] + loc_weight > self.max_weight:
                        continue
                    
                    # Get customer locations in route
                    route_locs = route['location_ids']
                    
                    # Try inserting at each position
                    for pos in range(1, len(route_locs)):  # Skip inserting at DEPOT_START (0)
                        # Calculate cost increase of inserting here
                        cost_increase = 0  # This is a simplification - in real implementation, calculate actual cost
                        
                        insertion_costs.append(cost_increase)
                        route_positions.append((route_idx, pos))
                
                # Sort costs and calculate regret
                if insertion_costs:
                    insertion_costs, route_positions = zip(*sorted(zip(insertion_costs, route_positions)))
                    
                    # Calculate regret value (difference between best and second best)
                    if len(insertion_costs) >= 2:
                        regret = insertion_costs[1] - insertion_costs[0]
                    else:
                        # Only one feasible position left: give this location top priority
                        regret = float('inf')
                    
                    # If this location has higher regret, it becomes the candidate
                    if regret > best_regret:
                        best_regret = regret
                        best_location = loc_id
                        best_route_idx, best_position = route_positions[0]
            
            # If a location was selected
            if best_location is not None:
                # Insert at best position
                if best_route_idx != -1:
                    route = new_solution[best_route_idx]
                    route['location_ids'].insert(best_position, best_location)
                    route['weight'] += self.location_weights.get(best_location, 1.0)
                    route['stops'] += 1
                else:
                    # Create new route if no feasible insertion
                    new_route = {
                        'route_id': str(len(new_solution) + 1),
                        'vehicle_number': len(new_solution) + 1,
                        'depot_name': self.depot['depot_name'],
                        'location_ids': ['DEPOT_START', best_location, 'DEPOT_END'],
                        'weight': self.location_weights.get(best_location, 1.0),
                        'distance_km': 0,  # Placeholder, recalculated later
                        'time_min': 45,    # Initial depot service time
                        'stops': 1
                    }
                    new_solution.append(new_route)
                
                # Remove this location from the list
                remaining_locations.remove(best_location)
            else:
                # If no feasible insertion was found for any location, create a new route
                if remaining_locations:
                    loc_id = remaining_locations[0]
                    new_route = {
                        'route_id': str(len(new_solution) + 1),
                        'vehicle_number': len(new_solution) + 1,
                        'depot_name': self.depot['depot_name'],
                        'location_ids': ['DEPOT_START', loc_id, 'DEPOT_END'],
                        'weight': self.location_weights.get(loc_id, 1.0),
                        'distance_km': 0,  # Placeholder, recalculated later
                        'time_min': 45,    # Initial depot service time
                        'stops': 1
                    }
                    new_solution.append(new_route)
                    remaining_locations.remove(loc_id)
                else:
                    break
        
        # Recalculate route metrics for all modified routes
        for route in new_solution:
            self.recalculate_route_metrics(route)
        
        return new_solution
    
    def recalculate_route_metrics(self, route):
        """Recalculate distance, time, and other metrics for a route"""
        # Reset metrics; the segments below are priced via the HERE API (or its cache)
        route['distance_km'] = 0
        route['time_min'] = 45  # Initial depot service time
        
        # Return if route has no customers
        if len(route['location_ids']) <= 2:
            return
        
        # Get location IDs excluding depot
        location_ids = route['location_ids'][1:-1]  # Skip DEPOT_START and DEPOT_END
        
        # Calculate route metrics
        prev_loc = self.depot
        total_distance = 0
        total_time = 45  # Start with depot service time
        
        for loc_id in location_ids:
            # Find location data
            loc_data = None
            for idx, location in self.locations.iterrows():
                if ('ABS Custumer no' in location and location['ABS Custumer no'] == loc_id) or \
                   ('customer_id' in location and location['customer_id'] == loc_id) or \
                   idx == loc_id:
                    loc_data = location
                    break
            
            if loc_data is None:
                continue
            
            # Get routing metrics from HERE API (or cache)
            result = get_route_segment_metrics(prev_loc, loc_data, self.api_key, self.cache_db_path)
            
            if result is not None:
                distance_km, time_min = result
                total_distance += distance_km
                total_time += time_min
                
                # Add service time
                service_time = self.service_times.get(loc_id, 0.5)
                total_time += service_time
            
            # Update previous location
            prev_loc = loc_data
        
        # Add return to depot
        result = get_route_segment_metrics(prev_loc, self.depot, self.api_key, self.cache_db_path)
        
        if result is not None:
            distance_km, time_min = result
            total_distance += distance_km
            total_time += time_min + 45  # Add depot service time
        
        # Update route metrics
        route['distance_km'] = total_distance
        route['time_min'] = total_time
    
    def run(self, iterations=50):
        """Run ALNS algorithm to optimize routes"""
        # Create initial solution with greedy insertion
        print("Creating initial solution with greedy insertion...")
        current_solution = self.create_initial_solution()
        
        best_solution = current_solution.copy()
        best_cost = self.evaluate_solution(best_solution)
        
        # Simulated annealing parameters
        temperature = 100
        cooling_rate = 0.95
        
        # Destroy and repair operators with weights
        destroy_ops = [
            (self.destroy_random, 1.0),
            (self.destroy_related, 1.0),
            (self.destroy_worst, 1.0)
        ]
        
        repair_ops = [
            (self.repair_greedy, 1.0),
            (self.repair_regret, 1.0)
        ]
        
        # Weights for adaptive operator selection
        destroy_weights = [1.0] * len(destroy_ops)
        repair_weights = [1.0] * len(repair_ops)
        
        # Performance tracking
        destroy_scores = [0] * len(destroy_ops)
        repair_scores = [0] * len(repair_ops)
        
        print(f"Starting ALNS with {iterations} iterations...")
        
        # Add progress bar for iterations
        with tqdm(total=iterations, desc="ALNS Optimization", unit="iter") as pbar:
            for i in range(iterations):
                # Select destroy and repair operators
                destroy_idx = self.select_operator(destroy_weights)
                repair_idx = self.select_operator(repair_weights)
                
                destroy_op = destroy_ops[destroy_idx][0]
                repair_op = repair_ops[repair_idx][0]
                
                # Calculate destroy percentage (adjust over time)
                destroy_percentage = 0.4 - (0.3 * i / iterations)
                destroy_count = max(1, int(len(self.locations) * destroy_percentage))
                
                # Apply destroy operator
                destroyed_solution, removed_locations = destroy_op(current_solution, destroy_count)
                
                # Apply repair operator
                new_solution = repair_op(destroyed_solution, removed_locations)
                
                # Evaluate new solution
                new_cost = self.evaluate_solution(new_solution)
                current_cost = self.evaluate_solution(current_solution)
                
                # Update best solution if better
                if new_cost < best_cost:
                    best_solution = new_solution.copy()
                    best_cost = new_cost
                    current_solution = new_solution  # Continue the search from the new best
                    score = 10  # Big improvement score
                    pbar.set_description(f"New best! Cost: {best_cost:.2f}")
                else:
                    # Accept with probability based on temperature
                    delta = new_cost - current_cost
                    acceptance_prob = math.exp(-delta / temperature) if delta > 0 else 1.0
                    
                    if random.random() < acceptance_prob:
                        current_solution = new_solution
                        score = 5 if new_cost < current_cost else 3  # Small improvement or accepted worse
                    else:
                        score = 1  # Not accepted
                
                # Update operator scores
                destroy_scores[destroy_idx] += score
                repair_scores[repair_idx] += score
                
                # Cool down temperature
                temperature *= cooling_rate
                
                # Every 10 iterations, update operator weights
                if (i + 1) % 10 == 0:
                    # Update weights
                    destroy_weights = self.update_weights(destroy_weights, destroy_scores)
                    repair_weights = self.update_weights(repair_weights, repair_scores)
                    
                    # Reset scores
                    destroy_scores = [0] * len(destroy_ops)
                    repair_scores = [0] * len(repair_ops)
                    
                # Update progress bar with cost information
                pbar.set_postfix(best_cost=f"{best_cost:.2f}", current_cost=f"{current_cost:.2f}")
                pbar.update(1)
        
        # Recalculate metrics for best solution
        for route in best_solution:
            self.recalculate_route_metrics(route)
        
        # Ensure route IDs and vehicle numbers are sequential
        for i, route in enumerate(best_solution):
            route['route_id'] = str(i + 1)
            route['vehicle_number'] = i + 1
        
        print(f"ALNS completed. Best solution has {len(best_solution)} routes with total cost {best_cost:.2f}")
        
        return best_solution
    
    def create_initial_solution(self):
        """Create initial solution with greedy insertion"""
        # Sort locations by distance from depot
        location_distances = []
        for idx, location in self.locations.iterrows():
            # Calculate aerial distance to depot
            dist = haversine_distance(
                self.depot['latitude'], self.depot['longitude'],
                location['latitude'], location['longitude']
            )
            loc_id = None
            if 'ABS Custumer no' in location:
                loc_id = location['ABS Custumer no']
            elif 'customer_id' in location:
                loc_id = location['customer_id']
            else:
                loc_id = idx
                
            location_distances.append((loc_id, dist))
        
        # Sort by distance (ascending)
        location_distances.sort(key=lambda x: x[1])
        
        # Initialize solution
        solution = []
        
        # Create routes by inserting locations in order
        current_route = {
            'route_id': '1',
            'vehicle_number': 1,
            'depot_name': self.depot['depot_name'],
            'location_ids': ['DEPOT_START', 'DEPOT_END'],
            'weight': 0,
            'distance_km': 0,
            'time_min': 90,  # Round trip depot service time
            'stops': 0
        }
        
        for loc_id, _ in location_distances:
            # Get location weight
            loc_weight = self.location_weights.get(loc_id, 1.0)
            
            # Check if adding to current route would exceed capacity
            if current_route['weight'] + loc_weight <= self.max_weight:
                # Insert before DEPOT_END
                current_route['location_ids'].insert(-1, loc_id)
                current_route['weight'] += loc_weight
                current_route['stops'] += 1
            else:
                # Recalculate route metrics
                self.recalculate_route_metrics(current_route)
                
                # Add route to solution if it has any stops
                if current_route['stops'] > 0:
                    solution.append(current_route)
                
                # Create new route
                current_route = {
                    'route_id': str(len(solution) + 1),
                    'vehicle_number': len(solution) + 1,
                    'depot_name': self.depot['depot_name'],
                    'location_ids': ['DEPOT_START', loc_id, 'DEPOT_END'],
                    'weight': loc_weight,
                    'distance_km': 0,
                    'time_min': 90,
                    'stops': 1
                }
        
        # Add last route if it has any stops
        if current_route['stops'] > 0:
            self.recalculate_route_metrics(current_route)
            solution.append(current_route)
        
        return solution
    
    def evaluate_solution(self, solution):
        """Evaluate solution quality (lower is better)"""
        if not solution:
            return float('inf')
        
        # Cost components
        total_distance = sum(route['distance_km'] for route in solution)
        total_time = sum(route['time_min'] for route in solution)
        vehicle_count = len(solution)
        
        # Calculate cost (weighted sum)
        cost = 0.4 * total_distance + 0.3 * (total_time / 60) + 0.3 * vehicle_count * 100
        
        return cost
    
    def select_operator(self, weights):
        """Select operator based on weights"""
        total = sum(weights)
        r = random.random() * total
        
        cumulative = 0
        for i, weight in enumerate(weights):
            cumulative += weight
            if r <= cumulative:
                return i
        
        return len(weights) - 1
    
    def update_weights(self, weights, scores):
        """Update operator weights based on scores"""
        # Reaction factor (how quickly weights adapt)
        reaction = 0.8
        
        # Apply score adjustments
        total_score = sum(scores)
        new_weights = []
        for weight, score in zip(weights, scores):
            # Avoid division by zero
            if total_score > 0:
                new_weight = weight * (1 - reaction) + reaction * (score / total_score)
            else:
                new_weight = weight
            
            # Ensure minimum weight
            new_weight = max(0.1, new_weight)
            new_weights.append(new_weight)
        
        return new_weights
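

# Illustrative sketch (not part of the original pipeline): the acceptance rule and
# geometric cooling used in ALNS.run, isolated so they can be inspected on their own.
# `sketch_acceptance_probability` and `sketch_temperature_at` are hypothetical names;
# the defaults mirror the constants in run() (start 100, cooling rate 0.95).
import math


def sketch_acceptance_probability(delta, temperature):
    """Metropolis criterion: always accept improvements, sometimes accept worse."""
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temperature)


def sketch_temperature_at(iteration, start=100.0, cooling_rate=0.95):
    """Temperature after `iteration` cooling steps of a geometric schedule."""
    return start * cooling_rate ** iteration


# With these defaults, a candidate that is 10 cost units worse is accepted roughly
# 90% of the time at the first iteration and less often as the temperature falls.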

def process_cluster_matrix(cluster_id, pairs, locations_df, api_key, cache_db_path):
    """Process distance matrix for a cluster using individual API calls"""
    # Skip if no pairs
    if not pairs:
        return cluster_id, None
        
    print(f"Processing {len(pairs)} location pairs for cluster {cluster_id}...")
    
    # Get batch results using standard routing API with progress bar
    batch_results = {}
    with tqdm(total=len(pairs), desc=f"Cluster {cluster_id}", unit="pair") as pbar:
        batch_size = 50
        for i in range(0, len(pairs), batch_size):
            batch = pairs[i:i+batch_size]
            batch_result = batch_route_request(batch, api_key)
            batch_results.update(batch_result)
            pbar.update(len(batch))
    
    # Convert to distance/time matrices
    cluster_locs = locations_df[locations_df['geo_cluster'] == cluster_id]
    indices = cluster_locs.index.tolist()
    n_locs = len(indices)
    
    # Initialize matrices
    dist_matrix = np.zeros((n_locs, n_locs))
    time_matrix = np.zeros((n_locs, n_locs))
    
    # Fill matrices with results
    for i, idx1 in enumerate(indices):
        loc1 = cluster_locs.loc[idx1]
        
        for j, idx2 in enumerate(indices):
            if i == j:
                # Zero distance for same location
                dist_matrix[i, j] = 0
                time_matrix[i, j] = 0
                continue
                
            loc2 = cluster_locs.loc[idx2]
            
            # Try to get from batch results
            pair_key = (loc1['latitude'], loc1['longitude'], loc2['latitude'], loc2['longitude'])
            
            # Try reverse key if not found
            if pair_key not in batch_results:
                pair_key = (loc2['latitude'], loc2['longitude'], loc1['latitude'], loc1['longitude'])
            
            if pair_key in batch_results:
                distance, time = batch_results[pair_key]
                dist_matrix[i, j] = distance
                time_matrix[i, j] = time
            else:
                # If not in batch results, get individual result
                result = get_route_segment_metrics(
                    {'latitude': loc1['latitude'], 'longitude': loc1['longitude']},
                    {'latitude': loc2['latitude'], 'longitude': loc2['longitude']},
                    api_key, cache_db_path
                )
                
                if result:
                    distance, time = result
                    dist_matrix[i, j] = distance
                    time_matrix[i, j] = time
                else:
                    # If no result, use placeholder values
                    dist_matrix[i, j] = float('inf')  # Make this path unattractive for routing
                    time_matrix[i, j] = float('inf')
    
    # Return matrices with indices
    return cluster_id, {
        'indices': indices,
        'dist_matrix': dist_matrix,
        'time_matrix': time_matrix
    }
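
`process_cluster_matrix` looks up each ordered pair, falls back to the reversed key, and marks unresolved pairs as infinite. The sketch below shows the same lookup in a compact form using plain lists, under the assumption that distances are symmetric. Unlike the function above, it makes no individual API call for missing pairs. The name `build_symmetric_matrix` is hypothetical.

```python
def build_symmetric_matrix(coords, pair_results, missing=float("inf")):
    """Fill an n x n matrix from results keyed by (lat1, lng1, lat2, lng2).

    Assumes symmetric distances, so each unordered pair is resolved once
    and mirrored across the diagonal.
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            key = coords[i] + coords[j]      # forward key
            reverse = coords[j] + coords[i]  # same pair, opposite direction
            value = pair_results.get(key, pair_results.get(reverse, missing))
            matrix[i][j] = matrix[j][i] = value
    return matrix


if __name__ == "__main__":
    coords = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
    results = {(3.0, 4.0, 1.0, 2.0): 7.5}  # only the reversed key is present
    for row in build_symmetric_matrix(coords, results):
        print(row)
```

Iterating over `j > i` only halves the number of lookups compared with scanning every ordered pair.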

def optimize_routes_global(locations_df, depot_info, max_weight, max_time, service_time_per_unit, depot_service_time=45):
    """
    Global optimization algorithm for vehicle routing problem.
    Creates efficiently packed routes that minimize total distance and time.
    Includes improved hierarchical distance calculation and ALNS optimization.
    """
    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans, DBSCAN  # DBSCAN is used in Step 1.1
    import time as timing_module
    import concurrent.futures
    
    # Record start time for performance tracking
    start_time = timing_module.time()
    
    # Make sure we have a valid HERE API key
    api_key = load_api_key()
    if not api_key:
        print("❌ ERROR: No HERE API key found. Route optimization requires HERE API.")
        print("Please check the api_keys.json file and try again.")
        raise ValueError("Missing HERE API key required for route optimization")
    
    # Setup the distance cache
    cache_db_path = setup_distance_cache()
    
    print(f"Starting global optimization with {len(locations_df)} locations")
    print(f"Constraints: max_weight={max_weight}, max_time={max_time} min, depot_service_time={depot_service_time} min")
    
    # Calculate weight for each location
    locations_df = locations_df.copy()
    
    # Compute weights with fallback options
    def compute_weight(row):
        if 'Net Weight' in row and pd.notna(row['Net Weight']):
            return float(row['Net Weight'])
        elif 'weight' in row and pd.notna(row['weight']):
            return float(row['weight'])
        else:
            return 1.0
    
    locations_df['_weight'] = locations_df.apply(compute_weight, axis=1)
    locations_df['_service_time'] = locations_df['_weight'] * service_time_per_unit
    
    # STEP 1: HIERARCHICAL CLUSTERING APPROACH
    # ------------------------------------------------------------
    print("\nSTEP 1: Hierarchical clustering of locations...")
    
    # Extract coordinates for clustering
    coordinates = locations_df[['latitude', 'longitude']].values
    
    # Calculate total weight
    total_weight = locations_df['_weight'].sum()
    
    # Determine minimum number of vehicles needed based on weight
    min_vehicles_needed = max(1, int(np.ceil(total_weight / max_weight)))
    print(f"Minimum vehicles needed based on total weight ({total_weight:.1f}): {min_vehicles_needed}")
    
    # Determine clustering approach based on dataset size
    if len(locations_df) <= 10:
        # For small datasets, use simple clustering (or none if all fit in one vehicle)
        if total_weight <= max_weight * 0.9:
            num_clusters = 1
            print(f"All locations can fit in a single vehicle (total weight: {total_weight:.1f} / {max_weight})")
        else:
            num_clusters = min_vehicles_needed
            print(f"Using {num_clusters} clusters for small dataset")
        
        # Use K-means for small datasets
        if num_clusters > 1:
            kmeans = KMeans(n_clusters=num_clusters, random_state=42, n_init=10)
            locations_df['geo_cluster'] = kmeans.fit_predict(coordinates)
        else:
            locations_df['geo_cluster'] = 0
    else:
        # For larger datasets, use a hierarchical approach
        print("Using hierarchical clustering approach for larger dataset")
        
        # Step 1.1: First use DBSCAN to identify dense regions and outliers
        eps = 10.0 / 6371.0  # 10 km in radians (haversine distances are on the unit sphere)
        min_samples = 3
        dbscan = DBSCAN(eps=eps, min_samples=min_samples, metric='haversine')
        locations_df['density_cluster'] = dbscan.fit_predict(np.radians(coordinates))
        
        # Count outliers (-1 cluster)
        n_outliers = sum(locations_df['density_cluster'] == -1)
        # Use .values: `in` on a Series tests the index, not the cluster labels
        n_clusters = len(set(locations_df['density_cluster'])) - (1 if -1 in locations_df['density_cluster'].values else 0)
        print(f"DBSCAN identified {n_clusters} dense regions and {n_outliers} outliers")
        
        # Step 1.2: For each dense region, apply K-means to create sub-clusters
        final_clusters = []
        next_cluster_id = 0
        
        # Process dense regions
        for cluster_id in set(locations_df['density_cluster']):
            if cluster_id == -1:
                continue  # Handle outliers separately
                
            # Get locations in this dense region
            cluster_df = locations_df[locations_df['density_cluster'] == cluster_id]
            cluster_coords = cluster_df[['latitude', 'longitude']].values
            cluster_weight = cluster_df['_weight'].sum()
            
            # Determine number of sub-clusters needed
            n_subclusters = max(1, int(np.ceil(cluster_weight / max_weight)))
            
            if len(cluster_df) <= n_subclusters:
                # If few locations, assign each to its own cluster
                for i, idx in enumerate(cluster_df.index):
                    locations_df.loc[idx, 'geo_cluster'] = next_cluster_id + i
                next_cluster_id += len(cluster_df)
            else:
                # Apply K-means to create sub-clusters
                sub_kmeans = KMeans(n_clusters=n_subclusters, random_state=42, n_init=10)
                sub_clusters = sub_kmeans.fit_predict(cluster_coords)
                
                # Assign sub-cluster IDs
                for i, idx in enumerate(cluster_df.index):
                    locations_df.loc[idx, 'geo_cluster'] = next_cluster_id + sub_clusters[i]
                
                next_cluster_id += n_subclusters
        
        # Step 1.3: Handle outliers
        outlier_df = locations_df[locations_df['density_cluster'] == -1]
        
        if len(outlier_df) > 0:
            # For outliers, use K-means with higher number of clusters
            outlier_coords = outlier_df[['latitude', 'longitude']].values
            outlier_weight = outlier_df['_weight'].sum()
            
            # Determine number of clusters for outliers
            n_outlier_clusters = max(1, int(np.ceil(outlier_weight / max_weight)))
            
            if len(outlier_df) <= n_outlier_clusters:
                # If few outliers, assign each to its own cluster
                for i, idx in enumerate(outlier_df.index):
                    locations_df.loc[idx, 'geo_cluster'] = next_cluster_id + i
                next_cluster_id += len(outlier_df)
            else:
                # Apply K-means for outliers
                outlier_kmeans = KMeans(n_clusters=n_outlier_clusters, random_state=42, n_init=10)
                outlier_clusters = outlier_kmeans.fit_predict(outlier_coords)
                
                # Assign outlier cluster IDs
                for i, idx in enumerate(outlier_df.index):
                    locations_df.loc[idx, 'geo_cluster'] = next_cluster_id + outlier_clusters[i]
                
                next_cluster_id += n_outlier_clusters
    
    # Final cluster information
    num_clusters = len(locations_df['geo_cluster'].unique())
    print(f"Final clustering: {num_clusters} clusters for optimization")
    
    # STEP 2: EFFICIENT BATCH DISTANCE CALCULATION
    # ------------------------------------------------------------
    print("\nSTEP 2: Building distance matrices with batch processing...")
    
    # Calculate depot distances for all locations
    print(f"Calculating distances from depot to all {len(locations_df)} locations using batch API...")
    
    # Prepare location pairs for batch processing
    depot_location_pairs = []
    
    for _, loc in locations_df.iterrows():
        depot_location_pairs.append((
            {'latitude': depot_info['latitude'], 'longitude': depot_info['longitude']},
            {'latitude': loc['latitude'], 'longitude': loc['longitude']}
        ))
    
    # Process depot distances in batches
    depot_batch_results = batch_route_request(depot_location_pairs, api_key)
    
    # Extract distances and times from batch results
    distances_from_depot = []
    times_from_depot = []
    
    for _, loc in locations_df.iterrows():
        pair_key = (depot_info['latitude'], depot_info['longitude'], loc['latitude'], loc['longitude'])
        
        if pair_key in depot_batch_results:
            distance, time = depot_batch_results[pair_key]
            distances_from_depot.append(distance)
            times_from_depot.append(time)
        else:
            # If not in batch results, get individual result
            result = get_route_segment_metrics(
                {'latitude': depot_info['latitude'], 'longitude': depot_info['longitude']},
                {'latitude': loc['latitude'], 'longitude': loc['longitude']},
                api_key, cache_db_path
            )
            
            if result:
                distance, time = result
                distances_from_depot.append(distance)
                times_from_depot.append(time)
            else:
                # If no result, use placeholder values
                print(f"❌ ERROR: Unable to get depot distance for location ({loc['latitude']}, {loc['longitude']})")
                distances_from_depot.append(None)
                times_from_depot.append(None)
    
    # Add to dataframe
    locations_df['_distance_from_depot'] = distances_from_depot
    locations_df['_time_from_depot'] = times_from_depot
    
    # Remove locations with no valid depot distances
    invalid_locations = locations_df[locations_df['_distance_from_depot'].isna()]
    if not invalid_locations.empty:
        print(f"⚠️ Warning: Removing {len(invalid_locations)} locations with no valid depot distances")
        locations_df = locations_df[~locations_df['_distance_from_depot'].isna()]
    
    # Calculate intra-cluster distance matrices
    print("Building distance matrices within each cluster using batch API...")
    
    # Create a dictionary to hold location pairs by cluster
    cluster_pairs = {}
    
    # Group locations by cluster
    for cluster_id in locations_df['geo_cluster'].unique():
        cluster_locs = locations_df[locations_df['geo_cluster'] == cluster_id]
        
        if len(cluster_locs) <= 1:
            continue  # Skip singleton clusters
            
        # Create pairs of locations within this cluster
        pairs = []
        indices = cluster_locs.index.tolist()
        
        for i, idx1 in enumerate(indices):
            loc1 = cluster_locs.loc[idx1]
            
            for j, idx2 in enumerate(indices):
                if i >= j:
                    continue  # Skip duplicate and self pairs
                    
                loc2 = cluster_locs.loc[idx2]
                
                # Apply filtering based on aerial distance
                aerial_dist = haversine_distance(
                    loc1['latitude'], loc1['longitude'],
                    loc2['latitude'], loc2['longitude']
                )
                
                # Only include if aerial distance is reasonable
                if aerial_dist < 50:  # 50km threshold
                    pairs.append((
                        {'latitude': loc1['latitude'], 'longitude': loc1['longitude']},
                        {'latitude': loc2['latitude'], 'longitude': loc2['longitude']}
                    ))
        
        cluster_pairs[cluster_id] = pairs
    
    # Process distance matrices using concurrent processing
    cluster_distance_matrices = {}
    
    # Process clusters (limiting concurrency to avoid rate limiting)
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        # Submit tasks
        future_to_cluster = {
            executor.submit(process_cluster_matrix, cluster_id, pairs, locations_df, api_key, cache_db_path): cluster_id
            for cluster_id, pairs in cluster_pairs.items()
        }
        
        # Process results as they complete
        for future in concurrent.futures.as_completed(future_to_cluster):
            cluster_id = future_to_cluster[future]
            try:
                cluster_id, matrix_data = future.result()
                if matrix_data:
                    cluster_distance_matrices[cluster_id] = matrix_data
                    print(f"✓ Completed distance matrix for cluster {cluster_id}")
            except Exception as e:
                print(f"❌ Error processing cluster {cluster_id}: {e}")
    
    # STEP 3: ADAPTIVE LARGE NEIGHBORHOOD SEARCH (ALNS)
    # ------------------------------------------------------------
    print("\nSTEP 3: Running Adaptive Large Neighborhood Search optimization...")
    
    # Use ALNS for optimization
    alns = ALNS(depot_info, locations_df, max_weight, max_time, service_time_per_unit, api_key, cache_db_path)
    all_routes = alns.run(iterations=50)  # Adjust iteration count as needed
    
    # STEP 4: ROUTE CONSOLIDATION WITH PROXIMITY PRIORITIZATION
    # ------------------------------------------------------------
    if len(all_routes) > 1:
        print("\nSTEP 4: Attempting to consolidate routes with proximity prioritization...")
        all_routes = consolidate_routes_with_proximity(all_routes, max_weight, max_time, api_key, cache_db_path)
    
    # Generate final summary
    total_distance = sum(route['distance_km'] for route in all_routes)
    total_time = sum(route['time_min'] for route in all_routes)
    total_weight = sum(route.get('weight', 0) for route in all_routes)
    
    print(f"\n✅ Optimization complete in {timing_module.time() - start_time:.1f} seconds!")
    print(f"Created {len(all_routes)} routes with total distance: {total_distance:.2f}km")
    print(f"Total time: {total_time/60:.2f}hrs, Total weight: {total_weight:.2f}")
    
    # Return routes, totals, route count, and an empty DataFrame (no unvisited locations are tracked)
    return all_routes, total_distance, total_time, len(all_routes), pd.DataFrame()
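
Step 1.1 of `optimize_routes_global` passes coordinates in radians to DBSCAN with the haversine metric, so distances, and therefore `eps`, are measured in radians on the unit sphere rather than in kilometres or degrees. The sketch below converts a kilometre radius to radians using only the standard library. It assumes a mean Earth radius of 6371 km, the helper names are hypothetical, and the Tallinn and Tartu coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def km_to_radians(km):
    """Convert a surface distance in km to an angle in radians."""
    return km / EARTH_RADIUS_KM


def haversine_radians(lat1, lon1, lat2, lon2):
    """Great-circle distance in radians between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * math.asin(math.sqrt(a))


if __name__ == "__main__":
    eps_10km = km_to_radians(10)
    # Approximate coordinates of Tallinn and Tartu
    d = haversine_radians(59.437, 24.7536, 58.378, 26.729)
    print(f"eps for 10 km: {eps_10km:.5f} rad")
    print(f"Tallinn-Tartu: {d:.5f} rad ({d * EARTH_RADIUS_KM:.0f} km)")
```

An `eps` of 0.1 radians corresponds to roughly 637 km, which is larger than the separation of these two cities and would place them in the same neighbourhood.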

def consolidate_routes_with_proximity(routes, max_weight, max_time, api_key, cache_db_path):
    """
    Try to consolidate routes using geographic proximity and time compatibility.
    Prioritizes merging geographically close routes with compatible time windows.
    """
    import copy
    
    if len(routes) <= 1:
        return routes
    
    print(f"Starting with {len(routes)} routes, attempting consolidation...")
    
    # Calculate route centers of gravity
    route_centers = []
    for route in routes:
        coords = []
        weights = []
        
        # Get coordinates of all stops
        if 'route' in route:
            for lat, lng in route['route'][1:-1]:  # Skip depot
                coords.append((lat, lng))
                weights.append(1)
        
        # If no route coordinates, skip
        if not coords:
            route_centers.append((None, None))
            continue
        
        # Calculate weighted center
        lat_sum = sum(coord[0] * weight for coord, weight in zip(coords, weights))
        lng_sum = sum(coord[1] * weight for coord, weight in zip(coords, weights))
        total_weight = sum(weights)
        
        if total_weight > 0:
            center = (lat_sum / total_weight, lng_sum / total_weight)
        else:
            center = (None, None)
            
        route_centers.append(center)
    
    # Calculate aerial distances between all route centers
    route_distances = {}
    for i in range(len(routes)):
        for j in range(i+1, len(routes)):
            center_i = route_centers[i]
            center_j = route_centers[j]
            
            # Skip if any center is invalid
            if None in center_i or None in center_j:
                continue
                
            # Calculate aerial distance
            dist = haversine_distance(center_i[0], center_i[1], center_j[0], center_j[1])
            route_distances[(i, j)] = dist
    
    # Sort route pairs by distance
    sorted_pairs = sorted(route_distances.items(), key=lambda x: x[1])
    
    # Start merging from closest pairs
    improvement = True
    iteration = 0
    max_iterations = 5
    
    while improvement and iteration < max_iterations:
        improvement = False
        iteration += 1
        
        # Make a copy of routes for this iteration
        current_routes = copy.deepcopy(routes)
        
        # Add _consolidated flag to track merged routes
        for route in current_routes:
            route['_consolidated'] = False
        
        # Try merging in order of proximity
        for (i, j), distance in sorted_pairs:
            # Skip if already consolidated
            if current_routes[i].get('_consolidated', False) or current_routes[j].get('_consolidated', False):
                continue
            
            route1 = current_routes[i]
            route2 = current_routes[j]
            
            # Check if combined weight would exceed capacity
            combined_weight = route1.get('weight', 0) + route2.get('weight', 0)
            if combined_weight > max_weight:
                continue
            
            # Check time compatibility
            # For now, simple check of total time - but could be enhanced with detailed time windows
            combined_time = route1.get('time_min', 0) + route2.get('time_min', 0) - 90  # Remove duplicate depot time
            if combined_time > max_time:
                continue
            
            # Try to merge routes
            merged_route = merge_routes_enhanced(route1, route2, api_key, cache_db_path)
            
            # Check if merged route is feasible
            if merged_route and merged_route.get('time_min', float('inf')) <= max_time:
                print(f"Found feasible merge: Routes {route1['route_id']} + {route2['route_id']} → New route with {merged_route['stops']} stops")
                
                # Mark original routes as consolidated
                route1['_consolidated'] = True
                route2['_consolidated'] = True
                
                # Add the merged route
                merged_route['_consolidated'] = False
                current_routes.append(merged_route)
                
                # Note improvement
                improvement = True
                break
        
        if improvement:
            # Create new routes list without the consolidated ones
            routes = [r for r in current_routes if not r.get('_consolidated', False)]
            
            # Recalculate route centers and distances for next iteration
            route_centers = []
            for route in routes:
                coords = []
                weights = []
                
                # Get coordinates of all stops
                if 'route' in route:
                    for lat, lng in route['route'][1:-1]:  # Skip depot
                        coords.append((lat, lng))
                        weights.append(1)
                
                # If no route coordinates, skip
                if not coords:
                    route_centers.append((None, None))
                    continue
                
                # Calculate weighted center
                lat_sum = sum(coord[0] * weight for coord, weight in zip(coords, weights))
                lng_sum = sum(coord[1] * weight for coord, weight in zip(coords, weights))
                total_weight = sum(weights)
                
                if total_weight > 0:
                    center = (lat_sum / total_weight, lng_sum / total_weight)
                else:
                    center = (None, None)
                    
                route_centers.append(center)
            
            # Recalculate distances
            route_distances = {}
            for i in range(len(routes)):
                for j in range(i+1, len(routes)):
                    center_i = route_centers[i]
                    center_j = route_centers[j]
                    
                    # Skip if any center is invalid
                    if None in center_i or None in center_j:
                        continue
                        
                    # Calculate aerial distance
                    dist = haversine_distance(center_i[0], center_i[1], center_j[0], center_j[1])
                    route_distances[(i, j)] = dist
            
            # Resort pairs
            sorted_pairs = sorted(route_distances.items(), key=lambda x: x[1])
            
            print(f"Consolidated to {len(routes)} routes")
    
    # Ensure consistent route IDs and vehicle numbers
    for i, route in enumerate(routes):
        route_id = str(i + 1)
        route['route_id'] = route_id
        route['vehicle_number'] = i + 1
        # Remove any internal flags
        if '_consolidated' in route:
            del route['_consolidated']
    
    print(f"Final route count after consolidation: {len(routes)}")
    return routes
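
`consolidate_routes_with_proximity` computes route centers twice with identical code, once before the loop and once after each successful merge. Because every stop has weight 1, the "weighted" center is just the mean of the stop coordinates. A possible refactoring is a small helper like the one below; the name `route_center` is hypothetical.

```python
def route_center(route):
    """Return the mean (lat, lng) of a route's customer stops.

    The first and last entries of route['route'] are treated as the depot
    and skipped. Returns (None, None) if there are no customer stops.
    """
    stops = route.get("route", [])[1:-1]
    if not stops:
        return (None, None)
    n = len(stops)
    return (
        sum(lat for lat, _ in stops) / n,
        sum(lng for _, lng in stops) / n,
    )


if __name__ == "__main__":
    demo = {"route": [(0, 0), (1, 2), (3, 4), (0, 0)]}
    print(route_center(demo))
```

With this helper, both occurrences reduce to `route_centers = [route_center(r) for r in routes]`.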

def merge_routes_enhanced(route1, route2, api_key, cache_db_path):
    """
    Enhanced route merging with improved sequence optimization.
    """
    import copy
    
    # Get all stops excluding the depot for both routes
    stops1 = route1.get('location_ids', [])[1:-1]  # Skip DEPOT_START and DEPOT_END
    stops2 = route2.get('location_ids', [])[1:-1]
    
    if not stops1 or not stops2:
        return None  # Can't merge if either has no stops
    
    # Create a new route
    merged_route = copy.deepcopy(route1)
    
    # Try different merging strategies and pick the best
    merge_options = []
    
    # Option 1: Append route2 to route1
    option1 = ['DEPOT_START'] + stops1 + stops2 + ['DEPOT_END']
    merge_options.append(option1)
    
    # Option 2: Append route1 to route2
    option2 = ['DEPOT_START'] + stops2 + stops1 + ['DEPOT_END']
    merge_options.append(option2)
    
    # Option 3: Placeholder for a proximity-based ordering.
    # Not implemented yet, so it currently duplicates option 1.
    option3 = ['DEPOT_START'] + stops1 + stops2 + ['DEPOT_END']
    merge_options.append(option3)
    
    # Option evaluation is not implemented yet: option 1 is always selected
    best_option = option1
    
    # Use best option for merge
    merged_route['location_ids'] = best_option
    
    # Merge weights
    merged_route['weight'] = route1.get('weight', 0) + route2.get('weight', 0)
    merged_route['stops'] = len(best_option) - 2  # Exclude DEPOT_START and DEPOT_END
    
    # Merge route coordinates
    if 'route' in route1 and 'route' in route2:
        # Simple approach - actual merging would depend on the selected option
        merged_route['route'] = [route1['route'][0]] + route1['route'][1:-1] + route2['route'][1:-1] + [route1['route'][-1]]
    
    # Set combined cost estimates
    merged_route['distance_km'] = route1.get('distance_km', 0) + route2.get('distance_km', 0) - 5.0  # Approximate saving
    merged_route['time_min'] = route1.get('time_min', 0) + route2.get('time_min', 0) - 90.0  # Remove duplicate depot time
    
    # Set new route ID
    merged_route['route_id'] = f"{min(int(route1['route_id']), int(route2['route_id']))}"
    merged_route['vehicle_number'] = int(merged_route['route_id'])
    
    return merged_route
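
`merge_routes_enhanced` builds several candidate orderings but always selects the first. One way the options could be compared is by straight-line tour length, sketched below. This is an assumption-laden illustration rather than the notebook's method: aerial distance only approximates road distance, and the names `haversine_km`, `path_length_km`, and `best_merge_order` are hypothetical.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # 6371 km: assumed Earth radius


def path_length_km(points):
    """Total length of the path visiting `points` in order."""
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))


def best_merge_order(depot, stops1, stops2):
    """Pick the candidate stop ordering with the shortest depot-to-depot tour."""
    candidates = [
        stops1 + stops2,
        stops2 + stops1,
        stops1 + stops2[::-1],
        stops1[::-1] + stops2,
    ]
    return min(candidates, key=lambda stops: path_length_km([depot] + stops + [depot]))


if __name__ == "__main__":
    depot = (59.0, 24.0)
    stops1 = [(59.0, 24.1), (59.0, 24.2)]
    stops2 = [(59.1, 24.1), (59.1, 24.2)]
    print(best_merge_order(depot, stops1, stops2))
```

In the demo, reversing the second route turns the merged tour into a simple loop, which is shorter than plain concatenation.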

## Cell 9: Route creation

In [19]:
def create_routes(locations_df, depot_info, max_weight, max_time, service_time_per_unit, depot_service_time=45, start_vehicle_count=0):
    """
    Create multiple routes to serve all locations from a depot.
    This version delegates entirely to the global optimization approach from Cell 8.
    """
    print(f"Using global optimization approach for {len(locations_df)} locations")
    
    # Call the global optimization function directly
    all_routes, total_distance, total_time, vehicle_count, remaining_locations = optimize_routes_global(
        locations_df, 
        depot_info, 
        max_weight, 
        max_time, 
        service_time_per_unit, 
        depot_service_time
    )
    
    # Apply vehicle numbering based on start_vehicle_count
    if start_vehicle_count > 0:
        for i, route in enumerate(all_routes):
            route['vehicle_number'] = start_vehicle_count + i + 1
            route['route_id'] = str(start_vehicle_count + i + 1)
    
    return all_routes, total_distance, total_time, vehicle_count, remaining_locations


def run_optimization(df_clean, depot_df, optimization_params, status_updater=None):
    """Run the optimization for each depot - with reduced concurrency"""
    print("\n=== OPTIMIZING ROUTES ===")
    print("Starting route optimization process...")

    # Update status if function provided
    if status_updater:
        status_updater('Optimization', 'Testing API connection')
    
    # Initialize results containers
    all_routes = []
    routes_summary = []
    locations_not_visited = []
    
    # Performance tracking
    optimization_start_time = time.time()
    
    # Global vehicle counter to ensure unique IDs across depots
    global_vehicle_count = 0
    
    # Test HERE API connectivity once at the beginning
    api_key = load_api_key()
    if api_key:
        # Setup distance cache
        cache_db_path = setup_distance_cache()
        
        # Use coordinates from the first depot for testing
        first_depot = depot_df.iloc[0]
        test_here_api_connection(
            api_key, 
            first_depot['latitude'], 
            first_depot['longitude']
        )
    else:
        print("⚠️ No API key available for optimization")
        return [], [], []
    
    # Process depots sequentially to avoid rate limiting
    for idx, depot_row in depot_df.iterrows():
        depot_name = depot_row['depot_name']
        print(f"\nProcessing depot: {depot_name}")
        
        # Get locations assigned to this depot using cluster_name
        depot_locations = df_clean[df_clean['cluster_name'] == depot_name]
        
        print(f"Found {len(depot_locations)} locations assigned to this depot")
        
        # Skip if no locations for this depot
        if len(depot_locations) == 0:
            print(f"⚠️ No locations assigned to depot: {depot_name}")
            continue
        
        print(f"Optimizing routes for {len(depot_locations)} locations...")
        
        # Run optimization with the global vehicle count to ensure unique IDs
        depot_routes, total_distance, total_time, vehicle_count, unvisited = create_routes(
            depot_locations, 
            depot_row, 
            optimization_params['vehicle_capacity'], 
            optimization_params['max_route_time_min'], 
            optimization_params['service_time_per_unit_min'],
            optimization_params['depot_service_time_min'],
            global_vehicle_count  # Pass the current global vehicle count
        )
        
        # Update the global vehicle count
        global_vehicle_count += vehicle_count
        
        # Update results
        all_routes.extend(depot_routes)
        
        # Create summary
        summary = {
            'depot_name': depot_name,
            'total_locations': len(depot_locations),
            'locations_visited': len(depot_locations) - len(unvisited),
            'locations_unvisited': len(unvisited),
            'routes_created': vehicle_count,
            'total_distance_km': total_distance,
            'total_time_hours': total_time / 60
        }
        routes_summary.append(summary)
        
        # Record unvisited locations
        if not unvisited.empty:
            for i, loc in unvisited.iterrows():
                locations_not_visited.append({
                    'location_id': i,
                    'depot_name': depot_name,
                    'latitude': loc['latitude'],
                    'longitude': loc['longitude'],
                    'weight': loc.get('weight', 1)
                })
    
    total_optimization_time = time.time() - optimization_start_time
    print(f"\n✅ Optimization complete in {total_optimization_time:.1f} seconds!")
    
    return all_routes, routes_summary, locations_not_visited

## Cell 10: Data Enhancement

In [21]:
def enhance_original_data(df_clean, all_routes, depot_df=None):
    """
    Enhance the original dataset with route information, ensuring consistent route positions.
    Uses on-demand route calculations and parallel processing for efficiency.
    """
    # Create a copy of the original data
    enhanced_df = df_clean.copy()
    
    # Add the new columns with default values
    enhanced_df['route_id'] = None  # Use None instead of np.nan for string values
    enhanced_df['route_position'] = np.nan
    enhanced_df['route_distance_km'] = np.nan
    enhanced_df['travel_time_from_last_stop'] = np.nan
    enhanced_df['service_time_min'] = np.nan
    enhanced_df['cumulative_time_min'] = np.nan
    enhanced_df['location_type'] = "CUSTOMER"  # Default location type
    
    # Initialize a list to collect all stops, including depots
    all_stops = []
    
    # Setup distance cache
    cache_db_path = setup_distance_cache()
    api_key = load_api_key()
    
    # First, enrich routes with depot coordinates from depot_df
    if depot_df is not None:
        print("Enriching routes with depot information...")
        for route in all_routes:
            if 'depot_name' in route:
                # Look up depot coordinates from depot_df
                matching_depots = depot_df[depot_df['depot_name'] == route['depot_name']]
                if not matching_depots.empty:
                    depot_row = matching_depots.iloc[0]
                    route['depot_latitude'] = depot_row['latitude']
                    route['depot_longitude'] = depot_row['longitude']
                    
                    # Add route key with coordinates if it doesn't exist
                    if 'route' not in route:
                        # Add depot at beginning and end of route
                        route['route'] = [(depot_row['latitude'], depot_row['longitude'])]
                        
                        # Add customer locations in between (if we have them)
                        for loc_id in route.get('location_ids', [])[1:-1]:  # Skip DEPOT_START and DEPOT_END
                            found = False
                            for idx, row in df_clean.iterrows():
                                if ('ABS Custumer no' in row and row['ABS Custumer no'] == loc_id) or \
                                   ('customer_id' in row and row['customer_id'] == loc_id) or \
                                   idx == loc_id:
                                    route['route'].append((row['latitude'], row['longitude']))
                                    found = True
                                    break
                            
                            if not found:
                                print(f"⚠️ Could not find coordinates for location {loc_id} in route {route['route_id']}")
                        
                        # Add depot at end
                        route['route'].append((depot_row['latitude'], depot_row['longitude']))
                else:
                    print(f"⚠️ Could not find depot information for {route['depot_name']}")

    # Define a function to process a single route
    def process_route(route_info):
        route_id = route_info['route_id']
        depot_name = route_info['depot_name']
        
        print(f"Processing route {route_id} for depot {depot_name}")
        
        # Get depot coordinates for adding depot entries
        depot_coords = None
        depot_lat = None
        depot_lng = None
        if 'route' in route_info and len(route_info['route']) >= 2:
            try:
                depot_coords = route_info['route'][0]  # First coordinate is depot
                depot_lat = float(depot_coords[0])
                depot_lng = float(depot_coords[1])
                
                # Validate ranges
                if not (-90 <= depot_lat <= 90) or not (-180 <= depot_lng <= 180):
                    print(f"⚠️ Invalid depot coordinates: {depot_lat}, {depot_lng}")
                    depot_lat = depot_lng = None
            except (ValueError, TypeError, IndexError) as e:
                print(f"⚠️ Error parsing depot coordinates: {depot_coords}, {e}")
                depot_lat = depot_lng = None
        
        # If the route list gave no valid depot coordinates, fall back to the
        # depot_latitude / depot_longitude values stored on the route dict
        if depot_lat is None:
            if 'depot_latitude' in route_info and 'depot_longitude' in route_info:
                try:
                    depot_lat = float(route_info['depot_latitude'])
                    depot_lng = float(route_info['depot_longitude'])
                    
                    # Validate ranges
                    if not (-90 <= depot_lat <= 90) or not (-180 <= depot_lng <= 180):
                        print(f"⚠️ Invalid depot info coordinates: {depot_lat}, {depot_lng}")
                        depot_lat = depot_lng = None
                except (ValueError, TypeError) as e:
                    print(f"⚠️ Error parsing depot info coordinates: {e}")
                    # Reset both values so a half-parsed pair is never used
                    depot_lat = depot_lng = None
        
        # Skip routes with no stops
        if 'location_ids' not in route_info or not route_info['location_ids']:
            print(f"Skipping route {route_id} - no location_ids found")
            return []
            
        # Get location IDs for this route - these are the customer/location identifiers
        location_ids = route_info['location_ids']
        print(f"Route {route_id} has {len(location_ids)} location_ids: {location_ids}")
        
        # Get segment-specific distances and times
        segment_distances = route_info.get('segment_distances', [])
        segment_times = route_info.get('segment_times', [])
        service_times = route_info.get('location_service_times', [])
        
        # If we don't have pre-calculated segments, calculate them on-demand
        if len(segment_distances) != len(location_ids) - 1 or len(segment_times) != len(location_ids) - 1:
            print(f"Calculating missing segment data for route {route_id}")
            
            # Re-calculate all segments
            segment_distances = []
            segment_times = []
            service_times = []
            
            # Calculate depot service time
            depot_service_time = 45  # default
            if 'depot_service_time_min' in route_info:
                depot_service_time = route_info['depot_service_time_min']
            service_times.append(depot_service_time)
            
            # Process all locations to get their data
            route_locs = []
            
            # Add depot as first location with validated coordinates
            if depot_lat is not None and depot_lng is not None:
                depot_loc = {
                    'latitude': depot_lat,
                    'longitude': depot_lng,
                    'is_depot': True
                }
                route_locs.append(depot_loc)
            else:
                print(f"⚠️ Missing valid depot coordinates for route {route_id}")
                # Create a placeholder to maintain sequence
                depot_loc = {
                    'latitude': 0.0,  # Placeholder only; note (0.0, 0.0) passes the lat/lng range checks in this function
                    'longitude': 0.0,
                    'is_depot': True
                }
                route_locs.append(depot_loc)
            
            # Find and add all customer locations
            for loc_id in location_ids[1:-1]:  # Skip DEPOT_START and DEPOT_END
                # Find this location in the dataset
                loc_data = None
                matches = None
                
                # Try different strategies to match the location ID
                if isinstance(loc_id, (int, float)):
                    # Look for this ID in common ID columns
                    if 'ABS Custumer no' in enhanced_df.columns:
                        matches = enhanced_df['ABS Custumer no'] == loc_id
                    elif 'ABS Customer no' in enhanced_df.columns:
                        matches = enhanced_df['ABS Customer no'] == loc_id
                    elif 'customer_id' in enhanced_df.columns:
                        matches = enhanced_df['customer_id'] == loc_id
                    else:
                        # Fallback to index if it's numeric
                        matches = enhanced_df.index == loc_id
                else:
                    # If it's not numeric, it might be the index as string
                    try:
                        # Try to convert to numeric if it's a string representation of a number
                        numeric_id = float(loc_id) if isinstance(loc_id, str) else loc_id
                        matches = enhanced_df.index == numeric_id
                    except (ValueError, TypeError):
                        # If conversion fails, it might be a string ID
                        if 'customer_id' in enhanced_df.columns and enhanced_df['customer_id'].dtype == 'object':
                            matches = enhanced_df['customer_id'] == loc_id
                        else:
                            # No matches if we can't figure out how to match
                            matches = pd.Series(False, index=enhanced_df.index)
                
                # If we found a match, get the location data
                if matches is not None and matches.any():
                    loc_row = enhanced_df[matches].iloc[0]
                    # Validate coordinates before adding
                    try:
                        lat = float(loc_row['latitude'])
                        lng = float(loc_row['longitude'])
                        
                        # Check valid ranges
                        if not (-90 <= lat <= 90) or not (-180 <= lng <= 180):
                            print(f"⚠️ Invalid location coordinates for {loc_id}: {lat}, {lng}")
                            continue
                            
                        loc_data = {
                            'latitude': lat,
                            'longitude': lng,
                            'is_depot': False
                        }
                        
                        # Calculate service time
                        if 'Net Weight' in loc_row and pd.notna(loc_row['Net Weight']):
                            weight = float(loc_row['Net Weight'])
                        elif 'weight' in loc_row and pd.notna(loc_row['weight']):
                            weight = float(loc_row['weight'])
                        else:
                            weight = 1.0
                        
                        service_time = weight * 0.5  # Default service time per unit
                        if 'service_time_per_unit_min' in route_info:
                            service_time = weight * route_info['service_time_per_unit_min']
                        
                        service_times.append(service_time)
                        route_locs.append(loc_data)
                    except (ValueError, TypeError) as e:
                        print(f"⚠️ Error parsing location coordinates for {loc_id}: {e}")
                else:
                    print(f"⚠️ Warning: Could not find match for location_id {loc_id} in route {route_id}")
            
            # Add depot as last location
            route_locs.append(depot_loc if depot_loc else {
                'latitude': 0.0,
                'longitude': 0.0,
                'is_depot': True
            })
            service_times.append(depot_service_time)
            
            # Calculate segments with progress indication
            print(f"Calculating {len(route_locs)-1} segments for route {route_id}")
            for i in range(len(route_locs) - 1):
                origin = route_locs[i]
                destination = route_locs[i + 1]
                
                # Get routing metrics from HERE API (or cache)
                result = get_route_segment_metrics(origin, destination, api_key, cache_db_path)
                
                if result is not None:
                    distance_km, time_min = result
                    segment_distances.append(distance_km)
                    segment_times.append(time_min)
                else:
                    print(f"⚠️ Warning: Could not get routing metrics for segment {i} in route {route_id}")
                    # Use fallback values based on aerial distance
                    try:
                        aerial_dist = haversine_distance(
                            float(origin['latitude']), float(origin['longitude']),
                            float(destination['latitude']), float(destination['longitude'])
                        )
                        # Approximate time based on average speed of 50km/h
                        aerial_time = aerial_dist * 60 / 50  # minutes
                        segment_distances.append(aerial_dist)
                        segment_times.append(aerial_time)
                        print(f"Using aerial distance fallback: {aerial_dist:.2f}km, {aerial_time:.2f}min")
                    except (ValueError, TypeError) as e:
                        print(f"⚠️ Error calculating aerial distance: {e}")
                        segment_distances.append(0)
                        segment_times.append(0)
        
        # ===== PHASE 1: Find and calculate basic values for all stops =====
        # Collect all stops in this route
        route_stops = []
        sequence_position = 0
        processed_location_ids = set()
        
        # Process DEPOT_START
        depot_service_time = service_times[0] if len(service_times) > 0 else 45  # default 45 min
        
        if depot_coords and location_ids[0] == "DEPOT_START":
            depot_start_row = {
                'route_id': route_id,
                'sequence_idx': -1,  # Special index to ensure it sorts first
                'raw_data': {
                    'route_distance_km': 0,
                    'travel_time_from_last_stop': 0,
                    'service_time_min': depot_service_time,
                },
                'is_depot': True,
                'is_depot_start': True,
                'location_type': "DEPOT_START",
                'cluster_name': depot_name,
                'latitude': depot_lat,
                'longitude': depot_lng,
                'distance_to_depot_km': 0,
                'depot_latitude': depot_lat,
                'depot_longitude': depot_lng,
                'depot_formatted_address': f"Depot {depot_name}"
            }
            route_stops.append(depot_start_row)
            print(f"Added DEPOT_START for route {route_id}")
        
        # Process all customer stops
        customer_count = 0
        for idx, location_id in enumerate(location_ids[1:-1], 1):  # Skip DEPOT_START and DEPOT_END
            # Increment the sequence position for this stop in the route
            sequence_position += 1
            
            # Check if this location_id has already been processed
            if location_id in processed_location_ids:
                print(f"⚠️ Skipping duplicate location_id {location_id} in route {route_id}")
                continue
                
            # Track that we've processed this location_id
            processed_location_ids.add(location_id)
            
            # Try to find this location in the dataset
            matches = None
            
            # Try different strategies to match the location ID
            if isinstance(location_id, (int, float)):
                # Look for this ID in common ID columns
                if 'ABS Custumer no' in enhanced_df.columns:
                    matches = enhanced_df['ABS Custumer no'] == location_id
                elif 'ABS Customer no' in enhanced_df.columns:
                    matches = enhanced_df['ABS Customer no'] == location_id
                elif 'customer_id' in enhanced_df.columns:
                    matches = enhanced_df['customer_id'] == location_id
                else:
                    # Fallback to index if it's numeric
                    matches = enhanced_df.index == location_id
            else:
                # If it's not numeric, it might be the index as string
                try:
                    # Try to convert to numeric if it's a string representation of a number
                    numeric_id = float(location_id) if isinstance(location_id, str) else location_id
                    matches = enhanced_df.index == numeric_id
                except (ValueError, TypeError):
                    # If conversion fails, it might be a string ID
                    if 'customer_id' in enhanced_df.columns and enhanced_df['customer_id'].dtype == 'object':
                        matches = enhanced_df['customer_id'] == location_id
                    else:
                        # No matches if we can't figure out how to match
                        matches = pd.Series(False, index=enhanced_df.index)
            
            # If we found matches, process this stop
            if matches is not None and matches.any():
                matching_count = matches.sum()
                customer_count += 1
                
                # Calculate segment distance and travel time - use actual HERE API data
                segment_idx = idx - 1  # Index into segment arrays (excluding depot start)
                
                if segment_idx < len(segment_distances):
                    segment_distance = segment_distances[segment_idx]
                    segment_travel_time = segment_times[segment_idx]
                else:
                    print(f"❌ Missing segment data for stop {idx} in route {route_id}")
                    continue
                
                # Process each matching row
                for match_idx in enhanced_df[matches].index:
                    # Calculate service time
                    segment_service_time = 0
                    if len(service_times) > idx:
                        segment_service_time = service_times[idx]
                    else:
                        # Try to calculate it from the data
                        try:
                            # Get the specific row
                            row = enhanced_df.loc[match_idx]
                            
                            # Extract weight with fallback options
                            weight = None
                            if 'Net Weight' in row and pd.notna(row['Net Weight']):
                                weight = float(row['Net Weight'])
                            elif 'weight' in row and pd.notna(row['weight']):
                                weight = float(row['weight'])
                            else:
                                weight = 1.0
                            
                            # Calculate service time based on weight
                            service_time_per_unit = 0.5  # Default value
                            if 'service_time_per_unit_min' in route_info:
                                service_time_per_unit = route_info['service_time_per_unit_min']
                            
                            segment_service_time = weight * service_time_per_unit
                        except Exception as e:
                            print(f"Error calculating service time: {e}")
                            segment_service_time = 0.5
                    
                    # Store all data for this stop
                    stop_data = {
                        'route_id': route_id,
                        'sequence_idx': sequence_position,  # Store sequence for sorting
                        # Boolean mask selecting only this matched row (vectorized comparison)
                        'match_criteria': pd.Series(enhanced_df.index == match_idx,
                                                   index=enhanced_df.index),
                        'raw_data': {
                            'route_distance_km': segment_distance,
                            'travel_time_from_last_stop': segment_travel_time,
                            'service_time_min': segment_service_time,
                        },
                        'is_depot': False,
                        'is_multiple_match': matching_count > 1,
                        'location_type': "CUSTOMER",
                        'location_id': location_id,
                        'depot_lat': depot_lat,
                        'depot_lng': depot_lng
                    }
                    route_stops.append(stop_data)
                    print(f"Added customer stop for location_id {location_id} in route {route_id}, sequence {sequence_position}")
            else:
                print(f"⚠️ Warning: Could not find match for location_id {location_id} in route {route_id}")
        
        # Process DEPOT_END
        if depot_coords and location_ids[-1] == "DEPOT_END":
            # Use the appropriate index for last segment
            last_idx = len(segment_distances) - 1
            if last_idx >= 0 and last_idx < len(segment_distances):
                last_segment_distance = segment_distances[last_idx]
                last_segment_time = segment_times[last_idx]
                depot_end_service_time = service_times[-1] if len(service_times) > 0 else 45  # default 45 min
            else:
                print(f"❌ Missing segment data for return to depot in route {route_id}")
                return route_stops
            
            depot_end_row = {
                'route_id': route_id,
                'sequence_idx': 999999,  # Special index to ensure it sorts last
                'raw_data': {
                    'route_distance_km': last_segment_distance,
                    'travel_time_from_last_stop': last_segment_time,
                    'service_time_min': depot_end_service_time,
                },
                'is_depot': True,
                'is_depot_end': True,
                'location_type': "DEPOT_END",
                'cluster_name': depot_name,
                'latitude': depot_lat,
                'longitude': depot_lng,
                'distance_to_depot_km': 0,
                'depot_latitude': depot_lat,
                'depot_longitude': depot_lng,
                'depot_formatted_address': f"Depot {depot_name}",
                'depot_geocode_confidence': 1.0  # Add a default confidence value for the depot
            }
            route_stops.append(depot_end_row)
            print(f"Added DEPOT_END for route {route_id}")
        
        # Print diagnostics
        print(f"Route {route_id} has {len(route_stops)} total stops ({customer_count} customers)")
        
        # ===== PHASE 2: Assign consistent route positions =====
        # Sort stops by sequence_idx to ensure correct order
        route_stops.sort(key=lambda x: x['sequence_idx'])
        
        # Assign clean, sequential positions
        position = 0  # Start at 0 for DEPOT_START, then increment
        
        for stop in route_stops:
            # Always assign sequential positions, regardless of stop type
            stop['route_position'] = position
            position += 1
            
            # For diagnostics
            stop_type = "DEPOT_START" if stop.get('is_depot_start', False) else \
                       "DEPOT_END" if stop.get('is_depot_end', False) else \
                       "CUSTOMER"
            print(f"Assigned position {stop['route_position']} to {stop_type} in route {route_id}")
        
        # ===== PHASE 3: Calculate cumulative times =====
        cumulative_time = 0
        
        for i, stop in enumerate(route_stops):
            travel_time = stop['raw_data']['travel_time_from_last_stop']
            service_time = stop['raw_data']['service_time_min']
            
            if i == 0:
                # First stop (DEPOT_START) - just service time
                cumulative_time = service_time
            else:
                # All subsequent stops: add travel time and service time
                cumulative_time += travel_time + service_time
            
            stop['cumulative_time_min'] = cumulative_time
            print(f"Stop {stop['route_position']} in route {route_id}: travel={travel_time}, service={service_time}, cumulative={cumulative_time}")
        
        return route_stops
    
    # Process all routes with reduced concurrency to avoid rate limiting
    import concurrent.futures  # Not among the Cell 1 imports; harmless if already imported elsewhere
    print(f"Processing {len(all_routes)} routes to enhance data...")
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        # Submit tasks
        future_to_route = {executor.submit(process_route, route): route['route_id'] for route in all_routes}
        
        # Process results as they complete with a progress bar
        with tqdm(total=len(future_to_route), desc="Processing routes", unit="route") as pbar:
            for future in concurrent.futures.as_completed(future_to_route):
                route_id = future_to_route[future]
                try:
                    route_stops = future.result()
                    all_stops.extend(route_stops)
                except Exception as e:
                    print(f"❌ Error processing route {route_id}: {e}")
                    import traceback
                    traceback.print_exc()
                # Update progress bar
                pbar.update(1)
    
    # ===== PHASE 4: Update the dataframe with all calculated values =====
    # First process the customer stops
    print("Updating dataframe with route information...")
    for stop in all_stops:
        if not stop.get('is_depot', False):
            matches = stop['match_criteria']
            
            # Update all route information at once
            enhanced_df.loc[matches, 'route_id'] = str(stop['route_id'])
            enhanced_df.loc[matches, 'route_position'] = stop['route_position']
            enhanced_df.loc[matches, 'route_distance_km'] = stop['raw_data']['route_distance_km']
            enhanced_df.loc[matches, 'travel_time_from_last_stop'] = stop['raw_data']['travel_time_from_last_stop']
            enhanced_df.loc[matches, 'service_time_min'] = stop['raw_data']['service_time_min']
            enhanced_df.loc[matches, 'cumulative_time_min'] = stop['cumulative_time_min']
            enhanced_df.loc[matches, 'location_type'] = stop['location_type']
            
            # Make sure the depot-related columns exist before writing to them
            if 'distance_to_depot_km' not in enhanced_df.columns:
                enhanced_df['distance_to_depot_km'] = np.nan
            
            # Fill depot coordinates for every matched stop (not only the first one),
            # without overwriting values that are already present
            if stop.get('depot_lat') is not None:
                if 'depot_latitude' not in enhanced_df.columns:
                    enhanced_df['depot_latitude'] = np.nan
                lat_missing = matches & enhanced_df['depot_latitude'].isna()
                enhanced_df.loc[lat_missing, 'depot_latitude'] = stop['depot_lat']
                
            if stop.get('depot_lng') is not None:
                if 'depot_longitude' not in enhanced_df.columns:
                    enhanced_df['depot_longitude'] = np.nan
                lng_missing = matches & enhanced_df['depot_longitude'].isna()
                enhanced_df.loc[lng_missing, 'depot_longitude'] = stop['depot_lng']
    
    # Round numeric columns for better readability
    print("Finalizing output format...")
    if 'route_distance_km' in enhanced_df.columns:
        enhanced_df['route_distance_km'] = enhanced_df['route_distance_km'].round(2)
    if 'travel_time_from_last_stop' in enhanced_df.columns:
        enhanced_df['travel_time_from_last_stop'] = enhanced_df['travel_time_from_last_stop'].round(1)
    if 'service_time_min' in enhanced_df.columns:
        enhanced_df['service_time_min'] = enhanced_df['service_time_min'].round(1)
    if 'cumulative_time_min' in enhanced_df.columns:
        enhanced_df['cumulative_time_min'] = enhanced_df['cumulative_time_min'].round(1)
    if 'distance_to_depot_km' in enhanced_df.columns:
        enhanced_df['distance_to_depot_km'] = enhanced_df['distance_to_depot_km'].round(2)
    
    # Create a DataFrame for the depot stops
    depot_rows = []
    for stop in all_stops:
        if stop.get('is_depot', False):
            depot_row = {
                'route_id': str(stop['route_id']),  # Same type as customer rows, so sorting by route_id never mixes str and int
                'route_position': stop['route_position'],
                'route_distance_km': stop['raw_data']['route_distance_km'],
                'travel_time_from_last_stop': stop['raw_data']['travel_time_from_last_stop'],
                'service_time_min': stop['raw_data']['service_time_min'],
                'cumulative_time_min': stop['cumulative_time_min'],
                'location_type': stop['location_type'],
                'cluster_name': stop['cluster_name'],
                'latitude': stop['latitude'],
                'longitude': stop['longitude'],
                'distance_to_depot_km': stop['distance_to_depot_km'],
                'depot_latitude': stop['depot_latitude'],
                'depot_longitude': stop['depot_longitude'],
                'depot_formatted_address': stop['depot_formatted_address'],
                'depot_geocode_confidence': stop.get('depot_geocode_confidence', 1.0)  # Ensure this field exists
            }
            depot_rows.append(depot_row)
    
    if depot_rows:
        print(f"Adding {len(depot_rows)} depot stops to final dataset...")
        # Create a DataFrame from depot rows (separate name so the depot_df
        # argument is not shadowed)
        depot_stops_df = pd.DataFrame(depot_rows)
        
        # Make sure it has the same columns as the original dataframe
        for col in enhanced_df.columns:
            if col not in depot_stops_df.columns:
                depot_stops_df[col] = np.nan
                
        # Round numeric columns in depot_stops_df
        rounding = {
            'route_distance_km': 2,
            'travel_time_from_last_stop': 1,
            'service_time_min': 1,
            'cumulative_time_min': 1,
            'distance_to_depot_km': 2,
        }
        for col, decimals in rounding.items():
            if col in depot_stops_df.columns:
                depot_stops_df[col] = depot_stops_df[col].round(decimals)
        
        # Concatenate the enhanced dataframe with the depot rows
        combined_df = pd.concat([enhanced_df, depot_stops_df], ignore_index=True)
        
        # Sort by route_id and route_position for a clean output
        combined_df = combined_df.sort_values(by=['route_id', 'route_position']).reset_index(drop=True)
        
        print("✅ Enhancement complete!")
        return combined_df
    else:
        # Sort the enhanced dataframe by route_id and route_position
        enhanced_df = enhanced_df.sort_values(by=['route_id', 'route_position']).reset_index(drop=True)
        print("✅ Enhancement complete!")
        return enhanced_df
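
The Phase 3 accumulation inside `process_route` can be checked in isolation. The following is a minimal sketch, not the notebook's own code: `cumulative_times` is a hypothetical helper and the stop values are invented for illustration. It applies the same rule as Phase 3: the first stop contributes only its service time, and every later stop adds its travel time plus its service time.

```python
def cumulative_times(stops):
    """Return the running total of minutes after each stop has been served.

    `stops` is a list of (travel_time_min, service_time_min) tuples in
    route order. The travel time of the first stop is ignored, mirroring
    how DEPOT_START is handled in Phase 3.
    """
    totals = []
    running = 0.0
    for i, (travel, service) in enumerate(stops):
        if i == 0:
            # First stop (depot start): service time only
            running = service
        else:
            # Later stops: drive there, then serve
            running += travel + service
        totals.append(running)
    return totals


# Illustrative values only: depot start, two customers, depot end.
example_stops = [(0, 45), (12.5, 4.0), (8.0, 2.5), (15.0, 45)]
example_totals = cumulative_times(example_stops)
print(example_totals)
```

Note that, under this rule, the depot service time is counted twice on a round trip (once at `DEPOT_START` and once at `DEPOT_END`), which matches what the cell above does.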

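When `get_route_segment_metrics` returns `None`, the cell above falls back to the straight-line distance and an assumed average speed of 50 km/h. The sketch below illustrates that fallback under stated assumptions: `haversine_km` is a stand-in written for this example (the notebook's own `haversine_distance` is defined in another cell and may differ), `aerial_fallback` is a hypothetical wrapper, and the Tallinn and Tartu coordinates are approximate.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points."""
    earth_radius_km = 6371.0  # Mean Earth radius (spherical approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def aerial_fallback(origin, destination, speed_kmh=50.0):
    """Return (distance_km, time_min) from straight-line distance.

    Uses the same conversion as the cell above: minutes = km * 60 / speed.
    """
    dist_km = haversine_km(origin['latitude'], origin['longitude'],
                           destination['latitude'], destination['longitude'])
    return dist_km, dist_km * 60 / speed_kmh


# Approximate city-centre coordinates, for illustration only.
tallinn = {'latitude': 59.4370, 'longitude': 24.7536}
tartu = {'latitude': 58.3776, 'longitude': 26.7290}
dist_km, time_min = aerial_fallback(tallinn, tartu)
print(f"{dist_km:.1f} km, {time_min:.1f} min")
```

Straight-line distance underestimates road distance, so values produced by this fallback are best treated as lower bounds rather than as substitutes for routed metrics.
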
## Cell 11: Process Results with Enhanced Persistent Caching

In [23]:
def process_results(all_routes, routes_summary, locations_not_visited, file_path, output_path, df_clean, depot_df=None):
    """Process and display optimization results with enhanced original data - Single file output"""
    # Create enhanced data with route information - pass depot_df
    enhanced_df = enhance_original_data(df_clean, all_routes, depot_df)
    
    # Rename 'Route Number' to 'legacy_route_number'
    if 'Route Number' in enhanced_df.columns:
        enhanced_df = enhanced_df.rename(columns={'Route Number': 'legacy_route_number'})
    
    # Clean up and enhance the export data:
    # 1. Fill in empty 'customer' cells with depot names for depot rows
    mask_depot = enhanced_df['location_type'].isin(['DEPOT_START', 'DEPOT_END'])
    if 'Customer' in enhanced_df.columns:  # Note the capitalization in the original data
        enhanced_df.loc[mask_depot, 'Customer'] = enhanced_df.loc[mask_depot, 'cluster_name'] + \
            " (" + enhanced_df.loc[mask_depot, 'location_type'] + ")"
    
    # 2. Fill in empty 'formatted_address' cells with depot addresses
    if 'formatted_address' in enhanced_df.columns and 'depot_formatted_address' in enhanced_df.columns:
        enhanced_df.loc[mask_depot, 'formatted_address'] = enhanced_df.loc[mask_depot, 'depot_formatted_address']
    
    # 3. Fill in empty 'geocode_confidence' cells with depot geocode confidence
    if 'geocode_confidence' in enhanced_df.columns and 'depot_geocode_confidence' in enhanced_df.columns:
        enhanced_df.loc[mask_depot, 'geocode_confidence'] = enhanced_df.loc[mask_depot, 'depot_geocode_confidence']
    
    # 4. Remove specified columns
    columns_to_remove = [
        'row_no', 
        'Full address', 
        'depot_latitude', 
        'depot_longitude', 
        'depot_formatted_address', 
        'depot_geocode_confidence'
    ]
    
    # Only remove columns that exist in the dataframe
    columns_to_remove = [col for col in columns_to_remove if col in enhanced_df.columns]
    if columns_to_remove:
        enhanced_df = enhanced_df.drop(columns=columns_to_remove)
    
    # Summary dataframes for display purposes only
    routes_df = pd.DataFrame(all_routes) if all_routes else pd.DataFrame()
    summary_df = pd.DataFrame(routes_summary) if routes_summary else pd.DataFrame()
    unvisited_df = pd.DataFrame(locations_not_visited) if locations_not_visited else pd.DataFrame()
    
    # Print optimization results
    if not all_routes:
        print("\n❌ No routes could be created with the given constraints.")
        print("Try adjusting your parameters (increase vehicle capacity, driving time, etc.)")
    else:
        print("\n=== OPTIMIZATION RESULTS ===")
        print(f"Total routes created: {len(all_routes)}")
        
        # Use safer summation with error handling
        try:
            total_distance = summary_df['total_distance_km'].sum()
            total_time = summary_df['total_time_hours'].sum()
            print(f"Total distance: {total_distance:.1f} km")
            print(f"Total time: {total_time:.1f} hours")
        except Exception as e:
            print(f"Error calculating totals: {e}")
        
        # Safely calculate visited percentage
        try:
            visited = summary_df['locations_visited'].sum() 
            total = summary_df['total_locations'].sum()
            percent = (visited / total) * 100 if total > 0 else 0
            print(f"Locations served: {visited} out of {total} ({percent:.1f}%)")
        except Exception as e:
            print(f"Error calculating location statistics: {e}")
        
        # Calculate efficiency metrics
        try:
            if visited > 0 and total_distance > 0:
                stops_per_km = visited / total_distance
                print(f"Efficiency: {stops_per_km:.2f} stops per km")
            
            if total_time > 0:
                stops_per_hour = visited / total_time
                print(f"Productivity: {stops_per_hour:.2f} stops per hour")
        except Exception as e:
            print(f"Error calculating efficiency metrics: {e}")
        
        # Display routes summary
        if not summary_df.empty:
            # Format the columns for better display
            try:
                summary_display = summary_df.copy()
                summary_display['total_distance_km'] = summary_display['total_distance_km'].round(1)
                summary_display['total_time_hours'] = summary_display['total_time_hours'].round(1)
                
                print("\nRoutes summary by depot:")
                display_cols = [
                    'depot_name', 
                    'routes_created', 
                    'locations_visited', 
                    'locations_unvisited',
                    'total_distance_km',
                    'total_time_hours'
                ]
                
                # Check all columns exist before displaying
                missing_cols = [col for col in display_cols if col not in summary_display.columns]
                if missing_cols:
                    print(f"⚠️ Warning: Missing columns in summary: {missing_cols}")
                    display_cols = [col for col in display_cols if col in summary_display.columns]
                
                # Display the summary table with all metrics
                if display_cols:
                    print(summary_display[display_cols].to_string(index=False, 
                                                             formatters={
                                                                 'total_distance_km': '{:.1f} km'.format,
                                                                 'total_time_hours': '{:.1f} hrs'.format
                                                             }))
                else:
                    print("No valid columns to display in summary")
            except Exception as e:
                print(f"Error displaying route summary: {e}")
        
        # Show details about unvisited locations if any
        if not unvisited_df.empty:
            unvisited_count = len(unvisited_df)
            print(f"\nUnvisited locations: {unvisited_count}")
            print("Reasons may include:")
            print("- Weight constraints exceeded")
            print("- Time constraints exceeded")
            print("- Locations too distant from depots")
            
            # Calculate percentage of unvisited locations
            if total > 0:
                unvisited_pct = (unvisited_count / total) * 100
                print(f"Unvisited percentage: {unvisited_pct:.1f}%")
    
    # Save the enhanced original data - single CSV output
    output_filename = f"{file_path.stem}_with_routes.csv"
    output_file = output_path / output_filename
    
    try:
        # Save enhanced data to CSV
        enhanced_df.to_csv(output_file, index=False)
        print(f"\n✅ Enhanced data with route details saved to: {output_file}")
        
        # Cache route data for future use
        cache_dir = Path.cwd() / 'cache'
        os.makedirs(cache_dir, exist_ok=True)
        routes_cache_file = cache_dir / f"{file_path.stem}_routes.pkl"
        
        with open(routes_cache_file, 'wb') as f:
            pickle.dump(all_routes, f)
        
        print(f"✅ Route data cached for future use at: {routes_cache_file}")
    except Exception as e:
        print(f"❌ Error saving results: {e}")
    
    return enhanced_df, len(all_routes)
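
The efficiency figures printed by `process_results` are two simple ratios: stops per kilometre and stops per hour. As a minimal sketch, assuming a hypothetical helper named `efficiency_metrics` that is not part of this notebook, the same calculation can be isolated so that a zero denominator yields `None` instead of raising:

```python
from typing import Optional, Tuple


def efficiency_metrics(
    visited: int, total_distance_km: float, total_time_hours: float
) -> Tuple[Optional[float], Optional[float]]:
    """Return (stops_per_km, stops_per_hour); None where a ratio is undefined.

    Hypothetical helper for illustration; the notebook computes these inline.
    """
    stops_per_km = visited / total_distance_km if total_distance_km > 0 else None
    stops_per_hour = visited / total_time_hours if total_time_hours > 0 else None
    return stops_per_km, stops_per_hour


# Made-up totals for illustration: 10 stops over 50 km in 4 hours
per_km, per_hour = efficiency_metrics(10, 50.0, 4.0)
print(f"Efficiency: {per_km:.2f} stops per km")
print(f"Productivity: {per_hour:.2f} stops per hour")
```

Returning `None` rather than `0` keeps "no data" distinguishable from a route set that genuinely served no stops.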

## Cell 12: Main Function

In [25]:
def main():
    """Main function to run the route optimization"""
    print("=== ROUTE OPTIMIZATION WITH HERE API ===")
    
    # Create a status dictionary to track overall progress
    status_dict = {
        'stage': 'Starting',
        'detail': 'Initializing',
        'progress': '0%',
        'done': False
    }
    
    # Start a thread to display status
    status_thread = threading.Thread(target=display_overall_status, args=(status_dict,), daemon=True)
    status_thread.start()
    
    try:
        # Load the HERE API key first to verify it's available
        status_dict.update({'stage': 'Setup', 'detail': 'Checking API key'})
        api_key = load_api_key()
        if not api_key:
            print("❌ ERROR: No HERE API key found. Route optimization requires HERE API.")
            print("Please create an api_keys.json file with a valid HERE API key.")
            status_dict['done'] = True
            return
        else:
            print("✅ HERE API key loaded successfully")
        
        # Setup distance cache
        status_dict.update({'detail': 'Setting up distance cache'})
        cache_db_path = setup_distance_cache()
        print(f"Using distance cache at: {cache_db_path}")
        
        # Setup project
        status_dict.update({'detail': 'Setting up project paths'})
        input_path, output_path = setup_project()
        
        # Load data
        status_dict.update({'stage': 'Data Loading', 'detail': 'Reading input file'})
        df, file_path = load_data(input_path)
        
        # Check if we have cached routes for this file
        status_dict.update({'detail': 'Checking for cached routes'})
        cache_dir = Path.cwd() / 'cache'
        routes_cache_file = cache_dir / f"{file_path.stem}_routes.pkl"
        cached_routes = None
        
        if routes_cache_file.exists():
            try:
                with open(routes_cache_file, 'rb') as f:
                    cached_routes = pickle.load(f)
                print(f"Found cached routes for {file_path.stem} with {len(cached_routes)} routes")
                use_cache = input("Do you want to use cached routes? This will skip optimization. (y/n): ").strip().lower()
                if use_cache != 'y':
                    cached_routes = None
                    print("Skipping cached routes, will run full optimization")
                else:
                    print("Using cached routes to save time and API calls")
            except Exception as e:
                print(f"⚠️ Error loading cached routes: {e}")
                cached_routes = None
        
        # Validate data
        status_dict.update({'stage': 'Data Preparation', 'detail': 'Validating data'})
        df_clean = validate_data(df)
        
        # Create depot dataframe
        status_dict.update({'detail': 'Identifying depots'})
        depot_df = create_depot_dataframe(df_clean)
        
        # If using cached routes, skip optimization
        if cached_routes:
            status_dict.update({'stage': 'Using Cached Results', 'detail': 'Skipping optimization'})
            print("Using cached routes. Skipping optimization.")
            all_routes = cached_routes
            
            # Create a placeholder summary
            routes_summary = []
            for route in all_routes:
                depot_name = route.get('depot_name', 'Unknown')
                # Count depot locations
                depot_locs = df_clean[df_clean['cluster_name'] == depot_name]
                
                # Find if this depot is already in the summary
                existing = next((s for s in routes_summary if s['depot_name'] == depot_name), None)
                
                if existing:
                    existing['routes_created'] += 1
                    existing['total_distance_km'] += route.get('distance_km', 0)
                    existing['total_time_hours'] += route.get('time_min', 0) / 60
                else:
                    routes_summary.append({
                        'depot_name': depot_name,
                        'total_locations': len(depot_locs),
                        'locations_visited': len(depot_locs),  # Assuming all visited
                        'locations_unvisited': 0,
                        'routes_created': 1,
                        'total_distance_km': route.get('distance_km', 0),
                        'total_time_hours': route.get('time_min', 0) / 60
                    })
            
            # No unvisited locations when using cached routes
            locations_not_visited = []
        else:
            # Get operational constraints
            status_dict.update({'stage': 'Configuration', 'detail': 'Setting constraints'})
            optimization_params = get_operational_constraints()
            
            # Run optimization with status updates
            status_dict.update({'stage': 'Optimization', 'detail': 'Starting optimization process'})
            try:
                # Define a function to update status from within run_optimization
                def update_optimization_status(stage, detail, progress=''):
                    status_dict.update({
                        'stage': f'Optimization: {stage}', 
                        'detail': detail, 
                        'progress': progress
                    })
                
                # Modified run_optimization function to update status
                def run_optimization_with_status(df_clean, depot_df, optimization_params):
                    """Run the optimization for each depot - with added status updates"""
                    print("\n=== OPTIMIZING ROUTES ===")
                    print("Starting route optimization process...")
                    
                    # Initialize results containers
                    all_routes = []
                    routes_summary = []
                    locations_not_visited = []
                    
                    # Performance tracking
                    optimization_start_time = time.time()
                    
                    update_optimization_status('Setup', 'Testing API connection')
                    
                    # Global vehicle counter to ensure unique IDs across depots
                    global_vehicle_count = 0
                    
                    # Test HERE API connectivity once at the beginning
                    api_key = load_api_key()
                    if api_key:
                        # Setup distance cache
                        cache_db_path = setup_distance_cache()
                        
                        # Use coordinates from the first depot for testing
                        first_depot = depot_df.iloc[0]
                        test_here_api_connection(
                            api_key, 
                            first_depot['latitude'], 
                            first_depot['longitude']
                        )
                    else:
                        print("⚠️ No API key available for optimization")
                        return [], [], []
                    
                    # Process depots sequentially to avoid rate limiting
                    total_depots = len(depot_df)
                    # enumerate() gives a 1-based position that stays correct even if the
                    # dataframe index is not a clean 0..n-1 range
                    for depot_pos, (_, depot_row) in enumerate(depot_df.iterrows(), start=1):
                        depot_name = depot_row['depot_name']
                        update_optimization_status('Depots', f'Processing depot: {depot_name}', f'{depot_pos}/{total_depots}')
                        
                        print(f"\nProcessing depot: {depot_name}")
                        
                        # Get locations assigned to this depot using cluster_name
                        depot_locations = df_clean[df_clean['cluster_name'] == depot_name]
                        
                        print(f"Found {len(depot_locations)} locations assigned to this depot")
                        
                        # Skip if no locations for this depot
                        if len(depot_locations) == 0:
                            print(f"⚠️ No locations assigned to depot: {depot_name}")
                            continue
                        
                        print(f"Optimizing routes for {len(depot_locations)} locations...")
                        update_optimization_status('Routing', f'Optimizing routes for {len(depot_locations)} locations', f'Depot {depot_pos}/{total_depots}')
                        
                        # Run optimization with the global vehicle count to ensure unique IDs
                        depot_routes, total_distance, total_time, vehicle_count, unvisited = create_routes(
                            depot_locations, 
                            depot_row, 
                            optimization_params['vehicle_capacity'], 
                            optimization_params['max_route_time_min'], 
                            optimization_params['service_time_per_unit_min'],
                            optimization_params['depot_service_time_min'],
                            global_vehicle_count  # Pass the current global vehicle count
                        )
                        
                        # Update the global vehicle count
                        global_vehicle_count += vehicle_count
                        
                        # Update results
                        all_routes.extend(depot_routes)
                        
                        # Create summary
                        summary = {
                            'depot_name': depot_name,
                            'total_locations': len(depot_locations),
                            'locations_visited': len(depot_locations) - len(unvisited),
                            'locations_unvisited': len(unvisited),
                            'routes_created': vehicle_count,
                            'total_distance_km': total_distance,
                            'total_time_hours': total_time / 60
                        }
                        routes_summary.append(summary)
                        
                        # Record unvisited locations
                        if not unvisited.empty:
                            for i, loc in unvisited.iterrows():
                                locations_not_visited.append({
                                    'location_id': i,
                                    'depot_name': depot_name,
                                    'latitude': loc['latitude'],
                                    'longitude': loc['longitude'],
                                    'weight': loc.get('weight', 1)
                                })
                    
                    total_optimization_time = time.time() - optimization_start_time
                    print(f"\n✅ Optimization complete in {total_optimization_time:.1f} seconds!")
                    
                    update_optimization_status('Complete', f'Created {len(all_routes)} routes', '100%')
                    
                    return all_routes, routes_summary, locations_not_visited
                
                # Run the optimization with status updates
                all_routes, routes_summary, locations_not_visited = run_optimization_with_status(
                    df_clean, depot_df, optimization_params
                )
                
            except Exception as e:
                print(f"\n❌ ERROR during optimization: {e}")
                import traceback
                traceback.print_exc()
                print("Please check your HERE API key, input data, and try again.")
                status_dict['done'] = True
                return
        
        # Process results
        status_dict.update({'stage': 'Finalizing', 'detail': 'Processing results', 'progress': '90%'})
        enhanced_df, route_count = process_results(
            all_routes, routes_summary, locations_not_visited, file_path, output_path, df_clean, depot_df
        )
        
        status_dict.update({'stage': 'Complete', 'detail': f'Created {route_count} routes', 'progress': '100%'})
        print("\n=== OPTIMIZATION COMPLETE ===")
        print(f"Created {route_count} routes for {len(enhanced_df)} locations")
        
        # Analyze the distance cache usage
        status_dict.update({'detail': 'Analyzing cache statistics'})
        try:
            conn = sqlite3.connect(cache_db_path)
            cursor = conn.cursor()
            cursor.execute("SELECT COUNT(*) FROM distance_cache")
            cache_size = cursor.fetchone()[0]
            conn.close()
            
            print("\n=== CACHE STATISTICS ===")
            print(f"Distance cache contains {cache_size} pre-calculated routes")
            print(f"Cache is stored at: {cache_db_path}")
            print("Next time you run optimization on this data, it will be much faster!")
        except Exception as e:
            print(f"Error analyzing cache: {e}")
    
    finally:
        # Signal the status thread to stop, then wait briefly for it to exit
        status_dict['done'] = True
        status_thread.join(timeout=1.5)

# Run the main function
if __name__ == "__main__":
    main()
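
When cached routes are reused, `main()` rebuilds the per-depot summary by scanning the summary list for each route. A minimal sketch of the same aggregation keyed by depot name follows. The function name `summarize_cached_routes` and the sample routes are hypothetical. The route keys `depot_name`, `distance_km` and `time_min` are the ones read with `.get()` in `main()`. Location counts are left out because they depend on the cleaned customer dataframe.

```python
from collections import defaultdict


def summarize_cached_routes(routes):
    """Aggregate route dicts into one summary row per depot.

    Hypothetical helper for illustration. Each route is assumed to be a dict
    with optional 'depot_name', 'distance_km' and 'time_min' keys.
    """
    totals = defaultdict(lambda: {'routes_created': 0,
                                  'total_distance_km': 0.0,
                                  'total_time_hours': 0.0})
    for route in routes:
        row = totals[route.get('depot_name', 'Unknown')]
        row['routes_created'] += 1
        row['total_distance_km'] += route.get('distance_km', 0)
        row['total_time_hours'] += route.get('time_min', 0) / 60  # minutes to hours
    # Dicts preserve insertion order, so depots appear in first-seen order
    return [{'depot_name': name, **row} for name, row in totals.items()]


# Made-up routes for illustration only
sample_routes = [
    {'depot_name': 'Depot A', 'distance_km': 40.0, 'time_min': 120},
    {'depot_name': 'Depot A', 'distance_km': 10.0, 'time_min': 60},
    {'distance_km': 5.0, 'time_min': 30},  # missing depot name falls back to 'Unknown'
]
summary = summarize_cached_routes(sample_routes)
print(summary)
```

A dictionary lookup replaces the repeated linear search, which may matter once the cache holds many routes across many depots.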

=== ROUTE OPTIMIZATION WITH HERE API ===
STATUS: Starting - Initializing [0%]
✅ HERE API key loaded successfully
✅ Distance cache setup complete at C:\Users\User\Dropbox\Personal\CareerFoundry\06 Sourcing data\Anaconda\03 Scripts\cache\distance_cache.db
Using distance cache at: C:\Users\User\Dropbox\Personal\CareerFoundry\06 Sourcing data\Anaconda\03 Scripts\cache\distance_cache.db
Project setup complete. 
 Input path: C:\Users\User\Dropbox\Personal\CareerFoundry\06 Sourcing data\Anaconda\02 Data\Processed_data 
 Output path: C:\Users\User\Dropbox\Personal\CareerFoundry\06 Sourcing data\Anaconda\02 Data\Processed_data
Available files:
1: 03_1_depot_centered_clusters.csv
2: 03_1_depot_centered_clusters_test.csv
3: time_and_km.csv
STATUS: Data Loading - Reading input file [0%]                                                      

Choose file number (1-3):  2


Detecting encoding for 03_1_depot_centered_clusters_test.csv...
Detected encoding: UTF-8-SIG (confidence: 100.0%)

Analyzing potential delimiters:

1: Delimiter ','
   Found 1 columns
   Preview with option 1:
                                                                                                                                                                                                                                                                                            ABS Custumer no;Route Number;Customer;Full address;Service;DeliveryQty;Net Weight;latitude;longitude;formatted_address;geocode_confidence;cluster_id;cluster_name;distance_to_depot_km;depot_latitude;depot_longitude;depot_formatted_address;depot_geocode_confidence
128.0;4424.0;Tallinn                               Noorkuu 2//4//6// Vana -Rannamõisa tee 1e KÜ (3... Haabersti Tallinn             13516 Harju Maakond                                Eesti;1.0;1;Production LOO;18.28512461023954;59... Loo      


Choose delimiter option (1-4) [default: 2]:  


Using delimiter: ';'

✅ Loaded 10 rows × 18 columns from 03_1_depot_centered_clusters_test.csv

Data Overview:
Column names: ABS Custumer no, Route Number, Customer, Full address, Service, ... (and 13 more columns)

Data types (first 5 columns):
ABS Custumer no    float64
Route Number       float64
Customer            object
Full address        object
Service             object
dtype: object
... (and 13 more columns)

Sample data:
   ABS Custumer no  Route Number  \
0            128.0        4424.0   
1        3319758.0        4024.0   
2        3340737.0        4414.0   
3        3424427.0        4434.0   
4        3623344.0        1021.0   

                                            Customer  \
0  Tallinn,Noorkuu 2//4//6// Vana -Rannamõisa tee...   
1                        KOTZEBUE TN 14 KÜ (3319758)   
2      J.KUNDERI 33 KORTERIÜHISTU, TALLINN (3340737)   
3                HOUSE HALDUS OÜ/JAKOBI 28 (3424427)   
4                               FLEXOIL OÜ (3623344)   

           

Do you want to use cached routes? This will skip optimization. (y/n):  n


Skipping cached routes, will run full optimization

=== VALIDATING DATA QUALITY ===
✅ Required location columns present
✅ ID column(s) found: ABS Custumer no
✅ 10 out of 10 rows (100.0%) have valid coordinates
ℹ️ Found 2 rows with duplicate coordinates
   This might be expected if multiple deliveries go to the same location
   Example: These 2 rows share coordinates (59.39643, 24.7973):
   ABS Custumer no  Route Number                   Customer  \
8        3642198.0        1421.0  HOUSE HALDUS OÜ (3642198)   
9        3642198.0        4424.0  HOUSE HALDUS OÜ (3642198)   

           Full address Service  DeliveryQty  Net Weight  latitude  longitude  \
8  Kopli tee 40, Peetri     MAS          1.0         6.1  59.39643    24.7973   
9  Kopli tee 40, Peetri     MAS          1.0         6.1  59.39643    24.7973   

                                   formatted_address  geocode_confidence  \
8  Kopli tee 40, Peetri, Rae, 75312 Harju Maakond...                 1.0   
9  Kopli tee 40, Peetri,

Vehicle weight/package capacity (default 800):  700
Maximum route time in hours (default 8):  9
Service time at depot (loading/unloading) in minutes (default 45):  
Service time per unit in minutes (default 0.5):  0.97



Optimization parameters:
- Vehicle capacity: 700.0 units
- Maximum route time: 9.0 hours (540 minutes)
- Depot service time: 45 minutes
- Service time per unit: 0.97 minutes

=== OPTIMIZING ROUTES ===
Starting route optimization process...
✅ Distance cache setup complete at C:\Users\User\Dropbox\Personal\CareerFoundry\06 Sourcing data\Anaconda\03 Scripts\cache\distance_cache.db

Testing HERE API connectivity...
✅ HERE API connection successful!
Received 1 route sections in response

Processing depot: Production LOO
Found 10 locations assigned to this depot
Optimizing routes for 10 locations...
Using global optimization approach for 10 locations
✅ Distance cache setup complete at C:\Users\User\Dropbox\Personal\CareerFoundry\06 Sourcing data\Anaconda\03 Scripts\cache\distance_cache.db
Starting global optimization with 10 locations
Constraints: max_weight=700.0, max_time=540.0 min, depot_service_time=45 min

STEP 1: Hierarchical clustering of locations...
Minimum vehicles needed based on

Calculating routes:   0%|          | 0/10 [00:00<?, ?pair/s]

STATUS: Optimization: Routing - Optimizing routes for 10 locations [Depot 1/1]                      

Calculating routes: 100%|██████████| 10/10 [00:05<00:00,  1.86pair/s]


Processed 9 out of 10 route segments
Building distance matrices within each cluster using batch API...
Processing 45 location pairs for cluster 0...


Cluster 0:   0%|          | 0/45 [00:00<?, ?pair/s]

Using individual routing requests instead of batch matrix API (slower)



Calculating routes:   0%|          | 0/45 [00:00<?, ?pair/s]
Calculating routes:   2%|▏         | 1/45 [00:00<00:24,  1.83pair/s]
Calculating routes:   4%|▍         | 2/45 [00:01<00:23,  1.86pair/s]
Calculating routes:   7%|▋         | 3/45 [00:01<00:22,  1.85pair/s]
Calculating routes:   9%|▉         | 4/45 [00:02<00:22,  1.83pair/s]
Calculating routes:  11%|█         | 5/45 [00:02<00:21,  1.84pair/s]
Calculating routes:  13%|█▎        | 6/45 [00:03<00:21,  1.85pair/s]
Calculating routes:  16%|█▌        | 7/45 [00:03<00:20,  1.85pair/s]
Calculating routes:  18%|█▊        | 8/45 [00:04<00:20,  1.82pair/s]
Calculating routes:  20%|██        | 9/45 [00:04<00:19,  1.82pair/s]
Calculating routes:  22%|██▏       | 10/45 [00:05<00:19,  1.76pair/s]
Calculating routes:  24%|██▍       | 11/45 [00:06<00:19,  1.78pair/s]
Calculating routes:  27%|██▋       | 12/45 [00:06<00:18,  1.78pair/s]
Calculating routes:  29%|██▉       | 13/45 [00:07<00:17,  1.79pair/s

Processed 37 out of 45 route segments
✓ Completed distance matrix for cluster 0

STEP 3: Running Adaptive Large Neighborhood Search optimization...
Creating initial solution with greedy insertion...
Starting ALNS with 50 iterations...


New best! Cost: 64.17: 100%|██████████| 50/50 [00:07<00:00,  6.48iter/s, best_cost=64.17, current_cost=71.76]


ALNS completed. Best solution has 1 routes with total cost 64.17

✅ Optimization complete in 39.0 seconds!
Created 1 routes with total distance: 82.52km
Total time: 3.88hrs, Total weight: 104.00

✅ Optimization complete in 39.3 seconds!
✅ Distance cache setup complete at C:\Users\User\Dropbox\Personal\CareerFoundry\06 Sourcing data\Anaconda\03 Scripts\cache\distance_cache.db
Enriching routes with depot information...
Processing 1 routes to enhance data...
Processing route 1 for depot Production LOO
Route 1 has 12 location_ids: ['DEPOT_START', 3639184.0, 3424427.0, 3623344.0, 128.0, 3638912.0, 3319758.0, 3340737.0, 3642198.0, 3642198.0, 3632290.0, 'DEPOT_END']
Calculating missing segment data for route 1
Calculating 11 segments for route 1


Processing routes: 100%|██████████| 1/1 [00:00<00:00, 83.47route/s]

Added DEPOT_START for route 1
Added customer stop for location_id 3639184.0 in route 1, sequence 1
Added customer stop for location_id 3424427.0 in route 1, sequence 2
Added customer stop for location_id 3623344.0 in route 1, sequence 3
Added customer stop for location_id 128.0 in route 1, sequence 4
Added customer stop for location_id 3638912.0 in route 1, sequence 5
Added customer stop for location_id 3319758.0 in route 1, sequence 6
Added customer stop for location_id 3340737.0 in route 1, sequence 7
Added customer stop for location_id 3642198.0 in route 1, sequence 8
Added customer stop for location_id 3642198.0 in route 1, sequence 8
⚠️ Skipping duplicate location_id 3642198.0 in route 1
Added customer stop for location_id 3632290.0 in route 1, sequence 10
Added DEPOT_END for route 1
Route 1 has 12 total stops (9 customers)
Assigned position 0 to DEPOT_START in route 1
Assigned position 1 to CUSTOMER in route 1
Assigned position 2 to CUSTOMER in route 1
Assigned position 3 to CUST





=== CACHE STATISTICS ===
Distance cache contains 26871 pre-calculated routes
Cache is stored at: C:\Users\User\Dropbox\Personal\CareerFoundry\06 Sourcing data\Anaconda\03 Scripts\cache\distance_cache.db
Next time you run optimization on this data, it will be much faster!
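
The cache statistics above come from a row count on the `distance_cache` table. A minimal sketch of that query, assuming a hypothetical helper `count_cached_distances` and an invented three-column table layout for the demonstration database (the notebook's real schema is not shown here), closes the connection even when the query fails:

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def count_cached_distances(db_path):
    """Return the number of rows in the distance_cache table.

    Hypothetical helper for illustration; only the table name is taken
    from the notebook.
    """
    # closing() guarantees conn.close() runs even if the query raises
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute("SELECT COUNT(*) FROM distance_cache").fetchone()[0]


# Demonstration against a throwaway database; the columns below are assumptions
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_db = os.path.join(tmp_dir, "demo_cache.db")
    with closing(sqlite3.connect(demo_db)) as conn:
        conn.execute(
            "CREATE TABLE distance_cache (origin TEXT, destination TEXT, distance_km REAL)"
        )
        conn.executemany(
            "INSERT INTO distance_cache VALUES (?, ?, ?)",
            [("a", "b", 1.5), ("b", "c", 2.0)],
        )
        conn.commit()
    demo_count = count_cached_distances(demo_db)

print(f"Distance cache contains {demo_count} pre-calculated routes")
```

Note that `sqlite3.connect` used directly as a context manager only manages transactions and does not close the connection, which is why `contextlib.closing` is used here.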

