# Transportation Optimization Analysis

## Objective
This notebook demonstrates how to assign customers to Distribution Centers (DCs), and where to place those DCs, so that each customer is close to the DC that serves it.

We will:
1. Generate synthetic data for customer locations and demands.
2. Use an **unsupervised machine learning algorithm (K-Means clustering)** to group customers into regions and place one Distribution Center at each cluster centroid, using straight-line distance as a proxy for transportation cost.
3. Visualize the results on an interactive map using `folium`.


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
import folium

# Set visual style
sns.set_style('whitegrid')

## 1. Data Generation
We will generate random customer locations, each with a random demand, inside a bounding box that roughly covers the US East Coast.

In [2]:
np.random.seed(42)

# Configuration
n_customers = 200
region_lat_range = (32.0, 42.0)  # Approx US East Coast latitude
region_lon_range = (-82.0, -72.0) # Approx US East Coast longitude

# Generate Data
customer_data = pd.DataFrame({
    'customer_id': range(n_customers),
    'lat': np.random.uniform(region_lat_range[0], region_lat_range[1], n_customers),
    'lon': np.random.uniform(region_lon_range[0], region_lon_range[1], n_customers),
    'demand': np.random.randint(10, 500, n_customers)
})

print("Sample Customer Data:")
customer_data.head()

Sample Customer Data:


   customer_id        lat        lon  demand
0            0  35.745401 -75.579684     370
1            1  41.507143 -81.158600     295
2            2  39.319939 -80.383713     282
3            3  37.986585 -73.014458     378
4            4  33.560186 -75.935709      71


## 2. ML Optimization: K-Means Clustering
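Clustering customers by geographic location gives us a "centroid" for each cluster: the mean latitude and longitude of its members. Placing a Distribution Center at each centroid minimizes the sum of squared straight-line distances to the customers in that cluster, so within this simplified model the centroids are the **optimal locations**. The model treats every customer equally (the `demand` column is not used) and measures distance in degrees of latitude and longitude rather than road distance.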
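As a minimal sketch of why a centroid is a natural DC candidate, the snippet below checks numerically that the arithmetic mean of a cluster's points gives a lower sum of squared Euclidean distances than nearby alternatives. The four points and the helper `sum_squared_distance` are hypothetical and chosen only for illustration. They are not part of the notebook's data.

```python
import numpy as np

# Hypothetical toy cluster of (lat, lon) points, not taken from the notebook's data.
points = np.array([
    [35.0, -78.0],
    [36.0, -77.0],
    [37.5, -79.5],
    [34.5, -76.5],
])

def sum_squared_distance(center, pts):
    """Sum of squared Euclidean distances from each point to `center`."""
    return float(((pts - center) ** 2).sum())

# The K-Means centroid of a cluster is the arithmetic mean of its members.
centroid = points.mean(axis=0)
best = sum_squared_distance(centroid, points)

# Nudging the center in any of these directions increases the objective.
for shift in ([0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1]):
    moved = centroid + np.array(shift)
    assert sum_squared_distance(moved, points) > best

print("centroid:", centroid, "objective:", round(best, 4))
```

The objective here is a squared distance in degrees, so it is a convenient proxy rather than a travel cost.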

In [3]:
# Define the number of Distribution Centers we want to open
n_dcs = 5

# Prepare features for clustering (Latitude and Longitude)
X = customer_data[['lat', 'lon']]

# Apply K-Means
kmeans = KMeans(n_clusters=n_dcs, random_state=42, n_init=10)
customer_data['cluster_label'] = kmeans.fit_predict(X)

# Get the coordinates of the optimal DC locations (cluster centers)
dc_locations = kmeans.cluster_centers_
dc_data = pd.DataFrame(dc_locations, columns=['lat', 'lon'])
dc_data['dc_id'] = range(n_dcs)

print("Optimal Distribution Center Locations:")
print(dc_data)

Optimal Distribution Center Locations:
         lat        lon  dc_id
0  33.969758 -74.398236      0
1  36.316758 -78.516473      1
2  39.255251 -74.591371      2
3  40.355051 -79.619228      3
4  33.647223 -80.557255      4
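The clustering above does not use the `demand` column. One possible variation, sketched here under the assumption that demand is a fair proxy for shipment volume, is to pass demand as `sample_weight` so that high-demand customers pull their DC closer. The names `customers`, `weighted_kmeans`, and `weighted_dcs` are introduced only for this example, and the data is regenerated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Rebuild the synthetic customers the same way as the data-generation cell.
np.random.seed(42)
n_customers = 200
customers = pd.DataFrame({
    'lat': np.random.uniform(32.0, 42.0, n_customers),
    'lon': np.random.uniform(-82.0, -72.0, n_customers),
    'demand': np.random.randint(10, 500, n_customers),
})

# Assumption: weighting by demand is appropriate for this business case.
# sample_weight makes each customer count in proportion to its demand.
weighted_kmeans = KMeans(n_clusters=5, random_state=42, n_init=10)
customers['weighted_label'] = weighted_kmeans.fit_predict(
    customers[['lat', 'lon']],
    sample_weight=customers['demand'],
)

# Each center is now a demand-weighted location rather than a plain mean.
weighted_dcs = pd.DataFrame(weighted_kmeans.cluster_centers_, columns=['lat', 'lon'])
print(weighted_dcs)
```

Cluster numbering is arbitrary, so compare the weighted and unweighted centers by location rather than by index.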




## 3. Visualization
We will visualize the solution using a `folium` map.
- **Black Stars**: Distribution Centers (Optimal Locations)
- **Colored Dots**: Customers (Color-coded by assigned DC)
- **Lines**: Connecting customers to their assigned DC

In [4]:
# Initialize Map focused on the average location
center_lat = customer_data['lat'].mean()
center_lon = customer_data['lon'].mean()
m = folium.Map(location=[center_lat, center_lon], zoom_start=6, tiles='CartoDB positron')

# Define color map for clusters
colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred', 'cadetblue', 'darkgreen']

# 1. Plot Customers and connections
for idx, row in customer_data.iterrows():
    cluster_id = int(row['cluster_label'])
    color = colors[cluster_id % len(colors)]
    
    # Customer Marker
    folium.CircleMarker(
        location=[row['lat'], row['lon']],
        radius=4,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.7,
        popup=f"Customer {int(row['customer_id'])} | Demand: {int(row['demand'])}"
    ).add_to(m)
    
    # Line to DC
    dc_lat = dc_data.iloc[cluster_id]['lat']
    dc_lon = dc_data.iloc[cluster_id]['lon']
    
    folium.PolyLine(
        locations=[[row['lat'], row['lon']], [dc_lat, dc_lon]],
        color=color,
        weight=0.5,
        opacity=0.4
    ).add_to(m)

# 2. Plot DCs (Centroids)
for idx, row in dc_data.iterrows():
    folium.Marker(
        location=[row['lat'], row['lon']],
        popup=f"<b>Distribution Center {int(row['dc_id'])}</b>",
        icon=folium.Icon(color='black', icon='star', prefix='fa')
    ).add_to(m)

# Display map
m

## 4. Conclusion
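The map above shows the K-Means solution: each cluster is served by a DC at its centroid, which keeps the straight-line distance between customers and their assigned DC short. Because the model ignores demand, road networks, and DC capacity, these locations are best read as a starting point for network design rather than a final answer.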
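To express distances in kilometres rather than degrees, a great-circle (haversine) distance can be used. The sketch below is illustrative only: the helper `haversine_km` and the constant `EARTH_RADIUS_KM` are names introduced here, the DC coordinates are copied from the printed K-Means output, and the customer is the first row of the sample data. Great-circle distance is still an approximation of real road distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# DC coordinates copied from the printed K-Means output.
dcs = {
    0: (33.969758, -74.398236),
    1: (36.316758, -78.516473),
    2: (39.255251, -74.591371),
    3: (40.355051, -79.619228),
    4: (33.647223, -80.557255),
}

# First customer from the sample output (customer_id 0).
customer = (35.745401, -75.579684)

# Distance from this customer to every DC, then pick the closest one.
distances = {dc_id: haversine_km(*customer, *loc) for dc_id, loc in dcs.items()}
nearest = min(distances, key=distances.get)
print(f"Nearest DC by great-circle distance: {nearest} ({distances[nearest]:.1f} km)")
```

Applying the same function to every customer and its assigned DC would give a mean distance in kilometres for comparing different values of `n_dcs`.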