# Introduction

This is my solution to the Hash Code drone delivery problem, using optimization and routing routines from the Google OR-Tools package. Let me know what you think and feel free to leave an upvote if you like this kernel! Please comment if you have any ideas on how to improve this code - there surely are lots of improvements possible.

Credits for data extraction:
[Application of Google OR-Tools](https://www.kaggle.com/jpmiller/application-of-google-or-tools)

The general process of this solution is quite straightforward:
* We determine which warehouse each product unit is delivered from so that the total delivery distance is minimal. Units are not redistributed between warehouses; we simply look at where products are now and where they need to go, and deliver them on the most direct route possible. Certain products are stored in only one or two warehouses and may need to be delivered across the entire map.
* For each warehouse separately, we look at all deliveries that start there and combine them into delivery routes as efficiently as possible, maximizing the load utilization and minimizing the travel distance of each route. We group orders by their "difficulty" (i.e., total weight and distance) to get a higher score, i.e. to minimize the overall waiting time until orders are completed.
* Combining all routes from all warehouses, we schedule each drone such that the end of one route is as close as possible to the start of the next one (this is the most computing-intensive part). We do this in steps, such that "easy/fast" orders are completed first for a higher score.
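
The focus on finishing "easy/fast" orders first follows from how a solution is scored. The snippet below is only a sketch, assuming the scoring rule from the Hash Code problem statement (an order completed in turn `t` of a simulation with `T` turns earns `ceil((T - t) / T * 100)` points); the helper name `order_score` is made up for this illustration and is not used elsewhere in this kernel.

```python
def order_score(completion_turn: int, total_turns: int) -> int:
    """Points for one order, assuming score = ceil((T - t) / T * 100)."""
    remaining = total_turns - completion_turn
    # integer ceiling division avoids floating point rounding issues
    return -(-(100 * remaining) // total_turns)


# the later an order is completed, the fewer points it is worth
for turn in (0, 500, 999):
    print(turn, order_score(turn, 1000))
```

Under this assumption every order is worth the same maximum, regardless of its size, which is why completing many small orders early tends to pay off.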

Let's start by importing some handy packages:

In [None]:
from datetime import datetime
import matplotlib.cm as mcm
import matplotlib.pyplot as plt
import numpy as np
from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp
from ortools.graph import pywrapgraph
import pandas as pd
from tqdm import tqdm

# Settings
Set some parameters which change performance a bit (100k points should be possible with a wide range of these).

In [None]:
RND_STATE = 123123
# fixing random seed
# only used for plotting

LOCSPLIT = 1
# how often to split the delivery map of a single warehouse before optimizing delivery routes
# LOCSPLIT = 1, no map splitting is performed
# LOCSPLIT = 2, map is split into 4 sections (2 on the row level and 2 on the column level)
# LOCSPLIT = 3, map is split into 9 sections
# etc etc


WDSPLIT = 8
# how often to split list of deliveries of a single warehouse by priority (weight-distance)
# before optimizing delivery routes

BATCHES = 10
# number of route batches which are to be turned into schedules
# processing time increases significantly with fewer batches
# (higher = more focus on priority, lower = more focus on efficiency)

# Extract Data
The first step is extracting the data into a useful format. Credits to the kernel [Data Extraction and EDA](https://www.kaggle.com/jpmiller/demo-one-delivery-per-drone), from which I essentially copied this code.
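
To make the line offsets used in the cell below easier to follow, here is a minimal plain-Python sketch of the same file layout (parameters, product weights, two lines per warehouse, three lines per order). The function name `parse_problem` and the tiny sample input are made up for illustration; the real kernel reads `busy_day.in` into pandas dataframes instead.

```python
def parse_problem(text):
    """Parse a drone delivery input into plain Python structures."""
    lines = text.strip().splitlines()
    rows, cols, drones, turns, max_load = map(int, lines[0].split())
    weights = [int(w) for w in lines[2].split()]

    # warehouses: one location line followed by one inventory line each
    n_wh = int(lines[3])
    warehouses = []
    for w in range(n_wh):
        loc = tuple(int(v) for v in lines[4 + 2 * w].split())
        inv = [int(v) for v in lines[5 + 2 * w].split()]
        warehouses.append((loc, inv))

    # orders: location line, item count line, item list line
    order_count_line = 4 + 2 * n_wh
    n_orders = int(lines[order_count_line])
    orders = []
    for o in range(n_orders):
        base = order_count_line + 1 + 3 * o
        loc = tuple(int(v) for v in lines[base].split())
        items = [int(v) for v in lines[base + 2].split()]
        orders.append((loc, items))

    params = {'rows': rows, 'cols': cols, 'drones': drones,
              'turns': turns, 'max_load': max_load}
    return params, weights, warehouses, orders


# made-up miniature input: 2 products, 1 warehouse, 2 orders
SAMPLE = """10 10 1 20 100
2
40 60
1
0 0
3 1
2
5 5
1
0
7 2
2
0 1
"""
params, weights, warehouses, orders = parse_problem(SAMPLE)
print(params, weights, warehouses, orders)
```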

In [None]:
print('Extracting data')

# =============================================================================
# load problem file
# =============================================================================
with open('/kaggle/input/hashcode-drone-delivery/busy_day.in') as file:
    line_list = file.read().splitlines()
    
# =============================================================================
# problem parameters
# =============================================================================
ROWS, COLS, DRONES, TURNS, MAXLOAD = map(int, line_list[0].split())
   
# =============================================================================
# load product information
# =============================================================================
weights = line_list[2].split()
products_df = pd.DataFrame({'weight': weights})

wh_count = int(line_list[3])
wh_endline = (wh_count*2)+4

wh_invs = line_list[5:wh_endline+1:2]
for i, wh_inv in enumerate(wh_invs):
    products_df[f'wh{i}_inv'] = wh_inv.split()

# products_df has shape [400,11]
# (# of products, [weight, wh0_inv, wh1_inv,...])
products_df = products_df.astype(int)

# =============================================================================
# load warehouse locations
# =============================================================================
wh_locs = line_list[4:wh_endline:2]
wh_rows = [wl.split()[0] for wl in wh_locs]
wh_cols = [wl.split()[1] for wl in wh_locs]

warehouse_df = pd.DataFrame(
    {'wh_row': wh_rows, 'wh_col': wh_cols}).astype(np.uint16)

# =============================================================================
# load order information
# =============================================================================
order_locs = line_list[wh_endline+1::3]
o_rows = [ol.split()[0] for ol in order_locs]
o_cols = [ol.split()[1] for ol in order_locs]

orders_df = pd.DataFrame({'row': o_rows, 'col': o_cols})

orders_df[orders_df.duplicated(keep=False)].sort_values('row')

orders_df['product_count'] = line_list[wh_endline+2::3]

order_array = np.zeros((len(orders_df), len(products_df)), dtype=np.uint16)
orders = line_list[wh_endline+3::3]
for i,order in enumerate(orders):
    products = [int(prod) for prod in order.split()]
    for p in products:
        order_array[i, p] += 1

df = pd.DataFrame(data=order_array,
                  columns=['p_'+ str(i) for i in range(len(products_df))],
                  index=orders_df.index)

orders_df = orders_df.astype(int).join(df)

print('... success')

Now, let's look at the data as a map, plotting all warehouses (colors) and all orders (black).

In [None]:
cmap = mcm.get_cmap('plasma')
wh_colors = [cmap(iii) for iii in np.linspace(0, 1, num=10)]

fig = plt.figure(figsize=(12,8))
ax = plt.subplot()
ax.scatter(warehouse_df['wh_col'], warehouse_df['wh_row'],
           color=wh_colors, ec='k', s=48, zorder=10)
ax.scatter(orders_df['col'], orders_df['row'],
           color='k', s=1)
ax.set_xlabel('Column')
ax.set_ylabel('Row')
ax.axis('equal')
plt.show()

# Optimize product distribution
Now the first real step toward a solution is to figure out where products are needed and where they are currently located. We look at this for each product separately, determining the current storage location (source) for all available units and all locations where this product needs to be delivered to (sink). From this information we can figure out the optimal way of distributing this product to all customers while minimizing the total distance which has to be covered to satisfy all needs.
This is done by constructing a graph connecting all sources and sinks, where the cost of each arc is equal to the distance between the source (warehouse) and the sink (customer) it connects. An additional "overflow" node absorbs inventory that no order needs. A solution is found by performing a minimum-cost flow optimization, and the results for each product are saved in a combined array.
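
To illustrate what is being minimized, here is a toy version for a single product that simply tries every possible assignment. This brute-force search is a stand-in for illustration only (it is not the min-cost flow solver used in the cell below and would not scale to the real data); the names `ceil_dist` and `best_assignment` and the coordinates are made up.

```python
import itertools
import math


def ceil_dist(a, b):
    """Euclidean distance between two (row, col) points, rounded up."""
    return math.ceil(math.hypot(a[0] - b[0], a[1] - b[1]))


def best_assignment(warehouses, inventory, customers):
    """Try every warehouse choice per customer (one unit each) and keep
    the cheapest plan that respects the warehouse inventories."""
    best_cost, best_plan = None, None
    for plan in itertools.product(range(len(warehouses)), repeat=len(customers)):
        # skip plans that take more units from a warehouse than it holds
        if any(plan.count(w) > inventory[w] for w in range(len(warehouses))):
            continue
        cost = sum(ceil_dist(warehouses[w], customers[c])
                   for c, w in enumerate(plan))
        if best_cost is None or cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_cost, best_plan


# two warehouses holding 1 and 2 units, three customers needing 1 unit each
cost, plan = best_assignment(
    warehouses=[(0, 0), (10, 0)],
    inventory=[1, 2],
    customers=[(1, 0), (2, 0), (9, 0)])
print(cost, plan)
```

Note that the second customer is closer to warehouse 0, but that warehouse only holds one unit, so the cheapest feasible plan serves it from warehouse 1.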

In [None]:
print('Optimizing product distribution')

# we use this array to note which warehouse
# each item is to be delivered from
from_wh_df = np.full((len(orders_df), len(products_df), np.max(order_array)), -1)

# number of warehouses
NWH = len(warehouse_df)

for ppp in range(len(products_df)):
    # how many units of this product are in each warehouse
    sources = warehouse_df.copy()
    sources['inv'] = products_df.loc[ppp][1:].values
    sources = sources[sources['inv'] > 0]
    
    # where does this product need to be delivered to
    sinks = orders_df.loc[:, ['row', 'col', 'p_{}'.format(ppp)]]
    sinks = sinks[sinks['p_{}'.format(ppp)] > 0]
    
    # set up simple min cost flow solver
    min_cost_flow = pywrapgraph.SimpleMinCostFlow()
    
    # add all arcs
    for iii in sources.index:
        for jjj in sinks.index:
            dist = np.ceil(np.sqrt(
                (sources.loc[iii, 'wh_row'] - sinks.loc[jjj, 'row'])**2 +
                (sources.loc[iii, 'wh_col'] - sinks.loc[jjj, 'col'])**2
                ))
            # add NWH to keep nodes uniquely identifiable
            min_cost_flow.AddArcWithCapacityAndUnitCost(
                iii, jjj+NWH, int(sinks.loc[jjj, 'p_{}'.format(ppp)]), int(dist))
        # add arcs to "overflow" node collecting excess product
        min_cost_flow.AddArcWithCapacityAndUnitCost(
                iii, int(1e4), int(1e4), int(1e4))
    
    # add supplies
    for iii in sources.index:
        min_cost_flow.SetNodeSupply(iii, int(sources.loc[iii, 'inv']))
    for jjj in sinks.index:
        min_cost_flow.SetNodeSupply(jjj+NWH, -int(sinks.loc[jjj, 'p_{}'.format(ppp)]))
    min_cost_flow.SetNodeSupply(
        int(1e4), int(sinks['p_{}'.format(ppp)].sum()) - int(sources['inv'].sum()))
        
    # solve and put result into from_wh_df
    if min_cost_flow.Solve() == min_cost_flow.OPTIMAL:
        for iii in range(min_cost_flow.NumArcs()):
            if not min_cost_flow.Flow(iii):
                continue
            if min_cost_flow.UnitCost(iii) == int(1e4):
                continue
            # need to subtract NWH again to get actual order number
            thisorder = min_cost_flow.Head(iii)-NWH
            thiswh = int(min_cost_flow.Tail(iii))
            thisnum = min_cost_flow.Flow(iii)
            for ttt in range(from_wh_df.shape[-1]):
                if not thisnum:
                    break
                if from_wh_df[thisorder, ppp, ttt] == -1:
                    from_wh_df[thisorder, ppp, ttt] = thiswh
                    thisnum -= 1
    else:
        print('product_id', ppp, 'distribution could not be optimized')
        raise Exception(f'product_id {ppp}: distribution could not be optimized')

print('... success')

# Define "weight-distance"
Weight-distance indicates the "difficulty" of an order, i.e. the amount of resources needed for its fulfillment. We define it as the sum of the product **weight units * distance units** for each product which is to be delivered in order to fully ship this order.

For example, order A includes product 1 with weight 50, which needs to be shipped over a distance of 10 distance units (50 * 10 = 500 wd units), and product 2 with weight 100, which needs to be shipped over a distance of 150 distance units (100 * 150 = 15000 wd units), for a total WD score of 15500.
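
The same arithmetic as a tiny sketch (the function name `weight_distance` is made up for this illustration; the cell further below computes the score for all orders at once with NumPy):

```python
def weight_distance(items):
    """Sum of weight * distance over all (weight, distance) pairs of an order."""
    return sum(weight * dist for weight, dist in items)


# order A from the example: (weight, distance) for product 1 and product 2
order_a = [(50, 10), (100, 150)]
print(weight_distance(order_a))  # 15500
```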

We use these scores to 
* bundle products for easy-to-finish orders together on single delivery flights (routes)
* give these routes priority when assigning delivery schedules (combinations of routes)

in order to finish "easy/quick" orders first and longer ones later. In a real-life problem this might not be the perfect way of optimizing, as we prioritize finishing certain orders before others at the cost of a somewhat less optimal (but still pretty good) resource utilization (drones might not be loaded as much as they could be, or might travel a bit further than absolutely necessary).

In [None]:
weight_dist = np.zeros((len(orders_df)), dtype=int)
weight_sum = np.zeros((len(orders_df)), dtype=int)

for www in warehouse_df.index:
    tmp = np.where(from_wh_df == www)
    weight_this_wh = np.zeros((len(orders_df)), dtype=int)
    for iii in range(len(tmp[0])):
        weight_this_wh[tmp[0][iii]] += products_df.loc[tmp[1][iii], 'weight']

    weight_sum += weight_this_wh
    weight_dist += np.ceil(np.sqrt(
        (orders_df['row'] - warehouse_df.loc[www,'wh_row'])**2 +
        (orders_df['col'] - warehouse_df.loc[www,'wh_col'])**2
        )).astype(int) * weight_this_wh
    
orders_df['weight_dist'] = weight_dist
orders_df['weight_sum'] = weight_sum

# Create delivery routes
We now know all single deliveries which need to be executed and try to combine them into delivery routes where load utilization is maximum and travel distance is minimum. Essentially, we combine products with appropriate weights which need to be delivered from the same warehouse into the same region of the map into efficient delivery routes using Google OR-Tools' routing logic.

Depending on the number of products to be delivered, calculating *all* routes for *all* products shipped from the same warehouse in one go can be inefficient and slow. We can therefore form subgroups by delivery location before attempting this optimization.

However, to obtain better scores (delivering orders as early as possible, rather than as efficiently as possible), we can instead form the subgroups by the weight-distance score introduced above.

Or simply a combination of the two.
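
As a small sketch of the weight-distance grouping, the helper below (its name `split_bounds` is made up) reproduces the index arithmetic used for `WDSPLIT` in the following cell: a list sorted by weight-distance is cut into consecutive, roughly equal slices, and each slice is routed separately.

```python
def split_bounds(n_items, n_groups):
    """(start, stop) slice bounds cutting n_items into n_groups consecutive parts."""
    return [(int(g / n_groups * n_items), int((g + 1) / n_groups * n_items))
            for g in range(n_groups)]


# 10 deliveries sorted by weight-distance, split into 4 priority groups
bounds = split_bounds(10, 4)
print(bounds)  # [(0, 2), (2, 5), (5, 7), (7, 10)]
```

Groups can be empty when there are fewer items than groups, which is why the cell below skips slices of length zero.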

In [None]:
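try:
    routes_df = pd.read_pickle('routes_df')
    # raising here discards the cached file and forces a recalculation;
    # remove this line to reuse previously saved routes
    raise Exception
except Exception: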
    # dataframe to store all single routes
    routes_df = pd.DataFrame()

    # for each warehouse
    for www in warehouse_df.index:
        print(f'Calculating routes from warehouse {www}')

        # get list with all orders receiving items from this warehouse and
        # make dataframe of sink locations for this warehouse
        #####itemtable = from_wh_df.drop(['row','col','product_count'], axis=1)
        idx = np.where(from_wh_df == www)
        orderlist = idx[0]
        rowlist = orders_df.loc[idx[0],'row'].values
        collist = orders_df.loc[idx[0],'col'].values
        itemlist = idx[1]
        weightlist = products_df['weight'].iloc[itemlist].values
        distlist = np.ceil(np.sqrt((warehouse_df.loc[www,'wh_col'] - collist) ** 2 + 
                                   (warehouse_df.loc[www,'wh_row'] - rowlist) ** 2)).astype(int)
        weightdistlist = orders_df.loc[idx[0], 'weight_dist'].values


        all_sinks_df = pd.DataFrame(
            {'order': orderlist,
             'row': rowlist,
             'col': collist,
             'item': itemlist,
             'weight': weightlist,
             'dist': distlist,
             'weight_dist': weightdistlist})

        # split into sections by delivery location on map before calculating routes
        # by col
        sort_col = all_sinks_df['col'].sort_values().values
        lims_col = np.linspace(0, len(sort_col), num=LOCSPLIT+1).astype(int)
        lims_col = [0, *sort_col[lims_col[1:-1]], COLS]        
        for ccc in range(LOCSPLIT):
            mincol = lims_col[ccc]
            maxcol = lims_col[ccc+1]
            c_sinks_df = all_sinks_df[(all_sinks_df['col'] >= mincol) &
                                      (all_sinks_df['col'] < maxcol)]
            # by row
            sort_row = c_sinks_df['row'].sort_values().values
            lims_row = np.linspace(0, len(sort_row), num=LOCSPLIT+1).astype(int)
            lims_row = [0, *sort_row[lims_row[1:-1]], ROWS]
            for rrr in range(LOCSPLIT):
                minrow = lims_row[rrr]
                maxrow = lims_row[rrr+1]
                r_sinks_df = c_sinks_df[(c_sinks_df['row'] >= minrow) &
                                    (c_sinks_df['row'] < maxrow)]
                # by weight-distance (group "easier"/faster orders)
                wd_sinks_df = r_sinks_df.sort_values(by='weight_dist').reset_index(drop=True)
                for ddd in range(WDSPLIT):
                    minidx = int(ddd / WDSPLIT * len(wd_sinks_df))
                    maxidx = int((ddd+1) / WDSPLIT * len(wd_sinks_df))
                    sinks_df = wd_sinks_df[minidx:maxidx].reset_index(drop=True)
                    
                    print('Warehouse:', www, '\t section:', ccc, rrr, ddd, '\t # of items:', len(sinks_df))
                    if not len(sinks_df):
                        continue

                    # add warehouse to list of nodes
                    sinks_df.loc[len(sinks_df)] = [
                        -1, warehouse_df.loc[www,'wh_row'], warehouse_df.loc[www,'wh_col'], -1, 0, 0, 0] 

                    # calculate distance matrix
                    R, C = np.meshgrid(sinks_df['row'].values, sinks_df['col'].values)
                    distance_matrix = np.ceil(np.sqrt((R-R.T)**2 + (C-C.T)**2)).astype(int)
                    # set distances for return to warehouse from any point to zero
                    # i.e., choose an arbitrary end for the route
                    distance_matrix[:-1, -1] = 0

                    # maximum number of vehicles (routes) allowed
                    NV = int(len(sinks_df))

                    # set up routing index manager
                    manager = pywrapcp.RoutingIndexManager(
                        len(distance_matrix),                 # problem size
                        NV,                                   # max number of vehicles (routes)
                        len(distance_matrix)-1                # start node identification
                        )
                    # create routing model
                    routing = pywrapcp.RoutingModel(manager)

                    # create and register a transit callback
                    def distance_callback(from_index, to_index):
                        # convert from routing variable Index to distance matrix NodeIndex.
                        from_node = manager.IndexToNode(from_index)
                        to_node = manager.IndexToNode(to_index)
                        return distance_matrix[from_node][to_node]
                    transit_callback_index = routing.RegisterTransitCallback(distance_callback)

                    # define arc costs
                    routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)

                    # set up weight constraints    
                    def demand_callback(from_index):
                        from_node = manager.IndexToNode(from_index)
                        return int(sinks_df['weight'][from_node])
                    demand_callback_index = routing.RegisterUnaryTransitCallback(
                        demand_callback)
                    routing.AddDimension(
                        demand_callback_index,
                        0,                                     # no capacity slack
                        MAXLOAD,                               # maximum drone weight
                        True,                                  # start cumul to zero
                        'Capacity')
                    
                    # set first solution heuristic
                    search_parameters = pywrapcp.DefaultRoutingSearchParameters()
                    search_parameters.first_solution_strategy = (
                        routing_enums_pb2.FirstSolutionStrategy.AUTOMATIC)

                    # solve
                    solution = routing.SolveWithParameters(search_parameters)

                    # create routes from solution    
                    for vehicle_id in range(NV):
                        index = routing.Start(vehicle_id)
                        route_distance = 0
                        route_load = 0
                        route_nodes = []
                        while not routing.IsEnd(index):
                            node_index = manager.IndexToNode(index)

                            previous_index = index
                            index = solution.Value(routing.NextVar(index))

                            this_dist = routing.GetArcCostForVehicle(
                                previous_index, index, vehicle_id)

                            route_distance += this_dist
                            route_load += sinks_df.loc[node_index, 'weight']
                            route_nodes.append(node_index)
                        if not route_distance:
                            continue
                        
                        # pre-build command structure for submission
                        items = sinks_df.loc[route_nodes[1:], 'item'].values
                        orders = sinks_df.loc[route_nodes[1:], 'order'].values.astype(int)
                        cmds = []                        
                        # load 
                        unique_items, item_count = np.unique(items, return_counts=True)
                        for i in range(len(unique_items)):
                            cmds.append('L {} {} {}'.format(www, unique_items[i], item_count[i]))
                        # deliver 
                        unique_orders, order_idx = np.unique(orders, return_index=True)
                        unique_orders = unique_orders[np.argsort(order_idx)]
                        for o in unique_orders:
                            orderitems = items[orders == o]
                            uq_oi, uq_oicnt = np.unique(orderitems, return_counts=True)
                            for i in range(len(uq_oi)):
                                cmds.append('D {} {} {}'.format(o, uq_oi[i], uq_oicnt[i]))
                                
                        # save useful route details in routes_df
                        routes_df = routes_df.append(
                            {'start_row': sinks_df.loc[route_nodes[0], 'row'],
                             'start_col': sinks_df.loc[route_nodes[0], 'col'],
                             'end_row': sinks_df.loc[route_nodes[-1], 'row'],
                             'end_col': sinks_df.loc[route_nodes[-1], 'col'],
                             'dist': route_distance,
                             'weight': route_load,
                             'weight_dists': list(sinks_df.loc[
                                 route_nodes[1:], 'weight_dist']),
                             'orders': list(orders),
                             'order_dists': list(sinks_df.loc[
                                 route_nodes[1:], 'dist']),
                             'cmds': cmds},
                            ignore_index=True)
                        
    # save optimized route list to pickle for later use
    pd.to_pickle(routes_df, 'routes_df')

The dataframe <code>routes_df</code> holds all the information we need for solving this problem, including pre-built command structures to give to whichever drone will eventually fly a given route.
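
The pre-built commands follow the pattern used in the cell above: `L <warehouse> <item> <count>` for loading and `D <order> <item> <count>` for delivering. Here is a minimal standalone sketch of that bundling step; the function name `build_commands` and the sample data are made up, and it uses `collections.Counter` instead of `np.unique`.

```python
from collections import Counter


def build_commands(warehouse, deliveries):
    """Build load/deliver command strings for one route.

    deliveries: list of (order, item) pairs in the order they are visited.
    """
    cmds = []
    # load: one command per product type with the total count
    load_counts = Counter(item for _, item in deliveries)
    for item in sorted(load_counts):
        cmds.append(f'L {warehouse} {item} {load_counts[item]}')
    # deliver: keep the visiting order of the orders, bundle items per order
    per_order = {}
    for order, item in deliveries:
        per_order.setdefault(order, Counter())[item] += 1
    for order, counts in per_order.items():
        for item in sorted(counts):
            cmds.append(f'D {order} {item} {counts[item]}')
    return cmds


print(build_commands(3, [(7, 2), (7, 2), (5, 9)]))
```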

In [None]:
routes_df.head()

To get a measure of the efficiency of this assignment, we can check the average weight a drone leaves a depot with - here more than 90% load utilization.
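
Expressed as a fraction of the maximum load, the same measure could be sketched as follows (the helper name `load_utilization` and the sample weights are made up; in this kernel the inputs would be `routes_df['weight']` and `MAXLOAD`):

```python
def load_utilization(route_weights, max_load):
    """Average loaded weight per route as a fraction of the drone capacity."""
    return sum(route_weights) / len(route_weights) / max_load


# three made-up routes for a drone that can carry 200 weight units
print(load_utilization([180, 200, 190], 200))
```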

In [None]:
routes_df['weight'].mean()

Let's have a look at some random routes. Lines are color-coded by the warehouse which they start from.

In [None]:
cmap = mcm.get_cmap('plasma')
wh_colors = [cmap(iii) for iii in np.linspace(0, 1, num=10)]

fig = plt.figure(figsize=(12,8))
ax = plt.subplot()
# plot warehouses
ax.scatter(warehouse_df['wh_col'], warehouse_df['wh_row'],
           color=wh_colors, ec='k', s=48, zorder=10)

selec_routes = routes_df.sample(n=50, random_state=RND_STATE)

for iii in range(len(selec_routes.index)):
    # get orders covered by this route
    idxs = np.unique(selec_routes.iloc[iii]['orders'], return_index=True)[1]
    tmp_orders = [selec_routes.iloc[iii]['orders'][idx] for idx in sorted(idxs)]
    # get warehouse this route starts from
    tmp_wh = warehouse_df[(selec_routes['start_col'].iloc[iii] == warehouse_df['wh_col']) & 
                          (selec_routes['start_row'].iloc[iii] == warehouse_df['wh_row'])].index[0]
    # plot drop-off locations
    ax.scatter(orders_df.loc[tmp_orders]['col'], orders_df.loc[tmp_orders]['row'],
               color=wh_colors[tmp_wh], s=20)
    # plot route lines
    tmp_cols = np.concatenate([[selec_routes['start_col'].iloc[iii]], orders_df.loc[tmp_orders]['col'].values])
    tmp_rows = np.concatenate([[selec_routes['start_row'].iloc[iii]], orders_df.loc[tmp_orders]['row'].values])
    for jjj in range(len(tmp_cols)-1):
        ax.plot(tmp_cols[jjj:jjj+2], tmp_rows[jjj:jjj+2],
                color=wh_colors[tmp_wh])

# plot all order locations
ax.scatter(orders_df['col'], orders_df['row'],
           color='k', s=1, alpha=0.7)

ax.set_xlabel('Column')
ax.set_ylabel('Row')
ax.axis('equal')
plt.show()

# Turn routes into schedules
We now have a stack of single delivery routes which optimize load utilization and delivery distance. These need to be distributed across all available drones while minimizing total travel distance / distribution time and prioritizing short/quick orders. Following the same scheme as in the previous step, we use Google OR-Tools' routing logic to determine a schedule for each drone. Note that we batch the available routes by weight-distance value, i.e. by priority. This does not only lead to better scores but also to shorter computation times, compared to trying to find a single optimal solution using all routes (where distance and total time are minimized).
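
The cost the scheduler works with can be sketched for a single pair of routes: going from route A to route B costs the flight from the end of A to the start of B plus the length of B itself. The function name `transition_cost` and the sample routes are made up; the cell below builds the same quantity for all pairs at once as a matrix.

```python
import math


def transition_cost(route_a, route_b):
    """Distance from the end of route_a to the end of route_b,
    flying to the start of route_b and then following route_b."""
    gap = math.ceil(math.hypot(route_a['end_row'] - route_b['start_row'],
                               route_a['end_col'] - route_b['start_col']))
    return gap + route_b['dist']


# made-up routes: A ends at (0, 0), B starts at a warehouse at (3, 4)
route_a = {'end_row': 0, 'end_col': 0}
route_b = {'start_row': 3, 'start_col': 4, 'dist': 10}
print(transition_cost(route_a, route_b))  # 15
```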

In [None]:
routes_df = pd.read_pickle('routes_df')

# calculate average weight distance of deliveries (partly) fulfilled with each route
# and use for ordering / prioritisation
routes_df['wd_avg'] = routes_df['weight_dists'].apply(lambda x: int(np.mean(x)))
routes_df = routes_df.sort_values(by='wd_avg')

try:
    schedules_df = pd.read_pickle('schedules_df')
except:
    # dataframe to save list of routes for each drone
    schedules_df = pd.DataFrame()

    # process in batches
    route_split = np.linspace(0, len(routes_df), num=BATCHES+1).astype(int)

    # last drone location
    start_loc = np.full((DRONES), -1, dtype=int)

    for bbb in tqdm(range(BATCHES)):
        #print(datetime.now(), 'Optimizing route distribution for batch', bbb+1, 'of', BATCHES)

        all_routes_df = routes_df.copy()
        all_routes_df = all_routes_df.iloc[route_split[bbb]:route_split[bbb+1]]
        all_idx = all_routes_df.index.values
        all_routes_df = all_routes_df.reset_index(drop=True)

        def getRouteID(node_idx: int):
            if node_idx >= len(all_idx):
                return -1
            else:
                return all_idx[node_idx]

        # add depot to all_routes_df (first start point and dummy endpoint)
        all_routes_df = all_routes_df.append(
            {'start_row': warehouse_df.loc[0, 'wh_row'],
             'start_col': warehouse_df.loc[0, 'wh_col'],
             'end_row': warehouse_df.loc[0, 'wh_row'],
             'end_col': warehouse_df.loc[0, 'wh_col'],
             'dist': 0,
             'cmds': []},
            ignore_index=True)
        WH_IDX = len(all_routes_df)-1

        # add other start points to all_routes_df
        for lll in range(len(start_loc)):
            if start_loc[lll] == -1:
                # drone starts at depot
                start_loc[lll] = WH_IDX
            else:
                # drone starts at other location which needs to be added

                all_routes_df = all_routes_df.append(
                    {'start_row': routes_df.loc[start_loc[lll], 'end_row'],
                     'start_col': routes_df.loc[start_loc[lll], 'end_col'],
                     'end_row': routes_df.loc[start_loc[lll], 'end_row'],
                     'end_col': routes_df.loc[start_loc[lll], 'end_col'],
                     'dist': 0,
                     'cmds': []},
                    ignore_index=True)
                start_loc[lll] = len(all_routes_df)-1

        end_loc = np.full((DRONES), WH_IDX, dtype=int)

        # calculate distance matrix
        # distance is from the end of route A to the end of route B
        # while following the prescribed route B
        TO, FROM = np.meshgrid(np.arange(len(all_routes_df), dtype=int),
                               np.arange(len(all_routes_df), dtype=int))
        flat_from = np.ravel(FROM)
        flat_to = np.ravel(TO)
        SHP = flat_from.shape
        # distance to start point
        to_start = np.ceil(np.sqrt(
                    (all_routes_df.lookup(flat_from, np.full(SHP, 'end_col')) -
                     all_routes_df.lookup(flat_to, np.full(SHP, 'start_col')))**2 +
                    (all_routes_df.lookup(flat_from, np.full(SHP, 'end_row')) -
                     all_routes_df.lookup(flat_to, np.full(SHP, 'start_row')))**2
                    ))
        # distance of route itself
        route_dist = all_routes_df.lookup(flat_to, np.full(SHP, 'dist'))
        # total distance = getting to start + route distance
        distance_matrix = (to_start + route_dist).reshape(FROM.shape).astype(int)
        # replace some values
        for fromroute in range(len(all_routes_df)):
            # self node distance is zero
            distance_matrix[fromroute, fromroute] = 0
            # distance back to depot is zero (arbitrary end location)
            # (drones don't have to return to depot by the end of their tour)
            distance_matrix[fromroute, WH_IDX] = 0

        # set up route optimizer
        manager = pywrapcp.RoutingIndexManager(len(distance_matrix), DRONES,
                                               start_loc.tolist(), end_loc.tolist())  
        routing = pywrapcp.RoutingModel(manager)

        # create and register a transit callback
        def distance_callback(from_index, to_index):
            from_node = manager.IndexToNode(from_index)
            to_node = manager.IndexToNode(to_index)
            return distance_matrix[from_node][to_node]
        transit_callback_index = routing.RegisterTransitCallback(distance_callback)

        # define arc cost
        routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)

        # force routes to have roughly the same length
        dimension_name = 'Distance'
        routing.AddDimension(transit_callback_index,
            0,  # no slack
            int(1e5),  # vehicle maximum travel distance
            True,  # start cumul to zero
            dimension_name)
        distance_dimension = routing.GetDimensionOrDie(dimension_name)
        distance_dimension.SetGlobalSpanCostCoefficient(int(100))

        # set first solution heuristic
        search_parameters = pywrapcp.DefaultRoutingSearchParameters()
        search_parameters.first_solution_strategy = (
            routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
        # solve
        solution = routing.SolveWithParameters(search_parameters)

        # create schedules from solution
        for drone_id in range(DRONES):  
            index = routing.Start(drone_id)
            route_distance = 0
            route_nodes = []
            while not routing.IsEnd(index):
                node_index = manager.IndexToNode(index)

                previous_index = index
                index = solution.Value(routing.NextVar(index))

                this_dist = routing.GetArcCostForVehicle(
                    previous_index, index, drone_id)

                route_distance += this_dist
                route_nodes.append(getRouteID(node_index))
            # create empty in schedules_df if first schedule for this drone is empty
            if not route_distance and schedules_df.index.max() < drone_id:
                schedules_df = schedules_df.append(
                    {'routes': [],
                     'cmds': [],
                     'dist': 0,
                     'load_count': 0},
                    ignore_index=True)
                # this drone has not moved yet, so it starts the next batch at the depot
                start_loc[drone_id] = -1
                continue

            # combine pre-built command structures
            cmds = []
            load_count = 0
            for i in route_nodes[1:]:
                for c in routes_df.loc[i, 'cmds']:
                    cmds.append(c)
                    if 'L' in c:
                        load_count += 1

            # save to dataframe
            if schedules_df.index.max() < drone_id or np.isnan(schedules_df.index.max()):
                schedules_df = schedules_df.append(
                    {'routes': route_nodes[1:],
                     'cmds': cmds,
                     'dist': route_distance,
                     'load_count': load_count},
                    ignore_index=True)
            else:
                schedules_df.at[drone_id, 'routes'] = schedules_df.loc[drone_id, 'routes'] + route_nodes[1:]
                schedules_df.at[drone_id, 'cmds'] = schedules_df.loc[drone_id, 'cmds'] + cmds
                schedules_df.at[drone_id, 'dist'] = schedules_df.loc[drone_id, 'dist'] + route_distance
                schedules_df.at[drone_id, 'load_count'] = schedules_df.loc[drone_id, 'load_count'] + load_count

            # remember the last route of this drone as its start point for the next batch
            # (index by drone_id; the loop variable lll from above is stale here)
            start_loc[drone_id] = route_nodes[-1]

    pd.to_pickle(schedules_df, 'schedules_df')

Now we have created <code>schedules_df</code>, which holds the schedule of each individual drone: the routes it flies, the resulting commands, the total distance and the number of load operations.

In [None]:
schedules_df.head()

Let's have a look at a single drone's route throughout the simulation. Blue lines indicate legs flown with a load; red lines indicate return trips to a warehouse without a load. That looks pretty messy!

In [None]:
cmap = mcm.get_cmap('plasma')
wh_colors = [cmap(iii) for iii in np.linspace(0, 1, num=10)]

fig = plt.figure(figsize=(12,8))
ax = plt.subplot()
# plot warehouses
ax.scatter(warehouse_df['wh_col'], warehouse_df['wh_row'],
           color='tab:red', ec='k', s=48, zorder=10)

selec_schedule = schedules_df.loc[1]

# get routes covered by this schedule
tmp_routes = selec_schedule['routes']

for iii, rrr in enumerate(tmp_routes):
    # get orders covered by this route
    idxs = np.unique(routes_df.loc[rrr]['orders'], return_index=True)[1]
    tmp_orders = [routes_df.loc[rrr]['orders'][idx] for idx in sorted(idxs)]
    # plot drop-off locations
    ax.scatter(orders_df.loc[tmp_orders]['col'], orders_df.loc[tmp_orders]['row'],
               color='tab:blue', s=20)
    # plot route lines
    tmp_cols = np.concatenate([[routes_df['start_col'].loc[rrr]], orders_df.loc[tmp_orders]['col'].values])
    tmp_rows = np.concatenate([[routes_df['start_row'].loc[rrr]], orders_df.loc[tmp_orders]['row'].values])
    for jjj in range(len(tmp_cols)-1):
        ax.plot(tmp_cols[jjj:jjj+2], tmp_rows[jjj:jjj+2],
                color='tab:blue')
    # plot connection between last route dropoff and start of next route
    if iii < len(tmp_routes)-1:
        ax.plot([tmp_cols[-1], routes_df['start_col'].loc[tmp_routes[iii+1]]],
                [tmp_rows[-1], routes_df['start_row'].loc[tmp_routes[iii+1]]],
                color='tab:red', alpha=0.5)


# plot all order locations
ax.scatter(orders_df['col'], orders_df['row'],
           color='k', s=1, alpha=0.7)

ax.set_xlabel('Column')
ax.set_ylabel('Row')
ax.axis('equal')
plt.show()


And in animated form...

In [None]:
from IPython.display import display, clear_output
from IPython.display import HTML

from matplotlib import animation

cmap = mcm.get_cmap('plasma')
wh_colors = [cmap(iii) for iii in np.linspace(0, 1, num=10)]

selec_schedule = schedules_df.loc[1]

# get routes covered by this schedule
tmp_routes = selec_schedule['routes']

def getCoords(num: int):
    if num < 0:
        # warehouse
        wh_id = - (num + 1)
        col = warehouse_df['wh_col'].loc[wh_id]
        row = warehouse_df['wh_row'].loc[wh_id]
        kind = 'W'
    else:
        # order
        col = orders_df['col'].loc[num]
        row = orders_df['row'].loc[num]
        kind = 'O'
    return col, row, kind

full_sched = []

for iii, rrr in enumerate(tmp_routes):
    # get start warehouse
    if not iii:
        full_sched = [-1]
    # get orders covered by this route
    idxs = np.unique(routes_df.loc[rrr]['orders'], return_index=True)[1]
    tmp_orders = [routes_df.loc[rrr]['orders'][idx] for idx in sorted(idxs)]
    full_sched = full_sched + tmp_orders
    # get start warehouse of the next route
    if iii < len(tmp_routes)-1:
        tmp_wh = warehouse_df[(routes_df['start_col'].loc[tmp_routes[iii+1]] == warehouse_df['wh_col']) & 
                              (routes_df['start_row'].loc[tmp_routes[iii+1]] == warehouse_df['wh_row'])].index[0]

        full_sched = full_sched + [-tmp_wh-1]
        
plot_cols = []
plot_rows = []
plot_kind = []
for nnn in full_sched:
    thiscol, thisrow, thiskind = getCoords(nnn)
    plot_cols.append(thiscol)
    plot_rows.append(thisrow)
    plot_kind.append(thiskind)
    
fig = plt.figure(figsize=(12,8))
ax = plt.subplot()
# plot warehouses
ax.scatter(warehouse_df['wh_col'], warehouse_df['wh_row'],
           color='tab:red', ec='k', s=48, zorder=10)
# plot all order locations
ax.scatter(orders_df['col'], orders_df['row'],
           color='k', s=1, alpha=0.7)

# plot settings
ax.set_xlabel('Column')
ax.set_ylabel('Row')
ax.axis('equal')

lines = [ax.plot([], [], color='tab:blue' if plot_kind[j+1]=='O' else 'tab:red',
                 alpha=1 if plot_kind[j+1]=='O' else 0.5)[0]
         for j in range(len(plot_cols)-1)]

def animate(i):
    lines[i].set_data(plot_cols[i:i+2], plot_rows[i:i+2])
    
anim = animation.FuncAnimation(fig, animate, frames=len(plot_cols)-1, interval=200)

HTML(anim.to_jshtml())

# Create submission file
We simply iterate through the schedules created in the previous step and write the submission file.

In [None]:
# =============================================================================
# format output
# =============================================================================
cmd_len = 0
with open('submission.csv', 'w') as file:
    # get length of commands
    for iii in schedules_df.index:
        cmd_len += len(schedules_df.loc[iii, 'cmds'])
    # write commands
    file.write('{}\n'.format(int(cmd_len)))
    for iii in schedules_df.index:
        for jjj in schedules_df.loc[iii, 'cmds']:
            file.write('{} {}\n'.format(iii, jjj))
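
The layout of the file written above can be sanity-checked with a few lines of Python. This is only a minimal sketch: it assumes the command layout `<drone> L|U <warehouse> <product> <count>`, `<drone> D <order> <product> <count>` and `<drone> W <turns>`, with the number of commands on the first line. `check_submission` is a hypothetical helper and not part of this notebook.

```python
def check_submission(lines):
    """Check the basic layout of a submission given as a list of text lines.

    Assumed layout (not taken from this notebook): first line = number of
    commands, then one command per line.
    """
    declared = int(lines[0])
    commands = [ln for ln in lines[1:] if ln.strip()]
    if declared != len(commands):
        raise ValueError(f'header says {declared} commands, found {len(commands)}')
    for ln in commands:
        parts = ln.split()
        if len(parts) < 2 or parts[1] not in ('L', 'U', 'D', 'W'):
            raise ValueError(f'unknown command: {ln!r}')
        # wait commands carry 3 fields, all other commands carry 5
        expected = 3 if parts[1] == 'W' else 5
        # every field except the action tag must be a non-negative integer
        numbers = parts[:1] + parts[2:]
        if len(parts) != expected or not all(p.isdigit() for p in numbers):
            raise ValueError(f'malformed command: {ln!r}')
    return declared


# made-up example: drone 0 loads and delivers one unit, drone 1 waits 5 turns
example = ['3', '0 L 0 7 1', '0 D 4 7 1', '1 W 5']
n_commands = check_submission(example)
print(n_commands)
```

To check the real file, something like `check_submission(open('submission.csv').read().splitlines())` should work.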
        

# Calculate score

Let's go through the commands, drone by drone, carefully tracking time as well as currently loaded weight. We make sure to note a timestamp for each delivery, and we create a list of inventory actions (loading / unloading) which we'll use in the next cell to track warehouse inventory.

In [None]:
submission = pd.read_csv('submission.csv')
allcommands = submission[submission.columns[0]].values

delivery_times = orders_df.copy()
missing_items = orders_df.copy()

inventory_ops = pd.DataFrame(columns=['action', 'wh', 'item', 'count', 'turn'])

for ddd in tqdm(range(DRONES)):
    dronecommands = [iii for iii in allcommands if iii.split()[0] == str(ddd)]
    currentloc = warehouse_df.loc[0].values
    currenttime = 0
    currentweight = 0
    for cmd in dronecommands:
        parts = cmd.split(' ')
        action = parts[1]
        if action == 'W':
            # "wait" commands only carry a turn count ("<drone> W <turns>"),
            # so just advance the clock; no flight or (un)loading happens
            currenttime += int(parts[2])
            continue
        _, action, locidx, prod, count = parts

        # add time steps required to reach new location and perform loading / unloading / delivery
        if action == 'L' or action == 'U':
            newloc = warehouse_df.loc[int(locidx)].values
        elif action == 'D':
            newloc = orders_df.loc[int(locidx), ['row', 'col']].values
        dist = int(np.ceil(np.sqrt(np.sum((currentloc-newloc)**2))))
        currenttime += dist
        # add one step for loading / unloading / delivery itself
        currenttime += 1
        # check if end of simulation is reached
        if currenttime > TURNS:
            raise Exception('Maximum simulation time exceeded')

        # update current location
        currentloc = np.copy(newloc)
        
        # update drone weight
        if action == 'L':
            currentweight += int(count) * products_df.loc[int(prod),'weight']
        elif action == 'D' or action == 'U':
            currentweight -= int(count) * products_df.loc[int(prod),'weight']
            
        if currentweight > MAXLOAD:
            raise Exception('Maximum drone load exceeded')
        
        # note latest delivery time of each item for each order
        # and note how many items were delivered
        if action == 'D':
            if missing_items.at[int(locidx), 'p_{}'.format(prod)] >= int(count):
                delivery_times.at[int(locidx), 'p_{}'.format(prod)] = currenttime
                missing_items.at[int(locidx), 'p_{}'.format(prod)] -= int(count)
            else:
                raise Exception('Too many items delivered')
                
        # save list of loading / unloading operations for checking warehouse inventory
        if action == 'L' or action == 'U':
            inventory_ops = inventory_ops.append({
                'action': action,
                'wh': int(locidx),
                'item': int(prod),
                'count': int(count),
                'turn': currenttime
            }, ignore_index=True)

In [None]:
delivery_times
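
The time bookkeeping in the loop above comes down to one rule: each load, unload or delivery costs the flight time (the Euclidean distance rounded up) plus one turn for the action itself. Here is a minimal sketch of that rule in isolation; `flight_turns` and the coordinates are made up for illustration.

```python
import math


def flight_turns(src, dst):
    """Turns needed to fly from src to dst, both given as (row, col)."""
    return math.ceil(math.hypot(src[0] - dst[0], src[1] - dst[1]))


# hypothetical positions: a warehouse and a customer 3 rows and 4 columns away
warehouse, customer = (0, 0), (3, 4)

turns = 0
turns += flight_turns(warehouse, warehouse) + 1  # already there: 0 turns of flight + 1 to load
turns += flight_turns(warehouse, customer) + 1   # 5 turns of flight + 1 to deliver
print(turns)  # 7
```

Rounding up means even a short diagonal hop costs more than its straight-line length suggests, e.g. `flight_turns((0, 0), (1, 1))` is 2.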

Let's quickly check that we never try to load products which aren't in stock at that warehouse at that time...

In [None]:
for wh in range(len(warehouse_df)):
    for item in tqdm(range(len(products_df))):
        # all inventory operations at this warehouse involving this product
        tmp = inventory_ops[
            (inventory_ops['wh'] == wh) &
            (inventory_ops['item'] == item)
        ]
        if not len(tmp):
            continue
        tmp = tmp.sort_values(by='turn')
        # get initial stock
        inv = products_df.loc[item, f'wh{wh}_inv']
        # if the total number of units loaded does not exceed the initial stock,
        # the inventory can never go negative, so all good
        if tmp.loc[tmp['action'] == 'L', 'count'].sum() <= inv:
            continue
        # otherwise, "simulate" loading and unloading to see whether inventory goes negative
        for iii in tmp.index:
            if tmp.loc[iii, 'action'] == 'L':
                inv -= tmp.loc[iii, 'count']
            else:
                inv += tmp.loc[iii, 'count']
            # check inventory after each step
            if inv < 0:
                raise Exception('Removal of unstocked product attempted')
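
The running-balance idea behind this check can also be shown in isolation. This is a minimal sketch, assuming each operation is a `(turn, action, count)` tuple; `first_shortage` is a hypothetical helper and not part of this notebook.

```python
def first_shortage(initial_stock, ops):
    """Return the turn at which the stock first goes negative, or None.

    ops: iterable of (turn, action, count), where action is 'L' (load, removes
    stock from the warehouse) or 'U' (unload, adds stock to it).
    """
    stock = initial_stock
    # sorting the tuples orders them by turn; within the same turn 'L' sorts
    # before 'U', which is the pessimistic order for this check
    for turn, action, count in sorted(ops):
        stock += count if action == 'U' else -count
        if stock < 0:
            return turn
    return None


# made-up example: 2 units in stock, one unit is returned before the last load
print(first_shortage(2, [(5, 'L', 1), (9, 'L', 2), (7, 'U', 1)]))  # None
# same operations, but the second load happens before the unload
print(first_shortage(2, [(5, 'L', 1), (6, 'L', 2), (7, 'U', 1)]))  # 6
```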

Lastly, we determine when the last item of each order was delivered (using <code>delivery_times</code>) and whether all required items were delivered (using <code>missing_items</code>), and then calculate the final score according to the equation given in the instruction file. Only completed orders contribute to the score.

In [None]:
completion_times = np.max(delivery_times.iloc[:,3:-2].values, axis=1)
# an order is complete if no product column has items left to deliver
completed = ~np.any(missing_items.iloc[:,3:-2].values > 0, axis=1)
tmp = np.where(completed)[0]
order_scores = np.ceil(100 * (TURNS - completion_times[tmp]) / TURNS)
print('Score:', int(np.sum(order_scores)))
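
To get a feeling for the per-order score used above, here is the same expression applied to a few hand-picked completion times. This is a minimal sketch; the simulation length of 1000 turns is a made-up value for illustration, not the `TURNS` of the actual data set.

```python
import math


def order_score(completion_turn, total_turns):
    """Score of one completed order, mirroring the expression in the cell above."""
    return math.ceil(100 * (total_turns - completion_turn) / total_turns)


T = 1000  # hypothetical simulation length

for t in (150, 151, 999, 1000):
    print(t, order_score(t, T))
# 150 -> 85, 151 -> 85 (84.9 rounded up), 999 -> 1, 1000 -> 0
```

Because of the rounding up, any order finished before the very end still earns at least one point, which is why completing many "easy" orders early pays off.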